
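Graduate student: 林政龍 (Cheng-Lung Lin)
Thesis title: 結合簡單貝氏分類器與隱藏式馬可夫模型應用於網路流量分類
Internet Traffic Classification based on Hybrid Naive Bayes HMMs Classifier
Advisor: 鮑興國 (Hsing-Kuo Pao)
Committee members: 李育杰 (Yuh-Jye Lee)
鄧惟中 (Wei-Chung Teng)
陳存暘 (Chun-Yang Chen)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: English
Number of pages: 55
Keywords (Chinese): 隱藏式馬可夫模型 (hidden Markov models), 簡易貝氏分類器 (Naive Bayes classifier), 網路流量 (network traffic), 傳輸層行為 (transport layer behavior), 網路安全 (network security), 隱藏通道 (covert channel)
Keywords (English): hidden Markov models, Naive Bayes, traffic classification, transport layer behavior, network security, covert channel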
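  • Network administrators must rely on automated management systems to handle large network infrastructures and reduce labor costs. Traditionally, most firewalls identify application protocols simply by port number or by extracting characteristic strings from packet payloads. However, relying on port numbers alone to identify application protocols is unreliable in practice, and as encrypted protocols become more widespread, inspecting packet contents requires encryption and decryption, which makes such methods less and less applicable. Recent related research therefore uses the limited unencrypted information available at the transport layer to identify the behavioral characteristics of network traffic. In this study, we use a hybrid of a Naive Bayes classifier and hidden Markov models as our system framework to effectively distinguish application protocols and anomalous network behavior. We exploit the simplicity and speed of the Naive Bayes classifier to handle multivariate data, and combine it with the temporal property of hidden Markov models to find the temporal relationships between packets, which strengthens the framework and improves accuracy. We implement this automated system with a user interface for use in encrypted networks, to analyze and observe the distribution of application protocols and to detect anomalous network behavior, and finally demonstrate its effectiveness on real data.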


    To manage a large network infrastructure, we must rely on an automatic network management system. Traditionally, most firewalls simply use the port number of packets to identify abnormal network traffic. Some also inspect application-layer characteristics, such as the payload of a packet. However, these traditional security mechanisms encounter difficulties as encrypted protocols become increasingly popular. Recent related research identifies application protocols from the restricted characteristics and behaviors that remain observable at the transport layer of the TCP/IP model after encryption. We therefore combine two models, Naive Bayes and hidden Markov models (HMMs), and implement them as an automatic system that uses the limited information of encrypted packets to infer and classify application protocol behavior. Generally speaking, HMMs are relatively good at estimating latent relationships in temporal data. Naive Bayes is simple, fast, and effective, and is commonly used for multidimensional datasets. In this thesis, we propose a hybrid Naive Bayes HMMs classifier as a fundamental framework for inferring application protocol behavior in encrypted network traffic. The hybrid model uses the temporal property of HMMs to inspect the relations between packets and employs Naive Bayes to characterize the statistical signature. Our approach not only identifies network behavior in encrypted network traffic but also exploits the temporal property to raise accuracy. It can be applied to infer application protocols and to detect abnormal behavior. Compared with related research, our method uses only a few features to classify multi-flow protocols and achieves respectable performance.
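
    The hybrid idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes one small HMM per protocol whose per-state emission probability factorizes over packet features in the Naive Bayes manner, scores a packet sequence with the forward algorithm, and picks the protocol with the highest likelihood. The feature names (`size`, `dir`), the two protocol models, and all probabilities are hypothetical toy values set by hand; the thesis instead estimates parameters with an extended Baum-Welch procedure.

```python
import math

def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def nb_log_emission(state_params, packet):
    """Naive Bayes emission: features are treated as independent given the state."""
    total = 0.0
    for feature, value in packet.items():
        # Small floor for feature values the toy model has never seen (assumption).
        total += safe_log(state_params[feature].get(value, 1e-6))
    return total

def forward_log_likelihood(model, packets):
    """Forward algorithm in log space: returns log P(packets | model)."""
    n = len(model["start"])
    alpha = [safe_log(model["start"][s]) + nb_log_emission(model["emit"][s], packets[0])
             for s in range(n)]
    for packet in packets[1:]:
        alpha = [log_sum_exp([alpha[r] + safe_log(model["trans"][r][s]) for r in range(n)])
                 + nb_log_emission(model["emit"][s], packet)
                 for s in range(n)]
    return log_sum_exp(alpha)

def classify(models, packets):
    """Return the name of the model that assigns the sequence the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(models[name], packets))

# Hypothetical toy models: two hidden states each, two discrete features per packet.
MODELS = {
    "interactive": {  # small packets alternating between client and server
        "start": [0.9, 0.1],
        "trans": [[0.2, 0.8], [0.8, 0.2]],
        "emit": [
            {"size": {"small": 0.9, "large": 0.1}, "dir": {"up": 0.9, "down": 0.1}},
            {"size": {"small": 0.8, "large": 0.2}, "dir": {"up": 0.1, "down": 0.9}},
        ],
    },
    "bulk": {  # long runs of large downstream packets with occasional small replies
        "start": [0.5, 0.5],
        "trans": [[0.9, 0.1], [0.1, 0.9]],
        "emit": [
            {"size": {"small": 0.1, "large": 0.9}, "dir": {"up": 0.1, "down": 0.9}},
            {"size": {"small": 0.9, "large": 0.1}, "dir": {"up": 0.9, "down": 0.1}},
        ],
    },
}

chat_like = [{"size": "small", "dir": "up"}, {"size": "small", "dir": "down"}] * 2
download_like = [{"size": "large", "dir": "down"}] * 4 + [{"size": "small", "dir": "up"}]

print(classify(MODELS, chat_like))
print(classify(MODELS, download_like))
```

    Only packet size and direction are used here because such fields stay observable when the payload is encrypted. Working in log space keeps the forward recursion from underflowing on long flows.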

    1 Introduction
        1.1 Motivation
        1.2 Main Work
        1.3 Organization of Thesis
    2 Related Research
        2.1 Network Behavior Classification Methods
            2.1.1 Port-based Classification
            2.1.2 Payload-based Classification
            2.1.3 Statistical Signature-based Classification
            2.1.4 Markovian Signature-based Classification
        2.2 Candidate Features
    3 System Framework
        3.1 Notations
        3.2 Hidden Markov Models
            3.2.1 Viterbi Algorithm
            3.2.2 Forward Algorithm
            3.2.3 Backward Algorithm
        3.3 Parameter Re-Estimation
        3.4 Naive Bayes
        3.5 Hybrid Naive Bayes HMMs
            3.5.1 Extended Baum-Welch Algorithm
    4 Experiment Results
        4.1 Protocol Classification for QoS Management
        4.2 Behavior Level
    5 Conclusion and Future Work
        5.1 Conclusion
        5.2 Future Work

