
Author: Cheng-Ru Wu (吳承儒)
Thesis Title: P2P Flow Identification by Ensemble Classification (以組合式分類器偵測點對點傳輸資料之研究)
Advisor: Yie-Tarng Chen (陳郁堂)
Committee Members: Chiun-Chieh Hsu (徐俊傑), Chen-Mie Wu (吳乾彌), Wen-Hsien Fang (方文賢)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering
Year of Publication: 2009
Academic Year of Graduation: 97 (ROC calendar)
Language: English
Pages: 55
Chinese Keywords: 隱藏式馬可夫模型, 適當提升演算法, 組合式分類器, P2P流量偵測
English Keywords: Hidden Markov Model, AdaBoost, Ensemble classification, P2P flow identification
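  • In recent years, peer-to-peer (P2P) traffic has severely degraded network transmission quality and consumed most of the available bandwidth. Most current P2P applications can use arbitrarily configured port numbers to evade blocking based on well-known ports. Academic networks and ISPs mostly rely on L7-filter, which blocks P2P traffic by analyzing packet payloads, but P2P applications can evade this analysis through Protocol Encryption (PE) of protocol headers. Earlier studies that used machine learning algorithms to detect P2P traffic suffered from low detection rates and high false alarm rates. In this thesis, we therefore propose an ensemble classification system that combines a Hidden Markov Model (HMM) with the adaptive boosting algorithm (AdaBoost) to detect P2P flows. The system has two stages. In the first stage, we exploit the way packet sizes vary over the course of a flow, treat the resulting sequence analogously to a biological sequence, and use an HMM to classify whether a flow is a P2P flow. In the second stage, we apply AdaBoost with Decision Stump classifiers to existing flow attributes to make the final classification. We collected real campus network traffic as the data for our simulations, and the results show a high detection rate (98%) and a low false alarm rate (5.9%). The proposed method uses only transport-layer header information, requires no analysis of packet payloads or port numbers, can detect P2P flows early, and can detect unknown P2P protocols.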
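
    The first stage can be illustrated with a minimal Python sketch. It is not the thesis implementation: the 500-byte size threshold, the two-state models, and every probability below are hypothetical values, chosen only to show how a packet size sequence might be scored against two competing HMMs with the scaled forward algorithm. The sketch assumes, purely for illustration, that P2P flows tend to alternate between small and large packets while other flows tend to repeat the same size class.

```python
import math

SMALL, LARGE = 0, 1  # observation symbols


def quantize(packet_sizes, threshold=500):
    """Map packet sizes in bytes to SMALL/LARGE symbols.

    The 500-byte threshold is an illustrative assumption.
    """
    return [LARGE if size >= threshold else SMALL for size in packet_sizes]


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: return log P(obs | model)."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n_states = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    scale = sum(alpha)
    log_p = math.log(scale)
    alpha = [a / scale for a in alpha]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][symbol]
            for j in range(n_states)
        ]
        scale = sum(alpha)  # rescale each step to avoid numeric underflow
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p


# Hypothetical models given as (start, transition, emission) probabilities.
# State 0 mostly emits SMALL, state 1 mostly emits LARGE.
EMIT = [[0.9, 0.1], [0.1, 0.9]]
P2P_MODEL = ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], EMIT)    # states alternate
OTHER_MODEL = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], EMIT)  # states persist


def is_p2p(packet_sizes):
    """Label a flow P2P when the P2P model explains its sizes better."""
    obs = quantize(packet_sizes)
    return log_likelihood(obs, *P2P_MODEL) > log_likelihood(obs, *OTHER_MODEL)
```

    Calling `is_p2p` on the first few packet sizes of a flow returns a boolean label. In practice the model parameters would be estimated from labeled traces rather than fixed by hand.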


    Peer-to-peer (P2P) traffic accounts for a major fraction of all Internet traffic, so P2P flow identification has become an important problem for network management. A robust P2P flow identification approach should operate properly without port or payload information, since new-generation P2P applications can use arbitrary port numbers to avoid fixed-port blocking and payload encryption to avoid P2P signature detection. Previous research that used machine learning approaches for P2P flow identification suffered from low detection rates and high false positive rates due to a lack of proper features. In our research, we propose an ensemble classification approach that integrates a Hidden Markov Model (HMM) and the AdaBoost algorithm. The proposed P2P identification scheme is divided into two stages. In the first stage, we investigate the alternation of small and large packets in P2P flows, identify an important feature called the packet size sequence pattern, and use an HMM to recognize the patterns. In the second stage, we use the AdaBoost algorithm with traditional flow attributes to improve detection accuracy and reduce false positives in classification. To verify the performance of the proposed ensemble-based P2P identification, we collected network traffic traces from the NTUST campus and ran intensive simulations. The simulation results show that the ensemble classification approach achieves a 98% detection rate and a 5% false alarm rate.
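
    The second stage can be sketched as a minimal AdaBoost over decision stumps in Python. This is an illustrative sketch, not the thesis code: the two flow attributes (mean packet size and flow duration), the toy training data, and all function names are assumptions made for the example, and the record does not state which flow attributes the thesis actually uses.

```python
import math


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[f] >= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def stump_predict(stump, row):
    """Predict +1 or -1 by thresholding a single feature."""
    _, f, thr, polarity = stump
    return polarity if row[f] >= thr else -polarity


def adaboost_train(X, y, rounds=10):
    """Discrete AdaBoost; labels in y must be +1 (P2P) or -1 (non-P2P)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        raw_err = stump[0]
        err = min(max(raw_err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        if raw_err == 0 or raw_err >= 0.5:
            break  # perfect stump, or no stump better than chance
        # Increase the weight of misclassified flows, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy flows: [mean packet size in bytes, duration in seconds] (made-up values).
X_TRAIN = [[120, 2.0], [150, 3.5], [900, 40.0],
           [1100, 55.0], [1000, 1.0], [200, 60.0]]
Y_TRAIN = [-1, -1, 1, 1, -1, 1]
MODEL = adaboost_train(X_TRAIN, Y_TRAIN)
```

    `adaboost_predict(MODEL, [500, 50.0])` then labels a new flow. How the HMM output from the first stage is combined with this classifier is not specified in this record, so the sketch leaves that step out.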

    Chapter 1 Introduction
      1.1 Problem Statements
      1.2 Assumption
      1.3 Related Work
      1.4 Motivation
      1.5 Goal
      1.6 Contribution
    Chapter 2 Characterization of P2P Traffic
      2.1 Overview
      2.2 P2P File-sharing Protocols
        2.2.1 BitTorrent
        2.2.2 eDonkey / eMule
        2.2.3 Foxy
        2.2.4 GoGoBox
      2.3 Packet Size Sequence Pattern
    Chapter 3 Design of P2P Flow Identification System
      3.1 Flow Collection Mechanism
      3.2 Identification of AdaBoost Model
        3.2.1 Multi Attributes
        3.2.2 AdaBoost Algorithm
      3.3 Identification of Hidden Markov Model
        3.3.1 Packet Size Sequence Pattern Collection
        3.3.2 Hidden Markov Model Algorithm
      3.4 Ensemble Classification with HMM and AdaBoost
    Chapter 4 Performance Evaluation
      4.1 Trace Data Description
      4.2 Build Ground Truth Classification
      4.3 Bandwidth Estimation
      4.4 Performance Metrics
      4.5 Performance Evaluation
        4.5.1 AdaBoost Simulation Result
        4.5.2 HMM Simulation Result
        4.5.3 Ensemble Classification with HMM and AdaBoost
      4.6 Comparison with Different Methods
      4.7 Sampled Packets with Different Methods
    Chapter 5 Conclusion
    References
