
Graduate student: 蔡博堯 (Bo-Yao Tsai)
Thesis title: 基於Hadoop之網際網路訊務萃取器設計與實作 (Design and Implementation of Internet Traffic Extractor using Hadoop)
Advisor: 鄭瑞光 (Ray-guang Cheng)
Committee members: 許獻聰 (Shiann-tsong Sheu), 呂政修 (Jenq-shiou Leu), 陳仁暉 (Jen-hui Chen), 王瑞堂 (Jui-tang Wang)
Degree: Master's
Department: College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar)
Language: Chinese
Number of pages: 35
Keywords (Chinese): 巨量資料, Hadoop, MapReduce, 網際網路訊務萃取
Keywords (English): Big data, Hadoop, MapReduce, Internet traffic extractor
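  • Abstract (translated from the Chinese): Network packets are the unit of data transmission on a network, and they can be used to analyze network protocols, measure performance, or debug. When researchers collect packets at large network nodes (for example, schools or companies), they face a big data problem that prevents them from analyzing the packets quickly and effectively. This thesis proposes a Hadoop-based Internet traffic extractor to address that problem. It extracts the traffic a user is interested in, reorders the packets, and places them in folders to support further research.
    MapReduce is the computational core of Hadoop [1]. To implement the Internet traffic extractor on the MapReduce framework, we considered its workflow and characteristics and resolved the extractor's design issues. Our experiments confirm that the extractor correctly extracts specific packet flows, and that the scalability of the Hadoop cluster architecture gives it the capacity to process massive numbers of packets.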


    Abstract (author's English version): Network packet tracing is used for many purposes, such as protocol analysis, network performance analysis, and network software debugging. Because packet traces collected for research can grow into a big data problem, they are difficult to analyze efficiently and promptly. In this thesis, we propose a Hadoop-based Internet traffic extractor to address this problem. The extractor extracts the bi-directional flows that contain the traffic of interest, reorders the packets in each flow, and puts the reordered packets of each flow into a folder for further analysis.
    MapReduce is the core technology of Hadoop [1] for processing big data. We designed the extractor around the properties of MapReduce and solved the design issues that arise from them. Our experiments verify that the extractor correctly extracts the specified flows. Furthermore, the scalability of a Hadoop cluster makes the extractor capable of handling big data.
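
The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the idea rather than the thesis implementation. It runs in plain Python with no Hadoop dependency. The `Packet` record, the `flow_key` function, the folder naming scheme, and the use of timestamps for reordering are all assumptions made for illustration; the thesis may key and reorder flows differently (for example, by TCP sequence number).

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record; real traces carry many more fields."""
    timestamp: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    payload: bytes


def flow_key(pkt):
    """Give both directions of a flow the same key by ordering the two endpoints."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return (pkt.protocol,) + min(a, b) + max(a, b)


def map_phase(packets):
    """Map step: emit (flow key, packet) pairs."""
    for pkt in packets:
        yield flow_key(pkt), pkt


def shuffle(pairs):
    """Stand-in for the framework's shuffle: group packets by flow key."""
    groups = defaultdict(list)
    for key, pkt in pairs:
        groups[key].append(pkt)
    return groups


def reduce_phase(groups, is_interesting):
    """Reduce step: keep flows containing traffic of interest, ordered by timestamp."""
    result = {}
    for key, pkts in groups.items():
        if any(is_interesting(p) for p in pkts):
            # Assumed naming scheme: one output folder per extracted flow.
            folder = "flow_" + "_".join(str(part) for part in key)
            result[folder] = sorted(pkts, key=lambda p: p.timestamp)
    return result


def extract_flows(packets, is_interesting):
    """Run the three steps in sequence on an in-memory packet list."""
    return reduce_phase(shuffle(map_phase(packets)), is_interesting)
```

Sorting the two endpoints before building the key is one simple way to make a request and its reply land in the same reduce group. A call such as `extract_flows(packets, lambda p: 80 in (p.src_port, p.dst_port))` would keep only flows that touch port 80.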

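    Abstract (Chinese)
    Abstract
    Acknowledgments
    Table of Contents
    List of Figures
    Chapter 1 Introduction
      1.1 Research Motivation
      1.2 Related Work
        1.2.1 Traditional Network Traffic Analysis Tools
        1.2.2 Applications of Hadoop to Big Data Processing
      1.3 Introduction to Hadoop
        1.3.1 MapReduce
      1.4 Design Issues
      1.5 Thesis Organization
    Chapter 2 System Architecture of the Hadoop Network Traffic Extractor
      2.1 System Architecture
    Chapter 3 Solutions to the Design Issues
      3.1 Extracting Bi-directional Packet Flows
      3.2 Reducing the Computational Complexity of the Reduce Program and the Improper Grouping Problem
      3.3 Removing Duplicate TCP Packets
    Chapter 4 Experimental Results
      4.1 Output Verification
        4.1.1 Validity
        4.1.2 Completeness
        4.1.3 Correctness
      4.2 Performance Discussion
        4.2.1 Comparison of Node Counts
        4.2.2 Comparison of Signature-based Packet Extraction
        4.2.3 Comparison of Port/IP Extraction and Signature-based Extraction
        4.2.4 Big Data Test
    Chapter 5 Conclusion
    References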

    [1] Hadoop [Online]. Available: http://hadoop.apache.org/
    [2] Tcpdump [Online]. Available: http://www.tcpdump.org/
    [3] Wireshark [Online]. Available: https://www.wireshark.org/
    [4] Netresec [Online]. Available: http://www.netresec.com/?page=CapLoader
    [5] Y. H. Kim et al., “PcapWT: An efficient packet extraction tool for large volume network traces,” Computer Networks, vol. 79, pp. 99-102, Mar. 2015.
    [6] Y. Y. Qiao, Z. M. Lei, L. Yuan, and M. J. Guo, “Offline traffic analysis system based on Hadoop,” The Journal of China Universities of Posts and Telecommunications, pp. 97-103, 2013.
    [7] V. K. C. Bumgardner and V. W. Marek, “Scalable hybrid stream and Hadoop network analysis system,” Proceedings of the 14th Int. Conf. ACM/SPE, pp. 219-224, 2014.
    [8] Y. Lee, W. Kang, and H. Son, “An internet traffic analysis method with MapReduce,” Proceedings of IEEE/IFIP Network Operations and Management Symposium Workshops (NOMS Wksps), pp. 357-361, Apr. 2010.
    [9] Y. Lee and Y. Lee, “Toward scalable internet traffic measurement and analysis with Hadoop,” ACM SIGCOMM Comput. Commun. Rev., vol. 43, pp. 5-13, Jan. 2013.
    [10] Y. Cai, B. Wu, X. Zhang, M. Luo, and J. Su, “Flow identification and characteristics mining from internet traffic with Hadoop,” Proceedings of 2014 International Conference on Information and Telecommunication Systems (CITS), Computer, pp. 1-5, 2014.
    [11] A. Lukashin, L. Laboshin, V. Zaborovsky, and V. Mulukha, “Distributed packet trace processing method for information security analysis,” in Internet of Things, Smart Spaces, and Next Generation Networks and Systems, S. Balandin et al., Eds. St. Petersburg, Russia: Springer International Publishing, 2014, vol. 8638, pp. 535–543.
    [12] X. Y. Chen, H. K. Pao and Y. J. Lee, “Efficient traffic speed forecasting based on massive heterogenous historical data,” Proceedings of IEEE International Conference on Big Data, pp. 10-17, Oct. 2014.
    [13] R. K. Grace, R. Manimegalai, and S. S. Kumar, “Medical image retrieval system in grid using Hadoop framework,” Proceedings of International Conference on Computational Science and Computational Intelligence (CSCI), pp. 144-148, Mar. 2014.
    [14] A. K. Dubey, V. Jain, and A. P. Mittal, “Stock market prediction using Hadoop Map-Reduce ecosystem,” Proceedings of 2nd International Conference on Computing for Sustainable Global Development, pp. 616-621, Mar. 2015.
    [15] TCP session reconstruction tool [Online]. Available: http://www.codeproject.com/Articles/20501/TCP-Session-Reconstruction-Tool
    [16] Winmerge [Online]. Available: http://winmerge.org/
