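Detection and Classification of DGA Domain Names based on FastText and Deep Learning Techniques | National Taiwan University of Science and Technology Master's and Doctoral Thesis System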


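Graduate student:	邱建輔 Jian-Fu Qiu
Thesis title:	基於FastText與深度學習技術之DGA域名檢測與分類 Detection and Classification of DGA Domain Names based on FastText and Deep Learning Techniques
Advisor:	陳俊良 Jiann-Liang Chen
Oral defense committee:	黃能富 Nen-Fu Huang, 呂政修 Jenq-Shiou Leu, 洪論評 Lun-Ping Hung, 鄧德雋 Der-Jiunn Deng, 陳俊良 Jiann-Liang Chen
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication:	2023
Academic year of graduation:	111 (ROC calendar)
Language:	English
Number of pages:	72
Keywords (translated from Chinese):	information security, deep learning, botnet, domain name detection
Keywords (English):	DGA domain, FastText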

With the rapid advancement of technology, the Internet has become an indispensable part of daily life. Although communication networks bring many benefits, people's awareness of information security has not kept pace. Cybersecurity incidents remain frequent, and their causes are diverse: malware infections, software vulnerabilities, and social engineering are common examples. As malicious attack methods continue to multiply and evolve, innovative and practical cybersecurity defense techniques need to be studied.
A botnet is a cybersecurity threat made up of a large number of infected computers. A victim's computer is first compromised by malware, after which the attacker can control it remotely through a command and control (C&C) server to carry out various malicious actions. To evade tracking by cybersecurity researchers, attackers often use a Domain Generation Algorithm (DGA) to automatically generate many disposable domain names, which are used to establish connections with the compromised devices for subsequent control.
This study therefore addresses DGA domain detection using FastText and deep learning techniques. FastText is first used to extract word-vector weights. The Hybrid DGA DefenseNet (HDDN), built on a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), then automatically extracts features from the dataset and performs detection and classification.
Finally, the model was tested on two international public datasets. The experimental results show that HDDN detects DGA domains effectively, achieving an accuracy of 97.70% in the detection task and 93.86% in the classification task. Based on these results, the proposed DGA domain detection model outperforms previous studies.
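
To make the idea of disposable, algorithmically generated domains concrete, the following is a minimal, hypothetical sketch of how a DGA can derive domain names from a shared seed and the current date. It does not reproduce any real malware family or any algorithm from the thesis; the function name `toy_dga`, the seed string, the use of SHA-256, and the `.com` suffix are all illustrative assumptions.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12) -> List[str]:
    """Toy, hypothetical DGA: derive `count` pseudo-random domains from a seed and a date."""
    domains = []
    for i in range(count):
        # Anyone who knows the seed and the date can recompute the same digest,
        # so both ends can agree on the domain list without communicating.
        material = f"{seed}-{day.isoformat()}-{i}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map each hex character (0-15) to a lowercase letter (a-p) to form the label.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(label + ".com")  # the TLD is an arbitrary assumption
    return domains


if __name__ == "__main__":
    for name in toy_dga("example-seed", date(2023, 1, 1)):
        print(name)
```

Names produced this way tend to look like random character strings, which is one reason character-level features are commonly used to separate them from legitimate domains.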

Abstract (Chinese)
Abstract (English)
Acknowledgement
List of Figures
List of Tables
Chapter 1    Introduction
1    Motivation
2    Contributions
3    Organization
Chapter 2    Related Works
1    Malware Development and Research Orientation
2    The Development of Botnet Attacks
3    Research Related to Detecting DGA Botnet Attacks
Chapter 3    Proposed System
1    System Architecture
2    Data Collection
2.1    Netlab360 DGA Dataset
2.2    UMUDGA Dataset
3    Data Preprocessing
3.1    Data Integration
3.2    Character-based Tokenization
3.3    Word Index & Text-to-Sequence
3.4    Padding
4    Embedding
4.1    FastText
5    Detection Model Architecture
5.1    TextCNN
5.2    Hybrid DGA DefenseNet (HDDN) Architecture
5.3    Batch Normalization
Chapter 4    Performance Analysis
1    Experimental Environment
2    Evaluation Metrics
3    Performance Evaluation
3.1    Detection Task - Netlab360
3.2    Detection Task - UMUDGA
3.3    Multiclass Classification Task - Netlab360
3.4    Multiclass Classification Task - UMUDGA
4    Performance Comparison with the Literature
Chapter 5    Conclusion and Future Works
1    Conclusion
2    Future Works
References
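
The preprocessing steps named in Chapter 3 of the outline (character-based tokenization, word index, text-to-sequence, and padding) can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the thesis's code: the helper names, the use of index 0 for both padding and unseen characters, right-side padding, and the maximum length of 16 are all hypothetical choices.

```python
from typing import Dict, List


def build_char_index(domains: List[str]) -> Dict[str, int]:
    """Assign each distinct character an integer index starting at 1 (0 is reserved for padding)."""
    chars = sorted({ch for d in domains for ch in d.lower()})
    return {ch: i for i, ch in enumerate(chars, start=1)}


def text_to_sequence(domain: str, char_index: Dict[str, int]) -> List[int]:
    """Split a domain into characters and look up each index; unseen characters map to 0 (assumption)."""
    return [char_index.get(ch, 0) for ch in domain.lower()]


def pad_sequence(seq: List[int], max_len: int) -> List[int]:
    """Truncate or right-pad with zeros so every sequence has exactly max_len entries."""
    return seq[:max_len] + [0] * max(0, max_len - len(seq))


if __name__ == "__main__":
    samples = ["google.com", "xkqzvbnw.net"]
    index = build_char_index(samples)
    for s in samples:
        print(s, pad_sequence(text_to_sequence(s, index), max_len=16))
```

Fixed-length integer sequences of this kind are what an embedding layer typically takes as input.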
                                

