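Graduate student: 邱建輔 (Jian-Fu Qiu)
Thesis title: Detection and Classification of DGA Domain Names based on FastText and Deep Learning Techniques
Original title: 基於FastText與深度學習技術之DGA域名檢測與分類
Advisor: 陳俊良 (Jiann-Liang Chen)
Oral defense committee: 黃能富 (Nen-Fu Huang), 呂政修 (Jenq-Shiou Leu), 洪論評 (Lun-Ping Hung), 鄧德雋 (Der-Jiunn Deng), 陳俊良 (Jiann-Liang Chen)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: English
Number of pages: 72
Keywords (translated from Chinese): information security, deep learning, botnet, domain name detection
Keywords (English): DGA domain, FastText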
  • With the rapid advancement of technology, the Internet has become an indispensable part of daily life. Although it brings many benefits, public awareness of information security has not kept pace. Cybersecurity incidents remain frequent and have diverse causes, such as malware infections, software vulnerabilities, and social engineering. As malicious attack methods continue to multiply and evolve, innovative and practical cybersecurity defense techniques are needed.
    A botnet is a cybersecurity threat made up of a large number of infected computers. A victim's computer is first compromised by malware, after which the attacker can control it remotely through a command and control (C&C) server to carry out various malicious activities. To evade tracking by cybersecurity researchers, attackers often use a Domain Generation Algorithm (DGA) to automatically generate many disposable domain names, which are then used to establish connections with the compromised devices for subsequent control.
    This study addresses DGA domain detection using FastText and deep learning techniques. FastText is first used to extract word-vector weights. A Hybrid DGA DefenseNet (HDDN), built from a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), then automatically extracts features from the dataset and performs detection and classification.
    Finally, the model was tested on two international public datasets (Netlab360 DGA and UMUDGA). The experimental results show that HDDN detects DGA domains effectively, achieving an accuracy of 97.70% in the detection task and 93.86% in the classification task. According to these results, the proposed DGA domain detection model outperforms previous research.
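
    The preprocessing stages listed in the thesis outline (character-based tokenization, word index and text-to-sequence, padding) can be illustrated with a minimal sketch. This is not the thesis's implementation, which this record does not give. The function names, the lowercasing step, the reserved padding index 0, the left-padding direction, the sequence length, and the sample domains are all assumptions made for illustration.

```python
from typing import Dict, List

PAD_INDEX = 0  # assumption: index 0 is reserved for padding


def tokenize(domain: str) -> List[str]:
    """Split a domain name into single-character tokens (lowercased)."""
    return list(domain.lower())


def build_index(domains: List[str]) -> Dict[str, int]:
    """Give each distinct character an integer index, starting at 1."""
    chars = sorted({ch for d in domains for ch in tokenize(d)})
    return {ch: i for i, ch in enumerate(chars, start=1)}


def to_sequence(domain: str, index: Dict[str, int]) -> List[int]:
    """Map a domain to character indices; unknown characters are skipped."""
    return [index[ch] for ch in tokenize(domain) if ch in index]


def pad(seq: List[int], max_len: int) -> List[int]:
    """Left-pad with PAD_INDEX, or keep only the last max_len items."""
    if len(seq) > max_len:
        seq = seq[len(seq) - max_len:]
    return [PAD_INDEX] * (max_len - len(seq)) + seq


if __name__ == "__main__":
    # Hypothetical inputs: one ordinary-looking and one random-looking domain.
    samples = ["example.com", "xjw3kq9z.net"]
    index = build_index(samples)
    for d in samples:
        print(d, pad(to_sequence(d, index), max_len=16))
```

    Fixed-length integer sequences of this kind are the usual input to an embedding layer, which is where pretrained vectors such as FastText weights would be applied before the CNN and LSTM layers.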

    Abstract (Chinese)
    Abstract
    Acknowledgement
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Contributions
      1.3 Organization
    Chapter 2 Related Works
      2.1 Malware Development and Research Orientation
      2.2 The Development of Botnet Attacks
      2.3 Research related to detecting DGA Botnet attacks
    Chapter 3 Proposed System
      3.1 System Architecture
      3.2 Data Collection
        3.2.1 Netlab360 DGA Dataset
        3.2.2 UMUDGA Dataset
      3.3 Data Preprocessing
        3.3.1 Data Integration
        3.3.2 Character-based Tokenization
        3.3.3 Word Index & Text-to-Sequence
        3.3.4 Padding
      3.4 Embedding
        3.4.1 FastText
      3.5 Detection Model Architecture
        3.5.1 TextCNN
        3.5.2 Hybrid DGA DefenseNet (HDDN) Architecture
        3.5.3 Batch Normalization
    Chapter 4 Performance Analysis
      4.1 Experimental Environment
      4.2 Evaluation Metrics
      4.3 Performance Evaluation
        4.3.1 Detection Task - Netlab360
        4.3.2 Detection Task - UMUDGA
        4.3.3 Multiclass Classification Task - Netlab360
        4.3.4 Multiclass Classification Task - UMUDGA
      4.4 Performance Comparison with the Literature
    Chapter 5 Conclusion and Future Works
      5.1 Conclusion
      5.2 Future Works
    References

