
Graduate student: 曾將為 (Chiang-Wei Tzeng)
Thesis title: 基於深度學習之惡意網址偵測方法 (Malicious URLs Detection by Deep Learning Approach)
Advisor: 吳宗成 (Tzong-Chen Wu)
Committee members: 羅乃維 (Nai-Wei Lo), 查士朝 (Shi-Cho Cha)
Degree: Master
Department: School of Management, Department of Information Management
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: Chinese
Number of pages: 45
Keywords (Chinese): 惡意網址、深度學習、卷積神經網路(CNN)、遞迴神經網路(RNN)
Keywords (English): Malicious URLs, Deep Learning, Convolutional Neural Networks (CNN), Recurrent Neural Network (RNN)

As Internet technology advances, the number of malicious URLs and websites is growing exponentially. Raising users' security awareness while they browse is therefore an important and pressing issue. Users with weak security awareness are easily deceived by phishing sites, scams, and malicious links, which exposes them to loss of credit and property and to leakage of personal data. This research trains deep learning models on a large volume of malicious URLs and websites in order to detect whether an unknown URL leads to malicious behavior.

The study collected a large number of publicly released datasets, including malicious website URLs and historical blacklists, as training data. Its core contribution is an architecture and method that combines the automatic feature extraction of a Convolutional Neural Network (CNN) with the Long Short-Term Memory (LSTM) variant of the Recurrent Neural Network (RNN) to detect whether the website behind an unknown URL is malicious. On more than 760,000 records, the proposed method achieves an accuracy of 98.86% and an area under the ROC curve (AUC) of 0.9974, slightly higher than the result of Le et al. [11], which indicates that the method can effectively identify malicious URLs.

Keywords: Malicious URLs, Deep Learning, Convolutional Neural Networks (CNN), Recurrent Neural Network (RNN)
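
The abstract reports two evaluation figures, accuracy and the area under the ROC curve (AUC). As a minimal sketch of how such figures can be computed from a classifier's scores, the Python below uses only the standard library. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration; they are not taken from the thesis, and the pairwise AUC loop is meant for small examples rather than for hundreds of thousands of records.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of URLs whose thresholded score matches the label (1 = malicious).

    The 0.5 threshold is an assumed default, not a value from the thesis.
    """
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random malicious URL outscores a random benign one.

    A tie between a malicious and a benign score counts as half a win.
    This pairwise form is O(n_pos * n_neg), so it only suits small inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (not results from the thesis).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

print(f"accuracy = {accuracy(labels, scores):.4f}")
print(f"AUC      = {roc_auc(labels, scores):.4f}")
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason a study may report both.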

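Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
1.4 Research Limitations
Chapter 2 Literature Review
2.1 Malicious URLs
2.2 Machine Learning Algorithms
Chapter 3 Research Design
3.1 Data Collection Method
3.2 System Architecture and Workflow
Chapter 4 Experimental Results and Analysis
4.1 Deep Learning Training Model
4.2 Results and Analysis
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Research Directions
References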

[1] J. Hong, “The state of phishing attacks,” Communications of the ACM, vol. 55, no. 1, pp. 74–81, 2012.
[2] B. Liang, J. Huang, F. Liu, D. Wang, D. Dong, and Z. Liang, “Malicious web pages detection based on abnormal visibility recognition,” in E-Business and Information System Security, 2009. EBISS’09. International Conference on. IEEE, pp. 1–5, 2009.
[3] F. D. Abdi, W. Lian, “Malicious URL Detection using Convolutional Neural Network,” International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.7, No.6, December 2017.
[4] D. R. Patil and J. Patil, “Survey on malicious web pages detection techniques,” International Journal of u-and e-Service, Science and Technology, vol. 8, no. 5, pp. 195-206, 2015.
[5] M. Cova, C. Kruegel, and G. Vigna, “Detection and analysis of drive-by-download attacks and malicious JavaScript code,” in Proceedings of the 19th international conference on World wide web. ACM, pp. 281–290, 2010.
[6] J. Ma, L. Saul, S. Savage, and G. Voelker, “Learning to detect malicious URLs.” ACM Transactions on Intelligent Systems and Technology (TIST), 2(3): 1-24, 2011.
[7] J. Ma, L. Saul, S. Savage, and G. Voelker, “Beyond Blacklists: Learning to Detect Malicious Websites from Suspicious URLs”, In Proceedings of the ACM SIGKDD Conference, Paris, France, Jun 2009.
[8] R. Verma and A. Das, “What’s in a URL: Fast Feature Extraction and Malicious URL Detection.” In Proc. 3rd ACM International Workshop on Security and Privacy Analytics, IWSPA 2017, Scottsdale, Arizona, USA, pp. 55–63, 22-24 March, 2017.
[9] F. Vanhoenshoven, G. Nápoles, R. Falcon, K. Vanhoof, and M. Köppen. “Detecting malicious URLs using machine learning techniques.” In 2016 IEEE Symposium Series on Computational Intelligence (SSCI), pages 1–8, Dec 2016.
[10] J. Saxe and K. Berlin, “eXpose: A character-level convolutional neural network with embeddings for detecting malicious URLs, file paths and registry keys,” arXiv preprint arXiv:1702.08568, 2017.
[11] H. Le, Q. Pham, D. Sahoo, and S. C. H. Hoi, “URLNet: Learning a URL Representation with Deep Learning for Malicious URL Detection,” arXiv:1802.03162v2 [cs.CR], 2 Mar 2018.
[12] A. Krizhevsky, I. Sutskever, G.E. Hinton, “Imagenet classification with deep convolutional neural networks.” In: Advances in neural information processing systems. pp. 1097–1105, 2012.
[13] A. Severyn, A. Moschitti, “Unitn: Training deep convolutional neural network for twitter sentiment classification.” In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Association for Computational Linguistics, Denver, Colorado. pp. 464–469, 2015.
[14] C. Dos Santos, M. Gatti, “Deep convolutional neural networks for sentiment analysis of short texts.” In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. pp. 69–78, August 2014.
[15] X. Zhang, J. Zhao, Y. LeCun, “Character-level convolutional networks for text classification.” In: Advances in Neural Information Processing Systems. pp. 649–657, 2015.
[16] S. Hochreiter, J. Schmidhuber, “Long short-term memory.” Neural computation 9(8), 1735–1780, 1997.
[17] N. Kalchbrenner, L. Espeholt, K. Simonyan, A.v.d. Oord, A. Graves, K. Kavukcuoglu, “Neural machine translation in linear time.” arXiv preprint arXiv:1610.10099, 2016.
[18] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. “Dropout: A simple way to prevent neural networks from overfitting.” The Journal of Machine Learning Research, pages 1929–1958, 2014.
[19] K. Hara, D. Saitoh, and H. Shouno, “Analysis of Dropout Learning Regarded as Ensemble Learning,” in Artificial Neural Networks and Machine Learning, ICANN 2016, pp. 72–79, Springer, Cham, 2016.
[20] S. Ruder, “An overview of gradient descent optimization algorithms.” arXiv:1609.04747v2 [cs.LG], 15 Jun 2017.
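[21] iThome News, “Cryptojacking: beware of your computer becoming a miner” (in Chinese). Retrieved June 15, 2018, from https://www.ithome.com.tw/foucs/119617
[22] iThome News, “WannaCry evolves into a worm, greatly increasing the destructive power of ransomware” (in Chinese). Retrieved June 15, 2018, from https://www.ithome.com.tw/news/114312
[23] 郭璟塘 AZ (August 26, 2017), “HITCON 2017: How I use black-hat SEO to manipulate website ranking and traffic” (in Chinese). Retrieved June 15, 2018, from https://hitcon.org/2017/CMT/slide-files/d2_s4_r1.pdf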
