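| Graduate student | 陳昱蓁 Yu-Chen Chen |
|---|---|
| Thesis title | 人工智慧應用於技術支援詐騙偵測與特徵分析 Artificial Intelligence with Feature Analysis for Technical Support Scam Detection |
| Advisors | 陳俊良 Jiann-Liang Chen, 馬奕葳 Yi-Wei Ma |
| Oral examination committee | 郭耀煌 Yau-Hwang Kuo, 廖婉君 Wan-Jiun Liao, 孫雅麗 Yea-Li Sun, 陳俊良 Jiann-Liang Chen, 黎碧煌 Bih-Hwang Lee |
| Degree | 碩士 Master |
| Department | College of Electrical Engineering and Computer Science (電資學院) - Department of Electrical Engineering (電機工程系) |
| Year of publication | 2020 |
| Academic year of graduation | 108 |
| Language | Chinese |
| Number of pages | 105 |
| Chinese keywords | 技術支援詐騙 (technical support scam), 惡意網頁 (malicious web pages), 人工智慧 (artificial intelligence), 深度學習 (deep learning) |
| English keywords | Technical Support Scam |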
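As information spreads ever faster, Internet users receive all kinds of content, which inevitably includes malicious material, usually combined with social engineering techniques to carry out online fraud. Attackers keep updating their techniques to deceive users so that they can take control of devices and steal data, and they use this as the main channel for delivering malware to end devices. The Technical Support Scam (TSS) is a relatively new fraud technique: a warning screen that appears to come from the system vendor pops up in the user's window and asks the user to call technical support. The warning is fake. Its purpose is to win the user's trust and thereby obtain repair fees and personal information, or even to plant malware on the user's device.
To guard against the TSS threat, this study proposes an intelligent learning system based on behavioral features, the AI@TSS system. It combines web sample analysis, feature establishment, and feature evaluation to build a detection model, which is expected to give antivirus companies and related vendors a clearer strategy for defending against TSS attacks. This study collects malicious web pages and TSS web pages and splits them into training and testing sets, together named the AI@TSS dataset. To verify the reliability of the dataset, this study plots scatterplots using Principal Component Analysis and autoencoder dimensionality reduction; the scatterplots confirm that the samples in the AI@TSS dataset are labeled with the correct classes.
To detect TSS web pages among malicious web pages, this study analyzes web page behavior and proposes 42 features: host-based features oriented toward malicious web page detection and code-based features oriented toward TSS web page detection. A LightGBM feature evaluation mechanism confirms that the custom features proposed in this study are discriminative. Finally, the detection models trained on the best feature combination were put through this study's practical tests: the LightGBM model reaches 98% accuracy, the Random Forest model reaches 95.84% accuracy, and the Deep Neural Network architecture reaches 93% accuracy. These results show that the AI@TSS detection model outperforms existing technical support scam detection methods.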
A Technical Support Scam (TSS) is a cybercrime that wins a user's trust and then takes the user's property. When the scam starts, the user's browser window pops up a warning that mimics an official system alert and directs the user to fake technical support, through which personal information is stolen. The scammer uses a variety of confidence tricks to persuade the user to pay for a supposed maintenance service. Finally, the scammer may install malware on the victim's device.
To detect TSS attacks among malicious samples, this study uses behavioral patterns to design a novel TSS-aware system called AI@TSS. The system combines web sample analysis, feature establishment, and feature evaluation to generate a detection model. AI@TSS is expected to give antivirus companies and related vendors a more specific strategy for defending against TSS attacks. To analyze web page behavior, thousands of samples were collected as training data and hundreds of samples as testing data. This study uses Principal Component Analysis and an autoencoder as dimensionality-reduction mechanisms to plot scatterplots, which confirm that the class labels of the samples in the dataset are correct.
Forty-two features are proposed, comprising host-based features for malicious web page detection and code-based features for TSS web page detection. The feature analysis mechanism shows that the code-based features proposed in this study effectively distinguish TSS samples from malicious samples. In the experiments, the LightGBM detection model reaches 98% accuracy. Compared with existing detection methods, AI@TSS improves accuracy by 2.16% and precision by 3.48%. This study confirms the effectiveness of AI@TSS in detecting tech support scam web pages.
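The scatterplot check described above can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the feature vectors are synthetic, the names `pca_2d` and `make_cluster` and the cluster centers are assumptions made for this example, and the principal axes are found with plain power iteration from the standard library so that the block runs without extra dependencies. Only the Principal Component Analysis half is shown; the autoencoder and the plotting step are omitted.

```python
import random


def pca_2d(rows, iters=200):
    """Project rows (equal-length lists of floats) onto their top two principal axes."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d) of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    axes = []
    for _axis in range(2):
        v = [1.0 / (k + 1) for k in range(d)]  # fixed start vector for repeatability
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance captured along this axis.
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        axes.append(v)
        # Deflation: subtract the found component so the next pass finds the next axis.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(c[j] * a[j] for j in range(d)) for a in axes] for c in centered]


def make_cluster(rng, center, count, spread=0.5):
    """Draw `count` synthetic feature vectors around `center` (illustrative data only)."""
    return [[m + rng.gauss(0.0, spread) for m in center] for _ in range(count)]


rng = random.Random(0)
tss_rows = make_cluster(rng, [3.0, 3.0, 0.0, 1.0], 30)          # hypothetical "TSS" samples
malicious_rows = make_cluster(rng, [-3.0, -3.0, 0.0, 1.0], 30)  # hypothetical "malicious" samples
points = pca_2d(tss_rows + malicious_rows)
labels = ["tss"] * 30 + ["malicious"] * 30

# Summarize where each class lands on the first principal component.
for label in ("tss", "malicious"):
    pc1 = [p[0] for p, l in zip(points, labels) if l == label]
    print(label, "PC1 range:", round(min(pc1), 2), "to", round(max(pc1), 2))
```

If the two labeled classes occupy clearly separate regions of the projected plane, that is consistent with the labels being sound; heavily interleaved classes would suggest either mislabeled samples or features that do not discriminate. With real data, a numerical library would normally replace the hand-written power iteration.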
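The accuracy and precision figures quoted above follow the standard confusion-matrix definitions, which can be sketched as below. The counts are made up purely for illustration and are not results from the thesis; the helper names are assumptions for this example. The abstract does not state whether the reported improvements are absolute or relative, so the sketch assumes a difference in percentage points.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Fraction of samples flagged as TSS that really are TSS."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


def point_gain(new, old):
    """Difference between two rates, expressed in percentage points."""
    return (new - old) * 100.0


# Made-up confusion-matrix counts, for illustration only.
baseline = {"tp": 40, "tn": 40, "fp": 10, "fn": 10}
candidate = {"tp": 45, "tn": 45, "fp": 5, "fn": 5}

acc_gain = point_gain(accuracy(**candidate), accuracy(**baseline))
prec_gain = point_gain(precision(candidate["tp"], candidate["fp"]),
                       precision(baseline["tp"], baseline["fp"]))
print(f"accuracy gain: {acc_gain:.2f} points, precision gain: {prec_gain:.2f} points")
```

Here a true positive is a TSS page flagged as TSS, and a false positive is a non-TSS page flagged as TSS. Precision matters alongside accuracy in this setting because false alarms on ordinary malicious or benign pages reduce the practical value of a detector.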