
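Graduate Student: 陳昱蓁 (Yu-Chen Chen)
Thesis Title (Chinese): 人工智慧應用於技術支援詐騙偵測與特徵分析
Thesis Title (English): Artificial Intelligence with Feature Analysis for Technical Support Scam Detection
Advisors: 陳俊良 (Jiann-Liang Chen), 馬奕葳 (Yi-Wei Ma)
Oral Defense Committee: 郭耀煌 (Yau-Hwang Kuo), 廖婉君 (Wan-Jiun Liao), 孫雅麗 (Yea-Li Sun), 陳俊良 (Jiann-Liang Chen), 黎碧煌 (Bih-Hwang Lee)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of Publication: 2020
Academic Year of Graduation: 108 (Republic of China calendar, i.e., the 2019/2020 academic year)
Language: Chinese
Number of Pages: 105
Chinese Keywords: technical support scam, malicious web page, artificial intelligence, deep learning
English Keywords: Technical Support Scam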
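    As information spreads ever faster, Internet users receive all kinds of content. Some of it is inevitably malicious, and it is usually combined with social engineering techniques to commit online fraud. Attackers keep updating their techniques so that they can deceive users, take control of devices and steal data, and these techniques have become a main route for spreading malware to end-user devices. Technical Support Scam (TSS) is a relatively new type of scam. A fake warning screen that imitates the system vendor pops up in the user's window and asks the user to call a technical support number. The goal is to win the user's trust in order to collect repair fees and personal information, and in some cases to install malware on the user's device.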
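    To counter the TSS threat, this study proposes AI@TSS, an intelligent learning system based on behavioral features. AI@TSS builds a detection model by combining web sample analysis, feature establishment and feature evaluation. The model is expected to give antivirus companies and related vendors a clearer strategy for defending against malicious TSS attacks. The study collects malicious and TSS web pages and splits them into training and testing sets, which together form the AI@TSS dataset. To verify that the dataset is reliable, the study applies Principal Component Analysis (PCA) and an Autoencoder to reduce its dimensionality and draws scatter plots. These plots confirm that the sample class labels in the AI@TSS dataset are correct.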
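The PCA check of the dataset labels can be illustrated with a short sketch. This record does not describe how the thesis implemented PCA, so everything below is an assumption.

The sketch is a pure-Python PCA. It centers the feature vectors, builds the sample covariance matrix, and finds the top two principal components by power iteration with deflation. The function names and the toy 3-feature data are hypothetical.

If malicious and TSS samples fall into separate groups along the first component, that is consistent with correct labels. In practice a library routine such as scikit-learn's `PCA(n_components=2)` would normally be used instead.

```python
import math
import random


def _mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _normalize(v):
    """Scale v to unit length (returns v unchanged if it is the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def top_eigenpair(cov, iters=500, seed=0):
    """Power iteration: dominant eigenvalue/eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = _normalize([rng.random() + 0.1 for _ in cov])
    for _ in range(iters):
        v = _normalize(_mat_vec(cov, v))
    eigval = sum(a * b for a, b in zip(v, _mat_vec(cov, v)))  # Rayleigh quotient
    return eigval, v


def pca_2d(rows):
    """Project each feature vector onto the first two principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for k in range(2):
        lam, v = top_eigenpair(cov, seed=k)
        components.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next component
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


# Toy data (hypothetical 3-feature vectors): label 0 = malicious, 1 = TSS
X = [[0.0, 0.0, 0.0], [0.2, 0.1, 0.0], [0.1, 0.3, 0.2], [0.0, 0.2, 0.1],
     [5.0, 5.0, 5.0], [5.1, 4.9, 5.2], [4.8, 5.2, 5.0], [5.0, 5.1, 4.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
points = pca_2d(X)
for (pc1, pc2), label in zip(points, y):
    print(f"label={label} pc1={pc1:+.2f} pc2={pc2:+.2f}")
```

The sign of each principal component is arbitrary. Only the separation between the two groups matters, not which side each group lands on.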
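    To separate TSS web pages from other malicious web pages, this study analyzes web page behavior and proposes 42 features. These consist of host-based features aimed at malicious web page detection and code-based features aimed at TSS web page detection. A LightGBM-based feature evaluation shows that the proposed self-defined features are discriminative. The detection models trained on the best feature combination were then evaluated in this study's tests:

    - the LightGBM model reaches 98% accuracy;
    - the Random Forest model reaches 95.84% accuracy;
    - the Deep Neural Network (DNN) model reaches 93% accuracy.

    These results show that the AI@TSS detection model outperforms existing TSS detection methods.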
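The split between host-based and code-based features can be made concrete with a small sketch. This record does not list the 42 features, so every feature below is a hypothetical stand-in, not the thesis's definition:

- Host-based: URL length, dot count, IPv4 host, HTTPS use.
- Code-based: `alert(` calls, `onbeforeunload` handlers, fullscreen requests, phone-number patterns.

These features were chosen only because they relate to the fake-warning and "call this number" behavior described above. Both groups are merged into one flat feature dictionary that a classifier such as LightGBM could consume.

```python
import re
from urllib.parse import urlparse

# Crude patterns for illustration only; NOT the thesis's actual feature definitions.
PHONE_RE = re.compile(r"\+?1?[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
IPV4_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def host_features(url):
    """Hypothetical host-based features derived from the page URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_dot_count": host.count("."),
        "host_is_ipv4": int(bool(IPV4_RE.match(host))),
        "uses_https": int(parsed.scheme == "https"),
    }


def code_features(html):
    """Hypothetical code-based features derived from the page source."""
    lower = html.lower()
    return {
        "alert_calls": lower.count("alert("),
        "has_onbeforeunload": int("onbeforeunload" in lower),
        "requests_fullscreen": int("requestfullscreen" in lower),
        "phone_numbers": len(PHONE_RE.findall(html)),
    }


def extract_features(url, html):
    """Merge both feature groups into one flat dict for a classifier."""
    features = host_features(url)
    features.update(code_features(html))
    return features


# Hypothetical fake-alert page
url = "http://192.168.10.5/support/alert.html"
html = ('<script>alert("Virus detected! Call +1-888-555-0199");'
        'window.onbeforeunload = function () { return ""; };'
        'document.documentElement.requestFullscreen();</script>')
print(extract_features(url, html))
```

Real feature sets would need far more robust parsing. For example, string counts are easily defeated by obfuscated JavaScript, which is one reason behavioral features are usually combined with host-based ones.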


    Technical Support Scam (TSS) is a cybercrime that not only gains a user's trust but also takes the user's property. When a TSS attack starts, a pop-up warning appears in the user's browser window. The warning mimics an official system alert and uses fake technical support to scam the user out of personal information. The scammer then uses a variety of confidence tricks to persuade the user to pay for a supposed maintenance service, and may also install malware on the victim's device.
    To detect TSS attacks among malicious samples, this study uses behavioral patterns to design a novel TSS-aware system called AI@TSS. The system combines web sample analysis, feature establishment and feature evaluation to generate a detection model. AI@TSS is expected to give antivirus vendors and related companies a more specific strategy for defending against malicious infection. To analyze web page behavior, thousands of samples were collected as training data and hundreds of samples as testing data. The study applies Principal Component Analysis (PCA) and an Autoencoder for dimensionality reduction and plots the results as scatter plots. These plots confirm that the class labels of the dataset samples are correct.
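The abstract states only that thousands of samples were used for training and hundreds for testing. A stratified split keeps the malicious/TSS ratio the same in both sets, and a fixed seed makes the split reproducible.

The following is a minimal sketch under assumptions: a 10% test ratio, seed 42, and 700 malicious plus 300 TSS samples. None of these values come from the thesis.

```python
import random
from collections import Counter, defaultdict


def stratified_split(samples, labels, test_ratio=0.1, seed=42):
    """Split (sample, label) pairs so each class keeps the same train/test ratio."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, test = [], []
    for label in sorted(by_class):
        items = by_class[label][:]
        rng.shuffle(items)
        n_test = round(len(items) * test_ratio)
        test.extend((s, label) for s in items[:n_test])
        train.extend((s, label) for s in items[n_test:])
    return train, test


# Hypothetical sizes: 300 TSS and 700 malicious page IDs
samples = list(range(1000))
labels = ["tss" if i < 300 else "malicious" for i in range(1000)]
train, test = stratified_split(samples, labels)
print(len(train), len(test), Counter(label for _, label in test))
```

Splitting per class, rather than shuffling the whole list once, prevents the smaller TSS class from being under- or over-represented in the test set by chance.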
    Forty-two features are proposed, comprising host-based features for malicious web page detection and code-based features for TSS web page detection. Feature analysis shows that the proposed code-based features effectively distinguish TSS samples from other malicious samples. In the experiments, the LightGBM-based model reaches 98% accuracy. Compared with existing detection methods, it improves accuracy by 2.16% and precision by 3.48%. These results confirm the effectiveness of AI@TSS in detecting tech support scam web pages.
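The accuracy and precision figures above can be reproduced from a confusion matrix. The abstract does not say whether the 2.16% and 3.48% improvements are percentage points or relative gains, so the sketch below computes both.

Three things in the sketch are assumptions:

- TSS is treated as the positive class.
- The toy predictions are invented.
- The before/after values 0.98 and 0.96 are hypothetical, not the thesis's baseline figures.

```python
def confusion_counts(y_true, y_pred, positive="tss"):
    """Return (TP, FP, FN, TN) treating `positive` as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive="tss"):
    """TP / (TP + FP): how many pages flagged as TSS really are TSS."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fp) if tp + fp else 0.0


def improvement(new, old):
    """Gain of `new` over `old`, as percentage points and as a relative percentage."""
    return {"points": (new - old) * 100, "relative_pct": (new - old) / old * 100}


# Toy predictions (hypothetical): 5 TSS pages and 5 other malicious pages
y_true = ["tss"] * 5 + ["malicious"] * 5
y_pred = ["tss", "tss", "tss", "tss", "malicious",
          "tss", "malicious", "malicious", "malicious", "malicious"]
print(confusion_counts(y_true, y_pred))
print(accuracy(y_true, y_pred), precision(y_true, y_pred))
print(improvement(0.98, 0.96))
```

For the toy data, the counts are 4 TP, 1 FP, 1 FN and 4 TN, so accuracy and precision are both 0.8. For 0.98 versus 0.96, the gain is 2 percentage points but about 2.08% in relative terms, which is why the unit matters when comparing studies.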

    Table of Contents:
    Chinese Abstract
    Abstract
    List of Tables
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Contributions
      1.3 Organization
    Chapter 2 Related Work
      2.1 Tech Support Scam Concept
      2.2 Anti-Scams Techniques
        2.2.1 Blacklist-based Detection
        2.2.2 Heuristics-based Detection
      2.3 Artificial Intelligence
      2.4 Previous Study
    Chapter 3 Proposed AI@TSS System
      3.1 AI@TSS Architecture
      3.2 Data Collection
        3.2.1 Data Source
        3.2.2 Crawler Technology
      3.3 Feature Definition
        3.3.1 Host-based Features
        3.3.2 Code-based Features
      3.4 Data Distribution
    Chapter 4 System Environment and Performance Analysis
      4.1 System Environment
        4.1.1 Experimental Environment
        4.1.2 Experimental Parameter
      4.2 Feature-based Analysis
        4.2.1 Features Observing
        4.2.2 Features Contrasting
          4.2.2.1 Feature Analysis of Host-based Features
          4.2.2.2 Feature Analysis of Host-based Features with F42
          4.2.2.3 Feature Analysis of Code-based Features
          4.2.2.4 Feature Analysis of Code-based Features with F9
          4.2.2.5 Feature Analysis of Self-defined Features
          4.2.2.6 Feature Analysis of All Features
        4.2.3 AI@TSS Classifier Generation
      4.3 Performance Analysis
        4.3.1 AI@TSS Classifier Prediction
        4.3.2 Comparison of Different Studies
      4.4 Summary
    Chapter 5 Conclusions and Future Works
      5.1 Conclusions
      5.2 Future Works
    References

