Author: Wen-Han Kuo (郭文翰)
Thesis title: Artificial Intelligence Technology for Malware Family Detection (人工智慧技術於惡意軟體家族偵測之研究)
Advisor: Jiann-Liang Chen (陳俊良)
Oral defense committee: Jiann-Liang Chen (陳俊良), Bih-Hwang Lee (黎碧煌), Wei-Chung Teng (鄧惟中), Hwa-Chun Lin (林華君), Jung-Tsung Tsai (蔡榮宗)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar, 2018-2019)
Language: English
Number of pages: 93
Keywords: Malware, Artificial Intelligence, Machine Learning, Deep Learning, Data Imbalance, Unknown Malware Detection
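    With the rapid development of Internet of Things (IoT) devices and communication technologies, the ways in which the Internet is used have expanded, and in the pursuit of a better quality of life, the number of IoT devices and application services grows every year. However, the importance of information security is an issue that most people overlook, and a growing number of hackers and others with ulterior motives can use malware to attack network security vulnerabilities. As the threat from malware increases, the importance of malware detection systems is self-evident.
    This study proposes an integrated system architecture that combines machine learning, deep learning, data balancing, and feature evaluation mechanisms to build a malware detection system, and it presents classification results by malware family, with the aim of giving antivirus companies and related vendors a clearer strategy for defending against malware attacks. The data source is the CTU-13 open dataset. Because the CTU data were collected from a campus local area network, the dataset contains normal, malicious, and background traffic. To reduce noise in the dataset and improve overall model performance, this study analyzes the data with the feature evaluation mechanisms ANOVA, Chi-Square, and AutoEncoder and removes features that lower model accuracy, thereby reducing model computation time and improving model stability. The original dataset is also imbalanced: the number of samples differs widely between the malware classes and benign software. To address this, a data balancing mechanism is added to the architecture. The SMOTEENN algorithm generates data for the minority classes, which counters model bias and improves overall model credibility.
    This study also considers that malware types are continually updated and grow in number over time. The neural network architecture in this study uses an activation function mechanism to detect malware. If unknown malware is found, that is, malware that does not belong to the list of families the model was trained on, it is added to the family list at the next model update and the model is retrained. In the performance analysis of the proposed architecture, the detection model based on the XGBoost algorithm reaches an accuracy of 99.98%, and the detection model based on the Back Propagation architecture reaches an accuracy of 98.88%.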


    The rapid development of Internet of Things (IoT) devices and communication technologies has greatly expanded the applications of the Internet. In response to people's pursuit of a high quality of life, the number of IoT devices and related services has increased annually. However, most people overlook the importance of information security, which allows a growing number of hackers and others with ulterior motives to use malware to exploit security holes in Internet applications. As the number of attack incidents increases, malware detection systems have become imperative.
    This study proposed an integrated system framework that combines machine learning, deep learning, data balancing, and feature evaluation mechanisms to detect malware, and it presented the classification results by malware family. The proposed framework can serve as a reference for antivirus companies and related service providers in developing clearer strategies for defending against malware attacks. The data came from the CTU-13 open dataset, which was compiled by capturing traffic on a university network. The dataset includes normal, malicious, and background traffic. To reduce noise in the dataset and improve overall model performance, this study analyzed the data with three feature evaluation methods: ANOVA, Chi-Square, and AutoEncoder. Features that reduced model accuracy were removed, which shortened model computation time and improved model stability. Because the original dataset was imbalanced across the malware classes and benign software, a data balancing mechanism was introduced. The SMOTEENN algorithm was used to generate data for the minority classes, thereby reducing model bias and enhancing overall model credibility.
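The data balancing step can be illustrated with a minimal sketch of the oversampling half of SMOTEENN. It assumes plain SMOTE-style interpolation between a minority sample and one of its nearest minority neighbours. The Edited Nearest Neighbours cleaning pass is omitted, and the function name `smote_oversample`, its parameters, and the toy feature values are hypothetical, not taken from the thesis.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between minority samples.

    Hypothetical helper: covers only the SMOTE half of SMOTEENN; the ENN
    cleaning step that removes ambiguous samples is not implemented here.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    k = min(k, len(minority) - 1)      # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()             # position on the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority-class flows with two made-up numeric features each.
minority_flows = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 3.0]]
new_flows = smote_oversample(minority_flows, n_new=6)
print(len(new_flows), "synthetic samples generated")
```

Every synthetic point lies on a segment between two real minority samples, so the minority class grows without exact duplicates. For real use, a maintained implementation such as the `SMOTEENN` class in the imbalanced-learn package would likely be a better starting point than this sketch.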
    This study also considered that malware types are continually updated and grow in number over time. Therefore, the neural network architecture adopted in this study employs an activation function mechanism to detect malware. When an unknown malware program is found, that is, one that does not belong to any family in the trained family list, it is added to the family list and the model is retrained at the next model update. Performance analysis of the proposed framework showed that the detection models based on XGBoost and Back Propagation reached accuracy rates of 99.98% and 98.88%, respectively.
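The abstract does not say how the activation function is used to flag unknown malware. One plausible reading, sketched below purely as an assumption, is to apply a softmax to the network's output layer and treat a sample as unknown when no family receives a sufficiently high probability. The family names, the 0.9 threshold, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw output-layer scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_flow(logits, families, threshold=0.9):
    """Return (label, confidence); label is "unknown" below the threshold.

    Assumption: thresholding the softmax output stands in for the thesis's
    activation function mechanism, whose details the abstract does not give.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return families[best], probs[best]


families = ["benign", "family_a", "family_b"]   # placeholder family list

confident = classify_flow([0.2, 6.0, 0.1], families)
uncertain = classify_flow([1.0, 1.2, 0.9], families)

# Samples flagged as unknown could be queued for the next model update.
retrain_queue = [name for name, (label, _) in
                 {"flow_1": confident, "flow_2": uncertain}.items()
                 if label == "unknown"]
print(confident[0], uncertain[0], retrain_queue)
```

Under this assumption the threshold trades false alarms against missed new families, so it would need tuning on held-out data.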

    Abstract (Chinese)
    Abstract (English)
    Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Motivation
        1.2 Contributions
        1.3 Organization
    Chapter 2 Background Knowledge
        2.1 Malware Concept
        2.2 Malware Detection Techniques
        2.3 Artificial Intelligence
    Chapter 3 Proposed Method
        3.1 System Overview
        3.2 Data Processing Layer
        3.3 Data Training Layer
        3.4 Data Testing Layer
    Chapter 4
        4.1 System Environment
        4.2 Performance Analysis
        4.3 Comparison with Different Researches
        4.4 Summary
    Chapter 5 Conclusion and Future Work
        5.1 Conclusion
        5.2 Future Work
    References

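    Full-text release date: 2021/08/06 (campus network)
    Full-text release date: 2024/08/06 (off-campus network)
    Full-text release date: 2024/08/06 (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)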