Author: Shun-Che Ma (馬順哲)
Thesis title: Detecting IoT Malware in Packed Samples with Ambiguous Opcode (檢測含有模糊操作碼的加殼樣本中物聯網惡意軟體)
Advisor: Shin-Ming Cheng (鄭欣明)
Oral defense committee: Yuh-Jye Lee (李育杰), Shao-Jui Wang (王紹睿)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: English
Pages: 48
Keywords (Chinese): 物聯網惡意軟體, 加殼, 操作碼, 靜態分析, 機器學習
Keywords: IoT malware, packing, opcode, static analysis, machine learning
Abstract:
    Malware targeting Internet of Things (IoT) devices is a growing cybersecurity concern. Attackers pack their samples to evade malware detectors, which also makes the samples difficult for reverse engineering tools to analyze. Even when these tools manage to unpack a sample, the result is not always accurate: reverse engineering can yield failed unpacked samples that contain ambiguous opcodes (AOs). AOs neither belong to the sample itself nor represent its behavior, and they degrade the effectiveness of machine learning-based malware detection. To address this issue, we propose a novel method that combines n-grams with Recursive Feature Elimination (RFE) to automatically exclude AOs from failed unpacked samples. n-grams preserve the continuity between opcodes, while RFE iteratively computes feature importance during training and keeps the best features, eliminating irrelevant ones. Our experiments show that excluding AO information improves the detection accuracy on failed unpacked samples from 85.8% to 86.4%. This thesis highlights the challenges that packing poses in the IoT field and offers a way to improve detection accuracy on failed unpacked samples. We believe these results can contribute to the development of effective malware detection methods that enhance the security of IoT devices.
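    The pipeline described in the abstract (opcode n-grams, a TF-IDF representation, then RFE) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the thesis's implementation: the function names, the toy opcode sequences, the plain tf * log(N / df) weighting, and the small logistic regression used as a stand-in estimator for ranking features. The abstract does not say which estimator drives RFE.

```python
import math
from collections import Counter

def opcode_ngrams(opcodes, n=2):
    """Return contiguous opcode n-grams so that instruction order is preserved."""
    return [" ".join(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]

def tfidf_vectors(samples, n=2):
    """Weight each sample's n-grams by TF-IDF (plain tf * log(N / df) variant)."""
    counts = [Counter(opcode_ngrams(s, n)) for s in samples]
    doc_freq = Counter(gram for c in counts for gram in c)  # samples containing each gram
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1
        vectors.append({g: (k / total) * math.log(len(samples) / doc_freq[g])
                        for g, k in c.items()})
    return vectors

def fit_weights(X, y, features, epochs=200, lr=0.5):
    """Fit a tiny logistic regression by stochastic gradient descent (stand-in estimator)."""
    w = {f: 0.0 for f in features}
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            z = b + sum(w[f] * x.get(f, 0.0) for f in features)
            err = 1.0 / (1.0 + math.exp(-z)) - label
            b -= lr * err
            for f in features:
                w[f] -= lr * err * x.get(f, 0.0)
    return w

def rfe(X, y, keep):
    """Refit the model and drop the least important feature until `keep` remain."""
    features = sorted({f for x in X for f in x})
    while len(features) > keep:
        w = fit_weights(X, y, features)
        weakest = min(features, key=lambda f: abs(w[f]))
        features.remove(weakest)
    return features

# Hypothetical toy data: "nop nop" occurs in every sample, so it carries no
# class information and plays the role of an uninformative (ambiguous) n-gram.
samples = [
    ["nop", "nop", "xor", "jmp"],  # labeled malware
    ["xor", "jmp", "nop", "nop"],  # labeled malware
    ["nop", "nop", "add", "ret"],  # labeled benign
    ["add", "ret", "nop", "nop"],  # labeled benign
]
labels = [1, 1, 0, 0]

vectors = tfidf_vectors(samples)
selected = rfe(vectors, labels, keep=4)
print(selected)
```

    In this toy setup the bigram shared by all four samples receives a TF-IDF weight of zero, so its model weight never moves and it is the first feature eliminated. Real AOs would not necessarily be this easy to separate; the sketch only shows the refit-and-eliminate loop that distinguishes RFE from a one-shot feature ranking.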

Table of contents:
    Abstract (Chinese)
    Abstract
    Acknowledgments
    1 Introduction
        1.1 Motivation
        1.2 Challenges and Goals
        1.3 Contributions
        1.4 Outline of the Thesis
    2 Background and Related Work
        2.1 IoT Malware
        2.2 Packing Tools
        2.3 Previous Work on Packed Samples Analysis
            2.3.1 Static analysis
            2.3.2 Static and dynamic analysis
            2.3.3 Other methods
    3 Proposed Method
        3.1 Reverse Engineering on Packed Samples
        3.2 Ambiguous Opcodes
        3.3 Feature Extraction
            3.3.1 n-Gram features
            3.3.2 TF-IDF representation
        3.4 Feature Selection by RFE
        3.5 Classification Methods
    4 Experiments
        4.1 Dataset
        4.2 Evaluation Metrics
            4.2.1 Recall
            4.2.2 Precision
            4.2.3 False Positive Rate (FPR)
        4.3 Numerical Results
            4.3.1 Impact of AOs on Malware Detection Accuracy
            4.3.2 Feature Selection Using RFE
            4.3.3 Detection Accuracy against Clear Opcode Rate
    5 Discussion and Future Work
        5.1 Discussion
        5.2 Future Work
    6 Conclusions

Full text release date: 2025/08/16 (campus network, off-campus network, and National Central Library: Taiwan theses and dissertations system)