
Author: 蔡尚洲
Shang-Chou Tsai
Thesis Title: 一個透過重新排序系統階層執行流程的新穎物聯網惡意軟體分類器
A Novel IoT Malware Classifier Based on Reordering System-Level Execution Flow
Advisor: 鄭欣明
Shin-Ming Cheng
Committee: 黃俊穎
Chun-Ying Huang
蕭舜文
Shun-Wen Hsiao
陳嘉玫
Chia-Mei Chen
Degree: 碩士
Master
Department: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
Thesis Publication Year: 2022
Graduation Academic Year: 110
Language: Chinese
Pages: 46
Keywords (in Chinese): 物聯網惡意軟體, 系統呼叫串列, 機器學習, 動態分析
Keywords (in other languages): IoT malware, System call sequence, Machine learning, Dynamic analysis

科技進步迅速的現代社會,人們對設備的效能和整合能力需求逐年增加。
終端設備和物聯網技術逐漸受到重視。但個人電腦和聯網設備的架構差異
過大,導致防毒軟體無法直接套用在物聯網架構上。由於上述原因,物聯
網設備在公開網路中存在許多資訊安全漏洞。為此必須分析和找出物聯網
運行時的安全漏洞。在動態分析中,從沙箱提取的惡意軟體執行流程可以
直接監測到惡意軟體的攻擊行為。雖然動態分析可以忽略惡意軟體在二進
制階層的混淆技術,但惡意軟體透過創建多個程序完成惡意攻擊,程序的
交錯執行掩蓋了惡意行為同時增加了特徵轉換後的雜訊,這些因素提高了
分析的難度。本文中,提出一個新的動態分析的分類框架,透過重新排序
執行流程也就是系統呼叫名稱序列。重新排序分成切割和聚合兩部分。切
割可以有效降低特徵轉換後的雜訊,聚合可以提高特徵向量對於惡意行為
描述的完整性連帶提高模型準確度。我們使用 84K 筆惡意軟體的資料集
進行實驗,以驗證所提出方法的有效性。結果顯示惡意軟體分類的準確率
在隨機森林上達到 98.7%,模型驗證時間和準確度都優於過往基於系統呼
叫名稱序列的分類方法。


In a modern society of rapid technological progress, demand for device
performance and integration grows every year, and terminal devices and
Internet of Things (IoT) technologies are receiving increasing attention.
However, the architectures of personal computers and IoT devices differ so
much that anti-virus software cannot be applied directly to IoT
architectures. As a result, IoT devices on open networks have many
information security vulnerabilities, and it is necessary to analyze and find
the vulnerabilities that arise while IoT devices are running. In dynamic
analysis, the malware execution flow extracted from a sandbox allows the
attack behavior of the malware to be observed directly. Although dynamic
analysis is unaffected by obfuscation techniques at the binary level, malware
often creates multiple processes to carry out an attack. The interleaved
execution of these processes masks the malicious behavior and increases the
noise after feature transformation, which makes the analysis more difficult.
In this thesis, we propose a new classification framework for dynamic
analysis that reorders the execution flow, i.e., the system call name
sequence. Reordering has two parts: splitting and fusion. Splitting
effectively reduces the noise after feature transformation. Fusion makes the
feature vector describe the malicious behavior more completely, which in turn
improves model accuracy. We conduct experiments on a data set of 84K malware
samples to verify the effectiveness of the proposed method. The results show
that malware classification accuracy reaches 98.7% with Random Forest, and
that both the model verification time and the accuracy are better than those
of previous classification methods based on system call name sequences.
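
The reordering idea in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the abstract does not say how splitting and fusion are carried out, so the sketch assumes that splitting groups system call names by process ID and that fusion concatenates the per-process sequences in order of first appearance. The trace format (a PID followed by the call, similar to what strace writes to a file when following child processes), the function names, and the bigram feature are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Assumed line format: "<pid> <syscall>(<args>) = <ret>".
LINE_RE = re.compile(r"^(\d+)\s+([a-z_][a-z0-9_]*)\(")


def split_by_process(trace_lines):
    """Split (assumed): group system call names by PID, keeping per-process order."""
    per_pid = {}
    for line in trace_lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that are not call entries (signals, exit notices)
        pid, name = m.group(1), m.group(2)
        per_pid.setdefault(pid, []).append(name)
    return per_pid


def fuse(per_pid):
    """Fusion (assumed): concatenate per-process sequences by first appearance."""
    fused = []
    for names in per_pid.values():  # dicts preserve insertion order
        fused.extend(names)
    return fused


def ngram_counts(names, n=2):
    """Count n-grams of system call names as a simple feature representation."""
    return Counter(tuple(names[i:i + n]) for i in range(len(names) - n + 1))


# Two processes whose calls are interleaved in the raw trace (made-up data).
trace = [
    '100 socket(AF_INET, SOCK_STREAM, 0) = 3',
    '101 open("/proc/net/route", O_RDONLY) = 4',
    '100 connect(3, {sa_family=AF_INET}, 16) = 0',
    '101 read(4, "", 128) = 0',
    '100 write(3, "x", 1) = 1',
    '101 close(4) = 0',
]

interleaved = [LINE_RE.match(line).group(2) for line in trace]
reordered = fuse(split_by_process(trace))
print(interleaved)
print(reordered)
```

In the raw order, the bigram (socket, connect) never appears because a call from the other process sits between the two. After reordering, each process's calls are contiguous, so bigrams such as (socket, connect) and (open, read) are recovered. This is one plausible reading of how interleaving adds noise to features derived from the sequence.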

Abstract in Chinese
Abstract in English
Contents
List of Figures
List of Tables
List of Algorithms
1 Introduction
2 Background and Related work
    2.0.1 IoT malware
    2.0.2 IoT malware analysis and related work
3 Methodology
    3.0.1 data collection
    3.0.2 Feature preprocessing
    3.0.3 Split
    3.0.4 Fusion
    3.0.5 Classifier Training
4 Experiments
    4.0.1 Data set
    4.0.2 Parameter tuning
    4.0.3 Evaluation metrics
    4.0.4 Performance evaluation
5 Discussions
    5.0.1 Compare to related articles
    5.0.2 Limitations and Future Work
6 Conclusions
References
