
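Graduate student: 詹家祐 (Jia-You Jhan)
Thesis title: 基於基本塊嵌入之物聯網惡意程式辨認 (Basic Block-Based Embedding for IoT Malware Identification)
Advisors: 李漢銘 (Hahn-Ming Lee), 鄭欣明 (Shin-Ming Cheng)
Oral examination committee: 黃俊穎 (Chun-Ying Huang), 蕭旭君 (Hsu-Chun Hsiao), 毛敬豪 (Ching-Hao Mao)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: Chinese
Number of pages: 49
Keywords (Chinese): 惡意程式檢測 (malware detection), 靜態檢測 (static detection), 語意分析 (semantic analysis)
Keywords (English): malware detection, static analysis, semantic analysis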

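In recent years, cyberattacks targeting the Internet of Things (IoT) have kept increasing and pose many threats to IoT applications, so IoT security has drawn growing attention. Because IoT devices are resource-constrained, recent research has concentrated on static detection methods. Although these methods achieve good accuracy, they still have a number of problems. Moreover, existing IoT malware detection methods do not consider the semantic relationships within the malware itself. Building on a previous research method, this thesis therefore proposes an efficient malware identification method based on semantic analysis. We run programs in the Qiling emulation engine to obtain the instruction context of their correct execution, and then convert the collected instructions into basic blocks. Treating each basic block of a program as a sentence, we apply a natural-language sentence embedding method to convert basic blocks into vectors. By learning the contextual relationships between basic blocks, the model learns the semantic behavior of the malware and expresses it as a vector representation. While maintaining the same accuracy, our method resolves the problems encountered by the original method and substantially reduces the time cost at test time.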
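The step of turning executed instructions into basic blocks that serve as "sentences" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the emulator output is assumed to be a list of `(address, mnemonic, operands)` tuples (a hypothetical format, not Qiling's actual API), a block is assumed to end after a control-transfer instruction, the mnemonic set `CONTROL_TRANSFER` is illustrative rather than exhaustive, and replacing immediate constants with an `IMM` token is an assumed normalization step.

```python
import re
from typing import List, Tuple

# Hypothetical trace format: one (address, mnemonic, operands) tuple per executed instruction.
Instr = Tuple[int, str, str]

# Illustrative, non-exhaustive control-transfer mnemonics (mixed ARM, MIPS, x86).
CONTROL_TRANSFER = {"b", "bl", "bx", "blx", "beq", "bne", "j", "jal", "jr",
                    "jmp", "call", "ret"}

# Immediate constants such as "#4", "0x2000" or "-8" (optionally prefixed with "#").
IMM_RE = re.compile(r"^#?(0x[0-9a-fA-F]+|-?\d+)$")

# Split operands on commas that are not inside a [...] memory operand.
OPERAND_SPLIT_RE = re.compile(r",\s*(?![^\[]*\])")


def normalize_operand(op: str) -> str:
    """Replace immediate constants with the placeholder token "IMM" (assumed normalization)."""
    op = op.strip()
    return "IMM" if IMM_RE.match(op) else op


def to_tokens(mnemonic: str, operands: str) -> List[str]:
    """Turn one instruction into 'words': the mnemonic followed by its operands."""
    ops = [normalize_operand(o) for o in OPERAND_SPLIT_RE.split(operands) if o.strip()]
    return [mnemonic.lower()] + ops


def split_basic_blocks(trace: List[Instr]) -> List[List[str]]:
    """Group an execution trace into basic blocks, each returned as a token 'sentence'.
    A block is closed after any control-transfer instruction or at the end of the trace.
    Branch delay slots (e.g. on MIPS) are ignored in this simplified sketch."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for _addr, mnemonic, operands in trace:
        current.extend(to_tokens(mnemonic, operands))
        if mnemonic.lower() in CONTROL_TRANSFER:
            blocks.append(current)
            current = []
    if current:  # trailing instructions without a final branch
        blocks.append(current)
    return blocks


# Toy ARM-style trace (made-up addresses and instructions).
example_trace: List[Instr] = [
    (0x1000, "mov", "r0, #0x1"),
    (0x1004, "add", "r1, r0, #4"),
    (0x1008, "bl", "0x2000"),
    (0x2000, "ldr", "r2, [r1, #4]"),
    (0x2004, "bx", "lr"),
]

for sentence in split_basic_blocks(example_trace):
    print(" ".join(sentence))
```

On this toy trace the sketch prints two block sentences, `mov r0 IMM add r1 r0 IMM bl IMM` and `ldr r2 [r1, #4] bx lr`, which are the units a sentence embedding method would then consume.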


In recent years, IoT devices have become a main target of attackers because their security is insufficient. To counter IoT malware, much research has turned to IoT malware detection, but existing methods have limitations; in particular, they do not consider the semantic relationships within the malware. This thesis therefore proposes an efficient malware identification method based on semantic analysis, extending previous work. In the training stage, we use the Qiling engine to obtain the correct instruction context, and then group the instructions into basic blocks. Treating each basic block of a file as a sentence, we use sentence embedding methods to convert the blocks into vectors and learn the semantic relationships of the malware at the block level. Our method solves the problems of the previous work while achieving high accuracy, and it substantially reduces the time required in the testing stage.
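Once each basic block is a sentence of tokens, it can be mapped to a vector. The sketch below assumes the simplest sentence embedding, averaging word vectors, and uses hash-derived placeholder vectors in `token_vector` instead of trained ones. The file-level mean in `file_embedding` and the nearest-centroid rule in `identify` are likewise hypothetical stand-ins chosen for illustration, not the model described in the thesis; the labels `family_a` and `family_b` are made up.

```python
import hashlib
import math
from typing import Dict, List

DIM = 16  # illustrative embedding size (assumption); must be <= 32 for a SHA-256 digest


def token_vector(token: str, dim: int = DIM) -> List[float]:
    """Placeholder word vector: deterministic values in [-1, 1] derived from a
    SHA-256 digest. A real system would use learned vectors instead."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [digest[i] / 127.5 - 1.0 for i in range(dim)]


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def block_embedding(block: List[str]) -> List[float]:
    """Sentence embedding of one basic block by averaging its token vectors."""
    return mean_vector([token_vector(t) for t in block])


def file_embedding(blocks: List[List[str]]) -> List[float]:
    """Hypothetical file-level vector: the mean of its block vectors."""
    return mean_vector([block_embedding(b) for b in blocks])


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(blocks: List[List[str]], centroids: Dict[str, List[float]]) -> str:
    """Hypothetical nearest-centroid rule: pick the label whose centroid has the
    highest cosine similarity to the file vector."""
    v = file_embedding(blocks)
    return max(centroids, key=lambda label: cosine(v, centroids[label]))


# Toy usage with made-up block sentences and labels.
sample_a = [["mov", "r0", "IMM", "bl", "IMM"], ["ldr", "r2", "[r1]", "bx", "lr"]]
sample_b = [["jal", "IMM"], ["lw", "$a0", "IMM($sp)", "jr", "$ra"]]
centroids = {"family_a": file_embedding(sample_a), "family_b": file_embedding(sample_b)}
print(identify(sample_a, centroids), identify(sample_b, centroids))
```

Each toy sample is assigned to its own centroid. The thesis compares several sentence embedding methods (Section 4.2.2); averaging is used here only because it needs no training, and note that it ignores token order within a block.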

Chinese Abstract
Abstract
Acknowledgements
1 Introduction
   1.1 Motivation
   1.2 Challenges and Goals
   1.3 Contributions
   1.4 Outline of the Thesis
2 Background and Related Work
   2.1 IoT malware detection
   2.2 Static methods of IoT malware detection
      2.2.1 Graph-based
      2.2.2 Non-graph-based
   2.3 Dynamic methods of IoT malware detection
   2.4 Embedding methods
   2.5 Asm2vec model
3 Basic Block Based Sentence Embedding for IoT Malware
   3.1 Data Preprocessing
      3.1.1 The data preprocessing in training stage
      3.1.2 The data preprocessing in testing stage
   3.2 Sentence embedding
   3.3 Model Description
4 Experiments and Analysis
   4.1 Experiment Dataset
   4.2 Experimental Verification
      4.2.1 Instruction level or Block level
      4.2.2 Sentence embedding
   4.3 Experiment Result
      4.3.1 The experiment of Mirai samples
      4.3.2 The experiment of Bashlite samples
   4.4 Compared with Previous Work
5 Conclusions & Further Work
   5.1 Conclusions
   5.2 Limitations and Further Work
