
Author: 詹家祐 (Jia-You Jhan)
Thesis title: Basic Block-Based Embedding for IoT Malware Identification (基於基本塊嵌入之物聯網惡意程式辨認)
Advisors: 李漢銘 (Hahn-Ming Lee), 鄭欣明 (Shin-Ming Cheng)
Oral defense committee: 黃俊穎 (Chun-Ying Huang), 蕭旭君 (Hsu-Chun Hsiao), 毛敬豪 (Ching-Hao Mao)
Degree: Master (碩士)
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2021
Academic year of graduation: 109
Language: Chinese
Number of pages: 49
Keywords (Chinese): 惡意程式檢測, 靜態檢測, 語意分析
Keywords (English): malware detection, static analysis, semantic analysis
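  • In recent years, cyberattacks targeting the Internet of Things (IoT) have
    increased steadily and pose many threats to IoT applications, so IoT security
    has drawn growing attention. Because IoT devices are resource-constrained,
    recent research has focused on static detection methods. Although these
    methods achieve good accuracy, they still have many problems. In addition,
    existing IoT malware detection methods do not consider the semantic
    relationships within the malware itself. Building on previous work, this
    thesis therefore proposes an efficient malware identification method based on
    semantic analysis. We use the Qiling emulation engine to obtain the
    instruction context of a program as it is correctly executed, and then
    convert the obtained instructions into basic blocks. We treat each basic block
    of a program as a sentence and use sentence embedding methods from natural
    language processing to convert the basic blocks into vectors. By learning the
    contextual relationships between basic blocks, the model learns the semantic
    behavior of malware and expresses it as a vector representation. While
    retaining the same accuracy, our method resolves the problems encountered by
    the original method and greatly reduces the time cost at test time.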


    In recent years, IoT devices have become a main target of attackers because
    their security is insufficient. To defend against IoT malware, many
    researchers have begun to focus on IoT malware detection. However, existing
    methods have limitations; in particular, they do not consider the semantic
    relationships within the malware. This thesis therefore proposes an efficient
    malware identification method based on semantic analysis, extending previous
    work. In the training stage, we use the Qiling engine to obtain the correct
    instruction context. We then group the instructions into basic blocks, treat
    each basic block of a file as a sentence, and use sentence embedding methods
    to convert the blocks into vectors. Learning at the block level captures the
    semantic relationships of the malware. Our method resolves the problems of the
    previous work while achieving high accuracy, and it substantially reduces the
    time required in the testing stage.
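
The block-as-sentence idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the instruction trace, the set of branch mnemonics, and every function name below are hypothetical, the trace is hard-coded rather than produced by Qiling, and a plain token-count vector stands in for the learned sentence embedding. Blocks are closed only after a branch instruction, which is a simplification of the usual basic-block definition.

```python
from collections import Counter

# Assumption: a small, hypothetical set of ARM-style control-transfer mnemonics.
BRANCH_MNEMONICS = {"b", "bl", "bx", "beq", "bne"}


def split_into_blocks(trace):
    """Close a block after every branch instruction in a linear trace."""
    blocks, current = [], []
    for insn in trace:
        current.append(insn)
        if insn.split()[0].lower() in BRANCH_MNEMONICS:
            blocks.append(current)
            current = []
    if current:  # trailing instructions with no closing branch
        blocks.append(current)
    return blocks


def block_to_sentence(block):
    """Flatten one block into a token list: the block's 'sentence'."""
    return [tok for insn in block for tok in insn.replace(",", " ").split()]


def count_vector(sentence, vocab):
    """Stand-in embedding: token counts over a fixed vocabulary."""
    counts = Counter(sentence)
    return [counts[tok] for tok in vocab]


# Hypothetical executed-instruction trace (made up for illustration).
trace = [
    "mov r0, #1",
    "add r1, r0, #2",
    "bl 0x8000",
    "ldr r2, [r1]",
    "cmp r2, #0",
    "beq 0x8010",
    "mov r0, #0",
]

blocks = split_into_blocks(trace)
sentences = [block_to_sentence(b) for b in blocks]
vocab = sorted({tok for s in sentences for tok in s})
vectors = [count_vector(s, vocab) for s in sentences]

for sentence, vector in zip(sentences, vectors):
    print(" ".join(sentence), "->", vector)
```

In a learned setting, the count vector would be replaced by a trained sentence embedding model, so that blocks appearing in similar contexts receive similar vectors.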

    Chinese Abstract
    Abstract
    Acknowledgements
    1 Introduction
      1.1 Motivation
      1.2 Challenges and Goals
      1.3 Contributions
      1.4 Outline of the Thesis
    2 Background and Related Work
      2.1 IoT malware detection
      2.2 Static methods of IoT malware detection
        2.2.1 Graph-based
        2.2.2 Non-graph-based
      2.3 Dynamic methods of IoT malware detection
      2.4 Embedding methods
      2.5 Asm2vec model
    3 Basic Block Based Sentence Embedding for IoT Malware
      3.1 Data Preprocessing
        3.1.1 The data preprocessing in training stage
        3.1.2 The data preprocessing in testing stage
      3.2 Sentence embedding
      3.3 Model Description
    4 Experiments and Analysis
      4.1 Experiment Dataset
      4.2 Experimental Verification
        4.2.1 Instruction level or Block level
        4.2.2 Sentence embedding
      4.3 Experiment Result
        4.3.1 The experiment of Mirai samples
        4.3.2 The experiment of Bashlite samples
      4.4 Compared with Previous Work
    5 Conclusions & Further Work
      5.1 Conclusions
      5.2 Limitations and Further Work

