Graduate student: 康帷晟 (Wei-Cheng Kang)
Thesis title: Improving Robustness against Detection of IoT Malware Variants Using Optimization-based Data Augmentation
Advisors: 李漢銘 (Hahn-Ming Lee), Shin-Ming Cheng
Oral defense committee: 游家牧 (Chia-Mu Yu), Yi-Ting Huang
Degree: Master's
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar; academic year 2023-2024)
Language: English
Number of pages: 51
Keywords: IoT malware detection, malware variants, data augmentation, robustness, artificial intelligence

    With the rapid rise of Internet of Things (IoT) devices, these devices have become prime targets for malware attacks. Attackers often create new variants by modifying existing malware, producing a proliferation of diverse variants that can evade detection. Moreover, detectors built on different features often return conflicting results for the same sample, which makes accurate malware labeling difficult.
    To improve the robustness of detectors against malware variants, we propose a novel data augmentation method. We formulate augmentation as an optimization problem whose objective combines three carefully designed penalty functions. Rather than imposing explicit constraints, these penalty functions steer the optimization of the augmented data. We solve the problem with the BFGS algorithm, which keeps the generation of augmented data efficient. As a result, the augmented data can simulate malware variants, which is crucial for handling their diversity.
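    The abstract does not spell out the three penalty functions, so the following is only a minimal sketch of the general idea under stated assumptions. It uses a toy logistic-regression detector with hypothetical weights and three illustrative penalties: one that lowers the detector's malware score to mimic an evasive variant, one that keeps the sample close to the original, and one that softly keeps features in [0, 1] in place of a hard box constraint. The names `augment`, `bfgs`, and `lam` are hypothetical and not taken from the thesis. BFGS is written out by hand (inverse-Hessian update with backtracking line search) so the example needs only the Python standard library; `scipy.optimize.minimize(..., method="BFGS")` provides the same algorithm off the shelf.

```python
import math

# Hypothetical sketch: penalty-based augmentation solved with a hand-written BFGS.
# The penalty terms, their weights, and the logistic "detector" are illustrative
# assumptions, not the thesis's actual formulation.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softplus(z):
    # log(1 + e^z), computed without overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def bfgs(f, grad, x0, max_iter=200, tol=1e-8):
    """Minimize f with BFGS (inverse-Hessian form) and Armijo backtracking."""
    n = len(x0)
    x = list(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = grad(x)
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        p = [-dot(H[i], g) for i in range(n)]  # search direction -H g
        t, fx, slope = 1.0, f(x), dot(g, p)
        # Halve the step until the Armijo sufficient-decrease condition holds.
        while f([xi + t * pi for xi, pi in zip(x, p)]) > fx + 1e-4 * t * slope and t > 1e-12:
            t *= 0.5
        x_new = [xi + t * pi for xi, pi in zip(x, p)]
        g_new = grad(x_new)
        s = [a - c for a, c in zip(x_new, x)]
        y = [a - c for a, c in zip(g_new, g)]
        sy = dot(s, y)
        if sy > 1e-12:  # skip the update if the curvature condition fails
            rho = 1.0 / sy
            Hy = [dot(H[i], y) for i in range(n)]
            yHy = dot(y, Hy)
            # H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T, expanded in place.
            for i in range(n):
                for j in range(n):
                    H[i][j] += (-rho * (Hy[i] * s[j] + s[i] * Hy[j])
                                + (rho * rho * yHy + rho) * s[i] * s[j])
        x, g = x_new, g_new
    return x

def augment(x0, w, b, lam=(4.0, 1.0, 10.0)):
    """Return one synthetic variant of malware feature vector x0 (features in [0, 1])."""
    l_evade, l_close, l_range = lam

    def f(x):
        evade = softplus(dot(w, x) + b)  # small when the detector leans "benign"
        close = sum((xi - oi) ** 2 for xi, oi in zip(x, x0))  # stay near the original
        out = sum(max(0.0, xi - 1.0) ** 2 + max(0.0, -xi) ** 2 for xi in x)  # soft [0, 1] bound
        return l_evade * evade + l_close * close + l_range * out

    def grad(x):
        sig = sigmoid(dot(w, x) + b)  # derivative of softplus at w.x + b
        return [l_evade * sig * wi
                + l_close * 2.0 * (xi - oi)
                + l_range * 2.0 * (max(0.0, xi - 1.0) - max(0.0, -xi))
                for xi, oi, wi in zip(x, x0, w)]

    return bfgs(f, grad, x0)

if __name__ == "__main__":
    w, b = [2.0, -1.0, 1.5], -1.0  # toy linear detector (hypothetical weights)
    x0 = [0.9, 0.1, 0.8]           # toy malware feature vector
    x_aug = augment(x0, w, b)
    print("original score:", round(sigmoid(dot(w, x0) + b), 3))
    print("augmented score:", round(sigmoid(dot(w, x_aug) + b), 3))
    print("augmented features:", [round(v, 3) for v in x_aug])
```

    With the toy detector above, the sample's malware score should fall from roughly 0.87 to roughly 0.22 while every feature stays in [0, 1]. Raising the first weight in `lam` trades closeness to the original sample for stronger evasion; this weighting is how soft penalties stand in for explicit constraints.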
    Comparative experiments against existing methods demonstrate the effectiveness of our approach. In static-feature detection tasks, the results show significant improvements in model generalizability and in robustness against variants, while data generation remains efficient. Ablation studies confirm the contribution of each penalty function. In summary, this research provides a new and effective tool for detecting IoT malware variants.

Table of contents:
    Abstract (Chinese)
    Abstract
    Acknowledgements
    1 Introduction
        1.1 Motivation
        1.2 Challenges and Goals
        1.3 Contributions
        1.4 Outline of the Thesis
    2 Background and Related Work
        2.1 IoT Malware Variants
        2.2 IoT Static Malware Detection
        2.3 Data Augmentation Approaches in Malware Detection
        2.4 BFGS Optimization Algorithm
    3 Proposed Method
        3.1 Optimization Process
            3.1.1 To Simulate Misclassified Malware Variants
            3.1.2 Retain the Original Performance
        3.2 Augmented Data Optimization
    4 Experimental Setup
        4.1 Dataset
        4.2 Model Evaluation Metrics
        4.3 Experiments of Hyperparameters
    5 Evaluation
        5.1 Ablation Study
        5.2 Experiment Results
        5.3 Comparison with Other Methods
    6 Limitations & Future Work
        6.1 Limitations
        6.2 Future Work
    7 Conclusion

