
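Graduate student: 林敬堯 (Jing-Yao Lin)
Thesis title: 多視點惡意文件偵測 (Multi-View Malicious Document Detection)
Advisor: 鮑興國 (Hsing-Kuo Pao)
Committee members: 李育杰 (Yuh-Jye Lee), 鄧惟中 (Wei-Chung Teng), 項天瑞 (Tien-Ruey Hsiang)
Degree: Master
College and department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2013
Academic year of graduation: 101
Language: English
Number of pages: 54
Keywords (Chinese, translated): malicious document, multi-view, vulnerability exploitation
Keywords (English): Exploit, Malicious document, Vulnerability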
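  • Malicious documents are among the best-known attacks today. Most malicious documents exhibit behavior the user does not expect and, in the end, cause the user to suffer losses unknowingly. Detecting malicious documents has accordingly taken an important place in modern information security. Malicious documents usually contain specific control codes that cause malicious code to execute. These control codes were originally designed only to enrich the functionality of documents, but they also create vulnerabilities and become the key that triggers an attack.
    In a malicious document, an attacker often relies on habitual wording to build the tool that triggers a vulnerability, and uses particular functional words within it to trigger malicious constant data. We therefore propose analyzing the objects inside a document from three different views: the use of functional words, preference words, and constant data. Whereas most previous research on malicious documents proposed detection methods tied to a particular file format, our method analyzes the objects inside a document from three more general views. We also propose a TF-IDF-like normalization to improve the detection rate for mimicry attacks. In summary, the inputs from the three views are used to classify documents. We evaluate the proposed method through experiments that combine the individual views.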


    Malicious documents are among the most notorious components of modern
    attacks: they may appear normal yet behave strangely or beyond users'
    expectations, and very often they lead to severe consequences. Detecting
    malicious documents is one of the most important tasks in modern
    information security. Malicious documents usually contain specific
    control codes that may cause malicious shellcode to be executed. These
    document control codes were originally designed to enrich the documents'
    functionality, but in this case they may create vulnerabilities and then
    become the key that triggers attacks.
    Different from previous research that focused on detecting malicious
    documents of a particular format, we analyze the document objects from
    three general views: the use of functional words, preference words, and
    constant data. The functional words control how an attack is launched
    and through what actions, if the document is a malicious one; the
    preference words usually suggest the favored choices of words; and the
    constant data can be considered the bullets that complete the attack. We
    also propose a TF-IDF-like method to normalize the features so as to
    handle mimicry attacks. Overall, given the inputs from the three views,
    detection is done via classification. We evaluate the proposed approach
    through a series of experiments that use different combinations of views
    for prediction.
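
The abstract states only that a TF-IDF-like normalization is applied to the features; the exact formula is not given in this record. As a minimal sketch, assuming the standard TF-IDF definition (term frequency multiplied by the log of inverse document frequency), the weighting could look like the following. The function name `tfidf_vectors` and the sample token lists are hypothetical and stand in for words extracted from each document.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document.

    docs is a list of token lists. This uses the textbook TF-IDF
    definition, which is an assumption: the thesis only says its
    normalization is "TF-IDF-like".
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each word occurs at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = sum(counts.values())
        vec = {}
        for word, count in counts.items():
            tf = count / total                  # relative frequency in this document
            idf = math.log(n_docs / df[word])   # zero when the word is in every document
            vec[word] = tf * idf
        vectors.append(vec)
    return vectors


# Hypothetical token lists, one per document.
docs = [
    ["eval", "unescape", "eval", "var"],
    ["var", "function", "var"],
    ["var", "unescape"],
]
weights = tfidf_vectors(docs)
```

In this sketch a word that occurs in every document, such as `var` above, receives weight zero, while words concentrated in few documents are weighted up. One plausible reading is that this limits what an attacker gains by padding a malicious file with commonly used words; that interpretation is an assumption, not a statement from the record.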

    1 Introduction
      1.1 Motivation
      1.2 Proposed Method
      1.3 Outline of the Thesis
    2 Security of Document File Format
      2.1 History of Malicious Document
      2.2 An Example of Attack Procedure
      2.3 Examples of Malicious Code
    3 Related Work on Malicious Document Detection
      3.1 Dynamic Analysis and Static Analysis
        3.1.1 Dynamic Analysis
        3.1.2 Static Analysis
      3.2 Malicious Document Detection with Mimicry Attacks
    4 Methodology
      4.1 Multi-view Features Extraction
        4.1.1 Frequency Feature of Functional Words
        4.1.2 Frequency Feature of Preference Words
        4.1.3 n-gram Entropy Feature of Constant Data
      4.2 Support Vector Machines
      4.3 TF-IDF
    5 Results
      5.1 Dataset
      5.2 Feature Extraction
        5.2.1 Feature Extraction of Functional Words
        5.2.2 Feature Extraction of Preference Words
        5.2.3 Feature Extraction of Constant Data
      5.3 Individual Experiments under SVM
      5.4 Experiments of Features Combination under SVM
      5.5 Experiment for Simulated Dataset of Mimicry Attack
    6 Conclusion

