Graduate student: Ismail AL-Taharwa (塔哈瓦)
Thesis title: Automatic Analysis and Detection of JavaScript Malware (JavaScript 惡意程式自動化分析與偵測)
Advisors: Hahn-Ming Lee (李漢銘), Albert B. Jeng (鄭博仁), Cheng-Seen Ho (何正信), Shyi-Ming Chen (陳錫明)
Oral examination committee: Chyou-Hwa Chen (陳秋華), Feng-Tse Lin (林豐澤)
Degree: Doctor
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 100
Keywords: JavaScript malware, Drive-by download, Obfuscation, Heap-spraying, AST representation, Latent behavior prediction
    The dynamic nature and simplicity of JavaScript have made it a prominent means of developing interactive web content. Today, JavaScript is installed on almost every PC, notebook, and smartphone. These characteristics make JavaScript the preferred means for web attackers to reach a wide pool of victims, and JavaScript malware is becoming a prevalent way to mount and sustain large-scale web attacks. The detection of JavaScript malware has therefore received serious research attention. Signature-based solutions were proposed first. However, once obfuscated JavaScript malware appeared, researchers shifted to characterizing whether a script is malicious. Syntactic characterization gave good results in detecting plain malware and a wide range of its obfuscated variants. Unfortunately, advanced JavaScript malware has been observed to use massive and heavy forms of obfuscation. Given these observations, the two primary obstacles to developing a large-scale, real-time detector for drive-by downloads pull in opposite directions. On the one hand, the prevalence of deceptively (i.e., massively, heavily, and extremely) obfuscated scripts hinders the workability of static detectors. On the other hand, dynamic analysis incurs excessive overhead along with many other limitations.



    ABSTRACT
    ACKNOWLEDGEMENTS
    1 Introduction
      1.1 Motivations
      1.2 Challenges
      1.3 Concept Philosophy
      1.4 Contributions
      1.5 Outline of this Dissertation
    2 Background and Related Work
      2.1 JavaScript Malware
        2.1.1 Basic JavaScript Malware
        2.1.2 Large-Scale Web Attacks
        2.1.3 New Trends of Large-Scale Web Attacks and JavaScript Malware
      2.2 JavaScript Obfuscation
      2.3 Related Work
    3 Automatic Detection of JavaScript Malware
      3.1 Staged Scenario for Detecting JavaScript Malware
      3.2 System Architecture
      3.3 Analysis of JSMalTor’s Resilience Potentials
    4 ODT: Obfuscation Detection Techniques
      4.1 ODT’s Detection Schemas
      4.2 Analysis of ODT’s Statistical Measurements
      4.3 Modeling ODT’s Features
    5 JSOD: JavaScript Obfuscation Detector
      5.1 JSOD Approach
      5.2 Analysis of AST representation
        5.2.1 AST Tree-level and Node-level Definitions
      5.3 Variable Context-Level Feature Extractor (VCLFE)
        5.3.1 Context Locator
      5.4 Vector Space Representations for CbFs
        5.4.1 Illustrative Scenario of JSOD Solution
    6 Drive-by Disclosure: Automatic Detector of Drive-by Downloads Based on Latent Behavior Prediction
      6.1 Prediction of Script’s Latent Behaviors
        6.1.1 Context-based Features (CbFs) Extractor
        6.1.2 Actual Context Generator
        6.1.3 Sanitization
    7 Experiments
      7.1 Detection of JavaScript Encoding
      7.2 Detection of JavaScript Obfuscation
        7.2.1 JSOD’s Experimental Setups
        7.2.2 Evaluation of JSOD solution
      7.3 Characterization of Scripting Practice and Prediction of Latent Behaviors
        7.3.1 Effectiveness Analysis
        7.3.2 Comparison with Other Techniques
        7.3.3 Efficiency Analysis
    8 Conclusion and Further Work
      8.1 Conclusion
      8.2 Limitation and Further Work

