
Graduate student: 魏俐嘉
Li-Jia Wei
Thesis title: 基於擷取關鍵字與入侵指標元素區分情報文章與技術文章
Distinguishing between Intelligence Articles and Technical Articles based on Extracted Keywords and IOC Elements
Advisor: 李漢銘
Hahn-Ming Lee
Oral defense committee: 李漢銘
Hahn-Ming Lee
沈金祥
Jin-Shiang Shen
林豐澤
Feng-Tze Lin
鄭欣明
Shin-Ming Jeng
鄭博仁
Bo-Ren Jeng
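Degree: Master's
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar; 2016-2017)
Language: English
Number of pages: 59
Keywords (Chinese): 情報 (intelligence), 資訊安全 (information security), 偵測 (detection), 部落格 (blog), 網路威脅情報 (cyber threat intelligence), 入侵指標 (indicators of compromise)
Keywords (English): intelligence, cyber, detect, blog, CTI, IOC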
  • Network technology is increasingly mature and provides users with many convenient and diversified services, such as online banking and information retrieval. This convenience has also brought security issues, so enterprises and scholars now pay growing attention to cyber threat intelligence that can help solve security problems. Previous work found that although intelligence providers sell cyber threat intelligence as a commercial product, usable intelligence can still be mined from public sources such as Twitter and blogs. In this thesis, our goal is to reduce the high time cost of gathering intelligence manually.

    We propose an algorithm that combines the detection of intelligence elements with profiling of article content for intelligence detection. Using this algorithm, we developed a system that automatically detects whether a technical article carries intelligence. The algorithm checks whether an article contains intelligence elements and uses natural language processing to transform the original human-readable article into an article structure model, which lets the system profile the outline of the whole article. A random forest classifier is then trained to automatically detect intelligence articles among technical articles.

    The results show that both precision and recall are better than those of previous research. This thesis makes the following contributions: (1) It designs a system for automated detection of intelligence articles that achieves a precision of 90% and a recall of 94.7%. (2) It proposes an article structure model that describes the profile of an article in different ways. (3) It uses natural language processing to address the problem of finding intelligence in technical articles in the field of information security.
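
The intelligence-element detection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which indicator types or matching rules the thesis uses, so the pattern set `IOC_PATTERNS` and the helper names `count_ioc_elements` and `ioc_feature_vector` below are hypothetical, chosen only to show how IOC counts could become numeric features for a classifier.

```python
import re

# Hypothetical IOC patterns: the abstract does not list the indicator types
# or expressions actually used in the thesis.
IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    # Also accepts the "hxxp" defanged form often seen in security write-ups.
    "url": re.compile(r"\bh(?:tt|xx)ps?://\S+", re.IGNORECASE),
}


def count_ioc_elements(text):
    """Return the number of matches of each hypothetical IOC pattern in text."""
    return {name: len(pattern.findall(text)) for name, pattern in IOC_PATTERNS.items()}


def ioc_feature_vector(text):
    """Flatten the counts into a fixed-order list (sorted by pattern name)."""
    counts = count_ioc_elements(text)
    return [counts[name] for name in sorted(IOC_PATTERNS)]


if __name__ == "__main__":
    sample = (
        "The dropper at 192.168.10.5 fetches http://example.com/payload.bin "
        "(MD5 d41d8cd98f00b204e9800998ecf8427e), exploiting CVE-2017-0144."
    )
    print(count_ioc_elements(sample))
```

A vector like this would be only one part of the input; the abstract also describes an article structure model built with natural language processing, which this sketch does not attempt to reproduce.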

    -‡X i ABSTRACT iii Œ v 1 Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2 Challenges and Goals . . . . . . . . . . . . . . . . . . . 8 1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . 9 1.4 The Outline of Thesis . . . . . . . . . . . . . . . . . . . 10 2 Background 11 2.1 Introduction of Cyber Threat Intelligence . . . . . . . . 11 2.2 Related Works . . . . . . . . . . . . . . . . . . . . . . . 14 3 System Description 19 3.1 Approach . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.2 System Structure . . . . . . . . . . . . . . . . . . . . . 22 3.2.1 IOC Article Recognizer . . . . . . . . . . . . . 25 3.2.2 Security-related Terms Picker . . . . . . . . . . 28 3.2.3 Structure Vector Generator . . . . . . . . . . . . 32 3.2.4 Similarity-based Profiles Generator . . . . . . . 33 3.2.5 Intelligence Articles Detector . . . . . . . . . . 36 4 Experiments and Results 38 4.1 Experiments Design and Dataset . . . . . . . . . . . . . 39 4.1.1 Experiment Concept and Description . . . . . . 39 4.1.2 Dataset . . . . . . . . . . . . . . . . . . . . . . 40 4.2 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . 41 4.3 The Result of the Experiments . . . . . . . . . . . . . . 43 4.3.1 Baseline . . . . . . . . . . . . . . . . . . . . . . 43 4.3.2 Baseline with Security-related Terms Picker . . . 44 4.3.3 Baseline with IOC Article Recognizer . . . . . . 45 4.3.4 Baseline with Security-related Terms Picker and IOC Article Recognizer . . . . . . . . . . . . . 46 4.4 Discussion of Experiment Results . . . . . . . . . . . . 47 4.4.1 Experiment Results of Comparison . . . . . . . 47 4.4.2 Case Studies . . . . . . . . . . . . . . . . . . . 50 5 Conclusions and FurtherWorks 52 5.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . 52 5.2 Further Work . . . . . . . . . . . . . . . . . . . . . . . 53

