
Graduate student: 劉恩賜 (En-Sih Liou)
Thesis title: Interleaving-Oriented Correlation Finding for Web Mimicry Attacks based on Conditional Random Fields (Chinese title: 應用條件隨機域進行插入導向前後關聯分析以辨識網站擬態攻擊)
Advisor: 李漢銘 (Hahn-Ming Lee)
Oral examination committee: 鄭博仁 (Po-Ren Jeng), 王榮英 (Jung-Ying Wang), 陳志銘 (Chih-Ming Chen), 鮑興國 (Hsing-Kuo Pao)
Degree: Master's
College and department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
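Year of publication: 2009
Academic year of graduation: 97 (ROC calendar; the 2008-2009 academic year)
Language: English
Number of pages: 63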
Keywords: Web Mimicry Attack, Conditional Random Fields
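  • Web mimicry attacks evade anomaly-detection-based web application intrusion detection systems by inserting strings that are meaningless and do not alter the effect of the original attack. Building on the discovery of interleaving-oriented contextual correlations, this thesis proposes a system for detecting and defending against web mimicry attacks, the Interleaving-Oriented Correlation Finding System (IOCFS). IOCF first segments HTTP request data and converts it into token sequences, then uses Conditional Random Fields (CRFs) to model the correlations between tokens and build a model for detecting web mimicry attacks. CRFs are widely applied to the sequence labeling problem and are therefore good at capturing strong dependencies between different tokens in a token sequence. Compared with methods previously used for sequence labeling (for example, hidden Markov models), CRFs remove strict independence assumptions and can model long-term dependencies in token sequences, which strengthens the detection of web mimicry attacks. The proposed method detects web mimicry attacks by analyzing only HTTP request data, so it can easily be mounted on existing web intrusion detection systems to strengthen detection of and defense against these attacks. The experiments validate the proposed method on the public datasets released for the ECML/PKDD 2007 Analyzing Web Traffic Challenge. The experimental results show that the proposed system effectively detects both web mimicry attacks and general web attacks, even when large numbers of normal characters are inserted in an attempt to mislead the intrusion detection system.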
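The first step of IOCF, segmenting an HTTP request into a token sequence, can be illustrated with a short Python sketch. It is not the thesis's extractor: the token pattern, the repeated URL decoding, the replacement of `/*...*/` comments with a space, and the `<NUM>` placeholder for digit runs are all assumptions, chosen only to loosely echo the decoding, redundancy-removal, and generalization stages named in the thesis outline. The helper names `decode` and `tokenize_request` are hypothetical.

```python
import re
from urllib.parse import unquote_plus

# Assumed token pattern: a run of letters/digits/underscores, or any single
# non-whitespace symbol (quotes, '=', '(', ';', ...), since symbols often carry attack meaning.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+|[^\sA-Za-z0-9_]")

# Assumption: inline SQL comments such as /**/ are treated as redundant filler
# and replaced by a single space.
SQL_COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def decode(text: str, max_rounds: int = 3) -> str:
    """URL-decode repeatedly (up to max_rounds) to undo multi-layer percent-encoding."""
    for _ in range(max_rounds):
        decoded = unquote_plus(text)
        if decoded == text:
            break
        text = decoded
    return text


def tokenize_request(query: str) -> list[str]:
    """Turn a raw query string into a normalized token sequence (hypothetical scheme)."""
    text = decode(query).lower()
    text = SQL_COMMENT_RE.sub(" ", text)
    tokens = []
    for tok in TOKEN_RE.findall(text):
        # Generalize highly variable values: every run of digits becomes one placeholder.
        tokens.append("<NUM>" if tok.isdigit() else tok)
    return tokens


if __name__ == "__main__":
    print(tokenize_request("id=1%27%20OR%20%271%27%3D%271"))
```

With this scheme, `tokenize_request("id=1%27%20OR%20%271%27%3D%271")` yields `['id', '=', '<NUM>', "'", 'or', "'", '<NUM>', "'", '=', "'", '<NUM>']`, and the comment-padded variant `id=1'/**/or/**/'1'='1` produces the same sequence, so in this example the inserted filler does not change what a downstream model sees.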


    Web mimicry attacks evade anomaly-based web application intrusion detection by inserting meaningless or irrelevant characters into an attack. In this study, we propose Interleaving-Oriented Correlation Finding (IOCF) to counter web mimicry attacks. IOCF segments HTTP requests into token sequences and models the correlations among tokens with Conditional Random Fields (CRFs) in order to identify web mimicry attacks. CRFs are widely used for sequence labeling problems and are therefore well suited to capturing strong dependencies among different tokens in a token sequence. Because CRFs relax the strong independence assumptions of earlier probabilistic sequence analysis methods (e.g., hidden Markov models, HMMs), they can capture long-term dependencies among observed token sequences, which improves the detection of web mimicry attacks. The proposed method needs to inspect only HTTP requests, so it is easy to plug into existing intrusion detection systems to identify subtle web attacks. For evaluation, we use the datasets from the "ECML/PKDD 2007 Analyzing Web Traffic Challenge", public datasets for web application attack detection. The experimental results show that the proposed system performs well in detecting both web mimicry attacks and general web application attacks, even in heavily interleaved cases.

    1 Introduction
      1.1 Motivation
      1.2 Challenges
      1.3 Goals
      1.4 Contributions
      1.5 Outlines of the Thesis
    2 Background
      2.1 Web Mimicry Attacks
      2.2 Related Work
      2.3 Sequential Analysis Models
    3 System Architecture
      3.1 Token Sequence Extractor
        3.1.1 High Variability Generalizer
        3.1.2 Obfuscated Characters Decoder
        3.1.3 Redundant Characters Remover
        3.1.4 Token Sequence Generator
      3.2 CRFs-based Token Correlator
        3.2.1 Feature Template Configurer
        3.2.2 CRFs Model Constructor
      3.3 Web Mimicry Attacks Detector
        3.3.1 CRFs-based Token Sequence Labeler
        3.3.2 Mimicry Attacks Identifier
      3.4 Summary
    4 Experiments
      4.1 Datasets
      4.2 Performance Measurements and Tools
      4.3 Experimental Results
        4.3.1 Effectiveness Analysis in Detection of Web Mimicry Attacks
        4.3.2 Tolerance Analysis in Interleaving with Different Ratios
        4.3.3 Performance Analysis in General Web Attacks
      4.4 Summary
    5 Conclusion and Further Work
      5.1 Discussions
      5.2 Conclusions
      5.3 Further Work

