Graduate student: | 廖俊雄 (Chun-Hsiung Liao) |
---|---|
Thesis title: | 使用平滑支撐向量機之個人化垃圾郵件過濾系統 (Personalized Spam Mail Filtering by Using SSVM) |
Advisor: | 李育杰 (Yuh-Jye Lee) |
Oral defense committee: | 鄧惟中 (Teng Wei-Chung), 鮑興國 (Hsing-Kuo Pao), 阮聖彰 (Shanq-Jang Ruan) |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
Year of publication: | 2011 |
Academic year of graduation: | 99 (ROC calendar) |
Language: | Chinese |
Number of pages: | 59 |
Chinese keywords: | 垃圾郵件, 黑名單, 白名單, 平滑式支撐向量機 |
Foreign-language keywords: | Spam Mail, Blacklist, Whitelist, Smooth Support Vector Machine |
With the rapid growth of the Internet, spam mail has become a major information-security challenge for businesses and individuals. Beyond traditional commercial spam, phishing attacks, pornographic messages, and malicious programs (viruses) are all spread through spam. Such mail not only consumes large amounts of network resources but also exposes businesses and individuals to the risk of data leakage.
In this study, we build a personalized spam filtering framework. In addition to judging spam against personalized blacklists and whitelists, it provides a user feedback mechanism that lets each user decide whether a message is normal mail or spam. The user's historical mail (both normal mail and spam) is then used to train a Smooth Support Vector Machine (SSVM) classifier, and the resulting classification model is used to classify subsequently arriving mail.
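The framework described in the abstract (list lookup first, then a classifier trained on user-labeled history) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The class and function names, the binary bag-of-words features, the regex tokenizer, and the values of `nu` and `alpha` are all assumptions made for illustration. The objective follows the SSVM formulation of Lee and Mangasarian, which replaces the plus function with the smooth approximation p(x, α) = x + (1/α) log(1 + exp(-αx)). For brevity the sketch minimizes it by gradient descent with Armijo backtracking rather than the Newton-Armijo method of the SSVM paper.

```python
import math
import re


def smooth_plus(x, alpha):
    """Numerically stable p(x, alpha) = x + log(1 + exp(-alpha * x)) / alpha."""
    return max(x, 0.0) + math.log1p(math.exp(-alpha * abs(x))) / alpha


def sigmoid(z):
    """Derivative of smooth_plus with respect to x, evaluated at z = alpha * x."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tokenize(text):
    """Hypothetical tokenizer: the set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def train_ssvm(rows, labels, dim, nu=1.0, alpha=5.0, iters=200):
    """Minimize (nu/2) * sum p(1 - d_i (a_i.w - gamma))^2 + (w.w + gamma^2) / 2.

    rows holds the active feature indices of each message (binary features);
    labels are +1 for spam and -1 for normal mail.
    """
    w, gamma = [0.0] * dim, 0.0

    def objective(w, gamma):
        loss = sum(
            smooth_plus(1.0 - d * (sum(w[j] for j in idx) - gamma), alpha) ** 2
            for idx, d in zip(rows, labels)
        )
        return 0.5 * nu * loss + 0.5 * (sum(v * v for v in w) + gamma * gamma)

    for _ in range(iters):
        grad_w, grad_g = list(w), gamma  # gradient of the regularizer
        for idx, d in zip(rows, labels):
            r = 1.0 - d * (sum(w[j] for j in idx) - gamma)
            c = nu * smooth_plus(r, alpha) * sigmoid(alpha * r)
            for j in idx:
                grad_w[j] -= c * d
            grad_g += c * d
        g2 = sum(v * v for v in grad_w) + grad_g * grad_g
        if g2 < 1e-12:
            break
        f0, step = objective(w, gamma), 1.0
        while step > 1e-10:  # Armijo backtracking line search
            w_new = [wi - step * gi for wi, gi in zip(w, grad_w)]
            g_new = gamma - step * grad_g
            if objective(w_new, g_new) <= f0 - 0.25 * step * g2:
                w, gamma = w_new, g_new
                break
            step *= 0.5
        else:
            break  # no acceptable step found; stop iterating
    return w, gamma


class PersonalFilter:
    """Per-user filter: whitelist, then blacklist, then the learned model."""

    def __init__(self):
        self.whitelist, self.blacklist = set(), set()
        self.history = []  # (tokens, label) pairs collected from user feedback
        self.vocab, self.model = {}, None

    def feedback(self, text, is_spam):
        """Record the user's own decision about one message."""
        self.history.append((tokenize(text), 1 if is_spam else -1))

    def retrain(self):
        """Rebuild the vocabulary and the model from the feedback history."""
        self.vocab = {}
        for tokens, _ in self.history:
            for t in sorted(tokens):
                self.vocab.setdefault(t, len(self.vocab))
        rows = [[self.vocab[t] for t in tokens] for tokens, _ in self.history]
        labels = [d for _, d in self.history]
        self.model = train_ssvm(rows, labels, len(self.vocab))

    def is_spam(self, sender, text):
        if sender in self.whitelist:
            return False
        if sender in self.blacklist:
            return True
        if self.model is None:
            return False  # assumption: deliver mail until a model exists
        w, gamma = self.model
        known = (self.vocab[t] for t in tokenize(text) if t in self.vocab)
        return sum(w[j] for j in known) - gamma > 0
```

In this sketch a sender found in either list is decided immediately, so the classifier only sees mail from unknown senders. Calling `feedback` for each message the user labels and then `retrain` refreshes the model from that user's own history, which is what makes the filter personalized.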