
Graduate student: 葉哲甫 (Che-Fu Yeh)
Thesis title: Adaptive E-mail Intention Finding Mechanism Based on E-mail Words Social Networks (基於電子郵件文字社群網路之適應性電子郵件意圖探尋機制)
Advisor: 李漢銘 (Hahn-Ming Lee)
Oral defense committee: 賴溪松 (Chi-Sung Laih), 郭耀煌 (Yau-Hwang Kuo), 林豐澤 (Feng-Tse Lin), 鮑興國 (Hsing-Kuo Pao)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2007
Graduation academic year: 95 (ROC calendar)
Language: English
Number of pages: 61
Keywords (Chinese): 電子郵件, 廣告信, 社群網路, 文字社群網路
Keywords (English): E-mail, Spam, Social Network, Words Social Network

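Abstract (translated from the Chinese): With the rapid evolution of spam, a complete and successful method for filtering it has yet to be found. The one thing spammers have not changed as their techniques evolved, however, is the intention behind spreading spam; common intentions include phishing fraud and the promotion of various kinds of products. In this study, we propose an analysis mechanism based on a social network of e-mail words, which profiles an e-mail user's intentions by analyzing whether the user is interested in the words contained in related e-mail. An e-mail words social network is constructed from the information in an individual user's mailbox together with web information obtained by extending the e-mail through a web search engine, and the social relation network of words is expanded on the basis of this web information and association rules. With the proposed method, networks for both what the user is interested in and what the user is not interested in are constructed and used to analyze the user's intentions. In addition, this study proposes a detection method based on the e-mail words social network to classify e-mail. Finally, combining the concepts of artificial immune systems and e-mail user feedback, this study also proposes an adaptive algorithm to update the e-mail words social network. The experimental results show that the proposed system can filter spam by analyzing users' intentions, and the study offers some ideas and directions for further research on analyzing the nature and background of personal interests.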


Abstract (English): Despite the rapid evolution of spam, no fully successful solution for filtering it has been found. However, spammers still spread spam with the same intentions, such as advertising and phishing. In this investigation, we propose an E-mail Words Social Network (EWSN) mechanism for profiling users' intentions with respect to interesting and uninteresting e-mail. An EWSN is constructed from the information in an individual user's mailbox, and this e-mail information is expanded with information from the World Wide Web (WWW) via a search engine. Based on the web information and association rules among the words, the words and their relations are expanded into a words social network. With this approach, both an interested and an uninterested EWSN can be constructed to analyze user intentions. Additionally, an efficient detection mechanism based on the EWSN is proposed to classify e-mail. Finally, an adaptation algorithm from artificial immune systems is applied to the EWSN, so that it adapts to the user's confirmed classification results. The experimental results indicate that the proposed system helps classify spam e-mail by analyzing senders' intentions. Some ideas for analyzing the nature of people's interests and profiling their backgrounds are also presented.
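As a rough illustration of the idea in the abstract, the sketch below links words that co-occur in a user's e-mails whenever the pair satisfies association-rule support and confidence thresholds, then scores a new message against a spam network and a non-spam network. It is not the thesis's implementation: the names `build_word_network` and `network_score`, the threshold values, the toy messages, and the overlap-based score are all assumptions made for illustration, and the web-expansion and immune-based adaptation steps are left out.

```python
from collections import Counter
from itertools import combinations


def build_word_network(emails, min_support=0.5, min_confidence=0.6):
    """Return directed edges {(a, b): confidence} for rules a -> b.

    Hypothetical helper: `emails` is a list of token lists. Support is the
    fraction of e-mails containing both words; confidence is the fraction
    of e-mails containing `a` that also contain `b`.
    """
    n = len(emails)
    word_count = Counter()
    pair_count = Counter()
    for tokens in emails:
        words = sorted(set(tokens))  # count each word once per e-mail
        word_count.update(words)
        pair_count.update(combinations(words, 2))

    edges = {}
    for (a, b), both in pair_count.items():
        if both / n < min_support:
            continue
        # Test the rule in both directions; confidence is asymmetric.
        for src, dst in ((a, b), (b, a)):
            confidence = both / word_count[src]
            if confidence >= min_confidence:
                edges[(src, dst)] = confidence
    return edges


def network_score(tokens, edges):
    """Fraction of the network's edges whose two words both occur in the e-mail."""
    if not edges:
        return 0.0
    words = set(tokens)
    hits = sum(1 for a, b in edges if a in words and b in words)
    return hits / len(edges)


# Toy data (assumed, not taken from the thesis).
spam = [["cheap", "pills", "buy", "now"],
        ["buy", "cheap", "watches", "now"],
        ["cheap", "pills", "online"]]
ham = [["meeting", "agenda", "project"],
       ["project", "report", "meeting"],
       ["lunch", "meeting", "tomorrow"]]

spam_net = build_word_network(spam)
ham_net = build_word_network(ham)

message = ["buy", "cheap", "pills", "today"]
is_spam = network_score(message, spam_net) > network_score(message, ham_net)
label = "spam" if is_spam else "non-spam"
print(label)  # prints: spam
```

Keeping one network per class mirrors the abstract's interested and uninterested EWSNs; a fuller version would also need the search-engine expansion of words and an update step driven by the user's confirmed classifications.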

Table of contents:
Abstract
Acknowledgements
Content
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Unsolicited Bulk E-mail
  1.2 The Challenges of Current Research
  1.3 Motivations
  1.4 Goals
  1.5 Outline of the Thesis
Chapter 2 Background
  2.1 The Nature of E-mail
    2.1.1 The Social Network of E-mail
    2.1.2 The Content Structure of E-mail
  2.2 General Spam Filtering Methods
    2.2.1 The Rule-Based Methods
    2.2.2 The Sender Reputation Methods
    2.2.3 The Content-Based Methods
  2.3 Related Work
    2.3.1 Topic Discovery Methods
    2.3.2 Words Social Network
Chapter 3 E-mail Words Social Networks
  3.1 Concept of Proposed Methodology
  3.2 The System Architecture of EWSN
  3.3 E-mail Words Social Networks Constructor
    3.3.1 Preprocessing
    3.3.2 Novel Words Expanding
    3.3.3 Words Associating
    3.3.4 Words Relational Linking
    3.3.5 Writing Intentions Labeling
  3.4 Concept-Based Detector
    3.4.1 Concept Stimulation Measuring
    3.4.2 Concept Variant Rate Testing
  3.5 Immune-Based Conceptual Adaptor
Chapter 4 Experiments
  4.1 Experimental Design
    4.1.1 Dataset Description
    4.1.2 Evaluation Criteria
  4.2 EWSN Parameter Setting
    4.2.1 The Effect of Selected Keywords
      4.2.1.1 Extended Keywords
      4.2.1.2 Different Sources Keywords
    4.2.2 The Effect of Training E-mail
      4.2.2.1 Spam EWSN
      4.2.2.2 Non-Spam EWSN
    4.2.3 The Structures Analysis of EWSN
  4.3 Performance Test
    4.3.1 Comparison with Open-Source Spam Filter Software
    4.3.2 Comparison with General Spam Filter Methods
      4.3.2.1 SA Corpus Test
      4.3.2.2 TREC Corpus Test
      4.3.2.3 Incremental Test
Chapter 5 Discussions and Conclusions
  5.1 Discussions
    5.1.1 The Interrelated Relation between Non-Spam EWSN and Spam EWSN
    5.1.2 Cliques Growth Rate
    5.1.3 Advantages of the Proposed Methodology
    5.1.4 Limitations of the Proposed Methodology
  5.2 Conclusions
  5.3 Further Work
References
Vita

