Adaptive E-mail Intention Finding Mechanism Based on E-mail Words Social Networks

Author: Che-Fu Yeh (葉哲甫)
Thesis title: Adaptive E-mail Intention Finding Mechanism Based on E-mail Words Social Networks (基於電子郵件文字社群網路之適應性電子郵件意圖探尋機制)
Advisor: Hahn-Ming Lee (李漢銘)
Oral defense committee: Chi-Sung Laih (賴溪松), Yau-Hwang Kuo (郭耀煌), Feng-Tse Lin (林豐澤), Hsing-Kuo Pao (鮑興國)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2007
Academic year of graduation: 95
Language: English
Number of pages: 61
Keywords (Chinese): 電子郵件、廣告信、社群網路、文字社群網路
Keywords (English): E-mail, Spam, Social Network, Words Social Network

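Abstract (translated from Chinese): As spam evolves rapidly, no complete and successful method of filtering it has yet been found. What spammers have not changed as techniques evolve, however, is their intention in spreading spam; common intentions include phishing fraud and the promotion of all kinds of products. In this study, we propose an analysis mechanism based on a social network of e-mail words, which profiles an e-mail user's intentions by analyzing whether the user is interested in the words contained in related e-mail. An e-mail words social network is built from the information in an individual user's mailbox together with web information obtained by extending the e-mail through web search engines, and the social relations among words are expanded on the basis of that web information and association rules. With the proposed method, networks are built both for what the user is interested in and for what the user is not interested in, and these networks are used to analyze the user's intentions. This study also proposes a detection method based on the e-mail words social network to classify e-mail. Finally, combining the concepts of artificial immune systems and e-mail user feedback, it proposes an adaptive algorithm for updating the e-mail words social network. The experimental results show that the proposed system can filter spam by analyzing user intentions, and the study offers some ideas and directions for further research on analyzing the nature and background of personal interests.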

Despite the rapid evolution of spam, no fully successful solution for filtering it has been found. Spammers, however, still spread spam with the same intentions, such as advertising and phishing. In this investigation, we propose an E-mail Words Social Network (EWSN) mechanism for profiling users' intentions with respect to interesting and uninteresting e-mail. An EWSN is constructed from the information in an individual user's mailbox and is expanded with e-mail-related information retrieved from the World Wide Web (WWW) via a search engine. Based on the web information and association rules among the words, words and relations are expanded into a words social network. With this mechanism, both interested and uninterested EWSNs can be constructed to analyze user intentions. Additionally, an efficient detection mechanism based on the EWSN is proposed to classify e-mail. Finally, an adaptation algorithm from artificial immune systems is applied to the EWSN, which is thus adapted to follow the user's confirmed classification results. The experimental results indicate that the proposed system helps classify spam e-mail by analyzing senders' intentions. Some ideas for analyzing the nature of people's interests and profiling their backgrounds are also presented.
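
To make the idea of a words network built from association rules more concrete, the following is a minimal Python sketch. It is not the thesis's algorithm: the function names, the support and confidence thresholds, the scoring rule, and the toy mailboxes are all assumptions chosen for illustration, and the web expansion and immune-based adaptation steps are omitted. It mines word pairs that co-occur often enough in a set of e-mails, keeps them as directed edges, and labels a new e-mail by whichever network (spam or non-spam) it matches more strongly.

```python
from collections import Counter
from itertools import combinations


def build_word_network(emails, min_support=0.5, min_confidence=0.6):
    """Return directed edges {(a, b): confidence} mined from word co-occurrence.

    Hypothetical rule: keep a -> b when the pair occurs in at least
    `min_support` of the e-mails and P(b | a) is at least `min_confidence`.
    """
    docs = [set(text.lower().split()) for text in emails]
    n = len(docs)
    # Number of e-mails containing each word / each unordered word pair.
    word_count = Counter(w for d in docs for w in d)
    pair_count = Counter(
        pair for d in docs for pair in combinations(sorted(d), 2)
    )
    edges = {}
    for (a, b), c in pair_count.items():
        if c / n < min_support:
            continue
        for src, dst in ((a, b), (b, a)):
            conf = c / word_count[src]
            if conf >= min_confidence:
                edges[(src, dst)] = conf
    return edges


def network_score(text, edges):
    """Fraction of the network's edges whose two words both occur in `text`."""
    if not edges:
        return 0.0
    words = set(text.lower().split())
    hits = sum(1 for (a, b) in edges if a in words and b in words)
    return hits / len(edges)


def classify(text, spam_edges, ham_edges):
    """Label by the more strongly matched network; ties go to non-spam."""
    if network_score(text, spam_edges) > network_score(text, ham_edges):
        return "spam"
    return "non-spam"


# Toy mailboxes (invented data, for illustration only).
spam_mail = ["cheap pills online", "cheap pills now", "buy cheap pills"]
ham_mail = ["meeting agenda attached", "meeting agenda tomorrow",
            "project meeting agenda"]

spam_edges = build_word_network(spam_mail)
ham_edges = build_word_network(ham_mail)

print(classify("cheap pills for you", spam_edges, ham_edges))     # -> spam
print(classify("agenda for the meeting", spam_edges, ham_edges))  # -> non-spam
```

Keeping two separate networks mirrors the abstract's distinction between interested and uninterested EWSNs; a user's confirmed labels could then be used to add or remove edges, which is the role the abstract assigns to the adaptation step.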

Content
Abstract
Acknowledgements
Content
List of Tables
List of Figures
Chapter 1 Introduction
1 Unsolicited Bulk E-mail
2 The Challenges of Current Research
3 Motivations
4 Goals
5 Outline of the Thesis
Chapter 2 Background
1 The Nature of E-mail
1.1 The Social Network of E-mail
1.2 The Content Structure of E-mail
2 General Spam Filtering Methods
2.1 The Rule-Based Methods
2.2 The Sender Reputation Methods
2.3 The Content-Based Methods
3 Related Work
3.1 Topic Discovery Methods
3.2 Words Social Network
Chapter 3 E-mail Words Social Networks
1 Concept of Proposed Methodology
2 The System Architecture of EWSN
3 E-mail Words Social Networks Constructor
3.1 Preprocessing
3.2 Novel Words Expanding
3.3 Words Associating
3.4 Words Relational Linking
3.5 Writing Intentions Labeling
4 Concept-Based Detector
4.1 Concept Stimulation Measuring
4.2 Concept Variant Rate Testing
5 Immune-Based Conceptual Adaptor
Chapter 4 Experiments
1 Experimental Design
1.1 Dataset Description
1.2 Evaluation Criteria
2 EWSN Parameter Setting
2.1 The Effect of Selected Keywords
2.1.1 Extended Keywords
2.1.2 Different Sources Keywords
2.2 The Effect of Training E-mail
2.2.1 Spam EWSN
2.2.2 Non-Spam EWSN
2.3 The Structures Analysis of EWSN
3 Performance Test
3.1 Comparison with Open-Source Spam Filter Software
3.2 Comparison with General Spam Filter Methods
3.2.1 SA Corpus Test
3.2.2 TREC Corpus Test
3.2.3 Incremental Test
Chapter 5 Discussions and Conclusions
1 Discussions
1.1 The Interrelated Relation between Non-Spam EWSN and Spam EWSN
1.2 Cliques Growth Rate
1.3 Advantages of the Proposed Methodology
1.4 Limitations of the Proposed Methodology
2 Conclusions
3 Further Work
References
Vita

                                

