Graduate student: | 張世良 Shih-liang Chang |
---|---|
Thesis title: | 依使用者行為和內文分析偵測垃圾微網誌 Detecting Microblog Spam using User Behavior and Content Analysis |
Advisor: | 洪西進 Shi-Jinn Horng |
Oral defense committee: | 林韋宏, 郭奕宏, 顏成安, 王獻, 林琮烈 |
Degree: | 碩士 Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2010 |
Academic year of graduation: | 98 |
Language: | Chinese |
Number of pages: | 50 |
Keywords (Chinese): | 垃圾微網誌, 使用者行為, Twitter |
Keywords (English): | Microblog spam, User behavior, Twitter |
With the rapid growth of the Internet, the microblog has emerged as a new form of blog. A microblog differs from a traditional blog in that its content is much shorter: a post is limited to 140 characters on the author's profile page, and the service emphasizes simple, fast updates and closer interaction between people. Because a microblog is such an easy way to reach other people, spammers use it to spread commercial and sex advertisements, malicious links, and meaningless content, all of which bother users.
This thesis proposes a method that combines content-based features and user-behavior features to decide whether a microblog is spam, with Twitter as the target service. The content-based features measure how the contents a user posts relate to one another, and the user-behavior features describe how the user behaves on Twitter. The experimental data set was collected from Twitter users, and 2,100 users were labeled manually. Experimental results show that the proposed microblog spam detector reaches a detection rate of up to 90%. The results can help microblog service providers curb the malicious links and spam that trouble their users.
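The abstract does not list the individual features, so the following is only a minimal sketch of how content-based and user-behavior features might be combined into one feature vector per user. Every feature here (mean pairwise Jaccard similarity between posts, share of posts containing a URL, following-to-follower ratio, posts per day) and every function name is a hypothetical illustration, not the feature set used in the thesis.

```python
import re
from itertools import combinations

# Matches http/https links so they can be counted and stripped from the text.
URL_RE = re.compile(r"https?://\S+")


def tokenize(text):
    """Return the set of lowercase word tokens in a post, with URLs removed."""
    without_urls = URL_RE.sub(" ", text.lower())
    return set(re.findall(r"[a-z0-9']+", without_urls))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def content_features(posts):
    """Hypothetical content-based features: how alike a user's posts are."""
    token_sets = [tokenize(p) for p in posts]
    pairs = list(combinations(token_sets, 2))
    mean_similarity = (
        sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    )
    url_ratio = (
        sum(1 for p in posts if URL_RE.search(p)) / len(posts) if posts else 0.0
    )
    return [mean_similarity, url_ratio]


def behavior_features(followers, following, posts_per_day):
    """Hypothetical user-behavior features taken from account statistics."""
    follow_ratio = following / max(followers, 1)  # avoid division by zero
    return [follow_ratio, float(posts_per_day)]


def user_feature_vector(posts, followers, following, posts_per_day):
    """Concatenate both feature groups into one vector for a classifier."""
    return content_features(posts) + behavior_features(
        followers, following, posts_per_day
    )


if __name__ == "__main__":
    suspicious = user_feature_vector(
        ["Buy cheap pills http://a.example/1", "Buy cheap pills http://a.example/2"],
        followers=10, following=2000, posts_per_day=50,
    )
    ordinary = user_feature_vector(
        ["Having coffee with friends", "Reading a great book today"],
        followers=120, following=150, posts_per_day=3,
    )
    print(suspicious)
    print(ordinary)
```

Vectors like these, paired with the manual spam or non-spam labels, could then be passed to any supervised classifier. Which classifier and which features the thesis actually uses is not stated in the abstract.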