
Graduate student: 張世良
Shih-liang Chang
Thesis title: 依使用者行為和內文分析偵測垃圾微網誌
Detecting Microblog Spam using User Behavior and Content Analysis
Advisor: 洪西進
Shi-Jinn Horng
Oral defense committee: 林韋宏
郭奕宏
顏成安
王獻
林琮烈
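Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2010
Academic year of graduation: 98
Language: Chinese
Number of pages: 50
Keywords (Chinese): 垃圾微網誌, 使用者行為, Twitter
Keywords (English): Microblog spam, User behavior, Twitter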
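  • With the growth of the Internet, microblogging has emerged as a new kind of web service. Compared with a traditional blog, a microblog post is short, limited to 140 characters, and the service emphasizes quick, simple updates and closer interaction between people. Because microblogs are so convenient, ill-intentioned users exploit them to spread commercial advertisements or malicious links, which is a nuisance to other users.
    This thesis detects spam microblogs on Twitter by observing user behavior and analyzing post content. Content analysis is used to detect the relatedness of the posted text, and it is combined with an analysis of the user's behavior on Twitter to decide whether a microblog is spam. The dataset used in the experiments was collected from Twitter users, and 2100 users were labeled manually. The experimental results show a 90% detection rate for spam microblogs, and the findings can help microblog service providers curb the malicious links and spam microblogs that trouble users.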


    The Internet has grown rapidly in recent years, and the microblog has emerged as a new form of blog. A microblog differs from a traditional blog in that its content is typically much smaller, in both actual size and aggregate file size. A microblog post of up to 140 characters appears on the author's profile page. Because microblogs make it easy to contact other people, spammers can use them to spread malicious links, sex ads, and meaningless content that bother users.

    This thesis proposes a method that combines content-based features and user-behavior features to identify whether a microblog is spam. The former are used to detect the relationship among the posted contents, and the latter are used to characterize the user's behavior. All data in the experimental database were collected from Twitter users, 2100 users in total. Experimental results show that the detection rate of the proposed microblog spammer detector is up to 90%.
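As a rough illustration of what user-behavior features might look like, the sketch below computes a few quantities whose names appear in the outline of this record (link ratio, duplicate content ratio, average posting interval, variance). The exact definitions used in the thesis are not given here, so every formula, the function name `behavior_features`, and the input layout are assumptions made for illustration only.

```python
from statistics import mean, pvariance


def behavior_features(tweets):
    """Compute illustrative behavior features for one account.

    `tweets` is a list of (unix_timestamp_seconds, text) tuples.
    All definitions below are assumptions, not the thesis's own formulas.
    """
    if not tweets:
        raise ValueError("need at least one tweet")

    tweets = sorted(tweets)  # order by timestamp
    times = [t for t, _ in tweets]
    texts = [text for _, text in tweets]
    n = len(texts)

    # Assumed definition: share of tweets that contain a URL.
    link_ratio = sum("http://" in s or "https://" in s for s in texts) / n

    # Assumed definition: share of tweets that repeat an earlier text verbatim.
    duplicate_ratio = 1 - len(set(texts)) / n

    # Gaps between consecutive posts, in seconds.
    gaps = [b - a for a, b in zip(times, times[1:])]
    mean_interval = mean(gaps) if gaps else 0.0
    interval_variance = pvariance(gaps) if gaps else 0.0

    return {
        "link_ratio": link_ratio,
        "duplicate_ratio": duplicate_ratio,
        "mean_interval": mean_interval,
        "interval_variance": interval_variance,
    }


# Hypothetical account that posts the same link at a fixed one-minute rhythm.
sample = [
    (0, "buy now http://x.example"),
    (60, "buy now http://x.example"),
    (120, "hello"),
    (180, "buy now http://x.example"),
]
features = behavior_features(sample)
```

A feature vector of this kind could then be passed to a classifier such as the SVM named in the outline. An account that posts at perfectly regular intervals, as in the hypothetical sample, yields zero interval variance, which is the sort of signal a behavior-based detector might pick up.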

    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Background
    Chapter 2 Related Work
      2.1 Microblog spam
      2.2 Types of spam
      2.3 Related research
    Chapter 3 Data Collection
      3.1 Data collection method
        3.1.1 User profile data
        3.1.2 Posts on Twitter (Tweets)
      3.2 Data processing
      3.3 Follower & Following
      3.3 Twitter Search
      3.4 User account classification
    Chapter 4 Methodology
      4.1 System architecture
      4.2 Feature set definition
      4.3 Profile identity set
      4.4 Behavior identity set
        1. Ratio
        2. Bi-follow count
        3. Variance
        4. Average posting interval
        5. Link ratio
        6. Tag count
        7. Duplicate content ratio
        8. Duplicate link URLs
        9. Number of times tagged
      4.5 Keyword filter
      4.6 SVM classifier
    Chapter 5 Experimental Procedure
      5.1 Cross-validation
      5.2 Detection rate, false-alarm rate, and F1-measure
      5.3 Experimental process
      5.4 Comparison with other papers
    Chapter 6 Conclusion and Future Work
    References
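Section 5.2 of the outline names three evaluation measures: detection rate, false-alarm rate, and F1-measure. The sketch below computes them from binary labels using the common textbook definitions (detection rate as recall on the spam class, false-alarm rate as the share of legitimate accounts flagged as spam). Whether the thesis uses exactly these definitions is an assumption, and the function name `spam_metrics` and the sample labels are hypothetical.

```python
def spam_metrics(y_true, y_pred):
    """Return (detection_rate, false_alarm_rate, f1) for labels 1 = spam, 0 = legitimate.

    Uses the standard confusion-matrix definitions; these are assumed,
    not taken from the thesis text.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # spam correctly flagged
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # spam missed
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # legitimate flagged as spam
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # legitimate passed

    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + detection_rate
    f1 = 2 * precision * detection_rate / denom if denom else 0.0
    return detection_rate, false_alarm_rate, f1


# Hypothetical labels for ten accounts: four spam, six legitimate.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
detection_rate, false_alarm_rate, f1 = spam_metrics(y_true, y_pred)
```

Under these definitions, the 90% figure reported in the abstract would mean that nine of every ten manually labeled spam accounts are flagged by the detector.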

