
Author: 白士宗 (Shr-Tzung Bai)
Thesis title: WikiSERM: Wikipedia Vandalism Detection Through Sequential Event Risk Measure (WikiSERM:基於連續事件風險衡量之維基百科版本破壞偵測)
Advisor: 李漢銘 (Hahn-Ming Lee)
Oral defense committee: 鄧惟中 (Wei-Chung Teng), 鄭博仁 (Albert B. Jeng), 鄭欣明 (Shin-Ming Cheng), 廖弘源
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar; 2014-2015)
Language: English
Pages: 71
Keywords: Wikipedia (維基百科), Vandalism Detection (破壞偵測), Sequential Event Risk Measure (連續事件風險衡量)
    Wikipedia is a multilingual, content-rich online encyclopedia. Built on the Web 2.0 concept, it allows anyone to share and edit content, which also makes it easy to vandalize. Maintaining content quality is therefore a long-term, ongoing effort for all Wikipedians. Earlier research on Wikipedia vandalism detection focused mainly on text semantics, feature statistics, and machine learning. Current work places more emphasis on language-independent feature analysis and on content-context correlation analysis across consecutive revisions. WikiSERM extracts key items based on the edit tags defined by Wikipedia. Because these edit tags apply to every language edition, the key items are language-independent, which allows WikiSERM to be applied to vandalism detection in any language version of Wikipedia. WikiSERM uses complete article revisions as evidence for judging risk trends and analyzes the usage status of each key item in a Wikipedia article (e.g., still in use or completely deleted). By analyzing the sequence of usage-status events for each key item, it obtains the risk status of every key item at each corresponding revision. WikiSERM records these key-item risk levels in a two-dimensional array for fast lookup and comparison, so it can handle incremental data and return risk assessments immediately. Finally, by analyzing the key-item changes in an edited revision (e.g., adding a high-risk key item, adding a low-risk key item, deleting a high-risk key item, or deleting a low-risk key item), WikiSERM classifies revisions that exceed a risk threshold as high-risk revisions. The results can help Wikipedia administrators quickly locate vandalized revisions and identify the high-risk key items within them.
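The abstract does not state WikiSERM's formulas, so the following is only a minimal Python sketch of the general pipeline it describes, not the thesis's implementation. Everything specific is an assumption made for illustration: key items are taken to be internal link targets (`[[Target]]`), the risk of a key item is taken to be its removals divided by its additions so far, and the function names `key_items`, `risk_table`, `revision_risk`, and `flag_revisions`, as well as the threshold value, are hypothetical.

```python
import re
from typing import Dict, List, Set

# Assumption: a "key item" is the target of an internal wiki link [[Target|label]].
LINK = re.compile(r"\[\[([^\]|#]+)")


def key_items(wikitext: str) -> Set[str]:
    """Extract key items from markup tags (here: internal link targets only)."""
    return {m.strip() for m in LINK.findall(wikitext)}


def risk_table(revisions: List[str]) -> Dict[str, List[float]]:
    """Build a two-dimensional table: table[item][i] is the item's risk at revision i.

    Assumed risk measure: removals / additions of the item observed up to revision i.
    """
    items_per_rev = [key_items(text) for text in revisions]
    all_items = set().union(*items_per_rev)
    table: Dict[str, List[float]] = {}
    for item in all_items:
        added = removed = 0
        present = False
        row = []
        for items in items_per_rev:
            now = item in items
            if now and not present:
                added += 1      # usage-status event: item (re)introduced
            elif present and not now:
                removed += 1    # usage-status event: item deleted
            present = now
            row.append(removed / added if added else 0.0)
        table[item] = row
    return table


def revision_risk(prev_items: Set[str], new_items: Set[str],
                  risk_before: Dict[str, float]) -> float:
    """Score one edit from its key-item changes, using risk known before the edit."""
    score = 0.0
    for item in new_items - prev_items:
        score += risk_before.get(item, 0.0)        # adding a high-risk item
    for item in prev_items - new_items:
        score += 1.0 - risk_before.get(item, 0.0)  # deleting a low-risk item
    return score


def flag_revisions(revisions: List[str], threshold: float = 1.0) -> List[int]:
    """Return indices of revisions whose score exceeds the (assumed) threshold."""
    table = risk_table(revisions)
    items_per_rev = [key_items(text) for text in revisions]
    flagged = []
    for i in range(1, len(revisions)):
        before = {item: row[i - 1] for item, row in table.items()}
        if revision_risk(items_per_rev[i - 1], items_per_rev[i], before) > threshold:
            flagged.append(i)
    return flagged


revisions = [
    "[[Taipei]] is the capital of [[Taiwan]].",
    "[[Taipei]] is the capital of [[Taiwan]]. See [[Spam]].",
    "[[Taipei]] is the capital of [[Taiwan]].",
    "See [[Spam]].",
]
print(flag_revisions(revisions))  # prints [3]
```

In this toy history, the last revision re-adds an item that was previously removed and deletes two items that had always been kept, so it is the only one whose score exceeds the threshold. Because row `i - 1` of the table depends only on earlier revisions, a new revision can be scored by lookup without reprocessing the history, which is the property the abstract attributes to the stored two-dimensional array.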

    1 Introduction
      1.1 Motivation
      1.2 Challenges and Goals
      1.3 Contributions
      1.4 The Outline of Thesis
    2 Related Work
      2.1 Wikipedia Vandalism
      2.2 Wikipedia Visualization
      2.3 Wikipedia Vandalism Detection
    3 Description of WikiSERM
      3.1 WikiSERM Overview
      3.2 System Architecture
      3.3 Key-Item Revising Data Detection Modular
      3.4 WikiSERM Vandalism Detection Modular
    4 Experiments & Result Analysis
      4.1 Sample Dataset
      4.2 Vandalism Detection Result
      4.3 Limitations Analysis
      4.4 Experiment Result Review
    5 Conclusion and Future Work
    References

