
Graduate Student: 林敬銘
Ching-Ming Lin
Thesis Title: 基於改良式FCM微網誌主題分類
Topic Classification on Micro Blog Using Improved FCM
Advisor: 洪西進
Shi-Jinn Horng
Committee Members: 辛錫進
Hsi-Chin Hsin
林韋宏
Wei-Hung Lin
謝仁偉
Ren-Wei Shie
沈榮麟
Rung-Lin Shen
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication: 2011
Academic Year of Graduation: 99 (ROC calendar)
Language: Chinese
Number of Pages: 56
Chinese Keywords: 微網誌, 文本探勘, 主題偵測, Fuzzy C-Mean
English Keywords: micro blog, text mining, topic detection, Fuzzy C-Mean

Research on text mining and topic detection has rarely been carried out in a micro blog setting, and the search systems that micro blog sites provide often fail to return useful data. Fuzzy C-Mean, a method commonly used in text mining and data mining, often produces different results depending on how its center points are chosen, which causes many problems in both fields.

This thesis builds an improved online micro blog search system. After the user enters a keyword, the system returns classified results and shows the main topics currently associated with that keyword. The system is built on Google App Engine so that the public can use it, and it aims for the greatest benefit at the lowest hardware cost. The thesis also improves the traditional Fuzzy C-Mean algorithm, which substantially increases clustering accuracy and shortens clustering time. The system achieves good precision, recall, and run time.
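
The abstract notes that Fuzzy C-Mean results depend on the choice of center points. As a rough illustration of why, the following is a minimal sketch of the textbook Fuzzy C-Means iteration, not the improved algorithm proposed in the thesis, whose details are not given in this record. The function name `fuzzy_c_means`, its parameters, and the toy data are assumptions made for illustration only.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Textbook Fuzzy C-Means on a list of equal-length numeric tuples.

    `centers` is the caller's initial guess; the outcome can depend on it.
    Returns (centers, memberships). memberships[i][j] is the degree to which
    points[i] belongs to cluster j, and these are the memberships that were
    used to compute the returned centers.
    """
    centers = [tuple(c) for c in centers]
    exponent = 2.0 / (m - 1.0)
    dim = len(points[0])
    memberships = []
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on a center: share full membership
                # among the coinciding centers to avoid division by zero.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                memberships.append([h / total for h in hits])
            else:
                memberships.append(
                    [1.0 / sum((dj / dk) ** exponent for dk in dists)
                     for dj in dists]
                )
        # Center update: mean of the points weighted by u_ij^m
        new_centers = []
        for j, old in enumerate(centers):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(old)  # empty cluster: keep the old center
                continue
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


if __name__ == "__main__":
    # Toy 2-D data with two visible groups (hypothetical, not thesis data).
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    for start in ([(0.0, 0.0), (10.0, 10.0)], [(5.0, 5.0), (6.0, 6.0)]):
        final_centers, _ = fuzzy_c_means(data, start)
        print(start, "->", final_centers)
```

Running the script with different starting centers shows how the final centers and their ordering follow from the initial guess, which is the sensitivity the abstract refers to.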

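Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Related Work
1.3 Thesis Organization
Chapter 2 Data Collection
2.1 Fetching Twitter Messages
2.1.1 Twitter Search System
2.1.2 Filtering Search Results
2.1.3 tf-idf
2.1.4 Stemming Algorithm
2.1.5 Message Storage
2.2 Fetching the Tagged Word List
Chapter 3 Message Clustering and Classification
3.1 Subsets of Similar Messages
3.2 Message Classification
3.2.1 Clustering by Sum of Word Occurrence Probabilities
3.2.2 Fuzzy C-Mean Clustering
3.3 Improved Fuzzy C-Mean Algorithm
Chapter 4 Integration with Google App Engine
4.1 Integration with Google App Engine
4.2 Integration with the Google Chrome Browser
Chapter 5 System Implementation and Experimental Results
5.1 Development Environment and System Architecture
5.2 Experimental Results
5.2.1 Results of tf-idf Dimensionality Reduction on Messages
5.2.2 Results of Fetching External Link Titles
5.2.3 Comparison of Traditional FCM and Improved FCM
5.2.4 Comparison with Related Work
5.2.5 Accuracy Comparison on Large Data Sets
5.2.6 Integration with Google App Engine
5.2.7 Integration with Google Chrome
Chapter 6 Conclusion
References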

