Meta-heuristic Based Document Clustering using Vector Space Model Modification

Graduate student:	許大為 David - Alexandre
Thesis title:	Meta-heuristic Based Document Clustering using Vector Space Model Modification
Advisor:	郭人介 Ren-Jieh Kuo
Oral examination committee:	喻奉天 Vincent F. Yu, 林希偉 Shi-Woei Lin
Degree:	Master
Department:	School of Management - Department of Industrial Management
Year of publication:	2014
Academic year of graduation:	102 (ROC calendar)
Language:	English
Number of pages:	90
Keywords:	Cluster analysis, Document clustering, Meta-heuristic, Vector space model

This study aims to improve document clustering performance by combining a meta-heuristic algorithm with a modified vector space model. The proposed modification is applied in the similarity calculation: instead of calculating similarity over the whole document vector space, it tries to find the most significant part of each document's vector space, that is, the part that yields the highest similarity between two documents. To optimize its usability, the vector space model modification is combined with a meta-heuristic algorithm. The proposed algorithm is compared with the ordinary method on four benchmark data sets: SMS Spam Detection, WebKB, Reuters-8, and Reuters-52. The simulation results indicate that document clustering with the vector space model modification can improve clustering performance.
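The idea of computing similarity on part of the vector space rather than the whole can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the abstract does not say how the "most significant part" is chosen, so the sketch assumes it means each document's k highest-weight terms. The helper names (`cosine`, `top_k`, `partial_cosine`) and the toy term weights are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(vec, k):
    """Keep only the k highest-weight terms.

    Assumption: this stands in for the 'most significant part' of a
    document vector; the thesis may select it differently.
    """
    kept = sorted(vec.items(), key=lambda item: item[1], reverse=True)[:k]
    return dict(kept)


def partial_cosine(u, v, k):
    """Cosine similarity restricted to each document's top-k terms."""
    return cosine(top_k(u, k), top_k(v, k))


# Toy term-weight vectors (made-up values, for illustration only).
doc_a = {"cluster": 3.0, "document": 2.0, "swarm": 1.0, "the": 0.5}
doc_b = {"cluster": 2.0, "document": 3.0, "genetic": 1.0, "of": 0.5}

full_similarity = cosine(doc_a, doc_b)
partial_similarity = partial_cosine(doc_a, doc_b, k=2)
print(full_similarity, partial_similarity)
```

In this toy case the low-weight terms that the two documents do not share lower the full-vector similarity, so restricting the comparison to the two heaviest terms gives a higher score. In the thesis the choice of which part to use is tied to a meta-heuristic search, which this sketch does not attempt to reproduce.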

ABSTRACT
ACKNOWLEDGEMENT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 INTRODUCTION
1.1. Research Background
1.2. Research Objectives
1.3. Research Scopes and Constraints
1.4. Thesis Organization
Chapter 2 LITERATURE SURVEY
2.1. Text Document Clustering
2.1.1. Basic Concept
2.1.2. Data Representation
2.1.3. Data Preprocessing
2.1.4. Validation Methods
2.2. Text Document Clustering Methods
2.3. Meta-Heuristic Methods
2.3.1. Particle Swarm Optimization (PSO) Algorithm
2.3.2. Genetic Algorithm
2.4. Meta-heuristics in Text Document Clustering
Chapter 3 METHODOLOGY
3.1. Document Clustering with Vector Space Modification
3.2. Document Clustering with Meta-heuristic and Vector Space Modification
3.2.1. Particle Swarm Optimization based Modified Vector Space
3.2.2. Genetic Algorithm based Modified Vector Space
Chapter 4 COMPUTATIONAL RESULTS AND ANALYSIS
4.1. Parameters Setup
4.2. Computational Results
4.3. Statistical Test
4.4. Algorithm Convergence
Chapter 5 CONCLUSIONS AND FUTURE RESEARCH
5.1. Conclusion
5.2. Contributions
5.3. Future Research
REFERENCE
Appendix I COMPUTATIONAL RESULT
Appendix II STATISTICAL RESULT OF PROPOSED ALGORITHMS

                                

