
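Graduate student: 翟友樂 (Trisna - Yulia Junita)
Thesis title: 應用非凌駕排序粒子群演算法於結合資料分群及分類之研究
Non-Dominated Sorting Particle Swarm Optimizer for Combining Clustering and Classification
Advisor: 楊朝龍 (Chao-Lung Yang)
Oral defense committee: 歐陽超 (Chao Ou-Yang), 郭人介 (Ren-Jieh Kuo)
Degree: Master
Department: College of Management - Department of Industrial Management
Publication year: 2016
Academic year of graduation: 104 (ROC calendar)
Language: English
Pages: 58
Chinese keywords: 資料分群 (data clustering), 資料分類 (data classification), 多目標準則分析 (multi-objective criteria analysis), 非凌駕排序 (non-dominated sorting), 粒子群最佳化方法 (particle swarm optimization)
English keywords: Clustering, Classification, Multi-Objective Optimization, Non-Dominated Sorting Algorithm, Particle Swarm Optimizer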

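This research aims to develop a framework that integrates data clustering and data classification and applies a multi-criteria decision model (Non-dominated Sorting Particle Swarm Optimization for Sequential Clustering and Classification, NSPSO-SCC) to simultaneously analyze latent data patterns, select data features, and use a second dataset to predict the discovered patterns. The proposed method performs feature selection on the datasets used for clustering and for classification; its purpose is to choose features that maintain both cluster compactness and classification accuracy. It is applied to two kinds of dataset: performance data (the Q dataset) and performance-related data (the X dataset). The proposed NSPSO-SCC is compared with traditional SCC, PSO-SCC, and Non Dominated Sorting Genetic Algorithm – Sequential Clustering Classification (NSGAII-SCC). The experimental results show that the proposed method obtains better clustering and classification results than SCC and PSO-SCC. Although statistical analysis shows that the results of NSPSO-SCC and NSGAII-SCC differ significantly, their Pareto fronts overlap, so the two methods cannot be clearly distinguished in how well they combine clustering and classification. In sum, the proposed method considers the quality of clustering and of classification at the same time while also selecting the key data features.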


In this research, Non Dominated Sorting Particle Swarm Optimization – Sequential Clustering Classification (NSPSO-SCC) is proposed for revealing the hidden patterns in one set of data, selecting data features, and training a model that can predict those patterns. To perform these tasks, the proposed approach combines clustering and classification with a multi-objective PSO method. The Non Dominated Sorting PSO performs feature selection for both the clustering and the classification task while maintaining the quality of both results, measured by the compactness of the clusters and the classification accuracy. The clustering and classification procedures are conducted on two types of sub-dataset, the Q and X datasets. Hierarchical clustering is used for the clustering task and a decision tree for the classification task. The performance of NSPSO-SCC is compared with PSO-SCC, traditional SCC, and Non Dominated Sorting Genetic Algorithm – Sequential Clustering Classification (NSGAII-SCC). The experimental results show that NSPSO-SCC performs better than PSO-SCC and traditional SCC. Although the statistical test shows a significant difference between NSPSO-SCC and NSGAII-SCC in terms of MSE and 1/acc, there is no clear evidence that NSPSO-SCC is better than NSGAII-SCC, because their Pareto fronts overlap. However, NSPSO-SCC attains a lower diversity metric than NSGAII-SCC, which indicates better performance on that measure. These results suggest that the proposed method can achieve promising clustering and classification results at the same time.
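The abstract relies on Pareto dominance between solutions scored by two minimized objectives, MSE and 1/acc, and the thesis outline mentions a crowding distance mechanism. This record does not give the thesis's actual implementation, so the following is only a minimal sketch of the general technique, assuming each candidate solution is reduced to an (MSE, 1/acc) pair. The function names and the sample scores are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # assumed layout: (MSE, 1/acc), both minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points: List[Point]) -> List[List[int]]:
    """Group point indices into fronts; front 0 is the Pareto front."""
    remaining = set(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        # A point belongs to the current front if no remaining point dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(points: List[Point], front: List[int]) -> Dict[int, float]:
    """Crowding distance within one front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for m in range(len(points[front[0]])):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue  # all values equal in this objective, nothing to add
        for k in range(1, len(ordered) - 1):
            gap = points[ordered[k + 1]][m] - points[ordered[k - 1]][m]
            dist[ordered[k]] += gap / (hi - lo)
    return dist


# Hypothetical scores for five candidate feature subsets.
scores = [(0.10, 1.50), (0.20, 1.20), (0.30, 1.10), (0.25, 1.60), (0.40, 1.70)]
fronts = non_dominated_sort(scores)        # [[0, 1, 2], [3], [4]]
spread = crowding_distance(scores, fronts[0])
```

This simple version rescans the remaining points for every front, which is easy to read but slower than the bookkeeping used in fast non-dominated sorting. A larger crowding distance marks a solution in a sparser region of the front, which is why such a measure is commonly used to preserve diversity among non-dominated solutions.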

ABSTRACT (CHINESE)
ABSTRACT
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
  1.1 Background
  1.2 Research Goals
  1.3 Organization
CHAPTER 2 LITERATURE REVIEW
  2.1 Clustering and Classification
  2.2 Multi-objective Optimization Problem
  2.3 Basic Particle Swarm Optimizer (PSO)
  2.4 Multi-objective Particle Swarm Optimizer
  2.5 Previous Work Related to Combining Clustering and Classification
CHAPTER 3 METHODOLOGY
  3.1 Research Framework
  3.2 Particle Representation
  3.3 Clustering and Classification Algorithm
    3.3.1 Hierarchical Clustering
    3.3.2 Decision Tree
    3.3.3 Evaluation Criteria
  3.4 Non Dominated PSO-SCC
    3.4.1 Non-Dominated Sorting and Crowding Distance Mechanism
CHAPTER 4 EXPERIMENTAL RESULT
  4.1 Benchmark Techniques
  4.2 Dataset
  4.3 Parameter Setting and Simulation Running
  4.4 Experimental Result
    4.4.1 Result Comparison of NSPSO-SCC, PSO-SCC and SCC Method
    4.4.2 Comparison on Computational Time
    4.4.3 Result Comparison of NSPSO-SCC and NSGAII-SCC Method
  4.5 Application of NSPSO-SCC in Wall-Mart Data Analysis
    4.5.1 Clustering and Classification Result
CHAPTER 5 DISCUSSION AND CONCLUSION
REFERENCES
APPENDIX
  A. Main Code of NSPSO-SCC
  B. Canonical Correlation Analysis

