
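Author: 蘇艾瑪 (Irma Armunifah)
Thesis title: Non-Dominated Sorting Genetic Algorithm for Combining Clustering and Classification (應用非凌駕排序基因演算法於結合資料分群及分類之研究)
Advisor: 楊朝龍 (Chao-Lung Yang)
Oral defense committee: 曹譽鐘 (Yu-Chung Tsao), 郭人介 (Ren-Jieh Kuo)
Degree: Master
Department: College of Management, Department of Industrial Management
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar)
Language: English
Number of pages: 47
Chinese keywords (translated): data clustering, data classification, multi-objective criteria analysis, retail sales, point of sales data
English keywords: Clustering, Classification, Multi-Objective Optimization, Non-Dominated Sorting Genetic Algorithm, Point of Sales Data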
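  • Clustering and classification are, respectively, unsupervised and supervised learning methods in data mining. When data labels are known, a classification method can train a model that classifies data whose labels are unknown. When labels cannot be obtained in advance, a clustering method can reveal the latent structure of the data. This research aims to develop a framework that integrates clustering and classification and uses a multi-objective decision model, the non-dominated sorting genetic algorithm for combining clustering and classification (NSGA-CCC), to analyze complex data systematically. The framework combines the characteristics of clustering and classification and applies them to different types of data within a dataset: performance data (the Q dataset) and performance-related data (the X dataset). Clustering is first applied to the Q dataset to group the quantitative data. The clustering result (the cluster label of each record) is then combined with the X dataset as the input to classification. Hierarchical clustering and decision trees serve as the example clustering and classification methods, chosen for the readability of their tree structures and the stability of their results. The multi-objective non-dominated sorting genetic algorithm II (NSGA-II) is incorporated to evaluate clustering and classification performance at the same time, so that a number of clusters can be selected that balances the performance of the two methods. A point of sales dataset from the retail industry is used as an application example to find the factors that affect retail sales. The experimental results show that the proposed method can account for the analysis quality of both clustering and classification while also selecting the key data fields.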
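    The two-stage flow described above (cluster the Q dataset, then classify the resulting cluster labels from the X dataset) can be illustrated with a minimal pure-Python sketch. This is not the thesis's implementation: the toy data, the variable names (`q_sales`, `x_area`), the choice of average linkage, and the depth-1 decision tree (a "stump") are all assumptions made only to keep the example short and self-contained.

```python
from collections import Counter

def agglomerative(values, k):
    """Average-linkage hierarchical clustering of 1-D values down to k clusters."""
    clusters = [[i] for i in range(len(values))]

    def dist(a, b):
        # Mean absolute difference between all cross-cluster pairs.
        return sum(abs(values[i] - values[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Find and merge the closest pair of clusters (a < b always holds).
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] += clusters.pop(b)

    labels = [0] * len(values)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def fit_stump(x, labels):
    """Depth-1 decision tree on one feature (assumes >= 2 distinct x values)."""
    best = None
    for t in sorted(set(x))[:-1]:  # thresholds that leave both sides non-empty
        left = Counter(l for v, l in zip(x, labels) if v <= t)
        right = Counter(l for v, l in zip(x, labels) if v > t)
        (left_label, n_left), = left.most_common(1)
        (right_label, n_right), = right.most_common(1)
        correct = n_left + n_right
        if best is None or correct > best[0]:
            best = (correct, t, left_label, right_label)
    correct, t, left_label, right_label = best
    return t, left_label, right_label, correct / len(x)

# Hypothetical toy data: Q = a performance measure, X = a related attribute.
q_sales = [10.0, 12.0, 11.0, 50.0, 52.0, 49.0]
x_area = [1.0, 2.0, 1.5, 8.0, 9.0, 7.0]

labels = agglomerative(q_sales, k=2)            # stage 1: cluster Q
threshold, left_label, right_label, accuracy = fit_stump(x_area, labels)  # stage 2: classify labels from X
print(labels, threshold, accuracy)
```

    Here the accuracy is measured on the training data itself, which is enough to show how the cluster labels from Q become the classification target for X; a real study would evaluate on held-out data.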


    Clustering and classification are two major data mining techniques, used respectively to explore the hidden structure of data and to classify unknown data instances. Because unsupervised and supervised methods differ in nature, clustering and classification are usually applied separately, depending on whether data labels are available. In this research, a multi-objective framework called the non-dominated sorting genetic algorithm for combining clustering and classification (NSGA-CCC) is proposed to analyze different portions of a dataset. The clustering method is expected to help rapidly analyze or identify the performance measures (Q dataset). The clustering results, i.e. the cluster labels, are then combined with other information (X dataset) as inputs to a classification model, which predicts the cluster labels from the X dataset. The non-dominated sorting genetic algorithm II (NSGA-II) is integrated into the framework to find the optimal number of clusters. Hierarchical clustering and a decision tree are used as the example clustering and classification methods. A point of sales (POS) dataset is used as a case study to evaluate NSGA-CCC and to investigate the performance of retail stores. The experimental results show that NSGA-CCC achieves promising performance in finding the best number of clusters under the multiple objectives of clustering and classification, while performing feature selection at the same time.
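    To make the idea of choosing the number of clusters under two objectives concrete, the following is a small sketch of non-dominated sorting over candidate cluster counts. It assumes both objectives (a clustering quality score and a classification accuracy) are to be maximized, and the scores in `candidates` are invented for illustration, not results from the thesis. It uses a naive repeated-filtering approach rather than the faster bookkeeping of NSGA-II, and it omits the crowding-distance and genetic operators that NSGA-II also relies on.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_sort(scores):
    """Split candidates into fronts; front 0 holds the Pareto-optimal candidates."""
    remaining = dict(scores)
    fronts = []
    while remaining:
        # A candidate belongs to the current front if nothing left dominates it.
        front = [
            k for k, s in remaining.items()
            if not any(dominates(t, s) for j, t in remaining.items() if j != k)
        ]
        fronts.append(front)
        for k in front:
            del remaining[k]
    return fronts

# Hypothetical scores: number of clusters -> (clustering quality, classification accuracy).
candidates = {
    2: (0.80, 0.95),
    3: (0.70, 0.90),
    4: (0.85, 0.80),
    5: (0.60, 0.70),
}

fronts = non_dominated_sort(candidates)
print(fronts)  # [[2, 4], [3], [5]]
```

    In this toy case, 2 and 4 clusters are both on the first front: neither beats the other on both objectives, so the final choice between them is a trade-off rather than a single optimum.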

    ABSTRACT (CHINESE)
    ABSTRACT
    ACKNOWLEDGEMENTS
    TABLE OF CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER 1 INTRODUCTION
        1.1 Background
        1.2 Research Objective
    CHAPTER 2 LITERATURE REVIEW
        2.1 Clustering and Classification
        2.2 Combining Clustering and Classification
        2.3 Genetic Algorithm
        2.4 Multi-Objective Approach
        2.5 Multi-Objective Genetic Algorithm
    CHAPTER 3 METHODOLOGY
        3.1 Research Framework
        3.2 Hierarchical Clustering
        3.3 Decision Tree
        3.4 Genetic Algorithm Operator
        3.5 Non-Dominated Sorting
        3.6 Main Loop of NSGA-CCC
    CHAPTER 4 EXPERIMENTAL RESULTS
        4.1 Dataset: Point of Sales Data
        4.2 Prior Experiment
        4.3 NSGA-CCC Simulation Running
        4.4 Clustering and Classification Result
        4.5 Comparing NSGA-CCC with NSGA-I and Full Feature Dataset
    CHAPTER 5 DISCUSSION AND CONCLUSION
    REFERENCES
    APPENDIX

