
Graduate student: 張秉睿
Ping-Jui Chang
Thesis title: 基於維度和灰色理論關聯性的模糊 C 均值插補缺失數據的方法
A New and Effective Dimension- and Grey Theory Correlation-based Fuzzy C-Means Method for Imputing Incomplete Data
Advisor: 鍾國亮
Kuo-Liang Chung
Oral defense committee: 鍾國亮
Kuo-Liang Chung
蔡文祥
Wen-Hsiang Tsai
姚智原
Chih-Yuan Yao
李同益
Tong-Yee Lee
鄧惟中
Wei-Chung Teng
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2022
Graduation academic year: 110 (ROC calendar)
Language: English
Number of pages: 38
Chinese keywords: 分群, 不完整資料, 補值
English keywords: clustering, incomplete data, impute data
在大數據分析中填補缺失數據是一個重要且具有挑戰性的問題。在本文中,我們提出了一種新的有效的基於維度和灰色理論關聯性的模糊 (DGC-fuzzy) c-means 方法來估算不完整數據。對於每個數據集,所提出的 DGC-fuzzy c-means 方法首先建立一個維度優先的陣列,用於保存要補值維度的優先順序。接下來,第一優先度的不完整維度通過在相同維度中的均值來補值。然後,僅考慮完整維度和已補完值的維度,應用模糊 c-means 將數據集劃分為 c 個群,並且在每個群中,第二優先維度中的每個缺失值都由同一群中前一半高度相關的值的平均值來補值。重複上述分群和補值過程,直到插補所有不完整的維度。實驗結果表明,與最先進的方法相比,所提出的用於估算不完整數據的 DGC-fuzzy c-means 方法具有更好的準確性和分群優勢。


Imputing missing data is an important and challenging problem in big data analysis. This thesis proposes a new and effective dimension- and grey theory correlation-based fuzzy (DGC-fuzzy) c-means method for imputing incomplete data. For each dataset, the method first builds a dimension-priority array that stores the order in which the incomplete dimensions are to be imputed. Next, the first-priority incomplete dimension is imputed by mean-filling within that dimension. Then, considering only the complete and already-imputed dimensions, fuzzy c-means is applied to partition the dataset into c clusters, and in each cluster every missing attribute in the second-priority dimension is imputed with the mean of the more highly correlated half of the available attributes in the same dimension. The clustering and imputation steps are repeated until all incomplete dimensions have been imputed. Experimental results show that the proposed DGC-fuzzy c-means method achieves better accuracy and clustering performance than the state-of-the-art methods it is compared with.
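The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the thesis's implementation: the priority rule (fewest missing values first), the hard cluster assignment by largest membership, and the similarity score (negative Euclidean distance, standing in for the grey theory correlation) are all placeholders, and the names `fuzzy_c_means` and `dgc_impute_sketch` are hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (centers, memberships U of shape n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # each row sums to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)            # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def dgc_impute_sketch(D, c=2):
    """Dimension-by-dimension imputation; missing entries are NaN."""
    D = np.asarray(D, dtype=float).copy()
    missing = np.isnan(D)
    n_missing = missing.sum(axis=0)
    usable = [j for j in range(D.shape[1]) if n_missing[j] == 0]
    # ASSUMPTION: priority order = fewest missing values first.
    order = sorted((j for j in range(D.shape[1]) if n_missing[j] > 0),
                   key=lambda j: n_missing[j])
    if not order:
        return D
    # Step 1: mean-fill the first-priority dimension.
    first = order[0]
    D[missing[:, first], first] = np.nanmean(D[:, first])
    usable.append(first)
    # Step 2: cluster on usable dimensions, then impute the next dimension.
    for j in order[1:]:
        _, U = fuzzy_c_means(D[:, usable], c)
        labels = U.argmax(axis=1)              # ASSUMPTION: hard assignment
        for i in np.flatnonzero(missing[:, j]):
            donors = np.flatnonzero((labels == labels[i]) & ~missing[:, j])
            if donors.size == 0:               # fall back to all observed rows
                donors = np.flatnonzero(~missing[:, j])
            # ASSUMPTION: negative distance stands in for grey correlation.
            score = -np.linalg.norm(D[donors][:, usable] - D[i, usable], axis=1)
            k = max(1, donors.size // 2)       # keep the better half
            keep = donors[np.argsort(score)[::-1][:k]]
            D[i, j] = D[keep, j].mean()
        usable.append(j)
    return D
```

Only originally observed values act as donors for a dimension, so imputed entries in that dimension never feed back into other imputations of the same dimension. A column with no observed values at all is not handled by this sketch.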

Abstract in Chinese
Abstract in English
Acknowledgments
Contents
List of Figures
List of Tables
List of Algorithms
1 Introduction
1.1 Related works
1.2 Contributions
2 The Proposed Dimension- and Grey Theory Correlation-based Fuzzy C-Means (DGC-fuzzy C-means) Method for Imputing Incomplete Data
3 Experimental Results
3.1 Working environment
3.2 Statistical accuracy and clustering comparison
3.2.1 The accuracy (ACC) performance comparison
3.2.2 The F-score performance comparison
3.2.3 The normalized mutual information (NMI) performance comparison
4 Conclusion
5 Future work
References
Appendix A: Using grey theory approach to select the first half of the higher correlated available attributes in D[∗, j] to impute the missing attribute D[i, j]
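The title of Appendix A refers to selecting the more highly correlated half of the available attributes with a grey theory approach. The record does not give the formula, so the following is only a sketch of the standard grey relational grade from grey relational analysis, with the usual distinguishing coefficient of 0.5. Whether the thesis uses this exact form, and whether it normalizes attributes first, are assumptions; the function names are hypothetical.

```python
import numpy as np

def grey_relational_grades(reference, candidates, zeta=0.5):
    """Grey relational grade of each candidate row against the reference row.

    reference:  shape (k,)   attributes of the record with the missing value
    candidates: shape (n, k) the same attributes for records that have the value
    zeta:       distinguishing coefficient, commonly set to 0.5
    """
    reference = np.asarray(reference, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    delta = np.abs(candidates - reference)      # absolute differences
    d_min, d_max = delta.min(), delta.max()     # global extremes
    if d_max == 0.0:                            # every candidate equals the reference
        return np.ones(len(candidates))
    coeff = (d_min + zeta * d_max) / (delta + zeta * d_max)
    return coeff.mean(axis=1)                   # average over attributes

def top_half_mean(reference, candidates, values, zeta=0.5):
    """Mean of `values` over the half of the candidates with the highest grade."""
    grades = grey_relational_grades(reference, candidates, zeta)
    k = max(1, len(grades) // 2)
    best = np.argsort(grades)[::-1][:k]
    return float(np.asarray(values, dtype=float)[best].mean())
```

In this reading, `values` would hold the available attributes in D[∗, j] and the returned mean would be the imputed D[i, j]. Attributes on very different scales would normally be rescaled before the grades are computed.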

