Graduate student: 楊程皓
Cheng-Hao Yang
Thesis title: 基於基因演算法具可解讀規則及最適群聚之模糊分群演算法
An GA-based Fuzzy Clustering Algorithm with Interpretable Rules and Best-Fit Clusters
Advisor: 呂政修
Jenq-Shiou Leu
Committee members: 石維寬
Wei-Kuan Shih
陳省隆
Hsing-Lung Chen
方文賢
Wen-Hsien Fang
陳郁堂
Yie-Tarng Chen
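Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: Chinese
Number of pages: 49
Keywords (Chinese): 演化式演算法 (evolutionary algorithm), 模糊邏輯 (fuzzy logic), 規則式分群 (rule-based clustering)
Keywords (English): Evolutionary Computing, Fuzzy Logic, Rule-Based Clustering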

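With the rise of the Internet of Things, data analysis and machine learning have become popular research topics. Large amounts of data allow us to extract important information. Because data are so easy to collect, the resulting datasets are often high-dimensional and unlabeled. Without appropriate preprocessing, such datasets tend to contain redundant or anomalous data, which degrades the learning performance of machine learning algorithms. To address this difficulty, feature selection has become a common preprocessing method for filtering out the important features, and its effect is especially pronounced in unsupervised learning.
The best-known problem in unsupervised learning is clustering. Clustering algorithms include k-means, hierarchical clustering, mean shift clustering, and others. Although these traditional algorithms can produce relatively good clustering results, they still cannot answer "Which features are important?" or "How should an appropriate number of clusters be determined?" Based on fuzzy logic and a genetic algorithm, this thesis proposes a clustering algorithm that not only finds the important clustering features but also determines how many clusters suit the dataset, and that outputs fuzzy clustering rules in if-then form so that humans can easily interpret the basis for the clustering. In the experiments, the algorithm is applied to real-world datasets to verify its performance.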


With the growing popularity of the Internet of Things, big data analysis has become an important topic. Using multi-sensor devices, we can easily gather real-life data and mine important information from it. These datasets are mostly high-dimensional and unlabeled. Reducing high-dimensional data by using feature selection to choose important feature sets has therefore become an important topic in machine learning, especially in unsupervised learning. There are many clustering algorithms, such as k-means, hierarchical clustering, and mean shift clustering. Although these algorithms can produce comparatively good results, two questions remain: “Which features contribute to the clustering result?” and “What is the correct number of clusters?” In this thesis, we propose a clustering algorithm that not only finds the significant features but also determines a proper number of clusters, and that outputs clustering rules humans can easily interpret. Experimental results show that the proposed algorithm performs well on the real-world wine dataset.
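To make the idea of an interpretable if-then fuzzy clustering rule concrete, the following is a minimal sketch and not the thesis's actual method. It assumes features normalized to [0, 1], three triangular linguistic terms (`low`, `medium`, `high`), and the common min operator for combining antecedents. The feature names (`alcohol`, `color`), the `Rule` class, and the helper functions are hypothetical illustrations. This record does not give the membership functions, chromosome encoding, or fitness function used in the thesis.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed linguistic terms on a [0, 1] feature scale: (left, peak, right).
TERMS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership: 0 outside (left, right), rising to 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


@dataclass
class Rule:
    """IF feature_1 is term_1 AND ... THEN the sample belongs to `cluster`."""
    antecedents: Dict[str, str]  # feature name -> linguistic term
    cluster: int


def firing_strength(rule: Rule, sample: Dict[str, float]) -> float:
    """Combine antecedent memberships with min (a common fuzzy AND)."""
    return min(
        triangular(sample[feature], *TERMS[term])
        for feature, term in rule.antecedents.items()
    )


def assign(rules: List[Rule], sample: Dict[str, float]) -> int:
    """Assign the sample to the cluster of the most strongly firing rule."""
    return max(rules, key=lambda r: firing_strength(r, sample)).cluster


# Hypothetical rules; features absent from a rule are treated as unselected.
rules = [
    Rule({"alcohol": "high", "color": "high"}, cluster=0),
    Rule({"alcohol": "low"}, cluster=1),
]
sample = {"alcohol": 0.9, "color": 0.8, "ash": 0.3}
label = assign(rules, sample)
```

In this sketch a rule only mentions the features it uses, so reading the rule set shows both which features matter and how many clusters exist. A genetic algorithm could, under these assumptions, search over which features and terms appear in each rule.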

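Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures and Tables
Chapter 1 Introduction
Chapter 2 Related Techniques and Background
  2.1 Applications of Clustering Algorithms
  2.2 The Importance of Feature Selection
  2.3 Evaluation of Clustering Results
Chapter 3 System Architecture Design
  3.1 Membership Functions and Fuzzy Clustering Rules
    3.1.1 Membership function
    3.1.2 Fuzzy clustering rules
  3.2 Chromosome Representation and Fitness Function
    3.2.1 GA-chromosome representation
    3.2.2 Fitness function
  3.3 Genetic Algorithm
    3.3.1 Algorithm flow
    3.3.2 Intelligent Crossover
Chapter 4 Experiments and Evaluation Results
  4.1 Datasets
    4.1.1 Synthesized dataset
    4.1.2 Breast Cancer dataset on UCI
    4.1.3 Wine dataset on UCI
  4.2 Experiments and Evaluation Results
    4.2.1 Results on the synthesized dataset
    4.2.2 Results on the breast cancer dataset
    4.2.3 Results on the wine dataset
    4.2.4 Comparison of intelligent crossover and one-point crossover
Chapter 5 Conclusions and Future Work
References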
