| Graduate student: | 林立 Li Lin |
|---|---|
| Thesis title: | Integration of Particle Swarm K-means Optimization Algorithm and Granular Computing for Imbalanced Data Classification Problem - A Case Study on Prostate Cancer Prognosis (整合粒子群為基之K平均數最佳化演算法與粒化運算以處理資料不平衡之分類問題-以攝護腺癌症預後為例) |
| Advisor: | 郭人介 Ren-jieh Kuo |
| Oral defense committee: | 歐陽超 Chao Ou-Yang; 駱至中 Chih-chung Lo |
| Degree: | Master |
| Department: | College of Management - Department of Industrial Management |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 |
| Language: | English |
| Number of pages: | 130 |
| Keywords (translated from Chinese): | Prognosis, Prostate cancer, Granular computing, Particle swarm algorithm, Class imbalance |
| Keywords (English): | Prognosis, Prostate cancer, Granular computing, Particle swarm K-means optimization, Class imbalance |
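The number of men in Taiwan who die of prostate cancer is increasing year by year, and prostate cancer is also climbing every year in the national rankings of both cancer deaths and cancer incidence among men. Physicians currently still use the five-year survival rate as the dividing line for the prognosis of prostate cancer patients, so using the pathological variables from prostate cancer screening to estimate a more precise survival time has become an important issue. However, pathological data often have an uneven class distribution, which leads to errors in pathological judgment.
In view of this, this study proposes a granular computing model for classification and applies it to the preprocessing of prostate cancer data. The model uses information granulation (IG) to convert majority-class data elements with similar properties into information granules. This lowers the proportion of majority-class data, balances the ratio between the majority and minority classes, and reduces the dilution of critical rare data by majority-class data, effectively countering the negative effects of imbalanced data. The aim is to help users (physicians) understand the condition of prostate cancer patients from limited pathological data and to improve the accuracy of judgments about patient life expectancy and survival rate.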
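The information granulation step described above can be illustrated with a minimal sketch. It assumes that granules are simply cluster centroids of the majority class, and it uses plain Lloyd's K-means as a stand-in for the thesis's particle swarm K-means optimization, which is not reproduced here. The function names `kmeans` and `granulate_majority` and the synthetic two-class data are hypothetical, chosen only for illustration.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means (a stand-in, not PSKO); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def granulate_majority(X, y, majority_label, n_granules, seed=0):
    """Replace majority-class records with n_granules centroids; keep the rest."""
    majority = [x for x, label in zip(X, y) if label == majority_label]
    others = [(x, label) for x, label in zip(X, y) if label != majority_label]
    granules = kmeans(majority, n_granules, seed=seed)
    X_new = [list(g) for g in granules] + [list(x) for x, _ in others]
    y_new = [majority_label] * len(granules) + [label for _, label in others]
    return X_new, y_new


# Hypothetical imbalanced data: 90 majority records (label 0), 10 minority (label 1).
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

X_bal, y_bal = granulate_majority(X, y, majority_label=0, n_granules=10)
print(y_bal.count(0), y_bal.count(1))
```

Setting `n_granules` equal to the minority-class count gives a balanced training set. Whether full balance or some other ratio is preferable is a design choice that this sketch does not settle.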
In Taiwan, prostate cancer ranks fifth in incidence and seventh in mortality among cancers in men, and the number of men with the disease has been increasing year by year. Currently, the prognosis of prostate cancer is judged by the five-year survival rate, so estimating the life expectancy of prostate cancer patients has become a critical issue. However, pathological data usually have a skewed class distribution, which easily leads to errors in pathological judgment.
To reduce these errors, this study focuses on the class imbalance problem. It proposes a granular computing (GrC) model based on particle swarm K-means optimization (PSKO) to preprocess data with a skewed class distribution. The GrC model acquires knowledge from information granules rather than from raw numerical data, and it processes multidimensional, sparse data using Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI). Data that are both high-dimensional and sparse can be preprocessed with LSI to reduce the number of dimensions and records.
In addition, ten data sets from the UCI Machine Learning Repository were used to demonstrate the effectiveness of the proposed method. The experimental results indicate that the proposed information granule model shows promising performance in classifying both imbalanced and balanced data. The PSKO-based GrC model not only obtains better information granules but also improves the classification of imbalanced data. It can also support physicians in judging the pathological condition and survival rate of prostate cancer patients.
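The dimension reduction mentioned above can be sketched with a truncated SVD, which is the core operation behind LSI. This is a minimal illustration assuming NumPy is available. The helper `lsi_reduce`, the random sparse matrix, and the choice of rank 2 are all hypothetical, and the thesis's actual preprocessing pipeline may differ.

```python
import numpy as np


def lsi_reduce(X, rank):
    """Project the rows of X onto the top `rank` right singular vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # U[:, :rank] * s[:rank] equals X @ Vt[:rank].T, the reduced representation.
    return U[:, :rank] * s[:rank]


# Hypothetical sparse data: 6 records, 5 features, most entries zeroed out.
rng = np.random.default_rng(0)
X = rng.random((6, 5))
X[X < 0.6] = 0.0

Z = lsi_reduce(X, rank=2)
print(Z.shape)
```

Each record is now described by 2 coordinates instead of 5. The rank controls how much of the original variation is retained, so in practice it would be chosen by inspecting the singular values.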