
Student: Li Lin (林立)
Thesis title: Integration of Particle Swarm K-means Optimization Algorithm and Granular Computing for Imbalanced Data Classification Problem - A Case Study on Prostate Cancer Prognosis
Chinese title: 整合粒子群為基之K平均數最佳化演算法與粒化運算以處理資料不平衡之分類問題-以攝護腺癌症預後為例
Advisor: Ren-jieh Kuo (郭人介)
Oral examination committee: Chao Ou-Yang (歐陽超), Chih-chung Lo (駱至中)
Degree: Master's
Department: Department of Industrial Management, School of Management
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar, i.e. 2012-2013)
Language: English
Number of pages: 130
Keywords (Chinese, translated): prognosis, prostate cancer, granular computing, particle swarm algorithm, class imbalance
Keywords (English): Prognosis, Prostate cancer, Granular computing, Particle swarm K-means optimization, Class imbalance
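    The number of men in Taiwan who die of prostate cancer is increasing year by year, and prostate cancer's rank among both the causes of cancer death and the most common cancers in Taiwanese men has climbed every year. Physicians still use the five-year survival rate as the dividing line in the prognosis of prostate cancer patients, so using the pathological variables collected during prostate cancer screening to estimate survival time more precisely has become an important issue. However, pathological data often have an uneven class distribution, which leads to errors in pathological judgment.
    To address this, this study proposes a granular computing model for classification and applies it to the preprocessing of prostate cancer data. Information granulation (IG) converts majority-class elements with similar properties into information granules. This lowers the proportion of majority-class data, balances the ratio between the majority and minority classes, and reduces the chance that critical rare records are diluted by majority-class data, thereby mitigating the negative effects of imbalanced data. The aim is to help users (physicians) understand the condition of prostate cancer patients from limited pathological data and to judge patients' life expectancy and survival rate more accurately.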
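The granulation idea described above, compressing similar majority-class records into a few granules so that the two classes become balanced, can be sketched in Python. The thesis title names a particle swarm K-means optimization (PSKO) algorithm for this step, but this record does not describe it, so the version below is an assumption: each particle encodes k candidate centroids, fitness is the within-cluster sum of squared errors, the swarm's best solution is polished with ordinary K-means iterations, and each granule is represented only by its centroid. The function names and the PSO coefficients (w, c1, c2) are hypothetical.

```python
import numpy as np


def sse(X, centroids):
    """Within-cluster sum of squared errors: each row of X goes to its nearest centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())


def kmeans_refine(X, centroids, iters=20):
    """Plain Lloyd (K-means) iterations starting from the given centroids."""
    c = centroids.astype(float).copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(len(c)):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                c[j] = members.mean(axis=0)
    return c


def pso_kmeans(X, k, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Hypothetical PSO + K-means hybrid; each particle is a (k, d) array of centroids."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(X)
    # Start every particle from k distinct data points.
    pos = np.stack([X[rng.choice(n, k, replace=False)] for _ in range(n_particles)])
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([sse(X, p) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Standard velocity update with inertia, cognitive and social terms.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([sse(X, p) for p in pos])
        better = f < pbest_f
        pbest[better] = pos[better]
        pbest_f[better] = f[better]
        gbest = pbest[pbest_f.argmin()].copy()
    # K-means step: polish the swarm's best centroids.
    return kmeans_refine(X, gbest)


def granulate_majority(X, y, majority_label, k, seed=0):
    """Replace the majority-class rows by k centroid granules; minority rows are kept."""
    is_maj = y == majority_label
    granules = pso_kmeans(X[is_maj], k, seed=seed)
    X_new = np.vstack([granules, X[~is_maj]])
    y_new = np.concatenate([np.full(k, majority_label), y[~is_maj]])
    return X_new, y_new


# Usage: a synthetic 90:10 data set (not the thesis's data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(4, 1, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
Xb, yb = granulate_majority(X, y, majority_label=0, k=10)
print(np.bincount(yb))  # [10 10]
```

Choosing k equal to the number of minority records, as in the usage lines, turns a 90:10 split into 10:10; how the thesis actually selects the granularity is not described in this record.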


    In Taiwan, prostate cancer ranks fifth in incidence and seventh in mortality among cancers in men, and the number of men diagnosed with it has been rising every year. Currently, the prognosis of prostate cancer is judged against the five-year survival rate, so estimating the life expectancy of prostate cancer patients has become a critical issue. However, pathological data usually have a skewed class distribution, which easily leads to errors in pathological judgment.
    To reduce these errors, this study focuses on the class imbalance problem and proposes a granular computing (GrC) model based on particle swarm K-means optimization (PSKO) to preprocess the skewed class distribution. The GrC model acquires knowledge from information granules rather than from raw numerical data, and it handles high-dimensional, sparse data with singular value decomposition (SVD) and latent semantic indexing (LSI), which reduce both the data dimension and the number of records.
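The SVD/LSI step can be illustrated with a rank-r truncated SVD. This is a generic sketch rather than the thesis's procedure: it assumes the (granulated) data form a records-by-attributes matrix A, keeps the r largest singular values, and uses the hypothetical helper names lsi and fold_in; the choice of r is an assumption left to the caller.

```python
import numpy as np


def lsi(A, r):
    """Rank-r latent semantic indexing of a records-by-attributes matrix A.

    Returns (Z, Vt_r, A_r): the records in r latent dimensions, the latent
    basis, and the best rank-r approximation of A.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt
    Z = U[:, :r] * s[:r]   # latent coordinates of each record, U_r * Sigma_r
    Vt_r = Vt[:r]          # top-r right singular vectors (latent basis)
    A_r = Z @ Vt_r         # rank-r reconstruction (Eckart-Young optimum)
    return Z, Vt_r, A_r


def fold_in(x, Vt_r):
    """Project new records into the same latent space: z = x @ V_r."""
    return np.asarray(x) @ Vt_r.T


# Usage on a small random matrix (illustrative data only).
rng = np.random.default_rng(0)
A = rng.random((6, 4))          # 6 records, 4 attributes
Z, Vt_r, A_r = lsi(A, r=2)
print(Z.shape)                  # (6, 2)
```

Because A V_r = U_r Σ_r, applying fold_in to the training rows reproduces Z, so new patient records can be mapped into the same latent space before classification.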
    Ten data sets from the UCI Machine Learning Repository are used to demonstrate the effectiveness of the proposed method. The experimental results indicate that the proposed information granule model performs promisingly in classifying both imbalanced and balanced data. The PSKO-based GrC model not only obtains better information granules but also improves the classification of imbalanced data. It can also support physicians in judging the pathological condition and survival rate of prostate cancer patients.

    Abstract
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Objectives
        1.3 Research Scopes and Constraints
        1.4 Framework and Organization
    Chapter 2 Literature Survey
        2.1 Prostate Cancer
            2.1.1 Cancer Registry Annual Report
            2.1.2 Process of Diagnosis
            2.1.3 Critical Factors of Prostate Cancer
            2.1.4 Data Mining in Classification of Prostate Cancer
        2.2 Classification
            2.2.1 Decision Tree
            2.2.2 Artificial Neural Networks
            2.2.3 Support Vector Machines
        2.3 Class Imbalance Problems
        2.4 Cluster Analysis
            2.4.1 Definition of Cluster Analysis
            2.4.2 Clustering Technology
        2.5 Granular Computing
        2.6 Meta-heuristic
            2.6.1 Genetic Algorithms
            2.6.2 Particle Swarm Optimization
            2.6.3 Ant Colony Optimization
            2.6.4 Artificial Immune System
            2.6.5 Artificial Bee Colony
    Chapter 3 Research Methodology
        3.1 Research Framework
        3.2 Construction of Information Granules
            3.2.1 Apply PSKO Algorithm in IG Process
        3.3 Selection of Granularity
        3.4 Representation of Information Granules
        3.5 Latent Semantic Indexing
        3.6 Classification
            3.6.1 Back-propagation Neural Network
            3.6.2 Decision Tree
            3.6.3 Support Vector Machine
    Chapter 4 Experimental Results and Analysis
        4.1 Balanced Benchmark Data Sets
            4.1.1 Computational Results of Numerical Model Using BPN
            4.1.2 Parameter Determination: Taguchi Method
            4.1.3 Computational Results of BPN
            4.1.4 Computational Results of C4.5
            4.1.5 Computational Results of SVM
        4.2 Imbalanced Benchmark Data Sets
            4.2.1 Computational Results of Numerical Model Using BPN
            4.2.2 Parameter Determination: Taguchi Method
            4.2.3 Computational Results of BPN
            4.2.4 Computational Results of C4.5
            4.2.5 Computational Results of SVM
        4.3 Statistical Hypothesis
    Chapter 5 Model Evaluation Results and Discussion
        5.1 Data Collection
        5.2 Factor Selection: Stepwise Regression
        5.3 Parameter Determination: Taguchi Method
        5.4 Prognosis Trial
        5.5 Statistical Hypothesis
    Chapter 6 Conclusion and Future Research
        6.1 Conclusion
        6.2 Contributions
        6.3 Future Research
    References
    Appendix
        Appendix I: The Result of Statistical Hypothesis of BPN
        Appendix II: The Result of Statistical Hypothesis of C4.5
        Appendix III: The Result of Statistical Hypothesis of SVM


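    Full-text release date: 2018/07/26 (campus network)
    Full-text release date: not authorized for public release (off-campus network)
    Full-text release date: not authorized for public release (National Central Library: Taiwan Theses and Dissertations System)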