
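Graduate student: 艾坦琉
Talitha - Octaviani
Thesis title: 運用腦部健診資料探討民眾回診之決策法則
Investigate the Decision Rules of Examinees' Re-coming From a Cerebrovascular Health Checkup Dataset
Advisor: 歐陽超
Chao Ou-Yang
Oral examination committee: 郭人介
Ren-Jieh Kuo
汪漢澄
Han-Cheng Wang
Degree: Master
Department: College of Management - Department of Industrial Management
Year of publication: 2015
Academic year of graduation: 103
Language: English
Number of pages: 123
Chinese keywords: 回檢健檢者, 腦血管健康檢查, 決策樹, 粒子群演算法, K-means分群法 (returning examinees, cerebrovascular health examination, decision tree, particle swarm algorithm, K-means clustering)
English keywords: Examinees’ re-coming, Cerebrovascular Health Examination, Decision Tree, Particle Swarm Optimization, K-means clustering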
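    Because most people worry about having a disease without knowing it, general health checkups should become a routine habit. As health checkups become more common, the volume of checkup data held by hospitals keeps growing; if data mining techniques are applied correctly to analyze this large body of data, they can effectively help hospitals improve the quality of care. In general health checkups, there is the question of whether an examinee will return for another visit. This study applies health checkup data, first using particle swarm optimization for feature selection, then using K-means clustering to group the examinees, and finally using decision tree rules, which allow the hospital to carry out follow-up and marketing aimed at examinees who are likely to return.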


    A problem many people face is being sick without knowing it. Regular health checkups should be a standard part of every person's health routine. In this context, preventive medical checkups aimed at preventing cerebrovascular disease can be understood as one of the conditions that lead examinees to return ("re-coming"), besides treatment for the disease. The diagnosis established for an examinee often determines the recommendation made to that examinee. Health examination data provide the information needed to predict accurately whether an examinee will return, together with a diagnosis of the examinee's condition. An examinee whose condition matches the criteria of a rule is predicted to return. If the rule also carries an ABD score, the examinee is flagged as vulnerable to cerebrovascular disease and has a higher chance of coming back to the hospital, either for another checkup to prevent the disease or for treatment of it. Knowledge discovery has the potential to extract useful information from the dataset. The decision tree method is a data mining (DM) approach that generates a tree which can be converted into comprehensive rules. A problem in generating the rules is that the more attributes are used, the more complex the rules become. In this research, a metaheuristic approach, Particle Swarm Optimization (PSO), is applied to maximize the efficiency of the generated rules: it performs feature selection in order to minimize the error of the tree.
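    The feature-selection idea in the abstract can be illustrated with a minimal binary PSO sketch. This is not the thesis's Matlab code (Appendix A), which is not reproduced on this page; Python, the parameter values (`w`, `c1`, `c2`, `v_max`), and the `toy_error` function are all assumptions made for illustration. In the setting the abstract describes, the fitness would presumably be the error of a decision tree trained on the selected features; here a hypothetical stand-in penalizes missing "informative" features and extra ones.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Minimize fitness(mask) over 0/1 feature masks with a binary PSO."""
    rng = random.Random(seed)
    # Random initial masks (positions) and velocities for every particle.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                vel[i][d] = v
                # The sigmoid of the velocity is the probability the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical stand-in for "error of the tree built on the selected features".
INFORMATIVE = {0, 3, 5}


def toy_error(mask):
    missed = sum(1 for j in INFORMATIVE if not mask[j])
    extra = sum(mask) - sum(mask[j] for j in INFORMATIVE)
    return 0.2 * missed + 0.02 * extra


best_mask, best_err = binary_pso(toy_error, n_features=8)
print(best_mask, best_err)
```

    To use the sketch on real data, `toy_error` would be replaced by a function that trains a classifier on the columns where the mask is 1 and returns its validation error; the search loop itself would not change.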

    Master’s Thesis Recommendation Form
    Qualification Form by Master’s Degree Examinations Committee
    摘要 (Chinese Abstract)
    ABSTRACT
    ACKNOWLEDGEMENTS
    CONTENTS
    LISTS OF TABLES
    LISTS OF FIGURES
    CHAPTER 1 INTRODUCTION
    1.1 Research Background
    1.2 Research Objectives
    1.3 Scope and Constraints
    1.3.1 Scopes
    1.3.2 Constraints
    1.4 Research framework
    CHAPTER 2 LITERATURE REVIEW
    2.1 Cerebrovascular Disease
    2.2 Cerebrovascular Parameter
    2.3 State of The Art
    2.4 Data Mining and Knowledge Discovery
    2.4.1 Knowledge Representation
    2.4.1.1 Prediction Rules
    2.4.2 Clustering (K-means Clustering)
    2.4.2.1 Determining the Number of Cluster (Elbow Method)
    2.4.3 Data Mining
    2.4.3.1 Classification
    2.4.3.2 Classifier Performance Evaluation (k-fold cross-validation)
    2.4.3.3 Classification Model Validation (Error Rate)
    2.4.3.4 Decision Tree Induction (CART)
    2.4.3.5 Rule-Extraction from a Decision Tree
    2.5 Metaheuristic
    2.5.1 Particle swarm optimization algorithm (PSO)
    2.5.2 Binary Particle Swarm Optimization (BPSO)
    2.5.3 Particle Swarm Optimization for Feature Selection
    CHAPTER 3 METHODOLOGY
    3.1 Data collection
    3.2 Data Pre-processing
    3.2.1 Feature Reduction
    3.2.2 Remove the Outliers
    3.2.3 Data Categorization
    3.2.4 Clustering Data (K-means clustering)
    3.3 Data Processing
    3.3.1 Particle Swarm Optimization
    3.3.2 Decision Tree Algorithm (CART)
    CHAPTER 4 IMPLEMENTATION
    4.1 Data Analysis
    4.2 Data Preprocessing
    4.3 Data Processing
    4.3.1 Discovering the Decision Model for Examinees’ Re-coming
    4.4 Data Post-processing
    4.4.1 Discovering the Decision Rules for Examinees’ Re-coming
    4.4.2 Defining Cluster and Rules Characteristic and ABD Score for each rule
    4.5 Discussion and Analysis
    4.5.1 PSODT Analysis
    4.5.2 Decision Rules Result
    4.5.3 Predict The Examinee’s Re-coming
    CHAPTER 5 CONCLUSION AND FUTURE RESEARCH
    5.1 Conclusion
    5.2 Contributions
    5.3 Future Research
    REFERENCES
    APPENDIX A. Matlab Source Code
    APPENDIX B. Best Particles Result
    APPENDIX C. Lowest Error Decision Tree

