
Author: 吳真莉 (Chandrawati Putri Wulandari)
Thesis title: Applying a Multivariate Discretization Method for Mining Association Rules from a Cerebrovascular Health Examination Dataset (運用多變數離散化方法於關聯規則之探勘-以腦部健檢資料庫為例)
Advisor: 歐陽超 (Chao Ou-Yang)
Oral defense committee: 郭人介 (Ren-Jieh Kuo), 汪漢澄 (Han-Cheng Wang)
Degree: Master
Department: College of Management, Department of Industrial Management
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar, 2013-2014)
Language: English
Number of pages: 127
Keywords (Chinese): 關聯規則, 離散化, Apriori算法, 數據挖掘, 遺傳算法, 中風, 腦血管疾病
Keywords (English): Association Rules, Discretization, Apriori, Data Mining, Genetic Algorithm, Stroke, Cerebrovascular Disease
    Cerebrovascular disease and stroke are a major threat to health in Taiwan and worldwide. Detecting stroke through brain imaging such as Magnetic Resonance Imaging (MRI) is costly. Data mining has been widely used to analyze many types of data, including medical data. In this study, data mining is used to find association rules for stroke. The purpose is to generate high-confidence association rules from patients' cerebrovascular examination results to serve as a reference.
    This research uses a cerebrovascular health examination dataset from a local medical center in Taiwan. First, MVD-CG, a multivariate discretization method based on density clustering and a genetic algorithm, is applied to discretize the continuous data. Association patterns are then generated with the Apriori method. The results demonstrate that data mining can uncover interesting, useful, and important knowledge in medical data. Based on the rules found, physicians can pay closer attention to the highlighted features. The research focuses on association rule mining based on a discretization method, in order to identify the important features underlying cerebrovascular disease and to consider the common patterns of each cluster.
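The two-stage pipeline described in the abstract (discretize continuous attributes, then mine association rules with Apriori) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the attribute name `sbp`, the toy values, the thresholds, and the helper names are hypothetical, and simple equal-width binning stands in for MVD-CG, which is not reproduced.

```python
from itertools import combinations


def equal_width_bins(values, n_bins):
    """Map each value to a bin index in 0..n_bins-1.
    Equal-width binning is only a stand-in for MVD-CG (assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    n = len(transactions)
    current = [frozenset([i]) for i in {i for t in transactions for i in t}]
    frequent, k = {}, 1
    while current:
        # Count support of each candidate and keep the frequent ones.
        level = {}
        for c in current:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules above the threshold."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, support, confidence))
    return found


# Hypothetical toy data, not taken from the thesis dataset.
sbp = [110, 118, 125, 160, 170, 165]
stroke = ["no", "no", "no", "yes", "yes", "yes"]

bins = equal_width_bins(sbp, n_bins=2)
transactions = [frozenset({f"sbp=bin{b}", f"stroke={s}"})
                for b, s in zip(bins, stroke)]
frequent = apriori(transactions, min_support=0.3)
found_rules = association_rules(frequent, min_confidence=0.9)

for lhs, rhs, support, confidence in found_rules:
    print(sorted(lhs), "->", sorted(rhs), support, confidence)
```

Encoding each discretized attribute as an `attribute=bin` item is what lets Apriori, which works on categorical items, handle originally continuous measurements. The quality of the resulting rules therefore depends on where the bin boundaries fall, which is the step the thesis addresses with MVD-CG.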

    ABSTRACT
    ACKNOWLEDGMENT
    TABLE OF CONTENT
    LIST OF FIGURE
    LIST OF TABLE
    CHAPTER I INTRODUCTION
        1.1. Background
        1.2. Purpose
        1.3. Research Structure
    CHAPTER II LITERATURE REVIEW
        2.1. Cerebrovascular Disease
        2.2. Risk Factors
        2.3. State of The Art
        2.4. Data Mining and Knowledge Discovery
        2.5. Data Preprocessing
            2.5.1. Discretization of Continuous Attributes
            2.5.2. Multivariate Discretization Based on Density Clustering and Genetic Algorithm (MVD-CG)
        2.6. Association Rules
            2.6.1. Apriori Algorithm
    CHAPTER III RESEARCH METHODOLOGY
        3.1. Data Collection (Input Data)
        3.2. Data Preprocessing
            3.2.1. Data Normalization
            3.2.2. Data Discretization with MVD-CG Algorithm
    CHAPTER IV IMPLEMENTATION
        4.1. Data Collection
        4.2. Data Preprocessing
            4.2.1. Data Normalization
            4.2.2. Data Discretization
        4.3. Data Mining
            4.3.1. Discovering Association Rules for Stroke Dataset
            4.3.2. Discovering Association Rules for Neck Vascular Dataset
        4.4. Data Postprocessing
            4.4.1. Result and Analysis of Association Rules Mining Results for Stroke Dataset
            4.4.2. Result and Analysis of Association Rules Mining Results for Neck Vascular Dataset
            4.4.3. Comparison of Association Rules Results Between Stroke Dataset and Neck Vascular Dataset
    CHAPTER V CONCLUSION AND FUTURE RESEARCH
        5.1. Conclusion
        5.2. Future Research
    REFERENCES
    APPENDIX A. Discretization Results
    APPENDIX B. Association Rules Result For Stroke Dataset
    APPENDIX C. Association Rules Results For Neck Vascular Disease

