
Student: Adinda Oktavia Kusumanegara (嗎嘉應)
Thesis title: Applying Hierarchical Clustering and Weighted Apriori to Investigate the Examinees' Re-Coming Association Rules
Advisor: Chao Ou-Yang (歐陽超)
Oral defense committee: Ren-Jieh Kuo (郭人介), Han-Cheng Wang (汪漢澄)
Degree: Master
Department: Department of Industrial Management, School of Management
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar; 2014–2015)
Language: English
Number of pages: 93
Keywords: Examinees’ Re-coming, Cerebrovascular Health Examination, Ward’s Agglomerative, Weighted-Apriori
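    In some countries, health examinations are one of the common methods of health care. Over the past decade, demand for healthcare services in Taiwan has gradually increased, driven by population ageing and a growing number of patients with chronic diseases. However, examinees' unpredictable arrivals make it difficult for hospitals to provide satisfactory service, so hospitals need to formulate strategies to deal with this problem. Applying data mining techniques in the medical field can provide hospitals with important information. This study uses clustering and association rules to predict the re-examination patterns of cerebrovascular health examinees. The Weighted Apriori algorithm computes a weight for each attribute in order to find rules that can help hospitals improve their service quality. Clustering the data before mining association rules is a necessary step: once similar records are grouped together, different rules can be found in different clusters. This study uses Ward's hierarchical clustering method, which is easy to understand and does not require the number of clusters to be specified in advance.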
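Ward's criterion, and why it does not need the number of clusters up front, can be illustrated with a small sketch. At each step the method merges the two clusters A and B whose union causes the smallest increase in within-cluster sum of squared errors, Δ(A, B) = n_A·n_B/(n_A + n_B)·‖c_A − c_B‖², where n is the cluster size and c the centroid. The full merge tree is built first, and the number of clusters is chosen afterwards by cutting it. This pure-Python O(n³) version is a hypothetical illustration, not the thesis's implementation: the feature values are invented, and the largest-gap rule used to pick k is an assumed heuristic.

```python
def ward(points):
    """Agglomerative clustering with Ward's criterion.

    Returns the merge history as (cluster_a, cluster_b, delta_sse) tuples.
    Singletons have ids 0..n-1; the i-th merge creates cluster id n + i.
    """
    clusters = {i: [i] for i in range(len(points))}
    centroids = {i: list(p) for i, p in enumerate(points)}
    history = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                na, nb = len(clusters[a]), len(clusters[b])
                dist2 = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
                # Increase in within-cluster SSE if a and b were merged
                delta = na * nb / (na + nb) * dist2
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        delta, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        # Centroid of the merged cluster is the size-weighted mean of both centroids
        centroids[next_id] = [(na * x + nb * y) / (na + nb)
                              for x, y in zip(centroids[a], centroids[b])]
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        del centroids[a], centroids[b]
        history.append((a, b, delta))
        next_id += 1
    return history


def cut(n_points, history, k):
    """Replay the first n - k merges to obtain k flat clusters."""
    clusters = {i: [i] for i in range(n_points)}
    for step, (a, b, _) in enumerate(history[:n_points - k]):
        clusters[n_points + step] = clusters.pop(a) + clusters.pop(b)
    return sorted(sorted(c) for c in clusters.values())


def k_by_largest_gap(history, n_points):
    """Assumed heuristic: cut just before the merge with the largest jump in cost."""
    heights = [delta for _, _, delta in history]
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    # After merges 0..i have been applied, n_points - (i + 1) clusters remain
    return n_points - (i + 1)


# Hypothetical normalized features (e.g. two scaled examination measurements)
points = [(0.0, 0.0), (0.0, 0.2), (0.3, 0.0), (2.0, 2.0), (2.0, 2.4), (2.5, 2.0)]
history = ward(points)
k = k_by_largest_gap(history, len(points))
print(k, cut(len(points), history, k))  # prints: 2 [[0, 1, 2], [3, 4, 5]]
```

For real data, SciPy's `scipy.cluster.hierarchy.linkage(X, method="ward")` followed by `fcluster` builds and cuts a Ward merge tree far more efficiently; the hand-written loop above only makes the merge criterion explicit.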


    General health examinations are a common element of health care in some countries. In Taiwan, demand for healthcare services has increased over the past decade, driven by several factors, including an ageing population and the increasing prevalence of chronic disease. The fluctuating number of examinees, whose arrival behavior is unpredictable, makes it difficult for hospitals to provide satisfying service. Hospitals therefore need strategic planning, such as healthcare management, that addresses this problem by predicting examinees' arrivals. Data mining applications in healthcare rest on the recognition that data mining can generate information that is very useful to all parties involved in the healthcare industry, for example in improving the treatment quality of hospitals. This research used clustering and association rule mining to identify patterns in a cerebrovascular medical examination database in order to predict examinees’ re-coming. The Weighted-Apriori algorithm finds relationships among itemsets using support, confidence, and the weight of each feature as the priority rank of the association rules; the characteristics of the resulting rules can help the hospital improve its service quality. Clustering is an essential step before association rule mining because it partitions the data into groups that are essentially distinct from each other. Each cluster is then expected to contain associations without interference or contamination from other subgroups that have different patterns of relationships. This research used a hierarchical clustering method, Ward’s agglomerative clustering, which is relatively simple to understand and implement and does not require the number of clusters to be specified in advance.
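The rule-ranking idea described above (support, confidence, and a weight per feature used as the priority rank) can be sketched as follows. The sketch runs plain Apriori, whose pruning is valid because support is anti-monotone, and only afterwards ranks the surviving rules by a weighted support, here assumed to be the mean weight of the items in the rule multiplied by the rule's support. That weighting formula, the item names, and the weight values are hypothetical; the thesis's exact Weighted-Apriori definition is not given on this page.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Plain Apriori: return {itemset: support} for every frequent itemset."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    candidates = {frozenset([item]) for t in tx for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate
        level = {}
        for c in candidates:
            support = sum(1 for t in tx if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets; prune candidates with an infrequent (k-1)-subset
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def weighted_rules(transactions, weights, min_support, min_confidence):
    """Mine rules with plain support/confidence, then rank them by weighted support.

    Assumption: weighted support = mean item weight of the rule's itemset x support.
    """
    frequent = apriori(transactions, min_support)
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        w = sum(weights[i] for i in itemset) / len(itemset)
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence, w * support))
    # Priority rank: weighted support first, confidence as tie-breaker
    rules.sort(key=lambda rule: (rule[4], rule[3]), reverse=True)
    return rules


# Hypothetical categorical records: examination results plus whether the examinee came back
transactions = [
    {"BP=high", "BMI=obese", "revisit=yes"},
    {"BP=high", "BMI=obese", "revisit=yes"},
    {"BP=high", "BMI=normal", "revisit=no"},
    {"BP=normal", "BMI=obese", "revisit=yes"},
    {"BP=normal", "BMI=normal", "revisit=no"},
]
# Hypothetical feature weights
weights = {"BP=high": 0.9, "BP=normal": 0.3, "BMI=obese": 0.6,
           "BMI=normal": 0.2, "revisit=yes": 1.0, "revisit=no": 0.5}

for lhs, rhs, sup, conf, wsup in weighted_rules(transactions, weights, 0.4, 0.7):
    print(sorted(lhs), "->", sorted(rhs), f"sup={sup:.2f} conf={conf:.2f} wsup={wsup:.3f}")
```

Ranking after mining, rather than pruning candidates by weighted support, is a deliberate choice: a weighted support of this form is not anti-monotone (adding a high-weight item can raise it), so using it inside Apriori's pruning step could discard itemsets whose supersets would qualify.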

    Master’s Thesis Recommendation Form
    Qualification Form by Master’s Degree Examination Committee
    Chinese Abstract
    ABSTRACT
    ACKNOWLEDGEMENTS
    CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    CHAPTER 1 INTRODUCTION
      1.1 Research Background
      1.2 Research Objectives
      1.3 Scope and Constraints
      1.4 Research Structure
    CHAPTER 2 LITERATURE REVIEW
      2.1 Health Examination
      2.2 Cerebrovascular Disease
      2.3 Risk Factors
      2.4 State of the Art
      2.5 Data Mining
        2.5.1 Ward’s Agglomerative Hierarchical Clustering
        2.5.2 Weighted Apriori
    CHAPTER 3 METHODOLOGY
      3.1 Data Collection
      3.2 Data Preprocessing
        3.2.1 Feature Selection
        3.2.2 Data Cleaning
        3.2.3 Remove Outlier
        3.2.4 Data Normalization
        3.2.5 Data Splitting
        3.2.6 Transforming Data into Categorical Data
      3.3 Data Mining
        3.3.1 Ward’s Agglomerative Hierarchical Clustering Method
        3.3.2 Apriori
        3.3.3 Characterization of Frequent Itemsets
        3.3.4 Weighted-Apriori
      3.4 Data Postprocessing
    CHAPTER 4 RESULTS ANALYSIS AND DISCUSSION
      4.1 Data Collection
      4.2 Data Preprocessing
        4.2.1 Feature Selection
        4.2.2 Data Cleaning
        4.2.3 Remove Outlier
        4.2.4 Data Splitting
        4.2.5 Data Normalization
        4.2.6 Transform Data into Categorical Data
      4.3 Data Mining
        4.3.1 Ward’s Agglomerative Hierarchical Clustering Method
        4.3.2 Apriori
        4.3.3 Weighted-Apriori
      4.4 Data Postprocessing
        4.4.1 Discussion and Analysis
    CHAPTER 5 CONCLUSIONS AND FUTURE RESEARCH
      5.1 Conclusions
      5.2 Contributions
      5.3 Future Research
    REFERENCES
    APPENDIX A. Apriori Algorithm Results for 2 Comings
    APPENDIX B. Apriori Algorithm Results for 3 Comings
    APPENDIX C. Apriori Algorithm Results for 4 Comings

