Author: | 凃美綺 Mei-Ci Tu |
---|---|
Thesis Title: | SVM、RF 與 MLP 應用於腦中風預測之比較研究 Comparing with the Application of SVM, RF, and MLP in Stroke Prediction |
Advisor: | 黃世禎 Sun-Jen Huang |
Committee: | 劉俞志 Yu-Chih Liu, 魏小蘭 Hsiao-Lan Wei |
Degree: | Master |
Department: | College of Management, Department of Information Management |
Thesis Publication Year: | 2019 |
Graduation Academic Year: | 107 |
Language: | Chinese |
Pages: | 58 |
Keywords (in Chinese): | 腦中風 、支援向量機 、隨機森林 、多層感知器 、模型訓練與驗證流程 |
Keywords (in other languages): | Stroke, SVM, Random Forest, MLP, Model Training and Verification process |
Stroke is the second leading cause of death worldwide. Beyond its physical toll on patients, the follow-up care and treatment place a heavy burden on families and society and seriously reduce patients' later quality of life. Stroke can be prevented effectively only by attending to and managing stroke-related risk factors in everyday life. Machine learning and deep learning are now widely used in medicine. In stroke research, however, most studies rely on imaging data from medical examinations and build image-based models. Such examination data are hard to obtain, and making a prediction in practice still requires the patient to visit a hospital for the examination that produces the images. These models are therefore inconvenient to apply and consume additional medical resources. The examinations are also a financial burden on patients, raise the incidence of acute decline in renal function, and add to the national healthcare burden.
This study uses health telephone survey data from the Behavioral Risk Factor Surveillance System (BRFSS) of the U.S. Centers for Disease Control and Prevention, and reviews the literature to compile stroke-related risk factors as the basis for variable selection. It builds stroke prediction models with a support vector machine (SVM), a random forest (RF), and a multilayer perceptron (MLP), and defines a model training and validation process together with a model evaluation process. The models are compared on the evaluation metrics, and the best model is selected for each metric. When accuracy matters most, the MLP performs best; when sensitivity matters most, the SVM performs best; when specificity matters most, the RF performs best. The modeling and evaluation approach proposed here can serve as a reference for future research, and the resulting models can be deployed in an information system.
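The per-metric model selection described in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = stroke, 0 = no stroke) and uses the standard confusion-matrix definitions of accuracy, sensitivity, and specificity. The function names, the model names `model_a` to `model_c`, and the toy labels are hypothetical and chosen only for illustration; they are not the thesis's code, data, or results.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 as the positive (stroke) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity (recall on positives), and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def best_per_metric(y_true: List[int], predictions: Dict[str, List[int]]) -> Dict[str, str]:
    """For each metric, return the name of the model with the highest score."""
    scores = {name: evaluate(y_true, y_pred) for name, y_pred in predictions.items()}
    metrics = ("accuracy", "sensitivity", "specificity")
    return {m: max(scores, key=lambda name: scores[name][m]) for m in metrics}


# Hypothetical toy labels and predictions (NOT data or results from the thesis).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_predictions = {
    "model_a": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],  # flags many cases as positive
    "model_b": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # flags very few cases
    "model_c": [1, 1, 1, 0, 1, 0, 0, 0, 0, 0],  # in between
}

best = best_per_metric(y_true, toy_predictions)
for metric, name in best.items():
    print(f"best by {metric}: {name}")
```

In this toy setup a different model wins on each metric, which mirrors why the study reports a best model per scenario rather than a single overall winner: a model that flags many cases tends to score well on sensitivity, while a conservative one tends to score well on specificity.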