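Author: Chun-yen Chen (陳俊延)
Thesis title: Applying genetic and decision tree algorithms to discover the decision rules based on a stroke health exam dataset (結合基因與決策樹演算法建立健檢資料屬性之決策法則–以中風病患資料為例)
Advisor: Chao Ou-Yang (歐陽超)
Oral defense committee: Han-cheng Wang (汪漢澄), Ren-jieh Kuo (郭人介)
Degree: Master
Department: College of Management - Department of Industrial Management
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: Chinese
Pages: 51
Keywords (Chinese): 腦部健康檢查, 屬性篩選, 基因演算法, 決策樹 (Brain health examination, Feature selection, Genetic algorithm, Decision tree)
Keywords (English): Carotid, Feature selection, Genetic algorithm, Decision tree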
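  • In recent years, cerebrovascular disease has not fallen in the ranking of the ten leading causes of death in Taiwan; it has instead risen year by year. This deserves close attention, because it is happening even as medical technology and living standards improve and more people undergo regular health examinations.
    Part of the reason is that a general health examination provides relatively limited information. Determining clearly whether a person is at risk of cerebrovascular disease requires an imaging examination of the neck. This examination carries additional medical fees, which lowers people's willingness to be tested. Many people therefore miss the chance of early detection, which leads to higher medical expenditure later. Among cerebrovascular diseases, stroke is the most representative and also incurs the highest medical expenditure.
    This study therefore cooperated with a medical center in northern Taiwan and used the brain health examination data and general health examination data it provided. Through a literature review and data mining methods, the study looks for associations between the attributes in general health examination data and stroke, and combines the relevant attributes into rules for stroke. Acceptable accuracy and support are used to ensure that the rules are representative.
    Because the rules are composed of attributes from general health examination data, physicians can combine the rules with their own professional knowledge to give patients suitable advice and point out what they should watch for. The similarities and differences between rules, and their medical significance, can also be examined.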
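The accuracy and support used to judge a rule can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the definitions here are assumptions: support is taken as the share of all records that satisfy the rule's conditions, and accuracy as the share of those covered records that are stroke cases. The attribute names (`age`, `systolic_bp`), the thresholds, and the six records are invented for illustration and do not come from the thesis dataset.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]


def rule_quality(records: List[Record], labels: List[int],
                 condition: Callable[[Record], bool]) -> Tuple[float, float]:
    """Return (accuracy, support) for the rule 'if condition then stroke'.

    Assumed definitions (not taken from the thesis):
      support  = covered records / all records
      accuracy = covered stroke records / covered records
    """
    # Labels (1 = stroke, 0 = no stroke) of the records the rule covers.
    covered = [label for rec, label in zip(records, labels) if condition(rec)]
    if not records or not covered:
        return 0.0, 0.0
    accuracy = sum(covered) / len(covered)
    support = len(covered) / len(records)
    return accuracy, support


# Made-up toy records; attribute names and values are hypothetical.
records = [
    {"age": 72, "systolic_bp": 150},
    {"age": 65, "systolic_bp": 145},
    {"age": 61, "systolic_bp": 142},
    {"age": 45, "systolic_bp": 150},
    {"age": 70, "systolic_bp": 120},
    {"age": 38, "systolic_bp": 118},
]
labels = [1, 1, 0, 0, 1, 0]


def example_rule(rec: Record) -> bool:
    # Hypothetical rule: older patient with high systolic blood pressure.
    return rec["age"] >= 60 and rec["systolic_bp"] >= 140


accuracy, support = rule_quality(records, labels, example_rule)
print(f"accuracy={accuracy:.2f}, support={support:.2f}")
```

A rule with high accuracy but very low support describes only a handful of patients, which is presumably why the study requires both values to be acceptable.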


    Stroke has long been recognized as a major threat to health worldwide.
    Magnetic resonance imaging (MRI) and ultrasound are among the few tools that can be used to detect stroke. However, these exams require extra resources (both time and cost) from doctors and patients.
    Taking a general health exam, on the other hand, is common practice in Taiwan. This research therefore proposes a hybrid data mining approach that uses general health exam data to extract appropriate decision rules. The rules can support medical personnel in recommending brain imaging examinations.
    This research used a brain health exam dataset collected by a medical institute in northern Taiwan from 2004 to 2011. The dataset includes general health checkup results and MRI brain exam results. The approach has three stages. First, stratified random sampling is applied to construct a balanced dataset of health exam records for patients with a history of stroke. Next, a genetic algorithm is used to identify the health exam items most closely related to stroke. Finally, the data for the identified items are used to construct decision rules with the C4.5 algorithm. The quality of the constructed rules is assessed by their precision and support. These decision rules may be used to help doctors make appropriate judgments about patients at risk of stroke.
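The second stage, selecting exam items with a genetic algorithm, could be sketched roughly as follows. This is not the thesis's implementation: the abstract does not state the encoding, operators, or parameter values, so the bit-mask encoding, binary tournament selection, one-point crossover, elitism, and every default parameter below are assumptions. The function `toy_fitness` is a made-up stand-in; in the described approach the quality of a candidate subset would instead come from a C4.5 tree built on the selected items.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]  # 1 = exam item kept, 0 = exam item dropped


def ga_select(n_features: int,
              fitness: Callable[[Sequence[int]], float],
              pop_size: int = 20,
              generations: int = 30,
              crossover_rate: float = 0.8,
              mutation_rate: float = 0.05,
              seed: int = 0) -> Tuple[Mask, float]:
    """Search for a high-fitness feature mask (all defaults are illustrative)."""
    rng = random.Random(seed)  # local generator keeps runs reproducible
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    scored = [(fitness(m), m) for m in pop]
    best = max(scored, key=lambda s: s[0])

    def pick() -> Mask:
        # Binary tournament: the fitter of two random individuals is chosen.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        nxt = [list(best[1])]  # elitism: carry the best mask forward unchanged
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            child = list(p1)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            nxt.append(child)
        scored = [(fitness(m), m) for m in nxt]
        best = max(scored, key=lambda s: s[0])
    return list(best[1]), best[0]


def toy_fitness(mask: Sequence[int]) -> float:
    # Hypothetical stand-in: pretend items 0 and 3 are the informative ones
    # and charge a small penalty for every item kept.
    informative = {0, 3}
    hits = sum(1 for i, g in enumerate(mask) if g and i in informative)
    return hits - 0.1 * sum(mask)


mask, score = ga_select(8, toy_fitness)
print(mask, score)
```

Passing the fitness function as a parameter keeps the search loop separate from the classifier, so a tree-based evaluation could replace `toy_fitness` without changing `ga_select`.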

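    Abstract (Chinese)
    Abstract
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
      1.1 Research background
      1.2 Research objectives
      1.3 Research issues
      1.4 Significance
      1.5 Research structure
    2. Literature review
      2.1 Cerebrovascular disease
        2.1.1 Symptoms and classification of cerebrovascular disease
        2.1.2 Factors influencing cerebrovascular disease
      2.2 Data mining
      2.3 Genetic algorithm
      2.4 Decision tree (C4.5)
    3. Research method
      3.1 Data processing stage
        3.1.1 Data preprocessing
          3.1.1.1 Handling imbalanced data
      3.2 Method construction stage
        3.2.1 Selection of primary factors
        3.2.2 Determination of secondary factors
          3.2.2.1 Attribute selection with the genetic algorithm
          3.2.2.2 Judging attribute quality with the decision tree
          3.2.2.3 Calculating accuracy
          3.2.2.4 Calculating support
          3.2.2.5 Decision tree pruning
      3.3 Result analysis stage
        3.3.1 Expected results
    4. Experimental results and analysis
      4.1 Case data
        4.1.1 Data description
      4.2 Parameter settings
        4.2.1 Genetic algorithm parameter settings
        4.2.2 Decision tree parameter settings
      4.3 Rule extraction and analysis
        4.3.1 Rule extraction model
        4.3.2 Rule analysis
          4.3.2.1 Rules obtained from the stroke data with the decision tree
          4.3.2.1 Rules obtained from the abnormal data with the decision tree
      4.4 Summary
      4.5 Other comparisons
        4.5.1 Genetic algorithm combined with decision tree versus decision tree alone
        4.5.2 Stratified random sampling versus no sampling
    5. Conclusions and suggestions
      5.1 Conclusions
      5.2 Future suggestions and improvements
    References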


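    Full-text release date: 2019/06/20 (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
    Full-text release date: full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)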