
Graduate student: Hsin-lan Feng (馮欣嵐)
Thesis title: Applying a Hybrid Data Mining Approach to Develop a Stroke Prediction Model (運用複合式資料探勘方法建立腦中風風險輔助預測模型)
Advisor: Chao Ou-Yang (歐陽超)
Oral defense committee: Chao-Lung Yang (楊朝龍), Han-Cheng Wang (汪漢澄)
Degree: Master
Department: School of Management - Department of Industrial Management
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: Chinese
Number of pages: 87
Keywords (Chinese): 腦中風, 不平衡資料, 屬性篩選, 倒傳遞類神經網路
Keywords (English): Stroke, Imbalance Data, Feature selection, BPN
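    Cerebrovascular disease is a major threat to public health in Taiwan, with high rates of both mortality and disability. It is among the three leading causes of death and a major cause of disability and impairment among young and middle-aged adults, so early prevention and prompt detection of symptoms are very important. Carotid ultrasound and brain imaging are currently the most effective examinations for cerebrovascular disease, but brain imaging is usually expensive, and people are reluctant to undergo it without a physician's recommendation or obvious symptoms. This study therefore builds on general health examination data, which are cheaper and easier to obtain, and applies data mining techniques to find the association between general health examination data and stroke, so that general health examination reports can include information on cerebrovascular health and give the public more information.

    This study was conducted in cooperation with a medical center in northern Taiwan and uses seven years of brain health examination data as a case study. It applies a hybrid data mining approach, including rough sets and neural networks. It first identifies, among the many general health examination factors, the important factors that affect cerebrovascular lesions, and then builds a stroke risk assistance prediction model. The model helps physicians give appropriate advice, based on a person's general health examination results, on whether further cerebrovascular examination is warranted, so that the benefit of health examinations is maximized. It also lets people become aware of their own health problems and, through prevention or prompt treatment, reduce the harm caused by cerebrovascular disease.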


    Stroke has become a major threat to health worldwide, with high rates of both
    death and disability. Preventing stroke and detecting it early are therefore
    important issues. Brain imaging and ultrasound are the most effective ways to
    detect stroke, but these examinations are relatively expensive, and people will
    not undergo them without a doctor's advice or obvious symptoms. Consequently, we
    base our research on general health examinations, which are cheaper and easier
    to obtain, and use hybrid data mining techniques to find the association between
    general health examination results and stroke. We also add suggestions to the
    general health examination report in order to provide more information to the
    public.
    We use brain examination data from 2004 to 2011 to develop a
    Stroke-Risk-Predicting-Assistance Model based on a back-propagation neural
    network (BPN). First, we apply clustering-based under-sampling, and then select
    the relevant features using rough set theory, information gain, and gain ratio.
    Finally, we use the Taguchi method to set the best parameters for the BPN. The
    Stroke-Risk-Predicting-Assistance Model can help doctors advise people on
    whether to undergo a brain examination, and it maximizes the value of the
    general health examination. People can learn the state of their brain health
    and prevent or treat stroke as early as possible.
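
The abstract names information gain and gain ratio among its feature-selection measures. The following is a minimal sketch of how these two scores are commonly computed for a discrete attribute, assuming the standard entropy-based definitions. This record does not show the thesis's own implementation, so the function names and the toy data below are hypothetical and for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on a discrete attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the labels within each attribute value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by the attribute's split information."""
    split_info = entropy(values)
    # A constant attribute has zero split information; score it as 0.
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy data: a discretised blood-pressure level and a stroke label.
bp = ["high", "high", "normal", "normal", "high", "normal"]
stroke = [1, 1, 0, 0, 1, 0]
ig = information_gain(bp, stroke)
gr = gain_ratio(bp, stroke)
```

Attributes can then be ranked by either score and the top-ranked ones kept as model inputs. Gain ratio divides by the split information so that attributes with many distinct values are not favoured simply for fragmenting the data.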

    Table of Contents
    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
      1.1 Research Background
      1.2 Research Motivation
      1.3 Research Objectives
      1.4 Thesis Organization
    Chapter 2. Literature Review
      2.1 Cerebrovascular Disease
        2.1.1 Classification and Symptoms of Cerebrovascular Disease
      2.2 Data Mining
        2.2.1 Applications of Data Mining in Medicine
      2.3 Feature Selection
        2.3.1 Rough Set Theory
        2.3.2 Information Gain and Gain Ratio
      2.4 Imbalanced Data
      2.5 Back-Propagation Neural Network
    Chapter 3. Research Method
      3.1 Research Framework and Process
      3.2 Data Collection and Preprocessing
        3.2.1 Data Preprocessing
      3.3 Feature Selection
        3.3.1 Rough Set Theory
        3.3.3 Gain Ratio
      3.4 Back-Propagation Neural Network Model
        3.4.1 Setting BPN Parameters with the Taguchi Method
        3.3.2 Building the Back-Propagation Neural Network Model
    Chapter 4. Implementation
      4.1 Data Description
      4.2 Data Preprocessing
        4.2.1 Handling Duplicate Data and Missing Values
        4.2.2 Data Transformation
        4.2.3 Handling Imbalanced Data
      4.3 Feature Selection
        4.3.1 Rough Set Theory
        4.3.2 Information Gain Index
        4.3.3 Information Gain Index
        4.3.4 Summary of Feature Sets
      4.4 Building the Back-Propagation Neural Network
        4.4.1 Setting Parameters with the Taguchi Method
        4.4.2 Building Individual BPN Models
      4.5 Evaluation
        4.5.1 Selecting the Best Model
        4.5.2 Validating Model Feasibility
        4.5.3 Analysis and Discussion
    Chapter 5. Conclusions and Suggestions
      5.1 Conclusions
      5.2 Future Suggestions
    References
    Appendix A-1. Taguchi Experiment Results for the TS_RST Model
    Appendix A-2. Taguchi Experiment Results for the RD_RST Model
    Appendix A-3. Taguchi Experiment Results for the DB_IG Model
    Appendix A-4. Taguchi Experiment Results for the TS_IG Model
    Appendix A-5. Taguchi Experiment Results for the RD_IG Model
    Appendix A-6. Taguchi Experiment Results for the DB_GR Model
    Appendix A-7. Taguchi Experiment Results for the TS_IG Model
    Appendix A-8. Taguchi Experiment Results for the RD_GR Model

