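Graduate student: 陳政佑 (Cheng-Yu Chen)
Thesis title: 發展高敏感度為基之決策樹演算法用於腎臟疾病之預測
Developing of Forecasting Model for Chronic Kidney Disease based on a High Sensitivity Decision Tree Algorithm
Advisor: 歐陽超 (Ou-Yang Chao)
Oral examination committee: 王孔政 (Kung-Jeng Wang), 陳杰峰 (Chieh-Feng Chen)
Degree: Master
Department: College of Management, Department of Industrial Management
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar, i.e., the 2019/20 academic year)
Language: Chinese
Number of pages: 69
Chinese keywords: 腎臟疾病 (kidney disease), 決策樹 (decision tree), 不平衡資料集 (imbalanced dataset), 分類回歸樹 (classification and regression tree), 下採樣 (under-sampling)
English keywords: Chronic kidney disease, Decision Tree, Imbalanced Dataset, CART, Under-Sampling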
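  • In recent years, kidney disease has consistently ranked among Taiwan's ten leading causes of death. Each patient on dialysis incurs very high annual costs, and the government allocates a large share of the National Health Insurance budget to care and prevention. Kidney disease therefore places heavy financial pressure not only on individuals and their families but also on national health-insurance spending and policy decisions. With the rise of big data, major medical institutions and the government are actively building medical databases and are keen to adopt big-data analytics, such as diagnostic support systems and lesion detection, to improve treatment outcomes and efficiency.
    For these reasons, this study uses open data from the National Health and Nutrition Examination Survey (NHANES) of the US National Center for Health Statistics (NCHS), covering 2005 to 2016, and applies a decision tree algorithm to identify risk factors and rules for kidney disease that could later assist physicians in diagnosis. Medical databases, however, frequently suffer from class imbalance, and conventional classification algorithms tend to misclassify the minority class, even though in medicine minority-class records often carry important meaning or are costly to misclassify. Of the 38,447 records collected in this study, positive cases make up only 7.6%. This study therefore modifies the classification and regression tree (CART) algorithm by setting a threshold on the Gini splitting criterion and combines it with under-sampling to develop a high-sensitivity decision tree algorithm.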
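
CART grows a tree by choosing, at each node, the binary split that most reduces Gini impurity, 1 - Σ p_k². The abstract does not specify how the Gini threshold is applied, so the sketch below assumes one possible reading: a split is accepted only if its impurity decrease exceeds a threshold (similar in spirit to scikit-learn's `min_impurity_decrease`). The names `gini`, `best_split`, and `min_gain` are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a node: 1 - sum over classes of p_k squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(
    values: Sequence[float], labels: Sequence[int], min_gain: float = 0.0
) -> Optional[Tuple[float, float]]:
    """Best split of one numeric feature as (cut point, Gini decrease).

    Samples with x <= cut go to the left child. Returns None when no split
    reduces impurity by more than min_gain (a hypothetical threshold).
    """
    n = len(labels)
    parent = gini(labels)
    best: Optional[Tuple[float, float]] = None
    # The largest value is skipped because it would leave the right child empty.
    for cut in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= cut]
        right = [y for x, y in zip(values, labels) if x > cut]
        # Children's impurity, weighted by how many samples each receives.
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - weighted
        if gain > min_gain and (best is None or gain > best[1]):
            best = (cut, gain)
    return best


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [0, 0, 0, 1, 1, 1]
    print(best_split(x, y))                # (3, 0.5)
    print(best_split(x, y, min_gain=0.6))  # None
```

A stricter threshold stops tree growth earlier; how the thesis's own threshold trades this off against sensitivity cannot be confirmed from this record.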


    In recent years, chronic kidney disease has been one of the ten leading causes of death in Taiwan. Each patient who must undergo dialysis incurs large annual medical expenses, and the government must allocate a large share of the national health insurance budget to care and prevention. Chronic kidney disease therefore affects not only individuals and families but also the allocation of the national health insurance budget and related policy decisions. With the advent of big data, large medical institutions and the government are actively building medical databases and are increasingly receptive to big data analytics. Diagnostic support systems and lesion detection are two examples of such applications that can improve treatment effectiveness and efficiency.
    For these reasons, this study uses 2005 to 2016 data from the National Health and Nutrition Examination Survey (NHANES), an open data survey conducted by the US National Center for Health Statistics (NCHS), and applies decision trees to identify risk factors and rules for chronic kidney disease that could support future diagnosis. Medical databases often suffer from class imbalance, and conventional classification algorithms tend to misclassify the minority class, even though in medicine minority-class records often carry important meaning or are costly to misclassify. Of the 38,447 records collected in this study, only 7.6% belong to the positive class. This study therefore adjusts the classification and regression tree (CART) algorithm by setting a threshold on the Gini splitting criterion and combines it with under-sampling to develop a high-sensitivity decision tree algorithm.
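
Why a 7.6% positive rate matters, and what under-sampling does, can be shown with a small sketch. It uses synthetic labels sized to match the abstract's figures (about 2,922 positives out of 38,447 records, since 7.6% is a rounded share), not the NHANES data. A model that predicts "negative" for everyone scores about 92.4% accuracy yet has a sensitivity, TP / (TP + FN), of zero. Random under-sampling discards majority-class records until the classes are balanced. The function names and the 1:1 target ratio are assumptions for illustration; the thesis's actual sampling ratio is not stated here.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(
    X: Sequence[T], y: Sequence[int], seed: int = 0
) -> Tuple[List[T], List[int]]:
    """Keep every positive (label 1) and an equal-sized random subset of negatives.

    The 1:1 ratio is an illustrative assumption, not the thesis's setting.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    # Sort the kept indices so the original row order is preserved.
    kept = sorted(pos + rng.sample(neg, k=min(len(pos), len(neg))))
    return [X[i] for i in kept], [y[i] for i in kept]


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall of the positive class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Synthetic labels: 2,922 positives and 35,525 negatives (38,447 in total).
    y = [1] * 2922 + [0] * 35525
    X = list(range(len(y)))  # stand-in feature rows
    all_negative = [0] * len(y)
    print(f"accuracy={accuracy(y, all_negative):.3f}")        # accuracy=0.924
    print(f"sensitivity={sensitivity(y, all_negative):.3f}")  # sensitivity=0.000
    Xb, yb = random_undersample(X, y)
    print(len(yb), sum(yb))  # 5844 2922
```

Because under-sampling throws away majority-class records, it is usually applied only to the training data, while evaluation uses the original class distribution.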

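    Abstract (Chinese), ABSTRACT, Acknowledgements, Table of Contents, List of Figures, List of Tables
    Chapter 1. Introduction
      1.1 Research background
      1.2 Research objectives
      1.3 Research issues
      1.4 Significance
    Chapter 2. Literature Review
      2.1 Kidney disease
        2.1.1 Glomerular filtration rate
      2.2 Odds ratios
      2.3 Random under-sampling
      2.4 Decision trees and their medical applications
        2.4.1 Classification and regression trees (CART)
      2.5 Confusion matrix
    Chapter 3. Research Procedure and Methods
      3.1 Research flow and framework
      3.2 Data collection and preprocessing
        3.2.1 Data merging and cleaning
        3.2.2 Computing eGFR values and assembling the final dataset
        3.2.3 Converting numerical data to categorical data
        3.2.4 Building the under-sampled dataset
      3.3 Computing odds ratios and feature selection
      3.4 Model construction and tuning
        3.4.1 Data mapping
        3.4.2 CART decision tree model with improved sensitivity
      3.5 Model evaluation
        3.5.1 Confusion matrix
        3.5.2 Sampling methods and validation
    Chapter 4. Implementation Results
      4.1 Data collection and preprocessing
        4.1.1 Feature descriptions
        4.1.2 Data preprocessing and feature selection
      4.2 Model parameters and training
      4.3 Experimental results
      4.4 Conclusions and decision tree rule extraction
    Chapter 5. Conclusions and Recommendations
      5.1 Conclusions
      5.2 Research limitations and future recommendations
    References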

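    Full-text release date: 2025/08/26 (on-campus network)
    Full-text release date: 2025/08/26 (off-campus network)
    Full-text release date: 2025/08/26 (National Central Library: Taiwan theses and dissertations system)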