Graduate student: | 陳政佑 Cheng-Yu Chen |
---|---|
Thesis title: | 發展高敏感度為基之決策樹演算法用於腎臟疾病之預測 Developing of Forecasting Model for Chronic Kidney Disease based on a High Sensitivity Decision Tree Algorithm |
Advisor: | 歐陽超 Ou-Yang Chao |
Oral defense committee: | 王孔政 Kung-Jeng Wang, 陳杰峰 Chieh-Feng Chen |
Degree: | Master |
Department: | College of Management - Department of Industrial Management |
Year of publication: | 2020 |
Academic year of graduation: | 108 |
Language: | Chinese |
Number of pages: | 69 |
Keywords (Chinese): | 腎臟疾病, 決策樹, 不平衡資料集, 分類回歸樹, 下採樣 |
Keywords (English): | Chronic kidney disease, Decision Tree, Imbalanced Dataset, CART, Under-Sampling |
In recent years, kidney disease has consistently ranked among the ten leading causes of death in Taiwan. Each patient on dialysis incurs very large annual medical expenses, and the government allocates a substantial part of the national health insurance budget to care and prevention. Kidney disease therefore places heavy financial pressure on individuals and their families, and it also affects national health insurance spending and policy decisions. With the rise of big data, major medical institutions and the government are actively building medical databases and are keen to adopt big data analytics, for example in diagnosis support systems and lesion detection, to improve treatment outcomes and efficiency.
For these reasons, this study uses open data from the National Health and Nutrition Examination Survey (NHANES), provided by the US National Center for Health Statistics (NCHS), covering 2005 to 2016. It applies a decision tree algorithm to identify risk factors and rules for kidney disease that can later support physicians in diagnosis. Medical databases often suffer from class imbalance, and conventional classification algorithms tend to misclassify the minority class. In medicine, however, minority-class records often carry important meaning or are costly to misclassify. Of the 38,447 records collected in this study, only 7.6% belong to the positive class. This study therefore modifies the classification and regression tree (CART) algorithm by setting a threshold on the Gini split criterion, and combines it with under-sampling to develop a decision tree algorithm based on high sensitivity.
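The three ingredients named in the abstract (a Gini-based split criterion with a threshold, under-sampling of the majority class, and sensitivity as the target metric) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the function names, the rule of accepting a split only when its Gini gain reaches a threshold, and the 1:1 class ratio after under-sampling are all assumptions made for illustration.

```python
import random
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(parent, left, right):
    """Decrease in weighted Gini impurity when parent is split into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


def accept_split(parent, left, right, threshold):
    """Hypothetical rule: keep a split only if its Gini gain reaches the threshold."""
    return gini_gain(parent, left, right) >= threshold


def undersample(rows, labels, positive=1, seed=0):
    """Randomly drop negative rows to match the positive count.

    Assumes the positive class is the minority; the 1:1 target ratio
    is an illustrative choice, not taken from the thesis.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == positive]
    neg = [i for i, y in enumerate(labels) if y != positive]
    keep = pos + rng.sample(neg, min(len(pos), len(neg)))
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def sensitivity(y_true, y_pred, positive=1):
    """True positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0
```

In this sketch, under-sampling is applied to the training data before the tree is grown, and sensitivity is computed on held-out predictions. A threshold on the Gini gain is only one plausible reading of "a threshold on the Gini split criterion"; the thesis itself should be consulted for the actual rule.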