
Graduate student: Wilson Simon (沈迪聖)
Thesis title: A Decision Tree Based Predictive Approach for Chronic Kidney Disease Suspected on Imbalance Data From a Health Examination Database
Advisor: Chou Ou-Yang (歐陽超)
Oral defense committee: Kung-Jeng Wang (王孔政), Jie-Feng Chen (陳杰峰)
Degree: Master
Department: School of Management - Department of Industrial Management
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar, 2019-2020)
Language: English
Number of pages: 49
Keywords: chronic kidney disease, imbalance data, decision tree, odds ratios


Chronic kidney disease (CKD) affects 37 million people (15%) in the United States
and 10% of people worldwide, and is associated with significant health care,
morbidity, and mortality costs. Because the disease can progress silently to an
advanced stage and even lead to death, early detection is important so that
interventions can start on time. This study examines the relationship between CKD
and risk factor variables in terms of prevalence rates and odds ratios. The
association measures quantify the effect of significant risk factor variables on
CKD. After a decision tree model was built to predict suspected CKD, it became
clear that the class distribution of the data is imbalanced. Random under-sampling
at five different degrees of imbalance is applied to compensate for the loss in
model performance caused by the imbalanced data. Model performance is evaluated
by the area under the ROC curve (AUC), accuracy, sensitivity, and specificity. The
results show that an imbalanced distribution in which the minority class
represents 20% significantly improves prediction performance. Accordingly, this
study provides a better understanding of data imbalance in medical research and
of designing prediction methods for imbalanced datasets.
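
The quantities named in the abstract (odds ratio, random under-sampling to a chosen minority share, and sensitivity/specificity from a confusion matrix) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the 2x2 counts, and the synthetic labels are assumptions made for illustration only, and the decision tree itself is left out. The 0.20 argument mirrors the 20% minority share reported in the abstract.

```python
import random


def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds ratio of a 2x2 table: (a * d) / (b * c).

    Assumes no cell in the denominator is zero.
    """
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


def random_undersample(labels, minority_fraction, seed=0):
    """Return sorted row indices after random under-sampling.

    Every minority row (label 1) is kept; majority rows (label 0) are drawn
    at random so that minority rows make up `minority_fraction` of the result.
    """
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # n_min / (n_min + n_maj) = p  implies  n_maj = n_min * (1 - p) / p
    n_keep = round(len(minority) * (1 - minority_fraction) / minority_fraction)
    n_keep = min(n_keep, len(majority))  # cannot keep more rows than exist
    kept = random.Random(seed).sample(majority, n_keep)
    return sorted(minority + kept)


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = positive).

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Synthetic example (hypothetical data): 50 positives among 1000 records.
example_labels = [1] * 50 + [0] * 950
example_indices = random_undersample(example_labels, minority_fraction=0.20)
print(len(example_indices), sum(example_labels[i] for i in example_indices))
```

In a fuller pipeline, the returned indices would select the training rows for the classifier, while the evaluation metrics would normally be computed on held-out data that keeps its original class distribution.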

ABSTRACT
ACKNOWLEDGMENT
LIST OF FIGURES
LIST OF TABLES
CHAPTER I INTRODUCTION
1.1. BACKGROUND
1.2. PURPOSE
1.3. RESEARCH STRUCTURE
CHAPTER II LITERATURE REVIEW
2.1. CHRONIC KIDNEY DISEASE
2.2. PREVALENCE
2.3. ODDS RATIO
2.4. RANDOM UNDER-SAMPLING APPROACH
2.5. DECISION TREE
2.6. CONFUSION MATRIX
CHAPTER III RESEARCH METHODS
3.1. DATA COLLECTION
3.2. DATA PREPROCESSING
3.3.2. Study of Demographic and Biochemical Characteristics of Study Subject
3.3. PROPOSED METHOD
3.3.1. Study of Odds Ratio of Risk Factor Correlated with Chronic Kidney Disease
3.3.2. Handling Imbalance Dataset
3.3.3. CONSTRUCT DECISION TREE
3.4. EVALUATION METRICS
CHAPTER IV RESULTS AND DISCUSSION
4.1. DATA COLLECTION
4.2. DATA PREPROCESSING
4.2.1. Estimated glomerular filtration rate
4.2.2. Demographics of the Study Subjects
4.3. EXPERIMENT RESULT
4.3.1. Odds Ratio
4.3.2. Handling Imbalance Dataset
4.3.3. Result on Different Degrees of Imbalance
4.4. ANALYZING THE PREDICTION MODELS
CHAPTER V CONCLUSIONS AND FUTURE RESEARCH
5.1. CONCLUSION
5.2. FUTURE RESEARCH
REFERENCES
APPENDIXES

