Graduate student: 沈迪聖 (Wilson Simon)
Thesis title: A Decision Tree Based Predictive Approach for Chronic Kidney Disease Suspected on Imbalance Data From a Health Examination Database
Chinese title: 根據健康檢查數據庫中的失衡數據,基於決策樹的慢性腎髒病預測方法
Advisor: 歐陽超 (Chou Ou-Yang)
Oral defense committee: 王孔政 (Kung-Jeng Wang), Jie-Feng Chen
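Degree: Master's
Department: College of Management - Department of Industrial Management
Year of publication: 2020
Academic year of graduation: 108
Language: English
Number of pages: 49
Chinese keywords: 慢性腎病 (chronic kidney disease), 失衡數據 (imbalanced data), 決策樹 (decision tree), 比值比 (odds ratio)
English keywords: chronic kidney disease, imbalance data, decision tree, odds ratios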
Chronic kidney disease affects 37 million people (15%) in the United States
and 10% of the population worldwide, and is associated with significant health
care costs, morbidity, and mortality. Because this disease can progress silently
to an advanced stage and even lead to death, early detection is essential for
timely intervention.
This study examines the relationship between chronic kidney disease (CKD) and
risk factor variables in terms of prevalence rates and odds ratios. These
association measures capture the effect of significant risk factor variables on
CKD. When the data are processed with a decision tree model for predicting
suspected CKD, it becomes clear that the class distribution is imbalanced.
Random under-sampling at five different degrees of imbalance is applied to
offset the loss of model performance caused by the imbalanced data. The
performance of the models is evaluated by the area under the ROC curve (AUC),
accuracy, sensitivity, and specificity. The results show that a class
distribution in which the minority class represents 20% of the data
significantly improves prediction performance. Accordingly, this study provides
a better understanding of data imbalance in medical research and of designing
prediction methods for imbalanced datasets.
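
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the 0/1 label coding (1 = suspected CKD, the minority class), the fixed random seed, and the example counts are all assumptions made for illustration. It shows an odds ratio from a 2x2 table, random under-sampling of the majority class to a target minority share such as 20%, and sensitivity and specificity from predicted labels.

```python
import random


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: exposed with disease,    b: exposed without disease,
    c: unexposed with disease,  d: unexposed without disease.
    """
    return (a * d) / (b * c)


def undersample(labels, minority_share, seed=0):
    """Keep every minority row (label 1) and a random subset of majority
    rows (label 0) so that the minority makes up about `minority_share`
    of the result. Returns the sorted indices of the rows that are kept."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Number of majority rows needed to reach the requested share.
    n_major = round(len(minority) * (1 - minority_share) / minority_share)
    n_major = min(n_major, len(majority))
    return sorted(minority + rng.sample(majority, n_major))


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical data: 50 positive and 950 negative subjects.
    labels = [1] * 50 + [0] * 950
    kept = undersample(labels, minority_share=0.20)
    share = sum(labels[i] for i in kept) / len(kept)
    print(len(kept), share)
```

Only the majority class is reduced here, so no minority observations are discarded; repeating the call with different `minority_share` values gives one resampled dataset per degree of imbalance.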

ABSTRACT
ACKNOWLEDGMENT
LIST OF FIGURES
LIST OF TABLES
CHAPTER I INTRODUCTION
  1.1. BACKGROUND
  1.2. PURPOSE
  1.3. RESEARCH STRUCTURE
CHAPTER II LITERATURE REVIEW
  2.1. CHRONIC KIDNEY DISEASE
  2.2. PREVALENCE
  2.3. ODDS RATIO
  2.4. RANDOM UNDER-SAMPLING APPROACH
  2.5. DECISION TREE
  2.6. CONFUSION MATRIX
CHAPTER III RESEARCH METHODS
  3.1. DATA COLLECTION
  3.2. DATA PREPROCESSING
    3.3.2. Study of Demographic and Biochemical Characteristics of Study Subject
  3.3. PROPOSED METHOD
    3.3.1. Study of Odds Ratio of Risk Factor Correlated with Chronic Kidney Disease
    3.3.2. Handling Imbalance Dataset
    3.3.3. CONSTRUCT DECISION TREE
  3.4. EVALUATION METRICS
CHAPTER IV RESULTS AND DISCUSSION
  4.1. DATA COLLECTION
  4.2. DATA PREPROCESSING
    4.2.1. Estimated glomerular filtration rate
    4.2.2. Demographics of the Study Subjects
  4.3. EXPERIMENT RESULT
    4.3.1. Odds Ratio
    4.3.2. Handling Imbalance Dataset
    4.3.3. Result on Different Degrees of Imbalance
  4.4. ANALYZING THE PREDICTION MODELS
CHAPTER V CONCLUSIONS AND FUTURE RESEARCH
  5.1. CONCLUSION
  5.2. FUTURE RESEARCH
REFERENCES
APPENDIXES

