Graduate student: Inggi Rengganing Herani
Thesis title: Development of Carotid Artery Diagnostic Prediction Model using Hybrid Data Mining Approach
Advisor: Chao Ou-Yang (歐陽超)
Oral examination committee: Ren-Jieh Kuo (郭人介), Chao-Lung Yang
Degree: Master
Department: School of Management - Department of Industrial Management
Year of publication: 2013
Academic year of graduation: 101
Language: English
Number of pages: 56
Keywords: Carotid Artery Disease, Resampling, Imbalance Data, Feature Selection, Back Propagation Network
Carotid artery disease is a main cause of the disability and death associated with stroke (cerebrovascular disease), and stroke is responsible for a large number of deaths worldwide. Because carotid artery disease has no symptoms, it is important to perform a medical test that uses ultrasound or another imaging method to visualize the carotid arteries. Such tests are uncomfortable, expensive, and carry some risks. To reduce these risks and costs, this research presents a method that generates important information to help doctors diagnose carotid artery disease.
A hybrid data mining approach is applied to produce several combined models. Real-world datasets are often imbalanced: they are dominated by normal data, with only a small percentage of abnormal (sick) data. To overcome this imbalance, we used the Synthetic Minority Over-Sampling Technique (SMOTE) and Simple K-Means clustering. SMOTE is used to over-sample the minority data, while clustering is used to under-sample the majority data. Genetic Algorithm and Gain Ratio are also used to select important features; these methods focus on selecting a subset of salient features and reducing the number of features. Finally, the new dataset is processed with Back Propagation Network (BPN), Naive Bayes, and Decision Tree models to predict the disease.
Experimental results show that these hybrid methods achieve high accuracy, so they can assist doctors in analyzing and predicting the presence of carotid artery disease in patients.
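
Two of the steps named in the abstract, SMOTE over-sampling and Gain Ratio feature scoring, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the parameter defaults, and the toy data are all assumptions made for illustration, and the K-Means under-sampling, Genetic Algorithm, and classifier steps are left out.

```python
import math
import random
from collections import Counter


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea).
    Assumes at least two minority samples (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of a categorical feature divided by its split
    information (the entropy of the feature's own value distribution)."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return (entropy(labels) - conditional) / split_info


# Toy data (made up for illustration, not taken from the thesis)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_oversample(minority, n_new=5, k=2)

labels = [0, 0, 1, 1]
informative = gain_ratio(["a", "a", "b", "b"], labels)
uninformative = gain_ratio(["a", "b", "a", "b"], labels)
```

In this toy setup every synthetic point lies on a line segment between two existing minority samples, and the feature that separates the classes perfectly scores higher than the one that is independent of the class label.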

Abstract
Table of Contents
List of Figures
List of Tables
CHAPTER I INTRODUCTION
  1.1 Background
  1.2 Purpose
  1.3 Research Structure
CHAPTER II LITERATURE REVIEW
  2.1 Carotid Artery Disease
    2.1.1 Risk Factors and Symptoms
    2.1.2 Diagnostic Testing
  2.2 Data Mining and Knowledge Discovery
  2.3 Data Collecting and Pre-processing
  2.4 Imbalance Data Problem
    2.4.1 SMOTE
    2.4.2 K-Means Clustering
  2.5 Feature Selection Method
    2.5.1 Genetic Algorithm
    2.5.2 Gain Ratio Attribute Evaluator
  2.6 Predicting Model
    2.6.1 Back Propagation Network (BPN)
    2.6.2 Naive Bayes
    2.6.3 Decision Tree (C4.5)
CHAPTER III RESEARCH METHODOLOGY
  3.1 Data Pre-processing
    3.1.1 Remove Outliers
    3.1.2 Data Normalization
  3.2 Dealing with Imbalance Data
    3.2.1 SMOTE
    3.2.2 K-Means Clustering
  3.3 Selecting Features Method
    3.3.1 Genetic Algorithm
    3.3.2 Gain Ratio Attribute Evaluator
  3.4 Buildup Predicting Model
    3.4.1 Back Propagation Network (BPN)
CHAPTER IV MODEL IMPLEMENTATION
  4.1 Data Analysis
  4.2 Data Pre-processing
    4.2.1 Remove Outliers
    4.2.2 Data Normalization
  4.3 Dealing with Imbalance Data
    4.3.1 SMOTE
    Random Remove Sampling
    4.3.2 K-Means Clustering
  4.4 Selecting Features
    4.4.1 Genetic Algorithm
    4.4.2 Gain Ratio Attribute Evaluation
    4.4.3 Comparing Feature Selection
  4.5 Prediction Model
    4.5.1 Back Propagation Network (BPN)
    4.5.2 Naive Bayes
    4.5.3 Decision Tree (C4.5)
    4.5.4 Support Vector Machine
  4.6 Assessment
    4.6.1 Model Comparison
    4.6.2 Select the Best Model of BPN
    4.6.3 Model Analysis
CHAPTER V CONCLUSION AND FUTURE RESEARCH
  5.1 Conclusion
  5.2 Future Research
REFERENCES

