
Graduate student: 洪美蘭
Angelia - Melani Adrian
Thesis title: 以類電磁演算法為基之肺癌診斷決策樹規則生成模式
Rule Generation using Electromagnetism-like Mechanism Based Decision Tree Algorithm for Lung Cancer Prognosis
Advisor: 王孔政
Kung-Jeng Wang
Oral defense committee: 歐陽超
Chao Ou-Yang
周碩彥
Shuo-Yan Chou
郭人介
Ren-Jieh Kuo
F. Frank Chen
蔣明晃
Ming-Huang Chiang
Degree: Doctor
Department: School of Management - Department of Industrial Management
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: English
Number of pages: 87
Keywords (Chinese): 類電磁演算法, 基於規則的分類, 決策樹, 特徵選擇, 肺癌, 存活時間, 治療費用
Keywords (English): electromagnetism-like mechanism, rule-based classifier, decision tree, feature selection, lung cancer, survival time, treatment cost
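  • Lung cancer is one of the leading causes of cancer death worldwide. It is not only a deadly disease but also requires expensive medical care. Predicting the survival time of lung cancer patients is a highly challenging task for physicians and patients alike. Rule-based classification is widely used in medical diagnosis because it can be modeled on expert knowledge and its rules are easy to understand. However, one drawback of rule-based systems is the heavy computational cost caused by the large number of rules generated. The electromagnetism-like mechanism algorithm is a novel and effective optimization technique that can be used to optimize rule-based systems. This study proposes a technique that combines the electromagnetism-like mechanism algorithm with a decision tree algorithm (denoted the EMDT algorithm). EMDT is applied to two prediction cases: the survival time and the treatment cost of lung cancer patients. The study compares EMDT with several commonly used methods and uses nonparametric statistical tests to assess its superiority in classification accuracy and rule reduction. The experimental results confirm that the proposed EMDT method outperforms the other methods and can effectively predict the survival time and treatment cost of lung cancer patients.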


    Lung cancer is a leading cause of cancer death worldwide. Because the prognosis is poor, patients diagnosed with lung cancer often want to know how long they can expect to live, and predicting a patient's survival time is one of the most challenging tasks for medical practitioners. Lung cancer is not only a devastating and deadly disease but also a costly one to treat. Data mining has recently gained popularity in medicine and healthcare. Rule-based classification is one of the most widely used data mining techniques for medical diagnosis because its rules are easy to interpret. However, one drawback of rule-based systems is the high computational cost caused by the large number of rules that can be generated. The electromagnetism-like mechanism (EM) is a recent and powerful optimization technique that can be used to optimize a rule-based system. In this research, we present a feature selection method that combines the EM algorithm with a decision tree algorithm, denoted the EMDT algorithm. The proposed EMDT is applied to two prediction cases: the survival time of lung cancer patients and the cost of their treatment. EMDT is compared with several popular classifiers, including several rule-based classifiers, and a nonparametric test is used to assess the methods in terms of classification accuracy and rule reduction. The results confirm that the proposed EMDT method can be successfully applied as a feature selection technique. Moreover, EMDT can be used to generate rules for predicting lung cancer patients' survivability and the associated treatment cost.
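    The abstract describes EMDT only at a high level, so the following is a minimal sketch, not the thesis implementation, of how an electromagnetism-like search over feature subsets might look. It follows the general charge, force, and move scheme of Birbil and Fang (2003). Two things are assumptions made for illustration: each particle is a point in [0, 1]^n that is thresholded at 0.5 into a feature mask, and the cost function is passed in as a parameter. In EMDT the cost would presumably be derived from a decision tree trained on the selected features; here `toy_cost`, `RELEVANT`, and all parameter names are hypothetical stand-ins so that the example runs with the standard library alone.

```python
import math
import random

# Hypothetical stand-in for "decision tree error on the selected features":
# features 0, 3 and 5 are treated as the only informative ones.
RELEVANT = {0, 3, 5}


def toy_cost(mask):
    """Lower is better: penalise missed relevant features and extra ones."""
    missed = sum(1 for i in RELEVANT if not mask[i])
    extra = sum(1 for i, m in enumerate(mask) if m and i not in RELEVANT)
    return missed + 0.1 * extra


def to_mask(point):
    """Threshold a point in [0, 1]^n into a 0/1 feature mask (assumed encoding)."""
    return [1 if v > 0.5 else 0 for v in point]


def em_feature_search(cost, n_features, n_particles=10, n_iters=30, seed=0):
    """EM-like minimisation of `cost` over feature masks.

    Returns (best_mask, best_cost, history of best cost per iteration).
    """
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(n_features)] for _ in range(n_particles)]
    history = []
    for _ in range(n_iters):
        costs = [cost(to_mask(p)) for p in pts]
        best = min(range(n_particles), key=costs.__getitem__)
        history.append(costs[best])
        # Charge: particles with cost close to the best get a charge near 1.
        denom = sum(c - costs[best] for c in costs)
        q = [math.exp(-n_features * (c - costs[best]) / denom) if denom > 0 else 1.0
             for c in costs]
        new_pts = []
        for i, p in enumerate(pts):
            if i == best:  # the current best particle is not moved
                new_pts.append(p[:])
                continue
            force = [0.0] * n_features
            for j, other in enumerate(pts):
                if j == i:
                    continue
                diff = [b - a for a, b in zip(p, other)]
                dist2 = sum(d * d for d in diff)
                if dist2 == 0:
                    continue
                # Better particles attract, worse (or equal) ones repel.
                sign = 1.0 if costs[j] < costs[i] else -1.0
                scale = sign * q[i] * q[j] / dist2
                force = [f + scale * d for f, d in zip(force, diff)]
            norm = math.sqrt(sum(f * f for f in force))
            if norm == 0:
                new_pts.append(p[:])
                continue
            lam = rng.random()  # random step length in [0, 1)
            moved = []
            for v, f in zip(p, force):
                step = lam * f / norm
                # Scale by the remaining room to the bound so v stays in [0, 1].
                moved.append(v + step * (1.0 - v) if step > 0 else v + step * v)
            new_pts.append(moved)
        pts = new_pts
    costs = [cost(to_mask(p)) for p in pts]
    best = min(range(n_particles), key=costs.__getitem__)
    history.append(costs[best])
    return to_mask(pts[best]), costs[best], history


mask, cost_value, history = em_feature_search(toy_cost, n_features=8, seed=1)
print("selected features:", [i for i, m in enumerate(mask) if m], "cost:", cost_value)
```

    Because the best particle is never moved, the best cost in `history` cannot get worse from one iteration to the next. To approximate the setting in the abstract, `toy_cost` would be replaced by a function that trains a decision tree on the masked features and returns its validation error, possibly with a penalty on the number of generated rules.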

    Abstract (Chinese)
    Abstract
    Acknowledgement
    Table of Contents
    List of Tables
    List of Figures
    CHAPTER 1 INTRODUCTION
    1.1. Research Background
    1.2. Research Objective
    1.3. Research Contribution
    1.4. Research Limitation
    1.5. Research Framework
    1.6. Research Outline
    CHAPTER 2 LITERATURE REVIEW
    2.1. Feature Selection
    2.1.1. Feature selection for classification
    2.1.2. Application of feature selection for medical diagnosis and prognosis
    2.2. Electromagnetism-like Mechanism Algorithm
    2.3. Stages of Electromagnetism-like Mechanism Algorithm
    2.4. Feature Selection Using Electromagnetism-like Mechanism Algorithm
    2.4.1. Diagnosis of brain metastases from lung cancer using a modified electromagnetism like mechanism algorithm
    2.4.2. An improved electromagnetism-like mechanism algorithm and its application to the prediction of diabetes mellitus
    2.5. Decision Tree Algorithm
    2.5.1. Constructing a decision tree
    2.5.2. C5.0 decision tree rule algorithm
    2.6. Methods Comparison
    2.6.1. OneR algorithm
    2.6.2. JRip algorithm
    2.6.3. C4.5 decision tree algorithm
    2.6.4. C5.0 decision tree rule algorithm
    2.6.5. Decision table algorithm
    2.6.6. Nearest neighbor algorithm
    2.6.7. Naïve Bayes algorithm
    CHAPTER 3 METHODOLOGY
    3.1. Variables and Data
    3.1.1. Variables
    3.1.2. Data Source
    3.1.3. Data Preprocessing
    3.1.4. Attribute interaction
    3.2. Electromagnetism-like Mechanism Based Decision Tree Algorithm for Medical Prognosis
    3.2.1. Particle Representation
    3.2.2. Fitness Function
    3.2.3. Proposed method implementation
    3.3. Performance Evaluations
    3.4. Statistical Justification
    3.5. Experimental Setting
    CHAPTER 4 ELECTROMAGNETISM LIKE MECHANISM BASED DECISION TREE ALGORITHM FOR LUNG CANCER PROGNOSIS
    4.1. Problem Background
    4.2. Experiment Result
    4.3. Discussion
    CHAPTER 5 ELECTROMAGNETISM LIKE MECHANISM BASED DECISION TREE ALGORITHM FOR LUNG CANCER PATIENTS' TREATMENT COST PREDICTION
    5.1. Problem Background
    5.2. Experiment Result
    5.3. Discussion
    CHAPTER 6 CONCLUSION AND FUTURE RESEARCH
    6.1. Conclusion
    6.2. Future Research
    REFERENCES
    Appendix

    Abdel-Aal, R., E. (2005). GMDH-based feature ranking and selection for improved classification of medical data. Journal of Biomedical Informatics, 38(6), 456–468.
    Amancio, D., A., Comin, C., H., Casanova, D., Travieso, G., Bruno, O., M., Rodrigues, F., A., da Fontoura Costa, L. (2014). A Systematic Comparison of Supervised Classifiers, PLoS One, 9(4): e94137.
    American Cancer Society. (2014). Cancer facts & figures 2014. Atlanta: American Cancer Society.
    Araúzo-Azofra, A., Benítez, J., M. (2008). Empirical study of feature selection methods in classification, Hybrid Intelligent Systems, 2008. HIS '08. Eighth International Conference on, Barcelona, 584-589.
    Birbil, Ş. I. (2002). Stochastic global optimization techniques. PhD Thesis. North Carolina State University, Raleigh, NC, USA.
    Birbil, Ş. I., Fang, S. C. (2003). An electromagnetism-like mechanism for global optimization. Journal of Global Optimization, 25, 263–282.
    Campbell, A., H., Lee, E., J. (1963). The relationship between lung cancer and chronic bronchitis. British Journal of Diseases of the Chest, 57(3), 113-119.
    Chen, K.,H., Wang, K.,J., Adrian, A.,M., Wang, K.,M., Teng, N.,C. (2016). Diagnosis of brain metastases from lung cancer using a modified electromagnetism like mechanism algorithm. Journal of Medical Systems, 40(1): 1-14.
    Chen, K.,H., Wang, K.,J., Tsai, M., T., Wang, K.,M., Adrian, A.,M., Cheng, C., W., Yang, T., S., Teng, N., C., Tan, K., P., Chang, K., S. (2014). Gene selection for cancer identification: A decision tree model empowered by particle swarm optimization algorithm. BMC Bioinformatics, 15(1), 49.
    Chiang, C.J., Chen, Y.C., Chen, C.J., You, S.L., Lai, M.S., & Force, T. C. (2010). Cancer trends in Taiwan. Japanese Journal of Clinical Oncology, 40(10): 897–904.
    Chiang, J.,K., Kao, Y.,H., Lai, N.,S. (2015). The Impact of hospice care on survival and healthcare costs for patients with lung cancer: A National Longitudinal Population-Based Study in Taiwan. PLoS ONE, 10(9).
    Cipriano, L., E., Romanus, D., Earle, C., C., Neville, A., A., Halpern, E., F., Gazelle, G., S., McMahon, P., M. (2011). Lung cancer treatment costs, including patient responsibility, by stage of disease and treatment modality. Value Health, 14(1), 41–52.
    Cohen, W. W. (1995). Fast effective rule induction. In Machine Learning: Proceedings of the Twelfth International Conference. Lake Tahoe, California: Morgan Kaufmann.
    Dobbin, K., K., Simon, R., M. (2011). Optimally splitting cases for training and testing high dimensional classifiers. BMC Medical Genomics, 4:31.
    Du, W., S., Hu, B.,Q. (2016). Attribute reduction in ordered decision tables via evidence theory, Information Sciences,364–365, 91-110.
    Ferlay, J., Soerjomataram, I., Ervik, M., Dikshit, R., Eser, S., Mathers, C., . . . Bray, F. (2013). GLOBOCAN 2012 v1.0, Cancer Incidence and Mortality Worldwide: IARC CancerBase No. 11 [Internet]. Retrieved from International Agency for Research on Cancer: http://globocan.iarc.fr.
    Furey, T.S., Cristianini, N., Duffy, N., Bednarski, D.W., Schummer, M., Haussler, D. (2000). Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10), 906–914.
    Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I., H. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1).
    Harb, H., M., Desuki, A., S. (2014). Feature selection on classification of medical datasets based on particle swarm optimization. International Journal of Computer Applications, 104(5), 14-17.
    Heuvers, M., E., Hegmans, J., P., Stricker, B., H., Aerts, J., G. (2012). Improving lung cancer survival; Time to move on. BMC Pulmonary Medicine, 12:77.
    Henderson, R., Jones, M., Stare, J. (2001). Accuracy of point predictions in survival analysis. Statistics in Medicine, 20(20): 3083-3096.
    Holte, R.C. (1993). Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11, 63-90.
    Hui, C., L. (2010). Willingness to Pay for Lung Cancer Treatment. Value in Health, 13(6), 743-749.
    Hsiung, C., Y., Leung, S., W., Wang, C.,J., Lo S.,K., Chen, H.,C., Sun, L.,M., Fang, F.-M. (1998). The prognostic factors of lung cancer patients with brain metastases treated with radiotherapy. Journal of Neuro-Oncology, 36: 71-77.
    James, L., E., Gower, N., H., Rudd, R., M., Spiro, S., G., Harper, P., G., Trask, C., W., Partridge, M., Ruiz de Elvira, M., C., Souhami, R., L. (1996). A randomised trial of low-dose/high-frequency chemotherapy as palliative treatment of poor-prognosis small-cell lung cancer: a Cancer research Campaign trial. British Journal of Cancer, 73(12): 1563–1568.
    Janecek, A., Gansterer, W., Demel, M., Ecker, G., F. (2008). On the Relationship Between Feature Selection and Classification Accuracy. Journal of Machine Learning Research, 4:90-105.
    Ko, Y.-C., Lee, C.-H., Chen, M.-J., Huang, C.-C., Chang, W.-Y., Lin, H.-J., Wang, H.-Z., & Chang, P.-Y. (1997). Risk factors for primary lung cancer among non-smoking women in Taiwan. International Journal of Epidemiology, 26 (1): 24-31.
    Kohavi R., John., G. H. (1997). Wrappers for feature selection. Artificial Intelligence, 97(1-2): 273-324.
    Kreuzer, M., Boffetta, P., Whitley, E., Ahrens, W., Gaborieau, V., Heinrich, J., Jöckel, K., H., Kreienbrock, L., Mallone, S., Merletti, F., Roesch, F., Zambon, P., Simonato, L. (2000). Gender differences in lung cancer risk by smoking: a multicentre case-control study in Germany and Italy. British journal of cancer 82(1):227-33.
    Lang, H., C., & Wu, S., L. (2012). Lifetime costs of the top five cancers in Taiwan. The European Journal of Health Economics, 13: 347-353.
    Li, M., Shang, C., Feng, S., Fan, J. (2014a). Quick attribute reduction in inconsistent decision tables. Information Sciences, 254 (1), 155-180.
    Li, T.Y., Hsieh, J.S., Lee, K.T., Hou, M.F, Wu, C.L, et al. (2014b). Cost Trend Analysis of Initial Cancer Treatment in Taiwan. PLoS ONE 9:10.
    Liu, H., Motoda. H. (2007). Computational Methods of Feature Selection. Chapman and Hall/CRC Press.
    Lowd, D., & Domingos, P. (2005). Naive bayes models for probability estimation. In Proceedings of the Twenty Second International Conference on Machine Learning, ACM Press, 529-536.
    MacKillop, W.,J., Quirt, C. ,F. (1997). Measuring the accuracy of prognostic judgments in oncology. Journal of Clinical Oncology, 50: 21–29.
    Medina, F.,M., Barrera, R.,R., Morales, J.,F., Echegoyen, R.,C., Chavarria J.,G., Rebora, F.,T. (1996). Primary lung cancer in Mexico city: a report of 1019 cases. Lung Cancer, 14: 185-193.
    Milker-Zabel S., Debus J., Thilmann C., Schlegel W., Wannenmacher M. (2001). Fractionated stereotactically guided radiotherapy and radiosurgery in the treatment of functional and nonfunctional adenomas of the pituitary gland. International Journal of Radiation Oncology Biology and Physics, 1-50(5): 1279-86.
    Miyahara, K., Pazzani, M., J. (2000). Collaborative filtering with the simple Bayesian Classifier. Pacific Rim International Conference on Artificial Intelligence, 679–689.
    Mohit, R., R., V., Katoch, S., Vanjare, A., Omkar, S., N. (2015). Classification of Complex UCI Datasets Using Machine Learning Algorithms Using Hadoop. International Journal of Computer Science and Software Engineering (IJCSSE), 4(7), 190-198.
    Niemiec J, Kolodziejski L, Dyczek S. (2005). EGFR LI and Ki-67 LI are independent prognostic parameters influencing survivals of surgically treated squamous cell lung cancer patients. Neoplasma, 52: 231-7.
    Nutt, C., L., Mani, D., R., Betensky, R., A., Tamayo, P., Cairncross, J., G., Ladd, C., et al. (2003). Gene Expression-based Classification of Malignant Gliomas Correlates Better with Survival than Histological Classification. 63(7), 1602-1607.
    Omiotek, Z., Burda, A., WóJcik, W. (2013). The use of decision tree induction and artificial neural networks for automatic diagnosis of Hashimoto's disease. Expert Systems with Applications, 40(16), 6684-6689.
    Parkin, D. M., Bray, F., Ferlay, J., & Pisani, P. (2001). Estimating the world cancer burden: Globocan 2000. International Journal of Cancer, 94: 153-156.
    Peng, Y., Wu, Z., Jiang, J. (2010). A novel feature selection approach for biomedical data classification. Journal of Biomedical Informatics, 43 (1), 15–23.
    Quinlan, J. (1993). C4.5: Programs for machine learning. Morgan Kaufmann.
    Rapp E., Pater, J.,L., Willan, A., Cormier, Y., Murray, N., Evans, W.,K., Hodson, D.,I., Clark, D.,A., Feld, R., Arnold, A.,M., et al. (1988). Chemotherapy can prolong survival in patients with advanced non-small-cell lung cancer--report of a Canadian multicenter randomized trial. Journal of Clinical Oncology, 6(4):633-41.
    Roth, J. A. (2008). Targeted genetic therapy for lung cancer. Lung Cancer, Third Edition (eds J. A. Roth, J. D. Cox and W. K. Hong), Blackwell Publishing Ltd, Oxford, UK.
    Santillan, A.,A., Camargo, C.,A. Jr., Colditz, G.,A. (2003). A meta-analysis of asthma and risk of lung cancer (United States). Cancer Causes Control, 14(4), 327-334.
    Sequist, L.,V, Joshi, V.,A., Jänne, P.,A., et al. (2007). Response to treatment and survival of patients with non-small cell lung cancer undergoing somatic EGFR mutation testing. Oncologist, 12:90-8.
    Shahzad, W., Asad, S., Khan, M., A. (2013). Feature subset selection using association rule mining and JRip classifier. International Journal of Physical Science, 8(18), 885-896.
    Sheikhpour, R, Sarram, M.,A., Sheikhpour, R. (). Particle swarm optimization for bandwidth determination and feature selection of kernel density estimation based classifiers in diagnosis of breast cancer. Applied Soft Computing, 40, 113-131.
    Shouman, M., Turner, T., Stocker, R. (2012). Applying k-nearest neighbour in diagnosing heart disease patients. International Journal of Information and Education Technology, 2(3), 220 – 223.
    Siegel, R., Naishadham, D., Jemal A. (2013). Cancer statistics, 2013. CA: A Cancer Journal for Clinicians, 63(1), 11-30.
    Sivapriya, T., R., Kamal, A., R., N., B. (2013). Hybrid feature reduction and selection for enhanced classification of high dimensional medical data. IEEE International Conference on Computational Intelligence and Computing Research, IEEE ICCIC.
    Sridevi, T., Murugan, A. (2014). A novel feature selection method for effective breast cancer diagnosis and prognosis. International Journal of Computer Applications, 88(11), 28-33.
    Su, J., & Zhang, H. (2006). Full Bayesian network classifiers. In Proceedings of the 23rd international conference on Machine learning (ICML '06). NY, USA, ACM, 1-8.
    Swati, S., Ashok., G. (2013). Dimensionality Reduction Techniques for Improved Diagnosis of Heart Disease. International Journal of Computer Applications, 61(5), 1-8.
    Świątkowska, B., Szubert, Z., Sobala, W., Szeszenia-Dąbrowska, N. (2015). Predictors of lung cancer among former asbestos-exposed workers. Lung Cancer, 89 (3), 243-248.
    Tang J., Alelyani, S., Liu, H. (2014). Feature selection for Classification: A review. Data Classification Algorithms and Applications.
    Torre, L. A., Siegel, R. L., Jemal, A. (2016). Lung Cancer Statistics. Advances in Experimental Medicine and Biology, 893:1-19.
    Therneau, T., M., Grambsch, P., M. (2000). Modeling Survival Data: Extending the Cox Model. Springer, New York.
    Wang, Y., C., Chen, C.,Y., Chen, S.,K., Cherng, S.,H., Ho, W.,L, Lee, H. (1998). High frequency of deletion mutations in p53 gene from squamous cell lung cancer patients in Taiwan, Cancer Research 15:58(2), 328-33.
    Wang, H., Huang, L., Jing, R., Yang, Y., Liu, K., Li, M., Wen, Z. (2015a). Identifying oncogenes as features for clinical cancer prognosis by Bayesian nonparametric variable selection algorithm. Chemometrics and Intelligent Laboratory Systems, 146(15), 464-471.
    Wang, K.,J., Adrian, A.,M., Chen, K.,H., Wang, K.,M. (2015b). An improved electromagnetism-like mechanism algorithm and its application to the prediction of diabetes mellitus. Journal of Biomedical Informatics, 54: 220-229.
    Wei, W., Wang, J., Liang, J., Mi, X.,Dang. (2015). Compacted decision tables based attribute reduction. Knowledge-Based Systems, 86, 261-277.
    Xiao, G., Ma, S., Minna, J., Xie, Y. (2015). Adaptive prediction model in prospective molecular-signature-based clinical studies. Clinical Cancer Research, 20(3), 531–539.
    Yabroff, K.,R., Lamont, E.,B., Mariotto, A., Warren, J.,L., Topor, M., Meekins, A., Brown, M.,L. Cipriano, et al.,(2011). Cost of care for elderly cancer patients in the United States. Journal of National Cancer Institute, 100(9):630-41.
    Yu, Y.,H., Liao, C.,C., Hsu, W.,H., Chen, H.,J., Liao, W.,C., Muo, C.,H., Sung, F.,C., Chen, C.,Y. (2011). Increased lung cancer risk among patients with pulmonary tuberculosis: a population cohort study. Journal of Thoracic Oncology, 6(1), 32–37.
    Zarogoulidou, V., Panagopoulou, E., Papakosta, D., Petridis, D., Porpodis, K., Zarogoulidis, K., Zarogoulidis, P., Arvanitidou M. (2015). Estimating the direct and indirect costs of lung cancer: a prospective analysis in a Greek University Pulmonary Department. Journal of Thoracic Disease, 7(1): S12–S19.
