
Author: Bunjira Makond (文吉蘭)
Thesis title: Bayesian Networks for Decision Making in Brain Metastasis Patients from Lung Cancer: A Case Study in Taiwan
Advisor: Kung-Jeng Wang (王孔政)
Committee members: Ren-Jieh Kuo (郭人介), Shi-Woei Lin (林希偉), Hsing Luh (行陸), Ming-Huang Chiang (蔣明晃), Mu-Chen Chen (陳穆臻)
Degree: Doctor
Department: College of Management - Department of Industrial Management
Year of publication: 2014
Academic year of graduation: 102
Language: English
Number of pages: 97
Keywords (translated from Chinese): Bayesian networks, lung cancer, brain metastasis, Taiwan National Health Insurance database, survivability prediction
Keywords (English): Bayesian networks, lung cancer, brain metastases, Taiwan NHI database, prediction
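  • Using Bayesian networks (BNs) to build survival prediction models for lung cancer is a topic that merits in-depth study. In this study, bioinformatics variables are used to describe and predict the probability of brain metastasis from lung cancer, and two BNs are built. The study uses Taiwan's National Health Insurance database from 1996 to 2010. One BN models the probability of developing brain metastasis among 36,043 lung cancer patients. Likewise, a second BN predicts the survivability of 438 brain metastasis patients. We use SMOTE (Synthetic Minority Over-Sampling Technique) to address the data imbalance problem. The proposed BNs are each compared with three models: naive Bayes, logistic regression, and support vector machine. The results show that SMOTE improves the sensitivity of all four models while maintaining high accuracy. Although the experiments show that the proposed BNs do not significantly outperform the other three models, their advantage is that they can give medical decision makers clearer guidance for the diagnosis and treatment of lung cancer patients.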
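The abstract reports model quality in terms of sensitivity and accuracy. As a reminder of how these criteria relate on imbalanced data, the following is a minimal sketch, not the thesis's own evaluation code. The function names and the toy labels are hypothetical and chosen only for illustration; no figures here come from the thesis.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Share of actual positives that the model detects."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Share of actual negatives that the model correctly rejects."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def accuracy(tp, fp, tn, fn):
    """Share of all cases classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Toy, invented labels: 2 positives among 10 cases (imbalanced).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, fp, tn, fn))
```

In this toy case accuracy is 0.8 while sensitivity is only 0.5, which illustrates why a study on a rare outcome would report sensitivity alongside accuracy.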


    The Bayesian network (BN) is a promising method for modeling lung cancer data under uncertainty. In this study, two BNs were proposed and graphically represented using bioinformatics variables to describe and predict the occurrence of brain metastasis from lung cancer and the survivability of the affected patients. The BNs can also support informed medical decisions and observations through probabilistic reasoning. A nationwide cancer patient database in Taiwan covering 1996 to 2010 was used. The cohort consisted of 36,043 lung cancer patients, of whom 438 brain metastasis patients were used to build the BN for predicting survivability. We used the synthetic minority over-sampling technique (SMOTE) to address the class imbalance in the data. The proposed BNs were compared with three competing models: naive Bayes, logistic regression, and support vector machine. Results showed that SMOTE improved the sensitivity of all four models while keeping accuracy and specificity high. The benefits of the proposed BNs over the benchmark models are identified and discussed.
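The abstract names SMOTE as the remedy for class imbalance. The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified illustration under stated assumptions, not the thesis's implementation: the function name `smote`, its parameters, and the toy points are hypothetical; base samples are drawn at random rather than visited in turn; and features are assumed to be numeric.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    minority: list of numeric tuples belonging to the minority class.
    k: number of nearest minority neighbours considered for each base point.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, invented minority samples on the corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region the minority class already occupies, which is what distinguishes this approach from simply duplicating minority records.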

    Contents
    Abstract (Chinese)
    Abstract
    Acknowledgement
    Contents
    List of Figures
    List of Tables
    Chapter 1: Introduction
    1.1 Research background
    1.2 Research motivation
    1.3 Research framework
    1.4 Research limitation
    1.5 Dissertation organization
    Chapter 2: Literature review
    2.1 Bayesian network
    2.1.1 Bayesian network definition
    2.1.2 Structure construction
    2.1.3 Parameter estimation
    2.1.4 Inference in Bayesian network
    2.2 Synthetic minority oversampling technique
    2.3 Benchmark methods
    2.3.1 Naive Bayes
    2.3.2 Logistic regression
    2.3.3 Support vector machine
    2.4 Summary
    Chapter 3: Methodology
    3.1 Variables and data
    3.1.1 Variables
    3.1.2 Database
    3.1.3 Data pre-processing
    3.2 Deal with imbalanced data
    3.3 Modeling
    3.3.1 Graphical model construction for the development of brain metastasis
    3.3.2 Graphical model construction for the survivability of brain metastasis patients
    3.4 Model evaluation criteria
    Chapter 4: Bayesian network for modeling and predicting the occurrence of brain metastasis from lung cancer
    4.1 Research issue
    4.2 Experimental results
    4.2.1 The inferences and findings
    4.2.2 Assessment on prediction performance
    4.3 Summary
    Chapter 5: Bayesian network for modeling and predict the survivability of brain metastasis patients
    5.1 Research issue
    5.2 Experimental results
    5.2.1 Inferences and findings
    5.2.2 Assessment on prediction performance
    5.3 Summary
    Chapter 6: Conclusions and Future Research
    6.1 Conclusions and contributions
    6.2 Further research
    References
    Appendix A: The correlation analysis
    Appendix B: Resulting conditional probability from model in Figure 3-2
    Appendix C: Resulting conditional probability from model in Figure 3-3
    Appendix D: The probability computation in Chapter 4
    Appendix E: The probability computation in Chapter 5
    Resume

