Graduate student: 楊竣壹
Chun-Yi Yang
Thesis title: 資料探勘與統計分析混合模型於肺結核與肺癌之關聯研究
A Hybrid of Data Mining and Statistical Analysis Approach on Association between Pulmonary Tuberculosis and Lung Cancer
Advisor: 周碩彥
Shuo-Yan Chou
Oral examination committee: 王孔政
Kung-Jeng Wang
游慧光
none
Degree: Master
Department: School of Management - Department of Industrial Management
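Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 61
Keywords (translated from Chinese): data mining, pulmonary tuberculosis, lung cancer, cohort study, survival analysis, decision tree, association rule
Keywords (English): Cohort study, tuberculosis, lung cancer, Cox regression, survival analysis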
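  • Background and objective: Tuberculosis is a global infectious disease, and lung cancer ranks third among the ten leading causes of cancer death in Taiwan. The pathological association between the two diseases is a question researchers are eager to understand. This study therefore used Taiwan's National Health Insurance database to examine how other comorbidities affect the risk of lung cancer among pulmonary tuberculosis patients, applying a hybrid model of data mining and statistical methods.

    Methods: New cases with a primary diagnosis of pulmonary tuberculosis between 2000 and 2002 were identified and followed until 2011. A total of 6,172 new cases aged over 20 years were found. Based on the decision tree results, the population was divided into a middle-aged group (1,459 patients) and an elderly group (3,527 patients), and the two groups were then analyzed and compared using association rules, Cox regression and survival analysis.

    Results: Elderly patients with pulmonary tuberculosis were roughly four times more likely to develop lung cancer than middle-aged patients with pulmonary tuberculosis (39.03 versus 8.45 per 10,000 person-years). Chronic obstructive pulmonary disease (COPD) raised the risk of lung cancer in both the middle-aged group (6.64; 95% CI, 2.17-20.33) and the elderly group (2.22; 95% CI, 1.52-3.23). Middle-aged tuberculosis patients generally had a lower incidence of lung cancer than elderly patients (proportion free from lung cancer 98.9% versus 95.8%, log-rank p < 0.0001).

    Conclusions: After applying data mining methods and stratifying the cohort by age, this study found that pulmonary tuberculosis patients in different age strata do show different pathological patterns of lung cancer. The risk of developing cancer rises gradually with age.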


    Background and objective: Tuberculosis (TB) is a global infectious disease and lung cancer is among the ten most fatal cancers in Taiwan, so it is important to understand the clinical pathology linking the two. This study explored the association between tuberculosis and lung cancer in the presence of other comorbidities, and investigated whether any attribute is a critical factor influencing the risk of lung cancer among TB patients, using a hybrid data mining and statistical approach.
    Methods: Study subjects with a diagnosis of tuberculosis between 2000 and 2002 were identified from the National Health Insurance Research Database (NHIRD) and tracked to 2011. In a cohort of 6,137 patients with tuberculosis aged over 20 years, 1,459 patients were assigned to the middle age group and 3,527 patients to the elder age group based on the result of a decision tree. Association rules, Cox regression and survival analysis were used to compare the groups.
    Results: The incidence rate of lung cancer is approximately 4-fold higher in the elder age group than in the middle age group (39.03 versus 8.45 per 10,000 person-years). COPD increases the risk of lung cancer in both the middle age group (6.64; 95% CI, 2.17-20.33) and the elder age group (2.22; 95% CI, 1.52-3.23). In survival analysis, patients in the middle age group are generally more likely to remain free from lung cancer than those in the elder age group (98.9% versus 95.8%, log-rank p < 0.0001).
    Conclusions: This study provides a comprehensive analysis of the impact of age and comorbidities on the risk of lung cancer among tuberculosis patients. The risk may increase further for patients in the middle age group than for those in the elder age group.
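The "approximately 4-fold" figure in the Results can be checked directly from the two reported incidence rates. The sketch below is only an illustrative calculation, not the thesis's own code. It assumes that 8.45 per 10,000 person-years belongs to the middle age group and 39.03 to the elder age group, which is the assignment the survival figures (98.9% versus 95.8% free from lung cancer) suggest. The function name `rate_ratio` and the constant names are hypothetical.

```python
# Incidence rates per 10,000 person-years, as reported in the abstract.
# Assumption: 8.45 is the middle age group and 39.03 is the elder age group.
MIDDLE_RATE = 8.45
ELDER_RATE = 39.03


def rate_ratio(exposed_rate: float, reference_rate: float) -> float:
    """Return the ratio of two incidence rates expressed in the same units."""
    if reference_rate <= 0:
        raise ValueError("reference rate must be positive")
    return exposed_rate / reference_rate


ratio = rate_ratio(ELDER_RATE, MIDDLE_RATE)
print(f"Elder vs middle incidence rate ratio: {ratio:.2f}")
```

Under that assumption the ratio comes out at about 4.6, which matches the "approximately 4-fold" wording. This is a crude ratio of the two published rates and is not adjusted for any covariate.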

    Chapter 1. INTRODUCTION
      1.1. General Background Information
      1.2. Research Objective
      1.3. Research Framework
    Chapter 2. LITERATURE REVIEW
      2.1. Data Mining
      2.2. Data Mining Application in Healthcare
        2.2.1. Decision Tree
        2.2.2. Association Rule
      2.3. Statistical Methods
        2.3.1. Cox Model
        2.3.2. Survival Analysis
    Chapter 3. MATERIALS AND METHODS
      3.1. Data Sources
      3.2. Criteria and Definition
      3.3. Study Cohorts
      3.4. Research Flow
        3.4.1. Decision Tree
        3.4.2. Association Rule
        3.4.3. Cox Regression
        3.4.4. Survival Analysis
    Chapter 4. RESULT
      4.1. Data Mining Analysis
        4.1.1. Decision Tree
        4.1.2. Association Rule
      4.2. Statistical Analysis
        4.2.1. Cox Regression
        4.2.2. Survival Rate Analysis
    Chapter 5. DISCUSSION
      5.1. Conclusions
      5.2. Limitations of this study
      5.3. Future Research
    Chapter 6. ACKNOWLEDGMENTS
    Chapter 7. REFERENCE
    Chapter 8. APPENDIX

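    Full text release date: 2019/06/20 (on-campus network)
    Full text release date: 2024/06/20 (off-campus network)
    Full text release date: 2024/06/20 (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)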