Graduate student: AMNA SHAFA IRFAN
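Thesis title: Comparison of non-invasive circulatory acquisition techniques for Cancer and Coronary Artery Disease using various Machine Learning Models
Thesis title (Chinese version, translated): Comparing Cancer and Coronary Artery Disease Using Non-Invasive Circulatory Signal Acquisition and Machine Learning Models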
Advisor: Hsin Hsiu (許昕)
Oral defense committee: Yi-Ru Lin (林益如), Hsing-Kuo Pao, Hsiao-Feng Hu
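Degree: Master
Department: College of Applied Science and Technology, Graduate Institute of Biomedical Engineering
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar, i.e., 2018–2019)
Language: English
Number of pages: 70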
Keywords: supervised learning, non-invasive medical devices (NIMD), Breast Cancer, Coronary Artery Disease (CAD), Blood Pressure (BP) waveform, Photoplethysmogram (PPG)
Abstract:
Early detection of chronic diseases such as Cancer and Coronary Artery Disease (CAD) remains a challenging task. Non-invasive techniques, being more practical, have increasingly taken the place of invasive techniques for the early detection of disease. However, non-invasive techniques may be unreliable because their performance has not been adequately validated. The goal of this project is to use Artificial Intelligence (AI) to evaluate two non-invasive acquisition techniques, Blood Pressure (BP) waveform and Photoplethysmogram (PPG), for the classification of Cancer and CAD, and to provide a preliminary study toward a medical data analysis system that can make correct predictions for chronic diseases. Eight machine learning models, namely k-Nearest Neighbors (k-NN), Linear Discriminant Analysis (LDA), Decision Trees (DT), Random Forest (RF), Gaussian Naïve Bayes (GNB), Logistic Regression (LR), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), were used for disease classification to test how accurately the signals acquired with these wearable non-invasive sensors can be classified. The quantitative analysis of the training data is based on 10-fold cross-validation accuracy. For the predictive analysis, a confusion matrix is used to calculate accuracy, sensitivity, specificity, precision, and false positive rate (FPR), together with the area under the Receiver Operating Characteristic (ROC) curve (AUC). The results showed that the SVM classifier outperformed the other models in most cases, that Cancer was easier to classify than CAD, and that the PPG technique outperformed BP, yielding a 10-fold cross-validation accuracy above 89%. The predictive analysis also gave near-optimal results, with AUC, accuracy, sensitivity, specificity, and precision all above 75% and FPR below 20%.
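The 10-fold cross-validation accuracy used to compare the classifiers can be illustrated with a minimal sketch. The thesis's own code and data are not reproduced here, so this rests on stated assumptions: it is plain Python with no third-party libraries, it uses k-NN (one of the eight models) as the example classifier, it uses stratified folds (an assumption; the abstract only says "10-fold"), and it runs on hypothetical synthetic two-class feature vectors in place of the BP/PPG features. All function names are illustrative, not taken from the thesis.

```python
import math
import random
from collections import Counter


def stratified_k_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds so each class is spread evenly across folds."""
    if len(labels) < k:
        raise ValueError("need at least k samples to build k non-empty folds")
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    slot = 0  # running counter so fold sizes stay balanced across classes
    for indices in by_class.values():
        rng.shuffle(indices)
        for idx in indices:
            folds[slot % k].append(idx)
            slot += 1
    return folds


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Predict the majority label among the n_neighbors closest training samples (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:n_neighbors]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_val_accuracy(X, y, k_folds=10, n_neighbors=3, seed=0):
    """Return (mean accuracy, per-fold accuracies) of k-NN under k-fold cross-validation."""
    fold_scores = []
    for test_idx in stratified_k_folds(y, k_folds, seed):
        held_out = set(test_idx)
        # Train on every sample outside the held-out fold, then score the fold.
        train_idx = [i for i in range(len(y)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], n_neighbors) == y[i] for i in test_idx)
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / len(fold_scores), fold_scores


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical synthetic data: two Gaussian clusters standing in for two diagnostic groups.
    X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(30)]
    X += [[rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)] for _ in range(30)]
    y = [0] * 30 + [1] * 30
    mean_acc, per_fold = cross_val_accuracy(X, y)
    print(f"10-fold CV accuracy: {mean_acc:.3f}")
```

Each sample is held out exactly once, so the mean fold accuracy estimates performance on unseen subjects. With scikit-learn, which provides all eight classifiers named above, the same evaluation is usually written with `StratifiedKFold(n_splits=10)` and `cross_val_score`, repeated once per model so the mean accuracies can be compared.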
The results indicate that our BP and PPG techniques for acquiring Cancer and CAD data could be used as a wearable method for earlier detection of chronic diseases. The main limitation of this project is the dataset size, which was too small to obtain optimal results. Even with this limited number of samples, however, near-optimal results were achieved. Our system could therefore be considered as an aid to current technology for detecting chronic diseases such as Cancer and CAD at an early stage. In future work, a larger dataset could be used to obtain a predictive analysis of chronic diseases from BP and PPG waveforms that generalizes better. Ultimately, this could remove the need for patients to visit hospitals solely for early detection. In the near future, embedding these wearable non-invasive sensors in smart devices such as cellphones or iPads would be more cost-effective and time-efficient.
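The confusion-matrix metrics reported in the abstract can be written out explicitly. This is a minimal sketch under stated assumptions: binary labels with 1 as the positive (disease) class, a hypothetical decision threshold of 0.5 on classifier scores, and made-up example values that are not the thesis's data. AUC is computed from the continuous scores across all thresholds rather than from a single confusion matrix; here it is the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count as one half), which equals the area under the empirical ROC curve.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives; returns (tp, fp, tn, fn)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, precision and FPR derived from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)

    def ratio(num, den):
        return num / den if den else float("nan")  # undefined when the denominator is 0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": ratio(tp, tp + fp),    # positive predictive value
        "fpr": ratio(fp, fp + tn),          # false positive rate = 1 - specificity
    }


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney U statistic)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical example: 4 diseased (1) and 6 healthy (0) subjects with classifier scores.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.35, 0.05, 0.6]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
    for name, value in confusion_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
    print(f"auc: {roc_auc(y_true, scores):.3f}")
```

On this toy example the script prints accuracy 0.800, sensitivity 0.750, specificity 0.833, precision 0.750, fpr 0.167 and auc 0.958. Moving the threshold changes every confusion-matrix metric but leaves the AUC unchanged, which is why the two kinds of measures complement each other.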



Table of Contents:
Chapter-1 Introduction
  1.1 Outlook
  1.2 Project Objective
  1.3 Thesis Outline
Chapter-2 What is Machine Learning?
  2.1 Supervised Learning
    2.1.1 Classification and Regression
    2.1.2 Generalization, Overfitting and Underfitting
      2.1.2.1 Resampling and a Validation Dataset
    2.1.3 Supervised Machine Learning Algorithms
      2.1.3.1 k-Nearest Neighbors
      2.1.3.2 Linear Discriminant Analysis
      2.1.3.3 Logistic Regression
      2.1.3.4 Decision Tree
      2.1.3.5 Random Forest
      2.1.3.6 Gaussian Naive Bayes
      2.1.3.7 Support Vector Machine
      2.1.3.8 Multilayer Perceptron (Deep Learning)
      2.1.3.9 Summary of Supervised Machine Learning Models
  2.2 Unsupervised Learning
    2.2.1 Types of Unsupervised Learning
      2.2.1.1 Unsupervised Transformations
        2.2.1.1.1 Principal Component Analysis
        2.2.1.1.2 Non-Negative Matrix Factorization
        2.2.1.1.3 Manifold Learning with t-Distributed Stochastic Neighbor Embedding
      2.2.1.2 Clustering
        2.2.1.2.1 k-Means Clustering
        2.2.1.2.2 Agglomerative Clustering
        2.2.1.2.3 Density-Based Spatial Clustering of Applications with Noise
      2.2.1.3 Summary of Unsupervised Learning Algorithms
Chapter-3 Data Analysis
  3.1 Data Acquisition Techniques
    3.1.1 Blood Pressure
    3.1.2 Photoplethysmography
    3.1.3 Features
  3.2 Problem Identification
  3.3 Exploratory Data Analysis
  3.4 Data Preprocessing
  3.5 Comparison Among Predictive Machine Learning Models
Chapter-4 Graphical User Interface Development
Chapter-5 Results
  5.1 Results for Blood Pressure Data
  5.2 Results for Photoplethysmogram Data
  5.3 Summary of Results
Chapter-6 Discussion
  6.1 Comparison Among Machine Learning Models
  6.2 Comparative Analysis among Data Acquisition Methods
  6.3 Comparative Analysis between Diseases
  6.4 Preliminary Predictive Analysis
Chapter-7 Conclusion and Future Work
  7.1 Conclusion
  7.2 Future Work
References


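Full-text release date: 2024/01/23 (campus network)
Full-text release date: 2039/01/23 (off-campus network)
Full-text release date: 2039/01/23 (National Central Library: Taiwan Theses and Dissertations System)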