
Graduate student: 余姿瑱 (Tzu-Tien Yu)
Thesis title: 發展以亂度為基之監督式離散化方法於腦中風稀少關聯法則探勘
Developing Entropy-Based Discretization Method to Discover the Rare Association Rules of Cerebrovascular Disease
Advisor: 歐陽超 (Chao Ou-Yang)
Oral examination committee: 郭人介 (Ren-Jieh Kuo), 汪漢澄 (Han-Cheng Wang)
Degree: Master
Department: College of Management - Department of Industrial Management
Year of publication: 2017
Academic year of graduation: 105
Language: Chinese
Number of pages: 88
Keywords (Chinese): 腦中風 (stroke), 亂度 (entropy), 稀少關聯法則 (rare association rules), Apriori-Rare
Keywords (English): Cerebrovascular Disease, Entropy, Apriori-Rare, Rare association rules

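Stroke is a high-risk disease well known to the public in Taiwan. Although brain magnetic resonance imaging (MRI) can reveal the degree of cerebral vessel blockage, brain MRI is not part of a routine health check-up, so unless people arrange a dedicated examination they generally have no way of knowing how blocked their cerebral vessels are. Routine health check-ups, by contrast, can be scheduled regularly. This study therefore uses a brain health examination dataset (comprising routine check-up data and diagnostic data from MRI results) to mine the associations between routine check-up items and stroke.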
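Because the Apriori association analysis method relies on the concept of frequent itemsets to find association rules that satisfy a minimum support and a minimum confidence, the rules it mines tend to be already-known causes of stroke (for example, advanced age, high blood pressure, high blood lipids, high cholesterol). In practice, however, many patients with cerebrovascular lesions show only some of these characteristics (for example, middle age, normal BMI, high blood lipids, high cholesterol), and such cases are often overlooked because they occur less frequently. People with these characteristics may nonetheless be at risk of a future stroke. This study therefore mines the itemsets with the next-highest support among routine check-up items and uses the resulting knowledge to help check-up physicians prevent stroke.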
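To make the support and confidence thresholds concrete, the following minimal Python sketch computes both measures for one common rule and one rare rule. The records, item names, and the 0.3 threshold are hypothetical values chosen for illustration; they are not taken from the thesis dataset.

```python
from typing import FrozenSet, List

# Hypothetical toy records (not from the thesis dataset): each set holds the
# discretized check-up items observed for one person.
RECORDS: List[FrozenSet[str]] = [
    frozenset({"age=high", "bp=high", "lipid=high", "stroke"}),
    frozenset({"age=high", "bp=high", "lipid=high", "stroke"}),
    frozenset({"age=high", "bp=high", "stroke"}),
    frozenset({"age=high", "bp=high", "stroke"}),
    frozenset({"age=high", "bp=high"}),
    frozenset({"age=mid", "bmi=normal", "lipid=high", "stroke"}),
    frozenset({"age=mid", "bmi=normal"}),
    frozenset({"age=mid", "bmi=normal"}),
    frozenset({"age=mid", "bp=high"}),
    frozenset({"age=high", "lipid=high"}),
]


def count(itemset: FrozenSet[str], records: List[FrozenSet[str]]) -> int:
    """Number of records that contain every item in `itemset`."""
    return sum(1 for record in records if itemset <= record)


def support(itemset: FrozenSet[str], records: List[FrozenSet[str]]) -> float:
    """Fraction of records that contain the itemset."""
    return count(itemset, records) / len(records)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               records: List[FrozenSet[str]]) -> float:
    """Among records matching the antecedent, the share that also match the consequent."""
    return count(antecedent | consequent, records) / count(antecedent, records)


STROKE = frozenset({"stroke"})
common = frozenset({"age=high", "bp=high"})
rare = frozenset({"age=mid", "bmi=normal", "lipid=high"})

MIN_SUPPORT = 0.3  # assumed threshold, chosen only for illustration

for name, antecedent in (("common", common), ("rare", rare)):
    s = support(antecedent | STROKE, RECORDS)
    c = confidence(antecedent, STROKE, RECORDS)
    print(f"{name}: support={s:.2f} confidence={c:.2f} kept={s >= MIN_SUPPORT}")
```

In this toy data the rare rule has perfect confidence but a support of only 0.1, so a frequency-driven miner with the assumed threshold would discard it. That is the kind of pattern the abstract describes as being overlooked.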
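This study first uses entropy to find the combination of intervals with the lowest degree of disorder, which benefits the subsequent association rule results. Because itemsets that represent not-yet-widely-known knowledge tend to have lower support than itemsets that represent known knowledge, the Apriori-Rare algorithm is then used to find itemsets with relatively low support. This uncovers the unnoticed-item rules for stroke that are the target of this study and yields an association-based prediction model of secondary attribute intervals that may lead to stroke.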
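The idea of choosing intervals by entropy can be illustrated with a single binary cut. The sketch below picks the cut point that minimizes the class-weighted entropy of the two resulting intervals, in the spirit of standard supervised entropy discretization. It is not the thesis's exact procedure, and the feature values and labels are hypothetical.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of the class labels in one interval."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cut point, weighted entropy) of the best single binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_h = math.inf
    best_value: Optional[float] = None
    for i in range(1, n):
        # A cut is only possible between two distinct feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if weighted < best_h:
            best_h = weighted
            best_value = (pairs[i - 1][0] + pairs[i][0]) / 2
    if best_value is None:
        raise ValueError("need at least two distinct feature values")
    return best_value, best_h


# Hypothetical data: a continuous check-up measurement and a 0/1 diagnosis label.
MEASUREMENT = [110, 118, 125, 132, 145, 150, 160, 172]
HAS_LESION = [0, 0, 0, 0, 1, 1, 1, 1]

cut, h = best_cut(MEASUREMENT, HAS_LESION)
print(f"cut at {cut}, weighted entropy {abs(h):.3f}")
```

Applying the same search recursively inside each interval, with a stopping rule, would produce multiple intervals per attribute. The stopping rule is a design choice that this sketch leaves open.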


Stroke has long been recognized as a major threat to health in Taiwan and worldwide. Detecting stroke through brain imaging such as magnetic resonance imaging (MRI) is costly, so most people do not know their cerebrovascular condition.
Apriori is a well-known association mining method. It identifies high-frequency itemsets that satisfy the required minimum support and confidence. When applied to cerebrovascular disease data, however, the itemsets it finds are usually already well known, precisely because it mines high-frequency patterns. In practice, many patients with cerebrovascular disease present with only a few lesser-known characteristics, such as medium BMI and middle age. This research therefore proposes an entropy-based discretization method combined with the Apriori-Rare method to identify association rules that capture such lesser-known knowledge.
The approach has two stages. In the feature discretization stage, entropy is used as an index to identify feature intervals with low uncertainty. The identified intervals are then passed to the Apriori-Rare algorithm to find association rules involving lesser-known characteristics. These rules can serve as a reference for doctors in identifying potential cerebrovascular disease patients.

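Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Background
  1.2 Research Objectives
  1.3 Research Issues
    1.3.1 Association Rules for Unnoticed Stroke Items
    1.3.2 Attribute Discretization
  1.4 Significance
  1.5 Thesis Organization
Chapter 2 Literature Review
  2.1 Symptoms and Classification of Cerebrovascular Disease
  2.2 Association Rules
  2.3 Discretization
  2.4 Attribute Selection
Chapter 3 Research Method
  3.1 Research Process and Framework
  3.2 Data Preprocessing
    3.2.1 Data Organization
    3.2.2 Handling Imbalanced Data
    3.2.3 Data Normalization
  3.3 Defining Unnoticed-Item Rules
    3.3.1 Association Rules
    3.3.2 Rare Association Rules
  3.4 Attribute Discretization and Selection
    3.4.1 Entropy Discretization
    3.4.2 Attribute Selection
  3.5 Apriori-Rare
Chapter 4 Case Study and Experimental Results
  4.1 Data Description
  4.2 Data Preprocessing
    4.2.1 Data Organization
    4.2.2 Sampling
    4.2.3 Data Transformation
  4.3 Association Knowledge for Unnoticed Stroke Items
  4.4 Experimental Results and Analysis
    4.4.1 Attribute Discretization
    4.4.2 Attribute Selection
    4.4.3 Association Rule Analysis
    4.4.4 Analysis of Results
Chapter 5 Conclusions and Recommendations
  5.1 Conclusions
  5.2 Research Limitations and Future Recommendations
References
Appendix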

