
Author: 朱芷萱 (Chih-Hsuan Chu)
Thesis title: 簡單貝氏分類器結合p-值之研究 (Naive Bayesian classifier based on p-values)
Advisor: 楊維寧 (Wei-Ning Yang)
Oral examination committee: 陳雲岫 (Yun-Shiow Chen), 呂永和 (Yung-Ho Leu)
Degree: Master
Department: College of Management, Department of Information Management
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Pages: 30
Chinese keywords: 簡單貝氏分類器, 主成份分析, p-值
English keywords: Naive Bayesian Classifier, Principal Component Analysis, p-value

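This study applies the naive Bayesian classifier, combined with principal component analysis (PCA) and the p-value from statistical inference, to binary classification problems, and selects the most relevant attribute variables to improve classification accuracy. A Bayesian classifier assigns an object to its most probable class based on the object's attribute vector. The probability used to assign an object to each class is called the posterior probability. The posterior probability is obtained after observing the object's attribute vector: the probability of that attribute vector occurring in each class (the likelihood) is used to update the prior probability that the object belongs to each class. For an object with a given attribute vector, the posterior probability of each class is proportional to the product of the prior probability of that class and the probability of the attribute vector occurring in that class. This study applies PCA to remove the correlation among attribute variables, so that the transformed attributes better satisfy the Bayesian classifier's assumption that attributes are independent. The p-value in hypothesis testing reflects the size of the gap between what is actually observed and what would be expected if the hypothesis were true; the smaller the p-value, the larger the gap. This study replaces the likelihood of the attribute vector in each class with the p-value. The proposed method is evaluated experimentally on a breast cancer data set.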
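
The proportionality described in this abstract can be written compactly. The notation below is an assumption introduced for illustration and is not taken from the thesis: $C_k$ is a class, $\mathbf{x} = (x_1, \dots, x_d)$ an attribute vector with $d$ attributes, $\mathbf{z}$ its PCA-transformed counterpart, $S$ a class score, and $p_j^{(k)}(z_j)$ the p-value of the $j$-th transformed attribute under the hypothesis that the object belongs to $C_k$.

```latex
\begin{align}
P(C_k \mid \mathbf{x}) &\propto P(C_k)\, P(\mathbf{x} \mid C_k)
  \approx P(C_k) \prod_{j=1}^{d} P(x_j \mid C_k), \\
S(C_k \mid \mathbf{z}) &\propto P(C_k) \prod_{j=1}^{d} p_j^{(k)}(z_j).
\end{align}
```

The first line is the usual naive Bayes rule, where the product form relies on the independence assumption. The second line is one reading of the proposed substitution, with a p-value taking the place of each per-attribute likelihood; the object would then be assigned to the class with the larger score.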


The naive Bayesian classifier estimates the joint likelihood of a test instance as the product of the likelihoods of the individual features, each estimated from the training data, and then applies Bayes' rule to compute the posterior distribution over classes. The p-value in statistical hypothesis testing, which reflects the discrepancy between the observed sample and the sample expected under a hypothesis, serves a similar purpose and is used to replace the likelihood in the proposed Bayesian classifier. We alleviate the naive assumption of independence among features within each class by applying principal component analysis to obtain uncorrelated transformed features. The joint p-value in the proposed classifier is the product of the p-values associated with each transformed feature, estimated from the training data; together with the prior distribution, it is used to compute the posterior p-value for the test instance. Empirical results demonstrate substantial improvement in classification accuracy compared with existing classification methods.
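
As a rough illustration of the scoring rule this abstract describes, the following Python sketch multiplies a class prior by per-feature p-values and picks the class with the largest product. Several things here are assumptions rather than details from the thesis: the abstract does not say which test produces the p-values, so a two-sided p-value under a per-class normal model is used; the inputs are assumed to be already decorrelated (for example PCA scores), so the PCA step is omitted; and the function names, class labels, and toy numbers are all hypothetical.

```python
from math import prod
from statistics import NormalDist, mean, stdev


def fit(rows, labels):
    """Estimate, per class, the prior and a normal model for each feature."""
    model = {}
    for c in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == c]
        columns = list(zip(*members))  # one tuple of values per feature
        model[c] = {
            "prior": len(members) / len(rows),
            "dists": [NormalDist(mean(col), stdev(col)) for col in columns],
        }
    return model


def two_sided_p(dist, x):
    """Two-sided p-value of x under a normal model (an assumed choice of test)."""
    z = abs(x - dist.mean) / dist.stdev
    return 2.0 * (1.0 - NormalDist().cdf(z))


def scores(model, x):
    """Prior times the product of per-feature p-values, for each class."""
    return {
        c: m["prior"] * prod(two_sided_p(d, v) for d, v in zip(m["dists"], x))
        for c, m in model.items()
    }


def predict(model, x):
    """Return the class whose prior-weighted joint p-value is largest."""
    s = scores(model, x)
    return max(s, key=s.get)


# Made-up toy data: two decorrelated features, two classes.
train_rows = [
    (1.0, 2.0), (1.5, 1.8), (0.8, 2.2), (1.2, 2.1),
    (4.0, 5.0), (4.5, 5.5), (3.8, 4.7), (4.2, 5.1),
]
train_labels = ["benign"] * 4 + ["malignant"] * 4

model = fit(train_rows, train_labels)
print(predict(model, (1.1, 2.0)))
```

A point far from a class mean gets a p-value close to zero for that class, which drives the class score down, in line with the statement that a small p-value signals a large discrepancy. The scores are not normalized; dividing each by their sum would give values that add to one.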

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Naive Bayes Classifier
2.2 Principal Component Analysis
2.3 The p-Value in Hypothesis Testing
2.4 Classification Using p-Values
2.5 Feature Selection
2.6 Literature Review
Chapter 3 Experimental Analysis
3.1 Experimental Environment
3.1.1 Tools
3.1.2 Data Source
3.1.3 Experimental Items
3.2 Experimental Method
3.2.1 Data Preprocessing
3.2.2 Experimental Procedure
3.3 Analysis of Experimental Results
Chapter 4 Conclusion
Appendix


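Full text release date: 2021/07/22 (campus network)
Full text release date: not authorized for public release (off-campus network)
Full text release date: not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)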