| Graduate Student: | 楊超霆 (Chao-Ting Yang) |
| --- | --- |
| Thesis Title: | 基於P值與主成分分析之樸素貝葉斯分類演算法 (A Naive Bayes' Classifier Based on the p-values of Features and Principal Component Analysis) |
| Advisor: | 楊維寧 (Wei-Ning Yang) |
| Oral Defense Committee: | 陳雲岫 (Yun-Shiow Chen), 呂永和 (Yung-Ho Leu) |
| Degree: | Master |
| Department: | College of Management, Department of Information Management |
| Publication Year: | 2022 |
| Graduation Academic Year: | 110 |
| Language: | Chinese |
| Pages: | 13 |
| Chinese Keywords: | 高斯分布 (Gaussian distribution), P值 (p-value), 多重共線性 (multicollinearity), 主成分分析 (principal component analysis), 樸素貝葉斯分類器 (naive Bayes' classifier) |
| English Keywords: | Gaussian distributions, p-value, multicollinearity, PCA, naive Bayes' classifier |
Experiments have conventionally assumed that the data follow a Gaussian distribution, but studies on real datasets show that this assumption is often implausible. This study therefore proposes an algorithm based on the p-value: without making any distributional assumption on the features, it computes, from the training data, the proportion of feature values within a class that are more extreme than the feature value of a testing instance, so that the p-value can classify the data more effectively. The magnitude of the p-value reflects the degree of discrepancy between a testing instance's feature value and a class: a larger p-value indicates that the testing instance is closer to that class and is therefore more likely to be classified into it. To alleviate the problem of multicollinearity among features, principal component analysis (PCA) is first applied to transform all features into mutually uncorrelated principal components; the p-value of each principal component is then computed from the empirical distribution of the training data's principal components. This study combines p-values and PCA with the naive Bayes' classifier, and the experimental results show that the proposed method effectively improves the classification performance of the naive Bayes' classifier.
Empirical studies on real datasets often indicate that the assumption of Gaussian-distributed features may not be plausible. Without distributional assumptions on the features, the proportion of feature values belonging to a specific class in the training data that are more extreme than the feature value of the testing instance, called the p-value, is used to judge the likelihood that the testing instance falls in that class. The magnitude of a p-value reflects the discrepancy between the feature value of the testing instance and the expected feature value in the class. For a specific feature, a large p-value indicates that the testing instance is consistent with the expected instance in the class and is therefore more likely to be classified into that class. To alleviate the problem of multicollinearity among features, principal component analysis (PCA) is first used to transform the original features into uncorrelated principal components. The p-value corresponding to each principal component is then evaluated from the empirical distribution of that principal component in the training data. The proposed method, which combines the p-values of features with PCA, is studied for a naive Bayes' classifier. Empirical experiments show that the proposed method achieves improvements over the Gaussian naive Bayes' classifier.
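The procedure described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the thesis's actual code: the two-sided empirical p-value and the sum-of-log-p-values combination rule are assumptions about how "more extreme" and the naive (independence-based) combination are operationalized, and the class name `PValuePCANB` is invented for this sketch.

```python
import numpy as np

def two_sided_p(train_col, x):
    """Two-sided empirical p-value: the proportion of training values
    for one principal component that are at least as extreme as x
    (one plausible reading of 'more extreme' in the abstract)."""
    left = np.mean(train_col <= x)
    right = np.mean(train_col >= x)
    return min(1.0, 2.0 * min(left, right))

class PValuePCANB:
    """Sketch of a p-value/PCA naive-Bayes-style classifier."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # PCA via SVD of the centered data: transforms the original
        # features into mutually uncorrelated principal components.
        self.mean_ = X.mean(axis=0)
        Xc = X - self.mean_
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        self.components_ = Vt
        Z = Xc @ Vt.T  # scores of the training data on the components
        self.class_scores_ = {c: Z[y == c] for c in self.classes_}
        self.priors_ = {c: np.mean(y == c) for c in self.classes_}
        return self

    def predict(self, X):
        Z = (X - self.mean_) @ self.components_.T
        preds = []
        for z in Z:
            best, best_score = None, -np.inf
            for c in self.classes_:
                train = self.class_scores_[c]
                # Naive combination: sum log p-values over components,
                # clipped to avoid log(0) for values outside the class range.
                logp = sum(
                    np.log(max(two_sided_p(train[:, j], z[j]), 1e-12))
                    for j in range(train.shape[1])
                )
                score = np.log(self.priors_[c]) + logp
                if score > best_score:
                    best, best_score = c, score
            preds.append(best)
        return np.array(preds)
```

A testing instance is assigned to the class whose empirical p-values (one per principal component) are jointly largest, mirroring how a Gaussian naive Bayes' classifier multiplies per-feature likelihoods.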