| Author: | 蔡嘉文 Chia-Wen Tsai |
|---|---|
| Thesis Title: | 植基於反應變數與主成份相關性之多項式核函數特徵選取方法 (A Feature Selection Strategy of Polynomial Kernel Function Based on the Correlation between Response Variable and Principal Components) |
| Advisor: | 楊維寧 Wei-Ning Yang |
| Committee Members: | 呂永和 Yung-ho Leu, 陳雲岫 Yun-Shiow Chen |
| Degree: | Master |
| Department: | Department of Information Management, School of Management |
| Year of Publication: | 2021 |
| Academic Year: | 109 |
| Language: | Chinese |
| Pages: | 20 |
| Keywords: | Kernel Function, Polynomial Kernel, Non-linearly Separable Data, Dimensional Expansion, Principal Components Analysis, Feature Selection, Coefficient of Determination, Logistic Regression |
In machine learning, dealing with low-dimensional, non-linearly separable data is a challenging task. One way to alleviate this difficulty is to apply a kernel function that maps the feature vector from a low-dimensional space into a high-dimensional space. However, increasing the number of feature variables may introduce multicollinearity among them. Principal Components Analysis (PCA) is therefore adopted to resolve the resulting multicollinearity problem.
In this research, a polynomial kernel function is used to increase the dimension of the feature vector, and PCA is then applied to generate uncorrelated principal components. In contrast to traditional PCA, the principal components are ranked by their correlations with the response variable, rather than by their variances, and then serve as candidate features for the classification model.
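As a rough illustration of these two steps (an assumed degree-2 polynomial expansion on synthetic data, not the thesis code or its datasets), the expanded monomials are typically collinear, while the PCA scores are uncorrelated by construction:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in data (not the thesis datasets): two Gaussian features.
X = rng.normal(loc=1.0, size=(300, 2))

# Explicit degree-2 polynomial feature map: (x1, x2) -> (x1, x2, x1^2, x1*x2, x2^2).
Z = np.column_stack([X, X[:, 0] ** 2, X[:, 0] * X[:, 1], X[:, 1] ** 2])

# The expanded columns are strongly correlated (multicollinearity), e.g. x1 vs x1^2.
corr_Z = np.corrcoef(Z, rowvar=False)
print(np.abs(corr_Z - np.eye(5)).max() > 0.5)   # True: large off-diagonal correlations

# PCA via SVD of the centered matrix; the component scores are uncorrelated.
Zc = Z - Z.mean(axis=0)
_, _, Vt = np.linalg.svd(Zc, full_matrices=False)
P = Zc @ Vt.T                                    # principal component scores
corr_P = np.corrcoef(P, rowvar=False)
off = np.abs(corr_P - np.diag(np.diag(corr_P))).max()
print(off < 1e-8)                                # True: scores are mutually uncorrelated
```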
The accuracy of the classification model can be assessed by the coefficient of determination (R^2), i.e., the squared correlation between the response variable and the selected features. We rank the principal components by their squared correlation coefficients (r^2) with the response variable and add them to the training set incrementally in the order Top_1, Top_2, Top_3, .... Since the principal components are mutually uncorrelated, the individual r^2 values can be summed directly to obtain R^2. We therefore adopt r^2, rather than the variance used in traditional principal component analysis, as the feature-selection criterion.
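This additivity can be checked numerically. The sketch below (synthetic regression data and a hand-rolled PCA, purely illustrative) verifies that for mutually uncorrelated principal components, the R^2 of a model fitted on the top-k components equals the sum of their individual r^2 values:

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(0)

# Synthetic data (illustrative, not the thesis datasets).
X = rng.normal(size=(500, 3))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=500)

# Degree-2 polynomial expansion, then PCA scores (uncorrelated columns).
quad = [X[:, i] * X[:, j] for i, j in combinations_with_replacement(range(3), 2)]
Z = np.column_stack([X] + quad)
Zc = Z - Z.mean(axis=0)
_, _, Vt = np.linalg.svd(Zc, full_matrices=False)
P = Zc @ Vt.T

# Squared correlation (r^2) of each component with the response.
r2 = np.array([np.corrcoef(P[:, k], y)[0, 1] ** 2 for k in range(P.shape[1])])
order = np.argsort(r2)[::-1]            # proposed ranking: by r^2, not variance

# R^2 of a linear model on the top-k components equals the sum of their r^2.
k = 4
yc = y - y.mean()
top = P[:, order[:k]]
beta, *_ = np.linalg.lstsq(top, yc, rcond=None)
resid = yc - top @ beta
R2_model = 1 - resid @ resid / (yc @ yc)
print(abs(R2_model - r2[order[:k]].sum()) < 1e-10)  # True: the r^2 values add up to R^2
```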
A logistic regression model is used as the classifier, combined with the proposed feature selection scheme. The proposed method is applied to the Banknote Authentication dataset and the Diabetic Retinopathy dataset for experiments. Experimental results demonstrate that the proposed method achieves higher classification accuracy than the traditional kernel PCA approach.
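A minimal end-to-end sketch of such a pipeline follows, using synthetic non-linearly separable ring data and a plain gradient-descent logistic regression as stand-ins for the actual datasets and implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic ring data: non-linearly separable in 2-D, but separable after a
# degree-2 polynomial expansion (illustrative stand-in for the real datasets).
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 2.0).astype(float)

# Polynomial expansion and PCA scores.
Z = np.column_stack([X, X[:, 0] ** 2, X[:, 0] * X[:, 1], X[:, 1] ** 2])
Zc = Z - Z.mean(axis=0)
_, _, Vt = np.linalg.svd(Zc, full_matrices=False)
P = Zc @ Vt.T

# Select the top-k components by squared correlation with the class label.
r2 = np.array([np.corrcoef(P[:, j], y)[0, 1] ** 2 for j in range(P.shape[1])])
order = np.argsort(r2)[::-1]
k = 3
F = np.column_stack([np.ones(n), P[:, order[:k]]])   # intercept + top-k features

# Plain gradient-descent logistic regression (illustrative classifier only).
w = np.zeros(F.shape[1])
for _ in range(2000):
    p = 1 / (1 + np.exp(-np.clip(F @ w, -30, 30)))   # clipped to avoid overflow
    w -= 0.5 * F.T @ (p - y) / n

acc = (((F @ w) > 0) == (y > 0.5)).mean()
print(acc)   # training accuracy on the r^2-selected components
```

Because the quadratic terms make the ring boundary linear in the expanded space, the components with the highest r^2 carry the class information and the classifier fits the data well.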