Graduate Student: 鄔育琳 (Yu-Lin Wu)
Thesis Title: 植基於反應變數與主成份相關性之核函數特徵選取方法 (A Feature Selection Strategy of Kernel Function Based on the Correlation between Response Variable and Principal Components)
Advisor: 楊維寧 (Wei-Ning Yang)
Committee Members: 呂永和 (Yung-ho Leu), 陳雲岫 (Yun-Shiow Chen)
Degree: Master
Department: Department of Information Management, School of Management
Year of Publication: 2021
Graduation Academic Year: 109
Language: Chinese
Number of Pages: 27
Keywords: Kernel Function, Radial Basis Function Kernel, Nonlinear Problem, Dimensional Expansion, Principal Components Analysis, Feature Selection, Coefficient of Determination, Logistic Regression
    In machine learning, data analysis becomes a challenging task when the dimensionality of the attribute vector is either too high or too low. As the dimensionality of the attribute vector grows, the classification model requires substantially more computation and may overfit because of overtraining. In this case, feature selection can be applied to focus on the most influential features, which effectively resolves these problems. Conversely, when the dimensionality of the attribute vector is too low, the correlation between the attribute vector and the response variable tends to be too low as well, which makes it hard for the model to find generalizable features and prevents any further improvement in accuracy. In this case, a kernel function can be used to map the attribute vector into a high-dimensional space to alleviate the problem.

    This study proposes a kernel-function feature selection method based on the correlation between the response variable and the principal components. First, the Radial Basis Function kernel (RBF kernel) is used to map the attribute vector into a high-dimensional space. This handles the linear inseparability that arises in classification problems and, at the same time, expands the dimensionality of the attribute vector so that it relates more closely to the response variable, reaching an accuracy that the original attribute vector cannot attain.
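    For reference, the RBF kernel that performs this dimensional expansion measures the similarity of two attribute vectors x and x' as K(x, x') = exp(-γ‖x − x'‖^2) with γ > 0. One common way to realize the expansion explicitly (an assumption here, since the thesis does not spell out its construction) is to represent each sample by its kernel values against all training samples, which turns a d-dimensional sample into an n-dimensional one.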

    Principal Components Analysis (PCA) is then applied to transform the highly correlated attribute vectors into mutually uncorrelated principal component vectors, thereby removing the multi-collinearity caused by excessively high correlation among the attribute variables.

    The accuracy of the classification model can be assessed from the squared correlation between the response variable and the selected features, that is, the coefficient of determination (R^2). When the principal components are sorted by their squared correlation r^2 and added to the training set one by one in the order Top_1, Top_2, Top_3, ..., the r^2 values can simply be summed to obtain R^2, because the principal components are mutually uncorrelated. We therefore use r^2 as the feature-selection criterion instead of the variance used in traditional principal component analysis.
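    In equation form, if r_(1)^2 ≥ r_(2)^2 ≥ ... denote the squared correlations between the response variable and the principal components sorted in descending order, then selecting the components Top_1, ..., Top_k gives R^2 = r_(1)^2 + r_(2)^2 + ... + r_(k)^2. The contribution of each additional component to the explained variation can thus be read off directly, and this additivity holds precisely because the principal components are mutually uncorrelated.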

    To evaluate the performance of the proposed feature selection method, we use logistic regression to build classification models for the proposed method, for the same method without the kernel function, and for features selected by variance, and we run experiments on the banknote authentication dataset and the heart disease dataset. The results show that, compared with the traditional kernel PCA method and with PCA that merely ranks components by r^2, the proposed feature selection method yields a clearly higher classification accuracy and effectively improves classifier performance.


    In machine learning, data analysis becomes a challenging task when the dimension of the “attribute vector” is either too high or too low. The higher the dimension of the attribute vector, the more computation the classification model requires, and overfitting may occur because of overtraining. In this situation, feature selection can be applied to keep only the most influential features, which solves the problem. Conversely, when the dimension of the attribute vector is low, the correlation between the attribute vector and the “response variable” may also be low; the model then struggles to find generalizable features, and the accuracy cannot be improved any further. In this case, a “kernel function” can be adopted to transform the attribute vector into a high-dimensional space.

    This research proposes a feature selection scheme based on “kernel functions” and on the correlation between “response variables” and “principal components”. To deal with the linear inseparability in classification problems, we adopt the “Radial Basis Function Kernel” to transform the attribute vector into a high-dimensional space. At the same time, the dimensionality of the attribute vector is expanded so that it correlates more strongly with the response variable, allowing the classifier to reach an accuracy that the original attribute vector cannot achieve.
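    As an illustration of this expansion step (a minimal sketch, not the implementation used in the thesis), the snippet below represents each sample by its RBF-kernel similarities to the training samples using scikit-learn's rbf_kernel; the gamma value and the toy data are hypothetical choices.

```python
# Minimal sketch of the RBF dimensional expansion: each sample is represented
# by its kernel similarities to all training samples (an assumed construction).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf_expand(X, gamma=0.5):
    """Map an (n, d) attribute matrix to an (n, n) matrix of RBF similarities.

    Entry (i, j) equals exp(-gamma * ||x_i - x_j||^2); the columns serve as
    the expanded features fed into the subsequent PCA step.
    """
    return rbf_kernel(X, X, gamma=gamma)

# Five 2-dimensional samples become five 5-dimensional feature vectors.
X = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [2.0, 2.0], [3.0, 1.0]])
print(rbf_expand(X).shape)  # (5, 5)
```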

    Then we adopt “Principal Components Analysis” (PCA) to transform the highly correlated attribute vectors into uncorrelated “principal component vectors”, so as to remove the multi-collinearity caused by the high correlation among the “attribute variables”.
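    A minimal sketch of this decorrelation step, assuming scikit-learn's PCA is applied to the kernel-expanded feature matrix; the random matrix below merely stands in for those features.

```python
# Sketch: PCA turns correlated features into mutually uncorrelated components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_expanded = rng.normal(size=(100, 8))   # stand-in for the RBF-expanded features

pca = PCA()                              # keep every component; selection happens later
Z = pca.fit_transform(X_expanded)        # columns are the principal components

# Off-diagonal correlations between components are numerically zero, which is
# what removes the multi-collinearity among the original attribute variables.
corr = np.corrcoef(Z, rowvar=False)
print(np.abs(corr - np.eye(corr.shape[0])).max())  # close to 0
```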

    The accuracy of the classification model can be assessed from the squared correlation between the response variable and the selected features, that is, the “coefficient of determination” (R^2). We rank the principal components by their squared correlation coefficient (r^2) and gradually add Top_1, Top_2, Top_3, ... to the training dataset. Since the principal components are uncorrelated with each other, the r^2 values can be accumulated to obtain R^2. As a result, we apply r^2 as the feature-selection indicator instead of the variance used in traditional principal component analysis.
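    The selection rule can be sketched as follows (an illustrative reading of the procedure, not the thesis code), assuming Z holds the uncorrelated principal components and y is the numeric response; the synthetic data are only for demonstration.

```python
# Rank principal components by squared correlation with the response; the
# running sum of r^2 equals R^2 because the components are uncorrelated.
import numpy as np

def rank_components_by_r2(Z, y):
    """Return component indices sorted by r^2 with y and the cumulative R^2."""
    r2 = np.array([np.corrcoef(Z[:, j], y)[0, 1] ** 2 for j in range(Z.shape[1])])
    order = np.argsort(r2)[::-1]               # Top_1, Top_2, Top_3, ...
    return order, r2[order], np.cumsum(r2[order])

# Illustration: component 3 dominates the response, so it should rank first.
rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 6))
y = 2.0 * Z[:, 3] + 0.5 * Z[:, 0] + rng.normal(scale=0.5, size=200)
order, r2_sorted, cum_R2 = rank_components_by_r2(Z, y)
print(order[:2], cum_R2[:2])                   # cumulative R^2 grows as components are added
```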

    To demonstrate the performance of the proposed feature selection method, we use Logistic Regression to build classification models for the feature selection method proposed in this research, for the same method without the kernel function, and for features selected according to variance. The models are then evaluated on the Banknote Authentication dataset and the Heart Disease dataset. The experimental results show that the proposed method achieves higher classification accuracy than the traditional kernel-PCA method and than the PCA method that only ranks components by r^2, and that it effectively enhances classifier performance.
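    As a rough end-to-end illustration of this comparison (using synthetic stand-in data rather than the Banknote Authentication or Heart Disease files, with illustrative gamma, component counts, and k rather than the thesis settings), the sketch below contrasts the kernel-plus-r^2 selection pipeline with a variance-based PCA baseline under logistic regression.

```python
# Sketch of the comparison experiment: RBF expansion + PCA + top-k components
# by r^2 versus plain PCA with variance-based selection, both with logistic
# regression; data, gamma, and k are placeholders, not the thesis settings.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split

def top_r2_indices(Z, y, k):
    """Indices of the k principal components with the largest r^2 against y."""
    r2 = np.array([np.corrcoef(Z[:, j], y)[0, 1] ** 2 for j in range(Z.shape[1])])
    return np.argsort(r2)[::-1][:k]

X, y = make_classification(n_samples=400, n_features=8, random_state=0)  # stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Proposed scheme: RBF expansion -> PCA -> top-k components by r^2 -> logistic regression.
K_tr = rbf_kernel(X_tr, X_tr, gamma=0.1)
K_te = rbf_kernel(X_te, X_tr, gamma=0.1)
pca = PCA(n_components=20).fit(K_tr)
Z_tr, Z_te = pca.transform(K_tr), pca.transform(K_te)
idx = top_r2_indices(Z_tr, y_tr, k=5)
clf = LogisticRegression(max_iter=1000).fit(Z_tr[:, idx], y_tr)
print("kernel + r^2 selection:", accuracy_score(y_te, clf.predict(Z_te[:, idx])))

# Baseline: plain PCA keeping the k highest-variance components, no kernel expansion.
pca_raw = PCA(n_components=5).fit(X_tr)
clf_raw = LogisticRegression(max_iter=1000).fit(pca_raw.transform(X_tr), y_tr)
print("variance-based baseline:", accuracy_score(y_te, clf_raw.predict(pca_raw.transform(X_te))))
```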

    Table of Contents: Abstract (Chinese); Abstract (English); Acknowledgements; Contents; List of Figures; List of Tables; Chapter 1 Introduction; Chapter 2 Datasets and Research Methods; Chapter 3 Experimental Procedure and Results; Chapter 4 Conclusions and Future Outlook; References
