
Student: 蘇冠瑛 (Kuan-Ying Su)
Thesis Title: 利用核化逆回歸法實現半監督式學習 (Kernel Sliced Inverse Regression (KSIR) for Semi-supervised Learning)
Advisor: 李育杰 (Yuh-Jye Lee)
Committee Members: 鮑興國 (Hsing-Kuo Kenneth Pao), 陳素雲 (Su-Yun Huang), 葉倚任 (Yi-Ren Yeh), 林軒田 (Hsuan-Tien Lin)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2014
Academic Year of Graduation: 102 (ROC calendar)
Language: English
Number of Pages: 53
Keywords (Chinese): 切片逆回歸法, 核化切片逆回歸法, 維度縮減, 半監督式學習
Keywords (English): Sliced Inverse Regression, Kernel Sliced Inverse Regression, Dimension Reduction, Semi-supervised Learning
    In the era of big data, it is difficult to process high-dimensional and highly complex data directly. Moreover, labeling data demands a great deal of time, manpower, and even money, whereas unlabeled data are plentiful and easy to obtain. Semi-supervised learning analyzes data using a small amount of labeled data together with a large amount of unlabeled data. This thesis realizes semi-supervised learning with kernel sliced inverse regression (KSIR). KSIR finds the effective dimension reduction (e.d.r.) directions through an eigenvalue decomposition built from the sample mean and covariance matrix together with the weighted (slice) means and weighted covariance matrix, and only the weighted means and weighted covariance matrix require the information carried by the labeled data. We therefore extend the method to the semi-supervised setting by replacing the original estimates of the mean and covariance matrix: whereas these were previously computed from the labeled data alone, in semi-supervised KSIR we estimate the mean and covariance matrix from both labeled and unlabeled data, while the weighted means and weighted covariance matrix are still computed from the labeled data only. In this way we make effective use of both labeled and unlabeled data to obtain the e.d.r. directions, project the data onto the resulting low-dimensional space, and finally train a linear SSVM classification model on the projected data. Experimental results show that the proposed method handles high-dimensional data effectively and accurately within a short time.
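    For reference, the eigenproblem behind this description can be written in the standard SIR form below; this is textbook notation of my own choosing, not copied from the thesis, and the kernelized version replaces x by its feature-space image. In the semi-supervised variant, Cov(x) is estimated from labeled and unlabeled data while Cov(E[x | y]) (the weighted/slice-mean covariance) uses labeled data only.

```latex
% e.d.r. directions solve the generalized eigenvalue problem:
% between-slice covariance on the left, total covariance on the right.
\[
  \mathrm{Cov}\big(\mathrm{E}[x \mid y]\big)\,\beta_k
  \;=\; \lambda_k\,\mathrm{Cov}(x)\,\beta_k,
  \qquad k = 1, \dots, d.
\]
```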


    In the age of big data, large and complex collections of data have become difficult to process with on-hand database management tools or traditional data processing applications. In addition, because collecting labeled data is costly, we typically have far more unlabeled data than labeled data. Hence, the problem of utilizing both labeled and unlabeled data for machine learning tasks, known as semi-supervised learning (SSL), is attracting increasing attention. Sliced inverse regression (SIR) is a renowned supervised linear dimension reduction method and has been extended to the nonlinear setting via the kernel trick. Projecting the data instances onto the effective dimension reduction (e.d.r.) subspace extracted by kernelized SIR (KSIR) and then applying a linear classification algorithm such as SVM is a very powerful tool for classification problems. The e.d.r. subspace is generated by solving a generalized eigenvalue problem defined by the sample covariance matrix and the between-class sample covariance matrix. Thus, with good estimates of these kernelized covariance matrices, we can extract a more accurate e.d.r. subspace for classification. Based on this observation, we apply KSIR to the SSL problem. We assume that the labeled instances follow the same class-conditional distributions as the rest of the data, so we can use both labeled and unlabeled data to estimate the kernelized sample covariance matrix and the between-class sample covariance matrix that generate the KSIR e.d.r. subspace. This e.d.r. subspace embeds the class-characteristic information into each KSIR direction. A linear smooth SVM classifier is then applied to the projections of the labeled data onto the e.d.r. subspace. The numerical results show that the proposed method's classification performance in an SSL setting is competitive with that of classifiers working under a supervised learning setting.
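    To make the pipeline in the abstract concrete, here is a minimal Python sketch of the semi-supervised KSIR idea, under assumptions of my own: an RBF kernel, a small ridge term added to the total covariance for numerical stability, and scikit-learn's LinearSVC standing in for the thesis's linear smooth SVM (SSVM). The function name ssl_ksir and the parameters gamma, n_dir, and ridge are illustrative, not taken from the thesis.

```python
# A minimal sketch of semi-supervised KSIR (not the author's exact code).
import numpy as np
from scipy.linalg import eigh
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import LinearSVC


def ssl_ksir(X_labeled, y_labeled, X_unlabeled, gamma=0.1, n_dir=None, ridge=1e-6):
    """Extract KSIR e.d.r. directions from labeled + unlabeled data, return a predictor."""
    y_labeled = np.asarray(y_labeled)
    X_all = np.vstack([X_labeled, X_unlabeled])
    n_all, n_lab = len(X_all), len(X_labeled)

    # Kernel rows of every instance against all instances, column-centered
    # with statistics of the whole (labeled + unlabeled) sample.
    K = rbf_kernel(X_all, X_all, gamma=gamma)
    K_centered = K - K.mean(axis=0, keepdims=True)

    # Total covariance: estimated from ALL data (the semi-supervised step).
    cov_total = np.cov(K_centered, rowvar=False)

    # Between-slice covariance: slice (class) means use the LABELED rows only.
    classes = np.unique(y_labeled)
    labeled_mean = K_centered[:n_lab].mean(axis=0)
    cov_between = np.zeros_like(cov_total)
    for c in classes:
        rows = K_centered[:n_lab][y_labeled == c]
        diff = rows.mean(axis=0) - labeled_mean
        cov_between += (len(rows) / n_lab) * np.outer(diff, diff)

    # Generalized eigenvalue problem: cov_between v = lambda * cov_total v.
    n_dir = n_dir or (len(classes) - 1)
    eigvals, eigvecs = eigh(cov_between, cov_total + ridge * np.eye(n_all))
    directions = eigvecs[:, np.argsort(eigvals)[::-1][:n_dir]]

    # Project labeled instances and fit a linear classifier in the e.d.r. space.
    Z_labeled = K_centered[:n_lab] @ directions
    clf = LinearSVC().fit(Z_labeled, y_labeled)

    def predict(X_new):
        K_new = rbf_kernel(X_new, X_all, gamma=gamma) - K.mean(axis=0, keepdims=True)
        return clf.predict(K_new @ directions)

    return predict
```

    The semi-supervised change is confined to cov_total, which is estimated from all rows of the centered kernel matrix; a purely supervised KSIR would use only the labeled rows there. A typical call would be predict = ssl_ksir(X_lab, y_lab, X_unlab) followed by predict(X_test).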

    1 Introduction 1
      1.1 Background and Motivation 1
      1.2 Methods and Objectives 2
      1.3 Notation 4
      1.4 Organization of Thesis 5
    2 Related Work 6
      2.1 Label Generation 7
      2.2 Low-density Separation Methods 9
      2.3 Graph-Based Methods 11
      2.4 Change of Representation 13
    3 Kernel Sliced Inverse Regression (KSIR) 15
      3.1 Kernel Sliced Inverse Regression - KSIR 18
        3.1.1 KSIR for Classification 23
        3.1.2 Extension for KSIR directions 23
    4 KSIR for Semi-supervised Learning 27
    5 Experimental Results 35
      5.1 Experimental Setting 35
      5.2 Data Set 37
      5.3 Numerical Results 39
    6 Conclusion and Future Work 48

