
Author: Yi-Ren Yeh (葉倚任)
Thesis Title: Dimension Reduction and Its Applications (資料維度縮減及其應用)
Advisor: Yuh-Jye Lee (李育杰)
Committee Members: Hsing-Kuo Pao (鮑興國), Su-Yun Huang (陳素雲), Yu-Chiang Wang (王鈺強), Chih-Jen Lin (林智仁)
Degree: Doctorate
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2010
Graduation Academic Year: 98 (ROC calendar, 2009-2010)
Language: English
Number of Pages: 112
Keywords (Chinese): dimension reduction, variable selection, data visualization, anomaly detection
Keywords (English): dimension reduction, data visualization, anomaly detection
    With the development of various sensors and the rapid progress of biotechnology and medical diagnostic instruments, modern data sets have become increasingly large and complex in both dimensionality and features. In this dissertation, we therefore propose a framework that analyzes data through nonlinear dimension reduction. The main idea is to project the data onto an effective dimension reduction subspace via a nonlinear dimension reduction method and then analyze the projected data with linear algorithms. We first derive a nonlinear extension of the supervised dimension reduction method sliced inverse regression (SIR). Through this nonlinear dimension reduction, the main linear structure is preserved in the effective dimension reduction subspace, so linear algorithms can be applied to the data in that subspace. In addition to the nonlinear derivation and algorithmic implementation of SIR, we also study the robustness of nonlinear principal component analysis (PCA) and propose an iterative re-weighting scheme that lowers the weights of outliers and thereby suppresses their influence.
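    For reference, the following is a minimal NumPy sketch of classical (linear) SIR, the method whose nonlinear (kernel) extension the dissertation derives; the slicing scheme, function names, and parameter defaults are illustrative assumptions rather than the dissertation's implementation.

import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=2):
    """Classical sliced inverse regression (Li, 1991): a rough sketch."""
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Whiten the predictors, Z = (X - mu) * cov^{-1/2}; assumes a nonsingular covariance.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt
    # Slice the sorted response and average the whitened predictors within each slice.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M give the e.d.r. directions (mapped back to the original scale).
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_dirs]]
    return inv_sqrt @ top   # columns span the effective dimension reduction subspace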

    Furthermore, we apply these dimension reduction methods to anomaly detection and corporate bankruptcy prediction. For anomaly detection, we exploit the sensitivity of PCA to outliers: a leave-one-out (LOO) procedure checks, point by point, how strongly the presence or absence of each observation affects the principal direction, and over-sampling is used to amplify the influence of a suspected outlier. Computationally, an online PCA technique updates the principal direction directly during the LOO procedure, which speeds up the computation and avoids storing the covariance matrix. For bankruptcy prediction, we use the 1-norm support vector machine (SVM) and incremental forward feature selection (IFFS) to select the most informative financial ratios (variable selection), and the experiments clearly demonstrate their effectiveness. Finally, we propose presenting the data in 2-D with pairs of different models to help users reach the final decision.
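    To make the LOO idea concrete, the naive sketch below scores each point by how much over-sampling it rotates the leading principal direction. Unlike the dissertation's method, it recomputes the eigendecomposition for every point instead of updating it online; the over-sampling ratio and function name are illustrative assumptions.

import numpy as np

def ospca_outlier_scores(X, ratio=0.1):
    """Naive over-sampling PCA outlierness (recomputes PCA for each point)."""
    n, p = X.shape

    def leading_direction(A):
        cov = np.cov(A, rowvar=False)
        w, v = np.linalg.eigh(cov)
        return v[:, -1]                     # eigenvector of the largest eigenvalue

    u = leading_direction(X)
    k = max(1, int(ratio * n))              # number of extra copies of the target point
    scores = np.empty(n)
    for i in range(n):
        X_dup = np.vstack([X, np.repeat(X[i:i + 1], k, axis=0)])
        u_i = leading_direction(X_dup)
        scores[i] = 1.0 - abs(u @ u_i)      # deviation of the principal direction
    return scores                           # larger score -> more outlier-like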


    In this dissertation, we develop a dimension reduction framework for learning tasks based on nonlinear dimension reduction methods. We first focus on the kernel extension of sliced inverse regression (SIR), a supervised dimension reduction method, and its implementation. With these nonlinear dimension reduction methods, the main linear features are extracted in the embedded feature space, and many linear algorithms can be applied to the images of the input data in this effective dimension reduction subspace. In addition to kernel SIR, we also study the robustness of kernel principal component analysis (PCA). A class of new robust procedures is proposed based on the eigenvalue decomposition of a weighted covariance matrix. The proposed procedures place less weight on deviant patterns and are therefore more resistant to data contamination and model deviation.
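    The sketch below illustrates the re-weighting idea in the ordinary input space rather than in the kernel-induced feature space treated in the dissertation; the Gaussian weight function, iteration count, and scale choice are illustrative assumptions, not the proposed robust kernel PCA procedure itself.

import numpy as np

def robust_pca_directions(X, n_dirs=2, n_iter=20, scale=None):
    """Iteratively re-weighted PCA: a linear-space sketch of the weighted-covariance idea."""
    n, p = X.shape
    w = np.ones(n) / n
    for _ in range(n_iter):
        mu = w @ X                                       # weighted mean
        Xc = X - mu
        cov = (Xc * w[:, None]).T @ Xc                   # weighted covariance (weights sum to 1)
        evals, evecs = np.linalg.eigh(cov)
        V = evecs[:, np.argsort(evals)[::-1][:n_dirs]]   # top principal directions
        # Residual of projecting each centered point onto the principal subspace.
        resid = Xc - (Xc @ V) @ V.T
        err = (resid ** 2).sum(axis=1)
        s = scale if scale is not None else np.median(err) + 1e-12
        w = np.exp(-err / s)                             # down-weight deviant points
        w /= w.sum()
    return V, w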

    The dissertation concludes with applications of dimension reduction to anomaly detection and bankruptcy prognosis. For anomaly detection, we explore the variation of the principal directions in a leave-one-out (LOO) scenario, and we propose over-sampling PCA to enlarge the outlierness of an outlier. In addition, the principal directions are estimated online during the LOO procedure, which reduces the computational load and meets the demands of online detection. For bankruptcy prognosis, we apply feature selection via the 1-norm support vector machine (SVM) and incremental forward feature selection (IFFS). The results show that selecting accounting ratios via the 1-norm SVM and IFFS performs as well as a greedy search. Moreover, a 2-D visualization scheme based on pairs of different models is proposed to help bank officers make the final decision.
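    As a rough illustration of selecting accounting ratios through a sparse linear model, the sketch below uses scikit-learn's L1-penalized linear SVM (squared hinge loss) as a stand-in for the 1-norm SVM; the regularization parameter, zero threshold, and function name are illustrative assumptions.

import numpy as np
from sklearn.svm import LinearSVC

def select_ratios_l1_svm(X, y, C=0.1):
    """Keep the accounting ratios whose L1-penalized SVM weights are nonzero."""
    clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=C, max_iter=10000)
    clf.fit(X, y)
    weights = clf.coef_.ravel()
    selected = np.flatnonzero(np.abs(weights) > 1e-6)   # indices of retained ratios
    return selected, weights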

    1 Introduction
      1.1 Background
      1.2 Dimension Reduction Framework for Learning Tasks
      1.3 Organization of Thesis
    2 Dimension Reduction
      2.1 Feature Selection
        2.1.1 Filter Approach
        2.1.2 Wrapper Approach
      2.2 Feature Extraction
        2.2.1 Principal Component Analysis
        2.2.2 Sliced Inverse Regression
    3 Smooth Support Vector Machine and ε-Insensitive Regression
      3.1 Smooth Support Vector Machine
      3.2 Smooth ε-Support Vector Regression
      3.3 Reduced Smooth Support Vector Machine
    4 Kernel Sliced Inverse Regression and Its Implementation
      4.1 Motivation and Background
      4.2 RKHS Feature Space
      4.3 KSIR in RKHS: A Geometric Framework and Properties
      4.4 Estimating KSIR Directions via the Reduced SVD
      4.5 Low-rank Approximation to KSIR Matrix
      4.6 Numerical Experiments
        4.6.1 Data Visualization
        4.6.2 KSIR Dimension Reduction for Classification
        4.6.3 KSIR Dimension Reduction for Regression
      4.7 Summary
    5 Robust Kernel Principal Component Analysis
      5.1 Motivation and Background
      5.2 Conventional KPCA and Influence Functions
      5.3 Variants of Robust KPCA and Influence Functions
        5.3.1 Robustness Study on the Feature Sample Space
        5.3.2 Robustness Study on a General Hilbert Space
      5.4 Implementation and Numerical Examples
        5.4.1 Implementation
        5.4.2 Numerical Examples
      5.5 Summary
    6 Applications to Bankruptcy Prognosis and Anomaly Detection
      6.1 Anomaly Detection
        6.1.1 Motivation and Background
        6.1.2 The Influence of an Outlier on Principal Directions
        6.1.3 Over-sampling Principal Component Analysis and Its Online Updating
        6.1.4 Data Cleaning and On-line Anomaly Detection
        6.1.5 Experimental Results
        6.1.6 Summary
      6.2 Bankruptcy Prognosis
        6.2.1 Motivation and Background
        6.2.2 Data Description
        6.2.3 Selection of Accounting Ratios
        6.2.4 Experimental Setting and Results
        6.2.5 Summary
    7 Summary and Conclusion
    A Proofs in Kernel SIR
    B Proofs in Robust Kernel PCA

