
Author: Zhi-Peng Kao (高志鵬)
Thesis Title: Outlier Resistance of 1-Norm Soft Margin Smooth SVM and Smooth LASSO (異常點阻抗之平滑支撐向量法和平滑LASSO)
Advisor: Yuh-Jye Lee (李育杰)
Committee Members: Yuan-chin Ivan Chang (張源俊), Su-Yun Huang (陳素雲), Hsing-Kuo Pao (鮑興國), Bi-Ru Dai (戴碧如)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2008
Academic Year of Graduation: 96
Language: English
Number of Pages: 64
Keywords: classification, feature selection, least absolute shrinkage and selection operator, outlier, smooth support vector machine, support vector machine

In classification with Support Vector Machines (SVMs), we examine the loss and penalty terms in depth and find that the choice of norm affects both outlier resistance and feature selection. Based on our studies of and experiments on the loss term of SVMs, we argue that the 1-norm loss resists outliers better than the 2-norm loss. We therefore modify the earlier 2-norm soft margin Smooth Support Vector Machine (SSVM2) into a 1-norm soft margin version (SSVM1). We also propose a simple outlier-filtering method that further improves outlier resistance. Our experiments show that SSVM1 performs well on both benchmark datasets and datasets contaminated with outliers. Because its penalty term is measured in the 1-norm, the Least Absolute Shrinkage and Selection Operator (LASSO) can automatically select input features. In addition, we introduce the smoothing technique into LASSO, which we call Smooth LASSO (SLASSO), to achieve simultaneous classification and feature selection. In our experiments, SLASSO performs best in terms of both accuracy and the number of selected features.
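To make the contrast concrete, the following display is a minimal sketch of the two smoothed objectives, written in the notation commonly used in the smooth SVM literature rather than taken from this record: $A$ is the data matrix, $D$ the diagonal matrix of $\pm 1$ labels, $\mathbf{1}$ the all-ones vector, $C$ the trade-off weight, and $\alpha$ the smoothing parameter; all of these symbols are illustrative assumptions. The smooth plus function $p(x,\alpha)=x+\frac{1}{\alpha}\log(1+e^{-\alpha x})$, applied componentwise, stands in for the plus function $(x)_+$, and the two models differ only in whether the slack enters the objective squared or linearly:

$$\min_{w,b}\ \tfrac{C}{2}\,\bigl\|p\bigl(\mathbf{1}-D(Aw+b\mathbf{1}),\alpha\bigr)\bigr\|_2^2+\tfrac{1}{2}\bigl(\|w\|_2^2+b^2\bigr)\qquad\text{(SSVM2, 2-norm loss)}$$

$$\min_{w,b}\ C\,\mathbf{1}^{\top}p\bigl(\mathbf{1}-D(Aw+b\mathbf{1}),\alpha\bigr)+\tfrac{1}{2}\bigl(\|w\|_2^2+b^2\bigr)\qquad\text{(SSVM1, 1-norm loss)}$$

A misclassified point with slack $\xi$ contributes $\xi^2$ to the first objective but only $\xi$ to the second, so a single point lying far on the wrong side of its margin pulls the 2-norm solution much harder than the 1-norm one; this is the intuition behind the outlier-resistance comparison above.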


In classification tasks, we probe deeper into the loss and penalty terms of standard SVMs and find that the different measurements used for these terms influence the performance of outlier resistance and feature selection. Based on our studies of and experiments on the loss term of SVMs, we show that the 1-norm error is better than the 2-norm error for outlier resistance. Thus, we modify the previous 2-norm soft margin Smooth Support Vector Machine (SSVM2) into the 1-norm soft margin SSVM1. We also propose a very simple idea for outlier filtering, which can improve outlier resistance. Our experimental results show that SSVM1 performs well on benchmark datasets as well as on datasets with outliers. Because its penalty term is measured in the 1-norm, the Least Absolute Shrinkage and Selection Operator (LASSO) can automatically select input features. In addition, we introduce the smoothing technique into LASSO, calling the result Smooth LASSO (SLASSO), to provide simultaneous classification and feature selection. In the experiments on SLASSO, we compare it with other approaches from the "wrapper" and "filter" models for feature selection. The results show that SLASSO achieves slightly better accuracy than the other approaches while retaining the desirable ability of feature suppression.
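As a rough illustration of how the same smoothing idea can be attached to the LASSO penalty, the sketch below is one natural reading of SLASSO rather than the thesis's exact formulation: the data-fit term $L$ is left unspecified, $\lambda$ is a hypothetical regularization weight, and the absolute value is smoothed through the identity $|t|=(t)_+ +(-t)_+$ using the same plus-function smoothing as before:

$$\min_{w,b}\ L\bigl(Aw+b\mathbf{1},\,y\bigr)+\lambda\sum_{j}\bigl(p(w_j,\alpha)+p(-w_j,\alpha)\bigr),\qquad p(t,\alpha)=t+\tfrac{1}{\alpha}\log\bigl(1+e^{-\alpha t}\bigr).$$

As $\alpha$ grows, $p(w_j,\alpha)+p(-w_j,\alpha)$ approaches $|w_j|$, so the smoothed penalty keeps the sparsity-inducing behaviour of the 1-norm that underlies automatic feature selection, while the objective becomes twice differentiable and can be minimized with a Newton-Armijo method of the kind listed in the table of contents.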

1 Introduction
1.1 Background
1.2 Organization of Thesis
1.3 Notation
2 Outlier Effect on Support Vector Machines
2.1 1-Norm and 2-Norm Soft Margin SVMs
2.2 Outlier Effect
3 Smooth Support Vector Machines
3.1 2-Norm Soft Margin Smooth SVM: SSVM2
3.2 1-Norm Soft Margin Smooth SVM: SSVM1
3.3 Nonlinear SSVM1
3.4 A Newton-Armijo Algorithm for SSVM1
3.5 Implementation for SSVM1
4 Experimental Tests for SSVM1
4.1 Heuristics for Outlier Filtering
4.2 Numerical Results and Comparisons
5 Smooth LASSO for Classification
5.1 Introduction
5.2 SLASSO for Classification
5.3 A Newton-Armijo Algorithm for SLASSO on Classification
5.4 Implementation
5.5 Experimental Tests
5.5.1 Data Presentation and Experimental Setup
5.5.2 Numerical Results and Comparisons
6 Conclusion

