| Field | Value |
|---|---|
| Author | 簡立仁 Li-Jen Chien |
| Thesis Title | Robust Smooth Support Vector Machine Learning (穩健平滑支撐向量機學習機制之研究) |
| Advisor | 李育杰 Yuh-Jye Lee |
| Committee Members | 鮑興國 Hsing-Kuo Kenneth Pao, 卓政宏, 王鈺強 Yu-Chiang Frank Wang, 陶金旺 C. W. Tao |
| Degree | Doctoral |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Publication Year | 2010 |
| Graduation Academic Year | 98 |
| Language | English |
| Number of Pages | 90 |
| Keywords | clustering, feature selection, multiclass classification, outlier filtering, robustness, support vector machine, smoothing technique |
This dissertation proposes four robust smooth support vector machine learning methodologies. The first addresses how to obtain a representative reduced set for RSVM. We propose the clustering reduced support vector machine (CRSVM), which computes a specified number of cluster centroids for each class to serve as the reduced set, then estimates the density of each cluster and uses it to infer the width parameter of the Gaussian kernel built on those centroids. This greatly reduces the training time, and experiments show that even with a smaller reduced set its accuracy remains comparable to that of the original RSVM.
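The Python sketch below illustrates the reduced-set construction just described, assuming a binary dataset `X`, `y`. The function names (`crsvm_reduced_set`, `gaussian_kernel`) and the density-to-width heuristic are illustrative only and are not the exact formulas used in the dissertation.

```python
# Minimal sketch of the CRSVM reduced-set idea: per-class k-means centroids
# form the reduced set, and a crude per-cluster density estimate sets the
# Gaussian kernel width. Heuristic details here are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def crsvm_reduced_set(X, y, n_centroids_per_class=10):
    centroids, widths = [], []
    for label in np.unique(y):
        Xc = X[y == label]
        km = KMeans(n_clusters=n_centroids_per_class, n_init=10).fit(Xc)
        centroids.append(km.cluster_centers_)
        for j in range(n_centroids_per_class):
            pts = Xc[km.labels_ == j]
            # Denser cluster (smaller average distance to its centroid)
            # gets a narrower kernel width.
            avg_dist = np.mean(np.linalg.norm(pts - km.cluster_centers_[j], axis=1))
            widths.append(avg_dist if avg_dist > 0 else 1.0)
    return np.vstack(centroids), np.array(widths)

def gaussian_kernel(X, centroids, widths):
    # K[i, j] = exp(-||x_i - c_j||^2 / (2 * width_j^2))
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * widths[None, :] ** 2))
```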
Second, we replace the 2-norm loss term in the objective function of the SSVM2 classifier with a 1-norm loss term to build the robust binary classifier SSVM1. For further robustness, we also design an outlier-filtering step inside the Newton method used to train SSVMs: at each iteration, suspected outliers are removed from the training set and excluded from the remaining learning process. Experiments show that the resulting SSVM classifiers perform better and are more stable.
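A minimal Python sketch of this outlier-filtering idea follows. The smooth plus function matches the standard SSVM smoothing of max(x, 0); the filtering rule (dropping the points with the largest hinge residuals between outer solver rounds rather than inside every Newton step) and the quantile threshold are assumptions made for illustration, not the dissertation's exact procedure.

```python
# Sketch: train a smoothed 1-norm-loss linear SVM, filtering suspected
# outliers from the training set between solver rounds.
import numpy as np
from scipy.optimize import minimize

def smooth_plus(x, alpha=5.0):
    # Smooth approximation of max(x, 0): x + (1/a) * log(1 + exp(-a*x)).
    return x + np.logaddexp(0.0, -alpha * x) / alpha

def ssvm1_objective(wb, X, y, C, alpha=5.0):
    w, b = wb[:-1], wb[-1]
    margins = 1.0 - y * (X @ w + b)
    # Regularization term plus smoothed 1-norm (sum of hinge) loss.
    return 0.5 * (w @ w + b * b) + C * np.sum(smooth_plus(margins, alpha))

def train_ssvm1_with_filtering(X, y, C=1.0, rounds=3, keep_quantile=0.95):
    idx = np.arange(len(y))
    wb = np.zeros(X.shape[1] + 1)
    for _ in range(rounds):
        res = minimize(ssvm1_objective, wb, args=(X[idx], y[idx], C), method="BFGS")
        wb = res.x
        # Drop training points with the largest hinge residuals as suspected outliers.
        resid = np.maximum(0.0, 1.0 - y[idx] * (X[idx] @ wb[:-1] + wb[-1]))
        idx = idx[resid <= np.quantile(resid, keep_quantile)]
    return wb
```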
Third, inspired by the development of SSVM1, we further apply the smoothing technique to the 1-norm SVM; because its 1-norm penalty term carries the semantics of LASSO, we name the result smooth LASSO for classification (SLASSO). Experiments show that SLASSO performs well on classification and feature selection simultaneously.
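The SLASSO idea can be sketched by smoothing the absolute-value terms of the 1-norm penalty as well, using |t| ≈ p(t, α) + p(−t, α) with the same smooth plus function. This approximation and the thresholding used below to read off the selected features are illustrative assumptions, not the dissertation's exact formulation.

```python
# Sketch: a differentiable surrogate of a 1-norm (LASSO-style) SVM objective,
# giving classification and implicit feature selection at once.
import numpy as np
from scipy.optimize import minimize

def smooth_plus(x, alpha=5.0):
    return x + np.logaddexp(0.0, -alpha * x) / alpha

def smooth_abs(t, alpha=5.0):
    # |t| ~= p(t, alpha) + p(-t, alpha)
    return smooth_plus(t, alpha) + smooth_plus(-t, alpha)

def slasso_objective(wb, X, y, C, alpha=5.0):
    w, b = wb[:-1], wb[-1]
    margins = 1.0 - y * (X @ w + b)
    # Smoothed 1-norm penalty on w drives many weights toward zero.
    return np.sum(smooth_abs(w, alpha)) + C * np.sum(smooth_plus(margins, alpha))

def fit_slasso(X, y, C=1.0):
    wb0 = np.zeros(X.shape[1] + 1)
    return minimize(slasso_objective, wb0, args=(X, y, C), method="BFGS").x

def selected_features(w, tol=1e-3):
    # Features whose weights survive the smoothed 1-norm penalty.
    return np.where(np.abs(w) > tol)[0]
```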
Finally, to remedy the drawbacks of common multiclass classification frameworks, we propose the one-vs.-one-vs.-rest (OOR) method. It resembles the widely used one-vs.-one scheme, but its base classifier is a newly designed ternary classifier, TSSVM, which provides an abstention ("invalid vote") mechanism so that a subproblem is not disturbed by forced votes from irrelevant classes. Experimental results show that OOR not only outperforms other methods in multiclass accuracy; an analysis of the vote contents also confirms that OOR indeed filters out votes from irrelevant classes, greatly increasing the confidence of the voting result. Because of this property, we argue that OOR can automatically detect hidden classes, and through a "leave-one-class-out" experimental design we show that OOR detects hidden classes significantly better than both one-vs.-one and one-vs.-rest.
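A hedged sketch of the OOR voting rule is given below. Here `ternary_predict` stands in for a trained TSSVM on the pair (i, j) that may abstain, and the hidden-class threshold is an assumed parameter rather than the dissertation's exact criterion.

```python
# Sketch: OOR voting over k(k-1)/2 ternary classifiers, each voting for
# class i, class j, or abstaining ("rest"). A winner with too few votes
# can be flagged as a hidden (unseen) class.
from itertools import combinations

def oor_predict(ternary_predict, x, classes, hidden_threshold=None):
    votes = {c: 0 for c in classes}
    for i, j in combinations(classes, 2):
        vote = ternary_predict(i, j, x)   # returns i, j, or None (abstain)
        if vote is not None:
            votes[vote] += 1
    winner = max(votes, key=votes.get)
    # With k classes, the true class can collect at most k-1 votes; far fewer
    # votes suggests the point belongs to a class unseen during training.
    if hidden_threshold is not None and votes[winner] < hidden_threshold:
        return "hidden"
    return winner
```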
This dissertation proposes four robust smooth support vector machine learning methodologies. First, we propose a new approach to generate a representative reduced set for RSVM. The clustering reduced support vector machine (CRSVM) generates cluster centroids of each class and uses them to form the reduced set; by estimating the approximate density of each cluster, we can compute the width parameter used in the Gaussian kernel. Second, we modify the previous 2-norm soft-margin smooth support vector machine (SSVM2) to propose a new 1-norm soft-margin smooth support vector machine (SSVM1). We also propose a heuristic outlier-filtering method for SSVMs that adds little cost to the training process and greatly improves outlier resistance. Third, we introduce the smoothing technique into the 1-norm SVM and call the result smooth LASSO for classification (SLASSO), which provides simultaneous classification and feature selection. Results show that SLASSO achieves slightly better accuracy than other approaches while retaining the desirable ability of feature suppression. Finally, we implement a ternary SSVM (TSSVM) and use it to design a novel multiclass classification scheme, one-vs.-one-vs.-rest (OOR), which decomposes a k-class problem into a series of k(k-1)/2 ternary classification subproblems. Results show that TSSVM/OOR performs better than one-vs.-one and one-vs.-rest, and that the prediction confidence of OOR is significantly higher than that of the one-vs.-one scheme. Owing to the nature of the OOR design, it can be applied directly to detect a hidden (unknown) class. A "leave-one-class-out" experiment on the pendigits dataset shows that OOR outperforms one-vs.-one and one-vs.-rest in hidden-class detection rate.