
Author: Li-Jen Chien (簡立仁)
Thesis Title: Robust Smooth Support Vector Machine Learning (穩健平滑支撐向量機學習機制之研究)
Advisor: Yuh-Jye Lee (李育杰)
Committee Members: Hsing-Kuo Kenneth Pao (鮑興國), 卓政宏, Yu-Chiang Frank Wang (王鈺強), C. W. Tao (陶金旺)
Degree: Doctor
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2010
Graduating Academic Year: 98
Language: English
Number of Pages: 90
Chinese Keywords: 群集法、特徵選取、多元分類、異常濾除、穩健性、支撐向量法、平滑法
English Keywords: clustering, feature selection, multiclass classification, outlier filtering, robustness, support vector machine, smoothing technique
    This dissertation proposes four robust smooth support vector machine (SSVM) learning mechanisms. It first investigates how to obtain a representative reduced set for RSVM. We propose the clustering reduced support vector machine (CRSVM), which computes a specified number of cluster centroids for each class to serve as the reduced set, estimates the density of each cluster, and from these densities infers the width parameter of the Gaussian kernel built on the centroids. This greatly reduces the time required for training, and experiments show that even with a smaller reduced set its accuracy remains comparable to that of the original RSVM.
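
    A rough Python sketch of this construction is given below; it is an illustrative reading under stated assumptions (scikit-learn's KMeans, the particular spread-based width rule, and all function names are choices made here, not the dissertation's implementation).

# Illustrative CRSVM-style reduced-set construction: k-means centroids of
# each class form the reduced set, and each centroid's Gaussian kernel
# width is estimated from the spread (a density proxy) of its cluster.
import numpy as np
from sklearn.cluster import KMeans

def build_reduced_set(X, y, centroids_per_class=10):
    """Return reduced-set points and a width estimate for each centroid."""
    centers, widths = [], []
    for label in np.unique(y):
        Xc = X[y == label]
        k = min(centroids_per_class, len(Xc))
        km = KMeans(n_clusters=k, n_init=10).fit(Xc)
        for j in range(k):
            members = Xc[km.labels_ == j]
            c = km.cluster_centers_[j]
            # Mean distance of members to their centroid serves as a simple
            # density proxy for the kernel width (an assumption made here).
            widths.append(np.mean(np.linalg.norm(members - c, axis=1)) + 1e-12)
            centers.append(c)
    return np.array(centers), np.array(widths)

def gaussian_kernel(X, centers, widths):
    """Rectangular kernel K[i, j] = exp(-||x_i - c_j||^2 / (2 * w_j^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * widths[None, :] ** 2))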

    We then modify the 2-norm loss term in the objective function of the SSVM2 classifier, adopting a 1-norm loss term to build the robust binary classifier SSVM1. To further improve robustness, we also design a step in the Newton-method training of the SSVMs that removes outliers from the training set at each iteration so that they take no part in subsequent learning. Experiments show that the resulting SSVM classifiers perform better and are more stable.
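
    One plausible way to read this per-iteration filtering is sketched in Python below: after each optimization step, points whose loss is far above the rest are dropped from later iterations. The z-score threshold, the fit_step interface, and the function names are assumptions for illustration, not the dissertation's exact heuristic.

# Hypothetical outlier filtering wrapped around an iterative SVM-style solver.
import numpy as np

def train_with_outlier_filtering(X, y, fit_step, n_iter=20, z_cut=3.0):
    """fit_step(X, y, w, b) performs one solver step and returns updated (w, b)."""
    w, b = np.zeros(X.shape[1]), 0.0
    keep = np.ones(len(y), dtype=bool)          # points still in the training set
    for _ in range(n_iter):
        w, b = fit_step(X[keep], y[keep], w, b)
        # Hinge-type loss of every point that is still kept.
        loss = np.maximum(0.0, 1.0 - y[keep] * (X[keep] @ w + b))
        mu, sigma = loss.mean(), loss.std() + 1e-12
        # Points whose loss is far above the rest are treated as outliers
        # and excluded from all later iterations (z-score rule assumed here).
        inliers = (loss - mu) / sigma <= z_cut
        idx = np.flatnonzero(keep)
        keep[idx[~inliers]] = False
    return w, b, keep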

    Next, inspired by the development of SSVM1, we go on to smooth the 1-norm SVM; because LASSO best conveys the meaning of its 1-norm penalty term, we name the method smooth LASSO for classification (SLASSO). Experiments show that SLASSO performs well in classification and feature selection simultaneously.
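
    For context, the smoothing can be sketched with the standard SSVM plus-function approximation p(x, α) = x + log(1 + exp(-αx))/α from [49]; the way it is applied to the 1-norm terms below is an assumption made for illustration, not necessarily the exact SLASSO objective.

# Sketch of a smoothed 1-norm-SVM-style objective.  smooth_plus is the
# standard SSVM smooth approximation of max(x, 0); an absolute value is
# approximated as smooth_plus(x) + smooth_plus(-x).
import numpy as np

def smooth_plus(x, alpha=5.0):
    """p(x, alpha) = x + log(1 + exp(-alpha*x)) / alpha, a smooth max(x, 0)."""
    return x + np.logaddexp(0.0, -alpha * x) / alpha

def slasso_like_objective(w, b, X, y, C=1.0, alpha=5.0):
    """C * (smoothed 1-norm hinge loss) + smoothed 1-norm penalty on w."""
    margins = 1.0 - y * (X @ w + b)
    loss = np.sum(smooth_plus(margins, alpha))                        # ~ sum of slacks
    penalty = np.sum(smooth_plus(w, alpha) + smooth_plus(-w, alpha))  # ~ ||w||_1
    return C * loss + penalty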

    Finally, to remedy the shortcomings of existing multiclass classification frameworks, we propose the one-vs.-one-vs.-rest (OOR) method. It proceeds like the commonly used one-vs.-one approach, but its base classifier is the newly designed ternary classifier TSSVM, which provides an abstention (invalid-vote) mechanism so that the result is not distorted by votes forced from irrelevant classes. Experimental results show that OOR not only outperforms other methods in accuracy on multiclass problems; an analysis of the vote contents also confirms that OOR indeed filters out votes from irrelevant classes, greatly increasing the credibility of the voting outcome. Given this property, we argue that OOR can automatically detect hidden classes, and with a "leave-one-class-out" experimental protocol we show that OOR is clearly better than both one-vs.-one and one-vs.-rest at detecting hidden classes.
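
    The voting rule can be illustrated as follows: each of the k(k-1)/2 ternary classifiers votes for one of its two assigned classes or abstains ("rest"), and abstentions are simply not counted. The predict_ternary interface and the treatment of an all-abstain outcome as a possible hidden class are assumptions made for this sketch.

# Illustrative one-vs.-one-vs.-rest (OOR) voting with abstention.
from collections import Counter
from itertools import combinations

def oor_predict(x, classes, predict_ternary):
    """predict_ternary(x, i, j) returns i, j, or None (abstain / 'rest' vote)."""
    votes = Counter()
    for i, j in combinations(classes, 2):
        vote = predict_ternary(x, i, j)
        if vote is not None:       # abstentions add no noise from irrelevant classes
            votes[vote] += 1
    if not votes:
        return None                # every classifier abstained: possible hidden class
    return votes.most_common(1)[0][0]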


    This dissertation proposes four robust smooth support vector machine learning methodologies. First, we propose a new approach to generating a representative reduced set for RSVM. The clustering reduced support vector machine (CRSVM) generates cluster centroids for each class and uses them to form the reduced set. By estimating the approximate density of each cluster, we can compute the width parameter used in the Gaussian kernel. Second, we modify the previous 2-norm soft margin smooth support vector machine (SSVM2) to propose a new 1-norm soft margin smooth support vector machine (SSVM1). We also propose a heuristic outlier-filtering method for SSVMs that adds little cost to the training process and substantially improves outlier resistance. Third, we introduce the smoothing technique into the 1-norm SVM and call the result smooth LASSO for classification (SLASSO); it provides simultaneous classification and feature selection. Results show that SLASSO has slightly better accuracy than other approaches, together with the desirable ability of feature suppression. Finally, we implement a ternary SSVM (TSSVM) and use it to design a novel multiclass classification scheme, one-vs.-one-vs.-rest (OOR), which decomposes a k-class problem into a series of k(k-1)/2 ternary classification subproblems. Results show that TSSVM/OOR performs better than one-vs.-one and one-vs.-rest. We also find that the prediction confidence of OOR is significantly higher than that of the one-vs.-one scheme. Owing to the nature of the OOR design, it can be applied directly to detect a hidden (unknown) class. A "leave-one-class-out" experiment on the pendigits dataset shows that OOR outperforms one-vs.-one and one-vs.-rest in hidden class detection rate.
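
    A minimal sketch of such a "leave-one-class-out" protocol is given below; train_fn and predict_fn are placeholders for any multiclass scheme, and treating a None prediction as "hidden class detected" is an assumption of this sketch rather than the dissertation's exact experimental setup.

# Hypothetical leave-one-class-out loop: each class is withheld from training
# in turn, and we measure how often its test points are flagged as hidden.
import numpy as np

def leave_one_class_out(X, y, train_fn, predict_fn):
    """Return the hidden-class detection rate for each held-out class."""
    rates = {}
    for hidden in np.unique(y):
        seen = y != hidden
        model = train_fn(X[seen], y[seen])              # train without this class
        preds = [predict_fn(model, x) for x in X[~seen]]
        rates[hidden] = np.mean([p is None for p in preds])  # None == "hidden"
    return rates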

1 Introduction 1
    1.1 Background 1
    1.2 Robust SSVMs Learning Tasks 3
    1.3 Organization of Thesis 5
    1.4 Notation 5
2 Smooth Support Vector Machine Methodologies 7
    2.1 1-Norm and 2-Norm Soft Margin SVMs 7
    2.2 Smooth Support Vector Machine 10
    2.3 Reduced Support Vector Machine 11
    2.4 Model Selection of SVMs 12
3 Reduced Set Selection for RSVM 15
    3.1 Motivation and Background 16
    3.2 Clustering Reduced Support Vector Machine 18
        3.2.1 Parameters Estimation in RBFN 18
        3.2.2 Kernel Width Estimation and Algorithm Description 19
    3.3 Numerical Results 21
        3.3.1 A Comparison of Variant RSVM Schemes 21
        3.3.2 Spectral Analysis 22
    3.4 Summary 28
4 Robust Smooth Support Vector Machine 30
    4.1 Motivation and Background 31
    4.2 Outlier Effect on Soft Margin SVMs 32
    4.3 1-Norm Soft Margin Smooth SVM (SSVM1) 34
        4.3.1 Linear SSVM1 34
        4.3.2 Nonlinear SSVM1 35
        4.3.3 A Newton-Armijo Algorithm for SSVM1 36
        4.3.4 Implementation for SSVM1 38
    4.4 A Heuristic for Outlier Filtering on SSVMs 39
        4.4.1 Heuristic Algorithm of Outlier Filtering 41
        4.4.2 Numerical Results of SSVM2, SSVM1, LIBSVM [10], SSVM2-o and SSVM1-o 43
    4.5 Model Selection for Robust ε-SSVR via 3D Uniform Design 45
    4.6 Summary 46
5 Smooth LASSO for Classification 50
    5.1 Motivation and Background 50
    5.2 LASSO and 1-norm Support Vector Machine 52
    5.3 SLASSO 53
    5.4 Newton-Armijo Algorithm for SLASSO on classification 54
    5.5 Experimental Tests 56
        5.5.1 Data Presentation and Experimental Setup 56
        5.5.2 Numerical Results and Comparisons 59
    5.6 Summary 61
6 Robust Strategies on Multi-class Classification 63
    6.1 Motivation and Background 64
    6.2 The Framework for Multi-class Classification via TSSVM 66
        6.2.1 Multi-class Smooth Support Vector Machine 66
        6.2.2 Ternary Smooth Support Vector Machine (TSSVM) 68
        6.2.3 One-vs.-One-vs.-Rest Strategy 70
    6.3 Experimental Results 72
        6.3.1 Numerical Results on UCI Datasets 73
        6.3.2 The Purification Ability of OOR 75
        6.3.3 The Prediction Confidence of OOR 78
        6.3.4 The Detection Ability of OOR for Hidden Classes 79
    6.4 Summary 81
7 Conclusion 82

    [1] CBCL Face Database #1. MIT Center for Biological and Computational Learning.
    http://cbcl.mit.edu/software-datasets/FaceData2.html.
    [2] E. L. Allwein, R. E. Schapire, and Y. Singer. Reducing multiclass to binary: a unifying approach for margin classifiers. J. Machine Learning Research, 1:113-141,
    2000.
    [3] U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine.
    Broad patterns of gene expression revealed by clustering analysis of tumor and normal
    colon tissues probed by oligonucleotide arrays. Cell Biology, 96:6745-6750, 1999.
    [4] L. Armijo. Minimization of functions having Lipschitz-continuous first partial derivatives. Pacific Journal of Mathematics, 16:1-3, 1966.
    [5] A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007.
    http://www.ics.uci.edu/~mlearn/MLRepository.html
    University of California, Irvine, School of Information and Computer Sciences.
    [6] Y. Bengio. Gradient-based optimization of hyperparameters. Neural Computation,
    12(8):1889-1900, 2000.
    [7] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, second
    edition, 1999.
    [8] L. Bottou, C. Cortes, J. Denker, H. Drucker, I. Guyon, L. Jackel, Y. LeCun,
    U. Muller, E. Sackinger, P. Simard, and V. Vapnik. Comparison of classifier meth-
    ods: a case study in handwritten digit recognition. In International Conference on
    Pattern Recognition, pages 77-87. IEEE Computer Society Press, 1994.
    [9] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data
    Mining and Knowledge Discovery, 2(2):121-167, 1998.
    [10] C. C. Chang and C. J. Lin. LIBSVM: a library for support vector machines. 2001.
    Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
    [11] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple param-
    eters for support vector machines. Machine Learning, 46(1):131-159, 2002.
    [12] C. Chen and O. L. Mangasarian. Smoothing methods for convex inequalities and
    linear complementarity problems. Mathematical Programming, 71(1):51-69, 1995.
    [13] C. Chen and O. L. Mangasarian. A class of smoothing functions for nonlinear and
    mixed complementarity problems. Computational Optimization and Applications,
    5(2):97-138, 1996.
    [14] J.-H. Chen. M-estimator based robust kernels for support vector machines. In Pro-
    ceedings of the Seventeenth International Conference on Pattern Recognition (ICPR
    2004), volume 1, pages 168-171, Cambridge, UK, 2004.
    [15] X. Chen and Y. Ye. On homotopy-smoothing methods for variational inequalities.
    SIAM Journal on Control and Optimization, 37:589-616, 1999.
    [16] V. Cherkassky and F. Mulier. Learning from Data - Concepts, Theory and Methods.
    John Wiley & Sons, New York, 1998.
    [17] L.-J. Chien and Y.-J. Lee. Clustering model selection for reduced support vector
    machines. In Proc. 5th Intelligent Data Engineering and Automated Learning, pages
    714-719, Exeter, UK, August 2004. LNCS 3177, Springer-Verlag.
    [18] A. Christmann and I. Steinwart. On robustness properties of convex risk mini-
    mization methods for pattern recognition. Journal of Machine Learning Research,
    5:1007-1034, 2004.
    [19] F. H. Clarke. Optimization and Nonsmooth Analysis. SIAM, Philadelphia, PA, 1990.
    [20] C. Cortes and V. Vapnik. Support vector networks. Machine Learning, 20:273-297,
    1995.
    [21] K. Crammer and Y. Singer. Improved output coding for classification using con-
    tinuous relaxation. In Proceedings of the Thirteenth Annual Conference on Neural
    Information Processing Systems, 2000.
    [22] K. Crammer and Y. Singer. On the learnability and design of output codes for
    multiclass problems. Machine Learning, 47:201-233, 2002.
    [23] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines.
    Cambridge University Press, Cambridge, 2000.
    [24] J. E. Dennis and R. B. Schnabel. Numerical Methods for Unconstrained Optimization
    and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, N.J., 1983.
    [25] T. G. Dietterich and B. Bakiri. Solving multiclass learning problems via error-
    correcting output codes. J. Artificial Intelligence Research, 2:263-286, 1995.
    [26] H. Drucker, C. J. C. Burges, L. Kaufman, A. Smola, and V. Vapnik. Support vector
    regression machines. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances
    in Neural Information Processing Systems 9, pages 155-161, Cambridge, MA, 1997.
    MIT Press.
    [27] K. T. Fang and Y. Wang. Number-Theoretic Methods in Statistics. Chapman &
    Hall, London, 1994.
    [28] D. S. Felsenthal and M. Machover. Ternary voting games. International Journal of
    Game Theory, 26:335-351, 1997.
    [29] J. Friedman. Another approach to polychotomous classification. Technical report,
    Department of Statistics, Stanford University, 1996. Available at http://www-
    stat.stanford.edu/reports/friedman/poly.ps.Z.
    [30] G. M. Fung and O. L. Mangasarian. A feature selection newton method for support vector machine classification. Computational Optimization and Applications, 28:185-202, 2004.
    [31] J. Fürnkranz. Round robin classification. Journal of Machine Learning Research,
    2:721-747, 2002.
    [32] G. H. Golub and C. F. Van Loan. Matrix Computations. The John Hopkins University
    Press, Baltimore, Maryland, 3rd edition, 1996.
    [33] T. Golub, D. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. Mesirov, H. Coller,
    M. Loh, J. Downing, M. Caligiuri, C. Bloomfield, and E. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring.
    Science, 286(5439), 1999.
    [34] I. Guyon, J. Watson, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1/2/3):191-202, 2002.
    [35] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning:
    Data Mining, Inference, and Prediction. Springer, New York, second edition, 2009.
    [36] C. W. Hsu and C. J. Lin. A comparison on methods for multi-class support vector
    machines. IEEE Transactions on Neural Networks, 13:415-425, 2002.
    [37] C.-M. Huang, Y.-J. Lee, D. K. J. Lin, and S.-Y. Huang. Model selection for support
    vector machines via uniform design. A special issue on Machine Learning and Robust
    Data Mining of Computational Statistics and Data Analysis, 52:335-346, 2007.
    [38] S. S. Keerthi, O. Chapelle, and D. DeCoste. Building support vector machines with
    reduced classifier complexity. JMLR, 7:1493-1515, 2006.
    [39] S. S. Keerthi and C.-J. Lin. Asymptotic behaviors of support vector machines with
    Gaussian kernel. Neural Computation, 15(7):1667-1689, 2003.
    [40] S. Knerr, L. Personnaz, and G. Dreyfus. Single-layer learning revisited: a stepwise
    procedure for building and training a neural network. In J. Fogelman, editor, Neu-
    rocomputing: Algorithms, Architectures and Applications. Springer-Verlag, 1990.
    [41] R. Kohavi and G. John. Wrappers for feature subset selection. Artificial Intelligence,
    pages 273-324, 1997.
    [42] U. Kreßel. Pairwise classification and support vector machines. In B. Schölkopf,
    C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support
    Vector Learning, pages 255-268, Cambridge, MA, 1999. MIT Press.
    [43] J. Larsen, C. Svarer, L. N. Andersen, and L. K. Hansen. Adaptive regularization
    in neural network modeling. Lecture Notes in Computer Science, pages 113-132, 1998.
    [44] Y. Lee, Y. Lin, and G. Wahba. Multicategory support vector machines. In Proceedings
    of the 33rd Symposium on the Interface, 2001.
    [45] Y. J. Lee, C. C. Chang, and C. H. Chao. Incremental forward feature selection with
    application to microarray gene expression. Journal of Biopharmaceutical Statistics,
    (to appear) 2008.
    [46] Y. J. Lee, W. F. Hsieh, and C. M. Huang. ε-SSVR: A smooth support vector machine
    for ε-insensitive regression. IEEE Transactions on Knowledge and Data Engineering,
    17:678-685, 2005.
    [47] Y.-J. Lee and S.-Y. Huang. Reduced support vector machines: A statistical theory.
    IEEE Transactions on Neural Networks, 18(1):1-13, 2007.
    [48] Y. J. Lee and O. L. Mangasarian. RSVM: Reduced support vector machines. In
    First SIAM International Conference on Data Mining, Chicago, 2001.
    [49] Y. J. Lee and O. L. Mangasarian. SSVM: A smooth support vector machine. Com-
    putational Optimization and Applications, 20:5-22, 2001.
    [50] Y. Liu and Y. Wu. Robust truncated-hinge-loss support vector machines. Journal
    of the American Statistical Association, 102:974-983, 2007.
    [51] A. Lyhyaoui, M. Martinez, I. Mora, M. Vazquez, J.-L. Sancho, and A. R. Figueiras-
    Vidal. Sample selection via clustering to construct support vector-like classifier. IEEE
    Transactions on Neural Networks, 10:1474-1481, 1999.
    [52] O. L. Mangasarian. Mathematical Programming in Neural Networks. ORSA Journal
    on Computing, 5(4):349-360, 1993.
    [53] O. L. Mangasarian. Nonlinear Programming. SIAM, Philadelphia, PA, 1994.
    [54] O. L. Mangasarian. Generalized support vector machines. In A. Smola, P. Bartlett,
    B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages
    135-146, Cambridge, MA, 2000. MIT Press.
    [55] O. L. Mangasarian. Exact 1-norm support vector machines via unconstrained convex
    differentiable minimization. Journal of Machine Learning Research, 7:1517-1530,
    2006.
    [56] O. L. Mangasarian and D. R. Musicant. Large scale kernel regression via lin-
    ear programming. Technical Report 99-02, Data Mining Institute, Computer Sci-
    ences Department, University of Wisconsin, Madison, Wisconsin, August 1999.
    ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/99-02.ps.
    [57] O. L. Mangasarian and D. R. Musicant. Successive Overrelaxation for Support Vector
    Machines. IEEE Transactions on Neural Networks, 10:1032-1037, 1999.
    [58] O. L. Mangasarian and D. R. Musicant. Robust linear and support vector regression.
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(9):950-955,
    2000. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/99-09.ps.
    [59] O. L. Mangasarian, W. N. Street, and W. H. Wolberg. Breast cancer diagnosis and
    prognosis via linear programming. Operations Research, 43(4):570-577, 1995.
    [60] MATLAB. User's Guide. The MathWorks, Inc., Natick, MA 01760, 1994-2001.
    http://www.mathworks.com.
    [61] D. Michie, D. J. Spiegelhalter, and C. C. Taylor. Machine Learning, Neural and
    Statistical Classification. Prentice-Hall, Englewood Cliffs, N.J., 1994. Data available
    at anonymous ftp://ftp.ncc.up.pt/pub/statlog.
    [62] T. M. Mitchell. Machine Learning. McGraw-Hill, Boston, 1997.
    [63] M. Molla, M. Waddell, D. Page, and J. Shavlik. Using machine learning to design and
    interpret gene-expression microarrays. AI Magazine, Special Issue on Bioinformatics,
    2003.
    [64] S. Odewahn, E. Stockwell, R. Pennington, R. Humphreys, and W. Zumach. Au-
    tomated star/galaxy discrimination with neural networks. Astronomical Journal,
    103(1):318-331, 1992.
    [65] M. R. Osborne, B. Presnell, and B. A. Turlach. A new approach to variable selection
    in least squares problems. IMA Journal of Numerical Analysis, 20:389-404, 2000.
    [66] Y.-J. Oyang, S.-C. Hwang, Y.-Y. Ou, C.-Y. Chen, and Z.-W. Chen. A novel learning
    algorithm for data classification with radial basis function networks. In Proceedings
    of the 9th International Conference on Neural Information Processing, pages 18-22, Sin-
    gapore, Nov. 2001.
    [67] D. Page, F. Zhan, J. Cussens, M. Waddell, J. Hardin, B. Barlogie, and J. Shaugh-
    nessy. Comparative data mining for microarrays: A case study based on multiple
    myeloma. Technical report, Computer Sciences Department, University of Wisconsin,
    November 2002.
    [68] C. E. Rasmussen and Z. Ghahramani. Occam's razor. In Todd K. Leen, Thomas G.
    Dietterich, and Volker Tresp, editors, Advances in Neural Information Processing
    Systems 13, pages 294-300, Cambridge, MA, 2001. MIT Press.
    [69] R. Rifkin and A. Klautau. In defense of one-vs-all classification. Journal of Machine
    Learning Research, 5:101-141, 2004.
    [70] V. Roth. The generalized lasso. IEEE Transactions on Neural Networks, 15(1):16-28,
    2004.
    [71] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman &
    Hall, London, 1986.
    [72] A. Smola and B. Schölkopf. Sparse greedy matrix approximation for machine learn-
    ing. In Proc. 17th International Conf. on Machine Learning, pages 911-918. Morgan
    Kaufmann, San Francisco, CA, 2000.
    [73] A. Smola and B. Schölkopf. A tutorial on support vector regression. Statistics and
    Computing, 14:199-222, 2004.
    [74] C. Staelin. Parameter selection for support vector machines. Hewlett-Packard Com-
    pany, Tech. Rep. HPL-2002-354R1, 2003.
    [75] I. Steinwart and A. Christmann. Support Vector Machines. Springer, New York,
    2008.
    [76] C. J. Stone. An asymptotically optimal window selection rule for kernel density
    estimates. Annals of Statistics, 12:1285-1297, 1984.
    [77] M. Stone. Cross-validatory choice and assessment of statistical predictions. Journal
    of the Royal Statistical Society, 36:111-147, 1974.
    [78] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996.
    [79] V. Vapnik. Statistical Learning Theory. Wiley, New York, NY, 1998.
    [80] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 2000.
    [81] J. Weston, A. Elisseeff, B. Schölkopf, and M. Tipping. Use of the zero-norm with
    linear models and kernel methods. Journal of Machine Learning Research, 3:1439-1461, 2003.
    [82] J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik. Feature
    selection for SVMs. In Advances in Neural Information Processing Systems 13, pages
    668-674, 2001.
    [83] J. Weston and C. Watkins. Multi-class support vector machines. In M. Verleysen,
    editor, Proceedings of ESANN99, Brussels, 1999. D. Facto Press.
    [84] C. K. I. Williams and M. Seeger. Using the Nyström method to speed up kernel ma-
    chines. In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, Advances
    in Neural Information Processing Systems 13, pages 682-688, Cambridge, MA, 2001.
    MIT Press.
    [85] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and
    Techniques with Java Implementations. Morgan Kaufmann, San Francisco, CA, 1999.
    [86] M. Wu, B. Schölkopf, and G. Bakir. A direct method for building sparse kernel
    learning algorithms. Journal of Machine Learning Research, 7:603-624, 2006.
    [87] H. Xu, C. Caramanis, and S. Mannor. Robustness and regularization of support
    vector machines. Journal of Machine Learning Research, 10:1485-1510, 2009.
    [88] J. Zhu, S. Rosset, T. Hastie, and R. Tibshirani. 1-norm support vector machines. In
    Advances in Neural Information Processing Systems 16-NIPS2003. MIT Press, 2003.
    [89] H. Zou. An improved 1-norm SVM for simultaneous classification and variable se-
    lection. Eleventh International Conference on Artificial Intelligence and Statistics,
    2007.
