
Author: Chih-Cheng Chang (張志丞)
Thesis Title: Smooth Support Vector Machine for Multi-class Classification (多元分類之平滑式支撐向量法)
Advisor: Yuh-Jye Lee (李育杰)
Committee Members: Tien-Ruey Hsiang (項天瑞), Hsing-Kuo Kenneth Pao (鮑興國), Bi-Ru Dai (戴碧如), Wen-Han Huang (黃文瀚)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2007
Academic Year of Graduation: 95
Language: English
Number of Pages: 86
Chinese Keywords: 信心水準、多元分類、一對一、一對其餘、平滑式之支撐向量法、支撐向量法、三元分類、投票
Keywords: confidence, multi-class classification, one-vs.-one, one-vs.-rest, smooth support vector machine, support vector machine, trinary classification, voting
The smoothing technique has been successfully introduced into Support Vector Machines (SVMs) for handling binary classification problems and regression problems. The main work of this thesis is to exploit the favorable properties of the Smooth Support Vector Machine (SSVM) and extend it from binary classification to multi-class classification, which yields a single-machine Multi-class Smooth Support Vector Machine (MSSVM). However, like other single-machine multi-class SVMs, MSSVM becomes very complicated to solve when the number of classes is large. We therefore restrict MSSVM to trinary classification problems. For problems with more than three classes, we design a Smooth One-vs.-One-vs.-Rest (SOOR) scheme, which decomposes a k-class classification problem into k(k-1)/2 trinary classification problems that are handled separately. Each trinary classification problem is defined by two particular classes, with the remaining data forming a third class. To predict the class label of an unseen instance, we use a majority voting scheme. Compared with the One-vs.-One scheme for multi-class classification, our SOOR scheme can efficiently filter out the nuisance votes that the One-vs.-One scheme produces. In our experiments, we evaluate MSSVM and the SOOR scheme on nine widely used data sets. Compared with the two well-known One-vs.-One and One-vs.-Rest schemes, our methods achieve slightly better accuracy, and in terms of prediction confidence the SOOR scheme performs considerably better than One-vs.-One.


The smoothing technique has been introduced into SVMs for both binary classification and regression problems. In this work, we extend the previous work on the Smooth Support Vector Machine (SSVM) from binary to k-class classification using a single-machine approach, and call the result the Multi-class Smooth Support Vector Machine (MSSVM). Like other single-machine approaches, MSSVM becomes very complicated and difficult to solve when k is large, so we implement MSSVM only for trinary classification problems. For k > 3, we propose a Smooth One-vs.-One-vs.-Rest (SOOR) scheme, which decomposes a k-class classification problem into k(k-1)/2 trinary classification problems. Each trinary classification problem is defined by two particular classes and the rest of the data, and the class label is determined by a majority voting scheme. Under the SOOR setting, we can filter out the nuisance votes that arise in the One-vs.-One scheme. In our experiments, we compare MSSVM and SOOR with One-vs.-One and One-vs.-Rest, two simple and well-known schemes for multi-class classification, on nine public UCI data sets. The results show that MSSVM and SOOR achieve slightly better accuracy than the One-vs.-One and One-vs.-Rest schemes, and that the prediction confidence of SOOR is clearly higher than that of One-vs.-One.
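To make the SOOR decomposition and voting rule concrete, the following is a minimal sketch in Python, not the implementation from the thesis (where each trinary classifier is trained with MSSVM). The names make_trinary_problems, soor_predict, and REST are hypothetical, each trinary classifier is abstracted as a callable that returns one of its two class labels or REST, and the rule that a "rest" prediction simply casts no vote is our reading of how SOOR filters the nuisance votes mentioned above.

    from itertools import combinations
    from collections import Counter
    import numpy as np

    # Hypothetical placeholder label for the "rest" class in each trinary
    # subproblem; assumes the real class labels are nonnegative integers.
    REST = -1

    def make_trinary_problems(X, y, classes):
        """Split a k-class data set into k(k-1)/2 trinary subproblems.

        Each subproblem keeps the labels of two particular classes (i, j)
        and relabels every other sample as REST, following the description
        "two particular classes and the rest of the data".
        """
        problems = {}
        for i, j in combinations(classes, 2):
            y_tri = np.where(np.isin(y, [i, j]), y, REST)
            problems[(i, j)] = (X, y_tri)
        return problems

    def soor_predict(x, trinary_classifiers):
        """Predict a class label by majority voting over trinary classifiers.

        trinary_classifiers maps (i, j) -> a trained predictor returning
        i, j, or REST for a single instance x. A REST prediction casts no
        vote, which is how nuisance votes are filtered out; in One-vs.-One
        every pairwise classifier must vote for i or j even when x belongs
        to neither class.
        """
        votes = Counter()
        for (i, j), clf in trinary_classifiers.items():
            pred = clf(x)
            if pred != REST:  # discard "rest" (nuisance) votes
                votes[pred] += 1
        return votes.most_common(1)[0][0] if votes else None

Under this reading, an instance that belongs to neither class i nor class j ideally contributes no vote in the (i, j) subproblem, whereas One-vs.-One would force it to vote for one of the two; this filtering effect is what Section 6.2 ("Filterable Ability of SOOR") examines.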

1 Introduction
  1.1 Background
  1.2 Our Main Works
  1.3 Organization of Thesis
  1.4 Notation
2 SVMs for Binary Classification
  2.1 Support Vector Machines
  2.2 Smooth Support Vector Machine
3 SVMs for Multi-class Classification
  3.1 Multiple Machines Approaches
    3.1.1 One-vs.-One Scheme
    3.1.2 One-vs.-Rest Scheme
    3.1.3 Discussion
  3.2 Single Machine Approaches
    3.2.1 Vapnik, Weston & Watkins Approach
    3.2.2 Crammer & Singer Approach
    3.2.3 Discussion
4 Multi-class Smooth Support Vector Machine (MSSVM)
  4.1 Linear MSSVM
  4.2 Non-linear MSSVM
    4.2.1 Non-linear MSSVM with the Full-kernel
    4.2.2 Non-linear MSSVM with the Reduced-kernel
  4.3 Newton-Armijo Algorithm for MSSVM
  4.4 Implement MSSVM for Trinary Classification
  4.5 Test MSSVM on Trinary Synthetic Toy Data Sets
5 A Smooth One-vs.-One-vs.-Rest (SOOR) Scheme
  5.1 SOOR Scheme
  5.2 Test SOOR on 4-class Synthetic Data Sets
  5.3 Discussion
6 Experimental Works
  6.1 Experiment on UCI Data Sets
    6.1.1 Data Presentation
    6.1.2 Experimental Setup
    6.1.3 Tuning Procedure
    6.1.4 Accuracy Report
  6.2 Filterable Ability of SOOR
  6.3 Conditional & Confidence-level Error Analysis
7 Conclusion and Future Work
  7.1 Conclusion
  7.2 Future Work

