
Graduate Student: Tsai-Ying Chen (陳采瀅)
Thesis Title: The Compressed Smooth Support Vector Machine (壓縮平滑支撐向量機)
Advisor: Hsing-Kuo Pao (鮑興國)
Committee Members: Yuh-Jye Lee (李育杰), Tien-Ruey Hsiang (項天瑞), Chuan-Kai Yang (楊傳凱)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2017
Graduation Academic Year: 105 (ROC calendar)
Language: English
Number of Pages: 43
Chinese Keywords: SVM, Compressed Sensing
Foreign Keywords: SVM, Compressed Sensing
Abstract:
In most cases, real-world data are not linearly separable. Under this condition, we can use the nonlinear Support Vector Machine (SVM), which involves a nonlinear kernel matrix, instead of the linear SVM as the classification optimization problem. The drawback of the nonlinear SVM becomes apparent on large-scale data: the size of the kernel matrix grows quadratically with the size of the data, which makes in-memory computation intractable. Our aim is to avoid this intractability of the nonlinear SVM on large-scale data. We introduce the Compressed Smooth Support Vector Machine (CSSVM), a new method based on the Smooth SVM (SSVM). Furthermore, we apply the idea of a reduced kernel and combine it with compressed sensing. Our method uses a two-stage reduction. The first reduction is logically the same as in the Reduced SVM (RSVM), but the reduction ratio in this stage is not as small; that is, we do not throw away nearly as much data as we normally do with the standard RSVM. In the second reduction we use a sensing matrix to reduce the dimension. In this way we reduce the dimension without sacrificing valuable information, so the accuracy is not noticeably affected while the stability improves. The numerical results show that the CSSVM is competitive in both speed and the stability of its accuracy. In conclusion, CSSVM is a good choice for handling large-scale classification problems with a nonlinear separating hyperplane.
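
For readers who want a concrete picture of the two-stage reduction described above, the following Python/NumPy sketch first builds a mildly reduced rectangular kernel (stage one, in the spirit of RSVM) and then multiplies it by a random Gaussian sensing matrix to shrink its column dimension (stage two). The function names, the RBF kernel choice, and all parameter values are illustrative assumptions, not the implementation used in the thesis.

    import numpy as np

    def rbf_kernel(A, B, gamma=0.1):
        # Gaussian (RBF) kernel matrix K(A, B), shape (rows of A, rows of B).
        sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * (A @ B.T)
        return np.exp(-gamma * sq)

    def compressed_kernel(A, reduce_ratio=0.3, compress_dim=100, gamma=0.1, seed=0):
        # Stage 1: rectangular reduced kernel K(A, A_bar), as in RSVM, but with a
        # comparatively mild reduction (A_bar keeps a larger share of the rows).
        rng = np.random.default_rng(seed)
        m = A.shape[0]
        n_bar = max(1, int(reduce_ratio * m))
        A_bar = A[rng.choice(m, size=n_bar, replace=False)]
        K = rbf_kernel(A, A_bar, gamma)                  # shape (m, n_bar)
        # Stage 2: compress the column dimension with a random Gaussian sensing
        # matrix, in the spirit of compressed sensing / random projection.
        S = rng.normal(0.0, 1.0 / np.sqrt(compress_dim), size=(n_bar, compress_dim))
        return K @ S, A_bar, S                           # shape (m, compress_dim)

    if __name__ == "__main__":
        X = np.random.default_rng(1).random((500, 20))   # toy data
        K_c, A_bar, S = compressed_kernel(X)
        print(K_c.shape)                                 # (500, 100)

The compressed kernel can then be handed to a linear-style smooth SVM solver, which is the role the SSVM plays in the method described above.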


Table of Contents

1 Introduction
  1.1 Motivation
  1.2 Research Framework
  1.3 Notation
2 Related Work
  2.1 The Support Vector Machine
    2.1.1 Linear SVM
    2.1.2 Nonlinear SVM
  2.2 The Smooth Support Vector Machine
    2.2.1 Linear SSVM
    2.2.2 Nonlinear SSVM
  2.3 The Reduced Support Vector Machine
  2.4 Compressed Sensing
3 The Compressed Smooth Support Vector Machine
  3.1 Training
    3.1.1 Truncation Trick
    3.1.2 Sensing Matrix
  3.2 Predicting
4 Experiments
  4.1 Experimental Setting
    4.1.1 Dataset Information
    4.1.2 Equipment
    4.1.3 Parameters Selection
  4.2 Experimental Results
    4.2.1 Rkernel vs. TRkernel
    4.2.2 RP vs. SRP vs. CS
    4.2.3 Libsvm vs. TRkernel vs. Ckernel
    4.2.4 Rkernel vs. Ckernel
5 Conclusions
Appendix A

List of Tables
4.1 The information about datasets we used in the experiment
4.2 System information
4.3 The parameters in CSSVM
4.4 Rkernel vs. TRkernel
4.5 RP vs. SRP vs. CS
4.6 TRkernel vs. Ckernel
4.7 Libsvm vs. Ckernel
4.8 Rkernel vs. Ckernel

List of Figures
1 Nonlinear SVM vs. RSVM vs. CSSVM
1.1 Research framework
2.1 Binary classification problem - linearly separable case
2.2 Hard-margin SVM
2.3 Soft-margin SVM
2.4 A nonlinear map from the input space to some high-dimensional feature space
2.5 Smoothing the plus function
3.1 Nonlinear SVM vs. RSVM vs. CSSVM
3.2 Create a compressed kernel
3.3 The second way of the truncation trick
4.1 The parameter C

