
Author: 蔡育燐 (Yu-Lin Tsai)
Thesis title: 適用於大規模資料集問題結合類別平均值資訊於被動進取演算法 (Passive and Aggressive Algorithm with Class Means Information for Large-Scale Problems)
Advisor: 李育杰 (Yuh-Jye Lee)
Committee members: 劉庭祿 (Tyng-Luh Liu), 曹振海 (Chen-Hai Tsao), 王鈺強 (Yu-Chiang Wang), 鮑興國 (Hsing-Kuo Pao), 葉倚任 (Yi-Ren Yeh)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication year: 2011
Graduation academic year: 99 (ROC calendar)
Language: English
Number of pages: 47
Keywords (Chinese): machine learning, on-line learning, large-scale problem
Keywords (English): on-line learning, PA algorithm, large-scale problem, proximal model
Abstract (Chinese): In the on-line learning setting, on-line learning algorithms can be applied to large-scale classification problems. Such an algorithm can be viewed as a form of stochastic gradient descent (SGD), which seeks the solution of an optimization problem whose objective function consists of two parts: a regularization term and the total training loss. How to decide the learning step size has become an important issue for on-line learning algorithms. The passive-aggressive (PA) algorithm proposes an updating scheme for finding the new classifier: the new classifier must classify the current data point correctly while staying as close to the current classifier as possible. PA also derives a closed-form updating rule, so each update is very fast. However, not remembering previously seen data points may hurt the learning performance. We propose an updating method that uses class means to retain the information of past data points. When the data are nearly linearly separable, the vector formed by subtracting the accumulated mean of the negative class from the accumulated mean of the positive class can serve as an approximate classifier. We fuse this approximate classifier into the PA algorithm and derive a closed-form updating rule, so our proposed algorithm (PAm) can update the classifier as quickly as PA. In our experiments, the results show that, compared with PA, PAm is less affected by the input order of the data, and the PAm classifiers from two consecutive epochs are more similar to each other than those of PA. The experimental results indicate that PAm obtains a near-optimal solution after a single training pass and is also well suited to problems with large-scale datasets.
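For reference, the standard PA update described above solves, at each round t, the constrained problem

\[
w_{t+1} \;=\; \arg\min_{w}\ \tfrac{1}{2}\,\lVert w - w_t \rVert^{2}
\quad \text{s.t.} \quad \ell\bigl(w;(x_t,y_t)\bigr)=0,
\qquad
\ell\bigl(w;(x,y)\bigr)=\max\bigl\{0,\;1-y\,w^{\top}x\bigr\},
\]

whose closed-form solution is

\[
w_{t+1} \;=\; w_t+\tau_t\,y_t\,x_t,
\qquad
\tau_t=\frac{\ell\bigl(w_t;(x_t,y_t)\bigr)}{\lVert x_t\rVert^{2}}.
\]

The PAm rule augments this formulation with the class-mean proximal model; its exact closed form is derived in Appendix A of the thesis.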


Abstract (English): Due to the nature of the on-line learning setting, on-line learning algorithms have been successfully applied to large-scale classification tasks. This type of algorithm can be interpreted as a stochastic gradient descent method that tries to find the solution of an underlying minimization problem whose objective function consists of the sum of the training loss and a regularization term. How to decide the learning rate becomes an important issue for this type of learning algorithm. The passive-aggressive (PA) algorithm proposes an updating scheme to determine the new classifier: the new classifier should not only classify the newly arriving data point correctly but also be as close to the current classifier as possible. A closed-form updating rule was derived, which makes training with the PA algorithm extremely fast. However, the lack of memory of previous instances might hurt the learning efficiency. We propose a new updating rule that takes the accumulated class-mean information into account. The positive-class mean and the negative-class mean of the training data accumulated so far keep a memory of previous instances, and the difference between them provides a good proximal classification model, especially when the dataset is almost linearly separable. We augment this proximal model into the PA updating rule while retaining a closed form, so our proposed method has the same computational advantage as the PA algorithm. Preliminary numerical results show that our proposed method is less sensitive to the input order of the training data than the PA algorithm, that it returns a near-optimal classifier in a single pass, and that the classifiers generated by two consecutive training epochs are very close. These results indicate that our method is suitable for tasks with extremely large datasets.
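Below is a minimal sketch, in Python with NumPy, of how class-mean information could be combined with a PA-style update. The standard PA step follows the closed-form rule shown earlier; the class name PAmSketch, the blending weight alpha, and the way the proximal model mu_pos - mu_neg is mixed into the update are illustrative assumptions, not the closed-form PAm rule derived in the thesis.

import numpy as np

def pa_update(w, x, y):
    # Standard passive-aggressive (PA) closed-form update.
    loss = max(0.0, 1.0 - y * float(np.dot(w, x)))    # hinge loss on the incoming example
    if loss == 0.0:
        return w                                      # passive step: already correct with margin
    tau = loss / (float(np.dot(x, x)) + 1e-12)        # aggressive step size
    return w + tau * y * x

class PAmSketch:
    # Illustrative PA variant that keeps running class means (not the thesis's exact rule).

    def __init__(self, dim, alpha=0.5):
        self.w = np.zeros(dim)        # current classifier
        self.alpha = alpha            # assumed blending weight between w and the proximal model
        self.mu_pos = np.zeros(dim)   # running mean of positive-class instances
        self.mu_neg = np.zeros(dim)   # running mean of negative-class instances
        self.n_pos = 0
        self.n_neg = 0

    def _update_means(self, x, y):
        # Incremental mean update keeps memory of all past instances in O(dim) space.
        if y > 0:
            self.n_pos += 1
            self.mu_pos += (x - self.mu_pos) / self.n_pos
        else:
            self.n_neg += 1
            self.mu_neg += (x - self.mu_neg) / self.n_neg

    def fit_one(self, x, y):
        self._update_means(x, y)
        w_prox = self.mu_pos - self.mu_neg                          # class-mean proximal classifier
        w_ref = (1.0 - self.alpha) * self.w + self.alpha * w_prox   # assumed blending scheme
        self.w = pa_update(w_ref, x, y)                             # ordinary PA correction from w_ref

    def predict(self, x):
        return 1 if np.dot(self.w, x) >= 0 else -1

Calling fit_one on a stream of (x, y) pairs with y in {+1, -1} performs one single-pass training epoch; the running means let the update retain information about earlier instances without storing them, which is what makes a single-pass, large-scale setting feasible.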

Table of Contents
1 Introduction ... 1
2 On-line Learning Algorithm ... 6
  2.1 Batch Learning vs. On-line Learning ... 7
  2.2 On-line Perceptron ... 8
  2.3 Passive and Aggressive Algorithm ... 10
3 Passive and Aggressive Algorithm with Class Means ... 14
  3.1 Motivation of PAm ... 15
  3.2 PAm Algorithm ... 16
  3.3 Analysis ... 21
4 Numerical Results ... 28
5 Conclusion and Future Work ... 36
A The Derivation of the Closed Form of the PAm Family ... 38

