
Author: Chien-Chung Chang (張鍵中)
Title: A Study on Reduced Set with Applications (縮減集及其應用之研究)
Advisor: Yuh-Jye Lee (李育杰)
Committee: Tyng-Luh Liu (劉庭祿), Hsing-Kuo Pao (鮑興國), Chen-Hai Tsao (曹振海), Yu-Chiang Wang (王鈺強), Yi-Ren Yeh (葉倚任)
Degree: Doctor
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar)
Language: English
Number of pages: 80
Keywords: co-training, consensus training, feature selection, multi-view learning, semi-supervised learning, support vector machines

To overcome the computational difficulties that traditional nonlinear support vector machines (nonlinear SVMs) encounter when handling large amounts of data, the reduced support vector machine (RSVM) replaces the fully dense square kernel matrix in the nonlinear SVM formulation with a smaller rectangular kernel matrix. This rectangular kernel matrix is generated by a subset of the original data selected at random, which we call the "reduced set". Choosing a representative reduced set improves the performance of RSVM, so in this dissertation we first introduce three methods for selecting a more representative reduced set: the incremental RSVM (IRSVM), the clustering RSVM (CRSVM), and the systematic sampling RSVM (SSRSVM). IRSVM sequentially searches for a kernel function that does not lie in the feature space spanned by the current basis functions and adds it to the current basis function set. CRSVM applies the k-means clustering algorithm to each class of data and collects all the cluster centroids to form the reduced set. SSRSVM first builds an initial classifier from a very small reduced set, uses this classifier to classify the training data outside the reduced set, selects a portion of the misclassified points to add to the reduced set, and retrains a classifier with the new reduced set. This procedure is repeated until the stopping criteria are satisfied.
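As a concrete illustration of how a reduced set replaces the full kernel matrix, the following Python sketch builds the rectangular kernel matrix from a randomly sampled reduced set; the Gaussian kernel, data sizes, and reduced-set size are illustrative assumptions rather than the dissertation's actual settings.

```python
# A minimal sketch (not the dissertation's code) of the rectangular kernel matrix
# used by RSVM, built from a randomly chosen reduced set.
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    """K[i, j] = exp(-gamma * ||A_i - B_j||^2) (Gaussian kernel, assumed for illustration)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 10))           # full training data, m x n (synthetic)
m_tilde = 50                              # reduced-set size (assumption)
idx = rng.choice(len(A), m_tilde, replace=False)
A_tilde = A[idx]                          # reduced set: a random subset of the data

K_full = rbf_kernel(A, A)                 # m x m fully dense kernel matrix
K_rect = rbf_kernel(A, A_tilde)           # m x m_tilde rectangular kernel matrix
print(K_full.shape, K_rect.shape)         # (1000, 1000) vs. (1000, 50)
```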
We propose a feature selection method named incremental forward feature selection (IFFS), which originates from IRSVM. The intuition is that a new feature is added to the current feature subset if it brings in the most extra information, where the amount of information is measured by the distance between the new feature vector and the column space spanned by the current feature subset. In the last part of this dissertation, we propose an RSVM-based multi-view semi-supervised learning algorithm named two-teachers-one-student (2T1S). In 2T1S, the reduced sets play the role of views in the kernel feature space, rather than feature subsets chosen in the input space as in conventional multi-view algorithms. Our 2T1S combines the two concepts of co-training and consensus training. Through co-training, the two classifiers (teachers) built from two views "teach" the classifier built from the third view (the student), and the teacher and student roles rotate. Through consensus training, when two classifiers assign the same label to an unlabeled data point (i.e., reach a consensus), we gain higher confidence in our guess of that point's label.
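One co-training/consensus round in the spirit of 2T1S can be sketched as follows. The kernel views induced by small reduced sets follow the idea described above, but the classifiers, synthetic data, and single-round loop are simplified stand-ins, not the actual RSVM-based implementation.

```python
# A minimal sketch of one teachers-student round with consensus labeling.
import numpy as np
from sklearn.linear_model import LogisticRegression

def rbf_features(X, reduced_set, gamma=0.5):
    """Map each point to its kernel values against a reduced set (one 'view')."""
    sq = ((X[:, None, :] - reduced_set[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(1)
X_lab = rng.normal(size=(40, 5))
y_lab = (X_lab[:, 0] + X_lab[:, 1] > 0).astype(int)
X_unlab = rng.normal(size=(200, 5))

# Three disjoint random reduced sets define three views; two act as teachers.
views = [X_lab[idx] for idx in np.split(rng.permutation(30), 3)]
teachers = [LogisticRegression().fit(rbf_features(X_lab, v), y_lab) for v in views[:2]]

p1 = teachers[0].predict(rbf_features(X_unlab, views[0]))
p2 = teachers[1].predict(rbf_features(X_unlab, views[1]))
consensus = p1 == p2                               # both teachers agree on the label
X_pseudo, y_pseudo = X_unlab[consensus], p1[consensus]

# The student view is retrained on labeled plus consensus-labeled points;
# teacher/student roles then rotate over the three views.
student = LogisticRegression().fit(
    rbf_features(np.vstack([X_lab, X_pseudo]), views[2]),
    np.concatenate([y_lab, y_pseudo]))
print(f"student trained with {consensus.sum()} consensus-labeled points")
```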


The reduced support vector machine (RSVM) replaces the fully dense kernel matrix in the nonlinear SVM formulation with a smaller rectangular kernel matrix in order to avoid the computational difficulties of training on large data sets. This rectangular kernel matrix is generated by a uniformly random subset of the training data, called the reduced set. In this dissertation, we present three schemes for selecting a more representative reduced set for RSVM, namely the incremental RSVM (IRSVM), the clustering RSVM (CRSVM), and the systematic sampling RSVM (SSRSVM). IRSVM sequentially adds a kernel function to the current basis function set only when the function is dissimilar to the current set. CRSVM applies the k-means clustering algorithm to each class and then uses the cluster centroids to form the reduced set. SSRSVM starts with a small initial reduced set and iteratively adds a portion of the points misclassified by the current classifier to the reduced set.
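A minimal sketch of the CRSVM reduced-set construction is given below; the number of clusters per class and the synthetic data are assumptions for illustration, and the pooled centroids would then replace the random reduced set when the rectangular kernel matrix is formed.

```python
# A minimal sketch of CRSVM reduced-set selection: k-means within each class,
# then pool the centroids to obtain the reduced set.
import numpy as np
from sklearn.cluster import KMeans

def crsvm_reduced_set(X, y, clusters_per_class=10, seed=0):
    centroids = []
    for label in np.unique(y):
        km = KMeans(n_clusters=clusters_per_class, n_init=10, random_state=seed)
        km.fit(X[y == label])
        centroids.append(km.cluster_centers_)
    return np.vstack(centroids)           # reduced set: all class centroids

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] > 0).astype(int)
A_tilde = crsvm_reduced_set(X, y)
print(A_tilde.shape)                      # (20, 10): 10 centroids per class
```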

Inspired by IRSVM, we propose a feature selection algorithm named incremental forward feature selection (IFFS). The intuition behind this method is that a new feature is added to the currently selected feature subset only if it brings in extra information, which we measure by the distance between the new feature vector and the column space spanned by the current feature subset. In the last part of this dissertation, we propose an RSVM-based multi-view algorithm for semi-supervised learning, named two-teachers-one-student (2T1S). Unlike typical multi-view methods, which take feature subsets of the input space as views, 2T1S lets reduced sets play the role of distinct views in the kernel feature space. Our 2T1S blends the concepts of co-training and consensus training. Through co-training, the classifiers generated by two views can "teach" the classifier from the remaining view to learn, and this process is performed for each choice of teachers-student combination. Through consensus training, agreeing predictions from more than one view give us higher confidence for labeling unlabeled data.
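The IFFS selection criterion can be sketched as the following greedy procedure, where a candidate feature is scored by its distance to the column space of the already-selected features; the data and the number of selected features are assumptions, and only the core scoring step of the algorithm is shown.

```python
# A minimal sketch of the IFFS criterion: greedily add the feature column that
# lies farthest from the span of the currently selected feature columns.
import numpy as np

def residual_distance(selected_cols, candidate_col):
    """Distance from the candidate column to span(selected_cols)."""
    if selected_cols.shape[1] == 0:
        return np.linalg.norm(candidate_col)
    coef, *_ = np.linalg.lstsq(selected_cols, candidate_col, rcond=None)
    return np.linalg.norm(candidate_col - selected_cols @ coef)

def iffs(X, n_select=5):
    selected = []
    for _ in range(n_select):
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        scores = [residual_distance(X[:, selected], X[:, j]) for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])   # most extra information
    return selected

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 50))            # synthetic data matrix (samples x features)
print(iffs(X, n_select=5))                # indices of the chosen feature columns
```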

1 Introduction
  1.1 Background
  1.2 Applications of Reduced Set
  1.3 Notations and Problem Setting
  1.4 Organization of Thesis
2 Preliminaries
  2.1 A Brief Overview of the Support Vector Machines
  2.2 Smooth Support Vector Machine
  2.3 Reduced Support Vector Machine
3 Reduced Set Selection Methods
  3.1 Incremental RSVM Algorithm
  3.2 Clustering RSVM Algorithm
  3.3 Generating the Reduced Set by Systematic Sampling
    3.3.1 Systematic Sampling for RSVM
    3.3.2 Experiment Results
    3.3.3 Summary
4 Incremental Forward Feature Selection
  4.1 Introduction
  4.2 Filter Model for Feature Selection
  4.3 Wrapper Model for Feature Selection
    4.3.1 1-norm SVM for Feature Selection
    4.3.2 Incremental Forward Feature Selection (IFFS)
  4.4 Experiment Setting and Numerical Results
    4.4.1 Acute Leukemia Data Set
    4.4.2 Colon Cancer Data Set
    4.4.3 Numerical Results and Comparisons
  4.5 Summary
5 An RSVM Based Two-teachers-one-student Semi-supervised Learning Algorithm
  5.1 Introduction
  5.2 Previous Work
  5.3 2T1S Approach for SSL
    5.3.1 Reduced Sets for Multi-view Learning
    5.3.2 View Selection
    5.3.3 The 2T1S algorithm = co-training + consensus training
  5.4 Experiment Results
    5.4.1 Five-Group Data Set
    5.4.2 Checkerboard Data Set
    5.4.3 UCI Data Sets
    5.4.4 Comparison of 2T1S with Co-training and Tri-training Algorithms
  5.5 A Variant 2T1S
    5.5.1 The Part-concept of Passive-Aggressive Algorithm for PASL
    5.5.2 The Technique of Down-weighting for PASL
    5.5.3 Experiment Results
  5.6 An Application for 2T1S
  5.7 Summary
6 Summary and Conclusion

