
Student: 王牧得 (Mu-De Wang)
Thesis Title: 使用群聚分析改善主動半監督式學習 (Improved Active Semi-supervised Learning by Using Clustering Analysis)
Advisor: 戴碧如 (Bi-Ru Dai)
Committee Members: 陳建錦 (Chien-Chin Chen), 鮑興國 (Hsing-Kuo Pao), 蔡曉萍 (Hsiao-Ping Tsai)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2015
Graduation Academic Year: 103
Language: English
Pages: 42
Keywords: Active learning, Semi-supervised classification, Semi-supervised clustering, Support vector machine, Self-training
Access Counts: 272 views, 0 downloads


    Most classification algorithms require a large number of labeled samples to train a classifier, but real-world datasets are mostly unlabeled. Labeling data manually is labor-intensive and time-consuming, so researchers have proposed active learning and semi-supervised learning techniques to reduce the manual labeling workload. Because there is a certain complementarity between active learning and semi-supervised learning, some studies combine them to reduce the labeling workload further. Among these studies, the most popular active learning method queries, at each iteration, the samples with the lowest current classification confidence. However, this method focuses only on samples that are likely to lie on the class boundary and ignores the large number of remaining unlabeled samples. To use unlabeled data more effectively, this thesis proposes an active semi-supervised learning framework that combines clustering and classification. Our motivation is that clustering analysis is a powerful knowledge-discovery tool that can reveal the underlying data distribution from unlabeled data. In our framework, semi-supervised clustering is integrated into self-training classification to help train a better classifier. The experimental results show that our framework is more efficient than its competitors.
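
    The querying strategy the abstract refers to is commonly called uncertainty sampling: retrain the classifier each round and ask the oracle to label the pool sample the classifier is least confident about. Below is a minimal sketch of that baseline, assuming scikit-learn, binary labels, and a hypothetical `oracle` callback standing in for manual labeling; it is an illustration, not the thesis's implementation.

    ```python
    # Minimal pool-based uncertainty sampling with an SVM: each round, query
    # the unlabeled sample closest to the decision boundary (lowest confidence).
    # Assumes binary labels; `oracle` is a hypothetical stand-in for a human labeler.
    import numpy as np
    from sklearn.svm import SVC

    def uncertainty_sampling(X_labeled, y_labeled, X_pool, oracle, budget=10):
        X_l, y_l, pool = X_labeled, y_labeled, X_pool
        for _ in range(budget):
            clf = SVC(kernel="rbf").fit(X_l, y_l)
            # |decision_function| is the distance to the separating hyperplane,
            # so the smallest value marks the least confident prediction.
            query = np.argmin(np.abs(clf.decision_function(pool)))
            X_l = np.vstack([X_l, pool[query]])
            y_l = np.append(y_l, oracle(pool[query]))
            pool = np.delete(pool, query, axis=0)
        return SVC(kernel="rbf").fit(X_l, y_l)
    ```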

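    The clustering side of the framework can be approximated with seeded k-means (Basu et al., 2002): initialize one cluster per class at the labeled class means, then pseudo-label only those unlabeled points that fall unambiguously inside a cluster before self-training the classifier. The sketch below rests on simplifying assumptions: one cluster per class, cluster indices staying aligned with their seeding classes, and a hypothetical distance-ratio threshold as the reliability test; the thesis's actual reliability estimation differs.

    ```python
    # Seeded k-means + self-training: pseudo-label unlabeled points that sit
    # unambiguously inside one cluster, then retrain the classifier on the
    # augmented set. A sketch under simplifying assumptions (see lead-in).
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def cluster_guided_self_training(X_l, y_l, X_u, ratio=0.5):
        classes = np.unique(y_l)
        # Seed one cluster per class at the labeled class mean.
        seeds = np.array([X_l[y_l == c].mean(axis=0) for c in classes])
        km = KMeans(n_clusters=len(classes), init=seeds, n_init=1).fit(X_u)
        dist = km.transform(X_u)           # distances to every cluster center
        order = np.argsort(dist, axis=1)   # order[:, 0] = nearest center
        idx = np.arange(len(X_u))
        # Reliability test (hypothetical): the nearest center must be much
        # closer than the second nearest, i.e. the point is deep in its cluster.
        conf = dist[idx, order[:, 0]] / dist[idx, order[:, 1]]
        reliable = conf < ratio
        # Map cluster index back to class label via the seeding order; this
        # assumes k-means did not drift clusters away from their seeds.
        pseudo_y = classes[order[reliable, 0]]
        X_aug = np.vstack([X_l, X_u[reliable]])
        y_aug = np.concatenate([y_l, pseudo_y])
        return SVC(kernel="rbf").fit(X_aug, y_aug)
    ```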
    Table of Contents:
    Advisor's Recommendation Letter
    Thesis Oral Defense Committee Certification
    Abstract
    Chinese Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
      1.1 Background
      1.2 Motivation and Contribution
      1.3 Thesis Organization
    2. Related Works
      2.1 Semi-supervised Learning
      2.2 Active Learning
      2.3 Active Learning with Semi-supervised Learning
    3. Proposed Method
      3.1 Active Seed Selection
      3.2 Semi-supervised Clustering
      3.3 Informative Instances Selection
      3.4 Reliable Instances Estimation
    4. Experiment Study
      4.1 Datasets
      4.2 Experimental Setup
      4.3 Experimental Results
        4.3.1 Comparisons of Min-Max Approach
        4.3.2 Comparisons of Semi-supervised Clustering
        4.3.3 Comparisons of Accuracy and Processing Time
    5. Conclusion and Future Works
    Reference


    Full text release date: 2020/07/28 (campus network)
    Full text not authorized for release (off-campus network)
    Full text not authorized for release (National Central Library: Taiwan Thesis and Dissertation System)