
Student: Chih-Heng Chung (鍾至衡)
Thesis title: Improving Semi-supervised Multi-label Classification by Training Labels Recovery with Consensus Clustering (Chinese title: 結合群集合成技術恢復訓練標籤以強化半監督式多標籤分類)
Advisor: Bi-Ru Dai (戴碧如)
Oral examination committee: Jiun-Long Huang (黃俊龍), Chih-Hua Tai (戴志華), Bi-Ru Dai (戴碧如), Chih-Ya Shen (沈之涯), Yi-Ling Chen (陳怡伶)
Degree: Doctoral
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication year: 2019
Academic year of graduation: 107 (ROC calendar)
Language: English
Number of pages: 69
Keywords: Multi-label classification, Incomplete label, Semi-supervised learning
Abstract: Semi-supervised classification with non-uniformly distributed incomplete labels is a problem frequently encountered in real-world applications. Insufficient positive information, the complete absence of negative examples, and the non-uniform distribution of missing labels all reduce the accuracy of multi-label classification. In this research, we propose the Semi-supervised Incomplete Training Label Recovery (SITLR) algorithm to address semi-supervised multi-label classification with incompletely labeled training data. Using the proposed weight adjustment step and negative information initialization with the LF-CARS algorithm, SITLR strengthens the known label information of the labeled training instances according to the distribution of the data and recovers only a subset of important labels. The recovered training data can then be used with any existing multi-label classification algorithm to build a better classification model and to produce better label predictions in the testing phase. The experiments verified the effectiveness of each design component of SITLR.
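SITLR initializes negative information from a consensus clustering result, that is, a single clustering obtained by combining several base clusterings of the same data. LF-CARS itself is not described on this page, so the sketch below shows only a generic, well-known consensus baseline: a co-association matrix (the fraction of base partitions that group two points together) followed by linking every pair whose co-association exceeds a threshold. It is not LF-CARS or SITLR. The function names `co_association` and `consensus_clusters` and the default threshold of 0.5 are illustrative assumptions.

```python
from itertools import combinations


def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of base
    partitions that put points i and j in the same cluster."""
    n = len(partitions[0])
    m = len(partitions)
    matrix = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same points")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0 / m
    return matrix


def consensus_clusters(partitions, threshold=0.5):
    """Link every pair whose co-association exceeds `threshold` and return
    the connected components as consensus cluster ids 0..k-1 (equivalent to
    cutting a single-linkage dendrogram built on 1 - co-association)."""
    matrix = co_association(partitions)
    n = len(matrix)
    parent = list(range(n))  # union-find forest over the n points

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    # Three hypothetical base partitions of five points.
    base = [[0, 0, 1, 1, 2], [0, 0, 0, 1, 1], [1, 1, 2, 2, 2]]
    print(consensus_clusters(base))  # [0, 0, 1, 1, 1]
```

In this example, points 0 and 1 are grouped together by all three base partitions, while points 2, 3 and 4 are chained through pairs that co-occur in two of the three partitions, so the consensus is two clusters. Thresholding the co-association matrix is the simplest way to combine base clusterings; fragment-based methods such as the one named in the title refine this idea, but their details are not given here.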

Table of Contents:
    Chinese Abstract
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
    2. Related Work
        2.1. Multi-label Classification
        2.2. Incomplete Label Learning
        2.3. Semi-supervised Learning
        2.4. Consensus Clustering
    3. Proposed Method
        3.1. Problem Definition
        3.2. Label Recovery Framework
        3.3. Recovering Training Labels by Means of Iterative Weight Updating
        3.4. Base Classifier
        3.5. Negative Information Initialization with a Clustering Result
        3.6. Loose Fragment-based Consensus Clustering Algorithm with a Robust Similarity (LF-CARS)
    4. Experiments
        4.1. Datasets, Settings and Comparisons of Experiments of Consensus Clustering
        4.2. Experiments of Consensus Clustering
        4.3. Datasets and Evaluation Criteria of Experiments of Semi-supervised Multi-label Classification
        4.4. Experiment of the Negative Information Initialization
        4.5. Experiment of the Weight Adjustment Step
        4.6. Experiment of Different K of the Base Classifier
        4.7. Experiment of Overall Performance
    5. Conclusion
    Reference


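Full text (campus network): publicly available from 2024/08/26
Full text (off-campus network): not authorized for public release
Full text (National Central Library, Taiwan theses and dissertations system): not authorized for public release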