
Graduate Student: Zhi-Ye Fu (傅枝燁)
Thesis Title: Active Learning with Numerical Feature Annotation (包含連續型特徵標註的主動式學習)
Advisor: Hsing-Kuo Pao (鮑興國)
Committee Members: Yuh-Jye Lee (李育杰), Tien-Ruey Hsiang (項天瑞)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2019
Academic Year of Graduation: 107 (2018-2019)
Language: English
Number of Pages: 68
Keywords (Chinese): 主動式學習、連續性特徵、特徵標註、向量量化
Keywords (English): Active Learning, Numerical Feature, Feature Annotation, Vector Quantization
    As is well known, active learning is an algorithm for settings where the
    training set contains a large amount of unlabeled data: the model actively
    selects instances and queries an expert for their labels, keeping the
    labeling cost as low as possible. Most research, however, concentrates on
    the query strategy of active learning, that is, on how the model selects
    the more informative instances. A few studies instead investigate how to
    query ordinary instances and the important features within the data at the
    same time. The Active Learning Dual Supervision (ALDS) model, for example,
    synthesizes a pseudo instance from important features, and in this way
    treats features as ordinary instances. It can only be applied to text and
    categorical data, however, because only such features have an explicit
    meaning.
    This thesis therefore studies how to perform feature annotation on
    continuous numerical data. We first convert continuous features into
    discrete ones by vector quantization, so that the features acquire
    explicit meanings and can be used to synthesize pseudo instances. On this
    basis, we examine in depth the features that the dual-supervision model
    selects for instance synthesis, in order to verify that the model
    discovers the truly important features at an early stage of training and
    that its feature-importance scoring mechanism is reliable. Beyond adopting
    vector quantization, we further compare two clustering algorithms for it,
    K-means and the Gaussian Mixture Model; since different clustering
    algorithms suit different kinds of data sets, this comparison is also one
    of the research points of this thesis.
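The discretization step described above can be sketched with scikit-learn, which provides both clustering algorithms discussed in the thesis. The feature values and the number of clusters below are illustrative assumptions, not taken from the thesis's experiments.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# One numerical feature with three well-separated value ranges
# (illustrative values only).
x = np.array([[1.0], [1.2], [0.9], [5.1], [5.3], [9.8], [10.1]])

# Vector quantization with K-means: each value is replaced by the index
# of its nearest cluster centroid, turning the numerical feature into a
# categorical one.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
km_codes = km.fit_predict(x)

# The same quantization with a Gaussian Mixture Model: each value is
# assigned to the mixture component with the highest posterior probability.
gmm = GaussianMixture(n_components=3, random_state=0)
gmm_codes = gmm.fit(x).predict(x)

print(km_codes)   # one discrete code per value
print(gmm_codes)
```

After this step each original value is represented by a discrete code (e.g. a "low", "mid", or "high" bin), which is what makes the feature meaningful enough to annotate.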


    Active learning is an algorithm that, when the training set contains
    abundant unlabeled data, lets the model actively select informative
    instances to query with as little labeling cost as possible. Most
    research, however, focuses on the query strategy of active learning,
    namely how the model selects informative instances. Some work instead
    studies querying labels for instances and features simultaneously; for
    example, the Active Learning Dual Supervision (ALDS) model treats
    features as ordinary instances by synthesizing pseudo instances from
    selected features. It can only be applied to text and categorical data,
    because only such features are meaningful. This research therefore
    focuses on how to label numerical features simultaneously in active
    learning. Our approach is intuitive: we convert numerical features to
    categorical ones with a vector quantization method. The features then
    become meaningful, which means they can be used to synthesize pseudo
    instances. On this basis, we investigate in detail the features selected
    for instance synthesis in ALDS, to see whether the model finds the
    important features at an early stage; the more important features ALDS
    finds, the better the resulting model performs. Moreover, we not only
    adopt vector quantization but also compare two clustering methods for
    it, K-means and the Gaussian Mixture Model, and discuss which kind of
    data suits each method.
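The abstract does not spell out the exact ALDS synthesis rule, so the sketch below uses a simplified stand-in: an annotated (discretized) feature is turned into one pseudo training instance by fixing that feature at the centroid of its discrete bin and filling the remaining features with their column means. The arrays `X` and `codes` and the helper `synthesize_instance` are illustrative assumptions, not the thesis's procedure.

```python
import numpy as np

# Toy data: 4 instances, 2 numerical features (illustrative values only).
X = np.array([[1.0, 10.0],
              [1.2, 11.0],
              [5.0, 30.0],
              [5.2, 31.0]])
# Discretized code of each feature value (e.g. from vector quantization).
codes = np.array([[0, 0],
                  [0, 0],
                  [1, 1],
                  [1, 1]])

def synthesize_instance(X, codes, feature_idx, code_value):
    """Turn one annotated feature into a pseudo training instance.

    The annotated feature is set to the mean of the values falling in its
    discrete code; every other feature takes its overall column mean.
    This is a simplified stand-in for the ALDS synthesis step, not the
    exact rule used in the thesis.
    """
    pseudo = X.mean(axis=0)                       # neutral baseline
    mask = codes[:, feature_idx] == code_value    # rows in that code/bin
    pseudo[feature_idx] = X[mask, feature_idx].mean()
    return pseudo

# Feature 0 annotated with code 1 (the "high" bin in this toy data).
print(synthesize_instance(X, codes, feature_idx=0, code_value=1))
```

The resulting vector can then be added to the labeled pool with the class the expert associated with the feature, which is how feature feedback re-enters ordinary instance-based training.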

    Contents
    Recommendation Letter
    Approval Letter
    Abstract in Chinese
    Abstract in English
    Acknowledgments
    Contents
    List of Figures
    List of Tables
    1 Introduction
      1.1 Motivation
      1.2 Proposed Method
      1.3 Outline
    2 Related Work
    3 Methodology
      3.1 Overall Framework
      3.2 Vector Quantization
        3.2.1 K-means
        3.2.2 Gaussian Mixture Model
      3.3 Active Learning
        3.3.1 Active Learning Scenarios
        3.3.2 Query Strategies
      3.4 Active Learning Dual Supervision
        3.4.1 Hierarchical Clustering
        3.4.2 Pure Cluster Filtering
        3.4.3 Feature Scoring
        3.4.4 Instance Synthesizing
    4 Experiments and Results
      4.1 Dataset
        4.1.1 Pima Indian Diabetes
        4.1.2 NBA
        4.1.3 Magic Gamma Telescope
      4.2 Data Preprocessing
      4.3 Experiment Setting
      4.4 Experiment Evaluation
      4.5 Experiment Result
        4.5.1 Adding Feature Annotation on Numerical Data
        4.5.2 Synthesized Instances in each Data Set
        4.5.3 Vector Quantization Method Analysis
        4.5.4 Vector Quantization with Combination
    5 Conclusions
      5.1 Future Work
    References


    Full-text release date: 2024/07/01 (campus network)
    Full text not authorized for public release (off-campus network)
    Full text not authorized for public release (National Central Library: Taiwan NDLTD system)