Student: Adrian Chriswanto
Thesis Title: A Unified Approach on Active Learning Dual Supervision
Advisor: Hsing-Kuo Pao (鮑興國)
Committee: Hsing-Kuo Pao (鮑興國), Yuh-Jye Lee (李育杰), Li Su (蘇黎), Ray-Bing Chen (陳瑞彬), Ge-Ming Chiu (邱舉明)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2017
Graduation Academic Year: 105
Language: English
Pages: 51
Keywords: Active Learning, Dual Supervision, Feature Querying, Feature Feedback, Query Synthesis
Hits: 295 / Downloads: 0

Active Learning is a machine learning framework that addresses the problem of having a huge amount of unlabeled data compared to labeled data. Most studies in Active Learning focus on how to select the unlabeled data to be labeled by a human oracle so as to maximize the performance gain of the model with as little labeling effort as possible. In this thesis, however, we focus not only on how to select data instances but also on how to select which features should be labeled by the oracle, in a unified manner. By unified, we mean that we try to select the best possible combination of features and instances in each iteration. Labeling features is especially helpful for high-dimensional data, since it allows the model to discover the important features earlier.
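As a concrete illustration of the unified selection step, the minimal sketch below scores every candidate in a single pool, where real unlabeled instances and synthesized feature-instances sit side by side, using a standard uncertainty-sampling margin. It is a sketch under assumed details, not the thesis's exact procedure: the scikit-learn-style predict_proba interface and the margin criterion are assumptions.

import numpy as np

def select_queries(model, pool, batch_size=1):
    """Pick the pooled candidates the model is least certain about.

    `pool` is a 2-D array stacking real unlabeled instances and
    synthesized feature-instances, so both kinds of query compete on
    equal ground. `model` is assumed to expose a scikit-learn-style
    predict_proba().
    """
    proba = model.predict_proba(pool)
    # Margin between the two most probable classes; a small margin
    # means the model is uncertain, so the label is worth asking for.
    sorted_proba = np.sort(proba, axis=1)
    margin = sorted_proba[:, -1] - sorted_proba[:, -2]
    return np.argsort(margin)[:batch_size]

Because both kinds of candidates flow through the same scoring function, the most informative query wins regardless of whether it represents an instance or a set of features.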

The method that we propose synthesizes new instances that each represent a set of features. By utilizing synthesized instances, we can treat a set of features as if it were a regular instance, so features and instances can be compared on equal ground when the model selects what the oracle should label next. The features used to build a synthesized instance need to be carefully selected so that the resulting instance improves the model without introducing contradictory information. We utilize hierarchical clustering to group features that share a similar context. We first pick clusters whose purity is estimated to be high. We then score each feature by how common it is within the cluster and how related it is to the cluster's estimated majority label; the top-scoring features are then used to synthesize instances. Since we pick clusters estimated to have high purity, there is a good chance that the top-scoring features will not contradict each other.
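The sketch below walks through this pipeline under assumed details: features (columns of a binary bag-of-words matrix X) are clustered hierarchically by their co-occurrence patterns, each cluster's majority label and purity are estimated from the few labeled rows, features are scored, and the top scorers become a synthetic instance. The purity threshold, the product-of-frequencies score, and all names here are illustrative stand-ins rather than the thesis's exact definitions.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def synthesize_instances(X, y_labeled, labeled_idx,
                         n_clusters=20, purity_min=0.8, top_k=3):
    """Build synthetic feature-instances from high-purity feature clusters.

    X is a dense binary bag-of-words matrix (documents x features) with
    no all-zero columns; y_labeled holds 0/1 labels for rows labeled_idx.
    Returns (instance, estimated_label) pairs for querying the oracle.
    """
    # Group features whose document co-occurrence patterns are similar.
    Z = linkage(X.T, method='average', metric='cosine')
    clusters = fcluster(Z, t=n_clusters, criterion='maxclust')

    Xl = X[labeled_idx]
    synthetic = []
    for c in np.unique(clusters):
        feats = np.where(clusters == c)[0]
        # Labeled rows containing any cluster feature vote on its label.
        touched = Xl[:, feats].sum(axis=1) > 0
        if not touched.any():
            continue
        votes = y_labeled[touched]
        majority = np.bincount(votes).argmax()
        if (votes == majority).mean() < purity_min:
            continue  # skip clusters whose estimated purity is low
        # Score: corpus-wide frequency of the feature, weighted by how
        # often it occurs in labeled rows carrying the majority label.
        freq = X[:, feats].mean(axis=0)
        maj_rows = Xl[y_labeled == majority]
        score = freq * maj_rows[:, feats].mean(axis=0)
        top = feats[np.argsort(score)[-top_k:]]
        # The synthesized instance activates only the top-scoring features.
        inst = np.zeros(X.shape[1])
        inst[top] = 1.0
        synthetic.append((inst, majority))
    return synthetic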


Abstract
Table of contents
List of Tables
List of Figures
1 Introduction
  1.1 Motivation
  1.2 Proposed Method
  1.3 Thesis Outline
2 Related work
3 Methodology
  3.1 Active Learning
    3.1.1 Active Learning Scenarios
    3.1.2 Query Strategy
  3.2 Overall Framework
  3.3 Hierarchical Clustering
  3.4 Synthesizing Instances
4 Experiments and Results
  4.1 Dataset
    4.1.1 20 Newsgroups
    4.1.2 Mushroom
    4.1.3 Chess
    4.1.4 SPECT Heart
  4.2 Data Preprocessing
  4.3 Models
  4.4 Oracle Simulation
  4.5 Evaluation
  4.6 Experimental Setting
  4.7 Experiment Result
    4.7.1 Using One Feature on Each Synthesized Instance
    4.7.2 Using Multiple Features on Each Synthesized Instance
    4.7.3 Using Different Cost for Features and Instances
    4.7.4 Using Different Confidence Threshold on Oracle
    4.7.5 Using Different Model
5 Conclusions
Appendix A: Example of clusters queried
References
