| Field | Value |
|---|---|
| Graduate Student | 陳信杰 Hsin-Chieh Chen |
| Thesis Title | 非迭代式主動式學習 Active Learning in Non-Iterative Approach |
| Advisor | 洪西進 Shi-Jinn Horng |
| Committee Members | 林祝興 Chu-Hsin Lin, 楊竹星 Chu-Sing Yang, 李正吉 Cheng-Chi Lee, 顏成安 Cheng-An Yen, 洪西進 Shi-Jinn Horng |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Publication Year | 2023 |
| Graduation Academic Year | 111 |
| Language | Chinese |
| Pages | 37 |
| Keywords (Chinese) | 主動式學習, 非迭代式, 資料篩選 |
| Keywords (English) | Active Learning, Non-Iterative Approach, Data Selection |
| Views / Downloads | 304 / 6 |
With the rapid development of deep learning, the technology has been widely adopted by companies for a broad range of purposes and products. However, not every use case has a public dataset available, so a great deal of time and labor must be spent labeling data. Active learning [1] emerged to address this: it autonomously selects data that is valuable for the model to learn from and filters out data that contributes little to learning, effectively avoiding the need for large-scale labeling and greatly reducing labor and time costs.

Traditional active learning [1] architectures, however, require many iterations, repeating the labeling, training, and data-selection cycle until the expected amount of data has been selected. This study proposes a non-iterative active learning architecture that requires no iteration: an LSTM-based [2] confidence-correction model is built so that the confidence distribution produced by a model trained on a small amount of data approximates the distribution produced by a model trained on a large amount of data, which improves the accuracy of an entropy-based [3] selection function [4] when selecting data.

Experiments on CIFAR10 [5] and CIFAR100 [6], common benchmark datasets in the active learning field, show that the proposed method recovers the accuracy of full-data training using only 50% and 63% of the data, respectively. When targeting 50% of the data on CIFAR10, the method matches the accuracy of the traditional active learning architecture while running 7 times faster overall; likewise, on CIFAR100 with a 50% target, it achieves a 4.7-times speedup over the traditional architecture.
With the rapid development of deep learning technology, deep learning has
been widely adopted by many companies for various purposes and products.
However, not all use cases have publicly available datasets to use. As a result, a
significant amount of time and manpower is required for data labeling. In light of
this, active learning techniques have emerged, which autonomously select
valuable data for model learning and filter out data with low contribution to the
model, effectively reducing the need for extensive data labeling and significantly
lowering manpower and time costs.
However, traditional active learning frameworks require multiple iterations, repeating the labeling, training, and data-selection cycle until the desired amount of data has been selected. In this study, we propose a non-iterative active learning framework that eliminates the need for iteration. The framework uses an LSTM-based confidence-correction model, which makes the confidence distribution produced by a model trained on a small amount of data comparable to that produced by a model trained on a large amount of data. This improves the accuracy of data selection with an entropy-based selection function.
We conducted experiments using the commonly used benchmark datasets in
the active learning field, CIFAR10 and CIFAR100. The experimental results
demonstrate that our proposed method can achieve accuracy comparable to that
of models trained on complete datasets using only 50% and 63% of the data for
CIFAR10 and CIFAR100, respectively. Moreover, compared to the traditional
active learning framework, when aiming to select 50% of the data on the
CIFAR10 dataset, our method achieves comparable accuracy while improving
the overall execution speed by 7 times. Similarly, when aiming to select 50% of
the data on the CIFAR100 dataset, our method improves the execution speed by
4.7 times compared to the traditional active learning framework.
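The iterative pipeline that the proposed framework avoids can be sketched as the loop below. The `train` and `score` callables are stub placeholders standing in for model retraining and uncertainty scoring; they are illustrative assumptions, not the thesis implementation. The speedup reported above comes from removing this per-round retraining:

```python
def traditional_active_learning(unlabeled, budget, batch_size, train, score):
    """Classic iterative active learning: repeat the label -> train ->
    select cycle until `budget` samples have been chosen.

    train(labeled)     -> model                  (placeholder)
    score(model, x)    -> uncertainty of x       (placeholder)
    """
    labeled = []
    pool = list(unlabeled)
    while len(labeled) < budget and pool:
        model = train(labeled)                    # retrain every round
        ranked = sorted(pool, key=lambda x: score(model, x), reverse=True)
        batch = ranked[:min(batch_size, budget - len(labeled))]
        labeled.extend(batch)                     # oracle labels the batch
        pool = [x for x in pool if x not in batch]
    return labeled

# Toy run: stub uncertainty is highest for samples closest to 5.
picked = traditional_active_learning(
    range(10), budget=4, batch_size=2,
    train=lambda labeled: None,                   # stub model
    score=lambda model, x: -abs(x - 5),           # stub uncertainty
)
print(picked)  # → [5, 4, 6, 3]
```

In the non-iterative setting, the loop collapses to a single pass: train once on a small seed set, correct the confidences, and select the full budget in one round.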
[1] B. Settles, “Active learning literature survey,” 2009.
[2] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[3] B. Bein, “Entropy,” Best Practice & Research Clinical Anaesthesiology, vol. 20, no. 1, pp. 101–109, 2006.
[4] J. Kremer, K. Steenstrup Pedersen, and C. Igel, “Active learning with support vector machines,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 4, no. 4, pp. 313–326, 2014.
[5] A. Krizhevsky, G. Hinton, and others, “Learning multiple layers of features from tiny images,” 2009.
[6] A. Krizhevsky, V. Nair, and G. Hinton, “CIFAR-10 (Canadian Institute for Advanced Research),” URL http://www.cs.toronto.edu/~kriz/cifar.html, vol. 5, no. 4, p. 1, 2010.
[7] J. MacQueen and others, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, 1967, vol. 1, no. 14, pp. 281–297.
[8] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, and others, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in KDD, 1996, vol. 96, no. 34, pp. 226–231.
[9] J. A. Bilmes and others, “A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models,” International computer science institute, vol. 4, no. 510, p. 126, 1998.
[10] M. Toneva, A. Sordoni, R. T. des Combes, A. Trischler, Y. Bengio, and G. J. Gordon, “An empirical study of example forgetting during deep neural network learning,” arXiv preprint arXiv:1812.05159, 2018.
[11] D. Yoo and I. S. Kweon, “Learning loss for active learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 93–102.
[12] O. Sener and S. Savarese, “Active learning for convolutional neural networks: A core-set approach,” arXiv preprint arXiv:1708.00489, 2017.
[13] C. Coleman et al., “Selection via proxy: Efficient data selection for deep learning,” arXiv preprint arXiv:1906.11829, 2019.
[14] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
[15] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14, 2016, pp. 630–645.
[16] S. Bhatnagar, S. Goyal, D. Tank, and A. Sethi, “PAL: Pretext-based active learning,” arXiv preprint arXiv:2010.15947, 2020.
[17] K. Lee, H. Lee, K. Lee, and J. Shin, “Training confidence-calibrated classifiers for detecting out-of-distribution samples,” arXiv preprint arXiv:1711.09325, 2017.
[18] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” in International conference on machine learning, 2017, pp. 1321–1330.
[19] M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in International conference on machine learning, 2019, pp. 6105–6114.
[20] M. Menéndez, J. Pardo, L. Pardo, and M. Pardo, “The jensen-shannon divergence,” Journal of the Franklin Institute, vol. 334, no. 2, pp. 307–318, 1997.