Graduate student: 陳信杰
Hsin-Chieh Chen
Thesis title: 非迭代式主動式學習
Active Learning in Non-Iterative Approach
Advisor: 洪西進
Shi-Jinn Horng
Oral defense committee: 林祝興
Chu-Hsin Lin
Chu-Sing Yang
Cheng-Chi Lee
Cheng-An Yen
Shi-Jinn Horng
Degree: Master
Department: College of Electrical Engineering and Computer Science (電資學院) - Department of Computer Science and Information Engineering
Publication year: 2023
Academic year of graduation: 111 (ROC calendar)
Language: Chinese
Number of pages: 37
Chinese keywords: 主動式學習, 非迭代式, 資料篩選
English keywords: Active Learning, Non-Iterative Approach, Data Selection
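  • With the rapid development of deep learning technology, deep learning has
    been widely adopted by many companies for various purposes and products.
    Because not every use case has a publicly available dataset, data labeling
    costs substantial time and manpower, which active learning reduces by
    selecting only the data that is valuable for the model to learn from.
    Traditional active learning [1] frameworks, however, require multiple
    iterations, repeating the cycle of labeling, training, and data selection
    until the desired amount of data has been selected. This study proposes a
    non-iterative active learning framework built around an LSTM [2]-based
    confidence correction model, so that the confidence distribution produced
    by a model trained on a small amount of data is close to the results
    produced by a model trained on a large amount of data. This improves the
    accuracy of data selection with an entropy [3]-based selection
    function [4].
    The experiments use CIFAR10 [5] and CIFAR100 [6], benchmark datasets
    commonly used in the active learning field. The results show that the
    proposed method recovers the accuracy of training on the complete data
    using only 50% and 63% of the data on CIFAR10 and CIFAR100, respectively.
    With a target of selecting 50% of the data, the method reaches an accuracy
    no lower than that of the traditional active learning framework while
    running 7 times faster overall on CIFAR10 and 4.7 times faster on
    CIFAR100.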

    With the rapid development of deep learning technology, deep learning has
    been widely adopted by many companies for various purposes and products.
    However, not every use case has a publicly available dataset, so a
    significant amount of time and manpower is required for data labeling. Active
    learning techniques have emerged in response: they autonomously select
    data that is valuable for model learning and filter out data that contributes
    little to the model, reducing the amount of labeling needed and significantly
    lowering manpower and time costs.
    However, traditional active learning frameworks require multiple iterations,
    repeating the cycle of labeling, training, and data selection until the
    desired amount of data is selected. In this study, we propose a non-iterative
    active learning framework that eliminates these iterations. The framework
    uses a confidence correction model based on LSTM, which brings the
    confidence distribution generated by a model trained on a small amount of data
    close to the results produced by a model trained on a large amount of
    data. This improves the accuracy of data selection with an entropy-based
    selection function.
    We conducted experiments on CIFAR10 and CIFAR100, benchmark datasets
    commonly used in the active learning field. The experimental results
    demonstrate that our proposed method can achieve accuracy comparable to that
    of models trained on the complete datasets using only 50% and 63% of the data
    for CIFAR10 and CIFAR100, respectively. Moreover, when aiming to select 50%
    of the data on the CIFAR10 dataset, our method achieves accuracy comparable
    to that of the traditional active learning framework while running 7 times
    faster overall. Similarly, when aiming to select 50% of the data on the
    CIFAR100 dataset, our method runs 4.7 times faster than the traditional
    active learning framework.
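
The abstract does not spell out the entropy-based selection function, so the following is only a minimal sketch of the general idea, not the thesis implementation. It assumes each unlabeled sample has a predicted class-probability (confidence) vector p, scores it with the Shannon entropy H(p) = -sum_c p_c log p_c, and selects the highest-entropy samples in a single pass. The names entropy, select_by_entropy, pool, and budget are hypothetical, and the LSTM-based confidence correction step is not modeled here.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Zero-probability classes contribute nothing and would break log().
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_by_entropy(confidences: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return the indices of the `budget` most uncertain samples.

    Assumption: higher entropy means the sample is more valuable to label.
    """
    scores = [entropy(p) for p in confidences]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Hypothetical confidence vectors for four unlabeled samples, three classes.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # nearly uniform, high entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]

# Selecting 50% of the pool in one pass, with no retraining loop.
selected = select_by_entropy(pool, budget=2)
print(selected)
```

In a traditional iterative setup, a ranking like this would be recomputed after every retraining round. The non-iterative framework described above would instead apply it once, which is why the quality of the confidence vectors matters so much.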

