
Author: 王潔汝 (Chieh-Ju Wang)
Thesis Title: 基於注意力特徵圖及自監督聯合訓練之非監督式領域自適應分類
Unsupervised Domain Adaptive Classification Based on Attention Feature Map and Self-Supervised Co-training
Advisor: 郭景明 (Jing-Ming Guo)
Committee: 杭學鳴 (Hsueh-Ming Hang), 張傳育 (Chuan-Yu Chang), 陳彥霖 (Yen-Lin Chen), 丁建均 (Jian-Jiun Ding)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2022
Graduating Academic Year: 110
Language: Chinese
Pages: 80
Keywords (Chinese): 非監督學習、領域自適應、注意力機制、標籤正則化、自監督聯合學習
Keywords (English): Unsupervised learning, Domain adaptation, Attention mechanism, Label regularization, Self-supervised co-training
    With the rise of deep learning, the quantity and variety of data have become major factors in experimental results. In practice, however, not every task has a large database: examples include medical images of lung X-rays, images of rare diseases, crop-disease images, defect images from optical inspection, and face images. These datasets are often hard to obtain and involve personal privacy, which makes collection even more difficult. Moreover, even with a large database, extensive manual annotation is still required. To mitigate the difficulty of data collection and the high cost of manual labeling, the domain adaptation strategy is proposed.
    This thesis uses unsupervised learning combined with an attention mechanism, label regularization, and co-training to optimize a classification model for domain adaptation. The dataset is from the 2017 Visual Domain Adaptation Challenge, in which the source domain and target domain each contain 12 classes. The source-domain dataset consists of synthetic 2D images rendered from 3D models under different angles and lighting conditions, while the target-domain dataset consists of real images extracted from the Microsoft COCO dataset. The main goal is to transfer features from a correctly labeled source-domain dataset that shares the same classes and class count with the target test set, and to apply the learned knowledge directly to the target set; this saves manual annotation cost and alleviates the difficulty of data collection. During this process, however, some images in the target test set were found to be mislabeled, to contain multiple classes in a single image, to show incomplete objects, or to lack a specific salient class, all of which make classification harder. We therefore introduce an attention mechanism so that, when the teacher and student models are trained simultaneously, the same image is expected to focus on the same feature points, improving classification accuracy. Label regularization is added to prevent the model from making overly absolute predictions for a particular class: by weakening the top-predicted class, the model also learns the homogeneity and heterogeneity among classes. Finally, self-supervised co-training is added, using joint voting over target-domain images under different data augmentations to improve classification of single images containing multiple classes.
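The teacher-student attention consistency described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the thesis implementation: following the style of the cited attention-transfer work [5], the spatial attention map is taken as the channel-wise sum of squared activations, L2-normalized, and the loss is the distance between the student's and teacher's maps.

```python
import numpy as np

def attention_map(feats):
    # feats: (C, H, W) activations; attention = channel-wise sum of squares,
    # flattened and L2-normalized so maps from different models are comparable
    amap = (feats ** 2).sum(axis=0).reshape(-1)
    return amap / (np.linalg.norm(amap) + 1e-12)

def attention_transfer_loss(student_feats, teacher_feats):
    # L2 distance between the two normalized attention maps;
    # zero when both models attend to the same spatial locations
    return float(np.linalg.norm(attention_map(student_feats)
                                - attention_map(teacher_feats)))

# identical activations give zero loss; differing activations give a positive loss
x = np.random.default_rng(0).normal(size=(8, 4, 4))
y = np.random.default_rng(1).normal(size=(8, 4, 4))
print(attention_transfer_loss(x, x))  # → 0.0
print(attention_transfer_loss(x, y) > 0.0)  # → True
```

In the mean-teacher setting the thesis builds on, the teacher's weights are an exponential moving average of the student's, so minimizing this loss pulls the student's attention toward a temporally smoothed target.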


    With the rise of deep learning, the quantity and variety of data have become significant factors in practical applications. Yet, in reality, not all datasets have sufficient samples, for example, medical images of lung X-rays, images of rare diseases, images of crop diseases, images of defects in optical inspection, etc. These datasets are not only difficult to collect but also involve personal privacy, making data collection even more challenging. Moreover, even with a large database, extensive manual labeling is still required. To address these difficulties, this study employs domain adaptation.
    In this thesis, unsupervised learning combined with an attention mechanism is adopted, along with label regularization and co-training, to optimize the classification model for domain adaptation. The dataset is from the Visual Domain Adaptation Challenge 2017. The source-domain dataset consists of synthetic 2D images of 3D models rendered under different angles and lighting conditions, and the target-domain dataset consists of real images from the Microsoft COCO dataset. During this process, some images in the target test dataset were found to be mislabeled, and multiple categories sometimes appear in a single image; images with incomplete objects or without a specific salient category also occur. Consequently, we propose an attention mechanism, expecting the same image to focus on the same features under the simultaneous training of the teacher and student models, thereby improving classification accuracy. Label regularization is also deployed to avoid the model's overly absolute prediction of a specific category, and to learn the homogeneity and heterogeneity among categories by weakening the category with the highest prediction. Finally, self-supervised co-training is employed to improve the classification of single images containing multiple categories, via joint voting over target-domain images after different data augmentations.
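The joint-voting step of the co-training stage can be illustrated with a small sketch. This is a hypothetical NumPy example, assuming the per-view softmax outputs are already available; the strict-majority rule and the `conf_thresh` value are illustrative choices, not the thesis's exact settings.

```python
import numpy as np

def vote_pseudo_label(view_probs, conf_thresh=0.9):
    # view_probs: (n_views, n_classes) softmax outputs for one target image
    # under different augmentations; returns (label, accepted)
    view_probs = np.asarray(view_probs)
    votes = view_probs.argmax(axis=1)                     # each view's predicted class
    counts = np.bincount(votes, minlength=view_probs.shape[1])
    winner = int(counts.argmax())
    if counts[winner] <= view_probs.shape[0] // 2:        # require a strict majority
        return None, False
    conf = view_probs[votes == winner, winner].mean()     # mean confidence of agreeing views
    return (winner, True) if conf >= conf_thresh else (None, False)

# three augmented views, 3 classes: two confident votes for class 0, one dissenter
probs = np.array([[0.95, 0.03, 0.02],
                  [0.92, 0.05, 0.03],
                  [0.20, 0.70, 0.10]])
print(vote_pseudo_label(probs))  # → (0, True)
```

Images whose views disagree, or agree only with low confidence, are rejected rather than pseudo-labeled, which is how a confidence threshold keeps noisy multi-object target images from polluting training.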

    Abstract (Chinese) III
    Abstract (English) IV
    Acknowledgments V
    Table of Contents VI
    List of Figures IX
    List of Tables XI
    Chapter 1 Introduction 1
    1.1 Background 1
    1.2 Background of VisDA2017 4
    1.3 Motivation and Objectives 5
    1.4 Thesis Organization 7
    Chapter 2 Literature Review 8
    2.1 Self-ensembling Domain Adaptation 8
    2.1.1 π-model [3] 8
    2.1.2 Temporal Ensembling [3] 9
    2.1.3 Mean Teacher [4] 11
    2.1.4 Self-ensembling for Visual Domain Adaptation [1] 12
    2.2 Improving Convolutional Neural Networks via Attention Transfer [5] 13
    2.3 Label Regularization 16
    2.3.1 When Label Smoothing Helps [8] 16
    2.3.2 Knowledge Distillation in Neural Networks [10] 18
    2.3.3 Compatibility between Label Smoothing and Knowledge Distillation [12] 19
    2.4 Self-supervised Co-training 21
    2.4.1 Contrastive Learning in Self-supervised Learning [13] 21
    2.4.2 Multi-head Co-training in Semi-supervised Learning [14] 22
    2.5 Deep Feature Visualization: Class Activation Map (CAM) [16] 24
    Chapter 3 Methodology 27
    3.1 The VisDA2017 Dataset [2] 27
    3.2 Network Architecture Flowchart 32
    3.3 Attention Module 33
    3.4 Multi-head Co-training 35
    3.5 Label Smoothing 37
    3.6 Loss Functions 39
    Chapter 4 Experimental Results 40
    4.1 Experimental Environment 40
    4.2 Implementation Details 40
    4.3 Results and Analysis 40
    4.3.1 Evaluation Metrics 40
    4.3.1.1 Accuracy 41
    4.3.1.2 Standard Deviation 41
    4.3.2 Experimental Results 42
    4.3.2.1 Class-averaged Results 42
    4.3.3 Ablation Study 42
    4.3.3.1 Results with Test-time Data Augmentation 42
    4.3.3.2 Results without Test-time Data Augmentation 43
    4.3.3.3 Comparison of the Proposed Method and the ResNet101 Baseline 43
    4.3.3.4 Individual Contributions of the Attention Module, Label Smoothing, and Multi-head Co-training 44
    4.3.3.5 Confidence Threshold of the Attention Module 45
    4.3.3.6 Temperature of Label Smoothing 45
    4.3.3.7 Confidence Threshold of Multi-head Co-training 47
    4.3.4 Visualization Results 48
    4.3.4.1 Analysis of Image Categorization 48
    4.3.4.1.1 Normal Images 48
    4.3.4.1.2 Incomplete Images 50
    4.3.4.1.3 Unfocused Images 52
    4.3.4.1.4 Multiple Classes in a Single Image 53
    4.3.4.2 Visualization Results per Class 54
    4.3.4.2.1 Plane 54
    4.3.4.2.2 Bicycle 55
    4.3.4.2.3 Bus 56
    4.3.4.2.4 Car 57
    4.3.4.2.5 Horse 58
    4.3.4.2.6 Knife 59
    4.3.4.2.7 Motorcycle 60
    4.3.4.2.8 Person 61
    4.3.4.2.9 Plant 62
    4.3.4.2.10 Skateboard 63
    4.3.4.2.11 Train 64
    4.3.4.2.12 Truck 65
    Chapter 5 Conclusions and Future Work 66
    References 67

    [1] G. French, M. Mackiewicz, and M. Fisher, "Self-ensembling for visual domain adaptation," arXiv preprint arXiv:1706.05208, 2017.
    [2] X. Peng, B. Usman, N. Kaushik, J. Hoffman, D. Wang, and K. Saenko, "VisDA: The visual domain adaptation challenge," arXiv preprint arXiv:1710.06924, 2017.
    [3] S. Laine and T. Aila, "Temporal ensembling for semi-supervised learning," arXiv preprint arXiv:1610.02242, 2016.
    [4] A. Tarvainen and H. Valpola, "Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results," Advances in neural information processing systems, vol. 30, 2017.
    [5] S. Zagoruyko and N. Komodakis, "Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer," arXiv preprint arXiv:1612.03928, 2016.
    [6] M. Lin, Q. Chen, and S. Yan, "Network in network," arXiv preprint arXiv:1312.4400, 2013.
    [7] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
    [8] R. Müller, S. Kornblith, and G. E. Hinton, "When does label smoothing help?," Advances in neural information processing systems, vol. 32, 2019.
    [9] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, "Inception-v4, inception-resnet and the impact of residual connections on learning," in Thirty-first AAAI conference on artificial intelligence, 2017.
    [10] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, vol. 2, no. 7, 2015.
    [11] J. Gou, B. Yu, S. J. Maybank, and D. Tao, "Knowledge distillation: A survey," International Journal of Computer Vision, vol. 129, no. 6, pp. 1789-1819, 2021.
    [12] Z. Shen, Z. Liu, D. Xu, Z. Chen, K.-T. Cheng, and M. Savvides, "Is label smoothing truly incompatible with knowledge distillation: An empirical study," arXiv preprint arXiv:2104.00676, 2021.
    [13] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, "A simple framework for contrastive learning of visual representations," in International conference on machine learning, 2020: PMLR, pp. 1597-1607.
    [14] M. Chen, Y. Du, Y. Zhang, S. Qian, and C. Wang, "Semi-supervised learning with multi-head co-training," in Proceedings of the AAAI Conference on Artificial Intelligence, 2022, vol. 36, no. 6, pp. 6278-6286.
    [15] M.-R. Amini, V. Feofanov, L. Pauletto, E. Devijver, and Y. Maximov, "Self-Training: A Survey," arXiv preprint arXiv:2202.12040, 2022.
    [16] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning deep features for discriminative localization," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2921-2929.
    [17] J. Liang, D. Hu, and J. Feng, "Do we really need to access the source data? source hypothesis transfer for unsupervised domain adaptation," in International Conference on Machine Learning, 2020: PMLR, pp. 6028-6039.
    [18] S. Lee, D. Kim, N. Kim, and S.-G. Jeong, "Drop to adapt: Learning discriminative features for unsupervised domain adaptation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 91-100.
    [19] C.-Y. Lee, T. Batra, M. H. Baig, and D. Ulbricht, "Sliced Wasserstein discrepancy for unsupervised domain adaptation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 10285-10295.
    [20] V. Prabhu, S. Khare, D. Kartik, and J. Hoffman, "SENTRY: Selective entropy optimization via committee consistency for unsupervised domain adaptation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 8558-8567.
    [21] Y. Zou, Z. Yu, X. Liu, B. Kumar, and J. Wang, "Confidence regularized self-training," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 5982-5991.

    Full text available from: 2024/08/29 (campus network)
    Full text available from: 2024/08/29 (off-campus network)
    Full text available from: 2024/08/29 (National Central Library: Taiwan NDLTD system)