| Field | Value |
|---|---|
| Graduate Student | 王潔汝 Chieh-Ju Wang |
| Thesis Title | 基於注意力特徵圖及自監督聯合訓練之非監督式領域自適應分類 (Unsupervised Domain Adaptive Classification Based on Attention Feature Map and Self-Supervised Co-training) |
| Advisor | 郭景明 Jing-Ming Guo |
| Committee Members | 杭學鳴 Hsueh-Ming Hang, 張傳育 Chuan-Yu Chang, 陳彥霖 Yen-Lin Chen, 丁建均 Jian-Jiun Ding |
| Degree | Master (碩士) |
| Department | 電機工程系 Department of Electrical Engineering, College of Electrical Engineering and Computer Science (電資學院) |
| Publication Year | 2022 |
| Academic Year | 110 |
| Language | Chinese |
| Pages | 80 |
| Keywords (Chinese) | 非監督學習、領域自適應、注意力機制、標籤正則化、自監督聯合學習 |
| Keywords (English) | Unsupervised learning, Domain adaptation, Attention mechanism, Label regularization, Self-supervised co-training |
With the rise of deep learning, the quantity and variety of data have become major factors affecting experimental results. In real life, however, not every task comes with a large database: medical lung X-ray images, rare-disease images, crop-disease images, defect images from optical inspection, face images, and so on are often hard to obtain and involve personal privacy, which makes data collection even more difficult. Moreover, even when a large database is available, manual annotation is costly. To alleviate these difficulties of data collection and expensive labeling, the strategy of domain adaptation is proposed.

This thesis optimizes a classification model for domain adaptation using unsupervised learning combined with an attention mechanism, label regularization, and co-training. The dataset comes from the 2017 Visual Domain Adaptation Challenge, in which the source domain and target domain each contain 12 classes. The source domain consists of synthetic 2D images rendered from 3D models under different viewpoints and lighting conditions, while the target domain consists of real images extracted from the Microsoft COCO dataset. The main goal is to transfer features from a correctly labeled source dataset that shares the same classes and number of classes with the target test set, and to apply the learned knowledge directly to the target dataset; this saves manual annotation cost and sidesteps the difficulty of data collection. During this process, however, some images in the target test set were found to be mislabeled, to contain multiple classes in a single image, to show incomplete objects, or to lack a specific salient class, all of which make classification harder. An attention mechanism is therefore proposed so that, when the teacher and student models are trained simultaneously, the same image is encouraged to attend to the same feature locations, improving classification accuracy. Label regularization is further added to prevent the model from making overly absolute predictions for a particular class: by weakening the top-predicted class, the model learns both the homogeneity and the heterogeneity among classes. Finally, self-supervised co-training is introduced: for target-domain images under different data augmentations, a joint vote improves the ability to classify single images that contain multiple classes.
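The teacher-student attention idea, encouraging both models to attend to the same spatial locations of an image, can be sketched in plain numpy. This is only a minimal illustration in the style of activation-based attention transfer, not the thesis's actual implementation; the feature shapes and the mean-squared loss form are assumptions.

```python
import numpy as np

def attention_map(features):
    # features: (C, H, W) activations from one layer of a network.
    # Summing squared channel activations yields an (H, W) spatial
    # attention map showing where the network "looks".
    amap = np.sum(features ** 2, axis=0)
    # L2-normalise the flattened map so teacher and student maps
    # are comparable regardless of activation scale.
    flat = amap.reshape(-1)
    return flat / (np.linalg.norm(flat) + 1e-12)

def attention_consistency_loss(teacher_feats, student_feats):
    # Mean squared distance between normalised attention maps:
    # small when both models focus on the same spatial regions.
    t = attention_map(teacher_feats)
    s = attention_map(student_feats)
    return float(np.mean((t - s) ** 2))

rng = np.random.default_rng(0)
feats = rng.normal(size=(64, 7, 7))
# Identical features give exactly zero loss; perturbed student
# features give a positive loss that training would minimise.
print(attention_consistency_loss(feats, feats))        # 0.0
print(attention_consistency_loss(feats, feats + 0.5) > 0)
```

In a mean-teacher setup, this loss would be added to the classification loss, with the teacher weights updated as an exponential moving average of the student's.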
With the rise of deep learning, the quantity and variety of data have become significant factors in practical applications. In reality, however, not all datasets have sufficient samples; examples include medical images of lung X-rays, images of rare diseases, images of crop diseases, and images of defects in optical inspection. These datasets are not only difficult to collect but also involve personal privacy, making data collection more challenging. Moreover, even with a large database, substantial human labeling cost is still required. To address the above-mentioned difficulties, this study employs domain adaptation.

In this thesis, unsupervised learning combined with an attention mechanism, label regularization, and co-training is adopted to optimize the classification model for domain adaptation. The dataset is from the 2017 Visual Domain Adaptation Challenge. The source domain consists of synthetic 2D images of 3D models rendered under different angles and lighting conditions, while the target domain consists of real images from the Microsoft COCO dataset. During this process, some images in the target test set were found to be mislabeled, and multiple categories sometimes appear in a single image; incomplete objects and images without a specific salient category are also present. Consequently, an attention mechanism is proposed so that, under simultaneous training of the teacher and student models, the same image is encouraged to focus on the same features, improving classification accuracy. Label regularization is also deployed to avoid overly absolute predictions for a specific category: by weakening the category with the highest prediction, the model learns the homogeneity and heterogeneity among categories. Finally, self-supervised co-training improves the classification of single images with multiple categories through joint voting over target-domain images under different data augmentations.
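The label-regularization step, which weakens the most confident class while lifting the others, behaves like label smoothing. A minimal numpy sketch follows; the uniform-mixing form and the `eps` value are assumptions for illustration, not the thesis's exact formulation.

```python
import numpy as np

def regularize_labels(probs, eps=0.1):
    # Soften a predicted distribution by mixing in a uniform prior:
    # the top class loses probability mass, every class gains eps/K.
    # This discourages overly absolute single-class predictions.
    probs = np.asarray(probs, dtype=float)
    k = probs.shape[-1]
    return (1.0 - eps) * probs + eps / k

confident = np.array([0.97, 0.01, 0.01, 0.01])
softened = regularize_labels(confident, eps=0.1)
print(softened.round(3))  # [0.898 0.034 0.034 0.034]
print(softened.sum())     # still a valid distribution: 1.0
```

Because every class retains nonzero probability, the model can still express similarity between related classes (homogeneity) while keeping the ranking that separates them (heterogeneity).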
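The joint-voting step of the self-supervised co-training can be illustrated as averaging per-view softmax outputs before taking the argmax, so that augmented views which happen to crop different objects in a multi-object image each contribute to the decision. The three-view setup and probabilities below are hypothetical.

```python
import numpy as np

def joint_vote(view_probs):
    # view_probs: (n_views, n_classes) softmax outputs for one target
    # image under different augmentations. Averaging before the argmax
    # lets all views cast a joint vote on the pseudo-label.
    mean_probs = np.asarray(view_probs, dtype=float).mean(axis=0)
    return int(np.argmax(mean_probs)), mean_probs

# Hypothetical 3-class example: two of three augmented views favour
# class 2, so the joint vote assigns class 2 despite the third view.
views = [
    [0.2, 0.1, 0.7],
    [0.1, 0.2, 0.7],
    [0.5, 0.3, 0.2],
]
label, probs = joint_vote(views)
print(label)  # 2
```

In practice the voted pseudo-label would then supervise the student on the unlabeled target images, which is where the "self-supervised" part of the co-training comes from.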