
Author: Yu-Kai Chen (陳昱愷)
Thesis Title: CTBDA: Learn from Clear to Blurry for Domain Adaptive Semantic Segmentation (CTBDA:基於清晰到模糊學習之領域自適應語意分割)
Advisor: Shanq-Jang Ruan (阮聖彰)
Committee Members: Chang-Hong Lin (林昌鴻), Jenq-Shiou Leu (呂政修), Wen-Chih Peng (彭文志)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2023
Graduation Academic Year: 111 (ROC calendar; 2022-2023)
Language: English
Pages: 70
Keywords: Domain Adaptation, Semantic Segmentation, ADAS, Deep Learning, Unsupervised Learning

Directly applying a deep learning model trained on a source domain to a target domain causes a significant drop in accuracy due to the domain shift problem. Unsupervised domain adaptation (UDA) methods address this problem by transferring the knowledge learned from the source domain to the target domain. This thesis focuses on improving UDA methods for semantic segmentation. UDA methods rely on pseudo-labels generated by the model itself for training. However, semantic segmentation models have difficulty making precise predictions for blurry objects, so the pseudo-labels for such objects are highly noisy, which limits the accuracy of UDA methods. To reduce the influence of these high-noise blurry samples, we propose Clear to Blurry Domain Adaptation (CTBDA), a novel unsupervised domain adaptation framework for semantic segmentation. Our method trains the segmentation model progressively from clear to blurry samples, reducing the influence of high-noise blurry samples early in training and allowing the model to learn the correct knowledge more effectively. Since an object's distance determines how blurry its imaging is, and farther objects appear smaller in the image, we use object size to evaluate the clarity of a sample and estimate object size with the entropy of the input image. CTBDA significantly improves on current state-of-the-art methods, raising performance by 1.25 mIoU on the GTA5→Cityscapes task and by 1.3 mIoU on the Synthia→Cityscapes task, reaching 77.15 and 68.5 mIoU, respectively.
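
To make the clarity criterion concrete, the following is a minimal Python sketch of one plausible reading of the abstract, not the thesis's actual implementation: the histogram-based entropy, the convention that lower input-space entropy indicates larger (nearer, clearer) objects, and the helper names input_space_entropy and clear_to_blurry_order are all illustrative assumptions.

import numpy as np

def input_space_entropy(image: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy of an image's grayscale intensity histogram.

    Illustrative proxy for input-space entropy: an image dominated by
    one large, near (clear) object has a peaked histogram and low
    entropy, while a scene of many small, distant (blurry) objects
    spreads its intensities and scores higher.
    """
    gray = image.mean(axis=-1) if image.ndim == 3 else image
    hist, _ = np.histogram(gray, bins=bins, range=(0, 255))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log(p)).sum())

def clear_to_blurry_order(images) -> np.ndarray:
    """Indices sorting samples by ascending entropy, i.e. from
    (assumed) clear to blurry, for a curriculum schedule."""
    return np.argsort([input_space_entropy(img) for img in images])

# Toy check: a flat image (one big uniform "object") is scheduled
# before a high-entropy noise image.
rng = np.random.default_rng(0)
clear = np.full((64, 64), 128, dtype=np.uint8)
blurry = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
print(clear_to_blurry_order([blurry, clear]))  # -> [1 0]

A curriculum built on such a score would present target-domain samples in this order, ramping in the blurrier (higher-noise) samples only after the model has adapted on the clearer ones.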

Contents
  Recommendation Letter
  Approval Letter
  Abstract in Chinese
  Abstract in English
  Acknowledgements
  Contents
  List of Figures
  List of Tables
  List of Algorithms
  1 Introduction
  2 Related Works
    2.1 Unsupervised Domain Adaptation
    2.2 Curriculum Learning for UDA
  3 Preliminary
    3.1 HRDA
    3.2 MIC
  4 Proposed Methods
    4.1 The Overview of Framework
    4.2 Class-Balanced Threshold Scheduler
    4.3 Difficulty Measurer
    4.4 Loss Functions
    4.5 Pseudo Code
  5 Experiments
    5.1 Implementation Details
    5.2 Comparison of State-of-the-Art Methods
    5.3 Qualitative Comparison
    5.4 In-Depth Analysis of CTBDA
    5.5 Ablation Study
  6 Conclusions
    6.1 Limitations and Future Work
  References
  Letter of Authority

Full-text release date: 2025/07/27 (campus network)
Full-text release date: 2025/07/27 (off-campus network)
Full-text release date: 2025/07/27 (National Central Library: Taiwan Theses and Dissertations System)