| Author: | 林詠翔 Yong-Xiang Lin |
|---|---|
| Thesis Title: | 基於遮罩門控鑑別器之自適應城市場景語意分割模型 Adapting Semantic Segmentation of Urban Scenes via Mask-aware Gated Discriminator |
| Advisor: | 花凱龍 Kai-Lung Hua |
| Committee: | 花凱龍 Kai-Lung Hua, 楊傳凱 Chuan-Kai Yang, 陳駿丞 Jun-Cheng Chen, 鐘國亮 Kuo-Liang Chung, 郭景明 Jing-Ming Guo |
| Degree: | 碩士 Master |
| Department: | 電資學院 - 資訊工程系 Department of Computer Science and Information Engineering |
| Thesis Publication Year: | 2020 |
| Graduation Academic Year: | 108 |
| Language: | 中文 Chinese |
| Pages: | 46 |
| Keywords (in Chinese): | 電腦視覺、域適應、語意分割、深度學習 |
| Keywords (in other languages): | Computer vision, Domain adaptation, Semantic segmentation, Deep learning |
訓練深度神經網絡進行語意分割依賴於像素級標籤進行監督,但是為大型數據集收集像素級標籤非常昂貴且耗時。一種解決方法是利用合成數據集,我們可以生成帶有相應標籤的數據。不幸的是,在合成數據上訓練的網絡在真實圖像上表現不佳,這被解釋為域移位問題。針對此問題,科學家提出了域適應技術,域適應技術已顯示出將從合成數據學習的知識轉移到現實世界數據的潛力。之前的工作主要利用對抗性訓練對特徵進行全局對齊。但是,我們觀察到背景對象在不同域中的變化較小;反之,前景類別在不同域中的變化較大。利用上述觀察,我們提出了一種域自適應方法,可以分別對前景對象和背景對象進行建模和調整。我們的方法從畫風轉移開始,以緩解域移位的問題;接下來是前景自適應模塊,它基於預測出的前景遮罩,搭配我們提出的門控鑑別器進行學習,以便分別適應前景和背景類別。我們在實驗中證明,我們的模型在平均交並比(mIoU)方面優於幾個最先進的基線。
Training a deep neural network for semantic segmentation relies on pixel-level ground truth labels for supervision. However, collecting large datasets with pixel-level annotations is very expensive and time-consuming. One workaround is to utilize synthetic data, where we can generate potentially unlimited data with corresponding ground truth labels. Unfortunately, networks trained on synthetic data perform poorly on real images due to the domain shift problem. Domain adaptation techniques have shown potential in transferring knowledge learned from synthetic data to real-world data. Prior works have mostly leveraged adversarial training to perform a global alignment of features. However, we observe that background objects vary less across domains than foreground objects do. Using this insight, we propose a domain adaptation method that models and adapts foreground objects and background objects separately. Our approach starts with a fast style transfer to match the appearance of the inputs. This is followed by a foreground adaptation module that learns a foreground mask, which our gated discriminator uses to adapt foreground and background objects separately. We demonstrate in our experiments that our model outperforms several state-of-the-art baselines in terms of mean intersection over union (mIoU).
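To make the mask-aware gated discriminator idea concrete, here is a minimal PyTorch-style sketch of how a predicted soft foreground mask could gate a patch discriminator into separate foreground and background streams during adversarial adaptation. It is an illustration only: the class and function names, channel widths, and loss formulation below are assumptions made for the example, not the architecture published in the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedDiscriminator(nn.Module):
    """Illustrative patch discriminator whose features are gated by a foreground mask.

    The mask splits the segmenter's prediction into a foreground stream and a
    background stream so the two groups of classes can be aligned separately
    (hypothetical layout, not the thesis's exact architecture).
    """

    def __init__(self, num_classes, ndf=64):
        super().__init__()
        self.stem = nn.Conv2d(num_classes, ndf, 4, stride=2, padding=1)
        self.fg_branch = nn.Sequential(
            nn.LeakyReLU(0.2),
            nn.Conv2d(ndf, ndf * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(ndf * 2, 1, 4, stride=2, padding=1),
        )
        self.bg_branch = nn.Sequential(
            nn.LeakyReLU(0.2),
            nn.Conv2d(ndf, ndf * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(ndf * 2, 1, 4, stride=2, padding=1),
        )

    def forward(self, seg_softmax, fg_mask):
        # seg_softmax: (N, C, H, W) class probabilities from the segmenter
        # fg_mask:     (N, 1, H, W) soft foreground mask in [0, 1]
        feat = self.stem(seg_softmax)
        gate = F.interpolate(fg_mask, size=feat.shape[2:], mode="bilinear",
                             align_corners=False)
        fg_logits = self.fg_branch(feat * gate)          # foreground-gated stream
        bg_logits = self.bg_branch(feat * (1.0 - gate))  # background-gated stream
        return fg_logits, bg_logits


def adversarial_alignment_loss(disc, tgt_softmax, tgt_fg_mask):
    """Encourage target-domain predictions to look source-like in both streams."""
    fg_logits, bg_logits = disc(tgt_softmax, tgt_fg_mask)
    source_label = torch.ones_like(fg_logits)  # "looks like source" target for the generator
    return (F.binary_cross_entropy_with_logits(fg_logits, source_label) +
            F.binary_cross_entropy_with_logits(bg_logits, source_label))
```

In such an adversarial setup, the segmenter would minimize `adversarial_alignment_loss` on target images while the discriminator is trained separately to distinguish source from target predictions in both streams.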
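The abstract reports results as mean intersection over union (mIoU). For reference, below is a small NumPy sketch of that standard metric; the helper name and the `ignore_index=255` void-label convention (commonly used for Cityscapes-style labels) are assumptions rather than details taken from the thesis.

```python
import numpy as np


def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Per-class intersection over union averaged across classes (mIoU)."""
    valid = gt != ignore_index          # drop void pixels from the evaluation
    ious = []
    for c in range(num_classes):
        pred_c = (pred == c) & valid
        gt_c = (gt == c) & valid
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:
            continue                     # class absent from both prediction and ground truth
        inter = np.logical_and(pred_c, gt_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```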