
Author: 李鎮宇 (Zhen-Yu Li)
Thesis Title: 基於機率衰減可微分增強實現語義圖像合成
Differentiable Augmentation with Probabilistic Attenuation for Semantic Image Synthesis
Advisor: 陳郁堂 (Yie-Tarng Chen)
Committee Members: 林銘波 (Ming-Bo Lin), 黃琴雅 (Chin-Ya Huang), 方文賢 (Wen-Hsien Fang), 陳省隆 (Hsing-Lung Chen)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Academic Year of Graduation: 109
Language: English
Number of Pages: 42
Keywords: Generative Adversarial Network, Semantic Image Synthesis, Style Transfer, Data Efficiency



Despite fast progress in generative adversarial networks (GANs) [1], acquiring high-quality image synthesis remains difficult. The image quality of the generator cannot be improved further once training reaches a certain level, while the discriminator falls into overfitting and can no longer distinguish real images from fake ones well. The GAN architecture and the training process also play important roles in the quality of generated images. Inspired by previous research on semantic image synthesis [2], we propose a new approach for semantic image synthesis to address these setbacks. First, we take advantage of differentiable augmentation with probabilistic attenuation to increase the training labels efficiently and to avoid getting stuck at a local minimum during training. Since the architecture of the generator is a key issue in semantic image synthesis, we re-design the generator's architecture by combining CLADE normalization with dual attention networks. This combination uses fewer parameters than SPADE normalization while achieving high training efficiency and retaining the relationships between objects. For multi-modal translation, 3D noise is injected into the proposed networks to generate images with various styles. Simulations reveal that the proposed approach outperforms state-of-the-art approaches in terms of FID scores on the widely used Cityscapes and Mapillary datasets.
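The core idea of differentiable augmentation [9] is to apply the same (differentiable) transforms to both real and generated batches before the discriminator sees them, so the discriminator cannot overfit to un-augmented real data; "probabilistic attenuation" suggests the augmentation probability is reduced as training progresses. The abstract does not specify the schedule or the transforms, so the following is only a minimal NumPy sketch under assumed choices: a linear decay schedule (`attenuated_aug_prob`, a hypothetical name) and a random translation as the stand-in augmentation.

```python
import numpy as np

rng = np.random.default_rng(0)

def attenuated_aug_prob(step, total_steps, p0=0.5):
    """Hypothetical attenuation schedule: augmentation probability
    decays linearly from p0 at step 0 to 0 at the end of training."""
    return p0 * max(0.0, 1.0 - step / total_steps)

def diff_augment(images, p):
    """Sketch of an augmentation applied with probability p per image.
    Here a small random translation (np.roll) stands in for the
    differentiable ops used in practice."""
    out = images.copy()
    for i in range(len(out)):
        if rng.random() < p:
            dy, dx = rng.integers(-2, 3, size=2)
            out[i] = np.roll(out[i], (int(dy), int(dx)), axis=(0, 1))
    return out

# Both the real batch and the generated batch would pass through the
# same diff_augment call before the discriminator, with p recomputed
# from the current training step.
batch = rng.random((4, 8, 8))
p = attenuated_aug_prob(step=100, total_steps=1000, p0=0.5)
augmented = diff_augment(batch, p)
```

Attenuating the probability means the discriminator benefits from augmentation early, when overfitting is the main risk, while later training is dominated by unmodified images.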

Abstract
Acknowledgment
Table of Contents
List of Figures
List of Tables
1 Introduction
1.1 Motivations
1.2 Summary of Thesis
1.3 Contributions
1.4 Thesis Outline
2 Related Works
2.1 Conditional Generative Adversarial Network
2.2 Differentiable Augmentation
3 The Proposed Networks for Semantic Image Synthesis
3.1 Overview
3.2 Generator
3.2.1 3D Noise
3.2.2 CLADE Normalization
3.2.3 Dual Attention Modules
3.2.4 Objective Function
3.3 Discriminator
3.3.1 OASIS Discriminator
3.3.2 Differentiable Augmentation with Probabilistic Attenuation
3.4 Summary
4 Experimental Results
4.1 Datasets
4.2 Implementation Details
4.3 Evaluation Protocol
4.4 Main Experiments
4.5 Ablations
4.5.1 Ablation on the Dual Attention
4.5.2 Ablation on the Probabilistic Attenuation
4.6 Error Analysis
4.6.1 Failure Cases
4.6.2 Different Distribution between Other Datasets
4.7 Practical Application
4.7.1 Style Change
4.7.2 Simulator Convert to the Real World
5 Conclusions
References

[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio, "Generative adversarial networks," arXiv, vol. abs/1406.2661, 2014.
[2] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, "Semantic image synthesis with spatially-adaptive normalization," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2337–2346, 2019.
[3] H. Tang, S. Bai, and N. Sebe, "Dual attention GANs for semantic image synthesis," in Proceedings of the 28th ACM International Conference on Multimedia, pp. 1994–2002, 2020.
[4] E. Schönfeld, V. Sushko, D. Zhang, J. Gall, B. Schiele, and A. Khoreva, "You only need adversarial supervision for semantic image synthesis," in International Conference on Learning Representations, 2020.
[5] Z. Tan, D. Chen, Q. Chu, M. Chai, J. Liao, M. He, L. Yuan, G. Hua, and N. Yu, "Semantic image synthesis via efficient class-adaptive normalization," arXiv preprint arXiv:2012.04644, 2020.
[6] M. Mirza and S. Osindero, "Conditional generative adversarial nets," arXiv preprint arXiv:1411.1784, 2014.
[7] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134, 2017.
[8] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro, "High-resolution image synthesis and semantic manipulation with conditional GANs," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8798–8807, 2018.
[9] S. Zhao, Z. Liu, J. Lin, J.-Y. Zhu, and S. Han, "Differentiable augmentation for data-efficient GAN training," arXiv preprint arXiv:2006.10738, 2020.
[10] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, 2015.
[11] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, "GANs trained by a two time-scale update rule converge to a local Nash equilibrium," Advances in Neural Information Processing Systems, vol. 30, 2017.
[12] M. Seitzer, "pytorch-fid: FID Score for PyTorch." https://github.com/mseitzer/pytorch-fid, August 2020. Version 0.1.1.
[13] F. Yu, V. Koltun, and T. Funkhouser, "Dilated residual networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 472–480, 2017.
[14] A. Tao, K. Sapra, and B. Catanzaro, "Hierarchical multi-scale attention for semantic segmentation," arXiv preprint arXiv:2005.10821, 2020.
[15] A. Kirillov, Y. Wu, K. He, and R. Girshick, "PointRend: Image segmentation as rendering," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9799–9808, 2020.

Full text available from 2024/09/17 (campus network)
Full text available from 2026/09/17 (off-campus network)
Full text available from 2026/09/17 (National Central Library: Taiwan NDLTD system)