Author: Ming-Hsiao Lu (盧明孝)
Thesis Title: A Study of Conditional Generative Adversarial Networks on Image-to-image Translation (條件生成對抗網路應用於圖像轉換之研究)
Advisor: Yi-Leh Wu (吳怡樂)
Committee: Yi-Leh Wu (吳怡樂), Li-Kang Yen (閻立剛), Cheng-Yuan Tang (唐政元), Jiann-Jone Chen (陳建中)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Thesis Publication Year: 2020
Graduation Academic Year: 108
Language: English
Pages: 34
Keywords (in Chinese): 圖像轉換, 條件生成對抗網路
Keywords (in other languages): Image-to-image Translation, Conditional Generative Adversarial Networks
Reference times: Clicks: 252, Downloads: 17

近年來,卷積神經網絡的迅速發展,衍生了各種圖像生成的新穎方法,在電腦視覺領域中產生了很多應用,並為圖像轉換任務帶來了很大的進步。有許多現有的生成模型應用只能生成單調的圖像,但是在圖像轉換任務中的生成結果應該是要多變化的。我們更改了現有模型的架構,並且加入額外的損失函數,讓生成模型可以在圖像翻譯任務中做到一對多的映射。我們提出的方法可以產生許多不同的輸出結果,並且維持著良好的圖像品質。在實驗中,將生成模型應用在三種不同的圖像轉換任務上,比較了我們提出的方法與其他方法的生成結果。與原本的模型比較,除了有更高的圖像多樣性,也節省了大約22%訓練所耗費的時間。


In recent years, the rapid development of convolutional neural networks has led to a variety of novel image generation methods, which have enabled many applications in computer vision and brought great progress to image translation tasks. Many existing generative models can only generate monotonous images, whereas the results of an image translation task should be diverse. We change the architecture of an existing model and add an additional loss function so that the generative model can perform one-to-many mapping in image translation tasks. The proposed method produces a wide variety of translations while maintaining good image quality. In the experiments, we apply the generative model to three different image translation tasks and compare the results of the proposed method with those of other methods. Compared with the original model, the proposed method not only achieves higher image diversity but also reduces training time by about 22%.

Abstract (in Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
  2.1 Generative Adversarial Networks (GAN)
  2.2 Conditional generative adversarial networks (cGAN)
  2.3 Image-to-image translation
Chapter 3 GAN Model and Loss Function
  3.1 The Pix2pix Baseline Model
  3.2 The BicycleGan Baseline Model
    3.2.1 Conditional Variational Autoencoder GAN (cVAE-GAN)
    3.2.2 Conditional Latent Regressor GAN (cLR-GAN)
    3.2.3 BicycleGan
  3.3 Modified Model and Loss Function
Chapter 4 Experiments
  4.1 Datasets and Evaluation Metrics
    4.1.1 Datasets and Baseline
    4.1.2 Evaluation Metrics
  4.2 Qualitative Evaluation
  4.3 Quantitative Evaluation
  4.4 Analysis of the Loss function
Chapter 5 Conclusions and Future Work
References

