
Author: Ting-Jia Yang (楊庭嘉)
Thesis Title: Conditional High-Resolution Image Synthesizing Based on Generative Adversarial Networks (基於生成對抗網路之條件式高解析度圖像生成)
Advisor: Nai-Jian Wang (王乃堅)
Committee Members: Shun-Feng Su (蘇順豐), Shun-Ping Chung (鍾順平), Jing-Ming Guo (郭景明), Shao-Yun Fang (方劭云), Nai-Jian Wang (王乃堅)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2018
Graduation Academic Year: 106
Language: English
Pages: 49
Keywords: image generation, generative adversarial network, conditional image generation
Access count: 344 views; 1 download
Abstract (translated from Chinese):

This thesis implements a system that generates high-resolution face images according to user-specified conditions. We first implement a high-resolution image generator based on a progressively growing generative adversarial network and the Wasserstein distance, and train it on the CelebA dataset to generate high-resolution face images. To control the output of this model, so that it generates images matching the facial attribute descriptions supplied by the user, we add a second generative adversarial network that learns the conditional latent distribution of the image generator. We train this secondary GAN to synthesize, from an input attribute vector, a latent vector that is fed to the image generator to steer the output image. We validate this approach on the CelebA and MNIST datasets. Experimental results show that the method can convert an existing generative model into a conditional generative model while retaining sufficient sample diversity. Moreover, the method allows us to incorporate labels from multiple different datasets without retraining the image generator.
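
As a hedged illustration of the Wasserstein-distance training mentioned above, the following minimal PyTorch sketch computes a WGAN critic loss with gradient penalty, the standard WGAN-GP formulation of that objective. It is not the thesis code; the names `critic`, `generator`, and `gp_weight` are hypothetical.

    import torch

    def critic_loss(critic, generator, real, z, gp_weight=10.0):
        # Wasserstein estimate: the critic should score real images
        # higher than generated ones.
        fake = generator(z).detach()
        w_loss = critic(fake).mean() - critic(real).mean()
        # Gradient penalty on random interpolates between real and fake,
        # pushing the critic's gradient norm toward 1 (Lipschitz constraint).
        eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
        mix = (eps * real + (1 - eps) * fake).requires_grad_(True)
        grad = torch.autograd.grad(critic(mix).sum(), mix, create_graph=True)[0]
        gp = ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
        return w_loss + gp_weight * gp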


Abstract (English):

We present a conditional high-resolution image generation system that can synthesize face images conditioned on a wide range of facial attributes. First, we focus on the training and optimization of an image generation model that synthesizes high-resolution images based on the Wasserstein distance and the progressive growing method. Then, to exert control over the image generation model, we deviate from the current trend and instead explore the viability of augmenting the image generation model with a second generative adversarial network that learns its conditional latent-space distribution, in the hope of manipulating the attributes of its output images. Experiments were conducted on the CelebA and MNIST datasets. Results show that our method can convert an existing generative model into a conditional generative model that synthesizes images based on input classes or attributes while retaining reasonable diversity. Furthermore, this method allows one to incorporate labels from multiple datasets without retraining the generative model from scratch.
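
To make the secondary-GAN idea concrete, here is a minimal sketch, assuming a frozen pretrained image generator with a 512-dimensional latent space and 40 binary CelebA attributes; the names (`LatentGenerator`, `image_generator`, `attrs`) and layer sizes are hypothetical, and the adversarial training of this network against latent/attribute pairs is omitted.

    import torch
    import torch.nn as nn

    class LatentGenerator(nn.Module):
        """Maps (noise, attribute vector) to a latent vector for the
        frozen, pretrained image generator."""
        def __init__(self, noise_dim=128, attr_dim=40, latent_dim=512):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(noise_dim + attr_dim, 512), nn.ReLU(),
                nn.Linear(512, 512), nn.ReLU(),
                nn.Linear(512, latent_dim),
            )

        def forward(self, noise, attrs):
            # Condition on the attribute vector by simple concatenation.
            return self.net(torch.cat([noise, attrs], dim=1))

    # Usage sketch: only the latent generator is trained; the pretrained
    # high-resolution generator stays fixed.
    # latent_gen = LatentGenerator()
    # z = latent_gen(torch.randn(8, 128), attrs)  # attrs: (8, 40)
    # images = image_generator(z)                 # frozen generator

Because only the small latent generator is trained, new labels (even from a different dataset) can be supported by training a new latent generator, which is what lets the abstract claim conditioning without retraining the image generator from scratch.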

Table of Contents:

Abstract (Chinese) I
Abstract (English) II
Acknowledgements III
Table of Contents IV
List of Figures VII
List of Tables VIII
1 Introduction 1
  1.1 Background 1
  1.2 Literature Review 1
  1.3 Research Purpose 2
  1.4 Structure of Thesis 2
2 High Resolution Image Generation 4
  2.1 Artificial Neural Network 4
    2.1.1 Feed-forward Neural Network 4
    2.1.2 Training Data 5
    2.1.3 Stochastic Gradient Descent 5
  2.2 Convolutional Neural Network 7
  2.3 Generative Adversarial Network 9
    2.3.1 Image Synthesis 11
    2.3.2 Latent Vector 11
  2.4 Wasserstein GAN 11
  2.5 Progressive Growing 15
    2.5.1 Resolution Transitioning 15
    2.5.2 Samples Interpolation 16
    2.5.3 Pixel Normalization 17
  2.6 Implementation 19
    2.6.1 CelebA Dataset 19
    2.6.2 Latent Variable 19
    2.6.3 Model Architecture 20
    2.6.4 Initialization 22
    2.6.5 Hyperparameters 22
    2.6.6 Variable Moving Average 22
    2.6.7 Metrics 23
3 Conditional Latent Vector Generation 24
  3.1 Reference Classifier 24
    3.1.1 Training and Evaluation 25
  3.2 Secondary GAN 26
    3.2.1 Dropout 27
    3.2.2 Label Normalization 28
    3.2.3 Training 28
  3.3 Post-Training 31
4 Results 32
  4.1 MNIST Evaluation and Results 32
  4.2 CelebA-HQ Evaluation and Results 36
    4.2.1 Multi-Scale Structural Similarity 36
    4.2.2 Quality Assessment Using Critic Network 37
    4.2.3 Attribute Reconstruction Accuracy 37
  4.3 Generated Images 39
    4.3.1 Generated Images 40
    4.3.2 Latent Space Interpolations 43
    4.3.3 Conditionally Generated Images 44
    4.3.4 Attribute Interpolations 45
5 Discussions and Conclusions 46
References 47

