
Author: Ting-Jia Yang (楊庭嘉)
Thesis Title: Conditional High-Resolution Image Synthesizing Based on Generative Adversarial Networks (基於生成對抗網路之條件式高解析度圖像生成)
Advisor: Nai-Jian Wang (王乃堅)
Committee Members: Shun-Feng Su (蘇順豐), Shun-Ping Chung (鍾順平), Jing-Ming Guo (郭景明), Shao-Yun Fang (方劭云), Nai-Jian Wang (王乃堅)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2018
Graduation Academic Year: 106
Language: English
Pages: 49
Keywords: image generation, generative adversarial network, conditional image generation
Access count: 344 views; 1 download
Abstract (translated from Chinese):

This thesis implements a system that generates high-resolution face images according to user-specified conditions. We first implement a high-resolution image generator based on a progressively growing generative adversarial network and the Wasserstein distance, and train it on the CelebA dataset to generate high-resolution face images. To control the output of this model, so that it generates images matching the facial attribute descriptions supplied by the user, we add a second generative adversarial network that learns the conditional latent distribution of the image generator. We train this secondary GAN to synthesize, from an input attribute vector, a latent vector that is fed to the image generator to steer the output image. We validate this approach on the CelebA and MNIST datasets. Experimental results show that the method can convert an existing generative model into a conditional generative model while retaining sufficient sample diversity. Moreover, the method allows us to incorporate labels from multiple different datasets without retraining the image generator.
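
As a hedged illustration of the Wasserstein-distance training mentioned above, the following minimal PyTorch sketch computes a WGAN critic loss with gradient penalty, the standard WGAN-GP formulation of that objective. It is not the thesis code; the names `critic`, `generator`, and `gp_weight` are hypothetical.

    import torch

    def critic_loss(critic, generator, real, z, gp_weight=10.0):
        # Wasserstein estimate: the critic should score real images
        # higher than generated ones.
        fake = generator(z).detach()
        w_loss = critic(fake).mean() - critic(real).mean()
        # Gradient penalty on random interpolates between real and fake,
        # pushing the critic's gradient norm toward 1 (Lipschitz constraint).
        eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
        mix = (eps * real + (1 - eps) * fake).requires_grad_(True)
        grad = torch.autograd.grad(critic(mix).sum(), mix, create_graph=True)[0]
        gp = ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
        return w_loss + gp_weight * gp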


Abstract (English):

We present a conditional high-resolution image generation system that can synthesize face images conditioned on a wide range of facial attributes. First, we focus on the training and optimization of an image generation model that synthesizes high-resolution images based on the Wasserstein distance and the progressive growing method. Then, to exert control over the image generation model, we deviate from the current trend and instead explore the viability of augmenting the image generation model with a second generative adversarial network that learns its conditional latent-space distribution, in the hope of manipulating the attributes of its output images. Experiments were conducted on the CelebA and MNIST datasets. Results show that our method can convert an existing generative model into a conditional generative model that synthesizes images based on input classes or attributes while retaining reasonable diversity. Furthermore, this method allows one to incorporate labels from multiple datasets without retraining the generative model from scratch.
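
To make the secondary-GAN idea concrete, here is a minimal sketch, assuming a frozen pretrained image generator with a 512-dimensional latent space and 40 binary CelebA attributes; the names (`LatentGenerator`, `image_generator`, `attrs`) and layer sizes are hypothetical, and the adversarial training of this network against latent/attribute pairs is omitted.

    import torch
    import torch.nn as nn

    class LatentGenerator(nn.Module):
        """Maps (noise, attribute vector) to a latent vector for the
        frozen, pretrained image generator."""
        def __init__(self, noise_dim=128, attr_dim=40, latent_dim=512):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(noise_dim + attr_dim, 512), nn.ReLU(),
                nn.Linear(512, 512), nn.ReLU(),
                nn.Linear(512, latent_dim),
            )

        def forward(self, noise, attrs):
            # Condition on the attribute vector by simple concatenation.
            return self.net(torch.cat([noise, attrs], dim=1))

    # Usage sketch: only the latent generator is trained; the pretrained
    # high-resolution generator stays fixed.
    # latent_gen = LatentGenerator()
    # z = latent_gen(torch.randn(8, 128), attrs)  # attrs: (8, 40)
    # images = image_generator(z)                 # frozen generator

Because only the small latent generator is trained, new labels (even from a different dataset) can be supported by training a new latent generator, which is what lets the abstract claim conditioning without retraining the image generator from scratch.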

Table of Contents:

Abstract (Chinese) I
Abstract (English) II
Acknowledgements III
Table of Contents IV
List of Figures VII
List of Tables VIII
1 Introduction 1
  1.1 Background 1
  1.2 Literature Review 1
  1.3 Research Purpose 2
  1.4 Structure of Thesis 2
2 High Resolution Image Generation 4
  2.1 Artificial Neural Network 4
    2.1.1 Feed-forward Neural Network 4
    2.1.2 Training Data 5
    2.1.3 Stochastic Gradient Descent 5
  2.2 Convolutional Neural Network 7
  2.3 Generative Adversarial Network 9
    2.3.1 Image Synthesis 11
    2.3.2 Latent Vector 11
  2.4 Wasserstein GAN 11
  2.5 Progressive Growing 15
    2.5.1 Resolution Transitioning 15
    2.5.2 Samples Interpolation 16
    2.5.3 Pixel Normalization 17
  2.6 Implementation 19
    2.6.1 CelebA Dataset 19
    2.6.2 Latent Variable 19
    2.6.3 Model Architecture 20
    2.6.4 Initialization 22
    2.6.5 Hyperparameters 22
    2.6.6 Variable Moving Average 22
    2.6.7 Metrics 23
3 Conditional Latent Vector Generation 24
  3.1 Reference Classifier 24
    3.1.1 Training and Evaluation 25
  3.2 Secondary GAN 26
    3.2.1 Dropout 27
    3.2.2 Label Normalization 28
    3.2.3 Training 28
  3.3 Post-Training 31
4 Results 32
  4.1 MNIST Evaluation and Results 32
  4.2 CelebA-HQ Evaluation and Results 36
    4.2.1 Multi-Scale Structural Similarity 36
    4.2.2 Quality Assessment Using Critic Network 37
    4.2.3 Attribute Reconstruction Accuracy 37
  4.3 Generated Images 39
    4.3.1 Generated Images 40
    4.3.2 Latent Space Interpolations 43
    4.3.3 Conditionally Generated Images 44
    4.3.4 Attribute Interpolations 45
5 Discussions and Conclusions 46
References 47

