
Graduate Student: 黃竹萱 (Cheryl May Huang)
Thesis Title: 基於生成對抗網路之臉部卡通風格轉換 (Cartoon Style Transfer in Faces using Generative Adversarial Networks)
Advisor: 戴文凱 (Wen-Kai Tai)
Committee Members: 鮑興國 (Hsing-Kuo Pao), 鄭文皇 (Wen-Huang Cheng)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2021
Graduation Academic Year: 109
Language: English
Number of Pages: 92
Keywords (Chinese): 風格轉換、風格化、對抗生成網路、卡通濾鏡、圖像編輯、臉部風格轉換、自動編碼器
Keywords (English): style transfer, stylization, GAN, cartoon filter, image editing, facial style transfer, autoencoder
Access Count: Views: 589; Downloads: 18
Abstract in Chinese:
Generative Adversarial Networks (GANs) are models capable of generating high-quality images and have become a popular research topic. However, because images differ greatly across domains, style transfer remains a challenging problem. Based on Swapping Autoencoder for Deep Image Manipulation, this thesis proposes a cartoon-style filter that converts real face images into cartoon-style images. Whereas most other GAN works train the entire generator, we modify the architecture to train only the sixth through eighth (surface) layers of the generator, and with transfer learning and data preprocessing our model produces better transfer results faster than other recent work. In addition, we propose a stylized loss to determine how strongly an image is stylized during training. According to our user perceptual study, when the two candidate images are very dissimilar in structure and style, participants tend to choose the image with the more complete stylization rather than the one with the more similar structure, which is the result produced by our method. Finally, we present four applications showing that our model is practical for both image editing and transferring features between different faces.


Abstract in English:
Generative Adversarial Networks (GANs) have become a popular research topic as high-quality generative models, but style transfer across different image domains remains challenging due to the large divergence between image statistics. Based on Swapping Autoencoder for Deep Image Manipulation, we propose a basic cartoon filter that transfers a natural face image into a cartoon-style image that remains identity-recognizable. Compared to most other GAN models, which train the entire generator, we modify the architecture to train only the sixth to eighth layers of the generator. With transfer learning and data preprocessing, our model generates images faster and better than many recent models. We also propose a stylized loss to determine the stylization degree of an image. According to our human perceptual study, when two images are very dissimilar in structure and style, participants tend to choose the image with the better style, which is usually ours. Finally, we present four applications showing that our model is useful for both latent editing and transferring facial features between faces.
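The abstract's central architectural change, training only the sixth through eighth layers of the generator while keeping the rest fixed, corresponds to a standard parameter-freezing pattern for transfer learning. The PyTorch sketch below is illustrative only: it assumes a toy generator whose eight blocks are exposed through an indexed `nn.ModuleList` named `layers`; the thesis's actual Swapping Autoencoder generator may organize its modules differently.

```python
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    """Hypothetical stand-in for the generator: eight sequential conv blocks.

    The real Swapping Autoencoder generator is more elaborate; this toy
    version exists only to show the layer-freezing pattern.
    """
    def __init__(self, channels: int = 64):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(8)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x

generator = ToyGenerator()

# Transfer-learning setup: freeze every parameter first, then unfreeze only
# the sixth to eighth blocks (indices 5..7) so gradient updates touch just
# those surface layers.
for param in generator.parameters():
    param.requires_grad = False
for block in generator.layers[5:8]:
    for param in block.parameters():
        param.requires_grad = True

# Hand only the unfrozen parameters to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in generator.parameters() if p.requires_grad), lr=2e-3
)
```

Freezing the earlier layers keeps what the pretrained model already encodes while the unfrozen surface layers adapt to the cartoon domain; this is one common rationale for partial fine-tuning and is consistent with the faster training the abstract reports.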

Table of Contents:
Abstract in Chinese
Abstract in English
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
  2.1 Image Style Transfer
  2.2 Generative Adversarial Networks (GANs)
  2.3 Conditional Image Generation
    2.3.1 cGANs
    2.3.2 pix2pix
    2.3.3 CycleGAN, DiscoGAN, and DualGAN
  2.4 Style Based Generators for GANs
    2.4.1 StyleGAN and StyleGAN2
    2.4.2 Image2StyleGAN++, InterfaceGAN, and StyleFlow
  2.5 Facial Style Transfer
    2.5.1 FaceBlit
    2.5.2 Deep Image Analogy
    2.5.3 Swapping Autoencoder
3 Method
  3.1 Objective Setup
  3.2 Feature Extraction
  3.3 Loss Functions
  3.4 Input Code Study with Model Fine-tuning
  3.5 Overall Training and Architecture
4 Experimental Results
  4.1 Comparisons
    4.1.1 Training and Testing Results
    4.1.2 Human Perceptual Study
    4.1.3 Quantitative Evaluation
    4.1.4 Data Preprocessing and Runtime
  4.2 Applications
    4.2.1 Texture and Structure Interpolation
    4.2.2 Mean Style Transfer
    4.2.3 Facial Attribute Study
    4.2.4 Further Transformation
5 Conclusions and Future Work
References
Appendix A Datasets
Appendix B Human Perceptual Study

    [1] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4401–4410, 2019.
    [2] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and improving the image quality of stylegan,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8110–8119, 2020.
    [3] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
    [4] L. A. Gatys, A. S. Ecker, and M. Bethge, “Image style transfer using convolutional neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2414–2423, 2016.
    [5] TikTok Inc., “Tiktok.” https://www.tiktok.com/zh-Hant-TW/. Accessed on 31.7.2021.
    [6] Snap Inc., “Snapchat.” https://www.snapchat.com/. Accessed on 31.7.2021.
    [7] T. Park, J.-Y. Zhu, O. Wang, J. Lu, E. Shechtman, A. A. Efros, and R. Zhang, “Swapping autoencoder for deep image manipulation,” arXiv preprint arXiv:2007.00653, 2020.
    [8] X. Wang and J. Yu, “Learning to cartoonize using white-box cartoon representations,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8090–8099, 2020.
    [9] X. Huang and S. Belongie, “Arbitrary style transfer in real-time with adaptive instance normalization,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 1501–1510, 2017.
    [10] Y. Li, C. Fang, J. Yang, Z. Wang, X. Lu, and M.-H. Yang, “Universal style transfer via feature transforms,” in Advances in Neural Information Processing Systems, pp. 386–396, 2017.
    [11] T. Q. Chen and M. Schmidt, “Fast patch-based style transfer of arbitrary style,” arXiv preprint arXiv:1612.04337, 2016.
    [12] J. Gui, Z. Sun, Y. Wen, D. Tao, and J. Ye, “A review on generative adversarial networks: Algorithms, theory, and applications,” arXiv preprint arXiv:2001.06937, 2020.
    [13] S. Mukherjee, H. Asnani, E. Lin, and S. Kannan, “Clustergan: Latent space clustering in generative adversarial networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 4610–4617, 2019.
    [14] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.
    [15] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley, “Least squares generative adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 2794–2802, 2017.
    [16] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein gan,” arXiv preprint arXiv:1701.07875, 2017.
    [17] M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, 2014.
    [18] H. Tang, W. Wang, S. Wu, X. Chen, D. Xu, N. Sebe, and Y. Yan, “Expression conditional gan for facial expression-to-expression translation,” in Proceedings of the IEEE International Conference on Image Processing, pp. 4449–4453, 2019.
    [19] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134, 2017.
    [20] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232, 2017.
    [21] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim, “Learning to discover cross-domain relations with generative adversarial networks,” arXiv preprint arXiv:1703.05192, 2017.
    [22] Z. Yi, H. Zhang, P. Tan, and M. Gong, “Dualgan: Unsupervised dual learning for image-to-image translation,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 2849–2857, 2017.
    [23] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of gans for improved quality, stability, and variation,” arXiv preprint arXiv:1710.10196, 2017.
    [24] E. Härkönen, A. Hertzmann, J. Lehtinen, and S. Paris, “Ganspace: Discovering interpretable gan controls,” arXiv preprint arXiv:2004.02546, 2020.
    [25] J. Zhu, Y. Shen, D. Zhao, and B. Zhou, “In-domain gan inversion for real image editing,” in Proceedings of the European Conference on Computer Vision, pp. 592–608, 2020.
    [26] E. Richardson, Y. Alaluf, O. Patashnik, Y. Nitzan, Y. Azar, S. Shapiro, and D. Cohen-Or, “Encoding in style: a stylegan encoder for image-to-image translation,” arXiv preprint arXiv:2008.00951, 2020.
    [27] Y. Viazovetskyi, V. Ivashkin, and E. Kashin, “Stylegan2 distillation for feed-forward image manipulation,” in Proceedings of the European Conference on Computer Vision, pp. 170–186, 2020.
    [28] R. Abdal, Y. Qin, and P. Wonka, “Image2stylegan++: How to edit the embedded images?,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8296–8305, 2020.
    [29] R. Abdal, Y. Qin, and P. Wonka, “Image2stylegan: How to embed images into the stylegan latent space?,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 4432–4441, 2019.
    [30] Y. Shen, C. Yang, X. Tang, and B. Zhou, “Interfacegan: Interpreting the disentangled face representation learned by gans,” arXiv preprint arXiv:2005.09635, 2020.
    [31] R. Abdal, P. Zhu, N. J. Mitra, and P. Wonka, “Styleflow: Attribute-conditioned exploration of stylegan generated images using conditional continuous normalizing flows,” ACM Transactions on Graphics, vol. 40, no. 3, pp. 1–21, 2021.
    [32] A. Texler, O. Texler, M. Kučera, M. Chai, and D. Sỳkora, “Faceblit: Instant real-time example-based style transfer to facial videos,” in Proceedings of the ACM on Computer Graphics and Interactive Techniques, vol. 4, pp. 1–17, 2021.
    [33] J. Fišer, O. Jamriška, D. Simons, E. Shechtman, J. Lu, P. Asente, M. Lukáč, and D. Sỳkora, “Example based synthesis of stylized facial animations,” ACM Transactions on Graphics, vol. 36, no. 4, pp. 1–11, 2017.
    [34] D. Sỳkora, O. Jamriška, O. Texler, J. Fišer, M. Lukáč, J. Lu, and E. Shechtman, “Styleblit: Fast example-based stylization with local guidance,” Computer Graphics Forum, vol. 38, pp. 83–91, 2019.
    [35] Y. Shih, S. Paris, C. Barnes, W. T. Freeman, and F. Durand, “Style transfer for headshot portraits,” ACM Transactions on Graphics, vol. 33, no. 4, 2014.
    [36] J. Liao, Y. Yao, L. Yuan, G. Hua, and S. B. Kang, “Visual attribute transfer through deep image analogy,” arXiv preprint arXiv:1705.01088, 2017.
    [37] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
    [38] C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman, “Patchmatch: A randomized correspondence algorithm for structural image editing,” ACM Transactions on Graphics, vol. 28, no. 3, p. 24, 2009.
    [39] J. Kim, M. Kim, H. Kang, and K. Lee, “U-gat-it: unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation,” arXiv preprint arXiv:1907.10830, 2019.
    [40] bryandlee, “naver-webtoon-faces.” https://github.com/bryandlee/naver-webtoon-faces. Accessed on 31.7.2021.
    [41] J. N. Pinkney and D. Adler, “Resolution dependent gan interpolation for controllable image synthesis between domains,” arXiv preprint arXiv:2010.05334, 2020.
    [42] S. Mo, M. Cho, and J. Shin, “Freeze the discriminator: a simple baseline for fine-tuning gans,” arXiv preprint arXiv:2002.10964, 2020.
    [43] T. R. Shaham, T. Dekel, and T. Michaeli, “Singan: Learning a generative model from a single natural image,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4570–4580, 2019.
    [44] K. Ding, K. Ma, S. Wang, and E. P. Simoncelli, “Image quality assessment: Unifying structure and texture similarity,” arXiv preprint arXiv:2004.07728, 2020.
    [45] N. Kolkin, J. Salavon, and G. Shakhnarovich, “Style transfer by relaxed optimal transport and self-similarity,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10051–10060, 2019.
    [46] M. Ruder, A. Dosovitskiy, and T. Brox, “Artistic style transfer for videos,” in Proceedings of the German Conference on Pattern Recognition, pp. 26–36, Springer, 2016.
    [47] L. A. Gatys, M. Bethge, A. Hertzmann, and E. Shechtman, “Preserving color in neural artistic style transfer,” arXiv preprint arXiv:1606.05897, 2016.
    [48] J. Yoo, Y. Uh, S. Chun, B. Kang, and J.-W. Ha, “Photorealistic style transfer via wavelet transforms,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 9036–9045, 2019.
    [49] Adobe Inc., “Adobe Character Animator.” https://www.adobe.com/products/character-animator.html. Accessed on 31.7.2021.
    [50] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,” in Advances in Neural Information Processing Systems, pp. 6626–6637, 2017.
    [51] J. Yaniv, Y. Newman, and A. Shamir, “The face of art: landmark detection and geometric style in portraits,” ACM Transactions on Graphics, vol. 38, no. 4, pp. 1–15, 2019.
    [52] J. Pinkney, “Toonify!.” https://toonify.photos/. Accessed on 31.7.2021.
