
Graduate Student: 邱品峰 (Pin-Feng Qiu)
Thesis Title: 基於生成對抗網路之動漫頭像編輯工具 (An Editing Tool for Anime Portrait based on Generative Adversarial Networks)
Advisor: 戴文凱 (Wen-Kai Tai)
Oral Examination Committee: 鮑興國 (Hsing-Kuo Pao), 章耀勳 (Yao-Xun Chang)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 111 (2022-2023)
Language: English
Number of Pages: 68
Keywords (Chinese): 動漫風格, 對抗生成網路, 圖像編輯, GAN inversion
Keywords (English): anime style, generative adversarial network, image editing, GAN inversion
In recent years, ACG (Anime, Manga, and Games) culture has been highly popular, attracting a large audience with its distinctive animation style and cute animated characters. Creating refined animation products such as anime portraits and anime characters is therefore regarded as one of the most important parts of ACG culture. However, creating a good anime portrait requires considerable experience and creativity; even professionals must spend time and effort to complete this work. We therefore want to develop a tool that addresses this problem.

Based on the EditGAN method, this thesis presents an editing tool that can generate and edit anime portraits by manipulating semantic segmentation maps. We also propose a data filtering pipeline to ensure the quality of the anime portraits in our training dataset. Because semantic segmentation maps for anime are hard to obtain and labeling them ourselves is very time-consuming, we use the concept of DatasetGAN to produce a large semantic dataset. In addition, we propose an Anime loss to improve the performance of GAN inversion in the anime domain.

This thesis conducts a human perceptual study in which participants judge the quality of the edited anime portraits and compare them with those of the original EditGAN. The results show that our editing results are good and generally acceptable, and that they outperform the original EditGAN.


In recent years, ACG (Anime, Manga, and Games) culture has become popular. It attracts a large audience with its distinctive animation style and cute animated characters. Creating appealing animation products such as anime portraits and characters is considered one of the most important parts of ACG culture. However, creating good anime portraits requires a great deal of experience and creativity, and even professionals may spend considerable time and effort on this work. We therefore aim to develop a tool that addresses this problem.

In this thesis, we propose an editing tool, based on the EditGAN method, that generates and edits anime portraits by manipulating semantic segmentation maps. We also propose a dataset filtering pipeline to ensure the quality of the anime portraits in our training dataset. Because semantic segmentation maps for anime are difficult to obtain and time-consuming to label manually, we use the concept of DatasetGAN to generate a large amount of semantic data. In addition, we propose an anime loss to improve the performance of GAN inversion in the anime domain.
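The abstract does not spell out how the anime loss enters the inversion procedure, so the following is only a minimal sketch of optimization-based GAN inversion with an extra anime-oriented term. The generator interface G.synthesis, the anime_feat feature extractor, the initial latent w_init, and the loss weights are assumed names for illustration, not the thesis's actual implementation.

    import torch
    import torch.nn.functional as F

    def invert_anime_portrait(G, target, anime_feat, w_init,
                              steps=500, lr=0.05, lambda_anime=0.1):
        # Optimize a latent code so the generator reproduces `target`
        # (a 1x3xHxW tensor in [-1, 1]); `w_init` is the starting latent,
        # e.g. the generator's average latent broadcast to W+.
        w = w_init.clone().requires_grad_(True)
        opt = torch.optim.Adam([w], lr=lr)
        for _ in range(steps):
            img = G.synthesis(w)                    # image generated from the current latent
            loss = F.mse_loss(img, target)          # pixel-level reconstruction term
            # Assumed "anime loss": match features from an anime-domain network so the
            # reconstruction stays close to the anime manifold rather than drifting
            # toward photorealistic faces.
            loss = loss + lambda_anime * F.l1_loss(anime_feat(img), anime_feat(target))
            opt.zero_grad()
            loss.backward()
            opt.step()
        return w.detach()                           # latent later reused for semantic editing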

Finally, we conduct a human perceptual study to evaluate the quality of the edited anime portraits and to compare them with those generated by the original EditGAN. The results show that our edited anime portraits are of good quality and generally acceptable, and that they outperform those generated by the original EditGAN.
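For illustration only, one common way to summarize such a pairwise study is to record, per trial, whether a rater preferred our result or the original EditGAN result and to test the preference rate against chance; the data layout and the binomial test below are assumptions, not the protocol actually used in the thesis.

    from scipy.stats import binomtest

    def preference_rate(choices):
        # `choices` holds one entry per trial: 1 if our result was preferred,
        # 0 if the original EditGAN result was preferred.
        n, wins = len(choices), sum(choices)
        test = binomtest(wins, n, p=0.5)            # null hypothesis: no preference either way
        return wins / n, test.pvalue

    rate, p = preference_rate([1, 1, 0, 1, 1, 1, 0, 1])
    print(f"ours preferred in {rate:.0%} of trials (p = {p:.3f})")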

Abstract in Chinese
Abstract in English
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
  2.1 Generative Adversarial Network (GAN)
  2.2 Style-Based Generator (StyleGAN, StyleGAN2)
  2.3 GAN Inversion
  2.4 DragGAN
  2.5 EditGAN
3 Method
  3.1 Anime portrait dataset
  3.2 Dataset filtering pipeline
  3.3 GAN inversion of anime portraits
    3.3.1 Anime loss
    3.3.2 Embedding anime portraits into latent space
  3.4 Semantic anime dataset
  3.5 Anime face-parsing model
  3.6 Creating branch
  3.7 Editing branch
4 Experimental results
  4.1 Effectiveness of dataset filtering pipeline
  4.2 Performance of anime face-parsing model
  4.3 Evaluation of GAN inversion
  4.4 Evaluation of creating and editing branch
    4.4.1 Human perceptual study
5 Conclusions and Future Work
  5.1 Future Work
References
Appendix A Human perceptual study

[1] A. Jabbar, X. Li, and B. Omar, “A survey on generative adversarial networks: Variants, applications, and training,” ACM Computing Surveys, vol. 54, no. 8, pp. 1–49, 2021.
[2] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410, 2019.
[3] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and improving the image quality of StyleGAN,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8110–8119, 2020.
[4] W. Xia, Y. Zhang, Y. Yang, J. Xue, B. Zhou, and M. Yang, “GAN inversion: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, pp. 3121–3138, Mar. 2023.
[5] X. Pan, A. Tewari, T. Leimkühler, L. Liu, A. Meka, and C. Theobalt, “Drag Your GAN: Interactive point-based manipulation on the generative image manifold,” in ACM SIGGRAPH 2023 Conference Proceedings, 2023.
[6] H. Ling, K. Kreis, D. Li, S. W. Kim, A. Torralba, and S. Fidler, “EditGAN: High-precision semantic image editing,” in Advances in Neural Information Processing Systems, vol. 34, pp. 16331–16345, 2021.
[7] Y. Zhang, H. Ling, J. Gao, K. Yin, J.-F. Lafleche, A. Barriuso, A. Torralba, and S. Fidler, “DatasetGAN: Efficient labeled data factory with minimal human effort,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10145–10155, 2021.
[8] Linaqruf, “anything-v3.0.” https://huggingface.co/Linaqruf/anything-v3.0. [Online; accessed 02-May-2023].
[9] Daxueconsulting contributors, “The colorful marketing potential concealed in China's anime, comics, and games (ACG) market.” https://daxueconsulting.com/chinas-acg-market/, 2021. [Online; accessed 18-December-2022].
[10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
[11] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, “Semantic image synthesis with spatially-adaptive normalization,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2337–2346, 2019.
[12] E. Richardson, Y. Alaluf, O. Patashnik, Y. Nitzan, Y. Azar, S. Shapiro, and D. Cohen-Or, “Encoding in style: A StyleGAN encoder for image-to-image translation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2287–2296, 2021.
[13] R. Abdal, Y. Qin, and P. Wonka, “Image2StyleGAN: How to embed images into the StyleGAN latent space?,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4432–4441, 2019.
[14] A. Creswell and A. A. Bharath, “Inverting the generator of a generative adversarial network,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 7, pp. 1967–1974, 2018.
[15] O. Tov, Y. Alaluf, Y. Nitzan, O. Patashnik, and D. Cohen-Or, “Designing an encoder for StyleGAN image manipulation,” ACM Transactions on Graphics, vol. 40, no. 4, pp. 1–14, 2021.
[16] S. Guan, Y. Tai, B. Ni, F. Zhu, F. Huang, and X. Yang, “Collaborative learning for faster StyleGAN embedding,” arXiv preprint arXiv:2007.01758, 2020.
[17] S. Pidhorskyi, D. A. Adjeroh, and G. Doretto, “Adversarial latent autoencoders,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14104–14113, 2020.
[18] D. Bau, J.-Y. Zhu, J. Wulff, W. Peebles, H. Strobelt, B. Zhou, and A. Torralba, “Inverting layers of a large generator,” in ICLR Workshop, vol. 2, p. 4, 2019.
[19] J. Zhu, Y. Shen, D. Zhao, and B. Zhou, “In-domain GAN inversion for real image editing,” in Proceedings of the European Conference on Computer Vision, pp. 592–608, Springer, 2020.
[20] Anonymous, the Danbooru community, and G. Branwen, “Danbooru2021: A large-scale crowdsourced and tagged anime illustration dataset.” https://gwern.net/danbooru2021, January 2022.
[21] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695, 2022.
[22] SkyTNT, “anime-segmentation.” https://github.com/SkyTNT/anime-segmentation/. [Online; accessed 17-March-2023].
[23] Nagadomi, “lbpcascade_animeface.” https://github.com/nagadomi/lbpcascade_animeface. [Online; accessed 17-March-2023].
[24] Matthew Baas, “Danbooru2018 pretrained ResNet models for PyTorch.” https://rf5.github.io/2019/07/08/danbuuro-pretrained.html. [Online; accessed 17-March-2023].
[25] lltcggie, “waifu2x-caffe.” https://github.com/lltcggie/waifu2x-caffe. [Online; accessed 17-March-2023].
[26] Y. Alaluf, O. Patashnik, and D. Cohen-Or, “ReStyle: A residual-based StyleGAN encoder via iterative refinement,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6711–6720, 2021.
[27] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 586–595, 2018.
[28] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, “ArcFace: Additive angular margin loss for deep face recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4685–4694, 2019.
[29] T. Wei, D. Chen, W. Zhou, J. Liao, W. Zhang, L. Yuan, G. Hua, and N. Yu, “E2Style: Improve the efficiency and effectiveness of StyleGAN inversion,” IEEE Transactions on Image Processing, vol. 31, pp. 3267–3280, 2022.
[30] K. Wada, “labelme: Image polygonal annotation with Python.” https://github.com/wkentaro/labelme, 2018.
[31] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pp. 234–241, Springer, 2015.
[32] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[33] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “GANs trained by a two time-scale update rule converge to a local Nash equilibrium,” in Advances in Neural Information Processing Systems, pp. 6629–6640, 2017.
