Graduate student: Li-Bang Huang (黃立邦)
Thesis title: Facial Age Transformation between Childhood and Seniorhood (由童年至老年之大區間人臉年齡轉換)
Advisors: Sheng-Luen Chung (鍾聖倫), Gee-Sern Jison Hsu (徐繼聖)
Committee members: Yi-Ping Hung (洪一平), Yung-Yu Chuang (莊永裕), Wen-Huang Cheng (鄭文皇), Gee-Sern Jison Hsu (徐繼聖), Sheng-Luen Chung (鍾聖倫)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2021
Graduation academic year: 109 (ROC calendar)
Language: Chinese
Number of pages: 72
Keywords (Chinese): 年齡轉換, StarGAN v2, Face++, ArcFace
Keywords (English): age transformation, StarGAN v2, Face++, ArcFace
Abstract (translated from the Chinese): Full-lifespan facial age transformation addresses the problem of generating, from a single facial image, the same person's appearance at both younger and older ages. The transformation must reflect how head shape and facial texture change across different stages of life. The task is difficult for two reasons: it is infeasible to collect, at scale, images of the same individuals spanning their entire lives to serve as an effective training set; and, lacking a large number of real full-lifespan samples, objective and reliable performance evaluation is hard to conduct. Accordingly, for the task of face generation across the full age range, this thesis builds on StarGAN v2, treating age as the attribute of the target domain, and makes the following adaptations: (1) a FAN subnetwork that focuses on the contours of facial features is added to the generator via skip connections, with a weighting scheme that adjusts the proportion of identity-bearing features fed into the generator's decoder; (2) a Light CNN subnetwork is added during training to compare the identity of the source image against that of the generated image, so that the new network can effectively disentangle the age-, identity-, and other style-related features of a facial image; (3) finally, the source image is also used as the style image, which excludes non-age style features from entering the generation pipeline, preserves the identity features of the source image, and yields better disentanglement of identity and age. To compare against LATS, the current state of the art in facial age transformation, we likewise use the FFHQ-aging dataset, and employ Face++ and ArcFace to perform large-scale age estimation and identity recognition on the age-transformed images. Experimental results confirm that the proposed network architecture has stronger feature-disentanglement ability and outperforms LATS in generated-image quality, intended age-transformation effect, and identity preservation.
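The heatmap-weighted skip connection described above can be illustrated with a minimal sketch. The toy 2-D grids, the fusion rule (decoder feature plus heatmap-gated encoder feature), and the weight `w` are illustrative assumptions, not the thesis's exact implementation:

```python
def fuse_skip(decoder_feat, encoder_feat, heatmap, w=0.5):
    """Weighted skip connection: mix encoder features into decoder
    features, gated elementwise by a facial-landmark heatmap
    (FAN-style). All inputs are equal-sized 2-D grids of floats."""
    return [
        [d + w * h * e
         for d, e, h in zip(drow, erow, hrow)]
        for drow, erow, hrow in zip(decoder_feat, encoder_feat, heatmap)
    ]

# Toy 2x2 feature maps: the heatmap passes identity-bearing encoder
# features through only where facial contours are detected.
dec = [[1.0, 1.0], [1.0, 1.0]]
enc = [[2.0, 4.0], [6.0, 8.0]]
hm  = [[1.0, 0.0], [0.5, 0.0]]
print(fuse_skip(dec, enc, hm, w=0.5))  # [[2.0, 1.0], [2.5, 1.0]]
```

Raising `w` feeds more identity detail from the encoder into the decoder; lowering it gives the age-conditioned decoder features more influence.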
Abstract (English): This study aims to generate a facial image reflecting how a person might look in the future, or looked in the past, based on a single photo. The transformation must take into account the changes in head shape and facial texture that appear at different ages. The task is challenging in that it is impossible to collect, on a large scale, a training set containing identity-specific facial images spanning an entire lifetime. Furthermore, due to the lack of sufficient facial images covering a whole lifespan, it is difficult to conduct systematic performance evaluation. In this regard, taking advantage of the excellent performance of StarGAN v2 in style transfer, where the generator can be supervised by the discriminator to produce the desired target style, this study treats age transformation as a transformation of facial aging patterns. In our adapted StarGAN v2-based solution, the age is specified by the target domain. Three adaptations have been made: (1) an adjustable weight for the adaptive-wing-based heatmap, (2) an identity loss computed by Light CNN during the training phase, and (3) exclusion of non-age-related features by using the source image itself as the reference image. To compare with the state-of-the-art LATS, we use the FFHQ-aging dataset for training and inference, and use Face++ and ArcFace to perform large-scale age estimation and identity recognition on the age-transformed images. Experimental results confirm the proposed solution's superiority over LATS in terms of image quality, intended age-rendition effect, and identity preservation.
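The identity loss mentioned in adaptation (2) can be sketched as follows. A common form of such a loss, assumed here for illustration, is one minus the cosine similarity between face embeddings of the source and generated images; the embedding vectors stand in for the Light CNN features, which are not reproduced here:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def identity_loss(emb_source, emb_generated):
    """Identity-preservation loss: 0 when the generated face carries
    the same identity embedding as the source, approaching its
    maximum as the embeddings diverge."""
    return 1.0 - cosine_similarity(emb_source, emb_generated)

# Identical embeddings give zero loss; orthogonal ones give a loss of 1.
print(identity_loss([1.0, 0.0], [1.0, 0.0]))  # 0.0
print(identity_loss([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

During training, minimizing this term penalizes the generator whenever age transformation drifts the generated face away from the source identity.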
[1] I. J. Goodfellow et al., "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672-2680.
[2] X. Huang and S. Belongie, "Arbitrary style transfer in real-time with adaptive instance normalization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1510-1519.
[3] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2242-2251.
[4] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo, "StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8789-8797.
[5] Y. Choi, Y. Uh, J. Yoo, and J.-W. Ha, "StarGAN v2: Diverse image synthesis for multiple domains," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 8188-8197.
[6] A. Bulat and G. Tzimiropoulos, "How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3D facial landmarks)," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1021-1030.
[7] X. Wang, L. Bo, and L. Fuxin, "Adaptive wing loss for robust face alignment via heatmap regression," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 6970-6980.
[8] R. Or-El, S. Sengupta, O. Fried, E. Shechtman, and I. Kemelmacher-Shlizerman, "Lifespan age transformation synthesis," in Proceedings of the European Conference on Computer Vision, 2020.
[9] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, "Analyzing and improving the image quality of StyleGAN," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 8110-8119.
[10] X. Huang, M.-Y. Liu, S. Belongie, and J. Kautz, "Multimodal unsupervised image-to-image translation," in Proceedings of the European Conference on Computer Vision, 2018, pp. 179-196.
[11] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song, "SphereFace: Deep hypersphere embedding for face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6738-6746.
[12] X. Wu, R. He, Z. Sun, and T. Tan, "A light CNN for deep face representation with noisy labels," IEEE Transactions on Information Forensics and Security, vol. 13, no. 11, pp. 2884-2896, 2018.
[13] B.-C. Chen, C.-S. Chen, and W. H. Hsu, "Face recognition and retrieval using cross-age reference coding with cross-age celebrity dataset," IEEE Transactions on Multimedia, vol. 17, no. 6, pp. 804-815, 2015.
[14] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of GANs for improved quality, stability, and variation," in International Conference on Learning Representations, 2018.
[15] T. Karras, S. Laine, and T. Aila, "A style-based generator architecture for generative adversarial networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4396-4405.
[16] C.-H. Lee, Z. Liu, L. Wu, and P. Luo, "MaskGAN: Towards diverse and interactive facial image manipulation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 5548-5557.
[17] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, "ArcFace: Additive angular margin loss for deep face recognition," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4685-4694.
[18] Megvii Inc., Face++ API. Available: http://www.faceplusplus.com
[19] Z. Wang, X. Tang, W. Luo, and S. Gao, "Face aging with identity-preserved conditional generative adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7939-7947.
[20] H. Yang, D. Huang, Y. Wang, and A. K. Jain, "Learning face age progression: A pyramid architecture of GANs," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 31-39.