
Graduate Student: 徐逸安 (Yi-An Hsu)
Thesis Title: A Facial Expression Transformation System Based on Face Recognition Networks and Facial Action Unit Scaling Techniques (一個基於臉部識別網路和面部動作單元縮放技術的臉部表情轉換系統)
Advisor: 范欽雄 (Chin-Shyurng Fahn)
Committee Members: 繆紹綱 (Shaou-Gang Miaou), 王榮華 (Jung-Hua Wang), 馮輝文 (Huei-Wen Ferng), 范欽雄 (Chin-Shyurng Fahn)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 112 (ROC calendar)
Language: English
Number of Pages: 42
Chinese Keywords: 生成對抗式網路、人臉表情變換、人臉表情影像生成、臉部操縱、影像轉譯
English Keywords: Generative Adversarial Network, facial expression transformation, facial expression image generation, facial manipulation, image translation

With the rapid development of deep learning, an increasing number of tasks are being tackled with deep learning methods. In everyday communication between people, changes in facial expression are regarded as one of the important channels for conveying emotion. With the advances in Generative Adversarial Networks (GANs), individuals and companies have in recent years begun to use virtual identities as spokespersons for live streaming and public relations. The goal of this thesis is to transform the expression of the face in a target image into the expression shown in a source image, while keeping the identity of the face in the target image unchanged. Our control condition is based on the Facial Action Coding System (FACS), a system that analyzes and encodes human facial muscle movements on an anatomical basis.
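The thesis title refers to facial action unit scaling. As a rough illustration of how an expression can be specified as a vector of AU intensities and one unit varied at a time, here is a minimal sketch assuming AU intensities in the OpenFace 2.0 convention (17 intensity-coded AUs, values roughly in [0, 5]); the helper name and example values are illustrative, not taken from the thesis.

```python
import numpy as np

# OpenFace 2.0 reports intensities for these 17 action units (AU45 = blink).
AU_IDS = [1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 20, 23, 25, 26, 45]

def scale_single_au(aus: np.ndarray, au_id: int, value: float) -> np.ndarray:
    """Return a copy of the AU vector with one unit set to a new intensity,
    keeping all other AUs unchanged."""
    edited = aus.copy()
    edited[AU_IDS.index(au_id)] = np.clip(value, 0.0, 5.0)  # intensities stay in range
    return edited

# Example: start from a neutral-face AU vector and raise only AU12 (lip corner puller).
neutral = np.zeros(len(AU_IDS), dtype=np.float32)
smile_condition = scale_single_au(neutral, au_id=12, value=3.0)
```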
In the related work, we introduce the Inception network, the face recognition network used in our subsequent work, as well as existing facial expression transformation methods. We classify existing face transformation methods into three categories according to their control conditions and point out the shortcomings associated with each. Observing the existing methods that use action units (AUs) as the control condition, we found that they all focus on enabling the network to transform facial expressions more effectively while neglecting the preservation of facial identity information. We regard these as two sides of the same coin: the more exaggerated the expression change, the more identity information is lost. We therefore explore the feasibility of facial expression transformation under an explicit constraint on facial identity. In the generator's processing flow, a face recognition network constrains the encoder inside the generator, and the generated image is fed back through the encoder so that its features are constrained against those of the original target image. In the detailed architecture, skip connections link the encoder and the generator to improve image quality, and experiments ultimately demonstrate the feasibility of our model.
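As a rough sketch of this identity-constraint idea (penalizing the distance between face-recognition embeddings of the generated image and the original target image), the snippet below uses the pretrained facenet-pytorch model listed in the references; the exact loss terms, weights, and where they attach in the generator follow the thesis, not this simplification.

```python
import torch
import torch.nn.functional as F
from facenet_pytorch import InceptionResnetV1

# Pretrained face recognition network, kept frozen; it expects 160x160 RGB crops
# normalized to roughly [-1, 1] and returns 512-dimensional identity embeddings.
face_net = InceptionResnetV1(pretrained='vggface2').eval()
for p in face_net.parameters():
    p.requires_grad = False

def identity_constraint_loss(generated: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Penalize identity drift: distance between face embeddings of the
    generated image and the original target image."""
    emb_gen = face_net(generated)  # (N, 512)
    emb_tgt = face_net(target)     # (N, 512)
    return F.l1_loss(emb_gen, emb_tgt)
```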
In our experiments, we use KDEF as the dataset. On five evaluation metrics (FID, ACD, ED, SSIM, and PSNR), our model performs better than the others compared. For visual comparison, we use two settings: the first takes a source image as the control condition for facial expression transformation, and the second changes the value of a single AU while keeping the other AUs unchanged. Compared with GANimation, our images are sharper and retain more identity information; compared with Express-master, our facial expression transformation is more accurate. We also present some poorly performing examples and identify three possible causes: the facial expression of the source image is underrepresented in the dataset, the expression of the target image differs too much from that of the source image, or random AU values are used. Finally, in an ablation study we remove the loss functions one by one and observe how each affects the model, confirming that the feature constraint loss and the generate constraint loss we adopt are effective and necessary for our system.
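For reference, the pixel-level metrics can be computed with scikit-image, and an ACD-style identity distance from the same face embeddings as above; this is a minimal sketch, since the exact metric implementations used in the experiments are not specified in this abstract.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def psnr_ssim(generated: np.ndarray, reference: np.ndarray) -> tuple:
    """Pixel-level quality between a generated image and a reference,
    both given as HxWx3 uint8 arrays."""
    psnr = peak_signal_noise_ratio(reference, generated, data_range=255)
    ssim = structural_similarity(reference, generated, channel_axis=-1, data_range=255)
    return psnr, ssim

def average_content_distance(emb_generated: np.ndarray, emb_real: np.ndarray) -> float:
    """ACD-style score: mean L2 distance between face embeddings of
    corresponding generated and real images (lower is better)."""
    return float(np.mean(np.linalg.norm(emb_generated - emb_real, axis=1)))
```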

中文摘要 i
Abstract iii
List of Figures viii
List of Tables ix
Chapter 1 Introduction 1
  1.1 Overview 1
  1.2 Motivation 1
  1.3 System Description 2
  1.4 Thesis Organization 3
Chapter 2 Related Work 4
  2.1 Face Recognition 4
  2.2 Facial Expression Transformation Approach 5
    2.2.1 Style transfer approaches 6
    2.2.2 The geometry approaches 7
    2.2.3 The AUs approaches 8
Chapter 3 Our Facial Expression Transformation Generation Model 10
  3.1 Data Preprocessing 10
  3.2 Generator Architecture 12
    3.2.1 Generator processing flow 12
    3.2.2 Detailed architecture of the generator 14
  3.3 Discriminator Architecture 16
  3.4 Loss Function 17
    3.4.1 Adversarial loss 17
    3.4.2 Conditional loss 18
    3.4.3 Content loss 19
    3.4.4 Feature constraint loss 19
    3.4.5 Generate constraint loss 19
    3.4.6 Overall objective function 20
Chapter 4 Experimental Results and Discussion 21
  4.1 Experimental Environment Setup 21
  4.2 Dataset Description 22
  4.3 The Results of Our Facial Expression Transformation 22
    4.3.1 Evaluation metrics 23
    4.3.2 Training on our facial expression transformation system 25
    4.3.3 Comparison of our model and the others 25
    4.3.4 Visual comparison with other models 27
    4.3.5 Bad performance examples 34
  4.4 Ablation Study 36
Chapter 5 Conclusions and Future Work 38
  5.1 Conclusions 38
  5.2 Future Work 39
References 40

[1] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, Boston, Massachusetts, pp. 1-9.
[2] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on Machine Learning, 2015, Lille, France, pp. 448-456.
[3] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” arXiv preprint arXiv:1512.00567, 2015.
[4] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” arXiv preprint arXiv:1602.07261, 2016.
[5] P. Ekman and W. V. Friesen, Facial action coding system: A technique for the measurement of facial movement, California: Consulting Psychologists Press, 1978.
[6] J. Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” arXiv preprint arXiv:1703.10593, 2017.
[7] Y. Choi, M. Choi, M. Kim, J. Ha, S. Kim, and J. Choo, “StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation,” arXiv preprint arXiv:1711.09020, 2017.
[8] Y. Xia, W. Zheng, Y. Wang, H. Yu, J. Dong, and F.-Y. Wang, “Local and global perception generative adversarial network for facial expression synthesis,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 3, pp. 1443-1452, 2022.
[9] F. Qiao, N. Yao, Z. Jiao, Z. Li, H. Chen, and H. Wang, “Geometry-contrastive GAN for facial expression transfer,” arXiv preprint arXiv:1802.01822, 2018.
[10] H. Tang, D. Xu, G. Liu, W. Wang, N. Sebe, and Y. Yan, “Cycle in cycle generative adversarial networks for keypoint-guided image generation,” in Proceedings of the 27th ACM International Conference on Multimedia, 2019, Nice, France, pp. 2052-2060.
[11] C. Fu, Y. Hu, X. Wu, G. Wang, Q. Zhang, and R. He, “High-fidelity face manipulation with extreme poses and expressions,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 2218-2231, 2021.
[12] Y. Fan, X. Jiang, S. Lan, and J. Lan, “Facial expression transfer based on conditional generative adversarial networks,” IEEE Access, vol. 11, pp. 2169-3536, 2023.
[13] A. Pumarola, A. Agudo, A. M. Martinez, A. Sanfeliu, and F. Moreno-Noguer, “GANimation: Anatomically-aware facial animation from a single image,” in Proceedings of the European Conference on Computer Vision, 2018, Munich, Germany, pp. 818-833.
[14] R. Wu, G. Zhang, S. Lu, and T. Chen, “Cascade EF-GAN: Progressive facial expression editing with local focuses,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, Seattle, Washington, pp. 5021-5030.
[15] J. Ling, H. Xue, L. Song, S. Yang, R. Xie, and X. Gu, “Toward fine-grained facial expression manipulation,” in Proceedings of the European Conference on Computer Vision, 2020, Glasgow, United Kingdom, pp. 37-53.
[16] M. Liu, Y. Ding, M. Xia, X. Liu, E. Ding, W. Zuo, and S. Wen, “STGAN: A unified selective transfer network for arbitrary image attribute editing,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, Long Beach, California, pp. 3668-3677.
[17] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015, Munich, Germany, Part III, pp. 234-241.
[18] T. Baltrusaitis, A. Zadeh, Y. C. Lim, and L. P. Morency, “OpenFace 2.0: Facial behavior analysis toolkit,” in Proceedings of the 13th IEEE International Conference on Automatic Face and Gesture Recognition, 2018, Xi’an, China, pp. 59-66.
[19] T. Esler, N. Martin, et al., “Face Recognition Using Pytorch,” [Online] Available: https://github.com/timesler/facenet-pytorch. [Accessed Nov. 2, 2023].
[20] Y. Shen, P. Luo, J. Yan, X. Wang, and X. Tang, “FaceID-GAN: Learning a symmetry three-player GAN for identity-preserving face synthesis,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, Salt Lake City, Utah, pp. 821-830.
[21] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, “Improved training of Wasserstein GANs,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, Long Beach, California, pp. 5767-5777.
[22] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proceedings of the 34th International Conference on Machine Learning, 2017, Sydney, Australia, pp. 214-223.
[23] D. Lundqvist, A. Flykt, and A. Öhman, “The Karolinska Directed Emotional Faces-KDEF [CD-ROM],” Karolinska Institute, Stockholm, 1998.
[24] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “GANs trained by a two time-scale update rule converge to a local Nash equilibrium,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, Long Beach, California, pp. 6626-6637.
[25] S. Tulyakov, M. Y. Liu, X. Yang, and J. Kautz, “MoCoGAN: Decomposing motion and content for video generation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, Salt Lake City, Utah, pp. 1526-1535.
[26] D. Sandberg, R. Rai, et al., “Face Recognition using Tensorflow,” [Online] Available: https://github.com/davidsandberg/facenet. [Accessed Apr. 17, 2018].
[27] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.

Full Text Release Date: 2029/01/24 (campus network)
Full Text Release Date: 2034/01/24 (off-campus network)
Full Text Release Date: 2034/01/24 (National Central Library: Taiwan thesis and dissertation system)