| Graduate Student | 蔡孟哲 Meng-Che Tsai |
|---|---|
| Thesis Title | 基於生成對抗網路之單張人臉影像表情轉換 (Human Facial Expression Transformation from a Single Image Based on a Generative Adversarial Network) |
| Advisor | 王乃堅 Nai-Jian Wang |
| Committee Members | 蘇順豐 Shun-Feng Su, 鍾順平 Shun-Ping Chung, 曾德峰 Der-Feng Tseng, 方劭云 Shao-Yun Fang, 王乃堅 Nai-Jian Wang |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Publication Year | 2020 |
| Academic Year | 108 |
| Language | Chinese |
| Pages | 61 |
| Chinese Keywords | 生成對抗網路, 風格轉換, 表情轉換, 深度學習, 類神經網路 |
| English Keywords | Generative Adversarial Network, Style Transformation, Face Expression Transformation, Deep Learning, Neural Network |
Generative adversarial networks have driven breakthrough progress in many fields: by learning the distribution of the training data, the generator can directly synthesize samples that follow a similar distribution, which has greatly advanced the field of style transfer. This thesis implements a system that, given a single input image and user-specified conditions, generates a clip of the designated subtle facial expression. We first use a generative adversarial network whose loss function is optimized based on the Wasserstein distance to perform cross-domain translation among multiple domains. We use EmotioNet as the training dataset and apply the Facial Action Coding System (FACS) to its images to collect Action Unit information for generating the specified facial expression images. To keep the background scene unchanged while the facial expression is being modified, we use an attention network that focuses on the regions that must be edited, and to preserve the image composition we adopt U-Net as the main network architecture. Experimental results show that this method can successfully generate images that satisfy human visual assessment according to the user's input conditions. In addition, we analyze the Action Units of the Facial Action Coding System to find the dependence and mutual-exclusion relations between them, which serve as guidance and recommendations when operating the system.
Generative Adversarial Networks have brought breakthrough improvements to various
fields. A Generative Adversarial Network can learn the distribution of the given training
data and synthesize new data that follows a similar distribution, which greatly advances
style transformation tasks.
We present a conditional fine facial expression generation system that can
synthesize a clip of subtle facial expressions from a single input image. First, we focus on
the training and optimization of an image generation model, based on the Wasserstein
distance, that performs domain-to-domain translation. Then, to obtain the Action Units
needed for training, we use the EmotioNet dataset as our training set and run a Facial
Action Coding System (FACS) detector on it. To make the generated images more
realistic, we incorporate an attention network into our system so that the generator
focuses its synthesis on the foreground and leaves the background untouched. To
prevent the system from synthesizing incomplete image compositions, we adopt U-Net
as the backbone architecture to preserve the image composition.
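The attention mechanism described above can be illustrated with a minimal sketch: the generator is assumed to predict both an edited image and a soft attention mask in [0, 1], and the final output takes generated pixels where the mask is high while copying the original background where it is low. The function name and toy arrays below are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def compose_with_attention(input_img, generated_img, attention_mask):
    """Blend generated pixels into the input image using a soft mask.

    Where the mask is 1 the output uses the generated pixel (foreground
    being edited); where it is 0 the original background is kept.
    """
    mask = np.clip(attention_mask, 0.0, 1.0)  # keep the mask in [0, 1]
    return mask * generated_img + (1.0 - mask) * input_img

# Tiny 2x2 single-channel example.
inp = np.array([[0.0, 0.0], [0.0, 0.0]])   # original "background"
gen = np.array([[1.0, 1.0], [1.0, 1.0]])   # generator output
msk = np.array([[1.0, 0.0], [0.5, 0.0]])   # edit top-left fully, blend bottom-left
out = compose_with_attention(inp, gen, msk)
# out -> [[1.0, 0.0], [0.5, 0.0]]
```

Because unedited regions are copied directly from the input, background pixels pass through the network untouched, which is what keeps the scene stable while only the expression changes.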
Experimental results show that our method can successfully synthesize clips that
conform to human visual assessment. Furthermore, we analyze the Action Units
proposed in the Facial Action Coding System and identify the dependence and
mutual exclusion between Action Units. This analysis serves as a guideline that helps
users generate images successfully.
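One plausible way to carry out the dependence/repulsion analysis mentioned above is to correlate per-image Action Unit activations across a dataset: strongly positively correlated AU pairs tend to co-occur (dependent), while strongly negatively correlated pairs tend to exclude each other. The threshold and the toy activation matrix below are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def au_relations(activations, threshold=0.5):
    """Classify AU pairs as dependent or repulsive by Pearson correlation.

    activations: matrix with one row per image and one column per Action Unit
    (1 = AU active in that image, 0 = inactive).
    """
    corr = np.corrcoef(activations, rowvar=False)  # AU-by-AU correlation matrix
    n = corr.shape[0]
    dependent, repulsive = [], []
    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] >= threshold:
                dependent.append((i, j))     # AUs that tend to co-occur
            elif corr[i, j] <= -threshold:
                repulsive.append((i, j))     # AUs that tend to exclude each other
    return dependent, repulsive

# Toy data: AU0 and AU1 always fire together; AU2 fires only when they do not.
acts = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
])
dep, rep = au_relations(acts)
# dep -> [(0, 1)]; rep -> [(0, 2), (1, 2)]
```

Pairs flagged as repulsive could then be used to warn the user when the requested AU combination is anatomically unlikely, which matches the "guideline" role described in the abstract.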