
Graduate Student: Meng-Che Tsai (蔡孟哲)
Thesis Title: Human Facial Expressions Transform from Single Image Based on Generative Adversarial Network (基於生成對抗網路之單張人臉影像表情轉換)
Advisor: Nai-Jian Wang (王乃堅)
Committee Members: Shun-Feng Su (蘇順豐), Shun-Ping Chung (鍾順平), Der-Feng Tseng (曾德峰), Shao-Yun Fang (方劭云), Nai-Jian Wang (王乃堅)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of Publication: 2020
Graduation Academic Year: 108 (2019-2020)
Language: Chinese
Number of Pages: 61
Keywords: Generative Adversarial Network, Style Transformation, Facial Expression Transformation, Deep Learning, Neural Network

Abstract (Chinese):
Generative adversarial networks have driven breakthrough progress across many fields: a generator can learn the distribution of its training data and directly synthesize samples that follow a similar distribution, which has greatly advanced the field of style transfer. This thesis implements a system that, given a single input image and user-specified conditions, generates a clip of a specified subtle facial expression. We first use a generative adversarial network whose loss function is optimized based on the Wasserstein distance to perform cross-domain translation among multiple domains. We adopt the EmotioNet dataset for training and apply a Facial Action Coding System (FACS) detector to its images to collect Action Unit annotations, which condition the generation of the target facial expression. To leave the background unchanged while the facial expression is being modified, we use an attention network that concentrates on the regions that must be edited, and we adopt U-Net as the main network architecture to keep the image composition intact. Experimental results show that this method can generate images that match the user's input conditions and satisfy human visual assessment. In addition, we computationally analyze the Action Units of the Facial Action Coding System to identify the dependence and mutual-exclusion relations among them, which serve as guidance and recommendations when operating the system.
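As a concrete illustration of the Wasserstein-distance loss mentioned above, the following is a minimal PyTorch-style sketch of a critic loss with gradient penalty (the standard WGAN-GP formulation); the function name and the `lambda_gp` value are illustrative assumptions and do not reproduce the thesis's actual implementation.

```python
import torch

def critic_wgan_gp_loss(critic, real, fake, lambda_gp=10.0):
    """WGAN critic loss with gradient penalty (illustrative sketch).

    The critic is trained to maximize D(real) - D(fake); the value is
    returned negated so it can be minimized with a standard optimizer.
    """
    loss_wass = critic(fake).mean() - critic(real).mean()

    # Gradient penalty: enforce a unit gradient norm on random
    # interpolations between real and generated samples.
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1.0 - alpha) * fake).requires_grad_(True)
    d_interp = critic(interp)
    grads = torch.autograd.grad(
        outputs=d_interp, inputs=interp,
        grad_outputs=torch.ones_like(d_interp),
        create_graph=True, retain_graph=True)[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    penalty = ((grad_norm - 1.0) ** 2).mean()

    return loss_wass + lambda_gp * penalty
```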


Abstract (English):
Generative adversarial networks have made breakthrough improvements in various fields. A generative adversarial network can learn the distribution of the given training data and synthesize new data that follows a similar distribution, which has greatly advanced style transformation tasks.
We present a conditional fine-grained facial expression generation system that synthesizes a subtle expression clip from a single input image. First, we train and optimize an image generation model based on the Wasserstein distance to perform domain-to-domain translation. Then, to obtain Action Units for training, we use the EmotioNet dataset and run a Facial Action Coding System (FACS) detector on it. To make the generated images more realistic, we incorporate an attention network that makes the generator focus on the foreground and leave the background untouched. To avoid incomplete image composition, we adopt U-Net as the backbone so that the composition is preserved.
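To make the role of the attention network concrete, the sketch below shows one way an attention mask can composite generated pixels with the input image so that background pixels pass through unchanged; the class name, branch layout, and kernel sizes are assumptions for illustration, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn

class AttentionComposite(nn.Module):
    """Illustrative attention-based compositing head.

    Given a shared feature map from a U-Net style backbone, one branch
    regresses an RGB change (color) and another a single-channel
    attention mask in [0, 1].  The final image keeps input pixels where
    the mask is close to 1 and uses generated pixels elsewhere, so the
    background stays untouched.
    """

    def __init__(self, feat_channels: int):
        super().__init__()
        self.color = nn.Sequential(
            nn.Conv2d(feat_channels, 3, kernel_size=7, padding=3),
            nn.Tanh())        # candidate RGB output in [-1, 1]
        self.mask = nn.Sequential(
            nn.Conv2d(feat_channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid())     # attention mask in [0, 1]

    def forward(self, features: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        color = self.color(features)   # generated pixels
        attn = self.mask(features)     # 1 = keep the original pixel
        return attn * x + (1.0 - attn) * color
```

With this composition the generator only needs to learn the expression-related changes; wherever the mask saturates to 1, pixels are copied directly from the input image.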
Experimental results show that our method can successfully synthesize clips that conform to human visual assessment. Furthermore, we analyze the Action Units proposed in the Facial Action Coding System and identify the dependence and mutual exclusion between Action Units. This analysis serves as a guideline that helps users generate images successfully.
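As a sketch of the kind of Action Unit analysis described above, pairwise correlation over binary AU annotations is one simple way to quantify dependence and mutual exclusion; the input format and the threshold below are assumptions, not the thesis's exact procedure.

```python
import numpy as np

def au_cooccurrence(au_labels: np.ndarray, threshold: float = 0.3):
    """Estimate dependence/repulsion between Action Units.

    au_labels: (num_images, num_aus) binary matrix, 1 = AU active.
    Returns AU index pairs whose correlation exceeds the threshold
    (dependent) or falls below -threshold (mutually exclusive).
    The threshold value is illustrative.
    """
    # For 0/1 data, Pearson correlation equals the phi coefficient.
    corr = np.corrcoef(au_labels, rowvar=False)
    dependent, exclusive = [], []
    num_aus = corr.shape[0]
    for i in range(num_aus):
        for j in range(i + 1, num_aus):
            if corr[i, j] >= threshold:
                dependent.append((i, j, corr[i, j]))
            elif corr[i, j] <= -threshold:
                exclusive.append((i, j, corr[i, j]))
    return dependent, exclusive
```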

Table of Contents:
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
  1.1 Research Background
  1.2 Motivation and Objectives
  1.3 Literature Review
  1.4 Thesis Organization
Chapter 2  Neural Networks and Generative Adversarial Networks
  2.1 Artificial Neural Networks (ANN)
    2.1.1 Forward Propagation
    2.1.2 Backward Propagation
  2.2 Convolutional Neural Networks (CNN)
    2.2.1 Convolution Layer
    2.2.2 Activation Function
    2.2.3 Pooling Layer
  2.3 Generative Adversarial Networks (GAN)
Chapter 3  Facial Expression Transformation from a Single Image Based on Generative Adversarial Networks
  3.1 Network Pipeline
  3.2 Loss Functions
    3.2.1 Domain Classification Loss
    3.2.2 Adversarial Loss
    3.2.3 Reconstruction Loss
  3.3 Network Architecture
    3.3.1 Generator Architecture
    3.3.2 Discriminator Architecture
    3.3.3 Training and Details
Chapter 4  Experimental Results and Analysis
  4.1 Experimental Environment
  4.2 Facial Action Coding System (FACS)
  4.3 Training Dataset and Preprocessing
  4.4 Experimental Results
    4.4.1 Attention Network Results
    4.4.2 Gradually Changing a Single Action Unit
    4.4.3 Gradually Changing Multiple Action Units
    4.4.4 Transforming the Action Units of a Complete Image
    4.4.5 Failure Cases
    4.4.6 Analysis of Results and the Training Set
Chapter 5  Conclusions and Future Work
References


Full text release date: 2025/07/06 (campus network)
Full text release date: 2025/07/06 (off-campus network)
Full text release date: 2025/07/06 (National Central Library: Taiwan Theses and Dissertations System)