Author: 何岡秩 (Kang-Chi Ho)
Thesis title: Transformation of Identity-Preserved Facial Features using Wasserstein Generative Adversarial Network with Gradient Penalty
Original title: 基於Wasserstein生成對抗網絡進行可解析之臉部特徵學習
Advisor: 徐繼聖 (Gee-Sern Hsu)
Committee members: 洪一平 (Yi-Ping Hung), 王鈺強 (Yu-Chiang Wang), 郭景明 (Jing-Ming Guo), 林惠勇 (Huei-Yung Lin)
Degree: Master
Department: College of Engineering - Department of Mechanical Engineering
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: Chinese
Pages: 60
Keywords: Generative Adversarial Network, Face Recognition, Facial Frontalization

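We propose using a generative adversarial network with a Wasserstein loss to assist in extracting facial attribute features (Disentangled Representation learning GAN) for cross-pose face recognition. The DR-WGAN replaces the Jensen-Shannon (JS) divergence used in DR-GAN with a training scheme based on the Wasserstein-1 distance and a gradient penalty, improving both the training stability of DR-GAN and the quality of the generated images. Because training with the Wasserstein-1 distance and a gradient penalty imposes its own requirements, the whole generative and adversarial network has to be redesigned and refined. Throughout the research and experiments we observe how changes in different settings affect network performance, including 1) data normalization, 2) activation functions, and 3) training data augmentation, and the thesis highlights the issues to consider when redesigning the network. We examine two network designs: the first consists of a generator (G) and a discriminator (D), and the second adds a classifier (C) to the first. In the experiments, DR-WGAN outperforms DR-GAN and other strong algorithms on the MPIE benchmark database, and the CFP experiments confirm that adding the extra classifier to DR-WGAN yields better performance than the network composed of G and D alone.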


We propose Disentangled Representation Learning on a Wasserstein Generative Adversarial Network with Gradient Penalty, abbreviated DR-WGAN, for cross-pose face recognition. The proposed DR-WGAN improves on the state-of-the-art DR-GAN (Disentangled Representation Learning GAN) in the search for a disentangled facial representation suited to cross-pose recognition. In the design of its discriminator, the DR-WGAN uses the Wasserstein-1 distance and a gradient penalty instead of the Jensen-Shannon (JS) divergence used in the DR-GAN, which substantially improves training stability and, in turn, the quality of the generated images. Because the discriminator relies on the Wasserstein-1 distance and a gradient penalty, the overall generative and adversarial framework needs to be redesigned. We study the effects of different approaches to data normalization, activation functions, and data augmentation, and highlight the issues to be considered in the redesigned framework. Two structures of the redesigned framework are studied: one with a generator and a discriminator, and the other with an additional classifier. Experiments on the MPIE database show that the DR-WGAN outperforms the DR-GAN and other state-of-the-art approaches. Experiments on the CFP database show that the framework with an additional classifier outperforms the one without.
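To make the Wasserstein-1 distance with gradient penalty mentioned in the abstract more concrete, the following is a minimal sketch of the generic WGAN-GP critic loss from the WGAN-GP literature, not the thesis's actual DR-WGAN objective, which this record does not give. Everything in it is an assumption for illustration: the toy one-layer `tanh` critic (chosen so its input gradient can be written analytically without a deep learning library), the function names, and the penalty weight `lam=10.0`.

```python
import math
import random


def critic(x, w, b):
    """Toy critic f(x) = tanh(w . x + b); a stand-in for a deep discriminator."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)


def critic_grad_norm(x, w, b):
    """L2 norm of df/dx, using the analytic gradient df/dx = (1 - f(x)^2) * w."""
    scale = 1.0 - critic(x, w, b) ** 2  # always >= 0 because |tanh| <= 1
    return scale * math.sqrt(sum(wi * wi for wi in w))


def wgan_gp_critic_loss(real, fake, w, b, lam=10.0, rng=None):
    """Generic WGAN-GP critic loss (hypothetical helper, not the thesis code).

    loss = mean f(fake) - mean f(real) + lam * mean (||grad f(x_hat)|| - 1)^2,
    where each x_hat is a random interpolation between a real and a fake sample.
    """
    if len(real) != len(fake):
        raise ValueError("real and fake batches must have the same size")
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    n = len(real)

    # Wasserstein term: the critic should score real samples higher than fakes.
    wasserstein_term = (sum(critic(x, w, b) for x in fake)
                        - sum(critic(x, w, b) for x in real)) / n

    # Gradient penalty: push the critic's gradient norm toward 1 at x_hat.
    penalty = 0.0
    for x_real, x_fake in zip(real, fake):
        eps = rng.random()  # uniform in [0, 1)
        x_hat = [eps * r + (1.0 - eps) * f for r, f in zip(x_real, x_fake)]
        penalty += (critic_grad_norm(x_hat, w, b) - 1.0) ** 2

    return wasserstein_term + lam * penalty / n


if __name__ == "__main__":
    real = [[1.0, 0.5], [0.8, 0.2]]
    fake = [[-0.3, 0.1], [0.0, -0.4]]
    print(wgan_gp_critic_loss(real, fake, w=[0.7, -0.2], b=0.1))
```

In a real network the gradient at `x_hat` would come from automatic differentiation rather than a closed form. The penalty term is what softly enforces the 1-Lipschitz constraint that the Wasserstein-1 formulation requires of the critic.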

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Background and Motivation
  1.2 Method Overview
  1.3 Contributions
  1.4 Thesis Organization
Chapter 2 Literature Review
  2.1 Literature on Image Generation Based on Deep Convolutional Neural Networks
  2.2 Auto Encoder & Adversarial Auto Encoder
  2.3 Generative Adversarial Network
  2.4 Semi-Supervised GAN
  2.5 Wasserstein Generative Adversarial Network
  2.6 WGAN-Gradient Penalty
  2.7 Disentangle Representation of GAN
Chapter 3 Main Method
  3.1 Data Augmentation for Face Pitch Angles
  3.2 Effect of Layer Normalization on DR-WGAN
  3.3 Disentangle Representation of Wasserstein GAN
    3.3.1 Network Architecture
    3.3.2 Problem Formulation
Chapter 4 Experimental Setup and Analysis
  4.1 Benchmark Databases
    4.1.1 Multi-Pie Database
    4.1.2 CASIA-WebFace
    4.1.3 CelebFaces Attributes
    4.1.4 IARPA Janus Benchmark A
    4.1.5 Celebrity in Frontal-Profile
    4.1.6 Annotated Facial Landmark in the Wild
  4.2 Experimental Design
    4.2.1 Modification on Loss Function
    4.2.2 Modification on Normalization Method
  4.3 Experimental Results and Analysis
    4.3.1 Evaluation on Multi-Pie Database
    4.3.2 Evaluation on CFP Database
    4.3.3 Evaluation on IJBA Database
Chapter 5 Conclusion and Future Work
Chapter 6 References

