
Graduate Student: Chun-Hung Tsai (蔡竣泓)
Thesis Title: Disentangled Shape Learning for Large-Pose Face Reenactment (解纏臉形學習之大角度人臉重演)
Advisor: Gee-Sern Hsu (徐繼聖)
Committee Members: Yung-Yu Chuang (莊永裕), Chia-Wen Lin (林嘉文), Chu-Song Chen (陳祝嵩), Jing-Ming Guo (郭景明), Gee-Sern Hsu (徐繼聖)
Degree: Master
Department: Department of Mechanical Engineering, College of Engineering
Year of Publication: 2022
Academic Year of Graduation: 110 (2021–2022)
Language: Chinese
Number of Pages: 60
Chinese Keywords: 深度學習、人臉重演
English Keywords: Deep Learning, Face Reenactment
Usage Statistics: 262 views, 0 downloads

Abstract:
We propose the Disentangled Shape Learning (DSL) network for large-pose face reenactment. The DSL consists of two main modules: the Disentangled Shape Encoder (DSE) and the Target Face Generator (TFG). Given a reference face, the DSE first generates a Projected Normalized Coordinate Code (PNCC) that disentangles the expression of the reference from its identity, and concatenates the PNCC with the 3D facial landmarks of the reference to form its output code. The TFG takes this output code together with the facial code of the source face and generates the target reenacted face. Because the expression and identity of the reference are disentangled, the generated face exhibits the pose and expression of the reference while better preserving the identity of the source. We evaluate the proposed approach on the Multi-PIE, VoxCeleb1, and VoxCeleb2 benchmarks against state-of-the-art approaches, and the results show that our method is highly competitive on all three. To better demonstrate large-pose reenactment performance, we design evaluation protocols for these benchmarks that consider only the large-pose data.
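Since the abstract describes the DSL data flow only in words, a small code sketch may help make it concrete. The following is a minimal PyTorch sketch of that flow, under loudly stated assumptions: the layer shapes, the 68-point landmark count, the 256-dimensional source identity code, and both network bodies are placeholders invented for illustration, not the architectures specified in the thesis.

import torch
import torch.nn as nn

class DSE(nn.Module):
    """Disentangled Shape Encoder (sketch): maps a reference face to a
    PNCC map plus 3D landmarks, i.e. a shape code that carries the pose
    and expression of the reference but not its identity."""
    def __init__(self):
        super().__init__()
        # Placeholder backbone regressing a 3-channel PNCC map (assumed).
        self.pncc_net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )
        # Placeholder head regressing 68 3D landmarks from the PNCC (assumed).
        self.landmark_net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 68 * 3),
        )

    def forward(self, reference):
        pncc = self.pncc_net(reference)                      # B x 3 x H x W
        landmarks = self.landmark_net(pncc)                  # B x 204
        # Broadcast the landmark vector spatially and concatenate it with
        # the PNCC, mirroring the "PNCC + 3D landmarks" output code.
        lmk_map = landmarks[:, :, None, None].expand(-1, -1, *pncc.shape[2:])
        return torch.cat([pncc, lmk_map], dim=1)             # B x 207 x H x W

class TFG(nn.Module):
    """Target Face Generator (sketch): decodes the DSE shape code of the
    reference together with an identity code of the source face."""
    def __init__(self, id_dim=256):
        super().__init__()
        self.decode = nn.Sequential(
            nn.Conv2d(207 + id_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, shape_code, source_id):
        id_map = source_id[:, :, None, None].expand(-1, -1, *shape_code.shape[2:])
        return self.decode(torch.cat([shape_code, id_map], dim=1))

if __name__ == "__main__":
    dse, tfg = DSE(), TFG()
    reference = torch.randn(1, 3, 128, 128)  # supplies pose and expression
    source_id = torch.randn(1, 256)          # identity code of the source face
    target = tfg(dse(reference), source_id)  # reenacted face, 1 x 3 x 128 x 128
    print(target.shape)

The point of the disentanglement is visible in the shapes: the reference image reaches the TFG only through the shape code, so whatever identity information the DSE strips from the PNCC and landmarks cannot leak into the output, which is why the generated face can preserve the source identity.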

Table of Contents:
Abstract (Chinese); Abstract (English); Acknowledgements; Table of Contents; List of Figures; List of Tables
Chapter 1: Introduction
  1.1 Background and Motivation
  1.2 Method Overview
  1.3 Contributions
  1.4 Thesis Organization
Chapter 2: Literature Review
  2.1 3DMM
  2.2 X2Face
  2.3 Few-shot Video-to-Video Synthesis
  2.4 First Order Motion Model for Image Animation
  2.5 HeadGAN
  2.6 Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars
Chapter 3: Proposed Method
  3.1 Overall Network Architecture
  3.2 Design of the Disentangled Shape Encoder
  3.3 Design of the Target Face Generator
Chapter 4: Experimental Setup and Analysis
  4.1 Datasets
    4.1.1 Multi-PIE
    4.1.2 VoxCeleb1
    4.1.3 VoxCeleb2
  4.2 Experimental Setup
    4.2.1 Data Partitioning and Settings
    4.2.2 Performance Evaluation Metrics
    4.2.3 Experimental Design
  4.3 Experimental Results and Analysis
    4.3.1 Comparison of Target Face Generator Settings
    4.3.2 Effect of the PNCC Feature
    4.3.3 Disentanglement Properties of the 3D Morphable Face Model
  4.4 Comparison with Related Work
  4.5 Evaluation Protocol and Performance for Large-Pose Face Reenactment
Chapter 5: Conclusions and Future Work
Chapter 6: References

List of Figures:
Figure 1-1: Network architecture of the DSL
Figure 2-1: Example of a 3DMM face model
Figure 2-2: Network architecture of X2Face
Figure 2-3: Network architecture of Few-shot vid2vid
Figure 2-4: Network architecture of FOM
Figure 2-5: Network architecture of HeadGAN
Figure 2-6: Network architecture of Bi-layer
Figure 3-1: Face reenactment results generated by our architecture
Figure 3-2: Overall DSL network architecture
Figure 3-3: Architecture of the Disentangled Shape Encoder (DSE)
Figure 3-4: 3D face projection process
Figure 3-5: Network architecture of the Face Alignment Network
Figure 3-6: Facial landmark map
Figure 3-7: Training pipeline of the Target Face Generator (TFG)
Figure 4-1: Reference camera placements for capturing different angles in Multi-PIE
Figure 4-2: Images captured at different angles in Multi-PIE
Figure 4-3: Face samples under the 20 lighting variations in Multi-PIE
Figure 4-4: Face samples with the 6 expression variations in Multi-PIE
Figure 4-5: Sample celebrity video clips from VoxCeleb1
Figure 4-6: Sample celebrity video clips from VoxCeleb2
Figure 4-7: Comparison of TFG outputs under different loss functions
Figure 4-8: Comparison of TFG outputs on Multi-PIE with the PNCC feature added
Figure 4-9: Comparison of images generated on Multi-PIE after disentangling identity and expression features
Figure 4-10: Comparison of images generated by our method and state-of-the-art methods
Figure 4-11: Cross-identity reenactment comparison between our method and state-of-the-art methods
Figure 4-12: Cross-identity large-pose face generation results on VoxCeleb2

[1] Volker Blanz and Thomas Vetter. A morphable model for the synthesis of 3d faces. In SIGGRAPH, 1999.
[2] Chen Cao, Yanlin Weng, Shun Zhou, Yiying Tong, and Kun Zhou. Facewarehouse: A 3d facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics, 20(3):413–425, 2013.
[3] Jie Cao, Yibo Hu, Hongwen Zhang, Ran He, and Zhenan Sun. Towards high fidelity face frontalization in the wild. IJCV, 2020.
[4] Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, and Andrew Zisserman. Vggface2: A dataset for recognising faces across pose and age. In FG, 2018.
[5] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In CVPR, 2018.
[6] Yunjey Choi, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. Stargan v2: Diverse image synthesis for multiple domains. In CVPR, 2020.
[7] Joon Son Chung, Arsha Nagrani, and Andrew Zisserman. Voxceleb2: Deep speaker recognition. In INTERSPEECH, 2018.
[8] Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. In CVPR, 2019.
[9] Michail Christos Doukas, Stefanos Zafeiriou, and Viktoriia Sharmanska. Headgan: One-shot neural head synthesis and editing. In ICCV, 2021.
[10] Ralph Gross, Iain Matthews, Jeffrey Cohn, Takeo Kanade, and Simon Baker. Multi-PIE. Image and Vision Computing, 2010.
[11] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local Nash equilibrium. In NIPS, 2017.
[12] Xun Huang and Serge Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. In ICCV, 2017.
[13] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, 2016.
[14] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In CVPR, 2019.
[15] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[16] Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. Voxceleb: a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612, 2017.
[17] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, 2017.
[18] Pascal Paysan, Reinhard Knothe, Brian Amberg, Sami Romdhani, and Thomas Vetter. A 3d face model for pose and illumination invariant face recognition. In IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2009.
[19] Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, and Nicu Sebe. First order motion model for image animation. In NeurIPS, 2019.
[20] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567, 2015.
[21] Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, and Bryan Catanzaro. Few-shot video-to-video synthesis. In NeurIPS, 2019.
[22] Olivia Wiles, A Sophia Koepke, and Andrew Zisserman. X2face: A network for controlling face generation using images, audio, and pose codes. In ECCV, 2018.
[23] Wayne Wu, Yunxuan Zhang, Cheng Li, Chen Qian, and Chen Change Loy. Reenactgan: Learning to reenact faces via boundary transfer. In ECCV, 2018.
[24] Guangming Yao, Yi Yuan, Tianjia Shao, and Kun Zhou. Mesh guided one-shot face reenactment using graph convolutional networks. arXiv preprint arXiv:2008.07783, 2020.
[25] Egor Zakharov, Aleksei Ivakhnenko, Aliaksandra Shysheya, and Victor Lempitsky. Fast bi-layer neural synthesis of one-shot realistic head avatars. In ECCV, 2020.
[26] Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, and Victor Lempitsky. Few-shot adversarial learning of realistic neural talking head models. In ICCV, 2019.
[27] Jiangning Zhang, Xianfang Zeng, Mengmeng Wang, Yusu Pan, Liang Liu, Yong Liu, Yu Ding, and Changjie Fan. Freenet: Multi-identity face reenactment. In CVPR, 2020.
[28] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.
[29] Xiangyu Zhu, Zhen Lei, Xiaoming Liu, Hailin Shi, and Stan Z Li. Face alignment across large poses: A 3d solution. In CVPR, 2016.

Full Text Release Date: 2024/09/27 (campus network)
Full Text Release Date: 2024/09/27 (off-campus network)
Full Text Release Date: 2024/09/27 (National Central Library: Taiwan NDLTD system)