
Graduate Student: Wei-Jie Hong
Thesis Title: Recomposed Shape Attention Learning for One-Shot Large-Pose Face Reenactment
Advisor: Gee-Sern Hsu
Committee Members: Sheng-Luen Chung, Chu-Song Chen, Huei-Yung Lin
Degree: Master
Department: Department of Mechanical Engineering, College of Engineering
Year of Publication: 2023
Graduation Academic Year: 111
Language: Chinese
Number of Pages: 61
Keywords (Chinese): Face Reenactment, Face Recognition, 3D Face Model
Keywords (English): Face Reenactment, Transformer, FLAME

We propose RSAL, a Recomposed Shape Attention Learning model for one-shot large-pose face reenactment. In contrast to mainstream methods that rely on GANs as the main generator, we introduce a Transformer mechanism to improve identity preservation under large pose and expression variations. The RSAL model consists of three modules: the Shape Recomposition Encoder (SRE), the Shape Enhanced Transformer (SET), and the Attention-Embedded Generator (AEG). First, the SRE produces the target shape code by recomposing the source face's identity and the reference face's motion. The SET then extracts a style code from the source face and the shape code. The AEG takes the target shape code and the style code as inputs and produces the reenacted face, generating high-quality face images. One advantage of our method is its training mechanism, which addresses the instability of related methods, most of which rely on fine-tuning with a few samples; our method instead works from a single source face and achieves cross-identity face reenactment. We compare our method against related work on the MPIE-LP, VoxCeleb1 [16], and VoxCeleb2-LP databases, and the results confirm that it is highly competitive for one-shot large-pose face reenactment.


We propose Recomposed Shape Attention Learning (RSAL) for one-shot large-pose face reenactment. Unlike previous approaches, which rely on GANs for identity preservation during training, we introduce a transformer mechanism to improve identity preservation across large poses. The RSAL model consists of three modules, namely the Shape Recomposition Encoder (SRE), the Shape Enhanced Transformer (SET), and the Attention-Embedded Generator (AEG). Given a source face and a reference face, the SRE generates a target shape code that combines the source identity with the reference motion. The SET extracts a style code from the source face and the recomposed shape code. The AEG takes the shape code and the style code as inputs and generates the reenacted face, producing high-quality facial images. A favorable property of the approach is its training mechanism, which avoids the weak controllability of previous methods that must be fine-tuned on a few images (few-shot); our method works from a single source image (one-shot) and enables cross-identity reenactment. We evaluate our approach on the MPIE-LP, VoxCeleb1, and VoxCeleb2-LP datasets. Qualitative and quantitative results under large poses show that the proposed approach produces reenacted faces that compare favorably with the state of the art.
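
To make the pipeline above concrete, below is a minimal PyTorch sketch of the SRE, SET, and AEG stages. It only mirrors the data flow stated in the abstract; every module body, layer choice, dimension, and name (ShapeRecompositionEncoder, id_dim=100, motion_dim=56, and so on) is an illustrative assumption and is not taken from the thesis.

# Minimal sketch (assumption-laden) of the SRE -> SET -> AEG data flow described above.
import torch
import torch.nn as nn

class ShapeRecompositionEncoder(nn.Module):
    # SRE: recomposes the source identity shape and the reference motion into a target shape code.
    def __init__(self, id_dim=100, motion_dim=56, code_dim=256):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(id_dim + motion_dim, code_dim), nn.ReLU(),
                                  nn.Linear(code_dim, code_dim))
    def forward(self, source_shape, reference_motion):
        return self.fuse(torch.cat([source_shape, reference_motion], dim=-1))

class ShapeEnhancedTransformer(nn.Module):
    # SET: the shape code attends over tokenized source-face features to yield a style code.
    def __init__(self, feat_dim=256, code_dim=256, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, code_dim)
        self.attn = nn.MultiheadAttention(code_dim, n_heads, batch_first=True)
    def forward(self, source_features, shape_code):
        q = shape_code.unsqueeze(1)               # query: target shape code
        kv = self.proj(source_features)           # keys/values: source appearance tokens
        style, _ = self.attn(q, kv, kv)
        return style.squeeze(1)

class AttentionEmbeddedGenerator(nn.Module):
    # AEG: decodes the shape code and style code into a reenacted face image (toy decoder).
    def __init__(self, code_dim=256):
        super().__init__()
        self.fc = nn.Linear(code_dim * 2, 128 * 4 * 4)
        self.decode = nn.Sequential(
            nn.Upsample(scale_factor=4), nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4), nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())
    def forward(self, shape_code, style_code):
        x = self.fc(torch.cat([shape_code, style_code], dim=-1)).view(-1, 128, 4, 4)
        return self.decode(x)

# One-shot reenactment: a single source image drives identity; the reference drives motion.
sre, set_mod, aeg = ShapeRecompositionEncoder(), ShapeEnhancedTransformer(), AttentionEmbeddedGenerator()
source_shape = torch.randn(1, 100)         # e.g. FLAME identity (shape) parameters
reference_motion = torch.randn(1, 56)      # e.g. FLAME expression + pose parameters
source_features = torch.randn(1, 49, 256)  # tokenized source-image features
shape_code = sre(source_shape, reference_motion)
style_code = set_mod(source_features, shape_code)
reenacted = aeg(shape_code, style_code)    # (1, 3, 64, 64) reenacted face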

Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background and Motivation
1.2 Method Overview
1.3 Contributions
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 FLAME
2.2 TransEditor
2.3 Style Transformer for Image Inversion and Editing
2.4 StyleSwin
2.5 First Order Motion Model for Image Animation
2.6 HeadGAN
2.7 Bi-layer
2.8 Face2Face
Chapter 3 Proposed Method
3.1 Overall Network Architecture
3.2 Pose-Adaptive Encoder Design
3.3 Shape Recomposition Encoder Design
3.4 Shape Enhanced Transformer and Attention-Embedded Generator Design
Chapter 4 Experimental Setup and Analysis
4.1 Databases
4.1.1 Multi-LP
4.1.2 VoxCeleb1
4.1.3 VoxCeleb2-LP
4.2 Experimental Setup
4.2.1 Data Partition and Settings
4.2.2 Evaluation Metrics
4.2.3 Experiment Design
4.3 Experimental Results and Analysis
4.3.1 Comparison of Shape Enhanced Transformer Settings
4.3.2 Effect of FLAME Features
4.3.3 Comparison of Identity Loss Functions
4.3.4 Study of the Self-Attention Mechanism
4.3.5 Comparison of the Number of Fine-Tuning Images for Reenactment
4.4 Comparison with Related Work
Chapter 5 Conclusion and Future Work
Chapter 6 References

[1] Tianye Li, et al. Learning a model of facial shape and expression from 4D scans. ACM
Trans. Graph., 2017.
[2] Chen Cao, Yanlin Weng, Shun Zhou, Yiying Tong, and Kun Zhou.
Facewarehouse: A 3d facial expression database for visual computing. IEEE
Transactions on Visualization and Computer Graphics, 20(3):413–425, 2013.
[3] Jie Cao, Yibo Hu, Hongwen Zhang, Ran He, and Zhenan Sun. Towards high
fidelity face frontalization in the wild. In IJCV, 2020.
[4] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman. Vggface2: A dataset
for recognising faces across pose and age. In FG, 2018.
[5] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and
Jaegul Choo. Stargan: Unified generative adversarial networks for multi-domain
image-to-image translation. In CVPR, 2018.
[6] Yunjey Choi, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. Stargan v2: Diverse
image synthesis for multiple domains. In CVPR, 2020.
[7] Joon Son Chung, Arsha Nagrani, and Andrew Zisserman. Voxceleb2: Deep
speaker recognition. In INTERSPEECH, 2018.
[8] Deng, Jiankang, et al. "Arcface: Additive angular margin loss for deep face
recognition." Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition. 2019.
[9] Michail Christos Doukas, Stefanos Zafeiriou, and Viktoriia Sharmanska.
Headgan: One-shot neural head synthesis and editing. In IEEE/CVF
International Conference on Computer Vision (ICCV), 2021.
[10] Ralph Gross, Iain Matthews, Jeffrey Cohn, Takeo Kanade, and Simon Baker.
Multi-pie. Image and Vision Computing, 2010.
[11] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and
Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a
local nash equilibrium. In NIPS, 2017.
[12] Xun Huang and Serge Belongie. Arbitrary style transfer in real-time with
adaptive instance normalization. In ICCV, 2017.
[13] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time
style transfer and super-resolution. In European conference on computer vision,
pages 694–711. Springer, 2016.
[14] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture
for generative adversarial networks. In CVPR, 2019.
[15] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization.
arXiv preprint arXiv:1412.6980, 2014.
[16] Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. Voxceleb: a
large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612,
2017.
[17] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang,
Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer.
Automatic differentiation in pytorch. 2017.
[18] Pascal Paysan, Reinhard Knothe, Brian Amberg, Sami Romdhani, and Thomas
Vetter. A 3d face model for pose and illumination invariant face recognition. In
2009 sixth IEEE international conference on advanced video and signal based
surveillance, pages 296–301. Ieee, 2009.
[19] Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, and
Nicu Sebe. First order motion model for image animation. In NIPS, 2019.
[20] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, and Zbigniew Wojna.
Rethinking the inception architecture for computer vision. arXiv preprint
arXiv:1512.00567, 2015.
[21] Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, and Bryan
Catanzaro. Few-shot video-to-video synthesis. In Conference on Neural
Information Processing Systems (NeurIPS), 2019.
[22] Olivia Wiles, A Sophia Koepke, and Andrew Zisserman. X2face: A network for
controlling face generation using images, audio, and pose codes. In ECCV, 2018.
[23] Wayne Wu, Yunxuan Zhang, Cheng Li, Chen Qian, and Chen Change Loy.
Reenactgan: Learning to reenact faces via boundary transfer. In ECCV, 2018.
[24] Guangming Yao, Yi Yian, Tianjia Shao, and Kun Zhou. Mesh guided one-shot
face reenactment using graph convolutional networks. arXiv preprint
arXiv:2008.07783, 2020.
[25] Egor Zakharov, Aleksei Ivakhnenko, Aliaksandra Shysheya, and Victor
Lempitsky. Fast bi-layer neural synthesis of one-shot realistic head avatars. In
European Conference on Computer Vision (ECCV), August 2020.
[26] Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, and Victor Lempitsky.
Few-shot adversarial learning of realistic neural talking head models. In ICCV,
2019.
[27] Jiangning Zhang, Xianfang Zeng, Mengmeng Wang, Yusu Pan, Liang Liu, Yong
Liu, Yu Ding, and Changjie Fan. Freenet: Multi-identity face reenactment. In
CVPR, 2020.
[28] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang.
The unreasonable effectiveness of deep features as a perceptual metric. In
Proceedings of the IEEE conference on computer vision and pattern recognition,
pages 586–595, 2018.
[29] Xiangyu Zhu, Zhen Lei, Xiaoming Liu, Hailin Shi, and Stan Z Li. Face
alignment across large poses: A 3d solution. In Proceedings of the IEEE
conference on computer vision and pattern recognition, 2016.
[30] Yanbo Xu, et al. TransEditor: Transformer-based dual-space GAN for highly
controllable facial editing. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, 2022.
[31] Xueqi Hu, et al. Style transformer for image inversion and editing.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, 2022.
[32] Bowen Zhang, et al. StyleSwin: Transformer-based GAN for high-resolution
image generation. In Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, 2022.
[33] Ze Liu, et al. Swin transformer: Hierarchical vision transformer using shifted
windows. In Proceedings of the IEEE/CVF International Conference on
Computer Vision, 2021.
[34] Kewei Yang, et al. Face2Face ρ: Real-time high-resolution one-shot face
reenactment. In European Conference on Computer Vision. Cham: Springer
Nature Switzerland, 2022.
[35] Gee-Sern Hsu, Chun-Hung Tsai, and Hung-Yi Wu. Dual-generator face
reenactment. In Proceedings of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition, 2022.
[36] Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, and Andrew Zisserman.
Vggface2: A dataset for recognising faces across pose and age. In 2018 13th
IEEE International Conference on Automatic Face & Gesture Recognition (FG
2018). IEEE, 2018.
[37] Qiang Meng, et al. MagFace: A universal representation for face recognition
and quality assessment. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, 2021.
[38] Matthew Loper, et al. SMPL: A skinned multi-person linear model. ACM
Transactions on Graphics (TOG), 2015.

Full text available from: 2024/08/10 (campus network)
Full text available from: 2024/08/10 (off-campus network)
Full text available from: 2024/08/10 (National Central Library: Taiwan Thesis and Dissertation System)