
Graduate Student: Po-Han Huang (黃柏翰)
Thesis Title: Multi-Task Self-Blended Images for Face Forgery Detection (多任務自混合圖像之人臉偽造辨識模型)
Advisor: Kai-Lung Hua (花凱龍)
Committee Members: Jun-Cheng Chen (陳駿丞), Yi-Hui Chen (陳宜惠)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2023
Academic Year of Graduation: 111 (2022–2023)
Language: English
Number of Pages: 50
Chinese Keywords: 深偽檢測、自監督式訓練、多任務學習、人臉偽造檢測
Foreign Keywords: Deepfake Detection, Self-Supervised Learning, Multi-Task Learning, Face Forgery Detection

Deepfake detection has drawn attention because of the forged images circulating widely on social media. However, most current methods based on supervised learning generalize poorly to forgery techniques unseen during training. Recently, self-supervised learning (SSL) has shown better generalization than supervised learning for Deepfake detection. We observe, however, that most SSL methods train with only a single label, treating every blended image as the fake class and ignoring the differences in difficulty that come with different blending parameters. We argue that this limits the gains SSL can bring to Deepfake detection models. To address this problem, we strengthen the current state-of-the-art SSL method by introducing several auxiliary losses based on different sub-tasks: we obtain each blended image's synthesis parameters from the method's blending pipeline and group them to derive sub-task classification labels. Our method shows clear improvements in generalization across all metrics. Specifically, in cross-dataset evaluation, the proposed method outperforms the state of the art in AUC on various datasets, improving by 3.4%, 1.47%, 1.56%, and 1.3% on the CDF, DFDC, DFDCP, and FFIW datasets, respectively, and by 1.04% over the baseline method on the DFD dataset.
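The derivation of sub-task labels from recorded blending parameters can be pictured with a short sketch. The snippet below is a minimal illustration under stated assumptions, not the thesis's released implementation: it assumes the blending pipeline records one blending ratio per synthesized image and uses k-means to group those ratios into ordered class labels; the helper name `subtask_labels`, the number of groups, and the choice of k-means are all hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def subtask_labels(blend_ratios, n_groups=4, seed=0):
    """Group recorded blending ratios into discrete sub-task labels.
    Hypothetical helper: the thesis's exact grouping scheme may differ."""
    ratios = np.asarray(blend_ratios, dtype=np.float32).reshape(-1, 1)
    km = KMeans(n_clusters=n_groups, n_init=10, random_state=seed).fit(ratios)
    # Re-index clusters by ascending center so labels are ordered by ratio.
    order = np.argsort(km.cluster_centers_.ravel())
    remap = np.empty_like(order)
    remap[order] = np.arange(n_groups)
    return remap[km.labels_]

# Example: six images blended at different opacities map to two ordered groups.
print(subtask_labels([0.25, 0.5, 0.75, 1.0, 0.3, 0.9], n_groups=2))
```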


Deepfake detection has attracted extensive attention due to the widespread circulation of forged images on social media. Recently, self-supervised learning (SSL) for Deepfake detection has outperformed supervised methods in terms of model generalization. However, we notice that most SSL methods adopt only a hard-label paradigm, treating every synthesized image as the fake class and failing to consider its difficulty level, which varies with the synthesis parameters. This usually results in suboptimal performance for face forgery detection. To address this issue, we strengthen the current state-of-the-art SSL method by introducing several auxiliary losses based on different sub-tasks that infer the synthesis parameters used in data generation, where the ground-truth labels are obtained from the synthesis pipeline. Comprehensive evaluations of our approach demonstrate a noticeable improvement in generalization. Specifically, in the inter-dataset evaluation, the proposed approach outperforms the state-of-the-art method in terms of AUC on various datasets, with improvements of 3.4%, 1.47%, 1.56%, and 1.3% on the CDF, DFDC, DFDCP, and FFIW datasets, and outperforms the baseline method by 1.04% on the DFD dataset.
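Concretely, this kind of auxiliary supervision can be realized as extra classification heads that share the backbone feature with the main real/fake head and are trained jointly with down-weighted losses. The following PyTorch sketch is an assumption-laden illustration rather than the thesis's implementation: the head names, the feature dimension (1792, matching an EfficientNet-B4 backbone of the kind used by the self-blended-images baseline), the sub-task class counts, and `aux_weight` are all hypothetical.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHead(nn.Module):
    """Real/fake head plus auxiliary sub-task heads over a shared feature.
    Layer sizes and class counts are assumptions, not the thesis's values."""
    def __init__(self, feat_dim=1792, n_manip=4, n_ratio=4):
        super().__init__()
        self.fake = nn.Linear(feat_dim, 1)         # main task: real vs. fake
        self.manip = nn.Linear(feat_dim, n_manip)  # sub-task: manipulation type
        self.ratio = nn.Linear(feat_dim, n_ratio)  # sub-task: blending-ratio group

    def forward(self, feat):
        return self.fake(feat).squeeze(1), self.manip(feat), self.ratio(feat)

def multi_task_loss(outputs, targets, aux_weight=0.1):
    """Main binary loss plus down-weighted auxiliary cross-entropy losses."""
    fake_logit, manip_logit, ratio_logit = outputs
    y_fake, y_manip, y_ratio = targets
    loss = F.binary_cross_entropy_with_logits(fake_logit, y_fake.float())
    loss = loss + aux_weight * F.cross_entropy(manip_logit, y_manip)
    loss = loss + aux_weight * F.cross_entropy(ratio_logit, y_ratio)
    return loss
```

Down-weighting the auxiliary terms keeps the real/fake objective dominant during training, and the sub-task heads can simply be discarded at inference time.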

Contents

論文摘要 (Abstract in Chinese) . . . I
Abstract . . . II
Acknowledgement . . . III
Contents . . . IV
List of Figures . . . VI
List of Tables . . . VIII
1 Introduction . . . 1
2 Related Work . . . 7
  2.1 Deepfake Generation . . . 7
  2.2 Deepfake Detection . . . 8
3 Proposed Method . . . 13
  3.1 Supervision on Manipulations . . . 14
  3.2 Supervision on Blending Ratio . . . 16
  3.3 Loss Functions . . . 16
4 Experiments . . . 18
  4.1 Evaluation Datasets . . . 18
  4.2 Implementation Details . . . 18
  4.3 Evaluation Metric . . . 19
  4.4 Intra-Dataset Evaluation . . . 20
  4.5 Inter-Dataset Evaluation . . . 20
  4.6 Ablation Study . . . 27
5 Conclusions . . . 29
References . . . 30


Full text release date: 2025/08/09 (campus network)
Full text release date: 2025/08/09 (off-campus network)
Full text release date: 2025/08/09 (National Central Library: Taiwan NDLTD system)