
Graduate Student: Yi-Xiang Luo (羅奕翔)
Thesis Title: Dual Attention Neural Network Approaches to Face Forgery Video Detection (雙重注意力神經網路技術於人臉偽造影片檢測之應用)
Advisor: Jiann-Liang Chen (陳俊良)
Committee Members: 郭耀煌, 廖婉君, 孫雅麗, 黎碧煌
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2022
Graduation Academic Year: 110
Language: English
Pages: 54
Chinese Keywords: Deepfake images, computer vision, deep learning, convolutional neural networks, attention mechanism
Foreign Keywords: Deepfake, CV, Deep learning, CNN, Attention
With the maturation of deep learning technology, Deepfake has been widely applied in video editing, creating a new audio-visual environment. Most videos produced with this technique are malicious content, such as pornographic face swaps and fabricated news videos. Their circulation on the Internet not only raises serious information security problems but also casts doubt on whether video evidence presented in court has been tampered with, seriously undermining Internet users' confidence in information. Over the past two years, with the world under the shadow of the COVID-19 pandemic, many people have been forced to isolate at home, and dependence on social media and on information spread online has risen sharply. Detecting whether videos have been forged has therefore received more attention than ever, making Deepfake detection a prominent research topic in recent years.
    The goal of Deepfake video detection is to capture tampering traces in fake videos. This study uses a convolutional neural network (CNN) to extract traces of face tampering and proposes a Dual Attention for Forgery Detection Network (DAFDN), which inserts a Spatial Reduction Attention Block (SRAB) and a Forgery Feature Attention Module (FFAM) between the blocks of the backbone network. These two modules strengthen the network's attention to the forged regions of an image. Furthermore, to embed the two attention modules into the backbone network effectively, this study analyzes the generalization and attention capabilities of variants with different module combinations and adopts the best-performing variant as the proposed DAFDN model.
    To evaluate the detection performance of the architecture, this study uses two public datasets, DFDC and FaceForensics++, to compare the proposed DAFDN with other methods, and visualizes the attention maps of XceptionNet and DAFDN. The results show that DAFDN performs well on both datasets, achieving AUC values of 0.911 on DFDC and 0.945 on FaceForensics++, outperforming other methods such as XceptionNet and Ensemble EfficientNet.
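
The attention maps mentioned above are not further specified here; a common way to produce such maps is Grad-CAM-style weighting of a convolutional layer's features. The sketch below is a generic variant under that assumption, not necessarily the procedure used in the thesis.

```python
# Generic Grad-CAM-style map (an assumption; the thesis's exact visualization
# procedure is not specified in this abstract).
import torch
import torch.nn.functional as F

def attention_map(model, feature_layer, image):
    """image: tensor of shape (1, 3, H, W); returns an (H, W) map in [0, 1]."""
    feats, grads = {}, {}
    h1 = feature_layer.register_forward_hook(
        lambda m, i, o: feats.update(a=o))
    h2 = feature_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(a=go[0]))
    model.zero_grad()
    model(image).sum().backward()                    # gradient of the logit
    h1.remove(); h2.remove()
    w = grads["a"].mean(dim=(2, 3), keepdim=True)    # channel importance
    cam = F.relu((w * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    return (cam / cam.max().clamp(min=1e-8)).squeeze()
```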


    With the maturity of deep learning technology, Deepfake is widely used in video editing to create a new audio-visual environment. Most of the videos created using this technology are malicious content, e.g., pornographic face swaps, fake news videos, etc. The circulation of these videos on the Internet poses serious information security concerns and raises doubts about whether images used as evidence in court have been tampered with, which can seriously affect the confidence of Internet users in information. In the past two years, as the COVID-19 pandemic has engulfed the world and many people have been forced to isolate at home, reliance on social media and on information disseminated over the Internet has increased. Detecting video forgery has become more critical than ever, and deepfake detection technology has therefore become an issue of great interest in recent years.

    Deepfake video detection aims to capture tampering traces in fake videos. This study uses a convolutional neural network (CNN) to extract traces of face tampering and proposes a Dual Attention for Forgery Detection Network (DAFDN), which applies a spatial reduction attention block (SRAB) and a forgery feature attention module (FFAM) between the blocks of the backbone network. DAFDN embeds the two proposed attention mechanisms so that the network can extract the peculiar traces left by the warping of images. In addition, variants with different module combinations are analyzed for generalization and attention ability in order to embed the two attention modules into the backbone network effectively. The best-performing variant is adopted as the DAFDN model proposed in this study.
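
The abstract names the two modules and where they sit, but not their internals. As a minimal PyTorch sketch, assuming placeholder internals for SRAB (channel gating after spatial pooling) and FFAM (a single-channel spatial mask), the wiring below shows the one detail the abstract does give: attention modules inserted between backbone blocks, feeding a single real/fake logit. All module bodies and names here are hypothetical, not the thesis implementation.

```python
# Minimal sketch, NOT the thesis implementation: SRAB/FFAM internals are
# placeholders; only the "attention between backbone blocks" wiring follows
# the abstract.
import torch
import torch.nn as nn

class SpatialReductionAttention(nn.Module):
    """Hypothetical SRAB stand-in: squeeze spatially, then gate channels."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # spatial reduction
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(self.pool(x))               # re-weight channels

class ForgeryFeatureAttention(nn.Module):
    """Hypothetical FFAM stand-in: spatial mask over candidate forged regions."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=7, padding=3)

    def forward(self, x):
        return x * torch.sigmoid(self.conv(x))           # emphasize regions

class DAFDNSketch(nn.Module):
    """Backbone blocks with the two attention modules inserted between them."""
    def __init__(self, blocks, channels_per_block):
        super().__init__()
        stages = []
        for block, ch in zip(blocks, channels_per_block):
            stages += [block,
                       SpatialReductionAttention(ch),
                       ForgeryFeatureAttention(ch)]
        self.features = nn.Sequential(*stages)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.LazyLinear(1))      # real/fake logit

    def forward(self, x):
        return self.head(self.features(x))

# Toy usage with two stand-in backbone stages
blocks = [nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU()),
          nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())]
model = DAFDNSketch(blocks, channels_per_block=[32, 64])
print(model(torch.randn(1, 3, 224, 224)).shape)          # torch.Size([1, 1])
```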

    This study uses two public datasets, DFDC and FaceForensics++, to compare the performance of the proposed DAFDN with that of existing methods. The results show that DAFDN achieves AUC values of 0.911 and 0.945 on DFDC and FaceForensics++, respectively. These results surpass previous methods such as XceptionNet and ensemble EfficientNet approaches.
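
For reference, the AUC values above can be reproduced with a standard library routine once per-video scores exist; a minimal sketch, assuming per-frame fake probabilities are averaged into one score per video (the aggregation actually used is not stated in this abstract):

```python
# Minimal AUC sketch; mean-over-frames aggregation is an assumption.
import numpy as np
from sklearn.metrics import roc_auc_score

def video_level_auc(frame_scores_per_video, video_labels):
    """frame_scores_per_video: list of 1-D arrays of per-frame fake scores.
    video_labels: 1 = fake, 0 = real."""
    video_scores = [float(np.mean(s)) for s in frame_scores_per_video]
    return roc_auc_score(video_labels, video_scores)

# Toy example: one real, one fake video -> perfect separation, AUC = 1.0
print(video_level_auc([np.array([0.1, 0.2]), np.array([0.8, 0.9])], [0, 1]))
```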

    Table of Contents
    Abstract (Chinese)
    Abstract
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Motivation
        1.2 Contributions
        1.3 Organization
    Chapter 2 Related Work
        2.1 Deepfake Detection
            2.1.1 Warping Traces Detection
            2.1.2 Visual Artifacts Detection
            2.1.3 Biological Signals Detection
        2.2 CNNs
            2.2.1 Introduction of CNN
            2.2.2 VGG
            2.2.3 ResNet
            2.2.4 XceptionNet
            2.2.5 EfficientNet
        2.3 Attention
    Chapter 3 Methods
        3.1 Forgery Attention Module
            3.1.1 Spatial Reduction Attention Block (SRAB)
            3.1.2 Forgery Feature Attention Module (FFAM)
        3.2 Dual Attention Forgery Detection Network
    Chapter 4 Performance Analysis
        4.1 Dataset
        4.2 Data Preprocessing
        4.3 Experimental Environment
        4.4 Implementation Details
        4.5 Classification Performance
    Chapter 5 Conclusions and Future Works
        5.1 Conclusions
        5.2 Future Works
            5.2.1 Attention Mechanism Optimization
            5.2.2 Enhance Generalization Capability
            5.2.3 Deepfake Video Voice Detection
    References

