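Student: Yi-Xiang Luo (羅奕翔)
Thesis title: Dual Attention Neural Network Approaches to Face Forgery Video Detection (雙重注意力神經網路技術於人臉偽造影片檢測之應用)
Advisor: Jiann-Liang Chen (陳俊良)
Oral examination committee: 郭耀煌
Degree: Master's
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication year: 2022
Graduation academic year: 110 (ROC calendar, i.e., 2021–2022)
Language: English
Number of pages: 54
Keywords (Chinese): deepfake imagery, computer vision, deep learning, convolutional neural network, attention mechanism
Keywords (English): Deepfake, CV, Deep learning, CNN, Attention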
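The goal of deepfake video detection is to capture tampering traces in fake videos. In this study, a convolutional neural network (CNN) is used to extract traces of face tampering, and a Dual Attention for Forgery Detection Network (DAFDN) is proposed. DAFDN applies a Spatial Reduction Attention Block (SRAB) and a Forgery Feature Attention Module (FFAM) between the blocks of the backbone network; these two modules strengthen the network's attention to the forged regions of an image. In addition, to embed the two attention modules into the backbone network effectively, several variants with different module combinations are analyzed for generalization ability and attention ability, and the best-performing variant is adopted as the proposed DAFDN model.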
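To evaluate the detection performance of the architecture, this study uses two public test datasets, DFDC and FaceForensics++, to compare the proposed DAFDN with other methods, and visualizes the attention maps of XceptionNet and DAFDN. The results show that the proposed DAFDN performs well on both datasets, achieving AUC values of 0.911 on DFDC and 0.945 on FaceForensics++. DAFDN outperforms other methods such as XceptionNet and Ensemble EfficientNet.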

With the maturity of deep learning technology, Deepfake techniques are widely used in video editing to create new audio-visual content. Most videos created with this technology are malicious, e.g., pornographic face swaps and fake news videos. The circulation of these videos on the Internet raises serious information security concerns and casts doubt on whether images used as evidence in court have been tampered with, which can seriously undermine Internet users' trust in information. In the past two years, as the COVID-19 pandemic has swept the world and many people have been forced to isolate at home, reliance on social media and on information disseminated over the Internet has increased. Detecting video forgery has become more critical than ever, and deepfake detection has therefore become a topic of great interest in recent years.

Deepfake video detection aims to capture tampering traces in fake videos. This study uses a convolutional neural network (CNN) to extract traces of face tampering and proposes a Dual Attention for Forgery Detection Network (DAFDN), which inserts a spatial reduction attention block (SRAB) and a forgery feature attention module (FFAM) between the blocks of the backbone network. By embedding these two attention mechanisms, DAFDN enables the network to extract the peculiar traces left by image warping. In addition, to embed the two attention modules into the backbone network effectively, variants with different module combinations are compared in terms of generalization ability and attention ability, and the best-performing variant is adopted as the proposed DAFDN model.
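The abstract does not describe the internals of SRAB. As a rough illustration only, the sketch below assumes SRAB resembles the spatial-reduction attention popularized by the Pyramid Vision Transformer (PVT): queries come from every position of a feature map, while keys and values come from a copy downsampled by a ratio `R`, which shrinks the attention matrix by a factor of `R²`. It is a single-head NumPy forward pass with random weights, not the thesis's implementation; the function name, weight matrices, and residual connection are hypothetical choices.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_reduction_attention(feat, wq, wk, wv, wr, ratio):
    """Hypothetical single-head spatial-reduction attention on an (H, W, C) map.

    Queries use all H*W positions; keys/values use a map downsampled by
    `ratio`, so the attention matrix is (H*W) x (H*W / ratio**2).
    """
    h, w, c = feat.shape
    assert h % ratio == 0 and w % ratio == 0, "H and W must be divisible by ratio"
    tokens = feat.reshape(h * w, c)                       # (N, C) full-resolution tokens
    # Spatial reduction: gather each ratio x ratio patch into one vector,
    # then project the concatenated patch (ratio*ratio*C) back to C channels.
    patches = feat.reshape(h // ratio, ratio, w // ratio, ratio, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, ratio * ratio * c)
    reduced = patches @ wr                                # (N / ratio^2, C)
    q = tokens @ wq
    k = reduced @ wk
    v = reduced @ wv
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)         # (N, N / ratio^2)
    out = tokens + attn @ v                               # residual connection
    return out.reshape(h, w, c), attn


# Usage with random weights on an 8x8 map with 16 channels and ratio 2.
rng = np.random.default_rng(0)
H, W, C, R = 8, 8, 16, 2
feat = rng.standard_normal((H, W, C))
wq, wk, wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
wr = rng.standard_normal((R * R * C, C)) * 0.1
out, attn = spatial_reduction_attention(feat, wq, wk, wv, wr, R)
print(out.shape, attn.shape)  # (8, 8, 16) (64, 16)
```

Because the output keeps the input's shape, a block like this can sit between two backbone stages without changing the rest of the network, which is consistent with the abstract's description of modules placed between backbone blocks.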

This study uses two public datasets, DFDC and FaceForensics++, to compare the performance of the proposed DAFDN with existing methods. The results show that DAFDN achieves AUC values of 0.911 on DFDC and 0.945 on FaceForensics++, outperforming previous methods such as XceptionNet and EfficientNet-based methods.
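The AUC figures above summarize how well a detector's scores separate fake from real samples. A minimal sketch of the metric, assuming label 1 means fake and a higher score means "more likely fake": ROC AUC equals the probability that a randomly chosen fake sample scores higher than a randomly chosen real one, with ties counted as half (the Mann-Whitney formulation). The function name and the toy scores are illustrative only; they are not data from this thesis, and whether the thesis scores frames or whole videos is not stated here.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison (Mann-Whitney U formulation).

    labels: 1 for fake, 0 for real; scores: higher means more likely fake.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one real and one fake sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # fake ranked above real
            elif p == n:
                wins += 0.5      # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 (fake, real) pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic and meant only to make the definition concrete; for large evaluation sets a sort-based rank computation gives the same value faster.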

Table of contents:
Chinese Abstract
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Motivation
  1.2 Contributions
  1.3 Organization
Chapter 2 Related Work
  2.1 Deepfake Detection
    2.1.1 Warping Traces Detection
    2.1.2 Visual Artifacts Detection
    2.1.3 Biological Signals Detection
  2.2 CNNs
    2.2.1 Introduction of CNN
    2.2.2 VGG
    2.2.3 ResNet
    2.2.4 XceptionNet
    2.2.5 EfficientNet
  2.3 Attention
Chapter 3 Methods
  3.1 Forgery Attention Module
    3.1.1 Spatial Reduction Attention Block (SRAB)
    3.1.2 Forgery Feature Attention Module (FFAM)
  3.2 Dual Attention Forgery Detection Network
Chapter 4 Performance Analysis
  4.1 Dataset
  4.2 Data Preprocessing
  4.3 Experimental Environment
  4.4 Implementation Details
  4.5 Classification Performance
Chapter 5 Conclusions and Future Works
  5.1 Conclusions
  5.2 Future Works
    5.2.1 Attention Mechanism Optimization
    5.2.2 Enhance Generalization Capability
    5.2.3 Deepfake Video Voice Detection
References

