
Student: 吳財俊 (Kittiwat Pheramunchai)
Thesis title: Error Level Analysis As A Guide Mask For Robust Deepfake Detection
Advisor: 洪西進 (Shi-Jinn Horng)
Oral examination committee: 李政吉 (Cheng-Chi Lee), 楊昌彪 (Chang-Biau Yang), 楊竹星 (Chu-Sing Yang), 林韋宏 (Wei-Hung Lin)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar, i.e., 2020-2021)
Language: English
Number of pages: 62
Keywords: Deepfake, Deepfake detection, Face manipulation, Error level analysis, ELA, Inception, Resnet, Inception-Resnet, ELA-InceptionResnet
    Recently, people have come to rely heavily on video calls for work and important meetings, which makes them more exposed to Deepfake attacks. Deepfake uses machine learning to manipulate videos so convincingly that the results are almost impossible to tell apart from real footage with the human eye. Criminals can use Deepfake technology to manipulate such videos and mislead viewers about an individual or even a company, which is why detecting videos faked with Deepfake has become very important. Many studies address Deepfake detection and achieve good results. However, Deepfake technology keeps evolving along with AI and machine learning, and new methods for creating fake videos are released one after another. As a result, earlier detection models become ineffective against videos produced by these novel methods, because they never learned those methods during training. Since there are so many ways to manipulate a video, making a detection model robust has become a key challenge.
    In this thesis, a custom dataset is built by combining several existing datasets, such as FaceForensics++ and Celeb-DF. The videos drawn from each dataset are also partitioned in a balanced way to increase the diversity of face identities and creation methods, which helps the trained model achieve high robustness. Moreover, this thesis proposes a new Deepfake detection model, ELA-InceptionResnet, which combines Error Level Analysis (ELA) with the Inception-Resnet architecture. ELA guides the Inception-Resnet model toward the features that are essential for identifying manipulated videos; this guidance is needed because the input videos are produced by different manipulation methods, and the distinguishing features of each method may differ. ELA guides the model by weighting the feature maps: more essential features receive higher weights, and less essential ones receive lower weights. In the experiments, ELA-InceptionResnet was the most robust model compared with state-of-the-art models, achieving the highest average accuracy, 91.41%, across four different test datasets. However, ELA-InceptionResnet may not surpass the accuracy of each dataset's native model on that dataset, since those models are trained specifically on it.
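The ELA step and the guide-mask weighting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes Pillow and NumPy, a JPEG re-save quality of 90, collapsing the RGB difference to one channel with a per-pixel maximum, and a plain element-wise product between the resized mask and every channel of a (C, h, w) feature map. The function names `error_level_analysis` and `apply_guide_mask` are hypothetical, and the actual ELA-InceptionResnet wiring may differ.

```python
import io

import numpy as np
from PIL import Image, ImageChops


def error_level_analysis(img: Image.Image, quality: int = 90) -> np.ndarray:
    """Return a single-channel ELA map scaled to [0, 1] with shape (H, W).

    The image is re-saved as JPEG at a fixed quality and compared with the
    original; regions with a different compression history tend to show a
    different error level in the absolute difference.
    """
    rgb = img.convert("RGB")
    buf = io.BytesIO()
    rgb.save(buf, format="JPEG", quality=quality)  # quality=90 is an assumption
    buf.seek(0)
    resaved = Image.open(buf).convert("RGB")
    diff = np.asarray(ImageChops.difference(rgb, resaved), dtype=np.float32)
    ela = diff.max(axis=2)  # assumption: collapse RGB with a per-pixel max
    peak = float(ela.max())
    return ela / peak if peak > 0 else ela


def apply_guide_mask(features: np.ndarray, ela: np.ndarray) -> np.ndarray:
    """Weight every channel of a (C, h, w) feature map with the ELA map.

    The ELA map is resized to (h, w) so that pixels with a higher error level
    give the matching feature positions a higher weight (plain element-wise
    product; a hypothetical choice, not confirmed by the thesis).
    """
    _, h, w = features.shape
    mask_img = Image.fromarray((ela * 255).astype(np.uint8))  # mode "L"
    mask = np.asarray(mask_img.resize((w, h)), dtype=np.float32) / 255.0
    return features * mask[None, :, :]


# Usage on a synthetic 64x48 RGB frame standing in for a cropped face.
rng = np.random.default_rng(0)
frame = Image.fromarray(rng.integers(0, 256, (64, 48, 3), dtype=np.uint8))
ela_map = error_level_analysis(frame)
weighted = apply_guide_mask(np.ones((8, 16, 12), dtype=np.float32), ela_map)
```

In a full model the mask would be applied to intermediate Inception-Resnet feature maps rather than to a tensor of ones. A residual form such as `features * (1 + mask)` is a common alternative design, since it avoids suppressing features entirely where the error level is close to zero.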



    Abstract
    Acknowledgment
    Statement on the Innovativeness of the Thesis
    Table of Contents
    Chapter 1 Introduction
    1.1 Motivations
    1.2 Contributions
    Chapter 2 Preliminaries and Related Works
    2.1 Backgrounds
    2.2 Facial manipulation methods
    2.2.1 Identity swap
    2.2.1.1 Deepfake
    2.2.1.2 Faceswap
    2.2.1.3 Faceshifter
    2.2.2 Expression swap
    2.2.2.1 Face2Face
    2.2.2.2 Neural Texture
    2.3 Detection methods
    2.3.1 Mesonet
    2.3.2 Xception (FaceForensics++)
    2.3.3 DSP-FWA
    2.4 Datasets
    2.4.1 FaceForensics++ (FF++)
    2.4.2 Celeb-DF v2
    2.4.3 Google Deepfake Detection
    2.4.4 Wild Deepfake
    2.5 Error Level Analysis (ELA)
    2.6 Inception-Resnet
    2.6.1 Inception
    2.6.2 Resnet
    2.7 Facenet-Pytorch MTCNN
    Chapter 3 Proposed Method
    3.1 Dataset
    3.2 ELA + Inception-Resnet
    3.2.1 Guide mask
    3.2.2 ELA-InceptionResnet model architecture
    Chapter 4 Experimental Results
    4.1 Prototype version
    4.2 Experiment environment
    4.3 Training and testing result
    4.4 Performance comparison
    4.5 Discussion
    Chapter 5 Further Improvements and Conclusion
    5.1 Further improvements
    5.2 Conclusion

    Figure 1 Types of Deepfake videos
    Figure 2 Example of Donald Trump fake video
    Figure 3 Types of facial manipulation method
    Figure 4 Example of identity swap
    Figure 5 Example of basic Autoencoder architecture
    Figure 6 Traditional Deepfake architecture
    Figure 7 Example of Blendshape
    Figure 8 Example of U-net Architecture
    Figure 9 Example of HEAR-NET architecture
    Figure 10 Example of expression swap
    Figure 11 Example of Face2Face
    Figure 12 Example of Laplacian Pyramid
    Figure 13 Example of Deferred Neural Rendering architecture
    Figure 14 Example of Neural texture face manipulation pipeline
    Figure 15 Example of Mesonet architecture
    Figure 16 Example of MesoInception architecture
    Figure 17 Example of Depth-wise convolution
    Figure 18 Example of Point-wise convolution
    Figure 19 Example of Xception architecture
    Figure 20 Example of DSP-FWA architecture
    Figure 21 Example of Faceforensics++ videos
    Figure 22 Example of Celeb-DF v2 videos
    Figure 23 Example of Google Deepfake Detection videos
    Figure 24 Example of DCT coefficients
    Figure 25 Concept of Error Level Analysis
    Figure 26 Example of Inception Module
    Figure 27 Example of Residual block and the shortcut
    Figure 28 Example of Inception-ResnetV2 architecture
    Figure 29 Example of Stem architecture
    Figure 30 Example of Inception-Resnet module A
    Figure 31 Example of Inception-Resnet module B
    Figure 32 Example of Inception-Resnet module C
    Figure 33 Example of Reduction block A
    Figure 34 Example of Reduction block B
    Figure 35 Example of similar class from dog breed classification
    Figure 36 Example of MTCNN pipeline
    Figure 37 Performance comparison for face extraction task
    Figure 38 Example of partition process
    Figure 39 Example of applying ELA on Deepfake video
    Figure 40 Example of Guide mask
    Figure 41 Guide mask implementation in ELA-InceptionResnet
    Figure 42 ELA-InceptionResnet Full Architecture

    Table 1 Recent existing Deepfake dataset
    Table 2 Custom dataset Information
    Table 3 Proportion of training, validating and testing set
    Table 4 Testing result
    Table 5 Performance Comparison

