Error Level Analysis As A Guide Mask For Robust Deepfake Detection | National Taiwan University of Science and Technology Electronic Theses and Dissertations System


Graduate student: 吳財俊 Kittiwat Pheramunchai
Thesis title: Error Level Analysis As A Guide Mask For Robust Deepfake Detection
Advisor: 洪西進 Shi-Jinn Horng
Oral defense committee: 李政吉 Cheng-Chi Lee, 楊昌彪 Chang-Biau Yang, 楊竹星 Chu-Sing Yang, 林韋宏 Wei-Hung Lin
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: English
Number of pages: 62
Keywords: Deepfake, Deepfake detection, Face manipulation, Error level analysis, ELA, Inception, Resnet, Inception-Resnet, ELA-InceptionResnet

Video calls are now widely used for work and important meetings, which makes their participants more exposed to Deepfake attacks. Deepfake uses machine learning to manipulate video so that the result is almost impossible to distinguish with the human eye. A criminal can use Deepfake technology to manipulate such videos and mislead viewers about an individual or even a company, so distinguishing fake videos created by Deepfake has become very important. Many studies address Deepfake detection and achieve quite good results. However, Deepfake technology keeps evolving along with AI and machine learning, and new methods of creating fake videos are released one after another. Earlier detection models become ineffective against these novel methods because they were never trained on them. With so many ways to manipulate a video, the robustness of the detection model becomes a challenge.
In this thesis, a custom dataset is built by combining several existing datasets, such as FaceForensics++ and Celeb-DF, among others. The videos picked from each dataset are partitioned in a balanced way to increase the diversity of face identities and creation methods, which can lead to higher robustness when training the model. The thesis also proposes a new Deepfake detection model, ELA-InceptionResnet. It combines Error Level Analysis (ELA) with the Inception-Resnet architecture: ELA guides the Inception-Resnet model in deciding which features are essential for distinguishing manipulated videos. This matters because the input videos are manipulated with different methods, so the distinguishing features can differ from one method to another. ELA guides the model by assigning different weights to the feature maps, with higher weights for more essential features and lower weights for less essential ones. In the experiments, ELA-InceptionResnet was the most robust model compared with state-of-the-art models, reaching the highest average accuracy, 91.41%, across four different test datasets. However, ELA-InceptionResnet may not beat the accuracy of each dataset's native model, since those models are trained explicitly on that dataset.
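
The guide-mask idea in the abstract can be illustrated with a small sketch. This is not the thesis implementation, whose details the abstract does not give. It assumes ELA is computed in the usual way, by re-saving a frame as JPEG and taking the per-pixel absolute difference, that the mask is average-pooled down to the feature-map size, and that features are reweighted as `features * (1 + mask)`. The function names `ela_map`, `guide_mask`, and `apply_guide`, the JPEG quality of 90, and the weighting rule are all hypothetical choices made for illustration.

```python
import io

import numpy as np


def ela_map(frame_rgb: np.ndarray, quality: int = 90) -> np.ndarray:
    """Error Level Analysis: re-save as JPEG and measure the per-pixel change.

    frame_rgb is an (H, W, 3) uint8 array. Pillow is imported lazily so the
    mask arithmetic below can run without it. quality=90 is an assumption.
    """
    from PIL import Image

    buffer = io.BytesIO()
    Image.fromarray(frame_rgb).save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    resaved = np.asarray(Image.open(buffer).convert("RGB"), dtype=np.float32)
    diff = np.abs(frame_rgb.astype(np.float32) - resaved)
    return diff.mean(axis=2)  # (H, W) error level per pixel


def guide_mask(ela: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Average-pool an ELA map to (out_h, out_w) and scale it to [0, 1]."""
    h, w = ela.shape
    block_h, block_w = h // out_h, w // out_w
    # Crop so the map divides evenly into out_h x out_w blocks, then pool.
    cropped = ela[: block_h * out_h, : block_w * out_w]
    pooled = cropped.reshape(out_h, block_h, out_w, block_w).mean(axis=(1, 3))
    span = pooled.max() - pooled.min()
    if span == 0:
        return np.zeros_like(pooled)  # flat error level: no region stands out
    return (pooled - pooled.min()) / span


def apply_guide(features: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Reweight a (C, H, W) feature map (assumed rule: weights in [1, 2])."""
    return features * (1.0 + mask)[None, :, :]


if __name__ == "__main__":
    # Synthetic ELA map: pretend the top-left quadrant recompressed poorly.
    ela = np.zeros((8, 8), dtype=np.float32)
    ela[:4, :4] = 10.0
    mask = guide_mask(ela, 2, 2)
    weighted = apply_guide(np.ones((3, 2, 2), dtype=np.float32), mask)
    print(mask)
    print(weighted[0])
```

With this weighting rule, regions with a flat error level keep their original activations, while regions that respond strongly to recompression are amplified by up to a factor of two. A learned or multiplicative-only weighting would be an equally plausible reading of the abstract.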

Abstract
Acknowledgment
Statement of Thesis Innovation
Table of Contents
Chapter 1 Introduction
1.1 Motivations
1.2 Contributions
Chapter 2 Preliminaries and Related Works
2.1 Backgrounds
2.2 Facial manipulation methods
2.2.1 Identity swap
2.2.1.1 Deepfake
2.2.1.2 Faceswap
2.2.1.3 Faceshifter
2.2.2 Expression swap
2.2.2.1 Face2Face
2.2.2.2 Neural Texture
2.3 Detection methods
2.3.1 Mesonet
2.3.2 Xception (FaceForensics++)
2.3.3 DSP-FWA
2.4 Datasets
2.4.1 FaceForensics++ (FF++)
2.4.2 Celeb-DF v2
2.4.3 Google Deepfake Detection
2.4.4 Wild Deepfake
2.5 Error Level Analysis (ELA)
2.6 Inception-Resnet
2.6.1 Inception
2.6.2 Resnet
2.7 Facenet-Pytorch MTCNN
Chapter 3 Proposed Method
3.1 Dataset
3.2 ELA + Inception-Resnet
3.2.1 Guide mask
3.2.2 ELA-InceptionResnet model architecture
Chapter 4 Experimental Results
4.1 Prototype version
4.2 Experiment environment
4.3 Training and testing result
4.4 Performance comparison
4.5 Discussion
Chapter 5 Further Improvements and Conclusion
5.1 Further improvements
5.2 Conclusion



Figure 1 Types of Deepfake videos
Figure 2 Example of Donald Trump fake video
Figure 3 Types of facial manipulation method
Figure 4 Example of identity swap
Figure 5 Example of basic Autoencoder architecture
Figure 6 Traditional Deepfake architecture
Figure 7 Example of Blendshape
Figure 8 Example of U-net Architecture
Figure 9 Example of HEAR-NET architecture
Figure 10 Example of expression swap
Figure 11 Example of Face2Face
Figure 12 Example of Laplacian Pyramid
Figure 13 Example of Deferred Neural Rendering architecture
Figure 14 Example of Neural texture face manipulation pipeline
Figure 15 Example of Mesonet architecture
Figure 16 Example of MesoInception architecture
Figure 17 Example of Depth-wise convolution
Figure 18 Example of Point-wise convolution
Figure 19 Example of Xception architecture
Figure 20 Example of DSP-FWA architecture
Figure 21 Example of Faceforensics++ videos
Figure 22 Example of Celeb-DF v2 videos
Figure 23 Example of Google Deepfake Detection videos
Figure 24 Example of DCT coefficients
Figure 25 Concept of Error Level Analysis
Figure 26 Example of Inception Module
Figure 27 Example of Residual block and the shortcut
Figure 28 Example of Inception-ResnetV2 architecture
Figure 29 Example of Stem architecture
Figure 30 Example of Inception-Resnet module A
Figure 31 Example of Inception-Resnet module B
Figure 32 Example of Inception-Resnet module C
Figure 33 Example of Reduction block A
Figure 34 Example of Reduction block B
Figure 35 Example of similar class from dog breed classification
Figure 36 Example of MTCNN pipeline
Figure 37 Performance comparison for face extraction task
Figure 38 Example of partition process
Figure 39 Example of applying ELA on Deepfake video
Figure 40 Example of Guide mask
Figure 41 Guide mask implementation in ELA-InceptionResnet
Figure 42 ELA-InceptionResnet Full Architecture

 
Table 1 Recent existing Deepfake dataset
Table 2 Custom dataset Information
Table 3 Proportion of training, validating and testing set
Table 4 Testing result
Table 5 Performance Comparison


                                

