Graduate student: | 吳財俊 Kittiwat Pheramunchai |
---|---|
Thesis title: | Error Level Analysis As A Guide Mask For Robust Deepfake Detection |
Advisor: | 洪西進 Shi-Jinn Horng |
Oral examination committee: | 李政吉 Cheng-Chi Lee, 楊昌彪 Chang-Biau Yang, 楊竹星 Chu-Sing Yang, 林韋宏 Wei-Hung Lin |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
Year of publication: | 2021 |
Academic year of graduation: | 109 (ROC academic year, i.e., 2020-2021) |
Language: | English |
Number of pages: | 62 |
Keywords: | Deepfake, Deepfake detection, Face manipulation, Error level analysis, ELA, Inception, Resnet, Inception-Resnet, ELA-InceptionResnet |
Recently, people have come to rely heavily on video calls for work and important meetings, which makes them more exposed to Deepfake attacks. Deepfake uses machine learning to manipulate video so convincingly that the result is almost impossible to distinguish from genuine footage with the human eye. Criminals can use Deepfake technology to manipulate such videos and mislead viewers about an individual or even a company, which is why detecting videos forged with Deepfake has become very important. Many studies have addressed Deepfake detection and achieved good results. However, Deepfake technology keeps evolving alongside AI and machine learning, and new methods for creating fake videos are released one after another. As a result, many earlier detection models become ineffective against videos produced by the newer methods, because those models never learned about these methods during training. Since there are so many ways to manipulate a video, making a detection model robust across all of them is a major challenge.
In this thesis, a custom dataset is built by combining several existing datasets, such as FaceForensics++ and Celeb-DF. Videos are drawn from each source dataset in a balanced partition to increase the diversity of face identities and creation methods, which helps the trained model achieve high robustness. This thesis also proposes a new Deepfake detection model, ELA-InceptionResnet, which combines Error Level Analysis (ELA) with the Inception-Resnet architecture. ELA guides the Inception-Resnet model toward the features that are essential for identifying manipulated videos. This guidance is needed because the input videos are produced by different manipulation methods, and the distinguishing features of each method may differ. ELA does this by weighting the feature maps: more essential features receive higher weights, and less essential features receive lower weights. In the experiments, ELA-InceptionResnet was the most robust model compared with state-of-the-art models, achieving the highest average accuracy, 91.41%, across four different test datasets. However, on an individual dataset, ELA-InceptionResnet may not surpass the accuracy of that dataset's native model, since the native model was trained specifically on it.
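The abstract describes ELA guidance only at a high level. As a minimal sketch under stated assumptions, ELA can be computed in the usual way: re-save a frame as JPEG at a fixed quality and take the absolute per-pixel difference from the original. The resulting map can then be average-pooled onto the feature-map grid and turned into spatial weights. The JPEG quality of 90, the max-over-channels reduction, the `1 + pooled` weighting, and the helper names `error_level_analysis`, `ela_guided_weights`, and `apply_guidance` are all hypothetical choices for illustration. They are not the thesis's actual implementation, which applies its weights inside the Inception-Resnet network rather than to a NumPy array. The sketch requires NumPy and Pillow.

```python
import io

import numpy as np
from PIL import Image


def error_level_analysis(img: Image.Image, quality: int = 90) -> np.ndarray:
    """Per-pixel error level of `img`, normalized to [0, 1].

    The image is re-saved as JPEG at `quality` (assumed value); the error level
    is the absolute difference between the original and the re-saved copy.
    """
    rgb = img.convert("RGB")
    buf = io.BytesIO()
    rgb.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    resaved = Image.open(buf).convert("RGB")
    # int16 avoids uint8 wrap-around when subtracting
    diff = np.abs(np.asarray(rgb, dtype=np.int16) - np.asarray(resaved, dtype=np.int16))
    ela = diff.max(axis=2).astype(np.float32)  # assumption: max over R, G, B
    peak = ela.max()
    return ela / peak if peak > 0 else ela


def ela_guided_weights(ela: np.ndarray, feat_h: int, feat_w: int) -> np.ndarray:
    """Hypothetical: average-pool the ELA map onto a feat_h x feat_w grid."""
    h, w = ela.shape
    ch, cw = h // feat_h, w // feat_w
    # Crop so the map splits evenly into feat_h x feat_w cells of size ch x cw.
    cells = ela[: ch * feat_h, : cw * feat_w].reshape(feat_h, ch, feat_w, cw)
    pooled = cells.mean(axis=(1, 3))
    # Assumed weighting: 1 + pooled, so weights lie in [1, 2] and no location is zeroed out.
    return 1.0 + pooled


def apply_guidance(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Scale a (C, H, W) feature map by (H, W) weights, shared across channels."""
    return features * weights[np.newaxis, :, :]
```

With the `1 + pooled` form, locations with no measurable error level keep their original activations, while locations with a high error level are amplified by up to a factor of two. A learned gating layer would be a natural alternative if the weighting itself should be trained.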