
Graduate Student: 羅濟威 (Chi-Wei Lo)
Thesis Title: 基於人臉辨識加人臉對齊及數據增強強化後的深度造假檢測器 (Deepfake Detector based on Face Recognition Enhanced with Face Alignment and Data Augmentation)
Advisors: 李漢銘 (Hahn-Ming Lee), 鄭欣明 (Shin-Ming Cheng)
Committee Members: 李漢銘 (Hahn-Ming Lee), 鄭欣明 (Shin-Ming Cheng), 毛敬豪 (Ching-Hao Mao), 鄧惟中 (Wei-Chung Teng), 林豐澤 (Feng-Tse Lin)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2020
Graduation Academic Year: 108 (ROC calendar, i.e., 2019–2020)
Language: English
Number of Pages: 86
Chinese Keywords: 深度造假檢測 (deepfake detection), 人臉辨識 (face recognition), 人臉對齊 (face alignment)
Foreign Keywords: deepfake detection, face recognition, face alignment
Access Counts: Views: 315, Downloads: 16

As GAN techniques have matured, manual inspection can no longer reliably distinguish manipulated videos from real ones. Once an attacker uses a deepfake attack to tamper with the face of a person in a video, replacing it with the identity of a politically influential figure, and publishes the manipulated video on social media, it can inflict enormous damage on a country; defenses for detecting deepfakes have therefore been proposed in recent years. However, the accuracy of these deepfake detectors is not high, because almost all existing detection methods simply extract the faces from a video and arbitrarily resize them to directly train a binary classifier, overlooking the fact that arbitrary resizing distorts the original face images and thereby loses important deepfake features. Moreover, these faces appear at many different angles; without special handling, the classifier cannot easily learn deepfake features precisely. To improve the accuracy of existing deepfake detection, we propose a face alignment method whose goal is to preserve the original facial features while resizing the faces and to align those features to the same region. The results of this study show that, compared with existing face-based deepfake detection methods, we obtain 93.25% accuracy, 99.00% recall, and 99.10% AUC. The main contributions of this study are as follows: (1) proposing a new framework based on face recognition enhanced with face alignment that improves existing deepfake detection; (2) proposing a method for generating a deepfake training dataset; (3) building an automatic deepfake detection system.


Recently, deepfakes have received a great deal of attention, since they can convincingly replace the face in a video with that of a politically influential person without being discovered. Once such a fake video is published on social media, it can cause severe negative effects. Although many deepfake detection methods have been proposed to mitigate this threat, their accuracy is not high enough, because they simply crop the targeted faces and arbitrarily resize them to train a binary classifier; the problems of face distortion and of the varying angles of the cropped faces are not well considered. In this thesis, we propose a face alignment method that keeps the original features of the faces and aligns those features to a specific position. Moreover, we propose a method to generate a reliable deepfake training dataset. The experimental results of the developed automatic deepfake detection system show that the alignment-based approach achieves a 93.25% accuracy rate, a 99.00% recall rate, and a 99.10% AUC, which outperforms existing face-based deepfake detection methods. The main contributions of this study are as follows: (1) proposing a new framework based on face recognition enhanced with face alignment to improve existing deepfake detection; (2) proposing a method to generate a deepfake training dataset; (3) building an automatic deepfake detection system.
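The abstract does not spell out the alignment pipeline, but its idea (rotate, crop, and resize each face so its features land at a fixed position, rather than scaling crops arbitrarily) can be illustrated with a short sketch. The snippet below is a minimal reconstruction assuming 68-point dlib-style facial landmarks are already available; the function name `align_face`, the 1.2 margin factor, and the 224-pixel output size are illustrative assumptions, not the thesis's actual parameters.

```python
import cv2
import numpy as np

def align_face(image, landmarks, output_size=224):
    """Rotate, crop, and resize a face so that the eye line is
    horizontal and lands at a fixed position in the output."""
    pts = np.asarray(landmarks, dtype=np.float64)
    # dlib's 68-point convention: 36-41 = image-left eye, 42-47 = image-right eye.
    left_eye = pts[36:42].mean(axis=0)
    right_eye = pts[42:48].mean(axis=0)

    # Angle of the line through both eye centers; rotating the frame
    # by this angle levels the eyes.
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))
    cx = float((left_eye[0] + right_eye[0]) / 2.0)
    cy = float((left_eye[1] + right_eye[1]) / 2.0)

    rot = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
    rotated = cv2.warpAffine(image, rot, (image.shape[1], image.shape[0]))

    # Crop a square box around the eye midpoint (1.2x the eye distance
    # is a heuristic margin), then resize exactly once to the classifier
    # input size instead of scaling the face arbitrarily.
    half = max(int(1.2 * np.hypot(dx, dy)), 1)
    x0, y0 = max(int(cx) - half, 0), max(int(cy) - half, 0)
    crop = rotated[y0:int(cy) + half, x0:int(cx) + half]
    return cv2.resize(crop, (output_size, output_size),
                      interpolation=cv2.INTER_CUBIC)
```

Resizing only once, after rotation and cropping, reflects the abstract's point that repeated arbitrary scaling distorts faces and loses deepfake features.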

1 Introduction
  1.1 Motivation
  1.2 Challenges and Goals
  1.3 Contributions
  1.4 Outline of the Thesis
2 Background and Related Work
  2.1 Video
  2.2 Deepfake Attack
    2.2.1 Video-based Deepfake
    2.2.2 Audio-based Deepfake
  2.3 Deepfake Detection
    2.3.1 Deepfake detection based on learning
    2.3.2 Deepfake detection based on biological characteristics
  2.4 Deepfake Dataset
  2.5 Face Recognition
3 Deepfake Detector based on Face Recognition Enhanced with Face Alignment
  3.1 Attack Scenario
  3.2 Video Reader
  3.3 Face Recognizer
  3.4 Face Alignment System
    3.4.1 Face Rotator
    3.4.2 Face Cropper
    3.4.3 Face Resizer
  3.5 Face Comparator
  3.6 Data Augmenter
  3.7 Deepfake Detector
4 Experimental Results and Effectiveness Analysis
  4.1 Environment Setup and Dataset
    4.1.1 Experimental Design
    4.1.2 Deepfake Detection Challenge dataset
    4.1.3 Analysis Environment
  4.2 Evaluation Metrics
  4.3 Effectiveness Analysis
    4.3.1 Experimental Settings
    4.3.2 Analysis Tool
    4.3.3 Comparison of Deepfake Detection on Entire Frames and Faces
    4.3.4 Comparison of Face-based Deepfake Detection Enhanced with and without Face Alignment
    4.3.5 Comparison of Face-based Deepfake Detection Enhanced with and without Face Alignment after Data Augmentation
    4.3.6 Summary of Effectiveness Analysis
    4.3.7 Time Complexity Analysis
5 Conclusions & Further Work
  5.1 Conclusions
  5.2 Limitations & further work
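The abstract reports 93.25% accuracy, 99.00% recall, and 99.10% AUC, and Chapter 4.2 covers the evaluation metrics. As a minimal sketch of how such binary-classification metrics are typically computed with scikit-learn's standard functions; the labels and scores below are placeholders, not the thesis's data:

```python
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

y_true = [1, 1, 0, 1, 0, 0, 1, 0]                     # 1 = deepfake, 0 = real
y_score = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.6, 0.1]    # classifier scores
y_pred = [int(s >= 0.5) for s in y_score]             # threshold at 0.5

print("Accuracy:", accuracy_score(y_true, y_pred))    # fraction correct
print("Recall:  ", recall_score(y_true, y_pred))      # fakes caught / all fakes
print("AUC:     ", roc_auc_score(y_true, y_score))    # threshold-free ranking quality
```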

