
Graduate Student: 黃泰銘 (Tai-Ming Huang)
Thesis Title: 基於圖像的深度偽造檢測:通過基礎模型適應的泛化方法 (Generalized Image-based Deepfake Detection through Foundation Model Adaptation)
Advisor: 花凱龍 (Kai-Lung Hua)
Committee Members: 曹昱 (Yu Tsao), 陳駿丞 (Jun-Cheng Chen), 陳宜惠 (Yi-Hui Chen)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication: 2023
Academic Year of Graduation: 111
Language: English
Number of Pages: 43
Chinese Keywords: Deepfake Detection, Foundation Model, Diffusion Model, Generated Images, Universal Detector
English Keywords: Deepfake Detection, Foundation Model Adapting, Diffusion-Generated Image, Universal Detector
Access Count: Views: 217, Downloads: 0
Abstract: Generative AI and synthetic imaging have long been among the important fields in computer vision. With the recent rise of Diffusion Models (DMs), generated images now surpass the realism achieved by Generative Adversarial Network (GAN)-based methods. Detecting these generated images (deepfakes) is crucial to preventing them from adversely affecting our society. Although several works focus on detecting GAN-generated images, we observe that these detection methods fail to generalize to new DM-generated images. To address this problem, we propose a new foundation model adaptation method: leveraging the strong feature encoding and rich semantic information of the foundation model, we employ a mixed decoding side network to effectively exploit these insights. Competitive experimental results demonstrate the effectiveness of our method in detecting both GAN-based and DM-based synthetic images. Moreover, we have collected a series of generated-image datasets covering three categories, LSUN-Bedroom [1], FFHQ [2], and MSCOCO [3], and more than 15 different generative model architectures. We also carry out extensive and forward-looking experiments and studies on generated facial images. We will soon release our datasets to promote community development in the detection of new types of synthetic images.
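As a concrete illustration of the adaptation idea described in the abstract, the following minimal PyTorch sketch pairs a frozen foundation-model image encoder with a small trainable decoding head for real-vs-fake classification. The open_clip backbone, the head design, and all hyperparameters are assumptions chosen for illustration only; the thesis's actual mixed decoding side network is not specified in this record.

```python
import torch
import torch.nn as nn
import open_clip


class DeepfakeDetector(nn.Module):
    """Frozen foundation-model encoder + small trainable decoding head."""

    def __init__(self, backbone: str = "ViT-B-16", pretrained: str = "openai"):
        super().__init__()
        # Frozen foundation model providing strong, semantically rich features.
        self.encoder, _, self.preprocess = open_clip.create_model_and_transforms(
            backbone, pretrained=pretrained
        )
        for p in self.encoder.parameters():
            p.requires_grad = False

        # Lightweight trainable head; a stand-in for the thesis's
        # "mixed decoding side network", whose exact design is not given here.
        feat_dim = self.encoder.visual.output_dim
        self.head = nn.Sequential(
            nn.LayerNorm(feat_dim),
            nn.Linear(feat_dim, 256),
            nn.GELU(),
            nn.Linear(256, 1),  # single logit: fake (1) vs. real (0)
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # Encode with the frozen backbone, then classify with the small head.
        with torch.no_grad():
            feats = self.encoder.encode_image(images)
        return self.head(feats.float()).squeeze(-1)


# Only the head is optimized; the foundation model stays frozen.
model = DeepfakeDetector()
optimizer = torch.optim.AdamW(model.head.parameters(), lr=1e-4, weight_decay=1e-2)
criterion = nn.BCEWithLogitsLoss()
```

Training only the decoding head mirrors the intent of adapting a frozen foundation model rather than fine-tuning it end to end, which is what allows the strong pre-trained features to generalize across GAN- and DM-generated images.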

Table of Contents:
Chinese Abstract (論文摘要)
Abstract II
Acknowledgement III
Contents IV
List of Figures V
List of Tables VI
1 Introduction 1
2 Related Work 3
3 Proposed Method 6
3.1 Foundation Model Adaptation for Deepfake Detection 6
4 Experiments 13
4.1 Data Preparation 13
4.2 Experimental Setup 14
4.3 Comparison to Existing State-of-the-art Detectors 18
4.4 Multiple Dataset Evaluation 20
4.5 Ablation Study 24
5 Conclusions 26
References 27
Authorization Form (授權書) 35

    [1] F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser, and J. Xiao, “Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop,” arXiv preprint arXiv:1506.03365, 2015.
[2] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[3] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” European Conference on Computer Vision (ECCV), 2014.
[4] Y. Luo, Y. Zhang, J. Yan, and W. Liu, “Generalizing face forgery detection with high-frequency features,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
    [5] G. Somepalli, V. Singla, M. Goldblum, J. Geiping, and T. Goldstein, “Diffusion art or digital forgery? investigating data replication in diffusion models,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[6] N. Carlini, J. Hayes, M. Nasr, M. Jagielski, V. Sehwag, F. Tramer, B. Balle, D. Ippolito, and E. Wallace, “Extracting training data from diffusion models,” arXiv preprint arXiv:2301.13188, 2023.
    [7] S.-Y. Wang, O. Wang, R. Zhang, A. Owens, and A. A. Efros, “Cnn-generated images are surprisingly easy to spot...for now,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[8] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in Neural Information Processing Systems (NeurIPS), 2014.
[9] J. Dy, J. J. Virtusio, D. Tan, Y.-X. Lin, J. Ilao, Y.-Y. Chen, and K.-L. Hua, “Mcgan: Mask controlled generative adversarial network for image retargeting,” Neural Computing and Applications (NCAA), 2023.
    [10] W. Xu, C. Long, R. Wang, and G. Wang, “Drb-gan: A dynamic resblock generative adversarial network for artistic style transfer,” IEEE International Conference on Computer Vision (ICCV), 2021.
    [11] J. J. Virtusio, J. J. M. Ople, D. S. Tan, N. K. M. Tanveer, and K.-L. Hua, “Neural style palette: A multimodal and interactive style transfer from a single style image,” IEEE Transactions on Multimedia (IEEE TMM), 2023.
[12] D. S. Tan, Y.-X. Lin, and K.-L. Hua, “Incremental learning of multi-domain image-to-image translations,” IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT), 2021.
[13] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[14] Y.-H. Hung, J. Tan, T.-M. Huang, S.-C. Hsu, Y.-L. Chen, and K.-L. Hua, “Unpaired image-to-image translation using negative learning for noisy patches,” IEEE MultiMedia Magazine (IEEE MultiMedia), 2022.
[15] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al., “Photo-realistic single image super-resolution using a generative adversarial network,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
    [16] J. J. M. Ople, D. Tan, A. Azcarraga, C.-L. Yang, and K.-L. Hua, “Super-resolution by image enhancement using texture transfer,” IEEE International Conference on Image Processing (ICIP), 2020.
    [17] F. Mokhayeri, K. Kamali, and E. Granger, “Cross-domain face synthesis using a controllable gan,” IEEE Winter Conference on Applications of Computer Vision (WACV), 2020.
    [18] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and improving the image quality of stylegan,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
    [19] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems (NeurIPS), 2020.
[20] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of gans for improved quality, stability, and variation,” International Conference on Learning Representations (ICLR), 2017.
[21] P. Dhariwal and A. Nichol, “Diffusion models beat gans on image synthesis,” Advances in Neural Information Processing Systems (NeurIPS), 2021.
    [22] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
    [23] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” arXiv preprint arXiv:2010.02502, 2020.
[24] A. Hertz, R. Mokady, J. Tenenbaum, K. Aberman, Y. Pritch, and D. Cohen-Or, “Prompt-to-prompt image editing with cross attention control,” arXiv preprint arXiv:2208.01626, 2022.
[25] A. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, and M. Chen, “Glide: Towards photorealistic image generation and editing with text-guided diffusion models,” arXiv preprint arXiv:2112.10741, 2021.
    [26] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[27] D. Cozzolino, J. Thies, A. Rössler, C. Riess, M. Nießner, and L. Verdoliva, “Forensictransfer: Weakly-supervised domain adaptation for forgery detection,” arXiv preprint arXiv:1812.02510, 2018.
    [28] X. Zhang, S. Karaman, and S.-F. Chang, “Detecting and simulating artifacts in gan fake images,” IEEE International Workshop on Information Forensics and Security (WIFS), 2019.
[29] F. Marra, D. Gragnaniello, L. Verdoliva, and G. Poggi, “Do gans leave artificial fingerprints?,” IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), 2019.
    [30] S. Mandelli, N. Bonettini, P. Bestagini, and S. Tubaro, “Detecting gan-generated images by orthogonal training of multiple cnns,” IEEE International Conference on Image Processing (ICIP), 2022.
    [31] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
    [32] Y.-T. Zhou, J.-B. Dy, S.-C. Hsu, Y.-L. Hsu, C.-L. Yang, and K.-L. Hua, “Ssrface: A face recognition framework against shallow data,” arXiv preprint arXiv:2301.13188, 2023.
    [33] J.-D. Lin, H.-H. Lin, J. Dy, J.-C. Chen, M. Tanveer, I. Razzak, and K.-L. Hua, “Lightweight face anti-spoofing network for telehealth applications,” IEEE Journal of Biomedical and Health Informatics (IEEE JBHI), 2022.
[34] J.-D. Lin, Y.-H. Han, J. Tan, P.-H. Huang, J.-C. Chen, M. Tanveer, and K.-L. Hua, “Defaek: Domain effective fast adaptive network for face anti-spoofing,” Neural Networks, 2023.
    [35] R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, et al., “On the opportunities and risks of foundation models,” arXiv preprint arXiv:2108.07258, 2021.
[36] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al., “Learning transferable visual models from natural language supervision,” International Conference on Machine Learning (ICML), 2021.
    [37] M. Jia, L. Tang, B.-C. Chen, C. Cardie, S. Belongie, B. Hariharan, and S.-N. Lim, “Visual prompt tuning,” European Conference on Computer Vision (ECCV), 2022.
    [38] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” International Conference on Learning Representations (ICLR), 2022.
    [39] P. Gao, S. Geng, R. Zhang, T. Ma, R. Fang, Y. Zhang, H. Li, and Y. Qiao, “Clip-adapter: Better vision-language models with feature adapters,” arXiv preprint arXiv:2110.04544, 2021.
[40] M. Xu, Z. Zhang, F. Wei, H. Hu, and X. Bai, “Side adapter network for open-vocabulary semantic segmentation,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[41] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” International Conference on Learning Representations (ICLR), 2021.
[42] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, “Training data-efficient image transformers & distillation through attention,” International Conference on Machine Learning (ICML), 2021.
    [43] I. Eckstein, J. P. Lee-Thorp, J. Ainslie, and S. Ontanon, “Fnet: Mixing tokens with fourier transforms,” arXiv preprint arXiv:2105.03824, 2022.
    [44] Y. Rao, W. Zhao, Z. Zhu, J. Lu, and J. Zhou, “Global filter networks for image classification,” Advances in neural information processing systems (NeurIPS), 2021.
    [45] B. N. Patro, V. P. Namboodiri, and V. S. Agneeswaran, “Spectformer: Frequency and attention is what you need in a vision transformer,” arXiv preprint arXiv:2304.06446, 2023.
    [46] Z. Lin, S. Geng, R. Zhang, P. Gao, G. de Melo, X. Wang, J. Dai, Y. Qiao, and H. Li, “Frozen clip models are efficient video learners,” arXiv preprint arXiv:2208.03550, 2022.
    [47] Z. Sha, Z. Li, N. Yu, and Y. Zhang, “De-fake: Detection and attribution of fake images generated by text-to-image diffusion models,” arXiv preprint arXiv:2210.06998, 2022.
    [48] J. Ricker, S. Damm, T. Holz, and A. Fischer, “Towards the detection of diffusion model deepfakes,” arXiv preprint arXiv:2210.14571, 2022.
    [49] R. Corvi, D. Cozzolino, G. Zingarini, G. Poggi, K. Nagano, and L. Verdoliva, “On the detection of synthetic images generated by diffusion models,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
    [50] T. Yao, Y. Pan, Y. Li, C.-W. Ngo, and T. Mei, “Wave-vit: Unifying wavelet and transformers for visual representation learning,” European conference on computer vision (ECCV), 2022.
[51] Y. Rao, W. Zhao, Z. Zhu, J. Lu, and J. Zhou, “Global filter networks for image classification,” Advances in Neural Information Processing Systems (NeurIPS), 2021.
    [52] M. Shahid, S.-F. Chen, Y.-L. Hsu, Y.-Y. Chen, Y.-L. Chen, and K.-L. Hua, “Forest fire segmentation via temporal transformer from aerial images,” Forests, 2023.
    [53] M. Shahid and K.-L. Hua, “Forest fire segmentation via temporal transformer from aerial images,” ACM International Conference on Multimedia Retrieval (ACM ICMR), 2021.
[54] P. Dhariwal and A. Nichol, “Diffusion models beat gans on image synthesis,” Advances in Neural Information Processing Systems (NeurIPS), 2021.
    [55] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems (NeurIPS), 2020.
[56] A. Q. Nichol and P. Dhariwal, “Improved denoising diffusion probabilistic models,” International Conference on Machine Learning (ICML), 2021.
    [57] L. Liu, Y. Ren, Z. Lin, and Z. Zhao, “Pseudo numerical methods for diffusion models on manifolds,” International Conference on Machine Learning (ICML), 2022.
[58] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[59] A. Sauer, K. Chitta, J. Müller, and A. Geiger, “Projected gans converge faster,” Advances in Neural Information Processing Systems (NeurIPS), 2021.
    [60] Z. Wang, H. Zheng, P. He, W. Chen, and M. Zhou, “Diffusion-gan: Training gans with diffusion,” Computing Research Repository (CoRR), 2022.
[61] J. Choi, J. Lee, C. Shin, S. Kim, H. Kim, and S. Yoon, “Perception prioritized training of diffusion models,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[62] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and improving the image quality of StyleGAN,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[63] T. Karras, M. Aittala, S. Laine, E. Härkönen, J. Hellsten, J. Lehtinen, and T. Aila, “Alias-free generative adversarial networks,” Advances in Neural Information Processing Systems (NeurIPS), 2021.
[64] A. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, and M. Chen, “Glide: Towards photorealistic image generation and editing with text-guided diffusion models,” Computing Research Repository (CoRR), 2021.
[65] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, “Hierarchical text-conditional image generation with clip latents,” Computing Research Repository (CoRR), 2022.
    [66] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017.
[67] Z. Wang, J. Bao, W. Zhou, W. Wang, H. Hu, H. Chen, and H. Li, “Dire for diffusion-generated image detection,” arXiv preprint arXiv:2303.09295, 2023.
    [68] M. Tan and Q. Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” International conference on machine learning (ICML), 2019.
    [69] K. Shiohara and T. Yamasaki, “Detecting deepfakes with self-blended images,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
    [70] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[71] M. Oquab, T. Darcet, T. Moutakanni, H. V. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, R. Howes, P.-Y. Huang, H. Xu, V. Sharma, S.-W. Li, W. Galuba, M. Rabbat, M. Assran, N. Ballas, G. Synnaeve, I. Misra, H. Jegou, J. Mairal, P. Labatut, A. Joulin, and P. Bojanowski, “Dinov2: Learning robust visual features without supervision,” 2023.
    [72] Y. Zheng, H. Yang, T. Zhang, J. Bao, D. Chen, Y. Huang, L. Yuan, D. Chen, M. Zeng, and F. Wen, “General facial representation learning in a visual-linguistic manner,” arXiv preprint arXiv:2112.03109, 2021.
[73] M. Cherti, R. Beaumont, R. Wightman, M. Wortsman, G. Ilharco, C. Gordon, C. Schuhmann, L. Schmidt, and J. Jitsev, “Reproducible scaling laws for contrastive language-image learning,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
    [74] Y. Zheng, H. Yang, T. Zhang, J. Bao, D. Chen, Y. Huang, L. Yuan, D. Chen, M. Zeng, and F. Wen, “General facial representation learning in a visual-linguistic manner,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
    [75] S. Beckers, H. Chockler, and J. Halpern, “A causal analysis of harm,” Conference on Neural Information Processing Systems (NeurIPS), 2022.

Full text available from 2025/08/01 (campus network)
Full text available from 2025/08/01 (off-campus network)
Full text available from 2025/08/01 (National Central Library: Taiwan thesis and dissertation system)