
Graduate Student: Jing-Yuan Lu (盧敬元)
Thesis Title: A Tampering Detection Approach to Forgery Style Text Images Generated by Multiple Generative Adversarial Networks
Advisor: Chin-Shyurng Fahn (范欽雄)
Oral Examination Committee: 李建德, 鄭為民, 吳怡樂, 范欽雄
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Academic Year of Graduation: 112
Language: English
Number of Pages: 71
Keywords: deep learning, frequency domain transformation, Generative Adversarial Network, forgery detection of character images, scene text images


    In the era of the internet, a vast amount of information circulates on the web every day, including many carefully crafted forged images. Apart from manually tampering with images using image editing software, Generative Adversarial Networks (GANs) are also applied in deepfake technology for facial forgery. In recent years, text image tampering techniques have been widely used to alter identification documents and to forge signed documents. If such forged documents are exploited by malicious parties, they pose a serious threat to public safety.
    In this thesis, we propose a forgery detection model based on multiple spectral features for text images generated by Generative Adversarial Networks. The implementation is divided into two parts. First, we generate a large and diverse set of forged images with several Generative Adversarial Networks; the generated images contain Traditional Chinese and English text in different fonts and with different backgrounds. Next, we apply several frequency domain transformations to both the forged and the real images, including the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), and the Discrete Wavelet Transform (DWT). The transformed spectra are concatenated and used as the input for training a Densely Connected Convolutional Network (DenseNet).
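    As a rough illustration of this preprocessing step, the following Python sketch computes the three spectra for one grayscale text image and stacks them as channels for the network input; the log-magnitude spectra, the single-level Haar wavelet, the tiling of the wavelet sub-bands, and the min-max normalization are assumptions made for illustration, not the exact settings used in the thesis.

        # A minimal sketch (assumed details, not the thesis's exact code) of the
        # spectral preprocessing described above: one grayscale text image is
        # turned into three spectra (DFT, DCT, DWT) stacked as input channels.
        import numpy as np
        import pywt                  # PyWavelets, for the 2-D discrete wavelet transform
        from scipy.fft import dctn   # separable 2-D discrete cosine transform

        def dft_spectrum(img: np.ndarray) -> np.ndarray:
            # Log-magnitude of the centered 2-D Fourier transform.
            return np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(img))))

        def dct_spectrum(img: np.ndarray) -> np.ndarray:
            # Log-magnitude of the 2-D DCT (type II, orthonormal).
            return np.log1p(np.abs(dctn(img, norm="ortho")))

        def dwt_spectrum(img: np.ndarray) -> np.ndarray:
            # Single-level Haar DWT; tile the LL, LH, HL, HH sub-bands into one
            # plane with the same size as the (even-sized) input image.
            cA, (cH, cV, cD) = pywt.dwt2(img, "haar")
            return np.block([[cA, cH], [cV, cD]])

        def minmax(s: np.ndarray) -> np.ndarray:
            # Scale each spectrum to [0, 1] before stacking it as a channel.
            return (s - s.min()) / (s.max() - s.min() + 1e-8)

        def spectral_channels(img: np.ndarray) -> np.ndarray:
            # Returns an array of shape (3, H, W): one channel per transform.
            return np.stack([minmax(dft_spectrum(img)),
                             minmax(dct_spectrum(img)),
                             minmax(dwt_spectrum(img))])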
    Finally, the experiments in this thesis are divided into three parts. In the first experiment, we quantitatively analyze the text images generated by the Generative Adversarial Networks using several common image quality evaluation metrics. In the second experiment, to find the deep neural network with the best generalization ability for forgery detection, the transformed spectra are paired with different deep learning models for training. The experimental results show that the Vision Transformer (ViT) achieves the highest accuracy on the DCT and DWT spectra, reaching 98.47% and 97.14%, respectively, although DenseNet performs almost as well. On the DFT spectrum, DenseNet achieves an accuracy of 95.42%, while the accuracy of ViT drops sharply. Overall, DenseNet demonstrates the best generalization ability on our dataset. In the last experiment, we concatenate different spectra to form various combinations and conduct an ablation study using DenseNet. The results show that when the three spectra are concatenated as the model input, the accuracy reaches 99.66%. Compared with using a single spectrum, our approach better helps the model identify forged images from different sources.
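    To make the spectrum-combination setup concrete, the sketch below shows one plausible way to feed the concatenated spectra into a DenseNet classifier with torchvision; the one-channel-per-spectrum layout, the adapted first convolution, and the two-class head are illustrative assumptions rather than the exact training code used in the thesis.

        # A sketch of the classification side under the same assumptions: a chosen
        # combination of spectra is stacked as input channels and classified as
        # real or forged with a torchvision DenseNet-121 (random initialization).
        import torch
        import torch.nn as nn
        from torchvision.models import densenet121

        def build_detector(num_spectra: int) -> nn.Module:
            model = densenet121(weights=None)  # older torchvision: pretrained=False
            # The stock first convolution expects 3 channels; match it to the number
            # of concatenated spectra (1, 2, or 3 in the ablation combinations).
            model.features.conv0 = nn.Conv2d(num_spectra, 64, kernel_size=7,
                                             stride=2, padding=3, bias=False)
            # Two output classes: real versus GAN-generated text image.
            model.classifier = nn.Linear(model.classifier.in_features, 2)
            return model

        # Example: all three spectra (DFT + DCT + DWT) concatenated as input,
        # the combination reported above to reach 99.66% accuracy.
        detector = build_detector(num_spectra=3)
        batch = torch.randn(8, 3, 128, 128)   # a batch of stacked spectra
        logits = detector(batch)              # shape (8, 2)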

    Abstract (in Chinese) i
    Abstract ii
    Acknowledgements iv
    List of Figures vii
    List of Tables ix
    Chapter 1 Introduction 1
        1.1 Overview 1
        1.2 Motivation 3
        1.3 System Description 4
        1.4 Thesis Organization 6
    Chapter 2 Related Work 7
        2.1 GAN and GAN-based Forgery Characters Generation 7
        2.2 Scene Texts Forgery Images Generation 9
        2.3 GAN-generated Images Detection Methods 12
    Chapter 3 FSText Dataset 14
        3.1 Source Dataset 14
        3.2 Forged Texts Image Generation 15
            3.2.1 SRNet 15
            3.2.2 TENet 17
            3.2.3 FSTNet 19
        3.3 Implementation Detail of FSTNet 22
            3.3.1 Mask prediction module 22
            3.3.2 Background inpainting module 23
            3.3.3 Text conversion module 26
            3.3.4 Fusion module 26
        3.4 Illustration of the Proposed Dataset 29
            3.4.1 Forged text image generation through SRNet 29
            3.4.2 Forged text image generation through TENet 31
            3.4.3 Forged text image generation through FSTNet 33
    Chapter 4 Spectrum-based Detection Method for Forged Text Images 35
        4.1 Preprocessing of Data 35
        4.2 Deep Learning Model for Forgery Text Images Detection 40
            4.2.1 VGG19 40
            4.2.2 Densely connected convolutional network (DenseNet) 41
            4.2.3 Vision Transformer (ViT) 42
        4.3 Forgery Text Image Detection Method with Spectrum Input 44
    Chapter 5 Experimental Results and Discussion 47
        5.1 Experimental Environment Setup 47
        5.2 Experimental Results on Different GANs 49
        5.3 Experimental Results of Forgery Classification 53
            5.3.1 The detection results of RGB-based Models 55
            5.3.2 The detection results of DCT-based Models 56
            5.3.3 The detection results of DFT-based Models 58
            5.3.4 The detection results of DWT-based Models 60
        5.4 Ablation Study on Different Spectrum Combinations 62
    Chapter 6 Conclusions and Future Work 65
        6.1 Conclusions 65
        6.2 Future Work 66

    Full text available from 2029/01/24 (campus network)
    Full text available from 2034/01/24 (off-campus network)
    Full text available from 2034/01/24 (National Central Library: Taiwan thesis and dissertation system)