
Graduate student: Po-Hsuan Fang (方柏玄)
Thesis title: A Tampering Detection Method for Global and Local Forgeries of Chinese Text Images Generated by Generative Adversarial Networks
Advisor: Huei-Wen Ferng (馮輝文)
Committee members: Chi-Fang Lin (林啟芳), Jung-Tang Huang (黃榮堂), Chin-Shyurng Fahn (范欽雄), Huei-Wen Ferng (馮輝文)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar, 2023-2024)
Language: English
Number of pages: 61
Keywords: deep learning, frequency-domain transformation, Generative Adversarial Network, forgery detection of character images, partial replacement in scene text images
Access statistics: 126 views, 0 downloads



Technology is advancing rapidly in today's era: products such as ChatGPT are updated yearly or even monthly, and forgery techniques have evolved just as quickly in our AI-driven society. Image manipulation has progressed from simple manual operations such as copy-move and splicing, used to edit and rearrange images, to AI-based approaches such as GAN generation, which can easily produce forgeries that the naked eye cannot readily detect. Tampering with text images has likewise become widespread: identification documents may be altered by malicious actors to carry out illegal activities, or important signatures may be forged to sign critical documents, both of which threaten social security. Developing detectors for forged text images is therefore imperative, and their performance must keep pace with the times to reliably detect AI-generated fake images.
In this thesis, we introduce a forgery detection model based on multiple spectral features for AI-generated text images produced by Generative Adversarial Networks (GANs). The implementation is divided into two main steps. First, we create a large and diverse set of forged images (including handwritten and printed Chinese text) using several GANs. We then apply various frequency-domain transformations to both real and forged images: the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), and the Discrete Wavelet Transform (DWT). The resulting spectra are concatenated and used as input to train a Densely Connected Convolutional Network (DenseNet).
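To make this preprocessing concrete, below is a minimal sketch of how the three spectra might be computed and stacked for one grayscale image, using NumPy, SciPy, and PyWavelets. The log-scaling, the Haar wavelet, the sub-band tiling, and the min-max normalization are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np
import pywt                                   # PyWavelets, for the DWT
from scipy.fft import fft2, fftshift, dctn

def spectral_channels(gray: np.ndarray) -> np.ndarray:
    """Stack DFT, DCT, and DWT spectra of a grayscale image as 3 channels."""
    h, w = gray.shape
    # DFT: centered log-magnitude spectrum
    dft_spec = np.log1p(np.abs(fftshift(fft2(gray))))
    # 2-D DCT (type II), log-scaled magnitude
    dct_spec = np.log1p(np.abs(dctn(gray, norm='ortho')))
    # Single-level 2-D DWT: tile the four sub-bands into one full-size map
    cA, (cH, cV, cD) = pywt.dwt2(gray, 'haar')
    dwt_spec = np.block([[cA, cH], [cV, cD]])[:h, :w]
    # Normalize each spectrum to [0, 1] and stack along the channel axis
    chans = [(c - c.min()) / (c.max() - c.min() + 1e-8)
             for c in (dft_spec, dct_spec, dwt_spec)]
    return np.stack(chans, axis=-1)           # shape: (h, w, 3)
```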
The experimental section comprises three parts. In the first experiment, we analyzed the forged text images generated by the GANs using several evaluation metrics to compare generator performance. In the second experiment, to find the most suitable detection model, we trained different deep learning models on the various spectral transformations. The results show that ViT achieved the highest accuracy on the DCT and DWT spectra, reaching 98.9% and 97.73%, respectively, although our improved DenseNet came very close. On the DFT spectrum, ViT's accuracy dropped significantly, whereas our proposed DenseNet maintained 95.71%. Based on the second experiment, we selected the improved DenseNet for an ablation study to find the optimal spectrum combination; the results show that concatenating all three spectra yields the best accuracy, 99.16%. This indicates that our method identifies forged images more accurately than any single spectrum does.
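As a rough sketch of the detection stage, the snippet below adapts a stock torchvision DenseNet-121 to take the 3-channel spectral input and predict real versus forged. It is a baseline stand-in under assumed input sizes, not the improved DenseNet proposed in the thesis.

```python
import torch
from torchvision.models import densenet121

# Baseline stand-in for the detector: a stock DenseNet-121 whose
# classifier head is replaced for binary real-vs-forged prediction.
model = densenet121(weights=None)
model.classifier = torch.nn.Linear(model.classifier.in_features, 2)

# The 3 input channels hold the concatenated DFT/DCT/DWT spectra
# (e.g. the output of spectral_channels above, resized to 224x224).
x = torch.randn(8, 3, 224, 224)   # a dummy batch of 8 spectral images
logits = model(x)                  # shape: (8, 2) -> real vs. forged
```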

Table of Contents
Abstract (Chinese)
Abstract
Acknowledgements
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Overview
  1.2 Motivation
  1.3 System Description
  1.4 Thesis Organization
Chapter 2 Related Work
  2.1 GAN and GAN-based Image Generation
  2.2 Scene Text Forgery Image Generation
  2.3 Detection Methods for GAN-generated Images
Chapter 3 Forged Text Dataset
  3.1 Source Dataset
  3.2 Forged Text Image Generation
    3.2.1 SRNet
    3.2.2 TENet
  3.3 Implementation of Partial Text Replacement
  3.4 Illustration of the Proposed Dataset
    3.4.1 Forged text image generation through SRNet
    3.4.2 Forged text image generation through TENet
Chapter 4 Tampering Detection Methods for Forged Chinese Text Images
  4.1 Data Preprocessing
  4.2 Deep Learning Models for Detecting Forged Text Images
    4.2.1 VGG19
    4.2.2 Densely Connected Convolutional Network (DenseNet)
    4.2.3 Vision Transformer (ViT)
    4.2.4 Improved DenseNet
  4.3 Tampering Detection Methods for Forged Text Images Using Spectrum Input
Chapter 5 Experimental Results and Discussion
  5.1 Experimental Environment
  5.2 Experimental Results on Different GANs for Text Image Generation
  5.3 Experimental Tampering Detection for Forged Text Images
    5.3.1 Tampering detection results of RGB-based models
    5.3.2 Tampering detection results of DCT-based models
    5.3.3 Tampering detection results of DFT-based models
    5.3.4 Tampering detection results of DWT-based models
  5.4 Ablation Study on Different Spectrum Combinations
Chapter 6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work


Full-text release date: 2029/08/08 (campus network)
Full-text release date: 2034/08/08 (off-campus network)
Full-text release date: 2034/08/08 (National Central Library: Taiwan NDLTD system)