
Graduate Student: Hsiu-Min Shen (沈修民)
Thesis Title: A Deep Learning Approach to Detecting Images Tampered with Image Editing Software (一個用於偵測被影像編輯軟體竄改之影像的深度學習技術)
Advisor: Chin-Shyurng Fahn (范欽雄)
Committee Members: 李建德, 鄭為民, 馮輝文, 范欽雄 (Chin-Shyurng Fahn)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Graduation Academic Year: 112
Language: English
Number of Pages: 55
Keywords: deep learning, residual learning network, Transformer, image tampering detection, image editing software


    Image tampering poses potential threats to society, politics, and business. Owing to the popularity of digital cameras and smartphones, people can quickly and easily obtain digital images, and with the development of image editing software they can just as easily tamper with them; in malicious hands, the consequences may be disastrous. Tampered images can mislead the public, manipulate public opinion, and damage the reputations of individuals and companies. Developing effective methods for detecting image tampering is therefore increasingly important. This thesis proposes a deep learning approach to detecting and localizing tampered regions in images, combining a residual learning network with a Transformer encoder to capture the relationships between different image patches and effectively extract subtle tampering features. Additionally, we fuse features at multiple scales to improve the detection of tampered regions of various sizes.
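    To make the architecture described above concrete, the following is a minimal sketch of a ResNet backbone combined with a Transformer encoder and multi-scale feature fusion, assuming a PyTorch implementation; every module name, layer size, and hyperparameter here is an illustrative assumption, not the thesis's actual code.

```python
# A minimal sketch of the described design: a ResNet backbone extracts
# features, a Transformer encoder models relationships between patches of the
# deepest feature map, and features from several scales are fused before
# pixel-level prediction. All names and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class TamperLocalizer(nn.Module):
    def __init__(self, d_model=256, nhead=8, num_layers=6):
        super().__init__()
        backbone = resnet50(weights=None)
        # Keep intermediate stages so features at several scales can be fused.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.layer1, self.layer2 = backbone.layer1, backbone.layer2
        self.layer3, self.layer4 = backbone.layer3, backbone.layer4
        # Project each scale to a common channel width for fusion.
        self.proj = nn.ModuleList([nn.Conv2d(c, d_model, 1)
                                   for c in (256, 512, 1024, 2048)])
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead,
                                                   batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        self.head = nn.Conv2d(d_model, 1, 1)  # one tamper logit per pixel

    def forward(self, x):
        h, w = x.shape[-2:]
        f1 = self.layer1(self.stem(x))
        f2 = self.layer2(f1)
        f3 = self.layer3(f2)
        f4 = self.layer4(f3)
        # The deepest feature map becomes a patch sequence for the encoder.
        t = self.proj[3](f4)
        b, c, fh, fw = t.shape
        seq = self.encoder(t.flatten(2).transpose(1, 2))  # (B, H*W, C)
        t = seq.transpose(1, 2).reshape(b, c, fh, fw)
        # Multi-scale fusion: upsample every scale to the largest feature map.
        fused = sum(F.interpolate(p(f), size=f1.shape[-2:], mode='bilinear',
                                  align_corners=False)
                    for p, f in zip(self.proj[:3], (f1, f2, f3)))
        fused = fused + F.interpolate(t, size=f1.shape[-2:], mode='bilinear',
                                      align_corners=False)
        # Upsample the logits back to the input resolution.
        return F.interpolate(self.head(fused), size=(h, w), mode='bilinear',
                             align_corners=False)
```

    Applying a sigmoid to the returned logits gives a per-pixel tamper-probability map; thresholding that map (e.g., at 0.5) yields the binary localization mask that is compared against the ground-truth mask.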
    We use a confusion matrix to distinguish the four pixel-level classification outcomes, which shows how well the model classifies authentic and tampered pixels. In our experiments, we first generate a large-scale tampered image dataset of our own based on the COCO dataset and use it to pre-train the model. Subsequently, we fine-tune the model on three public datasets to improve its detection performance. Comparing the detection results of the pre-trained and fine-tuned models, the experiments show that while the pre-trained model achieves some detection success, its results are not satisfactory; after fine-tuning, the model's AUC and F1 scores improve significantly on all public datasets. Our method achieves an AUC of 0.895 and an F1 score of 0.527 on COVERAGE; an AUC of 0.847 and an F1 score of 0.468 on CASIA; and an AUC of 0.895 and an F1 score of 0.533 on IMD2020. In terms of model comparison, our method performs only slightly worse than MVSS-Net on COVERAGE, and compared with ManTra-Net and Constrained-R-CNN it achieves higher AUC and F1 scores on all three datasets.
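    As a sketch of the evaluation just described, the snippet below derives the four confusion-matrix outcomes (TP, FP, TN, FN), the F1 score computed from them, and the threshold-free AUC for a predicted tamper-probability map. It assumes NumPy arrays and scikit-learn's roc_auc_score; the function name, the 0.5 threshold, and the smoothing epsilon are illustrative choices, not the thesis's exact protocol.

```python
# A sketch of pixel-level evaluation: confusion-matrix counts, F1, and AUC.
# Assumes both authentic and tampered pixels occur in the ground-truth mask
# (roc_auc_score is undefined for a single-class ground truth).
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_mask(prob_map, gt_mask, threshold=0.5):
    """prob_map: per-pixel tamper probabilities in [0, 1] (H x W array);
    gt_mask: binary ground truth, 1 for tampered pixels and 0 for real ones."""
    pred = (prob_map >= threshold).astype(np.uint8)
    gt = gt_mask.astype(np.uint8)
    # The four classification outcomes, with "tampered" as the positive class.
    tp = int(np.sum((pred == 1) & (gt == 1)))  # tampered pixel, detected
    fp = int(np.sum((pred == 1) & (gt == 0)))  # real pixel, flagged as tampered
    fn = int(np.sum((pred == 0) & (gt == 1)))  # tampered pixel, missed
    tn = int(np.sum((pred == 0) & (gt == 0)))  # real pixel, correctly passed
    eps = 1e-8  # avoids division by zero in degenerate cases
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    # AUC is threshold-free: it ranks the raw probabilities against the labels.
    auc = roc_auc_score(gt.ravel(), prob_map.ravel())
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn,
            "precision": precision, "recall": recall, "F1": f1, "AUC": auc}
```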

    Table of Contents:
    Chinese Abstract
    Abstract
    Acknowledgments
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Overview
      1.2 Motivation
      1.3 System Description
      1.4 Thesis Organization
    Chapter 2 Related Work
      2.1 Literature Review
      2.2 Deep Learning Based Detection Methods
    Chapter 3 Our Proposed Method for Detecting Tampered Images
      3.1 Deep Residual Learning Network (ResNet)
      3.2 Transformer
        3.2.1 Attention Mechanism
        3.2.2 Transformer Encoder
      3.3 Multi-Scale Feature Fusion and Deep Supervision
      3.4 Loss Functions
        3.4.1 Dice Loss
        3.4.2 Focal Loss
        3.4.3 Combined Loss
    Chapter 4 Experimental Results and Discussions
      4.1 Experimental Environment Setup
      4.2 Evaluation Metrics
      4.3 Dataset Description
      4.4 Experimental Results
        4.4.1 Results on the Pre-trained Model
        4.4.2 Results on the Fine-tuned Model
        4.4.3 Failure Cases
        4.4.4 Cross-dataset Generalization Ability
        4.4.5 Ablation Study
        4.4.6 Detection Performance Comparison of Our Model and the Others
    Chapter 5 Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References

    [1] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
    [2] A. C. Popescu and H. Farid, “Exposing digital forgeries in color filter array interpolated images,” IEEE Transactions on Signal Processing, vol. 53, no. 10, pp. 3948-3959, 2005.
    [3] K. Bahrami, A. C. Kot, L. Li, and H. Li, “Blurred image splicing localization by exposing blur type inconsistency,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 5, pp. 999-1009, 2015.
    [4] T. Bianchi and A. Piva, “Image forgery localization via block-grained analysis of JPEG artifacts,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 1003-1017, 2012.
    [5] N. Krawetz, “A picture’s worth,” Hacker Factor Solutions, vol. 6, no. 2, pp. 2-32, 2007.
    [6] B. Mahdian and S. Saic, “Using noise inconsistencies for blind image forensics,” Image and Vision Computing, vol. 27, no. 10, pp. 1497-1503, 2009.
    [7] P. Ferrara, T. Bianchi, A. De Rosa, and A. Piva, “Image forgery localization via fine-grained analysis of CFA artifacts,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 5, pp. 1566-1577, 2012.
    [8] Y. Rao and J. Ni, “A deep learning approach to detection of splicing and copy-move forgeries in images,” in Proceedings of the IEEE International Workshop on Information Forensics and Security, Abu Dhabi, United Arab Emirates, 2016, pp. 1-6.
    [9] Y. Zhang, L. L. Win, J. Goh, and V. L. Thing, “Image region forgery detection: A deep learning approach,” in Proceedings of the Singapore Cyber-Security Conference, Singapore, 2016, pp. 1-11.
    [10] Y. Wu, W. AbdAlmageed, and P. Natarajan, “ManTra-Net: Manipulation tracing network for detection and localization of image forgeries with anomalous features,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, California, 2019, pp. 9543-9552.
    [11] J. J. Fridrich and J. Kodovsky, “Rich models for steganalysis of digital images,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 868-882, 2012.
    [12] C. Yang, H. Li, F. Lin, B. Jiang, and H. Zhao, “Constrained R-CNN: A general image manipulation detection model,” in Proceedings of the IEEE International Conference on Multimedia and Expo, London, United Kingdom, 2020, pp. 1-6.
    [13] B. Bayar and M. C. Stamm, “Constrained convolutional neural networks: A new approach towards general purpose image manipulation detection,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 11, pp. 2691-2706, 2018.
    [14] C. Dong, X. Chen, R. Hu, J. Cao, and X. Li, “MVSS-Net: Multi-view multi-scale supervised networks for image manipulation detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 3539-3553, 2023.
    [15] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, 2016, pp. 770-778.
    [16] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, California, 2017, pp. 6000-6010.
    [17] J. Alammar, “The Illustrated Transformer,” [Online]. Available: http://jalammar.github.io/illustratedtransformer. [Accessed on June 28, 2024].
    [18] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, Massachusetts, 2015, pp. 3431-3440.
    [19] J. Hao, Z. Zhang, S. Yang, D. Xie, and S. Pu, “TransForensics: image forgery localization with dense self-attention,” in Proceedings of the IEEE International Conference on Computer Vision, Montreal, Canada, 2021, pp. 15055-15064.
    [20] C. Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu, “Deeply-supervised nets,” in Proceedings of the Artificial Intelligence and Statistics, San Diego, California, 2015, pp. 562-570.
    [21] F. Milletari, N. Navab, and S. A. Ahmadi, “V-Net: Fully convolutional neural networks for volumetric medical image segmentation,” in Proceedings of the 2016 Fourth International Conference on 3D Vision, Stanford, California, 2016, pp. 565-571.
    [22] T. Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017, pp. 2980-2988.
    [23] T. Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. Zitnick, and P. Dollár, “Microsoft COCO: Common objects in context,” in Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 2014, pp. 740-755.
    [24] B. Wen, Y. Zhu, R. Subramanian, T. T. Ng, X. Shen, and S. Winkler, “COVERAGE—A novel database for copy-move forgery detection,” in Proceedings of the 2016 IEEE International Conference on Image Processing, Phoenix, Arizona, 2016, pp. 161-165.
    [25] J. Dong, W. Wang, and T. Tan, “CASIA image tampering detection evaluation database,” in Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, Beijing, China, 2013, pp. 422-426.
    [26] A. Novozamsky, B. Mahdian, and S. Saic, “IMD2020: A large-scale annotated dataset tailored for detecting manipulated images,” in Proceedings of the IEEE Winter Conference on Applications of Computer Vision Workshops, Snowmass Village, Colorado, 2020, pp. 71-80.
    [27] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.

    Full-text release date: 2029/08/07 (campus network)
    Full-text release date: 2034/08/07 (off-campus network)
    Full-text release date: 2034/08/07 (National Central Library: Taiwan NDLTD system)