
Graduate Student: Li-Cheng Lo (羅笠程)
Thesis Title: A Deep-Learning-Based Approach to Locating the Tampered Areas of Copy-move and Splicing Images (一個基於深度學習技術的複製搬移和拼接合成影像之篡改區域的定位方法)
Advisor: Chin-Shyurng Fahn (范欽雄)
Oral Defense Committee: 黃元欣, 繆紹綱, 鄭為民
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Academic Year of Graduation: 110
Language: English
Number of Pages: 75
Chinese Keywords: forgery image detection, copy-move forgery, splicing forgery, EfficientNet, Transformer encoder, deep learning
English Keywords: Forgery image detection, copy-move manipulation, splicing manipulation, EfficientNet, Transformer encoder, deep learning

In this era of information explosion, with the continual improvement of computer performance, the popularity of mobile devices, and the rapid growth of social media, producing tampered images has become easier and easier, so tampered images now appear in every corner of daily life. Owing to the development of computer technology and deep learning, images and videos that could once be trusted have become hard to tell apart from fakes. In this thesis, we propose a deep learning method based on EfficientNet and a Transformer encoder to locate the tampered regions of copy-move and splicing images, which are the most common forgery techniques today. Given a forged image as input, our detection model generates a binary mask in which black indicates authentic regions and white indicates tampered regions; the COCO dataset is used to generate a large number of random fake images for building our pretrained model. We use the AUC and F1 score to evaluate the performance of our detection model; to evaluate it fairly, we conduct cross-dataset validation on three public datasets: COVERAGE, CASIA, and IMD2020, fine-tuning our detection model before validating on each dataset. When validating on COVERAGE, Constrained R-CNN performs best, achieving an AUC of 0.918 and an F1 score of 0.750, while we obtain only an AUC of 0.852 and an F1 score of 0.491, because our training dataset contains too few finely crafted copy-move forgeries. When validating on CASIA, TransForensics performs best, achieving an AUC of 0.837 and an F1 score of 0.627, while we obtain only an AUC of 0.779 and an F1 score of 0.408, because our detection model is weaker at detecting splicing in background regions. When validating on IMD2020, our detection model performs best, achieving an AUC of 0.890 and an F1 score of 0.520, because it attains better detection performance on everyday forged images than it does on COVERAGE and CASIA.


In this era of information explosion, with the continuous improvement of computer performance, the popularization of mobile devices, and the vigorous growth of social media, it is becoming easier and easier to produce forged images; as a result, forged images increasingly appear in every corner of daily life. With the development of computer technology and deep learning, pictures and videos that were once trustworthy have become difficult to distinguish as real or fake. In this thesis, we present a deep learning method based on EfficientNet and a Transformer encoder to locate the tampered regions of copy-move and splicing images, the two most common forgery techniques today. Given a forged image as input, our deep-learning-based detection model generates a binary mask in which black denotes authentic regions and white denotes tampered regions. We use the COCO dataset to create a large number of random forged images for producing our pretrained model. Both the AUC and the F1 score are employed to evaluate the performance of our detection model; to do so fairly, we carry out cross-dataset validation on three public datasets, COVERAGE, CASIA, and IMD2020, fine-tuning our detection model before validating on each of them. On COVERAGE, Constrained R-CNN performs best, with an AUC of 0.918 and an F1 score of 0.750, whereas our detection model attains only an AUC of 0.852 and an F1 score of 0.491, because our training dataset contains too few finely crafted copy-move forgeries. On CASIA, TransForensics performs best, with an AUC of 0.837 and an F1 score of 0.627, whereas our detection model attains only an AUC of 0.779 and an F1 score of 0.408, because it is weak at detecting splicing applied to background regions. On IMD2020, our detection model performs best, obtaining an AUC of 0.890 and an F1 score of 0.520, since it detects everyday forged images better than those in COVERAGE and CASIA.
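
To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the thesis's actual implementation, of an EfficientNet-B4 backbone whose flattened feature map is passed through a Transformer encoder before a small head upsamples it into a per-pixel tampering mask. The input size, layer widths, positional-embedding scheme, and class names are illustrative assumptions built on standard PyTorch/torchvision components.

```python
# Minimal sketch (not the thesis's actual code): EfficientNet-B4 features,
# a Transformer encoder over the flattened feature map, and a small head
# that predicts a per-pixel tampering probability. Sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import efficientnet_b4


class ForgeryLocalizer(nn.Module):
    def __init__(self, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        # Backbone: torchvision's EfficientNet-B4 feature extractor
        # (last stage outputs 1792 channels at 1/32 of the input resolution).
        self.backbone = efficientnet_b4(weights=None).features
        self.proj = nn.Conv2d(1792, d_model, kernel_size=1)
        # Learned positional embedding for a 16x16 grid (512x512 input / 32).
        self.pos_embed = nn.Parameter(torch.zeros(1, 16 * 16, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Sequential(nn.Conv2d(d_model, 64, 3, padding=1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(64, 1, 1))

    def forward(self, x):                          # x: (B, 3, 512, 512)
        feat = self.proj(self.backbone(x))         # (B, d_model, 16, 16)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)   # (B, 256, d_model)
        tokens = self.encoder(tokens + self.pos_embed)
        feat = tokens.transpose(1, 2).reshape(b, c, h, w)
        logits = self.head(feat)                   # (B, 1, 16, 16)
        # Upsample to the input resolution; threshold at 0.5 for a binary mask.
        return torch.sigmoid(F.interpolate(logits, size=x.shape[-2:],
                                           mode="bilinear", align_corners=False))


model = ForgeryLocalizer()
mask = model(torch.randn(1, 3, 512, 512))      # per-pixel probabilities in [0, 1]
binary_mask = (mask > 0.5).float()             # 1 = tampered (white), 0 = authentic (black)
```

In practice, such a model would first be pretrained on a large pool of synthetically forged images (for example, random copy-move and splicing composites built from COCO, as the abstract describes) and then fine-tuned on each benchmark before evaluation.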

Contents
Chinese Abstract
Abstract
Acknowledgements
List of Figures
List of Tables
Chapter 1 Introduction
    Overview
    Motivation
    System Description
    Organization of Thesis
Chapter 2 Related Work
    Copy-move and Splicing Images
    Traditional Forgery Image Detection Methods
    Deep Learning Models
        2.3.1 Two-stream Faster R-CNN network (RGB-N)
        2.3.2 ManTra-Net
        2.3.3 Constrained R-CNN
        2.3.4 TransForensics
Chapter 3 Our Forgery Image Detection Method
    EfficientNet
        3.1.1 Depthwise separable convolution
        3.1.2 Modules
        3.1.3 Sub-blocks and EfficientNet-B4
    Transformer Block
        3.2.1 Positional encoding
        3.2.2 Self-attention mechanism
        3.2.3 Multi-head self-attention
        3.2.4 Transformer encoder
    Feature Fusion
    Loss Function
Chapter 4 Experimental Results and Discussion
    Experimental Setup
        4.1.1 Developing environment setup
        4.1.2 Training dataset
    Evaluation Analysis
        4.2.1 F1 score and Area under curve (AUC)
        4.2.2 Validation on the COVERAGE dataset
        4.2.3 Validation on the CASIA dataset
        4.2.4 Validation on the IMD2020 dataset
    Ablation Study
Chapter 5 Conclusions and Future Work
    Conclusions
    Future Work
References
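
The abstract and Chapter 4 of the contents above evaluate localization with the pixel-level F1 score and AUC; the sketch below shows one common way to compute those two metrics for a predicted mask against its ground-truth mask using scikit-learn. The threshold and the per-image averaging choices are assumptions, since the thesis's exact protocol is not reproduced here.

```python
# Illustrative sketch only: pixel-level AUC and F1 for one predicted mask.
# Threshold and averaging choices are assumptions, not the thesis's protocol.
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score


def mask_metrics(pred_prob, gt_mask, threshold=0.5):
    """pred_prob: HxW tampering probabilities in [0, 1];
    gt_mask: HxW binary ground truth (1 = tampered, 0 = authentic)."""
    y_true = gt_mask.astype(np.uint8).ravel()
    y_prob = pred_prob.ravel()
    auc = roc_auc_score(y_true, y_prob)                        # threshold-free
    f1 = f1_score(y_true, (y_prob >= threshold).astype(np.uint8))
    return auc, f1


# Toy example; real use would loop over a test set and average per image.
rng = np.random.default_rng(0)
gt = (rng.random((256, 256)) > 0.9).astype(np.uint8)
pred = np.clip(0.7 * gt + 0.3 * rng.random((256, 256)), 0.0, 1.0)
print(mask_metrics(pred, gt))
```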

References
[1] A. Vaswani et al., “Attention is all you need,” Advances in Neural Information
Processing Systems, vol. 30, 2017.
[2] K. He et al., “Deep residual learning for image recognition,” in Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, 2016, pp.
770-778.
[3] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic
segmentation,” in Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, Boston, Massachusetts, 2015, pp. 3431-3440.
[4] M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural
networks,” in Proceedings of the International Conference on Machine Learning,
Long Beach, California, 2019, pp. 6105-6114.
[5] O. Russakovsky et al., “ImageNet large scale visual recognition challenge,”
International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015.
[6] A. Dosovitskiy et al., “An image is worth 16x16 words: Transformers for image
recognition at scale,” Oct. 2020. [Online]. Available:
https://arxiv.org/abs/2010.11929.
[7] B. Wen et al., “COVERAGE—A novel database for copy-move forgery detection,”
in Proceedings of the 2016 IEEE International Conference on Image Processing,
Phoenix, Arizona, 2016, pp. 161-165.
[8] J. Dong, W. Wang, and T. Tan, “CASIA image tampering detection evaluation
database,” in Proceedings of the 2013 IEEE China Summit and International
Conference on Signal and Information Processing, Beijing, China, 2013, pp. 422-
426.
[9] A. C. Popescu and H. Farid, “Exposing digital forgeries in color filter array
interpolated images,” IEEE Transactions on Signal Processing, vol. 53, no. 10, pp.
3948-3959, 2005.
[10] G. Li et al., “A sorted neighborhood approach for detecting duplicated regions in
image forgeries based on DWT and SVD,” in Proceedings of the 2007 IEEE
International Conference on Multimedia and Expo, Beijing, China, 2007, pp. 1750-
1753.
[11] Y. Q. Shi, C. Chen, and W. Chen, “A natural image model approach to splicing
detection,” in Proceedings of the 9th Workshop on Multimedia & Security, Dallas,
Texas, 2007, pp. 51-62.
[12] H. Gou, A. Swaminathan, and M. Wu, “Noise features for image tampering detection
and steganalysis,” in Proceedings of the 2007 IEEE International Conference on
Image Processing, San Antonio, Texas, 2007, vol. 6, pp. VI-97-VI-100.

[13] Y. F. Hsu and S. F. Chang, “Statistical fusion of multiple cues for image tampering
detection,” in Proceedings of the 2008 42nd Asilomar Conference on Signals,
Systems and Computers, Pacific Grove, California, 2008, pp. 1386-1390.
[14] P. Zhou et al., “Learning rich features for image manipulation detection,” in
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
Salt Lake City, Utah, 2018, pp. 1053-1061.
[15] Y. Wu, W. AbdAlmageed, and P. Natarajan, “ManTra-Net: Manipulation tracing
network for detection and localization of image forgeries with anomalous features,”
in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
Long Beach, California, 2019, pp. 9543-9552.
[16] C. Yang et al., “Constrained R-CNN: A general image manipulation detection model,”
in Proceedings of the 2020 IEEE International Conference on Multimedia and Expo,
London, United Kingdom, 2020, pp. 1-6.
[17] J. Hao et al., “TransForensics: image forgery localization with dense self-attention,”
in Proceedings of the IEEE International Conference on Computer Vision, Montreal,
Canada, 2021, pp. 15055-15064.
[18] M. Tan et al., “MnasNet: Platform-aware neural architecture search for mobile,” in
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
Long Beach, California, 2019, pp. 2820-2828.
[19] A. G. Howard et al., “MobileNets: Efficient convolutional neural networks for mobile
vision applications,” Apr. 2017. [Online]. Available: https://arxiv.org/abs/1704.04861.
[20] V. Agarwal, “Complete architectural details of all EfficientNet models.” [Online].
Available: https://towardsdatascience.com/complete-architectural-details-of-all-
efficientnet-models-5fd5b736142 (Accessed July 20, 2022).
[21] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
Honolulu, Hawaii, 2017, pp. 1251-1258.
[22] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training
by reducing internal covariate shift,” in Proceedings of the International Conference
on Machine Learning, Lille, France, 2015, pp. 448-456.
[23] P. Ramachandran, B. Zoph, and Q. V. Le, “Searching for activation functions,” Oct.
2017. [Online]. Available: https://arxiv.org/abs/1710.05941.
[24] J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” Jul. 2016. [Online].
Available: https://arxiv.org/abs/1607.06450.
[25] D. Hendrycks and K. Gimpel, “Gaussian error linear units (GELUs),” Jul. 2020.
[Online]. Available: https://arxiv.org/abs/1606.08415.
[26] C.-Y. Lee et al., “Deeply-Supervised Nets,” in Proceedings of the International
Conference on Artificial Intelligence and Statistics, San Diego, California, 2015, pp. 562-570.

[27] F. Milletari, N. Navab, and S. A. Ahmadi, “V-Net: Fully convolutional neural
networks for volumetric medical image segmentation,” in Proceedings of the 2016
Fourth International Conference on 3D Vision, Stanford, California, 2016, pp. 565-
571.
[28] T. Y. Lin et al., “Focal loss for dense object detection,” in Proceedings of the IEEE
International Conference on Computer Vision, Venice, Italy, 2017, pp. 2980-2988.
[29] T. Y. Lin et al., “Microsoft COCO: Common objects in context,” in Proceedings of
the European Conference on Computer Vision, Zurich, Switzerland, 2014, pp. 740-
755.
[30] A. Novozamsky, B. Mahdian, and S. Saic, “IMD2020: A large-scale annotated
dataset tailored for detecting manipulated images,” in Proceedings of the IEEE
Winter Conference on Applications of Computer Vision Workshops, Snowmass
Village, Colorado, 2020, pp. 71-80.
[31] N. Krawetz, “A picture’s worth,” Hacker Factor Solutions, vol. 6, no. 2, pp. 2-32,
2007.
[32] B. Mahdian and S. Saic, “Using noise inconsistencies for blind image forensics,”
Image and Vision Computing, vol. 27, no. 10, pp. 1497-1503, 2009.
[33] P. Ferrara et al., “Image forgery localization via fine-grained analysis of CFA
artifacts,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 5,
pp. 1566-1577, 2012.
[34] J. H. Bappy et al., “Exploiting spatial structure for localizing manipulated image
regions,” in Proceedings of the IEEE International Conference on Computer Vision,
Venice, Italy, 2017, pp. 4970-4979.
[35] W. Li, Y. Yuan, and N. Yu, “Passive detection of doctored JPEG image via block
artifact grid extraction,” Signal Processing, vol. 89, no. 9, pp. 1821-1829, 2009.

Full text available from 2027/08/24 (campus network)
Full text available from 2032/08/24 (off-campus network)
Full text available from 2032/08/24 (National Central Library: Taiwan NDLTD system)