| Graduate Student | 李維 Wei Li |
| --- | --- |
| Thesis Title | 風格保留之場景文字生成 (Generation of Style-Preserving Scene Text) |
| Advisor | 徐繼聖 Gee-Sern Hsu |
| Committee Members | 鍾聖倫 Sheng-Luen Chung, 林惠勇 Huei-Yung Lin, 陳祝嵩 Chu-Song Chen, 洪一平 Yi-Ping Hung |
| Degree | Master |
| Department | College of Engineering - Department of Mechanical Engineering |
| Publication Year | 2023 |
| Academic Year | 111 |
| Language | Chinese |
| Pages | 57 |
| Chinese Keywords | Scene Text Editing, Style Transfer, Machine Learning |
| Foreign Keywords | Scene Text Editing, StyleGAN2 |
We propose the Style-Preserving Generator (SPG), designed primarily for scene text editing (STE). The goal of scene text editing is to replace the original text with the desired text while preserving the background and style of the original text. This work makes three contributions. First, mainstream scene text editing architectures are quite complex; unlike them, SPG removes the image inpainting mechanism, reducing model complexity. Second, SPG incorporates a Transformer mechanism, which allows the model to capture both fine detail and global context during image generation. Third, SPG performs particularly well on scene text editing at large viewing angles.
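The "detail plus global context" property of the Transformer mechanism comes from self-attention: every output position is a weighted average over all input positions, so the receptive field spans the whole feature sequence, unlike a convolution's local window. A minimal sketch of scaled dot-product attention (the core operation from Vaswani et al.; this is the generic formula, not the thesis's exact module):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention on plain lists of vectors.

    Each output vector is a convex combination of ALL value vectors,
    which is why attention gives every position a global view of the
    sequence while the weights still emphasize locally relevant detail.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query against every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Weighted average of all value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

In a generator, `Q`, `K`, and `V` would be linear projections of flattened CNN feature maps; here they are plain lists purely for illustration.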
In terms of applications, scene text editing has considerable potential for synthetic data generation: by transferring the text in real data, we can generate large amounts of high-quality, realistic data. In this work, we apply the realistic data generated by SPG to the task of scene text recognition (STR), and the experimental results show that adding SPG-generated data effectively improves recognition performance. Given the current trend toward big data and ever-larger datasets, SPG can save much of the cost previously spent on data collection and annotation by using synthetic data to improve recognition performance on real datasets. Finally, we introduce a new license plate dataset (License Plate Dataset, LP2023). We use SPG to produce text-edited synthetic data and evaluate our method on LP2023 to demonstrate its performance when learning from synthetic data.
We propose a Style-Preserving Generator (SPG) designed primarily
for scene text editing (STE). STE aims to replace the original text with the
desired text while preserving the background and style of the original
content. The complex architectural designs in existing scene text editing
research have resulted in cumbersome workflows and increased training
difficulty. SPG is a streamlined and effective framework that offers
improved performance and robustness. Additionally, we address the problem of
applying scene text editing to scene text recognition (STR). Collecting real
data for STR is a challenging and time-consuming task. Therefore, we aim
to enhance the recognition performance of real datasets by leveraging
synthetic data. We introduce a new dataset for license plates (LP2023).
Utilizing SPG, we generate synthetic data with edited text and evaluate our
method on LP2023 to demonstrate its performance in learning from
synthetic data.
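The augmentation strategy described above, training an STR model on real data enlarged with SPG-edited synthetic images, can be sketched as a simple dataset-mixing step. This is an illustrative sketch only; the function name, the `(image_path, label)` sample format, and the mixing ratio are assumptions, not the thesis's actual pipeline:

```python
import random

def build_training_set(real_samples, synthetic_samples, synth_ratio=0.5, seed=0):
    """Combine real and synthetic (image_path, label) samples into one
    shuffled training list.

    synth_ratio is the number of synthetic samples to draw, expressed as
    a fraction of the real set's size, so the synthetic portion scales
    with the real data rather than overwhelming it.
    """
    rng = random.Random(seed)
    n_synth = min(len(synthetic_samples), int(len(real_samples) * synth_ratio))
    # Keep every real sample; subsample the synthetic pool.
    mixed = list(real_samples) + rng.sample(synthetic_samples, n_synth)
    rng.shuffle(mixed)
    return mixed
```

In practice the returned list would feed an ordinary STR training loop; the experiments in the thesis suggest the synthetic portion improves recognition on the real test set.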