
Graduate Student: 謝公耀 (Kung-Yao Hsieh)
Thesis Title: 藉由參數化對抗生成模型定位對話框 (Robust Parametric GAN-based Balloon Localization)
Advisor: 賴祐吉 (Yu-Chi Lai)
Committee Members: 賴祐吉 (Yu-Chi Lai), 戴文凱 (Wen-Kai Tai), 花凱龍 (Kai-Lung Hua), 林士勛 (Shih-Syun Lin)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2022
Graduation Academic Year: 110
Language: Chinese
Number of Pages: 81
Keywords: Deep Learning, Faster R-CNN, GAN, Object Detection, Segmentation, Manga Image Analysis
Chinese Abstract:

    To add animation effects to manga and to translate their text, the text regions must first be detected. Because manga are black-and-white graphic images, speech balloons and onomatopoeia vary widely in form and are mixed with the background graphic elements. This thesis proposes a deep-learning-based attention mechanism that lets the model focus on text regions, and adds diverse region proposals to resolve the excessive overlap among manga text objects, so that the model has a clear target when computing its loss function.

    Segmentation algorithms such as flood fill and the active contour model are only suitable for closed regions; the active contour model further requires parameter tuning for every segmented region and cannot recover regions with sharp corners. In recent years deep learning has been used extensively for image segmentation, but most segmentation models are designed for images in color space. Binary images containing only black and white lack color information and brightness cues, so, unlike with color images, the rough extent of a target object cannot be found from color alone, and the boundaries are not softened during training, causing the segmentation results to exceed the actual boundaries. Generative adversarial network (GAN) training has also gradually been incorporated into segmentation networks, but merely distinguishing real from fake does not help discrimination and generation. This thesis therefore introduces a parametric generative adversarial network (PGAN) to segment speech balloon regions: under the supervision of the discriminator, the network learns text-region features and outward-extending sharp corner points, and boundary information is reinforced so that the segmentation results have sharper boundaries, unlike previous segmentation models whose smooth outputs cause the boundaries to be ignored.

    Finally, in addition to AP and mAP as evaluation metrics, this thesis evaluates speech balloon segmentation with the Chamfer distance (CD), which helps distinguish cases where the overlap is high but the details are not recovered. Additional manga are used for evaluation, including One Piece (two volumes), Black Jack (one volume), Dragon Ball (two volumes), Captain Tsubasa (one volume), GTO (one volume), Hunter x Hunter (one volume), Initial D (one volume), Inuyasha (one volume), JoJo's Bizarre Adventure (one volume), Kuroko's Basketball (two volumes), Slam Dunk (one volume), Bleach (one volume), Naruto (one volume), Nintama Rantarō (two volumes), and Attack on Titan (one volume). The localization and segmentation of speech balloons and onomatopoeia are evaluated separately, demonstrating that the proposed method can effectively extract speech balloons and onomatopoeia for subsequent manga translation and animation.
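For reference, the discriminator supervision described above is usually written in the generic adversarial-segmentation form below, where S is the segmentation network, D the discriminator, x a manga page, y the ground-truth balloon mask, and λ a weighting factor; this is a minimal sketch of the standard objective, not necessarily the exact PGAN formulation used in this thesis.

    \mathcal{L}_D = \ell_{\mathrm{bce}}\big(D(x, y), 1\big) + \ell_{\mathrm{bce}}\big(D(x, S(x)), 0\big),
    \qquad
    \mathcal{L}_S = \ell_{\mathrm{seg}}\big(S(x), y\big) + \lambda\,\ell_{\mathrm{bce}}\big(D(x, S(x)), 1\big)

The discriminator learns to tell ground-truth masks from predicted ones, while the segmentation network is additionally rewarded when its masks fool the discriminator, which is what pushes the predictions toward sharper, more realistic balloon boundaries.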


Abstract:

    In order to animate manga and translate their text, the text areas need to be detected. Because manga are black-and-white graphic images, speech balloons and onomatopoeia are highly variable and mixed with background graphic elements. This thesis proposes an attention mechanism based on deep learning, which allows the model to focus on text regions, and adds diverse region proposals to address the high overlap among manga text objects, so that the model can compute the loss function against a clear target.

    On the other hand, segmentation algorithms such as flood fill or the active contour model are only suitable for closed regions; the active contour model needs per-region parameter tuning and cannot find regions with sharp corners. In deep learning, most image segmentation models are designed for images in color space. In binary pictures with only black and white, color information is missing and there is no brightness to provide features, so the segmentation results extend beyond the actual boundaries. GAN training has gradually been added to segmentation networks; however, merely identifying real and fake does not help discrimination and generation. This thesis then introduces a parametric GAN to segment speech balloon regions and learns the text regions through the supervision of the discriminator. The discriminator learns text-region features and outward-extending sharp points, and enhances the boundary information, so that the segmentation results produce sharper boundaries, unlike previous segmentation models that output smooth results and cause the boundaries to be ignored.

    Finally, we use AP and mAP as evaluation metrics, and use the Chamfer distance as an additional evaluation method, which is advantageous for distinguishing situations where the overlap rate is high but the details are not revealed. We then evaluate our method on the localization and segmentation of text regions using the following manga, and demonstrate that it can effectively capture text regions for subsequent manga translation and animation: One Piece, Black Jack, Dragon Ball, Captain Tsubasa, GTO, Hunter x Hunter, Initial D, Inuyasha, JoJo's Bizarre Adventure, Kuroko's Basketball, Slam Dunk, Bleach, Naruto, Nintama Rantarō, Attack on Titan.
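For concreteness, the Chamfer distance used in the balloon segmentation evaluation can be computed between the boundaries of a predicted mask and a ground-truth mask. The sketch below shows one common symmetric form (mean nearest-boundary distance in both directions); the helper names and the use of NumPy/SciPy are illustrative assumptions, not the thesis's actual evaluation code.

    import numpy as np
    from scipy.spatial import cKDTree

    def boundary_points(mask):
        """Return (N, 2) coordinates of the boundary pixels of a binary mask."""
        mask = mask.astype(bool)
        padded = np.pad(mask, 1, constant_values=False)
        # A pixel is interior if all four 4-neighbours are also foreground.
        interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                    padded[1:-1, :-2] & padded[1:-1, 2:])
        boundary = mask & ~interior
        return np.argwhere(boundary)

    def chamfer_distance(mask_pred, mask_gt):
        """Symmetric Chamfer distance between the boundaries of two binary masks."""
        p = boundary_points(mask_pred)
        g = boundary_points(mask_gt)
        if len(p) == 0 or len(g) == 0:
            return float("inf")
        d_pg, _ = cKDTree(g).query(p)  # predicted boundary -> nearest GT boundary
        d_gp, _ = cKDTree(p).query(g)  # GT boundary -> nearest predicted boundary
        return d_pg.mean() + d_gp.mean()

Unlike IoU-style overlap metrics, this value grows whenever a predicted boundary misses fine structures such as outward-extending sharp corners, even if the overall overlap remains high.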

Table of Contents:
    Chinese Abstract
    Abstract
    Table of Contents
    List of Figures
    List of Tables
    1 Introduction
        1.1 Problem Definition
        1.2 Main Contributions
        1.3 Thesis Organization
    2 Related Work
        2.1 Manga Text Detection
            2.1.1 Detection Based on Text Characteristics
            2.1.2 Detection Based on Region Characteristics
        2.2 Segmentation of Manga Text Regions
            2.2.1 Segmentation of Speech Balloon Regions
            2.2.2 Segmentation of Onomatopoeia Glyphs
            2.2.3 Segmentation Based on Generative Adversarial Networks (GAN)
    3 Method Overview
    4 Manga Attention
        4.1 Constructing the Training Data
        4.2 Manga Attention Network Architecture
        4.3 Loss Function
    5 Text Region Segmentation
        5.1 Dataset Construction
        5.2 Parameter Definitions
        5.3 Network Architecture
        5.4 Loss Function
    6 Experimental Design, Results, and Discussion
        6.1 Dataset Description
        6.2 Evaluation Metrics
        6.3 Experimental Results
    7 Conclusion and Future Work
    References
    Authorization


    Full text available from 2025/09/26 (campus network)
    Full text available from 2027/09/26 (off-campus network)
    Full text available from 2027/09/26 (National Central Library: Taiwan Master's and Doctoral Thesis System)