| Field | Value |
|---|---|
| Graduate Student | 謝易家 Yi-Chia Hsieh |
| Thesis Title | 基於極值區域之即時場景文字偵測與辨識系統 (Real-Time Scene Text Detection and Recognition Using Extremal Region) |
| Advisor | 王乃堅 Nai-Jian Wang |
| Committee Members | 莊季高 Jih-Gau Juang, 郭景明 Jing-Ming Guo, 王乃堅 Nai-Jian Wang, 方劭云 Shao-Yun Fang, 鍾順平 Shun-Ping Chung |
| Degree | 碩士 Master |
| Department | 電資學院 - 電機工程系 (Department of Electrical Engineering, College of Electrical Engineering and Computer Science) |
| Year of Publication | 2017 |
| Graduation Academic Year | 105 |
| Language | Chinese |
| Pages | 82 |
| Chinese Keywords | 場景文字辨識、極值區域、非最大值抑制、局部二元模式、自適應增強演算法、鍊碼、支持向量機 |
| English Keywords | scene text recognition, extremal region, non-maximum suppression, local binary pattern, AdaBoost, chain-code, support vector machine |
In today's era of information explosion, multimedia has become an indispensable part of everyday life. Audio-visual content serves as a digital diary, a medium through which people record information, and in the process generates a large amount of text embedded in images. Text in an image usually carries useful information, such as the words on a road sign or a roadside storefront sign, so enabling computers to locate and exploit textual information in images is an important research topic. Scene text detection and recognition aims to extract the textual information in an image through image processing, enabling applications such as road-sign alerts, automatic translation, and image indexing that serve as tools to assist people.
This thesis proposes a fast text detection and recognition algorithm and develops a system that takes an image as input and produces the text it contains. The system consists of three main parts: (1) character candidate extraction, (2) character classification and grouping, and (3) character recognition. Character candidates are extracted using extremal regions as the text detector, operating over multiple channels including YCrCb and their inverted channels, and non-maximum suppression removes the large number of duplicated candidates. For character classification, we use the mean local binary pattern as the feature and train classifiers with Real AdaBoost. Classification proceeds in two stages, labelling candidates as strong text, weak text, or non-text; the two-stage design preserves both high recall and high precision. Weak texts whose properties resemble those of nearby strong texts are then tracked and recovered. Next, character grouping assembles individual characters into words. Finally, using chain-code directions as features and a support vector machine as the classifier, we recognize each character, determine its label, and display the result above the text.
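To make the candidate-pruning step concrete, the following is a minimal sketch of IoU-based non-maximum suppression over candidate bounding boxes. The box representation, the greedy score ordering, and the 0.5 overlap threshold are illustrative assumptions, not necessarily the exact overlap criterion used in the thesis:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0, ix2 - ix1), max(0, iy2 - iy1)
    inter = iw * ih
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep boxes in descending score order, dropping any
    candidate whose IoU with an already-kept box exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thresh for j in kept):
            kept.append(i)
    return kept
```

For example, of two heavily overlapping candidates only the higher-scoring one survives, while a distant candidate is kept regardless of its score.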
The experimental results show that on video sequences the system achieves real-time detection speed and near-real-time recognition speed. It can extract text of various fonts and sizes, and it tolerates moderate rotation, blur, and uneven illumination, demonstrating the robustness of the system. On the ICDAR 2013 dataset, our text detection achieves a detection rate of 70.4%.
In the era of information explosion, multimedia has become an indispensable part of modern life. People use videos and images as a digital diary, and consequently create an enormous amount of image text data. Text in images usually contains informative data, and therefore a scene text recognition system is a promising application.
This thesis presents a fast scene text localization and recognition algorithm. We have developed a system that takes images as input and recognizes the texts in them as output. The system consists of three parts: (1) character candidate extraction, (2) character classification and grouping, and (3) optical character recognition. In the first stage, extremal regions (ERs) are used as the candidate extractor. To achieve a high recall rate, we extract ERs in multiple channels, such as YCrCb and their inverted channels, and apply non-maximum suppression to eliminate overlapping candidates. In the second stage, we use the mean local binary pattern as the feature and train our classifiers with AdaBoost. Text candidates are classified as strong text, weak text, or non-text by a two-stage classifier, which is designed to retain high recall and high precision simultaneously. We then track weak texts that share similar properties with strong texts. The next step groups the candidates, transforming them from the character level to the word level. Finally, optical character recognition is performed using chain-code directions as features and a support vector machine as the classifier.
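The chain-code feature used in the recognition stage can be sketched as an 8-direction Freeman chain-code histogram computed along a character contour. The direction table and the normalization below are illustrative assumptions; the thesis pairs such features with an SVM classifier, which is omitted here:

```python
# Freeman chain-code: indices 0..7 for the 8 neighbour steps (dx, dy).
DIRS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]
DIR_INDEX = {d: i for i, d in enumerate(DIRS)}

def chain_code_histogram(contour):
    """Histogram of Freeman directions along a closed contour.

    `contour` is a list of (x, y) points in traversal order, with each
    consecutive pair (wrapping around) separated by one 8-neighbour step.
    The histogram is normalised so the feature is invariant to contour length.
    """
    hist = [0] * 8
    n = len(contour)
    for k in range(n):
        x0, y0 = contour[k]
        x1, y1 = contour[(k + 1) % n]
        hist[DIR_INDEX[(x1 - x0, y1 - y0)]] += 1
    total = float(sum(hist))
    return [h / total for h in hist]
```

In practice the contour would come from the extremal region's boundary, and the normalised 8-bin vector (possibly computed per sub-block of the character) would be fed to the SVM.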
The experimental results show that our system detects text in real time and recognizes text in near real time. In addition, the system detects text of different fonts and sizes, and tolerates moderate rotation, blurring, and inconsistent lighting. Thus, the robustness of the system is validated.