
Author: 謝易家 (Yi-Chia Hsieh)
Thesis title: 基於極值區域之即時場景文字偵測與辨識系統 (Real-Time Scene Text Detection and Recognition Using Extremal Region)
Advisor: 王乃堅 (Nai-Jian Wang)
Oral defense committee: 莊季高 (Jih-Gau Juang), 郭景明 (Jing-Ming Guo), 王乃堅 (Nai-Jian Wang), 方劭云 (Shao-Yun Fang), 鍾順平 (Shun-Ping Chung)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: Chinese
Pages: 82
Keywords: scene text recognition, extremal region, non-maximum suppression, local binary pattern, AdaBoost, chain-code, support vector machine

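In today's era of information explosion, multimedia has become an indispensable part of daily life. Videos and images serve as digital diaries and as a medium for recording information, and in the process they generate large amounts of text embedded in images. Text in an image usually carries useful information, such as the wording on a road sign or a roadside signboard, so enabling computers to find and use that text is an important research topic. Scene text detection and recognition aims to extract the text in an image through image processing and thereby support applications such as road-sign alerts, automatic translation, and image indexing as tools that assist people.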
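This thesis proposes a fast text detection and recognition algorithm and develops a system that takes an image as input and outputs the text it contains. The system has three main parts: (1) character candidate extraction, (2) character classification and grouping, and (3) text recognition. Character candidates are extracted with extremal regions as the text detector, across multiple channels including YCrCb and their inverted channels, and non-maximum suppression removes the large number of duplicate candidates. For classification, we use mean local binary patterns as features and train the classifier with Real AdaBoost. Classification has two stages and labels each candidate as strong text, weak text, or non-text; the two-stage design retains high recall and high precision at the same time. Weak text that shares similar properties with strong text is then tracked, after which character grouping combines individual letters into words. Finally, recognition uses chain-code direction as the feature and a support vector machine as the classifier to assign each character its label, and the result is displayed above the text.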
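The non-maximum suppression step mentioned above can be illustrated with a minimal sketch. It assumes each candidate is reduced to an axis-aligned bounding box with a hypothetical stability score and suppresses duplicates by greedy intersection-over-union. The abstract does not state the overlap criterion or scoring the thesis applies to extremal regions, so `iou`, `non_maximum_suppression`, and `iou_threshold` are illustrative names, not the author's implementation.

```python
from typing import List, Tuple

# Hypothetical box layout: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[int, int, int, int]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(
    boxes: List[Box], scores: List[float], iou_threshold: float = 0.5
) -> List[int]:
    """Return indices of the boxes that survive greedy suppression.

    Boxes are visited from highest to lowest score; a box is kept only
    if it does not overlap any already-kept box by iou_threshold or more.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < iou_threshold for k in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    candidates = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    stability = [0.9, 0.8, 0.7]  # assumed scores, higher is better
    print(non_maximum_suppression(candidates, stability))
```

In this sketch the second box overlaps the first almost entirely and is dropped, while the distant third box is kept.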
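The experimental results show that on video sequences our system achieves real-time detection speed and near-real-time recognition speed. It extracts text across a variety of fonts and sizes and tolerates moderate rotation, blur, and uneven lighting, which demonstrates the robustness of the system. On the ICDAR 2013 dataset, our text detection achieves a detection rate of 70.4%.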


In the era of information explosion, multimedia has become an indispensable part of modern life. People use videos and images as digital diaries and consequently create enormous amounts of image text data. Text in images usually carries useful information, so a scene text recognition system is a promising application.
This thesis presents a fast scene text localization and recognition algorithm. We have developed a system that takes images as input and outputs the text recognized in them. The system consists of three parts: (1) character candidate extraction, (2) character classification and grouping, and (3) optical character recognition. In the first stage, extremal regions (ERs) are used as the candidate extractor. To reach a high recall rate, we extract ERs in multiple channels, namely YCrCb and their inverted channels. A non-maximum suppression step eliminates overlapping candidates. In the second stage, we use the mean local binary pattern as the feature and train our classifier with AdaBoost. A two-stage classifier labels each text candidate as strong text, weak text, or non-text; the two stages are intended to maintain high recall and high precision simultaneously. We then track weak texts from strong texts as long as they have similar properties. The next step groups the candidates, moving from character level to word level. Finally, optical character recognition uses chain-code direction as the feature and a support vector machine as the classifier.
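The strong/weak/non-text labelling and the tracking of weak texts described above resemble hysteresis thresholding, which the following minimal sketch illustrates. The thresholds `low` and `high`, the `Candidate` fields, and the similarity test (height ratio and centre distance) are all assumptions chosen for illustration; the abstract does not specify which properties the thesis compares.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    x: float       # centre x of the bounding box (assumed representation)
    y: float       # centre y of the bounding box
    height: float  # box height in pixels, must be positive
    score: float   # classifier confidence on a hypothetical scale


def classify(c: Candidate, low: float = 0.0, high: float = 0.5) -> str:
    """Label a candidate with two thresholds (values are assumptions)."""
    if c.score >= high:
        return "strong"
    if c.score >= low:
        return "weak"
    return "non-text"


def similar(a: Candidate, b: Candidate,
            max_height_ratio: float = 1.5, max_gap: float = 3.0) -> bool:
    """Hypothetical similarity test: comparable height and nearby centres."""
    ratio = max(a.height, b.height) / min(a.height, b.height)
    dist = ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5
    return ratio <= max_height_ratio and dist <= max_gap * max(a.height, b.height)


def track_weak_text(candidates: List[Candidate]) -> List[Candidate]:
    """Keep strong candidates plus weak ones reachable from them."""
    accepted = [c for c in candidates if classify(c) == "strong"]
    pending = [c for c in candidates if classify(c) == "weak"]
    changed = True
    while changed:  # repeat so promoted weak texts can promote their neighbours
        changed = False
        for c in list(pending):
            if any(similar(c, s) for s in accepted):
                accepted.append(c)
                pending.remove(c)
                changed = True
    return accepted
```

Promoting weak candidates only when they attach to an accepted one is what lets a permissive low threshold preserve recall without admitting isolated false positives.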
The experimental results show that our system detects text in real time and recognizes text in near real time. In addition, the system detects text in different fonts and sizes and tolerates moderate rotation, blurring, and inconsistent lighting, which validates its robustness.
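As an illustration of the chain-code direction feature named in the abstract, the sketch below computes a Freeman 8-direction chain code along a closed contour and a normalized direction histogram. It assumes the contour is already available as an ordered list of 8-connected pixel coordinates in image space (y pointing down). The thesis may well compute the feature differently, for example per grid cell after geometric normalization, and the SVM training that would consume the vector is not shown.

```python
from typing import List, Tuple

Point = Tuple[int, int]

# Freeman codes in image coordinates (y grows downward):
# 0 = right, 2 = up, 4 = left, 6 = down, odd codes are the diagonals.
DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def chain_code(contour: List[Point]) -> List[int]:
    """Chain code of a closed contour given as ordered 8-connected points."""
    codes = []
    # Pair each point with its successor, wrapping around to close the loop.
    for (x0, y0), (x1, y1) in zip(contour, contour[1:] + contour[:1]):
        codes.append(DIRECTIONS[(x1 - x0, y1 - y0)])
    return codes


def direction_histogram(codes: List[int]) -> List[float]:
    """Normalized 8-bin histogram of chain-code directions."""
    hist = [0.0] * 8
    for c in codes:
        hist[c] += 1.0
    total = len(codes)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # toy contour
    print(chain_code(square))
    print(direction_histogram(chain_code(square)))
```

The resulting fixed-length vector is the kind of input a classifier such as an SVM expects, regardless of contour length.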

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Literature Review
1.3 Thesis Objectives
1.4 Thesis Organization
Chapter 2 System Architecture and Development Environment
2.1.1 System Architecture
2.1.2 System Overview
2.2 Development Environment
Chapter 3 Character Candidate Extraction and Non-Maximum Suppression
3.1 Character Candidate Regions
3.1.1 Extremal Region (ER)
3.1.2 Component Tree
3.2 ER Detection Algorithm
3.3 Non-Maximum Suppression
Chapter 4 Character Classification, Tracking, and Grouping
4.1 Character Classification
4.1.1 Adaptive Boosting (AdaBoost)
4.1.2 Local Binary Pattern (LBP)
4.2 Character Tracking
4.3 Character Grouping
Chapter 5 Text Recognition
5.1 Support Vector Machine (SVM)
5.2 Text Recognition Algorithm
5.2.1 Geometric Normalization
5.2.2 Chain-Code Direction Feature
5.2.3 Optimal Path Selection
5.2.4 Feedback Verification
5.2.5 Spell Checking
Chapter 6 Experimental Results and Analysis
6.1 Character-Level Recall Comparison
6.2 Text Detection on the ICDAR 2013 Dataset
6.3 Text Detection and Recognition in Image Sequences
6.3.1 Image Sequence 1
6.3.2 Image Sequence 2
6.3.3 Image Sequence 3
Chapter 7 Conclusions and Future Work
7.1 Conclusions
7.2 Future Work
References

