
Graduate Student: Chia-Wei Liu (劉家維)
Thesis Title: An Adaboost Approach to Detecting and Extracting Texts from Natural Scene Images (一個提昇自然場景影像之文字偵測與擷取效能的適應性方法)
Advisor: Chin-Shyurng Fahn (范欽雄)
Committee Members: Shaou-Gang Miaou (繆紹綱), Wei-Min Jeng (鄭為民), Yu-Te Wu (吳育德), Kuo-Liang Chung (鍾國亮)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Publication Year: 2006
Graduation Academic Year: 94 (ROC calendar; 2005-2006)
Language: English
Pages: 82
Keywords: Adaboost, weak classifier, text detection, connected component, wavelet transform, co-occurrence matrix

Texts in images and videos often carry a great deal of information, and text information extraction techniques have been widely applied to multimedia indexing and intelligent transportation systems. Accurately detecting and extracting text from images and videos is not easy, because the size, orientation, and arrangement of text are not fixed. In this thesis, we propose a connected-component-based text detection and extraction method. We first use Canny edge detection together with a linear-time connected-component labeling algorithm to locate candidate text blocks accurately, and then apply several rules derived from the characteristics of text as a preliminary filter, which greatly reduces the number of candidate blocks, shortens the subsequent classification time, and raises the accuracy. The remaining candidate blocks are then judged by a strong classifier trained with the Adaboost algorithm, and letters belonging to the same word are merged. Compared with other machine learning algorithms, Adaboost holds a large advantage in convergence speed, so we can frequently update our training samples to cope with various situations at little cost. Finally, we adopt an adaptive-threshold binarization method to extract the text, which succeeds even under uneven illumination. Experimental results show that both the text recall and precision rates of our method exceed 95%, and the overall execution performance of the system is also satisfactory.
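The candidate-block stage described above (edge detection, linear-time connected-component labeling, then a preliminary filter) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: a simple gradient-magnitude threshold stands in for the Canny operator, a flood fill replaces the linear-time labeling algorithm, and the filter thresholds (`edge_frac`, `min_area`, `max_aspect`) are assumed values.

```python
import numpy as np

def label_components(mask):
    """4-connected component labeling via flood fill.  (The thesis uses a
    linear-time sequential algorithm; this simple version is enough to
    illustrate the candidate-block stage.)"""
    H, W = mask.shape
    labels = np.zeros((H, W), dtype=int)
    count = 0
    for i in range(H):
        for j in range(W):
            if mask[i, j] and labels[i, j] == 0:
                count += 1
                stack = [(i, j)]
                labels[i, j] = count
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < H and 0 <= nx < W \
                           and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = count
                            stack.append((ny, nx))
    return labels, count

def candidate_blocks(gray, edge_frac=0.2, min_area=20, max_aspect=12.0):
    """Edge map -> connected components -> preliminary geometric filter.
    A gradient-magnitude threshold stands in for the Canny operator."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    edges = mag > edge_frac * mag.max()
    labels, n = label_components(edges)
    blocks = []
    for lab in range(1, n + 1):
        ys, xs = np.nonzero(labels == lab)
        h = ys.max() - ys.min() + 1
        w = xs.max() - xs.min() + 1
        # preliminary filter: drop blocks that are too small or too elongated
        if h * w >= min_area and max(h, w) / min(h, w) <= max_aspect:
            blocks.append((ys.min(), xs.min(), h, w))
    return blocks
```

In the thesis, the blocks surviving this preliminary filter are what get passed on to the Adaboost-trained classifier, which is why shrinking this set cheaply improves both speed and accuracy downstream.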


Texts in images and videos often contain a great deal of information, and text information extraction techniques have been widely applied to multimedia indexing and intelligent transportation systems. Precisely detecting and extracting texts from images and videos is difficult because text varies in size, orientation, and alignment. In this thesis, we propose a new connected-component-based text detection and extraction method. We first utilize the Canny operator and a linear-time connected-component labeling algorithm to find candidate blocks precisely. Several fundamental rules derived from the characteristics of text are then applied to screen out non-text blocks; reducing the number of candidate blocks both speeds up the subsequent classifier and improves its accuracy. Next, we distinguish text blocks from the remaining candidate blocks using a strong classifier trained by the Adaboost algorithm, and group the detected characters into words. Compared with other machine learning algorithms, Adaboost converges much faster, so we can update the training samples to handle a wide range of circumstances without much computational cost. Finally, we adopt an adaptive-threshold binarization method to extract the text regions; even under uneven illumination, texts can still be extracted successfully. Experimental results show that the text recall and precision rates are both above 95%, and the execution efficiency of the system is also satisfactory.
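The strong classifier above is a weighted vote of weak classifiers trained by Adaboost. The following minimal numpy sketch of discrete AdaBoost uses one-dimensional threshold stumps as the weak learner (an illustrative assumption, not the thesis's wavelet-feature classifier) to show the sample reweighting that underlies its fast convergence.

```python
import numpy as np

def train_stump(X, y, w):
    """Exhaustively pick the decision stump -- a (feature, threshold,
    polarity) triple predicting in {-1, +1} -- with minimum weighted error."""
    best, best_err = None, np.inf
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for p in (1, -1):
                pred = np.where(p * (X[:, f] - t) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best_err:
                    best, best_err = (f, t, p), err
    return best, best_err

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost (Freund and Schapire, 1996): each round fits a
    stump on reweighted samples, so later stumps concentrate on the
    examples earlier ones misclassified."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        (f, t, p), err = train_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)   # guard against log of 0
        alpha = 0.5 * np.log((1 - err) / err)   # stump's weight in the vote
        pred = np.where(p * (X[:, f] - t) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)          # upweight the mistakes
        w /= w.sum()
        ensemble.append((alpha, f, t, p))
    return ensemble

def predict(ensemble, X):
    """Strong classifier: sign of the alpha-weighted vote of the stumps."""
    score = sum(a * np.where(p * (X[:, f] - t) >= 0, 1, -1)
                for a, f, t, p in ensemble)
    return np.where(score >= 0, 1, -1)
```

Because each round only needs a weak learner slightly better than chance, retraining after adding new samples is cheap, which is the convergence-speed advantage the abstract refers to.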

Contents
Abstract
中文摘要 (Chinese Abstract)
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Overview
1.2 Background
1.3 Motivation
1.4 Thesis Organization
Chapter 2 Related Works
2.1 Artificial Neural Network
2.2 Adaboost
2.3 Support Vector Machines
2.4 Other Methods
Chapter 3 Our Proposed Method
3.1 Candidate Character Blocks
3.1.1 Canny Edge Detection
3.1.2 Connected Component Labeling
3.1.3 Preliminary Filter
3.2 Block Feature Extraction
3.2.1 Discrete Wavelet Transform
3.2.2 Wavelet Features
3.3 Verification of the Candidate Character Blocks
3.3.1 Adaboost
3.3.2 The Weak Classifier
3.4 Connected Component Fusion and Text Extraction
Chapter 4 Experimental Results and Discussion
4.1 Training Samples
4.2 Error Estimation of Different Models
4.3 System Evaluation and Data Analysis
Chapter 5 Conclusions and Future Works
References

References
[1] Y. Freund and R. E. Schapire, “Experiments with a new boosting algorithm,” in Proceedings of the 13th International Conference on Machine Learning, pp. 148-156, Bari, Italy, 1996.
[2] Q. Ye, W. Gao, W. Wang, and W. Zeng, “A robust text detection algorithm in
images and video frames,” in Proceedings of the IEEE Conference on Information, Communications, and Signal Processing, vol. 2, pp. 802-806, Singapore, 2003.
[3] H. Li, D. Doermann, and O. Kia, “Automatic text detection and tracking in
digital video,” IEEE Transactions on Image Processing, vol. 9, no. 1, pp. 147-156, 2000.
[4] K. Suzuki, I. Horiba, and N. Sugie, “Linear-time connected-component labeling based on sequential local operations,” Computer Vision and Image
Understanding, vol. 85, no. 1, pp. 1-23, 2003.
[5] J. Canny, “A computational approach to edge detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679-698, 1986.
[6] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd Ed.,
Prentice-Hall, Upper Saddle River, New Jersey, 2002.
[7] M. Acharyya and M. K. Kundu, “Document image segmentation using wavelet
scale-space features,” IEEE Transactions on Circuits and Systems for Video
Technology, vol. 12, no. 12, pp. 1117-1127, 2002.
[8] M. Kimachi, Y. Wu, and T. Aizawa, “Using Adaboost to detect and segment
characters from natural scenes,” in Proceedings of First International Workshop on Camera-Based Document Analysis and Recognition, pp. 52-59, Seoul, Korea, 2005.
[9] K. Jung, K. I. Kim, and A. K. Jain, “Text information extraction in images and video: a survey,” Pattern Recognition, vol. 37, no. 5, pp. 977-997, 2004.
[10] Y. Zhu, T. Tan, and Y. Wang, “Font recognition based on global texture analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 10, pp. 1192-1200, 2001.
[11] H. K. Kim, “Efficient automatic text location method and content-based indexing and structuring of video database,” Journal of Visual Communication and Image Representation, vol. 7, no. 4, pp. 336-344, 1996.
[12] Y. Zhong and A. K. Jain, “Object localization using color, texture, and shape,” Pattern Recognition, vol. 33, no. 4, pp. 671-684, 2000.
[13] S. Antani, R. Kasturi, and R. Jain, “A survey on the use of pattern recognition methods for abstraction, indexing, and retrieval of images and video,” Pattern Recognition, vol. 35, no. 4, pp. 945-965, 2002.
[14] M. Flickner and H. Sawhney, “Query by image and video content: the QBIC
system,” IEEE Computer, vol. 28, no. 9, pp. 23-32, 1995.
[15] M. A. Smith and T. Kanade, “Video skimming for quick browsing based on
audio and image characterization,” Technical Report CMU-CS-95-186, Carnegie
Mellon University, Pittsburgh, Pennsylvania, 1995.
[16] M. H. Yang, D. J. Kriegman, and N. Ahuja, “Detecting faces in images: a
survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.
24, no. 1, pp. 34-58, 2002.
[17] Y. Cui and Q. Huang, “Character extraction of license plates from video,” in Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, San Juan, Puerto Rico, pp. 502-507, 1997.
[18] B. T. Chun, Y. Bae, and T. Y. Kim, “Automatic text extraction in digital videos using FFT and neural network,” in Proceedings of the IEEE International Fuzzy Systems Conference, vol. 2, pp. 1112-1115, 1999.
[19] P. Viola and M. J. Jones, “Robust real-time object detection,” in Proceedings of the IEEE Workshop on Statistical and Computational Theories of Vision, Vancouver, Canada, 2001.
[20] X. Chen and A. L. Yuille, “Detecting and reading text in natural scenes,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, vol. 2, pp. 366-373, 2004.
[21] J. Friedman, T. Hastie, and R. Tibshirani, “Additive logistic regression: a statistical view of boosting,” The Annals of Statistics, vol. 28, no. 2, pp. 337-407, 2000.
[22] X. Chen, J. Yang, J. Zhang, and A. Waibel, “Automatic detection and recognition of signs from natural scenes,” IEEE Transactions on Image Processing, vol. 13, no. 1, pp. 87-99, 2004.
[23] R. E. Schapire and Y. Singer, “Improved boosting algorithms using
confidence-rated predictions,” Machine Learning, vol. 37, no. 3, pp. 297-336,
1999.
[24] R. Gottumukkal and V. K. Asari, “Real time face detection from color video stream based on PCA method,” in Proceedings of the Applied Imagery Pattern Recognition Workshop, Washington, DC, pp. 146-150, 2003.
[25] K. I. Kim, K. Jung, and J. H. Kim, “Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1631-1639, 2003.
[26] J. Sauvola, T. Seppanen, S. Haapakoski, and M. Pietikainen, “Adaptive
document binarization,” in Proceedings of the 4th International Conference on
Document Analysis and Recognition, Ulm, Germany, vol. 1, pp. 147-152, 1997.
[27] A. Vezhnevets, “GML Adaboost Matlab Toolbox,” Graphics and Media
Laboratory, Computer Science Department, Moscow State University, Moscow,
Russian Federation, http://research.graphicon.ru/.
[28] S. Z. Li, Z. Q. Zhang, H. Shum, and H. J. Zhang, “FloatBoost learning for
classification,” in Proceedings of the 16th Annual Conference on Neural
Information Processing Systems, Vancouver, Canada, pp. 993-1000, 2002.
[29] Y. Freund, “An adaptive version of the boost by majority algorithm,” Machine Learning, vol. 43, no. 3, pp. 293-318, 2001.
[30] W. Fan, S. J. Stolfo, J. Zhang, and P. K. Chan, “AdaCost: misclassification cost-sensitive boosting,” in Proceedings of the 16th International Conference on Machine Learning, Bled, Slovenia, pp. 97-105, 1999.
[31] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, Chapman and Hall, New York, 1984.

Full-text release date: 2011/07/20 (campus network)
Full text not authorized for public access (off-campus network)
Full text not authorized for public access (National Central Library: Taiwan NDLTD system)