
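Graduate student: Yi-cheng Wei (魏貽誠)
Thesis title: A Pyramidal Video Text Detection Method Based on SVM (利用支持向量機的影像文字偵測方法)
Advisor: Chang-hong Lin (林昌鴻)
Committee members: Shanq-jang Ruan (阮聖彰), Wei-mei Chen (陳維美), Chin-hsien Wu (吳晉賢), Mon-chau Shie (許孟超)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of publication: 2011
Graduation academic year: 99 (ROC calendar)
Language: Chinese
Number of pages: 62
Keywords (Chinese): K-means algorithm, discrete wavelet transform, text detection, support vector, principal component analysis
Keywords (English): pyramidal method, video text detection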
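  • With advances in technology and the increasingly rapid spread of digital video, research on video text detection has drawn growing attention. In this thesis, we propose a text detection method based on the support vector machine (SVM) that automatically detects text blocks in images, helping us understand the textual information they carry. Background complexity and variations in text size, color, and font make video text detection difficult; we address these problems with a pyramidal approach. First, we use bilinear interpolation to convert the input image into images of three sizes. Each of the three is converted into a grayscale gradient image, and the maximum gradient difference is computed for each. The K-means algorithm is then applied to cluster each maximum-gradient-difference image into two classes: text and non-text. Next, the three clustered connected-component images are resized to a common size and merged into a single connected-component image by a logical operation, and the connected components are labeled to produce the text regions in the image. Finally, projection profile analysis of the Sobel edge image of the input image determines the boundary of each candidate text block.
    In the text verification stage, candidate text blocks are verified in two phases. The first phase verifies the candidates using the geometric and texture features of text. The second phase applies wavelet decomposition to compute statistical features of the candidates that passed the first phase and reduces their dimensionality with the transformation matrix from principal component analysis (PCA). An SVM kernel function replaces the inner product of the samples in feature space, and sequential minimal optimization (SMO) is used to obtain the optimal decision function, which then decides whether each remaining candidate contains text. Experimental results show that the method is effective at detecting text blocks across different backgrounds, text sizes, colors, and languages, and that reducing the dimensionality of the statistical features also improves the overall execution performance of the system.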
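The candidate-generation step described in the abstract (maximum gradient difference followed by two-class K-means) can be sketched for a single image scale as follows. This is only an illustration under stated assumptions: the abstract does not give the gradient operator, the window size, or the K-means initialization, so the horizontal forward difference, the 1 x 5 window, and the min/max initialization are all hypothetical choices, as are the function names.

```python
def max_gradient_difference(gray, window=5):
    """Per-pixel (max - min) of the horizontal gradient inside a 1 x window span.

    Assumption: forward-difference gradient and window=5; the thesis may differ.
    """
    half = window // 2
    out = []
    for row in gray:
        # Horizontal forward difference, padded with 0 at the last column.
        grad = [row[x + 1] - row[x] for x in range(len(row) - 1)] + [0]
        out.append([
            max(grad[max(0, x - half): x + half + 1])
            - min(grad[max(0, x - half): x + half + 1])
            for x in range(len(row))
        ])
    return out


def two_means(values, iterations=20):
    """1-D K-means with k=2, initialized at the min and max (an assumption)."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_group:  # all values identical: nothing to separate
            break
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group)
        if (new_low, new_high) == (low, high):
            break
        low, high = new_low, new_high
    return low, high


def text_mask(gray, window=5):
    """Label a pixel 1 (text) when it is closer to the high-contrast center."""
    mgd = max_gradient_difference(gray, window)
    low, high = two_means([v for row in mgd for v in row])
    return [[1 if abs(v - high) < abs(v - low) else 0 for v in row] for row in mgd]


# Tiny synthetic image: flat rows around two rows with stroke-like alternation.
flat = [100] * 12
strokes = [100, 100, 100, 0, 255, 0, 255, 0, 255, 100, 100, 100]
demo = [flat, strokes, strokes, flat]
mask = text_mask(demo)
```

In the pyramidal scheme the abstract describes, a mask like this would be computed at each of the three image sizes, resized to a common size, and merged by a logical operation before connected-component labeling; those steps are not shown here.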


    As technology advances, transferring video data has become increasingly convenient, and research on text detection in video images has received growing attention. In this thesis, a new approach for detecting text in video images is presented. Differences in background complexity, font size, and color make it difficult to detect text regions in video images, and we adopt a pyramidal method to address these problems. First, we generate two downsized images from the input image using bilinear interpolation. Then, the input image and the two downsized images are converted into gradient images, and the maximum gradient difference at each pixel is computed separately for each of the three sizes. Next, the K-means algorithm is applied to each maximum gradient difference image to separate the pixels into two clusters, text and non-text, and the clustering results are combined to form the text regions. After that, projection profile analysis is applied to the Sobel edge map of the text regions to determine the boundaries of the text candidates. Finally, each text candidate is checked by a two-phase verification. In the first phase, text candidates are verified based on their geometrical properties and textures. In the second phase, statistical features of the candidates that passed the first phase are computed through the discrete wavelet transform, and principal component analysis is employed to reduce the dimensionality of these features. We then apply the optimal decision function of the support vector machine, obtained by sequential minimal optimization, to verify whether each text candidate contains text. Experimental results show that our method is effective for detecting text in images with different font sizes, colors, languages, and background complexity; moreover, reducing the dimensionality of the text candidate features also improves system performance in the second verification phase.
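    To make the second verification phase more concrete, the sketch below computes a one-level 2-D Haar wavelet decomposition of a candidate block and summarizes each detail subband with two statistics. It is an illustration under assumptions: the abstract does not say which wavelet, how many decomposition levels, or which statistics the thesis uses, so the Haar filter, the single level, and the mean-absolute-value and standard-deviation features are hypothetical, as are the function names.

```python
import math


def haar_dwt2(block):
    """One-level orthonormal 2-D Haar transform of an even-sized block.

    Returns (approx, row_diff, col_diff, diag), each a list of rows.
    """
    approx, row_diff, col_diff, diag = [], [], [], []
    for y in range(0, len(block) - 1, 2):
        bands = ([], [], [], [])
        for x in range(0, len(block[0]) - 1, 2):
            a, b = block[y][x], block[y][x + 1]
            c, d = block[y + 1][x], block[y + 1][x + 1]
            bands[0].append((a + b + c + d) / 2)  # local average
            bands[1].append((a + b - c - d) / 2)  # change between the two rows
            bands[2].append((a - b + c - d) / 2)  # change between the two columns
            bands[3].append((a - b - c + d) / 2)  # diagonal change
        approx.append(bands[0])
        row_diff.append(bands[1])
        col_diff.append(bands[2])
        diag.append(bands[3])
    return approx, row_diff, col_diff, diag


def subband_stats(band):
    """Mean absolute coefficient and standard deviation of one subband."""
    values = [v for row in band for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [sum(abs(v) for v in values) / len(values), math.sqrt(variance)]


def wavelet_features(block):
    """Six-element feature vector built from the three detail subbands.

    Assumption: these statistics are illustrative, not the thesis's feature set.
    """
    _, row_diff, col_diff, diag = haar_dwt2(block)
    return subband_stats(row_diff) + subband_stats(col_diff) + subband_stats(diag)


# Vertical stripes respond only in the column-difference subband.
stripes = [[0, 255, 0, 255] for _ in range(4)]
features = wavelet_features(stripes)
```

    In the pipeline the abstract describes, a vector like this would be projected onto a reduced set of principal components and passed to a trained SVM decision function; neither step is shown here.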

    English Abstract
    Chinese Abstract
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Motivation
      1.3 System Architecture
      1.4 Thesis Overview
    Chapter 2 Related Theory
      2.1 Literature Review
      2.2 K-means Clustering Algorithm
      2.3 Texture Analysis
        2.3.1 Local Binary Pattern
        2.3.2 Gray-Level Co-occurrence Matrix
      2.4 Discrete Wavelet Transform
      2.5 Principal Component Analysis
        2.5.1 Theoretical Basis of Principal Component Analysis
        2.5.2 Fast Dimensionality Reduction with Principal Component Analysis
      2.6 Support Vector Machine
        2.6.1 Linearly Separable Support Vector Machine
        2.6.2 Linearly Non-separable Support Vector Machine
        2.6.3 Kernel Functions of the Support Vector Machine
        2.6.4 Multi-class Support Vector Machine
    Chapter 3 Video Text Detection System
      3.1 Generating Candidate Text Blocks
        3.1.1 Image Edge Detection
        3.1.2 K-means Clustering
        3.1.3 Determining Candidate Text Block Boundaries
      3.2 First-Phase Verification of Candidate Text Blocks
        3.2.1 Verifying Candidate Text Blocks with Local Binary Patterns
      3.3 Second-Phase Verification of Candidate Text Blocks
        3.3.1 Extracting Candidate Text Block Features with the Wavelet Transform
        3.3.2 Verifying Candidate Text Blocks with the Support Vector Machine
    Chapter 4 Experimental Results
      4.1 Experimental Equipment and Testing
        4.1.1 System Configuration
        4.1.2 Experimental Data and Testing
      4.2 Analysis of Experimental Data
      4.3 Text Detection Results
    Chapter 5 Conclusions and Future Directions
      5.1 Conclusions
      5.2 Future Directions
    References


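    Full-text release date: 2016/08/03 (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
    Full-text release date: full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)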