
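Author: 魏愛津
Ai-Jin Wei
Thesis title: 相似字元相似對的生成最佳化
Optimization of Finding Confusing Pairs for Similar Words
Advisor: 楊英魁
Ying-Kuei Yang
Oral examination committee: 黎碧煌
Bih-Hwang Lee
孫宗瀛
Tsung-Ying Sun
謝鴻琳
Horng-Lin Shieh
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: Chinese
Number of pages: 59
Keywords (translated from Chinese): cluster-membership-degree vector, cluster sequence, confusing pair, character recognition, similarity
Keywords (English): cluster-membership-degree vector, cluster sequence, confusing pair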
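  • Character recognition systems have many applications, such as verifying the authenticity of signatures in security mechanisms or recognizing postal codes so that mail can be sorted quickly by region. One of the difficulties for a recognition system, however, is that similar characters are hard to tell apart. According to the experiments of Chiang et al., the recognition rate obtained by keeping the top few most likely candidates is higher than the rate obtained by keeping only the single most likely candidate, because similar character shapes interfere with the recognition result. To raise the recognition rate, many studies focus on judging whether character outlines belong to similar characters. Others compute the distance from a sample to the template characters and take the classes of the few nearest templates as similar characters. Still others find confusing pairs and automatically analyze the critical region of each pair: similar characters are grouped into confusing pairs, and Fisher Linear Discriminant Analysis is used to locate the regions in which the similar characters differ.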

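    Comparing the methods of Leung et al. and Gao et al., the number of confusing pairs has already been greatly reduced. This thesis nevertheless holds that there is still room to reduce the number of confusing pairs. It therefore proposes using Fuzzy C-Means clustering to cluster, over part of the images, the data belonging to the same PCA feature parameter, and using a cluster code to denote, for each feature parameter of a sample, the cluster in which it has the largest membership degree. The PCA feature parameters of an image are then represented by a cluster sequence composed of these cluster codes, and the corresponding cluster-membership-degree vector is found. The resulting cluster sequences serve as templates. Another set of samples is taken, its cluster-membership-degree vectors are computed for the case in which the cluster sequence matches the template, and the similarity of the two sets of samples is calculated from their cluster-membership-degree vectors. The confusing pairs of Leung et al. are obtained by using a Bayes classifier to compute the probability of occurrence of the PCA feature parameters. Because similar characters necessarily have similar features, this thesis argues that computing the feature similarity of two samples can further screen and reduce the confusing pairs. This also reduces the number of characters passed to second-stage recognition and the time spent in second-stage recognition, without sacrificing the recognition rate of the system.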


    Character recognition systems can be applied in many areas, such as signature verification for security or zip-code recognition on mail. However, one of the difficulties in character recognition is accurately distinguishing similar characters. According to the experiment by Chiang et al., selecting the top few candidates yields a higher recognition rate than selecting only the top candidate, which indicates that similar characters affect the recognition result. To increase the recognition rate, many studies focus on how to identify similar characters. Some identify similar characters from their outlines. Others calculate the distance between a sample and the class prototypes and take the few nearest prototypes as similar characters. Another approach finds confusing pairs of similar characters and automatically detects the critical regions in each confusing pair.

    Comparing the results of Leung et al. and Gao et al., Leung et al. dramatically decreased the number of confusing pairs. However, the number of confusing pairs in Leung et al.'s result can still be reduced. This thesis therefore clusters the PCA feature parameters with Fuzzy C-Means, defines cluster sequences to represent each sample, finds the cluster-membership-degree vector, and then calculates the degree of similarity between samples. Leung et al.'s confusing pairs are found by a Bayes classifier, which is based mainly on probability. Because similar characters must share some similar features, the degree of feature similarity between two characters is a suitable way to further reduce the number of confusing pairs, and with it the recognition time, without substantially lowering the recognition rate.
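
    The steps named in the abstract (per-feature Fuzzy C-Means, cluster sequence, cluster-membership-degree vector, similarity) can be illustrated with a minimal sketch. It is not the thesis implementation. The function names, the fuzzifier m = 2, the quantile-based initialization, and in particular the similarity measure (one minus the mean absolute difference of two membership vectors) are assumptions made only for illustration, since the abstract does not give the actual formula.

```python
import numpy as np

def memberships_1d(values, centers, m=2.0):
    """Standard FCM membership of each scalar value in each cluster center."""
    values = np.asarray(values, dtype=float)
    # Small epsilon avoids division by zero when a value sits on a center.
    d = np.abs(values[:, None] - np.asarray(centers)[None, :]) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm_1d(values, n_clusters, m=2.0, n_iter=100, tol=1e-6):
    """Fuzzy C-Means on one feature column; returns the cluster centers."""
    values = np.asarray(values, dtype=float)
    # Assumed deterministic initialization: centers at evenly spaced quantiles.
    centers = np.quantile(values, np.linspace(0.0, 1.0, n_clusters))
    for _ in range(n_iter):
        w = memberships_1d(values, centers, m) ** m
        new_centers = (w * values[:, None]).sum(axis=0) / w.sum(axis=0)
        converged = np.max(np.abs(new_centers - centers)) < tol
        centers = new_centers
        if converged:
            break
    return centers

def fit_feature_clusters(features, n_clusters, m=2.0):
    """Cluster each PCA feature parameter (column) separately."""
    return [fcm_1d(features[:, j], n_clusters, m)
            for j in range(features.shape[1])]

def cluster_sequence(sample, centers_per_feature, m=2.0):
    """Cluster code with the largest membership per feature, plus those degrees."""
    codes, degrees = [], []
    for x, centers in zip(sample, centers_per_feature):
        u = memberships_1d([x], centers, m)[0]
        codes.append(int(np.argmax(u)))
        degrees.append(float(u.max()))
    return np.array(codes), np.array(degrees)

def degrees_for_sequence(sample, codes, centers_per_feature, m=2.0):
    """Membership of `sample` in the clusters named by a template sequence."""
    return np.array([memberships_1d([x], centers, m)[0][k]
                     for x, centers, k in zip(sample, centers_per_feature, codes)])

def similarity(deg_a, deg_b):
    """Assumed placeholder measure: 1 - mean absolute membership difference."""
    return 1.0 - float(np.mean(np.abs(np.asarray(deg_a) - np.asarray(deg_b))))
```

    In this sketch, a template character would be summarized by `cluster_sequence`, a second character would be scored against that sequence with `degrees_for_sequence`, and a candidate confusing pair would be discarded when `similarity` falls below a threshold chosen by experiment.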

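    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Equations
    List of Tables
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Objectives and Methods
      1.3 Thesis Organization
    Chapter 2 Literature Review
      2.1 Offline Recognition Systems
        2.1.1 Image Preprocessing
        2.1.2 Feature Extraction
        2.1.3 Recognition System
      2.2 Similar Characters
        2.2.1 Grouping Similar Characters Before Recognition
        2.2.2 Two-Stage Recognition
      2.3 Experimental System Architecture
        2.3.1 Image Preprocessing
        2.3.2 Feature Extraction
        2.3.3 Finding Confusing Pairs
        2.3.4 Critical Region Analysis
    Chapter 3 Elimination of Invalid Confusing Pairs
      3.1 Fuzzy C-Means Clustering
      3.2 Cluster Sequences and Cluster-Membership-Degree Vectors
      3.3 Similarity Calculation and Elimination of Confusing Pairs
      3.4 Recognition System
    Chapter 4 Experimental Results and Analysis
      4.1 Threshold for the Number of Extracted PCA Feature Parameters
      4.2 System Recognition
        4.2.1 Eliminating Invalid Confusing Pairs
        4.2.2 Optimal Parameters
      4.3 Other Methods of Finding Confusing Pairs
    Chapter 5 Conclusion
    References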

    [1] G. A. Montazer, H. Q. Saremi, and V. Khatibi, “A neuro-fuzzy inference engine for Farsi numeral characters recognition,” Expert Systems with Applications: An International Journal, vol. 37, issue 9, pp. 6327 – 6337, September 2010.
    [2] Y. M. Al-Omari, S. N. H. S. Abdullah, and K. Omar, “State-of-the-Art in offline signature verification system,” 2011 International Conference on Pattern Analysis and Intelligent Robotics, vol. 1, pp. 59 – 64, 28-29 June 2011.
    [3] J. Xue, X. Ding, C. Liu, R. Zhang, and W. Qian, “Location and interpretation of destination addresses on handwritten Chinese envelopes,” Pattern Recognition Letters, vol. 22, issue 6-7, pp. 639 – 656, May 2001.
    [4] K. C. Leung and C. H. Leung, “Recognition of handwritten Chinese characters by critical region analysis,” Pattern Recognition, vol. 43, issue 3, pp. 949 – 961, March 2010.
    [5] T. F. Gao and C. L. Liu, “High accuracy handwritten Chinese character recognition using LDA-based compound distances,” Pattern Recognition, vol. 41, issue 11, pp. 3442 – 3451, November 2008.
    [6] H. Yamada, K. Yamamoto, and T. Saito, “A nonlinear normalization method for handprinted Kanji character recognition -- Line density equalization,” Pattern Recognition, vol. 23, issue 9, pp. 1023 – 1029, 1990.
    [7] J. Tsukumo and H. Tanaka, “Classification of handprinted Chinese characters using nonlinear normalization and correlation methods,” 9th International Conference on Pattern Recognition, vol. 45, issue 5, pp. 168 – 171, 14-17 November 1988.
    [8] C. L. Liu and K. Marukawa, “Pseudo two-dimensional shape normalization methods for handwritten Chinese character recognition,” Pattern Recognition, vol. 38, issue 12, pp. 2242 – 2255, December 2005.
    [9] L. Lam, S. W. Lee, and C. Y. Suen, “Thinning methodologies -- A comprehensive survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, issue 9, pp. 869 – 885, September 1992.
    [10] L. Y. Tseng and C. T. Chuang, “An efficient knowledge-based stroke extraction method for multi-font Chinese characters,” Pattern Recognition, vol. 25, issue 12, pp. 1445 – 1458, 28 April 1992.
    [11] H. M. Lee, C. W. Huang, and C. C. Sheu, “A fuzzy rule-based system for handwritten Chinese characters recognition based on radical extraction,” Fuzzy Sets and Systems, vol. 100, issue 1-3, pp. 59 – 70, 1998.
    [12] Y. Chen, X. Ding, and Y. Wu, “Off-line handwritten Chinese character recognition based on crossing line feature,” Proceedings of the Fourth International Conference on Document Analysis and Recognition, vol. 1, pp. 206 – 210, 18-20 Aug 1997.
    [13] T. H. Su, T. W. Zhang, D. J. Guan, and H. J. Huang, “Off-line recognition of realistic Chinese handwriting using segmentation-free strategy,” Pattern Recognition, vol. 42, issue 1, pp. 167 – 182, January 2009.
    [14] D. Shi, R. I. Damper, and S. R. Gunn, “Offline handwritten Chinese character recognition by radical decomposition,” ACM Transactions on Asian language information processing (TALIP), vol. 2, issue 1, pp. 27 – 48, March 2003.
    [15] C. H. Chou, C. C. Lin, Y. H. Liu, and F. Chang, “A prototype classification method and its use in a hybrid solution for multiclass pattern recognition,” Pattern Recognition, vol. 39, issue 4, pp. 624 – 634, April 2006.
    [16] K. Fukushima, “Neocognitron trained with winner-kill-loser rule,” Neural Networks, vol. 23, issue 7, pp. 926 – 938, September 2010.
    [17] C. C. Chiang and S. S. Yu, “A method for improving the machine recognition of confusing Chinese characters,” Proceedings of the 13th International Conference on Pattern Recognition, vol. 3, pp. 79 – 83, 25-29 Aug 1996.
    [18] F. Yang, X. D. Tian, X. Zhang, and X. B. Jia, “An Improved method for similar handwritten Chinese character recognition,” Proceedings of the 2010 Third International Symposium on Intelligent Information Technology and Security Informatics, pp. 419 – 422, 2010.
    [19] N. Liu, H. Wang, and W. Y. Yau, “Face recognition with weighted kernel principal component analysis,” 9th International Conference on Control, Automation, Robotics and Vision, pp. 1 – 5, 5-8 December 2006.
    [20] J. S. R. Jang, C. T. Sun, and E. Mizutani, Neuro-fuzzy and soft computing, Pearson Education Taiwan Ltd., Taipei, 2004.

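    Full-text release date: 2017/07/24 (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
    Full-text release date: full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)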