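Graduate student: | 魏愛津 Ai-Jin Wei |
---|---|
Thesis title: | 相似字元相似對的生成最佳化 Optimization of Finding Confusing Pairs for Similar Words |
Advisor: | 楊英魁 Ying-Kuei Yang |
Oral defense committee: | 黎碧煌 Bih-Hwang Lee, 孫宗瀛 Tsung-Ying Sun, 謝鴻琳 Horng-Lin Shieh |
Degree: | 碩士 Master |
Department: | College of Electrical Engineering and Computer Science (電資學院) - Department of Electrical Engineering (電機工程系) |
Year of publication: | 2012 |
Academic year of graduation: | 100 |
Language: | Chinese |
Number of pages: | 59 |
Chinese keywords: | 類群歸屬度向量 (cluster-membership-degree vector), 類群序列 (cluster sequence), 相似對 (confusing pair), 文字辨識 (character recognition), 相似度 (similarity) |
English keywords: | cluster-membership-degree vector, cluster sequence, confusing pair |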
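Character recognition systems have many applications, such as security mechanisms that must verify whether a signature is genuine, or the recognition of postal codes so that mail can be sorted quickly by region. One difficulty for such systems, however, is that similar characters are hard to tell apart. According to the experiments of Chiang et al., taking the few most likely answers gives a higher recognition rate than taking only the single most likely answer, because similar character shapes interfere with the recognition result. To raise the recognition rate, many studies focus on judging whether character outlines are similar. Others compute the distance from a sample to the prototype characters and take the classes of the few nearest prototypes as the similar characters. Still others find confusing pairs and automatically analyze their critical regions: similar characters are grouped into confusing pairs, and Fisher Linear Discriminant Analysis is used to find the regions in which the similar characters differ.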
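Comparing the methods of Leung et al. and Gao et al., the number of confusing pairs has already been greatly reduced, but this thesis argues that there is still room to reduce it further. It therefore proposes using Fuzzy C-Means clustering to cluster, over a subset of the images, the data belonging to the same PCA feature parameter, and to represent by a cluster code the cluster in which each feature parameter of a sample has the greatest membership degree. The PCA feature parameters of an image are then represented by a cluster sequence made up of these cluster codes, and the corresponding cluster-membership-degree vector is found. The cluster sequences obtained in this way serve as templates. Another set of samples is taken, and for each sample the cluster-membership-degree vector is computed for the case in which its cluster sequence equals that of the template; the similarity of the two sets of samples is then computed from their cluster-membership-degree vectors. The confusing pairs of Leung et al. are obtained by using a Bayes classifier to compute the probability of occurrence of the PCA feature parameters. However, because similar characters must have similar features, this thesis argues that computing the feature similarity of two samples can further screen and reduce the confusing pairs, and can also reduce the number of characters that enter second-stage recognition and the time that stage takes, without sacrificing the recognition rate of the system.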
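The clustering steps described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract gives no formulas, so the sketch assumes the textbook Fuzzy C-Means membership and center updates with fuzzifier `m = 2`, treats each PCA feature parameter as a scalar clustered independently, and uses one minus the mean absolute difference as a stand-in similarity measure. All function names and the toy data are hypothetical.

```python
def fcm_memberships(x, centers, m=2.0):
    """Textbook FCM membership of scalar x in each cluster center."""
    dists = [abs(x - c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]


def fcm_fit_1d(values, n_clusters, m=2.0, iters=50):
    """Fit 1-D FCM centers by alternating membership and center updates."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly over the data range.
    centers = [lo + (hi - lo) * (k + 0.5) / n_clusters for k in range(n_clusters)]
    for _ in range(iters):
        u = [fcm_memberships(v, centers, m) for v in values]
        for k in range(n_clusters):
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            if total > 0.0:  # keep the old center if no point supports it
                centers[k] = sum(w * v for w, v in zip(weights, values)) / total
    return centers


def cluster_sequence(features, centers_per_feature, m=2.0):
    """Cluster code with the greatest membership per feature, plus those degrees."""
    seq, degrees = [], []
    for x, centers in zip(features, centers_per_feature):
        u = fcm_memberships(x, centers, m)
        best = max(range(len(u)), key=u.__getitem__)
        seq.append(best)
        degrees.append(u[best])
    return seq, degrees


def degrees_for_sequence(features, centers_per_feature, seq, m=2.0):
    """Membership degrees of a sample in the clusters named by a template sequence."""
    return [fcm_memberships(x, c, m)[k]
            for x, c, k in zip(features, centers_per_feature, seq)]


def similarity(a, b):
    """Assumed measure (not from the thesis): 1 minus mean absolute difference."""
    return 1.0 - sum(abs(p - q) for p, q in zip(a, b)) / len(a)


# Toy data: two feature parameters, six training values each (hypothetical).
train = [[0.0, 0.1, 0.2, 0.9, 1.0, 1.1], [5.0, 5.2, 4.8, 9.0, 9.1, 8.9]]
centers = [fcm_fit_1d(column, n_clusters=2) for column in train]
template_seq, template_deg = cluster_sequence([0.1, 9.0], centers)
other_deg = degrees_for_sequence([0.15, 8.8], centers, template_seq)
print(template_seq, round(similarity(template_deg, other_deg), 3))
```

A pair of characters would then be kept as a confusing pair only if this similarity exceeds some threshold; the threshold and the exact measure would have to come from the thesis itself.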
Character recognition systems can be applied in many areas, such as signature verification for security or zip-code recognition for sorting mail. However, one of the difficulties in character recognition is accurately distinguishing similar characters. According to the experiments of Chiang et al., selecting the top few candidates yields a higher recognition rate than selecting only the top candidate, which indicates that similar characters interfere with the recognition result. To increase the recognition rate, many studies focus on how to identify similar characters. Some identify similar characters from their outlines. Others calculate the distance between a sample and the prototype of each class and take the classes of the few nearest prototypes as similar characters. Another approach finds confusing pairs of similar characters and automatically detects the critical regions within each pair.
Comparing the results of Leung et al. with those of Gao et al., the number of confusing pairs is dramatically lower in the work of Leung et al. However, starting from the result of Leung et al., it is still possible to reduce the number of confusing pairs. This thesis therefore clusters the PCA feature parameters with Fuzzy C-Means, defines a cluster sequence to represent each sample, finds the corresponding cluster-membership-degree vector, and then calculates the similarity degree between samples. The confusing pairs of Leung et al. are found with a Bayes classifier, which is based mainly on probability. Because similar characters must share some similar features, the feature similarity degree between two characters is a suitable way to further reduce the number of confusing pairs, and with it the recognition time, without much impact on the recognition rate.
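The observation attributed to Chiang et al., that keeping the top few candidates gives a higher recognition rate than keeping only the top one, can be illustrated with a small nearest-prototype sketch in Python. The prototypes, samples, labels, and the use of Euclidean distance are all hypothetical choices for illustration; they are not taken from the thesis or from the cited experiments.

```python
import math


def nearest_classes(sample, prototypes, k):
    """Labels of the k prototypes closest to the sample (Euclidean distance)."""
    ranked = sorted(prototypes, key=lambda item: math.dist(sample, item[1]))
    return [label for label, _ in ranked[:k]]


def top_k_rate(samples, prototypes, k):
    """Fraction of samples whose true label is among the k nearest classes."""
    hits = sum(1 for truth, vec in samples
               if truth in nearest_classes(vec, prototypes, k))
    return hits / len(samples)


# Hypothetical 2-D features: "O" and "Q" are nearly identical, "X" is far away.
prototypes = [("O", (0.0, 0.0)), ("Q", (0.2, 0.0)), ("X", (5.0, 5.0))]
samples = [
    ("O", (0.05, 0.0)),
    ("Q", (0.09, 0.0)),  # a "Q" that lands slightly closer to the "O" prototype
    ("Q", (0.25, 0.0)),
    ("X", (4.8, 5.1)),
]

print(top_k_rate(samples, prototypes, 1))  # top-1 misses the ambiguous "Q"
print(top_k_rate(samples, prototypes, 2))  # top-2 recovers it
```

In this toy setting the only top-1 error comes from the "O"/"Q" pair, which is the kind of pair a second recognition stage for confusing pairs is meant to resolve.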
[1] G. A. Montazer, H. Q. Saremi, and V. Khatibi, “A neuro-fuzzy inference engine for Farsi numeral characters recognition,” Expert Systems with Applications: An International Journal, vol. 37, issue 9, pp. 6327 – 6337, September 2010.
[2] Y. M. Al-Omari, S. N. H. S. Abdullah, and K. Omar, “State-of-the-Art in offline signature verification system,” 2011 International Conference on Pattern Analysis and Intelligent Robotics, vol. 1, pp. 59 – 64, 28-29 June 2011.
[3] J. Xue, X. Ding, C. Liu, R. Zhang, and W. Qian, “Location and interpretation of destination addresses on handwritten Chinese envelopes,” Pattern Recognition Letters, vol. 22, issue 6-7, pp. 639 – 656, May 2001.
[4] K. C. Leung and C. H. Leung, “Recognition of handwritten Chinese characters by critical region analysis,” Pattern Recognition, vol. 43, issue 3, pp. 949 – 961, March 2010.
[5] T. F. Gao and C. L. Liu, “High accuracy handwritten Chinese character recognition using LDA-based compound distances,” Pattern Recognition, vol. 41, issue 11, pp. 3442 – 3451, November 2008.
[6] H. Yamada, K. Yamamoto, and T. Saito, “A nonlinear normalization method for handprinted Kanji character recognition – Line density equalization,” Pattern Recognition, vol. 23, issue 9, pp. 1023 – 1029, 1990.
[7] J. Tsukumo and H. Tanaka, “Classification of handprinted Chinese characters using nonlinear normalization and correlation methods,” 9th International Conference on Pattern Recognition, vol. 45, issue 5, pp. 168 – 171, 14-17 November 1988.
[8] C. L. Liu and K. Marukawa, “Pseudo two-dimensional shape normalization methods for handwritten Chinese character recognition,” Pattern Recognition, vol. 38, issue 12, pp. 2242 – 2255, December 2005.
[9] L. Lam, S. W. Lee, and C. Y. Suen, “Thinning methodologies -- A comprehensive survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, issue 9, pp. 869 – 885, September 1992.
[10] L. Y. Tseng and C. T. Chuang, “An efficient knowledge-based stroke extraction method for multi-font Chinese characters,” Pattern Recognition, vol. 25, issue 12, pp. 1445 – 1458, 28 April 1992.
[11] H. M. Lee, C. W. Huang, and C. C. Sheu, “A fuzzy rule-based system for handwritten Chinese characters recognition based on radical extraction,” Fuzzy Sets and Systems, vol. 100, issue 1-3, pp. 59 – 70, 1998.
[12] Y. Chen, X. Ding, and Y. Wu, “Off-line handwritten Chinese character recognition based on crossing line feature,” Proceedings of the Fourth International Conference on Document Analysis and Recognition, vol. 1, pp. 206 – 210, 18-20 Aug 1997.
[13] T. H. Su, T. W. Zhang, D. J. Guan, and H. J. Huang, “Off-line recognition of realistic Chinese handwriting using segmentation-free strategy,” Pattern Recognition, vol. 42, issue 1, pp. 167 – 182, January 2009.
[14] D. Shi, R. I. Damper, and S. R. Gunn, “Offline handwritten Chinese character recognition by radical decomposition,” ACM Transactions on Asian Language Information Processing (TALIP), vol. 2, issue 1, pp. 27 – 48, March 2003.
[15] C. H. Chou, C. C. Lin, Y. H. Liu, and F. Chang, “A prototype classification method and its use in a hybrid solution for multiclass pattern recognition,” Pattern Recognition, vol. 39, issue 4, pp. 624 – 634, April 2006.
[16] K. Fukushima, “Neocognitron trained with winner-kill-loser rule,” Neural Networks, vol. 23, issue 7, pp. 926 – 938, September 2010.
[17] C. C. Chiang and S. S. Yu, “A method for improving the machine recognition of confusing Chinese characters,” Proceedings of the 13th International Conference on Pattern Recognition, vol. 3, pp. 79 – 83, 25-29 Aug 1996.
[18] F. Yang, X. D. Tian, X. Zhang, and X. B. Jia, “An improved method for similar handwritten Chinese character recognition,” Proceedings of the 2010 Third International Symposium on Intelligent Information Technology and Security Informatics, pp. 419 – 422, 2010.
[19] N. Liu, H. Wang, and W. Y. Yau, “Face recognition with weighted kernel principal component analysis,” 9th International Conference on Control, Automation, Robotics and Vision, pp. 1 – 5, 5-8 December 2006.
[20] J. S. R. Jang, C. T. Sun, and E. Mizutani, Neuro-fuzzy and soft computing, Pearson Education Taiwan Ltd., Taipei, 2004.