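Graduate student: | 蔡仲明 Chung-Ming Tsai |
---|---|
Thesis title: | 基於GMM及PPM模型的國、閩南、客語之語言辨識 Language Identification of Mandarin, Holo, and Hakka based on GMM and PPM models |
Advisor: | 古鴻炎 Hung-Yan Gu |
Oral defense committee: | 蔡偉和 Wei-Ho Tsai, 鍾國亮 Kuo-Liang Chung, 余明興 Ming-Shing Yu, 王新民 Hsin-Min Wang |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2007 |
Academic year of graduation: | 95 |
Language: | Chinese |
Number of pages: | 61 |
Keywords (translated from Chinese): | language identification, dialect identification, GMMKMean, online language identification, Mandarin-Taiwanese-Hakka identification, Mandarin-Holo-Hakka identification |
Keywords (English): | language identification, GMMKMean, Chinese dialect identification, online language identification |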
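This thesis uses Mel-frequency cepstral coefficients (MFCC) and their delta coefficients to represent acoustic characteristics, and approximates the pitch contour with Legendre polynomials and discrete cosine functions to extract coefficients that describe pitch characteristics. A Gaussian mixture model (GMM) is then built for each of these two kinds of coefficients. In addition, the feature coefficients of each frame are tokenized and used to build a PPM model, which models the language characteristics carried by sequences of consecutive tokens. Besides offline experiments that explore which model structures suit the identification of Mandarin, Holo, and Hakka, we also built an automatic language identification system that can be tested online. In offline tests, the average identification rate over the three languages reaches 97%; in preliminary online tests, the average identification rate reaches 73%.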
In this thesis, different models and model structures are studied for distinguishing among three languages: Mandarin, Holo, and Hakka. Acoustic features are represented as Mel-frequency cepstral coefficients (MFCC). In addition, Legendre polynomials and the discrete cosine transform (DCT) are used to approximate the pitch contour of a voice segment. A Gaussian mixture model (GMM) is constructed for each of the two kinds of features. Each frame's feature vector is also tokenized in order to construct a prediction by partial matching (PPM) model, which models the characteristics embedded in a sequence of tokens. Based on the models studied, a practical language identification system has been built. In offline tests, the identification success rate reaches 97% on average. In initial online tests (i.e., with speech input over a telephone), the system achieves a success rate of 73% on average.
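As an illustration of the pitch-contour step described above, the following is a minimal sketch of how a contour might be reduced to a few Legendre and DCT coefficients. The abstract does not state the polynomial order, the number of DCT coefficients, or any normalization, so the order of 3, the count of 4, the unnormalized DCT-II form, and all function names here are assumptions made for the example, not the thesis's actual settings.

```python
import math


def legendre_basis(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence."""
    values = [1.0, x]
    for n in range(1, order):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def legendre_fit(pitch, order=3):
    """Least-squares Legendre coefficients of a contour mapped onto [-1, 1]."""
    n = len(pitch)
    xs = [2.0 * i / (n - 1) - 1.0 for i in range(n)]
    rows = [legendre_basis(x, order) for x in xs]
    k = order + 1
    # Normal equations (A^T A) c = A^T y for the basis matrix A.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, pitch)) for i in range(k)]
    return solve_linear(ata, aty)


def dct_coefficients(pitch, count=4):
    """First `count` unnormalized DCT-II coefficients of the contour."""
    n = len(pitch)
    return [
        sum(y * math.cos(math.pi * (i + 0.5) * k / n) for i, y in enumerate(pitch))
        for k in range(count)
    ]


# Synthetic contour (hypothetical values in Hz): 200*P0 + 30*P1 - 20*P2.
xs = [2.0 * i / 49 - 1.0 for i in range(50)]
contour = [200.0 + 30.0 * x - 20.0 * (3.0 * x * x - 1.0) / 2.0 for x in xs]
legendre_coeffs = legendre_fit(contour, order=3)
dct_coeffs = dct_coefficients(contour, count=4)
```

Either coefficient vector gives a fixed-length description of a variable-length pitch contour, which is what makes it usable as a feature vector for a Gaussian mixture model.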