
Graduate student: 蔡仲明 (Chung-Ming Tsai)
Thesis title: 基於GMM及PPM模型的國、閩南、客語之語言辨識
Language Identification of Mandarin, Holo, and Hakka Based on GMM and PPM Models
Advisor: 古鴻炎 (Hung-Yan Gu)
Oral defense committee: 蔡偉和 (Wei-Ho Tsai)
鍾國亮 (Kuo-Liang Chung)
余明興 (Ming-Shing Yu)
王新民 (Hsin-Min Wang)
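Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar)
Language: Chinese
Number of pages: 61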
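Keywords (translated from Chinese): language identification, dialect identification, GMMKMean, online language identification, Mandarin/Taiwanese/Hakka identification, Mandarin/Holo/Hakka identification
Keywords (English): language identification, GMMKMean, Chinese dialect identification, online language identification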
    This thesis studies models and model structures for distinguishing among three languages: Mandarin, Holo, and Hakka. Acoustic characteristics are represented by Mel-frequency cepstral coefficients (MFCC) and their delta coefficients. Tonal characteristics are represented by coefficients obtained by approximating the pitch contour of a voice segment with Legendre polynomials and with the discrete cosine transform (DCT). A Gaussian mixture model (GMM) is built for each of these two kinds of features. In addition, each frame's feature vector is tokenized, and a prediction by partial matching (PPM) model is built to capture the language characteristics embedded in sequences of consecutive tokens. Besides offline experiments that examine which model structures suit the identification of these three languages, an automatic language identification system that can be tested online was built. In offline tests, the average identification rate over the three languages reaches 97%. In preliminary online tests (i.e., with speech input over a telephone), the average identification rate is 73%.
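The pitch-contour step described in the abstract can be illustrated with a minimal sketch. It assumes a voiced segment whose pitch values (in Hz) are already extracted and evenly spaced; the function names, the fit order, and the toy contour are illustrative assumptions, not taken from the thesis. The Legendre coefficients are obtained by ordinary least squares on points mapped to [-1, 1], and the DCT uses the unnormalized DCT-II form.

```python
import math

def legendre_basis(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence."""
    values = [1.0, x]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]

def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def legendre_fit(contour, order):
    """Least-squares Legendre coefficients of a contour with at least 2 points."""
    n = len(contour)
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]  # map frames to [-1, 1]
    basis = [legendre_basis(x, order) for x in xs]
    k = order + 1
    gram = [[sum(b[i] * b[j] for b in basis) for j in range(k)] for i in range(k)]
    rhs = [sum(b[i] * y for b, y in zip(basis, contour)) for i in range(k)]
    return solve_linear(gram, rhs)

def dct_coefficients(contour, keep):
    """First `keep` unnormalized DCT-II coefficients of the contour."""
    n = len(contour)
    return [sum(y * math.cos(math.pi * (i + 0.5) * k / n)
                for i, y in enumerate(contour))
            for k in range(keep)]

def dct_reconstruct(coeffs, n):
    """Rebuild n points from (possibly truncated) DCT-II coefficients."""
    return [coeffs[0] / n + (2.0 / n) * sum(
                c * math.cos(math.pi * (i + 0.5) * k / n)
                for k, c in enumerate(coeffs) if k > 0)
            for i in range(n)]

# Toy contour (made-up values): a smooth rise and fall around 200 Hz.
N_POINTS = 21
XS = [-1.0 + 2.0 * i / (N_POINTS - 1) for i in range(N_POINTS)]
TOY_CONTOUR = [200.0 + 30.0 * x - 20.0 * x * x for x in XS]

print("Legendre coefficients:", legendre_fit(TOY_CONTOUR, 2))
print("First DCT coefficients:", dct_coefficients(TOY_CONTOUR, 4))
```

Either coefficient set gives a short, fixed-length description of a segment's pitch shape regardless of how many frames the segment spans, which is what makes such coefficients usable as input to a GMM.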

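The GMM decision step can be sketched as follows: score an utterance's feature frames under one GMM per language and pick the language with the highest average log-likelihood. This is a minimal sketch that assumes diagonal covariances and already-trained parameters; the model values in `TOY_MODELS` are made-up two-dimensional numbers for illustration, and all function names are hypothetical rather than the thesis's implementation.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, mean, var) for w, mean, var in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def utterance_score(frames, gmm):
    """Average per-frame log-likelihood, so utterance length does not dominate."""
    return sum(gmm_frame_loglik(f, gmm) for f in frames) / len(frames)

def identify(frames, models):
    """Return the language whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda lang: utterance_score(frames, models[lang]))

# Made-up 2-D models; real MFCC-based models would have many more
# dimensions and mixture components.
TOY_MODELS = {
    "Mandarin": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
    "Holo": [(1.0, [-3.0, -3.0], [1.0, 1.0])],
    "Hakka": [(1.0, [5.0, -5.0], [1.0, 1.0])],
}

print(identify([[0.1, 0.2], [1.9, 2.1]], TOY_MODELS))
```

Averaging over frames keeps scores comparable across utterances of different lengths; the same scoring function applies whether the frames hold acoustic or pitch-contour coefficients.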
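    Chapter 1 Introduction: 1.1 Research Motivation; 1.2 Literature Review; 1.3 System Architecture; 1.4 Thesis Outline
    Chapter 2 Speech Signal Processing and Feature Extraction: 2.1 Speech File Format; 2.2 Acoustic Features; 2.2.1 Speech Signal Preprocessing; 2.2.2 MFCC Feature Parameters; 2.2.3 MFCC Extraction Tool: HTK; 2.3 Pitch Features; 2.3.1 Pitch Computation; 2.3.2 Pitch Contour Correction; 2.3.3 Pitch Contour Segmentation; 2.3.4 Legendre Polynomial Coefficients; 2.3.5 DCT Transform
    Chapter 3 Language Identification Models: 3.1 Language Features; 3.2 K-Mean Clustering; 3.3 Gaussian Mixture Model (GMM); 3.4 K-Mean based on GMM distance; 3.5 PPM Model; 3.6 Score Measurement
    Chapter 4 Offline Experiments: 4.1 Experimental Corpus and Environment; 4.2 Experiments on the Number of MFCC GMM Mixtures; 4.3 Experiments on the MFCC Delta Interval; 4.4 Experiments on PPM Order and Escape Probability; 4.5 Pitch Feature Experiments; 4.6 Overall System Experiments
    Chapter 5 Online Identification System and Experiments: 5.1 System Components; 5.2 Experimental Results; 5.3 Program Interface of the Online System
    Chapter 6 Conclusion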
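Sections 3.5 and 4.4 concern a PPM model with a configurable order and escape probability. The sketch below shows how such a model can score a token sequence. It is an assumption-laden illustration: it uses the escape estimate of PPM method C (distinct symbols over total plus distinct symbols), omits symbol exclusion so probabilities may sum to slightly less than 1, and the record does not say which escape estimate the thesis adopted. The class name and toy token streams are hypothetical.

```python
import math
from collections import defaultdict

class SimplePPM:
    """Simplified PPM token model (method-C escape, no exclusion)."""

    def __init__(self, order, alphabet_size):
        self.order = order
        self.alphabet_size = alphabet_size
        # counts[context tuple][symbol] -> occurrences of symbol after context
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, tokens):
        for i, sym in enumerate(tokens):
            for k in range(self.order + 1):
                if i - k < 0:
                    break
                self.counts[tuple(tokens[i - k:i])][sym] += 1

    def prob(self, context, sym):
        """P(sym | context), backing off to shorter contexts via escapes."""
        context = tuple(context)[-self.order:] if self.order > 0 else ()
        escape_chain = 1.0
        for k in range(len(context), -1, -1):
            table = self.counts.get(context[len(context) - k:])
            if not table:
                continue  # context never seen: back off at no cost
            total = sum(table.values())
            distinct = len(table)
            if sym in table:
                return escape_chain * table[sym] / (total + distinct)
            escape_chain *= distinct / (total + distinct)
        return escape_chain / self.alphabet_size  # order -1: uniform

    def log_likelihood(self, tokens):
        return sum(math.log(self.prob(tokens[max(0, i - self.order):i], sym))
                   for i, sym in enumerate(tokens))

def identify(tokens, models):
    """Return the language whose PPM model scores the token sequence highest."""
    return max(models, key=lambda lang: models[lang].log_likelihood(tokens))

# Toy token streams (made up); real tokens would come from a frame tokenizer.
MODEL_A = SimplePPM(order=2, alphabet_size=8)
MODEL_A.train([0, 1] * 50)
MODEL_B = SimplePPM(order=2, alphabet_size=8)
MODEL_B.train([0, 0, 1, 1] * 25)

print(identify([0, 1, 0, 1, 0, 1], {"A": MODEL_A, "B": MODEL_B}))
```

Raising the order lets the model capture longer token patterns at the cost of sparser counts, while the escape estimate controls how much probability mass is reserved for unseen continuations; these are the two settings that an order-and-escape experiment would vary.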

