
Graduate Student: 許勝銘 (Sheng-Ming Shiu)
Thesis Title: An Initial Study on Large-Vocabulary Hakka Speech Recognition System (大詞彙客語語音辨識系統之初步研究)
Advisor: 古鴻炎 (Hung-Yan Gu)
Oral Defense Committee: 余明興 (Ming-Shing Yu), 王新民 (Hsin-Min Wang), 鍾國亮 (Kuo-Liang Chung), 林伯慎 (Bor-Shen Lin)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2007
Academic Year of Graduation: 95 (ROC calendar, i.e. the 2006-2007 academic year)
Language: Chinese
Number of Pages: 60
Chinese Keywords: 客語語音辨識 (Hakka speech recognition)
English Keywords: Hakka Speech Recognition
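    This thesis builds an initial large-vocabulary Hakka speech recognition system on top of the HMM-related program modules in HTK. When HTK's grammar tools are used to recognize a large vocabulary directly, recognition is very slow, so we studied a two-stage recognition approach. Experiments show that although recognition accuracy drops slightly, recognition speed improves substantially and approaches real-time processing. In addition, we experimentally determined the model parameter settings best suited to Hakka speech recognition: building the HMMs on right-context-dependent initial and final units, with 4 states per model, 11 Gaussian mixtures, and 13 candidate syllables in the two-stage recognition, yields the best recognition accuracy of 90.0%, with an average of about 0.95 seconds needed to recognize each word.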
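The two-stage idea can be illustrated with a minimal Python sketch. It assumes, hypothetically, that a first-stage syllable recognizer has already produced, for each syllable slot of the utterance, a list of syllable hypotheses with log-likelihood scores; the second stage then keeps only lexicon words whose syllables all fall within the top-N candidates of the corresponding slot (N = 13 was the best setting reported above). The function name `build_word_shortlist`, the data layout, the placeholder syllables, and the summed-score ranking are illustrative assumptions, not the thesis's HTK-based implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical stage-1 output: one list per syllable slot, each holding
# (syllable, log_likelihood) hypotheses from a syllable recognizer.
Candidates = List[List[Tuple[str, float]]]


def build_word_shortlist(candidates: Candidates,
                         lexicon: Dict[str, List[str]],
                         n_best: int = 13) -> List[Tuple[str, float]]:
    """Stage 2 (sketch): keep lexicon words whose syllables all appear among
    the n_best highest-scoring candidates of the matching slot, ranked by the
    sum of their stage-1 log-likelihoods."""
    # Keep only the n_best candidates per slot, as a syllable -> score lookup.
    slots = [dict(sorted(slot, key=lambda c: c[1], reverse=True)[:n_best])
             for slot in candidates]
    shortlist = []
    for word, syllables in lexicon.items():
        if len(syllables) != len(slots):
            continue  # the word's syllable count must match the slot count
        if all(syl in slot for syl, slot in zip(syllables, slots)):
            score = sum(slot[syl] for syl, slot in zip(syllables, slots))
            shortlist.append((word, score))
    # Best first; a full system could instead rescore this shortlist with
    # word-level HMM decoding rather than trusting the summed scores.
    return sorted(shortlist, key=lambda w: w[1], reverse=True)


if __name__ == "__main__":
    # Placeholder syllables and words, not real Hakka data.
    cands = [[("ba", -10.0), ("ga", -14.0), ("da", -12.0)],
             [("ga", -9.0), ("ki", -11.0)]]
    lexicon = {"W1": ["ba", "ki"], "W2": ["ba", "ga"],
               "W3": ["ga", "ga"], "W4": ["ba"]}
    print(build_word_shortlist(cands, lexicon, n_best=2))
    # -> [('W2', -19.0), ('W1', -21.0)]
```

Restricting the search to such a shortlist is what makes the second stage cheap: word-level matching only has to consider a handful of words instead of the whole vocabulary. A larger N raises the chance that the correct word survives the filter, at the cost of a longer shortlist.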


    In this thesis, we study how to build a large-vocabulary Hakka spoken word recognition system. Some of the system modules are taken directly from the HMM modules in HTK. When the grammar tool in HTK is used for large-vocabulary word recognition, the system recognizes words very slowly. We therefore propose and study a two-stage recognition method. With this method, recognition speed improves greatly and approaches real time, although the recognition rate decreases slightly. In addition, we experimentally determined the best model parameter values for the Hakka word recognition system: right-context-dependent initial and final units are the best segmentation units for HMM acoustic modeling, the best number of states is 4, the best number of Gaussian mixtures is 11, and the best number of syllable candidates for word construction in the second recognition stage is 13. With these parameter values, the highest recognition rate obtained is 90.0%, and recognizing a word utterance takes about 0.95 seconds on average.
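As an illustration of right-context-dependent units, the following Python sketch splits romanized syllables into an initial and a final and labels each initial with the final that follows it, using HTK's `+` notation for right context (e.g. `k+a`). The initial inventory is a hypothetical, partial example rather than the thesis's unit set, and keeping finals context-independent is an assumption made for simplicity; the abstract does not specify these details.

```python
from typing import List, Tuple

# Illustrative, partial set of romanized initials (an assumption, not the
# thesis's inventory). Sorted longest first so "ng" is tried before "n".
INITIALS = sorted(["b", "p", "m", "f", "v", "d", "t", "n", "l",
                   "g", "k", "ng", "h", "z", "c", "s"],
                  key=len, reverse=True)


def split_syllable(syl: str) -> Tuple[str, str]:
    """Split a romanized syllable into (initial, final); '' means no initial."""
    for ini in INITIALS:
        if syl.startswith(ini):
            if len(syl) == len(ini):
                # A bare nasal such as "ng" or "m" is treated as a final.
                return "", syl
            return ini, syl[len(ini):]
    return "", syl


def to_units(syllables: List[str]) -> List[str]:
    """Map a word's syllables to HTK-style labels: each initial becomes a
    right-context-dependent unit 'initial+final'; finals stay plain."""
    units: List[str] = []
    for syl in syllables:
        ini, fin = split_syllable(syl)
        if ini:
            units.append(f"{ini}+{fin}")
        units.append(fin)
    return units


if __name__ == "__main__":
    # Example romanized syllables for illustration only.
    print(to_units(["ngai", "ka"]))  # -> ['ng+ai', 'ai', 'k+a', 'a']
```

Systems often limit the number of units by conditioning an initial only on a class of the following final (for example its first vowel); this sketch uses the whole final for simplicity.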

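    Abstract (Chinese)
    Abstract
    Acknowledgments
    Table of Contents
    List of Figures and Tables
    Chapter 1 Introduction
      1.1 Research Motivation and Objectives
      1.2 Speech Recognition Processing Flow
      1.2 Review of Speech Recognition Research
      1.3 Research Method
      1.4 Thesis Organization
    Chapter 2 The HTK Toolkit
      2.1 Introduction to HTK
      2.2 Transcription Labeling
        2.2.1 Word Labeling
        2.2.2 Syllable Labeling
        2.2.3 Sub-syllable Labeling
      2.3 Dictionary and Grammar Rules
        2.3.1 Dictionary Editing
        2.3.2 Grammar Rules
      2.4 Feature Parameter Extraction
      2.5 Model Training
      2.6 Recognition and Matching
    Chapter 3 The Hakka Speech Recognition System
      3.1 Introduction to Hakka
      3.2 Speech Corpus Recording
      3.3 MFCC Feature Coefficients
      3.4 Acoustic Models
      3.5 Dictionary and Word Network Construction
      3.6 Hakka Speech Recognition System: Training Phase
      3.7 Hakka Speech Recognition System: Recognition Phase
        3.7.1 Original Recognition Method
        3.7.2 Two-Stage Recognition
      3.8 Recognition System Interface
    Chapter 4 Recognition System Experiments
      4.1 Recognition Speed Comparison
      4.2 Acoustic Model Comparison
      4.3 Combinations of Number of States and Number of Gaussian Mixtures
      4.4 MVN and Different Test Corpora
    Chapter 5 Conclusions and Future Work
    References
    Appendix 1 List of Hakka Syllables
    Appendix 2 List of Right-Context-Dependent Initials and Finals
    Appendix 3 List of Test Vocabulary Words
    About the Author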

