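Student: | 趙文儀 Wun-yi Jhao |
---|---|
Thesis title: | 以參考語者為基礎之語者調適方法研究 A Study on Speaker Adaptation Based on Reference Speakers |
Advisor: | 林伯慎 Bor-shen Lin |
Oral examination committee: | 羅乃維 Nai-wei Lo, 古鴻炎 Hung-yan Gu |
Degree: | Master |
Department: | College of Management, Department of Information Management |
Year of publication: | 2010 |
Academic year of graduation: | 98 (ROC calendar) |
Language: | Chinese |
Pages: | 33 |
Keywords (Chinese, translated): | speaker adaptation, weighted model combination, feature parameter transformation, maximum likelihood linear regression (MLLR) adaptation, maximum a posteriori (MAP) adaptation |
Keywords (English): | speaker adaptation, feature transformation, model combination, MAP, MLLR, reference speakers |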
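In speaker adaptation, effectively improving the recognition rate when only limited speech data are available has long been an important issue. This thesis investigates several speaker adaptation methods based on reference speakers, which can reach a good recognition rate while requiring only a small amount of speech from the user. Reference speakers are the speakers in the corpus whose statistical characteristics are close to those of the target speaker. We investigate three approaches: adapting directly with similar speech data, weighted combination of reference speaker models, and reference speaker feature transformation. We then modify the reference speaker feature transformation method by incorporating the target speaker's MLLR_MAP adapted model to improve the recognition rate. Finally, we combine feature parameter transformation with weighted model combination and discuss the effect of this combination on the recognition rate.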
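A minimal sketch of the first two ideas, selecting reference speakers and combining their models, is given below. It assumes, for illustration only, that each corpus speaker is summarized by a single diagonal-covariance Gaussian rather than a full HMM set, that closeness is the average log-likelihood of the target speaker's adaptation frames, and that combination weights are a softmax of those scores. The thesis may define closeness and weights differently, and all function names here are hypothetical.

```python
import math

def diag_gauss_loglik(x, mean, var):
    # Log-density of one frame under a diagonal-covariance Gaussian.
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def select_reference_speakers(target_frames, speaker_models, k):
    # Hypothetical criterion: rank corpus speakers by the average
    # log-likelihood of the target's adaptation frames, keep the k best.
    scores = {}
    for name, (mean, var) in speaker_models.items():
        total = sum(diag_gauss_loglik(x, mean, var) for x in target_frames)
        scores[name] = total / len(target_frames)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [(name, scores[name]) for name in ranked[:k]]

def combine_reference_means(selected, speaker_models):
    # Assumed weighting: softmax of the scores. Subtracting the best
    # score before exp() keeps the computation numerically stable.
    best = max(score for _, score in selected)
    raw = [math.exp(score - best) for _, score in selected]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(speaker_models[selected[0][0]][0])
    combined = [0.0] * dim
    for w, (name, _) in zip(weights, selected):
        mean = speaker_models[name][0]
        for d in range(dim):
            combined[d] += w * mean[d]
    return combined, weights

# Toy data: three speakers with 2-D features and unit variances.
models = {
    "A": ([0.0, 0.0], [1.0, 1.0]),
    "B": ([1.0, 1.0], [1.0, 1.0]),
    "C": ([10.0, 10.0], [1.0, 1.0]),
}
selected = select_reference_speakers([[0.5, 0.5]], models, k=2)
combined, weights = combine_reference_means(selected, models)
print([name for name, _ in selected], combined)  # ['A', 'B'] [0.5, 0.5]
```

In a full system the same weights would be applied to every Gaussian mean in the reference speakers' HMMs, not to a single mean vector.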
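The experimental results show that incorporating the target speaker's adapted model into the reference speaker feature transformation method improves the recognition rate by up to about 6.67% over the speaker-independent model, a clear improvement. Regarding the effect of adjusting the variances, the results show that keeping the original variances during model re-estimation, rather than re-estimating them, usually gives a better recognition rate. Finally, combining reference speaker feature transformation with reference speaker model weighting gives a better recognition rate than the original weighted combination that uses only the reference speakers close to the target speaker.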
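The variance finding can be illustrated with the standard MAP update of a Gaussian mean, in which the adapted mean interpolates between the prior (speaker-independent) mean and the occupancy-weighted data mean, controlled by a prior weight tau. This is a sketch under assumptions, not the thesis's exact re-estimation procedure: the variance is kept at its original value by default, and the optional variance update is a simple occupancy-weighted sample variance rather than a full MAP variance estimate. The name `map_adapt_gaussian` and the defaults for `tau` and `var_floor` are hypothetical.

```python
def map_adapt_gaussian(prior_mean, prior_var, frames, gammas,
                       tau=10.0, update_variance=False, var_floor=1e-3):
    # MAP mean update for one Gaussian:
    #   new_mean = (tau * prior_mean + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)
    # where gamma_t is this Gaussian's occupancy at frame t.
    occ = sum(gammas)
    new_mean = [
        (tau * m + sum(g * x[d] for g, x in zip(gammas, frames))) / (tau + occ)
        for d, m in enumerate(prior_mean)
    ]
    if not update_variance or occ == 0.0:
        # Default: keep the original variance unchanged.
        return new_mean, list(prior_var)
    # Illustrative alternative: occupancy-weighted sample variance around
    # the new mean, floored so it cannot collapse with little data.
    new_var = [
        max(var_floor, sum(g * (x[d] - new_mean[d]) ** 2
                           for g, x in zip(gammas, frames)) / occ)
        for d in range(len(prior_mean))
    ]
    return new_mean, new_var

mean, var = map_adapt_gaussian([0.0], [1.0], [[2.0], [4.0]], [1.0, 1.0], tau=2.0)
print(mean, var)  # [1.5] [1.0]
```

With little adaptation data the sample variance is estimated from only a few frames, which is one plausible reason why keeping the original variance can be the safer choice.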
Effectively improving the recognition rate through speaker adaptation has long been an important issue for speech recognition systems. In this thesis we study several speaker adaptation methods based on reference speakers, i.e., the speakers in the database that are close to the target speaker. The methods include adapting with the speech data or speakers in the database that are close to the target speaker, combining the acoustic models of the reference speakers, and transforming the features of the reference speakers during re-estimation. Each of these basic methods improves adaptation performance. In addition, we modify and combine these methods to obtain new adaptation schemes, which perform better than the MAP and MLLR methods.
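For comparison with the MLLR baseline, the sketch below estimates a single global MLLR mean transform, where each adapted mean is W applied to the extended mean vector [1, mu]. It assumes identity covariances and a hard frame-to-Gaussian alignment; under those assumptions the maximum-likelihood estimate of W reduces to least squares, with every row of W solving G w = z against the same matrix G. Real MLLR uses the Gaussian covariances, soft occupancies, and usually several regression classes. The function names are hypothetical.

```python
def solve(a, b):
    # Solve a x = b by Gauss-Jordan elimination with partial pivoting.
    n = len(a)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too little adaptation data")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def estimate_mllr_mean_transform(means, frames, alignment):
    # alignment[t] is the index of the Gaussian frame t is assigned to.
    dim = len(means[0])
    size = dim + 1
    G = [[0.0] * size for _ in range(size)]   # sum of xi xi^T
    Z = [[0.0] * size for _ in range(dim)]    # sum of o xi^T
    for o, g in zip(frames, alignment):
        xi = [1.0] + list(means[g])           # extended mean [1, mu]
        for r in range(size):
            for c in range(size):
                G[r][c] += xi[r] * xi[c]
        for i in range(dim):
            for c in range(size):
                Z[i][c] += o[i] * xi[c]
    # With identity covariances each row of W solves G w_i = z_i.
    return [solve(G, Z[i]) for i in range(dim)]

def apply_mllr(W, mean):
    xi = [1.0] + list(mean)
    return [sum(w * x for w, x in zip(row, xi)) for row in W]

means = [[0.0], [2.0]]
W = estimate_mllr_mean_transform(means, [[1.0], [5.0]], [0, 1])
print(W)                                   # [[1.0, 2.0]]
print([apply_mllr(W, m) for m in means])   # [[1.0], [5.0]]
```

Because one transform is shared by all Gaussians, MLLR can move means that received no adaptation frames, which is why it is a common baseline when speaker data are scarce.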
In the experiments, the largest improvement over the speaker-independent model, about 6.67% in recognition rate, is obtained with the feature transformation approach. Integrating the model combination and feature transformation approaches for reference speakers also performs better than weighted model combination of the close reference speakers alone. We conclude that using the data or models of reference speakers can help improve adaptation performance, provided the reference speakers are close enough to the target speaker.