Graduate student: | 游政仁 Cheng-Jen YU |
---|---|
Thesis title: | 語者分群及語音命令辨識之研究 A Study on Speaker Clustering and Voice Command Recognition |
Advisor: | 古鴻炎 Hung-Yan Gu |
Oral defense committee: | 王新民 Hsin-Min Wang, 余明興 Ming-Sing Yu, 林柏慎 Bor-Shen Lin, 鍾國亮 Kuo-Liang Chung |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2008 |
Academic year of graduation: | 96 |
Language: | Chinese |
Number of pages: | 92 |
Keywords (Chinese): | 語者分群, 語音辨識 |
Keywords (English): | speaker clustering, speech recognition |
In this thesis, a Gaussian mixture model (GMM) is used to model the acoustic characteristics of each speaker, and three distance measures based on pseudo-divergence are used to measure the distance between two GMMs. Two clustering methods, approximated k-means clustering and tree-structured clustering, are then applied to cluster the 303 speakers in the TCC-300 corpus. The utterances of the centroid speaker of each cluster are gathered to train hidden Markov models (HMMs) for voice command recognition experiments. In these experiments, the HMMs trained on the utterances of all speakers in the TCC-300 corpus reach a highest offline recognition rate of 87.7%. The HMMs trained on the smaller set of speakers selected by approximated k-means clustering reach 87.1%, whereas those trained on the smaller set selected by tree-structured clustering reach only 76.3%. In addition, three kinds of modeling units (syllable, initial plus final, and right-context-dependent initial plus final) are compared for their influence on the recognition rate; the results show that the syllable unit gives the highest recognition rate. Finally, for real-time voice command control of a robot, we have built an online voice command recognition system.
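As a rough illustration of the clustering step described in the abstract, the Python sketch below groups speakers from a precomputed distance matrix and returns one centroid speaker per cluster. It rests on several assumptions that are not taken from the thesis: each speaker is represented by a single diagonal-covariance Gaussian rather than a full GMM, the symmetric Kullback-Leibler divergence stands in for the pseudo-divergence-based distance measures, and the centroid update is a k-medoids-style step, which is only one possible reading of "approximated k-means". The names `sym_kl_diag` and `cluster_speakers` and the toy data are hypothetical.

```python
import random


def sym_kl_diag(mean_a, var_a, mean_b, var_b):
    """Symmetric KL divergence between two diagonal-covariance Gaussians.

    Assumption: a stand-in for the thesis's pseudo-divergence-based measures.
    """
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        diff2 = (ma - mb) ** 2
        # KL(a||b) + KL(b||a) per dimension; the log terms cancel out.
        total += 0.5 * ((va + diff2) / vb + (vb + diff2) / va - 2.0)
    return total


def cluster_speakers(dist, k, max_iter=100, seed=0):
    """k-medoids-style clustering on a precomputed distance matrix.

    Returns (centroids, labels); each centroid is the index of a real speaker.
    """
    n = len(dist)
    rng = random.Random(seed)
    centroids = rng.sample(range(n), k)
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each speaker joins its nearest centroid speaker.
        labels = [min(range(k), key=lambda c: dist[i][centroids[c]])
                  for i in range(n)]
        # Update step: the new centroid of a cluster is the member with the
        # smallest total distance to the members of that cluster.
        new_centroids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old centroid if a cluster is empty
                new_centroids.append(centroids[c])
                continue
            new_centroids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Toy data (hypothetical): six "speakers" as (mean, variance) pairs that
# form two well-separated groups.
speakers = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([0.2, 0.1], [1.0, 1.0]),
    ([-0.1, 0.2], [1.0, 1.0]),
    ([5.0, 5.0], [1.0, 1.0]),
    ([5.2, 4.9], [1.0, 1.0]),
    ([4.9, 5.1], [1.0, 1.0]),
]
dist = [[sym_kl_diag(*a, *b) for b in speakers] for a in speakers]
centroids, labels = cluster_speakers(dist, k=2)
print("centroid speakers:", sorted(centroids))
print("cluster labels:", labels)
```

In a setup like the one the abstract describes, only the utterances of the returned centroid speakers would then be used for HMM training, which is what reduces the amount of training data relative to using all 303 speakers.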