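A Study on Speaker Clustering and Voice Command Recognition | National Taiwan University of Science and Technology Electronic Theses and Dissertations System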


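Graduate student:	游政仁 Cheng-Jen YU
Thesis title:	A Study on Speaker Clustering and Voice Command Recognition (語者分群及語音命令辨識之研究)
Advisor:	古鴻炎 Hung-Yan Gu
Oral examination committee:	王新民 Hsin-Min Wang, 余明興 Ming-Sing Yu, 林柏慎 Bor-Shen Lin, 鍾國亮 Kuo-Liang Chung
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication:	2008
Academic year of graduation:	96 (ROC calendar)
Language:	Chinese
Number of pages:	92
Chinese keywords:	語者分群 (speaker clustering), 語音辨識 (speech recognition)
English keywords:	speaker clustering, speech recognition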

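This thesis uses GMMs to model the acoustic characteristics of each speaker, and then uses three different distance measures to compute pseudo-divergence-based distances between the GMM models. Next, k-means clustering and tree-structured clustering are each used to cluster the 303 speakers in the TCC-300 corpus, after which the centroid speaker of each resulting cluster is selected for HMM model training in order to carry out voice command recognition experiments. The experimental results show that the HMM models built from all speakers in the TCC-300 corpus reach an offline recognition rate of 87.7%; the HMM models built from the small number of speakers selected by k-means clustering reach an offline recognition rate of 87.1%; and the HMM models built from the small number of speakers selected by tree-structured clustering reach an offline recognition rate of only 76.3%. In addition, we examine how syllable models, initial and final models, and right-context-dependent initial and final models affect the recognition rate; the results show that syllable models give the highest recognition rate. To support real-time voice command control on a robot, we also built an online voice command recognition system.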

In this thesis, a GMM (Gaussian mixture model) is used to model the acoustic characteristics of each speaker, and three distance measures are used to compute the pseudo-divergence-based distance between two GMMs. Then, two clustering methods, approximated k-means clustering and tree-structured clustering, are studied. Using these two methods, the 303 speakers in the TCC-300 corpus are clustered, and the utterances of the centroid speaker of each cluster are gathered to train HMMs (hidden Markov models) for voice command recognition experiments. According to the experimental results, when the utterances of all speakers in the TCC-300 corpus are used, the highest recognition rate is 87.7%. When the HMMs are constructed from the smaller set of speakers selected by approximated k-means clustering, the highest recognition rate reaches 87.1%. When the HMMs are constructed from the smaller set of speakers selected by tree-structured clustering, however, the highest recognition rate is only 76.3%. In addition, three kinds of modeling units (syllable; initial plus final; and right-context-dependent initial plus final) are studied to assess their influence on the recognition rate. The experimental results show that the syllable is the best choice. Furthermore, we have constructed a real-time voice command recognition system that can be used to control a robot.
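The speaker-selection idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes diagonal-covariance GMMs stored as lists of `(weight, mean, variance)` tuples, and it uses the standard Bhattacharyya distance between Gaussians as the component-level measure. Because the abstract does not give the pseudo-divergence formula, `gmm_distance` is a simple matched-component stand-in, and `cluster_by_centre_speaker` is an assumed k-means-style loop in which each cluster centre is a member speaker. All function names and the toy data are hypothetical.

```python
import math

def bhattacharyya(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    total = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)                              # averaged variance
        total += 0.125 * (m1 - m2) ** 2 / v              # mean-separation term
        total += 0.5 * math.log(v / math.sqrt(v1 * v2))  # variance-mismatch term
    return total

def gmm_distance(gmm_a, gmm_b):
    """ASSUMED stand-in for the pseudo-divergence model distance.

    Each component is matched to its nearest component in the other model,
    weighted by its mixture weight; the two directions are averaged.
    """
    def one_way(src, dst):
        return sum(w * min(bhattacharyya(m, v, m2, v2) for _, m2, v2 in dst)
                   for w, m, v in src)
    return 0.5 * (one_way(gmm_a, gmm_b) + one_way(gmm_b, gmm_a))

def cluster_by_centre_speaker(dist, k, max_iter=100):
    """k-means-style clustering on a precomputed speaker distance matrix.

    Only pairwise distances are available, so each centre is the member
    speaker with the smallest total distance to the rest of its cluster.
    """
    n = len(dist)
    centres = list(range(k))  # assumption: the first k speakers seed the clusters

    def assign(cs):
        # Label every speaker with the index of its nearest centre speaker.
        return [min(range(k), key=lambda c: dist[i][cs[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(centres)
        new_centres = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_centres.append(centres[c])  # keep the centre of an empty cluster
                continue
            new_centres.append(
                min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_centres == centres:
            break
        centres = new_centres
    return centres, assign(centres)

# Toy data: four hypothetical speakers, each a one-component GMM over 2-D features.
speakers = [
    [(1.0, [0.0, 0.0], [1.0, 1.0])],
    [(1.0, [1.0, 0.0], [1.0, 1.0])],
    [(1.0, [10.0, 10.0], [1.0, 1.0])],
    [(1.0, [11.0, 10.0], [1.0, 1.0])],
]
dist = [[gmm_distance(a, b) for b in speakers] for a in speakers]
centres, labels = cluster_by_centre_speaker(dist, k=2)
print(centres, labels)
```

In a pipeline like the one described, only the utterances of the speakers listed in `centres` would then be used to train the HMMs, which is what makes the training set smaller than the full corpus.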

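Chinese Abstract
ABSTRACT
Acknowledgements
Table of Contents
List of Figures and Tables
Chapter 1	Introduction
1.1  Research Motivation and Objectives
1.2  Review of Clustering Research Methods
1.3  Review of Speech Recognition Research
1.4  Research Method
1.5  Thesis Organization
Chapter 2	Speech Signal Processing and Feature Extraction
2.1  Speech Signal Processing Steps
2.2  Endpoint Detection
2.3  MFCC Extraction
Chapter 3	Gaussian Mixture Models and Their Distance Computation
3.1  Parameter Estimation for Gaussian Mixture Functions
3.2  Distance Definitions for Gaussian Distributions
3.2.1 Mahalanobis Distance
3.2.2 Bhattacharyya Distance
3.2.3 Hellinger Distance
3.3  Model Distance Based on Pseudo-Divergence
Chapter 4	Clustering Methods
4.1  Conventional k-means Clustering
4.2  Modified k-means Clustering
4.3  Tree-Structured Clustering
4.4  Cluster Validity Evaluation
4.5  Clustering Experiments
4.6  Discussion of Experimental Results
Chapter 5	Voice Command Recognition - Offline Testing
5.1  Training Corpus
5.2  Acoustic Models
5.3  Acoustic Model Training
5.3.1 Corpus Preprocessing
5.3.2 Training Procedure and Operating Steps
5.4  Offline Recognition Experiments
Chapter 6	Voice Command Recognition - Online Testing
6.1  Preparation of HTK-Related Files
6.2  Program Architecture and Implementation
6.3  Program Interface and Operation
6.4  Online Test Results
Chapter 7	Conclusion
References
Appendix 1: 40-H-4 Clustering Results and Their Recognition Confusions
Appendix 2: 40-B Clustering Results and Their Recognition Confusions
Appendix 3: Recognition Rates for 10, 15, and 20 Clusters
Appendix 4: List of Mandarin Syllables
Appendix 5: List of Initials and Finals
Appendix 6: List of Right-Context-Dependent Initials
About the Author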

                                

