
Graduate student: 游政仁 (Cheng-Jen YU)
Thesis title: 語者分群及語音命令辨識之研究
A Study on Speaker Clustering and Voice Command Recognition
Advisor: 古鴻炎 (Hung-Yan Gu)
Committee members: 王新民 (Hsin-Min Wang)
余明興 (Ming-Sing Yu)
林柏慎 (Bor-Shen Lin)
鍾國亮 (Kuo-Liang Chung)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: Chinese
Number of pages: 92
Keywords: speaker clustering (語者分群), speech recognition (語音辨識)
    In this thesis, a Gaussian mixture model (GMM) is used to model the acoustic characteristics of each speaker, and three distance measures are used to compute the pseudo-divergence-based distance between two GMMs. Two clustering methods, approximated k-means clustering and tree-structured clustering, are then applied to the 303 speakers in the TCC-300 corpus, and the utterances of the centroid speaker of each cluster are gathered to train hidden Markov models (HMMs) for voice command recognition experiments. In the offline experiments, the HMMs trained on the utterances of all speakers in the TCC-300 corpus reach a highest recognition rate of 87.7%. The HMMs built from the smaller set of speakers selected by approximated k-means clustering reach 87.1%, whereas those built from the speakers selected by tree-structured clustering reach only 76.3%. In addition, three kinds of modeling units (syllable; initial plus final; and right-context-dependent initial plus final) are compared for their influence on the recognition rate, and the results show that the syllable unit gives the highest rate. Furthermore, a real-time (online) voice command recognition system has been built so that it can be used to control a robot.
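    The speaker-selection idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that this record does not confirm: each speaker is reduced to a single Gaussian with diagonal covariance (the thesis uses full GMMs and a pseudo-divergence that is not defined here), the Bhattacharyya distance is chosen only because it is one of the three measures named in the table of contents, and the "centroid speaker" is taken to be the cluster member with the smallest summed distance to the other members. The function names and the toy data are hypothetical.

```python
import math

def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two Gaussians with diagonal covariances."""
    total = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)                               # averaged variance in this dimension
        total += 0.125 * (m1 - m2) ** 2 / v               # mean-separation term
        total += 0.5 * math.log(v / math.sqrt(v1 * v2))   # variance-mismatch term
    return total

def centroid_speaker(dist, members):
    """Return the member whose summed distance to the other members is smallest.

    Assumption: this medoid rule stands in for the thesis's centroid-speaker choice.
    """
    return min(members, key=lambda i: sum(dist[i][j] for j in members))

# Toy data (hypothetical): three speakers, each a 2-D Gaussian given as (means, variances).
speakers = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([1.0, 0.0], [1.0, 1.0]),
    ([4.0, 0.0], [1.0, 1.0]),
]
n = len(speakers)
dist = [[bhattacharyya_diag(*speakers[i], *speakers[j]) for j in range(n)]
        for i in range(n)]

# Index of the speaker that would represent this single cluster.
print(centroid_speaker(dist, range(n)))
```

    In this toy case the middle speaker is selected, since it lies closest to the other two. Extending the sketch to full GMMs would require the pseudo-divergence definition from Chapter 3 of the thesis, which is not reproduced in this record.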

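    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures and Tables
    Chapter 1 Introduction
      1.1 Research Motivation and Objectives
      1.2 Review of Clustering Methods
      1.3 Review of Speech Recognition Research
      1.4 Research Method
      1.5 Thesis Organization
    Chapter 2 Speech Signal Processing and Feature Extraction
      2.1 Speech Signal Processing Steps
      2.2 Endpoint Detection
      2.3 MFCC Extraction
    Chapter 3 Gaussian Mixture Models and Their Distance Computation
      3.1 Parameter Estimation of Gaussian Mixture Functions
      3.2 Distance Definitions for Gaussian Distributions
        3.2.1 Mahalanobis Distance
        3.2.2 Bhattacharyya Distance
        3.2.3 Hellinger Distance
      3.3 Model Distance Based on Pseudo-Divergence
    Chapter 4 Clustering Methods
      4.1 Conventional k-means Clustering
      4.2 Modified k-means Clustering
      4.3 Tree-Structured Clustering
      4.4 Cluster Validity Evaluation
      4.5 Clustering Experiments
      4.6 Discussion of Experimental Results
    Chapter 5 Voice Command Recognition: Offline Testing
      5.1 Training Corpus
      5.2 Acoustic Models
      5.3 Acoustic Model Training
        5.3.1 Corpus Preprocessing
        5.3.2 Training Procedure and Operating Steps
      5.4 Offline Recognition Experiments
    Chapter 6 Voice Command Recognition: Online Testing
      6.1 Preparation of HTK-Related Files
      6.2 Program Architecture and Implementation
      6.3 Program Interface and Operation
      6.4 Online Test Results
    Chapter 7 Conclusion
    References
    Appendix 1: 40-H-4 Clustering Results and Recognition Confusions
    Appendix 2: 40-B Clustering Results and Recognition Confusions
    Appendix 3: Recognition Rates for 10, 15, and 20 Clusters
    Appendix 4: List of Mandarin Syllables
    Appendix 5: List of Initials and Finals
    Appendix 6: List of Right-Context-Dependent Initials
    About the Author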

