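| Graduate student: | 邱義欽 Yi-Chin Chiu |
|---|---|
| Thesis title: | 文本無關之語者驗證方法研究 A Study on Text-Independent Speaker Verification Method |
| Advisor: | 林伯慎 Bor-Shen Lin |
| Oral defense committee: | 羅乃維 Nai-Wei Lo, 古鴻炎 Hung-Yan Gu |
| Degree: | Master |
| Department: | School of Management - Department of Information Management |
| Publication year: | 2014 |
| Academic year of graduation: | 102 (ROC calendar) |
| Language: | Chinese |
| Pages: | 75 |
| Keywords (Chinese): | Gaussian mixture model, speaker recognition, speaker verification, text-independent, artificial neural network, multilayer perceptron |
| Keywords (English): | Text Independent, Speaker Verification, MLP |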
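This thesis studies text-independent speaker verification and proposes ways to improve its performance, with the aim of finding a model training procedure and verification architecture of practical value. The performance of a speaker verification system is affected mainly by three factors: the speaker feature parameters, the speaker model and its training procedure, and the decision module. Experiments were conducted on the speaker model training procedure, the background speaker selection method, gender-dependent background models, and a decision module based on an artificial neural network. The results show that, for the baseline speaker verification system, increasing the number of mixtures in the Gaussian mixture model effectively improves verification performance, whereas the number of iterations has no noticeable effect. For background speaker selection, choosing background speakers by Bhattacharyya distance or by maximum likelihood outperforms random selection in both cases; maximum likelihood performs better than Bhattacharyya distance but requires more computation. In addition, selecting only background speakers of the same gender gives better verification performance than selecting without regard to gender. Background speaker selection saves many likelihood computations at a small cost in verification performance. For the decision module, neural network classifiers, whether speaker-specific or universal, effectively improve verification performance, because the nonlinear decision function of a neural network can be adjusted more flexibly. The speaker-specific classifier yields the larger improvement, at the cost of requiring the user to record more speech data for training the classifier.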
This thesis investigates text-independent speaker verification and proposes several ways of improving verification performance: a better training procedure, selection of background speakers, gender-dependent background models, and a nonlinear decision module based on an artificial neural network. These efforts aim to make speaker verification suitable for practical applications.
Experimental results show that the number of mixtures in the GMMs strongly influences verification performance, while the number of iterations has little effect. Selecting background speakers reduces the computational cost substantially at the price of a slight degradation in performance. In addition, selecting background speakers by Bhattacharyya distance or maximum likelihood is significantly better than random selection, and a gender-dependent background model achieves better performance than a gender-independent one. As for the decision module, both the speaker-dependent and speaker-independent ANN classifiers improve verification performance effectively, because the nonlinear decision function of a neural network has better descriptive capability.
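The background speaker selection and score normalization mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: each speaker is reduced to a single diagonal-covariance Gaussian (the thesis uses Gaussian mixture models), the function names are hypothetical, and the normalization shown (target log-likelihood minus the mean cohort log-likelihood) is only one common variant, assumed here for illustration.

```python
import math

# Assumption: each speaker model is a (mean, variance) pair of equal-length
# lists, i.e. a single diagonal Gaussian rather than a full GMM.

def bhattacharyya_diag(mean_a, var_a, mean_b, var_b):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    dist = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        avg = 0.5 * (va + vb)                       # averaged variance
        dist += 0.125 * (ma - mb) ** 2 / avg        # mean-separation term
        dist += 0.5 * math.log(avg / math.sqrt(va * vb))  # variance-mismatch term
    return dist

def avg_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood of feature frames under one model."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def select_cohort(target, candidates, k):
    """Return the names of the k candidate models closest to the target."""
    ranked = sorted(
        candidates,
        key=lambda name: bhattacharyya_diag(*target, *candidates[name]),
    )
    return ranked[:k]

def normalized_score(frames, target, cohort_models):
    """Target log-likelihood minus the mean log-likelihood of the cohort."""
    target_ll = avg_log_likelihood(frames, *target)
    cohort_ll = sum(avg_log_likelihood(frames, *m) for m in cohort_models)
    return target_ll - cohort_ll / len(cohort_models)

# Toy one-dimensional models (illustrative values, not thesis data).
target = ([0.0], [1.0])
candidates = {"a": ([0.5], [1.0]), "b": ([3.0], [1.0]), "c": ([1.0], [1.0])}
cohort = select_cohort(target, candidates, k=2)
score = normalized_score([[0.0], [0.1]], target, [candidates[n] for n in cohort])
accept = score > 0.0  # the threshold 0.0 is an arbitrary placeholder
```

Selecting the cohort by distance between models avoids scoring the test utterance against every candidate, which matches the trade-off the abstract describes: fewer likelihood computations at some cost in accuracy. A learned classifier, such as the MLP studied in the thesis, would replace the fixed threshold in the last line.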