
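Author: 邱義欽 (Yi-Chin Chiu)
Thesis title: 文本無關之語者驗證方法研究 (A Study on Text-Independent Speaker Verification Method)
Advisor: 林伯慎 (Bor-Shen Lin)
Oral defense committee: 羅乃維 (Nai-Wei Lo), 古鴻炎 (Hung-Yan Gu)
Degree: Master
Department: College of Management - Department of Information Management
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: Chinese
Number of pages: 75
Keywords (translated from Chinese): Gaussian mixture model, speaker recognition, speaker verification, text-independent, artificial neural network, multilayer perceptron
Keywords (English): Text Independent, Speaker Verification, MLP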
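  • This thesis studies text-independent speaker verification and proposes ways to improve its performance, with the aim of finding a model training procedure and verification architecture of practical value. The performance of a speaker verification system depends mainly on three factors: the speaker feature parameters, the speaker model and its training procedure, and the decision module. Experiments were conducted on the speaker model training procedure, background speaker selection methods, gender-dependent background models, and a neural-network-based decision module. The results show that, for the baseline speaker verification system, increasing the number of mixtures in the Gaussian mixture model effectively improves verification performance, whereas the number of training iterations has no noticeable effect. For background speaker selection, choosing background speakers by Bhattacharyya distance or by maximum likelihood both outperform random selection; maximum likelihood performs better than Bhattacharyya distance but requires more computation. In addition, selecting only background speakers of the same gender gives better verification performance than selecting without regard to gender. Background speaker selection greatly reduces the number of likelihood computations at the cost of a small loss in verification performance. For the neural network classifier used as the decision module, both speaker-specific and general (speaker-independent) classifiers effectively improve verification performance, because the nonlinear decision function of a neural network is more adaptable. The speaker-specific classifier yields the larger improvement, but at the cost of requiring the user to record more speech data for training the classifier.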
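As an illustration of distance-based background speaker selection, the following minimal sketch ranks candidate speakers by Bhattacharyya distance to the target speaker and keeps the closest ones. It assumes, purely for illustration, that each speaker is summarized by a single diagonal-covariance Gaussian; the thesis's speaker models are Gaussian mixture models, and how the distance is computed between them is not stated in this record. The function names and the toy data are hypothetical.

```python
import math


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    Assumption: one Gaussian per speaker (a simplification of a GMM).
    """
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)                              # averaged variance
        dist += 0.125 * (m1 - m2) ** 2 / v               # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))   # variance-mismatch term
    return dist


def select_background_speakers(target, candidates, k):
    """Return the names of the k candidates closest to the target.

    target: (mean, var) tuple; candidates: dict of name -> (mean, var).
    """
    ranked = sorted(
        candidates,
        key=lambda name: bhattacharyya_diag(*target, *candidates[name]),
    )
    return ranked[:k]


# Hypothetical toy data: 2-dimensional features, unit variances.
target = ([0.0, 0.0], [1.0, 1.0])
candidates = {
    "a": ([0.1, 0.0], [1.0, 1.0]),
    "b": ([3.0, 3.0], [1.0, 1.0]),
    "c": ([0.5, 0.5], [1.0, 1.0]),
}
cohort = select_background_speakers(target, candidates, k=2)
```

Ranking by distance needs only the model parameters, whereas a likelihood-based selection must score speech frames against every candidate model, which is consistent with the higher computational cost the abstract reports for the likelihood-based method.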


    This thesis investigates text-independent speaker verification and proposes several ways of improving verification performance: a better training procedure, selection of background speakers, gender-dependent background models, and a nonlinear decision module based on an artificial neural network. All these efforts aim to improve the speaker verification method so that it can be used in practical applications.
    Experimental results show that the number of mixtures in the GMMs strongly influences verification performance, while the number of iterations is less relevant. Selecting background speakers effectively reduces the computation cost at the price of a slight degradation in performance. In addition, selecting background speakers based on Bhattacharyya distance or maximum likelihood is significantly better than random selection, and a gender-dependent background model achieves better performance than a gender-independent one. As for the decision module, both the speaker-dependent and the speaker-independent ANN classifiers effectively improve verification performance, because the nonlinear decision function of a neural network has better descriptive capability.
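To make the role of the background speakers and the decision module concrete, here is a minimal sketch of a background-normalized log-likelihood score with a simple threshold decision. It is one common variant (claimed-speaker log-likelihood minus the mean log-likelihood over the background models), not necessarily the exact scoring used in the thesis. The diagonal-covariance GMM representation as a `(weights, means, variances)` tuple, the function names, and the threshold value are all assumptions made for illustration.

```python
import math


def gmm_frame_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        comp.append(ll)
    top = max(comp)  # log-sum-exp over components for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comp))


def avg_loglik(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one GMM."""
    return sum(gmm_frame_loglik(x, *gmm) for x in frames) / len(frames)


def normalized_score(frames, claimed, background):
    """Claimed-speaker log-likelihood minus the mean background log-likelihood."""
    bg = sum(avg_loglik(frames, g) for g in background) / len(background)
    return avg_loglik(frames, claimed) - bg


def accept(frames, claimed, background, threshold=0.0):
    """Linear (threshold) decision; the threshold value is hypothetical."""
    return normalized_score(frames, claimed, background) > threshold


# Hypothetical one-dimensional, single-component models.
claimed_gmm = ([1.0], [[0.0]], [[1.0]])
background_gmms = [([1.0], [[5.0]], [[1.0]])]
genuine_frames = [[0.1], [-0.2]]
impostor_frames = [[5.0]]
```

The fixed threshold in `accept` is the kind of linear decision rule that the thesis replaces with a neural network classifier; the inputs to that classifier are not described in this record, so they are not sketched here.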

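    Chapter 1 Introduction
      1.1 Research Motivation
      1.2 Overview of Speaker Recognition Systems
      1.3 Thesis Objectives and Summary of Results
      1.4 Thesis Organization and Structure
    Chapter 2 Speech Processing and Speaker Recognition Techniques
      2.1 Feature Extraction
      2.2 Speaker Model Construction
        2.2.1 Gaussian Mixture Model
        2.2.2 Speaker Model Training and Parameter Estimation
      2.3 Speaker Recognition Methods
        2.3.1 Speaker Identification
        2.3.2 Speaker Verification
        2.3.3 Background Speaker Models
        2.3.4 Background Speaker Selection Methods
        2.3.5 Verification Performance Evaluation
      2.4 Multilayer Neural Network Model
        2.4.1 Network Structure
        2.4.2 Model Training
      2.5 Chapter Summary
    Chapter 3 Experiments and Analysis of the Baseline Speaker Verification System
      3.1 Speech Corpus
      3.2 Experimental Analysis of the Gaussian Mixture Model System
        3.2.1 Experiments on the Number of Mixtures
        3.2.2 Experiments on the Number of Iterations
        3.2.3 Experiments on the Number of Background Speakers and Selection Methods
        3.2.4 Experiments on Dividing Background Speakers by Gender
        3.2.5 Experiments on the Duration of Training or Test Speech
        3.2.6 Experiments on Improving System Performance for Speech of a Given Duration
      3.3 Chapter Summary
    Chapter 4 Experiments and Analysis of the Neural Network Speaker Verification System
      4.1 Architecture of the Neural Network Speaker Verification System
      4.2 Experimental Analysis of the Neural Network Model System
        4.2.1 Speaker Verification Experiments with Speaker-Specific Models
        4.2.2 Speaker Verification Experiments with General Speaker Models
        4.2.3 Background Speaker Experiments with General Speaker Models
        4.2.4 Experiments on the Number of Hidden Layers with General Speaker Models
      4.3 Chapter Summary
    Chapter 5 Conclusion
    References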

