
Author: 周家得 (Chou Chia-Te)
Thesis Title: Speaker Recognition Based on Support Vector Machine with Feature Selection (以支向機為基礎並結合特徵擷取之語者辨識系統)
Advisor: 洪西進 (Shi-Jinn Horng)
Committee: 王振興 (Jeen-Shing Wang), 楊昌彪 (Chang-Biau Yang), 古鴻炎 (Hung-Yan Gu), 林勤經, 柴惠珍
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Thesis Publication Year: 2006
Graduation Academic Year: 94 (Republic of China calendar; academic year 2005–2006)
Language: Chinese
Pages: 72
Keywords (in Chinese): Mel-frequency cepstral coefficients, vector quantization, hidden Markov model, support vector machine
Keywords (in other languages): MFCC, SVM, HMM, VQ

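As speech recognition technology has matured and speech applications have broadened, speaker recognition research has received increasing attention. This thesis builds a speaker recognition system from several techniques commonly used in speech recognition today, and uses currently popular machine learning techniques to model the speakers.
In the text-dependent experiments, where the content of the speaker's utterance must be known before recognition, we examine three methods. Hidden Markov models and vector quantization models are widely used and give good experimental results, but overall the support vector machine performs best, with an equal error rate of 0.5%.
In the text-independent experiments, where the content of the utterance is unknown and identification must rely on acoustic features alone, we examine two approaches, text-dependent and text-independent, combined with the feature selection method proposed in this thesis. The results show that the support vector machine still outperforms the vector quantization model, with an equal error rate of 1.89%, and that the support vector machine with feature selection does better than the support vector machine alone, with an equal error rate of 1.24%. The results also show, however, that deleting too much speaker information lowers the recognition rate.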


With the maturing of speech recognition technology, speaker recognition has found various biometric applications and attracted considerable attention. In this thesis, several techniques commonly used in speech recognition are applied to build a speaker recognition system, and the speakers are modeled with machine learning, a currently popular research area.
In the text-dependent experiments, the content of the utterance must be known in advance for identification. We compare three methods for speaker recognition. Hidden Markov models (HMM) and vector quantization (VQ) models have been widely used; although they give good results, they do not perform as well as the support vector machine (SVM). The speaker recognition system using the SVM achieves an equal error rate (EER) of 0.5%.
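The EER quoted above is the operating point at which the false acceptance rate (FAR) equals the false rejection rate (FRR). The following is a minimal sketch of how such a figure can be computed from verification scores, assuming that a higher score means a better match to the claimed speaker and that both score lists are non-empty. The function name `equal_error_rate` and the sample scores are hypothetical and are not taken from the thesis.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a decision threshold (hypothetical helper).

    genuine:  scores from trials where the claimed speaker is the true speaker
    impostor: scores from trials where the claimed speaker is someone else
    A trial is accepted when its score >= threshold (assumed convention).
    Returns (eer, threshold) at the threshold where FAR and FRR are closest.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap, best_eer, best_t = None, None, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # true speakers rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best_gap, best_eer, best_t = gap, (far + frr) / 2, t
    return best_eer, best_t


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.75, 0.6, 0.4]    # hypothetical scores
    impostor = [0.1, 0.2, 0.3, 0.45, 0.65]  # hypothetical scores
    eer, t = equal_error_rate(genuine, impostor)
    print(f"EER = {eer:.2%} at threshold {t}")  # EER = 20.00% at threshold 0.6
```

Sweeping only the observed scores keeps the sketch simple. With few trials, FAR and FRR may never be exactly equal, so the midpoint at the closest crossing is only an approximation.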
In the text-independent experiments, the content of the speech is unknown, and only acoustic features are used to identify the speaker. We examine two approaches and combine them with the feature selection method proposed in this thesis. The results show that speaker identification with the SVM outperforms VQ, with an EER of 1.89%. After feature selection, the SVM performs better still, reaching an EER of 1.24%. However, if too much speaker information is removed, the identification rate degrades.
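The abstract does not state which selection criterion the thesis uses. As a hedged illustration only, the following sketch ranks each feature dimension (for example, each MFCC coefficient) by a Fisher ratio and keeps the dimensions whose ratio reaches a threshold. The Fisher ratio used here is the variance of the per-speaker means divided by the average within-speaker variance. The Fisher criterion, the function names `fisher_ratios` and `select_dimensions`, and the toy data are all assumptions and do not represent the author's method.

```python
import statistics


def fisher_ratios(features_by_speaker):
    """Per-dimension Fisher ratio (illustrative criterion, not the thesis's):
    variance of the speaker means divided by the average within-speaker variance.

    features_by_speaker: dict mapping a speaker label to a list of equal-length
    feature vectors (e.g. MFCC frames). Each speaker needs at least one vector.
    """
    groups = list(features_by_speaker.values())
    dim = len(groups[0][0])
    ratios = []
    for d in range(dim):
        means, within = [], []
        for frames in groups:
            values = [frame[d] for frame in frames]
            mean = statistics.fmean(values)
            means.append(mean)
            within.append(statistics.pvariance(values, mu=mean))
        between = statistics.pvariance(means)
        avg_within = statistics.fmean(within)
        if avg_within > 0:
            ratios.append(between / avg_within)
        else:
            # No within-speaker spread: discriminative only if the means differ.
            ratios.append(float("inf") if between > 0 else 0.0)
    return ratios


def select_dimensions(ratios, threshold):
    """Keep the indices of dimensions whose ratio reaches the threshold."""
    return [d for d, r in enumerate(ratios) if r >= threshold]


if __name__ == "__main__":
    # Hypothetical two-speaker, two-dimensional toy data.
    features = {
        "spk_a": [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0]],
        "spk_b": [[3.0, 4.5], [3.2, 3.5], [2.8, 4.0]],
    }
    ratios = fisher_ratios(features)
    print(select_dimensions(ratios, threshold=1.0))  # [0]
```

In the toy data, dimension 0 separates the two speakers while dimension 1 has identical speaker means, so only dimension 0 survives. Raising `threshold` discards more dimensions. Pushed too far, this illustrates how removing too much speaker information can hurt identification.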

Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures and Tables
Chapter 1 Introduction
  1.1 Motivation
  1.2 Literature Review
  1.3 Summary
Chapter 2 Speaker Recognition System
Chapter 3 Speaker Recognition Preprocessing
  3.1 Recording
  3.2 Energy Detection
  3.3 Frame Division and Window Function
Chapter 4 Calculation of Speaker Recognition Features
  4.1 Fast Fourier Transform
  4.2 Mel-Frequency Transform
  4.3 Mel-Frequency Cepstral Coefficients
  4.4 Delta Coefficients
Chapter 5 Modeling
  5.1 Vector Quantization
  5.2 Hidden Markov Model
    5.2.1 Definition
    5.2.2 Viterbi Algorithm
  5.3 Feature Selection
  5.4 Support Vector Machine
Chapter 6 Experiments and Results
  6.1 Development Environment
  6.2 System Operation
  6.3 Experiment Design and Structure
  6.4 Evaluation of System Performance
  6.5 Brief Introduction to the Speech Database
  6.6 Text-Dependent Verification Experiment
  6.7 Text-Independent Verification Experiment
    6.7.1 Influence of Feature Selection Threshold Values on Identification Rate
    6.7.2 Comparison with Other Theses
Chapter 7 Summary and Perspective
  7.1 Summary
  7.2 Perspective
References

