
Author: 周家得
Chou - Chia Te
Thesis Title: 以支向機為基礎並結合特徵擷取之語者辨識系統
Speaker Recognition based on Support Vector Machine with Feature Selection
Advisor: 洪西進
Shi-Jinn Horng
Committee: 王振興
Jeen-Shing Wang
Chang-Biau Yang
Hung-Yan Gu
Degree: Master's
Department: College of Electrical Engineering and Computer Science
Department of Computer Science and Information Engineering
Thesis Publication Year: 2006
Graduation Academic Year: 94
Language: Chinese
Pages: 72
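Keywords (in Chinese): 梅爾刻度式倒頻譜參數 (Mel-frequency cepstral coefficients), 向量量化 (vector quantization), 隱藏式馬可夫模型 (hidden Markov model), 支向機 (support vector machine)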
Keywords (in other languages): MFCC, SVM, HMM, VQ


With advances in speech recognition, speaker recognition technology has enabled a range of biometric applications and attracted considerable attention. In this thesis, several techniques commonly used in speech processing are applied to a speaker recognition system. The system is modeled with machine learning, which is currently a popular research area.
In the text-dependent speaker recognition experiment, the content of the speech must be known in advance when identification is performed. We compare three methods used in speaker recognition. The Hidden Markov Model and the Vector Quantization model have been widely used. Although their experimental results are good, they are not as good as those obtained with the Support Vector Machine. The Equal Error Rate (EER) of the speaker recognition system using the Support Vector Machine is 0.5%.
In the text-independent experiment, the content of the speech is unimportant, and only the speech features are used to identify the speaker. We examine two approaches and combine them with the feature selection method proposed in this thesis to extract features. The experimental results show that the speaker identification system using the Support Vector Machine outperforms the one using Vector Quantization: the EER of the Support Vector Machine is 1.89%. After feature selection, the result is better still than with the Support Vector Machine alone, and the EER falls to 1.24%. However, if too much of a speaker's feature information is removed, the identification rate worsens.
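The Equal Error Rate quoted above is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (genuine speakers rejected). The abstract does not say how the thesis computed it, so the following is only a minimal sketch of one common approximation: sweep each observed score as a threshold and take the point where the two rates are closest. The function name `equal_error_rate` and the score lists are hypothetical illustrations, not data from the thesis.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by trying every observed score as a threshold.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, best_eer, best_threshold = None, None, None
    for t in thresholds:
        # False acceptance rate: impostor trials that pass the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine trials that fail the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_threshold = gap, (far + frr) / 2, t
    return best_eer, best_threshold


# Made-up similarity scores (higher means "more likely the claimed speaker").
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.5, 0.3, 0.2, 0.1, 0.05]

eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.1%} at threshold {threshold}")
```

With so few trials the two rates move in coarse steps, so the result is only an approximation. A real evaluation would use many more trials and could interpolate between neighbouring thresholds.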

Abstract (in Chinese)
Abstract
Acknowledgements
Index
List of Figures and Tables
Chapter 1 Introduction
  1.1 Motivation
  1.2 Document Discussion
  1.3 Summary
Chapter 2 Speaker's Recognition System
Chapter 3 Speaker Recognition Preprocessing
  3.1 Recording
  3.2 Energy Detection
  3.3 Division of the Frame and Window Function
Chapter 4 Calculation of Speaker's Recognition Features
  4.1 Fast Fourier Transform
  4.2 Mel-Frequency Transform
  4.3 Mel-Frequency Cepstrum Coefficients
  4.4 Delta Coefficients
Chapter 5 Modeling
  5.1 Vector Quantization
  5.2 Hidden Markov Model
    5.2.1 Definition
    5.2.2 Viterbi Algorithm
  5.3 Feature Selection
  5.4 Support Vector Machine
Chapter 6 Experiments & Results
  6.1 Developing Environment
  6.2 System Operation
  6.3 Experiment Design and the Structure of the Experiment
  6.4 The Evaluation of System Efficiency
  6.5 Brief Introduction of Speech Database
  6.6 Experiment of Text Dependent Verification
  6.7 Experiment of Text Independent Verification
    6.7.1 The Influence of Threshold Values of Different Feature Selections on Identification Rate
    6.7.2 Comparison with Other Theses
Chapter 7 Summary and Perspective
  7.1 Summary
  7.2 Perspective
Reference
