Author: 周家得
Chou - Chia Te
Thesis Title: 以支向機為基礎並結合特徵擷取之語者辨識系統
Speaker Recognition based on Support Vector Machine with Feature Selection
Advisor: 洪西進
Shi-Jinn Horng
Committee: 王振興
Jeen-Shing Wang
Chang-Biau Yang
Hung-Yan Gu
Degree: Master's
Department: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
Thesis Publication Year: 2006
Graduation Academic Year: 94
Language: Chinese
Pages: 72
Keywords (in Chinese): 梅爾刻度式倒頻譜參數, 向量量化, 隱藏式馬可夫模型, 支向機
Keywords (in other languages): MFCC, SVM, HMM, VQ
Advances in speech recognition have made speaker recognition useful in a range of biometric applications, and the technology has attracted considerable attention. In this thesis, several techniques commonly used in speech processing are applied to a speaker recognition system. The system is built on machine learning, which is currently a popular research area.
In the text-dependent experiments, the content of the speech must be known in advance when identification is performed. We compare three methods used in speaker recognition. The Hidden Markov Model and Vector Quantization have been widely used; although their experimental results are good, they are not as good as those obtained with the Support Vector Machine. The Equal Error Rate (EER) of the speaker recognition system using the Support Vector Machine is 0.5%.
In the text-independent experiments, the content of the speech is unimportant, and only the speech features are used to identify the speaker. We examine this setting in two ways and integrate the feature selection method proposed in this thesis to extract features. The experimental results show that speaker identification using the Support Vector Machine outperforms Vector Quantization: the EER of the Support Vector Machine is 1.89%. With feature selection, the result is better than with the Support Vector Machine alone, giving an EER of 1.24%. However, if too much of a speaker's feature information is removed, the identification rate degrades.

Abstract (in Chinese)
Abstract
Acknowledgements
Index
List of Figures and Tables
Chapter 1 Introduction
  1.1 Motivation
  1.2 Document Discussion
  1.3 Summary
Chapter 2 Speaker's Recognition System
Chapter 3 Speaker Recognition Preprocessing
  3.1 Recording
  3.2 Energy Detection
  3.3 Division of the Frame and Window Function
Chapter 4 Calculation of Speaker's Recognition Features
  4.1 Fast Fourier Transform
  4.2 Mel-Frequency Transform
  4.3 Mel-Frequency Cepstrum Coefficients
  4.4 Delta Coefficients
Chapter 5 Modeling
  5.1 Vector Quantization
  5.2 Hidden Markov Model
    5.2.1 Definition
    5.2.2 Viterbi Algorithm
  5.3 Feature Selection
  5.4 Support Vector Machine
Chapter 6 Experiments & Results
  6.1 Developing Environment
  6.2 System Operation
  6.3 Experiment Design and the Structure of the Experiment
  6.4 The Evaluation of System Efficiency
  6.5 Brief Introduction of Speech Database
  6.6 Experiment of Text Dependent Verification
  6.7 Experiment of Text Independent Verification
    6.7.1 The Influence of Threshold Values of Different Feature Selections to Identification Rate
    6.7.2 Comparison with Other Theses
Chapter 7 Summary and Perspective
  7.1 Summary
  7.2 Perspective
Reference

