Author: | 旭法 Sivabalan - Adinarayanan |
---|---|
Thesis Title: | 文句不相關語者驗證使用支援向量機 Text-Independent Speaker Verification Using Support Vector Machine |
Advisor: | 洪西進 Shi-Jinn Horng |
Committee: | 王有禮 Yue-Li Wang, 梅興 Hsing Mei, 王振興 Jeen-Shing Wang, 楊昌彪 Chang-Biau Yang |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
Thesis Publication Year: | 2005 |
Graduation Academic Year: | 93 (ROC calendar, i.e., 2004-2005) |
Language: | English |
Pages: | 83 |
Keywords (in Chinese): | 梅爾倒頻譜參數 (Mel-frequency cepstral coefficients), 支援向量機 (support vector machine), 語者驗證 (speaker verification) |
Keywords (in other languages): | Support Vector Machine, Speaker Verification, MFCC |
In the system implementation, Mel-Frequency Cepstral Coefficients (MFCCs) serve as the speaker features and are combined with a support vector machine (SVM) to build speaker-dependent models.
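The MFCC front end mentioned above can be illustrated for a single analysis frame. The sketch below follows a generic textbook pipeline (pre-emphasis, Hamming window, power spectrum, triangular mel filterbank, log, DCT) and is not the thesis's exact front end: the function name `mfcc_frame` and every parameter value (20 filters, 12 coefficients, pre-emphasis factor 0.97) are assumptions for illustration. A naive DFT is used so the example runs with the standard library alone.

```python
import math


def hz_to_mel(f):
    # Widely used mel-scale formula: 2595 * log10(1 + f / 700).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc_frame(frame, sample_rate, n_filters=20, n_ceps=12, pre_emph=0.97):
    """Return n_ceps cepstral coefficients (c1..c_n_ceps) for one frame (assumed settings)."""
    n = len(frame)
    # 1. Pre-emphasis boosts high frequencies: y[i] = x[i] - a * x[i-1].
    x = [frame[0]] + [frame[i] - pre_emph * frame[i - 1] for i in range(1, n)]
    # 2. Hamming window reduces spectral leakage at the frame edges.
    x = [x[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i in range(n)]
    # 3. Power spectrum of bins 0..n/2 via a naive DFT (an FFT would be used in practice).
    n_bins = n // 2 + 1
    power = []
    for k in range(n_bins):
        re = sum(x[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
        im = -sum(x[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
        power.append((re * re + im * im) / n)
    # 4. Triangular filters whose edges are equally spaced on the mel scale.
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(low + (high - low) * j / (n_filters + 1)) for j in range(n_filters + 2)]
    bin_freqs = [k * sample_rate / n for k in range(n_bins)]
    log_energies = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        energy = 0.0
        for k, f in enumerate(bin_freqs):
            if left < f <= center:
                weight = (f - left) / (center - left)
            elif center < f < right:
                weight = (right - f) / (right - center)
            else:
                weight = 0.0
            energy += weight * power[k]
        # 5. Log compression; the small floor avoids log(0) on silent frames.
        log_energies.append(math.log(energy + 1e-10))
    # 6. DCT-II decorrelates the log filterbank energies; c0 is dropped.
    return [
        sum(log_energies[j] * math.cos(math.pi * q * (j + 0.5) / n_filters)
            for j in range(n_filters))
        for q in range(1, n_ceps + 1)
    ]


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(256)]
    print(len(mfcc_frame(tone, sr)))  # prints 12
```

In a full system the signal would be split into overlapping frames (for example 20 to 30 ms with a 10 ms hop) and one such vector computed per frame; those frame vectors are what an SVM-based speaker model would consume.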
This thesis explores speaker recognition technology, focusing on the techniques used in current state-of-the-art systems. Current state-of-the-art speaker verification systems are based on discriminatively trained generative models, in which discrimination is achieved with a linear function. We studied the use of support vector machines (SVMs) for text-independent speaker verification and considered two main approaches: the first uses linear SVMs, and the second is an utterance-based approach using kernel SVMs. That state-of-the-art speaker verification systems rely on generative models to recognize speakers is a curious result, since discriminative classifiers should in theory outperform generative ones: the former are explicitly optimized to minimize the classification error rate, whereas the latter are not.
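To make the linear-versus-kernel comparison concrete, the sketch below writes out the linear, polynomial, and RBF kernels that this abstract compares, together with the standard SVM decision function f(x) = sum_i alpha_i * y_i * K(s_i, x) + b over support vectors s_i. The `utterance_score` helper, which averages frame-level decision values over an utterance, is a hypothetical illustration of one way to turn frame scores into a single verification score; the thesis's utterance-based method may differ. All function names and the default `degree`, `coef0`, and `gamma` values are assumptions.

```python
import math


def linear_kernel(x, z):
    # K(x, z) = <x, z>
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=3, coef0=1.0):
    # K(x, z) = (<x, z> + coef0) ** degree  (degree and coef0 are assumed defaults)
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2)  (gamma is an assumed default)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, alphas_y, bias, kernel):
    # f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b; alphas_y holds the products alpha_i * y_i.
    return sum(ay * kernel(s, x) for s, ay in zip(support_vectors, alphas_y)) + bias


def utterance_score(frames, support_vectors, alphas_y, bias, kernel):
    # Hypothetical utterance-level score: mean of the frame-level decision values.
    total = sum(decision_value(f, support_vectors, alphas_y, bias, kernel) for f in frames)
    return total / len(frames)


def collapse_linear(support_vectors, alphas_y):
    # For the linear kernel, sum_i a_i y_i <s_i, x> = <w, x> with w = sum_i a_i y_i s_i,
    # so scoring needs one dot product per frame instead of one kernel call per support vector.
    dim = len(support_vectors[0])
    return [sum(ay * s[d] for s, ay in zip(support_vectors, alphas_y)) for d in range(dim)]
```

The `collapse_linear` helper shows why a linear SVM can be cheaper at test time: its cost per frame does not grow with the number of support vectors, whereas polynomial and RBF kernels must be evaluated against every support vector. This is one plausible source of the execution-time advantage described here, offered as a general property of linear SVMs rather than as the thesis's measured explanation.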
The polynomial and radial basis function (RBF) kernels are widely used for the speaker verification task. For comparison, we examine the properties of linear SSVMs. This allows us to study, and potentially adopt, a simpler system with faster execution time whose accuracy is as high as, or close to, that of current kernel methods. The linear-SVM approach is intended to assess how efficiently, in terms of simplicity and running time, the method reduces the error rate, and thereby to overcome the difficulties that arise when complex kernel SVMs are applied to speaker verification. We begin by investigating commonly used kernel functions such as the polynomial and RBF kernels. The techniques were first tested on the widely used YOHO database and then evaluated on a more difficult, custom-built text-independent database. This separation of development from evaluation is important to ensure that the methods are general and that the classifiers have not been tuned to one particular database.
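If "SSVM" here refers to the smooth support vector machine of Lee and Mangasarian, its linear form minimizes (nu/2) * ||p(e - D(Aw - e*gamma), alpha)||^2 + (1/2) * (w'w + gamma^2), where A holds the training vectors as rows, D is the diagonal matrix of +1/-1 labels, e is a vector of ones, and p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha * x)) is a smooth approximation of the plus function max(x, 0). The sketch below minimizes that objective on toy 2-D data. The original SSVM is solved with a Newton-Armijo method; plain gradient descent is swapped in here for brevity, and `train_linear_ssvm`, `predict`, and the values of nu, alpha, the step size, and the iteration count are assumptions rather than the thesis's settings.

```python
import math


def smooth_plus(x, alpha):
    # p(x, alpha) = x + log(1 + exp(-alpha * x)) / alpha, rewritten to avoid overflow.
    return max(x, 0.0) + math.log1p(math.exp(-alpha * abs(x))) / alpha


def sigmoid(z):
    # Derivative of p with respect to x is sigmoid(alpha * x); numerically stable form.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_linear_ssvm(X, y, nu=1.0, alpha=5.0, lr=0.01, iters=2000):
    """Gradient descent on the linear SSVM objective (assumed hyperparameters)."""
    dim = len(X[0])
    w = [0.0] * dim
    gamma = 0.0
    for _ in range(iters):
        grad_w = list(w)  # gradient of (1/2) * w'w
        grad_g = gamma    # gradient of (1/2) * gamma^2
        for xi, yi in zip(X, y):
            # Residual r_i = 1 - y_i * (w . x_i - gamma); positive means margin violated.
            r = 1.0 - yi * (sum(a * b for a, b in zip(w, xi)) - gamma)
            coeff = nu * smooth_plus(r, alpha) * sigmoid(alpha * r)
            # Chain rule: dr/dw = -y_i * x_i and dr/dgamma = +y_i.
            for d in range(dim):
                grad_w[d] -= coeff * yi * xi[d]
            grad_g += coeff * yi
        w = [wd - lr * gd for wd, gd in zip(w, grad_w)]
        gamma -= lr * grad_g
    return w, gamma


def predict(w, gamma, x):
    # Classify by the sign of w . x - gamma (+1 = claimed speaker, -1 = impostor).
    return 1 if sum(a * b for a, b in zip(w, x)) - gamma >= 0 else -1


if __name__ == "__main__":
    X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
    y = [1, 1, 1, -1, -1, -1]
    w, g = train_linear_ssvm(X, y)
    print(w, g, [predict(w, g, x) for x in X])
```

Because the objective is smooth and strongly convex, any standard smooth optimizer applies; the appeal of the Newton-Armijo method in the original formulation is much faster convergence than the fixed-step descent used here.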
Experimentally, linear SVMs not only outperform current state-of-the-art classifiers on the YOHO text-independent speaker verification database but also come close to the results of the kernel functions while executing faster. This thesis reports an equal error rate (EER) of 1.81% on the YOHO database and 0.65% on our own custom-built text-independent database.
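The equal error rate reported above is the operating point at which the false-acceptance rate (impostors accepted) equals the false-rejection rate (true speakers rejected). A minimal sketch, assuming higher scores mean "more likely the claimed speaker" and using each observed score as a candidate threshold; `equal_error_rate` is a hypothetical helper, and with finite score sets it returns the average of FAR and FRR at the threshold where the two are closest, which approximates the EER.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from genuine (target) and impostor trial scores."""
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap, best_eer = float("inf"), None
    for t in thresholds:
        # Accept a trial when its score >= t.
        frr = sum(1 for s in genuine if s < t) / len(genuine)     # true speakers rejected
        far = sum(1 for s in impostor if s >= t) / len(impostor)  # impostors accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.6]
    impostor = [0.1, 0.2, 0.3, 0.65]
    print(equal_error_rate(genuine, impostor))  # prints 0.25
```

With many trials the same sweep traces out the ROC or DET curve; the EER is simply the point on that curve where both error rates coincide, which is why it is a convenient single-number summary for comparing verification systems.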