| Field | Value |
|---|---|
| Graduate Student | 黃崇哲 Chung-Che Huang |
| Thesis Title | 用於名人語音合成之PCA與ANN為基礎的音色轉換方法 (A PCA and ANN Based Timbre Conversion Method for Synthesizing a Famous Person's Speech) |
| Advisor | 古鴻炎 Hung-yan Gu |
| Oral Examination Committee | 陳秋華 Chyou-Hwa Chen, 王新民 Hsin-Min Wang, 林伯慎 Bor-shen Lin |
| Degree | Master |
| Department | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2011 |
| Academic Year of Graduation | 99 (ROC calendar) |
| Language | Chinese |
| Number of Pages | 88 |
| Chinese Keywords | 音色轉換, 非平行語料, 主成分分析, 類神經網路, 隱藏式馬可夫模型 |
| English Keywords | timbre-conversion, non-parallel corpus, PCA, ANN, HMM |
Given only a small, non-parallel corpus, this thesis attempts to solve the problems faced by a combined PCA and ANN timbre-conversion method, and then uses it to build a Mandarin speech synthesis system that can synthesize a famous person's timbre. We design a classification scheme for syllable initials and finals, respectively, to reduce the number of categories, and then train a shared HMM structure on the syllable pronunciations of each category, which is used to extract a fixed number of DCC spectral feature vectors from each syllable; this shared HMM structure is intended to improve on the common left-to-right HMM structure. To train the ANN that maps the PCA coefficients of source speech to the PCA coefficients of target speech under the non-parallel-corpus condition, we study a context classification scheme that mixes rough and fine classification in order to find, for each target syllable, source recordings with the most similar context; we then perform PCA and train a dedicated ANN mapping for each initial and final category. In the synthesis stage, the combined PCA and ANN timbre-conversion function is integrated into a previously developed Mandarin speech synthesis system, and speech with converted timbre is synthesized for listening tests. The results show that most listeners found the timbre of the synthesized speech somewhat similar to that of the target famous person, but the signal quality and prosodic characteristics still need improvement.
In this thesis, we try to solve the problems encountered when applying a PCA (principal component analysis) and ANN (artificial neural network) based timbre-conversion method in the situation where only a small, non-parallel corpus is available. This method is then adopted to construct a Mandarin speech synthesis system that synthesizes speech with a famous person's timbre. We design dedicated classification schemes for syllable initials and finals, respectively, to decrease the number of categories. For the syllable signals of each category, an HMM with a state-sharing structure is trained in order to obtain a fixed number of DCC (discrete cepstral coefficient) vectors for each syllable signal; this state-sharing structure is intended to improve on the common left-to-right HMM structure. An ANN is adopted to map the PCA coefficients of a source syllable to the PCA coefficients of a target syllable. To train this ANN from non-parallel data, we propose a method that classifies the context in which a syllable is pronounced, combining rough and fine classification. Based on the classification result, we can find, for each target syllable, the source syllables whose contexts are most similar. For each category of syllable initials and finals, a separate PCA-coefficient-mapping ANN is trained after the DCC vectors are analyzed with PCA. In the synthesis stage, we integrate the PCA and ANN based timbre-conversion function into a Mandarin speech synthesis system developed previously by others. Using this system, speech signals with converted timbre are synthesized and used in listening tests. The test results show that the timbre of the synthesized speech is somewhat similar to that of the target famous person; however, the signal quality and prosodic characteristics still need to be improved.
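The core conversion pipeline described in the abstract (PCA analysis of spectral vectors per category, then an ANN mapping source PCA coefficients to target PCA coefficients) can be sketched as follows. This is a minimal NumPy illustration, not the thesis implementation: synthetic random data stands in for DCC vectors, the source/target pairing that context classification would provide is assumed given, and all dimensions and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_fit(X, k):
    """Fit PCA on the rows of X; return the mean and the top-k components."""
    mu = X.mean(axis=0)
    # Rows of Vt from the SVD of the centered data are the principal directions.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]                      # W has shape (k, d)

def pca_project(X, mu, W):
    return (X - mu) @ W.T                  # PCA coefficients, shape (n, k)

def pca_reconstruct(C, mu, W):
    return C @ W + mu                      # back to the spectral-vector space

# Toy stand-ins for DCC spectral vectors (hypothetical dimensions).
n, d, k = 200, 20, 6
src = rng.normal(size=(n, d))                      # "source speaker" vectors
A = rng.normal(scale=0.3, size=(d, d))
tgt = src @ A + 0.05 * rng.normal(size=(n, d))     # context-matched "target" vectors

mu_s, W_s = pca_fit(src, k)
mu_t, W_t = pca_fit(tgt, k)
Cs = pca_project(src, mu_s, W_s)
Ct = pca_project(tgt, mu_t, W_t)

# One-hidden-layer ANN trained by gradient descent to map Cs -> Ct.
h, lr = 16, 0.05
W1 = rng.normal(scale=0.1, size=(k, h)); b1 = np.zeros(h)
W2 = rng.normal(scale=0.1, size=(h, k)); b2 = np.zeros(k)
for _ in range(2000):
    H = np.tanh(Cs @ W1 + b1)              # hidden activations
    E = (H @ W2 + b2) - Ct                 # prediction error
    # Backpropagation of the mean-squared error.
    gW2 = H.T @ E / n; gb2 = E.mean(axis=0)
    dH = (E @ W2.T) * (1 - H ** 2)
    gW1 = Cs.T @ dH / n; gb1 = dH.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

# Convert one source vector: project, map with the ANN, reconstruct.
c = pca_project(src[:1], mu_s, W_s)
c_mapped = np.tanh(c @ W1 + b1) @ W2 + b2
converted = pca_reconstruct(c_mapped, mu_t, W_t)
print(converted.shape)                     # (1, 20)
```

In the thesis's setting, one such PCA basis and ANN would be trained per initial/final category, and the reconstructed vectors would drive the synthesizer rather than be printed.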