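Graduate student: | 陳忠緯 Chung-wei Chen |
---|---|
Thesis title: | 用於英語語音合成之基週軌跡產生方法 A Pitch Contour Generation Method for English Speech Synthesis |
Advisor: | 古鴻炎 Hung-Yan Gu |
Oral examination committee: | 洪維廷 Wei-Tyng Hong, 陳柏琳 Ber-Lin Chen, 林彥君 Yen-Chun Lin, 林伯慎 Bor-Shen Lin |
Degree: | Master (碩士) |
Department: | College of Electrical Engineering and Computer Science (電資學院), Department of Computer Science and Information Engineering (資訊工程系) |
Year of publication: | 2010 |
Academic year of graduation: | 98 (ROC calendar) |
Language: | Chinese |
Number of pages: | 66 |
Keywords (Chinese): | 基週軌跡 (pitch contour), 聲調預測 (tone prediction), 語音信號合成 (speech signal synthesis) |
Keywords (English): | pitch contour, tone prediction, speech signal synthesis |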
This thesis studies a method for generating pitch contours for English speech synthesis. The first phase of pitch contour generation predicts the tone class of each syllable. We propose a two-tier algorithm for predicting the syllable tone classes of a sentence. The first tier uses dynamic programming to find the best sequence of tone-combination states. The second tier estimates the local probability of each tone-combination state of a syllable. We studied three local probability estimation methods: a weighted method, a PPM-based method, and an artificial neural network (ANN) based method. In the second phase, the predicted tones and other contextual information are fed into another ANN to generate a pitch contour for each syllable. We then use heuristic rules to set the volume, duration, and pause of each syllable, and synthesize the speech signal with the harmonic-plus-noise model (HNM). With these components we have built a preliminary English speech synthesis system and used it in listening tests. In the intra-system listening tests, we find that the higher the accuracy of syllable tone prediction, the more natural the synthesized speech sounds. Inter-system listening tests were also conducted; the results show that the naturalness of our system's synthesized speech still falls well short of that of the Festival HTS system.
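The two-tier tone prediction described above can be illustrated with a minimal Viterbi-style sketch. The abstract does not specify how the tone-combination states are defined or scored, so everything here is an assumption for illustration: a state is taken to be a (previous tone, current tone) pair, the tone labels `H` and `L` are hypothetical, and `toy_local_prob` is a made-up stand-in for the second-tier estimator (weighted, PPM-based, or ANN-based in the thesis).

```python
import math


def best_tone_sequence(num_syllables, tones, local_prob):
    """First tier (assumed form): dynamic programming over tone sequences.

    local_prob(i, prev_tone, tone) plays the role of the second tier: it
    returns the local probability of `tone` at syllable i given the previous
    syllable's tone (prev_tone is None for the first syllable).
    """
    def log_p(i, prev, cur):
        # Floor the probability so that log() never receives zero.
        return math.log(max(local_prob(i, prev, cur), 1e-12))

    # score[t]: best log-probability of any sequence ending in tone t.
    score = {t: log_p(0, None, t) for t in tones}
    backpointers = []
    for i in range(1, num_syllables):
        new_score, pointers = {}, {}
        for cur in tones:
            best_prev = max(tones, key=lambda p: score[p] + log_p(i, p, cur))
            new_score[cur] = score[best_prev] + log_p(i, best_prev, cur)
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final tone to recover the whole sequence.
    path = [max(tones, key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def toy_local_prob(i, prev, cur):
    """Hypothetical estimator: slight bias to start on H, then alternate."""
    if prev is None:
        return {"H": 0.6, "L": 0.4}[cur]
    return 0.7 if cur != prev else 0.3


print(best_tone_sequence(3, ["H", "L"], toy_local_prob))  # ['H', 'L', 'H']
```

Working in log-probabilities keeps long sentences from underflowing. Swapping in a different second-tier estimator only requires passing a different `local_prob` function; the search itself is unchanged.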