Author: 陳忠緯
Chung-wei Chen
Thesis Title: 用於英語語音合成之基週軌跡產生方法
A Pitch Contour Generation Method for English Speech Synthesis
Advisor: 古鴻炎
Hung-Yan Gu
Committee: 洪維廷
Wei-Tyng Hong
陳柏琳
Ber-Lin Chen
林彥君
Yen-Chun Lin
林伯慎
Bor-Shen Lin
Degree: 碩士
Master
Department: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
Thesis Publication Year: 2010
Graduation Academic Year: 98
Language: Chinese
Pages: 66
Keywords (in Chinese): 基週軌跡, 聲調預測, 語音信號合成
Keywords (in other languages): pitch contour, tone prediction, speech signal synthesis
  • 本論文研究了英語語音合成之基週軌跡產生的方法。基週軌跡產生的第一階段工作是預測英語音節的聲調類別,我們提出以兩層式的演算法來作聲調預測,第一層透過動態規劃來尋找最佳的聲調組合之狀態序列,第二層則是用以估計各個音節的局部聲調機率,我們研究了三種局部聲調機率的估計方法,分別為加權式估計法、PPM估計法及類神經網路估計法。接著在基週軌跡產生的第二階段,我們把預測出的聲調、及其它語境資料帶入一個類神經網路來產生出各音節的基週軌跡。然後我們採用規則式作法來設定音量、音長及停頓,語音信號合成則是採用HNM合成法。目前已初步建立一個英語的語音合成系統,並且用以進行系統內聽測之實驗,我們發現聲調預測正確率越高,則合成語音的自然度會愈好;另外也進行了系統間聽測之實驗,結果顯示我們系統的合成語音的自然度,仍然比Festival HTS的差一截。


    In this thesis, a pitch contour generation method for English speech synthesis is studied. The first phase of pitch contour generation predicts the tone class of each syllable. We propose a two-tier algorithm to predict the syllable tone classes of a sentence. The first tier finds the best state sequence of tone-class combinations with a dynamic-programming-based algorithm. The second tier estimates the local probability of each tone-combination state of a syllable. We studied three local probability estimation methods: the weighted method, the PPM-based method, and the artificial neural network (ANN) based method. In the second phase, the predicted tones and other contextual information are fed into another ANN to generate a pitch contour for each syllable. We then use heuristic rules to set the volume, duration, and pause of each syllable, and synthesize the speech signal with the harmonic-plus-noise model (HNM) method. With these components, we have built a preliminary English speech synthesis system and used it to conduct intra-system listening tests. We find that the higher the accuracy of syllable tone prediction, the more natural the synthesized speech. Inter-system listening tests were also conducted. The results show that the naturalness of our system's synthesized speech is still significantly lower than that of the Festival HTS system.
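    The two-tier idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that each state is a single tone class, that the second tier is available as a function `local_prob(position, previous_tone, tone)`, and that the first tier maximizes the product of those local probabilities with a Viterbi-style dynamic program. The names `predict_tones`, `local_prob`, and `toy_local_prob`, the tone labels "H" and "L", and all probability values are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence

# Hypothetical second-tier interface: probability of `tone` at `position`
# given the previous syllable's tone (None for the first syllable).
LocalProb = Callable[[int, Optional[str], str], float]


def predict_tones(n_syllables: int, tones: Sequence[str],
                  local_prob: LocalProb) -> List[str]:
    """First tier (sketch): find the tone sequence with the highest
    product of local probabilities using dynamic programming."""
    if n_syllables == 0:
        return []
    floor = 1e-12  # guards against log(0)

    def logp(i: int, prev: Optional[str], tone: str) -> float:
        return math.log(max(local_prob(i, prev, tone), floor))

    # score[t]: best log-probability of a sequence ending in tone t
    score = {t: logp(0, None, t) for t in tones}
    back = []  # one dict per position >= 1: tone -> best previous tone
    for i in range(1, n_syllables):
        new_score, pointers = {}, {}
        for t in tones:
            best_prev = max(tones, key=lambda p: score[p] + logp(i, p, t))
            new_score[t] = score[best_prev] + logp(i, best_prev, t)
            pointers[t] = best_prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final tone.
    path = [max(tones, key=lambda t: score[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def toy_local_prob(i: int, prev: Optional[str], tone: str) -> float:
    # Made-up numbers for illustration only; a real estimator would be
    # the weighted, PPM-based, or ANN-based method named in the abstract.
    if prev is None:
        return 0.7 if tone == "H" else 0.3
    return 0.8 if tone != prev else 0.2


print(predict_tones(4, ["H", "L"], toy_local_prob))  # ['H', 'L', 'H', 'L']
```

    Working in log-probabilities avoids numerical underflow on long sentences. Any of the three estimators could be plugged in as `local_prob` without changing the search.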

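    Chapter 1 Introduction
      1.1 Research Motivation and Objectives
      1.2 Literature Review
      1.3 Research Method
      1.4 Thesis Organization
    Chapter 2 Collection and Analysis of the English Corpus
      2.1 Corpus Collection and Phonetic Labeling
      2.2 Tone Symbols and Their Meanings
      2.3 Pitch Contour Analysis
      2.4 Demisyllable Classification
    Chapter 3 Pitch Contour Neural Network
      3.1 Introduction to Artificial Neural Networks
      3.2 Neural Network Structure
      3.3 Neural Network Input and Output Parameters
      3.4 ANN Model Training
      3.5 Pitch Contour Generation Examples
    Chapter 4 Tone Prediction for English Syllables
      4.1 Basic Observations
      4.2 Tone Sequence Prediction Method
        4.2.1 Dynamic Programming Problems
        4.2.2 Dynamic Programming for Tone Prediction
        4.2.3 Dynamic Programming Example
      4.3 Weighted Local Probability Estimation
      4.4 PPM Local Probability Estimation
      4.5 Neural Network Local Probability Estimation
      4.6 Tone Prediction Experiments
    Chapter 5 Building the Speech Synthesis System
      5.1 Text Input and Parsing
      5.2 Dictionary Lookup and Conversion to Syllable Sequences
      5.3 Volume, Duration, and Pause Settings
      5.4 HNM Signal Synthesis
      5.5 Program Interface and Testing
    Chapter 6 Listening Tests
    Chapter 7 Conclusion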

