
Graduate student: 黃維
Wei Huang
Thesis title: 以混合模型產生閩南語音節基週軌跡之研究
Min-Nan Syllable Pitch Contour Generation Using Mixed Models
Advisor: 古鴻炎
Hung-Yan Gu
Oral defense committee: 余明興, 陳錫明, 呂仁園, 陳桂霞
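Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: Chinese
Number of pages: 76
Keywords (Chinese): 語音合成
Keywords (English): speech synthesis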
  • This thesis studies the generation of syllable pitch contours for Min-Nan speech synthesis. Building on earlier work that applied artificial neural network (ANN) models to Mandarin, we construct an ANN model that generates pitch contours for the syllables of a Min-Nan sentence. The model accounts for the ways Min-Nan differs from Mandarin: the number of tones and the classes of syllable initials and finals. We also study the number of neurons in the ANN and coarse groupings of the finals. To generate more natural pitch contours, we try mixing models: in addition to the ANN, we use a hidden Markov model (HMM) developed in previous research. Both of the ANN-HMM mixing methods we investigate yield a small reduction in inside-test error. Perceptual (listening) tests also indicate that sentences synthesized with the mixed models sound slightly more natural than those synthesized with either individual model.

    Chapter 1 Introduction
    1.1 Preface
    1.2 Review of pitch contour generation
    1.3 Research motivation and methods
    Chapter 2 Pitch contour processing and overview of the models
    2.1 Pitch contour normalization
    2.2 Pitch contour vector quantization
    2.3 Overview of the HMM model
    2.4 Overview of the ANN model
    Chapter 3 Improvements to the ANN model
    3.1 Structure of the ANN
    3.2 Experiments on the number of ANN neurons
    3.3 Experiments on the ANN classification of finals
    Chapter 4 Mixing the HMM and ANN models
    4.1 Purpose of model mixing
    4.2 Methods of model mixing
    4.3 Model mixing experiments
    Chapter 5 Listening tests and conclusions
    5.1 Listening tests of synthesized Min-Nan sentences
    5.2 Listening tests of synthesized Mandarin sentences
    5.3 Conclusions
    References
    Appendix

