Author: 黃崇哲
Chung-Che Huang
Thesis Title: 用於名人語音合成之PCA與ANN為基礎的音色轉換方法
A PCA and ANN Based Timbre Conversion Method for Synthesizing a Famous Person's Speech
Advisor: 古鴻炎
Hung-yan Gu
Committee: 陳秋華
Chyou-Hwa Chen
Hsin-Min Wang
Bor-shen Lin
Degree: Master's (碩士)
Department: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
Thesis Publication Year: 2011
Graduation Academic Year: 99 (ROC calendar, i.e. academic year 2010-2011)
Language: Chinese
Pages: 88
Keywords (in Chinese): 音色轉換, 非平行語料, 主成分分析, 類神經網路, 隱藏式馬可夫模型
Keywords (in other languages): timbre-conversion, non-parallel corpus, PCA, ANN, HMM
This thesis addresses the problems that arise when a PCA (principal component analysis) and ANN (artificial neural network) based timbre-conversion method is applied with only a small, non-parallel corpus available. The method is then used to build a Mandarin speech synthesis system that synthesizes speech with a famous person's timbre. We design separate classification methods for syllable initials and syllable finals to reduce the number of categories. For the syllable signals of each category, an HMM with a state-sharing structure is trained to obtain a fixed number of DCC (discrete cepstral coefficient) vectors for each syllable signal; this state-sharing structure is intended to improve on the left-to-right HMM structure. An ANN maps the PCA coefficients of a source syllable to the PCA coefficients of a target syllable. To train this ANN, we propose a method for classifying the context in which a syllable is pronounced, which combines precise classification and rough classification. Using the context classification results, we can find, for each target syllable, the source syllables whose contexts are most similar. For each category of syllable initial and final, a separate ANN for mapping PCA coefficients is trained after the DCC vectors are analyzed with PCA. In the synthesis stage, we integrate PCA and ANN based timbre conversion into a Mandarin speech synthesis system developed previously by others. With this system, timbre-converted speech signals are synthesized and used in listening tests. The test results show that the timbre of the synthesized speech is somewhat similar to that of the target famous person. However, the signal quality and the prosodic characteristics still need to be improved in future work.
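The PCA-then-ANN mapping described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the data are synthetic stand-ins for fixed-length DCC vectors, source and target rows are assumed to be already paired (the thesis pairs them by context classification), a single network is trained instead of one per initial or final category, and all function names, dimensions, and hyperparameters are hypothetical.

```python
import numpy as np

def fit_pca(X, k):
    """Return the data mean and the top-k principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def to_pca(X, mean, basis):
    """Project vectors onto the PCA basis to get PCA coefficients."""
    return (X - mean) @ basis.T

def from_pca(Z, mean, basis):
    """Reconstruct vectors from PCA coefficients."""
    return Z @ basis + mean

def train_mlp(Z_src, Z_tgt, hidden=8, epochs=3000, lr=0.02, seed=0):
    """Fit a one-hidden-layer tanh network by batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.3, (Z_src.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.3, (hidden, Z_tgt.shape[1]))
    b2 = np.zeros(Z_tgt.shape[1])
    n = len(Z_src)
    for _ in range(epochs):
        H = np.tanh(Z_src @ W1 + b1)          # hidden activations
        err = H @ W2 + b2 - Z_tgt             # output error
        dH = (err @ W2.T) * (1.0 - H ** 2)    # backpropagate through tanh
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * (Z_src.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2 + b2

def run_demo(seed=0, n=200, dim=12, k=3):
    """Map synthetic 'source' vectors to 'target' vectors in PCA space."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n, k))  # shared content behind both speakers
    X_src = latent @ rng.normal(size=(k, dim)) / np.sqrt(dim)
    X_src = X_src + 0.01 * rng.normal(size=(n, dim))
    X_tgt = latent @ rng.normal(size=(k, dim)) / np.sqrt(dim)
    X_tgt = X_tgt + 0.01 * rng.normal(size=(n, dim))
    mean_s, basis_s = fit_pca(X_src, k)
    mean_t, basis_t = fit_pca(X_tgt, k)
    Z_src = to_pca(X_src, mean_s, basis_s)
    Z_tgt = to_pca(X_tgt, mean_t, basis_t)
    mlp = train_mlp(Z_src, Z_tgt)
    converted = from_pca(mlp(Z_src), mean_t, basis_t)
    mse = float(np.mean((converted - X_tgt) ** 2))
    baseline = float(np.mean((X_tgt - mean_t) ** 2))  # predict the mean only
    return converted.shape, mse, baseline
```

Calling `run_demo()` returns the shape of the converted vectors together with the conversion error and a mean-only baseline error, so the two errors can be compared directly. Working in PCA space keeps the network's input and output small, which matters when, as in the thesis, only a small corpus is available for training.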

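Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Motivation and Objectives
  1.2 Literature Review
  1.3 Research Method
  1.4 Thesis Organization
Chapter 2 Corpus Preparation
  2.1 Corpus Collection and Processing
    2.1.1 Recording
    2.1.2 Phonetic Labeling
  2.2 Pronunciation Quality Grading
  2.3 Classification of Syllable Initials and Finals
    2.3.1 101 Initial Classes, 135 Final Classes
    2.3.2 65 Initial Classes, 72 Final Classes
    2.3.3 46 Initial Classes, 37 Final Classes
Chapter 3 Hidden Markov Models
  3.1 Introduction to HMMs
  3.2 Training of Left-to-Right HMMs
    3.2.1 Viterbi Search
    3.2.2 Collecting the Frames of Each State
    3.2.3 Computing Gaussian Distributions and Transition Probabilities, and Checking Convergence
  3.3 HMM Structures with Shared Initial States and Shared Final States
  3.4 Training of State-Sharing HMMs
Chapter 4 Principal Component Analysis and Non-Parallel Corpus Pairing
  4.1 Principal Component Analysis: Basic Method
  4.2 Principal Component Analysis: Reducing Computation
  4.3 Classification of Context Components
  4.4 Context Pairing
Chapter 5 Artificial Neural Network Mapping
  5.1 ANN Structure
  5.2 ANN Input and Output Parameters
  5.3 Experiments on the Number of Units
Chapter 6 Voice Conversion and Synthesis
  6.1 MLP Mapping
  6.2 Generation of Syllable DCC Coefficients
  6.3 Speech Signal Synthesis
  6.4 Program Interface
Chapter 7 Listening Tests
  7.1 Listening Test Evaluation Method
  7.2 Listening Test 1
  7.3 Listening Test 2
  7.4 Listening Test 3
  7.5 Listening Test 4
Chapter 8 Conclusion
References
Appendix 1 Precise Classification of Initials, P (precise)
Appendix 2 Precise Classification of Finals, P (precise)
Appendix 3 Rough Classification of Initials, R (rough)
Appendix 4 Rough Classification of Finals, R (rough)
About the Author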

