Graduate student: 陳彥樺 (Yen-Hua Chen)
Thesis title: 以聲學語言模型、全域變異數匹配及目標音框挑選作強化之語音轉換系統
A Voice Conversion System Enhanced with Acoustic Language-model, Global Variance Matching, and Target Frame Selection
Advisor: 古鴻炎 (Hung-Yan Gu)
Oral defense committee: 王新民 (Hsin-Min Wang), Ming-Shing Yu, Bor-Shen Lin
Degree: Master's
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Number of pages: 84
Chinese keywords: 語音轉換、聲學語言模型、目標音框挑選、全域變異數、離散倒頻譜係數、高斯混合模型、諧波加雜音模型
English keywords: voice conversion, acoustic language-model, target frame selection, global variance, discrete cepstral coefficient, Gaussian mixture model, harmonic-plus-noise model
This thesis proposes a combined method to improve the performance of GMM-based voice conversion systems. The method brings together three processing modules: a PPM acoustic language model (ALM), target frame selection (TFS), and global variance (GV) matching. Two variants are implemented, ALM+TFS+GV and ALM+GV+TFS. In the training stage, the 128 mean vectors of the Gaussian mixtures of a trained GMM are used to build a binary classification tree of quasi-phonetic symbols, and the tree is then used to train the ALM. In the conversion stage, the input voice frames are segmented according to the probabilities estimated by the ALM, and the spectrum of each frame is mapped with the single Gaussian mixture that corresponds to that frame. The TFS and GV modules are then applied to reduce over-smoothing of the converted spectral envelope. In TFS, the converted discrete cepstral coefficient (DCC) vector of an input frame is used to search the target speaker's training frames for the nearest frame, and the DCC vector found there replaces the converted one. GV matching adjusts the variance of a sequence of converted DCC vectors to match the variance of the target speaker's training DCC vectors.
In the objective tests, the average DCC error of our method is larger than that of the baseline method. However, the variance ratio (VR), used as a signal-quality index, indicates that our method is better. In addition, the perception tests show that speech converted by our method obtains higher signal quality and higher timbre similarity than speech converted by the baseline method.
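The TFS and GV matching steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the use of Euclidean distance for the nearest-frame search, the per-dimension population variance, and the way the `weight` parameter interpolates between no adjustment and full matching are all assumptions made for illustration. DCC vectors are represented as plain lists of floats.

```python
import math


def select_target_frames(converted, target_frames):
    """TFS sketch: replace each converted DCC vector with the nearest
    target-speaker training vector (Euclidean distance is an assumption)."""
    selected = []
    for vec in converted:
        nearest = min(target_frames, key=lambda t: math.dist(vec, t))
        selected.append(list(nearest))
    return selected


def per_dim_mean_var(frames):
    """Per-dimension mean and population variance of a frame sequence."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [
        sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dims)
    ]
    return means, variances


def match_global_variance(converted, target_variances, weight=1.0):
    """GV matching sketch: rescale each dimension around the sequence mean
    so that its variance moves toward the target variance.
    weight=1.0 matches the target variance exactly, weight=0.0 leaves the
    sequence unchanged (this interpolation form is hypothetical)."""
    means, variances = per_dim_mean_var(converted)
    scales = []
    for var_c, var_t in zip(variances, target_variances):
        full = math.sqrt(var_t / var_c) if var_c > 0 else 1.0
        scales.append(1.0 + weight * (full - 1.0))
    return [
        [m + s * (x - m) for x, m, s in zip(frame, means, scales)]
        for frame in converted
    ]
```

Under these assumptions, the ALM+TFS+GV variant would call `select_target_frames` first and pass its output to `match_global_variance`, while ALM+GV+TFS would call the two functions in the opposite order.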

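Table of contents:
Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures and Tables
Chapter 1 Introduction
  1.1 Research Motivation
  1.2 Literature Review
  1.3 Research Method
    1.3.1 Training Procedure of the Voice Conversion System
    1.3.2 Conversion Procedure of the Voice Conversion System
  1.4 Thesis Organization
Chapter 2 Corpus Preparation and Spectral Feature Parameters
  2.1 Corpus Recording
  2.2 Labeling and Segmentation
  2.3 Discrete Cepstral Coefficient Estimation
  2.4 DTW Frame Alignment
Chapter 3 Training and Testing of the Acoustic Language Model
  3.1 Building the Quasi-Phonetic Classification Tree
  3.2 Vector Quantization Encoding: Quasi-Phonetic Symbols
  3.3 Training the PPM Acoustic Language Model
  3.4 Quasi-Phoneme Selection Based on the Acoustic Language Model
    3.4.1 Quasi-Phonetic Symbol Selection
    3.4.2 Finding the Best Quasi-Phoneme Sequence by Dynamic Programming
  3.5 Testing the Acoustic Language Model
    3.5.1 Perplexity Evaluation of the PPM Acoustic Language Model
    3.5.2 Accuracy of Quasi-Phoneme Selection
Chapter 4 Spectral Coefficient Mapping
  4.1 Gaussian Mixture Model (GMM)
  4.2 Single Gaussian Mixture Selection and Mapping
    4.2.1 Single Gaussian Mixture Selection
    4.2.2 Single Gaussian Mixture Mapping
Chapter 5 Methods for Reducing Spectral Over-Smoothing
  5.1 Target Frame Selection
  5.2 Global Variance Adjustment
Chapter 6 Voice Conversion Experiments
  6.1 Speaker Pairing
  6.2 Measurement of Average DCC Error and Variance Ratio
  6.3 Experiment 1: Comparison of Global Variance Weights
  6.4 Experiment 2: Comparison of Conversion Methods
Chapter 7 System Implementation and Listening Tests
  7.1 Pitch Conversion
  7.2 HNM Synthesis
  7.3 Listening Test 1: Comparison of the Methods in This Thesis
  7.4 Listening Test 2: Comparison with Previous Methods
Chapter 8 Conclusion
References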

