
Graduate student: 王皓
Hao Wang
Thesis title: 結合HMM頻譜模型與PV信號模型之語音信號合成方法
A Speech Signal Synthesis Method Combining HMM Spectral Model and PV Signal Model
Advisor: 古鴻炎
Hung-yan Gu
Oral defense committee: 洪西進
Xi-jin Hong
馮輝文
Huei-wen Ferng
余明興
Ming-shing Yu
Degree: Master
Department: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: Chinese
Number of pages: 94
Chinese keywords: 隱藏式馬可夫模型、相位聲碼器
English keywords: hidden Markov model, phase vocoder

本論文提出一種語音信號之合成方法,將相位聲碼器( Phase Vocoder, PV )為基礎之信號模型與HMM頻譜模型作結合,以改善HMM合成語音信號的音質,並且提升合成語音與原始錄音之間的音色相似度。由於 PV參數不適合作為 HMM的特徵向量,我們先以DCC向量訓練HMM模型,在HMM完成訓練時,才從HMM各狀態所收集的音框中挑選、儲存數個適合的真實音框;在合成階段,先依HMM各狀態被指派的F0值去作真實音框挑選,再抓取被挑到之音框對應的PV參數去作信號合成。由於初始合成出的語音聽起來有怪音或是抖動的情形,因此我們在訓練階段發展了依F0值作分組挑選之真實音框挑選法,並且在合成階段發展了基於動態規劃演算法之真實音框挑選法,以及PV參數的中值平滑處理方法。音質聽測的結果顯示,有無作中值平滑處理,對受測者來說並沒有一定的好或壞;在音色相似度方面,受測者比較我們的方法與前人提出的 HMM + HNM語音信號合成法,對於內部語句,我們的方法獲得了較高的平均評分,但是對於外部語句,我們的方法因為音質不夠穩定,以至於聽測的平均評分並沒有特別偏向那一種方法。


In this thesis, we propose a speech signal synthesis method that combines an HMM (hidden Markov model) spectral model with a PV (phase vocoder) based signal model. The method is intended to improve the signal quality of HMM-synthesized speech and to increase the timbre similarity between the synthetic speech and the original recordings. Because PV parameters are not suitable for use as HMM feature vectors, we first train the HMMs with DCC (discrete cepstral coefficient) vectors. Once training is complete, a few suitable real frames are selected and saved for each HMM state from the frames collected at that state. In the synthesis stage, a real frame is first selected according to the F0 value assigned to each HMM state, and the PV parameters corresponding to the selected frame are then retrieved and used to synthesize the speech signal. The initial synthetic speech, however, contained audible clicking and vibrating sounds. To address this, in the training stage we developed a real frame selection method that splits the collected frames into several groups according to their F0 values and selects one frame from each group. In the synthesis stage, we developed a real frame selection method based on dynamic programming, together with a median smoothing method for the PV parameters. The signal quality listening tests show that participants had no consistent preference for or against median smoothing. In the timbre similarity listening tests, participants compared our method with a previously proposed HMM+HNM synthesis method. For inside sentences, our method obtained a higher average score. For outside sentences, the average scores did not clearly favor either method, because the signal quality of our method's synthetic speech is not yet stable enough.
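The two synthesis-stage steps named in the abstract, dynamic-programming frame selection and median smoothing, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `select_frames_dp` and `median_smooth`, the cost terms (absolute F0 difference as target cost, Euclidean distance between parameter vectors as join cost), the weights `w_f0` and `w_join`, and the edge-padding rule are all assumptions made for illustration.

```python
from statistics import median


def select_frames_dp(candidates, target_f0, w_f0=1.0, w_join=1.0):
    """Pick one stored frame per state with a Viterbi-style search.

    candidates[t] is a list of (f0, params) tuples saved for state t and
    target_f0[t] is the F0 value assigned to state t. Both cost terms
    below are hypothetical; the thesis may define them differently.
    Returns the index of the chosen candidate for every state.
    """
    def target_cost(t, k):
        # Assumed target cost: mismatch between stored and assigned F0.
        return abs(candidates[t][k][0] - target_f0[t])

    def join_cost(a, b):
        # Assumed join cost: Euclidean distance between parameter vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # best[t][k]: lowest accumulated cost of any path ending at candidate k.
    best = [[w_f0 * target_cost(0, k) for k in range(len(candidates[0]))]]
    back = [[None] * len(candidates[0])]
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for k, (_, params) in enumerate(candidates[t]):
            costs = [best[t - 1][j] + w_join * join_cost(prev[1], params)
                     for j, prev in enumerate(candidates[t - 1])]
            j_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[j_min] + w_f0 * target_cost(t, k))
            ptr.append(j_min)
        best.append(row)
        back.append(ptr)

    # Trace back from the cheapest final candidate.
    k = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [k]
    for t in range(len(candidates) - 1, 0, -1):
        k = back[t][k]
        path.append(k)
    path.reverse()
    return path


def median_smooth(track, width=3):
    """Sliding-window median over one parameter track.

    The track is padded by repeating its end values (an assumption)
    so that the output has the same length as the input.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    if not track:
        return []
    half = width // 2
    padded = [track[0]] * half + list(track) + [track[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(track))]
```

With `w_join=0` the search reduces to picking, for each state independently, the candidate whose F0 is closest to the assigned value; raising `w_join` trades F0 accuracy for smoother transitions between neighboring states. `median_smooth` would be applied separately to each parameter track of the selected frame sequence.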

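Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures and Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Literature Review
1.2.1 System Architecture
1.2.2 HMM Models
1.2.3 Signal Synthesis Methods
1.3 Research Method
1.4 Thesis Organization
Chapter 2 Corpus Preparation and Feature Parameter Analysis
2.1 Corpus Recording
2.2 Labeling and Segmentation
2.3 F0 Estimation
2.4 Spectral Coefficient Extraction
Chapter 3 Phase Vocoder Model Analysis and Synthesis
3.1 Window Functions and the Short-Time Fourier Transform
3.2 Parameter Analysis of the Phase Vocoder
3.2.1 Amplitude Parameters
3.2.2 Phase Parameters and Instantaneous Frequency
3.3 Signal Synthesis with the PV Technique
3.3.1 Duration Modification
3.3.2 Pitch Modification
Chapter 4 HMM Spectral Model Training
4.1 Hidden Markov Models
4.2 Speech Units, Context Combinations, and Half-Segment HMMs
4.2.1 Classification of Initials and Finals
4.2.2 Context Combinations
4.2.3 Half-Segment HMMs
4.3 HMM Training
4.4 Real Frame Selection
4.5 Training of HMM State Duration Parameters
4.6 Improvements to the HMM Training Method
4.6.1 Frames Accumulating in a Few States
4.6.2 Misassignment of Frames at Initial-Final Boundaries
4.6.3 Improvement Method
Chapter 5 Signal Synthesis Method Integrating HMM and PV
5.1 HMM Model Selection
5.2 PV Real Frame Selection for Each State
5.2.1 Local Selection Method
5.2.2 Dynamic Programming Selection Method
5.3 PV Frame Sequence Generation
5.3.1 Frame Spectrum Adjustment (FSA)
5.3.2 State Spectrum Median Smoothing (SSMS)
Chapter 6 Experiments and Objective Evaluation
6.1 F0-Related Odd-Sound Problem
6.1.1 F0-Aware Real Frame Selection (HMM Training Stage)
6.1.2 Number of Real Frames
6.1.3 Real Frame Substitution from Adjacent States (Synthesis Stage)
6.2 Vibration Problem in Synthetic Speech
6.2.1 Examples of Weight Settings for the Local Selection Method
6.2.2 Examples of the Dynamic Programming Selection Method
6.2.3 Smoothing of PV Parameters
6.3 Objective Measurement of Synthetic Speech
6.3.1 Distortion Measurement by DCC Distance
6.3.2 Measurement on Inside Sentences
6.3.3 Measurement on Outside Sentences
Chapter 7 Subjective Listening Tests
7.1 Signal Quality of Synthetic Speech
7.1.1 Inside Test
7.1.2 Outside Test
7.2 Timbre Similarity
7.2.1 Inside Test
7.2.2 Outside Test
Chapter 8 Conclusion
References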

