Graduate student: |
王皓 Hao Wang |
Thesis title: |
結合HMM頻譜模型與PV信號模型之語音信號合成方法 A Speech Signal Synthesis Method Combining HMM Spectral Model and PV Signal Model |
Advisor: |
Hung-yan Gu |
Oral examination committee: |
Xi-jin Hong, 馮輝文 Huei-wen Ferng, 余明興 Ming-shing Yu |
Degree: |
Master |
Department: |
College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2017 |
Graduation academic year: | 105 |
Language: | Chinese |
Number of pages: | 94 |
Chinese keywords: | 隱藏式馬可夫模型 (hidden Markov model), 相位聲碼器 (phase vocoder) |
Foreign-language keywords: | hidden Markov model, phase vocoder |
This thesis proposes a speech signal synthesis method that combines an HMM (hidden Markov model) spectral model with a signal model based on the phase vocoder (PV). The method aims to improve the signal quality of HMM-synthesized speech and to increase the timbre similarity between the synthetic speech and the original recordings. Because PV parameters are not suitable as HMM feature vectors, we first train the HMMs with DCC (discrete cepstral coefficient) vectors. Once training is complete, several suitable real frames are selected and saved for each HMM state from the frames collected on that state. In the synthesis stage, a real frame is first selected according to the F0 value assigned to each HMM state, and the PV parameters corresponding to the selected frame are then retrieved and used to synthesize the speech signal. The initially synthesized speech, however, contained audible odd clicking sounds and jitter. Therefore, for the training stage we developed a real-frame selection method that splits the collected frames into several groups according to their F0 values and selects a frame from each group. For the synthesis stage we developed a real-frame selection method based on dynamic programming, together with a median smoothing method for the PV parameters. The signal-quality listening tests show that participants did not consistently judge the speech with median smoothing as better or worse than the speech without it. In the timbre-similarity listening tests, participants compared our method with the HMM+HNM speech signal synthesis method proposed by a previous researcher. For inside sentences, our method obtained a higher average score. For outside sentences, the average score did not clearly favor either method, because the signal quality of our method's synthetic speech is not stable enough.
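The two synthesis-stage steps named in the abstract, dynamic-programming frame selection and median smoothing of parameter tracks, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the function names `select_frames_dp` and `median_smooth`, the target cost (absolute F0 difference), the concatenation cost (Euclidean distance between parameter vectors), the `weight` parameter, and the window size.

```python
import math
from statistics import median


def median_smooth(track, window=3):
    """Median-smooth a 1-D parameter track (hypothetical helper).

    Windows are truncated at both ends of the track rather than padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [median(track[max(0, i - half): i + half + 1])
            for i in range(len(track))]


def select_frames_dp(candidates, target_f0, weight=1.0):
    """Pick one stored frame per state with a Viterbi-style search.

    candidates[t] is a non-empty list of (f0, params) tuples for state t.
    Assumed costs: |f0 - target_f0[t]| per state, plus
    weight * Euclidean distance between consecutive parameter vectors.
    Returns the list of chosen candidate indices, one per state.
    """
    if not candidates or len(candidates) != len(target_f0):
        raise ValueError("need one target F0 per state")
    if any(len(c) == 0 for c in candidates):
        raise ValueError("every state needs at least one candidate")

    # cost[t][j]: lowest cumulative cost of a path ending at candidate j.
    cost = [[abs(f0 - target_f0[0]) for f0, _ in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for t in range(1, len(candidates)):
        row_cost, row_back = [], []
        for f0, params in candidates[t]:
            best_i, best = min(
                ((i, cost[t - 1][i] + weight * math.dist(prev, params))
                 for i, (_, prev) in enumerate(candidates[t - 1])),
                key=lambda pair: pair[1],
            )
            row_cost.append(best + abs(f0 - target_f0[t]))
            row_back.append(best_i)
        cost.append(row_cost)
        back.append(row_back)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = [j]
    for t in range(len(candidates) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path


# Toy data: two stored frames per state, each with an F0 and a 2-D vector.
example_candidates = [
    [(100.0, (0.0, 0.0)), (120.0, (5.0, 5.0))],
    [(110.0, (0.1, 0.0)), (118.0, (5.0, 5.1))],
    [(119.0, (0.0, 0.1)), (122.0, (9.0, 9.0))],
]
example_targets = [120.0, 118.0, 120.0]
f0_only_path = select_frames_dp(example_candidates, example_targets, weight=0.0)
smooth_path = select_frames_dp(example_candidates, example_targets, weight=10.0)
```

With `weight=0.0` the search reduces to picking the frame closest in F0 for each state independently. Raising `weight` trades F0 accuracy for continuity between neighboring frames, which is one plausible way to reduce audible discontinuities.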