Graduate student: 曾聖文 Sheng-wen Tzeng
Thesis title: 使用頻譜HMM模型及波形包絡模型之曲笛聲合成 / Chinese-Flute Sound Synthesis Using Spectral HMM Models and Waveform Envelope Model
Advisor: 古鴻炎 Hung-Yan Gu
Oral defense committee: 廖元甫 Yuan-fu Liao, 徐茂濱 Mao-bin Syu, 余明興 Ming-sing Yu
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: Chinese
Number of pages: 87
Keywords (Chinese, translated): Chinese flute (qudi), musical instrument, synthesis, hidden Markov model, amplitude, envelope
Keywords (English): DCT, instrument, synthesis, amplitude, envelope, HTS, HMM
Abstract (translated from Chinese): Based on the spectral HMM models of HTS and a waveform-envelope model that we built ourselves, this thesis develops a Chinese-flute (qudi) sound synthesis system. First, STRAIGHT is used to analyze the fundamental-frequency (F0) contour of each recorded flute phrase and to perform automatic pitch labeling; the HTS software is then used to train the spectral HMM models and decision trees; next, a DCT is applied to the waveform envelope of each note, and the average DCT vector computed within each context class serves as the envelope model. In the synthesis stage, HTS first synthesizes the flute sound; the F0 contours generated by our program then replace those produced by HTS; finally, the envelope curve recovered from the envelope model's DCT vector is used to adjust the waveform envelope of the re-synthesized HTS output. In this way, flute sounds with correct pitch and improved naturalness can be synthesized. We then carried out spectral-error measurements and listening tests; the results show that, after F0 replacement and waveform-envelope adjustment, the synthesized flute music is better than the original HTS output in both naturalness and quality.
Abstract (English): In this thesis, a system to synthesize Chinese-flute sound is developed. The system is based on the HMM (hidden Markov model) models trained with the HTS (HMM-based Speech Synthesis System) software and on the waveform-envelope models constructed by us. In the training stage, STRAIGHT was used to analyze the pitch contours of the Chinese-flute recordings, and these contours were used to label the pitch symbol of each note automatically. Next, HTS was used to train the HMM models and decision trees. In addition, a DCT (discrete cosine transform) was applied to the waveform envelope of each note; the waveform-envelope model was then obtained by averaging the DCT vectors collected within each context class. In the synthesis stage, the HTS software is first commanded to synthesize the Chinese-flute sound of a score. Then, the pitch contours of the notes are replaced with the pitch contours generated by our program. Next, the waveform envelope of each note's signal re-synthesized by HTS is modified according to the envelope curve generated by the envelope model. Consequently, the pitches of the synthesized Chinese-flute notes become correct and sound more natural. Using the synthetic and recorded sound files, spectrum errors were measured and listening tests were conducted. The results show that, after replacing the pitch contours and modifying the waveform envelopes, the quality and naturalness of the synthesized Chinese-flute music are apparently higher than those of the original HTS-synthesized music.
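The envelope-modeling steps described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: it assumes the envelope is sampled as the peak amplitude per frame, applies an unnormalized type-II DCT, keeps only the low-order coefficients as the model vector (the part that would be averaged within each context class), and recovers a smooth envelope curve via the inverse transform.

```python
import math

def dct2(x):
    # Unnormalized type-II DCT: X[k] = sum_n x[n] * cos(pi*k*(2n+1)/(2N))
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N)) for k in range(N)]

def idct2(X):
    # Exact inverse of dct2: x[n] = X[0]/N + (2/N) * sum_{k>=1} X[k]*cos(...)
    N = len(X)
    return [X[0] / N + (2.0 / N) *
            sum(X[k] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for k in range(1, N)) for n in range(N)]

def frame_envelope(signal, frame_len):
    # Frame-wise peak amplitude: a simple stand-in for envelope extraction.
    return [max(abs(s) for s in signal[i:i + frame_len])
            for i in range(0, len(signal), frame_len)]

def envelope_model(envelope, n_coeffs):
    # Keep only the low-order DCT coefficients, which capture the gross
    # shape of the envelope; this truncated vector is what would be
    # averaged over all notes belonging to the same context class.
    return dct2(envelope)[:n_coeffs]

def envelope_curve(model, n_frames):
    # Zero-pad the model vector back to full length and invert the DCT
    # to obtain a smooth envelope curve of n_frames points.
    return idct2(model + [0.0] * (n_frames - len(model)))
```

To adjust a synthesized note as the abstract describes, each frame of the HTS output would then be scaled by the ratio of the model envelope to the note's measured envelope, so the re-synthesized waveform follows the envelope curve recovered from the class's average DCT vector.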