
Author: 曾聖文
Sheng-wen Tzeng
Thesis Title: 使用頻譜HMM模型及波形包絡模型之曲笛聲合成
Chinese-Flute Sound Synthesis Using Spectral HMM Models and Waveform Envelope Model
Advisor: 古鴻炎
Hung-Yan Gu
Committee: 廖元甫
Yuan-fu Liao
Mao-bin Syu
Ming-sing Yu
Degree: Master's
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Thesis Publication Year: 2014
Graduation Academic Year: 102 (ROC calendar; 2013-2014)
Language: Chinese
Pages: 87
Keywords (translated from Chinese): qudi (Chinese flute), instrument synthesis, hidden Markov model, amplitude envelope
Keywords (in other languages): DCT, instrument, synthesis, amplitude, envelope, HTS, HMM
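  • (Abstract, translated from the Chinese) This thesis develops a synthesis system for qudi (Chinese flute) sound, based on the spectral HMM models of HTS and on a waveform-envelope model that we built ourselves. First, STRAIGHT is used to analyze the fundamental-frequency (F0) contour of each recorded qudi phrase and to label the notes automatically. The HTS software is then used to train the spectral HMM models and decision trees. The waveform envelope of each note is transformed by DCT, and the mean DCT vector computed for each context class serves as the envelope model. In the synthesis stage, HTS first synthesizes the flute sound. The F0 contour produced by HTS is then replaced with the contour generated by our program, and the envelope curve recovered from the envelope model's DCT vector is used to adjust the waveform envelope of the signal resynthesized by HTS. In this way, qudi sound with correct pitch and greater naturalness can be synthesized. We then measured spectral error and ran listening tests. The results show that qudi music synthesized with F0 replacement and waveform-envelope adjustment is better in both naturalness and quality than the original HTS-synthesized flute sound.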
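The automatic note-labeling step (pitch symbols derived from the STRAIGHT F0 contour) can be illustrated with a minimal Python sketch. This record does not say how the thesis segments notes or maps F0 to pitch symbols, so everything here is an assumption made for illustration: the note is taken to be already segmented, unvoiced frames are marked with F0 = 0, the reference tuning is A4 = 440 Hz, and the names `f0_to_midi` and `label_note` are hypothetical.

```python
import math
from statistics import median

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def f0_to_midi(f0_hz, a4_hz=440.0):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(f0_hz / a4_hz)


def label_note(f0_contour_hz, a4_hz=440.0):
    """Label one pre-segmented note from its F0 contour.

    Frames with F0 <= 0 are treated as unvoiced and ignored (assumption).
    The median of the remaining frames is rounded to the nearest
    equal-tempered semitone and returned as a name such as "A4".
    """
    voiced = [f for f in f0_contour_hz if f > 0.0]
    if not voiced:
        return None  # nothing to label in this segment
    midi = round(f0_to_midi(median(voiced), a4_hz))
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"


print(label_note([0.0, 438.0, 441.0, 440.0, 442.0, 0.0]))  # A4
```

Using the median rather than the mean keeps a few octave errors or onset glides in the F0 track from shifting the label.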

    This thesis develops a system for synthesizing Chinese-flute (qudi) sound. The system is based on the spectral HMMs (hidden Markov models) of the HTS (HMM-based Speech Synthesis System) software and on a waveform-envelope model that we constructed. In the training stage, STRAIGHT is used to analyze the pitch contours of the Chinese-flute recordings, and the pitch contours are used to label the pitch symbol of each note automatically. Next, HTS is used to train the HMMs and decision trees. In addition, a DCT (discrete cosine transform) is applied to the waveform envelope of each note, and the waveform-envelope model is obtained by averaging the DCT vectors collected for each context class. In the synthesis stage, HTS first synthesizes the Chinese-flute sound for a score. The pitch contours generated by HTS are then replaced with the pitch contours generated by our program. Next, the waveform envelope of each note in the signal resynthesized by HTS is modified according to the envelope curve generated by the envelope model. Consequently, the synthesized Chinese-flute notes have the correct pitch and sound more natural. Using the synthetic and recorded sound files, we measured spectral error and conducted listening tests. The results show that, after the pitch contours are replaced and the waveform envelopes are modified, the synthesized Chinese-flute music is higher in both quality and naturalness than the original HTS-synthesized music.
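The waveform-envelope model described in the abstract (DCT of each note's envelope, averaging per context class, then reshaping the synthesized note) can be sketched in plain Python. This is only an illustration under stated assumptions, not the thesis's implementation: the peak-per-frame envelope, the resampling of every envelope to a fixed number of points before the DCT, the frame size, the number of retained coefficients, and all function names are hypothetical choices that this record does not specify.

```python
import math


def envelope(signal, frame=64):
    """Full-wave rectify, then take the peak of each frame (assumed method)."""
    return [max(abs(s) for s in signal[i:i + frame])
            for i in range(0, len(signal), frame)]


def resample(curve, n):
    """Linearly interpolate a curve to exactly n points."""
    if len(curve) == 1:
        return [curve[0]] * n
    out = []
    for j in range(n):
        pos = j * (len(curve) - 1) / (n - 1) if n > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, len(curve) - 1)
        frac = pos - lo
        out.append(curve[lo] * (1.0 - frac) + curve[hi] * frac)
    return out


def dct2(x, n_coef):
    """First n_coef coefficients of the orthonormal DCT-II of x."""
    n = len(x)
    coefs = []
    for k in range(n_coef):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        coefs.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                                 for i in range(n)))
    return coefs


def idct2(coefs, n):
    """Rebuild an n-point curve from (possibly truncated) DCT-II coefficients."""
    out = []
    for i in range(n):
        v = 0.0
        for k, c in enumerate(coefs):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            v += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(v)
    return out


def train_envelope_model(notes_by_class, n_points=32, n_coef=8):
    """Average the truncated DCT vectors of all notes in each context class."""
    model = {}
    for label, notes in notes_by_class.items():
        vectors = [dct2(resample(envelope(note), n_points), n_coef)
                   for note in notes]
        model[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return model


def adjust_envelope(signal, mean_dct, n_points=32, frame=64, floor=1e-6):
    """Scale a synthesized note so its envelope follows the model curve.

    Gains are constant within a frame; a real system would smooth them
    across frame boundaries to avoid audible steps.
    """
    current = envelope(signal, frame)
    target = resample(idct2(mean_dct, n_points), len(current))
    gains = [max(t, 0.0) / max(c, floor) for t, c in zip(target, current)]
    return [s * gains[i // frame] for i, s in enumerate(signal)]
```

Keeping only the low-order DCT coefficients gives a smooth, fixed-length description of the envelope, which is what makes averaging over notes of different durations possible in this sketch.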

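    Table of Contents (translated from the Chinese)

    Abstract (in Chinese)
    Abstract (in English)
    Acknowledgments
    Table of Contents
    List of Figures and Tables
    Chapter 1 Introduction
      1.1 Research Motivation and Objectives
      1.2 Literature Review
      1.3 Research Method
      1.4 Thesis Organization
    Chapter 2 Preparation of Qudi Recordings
      2.1 Recording
      2.2 F0 Contour Analysis
      2.3 Note Labeling and HTS Label Preparation
      2.4 Analysis of Qudi Sound
    Chapter 3 Training of HMM Spectral Models
      3.1 The HTS Speech Synthesis Software
      3.2 Introduction to Hidden Markov Models
      3.3 HTS Parameter Settings and Input File Preparation
        3.3.1 Qudi Sound Unit Settings
        3.3.2 HTS Parameter File Settings
        3.3.3 HTS Input Files
      3.4 Spectral Coefficient Extraction
      3.5 Context-Independent HMMs
      3.6 Rough Context-Dependent HMMs
      3.7 HMM Tree-Based Clustering and Decision Trees
        3.7.1 Question Set
        3.7.2 Decision Trees
      3.8 HMM Training and Second Clustering
      3.9 Context-Dependent Label File Format
    Chapter 4 HTS Qudi Sound Synthesis
      4.1 Label File Preparation
        4.1.1 Score File Processing
        4.1.2 Label Files Used by HTS
      4.2 F0 Contour Generation
      4.3 Synthesized Flute Waveforms
    Chapter 5 Amplitude Fluctuation Control
      5.1 Rectification and Amplitude Envelope Extraction
      5.2 DCT Coefficient Computation
      5.3 Envelope Model Based on Envelope Classification
        5.3.1 Classification Method
        5.3.2 Mean Curve of the Amplitude Envelope
        5.3.3 Amplitude Adjustment
    Chapter 6 Experiments
      6.1 Comparison of Synthesized Signals
      6.2 Objective Distance Measurement
      6.3 Subjective Listening Tests
        6.3.1 Naturalness Comparison for Inside-Corpus Qudi Pieces
        6.3.2 Quality Comparison for Outside Pieces
    Chapter 7 Conclusion
    References
    Appendix A Modifications to the HTS_pstream.c Source Code
      A.1 Modifications Related to the length.txt File
      A.2 Modifications Related to the reall.txt File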

    [1] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, 2nd ed., Prentice-Hall, 1999.
    [2] C. Dodge and T. A. Jerse, Computer Music: Synthesis, Composition, and Performance, 2nd ed., Schirmer Books, 1997.
    [3] D. Dimitriadis and P. Maragos, "Robust Energy Demodulation Based on Continuous Models with Application to Speech Recognition," in Proc. of Eurospeech-03, Geneva, Sept. 2003.
    [4] A. Horner and L. Ayers, "Modeling acoustic wind instruments with contiguous group synthesis," Journal of the Audio Engineering Society, vol. 46, no. 10, pp. 868-879, 1998.
    [5] H. Banno, H. Hata, M. Morise, T. Takahashi, T. Irino, and H. Kawahara, "Implementation of realtime STRAIGHT speech manipulation system," Acoust. Sci. & Tech., vol. 28, no. 3, pp. 140-146, 2007.
    [6] H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction," Speech Communication, vol. 27, pp. 187-207, 1999.
    [7] H. Zen, K. Tokuda, K. Oura, K. Hashimoto, S. Shiota, S. Takaki, J. Yamagishi, T. Toda, T. Nose, S. Sako, and A. W. Black, HMM-based Speech Synthesis System (HTS),
    [8] J. Yamagishi, "An Introduction to HMM-Based Speech Synthesis."
    [9] K. Myeongsu and Y. Hong, "Formant Synthesis of Haegeum: A Sound Analysis/Synthesis System Using Cepstral Envelope," in Proc. International Conference on Information Science and Applications (ICISA), IEEE, 2011.
    [10] K. Shinoda and T. Watanabe, "Acoustic modeling based on the MDL principle for speech recognition," Rhodes, Greece, September 22-25, ISCA, 1997.
    [11] K. Sjölander and J. Beskow, Centre for Speech Technology at KTH,
    [12] K. Tokuda, H. Zen, and A. W. Black, "An HMM-based speech synthesis system applied to English," in Proc. IEEE 2002 Workshop on Speech Synthesis, Santa Monica, USA, Sep. 2002.
    [13] MathWorks, MATLAB,
    [14] M. Oehler, "Wind Instrument Synthesis by Means of Cyclical Spectra," 2006.
    [15] R. Martín, E. López, and L. Jure, "Wind instruments synthesis toolbox for generation of music audio signals with labeled partials," 2009.
    [16] SPTK Working Group, Speech Signal Processing Toolkit (SPTK),
    [17] K. Tokuda et al., "Mel-generalized cepstral analysis: a unified approach to speech spectral estimation," in Proc. ICSLP, 1994.
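    [19] 李振宇 and 林奇嶽, "Building a Mandarin Chinese speech synthesis system based on hidden Markov models" (in Chinese), ICL Technical Journal, pp. 88-94, 2010.
    [20] 梁伯達, "Hilbert-Huang Transform (HHT) analysis of the timbre of the dongxiao" (in Chinese), thesis, Graduate Institute of Communication Engineering, National Taiwan University, 2007.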