
Graduate student: Che-chang Tsai (蔡哲彰)
Thesis title: Fluency Improvement for Mandarin Singing Voice Synthesis (國語合成歌聲流暢度改進之研究)
Advisor: Hung-yan Gu (古鴻炎)
Committee members: Hsin-min Wang (王新民), Ming-shing Yu (余明興), Shi-jinn Horng (洪西進), Bor-shen Lin (林柏慎)
Degree: Master's
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2009
Graduating academic year: 97 (2008-2009)
Language: Chinese
Number of pages: 72
Keywords: Mandarin singing voice synthesis, spectrum progression, formant trace connecting
The goal of this thesis is to synthesize fluent singing voice from a small number of synthesis units. Between adjacent syllables, we propose a reflection-coefficient based spectrum interpolation method that smoothly connects the formant traces across the syllable boundary. To improve fluency within a syllable, we adapt the concept of spectrum progression, originally proposed by earlier students for speech synthesis, into a spectrum progression model suited to singing voice synthesis. Because both fluency-improvement methods must be realized during signal synthesis, we also modified and corrected the previously developed HNM (harmonic-plus-noise model) synthesis program. In addition, we trained the ANN vibrato parameter models on a larger corpus in order to raise the naturalness of the synthetic singing voice. In subjective listening tests of naturalness, the scores obtained with the spectrum progression model and the formant trace connecting processing were indeed higher than those obtained without such processing, showing that the two methods improve the fluency of the synthesized singing signal.
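The inter-syllable smoothing described above interpolates short-term spectra in the reflection-coefficient domain, which is attractive because linearly interpolating the reflection coefficients of two stable LPC filters always yields stable intermediate filters (every |k_i| stays inside (-1, 1)). The Python sketch below illustrates that idea only; it is not the thesis's implementation. The thesis derives reflection coefficients from estimated spectral envelopes, whereas here, to stay self-contained, they are derived from LPC predictor coefficients via the step-down recursion; the LPC order, the number of transition frames, and the linear interpolation schedule are all assumptions. The convention used is A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.

```python
import numpy as np

def lpc_to_reflection(a):
    """Step-down (backward Levinson) recursion: predictor coefficients
    a_1..a_p of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p -> reflection
    coefficients k_1..k_p.  Assumes a stable filter (all |k_i| < 1)."""
    a = np.asarray(a, dtype=float).copy()
    p = len(a)
    k = np.zeros(p)
    for i in range(p, 0, -1):
        k[i - 1] = a[i - 1]
        if i > 1:
            # Remove stage i: a'_j = (a_j - k_i * a_{i-j}) / (1 - k_i^2)
            a = (a[:i - 1] - k[i - 1] * a[i - 2::-1]) / (1.0 - k[i - 1] ** 2)
    return k

def reflection_to_lpc(k):
    """Step-up (Levinson) recursion, the inverse of lpc_to_reflection."""
    a = np.zeros(0)
    for ki in k:
        a = np.concatenate([a + ki * a[::-1], [ki]])
    return a

def interpolate_boundary(a_left, a_right, n_frames):
    """Generate n_frames LPC frames moving linearly, in the
    reflection-coefficient domain, from a_left to a_right (e.g. across a
    syllable boundary).  Stability is preserved because a convex
    combination of values in (-1, 1) stays in (-1, 1)."""
    k_l, k_r = lpc_to_reflection(a_left), lpc_to_reflection(a_right)
    return [reflection_to_lpc((1.0 - t) * k_l + t * k_r)
            for t in np.linspace(0.0, 1.0, n_frames)]

# Example: two stable order-10 frames built from random reflection
# coefficients, with five interpolated frames in between.
rng = np.random.default_rng(0)
a_left = reflection_to_lpc(rng.uniform(-0.9, 0.9, 10))
a_right = reflection_to_lpc(rng.uniform(-0.9, 0.9, 10))
transition = interpolate_boundary(a_left, a_right, n_frames=7)
```

Interpolating LPC coefficients directly can, by contrast, produce unstable or formant-smeared intermediate filters, which is the motivation behind the comparison of LP parametric representations in [9] and behind the choice of reflection coefficients here.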

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction: 1.1 Motivation; 1.2 Literature Review; 1.3 Research Methods; 1.4 Thesis Organization
Chapter 2 Spectrum Progression Paths and Vibrato Parameter Analysis: 2.1 Corpus Preparation; 2.2 Overview of Spectrum Progression; 2.3 DTW-based Spectrum Progression Path Analysis; 2.4 Program for Extracting Spectrum Progression Paths; 2.5 Program for Extracting Vibrato Parameters
Chapter 3 Artificial Neural Network Models: 3.1 Overview of Neural Networks; 3.2 Network Architecture; 3.3 Network Input and Output Parameters; 3.4 Experiments on the Number of Units; 3.5 Comparison of MLP Training Errors
Chapter 4 Formant Trace Connecting: 4.1 Spectrum Connection at Syllable Boundaries; 4.2 Spectral Envelope Estimation; 4.3 Conversion between Spectral Envelopes and Reflection Coefficients; 4.4 Interpolation Effects of Different Coefficient Types; 4.5 Reflection Coefficient Interpolation Experiments
Chapter 5 System Implementation and Listening Tests: 5.1 Spectrum Progression Time-axis Mapping; 5.2 Formant Trace Connecting; 5.3 Maximum Voiced Frequency (MVF) in HNM Analysis; 5.4 HNM Synthesis Parameters; 5.5 System Interface; 5.6 Listening Tests
Chapter 6 Conclusion
References
About the Author
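Section 2.3 above derives spectrum-progression paths with DTW: the frame sequence of one rendition of a syllable is aligned against another, and the normalized warping path describes how quickly the spectrum should evolve through the syllable. This record does not state the exact acoustic features or local path constraints the thesis uses, so the sketch below is generic, a plain three-direction DTW with Euclidean frame distance; all names are illustrative only.

```python
import numpy as np

def dtw_path(X, Y):
    """Align frame sequences X (m x d) and Y (n x d) with plain DTW and
    return the warping path as (i, j) index pairs from (0, 0) to (m-1, n-1)."""
    m, n = len(X), len(Y)
    cost = np.full((m + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = np.linalg.norm(X[i - 1] - Y[j - 1])  # local frame distance
            cost[i, j] = d + min(cost[i - 1, j - 1],  # diagonal step
                                 cost[i - 1, j],      # step in X only
                                 cost[i, j - 1])      # step in Y only
    # Backtrack the optimal alignment from (m, n) to (0, 0).
    path, i, j = [], m, n
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def progression_curve(path, m, n):
    """Turn a warping path into a normalized time-mapping curve (assumes
    m, n > 1): for each relative position of a frame in X, the relative
    position of its matched frame in Y."""
    return [(i / (m - 1), j / (n - 1)) for i, j in path]
```

A normalized curve of this kind, running from (0, 0) to (1, 1), can retime the stored spectra of a synthesis unit so that the spectral evolution inside a lengthened note sounds natural rather than uniformly stretched.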

[1] 古鴻炎 and 廖皇量, "A study on improving the harmonic-plus-noise model for Mandarin singing voice synthesis" (in Chinese), WOCMAT 2006 International Workshop on Computer Music and Audio Technology, Taipei, session 2 (Audio Processing I), 2006.
[2] 林正甫, Mandarin Singing Voice Synthesis Using ANN Vibrato Parameter Models (in Chinese), master's thesis, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 2008.
[3] 吳昌益, A Study of Mandarin Speech Synthesis Using a Spectrum Progression Model (in Chinese), master's thesis, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 2007.
[4] J. Bonada and X. Serra, "Synthesis of the singing voice by performance sampling and spectral models," IEEE Signal Processing Magazine, vol. 24, pp. 67–79, 2007.
[5] C. Y. Lin, T. Y. Lin, and J. S. R. Jang, "A corpus-based singing voice synthesis system for Mandarin Chinese," 13th ACM International Conference on Multimedia, Singapore, pp. 359–362, November 2005.
[6] Y. Meron, High Quality Singing Synthesis Using the Selection-based Synthesis Scheme, Ph.D. dissertation, University of Tokyo, 1999.
[7] D. T. Chappell and J. H. L. Hansen, "A comparison of spectral smoothing methods for segment concatenation based speech synthesis," Speech Communication, vol. 36, pp. 343–374, 2002.
[8] A. Conkie and S. Isard, "Optimal coupling of diphones," Progress in Speech Synthesis, Springer, New York, chapter 23, pp. 293–304, 1997.
[9] K. K. Paliwal, "Interpolation properties of linear prediction parametric representations," Proc. EuroSpeech '95, Madrid, vol. 2, pp. 1029–1032, 1995.
[10] H. R. Pfitzinger, "DFW-based spectral smoothing for concatenative speech synthesis," Proc. ICSLP 2004, Korea, pp. 1397–1400, 2004.
[11] T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Duration modeling in HMM-based speech synthesis system," Proc. ICSLP, vol. 2, pp. 29–32, 1998.
[12] O. Cappé and E. Moulines, "Regularization techniques for discrete cepstrum estimation," IEEE Signal Processing Letters, vol. 3, no. 4, pp. 100–102, 1996.
[13] 古鴻炎 and 蔡松峯, "A discrete-cepstrum based spectral envelope estimation framework and its application to voice conversion" (in Chinese), submitted to the 21st Conference on Computational Linguistics and Speech Processing (ROCLING 2009).
[14] K. Sjölander and J. Beskow, WaveSurfer, Centre for Speech Technology, KTH, http://www.speech.kth.se/wavesurfer/.
[15] J. Sundberg, "Effects of the vibrato and the 'singing formant' on pitch," Musica Slovaca VI, Bratislava, pp. 51–69, 1978.
[16] J. I. Shonle and K. E. Horan, "The pitch of vibrato tones," J. Acoust. Soc. Am., vol. 67, pp. 246–252, 1980.
[17] J. C. Brown and K. V. Vaughn, "Pitch center of stringed instrument vibrato tones," J. Acoust. Soc. Am., vol. 100, pp. 1728–1735, 1996.
[18] Wikipedia, "Multilayer perceptron," http://en.wikipedia.org/wiki/Multilayer_perceptron.
[19] 葉怡成, Applications and Implementation of Neural Network Models (類神經網路模式應用與實作, in Chinese), 儒林圖書公司, 2006.
[20] Wikipedia, "Recurrent neural network," http://en.wikipedia.org/wiki/Recurrent_neural_network.
[21] S. Imai and Y. Abe, "Spectral envelope extraction by improved cepstral method," Electron. and Commun., vol. 62-A, no. 4, pp. 10–17, 1979.
[22] A. Röbel and X. Rodet, "Real time signal transposition with envelope preservation in the phase vocoder," Proc. Int. Computer Music Conference (ICMC '05), Barcelona, pp. 672–675, 2005.
[23] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, 2nd ed., Prentice-Hall, 1999.
[24] D. O'Shaughnessy, Speech Communications, 2nd ed., IEEE Press, 2000.
[25] 王如江, A Study of Mandarin Singing Voice Synthesis Based on Singing Expression Analysis and Unit Selection (in Chinese), master's thesis, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 2007.
[26] H. Kenmochi and H. Ohshita, "VOCALOID - commercial singing synthesizer based on sample concatenation," Proc. INTERSPEECH 2007, 2007.
