| Field | Value |
|---|---|
| Graduate Student | 蔡哲彰 Che-chang Tsai |
| Thesis Title | 國語合成歌聲流暢度改進之研究 (Fluency Improving for Mandarin Singing Voice Synthesis) |
| Advisor | 古鴻炎 Hung-yan Gu |
| Committee Members | 王新民 Hsin-min Wang, 余明興 Ming-shing Yu, 洪西進 Shi-jinn Horng, 林柏慎 Bor-shen Lin |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Publication Year | 2009 |
| Academic Year | 97 (2008–2009) |
| Language | Chinese |
| Pages | 72 |
| Keywords | Mandarin singing voice synthesis (國語歌聲合成), spectrum progression (頻譜演進), formant trace connecting (共振峰軌跡連接) |
Abstract (translated from the Chinese): The goal of this thesis is to synthesize fluent singing voice using a small number of synthesis units. Between adjacent syllables, we propose a reflection-coefficient based spectrum interpolation method to smoothly connect the formant traces of neighboring syllables. To improve fluency within a syllable, we adapt the concept of the spectrum-progression path, previously used by earlier students for speech synthesis, into a spectrum progression model suited to singing voice synthesis. We also modified the HNM synthesis program to accommodate these two fluency-improvement methods; in addition, a larger corpus was used to train the ANN vibrato parameter models, in the hope of raising the naturalness of the synthetic singing voice. Scores from subjective naturalness listening tests show that applying the spectrum progression model and formant-trace connecting does improve the fluency of the synthetic singing signal.
In this thesis, the goal is to synthesize fluent singing voice using a small number of synthesis units. Between adjacent syllables, we propose a reflection-coefficient based spectrum interpolation method so that the formant traces of neighboring syllables are smoothly connected. To improve the intra-syllable fluency of a synthetic syllable, we adopt the concept of spectrum progression, originally proposed for speech synthesis, to construct a spectrum progression model suitable for singing voice synthesis. Since the two fluency-promoting methods must be realized in signal synthesis, we modified and corrected the HNM synthesis program developed by others. In addition, we use a larger corpus to train the ANN vibrato parameter models in order to increase the naturalness of the synthetic singing voice. According to the results of the listening tests, the scores obtained with the spectrum progression model and formant-trace connecting are indeed higher than those obtained without such processing.
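The inter-syllable smoothing described in the abstract can be sketched as follows. This is a minimal illustration under assumptions not stated in the record, not the thesis's actual implementation: it assumes each boundary frame is modeled by LPC coefficients, converts them to reflection coefficients with the step-down recursion, interpolates linearly, and converts back with the step-up recursion. Interpolating in the reflection-coefficient domain is attractive because any convex combination of coefficient sets with |k| < 1 again satisfies |k| < 1, so every intermediate filter stays stable. All function names and the toy coefficient values below are hypothetical.

```python
import numpy as np

def lpc_to_rc(a):
    """Step-down recursion: LPC coefficients -> reflection coefficients.
    `a` holds a_1..a_p of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    Assumes a stable (minimum-phase) filter, i.e. all |k| < 1."""
    a = np.asarray(a, dtype=float).copy()
    p = len(a)
    k = np.zeros(p)
    for i in range(p - 1, -1, -1):
        k[i] = a[i]
        if i > 0:
            # reduce filter order by one: a_{i-1}[j] = (a_i[j] - k_i a_i[i-j]) / (1 - k_i^2)
            a[:i] = (a[:i] - k[i] * a[i - 1::-1]) / (1.0 - k[i] ** 2)
    return k

def rc_to_lpc(k):
    """Step-up recursion: reflection coefficients -> LPC coefficients."""
    k = np.asarray(k, dtype=float)
    a = np.zeros(0)
    for i in range(len(k)):
        # raise filter order by one: a_i[j] = a_{i-1}[j] + k_i a_{i-1}[i-j], a_i[i] = k_i
        a = np.concatenate([a + k[i] * a[::-1], [k[i]]])
    return a

# Hypothetical boundary smoothing: interpolate between the last frame of
# syllable A and the first frame of syllable B in the reflection domain.
a_end = np.array([-1.2, 0.5])    # LPC of A's last frame (illustrative values)
a_start = np.array([-0.8, 0.3])  # LPC of B's first frame (illustrative values)
k_end, k_start = lpc_to_rc(a_end), lpc_to_rc(a_start)
for w in (0.25, 0.5, 0.75):
    k_mid = (1.0 - w) * k_end + w * k_start  # |k| < 1 is preserved
    a_mid = rc_to_lpc(k_mid)                 # back to LPC for synthesis
```

Interpolating the LPC coefficients directly would give no such stability guarantee, which is a common motivation for choosing the reflection-coefficient domain for spectral smoothing at concatenation points.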