Graduate student: | 王如江 Ju-chiang Wang |
---|---|
Thesis title: | 基於歌聲表情分析與單元選擇之國語歌聲合成研究 Mandarin Singing Voice Synthesis Based on Singing Expression Analysis and Unit Selection |
Advisor: | 古鴻炎 Hung-yan Gu |
Oral defense committee: | 余明興, 王新民 Hsin-Min Wang, 蔡偉和 Wei-Ho Tsai, 鍾國亮 |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
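Year of publication: | 2007 |
Graduation academic year: | 95 (ROC calendar) |
Language: | Chinese |
Number of pages: | 96 |
Keywords (Chinese): | 歌聲合成 (singing voice synthesis), 歌聲表情 (singing expression), 表情參數 (expression parameters), 單元選擇 (unit selection), 人工智慧 (artificial intelligence) |
Keywords (English): | singing voice expression |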
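This thesis studies the analysis of singing expression parameters and then applies syllable unit selection and HNM (Harmonic plus Noise Model) signal synthesis to build a Mandarin singing voice synthesis system that can imitate the expression of a human singer. We re-recorded a set of Mandarin tri-syllable utterances and wrote an automatic labeling program. For the analysis of singing expression, we recorded songs sung by different people and analyzed them to extract expression parameters for each note, including pitch contour, loudness, duration, and waveform envelope. In the synthesis stage, the analyzed parameter values are used to control, for each synthesized note, the time positions of the consonant-vowel boundary and the ASR boundaries, the pitch contour, the waveform envelope, and the note-to-note connection (legato) processing; that is, the expression parameters are incorporated into HNM to synthesize the singing voice signal. We also conducted listening tests. The results show that using the expression parameters does improve the quality of the synthesized singing voice, and that the synthesized singing can imitate the expression of a real singer to a considerable degree.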
In this thesis, we analyze the expression parameters of the singing voice. Syllable unit selection and HNM (Harmonic plus Noise Model) based signal synthesis are then integrated to build a Mandarin singing voice synthesis system. The system can mimic a singer's vocal expression by using parameters analyzed from a recording of that singer's song. We re-recorded Mandarin tri-syllable utterances and developed an automatic sub-syllable segment boundary detection system. To analyze the expression parameters, we recorded songs sung by different singers and analyzed them to obtain each note's parameters, such as pitch contour, loudness, duration, and waveform envelope. In the synthesis stage, the expression parameters obtained from a source note are used to determine the time positions of the consonant, vowel, and ASR (Attack, Sustain, and Release) segments, and to plan the pitch contour and waveform envelope of the synthesized note. These expression parameters are thus fed into the HNM-based mechanism to synthesize the singing voice signal. We also conducted perception tests. The results show that using the expression parameters does improve the quality of the synthesized singing voice, and that the synthesized voice mimics a person's singing expression with a high degree of similarity.
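To make the roles of the pitch contour and the ASR segments more concrete, the following is a minimal Python sketch of how such parameters could shape one synthesized note. It is not the thesis's implementation: it covers only a harmonic part (the noise part of HNM is omitted), and the piecewise-linear envelope, the sinusoidal vibrato, the 1/k harmonic amplitudes, and all function and parameter names (`asr_envelope`, `synthesize_note`, `attack_frac`, and so on) are assumptions introduced here for illustration.

```python
import math


def asr_envelope(n_samples, attack_frac=0.1, release_frac=0.2):
    """Piecewise-linear Attack/Sustain/Release amplitude envelope in [0, 1].

    The segment fractions are hypothetical; they should sum to at most 1.
    """
    n_attack = int(n_samples * attack_frac)
    n_release = int(n_samples * release_frac)
    n_sustain = n_samples - n_attack - n_release
    env = [i / n_attack for i in range(n_attack)]            # ramp up from 0
    env += [1.0] * n_sustain                                 # hold at full level
    env += [1.0 - i / n_release for i in range(n_release)]   # ramp down toward 0
    return env


def synthesize_note(f0_hz, duration_s, sample_rate=16000, n_harmonics=10,
                    vibrato_rate_hz=5.5, vibrato_depth_semitones=0.3):
    """Sum of harmonics that follows a vibrato pitch contour and an ASR envelope."""
    n = int(duration_s * sample_rate)
    env = asr_envelope(n)
    phase = 0.0
    samples = []
    for i in range(n):
        t = i / sample_rate
        # Assumed pitch contour: base pitch modulated by a sinusoidal vibrato.
        f0 = f0_hz * 2 ** (vibrato_depth_semitones / 12
                           * math.sin(2 * math.pi * vibrato_rate_hz * t))
        # Accumulate phase so the time-varying pitch stays continuous.
        phase += 2 * math.pi * f0 / sample_rate
        s = 0.0
        for k in range(1, n_harmonics + 1):
            if k * f0 < sample_rate / 2:          # keep harmonics below Nyquist
                s += math.sin(k * phase) / k      # assumed 1/k amplitudes
        samples.append(env[i] * s)
    peak = max(abs(x) for x in samples) or 1.0
    return [x / peak for x in samples]            # normalize to [-1, 1]


note = synthesize_note(220.0, 0.5)
```

In a system like the one described above, the envelope and the pitch contour would come from the parameters analyzed in a real singer's recording rather than from fixed defaults as in this sketch.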