
Graduate student: Nien-Yu Chung (鍾念佑)
Thesis title: Recognition of Singing Voice and Instrument Sound Using Combinations of Acoustic Features (使用聲學特徵組合之歌唱聲與樂器聲識別)
Advisor: Hung-Yan Gu (古鴻炎)
Oral defense committee: Hsin-Min Wang (王新民), Chin-Shyurng Fahn (范欽雄), Kuo-Liang Chung (鍾國亮)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Number of pages: 74
Keywords: sound recognition, Gaussian mixture model (GMM), mel-frequency cepstral coefficients (MFCC), pitch-detection coefficients (PDC), Chroma extended features
    This thesis aims to classify an input sound clip as either singing sound (containing vocal singing) or instrument sound (containing no vocal singing). The research focuses on combining different kinds of acoustic features to find the most effective feature vector for this task. The features considered are mel-frequency cepstral coefficients (MFCC), pitch-detection coefficients (PDC), Chroma extended features, and the delta values of these coefficients. Recognition is based on Gaussian mixture models (GMMs): models with 8, 16, 32, and 64 mixture components are trained and then used in experiments on recognizing external sound clips.

    In the frame-level recognition experiments, 6 feature vectors, i.e. 6 combinations of acoustic features, were tried. MFCC plus PDC gives a significantly higher recognition rate than MFCC alone. When delta values and a voting mechanism are added, the best frame recognition rate reaches 71.3%.

    In the clip-level recognition experiments, 8 feature vectors were tried. For pure-instrument clips, the 40-dimensional feature vector performs best, with a recognition rate of 97.1%. For mixed-sound clips, the 17-dimensional feature vector (MFCC+PDC) performs best, with a recognition rate of 94.7%. In terms of average recognition rate, the 40-dimensional feature vector is the best, at 93.8%.

    Overall, the 40-dimensional feature vector that combines MFCC, PDC, Chroma extended features, and their delta values obtains the highest clip recognition rate.
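The GMM-based decision described in the abstract can be illustrated with a minimal sketch. The record does not give the thesis's actual implementation, so everything below is an assumption made for illustration: the diagonal-covariance GMMs, the toy two-dimensional "features", the sliding-window majority vote, and the clip rule that sums frame log-likelihoods. All function names and model parameters are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def classify_frames(frames, models):
    """Label each frame with the class whose GMM scores it highest."""
    return [
        max(models, key=lambda c: gmm_log_likelihood(x, models[c]))
        for x in frames
    ]

def vote_smooth(labels, half_width=2):
    """Replace each label by the majority label in a sliding window
    (an assumed voting scheme; ties go to the alphabetically first label)."""
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - half_width): i + half_width + 1]
        smoothed.append(max(sorted(set(window)), key=window.count))
    return smoothed

def classify_clip(frames, models):
    """Label a clip by the class with the highest summed frame log-likelihood."""
    totals = {
        c: sum(gmm_log_likelihood(x, g) for x in frames)
        for c, g in models.items()
    }
    return max(totals, key=totals.get)

# Hypothetical 2-component GMMs over toy 2-D features (not thesis parameters).
models = {
    "singing": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "instrument": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
frames = [[0.2, 0.1], [0.9, 1.2], [5.5, 5.4], [0.4, 0.3], [1.1, 0.8]]

frame_labels = classify_frames(frames, models)
print(frame_labels)
print(vote_smooth(frame_labels, half_width=1))
print(classify_clip(frames, models))
```

In this toy input the third frame lies close to the "instrument" model while its neighbours lie close to the "singing" model, so the vote overrides that isolated frame decision. In a real system the frame vectors would be the 12- to 40-dimensional feature combinations listed in the abstract, and the GMM parameters would come from training rather than being written by hand.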

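    Abstract (Chinese)
    Abstract
    Acknowledgments
    Table of Contents
    List of Figures and Tables
    Chapter 1 Introduction
        1.1 Research Motivation
        1.2 Literature Review
        1.3 Research Method
            1.3.1 Training Procedure of the Singing/Instrument Sound Recognition System
            1.3.2 Testing Procedure of the Singing/Instrument Sound Recognition System
        1.4 Thesis Organization
    Chapter 2 Corpus Preparation and Feature Extraction
        2.1 Corpus Overview
        2.2 Automatic Labeling
        2.3 Segmentation of Sound Clips
        2.4 Framing and Window Function
        2.5 Estimation of Mel-Frequency Cepstral Coefficients
        2.6 Pitch-Detection Feature Coefficients
        2.7 Chroma Feature Coefficients and Energy Standard-Deviation Coefficient
            2.7.1 Chroma Vector
            2.7.2 Chroma Extended Feature Coefficients
            2.7.3 Delta of the Chroma Extended Feature Coefficients
            2.7.4 Standard Deviation of Root-Mean-Square Energy
    Chapter 3 Gaussian Mixture Model (GMM)
        3.1 Introduction
        3.2 GMM Training for Singing, Instrument, and Mixed Sounds
        3.3 GMM Probability Computation
    Chapter 4 Frame-Level Sound Recognition Experiments
        4.1 Experiment with 12-Dimensional MFCC Features
        4.2 Experiment with MFCC plus 5-Dimensional Pitch-Detection Features
        4.3 Experiments with Added Delta Values
            4.3.1 Delta over 3 Frames
            4.3.2 Delta over 11 Frames
        4.4 Recognition Experiments with Chroma Extended Features and Delta Values
            4.4.1 Type-A Chroma Extended Feature Vector
            4.4.2 Type-B Chroma Extended Feature Vector
        4.5 General Discussion
    Chapter 5 Clip-Level Sound Recognition Experiments
        5.1 Experiment with 12-Dimensional MFCC Features
        5.2 Experiment with MFCC plus 5-Dimensional Pitch-Detection Features
        5.3 Experiments with Added Delta Features
            5.3.1 Delta over 3 Frames
            5.3.2 Delta over 11 Frames
        5.4 Recognition Experiments with Chroma Extended Features and Delta Values
            5.4.1 Type-A Chroma Extended Feature Vector
            5.4.2 Type-B Chroma Extended Feature Vector
        5.5 Recognition Experiments Combining Multiple Feature Vectors
            5.5.1 20-Dimensional Feature Vector
            5.5.2 40-Dimensional Feature Vector
        5.6 General Discussion
    Chapter 6 Conclusion
    References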

