
Graduate Student: 施孟寰 (Meng-Huan Shih)
Thesis Title: Improving Acoustic Feature and Song Matching Method for Song Retrieval with Pitch/Tempo Modification
Advisors: 古鴻炎 (Hung-Yan Gu), 陳冠宇 (Kuan-Yu Chen)
Oral Examination Committee: 古鴻炎 (Hung-Yan Gu), 陳冠宇 (Kuan-Yu Chen), 王新民 (Hsin-Min Wang), 林伯慎 (Bor-Shen Lin)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication: 2019
Graduation Academic Year: 107 (ROC calendar; 2018-2019)
Language: Chinese
Number of Pages: 130
Keywords: song retrieval, acoustic feature, dynamic time warping (DTW), two-step song matching, spectral peak, hashing
Views: 266; Downloads: 8


    In this thesis, we study problems related to retrieving database songs with pitch- or tempo-modified query songs. For acoustic features, we build on the three-spectral-peak features proposed in previous work, and we investigate key tone estimation and pitch normalization methods to handle pitch-shifted query songs. The problems of tempo-modified and short query songs are instead handled in the song matching steps. Song matching must consider both processing speed and accuracy, so we develop a new two-step matching method, consisting of a "basic song matching" step and an "advanced song matching" step. In the basic step, a hashing scheme quickly matches input feature vectors against those in the database, and a voting mechanism then quickly selects candidate songs. In the advanced step, the standard DTW (dynamic time warping) algorithm is modified so that two songs with a large difference in duration can still be matched, yielding the cumulative distance of the optimal matching path. In retrieval experiments on a database of 1,000 songs, we obtain accuracies of <0.97, 0.99, 0.97> when the query songs are pitch-shifted by <+2, 0, -2> semitones, and accuracies of <0.98, 0.99, 0.97> when their tempos are changed to <120%, 100%, 80%>, respectively.
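    The two-step pipeline described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the thesis's actual implementation: the hash index layout, the voting details, and the exact DTW modification are hypothetical, and the DTW variant shown is a common subsequence DTW that lets a short query start and end anywhere inside a longer reference song.

```python
import numpy as np

def pitch_normalize(peak_semitones, key_estimate):
    # Shift each spectral-peak pitch (in semitones) by the estimated key
    # offset, so pitch-shifted queries map to the same normalized features.
    return [p - key_estimate for p in peak_semitones]

def vote_candidates(query_hashes, hash_index, top_k=5):
    # Basic matching: look up each query feature hash in the database index;
    # every hit casts one vote for its song, and the songs with the most
    # votes become candidates for the advanced (DTW) matching step.
    votes = {}
    for h in query_hashes:
        for song_id in hash_index.get(h, ()):
            votes[song_id] = votes.get(song_id, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)[:top_k]

def dtw_distance(query, reference):
    # Advanced matching: subsequence DTW over 1-D feature sequences.
    # Row 0 is all zeros, so the query may start anywhere in the reference;
    # taking the minimum of the last row lets it end anywhere as well, which
    # is one common way to match sequences of very different lengths.
    n, m = len(query), len(reference)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - reference[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n].min()  # cumulative distance of the best matching path
```

    For example, a query whose feature sequence appears unchanged inside a longer reference yields a cumulative DTW distance of zero, while the voting step only ever touches songs that share at least one hash with the query.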

    Table of Contents

    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures and Tables
    Chapter 1 Introduction
      1.1 Research Motivation
      1.2 Research Methods
      1.3 Thesis Organization
    Chapter 2 Review of Song Retrieval Research
      2.1 Landmark-Feature-Based Retrieval Method
      2.2 Quad-Landmark-Feature-Based Retrieval Method
      2.3 Scale-Invariant Feature Transform Method
      2.4 Three-Spectral-Peak Method
      2.5 Skipping-Bigram Method
    Chapter 3 Experimental Data Preparation and Feature Analysis
      3.1 Introduction to the Song Datasets
      3.2 Problems with the Song Datasets
      3.3 Preparation of the Experimental Songs
      3.4 Signal Preprocessing
        3.4.1 Preprocessing Steps and Threshold Settings
        3.4.2 Silent-Frame Removal
      3.5 Acoustic Feature Types and Analysis
        3.5.1 Spectral-Peak Features and Pitch Normalization
        3.5.2 Added Bandwidth Feature
        3.5.3 CQT Features
        3.5.4 Chroma Features
    Chapter 4 Retrieval Processing Steps
      4.1 Hashing of Acoustic Features
      4.2 Original Song Matching Method
      4.3 Basic Song Matching Method
      4.4 Advanced Song Matching Method
    Chapter 5 Song Retrieval Experiments for Parameter Tuning
      5.1 Experimental Environment and Evaluation Method
      5.2 Frame Parameter Settings and Comparison
      5.3 Threshold for Silent-Frame Removal
      5.4 Song Retrieval Experiments with Different Spectra
      5.5 Key Tone Approximation Estimation Methods
      5.6 Experimental Comparison of the Added Bandwidth Feature
      5.7 Experiments Using Only Advanced Song Matching (Omitting Basic Matching)
      5.8 Precision Experiments for Query Songs
    Chapter 6 Comparison with Related Methods
      6.1 Query Songs of the Same Length as Database Songs
        6.1.1 Overview of the Neural-Network Method
        6.1.2 Problems of the Neural-Network Method
      6.2 Experiments with Query Songs of the Same Length as Database Songs
        6.2.1 Measuring the Accuracy of Basic Song Matching
        6.2.2 Experiments with 30-Second Query Songs
      6.3 Experiments with Query Songs Shorter than Database Songs
        6.3.1 Basic Song Matching Accuracy with Random Start within 0-5 Seconds
        6.3.2 Retrieval Experiments with Random Start within 0-5 Seconds
        6.3.3 Retrieval Experiments for the 5-to-25-Second Segment
        6.3.4 Retrieval Experiments for the 10-to-30-Second Segment
    Chapter 7 Conclusion
    References

