
Author: Chih-Jen Huang (黃志仁)
Thesis Title: Sample-Based Music Genre Classification Using Deep Learning
Advisor: Yi-Leh Wu (吳怡樂)
Committee Members: Jiann-Jone Chen (陳建中), Cheng-Yuan Tang (唐政元), Li-Gang Yan (閻立剛)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2019
Academic Year: 107
Language: English
Pages: 35
Keywords: Music Genre, Convolutional Neural Network, Recurrent Neural Network, Music Sample

    Musical genre is especially important in the field of music information retrieval. In recent years, many studies have applied various feature-extraction and machine-learning techniques to classify music genres with considerable success. This work aims to skip the cumbersome feature-extraction process and instead use the raw audio data directly. We therefore propose the first method that feeds raw music clips directly into a recurrent neural network model without any feature extraction, and we compare it against a convolutional neural network model. We use GTZAN as our dataset, which contains 10 genres, each represented by 100 tracks. The results show that, without feature extraction, the convolutional neural network reaches 96% accuracy on the Classical genre but only 5% on the Blues genre, for an average accuracy of 47%. The recurrent neural network reaches 87% accuracy on the Classical genre but only 11% on the Metal genre, for an average accuracy of 29%.
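    The sample-based approach described above, cutting each raw track into fixed-length clips that are fed directly to a CNN or RNN, can be sketched as follows. This is a minimal illustration only: the function name, the 1-second window, and the 22050 Hz rate are assumptions for the sketch, not the thesis's actual code (the thesis experiments vary both the sample size and the sample rate).

    ```python
    import numpy as np

    def slice_into_samples(waveform: np.ndarray, sample_rate: int = 22050,
                           clip_seconds: float = 1.0) -> np.ndarray:
        """Cut a raw mono waveform into non-overlapping fixed-length clips.

        Each clip becomes one training example for the network, so no
        hand-crafted feature extraction (e.g. MFCCs) is needed. The window
        length and sample rate here are illustrative defaults.
        """
        clip_len = int(sample_rate * clip_seconds)
        n_clips = len(waveform) // clip_len          # drop the trailing remainder
        return waveform[: n_clips * clip_len].reshape(n_clips, clip_len)

    # A 30-second GTZAN-style track at 22050 Hz yields 30 one-second clips.
    track = np.random.randn(30 * 22050).astype(np.float32)
    samples = slice_into_samples(track)
    print(samples.shape)  # (30, 22050)
    ```

    Each row of the resulting array is one raw-audio training example; a 1D convolutional or recurrent model would then consume these rows directly, with no spectrogram or MFCC step in between.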

    Contents:
    Abstract (Chinese), p. III
    Abstract, p. IV
    Contents, p. VI
    List of Figures, p. VII
    List of Tables, p. VIII
    Chapter 1. Introduction, p. 1
    Chapter 2. Deep Learning Model, p. 4
      2.1 Convolutional Neural Network, p. 4
      2.2 Recurrent Neural Network, p. 6
    Chapter 3. Proposed Methods, p. 8
      3.1 Model Structure, p. 8
      3.2 Music Sample, p. 11
      3.3 Music Waveform, p. 11
    Chapter 4. Experiment Results, p. 13
      4.1 Dataset and Environment, p. 13
      4.2 CNN vs. RNN, p. 13
      4.3 Sample Sizes, p. 17
      4.4 Sample Rates, p. 18
    Chapter 5. Conclusions and Future Work, p. 20
    References, p. 21
    Appendix, p. 23

    [1] G. Tzanetakis and P. Cook, “Musical genre classification of audio signals”, IEEE Transactions on Speech and Audio Processing, pp. 293–302, 2002.
    [2] A. Meng, P. Ahrendt, J. Larsen, and L. K. Hansen, “Temporal feature integration for music genre classification,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 15, no. 5, pp. 1654–1664, 2007.
    [3] Y. Song and C. Zhang, “Content-based information fusion for semi-supervised music genre classification”, Multimedia, IEEE Transactions on, vol. 10, no. 1, pp. 145–152, 2008.
    [4] C. Tripp, H. Hung and M. Pontikakis, “Waveform-Based Musical Genre Classification”,
    https://www.academia.edu/4631247/Waveform-Based_Musical_Genre_Classification, referenced on May 23, 2019.
    [5] A. Holzapfel, Y. Stylianou, “Musical genre classification using nonnegative matrix factorization-based features”, IEEE Trans. Audio Speech Lang. Process., vol. 16, no. 2, pp. 424-434, Feb. 2008.
    [6] H. Lukashevich, J. Abeßer, C. Dittmar, H. Grossmann, “From multi-labeling to multi-domain-labeling: A novel two-dimensional approach to music genre classification”, Proc. 10th Int. Conf. Music Inf. Retrieval, pp. 459–464, 2009.
    [7] Y. Panagakis, C. L. Kotropoulos, and G. R. Arce, “Music genre classification via joint sparse low-rank representation of audio features”, IEEE/ACM Trans. Audio, Speech and Lang. Proc., vol. 22, no. 12, pp. 1905–1917, Dec. 2014.
    [8] T. Li, A. Chan, and A. Chun, “Automatic musical pattern feature extraction using convolutional neural network”, in Proc. of the Int. MultiConf. of Engineers and Computer Scientists (IMECS), Hong Kong, Mar. 2010.
    [9] J. Dai, S. Liang, W. Xue, C. Ni, W. Liu, “Long Short-term Memory Recurrent Neural Network based Segment Features for Music Genre Classification”, in 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2016.
    [10] J. Irvin, E. Chartock, and N. Hollander, “Recurrent Neural Networks with Attention for Genre Classification”, Stanford University, 2016.
    [11] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition”, Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov 1998.
    [12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks”, in Proc. Advances in Neural Information Processing Systems 25, pp. 1090–1098, 2012.
    [13] Wikipedia, “Typical convolutional neural network architecture”,
    https://en.wikipedia.org/wiki/Convolutional_neural_network, referenced on May 24, 2019.
    [14] N. Ackermann, “Introduction to 1D Convolutional Neural Networks in Keras for Time Sequence”,
    https://blog.goodaudience.com/introduction-to-1d-convolutional-neural-networks-in-keras-for-time-sequences-3a7ff801a2cf, referenced on May 20, 2019.
    [15] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory”, Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
    [16] “Music formats uncovered”,
    https://www.shedworx.com/music-formats-uncovered, referenced on May 24, 2019.
    [17] D. Perrot, R. Gjerdingen, “Scanning the dial: An exploration of factors in identification of musical style”, Proc. Soc. Music Perception Cognition, p. 88, 1999.
    [18] K. Martin, E. Scheirer, B. Vercoe, “Music content analysis through models of audition”, ACM Multimedia Workshop on Content Processing of Music for Multimedia Applications, 1998.
    [19] GTZAN dataset, http://marsyas.info/downloads/datasets.html, referenced on May 16, 2019.
    [20] C. McKay, I. Fujinaga, “Automatic Genre Classification Using Large High-Level Musical Feature Sets”, Conf. on Music Information Retrieval (ISMIR), 2004.
    [21] T. Bertin-Mahieux, D. Ellis, B. Whitman and P. Lamere, “The million song dataset”, Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR), 2011.
    [22] Udeme Udofia, “Basic Overview of Convolutional Neural Network (CNN)”,
    https://medium.com/@udemeudofia01/basic-overview-of-convolutional-neural-network-cnn-4fcc7dbb4f17, referenced on May 31, 2019.
    [23] Wikipedia, “Long Short-Term Memory.svg”,
    https://en.wikipedia.org/wiki/File:Long_Short-Term_Memory.svg, referenced on May 31, 2019.
    [24] C. P. Tang, K. L. Chui, Y. K. Yu, Z. Zeng, K. H. Wong, “Music genre classification using a hierarchical long short term memory (LSTM) model”, International Workshop on Pattern Recognition (IWPR), 2018.
