
Graduate student: 葉政隆 (Cheng-Lung Yeh)
Thesis title: 基於音樂樣本使用卷積神經網路來進行音樂類型分類之研究 (Using Convolutional Neural Network for Classifying Music Genre Based on Samples)
Advisor: 吳怡樂 (Yi-Leh Wu)
Committee members: 何瑁鎧 (Maw-Kae Hor), 閻立剛 (Li-Kang Yen), 陳建中 (Jiann-Jone Chen), 唐政元 (Cheng-Yuan Tang), 吳怡樂 (Yi-Leh Wu)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2017
Graduation academic year: 105
Language: English
Number of pages: 48
Keywords (Chinese): 音樂類型, 深度卷積神經網路, 取樣樣本
Keywords (English): Music Genre, Convolutional Neural Network, Sample
Access counts: 309 views, 0 downloads

Abstract (Chinese, translated):
Recent research has shown that convolutional neural networks (CNNs) are highly effective for image classification, because a CNN is trained on the structural properties of its input. Some studies [1] have also tried applying CNNs to non-image data, to find out how far CNNs can go outside the image domain. This study uses a CNN for music genre classification: segments of sampled music are extracted and fed into the CNN for training, to observe what the network can learn from this raw, unprocessed information that looks disorderly to the human eye. The final experimental results show that the CNN can indeed distinguish music genres from such seemingly unstructured images.


Abstract (English):
Recent studies have shown that convolutional neural networks (CNNs) are highly effective for image classification, because CNNs exploit the structural properties of their input. A few studies [1] have applied CNNs to non-image data to explore how far the approach can be pushed outside the image domain. This study uses CNNs for music genre classification: we feed raw music samples into the network to see what it can learn from such low-level, unprocessed features. The experimental results suggest that CNNs can indeed classify images that appear unstructured to the human eye.
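As a rough illustration of the approach described above, the following sketch feeds fixed-length windows of raw PCM samples into a small one-dimensional CNN that scores the ten GTZAN genres [15]. It is not the thesis's actual model (which was built with Caffe [24]); the window length, layer shapes, and the use of PyTorch are illustrative assumptions.

```python
# Minimal sketch, assuming fixed-length windows of raw samples as CNN input.
# Hyperparameters and layer shapes are illustrative, not taken from the thesis.
import torch
import torch.nn as nn

NUM_GENRES = 10   # GTZAN defines 10 genres [15]
WINDOW = 16384    # assumed number of raw PCM samples per training example


class SampleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=4), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=32, stride=2), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=16, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),         # collapse the time axis to one value per channel
        )
        self.classifier = nn.Linear(64, NUM_GENRES)

    def forward(self, x):                    # x: (batch, 1, WINDOW), samples scaled to [-1, 1]
        h = self.features(x).squeeze(-1)     # -> (batch, 64)
        return self.classifier(h)            # unnormalized genre scores


if __name__ == "__main__":
    model = SampleCNN()
    windows = torch.randn(8, 1, WINDOW)          # stand-in for a batch of sample windows
    labels = torch.randint(0, NUM_GENRES, (8,))  # stand-in genre labels
    loss = nn.CrossEntropyLoss()(model(windows), labels)
    loss.backward()                              # one illustrative training step
```

In practice each 30-second GTZAN clip would be cut into many such windows; the outline below lists a "Voting System" (Section 4.5), presumably aggregating the per-window predictions into a clip-level decision.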

Contents:
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Deep Learning Model
Chapter 3. Proposed Methods
  3.1 Music Sample
  3.2 Music Waveform
  3.3 Music Spectrum
Chapter 4. Experiment Result
  4.1 Dataset and Environment
  4.2 Sample Experiments
  4.3 Waveform Experiments
  4.4 Spectrum Experiments
  4.5 Voting System
Chapter 5. Conclusions and Future Work
References

[1] H. Orii, S. Tsuji, T. Kouda, "Tactile texture recognition using convolutional neural networks for time-series data of pressure and 6-axis acceleration sensor", IEEE International Conference on Industrial Technology (ICIT), 2017.
[2] J. Salamon, B. Rocha, E. Gómez, "Musical genre classification using melody features extracted from polyphonic music signals", IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2012.
[3] C. H. Lee, C. H. Chou, C. C. Lien, J. C. Fang, “Music genre classification using modulation spectral features and multiple prototype vectors representation”, IEEE 4th International Congress on Image and Signal Processing (CISP), 2011.
[4] D. Pradeep Kumar, B. J. Sowmya, K. G. Srinivasa, “A comparative study of classifiers for music genre classification based on feature extractors”, IEEE Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), 2016.
[5] K. Hazim, S. Tomas, "Multimodal Genre Classification of TV programs and YouTube Videos", Multimedia Tools and Applications, vol. 63, no. 2, pp. 547-567, 2013.
[6] P. Ahrendt, A. Meng, J. Larsen, “Decision time horizon for music genre classification using short time features”, 12th European Signal Processing Conference, 2004.
[7] J. M. de Sousa, E. T. Pereira, L. R. Veloso, “A robust music genre classification approach for global and regional music datasets evaluation”, IEEE International Conference on Digital Signal Processing (DSP), 2016.
[8] A. B. Chan, A. H. Chun, “Automatic Musical Pattern Feature Extraction Using Convolutional Neural Network”, Proceedings of the International MultiConference of Engineers and Computer Scientists (IMECS), 2010.
[9] G. Tzanetakis, P. Cook, “Musical genre classification of audio signals”, IEEE Transactions on speech and audio processing, pp. 293–302, 2002.
[10] M. Kobayakawa, M. Hoshi, “Musical genre classification of MPEG-4 TwinVQ audio data”, IEEE International Conference on Multimedia and Expo (ICME), 2011.
[11] K. C. Hsu, C. S. Lin, T. S. Chi, “Sparse Coding Based Music Genre Classification Using Spectro-Temporal Modulations”, Proceedings of the 17th ISMIR Conference, 2016.
[12] T. Nakashika, C. Garcia, T. Takiguchi, “Local-feature-map Integration Using Convolutional Neural Networks for Music Genre Classification”, Interspeech ISCA's 13th Annual Conference, 2012.
[13] W. Zhang, W. Lei, X. Xu, X. Xing, "Improved Music Genre Classification with Convolutional Neural Networks", Interspeech, San Francisco, 2016.
[14] B. Hua, F. L. Ma, L. C. Jiao, “Research on Computation of GLCM of Image Texture”, Chinese Journal of Electronics, 2006.
[15] GTZAN dataset, http://marsyasweb.appspot.com/download/data_sets/, Referenced on May 18th, 2017.
[16] J. Dai, W. Liu, H. Zheng, W. Xue, C. Ni, “Semi-supervised Learning of Bottleneck Feature for Music Genre Classification”, Chinese Conference on Pattern Recognition (CCPR), pp. 552-562, 2016.
[17] Support vector machine, https://en.wikipedia.org/wiki/Support_vector_machine, Referenced on May 20th, 2017.
[18] GPU development in recent years, http://bkultrasound.com/blog/the-next-generation-of-ultrasound-technology, Referenced on May 20th, 2017.
[19] Typical convolutional neural network architecture, https://en.wikipedia.org/wiki/Convolutional_neural_network, Referenced on May 20th, 2017.
[20] A. Krizhevsky, I. Sutskever, G. E. Hinton, "ImageNet classification with deep convolutional neural networks", Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
[21] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, A. C. Berg, "ImageNet large scale visual recognition challenge", International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015.
[22] WAVE PCM soundfile format, http://soundfile.sapp.org/doc/WaveFormat/, Referenced on May 20th, 2017.
[23] Fast Fourier Transform (FFT), https://read01.com/7DA3N4.html, Referenced on May 22nd, 2017.
[24] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. B. Girshick, S. Guadarrama, T. Darrell, “Caffe: Convolutional architecture for fast feature embedding”, In Proceedings of the ACM International Conference on Multimedia, pp. 675-678, 2014.

Full text release date: 2022/07/18 (campus network)
Full text not authorized for public access (off-campus network)
Full text not authorized for public access (National Central Library: Taiwan theses and dissertations system)