
Graduate Student: Chih-Hsuan Lin (林摯烜)
Thesis Title: A Song Popularity Prediction Model Based on Sound Features Combined with Deep Learning — Taking Taiwan Music Market as an Example
(一個結合聲音特徵與深度學習之歌曲流行預測模型 — 以臺灣樂壇為例)
Advisor: Chin-Shyurng Fahn (范欽雄)
Oral Examination Committee: Jung-Hua Wang (王榮華), Wei-Min Jeng (鄭為民), Kuan-Yu Chen (陳冠宇)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2020
Graduation Academic Year: 108 (2019-2020)
Language: English
Number of Pages: 66
Keywords (Chinese): 臺灣樂壇、流行音樂、卷積神經網路、深度學習、隨機森林、長短期記憶體
Keywords (English): Taiwan Music Market, Popular Music, Convolutional Neural Network, Deep Learning, Random Forest, Long Short-Term Memory
Abstract:
The music industry plays an important role in Taiwan's economy. In 2018 alone, the total revenue of Taiwan's popular music industry was approximately NTD 19.984 billion, and Taiwan's record market ranked fifth in Asia and twenty-fourth in the world. Even so, the popular music business is inherently a high-risk investment: producing a song involves creation, recording, and marketing, each of which carries high fixed costs. Moreover, with the spread of the Internet in recent years, online platforms have become the main channel for releasing and selling popular music, and competing against a flood of information and other entertainment products often requires even greater spending on marketing. The risk of losing a great deal of money therefore deters many not-yet-famous independent musicians from releasing their own work.
To address this problem, this thesis proposes a method for predicting whether a song will become popular. First, we compute a set of traditional music features and retrieve additional Spotify-specific features from Spotify. We also use a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to extract features directly from the audio. Finally, the traditional features, the Spotify features, and the CNN+LSTM features are fed into a Random Forest to classify each song's level of popularity.
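To make the pipeline concrete, the following is a minimal sketch of the feature-fusion idea, assuming log-mel spectrogram input, a small Keras CNN+LSTM embedding network, and a scikit-learn Random Forest. The layer sizes, feature dimensions, and placeholder arrays are illustrative assumptions, not the exact architecture or features used in the thesis, and the CNN+LSTM would of course be trained before its features are useful; that step is omitted here for brevity.

```python
# A minimal sketch of the feature-fusion pipeline; shapes and layer sizes are
# illustrative assumptions, not the thesis's exact configuration.
import numpy as np
from tensorflow.keras import layers, models
from sklearn.ensemble import RandomForestClassifier

N_FRAMES, N_MELS, EMB_DIM = 640, 128, 64   # assumed spectrogram and embedding sizes


def build_cnn_lstm_extractor():
    """CNN layers summarize local spectro-temporal patterns; an LSTM then
    condenses the resulting frame sequence into a single embedding vector."""
    spec = layers.Input(shape=(N_FRAMES, N_MELS, 1))
    x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(spec)
    x = layers.MaxPooling2D((2, 4))(x)
    x = layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)
    x = layers.MaxPooling2D((2, 4))(x)
    # Collapse the frequency axis so the LSTM sees one vector per time step.
    x = layers.Reshape((N_FRAMES // 4, (N_MELS // 16) * 64))(x)
    x = layers.LSTM(EMB_DIM)(x)
    emb = layers.Dense(EMB_DIM, activation="relu")(x)
    return models.Model(spec, emb)


# In the thesis the CNN+LSTM is trained before its features are used;
# the untrained network here only illustrates the data flow.
extractor = build_cnn_lstm_extractor()

# Placeholder arrays standing in for real data:
#   spectrograms  -- (n_songs, N_FRAMES, N_MELS, 1) log-mel spectrograms
#   hand_feats    -- traditional features computed from the audio
#   spotify_feats -- per-track features retrieved from Spotify
#   labels        -- popularity class of each song (0..3)
n_songs = 256
spectrograms = np.random.rand(n_songs, N_FRAMES, N_MELS, 1).astype("float32")
hand_feats = np.random.rand(n_songs, 20)
spotify_feats = np.random.rand(n_songs, 10)
labels = np.random.randint(0, 4, size=n_songs)

# Fuse the learned embeddings with the hand-crafted and Spotify features,
# then let a Random Forest perform the final popularity classification.
embeddings = extractor.predict(spectrograms, verbose=0)
fused = np.concatenate([hand_feats, spotify_feats, embeddings], axis=1)
clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(fused, labels)
```

Concatenating the learned embedding with the hand-crafted and Spotify features lets the Random Forest weigh both sources of evidence when assigning a popularity class.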
For the experiments, we use a public dataset from Kaggle that contains the daily top-200 songs of each region in 2017 together with their daily stream counts. We use only the data for Taiwan and partition the songs into four classes by stream rank: the top 10, ranks 11-50, the bottom 50, and the remaining middle ranks. Our model predicts from both the audio and the traditional features; performance is measured by the accuracy rate, Cohen's kappa coefficient, and the recall rate of the top-10 class. On this dataset, our model achieves an accuracy of 70.8%, a Cohen's kappa coefficient of 0.48, and a top-10 recall of 73.3%, and predicting one song takes 0.005 seconds on average.
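The class construction and scoring described above can be sketched with pandas and scikit-learn. In the sketch below, the column names (`Region`, `Position`), the region code `"tw"`, the local file name `data.csv`, and the exact rank-to-class thresholds are assumptions based on the Kaggle dataset and this abstract, not code from the thesis.

```python
# A sketch of the rank-to-class binning and the three evaluation metrics;
# the CSV layout, region code, and binning thresholds are assumptions.
import pandas as pd
from sklearn.metrics import accuracy_score, cohen_kappa_score, recall_score

TOP10, RANK_11_50, MIDDLE, LAST50 = 0, 1, 2, 3


def rank_to_class(position: int) -> int:
    """Map a daily chart position (1-200) to one of the four popularity classes."""
    if position <= 10:
        return TOP10
    if position <= 50:
        return RANK_11_50
    if position > 150:          # bottom 50 of the daily top 200
        return LAST50
    return MIDDLE


# Keep only the Taiwan region, as in the thesis.
charts = pd.read_csv("data.csv")        # hypothetical local copy of the Kaggle file
tw = charts[charts["Region"] == "tw"].copy()
tw["label"] = tw["Position"].apply(rank_to_class)

# y_pred would come from the trained model; ground truth is reused here
# only to keep the metric calls concrete and self-contained.
y_true = tw["label"].to_numpy()
y_pred = y_true.copy()

accuracy = accuracy_score(y_true, y_pred)
kappa = cohen_kappa_score(y_true, y_pred)
# Recall computed for the top-10 class only, matching the third metric above.
top10_recall = recall_score(y_true, y_pred, labels=[TOP10], average=None)[0]
print(f"accuracy={accuracy:.3f}  kappa={kappa:.3f}  top-10 recall={top10_recall:.3f}")
```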

Table of Contents:
Chinese Abstract i
Abstract ii
Acknowledgements iii
Contents iv
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
  1.1 Overview 1
  1.2 Motivation 2
  1.3 System Description 4
  1.4 Thesis Organization 6
Chapter 2 Related Work 7
  2.1 Early Music Popularity Prediction 7
  2.2 Music Popularity Prediction with Artists 9
  2.3 Music Popularity Prediction without Artists 11
Chapter 3 Music Preprocessing 19
  3.1 Data Acquisition 19
  3.2 Features Extraction 23
Chapter 4 Music Popularity Prediction Method 29
  4.1 Artificial Neural Network 29
    4.1.1 Fully connected layer 29
    4.1.2 Dropout layer 30
    4.1.3 Activation function 31
  4.2 Random Forests 34
  4.3 Long Short-Term Memory 38
  4.4 Convolutional Neural Network 41
Chapter 5 Experimental Results 44
  5.1 Experimental Setup 44
  5.2 Music Popularity Datasets 45
  5.3 The Results of Music Popularity Prediction 46
Chapter 6 Conclusions and Future Works 50
  6.1 Conclusions 50
  6.2 Future Work 51
References 52


Full text available from 2025/07/27 (campus network)
Full text available from 2030/07/27 (off-campus network)
Full text available from 2030/07/27 (National Central Library: Taiwan NDLTD system)