Author: 黃勤翔
Chin-Hsiang Huang
Thesis Title: 使用深度學習技術取得流行歌樂譜
Using Deep Learning Techniques to Get Pop Sheet Music
Advisor: 洪西進
Shi-Jinn Horng
Committee: 楊竹星
Zhu-Xing Yang
Jung-Gil Lee
Ren-wei Xie
Zhu-Xing Lin
Degree: Master's
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Thesis Publication Year: 2022
Graduation Academic Year: 110
Language: Chinese
Pages: 32
Keywords (in Chinese): 深度學習, 人工智慧, 歌唱轉譜 (deep learning, artificial intelligence, singing transcription)
Keywords (in other languages): Efficient Net, Exponential Moving Average
This thesis presents an automatic singing transcription (AST) method based on deep learning techniques that outperforms the approach based on the hidden Markov model, and it also points out the criteria for singing transcription. The proposed method helps people who want to play their favorite music but have no sheet music for it. Any song or piece of music obtained from YouTube or elsewhere can be transcribed into sheet music with the proposed method and then played. The proposed model outputs not only a JSON file containing both the note data and the prediction confidence, but also a MIDI file that can be listened to as soon as it is produced. Users can use these two files to correct individual notes, which makes the proposed model more practical.
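As an illustration of how the JSON output could support manual correction, the sketch below flags low-confidence notes for review. The abstract does not specify the file layout, so the field names (`onset`, `offset`, `pitch`, `confidence`), the example values, and the 0.5 threshold are all assumptions made for this example, not the thesis's actual format.

```python
import json

# Hypothetical note list: the thesis does not document its JSON schema,
# so these field names and values are illustrative assumptions only.
EXAMPLE_JSON = """
[
  {"onset": 0.50, "offset": 0.98, "pitch": 60, "confidence": 0.97},
  {"onset": 1.02, "offset": 1.40, "pitch": 62, "confidence": 0.41},
  {"onset": 1.45, "offset": 2.10, "pitch": 64, "confidence": 0.88}
]
"""

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_name(midi_pitch):
    """Convert a MIDI note number to a name, using the convention 60 = C4."""
    octave = midi_pitch // 12 - 1
    return f"{NOTE_NAMES[midi_pitch % 12]}{octave}"


def notes_to_review(notes, threshold=0.5):
    """Return the notes whose confidence is below the assumed threshold."""
    return [note for note in notes if note["confidence"] < threshold]


notes = json.loads(EXAMPLE_JSON)
flagged = notes_to_review(notes)
for note in flagged:
    print(
        f"check {pitch_name(note['pitch'])} at {note['onset']:.2f}s "
        f"(confidence {note['confidence']:.2f})"
    )
```

With the example data, only the second note (MIDI pitch 62, D4) falls below the threshold, so it is the one a user would listen for in the MIDI file and correct by hand.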
EfficientNetV2 is the backbone of the proposed model, and an attention module is added so that the model focuses more on nearby predicted note data. An exponential moving average (EMA) is applied to the input data so that the model does not draw on data from too far in the past. Both the attention module and the EMA improve the correctness of the predicted notes. With the proposed model and a larger, recently released dataset, our model achieves higher scores on various metrics than previous deep learning models and the traditional hidden Markov model in automatic singing transcription research.
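To make the role of the EMA concrete, the following is a minimal sketch of the standard exponential moving average recursion, s[0] = x[0] and s[t] = alpha * x[t] + (1 - alpha) * s[t-1]. The abstract does not state how the EMA is wired into the model, so the function name, the smoothing factor `alpha`, and the toy input are assumptions for illustration only.

```python
def exponential_moving_average(values, alpha=0.5):
    """Standard EMA: s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t-1].

    alpha is an assumed smoothing factor in (0, 1]; larger values forget
    older inputs faster.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        if not smoothed:
            # The first output is initialised with the first input.
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# A single impulse followed by zeros shows how quickly old data fades:
# its contribution shrinks by a factor of (1 - alpha) at every step.
impulse_response = exponential_moving_average([1.0, 0.0, 0.0, 0.0], alpha=0.5)
print(impulse_response)
```

Because each older sample is scaled by another factor of (1 - alpha), inputs from far in the past contribute almost nothing, which matches the abstract's description of the model not relying on data from too long ago.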

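Table of Contents
Abstract (Chinese) 1
ABSTRACT 2
Acknowledgements 3
Table of Contents 4
List of Figures 6
List of Tables 6
Chapter 1 Introduction 7
1.1 Research Background and Motivation 7
1.2 Related Work 8
Chapter 2 Environment Configuration and Hardware 9
2.1 Environment Configuration 9
2.2 Hardware 9
Chapter 3 Introduction to Related Fields 10
3.1 Introduction to Deep Learning 10
3.1.1 Deep Neural Networks 10
3.1.2 Convolutional Neural Networks 12
3.2 Introduction to Attention 14
3.3 Introduction to MIDI 14
3.4 Introduction to Sound Separation 15
3.5 Introduction to Efficient 16
3.6 Introduction to the Constant-Q Transform 17
Chapter 4 Model Architecture 19
4.1 Efficient Net V2, a Leader among Convolutional Neural Networks 19
4.2 Adding an Attention Module to Efficient Net V2 21
4.3 Adding an EMA Module to Efficient Net V2 21
Chapter 5 Research Design 23
5.1 Data Preprocessing 23
5.2 Model Selection 24
5.3 Experimental Details 25
5.4 Comparison of Results 27
5.5 Ablation Study 28
Chapter 6 Conclusion and Future Work 29
References 30

List of Figures
Figure 1 Forward propagation and reconstruction in a restricted Boltzmann machine 11
Figure 2 Deep belief network built by stacking restricted Boltzmann machines 11
Figure 3 The concept of convolution 13
Figure 4 The concept of max pooling 13
Figure 5 Differences between the CQT and the short-time Fourier transform 18
Figure 6 Comparison of EfficientV2 with other similar models 20
Figure 7 Comparison of MB Convolution and Fused MB Convolution 20
Figure 8 Illustration of sampling points 24
Figure 9 Experiment flowchart 26

List of Tables
Table 1 Software environment 9
Table 2 Hardware 9
Table 3 Comparison of results 27
Table 4 Ablation study 28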

