
Graduate Student: 王彥翔 (Yan-Xiang Wang)
Thesis Title: 符號多軌可重複樂器音樂之高效生成 (Efficient Generation of Symbolic Multi-Track Repeatable-Instrument Music)
Advisor: 陳怡伶 (Yi-Ling Chen)
Committee Members: 張智傑 (Chih-Chieh Chang), 戴碧如 (Bi-Ru Dai)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Graduation Academic Year: 112
Language: English
Number of Pages: 80
Chinese Keywords: 符號音樂生成, 多軌可重複樂器音樂, 音樂表示法, 位元組對編碼
English Keywords: symbolic music generation, multi-track repeatable-instrument music, music representation, byte-pair encoding

Table of Contents

Abstract in Chinese
Abstract in English
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Works
3 Methodology
  3.1 Preprocessing of MIDI Data
  3.2 Continuative Duration Encoding (CDE)
  3.3 TNTM Representation of MTRI Music
  3.4 Using Multi-Note BPE
  3.5 Learnable Positional Embedding with MPS Order
  3.6 Embedding Pooling and Multi-Head Prediction
  3.7 Loss Function
  3.8 Generating Music
4 Track-Equipped Note and Track-Instrument Mapping Representation
  4.1 Token Families
  4.2 Token Fields
  4.3 Practical Choices of Parameters and Vocabulary
5 Multi-Note Byte-Pair Encoding
  5.1 Definition of Multi-Note
  5.2 Merge of Multi-Notes
  5.3 Multi-Note Adjacency
    5.3.1 Offset Time
    5.3.2 Overlapping
    5.3.3 Immediately Following
    5.3.4 Adjacency
  5.4 Adjacency Preservation under Merge Operation
  5.5 Multi-Note BPE Algorithm
    5.5.1 Implementation
    5.5.2 Time Complexity Analysis
    5.5.3 Applying Prelearned Contours
6 Experiment and Result
  6.1 Dataset and Effectiveness of TNTM
  6.2 Effectiveness of Multi-Note BPE
  6.3 Model Configurations and Training Details
  6.4 Experiment Settings
  6.5 Generation Efficiency
  6.6 Objective Quality
    6.6.1 Pitch Class Entropy
    6.6.2 Instrumentation Self-Similarity
    6.6.3 Grooving Self-Similarity
    6.6.4 Histogram Intersection
    6.6.5 Average Rank
    6.6.6 Objective Quality Evaluation Result
  6.7 Subjective Quality
    6.7.1 Design of Listening Test
    6.7.2 Results of Listening Test
    6.7.3 Evaluation of MMGT with Linear Transformer for Comparison with SymphonyNet
  6.8 Ablation Study
7 Discussion and Conclusion
  7.1 Conclusion
  7.2 Future Work
References
Appendix A: Notation Table

Full-Text Release Date: 2034/07/10 (campus network)
Full-Text Release Date: Not authorized for public release (off-campus network)
Full-Text Release Date: Not authorized for public release (National Central Library: Taiwan NDLTD system)