
Graduate Student: 王彥翔 (Yan-Xiang Wang)
Thesis Title: 符號多軌可重複樂器音樂之高效生成 (Efficient Generation of Symbolic Multi-Track Repeatable-Instrument Music)
Advisor: 陳怡伶 (Yi-Ling Chen)
Committee Members: 張智傑 (Chih-Chieh Chang), 戴碧如 (Bi-Ru Dai)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Graduation Academic Year: 112
Language: English
Number of Pages: 80
Chinese Keywords: 符號音樂生成, 多軌可重複樂器音樂, 音樂表示法, 位元組對編碼
English Keywords: symbolic music generation, multi-track repeatable-instrument music, music representation, byte-pair encoding

Table of Contents

Abstract in Chinese
Abstract in English
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Works
3 Methodology
  3.1 Preprocessing of MIDI Data
  3.2 Continuative Duration Encoding (CDE)
  3.3 TNTM Representation of MTRI Music
  3.4 Using Multi-Note BPE
  3.5 Learnable Positional Embedding with MPS Order
  3.6 Embedding Pooling and Multi-Head Prediction
  3.7 Loss Function
  3.8 Generating Music
4 Track-Equipped Note and Track-Instrument Mapping Representation
  4.1 Token Families
  4.2 Token Fields
  4.3 Practical Choices of Parameters and Vocabulary
5 Multi-Note Byte-Pair Encoding
  5.1 Definition of Multi-Note
  5.2 Merge of Multi-Notes
  5.3 Multi-Note Adjacency
    5.3.1 Offset Time
    5.3.2 Overlapping
    5.3.3 Immediately Following
    5.3.4 Adjacency
  5.4 Adjacency Preservation under Merge Operation
  5.5 Multi-Note BPE Algorithm
    5.5.1 Implementation
    5.5.2 Time Complexity Analysis
    5.5.3 Applying Prelearned Contours
6 Experiment and Result
  6.1 Dataset and Effectiveness of TNTM
  6.2 Effectiveness of Multi-Note BPE
  6.3 Model Configurations and Training Details
  6.4 Experiment Settings
  6.5 Generation Efficiency
  6.6 Objective Quality
    6.6.1 Pitch Class Entropy
    6.6.2 Instrumentation Self-Similarity
    6.6.3 Grooving Self-Similarity
    6.6.4 Histogram Intersection
    6.6.5 Average Rank
    6.6.6 Objective Quality Evaluation Result
  6.7 Subjective Quality
    6.7.1 Design of Listening Test
    6.7.2 Results of Listening Test
    6.7.3 Evaluation of MMGT with Linear Transformer for Comparison with SymphonyNet
  6.8 Ablation Study
7 Discussion and Conclusion
  7.1 Conclusion
  7.2 Future Work
References
Appendix A: Notation Table

Full-Text Release Date: 2034/07/10 (campus network)
Full-Text Release Date: Not authorized for public release (off-campus network)
Full-Text Release Date: Not authorized for public release (National Central Library: Taiwan NDLTD system)