
Graduate Student: 蘇冠武 (Kuan-Wu Su)
Thesis Title: 條件式具主觀成分異質內容之生成方法 (Conditional Content Generation Based on Subjective Perception Context from Heterogeneous Content Types)
Advisor: 呂政修 (Jenq-Shiou Leu)
Committee Members: 呂政修 (Jenq-Shiou Leu), 周承復 (Cheng-Fu Chou), 魏宏宇 (Hung-Yu Wei), 陳俊良 (Jiann-Liang Chen), 陳郁堂 (Yie-Tarng Chen), 衛信文 (Hsin-Wen Wei), 方文賢 (Wen-Hsien Fang), 鄭瑞光 (Ray-Guang Cheng), 阮聖彰 (Shanq-Jang Ruan)
Degree: Doctor
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 111 (ROC calendar, 2022-2023)
Language: English
Number of Pages: 123
Chinese Keywords: 生成模型 (generative models), 條件式生成 (conditional generation), 深度學習 (deep learning), 主觀認知差異 (subjective cognitive differences), 音樂生成 (music generation), 圍棋 (the game of Go)
English Keywords: Generative Models, Conditional Content Generation, Deep Learning, Subjective Perspective, Music Generation, The Game of Go
Chinese Abstract (translated): Generative models have developed rapidly in recent years; automatic content generation with artificial intelligence is the next major leap for deep learning and will give rise to many revolutionary applications. Services built around content retrieval are gradually shifting toward automatic content generation: instead of querying keywords and combing through related results, users supply tailored prompts to produce the desired content directly. However, automatically generated content still typically requires many rounds of trial and error, producing numerous variations from which only a few are hand-picked. Moreover, some content that users truly want is difficult to generate because of the limitations of the prompt language itself and the model's lack of certain domain knowledge, requiring extra training data and time-consuming re-fine-tuning of the generative model. In view of these shortcomings, this study starts from the observation that users hold different subjective perceptions of content of different natures, and uses conditional generative models to customize the generated content. Taking the differing evaluations that players give to moves in the game of Go as an example, it generates personalized melodies that correspond to different subjective experiences. The generated music attains an average Fréchet distance of only 1.687 from clean, high-quality piano performances, approaching studio-recording quality. It also offers the flexibility to bridge to other generative models through a genetic embedding translator module.


English Abstract: Generative models have advanced significantly in recent years, and AI-powered content creation is the next big breakthrough for deep learning applications. Content-based multimedia information retrieval is giving way to content-based multimedia generation and creation: instead of returning a list of correlated items for a query, conditional content generation creates brand-new content from carefully crafted prompts. However, time-consuming trial and error over many iterations is still required to produce enough candidates, of which only some are the preferred outcomes. Moreover, due to a model's lack of specific domain knowledge and the expressive limitations of prompt language, some desirable outcomes may be unattainable without fine-tuning the model on additional domain-specific datasets. Hence, this dissertation proposes a conditional content generation method that embeds different subjective perspectives across heterogeneous content types, demonstrated by using differing opinions of moves in the game of Go to generate associated contextual music segments. The generated music attains an average Fréchet Audio Distance (FAD) of 1.687 against a clean virtuosic classical piano dataset, nearly at studio-recording quality. The method also has the flexibility to combine with other generation models through the genetic embedding translator.
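For context on the reported score: FAD (Kilgour et al., Interspeech 2019) fits a multivariate Gaussian to audio embeddings (typically VGGish) of a reference set and of the evaluated set, then takes the Fréchet distance between the two Gaussians; lower is better. A minimal sketch of that computation, assuming the embedding matrices have already been extracted (the embedding model and audio loading are outside this snippet):

```python
import numpy as np
from scipy import linalg

def frechet_audio_distance(emb_ref: np.ndarray, emb_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets.

    emb_ref, emb_gen: (num_frames, embedding_dim) arrays, e.g. VGGish
    embeddings of the reference and generated audio (assumed precomputed).
    Implements FAD = ||mu_r - mu_g||^2
                     + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)).
    """
    mu_r, mu_g = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    sigma_r = np.cov(emb_ref, rowvar=False)
    sigma_g = np.cov(emb_gen, rowvar=False)
    # Matrix square root of the covariance product; small imaginary
    # components from numerical error are discarded.
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```

In practice the statistics are accumulated over embeddings of many audio windows from each set, so the abstract's score of 1.687 says the generated music's embedding statistics lie close to those of the clean piano reference set.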

Table of Contents:
Chinese Abstract I
Abstract II
Acknowledgments III
List of Figures VI
List of Tables IX
Chapter 1 Introduction 1
  1.1. Background 1
    1.1.1. Content Generation 1
    1.1.2. Information Retrieval and Conditional Content Generation 2
  1.2. Motivation 5
    1.2.1. Personalized Content Generation 5
    1.2.2. Personalized Content Enrichment 5
  1.3. Research Target 7
  1.4. Research Problems 8
    1.4.1. Heterogeneous Contents 8
    1.4.2. Scalability and Personalization 11
    1.4.3. Bridging Interpretations and Content Creation 11
  1.5. Dissertation Organization 12
Chapter 2 Subjective Perceptions in Heterogeneous Contents 13
  2.1. Subjective Perceptions and AIs in the Game of Go 14
    2.1.1. Go Games as Spectator Events 17
    2.1.2. Representations for a Go Game Position and Commentaries 19
    2.1.3. Common Go Game Terminologies 20
    2.1.4. Perceptions and Expressions used in Go Games 21
    2.1.5. Player Proficiency and Ranks in Go Games 22
  2.2. Perceptions in Music 24
Chapter 3 Related Methods and Models 26
  3.1. Generative Models 26
    3.1.1. Autoencoder 26
    3.1.2. Generative Adversarial Networks 28
    3.1.3. Recurrent Neural Networks 30
    3.1.4. Transformers 32
  3.2. Deep Reinforcement Learning 34
  3.3. Context and Perspective Translator 37
    3.3.1. Quantized Vectors and Codebook 37
    3.3.2. Tokenizer and Embedding 38
    3.3.3. Genetic Optimization Embedding 39
  3.4. Low-level Feature Identification 41
    3.4.1. Mid-Level Features Extraction 42
  3.5. Spectrogram Representation 45
Chapter 4 System Structure and Methods 47
  4.1. Data Gathering and Dynamic Survey Generation 47
  4.2. Self and Semi Supervised Learning 47
  4.3. System Architecture 49
    4.3.1. Go Game Perception Interpreter Model 50
    4.3.2. Musika Music Content Generation Model 54
    4.3.3. Conditional Go game board and Perception Integration Generator 56
    4.3.4. Genetic Embedding Translator and Controller 57
Chapter 5 Experiments and Results 60
  5.1. Experimental Setup and Hardware 60
  5.2. Conditional Music Generation based on Perceptions 60
  5.3. Conditional Image Commentary Generation 63
Chapter 6 Conclusion 67
  6.1. Discussion 67
  6.2. Conclusion 68
  6.3. Future Works 69
References 71
Appendix A Go Game Basics and Terminologies 80
Appendix B Perceptions of Patterns in Go Game Positions 88
Appendix C Music Elements Aligned with Perceptions for Go Games 94
  C.1. Music Associated Elements 94
    C.1.1. Pitch and Note-Value 94
    C.1.2. Measures and Themes 95
    C.1.3. Tempo and Beats 95
    C.1.4. Rhythm and Accent 96
    C.1.5. Scale and Chord 96
  C.2. Associated Perceptions between Music and Go Games 97
    C.2.1. Height and Pitch 98
    C.2.2. Big/Small and Loudness 99
    C.2.3. Speed and Tempo 101
    C.2.4. Strong/Soft and Rhythm 103
    C.2.5. Complexity and Melody 106
Publication List 111

