
Graduate Student: 蘇冠武 (Kuan-Wu Su)
Thesis Title: 條件式具主觀成分異質內容之生成方法 (Conditional Content Generation Based on Subjective Perception Context from Heterogeneous Content Types)
Advisor: 呂政修 (Jenq-Shiou Leu)
Committee Members: 呂政修 (Jenq-Shiou Leu), 周承復 (Cheng-Fu Chou), 魏宏宇 (Hung-Yu Wei), 陳俊良 (Jiann-Liang Chen), 陳郁堂 (Yie-Tarng Chen), 衛信文 (Hsin-Wen Wei), 方文賢 (Wen-Hsien Fang), 鄭瑞光 (Ray-Guang Cheng), 阮聖彰 (Shanq-Jang Ruan)
Degree: Doctor
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 111 (ROC calendar, 2022-2023)
Language: English
Number of Pages: 123
Chinese Keywords: 生成模型 (generative models), 條件式生成 (conditional generation), 深度學習 (deep learning), 主觀認知差異 (subjective cognitive differences), 音樂生成 (music generation), 圍棋 (the game of Go)
English Keywords: Generative Models, Conditional Content Generation, Deep Learning, Subjective Perspective, Music Generation, The Game of Go
Chinese Abstract (translated): Generative models have developed rapidly in recent years; automatic content generation with artificial intelligence is the next major leap for deep learning and will give rise to many revolutionary applications. Services built around content retrieval are gradually shifting toward automatic content generation: instead of querying keywords and combing through related results, users supply tailored prompts to produce the desired content directly. However, automatically generated content still typically requires many rounds of trial and error, producing numerous variations from which only a few are hand-picked. Moreover, some content that users truly want is difficult to generate because of the limitations of the prompt language itself and the model's lack of certain domain knowledge, requiring extra training data and time-consuming re-fine-tuning of the generative model. In view of these shortcomings, this study starts from the observation that users hold different subjective perceptions of content of different natures, and uses conditional generative models to customize the generated content. Taking the differing evaluations that players give to moves in the game of Go as an example, it generates personalized melodies that correspond to different subjective experiences. The generated music attains an average Fréchet distance of only 1.687 from clean, high-quality piano performances, approaching studio-recording quality. It also offers the flexibility to bridge to other generative models through a genetic embedding translator module.


English Abstract: Generative models have advanced significantly in recent years, and AI-powered content creation is the next big breakthrough for deep learning applications. Content-based multimedia information retrieval is giving way to content-based multimedia generation and creation: instead of returning a list of correlated items for a query, conditional content generation creates brand-new content from carefully crafted prompts. However, time-consuming trial and error over many iterations is still required to produce enough candidates, of which only some are the preferred outcomes. Moreover, due to a model's lack of specific domain knowledge and the expressive limitations of prompt language, some desirable outcomes may be unattainable without fine-tuning the model on additional domain-specific datasets. Hence, this dissertation proposes a conditional content generation method that embeds different subjective perspectives across heterogeneous content types, demonstrated by using differing opinions of moves in the game of Go to generate associated contextual music segments. The generated music attains an average Fréchet Audio Distance (FAD) of 1.687 against a clean virtuosic classical piano dataset, nearly at studio-recording quality. The method also has the flexibility to combine with other generation models through the genetic embedding translator.
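For context on the reported score: FAD (Kilgour et al., Interspeech 2019) fits a multivariate Gaussian to audio embeddings (typically VGGish) of a reference set and of the evaluated set, then takes the Fréchet distance between the two Gaussians; lower is better. A minimal sketch of that computation, assuming the embedding matrices have already been extracted (the embedding model and audio loading are outside this snippet):

```python
import numpy as np
from scipy import linalg

def frechet_audio_distance(emb_ref: np.ndarray, emb_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets.

    emb_ref, emb_gen: (num_frames, embedding_dim) arrays, e.g. VGGish
    embeddings of the reference and generated audio (assumed precomputed).
    Implements FAD = ||mu_r - mu_g||^2
                     + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)).
    """
    mu_r, mu_g = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    sigma_r = np.cov(emb_ref, rowvar=False)
    sigma_g = np.cov(emb_gen, rowvar=False)
    # Matrix square root of the covariance product; small imaginary
    # components from numerical error are discarded.
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```

In practice the statistics are accumulated over embeddings of many audio windows from each set, so the abstract's score of 1.687 says the generated music's embedding statistics lie close to those of the clean piano reference set.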

Table of Contents:
Chinese Abstract I
Abstract II
Acknowledgments III
List of Figures VI
List of Tables IX
Chapter 1 Introduction 1
  1.1. Background 1
    1.1.1. Content Generation 1
    1.1.2. Information Retrieval and Conditional Content Generation 2
  1.2. Motivation 5
    1.2.1. Personalized Content Generation 5
    1.2.2. Personalized Content Enrichment 5
  1.3. Research Target 7
  1.4. Research Problems 8
    1.4.1. Heterogeneous Contents 8
    1.4.2. Scalability and Personalization 11
    1.4.3. Bridging Interpretations and Content Creation 11
  1.5. Dissertation Organization 12
Chapter 2 Subjective Perceptions in Heterogeneous Contents 13
  2.1. Subjective Perceptions and AIs in the Game of Go 14
    2.1.1. Go Games as Spectator Events 17
    2.1.2. Representations for a Go Game Position and Commentaries 19
    2.1.3. Common Go Game Terminologies 20
    2.1.4. Perceptions and Expressions used in Go Games 21
    2.1.5. Player Proficiency and Ranks in Go Games 22
  2.2. Perceptions in Music 24
Chapter 3 Related Methods and Models 26
  3.1. Generative Models 26
    3.1.1. Autoencoder 26
    3.1.2. Generative Adversarial Networks 28
    3.1.3. Recurrent Neural Networks 30
    3.1.4. Transformers 32
  3.2. Deep Reinforcement Learning 34
  3.3. Context and Perspective Translator 37
    3.3.1. Quantized Vectors and Codebook 37
    3.3.2. Tokenizer and Embedding 38
    3.3.3. Genetic Optimization Embedding 39
  3.4. Low-level Feature Identification 41
    3.4.1. Mid-Level Features Extraction 42
  3.5. Spectrogram Representation 45
Chapter 4 System Structure and Methods 47
  4.1. Data Gathering and Dynamic Survey Generation 47
  4.2. Self and Semi Supervised Learning 47
  4.3. System Architecture 49
    4.3.1. Go Game Perception Interpreter Model 50
    4.3.2. Musika Music Content Generation Model 54
    4.3.3. Conditional Go game board and Perception Integration Generator 56
    4.3.4. Genetic Embedding Translator and Controller 57
Chapter 5 Experiments and Results 60
  5.1. Experimental Setup and Hardware 60
  5.2. Conditional Music Generation based on Perceptions 60
  5.3. Conditional Image Commentary Generation 63
Chapter 6 Conclusion 67
  6.1. Discussion 67
  6.2. Conclusion 68
  6.3. Future Works 69
References 71
Appendix A Go Game Basics and Terminologies 80
Appendix B Perceptions of Patterns in Go Game Positions 88
Appendix C Music Elements Aligned with Perceptions for Go Games 94
  C.1. Music Associated Elements 94
    C.1.1. Pitch and Note-Value 94
    C.1.2. Measures and Themes 95
    C.1.3. Tempo and Beats 95
    C.1.4. Rhythm and Accent 96
    C.1.5. Scale and Chord 96
  C.2. Associated Perceptions between Music and Go Games 97
    C.2.1. Height and Pitch 98
    C.2.2. Big/Small and Loudness 99
    C.2.3. Speed and Tempo 101
    C.2.4. Strong/Soft and Rhythm 103
    C.2.5. Complexity and Melody 106
Publication List 111

