Graduate student: 蘇冠武
Kuan-Wu Su
Thesis title: 條件式具主觀成分異質內容之生成方法
Conditional Content Generation Based on Subjective Perception Context from Heterogeneous Content Types
Advisor: 呂政修
Jenq-Shiou Leu
Oral defense committee: 呂政修
Jenq-Shiou Leu
Cheng-Fu Chou
Hung-Yu Wei
Jiann-Liang Chen
Yie-Tarng Chen
Hsin-Wen Wei
Wen-Hsien Fang
Ray-Guang Cheng
Shanq-Jang Ruan
Degree: Doctoral
Department: College of Electrical Engineering and Computer Science (電資學院) - 電子工程系
Department of Electronic and Computer Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: English
Number of pages: 123
Chinese keywords: 生成模型, 條件式生成, 深度學習, 主觀認知差異, 音樂生成, 圍棋
English keywords: Generative Models, Conditional Content Generation, Deep Learning, Subjective Perspective, Music Generation, The Game of Go
Generative models have advanced significantly in recent years, and AI-powered content creation is the next big breakthrough for deep learning applications. Content-based multimedia information retrieval is giving way to content-based multimedia generation and creation: instead of returning a list of existing content correlated with a query, conditional content generation creates brand-new content from carefully crafted prompts. However, time-consuming trial and error is still required, with many iterations needed to produce enough content, of which only a portion matches the preferred outcome. Moreover, because of missing domain-specific knowledge and the limits of what language can express in a prompt, some desirable outcomes may not be achievable without fine-tuning the models and additional training on domain-specific datasets. This dissertation therefore proposes a conditional content generation method that embeds different subjective perspectives across heterogeneous content types, and demonstrates it by using differing opinions on moves in the game of Go to generate associated contextual music segments. On average, the generated music achieves a Fréchet Audio Distance (FAD) score of 1.687 against a clean dataset of virtuosic classical piano music, close to the level of studio recording quality. The method can also be flexibly combined with other generation models through a genetic embedding translator.
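
For readers unfamiliar with the metric reported above, FAD is commonly defined as the Fréchet distance between two multivariate Gaussians fitted to audio embeddings of a reference set and of the evaluated set; lower values indicate closer distributions. The symbols below are illustrative assumptions (\mu_r, \Sigma_r for the mean and covariance of the reference embeddings, \mu_g, \Sigma_g for the generated ones). This record does not state which embedding model or exact variant the dissertation uses.

```latex
\mathrm{FAD}(r, g) \;=\; \lVert \mu_r - \mu_g \rVert_2^{2}
  \;+\; \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,\bigl(\Sigma_r \Sigma_g\bigr)^{1/2} \right)
```

Under this definition the score is zero when the two fitted Gaussians coincide, so the reported 1.687 would be read as a small distance from the reference piano dataset.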

Abstract (Chinese) I
Abstract II
Acknowledgments III
List of Figures VI
List of Tables IX
Chapter 1 Introduction 1
  1.1. Background 1
    1.1.1. Content Generation 1
    1.1.2. Information Retrieval and Conditional Content Generation 2
  1.2. Motivation 5
    1.2.1. Personalized Content Generation 5
    1.2.2. Personalized Content Enrichment 5
  1.3. Research Target 7
  1.4. Research Problems 8
    1.4.1. Heterogeneous Contents 8
    1.4.2. Scalability and Personalization 11
    1.4.3. Bridging Interpretations and Content Creation 11
  1.5. Dissertation Organization 12
Chapter 2 Subjective Perceptions in Heterogeneous Contents 13
  2.1. Subjective Perceptions and AIs in the Game of Go 14
    2.1.1. Go Games as Spectator Events 17
    2.1.2. Representations for a Go Game Position and Commentaries 19
    2.1.3. Common Go Game Terminologies 20
    2.1.4. Perceptions and Expressions used in Go Games 21
    2.1.5. Player Proficiency and Ranks in Go Games 22
  2.2. Perceptions in Music 24
Chapter 3 Related Methods and Models 26
  3.1. Generative Models 26
    3.1.1. Autoencoder 26
    3.1.2. Generative Adversarial Networks 28
    3.1.3. Recurrent Neural Networks 30
    3.1.4. Transformers 32
  3.2. Deep Reinforcement Learning 34
  3.3. Context and Perspective Translator 37
    3.3.1. Quantized Vectors and Codebook 37
    3.3.2. Tokenizer and Embedding 38
    3.3.3. Genetic Optimization Embedding 39
  3.4. Low-level Feature Identification 41
    3.4.1. Mid-Level Features Extraction 42
  3.5. Spectrogram Representation 45
Chapter 4 System Structure and Methods 47
  4.1. Data Gathering and Dynamic Survey Generation 47
  4.2. Self and Semi Supervised Learning 47
  4.3. System Architecture 49
    4.3.1. Go Game Perception Interpreter Model 50
    4.3.2. Musika Music Content Generation Model 54
    4.3.3. Conditional Go game board and Perception Integration Generator 56
    4.3.4. Genetic Embedding Translator and Controller 57
Chapter 5 Experiments and Results 60
  5.1. Experimental Setup and Hardware 60
  5.2. Conditional Music Generation based on Perceptions 60
  5.3. Conditional Image Commentary Generation 63
Chapter 6 Conclusion 67
  6.1. Discussion 67
  6.2. Conclusion 68
  6.3. Future Works 69
References 71
Appendix A Go Game Basics and Terminologies 80
Appendix B Perceptions of Patterns in Go Game Positions 88
Appendix C Music Elements Aligned with Perceptions for Go Games 94
  C.1. Music Associated Elements 94
    C.1.1. Pitch and Note-Value 94
    C.1.2. Measures and Themes 95
    C.1.3. Tempo and Beats 95
    C.1.4. Rhythm and Accent 96
    C.1.5. Scale and Chord 96
  C.2. Associated Perceptions between Music and Go Games 97
    C.2.1. Height and Pitch 98
    C.2.2. Big/Small and Loudness 99
    C.2.3. Speed and Tempo 101
    C.2.4. Strong/Soft and Rhythm 103
    C.2.5. Complexity and Melody 106
Publication List 111

