
Graduate Student: Xin-Yan Lim (林欣艷)
Thesis Title: Erhu Music Visualization: A Study on the Relationship between Music Emotions and Image Emotions (二胡音樂視覺化:音樂情緒與影像情緒關係之研究)
Advisor: Pei-Li Sun (孫沛立)
Committee Members: Li-Chen Ou (歐立成); Kuo-Jui Hu (胡國瑞)
Degree: Master
Department: Graduate Institute of Color and Illumination Technology, College of Applied Technology
Publication Year: 2023
Graduation Academic Year: 111 (2022-2023)
Language: English
Number of Pages: 103
Keywords: Erhu performance, music visualization, musical features, image emotion, music emotion

This study focuses on erhu music and explores the correlations among musical features, music emotions, image emotions, and animated visual effects. The aim is to provide images and animated visual effects for otherwise plain erhu solo performances that positively enhance the audience's emotional response to the music. To analyze the emotional connections between auditory and visual stimuli, this research comprises the following five psychophysical experiments:
Experiment 1, "Music to Text-guided Image Experiment," employed 12 erhu music clips with different musical features as stimuli. Using the text-guided AI image generation model Midjourney, multiple Chinese-style images related to Russell's emotion model were created. Twenty participants were asked to choose the emotional response corresponding to each clip's musical features, along with the most suitable generated image. The results indicate strong correlations between specific musical features, such as scale and rhythm, and particular emotions, supporting the efficacy of Russell's emotion model in explaining emotional reactions to music and images. The correlations of tempo and pitch with emotion, however, are weaker.
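Russell's circumplex model locates emotions on a two-dimensional valence-arousal plane. As a minimal sketch of that idea, the Python snippet below places emotion labels at hypothetical valence-arousal coordinates (illustrative placements, not the thesis's measured values) and groups them by quadrant:

```python
# Russell's circumplex model, minimally: each emotion label sits at an
# assumed (valence, arousal) coordinate in [-1, 1] x [-1, 1].
# NOTE: the coordinates below are hypothetical, for illustration only.
EMOTION_COORDS = {
    "excited": (0.7, 0.8),   "happy":   (0.9, 0.3),
    "calm":    (0.6, -0.7),  "relaxed": (0.7, -0.5),
    "bored":   (-0.6, -0.6), "sad":     (-0.8, -0.3),
    "angry":   (-0.7, 0.8),  "tense":   (-0.5, 0.7),
}

def quadrant(emotion: str) -> str:
    """Return the circumplex quadrant an emotion label falls into."""
    valence, arousal = EMOTION_COORDS[emotion]
    energy = "high-arousal" if arousal >= 0 else "low-arousal"
    tone = "positive" if valence >= 0 else "negative"
    return f"{energy} {tone}"

print(quadrant("calm"))  # -> "low-arousal positive"
```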
Experiment 2, "Emotion to Animated Image Experiment," investigated the relationship between animated images and emotions. The analysis reveals that image statistics, such as CIELAB lightness and chroma means and their standard deviations, show no strong connection with the emotions the images elicit. Nevertheless, an interesting pattern emerges in the naturalness grouping: "natural" images tend to evoke positive but low-energy emotions, and images with low motion speed exhibit a similar effect.
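To make the image statistics concrete, here is a minimal sketch (assuming NumPy and scikit-image; the thesis's exact computation is not specified in this abstract) that derives the mean and standard deviation of CIELAB lightness and chroma for a single image:

```python
# Minimal CIELAB image-statistics sketch using NumPy and scikit-image.
import numpy as np
from skimage import color, io

def cielab_stats(path: str) -> dict:
    rgb = io.imread(path)[..., :3] / 255.0   # drop any alpha channel
    lab = color.rgb2lab(rgb)                 # L* in [0, 100], plus a*, b*
    L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
    chroma = np.hypot(a, b)                  # C*ab = sqrt(a*^2 + b*^2)
    return {
        "L_mean": L.mean(), "L_std": L.std(),       # lightness, its contrast
        "C_mean": chroma.mean(), "C_std": chroma.std(),
    }
```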
Experiment 3, "Emotion to Colorized Image Experiment," applied different LUTs (lookup tables) to color images. Different LUTs are associated with specific emotions: for instance, the "sunset" and "vibrant" LUTs evoke high-energy but negative emotions, the "foggy night" and "urban cowboy" LUTs are linked to emotions such as sadness and boredom, and the "light" LUT is associated with excitement and happiness. Additionally, unmodified original images correlate strongly with feelings of calmness, serenity, and relaxation.
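For reference, a 3D LUT remaps each input color to a stylized output color. The sketch below is a toy parser and nearest-neighbour lookup for a .cube-style file; production grading tools interpolate trilinearly, and the thesis's own colorization method is the one described in its Chapter 3, not this one:

```python
# Toy 3D-LUT colorization: parse a minimal .cube file and remap an image.
import numpy as np

def load_cube(path: str):
    """Parse a minimal .cube file: a LUT_3D_SIZE line plus N^3 RGB rows."""
    size, rows = 0, []
    for line in open(path):
        line = line.strip()
        if line.startswith("LUT_3D_SIZE"):
            size = int(line.split()[1])
        elif line and line[0] in "0123456789.-":
            rows.append([float(x) for x in line.split()])
    # .cube ordering: the red index varies fastest, then green, then blue,
    # so the reshaped table is indexed as lut[b, g, r].
    return np.asarray(rows).reshape(size, size, size, 3), size

def apply_lut(rgb, lut, size):
    """rgb: float image in [0, 1]; nearest-neighbour lookup, no interpolation."""
    idx = np.clip(np.rint(rgb * (size - 1)).astype(int), 0, size - 1)
    return lut[idx[..., 2], idx[..., 1], idx[..., 0]]
```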
Experiment 4, "Music to Categorized Image Experiment," combined the results of Experiments 1 to 3 to explore the relationship between musical features and different image types. The analysis demonstrates strong connections between specific image types and certain emotional responses, indicating that some image types are more effective at eliciting specific emotional reactions. Furthermore, a comparison between the results of Experiments 1 and 4 suggests that image types differ in how strongly they elicit emotional responses from participants.
Finally, Experiment 5, "Music Visualization of Erhu Performance Experiment," investigated participants' preferences among music visualization patterns applied to erhu solo video samples. The "color representing pitch" visualization received the highest score, whereas the visualization based on volume and the performer's movement performed poorly at conveying the music's genre and emotion.
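As an illustration of the winning "color representing pitch" idea, the sketch below (assuming librosa for F0 tracking; the thesis's actual rendering pipeline, which per its Chapter 3 also uses MediaPipe and optical flow, is not reproduced) maps each voiced frame's position within the octave onto the hue wheel:

```python
# Pitch-to-color sketch: track F0 with librosa, map pitch class to hue.
import colorsys
import librosa
import numpy as np

# "erhu_solo.wav" is a hypothetical input file.
y, sr = librosa.load("erhu_solo.wav")
f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C3"),
                             fmax=librosa.note_to_hz("C7"))

def pitch_to_rgb(hz: float) -> tuple:
    """Map the pitch's position within its octave (log scale) to a hue."""
    octave_pos = float(np.log2(hz) % 1.0)    # 0..1 across one octave
    return colorsys.hsv_to_rgb(octave_pos, 1.0, 1.0)

# One RGB triple per analysis frame; unvoiced frames are rendered black.
frame_colors = [pitch_to_rgb(hz) if v else (0.0, 0.0, 0.0)
                for hz, v in zip(f0, voiced)]
```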
Overall, this study offers valuable insights into the intricate connections among erhu musical features, image emotions, and music visualization, providing a fresh perspective on how perceptual cues shape emotional experience. These findings are significant for designing emotionally impactful audio-visual content and for building interactive applications that use music and image features to create immersive emotional experiences.

Table of Contents:
Abstract (Chinese) I
Abstract (English) III
Acknowledgements V
Table of Contents VI
List of Figures IX
List of Tables XII
Chapter 1 Introduction 1
  1.1 Background 1
  1.2 Motivation and Aims 2
  1.3 Thesis Outline 3
Chapter 2 Literature Review 4
  2.1 Music and Emotion 4
    2.1.1 Music Element 4
    2.1.2 Music Emotion 6
  2.2 Color and Image Emotion 8
  2.3 Music and Color Emotion 10
  2.4 Text-guided Image Generation 10
  2.5 Music Visualization 13
  2.6 Image Tracking Features 15
Chapter 3 Methods Overview 18
  3.1 Experiment 1: Music to Text-guided Image 19
    3.1.1 Experiment Design 19
    3.1.2 Audio Samples 20
    3.1.3 Text-guided Image Generator Comparison 22
    3.1.4 Image Selected for Experiment 1 24
    3.1.5 Experiment Procedure 27
    3.1.6 Methods for Data Analysis 30
    3.1.7 Summary 33
  3.2 Experiment 2: Emotion to Animated Image 34
    3.2.1 Experiment Design 34
    3.2.2 Image Selection 34
    3.2.3 Experimental Procedure 38
    3.2.4 Methods for Data Analysis 40
    3.2.5 Summary 40
  3.3 Experiment 3: Emotion to Colorized Image 41
    3.3.1 Experiment Design 41
    3.3.2 Methods for Colorization 42
    3.3.3 Image Selection 42
    3.3.4 Experimental Procedure 45
    3.3.5 Methods for Data Analysis 48
    3.3.6 Summary 48
  3.4 Experiment 4: Music to Categorized Image 48
    3.4.1 Experimental Design 49
    3.4.2 Image Selection 49
    3.4.3 Experimental Procedure 49
    3.4.4 Methods for Data Analysis 52
    3.4.5 Summary 52
  3.5 Experiment 5: Music Visualization of Erhu Performance 53
    3.5.1 Experimental Design 53
    3.5.2 Video Sample 54
    3.5.3 Audio Conversion 54
    3.5.4 MediaPipe 55
    3.5.5 Optical Flow 55
    3.5.6 Music Visualization 56
    3.5.7 Experiment Questionnaire 61
    3.5.8 Experiment Procedure 62
    3.5.9 Methods for Data Analysis 62
    3.5.10 Summary 63
Chapter 4 Results and Discussion 64
  4.1 Result of Experiment 1 64
    4.1.1 Scale Classification 67
    4.1.2 Tempo Classification 68
    4.1.3 Rhythm Classification 68
    4.1.4 Pitch Classification 69
    4.1.5 Gender Comparison 70
    4.1.6 Repeatability 73
    4.1.7 Summary Result of Experiment 1 75
  4.2 Result of Experiment 2 80
    4.2.1 Naturalness 82
    4.2.2 Image Statistics on Lightness 83
    4.2.3 Image Statistics on Chroma 84
    4.2.4 Image Statistics on Lightness Contrast 85
    4.2.5 Image Statistics on Magnitude of Motion 86
    4.2.6 Conclusion of Image Statistics Analysis for Animated Image 87
  4.3 Result of Experiment 3 87
  4.4 Result of Experiment 4 91
    4.4.1 The Comparison of Experiment 1 and Experiment 4 94
  4.5 Result of Experiment 5 95
    4.5.1 Repeatability 97
Chapter 5 Conclusions and Future Works 99
  5.1 Conclusions 99
  5.2 Future Works 100
References 101
Appendix A The sheet music of Experiment 1 audio samples 104
Appendix B Image prompts of Experiment 1 110
Appendix C LUTs Source 118
Appendix D Result of Experiment 1 119

