
Author: 黃振群 (Chen-Chun Huang)
Thesis Title: A Study of Human Face Sentiment Classification Using Synthetic Sentiment Images with Deep Convolutional Neural Networks
Advisor: 吳怡樂 (Yi-Leh Wu)
Oral Defense Committee: 陳建中 (Jiann-Jone Chen), 唐政元 (Cheng-Yuan Tang), 吳怡樂 (Yi-Leh Wu), 閻立剛 (Li-Gang Yan)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2018
Graduation Academic Year: 106 (ROC calendar)
Language: Chinese
Pages: 31
Keywords: TensorFlow, Sentiment Classification, Convolutional Neural Network, Deep Learning, Synthetic Image
Abstract:

    Images are one of the most important ways for users to express their emotions on social networks, and because they are so convenient to share, more and more people upload them. In the past, most sentiment analysis focused on textual content, using techniques such as latent semantic analysis, SVM, and bag of words. Deep learning usually requires a great deal of training time to achieve good results, but with advances in hardware and algorithms, training time is no longer the main obstacle; the size of the dataset is. Deep convolutional neural networks perform very well in image classification, and in this thesis we use them to address image sentiment analysis from visual content. Training a neural network requires a large dataset to achieve good performance, but such a training set of real human emotions is difficult to obtain: emotions are subjective, so each image must be annotated by multiple people, which requires substantial manpower. This study proposes incorporating synthetic face images into the training set to substantially increase its size. We compare training sets containing only synthetic face images, only real face images, and mixtures of the two (real + synthetic). Our experiments show that using only 4,026 real images, with each class supplemented by synthetic images up to the same size (Anger: 1,063 synthetic + 937 real; Disgust: 1,857 synthetic + 143 real; Fear: 1,802 synthetic + 198 real; Happy: 2,000 real; Sad: 1,252 synthetic + 748 real), for 10,000 images in total, achieves average testing accuracies of 87.79%, 74.19%, 86.99%, and 79.80% on the respective test sets for human face sentiment classification.
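    The per-class figures above amount to topping up each class of real images with synthetic faces until every class holds 2,000 examples (10,000 in total). The following is a minimal Python sketch of that balancing step, assuming a simple per-class directory layout; the paths and the top_up_class helper are illustrative assumptions, not taken from the thesis.

        import os
        import random

        TARGET_PER_CLASS = 2000  # every class is padded to 2,000 images
        CLASSES = ["anger", "disgust", "fear", "happy", "sad"]

        def top_up_class(real_dir, synth_dir, target=TARGET_PER_CLASS):
            # Keep every real image, then draw just enough synthetic
            # images at random to reach the target class size.
            real = [os.path.join(real_dir, f) for f in os.listdir(real_dir)]
            synth = [os.path.join(synth_dir, f) for f in os.listdir(synth_dir)]
            needed = max(target - len(real), 0)
            return real + random.sample(synth, needed)

        # E.g. "anger" keeps its 937 real images and gains 1,063 synthetic ones.
        training_set = {c: top_up_class(os.path.join("real", c),
                                        os.path.join("synthetic", c))
                        for c in CLASSES}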

Contents:
    Abstract (Chinese)
    Abstract
    Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
    Chapter 2. Deep Learning Model
    Chapter 3. TensorFlow and Original Model
        3.1 TensorFlow
        3.2 CNN and Original Model
    Chapter 4. Experiment
        4.1 Synthetic Face Dataset and Real Face Dataset
        4.2 Using Real Face Dataset on AlexNet Model
        4.3 Using Synthetic Face Dataset on AlexNet Model
        4.4 Using Mixture Dataset on AlexNet Model
    Chapter 5. Conclusions and Future Work
    References

