
Author: 林軒而 (Hsuan-Er Lin)
Thesis Title: 結合多個深度模型的臉部情緒辨識系統與應用 (Combining Multiple Deep Models for Facial Emotion Recognition and Applications)
Advisor: 洪西進 (Shi-Jinn Horng)
Committee: 洪西進 (Shi-Jinn Horng), 范欽雄 (Chin-Shyurng Fahn), 李正吉 (Cheng-Chi Lee), 顏成安 (Cheng-An Yen)
Degree: 碩士 (Master)
Department: 電資學院 - 資訊工程系 (Department of Computer Science and Information Engineering)
Thesis Publication Year: 2021
Graduation Academic Year: 109
Language: 中文 (Chinese)
Pages: 46
Keywords (in Chinese): 卷積神經網路, VGG16, ResNet50, SeNet50, 遷移學習, 臉部情緒識別, Fer2013
Keywords (in other languages): CNN, VGG16, ResNet50, SeNet50, Transfer Learning, Facial Emotion Recognition, Fer2013
Reference times: Clicks: 271, Downloads: 0
  • We propose combining three different convolutional neural networks and fusing their classification results to determine the facial emotion. The three networks, VGG16, ResNet50, and SeNet50, have all achieved strong results in image recognition competitions in recent years. We apply these three architectures with transfer learning to the facial emotion recognition task. The networks are combined by voting: the class with the most votes wins, and the voting result represents the fused classification. If voting cannot decide the emotion class, the prediction probabilities of each class are averaged across the three networks, yielding seven averaged values, and the class with the largest value determines the final emotion.
    Our experiments achieve the best known result on the Facial Expression Recognition 2013 Challenge (Fer2013) dataset, reaching 76.17% accuracy on the Fer2013 private test and surpassing the latest known method by 0.75%.
    We also implement a real-time facial emotion recognition system. To balance real-time operation against recognition accuracy, the system uses a single ResNet50 network, which also achieves the best known single-model accuracy on the Fer2013 private test, 74.33%. Finally, we add a playful application: once the model predicts the facial emotion, a matching cartoon emotion image is fused with the face captured by the camera, making the system more entertaining.


    We propose to combine three different convolutional neural
    networks (CNNs) and fuse their classification results to determine
    the facial emotion. The three networks, VGG16, ResNet50, and SeNet50,
    have achieved excellent results in image recognition competitions in
    recent years. We use these three architectures with transfer learning
    for the facial emotion recognition task. The networks are combined by
    voting: the category with the most votes wins, and the voting result
    represents the fused classification. If voting cannot determine the
    emotional category, the probability values of each category are
    averaged across the three models, and the category with the largest
    average is taken as the final result.
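    The voting-and-averaging fusion rule above can be sketched as follows. This is a minimal illustration, not the thesis code: the function and class names are assumptions, and each model is represented only by its 7-class probability vector.

    ```python
    # Sketch of the fusion rule: each network outputs a 7-class probability
    # vector; a majority vote over the three top classes decides, and a
    # three-way disagreement falls back to averaging probabilities.
    EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

    def argmax(values):
        """Index of the largest value in a list."""
        return max(range(len(values)), key=lambda i: values[i])

    def fuse_predictions(probs_vgg16, probs_resnet50, probs_senet50):
        """Majority vote over the three models' top classes; if all three
        disagree, average the probabilities per class and take the class
        with the largest mean."""
        all_probs = [probs_vgg16, probs_resnet50, probs_senet50]
        votes = [argmax(p) for p in all_probs]
        # A class chosen by at least 2 of the 3 models is a majority.
        for cls in set(votes):
            if votes.count(cls) >= 2:
                return cls
        # No majority: mean probability per class, then argmax.
        means = [sum(p[cls] for p in all_probs) / 3.0
                 for cls in range(len(EMOTIONS))]
        return argmax(means)
    ```

    For example, when the three models pick three different classes, the per-class means break the tie; when two models agree, their shared class wins outright.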
    Our experiments on the Facial Expression Recognition 2013
    Challenge (Fer2013) dataset demonstrate that our approach achieves
    state-of-the-art results. With a top accuracy of 76.17% on the
    Fer2013 private test, we surpass the previous state-of-the-art
    method by 0.75%.
    We also implement a real-time facial emotion recognition system.
    To balance real-time performance against recognition accuracy, the
    system uses a single ResNet50 network, which is the best known
    single model on the Fer2013 private test with an accuracy of 74.33%.
    Finally, we add an entertaining application: when the model predicts
    a facial emotion, a corresponding cartoon emotion image is fused
    with the face captured by the camera, making the system more
    engaging.
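    The cartoon application above fuses a cartoon emotion image with the face in the camera frame. The thesis does not specify the fusion method here, so the following is a hedged sketch assuming simple per-pixel alpha blending, with images as H x W x 3 nested lists of 0-255 ints and the cartoon already resized to the face bounding box; `blend_face` and `alpha` are illustrative names.

    ```python
    # Minimal alpha-blending sketch (an assumption, not the thesis method):
    # output pixel = alpha * cartoon + (1 - alpha) * face, per channel.
    def blend_face(face, cartoon, alpha=0.6):
        """Per-pixel alpha blend of two equal-size RGB images given as
        nested lists; returns a new nested list of rounded ints."""
        return [
            [
                [int(round(alpha * c + (1 - alpha) * f))
                 for f, c in zip(face_px, cartoon_px)]
                for face_px, cartoon_px in zip(face_row, cartoon_row)
            ]
            for face_row, cartoon_row in zip(face, cartoon)
        ]
    ```

    In a real system the same blend would typically be applied only inside the detected face rectangle of each camera frame.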

    Chapter 1 Introduction
      1.1 Research Motivation and Purpose
      1.2 Related Work
      1.3 Thesis Organization
    Chapter 2 Introduction to Deep Learning
      2.1 Deep Learning
      2.2 Convolutional Neural Networks
      2.3 Pooling Layers
      2.4 Deep Learning Classifiers
    Chapter 3 Network Architectures and Fusion of Deep Models
      3.1 VGG16
      3.2 ResNet50
      3.3 SeNet50
      3.4 Network Architecture of the Fused Deep Models
      3.5 Method for Fusing the Deep Model Networks
    Chapter 4 Experimental Results and Discussion
      4.1 Software and Hardware Environment
      4.2 Experimental Datasets
        4.2.1 VGG-FACE Dataset
        4.2.2 VGG-FACE 2 Dataset
        4.2.3 Fer2013 Dataset
      4.3 Experimental Preprocessing
        4.3.1 Data Preprocessing
        4.3.2 Data Augmentation
        4.3.3 Transfer Learning
      4.4 Training and Testing of the Facial Emotion Recognition Models
        4.4.1 Single-Model Results
        4.4.2 Fused-Model Results
        4.4.3 Confusion Matrix
      4.5 Applications of the Facial Emotion Recognition System
        4.5.1 Real-Time Facial Emotion Recognition System
        4.5.2 Facial Emotion Recognition Combined with a Cartoon Face-Changing Application
    Chapter 5 Conclusions and Future Prospects
      5.1 Conclusions
      5.2 Future Prospects
    References


    Full text public date: 2026/05/05 (Intranet)
    Full text public date: 2026/05/05 (Internet)
    Full text public date: 2026/05/05 (National library)