
Graduate Student: Hsuan-Er Lin (林軒而)
Thesis Title: Combining Multiple Deep Models for Facial Emotion Recognition and Applications (結合多個深度模型的臉部情緒辨識系統與應用)
Advisor: Shi-Jinn Horng (洪西進)
Committee Members: Shi-Jinn Horng (洪西進), Chin-Shyurng Fahn (范欽雄), Cheng-Chi Lee (李正吉), Cheng-An Yen (顏成安)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Publication Year: 2021
Graduation Academic Year: 109
Language: Chinese
Pages: 46
Keywords: CNN, VGG16, ResNet50, SeNet50, Transfer Learning, Facial Emotion Recognition, Fer2013

We propose combining three different convolutional neural networks and fusing their individual classification results to determine the facial emotion. These three networks, VGG16, ResNet50, and SeNet50, have all performed well in image recognition competitions in recent years. We apply the three architectures with transfer learning to the facial emotion recognition task and combine them by voting: the class with the most votes wins and represents the classification of the combined networks. If the vote cannot decide the emotion class, the predicted probabilities of each class are averaged across the three networks, yielding seven averaged values, and the largest of these determines the final emotion class.
Our experiments achieve the best known result on the Facial Expression Recognition 2013 Challenge (Fer2013) dataset: 76.17% accuracy on the Fer2013 Private Test, exceeding the best previously reported method by 0.75%.
We also implement a real-time facial emotion recognition system. To balance real-time performance against recognition accuracy, the system uses a single ResNet50 network, which achieves the best known single-model accuracy on the Fer2013 Private Test, 74.33%. Finally, we add an entertaining application: after the model predicts the facial emotion, a matching cartoon emotion image is blended with the face captured by the camera.


We propose to combine three different convolutional neural
networks (CNNs) and fuse their classification results to determine
the facial emotions. These three networks, VGG16, ResNet50, and
SeNet50, have achieved excellent results in image recognition
competitions in the past few years. We apply the three architectures
with transfer learning to the facial emotion recognition task, and we
combine the three networks by voting: the category with the highest
number of votes wins and represents the classification of the combined
networks. If the vote cannot determine the emotion category, we average
the probability values of each category across the three models, and
the category with the largest averaged value is taken as the final
prediction.
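The voting-plus-averaging fusion described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the thesis's actual code: the function name and the assumption that each model emits a seven-way probability vector (one entry per Fer2013 emotion class) are mine.

```python
import numpy as np

def fuse_predictions(probs_list):
    """Fuse per-model class probabilities by majority vote,
    falling back to probability averaging on a tie.

    probs_list: list of 1-D arrays, one per model, each of length 7
    (the seven Fer2013 emotion classes).
    """
    # Each model votes for its most probable class.
    votes = [int(np.argmax(p)) for p in probs_list]
    counts = np.bincount(votes, minlength=len(probs_list[0]))
    top = counts.max()
    winners = np.flatnonzero(counts == top)
    if len(winners) == 1:
        # A single class received the most votes: it wins outright.
        return int(winners[0])
    # Tie (e.g. all three models pick different classes): average the
    # per-class probabilities across models and take the largest mean.
    mean_probs = np.mean(probs_list, axis=0)
    return int(np.argmax(mean_probs))
```

With three models, the tie branch fires exactly when all three disagree; any two-model agreement already yields a unique majority class.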
Our experiments on the Facial Expression Recognition (FER) 2013
Challenge dataset demonstrate that our approach achieves state-of-the-
art results. With a top accuracy of 76.17% on the Fer2013 private test,
we surpass the previous state-of-the-art method by 0.75%.
We also implement a real-time facial emotion recognition system.
To balance real-time recognition against recognition accuracy, the
system uses a single ResNet50 network, which is the best known single
model on the Fer2013 private test with an accuracy of 74.33%. Finally,
we add an entertaining application: when the model predicts a facial
emotion, a corresponding cartoon emotion image is blended with the face
captured by the camera, making the system more engaging.
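The cartoon-blending step can be sketched as a simple alpha blend over the detected face bounding box. This is a hypothetical NumPy illustration, not the system's actual implementation; in practice the cartoon image would first be resized to the box (e.g. with cv2.resize), and the function and parameter names here are mine.

```python
import numpy as np

def overlay_cartoon(frame, cartoon, box, alpha=0.6):
    """Blend a cartoon emotion image onto the detected face region.

    frame:   H x W x 3 uint8 camera frame
    cartoon: h x w x 3 uint8 cartoon image for the predicted emotion,
             assumed already resized to the box (hypothetical shortcut)
    box:     (x, y, w, h) face bounding box from the detector
    alpha:   cartoon opacity in [0, 1]
    """
    x, y, w, h = box
    # Cut out the face region and blend it with the cartoon in float,
    # so intermediate values are not clipped by uint8 arithmetic.
    roi = frame[y:y + h, x:x + w].astype(np.float32)
    blended = alpha * cartoon.astype(np.float32) + (1.0 - alpha) * roi
    # Write the blend back into a copy, leaving the input frame intact.
    out = frame.copy()
    out[y:y + h, x:x + w] = blended.astype(np.uint8)
    return out
```

Run per camera frame after face detection and emotion prediction; alpha near 1 shows mostly the cartoon, alpha near 0 mostly the face.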

Chapter 1  Introduction
  1.1 Motivation and Objectives
  1.2 Related Work
  1.3 Thesis Organization
Chapter 2  Introduction to Deep Learning
  2.1 Deep Learning
  2.2 Convolutional Neural Networks
  2.3 Pooling Layers
  2.4 Deep Learning Classifiers
Chapter 3  Network Architectures and Fusion of the Deep Models
  3.1 VGG16
  3.2 ResNet50
  3.3 SeNet50
  3.4 Architecture of the Fused Deep Models
  3.5 Method for Fusing the Deep Models
Chapter 4  Experimental Results and Discussion
  4.1 Software and Hardware Environment
  4.2 Datasets
    4.2.1 VGG-FACE Dataset
    4.2.2 VGG-FACE 2 Dataset
    4.2.3 Fer2013 Dataset
  4.3 Preprocessing
    4.3.1 Data Preprocessing
    4.3.2 Data Augmentation
    4.3.3 Transfer Learning
  4.4 Training and Testing of the Facial Emotion Recognition Models
    4.4.1 Single-Model Results
    4.4.2 Fused-Model Results
    4.4.3 Confusion Matrix
  4.5 Applications of the Facial Emotion Recognition System
    4.5.1 Real-Time Facial Emotion Recognition System
    4.5.2 Facial Emotion Recognition with Cartoon Face-Swap Application
Chapter 5  Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
References


Full-text release date: 2026/05/05 (campus network, off-campus network, and National Central Library: Taiwan NDLTD system)