
Graduate Student: 謝宗廷 (Chung-Ting Hsieh)
Thesis Title: 基於全局特徵與局部特徵之深度學習臉部表情辨識系統
Facial Expression Recognition with Deep Learning Based on Global and Local Features
Advisor: 林昌鴻 (Chang-Hong Lin)
Committee Members: 吳晉賢 (Chin-Hsien Wu), 沈中安 (Chung-An Shen), 陳永耀 (Yung-Yao Chen)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of Publication: 2021
Academic Year of Graduation: 109 (ROC calendar)
Language: English
Number of Pages: 79
Chinese Keywords: 人臉表情辨識、深度學習、卷積神經網路、全局特徵、局部特徵、集成式網路
English Keywords: Facial Expression Recognition, Deep Learning, Convolutional Neural Network, Global Feature, Local Feature, Ensemble Network
Facial expression recognition has long been a popular research field, and it has been applied in daily life, for example in surveillance systems, human-computer interaction, and psychological assessment. With the rapid development of deep learning in recent years, expression recognition has moved further toward handling real-world and more complex scenes. Although deep learning methods have greatly improved accuracy over traditional machine learning methods in expression recognition, there is still much room for improvement when they are applied to varied real-world scenes.
This thesis proposes a novel convolutional neural network architecture for expression recognition. In addition to the global features extracted from the whole face, the facial parts most important to expression recognition, such as the eyes and the mouth, are extracted as local features and learned separately. In real-world scenes where part of the face is occluded, the textures captured by the local features help the network predict the correct expression. Moreover, the local features are carved out of the global features at the middle layers of the network, which saves a large amount of computation. The overall architecture can be viewed as an ensemble network in which the global features and the partitioned local features are each fed into a dedicated sub-network for training. To verify the effectiveness of the method, we tested on the RAF dataset, which is collected from real-life images covering varied lighting and occlusion conditions. Our method achieves an accuracy of 87.25% and outperforms other state-of-the-art methods, showing that it is better suited to real-life scenarios.


Facial expression recognition has been a very popular research topic and is often applied in daily life, such as in surveillance systems, human-computer interaction, and mental health assessment. With the rapid development of deep learning in recent years, facial expression recognition has become able to address more difficult and practical problems. Even though deep learning methods outperform traditional machine learning methods on the emotion recognition task, there is still room for improvement when they are applied to real-world scenarios.
This thesis proposes a novel convolutional neural network for the emotion recognition task. Besides the global features from the whole face, local features extracted from facial parts are crucial to emotion recognition, so features of the eyes and the mouth are also used in training the network. This is useful in real-life scenarios where portions of the face may be occluded: local features that capture the textures of the visible regions can help the network make the right prediction. In addition, the local features are extracted from the global features at the middle layers of the network to save computational resources. The entire architecture can be seen as an ensemble network, in which the global and local features are fed into separate specialized sub-networks for training. To verify the effectiveness of the proposed method, the RAF dataset was used; it is collected from real-life images with a variety of lighting conditions and occlusions. The proposed method achieves an accuracy of 87.25% and performs better than state-of-the-art methods, which shows that it is more suitable for real-life cases.

摘要 (Abstract in Chinese) I
ABSTRACT II
致謝 (Acknowledgements) III
LIST OF CONTENTS IV
LIST OF FIGURES VI
LIST OF TABLES VIII
CHAPTER 1 INTRODUCTIONS 1
1.1 Motivation 1
1.2 Contribution 3
1.3 Thesis Organization 4
CHAPTER 2 RELATED WORKS 5
2.1 Traditional methods for facial expression recognition 6
2.2 Deep learning methods for facial expression recognition 7
CHAPTER 3 PROPOSED METHODS 9
3.1 Data preprocessing 11
3.1.1 Face detection 12
3.1.2 Face alignment 13
3.2 Data augmentation 17
3.2.1 Random flip 18
3.2.2 Random blackout 19
3.2.3 Gaussian noise 22
3.3 Network Architecture 24
3.3.1 Overview of the Architecture 25
3.3.2 Global feature network 27
3.3.2.1 Residual learning 29
3.3.3 Local feature network 30
3.3.3.1 Region decomposition 31
3.3.3.2 Local feature network architecture 38
3.4 Training Details 41
3.4.1 Initialization 42
3.4.2 Loss function 43
3.4.3 Optimizer 44
CHAPTER 4 EXPERIMENTAL RESULTS 46
4.1 Experimental Environment 46
4.2 Dataset 47
4.2.1 RAF dataset 48
4.3 Performance Evaluation 51
4.3.1 Evaluation metrics 51
4.3.2 Performance Evaluation 54
4.3.3 System evaluation performed 61
CHAPTER 5 CONCLUSIONS AND FUTURE WORKS 62
5.1 Conclusions 62
5.2 Future Works 63
REFERENCES 64
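Section 3.2 of the outline names three data augmentations: random flip, random blackout, and Gaussian noise. A minimal sketch of what each could look like, assuming images as nested lists of floats in [0, 1]; all probabilities and patch sizes here are illustrative assumptions, not the thesis's settings:

```python
# Hypothetical sketch of the three augmentations the outline names.
# Images are H x W nested lists of floats in [0, 1]; the parameter
# values are illustrative assumptions, not the thesis's settings.
import random

def random_flip(img, p=0.5):
    """Mirror each row (horizontal flip) with probability p."""
    return [row[::-1] for row in img] if random.random() < p else img

def random_blackout(img, size=2):
    """Zero out a size x size patch at a random location (random erasing)."""
    h, w = len(img), len(img[0])
    top = random.randrange(h - size + 1)
    left = random.randrange(w - size + 1)
    out = [row[:] for row in img]  # copy, leave the input untouched
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = 0.0
    return out

def gaussian_noise(img, sigma=0.05):
    """Add zero-mean Gaussian noise, clamped back into [0, 1]."""
    return [[min(1.0, max(0.0, v + random.gauss(0.0, sigma)))
             for v in row] for row in img]
```

Random blackout in particular matches the occlusion robustness the abstract emphasizes: training with artificially masked patches encourages the local branches to compensate for hidden facial regions.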


Full text available from 2024/08/27 (campus network)
Full text not authorized for public release (off-campus network)
Full text not authorized for public release (National Central Library: Taiwan NDLTD system)