
Graduate student: 許景甯 (Ching-Ning Hsu)
Thesis title: Two-stage convolutional neural network model in facial emotion recognition (臉部情緒識別之二階深度學習模型)
Advisor: 王孔政 (Kung-Jeng Wang)
Oral defense committee: 羅明琇 (Ming-Hsiu Lo), 蔣明晃 (Ming-Huang Chiang)
Degree: Master
Department: College of Management - Department of Industrial Management
Year of publication: 2021
Academic year of graduation: 109 (2020-2021)
Language: English
Number of pages: 57
Chinese keywords: 卷積神經網路 (convolutional neural network), 臉部情緒識別 (facial emotion recognition), 哈欠偵測 (yawning detection)
Foreign-language keywords: Convolutional neural network, facial emotion recognition, yawning detection

Facial emotion recognition (FER) reveals a person's intentions, feelings, and cognitive states, and has broad application prospects in the service industry. Emotion recognition is generally performed with image processing methods, which suffer from low recognition accuracy and an inability to extract facial features in real time. In addition, blank or blurry images in the training dataset can degrade recognition accuracy. To address these problems, this study proposes a deep learning-based facial recognition model to achieve real-time facial emotion recognition. The proposed model consists of two stages: the first stage uses a multi-task cascaded convolutional neural network (MTCNN) to locate precise face bounding boxes, and the second stage uses the end-to-end YOLOv3 (You Only Look Once v3) neural network to recognize facial emotions and features in real time. Trained on the RAF-DB, CK+, and FER2013 datasets, the model reaches accuracies of 86.1%, 95.6%, and 69%, respectively. As a further case study on driver fatigue detection, the model achieves 82.3% accuracy when tested on the YawDD dataset. In addition, in a test using the AFEW dataset to simulate a real environment, the model reaches 62.5% accuracy.


Facial emotion recognition (FER) has potential in the service industry by indicating human intentions, feelings, and cognitive states. Conventional emotion recognition strategies supported by image processing technology suffer from low recognition accuracy and difficulty in extracting facial features in real time. In addition, recognition accuracy can be degraded by blank and/or blurry frames in a video. To solve these problems, a deep learning-based methodology is proposed in this study to achieve efficient and effective FER. The proposed method consists of two stages: the first stage uses a multi-task cascaded convolutional neural network (MTCNN) to locate precise face bounding boxes, and the second stage adopts an end-to-end YOLOv3 deep neural network to recognize emotions and features in real time. Trained on the RAF-DB, CK+, and FER2013 datasets, the model reaches accuracies of 86.1%, 95.6%, and 69%, respectively. Further analysis shows that the model detects driving fatigue with an accuracy of 82.3% on the YawDD dataset. In a test using the AFEW dataset to simulate a real environment, the accuracy of the model reaches 62.5%. The proposed FER model can facilitate the identification of customers' intentions and reactions in the service process.
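
As a concrete illustration of the two-stage pipeline described above, the following is a minimal sketch in Python. It assumes the pretrained MTCNN from the facenet-pytorch package as the stage-1 face detector and a YOLOv3 network already fine-tuned for emotion classes, loaded through OpenCV's DNN module; the emotion class list, the confidence threshold, and the emotion_yolov3.cfg / emotion_yolov3.weights file names are illustrative placeholders, not artifacts of the thesis.

```python
# Minimal two-stage FER sketch: MTCNN locates faces (stage 1), then a
# YOLOv3 network assigns an emotion to each face crop (stage 2).
# Assumptions: facenet-pytorch's pretrained MTCNN stands in for stage 1;
# "emotion_yolov3.cfg"/"emotion_yolov3.weights" are hypothetical files for
# a YOLOv3 model fine-tuned on the seven basic emotion classes.
import cv2
import numpy as np
from facenet_pytorch import MTCNN  # pip install facenet-pytorch

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

mtcnn = MTCNN(keep_all=True)  # stage 1: face bounding-box detector
net = cv2.dnn.readNetFromDarknet("emotion_yolov3.cfg",
                                 "emotion_yolov3.weights")  # stage 2

def detect_emotions(frame_bgr, conf_threshold=0.5):
    """Return (box, emotion, score) tuples for one BGR video frame."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    boxes, _ = mtcnn.detect(rgb)  # stage 1: face bounding boxes
    results = []
    if boxes is None:  # e.g. a blank or blurry frame with no detectable face
        return results
    for x1, y1, x2, y2 in boxes.astype(int):
        crop = frame_bgr[max(y1, 0):y2, max(x1, 0):x2]
        if crop.size == 0:
            continue
        blob = cv2.dnn.blobFromImage(crop, 1 / 255.0, (416, 416),
                                     swapRB=True, crop=False)
        net.setInput(blob)  # stage 2: run YOLOv3 on the face crop
        for out in net.forward(net.getUnconnectedOutLayersNames()):
            for det in out:  # row: cx, cy, w, h, objectness, class scores...
                scores = det[5:]
                cls = int(np.argmax(scores))
                if scores[cls] > conf_threshold:
                    results.append(((x1, y1, x2, y2),
                                    EMOTIONS[cls], float(scores[cls])))
    return results
```

Skipping frames for which stage 1 returns no bounding box is what filters out the blank or blurry frames mentioned in the abstract before they can affect the stage-2 classifier.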

Table of Contents

Abstract
摘要 (Chinese Abstract)
誌謝 (Acknowledgements)
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Literature survey
  2.1 Facial expressions and emotion
  2.2 Deep learning-based FER
  2.3 Multi-task cascaded convolutional neural network (MTCNN)
  2.4 YOLO
  2.5 Yawning judgment
  2.6 Summary
Chapter 3. Method
  3.1 Research framework
  3.2 Facial expression databases
    3.2.1 FER2013
    3.2.2 Real-world affective faces (RAF)
    3.2.3 The extended Cohn-Kanade database (CK+)
    3.2.4 Mixed dataset
  3.3 Performance indicator
  3.4 MTCNN for pre-processing
  3.5 YOLOv3 for detecting emotions
Chapter 4. Experiment and discussion
  4.1 Experiment setup
  4.2 MTCNN evaluation
  4.3 The proposed FER model evaluation
Chapter 5. Case study and test
  5.1 Yawning detection
    5.1.1 YawDD dataset
  5.2 AFEW database detection
    5.2.1 AFEW database
    5.2.2 Testing results
  5.3 Applications to service quality cases
Chapter 6. Conclusions
References

List of Figures

Figure 1. FER pipeline
Figure 2. Multi-task cascaded convolutional neural network
Figure 3. YOLOv3
Figure 4. Yawning detection flow
Figure 5. Framework of the proposed two-stage FER model
Figure 6. The proposed two-stage FER model
Figure 7. The sample images
Figure 8. The output format from the MTCNN model
Figure 10. MTCNN face detection procedure
Figure 11. Training performance of YOLOv3 model for emotion detection
Figure 12. Flowchart of facial expression detection algorithm
Figure 13. The procedure of yawning detection by the proposed two-stage FER model
Figure 14. Three different states of the YawDD dataset (Abtahi et al., 2014)
Figure 15. Samples of facial expression data in AFEW
Figure 16. Normalized confusion matrix
Figure 17. Example of video detection result
Figure 18. Example of video detection result

List of Tables

Table 1. The feature extraction network of YOLOv3
Table 2. Image defect types of dataset
Table 3. Confusion matrix for evaluation
Table 4. YOLOv3 hyperparameters
Table 5. Performance result by fine-tuning parameters in training
Table 6. The specification of the computational platform for FER model
Table 7. The distribution of emotion classes
Table 8. The distribution of emotion classes
Table 9. The training results by the proposed two-stage FER method in different datasets
Table 10. Comparison for different emotions on AP
Table 11. Accuracy and recall of different methods on yawn detection
Table 12. Precision and recall of different methods on AFEW dataset
Table 13. Customer emotion detection by different methods
Table 14. Server emotion detection by different methods

References

Abtahi, S., Omidyeganeh, M., Shirmohammadi, S., & Hariri, B. (2014, March). YawDD: A yawning detection dataset. In Proceedings of the 5th ACM Multimedia Systems Conference (pp. 24-28).
Bi, X., Chen, Z., & Yue, J. (2020, December). A novel one-step method based on YOLOv3-tiny for fatigue driving detection. In 2020 IEEE 6th International Conference on Computer and Communications (ICCC) (pp. 1241-1245). IEEE.
Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.
Chen, E., Gong, Y., & Tie, Y. (Eds.). (2016). Advances in Multimedia Information Processing - PCM 2016: 17th Pacific-Rim Conference on Multimedia, Xi'an, China, September 15-16, 2016, Proceedings, Part I (Vol. 9916). Springer.
Chen, R. C. (2019). Automatic license plate recognition via sliding-window darknet-YOLO deep learning. Image and Vision Computing, 87, 47-56.
Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., & Taylor, J. G. (2001). Emotion recognition in human-computer interaction. IEEE Signal Processing Magazine, 18(1), 32-80.
Dhall, A., Goecke, R., Lucey, S., & Gedeon, T. (2012). Collecting large, richly annotated facial-expression databases from movies. IEEE MultiMedia, 19(3), 34-41.
Ekman, P., & Friesen, W. V. (1971). Constants across cultures in the face and emotion. Journal of personality and social psychology, 17(2), 124.
Ekman, P., & Friesen, W. V. (2003). Unmasking the face: A guide to recognizing emotions from facial clues (Vol. 10). Ishk.
Ghofrani, A., Toroghi, R. M., & Ghanbari, S. (2019). Realtime face-detection and emotion recognition using MTCNN and MiniShuffleNet v2. In 2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI) (pp. 817-821). IEEE.
Giannopoulos, P., Perikos, I., & Hatzilygeroudis, I. (2018). Deep learning approaches for facial emotion recognition: A case study on FER-2013. In Advances in hybridization of intelligent methods (pp. 1-16). Springer, Cham.
González-Rodríguez, M. R., Díaz-Fernández, M. C., & Gómez, C. P. (2020). Facial-expression recognition: an emergent approach to the measurement of tourist satisfaction through emotions. Telematics and Informatics, 51, 101404.
Goodfellow, I. J., Erhan, D., Carrier, P. L., Courville, A., Mirza, M., Hamner, B., ... & Bengio, Y. (2013, November). Challenges in representation learning: A report on three machine learning contests. In International conference on neural information processing (pp. 117-124). Springer, Berlin, Heidelberg.
Hamm, J., Kohler, C. G., Gur, R. C., & Verma, R. (2011). Automated facial action coding system for dynamic analysis of facial expressions in neuropsychiatric disorders. Journal of neuroscience methods, 200(2), 237-256.
Huang, R., Pedoeem, J., & Chen, C. (2018, December). YOLO-LITE: a real-time object detection algorithm optimized for non-GPU computers. In 2018 IEEE International Conference on Big Data (Big Data) (pp. 2503-2510). IEEE.
Indira, D. N. V. S. L. S., Sumalatha, L., & Markapudi, B. R. (2021, February). Multi Facial Expression Recognition (MFER) for Identifying Customer Satisfaction on Products using Deep CNN and Haar Cascade Classifier. In IOP Conference Series: Materials Science and Engineering (Vol. 1074, No. 1, p. 012033). IOP Publishing.
Kanade, T., Cohn, J. F., & Tian, Y. (2000, March). Comprehensive database for facial expression analysis. In Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580) (pp. 46-53). IEEE.
Kang, H. B. (2013). Various approaches for driver and driving behavior monitoring: A review. In Proceedings of the IEEE International Conference on Computer Vision Workshops (pp. 616-623).
Kim, B. K., Roh, J., Dong, S. Y., & Lee, S. Y. (2016). Hierarchical committee of deep convolutional neural networks for robust facial expression recognition. Journal on Multimodal User Interfaces, 10(2), 173-189.
Ko, B. C. (2018). A brief review of facial emotion recognition based on visual information. Sensors, 18(2), 401.
Landowska, A., & Miler, J. (2016, September). Limitations of emotion recognition in software user experience evaluation context. In 2016 Federated Conference on Computer Science and Information Systems (FedCSIS) (pp. 1631-1640). IEEE.
Laroca, R., Severo, E., Zanlorensi, L. A., Oliveira, L. S., Gonçalves, G. R., Schwartz, W. R., & Menotti, D. (2018, July). A robust real-time automatic license plate recognition based on the YOLO detector. In 2018 International Joint Conference on Neural Networks (IJCNN) (pp. 1-10). IEEE.
Lei, J., Han, Q., Chen, L., Lai, Z., Zeng, L., & Liu, X. (2017). A novel side face contour extraction algorithm for driving fatigue statue recognition. IEEE Access, 5, 5723-5730.
Li, S., Deng, W., & Du, J. (2017). Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2852-2861).
Lien, J. J., Kanade, T., Cohn, J. F., & Li, C. C. (1998, April). Automated facial expression recognition based on FACS action units. In Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition (pp. 390-395). IEEE.
Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010, June). The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops (pp. 94-101). IEEE.
Luh, G. C., Wu, H. B., Yong, Y. T., Lai, Y. J., & Chen, Y. H. (2019, July). Facial Expression Based Emotion Recognition Employing YOLOv3 Deep Neural Networks. In 2019 International Conference on Machine Learning and Cybernetics (ICMLC) (pp. 1-7). IEEE.
Mehrabian, A. (2017). Nonverbal communication. Routledge.
Noordewier, M. K., Topolinski, S., & Van Dijk, E. (2016). The temporal dynamics of surprise. Social and Personality Psychology Compass, 10(3), 136-149.
Pang, L., Liu, H., Chen, Y., & Miao, J. (2020). Real-time concealed object detection from passive millimeter wave images based on the YOLOv3 algorithm. Sensors, 20(6), 1678.
Poria, S., Cambria, E., Bajpai, R., & Hussain, A. (2017). A review of affective computing: From unimodal analysis to multimodal fusion. Information Fusion, 37, 98-125.
Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.
Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 815-823).
Söderlund, M., & Rosengren, S. (2008). Revisiting the smiling service worker and customer satisfaction. International Journal of Service Industry Management.
Su, C., & Wang, G. (2020, October). Design and application of learner emotion recognition for classroom. In Journal of Physics: Conference Series (Vol. 1651, No. 1, p. 012158). IOP Publishing.
Szeliski, R. (2010). Computer vision: algorithms and applications. Springer Science & Business Media.
Tacconi, D., Mayora, O., Lukowicz, P., Arnrich, B., Setz, C., Troster, G., & Haring, C. (2008, January). Activity and emotion recognition to support early diagnosis of psychiatric diseases. In 2008 Second International Conference on Pervasive Computing Technologies for Healthcare (pp. 100-102). IEEE.
Tian, Y. L., Kanade, T., & Cohn, J. F. (2005). Facial expression analysis. In Handbook of face recognition (pp. 247-275). Springer, New York, NY.
Tian, Y. L., Kanade, T., & Cohn, J. F. (2001). Recognizing action units for facial expression analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2), 97-115.
Viola, P., & Jones, M. (2001, December). Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001 (Vol. 1, pp. I-I). IEEE.
Xie, Y., Chen, K., & Murphey, Y. L. (2018, November). Real-time and robust driver yawning detection with deep neural networks. In 2018 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 532-538). IEEE.
Yang, B., Cao, J., Ni, R., & Zhang, Y. (2017). Facial expression recognition using weighted mixture deep neural network based on double-channel facial images. IEEE Access, 6, 4630-4640.
Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10), 1499-1503.
Zhang, W., & Su, J. (2017, November). Driver yawning detection based on long short term memory networks. In 2017 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1-5). IEEE.
Ziyana, C., Jianming, W., & Guanghao, J. (2019). Driver state detection framework based on temporal facial action information. Application Research of Computers, 36(11).

Full text available from: 2024/06/23 (campus network)
Full text available from: 2024/06/23 (off-campus network)
Full text available from: 2024/06/23 (National Central Library: Taiwan NDLTD system)