
Graduate student: 許景甯 (Ching-Ning Hsu)
Thesis title: Two-stage convolutional neural network model in facial emotion recognition (臉部情緒識別之二階深度學習模型)
Advisor: 王孔政 (Kung-Jeng Wang)
Oral defense committee: 羅明琇 (Ming-Hsiu Lo), 蔣明晃 (Ming-Huang Chiang)
Degree: Master
Department: College of Management - Department of Industrial Management
Year of publication: 2021
Academic year of graduation: 109 (2020-2021)
Language: English
Number of pages: 57
Chinese keywords: 卷積神經網路 (convolutional neural network), 臉部情緒識別 (facial emotion recognition), 哈欠偵測 (yawning detection)
Foreign-language keywords: Convolutional neural network, facial emotion recognition, yawning detection

Facial emotion recognition (FER) reveals a person's intentions, feelings, and cognitive states, and has broad application prospects in the service industry. Emotion recognition is generally performed with image processing methods, which suffer from low recognition accuracy and an inability to extract facial features in real time. In addition, blank or blurry images in the training dataset can degrade recognition accuracy. To address these problems, this study proposes a deep learning-based facial recognition model to achieve real-time facial emotion recognition. The proposed model consists of two stages: the first stage uses a multi-task cascaded convolutional neural network (MTCNN) to locate precise face bounding boxes, and the second stage uses the end-to-end YOLOv3 (You Only Look Once v3) neural network to recognize facial emotions and features in real time. Trained on the RAF-DB, CK+, and FER2013 datasets, the model reaches accuracies of 86.1%, 95.6%, and 69%, respectively. As a further case study on driver fatigue detection, the model achieves 82.3% accuracy when tested on the YawDD dataset. In addition, in a test using the AFEW dataset to simulate a real environment, the model reaches 62.5% accuracy.


Facial emotion recognition (FER) has potential in the service industry by indicating human intentions, feelings, and cognitive states. Conventional emotion recognition strategies supported by image processing technology suffer from low recognition accuracy and difficulty in extracting facial features in real time. In addition, recognition accuracy can be degraded by blank and/or blurry frames in a video. To solve these problems, a deep learning-based methodology is proposed in this study to achieve efficient and effective FER. The proposed method consists of two stages: the first stage uses a multi-task cascaded convolutional neural network (MTCNN) to locate precise face bounding boxes, and the second stage adopts an end-to-end YOLOv3 deep neural network to recognize emotions and features in real time. Trained on the RAF-DB, CK+, and FER2013 datasets, the model reaches accuracies of 86.1%, 95.6%, and 69%, respectively. Further analysis shows that the model detects driving fatigue with an accuracy of 82.3% on the YawDD dataset. In a test using the AFEW dataset to simulate a real environment, the accuracy of the model reaches 62.5%. The proposed FER model can facilitate the identification of customers' intentions and reactions in the service process.
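
As a concrete illustration of the two-stage pipeline described above, the following is a minimal sketch in Python. It assumes the pretrained MTCNN from the facenet-pytorch package as the stage-1 face detector and a YOLOv3 network already fine-tuned for emotion classes, loaded through OpenCV's DNN module; the emotion class list, the confidence threshold, and the emotion_yolov3.cfg / emotion_yolov3.weights file names are illustrative placeholders, not artifacts of the thesis.

```python
# Minimal two-stage FER sketch: MTCNN locates faces (stage 1), then a
# YOLOv3 network assigns an emotion to each face crop (stage 2).
# Assumptions: facenet-pytorch's pretrained MTCNN stands in for stage 1;
# "emotion_yolov3.cfg"/"emotion_yolov3.weights" are hypothetical files for
# a YOLOv3 model fine-tuned on the seven basic emotion classes.
import cv2
import numpy as np
from facenet_pytorch import MTCNN  # pip install facenet-pytorch

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

mtcnn = MTCNN(keep_all=True)  # stage 1: face bounding-box detector
net = cv2.dnn.readNetFromDarknet("emotion_yolov3.cfg",
                                 "emotion_yolov3.weights")  # stage 2

def detect_emotions(frame_bgr, conf_threshold=0.5):
    """Return (box, emotion, score) tuples for one BGR video frame."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    boxes, _ = mtcnn.detect(rgb)  # stage 1: face bounding boxes
    results = []
    if boxes is None:  # e.g. a blank or blurry frame with no detectable face
        return results
    for x1, y1, x2, y2 in boxes.astype(int):
        crop = frame_bgr[max(y1, 0):y2, max(x1, 0):x2]
        if crop.size == 0:
            continue
        blob = cv2.dnn.blobFromImage(crop, 1 / 255.0, (416, 416),
                                     swapRB=True, crop=False)
        net.setInput(blob)  # stage 2: run YOLOv3 on the face crop
        for out in net.forward(net.getUnconnectedOutLayersNames()):
            for det in out:  # row: cx, cy, w, h, objectness, class scores...
                scores = det[5:]
                cls = int(np.argmax(scores))
                if scores[cls] > conf_threshold:
                    results.append(((x1, y1, x2, y2),
                                    EMOTIONS[cls], float(scores[cls])))
    return results
```

Skipping frames for which stage 1 returns no bounding box is what filters out the blank or blurry frames mentioned in the abstract before they can affect the stage-2 classifier.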

Table of Contents

Abstract
摘要 (Chinese Abstract)
誌謝 (Acknowledgements)
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Literature survey
  2.1 Facial expressions and emotion
  2.2 Deep learning-based FER
  2.3 Multi-task cascaded convolutional neural network (MTCNN)
  2.4 YOLO
  2.5 Yawning judgment
  2.6 Summary
Chapter 3. Method
  3.1 Research framework
  3.2 Facial expression databases
    3.2.1 FER2013
    3.2.2 Real-world affective faces (RAF)
    3.2.3 The extended Cohn-Kanade database (CK+)
    3.2.4 Mixed dataset
  3.3 Performance indicator
  3.4 MTCNN for pre-processing
  3.5 YOLOv3 for detecting emotions
Chapter 4. Experiment and discussion
  4.1 Experiment setup
  4.2 MTCNN evaluation
  4.3 The proposed FER model evaluation
Chapter 5. Case study and test
  5.1 Yawning detection
    5.1.1 YawDD dataset
  5.2 AFEW database detection
    5.2.1 AFEW database
    5.2.2 Testing results
  5.3 Applications to service quality cases
Chapter 6. Conclusions
References

List of Figures

Figure 1. FER pipeline
Figure 2. Multi-task cascaded convolutional neural network
Figure 3. YOLOv3
Figure 4. Yawning detection flow
Figure 5. Framework of the proposed two-stage FER model
Figure 6. The proposed two-stage FER model
Figure 7. The sample images
Figure 8. The output format from the MTCNN model
Figure 10. MTCNN face detection procedure
Figure 11. Training performance of YOLOv3 model for emotion detection
Figure 12. Flowchart of facial expression detection algorithm
Figure 13. The procedure of yawning detection by the proposed two-stage FER model
Figure 14. Three different states of the YawDD dataset (Abtahi et al., 2014)
Figure 15. Samples of facial expression data in AFEW
Figure 16. Normalized confusion matrix
Figure 17. Example of video detection result
Figure 18. Example of video detection result

List of Tables

Table 1. The feature extraction network of YOLOv3
Table 2. Image defect types of dataset
Table 3. Confusion matrix for evaluation
Table 4. YOLOv3 hyperparameters
Table 5. Performance result by fine-tuning parameters in training
Table 6. The specification of the computational platform for FER model
Table 7. The distribution of emotion classes
Table 8. The distribution of emotion classes
Table 9. The training results by the proposed two-stage FER method in different datasets
Table 10. Comparison for different emotions on AP
Table 11. Accuracy and recall of different methods on yawn detection
Table 12. Precision and recall of different methods on AFEW dataset
Table 13. Customer emotion detection by different methods
Table 14. Server emotion detection by different methods

References

Abtahi, S., Omidyeganeh, M., Shirmohammadi, S., & Hariri, B. (2014, March). YawDD: A yawning detection dataset. In Proceedings of the 5th ACM Multimedia Systems Conference (pp. 24-28).
Bi, X., Chen, Z., & Yue, J. (2020, December). A novel one-step method based on YOLOv3-tiny for fatigue driving detection. In 2020 IEEE 6th International Conference on Computer and Communications (ICCC) (pp. 1241-1245). IEEE.
Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.
Chen, E., Gong, Y., & Tie, Y. (Eds.). (2016). Advances in Multimedia Information Processing - PCM 2016: 17th Pacific-Rim Conference on Multimedia, Xi'an, China, September 15-16, 2016, Proceedings, Part I (Vol. 9916). Springer.
Chen, R. C. (2019). Automatic license plate recognition via sliding-window darknet-YOLO deep learning. Image and Vision Computing, 87, 47-56.
Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., & Taylor, J. G. (2001). Emotion recognition in human-computer interaction. IEEE Signal Processing Magazine, 18(1), 32-80.
Dhall, A., Goecke, R., Lucey, S., & Gedeon, T. (2012). Collecting large, richly annotated facial-expression databases from movies. IEEE MultiMedia, 19(3), 34-41.
Ekman, P., & Friesen, W. V. (1971). Constants across cultures in the face and emotion. Journal of personality and social psychology, 17(2), 124.
Ekman, P., & Friesen, W. V. (2003). Unmasking the face: A guide to recognizing emotions from facial clues (Vol. 10). Ishk.
Ghofrani, A., Toroghi, R. M., & Ghanbari, S. (2019). Realtime face-detection and emotion recognition using MTCNN and MiniShuffleNet v2. In 2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI) (pp. 817-821). IEEE.
Giannopoulos, P., Perikos, I., & Hatzilygeroudis, I. (2018). Deep learning approaches for facial emotion recognition: A case study on FER-2013. In Advances in hybridization of intelligent methods (pp. 1-16). Springer, Cham.
González-Rodríguez, M. R., Díaz-Fernández, M. C., & Gómez, C. P. (2020). Facial-expression recognition: an emergent approach to the measurement of tourist satisfaction through emotions. Telematics and Informatics, 51, 101404.
Goodfellow, I. J., Erhan, D., Carrier, P. L., Courville, A., Mirza, M., Hamner, B., ... & Bengio, Y. (2013, November). Challenges in representation learning: A report on three machine learning contests. In International conference on neural information processing (pp. 117-124). Springer, Berlin, Heidelberg.
Hamm, J., Kohler, C. G., Gur, R. C., & Verma, R. (2011). Automated facial action coding system for dynamic analysis of facial expressions in neuropsychiatric disorders. Journal of neuroscience methods, 200(2), 237-256.
Huang, R., Pedoeem, J., & Chen, C. (2018, December). YOLO-LITE: a real-time object detection algorithm optimized for non-GPU computers. In 2018 IEEE International Conference on Big Data (Big Data) (pp. 2503-2510). IEEE.
Indira, D. N. V. S. L. S., Sumalatha, L., & Markapudi, B. R. (2021, February). Multi Facial Expression Recognition (MFER) for Identifying Customer Satisfaction on Products using Deep CNN and Haar Cascade Classifier. In IOP Conference Series: Materials Science and Engineering (Vol. 1074, No. 1, p. 012033). IOP Publishing.
Kanade, T., Cohn, J. F., & Tian, Y. (2000, March). Comprehensive database for facial expression analysis. In Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580) (pp. 46-53). IEEE.
Kang, H. B. (2013). Various approaches for driver and driving behavior monitoring: A review. In Proceedings of the IEEE International Conference on Computer Vision Workshops (pp. 616-623).
Kim, B. K., Roh, J., Dong, S. Y., & Lee, S. Y. (2016). Hierarchical committee of deep convolutional neural networks for robust facial expression recognition. Journal on Multimodal User Interfaces, 10(2), 173-189.
Ko, B. C. (2018). A brief review of facial emotion recognition based on visual information. Sensors, 18(2), 401.
Landowska, A., & Miler, J. (2016, September). Limitations of emotion recognition in software user experience evaluation context. In 2016 Federated Conference on Computer Science and Information Systems (FedCSIS) (pp. 1631-1640). IEEE.
Laroca, R., Severo, E., Zanlorensi, L. A., Oliveira, L. S., Gonçalves, G. R., Schwartz, W. R., & Menotti, D. (2018, July). A robust real-time automatic license plate recognition based on the YOLO detector. In 2018 International Joint Conference on Neural Networks (IJCNN) (pp. 1-10). IEEE.
Lei, J., Han, Q., Chen, L., Lai, Z., Zeng, L., & Liu, X. (2017). A novel side face contour extraction algorithm for driving fatigue statue recognition. IEEE Access, 5, 5723-5730.
Li, S., Deng, W., & Du, J. (2017). Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2852-2861).
Lien, J. J., Kanade, T., Cohn, J. F., & Li, C. C. (1998, April). Automated facial expression recognition based on FACS action units. In Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition (pp. 390-395). IEEE.
Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010, June). The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops (pp. 94-101). IEEE.
Luh, G. C., Wu, H. B., Yong, Y. T., Lai, Y. J., & Chen, Y. H. (2019, July). Facial Expression Based Emotion Recognition Employing YOLOv3 Deep Neural Networks. In 2019 International Conference on Machine Learning and Cybernetics (ICMLC) (pp. 1-7). IEEE.
Mehrabian, A. (2017). Nonverbal communication. Routledge.
Noordewier, M. K., Topolinski, S., & Van Dijk, E. (2016). The temporal dynamics of surprise. Social and Personality Psychology Compass, 10(3), 136-149.
Pang, L., Liu, H., Chen, Y., & Miao, J. (2020). Real-time concealed object detection from passive millimeter wave images based on the YOLOv3 algorithm. Sensors, 20(6), 1678.
Poria, S., Cambria, E., Bajpai, R., & Hussain, A. (2017). A review of affective computing: From unimodal analysis to multimodal fusion. Information Fusion, 37, 98-125.
Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.
Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 815-823).
Söderlund, M., & Rosengren, S. (2008). Revisiting the smiling service worker and customer satisfaction. International Journal of Service Industry Management.
Su, C., & Wang, G. (2020, October). Design and application of learner emotion recognition for classroom. In Journal of Physics: Conference Series (Vol. 1651, No. 1, p. 012158). IOP Publishing.
Szeliski, R. (2010). Computer vision: algorithms and applications. Springer Science & Business Media.
Tacconi, D., Mayora, O., Lukowicz, P., Arnrich, B., Setz, C., Troster, G., & Haring, C. (2008, January). Activity and emotion recognition to support early diagnosis of psychiatric diseases. In 2008 Second International Conference on Pervasive Computing Technologies for Healthcare (pp. 100-102). IEEE.
Tian, Y. L., Kanade, T., & Cohn, J. F. (2005). Facial expression analysis. In Handbook of face recognition (pp. 247-275). Springer, New York, NY.
Tian, Y. L., Kanade, T., & Cohn, J. F. (2001). Recognizing action units for facial expression analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2), 97-115.
Viola, P., & Jones, M. (2001, December). Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001 (Vol. 1, pp. I-I). IEEE.
Xie, Y., Chen, K., & Murphey, Y. L. (2018, November). Real-time and robust driver yawning detection with deep neural networks. In 2018 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 532-538). IEEE.
Yang, B., Cao, J., Ni, R., & Zhang, Y. (2017). Facial expression recognition using weighted mixture deep neural network based on double-channel facial images. IEEE Access, 6, 4630-4640.
Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10), 1499-1503.
Zhang, W., & Su, J. (2017, November). Driver yawning detection based on long short term memory networks. In 2017 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1-5). IEEE.
Ziyana, C., Jianming, W., & Guanghao, J. (2019). Driver state detection framework based on temporal facial action information. Application Research of Computers, 36(11).

Full text available from: 2024/06/23 (campus network)
Full text available from: 2024/06/23 (off-campus network)
Full text available from: 2024/06/23 (National Central Library: Taiwan NDLTD system)