| Field | Value |
|---|---|
| Graduate student | 許景甯 (Ching-Ning Hsu) |
| Thesis title | 臉部情緒識別之二階深度學習模型 (Two-stage convolutional neural network model in facial emotion recognition) |
| Advisor | 王孔政 (Kung-Jeng Wang) |
| Committee members | 羅明琇 (Ming-Hsiu Lo), 蔣明晃 (Ming-Huang Chiang) |
| Degree | Master |
| Department | College of Management, Department of Industrial Management |
| Year of publication | 2021 |
| Academic year of graduation | 109 (ROC calendar) |
| Language | English |
| Pages | 57 |
| Keywords | Convolutional neural network, facial emotion recognition, yawning detection |
| Access counts | 779 views, 4 downloads |
Abstract

Facial emotion recognition (FER), which reveals a person's intentions, feelings, and cognitive states, has broad application prospects in the service industry. Conventional recognition strategies rely on image-processing techniques that suffer from low accuracy and cannot extract facial features in real time; blank or blurry frames in a video can further degrade accuracy. To address these problems, this study proposes a deep learning-based methodology for efficient, real-time FER. The proposed model consists of two stages: the first uses a multi-task cascaded convolutional neural network (MTCNN) to locate precise face bounding boxes, and the second adopts an end-to-end YOLOv3 (You Only Look Once, version 3) network to recognize facial emotions and features in real time. Trained on the RAF-DB, CK+, and FER2013 datasets, the model reaches accuracies of 86.1%, 95.6%, and 69%, respectively. In a case study on driving-fatigue detection, it achieves 82.3% accuracy on the YAWDD dataset, and in a test using the AFEW dataset to simulate a real-world environment, it reaches 62.5%. The proposed FER model can facilitate the identification of customers' intentions and reactions in the service process.
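To make the two-stage design concrete, below is a minimal sketch of how such a pipeline could be wired together. It assumes the facenet-pytorch package for the MTCNN stage and OpenCV's DNN module loading a Darknet YOLOv3 model for the second stage; the `emotion_yolov3.cfg` / `emotion_yolov3.weights` file names and the seven-class label list are illustrative placeholders, not the thesis's actual artifacts.

```python
import cv2
import numpy as np
from PIL import Image
from facenet_pytorch import MTCNN

# Standard seven FER2013 emotion classes (an assumption; the thesis may
# order or name its classes differently).
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

# Stage 1: MTCNN face detector (facenet-pytorch implementation).
mtcnn = MTCNN(keep_all=True)

# Stage 2: YOLOv3 emotion recognizer. The cfg/weights names are placeholders
# for a Darknet model trained on an FER dataset.
net = cv2.dnn.readNetFromDarknet("emotion_yolov3.cfg", "emotion_yolov3.weights")

def recognize_frame(frame_bgr):
    """Return a list of ((x1, y1, x2, y2), emotion) pairs for one BGR frame."""
    rgb = Image.fromarray(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    boxes, _ = mtcnn.detect(rgb)  # stage 1: precise face bounding boxes
    if boxes is None:             # blank/blurry frame with no detectable face
        return []
    results = []
    for x1, y1, x2, y2 in boxes.astype(int):
        face = frame_bgr[max(y1, 0):y2, max(x1, 0):x2]
        if face.size == 0:
            continue
        blob = cv2.dnn.blobFromImage(face, 1 / 255.0, (416, 416), swapRB=True)
        net.setInput(blob)
        # YOLOv3 has three output layers; each detection row is
        # [cx, cy, w, h, objectness, class scores...].
        dets = np.vstack(net.forward(net.getUnconnectedOutLayersNames()))
        scores = dets[:, 5:] * dets[:, 4:5]  # class confidence * objectness
        _, cls = np.unravel_index(scores.argmax(), scores.shape)
        results.append(((x1, y1, x2, y2), EMOTIONS[cls]))
    return results
```

A production version would also apply a confidence threshold and non-maximum suppression to the YOLOv3 detections; the sketch simply takes the highest-scoring class so the two-stage flow stays visible.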
References
Abtahi, S., Omidyeganeh, M., Shirmohammadi, S., & Hariri, B. (2014, March). YawDD: A yawning detection dataset. In Proceedings of the 5th ACM multimedia systems conference (pp. 24-28).
Bi, X., Chen, Z., & Yue, J. (2020, December). A novel one-step method based on YOLOv3-tiny for fatigue driving detection. In 2020 IEEE 6th International Conference on Computer and Communications (ICCC) (pp. 1241-1245). IEEE.
Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.
Chen, E., Gong, Y., & Tie, Y. (Eds.). (2016). Advances in Multimedia Information Processing - PCM 2016: 17th Pacific-Rim Conference on Multimedia, Xi'an, China, September 15-16, 2016, Proceedings, Part I (Vol. 9916). Springer.
Chen, R. C. (2019). Automatic license plate recognition via sliding-window darknet-YOLO deep learning. Image and Vision Computing, 87, 47-56.
Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., & Taylor, J. G. (2001). Emotion recognition in human-computer interaction. IEEE Signal Processing Magazine, 18(1), 32-80.
Dhall, A., Goecke, R., Lucey, S., & Gedeon, T. (2012). Collecting large, richly annotated facial-expression databases from movies. IEEE MultiMedia, 19(3), 34-41.
Ekman, P., & Friesen, W. V. (1971). Constants across cultures in the face and emotion. Journal of personality and social psychology, 17(2), 124.
Ekman, P., & Friesen, W. V. (2003). Unmasking the face: A guide to recognizing emotions from facial clues (Vol. 10). Ishk.
Ghofrani, A., Toroghi, R. M., & Ghanbari, S. (2019). Realtime face-detection and emotion recognition using MTCNN and MiniShuffleNet v2. In 2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI) (pp. 817-821). IEEE.
Giannopoulos, P., Perikos, I., & Hatzilygeroudis, I. (2018). Deep learning approaches for facial emotion recognition: A case study on FER-2013. In Advances in hybridization of intelligent methods (pp. 1-16). Springer, Cham.
González-Rodríguez, M. R., Díaz-Fernández, M. C., & Gómez, C. P. (2020). Facial-expression recognition: an emergent approach to the measurement of tourist satisfaction through emotions. Telematics and Informatics, 51, 101404.
Goodfellow, I. J., Erhan, D., Carrier, P. L., Courville, A., Mirza, M., Hamner, B., ... & Bengio, Y. (2013, November). Challenges in representation learning: A report on three machine learning contests. In International conference on neural information processing (pp. 117-124). Springer, Berlin, Heidelberg.
Hamm, J., Kohler, C. G., Gur, R. C., & Verma, R. (2011). Automated facial action coding system for dynamic analysis of facial expressions in neuropsychiatric disorders. Journal of neuroscience methods, 200(2), 237-256.
Huang, R., Pedoeem, J., & Chen, C. (2018, December). YOLO-LITE: a real-time object detection algorithm optimized for non-GPU computers. In 2018 IEEE International Conference on Big Data (Big Data) (pp. 2503-2510). IEEE.
Indira, D. N. V. S. L. S., Sumalatha, L., & Markapudi, B. R. (2021, February). Multi Facial Expression Recognition (MFER) for Identifying Customer Satisfaction on Products using Deep CNN and Haar Cascade Classifier. In IOP Conference Series: Materials Science and Engineering (Vol. 1074, No. 1, p. 012033). IOP Publishing.
Kanade, T., Cohn, J. F., & Tian, Y. (2000, March). Comprehensive database for facial expression analysis. In Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580) (pp. 46-53). IEEE.
Kang, H. B. (2013). Various approaches for driver and driving behavior monitoring: A review. In Proceedings of the IEEE International Conference on Computer Vision Workshops (pp. 616-623).
Kim, B. K., Roh, J., Dong, S. Y., & Lee, S. Y. (2016). Hierarchical committee of deep convolutional neural networks for robust facial expression recognition. Journal on Multimodal User Interfaces, 10(2), 173-189.
Ko, B. C. (2018). A brief review of facial emotion recognition based on visual information. Sensors, 18(2), 401.
Landowska, A., & Miler, J. (2016, September). Limitations of emotion recognition in software user experience evaluation context. In 2016 Federated Conference on Computer Science and Information Systems (FedCSIS) (pp. 1631-1640). IEEE.
Laroca, R., Severo, E., Zanlorensi, L. A., Oliveira, L. S., Gonçalves, G. R., Schwartz, W. R., & Menotti, D. (2018, July). A robust real-time automatic license plate recognition based on the YOLO detector. In 2018 International Joint Conference on Neural Networks (IJCNN) (pp. 1-10). IEEE.
Lei, J., Han, Q., Chen, L., Lai, Z., Zeng, L., & Liu, X. (2017). A novel side face contour extraction algorithm for driving fatigue statue recognition. IEEE Access, 5, 5723-5730.
Li, S., Deng, W., & Du, J. (2017). Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2852-2861).
Lien, J. J., Kanade, T., Cohn, J. F., & Li, C. C. (1998, April). Automated facial expression recognition based on FACS action units. In Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition (pp. 390-395). IEEE.
Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010, June). The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops (pp. 94-101). IEEE.
Luh, G. C., Wu, H. B., Yong, Y. T., Lai, Y. J., & Chen, Y. H. (2019, July). Facial Expression Based Emotion Recognition Employing YOLOv3 Deep Neural Networks. In 2019 International Conference on Machine Learning and Cybernetics (ICMLC) (pp. 1-7). IEEE.
Mehrabian, A. (2017). Nonverbal communication. Routledge.
Noordewier, M. K., Topolinski, S., & Van Dijk, E. (2016). The temporal dynamics of surprise. Social and Personality Psychology Compass, 10(3), 136-149.
Pang, L., Liu, H., Chen, Y., & Miao, J. (2020). Real-time concealed object detection from passive millimeter wave images based on the YOLOv3 algorithm. Sensors, 20(6), 1678.
Poria, S., Cambria, E., Bajpai, R., & Hussain, A. (2017). A review of affective computing: From unimodal analysis to multimodal fusion. Information Fusion, 37, 98-125.
Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.
Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 815-823).
Söderlund, M., & Rosengren, S. (2008). Revisiting the smiling service worker and customer satisfaction. International Journal of Service Industry Management, 19(5), 552-574.
Su, C., & Wang, G. (2020, October). Design and application of learner emotion recognition for classroom. In Journal of Physics: Conference Series (Vol. 1651, No. 1, p. 012158). IOP Publishing.
Szeliski, R. (2010). Computer vision: algorithms and applications. Springer Science & Business Media.
Tacconi, D., Mayora, O., Lukowicz, P., Arnrich, B., Setz, C., Tröster, G., & Haring, C. (2008, January). Activity and emotion recognition to support early diagnosis of psychiatric diseases. In 2008 Second International Conference on Pervasive Computing Technologies for Healthcare (pp. 100-102). IEEE.
Tian, Y. L., Kanade, T., & Cohn, J. F. (2005). Facial expression analysis. In Handbook of face recognition (pp. 247-275). Springer, New York, NY.
Tian, Y. L., Kanade, T., & Cohn, J. F. (2001). Recognizing action units for facial expression analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2), 97-115.
Viola, P., & Jones, M. (2001, December). Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001 (Vol. 1, pp. I-I). IEEE.
Xie, Y., Chen, K., & Murphey, Y. L. (2018, November). Real-time and robust driver yawning detection with deep neural networks. In 2018 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 532-538). IEEE.
Yang, B., Cao, J., Ni, R., & Zhang, Y. (2017). Facial expression recognition using weighted mixture deep neural network based on double-channel facial images. IEEE Access, 6, 4630-4640.
Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10), 1499-1503.
Zhang, W., & Su, J. (2017, November). Driver yawning detection based on long short term memory networks. In 2017 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1-5). IEEE.
Ziyana, C., Jianming, W., & Guanghao, J. (2019). Driver state detection framework based on temporal facial action information. Application Research of Computers, 36(11).