
Graduate Student: 李庚澔 (Keng-Hao Lee)
Thesis Title: Deep Learning Based Real-Time Facial Expression Recognition on Embedded Systems
Advisor: 林昌鴻 (Chang-Hong Lin)
Committee Members: 陳維美 (Wei-Mei Chen), 陳郁堂 (Yie-Tarng Chen), 林敬舜 (Ching-Shun Lin)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Publication Year: 2019
Academic Year of Graduation: 107
Language: English
Number of Pages: 43
Chinese Keywords: 即時臉部表情辨識, 臉部表情辨識, 情緒, 深度學習, 嵌入式系統, 手機應用程式
English Keywords: Real-Time Facial Expression Recognition
Record Statistics: Views: 217, Downloads: 0
Enabling machines to understand human emotions, and to use them for further analysis and applications, has made facial expression recognition (FER) an important research topic. Traditional machine learning methods, however, require extracting hundreds of hand-crafted features when analyzing facial expressions, and designing such features is difficult across different scenes and faces. Deep learning has recently been widely applied to FER and has improved recognition accuracy. This thesis adopts a deep learning architecture and ports the resulting facial expression recognition system to embedded hardware. On embedded hardware, FER can be brought into more application scenarios, such as collecting the emotions of mobile phone users or their reactions to advertisements, so that more information can be gathered and further analyzed. To address the class imbalance in the dataset, we use class weight balancing so that the model learns each category more evenly. At the same time, we apply data augmentation to enrich our dataset, making the model more robust to facial expressions under different scene conditions. In addition, we design a real-time facial expression recognition application for mobile phones; its average processing speed is about 10-15 frames per second, and its average expression recognition accuracy is 52.25%.


    Facial expression recognition (FER) is a significant task for machines to understand human emotions. However, traditional approaches need numerous hand-crafted features, which are difficult to design and to adapt to different situations. Deep learning has recently been adopted to solve the FER problem because of its high accuracy. The proposed method adopts a deep learning architecture and further migrates it to an embedded system. Migrating the proposed method to embedded systems enables more real-world applications, such as analyzing the emotions of mobile users or collecting users' reactions to advertisements. To address the small and extremely imbalanced dataset, we adopted data augmentation to increase the number of training samples, and class weight balancing during training to keep the model from being dominated by the majority categories. Unlike most methods, which require high computation costs such as a high-end CPU and a GPU, we designed a mobile application for real-time facial expression recognition; its average runtime is about 10-15 frames per second, with an average accuracy of 52.25%.
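The record does not spell out the class weight balancing scheme used in the thesis; a minimal sketch, assuming the common inverse-frequency weighting (total samples divided by the number of classes times the per-class count), could look like this:

```python
from collections import Counter

def class_weights(labels):
    """Compute inverse-frequency class weights so that rare expression
    categories contribute as much to the training loss as common ones.

    labels: list of category names (or ids), one per training sample.
    Returns a dict mapping each category to its weight.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # weight = total_samples / (num_classes * samples_in_class)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

# Toy example: "happy" is 4x more frequent than "fear",
# so "fear" receives 4x the weight of "happy".
labels = ["happy"] * 8 + ["fear"] * 2
weights = class_weights(labels)
print(weights)
```

In Keras [56], a dict like this can be passed to `model.fit` via its `class_weight` argument so that minority expression categories are up-weighted in the loss; the exact weighting formula used in the thesis is not given in this record, so the one above is only one common choice.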

    Abstract (Chinese) i
    Abstract (English) ii
    Acknowledgements iii
    Table of Contents iv
    List of Figures vi
    List of Tables vii
    Chapter 1 - Introduction 1
      1.1 Motivation 1
      1.2 Contribution 2
      1.3 Thesis Organization 3
    Chapter 2 - Related Works 4
      2.1 Hand-crafted Features FER 4
      2.2 Deep Learning Based FER 5
    Chapter 3 - Proposed Methods 6
      3.1 Data Preprocessing 6
        3.1.1 Data Cleaning 6
        3.1.2 Data Augmentation 8
      3.2 Network Architecture 9
        3.2.1 Overview of the YOLOv3 [29] 9
        3.2.2 Backbone of the YOLOv3 [29] 10
        3.2.3 Head of the YOLOv3 [29] 11
        3.2.4 Prediction 12
        3.2.5 Loss Functions 15
      3.3 Improved YOLOv3 [29] 16
        3.3.1 Transfer Learning 17
        3.3.2 Anchor Boxes 17
        3.3.3 Categorical Cross Entropy 18
        3.3.4 Class Weight Balancing 18
    Chapter 4 - Training and Implementation 19
      4.1 Training 19
        4.1.1 Initialization 19
        4.1.2 Learning Rate Decay 20
        4.1.3 Optimizer 20
      4.2 Implementation 20
        4.2.1 Frameworks 20
        4.2.2 Mobile Application 21
    Chapter 5 - Experiment Results 23
      5.1 Experimental Settings 23
      5.2 RAF Dataset [13] 24
      5.3 Evaluation 25
        5.3.1 Evaluation Metrics 25
        5.3.2 Evaluation of The Proposed Method 25
      5.4 Comparison 26
        5.4.1 Face Detection Comparison 26
        5.4.2 Facial Expression Recognition Comparison 27
    Chapter 6 - Conclusions and Future Works 28
      6.1 Conclusions 28
      6.2 Future Works 29
    References 30

    [1] P. Abhang, S. Rao, B. W. Gawali, and P. Rokade, "Emotion Recognition using Speech and EEG Signal–A," International Journal of Computer Applications, vol. 975, p. 8887, 2011.
    [2] P. Ekman et al., "Universals and cultural differences in the judgments of facial expressions of emotion," Journal of personality and social psychology, vol. 53, no. 4, p. 712, 1987.
    [3] T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 7, pp. 971-987, 2002.
    [4] A. K. Jain and F. Farrokhnia, "Unsupervised texture segmentation using Gabor filters," Pattern recognition, vol. 24, no. 12, pp. 1167-1186, 1991.
    [5] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in neural information processing systems, 2012, pp. 1097-1105.
    [6] H. Kaya, F. Gürpınar, and A. A. Salah, "Video-based emotion recognition in the wild using deep transfer learning and score fusion," Image and Vision Computing, vol. 65, pp. 66-75, 2017.
    [7] M. M. McConnell and K. W. Eva, "The role of emotion in the learning and transfer of clinical skills and knowledge," Academic Medicine, vol. 87, no. 10, pp. 1316-1322, 2012.
    [8] H.-W. Ng, V. D. Nguyen, V. Vonikakis, and S. Winkler, "Deep learning for emotion recognition on small datasets using transfer learning," in Proceedings of the 2015 ACM on international conference on multimodal interaction, 2015: ACM, pp. 443-449.
    [9] C. Lu et al., "Multiple spatio-temporal feature learning for video-based emotion recognition in the wild," in Proceedings of the 2018 on International Conference on Multimodal Interaction, 2018: ACM, pp. 646-652.
    [10] C. Liu, T. Tang, K. Lv, and M. Wang, "Multi-feature based emotion recognition for video clips," in Proceedings of the 2018 on International Conference on Multimodal Interaction, 2018: ACM, pp. 630-634.
    [11] Y. Fan, J. C. Lam, and V. O. Li, "Video-based emotion recognition using deeply-supervised neural networks," in Proceedings of the 2018 on International Conference on Multimodal Interaction, 2018: ACM, pp. 584-588.
    [12] V. Vielzeuf, C. Kervadec, S. Pateux, A. Lechervy, and F. Jurie, "An occam's razor view on learning audiovisual emotion recognition with small training sets," in Proceedings of the 2018 on International Conference on Multimodal Interaction, 2018: ACM, pp. 589-593.
    [13] S. Li, W. Deng, and J. Du, "Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2852-2861.
    [14] R. Ekman, What the face reveals: Basic and applied studies of spontaneous expression using the Facial Action Coding System (FACS). Oxford University Press, USA, 1997.
    [15] M. Abdulrahman, T. R. Gwadabe, F. J. Abdu, and A. Eleyan, "Gabor wavelet transform based facial expression recognition using PCA and LBP," in 2014 22nd Signal Processing and Communications Applications Conference (SIU), 2014: IEEE, pp. 2265-2268.
    [16] H.-B. Deng, L.-W. Jin, L.-X. Zhen, and J.-C. Huang, "A new facial expression recognition method based on local Gabor filter bank and PCA plus LDA," International Journal of Information Technology, vol. 11, no. 11, pp. 86-96, 2005.
    [17] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression," in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, 2010: IEEE, pp. 94-101.
    [18] B. Fasel, "Robust face analysis using convolutional neural networks," in Object recognition supported by user interaction for service robots, 2002, vol. 2: IEEE, pp. 40-43.
    [19] B. Fasel, "Head-pose invariant facial expression recognition using convolutional neural networks," in Proceedings of the 4th IEEE International Conference on Multimodal Interfaces, 2002: IEEE Computer Society, p. 529.
    [20] M. Matsugu, K. Mori, Y. Mitari, and Y. Kaneda, "Subject independent facial expression recognition with robust face detection using a convolutional neural network," Neural Networks, vol. 16, no. 5-6, pp. 555-559, 2003.
    [21] Y. Chen, X. Zhao, and X. Jia, "Spectral–spatial classification of hyperspectral data based on deep belief network," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 6, pp. 2381-2392, 2015.
    [22] H. Boughrara, M. Chtourou, C. B. Amar, and L. Chen, "Facial expression recognition based on a mlp neural network using constructive training algorithm," Multimedia Tools and Applications, vol. 75, no. 2, pp. 709-731, 2016.
    [23] Z. Yu and C. Zhang, "Image based static facial expression recognition with multiple deep network learning," in Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, 2015: ACM, pp. 435-442.
    [24] B.-K. Kim, H. Lee, J. Roh, and S.-Y. Lee, "Hierarchical committee of deep cnns with exponentially-weighted decision fusion for static facial expression recognition," in Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, 2015: ACM, pp. 427-434.
    [25] B. Sun et al., "Combining multimodal features within a fusion network for emotion recognition in the wild," in Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, 2015: ACM, pp. 497-502.
    [26] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580-587.
    [27] J. Li et al., "Facial expression recognition with faster R-CNN," Procedia Computer Science, vol. 107, pp. 135-140, 2017.
    [28] S. Ren, K. He, R. Girshick, and J. Sun, "Faster r-cnn: Towards real-time object detection with region proposal networks," in Advances in neural information processing systems, 2015, pp. 91-99.
    [29] J. Redmon and A. Farhadi, "Yolov3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
    [30] A. Dhall, R. Goecke, S. Lucey, and T. Gedeon, "Collecting large, richly annotated facial-expression databases from movies," IEEE multimedia, vol. 19, no. 3, pp. 34-41, 2012.
    [31] S. Li and W. Deng, "Reliable crowdsourcing and deep locality-preserving learning for unconstrained facial expression recognition," IEEE Transactions on Image Processing, vol. 28, no. 1, pp. 356-370, 2018.
    [32] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, "Joint face detection and alignment using multitask cascaded convolutional networks," IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499-1503, 2016.
    [33] V. Jain and E. Learned-Miller, "Fddb: A benchmark for face detection in unconstrained settings," 2010.
    [34] S. Yang, P. Luo, C.-C. Loy, and X. Tang, "Wider face: A face detection benchmark," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 5525-5533.
    [35] A. Bulat and G. Tzimiropoulos, "How far are we from solving the 2d & 3d face alignment problem?(and a dataset of 230,000 3d facial landmarks)," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1021-1030.
    [36] C. Sagonas, E. Antonakos, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, "300 faces in-the-wild challenge: Database and results," Image and vision computing, vol. 47, pp. 3-18, 2016.
    [37] T. Baltrušaitis, P. Robinson, and L.-P. Morency, "Openface: an open source facial behavior analysis toolkit," in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), 2016: IEEE, pp. 1-10.
    [38] E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le, "Autoaugment: Learning augmentation policies from data," arXiv preprint arXiv:1805.09501, 2018.
    [39] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779-788.
    [40] J. Redmon and A. Farhadi, "YOLO9000: better, faster, stronger," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7263-7271.
    [41] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
    [42] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125.
    [43] V. Nair and G. E. Hinton, "Rectified linear units improve restricted boltzmann machines," in Proceedings of the 27th international conference on machine learning (ICML-10), 2010, pp. 807-814.
    [44] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
    [45] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, 2013, vol. 30, p. 3.
    [46] N. Bodla, B. Singh, R. Chellappa, and L. S. Davis, "Soft-NMS--Improving Object Detection With One Line of Code," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 5561-5569.
    [47] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification," in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1026-1034.
    [48] W. Liu et al., "Ssd: Single shot multibox detector," in European conference on computer vision, 2016: Springer, pp. 21-37.
    [49] R. Girshick, "Fast r-cnn," in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440-1448.
    [50] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask r-cnn," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961-2969.
    [51] A. Kuznetsova et al., "The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale," arXiv preprint arXiv:1811.00982, 2018.
    [52] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
    [53] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the thirteenth international conference on artificial intelligence and statistics, 2010, pp. 249-256.
    [54] Apple Inc., "Core ML Framework." https://developer.apple.com/documentation/coreml
    [55] Apple Inc., "Vision Framework." https://developer.apple.com/documentation/vision
    [56] "Keras." https://keras.io/
    [57] "Core ML Community Tools." https://github.com/apple/coremltools
    [58] Y. Fan, J. C. Lam, and V. O. Li, "Multi-region ensemble convolutional neural network for facial expression recognition," in International Conference on Artificial Neural Networks, 2018: Springer, pp. 84-94.

    Full text available from 2024/08/23 (campus network)
    Full text available from 2024/08/23 (off-campus network)
    Full text available from 2024/08/23 (National Central Library: Taiwan NDLTD system)