
Graduate Student: 黃柏程 (Po-Cheng Huang)
Thesis Title: 基於遞歸神經網路之混合分類式臉部情緒辨識系統
(A Hybrid Facial Expression Recognition System Based on Recurrent Neural Network)
Advisor: 郭景明 (Jing-Ming Guo)
Committee Members: 花凱龍 (Kai-Lung Hua), 王乃堅 (Nai-Jian Wang), 林鼎然 (Ting-Lan Lin)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2019
Academic Year of Graduation: 107
Language: Chinese
Pages: 115
Chinese Keywords: facial expression detection, deep learning, facial behavior, visual clues, random forest
Foreign Keywords: Facial expression recognition, Deep learning, Facial behavior, Visual clues, Random Forest
Record Statistics: Views: 427, Downloads: 0
    In today's video-surveillance applications, emotion recognition based on facial features is a challenging and important topic. This thesis proposes a hybrid facial expression recognition system based on feature extraction, applied to distinguishing emotion categories, with six emotion classes as output. In pre-processing, a Random Forest classifier is first used to extract facial landmark points. For landmark extraction, the Random Forest's feature-sampling and node-splitting rules are improved so that more accurate sample grouping is obtained and the method resists external factors such as rotation and changes in illumination; the extracted landmarks are then used to align the face region. For feature extraction, given recent technological progress and the maturity of deep learning, we depart from traditional image-processing approaches and introduce deep learning techniques: facial landmark points are used to analyze facial motion patterns, facial action units capture subtle facial changes, and a deep network extracts high-dimensional feature information. Combining these three kinds of features yields better classification results. For the experiments, this thesis evaluates on the Cohn-Kanade and Oulu-CASIA databases and compares against prior techniques. Although uncontrolled factors such as illumination and head pose exist in the videos, the results show that the proposed algorithm attains good accuracy relative to previously proposed techniques while substantially reducing the number of parameters; it therefore has considerable potential for real-world applications while maintaining accuracy.
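    The forest-based landmark step above can be pictured with a small sketch. The following is a minimal, illustrative sketch only, using scikit-learn's RandomForestRegressor as a stand-in for the forest-based landmark model; the feature design, array shapes, and the thesis's improved sampling and splitting rules are assumptions and are not reproduced here.

    # A minimal sketch of forest-based landmark regression, NOT the thesis's
    # improved splitting rules: a random forest maps simple appearance
    # features to 68 (x, y) landmark coordinates. All shapes and data here
    # are illustrative assumptions.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n_samples, n_features, n_landmarks = 200, 64, 68

    X = rng.normal(size=(n_samples, n_features))       # stand-in appearance features
    y = rng.normal(size=(n_samples, n_landmarks * 2))  # flattened (x, y) targets

    forest = RandomForestRegressor(n_estimators=50, max_depth=8, random_state=0)
    forest.fit(X, y)

    landmarks = forest.predict(X[:1]).reshape(n_landmarks, 2)
    print(landmarks.shape)  # (68, 2) -> points usable for face alignment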


    Facial expression recognition (FER) based on facial features is an important and challenging problem for the automatic inspection of surveillance videos. In this thesis, a hybrid facial expression recognition system based on facial features is proposed, and a total of six facial expressions can be recognized. For pre-processing, a Random Forest classifier is applied to track the facial landmark points, which are then utilized for face alignment. For feature extraction, with the advance of deep learning technology, we introduce a hybrid RNN technique that incorporates deep learning to extract robust features. In addition, geometrical features and facial action units based on the movement of the facial landmark points are also considered. We evaluate the proposed method on two databases, CK+ and Oulu-CASIA, and compare it with previous works. Although there are uncontrolled factors in the videos, such as lighting and head posture, the proposed method achieves superior performance compared with former schemes. Thus, the proposed method has considerable potential for practical applications.
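    As a concrete picture of the CNN + RNN pairing described in the abstract, here is a minimal PyTorch sketch, not the thesis's actual architecture: a small frame-level CNN feeds per-frame features to a GRU, whose final state is classified into the six expression classes. The layer sizes, the GRU choice, and the class name HybridFER are illustrative assumptions.

    # A minimal, illustrative hybrid CNN + RNN classifier with six output
    # classes. All layer sizes and names here are assumptions for
    # demonstration only.
    import torch
    import torch.nn as nn

    class HybridFER(nn.Module):
        def __init__(self, num_classes: int = 6, rnn_hidden: int = 128):
            super().__init__()
            # Frame-level CNN feature extractor (stand-in for the deep
            # features described in the abstract).
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),  # -> (B*T, 64, 1, 1)
            )
            # Temporal model over per-frame features (GRU as a stand-in RNN).
            self.rnn = nn.GRU(input_size=64, hidden_size=rnn_hidden, batch_first=True)
            self.classifier = nn.Linear(rnn_hidden, num_classes)

        def forward(self, frames: torch.Tensor) -> torch.Tensor:
            # frames: (batch, time, channels, height, width)
            b, t, c, h, w = frames.shape
            feats = self.cnn(frames.view(b * t, c, h, w)).view(b, t, -1)
            _, last_hidden = self.rnn(feats)         # (1, b, rnn_hidden)
            return self.classifier(last_hidden[-1])  # (b, num_classes)

    if __name__ == "__main__":
        model = HybridFER()
        dummy = torch.randn(2, 8, 3, 64, 64)  # 2 clips of 8 frames each
        print(model(dummy).shape)  # torch.Size([2, 6])

    The final hidden state summarizes the whole clip; an alternative design choice would be to average the GRU outputs over time before classification.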

    Chinese Abstract I
    Abstract II
    Acknowledgements III
    Table of Contents IV
    List of Figures and Tables VI
    Chapter 1  Introduction 1
      1.1 Research Background and Motivation 1
      1.2 Thesis Organization 3
    Chapter 2  Literature Review 4
      2.1 Literature on Feature Extraction 5
        2.1.1 Studies on Emotion Recognition 5
        2.1.2 Facial Action Coding System 12
      2.2 Literature on Facial Landmark Extraction 15
      2.3 Literature on Neural Networks 26
        2.3.1 Forward Propagation 27
        2.3.2 Backward Propagation 29
        2.3.3 Factors Affecting Neural Network Performance 34
        2.3.4 Convolutional Neural Networks 38
        2.3.5 Development of Convolutional Neural Network Architectures 40
        2.3.6 Recurrent Neural Networks 46
      2.4 Literature on Neural-Network-Based Emotion Recognition 49
    Chapter 3  Facial Expression Recognition 52
      3.1 System Architecture 53
      3.2 Pre-processing 54
        3.2.1 Face Detection 54
        3.2.2 Region of Interest (ROI) Extraction 57
      3.3 Feature Extraction 59
        3.3.1 Convolutional Features 59
        3.3.2 Facial Landmark Points 65
        3.3.3 Facial Action Units 69
    Chapter 4  Experimental Results 85
      4.1 Network Training 85
      4.2 Databases 86
      4.3 Comparison with Other Methods 90
      4.4 Parameter Usage 94
    Chapter 5  Conclusion and Future Work 96
    References 97

    [1] P. Ekman and W. V. Friesen, "Constants across cultures in the face and emotion," Journal of Personality and Social Psychology, vol. 17, no. 2, p. 124, 1971.
    [2] M. Pantic and L. J. M. Rothkrantz, "Automatic analysis of facial expressions: The state of the art," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 12, pp. 1424–1445, Dec. 2000.
    [3] A. J. Calder, A. M. Burton, P. Miller, A. W. Young, and S. Akamatsu, "A principal component analysis of facial expressions," Vision Research, vol. 41, pp. 1179–1208, 2001.
    [4] M. Pantic, M. Valstar, R. Rademaker, and L. Maat, "Web-based database for facial expression analysis," in Proc. IEEE Int. Conf. Multimedia and Expo (ICME), 2005.
    [5] A. Dhall, R. Goecke, S. Ghosh, J. Joshi, J. Hoey, and T. Gedeon, "From individual to group-level emotion recognition: EmotiW 5.0," in Proc. 19th ACM Int. Conf. Multimodal Interaction (ICMI), pp. 524–528, 2017.
    [6] K. Sikka, T. Wu, J. Susskind, and M. Bartlett, "Exploring bag of words architectures in the facial expression domain," in Proc. European Conf. Computer Vision (ECCV), pp. 250–259, 2012.
    [7] M. Liu, S. Shan, R. Wang, and X. Chen, "Learning expressionlets on spatio-temporal manifold for dynamic facial expression recognition," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 1749–1756, Jun. 2014.
    [8] H. Jung, S. Lee, J. Yim, S. Park, and J. Kim, "Joint fine-tuning in deep neural networks for facial expression recognition," in Proc. IEEE Int. Conf. Computer Vision (ICCV), pp. 2983–2991, 2015.
    [9] T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, 2002.
    [10] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 886–893, 2005.
    [11] C. Fabian Benitez-Quiroz, R. Srinivasan, and A. M. Martinez, "EmotioNet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 5562–5570, 2016.
    [12] J. Yan, W. Zheng, Z. Cui, C. Tang, T. Zhang, Y. Zong, and N. Sun, "Multi-clue fusion for emotion recognition in the wild," in Proc. 18th ACM Int. Conf. Multimodal Interaction (ICMI), pp. 458–463, 2016.
    [13] X. Ouyang, S. Kawaai, E. G. H. Goh, S. Shen, W. Ding, H. Ming, and D.-Y. Huang, "Audio-visual emotion recognition using deep transfer learning and multiple temporal models," in Proc. 19th ACM Int. Conf. Multimodal Interaction (ICMI), pp. 577–582, 2017.
    [14] V. Vielzeuf, S. Pateux, and F. Jurie, "Temporal multimodal fusion for video emotion classification in the wild," in Proc. 19th ACM Int. Conf. Multimodal Interaction (ICMI), pp. 569–576, 2017.
    [15] T. Ojala, M. Pietikäinen, and D. Harwood, "A comparative study of texture measures with classification based on featured distributions," Pattern Recognition, vol. 29, no. 1, 1996.
    [16] P. Liu, J. M. Guo, et al., "Ocular recognition for blinking eyes," IEEE Trans. Image Process., 2017.
    [17] P. Ekman and W. Friesen, Facial Action Coding System: A Technique for the Measurement of Facial Movement. Palo Alto, CA: Consulting Psychologists Press, 1978.
    [18] Y.-I. Tian, T. Kanade, and J. F. Cohn, "Recognizing action units for facial expression analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 97–115, 2001.
    [19] X. Cao, Y. Wei, F. Wen, and J. Sun, "Face alignment by explicit shape regression," Int. J. Comput. Vis., vol. 107, no. 2, pp. 177–190, 2014.
    [20] P. N. Belhumeur, D. W. Jacobs, D. J. Kriegman, and N. Kumar, "Localizing parts of faces using a consensus of exemplars," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 12, pp. 2930–2940, 2013.
    [21] X. Xiong and F. De la Torre, "Supervised descent method and its applications to face alignment," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 532–539, Jun. 2013.
    [22] D. G. Lowe, "Object recognition from local scale-invariant features," in Proc. IEEE Int. Conf. Computer Vision (ICCV), vol. 2, pp. 1150–1157, 1999.
    [23] X. P. Burgos-Artizzu, P. Perona, and P. Dollár, "Robust face landmark estimation under occlusion," in Proc. IEEE Int. Conf. Computer Vision (ICCV), pp. 1513–1520, 2013.
    [24] S. Zhu, C. Li, C. C. Loy, and X. Tang, "Face alignment by coarse-to-fine shape searching," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2015.
    [25] S. Ren, X. Cao, Y. Wei, and J. Sun, "Face alignment at 3000 FPS via regressing local binary features," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Jun. 2014.
    [26] L. Breiman, "Random forests," Machine Learning, vol. 45, pp. 5–32, Oct. 2001.
    [27] J. M. Guo, S. H. Tseng, and K. Wong, "Accurate facial landmark extraction," IEEE Signal Process. Lett., vol. 23, no. 5, pp. 605–609, 2016.
    [28] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems (NIPS), 2012.
    [29] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2009.
    [30] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proc. European Conf. Computer Vision (ECCV), 2014.
    [31] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. Int. Conf. Learning Representations (ICLR), 2015.
    [32] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2015.
    [33] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2016.
    [34] G. Huang, Z. Liu, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2017.
    [35] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
    [36] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
    [37] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
    [38] K. Zhang, Y. Huang, Y. Du, and L. Wang, "Facial expression recognition based on deep evolutional spatial-temporal networks," IEEE Trans. Image Process., vol. 26, no. 9, pp. 4193–4203, 2017.
    [39] P. Khorrami, T. Le Paine, K. Brady, C. Dagli, and T. S. Huang, "How deep neural networks can improve emotion recognition on video data," in Proc. IEEE Int. Conf. Image Processing (ICIP), pp. 619–623, 2016.
    [40] A. Asthana, S. Zafeiriou, S. Cheng, and M. Pantic, "Robust discriminative response map fitting with constrained local models," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 3444–3451, 2013.
    [41] D. Hamester, P. Barros, and S. Wermter, "Face expression recognition with a 2-channel convolutional neural network," in Proc. Int. Joint Conf. Neural Networks (IJCNN), pp. 1–8, 2015.
    [42] X. Zhao, X. Liang, L. Liu, T. Li, Y. Han, N. Vasconcelos, and S. Yan, "Peak-piloted deep network for facial expression recognition," in Proc. European Conf. Computer Vision (ECCV), pp. 425–442, 2016.
    [43] Y. Fan, X. Lu, D. Li, and Y. Liu, "Video-based emotion recognition using CNN-RNN and C3D hybrid networks," in Proc. 18th ACM Int. Conf. Multimodal Interaction (ICMI), pp. 445–450, 2016.
    [44] B. Hasani and M. H. Mahoor, "Facial expression recognition using enhanced deep 3D convolutional neural networks," in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2278–2288, 2017.
    [45] J. Donahue, L. Anne Hendricks, S. Guadarrama, et al., "Long-term recurrent convolutional networks for visual recognition and description," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 2625–2634, 2015.
    [46] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 1–9, 2001.
    [47] G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. Pietikäinen, "Facial expression recognition from near-infrared videos," Image and Vision Computing, vol. 29, no. 9, pp. 607–619, 2011.
    [48] M. F. Valstar et al., "FERA 2015 - second facial expression recognition and analysis challenge," in Proc. 11th IEEE Int. Conf. Automatic Face and Gesture Recognition (FG), vol. 6, pp. 1–8, 2015.
    [49] T. Baltrušaitis, M. Mahmoud, and P. Robinson, "Cross-dataset learning and person-specific normalisation for automatic action unit detection," in Proc. 11th IEEE Int. Conf. Automatic Face and Gesture Recognition (FG), vol. 6, pp. 1–6, 2015.
    [50] T. Kanade, J. F. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," in Proc. 4th IEEE Int. Conf. Automatic Face and Gesture Recognition (FG), pp. 46–53, 2000.
    [51] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker, "Multi-PIE," Image and Vision Computing, vol. 28, no. 5, pp. 1–8, Sep. 2010.
    [52] C.-M. Kuo, S.-H. Lai, and M. Sarkis, "A compact deep learning model for robust facial expression recognition," in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2121–2129, 2018.
    [53] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
    [54] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," arXiv preprint arXiv:1412.3555, 2014.
    [55] Y. Kim, B. Yoo, Y. Kwak, C. Choi, and J. Kim, "Deep generative contrastive networks for facial expression recognition," arXiv preprint arXiv:1703.07140, 2017.
