
Graduate Student: 胡允誠 (Yun-Cheng Hu)
Thesis Title: Real-Time Facial Expression Recognition Using Hybrid Deep Learning (運用混合式深度學習方法於即時人臉表情辨識)
Advisor: 陳建中 (Jiann-Jone Chen)
Committee Members: 杭學鳴 (Hsueh-Ming Hang), 鍾國亮 (Kuo-Liang Chung), 吳怡樂 (Yi-Leh Wu), 花凱龍 (Kai-Lung Hua), 陳建中 (Jiann-Jone Chen)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2018
Graduation Academic Year: 107
Language: Chinese
Number of Pages: 82
Chinese Keywords: 深度學習、神經網路、卷積、表情辨識、即時辨識 (deep learning, neural network, convolution, expression recognition, real-time recognition)
English Keywords: Deep Learning, CNN, Convolution, Neural Network, Real-Time, Facial Expression Recognition
Usage Counts: Views: 234, Downloads: 1
    Facial expression recognition is one of the most popular research topics in computer vision and remains a challenging problem. As the performance of machine learning algorithms has improved, deep learning applications have become increasingly common, and applying deep learning to facial expression recognition can significantly improve accuracy. Conventional machine learning methods fine-tune training parameters for one specific database so that the trained model performs best on that database. Deep learning not only yields better training results, but the neural network also learns autonomously, discovering the features of the training data on its own, so there is no need to hand-design a feature extraction model for each dataset. The price is a computationally heavy, time-consuming training process: training the FER2013 facial expression database with an early convolutional neural network (CNN) architecture, for example, involves roughly 20 million parameters, and with repeated rounds of parameter fine-tuning the total training time becomes considerable. This thesis designs deep learning networks based on Google's Inception and Xception architectures and proposes a concatenation method. The first network architecture starts with two convolution layers, each immediately followed by a max pooling layer, and then consists of three Inception modules. The other architecture also starts with two convolution layers, followed by three Xception modules. Using these two network architectures improves recognition accuracy while reducing computation, and a merge layer finally concatenates the two architectures. Experimental results show that the proposed method achieves 70.1% facial expression recognition accuracy. To verify its practical value, we developed an integrated real-time facial expression recognition system; experiments show good real-time performance, with a recognition speed of 6 to 7 frames per second, i.e., only 0.143 second to recognize one image. In addition, this thesis also proposes a database updating method that allows users to build new samples and adjust the database according to the needs of the current scenario.
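    To make the two-branch design concrete, the following is a minimal Keras sketch of the idea described above: one branch built from Inception-style modules and one from Xception-style depthwise separable modules, each starting with two convolution layers with max pooling, joined by a merge (concatenate) layer before the 7-way softmax. The block definitions, filter counts, and the 48x48 grayscale input are illustrative assumptions, not the thesis' exact configuration.

```python
# Minimal sketch of the hybrid (Inception-branch + Xception-branch) model; hyper-parameters are assumed.
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(48, 48, 1))  # FER2013-style grayscale faces (assumed input size)

def stem(x):
    # two plain convolution layers, each followed by max pooling
    for filters in (32, 64):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    return x

def inception_block(x, f):
    # parallel 1x1 / 3x3 / 5x5 / pooled paths, concatenated channel-wise
    p1 = layers.Conv2D(f, 1, padding="same", activation="relu")(x)
    p3 = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
    p5 = layers.Conv2D(f, 5, padding="same", activation="relu")(x)
    pp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    pp = layers.Conv2D(f, 1, padding="same", activation="relu")(pp)
    return layers.Concatenate()([p1, p3, p5, pp])

def xception_block(x, f):
    # depthwise separable convolutions with a residual-style shortcut
    shortcut = layers.Conv2D(f, 1, padding="same")(x)
    x = layers.SeparableConv2D(f, 3, padding="same", activation="relu")(x)
    x = layers.SeparableConv2D(f, 3, padding="same")(x)
    return layers.ReLU()(layers.Add()([x, shortcut]))

# branch 1: stem followed by three Inception-style modules
b1 = stem(inputs)
for _ in range(3):
    b1 = inception_block(b1, 32)
b1 = layers.GlobalAveragePooling2D()(b1)

# branch 2: stem followed by three Xception-style modules
b2 = stem(inputs)
for _ in range(3):
    b2 = xception_block(b2, 128)
b2 = layers.GlobalAveragePooling2D()(b2)

# merge layer: concatenate the two branches, then classify the 7 FER2013 expressions
merged = layers.Concatenate()([b1, b2])
outputs = layers.Dense(7, activation="softmax")(merged)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```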


    Facial expression recognition has become an important and popular topic in computer vision applications. With the advance of machine learning technology, deep learning methods can be developed to improve facial expression recognition performance. In general, one imposes a machine-learning module on one specific database and fine-tunes the system parameters to yield the best classification performance on that database. Deep learning can further boost the machine learning ability to deal with new databases: by combining a neural network model with the machine-learning framework, deep learning possesses self-learning capability, so when the dataset changes it can learn and classify the features on its own. However, deep learning requires a time-consuming training process. For example, training a traditional convolutional neural network (CNN) on the FER2013 dataset involves about 20 million parameters that must be adjusted during training. In this research, we utilized two deep learning network frameworks, Google's Inception and Xception, and proposed a method to concatenate the two frameworks in a merge layer. Experiments showed that the proposed method achieves 70.1% recognition accuracy. Based on the training results, we integrated face detection and facial expression recognition modules to develop a real-time facial expression recognition system, which needs only 0.143 seconds to recognize one frame. In addition, we also proposed a database updating method that lets users build new training data when facing new testing data, so that the system can still yield good real-time recognition performance under different circumstances.
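    As a rough illustration of the real-time pipeline described in the abstract (face detection followed by per-face expression classification), the sketch below pairs OpenCV's bundled Haar cascade detector with a trained classifier. The model file name, label order, and pre-processing are assumptions for illustration, not the thesis' actual code.

```python
# Minimal real-time loop: detect faces with a Haar cascade, classify each crop, overlay the label.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

LABELS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]  # assumed FER2013 label order

model = load_model("expression_model.h5")  # hypothetical trained-model file
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5):
        # crop the face, resize to the network input size, normalize to [0, 1]
        face = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
        pred = model.predict(face.reshape(1, 48, 48, 1), verbose=0)
        label = LABELS[int(np.argmax(pred))]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("expression", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```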

    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
        1.1  Research Motivation and Purpose
        1.2  Problem Description and Research Methods
        1.3  Thesis Organization
    Chapter 2  Background
        2.1  Convolutional Neural Network (CNN)
            2.1.1  Convolution Layer
            2.1.2  Pooling Layer
            2.1.3  Fully-Connected Layer
            2.1.4  Dropout
            2.1.5  Batch Normalization
            2.1.6  Rectified Linear Unit (ReLU)
            2.1.7  Classification Output Layer
        2.2  Optimizers
        2.3  Deep Convolutional Neural Network Architectures
            2.3.1  AlexNet
            2.3.2  VGG16
            2.3.3  Inception
            2.3.4  ResNet
            2.3.5  Xception
        2.4  Face Detection
            2.4.1  Face Detection with OpenCV
        2.5  Related Work
            2.5.1  Deep Neural Network Methods for Facial Expression Recognition
            2.5.2  Deep Multi-Task Learning Framework for Face Detection
            2.5.3  Bias-Variance Tradeoff
            2.5.4  Data Augmentation
    Chapter 3  Hybrid Deep Learning Method for a Real-Time Expression Recognition System
        3.1  System Architecture and Functional Overview
        3.2  System Operation Flow
        3.3  Hybrid Network Method Using Deep Learning
            3.3.1  Method Based on the Inception Architecture
            3.3.2  Method Based on the Xception Architecture
            3.3.3  Proposed Hybrid Network Method
        3.4  Database Verification and Updating Mechanism
    Chapter 4  Experimental Results and Analysis
        4.1  The FER2013 Expression Database
        4.2  Experimental Environment
        4.3  Analysis and Discussion of Experimental Results
            4.3.1  Results of the Designed Inception Network
            4.3.2  Results of the Designed Xception Network
            4.3.3  Comparison of the Hybrid Network and Its Sub-Networks
            4.3.4  Overall Comparison of Results
            4.3.5  Discussion of Multi-Network Results
        4.4  Results of the Real-Time Expression Recognition System
            4.4.1  Single-Image Tests
            4.4.2  Real-Time Recognition
    Chapter 5  Conclusions and Future Work
        5.1  Conclusions
        5.2  Future Prospects
    References

    [1] A. Mehrabian, “Communication without words,” in Communication Theory, pp. 193-200, 2008.
    [2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. NIPS, pp. 1106-1114, 2012.
    [3] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
    [4] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Proc. ECCV, 2014.
    [5] D. Yu et al., “Mixed pooling for convolutional neural networks,” in Rough Sets and Knowledge Technology, Springer, Cham, pp. 364-375, 2014. doi: 10.1007/978-3-319-11740-9_34.
    [6] N. Srivastava et al., “Dropout: a simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
    [7] S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proc. Int. Conf. Machine Learning (ICML), 2015.
    [8] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556, 2014.
    [9] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” arXiv:1409.4842, 2014.
    [10] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2016.
    [11] F. Chollet, “Xception: deep learning with depthwise separable convolutions,” arXiv:1610.02357, 2017.
    [12] Keras: the Python deep learning library. https://keras.io/
    [13] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the Inception architecture for computer vision,” arXiv:1512.00567, 2015.
    [14] A. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: efficient convolutional neural networks for mobile vision applications,” arXiv:1704.04861, 2017.
    [15] D. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv:1412.6980, 2014.
    [16] Challenges in representation learning: facial expression recognition challenge. http://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge
    [17] OpenCV: open source computer vision library. https://opencv.org/
    [18] R. Ranjan, V. M. Patel, and R. Chellappa, “HyperFace: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
    [19] Nvidia CUDA: compute unified device architecture. https://developer.nvidia.com/computeworks
    [20] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 511-518, 2001.
    [21] Y. Freund and R. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of Computer and System Sciences, pp. 119-139, Aug. 1997.
    [22] P. Dangeti, Extending Machine Learning Algorithms [Video], Packt, 2017. https://www.packtpub.com/big-data-and-business-intelligence/extending-machine-learning-algorithms-video
    [23] A. Mollahosseini, D. Chan, and M. H. Mahoor, “Going deeper in facial expression recognition using deep neural networks,” in Proc. IEEE Winter Conference on Applications of Computer Vision (WACV), 2016.
    [24] M. Lin, Q. Chen, and S. Yan, “Network in network,” arXiv:1312.4400, 2013.
    [25] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2009.
    [26] G. James et al., An Introduction to Statistical Learning, vol. 112, Springer, New York, 2013.
    [27] S. Fortmann-Roe, “Understanding the bias-variance tradeoff.” http://scott.fortmann-roe.com/docs/BiasVariance.html
    [28] P. Domingos, “A few useful things to know about machine learning,” Communications of the ACM, vol. 55, no. 10, pp. 78-87, 2012.
    [29] Y. Tang, “Deep learning using linear support vector machines,” arXiv:1306.0239, 2013.
    [30] S. Kankanamge, C. Fookes, and S. Sridharan, “Facial analysis in the wild with LSTM networks,” in Proc. IEEE International Conference on Image Processing (ICIP), 2017.
    [31] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
    [32] M. Quinn, G. Sivesind, and G. Reis, “Real-time emotion recognition from facial expressions,” 2017.
    [33] I. J. Goodfellow et al., “Challenges in representation learning: a report on three machine learning contests,” in Proc. International Conference on Neural Information Processing (ICONIP), Springer, Berlin, Heidelberg, 2013.
