
Graduate Student: Yi-Xuan Wu (吳逸軒)
Thesis Title: 基於生成對抗網路與島嶼損失函數的強健性人臉情感辨識技術
(Robust Facial Expression Recognition based on Generative Adversarial Networks with Island Loss)
Advisor: Kai-Lung Hua (花凱龍)
Committee Members: Kai-Lung Hua (花凱龍), Chuan-Kai Yang (楊傳凱), Kuo-Liang Chung (鍾國亮), Jun-Cheng Chen (陳駿丞), Jing-Ming Guo (郭景明)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Publication Year: 2019
Graduation Academic Year: 107 (2018-2019)
Language: Chinese
Number of Pages: 44
Chinese Keywords: Facial Expression Recognition, Deep Learning, Generative Networks
Foreign Keywords: Facial Expression Recognition, Deep Learning, Generative Adversarial Networks
Abstract in Chinese:
Facial expression recognition is an important topic in computer vision, and many studies have achieved excellent results on it. However, when the source (training) and target (testing) datasets differ substantially, recognition accuracy drops. Domain adaptation methods can usually handle this problem, but in practical applications the target dataset is not easy to obtain, and the model must be re-adapted for every new target dataset. Our facial expression recognition method uses data augmentation rather than domain adaptation to train a robust classification network. We perform data augmentation with a generative adversarial network, generating synthetic face images whose expressions are defined by action units, where an action unit represents the movement of specific facial muscles (e.g., a cheek raise). We further augment a high-variation dataset (e.g., one containing diverse head poses, illumination conditions, viewing angles, and ethnicities) and use it to train the network, making it more robust. To improve classification accuracy, we use non-local modules to capture long-range spatial relationships, and island loss to reduce the feature variation within a class (the same expression) while enlarging the feature differences between classes (different expressions). Our experiments show that our network achieves state-of-the-art performance in cross-dataset facial expression classification.


Abstract in English:
Facial expression recognition (FER) is an important issue in the field of computer vision. There are methods that perform FER, but their performance degrades when the source (training) and target (testing) datasets have a large discrepancy. Domain adaptation is usually performed to handle this problem; however, in practical applications, the target dataset is not always readily available, and the model needs to be adapted for each new target dataset. Our FER method uses data augmentation rather than domain adaptation to train robust facial expression classifier networks. We performed data augmentation using Generative Adversarial Networks (GANs) to generate synthetic face images with different facial expressions defined by Action Units (AUs), which are anatomically based movements of certain facial muscle groups (e.g., a cheek raise). We augmented a high-variation dataset (e.g., one containing a variety of head poses, illumination conditions, perspectives, and subject ethnicities) to create a new dataset with a large number of data points for training a robust network. To improve the classification performance of our network, we utilized non-local blocks to capture long-range spatial relationships, and island loss to decrease intra-class (same facial expression) variations and increase inter-class (different facial expressions) differences. Our network can be trained on a single dataset, and our experiments show that it achieves state-of-the-art performance in facial expression classification across different datasets.
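The island loss referred to in both abstracts is not defined in this record; the sketch below uses the commonly cited center-loss notation, and the symbols and weighting are assumptions rather than details taken from the thesis:

\[
L_{IL} \;=\; \underbrace{\frac{1}{2}\sum_{i=1}^{m}\lVert x_i - c_{y_i}\rVert_2^2}_{L_C\ \text{(center loss)}}
\;+\; \lambda_1 \sum_{c_j \in \mathcal{N}} \sum_{\substack{c_k \in \mathcal{N}\\ c_k \neq c_j}}
\left( \frac{c_k \cdot c_j}{\lVert c_k \rVert_2\,\lVert c_j \rVert_2} + 1 \right)
\]

Here x_i is the deep feature of sample i, c_{y_i} is the center of its expression class, and \mathcal{N} is the set of expression classes. The center-loss term pulls features toward their class center (decreasing intra-class variation), while the cosine term penalizes class centers that point in similar directions, pushing the "islands" apart (increasing inter-class differences); the network is then trained on the combined objective L_softmax + \lambda L_{IL}.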
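The non-local blocks used to capture long-range spatial relationships can likewise be illustrated with a minimal PyTorch-style sketch of the embedded-Gaussian formulation; the class name, halved embedding width, and zero-initialized output projection are illustrative assumptions, not details from the thesis:

    import torch
    import torch.nn as nn

    class NonLocalBlock(nn.Module):
        """Embedded-Gaussian non-local block (sketch).

        Every spatial position attends to every other position, so each
        output feature aggregates information from the whole face image
        rather than from a local receptive field alone.
        """

        def __init__(self, in_channels: int):
            super().__init__()
            inter = max(in_channels // 2, 1)               # reduced embedding width (assumption)
            self.theta = nn.Conv2d(in_channels, inter, 1)  # query projection
            self.phi = nn.Conv2d(in_channels, inter, 1)    # key projection
            self.g = nn.Conv2d(in_channels, inter, 1)      # value projection
            self.out = nn.Conv2d(inter, in_channels, 1)    # restore channel count
            nn.init.zeros_(self.out.weight)                # block starts as an identity mapping
            nn.init.zeros_(self.out.bias)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, _, h, w = x.shape
            q = self.theta(x).flatten(2).transpose(1, 2)   # (B, HW, C')
            k = self.phi(x).flatten(2)                     # (B, C', HW)
            v = self.g(x).flatten(2).transpose(1, 2)       # (B, HW, C')
            attn = torch.softmax(q @ k, dim=-1)            # pairwise affinities, (B, HW, HW)
            y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
            return x + self.out(y)                         # residual connection

For example, NonLocalBlock(256)(torch.randn(1, 256, 14, 14)) returns a tensor of the same shape; because the output projection is zero-initialized, the block initially behaves as an identity and can be inserted into a pretrained classifier without disturbing its features.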

Table of Contents:
Abstract in Chinese (p. iii)
Abstract in English (p. iv)
Acknowledgements (p. v)
Contents (p. vi)
List of Figures (p. vii)
List of Tables (p. ix)
1 INTRODUCTION (p. 1)
2 RELATED WORKS (p. 3)
3 PROPOSED APPROACH (p. 5)
3.1 Generative Adversarial Network (p. 6)
3.2 Network architecture and nonlocal block (p. 10)
3.3 Island loss (p. 14)
4 EXPERIMENTS (p. 16)
4.1 Datasets (p. 16)
4.2 Face Detection and Alignment (p. 20)
4.3 Single dataset evaluation (p. 21)
4.4 Cross-dataset evaluation (p. 24)
5 CONCLUSION (p. 31)
5.1 FUTURE WORK (p. 31)
References (p. 32)


Full Text Release Date: 2024/07/26 (campus network)
Full Text Release Date: 2029/07/26 (off-campus network)
Full Text Release Date: 2039/07/26 (National Central Library: Taiwan NDLTD system)