Author: 林律恩 Lu-En Lin
Thesis Title: 基於臉部特徵遮蔽式資料擴增之年齡及性別估測 (Data Augmentation with Occluded Facial Features for Age and Gender Estimation)
Advisor: 林昌鴻 Chang-Hong Lin
Committee: 林昌鴻 Chang-Hong Lin, 陳維美 Wei-Mei Chen, 林敬舜 Ching-Shun Lin, 王煥宗 Huan-Chun Wang, 陳永耀 Yung-Yao Chen
Degree: 碩士 Master
Department: 電資學院 - 電子工程系 (Department of Electronic and Computer Engineering)
Thesis Publication Year: 2020
Graduation Academic Year: 108
Language: English
Pages: 67
Keywords (in Chinese): 年齡預測、性別預測、深度學習、卷積神經網路、資料擴增
Keywords (in other languages): Gender Classification, Age Classification, Deep Learning, Convolutional Neural Networks, Data Augmentation
Facial analysis has been a hot research topic for years due to its broad variety of applications, such as human-machine interaction, commercial uses, entertainment, and surveillance. With the rise of deep learning, these tasks have been able to address more difficult and practical problems. Even though age and gender estimation with deep learning methods has achieved considerable improvements over traditional machine-learning-based methods, the results still fall short of the needs of real-life applications. In this thesis, a data augmentation method is proposed that simulates real-life challenges on the main features of the human face. The proposed method improves the generalization and robustness of the network by generating a greater variety of training samples. The method, Feature Occlusion, uses three simple occlusion techniques, Blackout, Random Brightness, and Blur, to simulate different challenges that can occur in real-life situations. We also propose a modified cross-entropy loss that gives less penalty to age predictions that land on the classes adjacent to the ground-truth class. We verify the effectiveness of the proposed method by implementing the augmentation method and the modified cross-entropy loss on two different convolutional neural networks (CNNs), a slightly modified AdienceNet and a slightly modified VGG16, to perform age and gender classification. The proposed augmentation system improves the age and gender classification accuracy of the slightly modified AdienceNet network by 6.62% and 6.53% on the Adience dataset, respectively. It also improves the age and gender classification accuracy of the slightly modified VGG16 network by 6.20% and 6.31% on the Adience dataset, respectively.
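The two ideas in the abstract can be sketched in a few lines. The following is a minimal illustration, not the thesis implementation: the three occlusion functions operate on a grayscale image region given as a `(y0, y1, x0, x1)` box (in the thesis the occluded regions come from detected facial features rather than a hand-picked box), and `soft_adjacent_ce` is one plausible reading of the adjacent-class-discounted loss; the function names and the `neighbor_weight` value are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def blackout(img, box):
    """Zero out the pixels inside the feature box (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = box
    out = img.copy()
    out[y0:y1, x0:x1] = 0
    return out

def random_brightness(img, box, low=0.5, high=1.5):
    """Scale the feature region by a random brightness factor."""
    y0, y1, x0, x1 = box
    out = img.astype(np.float32)          # astype returns a copy
    out[y0:y1, x0:x1] *= rng.uniform(low, high)
    return np.clip(out, 0, 255).astype(img.dtype)

def blur(img, box, k=5):
    """Apply a simple k x k box blur to the feature region only."""
    y0, y1, x0, x1 = box
    out = img.astype(np.float32)
    region = out[y0:y1, x0:x1]
    pad = k // 2
    padded = np.pad(region, pad, mode="edge")
    acc = np.zeros_like(region)
    for dy in range(k):                   # sum the k*k shifted copies
        for dx in range(k):
            acc += padded[dy:dy + region.shape[0], dx:dx + region.shape[1]]
    out[y0:y1, x0:x1] = acc / (k * k)
    return np.clip(out, 0, 255).astype(img.dtype)

def soft_adjacent_ce(probs, y, neighbor_weight=0.2):
    """Cross-entropy against a softened one-hot target that keeps some
    mass on the classes adjacent to the ground truth, so a near-miss
    age prediction is penalized less than a far miss."""
    target = np.zeros_like(probs)
    target[y] = 1.0
    for n in (y - 1, y + 1):
        if 0 <= n < len(probs):
            target[n] = neighbor_weight
    target /= target.sum()
    return float(-np.sum(target * np.log(probs + 1e-12)))
```

Under this formulation, a predicted distribution concentrated on a class next to the true age bin incurs a strictly smaller loss than one concentrated on a distant bin, which matches the stated intent of the modified loss.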