
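Graduate student: 陳彥宇
Yen-Yu Chen
Thesis title: 對抗自編碼生成影像與預篩選訓練集之年齡估計
Facial Age Estimation using Synthetic Data Augmentation and Training Set Re-Selection
Advisor: 徐繼聖
Gee-Sern Hsu
Oral defense committee: 洪一平
徐繼聖
林惠勇
王鈺強
郭景明
Degree: Master
Department: College of Engineering - Department of Mechanical Engineering
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: Chinese
Number of pages: 83
Keywords (Chinese): 年齡估計 (age estimation), 資料增量 (data augmentation)
Keywords (English): Age Estimation, Data Augmentation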

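Classification and regression models become biased when the training data are unevenly distributed: at test time they favor the groups that have more training samples. Datasets used for facial age estimation are usually imbalanced, because samples of the elderly and of young children are far fewer than those of young adults, and the older the subject, the fewer images are available. Data imbalance is generally countered by adding samples. Common data augmentation practices include translation, scaling, rotation and (mirror) transformation of the samples. We use two state-of-the-art augmentation methods: the 3D Morphable Model (3DMM) and the Conditional Adversarial Autoencoder (CAAE). The 3DMM can be fitted to a 2D face by following multiple features, converting 2D face coordinates into 3D face coordinates and generating new face poses. The CAAE learns the face manifold and can achieve smooth age progression and regression simultaneously. Given a face, the CAAE can generate the same face at different ages. Beyond adding samples, we also study whether the training samples themselves support effective training, an issue that has so far not received sufficient attention. We propose an ensemble network architecture that merges VGGFace and DEX-pre. VGGFace was originally trained for face recognition, whereas DEX-pre was trained on an age dataset; both are retrained in this study for age estimation. The contributions of this study are as follows: 1) we build on face-recognition deep neural networks (CNNs) trained on large amounts of data and retrain them on age samples, so that these face-recognition models perform age estimation through transfer learning; 2) the CAAE and 3DMM increase the number of training samples and address the imbalanced distribution of age samples; 3) the trained network is used to analyze errors on the training data, and training samples with large errors are filtered out to improve performance.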
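As a rough illustration of the conventional augmentation mentioned above (translation and mirroring), the following sketch tops up under-represented ages with transformed copies. It is not the thesis implementation: the function names, the representation of images as lists of pixel rows, the one-pixel shifts, and the `target_per_age` parameter are all assumptions made for this example.

```python
from collections import Counter

def mirror(image):
    """Flip an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def translate(image, dx, fill=0):
    """Shift an image dx pixels to the right (left if dx < 0), padding with `fill`."""
    width = len(image[0])
    shift = min(abs(dx), width)
    if dx >= 0:
        return [[fill] * shift + row[:width - shift] for row in image]
    return [row[shift:] + [fill] * shift for row in image]

def augment_minority_ages(samples, target_per_age):
    """Return `samples` plus transformed copies for ages below `target_per_age`.

    `samples` is a list of (image, age) pairs. Each original yields at most
    three new copies (mirror, shift right, shift left), so very rare ages may
    still end up below the target. Hypothetical helper, for illustration only.
    """
    counts = Counter(age for _, age in samples)
    transforms = [mirror, lambda im: translate(im, 1), lambda im: translate(im, -1)]
    augmented = list(samples)
    for age, count in counts.items():
        needed = max(target_per_age - count, 0)
        originals = [im for im, a in samples if a == age]
        candidates = [t(im) for t in transforms for im in originals]
        augmented.extend((im, age) for im in candidates[:needed])
    return augmented

# Toy data: three samples at age 20, one sample at age 70.
toy = [([[1, 2], [3, 4]], 20)] * 3 + [([[5, 6], [7, 8]], 70)]
balanced = augment_minority_ages(toy, target_per_age=3)
```

In this toy run the age-70 group grows from one to three samples while the age-20 group is left unchanged. Scaling, rotation, and the 3DMM and CAAE synthesis described in the abstract are outside the scope of this sketch.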


Classification and regression models become biased when the training data are imbalanced. The data used for studying facial age estimation are often imbalanced: facial samples of seniors are usually fewer than those of younger subjects, and the older the subjects, the fewer images are available. Data imbalance is generally countered by data augmentation. Common augmentation practice applies translation, scaling, rotation and (affine) transformation to the original images. We explore two state-of-the-art approaches to augmentation: the 3D Morphable Model (3DMM) and the Conditional Adversarial Autoencoder (CAAE). The 3DMM can be fitted to a 2D face by matching multiple features, which transforms the 2D face into a 3D face and allows novel poses to be generated. The CAAE learns a face manifold; traversing this manifold realizes smooth age progression and regression simultaneously. Given a face, the CAAE can generate the same face at different ages. In addition to data augmentation, we also study data pruning, an issue that has not received sufficient attention. We propose an ensemble network that integrates the VGG-Face network and the DEX-pre network. VGG-Face was originally trained for face recognition and DEX-pre on an age dataset; both are retrained in this study for facial age estimation. The contributions of this study can be summarized as follows: 1) the proposed ensemble network, built on face-pretrained models, offers an effective framework for facial age estimation; 2) data augmentation by the CAAE and 3DMM better handles the data imbalance issue; 3) data pruning, which removes training samples with large errors, cleans the data and in turn improves performance.
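The data pruning step in contribution 3 can be pictured with the minimal sketch below: a trained estimator is run over its own training set, and samples whose absolute age error exceeds a threshold are set aside. The abstract does not state the error measure or the cut-off, so the absolute-error criterion, the `max_abs_error` parameter, and the function names are assumptions made for illustration.

```python
def prune_training_set(samples, predict_age, max_abs_error):
    """Split (image, age) pairs into kept and removed lists.

    `predict_age` stands in for the trained network: any callable that maps
    an image to an estimated age. A sample is removed when its absolute
    error exceeds `max_abs_error` (an assumed criterion, not from the thesis).
    """
    kept, removed = [], []
    for image, age in samples:
        error = abs(predict_age(image) - age)
        (kept if error <= max_abs_error else removed).append((image, age))
    return kept, removed

def mean_absolute_error(samples, predict_age):
    """Mean absolute error (in years) of `predict_age` over the samples."""
    return sum(abs(predict_age(image) - age) for image, age in samples) / len(samples)

# Toy stand-in for a trained model: image identifiers mapped to predicted ages.
toy_predictions = {"a": 25.0, "b": 31.0, "c": 60.0}
toy_samples = [("a", 24), ("b", 30), ("c", 35)]

kept, removed = prune_training_set(toy_samples, toy_predictions.get, max_abs_error=5)
mae_before = mean_absolute_error(toy_samples, toy_predictions.get)
mae_after = mean_absolute_error(kept, toy_predictions.get)
```

Here sample "c" (labelled 35, predicted 60) is removed. In practice the pruned set would then be used to retrain the network; the drop in error on the kept samples alone only shows that the outlier is gone, not that the retrained model is better.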

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Research Background and Motivation
1.2 Contributions
1.3 Thesis Organization
2. Literature Review
2.1 Biologically Inspired Features
2.2 Deep EXpectation of apparent age from a single image
2.3 Age Estimation Using Expectation of Label Distribution Learning
2.4 An Ensemble CNN2ELM for Age Estimation
2.5 Available Pre-Trained Face Models
3. Main Method
3.1 Training Sample Augmentation
3.2.1 Age Progression/Regression by Conditional Adversarial Autoencoder (CAAE)
3.2.2 3DMM
3.2 Ensemble Network
3.3 Removing Poor Training Samples
3.4 Moving Classification and Soft-Boundary Regression
4. Experimental Setup and Analysis
4.1 Datasets
4.1.1 MORPH
4.1.2 Chalearn2015
4.1.3 FG-Net
4.1.4 CACD
4.2 Comparison of Pre-Trained Models
4.3 Ensemble Architecture
4.4 Removing Poor Training Samples
4.5 Augmentation
Conventional Augmentation Methods
CAAE
3DMM
Comparison of Augmentation Methods
4.6 Moving Classification and Soft-Boundary Regression
Morph
Chalearn2015
FG-Net
5. Conclusions and Future Work
6. References

