
Student: Cheng-Hua Hsieh (謝承樺)
Thesis title: Real-time Pose Estimation and Face Alignment using Multi-Dropout Framework (Chinese title: 以多層Dropout神經網路架構進行即時人臉角度與地標點偵測)
Advisor: Gee-Sern Hsu (徐繼聖)
Oral defense committee: Yi-Ping Hung (洪一平), Yu-Chiang Wang (王鈺強), Jing-Ming Guo (郭景明), Yun-Fu Liu (劉雲夫)
Degree: Master
Department: College of Engineering, Department of Mechanical Engineering
Year of publication: 2017
Graduation academic year: 105 (ROC calendar, i.e., 2016-2017)
Language: Chinese
Number of pages: 54
Chinese keywords (translated): pose estimation, facial landmark detection, convolutional neural network
English keywords: Pose Estimation, Face Alignment, Convolutional Neural Network

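This thesis proposes the Multi-Dropout Framework (MDF) for cross-view facial landmark detection and pose estimation. Unlike most facial landmark detectors, which can only handle samples with yaw below 45 degrees, the proposed framework covers faces all the way to full profile. The MDF first uses a Single Shot MultiBox Detector (SSD) specially tailored to the face task for fast and accurate face detection, and then applies three Multiple Dropout Regressors (MDRs) for cross-view landmark detection. One MDR regresses the precise yaw, pitch and roll angles, with an estimation error within 5 degrees; depending on whether the estimated pose falls in the frontal or the profile range, one of the other two MDRs is selected to detect the facial landmarks. Since the MDR is the core of the MDF, this study aims to determine the MDR architecture and settings for handling regression problems. The MDF offers the following advantages and observations: (1) cross-view facial landmark detection can be achieved with one pose estimator and two landmark regressors; (2) multiple dropout layers strongly affect the training stability of linear regression models; (3) synthetic images generated by face profiling can be used to train a cross-view pose estimation model; (4) additional LBP features can improve landmark accuracy. Comparisons with recent methods on the public databases 300W, AFW and AFLW show that the MDF is competitive for cross-view facial landmark detection.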
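Observation (4) credits additional LBP features with improving landmark accuracy. The thesis code is not part of this record, so the following is only a minimal Python sketch, assuming the basic 3x3 LBP operator: each of the 8 neighbours is compared with the centre pixel, and the resulting bits are packed into one byte. The function name `lbp_image`, the clockwise bit order, and skipping border pixels are illustrative assumptions, not details taken from the thesis.

```python
from typing import List

# Neighbour offsets (dy, dx), clockwise from the top-left pixel. The bit order is
# an assumed convention; any fixed order yields an equivalent set of codes.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(gray: List[List[int]]) -> List[List[int]]:
    """Basic 3x3 LBP code (0-255) of every interior pixel; border pixels are skipped."""
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                # A neighbour at least as bright as the centre sets its bit;
                # the first neighbour in OFFSETS becomes the most significant bit.
                if gray[y + dy][x + dx] >= center:
                    code |= 1 << (7 - bit)
            out[y - 1][x - 1] = code
    return out


patch = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
print(lbp_image(patch))  # prints: [[30]]
```

The abstract does not specify how the LBP map is fed to the network; stacking it as an extra input channel next to the grayscale face crop is one plausible option, offered here only as an assumption.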


We propose the Multi-Dropout Framework (MDF) for face alignment and pose estimation across large poses. Unlike most landmark detectors, which only work for faces with yaw below 45 degrees, the proposed MDF works over the full pose range, i.e., -90 to 90 degrees of yaw. The MDF uses a tailored version of the Single Shot MultiBox Detector (SSD) for fast and precise face detection. Given a face detected by the SSD, three Multiple Dropout Regressors (MDRs) work together to locate the landmarks. One MDR estimates the pose of the face with an error of less than 5 degrees in yaw, pitch and roll, which allows input faces to be split into frontal (<45 degrees) and profile (>45 degrees) pose ranges. Each of the other two MDRs detects pose-oriented landmarks in one of these two ranges. As the MDR is the core network of the MDF, this study aims to determine the MDR structures and settings appropriate for regression tasks. The MDF yields the following advantages and observations: (1) landmark detection across poses is better approached by a pose estimator followed by two pose-oriented landmark regressors; (2) multiple dropouts are required to stabilize the training of regressor networks; (3) pose estimation in yaw, pitch and roll can be made precise to within a few degrees using data augmented by a face profiling approach; (4) additional hand-crafted features, such as the Local Binary Pattern (LBP), can improve the accuracy of landmark localization. A comparison study on benchmark databases shows that the MDF delivers performance competitive with state-of-the-art approaches for face alignment across large poses.
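The inference flow described above (estimate the pose first, then run one of two pose-oriented landmark regressors) can be sketched in Python as follows. This is a minimal sketch, not the thesis implementation: the three MDRs are passed in as plain callables, and the names `align_face`, `Pose`, `pose_mdr`, `frontal_mdr` and `profile_mdr` are hypothetical. Only the 45-degree yaw split comes from the abstract; sending a face with |yaw| of exactly 45 degrees to the profile branch is an assumption, since the abstract leaves that boundary open.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types; the thesis does not publish an API.
Landmarks = List[Tuple[float, float]]


@dataclass
class Pose:
    yaw: float    # degrees; the MDF is described as covering -90 to 90
    pitch: float  # degrees
    roll: float   # degrees


# Frontal (<45 deg) vs profile (>45 deg) split stated in the abstract.
FRONTAL_YAW_LIMIT = 45.0


def align_face(
    face_crop: object,
    pose_mdr: Callable[[object], Pose],
    frontal_mdr: Callable[[object], Landmarks],
    profile_mdr: Callable[[object], Landmarks],
) -> Tuple[Pose, Landmarks, str]:
    """Estimate the pose first, then run exactly one pose-oriented landmark regressor."""
    pose = pose_mdr(face_crop)
    if abs(pose.yaw) < FRONTAL_YAW_LIMIT:
        return pose, frontal_mdr(face_crop), "frontal"
    # Assumption: |yaw| == 45 is treated as profile; the abstract does not say.
    return pose, profile_mdr(face_crop), "profile"


if __name__ == "__main__":
    # Stub regressors standing in for trained networks (dummy outputs).
    pose, landmarks, branch = align_face(
        face_crop=None,
        pose_mdr=lambda img: Pose(yaw=60.0, pitch=5.0, roll=-2.0),
        frontal_mdr=lambda img: [(0.0, 0.0)],
        profile_mdr=lambda img: [(1.0, 1.0)],
    )
    print(branch, pose.yaw)  # prints: profile 60.0
```

Keeping the router separate from the regressors reflects observation (1): each landmark regressor only has to cover half of the pose range, and the pose estimate decides which one runs.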

Abstract (Chinese)
Abstract
Acknowledgements
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Overview of the Method
1.3 Contributions of the Thesis
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Related Work on Generic Object Detection
2.1.1 Faster R-CNN
2.1.2 You Only Look Once (YOLO)
2.2 Related Work on Facial Landmark Detection
2.2.1 Cascaded Convolutional Neural Network
2.2.2 Multi-Task Convolutional Neural Network
2.2.3 CNN-based 3D Model Fitting for Landmark Localization
2.2.4 Shape Searching for Landmark Localization
2.3 Related Work on Pose Estimation
2.3.1 Regressive Tree Structured Model (RTSM)
Chapter 3 Proposed Method
3.1 Convolutional Neural Network
3.1.1 Feedforward
3.1.2 Backpropagation
3.1.3 Convolution
3.1.4 Max Pooling
3.1.5 Inverted Dropout
3.2 SSD Face Detector
3.2.1 Network Architecture
3.2.2 Training Stage
3.2.3 Modifications for Use as a Face Detector Preceding Landmark Detection in Unconstrained Environments
3.3 Multi-Dropout Landmark Regressor (Landmark MDR)
3.3.1 Multi-Dropout Network
3.3.2 Training of the Sparse and Dense Landmark MDRs
3.4 3DMM Fitting and Face Profiling
3.4.1 300W-Large Pose Augmentation (300W-LPA)
3.4.2 Training of the Profile Landmark MDR
3.5 Multi-Dropout Pose Regressor
3.6 Hand-crafted Features as Auxiliary CNN Features
Chapter 4 Experimental Setup and Analysis
4.1 Face Benchmark Databases
4.1.1 Wider Face
4.1.2 AFW
4.1.3 PASCAL Faces
4.2 Facial Landmark Benchmark Databases
4.2.1 M-PIE
4.2.2 MTFL
4.2.3 300W
4.2.4 300W-LP
4.2.5 AFLW
4.3 Specifications of Experimental Samples
4.3.1 Image Preprocessing
4.3.2 Data Augmentation
4.4 Experimental Results and Analysis
4.4.1 Face Detector Performance Comparison
4.4.2 Analysis of Dropout Position and Number
4.4.3 Performance Comparison of the Frontal Sparse Landmark MDR
4.4.4 Effect of Face Detection on Landmark Regression
4.4.5 Evaluation of the Pose MDR Trained on Face-Profiled Images
4.4.6 Comparison with Related Facial Landmark Localization Algorithms
Chapter 5 Conclusions and Future Work
Chapter 6 References
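The outline lists inverted dropout (3.1.5) and a multi-dropout network (3.3.1) among the building blocks of the MDR. As a minimal Python sketch, assuming a plain fully connected regression head, inverted dropout zeroes each unit with probability p during training and divides the survivors by 1 - p, so the layer becomes the identity at inference. The helper names (`inverted_dropout`, `dense`, `relu`, `multi_dropout_head`) and the placement of one dropout after every hidden layer are hypothetical; the actual positions and number of dropout layers in the MDR are not given in this record.

```python
import random
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def inverted_dropout(
    activations: Sequence[float],
    drop_prob: float,
    training: bool,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Zero each unit with probability drop_prob while training and scale the
    survivors by 1 / (1 - drop_prob); at inference the layer is the identity."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - drop_prob
    # rng.random() >= drop_prob happens with probability keep, so a unit survives
    # with probability keep and its expected value is unchanged after scaling.
    return [a / keep if rng.random() >= drop_prob else 0.0 for a in activations]


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def multi_dropout_head(
    x: Sequence[float],
    layers: List[Tuple[Matrix, List[float]]],
    drop_probs: List[float],
    training: bool,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Hypothetical regression head: each hidden layer is followed by its own
    dropout; the last (linear) layer outputs the regression targets."""
    *hidden, (w_out, b_out) = layers
    if len(drop_probs) != len(hidden):
        raise ValueError("need one dropout probability per hidden layer")
    h = list(x)
    for (w, b), p in zip(hidden, drop_probs):
        h = inverted_dropout(relu(dense(h, w, b)), p, training, rng)
    return dense(h, w_out, b_out)


layers = [([[1.0, -1.0]], [0.0]), ([[2.0]], [1.0])]
print(multi_dropout_head([3.0, 1.0], layers, [0.5], training=False))  # prints: [5.0]
```

Because the rescaling happens during training, the same weights can be used unchanged at inference time, which is the usual reason for preferring the inverted form over scaling activations at test time.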

