Graduate student: | 黃士傑 Shih-Chieh Huang |
---|---|
Thesis title: | 以多層式樹狀結構模型進行人臉地標點偵測與追蹤 Hierarchical Tree Structured Model for Facial Landmark Detection and Tracking |
Advisor: | 徐繼聖 Gee-Sern Hsu |
Oral defense committee: | 莊仁輝, 鍾國亮, 王鈺強 |
Degree: | Master |
Department: | College of Engineering, Department of Mechanical Engineering |
Year of publication: | 2015 |
Academic year of graduation: | 103 (ROC calendar) |
Language: | Chinese |
Number of pages: | 93 |
Chinese keywords: | 人臉偵測, 地標點定位, 角度估測 |
English keywords: | face detection, landmark localization, pose estimation |
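Unlike earlier approaches to face detection, pose estimation, and landmark localization used in practical systems, the Tree Structured Model (TSM) solves all three problems with a single unified model. However, it is computationally expensive (for a 640×480 image, detection takes about 10 seconds per image on average), and it cannot detect small faces (smaller than 80×80) or handle images that contain multiple faces, so it does not meet the requirements of practical systems.
This study is the first to propose the Hierarchical Tree Structured Model (HTSM), which addresses both detection speed and landmark localization accuracy. The HTSM consists of a two-level TSM, namely a smaller-scale coarse model (coarse TSM, c-TSM) and a refined model (refined TSM, r-TSM), together with a Bilateral Support Vector Regressor (BSVR). The c-TSM is built on low-resolution images, and its purpose is to detect face candidate regions coarsely but quickly. The r-TSM is built on high-resolution images, and its purpose is to mark the landmarks around the candidate regions found by the c-TSM using a more accurate model. The forward BSVR then takes the landmarks located by the r-TSM as its basis and estimates a denser set of landmarks with larger errors. Finally, the backward BSVR takes the landmarks estimated by the forward BSVR and relocates them into a more convergent set. This thesis also examines the accuracy and speed of three feature methods: Histogram of Oriented Gradients (HOG), Local Binary Patterns (LBP), and Discrete Cosine Transform (DCT), and draws a conclusion from the comparison. Lastly, the thesis provides a complete comparison of the accuracy and runtime of the HTSM under different parameter settings on different benchmark databases, showing that the proposed method is highly competitive and can satisfy the needs of practical systems.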
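Of the three feature types compared in the thesis, LBP is the easiest to illustrate in a few lines. The sketch below computes the basic 3×3 LBP code for each interior pixel and a normalised 256-bin histogram. It is only an illustration of the general technique: the neighbour ordering, the `>=` comparison, and the function names `lbp_codes` and `lbp_histogram` are assumptions made for this example, and the abstract does not say which LBP variant or cell layout the thesis uses.

```python
from collections import Counter

# Neighbour offsets (row, col), clockwise from the top-left pixel.
# The starting point and direction are a convention assumed for this sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image):
    """Basic 3x3 LBP codes for the interior pixels of a grayscale image.

    `image` is a list of equal-length rows of numbers. The result has two
    fewer rows and two fewer columns than the input, because border pixels
    do not have a full 3x3 neighbourhood.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    if rows < 3 or cols < 3:
        raise ValueError("image must be at least 3x3")
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(image):
    """Normalised 256-bin histogram of the LBP codes of `image`."""
    flat = [code for row in lbp_codes(image) for code in row]
    counts = Counter(flat)
    return [counts[k] / len(flat) for k in range(256)]


patch = [[9, 1, 9],
         [1, 5, 9],
         [1, 1, 1]]
print(lbp_codes(patch))  # [[13]]: bits 0, 2 and 3 are set (1 + 4 + 8)
```

A histogram of such codes over a small image cell is what typically serves as the feature vector, in the same role that a HOG cell histogram plays.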
Although the Tree Structured Model (TSM) has proven effective at solving face detection, pose estimation and landmark localization in a unified model, its slow runtime makes it impractical for real applications, especially for images with multiple faces. We propose the Hierarchical Tree Structured Model (HTSM) to improve runtime speed and localization accuracy. The HTSM is composed of two component TSMs, the coarse TSM (c-TSM) and the refined TSM (r-TSM), and a Bilateral Support Vector Regressor (BSVR). The c-TSM is built on the low-resolution octaves of the samples so that it provides coarse but fast face detection. The r-TSM is built on the mid-resolution octaves so that it can locate landmarks on the face candidates given by the c-TSM with improved precision. The r-TSM landmarks serve as references for the forward BSVR, which locates a dense set of landmarks; the backward BSVR then relocates those landmarks that have large localization errors. The forward and backward regressions are repeated iteratively until convergence. We also study three feature extraction methods, HOG, LBP and DCT, and draw conclusions about their performance and speed. The performance of the HTSM is validated on three benchmark databases, Multi-PIE, LFPW and AFW, and compared with the latest TSM to demonstrate its efficacy.
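The coarse-then-refined search described in the abstract can be sketched in a toy setting. The code below is not the thesis's TSM: `window_score` (mean brightness of a square window) is a hypothetical stand-in for a detector score, and the parameters `factor`, `threshold` and `margin` are illustrative values that do not come from the thesis. It shows only the control flow: scan a downsampled image everywhere, then rescore the full-resolution image only near the coarse candidates.

```python
def downsample(image, factor):
    """Average non-overlapping factor x factor blocks (partial edge blocks are dropped)."""
    rows, cols = len(image) // factor, len(image[0]) // factor
    return [[sum(image[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for c in range(cols)]
            for r in range(rows)]


def window_score(image, top, left, size):
    """Hypothetical detector score: mean intensity of a size x size window."""
    total = sum(image[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size))
    return total / (size * size)


def all_positions(image, size):
    """Every (top, left) at which a size x size window fits inside the image."""
    rows, cols = len(image), len(image[0])
    return [(t, l) for t in range(rows - size + 1) for l in range(cols - size + 1)]


def coarse_to_fine(image, size, factor=2, threshold=0.5, margin=2):
    """Return (score, top, left) of the best full-resolution window, or None."""
    # Coarse stage: exhaustive scan of the small image with a smaller window.
    small = downsample(image, factor)
    small_size = max(1, size // factor)
    candidates = [(t, l) for t, l in all_positions(small, small_size)
                  if window_score(small, t, l, small_size) >= threshold]

    # Refined stage: rescore only full-resolution windows near a candidate.
    rows, cols = len(image), len(image[0])
    nearby = set()
    for t, l in candidates:
        for dt in range(-margin, margin + 1):
            for dl in range(-margin, margin + 1):
                ft, fl = t * factor + dt, l * factor + dl
                if 0 <= ft <= rows - size and 0 <= fl <= cols - size:
                    nearby.add((ft, fl))
    if not nearby:
        return None
    return max((window_score(image, t, l, size), t, l) for t, l in nearby)


# Toy input: a 16x16 black image with one bright 4x4 square at row 5, column 9.
image = [[0.0] * 16 for _ in range(16)]
for r in range(5, 9):
    for c in range(9, 13):
        image[r][c] = 1.0
print(coarse_to_fine(image, size=4))  # (1.0, 5, 9)
```

In this toy case the refined stage scores far fewer full-resolution windows than an exhaustive scan would, which is the kind of saving the two-level design aims for. How much is saved in practice depends on how many candidates the coarse stage lets through.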