
Graduate student: 黃士傑
Shih-Chieh Huang
Thesis title: 以多層式樹狀結構模型進行人臉地標點偵測與追蹤
Hierarchical Tree Structured Model for Facial Landmark Detection and Tracking
Advisor: 徐繼聖
Gee-Sern Hsu
Oral defense committee: 莊仁輝
鍾國亮
王鈺強
Degree: Master
Department: College of Engineering - Department of Mechanical Engineering
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar)
Language: Chinese
Number of pages: 93
Keywords (Chinese): 人臉偵測, 地標點定位, 角度估測
Keywords (English): face detection, landmark localization, pose estimation

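Unlike earlier approaches that treat face detection, pose estimation, and landmark localization separately in practical systems, the Tree Structured Model (TSM) addresses all three problems with a single unified model. It is computationally expensive, however (about 10 seconds per image on average at 640×480), and it cannot handle small faces (below 80×80) or images containing multiple faces, so it does not meet the needs of practical systems.
This study proposes, for the first time, the Hierarchical Tree Structured Model (HTSM) to address both detection speed and landmark localization accuracy. The HTSM consists of a two-level TSM, made up of a smaller-scale coarse model (coarse TSM, c-TSM) and a refined model (refined TSM, r-TSM), together with a Bilateral Support Vector Regressor (BSVR). The c-TSM is built on low-resolution images and is meant to detect face candidate regions roughly but quickly. The r-TSM is built on high-resolution images and is meant to mark the landmarks with a more accurate model around the candidate regions detected by the c-TSM. The forward BSVR then takes the landmarks located by the r-TSM as a basis and estimates a denser set of landmarks with larger errors. Finally, the backward BSVR takes the landmarks estimated by the forward BSVR and relocates them into a more converged set. This thesis also examines the accuracy and speed of three feature methods: Histogram of Oriented Gradients (HOG), Local Binary Patterns (LBP), and Discrete Cosine Transform (DCT), and draws a conclusion from the comparison. Finally, the thesis provides a complete comparison of accuracy and run time for the HTSM under different parameter settings on different benchmark databases, showing that the proposed method is highly competitive and can meet the needs of practical systems.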
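
The hand-off from the coarse stage to the refined stage described above can be illustrated with a small sketch. It does not reproduce the thesis's implementation: the function names, the (x0, y0, x1, y1) box layout, the downsampling factor, and the padding margin are all assumptions made for illustration, and the detectors themselves are left out. It only shows how candidate boxes found on a low-resolution image might be mapped back to the full-size image and padded before a refined model searches them.

```python
from typing import List, Tuple

# Hypothetical box layout for this sketch: (x0, y0, x1, y1) in pixels.
Box = Tuple[int, int, int, int]


def to_full_resolution(box: Box, scale: float) -> Box:
    """Map a box found on the downsampled image back to the full-size image."""
    x0, y0, x1, y1 = box
    return (round(x0 * scale), round(y0 * scale),
            round(x1 * scale), round(y1 * scale))


def pad_and_clip(box: Box, margin: float, width: int, height: int) -> Box:
    """Grow a box by `margin` (a fraction of its size) and clip it to the image."""
    x0, y0, x1, y1 = box
    dx = round((x1 - x0) * margin)
    dy = round((y1 - y0) * margin)
    return (max(0, x0 - dx), max(0, y0 - dy),
            min(width, x1 + dx), min(height, y1 + dy))


def refinement_regions(coarse_boxes: List[Box], scale: float, margin: float,
                       width: int, height: int) -> List[Box]:
    """Turn coarse-stage candidates into search regions for a refined stage."""
    return [pad_and_clip(to_full_resolution(b, scale), margin, width, height)
            for b in coarse_boxes]
```

For instance, with a 640×480 image downsampled by an assumed factor of 4, a coarse box (20, 10, 60, 50) maps to (80, 40, 240, 200), and with a 25% margin the refined stage would search (40, 0, 280, 240).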


Although the Tree Structured Model (TSM) has proven effective for solving face detection, pose estimation, and landmark localization in a unified model, its slow runtime makes it unfavorable in practical applications, especially for images with multiple faces. We propose the Hierarchical Tree Structured Model (HTSM) to improve run-time speed and localization accuracy. The HTSM is composed of two component TSMs, the coarse TSM (c-TSM) and the refined TSM (r-TSM), and a Bilateral Support Vector Regressor (BSVR). The c-TSM is built on the low-resolution octaves of samples so that it provides coarse but fast face detection. The r-TSM is built on the mid-resolution octaves so that it can locate the landmarks on the face candidates given by the c-TSM and improve precision. The r-TSM based landmarks are used in the forward BSVR as references to locate a dense set of landmarks, which are then used in the backward BSVR to relocate the landmarks with large localization errors. The forward and backward regressions are iterated until convergence. We also study three feature extraction methods, HOG, LBP, and DCT, and draw a conclusion on their accuracy and speed. The performance of the HTSM is validated on three benchmark databases, Multi-PIE, LFPW, and AFW, and compared with the latest TSM to demonstrate its efficacy.
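
Of the three features named above, LBP is the simplest to show in a few lines. The following is a minimal sketch of the basic 3×3 LBP operator and its 256-bin histogram, not the thesis's feature extractor: the clockwise bit ordering starting at the top-left neighbour, the `>=` comparison, and the function names are conventions assumed for this example, and border pixels are simply skipped.

```python
from typing import List, Sequence

# Clockwise neighbour offsets (row, col), starting at the top-left pixel.
# The starting point and direction are a convention assumed for this sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Sequence[Sequence[int]], r: int, c: int) -> int:
    """Basic 8-neighbour LBP code of the interior pixel at (r, c)."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # A neighbour at least as bright as the center sets its bit.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: Sequence[Sequence[int]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

A histogram of this kind, computed per image cell, is one common way to turn LBP codes into a feature vector. Whether the thesis uses this basic variant or an extended one is not stated in the abstract.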

Table of contents

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
List of Algorithms
Chapter 1 Introduction
  1.1 Research Background and Motivation
    1.1.1 Landmark Localization
    1.1.2 Face Detection
    1.1.3 Pose Estimation
    1.1.4 Landmark Regression
  1.2 Method Overview
  1.3 Contributions
  1.4 Thesis Organization
Chapter 2 Related Work
  2.1 Landmark Localization
    2.1.1 Active Shape Models (ASMs)
    2.1.2 Active Appearance Models (AAMs)
    2.1.3 Constrained Local Models (CLMs)
  2.2 Face Detection
    2.2.1 Adaboost
    2.2.2 Quasi-random weighted sampling + trimming
  2.3 Pose Estimation
    2.3.1 Tree Structured Model
  2.4 Landmark Regression
    2.4.1 Discriminative Response Map Fitting (DRMF)
    2.4.2 Fitting Active Appearance Models (Fitting AAMs)
Chapter 3 Main Methods and Workflow
  3.1 Deformable Part Model (DPM)
  3.2 Tree Structured Model (TSM)
    3.2.1 Training Module
    3.2.2 Detection Module
  3.3 Support Vector Regression (SVR)
  3.4 Hierarchical Tree Structured Model (HTSM)
  3.5 Discussion of TSM Parameters
Chapter 4 Feature Methods
  4.1 Local Binary Patterns (LBP)
  4.2 Discrete Cosine Transform (DCT)
Chapter 5 Experimental Setup and Analysis
  5.1 Benchmark Databases
    5.1.1 Multi-Pose-Illumination-Expression (Multi-PIE)
    5.1.2 Labeled Face Parts in the Wild (LFPW)
    5.1.3 Annotated Face in the Wild (AFW)
    5.1.4 Helen
    5.1.5 FRGC Ver.2
  5.2 Experimental Design
  5.3 Results and Analysis
    5.3.1 Comparison of Training Modules Combining Different Databases
    5.3.2 Performance of c-TSM and r-TSM under Different Settings
    5.3.3 Performance with Different Part Positions
    5.3.4 Performance after Adding BSVR
    5.3.5 Comparison of Feature Pyramid Adjustments
    5.3.6 Comparison with Added Illumination-Variation Samples
    5.3.7 Performance of Different Feature Methods
    5.3.8 Comparison with Related Work
    5.3.9 Performance of HTSM Regression Schemes
    5.3.10 300 Faces In-The-Wild Challenge Test
Chapter 6 Real-Time System Implementation and Performance Evaluation
  6.1 System Architecture
Chapter 7 Conclusions and Future Work
References

