Graduate student: 郭煜民
Yu-Ming Kuo
Thesis title: 基於稀疏編碼分享的多層式樹狀模型之人臉地標點定位
Multi-layer Tree Structured Model with Sparse Coded Clustering for Facial Landmark Localization
Advisor: 徐繼聖
Gee-Sern Hsu
Committee members: 陳祝嵩
Chu-Song Chen
康峻宏
Jiunn-Horng Kang
鍾國亮
Kuo-Liang Chung
Degree: Master
Department: College of Engineering - Department of Mechanical Engineering
Year of publication: 2016
Academic year of graduation: 104
Language: Chinese
Number of pages: 102
Keywords (translated from Chinese): face detection, facial landmark estimation, tree structured model
Keywords (English): Face detection, Face landmark localization, Tree Structured Model
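  • We propose a part-sharing model based on sparse coding for facial landmark localization. The Multi-Layer Tree Structured Model (MTSM) consists of three TSM (Tree Structured Model) modules: the coarse TSM (c-TSM), the median TSM (m-TSM), and the refine TSM (r-TSM). Unlike the Regressive TSM (RTSM), which uses a two-layer architecture of c-TSM and r-TSM, the MTSM adds an m-TSM that quickly rejects the false positives produced by the c-TSM during the coarse-level search. The m-TSM is built on lower-resolution images and uses fewer parts than the r-TSM, so it can quickly screen the face candidate regions supplied by the c-TSM. Because most false positives are eliminated by the m-TSM, only the remaining face candidates are passed to the relatively complex r-TSM, which reduces the overall detection time. We also propose a sharing model based on sparse coded parts within the TSM framework for landmark detection. All parts are processed by dictionary learning, so part features can be expressed as sparse codes, and the tree models are built by sharing parts across different clusters of those sparse codes. We call the MTSM built on this architecture the Sparse Coded TSM (SC-TSM). We compare the MTSM, SC-TSM, and RTSM with other recent methods on four public databases (Helen, AFW, AFLW, and LFPW) and confirm that the MTSM and SC-TSM are competitive.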
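The time saving claimed for the three-layer design can be illustrated with a toy cascade. The following is a minimal sketch, not the thesis implementation: the `Candidate` and `Stage` classes, the `face_likeness` scores, the thresholds, and the per-candidate costs are all hypothetical values chosen for illustration. It only shows why inserting a cheap filter before an expensive stage lowers the total work when most candidates are false positives.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    """Hypothetical face candidate, as if produced by a coarse detector."""
    id: int
    face_likeness: float  # made-up score in [0, 1]


@dataclass
class Stage:
    """One cascade stage: a scoring function, a threshold, and a relative cost."""
    name: str
    score: Callable[[Candidate], float]
    threshold: float
    cost_per_candidate: float  # hypothetical relative cost, not a measured time


def run_cascade(candidates: List[Candidate],
                stages: List[Stage]) -> Tuple[List[Candidate], float]:
    """Run candidates through the stages in order.

    Each stage is charged for every candidate that reaches it, and only
    candidates scoring at or above the stage threshold move on.
    """
    survivors = list(candidates)
    total_cost = 0.0
    for stage in stages:
        total_cost += stage.cost_per_candidate * len(survivors)
        survivors = [c for c in survivors if stage.score(c) >= stage.threshold]
    return survivors, total_cost


# Ten made-up candidates; most are false positives with low scores.
scores = [0.9, 0.1, 0.2, 0.8, 0.05, 0.3, 0.15, 0.7, 0.25, 0.1]
candidates = [Candidate(i, s) for i, s in enumerate(scores)]

# For simplicity both stages read the same toy score; real stages would
# evaluate different models at different resolutions.
m_tsm = Stage("m-TSM", lambda c: c.face_likeness, threshold=0.5, cost_per_candidate=1.0)
r_tsm = Stage("r-TSM", lambda c: c.face_likeness, threshold=0.75, cost_per_candidate=10.0)

two_layer, two_layer_cost = run_cascade(candidates, [r_tsm])
three_layer, three_layer_cost = run_cascade(candidates, [m_tsm, r_tsm])

print([c.id for c in two_layer], two_layer_cost)
print([c.id for c in three_layer], three_layer_cost)
```

In this toy setting both pipelines keep the same two candidates, but the expensive stage runs on three candidates instead of ten. The actual gain depends on how many false positives the cheap stage rejects and on the real cost ratio between the stages.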


    We propose the Multi-Layer Tree Structured Model (MTSM) with sparse coded part sharing for facial landmark localization. The MTSM consists of three component TSMs (Tree Structured Models), namely the coarse TSM (c-TSM), the median TSM (m-TSM), and the refined TSM (r-TSM). Unlike the Regressive TSM (RTSM), which uses two components, the c-TSM and the r-TSM, the MTSM has an additional m-TSM that blocks the false positives generated by the c-TSM in the coarse search phase. The m-TSM is built on a lower resolution and with fewer parts than the r-TSM, and can therefore process the candidates detected by the c-TSM at higher speed. Because most false positives are filtered out by the m-TSM, fewer candidates are left to be processed by the relatively complex r-TSM, which shortens the overall processing time. We also propose sparse coded part sharing as a new version of TSM-based landmark localization. All parts are processed by dictionary learning so that sparse coded representations of the part features can be defined. The parts are clustered using their sparse coded representations, and novel tree structures are defined using the parts across different clusters. This approach yields the Sparse Coded TSM (SC-TSM). We compare the MTSM, the SC-TSM, the RTSM, and other contemporary approaches on four benchmark databases, Helen, AFW, AFLW, and LFPW, and show that the performance of the MTSM and the SC-TSM is competitive.
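The idea of grouping parts by their sparse codes can be sketched in a few lines. This is an illustration under stated assumptions, not the method of the thesis: the dictionary here is fixed by hand rather than learned, greedy matching pursuit stands in for whatever sparse solver the thesis uses, and grouping by the dominant atom stands in for its unspecified clustering step. The part names and feature vectors are invented.

```python
import math
from collections import defaultdict
from typing import Dict, List


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]


def matching_pursuit(x: List[float], atoms: List[List[float]],
                     n_nonzero: int) -> List[float]:
    """Greedy sparse coding with at most n_nonzero selection steps.

    At each step, pick the unit-norm atom most correlated with the
    residual, add that correlation to its coefficient, and subtract the
    explained component from the residual.
    """
    residual = list(x)
    code = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        corr = [dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if abs(corr[k]) < 1e-12:
            break  # residual is already fully explained
        code[k] += corr[k]
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return code


def cluster_by_dominant_atom(codes: Dict[str, List[float]]) -> Dict[int, List[str]]:
    """Group parts whose sparse codes share the same largest-magnitude atom."""
    clusters = defaultdict(list)
    for name, code in codes.items():
        k = max(range(len(code)), key=lambda i: abs(code[i]))
        clusters[k].append(name)
    return dict(clusters)


# Hand-made dictionary of three unit-norm atoms (an assumption; a real
# dictionary would be learned from part features).
atoms = [normalize([2.0, 0.0, 0.0]),
         normalize([0.0, 2.0, 0.0]),
         normalize([0.0, 0.0, 2.0])]

# Invented part features, for illustration only.
part_features = {
    "left_eye_corner": [0.9, 0.1, 0.0],
    "right_eye_corner": [0.8, 0.0, 0.2],
    "mouth_corner": [0.1, 0.0, 0.95],
    "nose_tip": [0.0, 1.0, 0.1],
}

codes = {name: matching_pursuit(x, atoms, n_nonzero=2)
         for name, x in part_features.items()}
clusters = cluster_by_dominant_atom(codes)
print(clusters)
```

Parts that land in the same cluster are candidates for sharing one part template, which is the motivation for part sharing in tree structured models. How the thesis turns such clusters into tree structures is not stated in the abstract and is not modeled here.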

    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Background and Motivation
            1.1.1 Landmark Localization
            1.1.2 Face Detection
            1.1.3 Pose Angle Estimation
            1.1.4 Landmark Regression
        1.2 Method Overview
        1.3 Contributions
        1.4 Thesis Organization
    Chapter 2 Related Work
        2.1 Related Theory of Landmark Localization
            2.1.1 Active Shape Models (ASMs)
            2.1.2 Active Appearance Models (AAMs)
            2.1.3 Constrained Local Models (CLMs)
            2.1.4 Tree Structured Model (TSM)
        2.2 Face Detection
            2.2.1 Adaboost
            2.2.2 Regions with CNN features for face detection
        2.3 Landmark Regression
            2.3.1 Discriminative Response Map Fitting (DRMF)
            2.3.2 Supervised Descent Method (SDM)
            2.3.3 Robust Cascaded Pose Regression (RCPR)
            2.3.4 Gauss-Newton Deformable Part Model (GN-DPM)
            2.3.5 Coarse-to-Fine Auto-Encoder Networks (CFAN)
            2.3.6 Coarse-to-Fine Shape Searching (CFSS)
            2.3.7 Tasks-Constrained Deep Convolutional Network (TCDCN)
            2.3.8 Regressive Tree Structured Model (RTSM)
    Chapter 3 Main Method and Workflow
        3.1 Multi-layer Tree Structured Model
        3.2 Models for Handling Large Rotations in Roll and Pitch
        3.3 Dictionary Learning and Sparse Coding
    Chapter 4 Introduction to the View-Group and Sparse Coded Sharing Models
        4.1 Introduction to the View-Group Sharing Model
        4.2 Tree Structured Model Shared with Sparse Coded Cluster
            4.2.1 Parameter Settings
            4.2.2 Selecting Clustered Part Filters by Initial Score
    Chapter 5 Experimental Setup and Analysis
        5.1 Benchmark Databases
            5.1.1 Multi-PIE
            5.1.2 LFPW
            5.1.3 AFW
            5.1.4 Helen
            5.1.5 AFLW
            5.1.6 MTFL
            5.1.7 Wider Face
        5.2 Experimental Design
        5.3 Experimental Results and Analysis
            5.3.1 Performance Comparison of the MTSM under Different View-Group Sharing
            5.3.2 Comparison of cm-TSM and R-CNN Face Initialization
            5.3.3 Performance Comparison of the Optimized BSVR
            5.3.4 Performance Comparison of Sparse Coded Sharing Models under Different Clusterings
            5.3.5 Performance Comparison of MTSM and SC-TSM
            5.3.6 Performance Comparison after Adding the sub Roll & Pitch r-TSM
            5.3.7 Comparison with Related Work
    Chapter 6 Conclusions and Future Work
    References

