Author: Ching-Yi Kuo (郭靜宜)
Title: A Study on Facial Normal Map Generation based on Deep Learning (基於深度學習之人臉法向量貼圖生成研究)
Advisor: Tzung-Han Lin (林宗翰)
Committee Members: Li-Chen Ou (歐立成), Pei-Li Sun (孫沛立), Kuo-Jui Hu (胡國瑞)
Degree: Master
Department: Graduate Institute of Color and Illumination Technology, College of Applied Science and Technology
Publication Year: 2023
Graduation Academic Year: 111 (ROC calendar, 2022-2023)
Language: Chinese
Pages: 72
Keywords: Facial normal map, Facial landmark, Convolutional neural network (CNN), U-net, Photometric stereo, Image relighting

Abstract:
    Due to the complexity and diversity of human faces and the uncertainty of photographic conditions, generating a high-quality normal map from one or more face images is a challenging task.
    This study proposes a model architecture based on the U-net convolutional neural network: from a single front-facing, uniformly lit color face image, the model generates a 512×512 face normal map.
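
    To make the proposed architecture concrete, below is a minimal sketch of a U-net-style encoder-decoder in PyTorch that maps a 3-channel RGB image to a 3-channel normal map. The depth, channel widths, and output activation are illustrative assumptions rather than the thesis's exact configuration; the nearest-neighbor upsampling mirrors the sampling method listed in the table of contents (Section 3.3.2).

        import torch
        import torch.nn as nn

        def conv_block(in_ch, out_ch):
            # Two 3x3 convolutions with ReLU: the standard U-net building block.
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            )

        class NormalMapUNet(nn.Module):
            def __init__(self):
                super().__init__()
                self.enc1 = conv_block(3, 64)
                self.enc2 = conv_block(64, 128)
                self.enc3 = conv_block(128, 256)
                self.pool = nn.MaxPool2d(2)
                self.bottleneck = conv_block(256, 512)
                self.up = nn.Upsample(scale_factor=2, mode="nearest")
                self.dec3 = conv_block(512 + 256, 256)
                self.dec2 = conv_block(256 + 128, 128)
                self.dec1 = conv_block(128 + 64, 64)
                self.head = nn.Conv2d(64, 3, 1)  # 3 channels: (x, y, z) of the normal

            def forward(self, x):
                e1 = self.enc1(x)                   # 512x512
                e2 = self.enc2(self.pool(e1))       # 256x256
                e3 = self.enc3(self.pool(e2))       # 128x128
                b = self.bottleneck(self.pool(e3))  # 64x64
                d3 = self.dec3(torch.cat([self.up(b), e3], dim=1))   # skip connections
                d2 = self.dec2(torch.cat([self.up(d3), e2], dim=1))
                d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1))
                return torch.tanh(self.head(d1))    # normal components in [-1, 1]

        model = NormalMapUNet()
        rgb = torch.rand(1, 3, 512, 512)            # one front-facing, evenly lit face image
        normal_map = model(rgb)                     # shape (1, 3, 512, 512)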
    We employed a photographic system based on photometric stereo to collect the face images needed for training. In total, 32 sets of training data and 6 sets of test data were captured with this system, and 2 additional test sets were captured with a mobile phone. To increase data diversity, we used relighting to augment the training data from 1 image to 16,128 images. We also segmented and numbered face regions using facial landmarks and concatenated this region-numbering data with the images so that the model could learn the different parts of the face; see the sketches after this paragraph and at the end of the abstract. Finally, the training data was grouped to train ten separate models.
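
    Under simple Lambertian assumptions, the two geometry steps described above can be sketched as follows: photometric stereo recovers per-pixel normals from several images taken under known directional lights, and the recovered normals are then reused to render the face under new lights for augmentation. The image data, light count, and light directions below are illustrative stand-ins, not the thesis's actual capture rig.

        import numpy as np

        H = W = 512
        rng = np.random.default_rng(0)

        # (1) Photometric stereo: I = albedo * (n . l); solve for n per pixel.
        L = np.array([[0.0, 0.0, 1.0],   # known unit light directions, shape (K, 3)
                      [0.6, 0.0, 0.8],
                      [-0.6, 0.0, 0.8],
                      [0.0, 0.6, 0.8]])
        I = rng.random((4, H, W))        # K grayscale images under those lights

        G = np.linalg.pinv(L) @ I.reshape(4, -1)   # least squares, (3, H*W): G = albedo * n
        albedo = np.linalg.norm(G, axis=0)         # vector length recovers the albedo
        normals = (G / (albedo + 1e-8)).T.reshape(H, W, 3)

        # (2) Relighting augmentation: render under a new directional light.
        def relight(albedo_map, normal_map, light_dir):
            shading = np.clip(normal_map @ light_dir, 0.0, None)  # clamped n . l
            return albedo_map * shading

        new_light = np.array([0.3, -0.4, 0.866])
        new_light /= np.linalg.norm(new_light)
        augmented = relight(albedo.reshape(H, W), normals, new_light)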
    In the experiments, we evaluated model performance by comparing the ground-truth normal maps with the generated ones using cosine similarity and the Structural Similarity Index (SSIM). The results showed that for test data captured with the photographic system, the model trained on 2,304 augmented images performed best, while for test data captured with a mobile phone, the model trained on 16,128 augmented images performed best. This demonstrates that our augmentation method increases data diversity and improves model performance. The experiments also showed that for test data from the photographic system, good results were achieved even without the region-numbering data, whereas for mobile-phone test data the results were more stable when the numbering data was included.
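
    The two evaluation metrics can be sketched as below: a mean per-pixel cosine similarity between normal maps, and SSIM computed with scikit-image. Any face-region masking or other preprocessing the thesis applies before scoring is omitted here.

        import numpy as np
        from skimage.metrics import structural_similarity

        def mean_cosine_similarity(n_true, n_pred, eps=1e-8):
            # Normalize each pixel's normal vector, then average the dot products.
            a = n_true / (np.linalg.norm(n_true, axis=-1, keepdims=True) + eps)
            b = n_pred / (np.linalg.norm(n_pred, axis=-1, keepdims=True) + eps)
            return float(np.mean(np.sum(a * b, axis=-1)))

        gt = np.random.rand(512, 512, 3) * 2 - 1          # ground-truth normals in [-1, 1]
        pred = gt + 0.05 * np.random.randn(512, 512, 3)   # stand-in model output

        cos = mean_cosine_similarity(gt, pred)
        ssim = structural_similarity(gt, pred, channel_axis=2, data_range=2.0)
        print(f"cosine similarity = {cos:.4f}, SSIM = {ssim:.4f}")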
    Finally, we observed that the quality of the models' results degrades when the input face image has messy hair or hair covering parts of the face. We therefore required input images to meet certain restrictions: a front-facing pose, a clean facial appearance, hair tied up or fixed behind the ears, and no glasses.
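
    As a concrete illustration of the face-region numbering mentioned earlier, the sketch below builds an integer label map from dlib's 68 facial landmarks and stacks it onto the RGB image as an extra input channel. The choice of detector, the landmark-index groups, and the region IDs are assumptions; the thesis may use a different landmark scheme and region layout.

        import cv2
        import dlib
        import numpy as np

        detector = dlib.get_frontal_face_detector()
        predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

        # Standard 68-point landmark groups mapped to integer region IDs.
        REGIONS = {1: range(17, 27),   # eyebrows
                   2: range(36, 48),   # eyes
                   3: range(27, 36),   # nose
                   4: range(48, 68)}   # mouth

        def region_map(image_bgr):
            gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
            faces = detector(gray)
            labels = np.zeros(gray.shape, dtype=np.uint8)
            if faces:
                pts = np.array([(p.x, p.y) for p in predictor(gray, faces[0]).parts()],
                               dtype=np.int32)
                for region_id, idx in REGIONS.items():
                    hull = cv2.convexHull(pts[list(idx)])
                    cv2.fillConvexPoly(labels, hull, region_id)
            return labels

        img = cv2.imread("face.jpg")
        labels = region_map(img)
        model_input = np.dstack([img, labels])  # RGB + region channel for the network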

    Table of Contents: Abstract (Chinese), Abstract (English), Acknowledgments, Table of Contents, List of Figures, List of Tables
    Chapter 1  Introduction
        1.1 Research Background
        1.2 Research Motivation and Objectives
        1.3 Thesis Organization
    Chapter 2  Literature Review
        2.1 Relighting and Normal Map Studies
            2.1.1 Image Processing Methods
            2.1.2 Photometric Stereo
            2.1.3 Artificial Intelligence Methods
        2.2 Face Detection Studies
            2.2.1 Face Datasets
            2.2.2 Face Detection Methods
        2.3 Image Similarity Evaluation Methods
        2.4 The U-net Convolutional Neural Network
    Chapter 3  Experimental Dataset and Neural Network Model
        3.1 Building the Experimental Dataset
            3.1.1 Hardware
            3.1.2 Capture Environment
            3.1.3 Normal Map Generation
            3.1.4 Training Data Collection
        3.2 Image Preprocessing
            3.2.1 Background Removal
            3.2.2 Data Augmentation
            3.2.3 Generating Face-Region Numbering Data
        3.3 Neural Network Design
            3.3.1 U-net Architecture
            3.3.2 Nearest-Neighbor Sampling
        3.4 Model Parameters
            3.4.1 Loss Function
            3.4.2 Evaluation Metrics
            3.4.3 Optimizer
            3.4.4 Normalization Encoding
        3.5 Experimental Design
            3.5.1 Data Grouping
            3.5.2 Model Training
            3.5.3 Experiment Details
    Chapter 4  Results and Discussion
        4.1 Experiment 1: Analysis of Generated Normal Maps
            4.1.1 Model Outputs
            4.1.2 Detail Comparison of Generated Normal Maps
            4.1.3 Implausible Areas in Generated Normal Maps
        4.2 Experiment 2: Model Evaluation (Cosine Similarity and SSIM)
        4.3 Experiment 3: Relighting
    Chapter 5  Conclusions and Future Work
        5.1 Conclusions
        5.2 Future Work
    References
    Appendix 1, Appendix 2

