
Student: Meng-Ju Wu (巫孟儒)
Thesis Title: Multi-View Learning for 3D Human Body Reconstruction (以多視角學習法進行三維人體重建)
Advisor: Gee-Sern Hsu (徐繼聖)
Committee Members: Sheng-Luen Chung (鍾聖倫), Chia-Wen Lin (林嘉文), Shang-Hong Lai (賴尚宏), Huei-Yung Lin (林惠勇)
Degree: Master
Department: College of Engineering, Department of Mechanical Engineering
Year of Publication: 2021
Graduation Academic Year: 109
Language: Chinese
Pages: 81
Chinese Keywords: 3D human body reconstruction, multi-view learning
English Keywords: 3D Human Body Reconstruction, Multi-View Learning

Abstract (Chinese, translated):
We propose a Multi-View Volumetric Learning (MVVL) network for 3D human body reconstruction. From a set of multi-view 2D human images, it reconstructs a 3D human body model in the corresponding pose and with sufficient surface detail. MVVL consists of two main modules: a Multi-View Auto-Encoder (MVAE) and an Adaptive Boundary Module (ABM). Given a set of multi-view human images, the MVAE first encodes them into feature vectors, which are then decoded by a 3D decoder into view-dependent 3D voxel human models. A view-fusion module merges the models from all views into a coarse 3D voxel human body. The ABM then refines this coarse body and reconstructs a more detailed 3D voxel human model as the final output. In addition, to facilitate the training of MVVL, we construct a pose-augmented dataset from the SIZER dataset, named the SIZER-PA dataset. The contributions of this work are threefold: 1) the low computational cost of training MVVL, which makes our method more advantageous than other volumetric reconstruction methods; 2) competitive performance compared with state-of-the-art volumetric methods; and 3) the SIZER-PA dataset, offered to a field with a high demand for 3D human datasets; it contains clothed, textured 3D human models in a variety of poses and can be used for learning 3D human body reconstruction.


Abstract (English):
We propose the Multi-View Volumetric Learning (MVVL) network for 3D human body reconstruction from 2D images. Given a set of multi-view 2D human images, the MVVL network generates a 3D human body model with the same pose as the 2D input and sufficient surface detail. The MVVL network is composed of two major modules: the Multi-View Auto-Encoder (MVAE) and the Adaptive Boundary Module (ABM). The MVAE encodes the 2D images into latent features with an image encoder and decodes the latent features with a 3D decoder into view-dependent 3D human shapes, which are fused by a multi-view fusion module into a coarse 3D human model. The ABM enhances the 3D boundaries of the coarse model and generates a refined 3D body model as the output. To facilitate the training of the MVVL network, we construct a pose-augmented dataset from the SIZER dataset and call it the SIZER-PA dataset. The contributions of this study are threefold. The first is the low computational cost of training the MVVL network, which makes our approach more advantageous than other volumetric reconstruction approaches. The second is competitive performance against state-of-the-art volumetric approaches. The third is the SIZER-PA dataset, offered to a field with a high demand for quality 3D human datasets.
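
As a rough illustration of the pipeline described in the abstract, the following is a minimal Python (PyTorch) sketch of the MVVL forward pass: a 2D encoder maps each view to a latent vector, a 3D decoder produces a view-dependent occupancy volume, the per-view volumes are fused into a coarse volume, and an ABM-style module refines it. All layer sizes, the 32^3 voxel resolution, the four-view input, the averaging fusion, and the residual refinement are illustrative assumptions for this sketch, not the settings used in the thesis.

# Minimal sketch of the MVVL pipeline described in the abstract.
# Layer sizes, voxel resolution, view count, fusion-by-averaging, and the
# residual refinement are illustrative assumptions, not the thesis's settings.
import torch
import torch.nn as nn


class MultiViewAutoEncoder(nn.Module):
    """MVAE stand-in: encode each 2D view, decode to a view-dependent voxel volume."""

    def __init__(self, latent_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(                        # 2D image encoder
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
        self.fc = nn.Linear(latent_dim, 64 * 4 * 4 * 4)      # latent -> 4^3 seed volume
        self.decoder = nn.Sequential(                        # 3D decoder (4^3 -> 32^3)
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (B, V, 3, H, W) -> per-view occupancy volumes (B, V, D, D, D)
        b, v = views.shape[:2]
        z = self.encoder(views.flatten(0, 1))                # (B*V, latent_dim)
        seed = self.fc(z).view(b * v, 64, 4, 4, 4)
        vol = torch.sigmoid(self.decoder(seed))              # (B*V, 1, 32, 32, 32)
        return vol.view(b, v, *vol.shape[2:])


class AdaptiveBoundaryModule(nn.Module):
    """ABM stand-in: refine the coarse volume with a small 3D conv stack."""

    def __init__(self):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 1, 3, padding=1),
        )

    def forward(self, coarse: torch.Tensor) -> torch.Tensor:
        # Residual refinement of the coarse occupancy volume.
        return torch.sigmoid(self.refine(coarse.unsqueeze(1)).squeeze(1) + coarse)


class MVVL(nn.Module):
    def __init__(self):
        super().__init__()
        self.mvae = MultiViewAutoEncoder()
        self.abm = AdaptiveBoundaryModule()

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        per_view = self.mvae(views)           # (B, V, D, D, D)
        coarse = per_view.mean(dim=1)         # simple view-fusion stand-in (average)
        return self.abm(coarse)               # refined occupancy volume


if __name__ == "__main__":
    images = torch.rand(1, 4, 3, 128, 128)    # one sample with four hypothetical views
    occupancy = MVVL()(images)
    print(occupancy.shape)                    # torch.Size([1, 32, 32, 32])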

Table of Contents:
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
Chapter 1  Introduction
  1.1 Background and Motivation
  1.2 Method Overview
  1.3 Contributions
  1.4 Thesis Organization
Chapter 2  Literature Review
  2.1 Pix2Vox
  2.2 PointRend
  2.3 3D Human Body Datasets
    2.3.1 THUman Dataset
    2.3.2 RenderPeople Dataset
    2.3.3 Synthetic hUmans foR REAL (SURREAL) Dataset
    2.3.4 SIZER Dataset
  2.4 Skinned Multi-Person Linear Model
  2.5 Implicit Function Methods
    2.5.1 PIFu
    2.5.2 Geo-PIFu
  2.6 Voxel-Based Methods
    2.6.1 BodyNet
    2.6.2 DeepHuman
Chapter 3  Proposed Method
  3.1 Overall Network Architecture
  3.2 Multi-View Auto-Encoder Design
  3.3 Adaptive Boundary Module Design
  3.4 Loss Functions
Chapter 4  The SIZER-PA Dataset
Chapter 5  Experimental Setup and Analysis
  5.1 Experimental Setup
    5.1.1 Data Splits and Settings
    5.1.2 Evaluation Metrics
    5.1.3 Experimental Design
  5.2 Experimental Results and Analysis
    5.2.1 Comparison of Adaptive Boundary Module Settings
    5.2.2 Comparison of Different Loss Functions
    5.2.3 Comparison of 2D Image Encoder Architectures
    5.2.4 Effect of the 2D Projected Silhouette Loss
    5.2.5 Effect of Adding a Local Loss on Reconstruction
    5.2.6 Effect of Loss-Function Parameters and Weights
  5.3 Comparison with Related Work
Chapter 6  Conclusion and Future Work
References

