Graduate student: | 巫孟儒 Meng-Ju Wu |
---|---|
Thesis title: | 以多視角學習法進行三維人體重建 Multi-View Learning for 3D Human Body Reconstruction |
Advisor: | 徐繼聖 Gee-Sern Hsu |
Oral defense committee: | 鍾聖倫 Sheng-Luen Chung, 林嘉文 Chia-Wen Lin, 賴尚宏 Shang-Hong Lai, 林惠勇 Huei-Yung Lin |
Degree: | Master |
Department: | College of Engineering - Department of Mechanical Engineering |
Year of publication: | 2021 |
Academic year of graduation: | 109 (ROC calendar) |
Language: | Chinese |
Number of pages: | 81 |
Keywords (Chinese): | 三維人體重建, 多視角學習 |
Keywords (English): | 3D Human Body Reconstruction, Multi-View Learning |
We propose the Multi-View Volumetric Learning (MVVL) network for 3D human body reconstruction. Given a set of multi-view 2D human images, the MVVL network generates a 3D human body model that has the same pose as the person in the images and sufficient surface detail. The network consists of two major modules: the Multi-View Auto-Encoder (MVAE) and the Adaptive Boundary Module (ABM). In the MVAE, an image encoder maps the 2D images to latent features, and a 3D decoder decodes these features into view-dependent 3D voxel models of the body, which a multi-view fusion module merges into a single coarse 3D voxel model. The ABM then enhances the 3D boundaries of the coarse model and outputs a refined 3D voxel body model. To facilitate training of the MVVL network, we construct a pose-augmented dataset from the SIZER dataset and call it the SIZER-PA dataset; it contains clothed, textured 3D human models in a variety of poses and can be used for learning 3D human body reconstruction. The contributions of this study are threefold: (1) a low computational cost for training the MVVL network, which gives our approach an advantage over other volumetric reconstruction approaches; (2) performance competitive with state-of-the-art volumetric approaches; and (3) the SIZER-PA dataset, offered to a field with a high demand for quality 3D human datasets.
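The fusion step described in the abstract, where view-dependent voxel models are merged into one coarse model, can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not specify how the fusion module works, so the per-voxel softmax weighting, the names `fuse_views` and `binarize`, the `view_scores` input, the 0.5 threshold, and the 8×8×8 grid are all assumptions made for illustration.

```python
import numpy as np


def fuse_views(view_volumes, view_scores):
    """Fuse per-view occupancy volumes into one coarse volume.

    view_volumes: array of shape (V, D, H, W) with occupancy
        probabilities in [0, 1], one volume per view.
    view_scores: array of shape (V, D, H, W) with unnormalized
        per-voxel confidence scores (hypothetical input).
    """
    # Softmax over the view axis, so the weights at each voxel sum to 1.
    shifted = view_scores - view_scores.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    # Weighted average of the per-view occupancy probabilities.
    return (weights * view_volumes).sum(axis=0)


def binarize(volume, threshold=0.5):
    """Turn occupancy probabilities into a boolean voxel grid."""
    return volume >= threshold


# Toy data: 4 views of an 8x8x8 grid (sizes chosen only for the demo).
rng = np.random.default_rng(0)
views = rng.random((4, 8, 8, 8))

# Equal scores for every view reduce the fusion to a plain average.
scores = np.zeros_like(views)
coarse = fuse_views(views, scores)
voxels = binarize(coarse)
```

In this sketch, equal scores give every view the same weight, while a much larger score for one view lets that view dominate at the corresponding voxels. A learned module would predict such scores from the data instead of receiving them as input.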