
Author: He Lin (林赫)
Title: 3D Mesh Reconstruction from Single View Image via Multi-Scale Feature Fusion Deformation Network
Advisor: Shun-Feng Su (蘇順豐)
Committee Members: Yung-Yao Chen (陳永耀), Kai-Lung Hua (花凱龍), Mei-Yung Chen (陳美勇), Hsien-I Lin (林顯易)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Academic Year of Graduation: 109 (ROC calendar)
Language: English
Pages: 63
Keywords: 3D Reconstruction from Single View Image, Multilayer Perceptron, Graph Convolutional Network, Deep Learning, Shape Feature Learning
This thesis studies 3D mesh reconstruction from single-view images using the proposed multi-scale feature fusion deformation network. 3D reconstruction is a computer vision technique that infers the shape of an object from an image and rebuilds a 3D model from that shape information. The proposed network consists of a backbone network, a multi-scale image network, a CBAM (Convolutional Block Attention Module), a multilayer perceptron (MLP), and a graph convolutional network (GCN), and it is trained with a new loss function. The network first applies CBAM to focus on the shape features of the input image. The multi-scale image network then extracts object features at several scales and fuses them, so that the network captures shape features at multiple scales and gains a larger receptive field for mesh reconstruction. An MLP decodes the fused shape features directly into a coarse mesh, which makes the reconstruction more efficient. Finally, because graph convolutions efficiently exchange features among the vertices of a graph and model the relationship between vertices and edges, a GCN, combined with an alignment between the attended feature maps and the 3D model, refines the coarse mesh. The network outputs the final mesh, and losses are computed over it to train the whole network. The network is trained and tested on three categories of the ShapeNet dataset, reaching an average Chamfer distance of 0.472 and average F-scores of 60.06% / 76.93% at thresholds τ / 2τ.
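
The encoding side of this pipeline can be illustrated with a short sketch. The PyTorch-style code below is a minimal, hypothetical rendering of the attention and multi-scale fusion steps; module names such as MultiScaleFusionEncoder, the layer sizes, and the three branch strides are illustrative assumptions, not the thesis's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CBAM(nn.Module):
    # Channel attention followed by spatial attention (Woo et al., ECCV 2018),
    # simplified to its essentials.
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel gate built from average- and max-pooled descriptors.
        gate = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) +
                             self.mlp(x.amax(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * gate
        # Spatial gate built from channel-wise average and max maps.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
        return x * torch.sigmoid(self.spatial(s))

class MultiScaleFusionEncoder(nn.Module):
    # Extracts features at several resolutions and fuses them, which enlarges
    # the effective receptive field as the abstract describes.
    def __init__(self, out_dim=256):
        super().__init__()
        self.stem = nn.Sequential(           # stand-in for the backbone network
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.attend = CBAM(128)
        self.branches = nn.ModuleList([      # parallel branches, one per scale
            nn.Conv2d(128, out_dim, 3, stride=s, padding=1) for s in (1, 2, 4)])
        self.fuse = nn.Conv2d(3 * out_dim, out_dim, 1)

    def forward(self, img):
        f = self.attend(self.stem(img))
        scales = [branch(f) for branch in self.branches]
        size = scales[0].shape[-2:]
        up = [F.interpolate(s, size=size, mode='bilinear', align_corners=False)
              for s in scales]
        return self.fuse(torch.cat(up, dim=1))  # fused multi-scale feature map

In the actual network the stem would be a deep backbone; the sketch only shows how the attention and fusion stages compose.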

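The decoding side can be sketched in the same spirit: an MLP maps a pooled image feature to coarse vertex positions, and graph convolutions exchange features among neighbouring vertices to refine the mesh, mirroring the MLP-then-GCN design in the abstract. All names and sizes below are assumptions; the per-vertex image features would come from projecting each vertex into the attended feature map, which is omitted here.

import torch
import torch.nn as nn

class CoarseMLP(nn.Module):
    # Decodes a global image feature into N coarse vertex coordinates.
    def __init__(self, feat_dim=256, n_verts=2562):
        super().__init__()
        self.n_verts = n_verts
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, n_verts * 3))

    def forward(self, feat):                  # feat: (B, feat_dim)
        return self.mlp(feat).view(-1, self.n_verts, 3)

class GraphConv(nn.Module):
    # One graph convolution: aggregate neighbour features through a
    # normalized adjacency matrix, then apply a shared linear map.
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):                # x: (B, N, C), adj: (N, N)
        return torch.relu(self.linear(torch.matmul(adj, x)))

class RefineGCN(nn.Module):
    # Refines coarse vertices with per-vertex image features; the output is a
    # residual offset added to the input positions.
    def __init__(self, vert_feat_dim, hidden=192):
        super().__init__()
        self.g1 = GraphConv(3 + vert_feat_dim, hidden)
        self.g2 = GraphConv(hidden, hidden)
        self.out = nn.Linear(hidden, 3)

    def forward(self, verts, vert_feats, adj):
        h = torch.cat([verts, vert_feats], dim=-1)
        h = self.g2(self.g1(h, adj), adj)
        return verts + self.out(h)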
Chinese Abstract I
Abstract II
Acknowledgements III
Table of Contents IV
List of Figures VI
List of Tables VIII
Chapter 1 Introduction 1
    1.1 Background 1
    1.2 Motivation 2
    1.3 Baseline Model 3
    1.4 Contribution and Model Structure 4
    1.5 Thesis Organization 6
Chapter 2 Related Work 7
    2.1 Feature-Based Methods 7
    2.2 Learning-Based Methods 7
Chapter 3 Methodology 10
    3.1 Preliminary: Graph Convolution 11
    3.2 System Overview 11
    3.3 Multi-Scaled Image Network 13
    3.4 Mesh Deformation 15
        3.4.1 Multilayer Perceptron 15
        3.4.2 Graph Convolutional Network 16
    3.5 Projection Layer 18
    3.6 Convolutional Block Attention Module 19
    3.7 Losses 21
Chapter 4 Experiments 23
    4.1 Dataset 23
    4.2 Evaluation Metrics 24
        4.2.1 Chamfer Distance 24
        4.2.2 F-score 24
    4.3 Training and Testing Process 26
        4.3.1 Training Phase 26
        4.3.2 Testing Phase 27
    4.4 Implementation Details 28
        4.4.1 Hardware Environment 28
        4.4.2 Training Details and Hyperparameter Settings 28
    4.5 Comparison with the State of the Art 29
    4.6 Ablation Study 38
        4.6.1 Multi-Scaled Convolutional Neural Network 38
        4.6.2 K-Nearest-Neighbor Loss 39
    4.7 Self-Defined Initial Point Cloud Experiment 40
Chapter 5 Conclusions and Future Work 46
    5.1 Conclusion 46
    5.2 Future Work 47
References 48
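
The thesis evaluates with Chamfer distance and F-score at thresholds τ and 2τ (Sections 4.2.1 and 4.2.2). The sketch below shows one common way to compute both metrics on point sets sampled from the predicted and ground-truth meshes; thresholding on squared distances with τ = 1e-4 follows common practice in this line of work, but this is an illustration, not the thesis's evaluation code.

import torch

def chamfer_and_fscore(pred, gt, tau=1e-4):
    # pred: (N, 3) and gt: (M, 3) points sampled from the two meshes.
    # Returns (Chamfer distance, F-score at tau, F-score at 2*tau).
    d2 = torch.cdist(pred, gt) ** 2           # (N, M) pairwise squared distances
    d2_pred = d2.min(dim=1).values            # each pred point -> nearest gt
    d2_gt = d2.min(dim=0).values              # each gt point -> nearest pred
    chamfer = d2_pred.mean() + d2_gt.mean()

    def fscore(thresh):
        precision = (d2_pred < thresh).float().mean()
        recall = (d2_gt < thresh).float().mean()
        return 2 * precision * recall / (precision + recall + 1e-8)

    return chamfer, fscore(tau), fscore(2 * tau)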


Full text release date: 2026/02/04 (campus network)
Full text release date: 2026/02/04 (off-campus network)
Full text release date: 2026/02/04 (National Central Library: Taiwan Thesis and Dissertation System)