
Graduate Student: 陳泳達 (Yong-Da Tan)
Thesis Title: 基於全景圖影像投影貼圖三維場景重建系統 (Panorama Video based Projective Texture 3D Scene Reconstruction System)
Advisor: 姚智原 (Chih-Yuan Yao)
Committee Members: 朱宏國 (Hung-Kuo Chu), 胡敏君 (Min-Chun Hu), 莊永裕 (Yung-Yu Chuang)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Publication Year: 2022
Graduation Academic Year: 110 (2021-2022)
Language: Chinese
Pages: 50
Keywords: Computer Graphics, Image Stitching, Mesh Reconstruction, VR
Abstract:

3D mesh reconstruction is one of the most widely studied and developed areas in computer vision, with indoor and outdoor scene reconstruction as prominent applications. Most methods scan the real environment with the various sensors of a scanning device to capture a point cloud of the scene, and then reconstruct a corresponding mesh from the point cloud with a mesh reconstruction algorithm. The reconstructed 3D mesh can serve many virtual reality (VR) and augmented reality (AR) applications, such as guided-tour systems and map navigation.

Among existing services, Matterport [1] is one of the most prominent web-based 3D environment visualization platforms of recent years, and it is suitable for reconstructing both indoor and outdoor scenes. Users scan their surroundings with a phone, a 360 camera, or the official hardware, and the platform reconstructs a 3D mesh. The platform also provides a 3D viewer in which users can inspect the reconstructed mesh. In Matterport's viewer, however, users move by selecting fixed hotspots rather than walking freely, because the environment is captured from fixed locations; the set of reachable positions therefore depends on how much data the user captured. While the viewpoint travels toward a target hotspot, Matterport switches between the two panoramas associated with those hotspots as textures on the 3D model, so users perceive a discontinuous change in the imagery during movement and easily notice obvious texture distortion. We therefore want users to be able to move freely through the 3D scene, and we aim to improve the user experience during movement.

In this thesis, we propose a 3D scene reconstruction system. Our system takes a panorama image sequence as input and reconstructs a 3D scene. To eliminate the texture distortion caused by discontinuous image switching, our texture is projected onto the model according to the camera's current position, and the texture image is obtained by switching frames of a panorama video. The system can also place interactive 3D objects in the scene; users can move freely along the captured route, rotate the camera, and interact with the 3D objects.
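To make the projective-texturing idea concrete, the following is a minimal numpy sketch of how per-vertex texture coordinates could be derived when an equirectangular panorama captured at a known position is projected onto the reconstructed mesh. The function names, the y-up axis convention, and the nearest-panorama selection are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def equirect_uv(directions):
    """Map unit direction vectors (y-up, right-handed) to equirectangular
    texture coordinates (u, v) in [0, 1]."""
    x, y, z = directions[:, 0], directions[:, 1], directions[:, 2]
    u = np.arctan2(x, z) / (2.0 * np.pi) + 0.5          # longitude -> u
    v = 0.5 - np.arcsin(np.clip(y, -1.0, 1.0)) / np.pi  # latitude  -> v
    return np.stack([u, v], axis=1)

def project_panorama_uv(vertices, capture_position):
    """Per-vertex UVs for projecting a panorama captured at capture_position
    onto the mesh: each vertex samples the panorama along the ray from the
    capture point through that vertex."""
    dirs = vertices - capture_position
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return equirect_uv(dirs)

def pick_panorama(camera_position, capture_positions):
    """Index of the panorama captured closest to the current camera;
    a stand-in for the thesis's panorama-video frame selection."""
    d = np.linalg.norm(capture_positions - camera_position, axis=1)
    return int(np.argmin(d))
```

In a real renderer this computation would run in a shader every frame, so the panorama "sticks" to the geometry as seen from the current capture point; selecting or switching panorama video frames as the camera moves along the route is the behavior the abstract describes at a high level.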

Table of Contents:
Abstract (Chinese); Abstract (English); Acknowledgements; Contents; List of Figures; List of Tables
1 Introduction: 1.1 Background and Motivation; 1.2 Contributions
2 Related Work: 2.1 Image Stitching; 2.2 Color Correction; 2.3 Scene Reconstruction; 2.4 Matterport; 2.5 Semantic Segmentation; 2.6 Depth Prediction; 2.7 Simultaneous Localization and Mapping; 2.8 Mesh Reconstruction
3 Method Overview: 3.1 Image Stitching; 3.2 Mesh Reconstruction; 3.3 Applications
4 Method: 4.1 System Input; 4.2 Image Stitching (4.2.1 Fisheye Image Capture; 4.2.2 Fisheye Correction; 4.2.3 Image Alignment; 4.2.4 Color Correction; 4.2.5 Seam Cutting); 4.3 Mesh Reconstruction (4.3.1 Semantic Segmentation; 4.3.2 Depth Prediction; 4.3.3 Point Cloud Construction; 4.3.4 Simultaneous Localization and Mapping; 4.3.5 SLAM Map Scale Estimation; 4.3.6 Point Cloud Alignment; 4.3.7 Road Contour Analysis; 4.3.8 Mesh Construction; 4.3.9 Texture Projection)
5 Experimental Results and Analysis: 5.1 Image Stitching (5.1.1 Performance Analysis; 5.1.2 Stitching Results); 5.2 Mesh Reconstruction (5.2.1 Performance Analysis; 5.2.2 Mesh Reconstruction Results; 5.2.3 Comparison of Reconstruction Results; 5.2.4 Comparison with Matterport [1]; 5.2.5 Applications)
6 Conclusion and Future Work
References
Authorization
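The mesh reconstruction stage in the outline (4.3.3 point cloud construction through 4.3.8 mesh construction) follows the common point-cloud-to-mesh path, and the references include screened Poisson surface reconstruction [24], a standard algorithm for that step. As a hedged illustration of that generic step only, not the thesis's actual code, here is a minimal Open3D sketch; the file names and parameters are placeholders.

```python
import numpy as np
import open3d as o3d

# Load the point cloud produced by earlier pipeline stages
# ("scene_points.ply" is a hypothetical file name).
pcd = o3d.io.read_point_cloud("scene_points.ply")

# Poisson reconstruction needs consistently oriented normals.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
pcd.orient_normals_consistent_tangent_plane(30)

# Screened Poisson surface reconstruction [24]; depth sets the octree
# resolution (higher = finer mesh, more memory).
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)

# Trim vertices supported by few input points, since Poisson extrapolates
# a watertight surface even where there was no data.
densities = np.asarray(densities)
mesh.remove_vertices_by_mask(densities < np.quantile(densities, 0.05))

o3d.io.write_triangle_mesh("scene_mesh.ply", mesh)
```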

References:

[1] “Matterport.” https://matterport.com/.
[2] C.-C. Lin, S. U. Pankanti, K. Natesan Ramamurthy, and A. Y. Aravkin, “Adaptive as-natural-as-possible image stitching,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1155–1163, 2015.
[3] Y.-S. Chen and Y.-Y. Chuang, “Natural image stitching with the global similarity prior,” in European Conference on Computer Vision, pp. 186–201, Springer, 2016.
[4] M. Xia, J. Yao, R. Xie, M. Zhang, and J. Xiao, “Color consistency correction based on remapping optimization for image stitching,” in Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 2977–2984, 2017.
[5] J. Sun, Y. Xie, L. Chen, X. Zhou, and H. Bao, “NeuralRecon: Real-time coherent 3D reconstruction from monocular video,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15598–15607, 2021.
[6] S.-T. Yang, F.-E. Wang, C.-H. Peng, P. Wonka, M. Sun, and H.-K. Chu, “DuLa-Net: A dual-projection network for estimating room layouts from a single RGB panorama,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[7] A. Tao, K. Sapra, and B. Catanzaro, “Hierarchical multi-scale attention for semantic segmentation,” arXiv preprint arXiv:2005.10821, 2020.
[8] M. Ramamonjisoa, M. Firman, J. Watson, V. Lepetit, and D. Turmukhambetov, “Single image depth prediction with wavelet decomposition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2021.
[9] S. Sumikura, M. Shibuya, and K. Sakurada, “OpenVSLAM: A versatile visual SLAM framework,” in Proceedings of the 27th ACM International Conference on Multimedia, pp. 2292–2295, 2019.
[10] “Insta360 One X2.” https://www.insta360.com/hk/product/insta360-onex2.
[11] J. Zaragoza, T.-J. Chin, M. S. Brown, and D. Suter, “As-projective-as-possible image stitching with moving DLT,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2339–2346, 2013.
[12] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440, 2015.
[13] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890, 2017.
[14] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint arXiv:1511.07122, 2015.
[15] A. Saxena, M. Sun, and A. Y. Ng, “Make3D: Learning 3D scene structure from a single still image,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, pp. 824–840, 2008.
[16] D. Eigen and R. Fergus, “Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 2650–2658, 2015.
[17] D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a single image using a multi-scale deep network,” Advances in Neural Information Processing Systems, vol. 27, 2014.
[18] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, “ORB-SLAM: A versatile and accurate monocular SLAM system,” IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, 2015.
[19] D. Schlegel, M. Colosi, and G. Grisetti, “ProSLAM: Graph SLAM from a programmer’s perspective,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 3833–3840, IEEE, 2018.
[20] R. Munoz-Salinas and R. Medina-Carnicer, “UcoSLAM: Simultaneous localization and mapping by fusion of keypoints and squared planar markers,” Pattern Recognition, vol. 101, p. 107193, 2020.
[21] J. Engel, T. Schöps, and D. Cremers, “LSD-SLAM: Large-scale direct monocular SLAM,” in European Conference on Computer Vision, pp. 834–849, Springer, 2014.
[22] J. Engel, V. Koltun, and D. Cremers, “Direct sparse odometry,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 3, pp. 611–625, 2017.
[23] P. Labatut, J.-P. Pons, and R. Keriven, “Robust and efficient surface reconstruction from range data,” in Computer Graphics Forum, vol. 28, pp. 2275–2290, Wiley Online Library, 2009.
[24] M. Kazhdan and H. Hoppe, “Screened Poisson surface reconstruction,” ACM Transactions on Graphics (TOG), vol. 32, no. 3, pp. 1–13, 2013.
[25] V. Kwatra, A. Schödl, I. Essa, G. Turk, and A. Bobick, “Graphcut textures: Image and video synthesis using graph cuts,” ACM Transactions on Graphics (TOG), vol. 22, no. 3, pp. 277–286, 2003.
[26] G. Neuhold, T. Ollmann, S. Rota Bulo, and P. Kontschieder, “The Mapillary Vistas dataset for semantic understanding of street scenes,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 4990–4999, 2017.
[27] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361, IEEE, 2012.

Full-Text Release Date: 2024/08/24 (campus network)
Full-Text Release Date: 2024/08/24 (off-campus network)
Full-Text Release Date: 2024/08/24 (National Central Library: Taiwan NDLTD)