
Author: Chiung-Jung Pan (潘炯融)
Title: Study of Applying Adaptive Morphable Block Depth Estimation Technique to Stereo Vision (應用適應性變形區塊深度估測技術於立體視覺之研究)
Advisor: Sheng-Dong Xu (徐勝均)
Committee Members: Cheng-Hao Ko (柯正浩), Chung-Cheng Chiu (瞿忠正)
Degree: Master
Department: Graduate Institute of Automation and Control, College of Engineering
Year of Publication: 2023
Graduating Academic Year: 111 (2022-2023)
Language: Chinese
Number of Pages: 75
Keywords: Stereo Vision, Block Matching, Depth Estimation, 3D Reconstruction
Abstract:

    Stereo vision detection has long been one of the important techniques in image processing. Applications such as 3D street view require accurate and rich 3D information to be recovered from images. However, current image matching techniques still face many problems and challenges. The main problem is that the matched points are too sparse, so a complete and dense disparity map cannot be produced and the matching accuracy is difficult to assess. Although researchers have proposed various sophisticated block matching methods to address these problems, such methods require considerably more matching time, so most recent research has focused on hardware acceleration. As an alternative to hardware-based improvements, this study proposes an Adaptive Morphable Block Depth Estimation algorithm. Its steps are as follows: (1) use a Gaussian pyramid to reduce the image matching time; (2) on the lowest-resolution image, compare matching results obtained with different block sizes to detect the characteristics of each processed pixel; (3) use these pixel characteristics to classify the image into regions; (4) for each pixel, perform adaptive morphable block matching according to its classified region; (5) propagate the matching from the lowest resolution up to the original resolution to detect dense depth information; (6) complete the reconstruction of the 3D image. Finally, the proposed method is compared with related algorithms from recent years. The experimental results show that it successfully improves the accuracy of image matching, obtains denser matching results, and greatly reduces the matching time. The detection results can therefore be used to project richer and more complete 3D street view images.
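    To make the coarse-to-fine pipeline concrete, the following is a minimal Python/OpenCV sketch, not the thesis implementation: it illustrates the generic building blocks named in steps (1), (2), and (5), that is, SAD block matching, pixel classification by comparing two block sizes, and matching on a Gaussian pyramid, under the assumption of rectified grayscale input. All function names and parameters (sad_disparity, classify_pixels, coarse_to_fine_disparity, block, max_disp, tol) are illustrative, and the adaptive morphable block stage of steps (3) and (4) is not reproduced here.

    import cv2
    import numpy as np

    def sad_disparity(left, right, block=9, max_disp=64):
        # Winner-take-all SAD block matching on rectified grayscale images:
        # for each candidate disparity d, shift the right image by d pixels,
        # aggregate absolute differences over a block x block window with a
        # box filter, and keep the lowest-cost disparity per pixel.
        left = left.astype(np.float32)
        right = right.astype(np.float32)
        h, w = left.shape
        best_cost = np.full((h, w), np.inf, dtype=np.float32)
        disparity = np.zeros((h, w), dtype=np.float32)
        for d in range(max_disp + 1):
            shifted = right if d == 0 else np.pad(right, ((0, 0), (d, 0)))[:, :w]
            cost = cv2.boxFilter(np.abs(left - shifted), -1, (block, block))
            better = cost < best_cost
            best_cost[better] = cost[better]
            disparity[better] = d
        return disparity

    def classify_pixels(left, right, max_disp=64, small=5, large=15, tol=1.0):
        # Rough analogue of step (2): match with two block sizes and flag
        # pixels where the results disagree (likely depth discontinuities or
        # weak texture), so a later stage could adapt the block shape there.
        d_small = sad_disparity(left, right, small, max_disp)
        d_large = sad_disparity(left, right, large, max_disp)
        return np.abs(d_small - d_large) > tol

    def coarse_to_fine_disparity(left, right, levels=3, block=9, max_disp=64):
        # Rough analogue of steps (1) and (5): match at the coarsest level of
        # a Gaussian pyramid, then upsample. A full coarse-to-fine matcher
        # would refine the estimate at every level instead of rescaling it.
        pyr_left, pyr_right = [left], [right]
        for _ in range(levels - 1):  # cv2.pyrDown blurs and halves each side
            pyr_left.append(cv2.pyrDown(pyr_left[-1]))
            pyr_right.append(cv2.pyrDown(pyr_right[-1]))
        scale = 2 ** (levels - 1)
        coarse = sad_disparity(pyr_left[-1], pyr_right[-1], block,
                               max(1, max_disp // scale))
        h, w = left.shape
        # Disparities scale with image width, so multiply after resizing.
        return cv2.resize(coarse, (w, h),
                          interpolation=cv2.INTER_NEAREST) * scale

    For example, calling coarse_to_fine_disparity on a rectified pair such as a Middlebury 2014 image pair loaded with cv2.imread(path, cv2.IMREAD_GRAYSCALE) yields a dense, inexpensive disparity estimate that an adaptive refinement stage could then improve near the pixels flagged by classify_pixels.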

Table of Contents:

    Acknowledgments
    Abstract (Chinese)
    Abstract (English)
    Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
        1.1  Research Background and Motivation
        1.2  Research Objectives
        1.3  Thesis Organization
    Chapter 2  Literature Review
    Chapter 3  Adaptive Morphable Block Depth Estimation Algorithm
        3.1  Multi-Block Image Matching
        3.2  Analysis of Matching Results and Disparity Map Fusion
        3.3  Image Region Classification and Labeling
        3.4  Adaptive Morphable Block Matching
    Chapter 4  Experimental Results
        4.1  Middlebury 2014 Stereo Vision Dataset
        4.2  Street View Images Captured by Our Laboratory
    Chapter 5  Conclusions and Future Work
        5.1  Conclusions
        5.2  Future Work
    References


    Full-text release date: 2025/08/15 (campus network)
    Full-text release date: 2025/08/15 (off-campus network)
    Full-text release date: 2025/08/15 (National Central Library: Taiwan thesis and dissertation system)