
Graduate Student: Jun-Wei Chen (陳俊瑋)
Thesis Title: Hierarchical Multi-Scale Attention Networks for High Resolution Depth Estimation
Advisor: Yie-Tarng Chen (陳郁堂)
Committee Members: 呂永和, Ming-Bo Lin (林銘波), Wen-Hsien Fang (方文賢), Shanq-Jang Ruan (阮聖彰), Yie-Tarng Chen (陳郁堂)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Publication Year: 2021
Academic Year: 109 (2020-2021)
Language: English
Pages: 37
Keywords: Monocular Depth Estimation, Multi-Scale Attention Architecture, High Resolution Depth Estimation

This work presents a novel approach to high-resolution monocular depth estimation. Despite the rapid progress in monocular depth estimation in recent years, depth estimation for high-resolution images remains challenging, because a monocular depth estimation network built on a single scale faces a dilemma between high-frequency details and structural consistency. Specifically, a network designed for low-resolution images produces depth maps that lack high-frequency details but preserve the overall structure, whereas feeding the same image to a network tailored to high-resolution input recovers the high-frequency details at the cost of degraded structural information. Inspired by the success of multi-resolution inference in semantic segmentation, we first redesign a multi-scale network that provides depth maps at different scales simultaneously. Next, a multi-scale attention mechanism is applied to fuse the depth maps from the different scales, so that the association between the two scales can be reconstructed. We verify the performance of the proposed approach on the CARLA and MVS-Synth datasets and demonstrate its application to novel view synthesis as well.
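The attention-based fusion described above can be sketched as follows. This is a minimal illustrative sketch, not the thesis implementation: the function names are invented here, nearest-neighbor upsampling stands in for whatever interpolation the network uses, and the attention maps are assumed to already be normalized to [0, 1] (e.g. by a sigmoid). It only shows the core idea of hierarchical multi-scale attention fusion: a per-pixel attention map predicted at the coarser scale blends the upsampled coarse depth prediction with the finer-scale prediction, and the fused result is carried to the next scale.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a 2-D map (stand-in for bilinear interpolation)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse_two_scales(depth_low, depth_high, attn_low):
    """Blend a coarse depth map with the next finer one using per-pixel attention.

    depth_low : (H, W) coarse-scale depth prediction
    depth_high: (2H, 2W) fine-scale depth prediction
    attn_low  : (H, W) attention weights in [0, 1], predicted at the coarse scale
    """
    alpha = upsample2x(attn_low)
    # alpha -> trust the structure-preserving coarse prediction;
    # (1 - alpha) -> trust the detail-rich fine prediction.
    return alpha * upsample2x(depth_low) + (1.0 - alpha) * depth_high

def hierarchical_fuse(preds, attns):
    """Fold a coarse-to-fine list of predictions into one high-resolution depth map.

    preds: depth maps ordered coarse -> fine, each scale 2x the previous
    attns: one attention map per fusion step, at the coarser scale of each pair
    """
    fused = preds[0]
    for depth_high, attn in zip(preds[1:], attns):
        fused = fuse_two_scales(fused, depth_high, attn)
    return fused
```

With attention fixed at 0.5 everywhere, the fused map is simply the average of the upsampled coarse prediction and the fine prediction; in the learned setting the network instead assigns high attention to regions where the coarse, structure-consistent estimate is more reliable (large smooth surfaces) and low attention where fine detail matters (object boundaries).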

Chapter 1 Introduction
  1.1 Motivations
  1.2 Summary of Thesis
  1.3 Contributions
  1.4 Thesis Outline
Chapter 2 Related Work
  2.1 Monocular Depth Estimation
  2.2 Multi-Scale Context Methods
  2.3 Relational Context Methods
Chapter 3 Method
  3.1 HRNet v2
  3.2 Hierarchical Multi-Scale Attention
  3.3 Loss Function
  3.4 Scale Recovery
  3.5 3D-Photo Inpainting
Chapter 4 Experiments and Results
  4.1 Experiment Details
    4.1.1 Data Augmentation
  4.2 Training Datasets
    4.2.1 MVS-Synth Dataset
    4.2.2 CARLA Dataset
    4.2.3 Data Pre-processing
  4.3 Additional Evaluation Dataset
    4.3.1 ITRI Dataset
  4.4 Evaluation Metrics
  4.5 Experimental Results
    4.5.1 Virtual-World Depth Estimation
    4.5.2 Real-World Depth Estimation
Chapter 5 Conclusion
References

