
Graduate Student: Kuan-Yi Wu (吳冠毅)
Thesis Title: Novel View Synthesis Method for Self-driving Cars (利用新視角合成方法應用於自動駕駛車輛之研究)
Advisors: Wen-Hsien Fang (方文賢), Yie-Tarng Chen (陳郁堂)
Committee Members: Wen-Hsien Fang (方文賢), Yie-Tarng Chen (陳郁堂), 邱建青, 賴坤財, 鍾聖倫
Degree: Master (碩士)
Department: Department of Electronic and Computer Engineering (電資學院 - 電子工程系)
Year of Publication: 2020
Graduation Academic Year: 108
Language: English
Number of Pages: 54
Keywords (Chinese): 新視圖合成、多平面圖像、圖像修復、影片修復
Keywords (English): Novel View Synthesis, Multi-Plane Image, Image Inpainting, Video Inpainting

Novel view synthesis is a core problem in computer vision and graphics. It attempts to generate arbitrary views of three-dimensional (3D) objects or scenes from one or more known images taken from different viewpoints, and it has attracted wide attention because of its potential applications in the emerging fields of virtual and augmented reality (VR/AR). In this thesis, we consider a framework that takes multiple source views as input, which can be a monocular or a stereo sequence. To this end, we first use a convolutional neural network (CNN) with 3D convolutional layers to expand each sampled view into a local light field with a multi-plane image (MPI) representation. Afterwards, we can generate an unseen sight, called a novel view, by blending the nearby layer representations. Furthermore, to repair the missing hole regions in the images and maintain temporal coherence, we use a CNN encoder to align frames with affine matrices and extract similar content between them, so the framework can take cues from the sequence and fill in the missing holes. In this way, by adding the video-inpainting network, we can enhance the novel-view prediction capability compared with the original novel-view network.


View synthesis is a central problem in computer vision and graphics. It attempts to generate arbitrary views of three-dimensional (3D) objects or scenes from one or more known images taken from varying viewpoints and has attracted lots of attention due to its potential applications in the emerging fields of virtual and augmented reality (VR/AR). In this thesis, we consider a framework which takes multiple source views as input, which could be a monocular or stereo sequence. Toward this end, we first utilize a convolutional neural network (CNN) with 3D convolutional layers to expand each sampled view to a local light field with a multiple-plane-image (MPI) representation. Afterward, we can generate an unseen sight, called a novel view, by blending the nearby layer representations. Furthermore, to fix the image hole regions and maintain temporal coherency, we use the CNN encoder to align and extract similar contents between frames with affine matrices, so the framework can take cues among the sequence and fill up the missing holes. Thereby, we can enhance the novel view prediction capability compared with the previous works.
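The following is a minimal, illustrative sketch (not the thesis code) of the two operations the abstract refers to: rendering a view from a multi-plane image by compositing its RGBA planes back-to-front with the standard "over" operator, and filling hole pixels in a target frame by sampling a reference frame that has been aligned to it with an affine matrix. All function and variable names are hypothetical; warping the planes into the target camera and estimating the affine matrix are assumed to be done elsewhere.

import numpy as np

def composite_mpi(rgba_planes):
    """Render an image from an MPI whose planes are already warped into the
    target view. rgba_planes: array of shape (D, H, W, 4), ordered
    back-to-front, RGB and alpha in [0, 1]. Returns an (H, W, 3) image."""
    out = np.zeros(rgba_planes.shape[1:3] + (3,), dtype=np.float32)
    for plane in rgba_planes:                       # back-to-front
        rgb, alpha = plane[..., :3], plane[..., 3:]
        out = rgb * alpha + out * (1.0 - alpha)     # "over" compositing
    return out

def fill_holes_from_reference(target, holes, reference, affine):
    """Copy-paste style hole filling: each hole pixel in `target` is replaced
    by the `reference` pixel found through a 2x3 affine matrix that maps
    target (x, y, 1) coordinates into the reference frame.
    target, reference: (H, W, 3); holes: (H, W) bool mask, True = missing."""
    h, w = holes.shape
    ys, xs = np.nonzero(holes)
    src = affine @ np.stack([xs, ys, np.ones_like(xs)])   # (2, N) source coords
    sx = np.clip(np.rint(src[0]).astype(int), 0, w - 1)   # nearest neighbour
    sy = np.clip(np.rint(src[1]).astype(int), 0, h - 1)
    out = target.copy()
    out[ys, xs] = reference[sy, sx]
    return out

Blending the renderings produced from several nearby MPIs into the final novel view could then be, for instance, a weighted average of such composited images, with weights depending on the distance between the source and target viewpoints.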

Table of Contents:
Introduction
Background and Related Work
Proposed Method
Experimentation and Results
Conclusion
Appendix: Results on ITRI’s Dataset


Full-text release date: 2025/08/28 (campus network)
Full-text release date: Not authorized for public release (off-campus network)
Full-text release date: Not authorized for public release (National Central Library: Taiwan Theses and Dissertations System)