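Automatic Generation of Video Navigation from Google Street View Database with Object Detection, Image Inpainting and Stereoscopic Virtual Reality Display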

Graduate student:	鄭元棓 Yuan-Bang Cheng
Thesis title:	Automatic Generation of Video Navigation from Google Street View Database with Object Detection, Image Inpainting and Stereoscopic Virtual Reality Display
Advisors:	楊傳凱 Chuan-Kai Yang, 張登文 Teng-Wen Chang
Oral examination committee:	王照明 Chao-Ming Wang, 孫沛立 Pei-Li Sun, 花凱龍 Kai-Lung Hua, 楊傳凱 Chuan-Kai Yang, 張登文 Teng-Wen Chang
Degree:	Doctor
Department:	School of Management - Department of Information Management
Year of publication:	2019
Academic year of graduation:	107 (ROC calendar)
Language:	English
Number of pages:	167
Keywords:	Google Street View, Object Detection, Image Inpainting, HOG and Exemplar-SVMs, Haar and Adaboost, GPU, Caffe and Faster R-CNN, Depth Map Prediction, DIBR, Stereoscopic Virtual Reality 360 (3DVR360), Unity and HTC Vive

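Over the past decade, artificial intelligence and deep learning have been studied extensively in computer science. Meanwhile, Google Street View is a service we use often: it lets us look up street-level imagery of a destination we want to reach. However, very few studies automatically convert Google Street View images directly into a navigation video that also includes object detection and image inpainting, and very few convert such a navigation video into a stereoscopic 360-degree virtual reality (3DVR360) display that users can view while wearing an HTC Vive.
In my research, I try to combine two of the most popular research areas in computer science today: deep learning (artificial intelligence) and virtual reality. For navigation video generation, I have developed three versions of my system. The first, GSVPlayer-HH&I (Google Street View Player with HOG+Haar object detection and image inpainting), mainly uses CPU-based methods for object detection and image inpainting. The second, GSVPlayer-FRRCNN&I (Google Street View Player with Faster R-CNN object detection and image inpainting), builds on GSVPlayer-HH&I but uses a GPU-based method (Faster R-CNN) for object detection instead. The third is GSVPlayer-3DVR360 (Google Street View Player with stereoscopic 360-degree virtual reality display). In this version, I implement a series of image processing steps, monocular depth map estimation, depth-image-based rendering, and a 3DVR360 display. For this version, the results show that all users were satisfied with GSVPlayer-3DVR360 even though the system requires a longer computation time.
In my dissertation, I present the quantitative and qualitative results and evaluations for each of the three versions, and I explain the discussion and limitations of each in detail. In short, the conclusion of this dissertation is that the proposed system is a complete, integrated framework and that I use a correct system flow and methods.
For future research, there are many potential directions to explore, including the use of multiple computing servers, a convolutional neural network for monocular depth map estimation that takes temporal order into account, synthesizing many new frames, the YOLO object detection method, and object detection and image inpainting for high-resolution images.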

In recent years, there has been abundant research in artificial intelligence and deep learning. At the same time, Google Street View images are in common use: we can use Google Street View to look up street-level views of a destination we want to visit. However, little work automatically transforms Google Street View images directly into a navigation video with object detection and image inpainting, and little work makes the generated navigation video usable with an HTC Vive to display a stereoscopic 360-degree virtual reality (3DVR360) effect.
This study tries to combine two of the currently most popular research areas in computer science, deep learning (or artificial intelligence) and virtual reality. In total, three versions of the system were developed for navigation video generation. First, GSVPlayer-HH&I (i.e., Google Street View Player with HOG+Haar and Inpainting) mainly adopts CPU-based methods for object detection and image inpainting. Second, GSVPlayer-FRRCNN&I (i.e., Google Street View Player with Faster R-CNN and Inpainting) builds on GSVPlayer-HH&I but instead uses a GPU-based method (Faster R-CNN) for object detection. Third, GSVPlayer-3DVR360 (i.e., Google Street View Player with Stereoscopic Virtual Reality 360 Display) implements a series of image processing steps, monocular depth map estimation, depth-image-based rendering (DIBR), and a 3DVR360 display. One result is that, even though this version requires a longer computation time, all users were still satisfied with GSVPlayer-3DVR360.
In this dissertation, quantitative and qualitative results and evaluations are presented for each of the three versions, and the discussion and limitations of each are explained explicitly. In brief, the proposed system is a complete, integrated framework.
For future work, several potential directions can be explored, including the use of multiple computing servers, a new CNN for monocular depth estimation that considers the temporal sequence, synthesizing novel frames, the YOLO object detection method, and object detection and image inpainting on high-resolution images.
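As a rough illustration of the depth-image-based rendering (DIBR) step named in the abstract, the sketch below synthesizes a second-eye view by shifting each pixel horizontally in proportion to its depth value and then filling the resulting holes. This record does not describe the dissertation's actual DIBR procedure, so everything here is an assumption made for illustration: the function name `render_second_view`, the `max_disparity` parameter, the convention that a depth of 1.0 means nearest to the camera, and the simple neighbor-copy hole filling.

```python
def render_second_view(image, depth, max_disparity=2):
    """Generic DIBR sketch (not the dissertation's method).

    image: list of rows of pixel values.
    depth: same shape, values in [0, 1]; 1.0 is assumed to mean nearest.
    Returns a new view in which nearer pixels are shifted further left.
    """
    height, width = len(image), len(image[0])
    view = [[None] * width for _ in range(height)]
    for y in range(height):
        # Warp far pixels first so that nearer pixels overwrite them.
        for x in sorted(range(width), key=lambda col: depth[y][col]):
            target = x - round(depth[y][x] * max_disparity)
            if 0 <= target < width:
                view[y][target] = image[y][x]
        # Holes open up to the right of shifted foreground pixels, so
        # copy the nearest valid pixel from the right (background side).
        fill = None
        for x in reversed(range(width)):
            if view[y][x] is None:
                view[y][x] = fill
            else:
                fill = view[y][x]
        # Holes touching the right border have no right neighbor;
        # fall back to the nearest valid pixel on the left.
        fill = None
        for x in range(width):
            if view[y][x] is None:
                view[y][x] = fill
            else:
                fill = view[y][x]
    return view


# One near pixel (value 30) shifts left by one and its hole is filled.
print(render_second_view([[10, 20, 30, 40]], [[0.0, 0.0, 1.0, 0.0]], max_disparity=1))
# [[10, 30, 40, 40]]
```

Pairing the original image with the synthesized one gives a left/right stereo pair; a real pipeline would typically work on full-color panoramas and use a more careful hole-filling strategy than copying a neighbor.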

Abstract (Chinese)    I
ABSTRACT    III
Acknowledgements    V
TABLE OF CONTENTS    VI
LIST OF FIGURES    IX
LIST OF TABLES    XIV
Chapter 1    Introduction    1
1    Motivation    2
2    Purposes    2
3    Contribution    3
4    Scope    3
5    Organization    4
Chapter 2    Related Works    6
1    Applications of Google Earth and Google Street View    6
2    Object Detection    7
2.1.    CPU-Based Machine Learning    7
2.2.    GPU-Based Deep Learning    10
3    Foreground Extraction    12
4    Image Inpainting    12
5    Depth Map Prediction    14
6    Depth-Image-Based Rendering    18
7    Stereoscopic VR360    19
Chapter 3    Google Street View Player with HOG+Haar and Inpainting    22
1    System Architecture    22
2    System Flow    25
3    Implementation    26
3.1.    Preprocessing and First-Staged Inpainting    26
3.2.    Transformation Matrices between Two Consecutive Images    30
3.3.    Object Detection using HOG+Haar and Segmentation    33
3.4.    Road Structure Propagation and Second-Staged Inpainting    37
3.5.    Generation of the Inpainted Continuous Navigation Animation    39
Chapter 4    Google Street View Player with Faster R-CNN and Inpainting    43
1    System Architecture    43
2    System Flow    48
3    Implementation    48
3.1.    Preprocessing and First-Staged Inpainting    49
3.2.    Transformation Matrices between Two Consecutive Images    49
3.3.    Object Detection using Faster R-CNN and Segmentation    49
3.4.    Road Structure Propagation and Second-Staged Inpainting    50
3.5.    Generation of the Inpainted Continuous Navigation Animation    51
Chapter 5    Google Street View Player with Stereoscopic Virtual Reality 360 Display    52
1    System Architecture    52
2    System Flow    57
3    Implementation    59
3.1.    Image Fetching and Downloading    59
3.2.    Image Stitching    61
3.3.    Monocular Depth Map Estimation    66
3.4.    Depth-Image-Based Rendering    74
3.5.    Compressing and Uploading    84
3.6.    Unity and 3D VR 360 Display    84
Chapter 6    Results and Evaluations    93
1    GSVPlayer-HH&I    93
1.1.    System Setup    93
1.2.    Result and Evaluation    96
1.3.    Discussion and Limitation    113
2    GSVPlayer-FRRCNN&I    116
2.1.    System Setup    116
2.2.    Result and Evaluation    116
2.3.    Discussion and Limitation    124
3    GSVPlayer-3DVR360    127
3.1.    System Setup    127
3.2.    Result and Evaluation    127
3.3.    Discussion and Limitation    145
Chapter 7    Conclusion and Future Works    148
1    Conclusion    148
2    Future Works    149
2.1.    The Use of Multiple Computing Servers    149
2.2.    A New CNN of Monocular Depth Estimation with the Temporal Sequence    150
2.3.    Synthesizing Novel Frames    150
2.4.    The YOLO Object Detection Method    150
2.5.    Object Detection and Image Inpainting on High-Resolution Images    151
2.6.    Objective Method to Evaluate the “Smoother” Issue    151
2.7.    More Improvements of System and Function    152
References    155
Appendix I    162
Appendix II    165
Appendix III    167

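Full-text release date: 2024/01/30 (campus network)
Full-text release date: full text not authorized for public release (off-campus network)
Full-text release date: full text not authorized for public release (National Central Library: Taiwan Theses and Dissertations System)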
