
Author: 王煒翔 (Wei-Hsiang Wang)
Thesis title: 用於單一鏡頭移動拍攝之靜態場景相對深度估測技術
(On the Relative Depth Estimation Techniques Used for Static Scene Videos Captured by Moving a Single Camera Lens)
Advisor: 范欽雄 (Chin-Shyurng Fahn)
Committee members: 林啟芳 (Chi-Fang Lin), 黃榮堂 (Jung-Tang Huang), 馮輝文 (Huei-Wen Ferng)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2015
Graduation academic year: 103 (2014-2015)
Language: English
Number of pages: 60
Keywords: Relative Depth Estimation, Smart Device, Image Segmentation, Epipolar Geometry, Scale-invariant Feature Transform

Chinese abstract (translated):
  In recent years, depth maps have been widely applied. Kinect's real-time depth maps make it easy to capture human motion, a vital part of human-computer interaction. In the smartphone era, depth maps of static images have also become popular, used to create elaborate photo-editing effects, so relative depth estimation for static images has received much attention. Some major handset makers have even, regardless of cost, equipped smartphones with dual cameras for better image processing, yet to save costs most smartphones still carry only a single camera, which makes research on producing static-image depth maps with a single camera all the more important. Since most image-processing effects that rely on a static depth map need only relative depth information rather than accurate absolute depth, this thesis studies only the estimation of relative depth.
  The depth map in this thesis is built on the idea that the amount an object moves across consecutive frames is inversely proportional to its distance, much as when looking out the window of a moving car: nearby things sweep past quickly, while the sun or moon hardly moves.
  We record a video with a single camera, jiggling it slightly in the horizontal or vertical direction during shooting, and use it as the input for depth-map generation. Keypoints are detected in every frame; those of the first frame are matched against the keypoints of the other frames with the Scale-invariant Feature Transform, and the displacement between matched keypoints serves as the basis for depth estimation. Image segmentation then fills the depth information into all the regions. Our experimental results show that complex scenes yield more accurate depth information, because the method is built on the Scale-invariant Feature Transform and favors scenes with many feature points.


English abstract:
  In recent years, depth maps have been used extensively. The real-time depth maps captured by Kinect make it easy to track human motion, which plays an important role in human-computer interaction. Since smartphones became widespread, depth maps of static scenes have also grown popular; they are used to edit photos with special effects. To obtain better static-scene depth maps, some smartphone makers have produced handsets with dual cameras, sparing no expense, but most smartphones still carry only a single camera to keep costs down, so producing static-scene depth maps from a single camera is all the more important. Because most popular photo effects need only relative depth information rather than absolute depth, this thesis focuses on estimating relative depth.
  In this thesis, the depth map is built on the observation that the distance an object moves across consecutive video frames is inversely proportional to its depth. It is like sitting in a moving car: objects just outside the window rush past, while the sun never seems to move.
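  The abstract states this relation only informally. Under a standard pinhole model (an assumption, not spelled out here), a camera translating by T parallel to the image plane with focal length f shifts a point at depth Z across the image by

    d = fT / Z,  hence  Z ∝ 1 / d,

which is why a larger inter-frame displacement signals a nearer object, and why ranking displacements suffices for a relative (rather than absolute) depth map.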
  We use a single camera to record a video from which the depth map is produced. While recording, the camera is jiggled slightly in the vertical and/or horizontal direction. Keypoints are detected in every frame, and those of the first frame are matched against the keypoints of the other frames using the Scale-invariant Feature Transform; the displacement of each matched pair provides the depth information. Image segmentation then divides the image into several blocks, and the depth information is expanded over the whole image by filling each block with the depth of the keypoints inside it. Our experiments show that scenes with complex backgrounds yield more accurate results, since the method rests on the Scale-invariant Feature Transform and benefits from scenes rich in feature points.
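  To make the pipeline concrete, below is a minimal sketch in Python, assuming OpenCV's SIFT implementation and scikit-image's Felzenszwalb graph-based segmentation; the function names and parameter values are illustrative, not the thesis's actual code, and the RANSAC and depth-upper-bound filtering steps described in Chapter 3 are omitted.

import cv2
import numpy as np
from skimage.segmentation import felzenszwalb

def relative_depth_map(frame0, frame1):
    """Rough relative depth from two frames of a static scene shot with a jiggled camera."""
    gray0 = cv2.cvtColor(frame0, cv2.COLOR_BGR2GRAY)
    gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

    # Detect SIFT keypoints in both frames and match the first frame's
    # descriptors against those of the later frame.
    sift = cv2.SIFT_create()
    kp0, des0 = sift.detectAndCompute(gray0, None)
    kp1, des1 = sift.detectAndCompute(gray1, None)
    matches = cv2.BFMatcher().knnMatch(des0, des1, k=2)

    # Keep unambiguous matches (Lowe's ratio test); the displacement
    # magnitude of each surviving keypoint is the depth cue --
    # larger motion between frames means the point is nearer.
    positions, motion = [], []
    for pair in matches:
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < 0.75 * n.distance:
            p0 = np.array(kp0[m.queryIdx].pt)
            p1 = np.array(kp1[m.trainIdx].pt)
            positions.append(p0)
            motion.append(np.linalg.norm(p1 - p0))

    # Segment the first frame and fill every region with the mean
    # displacement of the keypoints falling inside it.
    segments = felzenszwalb(frame0, scale=100, sigma=0.5, min_size=50)
    depth = np.zeros(segments.shape, dtype=np.float64)
    for label in np.unique(segments):
        mask = segments == label
        inside = [d for (x, y), d in zip(positions, motion)
                  if mask[int(y), int(x)]]
        if inside:            # regions with no keypoints stay at 0 here
            depth[mask] = np.mean(inside)
    return depth              # larger value = nearer to the camera

  In the thesis proper, displacements are accumulated over many frames and averaged into the relative depth map (Section 3.4), and untrustworthy or empty blocks are treated separately (Chapter 4); this two-frame sketch shows only the core idea.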

中文摘要 (Chinese Abstract)
Abstract
致謝 (Acknowledgements)
Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Overview
  1.2 Motivation
  1.3 System Description
  1.4 Thesis Organization
Chapter 2 Related Works
  2.1 Depth Map Produced by Multiple Cameras
    2.1.1 Kinect
    2.1.2 HTC Smartphone M8 and M9+
  2.2 Single Camera
    2.2.1 Epipolar Geometry
    2.2.2 Structure from Motion
Chapter 3 Relative Depth Map
  3.1 Generating the Relative Depth Information from a Video
  3.2 Detection and Matching the Feature Points
    3.2.1 Building a Scale-space
    3.2.2 Building a DoG Scale-space
    3.2.3 Directionality of Feature Point
    3.2.4 Keypoint Descriptor
    3.2.5 Keypoint Matching
  3.3 Limited Recycling Incorrect Matching
    3.3.1 Removing Matching by Random Sample Consensus
    3.3.2 Removing Matching by the Upper Bound of Depth
  3.4 Average Relative Depth Map
Chapter 4 Expanding Depth Information
  4.1 Image Segmentation
  4.2 Filling Depth in the Blank of Keypoints
  4.3 Removing the Depth Unable to Trust
  4.4 Eliminating Empty Blocks
Chapter 5 Experimental Results and Discussions
  5.1 Experiment Setup
  5.2 The Precision of Relative Depth Estimation
  5.3 The Relative Relationship Results of Simple Indoor Scenes
  5.4 The Relative Relationship Results of Complex Outdoor Scenes
  5.5 Comparing Our Method with the Stereo Vision Method
Chapter 6 Conclusions and Future Works
  6.1 Conclusions
  6.2 Future Works
References


Full text available from 2020/08/21 (campus network)
Full text available from 2025/08/21 (off-campus network)
Full text available from 2025/08/21 (National Central Library: Taiwan NDLTD system)