Graduate Student: 王煒翔 (Wei-Hsiang Wang)
Thesis Title: On the Relative Depth Estimation Techniques Used for Static Scene Videos Captured by Moving a Single Camera Lens (用於單一鏡頭移動拍攝之靜態場景相對深度估測技術)
Advisor: 范欽雄 (Chin-Shyurng Fahn)
Committee Members: 林啟芳 (Chi-Fang Lin), 黃榮堂 (Jung-Tang Huang), 馮輝文 (Huei-Wen Ferng)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2015
Academic Year of Graduation: 103 (2014-2015)
Language: English
Number of Pages: 60
Keywords: Relative Depth Estimation, Smart Device, Image Segmentation, Epipolar Geometry, Scale-Invariant Feature Transform
In recent years, depth maps have been widely used. The real-time depth maps produced by the Kinect make it easy to capture human motion, which plays an important role in human-computer interaction. In the smartphone era, depth maps of static scenes have also become popular for producing elaborate photo-editing effects, so relative depth estimation for static images has drawn much attention. To obtain better image-processing results, some major phone manufacturers even equip their smartphones with dual cameras regardless of cost; however, to keep costs down, most smartphones still carry only a single camera, which makes research on producing static-scene depth maps from a single camera all the more important. Since most image-processing effects that rely on a static depth map need only relative depth information rather than accurate absolute depth, this thesis focuses on estimating relative depth.

Our depth map construction is based on the principle that the amount an object moves across consecutive frames depends on its distance: the closer the object, the larger its apparent motion. The idea resembles looking out the window of a moving car, where roadside objects sweep by quickly while the sun or moon hardly appears to move.

We use a single camera to record a video, jiggling the camera slightly in the horizontal and/or vertical direction during capture, and take this video as the input for depth map construction. Keypoints are detected in every frame; the keypoints of the first frame are then matched against those of the other frames using the scale-invariant feature transform (SIFT), and the displacement between matched keypoints serves as the basis for depth estimation. Finally, image segmentation is applied so that the depth information can be filled into every region. Our experimental results show that the estimated depth is more accurate for complex scenes, because our method is built on SIFT and therefore benefits from scenes containing more feature points.
In recent years, depth maps have been used extensively. The real-time depth maps captured by the Kinect make it easy to obtain human motion, which is very important in human-computer interaction. Moreover, as smartphones have become widespread, depth maps of static scenes have also grown popular; they are used to edit photos with special effects. To obtain better static-scene depth maps, some smartphone companies spare no effort to produce phones with dual cameras; however, to keep costs down, most smartphones still carry only a single camera, so producing static-scene depth maps from a single camera is all the more important. Because most popular photo effects need only relative depth information rather than absolute depth, this thesis estimates relative depth.

In this thesis, depth map construction is based on the relationship between an object's motion across video frames and its depth: the nearer the object, the larger its apparent motion. It is like the situation of a person sitting in a moving car, who observes that objects close to the car move past very quickly while the sun never seems to move.

We use a single camera to record a video from which the depth map is produced. While recording, the camera is jiggled slightly in the vertical and/or horizontal direction. Keypoints are detected in each frame, and the keypoints of the first frame are matched against those of the other frames using the scale-invariant feature transform (SIFT). The displacement between each pair of matched keypoints provides the depth information. Image segmentation then divides the image into several regions, which allows the depth information to be spread over the whole image by filling each region with its depth value. Our experiments show that scenes with complex backgrounds yield more accurate results.
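The displacement-to-relative-depth step described above can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function name and the keypoint coordinates are hypothetical, and in practice the matched points would come from SIFT matching between the first frame and a later frame.

```python
import numpy as np

def relative_depth_from_matches(pts_first, pts_other):
    """Estimate a relative depth value per matched keypoint.

    By motion parallax, a larger displacement between frames means the
    point is closer to the camera, so we use the normalized displacement
    magnitude as relative depth (1 = nearest, 0 = farthest).
    """
    disp = np.linalg.norm(pts_other - pts_first, axis=1)  # pixel displacement
    rng = disp.max() - disp.min()
    if rng == 0:
        return np.zeros_like(disp)  # all points moved equally; no ordering
    return (disp - disp.min()) / rng

# Hypothetical matched keypoints: the first point (a near object) moves
# much farther between frames than the last one (a distant object).
first = np.array([[10.0, 10.0], [100.0, 50.0], [200.0, 80.0]])
other = np.array([[30.0, 10.0], [105.0, 50.0], [201.0, 80.0]])
depth = relative_depth_from_matches(first, other)
# depth[0] is largest (nearest point), depth[2] smallest (farthest point).
```

In the full pipeline these per-keypoint values would then be averaged within each segmented region and filled over the whole region, so that pixels without keypoints still receive a depth estimate.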