
Author: 王煒翔 (Wei-Hsiang Wang)
Thesis title: 用於單一鏡頭移動拍攝之靜態場景相對深度估測技術
(On the Relative Depth Estimation Techniques Used for Static Scene Videos Captured by Moving a Single Camera Lens)
Advisor: 范欽雄 (Chin-Shyurng Fahn)
Committee members: 林啟芳 (Chi-Fang Lin), 黃榮堂 (Jung-Tang Huang), 馮輝文 (Huei-Wen Ferng)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2015
Graduation academic year: 103 (2014-2015)
Language: English
Number of pages: 60
Keywords: Relative Depth Estimation, Smart Device, Image Segmentation, Epipolar Geometry, Scale-invariant Feature Transform

Chinese abstract (translated):
  In recent years, depth maps have been widely applied. Kinect's real-time depth maps make it easy to capture human motion, a vital part of human-computer interaction. In the smartphone era, depth maps of static images have also become popular, used to create elaborate photo-editing effects, so relative depth estimation for static images has received much attention. Some major handset makers have even, regardless of cost, equipped smartphones with dual cameras for better image processing, yet to save costs most smartphones still carry only a single camera, which makes research on producing static-image depth maps with a single camera all the more important. Since most image-processing effects that rely on a static depth map need only relative depth information rather than accurate absolute depth, this thesis studies only the estimation of relative depth.
  The depth map in this thesis is built on the idea that the amount an object moves across consecutive frames is inversely proportional to its distance, much as when looking out the window of a moving car: nearby things sweep past quickly, while the sun or moon hardly moves.
  We record a video with a single camera, jiggling it slightly in the horizontal or vertical direction during shooting, and use it as the input for depth-map generation. Keypoints are detected in every frame; those of the first frame are matched against the keypoints of the other frames with the Scale-invariant Feature Transform, and the displacement between matched keypoints serves as the basis for depth estimation. Image segmentation then fills the depth information into all the regions. Our experimental results show that complex scenes yield more accurate depth information, because the method is built on the Scale-invariant Feature Transform and favors scenes with many feature points.


English abstract:
  In recent years, depth maps have been used extensively. The real-time depth maps captured by Kinect make it easy to track human motion, which plays an important role in human-computer interaction. Since smartphones became widespread, depth maps of static scenes have also grown popular; they are used to edit photos with special effects. To obtain better static-scene depth maps, some smartphone makers have produced handsets with dual cameras, sparing no expense, but most smartphones still carry only a single camera to keep costs down, so producing static-scene depth maps from a single camera is all the more important. Because most popular photo effects need only relative depth information rather than absolute depth, this thesis focuses on estimating relative depth.
  In this thesis, the depth map is built on the observation that the distance an object moves across consecutive video frames is inversely proportional to its depth. It is like sitting in a moving car: objects just outside the window rush past, while the sun never seems to move.
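  The abstract states this relation only informally. Under a standard pinhole model (an assumption, not spelled out here), a camera translating by T parallel to the image plane with focal length f shifts a point at depth Z across the image by

    d = fT / Z,  hence  Z ∝ 1 / d,

which is why a larger inter-frame displacement signals a nearer object, and why ranking displacements suffices for a relative (rather than absolute) depth map.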
  We use a single camera to record a video from which the depth map is produced. While recording, the camera is jiggled slightly in the vertical and/or horizontal direction. Keypoints are detected in every frame, and those of the first frame are matched against the keypoints of the other frames using the Scale-invariant Feature Transform; the displacement of each matched pair provides the depth information. Image segmentation then divides the image into several blocks, and the depth information is expanded over the whole image by filling each block with the depth of the keypoints inside it. Our experiments show that scenes with complex backgrounds yield more accurate results, since the method rests on the Scale-invariant Feature Transform and benefits from scenes rich in feature points.
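  To make the pipeline concrete, below is a minimal sketch in Python, assuming OpenCV's SIFT implementation and scikit-image's Felzenszwalb graph-based segmentation; the function names and parameter values are illustrative, not the thesis's actual code, and the RANSAC and depth-upper-bound filtering steps described in Chapter 3 are omitted.

import cv2
import numpy as np
from skimage.segmentation import felzenszwalb

def relative_depth_map(frame0, frame1):
    """Rough relative depth from two frames of a static scene shot with a jiggled camera."""
    gray0 = cv2.cvtColor(frame0, cv2.COLOR_BGR2GRAY)
    gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

    # Detect SIFT keypoints in both frames and match the first frame's
    # descriptors against those of the later frame.
    sift = cv2.SIFT_create()
    kp0, des0 = sift.detectAndCompute(gray0, None)
    kp1, des1 = sift.detectAndCompute(gray1, None)
    matches = cv2.BFMatcher().knnMatch(des0, des1, k=2)

    # Keep unambiguous matches (Lowe's ratio test); the displacement
    # magnitude of each surviving keypoint is the depth cue --
    # larger motion between frames means the point is nearer.
    positions, motion = [], []
    for pair in matches:
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < 0.75 * n.distance:
            p0 = np.array(kp0[m.queryIdx].pt)
            p1 = np.array(kp1[m.trainIdx].pt)
            positions.append(p0)
            motion.append(np.linalg.norm(p1 - p0))

    # Segment the first frame and fill every region with the mean
    # displacement of the keypoints falling inside it.
    segments = felzenszwalb(frame0, scale=100, sigma=0.5, min_size=50)
    depth = np.zeros(segments.shape, dtype=np.float64)
    for label in np.unique(segments):
        mask = segments == label
        inside = [d for (x, y), d in zip(positions, motion)
                  if mask[int(y), int(x)]]
        if inside:            # regions with no keypoints stay at 0 here
            depth[mask] = np.mean(inside)
    return depth              # larger value = nearer to the camera

  In the thesis proper, displacements are accumulated over many frames and averaged into the relative depth map (Section 3.4), and untrustworthy or empty blocks are treated separately (Chapter 4); this two-frame sketch shows only the core idea.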

中文摘要 (Chinese Abstract)
Abstract
致謝 (Acknowledgements)
Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Overview
  1.2 Motivation
  1.3 System Description
  1.4 Thesis Organization
Chapter 2 Related Works
  2.1 Depth Map Produced by Multiple Cameras
    2.1.1 Kinect
    2.1.2 HTC Smartphone M8 and M9+
  2.2 Single Camera
    2.2.1 Epipolar Geometry
    2.2.2 Structure from Motion
Chapter 3 Relative Depth Map
  3.1 Generating the Relative Depth Information from a Video
  3.2 Detection and Matching the Feature Points
    3.2.1 Building a Scale-space
    3.2.2 Building a DoG Scale-space
    3.2.3 Directionality of Feature Point
    3.2.4 Keypoint Descriptor
    3.2.5 Keypoint Matching
  3.3 Limited Recycling Incorrect Matching
    3.3.1 Removing Matching by Random Sample Consensus
    3.3.2 Removing Matching by the Upper Bound of Depth
  3.4 Average Relative Depth Map
Chapter 4 Expanding Depth Information
  4.1 Image Segmentation
  4.2 Filling Depth in the Blank of Keypoints
  4.3 Removing the Depth Unable to Trust
  4.4 Eliminating Empty Blocks
Chapter 5 Experimental Results and Discussions
  5.1 Experiment Setup
  5.2 The Precision of Relative Depth Estimation
  5.3 The Relative Relationship Results of Simple Indoor Scenes
  5.4 The Relative Relationship Results of Complex Outdoor Scenes
  5.5 Comparing Our Method with the Stereo Vision Method
Chapter 6 Conclusions and Future Works
  6.1 Conclusions
  6.2 Future Works
References


Full text available from 2020/08/21 (campus network)
Full text available from 2025/08/21 (off-campus network)
Full text available from 2025/08/21 (National Central Library: Taiwan NDLTD system)