Graduate student: |
歐庭嘉 Ting-Jia Ou |
Thesis title: |
一個結合超像素空間金字塔的詞袋方法用以有效增強室外場景的語意分割 A Bag of Words Approach Combined with a Spatial Pyramid of Superpixels to Effectively Enhancing the Semantic Segmentation of Outdoor Scenes |
Advisor: |
Chin-Shyurng Fahn |
Oral defense committee: |
Jian-de Lee 王榮華 Rong-hua Wang 花凱龍 Kai-long Hua |
Degree: |
Master |
Department: |
College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2015 |
Academic year of graduation: | 104 |
Language: | English |
Number of pages: | 84 |
Keywords (Chinese): | 詞袋、語意分割、顏色分割、尺度不變特徵轉換特徵、空間金字塔、超像素 |
Keywords (English): | BoW, semantic segmentation, color segmentation, SIFT, spatial pyramid, superpixel |
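In this thesis, we propose a system that adapts to most outdoor environments and performs simple yet accurate semantic segmentation of outdoor images. We first convert the image to the CIE Lab color space, which is perceptually uniform and close to human vision. We then apply a fuzzy algorithm together with morphological operations, splitting and merging regions in the form of superpixels, to obtain a preliminary color segmentation as preprocessing. Each extracted homogeneous region is then fed into a Bag of Words model, supplemented with spatial information from a spatial pyramid, and an SVM classifier compares the resulting signatures to perform classification.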
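The color-space conversion mentioned in this summary can be illustrated with a minimal sketch. The abstract does not say how the input images are encoded, so the code below assumes 8-bit sRGB input and a D65 reference white; the function name `srgb_to_lab` is hypothetical and is not taken from the thesis.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*a*b*.

    Assumptions (not stated in the thesis): sRGB input, D65 white point.
    """
    def to_linear(c):
        # Undo the sRGB transfer curve to get linear-light values in [0, 1].
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # D65 reference white.
    xn, yn, zn = 0.95047, 1.0, 1.08883
    delta = 6.0 / 29.0

    def f(t):
        # Cube root above the threshold, linear segment below it.
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    lightness = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b_star = 200.0 * (fy - fz)
    return lightness, a, b_star
```

Under these assumptions, neutral grays map to values of a and b close to zero, and lightness runs from about 0 (black) to about 100 (white), which is what makes Euclidean distances in this space a reasonable proxy for perceived color difference.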
Nowadays, as photographing tools such as cell phones and intelligent surveillance systems have become indispensable in our lives, algorithms for analyzing and processing images and videos have sprung up in the past few years. Consequently, researchers and practitioners seek a better understanding of image content, which could support their related analyses.
Image segmentation has always been a central problem in image processing. Semantic segmentation, which partitions an image into several meaningful and homogeneous regions, is the most challenging part of this field and provides a better way to understand an image. Subsequent processing and analysis can presumably be much faster and more efficient as a result.
In this thesis, a semantic segmentation system is proposed. The system adapts to most outdoor environments and performs semantic segmentation on outdoor scene images. Our proposed method partitions and merges image regions using fuzzy algorithms and superpixels as a color-segmentation preprocessing step, then feeds the resulting homogeneous regions into a Bag of Words model. An SVM classifier then compares each region's signature against those of each class, using the spatial information in the spatial pyramid, to classify the region.
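To make the notion of a region "signature" more concrete, here is a minimal sketch of a bag-of-words histogram concatenated over a spatial pyramid. It is an illustration only: the abstract does not give the codebook size, the number of pyramid levels, or how the pyramid is laid over superpixels, so the sketch assumes a regular grid over the region's bounding box, unweighted levels, and a tiny hand-made codebook. The names `nearest_word`, `pyramid_signature`, and `l1_normalize` are hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor."""
    def sq_dist(u, v):
        return sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def pyramid_signature(points, descriptors, codebook, bbox, levels=2):
    """Concatenate visual-word counts over a spatial pyramid.

    points      -- (x, y) location of each local feature
    descriptors -- one feature vector per point (e.g. SIFT in practice)
    codebook    -- list of codewords (cluster centers)
    bbox        -- (x0, y0, x1, y1) bounding box of the region
    Level l splits the box into 2**l x 2**l cells (assumed regular grid).
    """
    x0, y0, x1, y1 = bbox
    width = max(x1 - x0, 1e-9)
    height = max(y1 - y0, 1e-9)
    words = [nearest_word(d, codebook) for d in descriptors]

    signature = []
    for level in range(levels):
        cells = 2 ** level
        hists = [[0] * len(codebook) for _ in range(cells * cells)]
        for (x, y), w in zip(points, words):
            # Clamp so points on the far edge fall into the last cell.
            cx = min(int((x - x0) / width * cells), cells - 1)
            cy = min(int((y - y0) / height * cells), cells - 1)
            hists[cy * cells + cx][w] += 1
        for h in hists:
            signature.extend(h)
    return signature


def l1_normalize(signature):
    """Scale the signature so its entries sum to 1 (if non-empty)."""
    total = sum(signature)
    return [v / total for v in signature] if total else list(signature)


# Toy example: 2 codewords, 3 features in a 10x10 region.
codebook = [(0.0, 0.0), (1.0, 1.0)]
points = [(0, 0), (9, 9), (9, 0)]
descriptors = [(0.1, 0.0), (0.9, 1.0), (0.2, 0.1)]
print(pyramid_signature(points, descriptors, codebook, (0, 0, 10, 10)))
# prints [2, 1, 1, 0, 1, 0, 0, 0, 0, 1]
```

In a full pipeline, the normalized signature of each region would be passed to a trained SVM classifier; that step is omitted here because the abstract does not describe the kernel or the training setup.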
In the experiments, regions are classified into five types: sky, tree, building, grass, and road. The results show that our system improves accuracy by about 5% compared with related research. We also obtain an accuracy improvement in an expanded 7-class classification, which indicates the good performance and robustness of our method.