
Graduate Student: Ting-Jia Ou (歐庭嘉)
Thesis Title: A Bag of Words Approach Combined with a Spatial Pyramid of Superpixels to Effectively Enhance the Semantic Segmentation of Outdoor Scenes
(Original Chinese title: 一個結合超像素空間金字塔的詞袋方法用以有效增強室外場景的語意分割)
Advisor: Chin-Shyurng Fahn (范欽雄)
Committee Members: Jian-De Lee (李建德), Rong-Hua Wang (王榮華), Kai-Long Hua (花凱龍)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2015
Graduation Academic Year: 104 (ROC calendar)
Language: English
Pages: 84
Keywords: Bag of Words (BOW), Semantic Segmentation, Color Segmentation, SIFT, Spatial Pyramid, Superpixel
Abstract (translated from the original Chinese):
    In recent years, cameras and surveillance recorders have become indispensable tools in our daily lives, and algorithms for analyzing and processing images and videos have proliferated. Accordingly, people want a deeper understanding of the content of images to support the related analysis and processing.
    Image segmentation has long been a popular topic in image processing, and semantic segmentation is its most challenging part. Its goal is to partition an image into several homogeneous and meaningful regions, which not only gives the objects in the image a more precise definition and classification, but also offers another way to understand the image; the processing and analysis performed afterwards should then be faster and more efficient.
    In this thesis, we propose a system that adapts to most outdoor environments and performs simple yet accurate semantic segmentation of outdoor images. We first convert the image to the CIE Lab color space, which approximates human vision and is perceptually uniform. We then apply a fuzzy algorithm together with morphological operations, splitting and merging in the form of superpixels to obtain a preliminary color segmentation. Each extracted homogeneous region is fed into a Bag of Words model, supplemented with the spatial information of a spatial pyramid, and an SVM classifier compares the signatures to perform the classification.
    In our experiments, the regions are roughly grouped into five classes: sky, tree, grass, road, and building. Compared with related work, accuracy improves by about 5%. For the seven-class setting expanded with stone and water, accuracy still improves to a certain degree, showing that the proposed semantic segmentation system achieves a significant gain in accuracy with good robustness.
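The CIE Lab conversion mentioned above can be illustrated with a short sketch. This is a minimal numpy implementation of the standard sRGB-to-Lab colorimetric pipeline (assuming a D65 white point), not the thesis's actual code, which presumably relies on an image processing library:

```python
import numpy as np

def srgb_to_lab(rgb):
    """Convert sRGB values in [0, 1] to CIE Lab (D65 white point).

    A minimal sketch of the standard pipeline:
    sRGB -> linear RGB -> XYZ -> Lab.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB gamma encoding.
    linear = np.where(rgb <= 0.04045, rgb / 12.92,
                      ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ (sRGB primaries, D65 white).
    m = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = linear @ m.T
    # Normalize by the D65 reference white.
    xyz /= np.array([0.95047, 1.0, 1.08883])
    # XYZ -> Lab via the standard piecewise cube-root function.
    eps, kappa = 216 / 24389, 24389 / 27
    f = np.where(xyz > eps, np.cbrt(xyz), (kappa * xyz + 16) / 116)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)
```

Perceptual uniformity of Lab is what makes the subsequent color-based splitting and merging behave consistently across hues; pure white maps to L = 100 with a = b = 0.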


    Nowadays, as photographing tools like cell phones and intelligent surveillance systems have become indispensable in our lives, algorithms for the analysis and processing of images and videos have sprung up over the past few years. Therefore, people and researchers want a better understanding of the content of an image, which can help their related analyses.
    Image segmentation has always been a central problem in image processing, and semantic segmentation, which partitions an image into several meaningful and homogeneous regions, is the most challenging part of this field; it also provides a better way to understand an image. Presumably, the processing and analysis performed afterwards will be much faster and more efficient.
    In this thesis, a semantic segmentation system is proposed. The system adapts to most outdoor environments and performs semantic segmentation on outdoor scene images. Our method first partitions and merges the image through fuzzy algorithms operating on superpixel patterns as a color-segmentation preprocessing step. It then feeds the homogeneous regions obtained into a Bag of Words model and uses an SVM classifier, aided by the spatial information of a spatial pyramid, to compare the regions' signatures against each class and achieve the classification.
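The classification stage described above, vector-quantizing local descriptors against a learned codebook and concatenating weighted per-cell histograms over a spatial pyramid, can be sketched as follows. The codebook, the descriptor layout (positions normalized to [0, 1)), and the pyramid weights of Lazebnik et al.'s spatial pyramid matching are illustrative assumptions; the thesis's exact parameters may differ:

```python
import numpy as np

def pyramid_bow(descriptors, positions, codebook, levels=2):
    """Build a spatial-pyramid bag-of-words signature.

    descriptors: (n, d) local features (e.g. SIFT); positions: (n, 2)
    coordinates normalized to [0, 1); codebook: (k, d) visual words.
    """
    k = len(codebook)
    # Vector quantization: assign each descriptor to its nearest word.
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :],
                           axis=2)
    words = dists.argmin(axis=1)

    signature = []
    for level in range(levels + 1):
        cells = 2 ** level  # cells per axis at this level
        # Standard pyramid-match weights: coarse levels count less.
        w = 1.0 / 2 ** levels if level == 0 else 1.0 / 2 ** (levels - level + 1)
        cell_idx = (positions * cells).astype(int).clip(0, cells - 1)
        flat = cell_idx[:, 0] * cells + cell_idx[:, 1]
        for c in range(cells * cells):
            hist = np.bincount(words[flat == c], minlength=k)
            signature.append(w * hist)
    return np.concatenate(signature)
```

The resulting signature vectors would then be fed to an SVM classifier, for instance with a linear or histogram-intersection kernel.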
    In the experiments, the regions are classified into five types: sky, tree, building, grass, and road. The results show that our system improves accuracy by about 5% compared with related research. We also obtain an accuracy gain in the expanded 7-class classification, which demonstrates the good performance and strong robustness of our method.

    摘要 (Chinese Abstract)
    Abstract
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Overview
      1.2 Motivation
      1.3 System Description
      1.4 Thesis Organization
    Chapter 2 Background and Related Work
      2.1 Reviews of Image Segmentation
        2.1.1 Edge-based image segmentation
        2.1.2 Region-based image segmentation
      2.2 Reviews of Bag-of-Words Model
    Chapter 3 Image Segmentation
      3.1 Image Preprocessing
        3.1.1 Median blur
        3.1.2 Color space transformation
      3.2 Superpixel SLIC Method
        3.2.1 Algorithm
        3.2.2 Distance measure
      3.3 Connected component
    Chapter 4 Bag of Word Model
      4.1 Feature Extraction
        4.1.1 Scale-space building
        4.1.2 DoG scale-space building
        4.1.3 Directionality of feature point
        4.1.4 Keypoint descriptor
      4.2 Codebook generation
        4.2.1 K-means
        4.2.2 Vocabulary size
      4.3 Spatial Pyramid
      4.4 SVM classifier
      4.5 Classification with Additional Spatial Information
    Chapter 5 Experimental Results and Discussions
      5.1 Experimental Setup
      5.2 The Result of Color Segmentation
      5.3 The Result of Semantic Segmentation
    Chapter 6 Conclusions and Future Works
      6.1 Conclusions
      6.2 Future Works
    Reference
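For the distance measure of the SLIC superpixel method (Section 3.2.2), Achanta et al. combine CIELAB color distance d_c with spatial distance d_s as D = sqrt(d_c^2 + (d_s / S)^2 * m^2), where S is the grid sampling interval and m the compactness weight. A minimal sketch, with the (L, a, b, x, y) tuple layout as an assumption:

```python
import math

def slic_distance(pixel, center, S, m=10.0):
    """SLIC distance between a pixel and a cluster center.

    pixel / center are (L, a, b, x, y) tuples; S is the grid interval
    (sqrt(N / K) for N pixels and K superpixels); m trades color
    similarity against spatial compactness.
    """
    d_c = math.dist(pixel[:3], center[:3])  # CIELAB color distance
    d_s = math.dist(pixel[3:], center[3:])  # spatial distance
    # D = sqrt(d_c^2 + (d_s / S)^2 * m^2), as in Achanta et al. (2012)
    return math.sqrt(d_c ** 2 + (d_s / S) ** 2 * m ** 2)
```

Larger m yields more compact, regular superpixels; smaller m lets superpixels adhere more tightly to color boundaries.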


    Full-text availability: campus network from 2020/12/02; off-campus network and the National Central Library (Taiwan NDLTD system) from 2025/12/02.