
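Author: 鄧惟勝 (Thang - Duy Dang)
Thesis title: 一個基於學習演算法的自然影像辨識系統
A Learning-based Algorithm for Natural Scene Recognition
Advisor: 花凱龍 (Kai-Lung Hua)
Committee members: 楊傳凱 (Chuan-kai Yang), 鄧惟中 (Wei-Chung Teng), 鄭文皇 (Wen-Huang Cheng)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar)
Language: English
Number of pages: 45
Keywords (Chinese record): scene recognition, sparse coding, spatial pyramid pooling
Keywords (English record): scene recognition, sparse representation, spatial pyramid pooling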
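  • Scene recognition is an important research topic in image and video processing. It has a wide range of applications, such as object detection and recognition, image retrieval, intelligent in-vehicle systems, and robot navigation. However, natural images often come with complex variations in light and shadow, which makes image analysis considerably harder. In this thesis, we propose a new model for learning and recognizing natural scenes that combines locality-constrained sparse coding, Spatial Pyramid Pooling, and a linear support vector machine. First, we extract SIFT features from the training images to describe local spatial information; these features are treated as the codewords that represent scene categories. We then use locality-constrained sparse coding to learn the composition and distribution of the codewords in the training set, and encode the local spatial features with a modified Spatial Pyramid Pooling, whose effectiveness for scene and object recognition has been demonstrated in recent literature. In the testing stage, we likewise pass the test images through sparse coding and Spatial Pyramid Pooling to obtain local features, and finally classify them with a linear support vector machine. The experimental results show that the proposed system achieves higher classification accuracy than the other methods.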


    Scene recognition is an important problem in many application areas of image and video processing. It has a wide range of applications, such as object recognition and detection, content-based image indexing and retrieval, and intelligent vehicle and robot navigation. However, natural scene images tend to be complex and difficult to analyze because of illumination changes and transformations. In this thesis, we build a novel model to learn and recognize natural scenes.
    This study proposes a new approach that combines locality-constrained sparse coding (LCSP), Spatial Pyramid Pooling, and a linear SVM in an end-to-end model. First, interest points in each training image are described with a dense SIFT local descriptor, which captures local spatial information. These features are known as codewords, and each codeword is represented as part of a topic. We then employ the LCSP algorithm to learn the codeword distribution of those local features from the training dataset. Next, a modified Spatial Pyramid Pooling model encodes the spatial distribution of the local features; Spatial Pyramid Pooling has been remarkably successful in both scene and object recognition. In the testing stage, a linear SVM classifies the local features encoded by Spatial Pyramid Pooling. The new system achieves very competitive results, leading to state-of-the-art performance on several benchmarks.
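The coding and pooling stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the record does not spell out the LCSP algorithm or how Spatial Pyramid Pooling was modified, so the sketch substitutes a commonly used approximation of locality-constrained coding (each descriptor is reconstructed from its k nearest codewords under a sum-to-one constraint) and standard max pooling over 1x1, 2x2, and 4x4 grids. The function names, `k`, `reg`, the codebook size, and the random stand-in data are all hypothetical.

```python
import numpy as np


def llc_encode(descriptors, codebook, k=5, reg=1e-4):
    """Approximate locality-constrained coding (assumed stand-in for LCSP).

    Each descriptor is reconstructed from its k nearest codewords;
    all other coefficients stay zero, and each code sums to one.
    """
    n, m = descriptors.shape[0], codebook.shape[0]
    codes = np.zeros((n, m))
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    for i in range(n):
        idx = nearest[i]
        z = codebook[idx] - descriptors[i]      # neighbours shifted to origin, (k, d)
        cov = z @ z.T                           # local covariance, (k, k)
        cov += reg * max(np.trace(cov), 1e-12) * np.eye(k)  # keep it invertible
        w = np.linalg.solve(cov, np.ones(k))
        codes[i, idx] = w / w.sum()             # enforce the sum-to-one constraint
    return codes


def spatial_pyramid_max_pool(codes, xy, image_size, levels=(1, 2, 4)):
    """Max-pool codes inside each cell of each pyramid level, then concatenate."""
    width, height = image_size
    m = codes.shape[1]
    pooled = []
    for g in levels:
        # Grid cell of every descriptor on a g x g partition of the image.
        cx = np.minimum((xy[:, 0] / width * g).astype(int), g - 1)
        cy = np.minimum((xy[:, 1] / height * g).astype(int), g - 1)
        cell = cy * g + cx
        for c in range(g * g):
            mask = cell == c
            pooled.append(codes[mask].max(axis=0) if mask.any() else np.zeros(m))
    feat = np.concatenate(pooled)
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat    # L2-normalise the image feature


# Random stand-ins for dense SIFT descriptors, their positions, and a codebook.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 128))
xy = rng.uniform(0, 256, size=(200, 2))
codebook = rng.normal(size=(64, 128))

codes = llc_encode(descriptors, codebook, k=5)
feature = spatial_pyramid_max_pool(codes, xy, image_size=(256, 256))
print(feature.shape)  # (1344,) = 64 codewords x (1 + 4 + 16) cells
```

In a full pipeline, one such pooled vector per image would be passed to a linear SVM for training and classification, for instance scikit-learn's `LinearSVC`; that step is omitted here to keep the sketch dependent on NumPy only.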

    Abstract
    Acknowledgement
    Table of Contents
    List of Tables
    List of Illustrations
    Chapter 1 Introduction
      1.1 Introduction of recognition of natural scene images
      1.2 Literature review
    Chapter 2 Proposed Model
      2.1 The algorithm for recognizing natural images
      2.2 Extract local features
      2.3 Locality constrained sparse coding (LCSP)
      2.4 Spatial Pyramid Pooling
      2.5 Support Vector Machine (SVM)
    Chapter 3 Experimental Results
      3.1 The 8-class sports event
      3.2 13-category scenes
      3.3 15-class scene category
      3.4 8-class outdoor
      3.5 Caltech-101 dataset
      3.6 MIT 67-class indoor scene
    Chapter 4 Conclusion
    References


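    Full text available from 2020/08/13 (campus network)
    Full text available from 2025/08/13 (off-campus network)
    Full text available from 2025/08/13 (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)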