
Student: Wei-min Lu (呂偉民)
Thesis Title: A Study of the Effectiveness of the Bags of Words Model with SIFT for Image Annotation
Advisor: Yi-leh Wu (吳怡樂)
Committee Members: Maw-kae Hor (何瑁鎧), Cheng-yuan Tang (唐政元), Jiann-jone Chen (陳建中)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2010
Graduation Academic Year: 98 (ROC calendar; 2009-2010)
Language: English
Number of Pages: 38
Keywords: SIFT (scale-invariant feature transform), automatic image classification/annotation, event classification, bags of words model

In recent research on automatic photo classification and annotation, the scale-invariant feature transform (SIFT) has been increasingly adopted as the visual feature. However, because event photos vary greatly in content and SIFT has strong discriminative power, SIFT descriptions of photos from the same event class can differ substantially. In this thesis, we take the SIFT features of two photos of the same event and compute the Euclidean distance between every pair of features; we then apply the same procedure to two photos of different events. We find that the distances between SIFT features of photos of the same event are as large as those between photos of different events. We then compare event classification results obtained with three kinds of features to examine how this phenomenon affects accuracy: the first uses SIFT, encoded through the bags of words model; the second combines color moments with the edge direction histogram (EDH); the third combines the two preceding features. The results show that the first feature yields the lowest accuracy, the third the next lowest, and the second the highest. We therefore conclude that, for event photos, SIFT is not the best choice.
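The pairwise-distance test described above is straightforward to reproduce. The following is a minimal sketch, assuming OpenCV (4.4 or later, where SIFT ships in the main module) and SciPy; the image file names and event pairing are hypothetical placeholders, not the thesis dataset.

```python
import cv2
from scipy.spatial.distance import cdist

def sift_descriptors(path):
    """Extract SIFT descriptors: one 128-dimensional vector per keypoint."""
    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, descriptors = sift.detectAndCompute(image, None)
    return descriptors

def pairwise_sift_distances(path_a, path_b):
    """Euclidean distance between every pair of descriptors from two images."""
    return cdist(sift_descriptors(path_a), sift_descriptors(path_b),
                 metric="euclidean").ravel()

# Same-event pair versus different-event pair (file names are placeholders).
intra = pairwise_sift_distances("wedding_1.jpg", "wedding_2.jpg")
inter = pairwise_sift_distances("wedding_1.jpg", "hiking_1.jpg")
print("mean intra-event distance:", intra.mean())
print("mean inter-event distance:", inter.mean())
```

If the two distance distributions overlap heavily, as the thesis reports, SIFT distances carry little information about whether two photos belong to the same event.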


In recent research on automatic image classification and annotation, the SIFT feature has become increasingly popular as the visual feature. However, because the contents of event images vary greatly and the SIFT feature has high discrimination power, the image features of the same event may differ greatly. In this study, we show that the distributions of SIFT distance values do not discriminate between intra-class and inter-class image pairs. We then design experiments to directly test the effectiveness of using the SIFT features together with the bags of words model. The experimental results show that combining the SIFT visual features and the bags of words model with the traditional low-level image features produces inferior classification results compared with using the traditional low-level image features alone. This result supports the following hypothesis: in event classification, the bags of words model based on the SIFT feature may not be effective.
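For reference, the bags of words representation evaluated here can be sketched as follows, assuming scikit-learn's k-means for codebook construction (the thesis also considers DBSCAN); the vocabulary size k=500 and the sift_descriptors() helper from the previous sketch are illustrative assumptions rather than the thesis's actual settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptor_sets, k=500):
    """Cluster all training descriptors; the k cluster centers are the visual words."""
    stacked = np.vstack(descriptor_sets)               # (total_keypoints, 128)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(stacked)

def bow_histogram(descriptors, codebook):
    """Quantize each descriptor to its nearest visual word and count occurrences."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()    # L1-normalize so the keypoint count cancels out

# Hypothetical usage: build the codebook from training images, then map every
# image to a fixed-length histogram that a standard classifier (e.g., an SVM)
# can consume alongside low-level features such as color moments and EDH.
# training_descs = [sift_descriptors(p) for p in training_paths]
# codebook = build_codebook(training_descs)
# x = bow_histogram(sift_descriptors("query.jpg"), codebook)
```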

Advisor's Recommendation ................................... I
Thesis Examination Committee Certification ................. II
Abstract (in Chinese) ...................................... III
Abstract ................................................... IV
Contents ................................................... V
List of Tables ............................................. VII
List of Figures ............................................ VIII
1 Introduction ............................................. 1
2 Related Works ............................................ 3
3 Visual Features .......................................... 5
  3.1 Color Moment ......................................... 5
  3.2 Edge Direction Histogram ............................. 5
  3.3 Scale-Invariant Feature Transform .................... 6
  3.4 PCA-SIFT ............................................. 9
4 Bags of Words Model ...................................... 11
  4.1 Feature Extraction ................................... 11
  4.2 Codebook ............................................. 12
    4.2.1 K-means .......................................... 12
    4.2.2 DBSCAN ........................................... 13
    4.2.3 Vocabulary Size .................................. 13
  4.3 Bag-of-Word Representation ........................... 14
5 Experiments and Results .................................. 15
  5.1 Event Concepts and Dataset ........................... 15
  5.2 A Failed Experiment about Event Classification ....... 16
  5.3 SIFT Distance ........................................ 18
  5.4 Event Classification ................................. 21
6 Conclusions .............................................. 25
References ................................................. 26
Authorization Letter ....................................... IX

