
Student: Ting-Zhi Wang (王亭之)
Thesis Title: Learning Visual Object and Word Association
Advisor: Yie-Tarng Chen (陳郁堂)
Committee Members: Wen-Hsien Fang (方文賢), Qian-Mi Wu (吳乾彌), Hsing-Long Chen (陳省隆), Ming-Bo Lin (林銘波)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2016
Graduation Academic Year: 104 (ROC)
Language: English
Pages: 48
Keywords: bipartite graph matching, structural SVM, visual object and word association, image annotation
  • Abstract (translated from Chinese): This thesis aims to learn a model that simultaneously finds the association between objects and words and selects the template to which an image belongs. We recast this problem as a bipartite matching problem: matching objects to words. Because image templates are complex, many relationships may exist between the objects and the words within an image; we encode these relationships in our model and learn their weights with machine learning. We train the object-word association model and the template selection model separately; at test time, we iterate between the two models to obtain the final result. The results show that our method outperforms competing methods.
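The alternation between the two models described in the abstract can be sketched as follows. This is an illustrative skeleton, not the thesis's implementation: `associate` and `select_template` are hypothetical stand-ins for the learned object-word association model and the template selection model.

```python
# Illustrative sketch of alternating inference: iterate between the object-word
# association step and the template selection step until neither changes.
def alternate_inference(associate, select_template, init_template, max_iters=10):
    """associate(t) -> object-to-word assignment under template t;
    select_template(a) -> best template given assignment a.
    Both callables are hypothetical stand-ins for the two learned models."""
    template = init_template
    assignment = associate(template)
    for _ in range(max_iters):
        new_template = select_template(assignment)
        new_assignment = associate(new_template)
        if new_template == template and new_assignment == assignment:
            break  # converged: neither model changes the other's output
        template, assignment = new_template, new_assignment
    return assignment, template

# Toy demonstration with two hypothetical templates "A" and "B":
assoc = lambda t: {"A": (0,), "B": (1,)}[t]
pick = lambda a: "B"   # this toy template model always prefers "B"
result = alternate_inference(assoc, pick, "A")
# result == ((1,), "B"): inference settles on template "B" and its assignment
```

The loop terminates either at a fixed point of the two models or after a bounded number of iterations, mirroring the "iterate these two models and obtain the final result" step in the abstract.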


    This paper presents a discriminative learning framework that simultaneously finds the association between objects and words and performs template matching for complex association patterns. We formulate the problem of finding the association between visual objects and texts as a bipartite graph matching problem. Since the compatibility function has a significant influence on the final matching results, we learn an optimal compatibility function, which encodes the association rules for visual objects and words, via a structural support vector machine (SVM). In addition, an iterative inference procedure is developed to alternately infer the visual object-text association and the template model selection. Simulations show that the new method outperforms several competing counterparts.
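The bipartite matching formulation above can be illustrated with a minimal sketch. The compatibility matrix here is a hypothetical placeholder: in the thesis, compatibility scores come from the function learned via the structural SVM, and the assignment would be solved with a polynomial-time method such as the Hungarian algorithm rather than brute force.

```python
# Illustrative brute-force bipartite matching: pick the object-to-word
# assignment that maximizes total compatibility.
from itertools import permutations

def best_assignment(score):
    """score[i][j] is the compatibility of object i with word j
    (assumed square for simplicity). Returns (assignment, total),
    where assignment[i] is the word index matched to object i."""
    n = len(score)
    best, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(score[i][perm[i]] for i in range(n))
        if total > best:
            best, best_perm = total, perm
    return best_perm, best

# Hypothetical 3x3 compatibility matrix: rows = detected objects, cols = words.
scores = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.1],
    [0.2, 0.4, 0.7],
]
assignment, total = best_assignment(scores)
# assignment == (0, 1, 2): each object pairs with its highest-scoring word here
```

Brute force is exponential in the number of objects and only serves to make the objective concrete; assignment solvers reduce the same maximization to cubic time.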

    Chinese Abstract
    Abstract
    Acknowledgment
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Objective of this Research
        1.2 Assumption
        1.3 Summary of the Proposed Approaches
        1.4 Contribution
    Chapter 2 Related Work
    Chapter 3 Pre-processing: Object and Word Detection
        3.1 Detection Method
        3.2 Local Binary Patterns
        3.3 Connected Components Labeling
        3.4 Size Filter
    Chapter 4 Model
        4.1 Visual Objects and Words Association
            4.1.1 Joint Feature Map
            4.1.2 Loss Function
            4.1.3 Structural Learning Algorithm
            4.1.4 Finding the Most Violated Constraints
        4.2 Template Matching
            4.2.1 Joint Feature Map
            4.2.2 Loss Function
            4.2.3 Finding the Most Violated Constraints
        4.3 Inference
    Chapter 5 Experimental Results
        5.1 Dataset
        5.2 Performance Metrics
        5.3 Compared Algorithms
        5.4 Image-and-Text Association Results
        5.5 Comparing Three Template Classes
        5.6 Initial Solutions and Final Results
        5.7 Weight Learning for Different Templates
        5.8 Template Matching
        5.9 Execution Time
        5.10 Discussions
    Chapter 6 Conclusion
    References

