
Student: Ting-Zhi Wang (王亭之)
Thesis Title: Learning Visual Object and Word Association
Advisor: Yie-Tarng Chen (陳郁堂)
Committee Members: Wen-Hsien Fang (方文賢), Qian-Mi Wu (吳乾彌), Hsing-Long Chen (陳省隆), Ming-Bo Lin (林銘波)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2016
Graduation Academic Year: 104 (ROC)
Language: English
Pages: 48
Keywords: bipartite graph matching, structural SVM, visual object and word association, image annotation
  • Abstract (translated from Chinese): This thesis aims to learn a model that simultaneously finds the association between objects and words and selects the template to which an image belongs. We recast this problem as a bipartite matching problem: matching objects to words. Because image templates are complex, many relationships may exist between the objects and the words within an image; we encode these relationships in our model and learn their weights with machine learning. We train the object-word association model and the template selection model separately; at test time, we iterate between the two models to obtain the final result. The results show that our method outperforms competing methods.
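The alternation between the two models described in the abstract can be sketched as follows. This is an illustrative skeleton, not the thesis's implementation: `associate` and `select_template` are hypothetical stand-ins for the learned object-word association model and the template selection model.

```python
# Illustrative sketch of alternating inference: iterate between the object-word
# association step and the template selection step until neither changes.
def alternate_inference(associate, select_template, init_template, max_iters=10):
    """associate(t) -> object-to-word assignment under template t;
    select_template(a) -> best template given assignment a.
    Both callables are hypothetical stand-ins for the two learned models."""
    template = init_template
    assignment = associate(template)
    for _ in range(max_iters):
        new_template = select_template(assignment)
        new_assignment = associate(new_template)
        if new_template == template and new_assignment == assignment:
            break  # converged: neither model changes the other's output
        template, assignment = new_template, new_assignment
    return assignment, template

# Toy demonstration with two hypothetical templates "A" and "B":
assoc = lambda t: {"A": (0,), "B": (1,)}[t]
pick = lambda a: "B"   # this toy template model always prefers "B"
result = alternate_inference(assoc, pick, "A")
# result == ((1,), "B"): inference settles on template "B" and its assignment
```

The loop terminates either at a fixed point of the two models or after a bounded number of iterations, mirroring the "iterate these two models and obtain the final result" step in the abstract.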


    This paper presents a discriminative learning framework that simultaneously finds the association between objects and words and performs template matching for complex association patterns. We formulate the problem of finding the association between visual objects and texts as a bipartite graph matching problem. Since the compatibility function has a significant influence on the final matching results, we learn an optimal compatibility function, which encodes the association rules for visual objects and words, via a structural support vector machine (SVM). In addition, an iterative inference procedure is developed to alternately infer the visual object-text association and the template model selection. Simulations show that the new method outperforms several competing counterparts.
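The bipartite matching formulation above can be illustrated with a minimal sketch. The compatibility matrix here is a hypothetical placeholder: in the thesis, compatibility scores come from the function learned via the structural SVM, and the assignment would be solved with a polynomial-time method such as the Hungarian algorithm rather than brute force.

```python
# Illustrative brute-force bipartite matching: pick the object-to-word
# assignment that maximizes total compatibility.
from itertools import permutations

def best_assignment(score):
    """score[i][j] is the compatibility of object i with word j
    (assumed square for simplicity). Returns (assignment, total),
    where assignment[i] is the word index matched to object i."""
    n = len(score)
    best, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(score[i][perm[i]] for i in range(n))
        if total > best:
            best, best_perm = total, perm
    return best_perm, best

# Hypothetical 3x3 compatibility matrix: rows = detected objects, cols = words.
scores = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.1],
    [0.2, 0.4, 0.7],
]
assignment, total = best_assignment(scores)
# assignment == (0, 1, 2): each object pairs with its highest-scoring word here
```

Brute force is exponential in the number of objects and only serves to make the objective concrete; assignment solvers reduce the same maximization to cubic time.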

    Chinese Abstract
    Abstract
    Acknowledgment
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Objective of this Research
        1.2 Assumption
        1.3 Summary of the Proposed Approaches
        1.4 Contribution
    Chapter 2 Related Work
    Chapter 3 Pre-processing: Object and Word Detection
        3.1 Detection Method
        3.2 Local Binary Patterns
        3.3 Connected Components Labeling
        3.4 Size Filter
    Chapter 4 Model
        4.1 Visual Objects and Words Association
            4.1.1 Joint Feature Map
            4.1.2 Loss Function
            4.1.3 Structural Learning Algorithm
            4.1.4 Finding the Most Violated Constraints
        4.2 Template Matching
            4.2.1 Joint Feature Map
            4.2.2 Loss Function
            4.2.3 Finding the Most Violated Constraints
        4.3 Inference
    Chapter 5 Experimental Results
        5.1 Dataset
        5.2 Performance Metrics
        5.3 Compared Algorithms
        5.4 Image-and-Text Association Results
        5.5 Comparing Three Template Classes
        5.6 Initial Solutions and Final Results
        5.7 Weight Learning for Different Templates
        5.8 Template Matching
        5.9 Execution Time
        5.10 Discussions
    Chapter 6 Conclusion
    References

