Graduate Student: | Ronny Haryanto |
---|---|
Thesis Title: | Mobile Book Recognition using Text and Feature Extraction |
Advisor: | Chang Hong Lin (林昌鴻) |
Committee Members: | Jenq-Shiou Leu (呂政修), Ching Shun Lin (林敬舜), Chia Han Lee (李佳翰) |
Degree: | Master |
Department: | Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science |
Publication Year: | 2013 |
Academic Year: | 101 |
Language: | English |
Pages: | 71 |
Keywords: | augmented reality, book cover recognition, text extraction, feature extraction |
As smartphones become more popular, mobile Augmented Reality (AR) applications are growing in number, especially applications that recognize an object and overlay relevant information on the smartphone's video viewfinder. Many applications outsource the recognition process to a server over a wireless connection; however, this approach suffers from low performance because the restricted bandwidth limits scalability in the number of client devices. In this thesis, we propose a scalable mobile augmented reality framework that minimizes the bandwidth reliance of the server-client AR model and divides the work between the server and the client.
As a case study, we propose a mobile book cover recognition system. Our mobile augmented reality system is based on the client-server model: on the mobile device, the user captures a query image, which is sent to the server; on the server side, title extraction and feature extraction are performed. Text detection: wavelet decomposition is used to compute the energy intensity across the captured book image and locate the approximate position of the text; an OCR engine then extracts the textual information. Feature extraction: we use the binary feature detector ORB to detect interest points and the binary descriptor BRISK to describe them. Fuzzy decision making: two types of features are produced, a set of words and a set of interest point descriptors, and the server maintains databases of both book titles and image descriptors. The recognized text is matched against the title database with a string similarity ranking algorithm to find the most similar title. The extracted image descriptors are matched against the descriptor database with a nearest neighbor algorithm to find, for each query descriptor, the closest database descriptor, and we then count which book accumulates the most matched descriptors. Finally, fuzzy decision making combines both features to select the best-matching book.
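The wavelet-based text detection step can be sketched with a single-level Haar decomposition: text strokes produce high energy in the detail sub-bands, while flat cover regions do not. This is only an illustrative sketch; the specific wavelet level and the threshold below are assumptions, not values from the thesis.

```python
import numpy as np

def haar_level1(img):
    """One-level 2-D Haar decomposition of a grayscale image."""
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2].astype(float)
    a, b = img[::2, ::2], img[::2, 1::2]
    c, d = img[1::2, ::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 4   # approximation sub-band
    lh = (a + b - c - d) / 4   # horizontal detail
    hl = (a - b + c - d) / 4   # vertical detail
    hh = (a - b - c + d) / 4   # diagonal detail
    return ll, lh, hl, hh

def text_energy_mask(img, thresh=4.0):
    """Mark blocks whose detail-band energy suggests text strokes."""
    _, lh, hl, hh = haar_level1(img)
    energy = lh ** 2 + hl ** 2 + hh ** 2
    return energy > thresh

# Toy image: flat background with one high-contrast "text" stripe.
img = np.zeros((64, 64))
img[21:27, 9:55] = 255.0
mask = text_energy_mask(img)
# Stripe boundaries light up in the mask; flat regions stay False.
print(mask.any(), mask[0, 0])
```

In the full system, the mask would be cleaned up (e.g. with morphology) and its bounding boxes handed to the OCR engine.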
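The feature extraction and nearest-neighbor matching steps can be sketched with OpenCV, which ships implementations of both ORB and BRISK. The synthetic image below stands in for a captured cover; this is a minimal sketch of the combination the thesis describes (ORB detector, BRISK descriptor, Hamming-distance matching), not the thesis code itself.

```python
import cv2
import numpy as np

# Synthetic "book cover": white page with text and a frame, standing in
# for a real camera capture (a placeholder, not thesis data).
img = np.full((480, 360), 255, dtype=np.uint8)
cv2.putText(img, "MOBILE BOOK", (20, 120), cv2.FONT_HERSHEY_SIMPLEX, 1.0, 0, 2)
cv2.rectangle(img, (30, 200), (320, 440), 0, 3)

# ORB locates interest points; BRISK then computes a 64-byte binary
# descriptor at each surviving keypoint.
orb = cv2.ORB_create(nfeatures=500)
brisk = cv2.BRISK_create()
kps = orb.detect(img, None)
kps, desc = brisk.compute(img, kps)

# Binary descriptors compare via Hamming distance, so matching a query
# against a database image is a nearest-neighbor search over bit strings.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(desc, desc)  # self-match as a smoke test
print(len(kps), desc.shape)
```

Against a real database, one would match the query descriptors to each candidate book's descriptors and count matches per book, as the abstract describes.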
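The string similarity ranking over OCR output can be illustrated with a Dice-style coefficient over character bigrams, which tolerates the character-level errors OCR typically makes. The thesis does not print its exact implementation, so treat this as an illustrative sketch.

```python
from collections import Counter

def bigrams(s: str) -> Counter:
    """Multiset of character bigrams, case-folded."""
    s = s.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def dice_similarity(a: str, b: str) -> float:
    """Sørensen–Dice coefficient over character bigrams, in [0, 1]."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        return 0.0
    return 2.0 * sum((ba & bb).values()) / total

def best_title(ocr_text: str, titles: list[str]) -> str:
    """Rank database titles against noisy OCR output, return the best."""
    return max(titles, key=lambda t: dice_similarity(ocr_text, t))

# Hypothetical title database; the OCR text has typical o/0 confusions.
titles = ["Mobile Book Recognition", "Computer Vision Basics",
          "Pattern Recognition"]
print(best_title("M0bile Bo0k Recogniti0n", titles))  # Mobile Book Recognition
```

Even with several corrupted characters, the correct title keeps the highest bigram overlap and wins the ranking.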
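The final fuzzy decision step can be sketched as a blend of fuzzy AND (min) and fuzzy OR (max) over the two normalized match scores: the min term rewards agreement between the text and image cues, while the max term keeps a single strong cue alive. The membership shapes, weight, and candidate scores below are illustrative assumptions, not values from the thesis.

```python
def fuzzy_combine(text_score: float, image_score: float, w: float = 0.6) -> float:
    """Blend fuzzy AND (min) and fuzzy OR (max) of the two match scores.

    Both scores are assumed pre-normalized to [0, 1]; the weight w is an
    illustrative assumption.
    """
    return w * min(text_score, image_score) + (1 - w) * max(text_score, image_score)

# Hypothetical per-book scores: (title similarity, normalized descriptor votes).
candidates = {
    "Book A": (0.9, 0.8),   # strong on both cues
    "Book B": (0.95, 0.1),  # title matches, cover does not
    "Book C": (0.2, 0.85),  # cover matches, title does not
}
best = max(candidates, key=lambda k: fuzzy_combine(*candidates[k]))
print(best)  # Book A: agreement across both features wins
```

A book that scores well on only one cue (a similar title but a different cover, or vice versa) is thus ranked below one that both cues agree on.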