
Author: Ronny Haryanto
Thesis Title: Mobile Book Recognition using Text and Feature Extraction
Advisor: Chang Hong Lin (林昌鴻)
Committee: Jenq-Shiou Leu (呂政修), Ching Shun Lin (林敬舜), Chia Han Lee (李佳翰)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2013
Graduation Academic Year: 101
Language: English
Pages: 71
Keywords: augmented reality, book cover recognition, text extraction, feature extraction
Access count: 191 views, 0 downloads
As smartphones become more popular, mobile Augmented Reality (AR) applications are growing, especially applications that recognize an object and overlay relevant information on the smartphone’s video viewfinder. Many applications outsource the recognition process to a server over a wireless connection; however, this approach suffers from low performance due to restricted bandwidth, which limits scalability in the number of client devices. In this thesis, we propose a scalable mobile augmented reality framework that minimizes the bandwidth reliance of the client-server augmented reality model and divides the work between server and client.

As a case study, we propose a mobile book cover recognition system based on the client-server model. On the mobile device, the user captures a query image, which is sent to the server. On the server side, both title extraction and feature extraction are performed. Text detection: wavelet decomposition is used to measure the energy intensity across the captured book image and locate the approximate position of the text; an OCR engine then extracts the textual information. Feature extraction: the binary feature detector ORB is used to detect interest points, and the binary descriptor BRISK is used to describe them. This yields two types of features: a set of words and a set of interest-point descriptors, which are matched against databases of book titles and image descriptors, respectively. The recognized text is matched against the title database with a string-similarity ranking algorithm to find the most similar title. The extracted image descriptors are matched against the descriptor database with a nearest-neighbor algorithm, and the book with the most matched descriptors is identified. Finally, fuzzy decision making combines both cues to select the best-matching book.
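The text-detection step relies on the observation that text strokes concentrate high-frequency energy. A minimal pure-Python sketch of that idea, using a one-level 2-D Haar decomposition over non-overlapping 2x2 blocks (the thesis's actual wavelet family, decomposition depth, and block sizes are not specified in this record), could look like:

```python
def haar_detail_energy(img):
    """One-level 2-D Haar transform of a grayscale image (list of rows).

    For each non-overlapping 2x2 block, compute the horizontal (LH),
    vertical (HL) and diagonal (HH) detail coefficients and return a
    half-resolution map of their combined energy. High-energy cells mark
    high-frequency regions, which is where text strokes tend to lie.
    """
    h, w = len(img), len(img[0])
    energy = []
    for y in range(0, h - 1, 2):
        row = []
        for x in range(0, w - 1, 2):
            a, b = img[y][x],     img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            lh = (a + b - c - d) / 4.0   # horizontal detail
            hl = (a - b + c - d) / 4.0   # vertical detail
            hh = (a - b - c + d) / 4.0   # diagonal detail
            row.append(lh * lh + hl * hl + hh * hh)
        energy.append(row)
    return energy
```

Thresholding this map would give candidate text regions to pass to the OCR engine; a uniform background produces zero detail energy, while sharp stroke edges produce large values.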
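ORB and BRISK both produce binary descriptors, which are compared with Hamming distance rather than Euclidean distance. The matching-and-voting step described above can be sketched as a brute-force nearest-neighbor vote (the thesis would more likely use an approximate index in practice, cf. reference [23]; the descriptor values and database layout below are illustrative assumptions):

```python
def hamming(a, b):
    # XOR then popcount: distance between binary descriptors stored as ints
    return bin(a ^ b).count("1")

def best_book_by_features(query_desc, db):
    """db maps book_id -> list of binary descriptors (ints).

    Each query descriptor votes for the book owning its nearest
    database descriptor; the book with the most votes wins.
    """
    votes = {}
    for q in query_desc:
        best_id, best_dist = None, None
        for book_id, descs in db.items():
            for d in descs:
                dist = hamming(q, d)
                if best_dist is None or dist < best_dist:
                    best_id, best_dist = book_id, dist
        votes[best_id] = votes.get(best_id, 0) + 1
    return max(votes, key=votes.get)
```

Real ORB/BRISK descriptors are 256 or 512 bits; storing them as Python ints keeps the XOR-popcount comparison exact regardless of width.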
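For the string-similarity ranking of OCR output against the title database, one standard choice, consistent with the Dice coefficient cited in the references ([25]), is the Dice coefficient over character bigrams; the exact ranking algorithm used in the thesis is not stated in this record:

```python
def bigrams(s):
    # Overlapping character pairs, case-folded
    s = s.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]

def dice_similarity(a, b):
    """Dice coefficient over bigram multisets: 2*|A ∩ B| / (|A| + |B|)."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba or not bb:
        return 0.0
    remaining = list(bb)
    overlap = 0
    for g in ba:
        if g in remaining:
            remaining.remove(g)
            overlap += 1
    return 2.0 * overlap / (len(ba) + len(bb))

def rank_titles(query, titles):
    # Best-matching title first
    return sorted(titles, key=lambda t: dice_similarity(query, t), reverse=True)
```

Bigram overlap tolerates the single-character substitutions and drops typical of OCR errors, which is why it suits noisy recognized text better than exact matching.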



    Table of Contents
    Abstract
    Table of Contents
    List of Figures
    List of Tables
    1. INTRODUCTION
       1.1. Motivation
       1.2. Objective and Contribution
       1.3. Thesis Organization
    2. RELATED WORKS
       2.1. Marker-based Augmented Reality
       2.2. Markerless Augmented Reality
    3. PROPOSED METHODS
       3.1. Client-Server Communication
       3.2. Book Cover Features Extraction
            3.2.1. Features Detection
            3.2.2. Descriptor Extraction
            3.2.3. Feature Matching Algorithm
       3.3. Title Extraction
            3.3.1. Candidate Text Pixels Detection
            3.3.2. Text Recognition
            3.3.3. Similarity Matching
       3.4. Decision Making
    4. EXPERIMENTAL RESULTS
       4.1. Developing Platform
       4.2. Experimental Results
            4.2.1. Text Detection
            4.2.2. Text Recognition and Similarity Matching
            4.2.3. Features Matching
            4.2.4. Execution Time
       4.3. Environment Testing
            4.3.1. Scale Differences
            4.3.2. Tilt
            4.3.3. Occlusion
            4.3.4. Motion Blur
            4.3.5. Recognition Result
    5. CONCLUSION AND FUTURE WORKS
       5.1. Conclusions
       5.2. Future Works
    References

    References

    [1] R. Azuma, “A survey of augmented reality,” Presence: Teleoperators and Virtual Environments, vol. 6, no. 4, Aug. 1997, pp. 355-385.
    [2] D. Chen, S. Tsai, R. Vedantham, R. Grzeszczuk, and B. Girod, “Streaming mobile augmented reality on mobile phones,” in International Symposium on Mixed and Augmented Reality (ISMAR), Orlando, FL, USA, Oct. 2009, pp. 181-182.
    [3] G. Takacs, V. Chandrasekhar, S. Tsai, D. Chen, R. Grzeszczuk, and B. Girod, “Unified real-time tracking and recognition with rotation-invariant fast features,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, June 2010, pp. 934-941.
    [4] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, Nov. 2004, pp. 91-110.
    [5] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, “Speeded-up robust features (SURF),” Computer Vision and Image Understanding, vol. 110, June 2008, pp. 346-359.
    [6] V. Chandrasekhar, Y. Reznik, G. Takacs, D. Chen, S. Tsai, R. Grzeszczuk, and B. Girod, “Quantization schemes for low bitrate compressed histogram of gradients descriptors,” in IEEE Computer Vision and Pattern Recognition Workshops (CVPRW), San Francisco, CA, USA, June 2010, pp. 33-40.
    [7] D. Wagner and D. Schmalstieg, “First steps towards handheld augmented reality,” in Proc. 7th Int'l Conf. on Wearable Computers (ISWC '03), 2003, pp. 127-135.
    [8] D. Wagner and D. Schmalstieg, “ARToolKitPlus for pose tracking on mobile devices,” in Proc. 12th Computer Vision Winter Workshop (CVWW '07), 2007, pp. 139-146.
    [9] I. Skrypnyk and D. Lowe, “Scene modeling, recognition and tracking with invariant image features,” in Proc. Int'l Symp. on Mixed and Augmented Reality (ISMAR '04), 2004, pp. 110-119.
    [10] V. Lepetit, P. Lagger, and P. Fua, “Randomized trees for real-time keypoint recognition,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR '05), 2005, pp. 775-781.
    [11] G. Takacs, V. Chandrasekhar, N. Gelfand, Y. Xiong, W.-C. Chen, T. Bismpigiannis, R. Grzeszczuk, K. Pulli, and B. Girod, “Outdoors augmented reality on mobile phone using loxel-based visual feature organization,” IEEE Trans. Pattern Analysis and Machine Intelligence, 2008.
    [12] J. Ha, K. Cho, F. A. Rojas, and H. S. Yang, “Real-time scalable recognition and tracking based on the server-client model for mobile augmented reality,” in IEEE International Symposium on Virtual Reality Innovations, March 2011, pp. 267-272.
    [13] D. Chen, S. Tsai, C. H. Hsu, J. P. Singh, and B. Girod, “Mobile augmented reality for books on a shelf,” in IEEE International Conference on Multimedia and Expo (ICME), July 2011, pp. 1-6.
    [14] B. R. Huang, C. H. Lin, and C. H. Lee, “Mobile augmented reality based on cloud computing,” in International Conference on Anti-Counterfeiting, Security and Identification, Aug. 2012, pp. 1-5.
    [15] B. P. Lin, W.-H. Tsai, C. C. Wu, P. H. Hsu, J. Y. Huang, and T.-H. Liu, “The design of cloud-based 4G/LTE for mobile augmented reality with smart mobile devices,” in IEEE International Symposium on Service Oriented System Engineering (SOSE), March 2013, pp. 561-566.
    [16] M. Y. Hsieh and W.-H. Tsai, “A study on indoor navigation by augmented reality and down-looking omni-vision techniques using mobile devices,” Technical Report, Institute of Multimedia Engineering, Department of Computer Science, NCTU, Hsinchu, Taiwan, July 2012.
    [17] A. B. Tillon, I. Marchal, and P. Houlier, “Mobile augmented reality in the museum: Can a lace-like technology take you closer to works of art?,” in International Symposium on Mixed and Augmented Reality - Arts, Media, and Humanities (ISMAR-AMH), Oct. 2011, pp. 41-47.
    [18] D. Wagner, G. Reitmayr, A. Mulloni, T. Drummond, and D. Schmalstieg, “Real-time detection and tracking for augmented reality on mobile phones,” IEEE Trans. Visualization and Computer Graphics, vol. 16, no. 3, June 2010, pp. 355-368.
    [19] M. Ozuysal, P. Fua, and V. Lepetit, “Fast keypoint recognition in ten lines of code,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR '07), 2007, pp. 1-8.
    [20] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: an efficient alternative to SIFT or SURF,” in IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, Nov. 2011, pp. 2564-2571.
    [21] S. Leutenegger, M. Chli, and R. Y. Siegwart, “BRISK: binary robust invariant scalable keypoints,” in IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, Nov. 2011, pp. 2548-2555.
    [22] O. Miksik and K. Mikolajczyk, “Evaluation of local detectors and descriptors for fast feature matching,” in International Conference on Pattern Recognition (ICPR), Tsukuba, Japan, Nov. 2012, pp. 2681-2684.
    [23] M. Muja and D. G. Lowe, “Fast approximate nearest neighbors with automatic algorithm configuration,” in International Conference on Computer Vision Theory and Applications (VISAPP), 2009.
    [24] Q. Ye, Q. Huang, W. Gao, and D. Zhao, “Fast and robust text detection in images and video frames,” Image and Vision Computing, vol. 23, no. 6, June 2005, pp. 565-576.
    [25] L. R. Dice, “Measures of the amount of ecologic association between species,” Ecology, vol. 26, no. 3, July 1945, pp. 297-302.
