
Graduate Student: Vi Ngoc Tuong Truong
Thesis Title: A Translator for American Sign Language to Text and Speech
Advisor: Chuan-Kai Yang (楊傳凱)
Committee Members: Nai-Wei Lo (羅乃維), Bor-Shen Lin (林伯慎)
Degree: Master
Department: Department of Information Management, School of Management
Publication Year: 2016
Academic Year of Graduation: 104 (ROC calendar)
Language: English
Number of Pages: 57
Keywords: ASL, Haar-like classifiers, Static Hand Gesture Translator, SAPI 5.3, Text to Speech
Hits: 202; Downloads: 15

  • The main goal of this study is to develop a system that automatically detects static hand signs of the American Sign Language alphabet and translates them into text and speech. Several existing approaches detect a hand sign based on color and shape, while others rely on machine-learning techniques. In 2001, Viola and Jones published a milestone algorithm capable of detecting human faces in real time. Although the original technique targeted only face detection, many researchers have since applied it successfully to other objects such as eyes, mouths, car number plates, and traffic signs; hand signs can likewise be detected this way. To do so, we adopted the two combined concepts of AdaBoost and Haar-like classifiers. To increase the accuracy of the system, we used a large database for the training process, which produced impressive results. The system was implemented and tested on a data set of 28,000 positive hand-sign images (1,000 images per sign, varying in scale and illumination) together with a set of 11,100 negative images. All positive images were captured with a Logitech webcam, with the frame size set to the VGA standard of 640x480 resolution. Experiments show that our system recognizes all signs with a precision of 98.7%. Finally, the displayed text is converted into speech by a speech synthesizer. In summary, the proposed system achieves remarkable results in recognizing sign language against complex backgrounds.
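The speed of the Haar-like/AdaBoost detector described above comes from the summed-area table (integral image), which lets any rectangular pixel sum — and hence any Haar-like feature — be evaluated in constant time. The following is a minimal pure-Python sketch of that idea; the image values and feature coordinates are illustrative and not taken from the thesis.

```python
def integral_image(img):
    """Summed-area table with a zero row/column prepended, so that
    ii[y][x] equals the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            # Standard recurrence: pixel + sums above and to the left,
            # minus the doubly counted upper-left block.
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii

def region_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1][x0:x1] in just four table lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

def haar_two_rect(ii, y0, x0, y1, x1):
    """A horizontal two-rectangle Haar-like feature:
    sum of the left half minus sum of the right half."""
    xm = (x0 + x1) // 2
    return region_sum(ii, y0, x0, y1, xm) - region_sum(ii, y0, xm, y1, x1)
```

For example, on a 4x4 image of the values 0..15, the full-frame sum via `region_sum` is 120, computed without touching the pixels again — the property that makes scanning thousands of detection windows per frame feasible.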

    Table of Contents
    Abstract ii
    Acknowledgement iii
    List of Figures vi
    List of Tables viii
    CHAPTER 1 Introduction 1
      1.1 Problem Background 1
      1.2 American Sign Language (ASL) 1
      1.3 Research Scope 3
      1.4 Outline of Thesis 4
    CHAPTER 2 Literature Review 5
      2.1 Related Works 5
      2.2 Our Approach 6
        2.2.1 Haar-like Features 6
        2.2.2 Summed-area Tables (Integral Image) 8
        2.2.3 AdaBoost 10
        2.2.4 Cascades of Classifiers 11
    CHAPTER 3 Proposed System 13
      3.1 System Overview 13
      3.2 Training Process 14
        3.2.1 Training Dataset 15
        3.2.2 Training Steps 18
      3.3 Testing Process 24
        3.3.1 Testing Dataset 25
        3.3.2 Preprocessing Stage 25
        3.3.3 Classification Stage 26
      3.4 Speech Synthesis 30
    CHAPTER 4 Experiments and Results 32
      4.1 Commonly-accepted Performance Evaluation Measures 32
      4.2 Experimental Data 35
      4.3 Confusion Matrix 36
      4.4 Comparison 41
    CHAPTER 5 Conclusions and Future Work 44
      5.1 Environment Development 44
      5.2 Conclusions 44
      5.3 Future Work 45
    References 46
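Chapter 4 evaluates the detector with commonly-accepted measures such as precision, derived from per-class confusion-matrix counts. As a brief illustration of how such figures are computed (the counts below are made up for the example, not the thesis's data):

```python
def precision_recall(tp, fp, fn):
    """Derive precision and recall from confusion-matrix counts:
    tp = true positives, fp = false positives, fn = false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For instance, 987 true positives against 13 false positives would yield a precision of 0.987, i.e. the 98.7% level reported in the abstract.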

    References
    [1] P. Viola and M. Jones, "Robust real-time object detection", ICCV Workshop on Statistical and Computational Theories of Vision, pp. 747–772, July 2001.
    [2] K. Dabre and S. Dholay, "Machine learning model for sign language interpretation using webcam images", IEEE International Conference on Circuits, Systems, Communication and Information Technology Applications, India, April 2014.
    [3] Rainer Lienhart and Jochen Maydt, "An Extended Set of Haar-like Features for Rapid Object Detection", IEEE ICIP 2002, Vol. 1, pp. 900-903, September 2002.
    [4] Franklin C. Crow, "Summed-area tables for texture mapping". SIGGRAPH '84: Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, pp. 207–212, USA, July 1984.
    [5] Yoav Freund and Robert E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences, pp. 119–139, August 1997.
    [6] Siddharth S. Rautaray and Anupam Agrawal, "Vision-based hand gesture recognition for human computer interaction: a survey", DOI 10.1007/s10462-012-9356-9, Springer Science+Business Media Dordrecht, November 2012.
    [7] Marina Sokolova, Nathalie Japkowicz, and Stan Szpakowicz, "Beyond Accuracy, F-Score and ROC: a Family of Discriminant Measures for Performance Evaluation", Advances in Artificial Intelligence, 2006.
    [8] J. Terrillon, Hideo Fukamachi, Shigeru Akamatsu, and Mahdad Shirazi, "Comparative performance of different skin chrominance models and chrominance spaces for the automatic detection of human faces in color images", Proceedings of the International Conference on Automatic Face and Gesture Recognition, pp. 54–61, 2002.
    [9] Bradski G., "Real-time face and object tracking as a component of a perceptual user interface", IEEE Workshop on Applications of Computer vision, pp 214–219, Los Alamitos, California, 1998.
    [10] Kampmann M., "Segmentation of a head into face, ears, neck and hair for knowledge-based analysis synthesis coding of video-phone sequences", Proceedings of the International Conference on Image Processing (ICIP), vol 2, pp 876–880, Chicago, 1998.
    [11] Francois R., Medioni G., "Adaptive color background modeling for real-time segmentation of video streams", International Conference on Imaging Science, Systems, and Technology, pp 227–232, Las Vegas, 1999.
    [12] Shimada N., Shirai Y., Kuno Y., Miura J., "Hand gesture estimation and model refinement using monocular camera ambiguity limitation by inequality constraints", IEEE International Conference on Face and Gesture Recognition, pp 268–273, 1999.
    [13] Lee J, Kunii TL, "Model-based analysis of hand posture", IEEE Computer Graphics and Applications, 1995.
    [14] Cui Y, Weng J., "Hand sign recognition from intensity image sequences with complex background", Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR), pp 88–93, 1996.
    [15] Triesch J., Malsburg C., "Robust classification of hand postures against complex background", IEEE Automatic Face and Gesture Recognition, pp 170–175, 1996.
    [17] Cutler R., Turk M., "View-based interpretation of real-time optical flow for gesture recognition", Proceedings of the International Conference on Face and Gesture Recognition, pp 416–421, Washington, 1998.
    [18] Crowley J., Berard F., Coutaz J., "Finger tracking as an input device for augmented reality", International Workshop on Gesture and Face Recognition, 1995.
    [19] Birk H, Moeslund TB, Madsen CB, "Real-time recognition of hand alphabet gestures using principal component analysis", Proceedings of the Scandinavian Conference on Image Analysis, 1995.
    [20] Cote M, Payeur P, Comeau G., "Comparative study of adaptive segmentation techniques for gesture analysis in unconstrained environments", IEEE International Workshop on Imaging Systems and Techniques, pp. 28–33, 2006.
    [21] Lu W-L, Little JJ, "Simultaneous tracking and action recognition using the pca-hog descriptor", The 3rd Canadian Conference on Computer and Robot Vision, pp 6–13, Quebec, 2006.
    [22] Charniak E, "Statistical language learning", MIT Press, Cambridge, 1993.
    [23] Liang R-H, Ouhyoung M, "A sign language recognition system using hidden Markov model and context sensitive search", Proceedings of the ACM Symposium on Virtual Reality Software and Technology, ACM Press, pp 59–66, 1996.
    [24] S. Salvador and P. Chan, "FastDTW: toward accurate dynamic time warping in linear time and space", KDD Workshop on Mining Temporal and Sequential Data, 2004.
    [25] Sigal L, Sclaroff S, Athitsos V., "Skin color-based video segmentation under time-varying illumination", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 862–877, 2004.
    [26] Pansare, J. R., Gawande, S. H., & Ingle M., "Real-Time Static Hand Gesture Recognition for American Sign Language (ASL) in Complex Background", Journal of Signal and Information Processing, pp. 364–367, 2012.
