
Graduate Student: Vi Ngoc Tuong Truong
Thesis Title: A Translator for American Sign Language to Text and Speech
Advisor: Chuan-Kai Yang (楊傳凱)
Committee Members: Nai-Wei Lo (羅乃維), Bor-Shen Lin (林伯慎)
Degree: Master
Department: Department of Information Management, School of Management
Publication Year: 2016
Academic Year of Graduation: 104 (ROC calendar)
Language: English
Number of Pages: 57
Keywords: ASL, Haar-like classifiers, Static Hand Gesture Translator, SAPI 5.3, Text to Speech
Hits: 202; Downloads: 15

  • The main goal of this study is to develop a system that automatically detects static hand signs of the American Sign Language alphabet and translates them into text and speech. Several existing approaches detect a hand sign based on color and shape, while others rely on machine-learning techniques. In 2001, Viola and Jones published a milestone algorithm capable of detecting human faces in real time. Although the original technique targeted only face detection, many researchers have since applied it successfully to other objects such as eyes, mouths, car number plates, and traffic signs; hand signs can likewise be detected this way. To do so, we adopted the two combined concepts of AdaBoost and Haar-like classifiers. To increase the accuracy of the system, we used a large database for the training process, which produced impressive results. The system was implemented and tested on a data set of 28,000 positive hand-sign images (1,000 images per sign, varying in scale and illumination) together with a set of 11,100 negative images. All positive images were captured with a Logitech webcam, with the frame size set to the VGA standard of 640x480 resolution. Experiments show that our system recognizes all signs with a precision of 98.7%. Finally, the displayed text is converted into speech by a speech synthesizer. In summary, the proposed system achieves remarkable results in recognizing sign language against complex backgrounds.
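The speed of the Haar-like/AdaBoost detector described above comes from the summed-area table (integral image), which lets any rectangular pixel sum — and hence any Haar-like feature — be evaluated in constant time. The following is a minimal pure-Python sketch of that idea; the image values and feature coordinates are illustrative and not taken from the thesis.

```python
def integral_image(img):
    """Summed-area table with a zero row/column prepended, so that
    ii[y][x] equals the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            # Standard recurrence: pixel + sums above and to the left,
            # minus the doubly counted upper-left block.
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii

def region_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1][x0:x1] in just four table lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

def haar_two_rect(ii, y0, x0, y1, x1):
    """A horizontal two-rectangle Haar-like feature:
    sum of the left half minus sum of the right half."""
    xm = (x0 + x1) // 2
    return region_sum(ii, y0, x0, y1, xm) - region_sum(ii, y0, xm, y1, x1)
```

For example, on a 4x4 image of the values 0..15, the full-frame sum via `region_sum` is 120, computed without touching the pixels again — the property that makes scanning thousands of detection windows per frame feasible.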

    Table of Contents
    Abstract ii
    Acknowledgement iii
    List of Figures vi
    List of Tables viii
    CHAPTER 1 Introduction 1
      1.1 Problem Background 1
      1.2 American Sign Language (ASL) 1
      1.3 Research Scope 3
      1.4 Outline of Thesis 4
    CHAPTER 2 Literature Review 5
      2.1 Related Works 5
      2.2 Our Approach 6
        2.2.1 Haar-like Features 6
        2.2.2 Summed-area Tables (Integral Image) 8
        2.2.3 AdaBoost 10
        2.2.4 Cascades of Classifiers 11
    CHAPTER 3 Proposed System 13
      3.1 System Overview 13
      3.2 Training Process 14
        3.2.1 Training Dataset 15
        3.2.2 Training Steps 18
      3.3 Testing Process 24
        3.3.1 Testing Dataset 25
        3.3.2 Preprocessing Stage 25
        3.3.3 Classification Stage 26
      3.4 Speech Synthesis 30
    CHAPTER 4 Experiments and Results 32
      4.1 Commonly-accepted Performance Evaluation Measures 32
      4.2 Experimental Data 35
      4.3 Confusion Matrix 36
      4.4 Comparison 41
    CHAPTER 5 Conclusions and Future Work 44
      5.1 Environment Development 44
      5.2 Conclusions 44
      5.3 Future Work 45
    References 46
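Chapter 4 evaluates the detector with commonly-accepted measures such as precision, derived from per-class confusion-matrix counts. As a brief illustration of how such figures are computed (the counts below are made up for the example, not the thesis's data):

```python
def precision_recall(tp, fp, fn):
    """Derive precision and recall from confusion-matrix counts:
    tp = true positives, fp = false positives, fn = false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For instance, 987 true positives against 13 false positives would yield a precision of 0.987, i.e. the 98.7% level reported in the abstract.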

    References
    [1] P. Viola and M. Jones, "Robust real-time object detection", ICCV Workshop on Statistical and Computational Theories of Vision, pp. 747–772, July 2001.
    [2] K. Dabre and S. Dholay, "Machine learning model for sign language interpretation using webcam images", IEEE International Conference on Circuits, Systems, Communication and Information Technology Applications, India, April 2014.
    [3] Rainer Lienhart and Jochen Maydt, "An Extended Set of Haar-like Features for Rapid Object Detection", IEEE ICIP 2002, Vol. 1, pp. 900-903, September 2002.
    [4] Franklin C. Crow, "Summed-area tables for texture mapping". SIGGRAPH '84: Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, pp. 207–212, USA, July 1984.
    [5] Yoav Freund and Robert E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences, pp. 119–139, August 1997.
    [6] Siddharth S. Rautaray and Anupam Agrawal, "Vision-based hand gesture recognition for human computer interaction: a survey", DOI 10.1007/s10462-012-9356-9, Springer Science+Business Media Dordrecht, November 2012.
    [7] Marina Sokolova, Nathalie Japkowicz, and Stan Szpakowicz, "Beyond Accuracy, F-Score and ROC: a Family of Discriminant Measures for Performance Evaluation", Advances in Artificial Intelligence, 2006.
    [8] J. Terrillon, Hideo Fukamachi, Shigeru Akamatsu, and Mahdad Shirazi, "Comparative performance of different skin chrominance models and chrominance spaces for the automatic detection of human faces in color images", Proceedings of the International Conference on Automatic Face and Gesture Recognition, pp. 54–61, 2002.
    [9] Bradski G., "Real-time face and object tracking as a component of a perceptual user interface", IEEE Workshop on Applications of Computer vision, pp 214–219, Los Alamitos, California, 1998.
    [10] Kampmann M., "Segmentation of a head into face, ears, neck and hair for knowledge-based analysis synthesis coding of video-phone sequences", Proceedings of the International Conference on Image Processing (ICIP), vol 2, pp 876–880, Chicago, 1998.
    [11] Francois R., Medioni G., "Adaptive color background modeling for real-time segmentation of video streams", International Conference on Imaging Science, Systems, and Technology, pp 227–232, Las Vegas, 1999.
    [12] Shimada N., Shirai Y., Kuno Y., Miura J., "Hand gesture estimation and model refinement using monocular camera ambiguity limitation by inequality constraints", IEEE International Conference on Face and Gesture Recognition, pp 268–273, 1999.
    [13] Lee J, Kunii TL, "Model-based analysis of hand posture", IEEE Computer Graphics and Applications, 1995.
    [14] Cui Y, Weng J., "Hand sign recognition from intensity image sequences with complex background", Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR), pp 88–93, 1996.
    [15] Triesch J., Malsburg C., "Robust classification of hand postures against complex background", IEEE Automatic Face and Gesture Recognition, pp 170–175, 1996.
    [17] Cutler R., Turk M., "View-based interpretation of real-time optical flow for gesture recognition", Proceedings of the International Conference on Face and Gesture Recognition, pp 416–421, Washington, 1998.
    [18] Crowley J., Berard F., Coutaz J., "Finger tracking as an input device for augmented reality", International Workshop on Gesture and Face Recognition, 1995.
    [19] Birk H, Moeslund TB, Madsen CB, "Real-time recognition of hand alphabet gestures using principal component analysis", Proceedings of the Scandinavian Conference on Image Analysis, 1995.
    [20] Cote M, Payeur P, Comeau G., "Comparative study of adaptive segmentation techniques for gesture analysis in unconstrained environments", IEEE International Workshop on Imaging Systems and Techniques, pp. 28–33, 2006.
    [21] Lu W-L, Little JJ, "Simultaneous tracking and action recognition using the pca-hog descriptor", The 3rd Canadian Conference on Computer and Robot Vision, pp 6–13, Quebec, 2006.
    [22] Charniak E, "Statistical language learning", MIT Press, Cambridge, 1993.
    [23] Liang R-H, Ouhyoung M, "A sign language recognition system using hidden Markov model and context sensitive search", Proceedings of the ACM Symposium on Virtual Reality Software and Technology, ACM Press, pp 59–66, 1996.
    [24] S. Salvador and P. Chan, "FastDTW: toward accurate dynamic time warping in linear time and space", KDD Workshop on Mining Temporal and Sequential Data, 2004.
    [25] Sigal L, Sclaroff S, Athitsos V., "Skin color-based video segmentation under time-varying illumination", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 862–877, 2004.
    [26] Pansare, J. R., Gawande, S. H., & Ingle M., "Real-Time Static Hand Gesture Recognition for American Sign Language (ASL) in Complex Background", Journal of Signal and Information Processing, pp. 364–367, 2012.
