Graduate Student: 潘秀蓮 Yulia
Thesis Title: Transition Motion Synthesis for Video-Based Text to ASL
Advisor: 楊傳凱 Chuan-Kai Yang
Committee Members: 林伯慎 Bor-Shen Lin, 孫沛立 Pei-Li Sun
Degree: Master
Department: Department of Information Management, College of Management
Year of Publication: 2019
Academic Year: 107
Language: English
Number of Pages: 68
Keywords: ASL, Sign Language, Deaf Talk, OpenPose, Transition Motion Synthesis
This research describes a novel approach to providing a text-to-ASL medium: a Video-Based Text to ASL system. The hearing impaired, commonly referred to as the Deaf, are accustomed to communicating in Sign Language. When confronted with spoken language, they have difficulty reading the spoken words as fast as hearing people do.
The availability of a public dataset, the ASL Lexicon Dataset, poses the challenge of building a video-based interpreter for the Deaf. The problem lies in the transition from one word to the next, since such transitions do not exist in the original dataset. Accordingly, our focus is on how to make a smooth transition between words rather than an abrupt "blink."
After the dataset has been preprocessed, the videos are fed to the OpenPose library, which extracts the skeletons of the signers and saves them as JSON files. The system asks the user to input a number of glosses as text, and then locates the JSON files and videos for the corresponding glosses. The full sequences of the original videos are also fed into the system to serve as transition pools. The corresponding frames of the glosses are then combined with the transition pools to construct the transition frame sequences. Once the sequences are obtained, a smoothing algorithm is applied to enhance the smoothness of the motion.
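The core of the pipeline above can be sketched as follows. The JSON layout mirrors OpenPose's standard output (a "people" array whose "pose_keypoints_2d" field is a flat list of x, y, confidence triples); the distance metric and the simple two-sided cost used to pick a transition frame are illustrative assumptions, not the thesis's exact algorithm.

```python
import json
import math

def load_pose(json_path):
    """Read one OpenPose output file and return the 2-D body keypoints
    of the first detected person as a list of (x, y) pairs. OpenPose
    stores them as a flat [x0, y0, c0, x1, y1, c1, ...] list."""
    with open(json_path) as f:
        data = json.load(f)
    flat = data["people"][0]["pose_keypoints_2d"]
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 3)]

def pose_distance(a, b):
    """Sum of Euclidean distances between corresponding joints."""
    return sum(math.dist(p, q) for p, q in zip(a, b))

def best_transition(end_pose, start_pose, pool):
    """Pick the pool frame whose pose is closest to both the last frame
    of the previous gloss and the first frame of the next gloss
    (assumed cost: the sum of the two distances)."""
    return min(pool, key=lambda frame: pose_distance(frame, end_pose)
                                       + pose_distance(frame, start_pose))
```

In practice several consecutive pool frames would be chosen to bridge the two glosses, but a single nearest-pose lookup already conveys the idea of mining the original videos as a transition pool.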
Since this algorithm depends entirely on the transition pools, there are some limitations on producing a good transition. If the transition frames needed to make a logically and visually correct motion are not available, the result will not be optimal. But as long as the frames we need are available, the system can generate logically and visually correct transitions.
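The smoothing step mentioned above can be illustrated with a centered moving average over each joint's trajectory; the window size, and the choice of a moving average rather than whatever filter the thesis actually uses (the abstract does not specify one), are assumptions for illustration.

```python
def smooth_trajectory(points, window=3):
    """Smooth one joint's (x, y) trajectory with a centered moving
    average. Frames near either end use a shrunken window, so the
    output has the same length as the input."""
    half = window // 2
    smoothed = []
    for i in range(len(points)):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        xs = [p[0] for p in points[lo:hi]]
        ys = [p[1] for p in points[lo:hi]]
        smoothed.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return smoothed
```

Applied independently to every joint across the stitched frame sequence, this suppresses the jitter introduced where gloss clips meet transition frames.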