
Graduate student: 潘秀蓮 (Yulia)
Thesis title: Transition Motion Synthesis for Video-Based Text to ASL
Advisor: 楊傳凱 (Chuan-Kai Yang)
Oral defense committee: 林伯慎 (Bor-Shen Lin), 孫沛立 (Pei-Li Sun)
Degree: Master
Department: College of Management - Department of Information Management
Publication year: 2019
Academic year of graduation: 107 (ROC calendar)
Language: English
Number of pages: 68
Keywords: ASL, Sign Language, Deaf Talk, OpenPose, Transition Motion Synthesis
  • This research describes a novel approach to text-to-ASL (American Sign
    Language) media: a video-based text-to-ASL system. People with hearing
    impairments, referred to here as the Deaf, are accustomed to communicating
    in sign language. When they encounter spoken language, they have
    difficulty reading the words as quickly as hearing people do.
    The availability of a public dataset, the ASL Lexicon Dataset, presents
    the challenge of building a video-based interpreter for the Deaf. The
    problem lies in the transition from one word to the next, since such
    transitions do not exist in the original dataset. Our focus is therefore
    on how to produce a better transition between words than an abrupt cut
    that looks like a blink.
    After preprocessing, the dataset videos are fed to the OpenPose library,
    which extracts the signers' skeletons and saves them as JSON files. The
    user enters glosses as text, and the system finds the JSON files and
    videos for the corresponding glosses. The complete sequences of the
    original videos are also fed into the system to serve as a transition
    pool. The frames of the requested glosses are then combined with the
    transition pool to construct the sequence of transition frames. Once the
    sequence has been obtained, a smoothing algorithm is applied to make the
    motion smoother.
    Because the algorithm depends entirely on the transition pool, its
    ability to produce a good transition is limited. If the frames needed for
    a logically and visually correct motion are not available, the result
    will be suboptimal. As long as the needed frames are available, however,
    the system can generate logically and visually correct transitions.
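
The pipeline summarized in the abstract (skeleton JSON, transition pool, transition frame selection, smoothing) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual similarity measure, selection rule, or smoothing algorithm, so everything below is an assumption for illustration: the mean-distance metric, the linear interpolation of target poses, the moving-average smoother, and all function names are hypothetical. Only the `people` / `pose_keypoints_2d` layout (flat x, y, confidence triples) follows the JSON that OpenPose writes per frame.

```python
import json
import math


def parse_pose(frame_json):
    """Return [(x, y, confidence), ...] for the first person in one OpenPose frame."""
    people = json.loads(frame_json)["people"]
    if not people:
        return []
    flat = people[0]["pose_keypoints_2d"]
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def pose_distance(a, b, min_conf=0.1):
    """Assumed metric: mean Euclidean distance over keypoints detected in both poses."""
    dists = [
        math.hypot(xa - xb, ya - yb)
        for (xa, ya, ca), (xb, yb, cb) in zip(a, b)
        if ca >= min_conf and cb >= min_conf
    ]
    return sum(dists) / len(dists) if dists else math.inf


def build_transition(start, end, pool, steps):
    """Assumed rule: for each interpolated target pose between the last frame of
    one gloss (start) and the first frame of the next (end), pick the index of
    the closest pose in the transition pool."""
    chosen = []
    for s in range(1, steps + 1):
        t = s / (steps + 1)
        target = [
            ((1 - t) * xa + t * xb, (1 - t) * ya + t * yb, min(ca, cb))
            for (xa, ya, ca), (xb, yb, cb) in zip(start, end)
        ]
        best = min(range(len(pool)), key=lambda i: pose_distance(pool[i], target))
        chosen.append(best)
    return chosen


def smooth(track, window=3):
    """Assumed smoother: centered moving average over one keypoint's (x, y) track."""
    half = window // 2
    out = []
    for i in range(len(track)):
        lo, hi = max(0, i - half), min(len(track), i + half + 1)
        xs = [p[0] for p in track[lo:hi]]
        ys = [p[1] for p in track[lo:hi]]
        out.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return out
```

The sketch also makes the stated limitation concrete: `build_transition` can only return frames that already exist in the pool, so if no pool pose lies near an interpolated target, the closest available frame may still be visually far from the motion that is needed.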



    Recommendation Letter
    Approval Letter
    Abstract
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    List of Pseudocodes
    1 Introduction
      1.1 Motivation
      1.2 Contribution
      1.3 Outline
    2 Literature Review
      2.1 Introduction to ASL
      2.2 Previous Text to ASL System
      2.3 ASL Lexicon Video Dataset
      2.4 OpenPose
      2.5 Motion Synthesis
    3 Proposed System
      3.1 System Overview
      3.2 System Architecture
      3.3 Dataset
      3.4 OpenPose Library
      3.5 Constructing Transition Frames
        3.5.1 Keypoints Selection
        3.5.2 Similarity Measurement
        3.5.3 Composing Transition Frames Sequence
        3.5.4 Outliers Prevention
        3.5.5 Animation Smoothing
    4 Experimental Result
      4.1 Experiments
      4.2 Parameters
      4.3 Algorithm Verification
      4.4 Scene Change Detection
      4.5 User Study
        4.5.1 Video Type Preference
        4.5.2 Motion Smoothness Quality
    5 Conclusions
      5.1 Conclusions
      5.2 Limitations and Future Works
    References


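    Full-text release date: 2024/07/29 (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
    Full-text release date: full text not authorized for public release (National Central Library: Taiwan theses and dissertations system)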