
Graduate Student: 吳紹擎 (Shao-Ching Wu)
Thesis Title: 提升手部動作辨識穩定性之非預期手辨識技術 (Unexpected Hand Detection for Stabilizing Hand Motion Recognition)
Advisors: 花凱龍 (Kai-Lung Hua), 楊朝龍 (Chao-Lung Yang)
Committee Members: 花凱龍 (Kai-Lung Hua), 楊朝龍 (Chao-Lung Yang), 沈上翔 (Shan-Hsiang Shen)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Academic Year of Graduation: 111 (ROC calendar)
Language: English
Number of Pages: 35
Keywords: object tracking, hand tracking, hand pose estimation, additional hand, other hand, gesture recognition
Access count: 176 views, 8 downloads
  • Recent mainstream research in hand pose estimation has focused primarily on occlusion. However, we found that both the models prevalent in today's applications and state-of-the-art models are disturbed when another hand enters the scene, resulting in an instability problem. This problem strongly affects downstream tasks such as sign language recognition and human-computer interaction, yet it has not been examined or solved. This study therefore verifies that the problem really exists and provides a solution and evaluation methods. Overall, our contributions are as follows: (a) under the premise of a fixed camera viewpoint, we propose an algorithm with extremely low time cost that any hand pose estimation model can use to solve this problem; (b) we provide a dataset, a new metric, and a semi-automatic annotation tool to help future researchers evaluate whether a model suffers from this instability problem; (c) we propose an adaptive method that resolves the variation in hand size caused by different camera-to-hand distances, further improving generality.


    In the recent field of hand pose estimation, mainstream research has focused
    primarily on handling occlusion. However, we have discovered that both the
    models prevalent in today's applications and state-of-the-art approaches are
    susceptible to interference when an additional hand is introduced, leading to
    instability. This problem has significant implications for downstream tasks
    such as gesture recognition and human-robot collaboration (HRC), yet it
    remains unexplored and unresolved. Therefore, this study aims not only to
    validate the existence of this problem but also to provide solutions and
    evaluation methods.
    Our contributions are as follows: (a) We propose a time-efficient algorithm
    with extremely low computational cost under the assumption of a fixed camera
    viewpoint; it can be employed by any hand pose estimation model to address
    the aforementioned instability issue. (b) We provide a dataset, a novel
    metric, and a semi-automatic annotation tool to assist future researchers in
    evaluating whether their models suffer from this instability issue. (c) We
    propose an adaptive approach to tackle the variation in hand size caused by
    differing distances between the camera and the hand, thereby further
    enhancing the generalizability of our methodology.
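    This record only names the algorithm of contribution (a); no implementation appears on this page. As a minimal, hypothetical sketch of such a low-cost check, one could propagate the previously tracked hand's landmarks with Lucas-Kanade optical flow (the technique listed under section 2.3 below) and flag a detection whose landmarks fall outside circles drawn around the propagated points. The function name, the 21-landmark layout (as in MediaPipe Hands), and all thresholds are assumptions for illustration, not the thesis's actual design:

        import cv2
        import numpy as np

        # Typical OpenCV Lucas-Kanade parameters; values are illustrative, not from the thesis.
        LK_PARAMS = dict(winSize=(15, 15), maxLevel=2,
                         criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

        def is_unexpected_hand(prev_gray, curr_gray, prev_landmarks, curr_landmarks,
                               radius=25.0, outlier_fraction=0.5):
            """Return True if the currently detected hand is likely a different
            (unexpected) hand rather than the one tracked in the previous frame.

            prev_gray, curr_gray: consecutive 8-bit grayscale frames (fixed camera).
            prev_landmarks, curr_landmarks: (21, 2) float32 pixel coordinates,
                e.g. from a MediaPipe-style 21-keypoint hand skeleton.
            radius: pixel radius of the circle around each propagated landmark
                (a fixed-size stand-in for the thesis's adaptive circles).
            """
            pts = prev_landmarks.reshape(-1, 1, 2).astype(np.float32)
            # Propagate last frame's landmarks into the current frame with sparse LK flow.
            new_pts, status, _err = cv2.calcOpticalFlowPyrLK(
                prev_gray, curr_gray, pts, None, **LK_PARAMS)
            ok = status.reshape(-1).astype(bool)
            if not ok.any():
                return True  # nothing could be tracked: treat the detection as unexpected
            # Distance from each detected landmark to its flow-propagated counterpart.
            dists = np.linalg.norm(curr_landmarks[ok] - new_pts.reshape(-1, 2)[ok], axis=1)
            # If most landmarks fall outside their circles, the detector has likely
            # jumped to a hand that was not being tracked.
            return float(np.mean(dists > radius)) > outlier_fraction

    Because the camera viewpoint is fixed, background motion is negligible, so landmarks that cannot be explained by local optical flow most plausibly belong to a newly introduced hand; the cost is a single sparse optical-flow call per frame.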

    Table of Contents:
    Chinese Abstract
    Abstract
    Table of Contents
    List of Tables
    List of Illustrations
    1 Introduction
    2 Related Work
      2.1 Hand pose estimation
      2.2 Hand tracking
      2.3 Lucas-Kanade optical flow
    3 Methodology
      3.1 Applicable scenarios
      3.2 The formation of Optical Flow Circles (OFC) through the use of optical flow
      3.3 The combination of OFC with the hand skeleton model
      3.4 Examining the size of the OFC and the adaptive OFC
      3.5 Skeleton circle
      3.6 Post-processing after detecting the additional hand
    4 Experiment
      4.1 Dataset
      4.2 Using mainstream evaluation metrics
      4.3 Our evaluation metrics
      4.4 The time cost expenditure
      4.5 The comparison of OFC and skeleton circles
    5 Conclusion
    References
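    Section 3.4 above refers to an adaptive OFC size. The thesis's actual rule is not reproduced on this page; as a hedged illustration of contribution (c), the circle radius could scale with the hand's apparent size, for example its landmark bounding-box diagonal. The scale and min_radius values are illustrative assumptions:

        import numpy as np

        def adaptive_radius(landmarks, scale=0.15, min_radius=8.0):
            """Scale the per-landmark circle radius with apparent hand size, so
            hands closer to the camera (which appear larger and move more pixels
            per frame) get proportionally larger circles.

            landmarks: (21, 2) array of pixel coordinates for one detected hand.
            scale, min_radius: illustrative values, not taken from the thesis.
            """
            span = landmarks.max(axis=0) - landmarks.min(axis=0)  # bbox width, height
            diag = float(np.hypot(span[0], span[1]))              # bbox diagonal in pixels
            return max(min_radius, scale * diag)

    A value computed this way could be passed as the radius argument of the earlier sketch, removing its fixed-radius assumption.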


    Full-text release date: Not authorized for public access (off-campus network)
    Full-text release date: 2043/07/11 (National Central Library: Taiwan NDLTD system)