
Graduate Student: 吳紹擎 (Shao-Ching Wu)
Thesis Title: 提升手部動作辨識穩定性之非預期手辨識技術 (Unexpected Hand Detection for Stabilizing Hand Motion Recognition)
Advisors: 花凱龍 (Kai-Lung Hua), 楊朝龍 (Chao-Lung Yang)
Committee Members: 花凱龍 (Kai-Lung Hua), 楊朝龍 (Chao-Lung Yang), 沈上翔 (Shan-Hsiang Shen)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Academic Year of Graduation: 111 (ROC calendar)
Language: English
Number of Pages: 35
Keywords: object tracking, hand tracking, hand pose estimation, additional hand, other hand, gesture recognition
Access count: 176 views, 8 downloads
  • Recent mainstream research in hand pose estimation has focused primarily on occlusion. However, we found that both the models prevalent in today's applications and state-of-the-art models are disturbed when another hand enters the scene, resulting in an instability problem. This problem strongly affects downstream tasks such as sign language recognition and human-computer interaction, yet it has not been examined or solved. This study therefore verifies that the problem really exists and provides a solution and evaluation methods. Overall, our contributions are as follows: (a) under the premise of a fixed camera viewpoint, we propose an algorithm with extremely low time cost that any hand pose estimation model can use to solve this problem; (b) we provide a dataset, a new metric, and a semi-automatic annotation tool to help future researchers evaluate whether a model suffers from this instability problem; (c) we propose an adaptive method that resolves the variation in hand size caused by different camera-to-hand distances, further improving generality.


    In the recent field of hand pose estimation, mainstream research has focused
    primarily on handling occlusion. However, we have discovered that both the
    models prevalent in today's applications and state-of-the-art approaches are
    susceptible to interference when an additional hand is introduced, leading to
    instability. This problem has significant implications for downstream tasks
    such as gesture recognition and human-robot collaboration (HRC), yet it
    remains unexplored and unresolved. Therefore, this study aims not only to
    validate the existence of this problem but also to provide solutions and
    evaluation methods.
    Our contributions are as follows: (a) We propose a time-efficient algorithm
    with extremely low computational cost under the assumption of a fixed camera
    viewpoint; it can be employed by any hand pose estimation model to address
    the aforementioned instability issue. (b) We provide a dataset, a novel
    metric, and a semi-automatic annotation tool to assist future researchers in
    evaluating whether their models suffer from this instability issue. (c) We
    propose an adaptive approach to tackle the variation in hand size caused by
    differing distances between the camera and the hand, thereby further
    enhancing the generalizability of our methodology.
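    This record only names the algorithm of contribution (a); no implementation appears on this page. As a minimal, hypothetical sketch of such a low-cost check, one could propagate the previously tracked hand's landmarks with Lucas-Kanade optical flow (the technique listed under section 2.3 below) and flag a detection whose landmarks fall outside circles drawn around the propagated points. The function name, the 21-landmark layout (as in MediaPipe Hands), and all thresholds are assumptions for illustration, not the thesis's actual design:

        import cv2
        import numpy as np

        # Typical OpenCV Lucas-Kanade parameters; values are illustrative, not from the thesis.
        LK_PARAMS = dict(winSize=(15, 15), maxLevel=2,
                         criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

        def is_unexpected_hand(prev_gray, curr_gray, prev_landmarks, curr_landmarks,
                               radius=25.0, outlier_fraction=0.5):
            """Return True if the currently detected hand is likely a different
            (unexpected) hand rather than the one tracked in the previous frame.

            prev_gray, curr_gray: consecutive 8-bit grayscale frames (fixed camera).
            prev_landmarks, curr_landmarks: (21, 2) float32 pixel coordinates,
                e.g. from a MediaPipe-style 21-keypoint hand skeleton.
            radius: pixel radius of the circle around each propagated landmark
                (a fixed-size stand-in for the thesis's adaptive circles).
            """
            pts = prev_landmarks.reshape(-1, 1, 2).astype(np.float32)
            # Propagate last frame's landmarks into the current frame with sparse LK flow.
            new_pts, status, _err = cv2.calcOpticalFlowPyrLK(
                prev_gray, curr_gray, pts, None, **LK_PARAMS)
            ok = status.reshape(-1).astype(bool)
            if not ok.any():
                return True  # nothing could be tracked: treat the detection as unexpected
            # Distance from each detected landmark to its flow-propagated counterpart.
            dists = np.linalg.norm(curr_landmarks[ok] - new_pts.reshape(-1, 2)[ok], axis=1)
            # If most landmarks fall outside their circles, the detector has likely
            # jumped to a hand that was not being tracked.
            return float(np.mean(dists > radius)) > outlier_fraction

    Because the camera viewpoint is fixed, background motion is negligible, so landmarks that cannot be explained by local optical flow most plausibly belong to a newly introduced hand; the cost is a single sparse optical-flow call per frame.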

    Table of Contents:
    Chinese Abstract
    Abstract
    Table of Contents
    List of Tables
    List of Illustrations
    1 Introduction
    2 Related Work
      2.1 Hand pose estimation
      2.2 Hand tracking
      2.3 Lucas-Kanade optical flow
    3 Methodology
      3.1 Applicable scenarios
      3.2 The formation of Optical Flow Circles (OFC) through the use of optical flow
      3.3 The combination of OFC with the hand skeleton model
      3.4 Examining the size of the OFC and the adaptive OFC
      3.5 Skeleton circle
      3.6 Post-processing after detecting the additional hand
    4 Experiment
      4.1 Dataset
      4.2 Using mainstream evaluation metrics
      4.3 Our evaluation metrics
      4.4 The time cost expenditure
      4.5 The comparison of OFC and skeleton circles
    5 Conclusion
    References
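    Section 3.4 above refers to an adaptive OFC size. The thesis's actual rule is not reproduced on this page; as a hedged illustration of contribution (c), the circle radius could scale with the hand's apparent size, for example its landmark bounding-box diagonal. The scale and min_radius values are illustrative assumptions:

        import numpy as np

        def adaptive_radius(landmarks, scale=0.15, min_radius=8.0):
            """Scale the per-landmark circle radius with apparent hand size, so
            hands closer to the camera (which appear larger and move more pixels
            per frame) get proportionally larger circles.

            landmarks: (21, 2) array of pixel coordinates for one detected hand.
            scale, min_radius: illustrative values, not taken from the thesis.
            """
            span = landmarks.max(axis=0) - landmarks.min(axis=0)  # bbox width, height
            diag = float(np.hypot(span[0], span[1]))              # bbox diagonal in pixels
            return max(min_radius, scale * diag)

    A value computed this way could be passed as the radius argument of the earlier sketch, removing its fixed-radius assumption.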


    Full-text release date: Not authorized for public access (off-campus network)
    Full-text release date: 2043/07/11 (National Central Library: Taiwan NDLTD system)