
Graduate Student: 陳禹心 (Yu-Hsun Chen)
Thesis Title: The Study of Skeleton-Based Action Recognition Based on 3D Geometric Relationship (利用3D幾何關係進行骨架動作辨識之研究)
Advisor: 陳郁堂 (Yie-Tarng Chen)
Committee Members: 陳省隆 (Hsing-Lung Chen), 吳乾彌 (Chen-Mie Wu), 林銘波 (Ming-Bo Lin), 方文賢 (Wen-Hsien Fang), 陳郁堂 (Yie-Tarng Chen)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2017
Graduation Academic Year: 105
Language: English
Number of Pages: 39
Keywords: Action Recognition, Body Part-Based, Skeletal Representation, Lie Group, 3D Geometric Relationship
Usage: 171 views, 2 downloads

Abstract (Chinese, translated): In recent years, with the advent of RGB-D cameras, videos can provide not only RGB information but also depth information. Joint positions can be extracted from depth videos and connected to form a skeleton that represents the human body, and skeleton-based human action recognition has been receiving increasing attention. In this thesis, we obtain skeleton information from depth images. In addition to the 19 (local) body parts obtained by connecting 20 joints, we divide the skeleton into several larger planes, called mid-level body parts. The 3D geometric relationship between each pair of body parts (local as well as mid-level) is then described by a rotation and a displacement and represented as a point in the special Euclidean group SE(3), a 3D Lie group. Besides this "posture" information, we also compute the 3D geometric relationships of body parts across time intervals to obtain "trajectory" information; we call these time-interval features. With this 3D geometric feature representation, a human action can be modeled as a curve in the Lie group SE(3) x ... x SE(3), where x denotes the direct product of Lie groups; the curves are then mapped from the Lie group to its Lie algebra (a vector space) via the logarithm map for subsequent temporal modeling and classification. We use Dynamic Time Warping (DTW) to handle rate variations when different subjects perform the same action, and the Fourier Temporal Pyramid (FTP) to handle temporal misalignment and noise; finally, a linear support vector machine (SVM) is used for classification. Experimental results on the MSR-Action3D, UTKinect, and Florence3D datasets show that the proposed feature representation outperforms other skeletal representations and achieves results comparable to various state-of-the-art skeleton-based action recognition methods.
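To make the SE(3) construction concrete, below is a minimal Python sketch (illustrative code, not the thesis implementation) of encoding the relative rotation and displacement between two body parts as a 4x4 homogeneous matrix in SE(3) and mapping it to the Lie algebra se(3) with the logarithm map. The function names are hypothetical, and scipy.linalg.logm is used as a generic matrix logarithm; the thesis may use the closed-form SE(3) logarithm instead.

import numpy as np
from scipy.linalg import logm

def rotation_between(u, v):
    """Minimal rotation aligning direction u to direction v (Rodrigues' formula)."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    axis = np.cross(u, v)
    s, c = np.linalg.norm(axis), float(np.dot(u, v))
    if s < 1e-8:                          # segments are (anti-)parallel
        if c > 0:
            return np.eye(3)
        w = np.cross(u, [1.0, 0.0, 0.0])  # any axis perpendicular to u
        if np.linalg.norm(w) < 1e-8:
            w = np.cross(u, [0.0, 1.0, 0.0])
        w = w / np.linalg.norm(w)
        return 2.0 * np.outer(w, w) - np.eye(3)   # 180-degree rotation about w
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]]) / s    # unit-axis skew matrix
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)

def se3_feature(part_a, part_b):
    """part_a, part_b: (start_joint, end_joint) tuples of 3-D numpy arrays.
    Returns a 6-D vector in se(3) describing their relative geometry."""
    a0, a1 = part_a
    b0, b1 = part_b
    T = np.eye(4)                         # homogeneous transform: a point in SE(3)
    T[:3, :3] = rotation_between(a1 - a0, b1 - b0)   # relative rotation
    T[:3, 3] = b0 - a0                               # relative displacement
    X = np.real(logm(T))                  # logarithm map: SE(3) -> se(3)
    omega = np.array([X[2, 1], X[0, 2], X[1, 0]])    # skew-symmetric (rotation) part
    return np.concatenate([omega, X[:3, 3]])         # 6-D feature vector

Concatenating this 6-D vector over every (local and mid-level) body-part pair in a frame gives one point of the action curve in SE(3) x ... x SE(3), already flattened into the vector space of the Lie algebra.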


Abstract (English): In recent years, skeleton-based human action recognition has been receiving significant attention. In this paper, we present a novel 3D geometric feature for skeleton-based action recognition based on the Lie group SE(3). First, we divide the skeleton into several larger parts called mid-level body parts. Then, the 3D geometric relationship between each pair of body parts is represented as a point in the Lie group SE(3) through rotation and displacement. Beyond this "posture" information, we also compute the 3D geometric relationships across time intervals to capture temporal dynamics, which can be regarded as "trajectory" information. Consequently, a human action can be modeled as a curve in the Lie group SE(3) x ... x SE(3), which is then mapped to the corresponding Lie algebra using the logarithm map. Finally, we perform action classification with dynamic time warping (DTW), the Fourier temporal pyramid (FTP) representation, and a linear SVM. Experimental results show that the proposed feature representation outperforms state-of-the-art approaches on the MSR-Action3D, UTKinect, and Florence3D datasets.
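As a rough illustration of the temporal-modeling stage, the sketch below (hypothetical Python, not the thesis code) aligns a per-frame feature sequence to a nominal curve with classic dynamic time warping and then summarizes the warped sequence with a Fourier temporal pyramid; the pyramid depth and the number of retained low-frequency coefficients are assumed parameters.

import numpy as np

def dtw_path(seq, ref):
    """Classic O(T1*T2) dynamic time warping between two feature sequences."""
    t1, t2 = len(seq), len(ref)
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            d = np.linalg.norm(seq[i - 1] - ref[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    path, i, j = [], t1, t2               # backtrack to recover the alignment
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def warp_to_nominal(seq, ref):
    """Warp seq onto the timeline of the nominal curve ref using the DTW path."""
    aligned = [[] for _ in range(len(ref))]
    for i, j in dtw_path(seq, ref):
        aligned[j].append(seq[i])
    return np.array([np.mean(frames, axis=0) for frames in aligned])

def fourier_temporal_pyramid(seq, levels=3, keep=4):
    """Concatenate low-frequency FFT magnitudes over a dyadic temporal pyramid."""
    feats = []
    for lv in range(levels):
        for chunk in np.array_split(seq, 2 ** lv, axis=0):
            spec = np.abs(np.fft.rfft(chunk, axis=0))[:keep]   # low frequencies
            pad = keep - spec.shape[0]                         # very short segments
            if pad > 0:
                spec = np.pad(spec, ((0, pad), (0, 0)))
            feats.append(spec.ravel())
    return np.concatenate(feats)          # fixed-length descriptor per action

Training a linear SVM (for example, sklearn.svm.LinearSVC) on the resulting fixed-length descriptors completes the classification pipeline described above.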

Table of Contents:
Chinese Abstract
Abstract
Acknowledgment
Table of Contents
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
Chapter 3 Feature Representation of Human Body and Actions
  3.1 Local Body-Part Feature Descriptor by Lie Group SE(3)
  3.2 Mid-Level Body-Part Feature Representation
  3.3 Temporal Dynamical Feature Representation of Body Parts
  3.4 Lie Group SE(3) and Lie Algebra se(3)
Chapter 4 Temporal Modeling and Classification
  4.1 Computing Nominal Curves and Warping Curves Using DTW
  4.2 Fourier Temporal Pyramid Representation
Chapter 5 Experimental Results
  5.1 Datasets
  5.2 Evaluation Settings and Parameters
  5.3 Results and Analysis
    5.3.1 Proposed features analysis
    5.3.2 Bidirectional and unidirectional relationship analysis
    5.3.3 Feature combination analysis
    5.3.4 Temporal modeling analysis
    5.3.5 Comparison with state-of-the-art results
Chapter 6 Conclusion
References

References:
[1] W. Li, Z. Zhang, and Z. Liu, "Action recognition based on a bag of 3D points," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 9–14, 2010.
[2] Y. Zhu, W. Chen, and G. Guo, "Fusing spatiotemporal features and joints for 3D action recognition," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 486–491, 2013.
[3] M. E. Hussein, M. Torki, M. A. Gowayyed, and M. El-Saban, "Human action recognition using a temporal hierarchy of covariance descriptors on 3D joint locations," in International Joint Conference on Artificial Intelligence, 2013.
[4] L. Xia, C.-C. Chen, and J. Aggarwal, "View invariant human action recognition using histograms of 3D joints," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 20–27, 2012.
[5] J. Wang, Z. Liu, Y. Wu, and J. Yuan, "Mining actionlet ensemble for action recognition with depth cameras," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 1290–1297, 2012.
[6] X. Yang and Y. L. Tian, "EigenJoints-based action recognition using Naive-Bayes-Nearest-Neighbor," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 14–19, 2012.
[7] Y. Yacoob and M. J. Black, "Parameterized modeling and recognition of activities," in Sixth International Conference on Computer Vision, pp. 120–127, 1998.
[8] F. Ofli, R. Chaudhry, G. Kurillo, R. Vidal, and R. Bajcsy, "Sequence of the Most Informative Joints (SMIJ): A new representation for human skeletal action recognition," Journal of Visual Communication and Image Representation, pp. 24–38, 2014.
[9] E. Ohn-Bar and M. Trivedi, "Joint angles similarities and HOG2 for action recognition," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 465–470, 2013.
[10] R. Vemulapalli, F. Arrate, and R. Chellappa, "Human action recognition by representing 3D skeletons as points in a Lie group," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 588–595, 2014.
[11] C. Chen, Y. Zhuang, F. Nie, Y. Yang, F. Wu, and J. Xiao, "Learning a 3D human pose distance metric from geometric pose descriptor," IEEE Transactions on Visualization and Computer Graphics, pp. 1676–1689, 2011.
[12] X. Cai, W. Zhou, L. Wu, J. Luo, and H. Li, "Effective active skeleton representation for low latency human action recognition," IEEE Transactions on Multimedia, vol. 18, no. 2, pp. 141–154, 2016.
[13] W. Zhu, C. Lan, J. Xing, W. Zeng, Y. Li, L. Shen, X. Xie, et al., "Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks," in Conference on Association for the Advancement of Artificial Intelligence, vol. 2, p. 8, 2016.
[14] L. Tao and R. Vidal, "Moving poselets: A discriminative and interpretable skeletal motion representation for action recognition," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 61–69, 2015.
[15] Z. Huang, C. Wan, T. Probst, and L. Van Gool, "Deep learning on Lie groups for skeleton-based action recognition," arXiv:1612.05877, 2016.
[16] Y. Du, W. Wang, and L. Wang, "Hierarchical recurrent neural network for skeleton based action recognition," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 1110–1118, 2015.
[17] J. Liu, A. Shahroudy, D. Xu, and G. Wang, "Spatio-temporal LSTM with trust gates for 3D human action recognition," in European Conference on Computer Vision, pp. 816–833, Springer, 2016.
[18] F. Baradel, C. Wolf, and J. Mille, "Pose-conditioned spatio-temporal attention for human action recognition," arXiv:1703.10106, 2017.
[19] L. Seidenari, V. Varano, S. Berretti, A. Bimbo, and P. Pala, "Recognizing actions from depth cameras as weakly aligned multi-part bag-of-poses," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 479–485, 2013.
[20] C. Wang, Y. Wang, and A. L. Yuille, "An approach to pose-based action recognition," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 915–922, 2013.
