
Graduate Student: Chi-Chieh Hsu (徐麒杰)
Thesis Title: The Study of Enhanced Joint Trajectory Maps for Action Recognition (基於增強骨架軌跡圖進行人體骨架動作辨識之研究)
Advisor: Yie-Tarng Chen (陳郁堂)
Committee Members: Yie-Tarng Chen (陳郁堂), Jenq-Shiou Leu (呂政修), Ming-Bo Lin (林銘波), Hsing-Lung Chen (陳省隆), Chen-Mie Wu (吳乾彌)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of Publication: 2018
Graduation Academic Year: 106
Language: English
Number of Pages: 45
Chinese Keywords: deep learning, visualization, action recognition, human skeleton
Foreign Keywords: action recognition, CNN, RGB-D, skeleton
Hits: 290; Downloads: 1

Skeleton-based human action recognition has broad significance for applications such as human-computer interaction and intelligent surveillance. Although neural networks have achieved excellent results in image recognition and object detection in recent years, effectively applying convolutional neural networks to video recognition remains a formidable challenge. Moreover, how to effectively represent temporal and spatial skeleton sequences is still a difficult problem. To address these issues, this study proposes a compact, effective yet simple encoding method that visualizes spatio-temporal information. We first project each 3D skeleton sequence onto planar coordinate axes, obtaining multiple 2D images, and then color the projected trajectory maps according to properties of the human skeleton and its physical motion. Through early feature fusion, an end-to-end neural network model extracts features from the colored images; these features are highly discriminative and therefore well suited to classifying and recognizing the action. To demonstrate the superiority of our method, we evaluate it on two datasets widely used for skeleton-based action recognition, the UTD-MHAD action dataset and the G3D gaming dataset; the proposed method outperforms other state-of-the-art approaches in both the number of training samples required and recognition accuracy.
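The projection step described above can be illustrated with a minimal NumPy sketch. This is not the thesis's actual code: the array shapes, the choice of motion magnitude as the coloring signal, and the normalization are all illustrative assumptions.

```python
import numpy as np

def project_skeleton(seq):
    """Project a 3D skeleton sequence onto the three orthogonal planes.

    seq: (T, J, 3) array -- T frames, J joints, (x, y, z) coordinates.
    Returns the front (x-y), top (x-z), and side (y-z) 2D views,
    each of shape (T, J, 2).
    """
    return seq[:, :, [0, 1]], seq[:, :, [0, 2]], seq[:, :, [1, 2]]

def motion_magnitude(seq):
    """Per-joint motion magnitude between consecutive frames,
    normalized to [0, 1] so it can drive trajectory coloring."""
    mag = np.linalg.norm(np.diff(seq, axis=0), axis=2)  # (T-1, J)
    return mag / (mag.max() + 1e-8)

# Toy sequence: 4 frames of a 2-joint "skeleton".
seq = np.arange(4 * 2 * 3, dtype=float).reshape(4, 2, 3)
front, top, side = project_skeleton(seq)
colors = motion_magnitude(seq)
```

In practice each 2D view would be rasterized into an image, with `colors` mapped through a color palette to draw the trajectories.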


Human action recognition based on skeletons has wide applications in human-computer interaction and intelligent surveillance. However, effectively using convolutional neural networks for video-based recognition remains challenging, and effectively representing spatio-temporal skeleton sequences is still an open problem. To address these issues, this work presents a compact, effective yet simple method that encodes the spatio-temporal information carried in 3D skeleton sequences into multiple 2D images. With early fusion, an end-to-end CNN model is adapted to extract robust, discriminative features from the color images. Experimental results show that the proposed method outperforms state-of-the-art approaches, including the joint trajectory map method, on the UTD human action dataset (UTD-MHAD) and the G3D gaming dataset.
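The early-fusion step mentioned above can be sketched as channel-wise concatenation of the colored maps before the CNN sees them. This is an assumption about how the fusion is wired, not code from the thesis; the three-view setup and the 32x32 image size are purely illustrative.

```python
import numpy as np

def early_fusion(maps):
    """Concatenate several H x W x 3 colored trajectory maps along the
    channel axis, so a single CNN stream sees every view at once."""
    return np.concatenate(maps, axis=2)

# Three hypothetical 32 x 32 RGB maps (e.g. front, top, and side views).
views = [np.full((32, 32, 3), float(i)) for i in range(3)]
fused = early_fusion(views)
print(fused.shape)  # (32, 32, 9)
```

The fused tensor then feeds the first convolutional layer, whose input channel count must match the total number of stacked channels (9 here instead of 3).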

摘要 (Chinese Abstract)
Abstract
Acknowledgment
Table of Contents
List of Tables
List of Figures
1 Introduction
2 Related Work
3 Joint Trajectory Maps
4 Proposed Encoding Method
  4.1 Joint Trajectory Maps
    4.1.1 Encoding Joint Motion Direction
    4.1.2 Encoding Joint Body Parts
    4.1.3 Encoding Joint Motion Magnitude
    4.1.4 Encoding Trajectory Motion Magnitude
    4.1.5 Visual Enhancement
  4.2 Skeleton and Joint Momentum Maps
  4.3 Geometric Skeleton Vector Map
5 End-to-End Multi-Stream CNN
  5.1 End-to-End CNN Structure
  5.2 Training Steps
  5.3 Training Environment
6 Experimental Results
  6.1 Evaluation of Different Encoding Schemes
  6.2 G3D Dataset
  6.3 UTD-MHAD
  6.4 MSRC-12 Kinect Gesture Dataset
7 Conclusion
References

[1] J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, M. Finocchio, A. Blake, M. Cook, and R. Moore, "Real-time human pose recognition in parts from single depth images," in Communications of the ACM, pp. 116-124, 2013.
[2] R. Vemulapalli and R. Chellappa, "Rolling rotations for recognizing human actions from 3d skeletal data," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4471-4479, 2016.
[3] R. Vemulapalli, F. Arrate, and R. Chellappa, "Human action recognition by representing 3d skeletons as points in a lie group," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 588-595, 2014.
[4] M. E. Hussein, M. Torki, M. A. Gowayyed, and M. El-Saban, "Human action recognition using a temporal hierarchy of covariance descriptors on 3d joint locations," in International Joint Conference on Artificial Intelligence, pp. 2466-2472, 2013.
[5] M. Zanfir, M. Leordeanu, and C. Sminchisescu, "The moving pose: An efficient 3d kinematics descriptor for low-latency action recognition and detection," in Proceedings of the IEEE International Conference on Computer Vision, pp. 2752-2759, 2013.
[6] W. Li, Z. Zhang, and Z. Liu, "Action recognition based on a bag of 3d points," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 9-14, 2010.
[7] C. Chen, Y. Zhuang, F. Nie, Y. Yang, F. Wu, and J. Xiao, "Learning a 3D human pose distance metric from geometric pose descriptor," in IEEE Transactions on Visualization and Computer Graphics, pp. 1676-1689, IEEE, 2011.
[8] Y. Zhu, W. Chen, and G. Guo, "Fusing spatiotemporal features and joints for 3d action recognition," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 486-491, 2013.
[9] L. Xia, C.-C. Chen, and J. Aggarwal, "View invariant human action recognition using histograms of 3d joints," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 20-27, 2012.
[10] S. Nie and Q. Ji, "Capturing global and local dynamics for human action recognition," in Pattern Recognition (ICPR), 2014 22nd International Conference on, pp. 1946-1951, IEEE, 2014.
[11] S. Nie, Z. Wang, and Q. Ji, "A generative restricted boltzmann machine based method for high-dimensional motion data modeling," in Computer Vision and Image Understanding, pp. 14-22, Elsevier, 2015.
[12] J. Cavazza, P. Morerio, and V. Murino, "When kernel methods meet feature learning: Log-covariance network for action recognition from skeletal data," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, pp. 21-26, 2017.
[13] P. Wang, Z. Li, Y. Hou, and W. Li, "Action recognition based on joint trajectory maps using convolutional neural networks," in Proceedings of the 2016 ACM on Multimedia Conference, pp. 102-106, ACM, 2016.
[14] M. Liu, H. Liu, and C. Chen, "Enhanced skeleton visualization for view-invariant human action recognition," in Pattern Recognition, pp. 346-362, Elsevier, 2017.
[15] X. Yang and Y. L. Tian, "Eigenjoints-based action recognition using naive-bayes-nearest-neighbor," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 14-19, 2012.
[16] J. Wang, Z. Liu, Y. Wu, and J. Yuan, "Mining actionlet ensemble for action recognition with depth cameras," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 1290-1297, 2012.
[17] X. Cai, W. Zhou, L. Wu, J. Luo, and H. Li, "Effective active skeleton representation for low latency human action recognition," in IEEE Transactions on Multimedia, pp. 141-154, 2016.
[18] W. Zhu, C. Lan, J. Xing, W. Zeng, Y. Li, L. Shen, X. Xie, et al., "Co-occurrence feature learning for skeleton based action recognition using regularized deep lstm networks," in Conference on Association for the Advancement of Artificial Intelligence, p. 8, 2016.
[19] Y. Du, Y. Fu, and L. Wang, "Skeleton based action recognition with convolutional neural network," in Pattern Recognition (ACPR), 2015 3rd IAPR Asian Conference on, pp. 579-583, IEEE, 2015.
[20] P. Wang, W. Li, S. Liu, Z. Gao, C. Tang, and P. Ogunbona, "Large-scale isolated gesture recognition using convolutional neural networks," in Pattern Recognition (ICPR), 2016 23rd International Conference on, pp. 7-12, IEEE, 2016.
[21] F. Ofli, R. Chaudhry, G. Kurillo, R. Vidal, and R. Bajcsy, "Sequence of the most informative joints (SMIJ): A new representation for human skeletal action recognition," in Journal of Visual Communication and Image Representation, pp. 24-38, 2014.
[22] L. Tao and R. Vidal, "Moving poselets: A discriminative and interpretable skeletal motion representation for action recognition," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 61-69, 2015.
[23] Y. Du, W. Wang, and L. Wang, "Hierarchical recurrent neural network for skeleton based action recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1110-1118, 2015.
[24] J. Liu, A. Shahroudy, D. Xu, and G. Wang, "Spatio-temporal lstm with trust gates for 3d human action recognition," in European Conference on Computer Vision, pp. 816-833, 2016.
[25] F. Baradel, C. Wolf, and J. Mille, "Pose-conditioned spatio-temporal attention for human action recognition," in arXiv preprint arXiv:1703.10106, 2017.
[26] J. Liu and A. Mian, "Learning human pose models from synthesized data for robust rgb-d action recognition," in arXiv preprint arXiv:1707.00823, 2017.
[27] J. Serra, Image Analysis and Mathematical Morphology. Academic Press, 1982.
[28] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675-678, ACM, 2014.
[29] V. Bloom, D. Makris, and V. Argyriou, "G3d: A gaming action dataset and real time action recognition evaluation framework," in Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, pp. 7-12, IEEE, 2012.
[30] C. Chen, R. Jafari, and N. Kehtarnavaz, "Utd-mhad: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor," in Image Processing (ICIP), 2015 IEEE International Conference on, pp. 168-172, IEEE, 2015.
[31] E. Escobedo and G. Camara, "A new approach for dynamic gesture recognition using skeleton trajectory representation and histograms of cumulative magnitudes," in Graphics, Patterns and Images (SIBGRAPI), 2016 29th SIBGRAPI Conference on, pp. 209-216, IEEE, 2016.
[32] J. Imran and P. Kumar, "Human action recognition using rgb-d sensor and deep convolutional neural networks," in Advances in Computing, Communications and Informatics (ICACCI), 2016 International Conference on, pp. 144-148, IEEE, 2016.
[33] N. E. D. Elmadany, Y. He, and L. Guan, "Human gesture recognition via bag of angles for 3d virtual city planning in cave environment," in Multimedia Signal Processing (MMSP), 2016 IEEE 18th International Workshop on, pp. 1-5, IEEE, 2016.
[34] S. Fothergill, H. Mentis, P. Kohli, and S. Nowozin, "Instructing people for training gestural interactive systems," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1737-1746, 2012.
[35] S. Yang, C. Yuan, W. Hu, and X. Ding, "A hierarchical model based on latent dirichlet allocation for action recognition," in Pattern Recognition (ICPR), 2014 22nd International Conference on, pp. 2613-2618, IEEE, 2014.
[36] L. Zhou, W. Li, Y. Zhang, P. Ogunbona, D. T. Nguyen, and H. Zhang, "Discriminative key pose extraction using extended lc-ksvd for action recognition," in Digital Image Computing: Techniques and Applications (DICTA), 2014 International Conference on, pp. 1-8, IEEE, 2014.