Pose Spatio-Temporal based Human Action Recognition | National Taiwan University of Science and Technology Theses and Dissertations System

Graduate student:	Melkamu Sewuyie Denekew
Thesis title:	Pose Spatio-Temporal based Human Action Recognition
Advisor:	花凱龍 Kai-Lung Hua
Oral defense committee:	楊朝龍 Chao-Lung Yang, 陳怡伶 Yi-Ling Chen, 花凱龍 Kai-Lung Hua
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication:	2019
Graduation academic year:	107 (ROC calendar, i.e., 2018-2019)
Language:	English
Number of pages:	35
Keywords:	Action Recognition, Feature Descriptor, Fisher Vector, Pose Representation
相關次數：	點閱：376 下載：1
分享至:	分享至facebook 分享至twitter

查詢本校圖書館目錄查詢臺灣博碩士論文知識加值系統勘誤回報

上一筆

Recognizing human actions in video sequences has remained a challenging problem in recent years. Several action representation approaches have been proposed to improve recognition performance, but many problems remain unsolved. For example, the skeleton-sequence representations used by most previous methods lack spatial information about the joints and detailed temporal information about motion. To extract human motion information efficiently and improve the accuracy of human action recognition from video, we propose a pose spatio-temporal approach to human action recognition that uses joint-point information instead of structural information. First, we obtain the joint positions of the human body in every frame of the video. Next, we extract pose information with handcrafted features defined relative to the joint positions in the spatial dimension. We also compute how these positions change in the temporal dimension. Together, the two feature sets form our human pose spatio-temporal feature descriptors. We then encode each descriptor separately as a fixed-dimensional Fisher vector. Finally, we classify the action using a weighted fusion technique. We evaluated the method on two public datasets. The proposed algorithm achieves 97.8% accuracy on the Penn Action dataset and 77.7% accuracy on the JHMDB dataset, improving action recognition accuracy compared with previous methods.
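The abstract describes the spatial and temporal descriptors only at a high level. The following is a minimal sketch under several assumptions:

- 2D joints are stored as an array of shape (T, J, 2), with T frames and J joints.
- The root joint and the two joints that define the body scale are chosen arbitrarily. They are hypothetical indices; the real ones depend on the pose estimator's skeleton.
- The spatial feature is the set of pairwise joint offsets within a frame.
- The temporal feature is the frame-to-frame displacement of each joint.

These are illustrative stand-ins, not necessarily the handcrafted features used in the thesis. The function names are also hypothetical.

```python
import numpy as np


def normalize_pose(joints, root=0, scale_pair=(0, 1), eps=1e-8):
    """Center each frame on a root joint and divide by a per-frame body scale.

    joints: array of shape (T, J, 2) holding 2D joint coordinates per frame.
    root, scale_pair: hypothetical joint indices; the right choice depends on
    the skeleton layout produced by the pose estimator.
    """
    joints = np.asarray(joints, dtype=float)
    centered = joints - joints[:, root:root + 1, :]                    # (T, J, 2)
    a, b = scale_pair
    scale = np.linalg.norm(joints[:, a, :] - joints[:, b, :], axis=1)  # (T,)
    return centered / (scale[:, None, None] + eps)


def spatial_features(joints):
    """Per-frame pairwise joint offsets p_j - p_i for every pair i < j."""
    num_frames, num_joints, _ = joints.shape
    i, j = np.triu_indices(num_joints, k=1)
    offsets = joints[:, j, :] - joints[:, i, :]                        # (T, J*(J-1)/2, 2)
    return offsets.reshape(num_frames, -1)


def temporal_features(joints, step=1):
    """Joint displacements between frame t and frame t + step (step >= 1)."""
    displacement = joints[step:] - joints[:-step]                      # (T - step, J, 2)
    return displacement.reshape(displacement.shape[0], -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    poses = rng.normal(size=(10, 15, 2))    # toy data: 10 frames, 15 joints
    norm = normalize_pose(poses)
    print(spatial_features(norm).shape)     # (10, 210)
    print(temporal_features(norm).shape)    # (9, 30)
```

Offsets between joints do not depend on where the person stands in the frame. Dividing by a body scale further reduces sensitivity to the person's distance from the camera.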

Table of Contents
Abstract
List of Tables
List of Figures
Chapter One: Introduction
1. Background
2. Related Work
3. Contribution
4. Approach
Chapter Two: Pose Spatio-Temporal based Human Action Recognition
1. Pose Estimation
2. Extract Time and Space Features
2.1. Normalize Human Body Coordinates
2.2. Extract Spatial-Temporal Features
3. Feature Coding
4. Action Recognition
4.1. Weighted Fusion
4.2. Feature Classification
Chapter Three: Experiments and Discussion
1. Datasets
2. Experimental Results
2.1. JHMDB Activity Dataset Experimental Results
2.2. Evaluate Different Features
2.3. Feature Code Representations
2.4. Evaluate Different Limb Parts of the Human Body
2.5. Evaluate and Compare with Other Methods
3. Penn Action Dataset Experimental Results
Chapter Four: Conclusion and Future Work
1. Conclusion
2. Future Work
References
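Sections 3 and 4.1 of Chapter Two cover feature coding and weighted fusion. The abstract states that each descriptor is encoded separately as a fixed-dimensional Fisher vector before the fused classification. The sketch below rests on these assumptions:

- The encoding is the standard improved Fisher vector (gradients with respect to the means and standard deviations, then power and L2 normalization).
- It is computed against an already-trained diagonal-covariance GMM.
- Fusion is a weighted sum of per-descriptor class scores.

None of these details is confirmed by this record. The function names, the toy GMM and the fusion weights are hypothetical, and the classifier that produces the scores is not shown.

```python
import numpy as np


def fisher_vector(X, weights, means, variances):
    """Improved Fisher vector of local descriptors X under a diagonal GMM.

    X: (T, D) descriptors from one video; weights: (K,); means and
    variances: (K, D). Output length is 2*K*D regardless of T.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    T = X.shape[0]

    # Standardized differences (x_t - mu_k) / sigma_k, shape (T, K, D).
    diff = (X[:, None, :] - means[None, :, :]) / np.sqrt(variances)[None, :, :]

    # Log posterior (up to a constant) of each component per descriptor, shape (T, K).
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_post = np.log(weights)[None, :] - 0.5 * (np.sum(diff ** 2, axis=2) + log_norm[None, :])
    log_post -= log_post.max(axis=1, keepdims=True)   # avoid overflow in exp
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)         # soft assignments

    # Gradients with respect to the means and standard deviations.
    g_mu = np.einsum("tk,tkd->kd", gamma, diff) / (T * np.sqrt(weights)[:, None])
    g_sigma = np.einsum("tk,tkd->kd", gamma, diff ** 2 - 1.0) / (T * np.sqrt(2.0 * weights)[:, None])

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))            # power normalization
    length = np.linalg.norm(fv)
    return fv / length if length > 0 else fv          # L2 normalization


def weighted_fusion(score_list, fusion_weights):
    """Weighted sum of per-descriptor class scores; returns (label, fused scores)."""
    scores = np.stack([np.asarray(s, dtype=float) for s in score_list])  # (M, C)
    w = np.asarray(fusion_weights, dtype=float)
    fused = (w / w.sum()) @ scores                                       # (C,)
    return int(np.argmax(fused)), fused


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, D = 4, 6                                   # toy GMM size and descriptor dimension
    gmm_weights = np.full(K, 1.0 / K)
    gmm_means = rng.normal(size=(K, D))
    gmm_vars = np.ones((K, D))
    fv = fisher_vector(rng.normal(size=(50, D)), gmm_weights, gmm_means, gmm_vars)
    print(fv.shape)                               # (48,)
    label, fused = weighted_fusion([[0.2, 0.8], [0.9, 0.1]], [0.3, 0.7])
    print(label)                                  # 0
```

The Fisher vector length depends only on the number of GMM components K and the descriptor dimension D. As a result, videos with different numbers of frames map to vectors of the same size, which gives the classifier a fixed-length input.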


                                
