
Author: Hsin-Yi Lin (林欣毅)
Thesis title: The Study of Action Recognition in RGB-D Videos (運用於RGB-D影片的動作辨識之研究)
Advisor: Yie-Tarng Chen (陳郁堂)
Committee members: Wen-Hsien Fang (方文賢), Ming-Bo Lin (林銘波), Hsing-Lung Chen (陳省隆), Chen-Mie Wu (吳乾彌)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2016
Graduation academic year: 104
Language: English
Number of pages: 40
Chinese keywords: action recognition, fall detection, deep learning
Foreign-language keywords: RGB-D
Access count: 241 views / 2 downloads

In this thesis, we study the effect of sparse representation on RGB-D action recognition, in particular on human fall detection. Sparse coding has shown impressive results for action recognition in RGB videos; however, its performance on RGB-D action recognition has not been thoroughly investigated. We propose a method that intelligently combines the advantages of both depth and skeleton information, and further applies a two-level sparse-coding feature-learning scheme to the skeleton data. First, we extract features from the depth maps and the skeleton, called DMM-HOG and the Moving Pose Descriptor, respectively. Next, we use pyramid temporal pooling to convert the Moving Pose Descriptor into a meaningful vector representation, and then encode the two features separately with sparse coding. Finally, we combine the depth and skeleton classification results with logistic regression. To evaluate the performance of this method, we test it on two public fall-detection datasets, where it achieves the highest accuracy among the compared methods. We also evaluate it on a public RGB-D action recognition dataset, MSR Action3D, where it obtains results competitive with other strong methods.
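As an illustration of the pyramid temporal pooling step described above (converting a variable-length sequence of per-frame Moving Pose descriptors into one fixed-length vector), a minimal sketch in Python follows. The function name, the use of max-pooling within each segment, and the default of three pyramid levels are assumptions for illustration, not the thesis implementation.

```python
import numpy as np

def pyramid_temporal_pooling(frames, levels=3):
    """Pool a (T, D) sequence of per-frame descriptors into one
    fixed-length vector.  At pyramid level l the sequence is split
    into 2**l equal temporal segments and each segment is max-pooled,
    so a clip of any length T maps to a vector of length
    D * (2**levels - 1)."""
    frames = np.asarray(frames, dtype=float)
    T, D = frames.shape
    pooled = []
    for level in range(levels):
        n_seg = 2 ** level
        bounds = np.linspace(0, T, n_seg + 1).astype(int)
        for s in range(n_seg):
            # Guarantee each segment contains at least one frame.
            seg = frames[bounds[s]:max(bounds[s + 1], bounds[s] + 1)]
            pooled.append(seg.max(axis=0))
    return np.concatenate(pooled)
```

With `levels=3` the pyramid has 1 + 2 + 4 = 7 segments, so a clip of D-dimensional frames becomes a 7*D vector regardless of its duration.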


In this thesis, we investigate the effect of sparse representation on RGB-D action recognition, with a particular focus on human fall detection. Sparse coding has shown impressive results for action recognition in RGB videos; however, its performance on RGB-D action recognition has not been thoroughly investigated. In this research, we propose an approach that intelligently combines the advantages of both depth and skeleton information. In addition, a two-level feature-learning scheme based on sparse coding is introduced for the skeleton data. First, we extract features, called DMM-HOG and the Moving Pose Descriptor, from the depth maps and the skeleton, respectively. Next, we use pyramid temporal pooling to convert the Moving Pose Descriptor into a compact vector. Then we apply sparse coding to encode both descriptors. Finally, we combine the depth and skeleton classification results using logistic regression.
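The sparse-coding step above encodes each pooled descriptor against a dictionary. A minimal sketch of one standard way to do this, ISTA (iterative soft-thresholding) on the lasso objective, is shown below; the solver choice, parameter values, and use of a fixed dictionary are illustrative assumptions (the thesis learns class-specific dictionaries).

```python
import numpy as np

def sparse_encode(x, D, lam=0.1, steps=200):
    """Encode descriptor x as a sparse vector a by approximately solving
        min_a 0.5 * ||x - D @ a||^2 + lam * ||a||_1
    with ISTA (iterative soft-thresholding)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the quadratic term
    a = np.zeros(D.shape[1])
    for _ in range(steps):
        g = D.T @ (D @ a - x)              # gradient of 0.5*||x - D a||^2
        a = a - g / L                      # gradient step
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft-threshold
    return a
```

Small coefficients are driven exactly to zero by the soft-threshold, which is what makes the resulting code sparse.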
To evaluate the performance of the proposed approach, we test it on two publicly available fall-detection datasets, where it achieves the highest accuracy in comparison with other methods. We also evaluate it on a publicly available RGB-D action recognition dataset, MSR Action3D, where it achieves results competitive with state-of-the-art approaches.
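The final fusion step in the pipeline above, combining per-video depth and skeleton classifier scores with logistic regression, could be sketched as follows. The toy scores and the plain gradient-descent trainer are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_fusion(depth_scores, skel_scores, labels, lr=0.5, epochs=2000):
    """Learn weights w and bias b so that
    sigmoid(w[0]*depth + w[1]*skeleton + b) predicts the label,
    using batch gradient descent on the logistic loss."""
    X = np.column_stack([depth_scores, skel_scores])
    y = np.asarray(labels, dtype=float)
    w, b = np.zeros(2), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)   # gradient w.r.t. weights
        b -= lr * np.mean(p - y)             # gradient w.r.t. bias
    return w, b

def predict_fusion(w, b, depth_scores, skel_scores):
    X = np.column_stack([depth_scores, skel_scores])
    return (sigmoid(X @ w + b) >= 0.5).astype(int)
```

Because the fusion weights are learned, the model can down-weight whichever modality is less reliable on the training data instead of averaging the two scores blindly.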

Chinese Abstract ... iii
Abstract ... iv
Acknowledgment ... v
Table of Contents ... vi
List of Tables ... viii
List of Figures ... ix
1 Introduction ... 1
2 Related Work ... 4
2.1 Action Recognition in RGB Video ... 4
2.2 Depth Map and Skeleton Data Descriptors ... 4
2.3 Video Feature Descriptors ... 5
2.4 Preprocessing ... 6
2.5 Hierarchical Architectures for Action Recognition ... 6
3 Deep Sparse Coding for Skeleton Data ... 7
3.1 System Diagram ... 7
3.2 Human Body and Action Representation ... 8
3.3 DMM-HOG Descriptor ... 9
3.4 Moving Pose Descriptor ... 11
3.5 Temporal Pooling ... 13
3.6 Sparse Coding ... 15
3.6.1 Class-Specific Dictionary Learning ... 16
3.7 Logistic Regression ... 18
4 Experimental Results ... 19
4.1 Datasets ... 19
4.2 Experiment Setup and Evaluation Protocol ... 21
4.2.1 Performance Metrics ... 21
4.3 Evaluation of the Proposed Approach ... 22
4.3.1 Sparse Coding with Different Features ... 22
4.3.2 Sparse Coding with Different Architectures ... 23
4.3.3 Sparse Coding with Normalization ... 24
4.3.4 Sparse Coding with Different Pooling Strategies ... 24
4.3.5 Comparison with Previous Works ... 27
4.3.6 Computational Complexity ... 30
4.3.7 Comparison with Different Datasets ... 30
4.4 Experimental Summary ... 32
5 Conclusion ... 36
References ... 37

