
Author: Wei-You Lin (林威佑)
Title: Multi-stream Networks with Global Attention for Action Recognition in RGB Videos (利用RGB信息作動作識別之嵌入全域注意力的多流網路技術)
Advisor: Sheng-Luen Chung (鍾聖倫)
Committee: Shun-Feng Su (蘇順豐), Sheng-Luen Chung (鍾聖倫), Ching-Hu Lu (陸敬互), Ji-Sheng Xu (徐繼聖), Guo-Sheng Huang (黃國勝)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2018
Graduation academic year: 106 (2017-2018)
Language: Chinese
Number of pages: 62
Keywords: multi-stream network, global spatial attention (GSA) model, accumulative learning curve (ALC) model, action recognition, decision fusion
Hits: 239; Downloads: 1
    RGB-D information captured by depth cameras such as the Kinect, together with the human skeleton joints derived from it, has greatly advanced action recognition. However, structural constraints prevent the Kinect from operating outdoors, so methods built on it are difficult to apply in real scenes. Aiming to replace the Kinect, this thesis constructs a multi-stream architecture for action recognition based on RGB video and the 2D human joints extracted from it. Specifically, on one hand, the RGB frames are fed directly into a C3D network to capture appearance features that reflect human motion; on the other hand, a 2D human pose estimation method extracts the 2D joints from the RGB sequence, which are then fed into an LSTM architecture to capture the dynamics of the action; the two kinds of information are finally fused. Because only a small fraction of the data in an action sequence is truly informative for training, this thesis introduces a global attention model over the spatial characteristics of an action and an accumulative learning curve (ALC) model over its temporal characteristics, raising the importance of key data, suppressing the interference of ordinary data, and strengthening the role of informative data in the trained model. To further improve multi-stream fusion, two kinds of base features are introduced, static features reflecting body configuration and dynamic features reflecting body motion, yielding a multi-stream fusion architecture whose streams differ substantially in character. The proposed method is validated on the largest RGB-D action dataset to date, the NTU dataset; on the small MSRDaily Activity3D dataset; and on the Multiview Action3D dataset, which exhibits large viewpoint variation. Experimental results show that the proposed method clearly outperforms representative existing methods.
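
    To make the pipeline concrete, the following minimal sketch (Python with PyTorch, an assumed toolkit, since the record does not name one) illustrates the two-stream design described above: a C3D-style 3D-convolution stream over RGB clips, an LSTM stream over per-frame 2D joints, and decision fusion of the two class-score vectors. All module names, layer sizes, and the fusion weight are illustrative assumptions, not the thesis implementation.

    import torch
    import torch.nn as nn

    NUM_CLASSES = 60  # e.g., NTU RGB+D defines 60 action classes

    class RGBStream(nn.Module):
        """C3D-style stream: stacked 3D convolutions over an RGB clip."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool3d((1, 2, 2)),
                nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1),
            )
            self.classifier = nn.Linear(128, NUM_CLASSES)

        def forward(self, clip):                  # clip: (B, 3, T, H, W)
            return self.classifier(self.features(clip).flatten(1))

    class JointStream(nn.Module):
        """LSTM stream over flattened per-frame 2D joint coordinates."""
        def __init__(self, num_joints=18):        # 18 joints is an assumption
            super().__init__()
            self.lstm = nn.LSTM(num_joints * 2, 128, batch_first=True)
            self.classifier = nn.Linear(128, NUM_CLASSES)

        def forward(self, joints):                # joints: (B, T, J*2)
            out, _ = self.lstm(joints)
            return self.classifier(out[:, -1])    # score from last time step

    def fuse(rgb_logits, joint_logits, w=0.5):
        """Decision fusion: weighted average of per-stream softmax scores."""
        return w * rgb_logits.softmax(-1) + (1 - w) * joint_logits.softmax(-1)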


    Significant progress has been made in action recognition based on the derived joint information provided by RGB-D cameras such as the Kinect. However, due to sensing constraints, current RGB-D camera technology degrades outdoors, which limits its applications. To replace the more sophisticated RGB-D cameras, this thesis presents a series of RGB-based solutions to action recognition: learning from RGB-derived 2D joint data with an LSTM, learning from down-sampled RGB video with C3D networks, and fusing multi-stream information, all extracted from RGB videos. To account for the uneven contributions of individual joints and of particular frames to the classification of each action, this thesis proposes a global learning (GL) deep learning structure. The proposed GL structure fuses two LSTM-based sub-networks, a global spatial attention (GSA) model and an accumulative learning curve (ALC) model, which extract the spatial and temporal features critical to action recognition, respectively. In particular, the GSA model learns the importance, and thus the weighting, of the individual contribution of each 2D joint, whereas the ALC model learns the weighting of the accumulated learning results over the course of the RGB video. With the most informative 2D joints and frames thus highlighted, the proposed structure outperforms a state-of-the-art attention-based deep learning structure for action recognition. In addition, we examine a C3D-based solution that takes down-sampled RGB images directly, without 2D joint pre-processing; this modified C3D network is then fused with the 2D-joint-based global learning network. On three representative action recognition datasets, namely the comprehensive NTU dataset, the smaller MSRDaily Activity3D, and Multiview Action3D with its wide range of viewing angles, the proposed multi-stream networks outperform all recognition results reported in the literature.
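
    The GSA and ALC models named above amount to learned weightings over, respectively, the joints within each frame and the per-step outputs over time. The sketch below (Python with PyTorch, continuing the assumptions of the previous sketch) shows one plausible realization; the scoring networks, their sizes, and the softmax/ReLU normalizations are assumptions, not the thesis's exact formulation.

    import torch
    import torch.nn as nn

    class GSAttention(nn.Module):
        """Global spatial attention: one softmax weight per 2D joint."""
        def __init__(self, num_joints=18, hidden=64):
            super().__init__()
            self.score = nn.Sequential(
                nn.Linear(num_joints * 2, hidden), nn.Tanh(),
                nn.Linear(hidden, num_joints),
            )

        def forward(self, joints):                 # joints: (B, T, J, 2)
            B, T, J, _ = joints.shape
            alpha = self.score(joints.reshape(B, T, J * 2)).softmax(-1)
            return joints * alpha.unsqueeze(-1)    # re-weighted joints

    class ALCWeighting(nn.Module):
        """Accumulative learning curve: weight each time step's LSTM output."""
        def __init__(self, hidden=128):
            super().__init__()
            self.score = nn.Linear(hidden, 1)

        def forward(self, lstm_out):               # lstm_out: (B, T, hidden)
            beta = torch.relu(self.score(lstm_out))       # (B, T, 1), >= 0
            return (beta * lstm_out).sum(1) / beta.sum(1).clamp_min(1e-6)

    Feeding GSA-weighted joints into the LSTM and pooling its outputs with the ALC weights yields a single sequence descriptor, which a linear classifier then maps to action scores.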

    Abstract (Chinese); Abstract (English); Acknowledgments; List of Figures; List of Tables
    Chapter I. Introduction: 1.1 The action recognition problem; 1.2 Deep learning approaches to action recognition; 1.3 Contributions and organization of this thesis
    Chapter II. Literature review: 2.1 Action recognition based on RGB and CNNs; 2.2 Action recognition based on skeleton joints and LSTMs; 2.3 Multi-stream fusion for action recognition; 2.4 Attention models in action recognition; 2.5 RGB-based 2D human pose estimation
    Chapter III. Spatio-temporal attention models with LSTM: 3.1 RGB-based 2D human pose estimation and refinement; 3.2 LSTM architecture with a spatio-temporal attention model; 3.3 Fusion of two base-feature streams
    Chapter IV. Aggregating RGB information with CNNs: 4.1 The C3D architecture and its application; 4.2 Action recognition with a modified C3D (D-C3D); 4.3 Fusion of two C3D streams; 4.4 Multi-stream fusion of 2D joints and RGB
    Chapter V. Experimental results: 5.1 Evaluation protocol on the NTU dataset; 5.2 Recognition with the 2D-joint spatio-temporal model; 5.3 Recognition with static and dynamic RGB base features; 5.4 Recognition with the integrated system; 5.5 Comparison with the state of the art; 5.6 Recognition on other datasets
    Chapter VI. Conclusions: 6.1 Conclusions; 6.2 Future work
    Appendix. References; Replies to the oral defense questions of January 19, 2018

    [1] J. K. Aggarwal and M. S. Ryoo, "Human activity analysis: A review," ACM Computing Surveys, vol. 43, no. 3, pp. 1-43, 2011.
    [2] K. Yun, J. Honorio, D. Chattopadhyay, T. L. Berg, and D. Samaras, "Two-person interaction detection using body-pose features and multiple instance learning," in Proc. 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, United States, 2012, pp. 28-35.
    [3] C. Zhang, Y. Tian, and E. Capezuti, "Privacy preserving automatic fall detection for elderly using RGBD cameras," in Proc. 13th International Conference on Computers Helping People with Special Needs, Linz, Austria, 2012, vol. 7382 LNCS, pp. 625-633.
    [4] J. Han, L. Shao, D. Xu, and J. Shotton, "Enhanced computer vision with Microsoft Kinect sensor: A review," IEEE Transactions on Cybernetics, vol. 43, no. 5, pp. 1318-1334, 2013.
    [5] S. Song, C. Lan, J. Xing, W. Zeng, and J. Liu, "An end-to-end spatio-temporal attention model for human action recognition from skeleton data," in Proc. 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, United States, 2017, pp. 4263-4270.
    [6] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and F.-F. Li, "Large-scale video classification with convolutional neural networks," in Proc. 27th IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, United States, 2014, pp. 1725-1732.
    [7] K. Simonyan and A. Zisserman, "Two-stream convolutional networks for action recognition in videos," in Proc. 28th Annual Conference on Neural Information Processing Systems, Montreal, QC, Canada, 2014, vol. 1, pp. 568-576.
    [8] S. Ji, W. Xu, M. Yang, and K. Yu, "3D Convolutional neural networks for human action recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, 2013.
    [9] X. Peng and C. Schmid, "Encoding feature maps of CNNs for action recognition," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, THUMOS Challenge 2015, Boston, MA, United States, 2015, pp. 4321-4323.
    [10] B. Zhang, L. Wang, Z. Wang, Y. Qiao, and H. Wang, "Real-time action recognition with enhanced motion vector CNNs," in Proc. 29th IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, United States, 2016, pp. 2718-2726.
    [11] V. Kantorov and I. Laptev, "Efficient feature extraction, encoding, and classification for action recognition," in Proc. 27th IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, United States, 2014, pp. 2593-2600.
    [12] Y. Wang, J. Song, L. Wang, L. Van Gool, and O. Hilliges, "Two-Stream SR-CNNs for Action Recognition in Videos," in Proc. British Machine Vision Conference (BMVC), York, UK, 2016, pp. 108.1-108.12.
    [13] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, "Learning spatiotemporal features with 3D convolutional networks," in Proc. 15th IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 2015, pp. 4489-4497.
    [14] L. Chen, H. Wei, and J. Ferryman, "A survey of human motion analysis using depth imagery," Pattern Recognition Letters, vol. 34, no. 15, pp. 1995-2006, 2013.
    [15] W. Zhu et al., "Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks," in Proc. 30th AAAI Conference on Artificial Intelligence, Phoenix, AZ, United States, 2016, pp. 3697-3703.
    [16] J. Liu, A. Shahroudy, D. Xu, and G. Wang, "Spatio-temporal LSTM with trust gates for 3D human action recognition," in Proc. European Conference on Computer Vision, Amsterdam, Netherlands, 2016, pp. 816-833.
    [17] H. Wang and L. Wang, "Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks," in Proc. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, 2017, pp. 3633-3642.
    [18] W. Li, L. Wen, M. C. Chang, S. N. Lim, and S. Lyu, "Adaptive RNN Tree for Large-Scale Human Action Recognition," in Proc. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017, pp. 1453-1461.
    [19] A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, "NTU RGB+D: A large scale dataset for 3D human activity analysis," in Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, United States, 2016, pp. 1010-1019.
    [20] S. Zhang, X. Liu, and J. Xiao, "On geometric features for skeleton-based action recognition using multilayer LSTM networks," in Proc. 17th IEEE Winter Conference on Applications of Computer Vision, Santa Rosa, CA, United States, 2017, pp. 148-157.
    [21] Z. Shi and T. K. Kim, "Learning and Refining of Privileged Information-Based RNNs for Action Recognition from Depth Sequences," in Proc. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, 2017, pp. 4684-4693.
    [22] A. Shahroudy, T. T. Ng, Y. Gong, and G. Wang, "Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP, no. 99, pp. 1-1, 2017.
    [23] P. Wang, W. Li, Z. Gao, Y. Zhang, C. Tang, and P. Ogunbona, "Scene Flow to Action Map: A New Representation for RGB-D Based Action Recognition with Convolutional Neural Networks," in Proc. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, 2017, pp. 416-425.
    [24] Y. Li et al., "Large-scale gesture recognition with a fusion of RGB-D data based on the C3D model," in Proc. 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 2016, pp. 25-30.
    [25] G. Zhu, L. Zhang, P. Shen, and J. Song, "Multimodal Gesture Recognition Using 3-D Convolution and Convolutional LSTM," IEEE Access, vol. 5, pp. 4517-4524, 2017.
    [26] Q. Li, Z. Qiu, T. Yao, T. Mei, Y. Rui, and J. Luo, "Action recognition by learning deep multi-granular spatio-temporal video representation," in Proc. 6th ACM International Conference on Multimedia Retrieval, New York, NY, United States, 2016, pp. 159-166.
    [27] J. H. Maunsell, "Neuronal mechanisms of visual attention," Annual Review of Vision Science, vol. 1, pp. 373-391, 2015.
    [28] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254-1259, 1998.
    [29] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in Proc. International Conference on Learning Representations (ICLR), San Diego, CA, United States, 2015, pp. 214-219.
    [30] J. Ba, V. Mnih, and K. Kavukcuoglu, "Multiple object recognition with visual attention," in Proc. International Conference on Learning Representations (ICLR), San Diego, CA, United States, 2015, pp. 112-118.
    [31] J. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, "Attention-based models for speech recognition," in Proc. 29th Annual Conference on Neural Information Processing Systems, Montreal, QC, Canada, 2015, pp. 577-585.
    [32] S. Sharma, R. Kiros, and R. Salakhutdinov, "Action recognition using visual attention," in Proc. International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2016, pp. 321-327.
    [33] N. Sarafianos, B. Boteanu, B. Ionescu, and I. A. Kakadiaris, "3D Human pose estimation: A review of the literature and analysis of covariates," Computer Vision and Image Understanding, vol. 152, pp. 1-20, 2016.
    [34] J. Shotton et al., "Real-time human pose recognition in parts from single depth images," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 2011, pp. 1297-1304.
    [35] T. Pfister, J. Charles, and A. Zisserman, "Flowing ConvNets for Human Pose Estimation in Videos," in Proc. 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 2015, pp. 1913-1921.
    [36] S. E. Wei, V. Ramakrishna, T. Kanade, and Y. Sheikh, "Convolutional Pose Machines," in Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, 2016, pp. 4724-4732.
    [37] X. Chu, W. Ouyang, H. Li, and X. Wang, "Structured Feature Learning for Pose Estimation," in Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, 2016, pp. 4715-4723.
    [38] G. Gkioxari, B. Hariharan, R. Girshick, and J. Malik, "Using k-Poselets for Detecting People and Localizing Their Keypoints," in Proc. 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, Ohio, 2014, pp. 3582-3589.
    [39] U. Iqbal and J. Gall, "Multi-person pose estimation with local joint-to-person associations," in Proc. 14th European Conference on Computer Vision (ECCV), Amsterdam, Netherlands, 2016, pp. 627-642.
    [40] L. Pishchulin et al., "DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation," in Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, 2016, pp. 4929-4937.
    [41] E. Insafutdinov, L. Pishchulin, B. Andres, M. Andriluka, and B. Schiele, "DeeperCut: A deeper, stronger, and faster multi-person pose estimation model," in Proc. 14th European Conference on Computer Vision (ECCV), Amsterdam, Netherlands, 2016, pp. 34-50.
    [42] Z. Cao, T. Simon, S. E. Wei, and Y. Sheikh, "Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields," in Proc. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, 2017, pp. 1302-1310.
    [43] S.-L. Chung, "LSTM with hand-crafted view-invariant and differential cues (HVDC) for 3D human action recognition," 2017.
    [44] W. Li, Z. Zhang, and Z. Liu, "Action recognition based on a bag of 3D points," in Proc. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), San Francisco, CA, USA, 2010, pp. 9-14.
    [45] L. Xia, C.-C. Chen, and J. Aggarwal, "View invariant human action recognition using histograms of 3D joints," in Proc. 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Providence, RI, USA, 2012, pp. 20-27.
    [46] J. Wang, X. Nie, Y. Xia, Y. Wu, and S.-C. Zhu, "Cross-view action modeling, learning and recognition," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Ohio, 2014, pp. 2649-2656.
    [47] A.-A. Liu, W.-Z. Nie, Y.-T. Su, L. Ma, T. Hao, and Z.-X. Yang, "Coupled hidden conditional random fields for RGB-D human action recognition," Signal Processing, vol. 112, pp. 74-82, 2015.
    [48] H. Rahmani, A. Mahmood, D. Huynh, and A. Mian, "Histogram of oriented principal components for cross-view action recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 12, pp. 2430-2443, 2016.
    [49] J. Wang, Z. Liu, Y. Wu, and J. Yuan, "Mining actionlet ensemble for action recognition with depth cameras," in Proc. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 2012, pp. 1290-1297.
    [50] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. 13th International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 2010, vol. 9, pp. 249-256.
    [51] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. 3rd International Conference on Learning Representations (ICLR), San Diego, CA, United States, 2015, pp. 155-160.
    [52] P. J. Werbos, "Backpropagation through time: what it does and how to do it," Proceedings of the IEEE, vol. 78, no. 10, pp. 1550-1560, 1990.
    [53] W. Zaremba, I. Sutskever, and O. Vinyals, "Recurrent neural network regularization," arXiv preprint arXiv:1409.2329, 2014.
    [54] R. Vemulapalli, F. Arrate, and R. Chellappa, "Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group," in Proc. 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Ohio, 2014, pp. 588-595.
    [55] J. F. Hu, W. S. Zheng, J. Lai, and J. Zhang, "Jointly learning heterogeneous features for RGB-D activity recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2186-2200, 2017.
    [56] J. Liu, A. Shahroudy, D. Xu, and G. Wang, "Spatio-temporal LSTM with trust gates for 3D human action recognition," in Proc. European Conference on Computer Vision, Amsterdam, Netherlands, 2016, pp. 816-833.
    [57] S. Song, C. Lan, J. Xing, W. Zeng, and J. Liu, "An end-to-end spatio-temporal attention model for human action recognition from skeleton data," in Proc. 31st AAAI Conference on Artificial Intelligence (AAAI), San Francisco, CA, United States, 2017, pp. 4263-4270.
    [58] P. Wang, Z. Li, Y. Hou, and W. Li, "Action recognition based on joint trajectory maps using convolutional neural networks," in Proc. 24th ACM Multimedia Conference, Amsterdam, Netherlands, 2016, pp. 102-106.
    [59] M. Liu, H. Liu, and C. Chen, "Enhanced skeleton visualization for view invariant human action recognition," Pattern Recognition, vol. 68, pp. 346-362, 2017.
    [60] A. H. Ruiz, L. Porzi, S. R. Bulo, and F. Moreno-Noguer, "3D CNNs on distance matrices for human action recognition," in Proc. 25th ACM International Conference on Multimedia, Mountain View, CA, United States, 2017, pp. 1087-1095.
    [61] F. Baradel, C. Wolf, and J. Mille, "Pose-conditioned spatio-temporal attention for human action recognition," arXiv preprint arXiv:1703.10106, 2017.
    [62] P. Wang et al., "Cooperative training of deep aggregation networks for RGB-D action recognition," 2018.
    [63] M. Zanfir, M. Leordeanu, and C. Sminchisescu, "The moving pose: An efficient 3D kinematics descriptor for low-latency action recognition and detection," in Proc. 2013 IEEE International Conference on Computer Vision, Sydney, Australia, 2013, pp. 2752-2759.
    [64] L. Tao and R. Vidal, "Moving poselets: A discriminative and interpretable skeletal motion representation for action recognition," in Proc. 2015 IEEE International Conference on Computer Vision Workshops (ICCVW), Santiago, Chile, 2015, pp. 303-311.
    [65] J. Wang, Z. Liu, Y. Wu, and J. Yuan, "Mining actionlet ensemble for action recognition with depth cameras," in Proc. 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 2012, pp. 1290-1297.
    [66] A. Shahroudy, T. T. Ng, Q. Yang, and G. Wang, "Multimodal multipart learning for action recognition in depth videos," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 10, pp. 2123-2129, 2016.
    [67] J. Luo, W. Wang, and H. Qi, "Group sparsity and geometry constrained dictionary learning for action recognition from depth maps," in Proc. 2013 IEEE International Conference on Computer Vision, Sydney, Australia, 2013, pp. 1809-1816.
    [68] A. Shahroudy, T. T. Ng, Y. Gong, and G. Wang, "Deep multimodal feature analysis for action recognition in RGB+D videos," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP, no. 99, pp. 1-1, 2017.
    [69] R. Vemulapalli, F. Arrate, and R. Chellappa, "Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group," in Proc. 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Ohio, 2014, pp. 588-595.
    [70] Y. Du, W. Wang, and L. Wang, "Hierarchical recurrent neural network for skeleton based action recognition," in Proc. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, Massachusetts, 2015, pp. 1110-1118.
    [71] H. Rahmani, A. Mahmood, D. Huynh, and A. Mian, "Histogram of oriented principal components for cross-view action recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 12, pp. 2430-2443, 2016.
    [72] H. Rahmani and A. Mian, "3D Action Recognition from Novel Viewpoints," in Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, 2016, pp. 1506-1515.
    [73] S. Fothergill et al., "Instructing people for training gestural interactive systems," in Proc. SIGCHI Conference on Human Factors in Computing Systems, 2012.
    [74] V. Bloom, D. Makris, and V. Argyriou, "G3D: A gaming action dataset and real time action recognition evaluation framework," in Proc. 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2012.
    [75] K. Yun, J. Honorio, D. Chattopadhyay, T. L. Berg, and D. Samaras, "Two-person interaction detection using body-pose features and multiple instance learning," in Proc. 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2012.
    [76] F. Ofli et al., "Berkeley MHAD: A comprehensive multimodal human action database," in Proc. 2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013.
    [77] J. Wan et al., "ChaLearn looking at people RGB-D isolated and continuous datasets for gesture recognition," in Proc. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2016.
