| Graduate Student: | 鄧少鈞 Shao-Jun Deng |
|---|---|
| Thesis Title: | 基於卷積神經網路及人體骨架資訊之靜態與動態動作即時追蹤與辨識系統 (A Real-time Tracking and Recognition System for Static and Dynamic Human Actions Based on a Convolutional Neural Network and Human Skeleton Information) |
| Advisor: | 施慶隆 Ching-Long Shih |
| Committee: | 施慶隆 Ching-Long Shih, 黃志良 Chih-Lyang Hwang, 李文猶 Wen-Yo Lee, 吳修明 Hsiu-Ming Wu |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication: | 2021 |
| Academic Year: | 109 |
| Language: | Chinese |
| Pages: | 56 |
| Keywords: | Machine Learning, Convolutional Neural Network, Human Skeleton, Motion Recognition, Target Tracking, Real-Time Monitoring System |
This thesis uses human skeleton information and a convolutional neural network (CNN) to realize a real-time human tracking and dynamic motion recognition system. The hardware consists of a Kinect camera, a DC motor, and an FPGA DE0-Nano development board. The software comprises three subsystems: a camera-based human tracking system, a skeleton data processing system, and a machine learning system for motion recognition. For human tracking, the Kinect camera tracks feature joints of the upper body; camera motion commands are computed from the corresponding distance and angle differences, and a closed-loop PID controller drives the DC motor so that the camera follows the target. On the machine learning side, a conventional CNN is trained on a pre-built data set, and the resulting weight file is used for recognition. For data processing, instead of the traditional image input, a data-stream approach is adopted: the feature joints are processed in real time and arranged into a matrix that is fed to the pre-trained network to obtain the recognition result. Together with a dual-thread design and a system state machine, real-time dynamic motion recognition is achieved up to the maximum distance at which the camera can still track a person.
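The thesis text does not reproduce the controller code; the closed-loop PID yaw control it describes can be sketched as follows. The gain values, the sample time, and the use of a single spine joint's normalized horizontal offset as the error signal are illustrative assumptions, not the thesis's actual parameters.

```python
class PID:
    """Discrete PID controller for the camera yaw axis (hypothetical gains)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        # Standard discrete PID: proportional + accumulated integral + backward-difference derivative.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def yaw_command(joint_x, image_width, pid):
    """Motor command from the tracked joint's horizontal offset, normalized to [-1, 1]."""
    error = (image_width / 2 - joint_x) / (image_width / 2)
    return pid.update(error)
```

In the actual system this command would be sent to the DE0-Nano board driving the DC motor; here it is simply returned.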
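The data-stream input described above, where joint coordinates are arranged into a matrix instead of feeding the CNN an image, might look like the sketch below. The window length `T = 30`, joint count `J = 10`, and the per-sample mean normalization are assumptions for illustration; the thesis's actual matrix layout may differ.

```python
import numpy as np

T, J = 30, 10  # assumed frames per sample and tracked upper-body joints


def frames_to_matrix(frames):
    """Stack T frames of J (x, y, z) joints into a (1, T, J*3, 1) CNN input tensor.

    frames: array-like of shape (T, J, 3) holding one sliding window of joint data.
    """
    mat = np.asarray(frames, dtype=np.float32).reshape(T, J * 3)
    # Subtract the per-column mean so the network sees relative motion,
    # not the person's absolute position in front of the camera.
    mat -= mat.mean(axis=0, keepdims=True)
    return mat[np.newaxis, ..., np.newaxis]  # add batch and channel dimensions
```

The resulting tensor can be passed directly to a 2-D CNN exactly as an image would be, which is what lets a conventional CNN architecture consume skeleton streams.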
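The dual-thread integration mentioned at the end of the abstract can be sketched as one thread that reads joints and drives the motor while a second thread consumes buffered frames and runs CNN inference. The queue size, the 30-frame window, and the 50% window overlap are hypothetical choices, not values from the thesis.

```python
import queue
import threading

# Frames handed from the tracking thread to the recognition thread.
joint_queue = queue.Queue(maxsize=60)


def tracking_thread(get_joints, send_motor_cmd, stop):
    """Read skeleton joints, drive the camera motor, and buffer frames."""
    while not stop.is_set():
        joints = get_joints()        # e.g. one Kinect skeleton frame
        send_motor_cmd(joints)       # e.g. PID yaw command from joint offset
        joint_queue.put(joints)


def recognition_thread(classify, stop):
    """Consume frames in sliding windows and run CNN inference on each window."""
    frames = []
    while not stop.is_set():
        frames.append(joint_queue.get())
        if len(frames) == 30:        # one window of 30 frames (assumed)
            classify(frames)         # pre-trained CNN inference
            frames = frames[15:]     # keep 50% overlap between windows
```

In the real system a state machine would additionally gate recognition on whether the tracker currently has a valid target; that logic is omitted here.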