
Student: 許佑瑋 (Yu-Wei Hsu)
Thesis Title: 人體動作辨識技術應用於標準加工動作及例外動作處理
Human Action Recognition on Standard Worker Operation and Exceptional Movement
Advisor: 楊朝龍 (Chao-Lung Yang)
Committee Members: 歐陽超 (Chao Ou-Yang), 花凱龍 (Kai-Lung Hua)
Degree: Master
Department: School of Management - Department of Industrial Management
Publication Year: 2020
Graduation Academic Year: 108
Language: English
Number of Pages: 77
Keywords (Chinese): 人體動作辨識、基於骨架之時空卷積網路、支持向量機、例外動作偵測
Keywords (English): Human Action Recognition, Skeleton-based Graph Convolutional Networks, SVM, Exceptional Motion Detection

This research applies skeleton-based video action recognition technology to the recognition of standard machining operations and exceptional movements on the shop floor. The study consists of two experiments. First, a Spatial Temporal Graph Convolutional Network (ST-GCN) was used to build a standard motion recognition model, which was then tested on continuous machining videos. The purpose is to use artificial intelligence computer vision technology to recognize each standard operation performed by the worker and thereby obtain the execution time of each motion. The second part of the research addresses the detection of exceptional machining movements with a two-stage approach: a pre-trained ST-GCN model is first built from standard motions, and its output features are then judged by multiple standard-motion SVM classifiers to decide whether a movement is exceptional. Since it is impossible to enumerate every out-of-specification movement that may appear on the shop floor, the approach detects, by elimination, any movement outside the defined standard motions, resolving the difficulty of collecting exceptional-motion data and of training a model on them. The experiments were based on three standard motions commonly seen in manufacturing, namely loading, processing, and unloading, for model training and testing. The results show that ST-GCN achieved a recognition accuracy of 84.7% on the three motions, indicating that the model is capable of recognizing standard motions. The second experiment combined ST-GCN with multiple standard-motion SVM classifiers and applied the filter processing proposed in this research to handle noise and exceptional-motion signals. On a continuous machining test set containing exceptional movements, the overall recognition accuracy for standard and exceptional motions reached 81.4%, outperforming methods commonly used for exception detection.
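
To make the time-study output of the first experiment concrete, the following is a minimal sketch of how window-level motion labels predicted by a trained ST-GCN could be aggregated into the execution time of each standard motion. The frame rate, window stride, label order, and the hypothetical predictions are illustrative assumptions only, not values taken from the thesis.

import numpy as np

# Assumed setup: a trained ST-GCN has already assigned one of the three
# standard motion labels (loading / processing / unloading) to each sliding
# window of skeleton frames extracted from the continuous video.
FPS = 30                      # assumed camera frame rate
WINDOW_STRIDE = 15            # assumed stride (in frames) between windows
labels = ["loading", "processing", "unloading"]

# Hypothetical per-window predictions from the ST-GCN classifier.
window_preds = np.array([0] * 8 + [1] * 20 + [2] * 6)   # indices into labels

def segment_durations(preds, stride, fps):
    """Merge consecutive identical window predictions into motion segments
    and report the duration of each segment in seconds."""
    segments = []
    start = 0
    for i in range(1, len(preds) + 1):
        if i == len(preds) or preds[i] != preds[start]:
            n_windows = i - start
            segments.append((labels[preds[start]], n_windows * stride / fps))
            start = i
    return segments

for motion, seconds in segment_durations(window_preds, WINDOW_STRIDE, FPS):
    print(f"{motion:<12s}{seconds:5.1f} s")

In actual use, window_preds would be replaced by the ST-GCN outputs on the continuous machining video, so that each standard operation and its execution time can be read off directly.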


In this research, skeleton-based human action recognition technology was applied to the recognition of standard and exceptional operations. First, a spatial temporal graph convolutional network (ST-GCN) was used for standard motion recognition and evaluated on continuous processing videos. The objective is to apply artificial intelligence computer vision technology to identify the standard operations performed by workers and to extract the duration of each motion. Second, a two-stage approach was developed for exceptional movement detection: an ST-GCN was trained on standard motion data, and its output features were then used by multiple SVM classifiers to detect exceptional movements. The objective is to detect exceptional motions that are not defined in the standard procedure or that should be avoided during operation. The first experiment was conducted on three common standard movements in manufacturing, loading, processing, and unloading, for model training and testing. The recognition accuracy of ST-GCN reached 84.7%, demonstrating its capability for standard motion recognition. The second experiment combined ST-GCN with multiple SVM classifiers, and a dedicated filter processing step was proposed to handle noise and exceptional-motion signals. On continuous processing videos containing both standard and exceptional movements, the overall recognition accuracy reached 81.4%, outperforming methods commonly used for exception detection.
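
To illustrate the elimination idea behind the two-stage approach, the sketch below trains one one-class SVM per standard motion on feature vectors standing in for ST-GCN output features, and labels any sample rejected by every standard-motion classifier as exceptional. The feature dimension, the choice of scikit-learn's OneClassSVM, and the synthetic data are assumptions for illustration; the thesis's actual SVM configuration and the proposed filter processing are not reproduced here.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
FEATURE_DIM = 256   # assumed size of the ST-GCN feature vector

# Hypothetical ST-GCN features for the three standard motions (training data).
train_feats = {
    "loading":    rng.normal(loc=0.0,  scale=1.0, size=(200, FEATURE_DIM)),
    "processing": rng.normal(loc=3.0,  scale=1.0, size=(200, FEATURE_DIM)),
    "unloading":  rng.normal(loc=-3.0, scale=1.0, size=(200, FEATURE_DIM)),
}

# Second stage: one one-class SVM per standard motion, fit only on that motion.
classifiers = {
    name: OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(feats)
    for name, feats in train_feats.items()
}

def classify(feature_vec):
    """Return the matching standard motion, or 'exceptional' if every
    standard-motion SVM rejects the sample (detection by elimination)."""
    x = feature_vec.reshape(1, -1)
    for name, clf in classifiers.items():
        if clf.predict(x)[0] == 1:      # +1 means inside the learned boundary
            return name
    return "exceptional"

# A sample near the 'processing' cluster and one far from all three clusters.
print(classify(rng.normal(3.0, 1.0, FEATURE_DIM)))    # likely 'processing'
print(classify(rng.normal(10.0, 1.0, FEATURE_DIM)))   # likely 'exceptional'

The design point is that only standard motions need training data: anything the standard-motion classifiers jointly reject is treated as exceptional, which avoids having to enumerate and collect every possible out-of-specification movement.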

Abstract (Chinese) i
ABSTRACT ii
Acknowledgements iii
CONTENTS iv
FIGURE LIST vi
TABLE LIST viii
CHAPTER 1. Introduction 1
CHAPTER 2. Literature Review 4
2.1 Time Study 4
2.2 Deep Learning Technology on Human Action Recognition 6
2.3 Human Pose Estimation 10
2.4 Exceptional Motion Detection 11
CHAPTER 3. Methodology 13
3.1 Framework 13
3.2 Locate ROI in Video 15
3.3 Estimate Human Joints Location 15
3.4 Skeleton-based Action Recognition 17
3.5 SVM Standard Motion Classifier 20
3.6 Filter Processing 23
3.6.1 Noise Filter 24
3.6.2 Exceptional Motion Filter 25
CHAPTER 4. Experiments and Results 27
4.1 Data Description 28
4.1.1 Real-World Data from Manufacturing Site 28
4.1.2 Simulation Data of Drilling Process 30
4.2 Standard Motion Recognition 32
4.2.1 ST-GCN Network Configuration 33
4.2.2 Performance Evaluation 33
4.2.3 Result 34
4.3 Exceptional Motion Detection 35
4.3.1 ST-GCN Network Configuration 38
4.3.2 SVM Classifier Configuration 39
4.3.3 Filter 41
4.3.4 Result 44
CHAPTER 5. Conclusion 49
REFERENCES 51
APPENDIX 55


Full text available from 2025/07/27 (campus network)
Full text available from 2025/07/27 (off-campus network)
Full text available from 2025/07/27 (National Central Library: Taiwan NDLTD system)