
Student: 許佑瑋 (Yu-Wei Hsu)
Thesis Title: 人體動作辨識技術應用於標準加工動作及例外動作處理
Human Action Recognition on Standard Worker Operation and Exceptional Movement
Advisor: 楊朝龍 (Chao-Lung Yang)
Committee Members: 歐陽超 (Chao Ou-Yang), 花凱龍 (Kai-Lung Hua)
Degree: Master
Department: School of Management - Department of Industrial Management
Publication Year: 2020
Graduation Academic Year: 108
Language: English
Number of Pages: 77
Keywords (Chinese): 人體動作辨識、基於骨架之時空卷積網路、支持向量機、例外動作偵測
Keywords (English): Human Action Recognition, Skeleton-based Graph Convolutional Networks, SVM, Exceptional Motion Detection

This research applies skeleton-based video action recognition technology to the recognition of standard machining operations and exceptional movements on the shop floor. The study consists of two experiments. First, a Spatial Temporal Graph Convolutional Network (ST-GCN) was used to build a standard motion recognition model, which was then tested on continuous machining videos. The purpose is to use artificial intelligence computer vision technology to recognize each standard operation performed by the worker and thereby obtain the execution time of each motion. The second part of the research addresses the detection of exceptional machining movements with a two-stage approach: a pre-trained ST-GCN model is first built from standard motions, and its output features are then judged by multiple standard-motion SVM classifiers to decide whether a movement is exceptional. Since it is impossible to enumerate every out-of-specification movement that may appear on the shop floor, the approach detects, by elimination, any movement outside the defined standard motions, resolving the difficulty of collecting exceptional-motion data and of training a model on them. The experiments were based on three standard motions commonly seen in manufacturing, namely loading, processing, and unloading, for model training and testing. The results show that ST-GCN achieved a recognition accuracy of 84.7% on the three motions, indicating that the model is capable of recognizing standard motions. The second experiment combined ST-GCN with multiple standard-motion SVM classifiers and applied the filter processing proposed in this research to handle noise and exceptional-motion signals. On a continuous machining test set containing exceptional movements, the overall recognition accuracy for standard and exceptional motions reached 81.4%, outperforming methods commonly used for exception detection.
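
To make the time-study output of the first experiment concrete, the following is a minimal sketch of how window-level motion labels predicted by a trained ST-GCN could be aggregated into the execution time of each standard motion. The frame rate, window stride, label order, and the hypothetical predictions are illustrative assumptions only, not values taken from the thesis.

import numpy as np

# Assumed setup: a trained ST-GCN has already assigned one of the three
# standard motion labels (loading / processing / unloading) to each sliding
# window of skeleton frames extracted from the continuous video.
FPS = 30                      # assumed camera frame rate
WINDOW_STRIDE = 15            # assumed stride (in frames) between windows
labels = ["loading", "processing", "unloading"]

# Hypothetical per-window predictions from the ST-GCN classifier.
window_preds = np.array([0] * 8 + [1] * 20 + [2] * 6)   # indices into labels

def segment_durations(preds, stride, fps):
    """Merge consecutive identical window predictions into motion segments
    and report the duration of each segment in seconds."""
    segments = []
    start = 0
    for i in range(1, len(preds) + 1):
        if i == len(preds) or preds[i] != preds[start]:
            n_windows = i - start
            segments.append((labels[preds[start]], n_windows * stride / fps))
            start = i
    return segments

for motion, seconds in segment_durations(window_preds, WINDOW_STRIDE, FPS):
    print(f"{motion:<12s}{seconds:5.1f} s")

In actual use, window_preds would be replaced by the ST-GCN outputs on the continuous machining video, so that each standard operation and its execution time can be read off directly.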


In this research, skeleton-based human action recognition technology was applied to the recognition of standard and exceptional operations. First, a spatial temporal graph convolutional network (ST-GCN) was used for standard motion recognition and evaluated on continuous processing videos. The objective is to apply artificial intelligence computer vision technology to identify the standard operations performed by workers and to extract the duration of each motion. Second, a two-stage approach was developed for exceptional movement detection: an ST-GCN was trained on standard motion data, and its output features were then used by multiple SVM classifiers to detect exceptional movements. The objective is to detect exceptional motions that are not defined in the standard procedure or that should be avoided during operation. The first experiment was conducted on three common standard movements in manufacturing, loading, processing, and unloading, for model training and testing. The recognition accuracy of ST-GCN reached 84.7%, demonstrating its capability for standard motion recognition. The second experiment combined ST-GCN with multiple SVM classifiers, and a dedicated filter processing step was proposed to handle noise and exceptional-motion signals. On continuous processing videos containing both standard and exceptional movements, the overall recognition accuracy reached 81.4%, outperforming methods commonly used for exception detection.
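
To illustrate the elimination idea behind the two-stage approach, the sketch below trains one one-class SVM per standard motion on feature vectors standing in for ST-GCN output features, and labels any sample rejected by every standard-motion classifier as exceptional. The feature dimension, the choice of scikit-learn's OneClassSVM, and the synthetic data are assumptions for illustration; the thesis's actual SVM configuration and the proposed filter processing are not reproduced here.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
FEATURE_DIM = 256   # assumed size of the ST-GCN feature vector

# Hypothetical ST-GCN features for the three standard motions (training data).
train_feats = {
    "loading":    rng.normal(loc=0.0,  scale=1.0, size=(200, FEATURE_DIM)),
    "processing": rng.normal(loc=3.0,  scale=1.0, size=(200, FEATURE_DIM)),
    "unloading":  rng.normal(loc=-3.0, scale=1.0, size=(200, FEATURE_DIM)),
}

# Second stage: one one-class SVM per standard motion, fit only on that motion.
classifiers = {
    name: OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(feats)
    for name, feats in train_feats.items()
}

def classify(feature_vec):
    """Return the matching standard motion, or 'exceptional' if every
    standard-motion SVM rejects the sample (detection by elimination)."""
    x = feature_vec.reshape(1, -1)
    for name, clf in classifiers.items():
        if clf.predict(x)[0] == 1:      # +1 means inside the learned boundary
            return name
    return "exceptional"

# A sample near the 'processing' cluster and one far from all three clusters.
print(classify(rng.normal(3.0, 1.0, FEATURE_DIM)))    # likely 'processing'
print(classify(rng.normal(10.0, 1.0, FEATURE_DIM)))   # likely 'exceptional'

The design point is that only standard motions need training data: anything the standard-motion classifiers jointly reject is treated as exceptional, which avoids having to enumerate and collect every possible out-of-specification movement.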

Abstract (Chinese) i
ABSTRACT ii
Acknowledgements iii
CONTENTS iv
FIGURE LIST vi
TABLE LIST viii
CHAPTER 1. Introduction 1
CHAPTER 2. Literature Review 4
2.1 Time Study 4
2.2 Deep Learning Technology on Human Action Recognition 6
2.3 Human Pose Estimation 10
2.4 Exceptional Motion Detection 11
CHAPTER 3. Methodology 13
3.1 Framework 13
3.2 Locate ROI in Video 15
3.3 Estimate Human Joints Location 15
3.4 Skeleton-based Action Recognition 17
3.5 SVM Standard Motion Classifier 20
3.6 Filter Processing 23
3.6.1 Noise Filter 24
3.6.2 Exceptional Motion Filter 25
CHAPTER 4. Experiments and Results 27
4.1 Data Description 28
4.1.1 Real-World Data from Manufacturing Site 28
4.1.2 Simulation Data of Drilling Process 30
4.2 Standard Motion Recognition 32
4.2.1 ST-GCN Network Configuration 33
4.2.2 Performance Evaluation 33
4.2.3 Result 34
4.3 Exceptional Motion Detection 35
4.3.1 ST-GCN Network Configuration 38
4.3.2 SVM Classifier Configuration 39
4.3.3 Filter 41
4.3.4 Result 44
CHAPTER 5. Conclusion 49
REFERENCES 51
APPENDIX 55


Full text available from 2025/07/27 (campus network)
Full text available from 2025/07/27 (off-campus network)
Full text available from 2025/07/27 (National Central Library: Taiwan NDLTD system)