
Author: 高子晴 (Tzu-Ching Kao)
Thesis title: 結合位置資訊及人體動作辨識模型於產線組裝動作辨識之研究
Combining location information with human action recognition model for assembly action recognition in production lines
Advisor: 楊朝龍 (Chao-Lung Yang)
Committee members: 林柏廷 (Po-Ting Lin), 王孔政 (Kung-Jeng Wang)
Degree: Master
Department: College of Industry-Academia Innovation - Graduate Institute of Intelligent Manufacturing Technology
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar)
Language: English
Number of pages: 46
Chinese keywords: 人體動作辨識 (human action recognition), 物件偵測模型 (object detection model), 極限梯度提升 (extreme gradient boosting), 生產線應用 (production line application), 工廠智慧轉型 (smart factory transformation)
Foreign keywords: human action recognition, object detection, XGBoost, production line application, intelligent manufacturing

In recent years, with advances in computer vision and artificial intelligence, human action recognition technology has made remarkable progress in industrial engineering and many other fields. However, applying this technology in real-world settings still presents challenges, particularly in recognizing standard operating procedures (SOPs) and exceptional actions (non-SOPs) on factory production lines. These challenges include resource requirements, the need to distinguish diverse actions, and the difficulty of manually labeling action data. This study proposes an Assembly Action Recognition Framework (AARF). First, the human action recognition module (SlowFast) outputs the predicted probability of each action (i.e., its softmax score), and the operator detection module (You Only Look Once v7, YOLOv7) outputs the location of the primary operator. The predicted probabilities and the location information are then combined and processed through feature engineering and data preprocessing. Finally, these features are fed into an extreme gradient boosting classifier to recognize SOPs and non-SOPs. This study uses 30 videos of simulated assembly actions on a production line, 21 of which contain randomly occurring non-SOPs. In addition, an extended experiment was conducted on real production-line data collected from the conference-camera assembly line of AVer (圓展科技). The experimental results show that by incorporating operator location information, AARF improves overall action recognition, achieving an accuracy of 89.14% on the simulated data and an average accuracy of 87.67% on the real factory production-line data.


In recent years, significant progress has been made in human action recognition technology. However, applying this technology in practical settings, especially for recognizing Standard Operating Procedures (SOPs) and non-Standard Operating Procedures (non-SOPs) performed by factory operators, still presents challenges such as differentiating diverse movements and detecting anomalous actions. To address these challenges, this research proposes an Assembly Action Recognition Framework (AARF). First, the action recognition module (SlowFast) outputs the predicted probability of each action as a softmax score, while the operator detection module (You Only Look Once v7, YOLOv7) provides the location of the primary operator. These outputs are then combined and passed through feature engineering and preprocessing. Finally, the resulting features are fed into the assembly classifier (eXtreme Gradient Boosting, XGBoost) to recognize both SOPs and non-SOPs. The dataset for this research contains 30 clips of simulated assembly actions, 21 of which involve randomly occurring abnormal actions. In addition, real-world data collected from production lines at AVer Tech is used in extended experiments. The experimental results show that by incorporating operator location information, AARF improves overall action recognition, achieving an accuracy of 89.14% on the simulated data and an average accuracy of 87.67% on the real production lines.
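To make the combination step concrete, the following is a minimal Python sketch, not the thesis code, of how per-clip SlowFast softmax scores and a YOLOv7 bounding box for the primary operator might be merged into a single feature vector and classified with XGBoost; the function name build_features, the number of action classes, the box normalization, and the SOP/non-SOP label encoding are illustrative assumptions.

```python
# Minimal sketch (not the thesis implementation) of combining SlowFast softmax
# scores with YOLOv7 operator-location features and classifying SOP / non-SOP
# clips with XGBoost. Shapes, labels, and hyperparameters are illustrative.
import numpy as np
from xgboost import XGBClassifier

def build_features(softmax_scores, boxes, frame_w, frame_h):
    """Concatenate per-clip action probabilities with a normalized operator box.

    softmax_scores: (n_clips, n_actions) class probabilities from SlowFast.
    boxes:          (n_clips, 4) primary-operator boxes (x1, y1, x2, y2) from YOLOv7.
    """
    cx = (boxes[:, 0] + boxes[:, 2]) / (2 * frame_w)   # normalized box center x
    cy = (boxes[:, 1] + boxes[:, 3]) / (2 * frame_h)   # normalized box center y
    w = (boxes[:, 2] - boxes[:, 0]) / frame_w          # normalized box width
    h = (boxes[:, 3] - boxes[:, 1]) / frame_h          # normalized box height
    location = np.stack([cx, cy, w, h], axis=1)
    return np.hstack([softmax_scores, location])

# Toy data standing in for the two modules' real outputs.
rng = np.random.default_rng(0)
scores = rng.dirichlet(np.ones(8), size=200)           # 8 assumed action classes
xy = rng.uniform(0, 300, size=(200, 2))
wh = rng.uniform(20, 150, size=(200, 2))
boxes = np.hstack([xy, xy + wh])
X = build_features(scores, boxes, frame_w=640, frame_h=480)
y = rng.integers(0, 2, size=200)                       # 1 = SOP, 0 = non-SOP (assumed)

clf = XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
clf.fit(X, y)
print(clf.predict(X[:5]))
```

In the actual framework, additional preprocessing such as normalization and imbalance handling (Sections 3.4.2 and 3.4.3 in the table of contents below) would be applied before training the classifier.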

Abstract (Chinese) i
ABSTRACT ii
Acknowledgements iii
TABLE OF CONTENTS iv
LIST OF FIGURES vi
LIST OF TABLES vii
CHAPTER 1. INTRODUCTION 1
1.1. Background & Motivation 1
1.2. Contributions 2
1.3. Thesis Organization 2
CHAPTER 2. LITERATURE REVIEW 3
2.1. Human Action Recognition (HAR) 3
2.2. Anomaly Detection on Human Action 4
CHAPTER 3. METHODOLOGY 6
3.1. Framework 6
3.2. Action Recognition Module 7
3.3. Operator Detection Module (YOLOv7) 8
3.4. Data Preprocessing 8
3.4.1. Feature Extraction 8
3.4.2. Normalization 9
3.4.3. Data Imbalance 10
3.5. Assembly Classifier: SOPs & non-SOPs Recognition 11
3.5.1. XGBoost 11
3.5.2. Accumulative Moving Mode (AMM) 12
CHAPTER 4. EXPERIMENTS AND RESULTS 14
4.1. Dataset 14
4.1.1. Sims-normal Dataset 14
4.1.2. AVer Dataset 15
4.2. Implementation 16
4.2.1. Configuration 16
4.2.2. Performance Evaluation 18
4.3. Experiments and Results 19
4.3.1. Evaluation of Proposed Method 19
4.3.2. Evaluation on Cross-abnormal 21
4.3.3. Ablation Study: Feature Selection Strategy 22
4.3.4. Ablation Study: Using Different Models as Assembly Classifier 24
4.3.5. Ablation Study: Filter 25
4.3.6. Ablation Study: Clip Sampling 26
4.3.7. Experiments on AVer Dataset 28
CHAPTER 5. CONCLUSION AND DISCUSSION 35
5.1. Conclusion 35
5.2. Future Work 35
REFERENCES 37
APPENDIX A 41
APPENDIX B 43
APPENDIX C 46


Full-text release date: 2026/07/10 (off-campus network)
Full-text release date: 2029/07/10 (National Central Library: Taiwan NDLTD system)