
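Graduate student: 粘靖烽
Jing-Feng Nian
Thesis title: 結合人體骨架熱點圖與標準化流於固定行為模式之人體異常動作檢測
Human Action Anomaly Detection Under Specific Action Pattern by Integrating Human Skeleton Heatmaps and Normalizing Flow
Advisor: 楊朝龍
Chao-Lung Yang
Oral defense committee: 花凱龍
Kai-Lung Hua
許嘉裕
Chia-Yu Hsu
Degree: Master
Department: School of Management - Department of Industrial Management
Year of publication: 2023
Academic year of graduation: 111 (ROC academic year)
Language: English
Number of pages: 56
Keywords (Chinese): Anomaly Detection (異常偵測), Action Anomaly Recognition (異常動作辨識), Normalizing Flow (標準化流), Human Skeleton Information (人體骨架資訊), Heatmap (熱點圖)
Keywords (English): Action Anomaly Detection, Normalizing Flow, Human Skeleton Information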
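  • This research aims to develop a semi-supervised technique for detecting anomalous actions under a specific action pattern, where a specific action is a regulated human movement; for example, the standard operating procedure (SOP) that workers follow during factory assembly is the kind of specific action targeted here. Pose estimation is used to convert RGB images of the human body into skeleton coordinates, which are then mapped onto background-free images and rendered as heatmaps that present the action pose of the skeleton graph. A pre-trained Residual Neural Network (ResNet) extracts features from these images, and the features serve as the input to a normalizing flow model. With normalizing flow, the model learns the probability distribution of normal actions from normal-action training data; during detection, the probability of the test data is compared with that of normal actions to decide whether the data are anomalous. This research treats anomaly detection as a binary classification problem in which the decision is based on the anomaly score of the data rather than on the class probabilities of a conventional classifier. Because anomalous actions are diverse and hard to enumerate, comparing anomaly scores detects behavior outside the normal pattern more effectively and also saves the cost of manually labeling classes. The SOP of computer assembly, which comprises the assembly of seven components, serves as the experimental scenario. The model is trained on videos of normal actions and then tested on videos that contain anomalous actions. The results show that, compared with three other methods, the proposed method effectively detects anomalous actions outside the SOP, with an average accuracy of 80.65% and an average anomalous-action detection rate of 79.45%.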
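The score-based decision described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: a single element-wise affine transform (the simplest possible normalizing flow) stands in for the deep flow model, random vectors stand in for ResNet features, and the 95th-percentile threshold, the class name `AffineFlow`, and the helper `accuracy_and_recall` are all assumptions made for illustration.

```python
import numpy as np


class AffineFlow:
    """Toy flow z = (x - mu) / sigma with a standard normal base density.

    Stand-in for a deep normalizing flow; fitting mu and sigma by moments
    is the maximum-likelihood solution for this one-layer flow.
    """

    def fit(self, x):
        self.mu = x.mean(axis=0)
        self.sigma = x.std(axis=0) + 1e-6  # avoid division by zero
        return self

    def log_prob(self, x):
        # Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|
        z = (x - self.mu) / self.sigma
        base = -0.5 * (z ** 2 + np.log(2 * np.pi)).sum(axis=1)
        log_det = -np.log(self.sigma).sum()
        return base + log_det

    def anomaly_score(self, x):
        # Low likelihood under the normal-action model means a high score.
        return -self.log_prob(x)


def accuracy_and_recall(y_true, y_pred):
    """Accuracy and recall, with label 1 (True) meaning 'anomalous'."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    accuracy = float((y_true == y_pred).mean())
    recall = float((y_true & y_pred).sum() / max(y_true.sum(), 1))
    return accuracy, recall


# Synthetic feature vectors (assumption: 8-dimensional stand-ins).
rng = np.random.default_rng(0)
train_normal = rng.normal(0.0, 1.0, size=(500, 8))
test_normal = rng.normal(0.0, 1.0, size=(100, 8))
test_anomalous = rng.normal(4.0, 1.0, size=(100, 8))

# Train on normal data only, then pick a threshold from the training scores.
flow = AffineFlow().fit(train_normal)
threshold = np.quantile(flow.anomaly_score(train_normal), 0.95)

x_test = np.vstack([test_normal, test_anomalous])
y_true = np.array([0] * 100 + [1] * 100)
y_pred = flow.anomaly_score(x_test) > threshold
accuracy, recall = accuracy_and_recall(y_true, y_pred)
```

Only normal samples are needed to fit the model and set the threshold, which is why no anomaly classes have to be labeled in advance.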


    The purpose of this research is to develop a semi-supervised framework for detecting anomalous actions under specific action patterns, where a specific action pattern refers to regulated movements such as those defined by a Standard Operating Procedure (SOP). Pose estimation was employed to convert RGB images of the human body into human skeleton coordinates, and the coordinates were mapped onto background-free images to generate heatmaps of the skeleton graph. A pre-trained Residual Neural Network (ResNet) was then used to extract features from the heatmaps, and these features served as the input to the normalizing flow model. The normalizing flow model learns the probability distribution of normal actions from normal-action data and compares it with the probability distribution of the testing data to decide whether the testing data are anomalous. Unlike traditional supervised methods, this method can detect anomalies that cannot be enumerated exhaustively, and it significantly reduces the time and resources required for data annotation. The SOP of computer assembly, which includes the actions of assembling seven components, was used as the experimental context. The experimental results demonstrate that the proposed method effectively detects anomalous actions outside the SOP, achieving an average accuracy of 80.65% and an average recall of 79.45%. Furthermore, three other anomaly detection frameworks were evaluated on the same dataset, and the proposed method outperformed them in action anomaly detection.
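As a rough illustration of the coordinate-to-heatmap step mentioned in the abstract, the sketch below renders skeleton joints as Gaussian blobs on a blank, background-free canvas. The function name `keypoints_to_heatmap`, the Gaussian width `sigma`, the canvas size, and the example joint coordinates are assumptions; the record does not describe how the thesis draws joints or limbs, and limbs are omitted here.

```python
import numpy as np


def keypoints_to_heatmap(keypoints, height, width, sigma=2.0):
    """Render (x, y) joint coordinates as Gaussian blobs on a blank canvas.

    keypoints: iterable of (x, y) pixel coordinates (hypothetical joints).
    Returns a (height, width) array with values in [0, 1].
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float64)
    for x, y in keypoints:
        # Squared distance from every pixel to this joint.
        dist_sq = (xs - x) ** 2 + (ys - y) ** 2
        blob = np.exp(-dist_sq / (2.0 * sigma ** 2))
        # Keep the strongest response where blobs overlap.
        heatmap = np.maximum(heatmap, blob)
    return heatmap


# Two made-up joints on a 40 x 50 canvas.
example_heatmap = keypoints_to_heatmap([(10, 20), (30, 5)], height=40, width=50)
```

An image like this could then be passed to a pre-trained feature extractor in place of the original RGB frame, so that background and appearance do not influence the features.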

    ABSTRACT (CHINESE)
    ABSTRACT
    ACKNOWLEDGEMENTS
    TABLE OF CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER 1. INTRODUCTION
    CHAPTER 2. LITERATURE REVIEW
        2.1. Anomaly detection
        2.2. Normalizing Flow
        2.3. Skeleton Feature Extraction
    CHAPTER 3. METHODOLOGY
        3.1. Framework
        3.2. Pose Estimation
            3.2.1. Human Skeleton Detection
            3.2.2. Heatmap Generation
        3.3. Normalizing Flow
            3.3.1. Feature Extraction
            3.3.2. Normalizing Flow
        3.4. Anomaly Detection
    CHAPTER 4. EXPERIMENTS AND RESULTS
        4.1. Data Description
        4.2. Implementation
            4.2.1. Framework Configuration
            4.2.2. Performance Evaluation
        4.3. Experiments and Results
            4.3.1. Evaluation of Proposed Method
            4.3.2. Comparing with other methods
        4.4. Result Discussion
    CHAPTER 5. CONCLUSION
        5.1. Conclusion
        5.2. Future Work
    REFERENCES
    APPENDIX


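    Full-text release date: 2025/07/19 (off-campus network)
    Full-text release date: 2025/07/19 (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)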