
Author: 李奕錕 (Yi-Kun Lee)
Thesis Title: 基於物件姿態與光流分析之影片異常檢測 (Video Anomaly Detection with Object-Based Pose Estimation and Optical Flow Analysis)
Advisor: 郭景明 (Jing-Ming Guo)
Committee Members: 郭景明 (Jing-Ming Guo), 楊士萱 (Shin-Hsuan Yang), 黃敬群 (Ching-Chun Huang), 范志鵬 (Chih-Peng Fan), 王乃堅 (Nai-Jian Wang)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Academic Year of Graduation: 111
Language: Chinese
Number of Pages: 82
Chinese Keywords: 異常檢測, 自監督式學習, 未來影像預測, 光流分析, 資料增強
English Keywords: Anomaly detection, self-supervised training, future frame prediction, optical flow, anomaly generation

With the development of deep learning in artificial intelligence, the volume and variety of available data keep growing. Under conventional supervised training, these training and testing data must be labeled in large quantities so that the network can learn the behavior we want it to produce. However, not all data can be collected in large and balanced amounts; medical images and object defects, for example, inherently occur far less often in everyday life than normal images. Consequently, methods that extract key features from normal images alone and use them to identify anomalies have attracted increasing attention.
This thesis uses the CUHK Avenue and ShanghaiTech Campus anomaly detection datasets. ShanghaiTech Campus is closer to real-world footage because it contains different lighting conditions, camera angles, and background compositions, and both datasets provide only normal data during the training stage. This thesis therefore improves upon a previously proposed method: its overall framework is retained, and multiple pre-trained models are used to extract object features separately. A future frame prediction module is then added to estimate how future frames will change; under the assumption that normal frames evolve consistently without large, abrupt changes, an anomaly produces a large prediction error that can be detected. Anomalous samples are further generated from the input frames so that the network learns more information and gains a rough ability to estimate human poses. Finally, optical flow analysis is applied to the predicted frames to extract the error distribution of normal frames, which establishes the key distinction between normality and abnormality.
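As a rough illustration of the future frame prediction idea described above, the sketch below scores a frame by the discrepancy between a predicted next frame and the frame actually observed, following the common PSNR-based convention of future-frame-prediction baselines such as [42]. The predictor network, normalization, and score weighting used in this thesis are not shown, and the function name is a hypothetical placeholder rather than the thesis's implementation.

```python
import numpy as np

def frame_prediction_anomaly_score(predicted: np.ndarray, observed: np.ndarray) -> float:
    """Score one frame by how badly the predicted future frame matches reality.

    predicted, observed: H x W x C uint8 frames. Normal frames are assumed to
    evolve smoothly, so a large prediction error suggests an anomaly.
    """
    diff = predicted.astype(np.float64) - observed.astype(np.float64)
    mse = np.mean(diff ** 2)
    # PSNR is high for well-predicted (normal) frames and low for anomalies.
    psnr = 10.0 * np.log10(255.0 ** 2 / (mse + 1e-12))
    return -psnr  # negate so that larger values mean "more anomalous"
```

Per-frame scores of this kind are typically normalized over each test video before being fused with the other cues (pose and optical flow).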
Finally, the effectiveness of each component is verified one by one in the experiments, showing that each has a positive influence on the model's decisions and cooperates well with the prior framework in identifying object anomalies. The final results show that, on both datasets, the proposed method outperforms the existing state-of-the-art methods.


The practical application of deep learning has been greatly influenced by the availability of large and diverse data. In reality, however, not all datasets contain a sufficient number of samples. For instance, datasets of lung X-ray images, rare diseases, abnormal events in real life, and defects in optical inspection often suffer from a lack of data. Thus, more and more anomaly detection methods have been proposed to address this problem, mainly by learning the normal patterns of the data and detecting abnormal events with that limited knowledge.
To address this challenge, self-supervised learning is combined with several anomaly detection approaches, including future frame prediction, data augmentation, pose estimation, and motion consistency. The experiments in this thesis use two public anomaly detection datasets, CUHK Avenue and ShanghaiTech Campus. ShanghaiTech Campus is considered more challenging than CUHK Avenue because its training and testing data contain several different scenes. We propose a self-supervised way to extract a simple human pose estimate, which can be used to detect abnormal behavior. In addition, analyzing the optical flow generated between the real frame and the predicted frame helps to characterize the normal patterns of the training data, which is also found to be useful for detecting abnormal activities. As examined in the experiments, the proposed method works well with the existing framework and outperforms the current state-of-the-art methods on both datasets.
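The optical-flow branch mentioned above can be pictured as fitting a density model to flow-based features collected from normal training frames and flagging low-likelihood samples at test time. The minimal sketch below uses scikit-learn's GaussianMixture for that purpose (the thesis ablates GMM parameters in Section 4.3.1.3); the feature definition, number of components, and function names here are illustrative assumptions rather than the thesis's exact implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_normal_flow_model(train_flow_feats: np.ndarray,
                          n_components: int = 5) -> GaussianMixture:
    """Fit a GMM to optical-flow features gathered from normal training data.

    train_flow_feats: array of shape (num_samples, feat_dim), e.g. per-object
    statistics of the flow between a real frame and its predicted counterpart.
    """
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=0)
    gmm.fit(train_flow_feats)
    return gmm

def flow_anomaly_score(gmm: GaussianMixture, test_flow_feats: np.ndarray) -> np.ndarray:
    """Low likelihood under the 'normal' mixture maps to a high anomaly score."""
    return -gmm.score_samples(test_flow_feats)
```

At test time this flow score would be fused with the prediction and pose scores to produce the final frame-level anomaly score.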

Chinese Abstract I
English Abstract II
Acknowledgements III
Table of Contents IV
List of Figures VII
List of Tables IX
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Thesis Organization 2
Chapter 2 Literature Review 3
2.1 Overview of Deep Learning 3
2.1.1 Weight Update Methods for Deep Learning Models 3
2.1.2 Fully Connected Neural Networks 4
2.1.3 Convolutional Neural Networks 5
2.2 Pre-trained Models 8
2.2.1 Mask R-CNN [3] 8
2.2.2 FlowNet 2.0 [4] 10
2.3 Anomaly Detection 11
2.3.1 Accurate-Interpretable-VAD (AI-VAD) [8] 12
2.3.2 AlphaPose [9] 15
2.3.3 Contrastive Language-Image Pre-Training (CLIP) [10] 17
2.3.4 Anomaly Detection for General Images and Videos 19
2.3.4.1 Anomaly Segmentation Network (AnoSeg) [30] 19
2.3.4.2 Anomaly Detection in Aerial Videos with Transformers [31] 21
2.3.5 Anomaly Detection Based on Pre-trained Models 25
2.3.5.1 Solving Decoupled Spatio-Temporal Jigsaw Puzzles [28] 25
2.3.5.2 Variational Abnormal Behavior Detection (VABD) [38] 28
Chapter 3 Video Anomaly Detection with Object-Based Pose Estimation and Optical Flow Analysis 30
3.1 Architecture Flowchart 30
3.2 Dataset Composition and Extraction 31
3.3 Frame Labeling and Anomalous Frame Generation 35
3.4 Model Training 36
3.5 Anomaly Score Extraction in the Testing Stage 37
Chapter 4 Experimental Results of Video Anomaly Detection 40
4.1 Training and Testing Environment 40
4.2 Testing Procedure and Evaluation Metrics 40
4.3 Experimental Results and Analysis 41
4.3.1 Ablation Studies 41
4.3.1.1 Contribution of Different Anomaly Scores and the Number of Input Frames 41
4.3.1.2 Anomaly Score Evaluation Function for Pose Detection 42
4.3.1.3 GMM Parameters for Optical Flow Analysis 43
4.3.1.4 Results without AlphaPose [9] 44
4.3.1.5 Fewer Input Frames (Single-Frame Input) 45
4.3.1.6 Results without Data Augmentation (No Anomaly Generation) 46
4.3.1.7 Effect of Hyperparameter Values on the Evaluation Metrics 46
4.3.1.8 Performance on Unseen Data 47
4.3.1.9 Comparison with Prior Work 48
4.3.2 Anomaly Score Plots for Each ShanghaiTech Scene 49
Chapter 5 Conclusion and Future Work 62
References 63
Appendix 1 70

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in neural information processing systems, vol. 25, pp. 1097-1105, 2012.
[2] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in IEEE conference on computer vision and pattern recognition, pp. 248-255, 2009.
[3] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” in Proceedings of the IEEE international conference on computer vision, pp. 2961-2969, 2017.
[4] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, “Flownet 2.0: Evolution of optical flow estimation with deep networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2462–2470, 2017.
[5] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778, 2016.
[6] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv preprint arXiv:1409.1556, 2014.
[7] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” Advances in neural information processing systems, vol. 28, pp. 91-99, 2015.
[8] T. Reiss and Y. Hoshen, “Attribute based representations for accurate and interpretable video anomaly detection,” arXiv preprint arXiv:2212.00789, 2022.
[9] H. S. Fang, J. Li, H. Tang, C. Xu, H. Zhu, Y. Xiu, Y. L. Li, and C. Lu, “Alphapose: Whole-body regional multi-person pose estimation and tracking in real-time,” IEEE Trans. Pattern Anal. Mach. Intell., 2022.
[10] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al., “Learning transferable visual models from natural language supervision,” in International Conference on Machine Learning, pp. 8748–8763, 2021.
[11] D. Gong, L. Liu, V. Le, B. Saha, M. R. Mansour, S. Venkatesh, and A. V. D. Hengel, “Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1705–1714, 2019.
[12] R. T. Ionescu, F. S. Khan, M. I. Georgescu, and L. Shao, “Object-centric auto-encoders and dummy anomalies for abnormal event detection in video,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7842–7851, 2019.
[13] S. Lee, H. G. Kim, and Y. M. Ro, “Bman: Bidirectional multi-scale aggregation networks for abnormal event detection,” IEEE Transactions on Image Processing, vol. 29, pp. 2395–2408, 2019.
[14] T. N. Nguyen and J. Meunier, “Anomaly detection in video sequence with appearance-motion correspondence,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 1273–1283, 2019.
[15] H. Park, J. Noh, and B. Ham, “Learning memory-guided normality for anomaly detection,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14360–14369, 2020.
[16] Z. Wang, Y. Zou, and Z. Zhang, “Cluster attention contrast for video anomaly detection,” in Proceedings of the 28th ACM International Conference on Multimedia, pp. 2463-2471, 2020.
[17] C. Sun, Y. Jia, Y. Hu, and Y. Wu, “Scene-aware context reasoning for unsupervised abnormal event detection in videos,” in Proceedings of the 28th ACM International Conference on Multimedia, pp. 184-192, 2020.
[18] G. Yu, S. Wang, Z. Cai, E. Zhu, C. Xu, J. Yin, and M. Kloft, “Cloze test helps: Effective video anomaly detection via learning to complete video events,” in Proceedings of the 28th ACM International Conference on Multimedia, pp. 583–591, 2020.
[19] Y. Chang, Z. Tu, W. Xie, and J. Yuan, “Clustering driven deep autoencoder for video anomaly detection,” in European Conference on Computer Vision, pp. 329–345, 2020.
[20] R. Cai, H. Zhang, W. Liu, S. Gao, and Z. Hao, “Appearance-motion memory consistency network for video anomaly detection,” in Proceedings of the AAAI conference on artificial intelligence, pp. 2374-3468, 2021.
[21] M. I. Georgescu, A. Barbalau, R. T. Ionescu, F. S. Khan, M. Popescu, and M. Shah, “Anomaly detection in video via self-supervised and multi-task learning,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12737–12747, 2021.
[22] H. Lv, C. Chen, Z. Cui, C. Xu, Y. Li, and J. Yang, “Learning normal dynamics in videos with meta prototype network,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15425–15434, June 2021.
[23] Z. Liu, Y. Nie, C. Long, Q. Zhang, and G. Li, “A hybrid video anomaly detection framework via memory-augmented flow reconstruction and flow-guided frame prediction,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13588–13597, October 2021.
[24] X. Feng, D. Song, Y. Chen, Z. Chen, J. Ni, and H. Chen, “Convolutional transformer based dual discriminator generative adversarial networks for video anomaly detection,” in Proceedings of the 29th ACM International Conference on Multimedia, pp. 5546-5554, 2021.
[25] M. I. Georgescu, R. T. Ionescu, F. S. Khan, M. Popescu, and M. Shah, “A background-agnostic framework with adversarial training for abnormal event detection in video,” IEEE transactions on pattern analysis and machine intelligence, vol. 44, pp. 4505–4523, 2021.
[26] N. C. Ristea, N. Madan, R. T. Ionescu, K. Nasrollahi, F. S. Khan, T. B. Moeslund, and M. Shah, “Self-supervised predictive convolutional attentive block for anomaly detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13576–13586, 2022.
[27] Z. Yang, P. Wu, J. Liu, and X. Liu, “Dynamic local aggregation network with adaptive clusterer for anomaly detection,” In European Conference on Computer Vision, pp. 404–421, 2022.
[28] G. Wang, Y. Wang, J. Qin, D. Zhang, X. Bao, and D. Huang, “Video anomaly detection by solving decoupled spatio-temporal jigsaw puzzles,” In European Conference on Computer Vision , pp. 494-511, 2022.
[29] T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in European conference on computer vision, pp. 740-755, 2014.
[30] J. Song, K. Kong, Y. I. Park, S. G. Kim, and S. J. Kang, “AnoSeg: Anomaly segmentation network using self-supervised learning,” arXiv preprint arXiv:2110.03396, 2021.
[31] P. Jin, L. Mou, G. S. Xia, and X. X. Zhu, “Anomaly detection in aerial videos with transformers,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1-13, 2022.
[32] J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, “Stacked convolutional auto-encoders for hierarchical feature extraction,” in Proc. Int. Conf. Artif. Neural Netw., pp. 52–59, June 2011.
[33] D. T. Nguyen, Z. Lou, M. Klar, and T. Brox, “Anomaly detection with multiple-hypotheses predictions,” in Proc. Int. Conf. Mach. Learn., pp. 4800–4809, 2019.
[34] X. Wang, Y. Du, S. Lin, P. Cui, Y. Shen, and Y. Yang, “AdVAE: A self-adversarial variational autoencoder with Gaussian anomaly prior knowledge for anomaly detection,” Knowl.-Based Syst., vol. 190, Art. no. 105187, Feb. 2020.
[35] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, “GANomaly: Semi-supervised anomaly detection via adversarial training,” in Proc. Asian Conf. Comput. Vis., pp. 622–637, 2018.
[36] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, “Skip-GANomaly: Skip connected and adversarially trained encoder–decoder anomaly detection,” in Proc. Int. Joint Conf. Neural Netw., pp. 1–8, Jul. 2019.
[37] M. Salehi, N. Sadjadi, S. Baselizadeh, M. H. Rohban, and H. R. Rabiee, “Multiresolution knowledge distillation for anomaly detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 14902–14912, Jun. 2021.
[38] J. Li, Q. Huang, Y. Du, X. Zhen, S. Chen, and L. Shao, “Variational Abnormal Behavior Detection With Motion Consistency,” IEEE Trans. Image Process., vol. 31, pp. 275-286, 2022.
[39] G. Farneback, “Two-Frame Motion Estimation Based on Polynomial Expansion,” In 13th Scandinavian Conference on Image Analysis, pp. 363–370, 2003.
[40] W. Liu, H. Chang, B. Ma, S. Shan, X. Chen, “Diversity-Measurable Anomaly Detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12147-12156, 2023.
[41] X. Huang, C. Zhao and Z. Wu, “A Video Anomaly Detection Framework Based on Appearance-Motion Semantics Representation Consistency,” IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1-5, 2023.
[42] W. Liu, W. Luo, D. Lian and S. Gao, “Future Frame Prediction for Anomaly Detection - A New Baseline,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6536-6545, 2018.
[43] X. Wang, Z. Che, B. Jiang, N. Xiao, K. Yang, J. Tang, J. Ye, J. Wang, Q. Qi, “Robust Unsupervised Video Anomaly Detection by Multipath Frame Prediction,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 6, pp. 2301-2312, June 2022.
[44] D. Chen, P. Wang, L. Yue, Y. Zhang, and T. Jia, “Anomaly detection in surveillance video based on bidirectional prediction,” Image Vis. Comput., vol. 98, no. 1, pp. 1-8, Jun. 2020.
[45] Y. Zhao, B. Deng, C. Shen, Y. Liu, H. Lu, and X. S. Hua, “Spatio-temporal autoencoder for video anomaly detection,” in Proceedings of the 25th ACM international conference on Multimedia, pp. 1933-1941, 2017.
[46] R. Morais, V. Le, T. Tran, B. Saha, M. Mansour and S. Venkatesh, “Learning Regularity in Skeleton Trajectories for Anomaly Detection in Videos,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11988-11996, 2019.
[47] M. Ye, X. Peng, W. Gan, W. Wu, and Y. Qiao, “AnoPCN: Video anomaly detection via deep predictive coding network,” in Proceedings of the 27th ACM International Conference on Multimedia, pp. 1805-1813, 2019.
[48] Y. Tang, L. Zhao, S. Zhang, C. Gong, G. Li, J. Yang, “Integrating prediction and reconstruction for anomaly detection,” Pattern Recognition Letters, vol. 129, pp. 123–130, 2020.
[49] C. Lu, J. Shi and J. Jia, “Abnormal Event Detection at 150 FPS in MATLAB,” 2013 IEEE International Conference on Computer Vision, pp. 2720-2727, 2013.
[50] P. Wu, J. Liu and F. Shen, “A deep one-class neural network for anomalous event detection in complex scenes,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 7, pp. 2609-2622, July 2020.
[51] R. T. Ionescu, F. S. Khan, M. -I. Georgescu and L. Shao, “Object-centric auto-encoders and dummy anomalies for abnormal event detection in video,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7834-7843, 2019.
[52] J. Yi and S. Yoon, “Patch svdd: Patch-level svdd for anomaly detection and segmentation,” in Proceedings of the Asian Conference on Computer Vision, 2020.
[53] C. L. Li, K. Sohn, J. Yoon, and T. Pfister, “Cutpaste: Self-supervised learning for anomaly detection and localization,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9664–9674, 2021.

Full text release date: 2025/08/23 (campus network)
Full text release date: 2025/08/23 (off-campus network)
Full text release date: 2025/08/23 (National Central Library: Taiwan NDLTD system)