
Graduate Student: 鄭凱文 (Kai-Wen Cheng)
Thesis Title: 影像異常行為的偵測與定位之研究 (The Study of Video Anomaly Detection and Localization)
Advisor: 陳郁堂 (Yie-Tarng Chen)
Committee Members: 李根國 (Gen-Guo Li), 張智星 (Jyh-Shing Roger Jang), 闕志克 (Tzi-cker Chiueh), 蘇順豐 (Shun-Feng Su), 方文賢 (Wen-Hsien Fang), 林銘波 (Ming-Bo Lin)
Degree: Doctor (博士)
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of Publication: 2016
Academic Year of Graduation: 105
Language: English
Number of Pages: 161
Keywords (Chinese): 影像監視, 異常偵測, 定位矯正
Keywords (English): video surveillance, anomaly detection, localization refinement

This dissertation presents a study of video anomaly detection and localization methods, built on three components: hierarchical feature representations, kernel-based statistical models, and tree-based search algorithms. While most existing research focuses on identifying whether a single event exhibits unusual appearance or behavior (which we call a local anomaly), this dissertation also addresses whether the interactions among multiple events are abnormal, even when each individual event may be normal (which we call a global anomaly). To detect both local and global anomalies simultaneously, we first use a hierarchical feature structure to describe the events occurring in a video. We then learn a statistical model of the probability distribution of normal events from a training set that contains no anomalies. Finally, based on this statistical model, we propose a tree-structured inference algorithm that simultaneously detects and localizes abnormal events in test videos.

Following this framework, this dissertation investigates and improves three aspects: hierarchical features, statistical models, and inference algorithms.
We propose two hierarchical feature representations: 1) the bag-of-words (BOW) histogram and 2) the ensemble of local interest points; two kernel-based statistical models: 1) the one-class support vector machine (SVM) and 2) Gaussian process regression (GPR); and two tree-based search algorithms: 1) single-path search and 2) multiple-path search.
Experimental results show that the proposed methods successfully detect and localize a variety of video anomalies on five different benchmark datasets, outperforming other recently proposed methods in both accuracy and efficiency.

Furthermore, the proposed framework extends to the object recognition domain, where it improves many convolutional neural network (CNN) based methods. We propose a tree-based search algorithm, used as a post-processing step, that iteratively refines the detection results of these methods to achieve more precise localization and higher accuracy. Experiments show that the algorithm successfully detects and localizes objects of various categories in the PASCAL VOC 2007, PASCAL VOC 2012, and YouTube-Objects datasets, substantially improving existing methods.


This dissertation presents a unified framework for video anomaly detection and localization built on hierarchical feature representations, kernel-based statistical models, and tree-based search algorithms. While most research on this topic has focused on detecting local anomalies, i.e., video events with unusual appearances or motions, we are more interested in global anomalies, in which multiple video events interact in an unusual manner even though each individual event may be normal. To detect local and global anomalies simultaneously, we first introduce a hierarchical feature structure for video event representation. A statistical model is then built to characterize the normal events in an anomaly-free training set, and a tree-based inference algorithm is developed on top of it to detect and localize abnormal events in previously unseen test videos.
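As an illustration of this train-on-normal-only pipeline, the sketch below fits a one-class SVM as the statistical model on synthetic vectors standing in for the hierarchical event features. The data, feature dimension, and parameters are invented for the example and are not taken from the dissertation.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Toy stand-in for the event features described above (hypothetical data):
# each row is a feature vector for one video event.
rng = np.random.default_rng(0)
normal_train = rng.normal(loc=0.0, scale=1.0, size=(200, 16))

# Learn a model of normality from anomaly-free training data only.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(normal_train)

# At test time, events far from the learned support are flagged as anomalous.
test_events = np.vstack([
    rng.normal(0.0, 1.0, size=(5, 16)),   # normal-like events
    rng.normal(6.0, 1.0, size=(5, 16)),   # clear outliers
])
labels = model.predict(test_events)        # +1 = normal, -1 = anomaly
```

The key property, as in the dissertation's setting, is that only normal data is needed for training; anomalies are anything the model refuses to explain.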

Within this framework, we progressively enrich the feature structures, statistical models, and inference algorithms, with each method improving on its predecessor. In this dissertation, we investigate two hierarchical feature representations: 1) the bag-of-words (BOW) histogram and 2) the ensemble of nearby spatio-temporal interest points (STIP); two kernel-based statistical models: 1) the one-class support vector machine (SVM) and 2) Gaussian process regression (GPR); and two inference algorithms: 1) single-instance path search and 2) multiple-instance path search (MiPS). Simulations on five popular benchmarks show that the proposed methods significantly outperform the main state-of-the-art methods while requiring less computation time.
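The GPR variant can be sketched the same way: learn the predictive distribution of normal activity, then flag observations that deviate strongly from it. The one-dimensional toy signal and the 3-sigma decision rule below are assumptions for illustration, not the dissertation's actual features or threshold scheme.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical setup: x is a spatial cell index and y the activity level
# observed there in normal traffic; GPR learns the normal pattern per location.
rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 10.0, 50).reshape(-1, 1)
y_train = np.sin(x_train).ravel() + 0.05 * rng.normal(size=50)

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
gpr.fit(x_train, y_train)

# Score a test observation by its deviation from the predictive distribution;
# a large normalized deviation marks a global-anomaly candidate.
mean, std = gpr.predict(np.array([[5.0]]), return_std=True)
observed = 3.0                             # unusually high activity at x = 5
z = abs(observed - mean[0]) / std[0]       # deviation in predictive std units
is_anomaly = z > 3.0
```

Because GPR returns a full predictive variance, the anomaly decision adapts automatically to how confident the model is at each location.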

We also demonstrate that this framework can be applied to improve many convolutional neural network (CNN) based object recognition methods. This is achieved by an iterative localization refinement (ILR) algorithm, a post-processing scheme that iteratively refines detection results so that they match the ground truth as closely as possible. Simulations show that the proposed method improves the main state-of-the-art works on the large-scale PASCAL VOC 2007, PASCAL VOC 2012, and YouTube-Objects datasets.
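A minimal sketch of the refinement idea, assuming a greedy hill-climbing variant: perturb each box coordinate and keep any move that raises a scoring function. The dissertation's actual ILR algorithm and the CNN confidence score are not reproduced here; the `target` box, the step size, and the IoU-based score are hypothetical stand-ins.

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def refine(box, score, steps=(-4, 4), max_iter=50):
    """Greedy refinement: nudge one coordinate at a time, keeping every
    move that strictly increases the score, until no move helps."""
    box = list(box)
    for _ in range(max_iter):
        improved = False
        for i in range(4):
            for d in steps:
                cand = box.copy()
                cand[i] += d
                if cand[2] > cand[0] and cand[3] > cand[1] and score(cand) > score(box):
                    box, improved = cand, True
        if not improved:
            break
    return tuple(box)

# Hypothetical scoring function: in practice this would be the CNN detector's
# confidence; here overlap with a hidden target box serves for illustration.
target = (30, 30, 80, 90)
initial = (20, 40, 70, 80)
refined = refine(initial, lambda b: iou(b, target))
```

Used as post-processing, such a loop only moves boxes toward higher detector confidence, which is why it can be layered on top of many existing CNN detectors without retraining them.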

Chinese Abstract  iii
Abstract  iv
Acknowledgment  v
Table of Contents  vi
List of Tables  xi
List of Figures  xiii
1 Introduction  1
1.1 Background  1
1.1.1 Poor Video Quality  2
1.1.2 Camera Viewpoint Distortion  2
1.1.3 Sudden Illumination Change  3
1.1.4 Partial Occlusion  3
1.1.5 Variation in Normal Events  3
1.1.6 Diversity of Abnormal Events  4
1.1.7 Inaccurate Localization  4
1.2 Motivations  5
1.3 Contributions  7
1.3.1 Summary of Chapter 2  8
1.3.2 Summary of Chapter 3  9
1.3.3 Summary of Chapter 4  10
1.3.4 Summary of Chapter 5  11
1.3.5 Summary of Chapter 6  12
1.4 Related Work  14
1.4.1 Video Anomaly Detection  14
1.4.2 Video Anomaly Localization  16
1.4.3 Object Recognition  17
1.5 Road Map  19
2 Video Anomaly Detection Using One-Class Support Vector Machine and Bayesian Model  20
2.1 Introduction  20
2.2 Anomaly Detection  22
2.2.1 Feature Extraction  22
2.2.2 One-Class Support Vector Machine  22
2.2.3 Anomaly Discriminancy  23
2.3 Experimental Results  25
2.3.1 Performance Metrics  25
2.3.2 Parameters in Experiments  27
2.3.3 Comparisons with State-of-the-Art Methods  27
2.4 Summary  30
3 Video Anomaly Detection Using Hierarchical Feature Representation and Gaussian Process Regression  32
3.1 Introduction  32
3.2 Hierarchical Feature Representation  34
3.2.1 Low Level: Multi-Scale Event Representation  35
3.2.2 High Level: Ensemble of STIP Features  36
3.3 High-Level Codebook Construction  37
3.3.1 Semantic and Structural Similarities  37
3.3.2 Bottom-Up Greedy Clustering  38
3.4 GPR-Based Global Anomaly Detection  39
3.4.1 GPR Model Learning  39
3.4.2 GPR Model Inference  40
3.4.3 Global Anomaly Detection  42
3.5 Experimental Results  44
3.5.1 Evaluation Protocol and Experiment Setup  45
3.5.2 Evaluation of the Proposed Approach  46
3.5.3 Comparisons with Previous Works  56
3.6 Summary  58
4 Video Anomaly Localization Using Efficient Maximum Path Search  60
4.1 Introduction  60
4.2 Anomaly Localization  62
4.2.1 Region of Interest Extraction  63
4.2.2 Video Path Search  66
4.3 Experimental Results  68
4.3.1 Evaluation Protocol and Experiment Setup  69
4.3.2 Performance Evaluation  70
4.3.3 Search Time Analysis  74
4.4 Summary  75
5 Video Anomaly Localization Using Efficient Multiple-Instance Path Search  77
5.1 Introduction  77
5.2 Multiple-Instance Path Search Algorithm  80
5.2.1 Region of Interest Extraction  80
5.2.2 Video Path Search  85
5.2.3 Adaptive Density-Based Threshold Selection  88
5.3 Online Multiple-Instance Search Algorithm  89
5.4 Simulations and Discussions  90
5.4.1 Evaluation Protocol and Experiment Setup  91
5.4.2 Assessment of the Proposed ROI Extraction  94
5.4.3 Assessment of the Proposed Threshold Selection Scheme  95
5.4.4 Performance Evaluation  96
5.4.5 Search Speed Analysis  100
5.5 Summary  101
6 Object Detection and Localization Using Convolutional Neural Networks and Iterative Localization Refinement  102
6.1 Introduction  102
6.2 The Proposed Method  103
6.2.1 Overall Methodology  103
6.2.2 Iterative Localization Refinement  105
6.2.3 The Divide-and-Conquer Paradigm  107
6.2.4 Fast Approximate Approach  111
6.2.5 Convergence Analysis  111
6.3 Experimental Results  112
6.3.1 Experiment Setup and Evaluation Protocol  113
6.3.2 Performance Evaluation  115
6.3.3 Comparisons with State-of-the-Art Methods  118
6.3.4 Computation Time Analysis  121
6.4 Summary  122
7 Conclusion  124
References  127


Full text available from 2021/11/14 (campus network)
Full text not authorized for public release (off-campus network)
Full text not authorized for public release (National Central Library: Taiwan thesis and dissertation system)