Field | Value
---|---
Author | 林郁達 Yu-Da Lin
Title | Online Video Synopsis: Object Detection and Management Based on Deep Learning and Minimum Collision Trajectory (即時視訊濃縮系統:最小碰撞軌跡圖與深度學習之物件偵測與管理技術)
Advisor | 郭景明 Jing-Ming Guo
Committee | 丁建均 Jian-Jiun Ding, 花凱隆 Kai-Lung Hua, 楊家輝 Jia-hui Yang
Degree | Master
Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year | 2017
Academic Year | 105 (ROC calendar)
Language | Chinese
Pages | 150
Keywords | video synopsis, video condensation, object detection, video surveillance
Video synopsis is a technique for rapidly browsing long raw surveillance videos. It extracts all moving objects and shifts them in time or space to remove redundant information, while preserving each object's activity and behavior from the source video. However, conventional methods are computationally intensive and time-consuming, and their results often exhibit a blinking effect. To overcome these problems, we propose a trajectory-based, high-efficiency video synopsis system that requires neither an object-tracking algorithm nor energy-optimization-based tube rearrangement. This thesis first presents a fast spatio-temporal trajectory-based object clustering method that keeps tubes continuous and avoids the unnatural blinking artifacts of typical video synopsis systems. Tube rearrangement based on a Minimum Collision Trajectory graph in the spatio-temporal domain is then proposed to determine the best temporal position of each tube in the synopsis video. Moreover, we integrate a convolutional neural network (CNN)-based object detector with the object tubes, enabling users to quickly locate a specific object. Finally, extensive experiments validate the robustness of the proposed system, which can efficiently generate a condensed video free of blinking artifacts.
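The tube-rearrangement idea described above can be illustrated with a minimal sketch. This is a hypothetical greedy scheduler, not the thesis's actual Minimum Collision Trajectory algorithm: it represents each object tube as a list of per-frame bounding boxes and assigns each tube the start frame that minimizes its total bounding-box overlap with tubes already placed in the synopsis timeline. The function names (`iou_area`, `collision_cost`, `rearrange`) and the box format `(x1, y1, x2, y2)` are assumptions for illustration.

```python
# Hypothetical sketch of collision-minimizing tube rearrangement.
# A "tube" is a list of per-frame bounding boxes (x1, y1, x2, y2).

def iou_area(a, b):
    """Overlap area of two axis-aligned boxes (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def collision_cost(tube, start, placed):
    """Total overlap if `tube` starts at frame `start`, given the
    boxes already scheduled per frame in `placed`."""
    cost = 0
    for t, box in enumerate(tube):
        for other in placed.get(start + t, []):
            cost += iou_area(box, other)
    return cost

def rearrange(tubes, synopsis_len):
    """Greedily assign each tube the start frame with the least
    collision against previously placed tubes."""
    placed = {}   # frame index -> boxes already scheduled there
    starts = []
    for tube in tubes:
        candidates = range(max(1, synopsis_len - len(tube) + 1))
        best = min(candidates,
                   key=lambda s: collision_cost(tube, s, placed))
        starts.append(best)
        for t, box in enumerate(tube):
            placed.setdefault(best + t, []).append(box)
    return starts
```

A greedy pass like this runs online (one decision per arriving tube), which matches the system's goal of avoiding global energy optimization; two tubes that would overlap in space are simply pushed apart in time.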