
Author: 林郁達 (Yu-Da Lin)
Title: 即時視訊濃縮系統:最小碰撞軌跡圖與深度學習之物件偵測與管理技術
(Online Video Synopsis: Object Detection and Management Based on Deep Learning and Minimum Collision Trajectory)
Advisor: 郭景明 (Jing-Ming Guo)
Committee: 丁建均 (Jian-Jiun Ding), 花凱隆 (Kai-Lung Hua), 楊家輝 (Jia-Hui Yang)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Publication year: 2017
Graduation academic year: 105 (ROC calendar)
Language: Chinese
Pages: 150
Keywords (Chinese): 視訊濃縮、視訊摘要、物件偵測、視訊監控
Keywords (English): video synopsis, video condensation, object detection, video surveillance

Abstract (Chinese, translated): Video synopsis is a technique for rapidly browsing video, well suited to condensing lengthy raw surveillance footage. It extracts all moving objects and shifts them in time or space to remove redundant information while preserving each object's activity and behavior from the source video. However, previous video synopsis techniques suffer from high computational complexity and long processing times, and their results exhibit a blinking artifact. To overcome these problems, we propose an efficient trajectory-based video synopsis system that requires neither an object-tracking algorithm nor an energy-optimization-based tube rearrangement algorithm. This thesis proposes several algorithms. First, a fast object grouping method based on spatial-temporal trajectories eliminates the unnatural artifacts common in video synopsis systems. Next, a minimum-collision trajectory map in the spatial-temporal domain is used to rearrange object tubes, determining the optimal appearance time of each tube in the synopsis video. In addition, we integrate the video synopsis system with object detection based on convolutional neural networks, so that users can quickly locate specific objects. Finally, extensive experiments demonstrate the robustness of the proposed system and its ability to efficiently generate synopsis videos free of the blinking artifact.


Abstract (English): Video synopsis is a feasible solution for expediting the browsing of raw surveillance data and for performing various video analytics tasks. The technique produces a condensed video with reduced spatial and temporal redundancy, without losing the actual content of the source video. However, conventional methods are computationally intensive and time-consuming, and they produce a blinking artifact in the resultant video. To overcome these problems, we propose a trajectory-based video synopsis system that achieves high performance without object tracking or energy optimization for tube rearrangement. In contrast to existing methods, a spatial-temporal trajectory-based object tube extraction algorithm keeps tubes continuous to avoid the blinking artifact. Tube rearrangement based on a minimum collision trajectory in the spatial-temporal domain is proposed to decide the best temporal position of each tube in the synopsis video. Moreover, we integrate a convolutional neural network (CNN) based object detection system with the object tubes, enabling users to quickly locate a specific object. Finally, the proposed system efficiently generates a condensed video without the blinking artifact, and its robustness is validated with extensive experiments.
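The tube-rearrangement idea in the abstract can be illustrated with a minimal sketch: each extracted object tube is assigned a start frame in the synopsis video so that its bounding boxes collide as little as possible with already-placed tubes. This is a hypothetical greedy first-fit illustration under simplifying assumptions (axis-aligned boxes, overlap area as the collision cost), not the thesis's actual minimum-collision-trajectory-map algorithm; `Tube`, `collision_cost`, and `rearrange` are names invented for this example.

```python
# Hedged sketch of greedy minimum-collision tube placement.
# Assumption: a tube is a per-frame list of (x, y, w, h) bounding boxes,
# and collision cost is the total box-intersection area with placed tubes.
from dataclasses import dataclass

@dataclass
class Tube:
    boxes: list  # boxes[t] is the (x, y, w, h) box at relative frame t

def overlap_area(a, b):
    """Intersection area of two axis-aligned (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    return ix * iy

def collision_cost(tube, start, placed):
    """Total overlap of `tube` (shifted to `start`) with already-placed tubes."""
    cost = 0
    for other, other_start in placed:
        for t, box in enumerate(tube.boxes):
            ot = start + t - other_start  # frame index inside the other tube
            if 0 <= ot < len(other.boxes):
                cost += overlap_area(box, other.boxes[ot])
    return cost

def rearrange(tubes, synopsis_len):
    """Greedily assign each tube the start frame with minimum collision cost."""
    placed = []
    for tube in tubes:
        last = max(0, synopsis_len - len(tube.boxes))
        best = min(range(last + 1),
                   key=lambda s: collision_cost(tube, s, placed))
        placed.append((tube, best))
    return [start for _, start in placed]
```

For example, two tubes that follow the same five-frame path in a ten-frame synopsis end up at non-overlapping start frames 0 and 5. A real system would also trade collision cost against synopsis length and chronological order, which this sketch omits.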

Table of Contents

Chinese Abstract
Abstract
Acknowledgements
Table of Contents
List of Figures and Tables
Chapter 1  Introduction
  1.1  Background and Motivation
  1.2  System Flow
  1.3  Thesis Organization
Chapter 2  Related Work
  2.1  Preface
  2.2  Video Synopsis Techniques
    2.2.1  Temporal Shifting
    2.2.2  Spatial-Temporal Shifting
  2.3  Background Subtraction Techniques
    2.3.2  Visual Background Extractor (ViBe)
  2.4  Challenges in Video Synopsis
Chapter 3  Video Synopsis Technique
  3.1  System Overview
  3.2  Video Analysis Layer
    3.2.1  Foreground Object Extraction
    3.2.2  Connected-Component and Two-Pass Labeling
    3.2.3  Multi-background Generation
    3.2.4  Real-Time Object Detection and Categorization Based on Convolutional Neural Networks
    3.2.5  Preparatory Data Structure Construction for Foreground Objects
  3.3  Object Tube Extraction Layer
    3.3.1  Drawbacks of Object-Tracking Methods
    3.3.2  Fast Object Grouping Based on Spatial-Temporal Trajectories
  3.4  Video Synopsis Layer
    3.4.1  Tube Rearrangement and Synopsis Video Length
    3.4.2  Spatial-Temporal Overlap and Temporal Overlap
    3.4.3  Tube Rearrangement Based on Minimum Collision Trajectory in the Spatial-Temporal Domain
    3.4.4  Adjustable Synopsis Video Length
    3.4.5  Transparentizing Overlapped Objects
  3.5  Object Tube Categorization Layer
    3.5.1  Object Tube Size
    3.5.2  Object Tube Movement Direction
    3.5.3  Object Tube Color
    3.5.4  Object Tube Category
Chapter 4  Experimental Results
  4.1  Test Environment and Sample Videos
  4.2  Evaluation Metrics
  4.3  Video Synopsis Results
    4.3.1  Comparison by Evaluation Metrics
    4.3.2  Comparison of Actual Synopsis Results
  4.4  System Features
Chapter 5  Conclusions and Future Work
References

