Graduate Student: 張幸卿 Hsing-Ching Chang
Thesis Title: 基於內容的視訊摘要及收縮 (Content Based Video Resizing in Temporal and Spatial Domain)
Advisor: 楊傳凱 Chuan-Kai Yang
Committee Members: 林惠勇 Huei-Yung Lin, 莊永裕 Yung-Yu Chuang, 陳祝嵩 Chu-Song Chen, 鍾國亮 Kuo-Liang Chung
Degree: Doctor (博士)
Department: College of Management, Department of Information Management
Year of Publication: 2009
Academic Year of Graduation: 97 (ROC calendar)
Language: English
Number of Pages: 110
Chinese Keywords: seam carving, dynamic programming, video carving, video resizing, video summarization, two-stage dynamic programming
English Keywords: Video Carving, Video Resizing, Video Summarization, Two Stage Dynamic Programming Algorithm
Chinese Abstract (translated):
The rapid advance of modern technology has made video recording devices ubiquitous, and with them has come a flood of video data. Only a small fraction of this accumulated footage is worth preserving, however, so extracting the essential information from massive video collections is an urgent problem. This thesis proposes a seam-based greedy algorithm that effectively summarizes the important content of a video. Our aim is to condense a video into a compact version carrying as much information as possible, without any preprocessing.
We model a video as a 3D volume with one temporal and two spatial dimensions, and shrink it by successively removing "unimportant" 2D seam surfaces. Besides the usual connectivity and monotonicity constraints on seam-surface extraction, we propose an additional locality constraint, which suppresses some of the artifacts that video condensation would otherwise introduce.
Because a greedy algorithm pursues only locally optimal solutions, it sidesteps the two problems that graph-cut methods face: memory consumption and computational cost. We also propose using the Sobel operator to improve the quality of the condensed video.
We further generalize the algorithm to resize a video (that is, to change its width and height) so that it fits playback devices of different screen sizes.
Finally, we compare our results with a modified two-stage dynamic programming algorithm. Although the seam surfaces found by our algorithm have a higher cost than those found by the two-stage algorithm, our results are of better visual quality, which further demonstrates the effectiveness of our method.
English Abstract:
With the rapid adoption of video capture devices and the resulting growth of video data, the effective and efficient extraction of essential content from this vast amount of data has never been so imperative.
In this thesis we present a new seam-based approach to video summarization driven by a greedy algorithm. Our goal is to compact the video as densely as possible into an informative video volume without any prior knowledge (such as saliency maps).
In our method, we successively remove non-activity seam surfaces to achieve this goal. In addition to the two standard constraints of connectivity and monotonicity, we introduce a third constraint, locality, which helps suppress the visual artifacts introduced by video summarization.
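The three constraints can be illustrated with a small validity check on a per-frame seam, represented as one column index per row. This is an illustrative sketch, not the thesis's implementation; the function name and the `locality` window size are assumptions made for the example.

```python
def seam_is_valid(seam_cols, prev_seam_cols, locality=1):
    """Check a per-frame vertical seam (one column per row) against the
    constraints named in the text. Monotonicity (exactly one pixel per row)
    is implied by the list shape; connectivity and locality are tested
    explicitly. The `locality` window size is an assumed parameter."""
    # Connectivity: seam columns in adjacent rows differ by at most one.
    connected = all(abs(a - b) <= 1
                    for a, b in zip(seam_cols, seam_cols[1:]))
    # Locality: the seam may not drift more than `locality` columns from
    # the corresponding seam in the neighboring frame.
    local = all(abs(a - b) <= locality
                for a, b in zip(seam_cols, prev_seam_cols))
    return connected and local
```

A seam that jumps two columns between rows violates connectivity, and a seam far from its neighbor-frame counterpart violates locality, even if each is internally well-formed.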
Because the greedy strategy always seeks a local minimum, it avoids the memory and computation limitations faced by other approaches, such as those based on graph cuts.
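As a minimal sketch of the greedy step in the 2D image case (the per-removal local optimum of seam carving; the thesis's 3D seam surfaces generalize this to the video volume), the minimum-cost vertical seam of one energy map can be found with dynamic programming and then deleted. Function names are illustrative.

```python
def min_vertical_seam(energy):
    """Dynamic-programming search for the minimum-cost vertical seam in a
    2D energy map. Connectivity: the seam pixel in row r must lie within
    one column of the seam pixel in row r-1."""
    rows, cols = len(energy), len(energy[0])
    cost = [energy[0][:]]                      # cumulative-cost table
    for r in range(1, rows):
        row = []
        for c in range(cols):
            lo, hi = max(0, c - 1), min(cols, c + 2)
            row.append(energy[r][c] + min(cost[r - 1][lo:hi]))
        cost.append(row)
    # Backtrack from the cheapest cell in the last row.
    seam = [min(range(cols), key=lambda c: cost[-1][c])]
    for r in range(rows - 2, -1, -1):
        c = seam[-1]
        lo, hi = max(0, c - 1), min(cols, c + 2)
        seam.append(min(range(lo, hi), key=lambda k: cost[r][k]))
    seam.reverse()
    return seam

def remove_seam(frame, seam):
    """Delete one pixel per row, shrinking the frame's width by one."""
    return [row[:c] + row[c + 1:] for row, c in zip(frame, seam)]
```

Repeating this pair of steps, always taking the currently cheapest seam, is exactly the greedy local-minimum strategy: each removal is optimal for the current map, with no guarantee about the global sequence.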
We also use the Sobel operator to improve the quality of the output. In addition, we extend our technique to the spatial domain for the purpose of video retargeting.
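A minimal sketch of using the Sobel operator as the energy function on one grayscale frame, written in pure Python for clarity (a real implementation would use a vectorized convolution). The clamp-to-edge border handling and the |Gx| + |Gy| magnitude approximation are assumptions of this example.

```python
# Standard Sobel kernels for horizontal and vertical derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_energy(gray):
    """Gradient-magnitude energy map |Gx| + |Gy| for one grayscale frame.
    Border pixels are handled by clamping coordinates to the frame."""
    rows, cols = len(gray), len(gray[0])

    def px(r, c):  # clamp-to-edge border handling
        return gray[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    energy = []
    for r in range(rows):
        row = []
        for c in range(cols):
            gx = sum(SOBEL_X[i][j] * px(r + i - 1, c + j - 1)
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * px(r + i - 1, c + j - 1)
                     for i in range(3) for j in range(3))
            row.append(abs(gx) + abs(gy))
        energy.append(row)
    return energy
```

High energy marks strong edges, which a seam search should avoid crossing; flat regions get near-zero energy and become the preferred places to remove.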
Finally, we modify the two-stage dynamic programming algorithm to search for seams in 3D and compare it against our method. We demonstrate that although our method is not guaranteed to reach a globally minimal solution, our results are of better quality and comparatively free of visual artifacts.