
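Author: Hsing-Ching Chang (張幸卿)
Thesis title: Content Based Video Resizing in Temporal and Spatial Domain (基於內容的視訊摘要及收縮)
Advisor: Chuan-Kai Yang (楊傳凱)
Oral examination committee: Huei-Yung Lin (林惠勇), Yung-Yu Chuang (莊永裕), Chu-Song Chen (陳祝嵩), Kuo-Liang Chung (鍾國亮)
Degree: Doctor
Department: School of Management, Department of Information Management
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: English
Number of pages: 110
Keywords (translated from Chinese): seam carving, dynamic programming, video carving, video resizing, video summarization, two-stage dynamic programming
Keywords (English): Video Carving, Video Resizing, Video Summarization, Two Stage Dynamic Programming Algorithm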
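  • Rapid advances in modern technology have made video recording equipment ubiquitous, but they have also produced a flood of video data. Only a small fraction of this accumulated footage is worth keeping, so extracting the important information from huge volumes of video is an urgent problem. This thesis proposes a seam-based greedy algorithm that effectively summarizes the important information in video data. The aim is to condense a video into a compact version with maximal information content, without requiring any preprocessing.
    We treat a video as a 3D cube with both temporal and spatial dimensions. We shrink the video by successively removing "unimportant 2D surfaces" (seam surfaces). In addition to the connectivity and monotonicity constraints that such surfaces must satisfy, we propose adding a locality constraint, which suppresses some of the artifacts caused by shrinking the video.
    Because a greedy algorithm pursues only locally optimal solutions, it avoids two problems that arise with graph-cut methods: memory requirements and computational efficiency. We also propose using the Sobel operator to improve the quality of the shrunken video.
    We also extend the algorithm to change the dimensions of a video (its height and width) to suit playback devices of different sizes.
    Finally, we compare our results with a modified "two stage dynamic programming algorithm". Although the seam surfaces found by our algorithm have a higher cost than those found by the two stage dynamic programming algorithm, our results are of better quality, which further demonstrates the effectiveness of our method.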


    With the rapid adoption of video capture devices and the resulting growth in video data, the need to extract essential video content effectively and efficiently from this vast amount of data has never been more pressing.
    In this thesis we present a new seam-based approach to video summarization that applies a greedy algorithm. Our goal is to compact the video as densely as possible and obtain an informative video volume without any prior knowledge (such as saliency maps).
    In our method, we successively remove seam surfaces that carry little or no activity. In addition to the two constraints of connectivity and monotonicity, we introduce a third constraint, locality, which can reduce the visual artifacts introduced by video summarization.
    Because this greedy strategy only ever looks for a local minimum, it avoids the memory and computation limits faced by other approaches, such as those based on graph cuts.
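To make the greedy idea concrete, here is a minimal sketch for a single 2D energy map. It is an illustration only, not the thesis's algorithm: the function names `greedy_seam` and `remove_seam`, the choice of seed (cheapest pixel in the top row), and the toy energy values are all assumptions, and the locality constraint and the 3D seam-surface case are not modelled.

```python
def greedy_seam(energy):
    """Return one column index per row, forming a top-to-bottom seam.

    Hypothetical helper: the seed rule and step rule are assumptions
    made for this sketch, not taken from the thesis.
    """
    rows, cols = len(energy), len(energy[0])
    # Seed: the lowest-energy pixel in the first row.
    col = min(range(cols), key=lambda c: energy[0][c])
    seam = [col]
    for r in range(1, rows):
        # Connectivity: only the (up to) three neighbours below are candidates.
        candidates = [c for c in (col - 1, col, col + 1) if 0 <= c < cols]
        # Greedy step: take the locally cheapest candidate, no look-ahead.
        col = min(candidates, key=lambda c: energy[r][c])
        # Monotonicity: exactly one pixel is chosen in each row.
        seam.append(col)
    return seam


def remove_seam(image, seam):
    """Delete the seam pixel from every row, narrowing the image by one column."""
    return [row[:c] + row[c + 1:] for row, c in zip(image, seam)]


energy = [
    [3, 1, 4],
    [2, 9, 1],
    [5, 0, 7],
]
seam = greedy_seam(energy)
print(seam)                       # [1, 2, 1]
print(remove_seam(energy, seam))  # [[3, 4], [2, 9], [5, 7]]
```

Each step inspects only a constant number of neighbours and keeps no cost table, which is the kind of memory and time saving the paragraph above attributes to a greedy search.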
    We also use the Sobel operator to improve the quality of the output. In addition, we extend our technique to the spatial domain for the purpose of video retargeting.
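As a rough illustration of a Sobel-based energy, the sketch below computes |Gx| + |Gy| for a small grayscale frame in plain Python. The function name `sobel_energy`, the use of the absolute-value sum as the energy, and the replicated-border handling are assumptions for this example; the record does not say how the thesis combines the Sobel responses or extends them to the temporal axis.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_energy(gray):
    """Return |Gx| + |Gy| per pixel for a 2D list of intensities.

    Hypothetical helper: border pixels are handled by replicating the
    nearest edge value, which is an assumption of this sketch.
    """
    rows, cols = len(gray), len(gray[0])

    def pixel(r, c):
        # Clamp coordinates so out-of-range reads repeat the border pixel.
        return gray[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    energy = []
    for r in range(rows):
        row = []
        for c in range(cols):
            gx = sum(SOBEL_X[i][j] * pixel(r + i - 1, c + j - 1)
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * pixel(r + i - 1, c + j - 1)
                     for i in range(3) for j in range(3))
            row.append(abs(gx) + abs(gy))
        energy.append(row)
    return energy


# A vertical step edge: energy is high next to the edge and zero elsewhere.
frame = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
print(sobel_energy(frame)[1])  # [0, 40, 40, 0]
```

Pixels with low energy lie in flat regions, so a seam that follows them removes content a viewer is less likely to miss.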
    We also modify the “two stage dynamic programming algorithm” so that it can search for seam surfaces in a 3D video volume. We demonstrate that although our method is not designed to reach a global minimum, its results are of better quality and comparatively free of visual artifacts.
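For contrast with a greedy search, the sketch below shows the textbook dynamic-programming search for a minimum-cost vertical seam in a single 2D energy map. It is not the thesis's modified two stage dynamic programming algorithm, which works on a video volume; the function name `dp_seam` and the toy energy values are assumptions made for illustration.

```python
def dp_seam(energy):
    """Return (seam, cost) for the minimum-cost top-to-bottom seam.

    Hypothetical helper implementing the classic single-image recurrence:
    cost[r][c] = energy[r][c] + min(cost[r-1][c-1 .. c+1]).
    """
    rows, cols = len(energy), len(energy[0])
    cost = [list(energy[0])]
    for r in range(1, rows):
        prev = cost[-1]
        cost.append([
            energy[r][c] + min(prev[max(c - 1, 0):min(c + 2, cols)])
            for c in range(cols)
        ])
    # Backtrack from the cheapest pixel in the last row.
    col = min(range(cols), key=lambda c: cost[-1][c])
    total = cost[-1][col]
    seam = [col]
    for r in range(rows - 1, 0, -1):
        candidates = [c for c in (col - 1, col, col + 1) if 0 <= c < cols]
        col = min(candidates, key=lambda c: cost[r - 1][c])
        seam.append(col)
    seam.reverse()
    return seam, total


energy = [
    [1, 2, 9],
    [9, 9, 1],
    [9, 9, 1],
]
print(dp_seam(energy))  # ([1, 2, 2], 4)
```

On this input, a greedy walk that starts at the cheapest top-row pixel (column 0) and always steps to the cheapest neighbour below would pay 1 + 9 + 9 = 19, while the dynamic-programming seam costs 4. This shows only that a global search can reach a lower seam cost than a greedy one; the abstract's point is that lower cost did not translate into better visual quality in the author's experiments.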

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
      1.1. Background and Motivation
      1.2. Contributions
      1.3. Organization
    Chapter 2. Related Works
      2.1. Temporal Resizing
        2.1.1. Frame-Based Approaches
        2.1.2. Object-Based Approaches
        2.1.3. Seam-Removal Approaches
        2.1.4. Other Approaches
      2.2. Spatial Resizing (Retargeting)
        2.2.1. Scaling and Cropping Approaches
        2.2.2. Image Warping Approaches
        2.2.3. Seam Carving Approaches
        2.2.4. Multi-Operator Approaches
      2.3. Two Stage Dynamic Programming
    Chapter 3. The Difficulties Faced in Seam Carving for a Video
      3.1. Optimal Seam
      3.2. Desired Algorithm
    Chapter 4. Energy
      4.1. 2D Gradient
      4.2. 3D Gradient
    Chapter 5. Temporal Resizing
      5.1. Seam Carving Using the Greedy Method for 2D Images
        5.1.1. Finding the Seed of Seam
        5.1.2. Growing the Seam from a Seed
        5.1.3. Finding Min Seam
        5.1.4. Results
      5.2. Seam Carving Using Greedy Method for 3D Video
        5.2.1. Finding the Seed (Start Point) of Seam Surface
        5.2.2. Growing the Surface from a Seam
        5.2.3. Finding Min Seam Surface
    Chapter 6. Spatial Resizing
      6.1. Preparing Energy
      6.2. Growing Seam Surface
    Chapter 7. Seam Carving Using Two-Stage Dynamic Programming
      7.1. Accumulating the Cost in the Vertical Direction for Each Slice
      7.2. Obtaining the Seam to Develop a Seam Surface
    Chapter 8. Results and Discussion
      8.1. Temporal Resizing
        8.1.1. Results
        8.1.2. Comparisons
          8.1.2.1. Comparisons with Fast Forward Approach
          8.1.2.2. Comparisons with Synopsis
          8.1.2.3. Comparisons with Video Carving
        8.1.3. Limitations
      8.2. Spatial Resizing
        8.2.1. Results and Comparisons
        8.2.2. Limitation
      8.3. Two-Stage Dynamic Programming
        8.3.1. Results and Comparisons
    Chapter 9. Conclusions and Future Works
      9.1. Conclusions
      9.2. Future Works
        9.2.1. Shaking Correction
        9.2.2. Moving Scene
        9.2.3. Multi-Operator Video Resizing
        9.2.4. Taking the Audio into Account
    References
    About the Author
    List of Publications

