Graduate student: | 吳美莼 Mei-chun Wu |
---|---|
Thesis title: | 基於內容的影片大小調整系統 Content-Aware Video Resizing |
Advisors: | 鮑興國 Hsing-Kuo Kenneth Pao, 楊傳凱 Chuan-kai Yang |
Oral defense committee: | 項天瑞 Tien-Ruey Hsiang |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2008 |
Academic year of graduation: | 96 (ROC calendar) |
Language: | Chinese |
Number of pages: | 113 |
Keywords (Chinese): | 影片摘要, 切縫, 圖形分割演算法, 動態執行 |
Keywords (English): | video summarization, seam carving, graph-cut algorithm, dynamic programming |
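Modern recording devices such as mobile phones, digital cameras, PDAs, notebook computers, and DV camcorders are readily available, and their capture quality has improved considerably, which has led to a flood of video data. Video abstraction, or video summarization, focuses on how to extract useful information quickly and effectively from large amounts of unprocessed video, and it has therefore become an important research topic in recent years. Although many methods have been developed, each still has its own limitations and drawbacks.
Inspired by the paper "Seam Carving for Content-Aware Image Resizing" by Avidan et al., Chen et al. devised a method for video carving: within the video volume formed by a video, the 2D sheet with minimal cost is repeatedly found and removed, thereby shortening the video's duration. However, Chen et al.'s method still leaves considerable room for improvement.
First, the extracted 2D sheet may be discontinuous: two spatially adjacent pixels may lie on different frames that are not temporally adjacent, which can fragment the video. Second, the graph-cut algorithm incurs a long computation time. Finally, the method requires a relatively large amount of memory, so the input video must be split into several segments, each sized to fit the available memory, before it can be processed; worse, this segmentation may lead to less satisfactory results.
The main contribution of this thesis is a new method that handles the problems mentioned above within the length of a single shot. First, by modifying the dynamic programming approach of Avidan et al.'s seam carving paper, we can extract continuous 2D sheets and prevent fragmentation. Second, unlike the graph-cut algorithm used by Chen et al., our method is simpler and more efficient, and performs similar tasks about two orders of magnitude (roughly 100 times) faster. Finally, memory consumption is also greatly reduced, to about one tenth.
To make our study more complete, we also propose methods for lengthening a video and for changing the frame size of a video, and we compare the results with existing methods.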
The rapid advance of modern technology has significantly improved video capture, in both quality and performance, and has led to a flood of video data. Video abstraction, or video summarization, which focuses on how to extract useful information quickly and efficiently from huge amounts of raw video data, has therefore become an important research topic in recent years. Numerous approaches have been developed, but each has its drawbacks or limitations.
Inspired by the work of Avidan et al. on seam carving for content-aware image resizing, Chen et al. generalized the idea to perform video carving: they repeatedly find and remove the 2D sheet with the minimal cost from the video volume, thereby shortening the video in time. However, several issues remain in Chen et al.'s approach, which leaves room for further improvement.
First, the extracted 2D sheet may not be smooth; that is, two spatially adjacent pixels of the sheet may lie on frames that are not temporally adjacent, which can produce fragmented results. Second, the graph-cut algorithm employed entails a lengthy computation time. Finally, the required memory is relatively large; as a result, the input video has to be broken into subsets so that each subset can be loaded into memory in its entirety and processed. Worse yet, such a partitioning scheme may lead to sub-optimal results.
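To make the notions of a 2D sheet and its smoothness concrete, here is a minimal Python sketch. It is an illustration only, not the implementation of this thesis or of Chen et al. It assumes a video volume stored as nested lists `volume[t][y][x]` and a sheet stored as `sheet[y][x]`, the frame index removed at each pixel. The function names `remove_sheet` and `is_smooth` are hypothetical, and "smooth" is taken here to mean that neighbouring pixels of the sheet differ by at most one frame, which is an assumption about the intended definition.

```python
def remove_sheet(volume, sheet):
    """Return a copy of volume[t][y][x] with one sample removed per pixel.

    sheet[y][x] is the frame index dropped at pixel (y, x), so the result
    has exactly one frame fewer than the input.
    """
    frames = len(volume)
    height, width = len(sheet), len(sheet[0])
    shorter = [[[None] * width for _ in range(height)]
               for _ in range(frames - 1)]
    for y in range(height):
        for x in range(width):
            cut = sheet[y][x]
            # Keep every temporal sample of this pixel except the cut one.
            kept = [volume[t][y][x] for t in range(frames) if t != cut]
            for t, value in enumerate(kept):
                shorter[t][y][x] = value
    return shorter


def is_smooth(sheet):
    """True if all horizontally and vertically adjacent pixels of the sheet
    lie on the same frame or on temporally adjacent frames (assumed
    definition of smoothness)."""
    height, width = len(sheet), len(sheet[0])
    for y in range(height):
        for x in range(width):
            if x + 1 < width and abs(sheet[y][x] - sheet[y][x + 1]) > 1:
                return False
            if y + 1 < height and abs(sheet[y][x] - sheet[y + 1][x]) > 1:
                return False
    return True
```

Under this assumed definition, a sheet that jumps by two or more frames between neighbouring pixels is exactly the kind that the paragraph above describes as producing fragmented results.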
The main contribution of this paper is a new approach that addresses all the mentioned issues in one shot. First, by modifying the dynamic programming approach originally adopted in Avidan et al.'s seam carving work, we can extract smooth 2D sheets and thus avoid a fragmented appearance. Second, unlike Chen et al.'s graph-cut algorithm, our approach is much simpler and more efficient, and performs similar tasks about two orders of magnitude faster. Finally, the memory consumption is also reduced by about one order of magnitude.
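For readers unfamiliar with the dynamic programming step in seam carving, the following Python sketch shows the classic minimum-cost seam search on a 2D cost map, in the spirit of Avidan et al.'s image algorithm. It is not the modified procedure proposed in this thesis, whose details are not given in the abstract. The function name `min_cost_seam` and the cost values in the test are hypothetical.

```python
def min_cost_seam(cost):
    """Find a top-to-bottom seam of minimal total cost in a 2D grid.

    cost[y][x] is the cost of removing pixel (y, x). The seam picks one
    column per row, and consecutive rows may shift by at most one column.
    Returns (total_cost, seam) with seam[y] the column chosen in row y.
    """
    height, width = len(cost), len(cost[0])
    # total[y][x] = cheapest cost of any seam ending at (y, x).
    total = [row[:] for row in cost]
    for y in range(1, height):
        for x in range(width):
            lo, hi = max(0, x - 1), min(width - 1, x + 1)
            total[y][x] += min(total[y - 1][lo:hi + 1])

    # Start from the cheapest end point and walk back up row by row.
    x = min(range(width), key=lambda i: total[-1][i])
    best = total[-1][x]
    seam = [x]
    for y in range(height - 1, 0, -1):
        lo, hi = max(0, x - 1), min(width - 1, x + 1)
        x = min(range(lo, hi + 1), key=lambda i: total[y - 1][i])
        seam.append(x)
    seam.reverse()
    return best, seam
```

The table fill visits each cell once and looks at three predecessors, so the search is linear in the number of pixels. This is one reason a dynamic programming formulation can be much cheaper than a general graph-cut solver, although the speed-up reported above refers to the thesis's own method and measurements.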
To make our work more complete, we also propose methods to extend the length of a video and to resize a video in the spatial domain. Results are shown and, where applicable, compared with existing approaches to demonstrate the effectiveness of the proposed approach.
[1] AVIDAN S., SHAMIR A.: Seam Carving for Content-Aware Image Resizing. In SIGGRAPH (2007).
[2] BOYKOV Y., JOLLY M. P.: Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images. In ICCV (2001).
[3] BOYKOV Y., KOLMOGOROV V.: An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision. IEEE Transactions on Pattern Analysis and Machine Intelligence (2004), pp. 1124–1137.
[4] BENNETT E. P., MCMILLAN L.: Computational Time-Lapse Video. In SIGGRAPH (2007).
[5] CHEN B., SEN P.: Video Carving. In Eurographics (2008).
[6] DIVAKARAN A., PEKER K. A., RADHAKRISHNAN R., XIONG Z., CABASSON R.: Video Summarization using MPEG-7 Motion Activity and Audio Descriptors. Tech. Rep. TR2003-34, MERL – Mitsubishi Electric Research Laboratories (2003).
[7] FARIN D., EFFELSBERG W., DE WITH P. H. N.: Robust Clustering-Based Video Summarization with Integration of Domain-Knowledge. In ICME (2002), pp. 89–92.
[8] HUA X., LU L., ZHANG H.: AVE: Automated Home Video Editing. In ACM Multimedia (2003), pp. 490–497.
[9] TAO C., JIA J., SUN H.: Active Window Oriented Dynamic Video Retargeting. In ICCV (2007).
[10] KIM C., HWANG J.: An Integrated Scheme for Object-Based Video Abstraction. In ACM Multimedia (2000), pp. 303–311.
[11] KANG H., MATSUSHITA Y., TANG X., CHEN X.: Space-Time Video Montage. In CVPR (2006), pp. 1331–1338.
[12] LIU F., GLEICHER M.: Video Retargeting: Automating Pan and Scan. In ACM Multimedia (2006), pp. 241–250.
[13] MA Y., HUA X., LU L., ZHANG H.: A Generic Framework of User Attention Model and Its Application in Video Summarization. IEEE Transactions on Multimedia (2005), pp. 907–919.
[14] NAM J., TEWFIK A. H.: Video Abstract of Video. In The 3rd IEEE Workshop on Multimedia Signal Processing (1999), pp. 117–122.
[15] NGO C. W., ZHANG H. J., CHIN R. T., PONG T. C.: Motion-Based Video Representation for Scene Change Detection. In ICPR (2000).
[16] PAN C., CHUANG Y., HSU W. H.: NTU TRECVID-2007 Fast Rushes Summarization System. In Proceedings of the International Workshop on TRECVID Video Summarization (2007), pp. 74–78.
[17] PAL C., JOJIC N.: Interactive Montages of Sprites for Indexing and Summarizing Security Video. In CVPR (2005), pp. 1192–1192.
[18] PETROVIC N., JOJIC N., HUANG T. S.: Adaptive Video Fast Forward. Multimedia Tools and Applications (2005), pp. 327–344.
[19] PRITCH Y., RAV-ACHA A., PELEG S.: Non-Chronological Video Synopsis and Indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence (2008).
[20] SMITH M., KANADE T.: Video Skimming and Characterization through the Combination of Image and Language Understanding Techniques. In CVPR (1997), pp. 775–781.
[21] WANG J., REINDERS M., LAGENDIJK R., LINDENBERG J., KANKANHALLI M.: Video Content Representation on Tiny Devices. In ICME (2004), vol. 3, pp. 1711–1714.
[22] SUKMARG O., RAO K.: Fast Object Detection and Segmentation in MPEG Compressed Domain. In IEEE TENCON (2000).
[23] SCHÖDL A., SZELISKI R., SALESIN D. H., ESSA I.: Video Textures. In SIGGRAPH (2000), pp. 489–498.
[24] WOLF L., GUTTMANN M., COHEN-OR D.: Non-Homogeneous Content-Driven Video-Retargeting. In ICCV (2007), pp. 1–6.
[25] ZHU Z., WU X., FAN J., AND W. G. AREF, A. K. E.: Exploring Video Content Structure for Hierarchical Summarization. Multimedia Systems (2004), pp. 98–115.
[26] AGARWALA A., COLIN ZHENG K., PAL C., AGRAWALA M., COHEN M., CURLESS B., SALESIN D., SZELISKI R.: Panoramic Video Textures. In SIGGRAPH (2005).
[27] HERMANS C., VANAKEN C., MERTENS T., VAN REETH F., BEKAERT P., CURLESS B., SALESIN D., SZELISKI R.: Augmented Panoramic Video. In Eurographics (2008).
[28] KWATRA V., SCHÖDL A., ESSA I., TURK G., BOBICK A.: Graphcut Textures: Image and Video Synthesis Using Graph Cuts. In SIGGRAPH (2003).
[29] LIU F., GLEICHER M.: Automatic Image Retargeting with Fisheye-View Warping. In UIST (2005).
[30] SETLUR V., TAKAGI S., RASKAR R., GLEICHER M., GOOCH B.: Automatic Image Retargeting. In Proc. Sym. on Mobile and Ubiquitous Multimedia (2005).
[31] RUBINSTEIN M., SHAMIR A., AVIDAN S.: Improved Seam Carving for Video Retargeting. In SIGGRAPH (2008).
[32] FAN X., XIE X., ZHOU H.-Q., MA W.-Y.: Looking into video frames on small displays. In ACM Multimedia (2003), pp. 247–250.
[33] WANG J., XU Y., SHUM H.-Y., COHEN M. F.: Video tooning. In ACM Trans. Graph. (2004), pp. 574–583.
[34] IRANI M., ANANDAN P., BERGEN J., KUMAR R., HSU S.: Efficient representations of video sequences and their applications. In Image Communication (1996), 8(4), pp. 327–351.
[35] BAR-JOSEPH Z., EL-YANIV R., LISCHINSKI D., WERMAN M.: Texture mixing and texture movie synthesis using statistical learning. In IEEE Transactions on Visualization and Computer Graphics (2001), pp. 120–135.
[36] WEI L.-Y., LEVOY M.: Fast texture synthesis using tree-structured vector quantization. In SIGGRAPH (2000), pp. 479–488. ISBN 1-58113-208-5.
[37] RAV-ACHA A., PRITCH Y., LISCHINSKI D., PELEG S.: Dynamosaicing: Mosaicing of dynamic scenes. In PAMI (2007), pp. 1789–1801.
[38] SUH B., LING H., BEDERSON B. B., JACOBS D. W.: Automatic thumbnail cropping and its effectiveness. In User Interface Software and Technology (2003), pp. 95–104.
[39] Yuzhu L., Liangshou W.: 3D Segmentation & Visualization of MRI brain images. Digital Image Processing (2005).