| Graduate Student: | 劉汶珊 Wen-shan Liu |
|---|---|
| Thesis Title: | 一個基於人機合作機制的影片摘要系統 A Video Summarization System with Human-machine Cooperation |
| Advisor: | 范欽雄 Chin-shyurng Fahn |
| Committee Members: | 傅楸善 Chiou-shann Fuh, 駱榮欽 Rong-chin Lo, 吳怡樂 Yi-leh Wu |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of Publication: | 2014 |
| Graduation Academic Year: | 102 |
| Language: | English |
| Pages: | 58 |
| Keywords (Chinese): | 視頻摘要、視覺總結評價、關鍵畫面選取、特徵描述、視頻分割、群眾外包、人腦運算 |
| Keywords (English): | Video summarization, visual summary evaluation, keyframe extraction, frame content description, video segmentation, crowdsourcing, human computation |
The multimedia industry and the Internet have flourished in recent years, and the video-recording functions of cameras, mobile phones, and tablets have become ever cheaper. Moreover, the Internet allows us to upload and share footage, including on video-sharing websites such as YouTube, whose traffic and video counts are probably beyond estimation. It is therefore increasingly important for visitors to save time by grasping the highlights of a video.

We study video summarization, in which a summary is created by selecting key segments from the original video. Selecting the keyframes is crucial; however, if we choose them using only frame features such as color and shape, the results often fall short of people's expectations. Furthermore, different people are interested in different parts, so we hope to extract, by voting, the segments that best match what the audience finds interesting.

To achieve this goal, this thesis proposes a semi-automatic video summarization system: we first segment the different scenes in the video using visual features of the frames, such as color and edges, and then apply the idea of crowdsourcing together with the continuous knapsack problem to select key segments and compose the video summary.
The multimedia industry and the Internet have prospered in recent years, and the cost of digital cameras, mobile phones, and tablets with video-recording functions, as well as DSLRs, keeps falling. In addition, the Internet enables us to access a very large amount of video data from local and remote storage, including video-sharing websites such as YouTube; the number of videos on the web is probably difficult to estimate. It is therefore important that viewers can quickly grasp the main content of a video and then decide whether they really want to watch the entire video.
Video summarization is a research topic that selects a few important scenes from the original video to help people understand the story of the whole video in a very short time. Because a video summary is built only from the key segments of the video, selecting those key segments is the major problem. However, if we use only low-level keyframe features such as colors and shapes, the resulting summary is still far from people's expectations. In addition, not everyone is interested in the same video segments, because tastes differ; so we need to select the key segments that most people want.
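The idea of selecting the segments "that most people want" can be made concrete by tallying viewer votes per shot. The sketch below is illustrative only; the per-viewer vote format and the function name are assumptions, not the thesis's actual implementation.

```python
from collections import Counter

def aggregate_votes(votes):
    """Tally crowd votes: each viewer submits the ids of the shots
    they find interesting, and shots are ranked by total vote count.

    votes: list of per-viewer lists of shot ids (hypothetical format).
    Returns shot ids ordered from most to least popular.
    """
    tally = Counter(shot for viewer in votes for shot in viewer)
    return [shot for shot, _ in tally.most_common()]

# Three viewers vote on shots 0-4; shot 2 is the crowd favorite.
ranking = aggregate_votes([[2, 0], [2, 4], [2, 0, 3]])
# ranking starts with shot 2 (3 votes), then shot 0 (2 votes).
```

Ranking by raw vote count is the simplest aggregation rule; a deployed system might instead weight votes or normalize by shot length.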
Toward this goal, we use two visual features, color and edge direction, to select keyframes for shot change detection. We then let viewers vote for the keyframes they want. Finally, we decide the number of frames to select from each shot by means of a continuous knapsack algorithm.
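For the continuous (fractional) knapsack step, a greedy pass over shots sorted by value density is optimal. The following is a minimal sketch under assumed inputs: the `(shot_id, value, length)` tuples and the `allocate_frames` name are hypothetical, with shot "value" standing in for whatever score the viewer votes produce.

```python
def allocate_frames(shots, budget):
    """Fractional-knapsack sketch: spend a total frame budget on shots,
    favoring shots with the highest value per frame.

    shots: list of (shot_id, value, length_in_frames) -- hypothetical format.
    budget: total number of frames allowed in the summary.
    Returns {shot_id: frames_taken}; the last shot picked may be taken
    only partially, which is what makes the knapsack 'continuous'.
    """
    allocation = {}
    # Greedy by value density (value / length) is optimal for the
    # fractional knapsack, unlike the 0/1 variant.
    for shot_id, value, length in sorted(
            shots, key=lambda s: s[1] / s[2], reverse=True):
        if budget <= 0:
            break
        take = min(length, budget)
        allocation[shot_id] = take
        budget -= take
    return allocation

# A 300-frame budget over three shots: the densest shot is taken whole,
# the next is cut short when the budget runs out, the last is skipped.
plan = allocate_frames([("A", 90, 120), ("B", 40, 200), ("C", 10, 100)], 300)
```

Here shot A (density 0.75) is taken in full, shot B (density 0.2) contributes only the 180 frames that remain, and shot C never enters the summary.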