| Graduate Student: | 劉汶珊 Wen-shan Liu |
|---|---|
| Thesis Title: | 一個基於人機合作機制的影片摘要系統 A Video Summarization System with Human-machine Cooperation |
| Advisor: | 范欽雄 Chin-shyurng Fahn |
| Committee Members: | 傅楸善 Chiou-shann Fuh, 駱榮欽 Rong-chin Lo, 吳怡樂 Yi-leh Wu |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of Publication: | 2014 |
| Graduation Academic Year: | 102 |
| Language: | English |
| Pages: | 58 |
| Keywords (Chinese): | 視頻摘要、視覺總結評價、關鍵畫面選取、特徵描述、視頻分割、群眾外包、人腦運算 |
| Keywords (English): | Video summarization, visual summary evaluation, keyframe extraction, frame content description, video segmentation, crowdsourcing, human computation |
The multimedia industry and the Internet have flourished in recent years, and the video-recording functions of cameras, mobile phones, and tablets have become ever cheaper. Moreover, the Internet allows us to upload and share footage, including on video-sharing websites such as YouTube, whose traffic and video counts are probably beyond estimation. It is therefore increasingly important for visitors to save time by grasping the highlights of a video.

We study video summarization, in which a summary is created by selecting key segments from the original video. Selecting the keyframes is crucial; however, if we choose them using only frame features such as color and shape, the results often fall short of people's expectations. Furthermore, different people are interested in different parts, so we hope to extract, by voting, the segments that best match what the audience finds interesting.

To achieve this goal, this thesis proposes a semi-automatic video summarization system: we first segment the different scenes in the video using visual features of the frames, such as color and edges, and then apply the idea of crowdsourcing together with the continuous knapsack problem to select key segments and compose the video summary.
The multimedia industry and the Internet have prospered in recent years, and the cost of digital cameras, mobile phones, and tablets with video-recording functions, as well as DSLRs, keeps falling. In addition, the Internet enables us to access a very large amount of video data from local and remote storage, including video-sharing websites such as YouTube; the number of videos on the web is probably difficult to estimate. It is therefore important that viewers can quickly grasp the main content of a video and then decide whether they really want to watch the entire video.
Video summarization is a research topic that selects a few important scenes from the original video to help people understand the story of the whole video in a very short time. Because a video summary is built only from the key segments of the video, selecting those key segments is the major problem. However, if we use only low-level keyframe features such as colors and shapes, the resulting summary is still far from people's expectations. In addition, not everyone is interested in the same video segments, because tastes differ; so we need to select the key segments that most people want.
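The idea of selecting the segments "that most people want" can be made concrete by tallying viewer votes per shot. The sketch below is illustrative only; the per-viewer vote format and the function name are assumptions, not the thesis's actual implementation.

```python
from collections import Counter

def aggregate_votes(votes):
    """Tally crowd votes: each viewer submits the ids of the shots
    they find interesting, and shots are ranked by total vote count.

    votes: list of per-viewer lists of shot ids (hypothetical format).
    Returns shot ids ordered from most to least popular.
    """
    tally = Counter(shot for viewer in votes for shot in viewer)
    return [shot for shot, _ in tally.most_common()]

# Three viewers vote on shots 0-4; shot 2 is the crowd favorite.
ranking = aggregate_votes([[2, 0], [2, 4], [2, 0, 3]])
# ranking starts with shot 2 (3 votes), then shot 0 (2 votes).
```

Ranking by raw vote count is the simplest aggregation rule; a deployed system might instead weight votes or normalize by shot length.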
Toward this goal, we use two visual features, color and edge direction, to select keyframes for shot change detection. We then let viewers vote for the keyframes they want. Finally, we decide the number of frames to select from each shot by means of a continuous knapsack algorithm.
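For the continuous (fractional) knapsack step, a greedy pass over shots sorted by value density is optimal. The following is a minimal sketch under assumed inputs: the `(shot_id, value, length)` tuples and the `allocate_frames` name are hypothetical, with shot "value" standing in for whatever score the viewer votes produce.

```python
def allocate_frames(shots, budget):
    """Fractional-knapsack sketch: spend a total frame budget on shots,
    favoring shots with the highest value per frame.

    shots: list of (shot_id, value, length_in_frames) -- hypothetical format.
    budget: total number of frames allowed in the summary.
    Returns {shot_id: frames_taken}; the last shot picked may be taken
    only partially, which is what makes the knapsack 'continuous'.
    """
    allocation = {}
    # Greedy by value density (value / length) is optimal for the
    # fractional knapsack, unlike the 0/1 variant.
    for shot_id, value, length in sorted(
            shots, key=lambda s: s[1] / s[2], reverse=True):
        if budget <= 0:
            break
        take = min(length, budget)
        allocation[shot_id] = take
        budget -= take
    return allocation

# A 300-frame budget over three shots: the densest shot is taken whole,
# the next is cut short when the budget runs out, the last is skipped.
plan = allocate_frames([("A", 90, 120), ("B", 40, 200), ("C", 10, 100)], 300)
```

Here shot A (density 0.75) is taken in full, shot B (density 0.2) contributes only the 180 frames that remain, and shot C never enters the summary.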