| Field | Value |
|---|---|
| Graduate Student | 廖傑明 Chieh-Ming Liaw |
| Thesis Title | 藉由聊天訊息偵測串流直播影片之精彩片段 (Live-Streaming Video Highlight Detection Using Chat Messages) |
| Advisor | 戴碧如 Bi-Ru Dai |
| Committee Members | 戴志華 Chih-Hua Tai, 帥宏翰 Hong-Han Shuai, 戴碧如 Bi-Ru Dai, 陳怡伶 Yi-Ling Chen |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Publication Year | 2020 |
| Academic Year | 108 |
| Language | English |
| Pages | 41 |
| Keywords (Chinese) | 實況直播, 影片精彩片段偵測, 注意力模型 |
| Keywords (English) | live stream, video highlight detection, attention model |
Abstract (translated from the Chinese): As technology continues to advance, data transmission has become ever faster, and many new services have emerged as a result. Live streaming, one of the most popular of these services in recent years, has become part of many people's daily lives. Unlike ordinary television programs, a live stream allows real-time interaction with the audience, who can join the discussion in each channel's chat room as the broadcast happens. At the same time, live-streamed content is long and loosely structured: whereas ordinary programs are short and tightly edited, live-stream videos often run for several hours or more, and despite the popularity of the service, such unedited videos struggle to attract viewers from outside the community. Moreover, the enormous volume of data produced by hours-long videos imposes severe hardware constraints on existing methods that automatically detect and extract highlights from visual content. In this thesis, we propose a long short-term attention architecture (LSTA) based on chat messages. Experimental results show that our chat-message-based design is a more reliable approach to highlight detection.
In recent years, live-streaming services have been booming on the Internet and continue to grow.
Unlike TV shows and movies, live streams can be much longer and more variable in length, with no particular restrictions on content.
Traditional video highlight detection methods, which rely on visual features, suffer from difficulties of data scale and content inconsistency.
To address these issues, we instead extract information from the audience discussion in the chat room for highlight detection.
In this thesis, an attention-based model, LSTA, is proposed to integrate the long-term and short-term information in a chat room and determine which fragments should be identified as highlights.
Our results demonstrate improvements over state-of-the-art approaches based on both visual and textual content.
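The abstract describes LSTA only at a high level: attention is used to weigh short-term chat activity against long-term context when scoring stream fragments. As an illustration of that general idea (not the thesis's actual architecture), the sketch below pools per-window chat embeddings with scaled dot-product attention against a long-term context query. All names, shapes, and the choice of a mean-vector context are hypothetical assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(short_term, long_term_query):
    """Score each short-term chat-window embedding against a
    long-term context query; return the attention-weighted sum
    and the weights (one per window)."""
    d = short_term.shape[-1]
    scores = short_term @ long_term_query / np.sqrt(d)  # (T,)
    weights = softmax(scores)                           # (T,)
    return weights @ short_term, weights                # (d,), (T,)

# Toy example: 5 one-minute chat windows, 4-dim embeddings.
rng = np.random.default_rng(0)
windows = rng.standard_normal((5, 4))
context = windows.mean(axis=0)  # crude long-term summary (assumption)
pooled, w = attention_pool(windows, context)
```

Windows whose chat embeddings align with the long-term context receive higher weights; in a highlight detector, fragment scores could then be computed from the pooled vector, though the thesis's actual scoring head is not described in this excerpt.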