
Author: Chieh-Ming Liaw (廖傑明)
Thesis Title: Live-Streaming Video Highlight Detection Using Chat Messages (藉由聊天訊息偵測串流直播影片之精彩片段)
Advisor: Bi-Ru Dai (戴碧如)
Oral Defense Committee: Chih-Hua Tai (戴志華), Hong-Han Shuai (帥宏翰), Bi-Ru Dai (戴碧如), Yi-Ling Chen (陳怡伶)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2020
Graduation Academic Year: 108 (2019-2020)
Language: English
Number of Pages: 41
Keywords: live stream, video highlight detection, attention model

As technology advances and data transmission becomes ever faster, many new services have emerged. Live streaming, one of the most popular services in recent years, has become part of many people's daily lives. Unlike conventional TV programs, live streams allow real-time interaction with the audience: viewers can join the discussion in each channel's chat room as the stream unfolds. At the same time, live-streaming content tends to be long and loosely structured; whereas ordinary programs are short and tightly edited, a live-stream video often runs for several hours. Even for such a popular service, these unedited videos are difficult to make appealing to people outside the community. Moreover, the enormous amount of data produced by videos lasting several hours imposes severe hardware limitations on existing methods that automatically detect and clip highlights from visual content. In this thesis, we propose a Long-Short Term Attention architecture (LSTA) based on chat messages. Experimental results show that our chat-message-based design is a more reliable way to detect highlights.


In recent years, live-streaming services have been booming and continue to grow on the Internet.
Unlike TV shows and movies, live streams can be much longer and more variable in length, with no specific restrictions on content.
Traditional video highlight detection methods, which rely on visual features, suffer from difficulties with data scale and inconsistency.
To address these issues, we instead extract information from the audience discussion in the chat room for highlight detection.
In this thesis, an attention-based model, LSTA, is proposed to integrate the long-term and short-term information in a chat room and determine which fragments should be identified as highlights.
Our results demonstrate improvements over state-of-the-art visual and textual content-based approaches.
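To make the chat-based idea more concrete, below is a minimal Python sketch of how chat messages could be turned into per-segment frequency and diversity signals of the kind listed in the table of contents (Sections 3.4.1 and 3.4.2). The function name chat_features, the 60-second segment length, the whitespace tokenization, and the use of Shannon entropy as the diversity measure are illustrative assumptions, not the thesis's actual implementation.

from collections import Counter
from math import log2

def chat_features(messages, segment_seconds=60):
    """messages: list of (timestamp_in_seconds, text) pairs from one live-stream chat log.
    Returns one (frequency, diversity) pair per fixed-length segment."""
    if not messages:
        return []
    last = max(t for t, _ in messages)
    buckets = [[] for _ in range(int(last // segment_seconds) + 1)]
    for t, text in messages:
        buckets[int(t // segment_seconds)].append(text)

    features = []
    for texts in buckets:
        frequency = len(texts)                          # message count in this segment
        tokens = [w for msg in texts for w in msg.lower().split()]
        counts = Counter(tokens)
        total = sum(counts.values())
        # Shannon entropy of the token distribution, used here as a diversity proxy
        diversity = -sum(c / total * log2(c / total) for c in counts.values()) if total else 0.0
        features.append((frequency, diversity))
    return features

# Example: repeated "GG" spam gives high frequency but zero diversity,
# while a later, more varied message gives lower frequency but higher diversity.
print(chat_features([(0.5, "GG"), (1.0, "gg"), (2.0, "GG"), (70.0, "nice play wow")]))

In this sketch, a burst of identical spam produces a high-frequency, low-diversity segment, whereas varied discussion raises both signals; a downstream model can then weigh such per-segment features when scoring candidate highlights.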

Recommendation Letter
Approval Letter
Abstract in Chinese
Abstract in English
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
3 Highlight Detection
  3.1 Problem Definition
  3.2 Highlight Labeling
  3.3 Data Preprocessing
  3.4 Feature Extraction
    3.4.1 Frequency
    3.4.2 Diversity
    3.4.3 Semantics
  3.5 Model
    3.5.1 Basic Model
    3.5.2 Long-Short Term Attention (LSTA) Model
4 Experiments and Discussions
  4.1 Datasets
    4.1.1 Text Dataset
    4.1.2 Video-Text Dataset
    4.1.3 ESports Dataset
  4.2 Evaluation Metric
  4.3 Training Details
    4.3.1 Experimental Setup
    4.3.2 Deal with Imbalanced Data
    4.3.3 Deal with Variable-Length Data
  4.4 Experiment Results and Discussions
    4.4.1 The Comparison of Different Designs of Context Window
    4.4.2 Compare with Textual Content-Based Methods
    4.4.3 Compare with Visual Content-Based Methods
    4.4.4 Discussions on the Visualized Results
    4.4.5 Discussions on the Relation of Video Length and Performance
  4.5 Ablation Study
    4.5.1 The Importance of Each Feature
    4.5.2 Improve the Model Using Only Frequency and Diversity
  4.6 Apply a Different Process to Generate the Global View of a Segment
5 Conclusions and Future Works
  5.1 Conclusions
  5.2 Future Works
References

Full-text release date: 2025/08/23 (campus network only). The full text is not authorized for public access off campus or via the National Central Library (Taiwan NDLTD system).