| Author: | 蔡育霖 Yu-Lin Cai |
|---|---|
| Thesis Title: | 集成學習應用於假新聞檢測之研究 A Study of Detecting Fake News Based on Ensemble Learning |
| Advisor: | 陳正綱 Cheng-Kang Chen |
| Committee: | 賴源正 Yuan-Cheng Lai、查士朝 Shi-Cho Cha |
| Degree: | 碩士 Master |
| Department: | 管理學院 - 資訊管理系 Department of Information Management |
| Thesis Publication Year: | 2020 |
| Graduation Academic Year: | 108 |
| Language: | 中文 Chinese |
| Pages: | 50 |
| Keywords (in Chinese): | 假新聞偵測、深度學習、自然語言處理、集成學習 |
| Keywords (in other languages): | fake news detection, deep learning, natural language processing, ensemble learning |
近年來隨著網路與社群媒體的發展,加快了資訊散播的速度,讓假新聞的氾濫與危害程度也日益嚴重,同時假新聞也持續不斷的推陳出新,根據不同的類型、動機變換新聞的寫作手法,因此傳統的假新聞偵測使用的知識庫比對方法出現瓶頸,逐漸不足以應付變化多端的假新聞,因此本研究希望使用自然語言處理技術來檢測假新聞,並達到辨別假新聞的目的。
本研究目的在於解決假新聞偵測方法中知識庫比對上的問題,有別於過往其他研究針對假新聞使用單一特徵建立的模型,本研究提出一種使用集成學習模型的方法,透過自然語言處理技術中的詞向量來理解文本語意,從而解決比對方法中受限於文本文字的問題,並使用識別文字蘊涵的方法來進行知識庫比對,及分別透過卷積神經網路模型提取文字特徵和情感分析模型判別情感特徵進行分類,最後使用集成學習整合分類器的分類結果,使模型能夠同時針對假新聞中的真實性、文字特徵與情感特徵進行偵測,提供一個結合多面向特徵的分類模型。經實驗證實由集成學習結合多面向特徵的模型在假新聞偵測任務上能夠有較佳的表現。
In recent years, the development of the Internet and social media has accelerated the spread of information, making the proliferation and harm of fake news increasingly serious. At the same time, fake news keeps evolving, varying its writing style according to different types and motives. As a result, the knowledge-base comparison methods used in traditional fake news detection have hit a bottleneck and are gradually insufficient to cope with ever-changing fake news. This study therefore applies natural language processing techniques to detect and identify fake news.
The purpose of this study is to solve the knowledge-base comparison problem in fake news detection. Unlike earlier studies that built models on a single feature, this study proposes an approach based on an ensemble learning model. Word vectors from natural language processing are used to capture the semantics of the text, overcoming the comparison methods' limitation of matching on surface wording; textual entailment recognition performs the knowledge-base comparison; a convolutional neural network extracts textual features while a sentiment analysis model identifies sentiment features for classification. Finally, ensemble learning integrates the outputs of these classifiers, so that the model can simultaneously assess the authenticity, textual features, and sentiment features of fake news, yielding a classification model that combines multiple feature aspects. Experiments confirm that the ensemble model combining multi-aspect features achieves better performance on fake news detection tasks.
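The abstract does not specify how the ensemble integrates the three classifiers. As a minimal illustrative sketch only, and not the thesis's actual implementation, a weighted soft-voting combiner over the fake-news probabilities produced by the entailment, text-feature, and sentiment classifiers (all names here are hypothetical) might look like:

```python
import numpy as np

def soft_vote(prob_entail, prob_text, prob_sentiment, weights=(1.0, 1.0, 1.0)):
    """Hypothetical ensemble step: combine per-article fake-news probabilities
    from three classifiers (textual entailment, CNN text features, sentiment)
    by weighted averaging, then threshold at 0.5 (1 = fake, 0 = real)."""
    probs = np.stack([prob_entail, prob_text, prob_sentiment])  # shape (3, n_articles)
    w = np.asarray(weights, dtype=float)[:, None]
    combined = (w * probs).sum(axis=0) / w.sum()
    return (combined >= 0.5).astype(int)

# Example: fake-news probabilities for three articles from each classifier
p_entail = np.array([0.9, 0.2, 0.6])
p_text = np.array([0.8, 0.3, 0.4])
p_sent = np.array([0.7, 0.1, 0.7])
print(soft_vote(p_entail, p_text, p_sent))  # [1 0 1]
```

The `weights` parameter would let the ensemble favor the classifier that performs best on validation data; with equal weights this reduces to plain probability averaging.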