
Author: 蔡育霖
Yu-Lin Cai
Thesis Title: 集成學習應用於假新聞檢測之研究
A Study of Detecting Fake News Based on Ensemble Learning
Advisor: 陳正綱
Cheng-Kang Chen
Committee: 賴源正
Yuan-Cheng Lai
查士朝
Shi-Cho Cha
Degree: 碩士
Master
Department: 管理學院 - 資訊管理系
Department of Information Management
Thesis Publication Year: 2020
Graduation Academic Year: 108
Language: Chinese
Pages: 50
Keywords (in Chinese): 假新聞偵測, 深度學習, 自然語言處理, 集成學習
Keywords (in other languages): fake news detection, deep learning, natural language processing, ensemble learning
    近年來隨著網路與社群媒體的發展,加快了資訊散播的速度,讓假新聞的氾濫與危害程度也日益嚴重,同時假新聞也持續不斷的推陳出新,根據不同的類型、動機變換新聞的寫作手法,因此傳統的假新聞偵測使用的知識庫比對方法出現瓶頸,逐漸不足以應付變化多端的假新聞,因此本研究希望使用自然語言處理技術來檢測假新聞,並達到辨別假新聞的目的。
    本研究目的在於解決假新聞方法中知識庫比對上的問題,有別於過往其他研究針對假新聞使用單一特徵建立的模型,本研究提出一種使用集成學習模型的方法,透過自然語言處理技術中詞向量來理解文本語意,從而解決比對方法中受限於文本文字的問題,並使用識別文字蘊涵的方法來進行知識庫比對,及分別透過卷積神經網路模型提取文字特徵和情感分析模型判別情感特徵進行分類,最後使用集成學習整合分類器的分類結果,使模型能夠同時針對假新聞中的真實性、文字特徵與情感特徵進行偵測,提供一個結合多面向特徵的分類模型。經實驗證實由集成學習結合多面向特徵的模型在假新聞偵測任務上能夠有較佳的表現。


    In recent years, the growth of the Internet and social media has accelerated the spread of information, and the proliferation and harm of fake news have become increasingly serious. Fake news also keeps evolving, changing its writing style according to its type and motive. As a result, the knowledge-base comparison methods used in traditional fake news detection have hit a bottleneck and are increasingly unable to cope with ever-changing fake news. This study therefore uses natural language processing techniques to detect and identify fake news.
    The purpose of this study is to address the problems of knowledge-base comparison in fake news detection. Unlike earlier studies that built fake news models on a single feature, this study proposes an ensemble learning model. Word vectors are used to capture the semantics of the text, which removes the comparison method's dependence on the literal wording. Recognizing textual entailment is used to perform the knowledge-base comparison, a convolutional neural network model extracts textual features, and a sentiment analysis model identifies sentiment features for classification. Finally, ensemble learning integrates the outputs of these classifiers, so that the model can examine the veracity, textual features, and sentiment features of fake news at the same time, providing a classification model that combines features from multiple aspects. Experiments confirm that the ensemble model combining these features performs better on the fake news detection task.
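    The final integration step described in the abstract can be illustrated with a minimal sketch. It assumes each of the three base models (text features, textual entailment, sentiment) has already produced a score between 0 and 1 for an article, and it fuses them with a single sigmoid unit trained by gradient descent. The function names, the toy scores, and the choice of a single dense unit are all hypothetical and stand in for the thesis's own ensemble layer, whose details are not given in this record.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a real value to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_classifier(features, labels, epochs=500, lr=0.5):
    """Fit one dense sigmoid unit on base-model scores (hypothetical helper).

    features: list of [text_score, entailment_score, sentiment_score]
    labels:   1 for fake, 0 for real
    Uses plain batch gradient descent on the logistic loss.
    """
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the logistic loss w.r.t. the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict_fake(weights, bias, scores):
    """Return the fused probability that an article is fake."""
    return sigmoid(sum(w * s for w, s in zip(weights, scores)) + bias)


# Toy, made-up base-model scores (assumption: higher means "more likely fake").
train_scores = [
    [0.9, 0.8, 0.7],
    [0.8, 0.9, 0.6],
    [0.7, 0.6, 0.9],
    [0.2, 0.1, 0.3],
    [0.1, 0.3, 0.2],
    [0.3, 0.2, 0.1],
]
train_labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_meta_classifier(train_scores, train_labels)
print("fused score, suspicious article:", predict_fake(weights, bias, [0.9, 0.9, 0.9]))
print("fused score, ordinary article:", predict_fake(weights, bias, [0.1, 0.1, 0.1]))
```

    In this sketch the meta-classifier learns how much weight to give each base model, which is one common way to let veracity, textual, and sentiment evidence contribute jointly to a single decision.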

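    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Motivation and Objectives
      1.3 Research Benefits
    Chapter 2 Literature Review
      2.1 Introduction to Natural Language Processing
      2.2 Fake News Detection
      2.3 Ensemble Learning
      2.4 Transfer Learning
    Chapter 3 Ensemble Learning Research Model
      3.1 Text Feature Classification Model
        3.1.1 Text Feature Classification - Embedding Layer
        3.1.2 Text Feature Classification - Convolutional Layer
        3.1.3 Text Feature Classification - Pooling Layer
        3.1.4 Text Feature Classification - Classification
      3.2 Recognizing Textual Entailment Model
        3.2.1 Recognizing Textual Entailment - Embedding Layer
        3.2.2 Recognizing Textual Entailment - Bidirectional Long Short-Term Memory Network
        3.2.3 Recognizing Textual Entailment - Pooling Layer
        3.2.4 Recognizing Textual Entailment - Connection Layer
        3.2.5 Recognizing Textual Entailment - Classification
      3.3 Sentiment Analysis Model
        3.3.1 Sentiment Analysis - Embedding Layer
        3.3.2 Sentiment Analysis - Bidirectional Long Short-Term Memory Network
        3.3.3 Sentiment Analysis - Fully Connected Layer
        3.3.4 Sentiment Analysis - Classification
      3.4 Ensemble Learning
        3.4.1 Ensemble Learning - Fully Connected Layer
        3.4.2 Ensemble Learning - Classification
    Chapter 4 Analysis of Ensemble Learning Model Results
      4.1 Experimental Setup
        4.1.1 Dataset Description
        4.1.2 Evaluation Metrics
        4.1.3 Experimental Equipment and Environment
      4.2 Experimental Results
      4.3 Experimental Conclusions
    Chapter 5 Conclusion and Analysis
    References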


    Full text public date (Intranet): 2025/07/22
    Full text public date (Internet): This full text is not authorized to be published.
    Full text public date (National library): This full text is not authorized to be published.