
Student: 周瑩紅 (Ying-Hung Chau)
Thesis Title: 結合 BERT 預訓練語言模型與情感分析之假新聞預測模型
Detection of Fake News Using BERT with Sentiment Analysis
Advisor: 呂永和 (Yung-Ho Leu)
Committee Members: 呂永和 (Yung-Ho Leu), 楊維寧 (Wei-Ning Yang), 陳雲岫 (Yun-Shiow Chen)
Degree: Master
Department: Department of Information Management, School of Management
Year of Publication: 2019
Graduating Academic Year: 107 (2018–2019)
Language: English
Pages: 52
Keywords (Chinese): 假新聞、自然語言處理、BERT、預訓練語言模型、情感分析
Keywords (English): fake news, NLP, BERT, pre-trained language model, sentiment analysis
Since the 2016 U.S. presidential election, fake news has gradually become a hot topic. Although "fake news" is an old problem that has existed for centuries, today's technology makes misinformation easier to spread than ever before. The internet and social media have been major enablers of the rise of fake news in recent years. The spread of fake news will certainly continue to have negative effects on individuals and society. Because most fake news revolves around politics, this study takes political news as its research subject.
A more worrying trend is that fake news can be generated automatically by machines. A company named OpenAI created GPT-2, a system that, given only a single sentence, can generate coherent sentences, fiction, and even fake news.
BERT, released by Google last year, is currently the most popular open-source natural language processing technique. It has achieved strong performance on a variety of NLP tasks, including question answering and classification. Since BERT is a key technique for NLP tasks, this thesis proposes a fake news detection algorithm based on the BERT pre-trained language model.
This study uses FakeNewsNet, currently the most credible dataset of real and fake news. The experimental results show that the model using BERT has stronger discriminative power. In addition, previous studies have found that fake news uses more negative and subjective terms, so sentiment analysis is considered an important element for news classification. The results also show that sentiment analysis can strengthen the performance of the BERT-based fake news detection model.


Fake news has become a hot-button issue and has received tremendous attention since the 2016 U.S. presidential election. Although "fake news" is an old problem that has existed for centuries, today's technology makes the spread of misinformation easier than ever. The internet and social media are the great enablers of the rise of fake news in recent years. The spread of fake news will continue to cause negative impacts on individuals and society. Since most fake news revolves around politics, this research focuses on political news.
A more worrying trend is that fake news can be generated automatically by machines. Earlier this year, a company named OpenAI created an artificial intelligence system called GPT-2, which can generate coherent sentences, fiction, and even fake news when given only a block of seed text. Based on these observations, detecting political fake news from text content is the main focus of this thesis.
Bidirectional Encoder Representations from Transformers (BERT), released by Google AI last year, is one of the most popular open-source techniques for natural language processing (NLP). It has achieved strong performance on a variety of NLP tasks, including question answering and classification. Because BERT is a crucial technique for NLP tasks, this thesis proposes a fake news detection algorithm based on the BERT pre-trained language model. Experimental analysis shows that the BERT-based model achieves competitive performance for fake news detection on FakeNewsNet compared with other models.
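The thesis itself contains no source code, but a minimal sketch of fine-tuning a pre-trained BERT model for binary fake-news classification might look like the following. The model name, toy articles, and single-step training loop are illustrative assumptions using the Hugging Face `transformers` library, not the author's actual implementation.

```python
# Minimal sketch: fine-tune BERT for binary fake-news classification.
# Illustrative only -- the checkpoint name, toy data, and one-step
# training loop are assumptions, not the setup described in the thesis.
import torch
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)          # label 0 = real, 1 = fake

texts = ["Senate passes budget bill after long debate.",
         "Celebrity secretly replaced by clone, insiders say."]
labels = torch.tensor([0, 1])                   # toy stand-ins for FakeNewsNet

batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=512, return_tensors="pt")

optimizer = AdamW(model.parameters(), lr=2e-5)  # typical BERT fine-tuning LR
model.train()
loss = model(**batch, labels=labels).loss       # built-in cross-entropy loss
loss.backward()
optimizer.step()
```

In practice one would iterate this step over mini-batches of the full dataset for a few epochs and evaluate with held-out folds, as the thesis's k-fold cross-validation setup suggests.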
According to previous studies, fake news tends to use more negative and subjective terms. Sentiment analysis is therefore treated as an important element for classifying news in this research. The results show that sentiment analysis can strengthen the BERT model for detecting fake news.
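One common way to combine the two signals is to concatenate sentiment scores with a BERT text representation before classification. The sketch below uses NLTK's VADER analyzer purely as a stand-in; the thesis does not name the sentiment tool or the fusion method, so this pairing is an assumption.

```python
# Sketch: concatenate a BERT [CLS] embedding with sentiment scores, then
# classify with a linear head. VADER is an assumed stand-in; the thesis
# does not specify which sentiment analyzer or fusion scheme it used.
import torch
from transformers import BertTokenizer, BertModel
from nltk.sentiment.vader import SentimentIntensityAnalyzer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
sia = SentimentIntensityAnalyzer()  # requires nltk.download("vader_lexicon")

def article_features(text: str) -> torch.Tensor:
    """768-dim BERT [CLS] embedding + 4 VADER polarity scores = 772 dims."""
    enc = tokenizer(text, truncation=True, max_length=512,
                    return_tensors="pt")
    with torch.no_grad():
        cls = bert(**enc).last_hidden_state[:, 0, :]   # shape (1, 768)
    s = sia.polarity_scores(text)                      # neg/neu/pos/compound
    sent = torch.tensor([[s["neg"], s["neu"], s["pos"], s["compound"]]])
    return torch.cat([cls, sent], dim=1)               # shape (1, 772)

head = torch.nn.Linear(772, 2)  # trained jointly or on frozen features
logits = head(article_features("Shocking report stuns officials!"))
```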

TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
  1.1 RESEARCH BACKGROUND
  1.2 RESEARCH PURPOSE
  1.3 RESEARCH STRUCTURE
  1.4 RESEARCH OVERVIEW
Chapter 2 Literature Review
  2.1 FAKE NEWS DEFINITION
  2.2 TEXT MINING
  2.3 SENTIMENT ANALYSIS OF NEWS
  2.4 RECURRENT NEURAL NETWORK
  2.5 PRE-TRAINED LANGUAGE MODEL
  2.6 BIDIRECTIONAL ENCODER REPRESENTATIONS FROM TRANSFORMERS (BERT)
    2.6.1 Value Proposition of BERT
    2.6.2 Model Architecture
    2.6.3 Input Representation
    2.6.4 Pre-training Tasks
    2.6.5 Pre-training and Fine-tuning Procedures
    2.6.6 Previous Document Classification Work Using BERT
Chapter 3 Research Method
  3.1 RESEARCH STRUCTURE
  3.2 DATASET DESCRIPTION
    3.2.1 FakeNewsNet (FNN)
  3.3 MODEL CONSTRUCTION
    3.3.1 Data Collection
    3.3.2 Sentiment Analysis of Collected News
    3.3.3 Construction of Classification Model
  3.4 MODEL EVALUATION
    3.4.1 K-fold Cross Validation
    3.4.2 Confusion Matrix
    3.4.3 Evaluation Indexes
Chapter 4 Experiment Results
  4.1 EXPERIMENTAL ENVIRONMENT
  4.2 PARAMETERS SETTING
  4.3 MODELS RESULT AND EVALUATION
Chapter 5 Conclusion and Future Research
  5.1 CONCLUSION
  5.2 FUTURE RESEARCH
Reference


Full text available from 2024/08/21 (campus network)
Full text not authorized for public access (off-campus network)
Full text not authorized for public access (National Central Library: Taiwan thesis and dissertation system)