
Graduate Student: Marvin Ernesto Garcia Raudez
Thesis Title: A Hybrid Approach for Analyzing Post-Comment Semantic Relationship using Siamese Neural Network
Advisors: Na-Wei Lo, Chao-Lung Yang
Committee Members: Na-Wei Lo, Chao-Lung Yang, Bor-Shen Lin, Chao Ou-Yang
Degree: Master
Department: Department of Information Management, School of Management
Year of Publication: 2022
Academic Year of Graduation: 110
Language: English
Number of Pages: 40
Keywords: Siamese, Twin Neural Network, Analyzing, Social-Based Cluster


The Internet started as a static network designed to exchange bytes between computers, but today it is a complex system that moves immense quantities of information. It is no longer a privately controlled project but the largest and most complex computer network that has ever existed. This freedom and near-zero cost of communication turned the Internet into a sophisticated tool that frees us from geographic barriers and connects people in communities known as social media: in 2004 only 5% of United States adults used some form of social media, but that figure reached 72% after Facebook went live in 2005 [1]. Social media empowers individuals to connect with others and collaborate on projects, and it also helps businesses promote products and services and gather feedback. As a consequence, big data gained importance as a large, complex, and valuable resource for business; however, monitoring data quality, finding and removing duplicate entries and typos, and analyzing multilingual content became common issues in big data.
This research proposes a deep learning model that searches for semantic relations between posts and comments on social network platforms while handling repeated and unrelated data. It uses the Siamese network architecture together with supplementary deep learning techniques, such as word-embedding layers and recurrent neural networks, to find the most influential comments based on the cosine similarity of their semantic representations. The results demonstrate the advantages of the Siamese neural network in reducing training time and improving accuracy over more complex existing models.
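As a toy illustration only, not the author's implementation (which uses trained embedding layers and an LSTM encoder), the core Siamese idea described above, one shared encoder applied to both the post and the comment and scored with cosine similarity, can be sketched as follows. The vocabulary, the random frozen embedding matrix, and mean pooling in place of the recurrent encoder are all placeholder assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and a frozen random embedding matrix standing in for a
# pretrained word-embedding layer (GloVe/word2vec in the thesis).
vocab = {"great": 0, "post": 1, "really": 2, "helpful": 3, "comment": 4, "spam": 5}
emb = rng.normal(size=(len(vocab), 8))

def encode(tokens):
    """Shared encoder applied to BOTH inputs (the 'twin' in the Siamese
    architecture): embedding lookup followed by mean pooling. The thesis
    uses an LSTM sequence-to-vector encoder here instead."""
    ids = [vocab[t] for t in tokens]
    return emb[ids].mean(axis=0)

def cosine_similarity(u, v):
    """Similarity score between the two encoded vectors, in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

post_vec = encode(["great", "post"])
comment_vec = encode(["really", "helpful", "comment"])
score = cosine_similarity(post_vec, comment_vec)
```

Because the two towers share the same weights, semantically related post-comment pairs can be pushed toward high cosine similarity during training (e.g., with a contrastive loss) while unrelated pairs are pushed apart.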

ABSTRACT
ACKNOWLEDGMENT
TABLE OF CONTENT
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1. INTRODUCTION
1.1. Research Background
1.2. Research Goals
1.3. Research Structure
CHAPTER 2. LITERATURE REVIEW
2.1. The Social Network Content-Based Clustering
2.2. Word-Embedding
2.2.1. Word To Vector
2.2.2. GloVe: Global Vectors For Word Representation
2.3. Long Short-Term Memory - Sequence To Vector
2.4. Siamese Neural Network
2.5. Cosine Similarity
2.6. Contrastive Loss Function
CHAPTER 3. METHODOLOGY
3.1. Data Collection & Text Preprocessing
3.1.1. Tokenization - Word Embedding & Transfer Learning
3.2. Siamese Training
CHAPTER 4. EXPERIMENTS
4.1. Data Processing
4.2. Data Mining Transformations
4.3. Model Parameters
4.3.1. Base Model
4.3.2. The Siamese Neural Network Hybrid Approach
4.4. Model Comparison - Cheltenham's Facebook Group
CHAPTER 5. CONCLUSION
REFERENCES

[1] Pew Research Center, "Social Media Fact Sheet," 7 April 2021. [Online]. Available: https://www.pewresearch.org/internet/fact-sheet/social-media/.
[2] V. Dhawan and Z. Nadir, "Big data and social media analytics," Research Matters, p. 6, 2014.
[3] Experian, "The data-driven strategy behind business growth," Experian Information Solutions, Inc., Boston, 2021.
[4] A. Jabbar, "Data Mining Issues that Still Persist in 2018," 16 October 2018. [Online]. Available: https://bigdatashowcase.com/data-mining-issues-that-still-persist-in-2018/.
[5] J. Bromley, I. Guyon, Y. LeCun and E. Säckinger, "Signature Verification using a 'Siamese' Time Delay Neural Network," AT&T Bell Laboratories, 1993.
[6] M. Chen and C. Tseng, "IncreSTS: Towards Real-Time Incremental Short Text Summarization on Comment Streams from Social Network Services," IEEE Transactions on Knowledge and Data Engineering, pp. 2986-3000, 2015.
[7] Z. Qiu and H. Shen, "User clustering in a dynamic social network topic model for short text streams," Information Sciences, pp. 102-106, November 2017.
[8] Y. Goldberg, Neural Network Methods in Natural Language Processing (Synthesis Lectures on Human Language Technologies), G. Hirst, Ed., Morgan & Claypool, 2017.
[9] Google, "Word2vec tutorial, TensorFlow," 4 February 2022. [Online]. Available: https://www.tensorflow.org/tutorials/text/word2vec.
[10] J. Pennington and R. Socher, "GloVe: Global Vectors for Word Representation," Stanford University, 2014.
[11] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, 1997.
[12] K. Cho, B. van Merriënboer and D. Bahdanau, "On the Properties of Neural Machine Translation: Encoder-Decoder Approaches," 2014.
[13] D. Britz, "Recurrent Neural Network Tutorial, Part 4 - Implementing a GRU/LSTM RNN," 27 October 2015. [Online]. Available: http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/.
[14] M. Ravanelli, P. Brakel and M. Omologo, "Light Gated Recurrent Units for Speech Recognition," IEEE Xplore, 2018.
[15] D. Chicco, "Siamese Neural Networks: An Overview," in Artificial Neural Networks, New York, Springer Protocols Humana Press, 2020, pp. 73-94.
[16] Y. Li, D. McLean, Z. Bandar and J. O'Shea, "Sentence similarity based on semantic nets and corpus statistics," IEEE Transactions on Knowledge and Data Engineering, pp. 1138-1150, 2006.
[17] E. Agirre, C. Banea, C. Cardie et al., "SemEval-2014 Task 10: Multilingual Semantic Textual Similarity," in Association for Computational Linguistics, Dublin, Ireland, 2014.
[18] H. Béchara, H. Costa, S. Taslimipoor and R. Gupta, "MiniExperts: An SVM Approach for Measuring Semantic Textual Similarity," 2015.
[19] P. Ni, Y. Li, G. Li and V. Chang, "A Hybrid Siamese Neural Network for Natural Language Inference in Cyber-Physical Systems," 15 March 2021. [Online]. Available: https://dl.acm.org/doi/fullHtml/10.1145/3418208.
[20] R. Gupta, H. Béchara, I. El Maarouf and C. Orăsan, "UoW: NLP techniques developed at the University of Wolverhampton for Semantic Similarity and Textual Entailment," in SemEval@COLING, 2014.
[21] R. Gupta, H. Béchara and C. Orăsan, "Intelligent translation memory matching and retrieval metric exploiting linguistic technology," AsLing, pp. 86-89, 2014.
[22] P. Neculoiu, M. Versteegh and M. Rotaru, "Learning Text Similarity with Siamese Recurrent Networks," Textkernel B.V., Amsterdam, 5 July 2016.
[23] D. Jurafsky and J. H. Martin, Speech and Language Processing, International Edition, 2000.
[24] T. Ranasinghe and C. Orăsan, "Semantic Textual Similarity with Siamese Neural Networks," ResearchGate, pp. 50-62, 2019.
[25] Y. LeCun, R. Hadsell and S. Chopra, "Dimensionality Reduction by Learning an Invariant Mapping," The Courant Institute of Mathematical Sciences, 2006.
[26] A. Pai, "What is Tokenization in NLP? Here's All You Need To Know," 26 May 2020. [Online]. Available: https://www.analyticsvidhya.com/blog/2020/05/what-is-tokenization-nlp/.
[27] J. Pennington and R. Socher, "GloVe project page, Stanford NLP," February 2022. [Online]. Available: https://nlp.stanford.edu/projects/glove/.
[28] L. Oliveira, "Exploiting Siamese Neural Networks on Short Text Similarity Tasks for Multiple Domains and Languages," February 2020.
[29] P. Neculoiu, M. Versteegh and M. Rotaru, "Learning Text Similarity with Siamese Recurrent Networks," ResearchGate, 2016.
[30] Z. Tanaka, K. Tong and G. Aihara, "A Hybrid Pooling Method for Convolutional Neural Networks," in International Conference on Neural Information Processing, Japan, 2016.
[31] G. Chevalier, "LARNN: Linear Attention Recurrent Neural Network," Cornell University, 2018.
[32] DataReportal, "Global Social Media Stats," 22 November 2021. [Online]. Available: https://datareportal.com/social/topics/3196/social-media-usage-in-united-sates/.
[33] I. Rehan, "Issues with data duplication and formatting still hurting data quality in 2019," 25 January 2019. [Online]. Available: https://www.crayondata.com/issues-with-data-duplication-and-formatting-still-hurting-data-quality-in-2019/.
[34] A. Beatrice, "All about the basics of Big Data: History, Types, and Applications," 2 March 2021. [Online]. Available: https://www.analyticsinsight.net/all-about-the-basics-of-big-data-history-types-and-applications/.
[35] H. He, K. Gimpel and J. J. Lin, "Multi-perspective sentence similarity modeling," 2015.
[36] "Long short-term memory," Creative Commons Attribution-ShareAlike license, 2022.

Full text release date: 2024/06/27 (campus network)
Full text release date: 2024/06/27 (off-campus network)
Full text release date: 2024/06/27 (National Central Library: Taiwan NDLTD system)