
Student: 石家安 (Chia-An Shih)
Thesis Title: 以句子切割為基礎的主題式抽象評論摘要 (Abstractive Review Text Summarization Based on Sentence Segmentation with a Topic Model)
Advisor: 徐俊傑 (Chiun-Chieh Hsu)
Committee Members: 黃世禎 (Shih-Chen Huang), 王有禮 (Yue-Li Wang)
Degree: Master
Department: Department of Information Management, College of Management
Year of Publication: 2023
Graduation Academic Year: 111 (ROC calendar)
Language: Chinese
Number of Pages: 46
Chinese Keywords: 文本摘要、抽象摘要、特徵擷取、主題模型
English Keywords: Text Summarization, Abstractive Summarization, Feature Extraction, Topic Model

With the rapid development of e-commerce, more and more people purchase products online, which has driven a sharp growth in the number of reviews; extracting the needed information from these reviews has therefore become increasingly important. To address this problem, text summarization techniques have attracted wide attention: such techniques can correctly aggregate sentences into a representative summary, so a reader can quickly obtain the needed information from reviews by reading the summary alone.
This study therefore builds a topic-based abstractive summarization model. It identifies topical nouns in each review as frequent topic words and uses association rules to judge whether the frequent words are related, segmenting sentences accordingly. Each segmented sentence is then encoded with a BERT sentence embedding, added in the high-dimensional space to the topic-word vectors of an LDA topic model, and grouped by topic with K-means clustering; the resulting topic features are finally applied in the abstractive summarization system.
Experiments are conducted on short-text Amazon phone reviews and analyzed from multiple angles, including ROUGE metrics and a questionnaire survey. The questionnaire covers the informativeness and fluency of the summaries, using a five-point Likert scale to measure the degree of agreement in depth. The experiments show that, compared with current methods, the proposed summarization system improves precision on the ROUGE metrics by 4%, and in the questionnaire survey achieves informativeness gains of more than 37% and fluency gains of more than 41%.
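The embedding-and-clustering pipeline described in the abstract can be sketched as follows. The vectors and the tiny K-means routine below are illustrative stand-ins only: in the thesis, the sentence vectors come from BERT sentence embeddings and the topic vector from an LDA topic model, and the real implementation would use a proper clustering library.

```python
def kmeans(points, k, iters=10):
    """Minimal Lloyd-style K-means sketch with deterministic seeding."""
    # seed centroids with evenly spaced points (deterministic for the demo)
    centroids = [points[i * len(points) // k] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to the nearest centroid (squared Euclidean)
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # recompute centroids as cluster means (keep old centroid if empty)
        centroids = [[sum(dim) / len(cl) for dim in zip(*cl)] if cl
                     else centroids[i] for i, cl in enumerate(clusters)]
    return centroids, clusters

# Toy "sentence embeddings" (stand-ins for BERT vectors), shifted by a
# "topic vector" (stand-in for the LDA topic-word vector) before clustering.
topic_vec = [1.0, 1.0]
sentence_vecs = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
shifted = [[a + b for a, b in zip(v, topic_vec)] for v in sentence_vecs]
centroids, clusters = kmeans(shifted, k=2)
```

The addition of the topic vector translates every sentence embedding toward the topic's region of the space, so sentences sharing a topic end up in the same K-means cluster.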


With the rapid development of e-commerce, more and more people buy products online, which results in an increase in the number of product reviews. It has become increasingly important to extract the necessary information from these reviews. To address this issue, text summarization techniques have received widespread attention. This technology can accurately aggregate sentences to generate representative summaries. By reading these summaries, necessary information can be quickly obtained from the reviews.
This study builds an abstractive review summarization model with a topic model. Topical nouns extracted from the reviews serve as frequent topic words, and association rules are used to decide whether a sentence should be split. The segmented sentences are encoded with BERT sentence embeddings, which are added to the LDA topic-word vectors; K-means clustering then groups the sentences by topic, and the resulting topic features are applied in the abstractive text summarization system.
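The association-rule decision — whether two frequent topic words co-occur strongly enough to be treated as related when segmenting a sentence — can be sketched with the standard confidence measure. The reviews, words, and threshold below are illustrative, not the thesis's actual data or parameters:

```python
def confidence(reviews, w1, w2):
    """Association-rule confidence P(w2 | w1) over tokenized reviews."""
    has_w1 = [r for r in reviews if w1 in r]
    if not has_w1:
        return 0.0
    return sum(1 for r in has_w1 if w2 in r) / len(has_w1)

# Hypothetical rule: treat two topic nouns as related (so the clause
# containing both is not split) only if their confidence clears an
# illustrative threshold.
reviews = [
    ["battery", "life", "screen"],
    ["battery", "screen"],
    ["battery", "price"],
    ["camera", "price"],
]
related = confidence(reviews, "battery", "screen") >= 0.5  # 2/3 -> related
```

Here "battery" appears in three reviews and "screen" in two of those, so the rule battery → screen has confidence 2/3 and the pair would be kept together.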
This study conducts experiments on short-text Amazon phone reviews and performs a multifaceted analysis, including ROUGE metrics and a questionnaire survey. The questionnaire assesses the informativeness and fluency of the summaries on a five-point Likert scale to gain a deeper understanding of how well they match the reviews. Compared with the current method, the proposed summarization system improves ROUGE precision by 4%, while in the questionnaire survey it achieves informativeness gains of more than 37% and fluency gains of more than 41%.
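The ROUGE comparison rests on n-gram overlap between a candidate summary and a reference. A minimal ROUGE-1 (unigram) computation — an illustrative sketch, not the exact scorer used in the thesis — looks like:

```python
from collections import Counter

def rouge1(candidate, reference):
    """Unigram-overlap ROUGE-1 precision, recall, and F1 (illustrative)."""
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())          # clipped unigram matches
    prec = overlap / max(sum(c.values()), 1)  # matches / candidate length
    rec = overlap / max(sum(r.values()), 1)   # matches / reference length
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

p, r, f = rouge1("great phone with long battery life",
                 "the phone has long battery life")
```

For these two six-word sentences, four unigrams overlap (phone, long, battery, life), so precision, recall, and F1 all equal 2/3.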

Chinese Abstract I
English Abstract II
Acknowledgments IV
Table of Contents V
List of Figures VII
List of Tables VIII
Chapter 1 Introduction 1
  1-1 Research Background 1
  1-2 Research Motivation and Objectives 2
  1-3 Thesis Organization 3
Chapter 2 Literature Review 4
  2-1 Text Summarization 4
    2-1-1 Extractive Text Summarization 5
    2-1-2 Abstractive Text Summarization 5
  2-2 Related Work on Short-Text Summarization 6
  2-3 Topic Modeling 7
    2-3-1 Related Work on Topic Models 7
    2-3-2 Topic Models Applied to Text Summarization 10
Chapter 3 Topic-Based Abstractive Review Summarization Based on Sentence Segmentation 12
  3-1 System Architecture 12
  3-2 Dataset 14
  3-3 Data Preprocessing 14
  3-4 Sentence Segmentation 16
  3-5 Topic Word Selection 20
  3-6 Sequence-to-Sequence Abstractive Summary Generation 22
    3-6-1 Encoder & Decoder 23
    3-6-2 Attention Mechanism 24
Chapter 4 Experimental Comparison 26
  4-1 Dataset and Experimental Parameter Settings 26
    4-1-1 Dataset 26
    4-1-2 Experimental Parameter Settings 27
  4-2 Comparison Models and Evaluation Methods 28
    4-2-1 Comparison Models 28
    4-2-2 Topic Model Evaluation Methods 29
    4-2-3 Text Summarization Evaluation Methods 29
  4-3 Experimental Results 30
    4-3-1 Topic Model – Experiment 1 30
    4-3-2 Summarization Model – Experiment 2 31
    4-3-3 Questionnaire Analysis – Experiment 3 34
Chapter 5 Conclusions and Future Work 40
  5-1 Conclusions 40
  5-2 Future Research Directions 41
References 43

[1] A. Abdi, S. M. Shamsuddin, S. Hasan, and J. Piran, “Machine learning-based multi-documents sentiment-oriented summarization using linguistic treatment,” Expert Systems with Applications, Vol. 109, No. 1, pp. 66-85, 2018.
[2] R. C. Belwal, S. Rai, and A. Gupta, “A new graph-based extractive text summarization using keywords or topic modeling,” Journal of Ambient Intelligence and Humanized Computing, Vol. 12, No. 10, pp. 8975-8990, 2021.
[3] D. M. Blei, “Probabilistic topic models,” Communications of the ACM, Vol. 55, No. 4, pp. 77-84, 2012.
[4] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, Vol. 3, pp. 993-1022, 2003.
[5] R. Boorugu, and G. Ramesh, “A Survey on NLP based Text Summarization for Summarizing Product Reviews,” 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), pp. 352-356, 2020.
[6] M. Casillo, F. Colace, B. B. Gupta, D. Santaniello, and C. Valentino, “Fake News Detection Using LDA Topic Modelling and K-Nearest Neighbor Classifier,” Computational Data and Social Networks: 10th International Conference, pp. 330-339, 2021.
[7] R. Churchill, and L. Singh, “The evolution of topic modeling,” ACM Computing Surveys, Vol. 54, No. 10s, pp. 1-35, 2022.
[8] W. S. El-Kassas, C. R. Salama, A. A. Rafea, and H. K. Mohamed, “Automatic text summarization: A comprehensive survey,” Expert Systems with Applications, Vol. 165, No. 1, pp. 113679, 2021.
[9] H. Gupta, and M. Patel, “Method of text summarization using LSA and sentence based topic modelling with Bert,” 2021 international conference on artificial intelligence and smart systems (ICAIS), pp. 511-517, 2021.
[10] S. Gupta, and S. K. Gupta, “Abstractive summarization: An overview of the state of the art,” Expert Systems with Applications, Vol. 121, No. 1, pp. 49-65, 2019.
[11] G. Hamerly, and C. Elkan, “Learning the k in k-means,” Advances in neural information processing systems, Vol. 16, No. 1, pp. 281-288, 2003.
[12] R. He, and J. McAuley, “Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering,” Proceedings of the 25th International Conference on World Wide Web, pp. 507-517, 2016.
[13] Á. Hernández-Castañeda, R. A. García-Hernández, Y. Ledeneva, and C. E. Millán-Hernández, “Extractive automatic text summarization based on lexical-semantic keywords,” IEEE Access, Vol. 8, No. 1, pp. 49896-49907, 2020.
[14] S. Hochreiter, and J. Schmidhuber, “Long short-term memory,” Neural Computation, Vol. 9, No. 8, pp. 1735-1780, 1997.
[15] B. Hu, Q. Chen, and F. Zhu, “LCSTS: A Large Scale Chinese Short Text Summarization Dataset,” Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1967–1972, 2015.
[16] L. Huang, L. Wu, and L. Wang, “Knowledge graph-augmented abstractive summarization with semantic-driven cloze reward,” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5094–5107, 2020.
[17] Y. Keneshloo, T. Shi, N. Ramakrishnan, and C. K. Reddy, “Deep reinforcement learning for sequence-to-sequence models,” IEEE Transactions on Neural Networks and Learning Systems, Vol. 31, No. 7, pp. 2469-2489, 2019.
[18] R. Khan, Y. Qian, and S. Naeem, “Extractive based text summarization using K-means and TF-IDF,” International Journal of Information Engineering and Electronic Business, Vol. 10, No. 3, pp. 33, 2019.
[19] W. Li, M. Wu, Q. Lu, W. Xu, and C. Yuan, “Extractive summarization using inter-and intra-event relevance,” Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pp. 369-376, 2006.
[20] X. Li, Y. Wang, A. Zhang, C. Li, J. Chi, and J. Ouyang, “Filtering out the noise in short text topic modeling,” Information Sciences, Vol. 456, No. 1, pp. 83-96, 2018.
[21] T.-P. Liang, X. Li, C.-T. Yang, and M. Wang, “What in Consumer Reviews Affects the Sales of Mobile Apps: A Multifacet Sentiment Analysis Approach,” International Journal of Electronic Commerce, Vol. 20, No. 2, pp. 236-260, 2015.
[22] F. F. Lubis, Y. Rosmansyah, and S. H. Supangkat, “Topic discovery of online course reviews using LDA with leveraging reviews helpfulness,” International Journal of Electrical and Computer Engineering (IJECE) Vol. 9, No. 1, pp. 426, 2019.
[23] C. Ma, W. Zhang, M. Guo, H. Wang, and Q. Z. Sheng, “Multi-document Summarization via Deep Learning Techniques: A Survey,” ACM Computing Surveys, Vol. 55, No. 5, pp. 1-37, 2020.
[24] M. Mohamed, and M. Oussalah, “SRL-ESA-TextSum: A text summarization approach based on semantic role labeling and explicit semantic analysis,” Information Processing & Management, Vol. 56, No. 4, pp. 1356-1372, 2019.
[25] S. K. Nambiar, and S. M. Idicula, “Attention based Abstractive Summarization of Malayalam Document,” Procedia Computer Science, Vol. 189, No. 1, pp. 250-257, 2021.
[26] N. Nazari, and M. Mahdavi, “A survey on automatic text summarization,” Journal of AI and Data Mining, Vol. 7, No. 1, pp. 121-135, 2019.
[27] D. Oneata, “Probabilistic latent semantic analysis,” Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 1-7, 1999.
[28] H. Pan, R. Yang, X. Zhou, R. Wang, D. Cai, and X. Liu, “Large scale abstractive multi-review summarization (LSARS) via aspect alignment,” Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2337-2346, 2020.
[29] P. P. Patil, S. Phansalkar, and V. V. Kryssanov, “Topic modelling for aspect-level sentiment analysis,” Proceedings of the 2nd International Conference on Data Engineering and Communication Technology: ICDECT 2017, pp. 221-229, 2019.
[30] M. Prathima, and H. Divakar, “Automatic extractive text summarization using k-means clustering,” International Journal of Computer Sciences Engineering, Vol. 6, No. 6, pp. 782-787, 2018.
[31] J. Qiang, Z. Qian, Y. Li, Y. Yuan, and X. Wu, “Short text topic modeling techniques, applications, and performance: a survey,” IEEE Transactions on Knowledge and Data Engineering, Vol. 34, No. 3, pp. 1427-1445, 2020.
[32] R. Rani, and D. Lobiyal, “An extractive text summarization approach using tagged-LDA based topic modeling,” Multimedia Tools and Applications, Vol. 80, No. 1, pp. 3275-3305, 2021.
[33] R. K. Roul, S. Mehrotra, Y. Pungaliya, and J. K. Sahoo, “A new automatic multi-document text summarization using topic modeling,” Distributed Computing and Internet Technology: 15th International Conference, pp. 212-221, 2019.
[34] A. M. Rush, S. Chopra, and J. Weston, “A neural attention model for abstractive sentence summarization,” Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 379–389, 2015.
[35] T. Shi, Y. Keneshloo, N. Ramakrishnan, and C. K. Reddy, “Neural abstractive text summarization with sequence-to-sequence models,” ACM Transactions on Data Science, Vol. 2, No. 1, pp. 1-37, 2021.
[36] I. Sutherland, and K. Kiatkawsin, “Determinants of guest experience in Airbnb: a topic modeling approach using LDA,” Sustainability, Vol. 12, No. 8, pp. 3402, 2020.
[37] X. Tan, M. Zhuang, X. Lu, and T. Mao, “An analysis of the emotional evolution of large-scale internet public opinion events based on the BERT-LDA hybrid model,” IEEE Access, Vol. 9, No. 1, pp. 15860-15871, 2021.
[38] C.-F. Tsai, K. Chen, Y.-H. Hu, and W.-K. Chen, “Improving text summarization of online hotel reviews with review helpfulness and sentiment,” Tourism Management, Vol. 80, No. 1, pp. 104122, 2020.
[39] S. Tulkens, and A. van Cranenburgh, “Embarrassingly simple unsupervised aspect extraction,” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 3182–3187, 2020.
[40] A. P. Widyassari, S. Rustad, G. F. Shidik, E. Noersasongko, A. Syukur, and A. Affandy, “Review of automatic text summarization techniques & methods,” Journal of King Saud University - Computer and Information Sciences, Vol. 34, No. 4, pp. 1029-1046, 2022.
[41] Q. Xie, X. Zhang, Y. Ding, and M. Song, “Monolingual and multilingual topic analysis using LDA and BERT embeddings,” Journal of Informetrics, Vol. 14, No. 3, pp. 101055, 2020.
[42] S. Xu, H. Li, P. Yuan, Y. Wu, X. He, and B. Zhou, “Self-attention guided copy mechanism for abstractive summarization,” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1355-1362, 2020.
[43] M. Xue, “A text retrieval algorithm based on the hybrid LDA and Word2Vec model,” 2019 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), pp. 373-376, 2019.
[44] Y.-H. Hu, Y.-L. Chen, and H.-L. Chou, “Opinion mining from online hotel reviews – A text summarization approach,” Information Processing & Management, Vol. 53, No. 2, pp. 436-449, 2017.
[45] Y. Yiran, and S. Srivastava, “Aspect-based Sentiment Analysis on mobile phone reviews with LDA,” Proceedings of the 2019 4th International Conference on Machine Learning Technologies, pp. 101-105, 2019.
[46] N. Yu, M. Huang, Y. Shi, and X. Zhu, “Product review summarization by exploiting phrase properties,” Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 1113-1124, 2016.
[47] M. Zhong, P. Liu, Y. Chen, D. Wang, X. Qiu, and X. Huang, “Extractive summarization as text matching,” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6197-6208, 2020.

Full-Text Release Date: Full text not authorized for public release (off-campus network)
Full-Text Release Date: Full text not authorized for public release (National Central Library: Taiwan NDLTD system)