
Graduate Student: 張維哲 (Wei-Zhe Chang)
Thesis Title (Chinese): HCFSum: 運用階層式對比學習與過濾策略於文件摘要之研究
Thesis Title (English): HCFSum: A Hierarchical Contrastive Learning Framework with Filtering Strategy for Document Summarization
Advisor: 陳冠宇 (Kuan-Yu Chen)
Committee Members: 曾厚強 (Hou-Chiang Tseng), 張智傑 (Chih-Chieh Chang)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 111 (2022-2023)
Language: English
Number of Pages: 54
Keywords: Document Summarization, Contrastive Learning, Post-Processing Mechanism, Extractive Summarization, Noise Filtering

Abstract:
Extractive text summarization is the task of generating a concise, information-rich summary from a source document. To do this well, a model must identify the essential parts of the document while avoiding excessive information overlap among the selected parts. Binary cross-entropy loss (BCELoss) has become the standard criterion for training sentence-level summarization models because it casts extraction as classifying whether each individual sentence should be selected. In addition, several studies have demonstrated the benefit of additionally using contrastive objectives and sets of candidate (hypothesis) summaries during training.
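As a point of reference for the training criterion mentioned above, the following is a minimal sketch, in PyTorch, of how a sentence-level extractive summarizer is typically trained with BCELoss. The tensor shapes, the scoring layer, and the oracle labels are illustrative assumptions, not code from the thesis.

import torch
import torch.nn as nn

# Toy sentence-level scoring head: one logit per document sentence (illustrative only).
sentence_embeddings = torch.randn(8, 768)         # 8 sentences, 768-dim encoder outputs
scorer = nn.Linear(768, 1)                        # stand-in for the summarizer's scoring layer
logits = scorer(sentence_embeddings).squeeze(-1)  # shape: (8,)

# Oracle labels: 1 if the sentence belongs to the oracle extract, 0 otherwise.
oracle_labels = torch.tensor([1., 0., 1., 0., 0., 1., 0., 0.])

criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, oracle_labels)
loss.backward()

# At inference, the top-k highest-scoring sentences are extracted. Because each sentence
# is classified independently, the criterion ignores how sentences combine into a summary,
# which is the limitation the hierarchical contrastive framework targets.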

This thesis first reviews progress in extractive text summarization, covering notable sentence-level summarizers as well as systems that integrate re-ranking capabilities. Motivated by a limitation of BCELoss, which can make it difficult for sentence-level models to select strong leading sentences, we propose a novel hierarchical contrastive training framework. The framework guides the model's learning direction and enables it to score each sentence from a summary-level perspective. We further propose a cost-effective yet powerful post-processing technique that mitigates the influence of noisy sentences on the scores of important sentences, allowing the model to produce more robust summaries. Extensive experiments on the CNN/DailyMail and XSUM datasets show clear improvements over state-of-the-art systems, and comprehensive ablation studies and analyses validate the effectiveness of the proposed method.
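This record does not spell out the hierarchical contrastive objective itself, so the sketch below only illustrates the generic summary-level margin ranking loss used in earlier re-ranking work such as MATCHSUM: candidate summaries are ordered by quality (e.g., ROUGE against the reference), and a better candidate is pushed to score higher than a worse one by a rank-dependent margin. All names are assumptions for illustration; the thesis's sequence-level and summary-level losses may differ in detail.

import torch
import torch.nn.functional as F

def summary_level_contrastive_loss(candidate_scores: torch.Tensor,
                                   margin: float = 0.01) -> torch.Tensor:
    """Pairwise margin loss over candidate summaries sorted from best to worst.

    candidate_scores: model scores for each candidate, shape (num_candidates,).
    """
    loss = candidate_scores.new_zeros(())
    n = candidate_scores.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            # A higher-ranked candidate should outscore a lower-ranked one,
            # with a margin that grows with the gap in rank.
            loss = loss + F.relu(candidate_scores[j] - candidate_scores[i] + margin * (j - i))
    return loss

# Toy usage: scores for four candidate summaries, best candidate first.
scores = torch.tensor([0.8, 0.6, 0.7, 0.2], requires_grad=True)
summary_level_contrastive_loss(scores).backward()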

Table of Contents:
1 Introduction . . . 1
1.1 Motivation . . . 1
1.2 Summary of Contribution . . . 4
2 Related Work . . . 6
2.1 Category of Summarization Method . . . 6
2.2 Sentence-Level Summarizer . . . 7
2.2.1 NEUSUM . . . 7
2.2.2 BERTSUM . . . 11
2.2.3 StarSum . . . 12
2.2.4 RankSum . . . 16
2.3 Two-Stage Summarizer . . . 19
2.3.1 MATCHSUM . . . 19
2.4 One-Stage Summarizer . . . 21
2.4.1 CoLo . . . 21
3 Method . . . 24
3.1 Preliminaries . . . 24
3.2 Model Structure . . . 24
3.3 Binary Cross Entropy Loss . . . 25
3.4 Sequence-Level Contrastive Loss . . . 26
3.5 Summary-Level Contrastive Loss . . . 27
3.6 Final Objective . . . 28
3.7 Post-processing Strategy: FWPRC . . . 28
4 Experiments . . . 31
4.1 Corpus . . . 31
4.2 Training Config . . . 31
4.3 Experiments Results . . . 32
4.4 Ablation Study . . . 35
4.4.1 Analysis of Impact with Varied Backbone . . . 35
4.4.2 Analysis of Impact with Varied Angle Inputs . . . 37
4.4.3 Analysis of Impact with Different Interpolate Weight . . . 38
4.5 Human Evaluation . . . 39
4.6 Highlight Comparison . . . 40
5 Conclusions . . . 45
References . . . 46
Letter of Authorization . . . 55

