Graduate Student: 周孝威 Hsiao-Wei Chou
Thesis Title: 人類與大語言模型的摘要能力比較及關鍵字向量長度訓練方法的雙重研究 (A Dual Research on the Comparative Summarization Abilities of Humans and Large Language Models and the Training Method Using Keyword Vector Length)
Advisor: 陳冠宇 Kuan-Yu Chen
Committee Members: 曾厚強 Hou-Chiang Tseng, 蘇明祥 Ming-Hsiang Su
Degree: Master (碩士)
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2024
Academic Year: 112
Language: English
Pages: 116
Chinese Keywords: 眾包實驗, 大語言模型, 預訓練, 文件摘要, 關鍵字訓練
Keywords: Crowdsourcing Experiment, Large Language Model, Pre-training, Document Summarization, Keyword-based Training
Text summarization, the condensation of lengthy text into its key information, has long played an important role in human development. From ancient handwritten books to the modern explosion of digital information, summarization has not only improved the efficiency of information processing but also promoted the dissemination and exchange of knowledge. In recent years, large language models have attracted growing attention thanks to their convenience and versatility. Many people now prefer feeding articles into these models to obtain summaries rather than using conventional local models, which has put research on conventional local models under increasing pressure, because human-written summaries struggle to compete with summaries generated by large language models. This thesis therefore investigates this phenomenon in depth through experiments on crowdsourcing platforms such as Amazon Mechanical Turk and Upwork. We find that if the human-written summaries used to train a conventional local model are replaced with summaries generated by large language models, the local model's generation ability can surpass that of the large language models themselves. Building on this finding, we propose a new summarization method, "Adaptive-WordRank", which trains the model using article keywords; evaluations across multiple metrics and several datasets show that our method is effective. By combining the results of these two studies, we hope to bring new progress and change to the field of text summarization.
Text summarization has played a crucial role throughout human history, from ancient handwritten books to the modern digital era, improving information processing efficiency and facilitating knowledge exchange. Recently, large language models have gained popularity for summarization tasks due to their convenience and versatility, with many users preferring them over conventional local models. This trend has impacted research on conventional local models, as human-written summaries struggle to compete with those generated by large language models. Our study explores this phenomenon through experiments on crowdsourcing platforms such as Amazon Mechanical Turk and Upwork. Our research reveals that training conventional local models with summaries generated by large language models can lead to the former surpassing the generation capability of the large language models themselves. Based on this, we propose a new summarization method called "Adaptive-WordRank", which trains the model using the article's keywords. This approach has proven effective across multiple metrics and datasets. By combining these research findings, we hope to bring new advancements and transformation to the field of text summarization.
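The abstract states only that Adaptive-WordRank "trains the model using the article's keywords" without giving the training objective. Purely as an illustrative sketch, and not the thesis's actual formulation, one common way to let keyword information steer a summarizer is to up-weight keyword tokens in the token-level training loss; the function name `keyword_weighted_loss`, the weighting scheme, and the default weight below are all assumptions for illustration:

```python
def keyword_weighted_loss(token_log_probs, is_keyword, keyword_weight=2.0):
    """Weighted negative log-likelihood over a reference summary.

    Hypothetical sketch: tokens flagged as article keywords (e.g. selected
    beforehand by TF-IDF or TextRank) contribute `keyword_weight` times as
    much to the loss as ordinary tokens, nudging the model toward
    reproducing the article's key terms.
    """
    total = 0.0  # accumulated weighted negative log-likelihood
    norm = 0.0   # sum of weights, so we return a weighted mean
    for log_prob, keyword in zip(token_log_probs, is_keyword):
        weight = keyword_weight if keyword else 1.0
        total += -weight * log_prob
        norm += weight
    return total / norm

# Toy example: the middle token is a keyword the model currently
# assigns low probability to, so it dominates the weighted loss.
loss = keyword_weighted_loss([-0.1, -2.0, -0.3], [False, True, False])
```

In practice such a weighting would be applied to the per-token log-probabilities produced by a sequence-to-sequence model during fine-tuning; with `keyword_weight=1.0` it reduces to the ordinary mean negative log-likelihood.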