
Graduate Student: Yi-Wei Wang (王繹崴)
Thesis Title: HypR: A comprehensive study for ASR hypothesis revising with a reference corpus (Chinese title: 語音辨識結果修正之研究與資料集)
Advisor: Kuan-Yu Chen (陳冠宇)
Oral Examination Committee: Kuan-Yu Chen (陳冠宇), Berlin Chen (陳柏琳), Ming-Hsiang Su (蘇明祥)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 112 (ROC calendar)
Language: English
Number of Pages: 72
Keywords (Chinese, translated): hypothesis revising, candidate-sentence reranking, error correction, reference benchmark, dataset
Keywords (English): Hypothesis revising, N-best reranking, error correction, reference benchmark

Abstract (Chinese, translated):
Automatic speech recognition (ASR) has advanced considerably in recent years. Thanks to the development of end-to-end ASR, the barrier to entry has dropped substantially: preprocessed audio can be fed into the recognition system without additional linguistic knowledge or any extra conversion, and the resulting transcripts already reach respectable accuracy. Nevertheless, end-to-end systems are still some distance from being fully correct. Training on speech data alone is often insufficient to infer the right text, and words or characters with similar pronunciations can pull the output away from the context of the sentence.
To address this problem, language models are paired with end-to-end ASR models. Trained on text corpora far larger than the available speech data, a language model carries stronger linguistic knowledge and can help the system produce more reasonable and accurate outputs. This assistance takes two forms: fusion, which incorporates the language model's scores during ASR decoding, and revising, which takes the candidate sentences output by the ASR model and either rescores them or generates a new answer from them.
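To make the revising side concrete, a reranker typically interpolates each candidate's ASR score with a language-model score and keeps the best-scoring hypothesis, i.e. score(y) = log P_ASR(y|x) + λ · log P_LM(y). The following is a minimal sketch of this idea; the (text, log-probability) input format, the lm_log_prob scorer, and the weight lam are illustrative assumptions, not the exact formulation used in the thesis.

    # Minimal N-best rescoring sketch: combine the ASR score of each
    # candidate with a language-model score and keep the argmax.
    from typing import Callable, List, Tuple

    def rescore_nbest(
        hypotheses: List[Tuple[str, float]],   # (text, ASR log-probability)
        lm_log_prob: Callable[[str], float],   # stand-in for any LM scorer
        lam: float = 0.3,                      # interpolation weight (assumed)
    ) -> str:
        # score(y) = log P_ASR(y | x) + lam * log P_LM(y)
        scored = [(asr + lam * lm_log_prob(text), text)
                  for text, asr in hypotheses]
        return max(scored)[1]

Error correction models, by contrast, treat the hypothesis as a noisy source sentence and generate a corrected target sequence, so they can produce words that never appear in the candidate list.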
This thesis focuses on revising. Many related methods have been proposed and have further improved recognition results, but past experiments were mostly run on each study's own dataset. Because these datasets were generated in different ways, their distributions differ, which makes it hard to compare the benefits and differences of the methods: both the training data and the hypotheses being revised vary from study to study.
In this thesis, we release HypR, a dataset for ASR reranking and revising, and run many recent reranking and correction methods on it for comparison. These methods use different scoring schemes, or feed different input formats into the revision model, in order to select a more accurate sentence from the candidates or to generate an answer with a lower error rate. In our experiments, we compare and analyze whether these scoring schemes and input formats, combined with pretrained models such as GPT-2, BERT, and BART, help with error revision. Across all experiments, PBERT achieves the best performance, and on top of it we exploit the information from the individual methods to obtain a further improvement. Finally, we publish HypR on a public platform so that researchers working on speech recognition can use this open dataset for fair comparisons.


Abstract (English):
With the development of deep learning, automatic speech recognition (ASR) has made significant progress. To further enhance the performance, revising recognition results is one of the lightweight but efficient manners. Various methods can be roughly classified into N-best reranking methods and error correction models. The former aims to select the hypothesis with the lowest error rate from a set of candidates generated by ASR for a given input speech. The latter focuses on detecting recognition errors in a given hypothesis and correcting these errors to obtain an enhanced result. However, we observe that these studies are hardly comparable to each other, as they are usually evaluated on different corpora, paired with different ASR models, and even use different datasets to train the models. Accordingly, we first concentrate on releasing an ASR hypothesis revising (HypR) dataset in this study. HypR contains several commonly used corpora (AISHELL-1, AISHELL-2, TED-LIUM 2, LibriSpeech and CSJ) and provides 50 recognition hypotheses for each speech utterance. The checkpoint models of the ASR systems are also published. We hope the publicly available HypR dataset can become a reference benchmark for subsequent research and push this line of research to a more advanced level.
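Because HypR pairs each utterance with a reference transcript and 50 hypotheses, the natural first measurements on it are the top-1 word error rate and the oracle (best-in-list) error rate, which bound what any reranker can achieve. The sketch below computes both; the JSONL layout with "reference" and "hypotheses" fields and the file name hypr_test.jsonl are guesses at how such a corpus might be stored, not HypR's actual schema.

    # Sketch: top-1 vs. oracle WER over an N-best list.
    import json

    def wer(ref: str, hyp: str) -> float:
        r, h = ref.split(), hyp.split()
        d = list(range(len(h) + 1))            # Levenshtein DP over words
        for i, rw in enumerate(r, 1):
            prev, d[0] = d[0], i
            for j, hw in enumerate(h, 1):
                prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1,
                                       prev + (rw != hw))
        return d[len(h)] / max(len(r), 1)

    top1 = oracle = n = 0.0
    with open("hypr_test.jsonl") as f:         # hypothetical file name
        for line in f:
            ex = json.loads(line)
            top1 += wer(ex["reference"], ex["hypotheses"][0])
            oracle += min(wer(ex["reference"], h) for h in ex["hypotheses"])
            n += 1
    print(f"top-1 WER {top1 / n:.3f}, oracle WER {oracle / n:.3f}")

The gap between the two numbers is exactly the headroom that reranking methods compete for.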
In addition, we implement and compare several classic and representative methods, showing the recent research progress in revising speech recognition results. We compare different ASR revising methods that utilize classic pretrained models such as GPT-2, BERT and BART. These methods revise answers in different manners and are categorized in this thesis as ASR reranking or ASR correction according to how the revision is conducted. We further divide the reranking methods into token-level, sentence-level and comparison-based methods by their scoring pattern. On the other hand, we test the impact of the input format on the correction model and try to clarify whether providing top-N information leads the BART model to generate a more appropriate answer than providing a single hypothesis only.
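The token-level and sentence-level scoring patterns can be illustrated with the two small scorers below, written against the Hugging Face transformers API: an autoregressive GPT-2 score that sums log P(w_t | w_<t), and a BERT pseudo-log-likelihood that masks one position at a time. The model names and the choice to sum (rather than average) log-probabilities are illustrative assumptions, not the thesis's exact configuration.

    # Two hypothesis-scoring styles: causal (token-level) and
    # masked pseudo-log-likelihood (sentence-level).
    import torch
    from transformers import (AutoModelForCausalLM, AutoModelForMaskedLM,
                              AutoTokenizer)

    gpt_tok = AutoTokenizer.from_pretrained("gpt2")
    gpt = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

    @torch.no_grad()
    def causal_score(sentence: str) -> float:
        # Sum of log P(w_t | w_<t); HF returns the mean NLL as `loss`.
        ids = gpt_tok(sentence, return_tensors="pt").input_ids
        return -gpt(ids, labels=ids).loss.item() * (ids.size(1) - 1)

    @torch.no_grad()
    def pseudo_score(sentence: str) -> float:
        # Mask each position in turn and score the original token.
        ids = bert_tok(sentence, return_tensors="pt").input_ids[0]
        total = 0.0
        for i in range(1, ids.size(0) - 1):    # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[i] = bert_tok.mask_token_id
            logits = bert(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
        return total

Comparison-based rerankers do not reduce to a per-sentence score like these two functions; they instead learn to judge pairs of hypotheses against each other.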
Besides, we also investigate the ability of large language models in the ASR revising domain. We utilize ChatGPT to rerank or correct the ASR hypotheses by designing a zero-shot prompt that contains the 10-best hypotheses. Finally, after reproducing these methods, we find that PBERT surprisingly outperforms the other revising methods on our HypR benchmark, even though it undergoes a simple training process. On top of that, we add an extra module to PBERT to supply more information and further improve the accuracy of revision.
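The zero-shot setup can be sketched as follows with the OpenAI Python client; the prompt wording, the toy hypothesis list, and the model name are placeholders rather than the prompt actually used in the thesis.

    # Sketch of zero-shot LLM revising over an N-best list.
    from openai import OpenAI

    hypotheses = ["it is nice whether today",  # toy N-best list (assumed);
                  "it is nice weather today"]  # the thesis uses 10 entries

    prompt = (
        "Below are speech recognition hypotheses for one utterance, "
        "ordered from most to least likely. Output only the corrected "
        "transcription.\n"
        + "\n".join(f"{i + 1}. {h}" for i, h in enumerate(hypotheses))
    )

    client = OpenAI()                          # reads OPENAI_API_KEY
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",                 # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    print(resp.choices[0].message.content)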

Table of Contents:
Recommendation Letter
Approval Letter
Abstract in Chinese
Abstract in English
Acknowledgements
Contents
List of Figures
List of Tables
List of Algorithms
1 Introduction
2 Related Work
 2.1 Automatic Speech Recognition
 2.2 Language Model
 2.3 Transformer
 2.4 Pretraining
  2.4.1 GPT-2
  2.4.2 BERT
  2.4.3 BART
 2.5 N-Best Reranking and ASR Correction
  2.5.1 N-Best Reranking
  2.5.2 Error Correction Methods
3 Dataset Introduction
 3.1 Datasets
 3.2 ASR Architecture
4 Experiment
 4.1 Experimental Settings
  4.1.1 N-Best Reranking Method
  4.1.2 Correction
  4.1.3 Large Language Model
  4.1.4 Proposed Method
 4.2 Performance Analysis
 4.3 Efficiency Analysis
5 Conclusions
 5.1 Future Work
References


Full text not available for download.
Full-text release date: 2024/11/15 (campus network)
Full-text release date: 2025/11/15 (off-campus network)
Full-text release date: 2025/11/15 (National Central Library: Taiwan NDLTD system)