Graduate student: 陳曄 (Ye Chen)
Thesis title: An Effective Distilled ESIM Model from BERT for Response Selection (一種適用於對話選擇的從 BERT 蒸餾知識至 ESIM 模型的方法)
Advisor: 呂永和 (Yung-Ho Leu)
Oral defense committee: 楊維寧 (Wei-Ning Yang), Yun-Shiow Chen
Degree: Master's
Department: College of Management - Department of Information Management
Publication year: 2021
Academic year of graduation: 109 (ROC calendar, 2020-2021)
Language: English
Pages: 25
Chinese keywords: 對話選擇, 聊天機器人, 知識蒸餾
English keywords: Response Selection, Knowledge Distillation, Chatbot
  • This thesis studies multi-turn response selection. In recent years, pre-trained language models such as BERT (Devlin, Chang, Lee, & Toutanova, 2019) have achieved impressive results on many downstream natural language processing (NLP) tasks, which makes incorporating them into response selection a question worth studying. Despite its strong performance, BERT is a cumbersome model with slow inference, so researchers have investigated compressing it into a lighter model through knowledge distillation. We use knowledge distillation to transfer the knowledge of a BERT teacher into an ESIM model (Chen & Wang, 2019) for multi-turn response selection, aiming for a lighter, faster model without a large loss in performance. On the Ubuntu Corpus V2 dataset (Lowe et al., 2017), the distilled ESIM model improves R10@1 by 3.2% over the original ESIM model. It is 11× faster and 55% smaller than the teacher BERT model while retaining 92% of the teacher's performance. Its performance is comparable to that of DistilBERT (Sanh, Debut, Chaumond, & Wolf, 2019), with a smaller model and faster inference.
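The abstract names two technical ingredients without spelling them out: a knowledge distillation objective and the R10@1 metric. The sketch below illustrates both under stated assumptions. The abstract does not specify the loss used in the thesis, so `distillation_loss` shows one common soft-target formulation (temperature-softened teacher distribution blended with the hard-label loss); the function names and the `temperature` and `alpha` parameters are hypothetical and chosen only for illustration. `recall_at_k` follows the usual reading of R10@1: for each context, the model scores 10 candidate responses, and the example counts as a hit if the true response is ranked first.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum for s in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Assumed soft-target distillation loss (not taken from the thesis).

    Blends (a) cross-entropy between the temperature-softened teacher and
    student distributions with (b) ordinary cross-entropy on the gold label.
    """
    teacher_probs = [math.exp(lp) for lp in log_softmax(teacher_logits, temperature)]
    student_log_probs = log_softmax(student_logits, temperature)
    soft = -sum(pt * ls for pt, ls in zip(teacher_probs, student_log_probs))
    hard = -log_softmax(student_logits)[label]
    # The temperature**2 factor keeps the soft-term gradient scale comparable
    # across temperatures; this scaling is a common convention, assumed here.
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard


def recall_at_k(scores, positive_index, k=1):
    """Return 1.0 if the true response is among the top-k scored candidates."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return 1.0 if positive_index in ranked[:k] else 0.0


def mean_recall_at_k(examples, k=1):
    """Average recall over (scores, positive_index) pairs; R10@1 when each
    example has 10 candidates and k=1."""
    return sum(recall_at_k(s, p, k) for s, p in examples) / len(examples)
```

With 10 candidates per context and `k=1`, `mean_recall_at_k` gives the fraction of contexts for which the true response is ranked first, which is the quantity reported as R10@1.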

    1 Introduction
    2 Related Work
    3 Our Approach
    4 Experiment
    5 Conclusions
    6 References

