
Graduate Student: Ye Chen (陳曄)
Thesis Title: An Effective Distilled ESIM Model from BERT for Response Selection (Chinese title: 一種適用於對話選擇的從 BERT 蒸餾知識至 ESIM 模型的方法)
Advisor: Yung-Ho Leu (呂永和)
Examination Committee: Wei-Ning Yang (楊維寧), Yun-Shiow Chen (陳雲岫)
Degree: Master
Department: Department of Information Management, School of Management
Year of Publication: 2021
Graduation Academic Year: 109 (ROC calendar)
Language: English
Number of Pages: 25
Chinese Keywords: 對話選擇 (Response Selection), 聊天機器人 (Chatbot), 知識蒸餾 (Knowledge Distillation)
English Keywords: Response Selection, Knowledge Distillation, Chatbot
Access Statistics: 216 views, 51 downloads
Chinese Abstract (translated): This thesis focuses on the Multi-Turn Response Selection task. In recent years, pre-trained language models such as BERT (Devlin, Chang, Lee, & Toutanova, 2019) have achieved remarkable results on downstream tasks, so incorporating pre-trained models into response selection has become a problem worth studying. Although BERT is highly effective, it suffers from a cumbersome model and slow inference, and some researchers have therefore tried to compress BERT through knowledge distillation. This thesis uses knowledge distillation to transfer BERT's knowledge into ESIM (Chen & Wang, 2019), aiming for a lighter model with faster inference and only a modest drop in performance. According to the experimental results on Ubuntu Corpus V2 (Lowe et al., 2017), the model obtained with our method improves R10@1 by 3.2% over the original ESIM, reaches 92% of the teacher model's performance while being 55% smaller, and scores on par with DistilBERT (Sanh, Debut, Chaumond, & Wolf, 2019) while being smaller and faster at inference.


English Abstract: In this paper, we focus on Multi-Turn Response Selection. Pre-trained language models like BERT have achieved impressive results on many downstream natural language processing (NLP) tasks in recent years. Although BERT achieves strong performance in NLP, it suffers from a cumbersome model and long inference time. Therefore, some researchers have investigated how to compress the BERT model into a lighter one. In this paper, we leverage knowledge distillation to distill the BERT model into an ESIM model for Multi-Turn Response Selection. The resulting (distilled) ESIM model improves R10@1 by 3.2% over the original ESIM model (Chen & Wang, 2019) on the Ubuntu Corpus V2 dataset and performs comparably to the DistilBERT model (Sanh et al., 2019). Furthermore, the distilled ESIM model is 11× faster and 55% lighter than the teacher BERT model while retaining 92% of its performance.
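To make the setup described in the abstracts concrete, the sketch below shows the standard soft-target distillation objective of Hinton, Vinyals, and Dean (2015), which this line of work builds on. It is a minimal illustration under that common formulation, not the thesis's actual code; the function name, temperature, and weighting factor alpha are assumptions for the example.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Soft-target knowledge distillation loss (Hinton et al., 2015 style)."""
    # Soften both output distributions with the same temperature T.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # Scale the KL term by T^2 so its gradient magnitude stays comparable
    # to the hard-label cross-entropy term.
    kd_loss = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    ce_loss = F.cross_entropy(student_logits, labels)

    # alpha trades off imitating the teacher against fitting the gold labels.
    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```

In the response-selection setting, the logits would be the matching scores produced by the teacher (BERT) and the student (ESIM) for each context-response pair.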
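R10@1, quoted in both abstracts, is the usual retrieval metric on the Ubuntu corpus: each context is paired with 10 candidate responses, and the model scores a hit when the true response appears among its top-k ranked candidates. Below is a minimal sketch of that computation, assuming the common convention that the ground-truth response occupies candidate index 0; the function name and data layout are illustrative, not taken from the thesis.

```python
import torch

def recall_at_k(scores: torch.Tensor, k: int = 1) -> float:
    """Compute R_n@k for a batch of ranking problems.

    `scores` has shape (num_contexts, n_candidates); by convention here the
    ground-truth response sits at column 0 of every row.
    """
    # Indices of the k highest-scoring candidates for each context.
    _, top_k = scores.topk(k, dim=-1)
    # A context counts as a hit if the true response (index 0) is among them.
    hits = (top_k == 0).any(dim=-1).float()
    return hits.mean().item()

# Example: 2 contexts, 10 candidates each; R_10@1 here is 0.5.
example = torch.randn(2, 10)
example[0, 0] = example[0].max() + 1.0   # true response ranked first
example[1, 0] = example[1].min() - 1.0   # true response ranked last
print(recall_at_k(example, k=1))
```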

1 Introduction
2 Related Work
3 Our Approach
4 Experiment
5 Conclusions
6 References

Bucilua, C., Caruana, R., & Niculescu-Mizil, A. (2006). Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 535–541).
Chen, Q., & Wang, W. (2019). Sequential attention-based network for noetic end-to-end response selection. arXiv preprint arXiv:1901.02609.
Chen, Q., Zhu, X., Ling, Z.-H., Wei, S., Jiang, H., & Inkpen, D. (2017, July). Enhanced LSTM for natural language inference. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1657–1668). Vancouver, Canada: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/P17-1152 doi: 10.18653/v1/P17-1152
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019, June). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4171–4186). Minneapolis, Minnesota: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/N19-1423 doi: 10.18653/v1/N19-1423
Gu, J.-C., Ling, Z.-H., & Liu, Q. (2019). Interactive matching network for multi-turn response selection in retrieval-based chatbots. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (pp. 2321–2324).
Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
Jiao, X., Yin, Y., Shang, L., Jiang, X., Chen, X., Li, L., … Liu, Q. (2019). TinyBERT: Distilling BERT for natural language understanding. arXiv preprint arXiv:1909.10351.
Jurafsky, D., & Martin, J. H. (2018). Speech and language processing (draft, in preparation). Available from https://web.stanford.edu/~jurafsky/slp3 [cited 2020 June 1].
Kadlec, R., Schmid, M., & Kleindienst, J. (2015). Improved deep learning baselines for Ubuntu corpus dialogs. arXiv preprint arXiv:1510.03753.
Li, F.-L., Qiu, M., Chen, H., Wang, X., Gao, X., Huang, J., … others (2017). AliMe Assist: An intelligent assistant for creating an innovative e-commerce experience. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (pp. 2495–2498).
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., … Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Lowe, R., Pow, N., Serban, I. V., Charlin, L., Liu, C.-W., & Pineau, J. (2017). Training end-to-end dialogue systems with the Ubuntu Dialogue Corpus. Dialogue & Discourse, 8(1), 31–65.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., … Lerer, A. (2017). Automatic differentiation in PyTorch.
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
Sato, S., Akama, R., Ouchi, H., Suzuki, J., & Inui, K. (2020, July). Evaluating dialogue generation systems via response selection. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 593–599). Online: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/2020.acl-main.55 doi: 10.18653/v1/2020.acl-main.55
Shum, H.-Y., He, X.-D., & Li, D. (2018). From Eliza to XiaoIce: challenges and opportunities with social chatbots. Frontiers of Information Technology & Electronic Engineering, 19(1), 10–26.
Tang, R., Lu, Y., Liu, L., Mou, L., Vechtomova, O., & Lin, J. (2019). Distilling task-specific knowledge from BERT into simple neural networks. arXiv preprint arXiv:1903.12136.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998–6008).
Weizenbaum, J. (1966). ELIZA—a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45.
Whang, T., Lee, D., Lee, C., Yang, K., Oh, D., & Lim, H. (2019). An effective domain adaptive post-training method for BERT in response selection. arXiv preprint arXiv:1908.04812.
Wu, S., Jiang, Y., Wang, X., Miao, W., Zhao, Z., Xie, J., & Li, M. (2020). Enhancing response selection with advanced context modeling and post-training. Association for the Advancement of Artificial Intelligence.
Wu, Y., Wu, W., Xing, C., Zhou, M., & Li, Z. (2016). Sequential matching network: A new architecture for multi-turn response selection in retrieval-based chatbots. arXiv preprint arXiv:1612.01627.
Xu, R., Tao, C., Jiang, D., Zhao, X., Zhao, D., & Yan, R. (2020). Learning an effective context-response matching model with self-supervised tasks for retrieval-based dialogues. arXiv preprint arXiv:2009.06265.
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., & Le, Q. V. (2019). XLNet: Generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237.
Zhou, X., Li, L., Dong, D., Liu, Y., Chen, Y., Zhao, W. X., … Wu, H. (2018, July). Multi-turn response selection for chatbots with deep attention matching network. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1118–1127). Melbourne, Australia: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/P18-1103 doi: 10.18653/v1/P18-1103
