
Graduate Student: Ye Chen (陳曄)
Thesis Title: An Effective Distilled ESIM Model from BERT for Response Selection (Chinese title: 一種適用於對話選擇的從 BERT 蒸餾知識至 ESIM 模型的方法)
Advisor: Yung-Ho Leu (呂永和)
Examination Committee: Wei-Ning Yang (楊維寧), Yun-Shiow Chen (陳雲岫)
Degree: Master
Department: Department of Information Management, School of Management
Year of Publication: 2021
Graduation Academic Year: 109 (ROC calendar)
Language: English
Number of Pages: 25
Chinese Keywords: 對話選擇 (Response Selection), 聊天機器人 (Chatbot), 知識蒸餾 (Knowledge Distillation)
English Keywords: Response Selection, Knowledge Distillation, Chatbot
Access Statistics: 216 views, 51 downloads
Chinese Abstract (translated): This thesis focuses on the Multi-Turn Response Selection task. In recent years, pre-trained language models such as BERT (Devlin, Chang, Lee, & Toutanova, 2019) have achieved remarkable results on downstream tasks, so incorporating pre-trained models into response selection has become a problem worth studying. Although BERT is highly effective, it suffers from a cumbersome model and slow inference, and some researchers have therefore tried to compress BERT through knowledge distillation. This thesis uses knowledge distillation to transfer BERT's knowledge into ESIM (Chen & Wang, 2019), aiming for a lighter model with faster inference and only a modest drop in performance. According to the experimental results on Ubuntu Corpus V2 (Lowe et al., 2017), the model obtained with our method improves R10@1 by 3.2% over the original ESIM, reaches 92% of the teacher model's performance while being 55% smaller, and scores on par with DistilBERT (Sanh, Debut, Chaumond, & Wolf, 2019) while being smaller and faster at inference.


English Abstract: In this paper, we focus on Multi-Turn Response Selection. Pre-trained language models like BERT have achieved impressive results on many downstream natural language processing (NLP) tasks in recent years. Although BERT achieves strong performance in NLP, it suffers from a cumbersome model and long inference time. Therefore, some researchers have investigated how to compress the BERT model into a lighter one. In this paper, we leverage knowledge distillation to distill the BERT model into an ESIM model for Multi-Turn Response Selection. The resulting (distilled) ESIM model improves R10@1 by 3.2% over the original ESIM model (Chen & Wang, 2019) on the Ubuntu Corpus V2 dataset and performs comparably to the DistilBERT model (Sanh et al., 2019). Furthermore, the distilled ESIM model is 11× faster and 55% lighter than the teacher BERT model while retaining 92% of its performance.
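To make the setup described in the abstracts concrete, the sketch below shows the standard soft-target distillation objective of Hinton, Vinyals, and Dean (2015), which this line of work builds on. It is a minimal illustration under that common formulation, not the thesis's actual code; the function name, temperature, and weighting factor alpha are assumptions for the example.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Soft-target knowledge distillation loss (Hinton et al., 2015 style)."""
    # Soften both output distributions with the same temperature T.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # Scale the KL term by T^2 so its gradient magnitude stays comparable
    # to the hard-label cross-entropy term.
    kd_loss = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    ce_loss = F.cross_entropy(student_logits, labels)

    # alpha trades off imitating the teacher against fitting the gold labels.
    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```

In the response-selection setting, the logits would be the matching scores produced by the teacher (BERT) and the student (ESIM) for each context-response pair.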
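R10@1, quoted in both abstracts, is the usual retrieval metric on the Ubuntu corpus: each context is paired with 10 candidate responses, and the model scores a hit when the true response appears among its top-k ranked candidates. Below is a minimal sketch of that computation, assuming the common convention that the ground-truth response occupies candidate index 0; the function name and data layout are illustrative, not taken from the thesis.

```python
import torch

def recall_at_k(scores: torch.Tensor, k: int = 1) -> float:
    """Compute R_n@k for a batch of ranking problems.

    `scores` has shape (num_contexts, n_candidates); by convention here the
    ground-truth response sits at column 0 of every row.
    """
    # Indices of the k highest-scoring candidates for each context.
    _, top_k = scores.topk(k, dim=-1)
    # A context counts as a hit if the true response (index 0) is among them.
    hits = (top_k == 0).any(dim=-1).float()
    return hits.mean().item()

# Example: 2 contexts, 10 candidates each; R_10@1 here is 0.5.
example = torch.randn(2, 10)
example[0, 0] = example[0].max() + 1.0   # true response ranked first
example[1, 0] = example[1].min() - 1.0   # true response ranked last
print(recall_at_k(example, k=1))
```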

1 Introduction
2 Related Work
3 Our Approach
4 Experiment
5 Conclusions
6 References

Bucilua, C., Caruana, R., & Niculescu-Mizil, A. (2006). Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 535–541).
Chen, Q., & Wang, W. (2019). Sequential attention-based network for noetic end-to-end response selection. arXiv preprint arXiv:1901.02609.
Chen, Q., Zhu, X., Ling, Z.-H., Wei, S., Jiang, H., & Inkpen, D. (2017, July). Enhanced LSTM for natural language inference. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1657–1668). Vancouver, Canada: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/P17-1152 doi: 10.18653/v1/P17-1152
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019, June). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4171–4186). Minneapolis, Minnesota: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/N19-1423 doi: 10.18653/v1/N19-1423
Gu, J.-C., Ling, Z.-H., & Liu, Q. (2019). Interactive matching network for multi-turn response selection in retrieval-based chatbots. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (pp. 2321–2324).
Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
Jiao, X., Yin, Y., Shang, L., Jiang, X., Chen, X., Li, L., … Liu, Q. (2019). TinyBERT: Distilling BERT for natural language understanding. arXiv preprint arXiv:1909.10351.
Jurafsky, D., & Martin, J. H. (2018). Speech and language processing (draft, in preparation). Available from https://web.stanford.edu/~jurafsky/slp3 [cited 2020 June 1].
Kadlec, R., Schmid, M., & Kleindienst, J. (2015). Improved deep learning baselines for Ubuntu corpus dialogs. arXiv preprint arXiv:1510.03753.
Li, F.-L., Qiu, M., Chen, H., Wang, X., Gao, X., Huang, J., … others (2017). AliMe Assist: An intelligent assistant for creating an innovative e-commerce experience. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (pp. 2495–2498).
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., … Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Lowe, R., Pow, N., Serban, I. V., Charlin, L., Liu, C.-W., & Pineau, J. (2017). Training end-to-end dialogue systems with the Ubuntu Dialogue Corpus. Dialogue & Discourse, 8(1), 31–65.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., … Lerer, A. (2017). Automatic differentiation in PyTorch.
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
Sato, S., Akama, R., Ouchi, H., Suzuki, J., & Inui, K. (2020, July). Evaluating dialogue generation systems via response selection. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 593–599). Online: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/2020.acl-main.55 doi: 10.18653/v1/2020.acl-main.55
Shum, H.-Y., He, X.-D., & Li, D. (2018). From Eliza to XiaoIce: challenges and opportunities with social chatbots. Frontiers of Information Technology & Electronic Engineering, 19(1), 10–26.
Tang, R., Lu, Y., Liu, L., Mou, L., Vechtomova, O., & Lin, J. (2019). Distilling task-specific knowledge from BERT into simple neural networks. arXiv preprint arXiv:1903.12136.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998–6008).
Weizenbaum, J. (1966). ELIZA—a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45.
Whang, T., Lee, D., Lee, C., Yang, K., Oh, D., & Lim, H. (2019). An effective domain adaptive post-training method for BERT in response selection. arXiv preprint arXiv:1908.04812.
Wu, S., Jiang, Y., Wang, X., Miao, W., Zhao, Z., Xie, J., & Li, M. (2020). Enhancing response selection with advanced context modeling and post-training. Association for the Advancement of Artificial Intelligence.
Wu, Y., Wu, W., Xing, C., Zhou, M., & Li, Z. (2016). Sequential matching network: A new architecture for multi-turn response selection in retrieval-based chatbots. arXiv preprint arXiv:1612.01627.
Xu, R., Tao, C., Jiang, D., Zhao, X., Zhao, D., & Yan, R. (2020). Learning an effective context-response matching model with self-supervised tasks for retrieval-based dialogues. arXiv preprint arXiv:2009.06265.
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., & Le, Q. V. (2019). XLNet: Generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237.
Zhou, X., Li, L., Dong, D., Liu, Y., Chen, Y., Zhao, W. X., … Wu, H. (2018, July). Multi-turn response selection for chatbots with deep attention matching network. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1118–1127). Melbourne, Australia: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/P18-1103 doi: 10.18653/v1/P18-1103
