
Graduate student: Zhi-Xiang Zhuang (莊智翔)
Thesis title: A Study on Legal Case Retrieval Using Pre-Trained Language Models (使用預訓練語言模型進行法律案例檢索之研究)
Advisor: Chin-Shyurng Fahn (范欽雄)
Committee members: Yen-Lin Chen (陳彥霖), Yi-Ling Chen (陳怡伶), Wei-Min Jeng (鄭為民)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2022
Graduation academic year: 110 (ROC calendar, 2021-2022)
Language: English
Number of pages: 51
Keywords: Case retrieval, pre-trained language model, deep learning, document re-ranking, Transformer



Courts have been handing down legal judgments for many centuries, and nowadays every decided case produces a written judgment. The resulting volume of documents means that legal practitioners must spend a great deal of time consulting prior cases. Although case records have largely been digitized, helping practitioners quickly find the cases they want to consult remains an important problem even after digitization.
In recent years, deep learning has been applied to the legal domain, but most of this work focuses on case classification and sentencing prediction, while case retrieval still relies largely on traditional retrieval methods. In this thesis, we apply natural language processing techniques to case retrieval: we analyze legal cases, map them into a vector space, and build semantic models from cases accumulated over the years to find the cases relevant to a query case, so that legal practitioners can quickly locate similar cases. For our legal case retrieval system, we propose two automatic retrieval methods based on deep learning: a document-level model and a paragraph-level model. Both methods mainly use variants of the Transformer model, and both retrieval pipelines are divided into two stages: retrieval and re-ranking.
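To make the two-stage flow concrete, the following is a minimal sketch of retrieve-then-re-rank over a handful of toy case texts, assuming the sentence-transformers library; the checkpoint names ("all-MiniLM-L6-v2", "cross-encoder/ms-marco-MiniLM-L-6-v2"), the example texts, and the cut-off value are illustrative placeholders, not the models or settings trained in this thesis.

# Minimal two-stage retrieval sketch: dense (bi-encoder) retrieval followed by
# cross-encoder re-ranking. All model names and case texts are placeholders.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

# A query case and a small pool of candidate cases (toy examples).
query_case = "An applicant contests a negative refugee determination ..."
candidate_cases = [
    "The appellant seeks judicial review of a removal order ...",
    "The plaintiff claims damages for breach of contract ...",
    "The applicant challenges the refusal of refugee protection ...",
]

# Stage 1: retrieval. Encode every case into a shared vector space and keep
# the candidates whose embeddings are most similar to the query embedding.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder checkpoint
doc_embeddings = bi_encoder.encode(candidate_cases, convert_to_tensor=True)
query_embedding = bi_encoder.encode(query_case, convert_to_tensor=True)
similarities = util.cos_sim(query_embedding, doc_embeddings)[0]
shortlist = similarities.topk(k=min(2, len(candidate_cases))).indices.tolist()

# Stage 2: re-ranking. Score each (query, candidate) pair jointly with a
# cross-encoder and sort the shortlisted cases by that score.
re_ranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # placeholder
pairs = [(query_case, candidate_cases[i]) for i in shortlist]
rerank_scores = re_ranker.predict(pairs)
ranked = sorted(zip(shortlist, rerank_scores), key=lambda p: p[1], reverse=True)
for doc_id, score in ranked:
    print(f"candidate {doc_id}: re-rank score {score:.4f}")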
We conduct our study and experiments on the Task 1 dataset of the COLIEE-2020 competition, whose goal is a reliable and stable legal case retrieval system, and we use recall and F1-score to evaluate our models on it. Since both of our methods are divided into two stages, we evaluate the first stage with recall: our paragraph-level model reaches a recall of 93.87%, the highest among the retrieval methods compared in our experiments. We evaluate the second stage with F1-score: our paragraph-level model reaches an F1-score of 61.2%, the best result among the common retrieval methods we compare against.
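The two evaluation measures above can be stated directly in code. The snippet below is a small sketch that computes recall and F1 for a single query case from sets of retrieved and gold-standard relevant case identifiers; the IDs are made up for illustration, and in the competition setting the scores are aggregated over all query cases.

# Recall and F1 for one query case, given the retrieved set and the gold set.
# The case identifiers below are illustrative only.
def recall(retrieved: set, relevant: set) -> float:
    """Fraction of the truly relevant cases that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def f1_score(retrieved: set, relevant: set) -> float:
    """Harmonic mean of precision and recall over the returned cases."""
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    rec = hits / len(relevant)
    return 2 * precision * rec / (precision + rec)

retrieved = {"case_001", "case_007", "case_012"}   # system output for one query
relevant  = {"case_001", "case_012", "case_020"}   # gold labels for that query
print(f"recall = {recall(retrieved, relevant):.2f}")    # 0.67
print(f"F1     = {f1_score(retrieved, relevant):.2f}")  # 0.67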

Contents
Chinese Abstract
Abstract
Acknowledgments
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Overview
  1.2 Motivation
  1.3 System Description
  1.4 Thesis Organization
Chapter 2 Related Work
  2.1 Word Embedding
    2.1.1 Word2vec
    2.1.2 Transformer
    2.1.3 ELMo
    2.1.5 BERT
  2.2 Traditional Information Retrieval Models
    2.2.1 Vector Space Model
    2.2.2 BM25
  2.3 Neural Approaches to Information Retrieval
    2.3.1 DSSM
    2.3.2 K-NRM
    2.3.3 Multi-Stage Document Ranking with BERT
Chapter 3 Our Proposed Legal Case Retrieval Method
  3.1 Data Preprocessing
  3.2 Longformer
  3.3 Sentence-BERT
  3.4 RoBERTa
  3.5 Document-level Model
  3.6 Paragraph-level Model
Chapter 4 Experimental Results and Discussion
  4.1 Experimental Environment Setup
  4.2 COLIEE 2020 Task-1 Dataset
  4.3 Evaluation Metric
  4.4 Results of COLIEE 2020 Task 1
Chapter 5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
References


Full-text release date: 2027/01/21 (campus network)
Full-text release date: 2032/01/21 (off-campus network)
Full-text release date: 2032/01/21 (National Central Library: Taiwan thesis and dissertation system)