Unsupervised Identification of Relevant Prior Cases, arXiv - CS - Information Retrieval

arXiv - CS - Information Retrieval. Pub Date: 2021-07-19, DOI: arxiv-2107.08973
Shivangi Bithel, Sumitra S Malagi

Document retrieval now plays a role in almost every domain of knowledge, including the legal domain. A precedent is a court decision that is treated as authority for deciding later cases involving identical or similar facts or similar legal issues. In this work, we propose several unsupervised approaches to the task of identifying precedents relevant to a given query case. The approaches are: word embeddings such as word2vec, doc2vec, and sent2vec; cosine similarity over TF-IDF vectors; retrieval by BM25 score; a pre-trained model and SBERT to find the most similar document; and the product of the BM25 and TF-IDF scores to find the most relevant document for a given query. We compared all methods using precision@10, recall@10, and mean reciprocal rank (MRR). The comparative analysis shows that the TF-IDF score multiplied by the BM25 score gives the best result. We also present the analysis we carried out to improve the BM25 score.
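As a rough illustration of the best-performing combination, the sketch below ranks candidate cases by the product of an Okapi BM25 score and a TF-IDF cosine similarity, then computes precision@k, recall@k, and reciprocal rank. It is a minimal sketch under stated assumptions, not the authors' implementation: the tokenizer, the BM25 parameters (k1 = 1.5, b = 0.75), the non-negative BM25 IDF variant, reading "TF-IDF score" as cosine similarity between TF-IDF vectors, the smoothed IDF weighting, the hypothetical names `Corpus`, `precision_at_k`, `recall_at_k`, and `reciprocal_rank`, and the toy case texts are all illustrative choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokenization; the paper's preprocessing is not given here.
    return re.findall(r"\w+", text.lower())


class Corpus:
    """Hypothetical helper that scores candidate prior cases against a query case."""

    def __init__(self, docs):
        self.docs = [tokenize(d) for d in docs]
        self.n = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n
        # Document frequency: number of documents containing each term.
        self.df = Counter(t for d in self.docs for t in set(d))

    def bm25(self, query, i, k1=1.5, b=0.75):
        # Okapi BM25; k1 and b are common defaults, assumed rather than taken from the paper.
        doc = self.docs[i]
        tf = Counter(doc)
        score = 0.0
        for term in set(tokenize(query)):
            n_t = self.df.get(term, 0)
            if n_t == 0:
                continue
            idf = math.log((self.n - n_t + 0.5) / (n_t + 0.5) + 1.0)
            f = tf[term]
            denom = f + k1 * (1 - b + b * len(doc) / self.avgdl)
            score += idf * f * (k1 + 1) / denom
        return score

    def _tfidf(self, tokens):
        # Raw term count times smoothed IDF (an assumed weighting scheme).
        tf = Counter(tokens)
        return {t: c * (math.log((1 + self.n) / (1 + self.df.get(t, 0))) + 1.0)
                for t, c in tf.items()}

    def tfidf_cosine(self, query, i):
        q = self._tfidf(tokenize(query))
        d = self._tfidf(self.docs[i])
        dot = sum(w * d.get(t, 0.0) for t, w in q.items())
        nq = math.sqrt(sum(w * w for w in q.values()))
        nd = math.sqrt(sum(w * w for w in d.values()))
        return dot / (nq * nd) if nq and nd else 0.0

    def rank(self, query):
        # Combined score: BM25 multiplied by TF-IDF cosine similarity, highest first.
        scores = [self.bm25(query, i) * self.tfidf_cosine(query, i) for i in range(self.n)]
        return sorted(range(self.n), key=lambda i: scores[i], reverse=True)


def precision_at_k(ranked, relevant, k=10):
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked, relevant, k=10):
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked, relevant):
    # 1 / position of the first relevant document; 0 if none is retrieved.
    for pos, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / pos
    return 0.0


# Toy candidate cases (hypothetical data).
docs = [
    "the court held the contract void for lack of consideration",
    "appeal dismissed on the ground of limitation",
    "breach of contract damages awarded to the plaintiff",
]
corpus = Corpus(docs)
ranked = corpus.rank("contract breach damages")
```

MRR is the mean of `reciprocal_rank` over all query cases. Multiplying the two scores means a candidate must score well under both measures to rank highly; a document that shares no terms with the query gets a BM25 score of zero and therefore a combined score of zero.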

Updated: 2021-07-20