An end-to-end joint model for evidence information extraction from court record document
Information Processing & Management (IF 8.6), Pub Date: 2020-05-29, DOI: 10.1016/j.ipm.2020.102305
Donghong Ji , Peng Tao , Hao Fei , Yafeng Ren

Information extraction is one of the important tasks in Natural Language Processing (NLP). Most existing methods focus on general-domain texts, and little attention has been paid to information extraction in specialized domains such as legal texts. This paper explores information extraction in the legal field, aiming to extract evidence information from court record documents (CRDs). In the general domain, entities and relations are mostly words or phrases and therefore do not span multiple sentences. In contrast, evidence information in CRDs may span multiple sentences, a situation existing models cannot handle. To address this issue, we add a classification task alongside the extraction task, formulate the two tasks as a multi-task learning problem, and present a novel end-to-end model that addresses them jointly. The joint model uses a shared encoder followed by a separate decoder for each task. Experimental results on the dataset show the effectiveness of the proposed model, which obtains a 72.36% F1 score, outperforming previous methods and strong baselines by a large margin.
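The abstract describes the architecture only at a high level: a shared encoder feeding two task-specific decoders that are trained jointly. The Python sketch below illustrates that general multi-task pattern under stated assumptions; it is not the authors' model. The BIO tag set, the single tanh layer standing in for a real contextual encoder, mean pooling for the classification decoder, the loss weight `weight`, and all names (`JointModel`, `joint_loss`, `decode_spans`) are hypothetical. Because tags are predicted over the whole token sequence rather than sentence by sentence, a decoded span can cross a sentence boundary. This is one simple way to represent multi-sentence evidence, assumed here for illustration and not necessarily the paper's approach.

```python
import math
import random

TAGS = ["O", "B", "I"]  # hypothetical BIO tag set for evidence spans


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vec):
    # weights is a list of rows; returns the matrix-vector product weights @ vec
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class JointModel:
    """Toy shared encoder with two task-specific decoders (hypothetical sizes)."""

    def __init__(self, vocab, dim=8, num_classes=2, seed=0):
        rng = random.Random(seed)
        self.vocab = {w: i for i, w in enumerate(vocab)}
        self.emb = rand_matrix(rng, len(vocab), dim)        # shared embeddings
        self.enc = rand_matrix(rng, dim, dim)               # shared encoder layer
        self.tag_head = rand_matrix(rng, len(TAGS), dim)    # extraction decoder
        self.cls_head = rand_matrix(rng, num_classes, dim)  # classification decoder

    def encode(self, tokens):
        # Shared encoder: embedding lookup plus one tanh layer per token.
        # A real system would use a contextual encoder; this is only a stand-in.
        return [[math.tanh(h) for h in linear(self.enc, self.emb[self.vocab[t]])]
                for t in tokens]

    def forward(self, tokens):
        hidden = self.encode(tokens)  # computed once, consumed by both decoders
        tag_probs = [softmax(linear(self.tag_head, h)) for h in hidden]
        pooled = [sum(col) / len(hidden) for col in zip(*hidden)]  # mean pooling
        cls_probs = softmax(linear(self.cls_head, pooled))
        return tag_probs, cls_probs


def joint_loss(tag_probs, gold_tags, cls_probs, gold_cls, weight=1.0):
    # Multi-task objective: mean token-level extraction cross-entropy
    # plus a weighted classification cross-entropy.
    ext = -sum(math.log(p[TAGS.index(t)]) for p, t in zip(tag_probs, gold_tags)) / len(gold_tags)
    cls = -math.log(cls_probs[gold_cls])
    return ext + weight * cls


def decode_spans(tags):
    """Turn a BIO sequence over the whole record into (start, end) spans, end exclusive."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            if start is None:  # tolerate an I that has no preceding B
                start = i
        else:
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


if __name__ == "__main__":
    # "." at index 2 ends a sentence, yet the gold span continues past it.
    gold = ["O", "B", "I", "I", "I", "I", "O"]
    print(decode_spans(gold))  # prints [(1, 6)]
```

Sharing one encoder means both losses update the same token representations, which is the usual motivation for joint multi-task training; the relative weight of the two losses is a tunable choice, assumed here rather than taken from the paper.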



Updated: 2020-05-29