A deep neural network model for speakers coreference resolution in legal texts
Introduction
Coreference resolution is a fundamental task in natural language processing (NLP) (Kong, Zhang, & Zhou, 2019; Tauer, Date, Nagi, & Sudit, 2019), and is also crucial for many downstream NLP tasks such as information extraction (Eirini & Grigorios, 2019), question answering (Liang et al., 2019) and machine translation (Harrat, Meftouh, & Smaili, 2019). As a challenging research topic, coreference resolution aims to group the entity mentions of a document into clusters, each referring to the same underlying entity. Existing work can mainly be divided into four categories: mention-pair models (Bengtson & Roth, 2008; Choubey & Huang, 2017; Ng & Cardie, 2002), entity-level models (Clark & Manning, 2015, 2016), latent-tree models (Haponchyk & Moschitti, 2017; Martschat & Strube, 2015) and mention-ranking models (Lee, He, Lewis, & Zettlemoyer, 2017; Lee, He, & Zettlemoyer, 2018; Zhang, Song, & Song, 2019a). For example, Choubey and Huang (2017) propose an iterative approach for event coreference resolution, which gradually builds event clusters by training two distinct pairwise classifiers to identify within- and cross-document event mentions. Haponchyk and Moschitti (2017) conduct experiments for coreference resolution with a latent structured support vector machine (LSSVM). Lee et al. (2018) employ an approximation of higher-order inference based on a span-ranking architecture in an iterative manner. Zhang, Song, Song, and Yu (2019b) use a biaffine model instead of pure feed-forward networks to compute antecedent scores and directly model the compatibility of anaphor and antecedent, with the mention detection and coreference clustering modules jointly optimized. However, the aforementioned studies largely focus on texts from general domains.
Besides, there are also efforts devoted to other domains (Liu, Qi, Xu, Gao, & Liu, 2019; Qazi & Wong, 2019; Yuan & Yu, 2019), such as electronic medical records (Jonnalagadda et al., 2012; Miller, Dligach, Bethard, Lin, & Savova, 2017; Xu, Liu, Wu, et al., 2011), cross-linguistic texts (Chen & Ng, 2016; Clercq, Hoste, & Hendrickx, 2011; Shibata & Kurohashi, 2018) and scientific literature (Chaimongkol, Aizawa, & Tateisi, 2014; Huang et al., 2018; Magnusson & Dietz, 2019).
In recent years, with the release of high-quality legal texts, NLP techniques have been extensively applied to various tasks of legal text mining (Giacalone, Cusatelli, Romano, Buondonno, & Santarcangelo, 2018; Ji, Tao, Fei, & Ren, 2020; Srinivasa & Thilagam, 2019), such as legal judgment prediction (Chalkidis, Androutsopoulos, & Aletras, 2019a; Xiao et al.; Yang, Jia, Zhou, & Luo, 2019a), legal text classification (Chalkidis, Fergadiotis, Malakasiotis, & Androutsopoulos, 2019b; Li, Zhao, Li, & Zhu, 2018a; Sulea et al., 2017), legal entity recognition (Cardellino, Teruel, Alemany, & Villata, 2017a, 2017b; Chalkidis, Androutsopoulos, & Michos, 2017) and case facts analysis (Li, He, Yan, Zhang, & Wang, 2019; Xu, He, Lian, Wan, & Wang, 2019). Legal text mining is gradually becoming a hot research topic. However, coreference resolution on legal texts remains largely undeveloped. Gupta et al. (2018) apply conditional random fields (CRF) to detect mentions on the ACE 2005 dataset (Walker, Strassel, Medero, & Maeda, 2006). They first use binary classifiers such as random forests (RF), support vector machines (SVM) and Naive Bayes to generate candidate mention pairs, and then create coreference groups using rule templates. However, their method relies heavily on hand-crafted features and fails to capture continuous contextual information.
In this paper, we explore the problem of Speakers Coreference Resolution (SCR) in court record documents (CRDs) with mention-ranking models. A CRD records the factual statements and debates of the parties in judicial activities, which distinguishes it from other legal documents such as complaints, subpoenas and notarized documents. As shown in Fig. 1, the court trial process includes three stages: checking the parties' identities and attendance, presenting evidence and cross-examination, and confirming the mediation. We need to identify the coreference relations among three types of entities. For example, A1 (abbreviation of judge) is coreferent with Lu and Judge, and A3 (abbreviation of entrusted agent of plaintiff) is coreferent with Entrusted agent and Zhang. We provide some annotation examples in Fig. 2. It can be observed that the expressions of abbreviation entities are very flexible, involving both status and name entity information. Besides, name entities contain not only personal names but also names of companies or organizations.
Directly applying existing models to this task is problematic. First, legal texts are rigorous, highly professional and knowledge-rich, which differentiates them from ordinary texts and increases the difficulty of applying traditional NLP technologies to the legal field. Second, the documents in the CRD dataset come from real legal cases in different provinces. While the format of these documents is similar, the recording style varies, so the expression of the abbreviation entity is flexible and variable, and by itself lacks sufficient semantic meaning. There are three main types of abbreviation entities: (1) the full form or abbreviation of the status entity, e.g., 审判长 (judge), 审 (first token of judge); (2) the full form or abbreviation of the name entity, e.g., 被代程 (entrusted agent of defendant Cheng); (3) special expressions in the legal language, e.g., 答 (reply), 均 (all). In other words, court record documents involve multiple speakers, and each speaker can be referred to in multiple ways. Third, court record documents describe the judicial process for resolving civil disputes. Each document is recorded as a dialogue between the parties, with no standardized written format. Paragraphs involving fact statements and objections are generally lengthy; the lengthy text increases the computational complexity of the model and leaves entities scattered throughout the text. How to make full use of contextual information and model the entity dependencies are the key issues we need to address.
To this end, we propose a deep neural network model for speakers coreference resolution in legal texts. First, to address the challenge of lengthy text with sparse entities, we select sentences that contain the predefined entities as the input of our model. Second, following Lee et al. (2018), we employ the pre-trained language model ELMo (Peters et al., 2018), a bi-directional long short-term memory network (Bi-LSTM) (Graves & Schmidhuber, 2005) and an attention mechanism (Vaswani et al., 2017) to generate entity representations. Third, to effectively leverage contextual information, we construct a document-level graph over entities connected by their mentioned-by and mapping relations. Finally, a multi-scoring mechanism, which includes a feed-forward network and a biaffine model, is applied to model the dependencies between antecedents and generate candidate scores. The mentioned-by relation is illustrated in Fig. 3. In the example, a speaker may refer to other names of a certain party to clarify claims or raise objections. The name entity 程 (Cheng) is mentioned by A1, and the entity 黄 (Huang) is mentioned by A2. The mentioned-by relation can effectively help the model determine the party to which an abbreviation entity belongs.
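As a rough sketch of how such a multi-scoring mechanism could combine a feed-forward scorer with a biaffine term (in the spirit of Dozat & Manning's biaffine attention), consider the toy example below. The dimensions and randomly initialised weights are purely illustrative stand-ins for learned parameters; this is not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # span representation size (illustrative, not the paper's setting)

def ffn_score(g_i, g_j, W1, b1, w2):
    """Feed-forward score over the concatenated pair [g_i; g_j]."""
    h = np.maximum(0.0, W1 @ np.concatenate([g_i, g_j]) + b1)  # ReLU hidden layer
    return float(w2 @ h)

def biaffine_score(g_i, g_j, U, u):
    """Biaffine score g_i^T U g_j + u^T [g_i; g_j]."""
    return float(g_i @ U @ g_j + u @ np.concatenate([g_i, g_j]))

# Randomly initialised parameters stand in for learned weights.
W1, b1, w2 = rng.normal(size=(16, 2 * d)), np.zeros(16), rng.normal(size=16)
U, u = rng.normal(size=(d, d)), rng.normal(size=2 * d)

# Two hypothetical span representations (anaphor and candidate antecedent).
g_anaphor, g_antecedent = rng.normal(size=d), rng.normal(size=d)

# The candidate's final score sums both components.
total = ffn_score(g_anaphor, g_antecedent, W1, b1, w2) \
      + biaffine_score(g_anaphor, g_antecedent, U, u)
print(total)
```

In a full system, this score would be computed for every candidate antecedent of an anaphor, with the highest-scoring candidate selected as its antecedent.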
Experimental results show that our model achieves an 87.53% F1 score on court record documents, outperforming neural baseline models by a large margin. Further analysis shows that the proposed method can effectively identify the reference relations between entities and model the entity dependencies. All code and datasets are publicly released for research purposes at https://github.com/IvyGao58/SpeakersCoref under the Apache License 2.0.
In summary, the main contributions of this paper are as follows:
- •
We explore a new problem of Speakers Coreference Resolution in legal texts, and provide an annotated dataset for further research.
- •
We investigate two different solutions to resolve coreference, and create document-level graphs to integrate contextual information. This enables us to effectively establish dependencies between entities and avoid making locally consistent but globally inconsistent decisions.
- •
The proposed method achieves competitive performance, outperforming the baseline systems by a large margin, and can be applied to many downstream tasks such as question answering and text understanding.
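The document-level graph mentioned above can be sketched as a simple labeled adjacency structure. The entity names and relation labels below are illustrative examples drawn from the paper's running scenario, not its actual data format.

```python
from collections import defaultdict

# Nodes are entities; edges carry a relation label:
#   "mentioned_by" - a name entity spoken of inside another speaker's utterance
#   "mapping"      - a candidate coreference link between entity types
graph = defaultdict(list)

def add_edge(src, dst, relation):
    """Add an undirected, labeled edge between two entities."""
    graph[src].append((dst, relation))
    graph[dst].append((src, relation))

# Illustrative mentions (A1 = judge, A3 = entrusted agent of plaintiff).
add_edge("A1", "Cheng", "mentioned_by")      # Cheng is mentioned by speaker A1
add_edge("A3", "Zhang", "mapping")           # abbreviation A3 maps to name Zhang
add_edge("A3", "Entrusted agent", "mapping") # and to the status entity

# A graph neural network layer would aggregate each node's neighbours;
# here we simply inspect them.
print(graph["A3"])
```

In a GNN module, each entity's representation would be updated by aggregating the representations of its neighbours along these labeled edges, letting contextual evidence flow between related entities.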
Text mining in the legal domain
NLP methods have been widely applied to various text mining tasks in the legal domain (Qazi & Wong, 2019). An important reason is their capability to automatically process large numbers of documents. For example, Do et al. (2017) employ a ranking support vector machine (SVM) and a convolutional neural network (CNN) in two tasks: legal information retrieval (IR) and question answering (QA). They compare the contributions of individual features and use legal IR and QA models to score
Task modeling
The task of speakers coreference resolution is an important subtask of coreference resolution. The aim is to create coreference links among the three types of entities related to the speaker in a court record document: the Abbreviation entity, the Name entity and the Status entity. In our work, we propose two solutions to formalize the SCR task: (1) Abbreviation-to-Name mapping (A-N), creating coreference links between abbreviation entities and name entities; (2) Abbreviation-to-Status mapping (A-S),
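However the pairwise links are produced, they must ultimately be merged into coreference clusters. A minimal sketch of that final step, using union-find over hypothetical A-N and A-S links from the paper's running example:

```python
# Pairwise coreference links (anaphor, antecedent); illustrative examples only.
links = [("A1", "Lu"), ("A1", "Judge"),
         ("A3", "Zhang"), ("A3", "Entrusted agent")]

parent = {}

def find(x):
    """Find the cluster representative of x, with path halving."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    """Merge the clusters containing a and b."""
    parent[find(a)] = find(b)

for a, b in links:
    union(a, b)

# Group every mention under its cluster representative.
clusters = {}
for mention in parent:
    clusters.setdefault(find(mention), set()).add(mention)
print(list(clusters.values()))
```

Transitivity is handled automatically: if A1 links to both Lu and Judge, all three end up in one cluster.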
Model
The overall architecture of the proposed model is shown in Fig. 5, which consists of three main modules: a span-representation module that encodes contextual information, a graph neural network module that incorporates the constructed relations, and a multi-scoring mechanism that generates coreference scores. First, we concatenate the word embeddings and the output vector of a pre-trained language model as the final word representations, and then a multi-layer BiLSTM is used to encode the sentence
Dataset
The CRDs dataset is collected from real-world court records of civil cases between 2012 and 2019. Three types of legal entities are involved (Abbreviation, Name, Status), and coreference links are annotated whenever entities of these types are coreferent. In total, the CRDs contain 9464 coreference annotations across 1289 documents. For training, the dataset is divided into training, development and test sets in a ratio of 7:1:2. Statistics of the dataset are shown in Table 1. The average
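The 7:1:2 split over the 1289 documents can be sketched as below; the seed and the exact rounding are illustrative assumptions, since the paper does not specify how the split was drawn.

```python
import random

def split_dataset(doc_ids, ratios=(0.7, 0.1, 0.2), seed=42):
    """Shuffle document ids and split them into train/dev/test by ratio."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for reproducibility
    n = len(ids)
    n_train = int(n * ratios[0])
    n_dev = int(n * ratios[1])
    return ids[:n_train], ids[n_train:n_train + n_dev], ids[n_train + n_dev:]

train, dev, test = split_dataset(range(1289))
print(len(train), len(dev), len(test))  # 902 / 128 / 259 documents
```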
Experimental results
Experimental results of the different models are shown in Table 3. The sentence classification model, which uses BERT as the encoder, reaches only a 66.41% F1 score. This indicates that the content of a speaker's statement cannot accurately reflect the party to which the speaker belongs, and that directly using sentence representations is not enough to make a decision.
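For reference, the F1 scores reported throughout are the harmonic mean of precision and recall; the sample inputs below are illustrative, not the paper's per-model results.

```python
def f1(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0  # degenerate case: no correct predictions at all
    return 2 * precision * recall / (precision + recall)

# Illustrative values only.
print(round(f1(0.88, 0.87) * 100, 2))
```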
In our experiments, we explore two coreference solutions: creating coreference links between abbreviation entity and name entity (A-N),
Conclusion
We propose a deep neural network model for speakers coreference resolution in legal texts. By constructing document-level entity relations and applying a multi-scoring mechanism, our model achieves strong performance on the CRDs dataset without using any external domain knowledge, demonstrating the effectiveness of the proposed method. The proposed model is capable of modeling dependencies between different types of entities and outperforms the baseline models by a large margin. The analysis
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61702121, No. 61772378), the Research Foundation of the Ministry of Education of China (No. 18JZD015), the Major Projects of the National Social Science Foundation of China (No. 11&ZD189), the Key Project of the State Language Commission of China (No. ZDI135-112) and the Guangdong Basic and Applied Basic Research Foundation of China (No. 2020A151501705).
References (65)
- Big data and forensics: An innovative approach for a predictable jurisprudence. Information Sciences (2018).
- Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks (2005).
- Machine translation for Arabic dialects (survey). Information Processing & Management (2019).
- A novel approach for entity resolution in scientific documents using context graphs. Information Sciences (2018).
- An end-to-end joint model for evidence information extraction from court record document. Information Processing & Management (2020).
- A novel intelligent classification model for breast cancer diagnosis. Information Processing & Management (2019).
- Towards generalizable entity-centric clinical coreference resolution. Journal of Biomedical Informatics (2017).
- An incremental graph-partitioning algorithm for entity resolution. Information Fusion (2019).
- Hclaime: A tool for identifying health claims in health news headlines. Information Processing & Management (2019).
- Binary and multitask classification model for Dutch anaphora resolution: Die/dat prediction. CoRR (2020).
- Neural machine translation by jointly learning to align and translate. Proceedings of the 3rd International Conference on Learning Representations.
- Understanding the value of features for coreference resolution. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing.
- Legal NERC with ontologies, Wikipedia and curriculum learning. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics.
- A low-cost, high-coverage legal named entity recognizer, classifier and linker. Proceedings of the 16th International Conference on Artificial Intelligence and Law.
- Ontology population and alignment for the legal domain: YAGO, Wikipedia and LKIF. Proceedings of the International Semantic Web Conference.
- Corpus for coreference resolution on scientific papers. Proceedings of the Ninth International Conference on Language Resources and Evaluation.
- Neural legal judgment prediction in English. Proceedings of the 57th Conference of the Association for Computational Linguistics.
- Extracting contract elements. Proceedings of the 16th International Conference on Artificial Intelligence and Law.
- Large-scale multi-label text classification on EU legislation. Proceedings of the 57th Conference of the Association for Computational Linguistics.
- Towards better UD parsing: Deep contextualized word embeddings, ensemble, and treebank concatenation. Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
- Chinese zero pronoun resolution with deep neural networks. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics.
- PreCo: A large-scale dataset in preschool vocabulary for coreference resolution. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.
- Event coreference resolution by iteratively unfolding inter-dependencies among events. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.
- Entity-centric coreference resolution with model stacking. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing.
- Improving coreference resolution by learning entity-level distributed representations. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics.
- Cross-domain Dutch coreference resolution. Proceedings of the International Conference on Recent Advances in Natural Language Processing.
- BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics.
- Deep biaffine attention for neural dependency parsing. Proceedings of the 5th International Conference on Learning Representations.
- Automatic extraction and linking of person names in legal text. Proceedings of the 6th International Conference on Computer-Assisted Information Retrieval.
- Local word vectors guiding keyphrase extraction. Information Processing & Management.
- Identifying participant mentions and resolving their coreferences in legal court judgements. Proceedings of the 21st International Conference on Text, Speech, and Dialogue.