A deep neural network model for speakers coreference resolution in legal texts

https://doi.org/10.1016/j.ipm.2020.102365

Abstract

Coreference resolution is one of the fundamental tasks in natural language processing (NLP), and is of great significance for understanding the semantics of texts. Meanwhile, resolving coreference is essential for many downstream NLP applications. Existing methods largely focus on resolving pronouns, possessives and noun phrases in the general domain, while little work addresses professional domains such as the legal field. Unlike general texts, how to encode legal texts, capture the relationships between entities in the text, and then resolve coreference is a challenging problem. To better understand legal texts and facilitate a series of downstream tasks in legal text mining, we propose a deep neural network model for coreference resolution in court record documents. Specifically, a pre-trained language model and bi-directional long short-term memory networks are first utilized to encode the legal texts. Second, graph neural networks are applied to incorporate the reference relations between entities. Finally, two distinct classifiers are used to score the candidate pairs. Results on the dataset show that our model achieves an 87.53% F1 score on court record documents, outperforming neural baseline models by a large margin. Further analysis shows that the proposed method can effectively identify the reference relations between entities and model the entity dependencies.

Introduction

Coreference resolution is a fundamental task in natural language processing (NLP) (Kong, Zhang, Zhou, 2019; Tauer, Date, Nagi, Sudit, 2019), and is also crucial for many NLP downstream tasks such as information extraction (Eirini & Grigorios, 2019), question answering (Liang et al., 2019) and machine translation (Harrat, Meftouh, & Smaili, 2019). As a challenging research topic, coreference resolution aims to group the entities of a given document into clusters. Existing work can mainly be divided into four categories: mention-pair models (Bengtson, Roth, 2008; Choubey, Huang, 2017; Ng, Cardie, 2002), entity-level models (Clark, Manning, 2015; Clark, Manning, 2016), latent-tree models (Haponchyk, Moschitti, 2017; Martschat, Strube, 2015) and mention-ranking models (Lee, He, Lewis, Zettlemoyer, 2017; Lee, He, Zettlemoyer, 2018; Zhang, Song, Song, 2019a). For example, Choubey and Huang propose an iterative approach for event coreference resolution, which gradually builds event clusters by training two distinct pairwise classifiers to identify within- and cross-document event mentions (Choubey & Huang, 2017). Haponchyk and Moschitti conduct coreference resolution experiments with a latent-structure support vector machine (LSSVM) (Haponchyk & Moschitti, 2017). Lee et al. employ an approximation of higher-order inference based on a span-ranking architecture in an iterative manner (Lee et al., 2018). Zhang et al. use a biaffine model instead of pure feed-forward networks to compute antecedent scores, directly modeling the compatibility of anaphora and antecedents, with the mention detection and coreference clustering modules jointly optimized (Zhang, Song, Song, & Yu, 2019b). However, the aforementioned research largely focuses on texts from general domains. Besides, some efforts have been devoted to other domains (Liu, Qi, Xu, Gao, Liu, 2019; Qazi, Wong, 2019; Yuan, Yu, 2019), such as electronic medical records (Jonnalagadda, Li, Sohn, Wu, Wagholikar, Torii, Liu, 2012; Miller, Dligach, Bethard, Lin, Savova, 2017; Xu, Liu, Wu, et al., 2011), cross-lingual texts (Chen, Ng, 2016; Clercq, Hoste, Hendrickx, 2011; Shibata, Kurohashi, 2018) and scientific literature (Chaimongkol, Aizawa, Tateisi, 2014; Huang, Zhu, Huang, Yang, Fung, Hu, 2018; Magnusson, Dietz, 2019).

In recent years, with the public release of high-quality legal texts, NLP techniques have been extensively applied to various tasks of legal text mining (Giacalone, Cusatelli, Romano, Buondonno, Santarcangelo, 2018; Ji, Tao, Fei, Ren, 2020; Srinivasa, Thilagam, 2019), such as legal judgment prediction (Chalkidis, Androutsopoulos, Aletras, 2019a; Xiao, Zhong, Guo, Tu, Liu, Sun, Feng, Han, Hu, Wang, & Xu; Yang, Jia, Zhou, Luo, 2019a), legal text classification (Chalkidis, Fergadiotis, Malakasiotis, Androutsopoulos, 2019b; Li, Zhao, Li, Zhu, 2018a; Sulea, Zampieri, Malmasi, Vela, Dinu, van Genabith, 2017), legal entity recognition (Cardellino, Teruel, Alemany, Villata, 2017a; Cardellino, Teruel, Alemany, Villata, 2017b; Chalkidis, Androutsopoulos, Michos, 2017) and case fact analysis (Li, He, Yan, Zhang, Wang, 2019; Xu, He, Lian, Wan, Wang, 2019). Legal text mining is gradually becoming a hot research topic. However, coreference resolution on legal texts remains underexplored. Notably, Gupta et al. apply conditional random fields (CRF) to detect mentions on the ACE 2005 dataset (Walker, Strassel, Medero, & Maeda, 2006). They first use binary classifiers such as random forest (RF), SVM and Naive Bayes to generate candidate mention pairs, and then create coreference groups using rule templates (Gupta et al., 2018). However, their methods rely heavily on hand-crafted features, failing to capture continuous contextual information.

In this paper, we explore the problem of Speakers Coreference Resolution (SCR) in court record documents (CRDs) with mention-ranking models. A CRD records the factual statements and debates of the parties during judicial activities, which distinguishes it from other legal documents such as complaints, subpoenas and notarized documents. As shown in Fig. 1, the court trial process includes three stages: checking the parties' identities and attendance, presenting evidence and cross-examination, and confirming the mediation. We need to identify the corresponding relations between the three types of entities. For example, A1 (abbreviation of judge) is coreferent with Lu and Judge, and A3 (abbreviation of entrusted agent of plaintiff) is coreferent with Entrusted agent and Zhang. We provide some annotation examples in Fig. 2. It can be observed that the expressions of abbreviation entities are very flexible, involving both status and name entity information. Moreover, name entities include not only person names but also company or organization names.

Directly using existing models for this task can be problematic. First, legal texts are rigorous, highly professional and knowledge-rich, which distinguishes them from ordinary texts and increases the difficulty of applying traditional NLP technologies to the legal field. Second, the documents in the CRDs dataset come from real legal cases in different provinces. The format of these documents is similar, but the recording style varies, so the expressions of abbreviation entities are flexible and variable, and the abbreviations themselves lack sufficient semantic content. There are three main types of abbreviation entities: 1) the full name or an abbreviation of the status entity, e.g., 审判长 (judge), 审 (first token of judge); 2) the full name or an abbreviation of the name entity, e.g., 被代程 (entrusted agent of defendant Cheng); 3) special expressions in the legal language, e.g., 答 (reply), 均 (all). In other words, court record documents involve multiple speakers, and each speaker can be referred to in multiple ways. Third, court record documents describe the judicial process for resolving civil disputes. Each document is recorded in the form of a dialogue between the parties, with no standardized written format. Generally, paragraphs involving fact statements and objections are relatively lengthy. Lengthy texts increase the computational complexity of the model, and entities are scattered throughout the text. How to make full use of contextual information and model the entity dependencies are the key issues we need to address.

To this end, we propose a deep neural network model for speakers coreference resolution in legal texts. First, to address the challenge of lengthy texts with sparse entities, we select the sentences that contain predefined entities as the input of our model. Second, following Lee et al. (2018), we employ the pre-trained language model ELMo (Peters et al., 2018), bi-directional long short-term memory (Bi-LSTM) networks (Graves & Schmidhuber, 2005) and an attention mechanism (Vaswani et al., 2017) to generate entity representations. Third, to effectively leverage contextual information, we construct a document-level graph over entities with their mentioned-by and mapping relations. Finally, a multi-scoring mechanism, which includes a feed-forward network and a biaffine model, is applied to model the dependencies between antecedents and generate candidate scores. The mentioned-by relation is demonstrated in Fig. 3. In the example, a speaker may refer to other names of a certain party to clarify claims or raise objections. The name entity 程 (Cheng) is mentioned by A1, and the entity 黄 (Huang) is mentioned by A2. The mentioned-by relation can effectively help the model determine the party to which an abbreviation entity belongs.
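To make the graph construction concrete, the following is a minimal sketch in PyTorch of one round of GCN-style propagation over such a document-level entity graph. The function and class names, dimensions, and the row-normalized adjacency formulation are our own illustrative assumptions, not the paper's exact implementation.

```python
import torch

def build_adjacency(num_entities, edges):
    """Build a row-normalized adjacency matrix with self-loops.
    edges: list of (src, dst) index pairs, e.g. (Cheng, A1) for
    'Cheng is mentioned by A1'; information flows src -> dst."""
    A = torch.eye(num_entities)
    for src, dst in edges:
        A[dst, src] = 1.0
    return A / A.sum(dim=1, keepdim=True)

class EntityGraphLayer(torch.nn.Module):
    """One GCN-style layer: each entity aggregates the representations
    of its neighbors, then applies a linear transform and ReLU."""
    def __init__(self, dim):
        super().__init__()
        self.linear = torch.nn.Linear(dim, dim)

    def forward(self, entity_reprs, adj):
        return torch.relu(self.linear(adj @ entity_reprs))

# Toy usage: 3 entities (0: Cheng, 1: A1, 2: Huang);
# Cheng is mentioned by A1, so A1's representation absorbs Cheng's.
reprs = torch.randn(3, 128)
adj = build_adjacency(3, edges=[(0, 1)])
updated = EntityGraphLayer(128)(reprs, adj)
print(updated.shape)  # torch.Size([3, 128])
```

Propagating along the mentioned-by edges in this way lets the representation of a speaker abbreviation absorb the names mentioned in its statements, which is the mechanism that ties abbreviation entities to the corresponding parties.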

Experimental results on the dataset show that our model achieves an 87.53% F1 score on court record documents, outperforming neural baseline models by a large margin. Further analysis shows that the proposed method can effectively identify the reference relations between entities and model the entity dependencies. All code and datasets are publicly released for research purposes at https://github.com/IvyGao58/SpeakersCoref under the Apache License 2.0.

In summary, the main contributions of this paper are as follows:

  • We explore a new problem of Speakers Coreference Resolution in legal texts, and provide an annotated dataset for further research.

  • We investigate two different solutions to resolve coreference, and create document-level graphs to integrate contextual information. This enables us to effectively establish dependencies between entities and avoid making locally consistent but globally inconsistent decisions.

  • The proposed method achieves competitive performance, outperforming the baseline systems by a large margin, which can be applied in many downstream tasks such as question answering and text understanding.

Section snippets

Text mining in the legal domain

NLP methods have been widely applied to various text mining tasks in the legal domain (Qazi & Wong, 2019). An important reason is their capacity for automatically processing large numbers of documents. For example, Do et al. (2017) employ a ranking support vector machine (SVM) and a convolutional neural network (CNN) for two tasks: legal information retrieval (IR) and question answering (QA). They compare the contributions of individual features and use legal IR and QA models to score

Task modeling

The task of speakers coreference resolution is an important subtask of coreference resolution. The aim is to create coreference links among the three types of speaker-related entities in court record documents: Abbreviation entities, Name entities and Status entities. In our work, we propose two solutions to formalize the SCR task. (1) Abbreviation to Name mapping (A-N), creating coreference links between abbreviation entities and name entities. (2) Abbreviation to Status mapping (A-S),
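As a concrete illustration of these two formalizations, here is a minimal sketch of candidate-pair generation; the entity records and the candidate_pairs helper are hypothetical names introduced only to show which pairs each formalization asks a scorer to rank.

```python
from itertools import product

# Hypothetical entity records carrying a type tag per entity.
entities = [
    {"text": "A1", "type": "abbreviation"},
    {"text": "Lu", "type": "name"},
    {"text": "Judge", "type": "status"},
    {"text": "A3", "type": "abbreviation"},
    {"text": "Zhang", "type": "name"},
]

def candidate_pairs(entities, target_type):
    """Pair every abbreviation entity with every entity of target_type;
    a scoring model then ranks these pairs to form coreference links."""
    abbrs = [e for e in entities if e["type"] == "abbreviation"]
    targets = [e for e in entities if e["type"] == target_type]
    return list(product(abbrs, targets))

a_n = candidate_pairs(entities, "name")    # A-N: abbreviation -> name
a_s = candidate_pairs(entities, "status")  # A-S: abbreviation -> status
print(len(a_n), len(a_s))  # 4 2
```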

Model

The overall architecture of the proposed model is shown in Fig. 5, which consists of three main modules: a span-representation module that encodes contextual information, a graph neural network module that incorporates the constructed relations, and a multi-scoring mechanism that generates coreference scores. First, we concatenate the word embeddings and the output vectors of the pre-trained language model as the final word representations, and then a multi-layer BiLSTM is used to encode the sentence
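A minimal PyTorch sketch of this encoding and scoring path follows; the class name, hidden sizes, and the way the feed-forward and biaffine scores are combined are our own illustrative assumptions, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

class SpanScorer(nn.Module):
    """Concatenate word embeddings with pre-trained LM vectors, encode
    with a multi-layer BiLSTM, then score candidate pairs twice: with a
    feed-forward network and with a biaffine model."""
    def __init__(self, word_dim=100, lm_dim=1024, hidden=200):
        super().__init__()
        self.bilstm = nn.LSTM(word_dim + lm_dim, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        span_dim = 2 * hidden
        # Feed-forward scorer over the concatenated pair representation.
        self.ffnn = nn.Sequential(nn.Linear(2 * span_dim, hidden),
                                  nn.ReLU(), nn.Linear(hidden, 1))
        # Biaffine scorer: s(a, b) = a^T U b.
        self.U = nn.Parameter(torch.randn(span_dim, span_dim) * 0.01)

    def encode(self, word_emb, lm_emb):
        x = torch.cat([word_emb, lm_emb], dim=-1)  # (batch, seq, word+lm)
        out, _ = self.bilstm(x)                    # (batch, seq, 2*hidden)
        return out

    def score_pair(self, span_a, span_b):
        ff = self.ffnn(torch.cat([span_a, span_b], dim=-1)).squeeze(-1)
        biaffine = torch.einsum("bd,de,be->b", span_a, self.U, span_b)
        return ff + biaffine  # combined candidate score

model = SpanScorer()
words, lm = torch.randn(1, 12, 100), torch.randn(1, 12, 1024)
states = model.encode(words, lm)
# Single-token states stand in for span representations here.
score = model.score_pair(states[:, 2], states[:, 7])
print(score.shape)  # torch.Size([1])
```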

Dataset

The CRDs dataset is collected from real-world courts and covers civil cases between 2012 and 2019. Three types of legal entities are involved (Abbreviation, Name, Status), and coreference links are annotated wherever entities of these types are coreferent. In total, we obtain 9464 coreference annotations in 1289 documents. For training, the dataset is divided into training, development and test sets in a ratio of 7:1:2. The statistics of the dataset are shown in Table 1. The average

Experimental results

Experimental results of different models are shown in Table 3. The sentence classification model uses BERT as the encoder and reaches only a 66.41% F1 score. This indicates that the content of a speaker's statement cannot accurately reflect the party to which the speaker belongs, and that directly using sentence representations is not sufficient for making decisions.

In our experiments, we explore two coreference solutions: creating coreference links between abbreviation entities and name entities (A-N),

Conclusion

We propose a deep neural network model for speakers coreference resolution in legal texts. By constructing document-level entity relations and applying a multi-scoring mechanism, our model achieves strong performance on the CRDs dataset without using any external domain knowledge, demonstrating the effectiveness of the proposed method. The proposed model is capable of modeling dependencies between different types of entities and outperforms the baseline models by a large margin. The analysis

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61702121, No. 61772378), the Research Foundation of the Ministry of Education of China (No. 18JZD015), the Major Projects of the National Social Science Foundation of China (No. 11&ZD189), the Key Project of the State Language Commission of China (No. ZDI135-112) and the Guangdong Basic and Applied Basic Research Foundation of China (No. 2020A151501705).

References (65)

  • D. Bahdanau et al.

    Neural machine translation by jointly learning to align and translate

    Proceedings of the 3rd international conference on learning representations

    (2015)
  • E. Bengtson et al.

    Understanding the value of features for coreference resolution

    Proceedings of the 2008 conference on empirical methods in natural language processing

    (2008)
  • C. Cardellino et al.

Legal NERC with ontologies, Wikipedia and curriculum learning

    Proceedings of the 15th conference of the European chapter of the association for computational linguistics

    (2017)
  • C. Cardellino et al.

    A low-cost, high-coverage legal named entity recognizer, classifier and linker

Proceedings of the 16th international conference on artificial intelligence and law

    (2017)
  • C. Cardellino et al.

Ontology population and alignment for the legal domain: YAGO, Wikipedia and LKIF

    Proceedings of the international semantic web conference

    (2017)
  • P. Chaimongkol et al.

    Corpus for coreference resolution on scientific papers

    Proceedings of the ninth international conference on language resources and evaluation

    (2014)
  • I. Chalkidis et al.

Neural legal judgment prediction in English

    Proceedings of the 57th conference of the association for computational linguistics

    (2019)
  • I. Chalkidis et al.

    Extracting contract elements

Proceedings of the 16th international conference on artificial intelligence and law

    (2017)
  • I. Chalkidis et al.

    Large-scale multi-label text classification on EU legislation

    Proceedings of the 57th conference of the association for computational linguistics

    (2019)
  • W. Che et al.

    Towards better UD parsing: Deep contextualized word embeddings, ensemble, and treebank concatenation

    Proceedings of the CoNLL 2018 shared task: Multilingual parsing from raw text to universal dependencies

    (2018)
  • C. Chen et al.

    Chinese zero pronoun resolution with deep neural networks

    Proceedings of the 54th annual meeting of the association for computational linguistics

    (2016)
  • H. Chen et al.

PreCo: A large-scale dataset in preschool vocabulary for coreference resolution

    Proceedings of the 2018 conference on empirical methods in natural language processing

    (2018)
  • P.K. Choubey et al.

    Event coreference resolution by iteratively unfolding inter-dependencies among events

    Proceedings of the 2017 conference on empirical methods in natural language processing

    (2017)
  • K. Clark et al.

    Entity-centric coreference resolution with model stacking

    Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing of the Asian Federation of natural language processing

    (2015)
  • K. Clark et al.

    Improving coreference resolution by learning entity-level distributed representations

    Proceedings of the 54th annual meeting of the association for computational linguistics

    (2016)
  • O.D. Clercq et al.

Cross-domain Dutch coreference resolution

    Proceedings of the international conference recent advances in natural language processing

    (2011)
  • J. Devlin et al.

BERT: Pre-training of deep bidirectional transformers for language understanding

Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics

    (2019)
  • Do, P.-K., Nguyen, H.-T., Tran, C.-X., Nguyen, M.-T., & Nguyen, M.-L. (2017). Legal question answering using ranking...
  • T. Dozat et al.

    Deep biaffine attention for neural dependency parsing

    Proceedings of the 5th international conference on learning representations

    (2017)
  • C. Dozier et al.

    Automatic extraction and linking of person names in legal text

    Proceedings of the 6th international conference on computer-assisted information retrieval

    (2000)
  • P. Eirini et al.

    Local word vectors guiding keyphrase extraction

    Information Processing & Management

    (2019)
  • A. Gupta et al.

    Identifying participant mentions and resolving their coreferences in legal court judgements

    Proceedings of the 21st international conference on text, speech, and dialogue

    (2018)