Neurocomputing

Volume 411, 21 October 2020, Pages 302-312

Joint extraction of entities and relations using graph convolution over pruned dependency trees

https://doi.org/10.1016/j.neucom.2020.06.061

Abstract

We present a novel end-to-end deep neural network model based on graph convolutional networks for joint extraction of entities and the relations among them. Our model captures contextual and syntactic information from sentences by stacking a graph convolutional layer over bidirectional sequential LSTM layers. We sequentially concatenate the subject, object, and sentence representations to capture the directionality of relations. Besides, to address the long entity-distance problem, we apply a path-centric pruning procedure to input trees in order to preserve useful information while maximally removing irrelevant words. Experiments are conducted on the NYT dataset, and the proposed model achieves state-of-the-art results on the entity and relation extraction task. Our source code is available on GitHub: https://github.com/michael-hon/LSTM-GCN-ER.

Introduction

The task of end-to-end extraction of entities and their relations is to simultaneously detect entity mentions and identify the semantic relations among them in plain text. It is the basis of many downstream natural language processing tasks, including question answering [12], knowledge graph construction [26], and summarization [2].

Traditional approaches treat entity and relation extraction as a pipeline of two sub-tasks: first detecting the entities [16] and then extracting the semantic relations [43] among them. This pipeline framework is simple and flexible, since one component can be replaced without touching the other. However, it suffers from error propagation: entity recognition errors are propagated to the relation extraction step, resulting in poor performance. Moreover, the pipeline method ignores the correlation between the two tasks. For instance, in the sentence Qinghai is one of the most rugged areas in China, the entities Qinghai and China are important cues for correctly extracting the relation contains, and vice versa.

Recently, a growing number of studies extract entities and relations simultaneously in a single model, an approach usually called joint extraction. A joint model can exploit the information of entities and relations together and thus achieve better performance than the pipeline method. [25], [18], [30] employ feature engineering to construct joint extraction systems. These methods require substantial time to construct features manually and rely heavily on other NLP tools, so the performance of the model depends on hand-designed features. To reduce this manual effort, researchers have applied neural networks to jointly extract entities and relations in a single model [13], [45], [1], [34]. However, most end-to-end models do not consider the directionality of relations, i.e., properly assigning the subject and object within a relationship. In the Qinghai example above, (China, Qinghai) holds the contains relation, while (Qinghai, China) holds the administrative_divisions relation. The two triplets (China, contains, Qinghai) and (Qinghai, administrative_divisions, China) exhibit SingleEntityOverlap because they share an entity. Thus, relation directionality can be treated as a SingleEntityOverlap problem.
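The directionality issue can be made concrete with a toy sketch (the vectors below are hypothetical, chosen only for illustration): because the final representation is an ordered concatenation of subject, object, and sentence vectors, swapping subject and object yields a different classifier input, so the two directions of a relation are distinguishable.

```python
import numpy as np

# Hypothetical pooled representations for illustration only.
h_qinghai = np.array([1.0, 0.0])
h_china = np.array([0.0, 1.0])
h_sent = np.array([0.5, 0.5])

# Ordered concatenation [subject; object; sentence]: the two candidate
# directions of the entity pair produce two distinct feature vectors.
pair_a = np.concatenate([h_china, h_qinghai, h_sent])   # subject = China
pair_b = np.concatenate([h_qinghai, h_china, h_sent])   # subject = Qinghai
```

Since `pair_a != pair_b`, a relation classifier fed these vectors can assign contains to one direction and administrative_divisions to the other.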

Besides, to address the long entity-distance problem in long sentences, previous works [24], [19], [8] utilize the dependency tree of the input sentence to extract relations with Tree-LSTMs [35] or recursive neural networks [32]. However, these models have shortcomings: since they operate directly on the dependency tree, they are hard to parallelize and thus slow. Other work [4], [40], [39] uses only the shortest dependency path (SDP) between entities to remove irrelevant information. However, considering only the SDP may discard important information. For example, in Fig. 1, if the model uses only the SDP as input, the token not will not be taken into account, which causes a classification error.
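The path-centric pruning of [43] can be sketched as follows. Assuming the dependency parse is given as a head-index array (`heads[i]` is the head of token i, -1 for the root), the hypothetical helper `path_centric_prune` keeps every token within K hops of the dependency path between the two entities; K = 0 reduces to the SDP, while K = 1 also retains immediate modifiers such as the negation token not. This is an illustrative sketch, not the authors' released code.

```python
from collections import deque

def path_centric_prune(heads, subj, obj, k=1):
    """Return sorted indices of tokens within k hops of the
    dependency path between tokens subj and obj."""
    n = len(heads)
    adj = [[] for _ in range(n)]
    for i, h in enumerate(heads):
        if h >= 0:                       # skip the root's -1 head
            adj[i].append(h)
            adj[h].append(i)
    # BFS from subj to recover the (unique) tree path to obj.
    parent = {subj: None}
    q = deque([subj])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                q.append(v)
    path, node = [], obj
    while node is not None:
        path.append(node)
        node = parent[node]
    # Keep every token within k hops of any node on the path.
    dist = {p: 0 for p in path}
    q = deque(path)
    while q:
        u = q.popleft()
        if dist[u] == k:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return sorted(dist)
```

For a six-token tree rooted at token 1 with subtree {3: [4, 5]}, pruning between tokens 0 and 4 with k=0 keeps only the path [0, 1, 3, 4], while k=1 also keeps the off-path children 2 and 5.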

Based on the above analysis, we present a novel end-to-end deep neural network model to jointly extract entities and relations. Our model mainly consists of bidirectional sequential LSTM layers and a graph convolutional layer [22], [43]. It first decodes entities with a conditional random field (CRF) and then encodes the dependency tree over the input sentence with graph convolutional layers to detect relations between entity pairs. Since the graph convolution operation reduces to matrix multiplications, it is easy to implement with batch training and parallel computing, which makes our model more efficient than [35], [24], [19]. Besides, we sequentially concatenate the subject and object representations with the sentence representation, so the final hidden representation differs for entity pairs (e1, e2) and (e2, e1) even in the same sentence; our model is therefore sensitive to the directionality of relations. To address the problem that the shortest dependency path may lose important information, we employ a path-centric pruning technique that prunes the dependency tree to maximally keep important information while removing irrelevant tokens [43].
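The efficiency claim can be illustrated with a minimal NumPy sketch of one graph-convolution layer over a (pruned) dependency tree: the whole layer is dense matrix products, so padded sentences batch naturally on a GPU. The self-loop and degree normalization ReLU(D^-1 (A + I) H W) follow the common formulation used in [43]; the function below is an assumed illustrative form, not the authors' released code.

```python
import numpy as np

def gcn_layer(adj, h, w):
    """One graph-convolution layer.

    adj: (n, n) adjacency matrix of the dependency tree,
    h:   (n, d_in) node (word) features,
    w:   (d_in, d_out) layer weights.
    Computes ReLU(D^-1 (A + I) H W).
    """
    n = adj.shape[0]
    a = adj + np.eye(n)                    # self-loops: each word keeps its own state
    deg = a.sum(axis=1, keepdims=True)     # node degrees for normalization
    return np.maximum(a @ h @ w / deg, 0.0)
```

Because every step is a matrix multiplication, a batch of sentences can be processed in one call by padding adjacency matrices to a common size, in contrast to tree-structured recurrences that must follow each parse sequentially.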

We evaluate our model on the New York Times (NYT) dataset, which is produced by distant supervision [30]. Our model outperforms previous feature-based and neural network-based methods, which indicates its effectiveness. Besides, the experimental results show that our model can effectively handle long distances between entities and the relation-directionality issue.

In summary, the main contributions of our work are: (i) we propose an end-to-end neural network model in which a graph convolutional network is introduced to detect relations; (ii) we address the relation-directionality problem and the long entity-distance problem; (iii) we conduct experiments on the NYT dataset and achieve state-of-the-art results.


Related work

Extracting entities and relations is important for many other NLP tasks. Two families of methods currently dominate this task: the pipeline approach and the joint extraction method.

The pipeline approach treats the task as two sub-tasks, i.e., named entity recognition (NER) and relation extraction (RE). For NER, most methods cast it as a sequential tagging task. [3] uses a hybrid bidirectional LSTM and CNN architecture to automatically detect word- and character-level features. [16]

Our model

In this section, we define our end-to-end neural relation extraction model. Fig. 2 illustrates an overview of the model, which can be divided into a sequence layer and a GCN layer: one extracts entities and the other identifies relations. In the sequence layer, we use bidirectional sequential LSTMs to encode the source sentence, and a CRF is then used to decode entity sequences globally. In the GCN layer, the word sequence and entity tag sequence representations are concatenated and then input to graph
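The global decoding step of the CRF output layer can be sketched with a minimal Viterbi implementation (illustrative only; the authors' training procedure additionally needs the CRF forward algorithm, which is omitted here). Given per-token tag scores from the BiLSTM and a learned tag-transition matrix, it finds the highest-scoring tag sequence as a whole rather than tagging each token independently.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the best tag sequence under a linear-chain CRF.

    emissions:   (T, K) per-token tag scores from the BiLSTM.
    transitions: (K, K) score of moving from tag i to tag j.
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # score of ending at tag j = best previous tag + transition + emission
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # follow back-pointers from the best final tag
    tags = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        tags.append(int(back[t, tags[-1]]))
    return tags[::-1]
```

With a transition matrix that penalizes invalid moves (e.g. I-tag after O in a BIO scheme), this global decoding removes inconsistent tag sequences that independent per-token argmax would produce.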

Experiments

To evaluate the performance of our model, we conduct experiments on the public New York Times (NYT) dataset, which is produced by distant supervision [31].

In NYT, the training data can be generated by distant supervision without manually labeling, while the test set is manually labeled to ensure quality and contains 3,880 relation triplets. Besides, this dataset contains 3 entity types

Conclusion

We proposed a novel end-to-end deep neural network model to jointly extract entities and relations by combining bidirectional sequential LSTM layers with graph convolutional networks. Besides, we employ a path-centric pruning strategy that prunes the dependency tree to retain relevant information while excluding irrelevant content as much as possible. The experimental results show that our model achieves the best results on the New York Times (NYT) corpus and can effectively address the relation

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Yin Hong: Conceptualization, Methodology, Validation, Investigation, Writing - original draft. Yanxia Liu: Supervision, Writing - review & editing. Suizhu Yang: Writing - review & editing. Kaiwen Zhang: Writing - review & editing. Jianjun Hu: Writing - review & editing.

Acknowledgements

This work is supported by Research and Development Project in Key Areas of Guangdong Province (2018B010109004) and the program of China Scholarship Council (201806155098).

Yin Hong is currently a postgraduate student in the School of Software Engineering, South China University of Technology. His research interests include named entity recognition and relation extraction.

References (45)

  • S. Zheng et al.

    Joint entity and relation extraction based on a hybrid neural network

    Neurocomputing

    (2017)
  • H. Adel et al.

    Global normalization of convolutional neural networks for joint entity and relation classification

  • R.K. Amplayo et al.

    Entity commonsense representation for neural abstractive summarization

  • J. Chiu et al.

Named entity recognition with bidirectional LSTM-CNNs

    Trans. Assoc. Comput. Linguist.

    (2016)
  • K. Fundel et al.

RelEx: relation extraction using dependency parse trees

    Bioinformatics

    (2006)
  • M.R. Gormley et al.

    Improved relation extraction with feature-rich compositional embedding models

  • A.Z. Gregoric et al.

    Named entity recognition with parallel recurrent neural networks

  • Z. GuoDong, S. Jian, Z. Jie, Z. Min, Exploring various knowledge in relation extraction. In Proceedings of the 43rd...
  • P. Gupta, S. Rajaram, H. Schütze, T. Runkler, Neural relation extraction within and across sentence boundaries. In...
  • P. Gupta et al.

    Table filling multi-task recurrent neural network for joint entity and relation extraction

  • S. Hochreiter et al.

    Long short-term memory

    Neural Comput.

    (1997)
  • R. Hoffmann, C. Zhang, X. Ling, L. Zettlemoyer, D.S. Weld, Knowledge-based weak supervision for information extraction...
  • S. Hu et al.

    A state-transition framework to answer complex questions over knowledge base

  • A. Katiyar et al.

    Going out on a limb: joint extraction of entity mentions and relations without dependency trees

  • D.P. Kingma, J. Ba, Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980,...
  • J. Lafferty, A. McCallum, F.C. Pereira, Conditional random fields: probabilistic models for segmenting and labeling...
  • G. Lample et al.

    Neural architectures for named entity recognition

  • Y. LeCun et al.

    Gradient-based learning applied to document recognition

    Proc. IEEE

    (1998)
  • Q. Li, H. Ji, Incremental joint extraction of entity mentions and relations, in: Proceedings of the 52nd Annual Meeting...
  • Y. Liu et al.

    A dependency-based neural network for relation classification

  • Y. Luan et al.

    A general framework for information extraction using dynamic span graphs

  • C. Manning et al.

The Stanford CoreNLP natural language processing toolkit


Yanxia Liu received the Ph.D. degree from South China University of Technology in 2014. She is currently an associate professor in the School of Software Engineering, South China University of Technology. Her research interests include knowledge graph, machine learning, pattern recognition, and medical image analysis.

Suizhu Yang, born in 1995, is an M.S. candidate. Her research interests include knowledge graph and distant supervision.

Kaiwen Zhang is currently a postgraduate student in the School of Software Engineering, South China University of Technology. His research interests include knowledge graph and question answering.

Jianjun Hu received the B.S. and M.S. degrees in Mechanical Engineering in 1995 and 1998, respectively, from Wuhan University of Technology, China. He received the Ph.D. in Computer Science in 2004 from Michigan State University in the area of machine learning and evolutionary computation. He worked as a postdoctoral fellow at Purdue University and the University of Southern California from 2004 to 2007. He is currently an associate professor in the Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States. His research interests include machine learning, deep learning, data mining, evolutionary computation, fault diagnosis, bioinformatics, and materials informatics. Dr. Hu is also an associate editor for Scientific Reports, PLOS ONE, and BMC Bioinformatics.
