Neurocomputing

Volume 411, 21 October 2020, Pages 302-312

Joint extraction of entities and relations using graph convolution over pruned dependency trees

https://doi.org/10.1016/j.neucom.2020.06.061

Abstract

We present a novel end-to-end deep neural network model based on graph convolutional networks for joint extraction of entities and the relations among them. Our model captures contextual and syntactic information from sentences by stacking a graph convolutional layer over bidirectional sequential LSTM layers. We sequentially concatenate the subject, object, and sentence representations to capture the directionality of relations. Besides, to address the long entity-distance problem, we apply a path-centric pruning procedure to input trees in order to preserve useful information while maximally removing irrelevant words. Experiments are conducted on the NYT dataset, and the proposed model achieves state-of-the-art results on the entity and relation extraction task. Our source code is available on GitHub: https://github.com/michael-hon/LSTM-GCN-ER.

Introduction

The task of end-to-end extraction of entities and their relations is to simultaneously detect entity mentions and identify the semantic relations among them in plain text. It is the basis of many downstream natural language processing tasks, including question answering [12], knowledge graph construction [26], and summarization [2].

Traditional approaches treat entity and relation extraction as a pipeline of two sub-tasks: first detecting the entities [16] and then extracting the semantic relations [43] among them. This pipeline framework is simple and flexible, since one component can be replaced without touching the other. However, it suffers from error propagation: entity recognition errors are propagated to the relation extraction step, resulting in poor performance. Moreover, the pipeline method ignores the correlation between the two tasks. For instance, in the sentence Qinghai is one of the most rugged areas in China, the entities Qinghai and China are important cues for correctly extracting the relation contains, and vice versa.

Recently, a growing number of studies extract entities and relations simultaneously in a single model, an approach usually called joint extraction. A joint model can exploit the information of entities and relations together and thus achieve better performance than the pipeline method. [25], [18], [30] employ feature engineering to construct joint extraction systems. These methods require substantial time to construct features manually and rely heavily on other NLP tools, so the performance of the model depends on hand-designed features. To reduce this manual effort, researchers have applied neural networks to jointly extract entities and relations in a single model [13], [45], [1], [34]. However, most end-to-end models do not consider the directionality of relations, i.e., properly assigning the subject and object within a relationship. In the Qinghai example above, (China, Qinghai) holds the contains relation, while (Qinghai, China) holds the administrative_divisions relation. The two triplets (China, contains, Qinghai) and (Qinghai, administrative_divisions, China) exhibit SingleEntityOverlap because they share an entity. Thus, relation directionality can be treated as a SingleEntityOverlap problem.
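The directionality issue can be made concrete with a toy sketch (the vectors below are hypothetical, chosen only for illustration): because the final representation is an ordered concatenation of subject, object, and sentence vectors, swapping subject and object yields a different classifier input, so the two directions of a relation are distinguishable.

```python
import numpy as np

# Hypothetical pooled representations for illustration only.
h_qinghai = np.array([1.0, 0.0])
h_china = np.array([0.0, 1.0])
h_sent = np.array([0.5, 0.5])

# Ordered concatenation [subject; object; sentence]: the two candidate
# directions of the entity pair produce two distinct feature vectors.
pair_a = np.concatenate([h_china, h_qinghai, h_sent])   # subject = China
pair_b = np.concatenate([h_qinghai, h_china, h_sent])   # subject = Qinghai
```

Since `pair_a != pair_b`, a relation classifier fed these vectors can assign contains to one direction and administrative_divisions to the other.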

Besides, to address the long entity-distance problem in long sentences, previous works [24], [19], [8] utilize the dependency tree of the input sentence to extract relations with Tree-LSTMs [35] or recursive neural networks [32]. However, these models have shortcomings: since they operate directly on the dependency tree, they are hard to parallelize and thus slow. Other work [4], [40], [39] uses only the shortest dependency path (SDP) between entities to remove irrelevant information. However, considering only the SDP may discard important information. For example, in Fig. 1, if the model uses only the SDP as input, the token not will not be taken into account, which causes a classification error.
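The path-centric pruning of [43] can be sketched as follows. Assuming the dependency parse is given as a head-index array (`heads[i]` is the head of token i, -1 for the root), the hypothetical helper `path_centric_prune` keeps every token within K hops of the dependency path between the two entities; K = 0 reduces to the SDP, while K = 1 also retains immediate modifiers such as the negation token not. This is an illustrative sketch, not the authors' released code.

```python
from collections import deque

def path_centric_prune(heads, subj, obj, k=1):
    """Return sorted indices of tokens within k hops of the
    dependency path between tokens subj and obj."""
    n = len(heads)
    adj = [[] for _ in range(n)]
    for i, h in enumerate(heads):
        if h >= 0:                       # skip the root's -1 head
            adj[i].append(h)
            adj[h].append(i)
    # BFS from subj to recover the (unique) tree path to obj.
    parent = {subj: None}
    q = deque([subj])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                q.append(v)
    path, node = [], obj
    while node is not None:
        path.append(node)
        node = parent[node]
    # Keep every token within k hops of any node on the path.
    dist = {p: 0 for p in path}
    q = deque(path)
    while q:
        u = q.popleft()
        if dist[u] == k:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return sorted(dist)
```

For a six-token tree rooted at token 1 with subtree {3: [4, 5]}, pruning between tokens 0 and 4 with k=0 keeps only the path [0, 1, 3, 4], while k=1 also keeps the off-path children 2 and 5.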

Based on the above analysis, we present a novel end-to-end deep neural network model to jointly extract entities and relations. Our model mainly consists of bidirectional sequential LSTM layers and a graph convolutional layer [22], [43]. It first decodes entities with a conditional random field (CRF) and then encodes the dependency tree over the input sentence with graph convolutional layers to detect relations between entity pairs. Since the graph convolution operation reduces to matrix multiplications, it is easy to implement with batch training and parallel computing, which makes our model more efficient than [35], [24], [19]. Besides, we sequentially concatenate the subject and object representations with the sentence representation, so the final hidden representation differs for entity pairs (e1, e2) and (e2, e1) even in the same sentence; our model is therefore sensitive to the directionality of relations. To address the problem that the shortest dependency path may lose important information, we employ a path-centric pruning technique that prunes the dependency tree to maximally keep important information while removing irrelevant tokens [43].
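The efficiency claim can be illustrated with a minimal NumPy sketch of one graph-convolution layer over a (pruned) dependency tree: the whole layer is dense matrix products, so padded sentences batch naturally on a GPU. The self-loop and degree normalization ReLU(D^-1 (A + I) H W) follow the common formulation used in [43]; the function below is an assumed illustrative form, not the authors' released code.

```python
import numpy as np

def gcn_layer(adj, h, w):
    """One graph-convolution layer.

    adj: (n, n) adjacency matrix of the dependency tree,
    h:   (n, d_in) node (word) features,
    w:   (d_in, d_out) layer weights.
    Computes ReLU(D^-1 (A + I) H W).
    """
    n = adj.shape[0]
    a = adj + np.eye(n)                    # self-loops: each word keeps its own state
    deg = a.sum(axis=1, keepdims=True)     # node degrees for normalization
    return np.maximum(a @ h @ w / deg, 0.0)
```

Because every step is a matrix multiplication, a batch of sentences can be processed in one call by padding adjacency matrices to a common size, in contrast to tree-structured recurrences that must follow each parse sequentially.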

We evaluate our model on the New York Times (NYT) dataset, which is produced by distant supervision [30]. Our model outperforms previous feature-based and neural network-based methods, which indicates its effectiveness. Besides, the experimental results show that our model can effectively handle long distances between entities and the relation-directionality issue.

In summary, the main contributions of our work are: (i) we propose an end-to-end neural network model in which a graph convolutional network is introduced to detect relations; (ii) we address the relation-directionality problem and the long entity-distance problem; (iii) we conduct experiments on the NYT dataset and achieve state-of-the-art results.


Related work

Extracting entities and relations is important for many other NLP tasks. Two families of methods currently dominate this task: the pipeline approach and the joint extraction method.

The pipeline approach treats the task as two sub-tasks, i.e., named entity recognition (NER) and relation extraction (RE). For NER, most methods cast it as a sequential tagging task. [3] uses a hybrid bidirectional LSTM and CNN architecture to automatically detect word- and character-level features. [16]

Our model

In this section, we define our end-to-end neural relation extraction model. Fig. 2 illustrates an overview of the model, which can be divided into a sequence layer and a GCN layer: one extracts entities and the other identifies relations. In the sequence layer, we use bidirectional sequential LSTMs to encode the source sentence, and a CRF is then used to decode entity sequences globally. In the GCN layer, the word sequence and entity tag sequence representations are concatenated and then input to graph
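The global decoding step of the CRF output layer can be sketched with a minimal Viterbi implementation (illustrative only; the authors' training procedure additionally needs the CRF forward algorithm, which is omitted here). Given per-token tag scores from the BiLSTM and a learned tag-transition matrix, it finds the highest-scoring tag sequence as a whole rather than tagging each token independently.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the best tag sequence under a linear-chain CRF.

    emissions:   (T, K) per-token tag scores from the BiLSTM.
    transitions: (K, K) score of moving from tag i to tag j.
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # score of ending at tag j = best previous tag + transition + emission
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # follow back-pointers from the best final tag
    tags = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        tags.append(int(back[t, tags[-1]]))
    return tags[::-1]
```

With a transition matrix that penalizes invalid moves (e.g. I-tag after O in a BIO scheme), this global decoding removes inconsistent tag sequences that independent per-token argmax would produce.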

Experiments

To evaluate the performance of our model, we conduct experiments on the public New York Times (NYT) dataset, which is produced by distant supervision [31].

In NYT, the training data can be generated by distant supervision without manually labeling, while the test set is manually labeled to ensure quality and contains 3,880 relation triplets. Besides, this dataset contains 3 entity types

Conclusion

We proposed a novel end-to-end deep neural network model to jointly extract entities and relations by combining bidirectional sequential LSTM layers with graph convolutional networks. Besides, we employ a path-centric pruning strategy that prunes the dependency tree to retain relevant information while excluding irrelevant content as much as possible. The experimental results show that our model achieves the best results on the New York Times (NYT) corpus and can effectively address the relation

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Yin Hong: Conceptualization, Methodology, Validation, Investigation, Writing - original draft. Yanxia Liu: Supervision, Writing - review & editing. Suizhu Yang: Writing - review & editing. Kaiwen Zhang: Writing - review & editing. Jianjun Hu: Writing - review & editing.

Acknowledgements

This work is supported by Research and Development Project in Key Areas of Guangdong Province (2018B010109004) and the program of China Scholarship Council (201806155098).

Yin Hong is currently a postgraduate student in the School of Software Engineering, South China University of Technology. His research interests include named entity recognition and relation extraction.

References (45)

  • S. Zheng et al.

    Joint entity and relation extraction based on a hybrid neural network

    Neurocomputing

    (2017)
  • H. Adel et al.

    Global normalization of convolutional neural networks for joint entity and relation classification

  • R.K. Amplayo et al.

    Entity commonsense representation for neural abstractive summarization

  • J. Chiu et al.

Named entity recognition with bidirectional LSTM-CNNs

    Trans. Assoc. Comput. Linguist.

    (2016)
  • K. Fundel et al.

RelEx: relation extraction using dependency parse trees

    Bioinformatics

    (2006)
  • M.R. Gormley et al.

    Improved relation extraction with feature-rich compositional embedding models

  • A.Z. Gregoric et al.

    Named entity recognition with parallel recurrent neural networks

  • Z. GuoDong, S. Jian, Z. Jie, Z. Min, Exploring various knowledge in relation extraction. In Proceedings of the 43rd...
  • P. Gupta, S. Rajaram, H. Schütze, T. Runkler, Neural relation extraction within and across sentence boundaries. In...
  • P. Gupta et al.

    Table filling multi-task recurrent neural network for joint entity and relation extraction

  • S. Hochreiter et al.

    Long short-term memory

    Neural Comput.

    (1997)
  • R. Hoffmann, C. Zhang, X. Ling, L. Zettlemoyer, D.S. Weld, Knowledge-based weak supervision for information extraction...
  • S. Hu et al.

    A state-transition framework to answer complex questions over knowledge base

  • A. Katiyar et al.

    Going out on a limb: joint extraction of entity mentions and relations without dependency trees

  • D.P. Kingma, J. Ba, Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980,...
  • J. Lafferty, A. McCallum, F.C. Pereira, Conditional random fields: probabilistic models for segmenting and labeling...
  • G. Lample et al.

    Neural architectures for named entity recognition

  • Y. LeCun et al.

    Gradient-based learning applied to document recognition

    Proc. IEEE

    (1998)
  • Q. Li, H. Ji, Incremental joint extraction of entity mentions and relations, in: Proceedings of the 52nd Annual Meeting...
  • Y. Liu et al.

    A dependency-based neural network for relation classification

  • Y. Luan et al.

    A general framework for information extraction using dynamic span graphs

  • C. Manning et al.

The Stanford CoreNLP natural language processing toolkit


Yanxia Liu received the Ph.D. degree from South China University of Technology in 2014. She is currently an associate professor in the School of Software Engineering, South China University of Technology. Her research interests include knowledge graph, machine learning, pattern recognition, and medical image analysis.

Suizhu Yang, born in 1995, is an M.S. candidate. Her research interests include knowledge graph and distant supervision.

Kaiwen Zhang is currently a postgraduate student in the School of Software Engineering, South China University of Technology. His research interests include knowledge graph and question answering.

Jianjun Hu received the B.S. and M.S. degrees in Mechanical Engineering in 1995 and 1998, respectively, from Wuhan University of Technology, China. He received the Ph.D. in Computer Science in 2004 from Michigan State University in the area of machine learning and evolutionary computation. He worked as a postdoctoral fellow at Purdue University and the University of Southern California from 2004 to 2007. He is currently an associate professor in the Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States. His research interests include machine learning, deep learning, data mining, evolutionary computation, fault diagnosis, bioinformatics, and materials informatics. Dr. Hu is also an associate editor for Scientific Reports, PLOS ONE, and BMC Bioinformatics.
