DFMKE: A dual fusion multi-modal knowledge graph embedding framework for entity alignment
Introduction
Entity alignment is critical for integrating distinct knowledge graphs (KGs) by connecting entities that refer to the same real-world object. Because most KGs are created with a specific purpose, different KGs showcase different representations for the same concepts [1].
Early studies on entity alignment primarily focused on attribute similarity [2], [3], which frequently suffers from attribute heterogeneity, making entity alignment error-prone [4]. Subsequent approaches to aligning entities in KGs required human intervention [5] or extra resources [6], such as descriptions of the involved entities and their relations. These methods need many labeled examples to work correctly in downstream tasks, but obtaining a sufficiently large number of examples in real-life applications is challenging and expensive. More recently, some authors have proposed semi-supervised models for entity alignment in KGs, i.e., new algorithmic approaches that benefit from both labeled and unlabeled entities [7], [8].
For instance, [7] proposed the SEA (Semi-supervised Entity Alignment) framework, which extends the popular TransE embedding method [9] to deal with the degree difference of entities. In particular, SEA applies adversarial training to prevent entities with a similar degree of popularity from being aggregated in the same region of the embedding space during training. After generating the embeddings of the two input KGs, SEA constructs and optimizes an objective function that involves both labeled and unlabeled entities.
A further interesting approach is KECG (Knowledge Embedding model and Cross-Graph model), proposed in [8]. KECG formulates entity alignment as an optimization problem whose objective function is the sum of two components. The first component is related to the so-called cross-graph model, which captures inner KG structures and alignments between entities in different KGs. The cross-graph model applies an attention mechanism; in particular, it is built using an extended GAT as an encoder, which allows KECG to ignore unimportant nodes. Since KECG uses both labeled and unlabeled examples when constructing the cross-graph model, it is regarded as a semi-supervised approach. The second component uses the TransE algorithm to learn embeddings of entities and relations in the different KGs and then aligns these representations in a unified vector space.
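Both SEA and KECG build on translation-based embeddings. As a point of reference, the following is a minimal sketch of the TransE scoring idea, where a plausible triple (head, relation, tail) satisfies head + relation ≈ tail; the dimensions and vectors here are illustrative, not taken from either system:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility score for a triple (h, r, t).
    TransE models a valid triple as h + r ≈ t, so a LOWER
    L2 distance means a MORE plausible triple."""
    return np.linalg.norm(h + r - t)

rng = np.random.default_rng(0)
dim = 4
h = rng.normal(size=dim)
r = rng.normal(size=dim)
t_good = h + r + 0.01 * rng.normal(size=dim)  # near-perfect translation: plausible
t_bad = rng.normal(size=dim)                  # unrelated vector: implausible
```

In training, such scores are plugged into a margin-based ranking loss that pushes corrupted triples (like `t_bad`) to score worse than observed ones.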
SEA and KECG have achieved excellent performance on a range of popular datasets, but they cannot handle multi-modal KGs, i.e., KGs populated with heterogeneous data such as texts, numbers, or images describing the same piece of reality. Distinct knowledge forms play a crucial role as auxiliary data to complete KGs and perform entity alignment; thus, we are strongly motivated to design new entity matching algorithms that can exploit the full potential of multi-modal data.
Fig. 1 provides an example of entity alignment for multi-modal KGs and clarifies the peculiarities arising from the presence of multi-modal data in a KG. Here, the images associated with the entity “THU” indicate that the type of this entity is “university”. However, leveraging multi-modal knowledge to perform entity alignment is not trivial: the inevitable heterogeneity among different modalities makes the task challenging. For example, in Fig. 1 it is difficult to conclude that the “Tsinghua University” entity in KG1 and the “THU” entity in KG2 refer to the same object using only image or text information.
Recent studies [10], [11], [12] have put forth several models that combine multi-modal data from KGs into a joint embedding and let the alignment model adjust modality weights automatically. However, these approaches do not consider modal correlation at the feature level and thus may achieve poor results when multiple modalities are highly correlated. In addition, most existing works perform poorly when seed entities (labeled entity pairs across KGs used to initialize the training process) are not broadly available.
To address the issues above, we propose a dual fusion multi-modal knowledge graph embedding framework (DFMKE) for modeling the entity associations of multi-modal KGs and locating entities referring to the same real-world identity. Specifically, we propose an early fusion strategy to perform feature fusion among different modalities, which can exploit correlations between low-level features of each modality. Then, we discriminatively generate knowledge representations for each modality and design a late fusion method based on low-rank weight decomposition to leverage knowledge from multiple modalities for the entity alignment task. This work offers three contributions:
- •
To alleviate the inconsistency of the original data in each modality, we propose a dual fusion multi-modal knowledge graph embedding framework called DFMKE that incorporates the advantages of both early fusion and late fusion techniques for entity alignment with joint training. The main idea of DFMKE is to integrate knowledge representations of multiple modalities from separate spaces into a shared space.
- •
We present a novel late fusion method for multi-modal fusion using modality-specific low-rank factors. This method can easily combine the output features from the early fusion method and reduce the computational complexity caused by input transformation into a tensor.
- •
We demonstrate the performance of DFMKE by conducting experiments on two public multi-modal datasets and comparing it with several state-of-the-art entity alignment methods. DFMKE works well even when no seed entities are available to initialize the training process. We also provide interpretable analysis by conducting ablation studies on the contributions of the early and late fusion modules.
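To make the low-rank late-fusion idea in the second contribution concrete, the following is a minimal sketch of fusion via modality-specific low-rank factors, in the spirit of low-rank multimodal fusion: instead of materializing the full outer-product tensor of all modality features, each modality contributes a small stack of factor matrices. The rank, dimensions, and factor shapes here are illustrative assumptions, not DFMKE's exact parameterization:

```python
import numpy as np

def low_rank_fuse(features, factors):
    """Fuse per-modality feature vectors with modality-specific
    low-rank factors. For each rank k, every modality m projects its
    (1-appended) feature with W_m[k]; the projections are multiplied
    element-wise across modalities and summed over ranks. The joint
    outer-product tensor is never built, keeping complexity low."""
    rank = factors[0].shape[0]
    fused = 0.0
    for k in range(rank):
        prod = 1.0
        for z, W in zip(features, factors):
            z1 = np.append(z, 1.0)       # append 1 to retain unimodal terms
            prod = prod * (W[k] @ z1)    # element-wise product across modalities
        fused = fused + prod
    return fused

rng = np.random.default_rng(1)
d_out, rank = 8, 4
dims = [16, 10, 6]  # e.g. structural, visual, attribute features (illustrative)
features = [rng.normal(size=d) for d in dims]
factors = [rng.normal(size=(rank, d_out, d + 1)) for d in dims]
out = low_rank_fuse(features, factors)
```

The output is a single d_out-dimensional vector regardless of how many modalities are fused, which is what makes the decomposition cheap compared with an explicit tensor of size 17 × 11 × 7 × d_out in this toy setting.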
The remainder of the paper is organized as follows. Section 2 briefly discusses the main existing works; Section 3 introduces the technical details of the work; Section 4 reports and comments on the experimental results according to various benchmarks; Section 5 concludes this paper.
Related literature
KGs are a powerful tool to efficiently organize, manage and retrieve a large body of information which is usually represented as a collection of RDF triplets of the form (head, predicate, tail) [13].
Two core problems in the KG research area are: (a) Link Prediction, i.e., completing triplets of the form (head?, predicate, tail) (in which the head is not specified) or (head, predicate, tail?) (in which the tail is not specified); (b) Entity Matching, i.e., given two KGs, finding pairs of entities that refer to the same real-world object.
Proposed framework
In this section, we first formulate the problem and then describe the technical details of the proposed framework. A multi-modal KG can be viewed as a tuple G = (E, I, R, A), where E, I, R, A denote the sets of entities, images, relations and attributes, respectively. Given a pair of entities (e1, e2), with e1 from a source KG G1 and e2 from a target KG G2, the task of entity alignment is to match entities describing the same object in different KGs.
To tackle the entity alignment task, we propose a framework
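Under this formulation, once the entities of both KGs are embedded in a shared space, matching reduces to a nearest-neighbor search over embedding similarity. The following is a minimal illustrative sketch of that final matching step; the cosine similarity measure and the toy embeddings are assumptions for illustration, not DFMKE's actual pipeline:

```python
import numpy as np

def align_entities(src_emb, tgt_emb):
    """Greedy 1-nearest-neighbor alignment: for each source entity,
    return the index of the most cosine-similar target entity in the
    shared embedding space."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T          # pairwise cosine similarity matrix
    return sim.argmax(axis=1)  # best target index per source entity

# Toy example: target embeddings are noisy copies of the source ones,
# so entity i in G1 should match entity i in G2.
rng = np.random.default_rng(2)
src = rng.normal(size=(3, 8))
tgt = src + 0.05 * rng.normal(size=(3, 8))
matches = align_entities(src, tgt)
```

Evaluation metrics such as Hits@k follow directly from ranking each row of the similarity matrix instead of taking only the argmax.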
Experiments
In this section, we assess DFMKE on two real-world datasets and show that it attains state-of-the-art performance by exploiting multi-modal knowledge in the entity alignment task.
Conclusions
In this paper, we propose a dual fusion multi-modal KG embedding framework that integrates representations of various types of information based on knowledge embedding for the entity alignment task. We first introduce an early fusion method for fusing the features of multi-modal entities. Moreover, an efficient late fusion method using modality-specific low-rank factors is designed through shared space learning on the output vectors of early fusion, to migrate features under different modalities into a unified space.
Declaration of Competing Interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Zhu JIA reports financial support was provided by National Natural Science Foundation of China.
Acknowledgments
This work was supported by the Key Laboratory of Intelligent Education Technology and Application of Zhejiang Province, Zhejiang Normal University, Zhejiang, China, the Key Research and Development Program of Zhejiang Province (No. 2021C03141), and the National Natural Science Foundation of China under Grant (62077015, 61877020 and 62037001).
References
- et al., Towards multi-modal causability with graph neural networks enabling information fusion for explainable AI, Inf. Fusion (2021)
- D. Wijaya, P.P. Talukdar, T. Mitchell, PIDGIN: ontology alignment using web text as interlingua, in: ACM International...
- et al., Data fusion, ACM Comput. Surv. (2009)
- J. Volz, C. Bizer, M. Gaedke, G. Kobilarov, Discovering and maintaining links on the web of data, in: Proceedings of...
- B.D. Trisedya, J. Qi, R. Zhang, Entity alignment between knowledge graphs using attribute embeddings, in: Proceedings...
- et al., Yago3: A knowledge base from multilingual wikipedias
- W. Hu, J. Chen, Y. Qu, A self-training approach for resolving object coreference on the semantic web, in: Proc. of the...
- S.C. Pei, L. Yu, R. Hoehndorf, X.L. Zhang, Semi-supervised entity alignment via knowledge graph embedding with...
- C. Li, Y. Cao, L. Hou, J. Shi, J. Li, T.-S. Chua, Semi-supervised entity alignment via joint knowledge embedding model...
- A. Bordes, N. Usunier, A. García-Duran, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational...
- Resource description framework (RDF): Concepts and abstract syntax
- PARIS: probabilistic alignment of relations, instances, and schema, Proc. VLDB Endow.
- Multi-channel graph neural network for entity alignment
- Semi-supervised entity alignment via knowledge graph embedding with awareness of degree difference
- Knowledge graph embedding by translating on hyperplanes
- Learning entity and relation embeddings for knowledge graph completion