Graph-based neural network models with multiple self-supervised auxiliary tasks
Introduction
In the last decade, neural network approaches that can deal with structured data have been gaining a lot of traction [7], [11], [26], [30], [39]. Due to the prevalence of data structured in the form of graphs, the capability to explicitly exploit structural relationships among data points is particularly useful in improving performance on a variety of tasks, e.g. in human activity detection [57] and gait recognition [5]. Graph Convolutional Networks (GCNs, [26]) stand out as a particularly successful iteration of such networks, especially for semi-supervised problems. GCNs act to encode graph structures while being trained on a supervised target loss for all the nodes with labels. This technique is able to share the gradient information from the supervised loss through the graph adjacency matrix and to learn representations exploiting both labeled and unlabeled nodes. Although GCNs can stack multiple graph convolutional layers in order to capture high-order relations, these architectures suffer from “over-smoothing” when the number of layers increases [28], thus making it difficult to choose an appropriate number of layers.
If we have a dataset with enough labels, supervised learning can usually achieve good results. Unfortunately, labelling a large amount of data is an expensive task. In general, the amount of unlabelled data substantially exceeds the amount that has been human-curated and labelled. It is therefore valuable to find ways to make use of this unlabelled data. A potential solution to this problem is to derive labels from the unlabelled data itself and then train on it in a supervised manner. Self-supervision achieves this by automatically generating additional labelled signals from the available unlabelled data and using them to learn representations. A possible approach in deep learning involves taking a complex signal, hiding part of it from the network, and then asking the network to fill in the missing information [13].
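The hide-and-reconstruct idea above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the mask rate and array sizes are arbitrary, and a real model would predict the hidden entries rather than receive the corrupted input unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_features(x, mask_rate=0.3):
    """Hide a random subset of feature entries; return the corrupted
    copy and the boolean mask of the hidden entries."""
    mask = rng.random(x.shape) < mask_rate
    x_corrupted = x.copy()
    x_corrupted[mask] = 0.0  # hidden entries are zeroed out
    return x_corrupted, mask

def masked_reconstruction_loss(x_pred, x_true, mask):
    """Mean squared error computed over the hidden entries only."""
    return ((x_pred - x_true) ** 2)[mask].mean()

x = rng.random((5, 8))            # 5 nodes, 8 features each
x_in, mask = mask_features(x)     # what the network would receive
loss = masked_reconstruction_loss(x_in, x, mask)
```

The network is only scored on the entries it never saw, so it must use the surrounding signal (here, the remaining features) to fill in the gaps.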
Additionally, it has been found that joint learning of different tasks can improve performance over learning them individually, given that at least a subset of these tasks are related to each other [8]. This observation is at the core of multi-task learning. Precisely, given a set of tasks where a subset of them are related, multi-task learning aims to improve the learning of a model for each task by using the knowledge contained in all or some of the tasks [54].
In this paper we train neural network-based graph architectures by means of self-supervised auxiliary tasks in a multi-task framework, similarly to [51]. Considering the promising results of the GCN, we decided to experiment with this framework on semi-supervised classification problems on graphs, employing the GCN as a base building block. The main contribution of this paper consists of three novel auxiliary tasks for graph-based neural networks:
- autoencoding: with which we aim at extracting node representations robust enough to allow both semi-supervised classification and vertex feature reconstruction;
- corrupted features reconstruction: with which we try to extract node representations that allow reconstructing some of the vertex input features, starting from an embedding built from a corrupted version of them. This auxiliary task can be seen as the graph equivalent of reconstructing one of the color channels of an RGB image from the other channels, as in computer vision self-supervised learning;
- corrupted embeddings reconstruction: with which we try to extract node representations robust to embedding corruption. This is similar to the previous auxiliary task, with the difference that the reconstruction is performed on the node embeddings instead of the vertex features.
These three tasks are intrinsically self-supervised, since their labels are extracted directly from the input graph and its vertex features. These novel auxiliary tasks allow us to achieve competitive results on standard datasets and to reduce the aforementioned “over-smoothing” limitation of deep GCNs.
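The way the three auxiliary tasks combine with the supervised objective can be sketched as a weighted sum of losses. This is an illustrative NumPy sketch under simplifying assumptions: the `lambdas` weights, mean-squared-error reconstruction terms, and array sizes are hypothetical and not taken from the paper.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Softmax cross-entropy, averaged over the given examples."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def total_loss(logits, labels, labeled, x_rec, x, x_corr_rec, h_rec, h,
               lambdas=(1.0, 1.0, 1.0)):
    """Supervised loss on the labeled nodes plus the three
    self-supervised reconstruction terms."""
    sup = cross_entropy(logits[labeled], labels[labeled])
    l_auto = ((x_rec - x) ** 2).mean()       # i) vertex features autoencoding
    l_feat = ((x_corr_rec - x) ** 2).mean()  # ii) corrupted features reconstruction
    l_emb = ((h_rec - h) ** 2).mean()        # iii) corrupted embeddings reconstruction
    return sup + lambdas[0] * l_auto + lambdas[1] * l_feat + lambdas[2] * l_emb

rng = np.random.default_rng(0)
n, f, c, d = 6, 4, 3, 5                      # nodes, features, classes, embedding size
logits = rng.normal(size=(n, c))
labels = rng.integers(0, c, size=n)
labeled = np.array([True, True, False, False, False, False])
x, x_rec, x_corr_rec = (rng.normal(size=(n, f)) for _ in range(3))
h, h_rec = (rng.normal(size=(n, d)) for _ in range(2))
loss = total_loss(logits, labels, labeled, x_rec, x, x_corr_rec, h_rec, h)
```

Note that the supervised term touches only the labeled nodes, while the three reconstruction terms use every node, which is how the auxiliary tasks propagate signal from the unlabeled part of the graph.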
The paper is organized as follows: in Section 2 the related work is summarized; in Section 3 we introduce the three auxiliary tasks; in Section 4 a detailed comparison against GCN on standard public datasets is presented; Section 5 reports conclusions and future work.
Section snippets
Related works
In recent years, graph representation learning has gained a lot of attention. These techniques can be divided into three main categories: i) random walk-based; ii) factorization-based; iii) neural network-based. In the first group, node2vec [17] and DeepWalk [36] are worth mentioning. The former is an efficient and scalable algorithm for feature learning that optimizes a novel network-aware, neighborhood-preserving objective function using stochastic gradient descent. The latter uses truncated
Methods
In this section, we introduce the formalization of a multi-task self-supervised GCN for semi-supervised classification. We will first give some preliminary definitions, including a Graph Convolutional (GC) layer and the multi-task target loss. We then proceed by showing the auxiliary tasks that can be learned jointly with the semi-supervised classification loss. Finally, we introduce the overall architecture we used in our experiments.
Datasets and Experimental Setup
We test our models on semi-supervised classification using the standard datasets Citeseer, Cora, and Pubmed [40]. These are citation networks, where graph vertices correspond to documents and (undirected) edges to citations. The vertex features are a bag-of-words representation of the documents. Each node is associated with a class label. The Cora dataset contains 2,708 nodes, 5,429 edges, 7 classes and 1,433 features per node. The Citeseer dataset contains 3,327 nodes, 4,732 edges, 6 classes and
Conclusion
We introduced three self-supervised auxiliary tasks to improve semi-supervised classification performance on graph-structured data by training them in a multi-task framework, namely: i) vertex features autoencoding; ii) corrupted vertex features reconstruction; iii) corrupted vertex embeddings reconstruction.
The experiments we performed on standard datasets showed better performance than GCNs. Moreover, we compared our results with those achieved by You et al. [51] and M3S
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors would like to thank Adam Elwood for his helpful and constructive comments that contributed to improve the work.
References (57)
- et al., TGLSTM: a time based graph deep learning approach to gait recognition, Pattern Recognit. Lett. (2019)
- et al., Multi-task learning via conic programming, Advances in Neural Information Processing Systems (2008)
- et al., Multi-domain dialog state tracking using recurrent neural networks, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) (2015)
- Y. Zhang, Q. Yang, A survey on multi-task learning, arXiv preprint...
- et al., A framework for learning predictive structures from multiple tasks and unlabeled data, J. Mach. Learn. Res. (2005)
- et al., Multi-task feature learning, Advances in Neural Information Processing Systems (2007)
- et al., Convex multi-task feature learning, Mach. Learn. (2008)
- et al., Task clustering and gating for Bayesian multitask learning, J. Mach. Learn. Res. (2003)
- et al., Multi-task Gaussian process prediction, Advances in Neural Information Processing Systems (2008)
- et al., Spectral networks and locally connected networks on graphs, ICLR (2013)