Pattern Recognition Letters

Volume 148, August 2021, Pages 15-21

Graph-based neural network models with multiple self-supervised auxiliary tasks

https://doi.org/10.1016/j.patrec.2021.04.021

Highlights

  • Graph-based neural network models exploiting multiple self-supervised auxiliary tasks.

  • We propose three new self-supervised auxiliary tasks for graph-based neural networks.

  • Vertex features autoencoding.

  • Corrupted vertex features reconstruction.

  • Corrupted vertex embeddings reconstruction.

Abstract

Self-supervised learning is currently gaining a lot of attention, as it allows neural networks to learn robust representations from large quantities of unlabeled data. Additionally, multi-task learning can further improve representation learning by training networks simultaneously on related tasks, leading to significant performance improvements. In this paper, we propose three novel self-supervised auxiliary tasks to train graph-based neural network models in a multi-task fashion. Since Graph Convolutional Networks are among the most promising approaches for capturing relationships among structured data points, we use them as a building block to achieve competitive results on standard semi-supervised graph classification tasks.

Introduction

In the last decade, neural network approaches that can deal with structured data have been gaining a lot of traction [7], [11], [26], [30], [39]. Due to the prevalence of data structured in the form of graphs, the capability to explicitly exploit structural relationships among data points is particularly useful in improving performance on a variety of tasks, e.g. human activity detection [57] and gait recognition [5]. Graph Convolutional Networks (GCNs, [26]) stand out as a particularly successful iteration of such networks, especially for semi-supervised problems. GCNs encode graph structure while being trained on a supervised target loss over the labeled nodes. This technique shares gradient information from the supervised loss through the graph adjacency matrix, learning representations that exploit both labeled and unlabeled nodes. Although GCNs can stack multiple graph convolutional layers in order to capture high-order relations, these architectures suffer from “over-smoothing” as the number of layers increases [28], making it difficult to choose an appropriate number of layers.
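As a concrete reference point, the GC layer of [26] propagates node features through a symmetrically normalized adjacency matrix with self-loops. Below is a minimal sketch of that propagation rule; the class and variable names are ours, and dense matrices are used only for readability:

```python
import torch
import torch.nn as nn

class GCLayer(nn.Module):
    """One graph convolutional layer following the propagation rule of
    Kipf & Welling [26]: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).
    Dense matrices are used here for clarity; practical implementations
    rely on sparse operations."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj):
        # Add self-loops, then symmetrically normalize the adjacency matrix.
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
        return torch.relu(a_norm @ self.linear(h))
```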

If we have a dataset with enough labels, supervised learning can usually achieve good results. Unfortunately, labelling a large amount of data is expensive. In general, the amount of unlabelled data substantially exceeds the amount of human-curated, labelled data, so it is valuable to find ways to exploit it. A potential solution is to derive labels from the unlabelled data itself and then train on the resulting dataset in a supervised manner. Self-supervision achieves this by automatically generating additional labelled signals from the available unlabelled data and using them to learn representations. A common approach in deep learning involves taking a complex signal, hiding part of it from the network, and then asking the network to fill in the missing information [13].
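A generic “hide part of the signal and fill it in” objective can be sketched as follows. This is a minimal illustration, assuming `model` maps corrupted inputs back to feature space; the zero-masking corruption and the 15% rate are our illustrative choices:

```python
import torch
import torch.nn as nn

def masked_reconstruction_loss(model: nn.Module, x: torch.Tensor,
                               mask_rate: float = 0.15) -> torch.Tensor:
    """Self-supervised objective: hide random input entries and ask the
    network to reconstruct them from the remaining signal."""
    mask = torch.rand_like(x) < mask_rate      # pick entries to hide
    x_corrupted = x.masked_fill(mask, 0.0)     # hide them from the network
    x_rec = model(x_corrupted)                 # ask the network to fill them in
    return ((x_rec - x)[mask] ** 2).mean()     # penalize only the hidden entries
```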

Additionally, it has been found that jointly learning different tasks can improve performance over learning them individually, provided at least a subset of these tasks are related to each other [8]. This observation is at the core of multi-task learning. Precisely, given $T$ tasks $\{T_i\}_{i=1}^{T}$ where a subset of them are related, multi-task learning aims to improve the learning of a model for $\{T_i\}_{i=1}^{T}$ by using the knowledge contained in all or some of the $T$ tasks [54].
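In practice, joint training typically amounts to minimizing a weighted combination of the supervised target loss and the auxiliary losses. A generic formulation (the weighting scheme is our assumption; the paper's exact objective is defined in Section 3) is

$$\mathcal{L}_{\mathrm{tot}} = \mathcal{L}_{\mathrm{sup}} + \sum_{i=1}^{k} \lambda_i \, \mathcal{L}_{\mathrm{aux},i},$$

where $\mathcal{L}_{\mathrm{sup}}$ is the semi-supervised classification loss, $\mathcal{L}_{\mathrm{aux},i}$ are the $k$ auxiliary losses, and $\lambda_i \geq 0$ are task weights.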

In this paper we train neural network-based graph architectures by means of self-supervised auxiliary tasks in a multi-task framework, similarly to [51]. Considering the promising results of the GCN, we decided to experiment with this framework on semi-supervised classification problems on graphs, employing the GCN as a base building block. The main contribution of this paper consists of three novel auxiliary tasks for graph-based neural networks:

  • autoencoding: with which we aim at extracting node representations robust enough to allow both semi-supervised classification and vertex feature reconstruction;

  • corrupted features reconstruction: with which we try to extract node representations that allow us to reconstruct some of the vertex input features, starting from an embedding built from a corrupted version of them. This auxiliary task can be seen as the graph equivalent of reconstructing one of the color channels of an RGB image using the other channels in computer-vision self-supervised learning;

  • corrupted embeddings reconstruction: with which we try to extract node representations robust to embedding corruption. This is similar to the previous auxiliary task, with the difference that the reconstruction is performed on the node embeddings instead of the vertex features.

These three tasks are intrinsically self-supervised, since the labels are directly extracted from the input graph and its vertex features. These novel auxiliary tasks allow us to achieve competitive results on standard datasets and to reduce the aforementioned “over-smoothing” limitation of deep GCNs.
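A hedged sketch of how the three auxiliary heads could sit on top of a shared GCN encoder follows. The decoder shapes, the zero-masking corruption, and the shared feature decoder for tasks i) and ii) are our assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class MultiTaskGCN(nn.Module):
    """Multi-task setup: a shared GCN encoder with a classification head
    plus one decoder per self-supervised auxiliary task."""

    def __init__(self, encoder, feat_dim, emb_dim, n_classes):
        super().__init__()
        self.encoder = encoder                      # e.g. stacked GC layers
        self.classifier = nn.Linear(emb_dim, n_classes)
        self.feat_decoder = nn.Linear(emb_dim, feat_dim)
        self.emb_decoder = nn.Linear(emb_dim, emb_dim)

    def forward(self, x, adj, mask_rate=0.15):
        h = self.encoder(x, adj)                    # shared node embeddings
        logits = self.classifier(h)                 # semi-supervised target task
        x_auto = self.feat_decoder(h)               # i) vertex features autoencoding
        x_corr = x.masked_fill(torch.rand_like(x) < mask_rate, 0.0)
        x_rec = self.feat_decoder(self.encoder(x_corr, adj))  # ii) corrupted features
        h_corr = h.masked_fill(torch.rand_like(h) < mask_rate, 0.0)
        h_rec = self.emb_decoder(h_corr)            # iii) corrupted embeddings
        return logits, x_auto, x_rec, h_rec
```

Each returned tensor feeds a separate loss (cross-entropy on the labeled nodes for the logits, reconstruction losses for the rest), combined as in the weighted multi-task objective above.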

The paper is organized as follows: Section 2 summarizes related work; Section 3 introduces the three auxiliary tasks; Section 4 presents a detailed comparison against the GCN on standard public datasets; Section 5 reports conclusions and future work.

Section snippets

Related works

In recent years, graph representation learning has gained a lot of attention. These techniques can be divided into three main categories: i) random walk-based; ii) factorization-based; iii) neural network-based. In the first group, node2vec [17] and DeepWalk [36] are worth mentioning. The former is an efficient and scalable algorithm for feature learning that optimizes a novel network-aware, neighborhood-preserving objective function using stochastic gradient descent. The latter uses truncated…

Methods

In this section, we formalize a multi-task self-supervised GCN for semi-supervised classification. We first give some preliminary definitions, including the Graph Convolutional (GC) layer and the multi-task target loss. We then present the auxiliary tasks that can be learned jointly with the semi-supervised classification loss. Finally, we introduce the overall architecture used in our experiments.

Datasets and Experimental Setup

We test our models on semi-supervised classification using the standard datasets Citeseer, Cora, and Pubmed [40]. These are citation networks, where graph vertices correspond to documents and (undirected) edges to citations. The vertex features are a bag-of-words representation of the documents. Each node is associated with a class label. The Cora dataset contains 2,708 nodes, 5,429 edges, 7 classes and 1,433 features per node. The Citeseer dataset contains 3,327 nodes, 4,732 edges, 6 classes and…
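For reference, these benchmarks and their standard splits can be loaded, for example, with PyTorch Geometric's Planetoid loader; the choice of tooling is ours, as the paper does not prescribe a library:

```python
from torch_geometric.datasets import Planetoid

# Download Cora with the standard Planetoid train/val/test splits [40].
dataset = Planetoid(root="data/Planetoid", name="Cora")
data = dataset[0]

print(data.x.shape)                # bag-of-words features: [2708, 1433]
print(dataset.num_classes)         # 7 document classes
print(int(data.train_mask.sum()))  # 140 labeled nodes in the standard split
# data.edge_index stores each undirected citation edge as two directed pairs.
```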

Conclusion

We introduced three self-supervised auxiliary tasks to improve semi-supervised classification performance on graph-structured data by training them in a multi-task framework: i) vertex features autoencoding; ii) corrupted vertex features reconstruction; iii) corrupted vertex embeddings reconstruction.

The experiments we performed on standard datasets showed better performance with respect to GCNs. Moreover, we compared our results with those achieved by You et al. [51] and M3S…

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to thank Adam Elwood for his helpful and constructive comments that contributed to improve the work.

References (57)

  • R. Caruana, Multitask learning, Mach. Learn. (1997)

  • J. Chen et al., Learning incoherent sparse and low-rank patterns from multiple tasks, ACM Trans. Knowl. Discov. Data (TKDD) (2012)

  • Z. Chen et al., GradNorm: gradient normalization for adaptive loss balancing in deep multitask networks, International Conference on Machine Learning (2018)

  • M. Defferrard et al., Convolutional neural networks on graphs with fast localized spectral filtering, NIPS (2016)

  • C. Doersch et al., Unsupervised visual representation learning by context prediction, Proceedings of the IEEE International Conference on Computer Vision (2015)

  • C. Doersch et al., Multi-task self-supervised visual learning, IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017 (2017)

  • T. Evgeniou et al., Learning multiple tasks with kernel methods, J. Mach. Learn. Res. (2005)

  • I. Goodfellow et al., Deep Learning (2016)

  • M. Gori et al., A new model for learning in graph domains, Proceedings of the 2005 IEEE International Joint Conference on Neural Networks (2005)

  • A. Grover et al., node2vec: scalable feature learning for networks, ACM SIGKDD (2016)

  • A. Grover et al., Graphite: iterative generative modeling of graphs (2019)

  • W. Hamilton et al., Inductive representation learning on large graphs, NIPS (2017)

  • D.K. Hammond et al., Wavelets on graphs via spectral graph theory, Appl. Comput. Harmonic Anal. (2011)

  • A. Jalali et al., A dirty model for multi-task learning, Advances in Neural Information Processing Systems (2010)

  • E. Jang et al., Grasp2Vec: learning object representations from self-supervised grasping, Conference on Robot Learning (2018)

  • A. Kendall et al., Multi-task learning using uncertainty to weigh losses for scene geometry and semantics, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)

  • D. Kingma et al., Adam: a method for stochastic optimization, ICLR (2015)

  • T.N. Kipf et al., Semi-supervised classification with graph convolutional networks, ICLR (2017)