Updating of the Gaussian graphical model through targeted penalized estimation

https://doi.org/10.1016/j.jmva.2020.104621

Abstract

Updating of the Gaussian graphical model via shrinkage estimation is studied. This shrinkage is towards a nonzero parameter value representing prior quantitative information. Once new data become available, the previously estimated parameter needs updating. Shrinkage provides the means to this end, using the previous estimate as a shrinkage target to acquire an updated one. The process of iteratively updating the Gaussian graphical model in shrinkage fashion is shown to yield an improved fit and an asymptotically unbiased and consistent estimator. The workings of updating via shrinkage are elucidated by linking it to Bayesian updating and through the inheritance by the update of eigen-properties of the previous estimate. The effect of shrinkage on the moments and loss of the estimator is pointed out. Practical issues possibly hampering updating are identified and solutions outlined. The presented updating procedure is illustrated through the reconstruction of a gene–gene interaction network using transcriptomic data.

Introduction

The reconstruction of networks from data is a key challenge of our time, as witnessed by the number of citations that the recently published monographs on graphical models by [38] and [20] have accumulated. Many methods that serve this end have since been proposed; cf. the recently published review by [10]. Virtually all of these methods reconstruct such networks de novo. That is, they do not make use of information on these networks that is already available. For molecular networks such information is condensed not only in repositories like KEGG and STRING [27], [35], but also in the form of omics data from the model system under comparable conditions, available through portals like GEO and TCGA [11], [39]. Information from the aforementioned sources may enhance the reconstruction of a molecular network from the data at hand. A network thus reconstructed in turn qualifies as novel information on the network, and needs updating when new (experimental) information becomes available in the future. Such iterative reconstruction will advance the knowledge of the network. In this work we explore how this may be achieved.

A network is defined as an undirected graph $\mathcal{G}$ comprising the pair $(V, E)$. Here $V$ is the index set of the nodes – representing the (molecular) entities – of the network, and $E \subseteq V \times V$ is the edge set, consisting of node pairs that are connected. The graphs under study are undirected, which implies that if $(v_1, v_2) \in E$, then automatically $(v_2, v_1) \in E$ too. An edge is operationalized as a pairwise conditional dependency between the random variables represented by the nodes it connects, given all other random variables represented by the remaining nodes in the network. The absence of an edge is, vice versa, defined as a conditional independency. The conditional independence graph is to be learned from (say) transcriptomic data on the genes that constitute the network. To this end a Gaussian graphical model (GGM) is assumed: $\mathcal{N}(\mathbf{0}_p, \Omega^{-1})$ with $\Omega$ the inverse covariance (or precision) matrix, $\Omega = \Sigma^{-1}$. Zero and nonzero entries of $\Omega$ correspond to conditional independencies and dependencies, respectively, among the corresponding variate pairs. It thus suffices to estimate the precision matrix from the data by maximization of the log-likelihood, $\log(|\Omega|) - \mathrm{tr}(S\Omega)$, with sample covariance matrix $S$ obtained from a sample of size $n$, yielding the maximum likelihood (ML) estimator $\hat{\Omega}_{\mathrm{ML}} = S^{-1}$. It then remains to infer its support.
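As a minimal numerical illustration of this reading of the precision matrix (our own sketch, not part of the original analysis; the tridiagonal example precision matrix and all variable names are purely illustrative), consider:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 5, 100

# Illustrative sparse precision matrix: only nodes j and j+1 are connected.
Omega_true = np.eye(p) + 0.3 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Omega_true), size=n)

# Sample covariance of the zero-mean data and the ML precision estimator
# (well-defined here since n > p).
S = X.T @ X / n
Omega_ml = np.linalg.inv(S)

# Nonzero (zero) entries of Omega encode conditional (in)dependencies; the
# scaled entries below are the partial correlations, whose support must in
# practice be inferred, e.g., by testing or thresholding.
d = np.sqrt(np.diag(Omega_ml))
partial_cor = -Omega_ml / np.outer(d, d)
np.fill_diagonal(partial_cor, 1.0)
print(np.round(partial_cor, 2))
```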

The estimation of a Gaussian graphical model is hampered by the collinearity of omics data: the maximum likelihood estimator of the precision matrix is then not (or ill-)defined as $S$ is (near-)singular. This is overcome by augmenting the log-likelihood with a penalty. Here we concentrate on the nonzero-centered ridge penalty, $\lambda \|\Omega - T_r\|_F^2$, the sum of the squared elements of $\Omega - T_r$. With the subscript $r$ referring to ridge, this $T_r$ is a user-specified target matrix towards which the precision estimate is shrunken for large values of $\lambda$. Maximization of this ridge penalized log-likelihood yields the ridge precision estimator (see [42]): $$\hat{\Omega}_r(\lambda) = \left\{ \tfrac{1}{2}(S - \lambda T_r) + \left[\lambda I_{p \times p} + \tfrac{1}{4}(S - \lambda T_r)^2\right]^{1/2} \right\}^{-1}.$$ [42] shows that this estimator satisfies the following properties: (i) $\hat{\Omega}_r(\lambda) \succ 0$, (ii) $\lim_{\lambda \downarrow 0} \hat{\Omega}_r(\lambda) = S^{-1}$ (should it exist), (iii) $\lim_{\lambda \to \infty} \hat{\Omega}_r(\lambda) = T_r$, and (iv) consistency. Furthermore, for a suitable choice of $\lambda$ it – using a diagonal, equivariant target $T_r = \alpha I_{p \times p}$ with $\alpha \geq 0$ – also outperforms the ML precision estimator in terms of the mean squared error (MSE) [40]. In that MSE sense, even the inverse of the ridge precision estimator, $\hat{\Sigma}_r(\lambda) = [\hat{\Omega}_r(\lambda)]^{-1}$, outperforms the ML covariance estimator. Although the focus of the present work is the ridge precision estimator, an alternative would be the inverse of the Ledoit–Wolf shrunken covariance matrix [21], $\hat{\Omega}_w(\nu) = [\hat{\Sigma}_w(\nu)]^{-1}$ with $\hat{\Sigma}_w(\nu) = (1 - \nu) S + \nu T_w$ for $\nu \in (0, 1)$. Note that $T_w$ is a covariance target, while $T_r$ is a precision target. Clearly, this Ledoit–Wolf shrinkage covariance estimator shrinks to $S$ and $T_w$ as $\nu \downarrow 0$ and $\nu \uparrow 1$, respectively. Moreover, $\hat{\Sigma}_w(\nu)$ too may – for a suitably chosen shrinkage parameter and a diagonal, equivariant target (although this has also been proven for other target choices [21], [31]) – outperform the ML covariance estimator in the mean squared error sense.
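For concreteness, both estimators admit a few-line implementation (a sketch under the assumption that $S$ and $T_r$ are symmetric; the function names are ours, not from [42] or [21]):

```python
import numpy as np

def ridge_precision(S, T, lam):
    """Ridge precision estimator of [42]:
    {(S - lam*T)/2 + [lam*I + (S - lam*T)^2/4]^(1/2)}^(-1)."""
    p = S.shape[0]
    A = 0.5 * (S - lam * T)
    # lam*I + A@A is symmetric positive definite for lam > 0; obtain its
    # matrix square root via the eigendecomposition.
    w, V = np.linalg.eigh(lam * np.eye(p) + A @ A)
    sqrt_part = (V * np.sqrt(w)) @ V.T
    return np.linalg.inv(A + sqrt_part)

def ledoit_wolf_precision(S, T_w, nu):
    """Inverse of the Ledoit-Wolf shrunken covariance (1-nu)*S + nu*T_w."""
    return np.linalg.inv((1.0 - nu) * S + nu * T_w)
```

Properties (ii) and (iii) can be checked numerically: for $\lambda$ close to zero the output approaches `np.linalg.inv(S)` (when it exists), while for large $\lambda$ it approaches `T`.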

The target provides the means to include a quantitative suggestion of the precision matrix. The number of choices for $T_r$ and $T_w$ is in principle unlimited. [16], [31] suggest several low-dimensionally parameterized targets. Some archetypical examples (modified for the ridge precision estimator) are:

  • The equivariant and rotation invariant target $T_r = \alpha I_{p \times p}$ with $\alpha > 0$.

  • A diagonal $T_r$ with $(T_r)_{jj} = 1 / S_{jj}$, such that only off-diagonal elements are shrunken.

  • A simple model of the covariances and variances via $T_r = [\sigma^2 (1 - \rho) I_{p \times p} + \sigma^2 \rho \mathbf{1}_{p \times p}]^{-1}$, with $\sigma^2 > 0$ and $\rho \in [-1, 1]$.

[16], [31] learn the free parameters of $T_r$ from the data at hand through, e.g., the minimization of the mean squared error (under assumptions). As an alternative to these low-dimensionally parameterized, data-derived targets, one may also consider a fully unstructured target obtained through other means than the data at hand, e.g., $T_r = \Omega_{\mathrm{known}}$ taken from a related and well-studied model system. Of course, the inverses of these $T_r$ may serve as covariance targets for the Ledoit–Wolf shrinkage covariance estimator $\hat{\Sigma}_w(\nu)$.
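The three archetypical targets above translate directly into code (again a sketch of ours; note that invertibility of the equicorrelation model additionally requires $\rho \in (-(p-1)^{-1}, 1)$):

```python
import numpy as np

def target_equivariant(p, alpha):
    # T_r = alpha * I: the equivariant, rotation invariant target.
    return alpha * np.eye(p)

def target_reciprocal_variance(S):
    # Diagonal T_r with (T_r)_jj = 1/S_jj: only off-diagonal precision
    # elements are shrunken.
    return np.diag(1.0 / np.diag(S))

def target_equicorrelation(p, sigma2, rho):
    # T_r = [sigma^2 (1-rho) I + sigma^2 rho 11^T]^(-1): the inverse of a
    # two-parameter equicovariance model.
    Sigma = sigma2 * ((1.0 - rho) * np.eye(p) + rho * np.ones((p, p)))
    return np.linalg.inv(Sigma)
```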

Here we describe the problem of updating a Gaussian graphical model after the arrival of novel information. The shrinkage estimators introduced above are employed to accommodate the prior information on the model – in the form of a target that represents the current knowledge of the model at that point in time, formed from the current and preceding data sets – so as to arrive at an updated estimate from the novel data. Properties of these updating estimators are investigated and intuition into their workings is provided. Resolutions for practical complications when updating – batch effects and the simultaneous arrival of multiple sources of novel information – are presented. The paper concludes with an example in which the precision matrix, and the gene–gene interaction network derived from it, of ten pathways are reconstructed from transcriptomic data of ER+ breast cancer samples from a series of five subsequently published data sets. Throughout, proofs are deferred to the appendix, while the supplementary material contains more elaborate versions of the proofs as well as additional results.

Estimation of precision matrices from multiple data sets has been considered in the literature. [6], [8], [29] deal with the simultaneous estimation of multiple precision matrices using so-called ‘fused’ ridge/lasso penalties, while, e.g., [25], [28] adopt a Bayesian approach. The methods of [15], [43] have a similar thrust, but these works are based on parameter communalities between the different precision matrices. On a different note but closely related is the work of [25], [26], [44], where the focus is on the reconstruction of multiple directed acyclic graphs in order to facilitate causal inference. Implicitly, these methods assume that the data used for the estimation of each precision matrix are on par, in the sense that no time order is present. This absence of an order motivates the use of a fused penalty that encourages the borrowing of information across data sets (should the data give rise to do so). In particular, this shrinks the different precision estimates towards each other, but not towards a target matrix that is formed from one of the other data sets. Moreover, this ‘across data set’ calibration is done once, instead of repetitively as in the updating context envisioned here.


Updating

When new information becomes available, the previously reconstructed network and GGM need updating. The latter may then serve as the target in the re-estimation of the network from the new data. This process may be iterated, thus giving rise to an updating scheme. To formalize this, assume an infinite sequence of studies, each producing data from the same normal law $\mathcal{N}(\mathbf{0}_p, \Omega^{-1})$ (variations on and deviations from this scenario are discussed at the end of this section and in Section 4).
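In code, the envisioned scheme amounts to feeding each estimate back in as the next target (a sketch reusing `ridge_precision` from the introduction; the per-update choice of $\lambda_k$, e.g., by cross-validation, is our simplifying assumption, not a prescription of the paper):

```python
def update_precision_sequence(datasets, T_init, lambdas):
    """Iterative updating: each arriving data set X_k (n_k x p, zero-mean)
    is summarized by its sample covariance S_k, and the previous precision
    estimate serves as the shrinkage target for the next one."""
    T = T_init
    estimates = []
    for X, lam in zip(datasets, lambdas):
        S = X.T @ X / X.shape[0]        # sample covariance of study k
        T = ridge_precision(S, T, lam)  # shrink towards previous estimate
        estimates.append(T)             # the update becomes the next target
    return estimates
```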

Why updating works

Intuition is provided as to why updating improves the knowledge of the sought-for network. In this and the next two sections the subscript k of the estimators is temporarily dropped to reduce notational clutter.

Practical complications

In practice, issues may arise that hamper updating or require downstream manipulation of the estimate. Two such anticipated complications are identified here and possible solutions are outlined.

Conclusion and discussion

We studied the problem of learning a Gaussian graphical model from an experiment that is carried out repeatedly. When data from the next experiment become available, the current estimate of the model parameter needs updating. We proposed to re-estimate the parameter by means of shrinkage estimators that strike a balance between the latest estimate of the parameter and the new experimental data. The resulting estimators yield, with high probability, an improvement in fit for subsequent experiments.

Proofs

Prior to the proof of Proposition 1 several prerequisites are stated.

Lemma 1

Assume $T_r^{-1} \neq S$. Then, $\|\hat{\Sigma}_r(\lambda) - S\|_F^2$ is strictly increasing in $\lambda$.

Proof

The proof proceeds by showing that the derivative of the squared norm with respect to $\lambda$ is strictly positive. Write the squared Frobenius norm as a trace and expand its argument: $$\|\hat{\Sigma}_r(\lambda) - S\|_F^2 = \mathrm{tr}[\hat{\Sigma}_r^2(\lambda)] - 2\,\mathrm{tr}[\hat{\Sigma}_r(\lambda)\, S] + \mathrm{tr}(S^2).$$ The derivative w.r.t. $\lambda$ of the right-hand side is: $$\frac{\mathrm{d}}{\mathrm{d}\lambda} \|\hat{\Sigma}_r(\lambda) - S\|_F^2 = 2\,\mathrm{tr}\Big\{ [\hat{\Sigma}_r(\lambda) - S]\, \frac{\mathrm{d}}{\mathrm{d}\lambda} \hat{\Sigma}_r(\lambda) \Big\},$$ where: $$\frac{\mathrm{d}}{\mathrm{d}\lambda} \hat{\Sigma}_r(\lambda) = \frac{1}{2} \Big[\lambda I_{p \times p} + \tfrac{1}{4}(S - \lambda T_r)^2\Big]^{-1/2} \hat{\Sigma}_r(\lambda) \big[\hat{\Sigma}_r^{-1} \ldots$$
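The monotonicity claimed by Lemma 1 is easy to probe numerically (an illustration of ours, not a substitute for the proof; it reuses `ridge_precision` from the sketch in the introduction):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 50
X = rng.standard_normal((n, p))
S = X.T @ X / n
T = np.eye(p)  # here T_r^{-1} = I differs from S almost surely

# ||Sigma_hat_r(lambda) - S||_F^2 on a grid of penalty values.
lams = np.linspace(0.01, 5.0, 200)
dists = [np.linalg.norm(np.linalg.inv(ridge_precision(S, T, lam)) - S, "fro") ** 2
         for lam in lams]
assert all(d1 < d2 for d1, d2 in zip(dists, dists[1:]))  # strictly increasing
```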

CRediT authorship contribution statement

Wessel N. van Wieringen: Conceptualization, Formal analysis, Methodology, Software, Writing - original draft. Koen A. Stam: Data curation. Carel F.W. Peeters: Conceptualization, Writing - review & editing. Mark A. van de Wiel: Writing - review & editing.

Acknowledgments

We thank the encouraging Editor and referees for their comments that led to this improved version of our work.

References (44)

  • Danaher, P., et al., The joint graphical lasso for inverse covariance estimation across multiple classes, J. R. Stat. Soc. Ser. B Stat. Methodol. (2014).

  • Davis, C., et al., The rotation of eigenvectors by a perturbation. III, SIAM J. Numer. Anal. (1970).

  • Drton, M., et al., Structure learning in graphical modeling, Annu. Rev. Stat. Appl. (2017).

  • Edgar, R., et al., Gene Expression Omnibus: NCBI gene expression and hybridization array data repository, Nucleic Acids Res. (2002).

  • Efron, B., Large-scale simultaneous hypothesis testing: the choice of a null hypothesis, J. Amer. Statist. Assoc. (2004).

  • Etemadi, N., Convergence of weighted averages of random variables revisited, Proc. Amer. Math. Soc. (2006).

  • Gray, H., Leday, G.G.R., Vallejos, C.A., Richardson, S., Shrinkage estimation of large covariance matrices using multiple...

  • Guo, J., et al., Joint estimation of multiple graphical models, Biometrika (2011).

  • Horn, A., et al., Matrix Analysis (2009).

  • Jamison, B., et al., Convergence of weighted averages of independent random variables, Z. Wahrscheinlichkeitstheor. Verwandte Geb. (1965).

  • Kolar, M., Liu, H., Marginal regression for multitask learning, in: Proceedings of the Fifteenth International Conference...

  • Koller, D., et al., Probabilistic Graphical Models: Principles and Techniques (2009).