Updating of the Gaussian graphical model through targeted penalized estimation

https://doi.org/10.1016/j.jmva.2020.104621

Abstract

Updating of the Gaussian graphical model via shrinkage estimation is studied. This shrinkage is towards a nonzero parameter value representing prior quantitative information. Once new data become available, the previously estimated parameter needs updating. Shrinkage provides the means to this end, using the previous estimate as a shrinkage target to acquire an updated one. The process of iteratively updating the Gaussian graphical model in shrinkage fashion is shown to yield an improved fit and an asymptotically unbiased and consistent estimator. The workings of updating via shrinkage are elucidated by linking it to Bayesian updating and through the inheritance by the update of eigen-properties of the previous estimate. The effect of shrinkage on the moments and loss of the estimator is pointed out. Practical issues possibly hampering updating are identified and solutions outlined. The presented updating procedure is illustrated through the reconstruction of a gene–gene interaction network using transcriptomic data.

Introduction

The reconstruction of networks from data is a key challenge of our time, as witnessed by the number of citations that the recently published monographs on graphical models by [38] and [20] have accumulated. Many methods that serve this end have since been proposed; cf. the recently published review by [10]. Virtually all of these methods reconstruct such networks de novo. That is, they do not make use of information on these networks that is already available. For molecular networks such information is condensed not only in repositories like KEGG and STRING [27], [35], but also in the form of omics data from the model system under comparable conditions, available through portals like GEO and TCGA [11], [39]. Information from the aforementioned sources may enhance the reconstruction of a molecular network from the data at hand. A network thus reconstructed in turn qualifies as novel information on the network, and needs updating when new (experimental) information becomes available in the future. Such iterative reconstruction will advance the knowledge of the network. In this work we explore how this may be achieved.

A network is defined as an undirected graph $\mathcal{G}$ comprising the pair $(V, E)$. Here $V$ is the index set of the nodes – representing the (molecular) entities – of the network, and $E \subseteq V \times V$ is the edge set, consisting of node pairs that are connected. The graphs under study are undirected, which implies that if $(v_1, v_2) \in E$, then automatically $(v_2, v_1) \in E$ too. An edge is operationalized as a pairwise conditional dependency between the random variables represented by the nodes it connects, given all other random variables represented by the remaining nodes in the network. The absence of an edge is, vice versa, defined as a conditional independency. The conditional independence graph is to be learned from (say) transcriptomic data on the genes that constitute the network. To this end a Gaussian graphical model (GGM) is assumed: $\mathcal{N}(\mathbf{0}_p, \Omega^{-1})$ with $\Omega$ the inverse covariance (or precision) matrix, $\Omega = \Sigma^{-1}$. Zero and nonzero entries of $\Omega$ correspond to conditional independencies and dependencies, respectively, among the corresponding variate pairs. It thus suffices to estimate the precision matrix from the data by maximization of the log-likelihood, $\log(|\Omega|) - \mathrm{tr}(S\Omega)$, with sample covariance matrix $S$ obtained from a sample of size $n$, yielding the maximum likelihood (ML) estimator $\hat{\Omega}_{\mathrm{ML}} = S^{-1}$. It then remains to infer its support.
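As a minimal numerical illustration of this reading of the precision matrix (our own sketch, not part of the original analysis; the tridiagonal example precision matrix and all variable names are purely illustrative), consider:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 5, 100

# Illustrative sparse precision matrix: only nodes j and j+1 are connected.
Omega_true = np.eye(p) + 0.3 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Omega_true), size=n)

# Sample covariance of the zero-mean data and the ML precision estimator
# (well-defined here since n > p).
S = X.T @ X / n
Omega_ml = np.linalg.inv(S)

# Nonzero (zero) entries of Omega encode conditional (in)dependencies; the
# scaled entries below are the partial correlations, whose support must in
# practice be inferred, e.g., by testing or thresholding.
d = np.sqrt(np.diag(Omega_ml))
partial_cor = -Omega_ml / np.outer(d, d)
np.fill_diagonal(partial_cor, 1.0)
print(np.round(partial_cor, 2))
```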

The estimation of a Gaussian graphical model is hampered by the collinearity of omics data: the maximum likelihood estimator of the precision matrix is then not (or ill-)defined as $S$ is (near-)singular. This is overcome by augmenting the log-likelihood with a penalty. Here we concentrate on the nonzero-centered ridge penalty, $\lambda \|\Omega - T_r\|_F^2$, the sum of the squared elements of $\Omega - T_r$. With the subscript $r$ referring to ridge, this $T_r$ is a user-specified target matrix towards which the precision estimate is shrunken for large values of $\lambda$. Maximization of this ridge penalized log-likelihood yields the ridge precision estimator (see [42]): $$\hat{\Omega}_r(\lambda) = \left\{ \tfrac{1}{2}(S - \lambda T_r) + \left[\lambda I_{p \times p} + \tfrac{1}{4}(S - \lambda T_r)^2\right]^{1/2} \right\}^{-1}.$$ [42] shows that this estimator satisfies the following properties: (i) $\hat{\Omega}_r(\lambda) \succ 0$, (ii) $\lim_{\lambda \downarrow 0} \hat{\Omega}_r(\lambda) = S^{-1}$ (should it exist), (iii) $\lim_{\lambda \to \infty} \hat{\Omega}_r(\lambda) = T_r$, and (iv) consistency. Furthermore, for a suitable choice of $\lambda$ it – using a diagonal, equivariant target $T_r = \alpha I_{p \times p}$ with $\alpha \geq 0$ – also outperforms the ML precision estimator in terms of the mean squared error (MSE) [40]. In that MSE sense, even the inverse of the ridge precision estimator, $\hat{\Sigma}_r(\lambda) = [\hat{\Omega}_r(\lambda)]^{-1}$, outperforms the ML covariance estimator. Although the focus of the present work is the ridge precision estimator, an alternative would be the inverse of the Ledoit–Wolf shrunken covariance matrix [21], $\hat{\Omega}_w(\nu) = [\hat{\Sigma}_w(\nu)]^{-1}$ with $\hat{\Sigma}_w(\nu) = (1 - \nu) S + \nu T_w$ for $\nu \in (0, 1)$. Note that $T_w$ is a covariance target, while $T_r$ is a precision target. Clearly, this Ledoit–Wolf shrinkage covariance estimator shrinks to $S$ and $T_w$ as $\nu \downarrow 0$ and $\nu \uparrow 1$, respectively. Moreover, $\hat{\Sigma}_w(\nu)$ too may – for a suitably chosen shrinkage parameter and a diagonal, equivariant target (although this has also been proven for other target choices [21], [31]) – outperform the ML covariance estimator in the mean squared error sense.
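For concreteness, both estimators admit a few-line implementation (a sketch under the assumption that $S$ and $T_r$ are symmetric; the function names are ours, not from [42] or [21]):

```python
import numpy as np

def ridge_precision(S, T, lam):
    """Ridge precision estimator of [42]:
    {(S - lam*T)/2 + [lam*I + (S - lam*T)^2/4]^(1/2)}^(-1)."""
    p = S.shape[0]
    A = 0.5 * (S - lam * T)
    # lam*I + A@A is symmetric positive definite for lam > 0; obtain its
    # matrix square root via the eigendecomposition.
    w, V = np.linalg.eigh(lam * np.eye(p) + A @ A)
    sqrt_part = (V * np.sqrt(w)) @ V.T
    return np.linalg.inv(A + sqrt_part)

def ledoit_wolf_precision(S, T_w, nu):
    """Inverse of the Ledoit-Wolf shrunken covariance (1-nu)*S + nu*T_w."""
    return np.linalg.inv((1.0 - nu) * S + nu * T_w)
```

Properties (ii) and (iii) can be checked numerically: for $\lambda$ close to zero the output approaches `np.linalg.inv(S)` (when it exists), while for large $\lambda$ it approaches `T`.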

The target provides the means to include a quantitative suggestion of the precision matrix. The number of choices for $T_r$ and $T_w$ is in principle unlimited. [16], [31] suggest several low-dimensionally parameterized targets. Some archetypical examples (modified for the ridge precision estimator) are:

  • The equivariant and rotation invariant target $T_r = \alpha I_{p \times p}$ with $\alpha > 0$.

  • A diagonal $T_r$ with $(T_r)_{jj} = 1 / S_{jj}$, such that only off-diagonal elements are shrunken.

  • A simple model of the covariances and variances via $T_r = [\sigma^2 (1 - \rho) I_{p \times p} + \sigma^2 \rho \mathbf{1}_{p \times p}]^{-1}$, with $\sigma^2 > 0$ and $\rho \in [-1, 1]$.

[16], [31] learn the free parameters of $T_r$ from the data at hand through, e.g., the minimization of the mean squared error (under assumptions). As an alternative to these low-dimensionally parameterized, data-derived targets, one may also consider a fully unstructured target obtained through other means than the data at hand, e.g., $T_r = \Omega_{\mathrm{known}}$ taken from a related and well-studied model system. Of course, the inverses of these $T_r$ may serve as covariance targets for the Ledoit–Wolf shrinkage covariance estimator $\hat{\Sigma}_w(\nu)$.
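The three archetypical targets above translate directly into code (again a sketch of ours; note that invertibility of the equicorrelation model additionally requires $\rho \in (-(p-1)^{-1}, 1)$):

```python
import numpy as np

def target_equivariant(p, alpha):
    # T_r = alpha * I: the equivariant, rotation invariant target.
    return alpha * np.eye(p)

def target_reciprocal_variance(S):
    # Diagonal T_r with (T_r)_jj = 1/S_jj: only off-diagonal precision
    # elements are shrunken.
    return np.diag(1.0 / np.diag(S))

def target_equicorrelation(p, sigma2, rho):
    # T_r = [sigma^2 (1-rho) I + sigma^2 rho 11^T]^(-1): the inverse of a
    # two-parameter equicovariance model.
    Sigma = sigma2 * ((1.0 - rho) * np.eye(p) + rho * np.ones((p, p)))
    return np.linalg.inv(Sigma)
```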

Here we describe the problem of updating a Gaussian graphical model after the arrival of novel information. The shrinkage estimators introduced above are employed to accommodate the prior information on the model – in the form of a target that represents the current knowledge of the model at that point in time, formed from the current and preceding data sets – so as to arrive at an updated estimate from the novel data. Properties of these updating estimators are investigated and intuition into their workings is provided. Resolutions for practical complications when updating – batch effects and the simultaneous arrival of multiple sources of novel information – are presented. The paper concludes with an example in which the precision matrix, and the gene–gene interaction network derived from it, of ten pathways are reconstructed from transcriptomic data of ER+ breast cancer samples from a series of five subsequently published data sets. Throughout, proofs are deferred to the appendix, while the supplementary material contains more elaborate versions of the proofs as well as additional results.

Estimation of precision matrices from multiple data sets has been considered in the literature. [6], [8], [29] deal with the simultaneous estimation of multiple precision matrices using so-called ‘fused’ ridge/lasso penalties, while, e.g., [25], [28] adopt a Bayesian approach. The methods of [15], [43] have a similar thrust, but these works are based on parameter communalities between the different precision matrices. On a different note but closely related is the work of [25], [26], [44], where the focus is on the reconstruction of multiple directed acyclic graphs in order to facilitate causal inference. Implicitly, these methods assume that the data used for the estimation of each precision matrix are on par, in the sense that no time order is present. This absence of an order motivates the use of a fused penalty that encourages the borrowing of information across data sets (should the data give rise to do so). In particular, this shrinks the different precision estimates towards each other, but not towards a target matrix that is formed from one of the other data sets. Moreover, this ‘across data set’ calibration is done once, instead of repetitively as in the updating context envisioned here.


Updating

When new information becomes available, the previously reconstructed network and GGM need updating. The latter may then serve as the target in the re-estimation of the network from the new data. This process may be iterated, thus giving rise to an updating scheme. To formalize this, assume an infinite sequence of studies, each producing data from the same normal law $\mathcal{N}(\mathbf{0}_p, \Omega^{-1})$ (variations on and deviations from this scenario are discussed at the end of this section and in Section 4).
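In code, the envisioned scheme amounts to feeding each estimate back in as the next target (a sketch reusing `ridge_precision` from the introduction; the per-update choice of $\lambda_k$, e.g., by cross-validation, is our simplifying assumption, not a prescription of the paper):

```python
def update_precision_sequence(datasets, T_init, lambdas):
    """Iterative updating: each arriving data set X_k (n_k x p, zero-mean)
    is summarized by its sample covariance S_k, and the previous precision
    estimate serves as the shrinkage target for the next one."""
    T = T_init
    estimates = []
    for X, lam in zip(datasets, lambdas):
        S = X.T @ X / X.shape[0]        # sample covariance of study k
        T = ridge_precision(S, T, lam)  # shrink towards previous estimate
        estimates.append(T)             # the update becomes the next target
    return estimates
```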

Why updating works

Intuition is provided as to why updating improves the knowledge of the sought-for network. In this and the next two sections the subscript k of the estimators is temporarily dropped to reduce notational clutter.

Practical complications

In practice, issues may arise that hamper updating or require downstream manipulation of the estimate. Two such anticipated complications are identified here and possible solutions are outlined.

Conclusion and discussion

We studied the problem of learning a Gaussian graphical model from an experiment that is carried out repeatedly. When data from the next experiment become available, the current estimate of the model parameter needs updating. We proposed to re-estimate the parameter by means of shrinkage estimators that strike a balance between the latest estimate of the parameter and the new experimental data. The resulting estimators yield, with high probability, an improvement in fit for subsequent experiments.

Proofs

Prior to the proof of Proposition 1 several prerequisites are stated.

Lemma 1

Assume $T_r^{-1} \neq S$. Then, $\|\hat{\Sigma}_r(\lambda) - S\|_F^2$ is strictly increasing in $\lambda$.

Proof

The proof proceeds by showing that the derivative of the squared norm with respect to $\lambda$ is strictly positive. Write the squared Frobenius norm as a trace and expand its argument: $$\|\hat{\Sigma}_r(\lambda) - S\|_F^2 = \mathrm{tr}[\hat{\Sigma}_r^2(\lambda)] - 2\,\mathrm{tr}[\hat{\Sigma}_r(\lambda)\, S] + \mathrm{tr}(S^2).$$ The derivative w.r.t. $\lambda$ of the right-hand side is: $$\frac{\mathrm{d}}{\mathrm{d}\lambda} \|\hat{\Sigma}_r(\lambda) - S\|_F^2 = 2\,\mathrm{tr}\Big\{ [\hat{\Sigma}_r(\lambda) - S]\, \frac{\mathrm{d}}{\mathrm{d}\lambda} \hat{\Sigma}_r(\lambda) \Big\},$$ where: $$\frac{\mathrm{d}}{\mathrm{d}\lambda} \hat{\Sigma}_r(\lambda) = \frac{1}{2} \Big[\lambda I_{p \times p} + \tfrac{1}{4}(S - \lambda T_r)^2\Big]^{-1/2} \hat{\Sigma}_r(\lambda) \big[\hat{\Sigma}_r^{-1} \ldots$$
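The monotonicity claimed by Lemma 1 is easy to probe numerically (an illustration of ours, not a substitute for the proof; it reuses `ridge_precision` from the sketch in the introduction):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 50
X = rng.standard_normal((n, p))
S = X.T @ X / n
T = np.eye(p)  # here T_r^{-1} = I differs from S almost surely

# ||Sigma_hat_r(lambda) - S||_F^2 on a grid of penalty values.
lams = np.linspace(0.01, 5.0, 200)
dists = [np.linalg.norm(np.linalg.inv(ridge_precision(S, T, lam)) - S, "fro") ** 2
         for lam in lams]
assert all(d1 < d2 for d1, d2 in zip(dists, dists[1:]))  # strictly increasing
```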

CRediT authorship contribution statement

Wessel N. van Wieringen: Conceptualization, Formal analysis, Methodology, Software, Writing - original draft. Koen A. Stam: Data curation. Carel F.W. Peeters: Conceptualization, Writing - review & editing. Mark A. van de Wiel: Writing - review & editing.

Acknowledgments

We thank the encouraging Editor and referees for their comments that led to this improved version of our work.

References (44)

  • Danaher, P., et al., The joint graphical lasso for inverse covariance estimation across multiple classes, J. R. Stat. Soc. Ser. B Stat. Methodol. (2014).

  • Davis, C., et al., The rotation of eigenvectors by a perturbation. III, SIAM J. Numer. Anal. (1970).

  • Drton, M., et al., Structure learning in graphical modeling, Annu. Rev. Stat. Appl. (2017).

  • Edgar, R., et al., Gene Expression Omnibus: NCBI gene expression and hybridization array data repository, Nucleic Acids Res. (2002).

  • Efron, B., Large-scale simultaneous hypothesis testing: the choice of a null hypothesis, J. Amer. Statist. Assoc. (2004).

  • Etemadi, N., Convergence of weighted averages of random variables revisited, Proc. Amer. Math. Soc. (2006).

  • Gray, H., Leday, G.G.R., Vallejos, C.A., Richardson, S., Shrinkage estimation of large covariance matrices using multiple...

  • Guo, J., et al., Joint estimation of multiple graphical models, Biometrika (2011).

  • Horn, A., et al., Matrix Analysis (2009).

  • Jamison, B., et al., Convergence of weighted averages of independent random variables, Z. Wahrscheinlichkeitstheor. Verwandte Geb. (1965).

  • Kolar, M., Liu, H., Marginal regression for multitask learning, in: Proceedings of the Fifteenth International Conference...

  • Koller, D., et al., Probabilistic Graphical Models: Principles and Techniques (2009).