
# Graph reconstruction using covariance-based methods

- Nurgazy Sulaimanov^{1, 2}
- Heinz Koeppl^{1, 2}

**2016**:19

https://doi.org/10.1186/s13637-016-0052-y

© The Author(s) 2016

**Received:** 27 March 2016 **Accepted:** 21 October 2016 **Published:** 23 November 2016

## Abstract

Methods based on correlation and partial correlation are widely employed today to reconstruct statistical interaction graphs from high-throughput omics data. These dedicated methods work well even when the number of variables exceeds the number of samples. In this study, we investigate how the graphs extracted from covariance and concentration matrix estimates are related, using Neumann series and transitive closure and discussing concrete small examples. Considering the ideal case where the true graph is available, we also compare correlation and partial correlation methods on large realistic graphs. In particular, we perform the comparisons both with parameters selected optimally based on the true underlying graph and with data-driven approaches in which the parameters are estimated directly from the data.

## Keywords

- High-dimensional graph reconstruction methods
- Concentration and covariance graphs

## 1 Introduction

Inference of biological networks, including gene regulatory, metabolic, and protein-protein interaction networks, has received much attention recently. With the development of high-throughput technologies, it became possible to measure a large number of genes and proteins at once, which raised the challenge of inferring large-scale gene regulatory and protein-protein interaction networks from high-dimensional data [1, 2]. To address this challenge, a wide range of network inference methods has been developed, such as methods based on correlation or concentration matrices, mutual information, Bayesian networks, ordinary differential equations (ODEs), and Boolean logic [3, 4]. In addition, high-throughput experiments remain costly, and therefore experiments are usually carried out with many more genes or proteins than samples. Traditional statistical methods are usually ill-posed in this small *n*, large *p* scenario, and novel methods from high-dimensional statistics that assume further structure, such as sparsity, are a good choice for graph reconstruction in this setting [5]. Correlation methods based on covariance matrix estimation are widely used to reconstruct gene co-expression and module graphs, especially in large-scale biomedical applications [6–8]. However, the edges of the interaction graph resulting from correlation methods include indirect dependencies due to the transitive nature of interactions. The effect of indirect edges becomes more pronounced as the graph size grows, which leads to inaccurate graph reconstruction. In contrast, methods based on the concentration or partial correlation matrix allow one to infer only direct dependencies between variables. In this respect, one can distinguish two graph types resulting from correlation- and partial correlation-based methods, which we will call covariance and concentration graphs in the following, respectively.
Despite the fact that the covariance graph includes indirect dependencies, it is widely used in applications to represent sparse biological graphs, either by simple hard-thresholding [6] or by estimating the covariance matrix with shrinkage methods [9].

The aim of this paper is to shed light on the relation between covariance and concentration graphs and on how this relation can be exploited to study the performance of correlation- and partial correlation-based methods. In this manuscript, we provide a practical guide for researchers using correlation and partial correlation methods, and we believe that understanding these two concepts allows for a better choice of methods for graph reconstruction from high-throughput biological data.

In particular, we use simple examples to discuss scenarios in which indirect dependencies in the covariance graph can be eliminated by hard-thresholding and scenarios in which they cannot. Furthermore, we review recent methods that address the problem of direct and indirect dependencies in reconstructed graphs [10, 11] and provide new insights into those methods, both analytically and numerically. Moreover, we perform an in silico comparison of two correlation-based and three partial correlation-based methods on different graph topologies in the high-dimensional setting, where the number of variables *p* exceeds the sample size *n*. The selected methods are popular approaches that are widely used to reconstruct large-scale gene regulatory and protein-protein interaction graphs. The first correlation method is based on the sample covariance matrix, whose entries are hard-thresholded to eliminate indirect edges in the covariance graph [12]. The second method estimates a sparse version of the covariance matrix via a shrinkage approach [9]. The partial correlation methods that we consider are the nodewise regression method [13], where partial correlations are computed via linear regression; the graphical Lasso method [14], which reconstructs a concentration graph by directly solving for a sparse version of the concentration matrix; and an adaptive version of nodewise regression, which determines the concentration graph in a two-stage procedure.

## 2 Notation and preliminaries

Let *X* be a *p*-dimensional multivariate normally distributed random vector, and consider *n* i.i.d. observations of *X*, which are given in terms of the *n*×*p* matrix \(\mathbf{X}=(\mathbf{X}_{1},\ldots,\mathbf{X}_{p})\), where \(\mathbf{X}_{i}\) is an *n*×1 vector with *i*=1,…,*p*. Then, the sample covariance matrix reads

Reconstructed and true graphs are written in terms of an undirected graph *G*=(*Γ*,*E*), with *Γ*={1,…,*p*} the set of variables or nodes and *E*⊆*Γ*×*Γ* the set of edges. Sometimes, we will also deal with weighted graphs, where we extend *G* to contain a weight function \(w\,: E \rightarrow \mathbb {R}\), such that \(w_{ij}\) denotes the weight of the edge (*i*,*j*)∈*E*. In this paper, we will consider two types of graphs.

Zero entries of the covariance matrix, \(\Sigma_{ij}=0\), indicate that the nodes *i* and *j* are independent [15]. More generally, in terms of probability distributions, we have

We accordingly denote the covariance graph as \(\tilde {G}=(\Gamma,\tilde {E})\). There is an edge between any two nodes *i* and *j* if \(\Sigma_{ij}\neq 0\) and no edge if \(\Sigma_{ij}=0\). This type of graph is popular in genomics (for more information, see [16]).

The concentration matrix is defined as \(\boldsymbol{\Theta}=\boldsymbol{\Sigma}^{-1}\), and zero entries of the concentration matrix, \(\Theta_{ij}=0\), indicate that the nodes *i* and *j* are conditionally independent given the other nodes. In terms of probability distributions, for arbitrary \(k \in \mathcal {N}, k \neq i, j\), this means

The concentration matrix is related to the partial correlation coefficients \(\rho_{ij}\) through the relation

\[ \rho_{ij} = -\frac{\Theta_{ij}}{\sqrt{\Theta_{ii}\Theta_{jj}}} \]

for *i*≠*j*, and \(\rho_{ij}=1\) for *i*=*j*. There is an edge in the concentration graph between nodes *i* and *j* if \(\rho_{ij}\neq 0\) and no edge if \(\rho_{ij}=0\) (equivalently for \(\Theta_{ij}\)). Hence, the concentration graph is equivalent in topology to the graph defining the probabilistic graphical model in the Gaussian case and coincides with the graph defining the associated Gaussian Markov random field. Throughout this paper, we assume that the true interaction graph corresponds to the concentration graph and therefore refer to it as *G*=(*Γ*,*E*).
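The relation between \(\boldsymbol{\Theta}\) and the partial correlations is easy to check numerically. The following is a minimal NumPy sketch; the function names, the numerical tolerance, and the example concentration matrix of a three-node chain are illustrative choices, not taken from the paper.

```python
import numpy as np

def partial_correlations(theta: np.ndarray) -> np.ndarray:
    """rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj) for i != j, and rho_ii = 1."""
    d = np.sqrt(np.diag(theta))
    rho = -theta / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

def concentration_graph(theta: np.ndarray, tol: float = 1e-10) -> set:
    """Edge set {(i, j) : i < j, |rho_ij| > tol} of the concentration graph."""
    rho = partial_correlations(theta)
    p = theta.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p) if abs(rho[i, j]) > tol}

# Illustrative concentration matrix of a 3-node chain 0 - 1 - 2 (positive definite)
theta = np.array([[2.0, -0.8, 0.0],
                  [-0.8, 2.0, -0.8],
                  [0.0, -0.8, 2.0]])
print(partial_correlations(theta))
print(sorted(concentration_graph(theta)))   # [(0, 1), (1, 2)]
```

Because a zero in \(\boldsymbol{\Theta}\) maps to a zero partial correlation, the edge set can be read off either matrix; the tolerance only guards against floating-point noise in estimated matrices.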

In the following, we give a definition of direct and indirect edges in the covariance graph which will be convenient throughout the paper.

### Definition 1

We denote the sets of direct and indirect edges in the covariance graph \(\tilde {G}\) as \(\tilde {E}'\) and \(\tilde {E}''\), respectively, with \(\tilde {E}=\tilde {E}' \cup \tilde {E}''\). The set of direct edges is then defined as \(\tilde {E}'=E\), whereas the set of indirect edges is defined as \(\tilde {E}''=\tilde {E} \setminus E\).
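Definition 1 can be illustrated on a three-node chain: inverting \(\boldsymbol{\Theta}\) fills in the entry between the two end nodes, which are connected only through the middle node. This is a sketch under the assumption that the true graph is the concentration graph of the illustrative \(\boldsymbol{\Theta}\) below; the helper names are hypothetical.

```python
import numpy as np

def edge_set(m: np.ndarray, tol: float = 1e-10) -> set:
    """Undirected edges {(i, j) : i < j, |m_ij| > tol} of a symmetric matrix."""
    p = m.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p) if abs(m[i, j]) > tol}

def split_covariance_edges(theta: np.ndarray):
    """Split the covariance-graph edges into direct and indirect ones (Definition 1)."""
    sigma = np.linalg.inv(theta)      # covariance matrix Sigma = Theta^{-1}
    e_true = edge_set(theta)          # E: true graph = concentration graph
    e_cov = edge_set(sigma)           # E~: covariance graph
    direct = e_cov & e_true           # E~' (equals E unless an entry of Sigma cancels to zero)
    indirect = e_cov - e_true         # E~'' = E~ minus E
    return direct, indirect

theta = np.array([[2.0, -0.8, 0.0],
                  [-0.8, 2.0, -0.8],
                  [0.0, -0.8, 2.0]])
direct, indirect = split_covariance_edges(theta)
print(sorted(direct), sorted(indirect))   # [(0, 1), (1, 2)] [(0, 2)]
```

The indirect edge (0, 2) is exactly the kind of transitive dependency that hard-thresholding of the covariance matrix tries to remove.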

## 3 How are covariance and concentration graphs related?

## 4 Methods


### 4.1 Correlation-based methods

#### 4.1.1 Hard-thresholding of sample covariance matrix

However, selecting the threshold is hard to tackle analytically. Recently, some methods have been developed to choose the threshold from the data [19, 23, 24]. However, these methods have been designed for the case *p*<*n* and do not perform well in the *p*>*n* setting.

*p*>*n*. In the following, we briefly review this method. Scale-free graphs are characterized by a power-law degree distribution

\[ P(k) = b\,k^{-\gamma}, \]

where *k* is the node degree, *γ* is the degree exponent, and *b* is the normalization constant [26, 27]. Some biological graphs have been reported to exhibit power-law degree distributions with 2<*γ*<3 [27].

Assume a sample covariance matrix S defined as in (2). We further define the thresholding operation \(T_{d}(S_{ij})\), which yields the sample covariance matrix elements thresholded at *d*. To choose the threshold *d*, we fit an affine function \(f(k) = -\hat {\gamma }k + \hat {b}\) in the log domain to the empirical degree distribution of the graph obtained by thresholding at *d*, and compute the \(R^{2}\) value of the fit (\(0<R^{2}<1\)) (Fig. 3 (left)). In addition, we compute the mean degree \(\bar {k}=p^{-1}\sum _{i=1}^{p}\tilde {k}_{i}\), where \(\tilde {k}_{i}=\sum _{j=1}^{p}T_{d}(S_{ij})\) (Fig. 3 (right)). In particular, we are interested in high \(R^{2}\) values and, for sparsity, low mean degree values \(\bar {k}\). We also require \(\hat {\gamma } > 0\), so that the slope of the fitted linear function is negative. High \(R^{2}\), low mean degree \(\bar {k}\), and \(\hat {\gamma } > 0\) give rise to sparse graphs in which a few nodes have many more connections than the other nodes. This indicates that the graph obtained from \(T_{d}(S)\) is approximately scale-free. So far, we have introduced sparse covariance estimation by hard-thresholding, which is performed after estimating the sample covariance matrix. In the following section, we discuss a direct estimation of the sparse covariance matrix that involves no hard-thresholding.
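The scale-free criterion above can be sketched in a few lines of NumPy. Several details are assumptions rather than the paper's exact procedure: the input is a sample *correlation* matrix (so thresholds lie in [0,1]), \(T_d\) keeps an edge when \(|S_{ij}|>d\), self-loops are ignored, and the empirical \(P(k)\) is fitted on log10 scales over the observed degrees \(k\ge 1\).

```python
import numpy as np

def scale_free_fit(s: np.ndarray, d: float):
    """Scale-free criterion for a single threshold d (illustrative sketch).

    Returns (R^2, gamma_hat, mean degree); R^2 is NaN when a line fit is not meaningful.
    """
    adj = (np.abs(s) > d).astype(int)          # assumed form of T_d: keep |s_ij| > d
    np.fill_diagonal(adj, 0)                   # ignore self-loops
    degrees = adj.sum(axis=1)
    mean_degree = float(degrees.mean())
    ks, counts = np.unique(degrees[degrees > 0], return_counts=True)
    if ks.size < 3:                            # too few distinct degrees to fit a line
        return np.nan, np.nan, mean_degree
    x = np.log10(ks)
    y = np.log10(counts / degrees.size)        # empirical P(k)
    slope, intercept = np.polyfit(x, y, 1)     # f = -gamma_hat * x + b_hat in the log domain
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    if ss_tot == 0:
        return np.nan, -slope, mean_degree
    return 1.0 - ss_res / ss_tot, -slope, mean_degree

# Scan thresholds on simulated p > n data (n = 30 samples, p = 50 independent variables)
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 50))
S = np.corrcoef(X, rowvar=False)
for d in np.arange(0.3, 0.8, 0.1):
    r2, gamma_hat, kbar = scale_free_fit(S, d)
    print(f"d={d:.1f}  R2={r2:.2f}  gamma_hat={gamma_hat:.2f}  mean degree={kbar:.2f}")
```

In practice one would pick the smallest *d* at which \(R^2\) is high, \(\hat\gamma>0\), and \(\bar k\) is low, mirroring the selection rule described in the text.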

#### 4.1.2 Covariance Lasso

*Covariance Lasso*. In contrast to the hard-thresholding introduced in the previous section, sparsity in the covariance matrix is achieved here by minimizing a penalized negative log-likelihood function of the form

Here, \(\lambda_{\text{cov}}\) is the penalty parameter, which induces sparsity in the off-diagonal elements of Σ, P is a matrix with nonnegative elements, and ∘ denotes elementwise multiplication. P can be chosen as the matrix of ones or, to avoid shrinking the diagonal elements of Σ, as a matrix of ones with zeros on the diagonal. The objective function given in (31) is nonconvex because of the concave term log detΣ and has several local minima, which makes the optimization problem difficult. Since the objective function contains convex and concave terms, a majorization-minimization approach is used to solve the problem. This approach has previously been applied successfully to similar problems [28, 29]. The concave part of the objective function (31) is approximated by its tangent at \(\boldsymbol{\Sigma}_{0}\), where \(\boldsymbol{\Sigma}_{0}=\boldsymbol{S}\) or \(\boldsymbol{\Sigma}_{0}=\operatorname{diag}(\boldsymbol{S})\) and \(\boldsymbol {\Theta }_{0}=\boldsymbol {\Sigma }_{0}^{-1}\). So one needs to estimate the covariance matrix by

In the case *p*>*n*, the sample covariance matrix S is rank-deficient; to avoid this, one replaces S with S+*s*I for some small regularizing parameter *s*>0.
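The rank deficiency for *p*>*n*, and why adding *s*I repairs it, can be seen directly. A minimal NumPy sketch; the value *s*=0.01 and the simulated data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 50                          # more variables than samples
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)            # p x p sample covariance, 1/(n-1) normalization
print(np.linalg.matrix_rank(S))        # at most n - 1 = 29 < p, so S is singular

s = 1e-2                               # small regularizing constant (illustrative value)
S_reg = S + s * np.eye(p)              # every eigenvalue is shifted up by s
np.linalg.cholesky(S_reg)              # succeeds: S_reg is positive definite
theta0 = np.linalg.inv(S_reg)          # now usable as a starting point Theta_0 = Sigma_0^{-1}
```

Centering the data removes one more degree of freedom, which is why the rank is bounded by *n*−1 rather than *n*.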

The penalty parameter \(\lambda_{\text{cov}}\) should be determined from the data, and *K*-fold cross-validation is used for this purpose. First, the samples (1,…,*n*), which correspond to the rows of the design matrix **X**, are partitioned into *K* subsets, which are used as training and validation sets. The covariance matrix is then estimated as in (34) using the training set; we denote this estimate as \(\boldsymbol {\hat {\Sigma }}_{T}\). The validation set is used to compute the sample covariance matrix, which we denote as \(\boldsymbol{S}_{V}\). The penalty parameter is then computed via

where \(L(\boldsymbol {\hat {\Sigma }}_{T}|\boldsymbol {S}_{V})\) is defined in (31).
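The *K*-fold loop itself is independent of the estimator. The sketch below does not implement the covariance Lasso solver; as a clearly labeled stand-in it uses linear shrinkage of S toward diag(S), and it scores each fold with \(\log\det\hat\Sigma_T + \operatorname{tr}(\hat\Sigma_T^{-1}S_V)\), the negative Gaussian log-likelihood up to constants (so minimizing it maximizes the likelihood). Whether (31) includes the penalty term in the validation score is not stated here; this sketch assumes it does not. All function names are hypothetical.

```python
import numpy as np

def gaussian_nll(sigma: np.ndarray, s_val: np.ndarray) -> float:
    """log det(Sigma) + tr(Sigma^{-1} S_V): negative Gaussian log-likelihood up to constants."""
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        return np.inf
    return logdet + np.trace(np.linalg.solve(sigma, s_val))

def shrink_to_diagonal(s: np.ndarray, lam: float) -> np.ndarray:
    """Hypothetical stand-in estimator (NOT the covariance Lasso):
    linear shrinkage of S toward diag(S) with weight lam in (0, 1]."""
    return (1.0 - lam) * s + lam * np.diag(np.diag(s))

def kfold_select(X, lambdas, fit=shrink_to_diagonal, K=5, seed=0):
    """Return the lambda with the smallest average validation score, plus all scores."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), K)
    scores = np.zeros(len(lambdas))
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), val_idx)
        s_train = np.cov(X[train_idx], rowvar=False)   # used to fit Sigma_hat_T
        s_val = np.cov(X[val_idx], rowvar=False)       # S_V
        for j, lam in enumerate(lambdas):
            scores[j] += gaussian_nll(fit(s_train, lam), s_val) / K
    return lambdas[int(np.argmin(scores))], scores

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 50))                      # n = 30, p = 50
lambdas = np.linspace(0.1, 1.0, 10)
best_lam, cv_scores = kfold_select(X, lambdas, K=5)
```

Swapping in a real covariance Lasso solver only requires passing a different `fit(s_train, lam)` that returns a positive definite estimate.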

### 4.2 Partial correlation-based methods

#### 4.2.1 Nodewise regression Lasso

We consider \(\mathbf{X}_{i}\), *i*∈*Γ*, to be a response variable and \(\mathbf{X}^{\setminus i}\) to be the matrix of predictor variables consisting of the remaining *p*−1 variables. To obtain an estimate for node *i*∈*Γ*, one regresses this node on the remaining nodes *j*∈*Γ*∖{*i*} and obtains a linear model of the form

\[ \mathbf{X}_{i} = \mathbf{X}^{\setminus i}\boldsymbol{\beta}^{i} + \boldsymbol{\epsilon}_{i}, \]

where \(\boldsymbol{\beta}^{i}\) is the vector of *p*−1 regression coefficients associated with node *i* and \(\mathbb {E}[\boldsymbol {\epsilon }_{i}]=\mathbf {0}\). Denoting an element of the vector \(\boldsymbol{\beta}^{i}\) as the regression coefficient \({\beta ^{i}_{j}}\), with *j*∈*Γ*∖{*i*}, this coefficient can be related to the concentration matrix as

\[ \beta^{i}_{j} = -\frac{\Theta_{ij}}{\Theta_{ii}}. \]

where \(\lambda_{L}>0\) denotes the penalty parameter. To estimate the whole graph, this procedure is applied to all nodes, regressing each node on the remaining nodes. Nodewise regression Lasso returns sparse estimates that are not symmetric: there are two different estimates for each edge between any two nodes, obtained from two different regression problems. To decide on the absence or presence of the corresponding edge in the concentration graph, AND and OR rules are proposed in [13]; i.e., an edge (*i*,*j*) is present if both \(\hat {\beta }^{i}_{j}\) and \(\hat {\beta }^{j}_{i}\) (AND) or at least one of them (OR) are non-zero.
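A minimal sketch of nodewise regression with the AND/OR rules, using scikit-learn's `Lasso`. Note that scikit-learn minimizes \(\frac{1}{2n}\|y-Xw\|_2^2+\alpha\|w\|_1\), so `alpha` corresponds to \(\lambda_L\) only up to that scaling convention; the standardization step and the chain-structured example are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_lasso(X: np.ndarray, alpha: float, rule: str = "and") -> np.ndarray:
    """Regress each column on all others with the Lasso and combine the
    two estimates of every edge with the AND or OR rule."""
    n, p = X.shape
    X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize columns
    nonzero = np.zeros((p, p), dtype=bool)         # nonzero[i, j]: beta^i_j != 0
    for i in range(p):
        others = np.delete(np.arange(p), i)
        model = Lasso(alpha=alpha, fit_intercept=False).fit(X[:, others], X[:, i])
        nonzero[i, others] = model.coef_ != 0
    if rule == "and":
        return nonzero & nonzero.T
    if rule == "or":
        return nonzero | nonzero.T
    raise ValueError("rule must be 'and' or 'or'")

# Data from a chain-structured concentration matrix (illustrative values)
rng = np.random.default_rng(0)
p = 10
theta = np.eye(p) + np.diag(np.full(p - 1, -0.4), 1) + np.diag(np.full(p - 1, -0.4), -1)
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(theta), size=200)
adj = nodewise_lasso(X, alpha=0.1, rule="and")
```

The AND rule is the more conservative choice: every edge it returns is also returned by the OR rule.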

#### 4.2.2 Graphical Lasso

where \(\lambda_{G}\) is the parameter that controls the size of the penalty. The resulting optimization problem is convex and can be solved by the block coordinate descent method proposed in [31]. The estimated concentration matrix is symmetric, so no additional AND or OR operations are needed.
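For comparison with the nodewise approach, the graphical Lasso is available in scikit-learn as `sklearn.covariance.graphical_lasso`, which returns the covariance and precision (concentration) estimates. This is a sketch: scikit-learn penalizes only the off-diagonal entries of the precision matrix, which may differ in detail from (40), and the data are simulated from an illustrative chain graph.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

rng = np.random.default_rng(0)
p = 10
# Chain-structured true concentration matrix (illustrative values)
theta_true = np.eye(p) + np.diag(np.full(p - 1, -0.4), 1) + np.diag(np.full(p - 1, -0.4), -1)
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(theta_true), size=200)

emp_cov = np.cov(X, rowvar=False)
cov_hat, theta_hat = graphical_lasso(emp_cov, alpha=0.1)   # alpha plays the role of lambda_G
adj = np.abs(theta_hat) > 1e-8                            # estimated concentration graph
np.fill_diagonal(adj, False)
print(np.allclose(theta_hat, theta_hat.T))                 # symmetric: no AND/OR step needed
```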

#### 4.2.3 Adaptive Lasso

\(\lambda_{L}\) in (39) and \(\lambda_{G}\) in (40) are chosen by cross-validation. However, a cross-validated choice of these penalty parameters does not lead to consistent model selection and tends to overestimate the graph [5, 13]. Therefore, it has been suggested to apply cross-validation with the adaptive Lasso (an adaptive version of nodewise regression), which gives a sparser solution than cross-validation with nodewise regression or the graphical Lasso. When the underlying graph is not known, it is challenging to determine a good Lasso penalty from the data. One study showed that it is possible to assign different weights to different coefficients, thereby allowing the coefficients to be penalized unequally in the \(L_{1}\) penalty [22]. This is achieved by the following estimator:

where \(\tilde {\boldsymbol {\beta }}^{i}\) are the initial estimates from (39), which are used as weights. It is suggested to estimate \(\tilde {\beta }^{i}\) with a penalty parameter computed through cross-validation and, in the second step, to select the penalty parameter of the adaptive Lasso again by cross-validation. The adaptive Lasso has the property that if an initial estimate \(\tilde {\beta }^{i}_{j}=0\), then the final estimate resulting from the adaptive Lasso is also \(\hat {\beta }^{i}_{j}=0\). If an initial estimate \(\tilde {\beta }^{i}_{j}\) is large, the adaptive Lasso applies a small penalty to it, and vice versa. In this way, the adaptive Lasso reduces the number of false positives from the first step and yields a sparse solution.
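The two-stage procedure can be sketched with scikit-learn's `LassoCV`. Assumptions: the weights are \(1/|\tilde\beta^i_j|\) (the simplest choice in [22]), and the weighted penalty is implemented by the standard column-rescaling trick, which makes \(\tilde\beta^i_j=0\) force \(\hat\beta^i_j=0\). Function names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def adaptive_lasso_node(X: np.ndarray, i: int, cv: int = 5) -> np.ndarray:
    """Two-stage adaptive Lasso for node i (illustrative sketch).

    Stage 1: Lasso with a cross-validated penalty gives beta_tilde.
    Stage 2: Lasso with penalty weights 1/|beta_tilde_j| (penalty again chosen by CV),
    implemented by rescaling the columns of the active predictors.
    """
    p = X.shape[1]
    others = np.delete(np.arange(p), i)
    Z, y = X[:, others], X[:, i]
    beta_tilde = LassoCV(cv=cv, fit_intercept=False).fit(Z, y).coef_
    beta_hat = np.zeros(p - 1)
    active = np.flatnonzero(beta_tilde)          # beta_tilde_j == 0 => beta_hat_j == 0
    if active.size == 0:
        return beta_hat
    scale = np.abs(beta_tilde[active])           # large initial estimates -> small penalty
    stage2 = LassoCV(cv=cv, fit_intercept=False).fit(Z[:, active] * scale, y)
    beta_hat[active] = stage2.coef_ * scale      # map back to the original scale
    return beta_hat

def adaptive_nodewise(X: np.ndarray, rule: str = "and") -> np.ndarray:
    """Apply the adaptive Lasso to every node and combine edges with AND or OR."""
    p = X.shape[1]
    nonzero = np.zeros((p, p), dtype=bool)
    for i in range(p):
        others = np.delete(np.arange(p), i)
        nonzero[i, others] = adaptive_lasso_node(X, i) != 0
    return nonzero & nonzero.T if rule == "and" else nonzero | nonzero.T
```

Rescaling column *j* by \(|\tilde\beta_j|\) and multiplying the fitted coefficient back is algebraically equivalent to penalizing \(|\beta_j|/|\tilde\beta_j|\), so any plain Lasso solver can be reused.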

## 5 Comparison of correlation- and partial correlation-based methods

### 5.1 Generating synthetic data from different graph topologies

*p* and are generated from adjacency matrices of size *p*×*p*.

1. *Chain graph*. The graph corresponds to a tridiagonal adjacency matrix in which each row and column contains one or two non-zero entries, i.e., the graph has maximum degree 2. The graph has *p*−1 edges.
2. *Cluster graph*. The rows/columns of the adjacency matrix are evenly partitioned into *l* disjoint submatrices, which we denote as \(U_{i}\), *i*=1,…,*l*. Since they are disjoint, we can write \(U_{1}\cup U_{2}\cup \cdots \cup U_{l}=\{1,\ldots,p\}\), and the corresponding graph contains on average *p*(*p*/*l*−1)*P*/2 edges, where *P* is the probability of an edge between any two nodes in a subgraph. If *P*=1, the disjoint subgraphs are fully connected; decreasing *P* generates sparse subgraphs.
3. *Scale-free graph* (Barabási-Albert model) [26, 27]. The degree distribution of the graph follows a power law (30). Graph generation is based on preferential attachment and starts with \(m_{0}\) nodes. New nodes, each with \(m\le m_{0}\) edges, are added to the existing nodes in the graph; a new node is attached to an existing node *i* with a probability that depends on its degree \(k_{i}\), \(P(k_{i}) = k_{i}/\sum _{j}k_{j}\). The graph contains *p*−1 edges.
4. *Hub graph*. The rows/columns of the adjacency matrix are evenly partitioned into *l* disjoint groups as in the cluster graph, \(U_{1}\cup U_{2}\cup \cdots \cup U_{l}=\{1,\ldots,p\}\). In each disjoint subgraph, a hub node has more connections to other nodes, whereas the other nodes have only one connection. Since the partitioning is even, every subgraph contains the same number of nodes and edges.

All graphs are generated using R package *huge* [32].
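For readers working outside R, the four topologies can be approximated in NumPy. This is a hypothetical sketch of the descriptions in the list above, not the *huge* implementation; the scale-free generator assumes *m*=1 new edge per node (which gives *p*−1 edges), each hub is connected to all other nodes of its group, and the parameter values are illustrative, not necessarily those used in the paper.

```python
import numpy as np

def chain_graph(p):
    """Tridiagonal adjacency: p - 1 edges, maximum degree 2."""
    a = np.zeros((p, p), dtype=int)
    idx = np.arange(p - 1)
    a[idx, idx + 1] = a[idx + 1, idx] = 1
    return a

def cluster_graph(p, l, prob, rng):
    """l disjoint groups; each within-group pair is linked with probability prob."""
    a = np.zeros((p, p), dtype=int)
    for block in np.array_split(np.arange(p), l):
        for u in range(len(block)):
            for v in range(u + 1, len(block)):
                if rng.random() < prob:
                    a[block[u], block[v]] = a[block[v], block[u]] = 1
    return a

def hub_graph(p, l):
    """l disjoint groups; the first node of each group is a hub linked to all others."""
    a = np.zeros((p, p), dtype=int)
    for block in np.array_split(np.arange(p), l):
        hub, rest = block[0], block[1:]
        a[hub, rest] = a[rest, hub] = 1
    return a

def scale_free_graph(p, rng):
    """Barabasi-Albert preferential attachment with m = 1 edge per new node."""
    a = np.zeros((p, p), dtype=int)
    a[0, 1] = a[1, 0] = 1
    for new in range(2, p):
        deg = a[:new, :new].sum(axis=1)
        target = rng.choice(new, p=deg / deg.sum())   # P(k_i) = k_i / sum_j k_j
        a[new, target] = a[target, new] = 1
    return a

rng = np.random.default_rng(0)
graphs = {"chain": chain_graph(50), "cluster": cluster_graph(50, 10, 0.3, rng),
          "scale-free": scale_free_graph(50, rng), "hub": hub_graph(50, 10)}
for name, a in graphs.items():
    print(name, a.sum() // 2, "edges, max degree", a.sum(axis=1).max())
```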

### 5.2 Comparison of methods based on optimal predictions

*p*=50 and generate the dataset with sample size *n*=30. To account for uncertainty in the data generation, we resample the data 100 times and perform the graph reconstruction on 100 datasets, each with *p*=50 variables. This allows us to assess the performance of the methods in the presence of noise. For illustration, we plot the number of correctly predicted edges against the total number of predicted edges (Fig. 6 (left)). In addition to the methods, we include predictions by random guessing as a quality control in our study. To assess the quality of the predictions produced by the different methods, we compute the Euclidean distances from individual edge predictions to the true edges as

where \(T_{R}\) denotes the true edges in the true graph, and \(C_{\text{pred}}\) and \(T_{\text{pred}}\) represent the correctly predicted and total predicted edges, respectively. We then compute the cumulative distribution of \(d_{E}\) (Fig. 6 (middle)).
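The formula for \(d_E\) is not reproduced in this text. Given the axes of Fig. 6 (left), one plausible reading, which is an assumption here, is the Euclidean distance of a prediction \((T_{\text{pred}}, C_{\text{pred}})\) to the ideal point \((T_R, T_R)\), where every predicted edge is correct and all true edges are found. Under that assumption, the distance and its empirical cumulative distribution can be computed as follows (the counts are hypothetical).

```python
import numpy as np

def distance_to_ideal(c_pred, t_pred, t_r):
    """ASSUMED form of d_E: Euclidean distance from (t_pred, c_pred) to (t_r, t_r)."""
    c_pred = np.asarray(c_pred, dtype=float)
    t_pred = np.asarray(t_pred, dtype=float)
    return np.sqrt((t_pred - t_r) ** 2 + (c_pred - t_r) ** 2)

def empirical_cdf(d):
    """Sorted distances and the fraction of runs with distance <= each value."""
    x = np.sort(np.asarray(d, dtype=float))
    return x, np.arange(1, x.size + 1) / x.size

# Hypothetical counts from three resampled datasets on a graph with T_R = 49 true edges
d_e = distance_to_ideal(c_pred=[49, 40, 35], t_pred=[49, 50, 60], t_r=49)
x, F = empirical_cdf(d_e)
```

A method whose cumulative distribution rises faster has more runs close to the ideal point.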

*E*=49 edges, which is regarded as the simplest (Fig. 6 (first top panel)). The other methods predict about 35 to 40 edges correctly, whereas the nodewise regression Lasso produces almost perfect predictions. On the scale-free graph, the nodewise regression Lasso performs best among the four methods: it recovers more than half of the true edges, whereas the three remaining methods recover less than half. These three methods predict a similar number of edges, of which 10 to 20 are correct. The ROC curves show that initially all three methods perform similarly, but later the graphical Lasso starts outperforming the thresholded sample covariance and the covariance Lasso. Since the scale-free graph contains more highly connected nodes (maximum degree \(k_{\max}=13\)) than the other graphs, the prediction accuracy of all methods decreases in comparison to the chain and cluster graphs and approaches that of random guessing. For the cluster graph, we set the probability of an edge between any two nodes to *P*=0.3, so that the resulting graph contains as few hub nodes as possible (\(k_{\max}=4\)). The nodewise regression Lasso predicts on average 40 true edges out of 70, whereas the other methods predict 30. In the case of the hub graph, where we have 10 disjoint subgraphs with 10 hub nodes, the nodewise regression Lasso again gives the best predictions, with about 40 true edges out of 50. In contrast, the remaining three methods predict only about half of all true edges. We observe that the thresholded covariance, the covariance Lasso, and the graphical Lasso predict a similar number of true edges in all four graphs, whereas the nodewise regression Lasso performs best in all four graphs. Our comparison metrics are based on the control of false positive edges, and a similar observation was published earlier by Peng et al. [33], who showed that the nodewise regression Lasso performs better than the graphical Lasso when controlling the false discovery rate.

## 6 Comparison of methods when the underlying graph is not known

In this section, we discuss how the methods perform when the underlying graph is not given. This is the typical case in applications, where the underlying graph is unknown and the challenge is to infer it from the data. We therefore discuss available methods for selecting the optimal threshold for the sample covariance matrix and the optimal regularization for the covariance Lasso and adaptive Lasso methods. Because a cross-validated choice of the penalty parameter in the nodewise regression and graphical Lasso methods leads to overestimation, we instead select the penalty of the adaptive Lasso by cross-validation, which gives sparser solutions than the former methods. We introduced these methods in previous sections and now discuss how they perform in practice. For comparison, we use the same settings as before: *p*=50 and *n*=30.

### 6.1 Scale-free criteria-based thresholding of sample covariance matrix

We compute \(R^{2}\) values and mean degree values \(\bar {k}\) for various thresholds selected uniformly from [0,1]. As a reference, we also compute the \(R^{2}\) value (green line) and the mean degree value \(\bar {k}\) (blue line) of the true graph. As illustrated in Fig. 7 a, high \(R^{2}\) values, comparable to that of the true graph (green line), are achieved for thresholds above 0.5. The corresponding mean degree values for thresholds above 0.5 are also close to that of the true graph (blue line). To assess how well the threshold is selected, we further perform hard-thresholding on the true covariance matrix and compute \(R^{2}\) and mean degree values (Fig. 7 b). Since the graph of the true covariance matrix is fully connected, it yields low \(R^{2}\) and high mean degree values without thresholding. High \(R^{2}\) values are achieved for thresholds above 0.5, as in the scale-free selection case (Fig. 7 a), and mean degree values close to the true mean degree are attained at approximately the same threshold. In practical applications, when inferring a gene co-expression graph from microarray data, it is usually suggested to select a threshold with high \(R^{2}\) and low mean degree values. In particular, in a high-dimensional case with thousands of genes, both metrics saturate at high \(R^{2}\) and low mean degree values. Although there is no saturation effect in our case, it is possible to select a threshold of 0.6, for which the \(R^{2}\) value is high and the mean degree value is low. We then perform simulations with this threshold and compute the number of true edges in the thresholded graph (Fig. 7 c). As the plot indicates, the selected threshold is nearly optimal, giving predictions close to the optimal ones. However, even the predictions at the best threshold are almost as poor as random guessing. Note that, in our simulations, this method worked well when the sample size is larger than the number of variables (*p*<*n*). Since we only consider the *p*>*n* case in this study, these results are not shown.

In principle, high \(R^{2}\) values can be achieved only for scale-free graphs, so the criterion is not applicable to other graph types. We also verified that high \(R^{2}\) values cannot be attained with the other graph types used in our study (results not shown).

### 6.2 Cross-validation with covariance Lasso

To determine \(\lambda_{\text{cov}}\) from the data, we use a cross-validation procedure. We perform fivefold cross-validation and select the penalty parameter that maximizes the log-likelihood function in (31). Figure 8 depicts the computed likelihood values for penalty parameters selected from the range \(\lambda_{\text{cov}}\in[0,7]\). The results show that the likelihood maxima for all graphs lie in a narrow range of the penalty parameter. For the chain and cluster graphs, the maxima are attained between \(\lambda_{\text{cov}}=3\) and \(\lambda_{\text{cov}}=5\), whereas for the scale-free and hub graphs, they lie between \(\lambda_{\text{cov}}=4\) and \(\lambda_{\text{cov}}=6\). We therefore chose the penalty parameters for further simulations from the ranges where the log-likelihood attains its maximum and performed the covariance graph estimation with these parameters. Unfortunately, we observe that in all cases these penalty values lead to overestimation of the graph; in particular, many false positive edges are selected in the estimated graph.

### 6.3 Cross-validation with adaptive Lasso

*p*>*n*. The other graphs used in the study contain fewer hub nodes, and the method performs well on them. For example, the maximum degree is \(k_{\max}=2\) for the chain graph, \(k_{\max}=4\) for the cluster graph, \(k_{\max}=9\) for the hub graph, and \(k_{\max}=13\) for the scale-free graph. We therefore observe that the penalty selection under cross-validation with the adaptive Lasso depends strongly on the number of hub nodes in the graph. We also note that the adaptive Lasso method does not use any prior information about the graph topology and applies a uniform penalty to all edges in the graph, which is a major drawback when the method is applied to graphs containing more hub nodes. This observation has also been reported in other studies [34–36].

## 7 Effect of correlation strength on the performance of methods

In this section, we discuss the role of correlation strength in the performance of the methods. It has been shown that the magnitude of correlations must be bounded from below for the method to give consistent predictions [13]. It is known that when the variability in the data is low, a large sample size is required to achieve accurate estimates. If the sample size is limited, which is often the case in biomedical applications, the prediction accuracy can be increased by increasing the variability in the data, so that the correlation information between variables is high. In this section, we examine how the prediction accuracy of the methods is affected by changes in data variability. For this purpose, we generate several datasets from correlation matrices with different correlation magnitudes and then perform graph reconstruction with four methods on these datasets. To generate datasets with different degrees of correlation, we use the method introduced in [32].

Let A be a *p*×*p* adjacency matrix which consists of binary values and represents a certain graph. To induce different correlation strengths in the data, we first multiply A by a scalar *w*>0 and convert the resulting matrix into the positive definite matrix

\[ \hat{A} = wA + \gamma I, \]

with \(\gamma=|\min_{i}(\lambda_{i})|+\varepsilon\), *i*=1,…,*p*, and *ε*>0. Here, \(\lambda_{i}\) are the eigenvalues of the matrix *w*A. Then, we compute the correlation matrix by

\[ C = \Lambda^{-1/2}\hat{A}^{-1}\Lambda^{-1/2}, \]

where Λ is the matrix of diagonal elements of the covariance matrix \(\hat{A}^{-1}\). As a measure of the correlation magnitude, we define \(\sigma=\sqrt{\operatorname{var}(C_{ij})},\ i, j = 1,\ldots,p\). Different values of *w* allow generating correlation matrices with different magnitudes. The correlation matrix is then used to generate datasets using the procedure described in Fig. 4.
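The construction above translates directly into NumPy. This is a sketch that follows the steps as described in the text rather than the *huge* source code; the default *ε*=0.1 is an illustrative value, and *σ* is computed over all entries of *C* (including the unit diagonal), following the definition above.

```python
import numpy as np

def correlation_from_graph(adj, w: float, eps: float = 0.1) -> np.ndarray:
    """A_hat = w*A + gamma*I with gamma = |min eig(w*A)| + eps,
    then C = Lambda^{-1/2} A_hat^{-1} Lambda^{-1/2}."""
    wa = w * np.asarray(adj, dtype=float)
    gamma = abs(np.linalg.eigvalsh(wa).min()) + eps
    a_hat = wa + gamma * np.eye(wa.shape[0])   # smallest eigenvalue is now eps > 0
    cov = np.linalg.inv(a_hat)                 # covariance matrix A_hat^{-1}
    d = 1.0 / np.sqrt(np.diag(cov))            # Lambda^{-1/2}
    return cov * np.outer(d, d)                # rescale to unit diagonal

def correlation_magnitude(c: np.ndarray) -> float:
    """sigma = sqrt(var(C_ij)) over all entries i, j = 1, ..., p."""
    return float(np.sqrt(np.var(c)))

# Larger w makes eps relatively smaller and the correlations stronger
chain = np.diag(np.ones(49), 1) + np.diag(np.ones(49), -1)   # 50-node chain adjacency
for w in (0.1, 0.5, 1.0, 5.0):
    C = correlation_from_graph(chain, w)
    print(f"w={w}: sigma={correlation_magnitude(C):.3f}")
```

Since \(\hat A = w(A + |\lambda_{\min}(A)| I) + \varepsilon I\), the correlations depend on *w* only through the ratio *ε*/*w*, which is why increasing *w* strengthens them.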

*σ*≈0.15, colored in blue), the performance of the methods is relatively poor: in this regime, only about 1/4 of the edges predicted by any method are correct. Increasing the magnitude of correlation improves the performance of all methods (II, III, and IV). For instance, at *σ*≈0.19, the sensitivity of the thresholded sample covariance predictions increases from 0.23 to 0.67. In this regime, the sensitivity of the covariance Lasso increases from 0.24 to 0.72 (12 to 30 edges), while the sensitivity of the nodewise regression Lasso and the graphical Lasso increases from 0.24 to 0.7 (from 13 to 35 edges). The accuracy of the covariance Lasso predictions changes little from II to IV, indicating a saturation effect of the method. A saturation effect is also observed for the thresholded sample covariance matrix from (III) to (IV). In contrast, the sensitivity of the nodewise regression Lasso and the graphical Lasso predictions increases with increasing correlation strength. In regime (III), the sensitivity of the nodewise regression Lasso is about 0.83, whereas in regime (IV) it is almost 0.93. The sensitivity of the graphical Lasso increases from 0.75 (III) to 0.82 (IV).

Sensitivity of predictions computed by the four methods, calculated as the average ratio of correctly predicted to total predicted edges

| Correlation strength *σ* | ≈0.15 (I) | 0.19 (II) | 0.22 (III) | 0.36 (IV) |
|---|---|---|---|---|
| Thresholded sample covariance | 0.23 | 0.67 | 0.73 | 0.73 |
| Covariance Lasso | 0.24 | 0.72 | 0.8 | 0.77 |
| Nodewise regression Lasso | 0.24 | 0.7 | 0.83 | 0.93 |
| Graphical Lasso | 0.25 | 0.7 | 0.75 | 0.82 |

## 8 Conclusions

High-dimensional graph reconstruction methods have attracted much scientific interest in recent years and continue to be investigated. In this work, we analyze the relation between concentration and covariance graphs and conduct a detailed comparison of various graph reconstruction methods designed to infer concentration as well as covariance graphs. Our analytical study shows that it is possible to establish a link between these two graphs using Neumann series. In particular, we show the entry-wise relation between the entries of the covariance matrix and the transitive closure matrix associated with the concentration graph, and we demonstrate this relation analytically for a star graph. Moreover, we analytically demonstrate the graph property that the covariance graph associated with the correlation matrix can be shown to be the minimum transitive closure of the concentration graph, and we give a small-scale demonstration for a three-node graph. This property can be exploited to infer edge weights of the covariance graph directly from edge weights of the concentration graph. So far, it has been shown for a star graph, but it can be extended to other graph types.

Furthermore, we performed analytical and numerical studies of the recently published network deconvolution and network silencing methods [10, 11]. In particular, we derived the analytical solution to the network deconvolution problem by exploiting properties of the Kac-Murdock-Szegő matrix. We also give more insight into the role of the scaling parameter, which had been studied only numerically in the original study. Moreover, we conducted a detailed comparison of the methods designed to reconstruct covariance and concentration graphs on different graph topologies. To mimic high-throughput experiments, we designed our simulation experiments with more variables than samples (*p*>*n*). We showed that the nodewise regression Lasso allows one to select a consistent penalization that controls the number of false positives better than the thresholded sample covariance, the covariance Lasso, and the graphical Lasso. The adaptive version of the nodewise regression Lasso also controls the rate of false positives better than the correlation-based methods when the penalty parameter is chosen via cross-validation.

## Declarations

### Acknowledgements

We would like to thank Sara Al-Sayed for useful comments and discussions. This work has been supported by the e:Bio project HostPathX funded by the Federal Ministry of Education and Research (BMBF). HK also acknowledges support from the LOEWE research priority program CompuGene and from the H2020 European project PrECISE.

### Authors’ contributions

NS and HK conceived and designed the experiments. NS performed the experiments. NS and HK wrote the paper. Both authors read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


## References

- D Marbach, JC Costello, R Küffner, NM Vega, R Prill, et al., Wisdom of crowds for robust gene network inference. Nat. Methods **9**(8), 796–804 (2012).
- SM Hill, LM Heiser, T Cokelaer, M Unger, NK Nesser, et al., Inferring causal molecular networks: empirical assessment through a community-based effort. Nat. Methods **13**(4), 310–318 (2016).
- W-P Lee, W-S Tzou, Computational methods for discovering gene networks from expression data. Brief. Bioinformatics **10**(4), 408–423 (2009).
- F Markowetz, R Spang, Inferring cellular networks – a review. BMC Bioinformatics **8**(6), 1–17 (2007).
- P Bühlmann, S van de Geer, *Statistics for high-dimensional data: methods, theory and applications*, 1st edn. (Springer, Heidelberg, 2011).
- P Langfelder, S Horvath, WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics **9**(1), 559 (2008).
- J Dong, S Horvath, Understanding network concepts in modules. BMC Syst. Biol. **1**(1), 1–20 (2007).
- S Horvath, J Dong, Geometric interpretation of gene coexpression network analysis. PLoS Comput. Biol. **4**(8), 1000117 (2008).
- J Bien, RJ Tibshirani, Sparse estimation of a covariance matrix. Biometrika **98**(4), 807–820 (2011).
- S Feizi, D Marbach, M Médard, M Kellis, Network deconvolution as a general method to distinguish direct dependencies in networks. Nat. Biotechnol. **31**(8), 726–733 (2013).
- B Barzel, A-L Barabási, Network link prediction by global silencing of indirect correlations. Nat. Biotechnol. **31**(8), 720–725 (2013).
- R Mazumder, T Hastie, Exact covariance thresholding into connected components for large-scale graphical lasso. J. Mach. Learn. Res. **13**(1), 781–794 (2012).
- N Meinshausen, P Bühlmann, High-dimensional graphs and variable selection with the Lasso. Ann. Statist. **34**(3), 1436–1462 (2006).
- J Friedman, T Hastie, R Tibshirani, Sparse inverse covariance estimation with the graphical lasso. Biostatistics **9**(3), 432–441 (2008).
- T Hastie, R Tibshirani, J Friedman, *The elements of statistical learning*. Springer Series in Statistics (Springer, New York, 2001).
- AJ Butte, P Tamayo, D Slonim, TR Golub, IS Kohane, Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc. Natl. Acad. Sci. **97**(22), 12182–12186 (2000).
- SL Lauritzen, *Graphical models* (Oxford University Press, Oxford, 1996).
- TH Cormen, CE Leiserson, RL Rivest, C Stein, *Introduction to algorithms*, 3rd edn. (The MIT Press, Cambridge, 2009).
- PJ Bickel, E Levina, Covariance regularization by thresholding. Ann. Statist. **36**(6), 2577–2604 (2008).
- U Grenander, G Szegő, *Toeplitz forms and their applications* (Chelsea Pub. Co., New York, 1984).
- M Dow, Explicit inverses of Toeplitz and associated matrices. ANZIAM J. **44**(E), 185–215 (2003).
- H Zou, The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. **101**(476), 1418–1429 (2006).
- N El Karoui, Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Statist. **36**(6), 2717–2756 (2008).
- PJ Bickel, E Levina, Regularized estimation of large covariance matrices. Ann. Statist. **36**(1), 199–227 (2008).
- B Zhang, S Horvath, A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. **4**(1), 1128 (2005).
- A-L Barabási, R Albert, Emergence of scaling in random networks. Science **286**(5439), 509–512 (1999).
- A-L Barabási, ZN Oltvai, Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. **5**(2), 101–113 (2004).
- DR Hunter, R Li, Variable selection using MM algorithms. Ann. Statist. **33**(4), 1617–1642 (2005).
- K Lange, *Optimization*. Springer Texts in Statistics (Springer, Heidelberg, 2004).
- R Tibshirani, Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Series B **58**(1), 267–288 (1996).
- O Banerjee, L El Ghaoui, A d’Aspremont, Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res. **9**, 485–516 (2008).
- T Zhao, H Liu, K Roeder, J Lafferty, L Wasserman, The huge package for high-dimensional undirected graph estimation in R. J. Mach. Learn. Res. **13**(1), 1059–1062 (2012).
- J Peng, P Wang, N Zhou, J Zhu, Partial correlation estimation by joint sparse regression models. J. Am. Stat. Assoc. **104**(486), 735–746 (2009).
- KM Tan, P London, K Mohan, S-I Lee, M Fazel, D Witten, Learning graphical models with hubs. J. Mach. Learn. Res. **15**(1), 3297–3331 (2014).
- J Peng, P Wang, N Zhou, J Zhu, Partial correlation estimation by joint sparse regression models. J. Am. Stat. Assoc. **104**(486), 735–746 (2009).
- Q Liu, AT Ihler, Learning scale free networks by reweighted ℓ1 regularization, in *AISTATS, JMLR Proceedings*, vol. 15, ed. by GJ Gordon, DB Dunson, M Dudík (JMLR.org, Ft. Lauderdale, 2011), pp. 40–48.