 Methodology article
 Open Access
 Published:
Reconstruction of largescale regulatory networks based on perturbation graphs and transitive reduction: improved methods and their evaluation
BMC Systems Biology volume 7, Article number: 73 (2013)
Abstract
Background
The datadriven inference of intracellular networks is one of the key challenges of computational and systems biology. As suggested by recent works, a simple yet effective approach for reconstructing regulatory networks comprises the following two steps. First, the observed effects induced by directed perturbations are collected in a signed and directed perturbation graph (PG). In a second step, Transitive Reduction (TR) is used to identify and eliminate those edges in the PG that can be explained by paths and are therefore likely to reflect indirect effects.
Results
In this work we introduce novel variants for PG generation and TR, leading to significantly improved performances. The key modifications concern: (i) use of novel statistical criteria for deriving a highquality PG from experimental data; (ii) the application of local TR which allows only short paths to explain (and remove) a given edge; and (iii) a novel strategy to rank the edges with respect to their confidence. To compare the new methods with existing ones we not only apply them to a recent DREAM network inference challenge but also to a novel and unprecedented synthetic compendium consisting of 30 5000gene networks simulated with varying biological and measurement error variances resulting in a total of 270 datasets. The benchmarks clearly demonstrate the superior reconstruction performance of the novel PG and TR variants compared to existing approaches. Moreover, the benchmark enabled us to draw some general conclusions. For example, it turns out that local TR restricted to paths with a length of only two is often sufficient or even favorable. We also demonstrate that considering edge weights is highly beneficial for TR whereas consideration of edge signs is of minor importance. We explain these observations from a graphtheoretical perspective and discuss the consequences with respect to a greatly reduced computational demand to conduct TR. Finally, as a realistic application scenario, we use our framework for inferring gene interactions in yeast based on a library of gene expression data measured in mutants with single knockouts of transcription factors. The reconstructed network shows a significant enrichment of known interactions, especially within the 100 most confident (and for experimental validation most relevant) edges.
Conclusions
This paper presents several major achievements. The novel methods introduced herein can be seen as state of the art for inference techniques relying on perturbation graphs and transitive reduction. Another key result of the study is the generation of a new and unprecedented largescale in silico benchmark dataset accounting for different noise levels and providing a solid basis for unbiased testing of network inference methodologies. Finally, applying our approach to Saccharomyces cerevisiae suggested several new gene interactions with high confidence awaiting experimental validation.
Background
Datadriven inference of intracellular regulatory networks, in particular of those involved in gene regulation, remains to be one key challenge of computational and systems biology.
Many methods for this daunting task have been proposed and new methods are appearing at a high rate [1–6]. The different inference methodologies can be categorized based on the model formalism and the principle used for deriving interactions in a regulatory network: sparse regression [7], correlationbased approaches [8], zscore [9], mutual information [10, 11], ANOVAbased analysis [12], Bayesian networks [13], Gaussian graphical models [14], random forest [15], differential equations [16], reaction networks [17] and Boolean networks [18, 19]. One final output of all these approaches is the reconstructed network topology, typically given as a (signed or unsigned, directed or undirected) graph. Recent efforts have shown that combining several of the aforementioned methods often outperform all single approaches [6].
It is of utmost importance to rigorously evaluate and compare the large number of methods before one can put confidence in the results of their application. The need for the verification of computational systems biology methods is now recognized worldwide. Notably, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project organizes international gene regulatory network inference challenges and evaluates the solutions submitted by participating research groups in a transparent manner [6, 20, 21]. This way a “collaborativecompetition” is established in which complicated problems are addressed as a community rather than individual laboratories [22]. Recently, it was demonstrated that such community effort was fruitful for the inference of an improved gene regulatory network of Escherichia coli and the inference of a novel gene regulatory network for the bacterium Staphylococcus aureus [6].
Verification of inference methods requires benchmark datasets [22]. Benchmarking on real biological data is challenging as true biological networks are largely unknown [23]. The availability of realistically simulated datasets is therefore of utmost importance for the verification of these methods. Only for simulated data can we be certain about the true complex system underlying the data. Simulated data has been used to validate methods, but typically the data was generated with small networks (containing 10–100 genes) [24, 25] and with the same models as used by the inference [26, 27]. A step to more realistic benchmark data was made in 2003 [28], generating simulated gene expression data using equations based on enzyme kinetics (for use of these data in method evaluations, see e.g. [8, 29]). As regulatory network inference methods are typically applied to genomewide data, a necessary next step is to perform evaluations also on genomescale.
In this paper we revisit two related gene network inference methods: downranking of feedforward loops (DRFFL [30]) and TRANSitive reduction for WEighted Signed Digraphs (TRANSWESD [31]). Both approaches were successfully employed (ranked first and third, respectively) in the DREAM4 In Silico Network 100 nodes challenge. In this challenge, the task was to reverse engineer gene networks from (simulated) steadystate and timeseries data. DRFFL and TRANSWESD share a common core as they both try to infer a minimal regulatory graph that can explain the gene expression changes observed in perturbation experiments. In particular, both methods apply the principle of Transitive Reduction to identify and eliminate edges reflecting indirect effects. Since both DRFFL and TRANSWESD were ranked high, their underlying inference strategy could provide a generally promising approach for gene network inference.
Network reconstruction based on transitive reduction usually involves three steps of which the last can be seen as optional:
Step 1 (Generation of a perturbation graph): A perturbation graph${\mathcal{G}}^{P}$ is generated from the perturbation data, i.e., a directed edge from a node i to a node j (i → j) is included in ${\mathcal{G}}^{P}$ if a perturbation in i changed the level of j significantly (significance to be measured by a certain criterion). Sometimes, the edges are also labeled by a sign and might also get a weight indicating their confidence or likelihood.
Step 2 (Transitive reduction): As an edge in the perturbation graph may reflect a direct but also an indirect effect between two nodes, the goal of the second step – the transitive reduction – is to identify and eliminate indirect effects in ${\mathcal{G}}^{P}$ yielding the final reconstructed graph ${\mathcal{G}}^{T}$. As a general rule for transitive reduction, an edge introduced due to indirect effects is detected by searching for alternative paths in ${\mathcal{G}}^{P}$ which could induce the same net effect as this edge. We say that such a path explains the edge and the latter is then removed.
Step 3 (Edge sorting): Normally one would consider all edges contained in ${\mathcal{G}}^{T}$ as the true edges. In an optional third step, all edges of the reconstructed graph ${\mathcal{G}}^{T}$ are ranked in a list according to a given confidence score for each edge. For certain applications it might be useful to augment this list also by “edges” (together with their confidence values) not contained in ${\mathcal{G}}^{T}$ (i.e., edges which were not contained in ${\mathcal{G}}^{P}$ or which were removed from the latter when computing the transitive reduction ${\mathcal{G}}^{T}$). In this way we get an ordered list of all potential pairwise interactions according to their confidence score.
These three steps are common to all approaches using transitive reduction (abbreviated by TR in the following) but different variants may arise (i) by using different approaches to derive the perturbation graph (abbreviated PG) in Step 1 or (ii) by considering different criteria a path must fulfill in order to explain a given edge in Step 2, or (iii) by different edge sorting schemes to be used in Step 3. For example, DRFFL [30] uses a zscorebased strategy to generate the PG and does not consider edge signs in the TR step when searching for valid paths that can explain certain edges. In contrast, TRANSWESD [31] generates the PG by selecting edges that satisfy two related but distinct statistical conditions whereas the actual TR procedure accounts for edge signs and also edge weights when searching for suitable paths that can explain a given edge.
In the present study, we propose and test novel variants for each of the three steps mentioned above, i.e., for PG generation, for TR, and for edge sorting. As one major outcome, we present particular combinations of PG generation and TR strategies which yielded superior results in diverse benchmark tests outperforming by far the two original approaches. As benchmarks we used not only the DREAM4 In Silico Network challenge but also a novel and unprecedented synthetic compendium consisting of several realistic 5000gene networks simulated with varying biological and measurement error variances resulting in a total of 270 datasets. In both benchmarks we focus on perturbations induced by single gene knockouts. Such experiments can be carried out at genomescale at least in some model organisms (see, for example, [32–35]). As a realistic application scenario we used our framework for inferring gene interactions in yeast Saccharomyces cerevisiae based on a library of gene expression data measured in mutants with single knockouts of transcription factors [35]. The reconstructed network shows a significant enrichment of known interactions, especially within the (most relevant) edges identified with highest confidence.
The results of the benchmarks demonstrate the relative performance of the different approaches, moreover, they also enable us to draw some general conclusions. For example, it turns out that when pruning the PG by TR, it is often sufficient or sometimes even favorable to restrict the search on paths having a length of only two. We also demonstrate that edge weights are highly beneficial for TR whereas edge signs are of minor importance (the latter finding was also reported in [36]). We give an explanation for these observations from a graphtheoretical perspective.
Methods
We start with a brief description of the original TR methods DRFFL and TRANSWESD which inspired the novel inference algorithms presented herein. Afterwards we introduce the new variants for PG generation, TR, and edge sorting. For the PG generation algorithms, we assume that we are given the following input variables (for a network of ngenes):

a 1 × n row vector G^{wt} containing the (possibly preprocessed) wildtype gene expression data

the n × n matrix G^{ko} containing the (possibly preprocessed) measured steadystate gene expression levels after perturbing/knockingout each single gene. The element ${\mathbf{G}}_{i,j}^{\mathit{\text{ko}}}$ stores the gene expression level of gene j after perturbing gene i.
These input variables directly correspond to the datasets provided in the DREAM4 challenge and in our novel compendium of simulated largescale networks (described below).
Downranking of feedforward loops (DRFFL)
The DRFFL algorithm described in [30] used the following strategies for the three steps:
Step 1 (PG generation)
In a preprocessing step, a confidence weight is assigned to each possible edge i → j of the network by computing the absolute value of the standard zscore z_{ ij }. The latter quantifies the difference between the expression ${\mathbf{G}}_{i,j}^{\mathit{\text{ko}}}$ of gene j under knockout/perturbation of gene i and its mean μ_{ j }, normalized by the standard deviation σ_{ j }:
Mean μ_{ j } and standard deviation σ_{ j } are computed on all available expression measurements of gene j, including the wildtype ${\mathbf{G}}_{j}^{\mathit{\text{wt}}}$. Then, the PG ${\mathcal{G}}^{P}$ is obtained by selecting all those edges whose z_{ ij } is larger than a given threshold β. We then denote the PG generated by the original DRFFL method by PG^{1}.
Step 2 (TR)
DRFFL circumvents possible problems arising in TR of cyclic graphs by allowing only those edges to be removed that connect nodes from different strongly connected components (a strongly connected component in a directed graph is a maximal subgraph in which for each ordered pair of nodes a path exists connecting these nodes). DRFFL uses unsigned and unweighted TR, i.e., an edge i → j is removed from ${\mathcal{G}}^{P}$ if i and j are from different components and if there is an alternative path connecting i and j without using edge i → j.
Step 3 (Edge sorting)
The confidence weights z_{ ij } of the remaining edges in the graph ${\mathcal{G}}^{T}$ obtained after TR are increased by a constant offset such that all edges in ${\mathcal{G}}^{T}$ are ranked higher than all other potential edges (not contained in ${\mathcal{G}}^{T}$). The latter are listed below the edges of ${\mathcal{G}}^{T}$ according to their confidence weight computed in Step 1 (PG generation).
TRANSWESD
TRANSWESD (TRANSitive reduction for WEighted Signed Digraphs) was introduced in [31] with the goal to generalize and improve previous TR approaches [37–39] to make it amenable for the reconstruction of large biological networks.
Step 1 (PG generation)
TRANSWESD constructs the PG ${\mathcal{G}}^{P}$ via two thresholds: an edge i → j is introduced in ${\mathcal{G}}^{P}$ if (i) a measure similar to the zscore z_{ ij } used by DRFFL exceeds a given threshold β and (ii) if the absolute change of the state of node j when perturbing i exceeds a certain minimal deviation γ, i.e. if ${\mathbf{G}}_{j}^{\mathit{\text{wt}}}{\mathbf{G}}_{\mathit{\text{ij}}}^{\mathit{\text{ko}}}>\gamma $. Each edge i → j gets a sign ${s}_{\mathit{\text{ij}}}=\text{sign}\left({\mathbf{G}}_{j}^{\mathit{\text{wt}}}{\mathbf{G}}_{\mathit{\text{ij}}}^{\mathit{\text{ko}}}\right)$ indicating whether the changes in i and j have the same direction (positive sign) or not (negative sign). In addition, a weight w_{ ij } is assigned to each edge i → j quantifying its uncertainty or behavioral distance (i.e., a large weight indicates a low confidence of this edge). Accordingly, TRANSWESD uses w_{ ij } = 1 − c_{ ij } with c_{ ij } being the conditional correlation coefficient between genes i and j which is computed from all experiments except those where gene i was directly perturbed. More specifically, herein we define the conditional correlation coefficient c_{ ij } as the Pearson correlation coefficient computed from all measurements of nodes i and j (columns in G^{wt} and G^{ko}) except in the experiments where j was knockedout. The PG generated by the original TRANSWESD procedure is denoted by PG^{2}.
Step 2 (TR)
A particular feature of TRANSWESD is that it can deal with signed and weighted PGs and that cycles are allowed. The TR rule is as follows: An edge i → j with sign s_{ ij } and weight w_{ ij } is removed if there is an alternative path P_{ ij } (i ⇒ j) which connects i and j and fulfills the following requirements: (i) P_{ ij } is simple, i.e., it does not contain a cycle; (ii) P_{ ij } does not involve edge i → j; (iii) the overall sign of P_{ ij } (obtained by multiplying the signs of all its edges) is the same as s_{ ij }; and (iv) the maximum weight of all edges on path P_{ ij } (denoted by w_{ max }(P_{ ij })) fulfills
The confidence factor α is typically chosen close to (but smaller than) unity; the default value used by Klamt et al. [31] is 0.95. With α < 1 it is ensured that all edges in the path P_{ ij } have a higher confidence than the edge i → j. However, in some cases it can nevertheless be advantageous to use also α > 1. If a path P_{ ij } with the four required properties exists in the PG, then the observed effect of i upon j is considered to be explained (induced) by path P_{ ij }. All edges i → j in the PG fulfilling these conditions are considered to be (potentially) removable and are collected in a set R. If the graph is acyclic, TR is simple and unique and all potentially removable edges in R can be deleted immediately. The situation is more complicated in cyclic graphs: the result of TR can become nonunique, depending on the order of edge removals. TRANSWESD uses a reasonable rule to resolve nonuniqueness: it removes the edges of R iteratively starting with the highest weight (lowest association) first. As a second problem in cyclic graphs, it may then happen that a formerly removable edge in R becomes nonremovable because certain paths may have been interrupted by preceding deletions of other removable edges. Even worse, an edge might still potentially be removable but its elimination would lead to the interruption of a path that was required to explain an edge already removed in a previous iteration (see the example below). It is therefore necessary to explicitly test, in each iteration, whether upon removal of the next edge of R all edges originally contained in the PG ${\mathcal{G}}^{P}$ are still explainable by the remaining graph (otherwise this edge has to be reinserted). This may require extensive shortest path calculations.
Therefore, to reduce the computational effort in largescale cyclic graphs, TRANSWESD provides two parameters (path_exact and full_check) to allow for the (optional) use of approximate solutions which may drastically reduce the required computation time. Since computing the shortest path of a given sign in cyclic signed digraphs is an NPcomplete (and thus delicate) problem, starting TRANSWESD with path_exact=0 enforces the use of approximate path calculation algorithms which have been shown to produce no or only few errors in largescale biological networks [31, 40]. The full_check=0 option can be used to suppress recomputation of shortest path lengths after deleting an edge (thus assuming that the relevant path lengths will not change). To our experience from numerous tests, there are usually only minor effects on the reconstruction quality when using this simplification. As we deal herein only with largescale networks, we used path_exact=full_check=0 in all calculations.
We illustrate the approach with the example shown in Figure 1. The graph on the lefthand side displays a hypothetical cyclic PG with its edge weights and signs. Using the standard confidence factor of α = 0.95, in principle, three edges could be identified as indirect effects as for each of them a suitable explaining path would exist. This concerns the edge A → C (explained by path A → B → C and alternatively also by path A → D → B → C both fulfilling the sign and weight conditions), the edge A → B (explained by path A → D→B) and the edge D → B (explainable by the path D ⊣ E → C ⊣ B). These three edges form the set R of potentially removable edges. According to the rules, TRANSWESD removes first edge A → C as it has the largest weight (lowest confidence). In the second iteration, A → B can be safely removed. Now the algorithm has to stop even though the edge D → B is still explainable by the path given above. If we removed this edge, no positive path from A to B and from A to C would remain in the graph, i.e., the originally observed influence of A on B and C would not be captured anymore. This example shows that TRANSWESD may keep an edge in the graph, even if there is an explaining path for it. The resulting graph ${\mathcal{G}}^{T}$ for this example is shown on the righthand side of Figure 1. (Note: with full_check=0 the edge D → B would be (wrongly) removed in addition to the others whereas path_exact=0 had no effect).
Step 3 (Edge sorting)
TRANSWESD ranks the edges according to their weights computed in Step 1 (PG generation): edges with highest confidence (lowest weights) are placed first. Edges retained in ${\mathcal{G}}^{T}$ are put first followed by edges that were contained in the PG ${\mathcal{G}}^{P}$ (but removed during TR). The last group comprises all other pairwise interactions; their order is also determined by the conditional correlation coefficient c_{ ij }.
Novel variants
DRFFL and TRANSWESD were successfully applied and highly ranked in the DREAM4 network reconstruction challenge. However, when we compared and mixed both methods (e.g., by replacing Step 1 (PG generation) of TRANSWESD with Step 1 (PG generation) of DRFFL) we realized that even better approaches, in particular for Step 1 (PG generation) (PG generation) and Step 2 (TR) (TR), might exist. In the following we describe several new variants focusing on those which in the benchmarks performed significantly better than the original DRFFL and TRANSWESD versions (see Results and discussion section).
Perturbation graph
The novel PG generation procedure delivers:

The signed and directed PG ${\mathcal{G}}^{P}$ itself.

A matrix W^{t} containing the weights of the edges in ${\mathcal{G}}^{P}$ to be used by the transitive reduction algorithm. The element W^{t}(i,j) contains the weight of the edge i → j in ${\mathcal{G}}^{P}$; it is set to ∞ if the edge was not included in the PG.

A matrix W^{r} containing the confidence weights for all possible interactions (i,j) to be used in the edge ranking procedure in Step 3. In contrast to W^{t}, this matrix contains a weight for all pairs (i,j) (except for i = j as we exclude selfloops), even if i → j is not contained in ${\mathcal{G}}^{P}$.
A key difference of the novel PG algorithms compared to the strategies used by DRFFL and TRANSWESD is that different edge weights are used for TR and for edge sorting. Moreover, the selection of candidate edges and the calculation of edge weights are based on (combinations of) correlation and zscore measures. In detail, the following calculations are performed:

1.
Compute the n × n conditional correlation matrix C from the expression measurements G ^{wt} and G ^{ko}.

2.
Use G ^{ko} to compute the n × n zscore matrix Z comprising the zscore values of all (potential) edges.

3.
Compute the n × n matrix Z ^{c} as the zscore calculated on the absolute value of the entries of the conditional correlation matrix C, and add a (minimal) offset to obtain positive values: Z ^{c} > 0.

4.
Build the PG by defining the following set of edges:

${\mathcal{S}}^{1}$ comprises all node pairs (i,j) for which Z_{i,j} > β.

${\mathcal{S}}^{2}$ comprises all node pairs (i,j) for which C_{i,j} > γ.

${\mathcal{S}}^{3}$ is the set of all node pairs (i,j) whose zscore and correlation values have opposite sign: C_{i,j} · Z_{i,j} < 0.

$\mathcal{B}={\mathcal{S}}^{1}\cap {\mathcal{S}}^{2}\cap {\mathcal{S}}^{3}$ is the set of node pairs (i,j) satisfying the three previous conditions.

${\mathcal{Z}}^{p}$ is the set of node pairs (i,j) with positive zscore value.

${\mathcal{Z}}^{n}$ is the set of node pairs (i,j) with negative zscore value.

${\mathcal{E}}^{p}={\mathcal{Z}}^{n}\cap \mathcal{B}$ is the set of positive edges of the PG.

${\mathcal{E}}^{n}={\mathcal{Z}}^{p}\cap \mathcal{B}$ is the set of negative edges of the PG.

${\mathcal{G}}^{p}={\mathcal{E}}^{p}\cup {\mathcal{E}}^{n}=\mathcal{B}$ yields the PG.


5.
Compute the ranking weight matrix by normalizing W ^{r} = Z + Z ^{c} between 0 and 1.

6.
Compute the weight matrix W ^{t} to be used for transitive reduction in TRANSWESD as W ^{t} = 1 − Z ^{c}.
Using this scheme, the PG is built by selecting all edges where (i) the zscore exceeds a given threshold β, (ii) the conditional correlation exceeds another threshold γ, and (iii) the signs of zscore and conditional correlation are opposite. The latter condition is justified because a positive zscore for the edge (i,j) is computed when the deletion/decrease of i (due to knockout or knockdown) yields an increase in the activity of j which should corresponds to a negative correlation between i and j. The same correspondence exists between negative zscore and positive correlation. Obviously, measurement noise may invalidate the truth of these statements, thus we only keep edges that are consistent with respect to this sign rule. With the rule described above, the positive edges contained in ${\mathcal{E}}^{p}$ stem from a negative zscore and the negative edges contained in ${\mathcal{E}}^{n}$ from a positive zscore. In the following we denote the PG generated by the above procedure PG^{new}.
The weights ${\mathbf{W}}_{i,j}^{r}$ used for edge ranking take equallyweighted into account (i) the absolute value of the standard zscore (Equation (1)) of the deviations induced by the perturbation in i and (ii) the zscore of the deviations of the conditional correlation between i and j relative to the averaged conditional correlations related to gene j. As far as we know, a zscore of conditional correlations has not yet been used in the context of network inference, however, the ranking weights introduced above proved to be optimal in the benchmarks delivering an edge sorting of high quality. Below we show that the new PG generation approach in combination with the proposed ranking scheme may already deliver a valuable approximation of the network itself but can often be further improved by TR techniques.
Regarding the weights to be used for TR (W^{t}), benchmark tests showed us that it is beneficial to use only the zscore of the conditional correlation coefficients.
Transitive reduction
Identifying and pruning edges representing the indirect interactions in ${\mathcal{G}}^{P}$ finally yielding ${\mathcal{G}}^{T}$ is the central goal of transitive reduction. We here present some novel and generalized variants of TR inspired from the original versions of DRFFL and TRANSWESD.
We observed that the TR used by TRANSWESD (see Step 2 (TR) of TRANSWESD described above) can be generalized in multiple ways:

One may consider unweighted TRANSWESD by setting α = ∞ in the weight rule (2).

One may consider unsigned TRANSWESD by setting all edge signs in ${\mathcal{G}}^{P}$ to “+”. In this case, the algorithm becomes simpler (polynomial instead of NPcomplete) as the calculation of shortest paths does not need to distinguish between positive and negative paths. It is then, however, still important to keep the weights to avoid nonunique results in cyclic networks.

When searching for a suitable path P_{ ij } that can explain a certain edge i → j, one may restrict the search on paths involving not more edges than a predefined number L. In this way one would manifest the expectation that observed indirect effects can be traced back to short paths.
With these generalizations we introduce the notation TRANSWESD^{S,W,L} to specify the chosen TR variant: S ∈ {u,s} indicates whether the signed (s) or unsigned (u) TR version is used; W ∈ {u,w} specifies either the unweighted (u) or weighted (w) version; and L specifies the maximal path length allowed. Accordingly, the original TRANSWESD version corresponds to TRANSWESD^{s,w,∞}. We also observe that TRANSWESD^{u,u,∞} mimics TR used by DRFFL when removal of edges within one and the same component would be blocked.
However, we soon realized that the unweighted variant of TRANSWESD does not perform very well, in particular when combined with the full_check=0 option (see above). We therefore do not analyze the unweighted version in detail but keep the notation for consistency with respect to the following variant.
In addition to the modified version of TRANSWESD, we introduce a related but different strategy which we call local transitive reduction (LTR). There are two key differences: only paths of length 2 are considered as possible explanations for indirect effects and an alternative condition on the edge weight is introduced replacing rule (2). The LTR algorithm considers an edge i → j potentially removable if three criteria are fulfilled: (i) existence of a feedforward loop, i.e. $\{i\to j,i\to k,k\to j\}\in {\mathcal{G}}^{P}$; (ii) sign consistency, i.e. s_{ ij } = s_{ ik } · s_{ kj }; and (iii) the weight condition:
Recall that we introduced Z^{c} as the zscore of the correlation coefficients and that the relation to the edge weight W^{t} which we use for modified TRANSWESD is thus simply Z^{c} = 1 − W^{t}. Therefore, the smaller ${Z}_{\mathit{\text{ij}}}^{c}$ the higher the confidence that the path i → k → j can explain the edge i → j (thus, a large weight is here associated with high confidence).
Analogously as described for TRANSWESD, to deal with nonuniqueness, the potentially removable edges are iteratively deleted according to the edge weights (lowest confidence first) and for each edge to be removed it is checked, whether all edges originally contained in ${\mathcal{G}}^{P}$ are still explainable by a 2path in the remaining graph (otherwise this edge is kept).
Although LTR is also a weighted and signed TR variant, it is considerably simpler than TRANSWESD as it uses a simple triangle rule which is much easier to check than searching for suitable paths. For this reason, in contrast to TRANSWESD, we can easily use the exact variant with path_exact=full_check=1 in largescale networks. As will be shown in the section, despite its simplicity, LTR yielded excellent performance in the benchmarks. For LTR we also tested different variants, including the unweighted (condition (3) is dropped by setting α = 0), the unsigned and the unsigned/unweighted version (in the latter, only the 2path i → k → j must exist to render edge i → j removable, irrespective of edge signs and weights). We introduce a similar notation as for TRANSWESD: LTR^{S,W} indicates whether edge signs (S ∈ {u,s}) and weights (W ∈ {u,w}) are considered or not; the length parameter L becomes obsolete as it is fixed to 2.
Edge sorting
We use a simple edge ranking procedure which is similar to the strategy used by DRFFL and (original) TRANSWESD. Note that all n(n − 1) potential edges (except selfloops) are included into this list, also those that were not contained in the PG ${\mathcal{G}}^{P}$ or that were removed during TR. The position of each edge is determined by the ranking weights stored in W^{r} (see above): edges with highest ranking weights are put first. To ensure that edges contained in the final graph ${\mathcal{G}}^{T}$ are really ranked higher than all other edges, an offset is added to the weight of all edges in ${\mathcal{G}}^{T}$.
Results and discussion
In the following we present performance results of the new PG generation algorithm in combination with the modified TRANSWESD and the new LTR technique for subsequent transitive reduction. We used two different case studies for benchmarking: (i) the datasets of the DREAM4 InSilico_Size100 network inference challenge, and (ii) a novel largescale synthetic compendium consisting of 30 5000gene networks simulated by SysGenSIM [41] with different connectivities and noise levels. The DREAM4 benchmark also enables a comparison of the performances of the new approaches with its inspiring original techniques DRFFL and (old) TRANSWESD. Generally, in the case of in silico datasets (with known gold standard), the goodness of the predictions is evaluated based on the established Area Under the Curve (AUC) measures of ROC (Receiver Operating Characteristic) and PR (PrecisionRecall) curves. The AUPR is the most informative (and the only shown) performance measure for the case studies in this paper due to the sparsity of gene networks implying large AUROC values differing only insignificantly for the different methods.
In addition to the two in silico studies where the reconstruction results can be evaluated by a perfect gold standard, we used our framework in a realistic application scenario to infer gene interactions in yeast Saccharomyces cerevisiae based on a library of gene expression data measured in mutants with single knockouts of transcription factors.
Performance on DREAM4 networks
In the DREAM4 InSilico_Size100 network reconstruction challenge [21, 42], simulated steadystate measurements of the expression of each gene in the wildtype as well as in the singlegene knockout and singlegene knockdown mutant were provided for 5 different in silico networks (100 nodes each) from which the networks had to be reconstructed. We only make use of wildtype and knockout data as they directly support the generation of the PG (knockdown data can, in principle, further improve the results; see below). For assessing the quality of reconstructed networks, an evaluation script is available at the DREAM website [43] which computes an overall score obtained from the geometric mean of pvalues calculated for the AUPR and the AUROC measures from all 5 reconstructed networks.
We considered predictions by several combinations of the original as well as of the new PG generation and TR methods. The methods’ parameters were chosen according to previously used values (e.g., α) or according to preliminary tests. Importantly, one and the same parameter set was used for all five networks, i.e., no optimization was conducted for every single network. The DREAM4 evaluation script was used to compute the respective overall scores which are summarized in Table 1.
We recall that the combined use of the unsigned and zscorebased PG^{1} with DRFFL [30] originally obtained the best score (71.59) for the DREAM4 challenge, while the coupling of PG^{2} and (original) TRANSWESD was ranked third with a score of 64.71 (see [31]). Although we used here only the knockout data (in [31] both knockout and knockdown data were used for computing the correlation coefficents) the results presented in Table 1 are slightly better (66.10) by fixing a small bug in the computation of PG^{2}.
The DREAM4 best overall score of 71.59 is already exceeded by just applying unsigned TRANSWESD^{u,w,∞} or unsigned and unweighted LTR^{u,u} to the unsigned perturbation graph PG^{1}. This supports the statement in [44] about the weakness of the original DRFFL algorithm, where transitive reduction is applied only to edges between but not within strongly connected components of the PG. In fact, TRANSWESD^{u,w,∞} and especially TRANSWESD^{u,w,2} and LTR^{u,u/w} improve the score of PG^{1} significantly up to 79.74.
Regarding the results for PG^{2} originally used by the TRANSWESD method in [31] we observe that the quality (score) of the PG is lower than for the simple zscore PG^{1}. Although all tested TR techniques (except DRFFL) can improve the score, it remains below the performance results obtained for the zscore approach PG^{1}.
Next we tested the performance results of the TR techniques in PG^{new} where we also applied the new edge sorting scheme and sorting weights. As a first observation, a notable quality improvement is obtained by the novel PG^{new} alone achieving a score of 81.76 which is markedly higher than the scores obtained by PG^{1} and PG^{2}, even after TR. A somewhat unexpected result was that the γ threshold for the conditional correlation coefficients was virtually not required as its optimal value turned out to be 0. However, in other tests described below, using a nonzero value for this threshold in combination with β (for the zscore) turns out to be beneficial. We also analyzed the robustness of the quality of PG^{new} and its edge ordering with respect to the chosen threshold parameters β and γ: Figure 2 displays the overall score of PG^{new} when varying the threshold parameters showing that it is (i) higher than the previous winning score (71.59), (ii) higher than PG^{1}, and (iii) higher than PG^{2} – even for the complete space of meaningful parameter values scanned. Hence, a reasonable robustness of the quality of PG^{new} with respect to the two threshold parameters can be concluded.
We then applied the TR techniques to PG^{new} which increase the scores up to 88.80, thus well above the best score recorded at the DREAM4 challenge. Regarding the different TRANSWESD variants, we see that the signed version (85.79) is only slightly better than the unsigned variant (85.33) whereas “local” TRANSWESD, which takes only paths of length 2 into account, results in a further significant improvement of the score (88.57). In line with these observations, unsigned and signed LTR differ only marginally whereas signed and weighted LTR^{S,W} produces the best overall results, not far from the unweighted variant. Recall that TRANSWESD^{s,w,2} and LTR^{S,W} differ essentially only by condition (2) vs. (3). Although the results of both local variants are comparable, it seems that the rule used by LTR can better predict true indirect effects. Generally, Table 1 shows that all TR techniques work well by strongly decreasing the number of false positive edges (FPs) with only a slight decrease in number of true positives (TPs). Apparently, the best ratio is obtained by local TR variants (i.e., by LTR and TRANSWESD^{s,w,2}). We also noticed that the (average) enrichment of TPs under the first 100 reconstructed edges in the sorted edge list (column TP^{100} in Table 1) is especially large confirming the potential of our methods: it reaches 66.6 for PG^{new}, 72.8 for TRANSWESD^{s,w,2} and even 75.8 for LTR^{S,W}. Hence, there is a high probability that topranked edges correspond to true interactions – a desirable property when validating the edges experimentally.
Figure 3 demonstrates that TRANSWESD (left) and LTR (right) are also fairly robust with respect to the confidence factor, as the score of the perturbation graph PG^{new} is highly improved by both methods for a broad range of meaningful values of α. In this context it is also of interest that unweighted LTR yielded very good predictions even without the need to specify any further parameter (as necessary for weighted LTR and TRANSWESD).
We summarize that the bestranked algorithms of the DREAM4 competition are significantly outperformed by our new methods for PG generation, TR, and edge sorting. We noticed that also other recently published network inference techniques applied to the DREAM4 networks [36, 45] reported lower predictions. For example, the highest overall score in [45] is 81.10 obtained by using both knockout and knockdown datasets whereas a score of 73.33 was achieved in [36] by CUTTERW, an approach similar to unsigned (and weighted) TRANSWESD. Moreover, if we also include knockdown data in our analysis (for calculating the conditional correlation coefficients), the scores in Table 1 grow up by approximately 3–4 points each, reaching a top of 92.03 with PG^{new} + LTR^{S,W}. As expected, this confirms that an increase of the number of measurements corresponds to an improvement of the prediction. However, most of the information is already provided by the knockout experiments.
Performance on SysGenSIM datasets
In order to provide an even more exhausting and more realistic test scenario for the developed inference algorithms, the software SysGenSIM [41] was used to create a new collection of synthetic gene networks and to simulate knockout experiments under different connectivities and noise conditions. SysGenSIM is able to generate large networks with a topology similar to those observed in real organisms, i.e., with a modular structure featuring exponential and powerlaw behavior for the in and outdegree distributions of nodes [46]. The generated 30 in silico networks have a considerable (close to genomescale) size of 5000 nodes each. One third of them has a low average degree (about 7500 edges, i.e. K ≃ 1.5), 10 networks have an intermediate average degree (about 10000 edges, i.e. K ≃ 2), while the last third of the networks exhibits the largest average degree (about 12500 edges, i.e. K ≃ 2.5). Finally, using equations of biochemical kinetics where the degradation rate of gene expression is represented by a firstorder process and where the transcription rate exhibits essential features of cooperativity and saturation [28], single knockout experiments have been simulated for all the genes of each network with SysGenSIM’s default kinetic parameters under 9 different combinations of noise conditions (for technical details see [41]). In fact, SysGenSIM allows for the selection of the standard deviation σ_{ θ } of the Gaussian distribution from which the biological synthesis and degradation variances are sampled (parameters θ^{syn} and θ^{deg} in Equation (1) in [41]) as well as the standard deviation σ_{ ν } of the Gaussian distribution from which the experimental noise ν is sampled. As possible values for both standard deviations we considered {0.025,0.05,0.1}, yielding a total of 9 combinations summarized in Table 2. Therefore a grand total of 270 different networks (30 topologies with 9 different noise configurations) with simulations of singleknockout experiments have been produced, the goal being the testing of the inference methodologies under different conditions of edge density, biological variance, and multiplicative measurement noise.
Due to the superior performance of PG^{new} we present results only for this PG. Figure 4 displays the performance of PG^{new} for the 9 different noise configurations of network 1 (connectivity K ≃ 1.5) in dependency of a wide range of (β,γ) parameters (examples for K ≃ 2 and K ≃ 2.5 are shown in Figures F1 and F2 in Additional file 1). The novel PG generation algorithm exhibits reasonable robustness with respect to both noise and threshold parameters. In fact it works decently with the same β and γ as used for the DREAM4 networks (for the optimal value, γ needs to be slightly raised from 0.00 to 0.05), while the procedures for PG^{1} and PG^{2} would need a more extensive retuning of the parameters to obtain reasonable results (not shown).
The effect of the TR algorithms applied to PG^{new} (Tables 3 and 4, Figure 5) becomes more heterogeneous and differentiated compared to the DREAM4 networks. First of all, we observe that the unweighted versions of LTR decrease in all cases the quality of the perturbation graph PG^{new} whereas weighted LTR and (nonlocal versions of) TRANSWESD improve it – partially significantly – in all scenarios (with one minor exception). This demonstrates that weighted TR can be highly beneficial. However, local TRANSWESD^{s,w,2}, which was comparable with LTR in the DREAM4 networks, achieves similar unfavorable results for these large and noisy networks as unweighted LTR. This confirms again that rule (3) seems to be better suited for local TR than rule (2). Furthermore, the quality of the PG as well as the relative improvement by the (weighted) TR techniques depends substantially on the magnitude of the noise level both with respect to AUPR and in the number of TPs and FPs. An interesting observation can be made regarding the effect of biological variance on the reconstruction quality: it appears that moderately increased (medium) biological noise is advantageous in case of high measurement noise for all K’s (i.e., networks with noise configuration MH perform better than those with LH; see Figure 5 and Table 3 as well as Tables T1 and T2 and Figure F3 in Additional file 1). Thus, higher biological noise may help to uncover true perturbation effects under high uncertainty of measurements.
It can also be noticed that, in general, TRANSWESD^{s,w, ∞} and LTR^{S,W} achieve similar superior AUPR performance, but by different means as manifested in Table 4: the LTR technique prunes the edges of the PG more generously than TRANSWESD^{s,w,∞}, resulting in a better reduction of false positive edges, but at the same time in an undesired higher decrease of true positives. We can also confirm a result from the DREAM4 benchmark: signed (weighted) LTR and TRANSWESD achieved always better AUPR scores than their unsigned versions (except in one case), but only to a very small extent. This important observation is discussed in more detail in the Conclusion section.
Finally, Figure F3 in Additional file 1 shows how the precision of PG^{new} and the effectiveness of TR decrease when the network connectivity (average node degree K) increases. Moreover, the superiority of weighted vs. unweighted TR can again clearly be seen.
Application to a realistic yeast knockout dataset
The ultimate test for our reverseengineering algorithm would be the application to a genomescale realworld dataset of singlegene perturbation experiments (e.g., through single gene knockouts). Only few such datasets are available. The most suitable for our purpose is the S. cerevisiae transcription factor knockout expression compendium of Hu et al. [34] where the expression of n = 6253 genes was measured after single knockouts (or knockdowns) of m = 269 transcription factors (TFs) being the most important regulators in yeast. Herein we refer to the revised dataset provided by Reimand et al. [35] where the original raw data of Hu et al. were reanalyzed with more sophisticated statistical techniques of the BioConductor package [47] leading to an increased informative content of the microarray measurements.
The processed data of Reimand et al. (http://www.ebi.ac.uk/arrayexpress/experiments/EMTAB109) consists of three matrices of size m × n:

L contains the logfold change values for all genes across all knockout experiments.

P includes the pvalues for differential expression.

A is the signed adjacency matrix of the graph reconstructed by Reimand et al. The entries A(i,j) correspond to (inferred) edges with a pvalue of P(i,j) < 0.05.
Our goal was to reprocess the logfold change values in order to apply our network inference algorithm and to produce a ranked list of edges. For comparing our reconstructed network with the predicted network of Reimand et al. we need a gold standard. However, as a reliable gold standard for gene regulation in S. cerevisiae is still not available (otherwise we would not need our inference methods), we made use of four published “silver standard” networks containing experimentally validated interactions (between TFs or from TFs to other genes):

1.
SS _{1}: is a collection of found chipchip results and motifs from 162 TFs [48] (size of silver standard network: 162 TFs × 6253 genes).

2.
SS _{2}: is a subset of the binding sites from S S _{1} which are also in nucleosomedepleted regions [48] (size of silver standard network: 159 TFs × 6253 genes).

3.
SS _{3}: this silver standard network contains known regulatory interactions between 142 TFs and 3459 targets compiled from the results of genetic, biochemical and ChIP (chromatin immunoprecipitation)chip experiments [49] (size of silver standard network: 142 TFs × 3459 genes).

4.
S S _{4}: contains interactions between 114 TFs and 5667 targets. This network was used as a reference network for a subchallenge of the DREAM5 competition [6] (size of silver standard network: 114 TFs × 5667 genes).
In order to obtain a gene expression matrix G exploitable by our inference algorithm, we “inverted” the logfold change values to obtain G(i,j) = 2^{L(i,j)} for all the possible edges. In this way, expression values larger than 1 represent an increase of the gene expression of j after the knockout of i (i.e., i is a inhibitor of j), and vice versa a positive regulation for values smaller than 1.
As for the synthetic datasets, the gene expression matrix G served then as input to produce the perturbation graph PG^{new} and the weight matrices W^{r} and W^{t}. The transitive reduction techniques TRANSWESD and LTR were then applied to PG^{new} and the resulting edges for each method were sorted according to our ranking scheme. This sorted edge list was delivered as output (prediction) of our procedure. We performed the whole inference process by employing the same parameters used for the simulated networks, i.e. β = 2.00, γ = 0.05, α = 0.95 for TRANSWESD, α = 1.50 for local TRANSWESD, and α = 0.15 for LTR. Note that m = 269 TFs were perturbed, hence, we can only infer edges that will start in a TF node and point to TF or nonTF genes.
A (predicted) confidencesorted edge list was also obtained for the original dataset of Reimand et al. by resorting the absolute values of the logfold changes in L according to the adjacency matrix A, which is then used as a reference to assess the performance of Reimand’s reconstructed network.
Afterwards we evaluated all predictions against the 4 silver standards. To allow for a fair scoring, only common nodes from silver standard networks and prediction lists were taken into consideration, i.e. if edge (i,j) is in the prediction list but node i or j is not contained in the silver standard, then the edge is not scored. On the other hand, if node k belongs to the silver standard but was not included in the microarray dataset, then all ingoing and outgoing edges of k were removed from the silver standard. Accordingly, the size of the silver standard SS_{3} reduced from 142 × 3459 to 122 × 3444 and of SS_{4} from 114 × 5667 to 108 × 5469.
For each of the four silver standards, Figure 6 shows the AUPR and the number of true positive edges (TP) computed for an increasing number of edges selected from top of the ordered edge lists as given by (1) Reimand’s predictions, (2) the perturbation graph PG^{new}, (3) TRANSWESD^{s,w,∞}, (4) LTR^{u,u}, and (5) LTR^{S,W}. Generally, the results of our methods (2)(5) with respect to the four different silver standard networks appear to be satisfactory, though with different measure for the four silver standard networks. It is apparent that our methods work especially well within the 100 – 200 topranked edges where all of the inferred networks (2)(5) show better agreement with the silver standards than the interactions found by Reimand. The PG itself performs again reasonably well and better than Reimand’s network in this region. All variants of TR show positive effects but not among the topranked edges because these are immune against pruning (accordingly, for these edges, the results for PG^{new} and for the TR methods are identical). We observe that unweighted and unsigned LTR^{u,u} performed best for this dataset. However, one should keep in mind that no tuning or adaptation of the parameters has been performed which could have prevented a better result for the weighted versions.
When increasing the number of considered edges to 500, it appears that Reimand’s network becomes better and better eventually, in some cases, getting higher overall agreement with the silver standards than our methods. However, we argue that for practical applications (e.g. validation of edge candidates) the first ≈ 100 edges are the most important ones. In this region, given the silver standards, our approach seems to work most efficient yielding high statistical significance: for the four silver standard networks we obtained [42, 27, 31, 20] TPs in the network reconstructed with LTR^{u,u} yielding corresponding pvalues of [6.31 · 10^{−46}, 6.68 · 10^{−28}, 3.93 · 10^{−33}, 6.79 · 10^{−25}], based on the hypergeometric distribution. These values are very similar for the other four PG/TRbased methods. It is most likely that the number of TPs is even larger given the high probability that not all interactions might be contained in the silver standards. Our prediction list might thus provide useful targets for validations. A sorted list of the 300 (identified) edges with highest confidence and a comparison with the four silver standards can be found in Table T4 in the Additional file 2.
Conclusion
We presented novel algorithms for the inference of gene regulatory networks from systematic perturbation experiments. These algorithms support the reconstruction of regulatory networks via three steps: (i) PG generation, (ii) TR to remove edges representing indirect effects in the PG, and (iii) sorting of edge candidates. We presented new variants for all of these three steps whose combined use yielded superior results over previous methods when tested with standardized benchmark scenarios.
Regarding the PG, it proved advantageous to identify, weight and sort candidate edges by a mixture of two measures, one being the standard zscore of deviations, the other one the zscore of conditional correlation coefficients. In particular, the latter was highly informative for edge pruning by TR whereas a combined weight of both zscores proved beneficial for edge sorting. With the new candidate edge selection and edge sorting schemes, we observed that the PG alone (without TR) achieved a reconstruction quality that is far above the results of previous methods after TR. Importantly, the quality of the PG appeared to be robust against larger variations in the two required threshold parameters. In this regard, one aspect for future work is to develop algorithms for automatic thresholding, that is to estimate the threshold parameters from the data.
We proposed new variants of TR and, based on unbiased in silico benchmarks, compared them with the original versions of the algorithms. Several key observations could be made:

1.
The DRFFL method [30] was inferior to all other TR methods tested which led us to the conclusion that TR should be employed not only between but also within cyclic structures. The winning performance of the original DRFFL in the DREAM challenge can mainly be attributed to its PG which is in parts similar to the one used by PG^{new}.

2.
We found that explicitly accounting for edge signs almost always improves the results in terms of AUPR but only to a very minor extent. While this is in agreement with the observations made in [36], we give here an extended explanation for this unexpected result. Generally, neglecting the edge sign can only be “harmful” during TR, if the true network contains a negative feedforward loop (FFL). As an example, Figure 7 (left) shows a hypothetical interaction graph containing one such negative FFL between node A and E (consisting of a positive path and a negative edge from A to E). If we now assume that all nodes are perturbed in single perturbation experiments we would get, in the ideal case, a PG as shown in Figure 7 (right; weights not considered here). This PG corresponds to the transitive closure of the original graph, in which each node i induces a significant effect on another node j if there is an edge or path from i to j in the true graph. What we can now see is that each edge contained in the PG but not in the true graph (reflecting thus an indirect effect) is part of a positive FFL consisting of this edge and a path of the same sign. This happens because all these edges will have the same sign as the path they were induced from. Hence, if we compute the TR within the unsigned version of this PG (e.g., by neglecting the signs in the TR step or by setting all signs to “+”) all edges that stem from indirect effects and span one branch of the induced FFLs would still correctly be removed.
Regarding the original negative FFL included in the true graph (Figure 7, left), we cannot be sure which sign it will get in the PG as there is a positive path as well as a negative edge from A to E and, hence, the direction of change in E when perturbing A cannot be predicted uniquely. Only if the overall effect of A on E measured during perturbation of A is negative (i.e., the true edge of A to E is dominating over the positive path from A to E), it may happen that it will be falsely removed during TR if edge signs are neglected. However, when using edge weights for (weighted) TR, it is rather unlikely that the path from A to E fulfills the rule (2) or, for a 2path used by LTR, rule (3) since the measured overall effect from A to E turned out to be negative, hence, the path seems to have a low potential to transduce an effect from A to E. Thus, it is unlikely that the true edge A ⊣ E would be falsely removed.
To summarize this aspect, there is a low probability that (weighted) TR removes a true edge within a negative feedforward loop and neglecting edge signs in TR will therefore have only minor impact on the reconstruction quality. This has important consequences since then the computationally expensive search for the shortest signconsistent paths (an NPcomplete problem) can be safely turned into a simple search for a shortest (unsigned) path connecting a given pair of nodes (a polynomial problem). Thus, when applying TRANSWESD to the 5000nodes networks, we may then use an exact (path_exact=1) instead of an approximate (path_exact=0) subalgorithm for computing shortest signed path. In contrast, full_check=0 is still required for TRANSWESD in large networks.

3.
With LTR and TRANSWESD^{s,w,2} we considered local variants of TR removing an edge only if there is an explaining path of length 2. Whereas TRANSWESD^{s,w,2} performed well for the DREAM challenge but unfavorable for the SysGenSIM data, weighted LTR yielded superior performance in almost all benchmark tests and only (signed or unsigned) TRANSWESD applied to all paths could deliver comparable results. We can thus first conclude that using the multiplicative rule (3) is better suited than the max rule (2) when focusing on short paths. However, it remains still paradoxical why TR restricted to paths of length 2 should be sufficient. This can once more be illustrated by Figure 7. If we again assume that the true graph induces a complete PG (i.e., the transitive closure of the true graph as shown on the righthand side of Figure 7) then we can indeed recognize that there is always a 2path that can, in principle, explain an edge from an indirect effect (e.g., edge A → E is explained by the 2path A → B → E). Hence, in principle, all false positive edges could be identified and removed explaining why LTR exhibits good behavior. However, one has to keep in mind that 2paths may contain edges that are themselves indirect effects (as B → E in the example above), hence, the order of edge removal might then become crucial. Here, the strategy to cut lowestconfidenceedges first worked apparently well in the benchmarks.
Again, showing that local TR based on 2paths does not lead to lower performance has important consequences, as we can then restrict the search on simple triangles whose detection is computationally easier than detecting paths of arbitrary length. In fact, unsigned (signed) LTR required in the average only 8 (9) seconds in networks with 5000 nodes whereas TRANSWESD (in approximation mode!) needed 150 (260) seconds.

4.
The SysGenSIM benchmark showed that edge weights really matter to obtain good results with LTR. Since (signed or unsigned) local LTR and unsigned TRANSWESD are computationally feasible in 5000nodes networks and as they achieved superior results in all benchmarks (outperforming the winning methods of the DREAM4 challenge by far) these techniques appear to be wellsuited for the reconstruction of largescale regulatory networks based on systematic perturbation experiments.

5.
Applied to a realistic application scenario with gene expression data from yeast mutants with single knockouts of transcription factors we could demonstrate that our approach delivers a high enrichment of known interactions especially within the topranked edge candidates. With this property, our method holds great potential to identify true unknown gene interactions that can subsequently be validated in experiments.
A potential weakness of our PG and TRbased methods is the requirement to perturb each node in the network at least once. At a genomescale level, such datasets are currently only available for a small number of organisms. On the other hand, one might focus on smaller subnetworks where all nodes can be perturbed. Furthermore, if m nodes out of n nodes can be perturbed in a network, we can use the information of the corresponding m perturbation experiments to (i) infer the complete subnetwork containing only the m perturbed nodes and (ii) to infer edges leading from the m perturbed nodes to the n − m unperturbed nodes. In the second of these subnetworks, TR cannot work effectively (no edge will be removed since only single edges and no paths between these nodes exist) meaning that some of the (false positive) edges in the PG reflecting indirect edges cannot be identified as such. However, the provided output might still have its own value and indicate direct or indirect functional relationships. In fact, we employed this approach for the yeast knockout dataset where “only” the TFs were knockedout.
We also emphasize that perturbation graphs (as a requirement for applying TR) could also be constructed by other approaches than systematic knockouts of all genes. One example are genetical genomics data containing gene expressions measurements from naturally occurring multifactorial perturbations (polymorphisms). As an example for using PG and TRbased methods based on genetical genomics data see [50].
We noticed that LTR shares some similarities with ARACNE presented in [11]. ARACNE also eliminates an edge in a feedforward loop consisting of three edges (socalled triplets) if a certain weight condition is fulfilled. However, there are several key differences since ARACNE only operates on undirected and unsigned graphs and uses different weights based on mutual information.
As an important additional results of our study, we have generated new and unprecedented largescale benchmark datasets that, in contrast to comparable simulations, account for different noise levels. We think that these datasets, which can be downloaded from http://sysgensim.sourceforge.net/datasets.html, are generally useful for unbiased testing of network inference methodologies complementing other available in silico benchmarks.
Availability of supporting data
The presented inference algorithms and the 5000gene benchmark (the PulaMagdeburg singlegene knockout benchmark dataset) can be downloaded from http://sysgensim.sourceforge.net/datasets.html.
Abbreviations
 PG:

Perturbation graph
 TR:

Transitive reduction
 LTR:

Local transitive reduction
 DRFFL:

Downranking of feedforward loop
 FFL:

Feedforward loop
 TRANSWESD:

TRANSitive reduction in WEighted Signed Digraphs
 TP:

True positive
 FP:

False positive
 FN:

False negative.
References
 1.
Markowetz F, Spang R: Inferring cellular networks–a review. BMC Bioinformatics. 2007, 8 (Suppl 6): S510.1186/147121058S6S5.
 2.
Gardner T, Faith J: Reverseengineering transcription control networks. Phys Life Rev. 2005, 2: 6588. 10.1016/j.plrev.2005.01.001.
 3.
Bansal M, Belcastro V, AmbesiImpiombato A, di Bernardo D: How to infer gene networks from expression profiles. Mol Syst Biol. 2007, 3: 78
 4.
Schlitt T, Brazma A: Current approaches to gene regulatory network modelling. BMC Bioinformatics. 2007, 8 (Suppl 6): S910.1186/147121058S6S9.
 5.
De Smet R, Marchal K: Advantages and limitations of current network inference methods. Nat Rev Microbiol. 2010, 8 (10): 717729.
 6.
Marbach D, Costello JC, Kuffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Kellis M, Collins JJ, Stolovitzky G, et al: Wisdom of crowds for robust gene network inference. Nat Methods. 2012, 9 (8): 796804. 10.1038/nmeth.2016.
 7.
Tibshirani R: Regression shrinkage and selection via the lasso. J R Stat Soc, Ser B. 1994, 58: 267288.
 8.
de la Fuente A, Bing N, Hoeschele I, Mendes P: Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics. 2004, 20: 35653574. 10.1093/bioinformatics/bth445.
 9.
Prill R, Marbach D, SaezRodriguez J, Sorger P, Alexopoulos L, Xue X, Clarke N, AltanBonnet G, Stolovitzky G: Towards a rigorous assessment of systems biology models: the DREAM3 challenges. PLoS ONE. 2010, 5: e920210.1371/journal.pone.0009202.
 10.
Watkinson J, Lian K, Wang X, Zheng T, Anastassiou D: Inference of regulatory gene interactions from expression data using threeway mutual information. Ann New York Acad Sci. 2009, 1158: 302313. 10.1111/j.17496632.2008.03757.x.
 11.
Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera RD, Califano A: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006, 7 (Suppl 1): S710.1186/147121057S1S7.
 12.
Kueffner R, Petri T, Tavakkolkhah P, Windhager L, Zimmer R: Inferring Gene Regulatory Networks by ANOVA. Bioinformatics. 2012, 28: 13761382. 10.1093/bioinformatics/bts143.
 13.
Friedman N, Linial M, Nachman I, Pe’er D: Using Bayesian networks to analyze expression data. J Comput Biol. 2000, 7: 601620. 10.1089/106652700750050961.
 14.
Chiquet J, Grandvalet Y, Ambroise C: Inferring multiple graphical structures. Stat Comput. 2011, 21 (4): 537553. 10.1007/s1122201091912.
 15.
HuynhThu V, Irrthum A, Wehenkel L, Geurts P: Inferring regulatory networks from expression data using treebased methods. PLoS ONE. 2010, 5 (9): e1277610.1371/journal.pone.0012776.
 16.
Nelander S, Wang W, Nilsson B, She QB, Pratilas C, Rosen N, Gennemark P, Sander C: Models from experiments: combinatorial drug perturbations of cancer cells. Mol Syst Biol. 2008, 4: 216
 17.
Durzinsky M, Wagler A, Weismantel R, Marwan W: Automatic reconstruction of molecular and genetic networks from discrete time series data. BioSystems. 2008, 93 (3): 18190. 10.1016/j.biosystems.2008.04.001.
 18.
Akutsu T, Kuhara S, Maruyama O, Miyano S: Identification of genetic networks by strategic gene disruptions and gene overexpressions under a boolean model. Theor Comput Sci. 2003, 298: 235251. 10.1016/S03043975(02)004255.
 19.
Rodriguez JS, Alexopoulos L, Epperlein J, Samaga R, Lauffenburger D, Klamt S, Sorger P: Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction. Mol Syst Biol. 2009, 5: 331
 20.
The DREAM Project. http://www.thedreamproject.org,
 21.
Marbach D, Prill RJ, Schaffter T, Mattiussi C, Floreano D, Stolovitzky G: Revealing strengths and weaknesses of methods for gene network inference. Proc Natl Acad Sci. 2010, 107 (14): 62866291. 10.1073/pnas.0913357107.
 22.
Meyer P, Alexopoulos L, Bonk T, Califano A, Cho C, de la Fuente, de Graaf D, Hartemink A, Hoeng J, Ivanov N, Koeppl H, Linding R, Marbach D, Norel R, Peitsch M, Rice J, Royyuru A, Schacherer F, Sprengel J, Stolle K, Vitkup D, Stolovitzky G: Verification of systems biology research in the age of collaborative competition. Nat Biotechnol. 2011, 29: 811815. 10.1038/nbt.1968.
 23.
Stumpf MP, Thorne T, de Silva E, Stewart R, An HJ, Lappe M, Wiuf C: Estimating the size of the human interactome. Proc Natl Acad Sci. 2008, 105 (19): 69596964. 10.1073/pnas.0708078105.
 24.
de la Fuente A, Brazhnik P, Mendes P: Linking the genes: inferring quantitative gene networks from microarray data. Trends Genet. 2002, 18 (8): 395398. 10.1016/S01689525(02)026926.
 25.
Kholodenko BN, Kiyatkin A, Bruggeman FJ, Sontag E, Westerhoff HV, Hoek JB: Untangling the wires: a strategy to trace functional interactions in signaling and gene networks. Proc Natl Acad Sci. 1284, 99 (20): 112846.
 26.
van Someren EP, Wessels LF, Reinders MJ: Linear modeling of genetic networks from experimental data. Proc Int Conf Intell Syst Mol Biol. 2000, 8: 355366.
 27.
Smith VA, Jarvis ED, Hartemink AJ: Evaluating functional network inference using simulations of complex biological systems. Bioinformatics. 2002, 18 (Suppl 1): S216S224. 10.1093/bioinformatics/18.suppl_1.S216.
 28.
Mendes P, Sha W, Ye K: Artificial gene networks for objective comparison of analysis algorithms. Bioinformatics. 2003, 19 (Suppl 2): ii122ii129. 10.1093/bioinformatics/btg1069.
 29.
Soranzo N, Bianconi G, Altafini C: Comparing association network algorithms for reverse engineering of largescale gene regulatory networks: synthetic versus real data. Bioinformatics. 2007, 23 (13): 16401647. 10.1093/bioinformatics/btm163.
 30.
Pinna A, Soranzo N, de la Fuente: From knockouts to networks: establishing direct causeeffect relationships through graph analysis. PLoS ONE. 2010, 5 (10): e1291210.1371/journal.pone.0012912.
 31.
Klamt S, Flassig RJ, Sundmacher K: TRANSWESD: inferring cellular networks with transitive reduction. Bioinformatics. 2010, 26 (17): 21602168. 10.1093/bioinformatics/btq342.
 32.
Winzeler EA, Shoemaker DD, Astromoff A, Liang H, Anderson K, Andre B, Bangham R, Benito R, Boeke JD, Bussey H, Chu AM, Connelly C, Davis K, Dietrich F, Dow SW, El Bakkoury M, Foury F, Friend SH, Gentalen E, Giaever G, Hegemann JH, Jones T, Laub M, Liao H, Liebundguth N, Lockhart DJ, LucauDanila A, Lussier M, M’Rabet N, Menard P, Mittmann M, Pai C, Rebischung C, Revuelta JL, Riles L, Roberts CJ, RossMacDonald P, Scherens B, Snyder M, SookhaiMahadeo S, Storms RK, Veronneau S, Voet M, Volckaert G, Ward TR, Wysocki R, Yen GS, Yu K, Zimmermann K, Philippsen P, Johnston M, Davis RW: Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science. 1999, 285 (5429): 901906. 10.1126/science.285.5429.901.
 33.
Dolgin E: Mouse library set to be knockout. Nature. 2011, 474 (7351): 262263. 10.1038/474262a.
 34.
Hu Z, Killion PJ, Iyer VR: Genetic reconstruction of a functional transcriptional regulatory network. Nat Genet. 2007, 39 (5): 683687. 10.1038/ng2012.
 35.
Reimand J, Vaquerizas JM, Todd AE, Vilo J, Luscombe NM: Comprehensive reanalysis of transcription factor knockout expression data in Saccharomyces cerevisiae reveals many new targets. Nucleic Acid Res. 2010, 38 (14): 47684777. 10.1093/nar/gkq232.
 36.
Bošnački D, Odenbrett MR, Wijs A, Ligtenberg W, Hilbers P: Efficient reconstruction of biological networks via transitive reduction on general purpose graphics processors. BMC Bioinformatics. 2012, 13: 28110.1186/1471210513281.
 37.
Tresch A, Beissbarth T, Sültmann H, Kuner R, Poustka A, Buness A: Discrimination of direct and indirect interactions in a network of regulatory effects. J Comput Biol. 2007, 14 (9): 12171228. 10.1089/cmb.2007.0085.
 38.
Wagner A: How to reconstruct a large genetic network from n gene perturbations in fewer than n2 easy steps. Bioinformatics. 2001, 17 (12): 11831197. 10.1093/bioinformatics/17.12.1183.
 39.
Kachalo S, Zhang R, Sontag E, Albert R, DasGupta B: NETSYNTHESIS: a software for synthesis, inference and simplification of signal transduction networks. Bioinformatics. 2008, 24 (2): 293295. 10.1093/bioinformatics/btm571.
 40.
Klamt S, Kamp AV: Computing paths and cycles in biological interaction graphs. BMC Bioinformatics. 2009, 10: 18110.1186/1471210510181.
 41.
Pinna A, Soranzo N, Hoeschele I, de la Fuente: Simulating systems genetics data with SysGenSIM. Bioinformatics. 2011, 27 (17): 24592462. 10.1093/bioinformatics/btr407.
 42.
DREAM4 In Silico Network Challenge. http://wiki.c2b2.columbia.edu/dream/index.php/D4c2,
 43.
Evaluation script for the DREAM4 In Silico Network Challenge. http://wiki.c2b2.columbia.edu/dream/results/DREAM4,
 44.
Schaffter T, Marbach D, Floreano D: GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics. 2011, 27 (16): 22632270. 10.1093/bioinformatics/btr373.
 45.
Wang Y, Zhou T: A relative variationbased method to unraveling gene regulatory networks. PLoS ONE. 2012, 7 (2): e3119410.1371/journal.pone.0031194.
 46.
Guelzim N, Bottani S, Bourgine P, Kepes F: Topological and causal structure of the yeast transcriptional regulatory network. Nat Genet. 2002, 31: 6063. 10.1038/ng873.
 47.
Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5 (10): R8010.1186/gb2004510r80.
 48.
Kaplan N, Moore IK, FondufeMittendorf Y, Gossett AJ, Tillo D, Field Y, LeProust EM, Hughes TR, Lieb JD, Widom J, et al: The DNAencoded nucleosome organization of a eukaryotic genome. Nature. 2008, 458 (7236): 362366.
 49.
Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA, Gerstein M: Genomic analysis of regulatory network dynamics reveals large topological changes. Nature. 2004, 431 (7006): 308312. 10.1038/nature02782.
 50.
Flassig RJ, Heise S, Sundmacher K, Klamt S: An effective framework for reconstructing gene regulatory networks from genetical genomics data. Bioinformatics. 2013, 29 (2): 246254. 10.1093/bioinformatics/bts679.
Acknowledgements
SK and SH acknowledge funding by the German Federal Ministry of Education and Research (e:Bio  0316167B) and by the Ministry of Education and Research of SaxonyAnhalt (Research Center “Dynamic Systems: Biosystems Engineering”). AP and ALF have been partially supported by the Sardinian Regional Authority (RAS).
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
SK and ALF designed the study. AP generated the new datasets. AP and SK implemented the new algorithms. AP, RJF, SH and SK analyzed the data. All authors were involved in the analysis of the benchmark results and in drafting and proofreading the manuscript. All authors read and approved the final manuscript.
Electronic supplementary material
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
About this article
Received
Accepted
Published
DOI
Keywords
 Gene network inference
 Reverse engineering
 Perturbation experiments
 Causal networks
 Graph theory
 Interaction graphs
 Transitive reduction
 Transcriptional regulation
 Saccharomyces cerevisiae
 Yeast