TIGRESS: Trustful Inference of Gene REgulation using Stability Selection
 Anne-Claire Haury^{1, 2, 3},
 Fantine Mordelet^{4},
 Paola Vera-Licona^{1, 2, 3} and
 Jean-Philippe Vert^{1, 2, 3}
DOI: 10.1186/1752-0509-6-145
© Haury et al; licensee BioMed Central Ltd. 2012
Received: 22 April 2012
Accepted: 18 October 2012
Published: 22 November 2012
Abstract
Background
Inferring the structure of gene regulatory networks (GRN) from a collection of gene expression data has many potential applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is however a notoriously difficult problem, for which the many existing methods reach limited accuracy.
Results
In this paper, we formulate GRN inference as a sparse regression problem and investigate the performance of a popular feature selection method, least angle regression (LARS) combined with stability selection, for that purpose. We introduce a novel, robust and accurate scoring technique for stability selection, which improves the performance of feature selection with LARS. The resulting method, which we call TIGRESS (for Trustful Inference of Gene REgulation with Stability Selection), was ranked among the top GRN inference methods in the DREAM5 gene network inference challenge. In particular, TIGRESS was evaluated to be the best linear regression-based method in the challenge. We investigate in depth the influence of the various parameters of the method, and show that fine parameter tuning can lead to significant improvements and state-of-the-art performance for GRN inference, in both directed and undirected settings.
Conclusions
TIGRESS reaches state-of-the-art performance on benchmark data, including both in silico and in vivo (E. coli and S. cerevisiae) networks. This study confirms the potential of feature selection techniques for GRN inference. Code and data are available on http://cbio.ensmp.fr/tigress. Moreover, TIGRESS can be run online through the GenePattern platform (GPDREAM, http://dream.broadinstitute.org).
Keywords
Gene Regulatory Network inference, Feature selection, Gene expression data, LARS, Stability selection
Background
In order to meet their needs and adapt to changing environments, cells have developed various mechanisms to regulate the production of the thousands of proteins they can synthesize. Among them, the regulation of gene expression by transcription factors (TF) is preponderant: by binding to the promoter regions of their target genes (TG), TF can activate or inhibit their expression. Deciphering and understanding TF-TG interactions has many potential far-reaching applications in biology and medicine, ranging from the in silico modeling and simulation of the gene regulatory network (GRN) to the identification of new potential drug targets. However, while many TF-TG interactions have been experimentally characterized in model organisms, the systematic experimental characterization of the full GRN remains a daunting task due to the large number of potential regulations.
The development of high-throughput methods, in particular DNA microarrays, to monitor gene expression on a genome-wide scale has promoted new strategies to elucidate GRN. By systematically assessing how gene expression varies in different experimental conditions, one can try to reverse engineer the TF-TG interactions responsible for the observed variations. Not surprisingly, many different approaches have been proposed in the last decade to solve this GRN reverse engineering problem from collections of gene expression data. When expression data are collected over time, for example, several methods have been proposed to construct dynamic models where TF-TG interactions dictate how the expression level of a TF at a given time helps predict the expression levels of its TG at subsequent times [1–11]. When expression data are not limited to time series, many methods attempt to capture statistical association between the expression levels of TG and candidate TF using correlation or information-theoretic measures such as mutual information [12–14], or take explicit advantage of perturbations in the experiments such as gene knockdowns [15]. The difficulty of separating direct from indirect regulations has been addressed with the formalism of Bayesian networks [16–19], or by formulating the GRN inference problem as a feature selection problem [20]. The mutual information-based method ARACNE [13] was also designed to eliminate redundant edges. We refer to [21, 22] for detailed reviews and comparisons of existing methods.
Recent benchmarks and challenges have highlighted the good performance of methods which formalize the GRN inference problem as a regression and feature selection problem, namely, identifying a small set of TF whose expression levels are sufficient to predict the expression level of each TG of interest. The general idea that edges in a directed graph can be discovered node by node was addressed in, e.g., [23]. Regarding the GRN inference application, this idea underlies the Bayesian network formalism [16], but is more directly addressed by GENIE3 [20], a method which uses random forests to identify TF whose expression levels are predictive for the expression level of each TG, and which is now recognized as state-of-the-art on several benchmarks [20, 22]. Feature selection with random forests remains however poorly understood theoretically, and one may wonder how other well-established statistical and machine learning techniques for feature selection would behave to solve the GRN inference problem.
In this paper, we investigate the performance of a popular feature selection method, least angle regression (LARS) [24] combined with stability selection [25, 26], for GRN inference. LARS is a computationally efficient procedure for multivariate feature selection, closely related to Lasso regression [27]. Stability selection consists of running LARS or Lasso many times, resampling the samples and the variables at each run, and computing the frequency with which each variable was selected across the runs. We introduce a novel, robust and accurate scoring technique for stability selection, which improves the performance of feature selection with LARS. The resulting method, which we call TIGRESS (for Trustful Inference of Gene REgulation with Stability Selection), was ranked among the top GRN inference methods in the DREAM5 gene reconstruction challenge and was evaluated to be the best linear regression-based method [28]. We furthermore investigate in depth the influence of the various parameters of the method, and show that fine parameter tuning can lead to significant improvements and state-of-the-art performance for GRN inference. Finally, we show that TIGRESS performs well when TFs are not known in advance, i.e. it can predict edge directionality. Overall this study confirms the potential of state-of-the-art feature selection techniques for GRN inference.
Methods
Problem formulation
We consider a set of p genes $\mathcal{G}=[1,p]$, including a subset $\mathcal{T}\subset [1,p]$ of transcription factors, among which we wish to discover direct interactions of the form (t, g) for $t\in \mathcal{T}$ and $g\in \mathcal{G}$. We do not try to infer self-regulation, meaning that for each target gene $g\in \mathcal{G}$ we define the set of possible regulators as ${\mathcal{T}}_{g}=\mathcal{T}\setminus \left\{g\right\}$ if $g\in \mathcal{T}$ is itself a transcription factor, and ${\mathcal{T}}_{g}=\mathcal{T}$ otherwise. The set of all candidate regulations is therefore $\mathcal{E}=\left\{(t,g),g\in \mathcal{G},t\in {\mathcal{T}}_{g}\right\}$, and the GRN inference problem is to identify a subset of true regulations among $\mathcal{E}$.
For that purpose, we assume we have gene expression measurements for all genes $\mathcal{G}$ in n experimental conditions. Although the nature of the experiments may vary and typically include knockdown or knockout experiments and even replicates, for simplicity we do not exploit this information and only consider the n×p data matrix of expression levels X as input for the GRN inference problem. Each row of X corresponds to an experiment and each column to a gene. We assume that the expression data have been preprocessed for quality control and missing values imputation.
In order to infer the regulatory network from the expression data X, we compute a score $s:\mathcal{E}\to \mathbb{R}$ to assess the evidence that each candidate regulation is true, and then predict as true regulations the pairs $(t,g)\in \mathcal{E}$ for which the evidence s(t, g) is larger than a threshold δ. We leave δ as a user-controlled parameter, where larger values of δ correspond to fewer predicted regulations, and only focus on designing a significance score s(t, g) that leads to “good” predictions for some values of δ. In other words, we only focus on finding a good ranking of the candidate regulations $\mathcal{E}$, by decreasing score, such that true regulations tend to be at the top of the list; we let users control the level of false positive and false negative predictions they can accept. Note that such a ranking is the standard prediction format of the DREAM challenge.
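The candidate set and the score-then-threshold step can be illustrated with a minimal Python sketch. The function names and the toy scores are hypothetical and serve only to make the definitions concrete; they are not taken from the TIGRESS code.

```python
def candidate_regulations(genes, tfs):
    """All pairs (t, g) with t a TF and g a gene, excluding self-regulation (t == g)."""
    return [(t, g) for g in genes for t in tfs if t != g]


def rank_regulations(scores):
    """Candidate regulations sorted by decreasing score s(t, g)."""
    return sorted(scores, key=scores.get, reverse=True)


def predict(scores, delta):
    """Regulations whose score is strictly larger than the threshold delta."""
    return [edge for edge in rank_regulations(scores) if scores[edge] > delta]


# Toy example: three genes, two of which are TFs (scores are made up).
genes = ["g1", "g2", "g3"]
tfs = ["g1", "g2"]
edges = candidate_regulations(genes, tfs)
scores = dict(zip(edges, [0.1, 0.9, 0.5, 0.3]))
print(predict(scores, delta=0.4))
```

Raising `delta` shortens the predicted list without changing the order of the ranking, which is why only the ranking itself needs to be designed.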
GRN inference with feature selection methods
Many popular methods for GRN inference are based on such a score. For example, the correlation or mutual information between the expression levels of t and g along the different experiments is a popular way to score candidate regulations [12–14]. A drawback of such direct approaches is that it is then difficult to separate direct from indirect regulations. For example, if t_{1} regulates t_{2} which itself regulates g, then the correlation or mutual information between t_{1} and g is likely to be large, although (t_{1}, g) is not a direct regulation. Similarly, if t_{1} regulates both t_{2} and g, then t_{2} and g will probably be very correlated, even if there is no direct regulation between them. In order to overcome this problem, a possible strategy is to post-process the predicted regulations and try to remove regulations likely to be indirect because they are already explained by other regulations [13]. Another strategy is, given a target gene $g\in \mathcal{G}$, to jointly estimate the scores s(t, g) for all candidate regulators $t\in {\mathcal{T}}_{g}$ simultaneously, with a method able to capture the fact that a large score for a candidate regulation (t, g) is not needed if the apparent correlation between t and g is already explained by other, more likely regulations.
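Mathematically, the latter strategy amounts to feature selection in regression. For each target gene $g\in \mathcal{G}$, we consider the problem of predicting its expression level X_{ g } from the expression levels of its candidate regulators, $X_{g}=f_{g}\left(X_{{\mathcal{T}}_{g}}\right)+\epsilon$ (1), where $X_{{\mathcal{T}}_{g}}=\left\{X_{t},t\in {\mathcal{T}}_{g}\right\}$ and ε is a noise term. A score s_{ g }(t) can then be attached to each candidate regulator $t\in {\mathcal{T}}_{g}$ to assess how much it contributes to the regression function f_{ g }. For example, if f_{ g } is modeled as a linear function, $f_{g}\left(X_{{\mathcal{T}}_{g}}\right)=\sum_{t\in {\mathcal{T}}_{g}}\beta_{t,g}X_{t}$ (2), then the score s_{ g }(t) should typically assess the probability that β_{t,g} is nonzero [23]. More general models are possible; for example, [20] model f_{ g } with a random forest [29] and score a predictor s_{ g }(t) with a variable importance measure specific to this model. Once a score s_{ g }(t) is chosen to assess the significance of each transcription factor in the target-gene-specific regression model (1), we can combine them across all target genes by defining the score of a candidate regulation $(t,g)\in \mathcal{E}$ as s(t, g) = s_{ g }(t), and rank all candidate regulations by decreasing score for GRN inference.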
Feature selection with LARS and stability selection
We now propose a new scoring function s_{ g }(t) to assess the significance of a transcription factor $t\in {\mathcal{T}}_{g}$ in the regression model (1). Our starting point to define the scoring function is the LARS method for feature selection in regression [24]. LARS models the regression function (1) linearly, i.e. it models the expression of a target gene as a linear combination of the expression of its transcription factors, as in (2). Starting from a constant model where no TF is used, it iteratively adds TF in the model to refine the prediction of X_{ g }. Contrary to classical forward stepwise feature selection [30, 31], LARS does not fully reoptimize the fitted model when a new TF is added to the model, but only refines it partially. This results in a statistically sound procedure for feature selection, akin to forward stagewise linear regression and the Lasso [27, 31], and a very efficient computational procedure. In practice, after L steps of the LARS iteration, we obtain a ranked list of L TF selected for their ability to predict the expression of the target gene of interest. Efficient implementations of LARS exist in various programming languages including R (lars package, [24]) and MATLAB (SPAMS toolbox, [32]). Since the selection of TF is iterative, LARS has the potential to disregard indirect regulations.
The direct use of LARS to score candidate regulations has, however, two shortcomings. First, LARS can be very sensitive and unstable in terms of selected features when there exist high correlations between different explanatory variables. Second, it only provides a ranking of the TF, for each TG of interest, but does not provide a score s_{ g }(t) to quantify the evidence that a TF t regulates a target gene g. Since we want to aggregate the predicted regulations across all target genes to obtain a global ranking of all candidate regulations, we need such a score.
This shows that the scores output by TIGRESS are naturally normalized per target gene, and we therefore do not consider further normalization before aggregating all scores together across target genes.
In other words, both the original and the area scores can be expressed as E[ϕ(H_{ t })], although with a different function ϕ. While the original score only assesses how often a feature ranks in the top L, the area score additionally takes into account the value of the rank: features are rewarded more if they are not only in the top L but also frequently have a small rank among the top L. Since s_{ area } integrates the frequency information over the full LARS path up to L steps, it should be less sensitive than s_{ original } to the precise choice of L, and should allow larger values of L to be investigated without saturation effects when several curves hit the maximal frequency of 1. We note that other scores of the form E[ϕ(H_{ t })] for a non-increasing function ϕ could be investigated as well.
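To make the common form E[ϕ(H_{ t })] concrete, the two scores can be written out as follows. This is a sketch consistent with the description above, not a quotation of equations (3) and (4): the notation F(g,t,l) for the selection frequency, the 1/L normalization of the area score and the explicit forms of ϕ are assumptions introduced here for illustration.

```latex
% Assumed notation: F(g,t,l) is the fraction of randomized runs in which TF t is
% among the first l TF selected for target g; H_t is the rank of t in one
% randomized run (H_t > L if t is not selected within L steps).
\begin{align*}
s_{original}(t) &= F(g,t,L) = E\left[\phi_{original}(H_t)\right],
  & \phi_{original}(h) &= \mathbf{1}(h \le L), \\
s_{area}(t) &= \frac{1}{L}\sum_{l=1}^{L} F(g,t,l) = E\left[\phi_{area}(H_t)\right],
  & \phi_{area}(h) &= \frac{L+1-h}{L}\,\mathbf{1}(h \le L).
\end{align*}
```

As a quick check with L = 2, a TF that is selected first in half of the runs and second in the other half gets an original score of 1 and an area score of (0.5 + 1)/2 = 0.75, which matches E[ϕ_{area}(H_{ t })] = 0.5 × 1 + 0.5 × 1/2.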
Parameters of TIGRESS
In summary, the full procedure for scoring all candidate edges in $\mathcal{E}$, which we call TIGRESS, splits the GRN inference problem into p independent regression problems taking each target gene $g\in \mathcal{G}$ in turn, and scores each candidate regulation (t, g) for a candidate TF $t\in {\mathcal{T}}_{g}$ with the original (3) or area (4) stability score applied to LARS feature selection. In addition to the choice of the scoring method (original or area), the parameters of TIGRESS are (i) the number of runs R performed in stability selection to compute the scores, (ii) the number of LARS steps L, and (iii) the parameter α∈[0,1] which controls the random reweighting of each expression array in each stability selection run. Apart from R, which should be taken as large as possible to ensure that frequencies are correctly estimated and is only limited by the computational time we can afford to run TIGRESS, the influence of α and L on the final performance of the method is not obvious. Taking α=1 means that no weight randomization is performed on the different expression arrays, while α=0 leads to maximal randomization. [26] advocate that a value between 0.2 and 0.8 is often a good choice. Regarding the choice of L, [26] mention that it usually has little influence on the result, but as discussed above, the choice of a good range of values may not be trivial, in particular for the original score. We investigate below in detail how the performance of TIGRESS depends on the scoring method and on these parameters R, α and L.
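The roles of R, L and α can be illustrated for a single target gene with the following Python sketch. It rests on several assumptions that should be read as ours rather than the authors': each run keeps a random half of the expression arrays, each candidate TF column is multiplied by a weight drawn uniformly from [α, 1], and, to keep the example short, a simple greedy forward selector (`greedy_order`, hypothetical) stands in for LARS. All function and parameter names are illustrative.

```python
import numpy as np


def greedy_order(X, y, n_steps):
    """Stand-in for LARS (NOT LARS itself): order in which a greedy forward
    selector with least-squares refitting picks the columns of X."""
    y = y - y.mean()
    residual = y.copy()
    active = []
    for _ in range(min(n_steps, X.shape[1])):
        corr = np.abs(X.T @ residual)
        if active:
            corr[active] = -np.inf  # never pick the same column twice
        active.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        residual = y - X[:, active] @ coef
    return active


def stability_scores(X, y, n_runs=200, n_steps=3, alpha=0.5, seed=0):
    """Return (original, area) stability scores for every column of X."""
    rng = np.random.default_rng(seed)
    n, q = X.shape
    # counts[l - 1, t] = number of runs in which t is among the first l selected
    counts = np.zeros((n_steps, q))
    for _ in range(n_runs):
        rows = rng.choice(n, size=n // 2, replace=False)  # keep half of the arrays
        weights = rng.uniform(alpha, 1.0, size=q)         # random reweighting
        order = greedy_order(X[rows] * weights, y[rows], n_steps)
        for rank, t in enumerate(order):
            counts[rank:, t] += 1
    freq = counts / n_runs
    return freq[-1], freq.mean(axis=0)
```

With α = 1 all weights equal 1, so the only randomness left is the subsampling of arrays. In a full run, this procedure would be repeated for every target gene and the resulting scores pooled into one global ranking.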
Performance evaluation
We experimentally compare TIGRESS to several other GRN inference methods. We use the MATLAB implementations of CLR [14] and GENIE3 [20]. We run ARACNE [13] using the R package minet. We keep default parameter values for each of these methods. Results borrowed from the DREAM5 challenge [28] were directly obtained by each participating team.
Given a gene expression data matrix, each GRN inference method outputs a ranked list of putative regulatory interactions. Taking only the top K predictions in this list, we can compare them to known regulations to assess the number of true positives (TP, the number of known regulations in the top K predictions), false positives (FP, the number of predicted regulations in the top K which are not known regulations), false negatives (FN, the number of known interactions which are not in the top K predictions) and true negatives (TN, the number of pairs not in the top K predictions which are not known regulations). We then compute classical statistics to summarize these numbers for a given K, including precision (TP/(TP + FP)), recall (TP/(TP + FN)), and fallout (FP/(FP + TN)). We assess globally how these statistics vary with K by plotting the receiver operating characteristic (ROC) curve (recall as a function of fallout) and the precision-recall curve (precision as a function of recall), and computing the area under these curves (respectively AUROC and AUPR) normalized between 0 and 1.
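The statistics above can be computed directly from a ranked list, as in the following minimal Python sketch. The function names are hypothetical, the areas use the trapezoidal rule, and the precision-recall curve is assumed to start at recall 0 with the precision of the first prediction; other conventions exist and may give slightly different AUPR values. All known regulations are assumed to belong to the candidate set.

```python
def rates_along_ranking(ranked, known, n_candidates):
    """(precision, recall, fallout) after each of the top K = 1, 2, ... predictions."""
    known = set(known)
    n_pos = len(known)             # TP + FN
    n_neg = n_candidates - n_pos   # FP + TN
    tp = fp = 0
    rates = []
    for edge in ranked:
        if edge in known:
            tp += 1
        else:
            fp += 1
        rates.append((tp / (tp + fp), tp / n_pos, fp / n_neg))
    return rates


def trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auroc_aupr(ranked, known, n_candidates):
    rates = rates_along_ranking(ranked, known, n_candidates)
    roc = [(0.0, 0.0)] + [(fallout, recall) for _, recall, fallout in rates]
    pr = [(0.0, rates[0][0])] + [(recall, precision) for precision, recall, _ in rates]
    return trapezoid(roc), trapezoid(pr)


# Toy ranking of four candidate edges, two of which are known regulations.
print(auroc_aupr(["a", "b", "c", "d"], {"a", "c"}, n_candidates=4))
```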
Data
Datasets
Network  ♯TF  ♯Genes  ♯Chips  ♯Verified interactions 

DREAM5 Network 1 (in silico)  195  1643  805  4012 
DREAM5 Network 3 (E. coli)  334  4511  805  2066 
DREAM5 Network 4 (S. cerevisiae)  333  5950  536  3940 
E. coli Network from [14]  180  1525  907  3812 
DREAM4 Multifactorial Network 1  100  100  100  176 
DREAM4 Multifactorial Network 2  100  100  100  249 
DREAM4 Multifactorial Network 3  100  100  100  195 
DREAM4 Multifactorial Network 4  100  100  100  211 
DREAM4 Multifactorial Network 5  100  100  100  193 
The first three benchmarks are taken from the DREAM5 challenge [28]. Network 1 is a simulated dataset. Its topology and dynamics were modeled according to known GRN, and the expression data were simulated using the GeneNetWeaver software [33]. We refer the interested reader to [22, 34] for more information on this network. The second and third benchmarks are Network 3 and Network 4 of the DREAM5 competition, corresponding respectively to real expression data for E. coli and S. cerevisiae. Note that we do not use Network 2 of DREAM5 in our experiments, because no verified TF-TG interaction is provided for this dataset, which consists of expression data for S. aureus.
Additionally, we run experiments on the E. coli dataset from [14], which has been widely used as a benchmark in the GRN inference literature. The expression data were downloaded from the Many Microbe Microarrays (M^{3D}) database [35] (version 4 build 6). They consist of 907 experiments and 4297 genes. We obtained the gold standard data from RegulonDB [36] (version 7.2, May 6th, 2011), which contains 3812 verified interactions among 1525 of the genes present in the microarray experiments.
Finally, we borrowed the five DREAM4 [22] size 100 multifactorial networks [34] for which the TFs are not known in advance in order to assess TIGRESS’ ability to predict directionality.
As a preprocessing step, we simply meancenter and scale to unit variance the expression levels of each gene within each compendium.
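This preprocessing step can be sketched in a few lines of Python. The function name is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, since the text does not say which is used; constant genes are mapped to zero to avoid dividing by zero.

```python
from statistics import fmean, pstdev


def standardize_columns(X):
    """Mean-center each column (gene) of X and scale it to unit variance.

    X is a list of rows (experiments); each row lists the expression levels
    of all genes in that experiment.
    """
    scaled_columns = []
    for column in zip(*X):
        mu, sd = fmean(column), pstdev(column)
        scaled_columns.append([(v - mu) / sd if sd > 0 else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]


# Toy compendium: three experiments, two genes (the second one is constant).
print(standardize_columns([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]))
```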
Results
DREAM5 challenge results
In 2010 we participated in the DREAM5 Network Inference Challenge, an open competition to assess the performance of GRN methods [28]. Participants were asked to submit a ranked list of predicted interactions from four matrices of gene expression data. At the time of submission, no further information was available to participants (besides the list of TF); in particular, the “true” network of verified interactions for each dataset was not given. After submissions were closed, the organizers of the challenge announced that one network (Network 1) was a simulated network with simulated expression data, while the other expression datasets were real expression data collected for E. coli (Network 3) and S. cerevisiae (Network 4), respectively. Teams were ranked for each network by decreasing score (5), and an overall score was computed summarizing the network-specific p-values [28].
We submitted predictions for all networks with a version of TIGRESS that we could not optimize since the benchmarks were blinded at the time of the challenge. We refer to it as Naive TIGRESS below. Naive TIGRESS is the variant of TIGRESS which scores candidate interactions with the original score (3) and uses the arbitrarily fixed parameters α=0.2, L=5, R_{1}=4,000, R_{3}=R_{4}=1,000, where R_{ i } refers to the number of runs for network i. The numbers of runs were simply set to ensure that TIGRESS would finish within 2 days on a single-core laptop computer. R_{1} is larger than R_{3} and R_{4} because network 1 is smaller than networks 3 and 4, implying that each TIGRESS run is faster. The choice α=0.2 followed previous suggestions for the use of stability selection [26], while the choice L=5 roughly corresponded to the largest value for which no TF-TG pair had a score of 1.
DREAM5 networks results
Method  Network 1 AUPR  Network 1 AUROC  Network 1 Score  Network 3 AUPR  Network 3 AUROC  Network 3 Score  Network 4 AUPR  Network 4 AUROC  Network 4 Score  Overall score 

GENIE3 [20]  0.291  0.815  104.65  0.093  0.617  14.79  0.021  0.518  1.39  40.28 
ANOVerence [37]  0.245  0.780  53.98  0.119  0.671  45.88  0.022  0.519  2.21  34.02 
Naive TIGRESS  0.301  0.782  87.80  0.069  0.595  4.41  0.020  0.517  1.08  31.1 
CLR [14]  0.255  0.773  55.02  0.075  0.590  5.29  0.021  0.516  1.07  20.46 
ARACNE [13]  0.187  0.763  24.47  0.069  0.572  3.24  0.018  0.504  1.1e-4  9.24 
TIGRESS  0.320  0.789  105.28  0.066  0.589  3.25  0.020  0.514  0.46  36.33 
The winning method, both in silico and overall, was the GENIE3 method of [20]. GENIE3 had already won the DREAM4 challenge, confirming its overall state-of-the-art performance. It had particularly strong performance on the in silico network, and more modest performance on both in vivo networks. The ANOVA-based method of [37] ranked second overall, with particularly strong performance on the E. coli network. Naive TIGRESS ranked third overall, with particularly strong performance on the in silico network, improving over GENIE3 in terms of AUPR.
Interestingly, GENIE3 and TIGRESS follow a similar formulation of GRN inference as a collection of feature selection problems for each target gene, and use similar randomization-based techniques to score the evidence of a candidate TF-TG interaction. The main difference between the two methods is that GENIE3 aggregates the features selected by decision trees, while TIGRESS aggregates the features selected by LARS. The overall good results obtained by both methods suggest that this formalism is particularly relevant for GRN inference.
Influence of TIGRESS parameters
In this section, we provide more details about the influence of the various parameters of TIGRESS on its performance, taking the DREAM5 in silico network as benchmark dataset. Obviously the improvements we report below would require confirmation on new datasets not used to optimize the parameters, but they shed light on the further potential of TIGRESS and similar regression-based methods when parameters are precisely tuned.
Starting from the parameters used in Naive TIGRESS (R=4,000, α=0.2 and L=5, original score), we assess the influence of the different parameters by systematically testing the following combinations:

original (3) or area (4) scoring method;

randomization parameter α∈{0, 0.1, …, 1};

length of the LARS path L∈{1, 2, …, 20};

number of randomization runs R∈{1,000;4,000;10,000}.
A first observation is that the area scoring method consistently outperforms the original scoring method, for any choice of α and L. This suggests that, by default, the newly proposed area score should be preferred to the classical original score. We also note that the performance of the area score is less sensitive to the value of α or L than that of the original score. For example, any value of α between 0.2 and 0.8, and any L less than 10, leads to a score of at least 90 for the area score, but it can go down to 60 for the original score. This is a second argument in favor of the area scoring setting: as it is not very sensitive to the choice of the parameters, it is in practice easier to tune for optimal performance. By contrast, the window of (α, L) values leading to the best performance is narrower with the original scoring method, and therefore more difficult to find a priori. The recommendation of [26] to choose α in the range [0.2, 0.8] is clearly not precise enough for GRN inference. The best overall performance is obtained with (α=0.4, L=2) in both scoring settings.
Comparison with other methods
TIGRESS, as tuned optimally on this network, outperforms all methods in terms of AUPR and all methods but GENIE3 in terms of AUROC. Moreover, the shape of the precision-recall curve suggests that the top of the prediction list provided by TIGRESS contains more true edges than other methods. The ROC curve, on the other hand, focuses on the entire list of results. Therefore, we would argue that TIGRESS can be more reliable than GENIE3 in its first predictions but contains overall more errors when we go further in the list.
These results suggest that TIGRESS has the potential to compare with state-of-the-art methods and confirm the importance of correct parameter tuning.
In vivo networks results
E. coli network results
Method  AUPR  AUROC  Score 

TIGRESS  0.0624  0.6026  0.3325 
ARACNE  0.0498  0.5531  0.3014 
CLR  0.0641  0.6019  0.3330 
GENIE3  0.0814  0.6375  0.3594 
Analysis of errors on E. coli
To understand further the advantages and limitations of TIGRESS, we analyze the type of errors it typically makes taking the E. coli dataset as example. We analyze FP, i.e. cases where TIGRESS predicts an interaction that does not appear in the gold standard GRN.
Directionality prediction: case study on DREAM4 networks
DREAM4 networks results
Method  Network 1 AUPR  Network 1 AUROC  Network 2 AUPR  Network 2 AUROC  Network 3 AUPR  Network 3 AUROC  Network 4 AUPR  Network 4 AUROC  Network 5 AUPR  Network 5 AUROC  Overall score 

GENIE3  0.154  0.745  0.155  0.733  0.231  0.775  0.208  0.791  0.197  0.798  37.482 
TIGRESS  0.165  0.769  0.161  0.717  0.233  0.781  0.228  0.791  0.234  0.764  38.848 
Computational Complexity
The complexity of running $L$ LARS steps on a regression problem with $q$ covariates and $n$ samples is $O(nqL+L^{3})$ [24]. In our case, $q$ is the number of TF and $n$ is the number of expression arrays, which we divide by two during the resampling step, and we pay this complexity for each TG and each resampling. Multiplying by $p$ TG and $R$ resampling runs, we therefore get a total complexity of order $O(pR(nqL/2+L^{3}))$, which boils down to $O(pRnqL/2)$ in the situation where $L$ is smaller than $n/2$ and $q$.
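A small numerical check shows why the cubic term can be dropped. The sketch below simply evaluates the two expressions above; the function names and the problem sizes in the example are hypothetical and chosen only for illustration, not taken from any experiment in this paper.

```python
def tigress_cost(p, q, n, R, L):
    """Operation count of order p * R * ((n / 2) * q * L + L**3)."""
    return p * R * ((n / 2) * q * L + L ** 3)


def simplified_cost(p, q, n, R, L):
    """Simplified count p * R * n * q * L / 2, valid when L is small."""
    return p * R * n * q * L / 2


# Hypothetical sizes: 1,000 target genes, 200 TF, 800 arrays, 1,000 runs, 5 steps.
full = tigress_cost(p=1000, q=200, n=800, R=1000, L=5)
simple = simplified_cost(p=1000, q=200, n=800, R=1000, L=5)
print(full / simple)  # very close to 1: the L**3 term is negligible here
```

The cost grows linearly in each of p, R, n, q and L in this regime, so doubling the number of runs R doubles the expected running time.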
Runtime
Method  Unit running time (s)  Total running time (s) 

GENIE3  1.2e-6  2.208e+4 
TIGRESS  1.5e-8  1.957e+4 
ARACNE    15.54 
ANOVerence    8.46 
CLR    3.86 
Discussion and conclusions
In this paper, we presented TIGRESS, a new method for GRN inference. TIGRESS expresses the GRN inference problem as a feature selection problem, and solves it with the popular LARS feature selection method combined with stability selection. It ranked in the top 3 GRN inference methods at the 2010 DREAM5 challenge, without any parameter tuning. We clarified in this paper the influence of each parameter, and showed that further improvement may result from finer parameter tuning.
We proposed in particular a new scoring method for stability selection, based on the area under the stability curve. It differs from the original formulation of [26], which does not take into account the full distribution of ranks of a feature in the randomized feature selection procedure. Comparing the two, we observed that the new area scoring technique yields better results and is less sensitive to the values of the parameters: practically all values of, e.g., the randomization parameter α yield the same performance. Similarly, the choice of the number L of LARS steps to run seems to have much less impact on the performance in this new setting. As we showed, the original and area scores for a feature t can both be expressed in a common formalism as E[ϕ(H_{ t })] for different functions ϕ, where H_{ t } is the rank of feature t as selected by the randomized LARS. It could be interesting to systematically investigate variants of these scores with more general non-increasing functions ϕ, not only for GRN inference but also more generally as a generic feature selection procedure.
Comparing TIGRESS, as tuned optimally, to state-of-the-art algorithms on the in silico network, we observed that it achieves a performance similar to that of GENIE3 [20], the best performer at the DREAM5 challenge. However, TIGRESS does not perform as well as this algorithm on in vivo networks. GENIE3 is also an ensemble algorithm but differs from TIGRESS in that it uses a non-linear tree-based method for feature selection, while TIGRESS uses LARS. The difference in performance could be explained by the fact that the linear relationship between TGs and TFs assumed by TIGRESS is far-fetched given the obvious complexity of the problem.
A further analysis of our results on the E. coli network from [14] showed that many spuriously detected edges follow the same pattern: TIGRESS discovers edges when in reality the two nodes are siblings, and thus tends to wrongly predict feed-forward loops. This result suggests many directions for future work. Among them, we believe that operons, i.e. groups of TGs regulated together, could be part of the problem. Moreover, it could be that there is more of a linear relationship between siblings than between parent and child, as TFs are known to operate as switches, i.e. it is only after a certain amount of change in expression of the TF that related TGs are affected. However, it is worth noting that in vivo network gold standards may not be complete. Therefore, the hypothesis that TIGRESS is actually correct when predicting these loops cannot be discarded.
TIGRESS depends on four parameters: the scoring method, the number R of resampling runs, the randomization factor α and the number of LARS steps L. We showed in this paper that changing the value of these parameters can greatly affect the performance and provided guidelines to choose them. It is worth noting, though, that other modifications can be imagined. In particular, one may wonder about the influence of the resampling parameters (with or without replacement, proportion of samples to be resampled). These questions will be tackled in future work.
While it seems indeed more realistic not to restrict underlying models to linear ones, it is fair to say that no method performs very well in absolute terms on in vivo networks. For example, performances on the E. coli network seem to level out at some 64% AUROC and 8% AUPR, which cannot be considered satisfying. This suggests that while regression-based procedures such as TIGRESS or GENIE3 are state-of-the-art for GRN inference, their performances seem to hit a limit which probably cannot be overcome without some changes. It is worth noting that, as argued in [28], combining these methods together leads to improvement, as different sets of interactions are discovered by each method. Another way to overcome these limits may be a change in the global approach, such as adding some supervision in the learning process as, e.g., investigated in [38].
Endnotes
^{1} Unfortunately, we were not able to run ANOVerence on this particular dataset due to heavy formatting requirements of the data by the algorithm that we did not have the necessary information to perform.
Declarations
Acknowledgements
JPV was supported by the French National Research Agency (ANR09BLAN005104 and ANR11BSV201302) and the European Research Council (SMACERC280032).
Authors’ Affiliations
References
1. Arkin A, Shen P, Ross J: A test case of correlation metric construction of a reaction pathway from measurements. Science. 1997, 277 (5330): 1275-1279. 10.1126/science.277.5330.1275. [http://www.sciencemag.org/cgi/reprint/277/5330/1275.pdf]
2. Liang S, Fuhrman S, Somogyi R: REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pac Symp Biocomput. 1998, 3: 18-29.
3. Chen T, He HL, Church GM: Modeling gene expression with differential equations. Pac Symp Biocomput. 1999, 4: 29-40.
4. Akutsu T, Miyano S, Kuhara S: Algorithms for identifying Boolean networks and related biological networks based on matrix multiplication and fingerprint function. J Comput Biol. 2000, 7 (3-4): 331-343. 10.1089/106652700750050817.
5. Yeung MKS, Tegnér J, Collins JJ: Reverse engineering gene networks using singular value decomposition and robust regression. Proc Natl Acad Sci USA. 2002, 99 (9): 6163-6168. 10.1073/pnas.092576199. [http://www.pnas.org/content/99/9/6163.abstract]
6. Tegner J, Yeung MKS, Hasty J, Collins JJ: Reverse engineering gene networks: integrating genetic perturbations with dynamical modeling. Proc Natl Acad Sci USA. 2003, 100 (10): 5944-5949. 10.1073/pnas.0933416100.
7. Gardner TS, Bernardo D, Lorenz D, Collins JJ: Inferring genetic networks and identifying compound mode of action via expression profiling. Science. 2003, 301 (5629): 102-105. 10.1126/science.1081900.
8. Chen KC, Wang TY, Tseng HH, Huang CYF, Kao CY: A stochastic differential equation model for quantifying transcriptional regulatory network in Saccharomyces cerevisiae. Bioinformatics. 2005, 21 (12): 2883-2890. 10.1093/bioinformatics/bti415.
9. Bernardo D, Thompson MJ, Gardner TS, Chobot SE, Eastwood EL, Wojtovich AP, Elliott SJ, Schaus SE, Collins JJ: Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks. Nat Biotechnol. 2005, 23 (3): 377-383. 10.1038/nbt1075.
10. Bansal M, Della Gatta, Bernardo D: Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics. 2006, 22 (7): 815-822. 10.1093/bioinformatics/btl003.
11. Zoppoli P, Morganella S, Ceccarelli M: TimeDelay-ARACNE: Reverse engineering of gene networks from time-course data by an information theoretic approach. BMC Bioinformatics. 2010, 11: 154. 10.1186/1471-2105-11-154.
12. Butte AJ, Tamayo P, Slonim D, Golub TR, Kohane IS: Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc Natl Acad Sci USA. 2000, 97 (22): 12182-12186. 10.1073/pnas.220392197.
13. Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular contexts. BMC Bioinformatics. 2006, 7 Suppl 1: S7. 10.1186/1471-2105-7-S1-S7.
14. Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS: Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 2007, 5: e8. 10.1371/journal.pbio.0050008.
15. Rice J, Tu Y, Stolovitzky G: Reconstructing biological networks using conditional correlation analysis. Bioinformatics. 2005, 21 (6): 765-773. 10.1093/bioinformatics/bti064.
16. Friedman N, Linial M, Nachman I, Pe’er D: Using Bayesian networks to analyze expression data. J Comput Biol. 2000, 7 (3-4): 601-620. 10.1089/106652700750050961.
17. Hartemink A, Gifford D, Jaakkola T, Young R: Using graphical models and genomic expression data to statistically validate models of genetic regulatory networks. Proceedings of the Pacific Symposium on Biocomputing 2002. Edited by: Altman RB, Dunker AK, Hunter L, Lauerdale K, Klein TE. 2002, World Scientific, 422-433. [http://helixweb.stanford.edu/psb01/abstracts/p422.html]
18. Perrin B, Ralaivola L, Mazurie A, Bottani S, Mallet J, d’Alche Buc F: Gene networks inference using dynamic Bayesian networks. Bioinformatics. 2003, 19 (suppl 2): ii138-ii148. 10.1093/bioinformatics/btg1071.
19. Friedman N: Inferring cellular networks using probabilistic graphical models. Science. 2004, 303 (5659): 799. 10.1126/science.1094068.
20. Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P: Inferring regulatory networks from expression data using tree-based methods. PLoS One. 2010, 5 (9): e12776. 10.1371/journal.pone.0012776.
21. Markowetz F, Spang R: Inferring cellular networks - a review. BMC Bioinformatics. 2007, 8 (Suppl 6): S5. 10.1186/1471-2105-8-S6-S5. [http://www.biomedcentral.com/1471-2105/8/S6/S5]
22. Marbach D, Prill RJ, Schaffter T, Mattiussi C, Floreano D, Stolovitzky G: Revealing strengths and weaknesses of methods for gene network inference. Proc Natl Acad Sci USA. 2010, 107 (14): 6286-6291. 10.1073/pnas.0913357107. [http://www.pnas.org/content/107/14/6286.abstract]
23. Meinshausen N, Bühlmann P: High dimensional graphs and variable selection with the Lasso. Ann Stat. 2006, 34: 1436-1462. 10.1214/009053606000000281.
24. Efron B, Hastie T, Johnstone I, Tibshirani R: Least angle regression. Ann Stat. 2004, 32 (2): 407-499. 10.1214/009053604000000067.
25. Bach FR: Bolasso: model consistent Lasso estimation through the bootstrap. Proceedings of the 25th international conference on Machine learning, Volume 308 of ACM International Conference Proceeding Series. Edited by: Cohen WW, McCallum A, Roweis ST. 2008, ACM, New York, NY, USA, 33-40.
26. Meinshausen N, Bühlmann P: Stability selection. J R Stat Soc Ser B. 2010, 72 (4): 417-473. 10.1111/j.1467-9868.2010.00740.x.
27. Tibshirani R: Regression shrinkage and selection via the lasso. J R Stat Soc Ser B. 1996, 58: 267-288.
28. Marbach D, Costello J, Küffner R, Vega N, Prill R, Camacho D, Allison K, Kellis M, Collins J, Stolovitzky G, the DREAM5 Consortium: Wisdom of crowds for robust gene network inference. Nat Methods. 2012, 9 (8): 796-804. 10.1038/nmeth.2016.
29. Breiman L: Random forests. Mach Learn. 2001, 45: 5-32. 10.1023/A:1010933404324.
30. Weisberg S: Applied linear regression. 1981, New York, Wiley.
31. Hastie T, Tibshirani R, Friedman J: The elements of statistical learning: data mining, inference, and prediction. 2001.
32. Mairal J, Bach F, Ponce J, Sapiro G: Online Learning for Matrix Factorization and Sparse Coding. J Mach Learn Res. 2010, 11: 19-60. [http://jmlr.csail.mit.edu/papers/v11/mairal10a.html]
33. Schaffter T, Marbach D, Floreano D: GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics. 2011, 27 (16): 2263-2270. 10.1093/bioinformatics/btr373. [http://bioinformatics.oxfordjournals.org/content/27/16/2263.abstract]
34. Marbach D, Schaffter T, Mattiussi C, Floreano D: Generating realistic in silico gene networks for performance assessment of reverse engineering methods. J Comput Biol. 2009, 16 (2): 229-239. 10.1089/cmb.2008.09TT. [http://online.liebertpub.com/doi/abs/10.1089/cmb.2008.09TT]
35. Faith J, Driscoll M, Fusaro V, Cosgrove E, Hayete B, Juhn F, Schneider S, Gardner T: Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata. Nucleic Acids Res. 2008, 36 (Database issue): D866-D870. 10.1093/nar/gkm815.
36. Gama-Castro S, Salgado H, Peralta-Gil M, Santos-Zavaleta A, Muñiz-Rascado L, Solano-Lira H, Jimenez-Jacinto V, Weiss V, García-Sotelo JS, López-Fuentes A, Porrón-Sotelo L, Alquicira-Hernández S, Medina-Rivera A, Martínez-Flores I, Alquicira-Hernández K, Martínez-Adame R, Bonavides-Martínez C, Miranda-Ríos J, Huerta AM, Mendoza-Vargas A, Collado-Torres L, Taboada B, Vega-Alvarado L, Olvera M, Olvera L, Grande R, Morett E, Collado-Vides J: RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units). Nucleic Acids Res. 2011, 39 (suppl 1): D98-D105. [http://nar.oxfordjournals.org/content/39/suppl_1/D98.abstract]
37. Küffner R, Petri T, Tavakkolkhah P, Windhager L, Zimmer R: Inferring gene regulatory networks by ANOVA. Bioinformatics. 2012, 28 (10): 1376-1382. 10.1093/bioinformatics/bts143.
38. Mordelet F, Vert JP: SIRENE: Supervised inference of regulatory networks. Bioinformatics. 2008, 24 (16): i76-i82. 10.1093/bioinformatics/btn273.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.