Statistical model comparison applied to common network motifs

Background

Network motifs are small modules that show interesting functional and dynamic properties, and are believed to be the building blocks of complex cellular processes. However, the mechanistic details of such modules are often unknown: there is uncertainty about the motif architecture as well as about the functional forms and parameter values when they are converted to ordinary differential equations (ODEs). As a result, a number of candidate models may be compatible with the system under study. A variety of statistical methods exist for ranking models, including maximum likelihood-based and Bayesian methods. Our objective is to show how such methods can be applied in a typical systems biology setting.

Results

We focus on four commonly occurring network motif structures and show that it is possible to differentiate between them using simulated data and any of the model comparison methods tested. We expand one of the motifs, the feed-forward (FF) motif, into several possible parameterizations and apply model selection to simulated data. We then use experimental data on three biosynthetic pathways in Escherichia coli to formally assess how well current knowledge matches the available time series. Our analysis confirms two of them as FF motifs. Only an expanded set of FF motif parameterizations that includes time delays is able to fit the third pathway, indicating that the true mechanism might be more complex in this case.

Conclusions

Maximum likelihood as well as Bayesian model comparison methods are suitable for selecting a plausible motif model among a set of candidate models. Our work shows that it is practical to apply model comparison to test ideas about the underlying mechanisms of biological pathways in a formal and quantitative way.

dy/dt = β_y S − α_y y, (1)

where β_y is the production rate, S a signal, and α_y the degradation rate.
The choice of time points at which data are sampled is critical for dynamic systems. If data are sampled only from the steady state y_s = β_y S/α_y, an identifiability problem arises: since the time scale is lost, we can multiply β_y and α_y by a common factor and obtain the same solution. To avoid this situation, it is important that data are sampled before the system reaches its steady state. Indeed, figures S1a to S1c illustrate that data obtained from a complete time series, y_c, are more informative than data obtained from the steady-state interval alone, y_ss, assuming a constant signal S = 1. In fact, the actual values of β_y and α_y are irrelevant when using y_ss data as long as the ratio β_y/α_y remains the same.
We have also explored the case where the input signal is set stepwise to the values 1, 2 and 0 at times 0, 4 and 7, respectively. The more diverse the input function, the narrower the posterior distribution becomes (figure S1d): the known variation imposed on the input further constrains the parameter values. Similarly, increasing the number of sampled time points has a profound effect on the posterior distribution. When comparing a dataset of 11 time points with one of 40, the posterior variance of β_y decreases from 0.01 to 4.67·10⁻⁴ (see table S1).
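The steady-state identifiability problem can be illustrated with a short simulation. The sketch below uses the analytic solution of equation 1 for a constant signal, assuming an initial condition y(0) = 0 and illustrative parameter values (not taken from the analyses above): two parameter sets sharing the ratio β_y/α_y coincide at steady state but separate clearly during the transient.

```python
import numpy as np

# Analytic solution of equation 1 for a constant signal S, assuming the
# (illustrative) initial condition y(0) = 0:
#   y(t) = (beta*S/alpha) * (1 - exp(-alpha*t))
def y_model(t, beta, alpha, S=1.0):
    return (beta * S / alpha) * (1.0 - np.exp(-alpha * t))

t_transient = np.linspace(0.0, 2.0, 20)   # samples before the steady state
t_steady = np.linspace(8.0, 10.0, 20)     # samples well after it

# Two parameter sets with the same ratio beta_y/alpha_y = 1 but
# different time scales
d_steady = np.max(np.abs(y_model(t_steady, 1.0, 1.0)
                         - y_model(t_steady, 2.0, 2.0)))
d_transient = np.max(np.abs(y_model(t_transient, 1.0, 1.0)
                            - y_model(t_transient, 2.0, 2.0)))

print(d_steady)     # ~3e-4: indistinguishable at steady state
print(d_transient)  # ~0.25: the transient separates the parameter sets
```

Only the transient carries information about the individual rates; the steady state constrains their ratio alone, which is exactly the identifiability problem described above.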

Noise levels
Assuming that the measurements of the system are subject to Gaussian noise with variance σ², each observation y_obs is normally distributed, y_obs ~ N(y(θ), σ²), where y(θ) is the noise-free observed variable, which depends on the parameters θ = {β_y, α_y}. We explore two cases: one where the noise variance is assumed to be known, and one where it is unknown and has to be estimated as an additional model parameter. Using data generated from equation 1 with true noise variance σ² = 0.04, we observe that if the variance is fixed at a value smaller than the true one (here σ² = 0.01), the posterior distribution of the rate parameters is overly optimistic. If the variance is fixed at a value higher than the true one (here σ² = 1), the posterior becomes very vague. The best results are obtained when the variance is set to its true value during the inference procedure, followed by the results obtained when it is estimated as an additional parameter (figure S2). In other words, if the noise variance is known, it should be fixed at that value or assigned a highly informative prior; this simplifies the inference exercise by providing more initial information, either by reducing the number of unknown parameters or by restricting the space of possible values.
However, when the exact value of the noise variance is unknown, we conclude that it is advisable to estimate it rather than fix it at some arbitrary value.
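The effect of the assumed noise variance on posterior width can be sketched with a simple grid approximation. The example below is a simplified, hypothetical setup rather than the inference scheme used in the paper: α_y is fixed at its true value, a flat prior is placed on β_y, and the one-dimensional posterior p(β_y | data) ∝ exp(−SSE(β_y)/2σ²) is evaluated on a grid for three assumed values of σ².

```python
import numpy as np

rng = np.random.default_rng(0)

def y_model(t, beta, alpha, S=1.0):
    # Solution of equation 1, dy/dt = beta*S - alpha*y, with y(0) = 0
    return (beta * S / alpha) * (1.0 - np.exp(-alpha * t))

# Synthetic data with true beta_y = alpha_y = 1 and true sigma^2 = 0.04
t = np.linspace(0.0, 5.0, 11)
y_obs = y_model(t, 1.0, 1.0) + rng.normal(0.0, 0.2, size=t.size)

# Grid posterior over beta_y (alpha_y fixed at its true value, flat prior):
# p(beta | data) is proportional to exp(-SSE(beta) / (2 sigma^2))
betas = np.linspace(0.2, 2.0, 2001)

def posterior_sd(sigma2):
    sse = np.array([np.sum((y_obs - y_model(t, b, 1.0))**2) for b in betas])
    weights = np.exp(-(sse - sse.min()) / (2.0 * sigma2))
    weights /= weights.sum()
    mean = np.sum(betas * weights)
    return np.sqrt(np.sum(weights * (betas - mean)**2))

# Understating the variance yields an overconfident (narrow) posterior;
# overstating it yields a vague (wide) one.
sd_small, sd_true, sd_large = (posterior_sd(s2) for s2 in (0.01, 0.04, 1.0))
print(sd_small, sd_true, sd_large)  # strictly increasing posterior widths
```

Because the model is linear in β_y, the sum of squared errors is exactly quadratic in β_y, so the posterior width grows monotonically with the assumed σ², mirroring the overconfident-versus-vague behaviour described above.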

Prior Specification
We investigate the sensitivity of rate-parameter inference to the prior distributions by using uniform priors with different interval widths for β_y and α_y (for the noise variance, we keep a fixed inverse-Gamma prior with shape and scale parameters of 0.5 and 0.05, respectively). Table S2 shows that even when a very uninformative prior is used, that is, a distribution spanning a large interval like that in the fourth row, the posterior distributions of the parameters are still centered around the true values. This shows that in this case, where samples are taken from the entire time series, the data are informative enough to drive the inference process, irrespective of the vagueness of the prior density. Table S2 also lists the parameter estimates obtained with standard least-squares optimization (here, the Nelder-Mead method). The algorithm was started from 100 random initial parameter values drawn from a normal distribution with mean and variance equal to 1.
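A multi-start Nelder-Mead fit of this kind can be sketched as follows. The setup is illustrative and does not reproduce the runs behind table S2: it assumes y(0) = 0, true β_y = α_y = 1, noise variance 0.04, and 40 time points, with initial values drawn from N(1, 1) as in the text.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

def y_model(t, beta, alpha, S=1.0):
    # Solution of equation 1, dy/dt = beta*S - alpha*y, with y(0) = 0
    return (beta * S / alpha) * (1.0 - np.exp(-alpha * t))

# Synthetic data with true beta_y = alpha_y = 1 and noise sd 0.2
t = np.linspace(0.0, 5.0, 40)
y_obs = y_model(t, 1.0, 1.0) + rng.normal(0.0, 0.2, size=t.size)

def sse(theta):
    beta, alpha = theta
    if beta <= 0.0 or alpha <= 0.0:   # rates must stay positive
        return np.inf
    return np.sum((y_obs - y_model(t, beta, alpha))**2)

# Nelder-Mead restarted from 100 random initial values drawn from N(1, 1),
# keeping the best of the converged fits
best = None
for _ in range(100):
    res = minimize(sse, rng.normal(1.0, 1.0, size=2), method="Nelder-Mead")
    if best is None or res.fun < best.fun:
        best = res

beta_hat, alpha_hat = best.x
print(beta_hat, alpha_hat)  # close to the true values (1, 1)
```

Restarting from many random initial values guards against convergence to a poor local optimum, which is why the best of the 100 runs is retained.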

Model Comparison
We investigate the ability of the Bayesian and frequentist frameworks to identify the correct model between two competing candidates under different noise levels. We use M_1, the model of equation 1, and a second model M_2 in which the degradation term is a bimolecular reaction:

dy/dt = β_y S − α_y y².

The data are generated using model M_1, and the added noise is sampled from a zero-mean Gaussian distribution with variance σ² = var(y)/SNR, where var(y) is the variance of the noise-free data and the signal-to-noise ratio (SNR) is set at different values between 0.5 and 100.
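A frequentist-style version of this comparison can be sketched by fitting both candidates to data simulated from M_1 and ranking them by AIC. This is a hypothetical setup, not the paper's procedure: it assumes y(0) = 0, β_y = α_y = S = 1, a bimolecular degradation of the form α_y y² for M_2, and a very high SNR (1000, above the range tested in the text) so that the preference for the true model is clear-cut.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

rng = np.random.default_rng(2)
t = np.linspace(0.0, 5.0, 40)
S = 1.0

def y_m1(t, beta, alpha):
    # M1: dy/dt = beta*S - alpha*y, y(0) = 0 (closed-form solution)
    return (beta * S / alpha) * (1.0 - np.exp(-alpha * t))

def y_m2(t, beta, alpha):
    # M2: dy/dt = beta*S - alpha*y^2, y(0) = 0 (solved numerically)
    sol = solve_ivp(lambda _t, y: [beta * S - alpha * y[0]**2],
                    (0.0, t[-1]), [0.0], t_eval=t, rtol=1e-8, atol=1e-10)
    return sol.y[0]

# Data generated from M1; noise variance sigma^2 = var(y)/SNR
y_true = y_m1(t, 1.0, 1.0)
sigma = np.sqrt(np.var(y_true) / 1000.0)
y_obs = y_true + rng.normal(0.0, sigma, size=t.size)

def aic(model, k=2):
    # Gaussian negative log-likelihood, dropping the additive constants
    # shared by both models (they cancel in the comparison)
    def nll(theta):
        beta, alpha = theta
        if beta <= 0.0 or alpha <= 0.0:
            return np.inf
        return 0.5 * np.sum((y_obs - model(t, beta, alpha))**2) / sigma**2
    res = minimize(nll, [1.0, 1.0], method="Nelder-Mead")
    return 2.0 * k + 2.0 * res.fun

aic_m1, aic_m2 = aic(y_m1), aic(y_m2)
print(aic_m1, aic_m2)  # the data-generating model M1 attains the lower AIC
```

Both models carry two parameters, so the AIC penalty terms cancel and the ranking reduces to comparing the maximized likelihoods; at lower SNR values the two fits become harder to tell apart, which is precisely what the noise-level sweep in the text probes.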

Cooperativity models
Cooperativity models are defined in table S4. Corresponding results are given in table S5.