BioPreDyn-bench: a suite of benchmark problems for dynamic modelling in systems biology
 Alejandro F Villaverde^{1},
 David Henriques^{1, 2},
 Kieran Smallbone^{3},
 Sophia Bongard^{4},
 Joachim Schmid^{4},
 Damjan Cicin-Sain^{5, 6},
 Anton Crombach^{5, 6},
 Julio Saez-Rodriguez^{2},
 Klaus Mauch^{4},
 Eva Balsa-Canto^{1},
 Pedro Mendes^{3},
 Johannes Jaeger^{5, 6} and
 Julio R Banga^{1}
https://doi.org/10.1186/s12918-015-0144-4
© Villaverde et al.; licensee BioMed Central. 2015
Received: 21 July 2014
Accepted: 15 January 2015
Published: 20 February 2015
Abstract
Background
Dynamic modelling is one of the cornerstones of systems biology. Many research efforts are currently being invested in the development and exploitation of large-scale kinetic models. The associated problems of parameter estimation (model calibration) and optimal experimental design are particularly challenging. The community has already developed many methods and software packages which aim to facilitate these tasks. However, there is a lack of suitable benchmark problems which allow a fair and systematic evaluation and comparison of these contributions.
Results
Here we present BioPreDyn-bench, a set of challenging parameter estimation problems which aspire to serve as reference test cases in this area. This set comprises six problems including medium- and large-scale kinetic models of the bacterium E. coli, baker’s yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The level of description includes metabolism, transcription, signal transduction, and development. For each problem we provide (i) a basic description and formulation, (ii) ready-to-run implementations in several formats, (iii) computational results obtained with specific solvers, and (iv) a basic analysis and interpretation.
Conclusions
This suite of benchmark problems can be readily used to evaluate and compare parameter estimation methods. Further, it can also be used to build test problems for sensitivity and identifiability analysis, model reduction and optimal experimental design methods. The suite, including codes and documentation, can be freely downloaded from the BioPreDyn-bench website, https://sites.google.com/site/biopredynbenchmarks/.
Keywords
Dynamic modelling; Model calibration; Parameter estimation; Optimization; Benchmarks; Large-scale; Metabolism; Transcription; Signal transduction; Development

Background
Systems biology aims at understanding the organization of complex biological systems with a combination of mathematical modelling, experiments, and advanced computational tools. To describe the behaviour of complex systems, models with sufficient level of detail to provide mechanistic explanations are needed. This leads to the use of large-scale dynamic models of cellular processes [1]. By incorporating kinetic information, the range of applications of biological models can be widened. The importance of kinetic models is being increasingly acknowledged in fields such as bioprocess optimization [2], metabolic engineering [3], physiology, as well as cell and developmental biology [4].
Systems identification, or reverse engineering, plays an important part in the model building process. The difficult nature of reverse engineering was stressed in [5], where the different perspectives that coexist in the area of systems biology were reviewed. Specifically, large-scale dynamic biological models generally have many unknown, non-measurable parameters. For the models to encapsulate as accurately as possible our understanding of the system (i.e. reproducing the available data and, ideally, being capable of making predictions), these parameters have to be estimated. This task, known as parameter estimation, model calibration, or data fitting [6–10], consists of finding the parameter values that give the best fit between the model output and a set of experimental data. This is carried out by optimizing a cost function that measures the goodness of this fit. In systems biology models this problem is often multimodal (non-convex), due to the nonlinear and constrained nature of the system dynamics. Hence, standard local methods usually fail to obtain the global optimum. As an alternative, one may choose a multi-start strategy, where a local method is used repeatedly, starting from a number of different initial guesses for the parameters. However, this approach is usually not efficient for realistic applications, and global optimization techniques need to be used instead [11,12].
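The multi-start idea can be sketched as follows. This is a toy one-dimensional objective with a crude derivative-free local search, purely illustrative of the concept; the benchmarks themselves use Matlab/AMIGO solvers, and all names here are ours:

```python
import random

def f(x):
    # Toy multimodal objective: two basins, global minimum near x = -2.
    return (x - 2.0) ** 2 * (x + 2.0) ** 2 + x

def local_search(f, x, step=0.5, tol=1e-8):
    """Crude derivative-free descent: try both neighbours, halve the step
    when neither improves, stop when the step is negligible."""
    fx = f(x)
    while step > tol:
        moved = False
        for cand in (x - step, x + step):
            if f(cand) < fx:
                x, fx, moved = cand, f(cand), True
        if not moved:
            step *= 0.5
    return x, fx

def multistart(f, n_starts, lo=-5.0, hi=5.0, seed=0):
    """Launch the local method from random initial points within the bounds
    and keep the best local solution found."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)
        x, fx = local_search(f, x0)
        if best is None or fx < best[1]:
            best = (x, fx)
    return best

x_best, f_best = multistart(f, n_starts=20)
print(round(x_best, 2))  # close to the global minimum near x = -2
```

A single local run started in the right-hand basin would stall at the local minimum near x = 2; the restarts are what recover the global one.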
Many methods have been presented for this task, but less effort has been devoted to their critical evaluation. It is clear, however, that to make progress in this research area it is essential to assess the performance of the different algorithms quantitatively, in order to understand their weaknesses and strengths. Furthermore, if a new algorithm is to be accepted as a valuable addition to the state of the art, it must first be rigorously compared with the existing plethora of methods. This systematic comparison requires adequate benchmark problems, that is, reference calibration case studies of realistic size and nature that can be easily used by the community. Several collections of benchmarks – and of methods for generating them – have already been published [13–19]. An artificial gene network generator, which allows choosing from different topologies, was presented in [13]. The system, known as A-BIOCHEM, generates pseudo-experimental noisy data in silico, simulating microarray experiments. An artificial gene network with ten genes generated in this way was later used to compare four reverse-engineering methods [15]. More recently, a toolkit called GRENDEL was presented with the same purpose [17], including several refinements in order to increase the biological realism of the benchmark. A reverse-engineering benchmark of a small biochemical network was presented in [14]. The model describes organism growth in a bioreactor and the focus was placed on model discrimination using measurements of some intracellular components. A proposal for minimum requirements of problem specifications, along with a collection of 44 small benchmarks for ODE model identification of cellular systems, was presented in [16]. The collection includes parameter estimation problems as well as combined parameter and structure inference problems.
Another method for generating dynamical models of gene regulatory networks to be used as benchmarks is GeneNetWeaver [19], which was used to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5) [18]. Subsequent competitions (DREAM6, DREAM7) also included parameter estimation challenges of medium-scale models [20]. Similar efforts have been carried out in related areas, such as in optimization, where BBOB workshops (Black-Box Optimization Benchmarking, [21]) have been organised since 2009. In this context it is also worth mentioning the collection of large-scale, nonlinearly constrained optimization problems from the physical sciences and engineering (COPS) [22].
Despite these contributions, there is still a lack of suitable benchmark problems in systems biology that are at the same time (i) dynamic, (ii) large-scale, (iii) ready-to-run, and (iv) available in several common formats. None of the above mentioned collections possesses all these features, although each one has a subset of them. Here we present a collection of medium- and large-scale dynamic systems, with sizes of tens to hundreds of variables and hundreds to thousands of estimated parameters, which can be used as benchmarks for reverse-engineering techniques. The collection includes two Escherichia coli models [23,24], a genome-wide kinetic model of Saccharomyces cerevisiae [25], a metabolic model of Chinese Hamster Ovary (CHO) cells [26], a signal transduction model of human cells [27], and a developmental gene regulatory network of Drosophila melanogaster [28–30].
Ensuring standardisation allows systems biology models to be reused outside of their original context: in different simulators, under different conditions, or as parts of more complex models [31]. To this end, we have made five of the six models (the exception is the spatial model of D. melanogaster) available in Systems Biology Markup Language (SBML [32]) format, allowing for their simulation in multiple software tools, including AMIGO [33] and COPASI [34]. Even when defined in a standard format such as SBML, large models such as the genome-wide kinetic model of S. cerevisiae may give different results when simulated in different software environments. The inherent size and stiffness of genome-scale systems biology models create new challenges to be addressed for their robust simulation by systems biology tools [25]. To address this problem all the models have been consistently formatted, with their dynamics provided both in C and in Matlab. Additionally, a benchmark consisting of a parameter estimation problem has been defined for every model, for which ready-to-run implementations are provided in Matlab (optionally with the use of the AMIGO toolbox [33]) and, in some cases, also in COPASI [34]. The availability of ready-to-run implementations is a highly desirable practice in computer science, since it ensures reproducibility of the results. Calibration results with state-of-the-art optimization methods are reported, which can serve as a reference for comparison with new methodologies. Additionally, suggestions on how to compare the performance of several methods are given in the Results and discussion section.
Problem statement

Each of our benchmark problems is specified by the following elements:

- cost function to optimize (i.e. a metric that reflects the mismatch between experimental and predicted values)
- dynamics of the systems (in our benchmark models they are given by systems of ordinary differential equations)
- model parameters to be estimated
- initial conditions for the dynamics (possibly unknown, in which case they are included among the parameters to be estimated)
- upper and lower bounds for the parameters
- state variables that can be measured (observed)
- values of external stimuli, also known as control variables
- measurements (over time and/or space) available for the calibration: number of experiments, stimuli (if any) for each experiment, data points per experiment, etc.
- (optional) type and magnitude of errors considered for the experimental data
- (optional) additional time series for validation of the calibrated model
- solver used to numerically simulate the systems, and the relative and absolute error tolerances used
For known constant variances the log-likelihood cost function is similar to the generalized least squares, with weights chosen as the inverse of the variance, \(w^{\epsilon,o}_{s} = 1/\left (\sigma ^{\epsilon,o}_{s}\right)^{2}\). This is the case for most of the benchmark problems presented here (B1, B2, B4, B5). The exceptions are: problem B3, in which no noise has been added to the data, and thus the scaling factors \(w^{\epsilon,o}_{s}\) are taken as the squared inverse of the maximum experimental value for each observable; and problem B6, where the weights are inversely related to the level of expression. More details are given in the next section. Note that the cost functions used in Matlab/AMIGO are not exactly the same as the ones used in COPASI, since in COPASI the weights are scaled so that for each experiment the maximal occurring weight is 1.
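In compact form, the estimation problem reads (a standard reconstruction of the weighted nonlinear least-squares formulation with dynamic constraints; here \(y_{m}\) and \(y_{e}\) denote model outputs and experimental data, and the remaining symbols are defined below):

```latex
\begin{align*}
\min_{p}\; & J(p) = \sum_{\epsilon}\sum_{o}\sum_{s}
  w^{\epsilon,o}_{s}\,\bigl(y^{\epsilon,o}_{m}(t_{s},p) - y^{\epsilon,o}_{e}(t_{s})\bigr)^{2} \\
\text{subject to}\; & \dot{x} = f(x,p,t), \qquad x(t_{0}) = x_{0}, \\
& y = g(x,p,t), \\
& h_{eq}(x,y,p) = 0, \\
& h_{in}(x,y,p) \leq 0, \\
& p^{L} \leq p \leq p^{U}.
\end{align*}
```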
where g is the observation function, x is the vector of state variables with initial conditions x_{0}, f is the set of differential and algebraic equality constraints describing the system dynamics (that is, the nonlinear process model), h_{eq} and h_{in} are equality and inequality constraints that express additional requirements for the system performance, and p^{L} and p^{U} are lower and upper bounds for the parameter vector p. The problem defined above is the general formulation of a nonlinear least squares optimization subject to dynamic constraints and bounds in the parameters. The problems included in this collection of benchmarks do not make use of constraints (6–7).
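The weighted cost just described can be sketched in a few lines of code (a minimal illustration with toy data, not code from the benchmark implementations; the function name is ours):

```python
def wls_cost(y_model, y_data, sigma):
    """Weighted least-squares cost with weights w = 1/sigma^2
    (known constant variances, as in benchmarks B1, B2, B4 and B5)."""
    cost = 0.0
    for ym, yd, s in zip(y_model, y_data, sigma):
        cost += (ym - yd) ** 2 / s ** 2
    return cost

# One observable measured at three sampling times, sigma = 0.1 throughout:
print(round(wls_cost([1.0, 2.0, 3.0], [1.1, 1.9, 3.2], [0.1, 0.1, 0.1]), 6))  # 6.0
```

In practice the sum runs over experiments, observables and sampling times, as in the formulation above.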
Remarks on parameter estimation methods
Fitting a large, nonlinear model to experimental (noisy) data is generally a multimodal problem. In these circumstances, the use of local optimization methods, which are usually gradient-based, entails the risk of converging to local minima. Hence it is necessary to use global optimization methods, which provide better guarantees of converging to the globally optimal solution [11,35]. Global optimization strategies can be roughly classified as deterministic, stochastic and hybrid. Deterministic methods can guarantee the location of the global optimum solution; however, their computational cost makes them infeasible for large-scale problems. Stochastic methods, which are based on probabilistic algorithms, do not provide those guarantees, but are frequently capable of finding optimal or near-optimal solutions in affordable computation times.
Some of the most efficient stochastic global optimization methods are the metaheuristic approaches. A heuristic is an algorithm that originates not from formal analysis, but from expert knowledge of the task to be solved. A metaheuristic can be seen as a general-purpose heuristic method designed to guide an underlying problem-specific heuristic. It is therefore a method that can be applied to different optimization problems with few modifications. Hybrid methods which combine metaheuristics for global optimization and local methods for accelerating convergence in the vicinity of local minima can be particularly efficient. One such method is the enhanced Scatter Search algorithm, eSS [36], and its parallel cooperative version, CeSS [37]. Matlab and R implementations are publicly available as part of the MEIGO toolbox [38]. The eSS method is available as a Matlab toolbox and is also included in AMIGO; this latter version is the one used in this work. It should be noted that AMIGO offers more than a dozen optimization solvers, including local and global methods, and the possibility of combining them to form user-defined sequential hybrid methods. In COPASI [34] it is possible to choose among thirteen different optimization methods for parameter estimation, both deterministic and stochastic: Evolutionary Programming, Evolutionary Strategy (SRES), Genetic Algorithm, Hooke and Jeeves, Levenberg–Marquardt, Nelder–Mead, Particle Swarm, Praxis, Random Search, Simulated Annealing, Scatter Search, Steepest Descent, and Truncated Newton.
Remarks on comparing optimization methods
Although the objective of this paper is to present a set of readytorun benchmarks, we list below several guidelines on how to compare different optimizers with these problems.
Many optimization methods require an initial point and/or bounds on the decision variables. To ensure a fair comparison between different methods, the same bounds and initial points should be set. Obviously, the nominal solution cannot be used as an initial point (note that in this work we use the term “nominal” to refer to the “true” or “reference” parameter values, i.e. for problems that use pseudo-experimental data the nominal solution is the parameter vector that was used to generate the data). Special emphasis should be placed on ensuring full reproducibility. This entails providing all source codes and binary files used in computations, as well as specifying all implementation details, such as software and hardware environment (including compiler versions and options, if any). If some aspects of a method can be tuned, these settings must be clearly indicated.
Many different criteria may be used for comparing the performance of optimization methods. Performance can be expressed as a function of CPU time, number of function evaluations, or iteration counts. When considering several problems, a solver’s average or cumulative performance metric can be chosen. If an algorithm fails to converge, a penalty can be used, in which case an additional decision is required to fix its value. An alternative is to use ranks instead of numerical values, although this option hides the magnitudes of the performance metric. All approaches have advantages and drawbacks, and their use requires making choices that are subjective to some extent. In an attempt to combine the best features of different criteria, Dolan and Moré [39] proposed to compare algorithms based on their performance profiles, which are cumulative distribution functions for a performance metric. Performance profiles basically rely on calculating the ratio of each solver’s resource usage to the best of all the solvers on a given problem. It should be noted that, for complex large-scale problems where identifiability is an issue, different methods often arrive at different solutions. In that case the use of performance profiles requires choosing a tolerance to define acceptable solutions. Performance profiles are a convenient way of summarizing results when there are many methods to be compared and many problems on which to test them. When this is not the case, however, more information can be provided by using convergence curves. Convergence curves plot the evolution of the objective function, as defined in Equation (1), as a function of the number of evaluations or of the computation time (since the overhead is different for each method). They provide information not only about the final value reached by an algorithm, but also about the speed of progression towards that value.
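The basic performance-profile computation can be sketched as follows (toy timing data; a minimal illustration of the Dolan–Moré ratios, with function and variable names of our own choosing):

```python
def performance_profile(times, taus):
    """times[p][s]: resource (e.g. CPU time) used by solver s on problem p.
    Returns, for each solver, the fraction of problems solved within a
    factor tau of the best solver (cumulative distribution of the ratios)."""
    n_problems = len(times)
    n_solvers = len(times[0])
    # Ratio of each solver's time to the best time on that problem.
    ratios = [[row[s] / min(row) for s in range(n_solvers)] for row in times]
    profile = {}
    for s in range(n_solvers):
        profile[s] = [
            sum(1 for p in range(n_problems) if ratios[p][s] <= tau) / n_problems
            for tau in taus
        ]
    return profile

# Two solvers on three problems: solver 0 is fastest on problems 0 and 1.
times = [[1.0, 2.0],
         [2.0, 6.0],
         [4.0, 1.0]]
prof = performance_profile(times, taus=[1.0, 2.0, 4.0])
print(prof[0])  # [2/3, 2/3, 1.0]: best on two of three problems, within 4x on all
```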
When comparing different optimization methods, the best result (cost) and the mean (or median) over N runs should be reported in a table. Similar statistics for computation times and numbers of evaluations should be reported for all the methods. However, since final values alone can be greatly misleading, convergence curves should be provided in addition to this table.
Note that, in order to make a fair comparison of convergence curves obtained with different software tools and/or hardware environments, it is good practice to report any speedup due to parallelism. This can happen in non-obvious situations. For example, COPASI can make use of several threads in multi-core PCs due to its use of the Intel MKL library. In summary, fair comparisons should take into account the real overall computational effort used by each method/implementation. As a general rule, only methods running on the same platform should be compared.
Remarks on identifiability
Parameter estimation is just one aspect of what is known as the inverse problem. This larger problem also includes identifiability analysis, which determines whether the unknown parameter values can be uniquely estimated [40,41]. Lack of identifiability means that there are several possible parameter vectors that give the same agreement between experimental data and model predictions. We may distinguish between a priori structural identifiability and a posteriori or practical identifiability [40,41]. The parameters are structurally identifiable if they can be uniquely estimated from the designed experiment under ideal conditions of noise-free observations and error-free model structure. Structural identifiability is a theoretical property of the model structure, which can be very difficult to determine for large and complex models [42–44]. Even if a model is structurally identifiable, it may exhibit practical identifiability issues. Practical identifiability depends on the output sensitivity functions (partial derivatives of the measured states with respect to the parameters). If the sensitivity functions are linearly dependent the model is not identifiable, and sensitivity functions that are nearly linearly dependent result in parameter estimates that are highly correlated. Furthermore, even if they are linearly independent, low sensitivities may still lead to poor estimates. Practical identifiability can be studied with sensitivity-based criteria like the Fisher information matrix (FIM). The practical identifiability of the models can be analyzed in this way with the AMIGO toolbox [33]. The AMIGO_LRank method ranks the model parameters according to their influence on the model outputs, using several sensitivity measures. In large biological models identifiability issues are the norm rather than the exception [9,42,45–54].
This may be partly due to inconsistent modelling practices, but even when a model has been carefully built and is structurally identifiable, the amount of data required for a perfect calibration (practical identifiability) is usually large. As an illustration, consider the general case of a model described by differential equations, and assume the ideal situation where the structure of the equations is perfectly known. Then, a well-known result states that identification of r parameter values requires 2r+1 measurements [55]. However, it is frequently the case that a model with more than a thousand parameters has to be calibrated with only dozens or maybe hundreds of measurements. Another issue which is intimately related to lack of identifiability is overfitting, which may occur when the number of parameters is too large compared to the number of data points. In this case the calibrated model actually describes the noise present in the data, instead of the true dynamics of the system. Like unidentifiability, overfitting is common in large-scale systems biology models such as the ones presented here. Finally, it should be noted that lack of identifiability does not preclude the use of model-based methods. Unique model predictions can in fact be obtained despite unidentifiability, as discussed by Cedersund [52].
Implementation

For each benchmark, the following information is provided:

- Dynamic model.
- Experimental information: initial conditions, input functions, what is measured, measurement times, noise level and type.
- Cost function to be used: its type (least squares, weighted least squares, maximum likelihood, etc.), and why it should be chosen.
- Parameters to estimate: lower and upper bounds, initial guesses, nominal values (the latter are reference values, which must not be used during estimations).
- Implementations (Matlab with and without the AMIGO toolbox, C, COPASI): installation, requirements, and usage. Ready-to-run scripts are provided, with examples of how to execute them and their expected output.
Models
Model ID          | B1                      | B2             | B3                             | B4        | B5                  | B6
Model Ref         | [25]                    | [23]           | [24]                           | [26]      | [27]                | [29]
Cell              | S. cerevisiae           | E. coli        | E. coli                        | CHO       | Generic             | D. melanogaster
Description level | Metabolic: genome scale | Metabolic: CCM | Metabolic: CCM & transcription | Metabolic | Signal transduction | Developmental GRN (spatial)
Parameters        | 1759                    | 116            | 178                            | 117       | 86                  | 37
Dynamic states    | 276                     | 18             | 47                             | 34        | 26                  | 108–212
Observed states   | 44                      | 9              | 47                             | 13        | 6                   | 108–212
Experiments       | 1                       | 1              | 1                              | 1         | 10                  | 1
Data points       | 5280                    | 110            | 7567                           | 169       | 96                  | 1804
Data type         | simulated               | measured       | simulated                      | simulated | simulated           | measured
Noise level       | σ=5%                    | real           | no noise                       | variable  | σ=5%                | real
Problem B1: genome-wide kinetic model of S. cerevisiae
Implementations of the benchmark problems are provided as Additional files 2 and 3. The version of the AMIGO toolbox used in this work is provided as Additional files 4 and 5. The biochemical structure of this model is taken from yeast.sf.net (version 6, [56]). In decompartmentalised form, this network has 1156 reactions and 762 variables. We fix some experimentally determined exchange fluxes, and use geometric FBA [57] to choose a unique reference flux distribution consistent with the experimental data. We fix some initial concentrations to their experimentally determined levels and assign the remainder typical values. We define reaction kinetics using the common modular rate law, a generalised form of the reversible Michaelis-Menten kinetics that can be applied to any reaction stoichiometry [58]. The final model contains 261 reactions with 262 variables and 1759 parameters. This model has been created according to the pipeline presented in [25], which ensures consistency with our sparse data set; whilst no data is required to produce the model, it can incorporate any known flux or concentration data or any kinetic constants. As an addition to the model developed in [25], this version has been aligned with previously unpublished experimental data. The new data consist of 44 steady-state measurements (38 concentrations and 6 fluxes), which are included in Additional file 1: Tables S12 and S13. The steady state is found to be stable. The number of measurements available at the present stage is not enough for carrying out a proper model calibration. Envisioning that dynamic (time-series) measurements of the 44 observed variables may be available in the near future, we show in this paper how they can be employed for re-estimating the parameter values. With this aim, we have generated pseudo-experimental noisy data corresponding to a pulse in the concentration of extracellular glucose, and have used these simulated data to recalibrate the model.
We generated 120 samples per observable and added artificial measurement noise (standard uniform distribution, σ=5%) to resemble realistic conditions.
Problem B2: dynamic model of the Central Carbon Metabolism of E. coli
This model, originally published in [23] and available at the BioModels database [59], reproduces the response to a pulse in extracellular glucose concentration. It includes 18 metabolites in two different compartments: the cytosol (17 internal metabolites), and the extracellular compartment (1 extracellular metabolite: glucose). These metabolites are involved in 48 reactions: 30 kinetic rate reactions, 9 degradation equations, 8 dilution equations, and 1 equation for extracellular glucose kinetics. Additionally, there are 7 analytical functions, thus the model is defined by a total of 55 mathematical expressions. We have reformulated the model to use it as a parameter estimation problem; the 116 parameters to be estimated consist of kinetic parameters and maximum reaction rates.
As an addition to the model version available in the BioModels database, we provide the experimental data that were used in the original publication but had not been published (Klaus Mauch, personal communication). The dataset is given in Additional file 1: Table S14, and consists of time-course concentration measurements of nine metabolites. The aim of the model calibration in this case is to find a better fit to the experimental data than the one obtained with the nominal parameter vector used in the original publication [23]. Note that this is different from benchmarks 1 and 3–5, which use simulated data and where the aim is to recover a fit as good as the one obtained with the nominal parameter vector, with which the data were generated.
Problem B3: enzymatic and transcriptional regulation of the Central Carbon Metabolism of E. coli
This model simulates the adaptation of E. coli to changing carbon sources. Complete information about this model is available as the supplementary information of [24]. It is also included in the BioModels Database [59]. It should be noted that there are some differences in parameter values between the original model and the BioModels version; however, these changes do not alter the simulation results, a fact that indicates unidentifiability. The model contains 47 ODEs and 193 parameters, of which 178 are considered unknown and need to be estimated. The other 15 parameters are constants known to the modeler (number of subunits of the multimeric enzymes, scaling factors, universal protein degradation rate, and gene expression rate constant). The outputs of the system are the 47 state variables, which represent concentrations. Pseudo-experimental data were generated by simulation of the sixth scenario defined in the simulation files included as supplementary material in [24]. This scenario simulates an extended diauxic shift which consists of three consecutive environments, where the carbon sources are first glucose, then acetate, and finally a mixture of both. Under these conditions, the 47 concentration profiles are sampled every 1000 seconds, for a total of 161 time points (45 hours). This model exhibits large differences in value among concentrations, which span five orders of magnitude. To equalize their contribution to the objective function, we scale each time-series by dividing it by the maximum experimental value (scaled least squares).
Problem B4: metabolic model of Chinese Hamster Ovary (CHO) cells
Chinese Hamster Ovary (CHO) cells are used for protein production in fermentation processes [60]. This model simulates a batch process with resting cells: no metabolites are fed over a final time horizon of 300 hours. The fermenter medium contains glucose as the main carbon source, and leucine and methionine are the main amino acids taken up. Lactate is modelled as a by-product of the fermentation process. A generated protein serves as the main product of the fermentation process. The model comprises 35 metabolites in three compartments (fermenter, cytosol, and mitochondria) and 32 reactions, including protein product formation, the Embden-Meyerhof-Parnas (EMP) pathway, the TCA cycle, a reduced amino acid metabolism, lactate production, and the electron transport chain. The kinetics are modelled as in [23], and the resulting ODE model comprises 117 parameters in total. Some aspects of this model were partially discussed in [26]. For optimization purposes pseudo-experimental data were generated, mimicking typical cell behavior. The following 13 metabolites are assumed to be measured: in the fermenter, glucose, lactate, product protein, leucine, and methionine; in the cytosol, aspartate, malate, pyruvate, oxaloacetate, ATP, and ADP; and in the mitochondria, ATP and ADP. Samples were assumed to be taken daily over the whole fermentation time.
Problem B5: signal transduction logic model
To illustrate the advantages and disadvantages of different formalisms related to logic models, MacNamara and colleagues constructed a plausible network of interactions consisting of signaling factors known to be activated downstream of EGF and TNFα [27]. The model consists of 26 ODEs that use a logic-based formalism, which is explained in detail in [61]. In this formalism, state values can vary between 0 and 1 and represent the normalized activity of a given protein, which is typically measured as the level of phosphorylation. In total the model includes 86 continuous parameters, corresponding to the half maximal activations (k), the Hill coefficients (n) and a set of parameters controlling the rate of activation/deactivation of a given protein (τ). The model incorporates EGF and TNFα, which are treated as stimuli that trigger the pathway response. In addition to these two stimuli, the model includes two kinase inhibitors for RAF1 and PI3K, which can block the activity of both species. In total the model can be perturbed by these 4 cues, allowing a rich variation in the dynamic profiles of the model signaling components, an essential requirement for parameter estimation. In order to generate a dataset for reverse engineering the model structure, the authors generated data corresponding to 10 in silico experiments, where the different cues (stimuli and inhibitors) are added in different combinations. For each experiment 6 strategically located states are observed. Each observable was measured at 16 equidistant time points per experiment. In addition, Gaussian noise was added to the data in order to mimic a reasonable amount of experimental error. Note that the SBML implementation of this model uses the SBML qual format [62], an extension of SBML developed for qualitative models of biological networks.
Problem B6: the gap gene network of the vinegar fly, Drosophila melanogaster
Our last benchmark model is slightly different from those previously described, in that it represents a spatial model of pattern formation in multicellular animal development, and the data for fitting are based on microscopy, rather than metabolomics or transcriptomics. The gap genes form part of the segmentation gene network, which patterns the anterior–posterior (AP) axis of the Drosophila melanogaster embryo. They are the primary regulatory targets of maternal morphogen gradients, and are active during the blastoderm stage in early development. In the model, the embryo is a single row of dividing nuclei along the AP axis, with each nucleus containing the four gap genes and receiving input from four external factors. The gap genes included in the model are hunchback (hb), Krüppel (Kr), giant (gt), and knirps (kni), and the external inputs are Bicoid (Bcd), Caudal (Cad), Tailless (Tll), and Huckebein (Hkb). Three processes occur within and between nuclei: (1) regulated gene product synthesis, (2) Fickian gene product diffusion, and (3) linear gene product decay. These processes are formalised with ODEs, and result in the model having 37 unknown parameters. This model [29,30] implements the gene circuit approach [63,64] used to reverse-engineer the regulatory interactions of the gap genes by fitting to quantitative spatio-temporal gene expression data, which can be mRNA [29] or protein [30]. The data consist of 9 time points spanning 71 minutes of Drosophila development, with, at each time point, maximally 53 nuclei with data points for the four gap genes and the four external inputs. The fit is measured with a weighted least squares (WLS) scheme with variable weights, which, in the case of the mRNA data used here [29], are inversely related to the level of expression. The weights were created from normalized, integrated mRNA expression data according to the formula w = 1.0 − 0.9y, with y ∈ [0,1] being the normalized staining intensity.
This proportionality of variation with expression level reflects the fact that gap domains (showing high levels of expression) show more variation than those regions of the embryo in which a gene is not expressed [29].
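The three intracellular processes and the B6 weighting scheme can be sketched in a few lines. The following Python fragment is illustrative only (the actual benchmark is implemented in Matlab and C): it writes the right-hand side of a generic gene-circuit ODE for a row of nuclei, using the standard gene-circuit sigmoid and, for brevity, a single external input; all parameter names are conventions of the formalism, not values from the benchmark.

```python
import numpy as np

def gap_gene_rhs(a, t, W, m, h, R, D, lam, ext):
    """Right-hand side of a generic gene-circuit ODE (connectionist
    formalism [63,64]): regulated synthesis, Fickian diffusion between
    neighbouring nuclei, and linear decay.
    a   : (n_nuclei, n_genes) gene product concentrations
    ext : (n_nuclei,) one external input gradient (B6 has four)."""
    # Total regulatory input to each gene in each nucleus
    u = a @ W.T + np.outer(ext, m) + h
    # Sigmoid regulation-expression function used in gene circuits
    g = 0.5 * (u / np.sqrt(u ** 2 + 1.0) + 1.0)
    # 1-D Fickian diffusion with zero-flux boundaries
    diff = np.zeros_like(a)
    diff[1:-1] = a[:-2] - 2.0 * a[1:-1] + a[2:]
    diff[0] = a[1] - a[0]
    diff[-1] = a[-2] - a[-1]
    return R * g + D * diff - lam * a

def wls_weights(y_norm):
    """Variable WLS weights of benchmark B6: w = 1.0 - 0.9*y, with y
    the normalized staining intensity in [0, 1]."""
    return 1.0 - 0.9 * np.asarray(y_norm, dtype=float)
```

Note that the weights decrease with expression level, so highly expressed (and more variable) gap domains contribute less per unit of squared residual, exactly as described above.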
Results and discussion
We show how our collection of benchmark problems can be used by reporting selected results using several parameter estimation methods. We emphasize that the purpose of this work is not to provide a comprehensive comparison of all existing approaches, but to provide a useful, versatile, and practical test set and illustrate its use. For simplicity, and to enable direct comparisons among benchmarks, all the computations reported in this section have been carried out in Matlab, using the algorithms available in the AMIGO toolbox [33]. This includes both global and local optimization methods; the latter have been used in a multistart procedure, where multiple instances are launched from different initial points selected randomly within the parameter bounds.
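The multistart procedure just described can be sketched as follows. This is an illustrative pure-Python version in which a simple coordinate hill-climbing routine stands in for the local solvers actually used (DHC, FMINCON, inside the AMIGO toolbox); the function names and settings are hypothetical.

```python
import random

def hill_climb(f, x0, step=0.5, shrink=0.5, tol=1e-8, max_iter=2000):
    """Toy coordinate hill-climbing local search: try +/- step moves
    along each coordinate, shrinking the step when no move improves."""
    x, fx = list(x0), f(x0)
    it = 0
    while step > tol and it < max_iter:
        improved = False
        for i in range(len(x)):
            for s in (step, -step):
                cand = list(x)
                cand[i] += s
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step *= shrink
        it += 1
    return x, fx

def multistart(f, bounds, n_starts=20, seed=0):
    """Launch local searches from random initial points drawn within
    the parameter bounds and keep the best solution found."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        x, fx = hill_climb(f, x0)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```

In the benchmarks the objective f would be the calibration cost J computed from an ODE simulation; here any Python callable over the parameter vector works.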
As an additional goodness-of-fit metric we use the normalized root-mean-square error of each observable o, \(\textit{NRMSE}^{o}=\sqrt{\frac{1}{n_{s}}\sum_{s=1}^{n_{s}}\left(\frac{\tilde{y}^{o}_{s}-y^{o}_{s}}{\max_{s}\tilde{y}^{o}_{s}-\min_{s}\tilde{y}^{o}_{s}}\right)^{2}}\), with \(\sum \textit{NRMSE}=\sum_{o=1}^{n_{o}}\textit{NRMSE}^{o}\), where n _{ o } is the number of observables, n _{ s } the number of sampling times, \(\tilde {y}\) the measured data, and y the corresponding model prediction. Figure 1 shows the sensitivity of the state variables with respect to the parameters. From this figure it becomes clear that many parameters, such as 8–10, 32–38, and 56–64, have little influence on the observables. Those parameters are therefore expected to be poorly identifiable.
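The ∑NRMSE metric reported below can be computed as in this minimal Python sketch, assuming the usual convention of normalizing each observable's root-mean-square residual by the range of its measured values (the benchmarks themselves compute it in Matlab/C):

```python
import math

def nrmse(y_meas, y_sim):
    """NRMSE for one observable over the n_s sampling times: the RMSE
    of the residuals divided by the range of the measured values."""
    n_s = len(y_meas)
    y_range = max(y_meas) - min(y_meas)
    mse = sum((m - s) ** 2 for m, s in zip(y_meas, y_sim)) / n_s
    return math.sqrt(mse) / y_range

def sum_nrmse(Y_meas, Y_sim):
    """Cumulative metric: sum of NRMSE over the n_o observables, where
    Y_meas and Y_sim are lists of per-observable time series."""
    return sum(nrmse(m, s) for m, s in zip(Y_meas, Y_sim))
```

Because each observable is normalized by its own range, observables measured on very different scales contribute comparably to the cumulative value.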
In the remainder of this section we show selected results of the best performing optimization methods in every parameter estimation problem. Complete results for every benchmark are reported in the Additional file 1.

The benchmarks differ considerably in computational cost:
- B1 and B3 are the most expensive: on our computers, obtaining a reasonably good fit took at least one week.
- B5 and B6 are intermediate in terms of cost; a good fit could be obtained in one day.
- B2 and B4 are the least expensive, with good fits obtained in one or a few hours.
Table 2 Parameter estimation with eSS (AMIGO implementation): settings and results

| Model ID | B1 | B2 | B3 | B4 | B5 | B6 |
|---|---|---|---|---|---|---|
| p^U | 5·p_nom | 10·p_nom^(ex) | 10·p_nom^(ex) | 5·p_nom | varying | varying |
| p^L | 0.2·p_nom | 0.1·p_nom^(ex) | 0.1·p_nom^(ex) | 0.2·p_nom | varying | varying |
| Local method | DHC | FMINCON | none | FMINCON | DHC | FMINCON |
| CPU time | ≈170 hours | ≈3 hours | ≈336 hours | ≈1 hour | ≈16 hours | ≈24 hours |
| Evaluations | 6.9678·10^5 | 9.0728·10^4 | 7.2193·10^6 | 1.6193·10^5 | 8.8393·10^4 | 2.0751·10^6 |
| J_0 | 5.8819·10^9 | 3.1136·10^4 | 4.6930·10^16 | 6.6034·10^8 | 3.1485·10^4 | 8.5769·10^5 |
| J_f | 1.3753·10^4 | 2.3390·10^2 | 3.7029·10^−1 | 4.5718·10^1 | 3.0725·10^3 | 1.0833·10^5 |
| J_nom | 1.0846·10^6 | − | 0 | 3.9068·10^1 | 4.2737·10^3 | − |
| ∑NRMSE_0 | 3.5834·10^1 | 8.5995·10^−2 | 3.5457·10^1 | 4.8005·10^1 | 4.0434·10^1 | 2.3808·10^2 |
| ∑NRMSE_f | 5.7558 | 2.4921 | 2.9298·10^−1 | 2.8010 | 2.7430·10^1 | 1.6212·10^2 |
| ∑NRMSE_nom | 3.8203 | − | 0 | 2.8273 | 3.0114·10^1 | − |

The cumulative normalized root-mean-square error, ∑NRMSE, is simply the sum of the NRMSE^o for all observables. The subscripts 0, f, and nom denote the initial point of the optimization, the final optimized solution, and the nominal parameter vector, respectively; a dash indicates that the value is not available.
Note that, due to the realistic nature of most of these problems, there may be a lack of identifiability, and optimization may result in overfitting: that is, an optimal solution may be found that gives a better fit to the pseudo-experimental data than the one obtained with the nominal parameter vector used to generate the data. This happens because, in the presence of measurement noise, the optimal solution manages to partially fit not only the system dynamics but also the noise itself, which of course cannot be achieved by the nominal solution. Hence in the results reported in Table 2 the optimal objective function value (J_f) is sometimes smaller (i.e. better) than the nominal one (J_nom). This may also happen with the ∑NRMSE values. Note however that, since the objective function used in the calibration (J) and the ∑NRMSE are different metrics, their behavior may differ. For example, for B1 J_f < J_nom and ∑NRMSE_f > ∑NRMSE_nom, while for B4 the opposite is true: J_f > J_nom and ∑NRMSE_f < ∑NRMSE_nom.
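This overfitting effect is easy to reproduce on a toy problem. The following Python sketch (hypothetical and unrelated to the benchmark models) generates pseudo-experimental data from nominal parameters plus Gaussian noise, fits a linear model by ordinary least squares, and compares the fitted objective J_f with the nominal J_nom: the optimum necessarily fits part of the noise, so J_f ≤ J_nom on the training data.

```python
import random

def sse(params, xs, ys):
    """Sum-of-squares objective J for the toy model y = a*x + b."""
    a, b = params
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

# Pseudo-experimental data: nominal parameters plus measurement noise,
# mimicking how the synthetic benchmark data sets were produced
rng = random.Random(42)
a_nom, b_nom = 2.0, 1.0
xs = [i / 10 for i in range(50)]
ys = [a_nom * x + b_nom + rng.gauss(0.0, 0.3) for x in xs]

# Ordinary least-squares fit (closed form for this toy model)
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
a_fit = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
b_fit = my - a_fit * mx

# The optimum absorbs part of the noise, so J_f <= J_nom always holds here
J_nom = sse((a_nom, b_nom), xs, ys)
J_f = sse((a_fit, b_fit), xs, ys)
```

Since the least-squares optimum is taken over all parameter values, including the nominal ones, J_f can never exceed J_nom on the fitted data; on independent validation data the ordering can reverse.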
Conclusions
To address the current lack of ready-to-run benchmarks for large-scale dynamic models in systems biology, we have presented here a collection of six parameter estimation problems. They cover the most common types of dynamic models, including metabolism, transcription, signal transduction, and development. The benchmarks are made available in a number of formats. As a common denominator, all of the models have been implemented in Matlab and C. When possible (i.e. for benchmarks B1–B5), model descriptions are also given in SBML. Ready-to-run implementations of all the benchmarks are provided in Matlab format (both with and without the AMIGO toolbox) and in COPASI (for benchmarks B1–B4). With these files it is straightforward to reproduce the results reported here.
More importantly, the benchmark files can be easily adapted to test new parameter estimation methods for which a Matlab, C, or COPASI implementation is available. The performance of an existing or newly developed method can be evaluated by comparing its results with those reported here, as well as with those obtained by other methods. To this end, we have provided guidelines for comparing the performance of different optimizers. The problems defined here may also be used for educational purposes, running them as examples in classes or using them as assignments.
Finally, it should be noted that the utility of this collection goes beyond parameter estimation: the models provided here can also be used for benchmarking methods for optimal experimental design, identifiability analysis, sensitivity analysis, model reduction, and in the case of metabolic models also for metabolic engineering purposes.
Availability and requirements

- For C implementations: GCC compiler and the Matlab Compiler Runtime (the MCR is a free component which does not require a Matlab installation)
- For Matlab and AMIGO implementations: Matlab R2008 or newer
- For COPASI implementations: COPASI version 4.13 (Build 87) or newer
- For SBML: any software package capable of reading SBML

Please refer to the Additional file 1 for more details on these requirements and installation instructions.
License: Artistic License 2.0
Restrictions for non-academic use: none
Declarations
Acknowledgments
This work was supported by the EU project “BioPreDyn” (EC FP7-KBBE-2011-5, grant number 289434). We acknowledge support of the publication fee by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI). We would like to thank Attila Gábor for reading the manuscript, providing critical comments, and finding bugs in the codes; David Rodríguez Penas for helping to debug the codes; and Thomas Cokelaer for providing the SBML qual file of model B5.
References
1. Link H, Christodoulou D, Sauer U. Advancing metabolic models with kinetic information. Curr Opin Biotechnol. 2014;29:8–14.
2. Almquist J, Cvijovic M, Hatzimanikatis V, Nielsen J, Jirstrand M. Kinetic models in industrial biotechnology – improving cell factory performance. Metab Eng. 2014;24:38–60.
3. Song HS, DeVilbiss F, Ramkrishna D. Modeling metabolic systems: the need for dynamics. Curr Opin Chem Eng. 2013;2(4):373–82.
4. Jaeger J, Monk N. Bioattractors: dynamical systems theory and the evolution of regulatory processes. J Physiol (Lond). 2014;592:2267–81.
5. Villaverde AF, Banga JR. Reverse engineering and identification in systems biology: strategies, perspectives and challenges. J R Soc Interface. 2014;11(91):20130505.
6. van Riel NAW. Dynamic modelling and analysis of biochemical networks: mechanism-based models and model-based experiments. Brief Bioinform. 2006;7(4):364–74.
7. Jaqaman K, Danuser G. Linking data to models: data regression. Nat Rev Mol Cell Biol. 2006;7(11):813–19.
8. Banga J, Balsa-Canto E. Parameter estimation and optimal experimental design. Essays Biochem. 2008;45:195.
9. Ashyraliyev M, Fomekong-Nanfack Y, Kaandorp JA, Blom JG. Systems biology: parameter estimation for biochemical models. FEBS J. 2008;276(4):886–902.
10. Vanlier J, Tiemann C, Hilbers P, van Riel N. Parameter uncertainty in biochemical models described by ordinary differential equations. Math Biosci. 2013;246(2):305–14.
11. Moles CG, Mendes P, Banga JR. Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Res. 2003;13(11):2467.
12. Banga JR. Optimization in computational systems biology. BMC Syst Biol. 2008;2:47.
13. Mendes P, Sha W, Ye K. Artificial gene networks for objective comparison of analysis algorithms. Bioinformatics. 2003;19(suppl 2):122–29.
14. Kremling A, Fischer S, Gadkar K, Doyle FJ, Sauter T, Bullinger E, et al. A benchmark for methods in reverse engineering and model discrimination: problem formulation and solutions. Genome Res. 2004;14(9):1773–85.
15. Camacho D, Vera Licona P, Mendes P, Laubenbacher R. Comparison of reverse-engineering methods using an in silico network. Ann N Y Acad Sci. 2007;1115(1):73–89.
16. Gennemark P, Wedelin D. Benchmarks for identification of ordinary differential equations from time series data. Bioinformatics. 2009;25(6):780–86.
17. Haynes BC, Brent MR. Benchmarking regulatory network reconstruction with GRENDEL. Bioinformatics. 2009;25(6):801–07.
18. Marbach D, Schaffter T, Mattiussi C, Floreano D. Generating realistic in silico gene networks for performance assessment of reverse engineering methods. J Comput Biol. 2009;16(2):229–39.
19. Schaffter T, Marbach D, Floreano D. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics. 2011;27(16):2263–70.
20. Meyer P, Cokelaer T, Chandran D, Kim KH, Loh PR, Tucker G, et al. Network topology and parameter estimation: from experimental design methods to gene regulatory network kinetics using a community based approach. BMC Syst Biol. 2014;8(1):13.
21. Auger A, Hansen N, Schoenauer M. Benchmarking of continuous black box optimization algorithms. Evol Comput. 2012;20(4):481.
22. Dolan ED, Moré JJ, Munson TS. Benchmarking optimization software with COPS 3.0. Argonne National Laboratory Technical Report ANL/MCS-TM-273, 9700 South Cass Avenue, Argonne, Illinois 60439, USA. 2004.
23. Chassagnole C, Noisommit-Rizzi N, Schmid JW, Mauch K, Reuss M. Dynamic modeling of the central carbon metabolism of Escherichia coli. Biotechnol Bioeng. 2002;79(1):53–73.
24. Kotte O, Zaugg JB, Heinemann M. Bacterial adaptation through distributed sensing of metabolic fluxes. Mol Syst Biol. 2010;6(1):355.
25. Smallbone K, Mendes P. Large-scale metabolic models: from reconstruction to differential equations. Ind Biotech. 2013;9(4):179–84.
26. Villaverde AF, Bongard S, Schmid J, Müller D, Mauch K, Balsa-Canto E, et al. High-confidence predictions in systems biology dynamic models. In: Advances in Intelligent and Soft Computing, vol. 294. Switzerland: Springer; 2014. p. 161–71.
27. MacNamara A, Terfve C, Henriques D, Bernabé BP, Saez-Rodriguez J. State–time spectrum of signal transduction logic models. Phys Biol. 2012;9(4):045003.
28. Jaeger J, Surkova S, Blagov M, Janssens H, Kosman D, Kozlov KN, et al. Dynamic control of positional information in the early Drosophila embryo. Nature. 2004;430(6997):368–71.
29. Crombach A, Wotton KR, Cicin-Sain D, Ashyraliyev M, Jaeger J. Efficient reverse-engineering of a developmental gene regulatory network. PLoS Comput Biol. 2012;8(7):e1002589.
30. Ashyraliyev M, Siggens K, Janssens H, Blom J, Akam M, Jaeger J. Gene circuit analysis of the terminal gap gene huckebein. PLoS Comput Biol. 2009;5(10):e1000548.
31. Krause F, Schulz M, Swainston N, Liebermeister W. Sustainable model building: the role of standards and biological semantics. Methods Enzymol. 2011;500:371–95.
32. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19(4):524–31.
33. Balsa-Canto E, Banga JR. AMIGO, a toolbox for advanced model identification in systems biology using global optimization. Bioinformatics. 2011;27(16):2311–3.
34. Hoops S, Sahle S, Gauges R, Lee C, Pahle J, Simus N, et al. COPASI – a COmplex PAthway SImulator. Bioinformatics. 2006;22(24):3067–74.
35. Balsa-Canto E, Alonso A, Banga JR. An iterative identification procedure for dynamic modeling of biochemical networks. BMC Syst Biol. 2010;4:11.
36. Egea JA, Martí R, Banga JR. An evolutionary method for complex-process optimization. Comput Oper Res. 2010;37(2):315–24.
37. Villaverde A, Egea J, Banga J. A cooperative strategy for parameter estimation in large scale systems biology models. BMC Syst Biol. 2012;6(1):75.
38. Egea J, Henriques D, Cokelaer T, Villaverde A, MacNamara A, Danciu DP, et al. MEIGO: an open-source software suite based on metaheuristics for global optimization in systems biology and bioinformatics. BMC Bioinformatics. 2014;15:136.
39. Dolan ED, Moré JJ. Benchmarking optimization software with performance profiles. Math Program, Ser A. 2002;91(2):201–13.
40. Walter E, Pronzato L. Identification of parametric models from experimental data. Communications and Control Engineering Series. London, UK: Springer; 1997.
41. Gadkar KG, Gunawan R, Doyle FJ. Iterative approach to model identification of biological networks. BMC Bioinformatics. 2005;6(1):155.
42. Chiş OT, Banga JR, Balsa-Canto E. Structural identifiability of systems biology models: a critical comparison of methods. PLoS ONE. 2011;6(11):e27755.
43. Chiş O, Banga JR, Balsa-Canto E. GenSSI: a software toolbox for structural identifiability analysis of biological models. Bioinformatics. 2011;27(18):2610–1.
44. Becker K, Balsa-Canto E, Cicin-Sain D, Hoermann A, Janssens H, Banga JR, et al. Reverse-engineering post-transcriptional regulation of gap genes in Drosophila melanogaster. PLoS Comput Biol. 2013;9(10):e1003281.
45. Zak DE, Gonye GE, Schwaber JS, Doyle FJ. Importance of input perturbations and stochastic gene expression in the reverse engineering of genetic regulatory networks: insights from an identifiability analysis of an in silico network. Genome Res. 2003;13(11):2396–405.
46. Yue H, Brown M, Knowles J, Wang H, Broomhead DS, Kell DB. Insights into the behaviour of systems biology models from dynamic sensitivity and identifiability analysis: a case study of an NF-κB signalling pathway. Mol Biosyst. 2006;2(12):640–9.
47. Anguelova M, Cedersund G, Johansson M, Franzen C, Wennberg B. Conservation laws and unidentifiability of rate expressions in biochemical models. IET Syst Biol. 2007;1(4):230–7.
48. Srinath S, Gunawan R. Parameter identifiability of power-law biochemical system models. J Biotechnol. 2010;149(3):132–40.
49. Szederkényi G, Banga JR, Alonso AA. Inference of complex biological networks: distinguishability issues and optimization-based solutions. BMC Syst Biol. 2011;5(1):177.
50. Miao H, Xia X, Perelson AS, Wu H. On identifiability of nonlinear ODE models and applications in viral dynamics. SIAM Rev. 2011;53(1):3–39.
51. Jia G, Stephanopoulos G, Gunawan R. Incremental parameter estimation of kinetic metabolic network models. BMC Syst Biol. 2012;6(1):142.
52. Cedersund G. Conclusions via unique predictions obtained despite unidentifiability – new definitions and a general method. FEBS J. 2012;279(18):3513–27.
53. Berthoumieux S, Brilli M, Kahn D, De Jong H, Cinquemani E. On the identifiability of metabolic network models. J Math Biol. 2013;67(6–7):1795–832.
54. DiStefano III J. Dynamic systems biology modeling and simulation. Waltham, MA, USA: Academic Press; 2014.
55. Sontag ED. For differential equations with r parameters, 2r+1 experiments are enough for identification. J Nonlinear Sci. 2002;12(6):553–83.
56. Heavner BD, Smallbone K, Barker B, Mendes P, Walker LP. Yeast 5 – an expanded reconstruction of the Saccharomyces cerevisiae metabolic network. BMC Syst Biol. 2012;6(1):55.
57. Smallbone K, Simeonidis E. Flux balance analysis: a geometric perspective. J Theor Biol. 2009;258(2):311–5.
58. Liebermeister W, Uhlendorf J, Klipp E. Modular rate laws for enzymatic reactions: thermodynamics, elasticities, and implementation. Bioinformatics. 2010;26(12):1528–34.
59. Li C, Donizelli M, Rodriguez N, Dharuri H, Endler L, Chelliah V, et al. BioModels Database: an enhanced, curated and annotated resource for published quantitative kinetic models. BMC Syst Biol. 2010;4:92.
60. Wurm FM. Production of recombinant protein therapeutics in cultivated mammalian cells. Nat Biotechnol. 2004;22(11):1393–98.
61. Wittmann DM, Krumsiek J, Saez-Rodriguez J, Lauffenburger DA, Klamt S, Theis FJ. Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling. BMC Syst Biol. 2009;3(1):98.
62. Chaouiya C, Bérenguier D, Keating SM, Naldi A, Van Iersel MP, Rodriguez N, et al. SBML qualitative models: a model representation format and infrastructure to foster interactions between qualitative modelling formalisms and tools. BMC Syst Biol. 2013;7(1):135.
63. Mjolsness E, Sharp DH, Reinitz J. A connectionist model of development. J Theor Biol. 1991;152(4):429–53.
64. Reinitz J, Sharp DH. Mechanism of eve stripe formation. Mech Dev. 1995;49(1):133–58.
65. Rodríguez-Fernández M, Egea JA, Banga JR. Novel metaheuristic for parameter estimation in nonlinear dynamic biological systems. BMC Bioinformatics. 2006;7:483.
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.