Model building
A mathematical model has three important functions: first, it helps to better understand the biological phenomenon studied; secondly, it enables experiments to be specifically designed to make predictions of certain characteristics of the biological system that can then be experimentally verified; and finally, it summarizes the current body of knowledge in a format that can be easily communicated. Devising such a model involves a number of steps (Figure 1), commencing with a definition of its purpose and finishing with a preliminary working model.
The purpose of the model will condition the selection of the modeling framework and the information that should be included in the model. Only elements that might have an impact on the questions to be addressed by the model should be included. In this regard, account should be taken of the fact that reaction models can only include a small subset of all reactions taking place within a cell. Thus, assumptions must be made about the extent to which the species included in the model evolve independently of the species excluded from the model, and also about the species that are crucial for the purpose of the model. At this stage it is possible to define the network architecture and decide which type of modeling framework may be the most appropriate (deterministic generalized mass action based models, powerlaw models, stochastic models, partial differential equations, etc.)
In the next step, an initial mathematical model structure (or battery of model structures) is proposed. New experimental information must then be used to verify hypotheses, and to discriminate, if possible, among different model alternatives. The candidates will often depend on a number of unknown nonmeasurable parameters that can be computed by means of experimental data fitting (identification).
This crucial step provides the mathematical structure with the capacity to reproduce a given data set, make predictions and discriminate among different model candidates.
The last step is validation, which essentially means reconciling model predictions with any new data observed. This process is likely to reveal defects, in which case a new model structure and/or new (optimal) experiment is planned and implemented. This process is repeated iteratively until validation is considered to be complete and satisfactory.
Note that the success of this modelbuilding loop relies on being able to perform experiments under a sufficient number of conditions to extract a rich ensemble of dynamic responses, to accurately measure such responses and to iterate in order to improve the predictive capabilities of the model without a significant cost.
Since model identification is a task that consumes large amounts of experimental data, an iterative identification procedure is proposed which is intended to accurately compute model unknowns while reducing experimental cost.
Optimal identification procedure
The proposed iterative identification procedure is depicted in Figure 2.
If there are several model candidates two extra steps should be included in the loop, one to analyze structural distinguishability among candidates and the other to design experiments for model discrimination [17].
Mathematical model formulation
We will assume a biological system described by the vector of state variables x(t) ∈ X ⊂ , which is the unique solution of the set of nonlinear ordinary differential equations:
where corresponds to the external factors and θ∈ Θ ⊂ is the vector of model parameters where Θ is the feasible parameter space.
Moreover, given an experimental scheme, with n_{
e
}experiments, observables per experiment e and sampling times per experiment e and observable o, y^{e, o}∈ Y ⊂ will regard the vector of discrete time measurements, as follows:
where regards the s^{th}sampling time for observable o in experiment e. Thus every experimental (measured) data will be denoted as and similarly, the corresponding model predictions will be denoted as .
Structural identifiability analysis
Once the structure of the statespace representation, Eqns. (1)(3), has been established, the structural identifiability problem is concerned with the possibility of calculating a unique solution for the parameters while assuming perfect data (noisefree and continuous in time and space). Structural identifiability is thus related to the model structure and possibly to the type of stimulation and independent of the parameter values.
There are, at least, two obvious reasons to asses structural identifiability: first, the model parameters have a biological meaning, and we are interested in knowing whether it is at all possible to determine their values from experimental data; second, is related with the problems that a numerical optimization approach may find when trying to solve an unidentifiable model.
There are a few methods for testing the structural identifiability of nonlinear models [18, 19]: the similarity transformation approach [20], differential algebra methods [21, 22] and power series approaches [23, 24]. Unfortunately there is no method amenable to every model, thus at some point we have to face the selection of one of the possibilities. All of them present limitations related to the nonlinearity and the size of the system under consideration, meaning by size the number of state variables, the number of parameters and the number of observables. Probably the most easy to apply, provided one uses a symbolic manipulation software, are the power series expansions methods. In this regard two possibilities have been developed: the Taylor series and the generating series.
Details of the Taylor series approach can be found in [23]. The approach is based on the fact that observations are unique analytic functions of time and so all their derivatives with respect to time should also be unique. It is thus possible to represent the observables by the corresponding Maclaurin series expansion and it is the uniqueness of this representation that will guarantee the structural identifiability of the system. The idea is to establish a system of nonlinear algebraic equations on the parameters, based on the calculation of the Taylor series coefficients, and to check whether the system has a unique solution. The generating series approach[24] allows to extend the analysis to the entire class of bounded and measurable stimuli. In this case the series is generated with respect to the stimuli domain. The method requires the model to be linear in the stimuli as follows:
The observables can be expanded in series with respect to time and stimuli in such a way that the coefficients of this series are g(x, θ, t = 0) and the Lie derivatives:
where L_{
fg
}is the Lie derivative of g along the vector field f, given by:
with f_{
j
}the jth component of f.
If s(θ) regards the vector of all the coefficients of the series, a sufficient condition for the model to be identifiable is that there exists a unique solution for θ from s(θ), similarly to the Taylor series method. Note also that power series approaches assume that all the information on the progress of the observables is contained in the germ, i.e. the infinite set of power series coefficients evaluated at t = 0^{+}. If the derivatives are zero, then the germ is said not to be informative and no conclusions about identifiability can be directly drawn. Later observations (t > 0) could give more information and restrict the set of feasible values of θ. Probably the major drawback of the power series approaches is that the necessary number of power series coefficients is usually unknown. For the Taylor series approach an upper limit has been shown for bilinear and polynomial systems [25, 26]. Additionally Margaria et al. (2001) [27] showed that for the combination of the Taylor series and the differential algebra approaches, n_{
x
}+ 1 derivatives are sufficient for the case of rational systems with one observable. However there are not bounds for a general nonlinear system. In addition, solving the nonlinear system of equations resulting from the power series approaches is usually not a trivial task, particularly when the number of parameters is large and the number of observables is reduced. We therefore propose using the following identifiability tableaus to easily visualize the possible structural identifiability problems.
The tableau represents the nonzero elements of the Jacobian of the series coefficients with respect to the parameters. It consists of a table with as many columns as parameters and with as many rows as nonzero series coefficients, in principle, infinite, as shown in Figure 3.
If the Jacobian is rank deficient, i.e. the tableau presents empty columns, the corresponding parameters may be unidentifiable. Note that since the number of series coefficients may be infinite, unidentiability may not be fully guaranteed unless higher order series coefficients are demonstrated to be zero.
If the rank of the Jacobian coincides with the number of parameters, then it will be possible to, at least, locally identify the parameters. In this situation a careful inspection of the tableau will help to decide on an iterative procedure for solving the system of equations, as follows:

The number of nonzero coefficients is usually much larger than the number of parameters. In practice this means that we should select the first n_{
θ
}rows that guarantee the Jacobian rank condition. The tableau helps to easily detect the necessary coefficients and to generate a "minimum" tableau.

A unique nonzero element in a given row of the minimum tableau means that the corresponding parameter is structurally identifiable. If any, the parameters in this situation can be computed as functions of the power series coefficients and can be then eliminated from the "minimum" tableau to generate a "reduced" tableau. Subsequent reductions may lead to the appearance of new unique nonzero elements and so on. Thus all possible "reduced" tableaus should be built first.

Once no more reductions are possible, one should try to solve the remaining equations. Since it is often the case that not all remaining power series coefficients depend on all parameters, the tableau will help to decide on how to select the equations to solve for particular parameters.

If several meaningful solutions exist for a given set of parameters, then the model is said to be locally identifiable.
If the model turns out not to be completely identifiable, identifiability may be recovered by extending the set of observables, however this may not be accessible in practice. Alternatively one may consider fixing some parameters [21] or to reformulate the model.
Global ranking of parameters
Observables will depend differently on different parameters and this may be used to rank the parameters in order of their relative influence on model predictions. Such influence may be quantified by the use of parametric sensitivities.
Local parametric sensitivities for a given experiment e, observable o and at a sampling time are defined as follows:
They may be numerically computed by using the direct decoupled method within a backward differentiation formulae (BDF) based approach, as implemented in e.g. ODESSA [28].
The corresponding relative sensitivities, , can be used to asses the individual local parameter influence or importance, that is to establish a ranking of parameters. Brun and Reichert (2001) [29] suggested several importance factors, that may be generalized for the case of having several observables and experiments [16].
Of course, the values of the parameters are not known a priori, and even when optimally computed, optimal values are subject to uncertainty depending on the type of experiments and the presence of experimental noise. Consequently, the ranking for a given value of the parameters may be of limited value. Alternatively, one may compute ranking for a sufficiently large number of parameter vectors in the feasible parameter space.
The simplest approach is to apply a Monte Carlo sampling. By sampling repeatedly from the assumed jointprobability density function of the parameters and by evaluating the sensitivities for each sample, the distribution of sensitivity values, along with the mean and other characteristics, can be estimated. This approach yields reasonable results if the number of samples is quite large, requiring a great computational effort.
An alternative that can yield more precise estimates is Latin hypercube sampling (LHS). This method selects n_{
lhs
}different values for each of the parameters, which it does by dividing the range of each parameter into n_{
lhs
}nonoverlapping intervals on the basis of equal probability. Next, from each interval one value for the parameters is selected at random with respect to the probability density in the interval.
The n_{
lhs
}values thus obtained for the first parameter are then paired in a random manner (equally likely combinations) with the n_{
lhs
}values for the second and successive parameters. This method allows the overall parameter space to be explored without requiring an excessively large number of samples. The importance factors will then read:
where N_{
D
}= n_{
lhs
}n_{
e
}n_{
o
}n_{
s
}, δ^{msqr}and δ^{mabs}quantify how sensitive a model is to a given parameter considering δ^{mabs}interactions between parameters. δ^{max}and δ^{min}indicate the presence of outliers and provide information about the sign. δ^{mean}provides information about the sign of the averaged effect a change in a parameter has on the model output.
Ordering the parameters according to these criteria, preferably in decreasing order, results in a parameter importance ranking. This information may be useful to decide on reformulating the model or to fix the less relevant parameters to improve either structural or practical identifiability.
Note that the summations will, in general, hide the different effects from the different experiments and observables unless they are in the same order of magnitude. Similar analyses may be performed for experiments and observables, thus providing information on the parameters that are more relevant to a particular observable in a particular type of experiment.
Model calibration
Given the measurements, the aim of model calibration or parameter identification is to estimate some or all of the parameters θ in order to minimize the distance among data and model predictions. The maximumlikelihood principle yields an appropriate cost function to quantify such distance, which, for the case of Gaussian noise with known or constant variance, reads as the widely used weighted leastsquares function:
where collects the information related to a given measurement experimental noise.
Parameter identification is then formulated as a nonlinear optimization problem, where the decision variables are the parameters and the objective is to minimize J(θ) subject to the system dynamics in Eqns. (1)(3) and also, possibly, to some algebraic constraints that define the feasible region Θ.
This problem has recently received a great deal of attention in the literature. Jaqaman and Danuser presented a guide for model calibration in the context of biological systems [30] noting that there are two major issues in minimizing 13: first, the presence of local minima and second, the lack of practical identifiability.
To deal with first difficulty several authors have proposed the use of global optimization methods [31–34], since most of the model calibration problems related to biological models have several suboptimal solutions. Recently suggested, in addition, was the use of sequential hybrid globallocal methods [35, 36] to enhance efficiency, particularly for highly multimodal and large scale systems.
Practical identifiability analysis
As already mentioned in the introduction, practical identifiability analysis enables an evaluation of the possibility of assigning unique values to parameters from a given set of experimental data or experimental scheme subject to experimental noise. We distinguish between practical identifiability a priori, which anticipates the quality of the selected experimental scheme in terms of what we will call the expected uncertainty of the parameters, and practical identifiability a posteriori, which assesses the quality of the parameter estimates after model calibration in terms of the confidence region.
It is important to note that the major difference between the two analyses is that, a priori, we have to assume a maximum experimental error, whereas, a posteriori, since the experimental data are already available, the experimental error may be estimated either through experimental data manipulation (when replicates of the experiments are available) or after model calibration using the residuals (i.e. the differences among model predictions and the experimental data) [37].
Possibly the simplest approach to perform such analyses given a set of simulated (a priori) or real (a posteriori) experimental data is to draw contours of the cost J(θ) by pairs of parameters. This will help detect typical practical identifiability problems, such as strong correlation between parameters, the lack of identifiability for some parameters when the contours extend to infinity, or the presence of suboptimal solutions.
To quantify the expected uncertainty of the parameters and/or the confidence region, we rely on a Monte Carlobased sampling method [38–40]. The underlying idea is to simulate the possibility of performing hundreds of replicates of the same experimental scheme for a given experimental error. The model calibration problem is solved for each replicate and the cloud of solutions is recorded in a matrix. Note that, in order to avoid convergence to local solutions, an efficient global optimization method is required. The cloud of solutions is assumed to correspond to, or to be fully contained in, a hyperellipsoid. Principal component analysis applied to the 0.95  0.05 interquartile range of the cloud or matrix of solutions then provides information on hyperellipsoid eccentricity (correlation between parameters) and pseudovolume (accuracy of the parameters). The analysis of the histograms of the parameter solutions provides the mean value of the parameters (μ) and either maximum expected uncertainty (a priori) or the confidence intervals (a posteriori) for the parameters (C_{
θ
}). See details in [40].
The obtained expected uncertainty of the parameters will allow the different experimental designs to be compared a priori, i.e. without performing any experiment. The richest experiment, in terms of the quantity and quality of information, will be the one with the best compromise between pseudovolume and eccentricity.
The confidence intervals obtained for the parameters will enable a decision to be made on the need to perform further experiments to improve the quality of the parameter estimates and, thus, the predictive capabilities of the model.
Optimal experimental design
A crucial aspect of experimental data is data quantity and quality. As mentioned in the previous section, a given set of data may result in practical identifiability problems. This is why data generation and modeling have to be implemented as parallel and interactive processes, thereby avoiding the generation of data that may eventually turn out to be unsuited for modeling.
In addition, the use of modelbased (in silico) experimentation can greatly reduce the effort and cost of biological experiments, and simultaneously facilitate the understanding of complex biological systems [41–44].
The model identification loop is complemented with an optimal experimental design step. The aim is to calculate the best scheme of measurements in order to maximize the richness (quantity and quality) of the information provided by the experiments while minimizing, or at least, reducing, the experimental burden [38, 40].
The richness of the experimental information may be quantified by the use of the Fisher Information Matrix (ℱ) [37, 45], which for the case of Gaussian known or constant variance reads as follows:
where E represents the expectation for a given value of the parameters μ presumably close to the optimal solution θ*.
The optimal experimental design is then formulated and solved as a general dynamic optimization problem, see details in [40], that computes the timevarying stimuli profile, sampling times, experiments duration and (possibly) initial conditions so as to maximize a scalar measure of the Fisher Information Matrix subject to the system dynamics (Eqn. 1 and 3) and to other algebraic constraints associated with experimental limitations.
Regarding the selection of the scalar measure of the ℱ, several alternatives exist all of them related to the eigenvalues of the ℱ and thus related to the shape and size of the associated hyperellipsoid. The most popular are probably the Doptimality and Eoptimality criteria, the former corresponding to the maximization of the determinant of the ℱ and the latter corresponding to the maximization of the minimum eigenvalue. From previous studies [40] it may be concluded that the Eoptimality criterion offers the best quantityquality compromise for the information, particularly for cases where the parameters are highly correlated or the sensitivities with respect to the parameters are highly uneven; otherwise Doptimality may be more successful.