Quantifying uncertainty, variability and likelihood for ordinary differential equation models

Background In many applications, ordinary differential equation (ODE) models are subject to uncertainty or variability in initial conditions and parameters. Both, uncertainty and variability can be quantified in terms of a probability density function on the state and parameter space. Results The partial differential equation that describes the evolution of this probability density function has a form that is particularly amenable to application of the well-known method of characteristics. The value of the density at some point in time is directly accessible by the solution of the original ODE extended by a single extra dimension (for the value of the density). This leads to simple methods for studying uncertainty, variability and likelihood, with significant advantages over more traditional Monte Carlo and related approaches especially when studying regions with low probability. Conclusions While such approaches based on the method of characteristics are common practice in other disciplines, their advantages for the study of biological systems have so far remained unrecognized. Several examples illustrate performance and accuracy of the approach and its limitations.


Background
Ordinary differential equations (ODEs) are commonly used for modeling biological and biochemical systems. ODE models are often subject to considerable uncertainty and/or variability in both initial conditions and parameters [1][2][3][4]. Particularly in the case of nonlinear ODEs, it is essential to have efficient and accurate techniques for analyzing the effects of uncertainty and variability on the dynamical behavior. The effect of variations in the input on model behavior (output), the model sensitivity, can be analyzed in various ways. Most numerical approaches address the problem either by computing local sensitivity indices (partial derivatives of the solution with respect to the input variables) [5,6], by solving the ODE for a statistically large ensemble of random or quasi-random input values [7][8][9], or by approximating the functional relationship of the input and output [10][11][12]. When uncertainty can be narrowed down to 'small' perturbations, it is often sufficient to study its effects locally. It is, however, difficult to determine a priori if the uncertainty is small, and in many biological applications the assumption of small perturbations either is questionable (for example in pharmacokinetics [1]) or has been shown to be wrong [13,14].
To study those effects globally the sensitivity analysis problem can be formulated in terms of an ODE with random initial conditions. The task then is to determine the probability density function (pdf), or some features of the pdf, at a time t > 0. Approaches to global sensitivity analysis typically rely on Monte Carlo-type random or quasirandom approximations to the output distribution [7,8].
In such settings, information on the probability distribution is not directly accessible but encoded in the position and denseness of the sampling points. The quality of the approximation at a point will thus depend on the coverage of a surrounding region of that point, and involves some density estimation steps that might require problem specific knowledge [15]. Investigating regions of low probability with high accuracy generally requires prohibitively large numbers of sample points (for example, 100 sampling points in a region with 1/1000 = 0.1% of the total probability in expectation requires 100 000 sampling points). In [16], a global sensitivity analysis of ODEs was proposed based on recasting the problem in terms of a first-order partial differential equation (PDE) that describes the evolution of the associated pdf (this approach is also called 'stochastic sensitivity analysis' [5,6,16]). Knowledge of the entire pdf, in contrast to some specific observables (mean, variance etc.), fully characterizes the impact of uncertainty and variability. The PDE view also facilitates model assessment and parameter estimation studies, since the model likelihood naturally offers a way of assessing deviations of the model output from experimental data. The availability of the likelihood then gives access to a wide range of wellestablished numerical tools for model assessment and parameter estimation [15,17,18].
To numerically solve the particular class of PDEs that arise in ODEs with random initial data, finite difference schemes are commonly applied [5,6,16]. These methods typically become computationally prohibitive beyond three dimensions. Moreover, the numerical treatment of PDEs is generally less accessible to practitioners. The unscented Kalman filter (UKF) for time-continuous systems [19], originally designed for the approximate solution of the closely related Fokker-Planck equation, has also been applied to solve this PDE and to obtain sensitivity estimates of ODE models [20]. The UKF yields normal approximations of the output pdf, where the estimated mean and variance are second-order approximations of the true output mean and variance. As we illustrate in our examples, non-linear ODE models can easily give rise to strongly non-normal output distributions, even if the initial distribution is normal, such that in this case an analysis using the UKF will give misleading results.
It is however well-known that first-order PDEs can be solved using the method of characteristics [21,22]. With the method of characteristics, the pdf can be computed along trajectories of the system by solving the original ODE with an additional dimension for the density. This method thus provides a bridge from the intricate PDE description back to the ODE setting, where numerical solutions are readily accessible to practitioners. Propagating points along the trajectories of the system is a common feature of the method of characteristics, Monte Carlo methods and the unscented Kalman filter. In contrast to the other methods, however, the method of characteristics also propagates information (the value of the PDF) along the trajectories, and these values are exact up to the accuracy of the ODE solver. The additional computational costs of solving the extended ODE are negligible as the extra dimension does not necessitate additional sample/discretization points. In fact, since the pdf at a given state value is directly computable using the method of characteristics and its accuracy at a point does not depend on the denseness of sampling points in the surrounding region, considerably fewer discretization points are sufficient to obtain accurate estimates as compared to Monte Carlobased methods [23].
Although the method of characteristics is known and widely used in other fields such as meteorology, e.g. [23], its applicability and potential has not been adequately recognized among the biological modeling community. In this article we review the formulation of ODEs with random inputs in terms of the PDE description and its solution via the method of characteristics. We demonstrate the use of the method of characteristics for sensitivity analysis as well as for likelihood-based model assessment and parameter estimation and illustrate its benefits and limitations by means of numerical case studies.

ODEs with uncertain or variable input
Consider an ODE of the form Typically, x belongs to some extended state space comprising the state variable z ℝ n and the parameters p ℝ m of the system with d = n + m. That is where f(z|p) describes the dynamics of z given the parameter values p, which are assumed to remain constant in time. For example, z might denote the concentration of a molecular species of a metabolic network or signalling pathway, and p the associated reaction rate constants. Under the assumption that F : ℝ d ℝ d is continuously differentiable with respect to x, the initial value problem (1) is known to have a unique solution x(t) for t ≥ 0 [24].
Uncertainty and variability in the model input can be modeled by assuming that x 0 = X 0 is a random variable with pdf u 0 : ℝ d ℝ. Consequently, the solution {X t } t≥0 of the initial value problem (1) is a stochastic (Markov) process. For any t ≥ 0, let us denote by u(t) = u(t, ·), where F i denotes the i-th component of F , and divF is the sum of the partial derivatives of F i (equivalently, the trace of the Jacobian DF). Note that (3) is the Fokker-Planck equation corresponding to a stochastic differential equation with zero diffusion [26].
Computing the pdf along solutions of the ODE It is well-known that first-order PDEs of the form (3) can be solved using the method of characteristics [21,22], which in our case is identical to the solution of (3) along the solution of the initial value problem (1). Define r(t) = u(t, x(t)), where x(·) denotes the solution of the initial value problem (1). Applying the chain rule, r obeys the ODE: is the gradient of u, and using (3) we finally obtain from (4) The PDE (3) can thus be solved pointwise for each x 0 by solving the original ODE (1) together with an extra dimension for the density r, i.e., with initial conditions x(0) = x 0 and r(0) = u 0 (x 0 ). Since x ℝ d , the new system (6) has d + 1 dimensions. The computational effort for solving the extra dimension is negligible compared to the information gained. The distribution u(T) at time T > 0 can be approximated by the following two steps: (I) Discretize the region of interest in the state space ℝ d , resulting in discretization points ξ i (0), i = 1,...,N.
This procedure directly yields the density values u(t, ξ i (t)) along the trajectories ξ i (t), t [0, T] up to the accuracy of the ODE solver. In the subsequent examples we will further show how to modify step (I) in order to investigate the distribution on particular regions of the state and parameter space.
In comparison, Monte Carlo-based methods require density estimation subsequent to solving the ODE for the sample points: (i) Sample the initial distribution u 0 , resulting in sampling points ξ i (0) ℝ d , i = 1,...,N.
(ii) For each sampling point ξ i solve the original ODE (1) to obtain ξ i (t) at t = T.
(iii) Estimate the density from the propagated points ξ i (T), i = 1,...,N; for example, by considering a neighborhood B(x) of a point x and approximating the pdf by the relative frequency, i.e., u(T, In contrast to the method of characteristics, the quality of the approximation u(T, x) at a single point x depends on the total number of sample points [15]. To obtain good estimates of the pdf in regions with low probability, Monte Carlo-based methods typically require very large sample sizes. The quality of the approximation may moreover depend on the structure of the pdf to be estimated, and might therefore necessitate problem specific knowledge. An example is given below, where the pdf is concentrated on a non-linear manifold due to fast, contracting directions.
We illustrate the advantages of the method of characteristics for sensitivity analysis of ODE models by two examples in gene expression. We will see that descriptors such as mean and variance may provide only poor information about the pdf. Our first example demonstrates the benefits of the method of characteristics in terms of an efficient computation of the pdf in regions with low probability. In the second example, the pdf contracts onto a lowerdimensional manifold of the state space, and the method of characteristics provides the density values directly along that manifold. In a third example we further show how the method can be used to compute the likelihood of an ODE model and thus facilitates comparison to experimental data. For the sake of simplicity, we choose normal initial distributions to describe state and parameter uncertainty and variability. The method of characteristics, however, provides a general strategy to compute model uncertainty/variability and likelihood for arbitrary distributions of initial values and parameters with the only restriction that the associated pdf u must be continuously differentiable. The choice of normal initial distributions also illustrates that the assumption of normal output distributions-underlying the UKF and least squares parameter estimation-is easily violated for non-linear ODEs, even if the initial distribution is normal.

Examples
Sensitivity analysis and the impact of variability Example 1 (analyzing regions of low probability) Consider a protein X activating its own expression by cooperatively binding to the promoter that positively regulates its own expression (illustrated in Figure 1). Assuming that X is diluted due to cell-growth with rate constant k d > 0, the concentration x of the protein is modeled by the ODE [27] The first summand describes the activation of gene expression in terms of a Hill function [27], where V max > 0 denotes the maximal expression rate, K > 0 the protein concentration corresponding to the half-maximal expression rate, and b > 0 describes the cooperativity of promoter activation. A saturable synthesis was chosen to reflect a finite gene copy number encoding for x. The second summand describes the dilution of x. An initial density u 0 may, for example, represent the abundance of the protein X in individuals of a population of cells. Figure 2(a) shows the density u(t) in the interval t [0, 50] by solving (6), with F as in (7), for a normal initial density u 0 with mean μ = 2 (in [ mol volume ]) and variance s 2 = 0.2 (in [ mol volume 2 2 ]), and with Most of the distribution is translated linearly, since for large x the Hill-type activation term in (7) is approximately constant. For smaller values of x, however, the right hand side of (7) is strongly nonlinear. For values of x close to zero the dynamics are very slow, which causes the formation of a heavy tail. This is a characteristic feature of bimodality associated with positive feedback loops. Due to only few discretization points on the interval x [0, 10] at t = 50 the structure of the heavy tail is only poorly resolved. In contrast to conventional Monte Carlo methods, the method of characteristics allows for a refined computation of the density on any sub-interval of interest by the following two steps: (I) Discretize the sub-interval of interest, and solve the original ODE (7) for each discretization point ξ i (T), i = 1,...N, backward in time from t = T to 0 to obtain ξ i (0).
The two-step procedure is illustrated in Figure 3. This way the distribution can be studied on arbitrary regions of the state and parameter space, and no subsequent normalization is required. We used the two-step procedure to obtain an improved resolution of the heavy tail (Figure 2(b)) and observe the formation of a second mode close to the origin. As an interpretation, this may imply that for this part of the population, X does not reach a certain threshold concentration within the given time interval (which, in turn, may be necessary to activate some other pathway). In other applications, such as toxicological risk assessment studies [28], the information may analogously be used to determine the percentage of a population that exceeds or remains below a certain toxicological threshold. To illustrate that the method of characteristics can be applied to any continuously differentiable pdf, not only normal pdfs, we repeated the above computations for an initial exponential distribution with mean μ = 2 (Figure 2(c) &2(d)). With standard Monte Carlo-based sensitivity approaches such localized information is difficult to obtain, when the region of interest has only low probability. The heavy tail in Figure 2(b) has a total probability of approximately 0.001, which means that in expectation only 0.1% of the Monte Carlo sampling points will lie in the interval [0, 10]. With the two-step procedure we used 100 discretization points to approximate the heavy tail. Compared to our approach, it would require 100 000 Monte Carlo sample points to expect the same coverage on the interval [0, 10], and the subsequent step of density estimation required by Monte Carlo methods further impacts the approximation quality [15].
Apart from variability in the initial concentration x, we can additionally account for variability in the parameter values. We computed the density for a state space extended according to (2) by the cooperativity b, by the maximal expression rate V max and by both b and V max , i.e., with extended state space variables (x, b)′, (x, V max )′ and (x, b, V max )′, respectively. The initial distribution was assumed to be a joint normal distribution, where x (0) had mean and variance as before, and the means of V max and b were set to 1 and 4, respectively, each with variance 0.025. Using the above two-step procedure, we computed the densities at T = 50. Figure 4 Numerically, we discretized the above integral using the midpoint rule.
Comparison of the distributions indicates that variability in the cooperativity b has minor impact on the final variability of the protein concentration x (gray dasheddotted vs. black dashed & solid gray vs. dotted black line). The corresponding joint distributions of (x, V max )′ and (x, b)′ are shown in Figure 4(b) as well as the twodimensional marginal distributions for the threedimensional case obtained by integrating only b or V max , respectively. Same as in the one-dimensional marginal distribution, it can be seen that the variability in protein concentration is mainly dominated by the variability in the maximal expression rate V max .
The numerical integration (compare eq. (8)), necessary to visualize multivariate densities for d ≥ 3 or to compute observables such as mean and variance, is currently the computationally limiting step in the application of the method of characteristics, since it requires a regular (uniform) discretization of the state and parameter space, which becomes prohibitive in high dimensions. For low-and moderate-dimensional systems of ODEs, however, the method of characteristics provides a more efficient and equally simple alternative to conventional approaches for studying uncertainty and variability of ODE systems: The density information obtained via the  method of characteristics is expectedly more accurate than estimates obtained with Monte Carlo methods [23] and considerably richer than a simple characterization of the variability/uncertainty by means of certain indicators (e.g. sensitivity indices, or variance decompositions [5][6][7]). It moreover facilitates further statistical analysis such as model assessment and parameter estimation, e.g., by means of information theoretical approaches [17,18], as illustrated later in Example 3 for the case of parameter estimation.
Example 2 (variability along lower-dimensional manifolds) Consider the genetic toggle switch [29], where two proteins X 1 and X 2 mutually repress the other protein's expression (illustrated in Figure 5). In [29], the concentrations x 1 and x 2 of X 1 and X 2 , respectively, are modeled by the two-dimensional system The parameter a 1 > 0 represents the effective expression rate of protein X 1 , and b 1 > 0 describes the repression cooperativity of the promoter that regulates the expression of X 1 by X 2 . Analogously, a 2 and b 2 describe the effective expression rate of protein X 2 and its promoter's cooperativity of repression by X 1 , respectively. The parameter q 1 (q 2 ) corresponds to the concentration of X 2 (X 1 ) that represses the promoter activity of X 1 (X 2 ) by 50%. The parameter k d again denotes the dilution rate constant.  . We computed the density u(t) solving (6) with F as in (9). At t = 10, the majority of the probability is concentrated on the slow manifold of the vector field with large variance along the manifold (see Figure  6(b)). Such steep distributions on lower-dimensional manifolds pose problems to many other methods: Methods based on the estimation of mean and co-variance fail to describe the true shape of the distribution. Most PDE solvers have numerical problems with such steep gradients. Monte Carlo-based density estimation may yield a too coarse-grained approximation of the true density if knowledge of the manifold is not provided a priori ; that is, to arrange the bins of a histogram or for kernel density estimation the centers of the kernel functions along the manifold. Using the method of characteristics we avoid these problems: the steep gradients do not pose any problems to the independently computed trajectories, which directly yield the density values on the attractor manifold. The method thus does not require problem specific ingenuity.

Deterministic models in a likelihood setting for comparison with experimental data
So far we have discussed the use of the method of characteristics to study the sensitivity of ODE models. It however also offers benefits when comparing model output with experimental data for model assessment, such as validation/falsification and selection between different models, or parameter estimation. An exact match of the deterministic model with the data is unlikely, and a quantification of the mismatch remains a critical issue. Some numerical approaches are based on verifying that the experimental data lies within regions of the state space that are reachable with certain parameter sets of the (often linearized) ODE model [30,31]. Most other approaches assess the mismatch based on root mean square deviations and select a model or parameters based on a minimization of these errors. Such least squares approaches are based on the assumption that deviations are due to additive Gaussian noise, usually assumed to be identical and independent at all points in time [32]. As we have seen in the previous examples, a normal distribution can typically not be expected for general, nonlinear ODE models, even if the initial distribution is assumed to be normal. In addition, classical least squares approaches do not allow taking into account prior information on the initial condition or on the parameter to be estimated. Stochastic approaches offer a natural way of assessing deviations of the model output from data based on the likelihood function. Given a set of data points D = {ξ 1 ,...,ξ N }, the likelihood of a model is defined as the probability that the model predicts this data. For continuous state space models M, the likelihood L(M|D) given D is defined via the pdf by the product of its values at the data points where u(·|M) denotes the density function of the output distribution of the model M. Based on the likelihood, there are many methods available for model assessment and parameter estimation [15,18]. We can directly use the method of characteristics to efficiently compute the likelihood of an ODE model for given data points. We first consider the case that parameters are known and only initial conditions are affected by uncertainty. Given prior knowledge of this uncertainty defined by the initial density u 0 , and further given data points D = {ξ 1 (T),...,ξ N (T)} at a time T > 0, the likelihood for each single data point can be computed analogously to the two-step procedure in Example 1: (I) Solve the ODE (1) for each of the N data points ξ i (T) ℝ d , i = 1,...,N, backward in time from t = T to 0 to obtain ξ i (0).
In accordance with (10) For sake of clarity we only consider one data point. For several data points the same procedure as described below applies, and the final likelihood is given by the product of the single-data likelihood values. As we are interested in the likelihood of different values of V max , we consider the autoregulation model (7) extended according to (2) by V max . We apply the above two-step procedure to a representative ensemble {υ 1 ,...,υ N } of values of V max . For each pair (ξ(T), υ i ) the backward-solution yields a different value (ξ(0), υ i ). Given prior knowledge in terms of a joint pdf u 0 for x 0 and V max , the forward-solution of (6) with initial conditions ((ξ i (0), υ i ), u 0 (ξ i (0), υ i )) then yields the likelihood values u(T, (ξ(T), υ i )) associated with each v i , i = 1,...,N.
We computed the likelihood of a set of equidistant values of V max [0, 2] using the same parameter values as in Example 1 (shown in Figure 7) for two different scenarios of prior information: (a) a joint normal distribution of x 0 and V max with parameters μ = (2, 1)' and Σ = diag{0.2, 0.01} (solid gray line), and (b) a joint distribution of x 0 and V max , where x 0 is normally distributed with  (2, 0.2) and V max is independently uniformly distributed on the interval [0, 2] (dashed black line). The first scenario accounts for prior knowledge of x 0 and V max , where a more or less precise knowledge of V max is given (since s 2 (V max ) is small). Accordingly, the maximum-likelihood estimate is V max * ≈ 1 close to the prior mean of V max . In the second scenario no prior information on V max was imposed (except for its constraint within [0, 2]). The maximum-likelihood estimate of V max is therefore solely determined by the value of V max that yields the initial value closest to μ(x 0 ) = 2. Since the data point ξ(t = 20) = 5 is relatively unlikely for larger V max (compare with Figure 2

Conclusions
Studying the effects of uncertainty and variability in initial values or parameters of ODE models can be computationally intensive, since it generally involves solving the system a large number of times. The method of characteristics offers a simple yet accurate alternative to conventional approaches for small-and moderatedimensional systems. The approach does not assume a particular shape of neither input nor output distribution, it only requires the pdf to be sufficiently smooth (continuously differentiable) and yields density values that are exact up the accuracy of the ODE solver used. Our first two examples illustrate how a precise characterization of the model uncertainty/variability can be obtained with only few trajectories. In this context we also demonstrated that the analysis can be efficiently restricted to certain sub-regions of the state/parameter space. One limitation of the two-step procedure used for the latter analysis is that for chaotic models the backward-forward solution of the ODE is ill-conditioned [23]. In such cases one may resort to approximate techniques as the UKF [20], but care must be taken that the assumption of the Kalman filter, that the distribution remains approximately normal, is satiesfied. Another limitation, currently the main one, is the need for uniform grids when dimensions are to be integrated from the final density. But we anticipate that the method of characteristics will prove useful in the context of error control for approximate solution methods of eq. (1) or (3) such as Monte Carlo or the unscented Kalman filter in higher dimensions by providing exact values of the pdf at particular points in state space. As another application we considered the comparison of model results with experimental data. For deterministic models numerical approaches typically rely on root mean squared errors to quantify deviations. Their minimization can be interpreted as the maximum-likelihood estimate based on the assumption that deviations are normally (typically independently and identically) distributed at all times. While being a simple assumption, for general nonlinear ODE models it is rarely expected to hold. In the third example we described a simple framework, where the method of characteristics was applied to maximum-likelihood parameter estimation based on a distribution that accounts for prior knowledge of parameters and initial values and for the system dynamics.
We provide MATLAB files illustrating the method of characteristics in Additional file 1.

Additional material
Additional file 1: MATLAB files illustrating the usage of the method of characteristics.