A variational approach to parameter estimation in ordinary differential equations
© Kaschek and Timmer; licensee BioMed Central Ltd. 2012
Received: 25 October 2011
Accepted: 25 July 2012
Published: 14 August 2012
Ordinary differential equations are widely used in the fields of systems biology and chemical engineering to model chemical reaction networks. Numerous techniques have been developed to estimate parameters such as rate constants, initial conditions or steady-state concentrations from time-resolved data. In contrast to this countable set of parameters, the estimation of entire courses of network components corresponds to an innumerable set of parameters.
The approach presented in this work is able to deal with course estimation for extrinsic system inputs or intrinsic reactants, neither of which is constrained by the reaction network itself. Our method is based on variational calculus, which is carried out analytically to derive an augmented system of differential equations that includes the unconstrained components as ordinary state variables. Finally, conventional parameter estimation is applied to the augmented system, resulting in a combined estimation of courses and parameters.
The combined estimation approach takes the uncertainty in input courses correctly into account. This leads to precise parameter estimates and correct confidence intervals. In particular, this implies that small motifs of large reaction networks can be analysed independently of the rest. Through the use of variational methods, elements from control theory and statistics are combined, allowing for a future transfer of methods between the two fields.
Keywords: Parameter estimation, Calculus of variations, Boundary value problem, Optimal control, Reaction networks, Ordinary differential equations, Statistical inference
Frequently, signalling pathways and chemical reaction networks in systems biology are modelled by ordinary differential equations (ODE). In many cases, the reaction networks are open systems comprising external inputs like drug stimuli. The system is then modelled by a non-autonomous ODE.
Similarly, modules of reaction networks are open systems. The nodes they have in common with the surrounding network are not, or not entirely, determined by the module species. They can be considered as intrinsic inputs, and again the system can be modelled by a non-autonomous ODE. An example of such cross-talk can be found in [1].
While reaction rates and initial reactant concentrations form a countable set of parameters, inputs correspond to an innumerable set of parameters since, in general, every function of time is possible as input and each function value at each time point is a free parameter. Commonly, if measurements for the inputs are available, non-parametric estimates like smoothing splines are employed to describe the input data [2, 3]. Given the input, an objective function depending on rate parameters and initial values is defined, and its minimum is approached by numerical optimization methods. In this way, the problem of infinitely many parameters is avoided. As we will show, one problem associated with this approach is that it does not account for the uncertainty present in the input. As a consequence, estimated parameter confidence intervals do not cover the actual variability, i.e. they are too small.
Therefore, it is preferable to parametrize the input which is possible if certain knowledge about origin and processes underlying inputs is available. This enables a reasonable choice of basis functions and the parametrization becomes finite. Following this approach, the problem of erroneous confidence intervals is circumvented presuming that the input model is correct. However, this assumption is problematic if only sparse information about the inputs and few measurement points are available.
We propose to approach the problem of input parametrization by calculus of variations. In the Results section, the system’s objective function used for ordinary parameter estimation is extended to a functional to be minimised. The original non-autonomous ODE is transformed into an augmented autonomous ODE. The result is interpreted and applied to simulated data.
Results and discussion
Derivation of the augmented ODE system
The objective function of eq. (2) penalizes distances between species measurements and model predictions y_μ(t_i, [x], p, y_0) at time points t_i quadratically, weighted by the measurement uncertainties. In addition, input measurements are compared with the input function values x_ν(t_i). In particular, χ2 is already a functional of [x]. In the case of Gaussian noise, eq. (2) coincides with the maximum likelihood estimator.
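As an illustration only (not the paper's implementation), the weighted least-squares structure of such an objective can be sketched as follows; the toy arrays and uncertainty values are hypothetical:

```python
import numpy as np

def chi2(y_model, y_data, sigma_y, x_model, x_data, sigma_x):
    """Quadratic penalty on species residuals (y) and input residuals (x),
    each weighted by its measurement uncertainty, mirroring the structure
    of a chi-square objective."""
    res_y = (y_data - y_model) / sigma_y
    res_x = (x_data - x_model) / sigma_x
    return np.sum(res_y ** 2) + np.sum(res_x ** 2)

# Toy example: both species residuals are one sigma, the input fits exactly.
val = chi2(np.array([1.0, 2.0]), np.array([1.5, 2.5]), np.array([0.5, 0.5]),
           np.array([0.3]), np.array([0.3]), np.array([0.1]))
print(val)  # → 2.0
```

Each term contributes the squared residual in units of its standard deviation, which is what makes the minimum coincide with the maximum likelihood estimate under Gaussian noise.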
The trajectory variation δy_h is derived from eq. (1) and expressed by variation of constants: Φ(t) denotes the fundamental system of the homogeneous linear problem with the matrix ∇_y f of first derivatives of f with respect to y, and ∇_x f h constitutes the inhomogeneity. Furthermore, a weighted residual function is defined analogously. For a detailed derivation, see sections 2-5 in Additional file 1.
The right-hand sides of eqs. (6, 7) depend on the state variables y, u, and x, the latter being constrained by eq. (5). In particular, if the input enters the dynamics of the reaction network linearly, ∇_x f is independent of x and eq. (5) can be solved for x directly. However, even in the non-linear case, the implicit function theorem provides the possibility to check locally whether eq. (5) uniquely defines x(u, y). For the discussion of a global version of this statement, see section 6 of Additional file 1.
From the definition of u it follows that u vanishes at the final time point T, i.e. u(T) = 0. Hence, the augmented ODE system has to satisfy the two-point boundary conditions y(0) = y0 and u(T) = 0. This constitutes a remarkable difference to the original initial value problem.
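Such two-point boundary value problems can be solved with standard numerical routines. The following sketch solves a toy tracking problem — not the paper's augmented system; the dynamics, target course and weights are assumptions for illustration — with the same kind of conditions y(0) = y0 and u(T) = 0:

```python
import numpy as np
from scipy.integrate import solve_bvp

# Toy problem: minimise ∫ (y - d(t))^2 + x^2 dt subject to y' = -k*y + x.
# Pontryagin gives x = -u/2 and the costate equation u' = -2*(y - d) + k*u,
# with the two-point boundary conditions y(0) = y0 and u(T) = 0.
k, y0, T = 1.0, 2.0, 5.0
d = lambda t: np.exp(-0.5 * t)  # hypothetical target course

def rhs(t, Y):
    y, u = Y
    x = -u / 2.0  # input recovered from the stationarity condition
    return np.vstack([-k * y + x, -2.0 * (y - d(t)) + k * u])

def bc(Ya, Yb):
    return np.array([Ya[0] - y0, Yb[1]])  # y(0) = y0, u(T) = 0

t = np.linspace(0.0, T, 50)
sol = solve_bvp(rhs, bc, t, np.zeros((2, t.size)))
print(sol.status, sol.y[0, 0])  # 0 on success; y(0) ≈ 2.0
```

The collocation solver handles the boundary conditions at both ends simultaneously, which is precisely what distinguishes this setting from an initial value problem.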
Starting from a dynamic system with inputs and measurements for both state variables and inputs, we have derived differential equations for both of them. The original initial value problem has been transformed into a boundary value problem which is to be solved numerically. The solution trajectories Y(t|p, y0) = (y(t|p, y0), x(t|p, y0)) minimise the χ2 functional for given dynamic parameters p and initial values y0. However, there is still notable freedom in the choice of the data and uncertainty representations, denoted by S_y, S_x and S_σ, which determines the meaning of the solution trajectories.
The data prior is chosen as a sum of Gaussians located around the measurement points. The parameter τ is a measure of the correlation length of the data prior.
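One plausible construction of such a prior (a sketch; the kernel normalization is our assumption, not taken from the paper) is a Gaussian-kernel average of the measurements with bandwidth τ:

```python
import numpy as np

def data_prior(t, t_meas, x_meas, tau):
    """Sum of Gaussians of width tau around the measurement points,
    normalized so that for small tau the prior interpolates the data."""
    w = np.exp(-0.5 * ((t[:, None] - t_meas[None, :]) / tau) ** 2)
    return (w @ x_meas) / w.sum(axis=1)

t_meas = np.array([0.0, 1.0, 2.0])
x_meas = np.array([1.0, 0.5, 0.2])
vals = data_prior(np.array([0.0, 1.0]), t_meas, x_meas, tau=0.1)
print(vals)  # ≈ [1.0, 0.5]: small tau reproduces the data points
```

For τ much smaller than the spacing of the measurement points the prior follows the data tightly; larger τ smooths across neighbouring points, matching its interpretation as a correlation length.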
Conventional parameter estimation is then performed over the finite-dimensional parameter space of p and y0. Note that the time-discrete χ2 function and the time-continuous χ2 functional do not coincide exactly. Thereby, different measures of optimality are applied to input functions and parameters. This difference is resolved in the asymptotic case of infinitely many measurement points.
The distinction between parameter estimation and input reconstruction has further implications for the estimation of uncertainty bounds. Confidence intervals can only be assigned to the dynamic parameters and initial conditions. In contrast, the input becomes a regular state variable by construction. For state variables, the confidence region in parameter space needs to be mapped to state space by prediction, i.e. by evaluating the model for different parameter values within the confidence region. This can, for example, be realized by parameter sampling using MCMC methods. Alternatively, profile likelihoods can be employed.
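Mapping a parameter confidence region to state space by prediction can be sketched as follows; the Gaussian parameter approximation and the exponential toy model are assumptions made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample parameter values from a hypothetical Gaussian approximation of the
# estimate k_hat with standard error k_se, and predict one trajectory per draw.
k_hat, k_se = 0.8, 0.1
t = np.linspace(0.0, 5.0, 50)
draws = rng.normal(k_hat, k_se, size=500)
pred = np.exp(-np.outer(draws, t))  # toy model y(t) = exp(-k*t)

# Pointwise 68% prediction band from the sampled trajectories.
lo, hi = np.percentile(pred, [16, 84], axis=0)
print(lo.shape, bool(np.all(lo <= hi)))
```

The band in state space directly reflects the parameter uncertainty; an MCMC sample of the posterior could replace the Gaussian draws without changing the rest of the computation.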
Eq. (11) introduces a diagonal matrix of new inputs. By construction, x cannot change sign over time. The choice S_D(t) = 0 for all t reflects a constant input prior with a penalized first derivative and can serve as a starting point. Besides enforcing positivity of the input, the extension by eq. (11) provides a workaround for dealing with non-linear inputs because the new input variables d_ν enter linearly and the old inputs x_ν become regular state variables.
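The positivity argument can be checked numerically; the sample values of the new input d and the Riemann-sum integration are assumptions for illustration:

```python
import numpy as np

def positive_input(x0, d_vals, dt):
    """x solves dx/dt = d(t) * x, so x(t) = x0 * exp(integral of d); here
    the integral is approximated by a cumulative Riemann sum. For x0 > 0
    the course stays positive no matter how d behaves."""
    return x0 * np.exp(np.cumsum(d_vals) * dt)

d_vals = np.array([-1.0, -1.0, 2.0, 0.5])  # hypothetical new-input samples
x = positive_input(1.0, d_vals, dt=0.1)
print(bool(np.all(x > 0)))  # True: positivity by construction
```

Because x is an exponential of an integral, sign changes of d only make x grow or shrink; they can never push it through zero.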
where g_μν and g_μ,0 do not depend on the input variables, which have been transformed by a coordinate transformation φ. Examples could be φ(x) = x² for a bimolecular reaction or a corresponding transformation for an enzymatic reaction. The possibility of a change of variables covers a broad range of biologically relevant reaction networks.
Although computation for linear input is remarkably faster than for non-linear input, it is still slower than solving an initial value problem. On the other hand, the solution of the boundary value problem is already optimal with regard to the input course. Therefore, computing time has to be compared to the time a parameter optimization algorithm takes to estimate the parametrized input course. The comparison will strongly depend on the number of parameters that are necessary to parametrize the partially unknown input. So far, there has not been a comprehensive study comparing the two methods.
Application to simulated data
with the auxiliary state variables u_A, u_B, the data representations S_A, S_B, and the corresponding uncertainty representations. The input x enters the dynamics of the other state variables. Several input functions x have been chosen for data generation, among them an exponential decay, x ∼ e^(−αt), an activation dynamics with a slow decay after a fast increase, x ∼ e^(−αt) − e^(−βt) with α > β, and a Gaussian input. The example is numerically implemented in C and in R. Optimization is performed by a Gauss-Newton algorithm for non-linear least-squares estimation.
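A minimal stand-in for such a fit (not the paper's C/R implementation; the model equations, the input and all parameter values are simplified assumptions) could look like this:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# Hypothetical two-species chain driven by an exponential-decay input:
# A' = k1 * x(t) - k2 * A,  B' = k2 * A,  with x(t) = exp(-t).
def simulate(p, t_obs):
    k1, k2 = p
    rhs = lambda t, y: [k1 * np.exp(-t) - k2 * y[0], k2 * y[0]]
    sol = solve_ivp(rhs, (0.0, t_obs[-1]), [0.0, 0.0], t_eval=t_obs, rtol=1e-8)
    return sol.y  # rows: A, B

t_obs = np.linspace(0.1, 5.0, 20)
data = simulate([1.0, 0.5], t_obs)  # noise-free data for the sketch

# Gauss-Newton-type non-linear least squares on the stacked residuals.
fit = least_squares(lambda p: (simulate(p, t_obs) - data).ravel(),
                    x0=[0.5, 0.2])
print(np.round(fit.x, 3))  # ≈ [1.0, 0.5] for noise-free data
```

With noisy data and the variational treatment of x, the residual vector would additionally contain the input residuals, but the outer least-squares loop stays the same.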
The purpose of this section is to compare parameter estimation for the variational and the fixed input approach. The input data prior, i.e. the smoothing spline through the simulated input data points, is employed as input function for the fixed input approach.
A rather different situation is depicted in Figure 2A-B. The input x is measured at only four time points, leading to a poor data prior, shown as a green dotted line in Figure 2A. As before, the species A and B have been measured at 20 time points. Most of the information about the dynamics of the input is encoded in these measurements. The correlation time τ has been chosen much smaller than the distance between time points, allowing for much interstitial variability. The resulting trajectories Y after parameter estimation are shown as dotted lines in Figure 2B. The true input curve is reconstructed almost entirely. The noticeable fluctuations are caused by coincidental noise correlations between species A and B: simultaneous deviations from the true course in opposite directions lead to abrupt excursions of the reconstructed input.
Also for this set-up, 1000 noise realizations have been generated for the comparison of the variational and the fixed input approach. The parameter and initial value distributions for both approaches are shown in Figure 2C. Since the true input can be reconstructed, the variational approach is able to estimate all parameters accurately. In contrast, when the input is fixed to the apparent input data prior, parameter estimation leads to biased estimates.
Finally, we investigated the coverage probability of the confidence intervals derived from the variational and the fixed input approach: for each simulated data set, parameter estimation is performed, confidence intervals are computed, and it is recorded whether the true parameter value lies within the 68%/90% confidence interval. This information is accumulated over many runs of data generation.
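The coverage computation itself can be sketched with a deliberately simple estimation problem (estimating a Gaussian mean, an assumption made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def coverage(n_runs=2000, n_obs=20, true_mu=1.0, sigma=0.5):
    """For each noise realization, estimate the mean, build the 68% interval
    (one standard error wide on each side), and count how often it contains
    the true value."""
    hits = 0
    for _ in range(n_runs):
        data = true_mu + sigma * rng.standard_normal(n_obs)
        se = sigma / np.sqrt(n_obs)
        hits += abs(data.mean() - true_mu) <= se
    return hits / n_runs

print(coverage())  # close to 0.68 for a correctly specified model
```

A coverage substantially below the nominal level, as observed for the fixed input approach, indicates that the reported intervals are too small.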
For the set-up with 20 input measurement points, both estimation approaches provide accurate estimators with similar variances, as confirmed by Figure 1. However, as Figure 3 shows, the coverage differs significantly between the two approaches. Confidence intervals for k1 and k2 are systematically underestimated by the fixed input approach. The variational approach, in contrast, is able to take the degrees of freedom in the input correctly into account. Thus, the coverage is close to the expected values.
For the set-up with 4 input measurement points, the variational approach performs significantly better than the fixed input approach with respect to coverage. However, the variational approach also produces confidence intervals that are slightly too small for the dynamic parameters k1 and k2 (Figure 3B, left) and too small for the estimated initial values (Figure 3B, right). The reason for this behaviour is a combination of the small correlation length τ and the objective function given by eq. (9). Small values of τ allow the input function to fluctuate rapidly. Especially around the input measurement points, function values tend to punctually approach the measured values, favoured by the time-discrete objective function. Since these fluctuations occur on a short time scale, they have little influence on the courses of A and B, and thereby the estimation of the dynamic parameters is almost unaffected.
This case shows that τ needs to be chosen appropriately for the problem: small for comprehensive input reconstruction and larger for propagation of uncertainties. A second possibility would be to adapt statistical results for conventional parameter estimation to the case of time-continuous objective functions.
In many applications, it is difficult to guess a proper input model because input data is not available or too noisy. Instead of parametrizing the input, we employed variational calculus to transform the ODE into an augmented system of ODEs describing both the original and the input components. The solution of this system minimises the χ2 functional, which plays a central role and is directly associated with the objective function of the original estimation problem. Since the extension of the χ2 function to the χ2 functional is not unique, the new functions, i.e. the continuous data and uncertainty representations, need to be chosen deliberately. To this end, we propose smoothing splines that have a concrete interpretation as data priors. Especially in the case of sparse sampling, we propose to use weighting functions for the uncertainties. By this means, existing measurement points are taken into account and the course between time points is not excessively constrained by the data prior.
In the field of control theory and optimal control, so-called cost functionals take the role of our χ2 functional. Once the appropriate χ2 functional is chosen, our approach to input estimation can be embedded in the general framework of Pontryagin’s minimum principle, and eqs. (6, 7) can be identified as a Hamiltonian system.
We showed that our combined variational approach to parameter estimation enables the assembly of all information present in species and input measurements. By this means, it properly accounts for variability in the input due to measurement uncertainties and produces correct confidence bounds. Depending on the situation, the combination of all information leads to a comprehensive reconstruction of the input curves. Information about the dynamics of the input can be concentrated in the species measurements, as Figure 2 shows. In such cases, our approach clearly outperforms conventional approaches. The variational method is even applicable if no input measurements are available or if species are partially unobserved. A prominent example where the presented method could be applied is the PI3K/AKT/mTOR pathway. Even though various mTOR complexes and their phosphorylated states can be measured, it is not clear how they mediate AKT activation. By applying the variational method to AKT data, it would be possible to reconstruct the required mediator and subsequently relate it to mTOR complex measurements.
A completely different field of application is network modularization. The entire network can be dissected preferably at nodes where measurements are available. These nodes are then treated as independent inputs thus disentangling the network. In this way, the number of equations the variational approach has to deal with is kept small and computational efficiency is ensured.
A further step after the introduction of a time-continuous objective function would be to use the same function for parameter estimation. The time-continuous version of the objective function is closely related to the original function. Therefore, we are confident that it is possible to endow the time-continuous objective function with statistical meaning. This would not only allow for employing the same objective for parameter estimation and input reconstruction in our application. It would also enable the transfer of many more results from control theory and make it suitable for statistical inference.
The authors thank Raphael Engesser, Clemens Kreutz and Jan Hasenauer for their advice and valuable discussions. This work has been supported by the German Federal Ministry for Education and Research programme Medical Systems Biology SARA (0315394E).
1. Kim D, Rath O, Kolch W, Cho KH: A Hidden Oncogenic Positive Feedback Loop Caused by Crosstalk between Wnt and ERK Pathways. Oncogene. 2007, 26(31): 4571-4579. doi:10.1038/sj.onc.1210230
2. Raue A, Kreutz C, Maiwald T, Bachmann J, Schilling M, Klingmüller U, Timmer J: Structural and Practical Identifiability Analysis of Partially Observed Dynamical Models by Exploiting the Profile Likelihood. Bioinformatics. 2009, 25(15): 1923-1929. doi:10.1093/bioinformatics/btp358
3. Swameye I, Müller TG, Timmer J, Sandra O, Klingmüller U: Identification of Nucleocytoplasmic Cycling as a Remote Sensor in Cellular Signaling by Databased Modeling. Proc Natl Acad Sci. 2003, 100(3): 1028-1033. doi:10.1073/pnas.0237333100
4. Giaquinta M, Hildebrandt S: Calculus of Variations: The Lagrangian Formalism. 1996, Berlin: Springer
5. Kreutz C, Raue A, Timmer J: Likelihood Based Observability Analysis and Confidence Intervals for Predictions of Dynamic Models. ArXiv preprint. 2011, arXiv:1107.0013v1 [physics.data-an]
6. R Development Core Team: R: A Language and Environment for Statistical Computing. 2011, Vienna: R Foundation for Statistical Computing, http://www.R-project.org [ISBN 3-900051-07-0]
7. Suess EA, Trumbo BE: Introduction to Probability Simulation and Gibbs Sampling with R. 2010, New York: Springer
8. Murphy SA, van der Vaart AW: On Profile Likelihood. J Am Stat Assoc. 2000, 95(450): 449-465. doi:10.1080/01621459.2000.10474219
9. Kirk DE: Optimal Control Theory: An Introduction. 2004, Mineola: Courier Dover Publications
10. Dalle Pezze P, Sonntag AG, Thien A, Prentzell MT, Gdel M, Fischer S, Neumann-Haefelin E, Huber TB, Baumeister R, Shanley DP, Thedieck K: A Dynamic Network Model of mTOR Signaling Reveals TSC-Independent mTORC2 Regulation. Sci Signaling. 2012, 5(217): ra25. doi:10.1126/scisignal.2002469