Hybrid optimization method with general switching strategy for parameter estimation
© Balsa-Canto et al. 2008
Received: 07 November 2007
Accepted: 24 March 2008
Published: 24 March 2008
Skip to main content
© Balsa-Canto et al. 2008
Received: 07 November 2007
Accepted: 24 March 2008
Published: 24 March 2008
Modeling and simulation of cellular signaling and metabolic pathways as networks of biochemical reactions yields sets of non-linear ordinary differential equations. These models usually depend on several parameters and initial conditions. If these parameters are unknown, results from simulation studies can be misleading. Such a scenario can be avoided by fitting the model to experimental data before analyzing the system. This involves parameter estimation which is usually performed by minimizing a cost function which quantifies the difference between model predictions and measurements. Mathematically, this is formulated as a non-linear optimization problem which often results to be multi-modal (non-convex), rendering local optimization methods detrimental.
In this work we propose a new hybrid global method, based on the combination of an evolutionary search strategy with a local multiple-shooting approach, which offers a reliable and efficient alternative for the solution of large scale parameter estimation problems.
The presented new hybrid strategy offers two main advantages over previous approaches: First, it is equipped with a switching strategy which allows the systematic determination of the transition from the local to global search. This avoids computationally expensive tests in advance. Second, using multiple-shooting as the local search procedure reduces the multi-modality of the non-linear optimization problem significantly. Because multiple-shooting avoids possible spurious solutions in the vicinity of the global optimum it often outperforms the frequently used initial value approach (single-shooting). Thereby, the use of multiple-shooting yields an enhanced robustness of the hybrid approach.
The goal of systems biology is to shed light onto the functionality of living cells and how they can be influenced to achieve a certain behavior. Systems Biology therefore aims to provide a holistic view of the interaction and the dynamical relation between various intracellular biochemical pathways. Often, such pathways are qualitatively known which serves as a starting point for deriving a mathematical model. In these models, however, most of the parameters are generally unknown, which thus hampers the possibility for performing quantitative predictions. Modern experimental techniques can be used to obtain time-series data of the biological system under consideration from which unknown parameters values can be estimated. Since these data are often sparsely sampled, parameter estimation is still an important challenge in these systems. On the other hand, the use of model-based (in silico) experimentation can greatly reduce the effort and cost of biological experiments, and simultaneously facilitates the understanding of complex biological systems. In particular, the modeling and simulation of cellular signaling pathways as networks of biochemical reactions has recently received major attention . These models depend on several parameters such as kinetic constants or molecular diffusion constants which are in many cases not accessible to experimental determination. Therefore, it is necessary to solve the so-called inverse problem which consists of estimating unknown parameters by fitting the model to experimental data, i.e., by solving the model calibration or parameter estimation problem.
Parameter estimation is usually performed by minimizing a cost function which quantifies the differences between model predictions and measured data. In general, this is mathematically formulated as a non-linear optimization problem which often results to be multi-modal (non-convex). Most of the currently available optimization algorithms, specially local deterministic methods, may lead to suboptimal solutions if multiple local optima are present, as shown in [2, 3]. This is particularly important in the case of parameter estimation for biological systems, since in most cases no clear intuition even about the order of magnitude exists. Finding the correct solution (global optimum) of the model calibration problem is thus an integral part of the analysis of dynamic biological systems. Consequently, there has been a growing interest in developing procedures which attempt to locate the global optimum. In this concern, the use of deterministic [4–9] and stochastic global optimization methods [10–12] have been suggested. For deterministic global optimization routines the convergence to the global optimum is guaranteed but this approach is only feasible for a considerably small number of parameters. Stochastic global optimizers on the other side converges rapidly to the vicinity of the global solution, although further refinements are typically costly. In other words, finding the location of the optimum is computationally expensive, especially for large systems as found in systems biology. Alternatively, Rodriguez-Fernandez et al.  propose a hybrid method to exploit the advantages of combining global with local strategies. That is, robustness in finding the vicinity of the solution using the global optimization procedure and the fast convergence to solution by the local optimization procedure. At a certain point the search is switched from using the global optimizer to the local optimization routine by this hybrid strategy. The determination of the so called switching point is done on the basis of exhaustive numerical simulations prior to the actual optimization run.
In this work a refined hybrid strategy is proposed which offers two main advantages over previous alternatives : First, we employ a multiple-shooting method which enhances the stability of the local search strategy. Second, we propose a systematic and robust determination of the switching point. Since the calculation of the switching point can be done during the parameter estimation itself, computationally expensive simulations are no longer needed.
The right-hand side of the ODE depends in addition on some parameters . It is further assumed that f is continuously differentiable with respect to the state x and parameters p. Let Y ij denote the data of measurement i = 1, ..., n and of observable j = 1, ..., N, whereas n represents the total amount of data and N is the number of observables. Moreover, the data Y ij satisfies the observation equation
Y ij = g j (x(t i ), p) + σ ij ε ij i = 1,...,n, (2)
for some observation function g : ℝ d → ℝ N , d ≥ N, σ ij > 0, where ε i 's are independent and standard Gaussian distributed random variables. The sample points t i are ordered such that t 0 ≤ t 1 < ...; <t n ≤ t f and the observation function g is again continuously differentiable in both variables. Eqs. (1) and (2) define an single-experiment design. If several experiments are available, possibly under different experimental conditions, Eq. (2) depends on each experiment and must be modified in the following manner
Y ijk = g j (x(t i ), p) + σ ijk ε ijk k = 1, ..., n exp . (3)
Certain parameters may be different for each experiment, but the treatment of these local parameters and the different experiments requires only obvious modifications of the described procedures and therefore only the single-experiment design n exp = 1 is discussed in the following for sake of clarity.
In general, minimizing ℒ is a formidable task, which requires advanced numerical techniques.
Mathematical modeling in systems biology rely on quantitative information of biological components and their reaction kinetics. Due to paucity of quantitative data, various numerical optimization techniques have been employed to estimate parameters of such biological systems. Employed optimization techniques include local, deterministic approaches like Levenberg-Marquardt algorithm, Sequential Quadratic Programming, and stochastic approaches like Simulated Annealing, Genetic Algorithms and Evolutionary Algorithms (see for example, [10, 13]). Most commonly, local methods optimize the cost function, Eq. (4), directly with respect to initial values x 0 and parameters p. This optimization scheme is called initial value approach or alternatively single-shooting. Huge differences in the performance can be observed if either local or global optimization methods are used. Due to the presence of multiple minima in Eq. (4), convergence of local optimization methods to the global minimum is in most cases limited to a rather small domain in search space, see, e.g., [2, 3]. In contrast, global methods have generally a substantially larger convergence domain but the computational cost increases drastically.
One of the simplest global methods is a multistart method. Here, a large amount of initial guesses are drawn from a distribution and subjected to a parameter estimation algorithm based on a local optimization approach. The smallest minimum is then regarded as being the global optimum. In practice, however, there is no guarantee of arriving to the global solution and the computational effort can be quite large. These difficulties are arising because it is a-priori not clear how many random initial guesses are necessary. Over the last decade more suitable techniques for the solution of multi-modal optimization problems have been developed (see, e.g.,  for a review). Several recent works propose the application of global deterministic methods for model calibration in the context of chemical processes, biochemical processes, metabolic pathways, and signaling pathways [4–6, 8, 9]. Global deterministic methods in general take advantage of the problem's structure and even guarantee convergence within a preselected level of accuracy. Although very promising and powerful, there are still limitations to their application, manly due to rapid increase of computational cost with the size of the considered system and the number of its parameters. As opposed to deterministic approaches, global stochastic methods do not require any assumptions about the problem's structure. Stochastic global optimization algorithms are making use of pseudo-random sequences to determine search directions toward the global optimum. This leads to an increasing probability of finding the global optimum during the runtime of the algorithm. The main advantage of these methods is that they rapidly arrive to the proximity of the solution. Examples of global stochastic methods are: pure random search algorithms, evolutionary strategies, genetic algorithms, scatter search and clustering methods. Some of these strategies have been successfully applied to parameter estimation problems in the context of systems biology, see [10, 11, 15].
In  a combination of global stochastic methods with local methods has been proposed. This, so called hybrid approach, utilizes the property of the global search strategy to arrive quickly to the vicinity of the solution. At a certain point in the proximity of the solution the optimizer is switched from the global stochastic to the local deterministic search method. It has been shown that this strategy saves a huge amount of computational effort and provides an efficient and robust alternative for model calibration. Therefore, the hybrid method takes advantage of the complementary strengths of both optimization strategies: global convergence properties in the case of the stochastic method, and fast local convergence in the case of the deterministic approach. Speed and the stability, however, of the resulting hybrid approach also depends on the performance of the used local approach. For this reason we choose the method of multiple-shooting rather than the initial value approach in order to refine the hybrid optimization strategy as described in . As shown below multiple-shooting has in general a larger domain of convergence to the global optimum while only a small portion of additional computational load has to be taken into account compared to single shooting. A brief outline of the multiple-shooting method is given below.
Detailed discussion and some applications to measured data of the method can be found, e.g., in [16–22]. Here, we will concentrate on the main principles of multiple-shooting in order to construct a new hybrid approach. The basic idea of multiple-shooting is to subdivide the time interval I = [t 0, t f ] into n ms <n subintervals I k such that each interval contains at least one measurement. Each of the intervals are assigned to an individual experiment having its own initial values but sharing the same parameters p. Suppose that x(t i ; , p) for all k = 1, ..., n ms denotes the trajectory within an interval. Since the total trajectory for each t ∈ I = I 1 ∪ ... ∪ is usually discontinuous at the joins of the subintervals, smoothness as anticipated by the solution of Eq. (1) is not fulfilled. To enforce smoothness, the optimization is constrained such that all discontinuities are removed at convergence. This leads to a constrained non-linear optimisation problem, which has in addition the advantage that further equality and inequality constraints can easily be implemented. Note that if the integration between two time points is numerically unfeasible, the segment where this problem occurs can be removed. This, however, leads to a split trajectory which parts can be treated using a multiple-experiment fit.
where d θ denotes the derivative with respect to the parameters θ of the corresponding function. Setting θ l = θ l-1 + Δθ l and repeating Eq. (7) until Δθ l ≈ 0, yields the desired parameter estimates under the condition that all parameters itself are identifiable and the constraints are not contradictory. These extra assumptions are necessary to fulfil the so called Kuhn-Tucker conditions for the solvability of constrained, non-linear optimization problems [23, 25].
In combination with multiple-shooting the generalized-quasi-Newton approach has three major advantages: first, the optimization is sub-quadratically convergent. Second, a transformation of Eqs. (7) can be found such that the transformed equations are numerically equivalent to the initial value approach. Third, due to the linearization of the continuity constraints, they do not have to be fulfilled exactly after each iteration, but only at convergence. This allows discontinuous trajectories during the optimization process, reducing the problem of local minima. The first two properties yield the desired speed of convergence whereas the third property is mainly responsible for the stability of multiple-shooting. This is gained by the possibility that the algorithm can circumvent local minima by allowing for discontinuous trajectories while searching the global minimum. Whereas, the main disadvantage is due to the linearization of the cost function. It can easily happen that despite the update step Δθ l is pointing in the direction of decreasing ℒ the proposed step is too large. Such an overshooting is common to any simple optimization procedures based on the local approximation of the cost function. A suitable approach to cure this deficiency is realized by relaxing the update step; hence θ l = θ l-1 + λ l Δθ l for some λ l ∈ (0, 1]. This procedure is referred to as damping and provides the bases of the determination of the switching point which we propose in the following.
Besides the choice of the global and local optimization procedure, the determination of the switching point is vital for the robustness of the hybrid approach, as discussed in . This is supported by the results presented in  where it is shown that different switching points may lead to different solutions and that careful investigations and computationally expensive empirical tests must be consulted in order to determine an appropriate switching strategy. In order to avoid such time consuming tests, we propose a systematic determination of the switching point in the following. All calculations needed to compute the switching point are carried out during the optimization which reduces the computational effort significantly. As global stochastic optimization methods we decide to use evolutionary approaches such as Stochastic Ranking Evolutionary Search (SRES)  or Differential Evolution (DE) . The local search method is – as already stated above – multiple-shooting.
The multiple-shooting method is equipped with a relaxation algorithm to prevent overshooting of the update step. This overshooting is due to the quadratic approximation of the likelihood function in Eq. (7) which is often too crude for points far away of the minimum. For these points the calculated update step tends to be too long and might result in a step leading to an increased value of the cost function. The relaxation method, also called damping method, selects some λ l ∈ (0, 1] such that the update step θ l = θ l-1 + λ l Δθ l is descendant. For this some level function has to be used. Such a level function must share the same monotony properties of the cost function close to the global minimum. Here, the objective to judge whether the proposed step at θ l-1 is descendant is given by the following level function [17, 22, 23]:T(λ) = ||G(θ l-1)R a (θ l-1 + λΔθ l )||2,
where R a is the n × N-dimensional vector with components in Eq. (6) and G is the generalized inverse of Eq. (7), satisfying Δθ l = G(θ l-1)R a (θ l-1). Based on T(λ) a very efficient corrector-predictor scheme is given in [17, 23] to determine the optimal damping parameter λ. Furthermore, it can be shown that whenever the method enters the region of local convergence, the method converges to a full step procedure and thus λ → 1 [17, 22, 23]. This feature of the damping strategy can be utilized to detect the region of local convergence and provides a suitable criterion for determining the switching point. Calculating λ during the global optimization and successively checking whether λ = 1 yields the desired information about the switching point. For stability reasons we propose to switch to the local method only after a certain number, say n 1, of consecutive λ = 1 is achieved. After the initialization of the method a number of iterations n 0 is performed using the global method without checking the switching point criterion in order to decrease the computational load, note that a minimum of around 15 iterations will be usually needed, this number may be increases if the size of the search space also increases. For the simulations presented in this study n 1 = 2. Since the corrector-predictor scheme can be implemented very efficiently, calculation of the damping parameter λ is computationally inexpensive.
In order to demonstrate the performance of the method we have chosen two examples: the STAT5 signaling pathway  and Goodwin's model  for a feedback control system showing a Hopf bifurcation. In both cases we simulated data having a noise-to-signal ratio of either 0% or 10% and evaluated the performance of the proposed hybrid method in comparison to local and global search strategies.
Here, in(t) is the input and out(t) the output of the delay chain. We set in(t) = x 3(t), out(t) = x 3(t - τ), and N = 8. This provides a reasonable approximation of the time delay . Two different sets of data were obtained by numerical simulations with a noise to signal ratio of 0% and 10%, respectively. As observed quantities we choose the total amount of activated STAT5, y 1 = s 1(x 2 + x 3), and the total amount of STAT5 in the cytoplasm, y 2 = s 2(x 1 + x 2 + x 3), where s 1 and s 2 are scaling parameters introduced to deal with the fact that only relative protein amounts are measured. Initial conditions and the kinetic parameters were chosen to be: x 1(0) = 3.71, x i (0) = 0, (i = 2,...,4), k 1 = 2.12, k 2 = 0.109, τ = 5.2, s 1 = 0.33 and s 2 = 0.26. From the simulated data we aim to estimate the rate constants k 1, k 2, the delay parameter τ and the initial concentration of unphosphorylated STAT5 x 1(0). In case of local optimization methods – single and multiple-shooting – we used multistarts, where the initial guess of each restart is randomly chosen from the intervals [0, 5] (Box 5), [0, 10] (Box 10), and [0, 100] (Box 100), respectively, using a uniform distribution. For each box size 100 restarts are chosen. Note that the delay parameter τ has to be restricted to Δt <τ < (t f - t 0), where Δt denotes the sampling rate of the data. This follows from the fact that no information is contained in the data about delays smaller than τt and larger than the total measurement time t f - t 0.
Computational costs in the STAT5 case study (in seconds) for 0% and 10% noise to signal ratio, respectively.
Simulated data with 0%/10% noise
In contrast to the local methods, both, the global search strategy SRES and the hybrid approach, converged in all cases to the global optimum which emphasises the strength of global methods. Note that results obtained by DE are comparable to SRES and are therefore omitted. The power of the hybrid strategy can be appreciated considering the average computational cost as shown in Table 1. Using the hybrid reduces the computational load significantly by a factor of four. Due to the systematic switching point calculation no further adjustments were necessary to obtain this significant emendation.
Here, x represents an enzyme concentration whose rate of synthesis is regulated by feedback control via a metabolite z. The intermediate product y regulates the synthesis of z. Oscillatory behaviour is not a necessary characteristic of this set of equations. Different values for the parameters may result in limit cycle oscillations, damped oscillations or monotonic convergence to a steady state. In fact, only a restricted range of parameter values result in oscillations. The following values have been used here x(0) = 0.3617, y(0) = 0.9137, z(0) = 1.3934, for the initial conditions and a = 3.4884, A = 2.1500, b = 0.0969, α = 0.0969, β = 0.0581, γ = 0.0969, σ = 10, and δ = 0.0775, for the model parameters, resulting in oscillatory behavior.
Computational costs in the Goodwin case study (in seconds) for 0% and 10% noise to signal ratio, respectively.
Simulated data with 0%/10% noise
In this study we present a new hybrid strategy as a reliable method for solving challenging parameter estimation problems encountered in systems biology. The proposed method presents two advantages over previous hybrid methods: First, it is equipped with a switching strategy which allows the systematic determination of the transition from the local to global search. This avoids computationally expensive tests in advance and constitutes a major benefit of the proposed method. Second, using multiple-shooting as the local search procedure reduces the multi-modality of the non-linear optimization problem. Because multiple-shooting avoids possible spurious solutions in the vicinity of the global optimum it outmatches the initial value approach (single shooting) yielding an enhanced robustness of the hybrid.
We analyzed the performance of this new approach using two examples: the dynamical model of the STAT5 signaling pathway suggested in  and the Goodwin model describing oscillating processes . The hybrid was able to converge to the global solution in all runs performed with significant reductions in the computational cost. Moreover a comparison with other search strategies reveals that the hybrid results in a better compromise efficiency-robustness. In conclusion the proposed hybrid provides a robust and convenient method for parameter estimation problems occurring in systems biology.
This work was supported by the European Community as part of the FP6 COSBICS Project (STREP FP6-512060), the German Federal Ministry of Education and Research, BMBF-project FRISYS (grant 0313921) and Xunta de Galicia (PGIDIT05PXIC40201PM).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.