Mathematical modeling in systems biology rely on quantitative information of biological components and their reaction kinetics. Due to paucity of quantitative data, various numerical optimization techniques have been employed to estimate parameters of such biological systems. Employed optimization techniques include local, deterministic approaches like Levenberg-Marquardt algorithm, Sequential Quadratic Programming, and stochastic approaches like Simulated Annealing, Genetic Algorithms and Evolutionary Algorithms (see for example, [10, 13]). Most commonly, local methods optimize the cost function, Eq. (4), directly with respect to initial values *x*
_{0} and parameters *p*. This optimization scheme is called initial value approach or alternatively single-shooting. Huge differences in the performance can be observed if either local or global optimization methods are used. Due to the presence of multiple minima in Eq. (4), convergence of local optimization methods to the global minimum is in most cases limited to a rather small domain in search space, see, e.g., [2, 3]. In contrast, global methods have generally a substantially larger convergence domain but the computational cost increases drastically.

One of the simplest global methods is a multistart method. Here, a large amount of initial guesses are drawn from a distribution and subjected to a parameter estimation algorithm based on a local optimization approach. The smallest minimum is then regarded as being the global optimum. In practice, however, there is no guarantee of arriving to the global solution and the computational effort can be quite large. These difficulties are arising because it is a-priori not clear how many random initial guesses are necessary. Over the last decade more suitable techniques for the solution of multi-modal optimization problems have been developed (see, e.g., [14] for a review). Several recent works propose the application of global deterministic methods for model calibration in the context of chemical processes, biochemical processes, metabolic pathways, and signaling pathways [4–6, 8, 9]. Global deterministic methods in general take advantage of the problem's structure and even guarantee convergence within a preselected level of accuracy. Although very promising and powerful, there are still limitations to their application, manly due to rapid increase of computational cost with the size of the considered system and the number of its parameters. As opposed to deterministic approaches, global stochastic methods do not require any assumptions about the problem's structure. Stochastic global optimization algorithms are making use of pseudo-random sequences to determine search directions toward the global optimum. This leads to an increasing probability of finding the global optimum during the runtime of the algorithm. The main advantage of these methods is that they rapidly arrive to the proximity of the solution. Examples of global stochastic methods are: pure random search algorithms, evolutionary strategies, genetic algorithms, scatter search and clustering methods. Some of these strategies have been successfully applied to parameter estimation problems in the context of systems biology, see [10, 11, 15].

In [2] a combination of global stochastic methods with local methods has been proposed. This, so called hybrid approach, utilizes the property of the global search strategy to arrive quickly to the vicinity of the solution. At a certain point in the proximity of the solution the optimizer is switched from the global stochastic to the local deterministic search method. It has been shown that this strategy saves a huge amount of computational effort and provides an efficient and robust alternative for model calibration. Therefore, the hybrid method takes advantage of the complementary strengths of both optimization strategies: global convergence properties in the case of the stochastic method, and fast local convergence in the case of the deterministic approach. Speed and the stability, however, of the resulting hybrid approach also depends on the performance of the used local approach. For this reason we choose the method of multiple-shooting rather than the initial value approach in order to refine the hybrid optimization strategy as described in [2]. As shown below multiple-shooting has in general a larger domain of convergence to the global optimum while only a small portion of additional computational load has to be taken into account compared to single shooting. A brief outline of the multiple-shooting method is given below.

### Multiple-shooting

Detailed discussion and some applications to measured data of the method can be found, e.g., in [16–22]. Here, we will concentrate on the main principles of multiple-shooting in order to construct a new hybrid approach. The basic idea of multiple-shooting is to subdivide the time interval *I* = [*t*
_{0}, *t*
_{
f
}] into *n*
_{
ms
} <*n* subintervals *I*
_{
k
} such that each interval contains at least one measurement. Each of the intervals are assigned to an individual experiment having its own initial values
but sharing the same parameters *p*. Suppose that *x*(*t*
_{
i
};
, *p*) for all *k* = 1, ..., *n*
_{
ms
} denotes the trajectory within an interval. Since the total trajectory for each *t* ∈ *I* = *I*
_{1} ∪ ... ∪
is usually discontinuous at the joins of the subintervals, smoothness as anticipated by the solution of Eq. (1) is not fulfilled. To enforce smoothness, the optimization is constrained such that all discontinuities are removed at convergence. This leads to a constrained non-linear optimisation problem, which has in addition the advantage that further equality and inequality constraints can easily be implemented. Note that if the integration between two time points is numerically unfeasible, the segment where this problem occurs can be removed. This, however, leads to a split trajectory which parts can be treated using a multiple-experiment fit.

For each

*k* = 1, ...

*n*
_{
ms
} let

and

*θ*
_{
k
} = (

,

*p*) the optimization problem can then be formulated in the following manner:

where the continuity constraints are given at the first row of the constraints-part, followed by optional constraints

. Cost function ℒ(

*θ*
_{1}, ...

) is equivalent to Eq. (

4) if the continuity constraints are satisfied; hence

We solved the non-linear programming problem defined by Eq. (

5) iteratively by employing a generalized-quasi-Newton method [

23,

24]. With the current guess

, the update step

for the

*l*-th iteration is obtained by solving the resulting linearly constrained least squares problem:

where d_{
θ
} denotes the derivative with respect to the parameters *θ* of the corresponding function. Setting *θ*
^{
l
} = *θ*
^{l-1} + Δ*θ*
^{
l
} and repeating Eq. (7) until Δ*θ*
^{
l
} ≈ 0, yields the desired parameter estimates under the condition that all parameters itself are identifiable and the constraints are not contradictory. These extra assumptions are necessary to fulfil the so called Kuhn-Tucker conditions for the solvability of constrained, non-linear optimization problems [23, 25].

In combination with multiple-shooting the generalized-quasi-Newton approach has three major advantages: first, the optimization is sub-quadratically convergent. Second, a transformation of Eqs. (7) can be found such that the transformed equations are numerically equivalent to the initial value approach. Third, due to the linearization of the continuity constraints, they do not have to be fulfilled exactly after each iteration, but only at convergence. This allows discontinuous trajectories during the optimization process, reducing the problem of local minima. The first two properties yield the desired speed of convergence whereas the third property is mainly responsible for the stability of multiple-shooting. This is gained by the possibility that the algorithm can circumvent local minima by allowing for discontinuous trajectories while searching the global minimum. Whereas, the main disadvantage is due to the linearization of the cost function. It can easily happen that despite the update step Δ*θ*
^{
l
} is pointing in the direction of decreasing ℒ the proposed step is too large. Such an overshooting is common to any simple optimization procedures based on the local approximation of the cost function. A suitable approach to cure this deficiency is realized by relaxing the update step; hence *θ*
^{
l
} = *θ*
^{l-1} + λ^{
l
}Δ*θ*
^{
l
} for some λ^{
l
} ∈ (0, 1]. This procedure is referred to as damping and provides the bases of the determination of the switching point which we propose in the following.

### A new hybrid method

Besides the choice of the global and local optimization procedure, the determination of the switching point is vital for the robustness of the hybrid approach, as discussed in [26]. This is supported by the results presented in [2] where it is shown that different switching points may lead to different solutions and that careful investigations and computationally expensive empirical tests must be consulted in order to determine an appropriate switching strategy. In order to avoid such time consuming tests, we propose a systematic determination of the switching point in the following. All calculations needed to compute the switching point are carried out during the optimization which reduces the computational effort significantly. As global stochastic optimization methods we decide to use evolutionary approaches such as Stochastic Ranking Evolutionary Search (SRES) [27] or Differential Evolution (DE) [28]. The local search method is – as already stated above – multiple-shooting.

#### Calculation of the switching point

The multiple-shooting method is equipped with a relaxation algorithm to prevent overshooting of the update step. This overshooting is due to the quadratic approximation of the likelihood function in Eq. (7) which is often too crude for points far away of the minimum. For these points the calculated update step tends to be too long and might result in a step leading to an increased value of the cost function. The relaxation method, also called damping method, selects some λ^{
l
} ∈ (0, 1] such that the update step *θ*
^{
l
} = *θ*
^{l-1} + λ^{
l
}Δ*θ*
^{
l
} is descendant. For this some level function has to be used. Such a level function must share the same monotony properties of the cost function close to the global minimum. Here, the objective to judge whether the proposed step at *θ*
^{l-1} is descendant is given by the following level function [17, 22, 23]:*T*(λ) = ||*G*(*θ*
^{l-1})*R*
^{
a
}(*θ*
^{l-1} + λΔ*θ*
^{
l
})||^{2},

where *R*
^{
a
} is the *n* × *N*-dimensional vector with components
in Eq. (6) and *G* is the generalized inverse of Eq. (7), satisfying Δ*θ*
^{
l
} = *G*(*θ*
^{l-1})*R*
^{
a
}(*θ*
^{l-1}). Based on *T*(λ) a very efficient corrector-predictor scheme is given in [17, 23] to determine the optimal damping parameter λ. Furthermore, it can be shown that whenever the method enters the region of local convergence, the method converges to a full step procedure and thus λ → 1 [17, 22, 23]. This feature of the damping strategy can be utilized to detect the region of local convergence and provides a suitable criterion for determining the switching point. Calculating λ during the global optimization and successively checking whether λ = 1 yields the desired information about the switching point. For stability reasons we propose to switch to the local method only after a certain number, say *n*
_{1}, of consecutive λ = 1 is achieved. After the initialization of the method a number of iterations *n*
_{0} is performed using the global method without checking the switching point criterion in order to decrease the computational load, note that a minimum of around 15 iterations will be usually needed, this number may be increases if the size of the search space also increases. For the simulations presented in this study *n*
_{1} = 2. Since the corrector-predictor scheme can be implemented very efficiently, calculation of the damping parameter λ is computationally inexpensive.