- Methodology article
- Open access
- Published:
Parameter estimation in systems biology models using spline approximation
BMC Systems Biology volume 5, Article number: 14 (2011)
Abstract
Background
Mathematical models for revealing the dynamics and interactions properties of biological systems play an important role in computational systems biology. The inference of model parameter values from time-course data can be considered as a "reverse engineering" process and is still one of the most challenging tasks. Many parameter estimation methods have been developed but none of these methods is effective for all cases and can overwhelm all other approaches. Instead, various methods have their advantages and disadvantages. It is worth to develop parameter estimation methods which are robust against noise, efficient in computation and flexible enough to meet different constraints.
Results
Two parameter estimation methods of combining spline theory with Linear Programming (LP) and Nonlinear Programming (NLP) are developed. These methods remove the need for ODE solvers during the identification process. Our analysis shows that the augmented cost function surfaces used in the two proposed methods are smoother; which can ease the optima searching process and hence enhance the robustness and speed of the search algorithm. Moreover, the cores of our algorithms are LP and NLP based, which are flexible and consequently additional constraints can be embedded/removed easily. Eight system biology models are used for testing the proposed approaches. Our results confirm that the proposed methods are both efficient and robust.
Conclusions
The proposed approaches have general application to identify unknown parameter values of a wide range of systems biology models.
Background
In recent years, the rapid development of sophisticated experiment tools in molecular biology allows the acquisition of high qualitative time series data which can significantly improve the ability of revealing the complex dynamics and interactions of biological systems. Profiting from the rapid technological advances, more and more researchers from different disciplines can now utilize such observation data to establish mechanism-based models which can incorporate every possible detail and functioning of biological systems [1]. One common approach is to characterize the biological system with a set of Ordinary Differential Equations (ODEs) [2–7]. Generally, there are two major aspects of building an ODE model for a biological system from experimentally measured time series: (1) to determine the structure of the system through a set of suitable ODEs with unknown parameters; (2) to determine the unknown parameters of this ODE model. The identification of these unknown parameter with fixed model structure from observations is one of the central issues of computational systems biology [8]. This type of approach can be considered as a "reverse engineering process" [9–11].
The parameter estimation problem is generally formulated as an optimization problem that minimizes an objective function which represents the fitness of the model with respect to a set of experimental data [8, 12–17]. Two major optimization approaches are commonly adopted; the gradient-based nonlinear optimization method and the evolutionary based method. Also, simulations had shown that the simulated annealing (SA) method can offer promising results [18]. In [19], many deterministic and stochastic global optimization (GO) methods for parameter estimation were further compared using a three-step pathway model with noise free data assumption; the best result was given by the Stochastic Ranking Evolution Strategy (SRES) method. It is worth mentioning that, due to its simplicity in implementation, evolutionary algorithms, such as genetic algorithm and their variants, are extensively utilized for identifying unknown parameters of systems biology models [1, 11, 20–22]. However, most of these aforementioned approaches need a numerical ODE solver to perform the numerical integration for the underlining differential equations. Studies have revealed that more than 90% of the computation time is consumed in the ODE solver during the identification process [19]. In particular, for nonlinear dynamical systems with high-parameter-dimension, one trial usually consumes tens of hours or even days [10, 20, 21, 23]. Furthermore, the convergence property is aggravated by numerical integration failure, which is a major problem in the optimization process [11]. The computational burden can be relieved by reformulating the system involving differential equations into a system of algebraic equations [12, 15, 17, 24], which can be classified as "decomposition approaches". These decomposition approaches are widely employed in the parameter estimation of S-systems [7, 25]. The reliability of the decomposed methods depends on the accuracy of the "smooth" estimated derivatives and the states of the system. In practice, these data are subject to significant observation noise. Without proper pre-processing, the estimation faces the potential of the overfitting problem and hence the estimation can deviate badly from the "true" value [26, 27]. Regularization can be considered as a mathematical pre-processing on the measured noisy data set and be used to control the trade-off between the "roughness" of the solution and the infidelity of the data [28]. Since we are dealing with a known structured bio-system, the system model itself possesses a physical inertia and can serve as physical constraints which limit the system states within a set of possible trajectories. In this paper, the over-fitting problem can be relieved by embedding the model dynamics, the mass and energy balance constraints into our constrained optimization algorithms. Owing to the nonlinearity of systems biology models, the cost function to be minimized is complex and has multiple local minima. Minimization algorithms face the high possibility of getting trapped at local optima. For these reasons, the parameter estimation problem is still a bottleneck and a challenging task of computational analysis of systems biology [1, 11]. Until now, none of the parameter estimation methods is effective in all cases and can overwhelm all the other methods. Instead, various methods have their advantages and disadvantages. Consequently, it is worthy to develop acceptably "good enough" methods within a given tolerance and time frame.
For practical purpose, some essential issues should be taken into account when developing a parameter estimation method: first, the method must be "efficient" enough that a trial can be completed within a reasonable computation time; second, for biological systems, the observation data is often corrupted by high level of noise, which complicates the objective function surface and introduces unwanted additional local minima in the search space [29]. Hence, the approach should be robust subject to noise; third, it needs to be flexible enough for adding/removing physical constraints, such as model dynamics, the mass and energy balance constraints. Furthermore, the representative cost function should have less local minima so as to ease the optimization algorithm in converging to the global minima. In this paper, two parameter estimation methods of combining spline theory [28] with Linear Programming (LP) and Nonlinear Programming (NLP) are developed, respectively. These methods remove the need for an ODE solver. Our analysis exhibits that the cost function surfaces of the two proposed methods are smooth.
Moreover, the cores of our algorithms are LP and NLP based, which are very flexible and hence additional constraints can be embeded/removed easily. Eight systems biology models were used to test the proposed algorithms. Experimental results show that the proposed methods are both efficient and robust (see additional file 1 for details).
This paper is organized as follows: The preliminary problem formulation is given and the bottleneck of the problem is highlighted in the next section. Then, two parameter estimation methods surmounting those bottlenecks are presented. In section 3, two trials are given, a simple enzyme kinetic model and the mammalian G1/S transition network model, in order to illustrate the robustness and the effectiveness of these two proposed methods (more models and trial results are given in additional file 1 and 2). Finally, conclusions and discussions are given in section 4.
Methods
Parameter estimation problem of systems biology models
Biological pathway dynamics can be modelled by the following continuous ODEs:
where x ϵ Rn is the system's state vector (for example the concentrations of a process), θ ϵ Rk is the system's parameter vector (for instance, the reaction rates), u(t) ϵ Rp is system's input, y ϵ Rm denotes the measured data subject to a Gaussian white noise η(t) ~ N(0, σ2), and x0 is the initial state. f(·) is a set of nonlinear transition functions describing the dynamical properties of a biological system. Here, g(·) represents a measurement function. If all the states can be measured, the observer g(·) becomes an identity matrix. Otherwise, g(·) usually is a rectangular zero-one matrix with corresponding rows deleted (represent the immeasurable states) from the identity matrix I n .
The parameter estimation problem of nonlinear dynamical systems described in (1) can be formulated as a nonlinear programming problem (NLP) P0 with differential-algebraic constraints:
P0 minimizes a cost function that measures the fitness of the model with respect to a given set of experiment data subjecting to a set of constraints, where is the set of parameters to be estimated, ||·|| l denotes the l-norm with l > 0, is the estimated initial condition, is the estimated system states ( represents the estimated variable at time t j with parameter and initial condition ), w ij are the weighting coefficients, is the estimated measured data. In some applications, additional constraints are introduced to impose special structural properties of a given system; they can be implemented in the form of the equality and inequality constraints C eq and C ineq (for instance the system performance and the mass balance constraints). Finally, θL and θU are simple structural constraints such as the parameter's upper/lower bounds (they can be part of the C ineq ).
For the NLP-P0 , the direct optimization methods, such as Newton type methods and many GO methods, require solving the nonlinear dynamic model (1) for in order to compute the cost function. The common method to estimate and is using ODE solvers, which perform the numerical integration with fixed at each iteration [19]. During the process of identification, the integration has to be executed thousands, even millions of times. That is the main reason more than 90% of the time is consumed in the ODE solver [24] and the computation time spent on the P0 can be hours even days [10, 20]. Moreover, P0 is a nonlinear optimization problem subjecting to a set of linear and non-linear differential equation constraints. Hence, P0 is often multimodal (non-convex) and has many local minima. In a high-noise environment, the situation becomes more difficult. Consequently, P0 requires further manipulation in order to reduce the complexity so as to relieve the computation burden and also to avoid being trapped in local minima.
Instead of using ODE solvers to estimate x(t) and , one can utilize spline approximation. Given L real values τ i , called knots, with τ0 ≤ τ1≤ · · · ≤ τL-1. Using the Cox-de Boor recursion formula, the B-spline basis of degree n d = 0, 1, 2, · · ·, L - 2 can be defined as follows:
Let , a vector of length L i - n d - 2, be the chosen basis functions. Then, the estimated variable can be expressed in terms of the basis function expansion [28]
where is the estimation of the i th state of (1), pi, mis the weighting coefficient. Let , (5) can be rewritten in matrix form
Similarly, the estimated can be approximated by
where is the set of the derivatives of the basis functions. There are various types of splines suitable for this application, such as cubic spline, B-spline, uniform spline, nonuniform spline and interpolating spline. For more detail information about spline approximation theory, please refer to chapter (IX, XI, and XIV) in [28]. As B-spline is simple in formation and efficient for computation, it is adopted here. Our extensive tests have shown that uniform B-spline basis with produces good results. Hence, in this paper, unless otherwise indicated, the uniform B-spline basis with was used in the parameter identification process.
Next, two techniques based on spline for parameter estimation will be proposed: one is based on linear programming (LP) which is very efficient and can cover many special structured systems and the other one is based on NLP which is flexible and can cater for general system structures.
The LP Approach
In many bio-system models, f (x, θ) is autonomous system and linear in θ as follows:
where Φ(x) ϵ Rn ×kis a matrix and its elements are a function of the state x. Systems with structure (8) covers a large set of systems biology models, such as enzyme kinetic pathway model, RKIP pathway model, Iκ B-NF-κ B model TNFα-Mediated NF-κ B-signaling pathway model, irreversible inhibition of HIV proteinase model, Laub and Loomis model [2–4, 30]. In addition, these types of models are usually subject to the mass balance constraints which can be incorporated into the LP easily (It is demonstrated in the results section via the Enzyme kinetic model).
For noisy data, good smoothing approximation can be achieved by minimizing the following cost function [26]
where represents the m th derivative of , m ϵ Z+ and λ ≥ 0 control the trade-off between the "roughness" of the solution and hence can be used to relieve the overfitting problem. If the equality C eq represents the mass balance constraint and the inequality constraint C ineq represents parameter values' lower/upper bounds, the B-coefficient vector p = [p1; p2; · ·· p n ]Tcan be computed by solving the following quadratic programming sub-problem A1 :
stands for the equality constraints . A eq is a constant matrix, b eq is a constant vector. It is found empirically that m = 2 and 0 ≤ λ ≤ 0.05 produce relatively good results. Hence, the parameters m = 2, and λ = {0, 0.01, 0.03} corresponding to a noise level {0%, 5%, 10%}, respectively, were used in this study. Then, the "smooth" estimated state can be generated by the B-spline approximation (5).
Replace x(t) by the estimated state and integrate (8) yield:
where represents the transition matrix. Then, P0 can be reformulated into the following optimization problem:
Here, w ij ≥ 0 is weighting factor. Note that the L1-norm of a variable × is equivalent to the following relation: ; s.t. - α ≤ x ≤ α. Then P1 can be transformed into the following augmented optimization problem by introducing the slack variables α as follows:
It is a Linear Programming (LP) problem with variable {α, θ}, which is a convex problem with a wealth of fast and efficient routines available [31, 32].
Combine spline theory and NLP
To deal with systems biology models, of which the states and parameters are separable, the LP approach is suitable and efficient. In contrast, if the model does not belong to this category, such as the mammalian G1/S transition model and S-system model, the aforementioned approach cannot apply. Thus, a more general approach will be introduced. Recalling (6) and (7), the estimation of and can be constructed by a set of basis functions. We can replace and in (i-iv) of P0 with (6) and (7). With little change, P0 can be reformulated as
Note that the constraint-(i) of (2) has been replaced by constraints (i)-(iii) of P3. Then, NLP-P0-(2) with differential-algebraic constraints turns into NLP-P3-(14) with only algebraic equation constraints. Hence, P3 does not require ODE solvers, which eases the computation burden (as shown in Examples). In contrast to the the decomposition methods [12, 15], which divide the estimation of the system states (and its derivative) and the parameter estimation into two separate steps, P3 computes the estimated states (and its derivative) and parameter values at the same time. Note that constraint (iii) of P3 governs the estimated state (and its derivative) so as to ensure these estimates belong to the trajectory , which is a solution of system (1). Thus, the system model itself serves as a filter performing regularization. Hence, the overfitting problem can be relieved (see additional file 2 for details).
For a non-linear system, the Lagrangian of (2), , is an implicit function of [31]. However, many traditional optimization algorithms require the derivative during the optimization process. As is an implicit function of , cannot be obtained directly, but has to be computed via approximation methods [16], which makes the algorithm unreliable. For P3, the Lagrangian now consists of simple algebraic constraints. Thus, and ∂L/∂p are explicit functions of and p. In conclusion, many of the aforementioned difficulties can be reduced. P3 can be solved by a number of optimization approaches; either via evolution type algorithms, such as genetic algorithm (GA), simulated annealing (SA) and etc, or via traditional NLP algorithms, such as sequential quadratic programming(SQP), sequential penalty function, the trust region approach and etc [33, 34].
Results
Two biological system models, a simple enzyme kinetic model and the mammalian G1/S transition network model, are chosen as benchmarks for evaluating the performance of P2 and P3 respectively.
Enzyme kinetic model
Consider the well-known simplified enzyme kinetic model. E is the concentration of an enzyme that combines with a substrate S to form an enzyme-substrate complex ES with a rate constant k1. The complex ES holds two possible out comes in the next step. It can be dissociated into E and S with a rate constant k2, or it can further proceed to form a product P with a rate constant k3. It is assumed that none of the products reverts to the initial substrate. These relations can be represented by the following set of ODEs.
where k1, k2, k3, are the system unknown parameters. Let x1(t), x2(t), x3(t)and x4(t) represent S(t),E(t), ES(t) and P(t) respectively. Then, the above equation can be rewritten into
Then, the mass balance constraint becomes:
Or in matrix form, we have
where and .
According to (16), we have
An artificial data set with four time courses was created. A total of 40 sampling points were assigned on each time courses. The observation data was perturbed by a zero mean Gaussian white noise η(t) ~ N(0, σ2) in order to simulate the observation error. The estimated state was computed by solving the quadratic programming sub-problem A1. The unknown parameter values were estimated using P2. The searching region of the parameter values was [0, +∞). All the computations were performed on a Pentium Dual Core computer (2.13 GHz ×2) with 2 GB RAM. The algorithm was implemented with Matlab-7 using the interior point algorithm. To quantify the fitness of the estimated model, the following relative squared error (RSE) measure J is employed:
where is the estimated time-course at time t j of a state variable x i , and x i (t j ) represents the "true" time-courses without noise at time t j . Note that smaller RSE J reflects better estimation. In order to obtain a statistical result on the quality of the estimation, 5,000 trials were performed. At each trial an estimated is computed using P2. Then, the mean estimation and standard deviation were deduced. The computation was very efficient and only took a few seconds for one estimation trial.
Table 1 shows the statistical results of the estimation. It reveals that all the system parameter values are estimated successfully with a relative error of around 0.01% in noise free condition. When the system is subjected to a 10% observation noise level, all the mean estimated parameter values are within a relative tolerance, better than 2%. The RSE between the time-courses, produced by inferred model, and the given time-series data, averaged smaller than 1% even subject to 10% observation noise level condition. For this reason, we have confidence that the proposed method is robust within ±10% noise ratio (more complicated models and trials on P2 can be found in part I of additional file 1). Figure 1 shows the responses of one trial.
The mammalian G1/S transition network model
Next, the mammalian G1/S transition network model, which includes a set of proteins and regulatory gene network, is used to test P3. In the mammalian G1/S transition network, pRB and AP-1 are the tumor suppressor from the family of pocket proteins and the family of transcription factors that mediate mitogenic signals, E2F1 is the transcription factor targeting genes that regulate cell cycle progression, Cyclin D/cdk4,6, cyclin E/cdk2, complexes characterizing the G1- and S- phases. There are various positive and negative feedback loops in the network controlling the G1/S transition. The positive feedback regulation of E2F1 and a double activator-inhibitor module can lead to bistability. The double activator-inhibitor module of the antagonistic plays E2F/DP on pRB make up the key unit of this phase transition. The graph representation of the mammalian G1/S transition network model can be found in additional file 1 and more details can refer to Swat et al. [5]. Definition of Variables for G1/S Transition Model is shown in Table 2. The corresponding ODE model is as follows
where x is the set of state variables. There are totally 9 states and 39 parameters. The nominal parameter values are shown in Table 3.
Here, P3 was solved by the Stochastic Raking Evolution Strategy (SRES) algorithm [35]. The searching region of the parameters was [0, 50θ ]. SRES uses stochastic ranking as the constraint handling technique, which adjusts the balance between the objective and penalty functions automatically during the evolutionary search. The observation data include 4 sets of time course, which consists of 40 sample points. For trials with noise free data, the algorithm converged in 8 ~ 9 hours after 250,000 ~ 300,000 iterations. The estimated parameter values, as shown in Table 3 are almost identical to the nominal parameter values. However, for k23, k25 and J15, the estimated values are far from the nominal values, but the RSE measure is almost zero, which possibly implies that the system is insensitive with the changes of k23, k25 and J15. This phenomenon reveals that the G1/S transition model either has some parameters that are insensitive to the chosen observation, or they are non-identifiable parameters [36, 37]. It is worth mentioning that the this large computational effort is the consequence of the very tight convergence criteria, an almost equal good result can be reached within 200,000 generations in about 6.5 hours with the RSE measure J is smaller than 1%. Figure 2(a) shows the "true" time-series data without noise and computed dynamic time-series data from one identified model. When 10% random noises are added, the convergence time increased and the relative estimation errors between estimated parameters and nominal parameters increased with the increase of noise. However, the time-series produced by the estimated model is very similar to the original data, namely the RSE J is still small. This phenomenon may imply that there is no need to estimated every parameters accurately to achieve a model with equivalent dynamical properties with a good degree of accuracy. As the simulation time is long, performing thousands of simulations as the first method in order to evaluate the mean and variance of estimated parameters is impractical. Thus, due to the lack of space, results of just a few selected trial are shown in Table 3 (more trial results can be found in additional file 1).
Trials were performed using Matlab-7. The main reason to use Matlab is that it is a convenient environment to visualize all the information arising from the optimization runs of the solver, evaluate new algorithms and modify existing algorithms. In contrast to the convenience, it is worth mentioning that Matlab programs usually are one order of magnitude (10 times or more) slower than equivalent compiled Fortan or C codes [19]. This is the major drawbacks of carrying programs out with Matlab. However, even in this situation, the performance of the proposed methods is acceptable.
For fair comparison, we also used the SRES algorithm to solve the same parameter estimation problem in the same searching region, but using NLP-P0 with differential algebraic constraints as cost function. In this condition, after running 1 day, the algorithm failed to produce a set of parameters that can produce reasonable simulation result. We further reduced the searching region to [0, 3θ ] and used noise free data, but the estimation result was still not good and the RSE J is larger than 10.
Here, we use the G1/S model to show the differences of the cost function surfaces between NLP-P0 and NLP-P3: in this case, the cost function of P0 is a highly irregular and complicated manifold with multiple local minima; the augmented cost function adopted in problem P3 is a much "smoother" function and hence it is easier for the NLP algorithm to converge to the solution. In order to simplify the analysis for exposition purpose, we only vary parameters k1 and k2 over the range k1 ϵ [0; 2 ] and k2 ϵ [0, 3.2 ] and fix all other parameters at their nominal values. Figure 3(a) displays the cost function surface of P0 , while Figure 3(b) exhibits the same data as Figure 3(a) on the expanded scale and Figure 3(c) is the corresponding contour plots. Figure 3(a) shows that the cost function surface of is a ridge, which drops suddenly from 109 to 0. However, Figure 3(b) reveals the cost function surface of P0 are actually banana-shaped valley around the nominal value of the fixed parameters, this unfavorable profile can slow down the convergence rate of the algorithm. Furthermore, there are many local minima in the banana-shaped valley. Some algorithms, such as simulated annealing, genetic algorithm, have been proposed to overcome these problem. However, these algorithms are all computationally demanding. In conclusion, these cost function features make the problem P0 a severe challenge to every optimization algorithm.
With the same condition, Figure 4(a) displays the cost function surface of P3, while Figure 4(b) shows the corresponding contour line. Compared with the cost function surface of P0, the cost function surface of P3 is bowl-shaped, which is smoother. Similar results has also been observed when other combination of parameters served as variables. Obviously, if all 39 parameters vary at the same time, the surface of the cost function will be more "uneven" and more complicated. However, in this case, from the previous observations, the cost function surface of P3 is smoother than the cost function surface of P0.
Furthermore, P3 only involves algebraic equations as objective function and constraints. These properties make the NLP-P3 easier to solve.
Discussion and Conclusion
In this paper, two parameter estimation methods based on spline theory are proposed. One aims at a narrower class of systems which is linear in parameters; however, it can cover many commonly found biological systems. The benefit is that the estimation problem can be transformed in an LP sub-algorithm which are fast and robust. Additional linear constraints can be embedded relative easily. For general systems, the problem is solved by an NLP with algebraic constraints, which is more computationally demanding.
A simple enzyme kinetic model and the mammalian G1/S transition network model were used as benchmarks to evaluate the performance of the two proposed methods. We illustrate the usefulness with more examples in additional files 1 but these do not remotely cover all the conditions.
During the simulation of the mammalian G1/S transition network model, we found that the estimated parameter set Φ A ≡ {k1, k2, k p , J11, J12, Km 1, Km 2, ϕ1, ϕ2} were well within the respective nominal values. While the set Φ B ≡ {J61, J62, J63, J65, J68} were far from their nominal values. However, the time-series produced by the estimated model were very similar to the original data. This phenomena reveals that some parameter values are insensitive in the searching region. Interestingly, we find that the "sensitive" or "easily identified" parameters set Φ A are also the parameters of the double-activator-inhibitor module of the antagonistic players E2F/DP and pRB, which makes up the core unit of the G1/S transition model [5]. This phenomenon may imply that the parameter values of the core module are sensitive and easy to identify. In contrast, the parameters set Φ B seems to be insensitive, which may reflect that pRG p (x6) is not a key element of the total system. However, to identify which parameter values or variables are important, a sensitivity analysis is needed [38], which is another important topic in systems biology and deserves a more detailed study. This sensitivity analysis is a pre-process for isolating those states and parameters which are sensitive in order to reduce the dimension of the system model and to improve the numerical stability for the core estimation problem.
For most biological systems, the ODE models are often high-dimensional and nonlinear. The problem of system parameter estimation is computationally expensive and can easily be trapped in local minima. We find that under noisy conditions, it is almost impossible to accurately estimate every parameter of the sloppy biological system model. However, in practice, a model with equivalent dynamical properties with a good degree of accuracy can be constructed based on dominant sensitive parameters and system states. The following are some of practical observations:
-
1.
High quality experiment data is essential for identifying accurate biology systems. When the experiment data is corrupted with high level noise, it needs more experimental data. If an insufficient amount of time-series data is given as observed profiles, the high degree-freedom of systems biology models ensures that many candidate solutions will be found.
-
2.
Perform a sensitivity analysis and identifiability analysis before the identification phase [36–38].
-
3.
For systems models with insensitive or non-identifiable parameters, the search may lead to a solution where some parameters can have large deviation, but still produce satisfactory system responses. This problem can be partly relieved by introducing auxiliary information (additional constraints such as shrinking the searching region) of the model into the algorithm. However, it remains difficult to be solved completely by improving parameter estimation strategy. It indicates that researchers should focus on predictions rather than on accurately estimating every parameter.
Although the proposed algorithms are fast and robust, there is certainly room for improvement: for method 1, it is not general enough to catch every case; for method 2, the price for the simplicity and generality is at the expansion of the optimization variable dimension. Under high noise condition, method 2 is still not robust enough. At the moment, the testing is based on Matlab which is much slower than native codes produced by C, Fortran, etc, however the conversion is straight forward. Currently, many high-speed computation engines are available that make use of parallelism, for instance multi-cluster engines, array-processing engines etc. Hence, one possible way is developed algorithm on these high-speed computation engines environment. Another possible way is developing hybrid algorithms to incorporate elements from evolution algorithms such as GA, SA and PSO. In this paper, we have considered the parameter estimation problem with known structure. However, it is easy to expand our method to structure identification by introducing an additional penalty term to the objective function [39].
References
Chang WC, Li CW, Chen BS: Quantitative inference of dynamic regulatory pathways via microarray data. BMC Bioinformatics. 2005, 6: 1-19. 10.1186/1471-2105-6-44
Cho KH, Shin SY, Kim HW, Wolkenhauer O, McFerran B, Kolch W: Mathematical modeling of the influence of RKIP on the ERK signaling pathway. Lecture Notes in Computer Science. 2003, 2602: 127-131. full_text. full_text full_text
Cho KH, Shin SY, Lee HW, Wolkenhauer O: Investigations into the analysis and modeling of the TNFα -Mediated NF-κ B-Signaling pathway. Genome Res. 2003, 13 (11): 2413-2422. 10.1101/gr.1195703
Hoffmann A, Levchenko A, Scott ML, Baltimore D: The IkB-NF-kB Signalling module: temporal control and selective gene activation. Science. 2002, 298 (5596): 1241-1245. 10.1126/science.1071914
Swat M, Kel A, Herzel H: Bifurcation analysis of the regulatory modules of the mammalian G1/S transition. Bioinformatics. 2004, 20 (10): 1506-1511. 10.1093/bioinformatics/bth110
Tyson JJ, Chen K, Novak B: Network dynamics and cell physiology. Nature Reviews Molecular Cell Biology. 2001, 2: 908-916. 10.1038/35103078
Voit EO: Computational analysis of biochemical systems, a practical guide for biochemists and molecular biologists. 2000, Cambridge: Cambridge University Press,
Goel G, Chou IC, Voit EO: System estimation from metabolic time-series data. Bioinformatics. 2008, 24 (21): 2505-2511. 10.1093/bioinformatics/btn470
Ashyraliyev M, Fomekong-Nanfack Y, Kaandorp JA, Blom JG: Systems biology: parameter estimation for biochemical models. FEBS J. 2009, 276 (4): 886-902. 10.1111/j.1742-4658.2008.06844.x
Kikuchi S, Tominaga D, Arita M, Takahashi K, Tomita M: Dynamic modeling of genetic networks using genetic algorithm and S-system. Bioinformatics. 2003, 19 (5): 643-650. 10.1093/bioinformatics/btg027
Tsai KY, Wang FS: Evolutionary optimization with data, collocation for reverse engineering of biological networks. Bioinformatics. 2005, 21 (7): 1180-1188. 10.1093/bioinformatics/bti099
Chou IC, Martens H, Voit EO: Parameter estimation in biochemical systems models with alternating regression. Theor Biol Med Model. 2006, 19: 3-25.
Gadkar KG, Gunawan R, Doyle F: Iterative approach to model identification of biological networks. BMC Bioinformatics. 2005, 6: 155- 10.1186/1471-2105-6-155
Gonzalez OR, Küer C, Jung K, Naval PCJ, Mendoza E: Parameter estimation using Simulated Annealing for S-system models of biochemical networks. Bioinformatics. 2007, 23 (4): 480-486. 10.1093/bioinformatics/btl522.
Lall R, Voit EO: Parameter estimation in modulated, unbranched reaction chains within biochemical systems. Comput Biol Chem. 2005, 29 (5): 309-318. 10.1016/j.compbiolchem.2005.08.001
Peifer M, Timmer J: Parameter estimation in ordinary differential equations for biochemical processes using the method of multiple shooting. IET Syst Biol. 2007, 1 (2): 78-88. 10.1049/iet-syb:20060067
Veflingstad SR, Almeida J, Voit EO: Priming nonlinear searches for pathway identification. Theor Biol Med Model. 2004, 1: 8- 10.1186/1742-4682-1-8
Mendes P, Kell D: Non-linear optimization of biochemical pathways: application to metabolic engineering and parameter estimation. Bioinformatics. 1998, 14 (10): 869-883. 10.1093/bioinformatics/14.10.869
Moles CG, Mendes P, Banga JR: Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Res. 2003, 13 (11): 2467-2474. 10.1101/gr.1262503
Cho DY, Cho KH, Zhang BT: Identification of biochemical networks by S-tree based genetic programming. Bioinformatics. 2006, 22 (13): 1631-1640. 10.1093/bioinformatics/btl122
Koh G, Teong HF, Cléent MV, Hsu D, Thiagarajan PS: A decompositional approach to parameter estimation in pathway modeling: a case study of the Akt and MAPK pathways and their crosstalk. Bioinformatics. 2006, 22 (24): 271-280. 10.1093/bioinformatics/btl264.
Kimura S, Ide K, Kashihara A, Kano M, Hatakeyama M, Masui R, Nakagawa N, Yokoyama S, Kuramitsu S, A K: Inference of S-system models of genetic networks using a cooperative coevolutionary algorithm. Bioinformatics. 2005, 21 (7): 1154-1163. 10.1093/bioinformatics/bti071
Polisetty PK, Voit EO, Gatzke EP: Identification of metabolic system parameters using global optimization methods. Theor Biol Med Model. 2006, 27 (3): 4-10.1186/1742-4682-3-4.
Voit EO, Almeida J: Decoupling dynamical systems for pathway identification from metabolic profiles. Bioinformatics. 2004, 20 (11): 1670-1681. 10.1093/bioinformatics/bth140
Savageau MA: Biochemical Systems Analysis: a study of Function and Design in Molecular Biology. 1976, Reading, MA: Addison-Wesley, Reading,
Craven P, Wahba G: Smoothing noisy data with spline functions estimating the correct degree of smoothing by the method of generalized cross validation. Numer Math. 1979, 31: 377-403. 10.1007/BF01404567.
Tetko IV, Livingstone DJ, Luik AI: Neural network studies. 1. Comparison of overfitting and overtraining. J Chem Inf Comput Sci. 1995, 35: 826-833.
Boor D: A practical guide to splines. 1987, New York: Springer,
Voss HU, Timmer J, Kurths J: Nonlinear dynamical system identification from uncertain and indirect measurements. International Journal of Bifurcation and Chaos. 2004, 14 (6): 1905-1933. 10.1142/S0218127404010345.
Kuzmic P: Program DYNAFIT for the analysis of enzyme kinetic data: application to HIV proteinase. Ana Biochem. 1996, 237: 260-273. 10.1006/abio.1996.0238.
Boyd SP, Vandenberghe L: Convex Optimization. 2003, Cambridge, U.K.: Cambridge University Press,
Winston WL: Operations research: applications and algorithms. 1994, Belmont, CA: Duxbury Press,
Fletcher RE: Numerical experiments with an exact penalty function method. Nonlinear Programming. 1981, 4: 99-129.
Powell MJD: A Fast Algorithm for Nonlinearly Constrained Optimization Calculations. Numerical Analysis. 1978, 27: 630-
Runarsson TP, Yao X: Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput. 2000, 4: 284-294. 10.1109/4235.873238.
Raue A, Kreutz C, Maiwald T, J B, Schilling M, Klingmuller U, Timmer J: Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics. 2009, 25 (15): 1923-1929. 10.1093/bioinformatics/btp358
Hengl S, Kreutz C, Timmer J, Maiwald T: Data-based identifiability analysis of non-linear dynamical models. Bioinformatics. 2007, 23 (19): 2612-2618. 10.1093/bioinformatics/btm382
Yue H, Brown M, Knowles J, Wang H, Broomhead DS, Kell DB: Insights into the behaviour of systems biology models from dynamic sensitivity and identifiability analysis: a case study of an NF-kB signalling pathway. Mol Biosyst. 2006, 2 (12): 640-649. 10.1039/b609442b
Liu PK, Wang FS: Inference of biochemical network models in S-system using multiobjective optimization approach. Bioinformatics. 2008, 24 (8): 1085-1092. 10.1093/bioinformatics/btn075
Acknowledgements
The work is supported by CERG Grant 9041393 (CityU 123808). We would like to express our appreciation for Dr. Robin S. Bradbeer and Dr. K. T. Ko for their fruitful suggestions.
Author information
Authors and Affiliations
Corresponding authors
Additional information
Authors' contributions
All authors contributed to the development of the theory and analysis of the results. CJ designed, analyzed and implemented the algorithm, performed the experiments and wrote the main body of manuscript. LY contributed to the development of the method and writing the manuscript and providing the critical review of the manuscript. All authors have read and approved the final version of the manuscript.
Electronic supplementary material
12918_2010_604_MOESM1_ESM.PDF
Additional file 1:In this additional file, we tested the proposed methods on seven systems biology models were used to test: TNFα -Mediated NF-κ B-Signaling Pathway Model, RKIP Regulated ERK Pathway model and the model of irreversible inhibition of HIV proteinase; Yeast fermentation pathway Model, large-scale target genetic network model, a three step pathway model and the mammalian G1/S transition network model. (PDF 1 MB)
12918_2010_604_MOESM2_ESM.PDF
Additional file 2:In this additional file, we use E2F/DP dimmer model to illustrate the differences between the three different type methods mentioned in the paper: (i) the direct optimization method; (ii) decomposition methods; (iii) methods Combine spline theory and NLP. (PDF 445 KB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Zhan, C., Yeung, L.F. Parameter estimation in systems biology models using spline approximation. BMC Syst Biol 5, 14 (2011). https://doi.org/10.1186/1752-0509-5-14
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/1752-0509-5-14