- Methodology article
- Open Access
Constructing stochastic models from deterministic process equations by propensity adjustment
- Jialiang Wu1,
- Brani Vidakovic2 and
- Eberhard O Voit2, 3
https://doi.org/10.1186/1752-0509-5-187
© Wu et al; licensee BioMed Central Ltd. 2011
- Received: 18 July 2011
- Accepted: 8 November 2011
- Published: 8 November 2011
Abstract
Background
Gillespie's stochastic simulation algorithm (SSA) for chemical reactions admits three kinds of elementary processes, namely, mass action reactions of 0th, 1st or 2nd order. All other types of reaction processes, for instance those containing non-integer kinetic orders or following other types of kinetic laws, are assumed to be convertible to one of the three elementary kinds, so that SSA can validly be applied. However, the conversion to elementary reactions is often difficult, if not impossible. Within deterministic contexts, a strategy of model reduction is often used. Such a reduction simplifies the actual system of reactions by merging or approximating intermediate steps and omitting reactants such as transient complexes. It would be valuable to adopt a similar reduction strategy for stochastic modelling. Indeed, efforts have been devoted to manipulating the chemical master equation (CME) in order to achieve a proper propensity function for a reduced stochastic system. However, manipulations of CME are almost always complicated, and successes have been limited to relatively simple cases.
Results
We propose a rather general strategy for converting a deterministic process model into a corresponding stochastic model and characterize the mathematical connections between the two. The deterministic framework is assumed to be a generalized mass action system and the stochastic analogue is in the format of the chemical master equation. The analysis identifies situations: where a direct conversion is valid; where internal noise affecting the system needs to be taken into account; and where the propensity function must be mathematically adjusted. The conversion from deterministic to stochastic models is illustrated with several representative examples, including reversible reactions with feedback controls, Michaelis-Menten enzyme kinetics, a genetic regulatory motif, and stochastic focusing.
Conclusions
The construction of a stochastic model for a biochemical network requires the utilization of information associated with an equation-based model. The conversion strategy proposed here guides a model design process that ensures a valid transition between deterministic and stochastic models.
Background
Most stochastic models of biochemical reactions are based on the fundamental assumption that no more than one reaction can occur at the exact same time. A consequence of this assumption is that only elementary chemical reactions can be converted directly into stochastic analogues [1]. These include: 1) zero-order reactions, such as the generation of molecules at a constant rate; 2) first-order reactions, with examples including elemental chemical reactions as well as transport and decay processes; and 3) second-order reactions, which include heterogeneous and homogeneous bimolecular reactions (dimerization). Reactions with integer kinetic orders other than 0, 1 and 2 are to be treated as combinations of sequential elementary reactions. The advantage of the premise of non-simultaneous reaction steps is that the stochastic reaction rate can be calculated from a deterministic, equation-based model with some degree of rigor, even though the derivation is usually not based on first physical principles but instead depends on other assumptions and on macroscopic information, such as a fixed rate constant in the equation-based model. The severe disadvantage is that this rigorous treatment is not practical for modelling larger biochemical reaction systems. The reasons include the following. First, in many cases, elementary reaction rates are not available. Secondly, even in the case that all reaction parameters are available, the computational expense is very significant when the system involves many species and reactions, and this fact ultimately leads to a combinatorial explosion of required computations. Within a deterministic modelling framework, the common practice in this situation is to fit the transient and steady-state experimental data with a phenomenological, (differential) equation-based model, which explicitly or implicitly eliminates or merges some intermediate species and reactions. 
The best-known examples are probably Michaelis-Menten and Hill rate laws, which are ultimately explicit, but in truth approximate a multivariate system of underlying chemical processes.
Similar model reduction efforts have been carried out for stochastic modelling. For instance, the use of a complex-order function (which corresponds to a reduced equation-based model) was shown to be justified for some types of stochastic simulations. A prominent example is again the Michaelis-Menten rate law, which can be reduced from a system of elementary reactions to an explicit function by means of the quasi-steady-state assumption (see Results section and [2, 3]). However, model reduction within the stochastic framework has proven to be far more difficult than in the deterministic counterpart. The difficulties are mainly due to the fact that the reduction must be carried out on the chemical master equation (CME). This process is nontrivial and has succeeded only in simple cases.
In general, the construction of a stochastic model for a large biochemical network requires the use of information available from an equation-based model. In the past, several strategies have been proposed for this purpose and within the context of Gillespie's exact stochastic simulation algorithm (SSA; [1]) and its variants [4]. For example, Tian and Burrage [5] proposed that a stochastic model could be directly formulated from the deterministic model through a Poisson leaping procedure. However, a rigorous mathematical justification for such a conversion is lacking. Typical moment-based approaches [6–8] derive ODEs for the statistical moments of the stochastic model from an equation-based model where the 0th, 1st and 2nd order reactions follow mass action rate laws. More recently the moment method was extended to cover models consisting of rational rate laws [9]. Moreover, it was realized that the moment method is complementary to, but cannot fully replace, stochastic simulations, because it does not cover situations like genetic switches [6, 10].
In this article, we explore the mathematical connection between deterministic and stochastic frameworks for the pertinent case of Generalized Mass Action (GMA) systems, which are frequently used in Biochemical Systems Theory (BST; [11–13]). Specifically, we address two questions: First, under what conditions can a deterministic, equation-based model be converted directly into a stochastic simulation model? And second, what is a proper way of implementing this conversion? We will develop a method to answer these questions and demonstrate it for functions in the canonical power-law format of GMA systems. However, the results are applicable to other functions and formats as well, as we will demonstrate with several examples.
Representations of systems of biochemical reactions
The size of the system is defined as Φ = AU, where A is the Avogadro number and U is the reaction volume.
The modelling of biochemical reaction networks typically uses one of two conceptual frameworks: deterministic or stochastic. In a deterministic framework, the state of the system is given by a non-negative vector $[\mathbf{X}(t)] = ([X_1(t)], \ldots, [X_{N_s}(t)])$, where component [X s (t)] represents the concentration of species S s , measured in moles per unit volume. The temporal evolution of the state of the system is modelled by a set of ordinary differential equations, which in our case are assumed to follow a generalized mass action (GMA) kinetic law. By contrast, in a stochastic framework, the state of the system is characterized by a vector $\mathbf{x}(t) = (x_1(t), \ldots, x_{N_s}(t))$, whose values are non-negative integers. Specifically, x s (t) = Φ [X s (t)] is the count of S s molecules, which is a sample value of the random variable X s (t). The system dynamics of this process is typically described with the chemical master equation (CME). Both GMA and CME will be discussed in detail in the following sections.
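The relation x s = Φ [X s ] with Φ = AU can be made concrete with a small numerical sketch. The function names, the choice of litres as the volume unit, and the example values (1 nM in a volume of 1 μm3) are illustrative assumptions and are not taken from the article.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def system_size(volume_litres: float) -> float:
    """Return Phi = A * U, in molecules per (mol/L)."""
    return AVOGADRO * volume_litres


def concentration_to_count(conc_molar: float, volume_litres: float) -> float:
    """Convert a molar concentration [X_s] to the expected count x_s = Phi * [X_s]."""
    return system_size(volume_litres) * conc_molar


# Illustrative values only: 1 cubic micrometre equals 1e-15 litres.
U = 1e-15
count = concentration_to_count(1e-9, U)  # 1 nM in 1 um^3
print(f"expected number of molecules: {count:.3f}")
```

At this scale a nanomolar concentration corresponds to less than one molecule on average, which is the regime in which the integer-valued stochastic description differs most visibly from the deterministic one.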
Motivation for the power-law formalism: reactions in crowded media
Power-law functions with non-integer kinetics have proven very useful in biochemical systems analysis, and forty years of research have demonstrated their wide applicability (e.g., see [11–13]). Generically, this type of description of a biochemical reaction can be seen either as a Taylor approximation in logarithmic space or as a heuristic or phenomenological model that has been applied successfully hundreds of times and in different contexts, even though it is difficult or impossible in many situations to trace it back to first mechanistic principles. A particularly interesting line of support for the power-law format can be seen in the example of a bimolecular reaction occurring in a spatially restricted environment. Savageau demonstrated that the kinetics of such a reaction can be validly formulated as a generalization of the law of mass action, where non-integer kinetic orders are allowed [14, 15]. Neff and colleagues [16–18] showed with careful experiments that this formulation is actually more accurate than alternative approaches.
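The balance for the bimolecular reaction that the next paragraph dissects term by term can be written compactly. The following is a reconstruction from that verbal description, not a transcription of the article's equation; in particular, the symbol Δx3 for the change in the count of S3 during the interval Δt is an assumption.

```latex
\Delta x_3 \;=\; f([X_1],[X_2])\,\Delta t\; x_1 x_2 \;-\; g([X_3])\,\Delta t\; x_3
```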
The first term on the right-hand side of this equation, f ([X1], [X2])Δt x1x2, describes the production of S3: it depends on the totality of possible collisions x1 x2 and also on some fraction f ([X1], [X2])Δt that actually reacts and forms the product. In a dilute environment, f ([X1], [X2]) equals a traditional rate constant, and the reaction obeys the law of mass action, while in a spatially restricted environment, such as the cytoplasm, one needs to take crowding effects into account. As shown in Savageau [14, 15], the desired fraction of a reaction in a crowded environment becomes a rate function that depends on the current concentrations of S1 and S2. The second term, g ([X3]) Δtx3, describes the fraction g ([X3]) Δt of species S3 that dissociates back into S1 and S2. This fraction may depend on some functional form of [X3] because in a crowded environment the complex may not be able to dissociate effectively. Thus, rate constants in the generalized mass action setting become rate functions (cf. [17]).
where a = α + 1, b = β + 1, and c = γ + 1. As long as k f , k d , a, b and c remain more or less constant throughout a relevant range, the power-law model is mathematically well justified. In actual applications, the values of rate constants and kinetic orders can be estimated from experimental data [19]. When the functions f and g are originally not in power-law format, they can be locally approximated by power-law functions with a procedure similar to the one shown above (Equations (3) to (5)). An illustration will be given in the example section.
The Generalized Mass Action (GMA) format
for every s = 1, ..., N s . Each reaction contributes either a production flux or a degradation flux to the dynamics of a certain species. Positive terms (v rs > 0) represent the production of S s , while negative terms (v rs < 0) describe degradation. If the kinetic order f rs is positive, then S s accelerates the reaction R r ; a negative value means that S s inhibits the reaction, and f rs = 0 implies that S s has no influence on the reaction. The rate constant k r for reaction R r is either positive or zero. Both the rate constants and the kinetic orders are to be estimated from data.
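A minimal sketch of how the right-hand side of a GMA system might be evaluated is given below. It assumes the usual GMA form, in which each reaction flux is k r times the product of concentrations raised to their kinetic orders f rs , and each species sums these fluxes weighted by v rs . The function name `gma_rhs` and the array layout are hypothetical. The kinetic orders and rate constants in the example are those listed in the caption of the reversible-reaction figure, while the stoichiometry S1 + S2 ⇌ S3 is an assumption.

```python
import numpy as np


def gma_rhs(X, k, F, V):
    """Evaluate d[X_s]/dt = sum_r v_rs * k_r * prod_j [X_j]**f_rj.

    X : (N_s,) positive concentrations
    k : (N_r,) non-negative rate constants
    F : (N_r, N_s) kinetic orders f_rs
    V : (N_r, N_s) stoichiometric coefficients v_rs
    """
    X = np.asarray(X, dtype=float)
    k = np.asarray(k, dtype=float)
    F = np.asarray(F, dtype=float)
    V = np.asarray(V, dtype=float)
    # X ** F broadcasts the species vector against every reaction row.
    # A zero concentration with a negative kinetic order would give inf.
    fluxes = k * np.prod(X ** F, axis=1)
    return V.T @ fluxes


# Forward reaction with orders (1.3, 1.8, -1), reverse with orders (1, 0, 1).
k = np.array([0.5, 0.5])
F = np.array([[1.3, 1.8, -1.0],
              [1.0, 0.0, 1.0]])
V = np.array([[-1, -1, 1],    # assumed stoichiometry: S1 + S2 -> S3
              [1, 1, -1]])    # assumed stoichiometry: S3 -> S1 + S2
print(gma_rhs([2.0, 1.0, 1.0], k, F, V))
```

Storing kinetic orders and stoichiometry as separate matrices keeps the role of a species as a modulator (f rs ≠ 0, v rs = 0) distinct from its role as a reactant or product.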
Proper use of equation-based functions for stochastic simulations
- 1) f is a linear function;
- 2) the reaction is monomolecular;
- 3) all X i in the system are noise-free variables, i.e., without (or with ignorable) fluctuations, which implies that the covariance of any two participating reactants is zero (or close to zero).
Each of these assumptions constitutes a sufficient condition for the direct use of a rate function as the propensity function and applies, in principle, to GMA as well as other systems. The validity of these conditions will be discussed later. Specifically, the first condition will be addressed in the Results section under the headings "0th-order reaction kinetics" and "1st-order reaction kinetics," while the second condition will be discussed under the heading "Real-valued order monomolecular reaction kinetics." The third condition will be the focus of Equations (29)-(36) and their associated explanations.
In reality, the rates of reactions in biochemical systems are commonly nonlinear functions of the reactant species, and fluctuations within each species are not necessarily ignorable. Therefore, the valid use of an equation-based model in a stochastic simulation requires that we know how to define a proper propensity function. The following section addresses this issue. It uses statistical techniques to characterize estimates for both the mean and variance of the propensity function, and these features will allow an assessment of the validity of the assumption α(X) = f s (X) and prescribe adjustments if the assumption is not valid.
Methods
Deriving the mean and variance of a power-law function of random variables
(for details, see Additional file 1). Here,
The approximation formulae for μ PL and σ PL 2 in Equations (8)-(10) are easy to implement numerically if observation data are available to estimate cov [logX i , logX j ]. Furthermore, Equations (11)-(13) demonstrate how μ PL and σ PL 2 are related to μ s , σ s 2 and σ ij ; however, the price of this insight is the possible inaccuracy introduced through the Taylor approximation. Equations (15)-(17) also provide a functional dependence of μ PL and σ PL 2 on (μ s , σ s 2, σ ij ), but it is only valid if the additional assumption of log-normality is acceptable.
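As an illustration of the log-normal route, the sketch below computes the mean and variance of a power-law function k ∏ X i ^f i when the vector (log X 1 , ..., log X N ) is jointly Gaussian. It relies on the textbook moments of a log-normal variable and is parameterized by the mean and covariance of the logarithms; it is therefore not a transcription of Equations (15)-(17), which are stated in terms of μ s , σ s 2 and σ ij . The function name and arguments are hypothetical.

```python
import numpy as np


def power_law_moments_lognormal(k, f, mean_log, cov_log):
    """Mean and variance of PL = k * prod_i X_i**f_i for jointly log-normal X.

    mean_log : (N,) vector of E[log X_i]
    cov_log  : (N, N) matrix of cov[log X_i, log X_j]
    """
    f = np.asarray(f, dtype=float)
    # log PL = log k + sum_i f_i log X_i is Gaussian with mean m and variance s2.
    m = np.log(k) + f @ np.asarray(mean_log, dtype=float)
    s2 = f @ np.asarray(cov_log, dtype=float) @ f
    mean = np.exp(m + 0.5 * s2)
    var = (np.exp(s2) - 1.0) * np.exp(2.0 * m + s2)
    return mean, var


# Illustrative values only: two correlated species with kinetic orders 1.3 and -1.
mu_pl, var_pl = power_law_moments_lognormal(
    k=0.5,
    f=[1.3, -1.0],
    mean_log=[np.log(100.0), np.log(120.0)],
    cov_log=[[0.010, 0.004],
             [0.004, 0.020]],
)
print(mu_pl, var_pl)
```

The quadratic form f · cov · f makes explicit that the covariance of the logarithms, weighted by the kinetic orders, is the only route by which fluctuations enter the mean of the power-law term.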
Deriving proper propensity functions for stochastic simulations from differential equation-based models
where Φ is the system size as defined above.
Updating CME requires knowledge of every possible combination of all species counts within the population, which immediately implies that it can be solved analytically for only a few very simple systems and that numerical solutions are usually prohibitively expensive [24]. To address the inherent intractability of CME, Gillespie developed an algorithm, called the Stochastic Simulation Algorithm (SSA), to simulate CME models [1]. SSA is an exact procedure for numerically simulating the time evolution of a well-stirred reaction system. It is rigorously based on the same microphysical premise that underlies CME and gives a more realistic representation of a system's evolution than a deterministic reaction rate equation represented by ODEs. SSA requires knowledge of the propensity function, which however is truly available only for elementary reactions. These reactions include: 1) 0th order reactions, exemplified with the generation of a molecule at a constant rate; 2) 1st order monomolecular reactions, such as an elemental chemical conversion or decay of a single molecule; 3) 2nd order bimolecular reactions, including reactive collisions between two molecules of the same or different species. The reactive collision of more than two molecules at exactly the same time is considered highly unlikely and modelled as two or more sequential bimolecular reactions.
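For orientation, a minimal sketch of Gillespie's direct method is shown below. It is written against a generic, user-supplied propensity function, so it can be used with elementary mass-action propensities or with any other candidate propensity. The function name `ssa`, its arguments, and the reversible isomerization used as an example are illustrative assumptions rather than part of the article.

```python
import math
import random


def ssa(x0, stoich, propensities, t_end, seed=0):
    """Simulate one trajectory with Gillespie's direct method.

    x0           : list of initial molecule counts
    stoich       : list of state-change vectors, one per reaction
    propensities : function mapping a state to a list of propensity values
    Returns the jump times and the states visited, starting at t = 0.
    """
    rng = random.Random(seed)
    t, x = 0.0, list(x0)
    times, states = [t], [tuple(x)]
    while True:
        a = propensities(x)
        a0 = sum(a)
        if a0 <= 0.0:
            break  # no reaction can fire any more
        # Exponentially distributed waiting time with rate a0.
        t += -math.log(1.0 - rng.random()) / a0
        if t > t_end:
            break
        # Select reaction r with probability a[r] / a0.
        threshold = rng.random() * a0
        cumulative = 0.0
        for r, a_r in enumerate(a):
            cumulative += a_r
            if threshold < cumulative:
                break
        x = [xi + d for xi, d in zip(x, stoich[r])]
        times.append(t)
        states.append(tuple(x))
    return times, states


# Hypothetical example: A <-> B with first-order propensities.
times, states = ssa(
    x0=[10, 0],
    stoich=[(-1, 1), (1, -1)],
    propensities=lambda x: [1.0 * x[0], 0.5 * x[1]],
    t_end=5.0,
    seed=1,
)
print(len(times), states[-1])
```

Because the propensity is passed in as a function, the same loop can be reused to compare an unadjusted rate function with an adjusted propensity without changing the simulator.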
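Here $\binom{x_s}{v_{rs}} = x_s (x_s - 1) \cdots (x_s - v_{rs} + 1) / v_{rs}!$ counts the distinct combinations of reactant molecules, where x s is the sample value of random variable X s . The approximation is invoked when x s is large and (x s - 1), ..., (x s - v rs + 1) are approximately equal to x s .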
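The quality of this large-count approximation can be checked numerically. The sketch below assumes that the exact count of reactant combinations is the binomial coefficient and that the approximation replaces it by x^v / v!; the function names are hypothetical.

```python
from math import comb, factorial


def exact_combinations(x: int, v: int) -> int:
    """Number of distinct ways to pick v reactant molecules out of x."""
    return comb(x, v)


def large_count_approximation(x: int, v: int) -> float:
    """Approximation x**v / v!, valid when x is much larger than v."""
    return x ** v / factorial(v)


for x in (5, 100, 10_000):
    exact = exact_combinations(x, 2)
    approx = large_count_approximation(x, 2)
    rel_err = (approx - exact) / exact
    print(f"x={x:>6}  exact={exact}  approx={approx}  relative error={rel_err:.2e}")
```

For a bimolecular reaction between molecules of the same species, the relative error is 1/(x - 1), so the approximation is poor for a handful of molecules and negligible for thousands.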
In Gillespie's original formulation [1] c r is a constant that only depends on the physical properties of the reactant molecules and the temperature of the system, and c r dt is the probability that a particular combination of reactant molecules will react within the next infinitesimally small time interval (t, t + dt). The constant c r can be calculated from the corresponding deterministic rate constants, if they are known.
The details of these derivations are shown in Additional file 1.
- 1) Adopt a zero-covariance assumption, as was done in [25], which implies ignoring random fluctuations within every species as well as their correlations. This assumption is only justified for some special cases, such as monomolecular and bimolecular reactions under the thermodynamic limit (cf. [4, 6]), but is not necessarily valid in general. Here the thermodynamic limit is defined as a finite concentration limit which the system reaches when both population and volume approach infinity. Under this assumption, the left-hand side of (29) becomes Equation (30).
Here, the index r_0 is used to distinguish this 0-covariance propensity function from a second type of propensity in the next section.
for every s = 1, ..., N s . Note that this result is exactly equivalent to the equation-based model (27).
for every s = 1, ..., N s . These expressions demonstrate that even with large numbers of molecules the mean of CME does not always converge to the GMA model. Indeed, the convergence is only guaranteed in one of the following special situations: 1) the reaction is of 0th order; 2) the reaction is a monomolecular reaction of real-valued order, with the 1st-order reaction as a special case; 3) the covariance contribution in (34) is sufficiently small to be ignored for all participating reactant species of a particular reaction channel. Except for these three special situations, the covariance as shown in (34) significantly affects the mean dynamics. Therefore, stochastic simulations using zero-covariance propensity functions will in general yield means different from what the deterministic GMA model produces. How large these differences are cannot be stated in general. Under the assumption that the GMA model correctly captures the mean dynamics of every species, this conclusion means that αr_0 is not necessarily an accurate propensity function for stochastic simulations, and the direct conversion of the equation-based model into a propensity function must be considered with caution.
- 2) We again assume that the GMA model is well defined, which implies that information regarding the species correlations and fluctuations has been captured in the parameters of the GMA model on the left-hand side of Equations (7) and (28). To gain information regarding correlations, we use a Taylor expansion to approximate the propensity function (see Additional file 1 for details), which yields Equation (37).
Here it is important to understand that although the random variables {X s } s∈S appear in the expression c r (x), c r (x) is not a function of random variables but a deterministic function. The reason is that cov [logX i (t), logX j (t)], which enters the composition of c r (x) as a numerical characteristic of the random variables {X s } s∈S , is itself deterministic. Therefore, the stochastic rate function c r (x) is a well-justified deterministic function that is affected by both the state of the system and cov [logX i (t), logX j (t)], the numerical characteristic of fluctuations in the random variables {X s } s∈S .
Remembering that cov [logX i (t), logX j (t)], which is a component in both the stochastic rate function c r (x) and now in the function paf(t), is a deterministic function rather than a function of random variables, paf(t) is a deterministic correction to the kinetic constant k r in the construction of αr_cov in (41), which corrects the stochastic simulation toward the correct average.
for every s = 1, ..., N s , which is equivalent in approximation to the GMA model (28). In other words, the mean of every molecular species obtained by using αr_cov in the CME-derived equation (27) is approximately identical to the corresponding macroscopic variable in the GMA model.
Calculation of cov [logX i (t), logX j (t)]
When data in the form of multiple time series for all the reactants are available, it is possible to compute cov [logX i (t), logX j (t)] directly from these data. Once this covariance is known, the function paf, αr_cov and the mean dynamics can all be assessed. Alas, the availability of several time series data for all reactants under comparable conditions is rare, so that cov [logX i (t), logX j (t)] must be estimated in a different manner.
If one can validly assume that the covariance based on αr_0 does not differ significantly from the covariance based on αr_cov, one may calculate cov [logX i (t), logX j (t)] by one of the following methods.
Method 1:
One uses αr_0 to generate multiple sets of time series data of all reactants and then computes cov [logX i (t), logX j (t)].
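Method 1 amounts to a sample covariance taken across independent runs at a fixed time point. A minimal sketch follows, assuming the simulated counts at that time point have been collected into an array with one row per run and one column per species. The function name is hypothetical, and rejecting zero counts is a design choice made here because the logarithm is undefined for them; the article does not say how such cases should be handled.

```python
import numpy as np


def log_covariance(samples):
    """Sample covariance matrix of log counts across independent runs.

    samples : (n_runs, N_s) array of molecule counts at one time point.
    Returns an (N_s, N_s) matrix estimating cov[log X_i(t), log X_j(t)].
    """
    samples = np.asarray(samples, dtype=float)
    if np.any(samples <= 0):
        raise ValueError("log covariance is undefined for non-positive counts")
    # rowvar=False: columns are variables (species), rows are observations (runs).
    return np.cov(np.log(samples), rowvar=False)


# Illustrative counts of three species from five hypothetical runs.
runs = np.array([[98, 101, 121],
                 [103, 97, 118],
                 [100, 100, 120],
                 [95, 104, 125],
                 [104, 98, 116]])
print(log_covariance(runs))
```

Repeating this calculation at each time point of interest yields the time-dependent covariance needed for the propensity adjustment.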
Method 2:
The first functional expression of cov [logX i (t), logX j (t)] is achieved by Taylor approximation, whereas the second expression is obtained by the additional assumption that the concentrations (X1, ..., X s ) are log-normally distributed [8, 23]. The consideration of a log-normal distribution is often justified by the fact that many biochemical data have indeed been observed to be log-normally distributed (e.g., [20–22]).
Second, one uses αr_0 to approximate the mean and covariance either by direct simulation, as shown in method 1, or by a moment-based approach, which is explained in Additional file 2, and which yields the differential equations
Here r = 1, ..., N r and s, m, n = 1, ..., N s ; (V) rs = v rs ; and Λ is a diagonal matrix. The moment-based approach and its notation are explained in Additional file 2.
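The two routes named under Method 2 for expressing cov [logX i , logX j ] through the means and covariances of the counts can be compared numerically. The sketch below assumes that the Taylor route is taken to first order, giving σ ij /(μ i μ j ), and that the log-normal route uses the standard identity log(1 + σ ij /(μ i μ j )). Both are textbook results and are assumed, not confirmed, to correspond to the two expressions of the article. The function names are hypothetical.

```python
import math


def log_cov_taylor(mu_i: float, mu_j: float, sigma_ij: float) -> float:
    """First-order Taylor (delta-method) estimate of cov[log X_i, log X_j]."""
    return sigma_ij / (mu_i * mu_j)


def log_cov_lognormal(mu_i: float, mu_j: float, sigma_ij: float) -> float:
    """Exact cov[log X_i, log X_j] if (X_i, X_j) are jointly log-normal."""
    return math.log1p(sigma_ij / (mu_i * mu_j))


# Illustrative moments: means of 100 molecules and a covariance of 50.
print(log_cov_taylor(100.0, 100.0, 50.0))
print(log_cov_lognormal(100.0, 100.0, 50.0))
```

The two estimates agree to first order in σ ij /(μ i μ j ), so they differ noticeably only when fluctuations are large relative to the means, which is the low-copy-number regime.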
Statistical criteria for propensity adjustment
Suppose an equation-based model captures the average behavior of a stochastic system and one intends to find the propensity function for a stochastic simulation that will reproduce that mean. One can use the 95% confidence interval to evaluate the need for a propensity adjustment. Specifically, for stable systems that will reach a steady state, we use the reversible reaction model as an example. If the steady state of the ODE, x st , lies within the 95% confidence interval obtained from n runs of stochastic simulations, which is centered at μ st with a width determined by δ st , then the rate function in the original ODEs can be used as the propensity without adjustment; otherwise propensity adjustment is needed. Here μ st and δ st can be obtained from either a moment-based method or from n independent runs of stochastic simulations using the propensity without adjustment. An example discussing a reversible reaction with feedback controls can be found in the Results section.
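The steady-state criterion can be sketched as a simple test. The interval used here, μ st ± 1.96 δ st /√n with δ st the sample standard deviation, is the usual normal-approximation confidence interval for a mean and is an assumption about the exact form intended in the article. The function name and the sample values are hypothetical.

```python
import math
import statistics


def needs_propensity_adjustment(x_st_ode, steady_state_samples, z=1.96):
    """Return True if the ODE steady state lies outside the 95% confidence
    interval of the mean of the simulated steady-state counts.

    steady_state_samples : steady-state counts from n independent runs that
                           use the unadjusted propensity.
    """
    n = len(steady_state_samples)
    mu_st = statistics.fmean(steady_state_samples)
    delta_st = statistics.stdev(steady_state_samples)  # sample standard deviation
    half_width = z * delta_st / math.sqrt(n)
    return abs(x_st_ode - mu_st) > half_width


# Illustrative steady-state counts from five hypothetical runs.
samples = [9, 10, 11, 10, 10]
print(needs_propensity_adjustment(10.5, samples))
print(needs_propensity_adjustment(11.0, samples))
```

With more runs the interval narrows, so a small but systematic bias of the unadjusted propensity becomes detectable that would be masked in a short simulation study.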
For other systems that do not reach a steady state, but where instead transient characteristics are of the highest interest, one can judge the need for propensity adjustment by whether the pertinent characteristics of the ODEs are within the 95% confidence interval of the corresponding characteristic, which is given by a prediction from the moment-based method or from n runs of stochastic simulations. The Repressilator example in the Results section will serve as a demonstration.
Results
Generic special cases
- 1) 0th-order reaction kinetics
- 2) 1st-order reaction kinetics
- 3) Real-valued order monomolecular reaction kinetics
- 4) 2nd-order reaction kinetics
- 5) Bimolecular reaction with real-valued order kinetics
For bimolecular reactions of complex order, the propensity function is different from the rate equation. The difference can be ignored only if the contribution from the covariance is insignificant.
Power-law representation of a reversible reaction with feedback controls
Scheme of reversible reaction with feedback controls. S3 inhibits the forward reaction and S1 activates the reverse reaction.
Here S3 feeds back to inhibit the forward reaction and S1 feeds back on the reverse reaction and accelerates it. The task is to develop a stochastic model whose performance converges to that of the deterministic GMA model. We can see from Equations (52) that three variables, x1, x2 and x3, contribute to the forward flux and two variables, x1 and x3, contribute to the backward flux. Because several variables are involved, their covariance has the potential of affecting the forward and the backward propensity functions in a stochastic simulation. To obtain the covariance information, we formulate the moment equations (53) from the ODE model (52).
Here $\mu = (\mu_1, \mu_2, \mu_3)^T$.
Moreover, for r = 1, 2 and m, n = 1, 2, 3, $\alpha'' = (\alpha_1'', \alpha_2'')^T$, $\alpha'' \odot \sigma \triangleq (\alpha_1'' \odot \sigma, \alpha_2'' \odot \sigma)^T$, and $\alpha' = (\alpha_1', \alpha_2')$.
Comparative simulation results for a reversible reaction with feedback controls. In all panels, the x-axis denotes time in seconds and the y-axis represents the number of molecules of species S1. The upper and lower panels use two different sets of initial numbers of molecules, namely: (x1(0), x2(0), x3(0), U) = (5, 5, 6, 1μm3) and (x1(0), x2(0), x3(0), U) = (100, 100, 120, 20μm3), respectively. Other simulation parameters are (f1, f2, f3, g1, g3, k f , k g ) = (1.3, 1.8, -1, 1, 1, 0.5, 0.5). In both the upper and lower panels, the first column compares the time evolution of S1 molecules by different methods: the black line shows the ODE solution of Equation (52) for x1 ; the blue lines are the solutions of Equation (53) for μ1 and for μ1 ± σ1, respectively. The red dotted lines framing the mean indicate the 95% confidence interval. The second column shows the propensity adjustment functions for the forward reaction (solid line) and the backward reaction (dashed line). The third column shows 100 independent stochastic simulations with propensity adjustment (blue means and error bars), in comparison with the ODE (Equation (52)) prediction (black line). The fourth column shows a second set of 100 independent stochastic simulations without propensity adjustment (blue means and error bars), in comparison with the ODE (Equation (52)) prediction (black line). The red dotted lines framing the mean in columns 3 and 4 again indicate the 95% confidence intervals.
Repressilator
Reaction scheme of the Repressilator. Gene G1 codes for protein x1, whose dimer y1 represses the transcription of gene G2. Similarly, y2, the dimer of gene G2's protein product x2, represses the transcription of gene G3, and y3, the dimer of gene G3's protein product x3, represses the transcription of gene G1.
[28]. Here Φ = 1, c p = κ+/κ-, c d = k+/k-, and d = d 0,i + d r,i for i = 1, 2, 3. It has been shown that the simplified ODEs rather accurately approximate the transient dynamics of the full system by retaining the original oscillation period and amplitude.
Scaling of the Repressilator equations changes the oscillation period in the stochastic simulation. Solid lines represent solutions of ODEs (56), while dotted lines are trajectories of a stochastic simulation; blue lines represent x1 and black lines represent m1.
We can see from Equations (55) that two variables, x i and m i , contribute to the production of x i ; hence, their covariance could affect the propensity function of x i in the production reaction of a stochastic simulation. Similar to the example of a reversible reaction (Equation 52), it is therefore necessary to evaluate covariance effects and to judge whether the propensity function needs adjusting. Thus, we need to compare the dynamics of the phenomenological model (55) with the dynamics under the influence of covariance, which can be produced by either stochastic simulation or the moment approach.
Power-law approximation of $p(x_i)^{-1}$. Left panel: Approximation of $y_i = p(x_i)^{-1}$ by a straight line in log-log space. Right panel: Corresponding power-law function in Cartesian space. Both axes are unitless.
which models the original function very well (see Figure 5). For x i ∈ [1, 30], this power-law function does not fit the original function precisely; the effect of this imprecision can be evaluated later, after we use this power-law function in the moment-based method. Moreover, using the truncated moment equations to estimate the mean and variance involves multiple approximations: First, the function $p(x_i)^{-1}$ on the right-hand side of (55) is replaced by a power-law function (see Figure 5). Second, the result is approximated by Taylor expansion to the second order. Third, similar to the example of a reversible reaction, the central moment of the third degree is assumed to be zero, which leads to a closed-form ODE for the first two moments.
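The first of these approximations, replacing a function by a power law that is a straight line in log-log space, can be sketched as follows. The example uses an ordinary least-squares fit over a range of values, which may differ from the operating-point linearization customary in BST. The function `fit_power_law` is hypothetical, and 1/(1 + x²) is only a stand-in, because the actual form of p(x i ) is not reproduced in this text.

```python
import numpy as np


def fit_power_law(x, y):
    """Least-squares fit of y ~ k * x**f via a straight line in log-log space.

    Returns the rate constant k and the kinetic order f.
    """
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return float(np.exp(intercept)), float(slope)


# Hypothetical stand-in for p(x)^(-1); NOT the function used in the article.
x = np.linspace(1.0, 30.0, 60)
y = 1.0 / (1.0 + x ** 2)
k, f = fit_power_law(x, y)

# Largest relative deviation of the fitted power law over the fitted range.
max_rel_err = np.max(np.abs(k * x ** f - y) / y)
print(k, f, max_rel_err)
```

Inspecting the maximum relative deviation over the range of interest gives a quick indication of where the power-law representation is least reliable, before the fit is passed on to the moment equations.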
Comparison of the dynamics of the Repressilator models using the original ODEs (55), the GMA approximation, and the moment approach based on the GMA approximation. The mean of the moment approach based on the GMA approximation fits the original ODEs (55) very well up to about t = 400 s. Black bold line: solution of the original ODEs (55); black dashed line: the GMA approximation; red line: mean of the moment approach based on the GMA approximation; red dashed lines framing the red line: mean ± standard deviation, produced with the moment approach. The x-axis is time in seconds; the y-axis is the number of x1 molecules (unitless).
Enzymatic reaction using a quasi-steady state assumption (QSSA)
which is known as Michaelis-Menten kinetics [30]. The characterizing parameters are Vmax = k2[E]0 and K m = (k-1 + k2)/k1.
where the volume was scaled so that Φ = 1 and the lower-case letter s denotes the molecule count of species S. Instead of reviewing the relatively complicated manipulations with CME, we show in the following that the techniques described above lead directly from the equation-based model to the propensity function for the reduced system.
Thus, we arrive at the propensity function for the reduced system, which is identical to the result of Rao and Arkin obtained through manipulations of CME.
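For concreteness, the sketch below evaluates a reduced propensity of the familiar Michaelis-Menten form, Vmax s/(K m + s), with Φ = 1 so that counts and concentrations coincide numerically. That the article's propensity takes exactly this form is an assumption based on the stated agreement with Rao and Arkin. The function names and rate constants are hypothetical.

```python
def michaelis_menten_parameters(k1, k_minus1, k2, e_total):
    """Return (Vmax, Km) with Vmax = k2 * [E]0 and Km = (k-1 + k2) / k1."""
    v_max = k2 * e_total
    k_m = (k_minus1 + k2) / k1
    return v_max, k_m


def reduced_propensity(s, v_max, k_m):
    """Propensity of the lumped reaction S -> P for substrate count s (Phi = 1)."""
    return v_max * s / (k_m + s)


# Illustrative elementary rate constants and total enzyme count.
v_max, k_m = michaelis_menten_parameters(k1=1.0, k_minus1=1.0, k2=1.0, e_total=10)
for s in (0, 2, 20, 200):
    print(s, reduced_propensity(s, v_max, k_m))
```

The propensity vanishes when no substrate is present and saturates at Vmax for large s, so the reduced single-reaction model stays well defined over the full range of substrate counts.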
In the above derivation, we used the simplest type of recasting, where a new, auxiliary variable simply consists of an old variable plus a constant. This reformulation of the Michaelis-Menten process as a pair of GMA equations is a special case of a much more general recasting technique that permits the equivalent conversion of any system of ordinary differential equations into a power-law format [31]. However, this equivalence transformation imposes constraints on the variables of the GMA equations, and it is at this point unclear whether there are mathematical guarantees that the proposed transition from differential to stochastic equations preserves these constraints in all cases. This question will require further investigation.
Stochastic Focusing
Stochastic focusing [26] describes the phenomenon that the fluctuations of a chemical species can drive the system to a steady state different from the one a deterministic ODE model predicts. To demonstrate the utility of propensity adjustment, we derive a stochastic model that produces results consistent with those of the deterministic model.
Here $\mu = (\mu_I, \mu_P, \mu_S)^T$.
Moreover, for r = 1, ..., 6 and m, n = i, p, s, $\alpha'' = (\alpha_1'', \ldots, \alpha_6'')^T$,