Constructing stochastic models from deterministic process equations by propensity adjustment

Wu, Jialiang; Vidakovic, Brani; Voit, Eberhard O

doi:10.1186/1752-0509-5-187

Methodology article
Open access
Published: 08 November 2011

Constructing stochastic models from deterministic process equations by propensity adjustment

Jialiang Wu¹,
Brani Vidakovic² &
Eberhard O Voit^2,3

BMC Systems Biology volume 5, Article number: 187 (2011) Cite this article

9807 Accesses
13 Citations
1 Altmetric
Metrics details

Abstract

Background

Gillespie's stochastic simulation algorithm (SSA) for chemical reactions admits three kinds of elementary processes, namely, mass action reactions of 0^th, 1^st or 2^nd order. All other types of reaction processes, for instance those containing non-integer kinetic orders or following other types of kinetic laws, are assumed to be convertible to one of the three elementary kinds, so that SSA can validly be applied. However, the conversion to elementary reactions is often difficult, if not impossible. Within deterministic contexts, a strategy of model reduction is often used. Such a reduction simplifies the actual system of reactions by merging or approximating intermediate steps and omitting reactants such as transient complexes. It would be valuable to adopt a similar reduction strategy to stochastic modelling. Indeed, efforts have been devoted to manipulating the chemical master equation (CME) in order to achieve a proper propensity function for a reduced stochastic system. However, manipulations of CME are almost always complicated, and successes have been limited to relative simple cases.

Results

We propose a rather general strategy for converting a deterministic process model into a corresponding stochastic model and characterize the mathematical connections between the two. The deterministic framework is assumed to be a generalized mass action system and the stochastic analogue is in the format of the chemical master equation. The analysis identifies situations: where a direct conversion is valid; where internal noise affecting the system needs to be taken into account; and where the propensity function must be mathematically adjusted. The conversion from deterministic to stochastic models is illustrated with several representative examples, including reversible reactions with feedback controls, Michaelis-Menten enzyme kinetics, a genetic regulatory motif, and stochastic focusing.

Conclusions

The construction of a stochastic model for a biochemical network requires the utilization of information associated with an equation-based model. The conversion strategy proposed here guides a model design process that ensures a valid transition between deterministic and stochastic models.

Background

Most stochastic models of biochemical reactions are based on the fundamental assumption that no more than one reaction can occur at the exact same time. A consequence of this assumption is that only elementary chemical reactions can be converted directly into stochastic analogues [1]. These include: 1) zero-order reactions, such as the generation of molecules at a constant rate; 2) first-order reactions, with examples including elemental chemical reactions as well as transport and decay processes; and 3) second-order reactions, which include heterogeneous and homogeneous bimolecular reactions (dimerization). Reactions with integer kinetic orders other than 0, 1 and 2 are to be treated as combinations of sequential elementary reactions. The advantage of the premise of non-simultaneous reaction steps is that the stochastic reaction rate can be calculated from a deterministic, equation-based model with some degree of rigor, even though the derivation is usually not based on first physical principles but instead depends on other assumptions and on macroscopic information, such as a fixed rate constant in the equation-based model. The severe disadvantage is that this rigorous treatment is not practical for modelling larger biochemical reaction systems. The reasons include the following. First, in many cases, elementary reaction rates are not available. Secondly, even in the case that all reaction parameters are available, the computational expense is very significant when the system involves many species and reactions, and this fact ultimately leads to a combinatorial explosion of required computations. Within a deterministic modelling framework, the common practice in this situation is to fit the transient and steady-state experimental data with a phenomenological, (differential) equation-based model, which explicitly or implicitly eliminates or merges some intermediate species and reactions. The best-known examples are probably Michaelis-Menten and Hill rate laws, which are ultimately explicit, but in truth approximate a multivariate system of underlying chemical processes.

Similar model reduction efforts have been carried out for stochastic modelling. For instance, the use of a complex-order function (which corresponds to a reduced equation-based model) was shown to be justified for some types of stochastic simulations. A prominent example is again the Michaelis-Menten rate law, which can be reduced from a system of elementary reactions to an explicit function by means of the quasi-steady-state assumption (see Result section and [2, 3]). However, model reduction within the stochastic framework has proven to be far more difficult than in the deterministic counterpart. The difficulties are mainly due to the fact that the reduction must be carried out on the chemical master equation (CME). This process is nontrivial and has succeeded only in simple cases.

In general, the construction of a stochastic model for a large biochemical network requires the use of information available from an equation-based model. In the past, several strategies have been proposed for this purpose and within the context of Gillespie's exact stochastic simulation algorithm (SSA; [1]) and its variants [4]. For example, Tian and Burrage [5] proposed that a stochastic model could be directly formulated from the deterministic model through a Poisson leaping procedure. However, a rigorous mathematical justification for such a conversion is lacking. Typical moment-based approaches [6–8] derive ODEs for the statistical moments of the stochastic model from an equation-based model where the 0^th, 1^st and 2^nd order reactions follow mass action rate laws. More recently the moment method was extended to cover models consisting of rational rate laws [9]. Moreover, it was realized that the moment method is complementary to, but cannot fully replace, stochastic simulations, because it does not cover situations like genetic switches [6, 10].

In this article, we explore the mathematical connection between deterministic and stochastic frameworks for the pertinent case of Generalized Mass Action (GMA) systems, which are frequently used in Biochemical Systems Theory (BST; [11–13]). Specifically, we address two questions: First, under what conditions can a deterministic, equation-based model be converted directly into a stochastic simulation model? And second, what is a proper way of implementing this conversion? We will develop a method to answer these questions and demonstrate it for functions in the canonical power-law format of GMA systems. However, the results are applicable to other functions and formats as well, as we will demonstrate with several examples.

Representations of systems of biochemical reactions

Consider a well-stirred biochemical reaction system with constant volume and temperature, where N_s different chemical species ${S_{s}}_{s = 1}^{N_{s}}$ , interact through N_r unidirectional reaction channels ${R_{r}}_{r = 1}^{N_{r}}$ . Each reaction channel can be characterized as

R_{r} : {\underline{v}}_{r 1} S_{1} + \dots + {\underline{v}}_{r N_{s}} S_{N_{s}} \to_{}^{k_{r}} {\bar{v}}_{r 1} S_{1} + \dots + {\bar{v}}_{r N_{s}} S_{N_{s}},

where ${\underline{v}}_{r s}$ and ${\bar{v}}_{r s}$ are the counts of molecular species S_s consumed and produced due to reaction R_r, respectively, and k_r is the rate constant. The changed amount of $S_{s} v_{r s} = {\bar{v}}_{r s} - {\underline{v}}_{r s}$ , which is due to the firing of reaction R_r, defines the stoichiometric coefficient of S_s in R_r. The stoichiometric coefficients of all species can be summarized according to each reaction R_r in the stoichiometric vector

v_{r} ≜ \{\begin{matrix} v_{r 1} \\ ⋮ \\ v_{r N_{s}} \end{matrix}\} \in ℤ^{N_{s}} .

The stoichiometric vectors of all reactions can further be arranged as the stoichiometric matrix of the system

V ≜ [v_{1}, \dots, v_{N_{r}}] \in ℤ^{N_{s} \times N_{r}} .

The size of the system is defined as Φ = AU, where A is the Avogadro number and U is the reaction volume.

The modelling of biochemical reaction networks typically uses one of two conceptual frameworks: deterministic or stochastic. In a deterministic framework, the state of the system is given by the a non-negative vector $[X (t)] = {[[X_{1} (t)], \dots, [X_{N_{s}} (t)]]}^{T} \in ℝ^{N_{s}}$ , where component [X_s(t)] represents the concentration of species S_s, measured in moles per unit volume. The temporal evolution of the state of the system is modelled by a set of ordinary differential equations, which in our case are assumed to follow a generalized mass action (GMA) kinetic law. By contrast, in a stochastic framework, the state of the systems is characterized by a vector $x (t) = {[x_{1} (t), \dots, x_{N_{s}} (t)]}^{T} \in ℤ^{N_{s}}$ , whose values are non-negative integers. Specifically, x_s(t) = Φ [X_s (t)] is the count of S_s molecules, which is a sample value of the random variable X_s(t). The system dynamics of this process is typically described with the chemical master equation (CME). Both GMA and CME will be discussed in detail in the following sections.

Motivation for the power-law formalism: reactions in crowded media

Power-law functions with non-integer kinetics have proven very useful in biochemical systems analysis, and forty years of research have demonstrated their wide applicability (e.g., see [11–13]). Generically, this type of description of a biochemical reaction can be seen either as a Taylor approximation in logarithmic space or as a heuristic or phenomenological model that has been applied successfully hundreds of times and in different contexts, even though it is difficult or impossible in many situations to trace it back to first mechanistic principles. A particularly interesting line of support for the power-law format can be seen in the example of a bimolecular reaction occurring in a spatially restricted environment. Savageau demonstrated that the kinetics of such a reaction can be validly formulated as a generalization of the law of mass action, where non-integer kinetic orders are allowed [14, 15]. Neff and colleagues [16–18] showed with careful experiments that this formulation is actually more accurate than alternative approaches.

Within the conceptual framework of power-law representations, the rate of the association reaction between molecules of species S₁ and S₂ is given as $k {[X_{1} (t)]}^{f_{1}} {[X_{2} (t)]}^{f_{2}}$ . Here, k is the rate constant and f₁ and f₂ are real-valued kinetic orders, which are no longer necessarily positive integers as it is assumed in a mass action law. As an example, consider the reversible bimolecular reaction $S_{1} + S_{2} ⇌_{k_{b}}^{k_{f}} S_{3}$ . Like Neff and colleagues [17], we begin by formulating a discrete update function for the population of S₃ molecules as

x_{3} (t + Δ t) - x_{3} (t) = f ([X_{1}], [X_{2}]) Δ t x_{1} x_{2} - g ([X_{3}]) Δ t x_{3} .

(1)

The first term on the right-hand side of this equation, f ([X₁], [X₂])Δt x₁x₂, describes the production of S₃: it depends on the totality of possible collisions x₁ x₂ and also on some fraction f ([X₁], [X₂])Δt that actually reacts and forms the product. In a dilute environment, f ([X₁], [X₂]) equals a traditional rate constant, and the reaction obeys the law of mass action, while in a spatially restricted environment, such as the cytoplasm, one needs to take crowding effects into account. As shown in Savageau [14, 15], the desired fraction of a reaction in a crowded environment becomes a rate function that depends on the current concentrations of S₁ and S₂. The second term, g ([X₃]) Δtx₃, describes the fraction g ([X₃]) Δt of species S₃ that dissociates back into S₁ and S₂. This fraction may depend on some functional form of [X₃] because in a crowded environment the complex may not be able to dissociate effectively. Thus, rate constants in the generalized mass action setting become rate functions (cf. [17]).

By taking the limit Δt → 0, one obtains the differential equation

\frac{d x_{3}}{d t} = f ([X_{1}], [X_{2}]) x_{1} x_{2} - g ([X_{3}]) x_{3} .

(2)

Savageau used Taylor series expansion to approximate the functions f and g in the logarithmic space (log [X₁], log[X₂]) around some operating point (a, b). The result for f is

\begin{gathered} log f ([X_{1}], [X_{2}]) ≜ F (log [X_{1}], log [X_{2}]) \\ = {F (a, b) + \frac{\partial f ([X_{1}], [X_{2}])}{\partial [X_{1}]}|}_{(a, b)} (log [X_{1}] - a) \\ + {\frac{\partial f ([X_{1}], [X_{2}])}{\partial [X_{2}]}|}_{(a, b)} (log [X_{2}] - b) + HOT \\ \approx k_{f} + α log [X_{1}] + β log [X_{2}], \end{gathered}

(3)

where k_f, α, and β are constants related to the chosen operating point (a, b). The final step is achieved by ignoring all higher order terms (HOT) beyond the constant and linear terms. Transformation back to the Cartesian space yields

f ([X_{1}], [X_{2}]) \approx k_{a} {[X_{1}]}^{α} {[X_{2}]}^{β}, k_{a} = e^{k_{f}} .

(4)

The same procedure leads to the power-law expression for the degradation term: g ([X₃]) ≈ k_d [X₃]^γ. By combining constants we arrive at a power-law representation for the dynamics of species S₃ as

\begin{aligned} \frac{d [X_{3}]}{d t} & = k_{a} {[X_{1}]}^{α} {[X_{2}]}^{β} [X_{1}] [X_{2}] - k_{d} {[X_{3}]}^{γ} [X_{3}] \\ = k_{a} {[X_{1}]}^{a} {[X_{2}]}^{b} - k_{d} {[X_{3}]}^{c}, \end{aligned}

(5)

where a = α + 1, b = β + 1, and c = γ + 1. As long as k_f, k_d, a, b and c remain more or less constant throughout a relevant range, the power-law model is mathematically well justified. In actual applications, the values of rate constants and kinetic orders can be estimated from experimental data [19]. When the functions f and g are originally not in power-law format, they can be locally approximated by power-law functions with a procedure similar to the one shown above (Equations (3) to (5)). An illustration will be given in the example section.

The Generalized Mass Action (GMA) format

In the GMA format within Biochemical Systems Theory, each process is represented as a univariate or multivariate power-law function. GMA models may be developed de novo or as an approximation of some other nonlinear rate laws. GMA models characterize the time evolution of the system state given that the system was in the state X (t₀) at some initial time t₀. Generically, the state of the system is changed within a sufficiently small time interval by one out of the N_r possible reactions that can occur in the system. The reaction velocity through reaction channel R_r is:

[\frac{{[X_{1} (t)]}^{'}}{v_{r 1}} = \dots = \frac{{[X_{N_{s}} (t)]}^{'}}{v_{r N_{s}}}] = k_{r} \prod_{s = 1}^{N_{s}} {[X_{s} (t)]}^{f_{r s}}

(6)

for those $v_{r s} = {\bar{v}}_{r s} - {\underline{v}}_{r s} \neq 0$ , s = 1, ..., N_s. As shown in the example of a bimolecular reaction, the kinetic order f_rs associated with species S_s captures the effects of both reactant properties (such as the stoichiometric coefficient v_rs) and environmental influences (such as temperature, pressure, molecular crowding effects, etc.). Therefore f_rs does not necessary equal an integer v_rs, which is assumed to be the case in mass action kinetics, but is possibly real-valued and may be negative. Summing up the contributions of all reactions, one obtains a GMA model describing the dynamics of S_s as

\frac{d}{d t} [X_{s} (t)] = \sum_{r = 1}^{N_{r}} v_{r s} k_{r} \prod_{s = 1}^{N_{s}} {[X_{s} (t)]}^{f_{r s}}

(7)

for every s = 1, ..., N_s. Each reaction contributes either a production flux or a degradation flux to the dynamics of a certain species. Positive terms (v_rs > 0) represent the production of S_s, while negative terms (v_rs < 0) describe degradation. If f_rs is positive, then S_s accelerates the reaction R_r; a negative value represents that S_s inhibits the reaction, and f_rs = 0 implies that S_s has no influence on the reaction. The rate constant k_r for reaction R_r, is either positive or zero. Both, the rate constant and the kinetic order, are to be estimated from data.

Proper use of equation-based functions for stochastic simulations

The fundamental concept of a stochastic simulation is the propensity function α(X), and α(X)dt describes the probability that a reaction will change the value of a system variable within the next (infinitesimal) time interval (t, t +dt). While a formal definition will be given later (Equation 18), it is easy to intuit that the propensity function is in some sense analogous to the rate in the corresponding deterministic model. In fact, the propensity function is traditionally assumed to be α(X) = f_s(X), if the deterministic model is X_s' = f_s(X, t), s = 1, ..., N_s. However, a proper justification for this common practice is by and large missing. Indeed, we will show that the direct use of a rate function as the propensity function in a stochastic simulation algorithm requires that at least one of the following assumptions be true:

1)
f is a linear function;
2)
the reaction is monomolecular;
3)
all X_i in the system are noise-free variables, i.e., without (or with ignorable) fluctuations, which implies that the covariance of any two participating reactants is zero (or close to zero).

Each of these assumptions constitutes a sufficient condition for the direct use of a rate function as the propensity function and applies, in principle, to GMA as well as other systems. The validity of these conditions will be discussed later. Specifically, the first condition will be addressed in the Results section under the headings "0^th-order reaction kinetics" and "1^st-order reaction kinetics, " while the second condition will be discussed under the heading "Real-valued order monomolecular reaction kinetics." The third condition will be the focus of Equations (29-36) and their associated explanations.

In reality, the rates of reactions in biochemical systems are commonly nonlinear functions of the reactant species, and fluctuations within each species are not necessarily ignorable. Therefore, to the valid use of an equation-based model in a stochastic simulation mandates that we know how to define a proper propensity function. The following section addresses this issue. It uses statistical techniques to characterize estimates for both the mean and variance of the propensity function, and these features will allow an assessment of the validity of the assumption α(X) = f_s(X) and prescribe adjustments if the assumption is not valid.

Methods

Deriving the mean and variance of a power-law function of random variables

Consider a generic power-law function of random variables X_s with the format $P L (X) = k \prod_{s = 1}^{N_{s}} X_{s}^{f_{s}}$ . Estimates of its mean μ_PL and variance σ_PL are given as

μ_{P L} \approx k \prod_{s = 1}^{N_{s}} μ_{s}^{f_{s}} exp (\sum_{i < j}^{N_{s}} f_{i} f_{j} cov [log X_{i}, log X_{j}])

(8)

{σ_{P L}}^{2} \approx {μ_{P L}}^{2} Ω

(9)

(for details, see Additional file 1). Here,

Ω = \sum_{s = 1}^{N_{s}} f_{s} μ_{s}^{- 2} σ_{s}^{2} + 2 \sum_{i < j}^{N_{s}} f_{i} f_{j} cov [1 og X_{i}, log X_{j}]

(10)

and μ_s = E[X_s] and $σ_{s}^{2} = E [{(X_{s} - μ_{s})}^{2}]$ are the mean and variance of random variable X_s, respectively. If we choose to express cov [logX_i, logX_j] as a function of μ_s, σ_s² and covariance σ_ij = cov [X_i, X_j], using a Taylor approximation, we obtain

μ_{P L} \approx k \prod_{s = 1}^{N_{s}} μ_{s}^{f_{s}} exp (- \frac{1}{2} \sum_{s = 1}^{N_{s}} f_{s} σ_{s}^{2} / μ_{s}^{2} + \frac{1}{2} Ω)

(11)

{σ_{P L}}^{2} \approx {μ_{P L}}^{2} Ω,

(12)

where

\begin{gathered} Ω \approx \sum_{s = 1}^{N_{s}} f_{s} {(σ_{s} / μ_{s})}^{2} + 2 \sum_{i < j}^{N_{s}} f_{i} f_{j} \{σ_{i j} / (μ_{i} μ_{j}) \\ + \frac{1}{2} log (μ_{i}) {(σ_{j} / μ_{j})}^{2} + \frac{1}{2} log (μ_{j}) {(σ_{i} / μ_{i})}^{2} \\ - \frac{1}{4} {(σ_{i} / μ_{i})}^{2} {(σ_{j} / μ_{j})}^{2}\} . \end{gathered}

(13)

Since many biochemical variables approximately follow a log-normal distribution [20–22], it is valuable to consider the special situation where (X₁, ..., X_s)is log-normally distributed (i.e., (logX₁, ..., logX_s) is normally distributed). In such a case, a simpler alternative way to calculate cov [logX_i, logX_j] is

cov [1 og X_{i}, log X_{j}] = log (1 + \frac{σ_{i j}}{μ_{i} μ_{j}}) .

(14)

[23]. By substituting this result into (8)-(10), one obtains

μ_{P L} \approx k \prod_{s = 1}^{N_{s}} μ_{s}^{f_{s}} \prod_{i < j}^{N_{s}} {(1 + \frac{σ_{i j}}{μ_{i} μ_{j}})}^{f_{i} f_{j}}

(15)

{σ_{P L}}^{2} \approx {μ_{P L}}^{2} Ω,

(16)

where

Ω = \sum_{s = 1}^{N_{s}} f_{s} {(\frac{σ_{s}}{μ_{s}})}^{2} + 2 \sum_{i < j}^{N_{s}} f_{i} f_{j} log (1 + \frac{σ_{i j}}{μ_{i} μ_{j}}) .

(17)

The approximation formulae for μ_PL and σ_PL² in eqns. (8)-(10) provide an easy numerical implementation if observation data are available to estimate cov [logX_i, logX_j]. Furthermore, Equations (11)-(13) demonstrate how μ_PL and σ_PL² are related to μ_s, σ_s² and σ_ij; however, the price of this insight is paid by the possible inaccuracy introduced through the Taylor approximation. Equations (15)-(17) also provide a functional dependence of μ_PL and σ_PL² on (μ_s, σ_s², σ_ij), but it is only valid if the additional assumption of log-normality is acceptable.

Deriving proper propensity functions for stochastic simulations from differential equation-based models

Assuming that the GMA model faithfully captures the average behaviour of a biochemical reaction system and recalling $[X (t)] = {([X_{1} (t)], \dots, [X_{N_{s}} (t)])}^{T}$ , the expected metabolite numbers are defined as the expectation

E [X] = [X] Φ,

(18)

where Φ is the system size as defined above.

To describe the reaction channel R_r stochastically, one needs the state update vector v_rand must characterize the quantity of molecules flowing through of reaction channel R_r during a small time interval. The key concept of this type of description is the propensity function α_r(x), which is defined as

\begin{gathered} α_{r} (x) d t = the probability that exactly one reaction \\ R_{r} will occur some where inside U within infinitesmal \\ interval (t, t + d t), given current state X (t) = x . \end{gathered}

(19)

[1]. Because of the probabilistic nature of the propensity function, X(t) is no longer deterministic, and the result is instead stochastic and based on the transition probability

P (x, t | x_{0}, t_{0}) = Prob {X (t) = x, given X (t_{0}) = x_{0}},

(20)

which follows the chemical master equation (CME)

\begin{aligned} \frac{\partial P (x, t | x_{0}, t_{0})}{\partial t} & = \sum_{r = 1}^{N_{r}} [α_{r} (x + v_{r}) P (x + v_{r}, t | x_{0}, t_{0}) \\ - α_{r} (x) P (x, t | x_{0}, t_{0})] \end{aligned}

(21)

Updating CME requires knowledge of every possible combination of all species counts within the population, which immediately implies that it can be solved analytically for only a few very simple systems and that numerical solutions are usually prohibitively expensive [24]. To address the inherent intractability of CME, Gillespie developed an algorithm, called the Stochastic Simulation Algorithm (SSA), to simulate CME models [1]. SSA is an exact procedure for numerically simulating the time evolution of a well-stirred reaction system. It is rigorously based on the same microphysical premise that underlies CME and gives a more realistic representation of a system's evolution than a deterministic reaction rate equation represented by ODEs. SSA requires knowledge of the propensity function, which however is truly available only for elementary reactions. These reactions include: 1) 0^th order reactions, exemplified with the generation of a molecule at a constant rate; 2) 1^st order monomolecular reactions, such as an elemental chemical conversion or decay of a single molecule; 3) 2^nd order bimolecular reactions, including reactive collisions between two molecules of the same or different species. The reactive collision of more than two molecules at exactly the same time is considered highly unlikely and modelled as two or more sequential bimolecular reactions.

For elementary reactions, the propensity function of reaction R_r is computed as the product of a stochastic rate constant c_r and the number h_r of distinct combinations of reactant molecules, i.e.

α_{r} (x) = c_{r} h_{r} (x), r = 1, \dots, N_{r} .

(22)

Here $h_{r} (x) = \{\begin{matrix} \prod_{s = 1}^{N_{s}} (\begin{matrix} x_{s} \\ {\underline{v}}_{r s} \end{matrix}) \approx \frac{\prod_{s = 1}^{N_{s}} x_{s}^{{\underline{v}}_{r s}}}{\prod_{s = 1}^{N_{s}} {\underline{v}}_{r s}!}, for x_{s} \geq {\underline{v}}_{r s} > 0 \\ 0, otherwise \end{matrix}$ , where x_s is the sample value of random variable X_s. The approximation is invoked when x_s is large and (x_s - 1), ..., (x_s - v_rs + 1) are approximately equal to x_s.

In Gillespie's original formulation [1] c_r is a constant that only depends on the physical properties of the reactant molecules and the temperature of the system, and c_rdt is the probability that a particular combination of reactant molecules will react within the next infinitesimally small time interval (t, t + dt). The constant c_r can be calculated from the corresponding deterministic rate constants, if they are known.

Since the assumption of mass action kinetics is not valid generally, especially in spatially restricted environments and in situations dominated by macromolecular crowding, we address the broader scenario where c_ris not a constant but a function of the reactant concentrations. Thus, we denote c_ras a stochastic rate function, while retaining the definition of h_r as above. Knowing that any positive-valued differentiable function can be approximated locally by a power-law function, we assume the functional form of the stochastic rate function as

c_{r} (x) = κ_{r} \prod_{s = 1}^{N_{s}} x_{s} {(t)}^{ε_{r s}} .

(23)

Here, κ_r and ε_rs are constants that will be specified in the next section, and r = 1, ..., N_r. Note that ε_rs are now real-valued. Once the stochastic rate function is determined (see below), the propensity function can be calculated as

α_{r} (x) = c_{r} (x) h_{r} (x) = \frac{κ_{r}}{\prod_{s = 1}^{N_{s}} {\underline{v}}_{r s}!} \prod_{s = 1}^{N_{s}} x_{s}^{{\underline{v}}_{r s} + ε_{r s}} .

(24)

In order to identify the functional expression for a stochastic rate function, and thus the propensity function, we consider the connection between the stochastic and the deterministic equation models. By multiplying CME with x and summing over all x, we obtain

\frac{d}{d t} E [X (t)] = \sum_{r = 1}^{N_{r}} v_{r} E [α_{r} (X (t))] .

(25)

Similarly, the expectation for any species X_s(t) is given as

\frac{d}{d t} E [X_{s} (t)] = \sum_{r = 1}^{N_{r}} v_{r s} E [α_{r} (X (t))], s = 1, \dots, N_{s} .

(26)

The details of these derivations are shown in Additional file 1.

We can use these results directly to compute the propensity function for a stochastic GMA model, assuming that its deterministic counterpart is well defined. Specifically, we start with the deterministic GMA equation for X_s,

\frac{d}{d t} [X_{s} (t)] = \sum_{r = 1}^{N_{r}} v_{r s} k_{r} \prod_{s^{'} = 1}^{N_{s}} {[X_{s^{'}} (t)]}^{f_{r s^{'}}}, s = 1, \dots, N_{s},

(27)

where v_rs, k_r and f_rs'are again the stoichiometric coefficients, rate constants, and kinetic orders, respectively. By substituting $[X_{s}] = \frac{E [X_{s}]}{Φ}$ from Equation (18) into this GMA model, we obtain a "particle-based" equation of the format

\frac{d}{d t} (\frac{E [X_{s}]}{Φ}) = \sum_{r = 1}^{N_{r}} v_{r s} k_{r} \prod_{s^{'} = 1}^{N_{s}} {(\frac{E [X_{s^{'}}]}{Φ})}^{f_{r s^{'}}}, s = 1, \dots, N_{s} .

Elementary operations allow us to rewrite this equation as

\frac{d}{d t} (E [X_{s}]) = \sum_{r = 1}^{N_{r}} v_{r s} k_{r} Φ^{1 - F_{r}} \prod_{s^{'} = 1}^{N_{s}} E {[X_{s^{'}}]}^{f_{r s^{'}}}, s = 1, \dots, N_{s},

(28)

where $F_{r} = \sum_{s^{'} = 1}^{N_{s}} f_{r s^{'}}$ . In this formulation, the differential operator is justified only when large numbers of molecules are involved. The assumption that the deterministic equations precisely capture the average behaviour of the biochemical reaction system directly equates the stochastic CME (25) to the deterministic equation based model (28)

E [α_{r} (X (t))] = k_{r} Φ^{1 - F_{r}} \prod_{s^{'} = 1}^{N_{s}} E {[X_{s^{'}}]}^{f_{r s^{'}}} .

(29)

Now we have two choices for approximating the expectation of the propensity function on left-hand side of equation (29):

1)
adopt a zero-covariance assumption as was done in [25], which implies ignoring random fluctuations within every species as well as their correlations. This assumption is only justified for some special cases such as monomolecular and bimolecular reactions under the thermodynamic limit (cf. [4, 6]), but is not necessary valid in generality. Here the thermodynamic limit is defined as a finite concentration limit which the system reaches when both population and volume approach infinity. Under this assumption, the left hand side of (29) becomes
$\begin{aligned} E [α_{r} (x)] & = E [\frac{κ_{r}}{\prod_{s = 1}^{N_{s}} {\underline{v}}_{r s}!} \prod_{s = 1}^{N_{s}} x_{s}^{{\underline{v}}_{r s} + ε_{r s}}] \\ = \frac{κ_{r}}{\prod_{s = 1}^{N_{s}} {\underline{v}}_{r s}!} \prod_{s = 1}^{N_{s}} E {[X_{s}]}^{{\underline{v}}_{r s} + ε_{r s}} \end{aligned}$
(30)

for every r = 1, ..., N_r, and Equation (24) yields

\begin{aligned} ε_{r s} & = f_{r s} - {\underline{v}}_{r s} \\ κ_{r} & = k_{r} Φ^{1 - F_{r}} \prod_{s = 1}^{N_{s}} {\underline{v}}_{r s}! \\ c_{r} (x) & = k_{r} Φ^{1 - F_{r}} \prod_{s = 1}^{N_{s}} {\underline{v}}_{r s}! x_{s}^{ε_{r s}} \end{aligned}

(31)

and

α_{r_{-} 0} (x) = k_{r} Φ^{1 - F_{r}} \prod_{s = 1}^{N_{s}} x_{s}^{f_{r s}} .

(32)

Here, the index r_0 is used to distinguish this 0-covariance propensity function from a second type of propensity in the next section.

With the zero-covariance assumption, one can substitute (32) back into the equation for the expectation for each species, which yields

\frac{d}{d t} E [X_{s} (t)] = \sum_{r = 1}^{N_{r}} v_{r s} k_{r} Φ^{1 - F_{s}} \prod_{s = 1}^{N_{s}} μ_{s}^{f_{r s}}

(33)

for every s = 1, ..., N_s.. Note that this result is exactly equivalent to the equation-based model (27).

Equation (33) is based on assumption that both the fluctuations within species and their correlations are ignorable, which is not necessarily true in reality. If one uses it in simulations where the assumptions are not satisfied, it is possible that the means for the molecular species are significantly different from the corresponding equation-based model values. This discrepancy arises because the evolution of each species in the stochastic simulation is in truth affected by the covariance which is not necessarily zero, as it was assumed. This phenomenon was observed by Paulsson and collaborators [26] and further discussed in different moment-based approaches [6, 7]. To assess the applicability limit of the propensity defined by (32), we can apply approximation techniques as shown in eqns. (8)-(10) on the functional expression of α_{r_0}and obtain mean and variance as

\begin{aligned} μ_{α_{r_{-} 0}} & = E [α_{r_{-} 0} (X (t))] \\ \approx k_{r} Φ^{1 - F_{r}} \prod_{s^{'} = 1}^{N_{s}} E {[X_{s^{'}}]}^{f_{r s^{'}}} exp (\sum_{i < j}^{N_{s}} f_{r i} f_{r j} cov [1 og X_{i}, log X_{j}]) \end{aligned}

(34)

{σ_{α_{r_{-} 0}}}^{2} \approx {μ_{α_{r_{-} 0}}}^{2} Ω_{r},

(35)

where

Ω_{r} = \sum_{s = 1}^{N_{s}} f_{r s} μ_{s}^{- 2} σ_{s}^{2} + 2 \sum_{i < j}^{N_{s}} f_{r i} f_{r j} cov [1 og X_{i}, log X_{j}],

(36)

for every s = 1, ..., N_s. These expressions demonstrate that even with large numbers of molecules the mean of CME does not always converge to the GMA model. Indeed, the convergence is only guaranteed in one of the following special situations: 1) the reaction is of 0^th order; 2) the reaction is a real value-order monomolecular reaction, with 1^st order reaction as a special case; 3) the covariance contribution in (34) is sufficiently small to be ignored for all participating reactant species of a particular reaction channel. Except for these three special situations, the covariance as shown in (34) significantly affects the mean dynamics. Therefore, stochastic simulations using zero-covariance propensity functions will in general yield means different from what the deterministic GMA model produces. How large these differences are cannot be said in generality. Under the assumption that the GMA model correctly captures the mean dynamics of every species, this conclusion means that α_{r_0}is not necessarily an accurate propensity function for stochastic simulations, and the direct conversion of the equation-based model into a propensity function must be considered with caution.

Moreover, there is no theoretical basis to assume that there are no fluctuations in the molecular species or that these are independent. Therefore, we need to consider the second treatment of the expectation of the propensity function and study the possible effects of a non-zero covariance.

2)
We again assume that the GMA model is well defined, which implies that information regarding the species correlations and fluctuations has been captured in the parameters of the GMA model on the left hand size of Equations (7) and (28). To gain information regarding correlations, we use Taylor expansion to approximate the propensity function (see Additional file 1 for details):
$\begin{matrix} E [α_{r} (X (t))] = E [\frac{κ_{r}}{\prod_{s = 1}^{N_{s}} {\underline{v}}_{r s}!} \prod_{s = 1}^{N_{s}} X_{s}^{{\underline{v}}_{r s}}^{+ ε_{r s}}] \\ \approx \frac{κ_{r}}{\prod_{s = 1}^{N_{s}} {\underline{v}}_{r s}!} \prod_{s = 1}^{N_{s}} E {[X_{s}]}^{{\underline{v}}_{r s} + ε_{r s}} \\ \times \exp (\sum_{i < j}^{N_{s}} ({\underline{v}}_{r i} + ε_{r i}) ({\underline{v}}_{r j} + ε_{r j}) cov [1 og X_{i}, \log X_{j}]) \end{matrix}$
(37)

After substitution of (37) in (29), one obtains

\begin{aligned} κ_{r} & = k_{r} Φ^{1 - F_{r}} \prod_{s = 1}^{N_{s}} {\underline{v}}_{r s}! exp (- \sum_{i < j}^{N_{s}} f_{r i} f_{r j} cov [1 og X_{i}, log X_{j}]) \\ ε_{r s} & = f_{r s} - {\underline{v}}_{r s} . \end{aligned}

Given the state x of the system at time t, the stochastic rate function of reaction R_r is

\begin{aligned} c_{r} (x) & = κ_{r} \prod_{s = 1}^{N_{s}} x_{s}^{ε_{r s}} \\ = k_{r} Φ^{1 - F_{r}} \prod_{s = 1}^{N_{s}} {\underline{v}}_{r s}! \\ \times exp (- \sum_{i < j}^{N_{s}} f_{r i} f_{r j} cov [1 og X_{i} (t), log X_{j} (t)]) \prod_{s = 1}^{N_{s}} x_{s}^{f_{r s} - {\underline{v}}_{r s}} . \end{aligned}

(38)

Here it is important to understand that although the random variables {X_s}_s∈Sappear in the expression c_r(x), c_r(x) is not a function of random variables but a deterministic function. The reason is that the cov [logX_i(t), logX_j(t)] in the composition of c_r(x), which as the numerical characteristic of the random variables {X_s}_s∈S, is deterministic. Therefore, the stochastic rate function c_r(x) is a well-justified deterministic function that is affected by both the state of the system $[x_{1}, \dots, x_{N_{s}}]$ and cov [logX_i(t), logX_j(t)], the numerical characteristic of fluctuations in the random variables {X_s}_s∈S.

Given the expression c_r(x), the propensity function is

\begin{aligned} α_{r} (x) & = c_{r} (x) h_{r} (x) \\ = k_{r} Φ^{1 - F_{r}} \prod_{s = 1}^{N_{s}} x_{s}^{f_{r s}} \\ \times exp (- \sum_{i < j}^{N_{s}} f_{r i} f_{r j} cov [1 og X_{i} (t), log X_{j} (t)]) . \end{aligned}

(39)

These results are based on the assumption that there are large numbers of molecules for all reactant species participating in reaction R_r. For simplicity of discussion, we define the propensity adjustment factor (paf) of reaction R_r as

p a f (t) ≜ exp (- \sum_{i < j}^{N_{s}} f_{r i} f_{r j} cov [1 og X_{i} (t), log X_{j} (t)]) .

(40)

paf is a function of time t and represents the contribution of the reactants to correlations among species in the calculation of the propensity function for reaction R_r. We denote the propensity function in (39), which accounts for the contribution of the covariance, as α_{r_cov}, in order to distinguish it from the propensity function α_{r_0}(32), which is based on the assumption of zero-covariance, i.e.,

α_{r_{-} cov} (x) = p a f (t) k_{r} Φ^{1 - F_{r}} \prod_{s = 1}^{N_{s}} x_{s}^{f_{r s}} .

(41)

Remembering that cov [logX_i(t), logX_j(t)], which is a component in both the stochastic rate function c_r(x) and now in the function paf(t), is a deterministic function rather than a function of random variables, paf(t) is a deterministic correction to the kinetic constant k_r in the construction of α_{r_cov}in (41), which corrects the stochastic simulation toward the correct average.

In contrast to the propensity function α_{r_0}, α_{r_cov}leads to accurate stochastic simulations. To illustrate this difference, we analyze $\frac{d}{d t} E [X_{s} (t)]$ as follows: We apply the approximation techniques in eqns. (9)-(11) in order to obtain the mean and variance of the propensity function α_{r_cov}:

μ_{α_{r_{-} cov}} = E [α_{r_{-} cov} (X (t))] \approx k_{r} Φ^{1 - F_{r}} \prod_{s^{'} = 1}^{N_{s}} E {[X_{s^{'}}]}^{f_{r s^{'}}}

(42)

{σ_{r_{-} cov}}^{2} \approx {μ_{α_{r_{-} cov}}}^{2} Ω_{r} .

(43)

Here

Ω_{r} = \sum_{s = 1}^{N_{s}} f_{r s} μ_{s}^{- 2} σ_{s}^{2} + 2 \sum_{i < j}^{N_{s}} f_{r i} f_{r j} cov [1 og X_{i}, log X_{j}] .

(44)

By substituting (42) back into the derivation of CME (26), one obtains

\begin{gathered} \frac{d}{d t} E [X_{s} (t)] \\ = \sum_{r = 1}^{N_{r}} v_{r s} E [α_{r_{-} cov} (X (t))] \\ \approx \sum_{r = 1}^{N_{r}} v_{r s} k_{r} Φ^{1 - F_{r}} \prod_{s^{'} = 1}^{N_{s}} μ_{s^{'}}^{f_{r s^{'}}} \end{gathered}

(45)

for every s = 1, ..., N_s, which is equivalent in approximation to the GMA model (28). In the other words, the mean of every molecular species obtained by using α_{r_cov}in the CME derived equation (27) is approximately identical to the corresponding macroscopic variable in the GMA model.

Calculation of cov [logX_i(t), logX_j(t)]

When data in the form of multiple time series for all the reactants are available, it is possible to compute cov [logX_i(t), logX_j(t)] directly from these data. Once this covariance is known, the function paf, α_{r_cov}and the mean dynamics can all be assessed. Alas, the availability of several time series data for all reactants under comparable conditions is rare, so that cov [logX_i(t), logX_j(t)] must be estimated in a different manner.

If one can validly assume that the covariance based on α_{r_0}does not differ significantly from the covariance based on α_{r_cov}, one may calculate cov [logX_i(t), logX_j(t)] by one of following methods.

Method 1:

One uses α_{r_0}to generate multiple sets of time series data of all reactants and then computes cov [logX_i(t), logX_j(t)].

Method 2:

First, cov [logX_i(t), logX_j(t)] is expressed as a function of mean and covariance in one of the following ways; either as

\begin{gathered} cov [1 og X_{i}, log X_{j}] \approx σ_{i j} / (μ_{i} μ_{j}) + \frac{1}{2} log (μ_{i}) {(σ_{j} / μ_{j})}^{2} \\ + \frac{1}{2} log (μ_{j}) {(σ_{i} / μ_{i})}^{2} - \frac{1}{4} {(σ_{i} / μ_{i})}^{2} {(σ_{j} / μ_{j})}^{2} \end{gathered}

(46)

or as Equation (14):

cov [1 og X_{i}, log X_{j}] = log (1 + \frac{σ_{i j}}{μ_{i} μ_{j}}) .

The first functional expression of cov [logX_i(t), logX_j(t)] is achieved by Taylor approximation, whereas the second expression is obtained by the additional assumption that the concentrations (X₁, ..., X_s) are log-normally distributed [8, 23]. The consideration of a log-normal distribution is often justified by the fact that many biochemical data have indeed been observed to be log-normally distributed (e.g., [20–22]).

Second, one uses α_{r_0}to approximate the mean and covariance either by direct simulation, as shown in method 1, or by a moment-based approach, which is explained in Additional file 2, and which yields the differential equations

\begin{aligned} \frac{\partial μ_{s}}{\partial t} & \approx \sum_{r = 1}^{N_{r}} v_{r, s} \{α_{r_{-} 0} (μ) + \frac{1}{2} \sum_{m, n = 1}^{N_{s}} \frac{\partial^{2} α_{r_0} (μ)}{\partial X_{m} \partial X_{n}} σ_{m n}\} \\ \frac{\partial σ_{i j}}{\partial t} & \approx \sum_{r = 1}^{N_{r}} \{v_{r, i} \sum_{s = 1}^{N_{s}} \frac{\partial α_{r_0} (μ)}{\partial X_{s}} σ_{j s} + v_{r, j} \sum_{s = 1}^{N_{s}} \frac{\partial α_{r_0} (μ)}{\partial X_{s}} σ_{i s} \\ + v_{r, i} v_{r, j} [α_{r_{-} 0} (μ) + \frac{1}{2} \sum_{m, n = 1}^{N_{s}} \frac{\partial^{2} α_{r_0} (μ)}{\partial X_{m} \partial X_{n}} σ_{m n}]\} \end{aligned}

For convenience of computational implementation, the above equations can be written in matrix format

\begin{aligned} \frac{\partial μ}{\partial t} & \approx V^{T} (α + \frac{1}{2} α^{″} ⊙ σ) \\ \frac{\partial σ}{\partial t} & \approx {(σ (α^{'} V))}^{T} + σ (α^{'} V) + V^{T} Λ V . \end{aligned}

Here for r = 1, ..., N_r, and s, m, n = 1, ..., N_s, $μ = {(μ_{1}, \dots, μ_{N_{s}})}^{T}$ , (V)_rs= v_rs, $α = {(α_{1}, . . ., α_{N_{r}})}^{T}$ , $α^{″} = {(α_{r}^{^{″}}, . . ., α_{N_{r}}^{^{″}})}^{T}$ , ${(α_{r}^{^{″}})}_{m n} = \frac{\partial^{2} α_{r} (X)}{\partial X_{m} \partial X_{n}}$ , $α_{r}^{^{″}} ⊙ σ = \sum_{m, n = 1}^{N_{s}} \frac{\partial^{2} α_{r} (X)}{\partial X_{m} \partial X_{n}} |_{X = μ} σ_{m n}$ , $α^{^{″}} ⊙ σ ≜ {(α_{1}^{^{″}} ⊙ σ, \dots, α_{N_{r}}^{^{″}} ⊙ σ)}^{T}$ , $α^{'} = ({α_{1}}^{'}, \dots, {α_{N_{r}}}^{'})$ , ${α_{r}}^{'} = {(\frac{\partial α_{r} (μ)}{\partial X_{1}}, \dots, \frac{\partial α_{r} (μ)}{\partial X_{N_{s}}})}^{T}$ , and Λ is a diagonal matrix with ${(Λ)}_{r r} = α_{r} (μ) + \frac{1}{2} \sum_{m, n = 1}^{N_{s}} \frac{\partial^{2} α_{r} (μ)}{\partial X_{m} \partial X_{n}} σ_{m n}$ .

Statistical criteria for propensity adjustment

Suppose an equation-based model captures the average behavior of a stochastic system and one intends to find the propensity function for a stochastic simulation that will reproduce that means. One can use the 95% confidence interval to evaluate the need for a propensity adjustment. Specifically, for stable systems that will reach a steady state, we use the reversible reaction model as an example. If the steady state of the ODE x_st is within the 95% confidence interval of n runs of stochastic simulations, i.e. $x_{s t} \in [μ_{s t} - 1.96 \frac{δ_{s t}}{\sqrt{n}}, μ_{s t} + 1.96 \frac{δ_{s t}}{\sqrt{n}}]$ , then the rate function in the original ODEs can be used as the propensity without adjustment; otherwise propensity adjustment is needed. Here μ_st and δ_st can be attained from either a moment-base method or from n independent runs of stochastic simulations using propensity without adjustment. An example discussing a reversible reaction with feedback controls can be found in the results section.

For other systems that do not reach a steady state, but where instead transient characteristics are of the highest interest, one can judge the need of propensity adjustment by whether the pertinent characteristics of the ODEs are within the 95% confidence interval of the corresponding characteristic, which is given by a prediction from the moment-based method or from n runs of stochastic simulations. The Repressilator example in the result section will serve as a demonstration.

Results

Generic special cases

It is generally not valid to translate a rate from a deterministic biochemical model into a propensity function of the corresponding stochastic simulation without adjustment (see Equations. (34)-(36)). However, in some situations, the propensity adjustment (e.g., Equations (40)-(44)) is not needed, and in some other cases it becomes relatively simple.

1)
0^th-order reaction kinetics

Consider a very simple equation-based model of the type

\frac{d [X_{s} (t)]}{d t} = k_{r} or \frac{d E [X_{s} (t)]}{d t} = k_{r} Φ,

(47)

for all s = 1, ..., N_s, f_rs = 0. According to Equations (40)-(44), one obtains

\begin{gathered} Ω_{r} = 0 \\ {σ_{α_{r}}}^{2} \approx 0 \\ μ_{α_{r}} \approx exp (log (k_{r} Φ)) = k_{r} Φ \\ i .e . E [α_{r} (X)] \approx α_{r} (E [X]) \\ α_{r_cov} \approx k_{r} Φ = α_{r_0} . \end{gathered}

Thus, for a 0^th-order reaction, its rate equation can be taken directly as the propensity function in stochastic simulations.

2)
1^st-order reaction kinetics

Direct application of Equations (40)-(44) yields

\begin{aligned} \frac{d [X_{i} (t)]}{d t} & = k_{r} [X_{j} (t)] \\ or \frac{d E [X_{i} (t)]}{d t} & = k_{r} E [X_{j} (t)], \end{aligned}

(48)

f_rs = δ_sj, i, j = 1, ..., N_s. Therefore, according to Equations (40)-(44)

\begin{gathered} Ω_{r} = {(σ_{j} / μ_{j})}^{2} \\ {(σ_{α_{r}} / μ_{α_{r}})}^{2} = {(σ_{j} / μ_{j})}^{2} \\ μ_{α_{r}} \approx exp (log (k_{r} μ_{j})) = k_{r} μ_{j} \\ i .e . E [α_{r} (X)] \approx α_{r} (E [X]) \\ α_{r_{-} cov} (X) \approx k_{r} X_{j} = α_{r_{-} 0} (X) . \end{gathered}

Thus, for 1^st-order reactions, the rate equation can again be taken directly as the propensity function in stochastic simulations.

3)
Real-valued order monomolecular reaction kinetics

Consider a reaction with kinetics of the type

\begin{aligned} \frac{d [X_{i} (t)]}{d t} & = k_{r} {[X_{j} (t)]}^{f_{r j}} \\ or \frac{d E [X_{i} (t)]}{d t} & = k_{r} Φ^{1 - f_{r j}} E {[X_{j} (t)]}^{f_{r j}}, \end{aligned}

(49)

f_rj ≠ 0, f_rs = 0, for any s ≠ j, s = 1, ..., N_s. Equations (40)-(44) lead to

\begin{gathered} Ω_{r} = {(σ_{j} / μ_{j})}^{2} \\ {(σ_{α_{r}} / μ_{α_{r}})}^{2} = {(σ_{j} / μ_{j})}^{2} \\ μ_{α_{r}} \approx k_{r} Φ^{1 - f_{r j}} μ_{j}^{f_{r j}} i .e . E [α_{r} (X)] \approx α_{r} (E [X]) \\ α_{r_{-} cov} (X) \approx k_{r} Φ^{1 - f_{r j}} X_{j}^{f_{r j}} = α_{r_{-} 0} (X) . \end{gathered}

Thus, for reaction kinetics involving a single variable and a real-valued order, the rate equation can again be taken as the propensity function in stochastic simulations.

4)
2^nd-order reaction kinetics

This type of reaction can be expressed as

\begin{aligned} \frac{d}{d t} [X_{s} (t)] & = k_{r} [X_{i} (t)] [X_{j} (t)] \\ or \frac{d E [X_{s} (t)]}{d t} & = k_{r} Φ^{- 1} E [X_{i} (t)] E [X_{j} (t)], \end{aligned}

(50)

i, j ∈ {1, ..., N_s}, i ≠ j, f_ri = f_rj = 1, and f_rs = 0, for all s ≠ i, j. Therefore, according to Equations (40)-(44)

\begin{aligned} Ω_{r} & = {(σ_{i} / μ_{i})}^{2} + {(σ_{j} / μ_{j})}^{2} + 2 cov [1 og X_{i}, log X_{j}] \\ = {(σ_{i} / μ_{i})}^{2} + {(σ_{s} / μ_{s})}^{2} + 2 \{cov [X_{i} / μ_{i}, X_{j} / μ_{j}] \\ + \frac{1}{2} log (μ_{i}) {(σ_{j} / μ_{j})}^{2} + \frac{1}{2} log (μ_{j}) {(σ_{i} / μ_{i})}^{2} - \frac{1}{4} {(σ_{i} / μ_{i})}^{2} {(σ_{j} / μ_{j})}^{2}\} \end{aligned}

\begin{aligned} {(σ_{α_{r}} / μ_{α_{r}})}^{2} & = Ω_{r} \\ \partial & = {(σ_{i} / μ_{i})}^{2} + {(σ_{j} / μ_{j})}^{2} + 2 cov [1 og X_{i}, log X_{j}] \\ μ_{α_{r}} & \approx k_{r} {(N_{A} V)}^{- 1} μ_{i} μ_{j} \\ α_{r_{-} cov} (X) & = k_{r} Φ^{- 1} X_{i} X_{j} exp (- cov [1 og X_{i}, log X_{j}]) \neq α_{r_{-} 0} (X) . \end{aligned}

Thus, the proper propensity function for 2^nd-order reactions is different from the rate equation. The difference can be ignored only if the contribution from the covariance is insignificant. In general, the rate equation yields only an approximate propensity function for stochastic simulations, and the approximation quality must be assessed on a case-by-case basis.

5)
Bimolecular reaction with real-valued order kinetics

This type of reaction can be formulated as

\begin{aligned} \frac{d [X_{s} (t)]}{d t} & = k_{r} {[X_{i} (t)]}^{f_{i}} {[X_{j} (t)]}^{f_{j}} \\ or \frac{d E [X_{s} (t)]}{d t} & = k_{r} Φ^{1 - f_{i} - f_{j}} E {[X_{i} (t)]}^{f_{i}} E {[X_{j} (t)]}^{f_{j}}, \end{aligned}

(51)

i, j ∈ {1, ..., N_s}, i ≠ j, f_ri, f_rj ≠ 0, and f_rs = 0, for all s ≠ i, j. According to Equations (40)-(44) we obtain

\begin{aligned} Ω_{r} & = {(σ_{i} / μ_{i})}^{2} + {(σ_{j} / μ_{j})}^{2} + 2 f_{i} f_{j} cov [1 og X_{i}, log X_{j}] \\ = {(σ_{i} / μ_{i})}^{2} + {(σ_{j} / μ_{j})}^{2} + 2 f_{i} f_{j} \{cov [X_{i} / μ_{i}, X_{j} / μ_{j}] \\ + \frac{1}{2} log (μ_{i}) {(σ_{j} / μ_{j})}^{2} + \frac{1}{2} log (μ_{j}) {(σ_{i} / μ_{i})}^{2} - \frac{1}{4} {(σ_{i} / μ_{i})}^{2} {(σ_{j} / μ_{j})}^{2}\} \end{aligned}

\begin{aligned} {(σ_{α_{r}} / μ_{α_{r}})}^{2} & = Ω_{r} \\ = {(σ_{i} / μ_{i})}^{2} + {(σ_{j} / μ_{j})}^{2} + 2 f_{i} f_{j} cov [1 og X_{i}, log X_{j}] \\ μ_{α_{r}} & \approx k_{r} Φ^{f_{i} + f_{j} - 1} μ_{i}^{f_{i}} μ_{j}^{f_{j}} \\ α_{r_{-} cov} (X) & = k_{r} Φ^{f_{i} + f_{j} - 1} X_{i}^{f_{i}} X_{j}^{f_{j}} exp (- f_{i} f_{j} cov [1 og X_{i}, log X_{j}]) \\ \neq α_{r_{-} 0} (X) . \end{aligned}

For bimolecular reactions of complex order, the propensity function is different from the rate equation. The difference can be ignored only if the contribution from the covariance is insignificant.

Power-law representation of a reversible reaction with feedback controls

We consider a reversible reaction with feedback controls (see Figure 1) whose average behaviour is accurately described by the following GMA model

\begin{gathered} \frac{d x_{1}}{d t} = \frac{d x_{2}}{d t} = - \frac{d x_{3}}{d t} \\ = - k_{f} Φ^{1 - f_{1} - f_{2} - f_{3}} x_{1}^{f_{1}} x_{2}^{f_{2}} x_{3}^{f_{3}} + k_{b} Φ^{1 - g_{1} - g_{3}} x_{1}^{g_{1}} x_{3}^{g_{3}} . \end{gathered}

(52)

Here S₃ feeds back to inhibit the forward reaction and S₁ feeds back on the reverse reaction and accelerates it. The task is to develop a stochastic model whose performance converges to that of the deterministic GMA model. We can see from equations (52) that three variables x₁, x₂ and x₃ contribute to the forward flux $k_{f} Φ^{1 - f_{1} - f_{2} - f_{3}} x_{1}^{f_{1}} x_{2}^{f_{2}} x_{3}^{f_{3}}$ and two variables x₁ and x₃ contribute to the backward flux $k_{b} Φ^{1 - g_{1} - g_{3}} x_{1}^{g_{1}} x_{3}^{g_{3}}$ . Because several variables are involved, their covariance has the potential of affecting the forward and the backward propensity functions in a stochastic simulation. To obtain the covariance information, we formulate the moment equations (53) from the ODE model (52).

To simplify the calculation, as explained in detail in Additional file 2, we set the third central moment to zero and obtain a closed-form set of ODEs. Expressed differently, the rate of change in mean and covariance depends only on the functions of mean and covariance themselves, but not on higher-order moments. Thus,

\begin{aligned} \frac{\partial μ}{\partial t} & \approx V^{T} (α + \frac{1}{2} α^{″} ⊙ σ) \\ \frac{\partial σ}{\partial t} & \approx {(σ (α^{'} V))}^{T} + σ (α^{'} V) + V^{T} Λ V . \end{aligned}

(53)

Here μ = (μ₁, μ₂, μ₃)^T, $V = [\begin{matrix} - 1 & - 1 & 1 \\ 1 & 1 & - 1 \end{matrix}]$ , $α = {(α_{1}, α_{2})}^{T} = [\begin{matrix} k_{f} Φ^{1 - f_{1} - f_{2} - f_{3}} x_{1}^{f_{1}} x_{2}^{f_{2}} x_{3}^{f_{3}} \\ k_{b} Φ^{1 - g_{1} - g_{3}} x_{1}^{g_{1}} x_{3}^{g_{3}} \end{matrix}]$ .

Moreover, for r = 1, 2 and m, n = 1, 2, 3, ${(α_{r}^{^{″}})}_{m n} = \frac{\partial^{2} α_{r} (X)}{\partial x_{m} \partial x_{n}}$ , α" = (α₁", α₂")^T, $σ = [\begin{matrix} σ_{11} & σ_{12} & σ_{13} \\ σ_{21} & σ_{22} & σ_{23} \\ σ_{31} & σ_{32} & σ_{33} \end{matrix}]$ , $α_{r}^{^{″}} ⊙ σ ≜ \sum_{m, n = 1}^{3} \frac{\partial^{2} α_{r} (X)}{\partial x_{m} \partial x_{n}} |_{X = μ} σ_{m n}$ , α"⊙σ ≜ (α₁"⊙ σ, α₂"⊙ σ)^T, α' = (α₁', α₂'), ${α_{r}}^{'} = {(\frac{\partial α_{r} (μ)}{\partial x_{1}}, \frac{\partial α_{r} (μ)}{\partial x_{2}}, \frac{\partial α_{r} (μ)}{\partial x_{3}})}^{T}$ , and $Λ = [\begin{matrix} α_{1} (μ) + \frac{1}{2} \sum_{m, n = 1}^{3} \frac{\partial^{2} α_{1} (μ)}{\partial x_{m} \partial x_{n}} σ_{m n} & 0 \\ 0 & α_{2} (μ) + \frac{1}{2} \sum_{m, n = 1}^{3} \frac{\partial^{2} α_{2} (μ)}{\partial x_{m} \partial x_{n}} σ_{m n} \end{matrix}]$ .

Two initial conditions are chosen for representative simulations; they differ by a factor of 20 in species populations and reaction volume between the upper and lower panels of Figure 2. The purpose is to observe the thermodynamic limit of the systems: both scenarios have the same initial concentrations, but the system in the lower panel case has a larger species populations and reaction volume and can thus be regarded as the thermodynamic limit sample of system in the upper panel. As demonstrated by the figures in the first column, the moment approach predicts that for both population sizes the average trajectories of the stochastic model (without propensity adjustment) dynamics is lower than that of the equation-based model: the differences are about 10% of the steady-state value of the equation-based model in the upper figure and 1% in the lower figure; for 100 runs of the stochastic simulation, the steady-state value of the equation-based model lies outside the 95% confidence interval in the upper figure, while it is inside the interval in the lower figure. Therefore, we can expect that the propensity adjustment will significantly contribute to the stochastic simulation for the upper case while not for the lower case. This expectation is confirmed by the simulation results in the third and fourth columns. With the common assumption that the deterministic equations precisely capture the system's average behaviour, the case in the upper panel represents the situation where propensity adjustment is needed, while the lower panel represents the situation that a propensity without adjustment is sufficient when the system approaches its thermodynamics limit. This example furthermore demonstrates that either the moment approach or the stochastic simulations without propensity adjustment can be used to estimate whether there is a need to construct a propensity adjustment function for stochastic simulations.

Repressilator

Interestingly, a propensity function may even be obtained through power-law approximation of some function that describes complex transient behaviours of a reaction network. As an example, consider the so-called Repressilator [27], which is a three-component genetic circuit where each component represses its downstream neighbour. More specifically (as shown in Figure 3), gene G₁ codes for protein x₁, whose dimer y₁ subsequently represses the transcription of the gene G₂. Similarly, y₂, the dimer of gene G₂'s protein product x₂, represses the transcription of gene G₃, and y₃, the dimer of gene G₃'s protein product x₃, represses the transcription of gene G₁. The corresponding differential equation model following mass action kinetics is given by [28]

\begin{aligned} {x_{i}}^{'} & = - 2 κ_{+} x_{i}^{2} + 2 κ_{-} y_{i} + σ m_{i} - γ_{p} x_{i} \\ {y_{i}}^{'} & = κ_{+} x_{i}^{2} - κ_{-} y_{i} - k_{+} y_{i} d_{0, j} + k_{-} d_{r, j} \\ {d_{0, i}}^{'} & = - k_{+} y_{k} d_{0, i} + k_{-} d_{r, i} \\ {d_{r, i}}^{'} & = k_{+} y_{k} d_{0, i} - k_{-} d_{r, i} \\ {m_{i}}^{'} & = d_{0, i} - γ_{m} m_{i}, \end{aligned}

(54)

where i = 1, 2, 3; j = 2, 3, 1; k = 3, 1, 2; the rate constants are explained in the diagram below

\begin{gathered} x_{i} + x_{i} ⇌_{κ_{-}}^{κ_{+}} y_{i} \\ d_{0, i} + y_{k} ⇌_{κ_{-}}^{κ_{+}} d_{r, i} \\ d_{0, i} \to_{}^{α} d_{0, i} + m_{i} \\ m_{i} \to_{}^{α} m_{i} + x_{i} \\ x_{i} \to_{}^{γ_{p}} ϕ \\ m_{i} \to_{}^{γ_{m}} ϕ \end{gathered}

Assuming that the reversible dimerization and the dissociation/association of a protein dimer from/to the promoter are much faster than other processes, the full systems can be reduced to

\begin{aligned} {x_{i}}^{'} & = σ p {(x_{i})}^{- 1} m_{i} - γ_{p} p {(x_{i})}^{- 1} x_{i} \\ {m_{i}}^{'} & = \frac{α d}{1 + c_{d} c_{p} x_{k}^{2}} - γ_{m} m_{i} \end{aligned}

(55)

[28]. Here Φ = 1, $p (x_{i}) = 1 + 4 c_{p} x_{i} + \frac{4 c_{d} c_{p} d x_{i}}{{(1 + c_{d} c_{p} x_{i}^{2})}^{2}}$ , c_p = κ₊/κ_-, c_d = k₊/k_- and d = d_{0, i}+ d_{r, i} for i = 1, 2, 3. It has been shown that the simplified ODEs rather accurately approximate the transient dynamics of the full system by retaining the original oscillation period and amplitude.

In [28], the system (55) is further rescaled by setting $\tilde{t} = γ_{m} t, {\tilde{x}}_{i} = \sqrt{c_{d} c_{p}} x_{i}$ and $\tilde{m_{i}} = (σ \sqrt{c_{d} c_{p}} m_{i}) / (γ_{m} β)$ , which yields

\begin{aligned} \frac{d {\tilde{x}}_{i}}{d \tilde{t}} & = β p {({\tilde{x}}_{i})}^{- 1} \tilde{m_{i}} - β p {({\tilde{x}}_{i})}^{- 1} {\tilde{x}}_{i} \\ \frac{d \tilde{m_{i}}}{d \tilde{t}} & = \frac{κ d^{'}}{1 + {\tilde{x}}_{k}^{2}} - \tilde{m_{i}} . \end{aligned}

(56)

Intriguingly, one makes the following observation. The scaled ODE system (56) is consistent with the original system (55) in oscillation amplitude and period. However, its corresponding stochastic model produces results that deviate substantially from the average responses. To see the effects of the transition from a deterministic to a stochastic model, we apply SSA to the scaled system (56). The main result is that the oscillation periods of both x_i and m_i are reduced to half (Figure 4). The reason is that, in the stochastic simulation, the oscillation period is very sensitive to the ratio of x_i and m_i, which has been altered by the scaling operation. Therefore, in general one needs to pay attention to how scaling may affect the stochastic performance when the model is generated through the conversion of an ODE model.

We can see from equations (55) that two variables x_i and m_i contribute to the production of x_i; hence, their covariance could affect the propensity function of x_i in the production reaction of a stochastic simulation. Similar to the example of a reversible reaction (Equation 52), it is therefore necessary to evaluate covariance effects and to judge whether the propensity function needs adjusting. Thus, we need to compare the difference between the dynamics of the phenomenological model (55) and the dynamics under the influence of covariance, which can be produced by either stochastic simulation or the moment approach.

The influence of the covariance on the dynamics of the stochastic simulation is relatively easy to assess: we simply use the terms on the right-hand side of the differential equations (54) as the propensity functions in SSA and obtain simulation results shown in the 2^nd and the 4^th panels of Figure 5. Obtaining the covariance-influenced dynamics with the moment-based approach is more complicated, and we need to discuss some implementation issues.

First, the moment-based approach requires information regarding the first and the second derivatives of p(x_i)^-1, which have rather complicated functional forms. To simplify the calculation, we replace the function p(x_i)^-1 with an approximating power-law function. Specifically, suppose the original parameter values are κ₊ = k₊ = 5, κ_- = k_- = 100 and d = 20. Plotting the data (x_i, p(x_i)^-1)in log-log space (Figure 5) indicates that the original function is represented well by a straight line:

log y_{i} = log 3.5188 - 0.9384 log x_{i} .

for x_i ∈ [30, 300]. In Cartesian space, this line corresponds to the power-law function

y_{i} = 3.5188 x_{i}^{- 0.9384},

which models the original function very well (see Figure 5). For x_i ∈ [1, 30], this power-law function does not fit the original function precisely; the effect of this imprecision can be evaluated later at after we use this power-law function in the moment-based method. Moreover, using the truncated moment equations to estimate the mean and variance involves multiple approximations: First, the function p (x_i)^-1 on the right-hand side of (55) is replaced by a power-law function (see Figure 5). Second, the result is approximated by Taylor expansion to the second order. Third, similar to the example of a reversible reaction, the central moment of the third degree is assumed to be zero, which leads to a closed-form ODE for the first two moments.

Solving the technical issues as described, one obtains the corresponding moment-based model of (55) (not shown) with results shown in Figure 6. Suppose one is particularly interested in the period and amplitude of the oscillation within a time interval between 0 and 400 seconds. As shown in Figure 6, the GMA approximation (black dashed line) fits the original ODEs (55) (bold black solid line) very well at the beginning, but as time goes on, the approximation error accumulates. As seen in the time interval [350, 400], the GMA approximation deviates from the original ODEs significantly. However, this does not mean that the GMA approximation cannot be used as a propensity function for stochastic simulations; the moment-based method with the GMA approximation shows that, when the GMA approximation is used as propensity function (without adjustment) for stochastic simulations, the resulting mean (red solid line) consistently fits the trajectory of the original ODEs (bold solid black line) very well up to about t = 400 seconds. The oscillation period and amplitude in the stochastic simulation based on the GMA approximation (without adjustment) are almost identical to those of the original ODEs. Therefore, a propensity adjustment for the GMA approximation is not needed, and the GMA approximation can be used as a propensity function for stochastic simulations. In other words, a stochastic model for the Repressilator system can be generated by using the scheme in (32) without propensity adjustment. Moreover, the imprecision caused by the power-law approximation can be tolerated when its corresponding moment-based mean matches the original ODEs sufficiently well with respect to the features of highest interest.

Enzymatic reaction using a quasi-steady state assumption (QSSA)

We consider an enzymatic reaction following the Michaelis-Menten mechanism:

E + S ⇌_{k_{- 1}}^{k_{1}} E S \to_{}^{k_{2}} P + E .

Here enzyme E reacts with substrate S through a reversible reaction to form complex ES, which can proceed to yield product P and to release the enzyme E. By assuming the law of mass action for the reaction kinetics we obtain a set of differential equations for the system dynamics:

\begin{aligned} \frac{d [S]}{d t} & = k_{- 1} [E S] - k_{1} [S] ({[E]}_{0} - [E S]) \\ \frac{d [E S]}{d t} & = k_{1} [S] ({[E]}_{0} - [E S]) - (k_{1} + k_{2}) [E S] \\ \frac{d [P]}{d t} & = k_{2} [E S], \end{aligned}

(57)

where the total amount of enzyme in the form of free enzyme and complex [E]₀ ≜ [E] + [ES] is assumed to be constant. In addiction, by making the so-called quasi steady state assumption (QSSA) [29, 30], assuming that the complex ES is essentially in steady state, we can assert $\frac{d [E S]}{d t} \approx 0$ . As it has been discussed many times in the literature, QSSA reduces the system and leads to the approximate form

\begin{aligned} \frac{d [S]}{d t} & = - \frac{V_{max} [S]}{K_{m} + [S]} \\ \frac{d [P]}{d t} & = \frac{V_{max} [S]}{K_{m} + [S]}, \end{aligned}

(58)

which is known as Michaelis-Menten kinetics [30]. The characterizing parameters are V_max = k₂[E]₀ and K_m = (k_-1 + k₂)/k₁.

Applying QSSA, Rao and Arkin [2] were able to reduce the CME of S and ES to a CME only containing S. For the reduced CME, the propensity function for the overall reaction S → P is

α (s) = \frac{V_{max} s}{K_{m} + s},

(59)

where the volume was scaled so that Φ = 1 and the lower-case letter s denotes the molecule count of species S. Instead of reviewing the relatively complicated manipulations with CME, we show in the following that the techniques described above lead directly from the equation-based model to the propensity function for the reduced system.

First, we recast the equation-based model into the GMA format [31], by introducing an auxiliary variable [T] ≜ K_m + [S]. The result,

\begin{aligned} \frac{d [S]}{d t} & = - V_{max} [S] {[T]}^{- 1} \\ \frac{d [T]}{d t} & = \frac{d [S]}{d t} = - V_{max} [S] {[T]}^{- 1} \end{aligned}

(60)

is exactly equivalent to the reduced system in (58) with the initial condition [S]₀ and [T]₀ = K_m + [S]₀. The corresponding stochastic model has only one reaction channel and the propensity function is

α (s, t) = V_{max} s t^{- 1} .

(61)

The propensity adjustment factor can be set to 1 because T is a function of s and its covariance with s is therefore 1. By applying t = K_m + s, the propensity function can be simplified as

α (s, t) = V_{max} s t^{- 1} = V_{max} s {(K_{m} + s)}^{- 1} = α (s) .

(62)

Thus, we arrive at the propensity function for the reduced system, which is identical to the result of Rao and Arkin obtained through manipulations of CME.

In the above derivation, we used the simplest type of recasting, where a new, auxiliary variable simply consists of an old variable plus a constant. This reformulation of the Michaelis-Menten process as a pair of GMA equations is a special case of a much more general recasting technique that permits the equivalent conversion of any system of ordinary differential equations into a power-law format [31]. However, this equivalence transformation imposes constraints on the variables of the GMA equations, and it is at this point unclear whether there are mathematical warranties ensuring that the proposed transition from differential to stochastic equations in general preserves these constraints in all cases. This question will require further investigation.

Stochastic Focusing

Stochastic focusing [26] describes the phenomenon that the fluctuations of a chemical species can drive the system to reach a different steady state than what a deterministic ODE model predicts. To demonstrate the utility of propensity adjustment, we derive a stochastic model which produces consistent results with those of the deterministic model.

Following [32], we consider the following reactions system

\begin{gathered} ϕ \to_{}^{k_{1}} I \to_{}^{k_{2}} P \to_{}^{k_{3}} ϕ \\ I + S \to_{}^{k_{4}} S \\ S ⇌_{k_{5}}^{k_{6}} ϕ \end{gathered}

(63)

This system can be interpreted as follows: the intermediate species I is produced at constant rate k₁ from some source Φ and degrades with rate k₄ through the catalysis with signalling molecule S; the end product P is converted from species I at rate k₂ and degrades at rate k₃; the signalling molecule S is produced and degrades at rates k₅ and k₆, respectively. Moreover, the value of k₅ is reduced to half at a certain time point to achieve a significant divergence effect. In order to capture the average dynamics of the system accurately, we use a power-law model in GMA format instead of the mass action rate law in [32].

\begin{aligned} \frac{d i}{d t} & = k_{1} - k_{2} i - k_{4} i^{f_{I}} s^{f_{S}} \\ \frac{d p}{d t} & = k_{2} i - k_{3} p \\ \frac{d s}{d t} & = k_{5} - k_{6} s \end{aligned}

(64)

The system size is set to 1. We can see from equations (64) that two variables i and s contribute to the degradation of I and that their covariance could therefore affect the propensity function of I in the degradation reaction of a stochastic simulation. To calculate the propensity adjustment function paf₄(t) = exp (-f_I f_S cov [log I(t), log S(t)]) for reaction $R_{4} : I + S \to_{}^{k_{4}} S$ , we formulate equations (cf. (60)) for the moments as

\begin{aligned} \frac{\partial μ}{\partial t} & \approx V^{T} (α + \frac{1}{2} α^{″} ⊙ σ) \\ \frac{\partial σ}{\partial t} & \approx {(σ (α^{'} V))}^{T} + σ (α^{'} V) + V^{T} Λ V . \end{aligned}

(65)

Here μ = (μ_I, μ_P, μ_S)^T, $V = [\begin{matrix} 1 & 0 & 0 \\ - 1 & 1 & 0 \\ 0 & - 1 & 0 \\ - 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & - 1 \end{matrix}]$ , $α = [\begin{matrix} α_{1} \\ α_{2} \\ α_{3} \\ α_{4} \\ α_{5} \\ α_{6} \end{matrix}] = [\begin{matrix} k_{1} \\ k_{2} i \\ k_{3} p \\ k_{4} i^{f_{I}} s^{f_{S}} \\ k_{5} \\ k_{6} s \end{matrix}]$ .

Moreover, for r = 1, ..., 6 and m, n = i, p, s, ${(α_{r}^{^{″}})}_{m n} = \frac{\partial^{2} α_{r}}{\partial m \partial n}$ , α" = (α₁", ..., α₆")^T, $σ = [\begin{matrix} σ_{11} & σ_{12} & σ_{13} \\ σ_{21} & σ_{22} & σ_{23} \\ σ_{31} & σ_{32} & σ_{33} \end{matrix}]$ , $α_{r}^{^{″}} ⊙ σ = \sum_{m, n = i, p, s} \frac{\partial^{2} α_{r} (μ)}{\partial m \partial n} σ_{m n}$ , α"⊙ σ ≜ (α₁ "⊙ σ, ..., α₆"⊙ σ)^T, α' = (α₁', ..., α₆'), ${α_{r}}^{'} = {(\frac{\partial α_{r}}{\partial i}, \frac{\partial α_{r}}{\partial p}, \frac{\partial α_{r}}{\partial s})}^{T}$ , and the diagonal matrix Λ is defined by ${(Λ)}_{r r} = α_{r} + \frac{1}{2} \sum_{m, n = i, p, s} \frac{\partial^{2} α_{r}}{\partial m \partial n} σ_{m n}$ .

The stochastic focusing model without propensity adjustment yields results quite different from those of the deterministic model, as is illustrated in Figure 7. In this figure, the blue lines in the 1^st panel are predicted from the moment equations (65) and the blue error bars for μ_P in the 2^nd panel are obtained from ten independent stochastic simulations. Both diverge systematically from the black line predicted by ODE model (64). By contrast, the stochastic model with propensity adjustment produces results consistent with the deterministic model, as shown by the 4^th panel.

Discussion

It is often implicitly assumed that the rate of a dynamic process can be directly taken as the propensity for a corresponding stochastic process. We have shown here that this is sometimes, but not always, true. Our results fall into three categories. The first develops conditions for a valid conversion of a rate to a propensity, the second presents a general conversion procedure, and the third discusses computational issues of propensity adjustment.

Conditions for the direct use of a rate constant (function) as propensity function

We have shown that the direct use of a rate constant or a rate function f as the propensity function in a stochastic simulation algorithm requires that at least one of the following assumptions be true:

1)
f is a linear function; this assumption has been validated in the Results sections addressing 0^th-order and 1^st-order reaction kinetics.
2)
the reaction is monomolecular; this assumptions was evaluated in the Results section describing real-valued order monomolecular reaction kinetics.
3)
all X_i in the system are noise-free variables, i.e., without (or with ignorable) fluctuations; this assumption implies that the covariance of any two participating reactants is zero (or close to zero). This assumption is assessed in equations (29 - 36).

Each of these three conditions is a sufficient condition for the direct use of a rate function f as the propensity function. Moreover, these statements are valid for functions of a general format, not just for GMA. This is so because the functional formats in cases 1 and 2 above are special cases of the GMA format. For the third case, a formal proof is only given for functions in GMA format, because this structured format allows us to give an explicit estimation on how the covariance can affect the average behavior of a stochastic simulation through equation (34). For functions not in GMA format, the conclusion is still holds, although an analogous explicit estimation is lacking. The argument is as follows. The bimolecular reaction E[α_r(X(t))] contains at least one quadratic moment of the form E[X_i(t)X_j(t)] (cf. [4] and page 38). Therefore, by definition of the covariance, E[X_i(t)X_j(t)] = E[X_i(t)]E[X_i(t)] + cov (X_i(t), X_j(t)), we obtain

E [X_{i} (t) X_{j} (t)] = E [X_{i} (t)] E [X_{j} (t)] \Leftrightarrow cov (X_{i} (t), X_{j} (t)) = 0 .

This result implies the following: If the covariance between every pair of random variables is zero (or ignorable), we have E[X_i(t)X_i(t)] = E[X_i(t)]E[X_i(t)] and therefore E[α_r(X(t))] = α_r(E[X(t)]). Expressed in words, the expectation of the propensity function on left-hand side of equation (29) equals its rate function, and the rate function can be directly used as propensity function in stochastic simulations.

If at least one of the three assumptions is satisfied, the stochastic simulation algorithm (SSA) is applicable without changes.

A general procedure for converting an equation-based model into a stochastic analogue

In the past, efforts have been made to manipulate the chemical master equation (CME) in order to achieve a proper propensity function for a reduced system (e.g., see [2]). However, manipulations of CME are usually complicated, and successes have been modest and rare. Here we propose an alternative strategy for converting a reduced dynamical model into a stochastic analogue. To achieve this conversion, we addressed two fundamental issues: First, under what conditions can a deterministic, equation-based model be validly used in stochastic simulations? And second, what is a proper strategy to implement such a conversion?

To address the first question, we showed that the following steps are necessary:

(1)
A concentration-based model needs to be converted into a particle-based model by accounting for the size of the system; if the concentration-based model is scaled (as was illustrated with the repressilator example), it may first have to be un-scaled in order to render the conversion valid;
(2)
The difference between the mean of a stochastic model without propensity adjustment and the corresponding quantities of the equation-based model should be evaluated. The mean of the stochastic model is obtained either through stochastic simulations or through a moment-based approach. If the difference is significant, then an adjustment of the propensity function for a non-elementary reaction is necessary.

To answer the second question, we need to execute the following steps

(3)
Compute a propensity adjustment function, either through simulated or experimental data or through a moment-based approach, in order to achieve the corrected propensity function (41);
(4)
Apply SSA or one of its variants using a propensity function with adjustment to obtain valid simulation trajectories.

Computational issues of propensity adjustments

When the propensity needs adjusting, an accurate propensity adjustment function (paf) is essential for obtaining the proper correction of the propensity. It is usually impossible to compute paf exactly, which necessitates a suitable approximation. The approximation error in paf originates from the following sources:

1)
The expression of paf in Equation (40) is a function of the mean, variance, and covariance, which are computed with a 2^nd-order Taylor expansion in log space.
2)
The moment-based approach, from which the functions of mean, variance and covariance are usually derived, is an approximation method that yields a closed ODE system for the moments. In the method used here, the propensity function is approximated by a 2^nd-order Taylor expansion, and the moments up to a certain degree (2 in our treatment) are retained, while all higher moments are assumed to be zero. One might expect that a higher-order Taylor expansion would improve the accuracy of paf, but it would come with a much higher computational cost. The error control of paf and the relative computational issues should be addressed in future studies.

Since computation cost is a major concern with the stochastic simulation of large biochemical reaction networks, another issue has yet to be addressed. Namely, how does the propensity function of a reduced system affect the accuracy and efficiency of various leaping methods that have been proposed to speed up SSA? Moreover, the question of molecular population sizes requires further analysis. Our derivation assumed large reactant populations, but simulations of a reversible pathway indicated that the method works rather well even for small populations. A more careful investigation of this issue of population size in different scenarios is still needed and should be the subject of further research.

Conclusions

Gillespie's stochastic simulation algorithm (SSA), as well as later variants, permits three kinds of elementary reactions to be modelled: 0^th, 1^st and 2^nd order reactions that are assumed to follow the law of mass action. All other types of reactions, containing non-integer kinetic orders and/or following other types of kinetic law, are assumed to be convertible to one of these three kinds, so that SSA can validly be applied. However, the conversion to elementary reactions is often difficult, infeasible, or simply impossible. First, the kinetic parameters of the underlying elementary reactions are in many cases unknown for a complex-order reaction. Second, even when all elementary kinetic parameters are available, the multitude of reaction channels and participating species creates a combinatorial complexity that renders SSA simulations computationally impractical. Within a deterministic framework, model reduction is a possible and often-used strategy to address such challenges. For example, a reduced mechanistic model, such as the Michaelis-Menten rate law, is often proposed to fit the experimental data, at the cost of sacrificing the original mechanistic interpretation. The reduction in these cases simplifies the original formulation by approximating, merging, or omitting intermediate reaction steps and reactants.

In this article, we propose a rather general strategy for converting a deterministic process model into a corresponding stochastic model and characterize the mathematical connections between the two. The deterministic framework is assumed to be a generalized mass action system and the stochastic analogue is in the format of the chemical master equation. The analysis identifies situations: where a direct conversion is valid; where internal noise affecting the system needs to be taken into account; and where the propensity function must be mathematically adjusted. The conversion from deterministic to stochastic models is illustrated with several representative examples, including reversible reactions with feedback controls, Michaelis-Menten enzyme kinetics, a genetic regulatory motif, and stochastic focusing. The construction of a stochastic model for a biochemical network requires the utilization of information associated with an equation-based model. The conversion strategy proposed here guides a model design process that ensures a valid transition between deterministic and stochastic models.

References

Gillespie D: Exact Stochastic Simulation of Coupled Chemical Reactions. J Phys Chem. 1977, 81 (25): 2340-2361. 10.1021/j100540a008.
Article CAS Google Scholar
Rao CV, Arkin AP: Stochastic chemical kinetics and the quasi-steady-state assumption: Application to the Gillespie algorithm. The Journal of chemical physics. 2003, 118 (11): 4999-5010. 10.1063/1.1545446.
Article CAS Google Scholar
Cao Y, Gillespie DT, Petzold LR: Multiscale stochastic simulation algorithm with stochastic partial equilibrium assumption for chemically reacting systems. J Comput Phys. 2005, 206: 395-10.1016/j.jcp.2004.12.014.
Article CAS Google Scholar
Gillespie DT: Stochastic simulation of chemical kinetics. Annual Review of Physical Chemistry. 2007, 58: 35-55. 10.1146/annurev.physchem.58.032806.104637.
Article CAS PubMed Google Scholar
Tian T, Burrage K: Stochastic models for regulatory networks of the genetic toggle switch. Proc Natl Acad Sci USA. 2006, 103 (22): 8372-7. 10.1073/pnas.0507818103.
Article PubMed Central CAS PubMed Google Scholar
Gomez-Uribe CA, Verghese GC: Mass fluctuation kinetics: Capturing stochastic effects in systems of chemical reactions through coupled mean-variance computations. The Journal of chemical physics. 2007, 126 (2): 024109-12. 10.1063/1.2408422.
Article PubMed Google Scholar
Lee CH, Kim K-H, Kim P: A moment closure method for stochastic reaction networks. The Journal of chemical physics. 2009, 130 (13): 134107-15. 10.1063/1.3103264.
Article PubMed Google Scholar
Singh A, Hespanha J: LogNormal Moment Closures for Biochemical Reactions. In Proc of the 45th Conf on Decision and Contr. 2006
Google Scholar
Milner P, Gillespie CS, Wilkinson DJ: Moment closure approximations for stochastic kinetic models with rational rate laws. Mathematical Biosciences. 2011, 231 (2): 99-104. 10.1016/j.mbs.2011.02.006.
Article PubMed Google Scholar
Chevalier MW, El-Samad H: A rigorous framework for multiscale simulation of stochastic cellular networks. The Journal of chemical physics. 2009, 131 (5): 054102-17. 10.1063/1.3190327.
Article PubMed Central PubMed Google Scholar
Voit EO: Computational analysis of biochemical systems: a practical guide for biochemists and molecular biologists. 2000, Cambridge University Press, xii:
Google Scholar
Savageau MA: Biochemical systems analysis. I. Some mathematical properties of the rate law for the component enzymatic reactions. Journal of Theorectical Biology. 1969, 25 (3): 365-9. 10.1016/S0022-5193(69)80026-3.
Article CAS Google Scholar
Savageau MA: Biochemical systems analysis. A study of function and design in molecular biology. 1976, Addison-Wesley, xvii:
Google Scholar
Savageau M: Michaelis-Menten mechanism reconsidered: implications of fractal kinetics. Journal of Theoretical Biology. 1995, 176 (1): 115-124. 10.1006/jtbi.1995.0181.
Article CAS PubMed Google Scholar
Savageau MA: Influence of fractal kinetics on molecular recognition. Journal of Molecular Recognition. 1993, 6 (4): 149-157. 10.1002/jmr.300060403.
Article CAS PubMed Google Scholar
Bajzer Z, Huzak M, Neff KL, Prendergast FG: Mathematical analysis of models for reaction kinetics in intracellular environments. Mathematical Biosciences. 2008, 215 (1): 35-47. 10.1016/j.mbs.2008.05.003.
Article CAS PubMed Google Scholar
Neff KL: Biochemical reaction kinetics in dilute and crowded solutions: Predictions of macroscopic and mesoscopic models and experimental observations. 2010, Mayo Clinic: Rochester, MN
Google Scholar
Neff , Kevin L, Offord Chetan P, Caride Ariel J, Strehler Emanuel E, Prendergast Franklyn G, Bajzer eljko: Validation of Fractal-Like Kinetic Models by Time-Resolved Binding Kinetics of Dansylamide and Carbonic Anhydrase in Crowded Media. Biophysical journal. 2011, 100 (10): 2495-2503. 10.1016/j.bpj.2011.04.016.
Article PubMed Central CAS PubMed Google Scholar
Chou I-C, Voit EO: Recent developments in parameter estimation and structure identification of biochemical and genomic systems. Math Biosc. 2009, 219: 57-83. 10.1016/j.mbs.2009.03.002.
Article CAS Google Scholar
Walton RJ, Preston CJ, Bartlett M, Smith R, Russell RGG: Biochemical measurements in Paget's disease of bone. European Journal of Clinical Investigation. 1977, 7 (1): 37-39. 10.1111/j.1365-2362.1977.tb01567.x.
Article CAS PubMed Google Scholar
Koch AL: The logarithm in biology 1. Mechanisms generating the log-normal distribution exactly. Journal of Theoretical Biology. 1966, 12 (2): 276-290. 10.1016/0022-5193(66)90119-6.
Article CAS PubMed Google Scholar
Limpert E, Stahel WA, Abbt M: Log-normal Distributions across the Sciences: Keys and Clues. BioScience. 2001, 51 (5): 341-10.1641/0006-3568(2001)051[0341:LNDATS]2.0.CO;2.
Article Google Scholar
Law AM, Kelton WD: Simulation Modeling and Analysis. 2000, Boston: Mc.Graw Hill, 3
Google Scholar
Gillespie D, Petzold L: Numerical Simulation for Biochemical Kinetics. In System Modelling in Cellular Biology. Edited by: Szallasi Z, Stelling J, Periwal V. 2006, MIT Press
Google Scholar
Wolkenhauer O, Ullah M, Kolch W, Cho K: Modelling and Simulation of IntraCellular Dynamics: Choosing an Appropriate Framework. IEEE Transactions on NanoBioscience. 2004, 3: 200-207. 10.1109/TNB.2004.833694.
Article PubMed Google Scholar
Paulsson J, Berg OG, Ehrenberg M: Stochastic focusing: Fluctuation-enhanced sensitivity of intracellular regulation. Proceedings of the National Academy of Sciences. 2000, 97 (13): 7148-7153. 10.1073/pnas.110057697.
Article CAS Google Scholar
Elowitz MB, Leibler S: A synthetic oscillatory network of transcriptional regulators. Nature. 2000, 403 (6767): 335-338. 10.1038/35002125.
Article CAS PubMed Google Scholar
Bennett MR, Volfson D, Tsimring L, Hasty J: Transient Dynamics of Genetic Regulatory Networks. Biophysical journal. 2007, 92 (10): 3501-3512. 10.1529/biophysj.106.095638.
Article PubMed Central CAS PubMed Google Scholar
Segel LA: On the validity of the steady state assumption of enzyme kinetics. Bull Math Biol. 1988, 50: 579-593.
Article CAS PubMed Google Scholar
Michaelis L, Menten ML: Die Kinetik der Invertinwirkung. Biochem Zeitschrift. 1913, 49: 333-369.
CAS Google Scholar
Savageau MA, Voit EO: Recasting Nonlinear Differential-Equations As S-Systems - A Canonical Nonlinear Form. Mathematical Bioscience. 1987, 87: 83-115. 10.1016/0025-5564(87)90035-6.
Article Google Scholar
Twomey A: On the Stochastic Modelling of Reaction-Diffusion Processes. 2007, University of Oxford
Google Scholar

Download references

Acknowledgements

The authors thank Dr. Yi Jiang for useful comments and for providing seminal references. The authors also appreciate Dr. Mukhtar Ullah's and Dr. Olaf Wolkenhauer's insightful comments and conceptual clarifications. This work was supported in part by a Molecular and Cellular Biosciences Grant (MCB-0946595; E.O. Voit, PI) from the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsoring institutions.

Author information

Authors and Affiliations

Deparment of Mathematics, Bioinformatics Program, Georgia Institute of Technology, Atlanta, GA, 30332, USA
Jialiang Wu
The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
Brani Vidakovic & Eberhard O Voit
Integrative BioSystems Institute and The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
Eberhard O Voit

Authors

Jialiang Wu
View author publications
You can also search for this author in PubMed Google Scholar
Brani Vidakovic
View author publications
You can also search for this author in PubMed Google Scholar
Eberhard O Voit
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Eberhard O Voit.

Additional information

Authors' contributions

JW developed the mathematical derivations, designed and performed the simulation, and drafted the manuscript. BV contributed to the statistical reasoning and revised the manuscript. EV supervised the research and revised the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material

Additional file 1: Derivation of the mean and variance of a power-law function of random variables. (PDF 123 KB)

12918_2011_790_MOESM2_ESM.PDF

Additional file 2: Computation of approximate mean and covariance for a generic propensity function to be used in stochastic simulations. (PDF 133 KB)

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Authors’ original file for figure 6

Authors’ original file for figure 7

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Wu, J., Vidakovic, B. & Voit, E.O. Constructing stochastic models from deterministic process equations by propensity adjustment. BMC Syst Biol 5, 187 (2011). https://doi.org/10.1186/1752-0509-5-187

Download citation

Received: 18 July 2011
Accepted: 08 November 2011
Published: 08 November 2011
DOI: https://doi.org/10.1186/1752-0509-5-187

Constructing stochastic models from deterministic process equations by propensity adjustment

Abstract

Background

Results

Conclusions

Background

Representations of systems of biochemical reactions

Motivation for the power-law formalism: reactions in crowded media

The Generalized Mass Action (GMA) format

Proper use of equation-based functions for stochastic simulations

Methods

Deriving the mean and variance of a power-law function of random variables

Deriving proper propensity functions for stochastic simulations from differential equation-based models

Statistical criteria for propensity adjustment

Results

Generic special cases

Power-law representation of a reversible reaction with feedback controls

Repressilator

Enzymatic reaction using a quasi-steady state assumption (QSSA)

Stochastic Focusing

Discussion

Conditions for the direct use of a rate constant (function) as propensity function

A general procedure for converting an equation-based model into a stochastic analogue

Computational issues of propensity adjustments

Conclusions

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Authors' contributions

Electronic supplementary material

Authors’ original submitted files for images

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Systems Biology

Contact us