A higher-order numerical framework for stochastic simulation of chemical reaction systems
 Tamás Székely Jr^{1},
 Kevin Burrage^{1, 2},
 Radek Erban^{3} and
 Konstantinos C Zygalakis^{4, 5}
DOI: 10.1186/1752-0509-6-85
© Székely et al.; licensee BioMed Central Ltd. 2012
Received: 28 February 2012
Accepted: 22 June 2012
Published: 15 July 2012
Abstract
Background
In this paper, we present a framework for improving the accuracy of fixed-step methods for Monte Carlo simulation of discrete stochastic chemical kinetics. Stochasticity is ubiquitous in many areas of cell biology, for example in gene regulation, biochemical cascades and cell-cell interaction. However, most discrete stochastic simulation techniques are slow. We apply Richardson extrapolation to the moments of three fixed-step methods, the Euler, midpoint and θ-trapezoidal τ-leap methods, to demonstrate the power of stochastic extrapolation. The extrapolation framework can increase the order of convergence of any fixed-step discrete stochastic solver and is very easy to implement; the only condition for its use is knowledge of the appropriate terms of the global error expansion of the solver in terms of its stepsize. In practical terms, a higher-order method with a larger stepsize can achieve the same level of accuracy as a lower-order method with a smaller one, potentially reducing the computational time of the simulation.
Results
By obtaining a global error expansion for a general weak first-order method, we prove that extrapolation can increase the weak order of convergence for the moments of the Euler and the midpoint τ-leap methods from one to two. This is supported by numerical simulations of several chemical systems of biological importance using the Euler, midpoint and θ-trapezoidal τ-leap methods. In almost all cases, extrapolation results in an improvement of accuracy. As in the case of ordinary and stochastic differential equations, extrapolation can be repeated to obtain even higher-order approximations.
Conclusions
Extrapolation is a general framework for increasing the order of accuracy of any fixed-step stochastic solver. This enables the simulation of complicated systems in less time, allowing for more realistic biochemical problems to be solved.
Keywords
Stochastic simulation algorithms; τ-leap; High-order methods; Monte Carlo error
Background
Biochemical systems with small numbers of interacting components have increasingly been studied in recent years, as they are some of the most basic systems in cell biology[1–3]. Stochastic effects can strongly influence the dynamics of such systems. Applying deterministic ordinary differential equation (ODE) models, which approximate particle numbers as continuous concentrations, can therefore lead to misleading results[4, 5]. In some cases, even systems with large populations cannot be accurately modelled by ODEs. For instance, close to a bifurcation regime, ODE approximations cannot reproduce the behaviour of the system for some parameter values[6]. Stochastic systems can instead be modelled using discrete Markov processes. The probability distribution over the states of a well-stirred stochastic chemical reaction system at each point in time is given by the chemical master equation (CME)[7, 8]. The stochastic simulation algorithm (SSA)[9] is an exact method for simulating trajectories of the CME as the system evolves in time.
The SSA can be computationally intensive to run for realistic problems, and alternative methods such as the τ-leap have been developed to improve performance[10]. Instead of simulating each reaction individually, the τ-leap performs several reactions at once, thus ‘leaping’ along the history axis of the system. This means that, unlike the SSA, the τ-leap is not exact; accuracy is maintained by not allowing too many reactions to occur per step. The size of each timestep, τ, determines the number of reactions occurring during that step, which is given by a Poisson random number.
This gain in speed must be balanced against the loss of accuracy: larger steps mean fewer calculations but reduced accuracy. Many common τ-leap implementations employ a variable stepsize, as using the optimal stepsize τ at each point is crucial for the accuracy of the method[10–12]. However, a fixed-step implementation can be useful in some cases. Although it may be less efficient, it is much easier to implement than variable-step equivalents. More importantly, the extrapolation framework that we describe in this paper requires a fixed-step method.
The original τ-leap as described by Gillespie[10] is known as the Euler τ-leap, as it can be compared to the Euler method for solving ODEs. It has been shown to have weak order of convergence one under both the scaling $\tau \to 0$ (traditional scaling)[13, 14] and $V\to \infty$ (large volume scaling)[15], where V is the volume of the system. In the same paper, Gillespie also proposed the midpoint τ-leap method[10], which has higher-order convergence in some cases[15, 16]. Tian and Burrage[17] proposed a variant known as the binomial τ-leap method that avoids issues with chemical species becoming negative. Only recently has more work been done on constructing higher-order stochastic methods. One such method is the random-corrected τ-leap[18], where at each timestep a random correction is added to the Poisson random number that determines the number of reactions in that step. Given a suitable random correction, the lowest-order errors on the moments can be cancelled. In this way methods with up to weak order two convergence for both mean and covariance have been constructed. More recently, Anderson and Koyama[19] and Hu et al.[20] proposed another weak second-order method, the θ-trapezoidal τ-leap, which is an adaptation of the stochastic differential equation (SDE) solver of Anderson and Mattingly[21] for the discrete stochastic case.
 (1)
It increases the order of accuracy of the methods supplied to it. This is desirable for the obvious reason that the resulting solutions are more accurate, and also because larger timesteps can be used to reach a given level of accuracy, reducing the computational cost. This is discussed further in our Conclusions.
 (2)
It can be applied to any fixed-step solver, for instance inherently higher-order methods such as the θ-trapezoidal τ-leap, or methods with an extended stability region such as stochastic Runge-Kutta methods[24].
 (3)
The resulting higher-order solutions can be extrapolated again to give solutions with even higher order, as there is no (theoretical) limit on the number of times a method can be extrapolated (although statistical errors can obscure the results if the method is too accurate; see Section Monte Carlo error).
Our extrapolated methods may be useful for researchers in biology and biochemistry, as they are easy to implement and can accurately and quickly simulate discrete stochastic systems that could otherwise be too computationally intensive.
We show how the extrapolation framework can be applied to fixed-step stochastic algorithms using the examples of the fixed-step Euler τ-leap, midpoint τ-leap[10] and θ-trapezoidal τ-leap[20] methods. The extrapolation procedure depends heavily on the existence of an appropriate global error expansion for the weak error of the numerical method. Once this is known, extrapolation consists of simple arithmetic. We calculate such an expansion for an arbitrary weak first-order method; this allows us to use extrapolation to obtain higher-order solutions. The weak order of all the moments of such methods can be improved by extrapolation. To reinforce this, we perform a simple error analysis by comparing the equations for the true and numerical mean of the Euler τ-leap method; we see that its global error is order one, and extrapolating it increases the order to two in the case of zeroth-order and first-order reactions. Using numerical simulations, we demonstrate that this is true for two first-order and three higher-order test systems with the Euler, midpoint and θ-trapezoidal τ-leap methods. Moreover, the extrapolated methods have consistently lower errors, and in many cases visibly higher-order convergence in the first two moments (the lack of convergence in some of the simulations is discussed in Section Monte Carlo error). Finally, we demonstrate that the extrapolation framework can be used to give even higher-order numerical solutions by applying a second extrapolation to the Euler τ-leap method.
The rest of this paper is organized as follows. We begin with an overview of the SSA and the τ-leap methods we will use later. We then discuss Richardson extrapolation for ODEs and SDEs and introduce the extrapolated discrete stochastic framework. We give numerical results to support our claims that extrapolation reduces the error of fixed-step methods. Finally, we discuss the Monte Carlo error and give our conclusions. The derivations of the global error expansions for SDEs and discrete stochastic methods and related material are presented in the Appendix.
Overview of stochastic simulation methods
SSA
Gillespie’s SSA[9] is a statistically exact method for simulating paths of a Markov jump process. The two basic assumptions of the SSA are (i) that individual molecules are not explicitly tracked, and (ii) that there are frequent non-reactive collisions. Thus we assume that the system is well-mixed and homogeneous.
The SSA simulates a system of biochemical reactions with N species and M reactions, interacting inside a fixed volume V at constant temperature. The populations of chemical species (as molecular numbers, not concentrations) at time t are represented as a state vector $\mathbf{x}\equiv \mathbf{X}(t)\equiv (X_1,\dots,X_N)^T$. Reactions are represented by a stoichiometric matrix composed of M individual stoichiometric vectors $\mathit{\nu}_j \equiv (\nu_{1j},\dots,\nu_{Nj})^T$, $j=1,\dots,M$. Each stoichiometric vector represents a reaction j occurring and the system changing from state $\mathbf{x}$ to $\mathbf{x}+\mathit{\nu}_j$. Each reaction occurs in an interval $[t, t+dt)$ with relative probability $a_j(\mathbf{x})dt$, where $a_j$ is the propensity function of the jth reaction. Propensity functions are given by the mass-action kinetics of the reactant chemical species. For more detail, the reader is referred to Ref.[9]. The variables $\mathbf{X}$, $\mathit{\nu}_j$ and $a_j(\mathbf{X})$ fully characterise the system at each point in time.
 1.
Generate two random numbers $r_1$ and $r_2$ from the unit-interval uniform distribution $\mathcal{U}(0,1)$.
 2.
Find the time until the next reaction $\tau =\frac{1}{a_0}\ln\left(\frac{1}{r_1}\right)$, where $a_0(\mathbf{X}_n)=\sum_{j=1}^{M} a_j(\mathbf{X}_n)$.
 3.
Find the next reaction j from $\sum_{m=1}^{j-1} a_m(\mathbf{X}_n) < a_0 r_2 \le \sum_{m=1}^{j} a_m(\mathbf{X}_n)$.
 4.
Update t _{n + 1} = t _{ n } + τ and X _{n+ 1} = X _{ n } + ν _{ j }.
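The four steps above can be sketched in a few lines of code. The following is a minimal illustration for a single decay channel X → ∅ with propensity a(x) = kx; the reaction, rate constant and initial state are illustrative assumptions, not taken from the paper (with one channel, step 3 is trivial and $r_2$ goes unused).

```python
import numpy as np

rng = np.random.default_rng(0)

def ssa_decay(x0, k, T):
    """Simulate one SSA path of X -> 0 up to time T; return the final count."""
    t, x = 0.0, x0
    while x > 0:
        a0 = k * x                    # total propensity (only one channel)
        r1, r2 = rng.random(2)        # step 1: two uniform random numbers
        tau = np.log(1.0 / r1) / a0   # step 2: time until the next reaction
        if t + tau > T:               # no further reaction occurs before T
            break
        # step 3: with a single channel, reaction j = 1 is always chosen
        t, x = t + tau, x - 1         # step 4: update time and state
    return x

final = ssa_decay(x0=100, k=0.1, T=5.0)
```

Note that every individual reaction event costs two random numbers and a state update, which is why the SSA becomes expensive for systems with many fast reactions.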
The Direct Method requires two newly generated random numbers at each timestep. Although there are other SSA implementations, such as the Next Reaction Method[25] and the Optimised Direct Method[26], which can be more economical, in general the SSA is computationally costly.
τleap method
The τ-leap algorithm leaps along the history axis of the SSA by evaluating groups of reactions at once[10]. This means significantly fewer calculations, i.e. shorter computational time, per simulation, but simulation accuracy is compromised: we do not know exactly how many reactions occurred during each timestep, nor can we locate each reaction in time more precisely than the timestep in which it occurred. The leap condition defines an upper bound for the size of each timestep τ: it must be small enough that the propensities do not change significantly for its duration, i.e. the change in state from time t to t + τ is very small[10]. Since τ is small, the probability $a_j(\mathbf{x})\tau$ that reaction j occurs during $[t, t+\tau)$ is also small, so the number of times $K_j$ each reaction fires over one timestep can be approximated by $\mathcal{P}(a_j(\mathbf{x})\tau)$, a Poisson random variable with mean and variance $a_j(\mathbf{x})\tau$[10]. The Euler τ-leap algorithm is the basic τ-leap method, and corresponds to the Euler method for solving ODEs or the Euler-Maruyama method for solving SDEs.
 1.
Generate M Poisson random numbers $k_j=\mathcal{P}(a_j(\mathbf{X}_n)\tau)$.
 2.
Update $t_{n+1}=t_n+\tau$ and $\mathbf{X}_{n+1}=\mathbf{X}_n+\sum_{j=1}^{M}\mathit{\nu}_j k_j$.
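As a concrete sketch of the two steps above, the following implements the fixed-step Euler τ-leap for the same illustrative decay reaction X → ∅ with a(x) = kx and stoichiometry ν = −1; all parameter values are assumptions for illustration. The max(x, 0) guard in the propensity is a pragmatic safeguard, since (as discussed below) the plain Euler τ-leap can drive populations negative.

```python
import numpy as np

rng = np.random.default_rng(1)

def euler_tau_leap(x0, k, tau, n_steps):
    """Fixed-step Euler tau-leap for X -> 0 with propensity k*x, nu = -1."""
    x = x0
    for _ in range(n_steps):
        kj = rng.poisson(k * max(x, 0) * tau)  # step 1: Poisson reaction count
        x = x + (-1) * kj                      # step 2: apply stoichiometry
    return x

x_final = euler_tau_leap(x0=1000, k=0.1, tau=0.1, n_steps=50)
```

One Poisson draw per reaction channel per step replaces the per-event bookkeeping of the SSA, which is the source of the speed-up.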
The Euler τ-leap has weak order one[13–15]. Although considerable work has been done on improving the mechanism for selecting the timesteps τ[10–12] and on eliminating steps that would result in negative populations[17, 27–29], this does not affect the order of the method, limiting its accuracy. Higher-order methods are the only way to improve the accuracy beyond a certain point. Realising this, Gillespie also proposed a higher-order τ-leap method, the midpoint τ-leap[10]. This is similar to the midpoint method for ODEs, where at each step an estimate is made of the gradient of $\mathbf{X}$ at $t_n + \tau/2$; $\mathbf{X}_n$ is then incremented using this extra information to give a more accurate approximation.
 1.
Calculate ${\mathbf{X}}^{\prime}={\mathbf{X}}_{n}+\frac{1}{2}\tau \sum _{j=1}^{M}{\mathit{\nu}}_{j}{a}_{j}\left({\mathbf{X}}_{n}\right)$.
 2.
Generate M Poisson random numbers $k_j=\mathcal{P}(a_j(\mathbf{X}')\tau)$.
 3.
Update $t_{n+1}=t_n+\tau$ and $\mathbf{X}_{n+1}=\mathbf{X}_n+\sum_{j=1}^{M}\mathit{\nu}_j k_j$.
Although under the scaling $\tau \to 0$ the midpoint τ-leap has the same order of accuracy in the mean as the Euler τ-leap method, under the large volume scaling it has weak order two[15, 16]. Our numerical simulations also suggest that it gives higher-order approximations to the first two moments for both linear and nonlinear systems (although this is not clear from the literature). However, the local truncation error of its covariance is first-order[16].
θtrapezoidal τleap method
Based on the SDE method of Anderson and Mattingly[21], the θ-trapezoidal τ-leap[20] is a weak second-order method. It consists of two steps: a predictor step of size θτ and a corrector step of size (1 − θ)τ that aims to cancel any errors made in the first step.
 1.
Generate M Poisson random numbers $k_j'=\mathcal{P}(a_j(\mathbf{X}_n)\theta\tau)$, $j=1,\dots,M$.
 2.
Calculate predictor step $\mathbf{X}'=\mathbf{X}_n+\sum_{j=1}^{M}\mathit{\nu}_j k_j'$.
 3.
Calculate $l_j=\max\left(\alpha_1 a_j(\mathbf{X}') - \alpha_2 a_j(\mathbf{X}_n),\, 0\right)$.
 4.
Generate M Poisson random numbers $k_j=\mathcal{P}(l_j(1-\theta)\tau)$, $j=1,\dots,M$.
 5.
Update $t_{n+1}=t_n+\tau$ and $\mathbf{X}_{n+1}=\mathbf{X}'+\sum_{j=1}^{M}\mathit{\nu}_j k_j$.
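The five predictor-corrector steps above can be sketched as follows, again for the illustrative decay reaction X → ∅ with a(x) = kx and ν = −1. We assume θ = 1/2 with coefficients α₁ = 2 and α₂ = 1; these are consistent choices in the sense that the expected total rate over one step then reduces to a(x)τ for constant propensities, but the precise coefficient formulas are given in Hu et al.[20] and should be taken from there.

```python
import numpy as np

rng = np.random.default_rng(3)

def theta_trapezoidal_tau_leap(x0, k, tau, n_steps, theta=0.5, a1=2.0, a2=1.0):
    """Sketch of the theta-trapezoidal tau-leap for X -> 0, a(x) = k*x."""
    x = x0
    for _ in range(n_steps):
        kp = rng.poisson(k * max(x, 0) * theta * tau)   # step 1: predictor count
        x_pred = x - kp                                 # step 2: predictor state
        # step 3: corrector rate, clipped at zero
        l = max(a1 * k * max(x_pred, 0) - a2 * k * max(x, 0), 0.0)
        kc = rng.poisson(l * (1 - theta) * tau)         # step 4: corrector count
        x = x_pred - kc                                 # step 5: final update
    return x

x_final = theta_trapezoidal_tau_leap(1000, 0.1, 0.1, 50)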
Specifically, the θ-trapezoidal τ-leap method was shown to have weak order of convergence two in the moments, and a local truncation error of $\mathcal{O}(\tau^3 V^{-1})$ for the covariance. In the analysis, $\tau = V^{-\beta}$ with $0<\beta<1$ and $V\to\infty$, but in simulations the system volume is kept constant; thus it seems that in practice this also results in weak second-order convergence in the covariance[20].
Methods
The extrapolation framework
which implies that ${\widehat{\mathbf{x}}}_{n}^{h}$ is now a second-order approximation to x(T).
For instance, in Eq. (4) we used (with p = 2) $\mathbf{x}_n^h = z(1,1)$ and $\mathbf{x}_{2n}^{h/2}=z(1,2)$ to find $\widehat{\mathbf{x}}_n^h = z(2,1)$. Repeating with $\mathbf{x}_{2n}^{h/2}$ and $\mathbf{x}_{4n}^{h/4}$, we could extrapolate to find $\widehat{\mathbf{x}}_{2n}^{h/2}=z(2,2)$. Then we could extrapolate $\widehat{\mathbf{x}}_n^h$ and $\widehat{\mathbf{x}}_{2n}^{h/2}$ to find a third-order approximation $\widehat{\widehat{\mathbf{x}}}_n^h = z(3,1)$, and so on.
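The $z(q,k)$ tableau above can be made concrete with a deterministic stand-in for any first-order fixed-step solver. The sketch below applies repeated Richardson extrapolation to the explicit Euler method on $dx/dt = -x$, $x(0)=1$ (an illustrative assumption, not a system from the paper); the coefficients $2z(1,2)-z(1,1)$ and $(4z(2,2)-z(2,1))/3$ successively cancel the $h$ and $h^2$ terms of an error expansion in powers of h.

```python
import numpy as np

def euler(h, T=1.0):
    """Explicit Euler for dx/dt = -x, x(0) = 1, integrated to time T."""
    n = int(round(T / h))
    x = 1.0
    for _ in range(n):
        x += h * (-x)
    return x

h = 0.1
z11, z12, z13 = euler(h), euler(h / 2), euler(h / 4)   # first column of tableau
z21 = 2 * z12 - z11          # z(2,1): second order, as in Eq. (4)
z22 = 2 * z13 - z12          # z(2,2)
z31 = (4 * z22 - z21) / 3    # z(3,1): third order
exact = np.exp(-1.0)
```

Each column of the tableau should lie closer to exp(−1) than the one before it, which the verifying assertions below confirm.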
where $a(\mathbf{x},t):\mathbb{R}^{N+1}\mapsto\mathbb{R}^{N}$, $b(\mathbf{x},t):\mathbb{R}^{N+1}\mapsto\mathbb{R}^{N\times M}$ and $W_t$ is a standard M-dimensional Wiener process. Talay and Tubaro[23] derived an expansion similar to Eq. (1) for the global error when $\mathbf{x}_n^h$ was calculated using the Euler-Maruyama and Milstein schemes (outlined in Appendix A). By using this expansion and the extrapolation framework, they were able to derive a second-order approximation to $\mathbb{E}(f(\mathbf{x}(t)))$. The crucial step in obtaining the global error expansion was to express it as a telescopic sum of the local errors. Liu and Li[30] followed a similar procedure to derive a global error expansion for numerical methods for SDEs with Poisson jumps, thus allowing them to obtain higher-order weak approximations.
Extrapolation for discrete chemical kinetics
The extrapolation framework can be extended to the discrete stochastic regime. Since it requires two or more sets of approximations with given stepsizes (e.g. h and h/2), it can only be used with a fixed-step method: as more complex τ-leap methods vary τ at each step, it is not clear how to extrapolate them. However, this has the advantage of making our method very easy to program, as there is no complex programming overhead, for instance in choosing the timestep for τ-leap methods. We stress that we mostly use extrapolation to obtain higher-order approximations to the moments of the system (or their combinations, such as the covariance). In principle, given enough of the moments, the full probability distribution at some given time could be constructed. This is known as the Hamburger moment problem[31] and is in general difficult to solve, as it might admit an infinite number of solutions. However, in some cases it is possible to reconstruct the full distribution from the extrapolated moments, as we have a priori knowledge about its shape. For instance, when the final distribution of states is known to be binomial, only the mean and variance are necessary for constructing the full extrapolated distribution (see Numerical Results, System 1).
In this section we focus on the Euler τ-leap method (ETL), since this choice simplifies the analysis, but we show in Appendix B that any fixed-step method with known weak order can be extrapolated. In our numerical investigations we show results for the ETL, the midpoint τ-leap (MPTL) and the θ-trapezoidal τ-leap (TTTL) methods. Extrapolating the ETL is very similar to extrapolating an ODE solver. The extrapolated ETL, which we call the xETL from here on, involves running two sets of S ETL simulations for time T = nτ.
 1.
Run S ETL simulations with stepsize τ, to get ${}_s\mathbf{x}_n^{\tau}$, $s=1,\dots,S$.
 2.
Calculate desired moments $\mathbb{E}_S(f(\mathbf{x}_n^{\tau}))=\frac{1}{S}\sum_{s=1}^{S} f({}_s\mathbf{x}_n^{\tau})$.
 3.
Repeat steps 1 and 2 using stepsize τ/2 to get $\mathbb{E}_S(f(\mathbf{x}_{2n}^{\tau/2}))$.
 4.
Take $2\mathbb{E}_S(f(\mathbf{x}_{2n}^{\tau/2})) - \mathbb{E}_S(f(\mathbf{x}_n^{\tau}))$ as the extrapolated approximation to the desired moment.
Algorithm 5 can be easily modified for use with any other fixedstep method, by replacing the ETL in Step 1 with the chosen method.
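The four steps of Algorithm 5 can be sketched end-to-end for the illustrative decay system X → ∅ (a(x) = kx, ν = −1), whose exact mean is $x_0 e^{-kT}$; the system and all parameter values are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(4)

def etl_mean(x0, k, tau, T, S):
    """Monte Carlo estimate of E(X(T)) from S Euler tau-leap paths."""
    n = int(round(T / tau))
    finals = np.empty(S)
    for s in range(S):
        x = x0
        for _ in range(n):
            x -= rng.poisson(k * max(x, 0) * tau)
        finals[s] = x
    return finals.mean()

x0, k, tau, T, S = 1000, 0.1, 0.5, 4.0, 2000
m_tau = etl_mean(x0, k, tau, T, S)       # steps 1-2: S runs with stepsize tau
m_half = etl_mean(x0, k, tau / 2, T, S)  # step 3: S runs with stepsize tau/2
m_x = 2 * m_half - m_tau                 # step 4: extrapolated mean (xETL)
exact = x0 * np.exp(-k * T)
```

For this system the ETL mean at stepsize τ follows $x_0(1-k\tau)^n$, so its leading bias is order τ; the combination in step 4 cancels it, and the extrapolated estimate typically lands much closer to the exact mean than either raw estimate.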
so the leading error term has been cancelled, leaving an order two approximation. Such a calculation would also apply for the MPTL. The difference is that for linear systems the MPTL is second-order convergent with respect to the mean[16], and similarly for the TTTL[20]; this should be taken into account in order to choose the correct extrapolation coefficients.
The above analysis only applies to the mean of a linear system, a very restricted case, but it is useful for demonstrating the basic principles of stochastic extrapolation. We employ an approach similar to that of Talay and Tubaro[23] and Liu and Li[30] to find a general expression for the global error expansion of the moments of a weak first-order discrete stochastic method; this is presented in Appendix B. In Appendix C we explicitly evaluate this for the particle decay system and show that it is equivalent to Eq. (12) in this case. Appendix D contains the equations for the second moment in the case of linear systems.
Multimodal systems
As discussed before, one limitation of our approach is that only specific characteristics of the particle distribution can be extrapolated, rather than the full distribution. Typically we choose these to be the first and second moments, as for many systems these are the quantities of interest. However, in some cases the moments do not take values relevant to the actual dynamics of the system[35, 36]. This occurs, for instance, in bimodal or multimodal systems, which have two or more stable states. Nevertheless, our method can be easily generalised to accommodate multimodal distributions as follows.
 1.
Run S ETL simulations with stepsize τ, to get ${}_s\mathbf{x}_n^{\tau}$, $s=1,\dots,S$.
 2.
Plot histograms of the particle populations at time T and identify the stable states.
 3.
Choose point(s) at which to partition the S simulations into p subsets of S _{1},…,S _{p} simulations clustered around each stable state.
 4.
Calculate desired moments over the subsets of simulations, $\mathbb{E}_{S_i}(f(\mathbf{x}_n^{\tau}))=\frac{1}{S_i}\sum_{s=1}^{S_i} f({}_s\mathbf{x}_n^{\tau})$, $i=1,\dots,p$.
 5.
Repeat steps 1 to 4 using stepsize τ/2 to get $\mathbb{E}_{S_i}(f(\mathbf{x}_{2n}^{\tau/2}))$, $i=1,\dots,p$.
 6.
Take $2\mathbb{E}_{S_i}(f(\mathbf{x}_{2n}^{\tau/2})) - \mathbb{E}_{S_i}(f(\mathbf{x}_n^{\tau}))$, $i=1,\dots,p$, as the extrapolated approximation to the desired moment for each of the p subsets of simulations.
Algorithm 6 is also simple to code and does not require significant extra computational time compared to Algorithm 5 because the dynamics of the system are found from the original simulations that are necessary for the extrapolation anyway. The point(s) at which the simulations are split into subsets can affect the accuracy of the results, so must be chosen with some care. In the Numerical Results (System 5), we apply Algorithm 6 to a bimodal system, and investigate the effects of the choice of splitting point.
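The partitioning bookkeeping of Algorithm 6 (steps 3 and 4) can be sketched as follows. The bimodal sample here is synthetic, a mixture of two Poisson distributions standing in for the final states of a bistable system; the distributions, peak locations and splitting point are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic final states at time T: two well-separated modes (p = 2)
finals = np.concatenate([rng.poisson(100, 500), rng.poisson(600, 500)])

split = 300.0                                   # step 3: chosen partition point
low, high = finals[finals < split], finals[finals >= split]

mean_low, mean_high = low.mean(), high.mean()   # step 4: moments per subset
# Step 5 would repeat this with stepsize tau/2, and step 6 then extrapolates
# each subset's moments exactly as in Algorithm 5.
```

As the text notes, the choice of splitting point matters: samples near the partition boundary are assigned to one mode or the other, which shifts the per-mode moments.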
Results and discussion
Numerical results
for the extrapolated methods. Here x(T) is the analytical solution at time T and $\mathbb{E}_S(f(\mathbf{x}_n^{\tau}))$ are the moments of its approximations given by S simulations of a fixed-step method with stepsize τ run for n steps. For the linear systems, the true solution is calculated analytically; for the nonlinear systems we use the value given by 10^{6} or 10^{7} repeats of the SSA (depending on the system). The error of a weak order α method with stepsize τ is approximately Cτ^{α}, where C is an unknown constant. To easily see the order of accuracy of our results, we plot all the errors on log-log plots. Gradients are calculated using a least squares fit through the points. The highest level of Monte Carlo error, which can be calculated for the linear systems, is marked on the appropriate plots as a straight black line. Below this level, the absolute error results are, at least in part, essentially random (see Section Monte Carlo error). We note that in all test systems, the timesteps used were in the useful τ-leaping regime: Poisson counts for each reaction channel ranged from tens to hundreds.
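Reading the order of accuracy off a log-log plot amounts to a least squares fit, as described above: for errors behaving like $C\tau^{\alpha}$, a line through $(\log\tau, \log\text{error})$ has gradient α. The sketch below uses synthetic error values (α = 2 plus a small perturbation), purely to illustrate the fitting step; it is not data from the paper.

```python
import numpy as np

# Synthetic errors ~ C * tau^2, with a few percent of noise on each point
taus = np.array([0.1, 0.2, 0.4, 0.8])
errors = 3.0 * taus**2 * np.array([1.02, 0.97, 1.01, 0.99])

# Least squares line through (log tau, log error); its slope is the
# observed weak order alpha
gradient, intercept = np.polyfit(np.log(taus), np.log(errors), 1)
```

For clean second-order data the fitted gradient comes out very close to two; large Monte Carlo error shows up as points scattering off the line and a gradient that is no longer meaningful.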
System 1: Particle decay system
System 2: Chain decay system
System 3: MichaelisMenten system
We investigated the effects of the coefficient of variation (CV, standard deviation divided by mean) on this system. The CVs of each species in System 3 at T = 16, averaged across all τ, were $\text{CV}(\mathbf{X}(16))=(0.01,0.003,0.02,0.004)^T$. In general, a higher CV indicates that the system is noisier. We chose a new set of parameters for System 3 to give higher CVs: $\mathbf{X}_{\text{new}}(0)=(100,50,200,0)^T$, $\mathbf{k}_{\text{new}}=(10^{-4},0.05,0.07)^T$. The CVs using these parameters were $\text{CV}(\mathbf{X}_{\text{new}}(16))=(0.05,0.03,0.1,0.07)^T$, very different from the original CVs. However, the relative errors (absolute error divided by average SSA state) at τ = 0.1 were very similar: $\text{error}_{\text{rel}}(\mathbf{X}(16))=(0.004,0.0005,0.003,0.001)^T$ for the original system and $\text{error}_{\text{rel}}(\mathbf{X}_{\text{new}}(16))=(0.001,0.0002,0.0007,0.002)^T$ with the new parameters (note that it is not useful to average the errors across all τ). This shows that a higher CV does not necessarily mean higher errors; the two are indicators of different characteristics of the system. We have focused on the errors, as this is the characteristic that we want to improve.
System 4: Twoenzyme mutual inhibition system
System 5: Schlögl system
The gradient of the MPTL mean error seems to change from approximately one (Figure 6) to around 1.5 (Figure 7). It is unlikely that this is due to Monte Carlo error, as the error of the MPTL is high enough that this should not be an issue. In fact, it is probably due to the large volume limit behaviour of the MPTL, discussed after Algorithm 3. Because the mean of the high peak is several times higher than the mean of the low peak, the system is closer to the large volume limit and the weak order of the MPTL increases accordingly. Once in the large volume limit, the gradient is expected to be two.
Relative differences in moments for different splitting values of Schlögl system
Moment         Low peak   High peak
Mean           6.4%       1.6%
Second moment  21.8%      2.5%
Variance       81.6%      50.5%
Higher extrapolation
Monte Carlo error
where $\mathbb{E}_{\infty}(f(\mathbf{x}_n^{\tau}))$ are the theoretical values of the moments calculated from an infinite number of simulations with stepsize τ. The first term is the truncation error of the moments from their analytical solutions, i.e. the bias of the method, which depends only on the choice of timestep. The second term is the Monte Carlo error, which depends only on the number of simulations and is given by $\frac{C}{\sqrt{S}}$, where C is some constant and S the number of simulations. The Monte Carlo error can be so large that it completely overwhelms the bias of the underlying numerical method; in this case all of the numerical results are, in effect, incorrect, as they are random fluctuations.
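The $C/\sqrt{S}$ scaling is easy to demonstrate empirically: the standard deviation of a sample mean should halve when S is quadrupled. The sketch below uses Poisson samples as a stand-in for any stochastic simulation output; the distribution and batch sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

def mc_error(S, reps=200, lam=50.0):
    """Empirical std of the sample mean over `reps` batches of S samples."""
    means = [rng.poisson(lam, S).mean() for _ in range(reps)]
    return np.std(means)

e_small, e_large = mc_error(S=100), mc_error(S=400)
ratio = e_small / e_large   # expected near sqrt(400/100) = 2
```

This is why reducing the total error by brute force is expensive: halving the Monte Carlo error requires four times as many simulations, whereas extrapolation attacks the bias term instead.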
This formulation is useful when the propensity functions are linear. In this case, the moment equations are closed, so $\mathbb{E}_{\infty}(f(\mathbf{x}_n^{\tau}))$ can be calculated for the appropriate numerical method. As an example, consider the mean of the ETL: its true value is given by Eq. (10) and the value of its numerical approximation can be found by iterating Eq. (8). A similar calculation for the second moment can be found in Appendix D. Unfortunately, this is not possible for nonlinear systems, since in that case the equations describing the evolution of the moments are no longer closed[40].
Variance reduction methods, which aim to decrease the Monte Carlo error, are another useful way of reducing computational time: because the Monte Carlo error is lower, fewer simulations need to be run for a given accuracy, saving time. This is an important topic in its own right and we do not address it in this paper; we refer the interested reader to e.g.[41]. It is an active research area: recently, Anderson and Higham[42] were able to significantly reduce the overall computational cost associated with the stochastic simulation of chemical kinetics by extending the idea of multilevel Monte Carlo for SDEs[43, 44].
Conclusions
Processing times of System 1
τ        0.05          0.1                 0.2                 0.4                 0.8
ETL      21.3 (9.2)    13.3 (18.4)         7.63 (37.1)         4.08 (74.7)         2.21 (152.0)
xETL     —             34.3 (0.05)         21 (0.16)           11.8 (0.61)         6.34 (2.6)
xxETL    —             —                   (MC) 42.1 (0.015)   (MC) 25 (0.016)     14 (0.043)
MPTL     21.3 (0.024)  13.4 (0.070)        7.65 (0.25)         4.18 (1.0)          2.24 (4.2)
xMPTL    —             (MC) 34.7 (0.0088)  (MC) 21.1 (0.0082)  (MC) 11.8 (0.0026)  6.38 (0.041)
xxMPTL   —             —                   (MC) 42.4 (0.0089)  (MC) 25.2 (0.0090)  (MC) 14 (0.0088)
TTTL     40.8 (0.017)  21.1 (0.063)        13.3 (0.26)         7.65 (1.1)          4.17 (4.2)
xTTTL    —             (MC) 62.1 (0.0016)  (MC) 34.6 (0.0022)  (MC) 20.9 (0.0044)  11.8 (0.041)
xxTTTL   —             —                   (MC) 75.4 (0.0022)  (MC) 42.2 (0.0031)  (MC) 25.1 (0.011)
Processing times of System 4
τ        0.005        0.010             0.020             0.040             0.08
ETL      191 (16.7)   82.3 (34.5)       46 (76.1)         32.6 (188.5)      19.7 (630.1)
xETL     —            293 (1.1)         123 (7.2)         66.4 (36.3)       35.1 (253.1)
xxETL    —            —                 279 (0.93)        152 (2.5)         80.1 (36.0)
MPTL     165 (1.3)    84 (3.6)          44.9 (15.0)       24.1 (65.5)       12.3 (257.1)
xMPTL    —            244 (1.0)         129 (7.7)         69.1 (35.5)       36.3 (126.2)
xxMPTL   —            —                 (MC) 289 (1.2)    (MC) 153 (1.5)    81.3 (5.3)
TTTL     277 (1.2)    155 (3.5)         82.4 (13.5)       44.6 (58.1)       23.3 (90.0)
xTTTL    —            (MC) 430 (0.41)   (MC) 239 (0.15)   128 (1.4)         68 (107.5)
xxTTTL   —            —                 (MC) 515 (0.45)   (MC) 283 (0.37)   (MC) 151 (16.9)
Monte Carlo error is an unavoidable problem when using stochastic simulations. The statistical fluctuations inherent in stochastic systems can obscure the bias error (i.e. the order of convergence) of the numerical method if their size relative to the bias is large, as the total error is made up of these two contributions. A large number S of simulations must be run, as the Monte Carlo error scales as $1/\sqrt{S}$. This error varies for each system. Figures 9 and 10 (and 11) show this clearly: both of the xxETLs have total errors of similar size for the same τ, but System 2 has relatively low Monte Carlo error, allowing us to see the bias of that system. However, we believe that System 3 has relatively high Monte Carlo error compared to its bias, implying that the xxETL errors we see in the figure are all due to statistical fluctuations. This seems to happen for all five test problems we use. The reason is that the extrapolated methods (and even the MPTL and TTTL, in some cases) have very high accuracy (i.e. low bias error). Since it is only possible to run a limited number of simulations, when the bias is very small the total error will be given almost completely by the contribution from the Monte Carlo error.
A contrasting approach to reducing numerical errors is the multilevel Monte Carlo method. Originally developed for SDEs[43, 44], it has recently been extended to discrete chemical kinetics[42]. By considering a formulation of the total error similar to Eq. (15), the multilevel Monte Carlo method aims to reduce it by decreasing the Monte Carlo error. Here also, many approximate solutions are generated with a variety of different timesteps. By intelligently combining many coarse-grained simulations with a few fine-grained ones, it is possible to achieve a similar level of accuracy to using only fine-grained simulations. In contrast, extrapolation uses the same number of coarse- and fine-scale solutions and gives results which are more accurate than the fine-scale solution, by reducing the bias instead of the Monte Carlo error. In cases where the bias is obscured by statistical errors, using a combination of both extrapolation and the multilevel Monte Carlo method would be ideal, as it would reduce both sources of error. This is an interesting research question and we plan to address it in the future.
In this work, we have extended the extrapolation framework, which can increase the weak order of accuracy of existing numerical methods, to the discrete stochastic regime. To demonstrate the concept, we have applied it to three fixed-step methods: the Euler, midpoint and θ-trapezoidal τ-leap methods. We have thus demonstrated numerically the effectiveness of extrapolation on a range of discrete stochastic numerical methods with different orders of accuracy, for a variety of problems. The main requirement for using extrapolation with a numerical method is the existence of an expression for the global error that relates the error to the stepsize of the method. Analytically, this is all that must be found to show higher weak order convergence of the extrapolated method. To extrapolate once, only the leading error term need be known; further extrapolation requires knowledge of higher terms. We have found the form of the global weak error for a general weak order one method; the procedure is similar for higher-order methods. This is the real power of our approach: it can be applied to any fixed-step numerical method. Moreover, further extrapolations can raise the order of accuracy of the method indefinitely (although beyond a certain point the lower errors will be overtaken by Monte Carlo errors). We expect our method to be useful for more complex biochemical systems, for instance where frequent reactions must be simulated quickly but accuracy is still important.
Appendices
SDE extrapolation
Eq. (17) shows that the Euler-Maruyama and Milstein methods have global weak order one. It is easy to see that extrapolating them leads to solutions with global weak order two.
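This order lift can be verified directly in a case where the weak error is computable in closed form. The sketch below (hypothetical function names, illustrative linear test equation dX = −λX dt + σ dW) exploits the fact that for linear drift the Euler-Maruyama mean obeys an exact deterministic recursion, so the weak (bias) error of the mean is visible without sampling:

```python
import math

def em_mean(x0, lam, T, h):
    """Mean of the Euler-Maruyama approximation of dX = -lam*X dt + sigma*dW.
    Because the drift is linear, the mean satisfies mu_{n+1} = (1 - lam*h)*mu_n
    exactly, independently of the noise, so no Monte Carlo sampling is needed
    to expose the weak error of the mean."""
    n = int(round(T / h))
    return x0 * (1.0 - lam * h) ** n

def extrapolated_em_mean(x0, lam, T, h):
    """One Richardson extrapolation, 2*mu_{h/2} - mu_h, cancels the leading
    O(h) term of the global weak error, leaving an O(h^2) remainder."""
    return 2.0 * em_mean(x0, lam, T, h / 2) - em_mean(x0, lam, T, h)
```

Comparing against the exact mean x0·e^(−λT), halving h roughly quarters the extrapolated error, consistent with global weak order two.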
Discrete stochastic global error expansion
Remark 1
An important element in our derivation of a global error expansion relates to the boundedness of u(x, t) and its discrete derivative. This boundedness is guaranteed [14] when the number of molecules in the chemical system is conserved or decreases with time. Proving this in the general case, where zeroth-order reactions can add molecules to the system, is a non-trivial task. One way around this problem is to set the propensity functions a_{ j }(x) to zero outside a large but finite domain [14]; this is the approach we follow here.
where ψ_{ e }(x, t) depends on the numerical method and is given by (22).
An example explicit calculation of the global error expansion for a linear system
which agrees exactly with what one obtains by calculating Eq. (12) for System 1 (i.e. setting d = 0, W = −κ).
Formulae for the second moment of the Euler τ-leap in the case of linear systems
We can now iterate this formula in order to obtain the numerical approximation for the second moment of the ETL at any timestep.
where B(t) = diag(C μ(t)). By solving Eq. (26) and iterating Eq. (25), we can use the formulation of Eq. (15) to quantify the bias and Monte Carlo errors of the ETL. A similar approach can also be used for the MPTL and TTTL.
List of abbreviations
ODE: Ordinary differential equation
SDE: Stochastic differential equation
CME: Chemical Master Equation
SSA: Stochastic simulation algorithm
ETL: Euler τ-leap
xETL: Extrapolated Euler τ-leap
xxETL: Double-extrapolated Euler τ-leap
MPTL, xMPTL, xxMPTL: Midpoint τ-leap and its extrapolated and double-extrapolated versions
TTTL, xTTTL, xxTTTL: θ-trapezoidal τ-leap and its extrapolated and double-extrapolated versions
CV: Coefficient of variation
Declarations
Acknowledgements
TSz is supported by the Engineering and Physical Sciences Research Council through the Systems Biology Doctoral Training Centre, University of Oxford. This publication was based on work supported in part by Award No. KUK-C1-013-04, made by King Abdullah University of Science and Technology (KAUST). The research leading to these results has received funding from the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement No. 239870. RE would also like to thank Somerville College, University of Oxford, for a Fulford Junior Research Fellowship; Brasenose College, University of Oxford, for a Nicholas Kurti Junior Fellowship; the Royal Society for a University Research Fellowship; and the Leverhulme Trust for a Philip Leverhulme Prize. TSz would like to thank Manuel Barrio for his discussions and help with the simulations. KCZ would like to thank James Lottes for his invaluable comments regarding the global error expansion.
Authors’ Affiliations
References
McAdams HH, Arkin A: It's a noisy business! Genetic regulation at the nanomolar scale. Trends Genet 1999, 15: 65-69. 10.1016/S0168-9525(98)01659-X
Elowitz M, Levine A, Siggia E, Swain P: Stochastic gene expression in a single cell. Science 2002, 297: 1183-1186. 10.1126/science.1070919
Fedoroff N, Fontana W: Small numbers of big molecules. Science 2002, 297: 1129-1131. 10.1126/science.1075988
Kaern M, Elston T, Blake W, Collins J: Stochasticity in gene expression: from theories to phenotypes. Nature Rev Genet 2005, 6: 451-464. 10.1038/nrg1615
Wilkinson DJ: Stochastic modelling for quantitative description of heterogeneous biological systems. Nature Rev Genet 2009, 10: 122-133. 10.1038/nrg2509
Erban R, Chapman SJ, Kevrekidis I, Vejchodsky T: Analysis of a stochastic chemical system close to a SNIPER bifurcation of its mean-field model. SIAM J Appl Mathematics 2009, 70: 984-1016. 10.1137/080731360
McQuarrie DA: Stochastic approach to chemical kinetics. J Appl Probability 1967, 4: 413-478. 10.2307/3212214
Gillespie DT: A rigorous derivation of the Chemical Master Equation. Physica A 1992, 188: 404-425. 10.1016/0378-4371(92)90283-V
Gillespie DT: Exact stochastic simulation of coupled chemical reactions. J Phys Chem 1977, 81: 2340-2361. 10.1021/j100540a008
Gillespie DT: Approximate accelerated stochastic simulation of chemically reacting systems. J Chem Phys 2001, 115: 1716-1733. 10.1063/1.1378322
Gillespie DT, Petzold LR: Improved leap-size selection for accelerated stochastic simulation. J Chem Phys 2003, 119: 8229-8234. 10.1063/1.1613254
Cao Y, Gillespie DT, Petzold LR: Efficient step size selection for the tau-leaping simulation method. J Chem Phys 2006, 124: 044109. 10.1063/1.2159468
Rathinam M, Petzold LR, Cao Y, Gillespie DT: Consistency and stability of tau-leaping schemes for chemical reaction systems. Multiscale Model Simul 2005, 4: 867-895. 10.1137/040603206
Li T: Analysis of explicit tau-leaping schemes for simulating chemically reacting systems. Multiscale Model Simul 2007, 6: 417-436. 10.1137/06066792X
Anderson DF, Ganguly A, Kurtz TG: Error analysis of tau-leap simulation methods. Ann Appl Probability 2011, 21: 2226-2262. 10.1214/10-AAP756
Hu Y, Li T, Min B: The weak convergence analysis of tau-leaping methods: revisited. Commun Math Sci, in press.
Tian TH, Burrage K: Binomial leap methods for simulating stochastic chemical kinetics. J Chem Phys 2004, 121: 10356-10364. 10.1063/1.1810475
Hu Y, Li T: Highly accurate tau-leaping methods with random corrections. J Chem Phys 2009, 130: 124109. 10.1063/1.3091269
Anderson DF, Koyama M: Weak error analysis of numerical methods for stochastic models of population processes. 2011.
Hu Y, Li T, Min B: A weak second order tau-leaping method for chemical kinetic systems. J Chem Phys 2011, 135: 024113. 10.1063/1.3609119
Anderson DF, Mattingly JC: A weak trapezoidal method for a class of stochastic differential equations. Commun Math Sci 2011, 9: 301-318.
Hairer E, Nørsett SP, Wanner G: Solving ordinary differential equations: Nonstiff problems. Berlin: Springer; 1993.
Talay D, Tubaro L: Expansion of the global error for numerical schemes solving stochastic differential equations. Stochastic Anal Appl 1990, 8: 483-509. 10.1080/07362999008809220
Rué P, Villa-Freixà J, Burrage K: Simulation methods with extended stability for stiff biochemical kinetics. BMC Sys Biol 2010, 4: 110. 10.1186/1752-0509-4-110
Gibson M, Bruck J: Efficient exact stochastic simulation of chemical systems with many species and many channels. J Phys Chem A 2000, 104: 1876-1889. 10.1021/jp993732q
Cao Y, Li H, Petzold LR: Efficient formulation of the stochastic simulation algorithm for chemically reacting systems. J Chem Phys 2004, 121: 4059-4067. 10.1063/1.1778376
Chatterjee A, Vlachos DG, Katsoulakis MA: Binomial distribution based tau-leap accelerated stochastic simulation. J Chem Phys 2005, 122: 024112. 10.1063/1.1833357
Cao Y, Gillespie DT, Petzold LR: Avoiding negative populations in explicit Poisson tau-leaping. J Chem Phys 2005, 123: 054104. 10.1063/1.1992473
Yates CA, Burrage K: Look before you leap: A confidence-based method for selecting species criticality while avoiding negative populations in tau-leaping. J Chem Phys 2011, 134: 084109. 10.1063/1.3554385
Liu XQ, Li CW: Weak approximation and extrapolations of stochastic differential equations with jumps. SIAM J Numer Anal 2000, 37: 1747-1767. 10.1137/S0036142998344512
Shohat J, Tamarkin JD: The Problem of Moments. New York: American Mathematical Society; 1943.
Jahnke T, Huisinga W: Solving the Chemical Master Equation for monomolecular reaction systems analytically. J Math Biol 2007, 54: 1-26.
Gadgil C, Lee C, Othmer H: A stochastic analysis of first-order reaction networks. Bull Math Biol 2005, 67: 901-946. 10.1016/j.bulm.2004.09.009
van Kampen NG: Stochastic Processes in Physics and Chemistry. Amsterdam: Elsevier; 2007.
Shahrezaei V, Swain PS: Analytical distributions for stochastic gene expression. Proc Nat Acad Sci USA 2008, 105: 17256-17261. 10.1073/pnas.0803850105
Marquez-Lago T, Stelling J: Counterintuitive stochastic behavior of simple gene circuits with negative feedbacks. Biophys J 2010, 98: 1742-1750. 10.1016/j.bpj.2010.01.018
Marquez-Lago T, Burrage K: Binomial tau-leap spatial stochastic simulation algorithm for applications in chemical kinetics. J Chem Phys 2007, 127: 104101. 10.1063/1.2771548
Schlögl F: Chemical reaction models for non-equilibrium phase transitions. Z Phys 1972, 253: 147-161. 10.1007/BF01379769
MacNamara S, Burrage K, Sidje RB: Multiscale modeling of chemical kinetics via the master equation. Multiscale Model Simul 2007, 6: 1146-1168.
Mélykúti B, Burrage K, Zygalakis KC: Fast stochastic simulation of biochemical reaction systems by alternative formulations of the chemical Langevin equation. J Chem Phys 2010, 132: 164109. 10.1063/1.3380661
Liu JS: Monte Carlo strategies in scientific computing. New York: Springer; 2001.
Anderson DF, Higham DJ: Multilevel Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics. SIAM Multiscale Model Simul 2012, 10: 146-179. 10.1137/110840546
Kebaier A: Statistical Romberg extrapolation: A new variance reduction method and applications to option pricing. Ann Appl Probability 2005, 15: 2681-2705. 10.1214/105051605000000511
Giles MB: Multilevel Monte Carlo path simulation. Operations Res 2008, 56: 607-617. 10.1287/opre.1070.0496
Rößler A: Stochastic Taylor expansions for the expectation of functionals of diffusion processes. Stochastic Anal App 2004, 22: 1553-1576.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.