Experimental evidence at the single-cell level suggest that random fluctuations at the microscopic scale can have an important impact in determining the complex behaviour of living organisms [1, 2]. These findings have raised significant interest towards discrete stochastic (DS) models of biological systems as a tool for bridging the gap between stochastic events arising at the molecular level and the corresponding macroscopic phenomena. In DS models, the system evolves according to a stochastic algorithm which samples the probability of the next state transition from a given probability density function (PDF). Biochemical reactions, in particular, are often modelled as discrete-state, continuous-time Markov Processes (CTMP). This method represents an alternative to the traditional continuous deterministic modelling (CDM) approach when random fluctuations must be properly taken into account. This is the case, for instance, of systems composed of a small number of elements like molecular subsystems in living cells (e.g., metabolic networks, signalling pathways, or gene regulatory networks). In this context, descriptions provided by reaction rate equations fail both to predict the fluctuations in molecular populations and to capture stochastic-driven phenomena such as Stochastic Focusing [3], Stochastic Switching [4] and Multiplicative Noise Effects [5]. The Gillespie's Stochastic Simulation Algorithm (GSSA) [6, 7], based on [8] and [9], is probably the most popular algorithm used for simulating DS models of (bio)chemical systems. The GSSA relies on a Monte Carlo technique, namely the Inverse Transform Sampling (ITS). GSSA is used to numerically simulate the Markov process described analytically by the set of differential equations from the Chemical Master Equation (CME). Since the CME can rarely be solved either numerically or analytically especially for large systems, GSSA provides a computational method to generate statistically correct trajectories (possible solutions) of the CME. These trajectories are obtained by drawing random samples from the so-called Reaction Probability Density Function (RPDF) [6, 7] through the ITS.
Gillespie's SSA is designed for simulating sets of elementary chemical reactions occurring in a well-stirred mixture in a fixed volume and at a constant temperature [10]. Thus, the straightforward application of GSSA to a biochemical context can be difficult. Indeed, it is easy to realise the difficulties in specifying all the elementary reactions composing a given biochemical system, mainly because of the lack of needed information. Indeed, only macroscopic and mesoscopic events can be observed experimentally and, thus, it is not possible to know the complete list of elementary reactions. The usual strategy for tackling this problem consists of abstracting away the non-observable elementary steps and replacing them with a single reaction event rendered as a "Markov jump" with the transition time (τ ) sampled from a negative exponential distribution. Even though heuristic, this strategy can lead to simulation results showing a good agreement with experimental data (see e.g., [11–13]).
However, the impact of the approximations introduced by this abstraction process is difficult to evaluate or estimate, as reported by Gillespie in [13] for enzymatically catalysed reactions. One crucial issue of this abstraction-approximation strategy is related to the modelling of the time needed for a reaction to occur: even though each of the elementary reactions composing a biochemical system can be described as a CTMP (and, thus, with transition times probabilities following a negative exponential PDF), the waiting times between two subsequent events observed at a mesoscopic or macroscopic scale can be non-exponential, as reported, e.g., in [14, 15] and confirmed by experimental evidence [16, 17]. It can be demonstrated (see e.g., [18] that a stochastic process exhibiting nonexponential waiting times does not enjoy necessarily the so-called Markov Property (a.k.a. memoryless property) and thus, strictly speaking, it cannot be a Markov process. These arguments suggest the need for modelling frameworks that allow us to deal with a more general notion of waiting times, thus managing non-memoryless (non-Markovian) systems' evolutions. In terms of waiting times, this corresponds to considering frameworks in which the PDF describing transition times can be different from a negative exponential.
Various proposals have been developed for addressing the aforementioned issues. BioPEPAd [15] offers the possibility of adding deterministic delays to the duration of a reaction. In [19] an extension for process calculi is proposed, allowing the expression of activity durations through general probability distributions. A similar approach is proposed in [20] for extending Petri Nets. Furthermore, in [14] the authors improved the Beta Workbench (BWB) framework with the possibility of sampling waiting times from PDFs that are different from the negative exponential, such as the Erlang or Hyperexponential. This strategy results in better matches with the observed non-Markovian biological behaviours. However, even though the data-fitting can enhance results, the considered PDFs do not exactly match the experimental evidence. Hence, they may themselves introduce unpredictable approximations.
In this paper, we propose a Constraint Programming approach that is suited for being embedded in Monte Carlo algorithms for discrete-state continuous-time stochastic simulation of biochemical reactions. Our method allows us to sample random numbers from PDFs that may not necessarily follow a negative exponential distribution including, e.g. the Erlang and Hyperexponential distributions considered in [14].
Relying also on the results about single-molecule enzyme reactions presented in [21], we exploit our method for efficiently and accurately simulating the occurrence of enzyme catalysed reactions that follows the Michaelis-Menten (MM) scheme. Noticeably, our approach allows us to simulate this kind of reactions as a single S→ P (Substrate → Product) step without any loss of accuracy, i.e., without introducing approximations.
The contribution of this paper is hence twofold: on the one hand we propose a general method for building discrete-state continuous-time stochastic simulations of (bio)chemical systems, sampling waiting times from general PDFs distributions; on the other hand, exploiting our method, we provide an efficient and accurate strategy for the stochastic simulation of MM enzyme reactions. In the latter case, we provide a simulation technique that overcomes some of the limitations of the corresponding GSSA approach.
Our method is implemented on top of the Gecode library (http://www.gecode.org), an efficient framework to solve constraint satisfaction problems [22, 23]. Essentially, we use the real intervals constraint support of Gecode for solving sets of interval constraints. Although specific numerical methods may be computationally more efficient, declarative methods as Constraint Programming provide a flexible framework able to quickly adapt to different situations, as we explain later. Hence, the modellers have at their disposal a more general (software) tool that minimises the need for writing new code when the parameters of the model change.
We will show the features of our method through its application tot the simulation of enzyme reactions following the MM scheme. To do this, we first introduce in the following subsection some details regarding the MM reaction scheme and the results about single-molecule enzyme reactions obtained in [21].
Simulating Single-Molecule Enzyme Reactions
A convenient (and popular) way of describing all the elementary steps of an enzymatically catalysed reaction is through the Michaelis-Menten (MM) model. According to this reaction scheme, a catalysed reaction of the kind S→ P (where S and P are the substrate and product molecules respectively, while E represents the enzyme molecule and ES the enzyme-substrate complex) can be approximated with three "elementary" reactions:
(1)
The reaction set (1) can be simulated as a CTMP via GSSA using k1, k−1 and k2 as "elementary" rate constants. It is described as a single Markov step and, thus, the corresponding transition time follows a negative exponential PDF. The propensity is calculated through Equation (2), where v is the rate of product formation, [S] the substrate concentration and K
M
the MM constant (see, e.g., [13]).
MM kinetics is particularly convenient because it effectively reduces the three reactions in Equation (1) to a single reaction S → P with rate v. Moreover, the necessary parameters are easier to measure experimentally than the rate constants ki.
However, modelling enzymatic reactions with the MM kinetics introduces approximations at two different levels. The first level of approximation is inherent to the MM kinetics itself. Indeed, it should be noted that Equation (2) does not capture the dynamics of reaction set (1) exactly, being based on assumptions (e.g., the steady-state assumption [24]) that are approximately valid. The steady-state approximation is the first important assumption involved in the MM reaction scheme. According to this approximation, the concentration of the ES complex will rapidly reach the steady-state, i.e., after an initial burst phase, the concentration of ES will not change appreciably until a significant amount of substrate is consumed [25]. In in vivo experiments this assumption may not necessarily hold because the enzyme concentrations can be comparable to the substrate concentrations [26]. The steadystate approximation is well studied in the deterministic setting, where the reaction set (1) is described through a set of ordinary differential equations (ODEs).
Building a stochastic model à la Gillespie of the reaction scheme (1), relying upon MM kinetics, requires the conversion of an ODE model into a stochastic model. As previously pointed out, this is straightforward if the ODEs describe elementary reactions. Some authors (see e.g., [12]) compared the output of a stochastic model that uses MM kinetics and the corresponding model decomposed into the three "elementary" reactions in (1), and no significant differences in simulation results were found. The work in [11] verified the equivalence of the deterministic and the stochastic MM approximations under a restricted set of initial conditions. In spite of these results, however, there is no general (theoretical) method for converting MM terms from the deterministic to the stochastic setting. This problem has been exhaustively studied in [13], where it is shown that a "full" stochastic model of the reaction set in Equation (1), describing explicitly the three reactions, can be safely reduced to one S→ P reaction with stochastic rate v under certain conditions, namely, after the pre-steady-state transient period.
The second level of approximation regards the Markovian modelling paradigm. Modelling the reaction set in Equation (1) as a single Markov jump defined by the lumped reaction S→ P with rate v, implies subsuming that the process satisfies the Markov (or memoryless) property: the probability of transition from the current state to one of the possible successors depends only on the current state. This, in turn, means that the PDF describing the waiting time τ must be a negative exponential function. As showed by single-molecule experiments [17, 21], the PDF describing the occurrence of τ is significantly different from an exponential. In other words, experimental evidence suggests that the occurrence of an enzymatically catalysed reaction is not a Markov process. The problem of evaluating the approximation introduced by Markovian models of natural systems is well known in physics and in some cases can be quantified [27].
In the cases that have been studied, the waiting time τ for a biochemical reaction catalysed by a single enzyme results to be distributed according to the following PDF:
(3)
where [S] is the substrate concentration, k1, k−1, k2 and are the kinetic constants (see Equation (1)) and
We note that and . Differently from the classical ensemble experiments (in which the variations of reactive species concentrations are measured in solutions containing "large" amounts of molecules), a single-molecule experiment records the stochastic time trace of repetitive reactions of an individual enzyme molecule. In particular (see [21] and [17] for details) the measured quantities regard the duration of the waiting time that passes between a reaction and the following one. Therefore, in the absence of dynamic disorder, f (τ ) describes the temporal behaviour of the single-molecule MM system in Equation (3) at any specified substrate concentration. This PDF is exact and does not invoke the steady-state assumption, but it can be reduced to the steady-state case [17].
Note that Equation (3) is very different from the negative exponential PDF for waiting times proposed by Gillespie. This suggests that GSSA may be not adequate to simulate biochemical reactions catalysed by enzymes. Another remarkable fact to notice is that the reciprocal of the mean waiting time obeys a MM-type equation (called the Single-Molecule-Michaelis-Menten-Equation):
(4)
Figure 1 (right) shows a plot of [S] vs . Note that this plot exhibits the characteristic hyperbolic profile of the classical MM saturation curve. The reader may wish to compare it with Figure 1 on the left, which is a plot of [S] vs v drawn from the Equation (2), that refers to the MM model for enzymatically catalysed reactions in Equation (1). The fact that the fraction calculated from Equation (4) exactly coincides with Equation (2) highlights the ergodicity of the process, i.e., the consistency between the single-molecule and the ensemble-averaged kinetics. This issue is also evident by the fact that the same constants (k1, k−1, k2 and K
M
) appear both in Equation (2) and Equation (4). This correspondence is useful for computer-based stochastic simulations of single-molecule reactions because it allows the safe use of the constants k1, k−1, k2 and K
M
measured in the "ensemble" experiments.
In the next section, we show how our technique allows us to build a Monte Carlo-based algorithm for simulating the occurrence of (networks of) single-molecule enzymatic reactions, by sampling waiting times from Equation (3) through the ITS method.