From qualitative data to quantitative models: analysis of the phage shock protein stress response in Escherichia coli
 Tina Toni^{1, 2, 4}Email author,
 Goran Jovanovic^{3},
 Maxime Huvet^{1},
 Martin Buck^{3} and
 Michael PH Stumpf^{1, 2}Email author
https://doi.org/10.1186/17520509569
© Toni et al; licensee BioMed Central Ltd. 2011
Received: 19 April 2010
Accepted: 12 May 2011
Published: 12 May 2011
Abstract
Background
Bacteria have evolved a rich set of mechanisms for sensing and adapting to adverse conditions in their environment. These are crucial for their survival, which requires them to react to extracellular stresses such as heat shock, ethanol treatment or phage infection. Here we focus on studying the phage shock protein (Psp) stress response in Escherichia coli induced by a phage infection or other damage to the bacterial membrane. This system has not yet been theoretically modelled or analysed in silico.
Results
We develop a model of the Psp response system, and illustrate how such models can be constructed and analyzed in light of available sparse and qualitative information in order to generate novel biological hypotheses about their dynamical behaviour. We analyze this model using tools from Petrinet theory and study its dynamical range that is consistent with currently available knowledge by conditioning model parameters on the available data in an approximate Bayesian computation (ABC) framework. Within this ABC approach we analyze stochastic and deterministic dynamics. This analysis allows us to identify different types of behaviour and these mechanistic insights can in turn be used to design new, more detailed and timeresolved experiments.
Conclusions
We have developed the first mechanistic model of the Psp response in E. coli. This model allows us to predict the possible qualitative stochastic and deterministic dynamic behaviours of key molecular players in the stress response. Our inferential approach can be applied to stress response and signalling systems more generally: in the ABC framework we can condition mathematical models on qualitative data in order to delimit e.g. parameter ranges or the qualitative system dynamics in light of available endpoint or qualitative information.
Keywords
Background
Bacteria have evolved diverse mechanisms for sensing and adapting to adverse conditions in their environment [1, 2]. These stress response mechanisms have been extensively studied for decades due to their biomedical importance (e.g. development of antibiotic therapies). With the advent of molecular biology technologies it is now possible to study biochemical and molecular mechanisms underlying stress response signalling. However, due to the complexity of these pathways, the development of theoretical models is important in order to comprehend better the underlying biological mechanisms. Models can be especially useful when a system under study involves a large number of components and is too complex to comprehend intuitively.
Unfortunately, however, suitable models are few and far between. For most systems we lack reliable and useful mechanistic models; this even includes systems that have been attracting considerable attention from biologists and biochemists, and for which substantial amounts of data have been generated. The phage shock protein (Psp) response [3] in bacteria  in particular in Escherichia coli  is one such system. We know much about the constituent players in this stress response and have a basic understanding of their function and evolution [4]. But so far we lack models that would allow for more detailed quantitative, computational or mathematical analysis of this system.
The Psp system allows E. coli to respond to filamentous phage infection and some other adverse extracellular conditions, which can damage the cellular membrane. The stress signal is transduced through conformational changes that alter proteinprotein interactions of specific Psp membrane proteins, which mediate the release of a crucial transcription factor. This transcription factor then triggers the transcription of seven psp genes that activate and modulate the physiological response to stress, which includes membrane repair, reduced motility and finetuning of respiration.
The motivation for the research presented in this manuscript is twofold: (i) we want to construct and analyze a mechanstic mathematical model for the Psp stress response system; (ii) we will develop and illustrate a general theoretical framework that can be employed to make use of qualitative, semiquantitative or quantitative data and knowledge about biological systems in order to develop useful explanatory and predictive mathematical models of biological systems.
Our modelling strategy is guided by the following questions: can we reverseengineer a dynamical model for the Psp response system based on limited qualitative data? How much does this information allow us to delimit the ranges of e.g. kinetic reaction rates of such models? We take a twostep approach: we will first subsume all the available information into a Petri net framework and undertake a structural analysis of the model. We then study the dynamics of the model in stochastic and deterministic frameworks. Since parameter values are unknown, we employ an approximate Bayesian computation (ABC) method based on a sequential Monte Carlo (SMC) framework [5] in order to fit the model to the known facts. This allows us to predict what type of dynamic behaviour we may expect to see in timecourse experiments.
As we will show in the context of the Psp response in E. coli, such an approach can result in nontrivial insights into the system's dynamics. In particular, we will compare the outputs of analyses assuming stochastic and deterministic models, and show that some aspects of the system, such as qualitative dynamic behaviour and some parameters can already be constrained by using the limited information available. More generally, we will discuss how this procedure can be used in the reverseengineering of biological systems.
Introduction to the biology of phage shock protein response
Depending on the type of changing environmental conditions, bacteria can employ different stress response mechanisms. Some of the well studied stress response systems are the RpoE and Cpx extracytoplasmic systems [2] and the heatshock response [6]. The stress response systems that respond to alterations in the bacterial cell envelope are collectively known as extracytoplasmic or envelope stress responses. Recently a wealth of information has been obtained about the Psp response system [3, 4, 7, 8], which also belongs to this set of responses.
The Psp response was first observed during the filamentous phage infection of a bacterial cell [9]. It can also be invoked as a result of extreme temperature, osmolarity, mislocalization of envelope proteins called secretins, increase in ethanol concentration and presence of proton ionophores. These conditions damage the bacterial cell envelope, which serves as an ionpermeability barrier for the establishment of the proton motive force (pmf). The pmf is a result of an electrochemical gradient, which is caused by a charge difference due to active pumping of hydrogen ions across the membrane. When the cell envelope is adversely affected, and the Psp response cannot be established, this proton motive force dissipates. A physical change in the membrane and/or an associated biochemical change leads to switching on the cell's stress response. The induction of stress results in increased expression levels of the psp genes. The Psp response has been extensively studied in Escherichia coli[3, 7, 8, 10–13] and a short overview is given in the following paragraphs.
Under nostress conditions the protein PspA binds to PspF, which inhibits the ATPase activity of PspF, resulting in basal transcription of pspAE and pspG (Figure 2) [10]. Under Psp inducing conditions (i.e. when stress is present), the stimulus is converted into a signal, which is transduced apparently independently through the transmembrane ArcB sensor kinase, and proteins PspB and PspC. ArcA, a cytosolic cognate response regulator complementing ArcB, plays a role in signal amplification. Through PspB and PspC the signal disrupts the PspAPspF interaction and allows PspF to activate the transcription. The roles of PspD, PspE and PspG in Psp activation, transduction, transcriptional regulation or membrane repair are not yet fully understood.
The activation of transcription results in the increase in concentration of several Psp proteins. PspA, PspD and PspG play a major role in switching the cell to anaerobic respiration and fermentation, while PspA also binds to the inner membrane phospholipids, repairs the membrane damage and prevents further proton leakage. PspD is also involved in repair of the cell envelope, while PspG play a major role in ne tuning the cell metabolism towards anaerobic respiration and fermentation. Moreover, when overproduced, they all (PspA, D and G) downregulate cell motility, which in turn downregulates the pmf consumption and maintains energy usage. Although the PspF regulon and regulation of psp genes have been extensively studied, many open questions remain about the kinetics of signal transduction, the function of Psp proteins, and physiological responses. In particular, how does the response evolve over time? How quickly do cells respond to stress when it is induced, and how quickly does the membrane get repaired? Finally, how does the system respond to dissipation or removal of the stress?
Such behaviour is the result of a complex network of interactions, and interplay between the conformational changes of proteins, transcriptional activation and effector activities in the Psp system. All these mechanisms also depend on kinetic rates, which at present are unknown. The system has not yet been theoretically modelled or analysed in silico. However, we feel that this rich behaviour cannot be understood using verbal or reductionist models alone. Here we propose to address these questions with the help of mechanistic mathematical models of the system's response. We use inferential techniques to develop mathematical descriptions of a mechanistic model of the Psp response system, analyze these models, and interpret the biological implications of this analysis.
Results and Discussion
A mechanistic model of the Psp system
Biological systems are complex and assumptions need to be made and justified whenever building a model to describe their behaviour. It requires biological knowledge, intuition and mathematical skill to develop suitable models that make the right and necessary assumptions in order to simplify the model, while still incorporating all the key players and capturing the necessary level of complexity. Below we first frame our model in the context of a Petri net framework [15–17], which for the present purpose has the benefit of offering a convenient graphical representation that is readily translated into other modelling and simulation schemes. We will make use of some of the specific Petri net tools to check this model, but use ODEs and stochastic processes in order to study the dynamics of these mechanistic models.
In order to build a simple model of the Psp response system, we first need to make some assumptions. In particular, we need to decide which of the molecular species and numerous pieces of biological information have to be included in the Psp model to capture the basic stress response dynamics. Since the proteins PspD, PspE and PspG are only involved in the physiological response and their regulatory role is currently not known, we only include proteins PspA, PspB, PspC and PspF in our simplified model. Moreover, we model proteins PspB and PspC as a complex (BC). Proteins ArcA and ArcB play a role in amplifying the signal, but are not necessary for capturing the basic stress response dynamics [8]; we only treat them as an intermediate in passing the signal from the damaged membrane to elicit the change in conformation of PspB and PspC proteins, and will therefore not include them explicitly in the model here.
One of the consequences of membrane damage is dissipation of the proton motive force, which is believed to trigger the conformational changes of proteins PspB, PspC, and presumably PspA as well; in our model this corresponds to complexes BCA turning into B_{ c }C_{ c }A_{ c } (eqn. 1k). The other consequence of the damaged membrane is that the complex BCAF breaks into two parts (eqn. 1i): the first part is PspF, which is then free to form hexamers and acts as a transcription factor (TF) (eqn. 1c), and the second part is conformationally changed, B_{ c }C_{ c }A_{ c } .
The transcription factor TF activates the production of PspA, PspB and PspC proteins (eqn. 1e). The ratio of mRNA production of PspA, PspB and PspC has been experimentally measured as 100:60:40 [7]. Because we model PspB and PspC as a complex, we assume that the same number of both mRNAs is produced; we take this number to be 60 (but could have chosen e.g. 40 as well). Moreover, we assume that the protein numbers mimic this ratio. A fraction of PspA proteins forms a complex with BC (eqn. 1g), while the other part forms oligomers (olg) by binding of 36 PspA molecules into a complex (eqn. 1f). These oligomers act as effectors and reestablish pmf; we model this by repairing the damaged membrane parts (eqn. 1b). When the membrane is not damaged, proteins PspB and PspC change their conformation back into their native state (eqn. 1j).
When building this model we had to make some further assumptions. Once PspA is in the complex with PspB and PspC, it cannot be used anymore as an effector, i.e. PspA is never released from the complex. Only the newly transcribed PspA can form oligomers which act as effectors to repair the membrane. We also assume that there is no threshold level in terms of proportion of the membrane that needs to be damaged in order to pass the signal on, i.e. we simply assume that the signal is stronger if a larger proportion of the membrane is damaged (i.e. when there are more tokens in the dm state), and weaker if a lower proportion of membrane is damaged. This is incorporated into the model through markingdependent rates; for example, the rate of a BCAF breakdown (eqn. 1i), and the rate of BCA conformational change (eqn. 1k) will be proportional to how much of the membrane is damaged. Another assumption, which is in line with experimental evidence, is that the number of PspF proteins and related constructs (the sum of F, TF and BCAF) is constant in cells, and we therefore incorporate this assumption by excluding production and degradation of PspF from the model. However, we do model production and degradation of the other molecular species (eqns. 1l1p).
To complete the definition of a Petri net, we need to define the initial markings. This has to be done with care, as a badly chosen initial marking can result in so called "deadlocks", i.e. when none of the transitions can be fired anymore. A transition is said to be "dead" if it can never fire in any firing sequence. A property related to the absence of deadlocks is liveness, and different levels of liveness exist [18]. A Petri net is L1live if all transitions can be red at least once in some firing sequence. This property is, for example, satisfied by the following initial marking: M_{0} = (stress; dm, im, olg, hBCA, hB_{ c }C_{ c }A_{ c }, hBCAF, TF) = (1, 0, 100, 0, 0, 0, 20, 0). That is, we start with the stress turned on, the whole membrane in the intact state, and all Psp proteins present in the system bound in the complex hBCAF. There are no oligomers, hBCA or hB_{ c }C_{ c }A_{ c } complexes in the system, and no transcription factors TF available at the start of the simulation. The possible markings are: stress ∈ {0, 1}; dm; im ∈ {1, 2, ..., 100}, im = 1  dm; olg, hBCA, ; hBCAF, TF ∈ {1, ... 20}. The marking of a place dm can be interpreted as the percentage of membrane damage.
with (y_{1}, y_{2}, y_{3}, y_{4}, y_{5}, y_{6}, y_{7}, y_{8}) = (stress, dm, im, olg, hBCA, hB_{ c }C_{ c }A_{ c }, hBCAF, TF ), y_{1} ∈ {0, 1} and the initial condition y = (1, 0, 100, 0, 0, 0, 20α, 0), , and is an indicator function.
Petri net markings form a discrete space (i.e. modelling the numbers of the molecules), while the ODE model variables y_{ i } , i = 4, ..., 8 represent concentrations of molecules. Variables y_{1}, y_{2} and y_{3} are exceptions in that they do not represent molecules but the stress conditions, y_{1} ∈ {0, 1} and the state of the membrane, 0 ≤ y_{2} ≤100, y_{3} = 100  y_{2}. The relationship between the number of molecules in the Petri net and the concentrations in the ODE model is the following: for a concentration y_{ i } (units M/l) in a volume of V litres, there are M_{ i } = n_{ A }y_{ i }V molecules, where n_{ A } ≈ 6.02 × 10^{23} is Avogadro's constant, which represents the number of molecules in a mole [21].
measured in moles per litre. This applies to the molecular species olg, hBCA, hB_{ c }C_{ c }A_{ c } , hBCAF, TF.
This conversion rule does obviously not apply to the stress condition, which is either on or off (i.e. 1 or 0), and to the percentages of damaged and intact membrane. For the first order reactions (i.e. of type X → Y) the relationship between the stochastic rate c and the deterministic rate k is c = k. In our Petri net this rule applies for transitions tr_{2}, tr_{3} and tr_{5}  tr_{10}. For the second order reactions (i.e. of type X + Y → Z) this relationship becomes , which applies to transition tr_{4}, and a zeroth order reaction's (i.e. of type, ∅ → X) stochastic rate is c = kn_{ A }V , which we use for transition tr_{1}.
Model validation and calibration
Employing Petri net terminology we have developed a simple mechanistic model, which summarizes our current knowledge of the phage shock protein response system [8]. We now combine discrete Petri net structural analysis, and stochastic and deterministic simulation and analysis of the model [19]. The classical discrete Petri net theory offers several theoretical tools to analyse structural properties of the Petri net, which are useful for qualitative validation of the model. To validate the basic model structure, we calculate the structural invariants (we explain the meaning of these variants later) and calibrate the dynamic model against qualitative data. This fitting process also provides us with parameter estimates.
and its transpose (see Methods section for definitions of these terms). A Pinvariant is a nonzero vector y that solves Ay = 0, and a Tinvariant is a nonzero, nonnegative vector, x, that solves A^{ T } x = 0.
Pinvariants correspond to conservation laws of the network, while Tinvariants represent the sequence of transitions that lead back to the initial marking [21]. P and Tinvariants can be used to check the model for consistency, and to test the basic correctness of its biological interpretation [23].
Pinvariants of the simplified Petri net Psp model
Stress  dm  im  olg  hBCA  hB _{ c } C _{ c } A _{ c }  hBCAF  TF 

1  0  0  0  0  0  0  0 
0  1  1  0  0  0  0  0 
0  0  0  0  0  0  1  1 
Tinvariants of the simplified Petri net Psp model
tr _{1}  tr _{2}  tr _{3}  tr _{4}  tr _{5}  tr _{6}  tr _{7}  tr _{8}  tr _{9}  tr _{10} 

1  1  0  0  0  0  0  0  0  0 
0  0  0  0  0  1  1  0  0  0 
0  0  1  0  0  0  0  10  0  1 
0  0  0  1  1  1  0  0  0  0 
0  0  1  0  0  0  10  0  10  1 
0  0  1  10  10  0  0  0  10  1 

the membrane gets damaged and then repaired; (tr_{1}, tr_{2}).

proteins PspB and PspC change conformation, and then return to back to the original state, (tr_{6}, tr_{7}).

transcription and translation of new PspA, PspB and PspC proteins and their complexes, and their subsequent degradation; (tr_{3}, 10 tr_{8}, tr_{10}), (tr_{3}, 10 tr_{7}, 10 tr_{9}, tr_{10}).

binding of protein PspF to the complex of PspA, PspB and PspC, and subsequent breakup of the complex; (tr_{4}, tr_{5}, tr_{6}).

transcription and translation of new PspA, PspB and PspC proteins, formation of a complex between PspA, PspB, PspC and PspF proteins, the breakup of this complex and protein degradation; (tr_{3}, 10 tr_{4}, 10 tr_{5}, 10 tr_{9}, tr_{10}).
All these invariants are biologically sound (and may also be deduced by inspection of the model). While the basic system behaviour is determined by the minimal Tinvariants, the linear combinations of these invariants describe all possible behaviours of the system. The results here agree well with the P and Tinvariants of the full model in Figure 3, which are given in [Additional file 1].
Having obtained some level of support for the model structure, we next study its dynamics. We are particularly interested in the dynamics after the induction of stress, as well as the dynamics following the subsequent removal of stress (which is experimentally challenging). Despite the fact that many aspects and the molecular players involved in the Psp system have been studied in detail, not much is known about its temporal behaviour. We know that upon the induction of stress, most of the Psp protein levels rise and that the complex between PspA and PspF (hBCAF in our model) is likely to be broken down. However, the time course dynamics or kinetic rates (e.g. production and degradation rates) have so far not been measured. Moreover, the effects of removal of stress after stress induction has also never been experimentally studied. Our network model allows us to theoretically predict the possible dynamic behaviour.
where a represents the percentage of damaged membrane at the end of the stress induction period. Here we study the behaviour of the system for different values of a.
Figures 5(a)(b) show the inferred posterior distributions of the parameters. Illustrated are the two dimensional projections. Reassuringly, posterior distributions of both deterministic and stochastic rates have the same shape, with stochastic parameters allowing a slightly broader range. We can see, for example, that parameter k_{4} is already easily inferred from the available qualitative data. Moreover, some parameters are much more restricted (i.e. better inferred) in the deterministic case than in the stochastic case (e.g. k_{2}), while the other parameters are equally inferable in both cases (e.g. k_{10}).
Having obtained the posterior parameter distributions, we can now simulate possible dynamic behaviours for different parameter realizations in order to make predictions of the dynamic model output. Figures 6(a)(b) illustrate the possible stochastic behaviour and Figures 6(c)(d) the possible deterministic behaviour for randomly chosen parameters. These parameters were sampled from the inferred posterior distributions obtained above by using ABC SMC for calibrating the model against the endpoint data, represented by red dots. We present the results for different proportions of the damaged membrane a.
The trajectories generated from our posterior distribution over the model parameters do indeed provide interesting insights into the dynamics of membrane damage (dm). The proportion of damaged membrane is an indicator of the severity of the induced stress. We investigate the dynamics in response to different stress severity (a ranging from 15 up to 100, results shown for some representative values only). Under deterministic dynamics and when the damage is expected to be low (i.e. low a), oscillations may be observed (Figure 6(d)) for many parameters. This behaviour can be explained by the quick initial response to stress, which is then counteracted and attenuated by the membrane repair. The response machinery (specifically, membrane repair through PspA oligomers) acts as a negative feedback on stress induction. On the other hand, if the stress is strong (i.e. high a), then the repair machinery will have a smaller effect on the membrane relative to the damage induced by stress (Figure 6(c)). The lower the signal, the more pronounced the oscillations will be in molecular species olg, hBCA, hB_{ c }C_{ c }A_{ c } , hBCAF and TF as well.
When the stochastic framework is employed (Figures 6(a)(b)), the membrane damage fluctuates a lot (i.e. from nearly completely damaged membrane to almost intact membrane) and rapidly. But this is again less pronounced when the stress is strong (Figure 6(b)). Another interesting feature that we can observe from the simulated stochastic trajectories is the pronounced difference in the noise levels of different protein complexes. The highest variation is present in olg, followed by hBCA and hBCAF. Interestingly, hB_{ c }C_{ c }A_{ c } exhibits relatively low noise; presumably this is due to its frequency being a function primarily of the stress induction and is only very indirectly influenced by other processes.
The next step for modelling the Psp response must be obtaining real experimental quantitative and timeresolved measurements. These will allow for the improved estimation of posterior parameter distributions, and by having confidence in parameter estimates inferred from quantitative experimental data we can then explore the limits and behaviour of the Psp system response when exposed to different stress induction and removal schedules. In particular, even a small number of additional measurements would allow to determine the extent to which oscillatory behaviour is likely to occur in reality.
Conclusions
Our study was motivated by the following general questions: Can knowledge about quantitative stress response dynamics be inferred from available qualitative data? And can we thereby generate hypotheses which can be tested experimentally? We have approached these problems in an inferencebased manner. This means that we have developed a basic model structure and tested its consistency using tools from Petri net theory; for the proposed structure of our network model we have then shown that we can use an ABC SMC algorithm to identify regions in parameter space that allow the model to reproduce the observed (or desired) qualitative behaviour, and we have applied this framework to the Phage shock protein stress response system in E. coli. From the resulting posterior distributions over model parameters we were then able to sample plausible model parameters (in the sense that they are in concordance with our present state of knowledge about the system's behaviour) to study the type of deterministic and stochastic dynamical behaviour likely to arise for the Psp response.
The Psp system is part of the sophisticated stress response machinery that E. coli has acquired over the course of evolution in order to respond to adverse environmental conditions [4]. The intricate interplay between the different constituent components of the Psp reponse, like many other signal transduction systems, has only been studied in a traditional reductionist approach where the focus is on individual proteins, their structure and their interactions. Although these studies have already provided important insights into the stress response, there is a need for consolidating these (sometimes somewhat disparate) pieces of information into a mechanistic model of the stress response system. Here we have developed an inferential framework to analyze such models quantitatively in light of the available qualitative data.
We have shown that for the Psp response system the limited qualitative and semiquantitative data alone can already provide some insight into the dynamic nature of the stress response. We have been able to narrow down the parameter regions (i.e. we obtained the posterior parameter distributions in a Bayesian sense) for deterministic and stochastic dynamics of the Psp system. Furthermore, we have predicted the possible dynamic behaviour for all the molecular species involved in the response; furthermore, analysis of the stochastic dynamics has allowed us to predict the relative levels of noise in all of the molecular species. Most importantly, the predicted dynamic behaviour shows a nontrivial and a priori unexpected dependence on the stress intensity; oscillations can be observed for low stress intensity for many parameter values that are in agreement with present data, while no oscillations are observed for high stress intensities. Such oscillations could underly population heterogeneity and help to drive differences between responses of otherwise identical cells to environmental stresses. This in turn has recently been shown to have important implications to e.g. drug treatment and escape of some cells from therapeutic interventions [28].
The next step will be to collect quantitative time course data, including the basal level expression of psp genes, and "titrate" stress. Advances in quantitative real time live cell imaging methods applied to visualising the psp response across a range of stress conditions, magnitudes and durations of applied stresses are expected to yield the key data needed to examine oscillatory behaviour. These methods produce highly resolved data that will also enable us to target directly the role and biological relevance of oscillatory behaviour of the Psp response system.
This analysis has provided predictions of possible qualitative time course dynamic behaviours of crucial players in stress response. Our model of the Psp system has necessarily (given the amount of available data) focused on the core of this stress response. It would be desirable to extend the model by adding further layers of detail and separate the PspB and PspC proteins into two separate variables, since the proteins are passing the signal on independently; conformational change of PspC is a result of mechanical changes in the membrane, while PspB changes its conformation as a result of chemical changes, and activates the phosphorylation of ArcB and hence ArcA. It is believed that ArcA plays a role in amplification of the signal [13], and it would be of interest to incorporate ArcA explicitly into the model and study how such amplifications are mediated in practice. The ArcA/ArcB twocomponent system is also involved in other responses to environmental stimuli and is a potential relay of cross talk into the Psp response; capturing the effects of crosstalk will almost certainly require more involved mathematical formalism in order to understand the different contributing factors [29].
We initially developed and applied ABC SMC to deterministic and stochastic dynamical systems as a means of quantitative inference from quantitative time series data [5, 30]. In this paper we have applied ABC SMC in a slightly different context and have shown that it can successfully be applied to different and more limited types of data. Another difference to previous applications is that here our main purpose was not to infer parameters, but mainly to explore the likely range of qualitatively different trajectories that could reproduce the data. The scope for this strategy is considerable: it can be applied across all simulation models, and can perform inference tasks from limited, qualitative or quantitative data.
One such area of potentially fruitful application is in the comparative analysis of biological systems. For example, it is well known that some bacterial species, some of which are evolutionary closely related to E. coli, lack certain psp genes [31]. An adaptation of our current Psp model could then be used to study the likely changes in stress response dynamics by removing these genes from the model. This will allow us to predict how the dynamic stress response in species lacking specific molecular players differs from the stress response of the well studied model organism E. coli.
To take this approach one step further still, one can propose a set of candidate models and fit them to data representing the desired behaviour of the system. Then a model selection approach [30] can be employed in order to determine which of the proposed models reproduces the desired behaviour most reliably and most robustly. Such an approach can for example be used to guide the design of synthetic biological systems [32]; the function or action that the synthetic system is required to perform can be described by qualitative data (e.g. oscillations or production of a specific protein etc.) and candidate models can be fitted to these data (which are really design objectives) using an appropriate model selection technique in order to (i) choose which model will best reproduce the behaviour we would like the system to undertake, and (ii) infer the parameter distributions. In simulationbased studies we have found this to be a very promising and intuitive strategy to come up with signal transduction pathways that respond to stimuli in the environment in a desired and specified manner.
Methods
Introduction to Petri nets
Petri nets [33] are a graphical and mathematical modelling tool applicable to many systems and are often used to study concurrent processes [18]. Different kinds of Petri nets have been developed and applied for modelling biological pathways including metabolic pathways, gene regulatory networks, signal transduction pathways and integrated signallingregulatory systems. Reviews and extensive bibliographies can be found in references [15–17, 34]. Here the main purpose of using them is as a convenient pictorial representations that allows for the efficient exchange of ideas between biological domain experts and modellers.
In the following paragraphs we de ne the components of a Petri net and give a biological interpretation following Goss and Peccoud [35]. A Petri net is a directed bipartite graph, in which directed arcs connect two types of nodes: places P = {p_{1}, ..., p_{ n } } and transitions T = {tr_{1}, ..., tr_{ m } }. In a graphical representation, circles represent places and rectangles represent transitions. To model a system of molecular interactions as a Petri net, each place represents a distinct molecular species or condition. These places contain tokens, which represent individual molecules or other biological entities. The number of tokens in a place, p_{ i } , is its marking, and the state of all places is called a global marking, M. The initial marking, M_{0}, represent the number of tokens in each place at time t = 0.
Transitions represent chemical reactions or a change from one molecular state to another. Directed arcs, which represent input and output functions, link places to transitions and transitions to places. Each arc has an associated weight; (where are the nonnegative integers) is the matrix containing the weights of arcs going from places to transitions, and contains the weights of arcs going from transitions to places. These weights determine the stoichiometric coefficients of the species involved in the reaction. A transition tr_{ i } is said to be enabled when the marking of a place is equal to or greater than the coefficient of its corresponding input arc, M_{ j } ≥ Pre_{ ij } , j = 1, ..., n. Enabled transitions can "fire", moving the tokens from input to output places. This defines the place/transiton Petri net.
However, if we want to study how molecular species change in time, we need to incorporate a time component into a Petri net. Petri nets in which transitions fire in discrete time (e.g. t, t + 1, t + 2, . . .) are called timed Petri nets. Molecular events, however, are known to be governed by stochastic rate laws, which can be modelled by a Stochastic Petri net [36]. A Stochastic Petri net is derived from a place/transition Petri net, by assigning the rates to transitions; these rates are marking dependent and in the present context the markings of the stochastic Petri net are discrete and represent the number of molecular species. The times at which transitions fire are exponentially distributed and given the kinetic laws the stochastic Petri net can be simulated using Gillespie's algorithm [37].
If the number of molecules is sufficiently large and stochastic fluctuations can be ignored then we can choose to study how concentrations, rather than numbers of molecules, changer over time. We therefore further transform the Petri net into a timed continuous model, which can also be described with ordinary differential equations (how this is done, is described in detail elsewhere [19, 20]). Here, a continuous Petri net is just a convenient graphical representation of a dynamical system, which for modelling purposes is often described by deterministic, ordinary differential equations.
Modelling of chemical reactions, such as the one in the above example, is quite straightforward as the basic purpose of Petri nets is to represent production/consumption processes. In the same light, metabolic networks, which consist of biochemical reactions, can naturally be represented by a Petri net. However, genetic regulatory networks are more difficult to model with Petri nets. While the reactants get consumed in the metabolic networks, the regulators do not turn over during a regulatory process: in addition to flux of matter, the flow of information gains in importance. Therefore, a slightly different Petri net structure is needed for modelling regulatory networks. The situation is similar in signal transduction networks; molecules respond to signals rather than turn over. For example, a molecule might (transiently) change conformation in response to a signal. Another example is the binding of a transcription factor to DNA that result in new proteins being produced (without consumption of transcription factor). This type of information flow can be modelled by test arcs. Whenever a test arc is used in our model, the number of tokens in the "test place" does not change, and the rate of transition is dependent on the number of tokens (i.e. markingdependent). The test places in our model are stress, dm, im, TF and olg.
In this manuscript, Petri nets are used in the following way: the most basic, place/transition Petri nets are used to validate the basic structure of the model and determine the initial markings. Petri nets, which define the structure of our model, are then turned into stochastic and continuous (deterministic) simulation frameworks, which are used to study the dynamics.
Approximate Bayesian computation (ABC)
ABC methods have been developed in order to obtain Bayesian posterior distributions where likelihood functions are computationally intractable or too costly to evaluate [5]. They replace evaluation of the likelihood with a comparison of observed and simulated data. Let θ be such a parameter vector to be estimated. Given a suitable prior distribution, P(θ), our goal is to develop an approximation to the posterior, P (θ  D_{0}) ∝ f (D_{0}  θ )P(θ), where f (D_{0}θ) is the likelihood of θ given the data D_{0}. ABC methods take the following generic form:
1 Sample a candidate parameter vector θ* from a suitable proposal distribution P(θ) (our main constraint is that P(θ) > 0 wherever we expect the posterior to have weight).
2 Generate a simulated dataset D* from the conditional probability distribution f (Dθ*).
3 Compare simulated and experimental data sets, D* and D_{0}, respectively, using a distance function, d, and tolerance ε; if d(D_{0}, D*) ≤ ε, accept θ*, otherwise reject θ* and return to 1. Here ε ≥ 0 is the required level of agreement between D_{0} and D*.
which for sufficiently small ε is our approximation for the true posterior distribution, P(θD_{0}). Instead of defining a distance function d(D_{0}, D*) between the full datasets, it may be more convenient to define it on sufficient summary statistics, S(D_{0}) and S(D*), of the datasets. That is, the distance function may be defined as d(D_{0}, D*) = d'(S(D_{0}), S(D*)), where d' is a distance function on the summary statistic space.
The simplest ABC algorithm is the ABC rejection sampler[38], which repeatedly executes the generic ABC building block presented above. The disadvantage of the ABC rejection sampler is that the acceptance rate is low when the prior distribution is very different from the posterior distribution, and this will nearly always be the case in realworld applications. In order to speed up the procedure, ABC SMC has been developed [5, 39, 40].
ABC based on sequential Monte Carlo (SMC)
In ABC SMC particles, {θ^{(1)}, ..., θ^{(N)}}, sampled from the prior distribution, P(θ), are propagated through a set of intermediate distributions, π(θd(D_{0}, D*) ≤ ε_{ i } ), i = 1, ..., T  1, until it corresponds to a sample from the target distribution, π(θd(D_{0}, D*) ≤ ε_{ T } ). The tolerance schedule, ε_{ i } , is chosen such that ε_{1} > . . . > ε_{ T } ≥ 0; thus the distributions gradually evolve towards the target posterior. The ABC SMC algorithm proceeds as follows:
S1 Initialize ε_{1}, ..., ε_{ T } and set the population indicator t = 1.
S2.0 Set the particle index i = 1.
S2.1 When t = 1 sample θ** directly from P(θ).
Otherwise sample θ* from with weights w_{ t }_{}_{1} and perturb sampled particles to obtain θ** ~ K_{ t } (θθ*), where K_{ t } is the perturbation kernel.
When P (θ**) = 0, return to S2.1.
Generate a simulated dataset D* ~ f (D θ**).
If d(D_{0}, D*) ≥ ε_{ t } , return to S2.1.
If i < N set i = i + 1, go to S2.1.
S3 Normalize the weights.
If t < T, set t = t + 1 and return to S2.0.
Perturbation kernels K_{ t } are chosen here to be random walk (uniform or Gaussian) processes, but other choices are possible; in principle any update (e.g. from genetic algorithms) can be used as long as weights can be calculated. For a "friendly" introduction to ABC SMC we refer to [41].
This choice of kernel ensures good mixing and has been found to capture the extent of the posterior distribution faithfully; for kernels with narrower bandwitdh (due to the finite number of particles as well as the hard rejection criterion in S2.1 above) the variance of the posterior is likely to be underestimated [5, 40].
Declarations
Acknowledgements
We thank Professor Peter Taylor for hosting T.T. at the University of Melbourne in January and February 2009, where part of this research was done. We thank Juliane Liepe, Erika Cule and Kamil Erguler for support with programming. This work was supported through an MRC priority studentship to T.T., BBSRC and Wellcome Trust support to M.B. and M.P.H.S.; M.P.H.S. is a Royal Society Wolfson Research Merit Award holder.
Authors’ Affiliations
References
 Rowley G, Spector M, Kormanec J, Roberts M: Pushing the envelope: extracytoplasmic stress responses in bacterial pathogens. Nat Rev Microbiol. 2006, 4 (5): 38394. http://www.nature.com/nrmicro/journal/v4/n5/abs/nrmicro1394.html;jsessionid=EB38DF662D919AFA165074D399215E22 10.1038/nrmicro1394View ArticlePubMedGoogle Scholar
 Duguay A, Silhavy T: Quality control in the bacterial periplasm. Biochim Biophys Acta. 2004, 1694 (13): 121134. http://www.sciencedirect.com/science/article/pii/S0167488904000928 10.1016/j.bbamcr.2004.04.012View ArticlePubMedGoogle Scholar
 Darwin A: The phageshockprotein response. Mol Microbiol. 2005, 57 (3): 6218. http://www.blackwellsynergy.com/doi/abs/10.1111/j.13652958.2005.04694.x 10.1111/j.13652958.2005.04694.xView ArticlePubMedGoogle Scholar
 Huvet M, Toni T, Sheng X, Thorne T, Jovanovic G, Engl C, Buck M, Pinney JW, Stumpf MPH: The Evolution of the Phage Shock Protein Response System: Interplay between Protein Function, Genomic Organization, and System Function. Molecular biology and evolution. 2011, 28 (3): 11411155. 10.1093/molbev/msq301PubMed CentralView ArticlePubMedGoogle Scholar
 Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf MPH: Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface. 2009, 6: 187202. http://journals.royalsociety.org/content/0213t54011n31k27/ 10.1098/rsif.2008.0172PubMed CentralView ArticlePubMedGoogle Scholar
 Bukau B: Regulation of the Escherichia coli heatshock response. Mol Microbiol. 1993, 9 (4): 67180. http://www.ncbi.nlm.nih.gov/pubmed/7901731?dopt=abstract 10.1111/j.13652958.1993.tb01727.xView ArticlePubMedGoogle Scholar
 Jovanovic G, Lloyd L, Stumpf M, Mayhew A, Buck M: Induction and function of the phage shock protein extracytoplasmic stress response in Escherichia coli. J Biol Chem. 2006, 281 (30): 2114761. http://www.jbc.org/cgi/content/abstract/281/30/21147 10.1074/jbc.M602323200View ArticlePubMedGoogle Scholar
 Joly N, Engl C, Jovanovic G, Huvet M, Toni T, Sheng X, Stumpf M, Buck M: Managing membrane stress: the phage shock protein (Psp) response, from molecular mechanisms to physiology. FEMS microbiology reviews. 2010, 34: 797827. http://onlinelibrary.wiley.com/doi/10.1111/j.15746976.2010.00240.x/pdfView ArticlePubMedGoogle Scholar
 Brissette JL, Russel M, Weiner L, Model P: Phage shock protein, a stress protein of Escherichia coli. Proc Natl Acad Sci USA. 1990, 87 (3): 8626. http://www.pnas.org/cgi/reprint/87/3/862 10.1073/pnas.87.3.862PubMed CentralView ArticlePubMedGoogle Scholar
 Model P, Jovanovic G, Dworkin J: The Escherichia coli phageshockprotein (psp) operon. Mol Microbiol. 1997, 24 (2): 25561. http://onlinelibrary.wiley.com/doi/10.1046/j.13652958.1997.3481712.x/abstract 10.1046/j.13652958.1997.3481712.xView ArticlePubMedGoogle Scholar
 Kobayashi R, Suzuki T, Yoshida M: Escherichia coli phageshock protein A (PspA) binds to membrane phospholipids and repairs proton leakage of the damaged membranes. Mol Microbiol. 2007, 66: 1009. http://www3.interscience.wiley.com/journal/118542171/abstract?CRETRY=1&SRETRY=0 10.1111/j.13652958.2007.05893.xView ArticlePubMedGoogle Scholar
 Engl C, Jovanovic G, Lloyd LJ, Murray H, Spitaler M, Ying L, Errington J, Buck M: In vivo localizations of membrane stress controllers PspA and PspG in Escherichia coli. Mol Microbiol. 2009, 73 (3): 38296. 10.1111/j.13652958.2009.06776.xPubMed CentralView ArticlePubMedGoogle Scholar
 Jovanovic G, Engl C, Buck M: Physical, functional and conditional interactions between ArcAB and phage shock proteins upon secretininduced stress in Escherichia coli. Mol Microbiol. 2009, http://www3.interscience.wiley.com/journal/122538962/abstract?CRETRY=1&SRETRY=0Google Scholar
 Jovanovic G, Weiner L, Model P: Identification, nucleotide sequence, and characterization of PspF, the transcriptional activator of the Escherichia coli stressinduced psp operon. J Bacteriol. 1996, 178 (7): 193645. http://jb.asm.org/cgi/reprint/178/7/1936?view=long&pmid=8606168PubMed CentralPubMedGoogle Scholar
 Pinney J, Westhead D, McConkey G: Petri Net representations in systems biology. Biochem Soc Trans. 2003, 31: 15131515. http://www.biochemsoctrans.org/bst/031/bst0311513.htm 10.1042/BST0311513View ArticlePubMedGoogle Scholar
 Hardy S, Robillard P: Modeling and simulation of molecular biology systems using Petri nets: modeling goals of various approaches. Journal of Bioinformatics and Computational Biology. 2004, 2 (4): 595613. http://www.worldscinet.com/jbcb/02/0204/S0219720004000764.html 10.1142/S0219720004000752View ArticlePubMedGoogle Scholar
 Chaouiya C: Petri net modelling of biological networks. Brief Bioinformatics. 2007, 8 (4): 2109. http://bib.oxfordjournals.org/cgi/content/full/8/4/210 10.1093/bib/bbm029View ArticlePubMedGoogle Scholar
 Murata T: Petri nets: Properties, analysis and applications. Proceedings of the IEEE. 1989, 541580. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=24143Google Scholar
 Gilbert D, Heiner M: From Petri nets to differential equations  an integrative approach for biochemical network analysis. Edited by: Donatelli S, Thiagara jan PS. 2006, 181200. ICATPN 2006, LNCS 4024, SpringerVerlag Berlin Heidelberg, http://www.springerlink.com/index/l85l6147038t012j.pdfGoogle Scholar
 Mura I, CsikászNagy A: Stochastic Petri Net extension of a yeast cell cycle model. J Theor Biol. 2008, 254 (4): 85060. http://www.sciencedirect.com/science/article/pii/S002251930800372X 10.1016/j.jtbi.2008.07.019View ArticlePubMedGoogle Scholar
 Wilkinson DJ: Stochastic Modelling for Systems Biology. 2006, Chapman & Hall/CRC, http://books.google.com/books?id=5pa3jSZf4SkC&printsec=frontcoverGoogle Scholar
 Phillips R, Kondev J, Theriot J: Physical biology of the cell. Garland Science. 2008, 1Google Scholar
 Heiner M, Koch I, Will J: Model validation of biological pathways using Petri nets  demonstrated for apoptosis. BioSystems. 2004, 75: 1528. http://www.sciencedirect.com/science/article/pii/S030326470400036X 10.1016/j.biosystems.2004.03.003View ArticlePubMedGoogle Scholar
 Hanzalek Z, Svadova M: Matlab toolbox for Petri nets. 22nd International Conference, ICATPN. 2001Google Scholar
 Martinez J, Silva M: A simple and fast algorithm to obtain all invariants of a generalized Petri net. Application and Theory of Petri Nets, InformatikFachberichte. Edited by: Girault C, Reisig W. 1982, 52: 301310. Springer, http://portal.acm.org/citation.cfm?id=734691Google Scholar
 Liepe J, Barnes C, Cule E, Erguler K, Kirk P, Toni T, Stumpf M: ABCsysbio: A Python package for approximate Bayesian computation. Bioinformatics. 2010, 26: 17971799. 10.1093/bioinformatics/btq278PubMed CentralView ArticlePubMedGoogle Scholar
 Secrier M, Toni T, Stumpf M: The ABC of reverse engineering biological signalling systems. Mol Biosyst. 2009, 5: 19251935. http://www.rsc.org/publishing/journals/MB/article.asp?doi=b908951a 10.1039/b908951aView ArticlePubMedGoogle Scholar
 Raj A, Rifkin S, Andersen E, van Oudenaarden A: Variability in gene expression underlies incomplete penetrance. NATURE. 2010, 463 (7283): 913918. http://www.nature.com/nature/journal/v463/n7283/ 10.1038/nature08781PubMed CentralView ArticlePubMedGoogle Scholar
 Hlavacek WS, Faeder JR, Blinov ML, Posner RG, Hucka M, Fontana W: Rules for modeling signaltransduction systems. Sci STKE. 2006, 2006 (344): re6http://stke.sciencemag.org/cgi/content/full/sigtrans;2006/344/re6 10.1126/stke.3442006re6PubMedGoogle Scholar
 Toni T, Stumpf MPH: Simulationbased model selection for dynamical systems in systems and population biology. Bioinformatics. 2010, 26: 10410. http://bioinformatics.oxfordjournals.org/cgi/content/full/26/1/104 10.1093/bioinformatics/btp619PubMed CentralView ArticlePubMedGoogle Scholar
 Huvet M, Toni T, Tan H, Jovanovic G, Engl C, Buck M, Stumpf MPH: Modelbased evolutionary analysis: the natural history of phageshock stress response. Biochemical Society Transactions. 2009, 37 (Pt 4): 7627. http://www.biochemsoctrans.org/bst/ev/037/0762/bst0370762ev.htmView ArticlePubMedGoogle Scholar
 Myers C: Engineering Genetic Circuits. 2009, Chapman & Hall/CRCGoogle Scholar
 Petri CA: Kommunikation mit Automaten. 1962, 3: Bonn: Institut fur lnstrumentelle Mathematik, Schriften des IIMGoogle Scholar
 Will J, Heiner M: Petri Nets in Biology, Chemistry, and MedicineBibliography. Computer Science Reports, 04/02, Brandenburg University of Technology, Cottbus. 2002, http://artikelpdf.co.cc/link/petrinetsinbiologychemistryandmedicinebibliography/Google Scholar
 Goss PJ, Peccoud J: Quantitative modeling of stochastic systems in molecular biology by using stochastic Petri nets. Proc Natl Acad Sci USA. 1998, 95 (12): 67505. http://www.pnas.org/content/95/12/6750 10.1073/pnas.95.12.6750PubMed CentralView ArticlePubMedGoogle Scholar
 Haas P: Stochastic Petri Nets. 2002, SpringerView ArticleGoogle Scholar
 Gillespie D: Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry. 1977, 81 (25): 23402361. 10.1021/j100540a008. http://pubs.acs.org/doi/abs/10.1021/j100540a008 10.1021/j100540a008View ArticleGoogle Scholar
 Pritchard J, Seielstad MT, PerezLezaun A, Feldman MW: Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution. 1999, 16: 17911798. http://mbe.oxfordjournals.org/cgi/content/abstract/16/12/1791View ArticlePubMedGoogle Scholar
 Sisson SA, Fan Y, Tanaka MM: Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci USA. 2007, 104 (6): 17605. 10.1073/pnas.0607208104PubMed CentralView ArticlePubMedGoogle Scholar
 Beaumont MA, Cornuet JM, Marin JM, Robert CP: Adaptive approximate Bayesian computation. Biometrika. 2009, 96 (4): 983990. 10.1093/biomet/asp052. http://biomet.oxfordjournals.org/cgi/content/abstract/96/4/983 10.1093/biomet/asp052View ArticleGoogle Scholar
 Toni T, Stumpf MPH: Tutorial on ABC rejection and ABC SMC for parameter estimation and model selection. arXiv. 2009, stat.CO: http://arxiv.org/abs/0910.4472v1Google Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.