- Research article
- Open Access
Stochastic simulation and analysis of biomolecular reaction networks
© Frazier et al; licensee BioMed Central Ltd. 2009
- Received: 26 November 2008
- Accepted: 17 June 2009
- Published: 17 June 2009
In recent years, several stochastic simulation algorithms have been developed to generate Monte Carlo trajectories that describe the time evolution of the behavior of biomolecular reaction networks. However, the effects of various stochastic simulation and data analysis conditions on the observed dynamics of complex biomolecular reaction networks have not recieved much attention. In order to investigate these issues, we employed a a software package developed in out group, called Biomolecular Network Simulator (BNS), to simulate and analyze the behavior of such systems. The behavior of a hypothetical two gene in vitro transcription-translation reaction network is investigated using the Gillespie exact stochastic algorithm to illustrate some of the factors that influence the analysis and interpretation of these data.
Specific issues affecting the analysis and interpretation of simulation data are investigated, including: (1) the effect of time interval on data presentation and time-weighted averaging of molecule numbers, (2) effect of time averaging interval on reaction rate analysis, (3) effect of number of simulations on precision of model predictions, and (4) implications of stochastic simulations on optimization procedures.
The two main factors affecting the analysis of stochastic simulations are: (1) the selection of time intervals to compute or average state variables and (2) the number of simulations generated to evaluate the system behavior.
- Reaction Event
- Exemplar Model
- System Biology Markup Language
- Stochastic Simulation Algorithm
- Chemical Master Equation
All biological processes at the cellular level are the consequence of a series of chemical-physical reactions at the molecular level that occur within the micro-volume of the cell. The collection of molecular species and the reactions among them is referred to here as a 'biomolecular reaction network'. The complete biomolecular reaction network for a cell includes thousands of molecular components and reactions involved in transcription, translation, molecular self-assembly, metabolic reactions, transport and physical movements. Since these reactions occur in an extremely small reaction volume, the number of molecules of any one molecular species that can participate in a given reaction ranges from single copies of genes to several hundred molecules of chemicals at the μ M concentration to several hundred thousand molecules of chemicals at the mM concentration. As a consequence of the fact that a subset of all the reactions in the system involve low copy numbers of substrate molecules, the behavior of individual instances of the system cannot be modeled accurately using continuous deterministic (C-D) approaches ([1–3]). Thus, these natural micro-systems should be modeled and simulated using basic theory of discrete stochastic (D-S) chemical kinetics .
With the evolution of systems biology in recent years, interest in modeling and simulating the behavior of engineered genetic circuits in bacterial cells has increased . In addition to living cells, nano-biotechnology researchers are exploring the possibility of developing and using artificial cellular constructs employing natural and engineered biological processes ([6–11]). In order to predict the behavior of these constructs, modeling and simulation of their biomolecular reaction networks are needed to enable the design and fabrication of both the constructs themselves and physical devices based on these constructs.
Using stochastic chemical kinetics for exact simulations of biomolecular reaction networks presents several computational challenges. The ultimate goals of the simulation exercise are to be able to: (1) model and simulate the behavior of the system using a complete and accurate physical and mathematical description of the system and an exact simulation algorithm, (2) generate large numbers of simulations, and (3) analyze the data in a meaningful way. All of these goals should be accomplished in a reasonable amount of computational time. The main factors that determine the computational challenge of a particular simulation activity are the size of the model (how many molecular species and reactions are involved), the nature of the reactions (stiffness – mixture of fast and slow reactions), the duration of the simulation, the number of simulations required for statistical significance, the data logging requirements, and, data analysis requirements. Depending on the computational dimensions of the problem, the ultimate goals of the simulation exercise, as defined above, may be attainable. However, as the computational dimension increases, the ability to meet all of the requirements of the ultimate solution becomes more difficult.
There are two approaches that can be taken to address the more difficult computing problems associated with larger computational dimensions. The first approach is to employ approximations at various levels of the modeling and simulation process. At the conceptual model level, detailed reaction mechanisms consisting of multiple micro-reactions can be replaced with approximate lumped macro-reactions. This has the effect of reducing the number of both molecular species and reactions in the model. At the simulation level, there are approximate stochastic simulation algorithms, such as τ-leaping , that can speed-up the simulation time, but at the expense of accuracy. At the statistical level, the accuracy of the statistical properties of the ensemble of system simulations, computed from the simulation data, increases as the number of simulations increases, ultimately approaching the statistical properties of the exact solution in the limit as the number of simulations increases to infinity. Thus, truncating the number of simulations to decrease the computational time results in less accurate approximations of the statistical properties of the system. Finally, the collection and storage of the raw simulation data will affect the computational time. The maximum information obtainable from a simulation run requires the collection of the time and nature of every reaction event. Limiting the amount of data collected will reduce computation time at the expense of the types and accuracy of subsequent data analyses.
The second approach is to run simulations on bigger and more powerful computer hardware using software designed to take advantage of multiprocessors. Although the exact stochastic simulation algorithms do not lend themselves to efficient parallelization of the algorithm itself, running multiple simulations on separate processors can reduce the overall time required to generate a statistically adequate ensemble of independent simulations. This approach does not reduce the actual computational time, but, by running multiple simulations in parallel, it does allow one to reduce the clock time required to obtain a sufficient number of simulations for appropriate statistical analysis. In addition, numerical analysis of the large data sets generated by the simulation can be parallelized to speed up this time consuming step. Overall, there are a range of trade-offs between simulation strategy, data accuracy and computational time that must be taken into consideration when optimizing the modeling, simulation and analysis of biomolecular reaction networks for particular applications.
In recent years, single cell experiments have become a significant focus of the experimental approach to problems in systems biology. Analysis of data derived from such experiments should be interpreted in the context of stochastic systems and we feel that it is of practical value to the broader community of researchers to present concrete examples of these issues using a hypothetical model that is relevance to the fundamental transcription-translation-metabolism scheme. To model and simulate the behavior of these systems, various software packages have been developed and released to the general public (e.g., [13–19]). Each of these software products has its advantages and disadvantages for different modeling needs. We developed a software package – the Biomolecular Network Simulator (BNS) – that is specifically designed to operate on either single or multiple processor hardware . The BNS software allows one to build a model of a synthetic biomolecular reaction network and to investigate its behavior using several different stochastic algorithms. Here we use the BNS software to investigate the effects of various external conditions, such as selecting: (1) the observation interval, (2) the time-averaging interval, or (3) the number of simulation, on the observed behavior of a hypothetical, yet relatively complex, biomolecular reaction network involving transcription, translation and metabolism to illustrate some of the unique data analysis issues that arise in stochastic simulations of biomolecular reaction networks. It is hoped that the reader will gain a better intuitive understand of how these factors can influence the interpretation of stochastic reaction systems.
Stochastic Simulation Algorithm
The mathematical description of the behavior of stochastic biomolecular reaction networks is based on Markov process theory . The system behaves as a multi-variant, discrete state, Markov jump process and is governed by the chemical master equation (CME). The solution of the CME is in fact the mathematically exact description of the behavior of the system. For our purposes, we will consider a biomolecular reaction network consisting of N S identifiable molecular species, denoted S i (i = 1, 2, ..., N S ). These molecular species can undergo N R fundamental chemical reactions r k (k = 1, 2, ..., N R ) and are confined to a fixed reaction volume, V R . It is assumed that the system is well-mixed (homogenous) and at constant volume and temperature. Let s(t) be an N S -dimensional state vector whose elements s i (t) (i = 1, 2, ..., N S ) are the number of molecules in the system of each molecular species S i at time t.
where a k (s, t) is the propensity of the k th fundamental reaction at time t and ν k is the state change vector, a N S -dimensional vector that specifies the changes in the number of molecules of each state variable when the k th reaction occurs. Note, the sum is over all of the N R possible reactions that can occur. Further note, the propensity for a given reaction, a k (s, t), is computed as the product of the reaction probability constant, c k , and the total number of combinations of possible reacting molecules for that reaction. The reaction probability constant is, in a sense, a measure of the reactivity of the reaction substrates .
For our purposes, it will be assumed that the initial condition as defined by Equation (2) will hold and the state density function that is the solution of the CME can be written as the conditional probability density function P(s, t|s0, t0 = 0).
and σ n (t) is the estimate of the standard deviation (SD) based on the ensemble of n simulations. Thus, computing the mean and SD of the ensemble of simulations at a given time provide an estimate of the first and second moments of the probability density function for the system at that time.
Although the basic biochemical reactions in a biomolecular reaction network are treated as discrete, jump Markov processes and thus stochastic in nature, if the number of molecules of every species in the system is large then the process can be approximated by a continuous Markov process . Furthermore, if the number of molecules and the volume increase in proportion such that the concentration of each species is constant (the so-called thermodynamic limit), then the solution describing the behavior of the state variables can be written as the sum of a single-valued variable that is the solution of the classical rate equations and a variable factor that decreases in magnitude as . Thus, for sufficiently large reaction volume, keeping concentrations constant (consequently large number of molecules), the first moment of the probability density function of the state variables approaches the classical continuous deterministic solution of the reaction rate ODEs. However, if there are only a few molecules of any given species, as is often the case in gene expression, this approximation will not accurately describe the instantaneous state of the system. Furthermore, the C-D approach will provide no information concerning the temporal fluctuations of state variables of a given system nor the variability between multiple instantiations of the system with identical initial conditions.
Biomolecular Network Simulator Software
The Biomolecular Network Simulator software was developed to allow for stochastic simulations on either desktop or multi-processor hardware (see age: http://www.bhsai.org/bns_alpha.html for complete documentation of the software and http://www.bhsai.org/downloads/bns_release.zip to download the software). The front-end graphical users interface (GUI) and the backend data analysis tools are written in MATLAB. This allows the user to exploit the interactive features and visualization tools of MATLAB for setting up simulations and analyzing and interpreting the resulting data. The simulation engine itself and the analysis tools are written in the C language to maximize speed for the computationally intensive part of a simulation run and post-simulation data analysis.
The BNS software accepts two types of model definitions: (1) Systems Biology Markup Language (SBML) format  and (2) BNS format where models are defined by a set of MATLAB m-files. There are two types of output files: snapshot data and event log data. Snapshot data files contain the state of the system (number of molecules of each molecular species that are selected to be monitored) and the number of reaction occurrences in each reaction channel since the last snapshot at user specified time intervals. The second type of output files – the event log files – contain the record of every discrete event that occurs during the simulation (i.e., the reaction name and time of each and every event).
ti, jis the j th time subinterval in the interval t i to t i + Δt and the summation is over , the total number of reaction events in the interval t i to t i + Δt that affect s k . The computation is accomplished by summing up the products of the time sub-intervals multiplied by the number of molecules present in that sub-interval, thus the average is weighted according to the amount of time the compound exists in each state during the selected time-interval. The averaging analysis can be performed for a single simulation run or for an ensemble of runs. In the latter case, the between run average (the average of the individual time-weighted averages over the ensemble of simulation runs) and standard deviation are plotted.
The data are plotted with the y-axis representing the fraction of the simulations that the state variable was in the discrete state indicated on the x-axis. This is an estimate of the probability that the system would be in that particular state at the defined time t i , i.e., the state probability density P(s, t|s0, t0) defined by Equation 1.
where is the number of reaction events in the time interval t i to t i + Δt. These analyses provide important information about the behavior of the system, e.g., relative event rates for important reactions. Such event rate data can be used to calculate the rate of substrate utilization in selected reaction channels as a function of the state of the system.
The BNS software can be run on high performance computing (HPC) hardware. Parallelization of the BNS code for simulation runs on HPC hardware is accomplished using the Message Passing Interface (MPI). In our parallelization scheme, the 'master' processor divides the total number of simulation runs into a set of jobs depending on the number of available processors and sends a job to each of the 'worker' processors. The snapshot data from the workers are sent back to the master processor for the interactive graphics while the event log files are saved directly to the hard drive by the workers. In this approach to parallelization, the power of parallel processing is utilized to run a large number of simulations simultaneously and thus speedup the overall clock time for large batch jobs.
In order to investigate the analysis of a discrete stochastic system, a hypothetical model of a generic two gene, self-assembling catalytic ligation reaction in a cell-free transcription-translation (CFTT) system is explored. The hypothetical biomolecular reaction network consists of the transcription and translation reactions of two genes in a gene expression system and the subsequent metabolic reactions of the expressed enzymes. The system is assumed to be a closed system contained in a vesicle with a reaction volume of 5 × 10-16 L corresponding to a spherical vesicle with a diameter of approximately 1.0 μ m. Note, in such a vesicle, a molecular species at a concentration of 1 μ M is equivalent to a total of 301 molecules present in the vesicle. The CFTT system contains all of the necessary components for transcription and translation of target genes into the expressed proteins.
Transcription of geneA and geneB consists of four reactions each. The four reactions are the association and dissociation of the RNA polymerase (RNAp – s001) and the promoter sites for geneA (P_A – s002) and geneB (P_B – s022) to form the promoter-polymerase complexes (RNAp_P_A – s003 and RNAp_P_A – s023). The polymerase then translocates from the promoter site to the transcription start site forming the transcription start complex (RNAp_geneA – s004 and RNAp_geneB – s024) and the subsequent formation of the mRNA (geneA_mRNA – s009 and geneB_mRNA – s025). The mRNAs can either be degraded by a generic RNase or used as a template for protein synthesis. Translation consists of three reactions that include association and dissociation of the small ribosomal unit (Rib_s – s015) with the ribosomal binding site on the mRNA to form the ribosomal-mRNA complex (Rib_s_geneA_mRNA – s016 and Rib_s_geneB_mRNA – s026) and the subsequent translation reaction resulting in the formation of the protein products (Pro_A – s018 and Pro_B – s027). Key substrates for the translation reaction are the 20 charged transfer RNAs (AA_i_tRNA_AA_i – s091 to s111) which are formed from the appropriate amino acids (AA_i – s032 to s051) and the corresponding transfer RNAs (tRNA_AA_i – s072 – s091) by the associated aminoacyl-transferases (Trans_AA_i – s052 to s071). The protein product Pro_A is capable of catalyzing the ligation of Sub_1 (s233) and Sub_2 (s237) to form the metabolic product Prod_A (s240) via a series of association, catalytic and dissociation reactions. The protein product Pro_B must form a tetramer (Pro_B_4) to be catalytically active. Once formed Pro_B_4 catalyzes the ligation of Prod_A and Sub_3 (s245) to form the final product Prod_B (s248) via another series of association, catalytic and dissociation reactions. All proteins can be degraded by a generic protease (Prot – s031).
In this biomolecular reaction network, the biomolecular reactions relating to the expression of geneA and geneB and subsequent metabolism are treated as stochastic in nature. Here we use the Gillespie direct stochastic algorithm to obtain sufficient numbers of probabilistically correct state space trajectories, consistent with the CME, for statistical analyses. The simulation data sets obtained through the use of these Monte Carlo simulations are used to illustrate some of the statistical properties of the stochastic behavior of the exemplar model. Note, these simulation data are for a generic model and do not necessarily represent the behavior of any actual system.
Simulation of exemplar model using the Gillespie Direct Algorithm
Figure 4(D) also illustrates another feature of the stochastic nature of the system. The two genes, geneA and geneB, are treated identically in this exemplar model in the sense that their nucleotide compositions are the same and all reactions with the RNA polymerase, subsequent transcription reactions, ribosomal subunits and translation reactions have the same probabilistic reaction constants. Therefore, if the system was continuous and deterministic the two genes would be expressed at the same rate and attain the same magnitude. However, due to the stochastic nature of the system, neither the rates nor the magnitudes of synthesis of Pro_A and Pro_B are identical within a given simulation. Furthermore, the rates and magnitudes are not repeatable from simulation to simulation. Inspecting Figure 4(D) reveals that in two cases more Pro_A was translated than Pro_B and in one more Pro_B. In the simulations shown in Figure 4(D), the protein that started with the higher translation rate ended up making the most protein. However, if more simulations are investigated, it is found that this is not always true, i.e., the two trajectories can cross over. Note also that for these state variables, both of which monotonically increase with no fluctuations, there is significant variability between simulations in the total protein synthesized at the end of the simulation ranging from 40 – 75 protein molecules synthesized.
As the number of individual experiments on single systems increases, the experimental estimates of the mean and standard deviation of the system state variables improve. However, if experimental data for state variables are only obtained as the mean of a large composite sample of vesicles (i.e., the data are obtained by analyzing samples consisting of a large number of individual systems), then the only meaningful comparison is between this 'macro-sample' mean and the mean of a large number of simulations at corresponding time points. In this case, no data concerning the variability between individual vesicles can be obtained. Note, the standard deviation obtained from analyzing multiple macro-samples does not correspond to the fluctuations exhibited in individual model simulations of the system or fluctuations between model simulations, but rather is a measure of experimental uncertainties (e.g., experimental measurement errors and non-identical systems), which are not taken into consideration by the model simulations. In fact, if there were no experimental uncertainties, then the macro-means of multiple macro-samples taken from ensembles of identical systems would converge to a single value as the number of individual systems collected in the macro-sample increases.
Time of last transcription and translation events.
Time of Last Transcription Event for geneA
Time of Last Translation Event for geneA
Improvement in estimating the mean and standard deviation of s with the number of simulation runs
Figure 8(C) shows the behavior of free protein B (Pro_B, s027) as n increases. In this case, the values of the state variable is not inherently limited by the stoichiometry of the system. For Pro_B, there are approximately 60 molecules synthesized. However, because Pro_B forms a tetramer and that tetramer is involved in catalytic reactions, the number of free molecules of Pro_B remains low throughout the simulation and fluctuates rapidly due to the association and dissociation of the dimer – tetramer complexes. Again, the estimates of the ensemble mean and SD show significant fluctuations from one time point to the next when n is small due to the inaccuracies in each estimate of ⟨s(t)⟩ n and ⟨σ(t)⟩ n . As n increases, each individual estimate of the mean of s(t) and SD improves and the plot approaches the exact smooth curve for P(s, t/s0, t0). For this state variable, the details of its behavior only become well defined with 100 or more simulations.
Finally, Figure 8(D) shows the time course of the number of proteins translated using the fictitious variables t250 and t251 to count the number of molecules of Pro_A and Pro_B synthesized, respectively. For any individual simulation, the translation of geneA and geneB can be quite independent within the constraint that the total number of protein molecules synthesized are limited by the availability of energy and substrates (Figure 4(D)). However, as the number of simulations increases, the ensemble means of Pro_A* and Pro_B* approach each other demonstrating that the expression of the two genes are controlled in an identical manner. All of these examples emphasize the need to conduct a large number of simulations to obtain accurate estimations of the behavior of the state variables.
The dependency of the accuracy of the estimates of the mean and SD of state variables on the number of simulations is an issue that must be taken into consideration when comparing model predictions with experimental data. If simulations are used to estimate what the model would predict for experimental observations, then the values for state variables predicted by the model will only be exact in the limit of n → ∞ simulations. Therefore, if a finite number of simulations are used, there will be some error in the model predictions. There are several situations where this issue becomes particularly important. One example is when one believes that a model is adequate, but there are still minor discrepancies between model predictions and the available experimental data. The usual cause for this situation is that the uncertainty in some model parameters, such as the reaction probability constants, leads to uncertainties in simulations that are greater than the uncertainties in the experimental data. To investigate whether minor adjustments to model parameters will improve the correlation between model predictions and experimental data, various optimization techniques can be employed. To successfully adjust model parameters based on experimental data, it is necessary to compute a large number of simulations to adequately estimate the behavior of the system each time model parameters are variedin the optimization algorithm. If too small a number of simulations are used, the fluctuations in model predictions as the model parameters are varied from one cycle to the next of the optimization process will be so great as to limit the usefulness of the optimization procedure. The larger the number of simulations the better the estimate of the model prediction, thus reducing this additional source of error that is not present when fitting solutions of C-D ODEs to experimental data.
When sufficiently large numbers of simulations are performed, it becomes reasonable to generate discrete histograms of the distributions of state variables at defined times. Figure 9 shows the distribution of several state variables at two time points, one during active transcription and translation (t = 1000 sec) and the other at the end of the simulation (t = 3600 sec). These histograms were generated using 1000 individual simulation. This is an estimate of the probability that the system would be in that particular state at the defined time t, i.e., the state probability density P(s, t|s0, t0). During gene transcription and translation (t = 1000 sec), the free polymerase probability density (Figure 9(A)) is distributed among the five possible states from no polymerases bound (s001 = 301) to four polymerases bound (s001 = 297) with two polymerases bound (s001 = 299) being the most probable state. At the end of the simulation (t = 3600), the most probable state is with four polymerases bound (prob = 0.7) and furthermore, the states with zero or one polymerase bound (s001 = 300 or 301) are not populated (prob = 0). This is due to the fact that when transcription terminates, two polymerases are stalled at the start sites of the two genes. This is seen in Figure 8(B), where before transcription terminates (t = 1000 sec), the start site of gene A is only occupied with a probability of approximately 0.3 while after transcription terminates it is occupied with a probability of 1.0. Figure 9(C) and 9(D) are the estimates of the state probability density functions for the free gene A mRNA (geneA_mRNA, s009) and the total number of protein A molecules synthesized (Pro_A*, t250). Both of these state variables increase with time while transcription and translation are active. Furthermore, the number of possible states available to each state variable increases and is only limited by the total number of molecules of mRNA and protein A synthesized. The free geneA mRNA state variable is influenced by five reactions (its synthesis (r4), its degradation (r5), its association and dissociation with the small ribosomal unit (r6 and r7), and its release after translation (r8)), while the fictitious total protein synthesized variable is only a counter for the number of transcription reactions (r8). The variability in Pro_A* is a consequence of the stochasticity in the transcription reaction alone, while the variability in geneA_mRNA is a combination of the variability in the total number of mRNA molecules transcribed at a given time and the variability introduced by the other reactions that influence that state variable. Discrete histograms of the state variables give a better visualization of the true nature of the state probability density function than just the first and second moments.
Comparison between single and multi-processor simulation runs
As is apparent from the discussion above, the ability to accurately estimate the first and second moments of the state density function of the system under investigation using the Monte Carlo simulation approach increases with increasing numbers of simulations. Various analysis techniques, such as optimization (data fitting) and sensitivity analysis, require repeated batch jobs of large numbers of simulation to obtain statistically valid results. This may not be a problem for a relatively simple biomolecular reaction system, but as the complexity of the system model increases this will increase the computational demand. The need for high performance computing becomes essential.
Running a simulation session as a batch job on multi-processor HPC hardware entails a certain amount of overhead, e.g., the time it takes to break up the job into smaller tasks and assign the problem to each processor on the front end and the data collection and storage on the back end. As a result, the speed-up gained by using multi-processor hardware is to a degree dependent on how computationally intensive the problem is. For a relatively simple problem that is not particularly computationally intensive, the majority of the clock time for the simulation session is taken up with overhead. Whereas, for a problem that is computationally intensive, the computations involved in the actual simulation are the time consuming component of the simulation process.
To further explore this effect, we repeated the test with a batch job of significantly less computational complexity. Using a one gene model with lumped reactions (denoted geneA_CFTT_0p0 – 1× model with a total of 45 state variables and 12 reactions), the speed-up for a batch job consisting of 10,000 simulation runs was calculated using up to 50 processors. The clock time for a single processor to run the batch job was 1,920 sec and the speed-up is shown in Figure 10(B). In this case, the computational demand for the actual simulations is relatively small and the overhead becomes a dominant factor. Given these conditions, using more than 20 processors provides little benefit. The computational complexity of this relatively simple model can be increased by increasing the number of plasmids per reaction network from 1 to 10 and the number of substrate molecules by a factor of 10 (denoted the geneA_CFTT_0p0 – 10× model). This is equivalent to having 10 plasmids containing geneA present in the same reaction volume with ten times the number of substrate molecules available. The speed-up results using the 10× model are also given in Figure 10(B). Here the value of additional processors is clearly apparent even when 50 processors are accessed.
In this manuscript, we have tried to point out some of the issues that arise in interpreting the results of modeling and simulating discrete stochastic systems. The issues of how the snapshot interval affects the visualization of state variables and how the time-averaging interval affects the estimation of the time-averaged reaction event rates illustrate how simulation and analysis parameters can influence the interpretation of system behavior. Probably even more important is how the number of simulations affect the estimation of the mean and SD of state variables. Unless sufficient numbers of simulations are conducted, these estimates will not be adequate to: (1) observe the details of the temporal dynamics of state variables, (2) accurately estimate the variability between simulations, or (3) when optimization procedures are being employed, to expect consistent convergence of solutions.
An important point to remember is that the stochastic nature of individual state variables is to some extent model dependent, i.e., will depend on: (1) the relationships between state variables, (2) the mathematical forms of the reaction propensities, (3) the values for reaction probability constants, and (4) initial conditions for state variables. For example, if the diameter of the vesicle was increased 10 times (from 1.0 to 10.0 μ m), then the initial numbers of molecules of each state variable would be increased 125 times (except for the number of genes as determined by the number of gene promoter sites which would still be 2, one for each gene). In this case, the effect of stochastic processes on some state variables, such as the transfer RNAs, would be significantly diminished while the effect on others, i.e., transcription, would remain.
In addition, we investigated the generic behavior of a biomolecular reaction network consisting of the expression of two genes in a cell-free transcription-translation system enclosed in a artificial reaction vessel. Such a closed system does not reach a steady state and generates a different class of kinetics than is found in systems where it is assumed there is an infinite supply of substrates and energy and all waste products are 'taken care of' by the system. The closed system investigated here simply 'dies out' when critical components are exhausted. The ability to simulate such a system allows one to identify the critical factors that limit the performance of the system. Although the model for the system is still relatively crude, it is clear that the availability of limiting amino acids controls the ultimate expression of proteins, the availability of GTP limits transcription of the plasmid genes to form mRNA and the availability of substrates (including ATP) for the catalytic ligation reactions limits the generation of the final products. This quantitative knowledge can be used, for example, to optimize the system to maximize production of products, either proteins or metabolites. As the models for the CFTT-vesicle system become more sophisticated, a more detailed understanding of the behavior of these biological constructs will evolve.
The BNS software is available to all researchers through the website below.
Project name: Biomolecular Network Simulator
Project home page: http://www.bhsai.org/bns_alpha.html
Project downloads: http://www.bhsai.org/downloads/bns_release.zip
License: GNU GPL
Operating systems: Platform independent
Programming Language: C, MATLAB
Other Requirements: MATLAB 6.5 or newer; MatlabMPI and MPI libraries for multiprocessor execution.
The work of Yaroslav Chushak was made possible by a grant from the Department of Defense High Performance Computing Modernization Program Office (HPCMPO). The work of Brent Foy was made possible by a grant from the Air Force Office of Scientific Research (AFOSR) and by the Air Force Summer Faculty Fellowship Program. DISCLAIMER: The opinions and assertions contained herein are the private views of the authors and are not to be construed as official or as reflecting the views of the U.S. Army, U.S. Air Force, or the U.S. Department of Defense. This paper has been approved for public release with unlimited distribution (Case Number: 88ABW-2008-0859).
- Elowitz MB, Levine AJ, Siggia ED, Swain PS: Stochastic gene expression in a single cell. Science. 2002, 297: 1183-1186. 10.1126/science.1070919View ArticlePubMedGoogle Scholar
- Kaern M, Elston TC, Blake WJ, Collins JJ: Stochasticity in gene expressions: From theories to phenotypes. Nat Rev Genetics. 2005, 6: 451-464. 10.1038/nrg1615.View ArticlePubMedGoogle Scholar
- Voit EO: Computational Analysis of Biochemical Systems. 2000, Cambridge, UK: Cambridge University PressGoogle Scholar
- Gillespie D: Exact stochastic simulation of coupled chemical reactions. J Phys Chem. 1977, 81: 2340-2361. 10.1021/j100540a008.View ArticleGoogle Scholar
- Szallasi Z, Stelling J, Periwal V: Systems Modeling in Cell Biology, From Concepts to Nuts and Bolts. 2006, Cambridge, MA: MIT PressView ArticleGoogle Scholar
- Ishikawa K, Sato K, Shima Y, Urabe I, Yomo T: Expression of a cascading genetic network within liposomes. FEBS Letters. 2004, 576: 387-390. 10.1016/j.febslet.2004.09.046View ArticlePubMedGoogle Scholar
- Noireaux V, Libchaber A: A vesicle bioreactor as a step toward an artificial cell assembly. Proc National Acad Sci USA. 2004, 101: 17669-17674. 10.1073/pnas.0408236101.View ArticleGoogle Scholar
- Noireaux V, Bar-Ziv R, Godefroy J, Salman H, Libchaber A: Toward an artificial cell based on gene expression in vesicles. Phys Bio. 2005, 2: 1-8. 10.1088/1478-3975/2/3/P01.View ArticleGoogle Scholar
- Oberholzer T, Albrizio M, Luisi PL: Polymerase chain reaction in liposomes. Chem Bio. 1995, 2: 677-682. 10.1016/1074-5521(95)90031-4.View ArticleGoogle Scholar
- Pohorille A, Deamer D: Artificial cells: prospects for biotechnology. TRENDS in Biotech. 2002, 20: 123-128. 10.1016/S0167-7799(02)01909-1.View ArticleGoogle Scholar
- Yu W, Sato K, Wakabayashi M, Nakaishi T, Ko-Mitamura EP, Shima Y, Urabe I, Yomo T: Synthesis of functional protein in liposome. J Biosci Bioeng. 2001, 92: 590-593. 10.1263/jbb.92.590View ArticlePubMedGoogle Scholar
- Gillespie D: Approximate accelerated stochastic simulations of chemically reacting systems. J Chem Phys. 2001, 115: 1716-10.1063/1.1378322.View ArticleGoogle Scholar
- Mendez P: Biochemistry by numbers: Simulation of biochemical pathways with Gepasi 3. Trends Biochem Sci. 1997, 22: 361-363. 10.1016/S0968-0004(97)01103-1View ArticleGoogle Scholar
- Sauro HM: Jarnac: A system for interactive metabolic analysis. Animating the Cellular Map: Proceedings of the 9th International Meeting on BioThermoKinetics. 2000, Stellenbosch University Press, Stellenbosch, ZAGoogle Scholar
- de Jong H, Geiselmann J, Hernandez C, Page M: Genetic network analyzer: Quantitative simulation of genetic regulatory networks. Bioinfor. 2003, 19: 336-344. 10.1093/bioinformatics/btf851.View ArticleGoogle Scholar
- Adalsteinsson D, McMillen D, Elston TC: Biochemical Network Stochastic Simulator (BioNetS): software for stochastic modeling of biochemical networks. BMC Bioinform. 2004, 5: 24-10.1186/1471-2105-5-24.View ArticleGoogle Scholar
- Dhar P, Meng TC, Somani S, Ye L, Sairam S, et al.: Cellware – a multi-algorithmic software for computational systems biology. Bioinform. 2004, 20: 1319-1321. 10.1093/bioinformatics/bth067.View ArticleGoogle Scholar
- Ramsey S, Orrell D, Bolouri H: Dizzy: Stochastic simulation of large-scale genetic regulatory networks. J Bioinform Comput Biol. 2005, 3: 415-436. 10.1142/S0219720005001132View ArticlePubMedGoogle Scholar
- Takahashi K, Kaizu K, Hu B, Tomita M: A multi-algorithm, multi-scale method for cell simulation. Bioinform. 2004, 20: 538-546. 10.1093/bioinformatics/btg442.View ArticleGoogle Scholar
- Chushak Y, Foy B, Frazier J: Biomolecular Network Simulator: Software for Stochastic Simulation of Cellular Biological Processes. Proceeding of the Spring Simulation Multiconference. 2007, 1: 345-349.Google Scholar
- Gillespie D: Markov Processes: An Introduction for Physical Scientist. 1991, Academic Press, London, New York, San DiegoGoogle Scholar
- Hucka M, Finney A, Sauro HM, Bolouri H, et al.: The systems biology markup language (SBML): A medium for representation and exchange of biochemical network models. Bioinfor. 2003, 19: 524-531. 10.1093/bioinformatics/btg015.View ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.