Experimental and computational validation of models of fluorescent and luminescent reporter genes in bacteria
© de Jong et al; licensee BioMed Central Ltd. 2010
Received: 27 October 2009
Accepted: 29 April 2010
Published: 29 April 2010
Fluorescent and luminescent reporter genes have become popular tools for the real-time monitoring of gene expression in living cells. However, mathematical models are necessary for extracting biologically meaningful quantities from the primary data.
We present a rigorous method for deriving relative protein synthesis rates (mRNA concentrations) and protein concentrations by means of kinetic models of gene expression. We experimentally and computationally validate this approach in the case of the protein Fis, a global regulator of transcription in Escherichia coli. We show that the mRNA and protein concentration profiles predicted from the models agree quite well with direct measurements obtained by Northern and Western blots, respectively. Moreover, we present computational procedures for taking into account systematic biases like the folding time of the fluorescent reporter protein and differences in the half-lives of reporter and host gene products. The results show that large differences in protein half-lives, more than mRNA half-lives, may be critical for the interpretation of reporter gene data in the analysis of the dynamics of regulatory systems.
The paper contributes to the development of sound methods for the interpretation of reporter gene data, notably in the context of the reconstruction and validation of models of regulatory networks. The results have wide applicability for the analysis of gene expression in bacteria and may be extended to higher organisms.
Fluorescent and luminescent reporter genes are popular tools for quantifying gene expression. The underlying principle of the technology is to fuse the promoter region and possibly (part of) the coding region of a gene of interest to a reporter gene. The reporter gene can be expressed from a (low-copy) plasmid or integrated at a suitable location in the host chromosome. The expression of the reporter gene generates a visible signal (fluorescence or luminescence) that is easy to capture and reflects the expression of the gene of interest (e.g., [1–5]).
The use of reporter genes allows real-time monitoring of gene expression, both at the level of individual cells and cell populations. By means of single-cell fluorescence and luminescence microscopy, fluctuations in gene expression due to internal and external noise can be measured. This has led to new insights into the ways cells both reduce and exploit these fluctuations (see [6–8] for reviews). Automated microplate readers measure gene expression of cell populations rather than individual cells. The lower resolution is compensated by a substantially higher throughput, as several dozens of genes can be monitored in parallel, at a much higher precision and sampling density than is currently possible by means of, e.g., DNA microarrays. The availability of libraries of fluorescent and luminescent reporter gene plasmids has further contributed to the potential of the technology [9, 10].
Several examples of the real-time quantification of reporter gene expression on the population level have appeared in the literature in recent years. These examples include the monitoring of gene expression in the lysis-lysogeny decision in bacteriophage λ, the oxidative stress and DNA damage response [13, 14] in E. coli, the thermal induction of virulence factors in Y. pestis, the mapping of the regulatory region of the lac operon, and the dynamics of synthetic genetic regulatory networks. In a typical microplate experiment, 96 cultures are followed in parallel, over several hours. This results in large amounts of data, of the order of 10,000-100,000 measurements of absorbance and fluorescence and luminescence intensities per experiment. In order to meaningfully interpret these data, we need to assess what exactly reporter gene measurements can teach us about the actual processes going on in the cell. Mathematical models have been shown to be critical for inferring biologically relevant quantities from reporter gene data (e.g., [13, 18–23]). Most approaches present ways to infer the promoter activity from the primary data. By genetic construction, the measured promoter activity of a reporter gene carries over to any host gene that is under the control of the same promoter. Some studies have inferred the concentration profile of a transcription factor controlling the promoter by means of a known or hypothesized kinetic expression for the mechanism by which the transcription factor controls the promoter (see [13, 20] for good examples). Another approach is to reconstruct (relative) measures of the reporter mRNA and protein concentrations from the data and use these as estimates of the corresponding products of the host gene. This approach is intuitively attractive, as it allows a straightforward read-out of the expression of any gene whose regulatory sequences are cloned into a reporter construct.
However, it poses the question of the accuracy of the estimates, because the kinetics of host and reporter gene expression may be different. The aim of this paper is to systematically investigate this question by means of a combination of models and experiments. Our specific contributions are the experimental validation of the approach by comparing the quantities reconstructed from the reporter gene data with direct measurements of the accumulation of mRNA and protein, obtained by Northern and Western blots, respectively. Moreover, we use the models to pinpoint potential systematic biases arising from the folding time of fluorescent reporter proteins, and from differences in the half-lives of the products of host and reporter genes. This allows us to correct for the resulting systematic errors in the measurements and obtain a more accurate estimate of synthesis rates and concentrations of the host protein.
To illustrate the interest of this approach for the analysis of gene expression in bacteria, we have constructed fluorescent and luminescent reporter systems of the gene fis of E. coli. More specifically, we have cloned the fis promoter into plasmids containing either a gene coding for a Green Fluorescent Protein (GFP), or an operon encoding the enzymes of a light-producing reaction catalyzed by bacterial luciferase. The E. coli host gene codes for the protein Fis, a global regulator of transcription that plays a central role in, among other things, the control of metabolism and the coupling of the DNA topology to cellular physiology. The expression pattern of fis has been thoroughly investigated before: fis expression is induced after a glucose upshift and decreases subsequently when the bacteria enter the exponential phase of growth [25–27]. It thus serves as an ideal example of a transient response in bacterial gene expression. A first interesting finding is that the relative mRNA and protein concentrations obtained from the reporter gene data are in good overall correspondence with the Northern and Western blot measurements, respectively. This suggests that the use of fluorescent and luminescent reporter genes in combination with automated microplate readers may yield reasonably accurate estimates of the expression profile of the products of the host gene. Second, we show that corrections for systematic biases due to differences in the half-lives of reporter and host mRNAs have mostly negligible effects, whereas corrections for differences in the half-lives of reporter and host proteins further improve the agreement between the inferred Fis concentration profiles and the Western blots. This conclusion, strengthened by simulation studies, suggests that the latter differences may need to be taken into account when using reporter gene data for the reconstruction of regulatory networks.
Our work has wide applicability for the interpretation of measurements of gene expression in microorganisms.
Plasmids and strains
Escherichia coli strain BW25113 was used as a wild-type strain. The plasmids used in this study are listed in Section S6 of the Additional file 1. The gfp- and lux-containing plasmids (pZEgfp and pSBluc) are derivatives of plasmids pZE1RM and pSB377, respectively, with a modified sequence of the multiple cloning site. The sequence between the end of the multiple cloning site (EcoR I) and the start codon (ATG) of luxC and gfp is: gaattcCCCG GGTAATTCAG GCCTGGAGGA TACGTatg and gaattcCCCG GGTAATTCAT TAAAGAGGAG AAAGGTACCG Catg, respectively. We have amplified the promoter region of fis by PCR from genomic DNA of E. coli, with oligonucleotides Fis1 and Fis2 (Fis1: ATCGCTCGAG GTGACGCGG, Fis2: TACG GAATTC GAGTTAAGAA ATGACCATAC TGTGA). Oligonucleotide Fis1 contains an Xho I restriction site, and oligonucleotide Fis2 an EcoR I restriction site, which allows cloning of the amplified DNA between these two sites on plasmids pSBluc and pZEgfp. The resulting plasmids are called pSB-fislux and pZE-fisgfp, respectively. Plasmids were verified by sequencing. They possess a colE1 origin of replication, are present at about twenty copies per cell, and do not affect bacterial growth (data not shown).
Glycerol stocks, stored at -80°C, of strains BW25113 carrying (or not) a plasmid-encoded reporter gene were grown overnight (≈ 15 h) at 37°C, with shaking at 200 rpm, in M9 minimal medium supplemented with 0.3% glucose. For plasmid-carrying strains, the growth medium was supplemented with 100 μg·ml-1 ampicillin. The overnight culture was diluted 20-fold into the same, fresh medium. After 4 hours of growth the culture medium was changed by centrifugation and resuspension in M9 without glucose. The volume was adjusted in order to obtain an OD600 of 0.2. The bacteria were incubated without nutrients at 37°C for an additional 15 hours. Abruptly limiting the glucose availability in this fashion ensures that the bacteria are in a defined physiological state at the beginning of the experiment. For the upshift experiments, 50 μl of these growth-arrested cultures were added to 100 μl of prewarmed M9 medium, containing glucose at a final concentration of 0.15%, and grown in a microtiter plate (≈ 12 h) at 37°C. The microplates were agitated at regular intervals during growth in the Fusion microplate reader (Perkin Elmer). During a typical experimental run we acquire about 100 readings each of absorbance, luminescence, and fluorescence. Fluorescence excitation was at 485 nm and emission was monitored at 520 nm. Absorbance measurements used a 600 nm filter.
The absorbance, luminescence, and fluorescence data were fitted with regression splines, using the Spline toolbox of Matlab (Mathworks). In the absence of a specific parametric model of the data, regression splines provide a flexible, non-parametric modeling framework that allows estimation of the underlying trend in the absorbance and light intensity. In particular, we have used cubic B-splines in combination with the generalized cross-validation (GCV) criterion for determining the number and the placement of the knots. The optimal spline fit is the one minimizing GCV, that is, minimizing the residual sum of squares subject to a penalty term increasing with the number of knots (Section S2 of the Additional file 1). In order to find an estimate of the minimizer of GCV, and therefore of the 'best' choice of knots, we have followed a simple, stepwise knot selection scheme. The actual computation of the regression spline from a knot sequence is carried out by the Matlab function spap2.
A major advantage of the use of splines is that they greatly facilitate the computation of derived quantities from the primary data. Since splines are piecewise-polynomial functions, standard arithmetic operations, as well as differentiation and integration operations, can be carried out analytically. This is more efficient and leads to more precise results than the use of numerical approximations. The latter cannot be completely avoided though, as some of the expressions that need to be evaluated for the computation of the host protein synthesis rate and host protein concentration involve functions that are not splines (Section S4 of the Additional file 1). In this case the integrals are computed by means of the Matlab function quad.
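The analytic differentiation and integration of piecewise polynomials mentioned above can be illustrated with a minimal sketch. It is written in Python rather than Matlab, and the class `PiecewisePoly` and its coefficient layout are hypothetical illustrations, not the representation used by the Spline toolbox.

```python
from bisect import bisect_right


class PiecewisePoly:
    """Piecewise polynomial (hypothetical helper, for illustration only).

    On the interval [breaks[i], breaks[i+1]] the value is
    sum(c[k] * (t - breaks[i])**k) with c = coefs[i].
    """

    def __init__(self, breaks, coefs):
        self.breaks = list(breaks)
        self.coefs = [list(c) for c in coefs]

    def _piece(self, t):
        # Index of the piece containing t, clamped to the valid range.
        i = bisect_right(self.breaks, t) - 1
        return min(max(i, 0), len(self.coefs) - 1)

    def __call__(self, t):
        i = self._piece(t)
        x = t - self.breaks[i]
        return sum(c * x ** k for k, c in enumerate(self.coefs[i]))

    def derivative(self):
        # d/dx of c_k x^k is k c_k x^(k-1): exact, no finite differences.
        d = [[k * c for k, c in enumerate(cs)][1:] or [0.0] for cs in self.coefs]
        return PiecewisePoly(self.breaks, d)

    def integral(self, a, b):
        # Exact integral over [a, b] (a <= b, inside the break range),
        # using the antiderivative c_k x^(k+1) / (k+1) on each piece.
        total = 0.0
        for i, cs in enumerate(self.coefs):
            lo = max(a, self.breaks[i])
            hi = min(b, self.breaks[i + 1])
            if hi <= lo:
                continue
            x_lo, x_hi = lo - self.breaks[i], hi - self.breaks[i]
            total += sum(c * (x_hi ** (k + 1) - x_lo ** (k + 1)) / (k + 1)
                         for k, c in enumerate(cs))
        return total


# Two pieces: 1 + 2x + 3x^2 on [0, 1] and 6 + 8(x - 1) on [1, 2].
pp = PiecewisePoly([0.0, 1.0, 2.0], [[1.0, 2.0, 3.0], [6.0, 8.0]])
slope_at_half = pp.derivative()(0.5)
area = pp.integral(0.0, 2.0)
```

Because every operation acts on the coefficients, the derivative and the integral carry no discretization error beyond floating-point rounding.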
For each of the derived quantities, we computed 95% confidence bands using a standard bootstrap method. In particular, we have followed the residual resampling scheme, which constructs bootstrap data sets by repeatedly resampling the residuals of the optimal spline fit (Section S5 of the Additional file 1). For each of the 200 bootstrap data sets generated, we computed the synthesis rates and concentrations of the host and reporter proteins. From this empirically determined distribution, we obtained an estimate of the 95% confidence interval for the predicted values at evenly-spaced time-points, using so-called bootstrap percentiles. The confidence bands shown in the figures in the text have been obtained by connecting the estimates of the point-wise confidence intervals.
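The residual resampling scheme described above can be sketched as follows. This is an illustration under simplifying assumptions: a straight-line least-squares fit stands in for the optimal spline fit, and the function names `fit_line` and `residual_bootstrap_band` are hypothetical. Only the 200 bootstrap data sets and the 95% percentile band follow the text.

```python
import random


def fit_line(ts, ys):
    """Ordinary least-squares line; a stand-in for the spline fit."""
    n = len(ts)
    tm = sum(ts) / n
    ym = sum(ys) / n
    slope = (sum((t - tm) * (y - ym) for t, y in zip(ts, ys))
             / sum((t - tm) ** 2 for t in ts))
    return ym - slope * tm, slope


def residual_bootstrap_band(ts, ys, n_boot=200, level=0.95, seed=0):
    """Point-wise percentile band obtained by resampling fit residuals."""
    rng = random.Random(seed)
    a, b = fit_line(ts, ys)
    fitted = [a + b * t for t in ts]
    resid = [y - f for y, f in zip(ys, fitted)]
    curves = []
    for _ in range(n_boot):
        # New data set: fitted curve plus residuals drawn with replacement.
        y_star = [f + rng.choice(resid) for f in fitted]
        a_s, b_s = fit_line(ts, y_star)
        curves.append([a_s + b_s * t for t in ts])
    # Simple percentile indices into the sorted bootstrap values.
    lo_idx = int((1 - level) / 2 * (n_boot - 1))
    hi_idx = int((1 + level) / 2 * (n_boot - 1))
    lower, upper = [], []
    for j in range(len(ts)):
        col = sorted(c[j] for c in curves)
        lower.append(col[lo_idx])
        upper.append(col[hi_idx])
    return fitted, lower, upper
```

In the procedure of the text, each bootstrap data set would be refitted with a spline and pushed through the computation of synthesis rates and concentrations before the percentiles are taken. The sketch takes percentiles of the refitted curve only.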
The fluorescence background is determined by measuring the fluorescence of a strain carrying the promoterless vector pZEgfp. The background fluorescence is not constant, but rather varies with the population size due to the autofluorescence of bacterial cells. In this case, direct subtraction of the background readings from the uncorrected fluorescence intensity at each time-point t is not appropriate, as the size of the bacterial population generating the uncorrected signal is generally different from the size of the population generating the background signal.
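One possible way to handle a population-dependent background, given here as an assumption and not as the procedure of the paper, is to treat the background fluorescence as a function of absorbance and to subtract the background value at matching absorbance instead of at matching time. The function names are hypothetical, and the sketch assumes that the background absorbance readings increase monotonically.

```python
def interp(x, xs, ys):
    """Linear interpolation of (xs, ys) at x; xs increasing, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + w * (ys[i] - ys[i - 1])


def correct_fluorescence(abs_sample, fluo_sample, abs_background, fluo_background):
    """Subtract the background fluorescence evaluated at the sample's absorbance.

    Assumption: the autofluorescence of the promoterless strain depends only
    on population size, as measured by the absorbance.
    """
    return [f - interp(a, abs_background, fluo_background)
            for a, f in zip(abs_sample, fluo_sample)]
```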
Western and Northern blot analysis
Equal quantities of protein were separated on 18% SDS-PAGE acrylamide gels and transferred onto nitrocellulose filters (Amersham Pharmacia). Filters were incubated with anti-Fis antibodies. Immunoblots were developed by using horseradish peroxidase-conjugated goat anti-rabbit antibody, followed by enhanced chemiluminescence (Amersham). The image of the blot acquired with a highly sensitive CCD camera and averaged for two minutes was quantified using the ImageJ software.
Total RNA was extracted from cells using the hot phenol procedure, or the Trizol procedure (Invitrogen). RNA samples were stored in DEPC water at -80°C until further use. The total RNA was loaded on a polyacrylamide (6% TBE-Urea, Invitrogen) or agarose gel (1%). After migration, the RNA was transferred to a Hybond-N membrane (Amersham Biosciences) and crosslinked with UV (1200 J). The membrane was prehybridized in Ultrahyb (Ambion) for 1 h at 42°C, followed by addition of radiolabeled oligonucleotide probe and hybridization overnight at 42°C. Membranes were washed twice with 2× SSC/0.1% SDS at room temperature followed by one wash with 2× SSC/0.1% SDS at 42°C for 2 min. Oligonucleotide probes were labelled by polynucleotide kinase according to the manufacturer's protocols (Fermentas) using [32P] ATP (6000 Ci/mmole; Perkin-Elmer). Probes were purified over mini quick spin columns (Roche) prior to use. Membranes were exposed on a phosphor screen, the screen revealed on a FLA-8000 (Fujifilm), and the image of the film quantified using ImageJ. The sequences of the probes used are listed in Section S6 of the Additional file 1.
Measurement of degradation constants
To determine the degradation constant γ_q of the GFP reporter of fis, we grew a bacterial culture under the experimental conditions described above to exponential phase and added chloramphenicol to 100 μg/ml. The fluorescence data obtained after growth arrest were fitted by an exponential to yield the degradation constant. A similar procedure was followed for the luciferase reporter. A value for the degradation constant γ_p of Fis was obtained by growing cells to the same growth stage and treating them with spectinomycin (100 μg/ml). Samples of 1 ml were removed every hour for 5 h and treated as described in the section on Western blot analysis. An exponential fit gave the value of γ_p.
To determine the degradation constant γ_n of the reporter mRNA, strains BW25113 containing either plasmid pZACR105 (gfp) or pZACR101 (lux) were used (Section S6 of the Additional file 1). In these plasmids, the gfp gene or the lux operon is cloned downstream of the PLtetO-1 promoter that is controlled by the TetR repressor (Ranquet et al., in preparation). Derepression of the promoter is achieved by adding anhydrotetracycline (aTc). The strains were grown at 37°C to mid-log phase in LB medium, and aTc (500 ng/ml final) was added for 30 min to induce transcription of gfp or lux. Rifampicin (150 μg/ml final) was then added to stop transcription and samples were taken every minute for 10 min. mRNA was isolated and detected as described in the section on Northern blot analysis. The degradation constant γ_m of fis messages was determined by growing the strain BW25113 in LB to mid exponential phase, where Fis is the most abundant. Rifampicin was added and the mRNA was extracted as described above.
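The exponential fits used to obtain the degradation constants can be sketched as a least-squares line through the logarithm of the signal. This log-linear fit is a simple substitute chosen for illustration. The text does not say how the exponential was fitted, and the function names are hypothetical.

```python
import math


def fit_decay_constant(times, signal):
    """Estimate gamma in signal(t) = s0 * exp(-gamma * t).

    Fits a straight line to ln(signal) versus time by least squares and
    returns minus its slope. All signal values must be positive.
    """
    logs = [math.log(s) for s in signal]
    n = len(times)
    tm = sum(times) / n
    lm = sum(logs) / n
    slope = (sum((t - tm) * (l - lm) for t, l in zip(times, logs))
             / sum((t - tm) ** 2 for t in times))
    return -slope


def half_life(gamma):
    """Half-life corresponding to a first-order degradation constant."""
    return math.log(2) / gamma
```

With times in minutes, the returned constant has units of min-1, so that a degradation constant of 0.012 min-1 corresponds to a half-life of roughly 58 min.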
Modeling reporter gene systems
In order to measure the expression of the gene fis in E. coli, we have constructed two reporter plasmids with identical backbones, including the antibiotic resistance gene and the origin of replication. The first contains the gfpmut3*-asv reporter gene, a variant of the gene coding for the Green Fluorescent Protein (GFP) from the jellyfish Aequorea victoria. The second plasmid carries the luxCDABE operon from Xenorhabdus luminescens, encoding the enzymes of a light-producing pathway in this bacterium. Because fis has its expression controlled at the transcriptional level [26, 27, 40], we prepared transcriptional fusions in which the promoter region of fis is fused to the gfp gene or the lux operon.
Transcription of the gene fis gives rise to fis mRNA, which is subsequently translated into Fis protein. The synthesis of mRNA and protein is counterbalanced by growth dilution and degradation of the gene products. Together these processes determine the net accumulation of mRNA and protein in the cell. The expression of the gfp reporter gene follows roughly the same stages, with an important difference though. Fluorescent activity of GFP in response to light excitation depends on post-translational modifications, notably the folding of the protein to an appropriate conformation, including the autocatalytic formation of the chromophore. This maturation process gives rise to an additional reaction step from GFP to active GFP (Figure 1). In the luminescent reporter gene system, light is not emitted in response to an excitatory signal, but as a by-product of an oxidation reaction. This reaction is catalyzed by the heterodimeric enzyme luciferase and requires a substrate, a long-chain aldehyde, which is synthesized by enzymes co-expressed with luciferase from the lux operon.
Variables and constants used in the models of the expression of the host and reporter genes.
m(t): host mRNA concentration [M]
n(t): reporter mRNA concentration [M]
p(t): host protein concentration [M]
q(t): total reporter protein concentration [M]
r(t): active reporter protein concentration [M]
Promoter activity and growth rate
f(t): promoter activity [dimensionless]
growth rate [min-1]
κ_m: transcription rate constant [M min-1]
translation rate constant [min-1]
κ_r: folding rate constant [min-1]
γ_m: host mRNA degradation constant [min-1]
γ_n: reporter mRNA degradation constant [min-1]
γ_p: host protein degradation constant [min-1]
γ_q: reporter protein degradation constant [min-1]
Here, r(t) stands for the concentration of active GFP, as compared to the total GFP concentration q(t), and κ_r is the rate constant for the first-order folding reaction. The term κ_r (q(t) - r(t)) thus represents the folding rate, and we call ln 2/κ_r the folding time of GFP. The model (6)-(8) can, with some variations, be found in other work [20–23, 46].
Notice that a number of implicit assumptions underlie the above models of host and reporter gene expression. First, the promoter activity κ_m f(t) characterizes the transcription of both the host and reporter genes, which is a direct consequence of the use of transcriptional fusions to measure fis expression. Second, we assume that the translation constant is the same for host and reporter gene expression. In the case of Fis this is justified by the fact that translation is not regulated [26, 27, 40]. In situations where this assumption is not valid, and post-transcriptional regulation occurs, translational fusions to the gfp reporter gene should be used. Third, the degradation constants of active and inactive GFP are assumed to be identical, which is reasonable in the absence of evidence to the contrary. Fourth, delays in transcription and translation are small with respect to the folding time and can safely be ignored here. Fifth, the growth characteristics of the wild-type and reporter strains are the same, an assumption that we have validated by comparing the growth rates of the two strains (data not shown).
That is, the total luciferase concentration equals the active luciferase concentration (a more detailed model of the luminescent reporter system has been described elsewhere).
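The GFP reporter model discussed above can be sketched numerically. The equations themselves are not reproduced in this text, so the form used below is an assumption pieced together from the description: synthesis of reporter mRNA at rate κ_m f(t), translation with a common constant, first-order folding κ_r (q - r), and loss of every species by degradation plus growth dilution. The names `kappa_p` and `mu`, the default parameter values, and the forward Euler scheme are illustrative choices.

```python
def simulate_reporter(f, mu, t_end, dt=0.1, kappa_m=1.0, kappa_p=1.0,
                      kappa_r=0.023, gamma_n=0.30, gamma_q=0.012):
    """Forward Euler integration of an assumed reporter model.

    dn/dt = kappa_m * f(t) - (gamma_n + mu(t)) * n      reporter mRNA
    dq/dt = kappa_p * n    - (gamma_q + mu(t)) * q      total reporter protein
    dr/dt = kappa_r * (q - r) - (gamma_q + mu(t)) * r   active (folded) reporter

    f and mu are functions of time (promoter activity and growth rate).
    Returns a list of (t, n, q, r) tuples, starting from zero concentrations.
    """
    n = q = r = 0.0
    t = 0.0
    traj = [(t, n, q, r)]
    for _ in range(int(round(t_end / dt))):
        dn = kappa_m * f(t) - (gamma_n + mu(t)) * n
        dq = kappa_p * n - (gamma_q + mu(t)) * q
        dr = kappa_r * (q - r) - (gamma_q + mu(t)) * r
        n, q, r = n + dt * dn, q + dt * dq, r + dt * dr
        t += dt
        traj.append((t, n, q, r))
    return traj


# Constant promoter activity and growth rate: the system relaxes to a steady state.
trajectory = simulate_reporter(lambda t: 1.0, lambda t: 0.01, t_end=2000.0)
```

For a luciferase reporter, where the total and active concentrations coincide, the third equation would simply be dropped and r replaced by q.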
Measurements by means of reporter gene systems
We have grown E. coli strains carrying the reporter plasmids in parallel on a microplate, in M9 minimal medium, and at a constant temperature of 37°C. The basic experiment consisted in adding glucose to a growth-arrested culture, following the protocol described in the Methods section, and repeatedly measuring the absorbance at 600 nm, as well as fluorescence and luminescence intensities. The time-series data were fitted to cubic regression splines using a minimization criterion that balances goodness of fit and parsimony (Methods and Additional file 1). The resulting spline fits of the primary data were corrected for background levels of absorbance, fluorescence, and luminescence. The background measurements were carried out on wells containing growth medium without bacteria (absorbance background), and on wells with strains carrying a reporter plasmid lacking a promoter upstream of the reporter gene (fluorescence and luminescence background) (see Methods).
Panels b and d of Figure 2 show 95% confidence bands for the corrected absorbance and light intensity which were computed using the bootstrap method described in the Methods section. The confidence bands are tight, reflecting the high precision of the measurements, and the curves are reproducible (see Section S3 in the Additional file 1).
Computation of reporter concentrations and synthesis rates
Since we do not know the proportionality constant in (10), we express concentrations in units RFU and RLU of the ratio I(t)/A(t). Notice that this provides a relative quantification of concentrations, as is usual in this kind of experiment. For most purposes, however, the relative concentrations are informative and robust measures of the dynamics of the system, for instance when we are interested in fold changes over the time-course of the experiment (see Discussion below). When this does not lead to ambiguities, we simply speak of concentrations instead of relative concentrations when we refer to variables with units RLU and RFU.
The degradation constant γ_q in (11) was measured as described in the Methods section. Its value is almost the same for the two reporters: 0.012 ± 0.001 min-1 for GFP and 0.011 ± 0.001 min-1 for luciferase, corresponding to a half-life of about 1 h (remember that the half-life equals ln 2/γ_q). In the case of luciferase we have q(t) = r(t), so that the total reporter concentration and its derivative can be directly determined from the primary data by means of (10). The total GFP concentration is not generally equal to the active GFP concentration, as explained above. However, for the time being, we will assume this equality to hold for GFP as well, before considering appropriate corrections at a later stage.
The above analysis shows that, using (10)-(12), we are able to reconstruct the reporter concentration and the reporter synthesis rate (proportional to the mRNA concentration) from the primary data. The major question raised by this analysis is whether the reconstructed quantities for the reporter system reliably represent the corresponding quantities of the host system, that is, whether n(t) = m(t) and q(t) = p(t). As discussed above, this is a priori unlikely. Remember that in the case of GFP, we have neglected the maturation step, while the half-lives of the host and reporter mRNAs and proteins are generally different as well. On the other hand, if the expression profiles of the reporter genes turned out to be good approximations of those of the host gene, this would enormously simplify the analysis and interpretation of the data. We have therefore verified to what extent the reporter concentration and synthesis rate profiles computed from the reporter gene data deviate from direct measurements of the abundance of Fis protein and fis mRNA.
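The reconstruction summarized above can be illustrated with a small sketch. Since equations (10)-(12) are not reproduced in this text, the relations used here are assumptions consistent with the description: the relative reporter concentration is the ratio I(t)/A(t), the growth rate is the relative rate of change of the absorbance, and the synthesis rate equals the time derivative of the concentration plus the losses by degradation and growth dilution. Central finite differences replace the analytic spline derivatives, and the function names are hypothetical.

```python
def central_diff(ts, ys):
    """Finite-difference derivative (central inside, one-sided at the ends)."""
    d = []
    last = len(ts) - 1
    for i in range(len(ts)):
        lo, hi = max(i - 1, 0), min(i + 1, last)
        d.append((ys[hi] - ys[lo]) / (ts[hi] - ts[lo]))
    return d


def reporter_quantities(ts, absorbance, intensity, gamma_q):
    """Relative concentration, growth rate and synthesis rate of a reporter.

    Assumed relations (not copied from the paper):
      q(t)     = I(t) / A(t)                  relative concentration
      mu(t)    = (dA/dt) / A(t)               growth rate
      synth(t) = dq/dt + (gamma_q + mu) * q   synthesis rate
    Inputs are background-corrected time series sampled at times ts.
    """
    q = [i / a for i, a in zip(intensity, absorbance)]
    dq = central_diff(ts, q)
    dA = central_diff(ts, absorbance)
    mu = [d / a for d, a in zip(dA, absorbance)]
    synth = [dqi + (gamma_q + m) * qi for dqi, m, qi in zip(dq, mu, q)]
    return q, mu, synth
```

In a non-growing culture with constant intensity, the synthesis rate reduces to gamma_q times the concentration, that is, synthesis exactly balances degradation.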
Direct measurements of fis gene expression
In a similar way, the synthesis rate of the GFP and luciferase reporters has been compared with Northern blot measurements at various stages of growth. Figure 5d-e shows the superposition of the Northern blot values and the synthesis rate profiles computed from the reporter gene data. The quantities have been normalized with respect to the value of the peak in exponential phase, as above. Following the definition of the reporter synthesis rate in (11), the normalized synthesis rate equals the normalized mRNA concentration. Again, there is a good overall correspondence between the profiles obtained from the reporter gene data and the direct measurements. Some significant deviations occur though, especially at the end of exponential phase (GFP data) and in mid-exponential phase (luciferase data).
We conclude from the agreement with direct measurements of Fis protein and fis mRNA that reporter genes are a reliable tool for tracking the shape of the expression profile of the host gene. It would be interesting to know if the local deviations that we also observe are due to the systematic biases identified above. In order to answer this question, we have developed computational procedures for correcting the profiles obtained from the reporter gene data for differences in half-life and for non-negligible folding times.
Correction of systematic biases in computed protein and mRNA concentrations
In general, the half-lives of protein and mRNA will not be the same for Fis and its reporters, that is, γ_m ≠ γ_n and γ_p ≠ γ_q. This difference in half-life will cause the mRNA concentrations computed from the reporter data to deviate from the actual concentrations of fis mRNA. For example, the inferred concentration will be underestimated if γ_n/γ_m > 1, that is, if the lux or gfp message half-life is shorter than that of fis. Through the dependence of the protein synthesis rate on the mRNA concentration, this also affects the computed protein concentrations. The latter effect is modulated by possible differences in half-life of the host and reporter proteins. In particular, if γ_q/γ_p > 1, the error in the predicted mRNA concentration will be accentuated, whereas in the case of γ_q/γ_p < 1 it will be attenuated.
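The direction of the mRNA bias can be checked with a short calculation. It assumes a quasi-steady state in which host and reporter messages are produced at the same rate and each is lost by degradation plus growth dilution. This is a simplification made for illustration, and the function name is hypothetical.

```python
def steady_state_ratio(gamma_reporter, gamma_host, mu):
    """Ratio n/m of reporter to host mRNA at quasi-steady state.

    With a common synthesis rate s: n = s / (gamma_reporter + mu) and
    m = s / (gamma_host + mu), so that s cancels in the ratio.
    """
    return (gamma_host + mu) / (gamma_reporter + mu)


# A reporter message that is degraded twice as fast as the host message
# gives a ratio below 1, that is, an underestimate of the host mRNA level.
ratio_fast_reporter = steady_state_ratio(0.6, 0.3, 0.01)
```

Growth dilution enters both denominators, so a higher growth rate moves the ratio closer to 1.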
Measured values of the degradation constants in the models of fis, gfp and lux expression.
0.012 (0.001) min-1
0.011 (0.001) min-1
0.0065 (0.0020) min-1
0.30 (0.13) min-1
0.33 (0.15) min-1
0.56 (0.23) min-1
This assumption is valid since in our experimental conditions the bacteria have been in stationary phase for more than 12 h before dilution into fresh growth medium.
This even holds for large half-life differences. As shown in Figure 6c-d, for values of γ_n/γ_m varying between 0.25 and 4, the predicted mRNA profile remains the same. Even when γ_n/γ_m is varied by 100-fold (see Additional file 1), the differences are quite moderate and the overall shape remains largely insensitive to this parameter. We note that such large differences in half-life do not frequently occur in bacteria, contrary to what has been observed for yeast.
The formulas required for the computation of p(t) are derived in Section S4 of the Additional file 1.
The maturation time of GFP was set to 25 min, as determined experimentally for the reporter used in this study (GFPmut3), thus yielding a value κ_r = 0.023 min-1. That is, it takes 25 min to convert half of a given pool of inactive GFP to its active form.
We have also experimented with variants of GFP, in particular a rather slow-folding RFP (Red Fluorescent Protein). In this case, the expression profiles differ considerably from those obtained with luciferase (data not shown). The corrections and the corresponding confidence bands also become much larger. We conclude that a fast-folding reporter protein is essential for reliable real-time monitoring of gene expression.
Research in biology has moved from a descriptive science to considering biological processes as dynamical systems. This systems biology approach relies on the analysis and interpretation of dynamical measurements and therefore calls for a precise mathematical treatment of quantitative time-series data of gene expression [13, 18–23]. The present manuscript provides such an analysis by showing a way in which biologically relevant quantities, and their confidence intervals, can be rigorously computed from the primary data by means of kinetic models. In particular, in comparison with, for example [13, 20], we infer relative mRNA and protein concentrations for a host gene using luminescent or fluorescent reporter systems under the control of the same promoter as the host gene. We extend previous work by explicitly stating and experimentally verifying the validity of the assumptions that underlie this procedure. We notably assess the effect on the model predictions of uncertain values for some of the parameters that are difficult or time-consuming to measure (such as the protein or mRNA half-lives). When such values are available, the computational procedures we provide can be used for correcting systematic errors due to differences in degradation constants.
A first conclusion from our study is that the expression profiles computed from the fluorescence and luminescence data are generally in good agreement with the Northern and Western blots (Figure 5). This is remarkable considering the fact that the measurements were obtained with completely different experimental methods and the comparison only involves normalization with respect to a maximum value, i.e., uses no freely adjustable parameters. It implies that when the half-lives of the host-gene products are unknown, we can still obtain a result that preserves the qualitative shape of the expression profile. As long as the systematic biases in the reporter systems remain limited, that is, a rapid folding time of the fluorescent reporter and similar degradation constants of host and reporter gene products, the expression profiles obtained are accurate. This is illustrated by the results for the gene fis coding for a global regulator in E. coli.
If the systematic biases are too large to be ignored, corrections for the resulting errors need to be carried out. Our results show that a difference in mRNA half-life does not significantly contribute to these deviations. As a consequence, knowing the order of magnitude of the mRNA half-life of the host gene is already sufficient for reliably calculating the expression profile. The insensitivity of the expression profile to changes in mRNA half-life does not hold for protein half-life. Variations in this parameter maintain the overall shape of the expression profile, but affect the normalized concentration levels and the timing of the peak (Figure 7). In particular, the simulation studies reveal that the longer the half-life of the host protein as compared to that of the reporter, the more the actual expression profile of the host gene is delayed. This effect has to be kept in mind when trying to reconstruct or validate models of regulatory networks based on reporter gene data [42, 53–56]. It should notably be taken into account when attempting to infer network connections based solely on mRNA measurements, as in a typical microarray experiment. The effect of a particular protein will occur later than the transcription of its gene and the time delay depends on the protein half-life.
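The delay caused by a long protein half-life can be reproduced with a toy simulation. All choices here are illustrative assumptions and not taken from the paper: the pulse-shaped synthesis rate, the constant growth rate, the forward Euler scheme and the function name `peak_time`. The two degradation constants correspond to a shorter-lived and a longer-lived protein.

```python
import math


def peak_time(gamma, mu=0.01, t_end=600.0, dt=0.1):
    """Time at which a protein driven by a transient synthesis pulse peaks.

    Integrates dp/dt = s(t) - (gamma + mu) * p with forward Euler, where
    s(t) = t * exp(-t / 30) is an arbitrary pulse that peaks at t = 30 min.
    """
    p, t = 0.0, 0.0
    best_p, best_t = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        s = t * math.exp(-t / 30.0)
        p += dt * (s - (gamma + mu) * p)
        t += dt
        if p > best_p:
            best_p, best_t = p, t
    return best_t


# The same synthesis pulse seen through a shorter-lived and a longer-lived protein.
t_short_lived = peak_time(0.012)
t_long_lived = peak_time(0.0065)
```

Both proteins peak after the synthesis pulse itself, and the longer-lived one peaks later. This is the qualitative effect described above for a host protein that is more stable than its reporter.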
All computations have been carried out under the assumption that the mRNA half-life does not change in the course of the experiment. This assumption is certainly valid during exponential growth, but may fail during growth transitions or in situations where the mRNA half-life is regulated. Indeed, our data show a systematic deviation between the calculated and measured quantities of mRNA and protein at the entry into stationary phase that remains partly unaccounted for, even after applying the above corrections. The mRNA and protein half-lives were measured during exponential phase; due to technical difficulties, we were unable to measure them in stationary phase. It is conceivable that the mRNA half-life of fis increases at the transition to stationary phase. If this were the case, the actual mRNA and protein concentrations would be higher than those computed from the reporter gene measurements. This effect could explain the remaining discrepancies between prediction and measurement in Figure 6.
The analysis also confirms that the derived quantities, relative protein concentrations and synthesis rates (mRNA concentrations), are largely independent of the physical characteristics of the reporter gene system (Figures 3 and 4). This is quite remarkable given the vastly different physical properties of our two reporter systems. We do see some minor differences between the two reporter systems at the entry into stationary phase, notably in the protein synthesis rates (Figure 4). As explained in the Results section, these are most likely due to transient fluctuations of the reduction potential of the cell at the entry into stationary phase. This difference must be kept in mind when interpreting reporter gene data, and we recommend always using two different reporter systems in parallel in order to separate gene expression from other effects. Identical profiles derived from the two reporter systems are likely to faithfully represent the true expression pattern of the host gene.
Finally, we note that the approach described in this paper yields relative rather than absolute measures of gene expression. As a consequence, the validation of the approach by means of Northern and Western blots concerns the comparison of relative values. In order to obtain an absolute quantification of protein concentrations, the proportionality constant in (10) needs to be determined by relating the fluorescence and luminescence intensity units to the number of (active) molecules, and the absorbance units to the number of (viable) cells. In addition, for an absolute quantification of mRNA concentrations, the synthesis constant κ_p needs to be measured. The techniques for doing this are time-consuming and error-prone, although novel approaches developed in the context of single-cell measurements may improve the absolute quantification of gene products (e.g., [58–60]). The calibration of the approach to obtain reliable absolute measures is an interesting perspective for further research. However, for many purposes in systems biology the determination of relative measures is sufficient, and our approach provides a fast and well-founded way of obtaining them.
Research in biology has made the transition from a more or less intuitive understanding of the system to a quantitative, formal description. This systems biology approach crucially depends on the availability of reliable, quantitative data. Data acquisition techniques have progressed enormously in the past decade, but the data they produce require sound and general methods of analysis. The current manuscript contributes to the development of such methods and forms the basis for future analyses of the dynamics of regulatory systems. The present formalism is geared towards bacterial gene expression. However, small modifications of the method will allow the inclusion of additional reaction steps inherent in eukaryotic gene expression, such as splicing and nuclear export.
The authors would like to thank Bruno Besson and Antoine Frénoy (INRIA Grenoble - Rhône-Alpes) for help with the data analysis. We also thank Dominique Schneider (LAPM, Grenoble) and Charles Dorman (Trinity College, Dublin) for providing the Fis antibodies. C. Ranquet is grateful to Nadim Majdalani (NCI, Bethesda), Alexandre Bougdour and Ali Hakimi (LAPM, Grenoble) for advice and technical assistance concerning the Northern blot experiments. We acknowledge financial support from the ARC initiative at INRIA (GDyn project), the ACI IMPBio initiative of the French Ministry for Research (BacAttract project), the ANR BioSys (MetaGenoReg project), and the NEST programme of the European Commission (Hygeia project, NEST 4995, and EC-MOAN project, NEST-PATH-COM/043235).
- Giepmans B, Adams S, Ellisman M, Tsien R: The fluorescent toolbox for assessing protein location and function. Science. 2006, 312 (5771): 217-224. doi:10.1126/science.1124618
- Greer L, Szalay A: Imaging of light emission from the expression of luciferases in living cells and organisms: A review. Luminescence. 2002, 17: 43-74. doi:10.1002/bio.676
- Longo D, Hasty J: Dynamics of single-cell gene expression. Mol Syst Biol. 2006, 2: 64. doi:10.1038/msb4100110
- Shav-Tal Y, Singer R, Darzacq X: Imaging gene expression in single living cells. Nat Rev Mol Cell Biol. 2004, 5 (10): 855-861. doi:10.1038/nrm1494
- Southward C, Surette M: The dynamic microbe: Green fluorescent protein brings bacteria to light. Mol Microbiol. 2002, 45 (5): 1191-1196. doi:10.1046/j.1365-2958.2002.03089.x
- Rao C, Wolf D, Arkin A: Control, exploitation and tolerance of intracellular noise. Nature. 2002, 420 (6912): 231-237. doi:10.1038/nature01258
- Kaern M, Elston T, Blake W, Collins J: Stochasticity in gene expression: From theories to phenotypes. Nat Rev Genet. 2005, 6 (6): 451-464. doi:10.1038/nrg1615
- Wilkinson D: Stochastic modelling for quantitative description of heterogeneous biological systems. Nat Rev Genet. 2009, 10 (2): 122-133. doi:10.1038/nrg2509
- Van Dyk T, Wei Y, Hanafey M, Dolan M, Reeve M, Rafalski J, Rothman-Denes L, LaRossa R: A genomic approach to gene fusion technology. Proc Natl Acad Sci USA. 2001, 98 (5): 2555-2560. doi:10.1073/pnas.041620498
- Zaslaver A, Bren A, Ronen M, Itzkovitz S, Kikoin I, Shavit S, Liebermeister W, Surette M, Alon U: A comprehensive library of fluorescent transcriptional reporters for Escherichia coli. Nat Meth. 2006, 3 (8): 623-628. doi:10.1038/nmeth895
- Kobiler O, Rokney A, Friedman N, Court D, Stavans J, Oppenheim A: Quantitative kinetic analysis of the bacteriophage λ genetic network. Proc Natl Acad Sci USA. 2005, 102 (12): 4470-4475. doi:10.1073/pnas.0500670102
- Lu C, Albano C, Bentley W, Rao G: Quantitative and kinetic study of oxidative stress regulons using green fluorescent protein. Biotechnol Bioeng. 2005, 89 (5): 574-587. doi:10.1002/bit.20389
- Ronen M, Rosenberg R, Shraiman B, Alon U: Assigning numbers to the arrows: Parameterizing a gene regulation network by using accurate expression kinetics. Proc Natl Acad Sci USA. 2002, 99 (16): 10555-10560. doi:10.1073/pnas.152046799
- Van Dyk T, DeRose E, Gonye G: LuxArray, a high-density, genomewide transcription analysis of Escherichia coli using bioluminescent reporter strains. J Bacteriol. 2001, 183 (19): 5496-5505. doi:10.1128/JB.183.19.5496-5505.2001
- Forde C, Rocco J, Fitch F, McCutchen-Maloney S: Real-time characterization of virulence factor expression in Yersinia pestis using a GFP reporter system. Biochem Biophys Res Commun. 2004, 324 (2): 795-800. doi:10.1016/j.bbrc.2004.08.236
- Setty Y, Mayo A, Surette M, Alon U: Detailed map of a cis-regulatory input function. Proc Natl Acad Sci USA. 2003, 100 (13): 7702-7707. doi:10.1073/pnas.1230759100
- Elowitz M, Leibler S: A synthetic oscillatory network of transcriptional regulators. Nature. 2000, 403 (6767): 335-338. doi:10.1038/35002125
- Finkenstädt B, Heron E, Komorowski M, Edwards K, Tang S, Harper C, Davis J, White M, Millar A, Rand D: Reconstruction of transcriptional dynamics from gene reporter data using differential equations. Bioinformatics. 2008, 24 (24): 2901-2907. doi:10.1093/bioinformatics/btn562
- Gold D, Mallick B, Coombes K: Real-time gene expression: Statistical challenges in design and inference. J Comput Biol. 2008, 15 (6): 611-624. doi:10.1089/cmb.2007.0220
- Huang Z, Senocak F, Jayaraman A, Hahn J: Integrated modeling and experimental approach for determining transcription factor profiles from fluorescent reporter data. BMC Syst Biol. 2008, 2: 64. doi:10.1186/1752-0509-2-64
- Leveau J, Lindow S: Predictive and interpretive simulation of green fluorescent protein expression in reporter bacteria. J Bacteriol. 2001, 183 (23): 6752-6762. doi:10.1128/JB.183.23.6752-6762.2001
- Subramanian S, Srienc F: Quantitative analysis of transient gene expression in mammalian cells using the green fluorescent protein. J Biotechnol. 1996, 49 (1-3): 137-151. doi:10.1016/0168-1656(96)01536-2
- Wang X, Errede B, Elston T: Mathematical analysis and quantification of fluorescent proteins as transcriptional reporters. Biophys J. 2008, 94 (6): 2017-2026. doi:10.1529/biophysj.107.122200
- Bradley M, Beach M, de Koning A, Pratt T, Osuna R: Effects of Fis on Escherichia coli gene expression during different growth stages. Microbiol. 2007, 153 (9): 2922-2940. doi:10.1099/mic.0.2007/008565-0
- Azam TA, Iwata A, Nishimura A, Ueda S, Ishihama A: Growth phase-dependent variation in protein composition of the Escherichia coli nucleoid. J Bacteriol. 1999, 181 (20): 6361-6370.
- Mallik P, Pratt T, Beach M, Bradley M, Undamatla J, Osuna R: Growth phase-dependent regulation and stringent control of fis are conserved processes in enteric bacteria and involve a single promoter (fis P) in Escherichia coli. J Bacteriol. 2004, 186: 122-135. doi:10.1128/JB.186.1.122-135.2004
- Ninnemann O, Koch C, Kahmann R: The E. coli fis promoter is subject to stringent control and autoregulation. EMBO J. 1992, 11 (3): 1075-1083.
- Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko K, Tomita M, Wanner B, Mori H: Construction of Escherichia coli K-12 in-frame, single-gene knock-out mutants: The Keio collection. Mol Syst Biol. 2006, 2: 2006.0008. doi:10.1038/msb4100050
- Déthiollaz S, Eichenberger P, Geiselmann J: Influence of DNA geometry on transcriptional activation in Escherichia coli. EMBO J. 1996, 15 (19): 5449-5458.
- Miller J: Experiments in Molecular Genetics. 1972, Cold Spring Harbor, NY: Cold Spring Harbor Laboratory.
- de Boor C: A Practical Guide to Splines. 2001, New York: Springer-Verlag, 2.
- Hastie T, Tibshirani R: Generalized Additive Models. 1999, Boca Raton, FL: CRC Press.
- Lee T: On algorithms for ordinary least square regression spline fitting: A comparative study. J Stat Comput Simul. 2002, 72 (8): 647-663. doi:10.1080/00949650213743
- Hamilton L: Regression with Graphics: A Second Course in Applied Statistics. 1992, Belmont, CA: Duxbury Press.
- Abramoff M, Magalhaes P, Ram S: Image processing with ImageJ. Biophoton Int. 2004, 11 (7): 36-42.
- Aiba H, Adhya S, de Crombrugghe B: Evidence for two functional gal promoters in intact Escherichia coli cells. J Biol Chem. 1981, 256 (22): 11905-11910.
- Lutz R, Bujard H: Independent and tight regulation of transcriptional units in Escherichia coli via the LacR/O, the TetR/O and AraC/I1-I2 regulatory elements. Nucleic Acids Res. 1997, 25 (6): 1203-1210. doi:10.1093/nar/25.6.1203
- Andersen J, Sternberg C, Poulsen L, Bjorn S, Givskov M, Molin S: New unstable variants of green fluorescent protein for studies of transient gene expression in bacteria. Appl Environ Microbiol. 1998, 64 (6): 2240-2246.
- Meighen E: Molecular biology of bacterial bioluminescence. Microbiol Rev. 1991, 55: 123-142.
- Walker K, Atkins C, Osuna R: Functional determinants of the Escherichia coli fis promoter: Roles of -35, -10, and transcription initiation regions in the response to stringent control and growth phase-dependent regulation. J Bacteriol. 1999, 181 (4): 1269-1280.
- Tsien R: The green fluorescent protein. Annu Rev Biochem. 1998, 67: 509-544. doi:10.1146/annurev.biochem.67.1.509
- de Jong H: Modeling and simulation of genetic regulatory systems: A literature review. J Comput Biol. 2002, 9: 67-103. doi:10.1089/10665270252833208
- Goodwin B: Temporal Organization in Cells. 1963, New York, N.Y.: Academic Press.
- Kremling A: Comment on mathematical models which describe transcription and calculate the relationship between mRNA and protein expression ratio. Biotechnol Bioeng. 2007, 96 (4): 815-819. doi:10.1002/bit.21065
- Tyson J, Othmer H: The dynamics of feedback control circuits in biochemical pathways. Prog Theor Biol. 1978, 5: 1-62.
- Tigges M, Marquez-Lago T, Stelling J, Fussenegger M: A tunable synthetic mammalian oscillator. Nature. 2009, 457 (7227): 309-12. doi:10.1038/nature07616
- Kelly C, Hsiung CJ, Lajoie C: Kinetic analysis of bacterial bioluminescence. Biotechnol Bioeng. 2003, 81 (3): 370-378. doi:10.1002/bit.10475
- Ball C, Osuna R, Ferguson K, Johnson R: Dramatic changes in Fis levels upon nutrient upshift in Escherichia coli. J Bacteriol. 1992, 174 (24): 8043-8056.
- Bernstein J, Lin PH, Cohen S, Lin-Chao S: Global analysis of Escherichia coli RNA degradosome function using DNA microarrays. Proc Natl Acad Sci USA. 2004, 101 (9): 2758-63. doi:10.1073/pnas.0308747101
- Wang Y, Liu C, Storey J, Tibshirani R, Herschlag D, Brown P: Precision and functional specificity in mRNA decay. Proc Natl Acad Sci USA. 2002, 99 (9): 5860-5. doi:10.1073/pnas.092538799
- Cormack B, Valdivia R, Falkow S: FACS-optimized mutants of the green fluorescent protein (GFP). Gene. 1996, 173 (1 Spec No): 33-38. doi:10.1016/0378-1119(95)00685-0
- Szallasi Z, Periwal V, Stelling J: System Modeling in Cellular Biology: From Concepts to Nuts and Bolts. 2006, Cambridge, MA: MIT Press.
- Bansal M, Belcastro V, Ambesi-Impiombato A, di Bernardo D: How to infer gene networks from expression profiles. Mol Syst Biol. 2007, 3: 78.
- Cho KH, Choo SM, Jung S, Kim JR, Choi HS, Kim J: Reverse engineering of gene regulatory networks. IET Syst Biol. 2007, 1 (3): 149-163. doi:10.1049/iet-syb:20060075
- Gardner T, Faith J: Reverse-engineering transcription control networks. Phys Life Rev. 2005, 2: 65-88. doi:10.1016/j.plrev.2005.01.001
- Markowetz F, Spang R: Inferring cellular networks: A review. BMC Bioinform. 2007, 8 (Suppl 6): S5. doi:10.1186/1471-2105-8-S6-S5
- Buckstein M, He J, Rubin H: Characterization of nucleotide pools as a function of physiological state in Escherichia coli. J Bacteriol. 2008, 190 (2): 718-726. doi:10.1128/JB.01020-07
- Cai L, Friedman N, Xie X: Stochastic protein expression in individual cells at the single molecule level. Nature. 2006, 440 (7082): 358-362. doi:10.1038/nature04599
- Golding I, Paulsson J, Zawilski S, Cox E: Real-time kinetics of gene activity in individual bacteria. Cell. 2005, 123 (6): 1025-1036. doi:10.1016/j.cell.2005.09.031
- Rosenfeld N, Perkins T, Alon U, Elowitz M, Swain P: A fluctuation method to quantify in vivo fluorescence data. Biophys J. 2006, 91 (2): 759-766. doi:10.1529/biophysj.105.073098