Master equation simulation analysis of immunostained Bicoid morphogen gradient
© Wu et al. 2007
Received: 26 July 2007
Accepted: 16 November 2007
Published: 16 November 2007
Skip to main content
© Wu et al. 2007
Received: 26 July 2007
Accepted: 16 November 2007
Published: 16 November 2007
The concentration gradient of Bicoid protein which determines the developmental pathways in early Drosophila embryo is the best characterized morphogen gradient at the molecular level. Because different developmental fates can be elicited by different concentrations of Bicoid, it is important to probe the limits of this specification by analyzing intrinsic fluctuations of the Bicoid gradient arising from small molecular number. Stochastic simulations can be applied to further the understanding of the dynamics of Bicoid morphogen gradient formation at the molecular number level, and determine the source of the nucleus-to-nucleus expression variation (noise) observed in the Bicoid gradient.
We compared quantitative observations of Bicoid levels in immunostained Drosophila embryos with a spatially extended Master Equation model which represents diffusion, decay, and anterior synthesis. We show that the intrinsic noise of an autonomous reaction-diffusion gradient is Poisson distributed. We demonstrate how experimental noise can be identified in the logarithm domain from single embryo analysis, and then separated from intrinsic noise in the normalized variance domain of an ensemble statistical analysis. We show how measurement sensitivity affects our observations, and how small amounts of rescaling noise can perturb the noise strength (Fano factor) observed. We demonstrate that the biological noise level in data can serve as a physical constraint for restricting the model's parameter space, and for predicting the Bicoid molecular number and variation range. An estimate based on a low variance ensemble of embryos suggests that the steady-state Bicoid molecular number in a nucleus should be larger than 300 in the middle of the embryo, and hence the gradient should extend to the posterior end of the embryo, beyond the previously assumed background limit. We exhibit the predicted molecular number gradient together with measurement effects, and make a comparison between conditions of higher and lower variance respectively.
Quantitative comparison of Master Equation simulations with immunostained data enabled us to determine narrow ranges for key biophysical parameters, which for this system can be independently validated. Intrinsic noise is clearly detectable as well, although the staining process introduces certain limits in resolution.
Recently considerable attention has been given to the characterization and understanding of intrinsic molecular noise in biological systems [1–8]. Nearly all of these studies were performed using in vivo fluorescent reporters in single cell systems. In multicellular organisms, however, most quantitative gene expression data are obtained from fixed tissues. Examples of such data for the Drosophila segmentation system are contained in the FlyEx database, which provides spatiotemporal data on the expression of developmental segmentation genes . These data on protein expression levels are at cellular resolution and were obtained by means of immunofluorescence histochemistry and confocal scanning microscopy. At a large spatial scale, expression levels in these embryos form expression domains characteristic for each gene, but smaller fluctuations in expression levels between adjacent nuclei appear random. In this paper, we investigate the question of whether these fluctuations are a consequence of intrinsic molecular noise or stem from some type of measurement uncertainty. These alternatives are, of course, not mutually exclusive.
A complicating factor in separating the above alternatives is that each one involves an unknown chemical mechanism. Intrinsic noise will have a major contribution from fluctuations in the rate of initiation of transcription, but the chemical mechanisms underlying this process in multicellular organisms are very poorly understood. Measurement uncertainty can stem from chemical causes such as fluctuations in the number of primary and secondary antibody molecules which bind to proteins in the fixed embryo. The chemistry of this process is also very poorly understood. If observations could be made on a process whose fluctuation properties could be reliably predicted by a numerical model, comparison of the predicted fluctuations with those observed will provide critical information for distinguishing whether observed nucleus-to-nucleus variations are a consequence of intrinsic biological noise or merely fluctuations arising from the staining procedure.
An excellent candidate for a process with predictable fluctuation properties is one that involves only diffusion and first order decay. There is good evidence that the formation of the protein gradient of the morphogen Bicoid (Bcd) takes place by means of these two processes. Bcd protein is distributed in an exponential profile along the anterior-posterior (A-P) axis with higher concentrations towards the anterior [10, 11]. This gradient forms by translation of maternally deposited mRNA at the anterior pole of the embryo, and the synthesized protein spreads through the syncytial embryo by diffusion accompanied by decay . The observed exponential profile corresponds to a solution of Fick's equation for a substance undergoing first order decay and diffusing from a point source in one dimension, and hence it is reasonable to suppose that the reaction-diffusion mechanism leading to the formation of the gradient is reasonably well understood. Quantitative observations of this gradient in fixed tissue exhibit small fluctuations between neighboring nuclei, while the overall exponential profile ensures that such fluctuations can be monitored over a wide range of concentrations which extend to the lower limits of detectability.
The intrinsic fluctuations of the Bcd gradient will be well described by a stochastic Reaction-Diffusion Master Equation (RDME). Such equations typically do not have analytic solutions and are usually solved by running repeated stochastic simulations. A well known simulation algorithm due to Gillespie [13, 14] performs an exact simulation of the Chemical Master Equation for a well mixed system. This method has been extended to spatially distributed systems by Elf and others [15, 16]. These authors divide the spatially extended system into a series of subvolumes that are small enough to be regarded as well mixed. In each subvolume chemical reactions are simulated by Gillespie's algorithm, while diffusion between subvolumes is treated as a first order reaction.
In this study, we compare the results of such simulations to data in order to discover whether or not the data is sufficiently accurate as to exhibit the signature of a simple stochastic process. Stochastic processes underlying biological regulation can in general form complex patterns as a result of the reaction network, and for this reason a full consideration of 3 dimensional geometry is often necessary . In the case considered here fluctuations occur passively in the course of diffusion and the statistical signature that we seek is independent of detailed geometry. For these reasons, we chose to model a 1 dimensional system.
In the following, we first characterize the statistical properties of random variations in expression level between adjacent nuclei of individual embryos and compare them to the results of stochastic simulations. At the level of single embryos, we do not find a clear signature of stochastic processes in the data, but the need to separate spatially changing mean expression values from their variation limits the amount of statistical information that can be obtained from individual embryos. To address this problem, we consider ensembles of embryos, both over the whole dataset and over a restricted subset with low embryo-to-embryo variation.
In order to interpret these data, we introduce a stochastic model of the immunostaining procedure. We denote all random variables in this article with an upper hat but write a specific value without the hat.Thus, for example, denotes a random variable for the number of Bcd molecules in subvolume j, while n j denotes a particular value of this variable.
F [I j ] = a exp(-j/λ).
where is an unknown random variable independent of j, so that the variance of ln(I j ) is equal to the variance of .
where the factor m represents the proportionality between one Bcd molecule and the corresponding fluorescence intensity, which includes the combined effects of tissue fixation, first and second antibody binding, fluorescence excitation and image processing.
As we do not know the exact in vivo system parameters and molecular number within each nucleus, we performed a complete inspection of the behavior of the variance of m in the four dimensional parameter space f(m, J, D, ω), where J is the synthesis rate of Bcd in the anterior compartment, ω is its decay rate, and D is the diffusion coefficient. It was always true that the residuals in logarithmic coordinate increased towards the posterior of the embryo, or in other words at lower levels of Bcd. This behavior strongly contrasts with the position-independent residuals seen in Figure 1 and described in equation (2).
Here is a spatially uniform random variable which replaces m in equation (3), and is a spatially uniform random variable which represents nonspecific background staining. This picture allows us to consider noise that arises from both intrinsic and measurement related sources.
The simplest way to understand the consequences of (4) is to imagine the consequences if only one of , , and are allowed to have finite variance while the other two are constrained to deterministic (zero variance) behavior. We have already pointed out that allowing finite variance for with and deterministic leads to an increase of variance towards the posterior. The same is true if all noise comes from , that is if noise from background staining dominates. This will also lead to more noise in the logarithmic domain towards the posterior as provides a larger proportion of the total detected signal. Let us next consider the case where all noise comes from .
Comparison with equation (2) indicates that = ln(1 + σ α ).
In summary, we suspect the nucleus-to-nucleus variation observed in our data comes chiefly from the experimental rescaling noise , which is normally distributed. If Bcd intrinsic noise is to be observed, then the fluorescence noise intensity should be a function of the mean intensity in logarithm, instead of a constant as observed in embryo ms18. Nevertheless, the necessity of considering data in spatially resolved bins limits the amount of information that can be obtained from a single embryo. More information can be obtained by pooling data from many embryos, and we discuss this point in the next section.
Statistical analysis of many embryos is required in order to take our analysis further. This analysis will show how physical constraints on the model can be inferred from the ensemble dataset, and independent random variables separated. We consider a set of embryos indexed by i with expression levels I ij , and then pool data from corresponding bins to obtain the ensemble dataset . Since this dataset includes embryo-to-embryo variability, i.e. the variation of system parameters and experimental conditions over different embryos, the variance of the ensemble profile will be an upper bound for the average variance within each embryo. This allows us to identify physical constraints for system parameters and to determine if the model behaves properly within the permitted range of parameters.
We define independent global random variables as described in the last section with normal distributed measurement uncertainty = m (1 + σ α ). We also now assume that background noise is normally distributed with = σ β . We assume such global random variable represent the average variability for each embryo. The statistics of simulated global random variables were then collected from 2000 stochastic simulation runs and the Bcd molecular number random variable were sampled after reaching steady state. In this section we explicitly consider the effects of different choices for the molecule-to-fluorescence rescaling ratio m by comparing simulations to data at differing values of this parameter. Because m is not an explicit input to the model, this comparison is effected by varying the synthesis rate J, which varies the molecular number, and comparing the behavior of the model to fluorescence data which is on a fixed but arbitrary scale.
In (8a), ⟨ ⟩ = ma' exp(-j/λ') from simulation and from the ensemble data. Because λ' is only determined by D and ω, we first select combinations of D and ω such that λ' = λ. Selecting a synthesis rate J determines a', and also m, because m = a/a'. Finally, biologically reasonable values of J are determined by the constraint (8b).
In summary, when Bcd molecular number is large in the anterior region of the embryo, we expect to see a constant level of normalized variance in our fluorescence data, contributed solely by rescaling noise . Thus, can be identified independently from m and other random variables in our data.
With regard to the molecular parameters of diffusion, adopting the value of D = 17.2 (μm2/s) given in the literature  together with our observed value of λ = 80.65 (μm) for this ensemble implies that ω = 0.0027 (s-1) in order to satisfy (8a). These values, for any J, will cause the mean molecular number gradient, , to be in steady state after about 4000 seconds (cycle 8). A lower bound of J > 30 (molecules/s) and an upper bound of m <0.7 is imposed by the experimentally observed magnitude of η 2. Finally, we estimate the constraint for background noise by satisfying (8b) in the posterior end of the embryo. Violation of the inequality (8b) would cause the black model curve to be above the data on the right hand side of Figure 2A, and hence we have an upper bound σ β < 1.7.
In summary, analysis of our high-variance ensemble of embryos dataset implies that the Bcd synthesis rate is higher than 30 (molecules/s). Figure 2B shows a scatterplot of the molecular number associated with this synthesis rate. The panel indicates that most nuclei in the anterior fifth of the embryo contain more than 200 molecules of Bcd after reaching steady state, and that these molecular numbers can fluctuate over a range of more than 80 molecules in this region (Fig. 2C). This panel shows that these molecular fluctuations, even at the largest level of normalized variance compatible with data, still do not account for the observed variance in experimental observations. The additional variance comes chiefly from rescaling noise . Background noise only has a significant effect at the posterior end of the embryo, and indeed dominates the normalized variance in that region (Fig. 2A). Towards the posterior, rises faster than , and its sharp rise may in certain cases serve as a marker to distinguish regimes dominated by molecular noise from those dominated by background noise.
These 17 processed embryos constitute an ensemble for statistical analysis as described in the previous section. A trade-off of this treatment is the loss of statistical sample size, with only around 30 nuclei in each bin.
Figure 2D shows that this ensemble of 17 embryos has lower normalized variance compared to the 89 embryos ensemble in Figure 2A. The fluctuation of normalized variance is also higher because of smaller sample size. Note that rescaling noise is dominant over a larger portion of the embryo than is the case for the full 89 embryo ensemble. We estimate an upper bound for rescaling noise σ α to be 0.13. At this point it is possible to determine the smallest J compatible with variance as was done for the high variance ensemble. We do not do so, however, because in this ensemble the contribution of is too small to be separated from .
We compared this data to simulations performed using the same parameter values reported in the previous section, except that λ (for the whole ensemble) had a value of 90.91 (μm), leading to ω taking on a value of 0.00215 (s-1). These values cause the deterministic system to relax to steady state after 4000 seconds, as was the case with the high variance ensemble. In the high variance ensemble (Fig. 2A), the choice of σ β was dictated by the necessity of matching the observed variance along the entire A-P axis. In the case of the low variance ensemble this is not required, but measurements from nonexpressing nuclei indicate that σ β is equal to about 1. It is important to have at least a rough estimate for σ β so that fluctuations from this source are not spuriously assigned to fluctuations in molecular number. Finally, these constraints require that the lower bound of synthesis rate J be 200 (molecules/s) in order that that . The corresponding upper bound of molecule-to-fluorescence rescaling ratio is m = 0.07.
The lower limit of J imposed by this ensemble of low variance embryos in turn implies that there must be more than 300 Bcd molecules per subvolume in the middle of the embryo (j = 50) after reaching steady state (Fig. 2E). Moreover, it implies that the Bcd molecular gradient does not drop to 0 as the fluorescence intensity reaches the presumed background level, but remains at a level of at least 50 molecules per subvolume at j = 80, and 36 molecules per subvolume at the posterior pole (j = 100). Note that the variance of fluorescence measurements is similar between the two embryo ensembles (compare the green areas in Fig. 2C and Fig. 2F), but that the portion of that variance assigned to fluctuations in molecular number is smaller for the ensemble of 17 embryos (compare the red areas in Fig. 2C and Fig. 2F). The tighter constraints from the smaller ensemble make the lower limits on molecular number higher (compare Fig. 2B and Fig. 2E), and the lower limit on the variance of molecular number higher (compare the blue areas in Fig. 2C and Fig. 2F), although the higher limit of the normalized variance becomes smaller (compare the data points in Fig. 2A and Fig. 2D). The Bcd molecular number will thus vary by more than 100 molecules in the middle of the embryo. The 13% rescaling noise is still the main source of the characteristic variation observed in the anterior region of our fluorescence intensity data.
We have compared the nucleus to nucleus variation in expression levels of the exponentially distributed Bcd gradient observed in fixed tissue in a steady state with a stochastic model of the diffusion equation. The model is well supported, in the sense that there is a well-supported physical model for the spatial dependence of mean concentrations of Bcd [12, 20] on the scale of the embryo. The first major result of our analysis is to note that in many individual embryos the nucleus to nucleus variation in the log of concentration is independent of spatial position. This pattern of variation, which amounts to multiplicative noise in concentration space, is completely incompatible with the stochastic behavior of the diffusion equation. Simulations of the diffusion equation over an exhaustively large region of parameter space without exception give rise to solutions in which nucleus to nucleus variation of the bcd gradient is a function of position in the embryo, whether this variation is measured directly in Bcd levels or in their logarithms.
The data which we compare the model to is in the form of fluorescence levels, not concentrations. Although there is now good evidence that the specific batch of serum used to obtain this data has a mean response to Bcd  which is linear, there is no quantitative information about the variance of this sensitivity. Previous work on intrinsic molecular noise in yeast and bacteria utilized GFP [27, 28]in vivo, a situation where fluorescence is detected without molecular amplification. In the data reported here, and in most studies with fixed tissue, the signal from bound primary antibodies is amplified by incubation with secondary antibodies conjugated to a fluorescent dye. It is easily imaginable that this amplification process itself is subject to molecular fluctuations. These fluctuations would then give rise to rescaling noise in the proportionality between fluorescence levels and primary antigen molecular number from nucleus to nucleus. We have shown that such variation can explain the multiplicative noise observed.
When data from fixed embryos are pooled, better statistics are obtained. We analyzed data pooled from the entire dataset (N = 89) as well as a smaller pool of data from a set of embryos selected to have nearly identical mean Bcd profiles (N = 17). In order to analyze these data, we constructed an explicit statistical model. The model considers three sources of variance: intrinsic noise, rescaling noise, and background noise. By means of this statistical model it is possible to separate, at least roughly, the contributions of different sources of fluctuation. We have discussed mechanisms for the first two of these; the third, background noise, represents small fluctuations in the quantity of nonspecific molecules (background) from nucleus to nucleus. Because mean background has been previously removed from this data , the background noise has a mean of zero. We have confirmed that the background removal method does not affect our results, and the mean background intensity before removal is independent from the Bcd molecule-to-fluorescence rescaling ratio (data not shown). We also found from non-expressing areas of our data that the background noise (standard deviation) has about 54% positive correlation to the mean background intensity.
The results of this analysis constrained the physical parameters of the stochastic model considerably, with sharper constraints provided by the smaller dataset. The data require that the synthesis rate, J, of Bcd from its pool of anteriorly deposited mRNA be greater than 200 (molecules/s). We chose the subvolumes of the model to have the same volume as a nucleus, and hence the constraint on J also implies that there are a mean of at least 300 molecules of Bcd per nucleus at the midpoint of the embryo, and mean levels of at least 36 molecules of Bcd per nucleus at the posterior pole. In terms of concentration, our results show that Bcd concentration at the midpoint of the embryo is greater than 4 nM. Recently, a direct in vivo measurement of Bcd concentration was performed and yielded a mid embryo Bcd concentration of 8 nM , fully compatible with our results.
Although we are able to extract a clear signature of intrinsic molecular noise from the data by means of the statistical model, we also showed that at least one quantity diagnostic for different stochastic mechanisms, the Fano factor, cannot be read out from the data. Although the Fano factor v = 1 in the simulations, the full statistical model gives rise to a Fano factor which is a function of the mean, and even if all noise is restricted to be intrinsic, the observed Fano factor depends on the scale for conversion from fluorescence to molecular number.
We believe that our results in general demonstrate that fixed material processed with secondary fluorophores is not well suited to studies of molecular fluctuations. This arises from three issues, which may be separable. Fixation obviously prevents repeated observations on the same cell. While that is clearly a limitation, it need not affect an investigation of a molecule whose mean values are in steady state. The other two issues concern amplification. GFP is intrinsically fluorescent, but antibodies must bind to antigen, a process that is in itself subject to intrinsic molecular fluctuations. In the present study, the level of molecular fluctuation is doubtless increased by the need to bind secondary antibodies conjugated to fluorophore to the already bound primary antibody. It is thus possible that better data from fixed tissue could be obtained by conjugating dye directly to the primary antibody. This may prove to be a useful measure in situations where constructing a GFP fusion that can functionally substitute for the native gene is difficult or impossible.
More generally, we suggest that it is important to know how precision and robustness of developmental control is achieved at the molecular number level throughout development. Indeed, there is little doubt that stochastic processes are important later in development. Adult Drosophila normally have four scutellar bristles. In mutants of the scute gene, the number of bristles varies between one and three , strongly suggesting a stochastic process. On a more theoretical level, it has been suggested that fluctuations can augment the operating capabilities of biological regulatory networks . Our results indicate that in vivo monitoring of gene expression will be required to obtain high quality data on stochastic gene expression phenomena in eucaryotes. The central technical problem that must be solved to conduct such studies is the complete replacement of the endogenous gene with a fluorescently tagged functional version .
The Bcd gradient was modeled in one dimension with 101 homogeneous cubic subvolumes indexed by j from 0 to 100. Each subvolume has sides of length l = 5 μm and volume Δ = l 3. These dimensions were chosen in light of those of actual Drosophila embryos, which are 500 μm long. The subvolume dimensions are very close to those of blastoderm nuclei. In the first subvolume (j = 0), corresponding to the anterior pole of the embryo, we assume a zero-order synthesis reaction of Bcd molecules with constant rate J (molecules/s), representing the translation of a maternally deposited and localized stationary mRNA pool after egg deposition. The jth subvolume contains n j molecules of Bcd, which are the state variables of the model. We take initial conditions to be n j = 0 ∀ j. Diffusion of Bcd is modeled as a first-order elementary reaction for the exchange of molecules between neighboring subvolumes with rate constant k = D/l 2 seconds-1, where D is the effective Fickian diffusion constant. Dispersed degradation (decay) of Bcd is also modeled as a first-order reaction in all subvolumes with rate constant ω (seconds-1).
where C is the concentration of Bcd. Boundary conditions are given by ∂ x C|x = 0 = -J/Δ and ∂ x C|x = 500 = 0. The steady-state solution can be well approximated by C(x) ≈ a exp(-x/λ), where a = (J/Δ)λ and . MesoRD solves the deterministic equation by fixing the mesh size at the number of subvolumes chosen for the stochastic system. We used the built-in Euler solver with a stepsize of 0.001 second after verifying that these settings yielded stable and accurate solutions. Further analysis of simulations in comparison with quantitative immunostained data were performed in MATLAB.
We used Bcd protein expression data from cleavage cycle 13  that were downloaded from the FlyEx database [9, 35]. In FlyEx, confocal scans have been processed into tables containing average fluorescence levels in each nucleus ; these fluorescence levels are linearly proportional to Bcd concentration , and hence to molecular number. Data were taken from the central 10% strip along the A-P axis with their D-V coordinate suppressed, and normalized to remove the non-specific background . The gradient is in a steady state at cycle 13 . For quantitative analysis and comparison with the model, we pooled data into 5 μm 1D intervals to compare with the 5 μm subvolumes of the stochastic model. For certain purposes we considered a 17 member subset of embryos among which the inverse of the spatial exponential coefficient λ varied by less than 1%. This subset contained embryos ab15, cb2, ac1, hz8, ac6, iz4, cb19, ac2, cb31, iz15, ac16, ad18, ad31, hz30, be1, hz33 and as22 from FlyEx.
We thank Alexander Spirov and Manu for valuable discussions. We thank David Fange for the help with MesoRD installation. This work was supported by NIH award RO1 RR07801 and by CRDF GAP Award RUB-1578.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.