A deterministic map of Waddington's epigenetic landscape for cell fate specification
© Bhattacharya et al; licensee BioMed Central Ltd. 2011
Received: 17 February 2011
Accepted: 27 May 2011
Published: 27 May 2011
The image of the "epigenetic landscape", with a series of branching valleys and ridges depicting stable cellular states and the barriers between those states, has been a popular visual metaphor for cell lineage specification - especially in light of the recent discovery that terminally differentiated adult cells can be reprogrammed into pluripotent stem cells or into alternative cell lineages. However, the question of whether the epigenetic landscape can be mapped out quantitatively to provide a predictive model of cellular differentiation remains largely unanswered.
Here we derive a simple deterministic path-integral quasi-potential, based on the kinetic parameters of a gene network regulating cell fate, and show that this quantity is minimized along a temporal trajectory in the state space of the gene network, thus providing a marker of directionality for cell differentiation processes. We then use the derived quasi-potential as a measure of "elevation" to quantitatively map the epigenetic landscape, on which trajectories flow "downhill" from any location. Stochastic simulations confirm that the elevation of this computed landscape correlates to the likelihood of occurrence of particular cell fates, with well-populated low-lying "valleys" representing stable cellular states and higher "ridges" acting as barriers to transitions between the stable states.
This quantitative map of the epigenetic landscape underlying cell fate choice provides mechanistic insights into the "forces" that direct cellular differentiation in the context of physiological development, as well as during artificially induced cell lineage reprogramming. Our generalized approach to mapping the landscape is applicable to non-gradient gene regulatory systems for which an analytical potential function cannot be derived, and also to high-dimensional gene networks. Rigorous quantification of the gene regulatory circuits that govern cell lineage choice and subsequent mapping of the epigenetic landscape can potentially help identify optimal routes of cell fate reprogramming.
In the quantitative view of a cell as a dynamical system governed by genetic interaction networks, an intuitive association can be made between the valleys ("creodes" in Waddington's terminology) on the epigenetic landscape and the trajectories leading to the attractors, or stable steady states, of the gene networks that regulate cell fate [7–9]. But can we quantitatively map the undulating surface of the landscape, thereby providing a predictive model of the "directionality" of cellular differentiation? Waddington himself cautioned that the epigenetic landscape, while useful as a "rough and ready picture" of development, "cannot be interpreted rigorously". The mathematician René Thom, in his formulation of catastrophe theory inspired by Waddington's ideas, proposed that a generalized "potential surface" could be derived for any dynamical system [2, 10]. However, Thom's later writings suggest that he did not believe it possible to quantify the epigenetic landscape. This view has been echoed by other authors, who have described the landscape as a "colorful metaphor" with "no grounding in physical reality".
Huang, Wang and colleagues have recently proposed a probabilistic "pseudo-potential" to quantify the epigenetic landscape for a gene network regulating cell fate, where the elevation of the surface is inversely related to the likelihood of occurrence of a particular state in phase space [8, 12, 13]. In this formulation a stochastic potential energy landscape is characterized for a gene network, based on a Hartree mean-field approximation of the underlying master equation. Such stochastic formulations have also been used to derive probabilistic potential landscapes for the lysis-lysogeny switch in bacteriophage lambda [15–17], the mitogen-activated protein kinase (MAPK) signal transduction network, biochemical oscillations, and the predator-prey system.
Here we propose a simple numerical method to map the epigenetic landscape that is not based on a probabilistic or master-equation approach. Instead, a quasi-potential surface (Figure 1B) is derived directly from the deterministic rate equations governing the dynamic behavior of a gene regulatory circuit. We then use stochastic simulations to show that the elevation of this computed landscape correlates to the likelihood of occurrence of particular cell fates, with well-populated low-lying valleys representing stable cellular states and higher ridges acting as barriers to transitions between the stable states.
Finally, we discuss ways in which this quantitatively mapped landscape may help predict the efficiency of cellular de-differentiation or trans-differentiation, and identify optimal routes of cell fate reprogramming. Recent discoveries have challenged the dogma of cell fate determination as a unidirectional and irreversible process. Even terminally differentiated adult cells have now been shown to retain considerable phenotypic plasticity and the ability to be reprogrammed into pluripotent stem cell-like states [21–27] or into alternative differentiated lineages [28–34] by forced expression of a single gene or a small number of genes. These findings have led to a resurgence of interest in Waddington's ideas about cell lineage choice, with several authors invoking the image of the epigenetic landscape [7–9, 35–39]. However, the theoretical basis of plasticity in cell fate is still not fully understood, and the efficiency of reprogramming in these studies is often quite low. A quantitative understanding of the "forces" that drive cell differentiation, and the "barriers" that separate stable cell states, is urgently needed. Such understanding may eventually enable us to predict the relative ease or difficulty of de-differentiation or trans-differentiation among multiple cellular states.
In general, condition (3) will not be valid for an arbitrary circuit of two genes x and y that regulate each other as per Eq. 1, making it impossible to derive a closed-form potential function.
where Δx and Δy are sufficiently small increments along the trajectory such that dx/dt and dy/dt can be assumed to remain unchanged over the interval [(x, x+Δx); (y, y+Δy)]. The quantities Δx and Δy are obtained as the products (dx/dt)Δt and (dy/dt)Δt, respectively, where Δt is the time increment. We use the term "quasi-potential" to describe V_q, to emphasize its distinction from a closed-form potential function.
For positive increments in time Δt, ΔV_q is thus always negative along an evolving trajectory, ensuring that trajectories flow "downhill" along a putative "quasi-potential surface". Stable steady states of the system (dx/dt = 0; dy/dt = 0) would correspond to local minima on this quasi-potential surface, given that at these states ΔV_q = 0 (per Eq. 5). The overall change in the quasi-potential along a trajectory can then be calculated by numerically integrating the quantity ΔV_q in Eq. 4 from a given initial configuration up to a stable steady state, thereby allowing us to map out a temporal trajectory along the putative quasi-potential surface (Figure 2B). The quasi-potential thus defined is a measurable quantity that is minimized along a trajectory from any initial condition to an attractor in the phase space of the two genes, and is in effect a Liapunov function of the dynamical system represented by the two-gene circuit.
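The displayed forms of Eq. 4 and Eq. 5 are not reproduced in this text. One form consistent with the surrounding definitions (Δx and Δy as products of the rates with Δt, and ΔV_q vanishing at steady states) is sketched below; the exact layout of the original equations is an assumption.

```latex
\begin{align}
\Delta V_q &= -\left(\frac{dx}{dt}\,\Delta x + \frac{dy}{dt}\,\Delta y\right),
  \qquad \Delta x = \frac{dx}{dt}\,\Delta t, \quad \Delta y = \frac{dy}{dt}\,\Delta t \\
\Delta V_q &= -\left[\left(\frac{dx}{dt}\right)^{2} + \left(\frac{dy}{dt}\right)^{2}\right]\Delta t
  \;\le\; 0 \qquad \text{for } \Delta t > 0
\end{align}
```

Written this way, the sign argument is immediate: the bracket is a sum of squares, so ΔV_q is strictly negative wherever either rate is non-zero and equals zero only where dx/dt = dy/dt = 0.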
The same procedure can be applied to systems with more than two stable steady states - for instance, a "tristable" system produced by a circuit of two genes that induce their own expression, in addition to mutual inhibition (Figure 3C, D). This system has three steady states - two of which represent alternative differentiated cell lineages, while the third state depicts the common progenitor cell of the two lineages [8, 13, 42].
The "third dimension" (elevation) of the landscape represented by the quasi-potential, although directly derived from the dynamic rate equations without any additional information, thus yields an interpretation of cellular stability not immediately apparent from two-dimensional phase portrait analysis. The analysis above supports the contention that the length of the "least action trajectory" along the contours of the epigenetic landscape is more important in predicting transitions between alternative cellular states than the simple "aerial distance" in state space. It is also interesting to note that the contours of the quantitatively mapped epigenetic landscape act as a constraint on the extent of stochastic fluctuations in protein levels, with simulated cells "smeared out" on the surface of a shallower basin (Figure 4A, middle panel) compared to a tighter distribution of cells on a deeper valley (Figure 4D, middle panel).
These results suggest that calculating the relative heights of the ridges and valleys on the computed epigenetic landscape of a multi-gene system can help predict the probability of trans-differentiation from one cell lineage to another, or de-differentiation of a particular cell type to its progenitor state. Current efforts to reprogram cell fate with potential application in regenerative medicine suffer from a low rate of successful reprogramming and a trial-and-error approach to choice of a reprogramming strategy. Computing the epigenetic landscape for the critical gene interactions regulating the transition between two cellular states may indicate particular genetic manipulations that would lower the barriers separating the two states, thereby increasing the efficiency of the reprogramming process. It can also help characterize the relative ease or difficulty of alternative routes of cell fate transition [7, 9, 36]. For instance, comparison of the elevation of the barriers separating two terminally-differentiated cell lineages on the epigenetic landscape might suggest that de-differentiation of cells of one lineage to the common progenitor cell of the two lineages followed by redirection to the second lineage would lead to more efficient reprogramming than direct trans-differentiation (Figure S2, Additional File 1).
Interestingly, this flexibility of the quasi-potential surface under gene manipulation gives a quantitative interpretation of the revised image of the epigenetic landscape proposed by Waddington (Figure S3, Additional File 1), which showed an array of pegs representing genes, holding up a sheet of fabric (the landscape) through a network of guy ropes (gene interactions) - meant to convey the idea that "the modelling of the epigenetic landscape ... is controlled by the pull of these numerous guy-ropes which are ultimately anchored to the genes". Similar changes in the shape of the epigenetic landscape may also be brought about by external signals - for example endogenous cytokines or environmental chemicals - which by transiently altering the landscape could have an instructive effect on cell fate choice.
In this work, we have defined a deterministic quasi-potential that is minimized along a temporal trajectory followed by a gene network, and used it to quantitatively derive the corresponding epigenetic landscape. A gene network not being a mechanical system, this quasi-potential should not be confused with a potential energy function. It is rather a Liapunov function of the dynamical system represented by the gene network, along which trajectories flow monotonically "downhill" towards the steady states of the network. Other investigators have used a term analogous to the quasi-potential difference ΔV_q in Eq. 4 to calculate the "energy landscape" for concentrations of one component in a gene network [46, 47]. Here we have used the concept of alignment of multiple trajectories to interpolate the epigenetic landscape of a two-variable system.
This novel and simple process for deriving the surface of the landscape from a path-integral quasi-potential is not restricted to two-gene systems. While the landscape cannot be visually rendered for circuits with more than two genes, the rates of transition across the potential barriers between multiple steady states in the system can still be computed to predict optimum routes of cell fate reprogramming.
However, many binary branching points in development, particularly in blood cell lineage specification, are governed by mutual antagonism of only two transcription factors associated with alternative lineage choices. Mapping the epigenetic landscape of pairs of such cross-inhibitory "master regulators" should therefore be of particular interest in understanding both normal development and induced cell fate reprogramming, and can be greatly aided by detailed quantitative characterization of the interactions between these regulators.
where variables x and y represent the concentrations of the two gene products, and parameters B_X and B_Y denote the basal (constitutive) expression rates of genes x and y, respectively. The parameters fold_YX and fold_XY represent the rate constants, and K_DYX and K_DXY the effective affinity constants, for the suppressive effects of gene y on gene x, and of gene x on gene y, respectively. The mutual suppression of the two genes is quantified by the Hill coefficient n_H (the interaction is ultrasensitive for values of n_H > 1). Parameters deg_X and deg_Y represent the first-order degradation rate constants for the two gene products x and y, respectively. For this simplified model we used dimensionless parameters with the following values: fold_YX = fold_XY = 2; K_DYX = 0.7; K_DXY = 0.5; B_X = B_Y = 0.2; deg_X = deg_Y = 1; n_H = 4. These values were tuned to ensure bistable switching behavior in the model.
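The rate equations themselves are not reproduced in this text, so the following Python sketch assumes a common Hill-type form (basal rate plus a repressed production term minus first-order degradation) that matches the parameter descriptions above. It is written in Python rather than the MATLAB used by the authors, and the function names `repression`, `rates` and `settle` are illustrative only. It checks that the quoted parameter values give two distinct stable states.

```python
# Dimensionless parameter values quoted in the text.
FOLD_YX = FOLD_XY = 2.0
K_DYX, K_DXY = 0.7, 0.5
B_X = B_Y = 0.2
DEG_X = DEG_Y = 1.0
N_H = 4


def repression(fold, k_d, repressor, n=N_H):
    """Assumed Hill-type repression term: fold * K^n / (K^n + r^n)."""
    return fold * k_d**n / (k_d**n + repressor**n)


def rates(x, y):
    """Assumed rate equations for the mutually inhibiting genes x and y."""
    dxdt = B_X + repression(FOLD_YX, K_DYX, y) - DEG_X * x
    dydt = B_Y + repression(FOLD_XY, K_DXY, x) - DEG_Y * y
    return dxdt, dydt


def settle(x, y, dt=0.01, tol=1e-9, max_steps=200_000):
    """Forward-Euler integration until both rates fall below tol."""
    for _ in range(max_steps):
        dxdt, dydt = rates(x, y)
        if abs(dxdt) < tol and abs(dydt) < tol:
            break
        x, y = x + dxdt * dt, y + dydt * dt
    return x, y


# Two initial conditions on either side of the switch.
high_x = settle(2.0, 0.1)   # ends with x high, y low
high_y = settle(0.1, 2.0)   # ends with y high, x low
```

Under this assumed form, the two end points differ because K_DYX and K_DXY are unequal, which makes the switch slightly asymmetric.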
where the new parameters fold_XX and fold_YY represent the rate constants, and K_DXX and K_DYY the effective affinity constants, for the positive autoregulation of genes x and y, respectively. The default parameter values chosen to ensure three robust stable states in this model were as follows: fold_XX = fold_YY = fold_YX = fold_XY = 10; K_DXX = K_DYY = K_DYX = K_DXY = 4; B_X = B_Y = 0; deg_X = deg_Y = 1; n_H = 4. This system has been modeled previously [42, 48, 49] in the context of mutual inhibition of the transcription factors PU.1 and GATA1 in common myeloid progenitor (CMP) cells, which give rise to either bipotential granulocyte/macrophage progenitor (GMP) cells or megakaryocyte/erythroid progenitor (MEP) cells.
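As a rough check of the tristable parameter set, the sketch below assumes that each gene's production rate is the sum of a Hill-type self-activation term and a Hill-type repression term; the actual equation is not reproduced in this text, so this form and all function and variable names here are assumptions. Three initial conditions are integrated to their attractors.

```python
# Parameter values quoted in the text (all four fold values and all four
# affinity constants are equal in the default set).
FOLD = 10.0    # fold_XX = fold_YY = fold_YX = fold_XY
K_D = 4.0      # K_DXX = K_DYY = K_DYX = K_DXY
BASAL = 0.0    # B_X = B_Y
DEG = 1.0      # deg_X = deg_Y
N_H = 4


def activation(a):
    """Assumed Hill-type self-activation term."""
    return FOLD * a**N_H / (K_D**N_H + a**N_H)


def repression(r):
    """Assumed Hill-type repression term."""
    return FOLD * K_D**N_H / (K_D**N_H + r**N_H)


def rates(x, y):
    """Assumed rate equations: autoregulation plus mutual inhibition."""
    dxdt = BASAL + activation(x) + repression(y) - DEG * x
    dydt = BASAL + activation(y) + repression(x) - DEG * y
    return dxdt, dydt


def settle(x, y, dt=0.01, tol=1e-9, max_steps=200_000):
    """Forward-Euler integration until both rates fall below tol."""
    for _ in range(max_steps):
        dxdt, dydt = rates(x, y)
        if abs(dxdt) < tol and abs(dydt) < tol:
            break
        x, y = x + dxdt * dt, y + dydt * dt
    return x, y


progenitor = settle(9.0, 11.0)   # balanced expression of both genes
lineage_x = settle(18.0, 2.0)    # x dominant
lineage_y = settle(2.0, 18.0)    # y dominant
```

With this assumed form the balanced state sits at x = y = 10, because the activation and repression terms for equal x and y sum to the fold value exactly; the labels `progenitor`, `lineage_x` and `lineage_y` follow the interpretation given in the text for the three steady states.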
To evaluate the change in the quasi-potential along each trajectory in x-y phase space by numerical integration, the initial level of the quasi-potential at time t = 0 at the origin of the trajectory was arbitrarily set to zero (the same initial quasi-potential level was used for all trajectories so that the drop in the quasi-potential along each trajectory could be compared and used as a basis for alignment of multiple trajectories along a basin of attraction).
Thereafter, at each time step:
The above steps were repeated until the quasi-potential V_q converged to a minimum (decided by a pre-set tolerance). Multiple trajectories thus obtained were aligned into basins of attraction according to the process described in the main text. The quasi-potential surface was then derived by linear interpolation among the aligned trajectories.
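The individual per-step updates are not listed in this text. A minimal Python sketch consistent with the definitions of Δx, Δy and ΔV_q given earlier might look as follows; the forward-Euler update, the stopping rule on |ΔV_q|, the Hill-type form of `toggle_rates` and all names are assumptions rather than the authors' MATLAB implementation.

```python
def toggle_rates(x, y):
    """Assumed Hill-type mutual repression with the bistable parameter set."""
    dxdt = 0.2 + 2.0 * 0.7**4 / (0.7**4 + y**4) - 1.0 * x
    dydt = 0.2 + 2.0 * 0.5**4 / (0.5**4 + x**4) - 1.0 * y
    return dxdt, dydt


def quasi_potential_path(rates, x, y, dt=0.01, tol=1e-12, max_steps=1_000_000):
    """Follow one trajectory and accumulate the quasi-potential along it.

    Returns a list of (x, y, V_q) samples, with V_q set to zero at the
    starting point as described in the text.
    """
    v_q = 0.0
    path = [(x, y, v_q)]
    for _ in range(max_steps):
        dxdt, dydt = rates(x, y)
        dx, dy = dxdt * dt, dydt * dt          # increments along the trajectory
        dv = -(dxdt * dx + dydt * dy)          # change in quasi-potential, <= 0
        x, y, v_q = x + dx, y + dy, v_q + dv
        path.append((x, y, v_q))
        if abs(dv) < tol:                      # pre-set tolerance reached
            break
    return path


path = quasi_potential_path(toggle_rates, 1.0, 1.0)
```

Each returned sample carries its own elevation, so several such paths started from a grid of initial conditions would provide the raw material for the alignment and interpolation steps described above.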
The deterministic models were implemented and simulated on the MATLAB® (R2009a, The MathWorks, Inc., Natick, MA) platform, while the BioNetS program, based on the Gillespie algorithm [51, 52], was used for stochastic simulations. All graphics were rendered on MATLAB®.
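For readers unfamiliar with the Gillespie algorithm, the following is a minimal direct-method sketch in Python for the two-gene mutual-inhibition circuit. It is not BioNetS and does not reproduce its interface. The Hill-type propensities, the hypothetical system-size factor `OMEGA` used to convert dimensionless concentrations into molecule counts, and all names are assumptions made for illustration.

```python
import random

OMEGA = 50  # hypothetical system size: molecules per unit of concentration


def repression(fold, k_d, conc, n=4):
    """Assumed Hill-type repression term."""
    return fold * k_d**n / (k_d**n + conc**n)


# (change in x, change in y) for: make x, degrade x, make y, degrade y
STOICHIOMETRY = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def propensities(x, y):
    """Reaction propensities for molecule counts x and y."""
    cx, cy = x / OMEGA, y / OMEGA
    return [
        OMEGA * (0.2 + repression(2.0, 0.7, cy)),  # production of x
        1.0 * x,                                   # degradation of x
        OMEGA * (0.2 + repression(2.0, 0.5, cx)),  # production of y
        1.0 * y,                                   # degradation of y
    ]


def gillespie(x, y, t_end, rng):
    """Simulate one realization ("cell") up to time t_end."""
    t = 0.0
    while True:
        props = propensities(x, y)
        total = sum(props)                 # always > 0: basal production
        t += rng.expovariate(total)        # waiting time to the next event
        if t > t_end:
            return x, y
        threshold = rng.random() * total   # choose which reaction fires
        cumulative = 0.0
        for (dx, dy), a in zip(STOICHIOMETRY, props):
            cumulative += a
            if threshold < cumulative:
                x, y = x + dx, y + dy
                break


final_x, final_y = gillespie(100, 10, t_end=5.0, rng=random.Random(1))
```

The returned counts are integers, which is why the overlay step described in the text needs a small random offset to separate cells that land on the same point.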
The stochastically simulated "cells" (i.e. individual realizations of the stochastic network model) were overlaid on the quasi-potential surface at the x and y values predicted for each cell. Since stochastic simulations yield integer values, we added a small random "deviation" term [= (rand*0.5), where rand is a MATLAB® function that draws pseudorandom values from the standard uniform distribution on the open interval (0,1)] to each simulated x and y value to visualize multiple cells situated at the same point in x-y phase space. The appropriate "elevation" for each cell on the quasi-potential surface was calculated by linear interpolation between the two points on the deterministic trajectories "closest to" the location of the cell in x-y phase space. Source code for the model in MATLAB® format is appended in Additional File 2.
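The overlay step can be sketched in Python as below. The text does not specify how "closest to" is measured or how the interpolation weight is chosen, so the Euclidean distance and the projection onto the segment joining the two nearest samples are assumptions, as are the names `jitter` and `elevation`.

```python
import math
import random


def jitter(count, rng, width=0.5):
    """Add a small uniform offset so coincident cells become visible.

    Python's random() samples [0, 1) rather than MATLAB's open (0, 1);
    the difference is immaterial for plotting.
    """
    return count + rng.random() * width


def elevation(cell, samples):
    """Interpolate V_q for a cell from the two nearest trajectory samples.

    cell    -- (x, y) position of the simulated cell
    samples -- list of (x, y, v_q) points from deterministic trajectories
    """
    nearest = sorted(samples, key=lambda s: math.dist(cell, s[:2]))[:2]
    (x1, y1, v1), (x2, y2, v2) = nearest
    seg_x, seg_y = x2 - x1, y2 - y1
    seg_len2 = seg_x**2 + seg_y**2
    if seg_len2 == 0:
        return v1
    # Fractional position of the cell's projection along the segment.
    t = ((cell[0] - x1) * seg_x + (cell[1] - y1) * seg_y) / seg_len2
    t = min(1.0, max(0.0, t))
    return v1 + t * (v2 - v1)


rng = random.Random(0)
offsets = [jitter(3, rng) for _ in range(100)]
demo_samples = [(0.0, 0.0, 0.0), (2.0, 0.0, -1.0), (10.0, 10.0, -5.0)]
demo_height = elevation((1.0, 0.5), demo_samples)
```

In the demonstration, the cell lies midway along the segment between the first two samples, so its interpolated elevation is halfway between their V_q values.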
We thank J. D. Schroeter, C. Woods, R. B. Conolly, H. J. Clewell III, J. E. Trosko, M. Thattai, J. M. Haugh and B. Howell for critical discussions and reading of the manuscript. This work was supported by the Superfund Research Program of the U.S. National Institute of Environmental Health Sciences, and by the American Chemistry Council.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.