Natural selection drives rapid evolution of mouse embryonic heart enhancers
© Liao and Weng; licensee BioMed Central Ltd. 2012
Published: 12 December 2012
Skip to main content
© Liao and Weng; licensee BioMed Central Ltd. 2012
Published: 12 December 2012
Mouse E11.5 embryonic heart enhancers were found to exhibit exceptionally weak sequence conservation during vertebrate evolution compared to enhancers of other developing organs. However, it is unknown whether this phenomenon is due to elevated mutation rates, or is a consequence of natural selection.
In this study, based on the aligned orthologous genomic sequences of mouse and other closely related mammals, the substitution rates of fourfold degenerate sites or intron sequences in neighboring genes were used as neutral references to normalize substitution rates of mouse enhancers. Subsequent comparisons indicated that heart enhancers' evolutionary rates were increased by natural selection. Correspondingly, the results of Fisher's exact tests to examine the differential enrichment of substitutions between enhancers and neutral sequences suggest that both relaxed purifying selection and positive selection caused the rapid evolution of heart enhancers. Analyses on recombination rates and substitution patterns indicated that GC-biased gene conversion does not contribute to evolutionary rate variations among enhancers. In general, pleiotropic enhancers and enhancers in proximity to weakly expressed genes, tend to evolve slowly. Although heart enhancers are less pleiotropic and are adjacent to highly expressed genes, these biases do not account for the rapid evolution observed.
In combination, the results of the present study suggest that factors associated with functions or characteristics of the tissue may exert direct and profound effects on the intensity and direction of the natural selection applied to regulatory DNAs, such as enhancers.
Evolution of gene expression may underlie the morphological diversity of animal species [1–3]. Therefore, it is important to understand the rules that govern the evolution of regulatory DNA sequences. Enhancers are a major category of regulatory DNAs, and are short genomic segments that bind to regulatory proteins and increase gene transcription [4, 5]. Correspondingly, enhancers have been shown to regulate many of the processes that affect organismal development [6, 7]. To locate these important regulatory non-coding DNAs in the mouse (Mus musculus) genome, several thousand mouse enhancers that function in the embryonic heart, forebrain, midbrain, or limbs were recently identified using a p300 ChIP-Seq approach [8–10]. Subsequent analyses showed that the evolutionary signatures of the enhancers associated with different tissue types were variable: Embryonic heart enhancers exhibited an exceptionally low sequence conservation compared to enhancers from other tissues . Since the heart is a highly conserved organ critical to the survival of all vertebrate organisms, DNAs associated with the regulatory mechanisms that control heart development were expected to be conserved across a long phylogenetic distance . Although the underlying cause for the rapid sequence evolution observed in embryonic heart enhancers remains unknown, this unexpected phenomenon has challenged the use of sequence conservation as a criterion for predicting functionally important non-coding DNAs [12, 13].
There are three potential explanations for the fast evolution rate observed for heart enhancers. First, the mutation rate of heart enhancers are elevated, similar to that of fast evolving DNAs that reside in genomic regions prone to replication errors [14–18]. Second, heart enhancers are subject to a more frequent, or intense, positive selection, similar to that of genes encoding adaptive proteins where new mutations that develop tend to be fixed more frequently [19, 20]. Third, heart enhancers have evolved under a more relaxed purifying selection, similar to that of genes with non-essential functions that have high evolution rates relative to essential genes [21, 22]. To examine if differential frequencies/intensities in mutation rate or natural selection have led to an increased evolution of heart enhancers, based on mouse-rat or mouse-human genomic alignments, substitution rates of fourfold degenerate sites (d 4), or intron sites (d i), of neighboring genes were used as neutral references to normalize the substitution rate of each enhancer. Substitution rates of mouse enhancers from different tissues were then compared. Fisher's exact tests were also used to examine the differential enrichment of substitutions between enhancers and the corresponding neutral sequences, and these results were used to determine the strength of the purifying selection, or the frequency/intensity of the positive selection, experienced by the enhancers. We also examined the potential role for GC-biased gene conversion events in affecting the interpretation of processes that accelerate sequence substitutions of heart enhancers in mammals.
Studies on protein evolution have shown that unanticipated confounding factors could influence the identification and interpretation of previously reported protein evolutionary rate determinants. For instance, gene essentiality was asserted to influence bacterial protein evolutionary rates . However, a later study found that this phenomenon was simply due to that highly expressed genes evolved slowly and essential genes were highly expressed . Similarly, extracellularity was suggested to be correlated with the rate of protein evolution in yeasts . Nevertheless, this was because extracellular proteins tend to be non-essential to the organism and non-essential genes were less constrained evolutionarily . To determine whether the rapid evolution of heart enhancers is due to factors directly associated with functions or characteristics of the embryonic tissues or organs where a DNA performs functions (termed as "tissue factors" hereafter) [27, 28], the influence of enhancer properties and neighboring genes to the evolution of these heart enhancers were examined. We found that the evolution rates of heart enhancers were correlated with pleiotropy (see Methods), as well as the expression levels of neighboring genes. Embryonic heart enhancers generally were less pleiotropic and were in proximity of highly expressed neighboring genes. After controlling for these biases, heart enhancers still exhibited highest rates of evolution. The results suggested that tissue factors may exert direct effects on the intensity and direction of the natural selection on enhancers. Further studies to elucidate how tissue factors specifically cause variations in selective forces on enhancers are needed.
Based on the Ensembl mouse genome assembly v59 (NCBI m37), the co-activator protein, p300, was found to bind 2759, 2786, 3839, and 3597 enhancers in forebrain, midbrain, limb, and heart mouse embryonic tissues (Supplementary material in ). When enhancers with overlapping coordinates were consolidated into a single enhancer region, 11,332 non-overlapping enhancer regions were determined. These regions were further classified into "specific enhancers" (i.e., regions consolidated from enhancers of a single embryonic tissue) (n = 10,030) and "pleiotropic enhancers", the remaining set of enhancers (n = 1,302). It should be noted that, since only four tissues from the E11.5 stage were examined in p300 ChIP-seq experiments , the "specific enhancers" defined in the present study may function in other tissues or developmental stages during mouse embryogenesis. Furthermore, the term "pleiotropic" was used to describe enhancers that are likely to significantly affect the developmental processes of at least two of the examined embryonic organs in  when they are absent from the genome. The recombination rate of the mouse genome segment in which an enhancer was located was obtained from the Supplementary table of .
The coordinates of the mouse genes analyzed were retrieved from the BioMart (http://www.biomart.org/). For mouse genes with a null phenotype, the essentiality of each gene was defined based on phenotypic annotations of Mouse Genome Informatics 4.21 (http://www.informatics.jax.org/), according to . When premature death or infertility were the knockout phenotype, these genes were considered essential. All other genes with at least one documented null phenotype were considered non-essential. The gene expression profiles for the mouse genes analyzed were obtained from microarray data collected using 61 mouse tissues, including 53 adult tissues, 3 cell lines, and 5 early embryos from E6.5 to E10.5 . Expression levels were calculated by averaging microarray-based expression signals from all tissues according to the previous studies [22, 26].
To study the patterns and determinants of enhancer evolution rates, aligned mouse-rat (Rattus norvegicus), mouse-human (Homo sapiens), and mouse-rat-human orthologous genomic regions were retrieved from Ensembl (http://www.ensembl.org/) , or the UCSC Genome Browser (http://genome.ucsc.edu)  (genome versions: mouse: mm9; rat: rn4; human: hg18). Sequence divergence was subsequently calculated, and the numbers of substitutions were computed. Based on the genomic alignments retrieved from the UCSC Genome Browser (http://genome.ucsc.edu) , the sequence divergence of enhancer regions (D) from mouse versus other mammal sequences was calculated according to baseml of PAML  using a GTR model with gamma distribution for site heterogeneity (model = 7; ncatG = 5). Fourfold degenerate sites and intron sequences of the nearest gene with an ortholog in the rat or human genome were used as neutral references to an enhancer. Since mammalian genes are often alternatively spliced and produce multiple transcripts , the coding sequence and intron sequence of a gene were defined using the exon-intron structure of the longest isoform annotated in Ensembl v59, with alignments of coding sequences and intron sequences retrieved from the UCSC Genome Browser. Fourfold degenerate sites of mouse genes were defined according to W-H Li . When intron sequences were used as the neutral reference, similar to R Haygood, O Fedrigo, B Hanson, K-D Yokoyama and GA Wray , the first intron was removed due to the possibility it could contain regulatory motifs . The latter criterion resulted in that intronless genes and single-intron genes have only the neutral reference of d 4, and not d i. Similarly, sequence divergences between mouse and another mammal under neutrality, d 4 and d i, were calculated by baseml of PAML  using a GTR model with gamma distribution for site heterogeneity (model = 7; ncatG = 5). As indicated above, the calculation of d 4 or d i required the presence of an Ensembl annotation of a mouse gene to its one-to-one ortholog to ensure the reliability of neutral estimation. The normalized enhancer sequence divergences (D/d 4 or D/d i) were then used as indexes for the relative strength of selection. Lower D/d 4 or D/d i values indicated a weaker purifying selection, or a stronger positive selection, had occurred for the enhancer sequence examined .
According to AP Rooney and J Zhang , Fisher's exact test was used to examine the natural selection on enhancers based on the 2x2 contingency table. The row and column categories of the table are "the enhancer vs. the neutral reference used" and "the number of substituted sites vs. the number of sites with no substitution", respectively. Both fourfold degenerate sites, or intron sequences of the neighboring gene (defined above), were used as neutral references to examine the differential enrichment of substitutions between enhancers and neutral references. Compared to the neutral reference, significantly more substitutions in an enhancer implied that positive selection had occurred. Alternatively, fewer substitutions in an enhancer suggested that purifying selection had occurred. Otherwise, the model of neutrality cannot be rejected statistically, and the enhancer would be considered to have evolved free from natural selection.
The proportions of enhancers that underwent selection based on mouse-rat alignments (neutral reference: fourfold degenerate sites)
Positively selectedb/Under selectiona
The proportions of enhancers that underwent selection based on mouse-rat alignments (neutral reference: intron sites)
Positively selectedb/Under selectiona
It is important to note that estimates of the proportion of enhancers under selection were subject to the neutral references used. For example, when intron sites were served as the neutral reference, a higher proportion of enhancers were found to undergo natural selection (item "Under selection/Total"). In addition, "Positively selected/Under selection" and "Positively selected/Total" values were found to slightly differ. There are several possible explanations for these observations. First, the available numbers and samples of enhancers used to generate Table 1 (or Additional file 3) and Table 2 (or Additional file 4) varied. Second, an enhancer's neighboring gene generally had a larger number of intron sites than fourfold degenerate sites for Fisher's exact tests. With an increase in neutral reference sites, a statistically significant result to reject the neutral model was more likely to be obtained. Although our estimations of "Under selection/Total", "Positively selected/Under selection", and "Positively selected/Total" values may be affected by changes in the neutral reference used, these changes were small, especially for results based on mouse-human alignments (Additional file 3 and 4). In addition, it should be noted that the present study focused on the relative values for "Under selection/Total", "Positively selected/Under selection", and "Positively selected/Total", not the absolute values, to compare enhancers from different tissues. Estimations of these three indexes using a similar approach and neutral references should be subjected to the same biases. The consistent patterns observed for data in Tables 1 and 2 and Additional files 3 and 4 unambiguously indicate that a higher tendency of being targeted by relaxed purifying selection or positive selection was associated with the more rapid evolution of mouse embryonic heart enhancers.
Furthermore, based on alignments of mouse-rat-human orthologous DNAs, the number of nucleotides that were ancestrally A or T before mouse-rat divergence (human and rat are both either A or T of the aligned site) and had a substitution event in the mouse lineage was calculated and defined as "N substituted A/T" for each enhancer. The number of mouse lineage-specific substitutions from A/T to G/C (N A/T→G/C) was also calculated for each enhancer. If heart enhancers underwent more GC-biased gene conversion events, they would be predicted to have a higher N A/T→G/C to N substituted A/T ratio. However, heart enhancers were associated with a significantly lower ratio of N A/T→G/C to N substituted AT (Figure 2B) compared to the enhancers of other tissues.
In combination, the results shown in Figure 2 indicate that heart enhancers did not experience more GC-biased gene conversion events over their evolution. Therefore, mechanisms associated with GC-biased gene conversion did not explain the rapid evolution of heart enhancers and the selective pressure acting on them.
Enhancers tend to regulate neighboring genes that are within several kilobases . Therefore, we also investigated the association between the evolution rate of enhancer regions and the properties of adjacent genes (e.g., the closest coding gene in the mouse genome). The median distance for adjacent genes was found to be 49.7 kb. It is important to note that the adjacent gene of an enhancer may not be the same gene used to compute d 4 or d i, since the calculation of d 4 and d i require the presence of a one-to-one ortholog in another genome. For enhancer regions with an adjacent gene that is considered to be essential, a significantly smaller D (P = 3.83e-6, U test) was observed compared to enhancer regions adjacent to a non-essential gene (Figure 3B). These results suggest that faster rates of evolution occurred in the latter case. It has previously been shown that genomic regions with high gene density have low mutation rates . Moreover, highly expressed genes, which tend to be essential [24, 44], are located in gene-dense genomic regions . Thus, we expected that genomic regions enriched with essential genes would have a lowered mutation rate which potentially explains Figure 3B. As expected, a smaller D for enhancer regions with an adjacent gene that is essential was observed, while no significant difference in D/d 4 was observed for enhancers with an adjacent essential gene versus a non-essential gene (P = 0.27, U test, Figure 3E).
In addition, enhancer regions with a highly expressed adjacent gene tended to evolve faster than enhancer regions adjacent to a weakly expressed gene (Spearman's rank correlation coefficient σ of D vs. the expression level of the adjacent gene = 0.040, P = 1.9e-4; Figure 3C, for U test). A slightly more significant positive correlation was found between D/d 4 and the expression level of an adjacent gene (Spearman's σ = 0.044, P = 9.8e-5; Figure 3F, for U test), suggesting that embryonic enhancers near weakly expressed genes are subject to a more biased selective force. The results remained unchanged when gene expression levels were defined by the maximum expression signal detected for the 61 mouse tissues analyzed (Additional file 5). However, at the present stage, it remains unclear whether expression levels of an adjacent gene directly, or indirectly, affect D/d 4. Determining the cause of such an effect needs future studies.
The rapid evolution rates of mouse embryonic heart enhancers can potentially be explained by elevated rates of mutation and/or changes in the direction or intensity of natural selection. In the present study, neutral sites of the neighboring gene were used as references to examine mutation bias, as well as selective bias, on the evolution of enhancer sequences. Subsequent analyses demonstrated that the rapid evolution of mouse E11.5 heart enhancers cannot be explained by potential confounding factors examined in the present study, and therefore, may be directly associated with embryonic tissue factors.
The identification and understanding of regulatory DNAs in the mammalian genome is a major goal in the post-genomic era. This study employed a simple, yet effective, method to explore the role of natural selection on the evolution of non-coding sequences. Although the present study indicates that the rapid evolution of heart enhancers is likely to have been the result of a selective force associated with the type of embryonic tissue involved, the underlying causes for the relaxed selective constraint, or the higher frequency/intensity of positive selection, remain to be discovered. One possible underlying cause for a more relaxed purifying selection on heart enhancer is that during mouse embryogenesis, heart development begins and finishes earlier than brain and limb developments. If all enhancer sequences identified in  are functional regulatory DNA sequences at E11.5, these enhancers are regulating late-stage heart organogenesis but early-stage organogenesis of the brain or limbs. Mutations resulting in abnormal organ development at a later stage of organogenesis could bring a milder defect than those resulting in defects occur at an earlier stage, and therefore are more tolerable to the organism.
Our finding that heart enhancers tend to be positively selected is particularly intriguingly. In the future, it would be interesting to determine whether heart enhancers have been repeatedly shaped by positive selection because they are also involved in other developmental processes associated with morphological or physiological diversity. With the increasing abundance of functional genomic data and phenotypic data for mouse genes, such investigations will be feasible when a more complete set of spatial and temporal p300 ChIP-Seq data for mouse embryos and other species becomes available.
substitution rate of fourfold degenerate sites
substitution rate of intron sites
sequence divergence of enhancer regions
This work was supported by intramural funding from the National Health Research Institutes, Taiwan, and research grant (NSC 101-2311-B-400-001-MY3) and (NSC 99-2311-B-400-003-MY2) from the National Science Council, Taiwan, to B.-Y.L.
This article has been published as part of BMC Systems Biology Volume 6 Supplement 2, 2012: Proceedings of the 23rd International Conference on Genome Informatics (GIW 2012). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/6/S2.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.