Natural selection governs local, but not global, evolutionary gene coexpression networks in Caenorhabditis elegans
© Jordan et al; licensee BioMed Central Ltd. 2008
Received: 22 April 2008
Accepted: 13 November 2008
Published: 13 November 2008
Large-scale evaluation of gene expression variation among Caenorhabditis elegans lines that have diverged from a common ancestor allows for the analysis of a novel class of biological networks – evolutionary gene coexpression networks. Comparative analysis of these evolutionary networks has the potential to uncover the effects of natural selection in shaping coexpression network topologies since C. elegans mutation accumulation (MA) lines evolve essentially free from the effects of natural selection, whereas natural isolate (NI) populations are subject to selective constraints.
We compared evolutionary gene coexpression networks for C. elegans MA lines versus NI populations to evaluate the role that natural selection plays in shaping the evolution of network topologies. MA and NI evolutionary gene coexpression networks were found to have very similar global topological properties as measured by a number of network topological parameters. Observed MA and NI networks show node degree distributions and average values for node degree, clustering coefficient, path length, eccentricity and betweeness that are statistically indistinguishable from one another yet highly distinct from randomly simulated networks. On the other hand, at the local level the MA and NI coexpression networks are highly divergent; pairs of genes coexpressed in the MA versus NI lines are almost entirely different as are the connectivity and clustering properties of individual genes.
It appears that selective forces shape how local patterns of coexpression change over time but do not control the global topology of C. elegans evolutionary gene coexpression networks. These results have implications for the evolutionary significance of global network topologies, which are known to be conserved across disparate complex systems.
The regulation of genes, resulting in specific patterns and levels of mRNA expression, is thought to be critically important for cellular function, organismal development and evolution. Recent studies have shown that while expression of some genes may change rapidly within and between species [1–3], the topological properties of gene coexpression networks are substantially conserved [4–6]. In other words, the time, place and level at which genes are expressed may be highly dynamic and flexible, but the way that thousands of expression patterns are organized into complex networks seems nevertheless constrained.
The emergent topological similarity among diverse biological networks has suggested to some that 'topology matters' and that natural selection is the evolutionary force governing the pattern . This has galvanized competing perspectives on whether  or not  the elucidation of the global topological properties of biological networks can yield meaningful insight about local cellular functions and evolution. We conducted an evaluation of the role of natural selection in the evolutionary conservation of gene coexpression networks to address this outstanding issue.
Elucidation of the role that natural selection plays in shaping evolutionary gene coexpression network topologies was made possible through the comparison of gene expression patterns among C. elegans populations that evolved under a regime of natural selection versus those that evolved in the virtual absence of selection. Previously, C. elegans mutation accumulation (MA) lines were bred in order to produce populations that evolve effectively free from selective constraint . Microarray analysis has been used to compare gene expression levels for thousands of C. elegans genes among such MA lines with orthologous gene expression for natural isolate (NI) populations . This study clearly demonstrated a role for natural selection in constraining expression divergence, since a much higher fraction of MA than NI genes were found to be differentially expressed across populations. Other studies have demonstrated selective constraint on changes in gene expression between mammals [11–13] and in the fly . We wanted to evaluate how selective constraint on gene expression is manifest in the topologies of evolutionary gene coexpression networks. Given that C. elegans MA and NI lines segregate distinct mutational and transcriptional spectra [9, 10], we expected contrasting networks of evolutionary coexpression, if natural selection governs network topology.
Results and discussion
C. elegans microarray gene expression data were used to reconstruct evolutionary gene coexpression networks connecting genes with expression levels that covary across lines (Methods). Evolutionary gene coexpression networks were generated independently, and then compared, for the MA lines versus the NI populations. To generate evolutionary gene coexpression networks, genes are represented by nodes and the nodes (genes) are connected by an edge if they are determined to be coexpressed across lines or populations (Figure 1). Pairs of genes were determined to be coexpressed if correlation of their gene expression profiles across lines yielded an r-value greater than, or equal to, a defined threshold value. Results based on r > 0.95 are presented in the body of the manuscript, and results based on a series of differing thresholds, as well as for a different coexpression metric, are presented in Additional file 1. Since expression across five lines (populations) was evaluated for the MA and NI samples, the 0.95 r-value cut-off value corresponds to a P-value of ~0.01.
The node degree distributions of a number of gene coexpression networks have previously been shown to follow a power-law [4–6, 12]. However, the MA and NI node degree distributions seen here are better approximated by an exponential curve. This can readily be appreciated by representing the node degree distributions as log-log (Figure 2B) and semi-log (Figure 2C) plots; power-law distributions follow a straight line on log-log plots, while exponential distributions follow a straight line on semi-log plots. Both the MA and NI distributions are better fit to a straight line on the semi-log plots. Exponential node degree distributions of this kind are more characteristic of ecological networks, such as predator-prey (food web) networks [15, 16]. The exponential shape of the node degree distribution seen here indicates that the level of connectivity falls off more rapidly than seen for the other gene coexpression networks that show power-law distributions. This difference may reflect the kinds of gene expression profiles (vectors) analyzed here. We are considering changes in gene expression across lines and populations of a single species. Thus, the expression levels may not be expected to change much relative to previous coexpression studies, which have analyzed expression changes across different tissues, developmental stages, disease states and experimental conditions. Relatively uniform expression across C. elegans lines will not allow for the kinds of coordinated line-specific changes among genes that would lead to highly connected nodes.
Gene coexpression network topology parameter values and comparisons.
Network topology parameter values1
< k >
4.18 ± 14.24
0.25 ± 0.36
10.77 ± 5.03
20.76 ± 3.06
23,500 ± 145,912
5.33 ± 15.28
0.32 ± 0.37
10.17 ± 5.14
21.48 ± 4.22
17,440 ± 74,032
51.31 ± 15.21
0.48 ± 0.04
6.16 ± 2.07
11.01 ± 0.12
18,203 ± 7,370
37.83 ± 9.99
0.48 ± 0.05
6.29 ± 2.11
11.36 ± 0.51
14,140 ± 6,150
Observed versus random network topology parameter values 2
< k >
MA × NI
MA × MA random
NI × NI random
The similarity in the connectivity distributions between the two networks was unexpected given the distinct roles of natural selection in shaping gene expression divergence among the MA versus NI populations. Apparently, the removal of effective natural selection in the MA lines does not appreciably reshape the distribution of connectivity across C. elegans coexpression networks. In other words, different regimes of selection do not necessarily yield different global network topological properties. To further evaluate the role of selection in shaping global topological properties, values for a number of network topology parameters were computed and compared for the MA versus NI networks (Table 1). As with the node degree distributions, the network parameters are far more similar for MA and NI than between the observed and random networks. The MA and NI networks have similar numbers of nodes and edges, while the corresponding random networks have substantially more nodes and an order of magnitude increase in the number of edges. In addition, the average node degree (<k>) is 4.18 and 5.33 for the MA and NI networks respectively, while <k> = 51.30 and 37.84 for the random networks built from the MA and NI expression data (Table 1). Taking into account their standard deviations and the number of nodes considered, the <k> values are statistically indistinguishable for the observed MA and NI networks, while each observed network is significantly different from its corresponding random network (Table 1). The same trend holds for the average clustering coefficients, path lengths and eccentricities; observed networks have average values that are not significantly different from one another but are highly different from the random networks. Betweeness is the only parameter that does not statistically discriminate between the observed versus random networks. As was the case with the node degree distributions, the similarity between observed network topology parameters and their differences from random networks holds when different coexpression thresholds and a different gene expression vector comparison method is used [see Additional file 1 – Supplemental Table 1].
Essential genes should be subject to the effects of selection in both the MA and NI lines, since individuals with lethal or sterile mutations will not be able to reproduce in any setting. Consistent with this expectation, a higher fraction of essential genes are found preserved between both networks than in either the MA or NI networks alone [see Additional file 1 – Supplemental Figure 2]. In other words, non-essential genes are freer to change between networks as the selective conditions change. Nevertheless, the lack of correlation for node topological properties between networks holds for both essential and non-essential genes (Figure 4). Thus, the essential genes that are preserved in the MA and NI networks do not have similar topological properties across networks, and so can not be responsible for the conservation of global topological properties between the networks.
As previously described for the global network properties, the relatively small number of lines (populations), and according low dimensionality (n = 5) of the gene expression vectors, could result in low resolution when expression vectors are compared to build coexpression networks. Therefore, the low overlap of edges between MA and NI networks (0.65%) could be due to poor sampling. In order to control for this possibility, a null distribution of C. elegans coexpression network overlaps was computed by randomly sampling 100 pairs of expression sets of size n = 5 from the Kim et al. C. elegans gene expression dataset  and computing the overlaps between coexpression networks built from these random pairs of sets [see Additional file 1 – Supplemental Figure 3]. There is an average of 2.66% conserved edges between pairs of coexpression networks built with the Kim et al. data with n = 5. This relatively low figure probably reflects that fact that different pairs of genes are coexpressed in different sets of conditions. Nevertheless, the 2.66% overlap seen for the random pairs is ~4.1× greater than the 0.65% conserved edges we observe between the MA and NI networks, and the overlap between MA and NI networks is substantially lower than any of the 100 overlaps between the network pairs from the Kim et al. data [see Additional file 1 – Supplemental Figure 3]. Accordingly, the difference between the observed overlap between the MA and NI networks, and the overlap between the Kim et al. network pairs is highly statistically significant (z = 40.7 P ≈ 0). In other words, the low overlap between the MA and NI networks can not be explained by the size of the expression sets (n = 5) used in their construction.
Despite the fact that the distributions of connectivity over the entire MA and NI networks are quite similar, and very different from random, the local connections as well as the specific topological properties of the genes in the different network contexts are almost entirely different. Thus, the action of natural selection, or more appropriately the effect of removing selection, is revealed by differences in the local, but not the global, structure of the coexpression networks. This is analogous to the distinction between the conservation of global network topological properties and the divergence of local connections previously observed for orthologous human and mouse coexpression networks , but the pattern seen here is even more extreme. The change of specific pairs of coexpression relationships in the MA network is also consistent with previous results that showed selective constraint on gene expression divergence among NI populations and accelerated expression divergence for MA lines .
We sought to further evaluate the nature of the differences between the MA and NI evolutionary coexpression networks and to compare these evolutionary coexpression networks to more commonly analyzed interaction networks that link functionally associated genes. To do this, we compared pairs of evolutionarily coexpressed genes for the MA and NI networks to pairs of C. elegans genes previously determined to be coexpressed across 553 microarray experiments by Kim et al. . In the study of Kim et al., similarities between gene expression profiles were calculated using the Pearson correlation coefficient, and gene expression profile similarities (distances) were converted into two dimensions using force-directed placement. This procedure resulted in a C. elegans gene expression topological map where the proximity of pairs of genes in two dimensions (x, y) corresponds to their degree of coexpression across conditions, and presumably their functional relatedness. We evaluated the way that MA versus NI coexpressed gene pairs, along with the random gene pairs for each network, populate the C. elegans gene expression topological map by measuring Euclidean distances on the topological map between pairs of genes in the MA, NI or random networks. Coexpressed gene pairs in both the MA and NI networks are significantly more closely grouped on the C. elegans topological map than are pairs of genes from the corresponding random networks (2.5<t<23.4 3.6e-49<P < 1.1e-2 Student's ttest). This indicates that functionally relevant interactions are captured by both the MA and NI networks. Interestingly, MA coexpressed gene pairs are more closely grouped, on average, than NI coexpressed gene pairs on the C. elegans gene expression topological map (t = 14.78 P = 3.6e-49 Student's ttest). This result is similar to that reported by Denver and colleagues  who showed that genes differentially expressed in MA lines tended to cluster in specific coexpression "mounts," and concluded that this was likely caused by trans- acting mutations purged by selection from NI lines. Changes in expression across conditions recorded on the C. elegans topological map are due largely to the context-dependent action of transcription factors. Accordingly, the local differences between the MA and NI evolutionary coexpression networks are indicative of network rewiring, likely caused by mutations in trans- acting factors that 'capture' gene expression modules in the MA lines. Thus, the action of natural selection, or more appropriately the effect of removing selection, is only revealed by differences in the local structure of the coexpression networks.
The appearance of similar global topological properties across disparate complex biological systems led to the view that there were 'universal laws' that governed the function and evolution of cellular networks [7, 18]. It followed that the revelation of such laws, via the analytical tools of network theory, could yield revolutionary insight into biology. However, some reserved judgment as to whether such universal laws existed and if the statistical analysis of network topologies would reveal something non-trivial about biological systems . This initial agnosticism as to the ability of the network approach to reveal fundamental and novel biological principles has hardened into a deep skepticism regarding its very relevance . The pessimistic view of the network approach to biology is based in large part on the assertion that similar topological properties do not entail similar network architectures or functional constraints, and analogous conclusions have been reached for computer networks . Indeed, it has been shown that similar global network topological properties can emerge due to non-adaptive processes, such as simple birth-and-death models [21, 22], without any assumption of selection . Our own comparison of the MA versus NI evolutionary gene coexpression networks has revealed that similar properties at a high level of abstraction can obscure substantial and biologically relevant differences at lower levels. With respect to the evolution of biological systems, the details remain important.
Gene expression data
Gene coexpression networks
Coexpression networks were reconstructed independently for MA lines and NI populations. In the gene coexpression networks, gene as taken as nodes and pairs of nodes (genes) are connected by an edge if they are coexpressed across lines (populations). To determine if pairs of genes are coexpressed, the expression profile for each gene is taken as its normalized expression levels across MA or NI lines (populations). Thus, each gene g i was represented as a row vector with dimensions (n = 5) equal to the number of MA lines or NI populations evaluated:
g i = [zgi 1, zgi 2... zg in ]
Pairs of genes that have values r or ed that exceeded a given threshold were considered to be coexpressed and were thus represented as nodes connected by an edge in the MA or NI coexpression network. For the Pearson correlation coefficient, r-value thresholds used were 0.95, 0.96, 0.97, 0.98 and 0.99. Five Euclidean distance value thresholds were computed so that the resulting networks would have the same number of edges as the Pearson correlation coefficient networks: ed = 0.1942, 0.1855, 0.1744, 0.1586, and 0.1381.
The Pearson correlation coefficient and Euclidean distance are widely used in the analysis of gene expression data. The choice of these two metrics was also based in part on previous results that showed they were the most dissimilar of the distance metrics commonly used to compare and cluster expression data . In this sense, they represent a conservative pair of measures to be used when controlling for methodological effects on the results.
Random pairs of gene coexpression networks were built using the C. elegans gene expression data from the study of Kim et al. . The Kim et al. data set includes 553 microarray experiments over a variety of experimental conditions, including different growth conditions, developmental stages and mutants. For 1,000 genes, 2 sets of 5 experiments each were randomly sampled from the 553 experiments in the C. elegans gene expression data set. Each paired set of experiments, and the resulting sets gene expression vectors, was used as described above to construct pairs of coexpression networks. This procedure was repeated 100 times, and for each replicate pair, the overlap between the networks, in percentage of conserved edges, was computed.
Statistical analyses of network topologies
The number of nodes and edges for each coexpression network along with the average and standard deviation values for the following network parameters – node degree (k), clustering coefficient (C), path length (l), eccentricity (e) and betweeness (b) – were calculate using the program tYNA . The node degree (k) is simply the number of connections (edges) for any given node. The clustering coefficient (C) is a measure of how connected the neighbors of a given node are:
C = 2n/k(k - 1)
where n is the number of links among the k neighbors of the node. Path length (l) is the shortest path (geodesic), in terms of numbers of edges that are traversed, between any two nodes in the connected part of the network. The eccentricity (e) of a node is the largest geodesic between that node and any other node in the network. The betweeness (b) of a node is the number of geodesics that pass through the node. Average network parameter values were compared between networks using the Students' ttest. Node degree distributions were computed as f(k) × k, and curve fitting on the distributions was done using least squares.
Random networks were generated by randomly shuffling the positions of the expression values within the original MA and NI m × n data matrices. The shuffled MA and NI datasets were used to re-calculate all pairwise correlation (r) or Euclidean distance (ed) values between gene expression profiles (vectors), and the same threshold cutoffs as employed for the observed networks were used to make coexpression networks from the random r- or ed-values.
Comparison with the C. elegans gene expression topological map
Euclidean distances were then averaged over all pairs in each network and then compared for the MA and NI networks using the Student's ttest.
IKJ and LSK were supported by the School of Biology, Georgia Institute of Technology. JTS was supported by an Alfred P. Sloan Research Fellowship in Computational and Evolutionary Molecular Biology (BR 4499). DRD is supported by the OSU Computational and Genome Biology Initiative and NIH NIGMS. The authors wish to thank KY Yip for assistance with the tYNA analysis software.
- Gu Z, Nicolae D, Lu HH, Li WH: Rapid divergence in expression between duplicate genes inferred from microarray data. Trends Genet. 2002, 18: 609-613. 10.1016/S0168-9525(02)02837-8View ArticlePubMedGoogle Scholar
- Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA, Gerstein M: Genomic analysis of regulatory network dynamics reveals large topological changes. Nature. 2004, 431: 308-312. 10.1038/nature02782View ArticlePubMedGoogle Scholar
- Makova KD, Li WH: Divergence in the spatial pattern of gene expression between human duplicate genes. Genome Res. 2003, 13: 1638-1645. 10.1101/gr.1133803PubMed CentralView ArticlePubMedGoogle Scholar
- Agrawal H: Extreme self-organization in networks constructed from gene expression data. Phys Rev Lett. 2002, 89: 268702- 10.1103/PhysRevLett.89.268702View ArticlePubMedGoogle Scholar
- Bergmann S, Ihmels J, Barkai N: Similarities and differences in genome-wide expression data of six organisms. PLoS Biol. 2004, 2: E9- 10.1371/journal.pbio.0020009PubMed CentralView ArticlePubMedGoogle Scholar
- Tsaparas P, Marino-Ramirez L, Bodenreider O, Koonin EV, Jordan IK: Global similarity and local divergence in human and mouse gene co-expression networks. BMC Evol Biol. 2006, 6: 70- 10.1186/1471-2148-6-70PubMed CentralView ArticlePubMedGoogle Scholar
- Barabasi AL, Oltvai ZN: Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004, 5: 101-113. 10.1038/nrg1272View ArticlePubMedGoogle Scholar
- Keller EF: Revisiting "scale-free" networks. Bioessays. 2005, 27: 1060-1068. 10.1002/bies.20294View ArticlePubMedGoogle Scholar
- Denver DR, Morris K, Streelman JT, Kim SK, Lynch M, Thomas WK: The transcriptional consequences of mutation and natural selection in Caenorhabditis elegans. Nat Genet. 2005, 37: 544-548. 10.1038/ng1554View ArticlePubMedGoogle Scholar
- Vassilieva LL, Hook AM, Lynch M: The fitness effects of spontaneous mutations in Caenorhabditis elegans. Evolution. 2000, 54 (4): 1234-1246.View ArticlePubMedGoogle Scholar
- Jordan IK, Marino-Ramirez L, Koonin EV: Evolutionary significance of gene expression divergence. Gene. 2005, 345: 119-126. 10.1016/j.gene.2004.11.034PubMed CentralView ArticlePubMedGoogle Scholar
- Jordan IK, Marino-Ramirez L, Wolf YI, Koonin EV: Conservation and coevolution in the scale-free human gene coexpression network. Mol Biol Evol. 2004, 21: 2058-2070. 10.1093/molbev/msh222View ArticlePubMedGoogle Scholar
- Liao BY, Zhang J: Evolutionary conservation of expression profiles between human and mouse orthologous genes. Mol Biol Evol. 2006, 23: 530-540. 10.1093/molbev/msj054View ArticlePubMedGoogle Scholar
- Rifkin SA, Houle D, Kim J, White KP: A mutation accumulation assay reveals a broad capacity for rapid evolution of gene expression. Nature. 2005, 438: 220-223. 10.1038/nature04114View ArticlePubMedGoogle Scholar
- Dunne JA, Williams RJ, Martinez ND: Network structure and biodiversity loss in food webs: robustness increases with connectance. Ecology Letters. 2002, 5:Google Scholar
- Montoya JM, Pimm SL, Sole RV: Ecological networks and their fragility. Nature. 2006, 442: 259-264. 10.1038/nature04927View ArticlePubMedGoogle Scholar
- Kim SK, Lund J, Kiraly M, Duke K, Jiang M, Stuart JM, Eizinger A, Wylie BN, Davidson GS: A gene expression map for Caenorhabditis elegans. Science. 2001, 293: 2087-2092. 10.1126/science.1061603View ArticlePubMedGoogle Scholar
- Barabasi AL: Linked: the new science of networks. 2002, Cambridge: PerseusGoogle Scholar
- Wolf YI, Karev G, Koonin EV: Scale-free networks in biology: new insights into the fundamentals of evolution?. Bioessays. 2002, 24: 105-109. 10.1002/bies.10059View ArticlePubMedGoogle Scholar
- Lun L, David A, Walter W, John D: A first-principles approach to understanding the internet's router-level topology. Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications. 2004, Portland: ACM PressGoogle Scholar
- Karev GP, Wolf YI, Rzhetsky AY, Berezovskaya FS, Koonin EV: Birth and death of protein domains: a simple model of evolution explains power law behavior. BMC Evol Biol. 2002, 2: 18- 10.1186/1471-2148-2-18PubMed CentralView ArticlePubMedGoogle Scholar
- Koonin EV, Wolf YI, Karev GP: The structure of the protein universe and genome evolution. Nature. 2002, 420: 218-223. 10.1038/nature01256View ArticlePubMedGoogle Scholar
- Lynch M: The evolution of genetic networks by non-adaptive processes. Nat Rev Genet. 2007, 8: 803-813. 10.1038/nrg2192View ArticlePubMedGoogle Scholar
- Yip KY, Yu H, Kim PM, Schultz M, Gerstein M: The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks. Bioinformatics. 2006, 22: 2968-2970. 10.1093/bioinformatics/btl488View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.