Eigengene networks
Many module detection methods identify groups of genes whose expression profiles are highly correlated. For such modules, one can summarize the module expression profile by one representative gene: the module eigengene. An intuitive explanation of module eigengenes is provided in Figures 1C–E. Specifically, we define the module eigengene as the first right-singular vector of the standardized module expression data (Methods, Eq. 29). Eigengenes of different modules often exhibit correlations which we use to define eigengene networks. Figure 1A outlines our approach for constructing an eigengene network corresponding to the modules of a single gene co-expression network. We index the eigengenes by capital letters I, J,...; for example, $E_J$ denotes the (module) eigengene of the J-th module. We define the connection strength (adjacency) between eigengenes I and J as
$$a_{\mathrm{Eigen},IJ} = \frac{1 + \mathrm{cor}(E_I, E_J)}{2}. \qquad (1)$$
Thus, the eigengene network $A_{\mathrm{Eigen}} = (a_{\mathrm{Eigen},IJ})$ is a special case of a signed weighted gene co-expression network (β = 1 in Eq. 26, Methods). We use a signed co-expression network because the sign of the correlation between eigengenes carries important biological information in our applications. We use a weighted gene co-expression network to describe the relationships between modules since this maintains the continuous nature of the co-expression information. Examples of two different visualization methods of eigengene networks are shown in Fig. 2C,D and 2E,H.
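As a minimal illustrative sketch (not the authors' R code), the following Python snippet computes module eigengenes and the resulting eigengene adjacency matrix with NumPy. It assumes that genes are stored in rows and samples in columns, that the signed adjacency with β = 1 takes the form (1 + cor)/2, and that each eigengene is oriented along the average module profile (the sign of a singular vector is otherwise arbitrary). The function names and the simulated data are hypothetical.

```python
import numpy as np

def module_eigengene(expr):
    """First right-singular vector of the standardized module expression data.

    expr: array of shape (n_genes, n_samples); genes in rows is an assumed layout.
    """
    # Standardize each gene (row) to mean 0 and unit variance across samples.
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, ddof=1, keepdims=True)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = vt[0]
    # Assumption: orient the eigengene along the average standardized profile.
    if np.corrcoef(eigengene, z.mean(axis=0))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

def eigengene_adjacency(eigengenes):
    """Signed adjacency a_IJ = (1 + cor(E_I, E_J)) / 2; one eigengene per row."""
    return (1.0 + np.corrcoef(eigengenes)) / 2.0

# Hypothetical toy data: three modules measured on 30 samples.
rng = np.random.default_rng(0)
seed_a, seed_b = rng.normal(size=(2, 30))
module_a = seed_a + 0.3 * rng.normal(size=(20, 30))
module_b = -seed_a + 0.3 * rng.normal(size=(15, 30))  # anti-correlated with module A
module_c = seed_b + 0.3 * rng.normal(size=(10, 30))

eigengenes = np.vstack([module_eigengene(m) for m in (module_a, module_b, module_c)])
adjacency = eigengene_adjacency(eigengenes)
print(np.round(adjacency, 2))
```

In this toy example the adjacency between the first two eigengenes is close to 0 because their correlation is strongly negative, which illustrates how the signed network keeps anti-correlated modules apart.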
For the I-th module eigengene, we define the scaled connectivity (degree) $C_I(A_{\mathrm{Eigen}})$ as the mean connection strength with the other eigengenes:
$$C_I(A_{\mathrm{Eigen}}) = \frac{\sum_{J \neq I} a_{\mathrm{Eigen},IJ}}{N - 1}, \qquad (2)$$
where N denotes the number of module eigengenes. Note that the scaled connectivity $C_I(A_{\mathrm{Eigen}})$ is close to 1 if the I-th eigengene has a high positive correlation with most other eigengenes.
The density $D(A_{\mathrm{Eigen}})$ of the eigengene network is defined as the average scaled connectivity (Eq. 9):
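$$D(A_{\mathrm{Eigen}}) = \frac{\sum_{I} \sum_{J \neq I} a_{\mathrm{Eigen},IJ}}{N(N - 1)}. \qquad (3)$$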
The density $D(A_{\mathrm{Eigen}})$ is close to 1 if most eigengenes have high positive correlations with each other.
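The scaled connectivity and density can be sketched in a few lines of Python. This is only an illustration under the definitions given in the text (mean off-diagonal adjacency per eigengene, and the average of those means); the function names and the toy adjacency matrix are hypothetical.

```python
import numpy as np

def scaled_connectivity(adjacency):
    """Mean connection strength of each node with the other N - 1 nodes."""
    n = adjacency.shape[0]
    off_diagonal_sum = adjacency.sum(axis=1) - np.diag(adjacency)
    return off_diagonal_sum / (n - 1)

def density(adjacency):
    """Average scaled connectivity, i.e. the mean off-diagonal adjacency."""
    return scaled_connectivity(adjacency).mean()

# Hypothetical eigengene adjacency matrix for three modules.
toy_adjacency = np.array([[1.0, 0.9, 0.2],
                          [0.9, 1.0, 0.4],
                          [0.2, 0.4, 1.0]])

connectivity = scaled_connectivity(toy_adjacency)  # [0.55, 0.65, 0.30]
network_density = density(toy_adjacency)           # 0.5
print(connectivity, network_density)
```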
Meta-modules in a single eigengene network
Since eigengenes form a network, one can use a module detection procedure to identify modules comprised of eigengenes. We refer to modules in an eigengene network as meta-modules. Meta-modules may reveal a higher order organization among gene co-expression modules. We use average linkage hierarchical clustering to define meta-modules as branches of the resulting cluster tree (Methods, Eq. 21). The resulting meta-modules are sets of positively correlated eigengenes.
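A possible way to sketch meta-module detection in Python uses SciPy's average linkage clustering. The dissimilarity 1 − a (one minus the eigengene adjacency) and the fixed cut height are assumptions made for illustration; the definition in Methods, Eq. 21 is not reproduced here and may differ. The function name and the toy matrix are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def meta_modules(adjacency, cut_height):
    """Cluster eigengenes by average linkage and cut the tree at a fixed height."""
    dissimilarity = 1.0 - adjacency          # assumed dissimilarity measure
    np.fill_diagonal(dissimilarity, 0.0)     # guard against rounding on the diagonal
    condensed = squareform(dissimilarity, checks=False)
    tree = linkage(condensed, method="average")
    # Each branch below the cut height becomes one meta-module label.
    return fcluster(tree, t=cut_height, criterion="distance")

# Hypothetical adjacency: eigengenes 0 and 1 are close, as are eigengenes 2 and 3.
toy_adjacency = np.array([[1.00, 0.90, 0.20, 0.10],
                          [0.90, 1.00, 0.15, 0.20],
                          [0.20, 0.15, 1.00, 0.85],
                          [0.10, 0.20, 0.85, 1.00]])

labels = meta_modules(toy_adjacency, cut_height=0.5)
print(labels)
```

Because the adjacency is signed, only positively correlated eigengenes have small dissimilarities, so the recovered branches are sets of positively correlated eigengenes.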
Differential eigengene network analysis
Several recent works have described differential network analysis methods for gene co-expression networks [11–13]. Here we propose methods for the differential analysis of eigengene networks. An overview is shown in Figure 1B. We start by defining and detecting consensus modules, i.e., modules that are shared by two or more gene co-expression networks. Consensus modules may represent biological pathways that are shared among the compared data sets. Study of their relationships, represented by consensus eigengene networks, may reveal important differences in pathway regulation under different conditions. Detection of consensus modules proceeds by defining a suitable consensus dissimilarity (Methods, Eq. 22) and using it as input to hierarchical clustering. To compare the consensus eigengene networks (Eq. 1) of two data sets whose adjacency matrices are $A^{(1)}_{\mathrm{Eigen}}$ and $A^{(2)}_{\mathrm{Eigen}}$, we make use of the preservation network $\mathrm{Preserv}^{(1,2)} = \mathrm{Preserv}(A^{(1)}_{\mathrm{Eigen}}, A^{(2)}_{\mathrm{Eigen}})$, in which adjacencies are defined as
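$$\mathrm{Preserv}^{(1,2)}_{IJ} = 1 - \left| a^{(1)}_{\mathrm{Eigen},IJ} - a^{(2)}_{\mathrm{Eigen},IJ} \right| = 1 - \frac{\left| \mathrm{cor}(E^{(1)}_I, E^{(1)}_J) - \mathrm{cor}(E^{(2)}_I, E^{(2)}_J) \right|}{2}. \qquad (4)$$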
Here $E^{(s)}_I$ denotes the eigengene of the I-th consensus module in data set s. High values of $\mathrm{Preserv}^{(1,2)}_{IJ}$ indicate strong correlation preservation between eigengenes I and J across the two networks. The scaled connectivity $C_I(\mathrm{Preserv}^{(1,2)})$ is given by
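$$C_I(\mathrm{Preserv}^{(1,2)}) = 1 - \frac{\sum_{J \neq I} \left| \mathrm{cor}(E^{(1)}_I, E^{(1)}_J) - \mathrm{cor}(E^{(2)}_I, E^{(2)}_J) \right|}{2(N - 1)}, \qquad (5)$$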
and is close to 1 if the correlations between the I-th eigengene and the other eigengenes are preserved across the two networks. The density $D(\mathrm{Preserv}^{(1,2)})$ is given by
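$$D(\mathrm{Preserv}^{(1,2)}) = 1 - \frac{\sum_{I} \sum_{J \neq I} \left| \mathrm{cor}(E^{(1)}_I, E^{(1)}_J) - \mathrm{cor}(E^{(2)}_I, E^{(2)}_J) \right|}{2N(N - 1)}. \qquad (6)$$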
Larger values of $D(\mathrm{Preserv}^{(1,2)})$ indicate stronger correlation preservation between all pairs of eigengenes across the two networks. The measures in Eqs. (5) and (6) are intuitive, descriptive measures for assessing the extent of preservation between networks. To arrive at a statistical significance level (p-value), one can use a permutation test (described in Methods). Many statistical tests have been proposed to test for differences between correlations, e.g., [14–16].
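The preservation network and its summary measures can be sketched as follows. The sketch assumes that the preservation adjacency is one minus the absolute difference of the two signed eigengene adjacencies (equivalently, one minus half the absolute difference of the eigengene correlations); the function names and the two toy correlation matrices are hypothetical.

```python
import numpy as np

def signed_adjacency(cor):
    """Signed eigengene adjacency (1 + cor) / 2."""
    return (1.0 + cor) / 2.0

def preservation_network(adj1, adj2):
    """Assumed form: 1 - |a1 - a2|, which equals 1 - |cor1 - cor2| / 2."""
    return 1.0 - np.abs(adj1 - adj2)

def scaled_connectivity(adjacency):
    n = adjacency.shape[0]
    return (adjacency.sum(axis=1) - np.diag(adjacency)) / (n - 1)

def density(adjacency):
    return scaled_connectivity(adjacency).mean()

# Hypothetical eigengene correlation matrices in two data sets.
cor1 = np.array([[1.0, 0.8, -0.2],
                 [0.8, 1.0, 0.1],
                 [-0.2, 0.1, 1.0]])
cor2 = np.array([[1.0, 0.6, -0.2],
                 [0.6, 1.0, 0.5],
                 [-0.2, 0.5, 1.0]])

preserv = preservation_network(signed_adjacency(cor1), signed_adjacency(cor2))
preserv_connectivity = scaled_connectivity(preserv)  # [0.95, 0.85, 0.90]
preserv_density = density(preserv)                   # 0.90
print(preserv_connectivity, preserv_density)
```

Here the pair whose correlation is identical in both data sets receives a preservation adjacency of 1, while the pair whose correlation changes from 0.1 to 0.5 receives 0.8.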
Application 1: Differential eigengene network analysis of human and chimpanzee brain expression data
Here we report results of our differential eigengene network analysis of human and chimpanzee microarray brain data. The microarray data were originally published in [17]. A gene co-expression analysis of these data is reported in [11]. To facilitate a comparison with the original marginal module analysis, we used the genes selected by that work. The data, R code, and more details of this analysis can be found in Additional File 1 and on our webpage.
To find consensus modules, we used the consensus dissimilarity measure (Eq. 22) and average linkage hierarchical clustering. Genes of a given consensus module were assigned the same color, while unassigned genes were labeled grey. We found 7 consensus modules, shown in Fig. 2A: black (41 genes), blue (40 genes), brown (294 genes), pink (41 genes), red (78 genes), turquoise (884 genes), and yellow (151 genes). The functional enrichment analysis of these consensus modules is described below. For each data set, we represented the consensus modules by their corresponding module eigengenes and constructed an eigengene network between them (Eq. 1).
The differential eigengene network analysis yields two main novel findings that could not have been obtained using a standard marginal method. First, we find that the relationships between the module eigengenes are highly preserved. Figs. 2E and 2H show the eigengene networks $A_{\mathrm{Eigen,human}}$ and $A_{\mathrm{Eigen,chimp}}$, respectively. It is clear that the human and chimp eigengene networks of consensus modules are highly preserved. As described in Eq. (4), we defined a preservation network $\mathrm{Preserv}^{\mathrm{human,chimp}} = \mathrm{Preserv}(A_{\mathrm{Eigen,human}}, A_{\mathrm{Eigen,chimp}})$ between the 7 consensus eigengenes.
For each individual eigengene, we find that its relationships with the other eigengenes are highly preserved, as reflected by a high connectivity in the preservation network (Eq. 5): $C_{\mathrm{red}}(\mathrm{Preserv}^{\mathrm{human,chimp}}) = 0.94$, $C_{\mathrm{black}} = 0.95$, $C_{\mathrm{yellow}} = 0.92$, $C_{\mathrm{turquoise}} = 0.95$, $C_{\mathrm{pink}} = 0.91$, $C_{\mathrm{blue}} = 0.91$, $C_{\mathrm{brown}} = 0.94$. We find a high overall preservation (Eq. 6) between the two networks, as reflected by a high density of the preservation network, $D(\mathrm{Preserv}^{\mathrm{human,chimp}}) = 0.93$. Figs. 2F,G summarize our findings about the relationships of the consensus modules.
The second novel finding is that the consensus eigengenes in the human data set fall into three branches (meta-modules), see Fig. 2C. The first meta-module consists of the red, black, and yellow eigengenes; the second meta-module contains the turquoise eigengene; and the third meta-module contains the pink, blue and brown eigengenes. Remarkably, these 3 meta-modules can also be detected in the chimp data, see Fig. 2D. While the definition of consensus modules trivially implies that they are preserved between the two data sets, it is a non-trivial result that in this application the meta-modules are preserved as well.
To understand the biological meaning of the consensus modules, we studied differential expression of the consensus module eigengenes across the brain areas from which the microarray samples were taken. The results are summarized in Fig. 2, which shows the t-test p-values of differential expression of the module eigengenes in the various brain regions. Clearly, eigengenes can be characterized by their differential expression patterns in different brain regions. Furthermore, this analysis allows a biologically meaningful characterization of the meta-modules. The first meta-module (comprised of the black, yellow, and red module eigengenes) represents 270 genes that tend to be differentially expressed in the caudate nucleus. The second meta-module (comprised only of the turquoise eigengene) represents 884 genes that tend to be differentially expressed in cerebellum. The third meta-module (comprised of the pink, blue, and brown module eigengenes) represents 375 genes that are differentially expressed in the cortical samples. Thus, the meta-modules of this application correspond to biologically meaningful super-sets of modules and genes.
Given the strong relationships between modules in each meta-module, it is natural to ask whether the consensus modules are truly distinct. For example, the black and red modules show very similar levels of differential expression, see Fig. 2B. In this case, gene ontology information suggests that the two modules are indeed distinct. The black module is enriched with white matter related genes while no such enrichment can be found for the red module [11]. Likewise, gene ontology suggests that the yellow and black modules are distinct even though their module eigengenes are correlated.
In summary, the eigengene network analysis reveals a higher order organization of the consensus modules in the transcriptome.
Comparing our findings to a standard marginal module analysis
A standard approach for comparing the modules between several networks is to identify modules in a 'reference' network and to study the preservation of the module assignment in the other networks [7]. In the original analysis, Oldham et al. chose the human gene co-expression network as the reference network since both preservation and non-preservation of human modules were of interest. This marginal module analysis is appropriate when the modules of one data set are the focus of the analysis, but it is not designed to identify the consensus modules that form the focus of our article. To compare differential eigengene network analysis to the standard marginal module method, we compared our consensus modules to the 7 human modules found in [11]. We used a pairwise Fisher exact test to determine whether there is significant overlap between the consensus and the human modules. The results are summarized in Additional File 2. Overall, we find good agreement between consensus modules and human-specific modules, which reflects the fact that most human modules are preserved in chimpanzees. Most of the human modules can be assigned to a consensus module and vice versa, except for the human blue (360 genes) and green (126 genes) modules, which mostly disappeared from the consensus. Interestingly, small remnants (24 and 12 genes, respectively) of the two modules form the majority of the only consensus module (labeled pink, 41 genes) that does not have a clear human counterpart. Another small remnant (33 genes) of the human blue module forms most of the consensus blue module (40 genes).
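The module overlap test can be sketched with the Python standard library alone. The sketch computes a one-sided Fisher exact (hypergeometric) p-value for the overlap of two gene sets drawn from a common gene universe; whether the original analysis used a one-sided or two-sided test is not stated here, so the one-sided form is an assumption. The function name and all counts are hypothetical.

```python
from math import comb

def overlap_pvalue(n_universe, n_a, n_b, n_overlap):
    """One-sided Fisher exact p-value: P(overlap >= n_overlap) under independence.

    n_universe: total number of genes; n_a, n_b: sizes of the two modules.
    """
    total = comb(n_universe, n_b)
    tail = sum(comb(n_a, k) * comb(n_universe - n_a, n_b - k)
               for k in range(n_overlap, min(n_a, n_b) + 1))
    return tail / total

# Hypothetical counts: 1000 genes, modules of 50 and 40 genes sharing 30 genes.
p_strong = overlap_pvalue(1000, 50, 40, 30)
# Requiring at least zero shared genes is always satisfied, so the p-value is 1.
p_trivial = overlap_pvalue(1000, 50, 40, 0)
print(p_strong, p_trivial)
```

Applying such a test to every pair of consensus and reference modules yields a table of p-values from which module correspondences can be read off.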
The green and blue human modules were found to represent mostly cortical samples (and cerebellum for the green module) and were the least preserved in chimpanzees [11]. This is congruent with our finding of their lack of conservation using the consensus module method. One possible explanation for the absence of these modules in chimpanzees is that they largely reflect gene expression in the cerebral cortex, a brain region that has expanded dramatically in the human lineage. The standard marginal differential network analysis also identified several genes – LDOC1, EYA1, LECT1, PGAM2 – whose connectivities (Eq. 8) were significantly lower in the chimp network. None of these genes are present in our consensus modules, providing additional evidence of the method's agreement with the results of [11].
By definition, the consensus module detection is designed to find modules that are shared between data sets. Obviously, there will be many applications where data set specific modules are of interest. In such applications a standard marginal module detection analysis will be preferable.
Application 2: Differential eigengene network analysis of four mouse tissues
We analyzed gene expression data obtained from female mice of an F2 mouse intercross [18]. The microarray data measured gene expression levels in four different mouse tissues: liver, brain, adipose and muscle. More details concerning the data are presented in Additional File 3 and on our webpage. The consensus dissimilarity (Methods, Eq. (22)) was used as input to average linkage hierarchical clustering. In the resulting dendrogram, consensus modules were identified by the Dynamic Tree Cut branch cutting method [19]. We found 11 consensus modules (Fig. 3A): black (50 genes), blue (149 genes), brown (125 genes), green (59 genes), green-yellow (25 genes), magenta (36 genes), pink (44 genes), purple (27 genes), red (55 genes), turquoise (162 genes) and yellow (87 genes). Functional enrichment analysis of these modules is presented below.
Figures 3F,K,P, and 3U show the eigengene networks $A_{\mathrm{Eigen,brain}}$, $A_{\mathrm{Eigen,muscle}}$, $A_{\mathrm{Eigen,liver}}$, and $A_{\mathrm{Eigen,adipose}}$, respectively. To assess the preservation of consensus modules across pairs of tissues, we defined preservation networks (Eq. 15), e.g., $\mathrm{Preserv}^{\mathrm{muscle,adipose}} = \mathrm{Preserv}(A_{\mathrm{Eigen,muscle}}, A_{\mathrm{Eigen,adipose}})$. We find the following overall preservation values between the eigengene networks: $D(\mathrm{Preserv}^{\mathrm{brain,muscle}}) = 0.93$, $D^{\mathrm{brain,liver}} = 0.88$, $D^{\mathrm{brain,adipose}} = 0.85$, $D^{\mathrm{muscle,liver}} = 0.88$, $D^{\mathrm{muscle,adipose}} = 0.85$, $D^{\mathrm{liver,adipose}} = 0.87$. Hence, at the level of tissues, we observe good preservation between the consensus eigengene networks, with the highest preservation between the brain and muscle tissues. Interestingly, these two data sets also show the strongest relationships between the eigengenes in each data set (strongest red and green patterns in the heatmap plots). This can be measured by the density of the absolute values of the module eigengene (ME) correlations, $D_{\mathrm{cor}} \equiv D(|\mathrm{cor}(E_I, E_J)|)$. For the muscle and brain networks we find $D_{\mathrm{cor,muscle}} = 0.45$ and $D_{\mathrm{cor,brain}} = 0.45$. The eigengenes in liver show, as a data set, relationships somewhat similar to those of brain and muscle, though the patterns in the heatmap plot are not as strong, $D_{\mathrm{cor,liver}} = 0.37$. The adipose tissue shows the weakest relationships between the module eigengenes, $D_{\mathrm{cor,adipose}} = 0.31$. The eigengene preservations, e.g., $C_{\mathrm{red}}(\mathrm{Preserv}^{\mathrm{muscle,adipose}})$, can be found in Fig. 3, in the upper triangle of the matrix of plots F-U.
As an aside, we mention that pairwise network preservation measures are directly comparable only when the compared preservation networks involve the same set of consensus eigengenes, as is the case in this four-tissue application.
We find that the eigengene networks contain meta-modules, i.e., groups of highly correlated eigengenes (Figs. 3B–E). As an example, we focus on the meta-modules in the brain eigengene network. As can be seen from Fig. 3, the consensus eigengenes in brain tissue form 3 meta-modules that are partially preserved in the other tissues. Specifically, the first brain meta-module consists of the black, blue, magenta, and red consensus eigengenes. It is highly preserved in muscle and adipose but less so in liver. The second brain meta-module consists of the green-yellow, pink and yellow consensus eigengenes. This meta-module is highly preserved in muscle and liver but less so in adipose. The third brain meta-module consists of the turquoise, green and purple eigengenes. It is highly preserved in liver and adipose but less so in muscle. These results show that meta-modules may or may not be preserved across the different eigengene networks.
To understand the biological meaning of the consensus modules, we used functional enrichment analysis using gene ontology information [20]. The detailed results, including alternative methods for adjusting for multiple comparisons, can be found in the functional enrichment table presented in Additional File 4. Overall, we find that most modules are significantly enriched with known gene ontologies. Specifically, the black module is highly enriched with ribosomal genes (Bonferroni-corrected Fisher's exact p-value $p = 8 \times 10^{-10}$); the blue module with immune/stimulus/defense response ($p < 3 \times 10^{-17}$ for each of the three terms); brown with translation regulator activity ($p = 4 \times 10^{-3}$) and nucleotide binding ($p = 5 \times 10^{-3}$); magenta with stimulus/defense response ($p < 2 \times 10^{-6}$) and signal pathways ($p < 2 \times 10^{-3}$); red with cell cycle ($p = 1.4 \times 10^{-19}$) as well as nucleotide/ATP binding ($p < 10^{-8}$); turquoise with protein binding ($p = 6 \times 10^{-3}$); yellow with carbohydrate metabolism ($p = 3 \times 10^{-4}$); pink and green-yellow with protein localization ($p = 0.003$ and $p = 0.004$); and green with alternative splicing/intracellular organelles ($p = 4 \times 10^{-4}$).
Our method detected two protein transport and localization modules (pink and green-yellow) and one may ask whether these modules are truly distinct. The two modules are closely related in 3 of the 4 data sets, but in the adipose tissue they have a weak (and negative) correlation of -0.24. Hence, from the consensus point of view, they are two distinct modules. Further, note that the green and black modules are very close on the consensus dendrogram, and their module eigengene (ME) correlation is high in absolute value but negative. The functional enrichment analysis suggests that the modules are different, although some terms are related (ribosomes for the black module and intracellular organelle for the green); this is an indication that the sign of the correlation of eigengenes is biologically meaningful.
While a standard marginal module analysis would succeed in studying preservation of individual data set modules, the consensus eigengene module analysis allows us to find shared modules and to study higher-order relationships between the consensus modules. Meta-modules in the brain tissues indicate the following relationships: the first (black, blue, magenta, red) suggests a relationship among ribosomal, immune/defense/stimulus response and cell cycle pathways; the second (green-yellow, pink, yellow) between protein localization and carbohydrate metabolism; the third (turquoise, green, purple) among protein binding and alternative splicing/intracellular organelle pathways.
The data also include clinical trait information on the mice (e.g., cholesterol and insulin levels, body weight, etc.), and one can ask whether some of the consensus modules (or more precisely, their eigengenes) relate significantly to any of the traits. We find no significant correlation between consensus module eigengenes and the traits. In application 3, we report significant relationships between consensus modules and clinical traits.
Permutation test of consensus module membership
We used the data from the brain and muscle tissues to perform a permutation test (described in Methods) of consensus module detection. We defined the combined number of genes assigned to consensus modules as the test statistic. This test statistic was highly significant (p ≤ 0.001), which shows that the number of genes assigned to consensus modules is much larger than expected by chance. However, this result depends on the level of stringency for defining consensus modules. Fig. 4 shows that as the height cutoff for the detection of branches in the consensus dendrogram increases, the probability of finding spurious consensus modules (and genes therein) increases; for excessively high branch cutoff levels, the probability of finding as many genes in permuted data sets as in the unpermuted data becomes unacceptably high.
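The way an empirical p-value is obtained from such a permutation test can be sketched as follows. The sketch does not reproduce the permutation scheme or the module detection described in Methods; it only shows the final step, assuming that the test statistic (number of genes assigned to consensus modules) has already been computed for the observed data and for each permuted data set. The add-one correction, the function name, and all numbers are assumptions made for illustration.

```python
import numpy as np

def permutation_pvalue(observed, permuted_stats):
    """Fraction of permutations with a statistic at least as large as observed.

    Uses the common add-one correction so that the p-value is never exactly 0.
    """
    permuted_stats = np.asarray(permuted_stats)
    n_extreme = int(np.sum(permuted_stats >= observed))
    return (1 + n_extreme) / (1 + permuted_stats.size)

# Hypothetical values: 500 genes in consensus modules in the real data,
# fewer than 100 in each of 999 permuted data sets.
rng = np.random.default_rng(2)
observed_genes = 500
permuted_genes = rng.integers(0, 100, size=999)

p_value = permutation_pvalue(observed_genes, permuted_genes)  # 1 / 1000
print(p_value)
```

With 999 permutations, the smallest attainable p-value under this convention is 0.001, so the number of permutations bounds the reported significance level.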
Application 3: Consensus modules across female and male mouse liver tissues
Here we apply the differential eigengene network analysis to liver expression data from female and male mice of the above-mentioned F2 mouse intercross. The consensus module detection method identified 13 consensus modules, shown in Fig. 5A: black (182 genes), blue (444 genes), brown (439 genes), green (207 genes), green-yellow (82 genes), magenta (105 genes), pink (168 genes), purple (83 genes), red (203 genes), salmon (58 genes), tan (67 genes), turquoise (605 genes), and yellow (302 genes). Overall, there is excellent preservation between the female and male eigengene networks, $D(\mathrm{Preserv}^{\mathrm{female,male}}) = 0.94$ (Figs. 5E,F). The module eigengene dendrograms in Figs. 5B,C as well as the eigengene network heatmaps in Figs. 5D,G indicate that the two data sets share three meta-modules. The first one contains the blue and turquoise modules (1049 genes), the second one contains the green, magenta and pink modules (480 genes), and the third one contains the black, brown, tan, green-yellow and red modules (466 genes).
The experimental data include clinical traits such as mouse body weight, cholesterol levels, etc. As detailed in Additional File 5, we selected 7 potentially interesting traits. Figs. 5H,I present the correlations and corresponding p-values for relating the clinical traits to the module eigengenes. We find that the turquoise module (605 genes) is highly significantly correlated with weight in both the female ($r = 0.5$, $p = 5 \times 10^{-8}$) and male samples ($r = 0.47$, $p = 3.1 \times 10^{-8}$). The green-yellow module (82 genes) relates to weight with correlations of comparable magnitude, $r = -0.44$ ($p = 8 \times 10^{-8}$) and $r = -0.50$ ($p = 4 \times 10^{-9}$) in females and males, respectively. The yellow module is significantly related to insulin levels in both the female and male data sets, $r = 0.38$ ($p = 5 \times 10^{-6}$) and $r = 0.35$ ($p = 7 \times 10^{-5}$), respectively. The correlations between the eigengenes of the consensus turquoise and green-yellow modules are -0.68 and -0.74 in the female and male samples, respectively; the module eigengenes are relatively close by absolute value of the correlation, but the negative sign suggests that they are distinct. This result is another motivation to use signed networks (Eq. 1) to describe the relationships between eigengenes.
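Relating module eigengenes to a clinical trait can be sketched with SciPy's `pearsonr`, which returns a correlation coefficient and a p-value. The use of the Pearson correlation is an assumption, since the type of correlation is not specified here; the sample size, the simulated trait, and the module names are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr

def trait_correlations(eigengenes, trait):
    """Correlation and p-value of each module eigengene with one trait."""
    results = {}
    for name, profile in eigengenes.items():
        r, p = pearsonr(profile, trait)
        results[name] = (float(r), float(p))
    return results

# Hypothetical data: 120 mice, one trait, two module eigengenes.
rng = np.random.default_rng(1)
n_mice = 120
weight = rng.normal(size=n_mice)
eigengenes = {
    "related": 0.5 * weight + rng.normal(scale=0.8, size=n_mice),
    "unrelated": rng.normal(size=n_mice),
}

results = trait_correlations(eigengenes, weight)
for name, (r, p) in results.items():
    print(f"{name}: r = {r:.2f}, p = {p:.2g}")
```

Running the same function separately on the female and male samples gives one table of correlations and p-values per data set, which can then be compared module by module.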
Given that the female and male networks appear similar but not the same, one may ask whether the consensus module analysis provides an indication of how they differ. For this purpose we compared the female liver module assignment as reported in [18] to our consensus module assignment, see Additional File 6. Using the same parameters for the clustering and branch detection, we found that two of the 12 modules (labeled by salmon and light-yellow color) in that work are not represented in the consensus modules. Investigating the function of these two modules is beyond the scope of this work.
Simulation studies of consensus modules
To assess the performance of the consensus module detection method, we performed a simulation study involving two simulated gene expression data sets. The two data sets contained both shared and non-shared modules. The actual simulation procedure is described in more detail in Additional File 7 and the R code can be found on our webpage.
Briefly, each simulated module is built around a chosen seed profile (referred to as the true module eigengene) by adding gene expression profiles with increasing amounts of noise. We studied the performance of consensus module detection under varying levels of added noise. The sensitivity and specificity are determined from the numbers of true and false positives ($n_{TP}$ and $n_{FP}$) and true and false negatives ($n_{TN}$ and $n_{FN}$) as Sensitivity = $n_{TP}/(n_{TP} + n_{FN})$, Specificity = $n_{TN}/(n_{TN} + n_{FP})$. To measure the fidelity of the calculated module eigengenes to the true module eigengenes, we report the proportion $P_{0.95}$ of the detected modules whose eigengene has a correlation greater than 0.95 with the true module eigengene, i.e., Fidelity = $P_{0.95}$. Results of the simulation are summarized in Table 1. We found that when noise is low and modules are very clearly defined, the sensitivity, specificity, and fidelity are 100%. It is worth noting that for low and moderate noise levels, the fidelity does not vary substantially with changes in the branch cut height, indicating that module eigengenes are robust to inclusion/exclusion of moderate numbers of genes in the module. As the noise increases, sensitivity, specificity, and fidelity decrease. We note that the specificity and sensitivity depend on the choice of cutting parameters for the cluster trees. We have not performed an exhaustive search to identify parameter values that would give optimal performance. Our default settings perform well across a range of different simulation models.
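The three performance measures can be sketched in Python as follows. The sketch assumes that module membership is scored gene by gene as a binary label (in a module or not) and that detected eigengenes have already been matched to their true counterparts; how genes and modules were matched in the actual simulation is described in Additional File 7 and is not reproduced here. The function names and toy inputs are hypothetical.

```python
import numpy as np

def sensitivity_specificity(true_in_module, detected_in_module):
    """Sensitivity = TP / (TP + FN), Specificity = TN / (TN + FP)."""
    truth = np.asarray(true_in_module, dtype=bool)
    detected = np.asarray(detected_in_module, dtype=bool)
    n_tp = np.sum(truth & detected)
    n_fn = np.sum(truth & ~detected)
    n_tn = np.sum(~truth & ~detected)
    n_fp = np.sum(~truth & detected)
    return n_tp / (n_tp + n_fn), n_tn / (n_tn + n_fp)

def fidelity(detected_eigengenes, true_eigengenes, threshold=0.95):
    """Proportion of modules whose eigengene correlates above the threshold."""
    hits = [np.corrcoef(d, t)[0, 1] > threshold
            for d, t in zip(detected_eigengenes, true_eigengenes)]
    return float(np.mean(hits))

# Hypothetical gene-level labels: 4 true module genes out of 10.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
detected = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(truth, detected)  # 0.75 and 5/6

# Hypothetical eigengenes: the first is recovered well, the second is not.
true_me = [[1, 2, 3, 4, 5], [1, -1, 1, -1, 1]]
detected_me = [[1.1, 2.0, 2.9, 4.2, 5.0], [1, 1, -1, -1, 1]]
fid = fidelity(detected_me, true_me)                   # 0.5
print(sens, spec, fid)
```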