Eigengene networks for studying the relationships between co-expression modules
© Langfelder and Horvath; licensee BioMed Central Ltd. 2007
Received: 08 May 2007
Accepted: 21 November 2007
Published: 21 November 2007
There is evidence that genes and their protein products are organized into functional modules according to cellular processes and pathways. Gene co-expression networks have been used to describe the relationships between gene transcripts. Ample literature exists on how to detect biologically meaningful modules in networks but there is a need for methods that allow one to study the relationships between modules.
We show that network methods can also be used to describe the relationships between co-expression modules and present the following methodology. First, we describe several methods for detecting modules that are shared by two or more networks (referred to as consensus modules). We represent the gene expression profiles of each module by an eigengene. Second, we propose a method for constructing an eigengene network, where the edges are undirected but maintain information on the sign of the co-expression information. Third, we propose methods for differential eigengene network analysis that allow one to assess the preservation of network properties across different data sets. We illustrate the value of eigengene networks in studying the relationships between consensus modules in human and chimpanzee brains; the relationships between consensus modules in brain, muscle, liver, and adipose mouse tissues; and the relationships between male-female mouse consensus modules and clinical traits. In some applications, we find that module eigengenes can be organized into higher level clusters which we refer to as meta-modules.
Eigengene networks can be effective and biologically meaningful tools for studying the relationships between modules of a gene co-expression network. The proposed methods may reveal a higher order organization of the transcriptome. R software tutorials, the data, and supplementary material can be found at the following webpage: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/EigengeneNetwork.
Gene co-expression networks constructed from gene expression microarray data capture the relationships between transcripts [1–7]. From the point of view of individual genes ('from below'), modules are groups of highly interconnected genes that may form a biological pathway. From the point of view of systems biology ('from above'), functional modules bridge the gap between individual genes and emergent global properties [8–10]. Here we view modules as basic system components (i.e., nodes of a network) and describe their relationships using network language. We find that co-expression modules may form a biologically meaningful meta-network that reveals a higher-order organization of the transcriptome. We refer to modules in a meta-network of modules as meta-modules.
Our analysis can be viewed as a network reduction scheme that reduces a gene co-expression network involving thousands of genes to a meta-network that is orders of magnitude smaller and involves module representatives (one eigengene per module). We refer to the resulting network as an eigengene network. Using eigengene networks, we will show that the information captured by co-expression modules is far richer than a catalogue of module membership.
As a motivating example, consider the comparison between gene co-expression networks in human and chimpanzee brains. Using gene expression microarray data corresponding to different brain regions, Oldham et al. found relatively large modules that are preserved between human and chimpanzee brains. Only one human brain module (corresponding to genes expressed in the cortex) was not preserved in chimpanzee brains. The original analysis focused on human modules and assessed their preservation in a corresponding chimpanzee co-expression network. We refer to such an analysis as a standard marginal module analysis since it simply determines whether a set of modules can be found in another network. Here we pursue a more comprehensive analysis that not only quantifies module preservation but also determines the preservation of inter-modular relationships. We refer to modules that are preserved among data sets as consensus modules. In our applications, we show that two consensus modules may be highly related to each other in one data set but unrelated in another. Inter-modular relationships are biologically interesting because changes in pathway dependencies may reflect biological perturbations.
In this work we present methods a) for finding consensus modules across multiple networks, b) for describing the relationship between consensus modules (eigengene networks), and c) for assessing whether the relationship between consensus modules is preserved across different networks (differential eigengene network analysis).
where $N$ denotes the number of module eigengenes. Note that the scaled connectivity $C_I(A_{\mathrm{Eigen}})$ is close to 1 if the $I$-th eigengene has a high positive correlation with most other eigengenes.
The density $D(A_{\mathrm{Eigen}})$ is close to 1 if most eigengenes have high positive correlations with each other.
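The two summaries above can be illustrated with a minimal Python sketch. It assumes the signed eigengene adjacency $a_{IJ} = (1 + \mathrm{cor}(E_I, E_J))/2$ (the form quoted for traits in the Discussion), a scaled connectivity equal to the mean adjacency of an eigengene with the other $N - 1$ eigengenes, and a density equal to the mean off-diagonal adjacency. The function names and the correlation values are hypothetical and chosen only for illustration.

```python
def scaled_connectivity(adj):
    """Mean adjacency of each node with the other N - 1 nodes (assumed definition)."""
    n = len(adj)
    return [sum(adj[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]


def density(adj):
    """Mean off-diagonal adjacency of the network (assumed definition)."""
    n = len(adj)
    total = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


# Hypothetical correlations between three module eigengenes.
cor = [
    [1.0, 0.8, -0.6],
    [0.8, 1.0, -0.4],
    [-0.6, -0.4, 1.0],
]
# Signed adjacency: maps correlations of -1, 0, 1 to 0, 0.5, 1.
adj = [[(1 + r) / 2 for r in row] for row in cor]

connectivity = scaled_connectivity(adj)
network_density = density(adj)
```

In this toy network the third eigengene is negatively correlated with the other two, so its scaled connectivity is the lowest of the three.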
Meta-modules in a single eigengene network
Since eigengenes form a network, one can use a module detection procedure to identify modules comprised of eigengenes. We refer to modules in an eigengene network as meta-modules. Meta-modules may reveal a higher order organization among gene co-expression modules. We use average linkage hierarchical clustering to define meta-modules as branches of the resulting cluster tree (Methods, Eq. 21). The resulting meta-modules are sets of positively correlated eigengenes.
Differential eigengene network analysis
Larger values of D(Preserv(1,2)) indicate stronger correlation preservation between all pairs of eigengenes across the two networks. Measures (5, 6) are intuitive, descriptive measures for assessing the extent of preservation between networks. To arrive at a statistical significance level (p-value), one can use a permutation test (described in Methods). Many statistical tests have been proposed to test for differences between correlations, e.g., [14–16].
Application 1: Differential eigengene network analysis of human and chimpanzee brain expression data
Here we report results of our differential eigengene network analysis of human and chimpanzee microarray brain data. The microarray data were originally published in . A gene co-expression analysis of these data is reported in . To facilitate a comparison with the original marginal module analysis, we used the genes selected by that work. The data, R code, and more details of this analysis can be found in Additional File 1 and on our webpage.
To find consensus modules, we used the consensus dissimilarity measure (Eq. 22) and average linkage hierarchical clustering. Genes of a given consensus module were assigned the same color, while unassigned genes were labeled grey. We found 7 consensus modules, shown in Fig. 2A: black (41 genes), blue (40 genes), brown (294 genes), pink (41 genes), red (78 genes), turquoise (884 genes), and yellow (151 genes). The functional enrichment analysis of these consensus modules is described below. For each data set, we represented the consensus modules by their corresponding module eigengenes and constructed an eigengene network between them (Eq. 1).
The differential eigengene network analysis yields two main novel findings that could not have been obtained using a standard marginal method. First, we find that the relationships between the module eigengenes are highly preserved. Figs. 2E and 2H show the eigengene networks $A_{\mathrm{Eigen,human}}$ and $A_{\mathrm{Eigen,chimp}}$, respectively. It is clear that the human and chimp eigengene networks of consensus modules are highly preserved. As described in Eq. (4), we defined a preservation network $\mathrm{Preserv}_{\mathrm{human,chimp}} = \mathrm{Preserv}(A_{\mathrm{Eigen,human}}, A_{\mathrm{Eigen,chimp}})$ between the 7 consensus eigengenes.
For each individual eigengene, we find that its relationships with the other eigengenes are highly preserved, as reflected by a high connectivity in the preservation network (Eq. 5): $C_{\mathrm{red}}(\mathrm{Preserv}_{\mathrm{human,chimp}}) = 0.94$, $C_{\mathrm{black}} = 0.95$, $C_{\mathrm{yellow}} = 0.92$, $C_{\mathrm{turquoise}} = 0.95$, $C_{\mathrm{pink}} = 0.91$, $C_{\mathrm{blue}} = 0.91$, $C_{\mathrm{brown}} = 0.94$. We find a high overall preservation (Eq. 6) between the two networks, as reflected by a high density of the preservation network, $D(\mathrm{Preserv}_{\mathrm{human,chimp}}) = 0.93$. Figs. 2F,G summarize our findings about the relationships of the consensus modules.
The second novel finding is that the consensus eigengenes in the human data set fall into three branches (meta-modules), see Fig. 2C. The first meta-module consists of the red, black, and yellow eigengenes; the second meta-module contains the turquoise eigengene; and the third meta-module contains the pink, blue and brown eigengenes. Remarkably, these 3 meta-modules can also be detected in the chimp data, see Fig. 2D. While the definition of consensus modules trivially implies that they are preserved between the two data sets, it is a non-trivial result that in this application the meta-modules are preserved as well.
To understand the biological meaning of the consensus modules, we studied differential expression of the consensus module eigengenes across the brain areas from which the microarray samples were taken. The results are summarized in Fig. 2 which shows the t-test p-values of differential expression of module eigengenes in the various brain regions from which samples were taken. Clearly, eigengenes can be characterized by their differential expression patterns in different brain regions. Furthermore, this analysis allows a biologically meaningful characterization of the meta-modules. The first meta-module (comprised of the black, yellow, and red module eigengenes) represents 270 genes that tend to be differentially expressed in the caudate nucleus. The second meta-module (comprised only of the turquoise eigengene) represents 884 genes that tend to be differentially expressed in cerebellum. The third meta-module (comprised of the pink, blue, and brown module eigengenes) represents 375 genes that are differentially expressed in the cortical samples. Thus, the meta-modules of this application correspond to biologically meaningful super-sets of modules and genes.
Given the strong relationships between modules in each meta-module, it is natural to ask whether the consensus modules are truly distinct. For example, the black and red modules show very similar levels of differential expression, see Fig. 2B. In this case, gene ontology information suggests that the two modules are indeed distinct. The black module is enriched with white matter related genes while no such enrichment can be found for the red module . Likewise, gene ontology suggests that the yellow and black modules are distinct even though their module eigengenes are correlated.
In summary, the eigengene network analysis reveals a higher order organization of the consensus modules in the transcriptome.
Comparing our findings to a standard marginal module analysis
A standard approach for comparing the modules between several networks is to identify modules in a 'reference' network and to study the preservation of the module assignment in the other networks . In the original analysis, Oldham et al. chose the human gene co-expression network as the reference network since both preservation and non-preservation of human modules were of interest. This marginal module analysis is appropriate when the modules of one data set are the focus of the analysis, but it is not designed to identify consensus modules, which form the focus of our article. To compare differential eigengene network analysis to the standard marginal module method, we compared our consensus modules to the 7 human modules found in . We used a pairwise Fisher exact test to determine whether there is significant overlap between the consensus and the human modules. The results are summarized in Additional File 2. Overall, we find good agreement between consensus modules and human-specific modules, which reflects the fact that most human modules are preserved in chimpanzees. Most of the human modules can be assigned to a consensus module and vice versa, except for the human blue (360 genes) and green (126 genes) modules, which mostly disappeared from the consensus. Interestingly, small remnants (24 and 12 genes, respectively) of the two modules form the majority of the only consensus module (labeled pink, 41 genes) that does not have a clear human counterpart. Another small remnant (33 genes) of the human blue module forms most of the consensus blue module (40 genes).
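The pairwise overlap test mentioned above can be sketched in Python as follows. This is an illustration only: it assumes a one-sided (enrichment) Fisher exact test, computed as the upper tail of the hypergeometric distribution, whereas the text does not state which alternative was used. The function name `overlap_pvalue` and the gene counts in the example are hypothetical.

```python
from math import comb


def overlap_pvalue(n_total, n_a, n_b, n_overlap):
    """One-sided Fisher exact p-value for the overlap of two gene sets.

    Returns the probability of observing at least n_overlap shared genes when
    a set of n_b genes is drawn at random from n_total genes, n_a of which
    belong to the other set (hypergeometric upper tail).
    """
    max_overlap = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / comb(n_total, n_b)


# Hypothetical example: 20 genes in total, two modules of 5 genes each
# that share all 5 of their genes.
p_full_overlap = overlap_pvalue(20, 5, 5, 5)
# Requiring an overlap of at least 0 genes is always satisfied.
p_no_requirement = overlap_pvalue(20, 5, 5, 0)
```

When many module pairs are tested, the resulting p-values would need a multiple-comparison adjustment.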
The green and blue human modules were found to represent mostly cortical samples (and cerebellum for the green module) and were the least preserved in chimpanzees . This is congruent with our finding of their lack of conservation using the consensus module method. One possible explanation for the absence of these modules in chimpanzees is that they largely reflect gene expression in the cerebral cortex, a brain region that has expanded dramatically in the human lineage. The standard marginal differential network analysis also identified several genes – LDOC1, EYA1, LECT1, PGAM2 – whose connectivities (Eq. 8) were significantly lower in the chimp network. None of these genes are present in our consensus modules, providing additional evidence of the method's agreement with the results of .
By definition, the consensus module detection is designed to find modules that are shared between data sets. Obviously, there will be many applications where data set specific modules are of interest. In such applications a standard marginal module detection analysis will be preferable.
Application 2: Differential eigengene network analysis of four mouse tissues
Figures 3F, 3K, 3P, and 3U show the eigengene networks $A_{\mathrm{Eigen,brain}}$, $A_{\mathrm{Eigen,muscle}}$, $A_{\mathrm{Eigen,liver}}$, and $A_{\mathrm{Eigen,adipose}}$, respectively. To assess the preservation of consensus modules across pairs of tissues, we defined preservation networks (Eq. 15), e.g., $\mathrm{Preserv}_{\mathrm{muscle,adipose}} = \mathrm{Preserv}(A_{\mathrm{Eigen,muscle}}, A_{\mathrm{Eigen,adipose}})$. We find the following overall preservation values between the eigengene networks: $D(\mathrm{Preserv}_{\mathrm{brain,muscle}}) = 0.93$, $D_{\mathrm{brain,liver}} = 0.88$, $D_{\mathrm{brain,adipose}} = 0.85$, $D_{\mathrm{muscle,liver}} = 0.88$, $D_{\mathrm{muscle,adipose}} = 0.85$, $D_{\mathrm{liver,adipose}} = 0.87$. Hence, at the level of tissues, we observe good preservation between the consensus eigengene networks, with the highest preservation between the brain and muscle tissues. Interestingly, these two data sets also show the strongest relationships between the eigengenes in each data set (strongest red and green patterns in the heatmap plots). This can be measured by the density of the absolute values of ME correlations, $D_{\mathrm{cor}} \equiv D(|\mathrm{cor}(E_I, E_J)|)$. For the muscle and brain networks we find $D_{\mathrm{cor,muscle}} = 0.45$ and $D_{\mathrm{cor,brain}} = 0.45$. The eigengenes in liver show, as a data set, relationships somewhat similar to those of brain and muscle, though the patterns in the heatmap plot are not as strong, $D_{\mathrm{cor,liver}} = 0.37$. The adipose tissue shows the weakest relationships between the module eigengenes, $D_{\mathrm{cor,adipose}} = 0.31$. The eigengene preservations, e.g., $C_{\mathrm{red}}(\mathrm{Preserv}_{\mathrm{muscle,adipose}})$, can be found in Fig. 3, in the upper triangle of the matrix of plots F-U.
As an aside, we mention that pairwise network preservation measures are directly comparable only when the compared preservation networks involve the same set of consensus eigengenes, as is the case in this four-tissue application.
We find that the eigengene networks contain meta-modules, i.e., groups of highly correlated eigengenes (Figs. 3B–E). As an example, we focus on the meta-modules in the brain eigengene network. As can be seen from Fig. 3, the consensus eigengenes in brain tissue form 3 meta-modules that are partially preserved in the other tissues. Specifically, the first brain meta-module consists of the black, blue, magenta, and red consensus eigengenes. It is highly preserved in muscle and adipose but less so in liver. The second brain meta-module consists of the green-yellow, pink and yellow consensus eigengenes. This meta-module is highly preserved in muscle and liver but less so in adipose. The third brain meta-module consists of the turquoise, green and purple eigengenes. It is highly preserved in liver and adipose but less so in muscle. These results show that meta-modules may or may not be preserved across the different eigengene networks.
To understand the biological meaning of the consensus modules, we used functional enrichment analysis using gene ontology information . The detailed results including alternative methods for adjusting for multiple comparisons can be found in the functional enrichment table presented in Additional File 4. Overall, we find that most modules are significantly enriched with known gene ontologies. Specifically, the black module is highly enriched with ribosomal genes (Bonferroni-corrected Fisher's exact p-value $p = 8 \times 10^{-10}$); the blue module with immune/stimulus/defense response ($p < 3 \times 10^{-17}$ for each of the three terms); brown with translation regulator activity ($p = 4 \times 10^{-3}$) and nucleotide binding ($p = 5 \times 10^{-3}$); magenta with stimulus/defense response ($p < 2 \times 10^{-6}$) and signal pathways ($p < 2 \times 10^{-3}$); red with cell cycle ($p = 1.4 \times 10^{-19}$) as well as nucleotide/ATP binding ($p < 10^{-8}$); turquoise with protein binding ($p = 6 \times 10^{-3}$); yellow with carbohydrate metabolism ($p = 3 \times 10^{-4}$); pink and green-yellow with protein localization ($p = 0.003$ and $p = 0.004$); and green with alternative splicing/intracellular organelles ($p = 4 \times 10^{-4}$).
Our method detected two protein transport and localization modules (pink and green-yellow) and one may ask whether these modules are truly distinct. The two modules are closely related in 3 of the 4 data sets, but in the adipose tissue they have a weak (and negative) correlation of -0.24. Hence, from the consensus point of view, they are two distinct modules. Further, note that the green and black modules are very close on the consensus dendrogram, and their module eigengene (ME) correlation is high in absolute value but negative. The functional enrichment analysis suggests that the modules are different, although some terms are related (ribosomes for the black module and intracellular organelle for the green); this is an indication that the sign of the correlation of eigengenes is biologically meaningful.
While a standard marginal module analysis would succeed in studying preservation of individual data set modules, the consensus eigengene module analysis allows us to find shared modules and to study higher-order relationships between the consensus modules. Meta-modules in the brain tissues indicate the following relationships: the first (black, blue, magenta, red) suggests a relationship among ribosomal, immune/defense/stimulus response and cell cycle pathways; the second (green-yellow, pink, yellow) between protein localization and carbohydrate metabolism; the third (turquoise, green, purple) among protein binding and alternative splicing/intracellular organelle pathways.
The data also include clinical trait information on the mice (e.g., cholesterol and insulin levels, body weight, etc.), and one can ask whether some of the consensus modules (or more precisely, their eigengenes) relate significantly to any of the traits. We find no significant correlation between consensus module eigengenes and the traits. In application 3, we report significant relationships between consensus modules and clinical traits.
Permutation test of consensus module membership
Application 3: Consensus modules across female and male mouse liver tissues
The experimental data include clinical traits such as mouse body weight, cholesterol levels, etc. As detailed in Additional File 5, we selected 7 potentially interesting traits. Figs. 5H,I present the correlations and corresponding p-values for relating the clinical traits to the module eigengenes. We find that the turquoise module (605 genes) is highly significantly correlated with weight in both the female ($r = 0.5$, $p = 5 \times 10^{-8}$) and male samples ($r = 0.47$, $p = 3.1 \times 10^{-8}$). The greenyellow module (82 genes) relates to weight with correlations of comparable magnitude but opposite sign, $r = -0.44$ ($p = 8 \times 10^{-8}$) and $r = -0.50$ ($p = 4 \times 10^{-9}$) in females and males, respectively. The yellow module is significantly related to insulin levels in both the female and male data sets, $r = 0.38$ ($p = 5 \times 10^{-6}$) and $r = 0.35$ ($p = 7 \times 10^{-5}$), respectively. The correlations between the eigengenes of the consensus turquoise and greenyellow modules are -0.68 and -0.74 in the female and male samples, respectively; the module eigengenes are relatively close by absolute value of the correlation, but the negative sign suggests that they are distinct. This result is another motivation to use signed networks (Eq. 1) to describe the relationships between eigengenes.
Given that the female and male networks appear similar but not the same, one may ask whether the consensus module analysis provides an indication of how they differ. For this purpose we compared the female liver module assignment as reported in  to our consensus module assignment, see Additional File 6. Using the same parameters for the clustering and branch detection, we found that two of the 12 modules (labeled by salmon and light-yellow color) in that work are not represented in the consensus modules. Investigating the function of these two modules is beyond the scope of this work.
Simulation studies of consensus modules
To assess the performance of the consensus module detection method, we performed a simulation study involving two simulated gene expression data sets. The two data sets contained both shared and non-shared modules. The actual simulation procedure is described in more detail in Additional File 7 and the R code can be found on our webpage.
Briefly, each simulated module is built around a chosen seed profile (referred to as the true module eigengene) by adding gene expression profiles with increasing amounts of noise. We studied the performance of consensus module detection under varying levels of added noise. The sensitivity and specificity are determined from the numbers of true and false positives ($n_{TP}$ and $n_{FP}$) and true and false negatives ($n_{TN}$ and $n_{FN}$) as Sensitivity $= n_{TP}/(n_{TP} + n_{FN})$ and Specificity $= n_{TN}/(n_{TN} + n_{FP})$. To measure the fidelity of the calculated module eigengenes to the true module eigengenes, we report the proportion $P_{0.95}$ of the detected modules whose eigengene has a correlation greater than 0.95 with the true module eigengene, i.e., Fidelity $= P_{0.95}$. Results of the simulation are summarized in Table 1. We found that when noise is low and modules are very clearly defined, the sensitivity, specificity, and fidelity are 100%. It is worth noting that for low and moderate noise levels, the fidelity does not vary substantially with changes in the branch cut height, indicating that module eigengenes are robust to inclusion/exclusion of moderate numbers of genes in the module. As the noise increases, sensitivity, specificity, and fidelity decrease. We note that the specificity and sensitivity depend on the choice of cutting parameters for the cluster trees. We have not performed an exhaustive search to identify parameter values that would give optimal performance. Our default settings perform well across a range of different simulation models.
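The three performance measures defined above can be written out as a short Python sketch. The function names and the counts in the example are hypothetical and are not taken from Table 1.

```python
def sensitivity(n_tp, n_fn):
    """Sensitivity = n_TP / (n_TP + n_FN)."""
    return n_tp / (n_tp + n_fn)


def specificity(n_tn, n_fp):
    """Specificity = n_TN / (n_TN + n_FP)."""
    return n_tn / (n_tn + n_fp)


def fidelity(correlations, threshold=0.95):
    """Proportion of detected modules whose eigengene correlates with the
    true module eigengene above the threshold (P_0.95 by default)."""
    above = sum(1 for r in correlations if r > threshold)
    return above / len(correlations)


# Hypothetical counts and eigengene correlations for one simulated run.
run_sensitivity = sensitivity(n_tp=90, n_fn=10)
run_specificity = specificity(n_tn=80, n_fp=20)
run_fidelity = fidelity([0.99, 0.97, 0.90, 0.96])
```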
We propose the use of eigengene networks to study the relationships between co-expression modules. Eigengene networks will be useful for any module detection method that leads to modules of highly correlated genes. While eigengene networks can easily be adapted to other co-expression module detection methods, we define them within the framework of weighted gene co-expression network analysis since this framework preserves the continuous nature of the co-expression information and leads to robust results [4, 7]. Our empirical applications illustrate the kind of novel questions that can be addressed with eigengene networks. We find that modules can be organized into meta-modules that can be biologically meaningful and interesting.
Eigengene networks can naturally be integrated with other types of quantitative data. For example, a microarray sample trait $T$ (such as body weight or survival time) can be included as an additional node of the eigengene network. The adjacency between an eigengene $E_I$ and the sample trait $T$ can be defined as $a_{\mathrm{Eigen},I,T} = (1 + \mathrm{cor}(E_I, T))/2$, generalizing Eq. (24) (Methods). Eigengenes that are adjacent to a clinical trait may correspond to pathways (modules) that are associated with the clinical trait. We illustrate this point in our third application involving female and male mouse liver data, in which we analyze the relationship between consensus modules and clinical traits.
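As a small illustration of the trait adjacency $(1 + \mathrm{cor}(E_I, T))/2$, the following Python sketch computes it for one eigengene and one trait measured on the same samples. The helper names and the numeric values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def trait_adjacency(eigengene, trait):
    """Signed adjacency between a module eigengene and a sample trait."""
    return (1 + pearson(eigengene, trait)) / 2


# Hypothetical eigengene values and body weights for four samples.
eigengene = [-0.6, -0.2, 0.3, 0.5]
weight = [21.0, 24.5, 27.0, 30.5]
adjacency_with_weight = trait_adjacency(eigengene, weight)
```

A value near 1 indicates a strong positive correlation with the trait, a value near 0 a strong negative correlation, and a value near 0.5 no linear relationship.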
Eigengene networks can be used for describing module relationships in a single data set (single eigengene network analysis) or they can be used to compare module relationships across different data sets (differential eigengene network analysis). To facilitate differential eigengene network analysis, we propose methods for finding consensus modules. Our approach for detecting consensus modules relies on a consensus dissimilarity measure (Methods, Eq. 22) that compares topological overlap matrices of different data sets. In our applications, consensus implies that the modules are present in all data sets (networks). In other applications, it may be preferable to relax this stringent requirement and look for 'common' modules instead. For example, if the number of studied data sets is large (say more than 5), the robustness can be increased by replacing the minimum by a suitably chosen quantile (e.g., the median). Details of such a generalization are presented in Methods.
Since we define the consensus topological overlap as a minimum (Methods, Eq. 18), a bias will result if the topological overlap of one network tends to be higher (or lower) than that of the other data sets because of non-biologic reasons including different microarray platforms, gene expression normalization methods, or different sample sizes. To address this potential bias, one can scale the individual topological overlap matrices or adjacency matrices. Alternatively, we describe a highly robust but less sensitive method for defining consensus modules in the Methods (Eq. 37). In brief, this robust method defines modules in each of the individual data sets and defines consensus modules by keeping track of shared module membership. Module detection depends on several parameter choices, e.g., how to cut off branches of a hierarchical cluster tree. In practice, it is advisable to carry out a robustness analysis with regard to the module definition. For example, the reader can use the R code published on our web page to verify that our findings are relatively robust. Since the module eigengene (first principal component) represents a suitably defined average gene expression profile, it is highly robust with regard to moderate changes in module membership. We find that the consensus eigengene network construction is highly robust and that it has high sensitivity and specificity in our simulation studies.
We find that eigengene network methods lead to mathematically robust and biologically meaningful results. We provide three microarray data applications illustrating that eigengene networks effectively represent module relationships. Studies of inter-modular relationships may reveal changes in pathway dependencies due to biological perturbations.
Network adjacency matrices and connectivity
Note that 0 ≤ D(A) ≤ 1. The density equals the average adjacency (connection strength) between the genes.
Transformations of the adjacency matrix
Note that Power(A, β) also satisfies the conditions of an adjacency matrix (Eq. 7). By choosing a power β > 1 the power transformation can be used to emphasize large adjacencies at the expense of low ones, i.e., the power transformation can be used for 'soft-thresholding' . We use this approach for defining weighted gene co-expression networks.
The preservation transformation $\mathrm{Preserv}(A^{(1)}, A^{(2)}, \ldots)$ can be used to determine whether adjacencies are preserved between given networks $A^{(1)}, A^{(2)}, \ldots$. Specifically,
$$\mathrm{Preserv}_{ij}(A^{(1)}, A^{(2)}, \ldots) = 1 - \left[\mathrm{Max}_{ij}(A^{(1)}, A^{(2)}, \ldots) - \mathrm{Min}_{ij}(A^{(1)}, A^{(2)}, \ldots)\right].$$
is an aggregate measure of adjacency preservation between networks A(1) and A(2).
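A minimal Python sketch of the preservation transformation and its aggregate summaries is given below. The element-wise formula follows the definition of Preserv above; the connectivity and density of the preservation network are assumed here to be means of its off-diagonal entries. The function names and the two small adjacency matrices are hypothetical.

```python
def preservation_network(*networks):
    """Element-wise Preserv: 1 - (max - min) of the input adjacencies."""
    n = len(networks[0])
    return [
        [
            1 - (max(a[i][j] for a in networks) - min(a[i][j] for a in networks))
            for j in range(n)
        ]
        for i in range(n)
    ]


def scaled_connectivity(adj):
    """Mean adjacency of each node with the other nodes (assumed definition)."""
    n = len(adj)
    return [sum(adj[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]


def density(adj):
    """Mean off-diagonal adjacency (assumed definition)."""
    n = len(adj)
    return sum(adj[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))


# Two hypothetical eigengene networks over the same three eigengenes.
a1 = [[1.0, 0.9, 0.2], [0.9, 1.0, 0.3], [0.2, 0.3, 1.0]]
a2 = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.3], [0.3, 0.3, 1.0]]

preserv = preservation_network(a1, a2)
node_preservation = scaled_connectivity(preserv)
overall_preservation = density(preserv)
```

For two networks, the element-wise expression reduces to one minus the absolute difference of the two adjacencies.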
We now introduce the notion of a consensus network for given input adjacency matrices $A^{(1)}, A^{(2)}, \ldots$. Intuitively, two nodes should be connected in a consensus network only if all of the input networks 'agree' on that connection. This naturally suggests the definition
$$\mathrm{Consensus}_{ij}(A^{(1)}, A^{(2)}, \ldots) = \mathrm{Min}_{ij}(A^{(1)}, A^{(2)}, \ldots).$$
The Consensus transformation is related to Preserv (Eq. 15): if the $\mathrm{Max}(A^{(1)}, A^{(2)}, \ldots)$ network is dense, that is, if $\mathrm{Max}_{ij}(A^{(1)}, A^{(2)}, \ldots) \approx 1$ for all pairs of nodes $i, j$, we find $\mathrm{Preserv}(A^{(1)}, A^{(2)}, \ldots) \approx \mathrm{Consensus}(A^{(1)}, A^{(2)}, \ldots)$. On the other hand, if the Max network is sparse with most adjacencies close to zero, Preserv and Consensus differ.
We use the definition (18) in all our applications, but generalizations are of interest as well. Our definition of the Consensus adjacency may be too stringent when dealing with more than a handful of networks. To address this, we use the quantile transformation (Eq. 12) to define a more robust consensus network as follows
$$\mathrm{Consensus}_{q,ij}(A^{(1)}, A^{(2)}, \ldots) = \mathrm{Quant}_{q,ij}(A^{(1)}, A^{(2)}, \ldots).$$
Note that our Consensus network (Eq. 18) is a special case of Eq. (19) with q = 0. For q = 0.25 and q = 0.5, the resulting consensus network is defined as the first quartile and the median, respectively, of the input adjacencies.
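The minimum-based and quantile-based consensus networks can be sketched in Python as follows. The quantile rule used here (linear interpolation between order statistics) is an assumption, since several quantile conventions exist; any convention that returns the minimum for q = 0 and the median for q = 0.5 matches the text. Function names and adjacency values are hypothetical.

```python
def quantile(values, q):
    """q-th quantile by linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


def consensus_network(networks, q=0.0):
    """Element-wise q-th quantile of the input adjacencies.

    q = 0 gives the minimum (the strict consensus); q = 0.5 gives the median.
    """
    n = len(networks[0])
    return [
        [quantile([a[i][j] for a in networks], q) for j in range(n)]
        for i in range(n)
    ]


# Three hypothetical two-node networks that disagree on the single edge.
networks = [
    [[1.0, 0.2], [0.2, 1.0]],
    [[1.0, 0.6], [0.6, 1.0]],
    [[1.0, 0.9], [0.9, 1.0]],
]
strict = consensus_network(networks, q=0.0)
first_quartile = consensus_network(networks, q=0.25)
median = consensus_network(networks, q=0.5)
```

The example shows how a single low adjacency determines the strict consensus, while the quartile and median versions are less sensitive to one disagreeing network.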
Dissimilarity transformation for module detection
The dissimilarity transformation Dissim(A) turns an adjacency matrix (which is a measure of similarity) into a measure of dissimilarity by subtracting it from 1, i.e.,
$$\mathrm{Dissim}_{ij}(A) \equiv 1 - a_{ij}.$$
This transformation is useful for defining module detection procedures. As an aside, we mention that Dissim(A) does not satisfy our definition of an adjacency matrix since its diagonal elements equal 0.
Module detection using hierarchical clustering
This dissimilarity is used as input to average linkage hierarchical clustering. Branches in the resulting cluster tree (dendrogram) are referred to as modules. As detailed in our R tutorials, we use two different branch cutting techniques: the constant-height cut method and the dynamic tree cut method . This module detection approach has been successfully used in several studies [6, 7, 11, 18, 22, 30].
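To make the clustering step concrete, the following Python sketch implements average linkage hierarchical clustering with a constant-height cut on a small dissimilarity matrix. It is an illustration under simplifying assumptions: the dissimilarity is taken as 1 minus a hypothetical adjacency rather than the topological overlap based measure, the dynamic tree cut method is not covered, and no minimum module size is enforced. The function name and all values are hypothetical.

```python
def average_linkage_modules(dissim, cut_height):
    """Agglomerative average linkage clustering with a constant-height cut.

    Repeatedly merges the two clusters with the smallest average pairwise
    dissimilarity and stops once that smallest value exceeds cut_height.
    Average linkage merge heights never decrease, so stopping here gives the
    same clusters as cutting the full tree at cut_height.
    """
    clusters = [[i] for i in range(len(dissim))]

    def average_dissimilarity(a, b):
        return sum(dissim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = average_dissimilarity(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        height, p, q = best
        if height > cut_height:
            break
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical adjacency: nodes 0-1 and nodes 2-3 form two tight groups.
adjacency = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
dissimilarity = [[1 - a for a in row] for row in adjacency]
modules = average_linkage_modules(dissimilarity, cut_height=0.5)
```

Lowering the cut height splits the tree into more, smaller branches; raising it merges branches into fewer, larger ones.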
Consensus modules are defined as modules in the consensus network (Eq. 18). Analogously to the single network case (Eq. 21), we define a consensus gene dissimilarity
and use it as input to average linkage hierarchical clustering. Consensus modules are defined as branches of the resulting clustering tree. By definition, consensus modules consist of genes that are closely related in all networks A(1), A(2),...; in other words, the modules are present in all networks.
Weighted gene co-expression network construction and module detection
Denote the $i$-th gene expression profile (where $i = 1, \ldots, n$) across $m$ microarrays as $x_i$. Thus, $x_i$ is a vector with $m$ components. We use two different measures of co-expression similarity to compare a pair of gene expression profiles $x_i$ and $x_j$. The first measure $S = (s_{ij})$ is the absolute value of the Pearson correlation coefficient, i.e.,
$$s_{ij} = |\mathrm{cor}(x_i, x_j)|$$
Note that $s_{\mathrm{signed},ij}$ equals 1, 1/2, and 0 if the correlation equals 1, 0, and -1, respectively.
The power transformation with β > 1 allows one to suppress low co-expression similarities that may be spurious while at the same time preserving the continuous nature of co-expression information.
In our applications, we use the unsigned adjacency (Eq. 25) to define gene co-expression networks. To choose the power β, we use the scale free topology criterion . We define eigengene networks using the signed adjacency (Eq. 26) because we find it useful to preserve the sign of the co-expression information between module eigengenes. The scaled connectivity $C_i(A_{\mathrm{signed}})$ is close to 1 and 0 if the correlations between $x_i$ and other network genes tend to be positive and negative, respectively.
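The difference between the unsigned and signed adjacencies can be illustrated with a short Python sketch. It assumes that the unsigned adjacency is the absolute correlation raised to the power β and that the signed adjacency is $(1 + \mathrm{cor})/2$ raised to the power β, with β = 1 as the default in the signed case. The function names and the value β = 6 are hypothetical choices for illustration, not values taken from the text.

```python
def unsigned_adjacency(cor, beta):
    """Soft-thresholded unsigned adjacency |cor|^beta (assumed form)."""
    return abs(cor) ** beta


def signed_adjacency(cor, beta=1):
    """Signed adjacency ((1 + cor) / 2)^beta (assumed form)."""
    return ((1 + cor) / 2) ** beta


# With beta > 1, weak correlations are suppressed far more than strong ones.
strong = unsigned_adjacency(0.9, beta=6)
weak = unsigned_adjacency(0.3, beta=6)

# The unsigned adjacency ignores the sign of the correlation,
# whereas the signed adjacency separates positive from negative correlations.
unsigned_pair = (unsigned_adjacency(0.7, beta=1), unsigned_adjacency(-0.7, beta=1))
signed_pair = (signed_adjacency(0.7), signed_adjacency(-0.7))
```

Here correlations of 0.7 and -0.7 receive the same unsigned adjacency but signed adjacencies on opposite sides of 0.5.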
An important step in network analysis is to identify modules of co-expressed genes. As detailed above, we define modules as clusters of genes with high topological overlap (Eq. 11) since this yields relatively large and robust modules [4, 6, 7, 22, 24].
To define the module eigengene of a module, we use the Singular Value Decomposition (SVD) of the module expression matrix. The gene expression matrix of the I-th module is denoted by $X^{(I)} = (x^{(I)}_{il})$, where the index $i = 1, 2, \ldots, n_I$ corresponds to the module genes and the index $l = 1, 2, \ldots, m$ corresponds to the microarray samples. We assume that each gene expression profile $x^{(I)}_i$, i.e. each row of $X^{(I)}$, has been standardized to mean 0 and variance 1. The singular value decomposition of $X^{(I)}$ is denoted by
$X^{(I)} = U D V^T$,
where D is the diagonal matrix of singular values, arranged by convention in decreasing order. The module eigengene $E^{(I)}$ is the first column of V, i.e. the right singular vector associated with the largest singular value; it has one component per microarray sample.
An equivalent definition can be given in terms of principal component analysis where the module eigengene is defined as the first principal component. Since the orientation (i.e., sign) of each singular vector is undefined, we fix the orientation of each eigengene by constraining it to have a positive correlation with the average gene expression across module genes. In practice, we find that the module eigengene typically explains more than 50 percent of the variance of the module expressions.
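The computation of a module eigengene can be sketched without a linear algebra library by using power iteration, which converges to the leading right singular vector of the standardized module expression matrix. This is a swapped-in numerical technique for illustration, not the SVD routine used in our R tutorials; the function names, the iteration count, and the toy data are hypothetical.

```python
import math


def standardize(profile):
    """Scale one non-constant expression profile to mean 0 and variance 1."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in profile) / n)
    return [(v - mean) / sd for v in profile]


def module_eigengene(expr, n_iter=200):
    """Leading right singular vector of a genes x samples module matrix.

    Power iteration on X^T X, started from the first standardized gene.
    Assumption: that gene is not orthogonal to the leading singular vector.
    """
    x = [standardize(row) for row in expr]
    n_genes, m = len(x), len(x[0])
    v = x[0][:]
    for _ in range(n_iter):
        xv = [sum(row[l] * v[l] for l in range(m)) for row in x]  # X v
        w = [sum(x[i][l] * xv[i] for i in range(n_genes)) for l in range(m)]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Orientation: both vectors have mean 0, so the sign of their dot product
    # equals the sign of their correlation.
    average = [sum(row[l] for row in x) / n_genes for l in range(m)]
    if sum(a * b for a, b in zip(average, v)) < 0:
        v = [-c for c in v]
    return v


# Hypothetical module: three genes measured on five arrays, sharing a trend
module_expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 5.0, 8.0, 11.0],
    [0.0, 1.0, 1.0, 3.0, 4.0],
]
eigengene = module_eigengene(module_expr)
```

The returned vector has unit length and one entry per array, and its sign is chosen so that it correlates positively with the average standardized expression of the module genes.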
Relating genes within a module to the module eigengene
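The module eigengene based connectivity of the j-th gene with respect to the I-th module (Eq. 30) is the correlation between its expression profile and the module eigengene $E^{(I)}$, $k^{(I)}_{ME}(j) = \mathrm{cor}(x_j, E^{(I)})$. The closer $k^{(I)}_{ME}(j)$ is to 1 or -1, the stronger the evidence that the j-th gene is part of the I-th module.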
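A small sketch of an eigengene based connectivity measure is given below. It assumes that the measure is the Pearson correlation between a gene expression profile and the module eigengene, which is consistent with the range from -1 to 1 mentioned above; the function names, gene labels, and numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def eigengene_connectivity(profiles, eigengene):
    """Correlate every gene profile with one module eigengene."""
    return {gene: pearson(values, eigengene) for gene, values in profiles.items()}


# Hypothetical eigengene over five arrays and three hypothetical genes
eigengene = [-0.6, -0.3, 0.0, 0.3, 0.6]
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],    # follows the eigengene
    "geneB": [5.0, 4.0, 3.0, 2.0, 1.0],    # mirrors the eigengene
    "geneC": [1.0, -2.0, 2.0, -2.0, 1.0],  # unrelated to the eigengene
}
connectivity = eigengene_connectivity(profiles, eigengene)
```

Here geneA and geneB obtain values near 1 and -1, so both would be considered closely tied to the module, while geneC obtains a value near 0.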
Definition of meta-modules in eigengene networks
Using this dissimilarity as input to average linkage hierarchical clustering leads to a cluster tree of modules (represented by eigengenes). The branches of this tree correspond to meta-modules in our applications.
This dissimilarity is used to define and detect consensus meta-modules, that is, meta-modules present in all input eigengene networks. A small consensus dissimilarity between two eigengenes is an indication that the modules are closely related in all studied data sets. This may be due to biological reasons, namely when the corresponding modules represent distinct but interacting pathways. On the other hand, the corresponding modules could also be non-distinct and should be merged. For example, the module detection method may have been too sensitive, which results in many closely related modules. To decide whether close modules should be merged, we suggest using external information, e.g., gene ontology information, or studying module preservation in an independent data set.
Generalized consensus networks
A limitation of consensus networks defined by Eq. (18) (as well as by Eq. 19) and the consensus dissimilarity (Eq. 22) is that the direct comparison of networks is only meaningful if the corresponding adjacencies have similar distributions. This need not be the case: differences in sample sizes, array platforms, or gene expression normalization methods may seriously bias the results of the Quantile transformation (Eq. 12). To address this, we propose a robust approach to defining consensus modules. The idea is to replace the $\mathrm{TOM}^{(1)}, \mathrm{TOM}^{(2)}, \ldots$ matrices in Eq. (22) by 'compressed' adjacency matrices, defined using the following steps. First, module detection is performed in each data set separately, and the corresponding module eigengenes are calculated. In data set s, denote by $\mathrm{Module}^{(s)}(i)$ the index of the module that gene i belongs to. $\mathrm{Module}^{(s)}(i)$ may encode the original module membership or it can be defined using the module eigengene based connectivity measure (Eq. 30), written here as $k^{(s),J}_{ME}(i)$ for gene i and module J in data set s: each gene is assigned to the module for which it has the maximum absolute module eigengene based connectivity,
$\mathrm{Module}^{(s)}(i) = \operatorname{argmax}_J \left| k^{(s),J}_{ME}(i) \right|$.
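The assignment rule can be sketched as an argmax over the absolute eigengene based connectivities of each gene. The input layout (a dictionary per gene, keyed by module label), the color labels, and the numbers are hypothetical.

```python
def assign_modules(connectivity):
    """Assign each gene to the module with the largest absolute connectivity.

    connectivity maps gene -> {module label -> eigengene based connectivity}
    for a single data set.
    """
    return {
        gene: max(by_module, key=lambda module: abs(by_module[module]))
        for gene, by_module in connectivity.items()
    }


# Hypothetical connectivities of three genes to two modules in one data set
connectivity = {
    "gene1": {"blue": 0.9, "brown": -0.2},
    "gene2": {"blue": 0.1, "brown": -0.8},
    "gene3": {"blue": -0.7, "brown": 0.4},
}
assignment = assign_modules(connectivity)
```

Because the absolute value is used, a gene that is strongly negatively correlated with an eigengene, such as gene2 with the brown module, is still assigned to that module.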
As an aside, we mention that one could also use the quantile consensus method $\mathrm{Consensus}_q$ (Eqs. 36, 37) instead of our minimum based consensus transformation.
Our definition (Eq. 37) is quite intuitive: the consensus dissimilarity is zero for two genes that belong to the same module in every individual set; the consensus dissimilarity will be small if the two genes belong to closely related modules in each data set; and for a gene outside any properly defined module (colored in grey in our applications), the dissimilarity with any other gene will attain its maximum value of 1. A potential major advantage of definition (Eq. 37) is that the individual data set modules need not be obtained using the same module detection procedure. This allows finding consensus modules in data sets whose properties differ substantially. The differences can be countered by using appropriate data set specific module detection methods. However, this freedom of choice greatly increases the number of parameters used in the consensus module detection and, as a result, increases the danger of over-fitting. In contrast, our minimum-based consensus TOM method (Eq. 22) involves only one clustering tree and hence far fewer parameters.
Simulation studies of consensus module detection.
Consensus module detection
Availability and requirements
Project name: Consensus Eigengene Networks
Operating system(s): Platform independent
Programming language: R
Licence: GNU GPL
We would like to thank Mike Oldham, Dan Geschwind and Winden Kellen for discussions about the human-chimp data; Jake Lusis, Tom Drake, Atila Van Nas for discussions about the mouse data; Tova Fuller, Anja Presson, Jun Dong, Wen Lin, Paul Mischel, Stan Nelson, and Lin Wang for helpful discussions. The work was supported in part by grant 1U19AI063603 01.
- D'haeseleer P, Liang S, Somogyi R: Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics. 2000, 16 (8): 707-726. http://bioinformatics.oxfordjournals.org/cgi/content/abstract/16/8/707 10.1093/bioinformatics/16.8.707
- Zhou X, Kao MC, Wong W: Transitive Functional Annotation by Shortest-path Analysis of Gene Expression Data. Proc Natl Acad Sci USA. 2002, 99 (20): 12783-12788. 10.1073/pnas.192159399
- Stuart JM, Segal E, Koller D, Kim SK: A Gene-Coexpression Network for Global Discovery of Conserved Genetic Modules. Science. 2003, 302 (5643): 249-255. 10.1126/science.1087447
- Zhang B, Horvath S: A General Framework for Weighted Gene Co-expression Network Analysis. Statistical Applications in Genetics and Molecular Biology. 2005, 4: Article 17. 10.2202/1544-6115.1128
- Wei H, Persson S, Mehta T, Srinivasasainagendra V, Chen L, Page G, Somerville C, Loraine A: Transcriptional Coordination of the Metabolic Network in Arabidopsis. Plant Physiol. 2006, 142 (2): 762-774. 10.1104/pp.106.080358
- Carlson MR, Zhang B, Fang Z, Horvath S, Mishel PS, Nelson SF: Gene Connectivity, Function, and Sequence Conservation: Predictions from Modular Yeast Co-expression Networks. BMC Genomics. 2006, 7 (40):
- Horvath S, Zhang B, Carlson M, Lu K, Zhu S, Felciano R, Laurance M, Zhao W, Shu Q, Lee Y, Scheck A, Liau L, Wu H, Geschwind D, Febbo P, Kornblum H, Cloughesy T, Nelson S, Mischel P: Analysis of Oncogenic Signaling Networks in Glioblastoma Identifies ASPM as a Novel Molecular Target. Proc Natl Acad Sci USA. 2006, 103 (46): 17402-17407. 10.1073/pnas.0608396103
- Albert R: Scale-free networks in cell biology. J Cell Sci. 2005, 118 (21): 4947-4957. 10.1242/jcs.02714
- Barabási A, Oltvai Z: Network Biology: Understanding the Cell's Functional Organization. Nature Reviews: Genetics. 2004, 5 (2): 101-113. 10.1038/nrg1272
- Hartwell L, Hopfield J, Leibler S, Murray A: From Molecular to Modular Cell Biology. Nature. 1999, 402 (6761 Suppl): C47-52. 10.1038/35011540
- Oldham M, Horvath S, Geschwind D: Conservation and Evolution of Gene Co-expression Networks in Human and Chimpanzee Brains. Proc Natl Acad Sci USA. 2006, 103 (47): 17973-17978. 10.1073/pnas.0605938103
- Fuller T, Ghazalpour A, Aten J, Drake T, Lusis A, Horvath S: Weighted Gene Co-expression Network Analysis Strategies Applied to Mouse Weight. Mammalian Genome. 2007, 6 (18): 463-472. 10.1007/s00335-007-9043-3
- Carter S, Brechb C, Griffin M, Bond A: Gene Co-expression Network Topology Provides a Framework for Molecular Characterization of Cellular State. Bioinformatics. 2004, 20 (14): 2242-2250. 10.1093/bioinformatics/bth234
- Fisher RA: On the 'probable error' of a coefficient of correlation deduced from a small sample. Metron. 1915, 1: 1-32.
- Hotelling H: New light on the correlation coefficient and its transform. Journal of the Royal Statistical Society, Series B. 1953, 15 (2): 193-232.
- Jennrich RI: An Asymptotic χ2 Test for the Equality of Two Correlation Matrices. Journal of the American Statistical Association. 1970, 65 (330): 904-912. 10.2307/2284596
- Khaitovich P, Muetzel B, She X, Lachmann M, Hellmann I, Dietzsch J, Steigele S, Do HH, Weiss G, Enard W, Heissig F, Arendt T, Nieselt-Struwe K, Eichler EE, Paabo S: Regional Patterns of Gene Expression in Human and Chimpanzee Brains. Genome Res. 2004, 14 (8): 1462-1473. 10.1101/gr.2538704
- Ghazalpour A, Doss S, Zhang B, Plaisier C, Wang S, Schadt E, Thomas A, Drake T, Lusis A, Horvath S: Integrating Genetics and Network Analysis to Characterize Genes Related to Mouse Weight. PloS Genetics. 2006, 2 (8): e130. 10.1371/journal.pgen.0020130
- Langfelder P, Zhang B, Horvath S: Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics. 2008, 24 (5): 719-720. 10.1093/bioinformatics/btm563
- Dennis G, Sherman B, Hosack D, Yang J, Gao W, Lane H, Lempicki R: DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biology. 2003, 4 (9): R60. http://genomebiology.com/2003/4/9/R60 10.1186/gb-2003-4-9-r60
- Dong J, Horvath S: Understanding network concepts in modules. BMC Systems Biology. 2007, 1: 24. http://www.biomedcentral.com/1752-0509/1/24 10.1186/1752-0509-1-24
- Ravasz E, Somera A, Mongru D, Oltvai Z, Barabási A: Hierarchical Organization of Modularity in Metabolic Networks. Science. 2002, 297 (5586): 1551-1555. 10.1126/science.1073374
- Li A, Horvath S: Network Neighborhood Analysis With the Multi-node Topological Overlap Measure. Bioinformatics. 2007, 23 (2): 222-231. 10.1093/bioinformatics/btl581
- Yip A, Horvath S: Gene network interconnectedness and the generalized topological overlap measure. BMC Bioinformatics. 2007, 8: 22. http://www.biomedcentral.com/1471-2105/8/22 10.1186/1471-2105-8-22
- Bar-Joseph Z, Gerber G, Lee T, Rinaldi N, Yoo J, Robert F, Gordon DB, Fraenkel E, Jaakkola T, Young R, Gifford D: Computational discovery of gene modules and regulatory networks. Nature Biotechnology. 2003, 21 (11): 1337-1342. 10.1038/nbt890
- Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N: Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet. 2003, 34 (2): 166-76.
- Xu X, Wang L, Ding D: Learning module networks from genome-wide location and expression data. FEBS Lett. 2004, 578 (3): 297-304. 10.1016/j.febslet.2004.11.019
- Wu WS, Li WH, Chen BS: Computational reconstruction of transcriptional regulatory modules of the yeast cell cycle. BMC Bioinformatics. 2006, 7: 421. 10.1186/1471-2105-7-421
- Reiss D, Baliga N, Bonneau R: Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks. BMC Bioinformatics. 2006, 7: 280. 10.1186/1471-2105-7-280
- Ye Y, Godzik A: Comparative Analysis of Protein Domain Organization. Genome Biology. 2004, 14 (3): 343-353.
- Alter O, Brown P, Botstein D: Singular value decomposition for genome-wide expression data processing and modelling. PNAS. 2000, 97: 10101-10106. 10.1073/pnas.97.18.10101
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.