Biological interaction networks are conserved at the module level
© Zinman et al; licensee BioMed Central Ltd. 2011
Received: 16 March 2011
Accepted: 23 August 2011
Published: 23 August 2011
Orthologous genes are highly conserved between closely related species and biological systems often utilize the same genes across different organisms. However, while sequence similarity often implies functional similarity, interaction data is not well conserved even for proteins with high sequence similarity. Several recent studies comparing high throughput data including expression, protein-protein, protein-DNA, and genetic interactions between close species show conservation at a much lower rate than expected.
In this work we collected comprehensive high-throughput interaction datasets for four model organisms (S. cerevisiae, S. pombe, C. elegans, and D. melanogaster) and carried out systematic analyses in order to explain the apparent lower conservation of interaction data when compared to the conservation of sequence data. We first showed that several previously proposed hypotheses only provide a limited explanation for such lower conservation rates. We combined all interaction evidences into an integrated network for each species and identified functional modules from these integrated networks. We then demonstrate that interactions that are part of functional modules are conserved at much higher rates than previous reports in the literature, while interactions that connect between distinct functional modules are conserved at lower rates.
We show that conservation is maintained between species, but mainly at the module level. Our results indicate that interactions within modules are much more likely to be conserved than interactions between proteins in different modules. This provides a network based explanation to the observed conservation rates that can also help explain why so many biological processes are well conserved despite the lower levels of conservation for the interactions of proteins participating in these processes.
Accompanying website: http://www.sb.cs.cmu.edu/CrossSP
Basic cellular systems including the cell cycle, innate immunity, and mRNA translation operate in a similar manner across a large number of species. The proteins that participate in these systems are highly conserved, enabling many successful applications to infer gene function based on sequence similarity across species .
While genes with very similar sequence often perform the same function, dynamic properties of conserved proteins, including expression and interactions, seem to differ substantially between species. In studies profiling similar tissues in mouse and human, researchers found a large divergence in expression profiles  (correlations of 0.17 to 0.37 for orthologous genes, depending on the tissue). The correlation of cell cycle expression between two yeasts was determined to be around 0.1 . Similarly, in protein-DNA binding studies, researchers found that only 11% of binding interactions for highly conserved transcription factors overlapped between human and mouse . Studies of three yeast species with high sequence similarity identified only 20% overlap in binding targets  and similar results were obtained for bacteria . Protein interactions were also found to overlap at very low rates [7–10] (Gandhi et al. reported rates that are as low as less than 1% of the interactions between four species ). Only an estimated 18% to 29% of negative genetic interactions between S. cerevisiae and S. pombe were found to be conserved [11, 12].
Early studies have mainly focused on pairwise comparisons based on a single genomic data type. While the results in these early papers indicated low overlap between species, no attempt was made to generalize observations to address reasons for the lower conservation of interaction data when compared to sequence data conservation. Recent high throughput experiments with better coverage [13, 14] made it possible to reassess the conservation of interaction data. A number of possible reasons have been proposed to explain the lack of conservation for specific types of interaction data. For example, Fox et al.  observed that interactions connecting hub proteins are more conserved when compared to interactions involving proteins with a lower degree of connectivity. As they show using PPI data from multiple species, there is a positive correlation between the average degree of a protein and the conservation of its interacting partners. Byrne et al.  studied the genetic interaction networks of S.cerevisiae and C.elegans and reported that while only little overlap is seen for individual interactions, the properties of their genetic interaction networks are conserved. They proposed that changes in individual genetic interactions might be a form of evolution. Another direction suggested by Roguev et al.  demonstrated that conservation of interactions within protein complexes is higher than that of other interactions. They compared genetic interactions between chromatin-related genes in two yeasts and determined that protein complexes and the evolution of a new biological mechanism (RNAi) can help explain the minimal overlap observed, hypothesizing that protein-protein interactions pose a constraint on functional divergence in evolution. Similarly, Jensen et al. compared cell cycle expression of a number of species and discovered that while in-time expression was not conserved at the individual gene level, it was much more conserved at the protein complex level. Van Dam and Snel  showed that conservation rates for PPI within complexes in human and yeast are much higher than overall interaction conservation. On the other hand, Wang and Zhang  studied conservation of yeast, fly, and nematode PPI networks and determined that interactions in protein complexes are not conserved at levels that are higher than other interactions. Beltrao et al.  claimed that protein complexes are correlated with higher conservation only for stable interactions, while transient interactions, including phosphoregulation, are less conserved.
The experimental methods used to obtain expression data are large scale and produce measurements for the entire genome leading to a significantly better coverage of the interactome compared to the other data types. In addition, as there is no equivalent to protein complexes in expression data, early analysis of the conservation of dynamic properties in expression data focused on the identification of conserved expression modules across species [20–23]. While some important expression modules were conserved, many others were not.
The above discussion illustrates several (sometimes conflicting) trends observed for the conservation of interactions across species. One of the reasons for the disagreement between the results of these observations is the fact that each was only tested on a small dataset, often for only one type of interaction data (protein interaction, co-expression etc.), in one specific condition and between a single pair of species. To determine which of these trends hold more generally we performed a comprehensive analysis using four model organisms, and several genomic data types measured under a variety of conditions. As we show below, while all the proposed directions so far indeed explain part of the differences between species, none is enough to provide a comprehensive explanation. We have thus attempted to generalize these suggestions. Our findings suggest that while sequence and function are conserved at the individual protein level, interactions are conserved at a higher organizational level for which we use the term 'functional modules'. These results indicate that while gene-gene interactions are not well conserved, the overall network, through the intermediate level of modules, is conserved to a much higher degree.
Data collection and processing
We focused on four species for which large interaction datasets are available: the two yeasts S. cerevisiae and S. pombe, the nematode C. elegans, and the fruit fly D. melanogaster. We retrieved available sequence, expression, protein-protein interaction (PPI), and genetic interaction (GI) data as well as Gene Ontology (GO) annotations for all species. See Methods for details.
Conservation statistics between S. cerevisiae and S. pombe
Module based explanations
WMI -no hubs
Conservation of hub interactions
Several studies have previously analyzed specific interaction datasets in multiple species and identified trends in these datasets that differentiated conserved and non conserved interactions. To test how these generalize to the large datasets we collected we have reformulated some of these observed trends as possible explanations to the low conservation rates and analyzed them using our integrated networks. We first checked whether interactions involving hub proteins are more likely to be conserved. In order to examine this, we binned the nodes according to their degrees in the integrated network, and for each bin, we calculated the conservation rates for interactions involving at least one node whose degree falls into that bin. We found a positive correlation between the degree of the nodes and the conservation rates of the interactions that connect them with their partners (See Additional File 1 Figure S1). Fewer than 15% of the interactions involving nodes with low degrees (up to 300), which include the vast majority of the interactions, are conserved in both S. cerevisiae and S. pombe, while for those interactions involving nodes with high degrees (600-800), 24-26% are conserved. Therefore, we conclude that hub interactions are conserved at rates that are better than average, and the effect of hubs should be considered in subsequent analyses. Nonetheless, the conservation rates of hub interactions are still much lower than the conservation of sequence data and they provide only a limited explanation for the even lower conservation rates of all interactions.
Conservation of interactions within protein complexes
Protein complexes were previously shown  to have higher conservation rates. This analysis was limited to protein-protein interactions but interactions of other genomic data types that coincide with PPI were also shown to have higher conservation rates . In our analysis, we checked conservation rates for protein complexes that were defined in two recent studies in S. cerevisiae [13, 14]. Interactions in the integrated network that were part of the complexes defined by Krogan et al. were conserved at a rate of 26.22% (out of 3738 possible interactions), while the 1930 interactions that were part of the complexes identified by Gavin et al. had a conservation rate of 35.49%. Note that this is only a one-way comparison, since the complexes are defined only for S. cerevisiae. These results show that while conservation rates for interactions within protein complexes are indeed higher than the 'baseline' reported above, they still do not provide a complete and robust explanation to the question of conservation.
Conservation of interactions by molecular activity
Beltrao et al.  observed that stable interactions are more conserved than transient interactions for specific types of interactions (e.g., kinase-substrate interactions determined by phosphoproteomics). While we cannot obtain enough data to test this specific observation using our integrated networks, we did examine the role played by the various functions of proteins in distinguishing conserved and non conserved interactions. We looked at interactions for proteins with certain molecular functions (MF) with the rest of the genome for all molecular functions annotations in GO that contains more than 100 genes in S. cerevisiae. The average conservation rate for the molecular function term (GO:0003674, the root of the GO:MF tree) is similar to the baseline for the GO network (18%/22% - see Table 1). Interestingly, there are big differences for conservation rates for the different MF terms (See Additional File 1 Figure S2 and Additional File 3). Interactions that link transporters (GO:0005215) exhibit significantly lower rates of conservation probably due to their dynamic nature (8%/12%). A recent study on three yeast species  showed how differential expression of ABC transporters resulted in inherently different mechanisms for coping with an anti-fungal medicine. Interactions linking RNA polymerase II transcription factor activity (GO:0003702) also have lower conservation rates (9%/9%), possibly due to the specific regulation in each of the species and the transient nature of the interaction . Interactions connecting proteins annotated with kinase activity (GO:0016301), a category that consists of 222 proteins, are conserved at rates of 14%/23%, but the sub category of protein kinase activity (GO:0004672) that contains 135 proteins are conserved at rates of 19%/29% which is higher than the average. Interactions linking structural ribosome activity (GO:0003735) showed a significant higher-than-average conservation rate (25%/34%) which is in accordance with previous findings . It is important to note that the size of the molecular function terms did not have any effect on the conservation rates. To conclude, while the molecular function of a protein has an effect on the conservation rates of the interactions, we cannot establish a clear trend showing that stable interactions are always more conserved than transient interactions. Moreover, even the most conserved category, RNA binding activity (GO:0003723), shows only moderate conservation levels (26%/30%).
Extracting modules from diverse interaction datasets
Our analysis above indicates that the low conservation rates proposed so far (data type, hub status, protein complex, or protein activity) do not always generalize when applied to comprehensive data (See Table 1 for a summary of results formulated based on previous observations using our general large scale data). We thus hypothesized that a more general mechanism that combines elements from these proposed directions may be responsible for the low overlap between species. Specifically, we combined different types of interaction data to find gene modules, sets of highly interacting genes that often share similar function. Using these modules we studied the conservation of genomic interaction data at the network level rather than at the individual protein level. We used the Markov CLustering algorithm (MCL)  to search for modules in the integrated networks for each species (see Methods). MCL partitions a graph via a simulation of random walks effectively placing each node into exactly one module. Therefore, each module is a set of highly connected proteins and often contains different types of interactions. Since MCL can incorporate edge-weight information, edges that have higher linkage scores or are observed in more than one data type are more likely to be in the same module. MCL was also shown to be robust to random edge addition or removal , a key issue for noisy genomic data. Modules that did not include at least 3 nodes were discarded from further analyses (see Additional File 4 for a complete list of modules). Module sizes follow exponential distribution with very few modules containing more than 100 nodes (Additional File 1 Figure S3). As expected, many of the modules are significantly enriched with various functional GO categories (Additional File 5). In addition, some of the modules in S. cerevisiae significantly overlap protein complexes derived from high throughput experiments [13, 14], though many modules are not related to protein complexes (Additional File 6).
To evaluate the significance of our results, we created random networks for each of the real networks we studied for comparison. We tried two randomization methods; edge switching randomization and node label randomization (see Methods). The first randomization method retains the degree distribution of the original networks and the second randomization method retains both the degree distributions and diameters. We used these random networks to identify random modules and to compare them across species in the same way real modules were identified and analyzed. 1000 random networks were generated for each data type and the results were averaged.
Conservation of functional genomics data on the module level
Conservation rates of edges in different types of networks between S.cerevisiae and S. pombe
From S. cerevisiae to S. pombe
From S. pombe to S. cerevisiae
9.13 ± 0.04
9.26 ± 0.32
4.66 ± 0.48
5.22 ± 0.49
11.99 ± 0.06
13.31 ± 0.30
7.38 ± 0.57
7.71 ± 0.59
9.04 ± 0.05
9.68 ± 0.30
4.57 ± 0.50
5.05 ± 0.52
11.84 ± 0.05
12.72 ± 0.21
8.01 ± 0.60
8.34 ± 0.61
8.92 ± 0.05
9.58 ± 0.30
4.47 ± 0.49
5.38 ± 0.53
11.79 ± 0.05
12.68 ± 0.21
7.95 ± 0.60
8.24 ± 0.60
10.32 ± 0.05
10.2 ± 0.38
6.71 ± 0.78
7.12 ± 0.75
11.09 ± 0.05
12.27 ± 0.30
8.06 ± 0.70
8.46 ± 0.71
0.06 ± 0.01
0.06 ± 0.02
0.05 ± 0.09
1.42 ± 0.43
3.12 ± 0.42
3.70 ± 1.10
2.31 ± 1.33
2.62 ± 1.48
0.30 ± 0.05
0.29 ± 0.09
0.15 ± 0.19
1.68 ± 0.61
1.43 ± 0.27
1.50 ± 0.45
1.19 ± 1.27
1.73 ± 1.50
1.09 ± 0.05
0.96 ± 0.13
1.37 ± 1.96
2.89 ± 2.78
7.56 ± 0.29
7.17 ± 0.71
9.95 ± 10.16
10.98 ± 10.68
2.23 ± 0.08
2.16 ± 0.13
2.27 ± 2.12
3.78 ± 2.96
4.05 ± 0.11
4.28 ± 0.15
4.11 ± 2.58
5.22 ± 2.88
17.55 ± 0.64
25.61 ± 1.6
1.23 ± 0.76
1.96 ± 0.86
14.53 ± 0.39
28.88 ± 1.59
0.09 ± 0.15
0.34 ± 0.30
The conservation rates of extended-WMI are even higher (49.66%/31.97%, Table 2), while extended-BMI rates have only moderately increased (16.91%/20.79%), indicating that even if the specific interaction type is not observed in the other species, it may be that either it is actually present but was not measured, or that its effect is mediated indirectly through other members of the module.
We extended this analysis to all 12 pairwise species comparisons (note that the comparisons are not symmetric since the analysis depends on the query species, see Figure 2a). Figure 2b presents the results for all comparisons across the different data types (See also Figure 3, and Additional File 7). It can be seen that while the overall conservation rates change according to the distance between the species and the coverage of the specific data types, the overall trend is similar in all comparisons. Overall WMIs are more conserved than average, yet they are much less conserved in the random networks. Extended module conservation further increases the conservation rates. The only interaction type for which most comparisons do not show an improvement is negative GI. Indeed, negative GIs are often found between genes in parallel pathways rather than within the same pathway , so they are not expected to be conserved via modules.
In addition to using random networks as a control we carried out several other experiments to test the robustness of our findings and show that they are independent of the way the modules are defined, the amounts of data that are being used, or the orthology matching definitions.
To rule out the possibility that the WMI:BMI statistics are a result of the way the modules definition and parameter selection, we used an alternative graph clustering method, SPICi , to partition the networks into modules and ran the same analyses. SPICi uses a heuristic approach to greedily build clusters from selected seeds. This scheme is a bottom-up approach for partitioning the network whereas the other method we used, MCL, is a top-down approach. WMIs are shown to be conserved at higher rates than BMIs under this graph clustering scheme as well, for almost all species comparison and data types (Additional File 8). We tried using a novel method for evaluating module preservation  to check whether modules are preserved in terms of density and connectivity between the species regardless of the parameters used to obtain the modules. Even though the method was not intended for cross species analysis few modules were found to be significantly preserved (see Additional File 1 Supplementary Results and Figure S5).
In addition, we tested conservation rates for modules that are based on previous knowledge rather than clustering the interaction data. We created modules based on gene ontology terms that are defined based on direct experimental evidence only (precluding annotations that are defined by sequence similarity to avoid bias in the reported results, see Additional File 1 Supplementary Methods). While the resulting networks and modules are smaller and less comprehensive compared to our interactions data, the conservation trends for the GO-based modules are similar to the modules based on interaction data (Additional File 9). All together, these results show that our conclusions hold and are independent of the way the modules are defined, as long as there is a strong functional relationship within the module.
We also studied the effect of insufficient data coverage on our results. Missing data is the most common reason for differences between the true biological networks and our integrated networks. This is more likely to be the case for species other than S. cerevisiae, as fewer experiments for all data types were conducted. To this end, we randomly removed edges from the S. cerevisiae network and generated modules that are based on the trimmed networks. Calculating the conservation rates against S. pombe showed that in all cases our results regarding the large increase in WMI and extended-WMI conservation still hold (Additional File 10). Also, many of the modules from the full S. cerevisiae network were significantly retained in the trimmed networks (Additional File 1 Figure S4).
To rule out the possibility that our results are affected by the orthology definition we repeated the analysis using Inparanoid  mapping. Very similar results to the ones presented above were achieved for the one-to-one mappings generated from Inparanoid (not shown). Furthermore, we checked whether using many-to-many (M:N) Inparanoid mapping would change our results. Conservation definitions are slightly changed under M:N mapping definitions. We marked an edge as conserved in the query species if any edge between possible orthologous nodes in the reference species was conserved. While conservation statistics for both WMI and BMI in almost all species and data types naturally increased using the new definitions, the trend for WMI to have higher conservation rates is retained for most comparisons (Additional File 11).
We further evaluated the effect of stricter orthology mappings on the conservation patterns. We tried various orthology mappings between S. cerevisiae and S. pombe by keeping only high confidence orthology matching between the two species (Additional File 1 Supplementary methods). Stricter orthology mapping corresponded to fewer interactions whose functions are known to be more conserved (e.g., the ribosome complex), and showed similar or higher WMI/BMI conservation rate patterns for most comparisons (Additional File 12).
Lastly, we evaluated our results using an integrated network that included only the co-expression, PPI, and GI positive and did not include the sequence networks to rule out the possibility that our results are driven by paralog conservation. The trends we observed for our original analysis remained the same for this smaller network indicating that our module based conservation result is robust to the type of data used (see the "no-seqs" row in Table 2). Moreover, we created an additional network in which we further excluded all interactions (regardless of their type) connecting two nodes (genes) with BLASTP E-value cutoff of 1e-25 or less in all species. We observed the same trends for this network as for the other networks we analyzed (see the "exclude-para" row in Table 2) indicating that module-based conservation is a general trend that is independent of sequence conservation.
Conservation of modules across species
Our results indicate that while, in general, interactions at the node (protein) level are conserved at low rates, interactions within modules are conserved to a much greater degree. This raises the intriguing possibility that interactions are conserved on a level different from that of the individual genes. In other words, while there is a strong selective pressure to maintain interactions within a module, there is less pressure to maintain between-module interactions.
The within-module conservation statistics that are presented in this study are probably an underestimate for the real conservation rates due to the incompleteness of interaction data . Our results are robust with respect to varying the amount of available data (and coverage), when compared to random interaction networks, across all four species we studied. Many of the modules we discover independently in each species are significantly conserved across more than one species, and we expect this number to grow once additional data becomes available. This refined understating of conservation may lead to better cross species search tools that can utilize the network context in addition to sequence similarity.
Our results also shed new light on some recent discoveries about the relationships between genes associated with very different phenotypic outcomes in close species . The results suggest that while modules are conserved, interactions between modules may change at a higher pace, allowing modules involved in a specific function in one species to become involved in a different function in another species through interactions with other modules.
Our results indicate that although individual interactions in one species are generally conserved at lower levels when compared directly with a closely related species, interactions within functional modules are much more likely to be conserved. In contrast, interactions between functional modules are usually conserved at a lower rate than the general case. This may introduce flexibility in the evolution of networks since such between-module interactions can change more rapidly, allowing modules involved in a specific function in one species to become involved in a different function in another species through interactions with other modules.
All two-channel microarrays for S. cerevisiae, C. elegans, and D. melanogaster stored in Stanford Microarray Database (SMD, http://smd.stanford.edu) were retrieved. Default filtering options for both arrays and genes were applied to all the three organisms, resulting in 788 arrays for S. cerevisiae, 332 arrays for C. elegans, and 164 arrays for D. melanogaster.
All two-channel microarrays for S. pombe, were extracted from NCBI GEO (http://www.ncbi.nlm.nih.gov/geo) since SMD did not contain microarray data for S. pombe. For genes with several probes, the median log ratio of the probes was used as the value for the gene.
The Spearman correlation coefficient (SCC) was computed for all pairs of genes in each of the four species (see Additional File 1 Supplementary Methods). Following  we generated the co-expression network by computing log likelihood scores. These scores were computed using a probabilistic approach that assigns a score to each interaction between two genes based on their likelihood of participating in the same biological process (See Additional File 1 Supplementary Methods). All gene-pairs interactions with a positive score were connected in the co-expression network for that species. (All other interactions were not included). The log likelihood scores were calculated for each set of expression experiments, and if the interaction was observed in more than one experiment we used the maximal score from all experiments. The maximal score is an effective way to avoid cases where the expression experiments are not independent. See Additional File 2 for the distribution of edges in each of these networks.
Protein-protein interaction Network
We collected protein-protein interaction (PPI) data for the four species from several databases (see Additional File 1 Supplementary Methods). We took the union of all the PPIs documented in these databases and represented them as networks for each of the four species. We computed a log likelihood score for all PPI interactions. Unlike expression data for which we have correlation measurement for each edge leading to a unique score for each interaction, PPI networks are binary and result in a unique score for all interactions in each of the species (see Additional File 1 Supplementary Methods).
Genetic interaction network
We collected the genetic interaction (GI) data for the four species from BioGRID . For each species, one network for positive GIs and another for negative GIs were generated. See Supplementary methods for the types of BioGRID interactions designated as positive and negative GI. Again, log likelihood scores were computed for all GI interactions in a manner similar to the PPI networks (see Additional File 1 Supplementary methods).
Network representing paralogous genes within a species was generated by performing all-against-all BLASTP for each of the four organisms against itself. All genes that were matched with E-value less than 1E-25, divided by the number of genes in the species, were considered as interacting. Log likelihood scores were computed for all sequence interactions in a manner similar to the PPI networks (see Additional File 1 Supplementary Methods).
We generated a GO network for each species based on the Biological Process (BP) annotations in the Gene Ontology database (http://www.geneontology.org). We used the semantic similarity measures developed by Wang et al.  for this purpose, see Supplementary Methods for details. In calculating the gene-gene similarity scores, genes that are only annotated with large GO:BP (categories that contain more than 5% of the number of all genes in the corresponding species) were removed, since they are poorly characterized. A cutoff of 0.8 was applied for all the four species to convert the data into network representations.
The integrated network
The co-expression, PPI, positive GI, and sequence networks for each species were combined to generate an integrated weighted network by summing the log likelihood scores of an interaction from all networks. As the experiments from different genomic data are assumed to be independent, the summation should not create any bias for any edge in the integrated network.
We identified one-to-one mappings of orthologs for each pair of the four species. For S. cerevisiae and S. pombe, we first started from a manually curated list of orthologs for these two species . For cases of many-to-many mappings, all-against-all BLASTP was performed and pairs of genes that are each other's best reciprocal hit were assigned as additional one-to-one orthologs. For the other species, we directly used BLASTP to identify best reciprocal hits as one-to-one orthologs. In the additional robustness analyses, alternative orthology mappings for all species were downloaded from Inparanoid (Ver 7.0) . The one-to-one mappings from Inparanoid was generated by selecting the mappings with the higher bootstrap score.
The Markov Clustering algorithm (MCL)  was used to identify modules from each of the combined networks for the four species. The size distribution of all the modules for the four species is shown in Additional File 1 Figure S3. Modules with less than 3 genes were discarded from further analyses.
In order to evaluate the significance of our results, we used two randomization methods. In edge switching randomization, we generated randomized networks for each species and network type that preserved the degree distribution of the corresponding real networks. The randomized networks for each species were aggregated together into a combined randomized network for that species. We applied the same procedure that was used to analyze the real data on these randomized networks. Specifically, we ran MCL on each of the combined randomized network to get randomized modules for each species. Then, for each randomized network in species A, we compared it with the corresponding real network in species B using the randomized modules in A and the real modules in B, and we checked how many within/between-modules interactions in A (randomized) are conserved directly in B (real), and how many edges in A are not directly conserved but their orthologs lie in the same module in B (extended module conservation). In the second randomization method we used, node label randomization, we permuted the node labels in species A and compared it with the corresponding real network in species B in the same way as described above. For both methods, 1000 independent randomizations were performed and the p-values we report are based on the results obtained for these 1000 networks.
Matching modules across species
Modules between any two species were matched using a modified hypergeometric test, see Supplementary Methods for details. The p-values were Bonferroni corrected by multiplying by the number of modules from both species. If both of the reciprocal corrected conditional probabilities were below a cutoff of 0.05, we defined the modules as matching. (See Figure 4 and Additional File 13).
Matching S. cerevisiae modules with protein complexes
Markov clustering algorithm
Stanford microarray database
Spearman correlation coefficient.
We thank Zoltan Oltvai for critically reviewing the manuscript and for useful suggestions. Work supported in part by NIH grant 1RO1 GM085022 and NSF DBI-0965316 award to Z.B.J.
- Bork P, Dandekar T, Diaz-Lazcoz Y, Eisenhaber F, Huynen M, Yuan Y: Predicting function: from genes to genomes and back. J Mol Biol. 1998, 283: 707-725. 10.1006/jmbi.1998.2144View ArticlePubMedGoogle Scholar
- Liao BY, Zhang J: Low rates of expression profile divergence in highly expressed genes and tissue-specific genes during mammalian evolution. Mol Biol Evol. 2006, 23: 1119-1128. 10.1093/molbev/msj119View ArticlePubMedGoogle Scholar
- Rustici G, Mata J, Kivinen K, Lio P, Penkett CJ, Burns G, Hayles J, Brazma A, Nurse P, Bahler J: Periodic gene expression program of the fission yeast cell cycle. Nat Genet. 2004, 36: 809-817. 10.1038/ng1377View ArticlePubMedGoogle Scholar
- Odom DT, Dowell RD, Jacobsen ES, Gordon W, Danford TW, MacIsaac KD, Rolfe PA, Conboy CM, Gifford DK, Fraenkel E: Tissue-specific transcriptional regulation has diverged significantly between human and mouse. Nat Genet. 2007, 39: 730-732. 10.1038/ng2047PubMed CentralView ArticlePubMedGoogle Scholar
- Borneman AR, Gianoulis TA, Zhang ZD, Yu H, Rozowsky J, Seringhaus MR, Wang LY, Gerstein M, Snyder M: Divergence of transcription factor binding sites across related yeast species. Science. 2007, 317: 815-819. 10.1126/science.1140748View ArticlePubMedGoogle Scholar
- Price MN, Dehal PS, Arkin AP: Orthologous transcription factors in bacteria have different functions and regulate different genes. PLoS Comput Biol. 2007, 3: 1739-1750.View ArticlePubMedGoogle Scholar
- Fox A, Taylor D, Slonim DK: High throughput interaction data reveals degree conservation of hub proteins. Pac Symp Biocomput. 2009, 391-402.Google Scholar
- Suthram S, Sittler T, Ideker T: The Plasmodium protein network diverges from those of other eukaryotes. Nature. 2005, 438: 108-112. 10.1038/nature04135PubMed CentralView ArticlePubMedGoogle Scholar
- Gandhi TKB, Zhong J, Mathivanan S, Karthick L, Chandrika KN, Mohan SS, Sharma S, Pinkert S, Nagaraju S, Periaswamy B, et al.: Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets. Nat Genet. 2006, 38: 285-293. 10.1038/ng1747View ArticlePubMedGoogle Scholar
- Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JJ, Bertin N, Chung S, Vidal M, Gerstein M: Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Res. 2004, 14: 1107-1118. 10.1101/gr.1774904PubMed CentralView ArticlePubMedGoogle Scholar
- Roguev A, Bandyopadhyay S, Zofall M, Zhang K, Fischer T, Collins SR, Qu H, Shales M, Park H, Hayles J, et al.: Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast. Science. 2008, 322: 405-410. 10.1126/science.1162609PubMed CentralView ArticlePubMedGoogle Scholar
- Dixon SJ, Fedyshyn Y, Koh JLY, Prasad TSK, Chahwan C, Chua G, Toufighi K, Baryshnikova A, Hayles J, Hoe K, et al.: Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc Natl Acad Sci USA. 2008, 105: 16653-16658. 10.1073/pnas.0806261105PubMed CentralView ArticlePubMedGoogle Scholar
- Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP, et al.: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006, 440: 637-643. 10.1038/nature04670View ArticlePubMedGoogle Scholar
- Gavin A, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B, et al.: Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006, 440: 631-636. 10.1038/nature04532View ArticlePubMedGoogle Scholar
- Byrne AB, Weirauch MT, Wong V, Koeva M, Dixon SJ, Stuart JM, Roy PJ: A global analysis of genetic interactions in Caenorhabditis elegans. J Biol. 2007, 6: 8- 10.1186/jbiol58PubMed CentralView ArticlePubMedGoogle Scholar
- Jensen LJ, Jensen TS, de Lichtenberg U, Brunak S, Bork P: Co-evolution of transcriptional and post-translational cell-cycle regulation. Nature. 2006, 443: 594-597.PubMedGoogle Scholar
- van Dam TJ, Snel B: Protein complex evolution does not involve extensive network rewiring. PLoS Comput Biol. 2008, 4: e1000132- 10.1371/journal.pcbi.1000132PubMed CentralView ArticlePubMedGoogle Scholar
- Wang Z, Zhang J: In search of the biological significance of modular structures in protein networks. PLoS Comput Biol. 2007, 3: e107- 10.1371/journal.pcbi.0030107PubMed CentralView ArticlePubMedGoogle Scholar
- Beltrao P, Trinidad JC, Fiedler D, Roguev A, Lim WA, Shokat KM, Burlingame AL, Krogan NJ: Evolution of phosphoregulation: comparison of phosphorylation patterns across yeast species. PLoS Biol. 2009, 7: e1000134- 10.1371/journal.pbio.1000134PubMed CentralView ArticlePubMedGoogle Scholar
- Segal E, Friedman N, Kaminski N, Regev A, Koller D: From signatures to models: understanding cancer using microarrays. Nat Genet. 2005, 37 (Suppl): S38-S45.View ArticlePubMedGoogle Scholar
- Bergmann S, Ihmels J, Barkai N: Similarities and differences in genome-wide expression data of six organisms. PLoS Biol. 2004, 2: E9- 10.1371/journal.pbio.0020009PubMed CentralView ArticlePubMedGoogle Scholar
- Stuart JM, Segal E, Koller D, Kim SK: A gene-coexpression network for global discovery of conserved genetic modules. Science. 2003, 302: 249-255. 10.1126/science.1087447View ArticlePubMedGoogle Scholar
- Berg J, Lassig M: Cross-species analysis of biological networks by Bayesian alignment. Proc Natl Acad Sci USA. 2006, 103: 10967-10972. 10.1073/pnas.0602294103PubMed CentralView ArticlePubMedGoogle Scholar
- Lee I, Lehner B, Crombie C, Wong W, Fraser AG, Marcotte EM: A single gene network accurately predicts phenotypic effects of gene perturbation in Caenorhabditis elegans. Nat Genet. 2008, 40: 181-188. 10.1038/ng.2007.70View ArticlePubMedGoogle Scholar
- Lee I, Date SV, Adai AT, Marcotte EM: A probabilistic functional network of yeast genes. Science. 2004, 306: 1555-1558. 10.1126/science.1099511View ArticlePubMedGoogle Scholar
- Hertz-Fowler C, Peacock CS, Wood V, Aslett M, Kerhornou A, Mooney P, Tivey A, Berriman M, Hall N, Rutherford K, et al.: GeneDB: a resource for prokaryotic and eukaryotic organisms. Nucleic Acids Res. 2004, 32: D339-D343. 10.1093/nar/gkh007PubMed CentralView ArticlePubMedGoogle Scholar
- O'Brien KP, Remm M, Sonnhammer ELL: Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res. 2005, 33: D476-D480.PubMed CentralView ArticlePubMedGoogle Scholar
- Sipiczki M: Where does fission yeast sit on the tree of life?. Genome Biol. 2000, 1: S1011-View ArticleGoogle Scholar
- Kuo D, Tan K, Zinman G, Ravasi T, Bar-Joseph Z, Ideker T: Evolutionary divergence in the fungal response to fluconazole revealed by soft clustering. Genome Biol. 2010, 11: R77- 10.1186/gb-2010-11-7-r77PubMed CentralView ArticlePubMedGoogle Scholar
- Smits P, Smeitink JAM, van den Heuvel LP, Huynen MA, Ettema TJG: Reconstructing the evolution of the mitochondrial ribosomal proteome. Nucleic Acids Res. 2007, 35: 4686-4703. 10.1093/nar/gkm441PubMed CentralView ArticlePubMedGoogle Scholar
- Enright AJ, Van Dongen S, Ouzounis CA: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002, 30: 1575-1584. 10.1093/nar/30.7.1575PubMed CentralView ArticlePubMedGoogle Scholar
- Brohee S, van Helden J: Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics. 2006, 7: 488- 10.1186/1471-2105-7-488PubMed CentralView ArticlePubMedGoogle Scholar
- Jiang P, Singh M: SPICi: a fast clustering algorithm for large biological networks. Bioinformatics. 2010, 26: 1105-1111. 10.1093/bioinformatics/btq078PubMed CentralView ArticlePubMedGoogle Scholar
- Langfelder P, Luo R, Oldham MC, Horvath S: Is my network module preserved and reproducible?. PLoS Comput Biol. 2011, 7: e1001057- 10.1371/journal.pcbi.1001057PubMed CentralView ArticlePubMedGoogle Scholar
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne J, Reynolds DB, Yoo J, et al.: Transcriptional regulatory code of a eukaryotic genome. Nature. 2004, 431: 99-104. 10.1038/nature02800PubMed CentralView ArticlePubMedGoogle Scholar
- Liao B, Zhang J: Null mutations in human and mouse orthologs frequently result in different phenotypes. Proc Natl Acad Sci USA. 2008, 105: 6987-6992. 10.1073/pnas.0800387105PubMed CentralView ArticlePubMedGoogle Scholar
- Stark C, Breitkreutz B, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006, 34: D535-D539. 10.1093/nar/gkj109PubMed CentralView ArticlePubMedGoogle Scholar
- Wang JZ, Du Z, Payattakool R, Yu PS, Chen C: A new method to measure the semantic similarity of GO terms. Bioinformatics. 2007, 23: 1274-1281. 10.1093/bioinformatics/btm087View ArticlePubMedGoogle Scholar
- Wood V: Schizosaccharomyces pomb comparative genomics; from sequence to systems. Comparative Genomics. Edited by: Sunnerhagen P, Piskur J. 2006, 15: 233-285. 10.1007/4735_97. Springer Berlin/HeidelbergView ArticleGoogle Scholar