Understanding protein evolutionary rate by integrating gene co-expression with protein interactions
© Pang et al; licensee BioMed Central Ltd. 2010
Received: 26 February 2010
Accepted: 30 December 2010
Published: 30 December 2010
Among the many factors determining protein evolutionary rate, protein-protein interaction degree (PPID) has been intensively investigated in recent years, but its precise effect on protein evolutionary rate is still heavily debated.
We first confirmed that the correlation between protein evolutionary rate and PPID varies considerably across different protein interaction datasets. Specifically, because of the maximal inconsistency between yeast two-hybrid and other datasets, we reasoned that the difference in experimental methods contributes to our inability to clearly define how PPID affects protein evolutionary rate. To address this, we integrated protein interaction and gene co-expression data to derive a co-expressed protein-protein interaction degree (ePPID) measure, which reflects the number of partners with which a protein can permanently interact. Thus, irrespective of the experimental method employed, we found that (1) ePPID is a better predictor of protein evolutionary rate than PPID, (2) ePPID is a more robust predictor of protein evolutionary rate than PPID, and (3) the contribution of ePPID to protein evolutionary rate is statistically independent of expression level. Analysis of hub proteins in the Structural Interaction Network further supported ePPID as a better predictor of protein evolutionary rate than the number of distinct binding interfaces and clarified the slower evolution of co-expressed multi-interface hub proteins over that of other hub proteins.
Our study firmly established ePPID as a robust predictor of protein evolutionary rate, irrespective of experimental method, and underscored the importance of permanent interactions in shaping the evolutionary outcome.
Among the many factors determining protein evolutionary rate [1–5], protein-protein interaction degree (PPID), defined as the number of interaction partners a protein has in a protein interaction network, is an important predictor. A negative correlation between protein evolutionary rate and PPID was first reported in , which is consistent with the "functional density" hypothesis  that protein evolutionary rate is primarily determined by the proportion of residues involved in specific functions. Since then, several differing conclusions have been drawn. The controversies mainly focus on whether the correlation between PPID and protein evolutionary rate (1) is an artefact of biased protein interaction datasets [8–12], (2) is linked to experimental setup that favors counting more interactions for abundant proteins [13–15], or (3) is confounded by other genomic variables [16, 17].
The relationship between protein evolutionary rate and PPID is mostly studied through hub proteins, i.e., proteins with a large number of interaction partners, from many different aspects [18–23]. For example, hub proteins can be classified into date and party hubs , singlish-interface and multi-interface hubs , singlish-iMotif and multi-iMotif hubs . It was found that multi-interface hubs are mostly party hubs and singlish-interface hubs are mostly date hubs . It was also found that party hubs evolve more slowly than date hubs [18, 20] and multi-interface hubs evolve more slowly than singlish-interface hubs , but these findings are also challenged [19, 21]. Furthermore, it was found that multi-iMotif hubs do not evolve more slowly than singlish-iMotif hubs . These lines of evidence suggest a profound lack of consensus about the evolutionary rate differences between different types of hub proteins.
Therefore, in this paper, we first re-investigated the relationship between protein evolutionary rate and protein-protein interaction degree (PPID) and confirmed that the correlation between protein evolutionary rate and PPID varies considerably across different protein interaction datasets. We then integrated protein interaction and gene co-expression data to derive a co-expressed protein-protein interaction degree (ePPID) measure, which reflects the number of partners with which a protein can permanently interact. Our results demonstrated that ePPID is a more robust predictor of protein evolutionary rate than PPID. It was further found that the contribution of ePPID to protein evolutionary rate is statistically independent of expression level. Finally, we established that ePPID could predict protein evolutionary rate better than the number of distinct binding interfaces for hub proteins in the Structural Interaction Network and clarified the slower evolution of co-expressed multi-interface hub proteins over that of other hub proteins.
Controversial correlations between PPID and protein evolutionary rate
Spearman correlation of PPID, ePPID and betweenness with protein evolutionary rate.
Protein interaction datasets
PPID vs. dN
ePPID vs. dN
betweenness vs. dN
The variance of protein evolutionary rate explained by PPID and ePPID when controlling for protein abundance.
Protein interaction datasets
Percent variance explained in dN
PPID control for Log(abundance)(p)
ePPID control for Log(abundance) (p)
Spearman correlation and partial Spearman correlation of PPID and ePPID with protein evolutionary rate.
Protein interaction datasets
PPID vs. dN
PPID vs. dN control for abundance
ePPID vs. dN
ePPID vs. dN control for abundance
Co-expressed protein-protein interaction degree (ePPID) predicts protein evolutionary rate better than PPID
Proteins with higher PPID are assumed to have a greater proportion of residues involved in interactions and thus evolve more slowly than proteins with lower PPID [6, 9]. This may be true for a protein with many permanent interaction partners, because the protein tends to form a permanent complex with its partners through multiple distinct binding interfaces and may have a greater proportion of interface residues . However, a protein with many transient interaction partners may transiently interact with its different partners through the same binding interface (though it is possible that the protein may form a transient complex with its partners through multiple distinct binding interfaces), thus the PPID of the protein may not well reflect the proportion of its interface residues . Furthermore, interface residues of permanent interactions are found to evolve more slowly than those of transient interactions [37, 38]. In other words, permanent interactions are more likely to exert higher selective constrains on protein evolution [18, 20, 22, 37–39] and protein evolutionary rate may be more reflective of the proportion of residues involved in permanent interactions. On the other hand, permanent interactions tend to show significant co-expression [32, 40], so we speculate that the number of a protein's co-expressed interaction partners may well reflect the proportion of its residues involved in permanent interactions and thus better predict its evolutionary rate.
Therefore, we proposed a co-expressed protein-protein interaction degree (ePPID), defined as the maximal number of co-expressed interaction partners of a given protein in all gene expression datasets we used (in fact, other variations of such definition yield similar results, see Additional file 1, Text S4), to estimate the number of partners with which a protein can permanently interact (see Methods). It can be seen from the bottom panels of Figure 1A-C and Figure 1D-F that the ePPID measure has statistically significant negative correlation coefficients with protein evolutionary rate across all protein interaction datasets we studied. As shown in Table 1 (column 4 versus column 3), it is clear that the statistical significance obtained by ePPID is better than that obtained by PPID. Accordingly, across all protein interaction datasets, ePPID explains a higher percentage variance of protein evolutionary rate than PPID (Figure 1 and Additional file 1, Table S1). These results indicate that ePPID is a better predictor of protein evolutionary rate than PPID. In addition, our further analysis indicated that ePPID predicts evolutionary rate better than betweenness [46–48], another network centrality measure (the last column of Table 1).
We then found that Spearman rank correlations between ePPID and protein abundance are all statistically significant (see Additional file 1, Text S2), suggesting that protein abundance might be a confounding factor for the high correlations between ePPID and protein evolutionary rate. To address this question, we studied the percentage variance of protein evolutionary rate explained by ePPID when protein abundance is controlled for (the last column of Table 2). As can be seen, considerable percent variances of evolutionary rate explained by ePPID remain in all the six protein interaction datasets, with the exception of "DIP-CORE". In addition, partial Spearman correlation coefficient and corresponding statistical significance between ePPID and evolutionary rate (by controlling for protein abundance; the last column of Table 3) are reduced as compared to the original correlations (column 5 of Table 3) in all protein interaction datasets. However, the fact that these partial correlations all remain highly significant (the last column of Table 3) also suggests that ePPID makes an independent contribution to protein evolutionary rate. Moreover, with the exception of the "DIP-FULL" dataset, the partial correlations between ePPID and protein evolutionary rate are more significant than those between PPID and protein evolutionary rate after controlling for protein abundance (Table 3, the last column versus column 4), further indicating that ePPID is a better predictor of protein evolutionary rate than PPID. In fact, similar results (Additional file 1, Table S2 and S3) were found using three other protein evolutionary rate data (corresponding to different out-group controls, including S.cer vs S.par, S.cer vs S.mik and S.cer vs S.bay, see  for details). Mechanistically, we believe that permanent interactions impose more selective pressure on protein evolution than transient interactions, and protein evolutionary rate is more reflective of the number of a protein's permanent interaction partners as measured by ePPID.
The effect of transient interactions on predicting protein evolutionary rate
With the co-expression information, our ePPID measure can filter out many transient interactions. Thus, we next wanted to study why removing transient protein interactions improved the correlation. In the "Y2H-union" dataset, we noticed that ePPID explains more than four times the variance of evolutionary rate than does PPID; however, in other datasets, the improvements are generally less than three times (Additional file 1, Table S1). This result suggests that ePPID has filtered out many transient protein interactions in the "Y2H-union" dataset, which may be the reason of lower percent variance of evolutionary rate explained by PPID. On the other hand, improvements are less dramatic in other datasets because transient protein interactions are less enriched. Consistent with this notion, our study on non-co-expressed protein interactions (see Additional file 1, Text S5 for details) suggested that transient interactions are most enriched in the "Y2H-union" dataset (46.1%) while least enriched in the "Combined-AP/MS" dataset (14.4%, column 4 of Additional file 1 Table S4), which is also consistent with the fact that transient protein interactions are less co-expressed than permanent co-complex associations . In addition, the number of transient interaction partners of a protein even appears to be positively correlated with protein evolutionary rate (see Additional file 1, Text S5).
Since the "Y2H-union" dataset is enriched for transient physical interactions, ePPID in this dataset mainly filters out a protein's transient physical interactions and thus reflects the number of the protein's permanent physical interactions. In the "Combined-AP/MS" dataset which is enriched for permanent co-complex associations, ePPID mainly filters out a protein's transient co-complex associations and thus reflects the number of the protein's permanent co-complex associations. In the "Combined-AP/MS" dataset, ePPID may be overestimated due to indirect non-physical interactions (co-complex associations). Despite this effect, permanent interactions do place higher selective constrains on protein evolution than transient interactions do, further illustrating why our ePPID measure could better predict protein evolutionary rate.
However, the variance of protein evolutionary rate explained by ePPID is still the lowest in the "Y2H-union" dataset, which may be explained in three ways. First, ePPID cannot filter out all transient protein interactions, partly because of noise in the gene expression datasets we used. Second, Y2 H datasets may contain co-expressed protein pairs which are localized to different cellular compartments and seldom interact in nature. Third, ePPID may be underestimated based on incompleteness of Y2 H datasets [30, 32], which is also reflected by the lowest average degree in the "Y2H-union" dataset (column 5 of Additional file 1, Table S4).
Global study of ePPID and other genomic variables for protein evolutionary rate
A number of genomic variables, such as expression level [16, 25, 50–52], functional dispensability [25, 53] and pleiotropic effect [54, 55], are proposed to be associated with protein evolutionary rate. Also, these variables may have redundancy since they are correlated with each other. Therefore, we next attempted to determine the possible confounding effect of these variables on the correlations between ePPID and protein evolutionary rate.
Principal component regression analysis on six predictor variables and protein evolutionary rate for 752 yeast proteins in the "Y2H-union" dataset.
Percent variance explained in dN
Principal component regression analysis on six predictor variables and protein evolutionary rate for 723 yeast proteins in the "Combined-AP/MS" dataset.
Percent variance explained in dN
Principal component regression analysis on six predictor variables and protein evolutionary rate for 639 yeast proteins in the "LC-multiple" dataset.
Percent variance explained in dN
Results show that the first principal component explains much more variance of protein evolutionary rate than the other components in all the six datasets. Thus, in the following, we focus on the first principal component to study the percentage contribution of ePPID. In the "Combined-AP/MS", "LC-multiple", "Updated-HC", "DIP-CORE" and "DIP-FULL" datasets, the contribution of ePPID to the first principal component is more than that of all other variables. In the "Y2H-union" dataset, the ePPID contribution is more than betweenness, but less than the other four variables. Consistently, the independent contribution of ePPID to the total variance of protein evolutionary rate dN explained by all the six principal components in most datasets is comparable to that of the expression-related variables of mRNA abundance and protein abundance (Additional file 1, Table S7). Similar results were obtained when using codon adaptation index (CAI)  instead of mRNA abundance or protein abundance to perform analysis (see Additional file 1, Text S6). Furthermore, when using three expression-related variables of mRNA abundance, protein abundance and CAI to perform analysis, ePPID still has a considerable and independent contribution to protein evolutionary rate (see Additional file 1, Text S6). We therefore concluded that ePPID has an important and independent effect on protein evolutionary rate, confirming the importance and novelty of our proposed new measure.
Proteins with more co-expressed partners evolve more slowly than those with less co-expressed partners
ePPID helps the understanding of protein evolutionary rate in the Structural Interaction Network dataset
Protein interactions can also be studied from a structural perspective. We next applied our co-expressed and non-co-expressed protein classification method to hub proteins (with ≥5 protein interaction partners) in the "SIN" dataset  and studied the relationship between ePPID and the number of binding interfaces. As shown in Additional file 1, Table S9, non-co-expressed hubs correspond mostly to singlish-interface hubs, whereas co-expressed hubs correspond mostly to multi-interface hubs (Fisher's exact test, P = 1.63e-3), suggesting that co-expression may be a characteristic of proteins with many distinct interfaces, which enable these proteins to interact together permanently. To test our hypothesis, we studied whether a correlation exists between ePPID and the number of binding interfaces from . As it turned out, the correlation is highly significant (Spearman rank correlation rho = 0.408, P = 4.40e-8). Considering the difficulties in obtaining protein structure data, this result suggests that the ePPID measure is a good predictor of the number of binding interfaces of a protein.
Application of ePPID in human data
DNA mutations, especially those in protein-coding regions, are a driving force of biological novelties. Understanding protein evolutionary rate is thus an important topic. Along with rapid progress in high-throughput methods in recent years, it is possible to study protein evolutionary rate from many perspectives. Protein interactions, which are believed to exert an important selective pressure on protein evolution at the functional level, have been heavily studied in recent years. However, owing to the complexity in experimental setup and the biological system itself, controversial results have led investigators to debate the association between protein-protein interaction degree (PPID) and protein evolutionary rate.
Proteins with higher PPID are assumed to have a greater proportion of residues involved in interactions and thus evolve more slowly than proteins with lower PPID [6, 9]. This assumption was supported by the fact that ePPID, which measures the number of a protein's permanent interaction partners, could better predict protein evolutionary rate. In Y2 H datasets which are enriched for transient physical interactions, ePPID mainly filters out a protein's transient physical interactions and thus reflects the number of the protein's permanent physical interactions. Though the filtered interactions of a protein contribute to the PPID of the protein, they may not contribute to the proportion of the protein's residues involved in interactions (i.e., the protein tends to interact with its different filtered partners through the same binding interface). As demonstrated by our results, transient physical interactions on average indeed exert lower selective constraints on protein evolution. On the other hand, in AP/MS-related datasets, the protein pairs may not physically interact in nature; rather, they appear in the same protein complexes. In such datasets, ePPID mainly filters out a protein's transient co-complex associations and thus reflects the number of the protein's permanent co-complex associations. Though the filtered interactions of a protein may contribute to the proportion of the protein's residues involved in interactions (i.e., the protein may interact with its filtered partners through multiple distinct transient interfaces), they do not contribute to the proportion of the protein's residues involved in permanent interactions. Thus, they should also be filtered out because they may not exert higher selective constraints on protein evolution, which is also demonstrated by our results. However, Y2 H datasets are more likely to contain false negatives (incompleteness of Y2 H datasets) and ePPID in such datasets may be underestimated, whereas AP/MS-related datasets are more likely to contain false positives (indirect non-physical interactions) and ePPID in such datasets may be overestimated. We hope to study this effect when more reliable and complete protein interaction data become available in the future. Despite this slight difference, our results demonstrated a clearer role of protein interaction degree as a constraint on protein evolution.
In this work, we performed extensive studies to identify how protein interactions, as measured by PPID, affect protein evolutionary rate. By carefully comparing experimental setups, we observed that Y2 H assays may have introduced a considerable amount of transient protein interactions. On this basis, we hypothesized that the difference in experimental methods contributes to our inability to clearly define how PPID affects protein evolutionary rate. This hypothesis was confirmed by introducing a new protein interaction degree measure, the co-expressed protein-protein interaction degree (ePPID). Since ePPID is a measure that integrates protein interactions with gene co-expression information, it can filter out many transient protein interactions. As a result, ePPID gives a better prediction of protein evolutionary rate than PPID in the various protein interaction datasets tested. The relationship between ePPID and protein evolutionary rate is also robustly significant in all protein interaction datasets, which was not possible when using PPID in previous studies. We also investigated the redundancy between several variables that may affect protein evolutionary rate against the contribution of ePPID and found that ePPID makes an independent contribution to protein evolutionary rate. This result suggests the novelty of ePPID as an important determinant of protein evolutionary rate. Moreover, the application on hub proteins in the Structural Interaction Network provides further support that ePPID also gives a better prediction of protein evolutionary rate than the number of distinct binding interfaces and clarified the slower evolution of co-expressed multi-interface hub proteins over that of other hub proteins.
In summary, our work provides a new protein interaction degree measure by integrating protein interaction datasets with gene expression datasets. This new measure has, at least in part, resolved the longstanding debates on the role of protein interactions in affecting protein evolutionary rate. Finally, we have found that this result also holds in human. We will study if this can be observed in more species in the future.
Protein interaction datasets
To study the effect of experimental bias, reliability and completeness, we collected nine different yeast protein interaction datasets. We used the "Y2H-union", "Combined-AP/MS" and "LC-multiple" datasets to represent typical protein interaction datasets obtained from Y2 H assays, the AP/MS method and literature curation, respectively. To account for data quality (confidence), we also used the filtered yeast interactome ("FYI"), the Structural Interaction Network ("SIN"), "DIP-CORE" and the updated high-confidence dataset ("Updated-HC") as high-confidence datasets. In addition, the "SIN" dataset was also used to study the evolutionary rate of hub proteins through a mechanistic perspective. In contrast, we used the "DIP-FULL" and "Eight-union" datasets to account for completeness. The nine datasets are listed below.
3) LC-multiple: a protein interaction dataset based on the literature. Each protein interaction must have been curated from ≥2 different publications .
4) FYI: the filtered yeast interactome obtained from .
5) SIN: the Structural Interaction Network dataset obtained from .
6) Updated-HC: the updated high-confidence dataset obtained from .
8) DIP-FULL: the full dataset derived from the database of interacting proteins (DIP) .
9) Eight-union: the union of the above eight datasets.
The network properties of the nine datasets are summarized in Additional file 1, Table S4. The overlap of interactions between the nine datasets was shown in Additional file 1, Table S5A-C. We analyzed the six protein interaction datasets 1), 2), 3), 6), 7) and 8) in the main text and the analysis results of the other three 4), 5) and 9) were provided in Additional file 1, Text S1.
Data sources for correlation and principal component regression analyses
Evolutionary rate data (non-synonymous substitutions per site dN), which is based on the four-way yeast species alignments for 3,036 Saccharomyces cerevisiae genes, were obtained from . Specifically, orthologous genes were aligned by using ClustalW  and dN was then estimated using PAML . Three other protein evolutionary rate data (corresponding to different out-group controls, including S.cer vs S.par, S.cer vs S.mik and S.cer vs S.bay) were obtained from . mRNA abundance data were obtained from . Protein abundance data were obtained from . Codon adaptation index (CAI), which measures synonymous codon usage bias , was obtained from . Gene dispensability, measured by the average growth rates of homozygous deletion strains, was obtained from . The associated number of GO biological process terms of a gene , used as a measure of gene pleiotropy, was obtained from the Saccharomyces Genome Database . Protein betweenness, measured by the total number of shortest paths going through a protein in a protein interaction network [46, 47], was calculated by using R  with the package "igraph" .
Partial correlation analysis is frequently used to determine the confounding effect of variables such as protein abundance on the relationship between PPID and protein evolutionary rate [6, 9, 13–15, 25, 51, 71]. It is also reported that principal component regression analysis can provide a complementary analysis to partial correlation analysis  and that the relative contributions of the transformed predictors to the overall regression model can be evaluated independently and reliably . In this paper, we performed both principal component regression and partial correlation analyses to understand protein evolutionary rate. Principal component regression was performed by using R with the package "pls" . Before carrying out the principal component analysis, all variables were log transformed, except dispensability, and all predictor variables were standardized to zero mean and unit variance. It should be noted that a small constant of 0.001 was added to protein evolutionary rate dN as described in  to avoid zero values. A small constant of 0.1 was added to ePPID and betweenness to avoid zero values and we demonstrated that our results were not sensitive to these constants (see Additional file 1, Text S10). The statistical significance levels were determined according to Drummond et al. . Partial correlation analysis was performed by using R with the function provided by Kim and Yi . The method for computing the percentage variance of protein evolutionary rate explained by ePPID when protein abundance is controlled for was as follows. First, we performed a linear regression of protein evolutionary rate dN against Log(protein abundance) and obtained dN = f(Log(protein abundance)). We then computed the residue, i.e., dN_residue = dN-f(Log(protein abundance)). Finally, we performed a linear regression of the residue against ePPID: dN_residue = g(ePPID), and obtained the explained variance.
Gene expression datasets
We collected ten gene expression datasets [74–83], each with more than 50 samples (conditions), from the Yeast Functional Genomics Database  and the Saccharomyces Genome Database . Genes with missing value in >30% of the samples in a dataset were removed. Remaining missing values were imputed by the KNN impute algorithm with K = 10 using Euclidean distance , and technical replicates (i.e., spot repeats and dye swaps) were averaged.
Construction of gene co-expression networks
which approximately follows a standard normal distribution under the hypothesis of independence, where n is the sample size. We only considered positive correlations since they are reported to be more reflective of functional similarity than negative correlations . Next, P-values were obtained for the null hypothesis of no positive correlations and were corrected for multiple hypothesis testing by using false discovery rate (FDR) control procedure , and the adjusted P-values were set at the threshold of 0.001 per dataset (FDR = 0.001). In addition, we only considered those pairs that are among the top 10 percent of all possible correlations (PER = 10%) to avoid introducing too many high correlations. Our two-stage threshold selection procedure is similar to the procedure that controls both statistical significance and biological significance in [86, 88] and we demonstrated that our results were not sensitive to different thresholds of FDR and PER (see Additional file 1, Text S11). For each gene expression dataset, two genes are declared to be co-expressed if their correlation coefficient is above the thresholds of both FDR and PER.
Definition of co-expressed protein-protein interaction degree (ePPID) and classification of proteins
In a given protein interaction dataset, let PPIg denote the set of interaction partners of a given protein g. By filtering out the potential transient interaction partners in PPIg, we want to integrate gene co-expression in a way that allows us to identify the partners with which protein g can permanently interact. To explain, we can calculate the z-scores of co-expression between protein g and all genes in PPIg for each gene expression dataset. Let ePPIg(i) denote the number of genes that are found to be significantly co-expressed with gene g in gene expression dataset i (i = 1,2,...,10; see previous paragraph). The co-expressed protein-protein interaction degree (ePPID) of protein g is then defined as ePPIDg = max(ePPIg(i); i = 1,2,...,10). In addition, we tried other co-expressed protein-protein interaction degree measures to demonstrate the robustness of our results (see Additional file 1, Text S4).
To study whether evolutionary rate differences between different types of proteins can be better clarified by distinguishing permanent interactions from transient interactions, we divided proteins into low, medium and high PPID bins, with the high-PPID bin containing about 20% of the total number of proteins (also called hubs) in each protein interaction dataset. Similar to the concept behind the date and party hub definition in , we further grouped proteins into two classes. A protein was defined as co-expressed if the ratio of ePPID to PPID (ePPID/PPID) is ≥0.5; otherwise, it was defined as a non-co-expressed protein.
To study the contribution of permanent and transient interfaces to protein evolutionary rate in the "SIN" dataset, the ePPID of hub proteins in the "SIN" dataset was calculated similarly, and the hub proteins were further grouped into four classes: non-co-expressed singlish-interface hubs, non-co-expressed multi-interface hubs, co-expressed singlish-interface hubs and co-expressed multi-interface hubs by taking intersections.
average Pearson correlation coefficient
affinity purifications followed by mass spectrometry
codon adaptation index
non-synonymous substitution rate
co-expressed protein-protein interaction degree
false discovery rate
Pearson correlation coefficient
percent of all possible correlations
protein-protein interaction degree
XM is supported by NIH grant HG001696 to Michael Q Zhang. The authors greatly appreciate the comments from the three anonymous reviewers, which have significantly improved this manuscript. We thank Yangbo He for suggestions on calculating the percent variance when controlling for confounding variable.
- Xia Y, Franzosa EA, Gerstein MB: Integrated assessment of genomic correlates of protein evolutionary rate. PLoS Comput Biol. 2009, 5: e1000413- 10.1371/journal.pcbi.1000413PubMed CentralView ArticlePubMedGoogle Scholar
- Park D, Choi SS: Why proteins evolve at different rates: the functional hypothesis versus the mistranslation-induced protein misfolding hypothesis. FEBS Lett. 2009, 583: 1053-1059. 10.1016/j.febslet.2009.02.033View ArticlePubMedGoogle Scholar
- Rocha EP: The quest for the universals of protein evolution. Trends Genet. 2006, 22: 412-416. 10.1016/j.tig.2006.06.004View ArticlePubMedGoogle Scholar
- Pal C, Papp B, Lercher MJ: An integrated view of protein evolution. Nat Rev Genet. 2006, 7: 337-348. 10.1038/nrg1838View ArticlePubMedGoogle Scholar
- McInerney JO: The causes of protein evolutionary rate variation. Trends Ecol Evol. 2006, 21: 230-232. 10.1016/j.tree.2006.03.008View ArticlePubMedGoogle Scholar
- Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW: Evolutionary rate in the protein interaction network. Science. 2002, 296: 750-752. 10.1126/science.1068696View ArticlePubMedGoogle Scholar
- Zuckerkandl E: Evolutionary processes and evolutionary noise at the molecular level. I. Functional density in proteins. J Mol Evol. 1976, 7: 167-183. 10.1007/BF01731487View ArticlePubMedGoogle Scholar
- Jordan IK, Wolf YI, Koonin EV: No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol Biol. 2003, 3: 1- 10.1186/1471-2148-3-1PubMed CentralView ArticlePubMedGoogle Scholar
- Fraser HB, Wall DP, Hirsh AE: A simple dependence between protein evolution rate and the number of protein-protein interactions. BMC Evol Biol. 2003, 3: 11- 10.1186/1471-2148-3-11PubMed CentralView ArticlePubMedGoogle Scholar
- Hahn MW, Conant GC, Wagner A: Molecular evolution in large genetic networks: does connectivity equal constraint?. J Mol Evol. 2004, 58: 203-211. 10.1007/s00239-003-2544-0View ArticlePubMedGoogle Scholar
- Saeed R, Deane CM: Protein protein interactions, evolutionary rate, abundance and age. BMC Bioinformatics. 2006, 7: 128- 10.1186/1471-2105-7-128PubMed CentralView ArticlePubMedGoogle Scholar
- Batada NN, Hurst LD, Tyers M: Evolutionary and physiological importance of hub proteins. PLoS Comput Biol. 2006, 2: e88- 10.1371/journal.pcbi.0020088PubMed CentralView ArticlePubMedGoogle Scholar
- Bloom JD, Adami C: Apparent dependence of protein evolutionary rate on number of interactions is linked to biases in protein-protein interactions data sets. BMC Evol Biol. 2003, 3: 21- 10.1186/1471-2148-3-21PubMed CentralView ArticlePubMedGoogle Scholar
- Fraser HB, Hirsh AE: Evolutionary rate depends on number of protein-protein interactions independently of gene expression level. BMC Evol Biol. 2004, 4: 13- 10.1186/1471-2148-4-13PubMed CentralView ArticlePubMedGoogle Scholar
- Bloom JD, Adami C: Evolutionary rate depends on number of protein-protein interactions independently of gene expression level: response. BMC Evol Biol. 2004, 4: 14- 10.1186/1471-2148-4-14PubMed CentralView ArticlePubMedGoogle Scholar
- Drummond DA, Raval A, Wilke CO: A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol. 2006, 23: 327-337. 10.1093/molbev/msj038View ArticlePubMedGoogle Scholar
- Plotkin JB, Fraser HB: Assessing the determinants of evolutionary rates in the presence of noise. Mol Biol Evol. 2007, 24: 1113-1121. 10.1093/molbev/msm044View ArticlePubMedGoogle Scholar
- Fraser HB: Modularity and evolutionary constraint on proteins. Nat Genet. 2005, 37: 351-352. 10.1038/ng1530View ArticlePubMedGoogle Scholar
- Batada NN, Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hurst LD, Tyers M: Stratus not altocumulus: a new view of the yeast protein interaction network. PLoS Biol. 2006, 4: e317- 10.1371/journal.pbio.0040317PubMed CentralView ArticlePubMedGoogle Scholar
- Bertin N, Simonis N, Dupuy D, Cusick ME, Han JD, Fraser HB, Roth FP, Vidal M: Confirmation of organized modularity in the yeast interactome. PLoS Biol. 2007, 5: e153- 10.1371/journal.pbio.0050153PubMed CentralView ArticlePubMedGoogle Scholar
- Batada NN, Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hurst LD, Tyers M: Still stratus not altocumulus: further evidence against the date/party hub distinction. PLoS Biol. 2007, 5: e154- 10.1371/journal.pbio.0050154PubMed CentralView ArticlePubMedGoogle Scholar
- Kim PM, Lu LJ, Xia Y, Gerstein MB: Relating three-dimensional structures to protein networks provides evolutionary insights. Science. 2006, 314: 1938-1941. 10.1126/science.1136174View ArticlePubMedGoogle Scholar
- Aragues R, Sali A, Bonet J, Marti-Renom MA, Oliva B: Characterization of protein hubs by inferring interacting motifs from protein interactions. PLoS Comput Biol. 2007, 3: 1761-1771. 10.1371/journal.pcbi.0030178View ArticlePubMedGoogle Scholar
- Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJ, Cusick ME, Roth FP, Vidal M: Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature. 2004, 430: 88-93. 10.1038/nature02555View ArticlePubMedGoogle Scholar
- Wall DP, Hirsh AE, Fraser HB, Kumm J, Giaever G, Eisen MB, Feldman MW: Functional genomic analysis of the rates of protein evolution. Proc Natl Acad Sci USA. 2005, 102: 5483-5488. 10.1073/pnas.0501761102PubMed CentralView ArticlePubMedGoogle Scholar
- von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002, 417: 399-403. 10.1038/nature750View ArticlePubMedGoogle Scholar
- Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M: A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science. 2003, 302: 449-453. 10.1126/science.1087361View ArticlePubMedGoogle Scholar
- Sprinzak E, Sattath S, Margalit H: How reliable are experimental protein-protein interaction data?. J Mol Biol. 2003, 327: 919-923. 10.1016/S0022-2836(03)00239-0View ArticlePubMedGoogle Scholar
- Bader JS, Chaudhuri A, Rothberg JM, Chant J: Gaining confidence in high-throughput protein interaction networks. Nat Biotechnol. 2004, 22: 78-85. 10.1038/nbt924View ArticlePubMedGoogle Scholar
- Hart GT, Ramani AK, Marcotte EM: How complete are current yeast and human protein-interaction networks?. Genome Biol. 2006, 7: 120- 10.1186/gb-2006-7-11-120PubMed CentralView ArticlePubMedGoogle Scholar
- Shoemaker BA, Panchenko AR: Deciphering protein-protein interactions. Part I. Experimental techniques and databases. PLoS Comput Biol. 2007, 3: e42- 10.1371/journal.pcbi.0030042PubMed CentralView ArticlePubMedGoogle Scholar
- Yu H, Braun P, Yildirim MA, Lemmens I, Venkatesan K, Sahalie J, Hirozane-Kishikawa T, Gebreab F, Li N, Simonis N, et al.: High-quality binary protein interaction map of the yeast interactome network. Science. 2008, 322: 104-110. 10.1126/science.1158684PubMed CentralView ArticlePubMedGoogle Scholar
- Ghaemmaghami S, Huh WK, Bower K, Howson RW, Belle A, Dephoure N, O'Shea EK, Weissman JS: Global analysis of protein expression in yeast. Nature. 2003, 425: 737-741. 10.1038/nature02046View ArticlePubMedGoogle Scholar
- Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi JL, Weissman JS: Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature. 2006, 441: 840-846. 10.1038/nature04785View ArticlePubMedGoogle Scholar
- Holstege FC, Jennings EG, Wyrick JJ, Lee TI, Hengartner CJ, Green MR, Golub TR, Lander ES, Young RA: Dissecting the regulatory circuitry of a eukaryotic genome. Cell. 1998, 95: 717-728. 10.1016/S0092-8674(00)81641-4View ArticlePubMedGoogle Scholar
- Coghlan A, Wolfe KH: Relationship of codon bias to mRNA concentration and protein length in Saccharomyces cerevisiae. Yeast. 2000, 16: 1131-1145. 10.1002/1097-0061(20000915)16:12<1131::AID-YEA609>3.0.CO;2-FView ArticlePubMedGoogle Scholar
- Mintseris J, Weng Z: Structure, function, and evolution of transient and obligate protein-protein interactions. Proc Natl Acad Sci USA. 2005, 102: 10930-10935. 10.1073/pnas.0502667102PubMed CentralView ArticlePubMedGoogle Scholar
- Choi YS, Yang JS, Choi Y, Ryu SH, Kim S: Evolutionary conservation in multiple faces of protein interaction. Proteins. 2009, 77: 14-25. 10.1002/prot.22410View ArticlePubMedGoogle Scholar
- Teichmann SA: The constraints protein-protein interactions place on sequence divergence. J Mol Biol. 2002, 324: 399-407. 10.1016/S0022-2836(02)01144-0View ArticlePubMedGoogle Scholar
- Jansen R, Greenbaum D, Gerstein M: Relating whole-genome expression data with protein-protein interactions. Genome Res. 2002, 12: 37-46. 10.1101/gr.205602PubMed CentralView ArticlePubMedGoogle Scholar
- Ekman D, Light S, Bjorklund AK, Elofsson A: What properties characterize the hub proteins of the protein-protein interaction network of Saccharomyces cerevisiae?. Genome Biol. 2006, 7: R45- 10.1186/gb-2006-7-6-r45PubMed CentralView ArticlePubMedGoogle Scholar
- Deane CM, Salwinski L, Xenarios I, Eisenberg D: Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol Cell Proteomics. 2002, 1: 349-356. 10.1074/mcp.M100037-MCP200View ArticlePubMedGoogle Scholar
- Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004, 32: D449-451. 10.1093/nar/gkh086PubMed CentralView ArticlePubMedGoogle Scholar
- Agarwal S, Deane CM, Porter MA, Jones NS: Revisiting date and party hubs: novel approaches to role assignment in protein interaction networks. PLoS Comput Biol. 2010, 6: e1000817- 10.1371/journal.pcbi.1000817PubMed CentralView ArticlePubMedGoogle Scholar
- Brown KR, Jurisica I: Online predicted human interaction database. Bioinformatics. 2005, 21: 2076-2082. 10.1093/bioinformatics/bti273View ArticlePubMedGoogle Scholar
- Freeman LC: A set of measures of centrality based on betweenness. Sociometry. 1977, 40: 35-41. 10.2307/3033543.View ArticleGoogle Scholar
- Girvan M, Newman ME: Community structure in social and biological networks. Proc Natl Acad Sci USA. 2002, 99: 7821-7826. 10.1073/pnas.122653799PubMed CentralView ArticlePubMedGoogle Scholar
- Hahn MW, Kern AD: Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol Biol Evol. 2005, 22: 803-806. 10.1093/molbev/msi072View ArticlePubMedGoogle Scholar
- Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 2003, 423: 241-254. 10.1038/nature01644View ArticlePubMedGoogle Scholar
- Pal C, Papp B, Hurst LD: Highly expressed genes in yeast evolve slowly. Genetics. 2001, 158: 927-931.PubMed CentralPubMedGoogle Scholar
- Rocha EP, Danchin A: An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol Biol Evol. 2004, 21: 108-116. 10.1093/molbev/msh004View ArticlePubMedGoogle Scholar
- Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH: Why highly expressed proteins evolve slowly. Proc Natl Acad Sci USA. 2005, 102: 14338-14343. 10.1073/pnas.0504070102PubMed CentralView ArticlePubMedGoogle Scholar
- Hirsh AE, Fraser HB: Protein dispensability and rate of evolution. Nature. 2001, 411: 1046-1049. 10.1038/35082561View ArticlePubMedGoogle Scholar
- Otto SP: Two steps forward, one step back: the pleiotropic effects of favoured alleles. Proc Biol Sci. 2004, 271: 705-714. 10.1098/rspb.2003.2635PubMed CentralView ArticlePubMedGoogle Scholar
- Salathe M, Ackermann M, Bonhoeffer S: The effect of multifunctionality on the rate of evolution in yeast. Mol Biol Evol. 2006, 23: 721-722. 10.1093/molbev/msj086View ArticlePubMedGoogle Scholar
- Mandel J: Use of the singular value decomposition in regression analysis. Am Stat. 1982, 36: 15-24. 10.2307/2684086.Google Scholar
- Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, et al.: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000, 403: 623-627. 10.1038/35001009View ArticlePubMedGoogle Scholar
- Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA. 2001, 98: 4569-4574. 10.1073/pnas.061034498PubMed CentralView ArticlePubMedGoogle Scholar
- Collins SR, Kemmeren P, Zhao XC, Greenblatt JF, Spencer F, Holstege FC, Weissman JS, Krogan NJ: Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol Cell Proteomics. 2007, 6: 439-450.View ArticlePubMedGoogle Scholar
- Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B, et al.: Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006, 440: 631-636. 10.1038/nature04532View ArticlePubMedGoogle Scholar
- Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP, et al.: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006, 440: 637-643. 10.1038/nature04670View ArticlePubMedGoogle Scholar
- Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hon GC, Myers CL, Parsons A, Friesen H, Oughtred R, Tong A, et al.: Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J Biol. 2006, 5: 11- 10.1186/jbiol36PubMed CentralView ArticlePubMedGoogle Scholar
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673PubMed CentralView ArticlePubMedGoogle Scholar
- Yang Z: Phylogenetic analysis by maximum likelihood. 2002, London: University CollegeGoogle Scholar
- Sharp PM, Li WH: The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987, 15: 1281-1295. 10.1093/nar/15.3.1281PubMed CentralView ArticlePubMedGoogle Scholar
- Deutschbauer AM, Jaramillo DF, Proctor M, Kumm J, Hillenmeyer ME, Davis RW, Nislow C, Giaever G: Mechanisms of haploinsufficiency revealed by genome-wide profiling in yeast. Genetics. 2005, 169: 1915-1925. 10.1534/genetics.104.036871PubMed CentralView ArticlePubMedGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al.: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556PubMed CentralView ArticlePubMedGoogle Scholar
- Cherry JM, Adler C, Ball C, Chervitz SA, Dwight SS, Hester ET, Jia Y, Juvik G, Roe T, Schroeder M, et al.: SGD: Saccharomyces Genome Database. Nucleic Acids Res. 1998, 26: 73-79. 10.1093/nar/26.1.73PubMed CentralView ArticlePubMedGoogle Scholar
- Team RDC: R: A language and environment for statistical computing. 2007, Vienna, Austria: R Foundation for Statistical ComputingGoogle Scholar
- Csardi G, Nepusz T: The igraph software package for complex network research. Int J Complex Syst. 2006, 1695:Google Scholar
- Lemos B, Bettencourt BR, Meiklejohn CD, Hartl DL: Evolution of proteins and gene expression levels are coupled in Drosophila and are independently associated with mRNA abundance, protein length, and number of protein-protein interactions. Mol Biol Evol. 2005, 22: 1345-1354. 10.1093/molbev/msi122View ArticlePubMedGoogle Scholar
- Kim SH, Yi SV: Understanding relationship between sequence and functional evolution in yeast proteins. Genetica. 2007, 131: 151-156. 10.1007/s10709-006-9125-2View ArticlePubMedGoogle Scholar
- Mevik BH, Wehrens R: The pls package: Principal component and partial least squares regression in R. Journal of Statistical Software. 2007, 18:Google Scholar
- Brem RB, Kruglyak L: The landscape of genetic complexity across 5, 700 gene expression traits in yeast. Proc Natl Acad Sci USA. 2005, 102: 1572-1577. 10.1073/pnas.0408709102PubMed CentralView ArticlePubMedGoogle Scholar
- Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO: Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000, 11: 4241-4257.PubMed CentralView ArticlePubMedGoogle Scholar
- Gasch AP, Huang M, Metzner S, Botstein D, Elledge SJ, Brown PO: Genomic expression responses to DNA-damaging agents and the regulatory role of the yeast ATR homolog Mec1p. Mol Biol Cell. 2001, 12: 2987-3003.PubMed CentralView ArticlePubMedGoogle Scholar
- Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, et al.: Functional discovery via a compendium of expression profiles. Cell. 2000, 102: 109-126. 10.1016/S0092-8674(00)00015-5View ArticlePubMedGoogle Scholar
- Huisinga KL, Pugh BF: A TATA binding protein regulatory network that governs transcription complex assembly. Genome Biol. 2007, 8: R46- 10.1186/gb-2007-8-4-r46PubMed CentralView ArticlePubMedGoogle Scholar
- Mnaimneh S, Davierwala AP, Haynes J, Moffat J, Peng WT, Zhang W, Yang X, Pootoolal J, Chua G, Lopez A, et al.: Exploration of essential gene functions via titratable promoter alleles. Cell. 2004, 118: 31-44. 10.1016/j.cell.2004.06.013View ArticlePubMedGoogle Scholar
- O'Rourke SM, Herskowitz I: Unique and redundant roles for HOG MAPK pathway components as revealed by whole-genome expression analysis. Mol Biol Cell. 2004, 15: 532-542.PubMed CentralView ArticlePubMedGoogle Scholar
- Roberts CJ, Nelson B, Marton MJ, Stoughton R, Meyer MR, Bennett HA, He YD, Dai H, Walker WL, Hughes TR, et al.: Signaling and circuitry of multiple MAPK pathways revealed by a matrix of global gene expression profiles. Science. 2000, 287: 873-880. 10.1126/science.287.5454.873View ArticlePubMedGoogle Scholar
- Saldanha AJ, Brauer MJ, Botstein D: Nutritional homeostasis in batch and steady-state culture of yeast. Mol Biol Cell. 2004, 15: 4089-4104. 10.1091/mbc.E04-04-0306PubMed CentralView ArticlePubMedGoogle Scholar
- Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell. 1998, 9: 3273-3297.PubMed CentralView ArticlePubMedGoogle Scholar
- The Yeast Functional Genomics Database. http://yfgdb.princeton.edu/
- Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB: Missing value estimation methods for DNA microarrays. Bioinformatics. 2001, 17: 520-525. 10.1093/bioinformatics/17.6.520View ArticlePubMedGoogle Scholar
- Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P: Coexpression analysis of human genes across many microarray data sets. Genome Res. 2004, 14: 1085-1094. 10.1101/gr.1910904PubMed CentralView ArticlePubMedGoogle Scholar
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Statist Soc Ser B. 1995, 57: 289-300.Google Scholar
- Zhu D, Hero AO, Qin ZS, Swaroop A: High throughput screening of co-expressed gene pairs with controlled false discovery rate (FDR) and minimum acceptable strength (MAS). J Comput Biol. 2005, 12: 1029-1045. 10.1089/cmb.2005.12.1029View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.