Quantitative maps of genetic interactions in yeast - Comparative evaluation and integrative analysis
© Lindén et al; licensee BioMed Central Ltd. 2011
Received: 30 July 2010
Accepted: 24 March 2011
Published: 24 March 2011
High-throughput genetic screening approaches have enabled systematic means to study how interactions among gene mutations contribute to quantitative fitness phenotypes, with the aim of providing insights into the functional wiring diagrams of genetic interaction networks on a global scale. However, it is not well understood how closely these quantitative interaction measurements agree across the screening approaches, which hinders their integrated use toward improving the coverage and quality of the genetic interaction maps in yeast and other organisms.
Using large-scale data matrices from epistatic miniarray profiling (E-MAP), genetic interaction mapping (GIM), and synthetic genetic array (SGA) approaches, we carried out here a systematic comparative evaluation among these quantitative maps of genetic interactions in yeast. The relatively low association between the original interaction measurements or their customized scores could be improved using a matrix-based modelling framework, which enables the use of single- and double-mutant fitness estimates and measurements, respectively, when scoring genetic interactions. Toward an integrative analysis, we show how the detections from the different screening approaches can be combined to suggest novel positive and negative interactions which are complementary to those obtained using any single screening approach alone. The matrix approximation procedure has been made available to support the design and analysis of the future screening studies.
We have shown here that even if the correlation between the currently available quantitative genetic interaction maps in yeast is relatively low, their comparability can be improved by means of our computational matrix approximation procedure, which will enable integrative analysis and detection of a wider spectrum of genetic interactions using data from the complementary screening approaches.
The recent advances in experimental biotechnologies have made it possible to start screening genome-wide datasets of quantitative genetic interactions in model organisms such as yeast [1–3]. High-throughput genetic screening approaches, such as those based on epistatic miniarray profiling (E-MAP) [4–7], genetic interaction mapping (GIM), and synthetic genetic array (SGA) [9–11], have provided systematic means for the global investigation of the quantitative relationship between genotype and phenotype, with potential implications for a wide range of biological phenomena, including, for instance, modularity, essentiality, redundancy, buffering, epistasis, evolution, canalization and development of human disease [1–3, 12–21]. The rapid accumulation of quantitative genetic interaction data is providing us with unique opportunities to decipher how genes function as networks to regulate cellular processes and to maintain mutational robustness. However, the massive datasets also call for principled modelling frameworks and efficient analytic approaches to take full advantage of the in-depth information encoded in the available and emerging quantitative interaction datasets. In particular, efficient bioinformatics procedures enabling integrative analysis of multiple datasets from various screening approaches could increase the quality and coverage of the genetic interaction maps, with the aim of completing the genetic interaction networks in yeast and other organisms.
Comparing the results from the alternative experimental approaches is crucial for validating the observed interactions, estimating the biases related to each approach, and filling the gaps in the currently incomplete datasets. It is therefore likely that comprehensive mapping of the quantitative genetic interaction networks will require integration of a number of datasets from different screening approaches, similar to the recent efforts to complete the physical protein-protein interaction (PPI) networks in yeast and human [23–28]. A major challenge in such integrative analysis is that quantitative interaction data generated with the complementary experimental approaches in different laboratories are not directly comparable, due to differences, for instance, in experimental designs, growth conditions or screening protocols as well as in data pre-processing or scoring options. Even when the same mutant pairs are considered, the technical variation can lead to some disagreement in the detection results and to relatively large inconsistency between the datasets in general [8, 11]. The correction for such discrepancy can be beyond the capacity of the customized data processing techniques used within the individual screening approaches [29, 30]. A common modelling framework, adjusted for the different screening approaches, could improve the comparability of the results and allow for integrative analysis.
Compared to PPI networks, an additional challenge originates from the quantitative nature of the genetic interaction datasets; instead of comparing the overlap in binary terms, such as presence or absence of a physical interaction, here we should take into account the full spectrum of genetic interactions, ranging from extreme cases of negative interactions (i.e., synthetic sick and lethality) to the positive classes of interacting pairs (e.g., masking and suppression subcategories) [2, 3, 17]. We have recently shown that the quantitative data matrices obtained from the individual quantitative screening approaches can capture different portions of this spectrum, as compared to known classes of genetic interactions; for instance, the SGA and GIM datasets captured the negative classes of interactions relatively well, whereas the prediction of the positive interactions proved much more challenging when using the provided double-mutant fitness data alone. Similar observations have also been made when using the highly processed E-MAP data [32, 33]. To improve the predictive power of the individual quantitative datasets, we further developed our computational matrix approximation strategy, and showed that it could transform the original fitness matrices so that these allow for better discrimination of not only the negative but also the positive end of the interaction spectrum from the background variability.
In the present study, toward combining the quantitative detections from multiple large-scale genetic interaction approaches, we investigated the consistency among the currently available quantitative interaction datasets in yeast, as well as the sensitivity and specificity of the genetic interactions detected by using the three screening approaches (SGA, GIM and E-MAP), with respect to their overlap in common mutant pairs and coverage of known interacting pairs, as extracted from a gold-standard reference database of genetic interactions (BioGRID). We first show that the comparability of the detections between the different approaches can be improved using a standardized matrix-based modelling framework within each individual dataset. Using appropriate scoring and aggregation functions, we then demonstrate how the detections from the different screening approaches can be combined more effectively, compared to using the individual datasets alone, suggesting that the matrix approximation-based meta-analytic procedure allows for the full exploitation of the existing data when predicting novel interactions or designing new experiments. To promote its widespread usage in the future screening studies, we have made publicly available an efficient, stand-alone R-implementation of the quantile-based matrix approximation procedure (QMAP), which includes a number of user-adjustable options that can be used to fine-tune the procedure for any given experimental dataset.
Results and Discussion
Scoring of quantitative genetic interactions
We have previously introduced a matrix-based modelling and approximation framework, and showed that it provides a quantitative and efficient means for scoring genetic interactions among thousands of genes, thereby leading to improved detection of both positive and negative pairs of interactions in large-scale quantitative screening experiments [31, 34]. Briefly, the matrix approximation strategy is based on the observation that most gene pairs in the large-scale genetic interaction screens have no significant interaction with each other [2, 3]. This implies that the single-mutant fitness effects, which are needed in the interaction scoring, could be estimated using solely the information encoded in the observed, double-mutant fitness matrix $W$, with entries $w_{ab}$ corresponding to the $m$ query and $n$ array strains, respectively, that is, $a = 1, 2, \ldots, m$ and $b = 1, 2, \ldots, n$. The underlying idea of the matrix approximation is to decompose the original fitness matrix into separate components, $W = x \otimes y$, where the $m$- and $n$-dimensional vectors $x$ and $y$ model the variability across the array and query mutants, respectively [31, 34].
In the symmetric case, the above equation expresses in matrix notation the well-established multiplicative null model, $w_{ab} = w_a w_b$, which states that the expected neutral phenotype of an organism's fitness, under the null hypothesis that it carries two non-interacting mutations ($a$ and $b$), can be estimated by the product of the corresponding single-mutant fitness effects ($w_a$ and $w_b$, respectively). It was shown on symmetric, high-resolution data that the product function is the best null model among a family of alternative models (minimum, additive and log functions), in the sense that it yields a distribution with location close to zero and low dispersion over all of the measured deviations $\varepsilon_{ab} = w_{ab} - w_a w_b$ [35, 36]. In the non-symmetric case, $n \neq m$, even though the single-mutant effects $x$ and $y$ are not necessarily equal, these together can provide individual estimates for $w_a$ and $w_b$, respectively. In the present work, the estimation of $x$ and $y$ was performed using a robust, rank-one matrix approximation method, named quantile-based matrix approximation (QMA).
After performing the approximation of the double-mutant fitness matrix $W$ under the null multiplicative model, the interaction class of a mutant pair $(a,b)$ can be predicted using a specific scoring function $s(x, y)$, such as minimum, maximum, product or scaled epistasis [13, 35, 36], which transforms the original fitness matrix into a score (or residual) matrix $s_{ab} = w_{ab} - s(x_a, y_b)$. It has been shown before that there exist effective alternatives to the traditional product function when further classifying the significant genetic interactions into the positive and negative classes [13, 31]. Accordingly, the score values $s_{ab}$ can be used in place of the traditional deviations $\varepsilon_{ab}$ to test for a genetic interaction between genes $a$ and $b$, where a large absolute score provides evidence for genetic interaction, while scores close to zero indicate non-interacting gene pairs. The positive interactions (or alleviating epistatic effects) should result in positive scores ($s_{ab} > 0$), and the negative interactions (aggravating epistatic effects) in negative scores ($s_{ab} < 0$), with synthetic lethality being the extreme case ($w_{ab} = 0$).
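The residual scoring described above can be illustrated with a minimal Python sketch. It assumes that the single-mutant estimates x and y are already available; the function name `score_matrix`, the `SCORING` table and the toy values are hypothetical and are not part of the QMA R-package. Scaled epistasis is left out because its definition is not given in this text.

```python
from typing import Callable, Dict, List

# Null-model functions s(x, y) named in the text (illustrative subset).
SCORING: Dict[str, Callable[[float, float], float]] = {
    "product": lambda x, y: x * y,
    "minimum": min,
    "maximum": max,
}

def score_matrix(W: List[List[float]], x: List[float], y: List[float],
                 how: str = "product") -> List[List[float]]:
    """Return the residuals s_ab = w_ab - s(x_a, y_b) for every pair (a, b)."""
    s = SCORING[how]
    return [[W[a][b] - s(x[a], y[b]) for b in range(len(y))]
            for a in range(len(x))]

# Toy values only: rows index one set of mutants, columns the other.
W = [[0.72, 0.40],
     [0.90, 0.95]]
x = [0.8, 1.0]
y = [0.9, 0.8]

S_product = score_matrix(W, x, y, "product")
S_minimum = score_matrix(W, x, y, "minimum")
```

In this toy case the pair (0, 1) falls below its multiplicative expectation (a negative score), whereas the pair (1, 1) exceeds it (a positive score); the remaining pairs score close to zero, as expected for non-interacting genes.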
Following the lessons learned from the integrative analysis of high-throughput PPI datasets, we first evaluated separately the data from the individual screening approaches (SGA, GIM and E-MAP), against a gold-standard reference database of known interactions (BioGRID). Such within-approach benchmarking resulted in specific parameter combinations for the data-adjusted QMA estimates and scoring functions for positive and negative genetic interaction classes (Additional File 1). In the following analyses, we utilized these same parameters and scoring functions to assess their robustness, and to demonstrate the relative advantages of the generic matrix approximation strategy, in terms of both improved comparability of the interaction scores as well as integrative detection of genetic interactions, among the screening approaches, in comparison to using the individual datasets alone. Our specific focus here is on the detection of pairs of positive interactions, the accurate scoring of which has been challenging in the past despite the quantitative approaches.
Agreement between the quantitative datasets
Table 1. Pairwise intersections between the three datasets used in the study (array × query mutants)
SGA: 3885 × 1712
SGA - GIM: 3881 × 15
SGA - E-MAP: 543 × 339
GIM: 5918 × 41
GIM - E-MAP: 733 × 17
E-MAP: 743 × 743
Table 2. Coverage of the known genetic interactions in the dataset pairs
Even if the interactions extracted from the three datasets under study were pairwise deleted from BioGRID's genetic interaction categories (Table 2), there may remain some bias in these categories toward the E-MAP approach due to the large number of interactions identified in the three other large-scale E-MAP studies [4, 6, 7]. If these had also been excluded from the comparative analyses, the sizes of the reference positive and negative classes would have become much smaller, hence hindering the comparative evaluations. Due to this potential bias, the interaction detection results for the data pairs other than the SGA - GIM should be interpreted with caution. Moreover, it was not initially expected that the matrix approximation could provide any further improvements in the E-MAP data, since these data have already been heavily pre-processed and custom-scored against an expected fitness, resulting in a symmetric and close to zero-centered data matrix. Therefore, we focus here on illustrating the benefits of QMA-based integrative analysis using the detection of positive interactions in the SGA - GIM data pair as our principal case study; however, the full set of results is provided in Additional files 2 - 7.
Table 3. Pairwise correlations between the three quantitative datasets
Predictive relationship between the datasets
The modelling framework also makes it possible to avoid performing the single-mutant growth experiments in the large-scale genetic interaction screens, without compromising their quantitative scoring accuracy. Moreover, the model-estimated array-vector was in good agreement with the experimentally derived single-mutant fitness measurements available in the SGA data (Spearman's correlation ranged from 0.964 to 0.996, depending on whether we use the fixed QMA settings or those adjusted for positive interactions, respectively). Despite such high rank correlation levels, however, there is a significant difference in the location and scaling between the estimated and measured fitness values, indicating that the estimates encode added information for interaction scoring. The QMA settings used here were originally selected on the basis of the pre-release version of the SGA data, which contained only 1277 of the query mutations of the current SGA dataset (75%), thus indicating the robustness of the QMA settings. In the following section, we further highlight the potential of the model-based strategy in integrative analysis by using the same QMA setup selected specifically for the positive interactions, even if this will likely result in compromised prediction accuracies in the negative interaction classes.
Integrative identification of genetic interactions
After showing that the usage of the matrix approximation-based scoring system in place of the original double-mutant fitness matrix or its custom-scored version can lead to improvements in the comparability between the dataset pairs, we next evaluated whether these observed improvements in the rank correlation or prediction of the extreme pairs could contribute also to improved identification of genetic interactions, when using multiple datasets together, compared to using single datasets alone. To choose an appropriate data integration approach, we first evaluated the predictive performance of four rank aggregation functions (product, minimum, maximum and Borda count, which is effectively the same as the additive function), in terms of how accurately they can detect known pairs of interacting genes. Even if the QMA-based scoring setup was aimed here at the detection of positive interactions, we further tested its prediction capability also for the negative interactions to study its generalization capability beyond the type of interactions it was initially designed for. The prediction performance is illustrated here using the unbiased GIM-SGA data pair, whereas the E-MAP - SGA and E-MAP - GIM pairs are provided in Additional File 6.
Table 4. Detection accuracies using the datasets either alone or combined, for positive and negative genetic interactions in the GIM - SGA, E-MAP - SGA and GIM - E-MAP dataset pairs
Taking together the integrative prediction results in the three dataset pairs, the Borda count and the rank product performed equally well when the aim is to identify the first candidate set of positive interactions with the highest specificity for follow-up studies, whereas the more stringent maximum function provided the best prediction accuracy when larger numbers of positive interactions are being identified. In the detection of negative interactions, the intermediate rank product consistently showed the best results among all the data pairs, making it an appropriate rank aggregation function in case both positive and negative interactions are being detected using the same setup. In addition to showing the benefits of the integrative detection, these results can also be used for comparative evaluation of the detection power among the individual datasets from the different screening approaches. For instance, on the basis of the same reference set of known interactions on a common set of shared mutant pairs in the SGA and GIM datasets, the GIM approach seems to detect a larger number of negative interactions particularly well (Table 4), whereas the nearly genome-wide SGA dataset provides comparable detection power in the positive end of the genetic interaction spectrum (Figure 5).
Although the integrative detection based on combined scores was shown to provide marked improvements in the detection of both positive and negative interaction classes when using the SGA and GIM datasets together, it was interesting to note that in the SGA - E-MAP dataset pair, the E-MAP data alone provided extremely good detection accuracies in the positive class of interactions (Table 4). Rather than being a result of the superiority of this particular dataset, this is more likely attributable to the fact that many of the pairs (23%) of positive interactions in the BioGRID originate from the other large-scale genetic interaction screens performed with the E-MAP approach [4, 6, 7] (Table 2). These pairs clearly dominate the joint distribution of the positive interactions, while being supported by the SGA approach to a varying degree (Additional File 4). Interestingly, the detection of the negative interactions by the E-MAP approach alone was found sub-optimal (Table 4). Moreover, the additional benefits gained by the integrative analysis were more pronounced in the GIM - E-MAP than in the SGA - E-MAP data pair (Table 4). These results demonstrate that the intrinsic differences between the screening approaches influence how much they can complement each other.
To our knowledge, the present study is the first systematic and objective comparative evaluation of data from the main large-scale quantitative genetic interaction screening approaches (SGA, GIM and E-MAP). We showed here that even if the association between the original fitness measurements or their interaction scores is relatively low, their comparability can be improved by means of our matrix approximation technique. Toward an integrative analysis, we showed that a multi-approach analysis of quantitative genetic interactions can provide novel findings which are complementary to those obtained using any single screening approach alone. An integrative analysis can therefore provide a systematic means to pool information from previous interaction studies, with the aim of maximizing the number of both positive and negative interactions without compromising the reliability of the detections, as well as of minimizing the number of additional experiments needed when prioritizing future screens. In general, such a computational approach can facilitate the experimental efforts by improving the quality and coverage of the current genetic interaction networks, towards completing the still incomplete information on genetic interactions in yeast, which is, by and large, complementary to that obtained from the physical protein interactions and complexes [1, 5, 11, 17, 39, 40].
Although these results already demonstrate the potential of integrating datasets across different screening approaches using the matrix approximation strategy, more comprehensive studies are warranted in the future that combine experimental data from various types of genetic interaction studies, such as those performed under different environmental conditions, using fitness phenotypes other than growth, or on multiple perturbations or study organisms to investigate questions related, for instance, to plasticity and evolution of genetic networks or higher-order and interspecies interactions [2, 3, 17, 41–47]. Although we illustrated here the feasibility of the integrative analysis through QMA with its previously fixed parameters and scoring functions selected for each screening approach individually, even better prediction accuracies will likely be obtained after a systematic optimization of these options for each dataset combination, downstream analysis objective, and interaction strength level separately (Additional File 7). The efficient QMA R-package, which includes a number of user-adjustable parameters (Additional File 8), was made available here to enable such tailored matrix approximation that meets the needs of a given study.
A potential limitation of the current evaluation setup is the definition of the reference set of interactions using the BioGRID database. For instance, since the interactions in the BioGRID database originate from multiple genetic interaction screening studies, there can be cases where a mutant pair AB is reported as encoding an interaction, even if BA is not, or where the reciprocal pairs AB and BA are marked as belonging to different classes of interactions. To make sure that such cases do not interfere with the comparative evaluations, we filtered out any ambiguous interaction pairs, and for the remaining interactions, we used the same interaction class for the reciprocal mutant pairs. Moreover, to provide as fair an assessment as possible, we excluded those interactions identified from the datasets under comparison. Therefore, the detection accuracies presented here should be considered as lower bounds for the true accuracy of the screening approaches or their combination. Even if there may still remain some biases, especially toward the well-represented E-MAP approach, the BioGRID database also includes a wide range of other large-scale studies, thus providing a comprehensive reference set for the evaluations. To improve the future benchmarking studies, it would be beneficial to add a specific category for known non-interacting mutant pairs, similar to that available for physically non-interacting protein pairs.
Analogous to efforts for completing the mapping of the physical PPI networks [23–28], it would be important to provide the community with easy access also to the raw interaction datasets, similar to that provided in the SGA database DRYGIN. For instance, our matrix approximation procedure was much more efficient with the original double-mutant fitness measurements, as provided by the SGA and GIM laboratories, compared to the highly processed and scored E-MAP datasets. The results with the E-MAP being one of the datasets were in many cases drastically different from those with the SGA - GIM dataset pair. As with any high-throughput assay, the large-scale genetic screening approaches are inherently noisy and biased in their nature, suggesting that each single assay can reveal only a limited scope of the full spectrum of genetic interaction classes. Therefore, it is likely that integrative analysis of data from the complementary screening approaches will be essential to complete the quantitative genetic interaction networks in yeast and other organisms. We invite those participating in the genetic interaction mapping effort to try out the matrix approximation-based procedure and to give us input and suggestions for its further improvements.
The methodological aim of the present study was to enable an integrated analysis of multiple genetic interaction datasets using a common scoring framework, adjusted for the high-throughput quantitative screening approaches. The next sections describe the genetic interaction datasets used to demonstrate the benefits of such an integrative approach, as well as the methods used to model, standardize, compare and merge these datasets, while maintaining their biological consistency and quantitative nature.
Genetic interaction matrices
Three large-scale quantitative data sets on yeast were used in the present work for the systematic and comparative evaluations. To investigate the potential limitations in the between-approach agreement and relative benefits gained by an integrative analysis among the currently available high-throughput quantitative genetic interaction maps, we chose representative example datasets across the spectrum of high-throughput interaction screening approaches currently used for Saccharomyces cerevisiae.
The first dataset was available from the epistatic miniarray profiling (E-MAP) study of quantitative genetic interactions between genes involved in yeast chromosome biology. The original fitness measurements among 754 alleles of 743 genes were highly filtered and processed, providing a symmetric data matrix with close to zero-centered quantitative distribution for the pairwise interaction scores [29, 49]. The raw, unprocessed double-mutant fitness measurements were not available from this study.
Representing another screening approach, the genetic interaction mapping (GIM) combines ideas from the synthetic lethality analysis by microarray (SLAM) [50, 51] and from synthetic genetic array (SGA) approaches [9, 10]. The data matrix available from its pilot study contains double-mutant fitness measurements among 5918 array and 73 query genes. The filtered fitness effects were transformed back to non-log-scale to produce a quantitative distribution with mean and median close to unity.
The third and the largest of the datasets is available from the recent SGA screening study. This data set contains double-mutant fitness measurements among 3885 array and 1712 query genes. The filtered and normalized double-mutant fitness data matrix, with median close to unity, was used in the matrix approximation procedure. The same dataset also includes a customized SGA scoring of the gene pairs [30, 52], which was used here as a baseline value for our QMA-based scoring procedure.
The quantile-based matrix approximation (QMA) is an efficient rank-one matrix approximation method, which is conceptually similar to Tukey's median polish procedure, except that QMA uses a multiplicative model instead of an additive model and quantiles instead of medians. More specifically, the estimation of the single-mutant fitness effects is based on the subsequent calculation of the p- and q-quantile points for the rows and columns of the double-mutant fitness matrix W, respectively, and then arranging these quantiles in the estimated array and query vectors x and y.
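To make the idea of a multiplicative, quantile-based polish more concrete, the following Python sketch alternates between row-wise p-quantiles and column-wise q-quantiles. It is only an illustration under stated assumptions and not the published QMA algorithm: the function names `quantile` and `qma_sketch`, the alternating update scheme, the number of iterations and the default p = q = 0.5 are all hypothetical choices, and strictly positive fitness values are assumed so that the divisions are defined.

```python
def quantile(values, p):
    """Linear-interpolation quantile of a non-empty list, with 0 <= p <= 1."""
    v = sorted(values)
    pos = p * (len(v) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(v) - 1)
    return v[lo] + (pos - lo) * (v[hi] - v[lo])

def qma_sketch(W, p=0.5, q=0.5, n_iter=10):
    """Rank-one multiplicative approximation W ~ x (outer) y (illustrative).

    Assumes strictly positive entries in W; zeros would need special handling.
    """
    m, n = len(W), len(W[0])
    x = [1.0] * m
    y = [1.0] * n
    for _ in range(n_iter):
        # Row effects: p-quantile of each row after dividing out column effects.
        x = [quantile([W[a][b] / y[b] for b in range(n)], p) for a in range(m)]
        # Column effects: q-quantile of each column after dividing out row effects.
        y = [quantile([W[a][b] / x[a] for a in range(m)], q) for b in range(n)]
    return x, y

# Toy matrix: an exact rank-one background plus a single aggravating pair.
x_true = [0.5, 1.0, 0.8]
y_true = [1.0, 0.9, 0.6, 1.0, 0.8]
W = [[xa * yb for yb in y_true] for xa in x_true]
W[0][2] = 0.05  # expected value under the multiplicative null is 0.30

x, y = qma_sketch(W)
residual = W[0][2] - x[0] * y[2]
```

Because only the product of x and y is constrained by W, the two vectors are determined up to a common scaling factor in this sketch; the toy example therefore checks the fitted products rather than x and y themselves. The use of quantiles makes the fit insensitive to the single outlying entry, which then stands out as a large negative residual.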
Scoring of interactions
The presence and sign of an epistatic interaction between a gene pair $(a,b)$ was scored using the residual $s_{ab} = w_{ab} - s(x_a, y_b)$. To avoid potential bias among the different genes in the datasets, duplicate rows and columns in the double-mutant fitness matrices were combined by calculating the mean over the duplicates. The final dimensions of the data matrices are shown in Table 1. Before the data integration, each of the double-mutant fitness matrices was scored separately using the default QMA settings and scoring functions (Additional File 1), as described before.
Ranking of interactions
Each gene pair $(a,b)$ was ranked according to its interaction score $s_{ab}$ obtained in each individual dataset using the fixed QMA settings and scoring functions for positive interactions (Additional File 1). A rank-based data aggregation was used for robust integration of the scores from two screening approaches. More precisely, four rank aggregation functions (minimum of the ranks, maximum of the ranks, product of the ranks, and Borda count, which is effectively the sum of the ranks) were evaluated in terms of their accuracy, compared to using the rankings from a single dataset alone.
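The four rank aggregation functions can be sketched in a few lines of Python. The helper names `ranks` and `aggregate`, the toy scores, and the convention that rank 1 denotes the largest score (the strongest positive-interaction candidate) are assumptions made for illustration; ties are broken by input order here, which may differ from the tie handling used in the study.

```python
def ranks(scores):
    """Rank 1 = largest score; ties are broken by input order (assumption)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    r = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

AGGREGATORS = {
    "minimum": min,                      # minimum of the two ranks
    "maximum": max,                      # maximum of the two ranks
    "product": lambda r1, r2: r1 * r2,   # rank product
    "borda": lambda r1, r2: r1 + r2,     # Borda count as the sum of ranks
}

def aggregate(scores_a, scores_b, how):
    """Combine two score vectors over the same mutant pairs into one value
    per pair; smaller aggregated values indicate stronger candidates."""
    f = AGGREGATORS[how]
    return [f(i, j) for i, j in zip(ranks(scores_a), ranks(scores_b))]

# Toy interaction scores for four shared mutant pairs in two datasets.
scores_sga = [0.30, -0.10, 0.05, 0.20]
scores_gim = [0.10, 0.25, -0.20, 0.15]

combined = {how: aggregate(scores_sga, scores_gim, how) for how in AGGREGATORS}
```

The maximum function is the most stringent of the four, since a pair only receives a good aggregated rank if it ranks highly in both datasets, whereas the minimum function rewards a pair that ranks highly in either one.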
Evaluation setup and measures
The three pairwise dataset intersections were evaluated separately in terms of their number of common array and query mutants (Table 1), the coverage of the known pairs of genetic interactions (Table 2), as well as their association in fitness values and interaction scores across the shared mutant pairs (Table 3). The shared intersection among all the three datasets was only 498 × 7 in size, including 178 known negative and only 31 known positive interactions from the BioGRID database. Therefore, this triple intersection could not be reliably evaluated here.
BioGRID interaction matrix
We used the interactions available in the gold-standard BioGRID database (version 3.0.64 for S. cerevisiae). We constructed a BioGRID interaction matrix by treating the gene pairs extracted from the database as unordered, meaning that if an interaction exists for a mutant pair AB, we also copied the same interaction for the mutant pair BA for biological consistency. A similar symmetric strategy has also been used in previous studies [4–7, 11, 31]. For each pairwise intersection between datasets, separate positive and negative interaction matrices were created for evaluation purposes.
BioGRID interaction classes
The positive interaction matrix was constructed using the 'Phenotypic Suppression' and 'Positive Genetic' categories from the BioGRID database, and the negative interaction matrix was generated by combining the 'Synthetic Lethal', 'Synthetic Growth Defect', 'Phenotypic Enhancement' and 'Negative Genetic' categories. Such interaction matrices are ternary matrices with entries representing either an interacting, non-interacting or ambiguous case, where the pair belongs to both interaction classes. Since the ambiguous cases can lead to biases in the evaluation results, they were excluded from the evaluations.
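A minimal Python sketch of this labelling step is given below. The category names are those listed above, whereas the function name `classify_pairs`, the record format (gene, gene, category) and the gene identifiers are hypothetical. Gene pairs are treated as unordered, so AB and BA share one label, and a pair reported in both classes is flagged as ambiguous so that it can be excluded from the evaluation.

```python
POSITIVE = {"Phenotypic Suppression", "Positive Genetic"}
NEGATIVE = {"Synthetic Lethal", "Synthetic Growth Defect",
            "Phenotypic Enhancement", "Negative Genetic"}

def classify_pairs(records):
    """Map each unordered gene pair to 'positive', 'negative' or 'ambiguous'.

    records: iterable of (gene_a, gene_b, category) tuples (assumed format).
    Pairs absent from the result are treated as non-interacting.
    """
    seen = {}
    for gene_a, gene_b, category in records:
        if category in POSITIVE:
            cls = "positive"
        elif category in NEGATIVE:
            cls = "negative"
        else:
            continue  # categories outside the two classes are ignored
        seen.setdefault(frozenset((gene_a, gene_b)), set()).add(cls)
    return {pair: (next(iter(classes)) if len(classes) == 1 else "ambiguous")
            for pair, classes in seen.items()}

# Toy records with hypothetical gene identifiers.
records = [
    ("A", "B", "Positive Genetic"),
    ("B", "A", "Phenotypic Suppression"),  # same class, reciprocal order
    ("A", "C", "Synthetic Lethal"),
    ("C", "D", "Negative Genetic"),
    ("D", "C", "Positive Genetic"),        # conflicting classes: ambiguous
    ("B", "D", "Other category"),          # not used in either class
]
labels = classify_pairs(records)
```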
Agreement between the datasets
The congruence between the dataset pairs was evaluated by calculating the Pearson and Spearman correlations across those mutant pairs shared by both datasets. The agreement of the datasets in terms of their extreme fitness values or interaction scores was evaluated by constructing interaction matrices using one of the datasets to define positive and negative genetic interactions. We used the extreme 3% of the mutant pairs, in line with the interaction rate estimated from unbiased screens (3.15%) and the rate of BioGRID interactions among the three dataset intersections (2.99%; Table 2). Other cut-off levels (1% and 5%) were also considered (Additional File 7).
Receiver operating characteristics
The receiver operating characteristic (ROC) curves were used to assess the discovery rate of genetic interactions. A single ROC curve summarizes the trade-off between true positive rate (TPR) and false positive rate (FPR) on a ranked list of mutant pairs. The true and false interactions were defined here using the interaction matrices (from the BioGRID or using 3% extreme values). The overall prediction performance was summarized using the area under the ROC curve (AUC). For an ideal classifier, TPR = 1, FPR = 0 and AUC = 1, whereas a random classifier has on average AUC of 0.5.
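The construction of a ROC curve from a ranked list, and its summary as the AUC, can be sketched as follows. The function names `roc_curve` and `auc` and the toy data are hypothetical; the sketch assumes that higher scores indicate stronger evidence for an interaction and that the labels contain at least one known interaction (1) and one non-interacting pair (0). Tied scores are handled by emitting a single curve point per tied group, and the area is computed with the trapezoidal rule.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) lists for mutant pairs ranked by decreasing score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for k, i in enumerate(order):
        if labels[i]:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last pair of a group of tied scores.
        if k + 1 == len(order) or scores[order[k + 1]] != scores[i]:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))

# Toy ranking: three known interactions among six mutant pairs.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_curve(scores, labels)
area = auc(fpr, tpr)
```

In this toy ranking, eight of the nine (interacting, non-interacting) pair combinations are ordered correctly, which corresponds to an AUC of 8/9.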
Partial AUC and early sensitivity
In many practical applications, only the first few candidate mutant pairs can be followed up in further validation studies. Therefore, it is also important to evaluate the performance of a mutant pair ranking at low FPR levels, that is, for those pairs with the highest specificity. We used here the partial area under the ROC curve (pAUC), in which the range of FPR is limited to a predefined interval between zero and r (here r = 0.1), and the resulting area is then normalized by dividing it by r. To investigate the early sensitivity of the detections, we also calculated the TPR at an FPR of 0.01.
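The two early-detection summaries can be sketched from a list of (FPR, TPR) points sorted by increasing FPR. The sketch assumes linear interpolation between consecutive ROC points when clipping at FPR = r, which is one of several reasonable conventions; the function names and the example curve are hypothetical.

```python
def partial_auc(points, r=0.1):
    """Area under the ROC curve for FPR in [0, r], divided by r."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= r:
            break
        if x1 > r:
            # Clip the segment at FPR = r by linear interpolation.
            y1 = y0 + (y1 - y0) * (r - x0) / (x1 - x0)
            x1 = r
        area += (x1 - x0) * (y0 + y1) / 2
    return area / r


def tpr_at_fpr(points, fpr=0.01):
    """Highest TPR reached without exceeding the given FPR."""
    return max(y for x, y in points if x <= fpr)


# Made-up ROC curve given as (FPR, TPR) points.
curve = [(0.0, 0.0), (0.0, 0.4), (0.05, 0.6), (0.2, 0.9), (1.0, 1.0)]
print(partial_auc(curve, r=0.1))   # about 0.575
print(tpr_at_fpr(curve, fpr=0.01)) # 0.4
```

Because of the division by r, an ideal classifier still obtains a pAUC of 1, which keeps the pAUC on the same scale as the full AUC.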
Enrichment of genetic interactions
Here, K is the total number of gene pairs in the grid, M is the total number of (positive or negative) interactions (M ≤ K), m is the number of interactions found (m ≤ M), and t is the number of gene pairs in the particular grid cell (m ≤ t ≤ K). The p-values in the figures were limited to the range between 10^-100 and 0.99.
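The symbols defined above match the parameters of an upper-tail hypergeometric test, that is, the probability of observing at least m interactions among t gene pairs drawn from K pairs of which M are interactions. The sketch below computes that probability under this assumption; the enrichment formula itself is not reproduced in this section, so this should be read as an illustration and not as the exact statistic used. The function name is hypothetical.

```python
from math import comb


def enrichment_p_value(K, M, t, m, lower=1e-100, upper=0.99):
    """Upper-tail hypergeometric probability P(X >= m), clipped to [lower, upper].

    K: total number of gene pairs in the grid
    M: total number of interactions among them
    t: number of gene pairs in the grid cell
    m: number of interactions found in the cell
    """
    # Exact integer arithmetic avoids overflow for large binomial coefficients.
    numerator = sum(comb(M, i) * comb(K - M, t - i)
                    for i in range(m, min(M, t) + 1))
    p = numerator / comb(K, t)
    return min(max(p, lower), upper)


# Small worked case: (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120.
print(enrichment_p_value(K=10, M=4, t=3, m=2))  # 1/3
```

The clipping mirrors the limits stated for the figures, so a p-value that underflows to zero is reported as 10^-100 and a p-value of one as 0.99.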
To promote its widespread use in future screening studies, we have made publicly available an efficient, stand-alone R implementation of the quantile-based matrix approximation procedure (QMAP). This implementation includes a number of options that can be adjusted through a graphical user interface to fine-tune the procedure for a given experimental dataset and downstream analysis objective. Along with the open-source R code, the implementation contains documentation of the format of the input data, the parameters of the various options, as well as the output data of the QMAP (Additional File 8).
List of abbreviations
AUC: area under the curve
E-MAP: epistatic miniarray profiling
FPR: false positive rate
GIM: genetic interaction mapping
pAUC: partial area under the curve
QMAP: quantile-based matrix approximation
SLAM: synthetic lethality analysis by microarray
ROC: receiver operating characteristic
SGA: synthetic genetic array
TPR: true positive rate.
The authors thank Prof. Charlie Boone and Dr. Cosmin Saveanu for providing us with the quantitative SGA and GIM datasets, respectively. The work was supported by the Academy of Finland (grants 120 569, 133 227 and 140 880).
- Boone C, Bussey H, Andrews BJ: Exploring genetic interactions and networks with yeast. Nat Rev Genet. 2007, 8: 437-449. 10.1038/nrg2085
- Dixon SJ, Costanzo M, Baryshnikova A, Andrews B, Boone C: Systematic mapping of genetic interaction networks. Annu Rev Genet. 2009, 43: 601-625. 10.1146/annurev.genet.39.073003.114751
- Beltrao P, Cagney G, Krogan NJ: Quantitative genetic interactions reveal biological modularity. Cell. 2010, 141: 739-745. 10.1016/j.cell.2010.05.019
- Schuldiner M, Collins SR, Thompson NJ, Denic V, Bhamidipati A, Punna T, Ihmels J, Andrews B, Boone C, Greenblatt JF, Weissman JS, Krogan NJ: Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell. 2005, 123: 507-519. 10.1016/j.cell.2005.08.031
- Collins SR, Miller KM, Maas NL, Roguev A, Fillingham J, Chu CS, Schuldiner M, Gebbia M, Recht J, Shales M, Ding H, Xu H, Han J, Ingvarsdottir K, Cheng B, Andrews B, Boone C, Berger SL, Hieter P, Zhang Z, Brown GW, Ingles CJ, Emili A, Allis CD, Toczyski DP, Weissman JS, Greenblatt JF, Krogan NJ: Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map. Nature. 2007, 446: 806-810. 10.1038/nature05649
- Wilmes GM, Bergkessel M, Bandyopadhyay S, Shales M, Braberg H, Cagney G, Collins SR, Whitworth GB, Kress TL, Weissman JS, Ideker T, Guthrie C, Krogan NJ: A genetic interaction map of RNA-processing factors reveals links between Sem1/Dss1-containing complexes and mRNA export and splicing. Mol Cell. 2008, 32: 735-746. 10.1016/j.molcel.2008.11.012
- Fiedler D, Braberg H, Mehta M, Chechik G, Cagney G, Mukherjee P, Silva AC, Shales M, Collins SR, van Wageningen S, Kemmeren P, Holstege FC, Weissman JS, Keogh MC, Koller D, Shokat KM, Krogan NJ: Functional organization of the S. cerevisiae phosphorylation network. Cell. 2009, 136: 952-963. 10.1016/j.cell.2008.12.039
- Decourty L, Saveanu C, Zemam K, Hantraye F, Frachon E, Rousselle JC, Fromont-Racine M, Jacquier A: Linking functionally related genes by sensitive and quantitative characterization of genetic interaction profiles. Proc Natl Acad Sci USA. 2008, 105: 5821-5826. 10.1073/pnas.0710533105
- Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Pagé N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, Andrews B, Tyers M, Boone C: Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science. 2001, 294: 2364-2368. 10.1126/science.1065810
- Tong AH, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M, Chen Y, Cheng X, Chua G, Friesen H, Goldberg DS, Haynes J, Humphries C, He G, Hussein S, Ke L, Krogan N, Li Z, Levinson JN, Lu H, Ménard P, Munyana C, Parsons AB, Ryan O, Tonikian R, Roberts T, et al.: Global mapping of the yeast genetic interaction network. Science. 2004, 303: 808-813. 10.1126/science.1091317
- Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, Prinz J, St Onge RP, VanderSluis B, Makhnevych T, Vizeacoumar FJ, Alizadeh S, Bahr S, Brost RL, Chen Y, Cokol M, Deshpande R, Li Z, Lin ZY, Liang W, Marback M, Paw J, San Luis BJ, Shuteriqi E, Tong AH, van Dyk N, et al.: The genetic landscape of a cell. Science. 2010, 327: 425-431. 10.1126/science.1180823
- Hartman JL, Garvik B, Hartwell L: Principles for the buffering of genetic variation. Science. 2001, 291: 1001-1004. 10.1126/science.291.5506.1001
- Segrè D, Deluna A, Church GM, Kishony R: Modular epistasis in yeast metabolism. Nat Genet. 2005, 37: 77-83.
- Davierwala AP, Haynes J, Li Z, Brost RL, Robinson MD, Yu L, Mnaimneh S, Ding H, Zhu H, Chen Y, Cheng X, Brown GW, Boone C, Andrews BJ, Hughes TR: The synthetic genetic interaction spectrum of essential genes. Nat Genet. 2005, 37: 1147-1152. 10.1038/ng1640
- Ooi SL, Pan X, Peyser BD, Ye P, Meluh PB, Yuan DS, Irizarry RA, Bader JS, Spencer FA, Boeke JD: Global synthetic-lethality analysis and yeast functional profiling. Trends Genet. 2006, 22: 56-63. 10.1016/j.tig.2005.11.003
- Jasnos L, Korona R: Epistatic buffering of fitness loss in yeast double deletion strains. Nat Genet. 2007, 39: 550-554. 10.1038/ng1986
- Beyer A, Bandyopadhyay S, Ideker T: Integrating physical and genetic maps: from genomes to interaction networks. Nat Rev Genet. 2007, 8: 699-710. 10.1038/nrg2144
- Ulitsky I, Shamir R: Pathway redundancy and protein essentiality revealed in the Saccharomyces cerevisiae interaction networks. Mol Syst Biol. 2007, 3: 104. 10.1038/msb4100144
- Lehner B: Modelling genotype-phenotype relationships and human disease with genetic interaction networks. J Exp Biol. 2007, 210: 1559-1566. 10.1242/jeb.002311
- Phillips PC: Epistasis - the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 2008, 9: 855-867. 10.1038/nrg2452
- Gao H, Granka JM, Feldman MW: On the classification of epistatic interactions. Genetics. 2010, 184: 827-837. 10.1534/genetics.109.111120
- Breker M, Schuldiner M: Explorations in topology-delving underneath the surface of genetic interaction maps. Mol Biosyst. 2009, 5: 1473-1481. 10.1039/b907076c
- Hart GT, Ramani AK, Marcotte EM: How complete are current yeast and human protein-interaction networks?. Genome Biol. 2006, 7: 120. 10.1186/gb-2006-7-11-120
- Goll J, Uetz P: The elusive yeast interactome. Genome Biol. 2006, 7: 223.
- Gentleman R, Huber W: Making the most of high-throughput protein-interaction data. Genome Biol. 2007, 8: 112. 10.1186/gb-2007-8-10-112
- Futschik ME, Chaurasia G, Herzel H: Comparison of human protein-protein interaction maps. Bioinformatics. 2007, 23: 605-611. 10.1093/bioinformatics/btl683
- Venkatesan K, Rual JF, Vazquez A, Stelzl U, Lemmens I, Hirozane-Kishikawa T, Hao T, Zenkner M, Xin X, Goh KI, Yildirim MA, Simonis N, Heinzmann K, Gebreab F, Sahalie JM, Cevik S, Simon C, de Smet AS, Dann E, Smolyar A, Vinayagam A, Yu H, Szeto D, Borick H, Dricot A, Klitgord N, Murray RR, Lin C, Lalowski M, Timm J, et al.: An empirical framework for binary interactome mapping. Nat Methods. 2009, 6: 83-90. 10.1038/nmeth.1280
- Braun P, Tasan M, Dreze M, Barrios-Rodiles M, Lemmens I, Yu H, Sahalie JM, Murray RR, Roncari L, de Smet AS, Venkatesan K, Rual JF, Vandenhaute J, Cusick ME, Pawson T, Hill DE, Tavernier J, Wrana JL, Roth FP, Vidal M: An experimentally derived confidence score for binary protein-protein interactions. Nat Methods. 2009, 6: 91-97. 10.1038/nmeth.1281
- Collins SR, Schuldiner M, Krogan NJ, Weissman JS: A strategy for extracting and analyzing large-scale quantitative epistatic interaction data. Genome Biol. 2006, 7: R63. 10.1186/gb-2006-7-7-r63
- Koh JL, Ding H, Costanzo M, Baryshnikova A, Toufighi K, Bader GD, Myers CL, Andrews BJ, Boone C: DRYGIN: a database of quantitative genetic interaction networks in yeast. Nucleic Acids Res. 2010, 38: D502-D507. 10.1093/nar/gkp820
- Eronen VP, Lindén RO, Lindroos A, Kanerva M, Aittokallio T: Genome-wide scoring of positive and negative epistasis through decomposition of quantitative genetic interaction fitness matrices. PLoS One. 2010, 5: e11611. 10.1371/journal.pone.0011611
- Ulitsky I, Krogan NJ, Shamir R: Towards accurate imputation of quantitative genetic interactions. Genome Biol. 2009, 10: R140. 10.1186/gb-2009-10-12-r140
- Ryan C, Greene D, Cagney G, Cunningham P: Missing value imputation for epistatic MAPs. BMC Bioinformatics. 2010, 11: 197. 10.1186/1471-2105-11-197
- Järvinen AP, Hiissa J, Elo LL, Aittokallio T: Predicting quantitative genetic interactions by means of sequential matrix approximation. PLoS One. 2008, 3: e3284.
- Mani R, St Onge RP, Hartman JL, Giaever G, Roth FP: Defining genetic interaction. Proc Natl Acad Sci USA. 2008, 105: 3461-3466. 10.1073/pnas.0712255105
- St Onge RP, Mani R, Oh J, Proctor M, Fung E, Davis RW, Nislow C, Roth FP, Giaever G: Systematic pathway analysis using high-resolution fitness profiling of combinatorial gene deletions. Nat Genet. 2007, 39: 199-206. 10.1038/ng1948
- Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006, 33: D535-D539. 10.1093/nar/gkj109
- Le Meur N, Gentleman R: Modeling synthetic lethality. Genome Biol. 2008, 9: R135. 10.1186/gb-2008-9-9-r135
- Bandyopadhyay S, Kelley R, Krogan NJ, Ideker T: Functional maps of protein complexes from quantitative genetic interaction data. PLoS Comput Biol. 2008, 4: e1000065. 10.1371/journal.pcbi.1000065
- Ulitsky I, Shlomi T, Kupiec M, Shamir R: From E-MAPs to module maps: dissecting quantitative genetic interactions using physical interactions. Mol Syst Biol. 2008, 4: 209. 10.1038/msb.2008.42
- Fischbach MA, Krogan NJ: The next frontier of systems biology: higher-order and interspecies interactions. Genome Biol. 2010, 11: 208.
- Van Driessche N, Demsar J, Booth EO, Hill P, Juvan P, Zupan B, Kuspa A, Shaulsky G: Epistasis analysis with global transcriptional phenotypes. Nat Genet. 2005, 37: 471-477. 10.1038/ng1545
- Harrison R, Papp B, Pál C, Oliver SG, Delneri D: Plasticity of genetic interactions in metabolic networks of yeast. Proc Natl Acad Sci USA. 2007, 104: 2307-2312. 10.1073/pnas.0607153104
- Tischler J, Lehner B, Fraser AG: Evolutionary plasticity of genetic interaction networks. Nat Genet. 2008, 40: 390-391. 10.1038/ng.114
- Dixon SJ, Andrews BJ, Boone C: Exploring the conservation of synthetic lethal genetic interaction networks. Commun Integr Biol. 2009, 2: 78-81.
- Jonikas MC, Collins SR, Denic V, Oh E, Quan EM, Schmid V, Weibezahn J, Schwappach B, Walter P, Weissman JS, Schuldiner M: Comprehensive characterization of genes required for protein folding in the endoplasmic reticulum. Science. 2009, 323: 1693-1697. 10.1126/science.1167983
- Battle A, Jonikas MC, Walter P, Weissman JS, Koller D: Automated identification of pathways from quantitative genetic interaction data. Mol Syst Biol. 2010, 6: 379. 10.1038/msb.2010.27
- Smialowski P, Pagel P, Wong P, Brauner B, Dunger I, Fobo G, Frishman G, Montrone C, Rattei T, Frishman D, Ruepp A: The Negatome database: a reference set of non-interacting protein pairs. Nucleic Acids Res. 2010, 38: D540-D544. 10.1093/nar/gkp1026
- Collins SR, Roguev A, Krogan NJ: Quantitative genetic interaction mapping using the E-MAP approach. Methods Enzymol. 2010, 470: 205-231.
- Pan X, Yuan DS, Xiang D, Wang X, Sookhai-Mahadeo S, Bader JS, Hieter P, Spencer F, Boeke JD: A robust toolkit for functional profiling of the yeast genome. Mol Cell. 2004, 16: 487-496. 10.1016/j.molcel.2004.09.035
- Pan X, Yuan DS, Ooi SL, Wang X, Sookhai-Mahadeo S, Meluh P, Boeke JD: dSLAM analysis of genome-wide genetic interactions in Saccharomyces cerevisiae. Methods. 2007, 41: 206-221. 10.1016/j.ymeth.2006.07.033
- Baryshnikova A, Costanzo M, Dixon S, Vizeacoumar FJ, Myers CL, Andrews B, Boone C: Synthetic genetic array (SGA) analysis in Saccharomyces cerevisiae and Schizosaccharomyces pombe. Methods Enzymol. 2010, 470: 145-179.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.