Quantitative maps of genetic interactions in yeast - Comparative evaluation and integrative analysis
Rolf O Lindén^{1, 2},
Ville-Pekka Eronen^{1, 2} and
Tero Aittokallio^{1, 2}
DOI: 10.1186/1752-0509-5-45
© Lindén et al; licensee BioMed Central Ltd. 2011
Received: 30 July 2010
Accepted: 24 March 2011
Published: 24 March 2011
Abstract
Background
High-throughput genetic screening approaches have enabled the systematic study of how interactions among gene mutations contribute to quantitative fitness phenotypes, with the aim of providing insights into the functional wiring diagrams of genetic interaction networks on a global scale. However, little is known about how well these quantitative interaction measurements agree across the screening approaches, which hinders their integrated use toward improving the coverage and quality of the genetic interaction maps in yeast and other organisms.
Results
Using large-scale data matrices from epistatic miniarray profiling (EMAP), genetic interaction mapping (GIM), and synthetic genetic array (SGA) approaches, we carried out a systematic comparative evaluation among these quantitative maps of genetic interactions in yeast. The relatively low association between the original interaction measurements or their customized scores could be improved using a matrix-based modelling framework, which enables the use of single- and double-mutant fitness estimates and measurements, respectively, when scoring genetic interactions. Toward an integrative analysis, we show how the detections from the different screening approaches can be combined to suggest novel positive and negative interactions that are complementary to those obtained using any single screening approach alone. The matrix approximation procedure has been made available to support the design and analysis of future screening studies.
Conclusions
We have shown here that even if the correlation between the currently available quantitative genetic interaction maps in yeast is relatively low, their comparability can be improved by means of our computational matrix approximation procedure, which will enable integrative analysis and detection of a wider spectrum of genetic interactions using data from the complementary screening approaches.
Background
The recent advances in experimental biotechnologies have made it possible to start screening genome-wide datasets of quantitative genetic interactions in model organisms such as yeast [1–3]. High-throughput genetic screening approaches, such as those based on epistatic miniarray profiling (EMAP) [4–7], genetic interaction mapping (GIM) [8], and synthetic genetic array (SGA) [9–11], have provided systematic means for the global investigation of the quantitative relationship between genotype and phenotype, with potential implications for a wide range of biological phenomena, including modularity, essentiality, redundancy, buffering, epistasis, evolution, canalization and the development of human disease [1–3, 12–21]. The rapid accumulation of quantitative genetic interaction data is providing unique opportunities to decipher how genes function as networks to regulate cellular processes and to maintain mutational robustness. However, the massive datasets also call for principled modelling frameworks and efficient analytic approaches to take full advantage of the in-depth information encoded in the available and emerging quantitative interaction datasets [22]. In particular, efficient bioinformatics procedures enabling integrative analysis of multiple datasets from various screening approaches could increase the quality and coverage of the genetic interaction maps, with the aim of completing the genetic interaction networks in yeast and other organisms.
Comparing the results from alternative experimental approaches is crucial for validating the observed interactions, estimating the biases related to each approach, and filling the gaps in the currently incomplete datasets. Comprehensive mapping of the quantitative genetic interaction networks is therefore likely to require the integration of a number of datasets from different screening approaches, similar to the recent efforts to complete the physical protein-protein interaction (PPI) networks in yeast and human [23–28]. A major challenge in such integrative analysis is that quantitative interaction data generated with complementary experimental approaches in different laboratories are not directly comparable, owing to differences in, for instance, experimental designs, growth conditions or screening protocols, as well as in data preprocessing or scoring options. Even when the same mutant pairs are considered, technical variation can lead to some disagreement in the detection results and to relatively large inconsistency between the datasets in general [8, 11]. Correcting for such discrepancies can be beyond the capacity of the customized data processing techniques used within the individual screening approaches [29, 30]. A common modelling framework, adjusted for the different screening approaches, could improve the comparability of the results and allow for integrative analysis.
Compared to PPI networks, an additional challenge originates from the quantitative nature of the genetic interaction datasets: instead of comparing the overlap in binary terms, such as the presence or absence of a physical interaction, here we should take into account the full spectrum of genetic interactions, ranging from the extreme cases of negative interactions (i.e., synthetic sickness and lethality) to the positive classes of interacting pairs (e.g., masking and suppression subcategories) [2, 3, 17]. We have recently shown that the data matrices obtained from the individual quantitative screening approaches capture different portions of this spectrum, as assessed against known classes of genetic interactions; for instance, the SGA and GIM datasets captured the negative classes of interactions relatively well, whereas the prediction of positive interactions proved much more challenging when using the provided double-mutant fitness data alone [31]. Similar observations have also been made using the highly processed EMAP data [32, 33]. To improve the predictive power of the individual quantitative datasets, we further developed our computational matrix approximation strategy [34], and showed that it could transform the original fitness matrices so that they allow better discrimination of not only the negative but also the positive end of the interaction spectrum from the background variability [31].
In the present study, toward combining the quantitative detections from multiple large-scale genetic interaction approaches, we investigated the consistency among the currently available quantitative interaction datasets in yeast, as well as the sensitivity and specificity of the genetic interactions detected by the three screening approaches (SGA, GIM and EMAP), with respect to their overlap in common mutant pairs and their coverage of known interacting pairs, as extracted from a gold-standard reference database of genetic interactions (BioGRID). We first show that the comparability of the detections between the different approaches can be improved using a standardized matrix-based modelling framework within each individual dataset. Using appropriate scoring and aggregation functions, we then demonstrate how the detections from the different screening approaches can be combined more effectively than when using the individual datasets alone, suggesting that the matrix approximation-based meta-analytic procedure allows for full exploitation of the existing data when predicting novel interactions or designing new experiments. To promote its widespread use in future screening studies, we have made publicly available an efficient, standalone R implementation of the quantile-based matrix approximation procedure (QMAP), which includes a number of user-adjustable options that can be used to fine-tune the procedure for any given experimental dataset.
Results and Discussion
Scoring of quantitative genetic interactions
We have previously introduced a matrix-based modelling and approximation framework, and showed that it provides a quantitative and efficient means for scoring genetic interactions among thousands of genes, thereby leading to improved detection of both positive and negative interactions in large-scale quantitative screening experiments [31, 34]. Briefly, the matrix approximation strategy is based on the observation that most gene pairs in large-scale genetic interaction screens have no significant interaction with each other [2, 3]. This implies that the single-mutant fitness effects, which are needed for interaction scoring, could be estimated using solely the information encoded in the observed double-mutant fitness matrix $W$, with entries $w_{ab}$ corresponding to the $m$ array and $n$ query strains, respectively, that is, $a = 1, 2, \ldots, m$ and $b = 1, 2, \ldots, n$. The underlying idea of the matrix approximation is to decompose the original fitness matrix into separate components, $W = x \otimes y$, where the $m$- and $n$-dimensional vectors $x$ and $y$ model the variability across the array and query mutants, respectively [31, 34].
In the symmetric case, that is, when $m = n$ and $x = y$, the above equation expresses in matrix notation the well-established multiplicative null model, $w_{ab} = w_a w_b$, which states that the expected neutral fitness phenotype of an organism, under the null hypothesis that it carries two non-interacting mutations ($a$ and $b$), can be estimated by the product of the corresponding single-mutant fitness effects ($w_a$ and $w_b$, respectively) [35]. It was shown on symmetric, high-resolution data that the product function is the best null model among a family of alternative models (minimum, additive and log functions), in the sense that it yields a distribution with location close to zero and low dispersion over all of the measured deviations $\varepsilon_{ab} = w_{ab} - w_a w_b$ [35, 36]. In the non-symmetric case, $m \neq n$, even though the single-mutant effects $x$ and $y$ are not necessarily equal, together they can provide individual estimates for $w_a$ and $w_b$, respectively. In the present work, $x$ and $y$ were estimated using a robust rank-one matrix approximation method, named quantile-based matrix approximation (QMA) [31].
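As a worked illustration of the multiplicative null model and its deviation, take the hypothetical single-mutant fitness values $w_a = 0.8$ and $w_b = 0.5$ (chosen only to show the arithmetic) and two possible observed double-mutant fitness values:

```latex
\begin{aligned}
w_a w_b &= 0.8 \times 0.5 = 0.4,\\
w_{ab} = 0.1:\quad \varepsilon_{ab} &= 0.1 - 0.4 = -0.3 \quad (\text{aggravating, negative interaction}),\\
w_{ab} = 0.6:\quad \varepsilon_{ab} &= 0.6 - 0.4 = +0.2 \quad (\text{alleviating, positive interaction}).
\end{aligned}
```

An observed value close to $0.4$ would instead give $\varepsilon_{ab} \approx 0$, consistent with the null hypothesis of no interaction.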
After approximating the double-mutant fitness matrix $W$ under the multiplicative null model, the interaction class of a mutant pair $(a, b)$ can be predicted using a specific scoring function $s(x, y)$, such as minimum, maximum, product or scaled epistasis [13, 35, 36], which transforms the original fitness matrix into a score (or residual) matrix with entries $s_{ab} = w_{ab} - s(x_a, y_b)$. It has been shown before that effective alternatives to the traditional product function exist when further classifying the significant genetic interactions into the positive and negative classes [13, 31]. Accordingly, the score values $s_{ab}$ can be used in place of the traditional deviations $\varepsilon_{ab}$ to test for a genetic interaction between genes $a$ and $b$: a large absolute score provides evidence for a genetic interaction, while scores close to zero indicate non-interacting gene pairs. Positive interactions (alleviating epistatic effects) should result in positive scores ($s_{ab} > 0$), and negative interactions (aggravating epistatic effects) in negative scores ($s_{ab} < 0$), with synthetic lethality ($w_{ab} = 0$) being the extreme case.
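A minimal sketch of the residual scoring $s_{ab} = w_{ab} - s(x_a, y_b)$ for three of the scoring functions named above (minimum, maximum and product) might look as follows. It assumes that $x$ and $y$ have already been estimated, for example by a rank-one approximation; scaled epistasis is left out for brevity, and the names `score_matrix` and `SCORING_FUNCTIONS` are hypothetical.

```python
import numpy as np

# Hypothetical helper names; the three functions mirror those named in the text.
SCORING_FUNCTIONS = {
    "product": np.multiply.outer,   # s(x_a, y_b) = x_a * y_b (multiplicative null)
    "minimum": np.minimum.outer,    # s(x_a, y_b) = min(x_a, y_b)
    "maximum": np.maximum.outer,    # s(x_a, y_b) = max(x_a, y_b)
}

def score_matrix(W, x, y, function="product"):
    """Residual scores s_ab = w_ab - s(x_a, y_b); rows = array, columns = query."""
    W = np.asarray(W, dtype=float)
    expected = SCORING_FUNCTIONS[function](np.asarray(x, float), np.asarray(y, float))
    return W - expected   # > 0 suggests alleviating, < 0 aggravating interaction

# Toy example: 2 array mutants x 2 query mutants.
W = [[0.5, 0.9],
     [0.2, 1.0]]
x = [0.8, 0.5]   # estimated array-mutant effects
y = [1.0, 0.9]   # estimated query-mutant effects
print(score_matrix(W, x, y, "product"))
```

Missing pairs encoded as NaN simply propagate to NaN scores, so they can be skipped when the scores are ranked.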
Following the lessons learned from the integrative analysis of high-throughput PPI datasets [25], we first evaluated the data from the individual screening approaches (SGA, GIM and EMAP) separately against a gold-standard reference database of known interactions (BioGRID) [37]. Such within-approach benchmarking resulted in specific parameter combinations for the data-adjusted QMA estimates and scoring functions for the positive and negative genetic interaction classes (Additional File 1) [31]. In the following analyses, we used these same parameters and scoring functions to assess their robustness, and to demonstrate the relative advantages of the generic matrix approximation strategy, in terms of both improved comparability of the interaction scores and integrative detection of genetic interactions across the screening approaches, compared with using the individual datasets alone. Our specific focus here is on the detection of positive interactions, which have been challenging to score accurately in the past despite the quantitative approaches.
Agreement between the quantitative datasets
Table 1. Pairwise intersections between the three datasets used in the study
Dataset | SGA | GIM | EMAP
SGA | 3885 × 1712 | 3881 × 15 | 543 × 339
GIM | 23.99% | 5918 × 41 | 733 × 17
EMAP | 33.34% | 5.14% | 743 × 743
Table 2. Coverage of the known genetic interactions in the dataset pairs
Dataset | SGA | GIM | EMAP
SGA | 810 / 4723 | 82 / 645 | 3217 / 16481
GIM | 0.14% / 1.11% | 0 / 85 | 141 / 603
EMAP | 1.75% / 8.95% | 1.13% / 4.84% | 1607 / 5297
Even though the interactions extracted from the three datasets under study were removed pairwise from BioGRID's genetic interaction categories (Table 2), some bias toward the EMAP approach may remain in these categories, due to the large number of interactions identified in the three other large-scale EMAP studies [4, 6, 7]. Had these also been excluded from the comparative analyses, the reference positive and negative classes would have become much smaller, hindering the comparative evaluations. Because of this potential bias, the interaction detection results for data pairs other than SGA-GIM should be interpreted with caution. Moreover, we did not initially expect the matrix approximation to provide further improvements for the EMAP data, since these data had already been heavily preprocessed and custom-scored against an expected fitness [29], resulting in a symmetric, close to zero-centered data matrix [38]. Therefore, we focus here on illustrating the benefits of QMA-based integrative analysis using the detection of positive interactions in the SGA-GIM data pair as our principal case study; the full set of results is provided in Additional files 2–7.
Table 3. Pairwise correlations between the three quantitative datasets
Dataset | Measure | SGA fitness | SGA score | GIM fitness | GIM score | EMAP fitness | EMAP score
SGA | Fitness | 1.000 | 0.235 | 0.099 | 0.092 | 0.268 | 0.490
SGA | Score | 0.243 | 1.000 | 0.073 | 0.095 | 0.258 | 0.491
GIM | Fitness | 0.021 | 0.052 | 1.000 | 0.954 | 0.192 | 0.191
GIM | Score | 0.014 | 0.056 | 0.824 | 1.000 | 0.194 | 0.196
EMAP | Fitness | 0.152 | 0.245 | 0.209 | 0.219 | 1.000 | 0.994
EMAP | Score | 0.144 | 0.245 | 0.201 | 0.221 | 0.981 | 1.000
Predictive relationship between the datasets
The modelling framework also makes it possible to avoid performing the single-mutant growth experiments in large-scale genetic interaction screens, without compromising their quantitative scoring accuracy. Moreover, the model-estimated array vector was in good agreement with the experimentally derived single-mutant fitness measurements available in the SGA data (Spearman's correlation ranged from 0.964 to 0.996, depending on whether the fixed QMA settings or those adjusted for positive interactions were used, respectively). Despite such high rank correlations, however, there is a significant difference in location and scale between the estimated and measured fitness values, indicating that the estimates encode additional information for interaction scoring. The QMA settings used here were originally selected on the basis of the pre-release version of the SGA data [31], which contained only 1277 (75%) of the query mutations in the current SGA dataset, thus indicating the robustness of the QMA settings. In the following section, we further highlight the potential of the model-based strategy in integrative analysis by using the same QMA setup selected specifically for positive interactions, even though this will likely result in compromised prediction accuracies for the negative interaction classes.
Integrative identification of genetic interactions
After showing that using the matrix approximation-based scoring system in place of the original double-mutant fitness matrix or its custom-scored version can improve the comparability between the dataset pairs, we next evaluated whether these improvements in rank correlation or in the prediction of the extreme pairs also translate into improved identification of genetic interactions when multiple datasets are used together, compared with single datasets alone. To choose an appropriate data integration approach, we first evaluated the predictive performance of four rank aggregation functions (product, minimum, maximum and Borda count, which is effectively the same as the additive function), in terms of how accurately they detect known pairs of interacting genes. Even though the QMA-based scoring setup was aimed here at the detection of positive interactions, we also tested its prediction capability for negative interactions, to study how well it generalizes beyond the type of interactions it was initially designed for. The prediction performance is illustrated here using the unbiased GIM-SGA data pair, whereas the EMAP-SGA and EMAP-GIM pairs are provided in Additional File 6.
Table 4. Detection accuracies using the datasets either alone or combined
Dataset pair / ranking | Positive: early sensitivity | Positive: partial AUC | Positive: overall AUC | Negative: early sensitivity | Negative: partial AUC | Negative: overall AUC
GIM-SGA
GIM rank | 0.205 | 0.445 | 0.794 | 0.488 | 0.624 | 0.872
SGA rank | 0.205 | 0.477 | 0.785 | 0.494 | 0.595 | 0.795
Borda count | 0.432 | 0.527 | 0.826 | 0.562 | 0.619 | 0.857
Minimum rank | 0.386 | 0.496 | 0.777 | 0.556 | 0.663 | 0.897
Maximum rank | 0.227 | 0.525 | 0.856 | 0.550 | 0.601 | 0.804
Rank product | 0.432 | 0.522 | 0.795 | 0.594 | 0.680 | 0.892
EMAP-SGA
EMAP rank | 0.345 | 0.637 | 0.889 | 0.286 | 0.510 | 0.821
SGA rank | 0.148 | 0.338 | 0.772 | 0.208 | 0.366 | 0.734
Borda count | 0.348 | 0.542 | 0.868 | 0.304 | 0.477 | 0.805
Minimum rank | 0.337 | 0.511 | 0.832 | 0.252 | 0.477 | 0.825
Maximum rank | 0.255 | 0.575 | 0.887 | 0.294 | 0.460 | 0.769
Rank product | 0.347 | 0.539 | 0.854 | 0.304 | 0.501 | 0.826
GIM-EMAP
GIM rank | 0.300 | 0.450 | 0.792 | 0.237 | 0.397 | 0.783
EMAP rank | 0.333 | 0.579 | 0.884 | 0.256 | 0.411 | 0.767
Borda count | 0.367 | 0.533 | 0.878 | 0.293 | 0.457 | 0.807
Minimum rank | 0.400 | 0.508 | 0.839 | 0.298 | 0.463 | 0.825
Maximum rank | 0.333 | 0.572 | 0.892 | 0.279 | 0.438 | 0.768
Rank product | 0.367 | 0.530 | 0.857 | 0.326 | 0.480 | 0.827
Random classifier | 0.010 | 0.050 | 0.500 | 0.010 | 0.050 | 0.500
Taken together, the integrative prediction results in the three dataset pairs show that the Borda count and the rank product performed equally well when the aim is to identify a first candidate set of positive interactions with the highest specificity for follow-up studies, whereas the more stringent maximum function provided the best prediction accuracy when larger numbers of positive interactions are being identified. In the detection of negative interactions, the intermediate rank product consistently showed the best results among all the data pairs, making it an appropriate rank aggregation function when both positive and negative interactions are detected using the same setup. In addition to showing the benefits of integrative detection, these results can also be used to compare the detection power of the individual datasets from the different screening approaches. For instance, on the basis of the same reference set of known interactions on a common set of shared mutant pairs in the SGA and GIM datasets, the GIM approach seems particularly good at detecting larger numbers of negative interactions (Table 4), whereas the nearly genome-wide SGA dataset provides comparable detection power at the positive end of the genetic interaction spectrum (Figure 5).
Although integrative detection based on combined scores was shown to provide marked improvements in the detection of both positive and negative interaction classes when the SGA and GIM datasets were used together, it is interesting to note that in the SGA-EMAP dataset pair, the EMAP data alone provided extremely good detection accuracies for the positive class of interactions (Table 4). Rather than reflecting the superiority of this particular dataset, this is more likely attributable to the fact that many (23%) of the positive interaction pairs in BioGRID originate from the other large-scale genetic interaction screens performed with the EMAP approach [4, 6, 7] (Table 2). These pairs clearly dominate the joint distribution of the positive interactions, while being supported by the SGA approach to a varying degree (Additional File 4). Interestingly, the detection of negative interactions by the EMAP approach alone was found to be suboptimal (Table 4). Moreover, the additional benefits gained by the integrative analysis were more pronounced in the GIM-EMAP than in the SGA-EMAP data pair (Table 4). These results demonstrate that the intrinsic differences between the screening approaches influence how much they can complement each other.
Conclusions
To our knowledge, the present study is the first systematic and objective comparative evaluation of data from the main large-scale quantitative genetic interaction screening approaches (SGA, GIM and EMAP). We showed here that even though the association between the original fitness measurements or their interaction scores is relatively low, their comparability can be improved by means of our matrix approximation technique. Toward an integrative analysis, we showed that a multi-approach analysis of quantitative genetic interactions can provide novel findings that are complementary to those obtained using any single screening approach alone. Integrative analysis can therefore provide a systematic means to pool information from previous interaction studies, with the aim of maximizing the number of both positive and negative interactions detected without compromising the reliability of the detections, as well as minimizing the number of additional experiments needed when prioritizing future screens. In general, such a computational approach can facilitate the experimental efforts by improving the quality and coverage of the current genetic interaction networks, toward completing the still incomplete picture of genetic interactions in yeast, which is, by and large, complementary to that obtained from physical protein interactions and complexes [1, 5, 11, 17, 39, 40].
Although these results already demonstrate the potential of integrating datasets across different screening approaches using the matrix approximation strategy, more comprehensive future studies are warranted that combine experimental data from various types of genetic interaction studies, such as those performed under different environmental conditions, using fitness phenotypes other than growth, or on multiple perturbations or study organisms, to investigate questions related, for instance, to the plasticity and evolution of genetic networks or to higher-order and interspecies interactions [2, 3, 17, 41–47]. Although we illustrated here the feasibility of integrative analysis through QMA with its previously fixed parameters and scoring functions selected for each screening approach individually, even better prediction accuracies will likely be obtained after systematically optimizing these options separately for each dataset combination, downstream analysis objective, and interaction strength level (Additional File 7). The efficient QMA R package, which includes a number of user-adjustable parameters (Additional File 8), was made available here to enable tailored matrix approximation that meets the needs of a given study.
A potential limitation of the current evaluation setup is the definition of the reference set of interactions using the BioGRID database. For instance, since the interactions in the BioGRID database originate from multiple genetic interaction screening studies, there can be cases where a mutant pair AB is reported as an interaction even if BA is not, or where the reciprocal pairs AB and BA are assigned to different classes of interactions. To make sure that such cases do not interfere with the comparative evaluations, we filtered out any ambiguous interaction pairs, and for the remaining interactions, we used the same interaction class for the reciprocal mutant pairs. Moreover, to provide as fair an assessment as possible, we excluded the interactions identified from the datasets under comparison. Therefore, the detection accuracies presented here should be considered lower bounds for the true accuracy of the screening approaches or their combination. Even if some biases may remain, especially toward the well-represented EMAP approach, the BioGRID database also includes a wide range of other large-scale studies, thus providing a comprehensive reference set for the evaluations. To improve future benchmarking studies, it would be beneficial to add a specific category for known non-interacting mutant pairs, similar to that available for physically non-interacting protein pairs [48].
Analogous to the efforts to complete the mapping of the physical PPI networks [23–28], it would be important to provide the community with easy access to the raw interaction datasets as well, similar to that provided in the SGA database DRYGIN [30]. For instance, our matrix approximation procedure was much more effective with the original double-mutant fitness measurements, as provided by the SGA and GIM laboratories, than with the highly processed and scored EMAP datasets. The results for the dataset pairs involving EMAP were in many cases drastically different from those for the SGA-GIM dataset pair. As with any high-throughput assay, the large-scale genetic screening approaches are inherently noisy and biased, suggesting that each single assay can reveal only a limited part of the full spectrum of genetic interaction classes. Therefore, integrative analysis of data from the complementary screening approaches will likely be essential to complete the quantitative genetic interaction networks in yeast and other organisms. We invite those participating in the genetic interaction mapping effort to try out the matrix approximation-based procedure and to give us input and suggestions for its further improvement.
Methods
The methodological aim of the present study was to enable an integrated analysis of multiple genetic interaction datasets using a common scoring framework adjusted for the high-throughput quantitative screening approaches. The next sections describe the genetic interaction datasets used to demonstrate the benefits of such an integrative approach, as well as the methods used to model, standardize, compare and merge these datasets while maintaining their biological consistency and quantitative nature.
Genetic interaction matrices
Three large-scale quantitative datasets on yeast were used in the present work for the systematic comparative evaluations. To investigate the potential limitations in between-approach agreement and the relative benefits gained by an integrative analysis of the currently available high-throughput quantitative genetic interaction maps, we chose representative example datasets across the spectrum of high-throughput interaction screening approaches currently used for Saccharomyces cerevisiae.
EMAP dataset
The first dataset was available from the epistatic miniarray profiling (EMAP) study of quantitative genetic interactions between genes involved in yeast chromosome biology [5]. The original fitness measurements among 754 alleles of 743 genes were heavily filtered and processed, yielding a symmetric data matrix with a close to zero-centered distribution of the pairwise interaction scores [29, 49]. The raw, unprocessed double-mutant fitness measurements were not available from this study.
GIM dataset
Representing another screening approach, genetic interaction mapping (GIM) combines ideas from synthetic lethality analysis by microarray (SLAM) [50, 51] and from the synthetic genetic array (SGA) approach [9, 10]. The data matrix available from its pilot study contains double-mutant fitness measurements among 5918 array and 73 query genes [8]. The filtered fitness effects were transformed back to non-log scale to produce a quantitative distribution with mean and median close to unity.
SGA dataset
The third and largest of the datasets is available from the recent SGA screening study [11]. This dataset contains double-mutant fitness measurements among 3885 array and 1712 query genes. The filtered and normalized double-mutant fitness data matrix, with median close to unity, was used in the matrix approximation procedure. The same dataset also includes a customized SGA scoring of the gene pairs [30, 52], which was used here as a baseline for our QMA-based scoring procedure.
Matrix approximation
The quantile-based matrix approximation (QMA) is an efficient rank-one matrix approximation method that is conceptually similar to Tukey's median polish procedure, except that QMA uses a multiplicative model instead of an additive one and quantiles instead of medians [31]. More specifically, the estimation of the single-mutant fitness effects is based on the successive calculation of the $p$- and $q$-quantile points for the rows and columns of the double-mutant fitness matrix $W$, respectively, and the arrangement of these quantiles into the estimated array and query vectors $x$ and $y$.
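The description above leaves the exact update order open, so the following is only an illustrative sketch under stated assumptions, not the published QMA algorithm or the interface of the authors' R package. It assumes an iterative, multiplicative analogue of median polish: each row is divided by its $p$-quantile and each column by its $q$-quantile, with the divisors accumulated into $x$ and $y$. The function name `quantile_polish`, the iteration count and the defaults $p = q = 0.5$ are hypothetical choices, and all row and column quantiles are assumed to be strictly positive.

```python
import numpy as np

def quantile_polish(W, p=0.5, q=0.5, n_iter=10):
    """Rank-one multiplicative fit W ~ outer(x, y) by repeated quantile division.

    Hypothetical, QMA-like sketch (not the published implementation): a
    multiplicative analogue of Tukey's median polish that uses the p-quantile of
    each row and the q-quantile of each column. NaN marks missing pairs; the row
    and column quantiles are assumed to be strictly positive.
    """
    R = np.array(W, dtype=float)   # running ratio W / outer(x, y)
    x = np.ones(R.shape[0])        # array-mutant effects (rows)
    y = np.ones(R.shape[1])        # query-mutant effects (columns)
    for _ in range(n_iter):
        row_q = np.nanquantile(R, p, axis=1)
        x *= row_q                 # move the row scale from R into x
        R /= row_q[:, None]
        col_q = np.nanquantile(R, q, axis=0)
        y *= col_q                 # move the column scale from R into y
        R /= col_q[None, :]
    return x, y

# Noise-free rank-one example: the fitted outer product reproduces W.
x_true = np.array([0.9, 0.7, 1.0])
y_true = np.array([0.8, 1.0, 0.6, 0.95])
W = np.outer(x_true, y_true)
x_hat, y_hat = quantile_polish(W)
print(np.allclose(np.outer(x_hat, y_hat), W))  # True
```

Only the products $x_a y_b$ are identifiable here: `x_hat` and `y_hat` may differ from `x_true` and `y_true` by a common scale factor, which does not affect the residual scores.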
Scoring of interactions
The presence and sign of an epistatic interaction between a gene pair $(a, b)$ were scored using the residual $s_{ab} = w_{ab} - s(x_a, y_b)$. To avoid potential bias among the different genes in the datasets, duplicate rows and columns in the double-mutant fitness matrices were combined by taking the mean over the duplicates. The final dimensions of the data matrices are shown in Table 1. Before the data integration, each of the double-mutant fitness matrices was scored separately using the default QMA settings and scoring functions (Additional File 1), as described before [31].
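The duplicate-merging step can be sketched as below. This is a hypothetical helper, assuming that missing measurements are stored as NaN and ignored in the means, and that duplicate array rows are merged before duplicate query columns; the text does not specify either detail. Merged labels come out sorted, because `np.unique` sorts them.

```python
import numpy as np

def _merge_rows(M, labels):
    """Average rows that share a label, ignoring NaN (missing) entries."""
    uniq, inv = np.unique(labels, return_inverse=True)
    out = np.full((len(uniq), M.shape[1]), np.nan)
    for k in range(len(uniq)):
        block = M[inv == k]                          # all rows with label k
        counts = np.sum(~np.isnan(block), axis=0)    # observed values per column
        sums = np.nansum(block, axis=0)
        out[k] = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
    return uniq, out

def merge_duplicates(W, row_labels, col_labels):
    """Merge duplicate array rows first, then duplicate query columns."""
    rows, W_rows = _merge_rows(np.asarray(W, dtype=float), row_labels)
    cols, W_cols_t = _merge_rows(W_rows.T, col_labels)
    return rows, cols, W_cols_t.T

rows, cols, M = merge_duplicates(
    [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [10.0, 20.0, 40.0]],
    row_labels=["g1", "g1", "g2"], col_labels=["q1", "q2", "q1"])
print(rows, cols)
print(M)
```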
Ranking of interactions
Each gene pair $(a, b)$ was ranked according to its interaction score $s_{ab}$ obtained in each individual dataset using the fixed QMA settings and scoring functions for positive interactions (Additional File 1). A rank-based aggregation was used for robust integration of the scores from two screening approaches. More precisely, four rank aggregation functions (minimum of the ranks, maximum of the ranks, product of the ranks, and Borda count, which is effectively the sum of the ranks) were evaluated in terms of their accuracy, compared to using the rankings from a single dataset alone.
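A minimal sketch of the four rank aggregation functions, applied to two score vectors over the same shared mutant pairs, is given below. The names `descending_ranks`, `AGGREGATORS` and `aggregate`, and the toy scores, are hypothetical; ties are broken by input order, which is an assumption of this sketch. Smaller aggregated values mean higher priority.

```python
import numpy as np

def descending_ranks(scores):
    """Rank 1 = largest score (strongest evidence for a positive interaction).
    Ties are broken by input order, an assumption of this sketch."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(len(scores), dtype=np.int64)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks

# The four aggregation functions evaluated in the text; smaller = higher priority.
AGGREGATORS = {
    "minimum": np.minimum,
    "maximum": np.maximum,
    "product": np.multiply,
    "borda": np.add,          # Borda count = sum of the ranks
}

def aggregate(scores_1, scores_2, method="product"):
    """Combine two screens' scores for the same shared mutant pairs (same order)."""
    return AGGREGATORS[method](descending_ranks(scores_1), descending_ranks(scores_2))

s_gim = [0.9, 0.1, 0.5]   # toy scores for three shared mutant pairs
s_sga = [0.2, 0.8, 0.7]
for method in AGGREGATORS:
    print(method, aggregate(s_gim, s_sga, method))
```

The minimum favours pairs ranked highly by either screen, the maximum requires both screens to rank a pair highly, and the product and Borda count fall in between. The final candidate list is obtained by sorting the aggregated values in ascending order; for negative interactions, the scores would be ranked in ascending order instead (for example by passing the negated scores).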
Evaluation setup and measures
The three pairwise intersections between the datasets were evaluated separately in terms of their numbers of common array and query mutants (Table 1), their coverage of the known pairs of genetic interactions (Table 2), and the association of their fitness values and interaction scores across the shared mutant pairs (Table 3). The intersection shared by all three datasets was only 498 × 7 in size, including 178 known negative and only 31 known positive interactions from the BioGRID database; this triple intersection could therefore not be reliably evaluated here.
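Restricting two data matrices to their shared array and query mutants, as needed for the pairwise intersections, can be sketched with `np.intersect1d`. The helper name `align_on_shared` is hypothetical, and gene labels are assumed to be unique within each matrix (for example, after duplicate rows and columns have been merged).

```python
import numpy as np

def align_on_shared(A, rows_a, cols_a, B, rows_b, cols_b):
    """Restrict two array-by-query matrices to their shared array and query genes.

    Hypothetical helper; gene labels are assumed unique within each matrix.
    """
    common_rows, ia, ib = np.intersect1d(rows_a, rows_b, return_indices=True)
    common_cols, ja, jb = np.intersect1d(cols_a, cols_b, return_indices=True)
    A_sub = np.asarray(A, dtype=float)[np.ix_(ia, ja)]
    B_sub = np.asarray(B, dtype=float)[np.ix_(ib, jb)]
    return common_rows, common_cols, A_sub, B_sub

rows, cols, A_sub, B_sub = align_on_shared(
    [[1, 2], [3, 4], [5, 6]], ["g1", "g2", "g3"], ["q1", "q2"],
    [[7], [8]], ["g3", "g1"], ["q2"])
print(f"shared: {len(rows)} x {len(cols)}")   # array x query, as in Table 1
```

Both submatrices share the same (sorted) row and column order, so their entries can be compared pair by pair.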
BioGRID interaction matrix
We used the interactions available in the gold-standard BioGRID database (version 3.0.64 for S. cerevisiae) [37]. We constructed a BioGRID interaction matrix by treating the gene pairs extracted from the database as unordered, meaning that if an interaction exists for a mutant pair AB, the same interaction was also copied to the mutant pair BA for biological consistency. A similar symmetric strategy has been used in previous studies [4–7, 11, 31]. For each pairwise intersection between datasets, separate positive and negative interaction matrices were created for evaluation purposes.
BioGRID interaction classes
The positive interaction matrix was constructed from the 'Phenotypic Suppression' and 'Positive Genetic' categories of the BioGRID database, and the negative interaction matrix by combining the 'Synthetic Lethal', 'Synthetic Growth Defect', 'Phenotypic Enhancement' and 'Negative Genetic' categories. These interaction matrices are ternary, with entries representing an interacting pair, a non-interacting pair, or an ambiguous pair that belongs to both interaction classes. Since ambiguous cases could bias the evaluation results, they were excluded from the evaluations.
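A minimal sketch of the ternary encoding, assuming BioGRID records are given as (gene A, gene B, category) tuples. The encoding 1 / 0 / `AMBIGUOUS` (-1) and the treatment of a negative-only pair as "non-interacting" in the positive matrix (and vice versa) are our assumptions; all names are hypothetical.

```python
from collections import defaultdict
import numpy as np

POSITIVE = {"Phenotypic Suppression", "Positive Genetic"}
NEGATIVE = {"Synthetic Lethal", "Synthetic Growth Defect",
            "Phenotypic Enhancement", "Negative Genetic"}
AMBIGUOUS = -1  # assumed code for pairs reported in both classes

def pair_classes(records):
    """Map each unordered gene pair to the set of interaction classes reported for it."""
    classes = defaultdict(set)
    for a, b, category in records:
        key = frozenset((a, b))  # unordered, as in the symmetric BioGRID matrix
        if category in POSITIVE:
            classes[key].add("positive")
        elif category in NEGATIVE:
            classes[key].add("negative")
    return classes

def ternary_matrix(classes, queries, arrays, kind):
    """1 = interaction of `kind`, 0 = no such interaction, AMBIGUOUS = pair in both classes."""
    T = np.zeros((len(queries), len(arrays)), dtype=int)
    for i, q in enumerate(queries):
        for j, a in enumerate(arrays):
            c = classes.get(frozenset((q, a)), set())
            if len(c) == 2:
                T[i, j] = AMBIGUOUS
            elif kind in c:
                T[i, j] = 1
    return T
```

Ambiguous pairs can then be dropped from an evaluation with a mask such as `T != AMBIGUOUS`.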
Agreement between the datasets
The agreement between each pair of datasets was evaluated by calculating the Pearson and Spearman correlations across the mutant pairs shared by both datasets. Their agreement in terms of extreme fitness values or interaction scores was evaluated by constructing interaction matrices in which one of the datasets defines the positive and negative genetic interactions. We used the 3% most extreme mutant pairs, a level consistent with both the interaction rate estimated from unbiased screens (3.15% [11]) and the rate of BioGRID interactions within the three pairwise dataset intersections (2.99%; Table 2). Other cutoff levels (1% and 5%) were also considered (Additional File 7).
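These agreement measures can be sketched as below, assuming both datasets are query-by-array NumPy arrays aligned on the same mutants, with NaN marking unmeasured pairs. Spearman correlation is computed as the Pearson correlation of average ranks. Taking the lowest values as negative and the highest as positive interactions is our reading of "extreme", and the function names are hypothetical.

```python
import numpy as np

def average_ranks(v):
    """Ranks 1..n with tied values sharing their mean rank."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="stable")
    ranks = np.empty(len(v))
    ranks[order] = np.arange(1, len(v) + 1)
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=ranks)
    return sums[inverse] / counts[inverse]

def agreement(W1, W2):
    """Pearson and Spearman correlations over mutant pairs measured in both matrices."""
    shared = ~np.isnan(W1) & ~np.isnan(W2)
    a, b = W1[shared], W2[shared]
    pearson = np.corrcoef(a, b)[0, 1]
    spearman = np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]
    return pearson, spearman

def extreme_reference(W, fraction=0.03):
    """Flag the lowest / highest `fraction` of observed values as negative / positive."""
    lo, hi = np.quantile(W[~np.isnan(W)], [fraction, 1 - fraction])
    positive = W >= hi  # NaN compares False, so unmeasured pairs are never flagged
    negative = W <= lo
    return positive, negative
```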
Receiver operating characteristics
Receiver operating characteristic (ROC) curves were used to assess the discovery rate of genetic interactions. An ROC curve summarizes the trade-off between the true positive rate (TPR) and the false positive rate (FPR) along a ranked list of mutant pairs. True and false interactions were defined here by the interaction matrices (either from BioGRID or from the 3% extreme values). Overall prediction performance was summarized by the area under the ROC curve (AUC). An ideal classifier reaches TPR = 1 at FPR = 0 and thus has AUC = 1, whereas a random classifier has an expected AUC of 0.5.
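A minimal ROC/AUC sketch, assuming a score per mutant pair and a boolean label taken from an interaction matrix (ambiguous pairs already removed). Tied scores are treated as a single threshold and the AUC is computed with the trapezoidal rule; the function names are hypothetical.

```python
import numpy as np

def roc_curve(scores, labels):
    """FPR and TPR at every distinct score threshold, highest scores first."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores, kind="stable")
    s, y = scores[order], labels[order]
    # Last index of each run of tied scores, so tied pairs enter the curve together.
    distinct = np.r_[np.nonzero(np.diff(s))[0], len(s) - 1]
    tp = np.cumsum(y)[distinct]
    fp = np.cumsum(~y)[distinct]
    tpr = np.r_[0.0, tp / y.sum()]
    fpr = np.r_[0.0, fp / (~y).sum()]
    return fpr, tpr

def auc(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```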
Partial AUC and early sensitivity
In many practical applications, only the top few candidate mutant pairs can be followed up in validation studies. It is therefore important to evaluate the performance of a mutant-pair ranking also at low FPR levels, that is, at high specificity. We used the partial area under the ROC curve (pAUC), in which the FPR range is limited to a predefined interval from zero to r (here r = 0.1), and the resulting area is normalized by dividing it by r. To investigate the early sensitivity of the detections, we also calculated the TPR at an FPR of 0.01.
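Both early-performance measures can be computed from the ROC points, as in the sketch below. It assumes `fpr` and `tpr` come from any ROC routine, start at (0, 0) and are sorted by non-decreasing FPR, and it interpolates linearly between ROC points; the function names are hypothetical.

```python
import numpy as np

def tpr_at_fpr(fpr, tpr, level):
    """TPR at a given FPR, interpolated linearly between ROC points."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    i = np.searchsorted(fpr, level, side="right") - 1  # last point with fpr <= level
    if i >= len(fpr) - 1:
        return float(tpr[-1])
    x0, x1, y0, y1 = fpr[i], fpr[i + 1], tpr[i], tpr[i + 1]
    return float(y0 + (y1 - y0) * (level - x0) / (x1 - x0))

def partial_auc(fpr, tpr, r=0.1):
    """Area under the ROC curve for FPR in [0, r], divided by r (a perfect ranking gives 1)."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    keep = fpr <= r
    x = np.r_[fpr[keep], r]
    y = np.r_[tpr[keep], tpr_at_fpr(fpr, tpr, r)]
    area = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2)
    return float(area / r)
```

With r = 0.1 the early sensitivity reported in the text corresponds to `tpr_at_fpr(fpr, tpr, 0.01)`.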
Enrichment of genetic interactions
Here, K is the total number of gene pairs in the grid, M is the total number of (positive or negative) interactions (M ≤ K), m is the number of interactions found in a particular grid cell (m ≤ M), and t is the number of gene pairs in that grid cell (m ≤ t ≤ K). The p-values in the figures were limited to the range between 10^{-100} and 0.99.
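Assuming the enrichment p-value is the standard hypergeometric upper tail, i.e. the probability of finding at least m interactions among t pairs drawn from K pairs containing M interactions,

$$p = \sum_{i=m}^{\min(t,M)} \frac{\binom{M}{i}\binom{K-M}{t-i}}{\binom{K}{t}},$$

it can be evaluated in log space (to stay stable for large grids) and clipped to the reported range as sketched below. The function names are hypothetical.

```python
import math

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def enrichment_pvalue(K, M, t, m, lower=1e-100, upper=0.99):
    """P(X >= m) for X ~ Hypergeometric(K pairs, M interactions, t pairs in the cell), clipped."""
    hi = min(t, M)
    if m > hi:
        return lower  # the tail probability is zero; report the clipping floor
    start = max(m, t - (K - M), 0)  # terms below t - (K - M) are impossible
    log_terms = [log_comb(M, i) + log_comb(K - M, t - i) - log_comb(K, t)
                 for i in range(start, hi + 1)]
    peak = max(log_terms)
    p = math.exp(peak) * sum(math.exp(v - peak) for v in log_terms)  # log-sum-exp
    return min(max(p, lower), upper)
```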
Implementation issues
To promote its widespread use in future screening studies, we have made publicly available an efficient, stand-alone R implementation of the quantile-based matrix approximation procedure (QMAP). The implementation includes a number of user-adjustable options, accessible through a graphical user interface, for fine-tuning the procedure to a given experimental dataset and downstream analysis objective. Along with the open-source R code, the implementation contains documentation of the input data format, the parameters of the various options, and the output data of QMAP (Additional File 8).
List of abbreviations
AUC: area under the curve
E-MAP: epistatic miniarray profiling
FPR: false positive rate
GIM: genetic interaction mapping
pAUC: partial area under the curve
PPI: protein-protein interaction
QMA: quantile-based matrix approximation
ROC: receiver operating characteristic
SGA: synthetic genetic array
SLAM: synthetic lethality analysis by microarray
TPR: true positive rate
Declarations
Acknowledgements
The authors thank Prof. Charlie Boone and Dr. Cosmin Saveanu for providing us with the quantitative SGA and GIM datasets, respectively. The work was supported by the Academy of Finland (grants 120 569, 133 227 and 140 880).
References
1. Boone C, Bussey H, Andrews BJ: Exploring genetic interactions and networks with yeast. Nat Rev Genet. 2007, 8: 437-449. 10.1038/nrg2085
2. Dixon SJ, Costanzo M, Baryshnikova A, Andrews B, Boone C: Systematic mapping of genetic interaction networks. Annu Rev Genet. 2009, 43: 601-625. 10.1146/annurev.genet.39.073003.114751
3. Beltrao P, Cagney G, Krogan NJ: Quantitative genetic interactions reveal biological modularity. Cell. 2010, 141: 739-745. 10.1016/j.cell.2010.05.019
4. Schuldiner M, Collins SR, Thompson NJ, Denic V, Bhamidipati A, Punna T, Ihmels J, Andrews B, Boone C, Greenblatt JF, Weissman JS, Krogan NJ: Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell. 2005, 123: 507-519. 10.1016/j.cell.2005.08.031
5. Collins SR, Miller KM, Maas NL, Roguev A, Fillingham J, Chu CS, Schuldiner M, Gebbia M, Recht J, Shales M, Ding H, Xu H, Han J, Ingvarsdottir K, Cheng B, Andrews B, Boone C, Berger SL, Hieter P, Zhang Z, Brown GW, Ingles CJ, Emili A, Allis CD, Toczyski DP, Weissman JS, Greenblatt JF, Krogan NJ: Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map. Nature. 2007, 446: 806-810. 10.1038/nature05649
6. Wilmes GM, Bergkessel M, Bandyopadhyay S, Shales M, Braberg H, Cagney G, Collins SR, Whitworth GB, Kress TL, Weissman JS, Ideker T, Guthrie C, Krogan NJ: A genetic interaction map of RNA-processing factors reveals links between Sem1/Dss1-containing complexes and mRNA export and splicing. Mol Cell. 2008, 32: 735-746. 10.1016/j.molcel.2008.11.012
7. Fiedler D, Braberg H, Mehta M, Chechik G, Cagney G, Mukherjee P, Silva AC, Shales M, Collins SR, van Wageningen S, Kemmeren P, Holstege FC, Weissman JS, Keogh MC, Koller D, Shokat KM, Krogan NJ: Functional organization of the S. cerevisiae phosphorylation network. Cell. 2009, 136: 952-963. 10.1016/j.cell.2008.12.039
8. Decourty L, Saveanu C, Zemam K, Hantraye F, Frachon E, Rousselle JC, Fromont-Racine M, Jacquier A: Linking functionally related genes by sensitive and quantitative characterization of genetic interaction profiles. Proc Natl Acad Sci USA. 2008, 105: 5821-5826. 10.1073/pnas.0710533105
9. Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Pagé N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, Andrews B, Tyers M, Boone C: Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science. 2001, 294: 2364-2368. 10.1126/science.1065810
10. Tong AH, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M, Chen Y, Cheng X, Chua G, Friesen H, Goldberg DS, Haynes J, Humphries C, He G, Hussein S, Ke L, Krogan N, Li Z, Levinson JN, Lu H, Ménard P, Munyana C, Parsons AB, Ryan O, Tonikian R, Roberts T, et al.: Global mapping of the yeast genetic interaction network. Science. 2004, 303: 808-813. 10.1126/science.1091317
11. Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, Prinz J, St Onge RP, VanderSluis B, Makhnevych T, Vizeacoumar FJ, Alizadeh S, Bahr S, Brost RL, Chen Y, Cokol M, Deshpande R, Li Z, Lin ZY, Liang W, Marback M, Paw J, San Luis BJ, Shuteriqi E, Tong AH, van Dyk N, et al.: The genetic landscape of a cell. Science. 2010, 327: 425-431. 10.1126/science.1180823
12. Hartman JL, Garvik B, Hartwell L: Principles for the buffering of genetic variation. Science. 2001, 291: 1001-1004. 10.1126/science.291.5506.1001
13. Segrè D, Deluna A, Church GM, Kishony R: Modular epistasis in yeast metabolism. Nat Genet. 2005, 37: 77-83.
14. Davierwala AP, Haynes J, Li Z, Brost RL, Robinson MD, Yu L, Mnaimneh S, Ding H, Zhu H, Chen Y, Cheng X, Brown GW, Boone C, Andrews BJ, Hughes TR: The synthetic genetic interaction spectrum of essential genes. Nat Genet. 2005, 37: 1147-1152. 10.1038/ng1640
15. Ooi SL, Pan X, Peyser BD, Ye P, Meluh PB, Yuan DS, Irizarry RA, Bader JS, Spencer FA, Boeke JD: Global synthetic-lethality analysis and yeast functional profiling. Trends Genet. 2006, 22: 56-63. 10.1016/j.tig.2005.11.003
16. Jasnos L, Korona R: Epistatic buffering of fitness loss in yeast double deletion strains. Nat Genet. 2007, 39: 550-554. 10.1038/ng1986
17. Beyer A, Bandyopadhyay S, Ideker T: Integrating physical and genetic maps: from genomes to interaction networks. Nat Rev Genet. 2007, 8: 699-710. 10.1038/nrg2144
18. Ulitsky I, Shamir R: Pathway redundancy and protein essentiality revealed in the Saccharomyces cerevisiae interaction networks. Mol Syst Biol. 2007, 3: 104. 10.1038/msb4100144
19. Lehner B: Modelling genotype-phenotype relationships and human disease with genetic interaction networks. J Exp Biol. 2007, 210: 1559-1566. 10.1242/jeb.002311
20. Phillips PC: Epistasis - the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 2008, 9: 855-867. 10.1038/nrg2452
21. Gao H, Granka JM, Feldman MW: On the classification of epistatic interactions. Genetics. 2010, 184: 827-837. 10.1534/genetics.109.111120
22. Breker M, Schuldiner M: Explorations in topology-delving underneath the surface of genetic interaction maps. Mol Biosyst. 2009, 5: 1473-1481. 10.1039/b907076c
23. Hart GT, Ramani AK, Marcotte EM: How complete are current yeast and human protein-interaction networks? Genome Biol. 2006, 7: 120. 10.1186/gb-2006-7-11-120
24. Goll J, Uetz P: The elusive yeast interactome. Genome Biol. 2006, 7: 223.
25. Gentleman R, Huber W: Making the most of high-throughput protein-interaction data. Genome Biol. 2007, 8: 112. 10.1186/gb-2007-8-10-112
26. Futschik ME, Chaurasia G, Herzel H: Comparison of human protein-protein interaction maps. Bioinformatics. 2007, 23: 605-611. 10.1093/bioinformatics/btl683
27. Venkatesan K, Rual JF, Vazquez A, Stelzl U, Lemmens I, Hirozane-Kishikawa T, Hao T, Zenkner M, Xin X, Goh KI, Yildirim MA, Simonis N, Heinzmann K, Gebreab F, Sahalie JM, Cevik S, Simon C, de Smet AS, Dann E, Smolyar A, Vinayagam A, Yu H, Szeto D, Borick H, Dricot A, Klitgord N, Murray RR, Lin C, Lalowski M, Timm J, et al.: An empirical framework for binary interactome mapping. Nat Methods. 2009, 6: 83-90. 10.1038/nmeth.1280
28. Braun P, Tasan M, Dreze M, Barrios-Rodiles M, Lemmens I, Yu H, Sahalie JM, Murray RR, Roncari L, de Smet AS, Venkatesan K, Rual JF, Vandenhaute J, Cusick ME, Pawson T, Hill DE, Tavernier J, Wrana JL, Roth FP, Vidal M: An experimentally derived confidence score for binary protein-protein interactions. Nat Methods. 2009, 6: 91-97. 10.1038/nmeth.1281
29. Collins SR, Schuldiner M, Krogan NJ, Weissman JS: A strategy for extracting and analyzing large-scale quantitative epistatic interaction data. Genome Biol. 2006, 7: R63. 10.1186/gb-2006-7-7-r63
30. Koh JL, Ding H, Costanzo M, Baryshnikova A, Toufighi K, Bader GD, Myers CL, Andrews BJ, Boone C: DRYGIN: a database of quantitative genetic interaction networks in yeast. Nucleic Acids Res. 2010, 38: D502-D507. 10.1093/nar/gkp820
31. Eronen VP, Lindén RO, Lindroos A, Kanerva M, Aittokallio T: Genome-wide scoring of positive and negative epistasis through decomposition of quantitative genetic interaction fitness matrices. PLoS One. 2010, 5: e11611. 10.1371/journal.pone.0011611
32. Ulitsky I, Krogan NJ, Shamir R: Towards accurate imputation of quantitative genetic interactions. Genome Biol. 2009, 10: R140. 10.1186/gb-2009-10-12-r140
33. Ryan C, Greene D, Cagney G, Cunningham P: Missing value imputation for epistatic MAPs. BMC Bioinformatics. 2010, 11: 197. 10.1186/1471-2105-11-197
34. Järvinen AP, Hiissa J, Elo LL, Aittokallio T: Predicting quantitative genetic interactions by means of sequential matrix approximation. PLoS One. 2008, 3: e3284.
35. Mani R, St Onge RP, Hartman JL, Giaever G, Roth FP: Defining genetic interaction. Proc Natl Acad Sci USA. 2008, 105: 3461-3466. 10.1073/pnas.0712255105
36. St Onge RP, Mani R, Oh J, Proctor M, Fung E, Davis RW, Nislow C, Roth FP, Giaever G: Systematic pathway analysis using high-resolution fitness profiling of combinatorial gene deletions. Nat Genet. 2007, 39: 199-206. 10.1038/ng1948
37. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006, 34: D535-D539. 10.1093/nar/gkj109
38. Le Meur N, Gentleman R: Modeling synthetic lethality. Genome Biol. 2008, 9: R135. 10.1186/gb-2008-9-9-r135
39. Bandyopadhyay S, Kelley R, Krogan NJ, Ideker T: Functional maps of protein complexes from quantitative genetic interaction data. PLoS Comput Biol. 2008, 4: e1000065. 10.1371/journal.pcbi.1000065
40. Ulitsky I, Shlomi T, Kupiec M, Shamir R: From E-MAPs to module maps: dissecting quantitative genetic interactions using physical interactions. Mol Syst Biol. 2008, 4: 209. 10.1038/msb.2008.42
41. Fischbach MA, Krogan NJ: The next frontier of systems biology: higher-order and interspecies interactions. Genome Biol. 2010, 11: 208.
42. Van Driessche N, Demsar J, Booth EO, Hill P, Juvan P, Zupan B, Kuspa A, Shaulsky G: Epistasis analysis with global transcriptional phenotypes. Nat Genet. 2005, 37: 471-477. 10.1038/ng1545
43. Harrison R, Papp B, Pál C, Oliver SG, Delneri D: Plasticity of genetic interactions in metabolic networks of yeast. Proc Natl Acad Sci USA. 2007, 104: 2307-2312. 10.1073/pnas.0607153104
44. Tischler J, Lehner B, Fraser AG: Evolutionary plasticity of genetic interaction networks. Nat Genet. 2008, 40: 390-391. 10.1038/ng.114
45. Dixon SJ, Andrews BJ, Boone C: Exploring the conservation of synthetic lethal genetic interaction networks. Commun Integr Biol. 2009, 2: 78-81.
46. Jonikas MC, Collins SR, Denic V, Oh E, Quan EM, Schmid V, Weibezahn J, Schwappach B, Walter P, Weissman JS, Schuldiner M: Comprehensive characterization of genes required for protein folding in the endoplasmic reticulum. Science. 2009, 323: 1693-1697. 10.1126/science.1167983
47. Battle A, Jonikas MC, Walter P, Weissman JS, Koller D: Automated identification of pathways from quantitative genetic interaction data. Mol Syst Biol. 2010, 6: 379. 10.1038/msb.2010.27
48. Smialowski P, Pagel P, Wong P, Brauner B, Dunger I, Fobo G, Frishman G, Montrone C, Rattei T, Frishman D, Ruepp A: The Negatome database: a reference set of non-interacting protein pairs. Nucleic Acids Res. 2010, 38: D540-D544. 10.1093/nar/gkp1026
49. Collins SR, Roguev A, Krogan NJ: Quantitative genetic interaction mapping using the E-MAP approach. Methods Enzymol. 2010, 470: 205-231.
50. Pan X, Yuan DS, Xiang D, Wang X, Sookhai-Mahadeo S, Bader JS, Hieter P, Spencer F, Boeke JD: A robust toolkit for functional profiling of the yeast genome. Mol Cell. 2004, 16: 487-496. 10.1016/j.molcel.2004.09.035
51. Pan X, Yuan DS, Ooi SL, Wang X, Sookhai-Mahadeo S, Meluh P, Boeke JD: dSLAM analysis of genome-wide genetic interactions in Saccharomyces cerevisiae. Methods. 2007, 41: 206-221. 10.1016/j.ymeth.2006.07.033
52. Baryshnikova A, Costanzo M, Dixon S, Vizeacoumar FJ, Myers CL, Andrews B, Boone C: Synthetic genetic array (SGA) analysis in Saccharomyces cerevisiae and Schizosaccharomyces pombe. Methods Enzymol. 2010, 470: 145-179.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.