Knowledge based identification of essential signaling from genome-scale siRNA experiments
© Bankhead et al; licensee BioMed Central Ltd. 2009
Received: 10 February 2009
Accepted: 5 August 2009
Published: 5 August 2009
A systems biology interpretation of genome-scale RNA interference (RNAi) experiments is complicated by scope, experimental variability and network signaling robustness. Over representation approaches (ORA), such as the Hypergeometric or z-score, are an established statistical framework used to associate RNA interference effectors to biologically annotated gene sets or pathways. These methods, however, do not directly take advantage of our growing understanding of the interactome. Furthermore, these methods can miss partial pathway activation and may be biased by protein complexes. Here we present a novel ORA, protein interaction permutation analysis (PIPA), that takes advantage of canonical pathways and established protein interactions to identify pathways enriched for protein interactions connecting RNAi hits.
We use PIPA to analyze genome-scale siRNA cell growth screens performed in HeLa and TOV cell lines. First we show that interacting gene pair siRNA hits are more reproducible than single gene hits. Using protein interactions, PIPA identifies enriched pathways not found using the standard Hypergeometric analysis including the FAK cytoskeletal remodeling pathway. Different branches of the FAK pathway are distinctly essential in HeLa versus TOV cell lines while other portions are uneffected by siRNA perturbations. Enriched hits belong to protein interactions associated with cell cycle regulation, anti-apoptosis, and signal transduction.
PIPA provides an analytical framework to interpret siRNA screen data by merging biologically annotated gene sets with the human interactome. As a result we identify pathways and signaling hypotheses that are statistically enriched to effect cell growth in human cell lines. This method provides a complementary approach to standard gene set enrichment that utilizes the additional knowledge of specific interactions within biological gene sets.
The ability to study a gene's contribution to phenotype through RNA interference (RNAi) has provided unprecedented insight to the essential biology of mammalian cell lines. RNAi knockdowns inhibit messenger RNA translation leading to changes in protein concentration, protein interactions, transcription, and ultimately an effect on phenotype [1–3]. Genome-scale siRNA phenotype screens consist of thousands of targeted perturbation experiments to identify significant effectors on a phenotype of interest, such as cell growth. As these high-throughput screens become more automated and less expensive, there is a growing demand to associate siRNA hits with the interactome.
Unfortunately, the interpretation of genome-scale RNAi phenotype screens is complicated by several sources of experimental variability. Off-target effects arise when the change in phenotype is not a result of a targeted mRNA knockdown, but rather the knockdown of some other mRNA. Cell-line specific differences in RNAi efficacy may result in attenuated knockdown phenotypes for essential effector genes [4–6]. Furthermore, the robustness of genetic regulatory networks complicates the analysis of RNAi phenotype data. Gene knockout studies have demonstrated that a minority of genes, only 19% in S. cerevisiae, are lethal when deleted under laboratory growth conditions . Genome-scale knockdown studies in Drosophila and human cell lines also demonstrate that a relatively small proportion of knockdowns affect growth phenotypes [8, 9]. Several reasons for robustness include signaling modularity, redundancy and feedback loops [2, 10–12]. As a result, knockdowns that cause an impaired growth phenotype provide a glimpse to uncommonly sensitive areas of cell signaling.
Gene set enrichment methods are a conventional tool in the analysis of high throughput datasets. These established statistical protocols were originally used to associate differentially expressed genes from microarray experiments with biologically annotated gene sets such as Gene Ontology (GO) categories, canonical pathways, or protein complexes [5, 8, 13–16]. These over representation approaches (ORA) use a statistic, such as Hypergeometric or average z-score, to assign a p-value that is the probability of seeing the observed overlap of a gene hit list and gene set by chance. ORA methods, however, do not directly take interactions between specific set members into account and this is additional biological information that can be utilized in knowledge-based enrichment approaches. For example, the EGFR pathway contains four types of ErbB family tyrosine kinase receptors that are activated by distinct ligands (e.g. EGF, TGFα) and initiate distinct signal transduction cascades . Consequently, the specific combination of screen hits represented in a pathway provides additional information beyond the simple count of hits occurring in this pathway. An ORA that takes advantage of known connectivity between gene set members provides a complementary view to the results provided by conventional enrichment methods (i.e. the Hypergeometric) and identify signaling events that are enriched for siRNA hits.
To our knowledge, the only pathway enrichment method that takes advantage of knowledge of specific interactions within gene sets was presented by Draghici et al. to analyze gene expression signatures. An impact analysis is used to count all possible paths (interactions) between differentially expressed genes in KEGG pathways. Unfortunately the pathway score is weighted by classic Hypergeometric enrichment analysis (HGA) and the authors do not discuss how results differ based solely on intra-pathway connectivity. This method is also subject to connectivity biases of each gene product causing highly connected genes to be counted in more paths.
Several other papers have interpreted RNAi data using protein interactions. Work by Friedman et al. combined an RNAi screen assayed by protein readout of extra-cellular regulating kinase (ERK) with literature-curated protein interactions to produce a protein interaction network . Another approach recently published by Huang et al. uses gene set enrichment of GO categories to select hits and then connects them with literature-reported protein interactions . The interactome is known to be highly-connected so it is not surprising that protein interactions are found between hits. These approaches do not take into account difference in connectivity between gene products. This bias is cause for concern because knockdown gene hits that are involved in many annotated protein interactions are more likely to be connected with other hits simply by chance.
We propose here an ORA method called protein interaction permutation analysis (PIPA) that takes advantage of literature-curated protein interactions between gene products within gene sets. This method uses a graph permutation algorithm to create a null distribution that takes into account connectivity biases of the known interactome to identify genet sets with statistically enriched interactions between RNAi targeted gene products. It is our hypothesis that pathways enriched for interactions connecting RNAi hits effecting cell growth capitulate essential signaling.
We use PIPA to analyze siRNA cell growth phenotype screens performed on HeLa and TOV cell lines. To justify using a gene interaction ORA, we show that hits mapping to interacting gene pairs are more reproducible between replicate screens. Next we show that PIPA uniquely identifies statistically enriched pathways that the Hypergeometric does not. Finally we produce a global essential signaling network using enriched protein interactions and annotate portions of this network using GO categories.
Results and Discussion
Protein Interaction Permutation Analysis (PIPA) Algorithm
For a given gene set, PIPA identifies the probability of seeing an observed number of protein interactions between siRNA hits by chance. We start with a gene set derived from a GeneGo Metacore Pathway. This gene set, G, is filtered to only contain genes targeted in the siRNA screen. Gene set members are labeled as "hits" or "non-hits" as described in the methods section. Gene set members are connected using literature-curated interactions to create a network N G where genes are represented as nodes and interactions are represented as edges. We label the number of observed edges connecting hits as O.
A network permutation method is used to derive a null distribution by permuting node labels. This approach is an extension of the graph permutation algorithm presented by Balasubramanian et al. and allows the topology of N G to remain constant . By shuffling node labels we sidestep the connection bias of highly annotated (highly connected) gene set members. A minority of the knockdown genes in our siRNA screen are identified as hits (~6%) and accordingly a minority of permutations will label these highly connected nodes as hits.
Starting with network N G , we shuffle node labels with all other genes targeted in the screen to create a shuffled version of N G we refer to as . The number of edges connecting hits within is recorded. This process is iterated 10,000 times and a p-value is calculated as the proportion of sampled permutations that have O or greater edges connecting hits. We use a p-value threshold of 0.05 to reject the null hypothesis that the number of observed edges connecting hits is what we would expect by chance. Because of differences in overlap between members and networks across gene sets, we leave p-values unadjusted for multiple testing for a fair comparison between PIPA and conventional hypergeometric enrichment analysis.
We divide the global interactome network into sub-networks that are based on canonical gene sets for several reasons. As an initial experiment, we applied the above permutation algorithm to the global network and found that hits were not significantly connected. However when focusing on sub-networks derived from gene sets, we find significant enrichment for highly connected hits. This finding is consistent with work by Horvath et al. where network properties from co-expression sub-networks (modules) are more predictive than network properties from the global co-expression network . Another advantage to dividing the global interactome into gene sets, is that by their definition canonical gene sets are biologically annotated and provide an intrinsic systems biology explanation of hits. Finally, by focusing on these sub-networks we are able to divide-and-conquer the global network in a biologically supported and computationally tractable manner.
Interacting gene pair siRNA hits are more reproducible than single gene hits
Of note, the requirement that two hits share an interaction substantially reduces the number of edge hits compared to node hits. If we view nodes that are connected by edges as random independent variables with a ~6% chance of being a hit, the likelihood of drawing two hits by chance (~0.36%) is dramatically smaller. The use of interaction data to select among hits, consequently, substantially reduces the total number of hits while simultaneously enriching for hits that are more likely to be reproduced in replicate screens. This observation supports our hypothesis that analyzing high-throughput siRNA data in the context of protein interactions is likely to enable identification of essential signaling cascades.
PIPA identifies both protein complexes and signaling interactions
When PIPA is carried out using all 16,478 protein interactions, the majority of edge hits in enriched gene sets result from binding interactions (86.9%). Of these enriched binding edges, 72.5% are from protein complexes, as defined by GO Cellular Component categories that are children of the "protein complex" category (GO:0043234). It is expected that sub-units of protein complexes will have similar phenotypes in RNAi screens, and in fact RNAi studies often use essential protein complexes to identify false negatives because each knockdown targeted complex sub-unit is known to have a significant effect on phenotype [9, 15]. Additionally protein interaction databases typically include binding interactions between each sub-unit of large complexes, for example the eukaryotic initiation factor 3 (EIF3) or proteasome, resulting in highly-connected graphs between 13 or 27 complex sub-units (respectively) . As a result, any edge-based ORA will be biased by these highly-connected portions of the global interactome map.
Although the identification of interactions between sub-units of essential protein complexes represents true biology, these highly connected pieces of the interactome obscure regulatory protein interactions which are of more interest for understanding cell signaling. Consequently, to identify essential signaling cascades, we restricted the interaction data set to 11,017 phosphorylation and expression regulation interactions. With this signaling-focused interaction set, 80 edge hits in enriched pathways were identified using PIPA. Of note, while the initial interaction set contained roughly equal proportions of phosphorylation and expression regulation edges, the edge hits from the enriched pathways were significantly enriched for phosphorylation interactions (Figure 2), additionally supporting the hypothesis that PIPA is detecting essential signaling events. Interestingly, this result also suggests that expression regulation is more robust to perturbation than is phosphorylation. This observation may reflect the fact that expression regulation interactions reported in these databases are more likely to be indirect interactions mediated by other genes and gene products than are phosphorylation interactions.
PIPA identifies distinct enriched pathways relative to the Hypergeometric
PIPA detects active signaling branches with pathway gene sets
Essential Signaling Network Identified in HeLa and TOV screens
Genome scale siRNA screens provide an exciting opportunity to investigate the interactome. One caveat of these high-throughput screens is that generally only a few cell attributes are measured after each perturbation. There is a strong need to map hits to the interactome for biological interpretation and generation of signaling hypotheses. Here we present an ORA that incorporates protein interactions to identify canonical signaling pathways enriched for siRNA hits. This method extends and complements traditional gene set enrichment methods by specifically incorporating prior knowledge about the interactions within a pathway using a graph-based permutation algorithm. Also, PIPA provides an enrichment framework to exclusively examine signaling events within canonical pathways without the biases introduced by highly-connected multi-subunit complexes. When applied to genome wide siRNA screens in HeLa and TOV cell lines, the edge hits in identified pathways are enriched for GO categories relevant to the cell cycle, anti-apoptosis, and cytoskeletal structure. Of note, different branches of the FAK Signaling pathway are specifically effected by siRNA perturbations in HeLa versus TOV cells lines.
Protein interaction databases provide an initial view of the human interactome that is known to be incomplete . In addition, there is no universal standard for defining and annotating interactions across databases leading to differences in interaction accuracy. Also, the contextual information about interactions is generally absent from such databases, making it difficult to identify coherent sets of interactions that constitute functional signaling networks. These limitations in the known interactome inevitably lead to limitations in any methods relying on interactome data. Here we combine interactome interactions with canonical gene sets to partially address some of these limitations. As the quality of interactome databases increase, it is expected that the increased coverage and contextual information will be reflected in the improved ability of edge-based enrichment methods, including PIPA, to identify biologically meaningful signaling modules.
The process of natural selection is present in human tumors and a driving force behind cancer treatment resistance . Thus, it is critical to understand what pathways and protein interactions are sensitive to perturbations in a given interactome. Pathways enriched for interactions connecting siRNA hits make available an exciting portrait of essential signaling that is cell line specific and has the potential to guide future treatment strategies.
Genome-scale siRNA screens
Genome-scale siRNA knockdown screens were performed in duplicate on HeLa and TOV cell lines essentially as described before . The siRNA library is comprised of 18,586 unique siRNA pools, where each pool is an equimolar mixture of three siRNAs targeting different sequences of the same mRNA transcript. Both Hela and TOV21G cells were cultured in Dulbecco's Modified Essential Medium (DMEM) supplemented with 10% fetal bovine serum and penicillin/streptomycin. For the assay, 400 cells/well (Hela) or 1000 cells/well (TOV21G) in DMEM were seeded in wells of a 384-well TC-treated microplate. 24 hrs after cell plating, diluted Oligofectamine (1:40 in Opti-MEM) was incubated with siRNA for 20–30 min, and cells were transfected with the Oligofecatmine/siRNA mixture. Final concentration of siRNA pools was 50 nM and 25 nM for Hela and TOV21G screens, respectively. 72 hours post-transfection, cell viability was measured by Alamar Blue fluorescence assay. An siRNA to luciferase was used as a negative control (100% viability) and a no cell control was used a 0% viability control. A siRNA to Polo-like kinase 1 (Plk1) was used as a positive biological control to monitor transfection efficiency throughout the screens.
Cell growth phenotype values from siRNA screens were converted to z-scores, and quantile normalization was performed across all four screens to make a z-score hit threshold comparable across screens. Note that normalization steps result in the same number of node hits for replicate screens shown in Figure 1. For enrichment analyses, z-scores for each knockdown in duplicate screens were averaged together within each cell line. Averaged z-scores less than -1.5 were labeled as siRNA hits having a strong negative effect on cell growth. This threshold is based on other published genome-scale RNAi analyses and chosen to include expected essential genes (PLK1, KIF11, ARPC3) [4, 27–29]. In our enrichment analysis, 1,133 (6.1%) HeLa and 1,108 (6.0%) TOV knockdowns meet this criterion and are labeled as hits. Results shown in Figure 1 comparing replicate screens are non-averaged z-scores using the same z-score threshold to label hits.
Interactome Data Set and Canonical Gene Sets
Literature-curated interactions from GeneGo, Inc., Ingenuity®, and the Human Protein Reference Database (HPRD) were combined and filtered to be unique, non-self, and non-reciprocal [30–32]. There were 98,561 annotated interactions pertaining to phosphorylation, gene expression and physical binding events that mapped to siRNA targeted genes. We focused on these three interaction types as they were universal between all three protein interaction databases and made up the majority of interactions. GeneGo's Metacore canonical pathways (524 total) were used to provide contextual information to the global interactome map by grouping genes and/or gene products into sets that function together in a canonical biological processes. The intersection of the global interactome map and each canonical gene set was used to approximate topologically well-defined signaling pathways.
Overlap effect size calculations
Overlap effect size is estimated by computing the odds ratio on a 2 × 2 table obtained by classifying the relations of replicate screens. The edge hit universe (11,426) is calculated by counting the number of unique siRNA knockdowns belonging to protein interactions where both interacting partners are targeted in the screen. The edge hit overlap is calculated by taking the intersection of hits belonging to protein interactions connecting two siRNA hits.
small interfering RNA
protein interaction permutation analysis
human protein reference database
focal adhesion kinase
Hypergeometric enrichment analysis.
- Iorns E, Lord CJ, Turner N, Ashworth A: Utilizing RNA interference to enhance cancer drug discovery. Nature Reviews Drug Discovery. 2007, 6: 556-568. 10.1038/nrd2355View ArticlePubMedGoogle Scholar
- Friedman A, Perrimon N: Genetic Screening for Signal Transduction in the Era of Network Biology. Cell. 2007, 128: 225-231. 10.1016/j.cell.2007.01.007View ArticlePubMedGoogle Scholar
- Martin SE, Caplen NJ: Applications of RNA interference in mammalian systems. Annual review of genomics and human genetics. 2007, 8: 81-108. 10.1146/annurev.genom.8.080706.092424View ArticlePubMedGoogle Scholar
- Friedman A, Perrimon N: High-throughput approaches to dissecting MAPK signaling pathways. Methods. 2006, 40: 262-271. 10.1016/j.ymeth.2006.05.002View ArticlePubMedGoogle Scholar
- Boutros M, Bras LP, Huber W: Analysis of cell-based RNAi screens. Genome Biology. 2006, 7: R66- 10.1186/gb-2006-7-7-r66PubMed CentralView ArticlePubMedGoogle Scholar
- Haney SA: Increasing the robustness and validity of RNAi screens. Pharmacogenomics. 2007, 8 (8): 1037-1049. 10.2217/14622422.214.171.1247View ArticlePubMedGoogle Scholar
- Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B, et al.: Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002, 418: 387-391. 10.1038/nature00935View ArticlePubMedGoogle Scholar
- Boutros M, Kiger AA, Armknecht S, Kerr K, Hild M, Koch B, Haas SA, Consortium HFA, Paro R, Perrimon N: Genome-wide RNAi Analysis of Growth and Viability in Drosophila Cells. Science. 2004, 303: 832-835. 10.1126/science.1091266View ArticlePubMedGoogle Scholar
- Kittler R, Pelletier L, Heninger A-K, Slabicki M, Theis M, Miroslaw L, Poser I, Lawo S, Grabner H, Kozak K, et al.: Genome-scale RNAi profiling of cell division in human tissue culture cells. Nature Cell Biology. 2007, 9 (12): 1401-1412. 10.1038/ncb1659View ArticlePubMedGoogle Scholar
- Brunner OM, Brunner H: The modular nature of genetic diseases. Clinical Genetics. 2006, 71: 1-11.View ArticleGoogle Scholar
- Deutscher D, Meilijson I, Kupiec M, Ruppin E: Multiple knockout analysis of genetic robustness in the yeast metabolic network. Nature Genetics. 2006, 38 (9): 993-998. 10.1038/ng1856View ArticlePubMedGoogle Scholar
- Sacher R, Stergiou L, Pelkmans L: Lessons from genetics: interpreting complex phenotypes in RNAi screens. Current opinion in cell biology. 2008, 20 (4): 483-489. 10.1016/j.ceb.2008.06.002View ArticlePubMedGoogle Scholar
- Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, D258-261. 32 Database
- Curtis RK, Oresic M, Vidal-Puig A: Pathways to the analysis of microarray. Trends in Biotechnology. 2005, 23 (8): 429-435. 10.1016/j.tibtech.2005.05.011View ArticlePubMedGoogle Scholar
- Meur NL, Gentleman R: Assessing the role of multi-protein complexes and pathways in determining phenotype. Bioconductor Project Working Papers. 2008, http://www.bepress.com/bioconductor/paper13Google Scholar
- Levine DM, Haynor DR, Castle JC, Stepaniants SB, Pellegrini M, Mao M, Johnson JM: Pathway and gene-set activation measurement from mRNA expression data: the tissue distribution of human pathways. Genome Biol. 2006, 7 (10): R93- 10.1186/gb-2006-7-10-r93PubMed CentralView ArticlePubMedGoogle Scholar
- Oda K, Matsuoka Y, Funahashi A, Kitano H: A comprehensive pathway map of epidermal growth factor receptor signaling. Molecular systems biology. 2005, 1: 2005 0010Google Scholar
- Draghici S, Khatri P, Tarca AL, Amin K, Done A, Voichita C, Georgescu C, Romero R: A systems biology approach for pathway level analysis. Genome Research. 2007, 17: 1537-1545. 10.1101/gr.6202607PubMed CentralView ArticlePubMedGoogle Scholar
- Huang X, Wang JY, Lu X: Systems analysis of quantitative shRNA-library screens identifies regulators of cell adhesion. BMC systems biology. 2008, 2: 49- 10.1186/1752-0509-2-49PubMed CentralView ArticlePubMedGoogle Scholar
- Balasubramanian R, LaFramboise T, Scholtens D, Gentleman R: A graph-theoretic approach to testing associations between dispartate sources of functional genomics data. Bioinformatics. 2004, 20 (18): 3353-3362. 10.1093/bioinformatics/bth405View ArticlePubMedGoogle Scholar
- Zhang B, Horvath S: A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and MolecularBiology. 2005, 4 (1): Article 17Google Scholar
- Damoc E, Fraser CS, Zhou M, Videler H, Mayeur GL, Hershey JW, Doudna JA, Robinson CV, Leary JA: Structural characterization of the human eukaryotic initiation factor 3 protein complex by mass spectrometry. Mol Cell Proteomics. 2007, 6 (7): 1135-1146. 10.1074/mcp.M600399-MCP200View ArticlePubMedGoogle Scholar
- Li S, Hua ZC: FAK expression regulation and therapeutic potential. Advances in cancer research. 2008, 101: 45-61. 10.1016/S0065-230X(08)00403-XView ArticlePubMedGoogle Scholar
- Hart GT, Ramani AK, Marcotte EM: How complete are current yeast and human protein-interaction networks?. Genome Biology. 2006, 7 (120):
- Weinstein IB: Cancer. Addiction to oncogenes – the Achilles heal of cancer. Science. 2002, 297 (5578): 63-64. 10.1126/science.1073096View ArticlePubMedGoogle Scholar
- Bartz SR, Zhang Z, Burchard J, Imakura M, Martin M, Palmieri A, Needham R, Guo J, Gordon M, Chung N, et al.: Small interfering RNA screens reveal enhanced cisplatin cytotoxicity in tumor cells having both BRCA network and TP53 disruptions. Molecular and cellular biology. 2006, 26 (24): 9377-9386. 10.1128/MCB.01229-06PubMed CentralView ArticlePubMedGoogle Scholar
- Liu X, Erikson RL: Polo-like kinase (Plk)1 depletion induces apoptosis in cancer cells. Proc Natl Acad Sci USA. 2003, 100 (10): 5789-5794. 10.1073/pnas.1031523100PubMed CentralView ArticlePubMedGoogle Scholar
- Maroto B, Ye MB, von Lohneysen K, Schnelzer A, Knaus UG: P21-activated kinase is required for mitotic progression and regulates Plk1. Oncogene. 2008, 27: 4900-4908. 10.1038/onc.2008.131View ArticlePubMedGoogle Scholar
- Harborth J, Elbashir SM, Bechert K, Tuschl T, Weber K: Identification of essential genes in cultured mammalian cells using small interfering RNAs. J Cell Sci. 2001, 114 (Pt 24): 4557-4565.PubMedGoogle Scholar
- GeneGo Metacore. http://www.genego.com
- Ingenuity Systems. http://www.ingenuity.com
- Human Protein Reference Database (HPRD). http://www.hprd.org
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.