Detection of protein complexes from affinity purification/mass spectrometry data
© Cai et al.; licensee BioMed Central Ltd. 2012
Published: 17 December 2012
Recent advances in molecular biology have led to the accumulation of large amounts of data on protein-protein interaction networks in different species. An important challenge for the analysis of these data is to extract functional modules such as protein complexes and biological processes from networks which are characterised by the presence of a significant number of false positives. Various computational techniques have been applied in recent years. However, most of them treat protein interactions as binary. Co-complex relations derived from affinity purification/mass spectrometry (AP-MS) experiments have been largely ignored.
This paper presents a new algorithm for detecting protein complexes from AP-MS data. The algorithm intends to detect groups of prey proteins that are significantly co-associated with the same set of bait proteins. We first construct AP-MS data as a bipartite network, where one set of nodes consists of bait proteins and the other set is composed of prey proteins. We then calculate pair-wise similarities of bait proteins based on the number of their commonly shared neighbours. A hierarchical clustering algorithm is employed to cluster bait proteins based on the similarities and thus a set of 'seed' clusters is obtained. Starting from these 'seed' clusters, an expansion process is developed to identify prey proteins which are significantly associated with the same set of bait proteins. Then, a set of complete protein complexes is derived. In application to two real AP-MS datasets, we validate biological significance of predicted protein complexes by using curated protein complexes and well-characterized cellular component annotation from Gene Ontology (GO). Several statistical metrics have been applied for evaluation.
Experimental results show that the proposed algorithm achieves a significant improvement in detecting protein complexes from AP-MS data. In comparison to the well-known MCL algorithm, our algorithm improves the accuracy rate by about 20% in detecting protein complexes in both networks and increases the F-measure value by about 50% in the Krogan_2006 network. Greater precision and better accuracy have been achieved, and the identified complexes are demonstrated to match well with existing curated protein complexes.
Our study highlights the significance of taking co-complex relations into account when extracting protein complexes from AP-MS data. The algorithm proposed in this paper can be easily extended to the analysis of other biological networks which can be conveniently represented by bipartite graphs such as drug-target networks.
Protein-protein interactions (PPIs) are believed to be fundamental to biological processes and metabolic functions in the cell. With advances in high-throughput experimental methods and computational approaches, such as yeast two-hybrid (Y2H) screening [2, 3] and affinity purification/mass spectrometry (AP-MS) [4–6], genome-scale protein interactions have been detected, resulting in PPI networks of increasing size. Research on PPIs in biology and medicine has shown that a protein complex is a typical pattern in PPI networks, in which a group of proteins interact with each other to perform a biological function in a cell, such as the anaphase-promoting complex and protein export and transport complexes, or bind each other at successive times in a biological process, such as the yeast pheromone response pathway and mitogen-activated protein (MAP) signalling cascades. Hence, identifying groups of functionally interacting proteins could help to reveal and understand the relationship between the organization of a network and its function.
Over the past decade or so, various clustering algorithms [7–16] have been proposed for identifying protein complexes in PPI networks. The Markov Cluster Algorithm (MCL) [12, 13] has been one of the most successful clustering methods for identifying complexes from protein interaction networks. It simulates a flow on the graph by calculating successive powers of the associated adjacency matrix. A coefficient called inflation is applied to enhance the contrast between regions of strong and weak flow in the graph. The process converges towards a partition of the graph, with a set of high-flow regions (the clusters) separated by boundaries with no flow. In 2006, Brohée and van Helden evaluated four clustering algorithms for their ability to detect protein complexes, and the results highlighted that MCL was remarkably robust to graph alterations. Another well-known clustering algorithm is CFinder. It was developed in 2006 based on the idea that a cluster consists of a number of k-cliques, where two adjacent k-cliques share k-1 nodes. It exploits the topological features of the network by using the direct links between pairs of nodes.
Most of these algorithms have been developed by modelling protein interactions as binary, i.e., interactions only exist between pairs of proteins. Results from the Y2H approach are inherently modelled as binary since the Y2H approach detects physical pair-wise protein-protein interactions. Although AP-MS data contains non-binary information, as it directly identifies co-membership of complexes by purifying proteins (called prey) that are associated with tagged proteins which were used as bait [4–6], it also has been modelled as binary networks where purification is seen as direct pair-wise interactions from bait to its associated prey proteins.
Two well-known binary models for AP-MS data are the 'Spoke' and 'Matrix' models, which were proposed in 2003 by Bader and Hogue. The 'Spoke' model is similar to a 'Star' topology, where bait proteins are the "hub" nodes and purified prey proteins are connected to the baits. The 'Matrix' model is the other extreme: besides the interactions between prey proteins and bait proteins, all the prey proteins of a purification are connected to each other as well. The 'Matrix' model of a complex is therefore a 'clique' structure. The real topology of the set of proteins lies between these two models. The Molecular Complex Detection (MCODE) algorithm was developed for identifying densely connected sections of a PPI network. It weighs proteins by the density of their neighbourhood and takes the proteins with the highest weight as seeds of clusters. Starting from these seeds, MCODE expands clusters in the network in a greedy fashion. It has been evaluated using the Gavin data set by treating it as a 'Spoke' model.
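The difference between the two binary models can be made concrete with a small sketch. The function names and the toy purification below are illustrative assumptions and do not come from the original study; the code only shows which pair-wise edges each model would generate from a single purification.

```python
from itertools import combinations


def spoke_edges(bait, preys):
    """Spoke model: the bait is linked to each prey it pulled down."""
    return {frozenset((bait, p)) for p in preys if p != bait}


def matrix_edges(bait, preys):
    """Matrix model: every pair among the bait and its preys is linked (a clique)."""
    members = sorted(set(preys) | {bait})
    return {frozenset(pair) for pair in combinations(members, 2)}


# Hypothetical purification: bait B1 pulls down three preys.
bait, preys = "B1", ["P1", "P2", "P3"]
spoke = spoke_edges(bait, preys)
matrix = matrix_edges(bait, preys)
print(len(spoke), len(matrix))  # 3 spoke edges versus 6 clique edges
```

Every spoke edge is also a matrix edge, which illustrates why the real topology is expected to lie between the two models.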
In 2006, Gavin et al. devised a 'socio-affinity' scoring system to weigh logical interactions between pairs of proteins in AP-MS data. In that study, several clustering methods were employed to cluster the scored PPI networks. In 2007, Collins et al. developed another scoring system and applied hierarchical clustering methods to weighted networks to derive complexes. Afterwards, Pu et al. applied MCL to the scoring system of Collins et al. to detect protein complexes.
The study of Gavin et al. highlighted that a protein complex generally contains a core in which proteins are highly co-expressed and share high functional similarity. The COACH approach was proposed in 2009, aiming at detecting protein complexes with highly dense structure as well as exploring the "core-attachment" organization inside protein complexes. The process of extracting protein complexes by COACH consists of two stages. In the first stage, COACH generates the neighbourhood graph of every node from the original network and then extracts a preliminary set of high-density core complexes from each neighbourhood graph. After a redundancy-filtering procedure, a set of final core complexes is obtained. In the second stage, an expansion process is conducted by exploring periphery information of the cores to find attachments, which together with the cores constitute complete protein complexes.
The first study of modelling AP-MS data as non-binary was conducted by Scholtens et al. They built the spoke model of AP-MS data as a directed network where edges link from bait proteins to prey proteins, and then the Local Modelling algorithm was applied to this directed network. Results showed that the clusters predicted by the Local Modelling algorithm mapped well to curated protein complexes.
Most recently, in 2011, a novel algorithm called CODEC  has been proposed to cluster AP-MS data. CODEC translated AP-MS data to a bipartite graph, where all proteins in the network are classified into two sets, 'Baits' and 'Preys', and interactions only exist between these two sides. CODEC method aims to detect complexes as dense bipartite sub-graphs. It has been applied to three PPI networks of Yeast [4, 5, 23]. Results showed the CODEC method outperformed other algorithms with higher precision.
As pointed out by Geva and Sharan , AP-MS data could be directly applied for identifying complexes since AP-MS experiments detect complex co-membership. Modelling it as a bipartite graph could be more fitted to the non-binary nature of AP-MS data. Preserving information of bait protein when AP-MS data is modelled may help to improve the accuracy of identifying and predicting protein complexes and functional modules.
This paper presents a novel algorithm for detecting protein complexes from AP-MS data. The algorithm intends to detect groups of prey proteins that are significantly co-associated with the same set of bait proteins. We first construct AP-MS data as a bipartite network, where one set of nodes consists of bait proteins and the other set is composed of prey proteins. We then calculate pair-wise similarities of bait proteins based on the number of their commonly shared neighbours. A hierarchical clustering algorithm is employed to cluster bait proteins based on the similarities and thus a set of 'seed' clusters is obtained. Starting from these 'seed' clusters, an expansion process is developed to identify prey proteins which are significantly associated with the same set of bait proteins. Then, a set of completely formed protein complexes is derived.
The paper is organized as follows. In Section 2, we introduce the methodology of our proposed algorithm. In Section 3, we present and discuss experimental results. We validate the biological significance of predicted protein complexes by using curated complexes and well-characterized cellular component annotation from GO. Several statistical metrics have been applied for evaluation. The paper closes with conclusions and a discussion of limitations and future work.
The AP-MS experiment directly detects complex membership by purifying prey proteins which are co-associated with tagged bait proteins [4, 5]. Thus, an assumption of protein complexes can be derived, that, in AP-MS data, a complex is composed of a set of bait proteins along with a set of prey proteins that are significantly associated with the same set of bait proteins. Our proposed method is developed based on this assumption.
Here, the term module refers to the structure of a complex. In a strong module, each vertex has more connections within the cluster than with the rest of the graph. In a weak module, the sum of all links from the nodes of the cluster to other nodes within the cluster is larger than the sum of all links from the nodes of the cluster toward the rest of the network. In PPI networks, there exist complexes which have the structure of a strong module, of a weak module, or of a combination of the two. However, all complexes should meet the requirements in the definition of a weak module.
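The two module definitions can be checked mechanically. The following is a minimal sketch, assuming the network is stored as an undirected adjacency dictionary (a representation chosen here for illustration, not taken from the paper); the function names and the toy graph are hypothetical.

```python
def degree_split(graph, cluster, node):
    """Return (links into the cluster, links to the rest of the graph) for node."""
    inside = sum(1 for nb in graph[node] if nb in cluster)
    return inside, len(graph[node]) - inside


def is_strong_module(graph, cluster):
    """Strong: every member has more links inside the cluster than outside."""
    return all(i > o for i, o in (degree_split(graph, cluster, v) for v in cluster))


def is_weak_module(graph, cluster):
    """Weak: summed inside links exceed summed outside links."""
    splits = [degree_split(graph, cluster, v) for v in cluster]
    return sum(i for i, _ in splits) > sum(o for _, o in splits)


# Toy undirected graph: a triangle a-b-c in which c also links to d and e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c"},
    "e": {"c"},
}
cluster = {"a", "b", "c"}
print(is_strong_module(graph, cluster))  # False: c has 2 links inside, 2 outside
print(is_weak_module(graph, cluster))    # True: 6 inside versus 2 outside
```

In this toy case the triangle fails the strong-module test because of a single well-connected member, yet it still satisfies the weak-module requirement that every complex is expected to meet.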
We represent AP-MS data as a bipartite graph. The graph is denoted as $G = (B, V, E)$, where $B$ represents the set of purifications, with the bait nodes on one side, and $V$ represents the set of prey nodes on the other side that have been detected by purification via the bait nodes. If we let $V_0$ be the original set of preys obtained directly from the dataset, then $V = B \cup V_0$. Thus, $V$ is the union of the bait nodes and the prey nodes; in other words, $V$ is the set of all nodes in the network. Note that there exist nodes that are preys of some baits but also baits to other preys; we assign each of them a bait instance and a prey instance. $E$ represents the pair-wise interactions between baits and preys. A potential protein complex or functional module corresponds to a sub-graph $G' = (B', V', E')$ of the graph, where $V'$ is the set of nodes in the cluster and $B'$ is the set of corresponding baits.
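As a rough illustration of this representation, the sketch below builds the two sides of the bipartite graph from a list of (bait, prey) pairs. The function name, the return layout, and the toy data are assumptions made for this example; in particular, the sketch only places a prey instance of every bait on the prey side and does not add any extra edges for it.

```python
from collections import defaultdict


def build_bipartite(purifications):
    """Build bait side, prey side, and bait-to-prey edges from (bait, prey) pairs."""
    bait_to_preys = defaultdict(set)
    for bait, prey in purifications:
        bait_to_preys[bait].add(prey)
    baits = set(bait_to_preys)
    # Prey side: observed preys plus a prey instance of every bait,
    # so that the prey side covers every protein in the network.
    prey_side = set(baits)
    for preys in bait_to_preys.values():
        prey_side |= preys
    return baits, prey_side, dict(bait_to_preys)


# Hypothetical data: B is a prey of A and also a bait itself.
pairs = [("A", "B"), ("A", "X"), ("B", "X"), ("B", "Y")]
baits, prey_side, edges = build_bipartite(pairs)
print(sorted(baits))      # ['A', 'B']
print(sorted(prey_side))  # ['A', 'B', 'X', 'Y']
```

Protein B appears on both sides here, which mirrors the bait instance and prey instance described in the text.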
The proposed algorithm consists of four steps:
- 1) Calculating pair-wise similarities between bait proteins;
- 2) Clustering bait proteins to obtain preliminary seed clusters;
- 3) Expanding process to form complete clusters;
- 4) Filtering clusters and outputting the final set of clusters.
Calculating pair-wise similarities between bait proteins
We calculate the similarity between every pair of bait proteins in the graph. Let $S$ be the set of similarity values between pairs of baits; we then form a network based on these similarities, that is, $G_B = (B, S)$, where $B$ represents the set of bait proteins.
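The text states only that the similarity of two baits is based on the number of their commonly shared neighbours; the exact formula is not reproduced here. The sketch below therefore assumes a Jaccard-style normalisation (shared preys divided by the union of preys), which is an assumption made for illustration and may differ from the measure actually used. The function name and toy data are likewise hypothetical.

```python
from itertools import combinations


def bait_similarities(bait_to_preys):
    """Pair-wise bait similarity from shared preys (ASSUMED Jaccard index)."""
    sims = {}
    for a, b in combinations(sorted(bait_to_preys), 2):
        shared = bait_to_preys[a] & bait_to_preys[b]
        union = bait_to_preys[a] | bait_to_preys[b]
        if shared:  # keep only pairs with at least one common neighbour
            sims[(a, b)] = len(shared) / len(union)
    return sims


# Hypothetical bait-to-prey neighbourhoods.
neighbours = {
    "B1": {"P1", "P2", "P3"},
    "B2": {"P2", "P3", "P4"},
    "B3": {"P5"},
}
sims = bait_similarities(neighbours)
print(sims)  # {('B1', 'B2'): 0.5}
```

Pairs without any shared prey are simply omitted, so the resulting dictionary can be read as the weighted edge list of the bait similarity network.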
Clustering bait proteins to obtain preliminary seed clusters
In order to identify the set of prey proteins that are significantly associated with the same set of bait proteins, we first need to obtain sets of bait proteins as 'seed clusters'. Using the similarities calculated above as the metric, we apply an agglomerative hierarchical clustering algorithm to cluster the bait proteins. We employ an open source tool called MultiDendrograms to cluster bait proteins. MultiDendrograms incorporates the most common agglomerative hierarchical clustering algorithms, e.g. Single Linkage, Complete Linkage and Unweighted Average. The selection of parameters in the experiments is described in the results section.
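The study uses the MultiDendrograms tool for this step. Purely to illustrate the idea, the following is a small self-contained sketch of agglomerative clustering with unweighted average linkage. It is not the MultiDendrograms implementation; the function name is hypothetical, and it assumes that the cut-off is applied to similarity (merging stops once the best average similarity falls below it), which the text does not specify.

```python
def average_linkage(items, sims, cutoff):
    """Agglomerative clustering on similarities with unweighted average linkage."""

    def sim(a, b):
        # Similarities are stored once per pair; missing pairs count as 0.
        return sims.get((a, b), sims.get((b, a), 0.0))

    def linkage(c1, c2):
        return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [frozenset([x]) for x in items]
    while len(clusters) > 1:
        best, (i, j) = max(
            ((linkage(c1, c2), (i, j))
             for i, c1 in enumerate(clusters)
             for j, c2 in enumerate(clusters) if i < j),
            key=lambda t: t[0],
        )
        if best < cutoff:  # ASSUMED stopping rule on similarity
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical bait similarities.
toy_sims = {("B1", "B2"): 0.8, ("B2", "B3"): 0.5, ("B1", "B3"): 0.4, ("B3", "B4"): 0.1}
seeds = average_linkage(["B1", "B2", "B3", "B4"], toy_sims, cutoff=0.3)
print([sorted(c) for c in seeds])  # [['B4'], ['B1', 'B2', 'B3']]
```

Note that B4 ends up as a single-bait seed cluster, a case the expansion step has to handle explicitly.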
Expanding process to form complete clusters
As shown in the work published by Gavin et al., a protein complex generally contains a core in which proteins are highly co-expressed and share high functional similarity. Some protein cores are surrounded by attachments which support subordinate functions. Inspired by this finding, we consider that the cores correspond to the structure of a strong module introduced above and the attachments correspond to the structure of a weak module. A complete cluster should meet the requirements of a weak module. Thus, the expansion process is composed of two stages: first, detecting strong modules from seed clusters which are composed of bait proteins only; second, expanding the strong modules in a greedy fashion to form the final clusters.
1. Detect strong modules from seed clusters
Let $Sc$ be a seed cluster, and let $u$ be a prey protein connected to proteins in the seed cluster. Let $k_u^{in}$ be the number of connections from $u$ to $Sc$ and $k_u^{out}$ the number of connections from $u$ to proteins that are not in $Sc$; let $E^{in}_{Sc \cup \{u\}}$ be the number of internal connections inside the cluster once $u$ is included in $Sc$, and $E^{out}_{Sc \cup \{u\}}$ the number of external edges from the cluster once $u$ is included in $Sc$. The conditions for including $u$ are:
- 1) $u$ should connect to at least half of the proteins inside the seed cluster, that is, $k_u^{in} \geq |Sc| / 2$ (2)
where $|Sc|$ is the size of the seed cluster $Sc$.
- 2) The connections from $u$ to proteins inside the seed cluster should be more than the connections linking to other proteins which are not in the seed cluster, that is, $k_u^{in} > k_u^{out}$ (3)
- 3) The out-links of the seed cluster which includes $u$ should be fewer than the internal links, that is, $E^{out}_{Sc \cup \{u\}} < E^{in}_{Sc \cup \{u\}}$ (4)
The process ceases when there is no matched protein.
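The three conditions of this stage can be sketched as a greedy loop. The code below is one possible reading, not the authors' Java implementation: it assumes an undirected adjacency dictionary, counts an internal edge once per pair of members, and evaluates the "at least half" condition against the current size of the growing cluster. All names and the toy network are hypothetical.

```python
def edge_counts(graph, cluster):
    """Count edges inside the cluster and edges leaving it."""
    internal = external = 0
    for node in cluster:
        for nb in graph[node]:
            if nb in cluster:
                internal += 1  # each internal edge is seen from both endpoints
            else:
                external += 1
    return internal // 2, external


def grow_strong_module(graph, seed):
    """Greedily add neighbours that satisfy the three strong-module conditions."""
    module = set(seed)
    changed = True
    while changed:  # stop when no matching protein remains
        changed = False
        candidates = {nb for node in module for nb in graph[node]} - module
        for u in sorted(candidates):
            k_in = len(graph[u] & module)
            k_out = len(graph[u]) - k_in
            internal, external = edge_counts(graph, module | {u})
            if 2 * k_in >= len(module) and k_in > k_out and external < internal:
                module.add(u)
                changed = True
    return module


# Hypothetical network: P1 is tightly bound to the seed, P2 mostly links elsewhere.
net = {
    "B1": {"B2", "P1", "P2"},
    "B2": {"B1", "P1", "P2"},
    "P1": {"B1", "B2"},
    "P2": {"B1", "B2", "X1", "X2", "X3"},
    "X1": {"P2"}, "X2": {"P2"}, "X3": {"P2"},
}
strong = grow_strong_module(net, {"B1", "B2"})
print(sorted(strong))  # ['B1', 'B2', 'P1']
```

P2 touches both seed baits but has more links outside the cluster than inside, so it fails the second condition and is left for the later, more permissive stage.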
2. Form final clusters
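After the expansion process of finding strong modules, the process of forming final clusters starts. It iteratively explores matching proteins in the neighbourhood of the proteins in the strong modules. Let $Mc$ be a strong module and let $v$ be a protein in its neighbourhood, with $k_v^{in}$ connections to proteins in $Mc$ and $k_v^{out}$ connections to other proteins. The conditions for including $v$ are:
- 1) The connections from $v$ to proteins in $Mc$ should be no fewer than those from $v$ to other proteins, that is, $k_v^{in} \geq k_v^{out}$ (5)
- 2) After $v$ is included, the internal links inside the new cluster should be more than the external links, that is, $E^{in}_{Mc \cup \{v\}} > E^{out}_{Mc \cup \{v\}}$ (6)
There also exist "seed clusters" that consist of only one bait protein; for these, we simply add each neighbouring protein that meets the two conditions above.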
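The second expansion stage can be sketched in the same style. Again this is only an interpretation under stated assumptions (undirected adjacency dictionary, internal edges counted once per pair, conditions evaluated against the cluster as it grows); the function name and toy network are hypothetical.

```python
def expand_to_final_cluster(graph, strong_module):
    """Greedily attach neighbours while the cluster remains a weak module."""
    cluster = set(strong_module)
    changed = True
    while changed:
        changed = False
        neighbours = {nb for node in cluster for nb in graph[node]} - cluster
        for v in sorted(neighbours):
            k_in = len(graph[v] & cluster)
            k_out = len(graph[v]) - k_in
            trial = cluster | {v}
            internal = sum(len(graph[n] & trial) for n in trial) // 2
            external = sum(len(graph[n] - trial) for n in trial)
            # Condition 1: v links to the cluster at least as much as elsewhere.
            # Condition 2: the enlarged cluster has more internal than external links.
            if k_in >= k_out and internal > external:
                cluster.add(v)
                changed = True
    return cluster


# Hypothetical network around a strong module {B1, B2, P1}.
net2 = {
    "B1": {"B2", "P1", "P2"},
    "B2": {"B1", "P1"},
    "P1": {"B1", "B2", "A1"},
    "P2": {"B1", "X1"},
    "A1": {"P1"},
    "X1": {"P2", "X2", "X3"},
    "X2": {"X1"}, "X3": {"X1"},
}
final = expand_to_final_cluster(net2, {"B1", "B2", "P1"})
print(sorted(final))  # ['A1', 'B1', 'B2', 'P1', 'P2']
```

The attachment P2 is accepted with equal numbers of inside and outside links, which the stricter first stage would have rejected, while X1 is rejected because most of its links point away from the cluster.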
Filtering clusters and outputting final set of clusters
In the set of clusters obtained from expansion process, there exist overlapping clusters. We calculate the overlap rate between two clusters, that is, , where |C| is the size of the cluster. If the overlap rate is above a given threshold, we merge the two clusters. In our algorithm, we use 0.2 as the threshold value.
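The filtering step can be illustrated as follows. Because the overlap-rate formula is not reproduced in the text, the sketch assumes the shared proteins divided by the size of the smaller cluster; this choice is an assumption made only for the example. The 0.2 threshold is the value stated above, and the function names and toy clusters are hypothetical.

```python
def overlap_rate(a, b):
    """ASSUMED overlap rate: shared members relative to the smaller cluster."""
    return len(a & b) / min(len(a), len(b))


def merge_overlapping(clusters, threshold=0.2):
    """Repeatedly merge any two clusters whose overlap rate exceeds the threshold."""
    merged = [set(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlap_rate(merged[i], merged[j]) > threshold:
                    merged[i] |= merged[j]
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan after every merge
    return merged


# Hypothetical expanded clusters: the first two share protein C.
expanded = [{"A", "B", "C"}, {"C", "D"}, {"E", "F"}]
result = merge_overlapping(expanded)
print([sorted(c) for c in result])  # [['A', 'B', 'C', 'D'], ['E', 'F']]
```

Restarting the scan after each merge keeps the logic simple, since a merged cluster may newly overlap with clusters that were already compared.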
The general time complexity of the entire algorithm is , where represents the number of bait instances in the network and is the number of prey instances which is also the size of the network (when modelling the network we add instances of bait proteins to the prey instances side), . represents the number of predicted clusters obtained. The first step of our algorithm is to calculate pair-wise similarities between bait nodes, thus the time complexity is . In the second step, the time complexity for the agglomerative hierarchical algorithm is . As for the expansion process, the time complexity of one expansion process is . Since we adopt a greedy fashion in expansion, there may be k rounds of expansion, thus the time complexity for the whole expansion process is . The post-process stage could be up to . Normally, since and , thus, the asymptotic time complexity of our algorithm is .
Implementation and running time
We implemented the proposed method in the Java programming language with JDK 1.6. The method was run on a desktop computer with an Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16 GHz processor and 8 GB of memory. The running time depends on the size of the dataset. The running time of the proposed method on the Gavin_2006 dataset was 34653 milliseconds, and on the Krogan_2006 dataset it was 61336 milliseconds. This running time covers only the calculation of pair-wise similarities for bait proteins, the expansion process and the post-process; it excludes the hierarchical clustering used to generate seed clusters, since we utilized the software toolkit MultiDendrograms for this purpose.
Preparation of data
We applied our method to two recently published datasets of bait-prey relationships in yeast. One is the dataset obtained by Gavin et al., with 1993 bait proteins, 2671 prey proteins and 19157 bait-prey relationships; the other is the dataset published by Krogan et al., which contains 2233 bait proteins, 5219 prey proteins and 40623 bait-prey relationships. 94 prey proteins were suspected to be non-specific contaminants and were therefore excluded from the raw data of the Krogan et al. dataset. For convenience, we refer to these two datasets as Gavin_2006 and Krogan_2006.
[Table: The number and average size of known complexes derived from two PPI networks. Column: No. of complexes.]
We utilize the evaluation metrics accuracy and homogeneity, as suggested by Brohée and van Helden. These two metrics measure the degree of overlap between predicted clusters and benchmark complexes.
Let $C$ be the set of predicted clusters generated by the clustering algorithm, and let $C^*$ be the subset of $C$, $C^* \subseteq C$, containing clusters that have at least two nodes annotated in any of the benchmark complexes. Let $K$ be the set of benchmark complexes and let $K^*$, $K^* \subseteq K$, be the set of benchmark complexes excluding those which contain proteins that are not found in the network. Let $n$ be the number of clusters in $C^*$ and $m$ be the number of complexes in $K^*$; then an $n \times m$ confusion matrix $T$ is constructed for comparison between predicted clusters and benchmark complexes. The $i$th row stands for predicted cluster $C_i$, while the $j$th column corresponds to benchmark complex $K_j$. The entry $T_{ij}$ represents the number of proteins found in cluster $C_i$ that are annotated in benchmark complex $K_j$. $|C_i|$ is the size of predicted cluster $C_i$, while $|K_j|$ represents the size of benchmark complex $K_j$.
Accuracy measures the general correspondence between predicted clusters and benchmark complexes, which contains two components, sensitivity (Se) and positive predictive value (PPV).
Thus, a high accuracy value requires high performance on both measures. The higher the accuracy value, the better the quality of a clustering result.
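For illustration, the accuracy measure can be sketched from the confusion matrix. The formulas below follow the definitions of sensitivity, positive predictive value and geometric accuracy commonly attributed to the evaluation framework of Brohée and van Helden; since the equations are not reproduced in this text, treating them as the ones used here is an assumption. Function names and the toy data are hypothetical.

```python
from math import sqrt


def confusion_matrix(clusters, complexes):
    """Rows are predicted clusters, columns are benchmark complexes."""
    return [[len(c & k) for k in complexes] for c in clusters]


def accuracy(clusters, complexes):
    """Return (sensitivity, PPV, accuracy) under the assumed definitions."""
    t = confusion_matrix(clusters, complexes)
    # Sensitivity: best coverage of each complex, weighted by complex size.
    best_per_complex = sum(max(row[j] for row in t) for j in range(len(complexes)))
    se = best_per_complex / sum(len(k) for k in complexes)
    # PPV: best complex for each cluster, relative to all annotated assignments.
    ppv = sum(max(row) for row in t) / sum(sum(row) for row in t)
    return se, ppv, sqrt(se * ppv)


# Hypothetical predicted clusters and benchmark complexes.
predicted = [{"a", "b", "c"}, {"d", "e"}]
benchmark = [{"a", "b", "c", "d"}, {"e", "f"}]
se, ppv, acc = accuracy(predicted, benchmark)
print(round(se, 3), round(ppv, 3), round(acc, 3))  # 0.667 0.8 0.73
```

Because accuracy is the geometric mean of the two components, a clustering cannot score well by maximising only one of them, for example by predicting a single giant cluster.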
Homogeneity reflects relative ratio of distribution of overlapping intersections between annotated complexes and generated clusters. When proteins are allowed to be assigned to multiple clusters, the value will be lower and thus the homogeneity value will be lower.
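A possible computation of homogeneity is sketched below. It assumes that the measure corresponds to the separation statistic of Brohée and van Helden, that is, for each cluster-complex pair the product of the fraction of the cluster's annotated members falling in the complex and the fraction of the complex's matched members falling in the cluster; this correspondence is an assumption, as are the function name and the toy matrices.

```python
from math import sqrt


def homogeneity(t):
    """Return (cluster-wise, complex-wise, overall) homogeneity of matrix t.

    t[i][j] is the number of proteins shared by cluster i and complex j.
    """
    row_sums = [sum(row) for row in t]
    col_sums = [sum(col) for col in zip(*t)]
    total = sum(
        (x / row_sums[i]) * (x / col_sums[j])
        for i, row in enumerate(t)
        for j, x in enumerate(row)
        if x  # zero entries contribute nothing and are skipped
    )
    cluster_wise = total / len(t)
    complex_wise = total / len(t[0])
    return cluster_wise, complex_wise, sqrt(cluster_wise * complex_wise)


perfect = homogeneity([[3, 0], [0, 2]])
mixed = homogeneity([[3, 0], [1, 1]])
print(perfect)  # (1.0, 1.0, 1.0)
print(mixed)    # (0.6875, 0.6875, 0.6875)
```

In the second matrix one cluster is split across two complexes, so its contribution is diluted; the same dilution occurs when proteins are assigned to several overlapping clusters.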
• Sensitivity and specificity
F-Measure is the harmonic average of sensitivity and specificity. In our experiments, based on the study of Bader and Hogue , we consider that a predicted cluster significantly matches a benchmark complex if the corresponding .
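The matching-based measures can be sketched as follows. The sketch assumes the overlap score of Bader and Hogue, the squared number of shared proteins divided by the product of the two set sizes, and it takes the matching threshold as a parameter; the value 0.2 used in the example is chosen only for illustration and is not taken from this text. Here specificity is read as the fraction of predicted clusters that match some benchmark complex, and sensitivity as the fraction of benchmark complexes matched by some cluster; both readings, like the function names, are assumptions.

```python
def overlap_score(a, b):
    """Squared intersection size over the product of the two set sizes."""
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))


def f_measure(clusters, complexes, threshold):
    """Return (specificity, sensitivity, F-measure) for a matching threshold."""
    matched_clusters = sum(
        any(overlap_score(c, k) >= threshold for k in complexes) for c in clusters
    )
    matched_complexes = sum(
        any(overlap_score(c, k) >= threshold for c in clusters) for k in complexes
    )
    specificity = matched_clusters / len(clusters)
    sensitivity = matched_complexes / len(complexes)
    if specificity + sensitivity == 0:
        return specificity, sensitivity, 0.0
    f = 2 * specificity * sensitivity / (specificity + sensitivity)
    return specificity, sensitivity, f


# Hypothetical predictions and benchmark; the third cluster matches nothing.
pred = [{"a", "b", "c"}, {"d", "e"}, {"x", "y"}]
gold = [{"a", "b", "c", "d"}, {"e", "f"}]
spec, sens, f = f_measure(pred, gold, threshold=0.2)  # illustrative threshold
print(round(spec, 3), round(sens, 3), round(f, 3))  # 0.667 1.0 0.8
```

The harmonic mean penalises an imbalance between the two components, so the unmatched third cluster lowers the F-measure even though every benchmark complex is recovered.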
Selection of parameters
We selected the parameters following a trial-and-error procedure. Unless indicated otherwise, the results reported in this paper were derived based on the following parameter settings: the hierarchical clustering was implemented with un-weighted average linkage and the cut-off values set to 0.3 and 0.25 for Gavin_2006 and Krogan_2006 networks, respectively.
We choose the set of parameters of MCL and MCODE recommended by Brohée and van Helden. Specifically, we use an inflation rate of 1.8 for MCL. For MCODE, we set the depth parameter to 100, the node score percentage to 0, Haircut to TRUE, Fluff to FALSE and the percentage for complex fluffing to 0.2. The value required for CFinder was set to 5. As for the CODEC algorithm, there are two schemes, CODEC-w0 and CODEC-w1. We compare our method to both schemes of CODEC. We only use the final predicted clusters from COACH, without considering its predicted core clusters.
Experimental results and discussion
We compare the performance of our method with that of several state-of-the-art clustering methods, which are categorized into two groups. One includes MCL [12, 13], MCODE, CFinder, and COACH, each treating AP-MS data as a non-bipartite graph; the other is CODEC, an algorithm that also treats AP-MS data as a bipartite graph. The input for algorithms in the first category is the set of interactions from a bait protein to its preys, represented as the Spoke model.
• Accuracy and homogeneity
[Table: Performance comparison on Gavin_2006 with CYC-2008]
[Table: Performance comparison on Gavin_2006 with GO-CC]
[Table: Performance comparison on Krogan_2006 with CYC-2008]
[Table: Performance comparison on Krogan_2006 with GO-CC]
[Table: Number and average size of predicted clusters from different methods on the two testing PPI networks (excluding singleton clusters). Column: No. of clusters.]
PPV value indicates the fraction of clustering results which have also been identified and annotated in the benchmark complexes so far. It favours smaller clusters. In order to be fair, as stated above, we excluded clusters whose size of overlap with curated complexes is less than two proteins. Results show that our method obtains the highest PPV value in comparison to other algorithms. The better accuracy suggests that the proposed algorithm can achieve a much better performance as the value of the accuracy reflects the general performance of a clustering algorithm based on the estimation of the overall correspondence between the set of predicted clusters and the set of annotated complexes.
Homogeneity is the product of the fraction of members in a cluster found in an annotated complex by the fraction of members in the complex found in a cluster. High homogeneity indicates a bi-directional correspondence between a cluster and a complex . The maximal value of homogeneity is 1 when a cluster matches perfectly with a complex which means that the cluster consists of all its members identified in the complex. As shown in Table 2, the proposed algorithm achieves the best performance in terms of the clustering-wise homogeneity value, which reflects the general agreement between identified clusters and benchmark complexes, as well as the quality of a clustering result as a whole.
Similar observations can be made when analysing the Krogan_2006 data, as shown in Tables 4 and 5. Our proposed method outperforms the other clustering algorithms except CODEC. While CODEC-w1 yields better accuracy than our proposed method, it yields a very low homogeneity value. This could partly be due to the high level of overlap between the clusters generated. In the clustering results obtained by using CODEC-w1, the average overlap rate between predicted clusters is 52% and 50% for the Gavin_2006 and Krogan_2006 datasets, respectively. Although its accuracy value is lower than that of CODEC-w1, our proposed method still achieves the best performance in terms of both PPV and homogeneity.
• Specificity and sensitivity
[Table: Specificity/sensitivity/F-measure results on the two testing PPI networks with CYC-2008 and GO-CC benchmark complexes on Gavin_2006]
[Table: Specificity/sensitivity/F-measure results on the two testing PPI networks with CYC-2008 and GO-CC benchmark complexes on Krogan_2006]
• Analysis of biological significance of clustering
To further validate the biological significance of the results obtained by the proposed method, we next discuss several predicted complexes that are found by our method but not detected by other methods, and which are also biologically relevant. Here, we present examples of clusters obtained from the Krogan_2006 network.
One example of a fully matched cluster identified by the proposed algorithm but not found in the results produced by other algorithms includes four proteins, namely YJR112W, YPL233W, YAL034W-A, and YIR010W. This protein complex has been defined as a kinetochore complex that binds to centromeric chromatin and forms part of the inner kinetochore of a chromosome in the nucleus [31, 32]. Another cluster found by our method and not identified by other algorithms is composed of five proteins, YIL097W, YMR135C, YIL017C, YGL227W and YDR255C, which are all annotated by GO term GO:0034657 (GID complex) [33, 34]. Although it does not include all proteins, the predicted complex matches five out of seven proteins in the complex with ubiquitin ligase activity that is involved in proteasomal degradation of fructose-1,6-bisphosphatase (FBPase) and phosphoenolpyruvate carboxykinase during the transition from gluconeogenic to glycolytic growth conditions [33, 34]. Another example is the cluster consisting of six proteins, i.e., YJR082C, YEL018W, YNL136W, YFL024C, YOR244W and YHR090. Among these six proteins, three belong to the subunits of NuA4 in baker's yeast, within an essential histone H4/H2A acetyltransferase complex annotated by GO cellular component term GO:0032777 [35, 36]. Although they are not all listed in the protein complex, the other three proteins found in the cluster have been identified as subunits of the NuA4 histone acetyltransferase complex in yeast [35, 36].
These cases show that, by incorporating information on bait proteins in the clustering analysis of AP-MS data, the proposed method is able to discover significant functional modules in the networks.
In this paper, we propose a new algorithm for discovering functional modules and complexes in AP-MS PPI networks. It has been tested on two real AP-MS PPI networks, i.e., the Gavin_2006 network and the Krogan_2006 network. Compared with the well-known MCL algorithm, our algorithm improves the accuracy rate by about 20% in extracting protein complexes from both AP-MS networks and increases the F-measure value by about 50% on the Krogan_2006 network. Greater accuracy, better homogeneity and higher specificity and sensitivity were achieved in comparison with the results produced by several state-of-the-art clustering algorithms. The main feature of our method is that it detects protein complexes by taking co-complex relations from AP-MS data into account. Furthermore, the proposed method is able to detect overlapping modules encoded in PPI networks. In addition, the framework proposed in this paper can be easily extended to the analysis of other biological networks which can be conveniently represented by bipartite graphs, such as drug-target networks.
Currently, our proposed algorithm only considers the topological features of PPI networks. Incorporation of other biological information such as semantic similarity derived from GO into the clustering process would be an important part of our future work.
In this study, the determination of the parameters was based on trial and error. Integration with other techniques such as Genetic Algorithm for the dynamic determination of learning parameters provides another direction of our research.
BC is supported by the Vice Chancellor's Research Scholarships, University of Ulster, UK.
This article has been published as part of BMC Systems Biology Volume 6 Supplement 3, 2012: Proceedings of The International Conference on Intelligent Biology and Medicine (ICIBM) - Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/6/S3.
- Ghavidel A, Cagney G, Emili A: A skeleton of the human protein interactome. Cell. 2005, 122 (6): 830-2. 10.1016/j.cell.2005.09.006.View ArticlePubMedGoogle Scholar
- Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA. 2001, 98 (8): 4569-4574. 10.1073/pnas.061034498.PubMed CentralView ArticlePubMedGoogle Scholar
- Uetz P, Glot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin B, Rothberg JM: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000, 403 (6770): 623-627. 10.1038/35001009.View ArticlePubMedGoogle Scholar
- Gavin AC, Bösche M, Krause R, Grandl P, Marzloch M, Baer A, Schultz J, Rick JM, Mlchon AM, Cruclat CM, Remor M, Höfert C, Schelder M, Brajenovlc M, Ruffner H, Merlno A, Klein K, Hudak M, Dickson D, Rudl T, Gnau V, Bauch A, Bastuck S, Huhse B, Leutwein C, Heurtler MA, Copley RR, Edelmann A, Querfurth E, Rybin V, Drewes G, Ralda M, Bouwmeester T, Bork P, Seraphin B, Kuster B, Neubauer G, Superti-Furga G: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002, 415 (6868): 141-7. 10.1038/415141a.View ArticlePubMedGoogle Scholar
- Gavin A, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dümpelfeld B, Edelmann A, Heurtier M, Hoffman V, Hoefert C, Klein K: Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006, 440 (7084): 631-6. 10.1038/nature04532.View ArticlePubMedGoogle Scholar
- Yu J, Fotouhi F: Computational approaches for predicting protein-protein interactions: a survey. J Med Sys. 2006, 30 (1): 39-44. 10.1007/s10916-006-7402-3.View ArticleGoogle Scholar
- Spirin V, Mirny L: Protein complexes and functional modules in molecular networks. Proc Natl Acad Sci USA. 2003, 100 (21): 12123-12128. 10.1073/pnas.2032324100.PubMed CentralView ArticlePubMedGoogle Scholar
- Bader GD, Hogue CW: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003, 4: 2-10.1186/1471-2105-4-2.PubMed CentralView ArticlePubMedGoogle Scholar
- Hartuv E, Shamir R: A clustering algorithm based on graph connectivity. Information Processing Letters. 2000, 76 (4-6): 175-181. 10.1016/S0020-0190(00)00142-3.View ArticleGoogle Scholar
- King AD, Przulj N, Jurisica I: Protein complex prediction via cost-based clustering. Bioinformatics. 2004, 20 (17): 3013-20. 10.1093/bioinformatics/bth351.View ArticlePubMedGoogle Scholar
- Adamcsek B, Palla G, Farkas IJ, Derenyi I, Vicsek T: CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics. 2006, 22 (8): 1021-1023. 10.1093/bioinformatics/btl039.View ArticlePubMedGoogle Scholar
- Dongen S: Graph clustering by flow simulation [Ph.D. dissertation]: Centers for Mathematics and Computer. 2000, Science, University of UtrechtGoogle Scholar
- Enright AJ, Van Dongen S, Ouzounis CA: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002, 30 (7): 1575-10.1093/nar/30.7.1575.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhou H, Lipowsky R: Network Brownian motion: a new method to measure vertex-vertex proximity and to identify communities and subcommunities. International conference on Computational Science. 2004, 1062-1069.
- Pons P, Latapy M: Computing communities in large networks using random walks. J Graph Algorithms Appl. 2006, 10 (2): 191-218. 10.7155/jgaa.00124.
- Macropol K, Can T, Singh AK: RRW: repeated random walks on genome-scale protein networks for local cluster discovery. BMC Bioinformatics. 2009, 10: 283. 10.1186/1471-2105-10-283.
- Brohée S, van Helden J: Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics. 2006, 7: 488. 10.1186/1471-2105-7-488.
- Collins SR, Kemmeren P, Zhao XC, Greenblatt JF, Spencer F, Holstege FC, Weissman JS, Krogan NJ: Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Molecular & Cellular Proteomics. 2007, 6 (3): 439-50.
- Pu S, Vlasblom J, Emili A, Greenblatt J, Wodak SJ: Identifying functional modules in the physical interactome of Saccharomyces cerevisiae. Proteomics. 2007, 7 (6): 944-960. 10.1002/pmic.200600636.
- Wu M, Li X, Kwoh C, Ng S: A core-attachment based method to detect protein complexes in PPI networks. BMC Bioinformatics. 2009, 10: 169. 10.1186/1471-2105-10-169.
- Scholtens D, Vidal M, Gentleman R: Local modeling of global interactome networks. Bioinformatics. 2005, 21 (17): 3548-3557. 10.1093/bioinformatics/bti567.
- Geva G, Sharan R: Identification of protein complexes from co-immunoprecipitation data. Bioinformatics. 2011, 27 (1): 111-117. 10.1093/bioinformatics/btq652.
- Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP, Punna T, Peregrín-Alvarez JM, Shales M, Zhang X, Davey M, Robinson MD, Paccanaro A, Bray JE, Sheung A, Beattie B, Richards DP, Canadien V, Lalev A, Mena F, Wong P, Starostine A, Canete MM, Vlasblom J, Wu S, Orsi C, Collins SR, Chandran S, Haw R, Rilstone JJ, Gandi K, Thompson NJ, Musso G, St Onge P, Ghanny S, Lam MH, Butland G, Altaf-Ul AM, Kanaya S, Shilatifard A, O'Shea E, Weissman JS, Ingles CJ, Hughes TR, Parkinson J, Gerstein M, Wodak SJ, Emili A, Greenblatt JF: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006, 440 (7084): 637-643. 10.1038/nature04670.
- Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry JM, Davis, Dolinski K, Dwight SS, Eppig JT, Harris M, Hill DP, Issel-Tarver L, Kasarskis A: Gene Ontology: tool for the unification of biology. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
- Altaf-Ul-Amin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S: Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinformatics. 2006, 7: 207. 10.1186/1471-2105-7-207.
- Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D: Defining and identifying communities in networks. Proc Natl Acad Sci USA. 2004, 101 (9): 2658-2663. 10.1073/pnas.0400054101.
- Jaccard P: Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Société Vaudoise des Sciences Naturelles. 1901, 37: 547-579.
- Fernández A, Gómez S: Solving Non-Uniqueness in Agglomerative Hierarchical Clustering Using Multidendrograms. Journal of Classification. 2008, 25 (1): 43-65. 10.1007/s00357-008-9004-x.
- Pu S, Wong J, Turner B, Cho E, Wodak SJ: Up-to-date catalogues of yeast protein complexes. Nucleic Acids Res. 2009, 37 (3): 825-831. 10.1093/nar/gkn1005.
- Song J, Singh M: How and when should interactome-derived clusters be used to predict functional modules and protein function?. Bioinformatics. 2009, 25 (23): 3143-3150. 10.1093/bioinformatics/btp551.
- Scharfenberger M, Ortiz J, Grau N, Janke C, Schiebel E, Lechner J: Nsl1p is essential for the establishment of bipolarity and the localization of the Dam-Duo complex. EMBO J. 2003, 22 (24): 6584-97. 10.1093/emboj/cdg636.
- UniProt-GOA: Gene Ontology annotation based on manual assignment of UniProtKB keywords in UniProtKB/Swiss-Prot entries. 2001.
- Regelmann J, Schuele T, Josupeit FS, Horak J, Rose M, Entian K, Thumm M, Wolf DH: Catabolite Degradation of Fructose-1,6-bisphosphatase in the Yeast Saccharomyces cerevisiae: A genome-wide screen identifies eight novel GID genes and indicates the existence of two degradation pathways. Mol Biol Cell. 2003, 14 (4): 1652-1663. 10.1091/mbc.E02-08-0456.
- Santt O, Pfirrmann T, Braun B, Juretschke J, Kimmig P, Scheel H, Hofmann K, Thumm M, Wolf DH: The yeast GID complex, a novel ubiquitin ligase (E3) involved in the regulation of carbohydrate metabolism. Mol Biol Cell. 2008, 19 (8): 3323-3333. 10.1091/mbc.E08-03-0328.
- Boudreault AA, Cronier D, Selleck W, Lacoste N, Utley RT, Allard S, Savard J, Lane WS, Tan S, Cote J: Yeast Enhancer of Polycomb defines global Esa1-dependent acetylation of chromatin. Genes Dev. 2003, 17 (11): 1415-1428. 10.1101/gad.1056603.
- Selleck W, Fortin I, Sermwittayawong D, Cote J, Tan S: The Saccharomyces cerevisiae Piccolo NuA4 Histone Acetyltransferase complex requires the enhancer of Polycomb A domain and chromodomain to acetylate nucleosomes. Mol Cell Biol. 2005, 25 (13): 5535-5542. 10.1128/MCB.25.13.5535-5542.2005.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.