Discovering protein complexes in protein interaction networks via exploring the weak ties effect
 Xiaoke Ma^{1} and
 Lin Gao^{1}
https://doi.org/10.1186/1752-0509-6-S1-S6
© Ma and Gao; licensee BioMed Central Ltd. 2012
Published: 16 July 2012
Abstract
Background
Studying protein complexes is important because it helps reveal the structure-functionality relationships in biological networks, and much attention has been paid to accurately predicting protein complexes from the increasing amount of protein-protein interaction (PPI) data. Most of the available algorithms assume that dense subgraphs correspond to complexes, and so fail to take into account the inherent organization within a protein complex and the roles of edges. Thus, there is a critical need to investigate the possibility of discovering protein complexes using the topological information hidden in edges.
Results
To investigate the roles of edges in PPI networks, we show that the edges connecting topologically less similar vertices are more significant in maintaining the global connectivity, indicating the weak ties phenomenon in PPI networks. We further demonstrate that there is a negative relation between the weak tie strength and the topological similarity. By using the bridges, a reliable virtual network is constructed, in which each maximal clique corresponds to the core of a complex. In this way, the detection of protein complexes is transformed into the classic all-clique problem. A novel core-attachment based method is developed, which detects the cores and the attachments separately. The complexes predicted by our algorithm and by existing algorithms are compared against benchmark complexes.
Conclusions
We showed that the weak tie effect exists in the PPI network and demonstrated that density alone is insufficient to characterize the topological structure of protein complexes. Furthermore, the experimental results on the yeast PPI network show that the proposed method outperforms the state-of-the-art algorithms. The analysis of the modules detected by the present algorithm suggests that most of them are biologically significant in the context of complexes, suggesting that the roles of edges are critical in discovering protein complexes.
Background
Interpretation of the completed genome sequences initiated a decade of landmark studies addressing critical aspects of cell biology on a system-wide level, including gene expression analysis [1, 2], gene disruption detection [3, 4], identification of protein subcellular location [5, 6] and so on. An important and challenging task in proteomics is the detection of protein complexes from the available protein-protein interaction (PPI) networks generated by various experimental technologies such as yeast two-hybrid [7], affinity purification [8], mass spectrometry [9], etc.
Protein complexes, consisting of molecular aggregations of proteins assembled by multiple protein interactions, are among the fundamental units of macromolecular organization and play crucial roles in integrating individual gene products to perform useful cellular functions. For example, the complex 'RNA polymerase II' transcribes genetic information into messages for ribosomes to produce proteins. Unfortunately, the mechanisms of most biological activities are still unknown, and hence accurately predicting protein complexes from the available PPI data has considerable practical merit because it allows us to infer the principles of biological processes.
The general methods for protein complex prediction are either experimental or computational. Experimentally, Tandem Affinity Purification (TAP) with mass spectrometry [9] has become popular. However, it is far from a satisfying answer because of the limits of TAP [10]. For example, transient low-affinity protein complexes may be lost in the washing and purification operations of TAP-MS. In addition, this experimental approach needs tagged proteins to infer the protein complex. Gavin et al. [8] indicated that only a limited number of known yeast protein complex subunits can be extracted by TAP-MS. Moreover, Schonbach [11] showed that validating the experimental results using subcellular localization information requires a preparation of subcellular fractionated lysates, and this preparation procedure is time-consuming. For these reasons, computational approaches are becoming promising alternatives that complement the experimental ones.
Generally, protein interaction data can be effectively modeled as a graph (also called a network) by regarding each protein as a vertex and each known interaction between two proteins as an edge. Although there are plenty of related results in graph theory and many graph algorithms have been developed, it is still nontrivial to design an efficient algorithm to mine protein complexes from PPI networks. One reason is that there has not been an exact definition for a protein complex. To overcome this difficulty, Tong et al. [12] assumed that a protein complex corresponds to a dense subgraph since proteins in the same complex interact frequently among themselves, and similar discussion was also made in Ref. [13].
Although it is nontrivial to design effective and efficient computational methods for predicting complexes, many algorithms have been devoted to the issue. The Markov Cluster Algorithm (MCL) [14, 15] simulates random walks within graphs, based on the intuition that a walker starting at an arbitrary protein visits a neighboring vertex with a predefined probability; once the walker enters a dense region, it is hard for it to get out. Molecular Complex Detection (MCODE) [16] relies on the topological structure of a network, assuming that a protein belongs to some complex if it has a subset of neighbors with high degree and there are many interactions among them. CFinder [17] defines a dense subgraph using the concept of adjacent k-cliques. Other non-topological properties, such as functional information [18] and protein binding interface data [19], have also been incorporated into algorithms to improve the accuracy of prediction. In addition, some methods rely solely on TAP data [20–22] and proceed in two steps: first, a reliable PPI network is constructed by applying specific scoring strategies to the purification records and selecting the protein interactions with high scores; second, existing algorithms are employed to detect dense clusters in the newly constructed network.
The core-attachment based approaches dramatically outperform the other available state-of-the-art algorithms, demonstrating the significance of this structure and its critical role in discovering protein complexes. This is one of our major motivations. Another major problem confounding the existing computational algorithms is that the available PPI networks are too sparse: for instance, the average numbers of interactions per protein are 5.29, 6.98, and 10.62 in DIP [31], Krogan [22], and Gavin [21], respectively. In these PPI networks, many protein complexes are difficult to extract since the sparse networks are full of noise [32]. Therefore, designing an efficient algorithm that gets rid of the noise is an important and challenging task in predicting protein complexes. Unfortunately, previous algorithms did not pay enough attention to this problem, since they only filter the noise by deleting nodes with degree 1, based on the fact that such interactions have lower reliability according to topological reliability measures [33, 34]. Aside from the issue of noise, all the existing computational approaches only make use of the topological information from the vertices and fail to take into consideration the roles of edges. It is, however, unreasonable to ignore the roles of edges, as described for example by the weak tie theory [35] and percolation [36], since an edge may play an important role in enhancing the locality or be significant in maintaining the global connectivity. For example, the famous weak ties theory indicates that job opportunities and new ideas usually come from persons with weak connections. Furthermore, the weak ties can be used to characterize the topological properties of networks, such as the stability of biological functions [37], the accuracy of network structure prediction [38], and the structure of mobile communication networks [39].
Percolation characterizes the tendency of a network to undergo a topological phase transition as the number of connections is progressively increased. Motivated by these observations, we pose the following question:
Question: can the roles of edges be used in protein complex detection?
In this study, we investigate the possibility of extracting protein complexes by exploring the roles of edges, and we give an affirmative answer to the above question. In detail, similar to the weak ties effects in mobile communication [39] and document networks [40], we show complementary results on PPI networks, namely that the edges connecting less similar nodes are more significant in maintaining the global connectivity. By using the weak ties and percolation, a reliable virtual network is constructed from the original PPI network, in which each maximal clique corresponds to a protein complex. A core-attachment based method is then developed. To test the performance of the proposed algorithm, we applied it to PPI networks. The experimental results on the yeast PPI network show that the proposed method outperforms DPClus [41], DECAFF [42], MCL [14], MCODE [16] and Coach [24]. Further, the analysis of the modules detected by the present algorithm suggests that most of them are biologically significant in the context of complexes, suggesting that the roles of edges are critical in discovering protein complexes.
Materials and methods
Our algorithm consists of three main steps: (1) verifying the existence of the weak ties effect in PPI networks; (2) constructing a reliable network by exploring the roles of edges; and (3) identifying the protein complexes by using a core-attachment based method. We describe them in turn.
Weak ties phenomenon in PPI networks
A network consists of two basic elements: vertices and edges. Many measures have been developed to characterize the structural and functional role of a node, including random walk-based indices [43] and the PageRank score [44]. In comparison, the role of edges has been studied less extensively.
Edges in a network usually play one of two roles: some contribute to the global connectivity, like the ones connecting two clusters, while others enhance the locality, like the ones inside a cluster. In social networks, the two roles are reflected in two important phenomena, namely homophily [45] and the weak ties effect [46]. Homophily means that connections are more likely to be formed among individuals with similar backgrounds and common characteristics. The weak ties phenomenon, on the other hand, means that less similar individuals tend to be connected with weaker strength. These weak ties play an important role in maintaining the global connectivity. The weak ties phenomenon has been shown to exist in mobile communication [39] and document networks [40], but the weak ties effect in PPI networks remains to be tested.
where s is the size of a connected subgraph, N is the size of the whole network and the sum includes all connected components. An obvious gap occurs when the network disintegrates [47].
where (u, υ) is the edge with endpoints u and υ, C_{u} is the size of the maximal clique containing vertex u, and C_{(u, υ)} is the size of the maximal clique containing (u, υ). This index, however, cannot distinguish bridges from non-bridges because it fails to take into account the difference between a pair of vertices. The bridgeness value of each edge in a clique is 1 according to Eq.(2). This is unreasonable because, intuitively, the larger a clique is, the lower the probability that an edge in the clique is a bridge. For example, edges in a 3-clique are more likely to be bridges than those in an 8-clique.
where J(u, υ) is the Jaccard similarity, i.e., $J(u,\upsilon)=\frac{|N(u)\cap N(\upsilon)|}{|N(u)\cup N(\upsilon)|}$ with N(u) being the neighbors of vertex u, and C_{u\υ} is the size of the maximal clique containing u without υ. The factor 1 − J(u, υ) measures the dissimilarity between the pair of endpoints, while the latter component quantifies the relation between the neighbors of the two endpoints. The physical interpretation of Eq.(3) is that only the edges whose endpoints are topologically less similar and which maintain the global connectivity are bridges. Compared with Eq.(2), the new index is more reasonable: for example, its value for an edge in an m-clique is $\frac{2(m-1)}{m^{2}}$, which decreases as the size of the clique increases.
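The dissimilarity factor in the clique example can be checked numerically. The following is a minimal sketch, assuming that N(u) is the open neighborhood of u (u itself excluded) and that the Jaccard similarity is the usual intersection-over-union ratio; the function names `jaccard` and `clique` are illustrative only. For an m-clique it gives 1 − J(u, υ) = 2/m, which agrees with the stated value 2(m−1)/m² provided the clique-size component of Eq.(3) equals (m−1)/m.

```python
def jaccard(adj, u, v):
    """Jaccard similarity of the open neighborhoods N(u) and N(v)."""
    nu, nv = adj[u], adj[v]
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def clique(m):
    """Adjacency sets of a complete graph on vertices 0..m-1 (illustrative helper)."""
    return {i: {j for j in range(m) if j != i} for i in range(m)}


for m in (3, 8):
    adj = clique(m)
    dissimilarity = 1 - jaccard(adj, 0, 1)
    # In an m-clique the endpoints share m-2 neighbors out of m vertices in the union.
    print(m, round(dissimilarity, 4), round(2 / m, 4))
```

The dissimilarity shrinks as the clique grows, which matches the intuition that edges inside large cliques are unlikely to be bridges.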
Constructing a reliable network
Gavin et al. [8] have pointed out that the core of a complex has relatively more interactions, while the attachments bind to the core proteins to form a biological complex, implying that the connectivity of a core is better than that of the whole complex.
To assess the topological proximity of a core, a measure of proximity for a pair of vertices is needed first. The most commonly used one is the graph distance, that is, the length of the shortest path connecting the pair of vertices. This quantity, however, is not appropriate for biological networks because of two drawbacks: first, it does not take into account the local structural features of the network; second, it is very susceptible to noise, e.g., a single missing edge affects the proximity significantly. In contrast, vertices connected by many paths of various lengths are likely to be functionally closer than vertices connected via a single path. In detail, given an edge, say (u, υ), it is reasonable to consider that information is transferred from u to υ through channels. The more channels there are, the better the connectivity is. Indeed, in biological networks, genetic information is transferred along pathways. From the aspect of graph theory, it is natural to regard the channels as the various walks connecting u and υ. We also take into consideration the strength of walks: the effect via longer walks with more intermediate vertices is very likely to be weaker than that via shorter ones with fewer intermediaries. Given a walk of length k, say υ_{1} → υ_{2} → ... → υ_{k+1}, its strength is defined as the product of the weights of the edges in the walk, i.e., ${\prod}_{i=1}^{k}{w}_{i,i+1}$, where w_{i,i+1} is the weight on the edge (υ_{i}, υ_{i+1}).
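The walk strength defined above can be illustrated with a short sketch. The edge weights below are hypothetical values in (0, 1], not values from the paper, and `walk_strength` is an illustrative name. With such weights a longer walk is never stronger than any of its individual edges, which reflects the stated intuition that longer walks carry a weaker effect.

```python
from math import prod


def walk_strength(weights, walk):
    """Product of the edge weights along the walk v1 -> v2 -> ... -> v_{k+1}."""
    return prod(weights[frozenset((a, b))] for a, b in zip(walk, walk[1:]))


# Hypothetical undirected edge weights, keyed by the unordered endpoint pair.
weights = {
    frozenset(("u", "a")): 0.5,
    frozenset(("a", "v")): 0.5,
    frozenset(("u", "v")): 0.8,
}

direct = walk_strength(weights, ["u", "v"])       # walk of length 1
detour = walk_strength(weights, ["u", "a", "v"])  # walk of length 2
print(direct, detour)
```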
The larger the bridgeness of an interaction is, the smaller its weight is.
where W is a matrix with element (W)_{ ij }= D(i, j).
For any pair of proteins, if the similarity between them is large enough, we have good reason to believe that they should be connected; otherwise, they should be unconnected. Therefore, the proteins within a core should be connected to each other. To construct a virtual and reliable network for the original PPI network, similar to [25], a definition is proposed as
There are two good physical interpretations for Φ(G, τ): first, if the similarity of a pair of proteins is considered as the reliability score of the corresponding edge, Φ(G) can be considered as a reliable version of the original network; second, it can be understood as a perturbation of the original network obtained by adding edges between vertices if there are enough short walks connecting them and deleting edges between vertex pairs if there are too few short walks connecting them.
In this way, the core of a protein complex corresponds to a maximal clique in the virtual network. In what follows, we design an algorithm to discover complexes by extracting cores and attachments separately.
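The thresholding idea behind the virtual network can be sketched as follows. This is only an illustration under stated assumptions: the similarity matrix `S` is taken as already computed (the defining equations are not reproduced in this text), it is assumed symmetric, and the rule "connect a pair when its similarity is at least τ" is an assumption about the exact form of the definition. The function name `virtual_network` and the toy values are hypothetical.

```python
def virtual_network(similarity, tau):
    """Adjacency sets of the graph that links every pair with similarity >= tau."""
    n = len(similarity)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= tau:  # assumed thresholding rule
                adj[i].add(j)
                adj[j].add(i)
    return adj


# Toy symmetric similarity matrix for four proteins (hypothetical values).
S = [
    [1.0, 0.9, 0.7, 0.1],
    [0.9, 1.0, 0.8, 0.2],
    [0.7, 0.8, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
phi = virtual_network(S, tau=0.5)
print(phi)
```

In this toy case proteins 0, 1 and 2 become mutually connected regardless of whether they interact in the original network, while protein 3 is left isolated; this corresponds to the "adding and deleting edges" reading of Φ(G, τ).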
A core-attachment algorithm
The first task is to extract all the maximal cliques in the virtual network, known as the classic all-cliques problem, which is NP-hard [48]. Exact algorithms are therefore impractical because of their complexity, and heuristic algorithms are used instead to keep the running time manageable. The Coach algorithm detects dense subgraphs very quickly and accurately from each vertex's neighborhood graph [24]. We adopt the Protein-complex core mining algorithm of Coach to approximately identify all cliques in the communicability graph Φ(G). Other heuristics can of course be used to identify the cliques, for example, a greedy algorithm or tabu search.
We would like to point out that, although we adopt the same strategy to detect the cores, our algorithm differs greatly from the Coach algorithm for two reasons: first, our algorithm detects cores in a virtual network based on the weak ties phenomenon, while Coach works on the original network; second, the strategies for detecting the attachments differ greatly.
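For small graphs the all-cliques problem can still be solved exactly, which is useful for checking a heuristic. The sketch below uses the Bron-Kerbosch algorithm with pivoting; it is not the core mining procedure of Coach adopted in this work, and its worst-case running time is exponential. The function name and the toy graph are illustrative.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques with Bron-Kerbosch (pivoting variant)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already explored vertices.
        if not p and not x:
            cliques.append(sorted(r))
            return
        # Pivot on the vertex with the most neighbors in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cores = maximal_cliques(adj)
print(cores)
```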
which quantifies the average closeness of υ to U from the aspect of connectivity. The larger cl(υ, U) is, the more walks connect υ and the core. Thus, a vertex υ ∈ CS(U) is selected as an attachment when $cl(\upsilon,U)\ge acl(U\cup N(U))=\frac{\sum_{\upsilon \in N(U)}cl(\upsilon,U)}{|N(U)|+|U|}$, indicating that the selected attachment has more connections to U than the average connectivity in N(U).
The procedure can be described as follows:
Step 1: Compute the bridgeness for each interaction in the PPI network G according to Eq.(3);
Step 2: Compute the similarity matrix S based on Eqs.(5)-(6);
Step 3: Construct the virtual network Φ(G) with a predefined threshold τ;
Step 4: Extract the cores using the Protein-complex core mining algorithm [24];
Step 5: Detect the attachments for each core.
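The attachment rule can be sketched as follows. The closeness values cl(υ, U) are taken as given inputs here, since their defining equation is not reproduced in this text, and the candidate set CS(U) is assumed to coincide with the neighborhood N(U); both are assumptions, as are the function name and the toy numbers.

```python
def select_attachments(core, neighbors, cl):
    """Keep the neighbors whose closeness to the core reaches the threshold acl."""
    # Threshold: sum of closeness over N(U), divided by |N(U)| + |U|.
    acl = sum(cl[v] for v in neighbors) / (len(neighbors) + len(core))
    return sorted(v for v in neighbors if cl[v] >= acl), acl


core = {"p1", "p2", "p3"}                # hypothetical core U
neighbors = {"a", "b", "c"}              # hypothetical N(U), used as candidate set
cl = {"a": 0.9, "b": 0.3, "c": 0.1}      # hypothetical closeness values cl(v, U)

attachments, acl = select_attachments(core, neighbors, cl)
print(attachments, round(acl, 4))
```

Because the denominator also counts the core vertices, the threshold is lower than the plain mean over N(U), so moderately close neighbors such as `b` are still accepted in this toy case.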
Performance measures
The biological significance of the computed modules can be validated by comparing them with experimentally determined complexes (introduced in the Results section).
F-measure
where $Precision=\frac{N_{cp}}{|PS|}$ and $Recall=\frac{N_{cb}}{|BS|}$ [49].
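As an illustration, the sketch below combines precision and recall with the harmonic mean, which is the usual definition of the F-measure; since the defining equation is not reproduced in this text, that form is an assumption. The counts N_{cp} and N_{cb} are treated as given, and the example values are those reported for Coach on the DIP data together with the 428 benchmark complexes. The function name `f_measure` is illustrative.

```python
def f_measure(n_cp, n_cb, n_predicted, n_benchmark):
    """Return (precision, recall, F), with F the harmonic mean of the first two."""
    precision = n_cp / n_predicted   # N_cp / |PS|
    recall = n_cb / n_benchmark      # N_cb / |BS|
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Coach on DIP: 746 predicted complexes, N_cp = 285, N_cb = 249, 428 benchmark complexes.
precision, recall, f = f_measure(n_cp=285, n_cb=249, n_predicted=746, n_benchmark=428)
print(round(precision, 3), round(recall, 3), round(f, 3))
```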
Coverage rate
where N_{i} is the number of proteins in the i-th benchmark complex.
P-value
The P-value [18] is employed. In detail, given a cluster C with k proteins in a functional group
where |V| denotes the size of the PPI network involved.
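A functional homogeneity P-value of this kind is commonly computed as the upper tail of a hypergeometric distribution. The sketch below assumes that standard form, i.e., the probability of observing at least k proteins of a functional group of size |F| in a cluster of size |C| drawn from |V| proteins; the exact equation used in this work is not reproduced in this text, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_group, n_cluster, k):
    """P(X >= k) for X ~ Hypergeometric(n_total, n_group, n_cluster)."""
    upper = min(n_group, n_cluster)
    hits = sum(
        comb(n_group, i) * comb(n_total - n_group, n_cluster - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_total, n_cluster)


# Toy example: 10 proteins, a functional group of 4, a cluster of 3 with 2 group members.
p = enrichment_p_value(n_total=10, n_group=4, n_cluster=3, k=2)
print(p)
```

Exact integer binomials are used so that very small tail probabilities are not lost to cancellation, which would happen if the tail were computed as one minus a cumulative sum.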
Geometric accuracy
Geometrical separation
where $Sep_{co}=\frac{\sum_{i=1}^{n}\sum_{j=1}^{m}Sep_{ij}}{n}$ and $Sep_{cl}=\frac{\sum_{i=1}^{n}\sum_{j=1}^{m}Sep_{ij}}{m}$.
Results
In this section, the presented algorithm is applied to PPI networks to assess its performance from two perspectives: its ability to predict protein complexes accurately, and its robustness. The algorithm was coded using MATLAB version 7.11.
Data
The Database of Interacting Proteins (DIP) [31] data (http://dip.doe-mbi.ucla.edu/, version yeast20071104) is adopted, which consists of 4,928 proteins and 17,201 interactions. To evaluate the protein complexes predicted by our algorithm, a benchmark set was constructed from MIPS [52], Aloy et al. [53] and the SGD database [54] based on Gene Ontology (GO) annotations; it consists of 428 protein complexes [50].
F-measure and coverage rate
The results of various algorithms using DIP data
| | MCL | DPClus | DECAFF | Coach | Our method-I | Our method-II |
|---|---|---|---|---|---|---|
| Predicted complexes | 1116 | 1143 | 2190 | 746 | 686 | 620 |
| Covered proteins | 4930 | 2987 | 1832 | 1832 | 1776 | 1702 |
| N_{cp} | 193 | 193 | 605 | 285 | 242 | 230 |
| N_{cb} | 242 | 274 | 243 | 249 | 198 | 220 |
P-value
To further investigate the biological significance of the predicted complexes, the P-value is adopted here. The functional homogeneity P-value is the probability that a given set of proteins is enriched by a given functional group merely by chance, following the hypergeometric distribution. It is the probability of co-occurrence of proteins with common functions. Accordingly, a low P-value of a predicted complex indicates that the collective occurrence of these proteins in the complex is unlikely to be due to chance and is thus statistically significant. The values are calculated by GO::TermFinder [55].
We discarded all clusters with a P-value above a cutoff threshold. In the experiments, we chose a cutoff of 1 × 10^{-2} for each protein complex because it offers a compromise between the complex-cluster matching rate and the cluster passing rate.
Statistical significance of protein complexes obtained by various algorithms on DIP data
| | MCL | DPClus | DECAFF | Coach | Our method-I | Our method-II |
|---|---|---|---|---|---|---|
| Predicted complexes | 1116 | 1143 | 2190 | 746 | 686 | 620 |
| Significant complexes | 312 | 352 | 1653 | 622 | 536 | 519 |
| Proportion (%) | 34.2 | 30.8 | 75.5 | 83.4 | 78.1 | 83.7 |
Selected complexes predicted by our method-II on DIP data
| ID | Match | P-value | Predicted complex | Function |
|---|---|---|---|---|
| 1 | 90.5% | 5.44E-44 | YBL002W YBR009C YBR154C YDL140C YDL150W YGL070C YJR063W YKL144C YKR025W YNL113W YNR003C YOR116C YOR151C YOR207C YOR210W YOR224C YOR341W YPR010C YPR110C YPR187W YPR190C | DNA-directed RNA polymerase activity |
| 2 | 94.4% | 8.77E-40 | YDL150W YKL144C YKR025W YNL151C YNR003C YOR116C YOR207C YPR110C YBL002W YBR154C YDR045C YJR063W YNL113W YOR224C YOR341W YPR010C YPR187W YPR190C | RNA polymerase activity |
| 3 | 100% | 7.57E-26 | YPL138C YDR469W YBR175W YHR119W YBR258C YAR003W YKL018W YLR015W | histone methyltransferase activity (H3-K4 specific) |
| 4 | 88.2% | 1.49E-20 | YBL093C YBR253W YDR443C YNL025C YNL236W YOR140W YBR193C YCR081W YDL005C YER022W YGL151W YGR104C YHR041C YOL051W YOL135C YPL042C YPL248C | transcription regulator activity |
| 5 | 100% | 2.64E-21 | Q0085 YBL099W YDR298C YDR377W YJR121W YKL016C YML081C-A YPL078C YPR020W | proton-transporting ATPase activity, rotational mechanism |
Size and density distributions
Because the above experiments are sufficient to demonstrate the superiority of the proposed bridgeness, we focus only on the Type II method in the following experiments.
Effects of the parameters
Robustness analysis
The robustness of the proposed algorithm is discussed in this subsection. The benchmark networks adopted here originate from Ref. [51]. In detail, from the protein complexes annotated in the MIPS database [52], an interaction network named the test graph is constructed by regarding each protein as a vertex and connecting each pair of proteins in the same complex. The test graph by itself has little value for assessing the robustness of the algorithms because each protein complex corresponds to a clique in it. To solve this problem, altered graphs are constructed from the test graph by adding or deleting edges in various proportions. For convenience, an altered graph is denoted by AG_{add, del}, where add and del are the percentages of added and deleted edges, respectively.
In this experiment, only the MCL and Coach algorithms are selected for comparison, because MCL is reported to be the most robust algorithm [51] and Coach is the best core-attachment based method.
Figure 9(B) displays the impact of edge addition on the separation. Both the MCL and our algorithm perform well even when the percentage of added edges increases to 80%, while the performance of the Coach algorithm decreases once the percentage of added edges reaches 20%. The impacts of edge removal on the geometric accuracy and separation are shown in Figure 9(C)-(D), respectively. Figure 9(C) demonstrates that both the MCL and our algorithm outperform the Coach algorithm. A possible reason is that, as more and more edges are deleted, it becomes more and more difficult to recover the deleted edges. When the percentage of removed edges exceeds 20%, the virtual network constructed by our algorithm differs greatly from the original test graph. The general trends in Figure 9(D) are similar to those displayed in Figure 9(C).
Figure 9(A-D) shows the results on networks with either added or removed edges, while Figure 9(E-H) shows the results on networks involving both addition and removal. Figure 9(E) demonstrates the effect of edge addition on an altered network from which 40% of the edges have previously been deleted. When the addition is less than 50%, the MCL outperforms the Coach and our algorithm, but when the addition is greater than 50%, both methods outperform the MCL. There is a good explanation: since the Coach and our algorithm are clique-based methods, edge deletion destroys the structure of cliques, decreasing their performance; when more and more edges are added, some of the cliques destroyed previously are recovered, enhancing their performance. Furthermore, these two algorithms are barely affected by additions of up to 100%, whereas the performance of the MCL decreases significantly as edges are added. The values of separation on this type of altered network are shown in Figure 9(F), where the MCL achieves the best performance. However, both the Coach and our algorithm are more stable than the MCL. The results of edge deletion on an altered network to which 40% of the edges have previously been added are shown in Figure 9(G-H); they are similar to those in Figure 9(E-F).
Conclusions
Protein complexes are key molecular units in cellular functions, and computational approaches for accurately discovering the unknown protein complexes hidden in the available PPI data are critically needed. At present, these computational algorithms focus on the roles of proteins without taking into account the roles of interactions.
In this paper, we investigate the possibility of predicting protein complexes using the roles of edges in PPI networks. First, the weak ties phenomenon in the PPI network is demonstrated by using the concept of bridges. Second, a reliable virtual PPI network is constructed by making use of the relation between topological similarity and bridgeness. Finally, a core-attachment algorithm is designed. The experimental results suggest that the roles of edges in biological networks are more promising than the roles of proteins, implying the importance of the roles of interactions.
The possible future research directions are

Because a biological network can be viewed as a special kind of social network, uncovering the social behaviors hidden in biological networks and exploiting them to address biological problems, such as protein complex prediction and disease-causing gene prediction, is very promising.

The discovery of structure-functionality relationships is a hot and very important topic in bioinformatics. How to associate social behaviors, including the weak ties, with biological functions is challenging and critical, since it provides deep insight into biological processes.
Thus, designing effective and efficient methods that can solve these problems will be very important and interesting.
Declarations
Acknowledgements
This work was supported by the National Key NSFC (Grant No. 60933009&91130006), NSFC (Grant No. 61072103, 61100157&61174162), SRFDPHE (Grant No. 200807010013) and FRFCU(Grant No. K50510030006).
This article has been published as part of BMC Systems Biology Volume 6 Supplement 1, 2012: Selected articles from The 5th IEEE International Conference on Systems Biology (ISB 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/6/S1.
Authors’ Affiliations
References
1. Hughes TR: Functional discovery via a compendium of expression profiles. Cell. 2000, 102: 109-126. 10.1016/S0092-8674(00)00015-5.
2. Neal SH, Amos M, Marek C, Nina VF, Jayanth RB: Dynamic Modeling of Gene Expression Data. Proc Natl Acad Sci. 2001, 98: 1693-1698. 10.1073/pnas.98.4.1693.
3. Ross-Macdonald P: Large-scale analysis of the yeast genome by transposon tagging and gene disruption. Nature. 1999, 402: 413-418. 10.1038/46558.
4. Winzeler EA: Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science. 1999, 285: 901-906. 10.1126/science.285.5429.901.
5. Kumar A: Subcellular localization of the yeast proteome. Genes Dev. 2002, 16: 707-719. 10.1101/gad.970902.
6. Huh WK: Global analysis of protein localization in budding yeast. Nature. 2003, 425: 686-691. 10.1038/nature02026.
7. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci. 2001, 98 (8): 4569-4574. 10.1073/pnas.061034498.
8. Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002, 415 (6868): 141-147. 10.1038/415141a.
9. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002, 415 (6868): 180-183. 10.1038/415180a.
10. Tarassov K, Messier V, Landry CR, Radinovic S, Molina MM, Shames I: An in vivo map of the yeast protein interactome. Science. 2008, 320 (5882): 1465-1470. 10.1126/science.1153878.
11. Schonbach C: Molecular biology of protein-protein interactions for computer scientists. Biological data mining in protein interaction networks. 2009: 1-13.
12. Tong AH, Drees B, Nardelli G, Bader GD, Brannetti B, Castagnoli L, Evangelista M, Ferracuti S, Nelson B, Paoluzi S, Quondam M, Zucconi A, Hogue C, Fields S, Boone C, Cesareni G: A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science. 2002, 295 (5583): 321-324.
13. Spirin V, Mirny L: Protein complexes and functional modules in molecular networks. Proc Natl Acad Sci. 2003, 100 (21): 12123-12128. 10.1073/pnas.2032324100.
14. Pereira-Leal JB, Enright AJ, Ouzounis CA: Detection of functional modules from protein interaction networks. Proteins. 2004, 54 (1): 49-57.
15. Enright AJ, Dongen SV, Ouzounis CA: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002, 30 (7): 1575-1584. 10.1093/nar/30.7.1575.
16. Bader G, Hogue C: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003, 4: 2. 10.1186/1471-2105-4-2.
17. Adamcsek B, Palla G, Farkas IJ, Derényi I, Vicsek T: CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics. 2006, 22 (8): 1021-1023. 10.1093/bioinformatics/btl039.
18. King AD, Pržulj N, Jurisica I: Protein complex prediction via cost-based clustering. Bioinformatics. 2004, 20 (17): 3013-3020. 10.1093/bioinformatics/bth351.
19. Jung SH, Jang WH, Hur HY, Hyun B, Han DS: Protein complex prediction based on mutually exclusive interactions in protein interaction network. Genome Informatics. 2008, 21: 77-88.
20. Zhang B, Park B, Karpinets TV, Samatova NF: From pull-down data to protein interaction networks and complexes with biological relevance. Bioinformatics. 2008, 24 (7): 979-986. 10.1093/bioinformatics/btn036.
21. Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dümpelfeld B: Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006, 440 (7084): 631-636. 10.1038/nature04532.
22. Krogan NJ, Cagney G, Yu G, Zhong H, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP, Punna T: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006, 440 (7084): 637-643. 10.1038/nature04670.
23. Leung HC, Xiang Q, Yiu SM, Y CF: Predicting protein complexes from PPI data: a core-attachment approach. Journal of Computational Biology. 2009, 16 (2): 133-144. 10.1089/cmb.2008.01TT.
24. Wu M, Li X, Kwoh CK, Ng S: A core-attachment based method to detect protein complexes in PPI networks. BMC Bioinformatics. 2009, 10: 169. 10.1186/1471-2105-10-169.
25. Ma X, Gao L: Predicting protein complexes in protein interaction networks using a core-attachment algorithm based on graph communicability. Information Sciences. 2012, 189: 233-254.
26. Habibi M, Eslahchi C, Wong L: Protein complex prediction based on k-connected subgraphs in protein interaction network. BMC Syst Biol. 2010, 4: 129. 10.1186/1752-0509-4-129.
27. Zhang SH, Ning XM, Ding C, Zhang XS: Determining modular organization of protein interaction networks by maximizing modularity density. BMC Systems Biology. 2010, 4 (Suppl 2): S10. 10.1186/1752-0509-4-S2-S10.
28. Liu ZP, Wang Y, Zhang XS, Chen LN: Identifying dysfunctional crosstalk of pathways in various regions of Alzheimer's disease brains. BMC Systems Biology. 2010, 4 (Suppl 2): S11. 10.1186/1752-0509-4-S2-S11.
29. Luo F, Liu J, Li J: Discovering conditional co-regulated protein complexes by integrating diverse data sources. BMC Systems Biology. 2010, 4 (Suppl 2): S4. 10.1186/1752-0509-4-S2-S4.
30. Li XL, Wu M, Kwoh CK, Ng SK: Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC Genomics. 2010, 11 (Suppl 1): S3. 10.1186/1471-2164-11-S1-S3.
 Xenarios I, Rice DW, Marcotte EM, Eisenberge D: DIP: the database of interacting proteins. Nucleic Acids Research. 2000, 28: 289291. 10.1093/nar/28.1.289.PubMed CentralView ArticlePubMedGoogle Scholar
 von Mering C, Krause R, Snel B, Cornell M, Oliver S, Bork P: Comparative assessment of largescale data sets of proteinprotein interactions. Nature. 2002, 417 (6887): 399403.View ArticlePubMedGoogle Scholar
 Saito R, Suzuki H, Hayashizaki Y: Interaction generality, a measurement to assess the reliability of a proteinprotein interaction. Nucleic Acids Research. 2002, 30 (5): 11631168. 10.1093/nar/30.5.1163.PubMed CentralView ArticlePubMedGoogle Scholar
 Chua HN, Sung WK, Wong L: Exploiting indirect neighbours and topological weight to predict protein function from proteinprotein interactions. Bioinformatics. 2006, 22 (13): 16231630. 10.1093/bioinformatics/btl145.View ArticlePubMedGoogle Scholar
 Granovetter M: The strength of weak ties. American Journal of Sociology. 1973, 77 (6): 13601380.View ArticleGoogle Scholar
 Albert R, Barabási AL: Statistical mechanics of complex networks. Reviews of Modern Physics. 2002, 74: 4797. 10.1103/RevModPhys.74.47.View ArticleGoogle Scholar
 Csermely P: Strong links are important, but weak links stabilize them. Trends Biochem Sci. 2004, 29: 331334. 10.1016/j.tibs.2004.05.004.View ArticlePubMedGoogle Scholar
 Lü L, Zhou T: Link prediction in weighted networks: the role of weak ties. Europhys Lett. 2010, 89: 1800110.1209/02955075/89/18001.View ArticleGoogle Scholar
 Onnela JP, Saramäki J, Hyvönen J, Szabó G, Lazer D, Kaski K, Kertész J, Barabási AL: Structure and tie strengths in mobile communication networks. Proc Natl Acad Sci. 2007, 104 (18): 73327336. 10.1073/pnas.0610245104.PubMed CentralView ArticlePubMedGoogle Scholar
 Cheng X, Ren F, Shen H, Zhang Z, Zhou T: Bridgeness: a local index on edge significance in maintaining global connectivity. J Stat Mech. 2010, 10: P10011View ArticleGoogle Scholar
 AltafUlAmin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S: Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinformatics. 2006, 7: 20710.1186/147121057207.PubMed CentralView ArticlePubMedGoogle Scholar
 Li XL, Foo CS, Ng SK: Discovering protein complexes in dense reliable neighborhoods of protein interaction networks. Comput Syst Bioinformatics Conf. 2007, 6: 157168.View ArticlePubMedGoogle Scholar
 Liu W, Lü L: Link prediction based on local random walk. Europhysic Letter. 2010, 89: 5800710.1209/02955075/89/58007.View ArticleGoogle Scholar
 Brin S, Page L: The anatomy of a largescale hypertextual Web search engine. Computer Networks and ISDN Systems. 1998, 30: 107117. 10.1016/S01697552(98)00110X.View ArticleGoogle Scholar
 Lazarsfeld P, Merton RK: Freedom and Control in Modern Society. 1954, New York: Van NostrandGoogle Scholar
 McPherson JM, SmithLovin L, Cook J: Birds of a feather: Homophily in social networks. Annual Review of Sociology. 2001, 27: 415444. 10.1146/annurev.soc.27.1.415.View ArticleGoogle Scholar
 Stauffer D, Aharony A: Introduction to Percolation Theory. 1994, New York: Van Nostrand, London: CRC Press, 2Google Scholar
 Pardalos P, Xue J: The maximum clique problema. J Global Opt. 1997, 4: 301328.View ArticleGoogle Scholar
 Chua H, Ning K, Sung W, Leong L: Using indirect proteinprotein interactions in protein complex prediction. Comput Syst Bioinformatics Conf. 2007, 6: 97109.View ArticlePubMedGoogle Scholar
 Friedel C, Krumsiek J, Zimmer R, Vingron M, Wong L: Boostrapping the interactome: unsupervised identification of protein complexes in Yeast. Proceedings of the 12th Annual Conference on Research in Computational Molecular Biology (RECOMB). 2008, 316.View ArticleGoogle Scholar
 Brohée S, Van Helden J: Evaluation of clustering algorithms for proteinprotein interaction network. BMC Bioinformatics. 2006, 7: 48810.1186/147121057488.PubMed CentralView ArticlePubMedGoogle Scholar
 Mewes H, Amid C, Arnold R, et al: MIPS: analysis and annotation of proteins from whole genomes. Nucleic Acids Research. 2004, 32: D41D44. 10.1093/nar/gkh092.PubMed CentralView ArticlePubMedGoogle Scholar
 Aloy P, Böttcher B, Ceulemans H: StructureBased Assembly of Protein Complexes in Yeast.Google Scholar
 Dwight S, Harris M, Dolinski K, et al: Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO). Nucl Acids Res. 2002, 30 (1): 6972. 10.1093/nar/30.1.69.PubMed CentralView ArticlePubMedGoogle Scholar
 Boyle E, Weng S, et al: GO::TermFinderopen source software for accessing Gene Ontology information and finding sigificantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics. 2004, 20 (18): 37103715. 10.1093/bioinformatics/bth456.PubMed CentralView ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.