Finding low-conductance sets with dense interactions (FLCD) for better protein complex prediction
- Yijie Wang^{1} and
- Xiaoning Qian^{1}
https://doi.org/10.1186/s12918-017-0405-5
© The Author(s) 2017
Published: 14 March 2017
Abstract
Background
Intuitively, proteins in the same protein complex should interact densely with each other but rarely with other proteins in protein-protein interaction (PPI) networks. Surprisingly, many existing computational algorithms do not detect protein complexes based on both of these topological properties directly. Most of them rely on mathematical definitions of either “modularity” or “conductance”, each of which has its own limitation: modularity has an inherent resolution problem that causes small protein complexes to be missed, and conductance characterizes the separability of complexes but fails to capture the interaction density within complexes.
Results
In this paper, we propose a two-step algorithm, FLCD (Finding Low-Conductance sets with Dense interactions), to predict overlapping protein complexes with the desired topological structure, which is densely connected inside and well separated from the rest of the network. In the first step, FLCD detects well-separated subnetworks by approximating a potential low-conductance set through a personalized PageRank vector from a protein and then solving a mixed integer programming (MIP) problem to find the minimum-conductance set within the identified low-conductance set. In the second step, the densely connected parts of those subnetworks are reported as the protein complexes by solving another MIP problem that finds the dense subnetwork in the minimum-conductance set.
Conclusion
Experiments on four large-scale yeast PPI networks from different public databases demonstrate that the complexes predicted by FLCD correspond better to the yeast protein complex gold standards than those of three other state-of-the-art algorithms (ClusterONE, LinkComm, and SR-MCL). Additionally, GO enrichment analysis shows that the results of FLCD have higher biological relevance with respect to Gene Ontology (GO) terms.
Background
Recent developments in high-throughput profiling techniques, such as yeast two-hybrid (Y2H) and tandem affinity purification (TAP) with mass spectrometry (MS), allow scientists to generate large-scale protein-protein interaction (PPI) datasets for different species [1–5]. These interactome data have enabled us to discover biological insights from a systematic point of view through PPI networks, where nodes represent proteins and edges denote biological relationships (either physical binding or statistical association) between two proteins. In this paper, we focus on predicting protein complexes in PPI networks derived from high-throughput profiling.
Based on the inherent topological structures of protein complexes [6], prediction of protein complexes can be formulated as searching for subnetworks that are densely connected inside and well separated from the rest of the PPI networks. Many algorithms have been developed and applied for this purpose of detecting protein complexes.
These existing algorithms can be grouped into three categories. The first category includes the algorithms that mimic Markovian random walks on graphs, pioneered by MCL [7]. MCL does not have explicit mathematical definitions for the desired properties of the subnetworks to detect as protein complexes. Similar to a random walk, it iteratively applies “Expand” and “Inflation” operations to generate non-overlapping complexes. R-MCL [8] and SR-MCL [9] are improved versions of MCL. R-MCL penalizes large complexes at each iteration in order to obtain more size-balanced complexes with similar numbers of nodes. SR-MCL executes R-MCL many times to yield overlapping complexes. All of these algorithms have shown good empirical performance, despite their opaque parameter tuning and the lack of theoretical understanding of their working mechanisms.
Algorithms in the second category do not directly predict complexes according to the topological structure of subnetworks but resemble traditional clustering methods based on derived similarity measures between nodes or edges. For example, MCODE [1], CFinder [10], and RRW [11] grow complexes from single nodes by iteratively adding similar nodes in terms of different similarity criteria that help form local dense subnetworks. However, they only concentrate on the internal connectivity of the subnetworks and neglect the connectivity between the subnetworks and the rest of the networks. LinkComm [12] represents networks with edge graphs, whose nodes are interactions and edges reflect the similarity between interactions, and derives potential complexes by hierarchical clustering to partition the edge graphs.
Algorithms in the third category detect complexes based on explicit topological definitions of protein complexes. For example, modularity [13] and conductance [6, 14] are two widely used definitions. Algorithms based on modularity [15] aim to detect subnetworks that have more internal connections than expected. Algorithms based on finding low-conductance sets, such as ClusterONE [6], focus on the separability of the subnetworks, which can be quantified by the ratio between the external connections of a subnetwork and the total number of interactions of the proteins within the subnetwork. However, these methods have their own limitations. Modularity-based methods have an inherent resolution problem [16], which causes small protein complexes to be missed. Algorithms based on conductance minimization [6, 17] consider the relationship between the internal and external connections of subnetworks, but neglect the density of the interactions within the subnetworks.
In this paper, we propose a two-step algorithm, FLCD (Finding Low-Conductance sets with Dense interactions), to detect protein complexes that have dense interactions inside and sparse interactions outside in a given PPI network. FLCD explicitly handles both the internal and external connectivity of protein complexes in two steps. FLCD first identifies a low-conductance set around a protein, which is locally well separated from the rest of the network. Then a densely connected subnetwork within the low-conductance set is detected based on the definition of the edge density of a subnetwork proposed in [18]. We compare FLCD with three state-of-the-art overlapping complex prediction algorithms: ClusterONE [6], LinkComm [12], and SR-MCL [9]. Experimental results on four yeast PPI networks from different publicly accessible databases demonstrate that FLCD outperforms all competing algorithms for biological significance in terms of yeast protein complex gold standards and Gene Ontology (GO) term annotations [19].
Results and discussion
We first introduce the implementation details of the algorithms that we take for comparison; the PPI networks, the reference protein complex datasets that serve as our gold standards, and the GO terms we use for evaluation; and the criteria for the performance comparison. To demonstrate the robust performance of FLCD, we then compare its predicted protein complexes with those of three selected state-of-the-art protein complex prediction algorithms, based on two gold standard protein complex datasets and four public yeast PPI networks. In addition, we apply GO enrichment analysis to the entire set of complexes detected by each competing algorithm. Finally, we illustrate the differences between the protein complexes predicted by the competing algorithms for specific reference complexes to further demonstrate the advantages of FLCD.
Algorithms, data, and evaluation metrics
Algorithms
We compare our FLCD algorithm with three other state-of-the-art overlapping complex prediction algorithms: ClusterONE [6], LinkComm [12], and SR-MCL [9]. The Java implementation of ClusterONE does not require any tuning parameters. For LinkComm, we set the tuning parameter t (the threshold at which the dendrogram is cut for hierarchical clustering) to 0.2, which achieves the best performance empirically in our experiments. For SR-MCL, we set the inflation parameter I=3 and leave the other parameters at their default settings, since these yield the best results in our experiments. We set the only parameter k of FLCD, the size of the local neighborhood obtained from the personalized PageRank computation, to 20.
Data
Table 1. The detailed information of the four yeast PPI networks and the numbers of covered SGD and MIPS reference complexes
Network | # proteins | # interactions | # SGD complexes | # MIPS complexes |
---|---|---|---|---|
SceDIP | 5136 | 22491 | 224 | 184 |
SceBG | 6438 | 80577 | 234 | 189 |
SceIntAct | 5453 | 54134 | 231 | 187 |
SceMINT | 5414 | 27316 | 230 | 188 |
Due to the possible incompleteness of the reference protein complexes, we further examine the biological relevance of every predicted complex by GO enrichment analysis. We download the mappings of yeast genes and proteins to GO terms according to [20] (version 20150411).
Evaluation metrics for protein complex prediction
For the protein complex prediction, we assess the performance of all competing algorithms by a composite score consisting of three quality measures: F-measure [9, 14]; the geometric accuracy (Acc) score [14]; and the maximum matching ratio (MMR) [6]. For fair comparison, we remove predicted complexes of two or fewer proteins by all competing algorithms.
The Acc score provides a balanced measure of Sn and PPV: \(\textup {Acc}=\sqrt {\textup {Sn} \times \textup {PPV}}\).
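The two closed-form scores can be checked against the comparison tables with a few lines of Python. This is a minimal sketch: the Acc formula is the one stated above, while the F-measure is assumed to be the usual harmonic mean of precision and recall (the text does not spell it out, but the tabulated values are consistent with it). The function names are illustrative only.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def geometric_accuracy(sn: float, ppv: float) -> float:
    """Acc = sqrt(Sn * PPV), as stated in the text."""
    return math.sqrt(sn * ppv)


# FLCD on SceDIP with the SGD reference set (values copied from the table).
print(round(f_measure(0.2020, 0.6786), 4))           # tabulated F-measure: 0.3113
print(round(geometric_accuracy(0.5964, 0.5003), 4))  # tabulated Acc: 0.5462
```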
The maximum matching ratio (MMR) is the ratio of the weight of maximum weight matching to the size of the reference set.
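A small sketch may help make the MMR concrete. It assumes that the edge weight between a reference complex and a predicted complex is the overlap score \(|A \cap B|^{2} / (|A|\,|B|)\) used with this metric in [6]; the text here does not restate the weight, so treat it as an assumption. The matching is found by exhaustive search with memoization, which is only practical for toy inputs; a real evaluation would use a polynomial-time maximum-weight bipartite matching routine.

```python
from functools import lru_cache


def overlap_score(a: set, b: set) -> float:
    """Assumed edge weight: |A ∩ B|^2 / (|A| * |B|)."""
    inter = len(a & b)
    return inter * inter / (len(a) * len(b))


def max_matching_ratio(reference: list, predicted: list) -> float:
    """Weight of a maximum-weight one-to-one matching divided by |reference|."""
    weights = [[overlap_score(r, p) for p in predicted] for r in reference]

    @lru_cache(maxsize=None)
    def best(i: int, used: int) -> float:
        # i: index of the next reference complex; used: bitmask of matched predictions
        if i == len(reference):
            return 0.0
        value = best(i + 1, used)  # leave reference complex i unmatched
        for j in range(len(predicted)):
            if not used & (1 << j) and weights[i][j] > 0:
                value = max(value, weights[i][j] + best(i + 1, used | (1 << j)))
        return value

    return best(0, 0) / len(reference)


# Toy example with hypothetical protein names.
reference = [{"A", "B", "C"}, {"D", "E"}]
predicted = [{"A", "B", "C"}, {"A", "B"}, {"D", "E", "F"}]
mmr = max_matching_ratio(reference, predicted)
print(mmr)
```

In the toy example the first reference complex is matched exactly (weight 1) and the second is matched with weight 4/6, so the ratio is (1 + 2/3) / 2 = 5/6.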
GO enrichment analysis
For each predicted complex, we take the lowest p-value among all its enriched GO terms as its final p-value. A GO term is considered statistically significantly enriched when the p-value of any complex for this GO term is lower than 1e−3.
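The selection rule above can be sketched as follows. The text does not state which statistical test produces the per-term p-values; the sketch assumes a one-sided hypergeometric test, a common choice for GO enrichment, and all names and counts in it are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population: int, annotated: int, sample: int, hits: int) -> float:
    """P(X >= hits) when `sample` proteins are drawn without replacement from
    `population` proteins, `annotated` of which carry the GO term."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, x) * comb(population - annotated, sample - x)
        for x in range(hits, upper + 1)
    )
    return tail / total


def complex_pvalue(population: int, complex_size: int, term_counts: dict) -> float:
    """Lowest p-value over all GO terms of one predicted complex.

    term_counts maps a term name to (proteins annotated in the network,
    proteins annotated in the complex).
    """
    return min(
        hypergeom_pvalue(population, annotated, complex_size, hits)
        for annotated, hits in term_counts.values()
    )


# Toy example: 10 proteins in the network, a complex of 3 proteins.
p = complex_pvalue(10, 3, {"term_a": (4, 3), "term_b": (8, 3)})
print(p, p < 1e-3)  # the second value applies the 1e-3 significance threshold
```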
Comparison on protein complex prediction
We apply all competing algorithms to search for potential protein complexes in four yeast PPI networks and compare them in terms of the composite score, consisting of F-measure, Acc score and MMR based on both the SGD and MIPS reference protein complex datasets.
We note that the sizes and numbers of detected complexes affect the scores of the metrics that we employ. However, in the context of complex prediction, there is no universal gold-standard metric. Hence, we apply the three aforementioned metrics, which have been commonly adopted in related work [6, 9]. The average sizes of the complexes generated by FLCD in our experiments range from 6 to 8 across the four networks under study, which is comparable to the average sizes of the complexes detected by the other algorithms: from 5 to 6 for LinkComm, from 7 to 9 for ClusterONE, and from 8 to 10 for SR-MCL. Furthermore, the total numbers of predicted complexes yielded by FLCD, LinkComm, and SR-MCL are much larger than that of ClusterONE. The reason is that the post-processing procedure of ClusterONE filters out complexes with lower scores, whereas FLCD and LinkComm output all complexes without filtering.
Table 2. Comparison of protein complex prediction by SGD reference dataset (CONE: ClusterONE; LinkC: LinkComm)
Network | Method | # complexes | # matched | Coverage | Recall | Precision | F-measure | Sn | PPV | Acc | MMR |
---|---|---|---|---|---|---|---|---|---|---|---|
SceDIP | FLCD | 2134 | 152 | 3921 | 0.6786 | 0.2020 | 0.3113 | 0.5964 | 0.5003 | 0.5462 | 0.3685 |
SceDIP | CONE | 380 | 86 | 1503 | 0.3839 | 0.2579 | 0.3085 | 0.4082 | 0.6203 | 0.5032 | 0.1950 |
SceDIP | LinkC | 1839 | 137 | 3735 | 0.6116 | 0.1289 | 0.2130 | 0.6290 | 0.4820 | 0.5506 | 0.3276 |
SceDIP | SR-MCL | 3216 | 44 | 4678 | 0.2228 | 0.0221 | 0.0412 | 0.5120 | 0.2893 | 0.3489 | 0.0708 |
SceBG | FLCD | 4027 | 183 | 5836 | 0.7821 | 0.2000 | 0.3181 | 0.7363 | 0.5621 | 0.6433 | 0.4920 |
SceBG | CONE | 522 | 122 | 2735 | 0.5214 | 0.2433 | 0.3318 | 0.6488 | 0.6035 | 0.6257 | 0.2542 |
SceBG | LinkC | 5382 | 164 | 6076 | 0.7008 | 0.1217 | 0.2072 | 0.8880 | 0.4373 | 0.6231 | 0.4100 |
SceBG | SR-MCL | 1862 | 108 | 5889 | 0.4615 | 0.1245 | 0.1961 | 0.8999 | 0.3034 | 0.5225 | 0.2151 |
SceIntAct | FLCD | 3394 | 172 | 4678 | 0.7446 | 0.1933 | 0.3069 | 0.6699 | 0.5391 | 0.6009 | 0.4661 |
SceIntAct | CONE | 496 | 117 | 1994 | 0.5065 | 0.2419 | 0.3275 | 0.5742 | 0.5944 | 0.5842 | 0.2742 |
SceIntAct | LinkC | 1297 | 93 | 5290 | 0.4026 | 0.0941 | 0.1525 | 0.9223 | 0.2393 | 0.4698 | 0.2285 |
SceIntAct | SR-MCL | 1079 | 68 | 5342 | 0.2294 | 0.0437 | 0.1517 | 0.7784 | 0.2402 | 0.4341 | 0.1213 |
SceMINT | FLCD | 2483 | 157 | 4210 | 0.6826 | 0.2280 | 0.3418 | 0.6524 | 0.5284 | 0.5871 | 0.4163 |
SceMINT | CONE | 513 | 110 | 2335 | 0.4783 | 0.2027 | 0.2848 | 0.5370 | 0.5954 | 0.5654 | 0.2442 |
SceMINT | LinkC | 2201 | 144 | 4068 | 0.6261 | 0.1595 | 0.2542 | 0.6757 | 0.5540 | 0.6119 | 0.3743 |
SceMINT | SR-MCL | 3698 | 33 | 4976 | 0.1435 | 0.0169 | 0.0302 | 0.5013 | 0.2597 | 0.3608 | 0.0609 |
Table 3. Comparison of protein complex prediction by MIPS reference dataset (CONE: ClusterONE; LinkC: LinkComm)
Network | Method | # complexes | # matched | Coverage | Recall | Precision | F-measure | Sn | PPV | Acc | MMR |
---|---|---|---|---|---|---|---|---|---|---|---|
SceDIP | FLCD | 2134 | 120 | 3921 | 0.6522 | 0.1603 | 0.2573 | 0.4001 | 0.3901 | 0.3951 | 0.3206 |
SceDIP | CONE | 380 | 74 | 1503 | 0.4022 | 0.1868 | 0.2551 | 0.2749 | 0.4015 | 0.3322 | 0.1533 |
SceDIP | LinkC | 1839 | 109 | 3735 | 0.5924 | 0.1104 | 0.1862 | 0.4775 | 0.3646 | 0.4173 | 0.2993 |
SceDIP | SR-MCL | 2851 | 41 | 4687 | 0.1964 | 0.0230 | 0.0402 | 0.4592 | 0.2104 | 0.3108 | 0.0726 |
SceBG | FLCD | 4027 | 124 | 5836 | 0.6561 | 0.1393 | 0.2298 | 0.4643 | 0.4315 | 0.4476 | 0.3611 |
SceBG | CONE | 522 | 86 | 2735 | 0.4450 | 0.1533 | 0.2293 | 0.4537 | 0.4452 | 0.4494 | 0.1795 |
SceBG | LinkC | 5382 | 109 | 6076 | 0.6349 | 0.0918 | 0.1604 | 0.8179 | 0.3504 | 0.5354 | 0.3285 |
SceBG | SR-MCL | 1862 | 65 | 5889 | 0.3439 | 0.0673 | 0.1126 | 0.7360 | 0.2436 | 0.4234 | 0.1384 |
SceIntAct | FLCD | 3394 | 120 | 4678 | 0.6417 | 0.1452 | 0.2368 | 0.4183 | 0.4034 | 0.4108 | 0.3482 |
SceIntAct | CONE | 496 | 79 | 1994 | 0.4225 | 0.1633 | 0.2356 | 0.3587 | 0.4296 | 0.3925 | 0.1927 |
SceIntAct | LinkC | 1297 | 80 | 5290 | 0.4278 | 0.0732 | 0.1251 | 0.9028 | 0.1986 | 0.4234 | 0.1886 |
SceIntAct | SR-MCL | 1079 | 45 | 5342 | 0.1337 | 0.0190 | 0.0941 | 0.6246 | 0.1850 | 0.3399 | 0.0960 |
SceMINT | FLCD | 2483 | 111 | 4210 | 0.5904 | 0.1800 | 0.2759 | 0.4147 | 0.4086 | 0.4116 | 0.3231 |
SceMINT | CONE | 513 | 67 | 2335 | 0.3564 | 0.1267 | 0.1869 | 0.3274 | 0.4017 | 0.3626 | 0.1519 |
SceMINT | LinkC | 2201 | 100 | 4068 | 0.5319 | 0.1040 | 0.1740 | 0.4744 | 0.4038 | 0.4377 | 0.2744 |
SceMINT | SR-MCL | 3698 | 24 | 4976 | 0.1277 | 0.0112 | 0.0205 | 0.4192 | 0.1999 | 0.2894 | 0.0481 |
When we take the SGD reference dataset as the gold standard, Table 2 shows that FLCD consistently achieves the best MMR scores among all competing algorithms, which we attribute to FLCD being the only algorithm that explicitly captures the desired network structure of protein complexes. The table also reports the F-measure together with the precision and recall scores from which it is computed. For all four PPI networks, FLCD predicts the largest number of matched reference protein complexes and therefore attains the best recall scores. With respect to precision, FLCD is the best for SceMINT, while ClusterONE performs best for the other three networks. However, since the post-processing step in ClusterONE keeps only the dense complexes, ClusterONE has low coverage. Combining precision and recall, FLCD attains the best F-measures for the SceDIP and SceMINT networks, and ClusterONE obtains the best F-measures for SceBG and SceIntAct. In addition to MMR and F-measure, we compare the cluster-wise sensitivity (Sn), the cluster-wise positive predictive value (PPV), and the Acc score. FLCD has the best Acc scores for SceBG and SceIntAct. LinkComm obtains the best Acc scores for SceDIP and SceMINT, since LinkComm detects several large and many small complexes, which favors both the Sn and PPV scores [6]. We also compare the coverage of the competing algorithms: SR-MCL has the largest coverage on SceDIP, SceIntAct, and SceMINT, LinkComm has the largest coverage on SceBG, and the coverage of FLCD is competitive with that of SR-MCL. Here, the coverage is defined as the number of proteins covered by all predicted complexes, which is typically used to evaluate whether complex prediction algorithms can help comprehensively predict functionalities for all the proteins in a given network.
For the MIPS reference dataset, Table 3 shows a similar trend in the evaluation scores. FLCD finds the largest number of matched reference complexes in MIPS and attains the best recall scores, F-measures, and MMR scores for all four PPI networks. The Acc scores of FLCD are competitive with those of LinkComm, which achieves the best Acc scores for all four yeast PPI networks. FLCD covers a number of proteins comparable to SR-MCL, which covers the most proteins in three of the four networks (LinkComm covers the most in SceBG). In terms of overall performance, as represented by the composite score, FLCD is superior to the other competing algorithms, as shown in Fig. 2.
In summary, considering the composite score based on three metrics, our FLCD outperforms the other algorithms. To further validate all competing algorithms, we perform GO enrichment analysis in the next section to see whether all predicted complexes by different algorithms have significant biological meaning.
Comparison on GO enrichment analysis
Table 4. Comparison by GO enrichment analysis (CONE: ClusterONE; LinkC: LinkComm)
Network | Method | # complexes | % enriched | # GO terms |
---|---|---|---|---|
SceDIP | FLCD | 2134 | 72.2 | 1442 |
SceDIP | CONE | 380 | 71.8 | 852 |
SceDIP | LinkC | 1839 | 67.4 | 1273 |
SceDIP | SR-MCL | 2851 | 23.5 | 957 |
SceBG | FLCD | 4027 | 72.4 | 1800 |
SceBG | CONE | 522 | 77.4 | 1282 |
SceBG | LinkC | 5382 | 39.8 | 1554 |
SceBG | SR-MCL | 1862 | 56.4 | 1702 |
SceIntAct | FLCD | 3394 | 62.4 | 1414 |
SceIntAct | CONE | 496 | 65.6 | 1031 |
SceIntAct | LinkC | 1297 | 46.5 | 1129 |
SceIntAct | SR-MCL | 1079 | 44.7 | 888 |
SceMINT | FLCD | 2483 | 62.3 | 1416 |
SceMINT | CONE | 513 | 59.4 | 954 |
SceMINT | LinkC | 2201 | 32.1 | 1123 |
SceMINT | SR-MCL | 3698 | 19.7 | 856 |
Examples of predicted complexes
Similarly, we show the RNase complexes predicted by all competing algorithms in Fig. 4, panels (b.1) to (b.4). In (b.1), we observe that FLCD detects all proteins in the reference RNase complex but mistakenly includes the protein SKI7 because of false positive interactions between SKI7 and proteins in the RNase complex. In addition to SKI7, the complex predicted by ClusterONE (shown in Fig. 4(b.2)) contains two false positive proteins with very small degrees, because ClusterONE ignores the internal density. Because LinkComm does not explicitly characterize the separability of the complexes, it also recruits some false positive proteins, as clearly shown in Fig. 4(b.3). The complex obtained by SR-MCL contains many false positive proteins, and its topological structure is not clear.
Conclusions
We propose an algorithm, FLCD, to predict protein complexes in protein-protein interaction networks. FLCD can better characterize the topological structure of a protein complex, which is densely connected inside and well separated from the rest of the network. We compare FLCD with three other state-of-the-art algorithms on protein complex prediction. The comparison results show that FLCD achieves superior performance. Furthermore, GO enrichment analysis of the results of the competing algorithms demonstrates that FLCD finds more biologically meaningful complexes, within which proteins tend to be in the same cellular components, have similar functions, and/or participate in the same biological processes.
Methods
Terminologies and definitions
Let an undirected graph \(G=(V,E)\) represent a PPI network, where \(V\) denotes the set of proteins in \(G\) and \(E\) is the interaction set. \(A\) is the adjacency matrix of \(G\) with \(A_{ij}=A_{ji}\), where \(A_{ij}=1\) denotes that node \(i\) interacts with node \(j\) and \(A_{ij}=0\) otherwise. The degree matrix \(D\) of \(G\) is a diagonal matrix with \(D_{ii}=d_{i}\), where \(d_{i} = \sum _{j} A_{ij}\) is the number of interactions connecting to protein \(i\).
where \(\mathbf{1}_{i\in S}\) is the indicator function depending on whether \(i\in S\).
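As an illustration of these quantities, the following sketch computes the conductance of a node set on a toy graph. It follows the verbal description given in the Background (external connections of the set divided by the total number of interactions of its proteins); some standard formulations divide by the smaller of the volumes of the set and its complement instead, so treat the exact normalization as an assumption. The adjacency-dictionary representation and the node names are illustrative only.

```python
def conductance(adj: dict, nodes: set) -> float:
    """External edges of `nodes` divided by the sum of degrees inside `nodes`."""
    volume = sum(len(adj[i]) for i in nodes)  # sum of d_i over the set
    cut = sum(1 for i in nodes for j in adj[i] if j not in nodes)
    return cut / volume if volume else 0.0


# Toy network: a triangle a-b-c with one extra edge c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
phi = conductance(adj, {"a", "b", "c"})
print(phi)  # one external edge over a volume of 7
```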
Motivation
FLCD explicitly considers both the separability and the internal edge density of complexes, in two separate steps. In the first step, it addresses the separability of complexes by ensuring low conductance, in the hope that the resulting complexes have distinct biological functions. In the second step, FLCD preserves the densely connected parts of the complexes identified in the first step. Because PPI networks are noisy and typically sparse, instead of finding cliques, we use the definition of internal density in (7) to search for dense subnetworks as the final predicted complexes.
Searching for a low-conductance set \(\boldsymbol {H^{*}_{v}}\)
Given a starting protein \(v\), our goal is to find a protein set \(H^{*}_{v}\) with low conductance that includes \(v\). We first apply the algorithm proposed in [17] to find a potential set \(H\) with low conductance. The minimum-conductance set \(H^{*}_{v}\) in \(H\) is then identified by solving a mixed integer programming (MIP) problem exactly.
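The candidate-set construction can be sketched as below. Note the substitution: the algorithm in [17] computes an approximate personalized PageRank vector with a local push procedure, whereas this sketch uses plain power iteration on the whole graph, which is simpler but not local. The teleport probability `alpha`, the iteration count, and the handling of isolated nodes are assumptions, since the text does not give them.

```python
def personalized_pagerank(adj: dict, seed, alpha: float = 0.15, iters: int = 100) -> dict:
    """Power iteration for p = alpha * e_seed + (1 - alpha) * p * D^{-1} A."""
    p = {u: 0.0 for u in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in adj}
        nxt[seed] += alpha  # teleport back to the seed protein
        for u, mass in p.items():
            if not adj[u]:
                nxt[seed] += (1 - alpha) * mass  # isolated node: return mass to seed
                continue
            share = (1 - alpha) * mass / len(adj[u])
            for w in adj[u]:
                nxt[w] += share
        p = nxt
    return p


def top_k_nodes(p: dict, k: int) -> list:
    """The k nodes with the largest PageRank values, used as the candidate set H."""
    return sorted(p, key=p.get, reverse=True)[:k]


# Toy network: triangle a-b-c, then a path c-d-e.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
p = personalized_pagerank(adj, "a", alpha=0.5)
H = top_k_nodes(p, 3)
print(H)
```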
After using standard techniques [24] to linearize \(z x_{i}\) and \(x_{i} x_{j}\), the optimization problem can be solved by any MIP solver, such as Gurobi [25]. Because \(|H|=k\) is much smaller than \(|V|=n\) and we only focus on identifying one low-conductance set, we can efficiently obtain the minimum-conductance set \(H^{*}_{v}\) in \(H\) by solving (10) exactly.
If node \(v\) is in a connected component of size \(k'\) and we set \(k>k'\), then we might obtain the trivial solution in which the low-conductance set is the connected component itself, with conductance 0. To avoid this, we check every derived low-conductance set of size \(k'\) to see whether it has exactly 0 conductance, which implies that it is a connected component of size \(k'\). If that is the case, we set \(k=k'-1\) and re-solve the MIP to get a non-trivial solution.
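For intuition about what this step returns, the sketch below finds the minimum-conductance subset of a candidate set \(H\) that contains the seed \(v\) by exhaustive enumeration. This is a stand-in for the MIP formulation, not the formulation itself, and it is only feasible for very small \(H\). Conductance is computed as cut over volume, following the description in the Background; the function names and the toy graph are hypothetical.

```python
from itertools import combinations


def conductance(adj: dict, nodes: set) -> float:
    """External edges of `nodes` divided by the sum of degrees inside `nodes`."""
    volume = sum(len(adj[i]) for i in nodes)
    cut = sum(1 for i in nodes for j in adj[i] if j not in nodes)
    return cut / volume if volume else 0.0


def min_conductance_subset(adj: dict, candidates: set, seed):
    """Enumerate every subset of `candidates` containing `seed`; keep the best."""
    others = sorted(candidates - {seed})
    best_set, best_phi = {seed}, conductance(adj, {seed})
    for size in range(1, len(others) + 1):
        for extra in combinations(others, size):
            subset = {seed, *extra}
            phi = conductance(adj, subset)
            if phi < best_phi:
                best_set, best_phi = subset, phi
    # best_phi == 0 would signal the trivial case of a whole connected component.
    return best_set, best_phi


# Toy network: two triangles (a-b-c and d-e-f) joined by the edge c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}
best_set, best_phi = min_conductance_subset(adj, {"a", "b", "c", "d"}, "a")
print(sorted(best_set), best_phi)
```

On this toy graph the first triangle is selected: it has a single external edge and a volume of 7, which beats the full candidate set (two external edges, volume 10).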
Conservation of the densest subnetwork \(\boldsymbol {C_{v}^{*}}\) in \(\boldsymbol {H^{*}_{v}}\)
which can also be cast into the MIP framework with the exactly optimal solution obtained by using standard MIP solvers after linearization [24].
The FLCD algorithm
Algorithm: The FLCD Algorithm |
---|
Input: \(\mathcal {S} = V\) and \(k=20\). |
Output: A set of predicted complexes \(R\). |
1 While (\(\exists v \in \mathcal {S}\) with \(d_{v} \geq 3\)) |
2 Estimate \(\hat {p} \approx p(\alpha, v)\). |
3 Sort the nodes in \(V\) by \(\hat {p}\) and collect the top \(k\) nodes in \(H_{v}\). |
4 Find the lowest-conductance set \(H_{v}^{*} \subseteq H_{v}\) based on (10). |
5 Identify the node set \(C_{v}^{*}\) of the densest subnetwork in \(H_{v}^{*}\) based on (12). |
6 Take \(C_{v}^{*}\) as one predicted complex: let \(R = R \cup \{C_{v}^{*}\}\) and \(\mathcal {S} = \mathcal {S} \setminus \{v\}\). |
7 EndWhile |
8 Remove duplicated complexes and complexes with fewer than three proteins from \(R\). |
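The control flow of the listing above can be sketched as a small driver. It is only a skeleton under stated assumptions: the three steps (ranking by personalized PageRank, the minimum-conductance MIP, and the densest-subnetwork MIP) are passed in as callables, and the stand-ins used in the example are trivial placeholders, not the procedures of the paper. All function and parameter names are hypothetical.

```python
def flcd_driver(adj: dict, k: int, rank_nodes, find_low_conductance, find_densest) -> list:
    """Run steps 1-8 of the listing with the three sub-steps supplied as callables."""
    complexes = []
    seeds = [v for v in adj if len(adj[v]) >= 3]         # step 1: seeds with d_v >= 3
    for v in seeds:
        candidates = set(rank_nodes(adj, v)[:k])         # steps 2-3: top-k ranked nodes
        h_star = find_low_conductance(adj, candidates, v)  # step 4
        c_star = frozenset(find_densest(adj, h_star))      # step 5
        complexes.append(c_star)                           # step 6
    unique = {c for c in complexes if len(c) >= 3}       # step 8: dedupe and size filter
    return sorted(unique, key=sorted)


# Toy network: two triangles (a-b-c and d-e-f) joined by the edge c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}

# Placeholder steps: rank the seed first, then its neighbours alphabetically,
# and return the candidate set unchanged in both optimization steps.
result = flcd_driver(
    adj,
    k=3,
    rank_nodes=lambda graph, v: [v] + sorted(graph[v]),
    find_low_conductance=lambda graph, candidates, v: candidates,
    find_densest=lambda graph, h_star: h_star,
)
print([sorted(c) for c in result])
```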
Declarations
Acknowledgements
XQ was partially supported by Awards CCF-1447235, CCF-1553281, IOS-1547557 from the National Science Foundation (NSF) and AFRI-2015-67013-22816 from the United States Department of Agriculture (USDA).
Funding
The publication costs of this article were funded by Award CCF-1447235 from the National Science Foundation.
Availability of data and materials
The Matlab source code and relevant network data can be found at https://github.com/xqian37/FLCD.
Authors’ contributions
Conceived the algorithm: YW, XQ. Implemented the algorithm and performed the experiments: YW. Analyzed the results: YW, XQ. Wrote the paper: YW, XQ. Both authors have read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
About this supplement
This article has been published as part of BMC Systems Biology Volume 11 Supplement 3, 2017: Selected original research articles from the Third International Workshop on Computational Network Biology: Modeling, Analysis, and Control (CNB-MAC 2016): systems biology. The full contents of the supplement are available online at http://bmcsystbiol.biomedcentral.com/articles/supplements/volume-11-supplement-3.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
- Bader GD, Hogue CW. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003; 4:2.
- Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D. The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004; 32:449–51.
- Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. BioGRID: A general repository for interaction datasets. Nucleic Acids Res. 2006; 34:535–9.
- Kerrien S, Aranda B, Breuza L, et al. The IntAct molecular interaction database in 2012. Nucleic Acids Res. 2012; 40(D1):841–6.
- Licata L, Briganti L, Peluso D, et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 2012; 40:857–61.
- Nepusz T, Yu H, Paccanaro A. Detecting overlapping protein complexes in protein-protein interaction networks. Nat Methods. 2012; 9:471–2.
- van Dongen S. A cluster algorithm for graphs. 2000. Technical Report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam.
- Satuluri V, Parthasarathy S. Scalable Graph Clustering Using Stochastic Flows: Applications to Community Discovery. In: 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). Paris: 2009. p. 737–46.
- Shih YK, Parthasarathy S. Identifying functional modules in interaction networks through overlapping Markov clustering. Bioinformatics. 2012; 28(18):1473–9.
- Palla G, Derényi I, Farkas I, Vicsek T. Uncovering the overlapping community structure of complex networks in nature and society. Nature. 2005; 435:814–8.
- Macropol K, Can T, Singh AK. RRW: Repeated random walks on genome-scale protein networks for local cluster discovery. BMC Bioinformatics. 2009; 10:283.
- Ahn YY, Bagrow JP, Lehmann S. Link communities reveal multiscale complexity in networks. Nature. 2010; 466:761–4.
- Gavin AC, Aloy P, Grandi P, et al. Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006; 440:631–6.
- Wang Y, Qian X. Functional module identification in protein interaction networks by interaction patterns. Bioinformatics. 2014; 30(1):81–93.
- Newman MEJ. Modularity and community structure in networks. Proc Nat Acad Sci USA. 2006; 103:8577–82.
- Fortunato S, Barthélemy M. Resolution limit in community detection. Proc Nat Acad Sci USA. 2007; 104(1):36–41.
- Andersen R, Chung F, Lang K. Local Graph Partitioning Using PageRank Vectors. In: 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06). Berkeley: 2006. p. 475–86.
- Corneil DG, Perl Y. Clustering and domination in perfect graphs. Discrete Appl Math. 1984; 9(1):27–39.
- Ashburner M, Ball CA, Blake JA, et al. Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000; 25(1):25–9.
- Hong EL, et al. Gene ontology annotations at SGD: New data sources and annotation methods. Nucleic Acids Res. 2008; 36:577–81.
- Mewes HW, et al. MIPS: Analysis and annotation of proteins from whole genomes. Nucleic Acids Res. 2004; 32(Database issue):D41–44.
- Wang Y, Qian X. Joint clustering of protein interaction networks through Markov random walk. BMC Syst Biol. 2014; 8(suppl 1):9.
- Shih Y, Parthasarathy S. Scalable global alignment for multiple biological networks. BMC Bioinformatics. 2012; 13(Suppl 3):11.
- Fan N, Pardalos P. Multi-way clustering and biclustering by the ratio cut and normalized cut in graphs. J Combinatorial Optimization. 2010; 23(2):224–51.
- Gurobi Optimization Inc. Gurobi Optimizer Reference Manual. 2016. http://www.gurobi.com.