Low-complexity regions within protein sequences have position-dependent roles
- Alain Coletta†1, 2, 3,
- John W Pinney†4,
- David Y Weiss Solís5, 6,
- James Marsh2,
- Steve R Pettifer2 and
- Teresa K Attwood1
© Coletta et al; licensee BioMed Central Ltd. 2010
Received: 13 October 2009
Accepted: 13 April 2010
Published: 13 April 2010
Regions of protein sequences with biased amino acid composition (so-called Low-Complexity Regions (LCRs)) are abundant in the protein universe. A number of studies have revealed that i) these regions show significant divergence across protein families; ii) the genetic mechanisms from which they arise lend them remarkable degrees of compositional plasticity. They have therefore proved difficult to compare using conventional sequence analysis techniques, and functions remain to be elucidated for most of them. Here we undertake a systematic investigation of LCRs in order to explore their possible functional significance, placed in the particular context of Protein-Protein Interaction (PPI) networks and Gene Ontology (GO)-term analysis.
In keeping with previous results, we found that LCR-containing proteins tend to have more binding partners across different PPI networks than proteins that have no LCRs. More specifically, our study suggests i) that LCRs are preferentially positioned towards the protein sequence extremities and, in contrast with centrally-located LCRs, such terminal LCRs show a correlation between their lengths and degrees of connectivity, and ii) that centrally-located LCRs are enriched with transcription-related GO terms, while terminal LCRs are enriched with translation and stress response-related terms.
Our results suggest not only that LCRs may be involved in flexible binding associated with specific functions, but also that their positions within a sequence may be important in determining both their binding properties and their biological roles.
Low-complexity regions (LCRs) in protein sequences are regions containing little diversity in their amino acid composition. The degree of diversity they exhibit may vary, ranging from regions comprising few different amino acids to those comprising just one, the amino acid positions within these regions being either loosely clustered, irregularly spaced, or periodic. This work defines an LCR computationally as an amino acid sequence with low information content (see Methods). Therefore, simple repetitive sequences such as tandem amino acid repeats form part of the LCR dataset discussed here.
LCRs are common in protein sequences, but precise measures of their abundance are difficult to ascertain. One of the problems is that the degrees of stringency applied by different detection methods differ, leading to different estimates of the numbers of LCRs in the same dataset. Importantly also, our knowledge of the protein universe has changed dramatically during the last 15 years, as protein sequence repositories have become engorged with the outputs of high-throughput sequencing projects. Protein sequence databases have thus grown enormously (both in terms of the numbers of sequences they contain and in terms of the numbers of organisms represented), and estimates of the numbers of LCRs they contain have changed accordingly: e.g., the proportion of proteins in the Swiss-Prot database that contain LCRs has changed from 56%, in 1993 (V-26.0), to 12% in the current version of UniProt (V-54.0). Notwithstanding their abundance in protein sequences, LCRs are largely under-represented in the Protein Data Bank (PDB) [4, 5], presumably because most of the proteins containing LCRs do not readily crystallise. Despite this lack of structural information, LCRs are believed to play pivotal roles across a wide range of biological functions [6–8], some of whose mechanisms have been extensively documented, although the proposed functional models remain unverified [8–10].
Low-complexity regions evolve rapidly through recombination events
LCRs are known to evolve rapidly, sometimes via mitotic replication slippage, or, more often, via meiotic recombination events. Highly dynamic diversification of these regions, and high levels of inter-species variation and polymorphism, suggest that newly generated and expanded LCRs are, in most cases, structurally and functionally neutral, with a high probability of fixation, thus generating novel material that could enable rapid functional expansions. Moxon and co-workers suggested that repeat formation is a common source of genetic variation among prokaryotes to generate novel surface antigens and adapt to fast evolving environments [7, 13]. This source of variability may also compensate for longer generation times in eukaryotes, which have higher proportions of LCRs, and it has been suggested that expansions and contractions of tandem repeats constitute a large source of phenotypic variation.
Hub proteins contain more LCRs than non-hub proteins
While some LCRs are known to play important structural roles by acquiring strong static conformations, others have been associated with intrinsically unstructured proteins [15, 16]. The flexible nature of regions lacking well-defined folding structures is thought to be responsible for their versatile binding capabilities; this flexibility could allow these regions to bind several different targets. In their recent study on yeast protein-protein interactions (PPIs), Ekman and co-workers noted that the highly connected 'hub' proteins contain an increased fraction with LCRs compared to non-hub proteins. They suggested that disordered regions are particularly important for flexible binding and could act as flexible linkers between globular protein domains. Here, we set out to investigate whether proteins with LCRs tend to have larger numbers of binding partners across a range of high-confidence PPI datasets. We then examined whether proteins with LCRs positioned at their sequence extremities show differences in connectivity compared to proteins with LCRs positioned in central regions, and whether the number of protein binding partners is related to LCR length. Finally, we functionally categorised both terminal-LCR and central-LCR groups using Gene Ontology (GO)-term enrichment analysis.
Results and Discussion
Nodes and edges in each PPI dataset (columns: number of nodes, number of edges).
The FYI dataset is generated as the union of: yeast two-hybrid experiments [23–25], datasets produced from affinity purification and mass spectrometry screens [26, 27], one dataset produced from in silico computational prediction methods, the physical protein-protein interactions, excluding interactions from genome-scale experiments, from the Munich Information Center for Protein Sequences (MIPS) Comprehensive Yeast Genome Database (CYGD) dataset, and finally, the CYGD protein complexes published in the literature (called LC, for Literature Curated data). The resulting union is then filtered, keeping only interactions observed at least twice by different detection methods.
The HC PPI dataset is also a union of multiple interaction datasets, where the minimal criterion for inclusion is that relevant interactions must be independently reported at least twice. This differs from the FYI in that two independent reports can come from two datasets using identical detection methods. HC uses LC data from five major PPI databases: BIND, BioGrid, DIP, MINT and MIPS, and interactions detected from affinity purification and mass spectrometry screens [34, 35]. The DIPv dataset is a computationally verified core of the DIP dataset, which is a database of experimentally verified interactions determined by several techniques (such as genome-wide two-hybrid screens, immunoprecipitation, affinity binding, and antibody blockage).
The DIPv core was computed using two methods: the Expression Profile Reliability (EPR) index, and the Paralogous Verification Method (PVM). EPR compares RNA expression profiles of potentially interacting proteins against expression profiles of known interacting and non-interacting pairs of proteins. PVM measures the likelihood that two proteins interact by measuring interactions between their paralogues. We refer to this dataset as DIP-verified (DIPv).
S. cerevisiae is also among the best-annotated genomes, making it ideal for functional analysis using the Gene Ontology. In agreement with previous estimates, our LCR-detection method (see Methods) found that of 6,165 S. cerevisiae proteins documented in UniProt, 1,306 contained LCRs. Of these, 929 contain a unique LCR; to simplify the analyses presented, this study deals only with proteins containing a single LCR.
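The two inclusion criteria described above for FYI and HC differ only in what counts as independent support. The following minimal Python sketch illustrates that difference; the record layout `(protein_a, protein_b, source, method)`, the function names, and the toy identifiers are assumptions made for illustration and do not reproduce the pipelines used to build the published datasets.

```python
from collections import defaultdict

def canonical(a, b):
    """Order-independent key for an undirected interaction."""
    return (a, b) if a <= b else (b, a)

def filter_interactions(reports, require_distinct_methods):
    """Keep interactions supported at least twice.

    reports: iterable of (protein_a, protein_b, source, method) tuples
             (hypothetical layout).
    require_distinct_methods=True  -> FYI-like rule: two different
                                      detection methods are needed.
    require_distinct_methods=False -> HC-like rule: two independent
                                      sources suffice, even if they
                                      used the same method.
    """
    sources = defaultdict(set)
    methods = defaultdict(set)
    for a, b, source, method in reports:
        key = canonical(a, b)
        sources[key].add(source)
        methods[key].add(method)
    support = methods if require_distinct_methods else sources
    return {pair for pair, seen in support.items() if len(seen) >= 2}

# Toy example with made-up protein and source names.
reports = [
    ("P1", "P2", "screen_a", "Y2H"),
    ("P2", "P1", "screen_b", "Y2H"),    # same pair, same method
    ("P1", "P3", "screen_b", "Y2H"),
    ("P1", "P3", "screen_c", "AP-MS"),  # same pair, different method
]
fyi_like = filter_interactions(reports, require_distinct_methods=True)
hc_like = filter_interactions(reports, require_distinct_methods=False)
```

In this toy example the pair P1/P2 passes the HC-like rule but not the FYI-like rule, because both of its reports come from the same detection method.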
Proteins containing LCRs tend to have more interactions than those without
Degree distribution comparison between proteins with and without LCRs.
1.58 × 10^-13
3.63 × 10^-04
LCR locations are biased towards protein sequence extremities
Terminal LCRs are more connected than central LCRs and show length-connectivity dependence
Number of t-LCRs and c-LCRs found across the four PPI datasets.
Degree distribution comparison between proteins with c-LCRs, proteins with t-LCRs, and proteins without LCRs.
1.94 × 10^-07
1.54 × 10^-10
6.88 × 10^-04
Correlation results (LCR length versus protein degree).
3.66 × 10^-04
1.22 × 10^-04
2.68 × 10^-04
GO analysis shows that terminal and central LCRs have different biological roles
GO term enrichments for all LCRs.
GO term ID
3.89 × 10^-06
response to stress
4.40 × 10^-05
1.03 × 10^-04
protein amino acid phosphorylation
2.22 × 10^-04
6.08 × 10^-04
regulation of transcription, DNA-dependent
1.25 × 10^-04
nucleic acid binding
2.59 × 10^-04
4.58 × 10^-04
fungal-type cell wall
6.27 × 10^-04
cellular bud tip
GO term enrichments for central and terminal LCRs.
GO term ID
1.09 × 10^-10
2.76 × 10^-08
response to stress
3.64 × 10^-06
4.62 × 10^-04
8.55 × 10^-06
7.24 × 10^-04
2.19 × 10^-05
SRP-dependent cotranslational protein targeting to membrane, translocation
8.99 × 10^-04
Golgi to plasma membrane transport
1.37 × 10^-05
9.10 × 10^-05
structural constituent of ribosome
SNAP receptor activity
translation initiation factor activity
translation elongation factor activity
protein domain specific binding
unfolded protein binding
DNA replication origin binding
RNA polymerase subunit kinase activity
2.40 × 10^-05
7.83 × 10^-05
small ribosomal subunit
1.63 × 10^-04
eukaryotic translation initiation factor 3 complex
cytosolic small ribosomal subunit
luminal surveillance complex
transcription factor complex
GO term ID
3.03 × 10^-09
1.40 × 10^-06
protein amino acid phosphorylation
4.38 × 10^-06
4.52 × 10^-05
regulation of transcription, DNA-dependent
9.81 × 10^-05
4.64 × 10^-08
1.03 × 10^-05
protein serine/threonine kinase activity
2.18 × 10^-07
1.68 × 10^-05
2.28 × 10^-07
1.68 × 10^-05
protein kinase activity
1.88 × 10^-06
1.04 × 10^-04
8.39 × 10^-05
2.94 × 10^-04
8.31 × 10^-04
nucleic acid binding
ATP-dependent helicase activity
histone deacetylase activity
MAP kinase kinase activity
specific transcriptional repressor activity
2.04 × 10^-06
3.39 × 10^-04
cellular bud tip
4.07 × 10^-06
3.39 × 10^-04
5.24 × 10^-06
3.39 × 10^-04
2.89 × 10^-04
mRNA cleavage factor complex
7.97 × 10^-04
9.96 × 10^-04
cellular bud neck
Our results show that LCRs are preferentially located towards sequence extremities, and that proteins with LCRs in their sequence extremities have more protein binding partners than proteins with LCRs in their central regions. Furthermore, we have shown the length of LCRs to be positively correlated with the number of binding partners, but only in the sequence extremities. While t-LCRs can extend free from the rest of the protein structure, c-LCRs are likely to be surrounded by protein globular domains, thus limiting their flexibility and accessibility, and therefore the number of different proteins to which they can mediate binding. By contrast, if t-LCRs themselves tend to act as promiscuous interfaces for protein binding, this would explain our observation that proteins with longer t-LCR regions have a tendency towards a higher number of protein binding partners. Examining the list of over-represented GO terms in Table 7, we hypothesise that t-LCRs play major roles in low-specificity biological events that involve large protein complexes. Protein chaperones, for example, which play a major role in stress response, have low-specificity binding properties due to the large variety of partners they bind to assist conformational search towards global energy minima [37, 38]. Translation and translation elongation are also events requiring low-specificity interactions, involving a crowded protein machinery that operates on the entire proteome. Finally, molecular transport could also be considered to fall within this category, with large protein complexes moving a wide variety of cargos across the cell.
Although some c-LCRs might still be expected to act as flexible linkers, there is evidence that they may also act as direct binding interfaces, albeit with more restricted promiscuity than t-LCRs. Kim and co-workers found that disordered regions could function as interfaces with a limited number of binding partners, particularly in the context of phosphorylation cascades in signalling pathways, where proteins tend to contain both a structured kinase domain and an unstructured kinase-binding domain. Indeed, regions of protein disorder are already known to be implicated in signalling as phosphorylation sites. Our GO analysis finds protein kinase functions to be over-represented only for the set of central LCRs, and not for those located at the termini, which could be considered consistent with the existence of a specific set of binding partners for each signalling protein. The set of c-LCR proteins is also enriched with other biological processes that, although still 'promiscuous' in the sense that they have multiple binding partners, need to be much more specific than the translation, folding, and transport processes observed for the t-LCRs. Transcription regulation events, for example, limit the number of proteins present simultaneously. Binding events in polyadenylation processes are also relatively specific and do not involve crowded protein machineries.
In their recent study on protein-protein interactions, Ekman and co-workers noted that hub proteins (those with a large number of interacting partners) are more often multi-domain proteins and contain more disordered regions compared to non-hubs. This observation led them to stress that the disordered regions serve as linkers between domains, in addition to their more commonly reported role in flexible or rapidly reversible binding. Our proteome-wide results show that these two LCR functional roles are distinct and depend on the location of the LCRs within the protein sequence: their role in flexible and rapidly reversible binding is preferentially mediated by LCRs located in the terminal regions of proteins, while their role as linkers between protein domains is preferentially mediated by centrally located LCRs.
These results, together with the other differences in GO enrichment discussed above, suggest that the functions of the low-complexity regions of a protein are related in a fundamental manner to their positions within the sequence.
Implementation of the LCR detection algorithm
We define a low-complexity region as any window of length w with an entropy value smaller than a threshold t_w. Entropy distributions for every window length are highly skewed, with a bell-shaped curve at high entropy values and a very long and thin tail extending toward the low entropy values where LCRs are located (see Additional file 3: Figure S3). Given that the entropy distributions for all window lengths have a similar shape, a single cut-off point selects the same proportion of low-entropy regions, enriched in LCRs, regardless of window length.
A very conservative threshold was sought in order to have a stringent cut-off and exclude non-LCRs. Visual inspection determined that a threshold corresponding to 0.5% of the area under the distribution curve only included the portion of the curve where the flat tail, containing the LCRs, was located.
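The windowed definition above can be illustrated with a short Python sketch. It assumes that the entropy in question is the Shannon entropy (in bits) of the amino acid composition of each window, and that the cut-off is taken as an empirical lower quantile of the per-window entropies; the function names and the toy sequence are hypothetical and are not taken from the implementation used in this study.

```python
import math
from collections import Counter

def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    n = len(window)
    return sum((c / n) * math.log2(n / c) for c in Counter(window).values())

def window_entropies(sequence, w):
    """Entropy of every window of length w, indexed by start position."""
    return [shannon_entropy(sequence[i:i + w])
            for i in range(len(sequence) - w + 1)]

def lower_tail_threshold(entropies, fraction=0.005):
    """Cut-off t_w such that at most `fraction` of the windows fall
    strictly below it (empirical quantile; an assumed procedure)."""
    ordered = sorted(entropies)
    k = math.floor(fraction * len(ordered))
    return ordered[k]

def low_complexity_windows(sequence, w, t_w):
    """Half-open (start, end) coordinates of windows with entropy < t_w."""
    return [(i, i + w)
            for i, h in enumerate(window_entropies(sequence, w))
            if h < t_w]

# Toy sequence: a glutamine run followed by a diverse stretch.
hits = low_complexity_windows("QQQQQQACDEFGHIK", w=6, t_w=1.0)
```

In practice the threshold would be estimated separately for each window length from the entropy distribution over the whole proteome, not from a single sequence as in this toy example.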
Selecting LCRs in protein sequences
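Each detected region is assigned a Z-score computed from H, the entropy of the region, and from μ_w and σ_w, the mean and the standard deviation of f_w(H), the entropy distribution for windows of length w. If multiple LCRs overlap, only the region with the highest Z-score is retained. All detected regions can be accessed and queried through the UTOPIA User Interface.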
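The Z-score formula itself is not reproduced in the text above, so the following Python sketch rests on two explicit assumptions: the score is taken as (μ_w − H) / σ_w, so that lower-entropy regions score higher, and overlaps are resolved greedily from the highest score downwards. Both choices, and all function names, are illustrative and may differ from the original implementation.

```python
from statistics import mean, pstdev

def z_scores(entropies):
    """Z-score of each window entropy against the distribution for
    that window length. ASSUMPTION: sign chosen so that low entropy
    gives a high score, i.e. z = (mu_w - H) / sigma_w."""
    mu = mean(entropies)
    sigma = pstdev(entropies)
    if sigma == 0:
        return [0.0 for _ in entropies]
    return [(mu - h) / sigma for h in entropies]

def overlaps(a, b):
    """True if two half-open regions (start, end, ...) share a residue."""
    return a[0] < b[1] and b[0] < a[1]

def keep_best_non_overlapping(regions):
    """Greedy overlap resolution (an assumed strategy): visit regions
    from highest to lowest Z-score and keep a region only if it does
    not overlap one that was already kept.

    regions: list of (start, end, z) tuples with half-open coordinates.
    """
    kept = []
    for region in sorted(regions, key=lambda r: r[2], reverse=True):
        if not any(overlaps(region, other) for other in kept):
            kept.append(region)
    return sorted(kept)

candidates = [(0, 10, 2.0), (5, 15, 3.5), (20, 30, 1.0)]
selected = keep_best_non_overlapping(candidates)
```

Here the region (0, 10) is discarded because it overlaps the higher-scoring region (5, 15), while (20, 30) is kept because it overlaps nothing.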
Analyses were cross-validated over four PPI datasets: three high-confidence datasets (HC, DIPv and FYI) and one potentially lower-confidence but much larger set of interactions (BioGrid). Although the comparison of the three different high-confidence PPI datasets, FYI, HC and DIPv, showed a much greater overlap than previous datasets, there were still large numbers of differences between them (Additional file 4: Figure S5). Therefore, inter-study validation using the three high-confidence and the BioGrid PPI datasets was performed to ensure robust results. To ensure that only information relevant to protein-protein interactions was obtained from the BioGrid network, it was first stripped of all non-physical interactions, as described previously. To determine whether LCRs are equally distributed across PPI datasets, the study also investigated the distribution of LCRs within the different PPI datasets. Results showed that the three high-confidence networks were similarly enriched in LCRs (approximately 19% of their entries contain LCRs, see Additional file 5: Table S1). These enrichments in the high-confidence networks support the idea that these regions are highly interactive.
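As a rough illustration of the network preparation described above, the Python sketch below removes non-physical interactions, computes the degree (number of distinct binding partners) of each protein, and reports the fraction of network entries that contain an LCR. The record layout, the `kind` labels, and the choice to ignore self-interactions are assumptions made for the example and do not describe the BioGrid file format.

```python
from collections import defaultdict

def physical_only(records, physical_label="physical"):
    """Keep (a, b) pairs whose interaction kind is physical.
    records: iterable of (protein_a, protein_b, kind) (assumed layout)."""
    return [(a, b) for a, b, kind in records if kind == physical_label]

def degrees(edges):
    """Number of distinct binding partners per protein.
    ASSUMPTION: self-interactions are ignored."""
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(p) for protein, p in partners.items()}

def fraction_with_lcr(network_proteins, lcr_proteins):
    """Fraction of the proteins in a network that contain an LCR."""
    nodes = set(network_proteins)
    return len(nodes & set(lcr_proteins)) / len(nodes)

# Toy records with made-up identifiers.
records = [
    ("P1", "P2", "physical"),
    ("P1", "P3", "physical"),
    ("P2", "P1", "physical"),   # duplicate report of P1-P2
    ("P1", "P4", "genetic"),    # removed by the filter
    ("P3", "P3", "physical"),   # self-interaction, ignored
]
degree = degrees(physical_only(records))
lcr_fraction = fraction_with_lcr(degree, {"P1", "P9"})
```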
Measurements of region positions in protein sequences, correlations, and comparison of degree distributions
We defined the position of an LCR as the coordinate of the LCR's centre within the protein sequence in which it occurs. We then divided this coordinate by the length of the protein to express it on a normalised scale between 0 and 1. The result is an LCR position metric comparable across LCRs of varying lengths within proteins of varying lengths. t-LCRs were defined as regions starting or ending at no more than 25 amino acids from either sequence extremity, c-LCRs as regions starting or ending at least 50 amino acids from either sequence extremity. Correlation p-values and regression lines were computed using the linear model function implemented in the R statistics package. Degree distributions were compared using the Wilcoxon Mann-Whitney test, also implemented in the R statistics package.
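The position metric and the t-LCR/c-LCR split described above can be sketched in Python as follows. The sketch assumes 1-based inclusive residue coordinates and reads the definitions as: a t-LCR starts within 25 residues of the N-terminus or ends within 25 residues of the C-terminus, while a c-LCR lies at least 50 residues from both ends. Regions falling between the two cut-offs are left unclassified; this reading and the function names are assumptions.

```python
def normalised_position(start, end, protein_length):
    """Centre of the LCR divided by the protein length (scale 0 to 1).
    Coordinates are assumed to be 1-based and inclusive."""
    centre = (start + end) / 2
    return centre / protein_length

def classify_lcr(start, end, protein_length,
                 terminal_max=25, central_min=50):
    """Label an LCR as terminal, central, or neither."""
    to_n_terminus = start - 1             # residues before the LCR
    to_c_terminus = protein_length - end  # residues after the LCR
    nearest = min(to_n_terminus, to_c_terminus)
    if nearest <= terminal_max:
        return "t-LCR"
    if nearest >= central_min:
        return "c-LCR"
    return "unclassified"

# Toy examples on a hypothetical 400-residue protein.
labels = [
    classify_lcr(10, 30, 400),    # close to the N-terminus
    classify_lcr(200, 220, 400),  # far from both ends
    classify_lcr(40, 60, 400),    # between the two cut-offs
    classify_lcr(350, 390, 400),  # close to the C-terminus
]
```

Leaving a gap between the two cut-offs keeps the terminal and central groups clearly separated, which appears to be the intent of using 25 and 50 residues rather than a single boundary.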
GO-term enrichment analyses
GO-term enrichment p-values were calculated using Fisher's exact test, and transformed to q-values using Benjamini and Hochberg's multiple testing correction method, as implemented in the R statistics package, version 2.7.
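The analysis itself was run in R; for illustration only, the two steps can be sketched in plain Python. The sketch assumes a one-sided (over-representation) form of Fisher's exact test, which the text does not specify, and implements the Benjamini-Hochberg step-up adjustment directly. Function and parameter names are hypothetical.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    N: proteins in the background, K: background proteins with the term,
    n: proteins in the LCR set,    k: LCR-set proteins with the term.
    Returns P(X >= k) under the hypergeometric distribution.
    ASSUMPTION: the one-sided alternative is used.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), input order kept."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

p_example = fisher_enrichment_p(k=5, n=5, K=5, N=10)
q_example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

The adjustment follows the same step-up procedure as the "BH" method of R's `p.adjust`, so a term would be called enriched when its q-value falls below the chosen false discovery rate.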
AC and DW are supported by the Institut d'encouragement de la Recherche Scientifique et de l'Innovation de Bruxelles (IRSIB). JWP is supported by a University Research Fellowship from the Royal Society. The authors would like to thank Casey Bergman, Stanislav Rudyak, Jose Couceiro, and Jan Griesbach for helpful suggestions.
- DePristo M, Zilversmit M, Hartl D: On the abundance, amino acid composition, and evolutionary dynamics of low-complexity regions in proteins. Gene. 2006, 378: 19-30. 10.1016/j.gene.2006.03.023
- Wootton J, Federhen S: Statistics of local complexity in amino acid sequences and sequence databases. Computers Chem. 1993, 17 (2): 149-163. 10.1016/0097-8485(93)85006-X
- The UniProt Consortium: The universal protein resource (UniProt). Nucleic Acids Research. 2008, 36: D190-5. 10.1093/nar/gkm895
- Huntley M, Golding G: Simple sequences are rare in the Protein Data Bank. Proteins. 2002, 48: 134-140. 10.1002/prot.10150
- Berman H, Westbrook J, Feng Z, Gilliland G, Bhat T, Weissig H, Shindyalov I, Bourne P: The protein data bank. Nucleic Acids Res. 2000, 28: 235-242. 10.1093/nar/28.1.235
- Fondon J, Garner H: Molecular origins of rapid and continuous morphological evolution. Proc Natl Acad Sci USA. 2004, 101 (52): 18058-18063. 10.1073/pnas.0408118101
- Verstrepen K, Jansen A, Lewitter F, Fink G: Intragenic tandem repeats generate functional variability. Nat Genet. 2005, 37 (9): 986-90. 10.1038/ng1618
- Phatnani H, Greenleaf A: Phosphorylation and functions of the RNA polymerase II CTD. Genes Dev. 2006, 20: 2922-2936. 10.1101/gad.1477006
- Zagon I, Verderame M, McLaughlin P: The biology of the opioid growth factor receptor (OGFr). Brain Res Brain Res Rev. 2002, 38: 351-376. 10.1016/S0165-0173(01)00160-6
- Wanker E, Sun Y, Savitz A, Meyer D: Functional characterization of the 180-kD ribosome receptor in vivo. J Cell Biol. 1995, 130: 29-39. 10.1083/jcb.130.1.29
- Marcotte E, Pellegrini M, Yeates T, Eisenberg D: A Census of Protein Repeats. Journal of Molecular Biology. 1999, 293: 151-160. 10.1006/jmbi.1999.3136
- Ekman D, Light S, Bjorklund A, Elofsson A: What properties characterize the hub proteins of the protein-protein interaction network of Saccharomyces cerevisiae?. Genome Biology. 2006, 7 (6): R45. 10.1186/gb-2006-7-6-r45
- Moxon E, Rainey P, Nowak M, Lenski R: Adaptive evolution of highly mutable loci in pathogenic bacteria. Current Biology. 1994, 4: 24-33. 10.1016/S0960-9822(00)00005-1
- Tatham A, Shewry P: Elastomeric proteins: biological roles, structures and mechanisms. Trends Biochem Sci. 2000, 25 (11): 567-571. 10.1016/S0968-0004(00)01670-4
- Tompa P: Intrinsically unstructured proteins. Trends Biochem Sci. 2002, 27 (10): 527-533. 10.1016/S0968-0004(02)02169-2
- Dunker A, Obradovic Z, Romero P, Garner E: Intrinsic protein disorder in complete genomes. Genome Informatics. 2000, 11: 161-171.
- Dyson H, Wright P: Intrinsically unstructured proteins and their functions. Nature Reviews Molecular Cell Biology. 2005, 6: 197-208. 10.1038/nrm1589
- Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry J, Davis A, Dolinski K, Dwight S, Eppig J, Harris M, Hill D, Issel-Tarver L, Kasarskis A, Lewis S, Matese J, Richardson J, Ringwald M, Rubin G, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556
- Bertin N, Simonis N, Dupuy D, Cusick M, Han J, Fraser H, Roth F, Vidal M: Confirmation of organized modularity in the yeast interactome. PLoS Biol. 2007, 5 (6): e153. 10.1371/journal.pbio.0050153
- Batada N, Reguly T, Breitkreutz A, Boucher L, Breitkreutz B, Hurst L, Tyers M: Still stratus not altocumulus: further evidence against the date/party hub distinction. PLoS Biol. 2007, 5 (6): e154. 10.1371/journal.pbio.0050154
- Deane C, Salwinski L, Xenarios I, Eisenberg D: Protein interactions: two methods for assessment of the reliability of high throughput observations. Molecular and Cellular Proteomics. 2002, 1: 349-356. 10.1074/mcp.M100037-MCP200
- Breitkreutz B, Stark C, Reguly T, Boucher L, Breitkreutz A, Livstone M, Oughtred R, Lackner D, Bahler J, Wood V, Dolinski K, Tyers M: The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. 2008, 36: D637-40. 10.1093/nar/gkm1001
- Uetz P, Giot L, Cagney G, Mansfield T, Judson R, et al.: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000, 403: 623-627. 10.1038/35001009
- Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA. 2001, 98 (8): 4569-4574. 10.1073/pnas.061034498
- Fromont-Racine M, Mayes A, Brunet-Simon A, Rain J, Colley A, Dix I, Decourty L, Joly N, Ricard F, Beggs J, Legrain P: Genome-wide protein interaction screens reveal functional networks involving Sm-like proteins. Yeast. 2000, 17 (2): 95-110. 10.1002/1097-0061(20000630)17:2<95::AID-YEA16>3.0.CO;2-H
- Gavin A, Bösche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick J, Michon A, Cruciat C, Remor M, Hofert C, Schelder M, Brajenovic M, Ruffner H, Merino A, Klein K, Hudak M, Dickson D, Rudi T, Gnau V, Bauch A, Bastuck S, Huhse B, Leutwein C, Heurtier M, Copley R, Edelmann A, Querfurth E, Rybin V, Drewes G, Raida M, Bouwmeester T, Bork P, Seraphin B, Kuster B, Neubauer G, Superti-Furga G: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002, 415 (6868): 141-147. 10.1038/415141a
- Ho Y, Gruhler A, Heilbut A, Bader G, Moore L, Adams S, Millar A, Taylor P, Bennett K, Boutilier K, Yang L, Wolting C, Donaldson I, Schandorff S, Shewnarane J, Vo M, Taggart J, Goudreault M, Muskat B, Alfarano C, Dewar D, Lin Z, Michalickova K, Willems A, Sassi H, Nielsen P, Rasmussen K, Andersen J, Johansen L, Hansen L, Jespersen H, Podtelejnikov A, Nielsen E, Crawford J, Poulsen V, Sorensen B, Matthiesen J, Hendrickson R, Gleeson F, Pawson T, Moran M, Durocher D, Mann M, Hogue C, Figeys D, Tyers M: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002, 415 (6868): 180-183. 10.1038/415180a
- von Mering C, Krause R, Snel B, Cornell M, Oliver S, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002, 417 (6887): 399-403. 10.1038/nature750
- Mewes H, Frishman D, Güldener U, Mannhaupt G, Mayer K, Mokrejs M, Morgenstern B, Münsterkötter M, Rudd S, Weil B: MIPS: a database for genomes and protein sequences. Nucleic Acids Research. 2002, 30: 31-34. 10.1093/nar/30.1.31
- Güldener U, Münsterkötter M, Oesterheld M, Pagel P, Ruepp A, Mewes H, Stümpflen V: MPact: the MIPS protein interaction resource on yeast. Nucleic Acids Research. 2006, 34: D436-41. 10.1093/nar/gkj003
- Bader G, Donaldson I, Wolting C, Ouellette B, Pawson T, Hogue C: BIND: the Biomolecular Interaction Network Database. Nucleic Acids Research. 2001, 29: 242-245. 10.1093/nar/29.1.242
- Xenarios I, Salwínski L, Duan X, Higney P, Kim S, Eisenberg D: DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Research. 2002, 30: 303-305. 10.1093/nar/30.1.303
- Chatr-aryamontri A, Ceol A, Palazzi L, Nardelli G, Schneider M, Castagnoli L, Cesareni G: MINT: the Molecular INTeraction database. Nucleic Acids Research. 2007, 35: D572-4. 10.1093/nar/gkl950
- Gavin A, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen L, Bastuck S, Dümpelfeld B, Edelmann A, Heurtier M, Hoffman V, Hoefert C, Klein K, Hudak M, Michon A, Schelder M, Schirle M, Remor M, Rudi T, Hooper S, Bauer A, Bouwmeester T, Casari G, Drewes G, Neubauer G, Rick J, Kuster B, Bork P, Russell R, Superti-Furga G: Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006, 440 (7084): 631-636. 10.1038/nature04532
- Krogan N, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Punna T, Peregrín-Alvarez J, Tikuisis A, Shales M, Zhang X, Davey M, Robinson M, Paccanaro A, Bray J, Sheung A, Beattie B, Richards D, Canadien V, Lalev A, Mena F, Wong P, Starostine A, Canete M, Vlasblom J, Wu S, Orsi C, Collins S, Chandran S, Haw R, Rilstone J, Gandi K, Thompson N, Musso G, Onge PS, Ghanny S, Lam M, Butland G, Altaf-Ul A, Kanaya S, Shilatifard A, O'Shea E, Weissman J, Ingles C, Hughes T, Parkinson J, Gerstein M, Wodak S, Emili A, Greenblatt J: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006, 440 (7084): 637-643. 10.1038/nature04670
- Wootton J: Sequences with unusual amino acid compositions. Curr Opin Struct Biol. 1994, 4: 413-421. 10.1016/S0959-440X(94)90111-2
- Tompa P, Csermely P: The role of structural disorder in the function of RNA and protein chaperones. FASEB J. 2004, 18 (11): 1169-1175. 10.1096/fj.04-1584rev
- Sandhu K: Intrinsic disorder explains diverse nuclear roles of chromatin remodeling proteins. J Mol Recognit. 2009, 22: 1-8. 10.1002/jmr.915
- Kim P, Sboner A, Xia Y, Gerstein M: The role of disorder in interaction networks: a structural analysis. Molecular Systems Biology. 2008, 4: 179. 10.1038/msb.2008.16
- Iakoucheva L, Radivojac P, Brown C, O'Connor T, Sikes J, Obradovic Z, Dunker A: The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Research. 2004, 32 (3): 1037-1049. 10.1093/nar/gkh253
- Reményi A, Scholer H, Wilmanns M: Combinatorial control of gene expression. Nat Struct Mol Biol. 2004, 11 (9): 812-815. 10.1038/nsmb820
- Pettifer S, Thorne D, McDermott P, Marsh J, Villéger A, Kell D, Attwood T: Visualising biological data: a semantic approach to tool and database integration. BMC Bioinformatics. 2009, 10 (Suppl 6): S19. 10.1186/1471-2105-10-S6-S19
- Yook S, Oltvai Z, Barabási A: Functional and topological characterization of protein interaction networks. Proteomics. 2004, 4 (4): 928-942. 10.1002/pmic.200300636
- Hakes L, Pinney J, Lovell S, Oliver S, Robertson D: All duplicates are not equal: the difference between small-scale and genome duplication. Genome Biol. 2007, 8 (10): R209. 10.1186/gb-2007-8-10-r209
- Mazurie A: http://aurelien.mazurie.oenone.net
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. 1995.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.