Network properties of human disease genes with pleiotropic effects
© Chavali et al; licensee BioMed Central Ltd. 2010
Received: 4 March 2010
Accepted: 4 June 2010
Published: 4 June 2010
The ability of a gene to cause a disease is known to be associated with the topological position of its protein product in the molecular interaction network. Pleiotropy, in human genetic diseases, refers to the ability of different mutations within the same gene to cause different pathological effects. Here, we hypothesized that the ability of human disease genes to cause pleiotropic effects would be associated with their network properties.
Shared genes, with pleiotropic effects, were more central in the protein interaction network than specific genes, which were associated with one disease. Furthermore, shared genes associated with phenotypically divergent diseases (phenodiv genes) were more central than those associated with phenotypically similar diseases. Shared genes had a higher number of disease gene interactors compared to specific genes, implying a higher likelihood of finding a novel disease gene in their network neighborhood. Shared genes had relatively restricted tissue co-expression with their interactors, contrary to specific genes. This could be a function of shared genes leading to pleiotropy. Essential and phenodiv genes had comparable connectivities, and hence we investigated differences in the network attributes conferring lethality and pleiotropy, respectively. Essential and phenodiv genes were found to be intra-modular and inter-modular hubs, respectively, with the former being highly co-expressed with their interactors, contrary to the latter. Essential genes were predominantly nuclear proteins with transcriptional regulation activities, while phenodiv genes were cytoplasmic proteins involved in signal transduction.
The properties of a disease gene in the molecular interaction network determine its role in manifesting different and divergent diseases.
Decades-long research efforts have resulted in the identification of a large number of human disease genes [1–3]. Most of this research has been based on experimental and clinical studies of individual diseases and genes. A conceptually different approach was recently described, namely to study the network properties of human disease genes on a diseasome-wide scale. These studies were based on analyzing disease gene databases, such as the Online Mendelian Inheritance in Man (OMIM). The disease genes were classified as monogenic, polygenic or complex and their properties in molecular interaction networks were elucidated [5, 6]. Further, it was shown that phenotypically similar diseases are often caused by functionally related genes [7–9]. This has led to the exploitation of molecular interaction networks to find novel candidate genes by exploring the neighbors of a disease-causing gene in a network, as these are more likely to cause either the same or a similar disease [7, 8].
Pleiotropy, in the context of human genetic diseases, implies that different pathological effects of different mutations within the same gene predispose an individual to different disorders. While previous studies have examined the properties of disease genes classified by the number of genes involved in a phenotype, it is paramount to study genes classified by the number of phenotypes they are involved in. This would aid in identifying disease genes that are specific to diseases (specific genes), which can be exploited for therapeutic intervention. This would also help to find pleiotropic genes that are shared between different diseases (shared genes) to understand shared pathogenesis and hence the mechanisms underlying co-morbidity [10–12]. Network properties of shared genes associated with phenotypically similar diseases have been examined so far, whereas those of pleiotropic genes with effects on divergent phenotypes and of genes associated with specific diseases have not been examined. We hypothesized that the network properties of a gene in the molecular interaction network and its tissue co-expression with its interactors determine the number of disease phenotypes it is associated with.
Here, we retrieved human disease genes and the associated diseases from Morbid Map (OMIM). We classified the shared disease genes into genes associated with phenotypically similar diseases (phenosim genes) and those associated with phenotypically divergent diseases (phenodiv genes) based on the CIPHER score. For instance, AKT1, which is associated with ovarian cancer, breast cancer, colorectal cancer and schizophrenia, was classified as a phenodiv gene, while TYRP1, which is associated with brown albinism and rufous albinism, was classified as a phenosim gene. We demonstrated that shared genes were more central than specific genes, while phenodiv genes were more central than phenosim genes. Shared genes had a higher number of disease gene interactors compared to specific genes. However, shared genes had relatively restricted tissue co-expression with their interactors compared to specific genes. Essential genes, mutations in which lead to lethality, are known to be high-degree nodes (hubs), thus occupying a central position in the protein interaction network. When compared with specific, shared and phenosim genes, essential genes had higher measures of centrality, as expected. However, essential genes and phenodiv genes had comparable connectivities (degrees), prompting us to explore other network attributes of lethality and pleiotropy. We found that essential and phenodiv genes were intra-modular and inter-modular hubs, respectively, with the former being highly co-expressed with their interactors, contrary to the phenodiv genes. Gene Ontology analysis identified the essential genes to be predominantly transcription factors residing in the nucleus, while phenodiv genes were cytoplasmic proteins involved in signal transduction. This study demonstrated that the effect of a disease gene on the number of different and phenotypically divergent diseases is associated with its properties in a molecular interaction network.
Centrality of human disease genes in protein interaction network
Comparison of measures of centrality of specific with shared genes and phenosim with phenodiv genes in human protein interaction network (all values are mean ± S.D.)

Disease gene classes    Degree           Closeness      Betweenness             Eccentricity
Specific                13.29 ± 24.21    0.26 ± 0.04    3.3 × 10^-4 ± 0.001     8.65 ± 1.37
Shared                  16.28 ± 27.73    0.26 ± 0.03    5.9 × 10^-4 ± 0.002     8.61 ± 1.06
Phenosim                12.72 ± 20.72    0.27 ± 0.04    3.1 × 10^-4 ± 0.0007    8.75 ± 0.88
Phenodiv                20.44 ± 34.85    0.27 ± 0.04    8.7 × 10^-4 ± 0.003     8.47 ± 1.17
Comparison of measures of centrality of Essential genes with that of specific, shared, phenosim and phenodiv genes in human protein interaction network (all values are mean ± S.D.)

Gene class              Degree           Closeness      Betweenness             Eccentricity
Essential               21.57 ± 35.89    0.27 ± 0.03    5.5 × 10^-4 ± 0.002     8.63 ± 0.87
Tissue-specificity of disease genes and their interactors
Comparison of tissue co-expression of interactors of Essential genes with that of specific, shared, phenosim and phenodiv genes

Gene class              Tissue co-expression of interactors (mean ± S.D.)
Essential               46.4 ± 23.2
Specific                45.9 ± 24.6
Shared                  42.4 ± 24.1
Phenosim                42.5 ± 24.0
Phenodiv                42.0 ± 22.9
Pleiotropy and network modularity
The phenotypic consequence of a variation in a gene is known to be affected to a large extent by the topological position of its protein product in the molecular interaction network. Thus, the functional importance of a gene is signified by its centrality in a protein interaction network. Previously, we and others have shown that the contribution of variations in a single gene to bring about an associated phenotype is a function of its centrality [4–6]. Accordingly, based on centrality, the different gene classes leading to a phenotype are ordered as essential genes (being the most central), monogenic disease genes, complex disease genes and non-disease genes (being the most peripheral). However, the network properties of genes in which mutations lead to multiple phenotypes have not been explored.
Based on the current understanding of the human protein interaction network and the results presented here, we demonstrated that the pleiotropic genes (shared genes) had an intermediate centrality compared with essential genes and genes associated with only one disease (specific genes). However, classification of the shared genes based on the similarity of the associated phenotypes demonstrated that phenodiv genes, leading to divergent phenotypes, were more central than phenosim genes. Thus, in increasing order of centrality, these different disease genes could be arranged as Specific, Phenosim and Phenodiv genes. We note that the observed correlation of the measures of centrality with phenotypic similarity suggests that the interpretations were unlikely to have been affected by using the median CIPHER value as the cut-off to classify phenosim and phenodiv genes.
The similar measures of centrality between essential and phenodiv genes, except betweenness, led us to investigate the properties that determine essentiality (lethality of mutants) and pleiotropy. One of the most striking observations made here was that the essential genes and phenodiv genes were intra-modular and inter-modular hubs, respectively, with the former being highly co-expressed with their interactors, contrary to the latter. Furthermore, essential genes were predominantly involved in transcription regulation, while phenodiv genes were involved in signal transduction.
This study could be affected by knowledge bias pertaining to the disease genes and their associated phenotypes as presented in OMIM, the human protein interaction network and the information on tissue expression. With an increasing number of genetic studies it is likely that some of the specific genes will be identified as shared and some of the phenosim as phenodiv genes. Based on the trend we observed here, it is tempting to speculate that essential disease genes in the specific and phenosim genes categories may have a higher likelihood for this transition. An expansion of knowledge of the diseases and disease genes, protein interactions and tissue expression would aid in better comprehension of the properties associated with genes causing pleiotropic effects. Further, it will be interesting to study the temporal co-expression of these genes with their interactors in various tissues.
Here we demonstrated that the ability of a disease gene to influence the cellular network, signified by its centrality and tissue co-expression with its interactors, determines its pleiotropic effects.
We obtained the list of human disease genes and the 5024 associated diseases from the OMIM MorbidMap (downloaded in February 2009). Based on the number of disease associations, the genes were then classified as specific (genes associated with one disease; n = 2512) or shared (genes associated with more than one disease; n = 838). To determine the phenotypic divergence among diseases associated with the shared genes we used CIPHER (Correlating protein Interaction network and PHEnotype network to pRedict disease genes). CIPHER provides a similarity score of phenotypes for diseases in MorbidMap, based on their OMIM descriptions using Medical Subject Headings (MeSH) terms. Thus, a lower CIPHER score represents a higher phenotypic divergence. We retrieved the CIPHER scores for all diseases associated with the same gene and recorded the lowest score for each shared gene. This resulted in the identification of 472 shared genes with phenotypic similarity/divergence information. Using the median CIPHER score (0.33) of the entire dataset as the cut-off, we then categorized these genes into genes associated with phenotypically similar diseases (Phenosim genes with CIPHER scores ≥ 0.33; n = 238) and genes associated with phenotypically divergent diseases (Phenodiv genes with CIPHER scores < 0.33; n = 234). We defined essential genes (n = 2366) as previously described, by retrieving a list of human orthologs of mouse genes that resulted in a lethal phenotype in embryonic and postnatal stages upon knockout, as catalogued in the Mouse Genome Database. Of the essential genes, 811 were associated with human diseases and were classified as essential disease genes, while the 1555 essential non-disease genes were categorized as essential genes (Figure 1).
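The classification steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `classify_genes`, the toy gene and disease labels, and the pairwise similarity values are all hypothetical, and the lookup table stands in for real CIPHER scores. The cut-off is passed in explicitly (the study used 0.33); falling back to the median of the recorded lowest scores is an assumption of this sketch.

```python
from itertools import combinations
from statistics import median


def classify_genes(gene_to_diseases, similarity, cutoff=None):
    """Split genes into specific/shared, then shared into phenosim/phenodiv.

    gene_to_diseases: dict mapping gene -> set of associated diseases.
    similarity: dict mapping frozenset({disease_a, disease_b}) -> score
                (a stand-in for CIPHER phenotype similarity scores).
    cutoff: score threshold; if None, the median of the lowest scores is
            used (an assumption of this sketch).
    """
    specific = {g for g, d in gene_to_diseases.items() if len(d) == 1}
    shared = {g for g, d in gene_to_diseases.items() if len(d) > 1}

    # Record the lowest pairwise similarity among each shared gene's diseases.
    lowest = {}
    for gene in shared:
        scores = [
            similarity[frozenset(pair)]
            for pair in combinations(sorted(gene_to_diseases[gene]), 2)
            if frozenset(pair) in similarity
        ]
        if scores:  # shared genes without any scored pair stay unclassified
            lowest[gene] = min(scores)

    if cutoff is None and lowest:
        cutoff = median(lowest.values())

    phenosim = {g for g, s in lowest.items() if s >= cutoff}
    phenodiv = {g for g, s in lowest.items() if s < cutoff}
    return specific, shared, phenosim, phenodiv


# Hypothetical toy input: gene and disease names are placeholders.
toy_genes = {
    "G1": {"A"},
    "G2": {"A", "B"},
    "G3": {"C", "D"},
    "G4": {"E", "F", "G"},
}
toy_similarity = {
    frozenset({"A", "B"}): 0.8,
    frozenset({"C", "D"}): 0.1,
    frozenset({"E", "F"}): 0.5,
    frozenset({"E", "G"}): 0.2,
    frozenset({"F", "G"}): 0.6,
}
specific, shared, phenosim, phenodiv = classify_genes(
    toy_genes, toy_similarity, cutoff=0.33
)
```

Taking the minimum pairwise score means that a single divergent disease pair is enough to place a gene in the phenodiv class, which mirrors the "lowest score for each shared gene" rule described above.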
Human protein interaction network
We constructed a human protein interaction network using a modified version of the CRG interactome. The CRG interactome is by far the largest protein interaction network, including protein-protein interactions supported by at least one piece of direct experimental evidence demonstrating physical association between two human proteins. To validate proper annotation of the proteins, we retrieved Entrez gene identifiers for all the proteins that were listed in the CRG interactome with Ensembl gene identifiers. After removing entries that lacked Entrez gene identifiers, the modified CRG interactome consisted of 10,092 proteins with 79,211 interactions.
In order to examine differences in tissue distribution between the shared and specific genes, and between the Phenosim and Phenodiv genes, and in co-expression with their interactors (from the human interaction network), we used a previously described dataset of the complete set of interactions with details of the cells and tissues in which each interaction can occur. This dataset was derived from GNF Atlas expression data, which detail the tissue expression patterns of genes across 79 different non-disease human tissues. As previously described, we determined the expression of interactors of AKT1 in disease tissues: schizophrenia (GSE17612), colorectal cancer (GSE14333), breast cancer (GSE19615) and ovarian cancer (GSE18520). The datasets of disease tissues from patients were retrieved from the NCBI Gene Expression Omnibus. Expression of interactors of CLINT1, PMS1, PHB and RRAS2 was determined in schizophrenia, colorectal cancer, breast cancer and ovarian cancer disease tissues, respectively. The interactors were considered co-expressed if they were expressed together in tissues in the datasets considered.
Overrepresentation of Gene Ontology (GO) categories was determined using the GOSim package from Bioconductor. Statistical significance estimation for overrepresented GO categories in the real datasets (phenodiv and essential genes) was done by considering all GO categories without defining any levels in the GO hierarchy, in order to avoid loss of information. We considered only categories with at least 5 total genes to prevent categories appearing to be significantly over-represented due to chance.
We compared the measures of centrality (degree, closeness, betweenness and eccentricity) between different classes of genes by Mann-Whitney U tests. Correlation between phenotypic similarity, as determined by the CIPHER score, and the measures of centrality was determined using Spearman's rank correlation. Different classes of disease genes were compared with respect to essential disease genes using Fisher's exact test. For the GO analysis, the number of genes in the real dataset (N) with GO classifications was determined. In each GO category the number of genes in the real dataset was counted. From the total number of genes represented in GOSim, a set of N genes was randomly sampled and the number of genes present in each GO category was counted. This was repeated 10,000 times to generate "randomly expected gene lists" for each GO category. Statistical significance was estimated as the ratio of the number of times a GO category had more expected genes than were observed for the real dataset genes to the number of random gene lists considered. Statistical analyses were performed using R.
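As a rough illustration of the identifier filtering described above, and of how node-level centrality can then be read off such a network, the following Python sketch builds an undirected adjacency map and computes degree, closeness and eccentricity by breadth-first search. The function names, the toy identifiers and the decision to drop self-interactions are assumptions of this sketch; closeness is taken here as (number of reachable nodes) / (sum of shortest-path distances), which may differ from the definition used in the study, and betweenness is omitted for brevity.

```python
from collections import deque


def build_network(interactions, ensembl_to_entrez):
    """Map interaction pairs to Entrez IDs and build an undirected adjacency map.

    Pairs in which either protein lacks an Entrez identifier are dropped.
    Self-interactions are also dropped (an assumption of this sketch).
    """
    adjacency = {}
    for a, b in interactions:
        ea, eb = ensembl_to_entrez.get(a), ensembl_to_entrez.get(b)
        if ea is None or eb is None or ea == eb:
            continue
        adjacency.setdefault(ea, set()).add(eb)
        adjacency.setdefault(eb, set()).add(ea)
    return adjacency


def bfs_distances(adjacency, source):
    """Shortest-path distances (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def centralities(adjacency, node):
    """Return (degree, closeness, eccentricity) of a node within its component."""
    dist = bfs_distances(adjacency, node)
    others = [d for n, d in dist.items() if n != node]
    degree = len(adjacency[node])
    closeness = len(others) / sum(others) if others else 0.0
    eccentricity = max(others) if others else 0
    return degree, closeness, eccentricity


# Hypothetical toy data: "E5" has no Entrez mapping, so its interaction is removed.
toy_interactions = [("E1", "E2"), ("E2", "E3"), ("E3", "E4"), ("E2", "E5")]
toy_mapping = {"E1": 1, "E2": 2, "E3": 3, "E4": 4}
toy_network = build_network(toy_interactions, toy_mapping)
```

In the toy network the surviving proteins form the chain 1-2-3-4, so an inner node such as 2 has a higher degree and closeness and a lower eccentricity than an end node such as 1.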
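One possible way to quantify tissue co-expression with interactors is sketched below in Python. The text does not spell out the exact metric, so this is an assumed reading: for each interactor, the percentage of all profiled tissues in which both the gene and the interactor are expressed, averaged over the gene's interactors. The function names and the toy tissue labels are hypothetical.

```python
def coexpression_percent(gene, adjacency, expressed_in, n_tissues):
    """Percent of profiled tissues in which a gene and each interactor are both expressed.

    adjacency: dict mapping gene -> set of interactors.
    expressed_in: dict mapping gene -> set of tissues where it is expressed.
    n_tissues: total number of tissues profiled (79 in the GNF Atlas data).
    Returns a dict mapping interactor -> percentage.
    """
    gene_tissues = expressed_in.get(gene, set())
    result = {}
    for partner in adjacency.get(gene, ()):
        common = gene_tissues & expressed_in.get(partner, set())
        result[partner] = 100.0 * len(common) / n_tissues
    return result


def mean_coexpression(gene, adjacency, expressed_in, n_tissues):
    """Average co-expression percentage over all interactors, or None if there are none."""
    values = list(coexpression_percent(gene, adjacency, expressed_in, n_tissues).values())
    return sum(values) / len(values) if values else None


# Hypothetical toy data with four tissues.
toy_adjacency = {"X": {"Y", "Z"}, "Y": {"X"}, "Z": {"X"}}
toy_expression = {
    "X": {"t1", "t2", "t3"},
    "Y": {"t2", "t3", "t4"},
    "Z": {"t4"},
}
per_partner = coexpression_percent("X", toy_adjacency, toy_expression, n_tissues=4)
```

Under this reading, a gene whose interactors are each present in only a few of its tissues gets a low mean value, which is the pattern one would expect for restricted tissue co-expression.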
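The random-sampling procedure for GO categories can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R/GOSim code used in the study: the function name `go_permutation_pvalues`, its parameters and the toy annotation table are hypothetical, sampling is done without replacement, and the minimum category size of 5 is applied to the background annotation.

```python
import random


def go_permutation_pvalues(real_genes, gene_to_terms, n_iter=10000, min_genes=5, seed=0):
    """Empirical significance of GO categories by random sampling of gene lists.

    real_genes: genes of interest (e.g. phenodiv or essential genes).
    gene_to_terms: dict mapping every annotated background gene -> set of GO terms.
    Returns a dict mapping term -> fraction of random gene lists in which the
    term contained more genes than observed in the real list.
    """
    rng = random.Random(seed)
    background = sorted(gene_to_terms)
    annotated = [g for g in real_genes if g in gene_to_terms]
    n = len(annotated)  # N: real genes that have GO classifications

    def count_terms(genes):
        counts = {}
        for gene in genes:
            for term in gene_to_terms[gene]:
                counts[term] = counts.get(term, 0) + 1
        return counts

    totals = count_terms(background)
    observed = count_terms(annotated)
    # Only test categories with at least min_genes genes in total.
    tested = [t for t in observed if totals[t] >= min_genes]

    exceed = dict.fromkeys(tested, 0)
    for _ in range(n_iter):
        sample_counts = count_terms(rng.sample(background, n))
        for term in tested:
            if sample_counts.get(term, 0) > observed[term]:
                exceed[term] += 1
    return {term: exceed[term] / n_iter for term in tested}


# Hypothetical toy annotation: T1 covers only g0..g4, T3 covers g0 and g5..g19.
toy_terms = {f"g{i}": set() for i in range(20)}
for i in range(5):
    toy_terms[f"g{i}"].add("T1")
for i in range(5, 20):
    toy_terms[f"g{i}"].add("T3")
toy_terms["g0"].add("T3")
toy_real = [f"g{i}" for i in range(5)]
toy_pvalues = go_permutation_pvalues(toy_real, toy_terms, n_iter=1000)
```

In the toy example every real gene carries T1, so no random list can exceed the observed count and the estimated value for T1 is 0, whereas T3 is depleted in the real list and nearly every random list exceeds it.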
List of Abbreviations
- OMIM:
Online Mendelian Inheritance in Man
- CIPHER:
Correlating protein Interaction network and PHEnotype network to pRedict disease genes
- Phenosim genes:
Genes associated with phenotypically similar diseases
- Phenodiv genes:
Genes associated with phenotypically divergent diseases
This study was supported by grants from the European Commission, the Swedish Research Council and the Sahlgrenska University Hospital.
We report no conflicts of interest.
- Jimenez-Sanchez G, Childs B, Valle D: Human disease genes. Nature. 2001, 409: 853-855. 10.1038/35057050
- Peltonen L, McKusick VA: Genomics and medicine. Dissecting human disease in the postgenomic era. Science. 2001, 291: 1224-1229. 10.1126/science.291.5507.1224
- McKusick VA: Mendelian Inheritance in Man and its online version, OMIM. Am J Hum Genet. 2007, 80: 588-604. 10.1086/514346
- Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabási AL: The human disease network. Proc Natl Acad Sci USA. 2007, 104: 8685-8690. 10.1073/pnas.0701361104
- Feldman I, Rzhetsky A, Vitkup D: Network properties of genes harboring inherited disease mutations. Proc Natl Acad Sci USA. 2008, 105: 4323-4328. 10.1073/pnas.0701722105
- Barrenas F, Chavali S, Holme P, Mobini R, Benson M: Network properties of complex human disease genes identified through genome-wide association studies. PLoS One. 2009, 4: e8090. 10.1371/journal.pone.0008090
- Aerts S, Lambrechts D, Maity S, Van Loo P, Coessens B, De Smet F, Tranchevent LC, De Moor B, Marynen P, Hassan B, Carmeliet P, Moreau Y: Gene prioritization through genomic data fusion. Nat Biotechnol. 2006, 24: 537-544. 10.1038/nbt1203
- Franke L, van Bakel H, Fokkens L, de Jong ED, Egmont-Petersen M, Wijmenga C: Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. Am J Hum Genet. 2006, 78: 1011-1025. 10.1086/504300
- Ideker T, Sharan R: Protein networks in disease. Genome Res. 2008, 18: 644-652. 10.1101/gr.071852.107
- Park J, Lee DS, Christakis NA, Barabási AL: The impact of cellular networks on disease comorbidity. Mol Syst Biol. 2009, 5: 262. 10.1038/msb.2009.16
- Zhernakova A, van Diemen CC, Wijmenga C: Detecting shared pathogenesis from the shared genetics of immune-related diseases. Nat Rev Genet. 2009, 10: 43-55. 10.1038/nrg2489
- Lee DS, Park J, Kay KA, Christakis NA, Oltvai ZN, Barabási AL: The implications of human metabolic network topology for disease comorbidity. Proc Natl Acad Sci USA. 2008, 105: 9880-9885. 10.1073/pnas.0802208105
- Wu X, Jiang R, Zhang MQ, Li S: Network-based global inference of human disease genes. Mol Syst Biol. 2008, 4: 189. 10.1038/msb.2008.27
- Jeong H, Mason SP, Barabási AL, Oltvai ZN: Lethality and centrality in protein networks. Nature. 2001, 411: 41-42. 10.1038/35075138
- Bossi A, Lehner B: Tissue specificity and the human protein interaction network. Mol Syst Biol. 2009, 5: 260. 10.1038/msb.2009.17
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004, 101: 6062-6067. 10.1073/pnas.0400782101
- Lage K, Hansen NT, Karlberg EO, Eklund AC, Roque FS, Donahoe PK, Szallasi Z, Jensen TS, Brunak S: A large-scale analysis of tissue-specific pathology and gene expression of human disease genes and complexes. Proc Natl Acad Sci USA. 2008, 105: 20870-20875. 10.1073/pnas.0810772105
- Taylor IW, Linding R, Warde-Farley D, Liu Y, Pesquita C, Faria D, Bull S, Pawson T, Morris Q, Wrana JL: Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nat Biotechnol. 2009, 27: 199-204. 10.1038/nbt.1522
- Amberger J, Bocchini CA, Scott AF, Hamosh A: McKusick's Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res. 2009, 37: D793-D796. 10.1093/nar/gkn665
- Blake JA, Bult CJ, Eppig JT, Kadin JA, Richardson JE: The Mouse Genome Database genotypes::phenotypes. Nucleic Acids Res. 2009, 37: D712-D719. 10.1093/nar/gkn886
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Muertter RN, Edgar R: NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res. 2009, 37: D885-D890. 10.1093/nar/gkn764
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556