Detection of type 2 diabetes related modules and genes based on epigenetic networks
- Hui Liu†1, 2,
- Tongtong Wang†3,
- Hongbo Liu1, 2,
- Yanjun Wei2,
- Guofeng Zhao2,
- Jianzhong Su2,
- Qiong Wu1,
- Hong Qiao3 (corresponding author) and
- Yan Zhang2 (corresponding author)
© Liu et al.; licensee BioMed Central Ltd. 2014
Published: 24 January 2014
Type 2 diabetes (T2D) is one of the most common chronic metabolic diseases and is characterized by insulin resistance and decreased insulin secretion. Genetic variation explains only part of the heritability of T2D, so new methods are needed to detect susceptibility genes for the disease. Epigenetics could provide the interface between environmental factors and the pathological mechanism of T2D.
Based on network theory, and by combining epigenetic characteristics with the human interactome, a weighted human DNA methylation network (WMPN) was constructed, and a T2D-related subnetwork (TMSN) was obtained from it using T2D-related differentially methylated genes. TMSN had a T2D-specific network structure in which genes causing non-fatal metabolic disease were often located at the topological and functional periphery of the network. Combined with chromatin modifications, a weighted chromatin modification network (WCPN) was built, and a T2D-related chromatin modification pattern subnetwork (TCSN) was obtained using the TMSN gene set. TCSN had a densely connected network community, indicating that TMSN and TCSN could represent a collection of T2D-related epigenetically dysregulated sub-pathways. Using the cumulative hypergeometric test, 24 interplay modules of DNA methylation and chromatin modifications were identified. Analysis of gene expression in human T2D islet tissue showed that some genes had altered expression levels caused by aberrant DNA methylation and/or chromatin modifications, which might affect and promote the development of T2D.
Here we have detected potential interplay modules of DNA methylation and chromatin modifications for T2D. The study of T2D epigenetic networks provides a new way of understanding the pathogenic mechanism of T2D caused by epigenetic disorders.
As one of the fastest growing diseases in the world, T2D has become a major public health problem, affecting more than 280 million people, and it is the main type of diabetes, accounting for about 90% of all diabetes patients. T2D is characterized by insulin resistance and decreased insulin secretion. Its major complications include abnormally high or low blood glucose caused by diabetes-related medications, heart and blood vessel disease, osteoporosis, skin and mouth problems, and damage to other organs, including the kidneys, eyes, feet and nerves. T2D is also a risk factor for Alzheimer's disease and vascular dementia. It can be diagnosed by measuring blood glucose levels, glucose tolerance and HbA1c (glycosylated haemoglobin).
In unbiased genome-wide studies of T2D risk genes, the insulin-producing β cells of the islets have been the central object of research. Pancreatic β cells are stimulated by high postprandial blood glucose levels to secrete insulin, which regulates blood glucose homeostasis. To date, hundreds of genetic variants related to T2D or to glucose/insulin traits have been identified by genome-wide association studies (GWAS) [6–9]. Most T2D risk genes identified by GWAS take part in biological processes that affect the number or function of pancreatic β cells, which emphasizes the importance of β cells in the occurrence and development of T2D. Molecular-level variation in human islets has been analyzed systematically by integrating GWAS, cDNA chips and measurements of the insulin response to glucose and of HbA1c. In another study, a manually compiled database, T2DGADB, was built from the genetic association data reported in 701 publications on T2D, including data derived from GWAS and meta-analyses. However, these genetic variants explain only part of the heritability of T2D, so new methods are needed to identify T2D susceptibility genes.
Most previous studies investigated the etiology of T2D from a genetic perspective, but recent research has found that environmental and lifestyle factors also affect T2D pathogenesis, in addition to genetic influences. Epigenetic variation could link environmental exposure to the pathological mechanism of T2D [14, 15]. Epigenetic variations are heritable and reversible, and are considered to play an important role in metabolic diseases. As the major environmental risk factors for T2D, an unhealthy diet leading to obesity and a sedentary lifestyle might cause epigenetic changes and even promote the occurrence of T2D. Current studies have shown that DNA methylation and histone modifications can be altered by metabolic or nutritional disorders and other environmental factors, thereby affecting pancreatic β cell development and insulin secretion, reducing insulin sensitivity, and ultimately leading to T2D. There is also evidence that low birth weight, considered an indicator of fetal malnutrition, is associated with impaired glucose tolerance (IGT) and T2D later in life. Recently, genome-wide maps of epigenetic marks have been described in several mouse and human cell lines, as well as in human islet tissue. A map of histone modifications for human islet tissue was constructed by ChIP-sequencing (ChIP-seq), including active (H3K4me1, H3K4me2 and H3K4me3) and repressive (H3K27me3) histone modifications, and serves as a reference for research on T2D etiology. These epigenetic findings provide a reliable resource for understanding the crucial roles of regulatory elements in human islets in T2D susceptibility.
Statistically significant overlaps between the pathologies of complex diseases might be caused by variation in proteins that are involved in the same pathways or protein complexes, or that have the same or similar functions. Increasing evidence also shows that proteins whose genes contribute to the same disease tend to interact with each other, share the same or similar biological processes and be co-expressed in the same tissues. Therefore, computational systems biology methods based on network theory have been widely used to investigate complex human diseases. Network-based approaches assume that biological networks consist of interacting proteins with specific functions. Functional interactions can be observed at several levels, including physical interaction, co-expression, co-regulation and phenotypic interaction. Functionally related genes usually cluster in biological networks, and the interactome can explain the complexity of the network linkages. The human interactome has been used to identify potential disease-causing genes. In a recent study that combined the human interactome with genome-wide DNA methylation, a weighted cancer-related DNA methylation network was constructed and a list of potential cancer-related genes with aberrant DNA methylation was prioritized. Still, few studies have built disease networks that combine DNA methylation and chromatin modifications with the human interactome.
As a metabolic disease that affects health and normal life, T2D is related to genetic, autoimmune, environmental and other factors. Recent studies have shown that epigenetic factors also contribute to the pathogenesis of T2D, but how they do so remains unclear. In this study, based on network theory, the two major classes of epigenetic modification, DNA methylation and chromatin modifications, were used to construct a T2D-related DNA methylation network and a T2D-related chromatin modification pattern network, respectively. The interplay modules of DNA methylation and chromatin modifications were detected, and the genes with epigenetic disorders in T2D were then identified. These genes and modules might affect the development of T2D through epigenetic dysregulation, and could serve as biological markers for T2D etiology.
Identification of T2D-related differentially methylated genes
The genome-wide DNA methylation distributions differed significantly between T2D and control samples, suggesting that differentially methylated genes might contribute to the occurrence and development of T2D. Using SAM with an FDR < 0.05, 1756 differentially methylated genes were identified, most of which were hypomethylated; only 3 genes were hypermethylated (Figure 1C). Comparing the differentially methylated genes between the T2D and control groups (Figure 1D), their mean methylation levels were 0.46 in the T2D group and 0.54 in the control group. The methylation levels of these genes fell mainly in the range of 0.6 to 0.8 in the control group, but mainly in the range of 0.4 to 0.6 in the T2D group. Among the differentially methylated genes, 200 were hypermethylated (methylation level >= 0.8) in the control group, compared with only 82 in the T2D group. Conversely, 165 genes were hypomethylated in the control group, far fewer than the 270 in the T2D group. These results indicate that the DNA methylation patterns reflect global hypomethylation in the genome of T2D islet cells, and that a set of genes with reduced DNA methylation in T2D might play a crucial role in the development of T2D pathology.
Construction and analysis of T2D-related DNA methylation weighted networks
Table 1. Topological features of WMPN and TMSN, with empirical P values.
To assess the reliability of TMSN, global network randomization was performed (see Methods for details). The topological features of the 1000 global random subnetworks were calculated and compared with those of TMSN. TMSN differed significantly from the global random null model: all three topological features were significantly lower than in the null model, with an empirical p value < 0.01 (Table 1). A possible reason is that the seed set consisted of genes with relatively low degree in WMPN, which resulted in a TMSN gene set with low degrees. Consistent with this, the node set and edge set of TMSN were smaller than those of all 1000 global random subnetworks, and the average degree of the seed gene set in WMPN was 4, lower than the average degrees of all 1000 random seed gene sets.
To validate the statistical significance of the network modularity of TMSN, the giant component of TMSN was extracted. Local network randomization was performed and 1000 local random subnetwork giant components were constructed (see Methods for details). The topological features of the local random null model were calculated and compared with those of the TMSN giant component. The clustering coefficient and betweenness of the TMSN giant component were 0.51 and 3860, respectively, much higher than in the local random null model, indicating statistically significant differences in network structure between TMSN and the local random null model (Figure 2D and 2E). These findings suggest that TMSN might reflect the relationship between T2D-related genes with aberrant DNA methylation and disease-causing genes.
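The comparison against a random null model can be illustrated with a minimal sketch of an empirical p value. The function name, the add-one correction and the example numbers are assumptions made for illustration; the randomization procedure itself is described in Methods and is not reproduced here.

```python
def empirical_p_value(observed, null_values, alternative="less"):
    """Estimate an empirical p value from null-model statistics.

    alternative="less" counts null values <= observed;
    alternative="greater" counts null values >= observed.
    The +1 in numerator and denominator is an assumed correction that
    avoids reporting a p value of exactly zero.
    """
    if not null_values:
        raise ValueError("null_values must not be empty")
    if alternative == "less":
        extreme = sum(1 for v in null_values if v <= observed)
    elif alternative == "greater":
        extreme = sum(1 for v in null_values if v >= observed)
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    return (extreme + 1) / (len(null_values) + 1)


# Hypothetical average degrees of 1000 random subnetworks (not real data).
random_degrees = [5.0 + 0.01 * i for i in range(1000)]
p = empirical_p_value(4.0, random_degrees, alternative="less")
```

With 1000 randomizations, an observed value more extreme than every random value gives a p value just below 0.001 under this correction, which is consistent with a reported threshold of < 0.01.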
Analysis of the T2D-related differentially methylated genes
Housekeeping genes are defined as constitutive genes that are required for the maintenance of basic cellular function and are expressed in all cells of an organism. The functional centrality of housekeeping genes therefore determines their topological centrality in the network, where they perform crucial functions and have high degrees. Studies have shown that cancer-related genes tend to perform basic cellular functions and are termed essential genes. The cancer gene set therefore also has functional and topological centrality in complex biological networks and plays a critical role in disease development. T2D is a chronic metabolic disease whose complex pathological mechanism remains unclear. Its causal genes, however, might be nonessential genes with non-critical functions in the organism, so the dysregulation caused by these genes might lead to non-fatal disorders. In this study, the average degree of the T2D-related differentially methylated genes was only 4.00, significantly lower than that of all the random gene sets. This suggests that T2D-related genes are functionally and topologically peripheral, and that the TMSN obtained from WMPN is biologically meaningful.
The pattern of chromatin modifications for T2D
Recent studies have shown that DNA methylation, histone modifications and other epigenetic modifications often act together and lead to common outcomes. Histone modifications are considered to play important roles in the regulation of gene expression and the maintenance of chromosomal structure. In this study, we examined five chromatin modifications in T2D: H3K4me1, H3K4me3, H3K79me2, CTCF (CCCTC-binding factor) and DNaseHS (DNase hypersensitive sites).
The T2D-specific chromatin modification patterns might lead to aberrant DNA methylation, which affects the expression levels of the corresponding genes, or might directly affect DNA-histone interactions and thereby regulate gene expression. Therefore, like DNA methylation, chromatin modifications also play important roles in the occurrence and development of T2D. In the subsequent analysis, DNA methylation and chromatin modifications were combined to investigate how their interplay contributes to the epigenetic dysregulation underlying the development of T2D.
Construction and analysis of T2D-related chromatin modification weighted networks
The findings above revealed that TMSN had a T2D-specific network structure, which might contain a set of epigenetically dysregulated pathways caused by aberrant DNA methylation and contributing to the epigenetic disorders of T2D. Furthermore, the chromatin modification patterns of T2D-related differentially methylated genes were significantly different from those of non-differentially methylated genes, indicating a tight correlation between DNA methylation and chromatin modifications in T2D. Combining DNA methylation and histone modifications therefore helps in understanding the epigenetic dysregulation of T2D, so the chromatin modification patterns of the T2D-related differentially methylated genes and their connected genes were investigated. The T2D-related chromatin modification weighted subnetwork (TCSN) was built; it comprised 16624 interactions (edges) and 863 genes (nodes), of which 156 were seed genes (see Methods for details) (Figure 5B). The giant component of TCSN comprised 16304 interactions and 709 genes, of which 121 were seed genes. TCSN might reflect the chromatin modification patterns of differentially methylated genes and their connected genes in T2D. It provides a theoretical basis for the T2D-specific epigenetic disorders caused by aberrant DNA methylation and histone modifications, and promotes further understanding of the epigenetic mechanisms contributing to the occurrence and development of T2D.
Topological features of WCPN and TCSN, with empirical P values.
To assess the reliability of TCSN, global network randomization was performed and 1000 global random subnetworks were constructed (see Methods for details). The topological features of the global random subnetworks were calculated and compared with those of TCSN. TCSN differed significantly from the global random null model: two topological features (degree and clustering coefficient) were both significantly higher than in the null model, with an empirical p value < 0.01, while the betweenness of TCSN was significantly lower than that of WCPN. The node set and edge set of TCSN, with sizes of 863 and 16624, were larger than those of all 1000 global random subnetworks. These results show that the chromatin modification patterns of T2D-related differentially methylated genes and their associated genes tend to be similar to each other.
To validate the statistical significance of the network modularity observed in TCSN, the giant component of TCSN was extracted. Local network randomization was performed to construct 1000 local random subnetwork giant components (see Methods for details). The topological features of the local random null model were calculated and compared with those of the TCSN giant component. The clustering coefficient and betweenness of the TCSN giant component were 0.60 and 3860, respectively (Figure 5D and 5E), significantly higher than in the local random null model, indicating statistically significant differences in network structure. This suggests that TCSN may reflect the similarity of chromatin modification patterns between T2D-related aberrantly methylated genes and their connected genes.
Detection of interplay modules of DNA methylation and chromatin modifications for T2D
TMSN contained 134 components, of which the giant component was composed of 863 genes and 6139 interactions, accounting for 72.10% and 96.13% of the total, respectively. For the giant component of TMSN, 42 DNA methylation specific modules were detected after removing the modules containing only 2 elements (see Methods for details). TCSN contained 29 components, and its giant component was composed of 709 genes and 16304 interactions, covering most of the genes and interactions in TCSN (82.16% and 98.08%, respectively). This indicates that the chromatin modification patterns of T2D-related differentially methylated genes and their connected genes are highly similar. After removing the modules below the threshold, 21 chromatin modification specific modules of T2D were detected in the giant component of TCSN (see Methods for details). The dysregulation of biological processes in organisms often results from the combined effects of DNA methylation and chromatin modifications. The results above showed that the chromatin modification intensities of T2D-related differentially methylated genes were significantly higher than those of non-differentially methylated genes, indicating T2D-specific chromatin modification patterns.
Thus, as important epigenetic modifications, DNA methylation and chromatin modifications together contribute to the epigenetic disorders of T2D. Among the 42 T2D-related DNA methylation modules and 21 T2D-related chromatin modification modules, the interplay modules of DNA methylation and chromatin modifications for T2D were detected by the cumulative hypergeometric test (see Methods for details). There were 90 overlaps between the two groups of modules; the smallest overlap contained only one gene, while the largest contained 44 genes. The module combinations (sub-modules) with an overlap of at least 3 genes were retained and, after Bonferroni multiple testing correction (Bonferroni corrected p value < 0.05), 24 significantly enriched sub-modules were identified and considered as the interplay modules of DNA methylation and chromatin modifications (see Additional file 1).
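Splitting a subnetwork into components and extracting its giant component can be sketched as follows. This is an illustrative breadth-first search over an undirected edge list, not the authors' implementation; the function names and the toy edges are hypothetical, and isolated nodes without edges are not represented.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the connected components of an undirected edge list as sets."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        queue = deque([start])
        seen.add(start)
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def giant_component(edges):
    """Return the largest component and the edges that lie inside it."""
    giant = max(connected_components(edges), key=len)
    giant_edges = [(a, b) for a, b in edges if a in giant and b in giant]
    return giant, giant_edges


# Toy network with two components (hypothetical gene labels).
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "E")]
giant, giant_edges = giant_component(toy_edges)
node_fraction = len(giant) / 5              # share of genes in the giant component
edge_fraction = len(giant_edges) / len(toy_edges)  # share of interactions
```

The two fractions correspond to the kind of proportions reported for the giant components of TMSN and TCSN.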
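The overlap test between a DNA methylation module and a chromatin modification module can be sketched with an upper-tail (cumulative) hypergeometric probability followed by Bonferroni correction. The background size, the choice to correct over the retained module pairs only, and all function and variable names are assumptions for illustration; the exact settings are given in Methods.

```python
from math import comb


def hypergeom_upper_tail(overlap, background, size_a, size_b):
    """P(X >= overlap) when size_b genes are drawn without replacement from
    `background` genes, of which size_a belong to the other module."""
    if not (0 <= size_a <= background and 0 <= size_b <= background):
        raise ValueError("module sizes must lie between 0 and background")
    upper = min(size_a, size_b)
    total = comb(background, size_b)
    favourable = sum(
        comb(size_a, i) * comb(background - size_a, size_b - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return favourable / total


def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def interplay_candidates(modules_a, modules_b, background,
                         min_overlap=3, alpha=0.05):
    """Test every module pair whose overlap has at least min_overlap genes."""
    tested = []
    for name_a, genes_a in modules_a.items():
        for name_b, genes_b in modules_b.items():
            shared = genes_a & genes_b
            if len(shared) >= min_overlap:
                p = hypergeom_upper_tail(len(shared), background,
                                         len(genes_a), len(genes_b))
                tested.append((name_a, name_b, shared, p))
    corrected = bonferroni([t[3] for t in tested])
    return [(a, b, shared, q)
            for (a, b, shared, _), q in zip(tested, corrected) if q < alpha]
```

Exact integer binomial coefficients are used so that the tail probability stays accurate for small overlaps in large backgrounds.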
In the absence of a gold standard for evaluating whether the identified interplay modules were related to T2D, we used the public database PubMed as the reference [22, 24]. The 24 interplay modules contained 205 genes in total. Automated text mining of the literature was used to search for relationships between the interplay module genes and T2D, using query terms that combined each gene with type 2 diabetes/insulin and required co-occurrence in either the title or the abstract of a publication. The results showed that 52 genes were related to T2D or insulin in 745 publications (see Additional file 2). These genes were involved in almost all of the interplay modules; only 4 interplay modules had no publication support for their member genes (see Additional file 3). Text mining of PubMed may produce false positive relations, because the co-occurrence of two terms does not mean that the gene is indeed related to T2D or insulin, and the co-occurrence may also reflect a negative relation. Because both positive and negative relations occur in titles and abstracts, this deviation was considered negligible, and the results were taken to reflect the relationship between the genes and T2D.
These interplay modules might contribute to the epigenetic disorders of T2D through the combined effects of DNA methylation and chromatin modifications. T2D-specific chromatin modification patterns might lead to aberrant DNA methylation levels and in turn affect the expression levels of the corresponding genes, or might directly affect the binding of DNA and histones to regulate gene expression. On the other hand, aberrant DNA methylation could directly affect expression without the cooperation of chromatin modifications. In the following analysis, the effects of DNA methylation and chromatin modifications on gene expression were investigated.
Analysis of the effects of DNA methylation and chromatin modifications on gene expression
Next, another interplay module was investigated. Module 12 in the TMSN giant component contained six genes, one of which was differentially methylated but showed no change in expression level. Module 22 in the TCSN giant component contained five genes, including the differentially methylated gene from Module 12 of TMSN. The overlap of the two modules was four genes, which was statistically significant (p value < 0.001), and was considered an interplay module of DNA methylation and chromatin modifications (Figure 7B, see Additional file 5). CHMP1A was the differentially methylated gene in this interplay module, and its DNA methylation pattern was negatively correlated with those of all its connected genes. This DNA methylation pattern, distinct from that of the other genes, indicates that the aberrant DNA methylation of CHMP1A might result in the disorder of epigenetic modules in T2D. ATP6V1B2, a gene connected to CHMP1A, has been reported as a potential factor contributing to the development of T2D. Another connected gene, ATP6V1H, which was not included in the interplay module, showed a decreasing trend in expression level in T2D, and this change in expression might result from aberrant chromatin modifications. In summary, a set of interplay modules of DNA methylation and chromatin modifications was detected in this study, and these modules might play important roles in the development of T2D.
In this study, based on network theory, T2D-related DNA methylation and chromatin modification networks were constructed by integrating epigenetic data with the protein interactome, and the two networks showed specific topological features. T2D-related DNA methylation and chromatin modification modules were detected, and the statistically significant interplay modules of the two epigenetic modifications were identified as epigenetic markers for T2D. The interplay modules might lead to epigenetic disorders through aberrant DNA methylation and chromatin modifications, and thereby affect the development of T2D. This helps in understanding the etiological mechanism caused by epigenetic disorders and may inform the treatment of T2D.
Here, the DNA methylation data were measured with the Human Methylation 27K BeadChip using bisulfite conversion technology. Genome-wide DNA methylation chips based on bisulfite conversion provide accurate quantification at single-base resolution. More than 27000 CpG sites can be interrogated by the Human Methylation 27K BeadChip, but only 14495 genes are covered in their promoter regions. In this study, the chromatin modifications could be mapped to the TSS proximal regions of 23679 human Entrez genes, of which only 14356 genes had both DNA methylation and chromatin modification data; approximately 9000 genes lacked DNA methylation data. Because the T2D-related DNA methylation data for human islet tissue that we obtained from GEO were measured only with the Human Methylation 27K BeadChip, the coverage of DNA methylation is a limitation of this study.
We extracted the T2D-related differentially methylated subnetwork from WMPN; it represents the physical and pathway interactions between the T2D-related differentially methylated genes and their connected genes. This subnetwork showed T2D-specific DNA methylation patterns between genes. In the analysis of network topology, the topological features of TMSN were all significantly lower than those of the global random null model. One possible reason is that most T2D-related differentially methylated genes had low degrees, so the TMSN extracted from these genes was smaller than the permuted subnetworks. Considering that T2D is a non-fatal chronic metabolic disorder, TMSN might reflect this biological characteristic. The genes contributing to the occurrence and development of T2D might be a set of nonessential genes, so their dysregulation leads to a non-fatal disorder, T2D. Similarly, the non-essentiality of these genes determines their functional and topological peripherality in the biological network. To test this hypothesis, housekeeping genes and cancer genes were compared with the T2D-related differentially methylated genes. The average degrees of the three gene sets in WMPN were calculated. The average degrees of housekeeping genes and cancer genes were significantly higher than those of the WMPN gene set and the corresponding random gene sets, while the average degree of the T2D-related differentially methylated genes was lower than those of the WMPN gene set and the corresponding random gene set. Therefore, the TMSN constructed from the T2D-related differentially methylated genes showed T2D-specific network characteristics.
Current studies show that biological events are often caused by the combination of DNA methylation and chromatin modifications. The epigenetic disorders of T2D were therefore considered to be caused by both DNA methylation and chromatin modifications. In this study, 24 interplay modules of DNA methylation and chromatin modifications were identified by the cumulative hypergeometric test (Bonferroni corrected p value < 0.05), and were considered as modules of T2D that are epigenetically disordered through the interplay between DNA methylation and chromatin modifications. When combined with expression levels in human T2D islet tissue, the analysis found aberrantly expressed genes caused by abnormal DNA methylation and chromatin modifications. DNA methylation was inversely correlated with the expression of the corresponding gene in the linear regression model, but the Spearman correlation coefficient was only -0.3969, which is weak. Only some of the genes showed a negative correlation between DNA methylation and expression. This could be partly explained by the fact that the probes of the Human Methylation 27K BeadChip do not cover all promoters genome-wide. Each promoter is covered by 1.9 CpG sites on average, but a gene might have several promoters, whereas the probes on expression chips are usually positioned in the exons common to all transcripts of the corresponding gene. Therefore, the promoter interrogated by the CpG sites is not necessarily the promoter of the measured transcript, which complicates the analysis of the correlation between DNA methylation and expression.
T2D is one of the most common chronic metabolic diseases and is related to genetic, autoimmune and environmental factors. Epigenetics could provide the interface between environmental factors and the pathological mechanism of T2D. In this study, biological events were considered to be affected by both DNA methylation and chromatin modifications, as examined through TMSN and WCPN. The analysis of expression levels in human T2D islet tissue showed that changes in gene expression were caused by aberrant DNA methylation and by the intensities of chromatin modifications. Thus, T2D epigenetic networks provide a new way of understanding the pathogenic mechanism of T2D caused by epigenetic disorders.
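A Spearman correlation between per-gene methylation and expression values can be sketched as the Pearson correlation of their ranks. The helper names and the example values are hypothetical, and tie handling by average ranks is an assumption about the usual definition rather than a statement of the exact procedure used in this study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical methylation levels and expression values for four genes.
methylation = [0.10, 0.40, 0.80, 0.90]
expression = [9.0, 7.0, 3.0, 1.0]
rho = spearman(methylation, expression)
```

A value near -1 indicates a strong inverse relationship, while a value such as the reported -0.3969 indicates only a weak one.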
Protein-protein interaction and pathway data. The protein-protein interaction (PPI) data were obtained from six protein-protein interaction databases: MINT, IntAct, DIP, BioGRID, HPRD and BIND. These data comprised physical and genetic interactions supported by experimental evidence or scientific literature. The pathway data were derived from three manually curated pathway databases: HumanCyc, Reactome and the NCI-Nature Pathway Interaction Database. PPI data usually come in two formats, tab-delimited ASCII and PSI-MI XML (Proteomics Standards Initiative Molecular Interaction markup language). Simple binary protein interactions can be extracted directly from the former; the latter must first be converted to a tab-delimited format before the binary interactions are obtained. Pathway data are usually stored in BioPAX format, which represents a series of biochemical processes. For pathway data in this format, the biochemical reactions were converted into binary interactions in Simple Interaction Format (SIF). After integrating the PPI and pathway data, redundant interactions and self-interactions were removed from the network. The final background network consisted of 15,876 genes and 437,408 interactions.
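The integration step described above (merging binary interactions, dropping duplicates and self-interactions) can be sketched as follows. The two-column tab-delimited layout, the assumption that gene identifiers are already harmonized, the treatment of interactions as undirected, and all function names are illustrative assumptions rather than the actual file formats of the listed databases.

```python
def parse_tab_delimited(lines):
    """Yield (gene_a, gene_b) pairs from lines whose first two
    tab-separated columns hold the interactors (assumed layout)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 2:
            yield fields[0], fields[1]


def build_background_network(interaction_sources):
    """Merge several iterables of binary interactions into one
    undirected network without duplicates or self-interactions."""
    edges = set()
    for source in interaction_sources:
        for gene_a, gene_b in source:
            if gene_a == gene_b:
                continue  # drop self-interactions
            # Sorting makes (a, b) and (b, a) the same undirected edge.
            edges.add(tuple(sorted((gene_a, gene_b))))
    genes = {gene for edge in edges for gene in edge}
    return genes, edges


# Toy input with hypothetical gene identifiers.
ppi_lines = ["#a\tb", "1\t2", "2\t1", "3\t3"]
pathway_pairs = [("2", "3"), ("1", "2")]
genes, edges = build_background_network(
    [parse_tab_delimited(ppi_lines), pathway_pairs]
)
```

Storing each edge as a sorted tuple in a set removes redundant interactions that appear in more than one source or in both orientations.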
T2D DNA methylation data. T2D-related DNA methylation data were derived from Michael Volkmar et al. (GEO, accession number: GSE21232), and contained 11 normal samples and 5 T2D samples of human islet tissue. The DNA methylation levels of these islet tissue samples were measured with the Illumina HumanMethylation27 BeadChip. This assay interrogates genes genome-wide at single-nucleotide resolution, covering 14495 genes. The HumanMethylation27 assay contains 27578 probes, and many genes are covered by two or more probes. For genes with multiple probes, the average DNA methylation value was taken as the DNA methylation level of the gene. Finally, the DNA methylation levels of 14495 genes with Entrez gene IDs for 16 human islet tissue samples were obtained for the subsequent analysis.
T2D expression data. T2D-related expression data were derived from Jalal Taneera et al. (GEO, accession number: GSE38642), and contained 63 islet tissue samples, including nine T2D samples and 54 normal samples. The expression levels of the islet tissues were measured with the Affymetrix GeneChip Human Gene 1.0 ST whole transcript array, which contains well-annotated genes based on the RefSeq and Ensembl databases. The array contains 18808 probes, and many genes correspond to several probes. As with the DNA methylation data, for genes with multiple probes the average expression value was taken as the expression level of the gene. Finally, the expression levels of 14534 genes with Entrez gene IDs for 63 human islet tissue samples were obtained for the subsequent analysis.
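Collapsing probe-level values to one value per gene by averaging can be sketched as below. The data layout (dictionaries keyed by probe ID, one list of sample values per probe) and the identifiers are hypothetical; probes without a gene annotation are skipped, which is an assumption.

```python
from collections import defaultdict


def average_probes_per_gene(probe_values, probe_to_gene):
    """Average the per-sample values of all probes mapped to the same gene.

    probe_values: {probe_id: [value for each sample]}
    probe_to_gene: {probe_id: gene_id}
    Returns {gene_id: [mean value for each sample]}.
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # assumed: unannotated probes are ignored
        grouped[gene].append(values)
    gene_levels = {}
    for gene, rows in grouped.items():
        n = len(rows)
        # zip(*rows) walks the samples column by column.
        gene_levels[gene] = [sum(column) / n for column in zip(*rows)]
    return gene_levels


# Hypothetical probes and two samples.
probe_values = {
    "probe_1": [0.25, 0.50],
    "probe_2": [0.75, 1.00],
    "probe_3": [0.50, 0.50],
    "probe_4": [0.10, 0.10],  # no gene annotation
}
probe_to_gene = {"probe_1": "G1", "probe_2": "G1", "probe_3": "G2"}
gene_levels = average_probes_per_gene(probe_values, probe_to_gene)
```

The same function applies unchanged to expression probes, since both data types are summarized per gene in the same way.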
T2D chromatin modifications data. The chromatin modification data comprised H3 lysine methylation modifications (K4me1, K4me3, K79me2), DNaseI hypersensitive sites and CCCTC-binding factor (CTCF) data for human islet tissue of T2D patients, derived from Michael L. Stitzel et al. (GEO accession number: GSE23784). The histone modifications and CTCF were profiled by ChIP-seq and the DNaseI hypersensitive sites by DNase-seq, both on the Illumina GAII platform. The intensity of each chromatin modification in the TSS proximal region of each gene was calculated on a genome-wide scale from the peaks identified by MACS: if the midpoint of a peak was located within a TSS proximal region, its value was added to the total for the TSS proximal region of the corresponding gene. Finally, the chromatin modification intensities were mapped to 23,680 genes with TSS proximal regions.
TSS proximal region data. The TSS proximal region of a gene was defined as the region within ±2 kb of its transcription start site (from -2000 bp to +2000 bp around the TSS). The TSS information for the human genome was downloaded from UCSC for assembly NCBI 36/hg18 (http://genome.ucsc.edu/cgi-bin/hgTables?command=start).
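The accumulation of peak values over TSS proximal regions described above might be sketched as follows. This is a simplified illustration, not the authors' pipeline: the peak tuples, the gene names and the choice of which peak value to accumulate are assumptions, and the brute-force loop would need an interval index for genome-scale data.

```python
from collections import defaultdict

WINDOW = 2000  # half-width of the TSS proximal region in bp (+/-2 kb)


def tss_proximal_intensity(peaks, tss_by_gene, window=WINDOW):
    """Sum the values of peaks whose midpoint lies within +/-window bp of a gene's TSS."""
    intensity = defaultdict(float)
    for chrom, start, end, value in peaks:
        midpoint = (start + end) // 2
        for gene, (gene_chrom, tss) in tss_by_gene.items():
            if gene_chrom == chrom and abs(midpoint - tss) <= window:
                intensity[gene] += value
    return dict(intensity)


# Toy peaks as (chromosome, start, end, value); all values are illustrative.
peaks = [
    ("chr1", 900, 1100, 5.0),   # midpoint 1000, inside geneA's window
    ("chr1", 2500, 2700, 3.0),  # midpoint 2600, inside geneA's window
    ("chr1", 9000, 9200, 7.0),  # too far from any TSS
    ("chr2", 900, 1100, 4.0),   # 4000 bp from geneB's TSS, outside the window
]
tss_by_gene = {"geneA": ("chr1", 1000), "geneB": ("chr2", 5000)}

result = tss_proximal_intensity(peaks, tss_by_gene)
print(result)
```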
Identification of differentially expressed genes and differentially methylated genes
Based on adjusted t tests, the Significance Analysis of Microarrays (SAM) method identifies the set of genes with statistically significant changes between two states (e.g. disease vs. control). For the expression data of human islet tissue, an FDR-adjusted P value of < 0.05 was used to identify differentially expressed genes between T2D and control samples with SAM. For the DNA methylation data of human islet tissue, the same criterion was applied: the T2D and control samples were compared, and genes with an FDR value lower than 0.05 were identified by SAM as differentially methylated. For genes with multiple probes, the average expression/DNA methylation value represented the expression/DNA methylation level of the corresponding gene. SAM was run with the samr package (R version 2.15.2, Bioconductor version 2.3).
Construction of WMPN, weighted methylation PPI network and TMSN, T2D-related DNA methylation subnetwork
Construction of weighted methylation PPI network (WMPN). Based on the integrated protein-protein and functional interaction data, the similarity of the DNA methylation values of each pair of interacting genes was measured by the Pearson correlation coefficient. Gene pairs with a Pearson correlation > 0.8 or < -0.8 and a P value < 0.05 were considered significant interactions, and those without statistical significance were pruned from the network. After this edge filtering, the weighted methylation PPI network (WMPN) was obtained. Pearson correlation coefficients were calculated in R (http://www.r-project.org).
Construction of T2D-related DNA methylation subnetwork (TMSN). The differentially methylated genes were mapped into WMPN as seed genes. The seed genes and their first neighbours formed the node set, and the interactions among these genes formed the edge set. Together, the node set and the edge set made up the T2D-related DNA methylation subnetwork (TMSN).
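The edge-filtering rule can be sketched in plain Python as below. The study performed this step in R; here the two-sided P value is obtained by numerically integrating the null density of the correlation coefficient, which is an implementation choice of this sketch. Using the correlation itself as the edge weight, the function names and the toy profiles are likewise assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treated as no signal
    return max(-1.0, min(1.0, sxy / math.sqrt(sxx * syy)))


def pearson_p(r, n, steps=20000):
    """Two-sided P value for r from n >= 4 samples.

    Under the null hypothesis the density of r is proportional to
    (1 - r^2)^((n - 4) / 2); the tail from |r| to 1 is integrated
    with the midpoint rule.
    """
    a = min(abs(r), 1.0)
    if a >= 1.0:
        return 0.0
    k = (n - 4) / 2
    norm = math.gamma((n - 1) / 2) / (math.sqrt(math.pi) * math.gamma((n - 2) / 2))
    h = (1.0 - a) / steps
    tail = sum(max(0.0, 1.0 - (a + (i + 0.5) * h) ** 2) ** k for i in range(steps)) * h
    return min(1.0, 2.0 * norm * tail)


def weight_edges(edges, profiles, r_cut=0.8, p_cut=0.05):
    """Keep edges with |r| > r_cut and P < p_cut; the edge weight is r (an assumption)."""
    weighted = {}
    for a, b in edges:
        if a not in profiles or b not in profiles:
            continue
        r = pearson_r(profiles[a], profiles[b])
        if abs(r) > r_cut and pearson_p(r, len(profiles[a])) < p_cut:
            weighted[(a, b)] = r
    return weighted


# Toy methylation profiles over six samples (illustrative values only).
profiles = {
    "g1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "g2": [0.2, 0.4, 0.6, 0.8, 1.0, 1.2],   # strongly correlated with g1
    "g3": [0.6, 0.1, 0.5, 0.2, 0.4, 0.3],   # weakly correlated with g1
    "g4": [0.9, 0.8, 0.65, 0.5, 0.3, 0.2],  # strongly anti-correlated with g1
}
wmpn = weight_edges([("g1", "g2"), ("g1", "g3"), ("g1", "g4")], profiles)
print(sorted(wmpn))
```

The P value condition matters most when few values enter each correlation: with only four values, for example, a correlation of 0.8 has a two-sided P value of 0.2 and would be pruned.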
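A minimal Python sketch of this seed-and-first-neighbour extraction is given below. The function name `seed_subnetwork` and the toy network are hypothetical; seed genes that are absent from the network are simply ignored, which is an assumption of the sketch.

```python
def seed_subnetwork(edges, seeds):
    """Return (nodes, edges) of the subnetwork spanned by seeds and their first neighbours."""
    seeds = set(seeds)
    network_nodes = {node for edge in edges for node in edge}
    nodes = seeds & network_nodes  # seeds not present in the network are dropped
    for a, b in edges:
        if a in seeds:
            nodes.add(b)
        if b in seeds:
            nodes.add(a)
    # Edge set: every interaction whose two endpoints are both in the node set.
    sub_edges = {(a, b) for a, b in edges if a in nodes and b in nodes}
    return nodes, sub_edges


# Toy weighted network reduced to its edge list; "A" is the only mapped seed.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("A", "C")]
nodes, sub_edges = seed_subnetwork(edges, {"A", "X"})
print(sorted(nodes), sorted(sub_edges))
```

Note that the edge B-C is kept even though neither endpoint is a seed, because both are first neighbours of the seed A.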
Construction of WCPN, weighted chromatin modification PPI network and TCSN, T2D-related chromatin modification subnetwork
Construction of weighted chromatin modification PPI network (WCPN). As in the construction of WMPN, the similarity of the chromatin modification values (H3K4me1, H3K4me3, H3K79me2, DNaseI hypersensitive sites and CTCF) of each pair of connected genes was measured by the Pearson correlation coefficient. Gene pairs with a Pearson correlation > 0.8 or < -0.8 and a P value < 0.05 were retained as significant interactions. Thus, the weighted chromatin modification PPI network (WCPN) was constructed. Pearson correlation coefficients were calculated in R (http://www.r-project.org).
Construction of T2D-related chromatin modification subnetwork (TCSN). The chromatin modification patterns of the T2D-related genes with aberrant DNA methylation were analyzed. The TMSN gene set was mapped into WCPN, and the chromatin modification pattern connections between these genes were extracted from WCPN as the edge set. The TMSN gene set and these connections made up the T2D-related chromatin modification subnetwork (TCSN).
The analysis of network topological features
Network biology provides a quantitative description of the characteristics of biological networks. In this study, we examined the topological features of the networks, including degree (connectivity), degree distribution, clustering coefficient and betweenness. The topological features were calculated with the RBGL package (R version 2.15.2, Bioconductor version 2.3) and the Cytoscape plugin NetworkAnalyzer (version 2.8, Cytoscape 2.8). Network visualization was performed using Cytoscape (http://www.cytoscape.org/).
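For readers who want to see what these measures compute, the sketch below reimplements degree, clustering coefficient and betweenness for an unweighted, undirected network in plain Python. It is not the RBGL or NetworkAnalyzer code used in the study; betweenness follows the standard breadth-first accumulation scheme of Brandes, and the toy network is illustrative.

```python
from collections import defaultdict, deque
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def degree(adj):
    return {node: len(nbrs) for node, nbrs in adj.items()}


def clustering(adj):
    """Fraction of neighbour pairs of each node that are themselves connected."""
    cc = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            cc[node] = 0.0
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        cc[node] = 2 * links / (k * (k - 1))
    return cc


def betweenness(adj):
    """Unnormalised shortest-path betweenness of each node."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0.0)  # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1.0, 0
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each undirected pair was counted twice


# Toy network: a triangle A-B-C with a pendant node D attached to C.
adj = adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
deg, cc, bc = degree(adj), clustering(adj), betweenness(adj)
print(deg, cc, bc)
```

In the toy network, node C lies on the only shortest paths from D to A and from D to B, which is why it is the only node with non-zero betweenness.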
Detection of network modules
Using the Cytoscape plugin MCODE, modules of T2D-related DNA methylation and chromatin modification patterns were detected from the giant components of TMSN and TCSN, respectively. The MCODE algorithm detects regions of a large-scale biological network in which the nodes are densely connected with each other. It weights each node by the density of its neighbouring nodes and then extends the connectivity from highly weighted nodes to identify the dense regions.
Identification of interplay modules between chromatin modification and DNA methylation
The significance of the overlap between a DNA methylation module and a chromatin modification module was evaluated with the cumulative hypergeometric test:

$$P(X \ge K) = \sum_{i=K}^{\min(M,H)} \frac{\binom{M}{i}\binom{N-M}{H-i}}{\binom{N}{H}}$$

Here N is the total number of genes in TMSN and TCSN, M and H are the gene counts of the DNA methylation module and the chromatin modification module, respectively, and K is the number of genes shared by the two modules (the size of the sub-module). The cumulative hypergeometric probability is the probability that the hypergeometric random variable X is greater than or equal to the specified lower limit K. This p value measures the enrichment significance of the sub-module for the two module groups. Sub-modules with a Bonferroni-corrected p value < 0.05 were identified as the interplay modules of DNA methylation and chromatin modifications for T2D.
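The test can be written in a few lines of Python using exact integer binomial coefficients. The symbols N, M, H and K follow the definitions in the text; the function names, the example numbers and the number of tests used for the Bonferroni correction are illustrative assumptions.

```python
from math import comb


def hypergeom_upper_tail(N, M, H, K):
    """P(X >= K) when H genes are drawn from N genes of which M are marked."""
    total = comb(N, H)
    hits = sum(comb(M, i) * comb(N - M, H - i) for i in range(K, min(M, H) + 1))
    return hits / total


def bonferroni(p, n_tests):
    """Bonferroni-corrected p value, capped at 1."""
    return min(1.0, p * n_tests)


# Toy example: 10 genes in total, modules of 4 and 3 genes sharing 2 genes.
p = hypergeom_upper_tail(N=10, M=4, H=3, K=2)
p_adj = bonferroni(p, n_tests=2)  # n_tests = number of module pairs tested (assumed here)
print(p, p_adj)
```

For the toy numbers, the tail sums 36 + 4 favourable draws out of 120, giving p = 1/3. `math.comb` requires Python 3.8 or later.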
Global network permutation. To assess the global connectivity of TMSN within the overall network, random seed gene sets of the same size as the observed seed gene set were sampled from WMPN 1000 times, and 1000 random subnetworks were constructed from the random seed genes and their connected genes in WMPN. To assess the reliability of TCSN, 1000 global random subnetworks were constructed in the same way as TCSN, by sampling random gene sets of the same size as the TMSN node set from WCPN. The observed subnetwork and the 1000 random networks were compared on network topological features including degree, clustering coefficient and betweenness, and an empirical p value was used to measure the statistical significance of the difference between the observed subnetwork and the global null hypothesis.
Local network permutation. To evaluate the statistical significance of the modularity of TMSN and TCSN, a local rewiring algorithm was applied that kept the numbers of nodes and edges of the network unchanged but rewired the connections. The degree-preserving random rewiring algorithm was implemented as follows:
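The global permutation scheme might look like the following Python sketch. Several details are assumptions, because the text does not fix them: mean degree is used as the single summary statistic, the empirical p value is taken as the fraction of random subnetworks whose statistic is at least as large as the observed one, and all function names and the toy network are hypothetical.

```python
import random


def first_neighbour_subnetwork(edges, seeds):
    """Nodes = seeds plus their first neighbours; edges = interactions among those nodes."""
    seeds = set(seeds)
    nodes = {n for e in edges for n in e if n in seeds}
    for a, b in edges:
        if a in seeds:
            nodes.add(b)
        if b in seeds:
            nodes.add(a)
    return nodes, {(a, b) for a, b in edges if a in nodes and b in nodes}


def mean_degree(nodes, sub_edges):
    return 2 * len(sub_edges) / len(nodes) if nodes else 0.0


def empirical_p(observed, null_values):
    """Fraction of null values at least as large as the observed value (one-sided)."""
    return sum(v >= observed for v in null_values) / len(null_values)


def global_permutation(edges, seeds, n_perm=1000, rng=None):
    rng = rng or random.Random(0)
    all_nodes = sorted({n for e in edges for n in e})
    observed = mean_degree(*first_neighbour_subnetwork(edges, seeds))
    null = []
    for _ in range(n_perm):
        random_seeds = rng.sample(all_nodes, len(seeds))  # same size as the seed set
        null.append(mean_degree(*first_neighbour_subnetwork(edges, random_seeds)))
    return observed, empirical_p(observed, null)


# Toy network: hub "H" with four neighbours and a short chain hanging off "d".
edges = [("H", "a"), ("H", "b"), ("H", "c"), ("H", "d"), ("d", "e"), ("e", "f")]
observed, p_value = global_permutation(edges, {"H"}, n_perm=200)
print(observed, p_value)
```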
Two edges E(a - b) and E(c - d) were randomly selected, where a, b, c, d ∈ G(V), and G(V) and G(E) denote the node set and the edge set, respectively. If E'(a - d) ∉ G(E) and E'(b - c) ∉ G(E), the two edges E(a - b) and E(c - d) were removed and the new edges E'(a - d) and E'(b - c) were added, viz. E(a - b, c - d) → E'(a - d, b - c); otherwise, the network was left unchanged in this step. In this way, 50% of the edges were attacked and rearranged to create a local random subnetwork, and the procedure was repeated 1000 times. For each random subnetwork, the network topological features were calculated and compared with those of the observed subnetwork. Finally, the deviations of the observed subnetwork from the local null hypothesis were measured using the empirical p value.
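A minimal Python sketch of a degree-preserving edge swap of this kind is shown below. The number of swap attempts is set to a fraction of the edge count, which is only one possible reading of the 50% criterion; the function names and the toy network are hypothetical, and the network must contain at least two edges.

```python
import random


def degree_preserving_rewire(edges, fraction=0.5, rng=None):
    """Attempt fraction * |E| swaps (a-b, c-d) -> (a-d, b-c), keeping every node's degree."""
    rng = rng or random.Random()
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    for _ in range(int(fraction * len(edge_list))):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # shared node: the swap would create a self-loop or a duplicate
        if frozenset((a, d)) in present or frozenset((b, c)) in present:
            continue  # a new edge already exists: leave the network unchanged
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((b, c))}
        edge_list[i], edge_list[j] = (a, d), (b, c)
    return edge_list


def degrees(edges):
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


# Toy network (illustrative only); a fixed seed makes the run repeatable.
original = [("A", "B"), ("C", "D"), ("E", "F"), ("A", "C"), ("B", "E"), ("D", "F")]
rewired = degree_preserving_rewire(original, fraction=0.5, rng=random.Random(1))
print(rewired)
```

Each accepted swap removes one edge from each of the four nodes involved and gives one back, so the degree sequence and the edge count of the random subnetwork always match those of the observed subnetwork.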
The authors thank the referees for their important suggestions and are grateful for the general support provided by the College of Bioinformatics Science and Technology, Harbin Medical University, and the School of Life Science and Technology, State Key Laboratory of Urban Water Resources and Environment, Harbin Institute of Technology. This study was supported financially by grants from the National Natural Science Foundation of China (31171383, 31371478, 31371334, 81172742).
The publication costs for this article were funded by the corresponding author.
This article has been published as part of BMC Systems Biology Volume 8 Supplement 1, 2014: Selected articles from the Twelfth Asia Pacific Bioinformatics Conference (APBC 2014): Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/8/S1.
- Igoillo-Esteve M, Marselli L, Cunha DA, Ladriere L, Ortis F, Grieco FA, Dotta F, Weir GC, Marchetti P, Eizirik DL, Cnop M: Palmitate induces a pro-inflammatory response in human pancreatic islets that mimics CCL2 expression by beta cells in type 2 diabetes. Diabetologia. 2010, 53: 1395-1405. 10.1007/s00125-010-1707-y.
- Diagnosis and classification of diabetes mellitus. Diabetes care. 2013, 36 Suppl 1: S67-74.
- Davidson MB: Correction to the 2010 report on the diagnosis and classification of diabetes. Diabetes care. 2010, 33: e57. 10.2337/dc09-2368.
- Weir GC, Bonner-Weir S: Islet beta cell mass in diabetes and how it relates to function, birth, and death. Annals of the New York Academy of Sciences. 2013, 1281: 92-105. 10.1111/nyas.12031.
- Tabak AG, Jokela M, Akbaraly TN, Brunner EJ, Kivimaki M, Witte DR: Trajectories of glycaemia, insulin sensitivity, and insulin secretion before diagnosis of type 2 diabetes: an analysis from the Whitehall II study. Lancet. 2009, 373: 2215-2221. 10.1016/S0140-6736(09)60619-X.
- Dupuis J, Langenberg C, Prokopenko I, Saxena R, Soranzo N, Jackson AU, Wheeler E, Glazer NL, Bouatia-Naji N, Gloyn AL, et al.: New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nature genetics. 2010, 42: 105-116. 10.1038/ng.520.
- Zeggini E, Scott LJ, Saxena R, Voight BF, Marchini JL, Hu T, de Bakker PI, Abecasis GR, Almgren P, Andersen G, et al.: Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nature genetics. 2008, 40: 638-645. 10.1038/ng.120.
- Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU, et al.: A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007, 316: 1341-1345. 10.1126/science.1142382.
- Steinthorsdottir V, Thorleifsson G, Reynisdottir I, Benediktsson R, Jonsdottir T, Walters GB, Styrkarsdottir U, Gretarsdottir S, Emilsson V, Ghosh S, et al.: A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. Nature genetics. 2007, 39: 770-775. 10.1038/ng2043.
- Martin-Gronert MS, Ozanne SE: Metabolic programming of insulin action and secretion. Diabetes, obesity & metabolism. 2012, 14 Suppl 3: 29-39.
- Taneera J, Lang S, Sharma A, Fadista J, Zhou Y, Ahlqvist E, Jonsson A, Lyssenko V, Vikman P, Hansson O, et al.: A systems genetics approach identifies genes and pathways for type 2 diabetes in human islets. Cell metabolism. 2012, 16: 122-134. 10.1016/j.cmet.2012.06.006.
- Lim JE, Hong KW, Jin HS, Kim YS, Park HK, Oh B: Type 2 diabetes genetic association database manually curated for the study design and odds ratio. BMC medical informatics and decision making. 2010, 10: 76. 10.1186/1472-6947-10-76.
- Gilbert ER, Liu D: Epigenetics: the missing link to understanding beta-cell dysfunction in the pathogenesis of type 2 diabetes. Epigenetics : official journal of the DNA Methylation Society. 2012, 7: 841-852. 10.4161/epi.21238.
- Pollin TI: Epigenetics and diabetes risk: not just for imprinting anymore?. Diabetes. 2011, 60: 1859-1860. 10.2337/db11-0515.
- Bramswig NC, Kaestner KH: Epigenetics and diabetes treatment: an unrealized promise?. Trends in endocrinology and metabolism: TEM. 2012, 23: 286-291. 10.1016/j.tem.2012.02.002.
- Park JH, Stoffers DA, Nicholls RD, Simmons RA: Development of type 2 diabetes following intrauterine growth retardation in rats is associated with progressive epigenetic silencing of Pdx1. The Journal of clinical investigation. 2008, 118: 2316-2324.
- Ng SF, Lin RC, Laybutt DR, Barres R, Owens JA, Morris MJ: Chronic high-fat diet in fathers programs beta-cell dysfunction in female rat offspring. Nature. 2010, 467: 963-966. 10.1038/nature09491.
- Bhandare R, Schug J, Le Lay J, Fox A, Smirnova O, Liu C, Naji A, Kaestner KH: Genome-wide analysis of histone modifications in human pancreatic islets. Genome research. 2010, 20: 428-433. 10.1101/gr.102038.109.
- Barabasi AL, Gulbahce N, Loscalzo J: Network medicine: a network-based approach to human disease. Nature reviews Genetics. 2011, 12: 56-68. 10.1038/nrg2918.
- Wu X, Jiang R, Zhang MQ, Li S: Network-based global inference of human disease genes. Molecular systems biology. 2008, 4: 189.
- Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL: The human disease network. Proceedings of the National Academy of Sciences of the United States of America. 2007, 104: 8685-8690. 10.1073/pnas.0701361104.
- Liu H, Su J, Li J, Lv J, Li B, Qiao H, Zhang Y: Prioritizing cancer-related genes with aberrant methylation based on a weighted protein-protein interaction network. BMC systems biology. 2011, 5: 158. 10.1186/1752-0509-5-158.
- Chang CW, Cheng WC, Chen CR, Shu WY, Tsai ML, Huang CL, Hsu IC: Identification of human housekeeping genes and tissue-selective genes by microarray meta-analysis. PloS one. 2011, 6: e22859. 10.1371/journal.pone.0022859.
- van Steensel B, Braunschweig U, Filion GJ, Chen M, van Bemmel JG, Ideker T: Bayesian network analysis of targeting interactions in chromatin. Genome research. 2010, 20: 190-200. 10.1101/gr.098822.109.
- Ying Y, Li L, Cao W, Yan D, Zeng Q, Kong X, Lu L, Yan M, Xu X, Qu J, et al.: The microtubule associated protein syntabulin is required for glucose-stimulated and cAMP-potentiated insulin secretion. FEBS letters. 2012, 586: 3674-3680. 10.1016/j.febslet.2012.08.025.
- Jayaguru P, Mohr S: Nuclear GAPDH: changing the fate of Muller cells in diabetes. Journal of ocular biology, diseases, and informatics. 2011, 4: 34-41. 10.1007/s12177-012-9085-y.
- Balasubramanian D, Akhtar-Zaidi B, Song L, Bartels CF, Veigl M, Beard L, Myeroff L, Guda K, Lutterbaugh J, Willis J, et al.: H3K4me3 inversely correlates with DNA methylation at a large class of non-CpG-island-containing start sites. Genome medicine. 2012, 4: 47. 10.1186/gm346.
- Lu H, Yang Y, Allister EM, Wijesekara N, Wheeler MB: The identification of potential factors associated with the development of type 2 diabetes: a quantitative proteomics approach. Molecular & cellular proteomics : MCP. 2008, 7: 1434-1451. 10.1074/mcp.M700478-MCP200.
- Volkmar M, Dedeurwaerder S, Cunha DA, Ndlovu MN, Defrance M, Deplus R, Calonne E, Volkmar U, Igoillo-Esteve M, Naamane N, et al.: DNA methylation profiling identifies epigenetic dysregulation in pancreatic islets from type 2 diabetic patients. The EMBO journal. 2012, 31: 1405-1426. 10.1038/emboj.2011.503.
- Licata L, Briganti L, Peluso D, Perfetto L, Iannuccelli M, Galeota E, Sacco F, Palma A, Nardozza AP, Santonico E, et al.: MINT, the molecular interaction database: 2012 update. Nucleic acids research. 2012, 40: D857-861. 10.1093/nar/gkr930.
- Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley R, et al.: IntAct--open source resource for molecular interaction data. Nucleic acids research. 2007, 35: D561-565. 10.1093/nar/gkl958.
- Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update. Nucleic acids research. 2004, 32: D449-451. 10.1093/nar/gkh086.
- Chatr-Aryamontri A, Breitkreutz BJ, Heinicke S, Boucher L, Winter A, Stark C, Nixon J, Ramage L, Kolas N, O'Donnell L, et al.: The BioGRID interaction database: 2013 update. Nucleic acids research. 2013, 41: D816-823. 10.1093/nar/gks1158.
- Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, et al.: Human Protein Reference Database--2009 update. Nucleic acids research. 2009, 37: D767-772. 10.1093/nar/gkn892.
- Alfarano C, Andrade CE, Anthony K, Bahroos N, Bajec M, Bantoft K, Betel D, Bobechko B, Boutilier K, Burgess E, et al.: The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic acids research. 2005, 33: D418-424.
- Romero P, Wagg J, Green ML, Kaiser D, Krummenacker M, Karp PD: Computational prediction of human metabolic pathways from the complete human genome. Genome biology. 2005, 6: R2.
- Croft D, O'Kelly G, Wu G, Haw R, Gillespie M, Matthews L, Caudy M, Garapati P, Gopinath G, Jassal B, et al.: Reactome: a database of reactions, pathways and biological processes. Nucleic acids research. 2011, 39: D691-697. 10.1093/nar/gkq1018.
- Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, Hannay T, Buetow KH: PID: the Pathway Interaction Database. Nucleic acids research. 2009, 37: D674-679. 10.1093/nar/gkn653.
- Jimenez RC, Vizcaino JA: Proteomics data exchange and storage: the need for common standards and public repositories. Methods Mol Biol. 2013, 1007: 317-333. 10.1007/978-1-62703-392-3_14.
- Demir E, Cary MP, Paley S, Fukuda K, Lemer C, Vastrik I, Wu G, D'Eustachio P, Schaefer C, Luciano J, et al.: The BioPAX community standard for pathway data sharing. Nature biotechnology. 2010, 28: 935-942. 10.1038/nbt.1666.
- Stitzel ML, Sethupathy P, Pearson DS, Chines PS, Song L, Erdos MR, Welch R, Parker SC, Boyle AP, Scott LJ, et al.: Global epigenomic analysis of primary human pancreatic islets provides insights into type 2 diabetes susceptibility loci. Cell metabolism. 2010, 12: 443-455. 10.1016/j.cmet.2010.09.012.
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America. 2001, 98: 5116-5121. 10.1073/pnas.091062498.
- Barabasi AL, Oltvai ZN: Network biology: understanding the cell's functional organization. Nature reviews Genetics. 2004, 5: 101-113. 10.1038/nrg1272.
- Assenov Y, Ramirez F, Schelhorn SE, Lengauer T, Albrecht M: Computing topological parameters of biological networks. Bioinformatics. 2008, 24: 282-284. 10.1093/bioinformatics/btm554.
- Bader GD, Hogue CW: An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics. 2003, 4: 2. 10.1186/1471-2105-4-2.
- Maslov S, Sneppen K: Specificity and stability in topology of protein networks. Science. 2002, 296: 910-913. 10.1126/science.1065103.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.