Prioritizing cancer-related genes with aberrant methylation based on a weighted protein-protein interaction network
- Hui Liu†1,
- Jianzhong Su†1, 3,
- Junhua Li2,
- Hongbo Liu1,
- Jie Lv1,
- Boyan Li1,
- Hong Qiao2Email author and
- Yan Zhang1Email author
© Liu et al; licensee BioMed Central Ltd. 2011
Received: 25 July 2011
Accepted: 11 October 2011
Published: 11 October 2011
As an important epigenetic modification, DNA methylation plays a crucial role in the development of mammals and in the occurrence of complex diseases. Genes that interact directly or indirectly may have the same or similar functions in the biological processes in which they are involved and together contribute to the related disease phenotypes. The complicated relations between genes can be clearly represented using network theory. A protein-protein interaction (PPI) network offers a platform from which to systematically identify disease-related genes from the relations between genes with similar functions.
We constructed a weighted human PPI network (WHPN) using DNA methylation correlations based on human protein-protein interactions. WHPN represents the relationships of DNA methylation levels in gene pairs for four cancer types. A cancer-associated subnetwork (CASN) was obtained from WHPN by selecting genes associated with seed genes which were known to be methylated in the four cancers. We found that CASN had a more densely connected network community than WHPN, indicating that the genes in CASN were much closer to seed genes. We prioritized 154 potential cancer-related genes with aberrant methylation in CASN by neighborhood-weighting decision rule. A function enrichment analysis for GO and KEGG indicated that the optimized genes were mainly involved in the biological processes of regulating cell apoptosis and programmed cell death. An analysis of expression profiling data revealed that many of the optimized genes were expressed differentially in the four cancers. By examining the PubMed co-citations, we found 43 optimized genes were related with cancers and aberrant methylation, and 10 genes were validated to be methylated aberrantly in cancers. Of 154 optimized genes, 27 were as diagnostic markers and 20 as prognostic markers previously identified in literature for cancers and other complex diseases by searching PubMed manually. We found that 31 of the optimized genes were targeted as drug response markers in DrugBank.
Here we have shown that network theory combined with epigenetic characteristics provides a favorable platform from which to identify cancer-related genes. We prioritized 154 potential cancer-related genes with aberrant methylation that might contribute to the further understanding of cancers.
Cancer is a complex multi-gene disease. For a long time, gene mutation has been considered to be related to cancer. A number of oncogenes and tumor suppressor genes linked to mutations have been shown to drive the neoplastic process by increasing tumor cell numbers [1–3]. However, with progress in the understanding of cancer and the ongoing development of epigenetics, it has been reported that aberrant DNA methylation events are involved in many types of cancers . Genome-wide hypomethylation and region-specific CpG island promoter hypermethylation are a hallmark of the cancer epigenome . Thus the mechanisms that drive cancer development and progression cannot be effectively uncovered by studying only genetic factors. For a more profound understanding epigenetic characteristics must also be examined.
Abnormal DNA methylation has been found in many different tumors, including brain tumor, breast cancer, and prostate cancer. DNA hypomethylation is often considered to activate oncogenes. DNA hypermethylation of the promoter region, on the other hand, may initiate the silencing of tumor suppressor genes. It has been reported that the overexpression of FEN1 (flap endonuclease) in breast and other cancers is associated with CpG island hypomethylation of the promoter region in the tumor cells . Similarly, in a recent study by Sun et al., it was found that the promoter of TKTL1 (transketolase-like 1) had a high frequency of hypomethylation which induced the overexpression of the gene in head and neck squamous cell carcinoma (HNSCC) . BRCA1 is a well-characterized tumor suppressor gene which codes for proteins that have important roles in the regulation of the cell cycle and in the apoptosis of tumor cells. In breast and ovarian cancers, BRCA1 is inactivated by hypermethylation within the promoter region of the gene .
Thus, as the number of epigenetic studies gradually increase, the importance of DNA methylation in cancer research is being recognized. Simultaneously, a series of methods for DNA methylation detection have been developed. The earliest DNA methylation detection methods were mainly single-gene sequence-specific methods such as methylation sensitive restriction endonucleases , methylation specific PCR , and combined bisulfite restriction analysis . Subsequently, high-throughput genomic DNA methylation detection methods were developed like, for example, the large-scale microarray and sequencing technologies [12, 13]. Numerous studies of gene methylation have used traditional experimental methods to generate large amounts of methylation data; more recently, a large number of genome-wide DNA methylomes have been generated through the traditional methods being combined with the high throughput technologies.
Some cancers have been identified as different subtypes based on the methylation level of CpG islands  and cancer-related genes have been identified by their epigenetic variations. Loss et al. prioritized genes with epigenetic regulation in 45 breast cancer cell lines by their ranked methylation-expression association using logistic regression, and identified 58 genes as epigenetically regulated genes in the breast cancer cell lines . Some cancer-related genes have been identified by combining epigenetic characteristics and network theory. Network theory provides a platform for the systematic study of diseases [16–19]. The prioritization of cancer-related genes has also been widely studied using network theory. Most network theoretical approaches are based on the assumption that cancer-related genes participate in common functional modules including protein complexes, molecular pathways and developmental processes, and may have the same or similar functions that are involved in the development of cancers. Charles et al. established a weighted function network composed of human genes, and ranked the related genes to 110 different diseases (including cancer). This study revealed the recondite relationship between diseases with quite different phenotypes . In a study by Cui et al., a manually curated human signaling network was constructed and a set of cancer mutated genes and a set of cancer-associated methylated genes were mapped into the signaling network. These researchers found that methylated genes were mainly enriched in negative regulatory loops encoding tumor suppressors in cancer cells . Thus, they successfully developed an approach to identify cancer-related genes that could be used as biomarkers of cancers from high-throughput data.
A human protein-protein interaction network, constructed using a machine learning method, has been shown to be of benefit when applied to the study the disease-related genes using network theory . Here, we report the construction of an integrated and weighted network using protein-protein interaction (PPI) data and its correlation with DNA methylation to provide a comprehensive approach for prioritizing the cancer-related genes with abnormal methylation from genome and epigenome data.
The weighted human PPI network and cancer-associated subnetwork
Comparison of the topological features of WHPN and CASN
The topological features for WHPN and CASN were computed. CASN should have a network structure that is different from WHPN, which should show the specific genetic and epigenetic relations between genes in cancers.
The average path length and the probability density distribution of the average path length were also compared for the two networks. The results showed that CASN had a shorter average path length (average path length for CASN, 3.6902) than WHPN (average path length for WHPN, 4.7335) (Figure 5B). As for the other topological features, the average path length of CASN was significant when compared with the average path lengths of the 1000 randomly simulated subnetworks (Wilcoxon rank sum test, P < 1.7703E-006, FRD = 0.01). The network diameters for WHPN and CASN were 15 and 10, respectively.
Although the topological features of CASN were significantly different from those of WHPN and the random subnetworks, to confirm the significance of our results, we used the topology-matched random subnetworks that were generated as described in the Method section. Compared with the topology-matched random subnetworks, the topology features of CASN were all statistically significant (Wilcoxon rank sum test, Pdegree < 1.9049E-095, Pclustering coefficient < 1.7775E-042, Paverage path length < 2.0008E-091, FDR = 0.01). We also generated 1000 random networks by keeping the same number of nodes and connections as in CASN. We found that, compared with the three different random subnetworks, the degree and clustering coefficient for CASN were much significant those of the three random subnetworks, suggesting that the nodes in CASN were not randomly distributed (Additional file 2).
The calculated topological features of the networks indicate that CASN had a more densely connected network community than WHPN. It is generally believed that the mutant proteins contribute to diseases with similar phenotypes directly or indirectly interact and cancer is considered to be the result of the deregulation of some interrelated pathways. So, if a gene in the network is close to a cancer gene, then that gene is likely to be involved in some of the events that lead to the cancer. Further, the genes in CASN were also related to cancer through their abnormal methylation. Thus, the CASN genes may be involved in the same or similar biological processes as the seed genes, which may in turn be display changes of methylation levels in cancers.
Prioritizing the cancer-related genes with aberrant methylation
To further investigate the proposed hypothesis that the genes that were close in the CASN network may have similar methylation levels, we selected 773 genes that were connected with the seed genes in CASN as candidate genes. The 773 candidate genes were assessed to prioritize the cancer-related genes with aberrant methylation that had not been detected in CASN before. Using the neighborhood-weighting decision rule , every candidate gene was scored to measure the possibility that the methylation state of the candidate genes varied in cancers. The candidate genes whose scores were larger than any of the 1000 simulated scores were identified as the optimized genes (See Methods for details). Finally, 154 CASN genes were prioritized using the seed genes and neighborhood-weighting decision rule (Additional file 3). The optimized genes are also classified as four types according to the number of cancer types that the genes connected; 79 optimized genes for type I, 12 for type II 26 for type III and 37 for type IV (Additional file 4).
Analysis of Hub nodes and optimized genes
CASN genes were divided into three types: seed gene set, optimized gene set and rest potential gene set. The percentile 50 of degrees in the seed gene set, the optimized gene set and the rest potential gene set were 3.5, 4 and 2, respectively (Figure 6B, Additional file 7). The results showed that the seed genes and the optimized genes had higher degrees than the rest potential genes (Wilcoxon rank sum test, Pseed genes and rest potential genes = 2.3400E-002, Poptimized genes and rest potential genes = 5.4202E-004 and Pseed genes and optimized genes > 0.1). Of the top 10 genes with the largest degrees in CASN, 5 were seed genes, 4 were optimized genes and only one was a rest potential gene which was ranked 10 (Additional file 5). Four of the optimized genes (MAX, E2F4, EP300, MAPK8) were all in the top 10 genes with the highest degrees in WHPN.
The gene, MYC associated factor X (MAX), with the second highest number of degrees in CASN and the highest number of degrees (470) in WHPN, was prioritized as an optimized gene. The MAX protein coded by this gene is a member of the basic helix-loop-helix leucine zipper (bHLHZ) family of transcription factors. MAX usually forms heterodimers with other family members such as Mad, Mxi1 and Myc. Myc is an oncoprotein involved in cell proliferation, differentiation and apoptosis . Therefore, MAX may also participate in the same or similar biological processes that affect the development and progression of cancers. In CASN, some Hub nodes were prioritized as cancer-related genes with aberrant methylation. Other studies have shown that cancer-related proteins often have high connectivity that are usually considered to be the Hub notes in networks, rather than the peripheral notes [25, 26]. The optimized genes in our study tended to have high connectivity, suggesting critical roles for them in important biological processes and in the deregulation of genes during the development and progression of the cancers.
Analysis of GO categories and KEGG pathways
The GO function enrichment analysis of the CASN genes indicated that these genes were mainly enriched in the GO terms of regulation of programmed cell death and apoptosis, terms which are relevant of tumors (Additional file 8). The non-CASN genes, on the other hand, were mainly involved in the biological processes of regulation of protein RNA metabolic processes and cell proliferation (Additional file 8). Of the many significant GO terms listed for the non-CASN genes, only some of them were related to programmed cell death and apoptosis and they were significantly lower for the non-CASN genes than for the CASN gene set. These findings revealed that the CASN genes were much more closely related to cancers than the non-CASN genes. In the subsequent analysis, we focus on the CASN gene set.
The seed genes are mainly involved in GO biological processes associated with the regulation of the apoptosis and programmed cell death (Additional file 9). The seed genes were selected because they may be methylated aberrantly; thus, the abnormal methylation may induce their involvement in apoptosis and programmed cell death, affecting the development and progression of cancers [27, 28]. We also found that the optimized genes were mainly enriched in the GO terms involving regulation of gene expression, regulation of transcription and in the biological processes of cell apoptosis and programmed cell death (Additional file 9). Apoptosis is a basic biological phenomenon that may have a direct or indirect relationship with many diseases, such as cancers and autoimmune diseases . Apoptosis is negatively regulated in cancers and the disruption of apoptosis is involved in the initiation of cancers. These biological processes might be deregulated by the aberrant methylation of the optimized genes, thus affecting the process of cancers.
The CASN and non-CASN gene sets in WHPN were next analyzed for KEGG pathway enrichment. The findings revealed that the CASN genes were enriched in many cancer and cancer-associated pathways (Additional file 10); the non-CASN genes were not enriched in any cancer pathway. We investigated the three gene groups in CASN in the same way as we did for the GO enrichment analysis. The results for the optimized genes and seed genes are shown in Additional file 11. The seed genes were annotated to 12 of the 15 cancer pathways in the KEGG database. Two of the other pathways, the P53 signaling pathway and the Wnt signaling pathway, are closely related to cancers [30–32]. The optimized genes were also annotated to cancer pathways, and to the Notch signaling pathway and the cell cycle in KEGG. Some studies have shown that abnormalities of the Notch signaling pathway are linked to breast carcinogenesis, T cell malignancies, neuroblastoma, myeloid leukemia and lung cancer [33, 34]. Our results also show that the optimized genes may be linked to cancers, suggesting that the abnormal methylation levels affect the development and progression of cancers.
Finally, the GO and KEGG enrichment analysis for the rest potential genes indicated that these genes may also be enriched in some cancer-related GO biological processes and in some KEGG cancer- related pathways (Additional file 9 Additional file 11). A possible explanation for this result may be that the all genes in the networks were interactional and all the genes in CASN interacted with the seed genes which were all cancer-related genes. Genes that interact often participate in common functional modules like protein complexes, molecular pathways and developmental processes; so, the rest potential genes may have some functions that are similar to the cancer genes.
The expression level of the optimized genes
These findings suggest that some of the optimized genes in our data set might be differentially expressed in the corresponding cancers, indicating that these genes may be regulated by aberrant methylation resulting in their involvement in the cancer-related pathways.
PubMed co-citations for the optimized genes
Genes validated using PubMed literature co-citations
Entrez gene ID
We also manually searched PubMed for an association between all the optimized genes and their use as diagnostic or prognostic markers. We found that 27 of the 154 optimized genes were in the literature as diagnostic markers, and 20 optimized genes were in the literature as prognostic markers for breast cancer, nasopharyngeal carcinoma, prostate cancer, and for other non-cancer diseases that included Rubinstein-Taybi syndrome, type II diabetes and familial dysautonomia (Additional file 14). Subsequently, we searched for the optimized genes in the DrugBank database, and found that 31 genes were annotated as the targets for drug markers (Additional file 14). Therefore, we conclude that the promoter regions of the optimized genes may be methylated aberrantly, leading to the activation or inhibition of gene expression and contributing to their involvement in the development and progression of cancers.
Here, we prioritized potential cancer-related genes with aberrant methylation based on the constructed weighted human protein-protein interaction network. Network theory has been applied widely to the study of diseases . As a method of describing the interactions between biological molecules, biological network theory reveals the processes and laws involved in, for example, growth, development, aging, and disease. Proteins play important roles in the activity of cells and protein-protein interactions are the main path by which proteins function. Many researchers have used network theory to construct protein-protein interaction networks and combined them with other characteristics, such as molecular pathways and GO annotations, to identify disease-related genes . However, a few studies have attempted to combine network theory with both epigenetic and genetic characteristics . DNA methylation is an important epigenetic modification that influences a variety of physiological activities of the cell, such as X chromosome inactivation, aging, temporal and spatial expression and the development of diseases . Aberrant DNA methylation can affect tumor formation by affecting the chromatin structure and the expression of oncogenes and tumor suppressor genes [41, 42]. Therefore, we prioritize the potential cancer-related genes with aberrant methylation by integrating epigenetic and genetic characteristics based on network theory.
In this study, we ensured the reliability of the linkages in the WHPN by deriving the interaction data from five different PPI databases and by selecting only the PPIs which were validated by experiments or which have been reported in the literature. We then built the cancer-associated subnetwork (CASN) from WHPN. The network topological features evaluated using r values related to power law distribution indicate that WHPN is a typical scale-free network. Most biological networks constructed so far are scale-free networks with a degree distribution that follows the power law with degree exponents in the range 2 < r < 3 . Although rCASN is below this range, CASN still follows power law distribution and has the characteristics of a scale-free network that has a few Hub nodes with high connectivity and most nodes with low clustering coefficients. Thus, CASN is an approximate scale-free network. This characteristic may be due to the synergetic effect of DNA methylation and the cancer-related seed genes that were used to construct it. This also causes the network structure of CASN to be closer than that of WHPN. The clustering coefficient for CASN (0.1410) is more than the clustering coefficient for WHPN (0.0503). Therefore, in CASN, the genes connected with the seed genes may participate in the same or similar biological processes as the seed genes and may have the same or similar functions as the seed genes. Thus, it can be concluded that the deregulated DNA methylation may affect the development and progression of cancers.
We prioritized 154 potential cancer-related genes with aberrant methylation using the neighborhood-weighting decision rule based on CASN. In the PubMed co-citations analysis for these 154 genes, we found 43 genes that were associated with cancers and aberrant methylation in PubMed. The PubMed co-citations analysis is only a rough assessment which may have a high rate of false positives . Of the 43 genes in the PubMed co-citations, only 10 could be confidently validated to be aberrantly methylated in cancers from the literature analysis. The optimized genes that were not identified in the PubMed co-citations may be cancer-related genes with aberrant methylation that are, as yet, undetected. These genes are prime candidates for validation by further experiments.
In this study, we constructed a weighted human protein-protein interaction network (WHPN). Based on WHPN, a cancer-association subnetwork (CASN) was obtained using a set of seed genes derived from PubMeth. Comparing the topological features of the two networks, we found that CASN had a much denser network community than WHPN, indicating that the genes in CASN might be aberrantly methylated in cancers and likely participated in the same or similar biological processes as the seed genes. 154 genes were prioritized as potential cancer-related genes with aberrant methylation based on neighborhood-weighting decision rule. The enrichment analysis of GO and KEGG showed that the prioritized genes were generally enriched for biological processes related to apoptosis and programmed cell death and for pathways associated with cancers. Many of the optimized genes showed some degree of differential expression in the SAM analysis, revealing that these genes might be abnormally methylated in the cancer-related biological processes. Abnormal methylation would affect the expression level of these identified genes leading to the development and progression of cancers. In this study, we prioritized the cancer-related genes with aberrant methylation by integrating DNA methylation and protein-protein interaction characteristics based on the network theory. This method will be helpful for the further understanding of the mechanisms of the development and progression in cancers and may help develop new avenues for the prioritization of cancer-related genes with aberrant methylation for diagnosis and therapeutics.
Protein-protein interaction data
The protein-protein interaction data were obtained from five PPI databases, the Human Protein Reference Database (HPRD) , IntAct , the Database of Interacting Proteins (DIP) , the Molecular INTeraction Database (MINT)  and the Biomolecular Interaction Network Database (BIND) . To assure the reliability of the protein-protein interactions, we used only the interaction data that was supported by experimental evidence and not the optimized interactions. Because the different databases use different identifiers, the original identifiers were mapped to the corresponding Entrez Gene IDs using the cross-reference files from Entrez Gene, the HUGO Gene Nomenclature Committee (HGNC)  or Biomart . Using this method we obtained 80496 pairs of human protein-protein interaction involving 14611 genes.
DNA methylation data
The genome-wide DNA methylation data sources were derived from patients with four types of cancer, glioblastoma (GBM), ovarian cancer, hepatocellular carcinoma and leukemia. The methylation data for glioblastoma and ovarian cancer were from TCGA (The Cancer Genome Atlas) which contained 26 samples for GBM and 394 samples for ovarian cancer. The methylation data for hepatocellular carcinoma and leukemia were obtained from ENCODE in the UCSC Genome Browser which contained two replicates of the HepG2 cell line and two replicates of the K562 cell line. These data were all generated using Infinium assays on the Illumina Infinium HumanMethylation27 BeadChip. In these assays, quantitative measurements of DNA methylation are made for 27578 interrogated CpG dinucleotides covering a genome-wide scale of 14475 genes at single-nucleotide resolution. The probes for the CpG sites were mapped to the corresponding genes using Entrez Gene IDs as the unique identifiers. For genes containing two or more CpG sites, the average methylation value of these CpG sites was used to represent the methylation level of the corresponding gene. Finally, the methylation values of the 27578 CpG sites were mapped to the 14475 genes using the Entrez Gene IDs.
Seed gene data
Seed genes were defined as the genes validated by experiments to be aberrantly methylated (hypermethylation or hypomethylation) in cancers. PubMeth, a database of methylation in cancer, contains genes reported to be abnormally methylated in various cancer types. PubMeth is based on automated literature mining which is then manually checked and annotated . We used the keywords, glioblastoma, ovarian cancer, hepatocellular carcinoma and leukemia, in the PubMeth cancer-centric search tool to extract cancer-associated genes with aberrant methylation from the database. After removing the genes for which the number of samples or the methylation frequency was 0, the remaining set of genes was chosen as seed genes. The selected seed gene set included 73 genes for glioblastoma, 58 for hepatocellular carcinoma, 47 for ovarian cancer and 22 for leukemia. These genes were then combined giving a final seed gene set of 127 genes for the four cancer types.
Construction of WHPN, the weighted human PPI network
We build the weighted network used in this study by combining the human PPI data with the DNA methylation data. Pearson correlations were calculated to assess the association of the methylation for each gene pair. The genes were used as the nodes, the interactions of the gene pairs were the edges and the correlation coefficient was used as the linkage weight. Linkages for which the methylation correlations were beneath the threshold were removed.
Next, the weight thresholds of the network were determined. The DNA methylation data was perturbed for 1000 times; thereafter, the Pearson correlation coefficient of the random DNA methylation was computed for the gene pairs in each perturbed methylation dataset. The methylation correlation of the original gene pairs was compared with the methylation correlations of the perturbed gene pairs. All the values were ranked and compared with the true methylation correlation. If the true value was either in the top five percent or in the bottom five percent of the ranked correlations for the perturbed gene pairs (P value < 0.1), then the methylation correlation for that pair was considered to be significant and this correlation was used as the linkage weight of the connected gene pair. All gene pairs for which the methylation correlations were not within the top or bottom five percent were removed from the PPI network.
Construction of CASN, the cancer-associated subnetwork
The seed gene set obtained using the data from PubMeth was mapped into the weighted human PPI network. Thus, the cancer-associated subnetwork (CASN) is composed of the seed genes and the genes that are connected with the seed genes in the weighted PPI network.
Random subnetworks were generated by randomly sampling the same number of nodes as in CASN from WHPN; the connections in the sampling nodes were kept in the random subnetworks. This process was repeated 1000 times. Topology-matched random subnetworks were generated by the method described by Li et al. in an earlier study . Similarly, all the genes in WHPN were divided into three subsets of equal size based on degrees and clustering coefficients. Next, the WHPN genes were assigned to nine sets by combining the three degree sets with the three clustering coefficient sets. To create the random subnetwork, for every node in CASN, we randomly obtained one node from the same topological set in WHPN; in this way, 1000 topology-matched random subnetworks were generated. Finally, to determine the significance of our results, a third randomization method was used to generate 1000 random subnetworks by keeping the number of nodes and connections of CASN.
The calculation of topological features in the two networks
Network theory provides a quantifiable description of networks for the biological systems. And degree, clustering coefficient and average path length are the three characteristic topological features used to measure complex networks. In this study, we examined these topological features that can be used to measure and compare the different complex networks .
The degree (connectivity) of a node i, k i , is defined as the number of notes which the node i connects. And the average degree of the network is the average value of the degrees of all the notes, 2L/N, where L represents links and N represents notes.
This measurement indicates that the variety trend of probability for the note whose degree is k with the change of the note's degree k, and it also means that the ratio of the number of notes with the degree of k accounting for all the notes in the network.
The r values for the networks are calculated, looking for whether they asymptotically follow power law distribution.
The clustering coefficient of the network is defined as the average value for clustering coefficient of all the notes, which reflects the overall cluster trend of interacted notes.
Average path length
Subnetwork-based cancer gene with aberrant methylation prioritization
Neighborhood-weighting decision rule
ω ij represents the linkage weight that between the candidate gene i and the seed gene j, viz. the Pearson correlation of the DNA methylation between the two genes. If the candidate gene i is not associated with the seed gene j, ω ij = 0. The scores of the candidate genes are calculated by the neighborhood-weighting decision rule.
And next, the candidate genes are prioritized according to the score S i . Through the perturbation of DNA methyaltion data, the permuted weights for linkages are obtained. And then the permuted scores are also calculated by the Neighborhood-weighting decision rule. This procedure is repeated 1000 times and then the scores and the permuted scores are compared. The genes whose score is grater than all the corresponding permuted score are selected. It is believe that these genes have potential to be methylated aberrantly in cancers.
Enrichment analysis of GO and KEGG pathway
The gene annotation enrichment analysis for the optimized genes was taken by DAVID , where the tools can provide functional interpretation of large gene lists derived from genomic studies.
Comparisons of differentially expressed genes
where, d ij is the SAM score for gene i in the random dataset j. The differentially expressed genes were ranked according to their diff_score i . SAM was performed using the R package samr.
Text-mining of PubMed for the optimized genes comparison to previously publications
In the absence of a gold standard to assess the relationship between the cancers and the methylation level of the optimized genes, we used the public database PubMed to produce the reference lists. PubMed contains hundreds of thousands of citations to the biomedical literature. We systematically searched PubMed using query terms that included the optimized gene name, the cancer type and hypermethylation/hypomethylation, for the co-occurrence of these terms either in an abstract or in the title of previous publications .
Weighted Human Protein-protein interaction Network
Kyoto Encyclopedia of Genes and Genomes
Human Protein Reference Database
Database of Interacting Proteins
Biomolecular Interaction Network Database
the HUGO Gene Nomenclature Committee
the Cancer Genome Atlas
ENCyclopedia of DNA Elements
Gene Expression Omnibus
the Database for Annotation, Visualization and Integrated Discovery
Significance Analysis of Microarrays.
This work was supported by National Natural Science Foundation of China (61075023 and 81172742), Natural Science Foundation of Heilongjiang Province (C201012) and Scientific Research Fund of Heilongjiang Provincial Education Department (12511272).
- Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton MR: A census of human cancer genes. Nat Rev Cancer. 2004, 4: 177-183. 10.1038/nrc1299.PubMed CentralView ArticlePubMed
- Thomas RK, Baker AC, Debiasi RM, Winckler W, Laframboise T, Lin WM, Wang M, Feng W, Zander T, MacConaill L, et al, et al.: High-throughput oncogene mutation profiling in human cancer. Nat Genet. 2007, 39: 347-351. 10.1038/ng1975.View ArticlePubMed
- Stratton MR, Campbell PJ, Futreal PA: The cancer genome. Nature. 2009, 458: 719-724. 10.1038/nature07943.PubMed CentralView ArticlePubMed
- Cheung HH, Lee TL, Rennert OM, Chan WY: DNA methylation of cancer genome. Birth Defects Res C Embryo Today. 2009, 87: 335-350. 10.1002/bdrc.20163.PubMed CentralView ArticlePubMed
- Das PM, Singal R: DNA methylation and cancer. J Clin Oncol. 2004, 22: 4632-4642. 10.1200/JCO.2004.07.151.View ArticlePubMed
- Singh P, Yang M, Dai H, Yu D, Huang Q, Tan W, Kernstine KH, Lin D, Shen B: Overexpression and hypomethylation of flap endonuclease 1 gene in breast and other cancers. Mol Cancer Res. 2008, 6: 1710-1717.PubMed CentralPubMed
- Sun W, Liu Y, Glazer CA, Shao C, Bhan S, Demokan S, Zhao M, Rudek MA, Ha PK, Califano JA: TKTL1 is activated by promoter hypomethylation and contributes to head and neck squamous cell carcinoma carcinogenesis through increased aerobic glycolysis and HIF1alpha stabilization. Clin Cancer Res. 2010, 16: 857-866. 10.1158/1078-0432.CCR-09-2604.PubMed CentralView ArticlePubMed
- Esteller M, Silva JM, Dominguez G, Bonilla F, Matias-Guiu X, Lerma E, Bussaglia E, Prat J, Harkes IC, Repasky EA, et al, et al.: Promoter hypermethylation and BRCA1 inactivation in sporadic breast and ovarian tumors. J Natl Cancer Inst. 2000, 92: 564-569. 10.1093/jnci/92.7.564.View ArticlePubMed
- Gonzalgo ML, Jones PA: Rapid quantitation of methylation differences at specific sites using methylation-sensitive single nucleotide primer extension (Ms-SNuPE). Nucleic Acids Res. 1997, 25: 2529-2531. 10.1093/nar/25.12.2529.PubMed CentralView ArticlePubMed
- Herman JG, Graff JR, Myohanen S, Nelkin BD, Baylin SB: Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands. Proc Natl Acad Sci USA. 1996, 93: 9821-9826. 10.1073/pnas.93.18.9821.PubMed CentralView ArticlePubMed
- Eads CA, Laird PW: Combined bisulfite restriction analysis (COBRA). Methods Mol Biol. 2002, 200: 71-85.PubMed
- Gitan RS, Shi H, Chen CM, Yan PS, Huang TH: Methylation-specific oligonucleotide microarray: a new potential for high-throughput methylation analysis. Genome Res. 2002, 12: 158-164. 10.1101/gr.202801.PubMed CentralView ArticlePubMed
- Schumacher A, Kapranov P, Kaminsky Z, Flanagan J, Assadzadeh A, Yau P, Virtanen C, Winegarden N, Cheng J, Gingeras T, Petronis A: Microarray-based DNA methylation profiling: technology and applications. Nucleic Acids Res. 2006, 34: 528-542. 10.1093/nar/gkj461.PubMed CentralView ArticlePubMed
- Noushmehr H, Weisenberger DJ, Diefes K, Phillips HS, Pujara K, Berman BP, Pan F, Pelloski CE, Sulman EP, Bhat KP, et al, et al.: Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell. 2010, 17: 510-522. 10.1016/j.ccr.2010.03.017.PubMed CentralView ArticlePubMed
- Loss LA, Sadanandam A, Durinck S, Nautiyal S, Flaucher D, Carlton VE, Moorhead M, Lu Y, Gray JW, Faham M, et al, et al.: Prediction of epigenetically regulated genes in breast cancer cell lines. BMC Bioinformatics. 2010, 11: 305-10.1186/1471-2105-11-305.PubMed CentralView ArticlePubMed
- Chavali S, Barrenas F, Kanduri K, Benson M: Network properties of human disease genes with pleiotropic effects. BMC Syst Biol. 2010, 4: 78-10.1186/1752-0509-4-78.PubMed CentralView ArticlePubMed
- Jiang Q, Hao Y, Wang G, Juan L, Zhang T, Teng M, Liu Y, Wang Y: Prioritization of disease microRNAs through a human phenome-microRNAome network. BMC Syst Biol. 2010, 4 (Suppl 1): S2-10.1186/1752-0509-4-S1-S2.PubMed CentralView ArticlePubMed
- Nguyen TP, Jordan F: A quantitative approach to study indirect effects among disease proteins in the human protein interaction network. BMC Syst Biol. 2010, 4: 103-10.1186/1752-0509-4-103.PubMed CentralView ArticlePubMed
- Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL: The human disease network. Proc Natl Acad Sci USA. 2007, 104: 8685-8690. 10.1073/pnas.0701361104.PubMed CentralView ArticlePubMed
- Linghu B, Snitkin ES, Hu Z, Xia Y, Delisi C: Genome-wide prioritization of disease genes and identification of disease-disease associations from an integrated human functional linkage network. Genome Biol. 2009, 10: R91-10.1186/gb-2009-10-9-r91.PubMed CentralView ArticlePubMed
- Cui Q, Ma Y, Jaramillo M, Bari H, Awan A, Yang S, Zhang S, Liu L, Lu M, O'Connor-McCourt M, et al, et al.: A map of human cancer signaling. Mol Syst Biol. 2007, 3: 152-PubMed CentralView ArticlePubMed
- Li L, Zhang K, Lee J, Cordes S, Davis DP, Tang Z: Discovering cancer genes by integrating network and functional properties. BMC Med Genomics. 2009, 2: 61-10.1186/1755-8794-2-61.PubMed CentralView ArticlePubMed
- Barabasi AL, Oltvai ZN: Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004, 5: 101-113. 10.1038/nrg1272.View ArticlePubMed
- Albihn A, Johnsen JI, Henriksson MA: MYC in oncogenesis and as a target for cancer therapies. Adv Cancer Res. 2010, 107: 163-224.View ArticlePubMed
- Jonsson PF, Bates PA: Global topological features of cancer proteins in the human interactome. Bioinformatics. 2006, 22: 2291-2297. 10.1093/bioinformatics/btl390.PubMed CentralView ArticlePubMed
- Fox AD, Hescott BJ, Blumer AC, Slonim DK: Connectedness of PPI network neighborhoods identifies regulatory hub proteins. Bioinformatics. 2011, 27: 1135-1142. 10.1093/bioinformatics/btr099.PubMed CentralView ArticlePubMed
- Friedrich MG, Weisenberger DJ, Cheng JC, Chandrasoma S, Siegmund KD, Gonzalgo ML, Toma MI, Huland H, Yoo C, Tsai YC, et al, et al.: Detection of methylated apoptosis-associated genes in urine sediments of bladder cancer patients. Clin Cancer Res. 2004, 10: 7457-7465. 10.1158/1078-0432.CCR-04-0930.View ArticlePubMed
- Hervouet E, Vallette FM, Cartron PF: Impact of the DNA methyltransferases expression on the methylation status of apoptosis-associated genes in glioblastoma multiforme. Cell Death Dis. 2010, 1: e8-10.1038/cddis.2009.7.PubMed CentralView ArticlePubMed
- Cotter TG: Apoptosis and cancer: the genesis of a research field. Nat Rev Cancer. 2009, 9: 501-507. 10.1038/nrc2663.View ArticlePubMed
- Vazquez A, Bond EE, Levine AJ, Bond GL: The genetics of the p53 pathway, apoptosis and cancer therapy. Nat Rev Drug Discov. 2008, 7: 979-987. 10.1038/nrd2656.View ArticlePubMed
- Matheu A, Maraver A, Serrano M: The Arf/p53 pathway in cancer and aging. Cancer Res. 2008, 68: 6031-6034. 10.1158/0008-5472.CAN-07-6851.View ArticlePubMed
- Barker N, Clevers H: Mining the Wnt pathway for cancer therapeutics. Nat Rev Drug Discov. 2006, 5: 997-1014. 10.1038/nrd2154.View ArticlePubMed
- Westhoff B, Colaluca IN, D'Ario G, Donzelli M, Tosoni D, Volorio S, Pelosi G, Spaggiari L, Mazzarol G, Viale G, et al, et al.: Alterations of the Notch pathway in lung cancer. Proc Natl Acad Sci USA. 2009, 106: 22293-22298. 10.1073/pnas.0907781106.PubMed CentralView ArticlePubMed
- Klinakis A, Lobry C, Abdel-Wahab O, Oh P, Haeno H, Buonamici S, van De Walle I, Cathelin S, Trimarchi T, Araldi E, et al, et al.: A novel tumour-suppressor function for the Notch pathway in myeloid leukaemia. Nature. 2011, 473: 230-233. 10.1038/nature09999.PubMed CentralView ArticlePubMed
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA. 2001, 98: 5116-5121. 10.1073/pnas.091062498.PubMed CentralView ArticlePubMed
- van Steensel B, Braunschweig U, Filion GJ, Chen M, van Bemmel JG, Ideker T: Bayesian network analysis of targeting interactions in chromatin. Genome Res. 2010, 20: 190-200. 10.1101/gr.098822.109.PubMed CentralView ArticlePubMed
- Howard JH, Frolov A, Tzeng CW, Stewart A, Midzak A, Majmundar A, Godwin A, Heslin M, Bellacosa A, Arnoletti JP: Epigenetic downregulation of the DNA repair gene MED1/MBD4 in colorectal and ovarian cancer. Cancer Biol Ther. 2009, 8: 94-100.View ArticlePubMed
- Bediaga NG, Acha-Sagredo A, Guerra I, Viguri A, Albaina C, Ruiz Diaz I, Rezola R, Alberdi MJ, Dopazo J, Montaner D, et al, et al.: DNA methylation epigenotypes in breast cancer molecular subtypes. Breast Cancer Res. 2010, 12: R77-10.1186/bcr2721.PubMed CentralView ArticlePubMed
- Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V, et al, et al.: DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Res. 2011, 39: D1035-1041. 10.1093/nar/gkq1126.PubMed CentralView ArticlePubMed
- Senner CE: The role of DNA methylation in mammalian development. Reprod Biomed Online. 2011, 22: 529-535. 10.1016/j.rbmo.2011.02.016.View ArticlePubMed
- Kwabi-Addo B, Ren C, Ittmann M: DNA methylation and aberrant expression of Sprouty1 in human prostate cancer. Epigenetics. 2009, 4: 54-61. 10.4161/epi.4.1.7400.View ArticlePubMed
- Karpf AR, Jones DA: Reactivating the expression of methylation silenced genes in human cancer. Oncogene. 2002, 21: 5496-5503. 10.1038/sj.onc.1205602.View ArticlePubMed
- Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, et al, et al.: Human Protein Reference Database--2009 update. Nucleic Acids Res. 2009, 37: D767-772. 10.1093/nar/gkn892.PubMed CentralView ArticlePubMed
- Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley R, et al, et al.: IntAct--open source resource for molecular interaction data. Nucleic Acids Res. 2007, 35: D561-565. 10.1093/nar/gkl958.PubMed CentralView ArticlePubMed
- Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004, 32: D449-451. 10.1093/nar/gkh086.PubMed CentralView ArticlePubMed
- Ceol A, Chatr Aryamontri A, Licata L, Peluso D, Briganti L, Perfetto L, Castagnoli L, Cesareni G: MINT, the molecular interaction database: 2009 update. Nucleic Acids Res. 2010, 38: D532-539. 10.1093/nar/gkp983.PubMed CentralView ArticlePubMed
- Alfarano C, Andrade CE, Anthony K, Bahroos N, Bajec M, Bantoft K, Betel D, Bobechko B, Boutilier K, Burgess E, et al, et al.: The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic Acids Res. 2005, 33: D418-424.PubMed CentralView ArticlePubMed
- Seal RL, Gordon SM, Lush MJ, Wright MW, Bruford EA: genenames.org: the HGNC resources in 2011. Nucleic Acids Res. 2011, 39: D514-519. 10.1093/nar/gkq892.PubMed CentralView ArticlePubMed
- Haider S, Ballester B, Smedley D, Zhang J, Rice P, Kasprzyk A: BioMart Central Portal--unified access to biological data. Nucleic Acids Res. 2009, 37: W23-27. 10.1093/nar/gkp265.PubMed CentralView ArticlePubMed
- Ongenaert M, Van Neste L, De Meyer T, Menschaert G, Bekaert S, Van Criekinge W: PubMeth: a cancer methylation database combining text-mining and expert annotation. Nucleic Acids Res. 2008, 36: D842-846.PubMed CentralView ArticlePubMed
- Li J, Zimmerman LJ, Park BH, Tabb DL, Liebler DC, Zhang B: Network-assisted protein identification and data interpretation in shotgun proteomics. Mol Syst Biol. 2009, 5: 303-PubMed CentralView ArticlePubMed
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13: 2498-2504. 10.1101/gr.1239303.PubMed CentralView ArticlePubMed
- Huang da W, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009, 4: 44-57.View ArticlePubMed
- Li J, Lenferink AE, Deng Y, Collins C, Cui Q, Purisima EO, O'Connor-McCourt MD, Wang E: Identification of high-quality cancer prognostic markers and metastasis network modules. Nat Commun. 2010, 1: 34-PubMed
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.