Genes associated with genotype-specific DNA methylation in squamous cell carcinoma as candidate drug targets
© Kinoshita et al.; licensee BioMed Central Ltd. 2014
Published: 24 January 2014
Aberrant DNA methylation is often associated with cancers. Thus, screening genes with cancer-associated aberrant DNA methylation is a useful method to identify candidate cancer-causing genes. Aberrant DNA methylation is also genotype dependent. Thus, the selection of genes with genotype-specific aberrant DNA methylation in cancers is potentially important for tailor-made medicine. The selected genes are important candidate drug targets.
The recently proposed principal component analysis-based selection of genes with aberrant DNA methylation was applied to genotype and DNA methylation patterns in squamous cell carcinoma measured using single nucleotide polymorphism (SNP) arrays. SNPs that are frequently found in cancers are usually highly methylated, and the genes that were selected using this method were reported previously to be related to cancers. Thus, genes with genotype-specific DNA methylation patterns will be good therapeutic candidates. The tertiary structures of the proteins encoded by the selected genes were successfully inferred using two profile-based protein structure servers, FAMS and Phyre2. Candidate drugs for three of these proteins, tyrosine kinase receptor (ALK), EGLN3 protein, and NUAK family SNF1-like kinase 1 (NUAK1), were identified by ChooseLD.
We detected genes with genotype-specific DNA methylation in squamous cell carcinoma that are candidate drug targets. Using in silico drug discovery, we successfully identified several candidate drugs for the ALK, EGLN3 and NUAK1 genes that displayed genotype-specific DNA methylation.
Promoter methylation is widely recognized as an important factor that regulates gene expression, especially in cancers [1, 2]. Many genes with tumor-specific methylated promoters have been identified. For example, the promoters of the PAK3, NISCH, KIF1A, and OGDHL genes are specifically methylated in several cancers, including breast, esophagus, lung, pancreas, colon, prostate, gastric, cervix, thyroid, kidney, head and neck, ovary, and bladder cancers. Because genes with methylated promoters are believed to be suppressed, genes with tumor-specific hypermethylated promoters were assumed to be tumor suppressors. Similarly, genes with tumor-specific hypomethylated promoters were supposed to be oncogenic (i.e., expressed in tumors) and potential oncogene targets. Identification of promoter methylation in cancer genes is important in helping to find critical genes that can cause cancer formation.
Genotype, on the other hand, is another critical factor that can affect cancer formation. Many genotypes are known to be associated with cancers. Currently, there are no established mechanisms that can relate gene mutations to cancer formation. For example, a cancer-specific single nucleotide polymorphism (SNP) is often associated with specific cancers, but this SNP is located in an intron of the gene. It is still unclear how intronic SNPs affect gene expression. Typically, cancer-associated genotypes work solely as biomarkers.
Despite the known importance of DNA methylation and genotype in cancer formation, how DNA methylation and genotype cooperatively mediate cancer formation has rarely been discussed. An exception is the recent association study reported by Scherf et al., who found that genotype-specific promoter DNA methylation of the oncogene CHRNB4 was related to lung cancer. Opavsky et al. also found that the P53, E2f2 and Pten genes in a mouse model of lymphoma were methylated in a genotype-specific manner. Thus, genotype and DNA methylation may contribute cooperatively to cancer formation in many other cancers.
In this paper, we sought to detect genotype-specific DNA methylation in esophageal squamous cell carcinoma (ESCC). Many previous studies have reported ESCC-specific genotypes. For example, Abnet et al. found that genotypic variants at position 2q33 on the human chromosome were related to risk of ESCC. Maeng et al. found that phosphoinositide-3-kinase and BRAF mutations were associated with metastatic ESCC, and Wang et al. found that ESCC was related to polymorphisms in ALDH2 and ADH1B in Chinese females. Thus, genotype-specific DNA methylation is expected to exist widely in ESCC. In this study, we used two publicly available distinct SNP microarray data sets to identify genotype-specific DNA methylation in ESCC.
DNA methylation profiles and genotypes
DNA methylation profiles and genotypes of blood, normal tissue, and tumor tissue for 30 patients, measured on two SNP arrays (Nsp and Sty), were downloaded from the Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information [GEO:GSE20123]. A total of 90 samples were obtained for each of the DNA methylation and genotype data sets. The normalized data were used without further preprocessing.
Principal component analysis of DNA methylation profiles and genotypes
The downloaded samples were analyzed by principal component analysis (PCA) after substituting a zero for missing values. Principal components (PCs) that exhibited differences between the blood, normal tissue, and tumor tissue samples were selected for further analysis.
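The zero-substitution and PCA step can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: the function names `fill_missing` and `first_pc`, the use of `None` to mark missing values, the probes-as-rows layout, and the use of power iteration in place of a full eigendecomposition are all assumptions made for the example.

```python
import math

def fill_missing(matrix):
    """Replace missing entries (None) with 0.0, as described in the text."""
    return [[0.0 if v is None else float(v) for v in row] for row in matrix]

def first_pc(matrix, n_iter=200):
    """Return (loadings, scores) for the first principal component.

    Rows are probes and columns are samples (an assumed layout).
    Power iteration on the column covariance matrix stands in for
    a full PCA routine and recovers only the leading component.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Center each column on its mean.
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in matrix]
    # Covariance between columns.
    cov = [[sum(r[a] * r[b] for r in centered) / (n_rows - 1)
            for b in range(n_cols)] for a in range(n_cols)]
    # Non-uniform start vector, to avoid starting orthogonal to the
    # leading eigenvector in simple symmetric cases.
    vec = [float(j + 1) for j in range(n_cols)]
    for _ in range(n_iter):
        new = [sum(cov[a][b] * vec[b] for b in range(n_cols))
               for a in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break
        vec = [x / norm for x in new]
    # Project each centered row onto the leading direction.
    scores = [sum(r[j] * vec[j] for j in range(n_cols)) for r in centered]
    return vec, scores

# Hypothetical toy data: four probes, two samples, one probe fully missing.
toy = fill_missing([[1, 2], [2, 4], [3, 6], [None, None]])
loadings, scores = first_pc(toy)
```

In practice a library PCA routine would be used and several components inspected, since the text selects the PCs that separate blood, normal tissue, and tumor tissue.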
Selection of SNPs (probes) based on PCs and a t-test
The top N outliers among the PCs were selected as described previously. The DNA methylation profiles and genotypes were investigated by three pairwise one-sided t-test comparisons: normal tissue vs tumor, blood vs tumor, and blood vs normal tissue. Then, the SNPs (probes) with significant P-values (P < 0.05, adjusted by the Bonferroni correction) for all three pairwise comparisons were considered to display significant differences between all three sample types. Finally, SNPs that were selected in common for DNA methylation and genotype were retained for further analysis.
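The filtering logic described above (adjust, require significance in all three comparisons, then intersect methylation and genotype) can be sketched as below. The P-values, probe names, and function names are hypothetical, and applying the Bonferroni factor per comparison over the number of probes tested is an assumption; the text does not state the exact family of tests.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def significant_in_all(p_by_comparison, alpha=0.05):
    """Return probes whose adjusted P-values are below alpha in every comparison.

    p_by_comparison maps a comparison name to {probe_id: raw P-value}.
    """
    selected = None
    for raw in p_by_comparison.values():
        probes = list(raw)
        adjusted = bonferroni([raw[p] for p in probes])
        passed = {p for p, q in zip(probes, adjusted) if q < alpha}
        selected = passed if selected is None else selected & passed
    return selected or set()

# Hypothetical raw P-values from the three pairwise one-sided t-tests.
methylation_p = {
    "normal_vs_tumor": {"snp1": 0.001, "snp2": 0.004, "snp3": 0.2},
    "blood_vs_tumor": {"snp1": 0.002, "snp2": 0.03, "snp3": 0.001},
    "blood_vs_normal": {"snp1": 0.01, "snp2": 0.001, "snp3": 0.001},
}
genotype_p = {
    "normal_vs_tumor": {"snp1": 0.005, "snp2": 0.5, "snp3": 0.002},
    "blood_vs_tumor": {"snp1": 0.001, "snp2": 0.001, "snp3": 0.01},
    "blood_vs_normal": {"snp1": 0.002, "snp2": 0.3, "snp3": 0.003},
}

# Keep only the probes selected for both data types.
common = significant_in_all(methylation_p) & significant_in_all(genotype_p)
```

Here only `snp1` passes all three comparisons for both DNA methylation and genotype.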
Gene annotation using the Gendoo server
Gene annotation was performed with Gendoo (gene, disease features ontology-based overview system) [13, 14]. The RefSeq mRNA IDs for the selected genes were extracted from GEO and transformed to the gene symbols. The gene symbols were then uploaded to the Gendoo server and diseases that were associated with gene symbols were listed with their P-values, which indicated the significance of the associations.
Feature selection based on correlation coefficients
The Pearson and Spearman correlation coefficients for the i-th probe (SNP) were then computed between x_ij and y_j. Finally, the 300 probes (SNPs) with the largest correlation coefficients were selected.
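A correlation-based ranking of this kind can be sketched as follows. The definitions of x_ij and y_j are not preserved in this text, so the sketch assumes x_ij is the measurement of probe i in sample j and y_j is a numeric class rank for sample j (for example blood = 0, normal = 1, tumor = 2). The function names are hypothetical, and probes are ranked by the signed coefficient.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def top_probes(x, y, n_top, corr=pearson):
    """Indices of the n_top probes with the largest correlation to y."""
    order = sorted(range(len(x)), key=lambda i: corr(x[i], y), reverse=True)
    return order[:n_top]

# Hypothetical data: four probes measured in six samples.
y = [0, 0, 1, 1, 2, 2]
x = [
    [1, 2, 3, 4, 5, 6],
    [6, 5, 4, 3, 2, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 5, 9],
]
selected = top_probes(x, y, n_top=2)
```

The study selects 300 probes; `n_top=2` is used here only to keep the toy example small.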
Feature selection based on partial least squares
Partial least squares (PLS) provides a bilinear representation of data, and PLS-based feature selection aims to select the features that carry the most weight in the linear combinations. For simplicity, we employed the PLS+MCLASS strategy, where PLS was applied directly to multiclass samples. This strategy is, at most, the third-best depending on the data set being tested (other strategies include, for example, a voting strategy based on pairwise PLS applications). However, because there are only three classes in our study, very little improvement can be expected even if the best strategy is employed, as shown previously.
Stepwise feature selection
Stepwise feature selection was performed by adding/removing features iteratively, until the performance reached its maximum. In this study we performed stepwise variable selection using the stepclass function with the lda function as implemented in R.
Lasso-based feature selection
Least absolute shrinkage and selection operator (Lasso) is another frequently used feature extraction method. Lasso fits a linear model while constraining the sum of the absolute values of the regression coefficients. This results in the elimination of redundant features. To apply Lasso to our data set, we employed the lars function implemented in R by specifying the type="lasso" option.
t-test of the microarray measurements between genotype and DNA methylation
For the SNPs that were selected in common between genotype and DNA methylation, we used a one-sided t-test of the null hypothesis that the microarray measurement of genotype is equal to the DNA methylation value, against the alternative that the microarray measurement of genotype is larger than the DNA methylation value. For random sampling, the same set of SNPs was used for the genotype and DNA methylation measurements.
Protein tertiary structure prediction
Screening drug candidate compounds from the DrugBank database
We downloaded 6583 compounds in SMILES format from DrugBank [22, 23]. The SMILES strings were transformed to three-dimensional structures by Babel. The structures of 6510 of the compounds were obtained. Tanimoto indices were computed between the individual compounds and ligands that bind to template proteins. Compounds with Tanimoto indices larger than the threshold values (0.25 for tyrosine kinase receptor (ALK), 0.20 for the other proteins) were selected as candidate drug compounds.
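The Tanimoto screen can be illustrated with a short sketch. Fingerprints are represented here as sets of "on" bit positions, and a compound is kept when its best index against any ligand exceeds the threshold. Both choices, along with the function names and toy fingerprints, are assumptions, since the text does not specify the fingerprint type or how multiple ligands were combined.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto index: shared bits divided by bits present in either set."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def screen(compounds, ligands, threshold):
    """Names of compounds whose best Tanimoto index exceeds the threshold."""
    return [
        name
        for name, fp in compounds.items()
        if max((tanimoto(fp, lig) for lig in ligands), default=0.0) > threshold
    ]

# Hypothetical fingerprints (sets of "on" bit positions).
ligands = [{1, 2, 3, 4}]
compounds = {
    "c1": {1, 2, 3, 9},               # index 3/5 = 0.6
    "c2": {7, 8, 9, 10, 11},          # index 0
    "c3": {1, 2, 5, 6, 7, 8, 9},      # index 2/9, about 0.22
}

alk_hits = screen(compounds, ligands, threshold=0.25)    # ALK threshold
other_hits = screen(compounds, ligands, threshold=0.20)  # other proteins
```

With these toy values, `c3` passes the 0.20 threshold but not the stricter 0.25 threshold used for ALK.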
Selection of template proteins and ligands
The template protein structures that we used for in silico drug discovery were selected as follows: first, each template must be used as a model protein for the ligand binding region of the target protein; second, the protein structures that ligands could bind to were selected as templates; and third, as many as possible of the ligands that could bind to several of the model proteins, including those not selected as templates, were selected and fitted to a template protein. These ligands were the "fingerprint" for drug discovery and were used to compute the Tanimoto index.
Docking simulation using ChooseLD
Docking between the screened compounds and template proteins was performed using ChooseLD. The FPAScore (minimization of free energy between each compound and template protein) was computed ten times for each compound. The compounds were ranked based on the best score among the ten values. The whole computation was performed independently three times and consistency between the three trials was evaluated.
Estimation of coincidence of highly ranked compounds between three independent trials
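The best-of-ten ranking can be sketched as below. The scores and compound names are hypothetical, and the function does not call ChooseLD. Treating a higher FPAScore as better follows the later remark in this paper that compounds with higher FPAScores are regarded as more reliable; the `higher_is_better` flag is included in case the opposite convention applies.

```python
def rank_by_best_score(scores_by_compound, higher_is_better=True):
    """Rank compounds by their best score over repeated docking runs."""
    pick = max if higher_is_better else min
    best = {name: pick(runs) for name, runs in scores_by_compound.items()}
    return sorted(best, key=best.get, reverse=higher_is_better)

# Hypothetical scores (the study uses ten runs per compound).
trial = {
    "a": [1.0, 3.5, 2.0],
    "b": [4.0, 0.5, 1.2],
    "c": [2.5, 2.6, 2.4],
}
ranking = rank_by_best_score(trial)
```

Repeating this for each of the three independent trials yields three rankings whose agreement can then be compared.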
When the number of highly ranked compounds selected in common between the three independent trials is much less than this number and is close to k, we can conclude that consistency between the three trials is high.
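One way to quantify this coincidence is to count the compounds that appear in the top k of every trial and compare that count with a chance baseline. The baseline used in the original article is not preserved in this text. The sketch below assumes the simplest one: if each trial picked k of N compounds uniformly at random and independently, the expected number shared by all three trials would be N(k/N)^3 = k^3/N^2. Function names and rankings are hypothetical.

```python
def common_top_k(rankings, k):
    """Compounds that appear in the top k of every ranking."""
    top_sets = [set(r[:k]) for r in rankings]
    return set.intersection(*top_sets)

def expected_random_overlap(n_compounds, k, n_trials=3):
    """Expected shared count if each trial chose k compounds at random."""
    return n_compounds * (k / n_compounds) ** n_trials

# Hypothetical rankings from three independent trials.
trial_1 = ["a", "b", "c", "d", "e", "f"]
trial_2 = ["b", "a", "c", "e", "d", "f"]
trial_3 = ["a", "c", "d", "b", "e", "f"]

shared = common_top_k([trial_1, trial_2, trial_3], k=3)
baseline = expected_random_overlap(n_compounds=6, k=3)
```

In this toy case two compounds are shared among the top three, against a random expectation of 0.75.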
Estimation of genotype-specific DNA methylation
The genotype is specifically demethylated/methylated in the tumor tissue compared with other genotypes (strength of aberrant DNA methylation).
The genotype is abundant in the tumor tissue (abundance of aberrant DNA methylation).
The best balance between these two conditions is not easy to estimate, because there is no standard understanding about the kind of gene abnormalities that generally cause tumors. In this study, we used three kinds of samples: blood, normal and tumor tissues. This made the comparisons more difficult than a comparison between only normal and tumor tissues, because we are not sure if normal tissue is an expected intermediate between blood and tumor. To avoid uncertainties that this complicated situation might cause when estimating genotype-specific DNA methylation, we employed a recently proposed PCA-based unsupervised feature selection method. This procedure does not require the user to select the criterion that is used to estimate genotype-specific DNA methylation. It is necessary simply to select the suitable PC by which the SNPs with genotype-specific DNA methylation are selected.
Genotype-specific DNA methylation estimated using the Nsp microarray data
Intersection between top N outliers between DNA methylation and genotype.
All three associated P-values adjusted by the BH criterion are less than 0.05 when three pairwise one-sided t-tests (tumor tissue vs normal tissue, normal tissue vs blood, tumor tissue vs blood) are applied.
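The BH (Benjamini-Hochberg) adjustment named in the second criterion can be sketched as follows. This is a generic step-up implementation with hypothetical P-values and a hypothetical function name, not the authors' code; it is intended to behave like a standard BH routine such as `p.adjust(..., method = "BH")` in R.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P-values for four probes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
passed = [q < 0.05 for q in adj]
```

A probe would satisfy the criterion only if its adjusted P-value is below 0.05 in each of the three pairwise comparisons.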
SNPs selected for DNA methylation and genotype measured by the Nsp microarray.
Genotype-specific DNA methylation estimated using the Sty microarray data
SNPs measured by the Sty microarray using PC4 for genotype and PC3 for DNA methylation.
Estimation of optimal N
SNPs measured by the Sty microarray using PC3 for genotype and PC4 for DNA methylation.
Comparison with other methods
To our knowledge, no feature selection methods are currently available that can be applied to three-class data sets without prior knowledge about the internal ranking between the classes. Although our method requires the manual selection of the PCs used for feature selection, no prior knowledge about the ranking between classes is needed, and how the classes should be ranked is quite clear from the PCs (Figures 1, 2, 4, and 5). Thus, there are no other methods that can be compared directly with our method.
However, because we now know that the rank between the classes is blood < normal tissue < tumor tissue, we have applied other methods that require this prior knowledge.
Comparison of our method with other feature selection methods.
The Pearson correlation-based, Spearman correlation-based, and PLS-based feature selection methods successfully selected the 300 topmost SNPs for genotype and DNA methylation. However, the number of SNPs selected in common between genotype and DNA methylation was smaller than the numbers selected in the present study (Table 4). Thus, our method clearly outperforms the other methods in selecting the genes in common between genotype and DNA methylation.
Properties of the selected SNPs
Almost all selected SNPs were located outside the protein-coding regions of the genes (see Additional file 1). The only exceptions were SNP_A-4242077 (associated with PIWIL1), SNP_A-4288260 (associated with PIGO), and SNP_A-1988914 (associated with TARBP1). Thus, the majority of the SNPs are presumably related to the regulation of gene expression. The SNPs that were not located in protein-coding regions were located in the promoters (identified as "upstream" in Additional file 1), and also in introns and in the downstream regions of genes. Thus, the effect of genotype-specific DNA methylation on gene expression is not straightforward.
In addition, some of the selected SNPs have not been reported in Chinese populations, although all patients in the microarray data sets that we used in this study were Chinese. This finding indicates that we have correctly selected mutations that may cause cancer formation.
Screening of cancer-related genes
To determine if the selected SNPs are biologically related to cancers, the genes containing the SNPs were annotated using Gendoo [13, 14]. The RefSeq mRNA IDs of the genes were extracted from GEO and mapped to gene symbols (Additional file 2). The gene symbols were uploaded to the Gendoo server and the diseases that were reported to be associated with each of the gene symbols were listed (see Additional file 3). We found that 86 of the 155 genes listed in Additional file 2 were associated with at least one cancer-related disease. In addition, we performed a literature search to find papers that reported the relationship between any of the 86 selected genes and cancers, because the Gendoo server annotation is based on automated text-mining and may include some misinterpretations. We found that most of the 86 genes were mentioned in at least one published paper that described their relationship with cancer (see Additional file 4). Thus, we confirmed that more than half (86) of the 155 genes screened by our method were cancer-related genes. In particular, twelve genes (CCND1, CCNL1, CKAP4, CRABP1, FGF3, GRHL2, MYEOV, PKP4, RAP2B, RPL14, SMAD3, ZNF639) were associated with "Carcinoma, Squamous Cell" and eleven genes (CCND1, CKAP4, CRABP1, EVI1, FGF3, MYEOV, PKP4, RPL14, SMAD3, TMEM16A, ZNF639) were associated with "Esophageal Neoplasms". Among them, nine genes are associated with both. Because this study used data sets for ESCC (esophageal squamous cell carcinoma), this association is reasonable and demonstrates the reliability of our method.
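The overlap between the two gene lists quoted above can be checked with a simple set intersection. The sketch below only restates the gene symbols given in the text (treating "ZNf639" as the same symbol as ZNF639, which is an assumption about a typographical variant); the variable names are illustrative.

```python
# Genes reported as associated with "Carcinoma, Squamous Cell".
squamous = {
    "CCND1", "CCNL1", "CKAP4", "CRABP1", "FGF3", "GRHL2",
    "MYEOV", "PKP4", "RAP2B", "RPL14", "SMAD3", "ZNF639",
}

# Genes reported as associated with "Esophageal Neoplasms".
esophageal = {
    "CCND1", "CKAP4", "CRABP1", "EVI1", "FGF3", "MYEOV",
    "PKP4", "RPL14", "SMAD3", "TMEM16A", "ZNF639",
}

# Genes associated with both disease terms.
both = sorted(squamous & esophageal)
print(len(squamous), len(esophageal), len(both))
print(both)
```

The shared genes are CCND1, CKAP4, CRABP1, FGF3, MYEOV, PKP4, RPL14, SMAD3, and ZNF639, consistent with the count of nine stated in the text.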
Genes with genotype-specific DNA methylation are less methylated than expected
[Table: t-tests of microarray measurements between genotype and DNA methylation for blood, normal and tumor tissues. Value shown: 3.1 × 10^-12.]
[Table: t-tests of randomly sampled SNPs between genotype and DNA methylation. Column: number of significant P-values. Values shown: 6.9 × 10^-4, 6.12 × 10^-4, 9.56 × 10^-3.]
Structure prediction of the proteins associated with selected genes
Although we selected genes with genotype-specific DNA methylation, for therapeutic purposes, we need to design drugs for the proteins that are encoded by these genes. To identify candidate drugs computationally, the tertiary structures of the target proteins are required as templates. However, the structures of many of the encoded proteins have not been reported.
To obtain the tertiary structures of these proteins, we used two protein structure prediction servers, FAMS [18, 19] and Phyre2 [20, 21], to predict each structure using only the amino acid sequence of the protein (see Additional file 5 for the amino acid sequences, in FASTA format, that were used to predict the tertiary structures of the proteins).
The results of the protein structure predictions are summarized in Additional file 4. Some protein structures were already in the Protein Data Bank (PDB); the others were modeled using the structure of a suitable reference protein. These structures were then used as templates to predict drug candidates in silico.
For the proteins that were not in the PDB, we sought cancer-related papers that cited the reference proteins used for the structure prediction. The references to these papers are listed in Additional file 4. Most of the reference proteins used for structure prediction were cancer-related. This finding also suggests that our gene selection process and protein structure prediction are plausible.
In silico drug discovery
After the FPAScores were estimated (see Methods and Figure 9), we tested the coincidence between the three independent trials in two ways. First, we computed the correlation coefficients between the three trials. For all pairwise computations for ALK, EGLN3, and NUAK1, the correlation coefficients were greater than 0.9. This suggests that the FPAScores computed by ChooseLD were highly reproducible. (For actual values of the correlation coefficients and scatter plots, see Additional file 6.) However, the correlation coefficients represent the overall reproducibility of the FPAScores for the candidate drug compounds. It is more important that the compounds with higher FPAScores, i.e., those regarded as being highly reliable, were reproducible. Therefore, we checked how often the highly ranked compounds were selected in common between the three trials and found that the selection of the highly ranked compounds was also highly reproducible (see Additional file 7).
[Table: The 10 top-ranked compounds as drug targets for ALK, EGLN3, and NUAK1. Column: representative target cancer genes. Entries, in source order: ALK, c-MET, LCK, TRKA, TRKB, TIE2, ABL; ITK, SYK, MAPKAPK2, GSK3, CSK, CDK, PIK3CG, ZAP-70; RPS6KA4, POR(P450), SGK1, NOS1, DPYD, DHODH; EGLN3 (with Fe); EGLN3 (without Fe); EGLN1, PHD2, HIF1A; already listed in EGLN3 (with Fe) (six rows); CSF1R and others (four rows); PKD1 and others.]
In this paper, we investigated genotype-specific DNA methylation in esophageal squamous cell carcinoma, using principal component analysis. We identified more than 100 genotype-specific DNA methylation SNPs associated with the disease. Among 155 genotype-specific DNA methylation associated genes, 86 were associated with cancers using the Gendoo server. The structures of proteins encoded by selected genotype-specific DNA methylation associated genes were predicted successfully using two profile-based methods, FAMS and Phyre2. Candidate drug compounds were screened from DrugBank using the Tanimoto index and were evaluated by ChooseLD for three selected proteins, ALK, EGLN3 and NUAK1. The selected drug candidates are promising starting points for future studies.
We would like to thank Dr. Katsuichiro Komatsu who helped with the in silico drug screening using ChooseLD.
This research was funded by KAKENHI, 23300357 and Chuo University Joint Research Grant.
This article has been published as part of BMC Systems Biology Volume 8 Supplement 1, 2014: Selected articles from the Twelfth Asia Pacific Bioinformatics Conference (APBC 2014): Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/8/S1.
- Shen L, Kondo Y, Guo Y, Zhang J, Zhang L, Ahmed S, Shu J, Chen X, Waterland RA, Issa JP: Genome-wide profiling of DNA methylation reveals a class of normally methylated CpG island promoters. PLoS Genet. 2007, 3 (10): 2023-2036.
- McCabe MT, Brandes JC, Vertino PM: Cancer DNA methylation: molecular mechanisms and clinical implications. Clin Cancer Res. 2009, 15 (12): 3927-3937. 10.1158/1078-0432.CCR-08-2784.
- Pasche B, Yi N: Candidate gene association studies: successes and failures. Curr Opin Genet Dev. 2010, 20 (3): 257-261. 10.1016/j.gde.2010.03.006.
- Zhou W: Mapping genetic alterations in tumors with single nucleotide polymorphisms. Curr Opin Oncol. 2003, 15: 50-54. 10.1097/00001622-200301000-00007.
- Scherf DB, Sarkisyan N, Jacobsson H, Claus R, Bermejo JL, Peil B, Gu L, Muley T, Meister M, Dienemann H, Plass C, Risch A: Epigenetic screen identifies genotype-specific promoter DNA methylation and oncogenic potential of CHRNB4. Oncogene. 2012.
- Opavsky R, Wang SH, Trikha P, Raval A, Huang Y, Wu YZ, Rodriguez B, Keller B, Liyanarachchi S, Wei G, Davuluri RV, Weinstein M, Felsher D, Ostrowski M, Leone G, Plass C: CpG island methylation in a mouse model of lymphoma is driven by the genetic configuration of tumor cells. PLoS Genet. 2007, 3 (9): 1757-1769.
- Abnet CC, Wang Z, Song X, Hu N, Zhou FY, Freedman ND, Li XM, Yu K, Shu XO, Yuan JM, Zheng W, Dawsey SM, Liao LM, Lee MP, Ding T, Qiao YL, Gao YT, Koh WP, Xiang YB, Tang ZZ, Fan JH, Chung CC, Wang C, Wheeler W, Yeager M, Yuenger J, Hutchinson A, Jacobs KB, Giffen CA, Burdett L, Fraumeni JF, Tucker MA, Chow WH, Zhao XK, Li JM, Li AL, Sun LD, Wei W, Li JL, Zhang P, Li HL, Cui WY, Wang WP, Liu ZC, Yang X, Fu WJ, Cui JL, Lin HL, Zhu WL, Liu M, Chen X, Chen J, Guo L, Han JJ, Zhou SL, Huang J, Wu Y, Yuan C, Huang J, Ji AF, Kul JW, Fan ZM, Wang JP, Zhang DY, Zhang LQ, Zhang W, Chen YF, Ren JL, Li XM, Dong JC, Xing GL, Guo ZG, Yang JX, Mao YM, Yuan Y, Guo ET, Zhang W, Hou ZC, Liu J, Li Y, Tang S, Chang J, Peng XQ, Han M, Yin WL, Liu YL, Hu YL, Liu Y, Yang LQ, Zhu FG, Yang XF, Feng XS, Wang Z, Li Y, Gao SG, Liu HL, Yuan L, Jin Y, Zhang YR, Sheyhidin I, Li F, Chen BP, Ren SW, Liu B, Li D, Zhang GF, Yue WB, Feng CW, Qige Q, Zhao JT, Yang WJ, Lei GY, Chen LQ, Li EM, Xu LY, Wu ZY, Bao ZQ, Chen JL, Li XC, Zhuang X, Zhou YF, Zuo XB, Dong ZM, Wang LW, Fan XP, Wang J, Zhou Q, Ma GS, Zhang QX, Liu H, Jian XY, Lian SY, Wang JS, Chang FB, Lu CD, Miao JJ, Chen ZG, Wang R, Guo M, Fan ZL, Tao P, Liu TJ, Wei JC, Kong QP, Fan L, Wang XZ, Gao FS, Wang TY, Xie D, Wang L, Chen SQ, Yang WC, Hong JY, Wang L, Qiu SL, Goldstein AM, Yuan ZQ, Chanock SJ, Zhang XJ, Taylor PR, Wang LD: Genotypic variants at 2q33 and risk of esophageal squamous cell carcinoma in China: a meta-analysis of genome-wide association studies. Hum Mol Genet. 2012, 21 (9): 2132-2141. 10.1093/hmg/dds029.
- Maeng CH, Lee J, van Hummelen P, Park SH, Palescandolo E, Jang J, Park HY, Kang SY, MacConaill L, Kim KM, Shim YM: High-throughput genotyping in metastatic esophageal squamous cell carcinoma identifies phosphoinositide-3-kinase and BRAF mutations. PLoS ONE. 2012, 7 (8): e41655. 10.1371/journal.pone.0041655.
- Wang Y, Ji R, Wei X, Gu L, Chen L, Rong Y, Wang R, Zhang Z, Liu B, Xia S: Esophageal squamous cell carcinoma and ALDH2 and ADH1B polymorphisms in Chinese females. Asian Pac J Cancer Prev. 2011, 12 (8): 2065-2068.
- Yang HH, Hu N, Wang C, Ding T, Dunn BK, Goldstein AM, Taylor PR, Lee MP: Influence of genetic background and tissue types on global DNA methylation patterns. PLoS ONE. 2010, 5 (2): e9355. 10.1371/journal.pone.0009355.
- Ishida S, Umeyama H, Iwadate M, Taguchi YH: Bioinformatic screening of autoimmune disease genes and protein structure prediction with FAMS for drug discovery. Protein Pept Lett.
- Holm S: A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics. 1979, 6 (2): 65-70.
- Nakazato T, Bono H, Matsuda H, Takagi T: Gendoo: functional profiling of gene and disease features using MeSH vocabulary. Nucleic Acids Res. 2009, 37 (Web Server): W166-169. 10.1093/nar/gkp483.
- Gendoo. [http://gendoo.dbcls.jp/]
- Student S, Fujarewicz K: Stable feature selection and classification algorithms for multiclass microarray data. Biol Direct. 2012, 7: 33. 10.1186/1745-6150-7-33.
- R Core Team: R: A Language and Environment for Statistical Computing. 2013, R Foundation for Statistical Computing, Vienna, Austria, [http://www.R-project.org/]
- Tibshirani R: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological). 1996, 267-288.
- FAMS. [http://fams.bio.chuo-u.ac.jp/fams/]
- Umeyama H, Iwadate M: FAMS and FAMSBASE for protein structure. Curr Protoc Bioinformatics. 2004, Chapter 5: Unit 5.2.
- Phyre2. [http://www.sbg.bio.ic.ac.uk/phyre2/]
- Kelley LA, Sternberg MJ: Protein structure prediction on the Web: a case study using the Phyre server. Nat Protoc. 2009, 4 (3): 363-371. 10.1038/nprot.2009.2.
- DrugBank. [http://www.drugbank.ca/]
- Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V, Djoumbou Y, Eisner R, Guo AC, Wishart DS: DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Res. 2011, 39 (Database): D1035-1041. 10.1093/nar/gkq1126.
- O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR: Open Babel: An open chemical toolbox. J Cheminform. 2011, 3: 33. 10.1186/1758-2946-3-33.
- Takaya D, Takeda-Shitaka M, Terashi G, Kanou K, Iwadate M, Umeyama H: Bioinformatics based Ligand-Docking and in-silico screening. Chem Pharm Bull. 2008, 56 (5): 742-744. 10.1248/cpb.56.742.
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological). 1995, 289-300.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res. 2000, 28: 235-242. 10.1093/nar/28.1.235.
- ChEMBL. [https://www.ebi.ac.uk/chembl/]
- Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP: ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40 (Database): D1100-1107.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.