- Open Access
Integrative analysis of methylation and transcriptional profiles to predict aging and construct aging specific cross-tissue networks
© The Author(s). 2016
- Published: 23 December 2016
Aging is a complex process that involves multi-scale omics data. Finding key age markers in normal tissues could help to provide reliable age predictions in humans. However, age prediction from multi-omics data that is both accurate and biologically informative has not been performed systematically, and the related cross-tissue analysis has not been fully investigated either.
Here we have developed an improved prediction pipeline, the Integrating and Stepwise Age-Prediction (ISAP) method, to regress age and find key aging markers effectively. Furthermore, we have performed a series of network analyses, covering the PPI network, cross-tissue networks and pathway interaction networks.
Our results reveal important coordinated aging patterns between different tissues. Both the co-profiling and the cross-pathway analyses identify aging functions more thoroughly, and could help to find aging markers and pathways and to support research on aging-related diseases.
- Gene Ontology
- KEGG Pathway
- Methylation Data
- Hypergeometric Test
Aging is a multi-faceted and progressive biological process in many organisms. The aging process is composed of a series of complex, dynamic molecular interactions, which underlie key physiological phenotypes of human health. Moreover, dysfunctions of the aging process have been shown to be involved in many disorders, such as Parkinson disease, Alzheimer's disease and many kinds of cancer. Therefore, finding important aging markers could provide opportunities to predict health factors and to improve both diagnosis and prognosis.
It has been reported that the profiles of crucial DNA methylation and mRNA markers change with chronological age. For example, many single-tissue predictors (based on methylation or expression data) have been applied to identify aging biomarkers. In addition, a multi-tissue predictor based on methylation data has been used to analyze aging functions. Predicting age from multi-scale genome-wide data in normal tissues could therefore provide reliable estimates of aging-related disease risk. As a result, a method that integrates multi-omics data (i.e. epigenome and transcriptome data) with high predictive ability and biologically meaningful results is required to analyze the aging process.
On the other hand, finding interactions between biomarkers is also important for characterizing tissue-level and individual-level changes, such as phenotype stage and disease outcome [9, 10]. Reconstructing molecular networks also provides a systematic approach to dealing with multi-scale data in aging analysis. A previous study constructed cross-tissue aging networks based on tissue-specific microarray mRNA data in mouse. Nevertheless, integrative aging networks built from multi-scale data (i.e. methylation and expression) have not been fully constructed in humans.
In the present work, we developed a computational pipeline, the Integrating and Stepwise Age-Prediction (ISAP) method, to regress and predict age by integrating methylation and expression data from 9 normal human tissues. The improved method found key integrative markers for both multi-tissue and tissue-specific models with high accuracy. Furthermore, a series of network analyses, covering the shortest-path Protein-Protein Interaction (PPI) network, cross-tissue co-profiling networks and pathway interaction networks, revealed coordinated aging patterns at both the multi-omics profiling level and the functional level. The results showed that integrative age-correlated profiles were associated with important pathway characteristics.
Data and pre-process
We obtained paired methylation, transcription and clinical data for 9 different normal tissues with more than 10 samples each (BLCA: Bladder Urothelial tissue, BRCA: Breast invasive tissue, HNSC: Head/Neck squamous cell tissue, KIRC: Kidney renal clear cell tissue, KIRP: Kidney renal papillary cell tissue, LIHC: Liver tissue, LUAD: Lung tissue, PRAD: Prostate tissue, THCA: Thyroid tissue) from The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov) platform, using level-3 data only. The age of each person came from the TCGA clinical data. TCGA methylation data come from either the Illumina Infinium HumanMethylation27 BeadChip or the Illumina Infinium HumanMethylation450 BeadChip, so only Illumina probe IDs present on both platforms were selected for further analysis. Transcription data were obtained from RNASeq-v2 data (level-3). Both methylation and expression data were treated with a Singular Value Decomposition (SVD) method (regressing the first 3 principal components) to assess the sources of inter-sample variation separately in each tissue, and were then normalized to have zero mean and unit variance.
The choice of training data sets was guided by the same criteria as in the previous study: first, the training data should represent a wide spectrum of tissues and cell types; second, the mean age in the training data should be comparable to that of the test data. As a result, to predict the age of each person in the multi-tissue model, 6 tissues (Bladder, Breast, Head/Neck, Kidney renal clear cell, Lung and Thyroid) were used as the training data (mean age ≈ 58 years). The remaining 3 tissues (Kidney renal papillary cell, Liver and Prostate) were used as the independent test data (mean age ≈ 61 years). To train the multi-tissue model, each of the six training tissues was held out in turn as a temporary test set, so the cross-validation of the multi-tissue model was 6-fold. To train the tissue-specific models, standard 5-fold cross-validation was performed for each tissue separately.
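The 6-fold scheme described above, in which each training tissue is held out in turn, can be sketched as follows. This is a minimal illustration rather than the authors' code; the function name `leave_one_tissue_out` and the toy sample labels are assumptions made for the example.

```python
def leave_one_tissue_out(sample_tissues):
    """Yield (held_out_tissue, train_indices, test_indices), one fold per tissue."""
    for held_out in sorted(set(sample_tissues)):
        train_idx = [i for i, t in enumerate(sample_tissues) if t != held_out]
        test_idx = [i for i, t in enumerate(sample_tissues) if t == held_out]
        yield held_out, train_idx, test_idx


# Toy labels (hypothetical): one tissue code per sample in the training data.
sample_tissues = ["BLCA", "BRCA", "BRCA", "HNSC", "KIRC", "LUAD", "LUAD", "THCA"]
folds = list(leave_one_tissue_out(sample_tissues))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", len(train_idx), "test:", len(test_idx))
```

With six distinct tissues the generator produces six folds, and every sample is used as test data exactly once.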
The least absolute shrinkage and selection operator (Lasso) is a regression method that performs both variable selection and regularization to improve the prediction accuracy and interpretability of the statistical model. In this work, Lasso was used to regress age on methylation data, and the value of the penalty parameter λ was determined by cross-validation.
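To make the role of the penalty parameter λ concrete, here is a small coordinate-descent sketch of Lasso in plain Python. It assumes the common objective (1/2n)·||y − Xb||² + λ·||b||₁ with centred data and no intercept; the scaling of the objective, the function names and the toy data are assumptions, and the cross-validated choice of λ is not shown.

```python
def soft_threshold(value, lam):
    """Shrink `value` towards zero by `lam` (the Lasso proximal step)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X @ coef while coef is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero feature carries no information
            # Covariance of feature j with the residual once its own effect is added back.
            rho = sum(c * (r + c * coef[j]) for c, r in zip(col, residual)) / n
            new_value = soft_threshold(rho, lam) / z
            delta = new_value - coef[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                coef[j] = new_value
    return coef


# Toy data (hypothetical): y depends only on the first feature.
X = [[-1.0, 1.0], [0.0, -2.0], [1.0, 1.0]]
y = [-2.0, 0.0, 2.0]
coef_unpenalised = lasso_coordinate_descent(X, y, lam=0.0)
coef_shrunk = lasso_coordinate_descent(X, y, lam=0.5)
coef_zeroed = lasso_coordinate_descent(X, y, lam=2.0)
print(coef_unpenalised, coef_shrunk, coef_zeroed)
```

Larger values of λ shrink the coefficients and eventually set all of them to zero, which is why λ has to be tuned, for example by cross-validation as described above.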
Partial least-square (PLS) regression
The partial least-square (PLS) regression method is often used for dimension reduction when dealing with small samples of gene expression data. The algorithm is performed mainly as described by Höskuldsson.
In this work, PLS was used to transform the high-dimensional expression data before stepwise regression, and the number of leading PLS direction vectors was finally determined (after stepwise regression) by cross-validation.
Sort gene expressions by their absolute correlation coefficients with the output in descending order. In this work the output was the residuals of ages from Lasso.
Add each sorted gene expression after PLS transformation individually. The number of selected genes was determined by cross-validation.
Integrating and stepwise age-prediction method
Predict age using Lasso regression based on methylation data. The penalty parameter λ in Lasso is determined by cross-validation on the training data. Save the residuals after Lasso regression.
Sort expression features (genes) by their absolute correlation coefficients with the residuals in descending order.
Predict the residuals by the forward stepwise regression method based on expression data. The expression data for the selected features (genes) are first transformed into a full-rank matrix by PLS, and the number of selected genes is determined by cross-validation on the training data.
For the selected expression features (genes), determine the number of leading PLS direction vectors by cross-validation on the training data.
Calculate the final regression coefficients of the aging markers (both methylation and expression), and save the regression coefficients.
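The hand-off between the Lasso stage and the expression stage described above can be sketched in plain Python. The sketch only covers computing the residuals, ranking genes by absolute Pearson correlation with them, and choosing how many top-ranked genes to keep. The PLS transformation and the actual cross-validation are not implemented; `cv_error` is a hypothetical caller-supplied function standing in for them, and all names and toy values are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_genes_by_residual_correlation(expression, ages, stage_one_predictions):
    """Return (genes sorted by |correlation with residuals|, residuals)."""
    residuals = [a - p for a, p in zip(ages, stage_one_predictions)]
    scores = {gene: abs(pearson(values, residuals)) for gene, values in expression.items()}
    return sorted(scores, key=scores.get, reverse=True), residuals


def choose_top_genes(ranked_genes, cv_error):
    """Keep the top-k prefix of the ranking that minimises the supplied error."""
    best_k = min(range(1, len(ranked_genes) + 1),
                 key=lambda k: cv_error(ranked_genes[:k]))
    return ranked_genes[:best_k]


# Toy inputs (hypothetical): ages, stage-one (Lasso) predictions, gene expression.
ages = [30, 40, 50, 60, 70]
stage_one_predictions = [32, 39, 50, 58, 72]
expression = {
    "G1": [-2, 1, 0, 2, -2],    # tracks the residuals exactly
    "G2": [1, 1, 1, 1, 1],      # constant, so uninformative
    "G3": [2, -1, 0, -2, 2.5],  # strongly anti-correlated with the residuals
}
ranked, residuals = rank_genes_by_residual_correlation(expression, ages, stage_one_predictions)
# Placeholder error function (assumption): pretends two genes give the lowest CV error.
selected = choose_top_genes(ranked, cv_error=lambda genes: abs(len(genes) - 2))
print(ranked, selected)
```

Because the second stage models only the residuals, expression features are rewarded for explaining the part of age that the methylation model missed, rather than for repeating what it already captures.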
Protein-protein interaction (PPI) network
The background weighted PPI network was constructed using data from the STRING database (http://string-db.org/, version 10). STRING weights protein-protein interactions by calculating confidence scores. In this work, a confidence score of 70% (>700) was used as the cut-off for further analysis. For each pair of selected integrative markers, the shortest path in the PPI network was calculated using the Dijkstra algorithm. Finally, the PPI network consisting of the shortest paths among the selected markers was constructed, and the proteins/genes in this network were sorted by their betweenness in descending order.
To test whether the top-betweenness genes were simply hubs in the background network, we ran a permutation test. We drew random gene sets of the same size as the selected set of aging markers, computed the shortest paths between the randomly selected genes, and counted how often the top genes had greater betweenness in the random networks than in our study. We repeated this process 1000 times, and the p-value was calculated as the proportion of the 1000 permutations in which this occurred.
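The shortest-path network and the permutation test described above can be illustrated with a small standard-library sketch. Several details are assumptions rather than statements from the text: the edge cost is taken as 1000 minus the STRING score so that high-confidence edges are short, betweenness is approximated by counting how often a gene appears as an intermediate node on the marker-to-marker shortest paths, and the toy edge list and all function names are hypothetical.

```python
import heapq
import itertools
import random


def build_graph(edges, cutoff=700):
    """Keep edges with score > cutoff; cost = 1000 - score is an assumption."""
    graph = {}
    for a, b, score in edges:
        if score > cutoff:
            graph.setdefault(a, {})[b] = 1000 - score
            graph.setdefault(b, {})[a] = 1000 - score
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list of one shortest path, or None."""
    dist, prev, done = {source: 0}, {}, set()
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for neighbour, cost in graph.get(node, {}).items():
            if d + cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = d + cost
                prev[neighbour] = node
                heapq.heappush(heap, (d + cost, neighbour))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def path_occurrences(graph, markers):
    """Count how often each gene is an intermediate node on marker-pair paths."""
    counts = {}
    for a, b in itertools.combinations(sorted(markers), 2):
        path = shortest_path(graph, a, b)
        for node in (path or [])[1:-1]:
            counts[node] = counts.get(node, 0) + 1
    return counts


def permutation_p_value(graph, gene, observed, n_markers, n_perm=1000, seed=0):
    """Fraction of random marker sets in which `gene` exceeds its observed count."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    hits = sum(
        path_occurrences(graph, rng.sample(nodes, n_markers)).get(gene, 0) > observed
        for _ in range(n_perm)
    )
    return hits / n_perm


# Toy STRING-like edges (hypothetical): (protein, protein, confidence score).
edges = [("A", "H", 900), ("H", "B", 950), ("H", "C", 800), ("A", "B", 750), ("C", "D", 650)]
graph = build_graph(edges)
counts = path_occurrences(graph, ["A", "B", "C"])
p_value = permutation_p_value(graph, "H", counts["H"], n_markers=3, n_perm=100)
print(counts, p_value)
```

A small p-value here means that random gene sets of the same size rarely route more shortest paths through the gene than the aging markers do, which argues against the gene being prominent only because it is a hub of the background network.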
Construction of integrative cross-tissue co-profiling network of aging
Age ≥ 60 was considered the ‘old’ age group, and age ≤ 50 the ‘young’ age group. Tissues with more than 3 samples in both the young and the old group were selected to construct the cross-tissue network. As a result, 7 tissues were selected: Breast invasive tissue, Head/Neck squamous cell tissue, Kidney renal clear cell tissue, Kidney renal papillary cell tissue, Liver tissue, Lung tissue, and Thyroid tissue. The total number of tissue–tissue pairs was 21.
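The grouping rule and the pair count can be checked with a few lines of Python. The age thresholds and the 7 tissue codes come from the text; the function names and the toy ages are assumptions for illustration only.

```python
from itertools import combinations


def age_group(age):
    """Return 'old' for age >= 60, 'young' for age <= 50, otherwise None."""
    if age >= 60:
        return "old"
    if age <= 50:
        return "young"
    return None


def eligible_tissues(ages_by_tissue, min_samples=3):
    """Keep tissues with more than `min_samples` samples in both age groups."""
    kept = []
    for tissue, ages in ages_by_tissue.items():
        groups = [age_group(a) for a in ages]
        if groups.count("young") > min_samples and groups.count("old") > min_samples:
            kept.append(tissue)
    return kept


# Toy ages (hypothetical): T1 has 4 young and 4 old samples, T2 has only 1 young sample.
toy_ages = {"T1": [30, 35, 40, 45, 62, 65, 70, 75], "T2": [30, 55, 61, 62, 63, 64]}
kept = eligible_tissues(toy_ages)

# The 7 tissues selected in the study give 7 * 6 / 2 unordered pairs.
tissues = ["BRCA", "HNSC", "KIRC", "KIRP", "LIHC", "LUAD", "THCA"]
tissue_pairs = list(combinations(tissues, 2))
print(kept, len(tissue_pairs))
```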
Moreover, all the aging markers in the selected tissue-tissue pairs were tested for enrichment in GO terms by the hypergeometric test (FDR < 0.25) to find functional characteristics of tissue-tissue cross-talk.
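As an illustration of this kind of enrichment test, the sketch below computes a one-sided hypergeometric p-value and Benjamini-Hochberg adjusted values using only the standard library. The upper-tail form P(X ≥ k) and the variable names are assumptions, since the exact formula used in the study is not reproduced in this text; the numbers are toy values.

```python
from math import comb


def hypergeom_p_value(background, in_term, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `background` genes, of which `in_term` are annotated to the term."""
    upper = min(in_term, selected)
    favourable = sum(
        comb(in_term, i) * comb(background - in_term, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(background, selected)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Toy numbers (hypothetical): 10 background genes, 4 in the term, 3 markers, 2 overlapping.
p_term = hypergeom_p_value(background=10, in_term=4, selected=3, overlap=2)
fdr = benjamini_hochberg([0.01, 0.04, 0.03])
significant = [q < 0.25 for q in fdr]
print(p_term, fdr, significant)
```

Terms whose adjusted value falls below the chosen threshold (0.25 in the text) would be reported as enriched.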
Construction of integrative cross-tissue pathway interaction aging network
Selected aging markers in each tissue were tested for enrichment in KEGG pathways by the hypergeometric test using formula (1). Significant KEGG pathways (FDR < 0.25) in each of the 7 tissues were selected for further analysis. Three types of pathway interaction networks were considered: first, the sum of the absolute K-S value differences above a moderate threshold (i.e. > 0.6) was used as the connectivity of two pathways from two tissues; second, the sum of all the absolute K-S value differences (with no threshold) was used as the connectivity of two pathways between two tissues; third, the sum of all the absolute K-S value differences under a more rigorous KEGG enrichment cut-off (FDR < 0.1) was used as the connectivity.
Aging regression results
Table 1 Regression results of ISAP and other methods. Methods compared include Lasso (methylation and expression), PLS (methylation and expression) and elastic net (methylation; expression; methylation and expression).
Table 1 also shows that age prediction from methylation data is more accurate than prediction from expression data. Moreover, simply combining methylation and expression data did not improve the regression results compared with using methylation data alone (shown in Table 1). Therefore, our improved computational method integrates methylation and transcriptional data more effectively than other general methods.
Regression results in the multi-tissue model
Kidney renal clear cell
Kidney renal papillary cell
Regression results in tissue-specific models
Kidney renal clear cell
Kidney renal papillary cell
Functional/enrichment analysis and PPI-network
Firstly, the selected integrative biomarkers were sorted by their absolute regression weights in descending order, which indicates how strongly their profiles correlate with chronological age. For example, the marker with the greatest weight in the multi-tissue model was the methylation profile of the gene GPR45. The GPR45 protein functions in the central nervous system and has been reported to be significantly related to aging. Moreover, the marker with the greatest weight in the tissue-specific models was the expression of the gene CORO6 in the Kidney renal clear cell model, which has been reported to be regulated by age.
Furthermore, we performed enrichment analysis of the selected integrative biomarkers (multi-tissue and tissue-specific) on the Biological Process (BP) category of Gene Ontology (GO) and on KEGG pathways using the hypergeometric test. In the multi-tissue model, the top GO biological process and KEGG pathway were positive regulation of immune system process (GO:0007059, p-value = 1.6643e-08, FDR = 1.3308e-05) and cell adhesion molecules (CAMs, p-value = 1.4205e-06, FDR = 2.4517e-04), respectively. It has been reported that inflammatory gene sets, such as positive regulation of immune system process or dysfunction of the immune system, are induced by aging. In addition, according to the immune dysregulation theory, multiple changes occur in the immune system with aging and disrupt the regulation of body cells, which also coincides with our results. Many cell adhesion molecules have also been shown to depend on aging. In the tissue-specific models, the top GO biological process and KEGG pathway were negative regulation of phosphate metabolic process (GO:0045936, p-value = 9.7695e-05, FDR = 0.0157) in kidney renal papillary cell and the Antigen processing and presentation pathway (p-value = 3.3052e-06, FDR = 0.0006) in thyroid, respectively. Phosphate metabolic process has been reported to be related to aging diseases and cancer. Moreover, autophagy is an important mechanism for processing the antigens of intracellular pathogens, and dysfunction of autophagy is also regulated by aging. As a result, the enrichment analyses indicated that immune system functions are closely related to aging.
Top aging markers with their betweennesses in the PPI network
Integrative aging-specific cross-tissue co-profiling networks
Furthermore, each of the selected 31 pairs was examined to investigate whether the two genes in the pair share any GO terms. As a result, the pair of GATA4 in head/neck and EGFL7 in kidney renal clear cell shared the most GO terms (number = 4): system development (GO:0048731), multicellular organismal development (GO:0007275), anatomical structure development (GO:0048856) and animal organ development (GO:0048513). Since EGFL7 and GATA4 are related to aging [31, 32], it is possible that the expression of EGFL7 in the kidney is regulated by the expression of GATA4 in head/neck, with this regulation changing with age. All the shared terms were related to cell/tissue development, and tissue development is known to be correlated with aging.
In addition, enrichment analysis of GO terms by the hypergeometric test was also performed on all the aging-specific genes with the highest K-S differences (>0.85). The top GO term was positive regulation of caspase activity (GO:0043280, p-value = 9.8594e-05, FDR = 0.0717). It is well known that caspase-dependent apoptotic signaling is vital to many human aging diseases [30, 34].
Integrative aging-specific cross-tissue pathway interaction networks
In immunosenescence theories of aging, both innate and adaptive immune responses are related to the aging process. For instance, cellular senescence is believed to be involved in immune and aging processes and to suppress related pathways such as the cell cycle and cancer pathways [26, 30, 35]. Cell adhesion cascades appear to affect the functional capacity of cells during aging [27, 36]. Moreover, cell adhesion and neurotrophin signaling cooperatively perform critical functions in the aging process in response to the immune system [37, 38]. Our results indicate that the cross-talk among key pathways (i.e. cell cycle, cell adhesion and neurotrophin signaling) plays an important role in the aging process. Furthermore, head/neck and kidney might hold core positions in regulating the aging process and related pathways in other tissues.
Predicting age in normal human tissues is fundamental to aging research. In this paper, we developed an improved method, the ISAP pipeline, to integrate both methylation and expression data for age prediction. The ISAP method predicted age more accurately than other popular methods. Furthermore, the PPI network and enrichment analyses identified core aging genes and pathways.
In addition, network analysis can help to identify aging-related genes and pathways shared between different tissues. We performed a series of network analyses of aging-specific markers and found important profiling patterns and pathway interactions. Our results are consistent with existing aging theories and hypotheses and could support further aging research.
This article has been published as part of BMC Systems Biology Volume 10 Supplement 4, 2016: Proceedings of the 27th International Conference on Genome Informatics: systems biology. The full contents of the supplement are available online at http://bmcsystbiol.biomedcentral.com/articles/supplements/volume-10-supplement-4.
This work was supported by the National High Technology Research and Development Program (863 Program) (2012AA02A602, 2015AA020104), National Science and Technology Major Project (2012ZX09303013-015), Special funds for scientific research in the health industry (201302010).
The authors declare that publication costs of the article were funded by Natural Science Foundation of China and the National '973' Basic Research Program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Availability of data and materials
The data supporting the results of this article are included and cited within the article and its additional files.
YW performed the algorithm and analyzed the data; YW and TH wrote the manuscript; LL and LX designed and sponsored the study. All authors read and approved the manuscript.
The authors declared that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Finch CE. Longevity, Senescence, and the Genome. Chicago: University of Chicago Press; 1990.
- Xue H, Xian B, Dong D, Xia K, Zhu S, Zhang Z, et al. A modular network model of aging. Mol Syst Biol. 2007;3:147.
- Gilberto L. The relationship of Parkinson disease with aging. Arch Neurol. 2007;64(9):1242–6.
- McKhann GM, Albert MS, Grossman M, Miller B, Dickson D, et al. Clinical and pathological diagnosis of frontotemporal dementia: report of the Work Group on Frontotemporal Dementia and Pick’s Disease. Arch Neurol. 2001;58:1803–9.
- Grimes A, Chandra SB. Significance of cellular senescence in aging and cancer. Cancer Res Treat. 2009;41(4):187–95.
- Sood S, Gallagher IJ, Lunnon K, Rullman E, Keohane A, Crossland H, Phillips BE, Cederholm T, Jensen T, van Loon LJ, Lannfelt L, Kraus WE, Atherton PJ, Howard R, Gustafsson T, Hodges A, Timmons JA. A novel multi-tissue RNA diagnostic of healthy ageing relates to cognitive health status. Genome Biol. 2015;16:185.
- Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013;14:R115.
- Cao K, Chen-Plotkin AS, Plotkin JB, Wang L-S. Age-correlated gene expression in normal and neurodegenerative human brain tissues. PLoS One. 5(9):e13098. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947518/.
- Taylor IW, Linding R, Warde-Farley D, Liu Y, Pesquita C, Faria D, et al. Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nat Biotechnol. 2009;27:199–204.
- Huang T, Liu L, Qian Z, Tu K, Li Y, Xie L. Using GeneReg to construct time delay gene regulatory networks. BMC Res Notes. 2010;3:142.
- Tao H, Zhang J, Lu X, Dong X, Zhang L, Cai YD, Li YX. Crosstissue coexpression network of aging. OMICS. 2011;15(10):665–71.
- Hudson TJ, et al. International network of cancer genome projects. Nature. 2010;464:993–8.
- Teschendorff AE, Menon U, Gentry-Maharaj A, Ramus SJ, Gayther SA, et al. An epigenetic signature in peripheral blood predicts active ovarian cancer. PLoS One. 2009;4:e8274.
- Tibshirani R. Regression shrinkage and selection via the lasso. J Royal Stat Soc, Series B. 1996;58(1):267–88.
- Boulesteix AL, Strimmer K. Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Brief Bioinform. 2006;8:32–44.
- Höskuldsson A. PLS regression methods. J Chemometrics. 1988;2:211–28.
- Draper N, Smith H. Applied Regression Analysis. 2nd ed. New York: Wiley; 1981.
- Aravind S, Pablo T, Mootha VK, Sayan M, Ebert BL, Gillette MA, Amanda P, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS. 2005;102(43):15545–50.
- Gasper G, Rahman M. Basic hypergeometric series. Cambridge: Cambridge University Press; 2004. p. xxvi–428.
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Royal Stat Soc. 1995;57:289–300.
- Damian S, Andrea F, Stefan W, Kristoffer F, Davide H, Jaime H-C, Milan S, Alexander R, Alberto S, Tsafou KP, Michael K, Peer B, Jensen LJ, von Mering C. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43(Database issue):D447–52.
- Dijkstra E. A note on two problems in connection with graphs. Numerische Mathematik. 1959;1:269–71.
- Marsaglia G, Tsang W, Wang J. Evaluating Kolmogorov's Distribution. J Stat Software. 2003;8(18). https://www.researchgate.net/publication/5142829_Evaluating_Kolmogorov%27s_Distribution.
- Park S-K, Kim K, Page GP, Allison DB, Weindruch R, Prolla TA. Gene expression profiling of aging in multiple mouse strains: identification of aging biomarkers and impact of dietary antioxidants. Aging Cell. 2009;8(4):484–95.
- Sanchez D, Bajo-Grañeras R, Del Caño-Espinel M, Garcia-Centeno R, Garcia-Mateo N, Pascua-Maestro R, Ganfornina MD. Aging without Apolipoprotein D: Molecular and cellular modifications in the hippocampus and cortex. Exp Gerontol. 2015;67:19–47.
- Castelo-Branco C, Soveral I. The immune system and aging: a review. Gynecol Endocrinol. 2014;30(1):16–22.
- Volker R, Fausi R, Kathrin P, Bettina H, Wolfgang R, Thomas K. Circulating Vascular Cell Adhesion Molecules VCAM-1, ICAM-1, and E-Selectin in Dependence on Aging. Gerontology. 2003;49:293–300.
- Matthew D, Victor D-U, Jianhua Z. Cellular Metabolic and Autophagic Pathways: Traffic Control by Redox Signaling. Free Radic Biol Med. 2013;63:207–21.
- Ana Maria C, Fernando M. Autophagy and the immune function in aging. Curr Opin Immunol. 2014;0:97–104.
- Galluzzi L, Kepp O, Kroemer G. TP53 and MTOR cross-talk to regulate cellular senescence. AGING. 2010;2(9). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2984599/.
- Fitch MJ, Luisa C, Frank K, Stuhlmann H. Egfl7, a Novel Epidermal Growth Factor-Domain Gene Expressed in Endothelial Cells. Dev Dyn. 2004;230(2):316–24.
- Chanhee K, Qikai X, Martin TD, Li MZ, Marco D, Liviu A, Tao L, Yankner BA, Judith C, Elledge SJ. The DNA damage response induces inflammation and senescence by inhibiting autophagy of GATA4. Science. 2015;349(6255):aaa5612.
- Loeser RF. Age-Related Changes in the Musculoskeletal System and the Development of Osteoarthritis. Clin Geriatr Med. 2010;26(3):371–86.
- Favaloro B, Allocati N, Graziano V, Di Ilio C, De Laurenzi V. Regulation of Translation Initiation in Eukaryotes: Mechanisms and Biological Targets. Cell. 2009;136(4):731–45. https://www.ncbi.nlm.nih.gov/pubmed/19239892.
- Grimes A, Chandra SB. Significance of cellular senescence in aging and cancer. Cancer Res Treat. 2009;41(4):187–95.
- Ponnappan S, Ponnappan U. Aging and Immune Function: Molecular Mechanisms to Interventions. Antioxid Redox Signal. 2011;14(8):1551–85. doi:10.1089/ars.2010.3228.
- Quartu M, Serra MP, Boi M, Melis T, Ambu R, Del Fiacco M. Brain-derived neurotrophic factor (BDNF) and polysialylated-neural cell adhesion molecule (PSA-NCAM): codistribution in the human brainstem precerebellar nuclei from prenatal to adult age. Brain Res. 2010;1363:49–62. doi:10.1016/j.brainres.2010.09.106.
- Butcher S, Chahel H, Lord JM. Ageing and the neutrophil: no appetite for killing? Immunology. 2000;100(4):411–6.