Genomic distance entrained clustering and regression modelling highlights interacting genomic regions contributing to proliferation in breast cancer
© Dexter et al; licensee BioMed Central Ltd. 2010
Received: 1 March 2010
Accepted: 8 September 2010
Published: 8 September 2010
Genomic copy number changes and regional alterations in epigenetic states have been linked to grade in breast cancer. However, the relative contribution of specific alterations to the pathology of different breast cancer subtypes remains unclear. The heterogeneity and interplay of genomic and epigenetic variations means that large datasets and statistical data mining methods are required to uncover recurrent patterns that are likely to be important in cancer progression.
We employed ridge regression to model the relationship between regional changes in gene expression and proliferation. Regional features were extracted from tumour gene expression data using a novel clustering method, called genomic distance entrained agglomerative (GDEC) clustering. Using gene expression data in this way provides a simple means of integrating the phenotypic effects of both copy number aberrations and alterations in chromatin state. We show that regional metagenes derived from GDEC clustering are representative of recurrent regions of epigenetic regulation or copy number aberrations in breast cancer. Furthermore, detected patterns of genomic alterations are conserved across independent oestrogen receptor positive breast cancer datasets. Sequential competitive metagene selection was used to reveal the relative importance of genomic regions in predicting proliferation rate. The predictive model suggested additive interactions between the most informative regions such as 8p22-12 and 8q13-22.
Data-mining of large-scale microarray gene expression datasets can reveal regional clusters of co-ordinate gene expression, independent of cause. By correlating these clusters with tumour proliferation we have identified a number of genomic regions that act together to promote proliferation in ER+ breast cancer. Identification of such regions should enable prioritisation of genomic regions for combinatorial functional studies to pinpoint the key genes and interactions contributing to tumourigenicity.
The field of breast cancer research was amongst the first to adopt genomic profiling tools such as array comparative genomic hybridisation (aCGH) and DNA methylation analysis in order to investigate the molecular basis of disease progression. Studies using aCGH to examine DNA copy number changes in breast tumours have demonstrated that copy number aberrations (CNAs) are not random, but are more prevalent in particular chromosomal locations [1–4]. Indeed, it has become evident that patterns of genomic rearrangements differ between disease subtypes, and may be of prognostic significance [1–4]. It is clear from these studies that particular genomic copy number aberrations are associated with tumour grade. Furthermore, local DNA copy number changes have been shown to cause gene expression changes such that a majority of the genes in gained or amplified regions exhibit increased expression.
Similarly, regional epigenetic changes involving DNA methylation and chromatin structure which lead to or stabilize altered gene expression have been shown to be involved in breast cancer . The interplay of alterations in DNA copy number and epigenetic states is complex, and to understand the full picture data from multiple sources needs to be integrated. Since both copy number and epigenetic alterations result in changes in gene expression patterns, analysis of microarray gene expression data in the context of specific genomic regions is an efficient means of integrating the effects of genomic changes in cancer.
Oestrogen receptor positive (ER+) breast cancer represents the most prevalent breast cancer subtype, and although several anti-oestrogen therapies are available to treat hormone dependent disease, resistance to therapy is common and the molecular basis of the disease is not fully understood. In this study we have assembled data from ER+ tumours within five published large-scale microarray gene expression datasets and developed a computational analysis approach to score the contributions of genomic regions with altered gene expression to proliferation and hence grade.
Previous analysis of gene expression profiles from ER+ breast tumours has implicated a set of highly correlated genes involved in cell proliferation as a key prognostic feature . This proliferation signature is highly enriched with genes known to be cell-cycle regulated and therefore provides an array-based mitotic index [7–9]. When correlates of histological grade were sought in gene expression profiles, most of the genes selected were those previously found in the proliferation signature [10, 11]. Moreover, it has been demonstrated that the array-based "Genomic Grade Index" was, at least for ER+ breast cancer, more accurate than histological grade in predicting clinical outcome . We explore the relationship of the proliferation signature to genomic regions that display marked covariant gene expression across a large number of tumours.
Patterns of gene expression that are associated with particular aspects of sample phenotype are often referred to as "signatures". This term has been used quite broadly both for clusters of co-regulated and thus correlated genes such as in the proliferation signature [7–9], but also for more complex expression profiles that involve a number of loosely, or even inversely, correlated gene clusters. Like others [13, 14] we have adopted the more operationally defined and analytically useful metagene approach, in which clusters of correlated genes are replaced by statistical summaries of them; here we use cluster centroids (mean vectors). The metagene approach sacrifices detail at the individual gene level in order to gain statistical robustness, generalisability and the necessary dimension reduction to enable higher-level analysis.
The analysis of gene expression data from ER+ breast cancer that we present here involves a number of stages. Firstly, we describe a novel clustering algorithm (GDEC) that uses genomic distance together with expression data to reveal regional patterns of co-ordinate gene expression. We show that many, but importantly not all, of these regional clusters reflect common CNAs in this type of cancer. We derive metagenes as cluster centroids and we refer to metagenes derived from GDEC clusters as regional metagenes (RMGs). We use regression analysis with the RMGs to identify the most important regions for the prediction of proliferation as defined by the proliferation metagene.
Results and Discussion
Genomic distance entrained clustering
We have developed a novel clustering method, called Genomic Distance Entrained Clustering (GDEC), to identify genomic regions where gene expression is co-ordinately altered. The algorithm reduces the correlation distance between genes in the same chromosomal neighbourhood in a manner that depends on both genomic distance and correlation. This type of data clustering is generically known as clustering with side-information or clustering with soft constraints and is more typically used in geographical applications. Details of the algorithm and the parameters used are provided in the methods section.
Identification of recurrent regional metagenes
Regional metagenes, copy number aberrations and proliferation
The prognostic significance of the proliferative phenotype in these tumours as assessed by the proliferation signature has been emphasized by others [7, 10]. In order to determine the relationship between RMGs and proliferation, we derived a proliferation metagene. For this we identified the cluster containing most of the genes reported in published proliferation signatures [7–9] in each of the three larger datasets then defined the proliferation metagene as the intersection of these three clusters (see methods for further details). Histograms of correlations of the most variable genes to the proliferation metagene are given in Additional File 2, and the genes that comprise the proliferation signature are detailed in Additional File 3. In each dataset the genes that constitute the proliferation cluster form a small shoulder in the distribution with correlations greater than 0.5. In order to use the proliferation metagene as a continuous marker of proliferation, we excluded all genes with a high correlation to the proliferation metagene (correlation >0.5) from the analysis prior to clustering. We demonstrate in a subsequent section that the proliferation metagene is a reliable surrogate of tumour grade [10, 12] and results in a good separation of grade 1 and 3 tumours (Figure 4).
Identification of regional metagenes predictive of proliferation
Correlation of the RMG regression fit across datasets
Regional metagenes contribute additively to proliferation
To simplify the analysis and increase the sample size, five datasets were merged (see methods). We used GDEC clustering and the tree cutting method to derive 42 RMGs, and performed regression analysis as above (Additional File 4). The training fit gave a correlation of 0.82 to the proliferation metagene. To estimate the extent of overfitting we randomly split the dataset into two halves, and used each half in turn as a training set to derive weightings to predict the proliferation metagene of the other half. This was repeated 500 times giving an average correlation of 0.79. Thus, by using a larger dataset we have reduced overfitting (see Figure 4).
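The repeated split-half procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis code used in the study: the function names are hypothetical, the model is passed in as a pair of `fit` and `predict` callables, and the toy one-predictor least-squares model at the end merely stands in for the ridge regression on the RMGs.

```python
import math
import random
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def split_half_correlation(X, y, fit, predict, n_repeats=500, seed=0):
    """Average held-out correlation over repeated random half splits.

    In every repeat each half is used in turn to train a model that
    predicts the response of the other half.
    """
    rng = random.Random(seed)
    indices = list(range(len(y)))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        mid = len(indices) // 2
        halves = (indices[:mid], indices[mid:])
        for train, test in (halves, halves[::-1]):
            model = fit([X[i] for i in train], [y[i] for i in train])
            predicted = predict(model, [X[i] for i in test])
            scores.append(pearson(predicted, [y[i] for i in test]))
    return statistics.fmean(scores)


# Toy stand-in model (hypothetical): one predictor, least squares through the origin.
def fit_slope(X, y):
    return sum(row[0] * v for row, v in zip(X, y)) / sum(row[0] ** 2 for row in X)


def predict_slope(slope, X):
    return [slope * row[0] for row in X]
```

A gap between the training-fit correlation and this held-out average, such as the 0.82 versus 0.79 reported above, gives a rough indication of the degree of overfitting.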
Selection order for regional metagenes in prediction of proliferation

RMG   Cytoband                 Position (Mb)
1     8q13.1 - 8q22.3          66.72 - 104.15
2     8p22 - 8p12              17.55 - 31.15
3     11q13.1 - 11q13.4        66.01 - 70.89
4     7p15.2 - 7p15.2          27.15 - 27.18
5     3p21.31 - 3p14.3         49.13 - 58.5
6     16p13.3 - 16p13.2        0.04 - 8.86
7     17p13.3 - 17p11.2        0.59 - 19.71
8     22q11.22 - 22q11.22      20.88 - 21.57
9     1q24.2 - 1q44            166.15 - 245
10    23q28 - 23q28            148.67 - 153.54
11    19q13.11 - 19q13.43      40.22 - 63.77
12    3q13.32 - 3q22.1         120.41 - 132.22
13    1p13.3 - 1p13.3          110 - 110.09
14    16p13.3 - 16p13.2        0.71 - 8.78
15    16q13 - 16q22.3          55.32 - 73.24
16    22q12.2 - 22q13.33       28.46 - 49.31
17    23q22.1 - 23q22.2        99.77 - 102.52
18    6q14.2 - 6q23.3          83.93 - 137.41
19    7q21.12 - 7q22.3         86.81 - 107.05
20    20p11.21 - 20q13.33      25.18 - 62.13
21    7q34 - 7q34              142.14 - 142.18
22    17p11.2 - 17q21.32       19.38 - 43.11
23    11q14.1 - 11q25          85.05 - 133.6
24    11q12.2 - 11q13.4        60.86 - 72.62
25    5q13.1 - 5q13.2          69.21 - 70.46
26    8q24.3 - 8q24.3          141.6 - 146.25
27    11q22.3 - 11q24.3        109.61 - 129.59
28    17q23.3 - 17q25.3        58.86 - 78.25
29    18q11.2 - 18q21.33       17.48 - 59.14
30    17q21.31 - 17q24.1       38.06 - 59.92
31    22q12.3 - 22q13.1        34.37 - 37.81
32    12p13.31 - 12p13.2       6.42 - 10.48
33    17q11.2 - 17q21.31       23.39 - 39.99
34    17q22 - 17q24.2          53.3 - 64.04
35    17q12 - 17q21.1          33.04 - 35.61
36    19q13.42 - 19q13.42      59.41 - 59.84
37    6q25.1 - 6q25.1          151.77 - 152.46
38    8p12 - 8p11.21           37.74 - 42.87
39    19p13.3 - 19p13.13       0.94 - 12.93
40    19q13.2 - 19q13.2        46.07 - 46.29
41    14q32.33 - 14q32.33      105.4 - 106.35
42    6p22.1 - 6p21.32         26.47 - 32.94
Competitive selection of regional metagenes
Since the majority of the correlation is explained by the first few metagenes added to the model, we ran the selection order permutations four further times, each time with one of the first four RMGs omitted from the RMG set, in order to observe the rank order changes that resulted. The arrows in Figure 5 indicate the metagenes that most frequently substituted for the omitted RMG. Not surprisingly, the substituting RMGs were correlated with the RMGs they replaced, illustrating some redundancy amongst the RMGs. In the first case the RMG at 8q24 is frequently gained along with the region at 8q13-22 and is highly correlated to proliferation, but was pushed down the selection order, presumably because it provided redundant information once the 8q13-22 RMG had been selected. This effect caused the selection order to deviate from a decreasing order of absolute correlations. For example, the third RMG selected, 11q13, has a lower correlation to the proliferation signature (0.38) than the RMG at 8q24 (0.47). In selection order analyses for the individual datasets we consistently found that the top two RMGs contained an RMG at 8p22 together with one of three RMGs from 8q (data not shown). The RMGs on 8q probably carry redundant information and possibly reflect the common gains of the q-arm of chromosome 8 in breast cancer. Consequently, the 11q13 RMG was more consistently able to provide additional, non-redundant information to the model than a second RMG from 8q. Thus, our method not only establishes the regions that contribute most to proliferation, but also highlights the relationships between them, such that the more orthogonal, and consequently the most additive, combinations are selected with higher priority.
The top three RMGs that were selected using our method reflect known genomic copy number aberrations in breast cancer, thus validating this method. Furthermore, the positively correlated RMGs 8q13-22 and 11q13 are in regions known to be gained, and the negatively correlated 8p22 RMG in a region of common loss . Indeed, comparisons of DNA copy number changes between luminal A and luminal B type breast cancers, corresponding to low and high grade respectively, indicated that the frequency of gain on 8q and loss on 8p was much greater in the more proliferative luminal B subtype .
This analysis approach also highlighted the importance of the HOXA cluster at 7p15 (RMG4), which has been shown to undergo epigenetic silencing in tumours. The HOXA genes have been implicated in growth suppression and apoptosis via a p53-dependent pathway [23, 24]. The consistent selection of the HOXA cluster on chromosome 7p15 in the sequential model building method suggests that down-regulation of these genes is an additive event and not simply a consequence of rearrangements on chromosome 8.
The metagene at 3p21-14 (RMG5) contains the gene IL17BR. This gene forms half of a two gene predictor for response to tamoxifen treatment, along with HOXB13. IL17BR has been shown to be significantly negatively correlated to grade at the expression level in a large panel of tumours , and is in a region where loss has been associated with high grade .
Chromosome 1 frequently exhibits gain of the q arm and loss of the p arm in breast cancer , with loss of the p arm more frequent in luminal B type tumours. We identified a region from 1q24-44 (RMG9) that was positively correlated to proliferation, and a region at 1p13 (RMG13) that was negatively correlated to proliferation. Thus, this analysis can help pinpoint the location of genes that drive cancer progression when amplified or lost.
The metagene at 17p13 (RMG7) sits in a region that undergoes copy number loss more frequently in luminal B compared to luminal A tumours . This RMG spans the p53 gene and was negatively correlated to proliferation. Epigenetic silencing at 7p15 and copy number loss at 17p13 can both affect progression of tumours with wild type p53, but are likely to be less important in tumours that harbour p53 null mutations.
Detailed analysis of regional metagene interactions
Interaction network analysis of regional metagenes
In most cases connections between RMGs were largely mediated through a centralised component (shown in black) comprising 2808 genes (36 belonging to RMGs 8 to 42) and 3050 interactions, the majority of which emanated from a small number of non-metagene hubs (BRAF, RAF1, DDB1, IMMT, DLG4, TRAF6 and HCLS1). Most of these hubs interacted directly with genes in the proliferation cluster. In the case of RMG 7 a significant number of direct interactions with the proliferation metagene were observed, with TP53 and PAFAH1B1 interacting with 17 proliferation metagene components including CDC2, CCNA2, RAD51 and AURKA (Additional File 5).
To investigate the relationship between the top RMGs and those that replaced them when they were omitted, a protein interaction network was generated from the proliferation metagene and RMGs 1 to 4, 15, 17 and 26 (8q13-22, 8p12-22, 11q13, 7p15, 16q13-22, 23q22 and 8q24 respectively) and the direct or shortest indirect signalling interactions with the proliferation metagene members were compared. For the top 3 RMGs (8q13-22, 8p12-22 and 11q13), the replacing metagene hit a subset of the respective proliferation metagene members, indicating some functional equivalence (Figure 7B and Additional File 6). Interestingly, this overlap was not observed for RMG 4 (7p15) suggesting that, for the HOX cluster, signalling to the proliferation metagene may be mediated through additional interactions within the centralised component (Figure 7A shown in black).
This analysis reiterates the finding that a number of small changes in a set of complementary pathways driving cell growth and division can act additively to increase cell proliferation. Furthermore, analysis of the RMGs that carry redundant information can help to narrow down the list of potential cancer drivers within RMGs.
We have shown that a small regional distortion of correlation distance in agglomerative clustering results in the formation of regional clusters of co-regulated genes. We have constructed metagenes from these clusters and used linear regression in the modelling of grade using proliferation as a surrogate. Using this approach we have identified 42 genomic regions where gene expression is recurrently altered in ER+ breast cancers. We have gone on to identify the regions most correlated with the proliferation signature. We show that distinct genomic regions combine additively to enhance proliferation, and that regions can be ranked by their contribution to the proliferation rate in a competitive model. As a result we have identified the differentially regulated genomic regions that are most important in proliferation, and hence grade and prognosis, of ER+ breast cancer. Furthermore, detailed analysis of the interacting regions has identified a number of possible genetic drivers of cancer that are involved in key cellular pathways. This approach will have utility in identifying and integrating chromosomal regions where coordinate changes in gene expression confer clinically relevant cancer phenotypes.
Microarray data normalisation
Microarray gene expression data for the five breast cancer datasets used in this study were obtained from the GEO database (GSE6532, GSE1456, GSE3494, GSE7390, GSE2034). The paired gene expression and array comparative genomic hybridization data for 43 ER+ tumours were downloaded from the database referenced therein. The gene expression data from all these studies were derived using the Affymetrix 133A platform, which comprises 22215 non-control probesets. ER+ tumour samples were selected on the basis of histological sample annotation. MAS5 processed gene expression values were log2 transformed, and then quantile normalization was applied across all samples. Following transformation and normalisation, probesets that had a maximum expression level below 7, an inter-quartile range below 0.5, or missing genomic mapping information were excluded from the analysis. The gene expression values for each gene were then standardised to a mean of 0 and standard deviation of 1 within each dataset. In the case of the combined dataset, this resulted in a final matrix of 5466 probesets for 793 tumours.
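The preprocessing steps described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: all function names are hypothetical, ties in quantile normalisation are broken by sample order, the inter-quartile range uses the "inclusive" quantile method, and standardisation uses the sample standard deviation. None of these details is specified in the text, so each should be read as an assumption.

```python
import math
import statistics


def log2_transform(values):
    """Log2-transform positive expression values."""
    return [math.log2(v) for v in values]


def quantile_normalise(columns):
    """Quantile-normalise equal-length sample columns.

    Every value is replaced by the mean, across samples, of the values
    that share its rank (assumption: ties are broken by position).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [statistics.fmean(sc[i] for sc in sorted_cols) for i in range(n)]
    normalised = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalised.append(out)
    return normalised


def keep_probeset(row, min_max=7.0, min_iqr=0.5):
    """Expression filters from the text: drop if max < 7 or IQR < 0.5."""
    q1, _, q3 = statistics.quantiles(row, n=4, method="inclusive")
    return max(row) >= min_max and (q3 - q1) >= min_iqr


def standardise(row):
    """Scale one gene to mean 0 and (sample) standard deviation 1."""
    mu = statistics.fmean(row)
    sd = statistics.stdev(row)
    return [(v - mu) / sd for v in row]
```

In this sketch `quantile_normalise` works on per-sample columns, whereas `keep_probeset` and `standardise` work on per-probeset rows, so the matrix would be transposed between the two stages. The filter on missing genomic mapping information depends on the annotation source and is omitted here.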
The normalized aCGH data from  was smoothed using the DNAcopy R package from Bioconductor . Gains and losses were defined as those log2 ratio values that exceeded 0.2 and were less than -0.2 respectively.
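The thresholding rule above amounts to a three-way call on each smoothed log2 ratio. The short sketch below illustrates it; the function names and the "neutral" label are assumptions introduced for illustration, and the smoothing step itself (performed with DNAcopy in the study) is not reproduced.

```python
def call_copy_number(smoothed_log2_ratio, threshold=0.2):
    """Label a smoothed log2 ratio as a gain, a loss or neither."""
    if smoothed_log2_ratio > threshold:
        return "gain"
    if smoothed_log2_ratio < -threshold:
        return "loss"
    return "neutral"  # label chosen here; the text only defines gains and losses


def alteration_frequency(values, state):
    """Fraction of tumours carrying the given call at one locus."""
    calls = [call_copy_number(v) for v in values]
    return calls.count(state) / len(calls)
```

Applied to the values of one locus across tumours, `alteration_frequency` gives the kind of recurrence estimate against which regional metagenes can be compared.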
Genomic distance entrained clustering algorithm
where ρ_ij is the Pearson correlation, and λ is a scaling parameter that controls the extent of distance distortion. In this study the parameters were fixed at a = 0.25, λ = 0.5 and h = 10 Mb. A three-dimensional plot of the function is provided in Additional File 7.
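The general idea of entraining a correlation distance on genomic position can be illustrated with the sketch below. It is important to stress that the functional form is hypothetical: the distortion function used in the study is not reproduced in this text, so the exponential decay, the correlation cut-off and the parameter names are all assumptions made for illustration. The default values simply borrow the numbers quoted above, and their mapping onto a, λ and h is likewise assumed.

```python
import math


def entrained_distance(rho, chrom_i, pos_i, chrom_j, pos_j,
                       strength=0.5, bandwidth_mb=10.0, min_corr=0.25):
    """Hypothetical soft-constraint distance between two genes.

    rho          : Pearson correlation of the two expression profiles
    pos_i, pos_j : genomic positions in Mb
    The correlation distance 1 - rho is shrunk only for correlated genes
    on the same chromosome, and the shrinkage fades with genomic distance.
    """
    base = 1.0 - rho
    if chrom_i != chrom_j or rho <= min_corr:
        return base  # no distortion for distant or uncorrelated pairs
    gap_mb = abs(pos_i - pos_j)
    shrink = strength * math.exp(-gap_mb / bandwidth_mb)
    return base * (1.0 - shrink)
```

A matrix of such distances could then be passed to any standard agglomerative clustering routine. Because the distortion is small and local, clusters driven by strong genome-wide co-expression would be expected to remain largely unaffected.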
The proliferation metagene was derived as follows. The most variable genes from the three larger datasets used here (Tbig, Uppsala and Wang) were clustered by standard flexible beta clustering and the dendrograms cut to give 100 clusters. In each of the three sets of 100 clusters, a single cluster was identified that contained most of the genes found in published proliferation signatures [7–9]. The gene list used to derive the proliferation metagene used here was taken as the intersection of these three clusters. When the proliferation metagenes were derived de novo for each of the individual datasets, by identifying the proliferation cluster as above, the correlation of this de novo proliferation metagene to the proliferation metagene from the intersection gene list was always high (worst case 0.956). This supports the use of metagenes as stable and transferable estimates of recurrent expression patterns.
All metagenes were derived as the mean vector of the genes in the corresponding cluster following standardization to a mean of 0 and standard deviation of 1. In order to avoid biased gene weighting, values from duplicate probesets for the same gene (UniGene Cluster) were averaged prior to averaging across different genes in the metagene calculation. For clusters that derived less than 100% of their genes from the same chromosomal region (i.e. 90% to 100%), the minority genes from different regions were excluded from the calculations.
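The two-stage averaging described above can be sketched as follows. This is a minimal illustration in which the function name and the dictionary-based data layout are assumptions; the input profiles are taken to be already standardised.

```python
from collections import defaultdict
import statistics


def metagene(expression, probe_to_gene):
    """Cluster centroid with duplicate probesets collapsed per gene.

    expression    : dict mapping probeset id -> list of values across samples
    probe_to_gene : dict mapping probeset id -> gene (e.g. UniGene cluster) id
    """
    by_gene = defaultdict(list)
    for probe, values in expression.items():
        by_gene[probe_to_gene[probe]].append(values)
    # Stage 1: average the duplicate probesets of each gene, sample by sample.
    gene_profiles = [
        [statistics.fmean(column) for column in zip(*profiles)]
        for profiles in by_gene.values()
    ]
    # Stage 2: average across genes to obtain the metagene.
    return [statistics.fmean(column) for column in zip(*gene_profiles)]
```

With two probesets for gene A and one for gene B, each gene contributes equally to the centroid, whereas a plain mean over probesets would give gene A twice the weight of gene B.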
Ridge regression and competitive selection
Unless otherwise stated, ridge regression was used with the ridge parameter set by leave-one-out cross-validation in the training set (values ranged from 25 to 120).
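Ridge regression with a penalty chosen by leave-one-out cross-validation can be sketched as below. This is an illustrative pure-Python version under stated assumptions, not the implementation used in the study: the function names are hypothetical, predictors and response are assumed to be centred so that no intercept is fitted, the candidate grid is supplied by the caller, and squared prediction error is assumed as the cross-validation criterion.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [x - factor * p for x, p in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge_fit(X, y, lam):
    """Ridge weights w = (X'X + lam I)^-1 X'y for rows-as-samples X."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(p)]
    return solve(A, b)


def predict(X, w):
    return [sum(x * wi for x, wi in zip(row, w)) for row in X]


def loo_ridge_parameter(X, y, candidates):
    """Return the candidate penalty with the lowest leave-one-out error."""
    def loo_error(lam):
        error = 0.0
        for k in range(len(y)):
            w = ridge_fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:], lam)
            error += (y[k] - predict([X[k]], w)[0]) ** 2
        return error
    return min(candidates, key=loo_error)
```

For the 42 RMGs a grid covering the reported range of 25 to 120 would be a natural choice of `candidates`. Refitting the model for every left-out sample is slow on large datasets, and closed-form leave-one-out shortcuts exist, but the explicit loop keeps the logic transparent.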
Competitive selection was carried out on the merged dataset of 793 ER+ samples from the GSE6532, GSE1456, GSE3494, GSE7390 and GSE2034 datasets. One hundred random sample sets, each with 396 tumours, were drawn from the pool. The ridge regression model was then built up selecting the RMG at each step that best improved the fit. The ridge parameter was fixed at a value of 25 for this analysis. The average RMG rank and correlation of the fit at each step was then recorded.
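The competitive selection procedure can be sketched as a forward stepwise search repeated over random sample subsets. The code below is a minimal illustration, not the authors' implementation: the function names are hypothetical, and "best improved the fit" is assumed here to mean the largest Pearson correlation between fitted and observed values. The defaults mirror the settings stated above (ridge parameter 25, one hundred draws of 396 tumours).

```python
import math
import random
import statistics


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [x - factor * p for x, p in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge_fit(X, y, lam):
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    return solve(A, [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)])


def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((v - mb) ** 2 for v in b))
    return num / den if den else 0.0


def fit_correlation(X, y, cols, lam):
    """Correlation between y and the ridge fit on the chosen columns."""
    sub = [[row[c] for c in cols] for row in X]
    w = ridge_fit(sub, y, lam)
    return pearson([sum(x * wi for x, wi in zip(row, w)) for row in sub], y)


def forward_selection(X, y, lam=25.0):
    """Add, at each step, the metagene that gives the best fit."""
    remaining, chosen, trace = list(range(len(X[0]))), [], []
    while remaining:
        best = max(remaining, key=lambda c: fit_correlation(X, y, chosen + [c], lam))
        chosen.append(best)
        remaining.remove(best)
        trace.append(fit_correlation(X, y, chosen, lam))
    return chosen, trace


def average_ranks(X, y, n_draws=100, subset_size=396, lam=25.0, seed=0):
    """Mean selection rank of each metagene over random sample subsets."""
    rng = random.Random(seed)
    rank_sums = [0] * len(X[0])
    for _ in range(n_draws):
        idx = rng.sample(range(len(y)), subset_size)
        order, _ = forward_selection([X[i] for i in idx], [y[i] for i in idx], lam)
        for rank, col in enumerate(order, start=1):
            rank_sums[col] += rank
    return [total / n_draws for total in rank_sums]
```

Because each step is conditioned on the metagenes already in the model, a predictor that is highly correlated with an earlier selection adds little and tends to fall down the order, which is the redundancy effect discussed for the 8q RMGs.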
The clustering and derivation of RMGs was not repeated for each permutation in this analysis, and thus we are only testing the consistency of the regression weightings given a fixed set of RMGs.
Interaction network analysis
Gene lists from the proliferation metagene and RMGs 1 to 7 (and subsequently RMGs 15, 17 and 26) were submitted to ROCK (http://rock.icr.ac.uk) for network generation. The resultant network was visualised with ROCKscape (manuscript in preparation), a modification of Cytoscape (http://www.cytoscape.org) that allows integration with ROCK-BCGF. Network metrics were derived with the RandomNetworks plugin (http://sites.google.com/site/randomnetworkplugin/).
The authors would like to thank Prof. Alan Ashworth of the Institute for comments on the manuscript. We acknowledge funding from Breakthrough Breast Cancer and NHS funding to the NIHR Biomedical Research Centre.
1. Bergamaschi A, Kim YH, Wang P, Sorlie T, Hernandez-Boussard T, Lonning PE, Tibshirani R, Borresen-Dale AL, Pollack JR: Distinct patterns of DNA copy number alteration are associated with different clinicopathological features and gene-expression subtypes of breast cancer. Genes Chromosomes Cancer. 2006, 45: 1033-1040. 10.1002/gcc.20366
2. Hicks J, Krasnitz A, Lakshmi B, Navin NE, Riggs M, Leibu E, Esposito D, Alexander J, Troge J, Grubor V, Yoon S, Wigler M, Ye K, Borresen-Dale AL, Naume B, Schlicting E, Norton L, Hagerstrom T, Skoog L, Auer G, Maner S, Lundin P, Zetterberg A: Novel patterns of genome rearrangement and their association with survival in breast cancer. Genome Res. 2006, 16: 1465-1479. 10.1101/gr.5460106
3. Korsching E, Packeisen J, Helms MW, Kersting C, Voss R, van Diest PJ, Brandt B, van der Wall E, Boecker W, Burger H: Deciphering a subgroup of breast carcinomas with putative progression of grade during carcinogenesis revealed by comparative genomic hybridisation (CGH) and immunohistochemistry. Br J Cancer. 2004, 90: 1422-1428. 10.1038/sj.bjc.6601658
4. Simpson PT, Reis-Filho JS, Gale T, Lakhani SR: Molecular evolution of breast cancer. J Pathol. 2005, 205: 248-254. 10.1002/path.1691
5. Pollack JR, Sorlie T, Perou CM, Rees CA, Jeffrey SS, Lonning PE, Tibshirani R, Botstein D, Borresen-Dale AL, Brown PO: Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc Natl Acad Sci USA. 2002, 99: 12963-12968. 10.1073/pnas.162471999
6. Novak P, Jensen T, Oshiro MM, Watts GS, Kim CJ, Futscher BW: Agglomerative epigenetic aberrations are a common event in human breast cancer. Cancer Res. 2008, 68: 8616-8625. 10.1158/0008-5472.CAN-08-1419
7. Dai H, van't Veer L, Lamb J, He YD, Mao M, Fine BM, Bernards R, van de Vijver M, Deutsch P, Sachs A, Stoughton R, Friend S: A cell proliferation signature is a marker of extremely poor outcome in a subpopulation of breast cancer patients. Cancer Res. 2005, 65: 4059-4066. 10.1158/0008-5472.CAN-04-3953
8. Whitfield ML, Sherlock G, Saldanha AJ, Murray JI, Ball CA, Alexander KE, Matese JC, Perou CM, Hurt MM, Brown PO, Botstein D: Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell. 2002, 13: 1977-2000. 10.1091/mbc.02-02-0030.
9. Whitfield ML, George LK, Grant GD, Perou CM: Common markers of proliferation. Nat Rev Cancer. 2006, 6: 99-106. 10.1038/nrc1802
10. Sotiriou C, Wirapati P, Loi S, Harris A, Fox S, Smeds J, Nordgren H, Farmer P, Praz V, Haibe-Kains B, Desmedt C, Larsimont D, Cardoso F, Peterse H, Nuyten D, Buyse M, Van de Vijver MJ, Bergh J, Piccart M, Delorenzi M: Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006, 98: 262-272. 10.1093/jnci/djj052
11. Loi S, Haibe-Kains B, Desmedt C, Lallemand F, Tutt AM, Gillet C, Ellis P, Harris A, Bergh J, Foekens JA, Klijn JG, Larsimont D, Buyse M, Bontempi G, Delorenzi M, Piccart MJ, Sotiriou C: Definition of clinically distinct molecular subtypes in estrogen receptor-positive breast carcinomas through genomic grade. J Clin Oncol. 2007, 25: 1239-1246. 10.1200/JCO.2006.07.1522
12. Ignatiadis M, Sotiriou C: Understanding the molecular basis of histologic grade. Pathobiology. 2008, 75: 104-111. 10.1159/000123848
13. Huang E, Ishida S, Pittman J, Dressman H, Bild A, Kloos M, D'Amico M, Pestell RG, West M, Nevins JR: Gene expression phenotypic models that predict the activity of oncogenic pathways. Nat Genet. 2003, 34: 226-230. 10.1038/ng1167
14. Brunet JP, Tamayo P, Golub TR, Mesirov JP: Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci USA. 2004, 101: 4164-4169. 10.1073/pnas.0308531101
15. Everitt BS, Landau S, Leese M: Cluster Analysis. 2001, London: Arnold, 4
16. Desmedt C, Piette F, Loi S, Wang Y, Lallemand F, Haibe-Kains B, Viale G, Delorenzi M, Zhang Y, d'Assignies MS, Bergh J, Lidereau R, Ellis P, Harris AL, Klijn JG, Foekens JA, Cardoso F, Piccart MJ, Buyse M, Sotiriou C: Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin Cancer Res. 2007, 13: 3207-3214. 10.1158/1078-0432.CCR-06-2765
17. Miller LD, Smeds J, George J, Vega VB, Vergara L, Ploner A, Pawitan Y, Hall P, Klaar S, Liu ET, Bergh J: An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc Natl Acad Sci USA. 2005, 102: 13550-13555. 10.1073/pnas.0506230102
18. Pawitan Y, Bjohle J, Amler L, Borg AL, Egyhazi S, Hall P, Han X, Holmberg L, Huang F, Klaar S, Liu ET, Miller L, Nordgren H, Ploner A, Sandelin K, Shaw PM, Smeds J, Skoog L, Wedren S, Bergh J: Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res. 2005, 7: R953-964. 10.1186/bcr1325
19. Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J, Jatkoe T, Berns EM, Atkins D, Foekens JA: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005, 365: 671-679.
20. Chin K, DeVries S, Fridlyand J, Spellman PT, Roydasgupta R, Kuo WL, Lapuk A, Neve RM, Qian Z, Ryder T, Chen F, Feiler H, Tokuyasu T, Kingsley C, Dairkee S, Meng Z, Chew K, Pinkel D, Jain A, Ljung BM, Esserman L, Albertson DG, Waldman FM, Gray JW: Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell. 2006, 10: 529-541. 10.1016/j.ccr.2006.10.009
21. Witcher M, Emerson BM: Epigenetic silencing of the p16(INK4a) tumor suppressor is associated with loss of CTCF binding and a chromatin boundary. Mol Cell. 2009, 34: 271-284. 10.1016/j.molcel.2009.04.001
22. Novak P, Jensen T, Oshiro MM, Wozniak RJ, Nouzova M, Watts GS, Klimecki WT, Kim C, Futscher BW: Epigenetic inactivation of the HOXA gene cluster in breast cancer. Cancer Res. 2006, 66: 10664-10670. 10.1158/0008-5472.CAN-06-2761
23. Raman V, Martensen SA, Reisman D, Evron E, Odenwald WF, Jaffee E, Marks J, Sukumar S: Compromised HOXA5 function can limit p53 expression in human breast tumours. Nature. 2000, 405: 974-978. 10.1038/35016125
24. Chen H, Chung S, Sukumar S: HOXA5-induced apoptosis in breast cancer cells is mediated by caspases 2 and 8. Mol Cell Biol. 2004, 24: 924-935. 10.1128/MCB.24.2.924-935.2004
25. Jansen MP, Sieuwerts AM, Look MP, Ritstier K, Meijer-van Gelder ME, van Staveren IL, Klijn JG, Foekens JA, Berns EM: HOXB13-to-IL17BR expression ratio is related with tumor aggressiveness and response to tamoxifen of recurrent breast cancer: a retrospective study. J Clin Oncol. 2007, 25: 662-668. 10.1200/JCO.2006.07.3676
26. Shivakumar L, Minna J, Sakamaki T, Pestell R, White MA: The RASSF1A tumor suppressor blocks cell cycle progression and inhibits cyclin D1 accumulation. Mol Cell Biol. 2002, 22: 4309-4318. 10.1128/MCB.22.12.4309-4318.2002
27. Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002, 30: 207-210. 10.1093/nar/30.1.207
28. Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003, 19: 185-193. 10.1093/bioinformatics/19.2.185
29. Olshen AB, Venkatraman ES, Lucito R, Wigler M: Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004, 5: 557-572. 10.1093/biostatistics/kxh008
30. Lance GN, Williams WT: A general theory of classificatory sorting strategies: 1. Hierarchical systems. Computer Journal. 1967, 9: 373-380.
31. Sims D, Bursteinas B, Gao Q, Jain E, Mackay A, Mitsopoulos C, Zvelebil M: ROCK: a breast cancer functional genomics resource. Breast Cancer Res Treat.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.