
An integrative multi-dimensional genetic and epigenetic strategy to identify aberrant genes and pathways in cancer



Genomics has substantially changed our approach to cancer research. Gene expression profiling, for example, has been used to delineate subtypes of cancer and has facilitated the derivation of predictive and prognostic signatures. The emergence of technologies for high-resolution, genome-wide description of genetic and epigenetic features has enabled the identification of a multitude of causal DNA events in tumors. This has made possible the large-scale integration of genome and transcriptome data generated from a variety of technology platforms to gain a better understanding of cancer.


Here we show how multi-dimensional genomics data analysis can enable the deciphering of mechanisms that disrupt regulatory/signaling cascades and their downstream effects. Since not all gene expression changes observed in a tumor are causal to cancer development, we demonstrate an approach based on multiple concerted disruption (MCD) analysis of genes that facilitates the rational deduction of aberrant genes and pathways that would otherwise be overlooked in single genomic dimension investigations.


Notably, this is the first comprehensive study of breast cancer cells by parallel integrative genome-wide analyses of DNA copy number, LOH, and DNA methylation status to interpret changes in gene expression pattern. Our findings demonstrate the power of a multi-dimensional approach to elucidate events that would escape conventional single dimensional analysis and, as such, to reduce the cohort sample size required for cancer gene discovery.


Genomic analyses have substantially improved our knowledge of cancer. Gene expression profiling, for example, is utilized to delineate subtypes of breast cancer, and has facilitated the derivation of predictive and prognostic signatures [1–5]. However, not all of the gene expression changes observed are causal to cancer development, and global gene expression analysis alone cannot distinguish between causal and reactive changes. Corresponding alteration at the DNA level is regarded as evidence of causality; for example, gene deletion or gene silencing by methylation. Hence, examining genetic and epigenetic events in conjunction with the changes in gene expression pattern should improve the identification of causal changes that lead to disease phenotype.

Analysis of gene copy number alone has correlated breast cancer genome features with poor prognosis based on the degree of genomic instability observed [6]. In terms of gene discovery, specific genomic regions containing important loci have been shown to be frequently gained or lost [7–11]. Integrative analyses of gene dosage and gene expression in breast cancer have revealed specific genes that are deregulated at the expression level as a result of changes in DNA copy number. From a global perspective, studies have reported a broad range of concordance between DNA amplification and gene overexpression. This variability is attributable to the sensitivity of the methods used to detect gene copy number and gene expression changes, as well as to the number of genes examined [12–15]. Conversely, when gene overexpression was examined, only ~10% of it could be attributed to gene amplification [14]. Altered gene expression can be attributed not only to disruption of regulatory/signaling cascades and their downstream effects, but also to a multitude of causal genetic and epigenetic aberrations.

We reason that by examining multiple genomic dimensions simultaneously, where each dimension represents a genome-wide assay measuring DNA-level alterations such as gene copy number or DNA methylation, we are likely to achieve the following: (i) explain a greater fraction of the observed gene expression deregulation than can be explained using only a single dimension, (ii) improve the discovery of critical oncogenes and tumor suppressor genes (TSGs) by focusing on genes altered simultaneously in multiple genomic dimensions, and (iii) begin to understand the complex mechanisms of dysregulation of oncogenic pathways. In this study, we demonstrate the power of an integrative genomics approach by performing multi-dimensional analyses (MDA) of the genome, epigenome, and transcriptome of breast cancer cell lines. We demonstrate the need for integrative analysis of multiple genomic dimensions by showing the co-operative contribution of DNA mechanisms to explaining differential gene expression. Using a strategy to identify genes exhibiting congruent alteration in copy number, DNA methylation, and allelic status (loss of heterozygosity, LOH), which we term multiple concerted disruption (MCD) analysis, we find genes representing key nodes in pathways as well as genes that exhibit prognostic significance. In examining the neuregulin pathway, we observe variability among samples in the mechanism of dysregulation of this commonly altered breast cancer pathway, highlighting the importance of multi-dimensional analysis of a given pathway in individual tumor samples, in addition to the conventional approach of identifying loci simply based on their frequency of disruption in a cohort. Finally, examining the subset of triple negative breast cancer (TNBC) cell lines, we show that COL1A1, a downstream target of FGFR2 (a recently implicated oncogene in TNBC), is frequently affected by MCD even though FGFR2 itself is rarely affected.
Notably, this is the first such in-depth genomic, epigenomic, and transcriptomic analysis of breast cancer.


Data generation and acquisition

Commonly used breast cancer (HCC38, HCC1008, HCC1143, HCC1395, HCC1599, HCC1937, HCC2218, BT474, MCF-7) and non-cancer (MCF10A) cell lines were selected for analysis (Additional File 1). Copy number profiles were obtained from the SIGMA database [11, 16]. These profiles were generated using a whole genome tiling path microarray CGH platform [17, 18]. Expression profiles for BT474 and MCF-7 were obtained from the NCI Cancer Biomedical Informatics Grid (caBIG), the MCF10A profile was obtained from GEO (GSM254525), and the remaining profiles were generated using the Affymetrix U133 Plus 2.0 platform at the McGill University and Genome Quebec Innovation Centre. Affymetrix 500 K SNP array data were obtained from caBIG. DNA methylation profiles were generated using the Illumina Infinium methylation platform at the Genomics Lab, Wellcome Trust Centre for Human Genetics. A summary of the sources of all the data used is provided in Additional File 2. The gene expression and methylation data generated were deposited in NCBI GEO (GSE17768 and GSE17769).

Data processing and normalization

Array CGH data were normalized using a stepwise normalization framework [19]. In addition, data were filtered using a stringent cut-off of 0.075 on the standard deviation between replicate spots; clones exceeding this cut-off were excluded from further analysis. To identify regions of gain and loss, smoothing and segmentation analysis was performed using aCGH-Smooth [20] as previously described [21]. The copy number status of the filtered clones was inferred from neighboring clones within a 1 Mb window.
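The replicate filter and the 1 Mb inference step can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: the clone records and function names are hypothetical, and the rule of borrowing the call from the nearest retained clone within 1 Mb is an assumption, since the text does not say how neighboring clones were combined.

```python
from statistics import stdev

SD_CUTOFF = 0.075      # maximum SD allowed between replicate spots (from the text)
WINDOW_BP = 1_000_000  # 1 Mb window for inferring filtered clones (from the text)


def passes_replicate_filter(replicates):
    """True if a clone's replicate log2 ratios agree within the SD cut-off."""
    return len(replicates) >= 2 and stdev(replicates) <= SD_CUTOFF


def infer_clone_status(clones):
    """Return one copy number call per clone (hypothetical helper).

    clones: list of dicts with 'pos' (bp), 'replicates' (log2 ratios) and
    'status' (segmented call such as 'gain', 'loss' or 'neutral'; only read
    for clones that pass the filter). A filtered clone takes the call of the
    nearest retained clone within WINDOW_BP (assumed rule), otherwise None.
    """
    keep = [passes_replicate_filter(c["replicates"]) for c in clones]
    retained = [c for c, k in zip(clones, keep) if k]
    calls = []
    for clone, kept in zip(clones, keep):
        if kept:
            calls.append(clone["status"])
            continue
        nearby = [r for r in retained if abs(r["pos"] - clone["pos"]) <= WINDOW_BP]
        if nearby:
            nearest = min(nearby, key=lambda r: abs(r["pos"] - clone["pos"]))
            calls.append(nearest["status"])
        else:
            calls.append(None)  # no retained neighbor within 1 Mb
    return calls


clones = [
    {"pos": 100_000, "replicates": [0.50, 0.52], "status": "gain"},
    {"pos": 600_000, "replicates": [0.10, 0.60], "status": None},
    {"pos": 5_000_000, "replicates": [0.00, 0.40], "status": None},
]
print(infer_clone_status(clones))  # ['gain', 'gain', None]
```

Leaving an isolated filtered clone as None, rather than forcing a call, keeps clones without nearby support out of downstream gene mapping.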

Affymetrix SNP array data were normalized and genotyped using the "oligo" package in R, specifically using the crlmm algorithm for genotyping [22]. Genotype calls with confidence less than 0.95 were termed "No Call" (NC). Subsequently, genotype profiles were analyzed using dChip [23], and LOH was determined using a panel of 60 normal genotypes from the HapMap dataset [24] as provided by dChip, because matching blood lymphoblast profiles were not available. LOH ("L"), Retention ("R"), and No Call ("N") status was determined for every marker in each sample. Analysis parameters were as specified in the dChip manual.
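The confidence rule for genotype calls amounts to a simple mask. The sketch below uses hypothetical plain lists of genotypes and confidences rather than actual crlmm output objects, and shows how calls with confidence below 0.95 become "NC"; the LOH calling itself was done in dChip and is not reproduced here.

```python
CONF_THRESHOLD = 0.95  # calls with confidence below this become "No Call"


def mask_low_confidence(genotypes, confidences, threshold=CONF_THRESHOLD):
    """Replace genotype calls whose confidence is below threshold with 'NC' (hypothetical helper)."""
    if len(genotypes) != len(confidences):
        raise ValueError("genotypes and confidences must have the same length")
    return [g if c >= threshold else "NC" for g, c in zip(genotypes, confidences)]


calls = mask_low_confidence(["AA", "AB", "BB", "AB"], [0.99, 0.90, 0.95, 0.97])
print(calls)  # ['AA', 'NC', 'BB', 'AB']
```

A confidence of exactly 0.95 is kept, since only calls with confidence less than 0.95 are masked.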

Raw gene expression profiles from all ten cell lines were RMA normalized using the "affy" package in Bioconductor [25, 26] (Additional File 3). Gene expression data were further filtered using the Affymetrix MAS 5.0 call values ("P", "M", and "A"). Because each comparison of differential expression was between one cancer line and one normal line, a probe set was retained for a given comparison only if the two call values were not both "Absent".
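The retention rule for MAS 5.0 calls can be written as a single predicate. In this minimal sketch, the per-probe-set call dictionaries, the probe set IDs, and the function names are hypothetical; a probe set is dropped from a cancer-versus-MCF10A comparison only when both calls are "A".

```python
def keep_probe_set(cancer_call, normal_call):
    """Keep unless both MAS 5.0 calls are 'A' (Absent); 'P' and 'M' count as detected."""
    return not (cancer_call == "A" and normal_call == "A")


def filter_comparison(cancer_calls, normal_calls):
    """Return the probe set IDs retained for one cancer line vs. the normal line."""
    return sorted(p for p in cancer_calls
                  if keep_probe_set(cancer_calls[p], normal_calls[p]))


cancer = {"probe_a": "P", "probe_b": "A", "probe_c": "A"}
normal = {"probe_a": "A", "probe_b": "M", "probe_c": "A"}
print(filter_comparison(cancer, normal))  # ['probe_a', 'probe_b']
```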

Methylation data were normalized and processed using Illumina BeadStudio software (Illumina, Inc., San Diego, CA, USA). Beta-values and confidence p-values were retained for further analysis. Beta-values with associated confidence p-values > 0.05 were excluded. Data from all genomic dimensions were mapped to the hg18 (March 2006) genome assembly.
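The confidence p-value filter on the beta-values is equally simple. The sketch below assumes the BeadStudio export has already been parsed into two dictionaries keyed by probe ID (a hypothetical layout). Probes with p > 0.05 are dropped, a probe with p exactly 0.05 is kept, and a probe with no p-value is treated as failing, which is an added assumption.

```python
P_CUTOFF = 0.05


def filter_betas(betas, pvalues, cutoff=P_CUTOFF):
    """Return {probe: beta} for probes whose confidence p-value is <= cutoff.
    Probes missing a p-value are excluded (assumption)."""
    return {probe: beta for probe, beta in betas.items()
            if pvalues.get(probe, 1.0) <= cutoff}


betas = {"cg_a": 0.82, "cg_b": 0.15, "cg_c": 0.47}
pvals = {"cg_a": 0.001, "cg_b": 0.20, "cg_c": 0.05}
print(filter_betas(betas, pvals))  # {'cg_a': 0.82, 'cg_c': 0.47}
```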

Strategy for integrative analysis

Copy number and LOH profiles were mapped to genes using the mapping of the Affymetrix U133 Plus 2.0 platform as well as the UCSC Genome Browser [27]. Methylation data were linked to the other three types of data using either the RefSeq gene symbol as specified by the Illumina mapping file (Illumina) or the RefSeq accession number. Differential expression was determined by subtracting the expression value in the non-malignant line MCF10A from the value in each cancer line. Because gene expression values after RMA normalization are on a log2 scale, a gene was considered differentially expressed if the absolute difference between the cancer line and MCF10A was greater than 1, corresponding to a more than two-fold expression difference. A beta-value for a given CpG site ranges from 0 to 1 and represents the ratio of the methylated signal to the total signal (methylated plus unmethylated). DNA methylation status was determined by subtracting the normal beta-value from the tumor beta-value, with hypermethylation defined as a difference of ≥ 0.25 and hypomethylation as a difference of ≤ -0.25. These thresholds are comparable to those used in previous studies using an earlier Illumina methylation platform [28]. Using this mapping strategy, 12,910 unique genes were mapped across platforms, corresponding to 24,708 of the ~27,000 Illumina Infinium probes and to 27,053 probes of the Affymetrix U133 Plus 2.0 platform. Visualization of multi-dimensional data was performed using the SIGMA2 software [29].
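The per-sample calls described above can be sketched as follows. The thresholds (a log2 difference of 1 and a beta-value difference of 0.25) come from the text; the function names and string labels are hypothetical. The beta-value function uses the simple ratio given above. Vendor software may add an offset to the denominator, and that offset is omitted here.

```python
LOG2_CUTOFF = 1.0   # |log2(cancer) - log2(MCF10A)| > 1, i.e. more than two-fold
BETA_CUTOFF = 0.25  # minimum beta-value difference for hyper/hypomethylation


def beta_value(methylated, unmethylated):
    """Methylated signal over total (methylated + unmethylated) signal, in [0, 1]."""
    total = methylated + unmethylated
    if total <= 0:
        raise ValueError("total signal must be positive")
    return methylated / total


def expression_call(log2_cancer, log2_normal):
    """'over', 'under' or 'none' from RMA (log2) values of a cancer line and MCF10A."""
    diff = log2_cancer - log2_normal
    if diff > LOG2_CUTOFF:
        return "over"
    if diff < -LOG2_CUTOFF:
        return "under"
    return "none"


def methylation_call(beta_cancer, beta_normal):
    """'hyper', 'hypo' or 'none' from tumor minus normal beta-values."""
    diff = beta_cancer - beta_normal
    if diff >= BETA_CUTOFF:
        return "hyper"
    if diff <= -BETA_CUTOFF:
        return "hypo"
    return "none"


print(beta_value(300, 700))          # 0.3
print(expression_call(9.5, 8.0))     # over (1.5 log2 units, about 2.8-fold)
print(methylation_call(0.10, 0.60))  # hypo
```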

To determine the genetic events that caused (or could explain) gene expression status, we first identified a set of overexpressed and underexpressed genes for each cell line relative to MCF10A, based on the differential expression criteria described above. Each cancer sample may have a different number of differentially expressed genes. Second, for each differentially expressed gene in each sample, we examined the copy number status, methylation status, and allelic status. A differential expression change was considered "explained" when it matched the expected change at the DNA level. If a gene was overexpressed, the causal copy number status would be a gain, the DNA methylation status would be hypomethylation, or the allelic status would be allelic imbalance. Conversely, if a gene was underexpressed, the causal copy number status would be a loss, the DNA methylation status would be hypermethylation, or the allelic status would be LOH. From this point forward, a change in allelic status accompanying overexpression is denoted allelic imbalance (AI), whereas a change in allelic status accompanying underexpression is denoted loss of heterozygosity (LOH). While changes in methylation or gene dosage leading to differential expression are more commonly discussed, previous studies have shown that changes in allelic status without a change in copy number (copy neutral AI or LOH) can also lead to differential gene expression through preferential allelic expression [30–32].
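The matching rule can be expressed as a lookup of the expected DNA-level change for each direction of expression change. In this sketch the status labels are hypothetical encodings: copy number as 'gain'/'loss'/'neutral', methylation as 'hyper'/'hypo'/'none', and allelic status as 'L' (LOH/AI) or 'R' (retention), following the dChip-style calls described earlier. An allelic 'L' call counts as AI for an overexpressed gene and as LOH for an underexpressed one.

```python
# Expected DNA-level change that would "explain" each direction of expression change
EXPECTED = {
    "over": {"copy_number": "gain", "methylation": "hypo", "allele": "L"},    # L read as AI
    "under": {"copy_number": "loss", "methylation": "hyper", "allele": "L"},  # L read as LOH
}


def explaining_dimensions(expr, copy_number, methylation, allele):
    """Return the set of DNA dimensions concordant with the expression change."""
    if expr not in EXPECTED:
        return set()  # gene not differentially expressed in this sample
    observed = {"copy_number": copy_number, "methylation": methylation, "allele": allele}
    return {dim for dim, want in EXPECTED[expr].items() if observed[dim] == want}


def is_explained(expr, copy_number, methylation, allele):
    """A change is 'explained' if at least one DNA dimension matches."""
    return bool(explaining_dimensions(expr, copy_number, methylation, allele))


print(sorted(explaining_dimensions("under", "neutral", "hyper", "L")))  # ['allele', 'methylation']
print(is_explained("over", "loss", "hyper", "R"))                       # False
```

A gene for which all three dimensions are returned in the same sample corresponds to the multiple concerted disruption (MCD) case defined below.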

Multiple concerted disruption (MCD) analysis

To identify likely key nodes in pathways and functions, we hypothesize that, in addition to being altered frequently (by one mechanism or multiple mechanisms), these genes also exhibit multiple concerted disruption (MCD) in a given sample: a congruent change in gene copy number (gain or loss), accompanied by allelic imbalance and a change in DNA methylation (hypomethylation or hypermethylation), resulting in a change in gene expression (over- or underexpression). MCD events can serve as a screen analogous to gene amplifications (multi-copy increases) or homozygous deletions: such events are expected to occur at a lower frequency than disruption through any one mechanism alone, so their observation would signify that the genes in question are important.

In this study, the MCD strategy can be broken down into four sequential steps. First, using a pre-defined frequency threshold, we identify a set of the most frequently differentially expressed genes. Second, from the genes identified in step 1, we select those whose expression change is frequently associated with a concerted change in at least one DNA dimension (DNA copy number, DNA methylation, or allelic status) within the same sample. Third, we further refine this subset by selecting genes that show a concerted change in all dimensions in the same sample, which we term MCD. Finally, we introduce an additional level of stringency by requiring a minimum frequency of MCD in the given cohort. At the end of the process, we identify a small subset of genes that are disrupted through multiple mechanisms and show a consequential change in gene expression.
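The four steps can be sketched as successive filters over per-sample status tuples. This is an illustrative sketch, not the study's code. The data layout, the function names, and the threshold arguments are all hypothetical. The text does not specify the step 1 threshold. Elsewhere it uses 6/9 for DNA-explained expression and 4/9 for frequent MCD.

```python
EXPECTED = {
    "over": ("gain", "hypo", "L"),    # copy number, methylation, allele (L = AI)
    "under": ("loss", "hyper", "L"),  # copy number, methylation, allele (L = LOH)
}


def n_concordant(expr, copy_number, methylation, allele):
    """Number of DNA dimensions (0-3) concordant with the expression change."""
    if expr not in EXPECTED:
        return 0
    return sum(o == e for o, e in zip((copy_number, methylation, allele), EXPECTED[expr]))


def mcd_filter(genes, min_de, min_explained, min_mcd):
    """genes: {name: [(expr, copy_number, methylation, allele), ...]}, one tuple per sample.
    Returns the names passing all four steps, in input order (hypothetical helper)."""
    selected = []
    for name, samples in genes.items():
        de = [s for s in samples if s[0] in EXPECTED]
        if len(de) < min_de:                           # step 1: frequent differential expression
            continue
        hits = [n_concordant(*s) for s in de]
        if sum(h >= 1 for h in hits) < min_explained:  # step 2: frequent DNA-level concordance
            continue
        n_mcd = sum(h == 3 for h in hits)
        if n_mcd == 0:                                 # step 3: at least one MCD sample
            continue
        if n_mcd < min_mcd:                            # step 4: minimum MCD frequency
            continue
        selected.append(name)
    return selected


full = ("under", "loss", "hyper", "L")    # MCD in this sample
cn_only = ("under", "loss", "none", "R")  # explained by copy number only
quiet = ("none", "neutral", "none", "R")  # not differentially expressed
genes = {
    "GENE_A": [full] * 4 + [cn_only] * 2 + [quiet] * 3,
    "GENE_B": [cn_only] * 6 + [quiet] * 3,
    "GENE_C": [full] * 2 + [quiet] * 7,
}
print(mcd_filter(genes, min_de=6, min_explained=6, min_mcd=4))  # ['GENE_A']
```

GENE_B passes the first two steps but never shows MCD, which illustrates how step 3 narrows a frequently explained gene list.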

Simulated data analysis

Using the status of DNA alteration and expression for every gene in every sample, data within each sample were shuffled and randomized ten times to create ten simulated datasets. Each dataset was analyzed for overall disruption frequency and MCD, and all results were then aggregated to determine the proportion of genes reaching each frequency threshold in the randomized data.
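The text does not state exactly which values were permuted. The sketch below assumes one plausible reading: within each sample, the status vector of each dimension is permuted independently across genes. This keeps each sample's numbers of gains, losses, methylation changes, and expression changes while breaking the gene-level links between dimensions. All names are hypothetical.

```python
import random
from collections import Counter

EXPECTED = {"over": ("gain", "hypo", "L"), "under": ("loss", "hyper", "L")}
DIMS = ("expr", "cn", "meth", "allele")


def explained(expr, cn, meth, allele):
    """True if any DNA dimension is concordant with the expression change."""
    return expr in EXPECTED and any(
        o == e for o, e in zip((cn, meth, allele), EXPECTED[expr]))


def shuffle_sample(sample, rng):
    """Permute each dimension's per-gene statuses independently within one sample (assumed scheme)."""
    return {dim: rng.sample(sample[dim], len(sample[dim])) for dim in DIMS}


def simulate(samples, n_sim=10, seed=0):
    """samples: list of {dim: [status per gene]} dicts, one per cell line.
    Returns Counter {k: number of (gene, simulation) pairs explained in k samples}."""
    rng = random.Random(seed)
    n_genes = len(samples[0]["expr"])
    dist = Counter()
    for _ in range(n_sim):
        shuffled = [shuffle_sample(s, rng) for s in samples]
        for g in range(n_genes):
            k = sum(explained(*(s[d][g] for d in DIMS)) for s in shuffled)
            dist[k] += 1
    return dist


sample = {
    "expr": ["over", "under", "none", "over"],
    "cn": ["gain", "loss", "neutral", "neutral"],
    "meth": ["none", "none", "hyper", "hypo"],
    "allele": ["R", "L", "R", "R"],
}
dist = simulate([sample, sample, sample], n_sim=10, seed=1)
print(sum(dist.values()))  # 40 = 10 simulations x 4 genes
```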

Pathway enrichment analysis

For pathway analysis, Ingenuity Pathway Analysis software (version 8.5) was used (Ingenuity Systems, CA, USA). Specifically, the core and comparison analyses were used, with focus on canonical signaling pathways. Briefly, for a given function or pathway, statistical significance of pathway enrichment is calculated using a right-tailed Fisher's exact test based on the number of genes annotated, number of genes represented in the input dataset, and the total number of genes being assessed in the experiment. A pathway was deemed significant if the p-value of enrichment was ≤ 0.05 (adjusted for multiple comparisons using a Benjamini-Hochberg correction).
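For readers reproducing the statistics outside Ingenuity, note that a right-tailed Fisher's exact test for enrichment is the upper tail of the hypergeometric distribution. The Benjamini-Hochberg adjustment is a step-up procedure over sorted p-values. The sketch below implements both with the Python standard library. It is a generic illustration with hypothetical argument names, and Ingenuity's internal implementation (for example, its choice of reference gene set) may differ.

```python
from math import comb


def right_tailed_fisher(k, n_input, n_pathway, n_total):
    """P(X >= k) where X ~ Hypergeometric: n_input genes drawn from n_total,
    of which n_pathway are annotated to the pathway; k is the observed overlap."""
    denom = comb(n_total, n_input)
    upper = min(n_input, n_pathway)
    return sum(comb(n_pathway, i) * comb(n_total - n_pathway, n_input - i)
               for i in range(k, upper + 1)) / denom


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # step up from the largest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(right_tailed_fisher(3, 3, 3, 10))             # 1/120, about 0.00833
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))  # about [0.04, 0.0533, 0.0533, 0.5]
```

A pathway would then be called significant when its adjusted value is ≤ 0.05, matching the criterion above.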

Survival and differential gene expression analysis in publicly available datasets

For survival analysis, Kaplan-Meier analysis was performed using the statistical toolbox in Matlab (Mathworks). For each gene, the expression data were sorted from lowest to highest expression across the sample set, and survival times were compared between the top 1/3 and bottom 1/3 of the samples. Two publicly available gene expression microarray datasets with survival data were used for this analysis [4, 33]. For the Sorlie et al. dataset, individuals whose cause of death was not breast cancer were excluded from the analysis, and data missing because of quality control issues were imputed using the knn method in the "impute" package in Bioconductor [34]. Of the 23 genes selected by our MCD analysis (see Results), 17 were represented in either dataset. Survival distributions were compared using a log-rank test, and two-tailed p-values unadjusted for multiple comparisons were reported. The log-rank test code was obtained from the Matlab File Exchange.
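The tertile comparison and the log-rank test can be sketched in Python as below. The study used Matlab and File Exchange code, so this is an independent, illustrative implementation of the standard two-group log-rank test with hypothetical function names. Events are coded 1 for death and 0 for censoring.

```python
from math import erfc, sqrt


def tertile_groups(expression, times, events):
    """Split samples by expression into bottom and top thirds of (time, event) pairs."""
    n3 = len(expression) // 3
    if n3 == 0:
        raise ValueError("need at least three samples")
    order = sorted(range(len(expression)), key=lambda i: expression[i])
    low = [(times[i], events[i]) for i in order[:n3]]
    high = [(times[i], events[i]) for i in order[-n3:]]
    return low, high


def logrank(group_a, group_b):
    """Two-group log-rank test; returns (chi-square with 1 df, two-sided p-value)."""
    data = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):  # distinct event times
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = observed_minus_expected ** 2 / variance
    return chi2, erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


early = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]
late = [(10, 0)] * 5
chi2, p = logrank(early, late)
print(chi2, p)  # p is well below 0.01 for these clearly separated groups
```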

Subsequently, these 17 genes were further evaluated for differential expression in publicly available expression datasets of clinical breast cancer samples using the Oncomine database [35].

Results and Discussion

Analysis of individual genomic dimensions

When examining each genomic dimension alone, we see that many of the common features identified are consistent with current knowledge of breast cancer genomes; for example, previously reported chromosomal regions of frequent copy number gain, segmental loss, and loss of heterozygosity (LOH)/allelic imbalance (AI) (Figure 1A) [6, 8, 11, 12, 36]. While many regions of frequent LOH/AI overlap with regions of copy number change, others lie in regions of neutral copy number. Key genes implicated in breast cancer reside in these specific regions and are altered as expected (Figure 1B).

Figure 1

Genomic profiles of breast cancer cell lines. (A) Whole-genome frequency analysis of copy number gain (red), copy number loss (green), loss of heterozygosity/allelic imbalance (LOH/AI) (top blue), and copy number neutral LOH/AI (bottom blue). Vertical lines through all four graphs mark the genomic locations of key breast cancer genes, using the hg18 build of the human genome map. (B) Copy number and LOH/AI status for ESR1, BRCA1, BRCA2, ERBB2, and TP53 in each of the samples. Each of these types of DNA event is observed for all of these genes.

Multi-dimensional analysis (MDA) reveals a higher proportion of intra-sample deregulated gene expression can be explained when more dimensions are analyzed

The impact of integrative, multi-dimensional analysis on gene discovery is observed at two levels: (i) within an individual sample and (ii) across a set of samples. Within a given sample, by sequentially examining more genomic dimensions at the DNA level (gene dosage, allelic status, and DNA methylation), we can explain a higher proportion of the observed differential gene expression changes. Although this proportion varies between samples, it increases with every additional dimension examined (Figure 2A). For example, in HCC1395, a single genomic dimension alone can explain at most 64.4% of overexpression, but when all three DNA-based dimensions are used, so that overexpression counts as explained if there is disruption in at least one dimension, 75.7% of aberrant overexpression can be explained. Similarly, in HCC1937, the proportion of explainable underexpression increases from 56.9% to 74.7% when moving from one to three genomic dimensions. In contrast, in HCC2218, only 44% of overexpression and 36% of underexpression can be explained using all three DNA dimensions. This suggests that the majority of differential expression in HCC2218 may result from complex gene-gene trans-regulation, and it highlights the individual differences between samples.
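The per-sample proportions in Figure 2A can, in principle, be reproduced by taking the union of genes explained by each combination of dimensions. The sketch below assumes that, for each differentially expressed gene in a sample, the set of concordant DNA dimensions has already been computed. The input layout and function names are hypothetical.

```python
from itertools import combinations

DIMENSIONS = ("copy_number", "allele", "methylation")


def fraction_explained(concordant_sets, dims):
    """Fraction of a sample's differentially expressed genes explained by
    at least one dimension in dims. concordant_sets: one set per gene."""
    if not concordant_sets:
        return 0.0
    wanted = set(dims)
    return sum(1 for s in concordant_sets if s & wanted) / len(concordant_sets)


def all_combinations(concordant_sets):
    """Fraction explained for every non-empty combination of dimensions."""
    return {combo: fraction_explained(concordant_sets, combo)
            for r in range(1, len(DIMENSIONS) + 1)
            for combo in combinations(DIMENSIONS, r)}


genes = [{"copy_number"}, {"methylation"}, {"copy_number", "allele"}, set()]
for combo, frac in all_combinations(genes).items():
    print(combo, frac)  # e.g. ('copy_number',) 0.5 ... all three dimensions 0.75
```

Adding a dimension can only enlarge the union of explained genes, so the fraction is non-decreasing by construction. How much it rises in each cell line is the empirical finding reported above.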

Figure 2

Quantitative and qualitative benefits of integrative analyses. (A) Heatmap and bar plot illustration of the additive benefit of multi-dimensional DNA analysis for the explanation of consequential differential gene expression. Within a sample, as each DNA dimension of analysis is added, an increasing percentage of the observed differential gene expression can be explained. For each dimension or combination of dimensions, the bar plot shows the median value (grey bars). Heatmaps display the percentage of differential expression explained by DNA mechanisms, with values near 100 shown in dark red (overexpression) or green (underexpression) and values closer to 0 in white. (B) Two specific genes, GNAS and CASP1, are given as examples of multiple, complementary mechanisms of gene disruption, illustrating the importance of multi-dimensional analysis (MDA).

MDA reveals genes are disrupted at higher frequencies when examining multiple dimensions as compared to any single dimension alone

When a set of samples is considered, analysis of multiple genomic dimensions leads to the discovery of more disrupted genes than a single dimension of analysis alone would detect. For each identified gene, we gain insight into how multiple mechanisms are complementary in gene disruption (Figure 2B). For example, the tumor suppressor gene caspase 1 (CASP1) is thought to be inactivated through DNA hypermethylation in multiple cancer types [37, 38]. The gene is underexpressed in all nine cases examined in this study. In a subset of these cases, the observed underexpression can be attributed to copy number loss, whereas in the remaining cases, DNA hypermethylation and copy neutral LOH are observed. Similarly, GNAS is differentially expressed in all nine cases, with a subset of cases showing concerted copy number change and the remaining cases showing concerted change in DNA methylation. Notably, our conclusion is supported by recent studies of glioblastoma, which also showed higher than expected disruption frequencies of specific genes when multiple genomic dimensions were analyzed [39, 40]. These examples illustrate how deregulated genes can be detected in more cases when multiple, complementary approaches are used.

Until very recently, multi-dimensional genomic analysis typically meant the parallel examination of gene dosage and gene expression. To demonstrate the power of examining multiple dimensions, we examine the frequency of gene expression deregulation explained by congruent alteration at the DNA level. Briefly, for each gene, a sample is considered to have a DNA-explained gene expression change if either of the following criteria is met: gene overexpression is accompanied by (i) copy number gain, (ii) copy neutral allelic imbalance, or (iii) hypomethylation; or gene underexpression is accompanied by (i) copy number loss, (ii) copy neutral LOH, or (iii) hypermethylation.

To determine an appropriate disruption frequency threshold, ten randomized, simulated datasets were generated, and the distribution of all observed frequencies from 0/9 to 9/9 across all simulations was plotted (Figure 3A). The proportion of observed frequencies ≥ 5/9 was 0.086, whereas for ≥ 6/9 it was 0.020. Because 6/9 was the lowest threshold at which this proportion was ≤ 0.05, 6/9 was used for further analysis. Using this threshold, we found that 437 differentially expressed genes have a corresponding change in gene dosage. By scaling this approach to multiple genome-wide dimensions, we anticipated identifying more disrupted genes. When we added the remaining dimensions to account for differential expression, at the same frequency cut-off, we identified the mechanism of disruption for 1162 deregulated genes (Figure 3B, Additional File 4).
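The threshold rule, which takes the smallest frequency whose tail proportion in the randomized data is ≤ 0.05, can be sketched as below. The function names are hypothetical. The counts are invented for illustration and were chosen only so that the tail proportions reproduce the values reported above (0.086 for ≥ 5/9 and 0.020 for ≥ 6/9). They are not the study's simulation output.

```python
def tail_proportions(counts, n_samples):
    """counts: {k: number of simulated genes disrupted in k of n_samples}.
    Returns {k: proportion of simulated genes with frequency >= k}."""
    total = sum(counts.values())
    return {k: sum(v for kk, v in counts.items() if kk >= k) / total
            for k in range(n_samples + 1)}


def choose_threshold(counts, n_samples, alpha=0.05):
    """Smallest k whose tail proportion is <= alpha, or None if none qualifies."""
    tails = tail_proportions(counts, n_samples)
    for k in range(n_samples + 1):
        if tails[k] <= alpha:
            return k
    return None


# Hypothetical aggregated counts: tail proportions are 0.086 at >= 5 and 0.020 at >= 6
counts = {0: 100, 1: 200, 2: 250, 3: 200, 4: 164, 5: 66, 6: 15, 7: 5}
print(choose_threshold(counts, n_samples=9))  # 6
```

Applied to the MCD frequencies from the simulations (99.7% at 0/9 and 0.3% at 1/9), the same rule selects a threshold of 1/9.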

Figure 3

Determination and application of a disruption frequency threshold. (A) Results of the analyses of ten simulated datasets. Aggregating the results of the simulated analyses, the proportion of genes in the random simulations at each observed frequency threshold is shown. From these analyses, approximately 2% of the simulated frequencies were ≥ 6/9. (B) The number of genes disrupted at a frequency of at least 6/9 using a single DNA dimension or a combination of DNA dimensions. With a single dimension alone, we can identify at most 437 genes that are differentially expressed and exhibit a concerted change at the DNA level in a minimum of 6/9 samples. However, using all three dimensions, we find that 1162 genes are differentially expressed and show a concerted change in at least one of the DNA dimensions. This represents more than a two-fold increase in the number of genes identified.

The impact of multi-dimensional integrative analysis on cancer gene discovery is the enhanced detection of genes that are disrupted by multiple mechanisms, each at a lower frequency. Collectively, the detection of gene dosage changes, allelic conversion, and changes in methylation status enables such genes to be identified as frequently disrupted. Using the list of 1162 genes, the distributions of alteration frequencies for each genomic dimension or combination of dimensions were assessed (Figure 4A). The median frequency in the box plots increases sequentially as more dimensions are examined. Specific genes illustrate this point. For example, the CD70 and ENG genes are underexpressed in the majority of samples. Using copy number analysis alone, the observed frequency of disruption (loss and underexpression) is 44% and 33%, respectively. If we then examine methylation status in the remaining cases not explained by DNA copy number, we observe an additional 33% of cases exhibiting hypermethylation and underexpression for ENG (red) and 22% for CD70 (blue). Finally, when we also examine allelic status, we observe an additional 22% of cases with copy neutral LOH and underexpression for CD70 and 11% for ENG. In total, using all three dimensions, the cumulative frequency of disruption is 88% for CD70 and 77% for ENG (Figure 4B). This example demonstrates the utility of a multi-dimensional approach for elucidating events that would escape conventional single dimensional analysis.

Figure 4

Impact of multi-dimensional analysis on low frequency events. (A) Box plot analysis of the frequency distribution of single and multi-dimensional analyses (MDA) of the 1162 genes differentially expressed with a concerted change in one of the DNA dimensions. The area in red represents the number of genes (of the 1162) that would be missed if only a single DNA dimension were examined, while the area in blue represents the genes that would be detected. Judging from the median values of the three right-most boxes, even the single dimension with the highest median (copy number) would fail to detect about 50% of the 1162 genes. (B) Two specific examples highlighting the importance of multi-dimensional genomic analysis. Using single dimensional analysis (green shade) alone, CD70 (blue line graph) and ENG (red line graph) disruption occurs at very low frequencies (44% and 33%, respectively). However, when two (red shade) or three (blue shade) genomic dimensions are examined, these genes are disrupted at very high frequencies, 88% and 77%, respectively. The frequency threshold of 6/9 is denoted by a black dotted line.

MDA identifies significantly enriched cancer related pathways

Using the set of 1162 genes identified by MDA (Additional File 4) and the analogous gene lists from each of the simulated datasets, pathway analyses were performed with Ingenuity Pathway Analysis. Focusing only on canonical signaling pathways, 53 pathways were significantly enriched among the MDA genes at a Benjamini-Hochberg corrected p-value ≤ 0.05 (Additional File 5). In contrast, for the gene lists from the 10 simulated datasets, nine of the 10 pathway analyses yielded no significantly enriched pathways at the same threshold, and the remaining analysis yielded one. Similar results were obtained from Gene Ontology analysis using the publicly available GATHER database [41] (Additional File 6). Pathways involved in breast, ovarian, and prostate cancer were among the most significant (Figure 5). These results suggest that the genes identified using MDA have a high degree of biological relevance.

Figure 5

Pathway analysis of the 1162 genes identified by multi-dimensional analysis. Ingenuity Pathway Analysis of the 1162 genes identified by MDA as well as genes meeting the same frequency criteria (6/9) from the analysis of the ten simulated datasets. In total, using the list of 1162 MDA genes, 53 canonical signaling pathways were identified as significant after multiple testing correction using a Benjamini-Hochberg correction (Additional File 5). In contrast, using the same statistical criteria, nine of the 10 simulated datasets yielded no significant pathways with one of the datasets yielding one pathway. In this figure, ten of the most well known, cancer-related pathways are shown. The yellow threshold line represents a Benjamini-Hochberg corrected p-value of 0.05 with bars above that line deemed significant. The first blue bar represents the analysis of the actual dataset and the subsequent ten bars represent the analyses of the ten simulated datasets.

MDA of the Neuregulin signaling pathway reveals a complex pattern of deregulation

Among the 53 pathways statistically over-represented in our list of 1162 genes is the neuregulin pathway. This pathway contains the well known breast cancer oncogene ERBB2 as well as other genes known to be affected in breast and other cancers [42–45]. Examining the components of this pathway across our sample set, we observe that some genes are commonly altered, others are infrequently altered through multiple patterns of genomic alteration, and still others behave oppositely in different samples (Figure 6).

Figure 6

Complex deregulation of the Neuregulin/ERBB2 signaling pathway. Each gene is color-coded red or green to represent over- or underexpression, respectively. Genes colored in both are overexpressed in some samples and underexpressed in others. Beside each gene is the status for gene expression, copy number, LOH/AI, and DNA methylation, with the alterations in each dimension colored as per the legend. DNA alterations are only shown when a change in gene expression is observed. Note that LOH can arise through multiple mechanisms; in this study, we do not distinguish between them. Likewise, methylation changes may affect one or both alleles; in this study, we do not distinguish the status of the individual alleles. Genes denoted with * have one sample exhibiting multiple concerted disruption (MCD). Samples are coded as follows: S1 = HCC38, S2 = HCC1008, S3 = HCC1143, S4 = HCC1395, S5 = HCC1599, S6 = HCC1937, S7 = HCC2218, S8 = BT474, and S9 = MCF7.

While genes such as HRAS (down), BAD (down), HSP90AB1 (up), SOS2 (up), and RPS6KB1 (up) generally exhibit consistent differential expression with concerted change at the DNA level across our sample set, genes such as GRB7, PTEN, and MAP2K1 exhibit both overexpression and underexpression, with concerted DNA change, in different samples. For example, PTEN shows copy number loss, LOH, DNA hypermethylation, and consequent underexpression in HCC1395, whereas HCC1008 shows copy number gain with DNA hypomethylation and consequent overexpression (Figure 7). The impact of such a difference on downstream targets was recently shown in a breast cancer study in which AKT and mTOR phosphorylation were higher in cases with low PTEN expression than in those with high PTEN expression [46]. As this pathway illustrates, although average features across a sample set are important, differences between samples in the same pathway may also play an important role and may have consequences for the biology of the tumor.

Figure 7

Deregulation of PTEN occurs differently between samples. In HCC1008 (top), PTEN is overexpressed with an associated gain in copy number and hypomethylation. Conversely, in HCC1395 (bottom), PTEN is underexpressed, with an associated loss in copy number, LOH, and DNA hypermethylation. This illustrates how each tumor may behave differently from another.

Genes exhibiting multiple concerted disruption (MCD) - biological and clinical significance

We have demonstrated that we can identify more disrupted genes in a given sample when considering any mechanism of disruption. On the other hand, genes that exhibit multiple concerted disruption (MCD) across all DNA dimensions (i.e. overexpression of a gene due to increased gene dosage, which led to allelic imbalance, together with DNA hypomethylation at the same locus relieving regulation) are likely to have strong biological significance. Likewise, underexpression due to reduced gene copy number, resulting in LOH, together with complementary DNA hypermethylation leading to gene silencing, may also be significant. By employing multiple dimensions of interrogation, genes exhibiting MCD are captured.

To determine what frequency of MCD should be deemed significant, we performed a similar analysis of the 10 simulated datasets described earlier and assessed the proportion of events at each possible MCD frequency across the nine samples (Figure 8A). By random chance, a gene exhibited MCD in 1/9 samples only 0.3% of the time, and no simulated event reached a frequency of 2/9 or higher. Using a threshold of at least one MCD event, we therefore identified 974 genes (Additional File 7). Interestingly, the overlap of the MDA list (1162 genes) with the MCD list (974 genes) yielded 375 genes.
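The simulation-based threshold can be illustrated with a permutation null. The simulation procedure is not described in this section, so the sketch below swaps in an assumed approach: within each sample, the expression calls and each DNA dimension are shuffled independently across genes, which preserves per-sample alteration rates while breaking gene-level concordance, and the proportion of genes at each MCD frequency (0 up to the number of samples) is tabulated over repeated simulations. The input is random toy data, not study data, and all names are illustrative.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

# One call per gene per sample: (expression, copy_number, allelic_imbalance, methylation),
# using a hypothetical +1/-1/0 and True/False encoding.
Calls = Tuple[int, int, bool, int]


def is_mcd(expr: int, cn: int, allelic: bool, meth: int) -> bool:
    """Concerted change in all three DNA dimensions agreeing with expression."""
    return expr != 0 and allelic and cn == expr and meth == -expr


def mcd_null_distribution(data: List[List[Calls]], n_sim: int = 10,
                          seed: int = 0) -> Dict[int, float]:
    """Proportion of genes at each MCD frequency (0..n_samples) when every
    dimension is shuffled independently across genes within each sample."""
    rng = random.Random(seed)
    n_samples, n_genes = len(data), len(data[0])
    counts: Counter = Counter()
    for _ in range(n_sim):
        per_gene = [0] * n_genes
        for sample in data:
            columns = [list(col) for col in zip(*sample)]  # one list per dimension
            for col in columns:
                rng.shuffle(col)
            for g, calls in enumerate(zip(*columns)):
                if is_mcd(*calls):
                    per_gene[g] += 1
        counts.update(per_gene)
    total = n_sim * n_genes
    return {k: counts[k] / total for k in range(n_samples + 1)}


# Toy input: 9 samples x 500 genes of random calls (purely illustrative).
toy_rng = random.Random(1)
toy = [[(toy_rng.choice([-1, 0, 1]), toy_rng.choice([-1, 0, 1]),
         toy_rng.random() < 0.2, toy_rng.choice([-1, 0, 1]))
        for _ in range(500)] for _ in range(9)]
null = mcd_null_distribution(toy)
print({k: round(v, 4) for k, v in null.items() if v > 0})
```

An observed gene frequency is then judged against this distribution; in the study's own simulations even 1/9 occurred only 0.3% of the time, which is why a single MCD event was used as the threshold.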

Figure 8

Multiple concerted disruption (MCD) analysis and its application to triple negative breast cancer. (A) Analysis of ten simulated datasets to determine the proportion of random simulations at each observed frequency of MCD. Notably, 99.7% of random simulations had an MCD frequency of 0/9, with the remaining 0.3% at 1/9; no simulation showed a frequency ≥ 2/9. Thus, an observed MCD event is likely non-random. (B) Using the Ingenuity Pathway Analysis knowledge database, upstream and downstream components of FGFR2 were selected to assess their role in the subset of triple negative breast cancer (TNBC) cell lines. Only components with a direct or indirect expression-level relationship to FGFR2 were selected. Of the seven components identified (four upstream and three downstream of FGFR2), one upstream and one downstream component were present in both the MDA list (Additional File 4) and the MCD list (Additional File 7). While FGFR2 overexpression is not frequently associated with DNA-level alteration, COL1A1 is frequently affected at the DNA level: of the five TNBC cell lines examined, four have DNA-level alteration of COL1A1 and the remaining line has DNA-level alteration of FGFR2.

The MCD strategy sequentially refines the roster of target genes with the intent of identifying genes critical for tumorigenesis (Additional File 8). Genes that exhibit multiple mechanisms of deregulation may represent important pathway nodes, such as hub proteins [47] whose disruption affects multiple downstream targets or other biologically and/or clinically relevant genes. Thus, although such genes may not be affected at high frequency across the sample set, their disruption at multiple levels in individual samples would signify importance in tumorigenesis. As shown earlier, 375 genes were identified by both MDA and MCD. Applying a further criterion of frequent MCD, requiring the event in at least 4/9 samples (signifying high recurrence), yields 23 genes (Additional File 8). Among these 23 genes are TUSC3 (8p22), ELK3 (12q23), and CCNA1 (13q12.3-q13).
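The stepwise refinement described above amounts to set operations followed by a frequency cut-off. In the hypothetical sketch below, gene names and counts are invented placeholders, `refine_candidates` is an illustrative name, and `min_frequency=4` mirrors the 4/9 criterion in the text.

```python
from typing import Dict, List, Set


def refine_candidates(mda_genes: Set[str], mcd_counts: Dict[str, int],
                      min_frequency: int = 4) -> List[str]:
    """Stepwise filter: keep genes with MCD in at least one sample,
    intersect with the MDA list, then require MCD in >= min_frequency samples."""
    mcd_genes = {g for g, n in mcd_counts.items() if n >= 1}
    overlap = mda_genes & mcd_genes
    return sorted(g for g in overlap if mcd_counts[g] >= min_frequency)


# Hypothetical gene names and MCD counts (out of 9 samples), for illustration only.
mda = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
mcd = {"GENE_A": 5, "GENE_B": 1, "GENE_C": 0, "GENE_E": 6}
print(refine_candidates(mda, mcd))  # ['GENE_A']
```

Each step only narrows the list, so the order of the intersection and the frequency cut-off does not change the final result.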

TUSC3 resides at 8p22, a locus frequently deleted across multiple epithelial cancers [48-51]. ELK3 is an ETS domain transcription factor that, in mice, acts as a transcriptional inhibitor in the absence of RAS but as a transcriptional activator in its presence [52]. Recently, ELK3 was shown to be underexpressed in a panel of breast cancer lines as well as in clinical breast tumor specimens [53]. CCNA1 has been shown to be hypermethylated in multiple cancer types, including breast cancer [54].

To validate the relevance of the 23 MCD genes in clinical breast cancer samples, we evaluated the association between their expression levels and patient survival, and examined multiple publicly available microarray datasets using the Oncomine database [35]. Of these 23 genes, 17 were represented in either the van de Vijver et al or Sorlie et al datasets, and eight of these 17 demonstrated a statistically significant association with patient survival in at least one of the two independent datasets (Additional Files 9, 10) [4, 33]. Moreover, the proportion of survival-associated genes in the MCD gene list (8/17, 47.1%) was significantly higher than the proportion expected without pre-selection (27.1%; binomial test, p = 0.041). To further evaluate the clinical significance of these genes, we used the Oncomine database (Additional File 9); one caveat of this analysis is that it may not detect low-level differential expression. TUSC3 is shown as an example of a gene whose expression correlates with survival (Additional File 8; see also Methods). Notably, in ovarian cancer, TUSC3, in conjunction with EFA6R, also correlated with poor survival [55]. The observations that TUSC3 is frequently altered by multiple mechanisms at the DNA and RNA levels and is strongly associated with patient survival highlight the utility of MCD for systematically identifying biologically, and potentially clinically, relevant genes.

Association of genes exhibiting MCD and triple negative breast cancers (TNBC)

In this study, the majority of samples (5/9) were of the triple negative subtype of breast cancer, which is estrogen receptor (ER) negative, progesterone receptor (PR) negative, and HER2 negative, and represents between 10% and 20% of all diagnosed breast malignancies [56-59]. Previous genomic analyses of triple negative breast cancers (TNBCs) [60-63] have revealed a heterogeneous and complex view of this subtype. A recent study, however, implicated fibroblast growth factor receptor 2 (FGFR2) as a novel therapeutic target amplified in TNBCs [59]; in a meta-analysis of array CGH data, this gene was found to be amplified in 4% of TNBC cases [59]. We therefore assessed the status of FGFR2 and its downstream targets in our multi-dimensional dataset.

While FGFR2 is not amplified in any of the five TNBC cell lines, all five showed overexpression of FGFR2, and one of them (HCC1937) exhibited a low-level gain of a region encompassing FGFR2. Thus, within this set of TNBC cell lines, FGFR2 overexpression was not frequently associated with DNA-level alterations.

However, examining the downstream targets of FGFR2 revealed a striking finding. Using the Ingenuity Pathway Analysis knowledge database, we found that one downstream component affected at the expression level, COL1A1, was also present on both the MDA (Additional File 4) and MCD (Additional File 7) lists. Remarkably, of the five TNBC cell lines, four exhibited DNA alteration-associated overexpression of COL1A1 (two with MCD at COL1A1 and two with DNA copy number-associated overexpression). The remaining line exhibited DNA copy number-associated overexpression of FGFR2 (Figure 8B). Hence, every TNBC line was affected at either FGFR2 or COL1A1. Interestingly, COL1A1 has been shown to be both prognostic and predictive in multiple cancer types, including breast cancer [3, 5, 64, 65].
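The observation that every TNBC line is affected at FGFR2 or COL1A1 is a coverage check over a gene set. In the sketch below, the cell line labels are hypothetical placeholders that only mirror the pattern reported above (four lines altered at COL1A1, one at FGFR2); they do not correspond to specific lines in the study, and `uncovered_samples` is an illustrative name.

```python
from typing import Dict, Set


def uncovered_samples(altered: Dict[str, Set[str]], genes: Set[str]) -> Set[str]:
    """Samples with no DNA-associated alteration in any gene of `genes`."""
    return {s for s, hits in altered.items() if not hits & genes}


# Hypothetical labels mirroring the reported pattern.
tnbc = {
    "line_1": {"COL1A1"}, "line_2": {"COL1A1"}, "line_3": {"COL1A1"},
    "line_4": {"COL1A1"}, "line_5": {"FGFR2"},
}
print(uncovered_samples(tnbc, {"FGFR2", "COL1A1"}))  # set()
print(sorted(uncovered_samples(tnbc, {"FGFR2"})))    # ['line_1', 'line_2', 'line_3', 'line_4']
```

Checking FGFR2 alone leaves four of the five placeholder lines uncovered, which is the single-gene view the pathway-level view avoids.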


Conclusions

In conclusion, we have demonstrated that a multi-dimensional genomic approach is superior to analysis of one or two genomic dimensions alone. Each additional genomic dimension surveyed increases the amount of aberrant gene expression that can be explained within individual samples. As a by-product, when examining across a sample set, multi-dimensional genomic analysis can identify relevant genes that may be overlooked due to low frequencies of disruption by the individual mechanisms. The increased frequency of gene disruption detected, due to the consideration of multiple mechanisms of disruption, could potentially reduce the cohort size needed for gene discovery.
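The gain in detection frequency from considering several mechanisms can be seen with a single gene. In the toy example below, the per-sample mechanism calls are hypothetical; counting each mechanism alone never reaches 6/9 (the frequency used for the MDA list in Additional File 4), whereas counting disruption by any mechanism does. The function name is illustrative.

```python
from typing import Dict, List, Set


def disruption_frequencies(per_sample: List[Set[str]],
                           mechanisms: List[str]) -> Dict[str, int]:
    """Count samples in which one gene is disrupted by each mechanism alone
    and by any mechanism (the union)."""
    freq = {m: sum(m in s for s in per_sample) for m in mechanisms}
    freq["any"] = sum(bool(s & set(mechanisms)) for s in per_sample)
    return freq


# Hypothetical gene across 9 samples; each set lists the DNA mechanisms
# with a concerted expression change in that sample.
gene = [{"copy_number"}, {"methylation"}, {"LOH"}, set(), {"copy_number", "LOH"},
        {"methylation"}, set(), {"copy_number"}, set()]
print(disruption_frequencies(gene, ["copy_number", "LOH", "methylation"]))
# {'copy_number': 3, 'LOH': 2, 'methylation': 2, 'any': 6}
```

Because the union can only add samples, the "any" count is always at least as large as the largest single-mechanism count.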

Secondly, while the increased detection of genes disrupted using multi-dimensional analysis is useful for achieving a more comprehensive identification of deregulated pathways and gene networks, it also presents a challenge in prioritizing which genes are likely key nodes or hubs in the affected pathways and networks. Hence, one way to prioritize is to identify genes with evidence of multiple concerted disruption. The Knudson two-hit hypothesis suggests that tumor suppressor genes require two allelic hits to disrupt gene function. Bi-allelic alteration, such as homozygous deletion, or concerted genetic and epigenetic changes, are well documented causal mechanisms of gene disruption. Likewise, hypomethylation and increased gene dosage are known mechanisms for gene overexpression. The bi-allelic disruption phenomenon (leading to loss or gain of function) provides a means to identify causative genes; hence, parallel analysis of the genome and epigenome in the same tumor is of great benefit. In this study, we have developed a stepwise gene selection strategy to identify multiple concerted disruptions using an integrative genomics approach.

In this study, we examined three DNA dimensions for which affordable high-throughput assays are currently available. However, we envision that new techniques for analyzing additional aspects, such as histone modification states and gene mutation status, will reveal mechanisms that explain even more gene expression changes within individual samples. The identification of a number of key cancer-related genes and pathways using a relatively small sample size suggests that our comprehensive approach may circumvent the need for large sample sizes to identify relevant genes and pathways. This concept can be extended to current technologies such as high-throughput sequencing, where it may prove more prudent to analyze a smaller number of samples in multiple dimensions than many more samples in one dimension at comparable cost. Finally, observing the same gene in a given pathway deregulated in completely different ways between samples exposes a shortcoming of group-based analysis and underscores the eventual need for systems analysis of tumors as individual entities.


References

  1. Chang JC, Wooten EC, Tsimelzon A, Hilsenbeck SG, Gutierrez MC, Elledge R, Mohsin S, Osborne CK, Chamness GC, Allred DC, et al.: Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. Lancet. 2003, 362 (9381): 362-369. 10.1016/S0140-6736(03)14023-8

  2. Coe BP, Chari R, Lockwood WW, Lam WL: Evolving strategies for global gene expression analysis of cancer. J Cell Physiol. 2008, 217 (3): 590-597. 10.1002/jcp.21554

  3. Perou CM, Sorlie T, Eisen MB, Rijn van de M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, et al.: Molecular portraits of human breast tumours. Nature. 2000, 406 (6797): 747-752. 10.1038/35021093

  4. Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, Rijn van de M, Jeffrey SS, et al.: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA. 2001, 98 (19): 10869-10874. 10.1073/pnas.191367098

  5. van 't Veer LJ, Dai H, Vijver van de MJ, He YD, Hart AA, Mao M, Peterse HL, Kooy van der K, Marton MJ, Witteveen AT, et al.: Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002, 415 (6871): 530-536. 10.1038/415530a

  6. Fridlyand J, Snijders AM, Ylstra B, Li H, Olshen A, Segraves R, Dairkee S, Tokuyasu T, Ljung BM, Jain AN, et al.: Breast tumor copy number aberration phenotypes and genomic instability. BMC Cancer. 2006, 6: 96- 10.1186/1471-2407-6-96

  7. Albertson DG, Ylstra B, Segraves R, Collins C, Dairkee SH, Kowbel D, Kuo WL, Gray JW, Pinkel D: Quantitative mapping of amplicon structure by array CGH identifies CYP24 as a candidate oncogene. Nat Genet. 2000, 25 (2): 144-146. 10.1038/75985

  8. Chin SF, Wang Y, Thorne NP, Teschendorff AE, Pinder SE, Vias M, Naderi A, Roberts I, Barbosa-Morais NL, Garcia MJ, et al.: Using array-comparative genomic hybridization to define molecular portraits of primary breast cancers. Oncogene. 2007, 26 (13): 1959-1970. 10.1038/sj.onc.1209985

  9. Jain AN, Chin K, Borresen-Dale AL, Erikstein BK, Eynstein Lonning P, Kaaresen R, Gray JW: Quantitative analysis of chromosomal CGH in human breast tumors associates copy number abnormalities with p53 status and patient survival. Proc Natl Acad Sci USA. 2001, 98 (14): 7952-7957. 10.1073/pnas.151241198

  10. Naylor TL, Greshock J, Wang Y, Colligon T, Yu QC, Clemmer V, Zaks TZ, Weber BL: High resolution genomic analysis of sporadic breast cancer using array-based comparative genomic hybridization. Breast Cancer Res. 2005, 7 (6): R1186-1198. 10.1186/bcr1356

  11. Shadeo A, Lam WL: Comprehensive copy number profiles of breast cancer cell model genomes. Breast Cancer Res. 2006, 8 (1): R9- 10.1186/bcr1370

  12. Chin K, DeVries S, Fridlyand J, Spellman PT, Roydasgupta R, Kuo WL, Lapuk A, Neve RM, Qian Z, Ryder T, et al.: Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell. 2006, 10 (6): 529-541. 10.1016/j.ccr.2006.10.009

  13. Chin SF, Teschendorff AE, Marioni JC, Wang Y, Barbosa-Morais NL, Thorne NP, Costa JL, Pinder SE, Wiel van de MA, Green AR, et al.: High-resolution aCGH and expression profiling identifies a novel genomic subtype of ER negative breast cancer. Genome Biol. 2007, 8 (10): R215- 10.1186/gb-2007-8-10-r215

  14. Hyman E, Kauraniemi P, Hautaniemi S, Wolf M, Mousses S, Rozenblum E, Ringner M, Sauter G, Monni O, Elkahloun A, et al.: Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Res. 2002, 62 (21): 6240-6245.

  15. Pollack JR, Sorlie T, Perou CM, Rees CA, Jeffrey SS, Lonning PE, Tibshirani R, Botstein D, Borresen-Dale AL, Brown PO: Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc Natl Acad Sci USA. 2002, 99 (20): 12963-12968. 10.1073/pnas.162471999

  16. Chari R, Lockwood WW, Coe BP, Chu A, Macey D, Thomson A, Davies JJ, MacAulay C, Lam WL: SIGMA: a system for integrative genomic microarray analysis of cancer genomes. BMC Genomics. 2006, 7: 324- 10.1186/1471-2164-7-324

  17. Ishkanian AS, Malloff CA, Watson SK, DeLeeuw RJ, Chi B, Coe BP, Snijders A, Albertson DG, Pinkel D, Marra MA, et al.: A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet. 2004, 36 (3): 299-303. 10.1038/ng1307

  18. Lockwood WW, Coe BP, Williams AC, MacAulay C, Lam WL: Whole genome tiling path array CGH analysis of segmental copy number alterations in cervical cancer cell lines. Int J Cancer. 2007, 120 (2): 436-443. 10.1002/ijc.22335

  19. Khojasteh M, Lam WL, Ward RK, MacAulay C: A stepwise framework for the normalization of array CGH data. BMC Bioinformatics. 2005, 6: 274- 10.1186/1471-2105-6-274

  20. Jong K, Marchiori E, Meijer G, Vaart AV, Ylstra B: Breakpoint identification and smoothing of array comparative genomic hybridization data. Bioinformatics. 2004, 20 (18): 3636-3637. 10.1093/bioinformatics/bth355

  21. Coe BP, Lockwood WW, Girard L, Chari R, Macaulay C, Lam S, Gazdar AF, Minna JD, Lam WL: Differential disruption of cell cycle pathways in small cell and non-small cell lung cancer. Br J Cancer. 2006, 94 (12): 1927-1935. 10.1038/sj.bjc.6603167

  22. Carvalho B, Bengtsson H, Speed TP, Irizarry RA: Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics. 2007, 8 (2): 485-499. 10.1093/biostatistics/kxl042

  23. Lin M, Wei LJ, Sellers WR, Lieberfarb M, Wong WH, Li C: dChipSNP: significance curve and clustering of SNP-array-based loss-of-heterozygosity data. Bioinformatics. 2004, 20 (8): 1233-1240. 10.1093/bioinformatics/bth069

  24. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, et al.: Global variation in copy number in the human genome. Nature. 2006, 444 (7118): 444-454. 10.1038/nature05329

  25. Gautier L, Cope L, Bolstad BM, Irizarry RA: affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004, 20 (3): 307-315. 10.1093/bioinformatics/btg405

  26. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al.: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5 (10): R80- 10.1186/gb-2004-5-10-r80

  27. Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F: The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res. 2008, 36 (Database issue): D773-779.

  28. Bibikova M, Lin Z, Zhou L, Chudin E, Garcia EW, Wu B, Doucet D, Thomas NJ, Wang Y, Vollmer E, et al.: High-throughput DNA methylation profiling using universal bead arrays. Genome Res. 2006, 16 (3): 383-393. 10.1101/gr.4410706

  29. Chari R, Coe BP, Wedseltoft C, Benetti M, Wilson IM, Vucic EA, MacAulay C, Ng RT, Lam WL: SIGMA2: a system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes. BMC Bioinformatics. 2008, 9: 422- 10.1186/1471-2105-9-422

  30. Soh J, Okumura N, Lockwood WW, Yamamoto H, Shigematsu H, Zhang W, Chari R, Shames DS, Tang X, MacAulay C, et al.: Oncogene mutations, copy number gains and mutant allele specific imbalance (MASI) frequently occur together in tumor cells. PLoS One. 2009, 4 (10): e7464- 10.1371/journal.pone.0007464

  31. Tuna M, Knuutila S, Mills GB: Uniparental disomy in cancer. Trends Mol Med. 2009, 15 (3): 120-128. 10.1016/j.molmed.2009.01.005

  32. Yan H, Yuan W, Velculescu VE, Vogelstein B, Kinzler KW: Allelic variation in human gene expression. Science. 2002, 297 (5584): 1143- 10.1126/science.1072545

  33. Vijver van de MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, et al.: A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002, 347 (25): 1999-2009. 10.1056/NEJMoa021967

  34. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB: Missing value estimation methods for DNA microarrays. Bioinformatics. 2001, 17 (6): 520-525. 10.1093/bioinformatics/17.6.520

  35. Rhodes DR, Kalyana-Sundaram S, Mahavisno V, Varambally R, Yu J, Briggs BB, Barrette TR, Anstet MJ, Kincead-Beal C, Kulkarni P, et al.: Oncomine 3.0: genes, pathways, and networks in a collection of 18,000 cancer gene expression profiles. Neoplasia. 2007, 9 (2): 166-180. 10.1593/neo.07112

  36. Johnson N, Speirs V, Curtin NJ, Hall AG: A comparative study of genome-wide SNP, CGH microarray and protein expression analysis to explore genotypic and phenotypic mechanisms of acquired antiestrogen resistance in breast cancer. Breast Cancer Res Treat. 2008, 111 (1): 55-63. 10.1007/s10549-007-9758-6

  37. Jee CD, Lee HS, Bae SI, Yang HK, Lee YM, Rho MS, Kim WH: Loss of caspase-1 gene expression in human gastric carcinomas and cell lines. Int J Oncol. 2005, 26 (5): 1265-1271.

  38. Ueki T, Takeuchi T, Nishimatsu H, Kajiwara T, Moriyama N, Narita Y, Kawabe K, Ueki K, Kitamura T: Silencing of the caspase-1 gene occurs in murine and human renal cancer cells and causes solid tumor growth in vivo. Int J Cancer. 2001, 91 (5): 673-679. 10.1002/1097-0215(200002)9999:9999<::AID-IJC1113>3.0.CO;2-U

  39. Cancer Genome Atlas Research Network: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008, 455 (7216): 1061-1068.

  40. Parsons DW, Jones S, Zhang X, Lin JC, Leary RJ, Angenendt P, Mankoo P, Carter H, Siu IM, Gallia GL, et al.: An integrated genomic analysis of human glioblastoma multiforme. Science. 2008, 321 (5897): 1807-1812. 10.1126/science.1164382

  41. Chang JT, Nevins JR: GATHER: a systems approach to interpreting genomic signatures. Bioinformatics. 2006, 22 (23): 2926-2933. 10.1093/bioinformatics/btl483

  42. Bachman KE, Argani P, Samuels Y, Silliman N, Ptak J, Szabo S, Konishi H, Karakas B, Blair BG, Lin C, et al.: The PIK3CA gene is mutated with high frequency in human breast cancers. Cancer Biol Ther. 2004, 3 (8): 772-775. 10.4161/cbt.3.8.994

  43. Slamon DJ, Godolphin W, Jones LA, Holt JA, Wong SG, Keith DE, Levin WJ, Stuart SG, Udove J, Ullrich A, et al.: Studies of the HER-2/neu proto-oncogene in human breast and ovarian cancer. Science. 1989, 244 (4905): 707-712. 10.1126/science.2470152

  44. Stein D, Wu J, Fuqua SA, Roonprapunt C, Yajnik V, D'Eustachio P, Moskow JJ, Buchberg AM, Osborne CK, Margolis B: The SH2 domain protein GRB-7 is co-amplified, overexpressed and in a tight complex with HER2 in breast cancer. EMBO J. 1994, 13 (6): 1331-1340.

  45. Lockwood WW, Chari R, Coe BP, Girard L, Macaulay C, Lam S, Gazdar AF, Minna JD, Lam WL: DNA amplification is a ubiquitous mechanism of oncogene activation in lung and other cancers. Oncogene. 2008, 27 (33): 4615-4624. 10.1038/onc.2008.98

  46. Stemke-Hale K, Gonzalez-Angulo AM, Lluch A, Neve RM, Kuo WL, Davies M, Carey M, Hu Z, Guan Y, Sahin A, et al.: An integrative genomic and proteomic analysis of PIK3CA, PTEN, and AKT mutations in breast cancer. Cancer Res. 2008, 68 (15): 6084-6091. 10.1158/0008-5472.CAN-07-6854

  47. Wang E, Lenferink A, O'Connor-McCourt M: Cancer systems biology: exploring cancer-associated genes on cellular networks. Cell Mol Life Sci. 2007, 64 (14): 1752-1762. 10.1007/s00018-007-7054-6

  48. Bova GS, Carter BS, Bussemakers MJ, Emi M, Fujiwara Y, Kyprianou N, Jacobs SC, Robinson JC, Epstein JI, Walsh PC, et al.: Homozygous deletion and frequent allelic loss of chromosome 8p22 loci in human prostate cancer. Cancer Res. 1993, 53 (17): 3869-3873.

  49. Chinen K, Isomura M, Izawa K, Fujiwara Y, Ohata H, Iwamasa T, Nakamura Y: Isolation of 45 exon-like fragments from 8p22-->p21.3, a region that is commonly deleted in hepatocellular, colorectal, and non-small cell lung carcinomas. Cytogenet Cell Genet. 1996, 75 (2-3): 190-196. 10.1159/000134480

  50. Cooke SL, Pole JC, Chin SF, Ellis IO, Caldas C, Edwards PA: High-resolution array CGH clarifies events occurring on 8p in carcinogenesis. BMC Cancer. 2008, 8 (1): 288- 10.1186/1471-2407-8-288

  51. Yaremko ML, Recant WM, Westbrook CA: Loss of heterozygosity from the short arm of chromosome 8 is an early event in breast cancers. Genes Chromosomes Cancer. 1995, 13 (3): 186-191. 10.1002/gcc.2870130308

  52. Giovane A, Pintzas A, Maira SM, Sobieszczuk P, Wasylyk B: Net, a new ets transcription factor that is activated by Ras. Genes Dev. 1994, 8 (13): 1502-1513. 10.1101/gad.8.13.1502

  53. He J, Pan Y, Hu J, Albarracin C, Wu Y, Dai JL: Profile of Ets gene expression in human breast carcinoma. Cancer Biol Ther. 2007, 6 (1): 76-82.

  54. Shames DS, Girard L, Gao B, Sato M, Lewis CM, Shivapurkar N, Jiang A, Perou CM, Kim YH, Pollack JR, et al.: A genome-wide screen for promoter methylation in lung cancer identifies novel methylation markers for multiple malignancies. PLoS Med. 2006, 3 (12): e486- 10.1371/journal.pmed.0030486

  55. Pils D, Horak P, Gleiss A, Sax C, Fabjani G, Moebus VJ, Zielinski C, Reinthaller A, Zeillinger R, Krainer M: Five genes from chromosomal band 8p22 are significantly down-regulated in ovarian carcinoma: N33 and EFA6R have a potential impact on overall survival. Cancer. 2005, 104 (11): 2417-2429. 10.1002/cncr.21538

  56. Cheang MC, Voduc D, Bajdik C, Leung S, McKinney S, Chia SK, Perou CM, Nielsen TO: Basal-like breast cancer defined by five biomarkers has superior prognostic value than triple-negative phenotype. Clin Cancer Res. 2008, 14 (5): 1368-1376. 10.1158/1078-0432.CCR-07-1658

  57. Gluz O, Liedtke C, Gottschalk N, Pusztai L, Nitz U, Harbeck N: Triple-negative breast cancer--current status and future directions. Ann Oncol. 2009, 20 (12): 1913-1927. 10.1093/annonc/mdp492

  58. Rakha EA, El-Sayed ME, Green AR, Lee AH, Robertson JF, Ellis IO: Prognostic markers in triple-negative breast cancer. Cancer. 2007, 109 (1): 25-32. 10.1002/cncr.22381

  59. Turner N, Lambros MB, Horlings HM, Pearson A, Sharpe R, Natrajan R, Geyer FC, van Kouwenhove M, Kreike B, Mackay A, et al.: Integrative molecular profiling of triple negative breast cancers identifies amplicon drivers and potential therapeutic targets. Oncogene. 2010, 29 (14): 2013-2023. 10.1038/onc.2009.489

  60. Andre F, Job B, Dessen P, Tordai A, Michiels S, Liedtke C, Richon C, Yan K, Wang B, Vassal G, et al.: Molecular characterization of breast cancer with high-resolution oligonucleotide comparative genomic hybridization array. Clin Cancer Res. 2009, 15 (2): 441-451. 10.1158/1078-0432.CCR-08-1791

  61. Bertucci F, Finetti P, Cervera N, Esterni B, Hermitte F, Viens P, Birnbaum D: How basal are triple-negative breast cancers?. Int J Cancer. 2008, 123 (1): 236-240. 10.1002/ijc.23518

  62. Han W, Jung EM, Cho J, Lee JW, Hwang KT, Yang SJ, Kang JJ, Bae JY, Jeon YK, Park IA, et al.: DNA copy number alterations and expression of relevant genes in triple-negative breast cancer. Genes Chromosomes Cancer. 2008, 47 (6): 490-499. 10.1002/gcc.20550

  63. Kreike B, van Kouwenhove M, Horlings H, Weigelt B, Peterse H, Bartelink H, van MJ: Gene expression profiling and histopathological characterization of triple-negative/basal-like breast carcinomas. Breast Cancer Res. 2007, 9 (5): R65- 10.1186/bcr1771

  64. Ramaswamy S, Ross KN, Lander ES, Golub TR: A molecular signature of metastasis in primary solid tumors. Nat Genet. 2003, 33 (1): 49-54. 10.1038/ng1060

  65. Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J, et al.: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005, 365 (9460): 671-679.


Acknowledgements

We would like to thank Dr. Adi F. Gazdar and Ian M. Wilson for critical reading of the manuscript. RC is supported by scholarships from the Canadian Institutes of Health Research (CIHR); RC and WWL are supported by scholarships from the Michael Smith Foundation for Health Research. This work was supported by grants from the Canadian Breast Cancer Research Alliance IDEA and the Canadian Institutes of Health Research.

Author information


Corresponding author

Correspondence to Raj Chari.

Additional information

Authors' contributions

RC designed the study, performed the analysis and wrote the manuscript. BPC contributed to data interpretation, study design and manuscript preparation. EAV provided technical assistance and contributed to manuscript preparation. WWL contributed to data interpretation and manuscript preparation. WLL is the principal investigator of this project. All authors have read and approved the final manuscript.

Electronic supplementary material


Additional file 1: Description of cell lines. Background information about the cancer cell lines including mutation status and HER2/ER/PR statuses (XLS 30 KB)


Additional file 2: Sources of Data. Listing of all the sources of data used in this paper. Public sources and newly generated data are indicated. (XLS 30 KB)


Additional file 3: Tab-delimited text file of the RMA normalized gene expression data. Gene expression data file in a matrix format with all RMA normalized data points for each sample (TXT 8 MB)


Additional file 4: List of 1162 multi-dimensional analysis (MDA) genes altered in 6/9 samples by any DNA mechanism with concerted change in gene expression. A list of the 1162 genes identified by MDA. For each gene, the predominant status is listed. Description of the status is provided in the file. (XLS 114 KB)


Additional file 5: Canonical signaling pathways enriched using 1162 MDA genes. Ingenuity Pathway Analysis of the 1162 genes (XLS 39 KB)


Additional file 6: Results of Gene Ontology analysis using GATHER. Gene Ontology analysis of the 1162 genes using GATHER (XLS 310 KB)


Additional file 7: List of 974 multiple concerted disruption (MCD) analysis genes. A list of the 974 genes exhibiting MCD in at least one sample. The predominant status is listed next to each gene. (XLS 74 KB)


Additional file 8: MCD strategy and Kaplan-Meier analysis of TUSC3. Overview of a strategy using MCD to identify relevant genes. (EPS 8 MB)


Additional file 9: Kaplan-Meier survival and Oncomine expression analyses of frequent MCD genes. Summary of the Oncomine expression and K-M survival analysis of the 23 genes. (DOC 72 KB)


Additional file 10: Summary of Kaplan-Meier survival analysis. Results of the K-M analyses using multiple datasets. (XLS 28 KB)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Chari, R., Coe, B.P., Vucic, E.A. et al. An integrative multi-dimensional genetic and epigenetic strategy to identify aberrant genes and pathways in cancer. BMC Syst Biol 4, 67 (2010).
