Integrating genetics and epigenetics in breast cancer: biological insights, experimental, computational methods and therapeutic potential

Cava, Claudia; Bertoli, Gloria; Castiglioni, Isabella

doi:10.1186/s12918-015-0211-x

Review
Open access
Published: 21 September 2015

BMC Systems Biology volume 9, Article number: 62 (2015)

Abstract

Background

Development of human cancer can proceed through the accumulation of different genetic changes affecting the structure and function of the genome. Combined analyses of molecular data at multiple levels, such as DNA copy-number alteration, mRNA and miRNA expression, can clarify biological functions and pathways deregulated in cancer. The integrative methods that are used to investigate these data involve different fields, including biology, bioinformatics, and statistics.

Results

These methodologies are presented in this review, and their implementation in breast cancer is discussed with a focus on integration strategies. We report current applications, recent studies and interesting results leading to the identification of candidate biomarkers for diagnosis, prognosis, and therapy in breast cancer by using both individual and combined analyses.

Conclusion

This review presents the state of the art on the role of different technologies in breast cancer based on the integration of genetics and epigenetics, and discusses some issues related to the new opportunities and challenges offered by the application of such integrative approaches.

Introduction

Breast Cancer (BC) is the most common cancer in women and the second most common cause of cancer mortality among females [1]. Classification of BC is currently based on histological types and molecular subtypes in order to reflect the hormone-responsiveness of the tumour. The three most common histological types include invasive ductal carcinoma, ductal carcinoma in situ and invasive lobular carcinoma. The molecular subtypes of BC, which are based on the presence or absence of estrogen receptors (ER), progesterone receptors (PR), and human epidermal growth factor receptor-2 (HER2), include luminal A (ER+ and/or PR+; HER2–), luminal B (ER+ and/or PR+; HER2+), basal-like (ER–, PR–, and HER2–), and HER2-enriched (ER–, PR–, and HER2+) subtypes [2, 3]. This classification reflects the BC heterogeneity and the complexity of diagnosis, prognosis, and treatment of BC.
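The receptor-based definitions above amount to a small decision rule. The following Python sketch is only an illustration of that rule as stated in this paragraph; the function name `molecular_subtype` and its boolean arguments are hypothetical, and clinical subtyping relies on additional criteria that are not modelled here.

```python
def molecular_subtype(er_positive: bool, pr_positive: bool, her2_positive: bool) -> str:
    """Map ER/PR/HER2 status to the four subtypes as defined in the text.

    Illustrative only: the name and signature are assumptions, not a clinical tool.
    """
    # "ER+ and/or PR+" in the text means at least one hormone receptor is present.
    hormone_receptor_positive = er_positive or pr_positive
    if hormone_receptor_positive:
        return "luminal B" if her2_positive else "luminal A"
    # Both hormone receptors absent: HER2 status separates the remaining two subtypes.
    return "HER2-enriched" if her2_positive else "basal-like"


if __name__ == "__main__":
    print(molecular_subtype(er_positive=True, pr_positive=False, her2_positive=False))
```

For example, a tumour that is ER–, PR– and HER2– falls into the basal-like group under this rule.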

High-throughput approaches now allow a tumour to be investigated at multiple levels: (i) DNA, with copy number alteration (CNA); (ii) epigenetic alterations, specifically DNA methylation, histone modifications and altered microRNA (miRNA) expression levels; and (iii) mRNA, with gene expression (GE) deregulation. These high-throughput approaches have redefined the classification of the different types of BC, showing the presence of only two BC profiles with different prognoses [4–6].

Development of human cancer can proceed through the accumulation of genetic and epigenetic changes affecting the structure and function of the genome. Several studies have reported that the epigenetic silencing of one allele may act in concert with an inactivating genetic alteration in the opposite allele, thus resulting in total allelic loss of the gene [7, 8]. Birgisdottir et al. [9] reported hypermethylation and deletion of the BRCA1 promoter and suggested that these events act as Knudson's two 'hits' in sporadic BC. Li et al. [10] focused on the expression of beclin 1 mRNA and demonstrated that loss of heterozygosity and aberrant DNA methylation might explain the decreased expression of beclin 1 in BC. In BC, biallelic inactivation of the FHIT gene could be a consequence of epigenetic inactivation of both parental alleles, or of epigenetic modification of one allele and deletion of the remaining allele [11].

In 2006, Feinberg et al. suggested that epigenetics and genetics should be combined or integrated in order to achieve a better understanding of cancer [12]. A systems biology approach has been employed to explore the functional relationships among multidimensional “omics” technologies. This approach has been shown to be important for directing each patient to the optimal treatment in a personalized way, in order to improve the efficacy of the treatment for that patient [13].

This review covers current studies of genetic and epigenetic changes associated with BC, focusing in particular on the processes controlled by CNA, epigenetic alterations (DNA methylation, histone modifications and miRNAs), and GE. Several approaches combining genetic and epigenetic data, in particular regarding CNA and miRNA deregulation, are considered with the ultimate aim of identifying new biomarkers for BC diagnosis and prognosis that can be translated into a clinical environment. Furthermore, experimental and computational methods used for the study and analysis of these biomarkers are presented. We also discuss the biological insights and clinical impact of such analyses, as well as the future challenges of these combined approaches.

Copy number alterations in BC

Biological insights

CNAs are alterations of the DNA of a genome that result in a cell having an abnormal number of copies of one or more sections of the DNA. They have been identified as causes of cancer and developmental abnormalities (e.g. [14]). Changes in DNA copy number (CN) can occur in specific genes or involve whole chromosomes, and usually affect genomic regions between 1 kbp and 1 Mbp in length [14].

Figure 1 shows an example of a wild type (WT) cell with two copies of a DNA segment, which in tumour cells undergoes alterations resulting in deletions (CN = 0; CN = 1) or amplifications (CN = 3; CN = 4) of the DNA section.
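The CN states described for Figure 1 can be written as a simple comparison against the two copies of the WT cell. The sketch below is a minimal illustration; the function name `cn_state` and the `normal_copies` parameter are hypothetical and are not taken from any tool discussed in this review.

```python
def cn_state(copy_number: int, normal_copies: int = 2) -> str:
    """Label an integer copy number relative to the wild type (two copies)."""
    if copy_number < 0:
        raise ValueError("copy number cannot be negative")
    if copy_number < normal_copies:
        return "deletion"        # e.g. CN = 0 or CN = 1
    if copy_number > normal_copies:
        return "amplification"   # e.g. CN = 3 or CN = 4
    return "wild type"


if __name__ == "__main__":
    for cn in range(5):
        print(cn, cn_state(cn))
```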

The ability of cancer cells to accumulate genetic alterations is crucial for cancer development, as it allows the inactivation of tumour suppressor genes (TSGs) and the activation of oncogenes (OGs).

In BC, several genetic alterations have been found.

Frequent CN deletions between axillary lymph node metastasis and BC primary tumours were revealed, including aberrations at 6q15-16, containing the gene PNRC1 (a putative tumour suppressor) [15]. Amplification and overexpression of the HER2 (HER2/neu, ERBB2) oncogene on chromosome 17q12 has been observed in 15–25 % of invasive BC [16]. HER2 amplification (HER2+) has been associated with poor prognosis in BC [17], with amplification of the HER2 gene leading to HER2 protein levels 10–100 times greater than normal [18].

EGFR amplification has been frequently associated with indices of poor prognosis in BC patients, such as large tumour size, high histological grade, high proliferative index, HER2-negative status, upregulation of PR [19], and negative ER status [20].

In the same region as HER2 (17q12–21), other genes have been found co-amplified or deleted, e.g. topoisomerase (TOP2A) [21]. Several studies have reported the possibility of guiding therapy based on TOP2A status [22, 23].

A recent study has shown alterations of PIK3CA and MET in BC [24]. High CN of PIK3CA and MET was associated with a poor prognosis, and these alterations often occur in triple receptor negative BC [24]. Alterations were also found at 9q31.3-33.1, where the genes DBC1 and DEC1 (regulators of apoptosis) are located [15].

OG activation by genomic amplification occurs in members of different oncogene families, e.g. MYC and CCND. MYC is a key regulator of cell growth, proliferation, metabolism, differentiation, and apoptosis [25]. This oncogene is located on chromosome 8q24, and several mechanisms are implicated in its deregulation in BC, including gene amplification and translocations. MYC amplification plays a role in BC progression because it has been detected in the more aggressive phenotype of ductal carcinoma in situ [26] or in invasive processes [27–29].

Gene amplification of CCND1 has been observed in a subgroup of BCs with poor prognosis and has been associated with resistance to tamoxifen [30]. The amplified region is 11q13, and CCND1 acts as a cell cycle regulator, promoting progression through the G₁-S phase [31].

ESR1 gene amplification is more frequent in BC with CCND1 gene amplification than in tumours without CCND1 gene amplification [32]. Amplification of ESR1 has been associated with negative ER [32]. The gene TSPAN1 (on 1p34.1) has been found deleted in metastasizing BC and might represent an important TSG [33]. Another gene, EMSY, has been found to be involved in sporadic BC, and its amplification has been shown to be associated with a poor prognosis [34].

Compared to non-metastatic invasive ductal carcinoma, metastatic invasive ductal carcinoma showed a unique pattern of CNAs, including gains at 2p24-13, 2q22-33, 9q21-31, 12q21-23, 17q23-25 and losses at 11q23-ter, 14q23-31, 20p11-q12, 2q36-ter, 8q24-ter, 9q33-ter, 2p11-q11, and 12q13 [35, 36].

Table 1 summarizes the mutated genes in BC considered here, together with their genetic alterations due to CN changes.

Table 1 Genes mutated and their alterations in BC

Experimental methods

Current experimental methods for the identification of CNA include cytogenetic techniques, microarrays, and sequencing-based computational approaches.

Karyotyping is a cytogenetic technique that provides standardized and effective single-cell screening to identify significant genomic aberrations in pathological and normal samples.

In standard karyotyping, a dye such as Giemsa or quinacrine is used to stain bands on the chromosomes. Each chromosome presents a characteristic banding pattern that can be used to detect CNAs. Thus, any alteration in the banding pattern represents a CNA [37].

Spectral karyotyping (SKY) is a novel technique for chromosome analysis [37], based on the fluorescence in situ hybridization (FISH) approach. SKY is a multicolour FISH technique in which each chromosome is represented with a different colour (a dye with different fluorophores). This technique is used to identify CNAs in cancer cells and in other disease conditions when other techniques are not accurate enough [37].

Resolution is the main limitation of both techniques: the chromosome profile obtained by karyotyping is not sensitive enough to detect short but relevant abnormalities [38].

Hybridization-based microarray approaches, including array comparative genomic hybridization (array CGH) and single nucleotide polymorphism (SNP) microarrays, have been used as an alternative to conventional cytogenetic approaches [39]. They are able to infer CNAs (amplifications and deletions) relative to a reference sample. Array CGH platforms quickly and efficiently compare two samples (test and reference) labelled with different fluorophores. Denaturing the DNA into single strands allows the two samples to hybridize to microarrays containing DNA sequence probes of known genomic position (e.g. bacterial artificial chromosomes, cDNAs or, more recently, oligonucleotides). Using a fluorescence microscope and dedicated computer software, the signal ratio of the two fluorescent colours is measured in order to identify chromosomal differences between the two sources. An important consideration is the effect of the reference sample on the CN profile: a comprehensively characterized reference is key to the correct interpretation of array CGH data [40].
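The test-to-reference signal ratio described above is commonly handled on a log2 scale, where equal signal gives 0, a doubling gives +1 and a halving gives -1. The sketch below illustrates this under stated assumptions: the function names are hypothetical, and the gain and loss thresholds of ±0.3 are arbitrary placeholders chosen for the example, not values recommended by the studies cited here.

```python
import math


def log2_ratio(test_intensity: float, reference_intensity: float) -> float:
    """Return log2(test / reference) for one probe."""
    if test_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(test_intensity / reference_intensity)


def call_probe(ratio: float, gain_threshold: float = 0.3, loss_threshold: float = -0.3) -> str:
    """Label a probe from its log2 ratio; the thresholds are illustrative assumptions."""
    if ratio >= gain_threshold:
        return "gain"
    if ratio <= loss_threshold:
        return "loss"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical (test, reference) intensities for three probes.
    for test, ref in [(200.0, 100.0), (100.0, 100.0), (50.0, 100.0)]:
        r = log2_ratio(test, ref)
        print(test, ref, r, call_probe(r))
```

Because the ratio depends directly on the reference intensity, a poorly characterized reference shifts every call, which is the concern raised at the end of the paragraph.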

SNP arrays have a higher resolution than CGH arrays and can be used to obtain allele-specific information. SNP microarrays differ from CGH technologies in a few key respects; in particular, probe designs are specific to single-nucleotide differences between DNA sequences.

Ultimately, next generation sequencing (NGS) has replaced microarrays as the platform for discovery and genotyping, and it presents considerable computational and bioinformatics challenges.

Computational methods

We can summarize CNA analysis from microarray data in three steps: 1) normalization, 2) probe-level modelling, and 3) CN estimation [41].

The aim of normalization is to remove non-relevant effects, such as the GC content of the fragment amplified by PCR, technical variation between arrays arising from differences in sample preparation or labelling, and differences in array production or scanning [42].
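One way to remove technical variation between arrays is quantile normalization, which several of the tools discussed in this section use. The following sketch is a minimal pure-Python version written for illustration; the function name `quantile_normalize` is hypothetical, ties are broken by probe order rather than averaged, and it is not the implementation used by any specific tool.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Force every array to share the same intensity distribution.

    `arrays` is a list of equal-length lists, one list of probe intensities per array.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(column) for column in zip(*sorted_arrays)]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity (stable for ties).
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

After this step each array contains the same set of values, so remaining differences between arrays at a given probe reflect rank changes rather than global intensity shifts.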

Probe-level modelling is usually performed at two levels: single locus and multilocus. Single locus modelling measures the CN of a specific target fragment or DNA probe locus in order to produce a raw fragment CN. Multilocus modelling combines the raw CNs of neighbouring fragments or DNA probe loci into a “meta-probe set” which determines the CN of the whole region [41, 42].
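The multilocus idea of combining raw CNs from neighbouring loci can be illustrated with a sliding-window median. This is a deliberately simple stand-in chosen for the example, not the model used by any of the tools cited here; the function name `multilocus_smooth` and the `window` parameter are hypothetical.

```python
from statistics import median


def multilocus_smooth(raw_cn, window=3):
    """Replace each raw CN by the median of its neighbourhood along the genome.

    `raw_cn` must be ordered by genomic position; `window` must be odd.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(raw_cn)):
        lo = max(0, i - half)                 # window is truncated at the ends
        hi = min(len(raw_cn), i + half + 1)
        smoothed.append(median(raw_cn[lo:hi]))
    return smoothed


if __name__ == "__main__":
    # The isolated value 5.0 is treated as noise by its neighbours.
    print(multilocus_smooth([2.0, 2.1, 5.0, 1.9, 2.0]))
```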

Computational methods to estimate CNs (e.g. segmentation) detect the break points that separate neighbouring regions, based on the log ratio of probe intensity [41, 42].
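Break-point detection can be illustrated with a search for the single split that best separates a series of log ratios into two segments with different means. The sketch below is a simplified example under that assumption (exactly one break point, least-squares cost); published segmentation algorithms handle multiple break points and assess their significance, which is not attempted here. The function name `best_breakpoint` is hypothetical.

```python
def best_breakpoint(log_ratios):
    """Return the index k that best splits the series into [0, k) and [k, n).

    The cost of a split is the sum of squared deviations from each segment mean.
    """
    n = len(log_ratios)
    if n < 2:
        raise ValueError("need at least two probes to place a break point")

    def sse(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    best_k, best_cost = None, float("inf")
    for k in range(1, n):
        cost = sse(log_ratios[:k]) + sse(log_ratios[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


if __name__ == "__main__":
    ratios = [0.0, 0.1, -0.1, 1.0, 1.1, 0.9]
    k = best_breakpoint(ratios)
    print("break point before probe index", k)
```

Applying the same search recursively to each resulting segment would give a basic multi-segment variant.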

Several methods are suitable for analysing CNA on microarray data.

i) The first CNA analysis method was developed by Affymetrix: the Chromosome Copy Number Analysis Tool (CNAT) [43]. Normalization is performed by quantile normalization. Modelling uses robust multichip average (RMA). CN estimation can be done subsequently with an arbitrary algorithm.
ii) DNA-Chip Analyzer (dChip) [44] normalizes using an invariant set method, in which a common baseline array is identified and all the other arrays are adjusted relative to it. Single-locus modelling is based on a model-based expression index (MBEI). This output is then used by a Hidden Markov Model (HMM) to infer CNs [44].
iii) Copy Number Analyser for GeneChip arrays (CNAG) [45] normalizes the arrays so that all autosomal probes have the same mean signal intensity, which makes fragment probes comparable between arrays. The signal intensity ratios are corrected for differences in PCR product length and GC content. An HMM algorithm is applied to infer CNs along each chromosome.
iv) Birdsuite's Birdseye [46] normalizes using quantile normalization. Modelling and segmentation are performed together at the multilocus level. An HMM estimates CNs.
v) Copy-number estimation using Robust Multichip Analysis (CRMA) [47] was developed as an extension of the RMA model. Normalization is obtained by allelic cross-hybridization correction (ACC). Modelling uses RMA. CNA analysis can be done using an arbitrary segmentation algorithm.
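Several of the tools listed above use an HMM to infer CNs along a chromosome. The sketch below shows the general idea with a toy three-state model decoded by the Viterbi algorithm. It is not the model of dChip, CNAG or Birdseye: the state set (CN 1, 2, 3), the Gaussian emission centred on log2(CN/2), the noise level `sigma` and the `stay_prob` transition parameter are all assumptions made for this example.

```python
import math

STATES = (1, 2, 3)  # hypothetical hidden states: copy numbers 1, 2 and 3


def viterbi_copy_number(log_ratios, sigma=0.25, stay_prob=0.9):
    """Return the most likely CN per probe for log2 ratios ordered along a chromosome."""
    if not log_ratios:
        return []
    if not 0.0 < stay_prob < 1.0 or sigma <= 0:
        raise ValueError("stay_prob must be in (0, 1) and sigma must be positive")
    means = {cn: math.log2(cn / 2) for cn in STATES}  # expected log2 ratio per state
    switch_prob = (1.0 - stay_prob) / (len(STATES) - 1)

    def log_emit(x, cn):
        # Gaussian log density, dropping the constant shared by all states.
        return -((x - means[cn]) ** 2) / (2 * sigma ** 2)

    def log_trans(prev, cur):
        return math.log(stay_prob if prev == cur else switch_prob)

    # Uniform prior over states for the first probe.
    score = {cn: -math.log(len(STATES)) + log_emit(log_ratios[0], cn) for cn in STATES}
    back_pointers = []
    for x in log_ratios[1:]:
        new_score, pointers = {}, {}
        for cn in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_trans(p, cn))
            pointers[cn] = prev
            new_score[cn] = score[prev] + log_trans(prev, cn) + log_emit(x, cn)
        back_pointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda cn: score[cn])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    print(viterbi_copy_number([0.0, 0.05, -0.02, -1.0, -0.95, -1.05, 0.0, 0.1]))
```

The transition term penalizes state changes between neighbouring probes, which is how an HMM combines probe-level evidence with the expectation that CN is constant over stretches of a chromosome.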

Given the variety of computational methods available for CNA detection using SNP arrays, researchers face the problem of choosing the optimal tool for their analyses.

To support bioinformatics researchers and to address their emerging need to choose among different CNA detection algorithms, the CNV Workshop was developed [48]. It represents the first cohesive and convenient platform for detection, annotation, and assessment of the biological and clinical significance of structural variants [48]. The purpose of the platform is to process data from a wide variety of SNP arrays, and to implement different normalization and CN estimation algorithms.

Since one of the main problems in choosing a tool is the detection of discrepancies among different platforms [49], some studies have compared different analysis methods on the same data set. Although limited to a few methods because of the high computational cost, these studies have allowed the advantages and disadvantages of some techniques to be assessed [49–51].

Baross et al. [49] found that CNAG, dChip, CNAT and GLAD are suitable for high-throughput processing of Affymetrix 100 K SNP array data for CN analysis. However, the tools revealed considerable variations in the numbers of putative CNAs. dChip found more CNAs than the other tested tools. The highest rate of false positive candidate deletion calls was produced by CNAG. In general, all tools performed better in detecting single copy deletions than single copy duplications. The authors also recommend the use of a reference data set for accurate analysis, processed in the same laboratory and ideally from samples with an ethnic composition similar to that of the sample set.

Eckel-Passow et al. [50] provided a description of four freely-available software packages (PennCNV, Aroma.Affymetrix, Affymetrix Power Tools (APT), and Corrected Robust Linear Model with Maximum Likelihood Distance (CRLMM)) that are commonly used for CNA analysis of data generated from the Affymetrix Genome-Wide Human SNP Array 6.0 platform. APT obtained the best performance with respect to bias. However, PennCNV and Aroma.Affymetrix had the smallest variability associated with the median locus-level CN.

Zhang et al. [51] assessed four software programs currently used for CNA detection: Birdsuite (version 1.5.2), PennCNV-Affy (a trial version), HelixTree (version 6.4.2), and Partek (version 6.09.0129). They evaluated the accuracy in detecting both rare and common CNVs on the Affymetrix 6.0 platform. They found considerable variations among the programs in the number of CNAs. Birdsuite obtained the highest percentages of known HapMap CNAs containing more than twenty markers in two reference CNA datasets. In the tested rare CNA data, Birdsuite and Partek had higher positive predictive values than the other tools.
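The positive predictive value (PPV) used in such comparisons is the fraction of calls that are confirmed by a reference set, PPV = TP / (TP + FP). The sketch below illustrates the calculation under an assumed matching rule: a call counts as a true positive if it overlaps any reference region on the same chromosome. That rule, the interval representation and the function names are hypothetical; the evaluation criteria of the cited studies are not specified here.

```python
def overlaps(a, b):
    """True if two (chromosome, start, end) intervals share at least one position."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def positive_predictive_value(calls, reference):
    """Fraction of calls that overlap at least one reference region."""
    if not calls:
        raise ValueError("PPV is undefined when there are no calls")
    true_positives = sum(1 for c in calls if any(overlaps(c, r) for r in reference))
    return true_positives / len(calls)


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    calls = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    reference = [("chr1", 150, 300), ("chr2", 1000, 2000)]
    print(positive_predictive_value(calls, reference))
```

A stricter rule, such as requiring a minimum reciprocal overlap, would lower the PPV of tools that report imprecise boundaries.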

Other methods exist for analysing CNA from NGS data, but they are not described in this review. However, most of the more recent algorithms for CNA discovery are modelled on computational methods which were first used to analyse capillary sequencing reads and fully sequenced large-insert clones [39].

Therapeutic approach

A challenging future direction is the discovery of gene CN changes that can guide the development of therapies. For example, duplication of a gene encoding a specific receptor can be associated with a particular pathology; compounds that downregulate receptor expression may therefore benefit patients.

Cancer is the prime case in which CNAs have been shown to drive disease [52], and therapies that target overexpressed or amplified oncogenic drivers have already been considered. In particular, in BC, the gene encoding epidermal growth factor receptor (EGFR) is amplified, and agents such as gefitinib, erlotinib, lapatinib, and cetuximab have been applied to inhibit EGFR with benefits for patients [53, 54].

ERBB2, encoding HER2, is amplified in 30 % of BC [17, 55]. In the therapy of HER2-amplified BC, trastuzumab, an anti-HER2 antibody, has been used [56]. Pertuzumab, a humanized monoclonal antibody, binds HER2 and, like trastuzumab, stimulates antibody-dependent cell-mediated cytotoxicity [57]. Pertuzumab and trastuzumab bind to different HER2 epitopes but act in the same way. When given together, they reinforce each other's antitumor activity [58].

These proven benefits, although limited to a few genes involved in BC, raise the exciting possibility that targeting amplified disease drivers may offer opportunities for therapy development in BC, where effective treatments are still limited.

Epigenetic alterations in BC

DNA methylation and histone modifications

DNA methylation and histone modifications play a crucial role in the maintenance of cellular functions and identity. In particular, the main cellular networks affected by epigenetics are cell cycle, apoptosis, DNA repair, detoxification, inflammation, cell adhesion and invasion.

In cancer, DNA methylation and histone modifications are perturbed, leading to significant changes in GE that give tumor cells advantages in proliferation and in maintenance of the tumoral phenotype. For instance, the inactivation of a tumor suppressor gene (p53, BRCA1,…) or the activation of an oncogene (e.g. Myc) contributes to malignant transformation. Epigenetic changes differ from genetic changes mainly because they occur at a higher frequency, are reversible upon treatment with pharmacological agents, and occur at defined regions in a gene.

DNA methylation refers to the covalent addition of a methyl group (−CH₃) to the base cytosine (C) in the dinucleotide 5′-CpG-3′. CpG islands are found in the promoter region of many genes [59, 60]. Most CpG dinucleotides in the human genome are methylated, and this methylation often leads to silencing of GE. The observations that CpG islands of housekeeping genes are mainly unmethylated and that methylation is associated with loss of GE led to the hypothesis that DNA methylation plays an important role in regulating GE [59, 60].

Figure 2 shows how DNA methylation affects GE. Methyl groups in the recognition elements of transcription factors inhibit the binding of transcription factors to DNA, thus resulting in reduced transcriptional activity.

Histones are the DNA-packaging protein components of chromatin and are able to regulate chromatin dynamics. They are subject to several post-translational modifications, occurring at the amino-terminal end of the histone tail protruding from the surface of the nucleosome [61]. The modifications of histone tails, including lysine acetylation, lysine and arginine methylation, lysine ubiquitylation, phosphorylation, sumoylation, and ribosylation, can significantly affect the expression of genes in a dynamic manner [61]. The most studied histone epigenetic alterations are acetylation/deacetylation and methylation/demethylation. In BC, abnormal histone modification and DNA hypermethylation are frequently associated with epigenetic silencing of tumor suppressor genes and genomic instability [62, 63].