Inference of hierarchical regulatory network of estrogen-dependent breast cancer through ChIP-based data
© Gu et al. 2010
Received: 30 June 2010
Accepted: 17 December 2010
Published: 17 December 2010
Skip to main content
© Gu et al. 2010
Received: 30 June 2010
Accepted: 17 December 2010
Published: 17 December 2010
Global profiling of in vivo protein-DNA interactions using ChIP-based technologies has evolved rapidly in recent years. Although many genome-wide studies have identified thousands of ERα binding sites and have revealed the associated transcription factor (TF) partners, such as AP1, FOXA1 and CEBP, little is known about ERα associated hierarchical transcriptional regulatory networks.
In this study, we applied computational approaches to analyze three public available ChIP-based datasets: ChIP-seq, ChIP-PET and ChIP-chip, and to investigate the hierarchical regulatory network for ERα and ERα partner TFs regulation in estrogen-dependent breast cancer MCF7 cells. 16 common TFs and two common new TF partners (RORA and PITX2) were found among ChIP-seq, ChIP-chip and ChIP-PET datasets. The regulatory networks were constructed by scanning the ChIP-peak region with TF specific position weight matrix (PWM). A permutation test was performed to test the reliability of each connection of the network. We then used DREM software to perform gene ontology function analysis on the common genes. We found that FOS, PITX2, RORA and FOXA1 were involved in the up-regulated genes.
We also conducted the ERα and Pol-II ChIP-seq experiments in tamoxifen resistance MCF7 cells (denoted as MCF7-T in this study) and compared the difference between MCF7 and MCF7-T cells. The result showed very little overlap between these two cells in terms of targeted genes (21.2% of common genes) and targeted TFs (25% of common TFs). The significant dissimilarity may indicate totally different transcriptional regulatory mechanisms between these two cancer cells.
Our study uncovers new estrogen-mediated regulatory networks by mining three ChIP-based data in MCF7 cells and ChIP-seq data in MCF7-T cells. We compared the different ChIP-based technologies as well as different breast cancer cells. Our computational analytical approach may guide biologists to further study the underlying mechanisms in breast cancer cells or other human diseases.
Global level profiling of in vivo protein-DNA interactions using ChIP-based technologies has evolved rapidly in recent years, from hybridization with spotted or tiling microarray (ChIP-chip) [1–4], to SAGE-like tags (ChIP-SAGE)  or pair-end tag sequencing (ChIP-PET) , to current massively parallel sequencing (ChIP-seq) [7–10].
Estrogen-mediated gene regulation is such a challenging question that it may require powerful genome-wide profiling tools like ChIP-based technologies. In breast cancer cells, ERα can mediate genomic transcription regulation with nuclear initiated steroid signalling and non-genomic activation of various protein kinase cascades . In the classical genomic pathway, estrogen receptor binds to estrogen response elements (ERE) at the regulatory region of the target genes and recruits co-activators or co-repressors to modulate gene transcription . The non-classical genomic pathway does not require ERE but mediates transcription by the interactions of ERα with other proteins such as AP1 , NF-kB , SP1 [15, 16] and others. At molecular level, we need to identify those genes targeted and regulated by estrogen receptors, and the more challenging task is to delineate the architectures and the underlying mechanisms of such regulation. Estrogen receptors, once activated, may induce increased or decreased transcription of its numerous targets, which have been investigated by expression arrays [17, 18]. In some recent publications, profiling the distribution of ERα by ChIP-seq ChIP-PET and ChIP-chip indicated a highly complicated regulation network involved with both ERα and other relative transcription factors [17–19]. For instance, only a small portion of ERα binding sites were located in the promoter regions of known genes and many unforeseen binding sites could be far away from the TSS, up to 50-100 kb. It was also found that a large number of transcription factors had binding sites co-enriched with ERα binding sites, which indicated a close collaboration between ERα and other factors. All these findings support the evolving concept of estrogen receptor regulation from the conventional interaction between ERE and ERα to the long-range chromatin loop .
Tamoxifen is one of selective ER modulators (SERMs) and is widely used to block ERα function for breast cancer treatment ; however, this endocrine therapy is limited by the onset of drug resistance. Tamoxifen resistance could be induced through both genomic and non-genomic estrogen pathways mentioned above and understanding estrogen regulation network will offer therapeutic advantages. Our recent expression array study showed that, in breast cancer cells with acquired tamoxifen resistance, different groups of genes were targeted by estrogen treatment compared with the parental cells . Delineating the changed architectures of the ERα regulation network in tamoxifen resistance cells by ChIP-based assays may provide direct and useful information on tamoxifen resistance.
In this study, we collected three public available ChIP-based datasets--ChIP-chip, ChIP-PET and ChIP-seq for ERα binding sites in breast cancer MCF7 cells upon estrogen exposure, which also include RNA polymerase II (Pol-II) binding sites in these cells since the binding of Pol-II could provide direct information of potential transcription activation. We then applied computational approaches to investigate the hierarchical regulatory information for ERα regulation in MCF7 cells. We have been able to construct hierarchical regulatory networks with target hubs and have established the regulatory pathways between TFs and genes. We also have applied ChIP-seq technology to systematically compare the estrogen-mediated regulatory information between MCF7 and MCF7-T cells.
It is well recognized that the transcription is driven by Pol-II and the general transcriptional machinery; therefore, it is likely that the genes (or promoters) bound by ERα may not be transcribed without Pol-II binding. As such, we need to use the whole-genome-wide binding peaks information combining ERα with Pol-II antibodies.
The peak-calling procedure was described in the Methods section. The number of ChIP peaks from ChIP-seq is 12,516 for ERα and 13,261 for Pol-II upon E2-treated compared to the vehicle control, the number from ChIP-PET is 14,703 for ERα and 13,133 for Pol-II, the number from ChIP-chip is 10,409 for ERα and 11,455 for Pol-II (Additional file 1, Table S1).
Our previous studies [20–23] and others [24–28] have tried to establish and characterize the molecular mechanisms of estrogen-dependent breast cancer cells. In order to further elucidate the more detailed underlying mechanisms, we then created a whole-genome-wide localization of in vivo ERα binding peaks in the MCF7. We located these identified ERα and Pol-II binding peaks relative to a known annotated gene from the RefSeq database (UCSC HG18 Assembly). Our results showed that 18% for ChIP-seq, 44% for ChIP-PET, 45% for ChIP-chip of Pol-II binding peaks are located within Promoter regions (defined as 2 kb upstream to 2 kb downstream, including 5' Core, 5'TSS and WithinGene Core regions) of a known transcription start site (5'TSS) (Additional file 1, Table S1 and Additional file 2, Figure S1). We also found that a relatively small number of ERα binding peaks are located in a known 5'TSS region (9% for ChIP-seq, 6% for ChIP-PET, 6% for ChIP-chip). A big portion of ERα binding peaks are located within intra-genic regions (38% for ChIP-seq, 37% for ChIP-PET, 58% for ChIP-chip) as well as gene desert regions (14% for ChIP-seq, 20% for ChIP-PET, 12% for ChIP-chip), 100 kb far away from known 5'TSSs and 3'TSSs (Additional file 2, Figure S1B). While our location analysis confirms the results from previous studies [17–19] where the data were collected, our comparative analysis further showed ERα binding patterns are essentially similar in which the majority of ERα binding peaks are outside of proximal promoter regions regardless of different ChIP technologies. Our analysis also suggests that most ERα associated genes may be regulated by the long-distance interaction between ER-bound distal enhancer and proximal promoter regions. We also found that some portions (18% for ChIP-seq, 17% for ChIP-PET, 11% for ChIP-chip) of Pol-II binding peaks are located in the 3' ends (including 3' Core, 3' Proximal and 3' Distal regions). This finding might imply some alternative transcripts are transcribed in the 3' end and is consistent to the previous report of alternative promoters identified in the 3' end . This could also be simply due to either stalled RNA Pol II that has finished transcription or loops formed with the promoters as has been proposed .
We also found that 164 genes (~60%) are common between ChIP-seq and ChIP-chip (Figure 1A) while 183 (~59.6%) common genes between ChIP-chip and ChIP-PET (Figure 1A). A comparison of ChIP-seq and ChIP-PET has shown that they have 172 (~63%) common genes (Figure 1A).
One possible scenario to explain why the majority of ERα direct targets are not activated or repressed is that they may lack certain binding TF partners serving as helpers for ERα to co-regulate these genes. Thus we applied the de novo motif discovery approach (ChIPMotifs) [34, 35] developed in our previous study to identify the known or novel ERα TF partners. ChIPMotifs is online software designed for searching the most significant motifs in given peaks based on the known factor motifs in TRANSFAC  and JASPAR  databases.
Previous studies have shown ERα regulates its target genes through three major binding models a) direct binding to ERE (estrogen response element); b) indirect binding, through which it binds to other TF partners which bind to DNA; and c) co-occurrent binding, where both ERα and other TF partners bind to their own specific DNA motifs (Additional file 3, Figure S2) [20–22, 40, 41].
In summary, 40 nodes (Additional file 4, Figure S3A) were identified for ChIP-seq data, 50 for ChIP-PET and 37 for ChIP-chip (Additional file 4, Figure S3B, C). A comparison of the three regulatory networks shows that there are 6 common hubs (ER, FOS, RORA, FOXA1, CEBP and PITX2) and 16 (~40%) common targeted TFs (Additional file 5, Table S2) in MCF7. The final combined ERα regulatory network for MCF7 cell from all three ChIP-based datasets was shown in Figure 3.
A summary of binding peaks of Pol-II and ERα in control and E2-treated MCF7-T cells identified by ChIP-seq.
Unique Mapped Reads
Since we used ChIP-seq technology for MCF7-T cells, we compared the number of ERα regulated genes between MCF7-T and MCF7 cells using ChIP-seq technology. To illustrate the difference of these two cell lines, we first compared the overlapped number of genes with ERα binding peak and Pol-II binding peak, respectively. We found that 1075 genes were overlapped with ERα binding peak for both MCF7 and MCF7-T cell lines, and this number is 1508 for Pol-II binding peak. We also found that only 58 (~21.2%) common genes (Figure 5A) between these two breast cancer cells. This low overlapped number demonstrated that ERα targeted different sets of genes in these two cancer cells.
We next applied the same computational analytical approach to examine the ERα regulated network in MCF7-T cells. We started with training the top 2000 ERα binding peaks with high scores (enrichments) by the ChIPMotifs, a total of 7 TFs were identified, including ERE, PAX6, PITX2, RORA, XBP1, PPARG and PPARA (Figure 5B). Of them, ERE, PAX6, PITX2 and RORA are the same as the ones we identified in ChIP-seq data of MCF7 cells.
The ERα associated transcriptional regulatory network in MCF7-T cells was then constructed and visualized as shown in Figure 5C, including 32 nodes. A comparison of the networks between MCF7-T and MCF7 cells (using ChIP-seq dataset) showed that there are 2 common Hub TFs (RORA and PITX2) and 8 (about 25%) common targeted TFs. However, in MCF7 cells we found that there are 6 common Hub TFs and 16 (about 40%) common targeted TFs among different ChIP technologies. Taken together, our results strongly suggested that E2 induces a different ERα associated regulatory mechanism in MCF7-T cells compared to MCF7 cells, in other words, a rewired ERα regulation network in tamoxifen resistance cells.
In this study, we applied computational approaches to analyze and integrate three ChIP-based datasets and one time-series gene expression data to investigate the dynamic regulatory information for ERα in estrogen-dependent breast cancer MCF7 cells. Our studies not only compared the results of three ChIP-based protocols, but also have inferred the regulatory network and pathway for E2 induced MCF7 cells. Moreover, we used the same approach to compare the difference between MCF7 and MCF7-T cells.
Our comprehensive analysis of the estrogen-mediated regulatory network in MCF7 indicated that the three ChIP-based technologies have similar peak distribution patterns for the same antibody. However, for different antibody (ERα and Pol-II), the binding preference is totally different. Pol-II tends to binding to the promoter region, while ERα has no specific preference.
Our analysis (Figure 1A) also showed that there were more common regulated genes (in percentage) between ChIP-seq data and ChIP-PET data than ChIP-chip data. This may be due to a similar sequencing-based technology used for ChIP-seq and ChIP-PET, but an array-based technology used for ChIP-chip data. The regulatory network analysis indicates some common regulatory Hub TFs were formed in response to estrogen signaling and may lead to the same regulatory paths in MCF7. We also found that there are ~40% common TFs regulated by ERα among three different ChIP-based technologies.
We also compared the regulatory network in MCF7 constructed by our approach with another method--ReMoDiscovery . We used 0.9 (for Motif threshold), and 0.5 (for expression correlation threshold) as the input threshold. Total 26 (of 42) nodes were overlapped between our method (Additional file 4, Figure S3A) and ReMoDiscovery (Additional file 9, Table S5).
In Figure 4, all the genes were with both ERα and Pol-II peak binding, and differentially expression between 12 hours of E2 treat and control. In Figure 4A, the genes were clustered based on the expression values. Each thin line meant the gene expression data (relative value compared with the first time point) at different time point. The genes were classified into 3 groups (the second time point), 4 groups (the third time point) and 5 groups (the fourth time point), respectively. The thick lines corresponded to the gene expression trend of each group. In Figure 4B, the important Hub TFs (Hub TFs with statistically significant number of motifs in the ERα binding peaks of groups of genes) were identified for each group at every time point (4 Hub TFs for time point 2, group 1; 1 for time point 2, group 2 and 3; 3 for time point 4, group 2). The 5 groups of genes were also showed significantly different gene functions. We listed the functions and p -values. Figure 4C (a) showed both up and down regulated genes, (b) corresponded to down regulated genes, and (c) corresponded to up regulated genes. A TF-gene relationship table is prepared as the input of DREM (Additional file 8, Table S4). In this table, 1 represented motif of Hub TF existed in the peak region of a gene.
The hierarchical regulatory network analysis with DREM and Cytoscape revealed that RORA could be a potential ERα partner, which is consistent to other reports  showing that RORA interacts with ERα and enhances ER transcriptional activity in breast cancer. This finding also indicates that RORA was required for a subset of E2-mediated up-regulated genes associated with functions of Lyase activity. Our results may provide the guidance for further investigation on the role of this co-regulation pathway played in breast cancer.
In MCF7-T cells, we found very small proportion (21.2%) of differential expressed genes overlapped with MCF7 cells (ChIP-seq dataset) with both ERα and Pol-II binding peaks. In addition, the number of common TFs of the network between MCF7-T and MCF7 cells is only 25%. Moreover, we also found two TFs with opposite expression trend between MCF7 and MCF7-T cells, in which MSX2 is up-regulated in MCF7-T cells and down-regulated in MCF7 cells, whereas ESR1 (ERα) is down-regulated in MCF7-T cells. We also identified 3 new Hub TFs in MCF7-T cells which were not found in MCF7 cells. This indicates that ERα may be no longer the most important transcription factor, other transcriptional regulators and signalling pathways may play important role in tamoxifen resistant MCF7 cells. Our comparative analysis also suggests that the ERα associated regulatory network in MCF7-T cells is rewired upon E2 induced.
We have performed additional experiments using both RT-qPCR and ChIP-qPCR to validate randomly select eight ER regulated binding loci in MCF7-T cell, and the result demonstrated that the predicted ER regulated binding loci with differential gene expression after E2 treatment were validated as shown in Additional file 10, Figure S5.
We further compared the E2 induced gene functions to specify the difference between MCF7 and MCF7-T cell line. We selected the genes that were differentially expressed and with ERα and PolII binding site. These genes were performed by GO function by the DAVID program  and the top 10 (the close functions were removed) categories were selected based on the p -value. The result was shown in Additional file 11, Figure S6 and Additional file 12, Table S6. 4 of the 10 functions were the same between those two groups of genes.
In summary, we analyzed hierarchical regulatory networks for estrogen-dependent regulation in MCF7 and MCF7-T cells. We systematically compared different ChIP-based technologies as well as different breast cancer cells. Our results revealed extended hierarchal regulatory networks with new target hubs in breast cancer cells. Our computational analytical approach may also provide a framework for dissecting transcriptional regulatory networks in response to breast cancer and other human diseases.
The time-series E2-induced gene expression data in MCF7 cells for all three ChIP-based datasets was obtained from Carroll et al. 2006 . The data was with E2 treatment for 4 time points: 0 hr, 3 hr, 6 hr and 12 hr. We considered those genes were up-regulated if the gene expression values of time point 12 hr are bigger than time point 0, and verse versa.
ChIP-seq of E2-induced MCF7-T cells were maintained in a hormone-free medium (phenol red-free MEM with 2 mmol/L L-glutamine, 0.1 mmol/L nonessential amino acids, 50 units/mL penicillin, 50 Ag/mL streptomycin, 6 ng/mL insulin, and 10% charcoal-stripped FBS) supplemented with 10-7 mol/L 4-hydroxytamoxifen. Prior to all experiments, MCF7-T cells were cultured in hormone-free medium for 1 week to deplete any residual OHT, and then MCF7-T cells were cultured in hormone-free basal medium (phenol-red free MEM with 2 mM L-glutamine, 0.1 mM non-essential amino acids, 50 units/ml penicillin, 50 μg/ml streptomycin, and 3% charcoal-dextran stripped FBS) for three days. MCF7-T cells were treated with E2 (108 mol/L) for 3 hours.
5 × 107 cells were crosslinked with 1% formaldehyde for 10 min, at which point 0.125 M glycine was used to stop the crosslinking. In brief, after crosslinking, cells were treated by lysis buffers and sonicated to fragment the chromatin to a size range of 200 bp-1 kb. Chromatin fragments were then immunoprecipitated with 10 ug of antibody/magnetic beads. The antibodies against Pol-II and ERα were purchased from Santa Cruz Biotechnology (Santa Cruz, sc-899 × and sc-8005 X). After immunoprecipitation, washing, and elution, ChIP DNA was purified by phenol:chloroform:isoamyl alcohol and solubilized in 70 μl of water. Then Illumina library was constructed and sequenced with Illumina/Solexa Genome Analyzer (Canada's Michael Smith Genome Sciences Centre, Vancouver, CA). Briefly, the ChIP DNA sample was run in 12% PAGE and the 100-300 bp DNA fraction was excised and eluted from the gel slice overnight at 4°C in 300 μl of elution buffer (5:1, LoTE buffer (3 mM Tris-HCl, pH 7.5, 0.2 mM EDTA)-7.5 M ammonium acetate) and was purified using a QIAquick purification kit (Qiagen, Cat#28104). The library was constructed using Illumina genomic DNA prep kit by following its protocol (Illunima, cat# FC-102-1002), clusters were generated on the Illumina cluster station (Illumina, cat# FC-103-1002), and the sequence was run on Illumina 1G Analyzer following the manufacturer's instructions (Illumina, cat# FC-104-1003). The unique mapped ChIP-seq datasets (bed format, total four files: control using ER antibody, E2 treated using ER antibody, control using Pol-II antibody and E2 treated using Pol-II antibody) for MCF7-T cells are available from GEO database (accession number: GSE26083) or from our webpage http://motif.bmi.ohio-state.edu/ERNetwork/.
For E2-induced gene expression data in MCF7-T cells, Affymetrix U133 Plus2.0 array platform consisting of ~55,000 transcripts was used for measuring E2-induced gene expression in MCF7-T cells. Three replicates from different biological samples were performed and normalized using MASS5 in the statistic package R http://www.R-project.org (Additional file 13, Table S7). A p-value less than 0.05 using a Welch's t-test and 2 fold changes for each gene were used as cutoff thresholds to determine a set of differential expressed genes. Finally, we mapped the genes with Human TFs and get 1,667 genes remains.
After 3 hr of E2 treatment, control and treated MCF7-T cells were subjected to total RNA extraction by Trizol reagent (Invitrogen). Total RNA (2 μg) was reverse transcribed to cDNA with oligo-dT (SuperScript III; Invitrogen). Quantitative RT-PCR was performed by using SYBR Green dye chemistry (Applied Biosystems) on a 7500 Real-time PCR system apparatus (Applied Biosystems). Gene Expression was measured by the ΔΔCt method using GAPDH as the internal control. Statistical analyses were carried out by using a two-tailed t test. Specific primers for amplification are available on request.
To confirm candidate ERα target genes determined by ChIP-seq, PCR primers targeting a region within 200 bp of the predicted ERE were used to measure the enrichment of this sequence in anti-ERa-immunoprecipitated samples by quantitative PCR with SYBR Green-based detection method (Applied Biosystems). Quantitative values measured by a standard curve (50 to 0.08 ng, 5-fold dilution, R2 > 0.99) of input DNA amplified with the same primer set. Results are presented as the mean of triplicates with standard derivation.
A standard procedure for extracting image files, mapping the reads onto human genome, and filtering the mapped reads to unique reads was followed with the Solexa 1.6 pipeline . Only uniquely mapped reads with a length of 36 bp were then used further for determining the binding regions by our BELT program . For all datasets, we narrowed the peaks with 500 bp in length. All the common peak number in Figures 1 and 5A were overlapped according to the gene ID of each file. We used the fisher exact test  to calculate the p -value of each overlapped number, and assumed that the human genome number is 30,000 (Additional file 14, Table S8).
The following steps were used to calculate the position weight matrix (PWM).
Where c is one of base type (A/T/C/G), Si,j is the nucleotide at row i, column j.
Where p(i) is the background frequency of nucleotide i. For Human genome, p(i = A) = p(T) = 30% and p(i = C) = p(G) = 20%, approximately.
To construct the regulatory network, the PWM was used to scan the peak region of each gene. To make the result more reliable, we use stringent threshold (1 for core score, 0.95 for PWM score) to determine the underlying TF binding site. The PWM score and core score is calculated as follows:
Where w (S i , j) is the score at row i which the nucleotide is the same as the given sequence at column j.
If there is Motif for ERα and Hub TFs for certain gene, a connection between the Hub TF (include ERα) and the gene was made. We further facilitated these relationships by visualizing an estrogen regulatory network with Cytoscape  software platform (see Results section).
To test the significance of the network proposed in this study, a statistical strategy (permutation) was used to determine the probability of each edge of the network under random circumstances. Since the TF binding site region is composed of specific sequences, and only by scanning the sequence region using PWM, we get the network edges. Hence, we shuffled the sequence of each peak region for 1000 times, and to see how many times a specific TF binding site is hit by the scanning process. The ratio (times hit by scanning divide by 1000) of each edge was calculated. The number with low value was considered high statistical significance (we used 0.2 as a cutoff to include as more connection as possible while keeping the relationship reliable).
The up/down-regulated genes were grouped into several classes (sub up/down-regulated classes). For each group, the common gene functions were identified from GO database with p-value represented for the significance. Then, we used the PWMs of the 6 TFs (ERα, RORA, FOS, FOXA1 PITX2 and CEBP) to identify all possible binding sites in all genes with peaks. Thus we established the relationship between Hub TFs and grouped genes. A DREM diverging score less than 0.1 was used as a significant score threshold for TF-grouped gene relationship.
This study was supported by grants from NIH U54CA113001 and R01CA069065 and by funds from the Ohio State University Comprehensive Cancer Center and The Ohio State University Biomedical Informatics.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.