Transcription profiling of lung adenocarcinomas of c-myc-transgenic mice: Identification of the c-myc regulatory gene network

Background The transcriptional regulator c-Myc is the most frequently deregulated oncogene in human tumors. Targeted overexpression of this gene in mice results in distinct types of lung adenocarcinomas. Using microarray technology, we captured gene expression changes in a female transgenic mouse model in which c-Myc overexpression in the alveolar epithelium results in the development of bronchiolo-alveolar carcinoma (BAC) and papillary adenocarcinoma (PLAC). In this study, we analyzed exclusively the promoters of induced genes by different in silico methods in order to elucidate the c-Myc transcriptional regulatory network. Results We analyzed the promoters of 361 transcriptionally induced genes for c-Myc binding sites and found 110 putative binding sites in 94 promoters. We then analyzed the flanking sequences (+/- 100 bp) around these 110 c-Myc binding sites and found Ap2, Zf5, Zic3, and E2f binding sites to be overrepresented in these regions. Next, we analyzed the promoters of the 361 induced genes for binding sites of other transcription factors (TFs) that were themselves upregulated by c-Myc overexpression. We identified at least one binding site of at least one of these TFs in 220 promoters, thus elucidating a potential transcription factor network. The TFs in question were the significantly overexpressed Atf2, Foxf1a, Smad4, Sox4, Sp3, and Stat5a. Finally, we analyzed promoters of regulated genes that were apparently not regulated by c-Myc or by other c-Myc-targeted TFs and identified overrepresented Oct1, Mzf1, Ppargamma, Plzf, Ets, and HmgIY binding sites when compared against the control promoter background. Conclusion Our in silico data suggest a model of a transcriptional regulatory network in which different TFs act in concert upon c-Myc overexpression.
We determined molecular rules for transcriptional regulation to explain, in part, the carcinogenic effect seen in mice overexpressing the c-Myc oncogene.


Background
The proto-oncogene c-Myc is highly expressed in many cancer types [1][2][3] and plays a critical role in regulating cell growth, proliferation, loss of differentiation, and apoptosis [4]. In transgenic mice, targeted overexpression of Myc has been shown to be sufficient to induce cancer [5][6][7]. In our department, a transgenic mouse model overexpressing c-Myc was created. c-Myc overexpression in the alveolar epithelium of these mice results in the development of bronchiolo-alveolar carcinoma (BAC) and papillary adenocarcinoma (PLAC). The life expectancy of c-Myc transgenic mice ranges from 12 to 14 months.
The molecular mechanisms by which c-Myc functions to effect tumorigenesis have been the subject of extensive research over the past several decades. c-Myc is a transcription factor, a basic helix-loop-helix leucine zipper protein that dimerizes with Max to bind the DNA sequence 5'-CACGTG-3', known as an E box, and activates transcription [8]. Myc also represses transcription through interaction with Miz-1 or through other elements at core promoters [9]. Furthermore, Brenner et al. [10] suggested that c-Myc may also repress transcription by recruiting the DNA methyltransferase corepressor Dnmt3a. DNA methylation is the most important epigenetic modification in mammalian cells and is associated with transcriptional repression. Nevertheless, transcriptional repression by c-Myc does not seem to involve direct binding of c-Myc to E boxes, and its mechanisms are still poorly understood.
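Because 5'-CACGTG-3' is its own reverse complement, an exact scan of one strand already finds every canonical E box on both strands. The minimal Python sketch below illustrates this; the function names and the example sequence are hypothetical, and exact string matching is a much cruder criterion than the matrix-based searches used later in this study.

```python
# Minimal sketch: exact search for the canonical E box (5'-CACGTG-3').
# Function names and the example sequence are hypothetical illustrations.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) exact matches."""
    seq = seq.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


E_BOX = "CACGTG"
# The E box is palindromic, so scanning the reverse strand would add no new sites.
assert reverse_complement(E_BOX) == E_BOX

promoter = "ttgaCACGTGagcgcaccacgtgcc"
print(find_motif(promoter, E_BOX))  # prints [4, 17]
```

For a non-palindromic motif, the same function would have to be run on `reverse_complement(promoter)` as well.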
The pleiotropic effects of c-Myc on tumorigenesis are thought to be mediated by its target genes, because transcriptionally defective Myc alleles have diminished transforming potential [11]. Furthermore, the domain required for c-Myc DNA binding, the basic helix-loop-helix leucine zipper domain, is essential for oncogenic transformation, and c-Myc possesses an N-terminal transactivation domain. Deletions or mutations in this transactivation domain result in loss of c-Myc transforming activity [12]. The transcriptional activation potential of c-Myc, however, does not always correlate with its ability to transform rodent fibroblasts [13]. Several studies showed that mutations in the Myc box II domain of c-Myc can abrogate its transforming capacity without affecting c-Myc activation of reporter gene constructs [14,15]. These results emphasize the complex and interrelated nature of c-Myc-mediated transformation and highlight the need to identify specific factors that interact with functionally important domains of the c-Myc oncoprotein.
Despite extensive research, the specific mechanisms by which tumorigenesis is achieved are not well understood. This is largely because a comprehensive list of biologically relevant Myc target genes has not yet been defined, and such "transformation-associated" genes remain elusive [16]. To identify Myc targets, many different techniques have been employed, such as microarray profiling, serial analysis of gene expression, and chromatin immunoprecipitation [17][18][19][20][21][22][23][24][25]. To date, more than 1,600 genes have been found to be Myc-responsive and are stored in the Myc target gene database [26,27], but only a minority of these have been implicated as direct target genes. c-Myc seems to induce a regulatory gene network consisting of direct and indirect responses. The direct responses also seem to depend on other transcription factors, which either cooperate or compete with c-Myc. Some of these transcription factors have already been described in the literature [28,29].
In this study, we report a genetic and bioinformatic approach designed to identify regulatory gene networks induced by overexpression of c-Myc in the alveolar epithelium of our female transgenic mouse model, which develops bronchiolo-alveolar carcinoma (BAC) and papillary adenocarcinoma (PLAC). Because transcriptional repression by c-Myc does not appear to involve direct binding of c-Myc to E boxes, we restricted our analysis to the promoter sequences of induced genes, in which potential c-Myc binding sites can be identified in silico. In this way, we identified potential direct target genes of c-Myc and propose different transcription factors that either cooperate or compete with c-Myc. Furthermore, we describe in silico different indirect mechanisms that may contribute to the tumorigenic phenotype of Myc. Taken together, we suggest a model of a regulatory gene network in which different TFs act in concert upon overexpression of c-Myc in our transgenic mouse model.

Analysis of high-density oligonucleotide microarray data
Global gene expression studies were done with lung tissue from a female transgenic mouse line overexpressing the c-Myc proto-oncogene. The complete data have been deposited in NCBI's Gene Expression Omnibus (GEO) [30] and are accessible through GEO Series accession number GSE10954. We investigated the quantitative changes in significantly altered genes; the definition of "significantly altered" is given in the Methods section. According to these criteria, transcription of 469 genes was induced and transcription of 8 genes was repressed in 5-month-old animals (data shown in Additional file 1). At this time point, the tumor burden was approximately 50%. Note that gene expression profiling by microarrays does not provide information about rates of transcription; it measures mRNA abundance, which may also be modified by processes such as reduced RNA degradation.

Validation of microarray data by real time PCR
For the validation of microarray data, five genes were selected: Met (met proto-oncogene), Myct1 (myc target 1), Myc (myelocytomatosis oncogene), Pnliprp1 (pancreatic lipase related protein 1), and Pbk (PDZ binding kinase). Expression of these genes was also measured by real time quantitative PCR using the LightCycler®. Fold changes determined by microarray analysis and by real time PCR are compared in Figure 1. Statistically significant changes in the microarray analysis are indicated by a black diamond; the criteria for significance are described in the Methods section. Statistically significant changes in real time PCR are marked with an asterisk, based on a paired two-tailed Student's t-test; results were considered significant when the p-value was less than 0.05. As shown in Figure 1, the two platforms were in basic agreement. The fold changes of Met, however, differ strongly between microarray analysis and real time PCR. This phenomenon is sometimes observed when microarray data are validated by real time PCR: microarray analysis shows strong upregulation, whereas PCR indicates a very low fold change of 1.5 or less. Here, the reason might be the low average intensity value of 40.01 for Met in the microarray experiments of nontransgenic animals, combined with its high standard deviation of 67.12%. By comparison, the average standard deviation of all significantly regulated genes in this study amounts to 23.81%. Because Met showed a high and stable average intensity value of 480, with a low standard deviation of 16.83%, in the microarray experiments of c-Myc-transgenic animals, the corresponding fold change may appear higher than it actually is. Furthermore, we did not compare gene expression on identical sequences. Hence, we cannot exclude the possibility that measured transcript expression differs because of the different sequences (primers and amplification products) employed.
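The asterisks in Figure 1 rest on a paired two-tailed Student's t-test at p < 0.05. The Python sketch below shows how such a test can be computed with the standard library alone, using the closed-form series for the Student t distribution with integer degrees of freedom. This section does not say which values were paired, so the example vectors are invented; `scipy.stats.ttest_rel(a, b)` is the usual library route and should give the same two-sided result.

```python
# Minimal stdlib sketch of a paired, two-tailed Student's t-test.
# The example values below are invented; they are not data from this study.
import math
import statistics


def t_two_tailed_p(t: float, df: int) -> float:
    """Two-tailed p-value of Student's t with integer df (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    c2 = math.cos(theta) ** 2
    if df == 1:
        a = 2.0 * theta / math.pi  # df = 1 is the Cauchy distribution
    elif df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * c2
            total += term
        a = math.sin(theta) * total
    else:
        term, total = 1.0, 1.0
        for k in range(1, (df - 1) // 2):
            term *= (2 * k) / (2 * k + 1) * c2
            total += term
        a = 2.0 / math.pi * (theta + math.sin(theta) * math.cos(theta) * total)
    return 1.0 - a  # a = P(|T| < |t|), so 1 - a is the two-tailed p-value


def paired_t_test(x: list[float], y: list[float]) -> tuple[float, float]:
    """Paired two-tailed Student's t-test: returns (t statistic, p-value)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, t_two_tailed_p(t, n - 1)


# Invented paired values, e.g. four matched measurements under two conditions.
t, p = paired_t_test([3.0, 4.0, 5.0, 6.0], [2.0, 2.0, 2.0, 2.0])
print(f"t = {t:.3f}, p = {p:.4f}, significant: {p < 0.05}")
# prints: t = 3.873, p = 0.0305, significant: True
```

If all paired differences are identical, the standard deviation is zero and the t statistic is undefined; the sketch then raises `ZeroDivisionError` rather than returning a misleading value.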

Promoter sequence analysis of genes induced by overexpression of c-Myc
A flowchart of our in silico strategy used to elucidate the c-Myc regulatory network is depicted in Figure 2.

1) Analysis of promoters of 361 induced genes with respect to c-Myc binding sites
Using their RefSeq annotation, 361 promoter sequences could be extracted from the UCSC Genome Browser for the 469 upregulated genes. In addition, promoters of 100 genes that were not regulated at all were extracted as a control set (the list of non-regulated genes was prepared by applying the criteria described in the Methods section and is included in Additional file 2). Both sequence sets were analyzed using the MATCH® tool integrated in TRANSFAC® Professional rel. 8.3 with the matrices V$EBOX_Q6_01 (cut-off core similarity: 1.00, matrix similarity: 0.99), V$MYC_Q2 (cut-off core similarity: 1.00, matrix similarity: 0.99), and V$MYCMAX_B (cut-off core similarity: 0.75, matrix similarity: 0.96). The results of these analyses, including positions and sequences of the corresponding binding sites, are given in Additional file 3. Altogether, 110 different c-Myc binding sites were found in 94 different promoters; some of these sites were recognized by more than one matrix. Table 1 lists the 94 genes that are putatively directly regulated by c-Myc and the biological processes they are involved in; the 15 targets already stored in the Myc Target Database are marked in bold. Moreover, the number of c-Myc binding sites identified in the promoters of induced genes was compared with the number identified in the control promoter set.

Figure 1 Validation of microarray data by real time PCR. Comparison of gene expression of selected genes determined by microarray analysis (black bars) and real time PCR (grey bars; LC: LightCycler®). Fold changes are shown on the y-axis. Significant changes of gene expression are indicated either with a diamond for array analysis or with an asterisk for real time PCR.
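The two cut-offs quoted for each matrix refer to MATCH's core similarity and matrix similarity scores. Following the scoring scheme described for MATCH, as I understand it, each score is normalized as (Current - Min)/(Max - Min), where Current sums I(i)·f(i, b_i) over matrix positions, the information vector I(i) = Σ_B f(i,B)·ln(4·f(i,B)) weights each position by its conservation, and the core is the five consecutive most conserved positions. The sketch below is a reconstruction under those assumptions, not TRANSFAC code: the count matrix `TOY_EBOX` is invented and is not V$MYC_Q2 or any other TRANSFAC matrix, the helper names are hypothetical, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def to_freqs(counts):
    """Convert per-position base counts into relative frequencies."""
    return [{b: row[b] / sum(row.values()) for b in BASES} for row in counts]


def information_vector(freqs):
    """I(i) = sum over bases of f(i,B) * ln(4 f(i,B)); bases with f = 0 contribute 0."""
    return [sum(f[b] * math.log(4 * f[b]) for b in BASES if f[b] > 0) for f in freqs]


def similarity(window, freqs, info, positions):
    """(Current - Min) / (Max - Min) over the given matrix positions."""
    positions = list(positions)
    current = sum(info[i] * freqs[i][window[i]] for i in positions)
    low = sum(info[i] * min(freqs[i].values()) for i in positions)
    high = sum(info[i] * max(freqs[i].values()) for i in positions)
    return (current - low) / (high - low)


def core_positions(info, core_len=5):
    """The core_len consecutive positions with the highest summed information."""
    best = max(range(len(info) - core_len + 1), key=lambda s: sum(info[s:s + core_len]))
    return list(range(best, best + core_len))


def scan(sequence, counts, core_cut, matrix_cut, core_len=5):
    """Forward-strand scan; keep windows passing both the core and matrix cut-offs."""
    freqs = to_freqs(counts)
    info = information_vector(freqs)
    core = core_positions(info, core_len)
    width = len(freqs)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        css = similarity(window, freqs, info, core)
        if css < core_cut:
            continue
        mss = similarity(window, freqs, info, range(width))
        if mss >= matrix_cut:
            hits.append((start, window, round(css, 3), round(mss, 3)))
    return hits


# Hypothetical 6-position count matrix for an E-box-like motif (not a TRANSFAC matrix).
TOY_EBOX = [
    {"A": 1, "C": 8, "G": 1, "T": 0},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
    {"A": 2, "C": 2, "G": 5, "T": 1},
]
print(scan("TTGACACGTGAGC", TOY_EBOX, core_cut=1.00, matrix_cut=0.99))
# prints: [(4, 'CACGTG', 1.0, 1.0)]
```

A core cut-off of 1.00, as used for V$EBOX_Q6_01 and V$MYC_Q2, means that every core position must carry that position's most frequent base; the looser 0.75/0.96 pair used for V$MYCMAX_B admits more variation.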

2) Analysis of flanking sequences (+/-100 bp) around the 110 c-Myc binding sites
For this analysis, we extracted the 110 c-Myc binding sites together with their flanking sequences (100 bp on each side of the 5 bp core sequence, i.e., 205 bp in total). We also randomly extracted the same number of 205 bp sequences from the control promoters, which were not regulated at all (the list of 205 bp sequences of non-regulated genes is included in Additional file 4). Both sequence sets were analyzed using the MATCH® tool integrated in TRANSFAC® Professional rel. 8.3 with the matrix profile "vertebrates_minSUM_highQual". An extract of the results, including the numbers of transcription factor binding sites in the corresponding sequence sets, the resulting fold occurrence of a given TF, and the significance (p-value) of the occurrence values, is listed in Table 3. The complete result of this analysis is given in Additional file 5. According to Table 3, binding sites of AP2, E2F, ZF5, and ZIC3 are overrepresented in the sequences flanking the c-Myc binding sites. This might mean that these TFs bind in the immediate neighborhood of c-Myc in order to either cooperate or compete with it. The distribution of these TFs around the c-Myc binding sites is shown in Figure 3. The diagrams show that AP2 and ZIC3 rarely or never bind at the same site as c-Myc, whereas E2F and ZF5 in some cases seem to bind at the same site as c-Myc.
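The window arithmetic above is 100 + 5 + 100 = 205 bp. The Python sketch below extracts such windows, draws random control windows of the same width, and computes the offsets of other TF sites relative to a c-Myc core, the kind of quantity plotted in a distribution such as Figure 3 (an offset of 0 means the site starts where the c-Myc core starts). The function names are hypothetical, and clipping at promoter ends and drawing control windows uniformly with a fixed seed are assumptions; the text does not say how either was handled.

```python
import random

FLANK = 100   # bp kept on each side of the core
CORE_LEN = 5  # length of the matrix core, as stated in the text


def flank_window(promoter: str, core_start: int, flank: int = FLANK, core_len: int = CORE_LEN) -> str:
    """Core plus `flank` bp on each side (205 bp by default), clipped at the promoter ends."""
    left = max(0, core_start - flank)
    right = min(len(promoter), core_start + core_len + flank)
    return promoter[left:right]


def random_control_windows(controls: list[str], n: int, width: int = 2 * FLANK + CORE_LEN, seed: int = 0) -> list[str]:
    """Draw n random windows of `width` bp from control promoters long enough to hold one."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    eligible = [s for s in controls if len(s) >= width]
    windows = []
    for _ in range(n):
        seq = rng.choice(eligible)
        start = rng.randrange(len(seq) - width + 1)
        windows.append(seq[start:start + width])
    return windows


def relative_offsets(core_start: int, site_starts: list[int], flank: int = FLANK) -> list[int]:
    """Start positions of other TF sites relative to the c-Myc core, within +/- flank bp."""
    return [s - core_start for s in site_starts if abs(s - core_start) <= flank]


promoter = "A" * 300 + "CACGT" + "G" * 300  # hypothetical promoter with a core at position 300
window = flank_window(promoter, 300)
print(len(window), window[100:105])  # prints: 205 CACGT
print(relative_offsets(300, [150, 290, 300, 420]))  # prints: [-10, 0]
```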

Table: Number of c-Myc binding sites found in all promoter sequences, fold occurrence of binding sites, and significance (p-value). The occurrence value of c-Myc binding sites in the promoters of control genes is set to 1. Significance of the representation value of c-Myc binding sites in the promoters of induced genes is measured by the p-value derived from the binomial distribution.
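The captions state only that significance is "the p-value derived from the binomial distribution"; the exact null model used by the TRANSFAC tools is not described here. One common choice, sketched below in Python as an assumption, conditions on the total number of hits n in both sets and asks how likely it is that at least k of them fall into the test set if each hit lands there with probability q proportional to the set's size (number of sequences or total length). Fold occurrence is the test-set hit rate divided by the control hit rate, so the control value is 1 by construction. The function names are hypothetical and the counts are invented.

```python
import math


def fold_occurrence(hits_test: int, size_test: float, hits_ctrl: int, size_ctrl: float) -> float:
    """Hit rate in the test set relative to the control set (control rate = 1)."""
    return (hits_test / size_test) / (hits_ctrl / size_ctrl)


def binomial_upper_tail(k: int, n: int, q: float) -> float:
    """P(X >= k) for X ~ Binomial(n, q)."""
    return sum(math.comb(n, x) * q**x * (1 - q) ** (n - x) for x in range(k, n + 1))


def enrichment_p_value(hits_test: int, size_test: float, hits_ctrl: int, size_ctrl: float) -> float:
    """One-sided p-value for over-representation in the test set.

    Conditional on n = hits_test + hits_ctrl, each hit is assumed to fall into
    the test set with probability proportional to the set's size.
    """
    n = hits_test + hits_ctrl
    q = size_test / (size_test + size_ctrl)
    return binomial_upper_tail(hits_test, n, q)


# Invented counts: 10 hits in 100 test sequences vs 5 hits in 100 control sequences.
print(fold_occurrence(10, 100, 5, 100))               # prints: 2.0
print(round(enrichment_p_value(10, 100, 5, 100), 4))  # prints: 0.1509
```

The example shows why a 2-fold enrichment alone need not be significant: with only 15 hits in total, seeing 10 or more in one of two equal-sized sets happens by chance about 15% of the time. The fold occurrence is undefined when the control set has no hits.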

3) Analysis of promoters of 361 induced genes with respect to binding sites of TFs which were transcriptionally induced by overexpression of c-Myc
According to Gene Ontology annotations, 36 of the 477 deregulated genes possess transcription factor activity or transcription regulator activity (Additional file 6). In the database TRANSFAC® Professional rel. 8.3, however, positional weight matrices are available for only 6 of the transcription factors upregulated by overexpression of c-Myc: Atf2, Foxf1a, Smad4, Sox4, Sp3, and Stat5a.
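Conceptually, this step intersects two lists (induced TF genes and TFs with an available matrix) and then records, per promoter, which of the matrix-backed TFs have at least one predicted site. The Python sketch below assumes a hypothetical input format, a dict mapping each promoter to the TF names reported for it (for instance, after parsing MATCH output); the promoter names and example hits are invented, and only the six TF names come from the text.

```python
def tfs_with_matrices(induced_tf_genes: set[str], matrix_tfs: set[str]) -> set[str]:
    """Induced TF genes for which a binding-site matrix is available."""
    return induced_tf_genes & matrix_tfs


def promoters_with_sites(hits_by_promoter: dict[str, list[str]], tfs: set[str]) -> dict[str, set[str]]:
    """Promoters carrying at least one predicted site for any TF in `tfs`."""
    result = {}
    for promoter, found in hits_by_promoter.items():
        matched = tfs & set(found)
        if matched:
            result[promoter] = matched
    return result


def promoters_per_tf(hits: dict[str, set[str]]) -> dict[str, int]:
    """Number of promoters with at least one predicted site, per TF."""
    counts: dict[str, int] = {}
    for tfs in hits.values():
        for tf in tfs:
            counts[tf] = counts.get(tf, 0) + 1
    return counts


MATRIX_TFS = {"Atf2", "Foxf1a", "Smad4", "Sox4", "Sp3", "Stat5a"}
example_hits = {  # hypothetical promoter -> TF sites predicted in it
    "geneA": ["Atf2", "Sp3", "Sp3"],
    "geneB": ["Oct1"],
    "geneC": ["Sp3"],
}
hits = promoters_with_sites(example_hits, MATRIX_TFS)
print(len(hits), promoters_per_tf(hits))
```

Counting promoters rather than sites matches how the text reports these results (for example, "at least one binding site ... in 220 promoters").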

4) Analysis of promoters of induced genes lacking both c-Myc binding sites and binding sites of TFs induced by c-Myc, with respect to binding sites of other TFs
The 96 promoters of genes induced by overexpression of c-Myc that possess neither a putative c-Myc binding site nor a binding site of a transcription factor transcriptionally induced by c-Myc were analyzed using the MATCH® tool integrated in TRANSFAC® Professional rel. 8.3 with the matrix profile "vertebrates_minSUM_highQual". We performed the same analysis on the control promoters, which were not regulated at all (Additional file 2). An extract of the results, including the numbers of transcription factor binding sites in the corresponding promoter sets and the resulting fold occurrence of a given TF, is listed in Table 4. According to this table, the different matrices for the transcription factors Oct1, Mzf1, Pparg, Plzf, Ets, and HmgIY yield more than 30 hits, and these binding sites are overrepresented in comparison with the control sequences. We found 36 putative Oct1 binding sites in 27 promoters, 37 putative Mzf1 binding sites in 24 promoters, 131 putative Pparg binding sites in 57 promoters, 47 putative Plzf binding sites in 37 promoters, 42 putative Ets binding sites in 25 promoters, and 46 putative HmgIY binding sites in 30 promoters. They are listed in Additional file 8. A summary of all results is depicted in Figure 4.

Table 4: The TRANSFAC identifier of the respective matrix, the number of hits in the 96 induced gene promoters, the number of hits in the 100 non-regulated control gene promoters, and the corresponding fold occurrences are given in this table. The occurrence value of the respective binding site in the promoters of control genes is set to 1. Significance of the representation value of TF binding sites in the promoters is measured by the p-value derived from the binomial distribution.
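Selecting these 96 promoters is set arithmetic over the induced-gene promoters. The Python sketch below (hypothetical function names and gene IDs) partitions promoters by the kinds of predicted sites they carry. If the 94 promoters with a c-Myc site, the 220 with a site for an induced TF, and the 96 with neither all refer to the same 361 promoters (an assumption the text implies but does not state), inclusion-exclusion gives the overlap: 361 - 96 = 265 promoters carry at least one kind of site, so 94 + 220 - 265 = 49 would carry both.

```python
def partition_promoters(induced: set[str], with_myc: set[str], with_tf: set[str]) -> dict[str, set[str]]:
    """Split induced-gene promoters by the kinds of predicted sites they carry."""
    return {
        "myc_site": induced & with_myc,
        "induced_tf_site": induced & with_tf,
        "both": induced & with_myc & with_tf,
        "neither": induced - with_myc - with_tf,  # candidates for the step-4 analysis
    }


def implied_overlap(total: int, n_myc: int, n_tf: int, n_neither: int) -> int:
    """Size of the 'both' group implied by inclusion-exclusion."""
    union = total - n_neither
    return n_myc + n_tf - union


groups = partition_promoters({"g1", "g2", "g3", "g4", "g5"}, {"g1", "g2"}, {"g2", "g3"})
print(sorted(groups["neither"]))          # prints: ['g4', 'g5']
print(implied_overlap(361, 94, 220, 96))  # prints: 49
```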

Discussion
Transcription profiling studies have identified many target genes activated or repressed by c-Myc in various animal and human cells or cell lines. The number of experimentally validated c-Myc targets is rapidly expanding thanks to high-throughput methods [19][31][32][33]. Two recent studies suggest that the list of potential c-Myc targets could be much larger than previously anticipated [22,31]. Moreover, Chen et al. [32] suggest the existence of a significant tissue-specific component in the response of many c-Myc target genes. Gene expression studies alone, however, cannot discriminate between direct and indirect targets of c-Myc action, although network-based inference of direct action has been proposed [31]. Furthermore, gene expression studies alone can identify neither the activation or repression of transcription factors at the protein level nor the transcriptional cooperation and competition of the different transcription factors involved in the corresponding regulatory network. Analysis of the promoters of regulated genes identified in gene expression studies, however, may provide indications in these directions.
Thus, using positional weight matrices (PWMs), the most widely used method for recognizing transcription factor binding sites [34,35], we analyzed the promoters of genes that were induced by overexpression of c-Myc in the alveolar epithelium of our female transgenic mouse model.
We wish to point out that the c-Myc transcriptional regulatory network in other tissues might differ from the network described in this study. Indeed, an analysis of 89 genes whose promoters (1000 bp upstream of the TSS) possess at least one experimentally determined high-quality Myc binding locus in human P493 B cells [29] showed no overlap with the promoters of genes in mouse lung adenocarcinoma reported in the present study. In the present study, we analyzed exclusively the flanking sequences around the in silico identified c-Myc binding sites using all positional weight matrices available in the TRANSFAC database. In particular, binding sites of the transcription factors E2F, AP2, ZF5, and ZIC3 were found to be significantly enriched, 2.2- to 10-fold over the control promoter background. The poor concordance between our results and those of Elkon et al. [36] might have several causes: we analyzed different species, different tissues, and sequences of different lengths, and therefore possibly at different distances from c-Myc binding sites.
Notably, both studies identified E2F as a transcriptional regulator associated with c-Myc. Like c-Myc, E2F controls cell cycle progression and DNA replication [37]. Thus, deregulation of c-Myc could potentially lead to uncontrolled cell cycle progression through a functional link with E2F, as also proposed by Zeller et al. [29]. These authors suggested that high c-Myc expression leads to increased E2F activity by upregulating genes involved in cell cycle control. The cooperative binding of Myc and E2F, followed by transcriptional activation of key downstream targets, leads to an increase in DNA replication and cell cycle progression (Figure 5). Here, using four different matrices for E2F, we found E2F binding sites in the immediate neighborhood of c-Myc binding sites (maximum distance from c-Myc binding sites: 100 bp) in 37 of the 110 sequences possessing a c-Myc binding site. Depending on the matrix used, they are 2.2- to 3.7-fold enriched over the control promoter background. Furthermore, the network relationships between c-Myc and E2F are also evident from the identification of functional E2F binding sites in the c-Myc promoter [38] and in the E2F promoter [39], as well as from the identification of E2F as a c-Myc target gene [26].
Using four different matrices for AP2, we also found AP2 binding sites in the immediate neighborhood of c-Myc binding sites in 32 of the 110 sequences possessing a c-Myc binding site. Depending on the matrix applied, they are 2.4- to 10.5-fold enriched over the control promoter background. In 2006, Zeller et al. had already found AP2 to be significantly enriched in cis-regulatory modules with c-Myc [29]. The AP2 family of transcription factors plays a broad range of roles in cell growth, tissue morphogenesis, and cancer. One mechanism by which the AP2 family fulfills these roles is to activate or suppress various downstream target genes at the transcriptional level. A number of studies have demonstrated that AP2-interacting proteins can affect the transcription of AP2 downstream targets by modulating the transcriptional activity of AP2. Indeed, several AP2-interacting partners have been identified, such as YY1, RB, and c-Myc [28,40,41]. Thus, AP2 is a known c-Myc partner. In 1995, Gaubatz et al. [28] showed that AP2 acts as an inhibitor of Myc-mediated transactivation, a function that is mediated both by competition of AP2 with binding of [...] conditions AP2 might be able to inhibit this transactivation (Figure 5).
We also found ZF5 binding sites in the immediate neighborhood of c-Myc binding sites in 41 of the 110 sequences possessing a c-Myc binding site. They are 2.3-fold enriched over the control promoter background. ZF5 is a ubiquitously expressed protein originally identified by its ability to bind and repress the murine c-Myc promoter [42]. It contains an N-terminal POZ domain, a conserved protein-protein interface that recruits cofactors to modulate transcription [43]. ZF5 mediates both transcriptional activation and repression of cellular and viral promoters [42][43][44]. In promoters in which binding sites for both c-Myc and ZF5 were found in close proximity, the overexpressed transcription factor c-Myc might have induced transcription of the corresponding gene, whereas under normal conditions ZF5 might competitively inhibit this transactivation and further inhibit transcription of the c-Myc gene (Figure 5).

By applying the matrices of the six induced TFs (Atf2, Foxf1a, Smad4, Sox4, Sp3, and Stat5a) to the analysis of the 477 deregulated genes, many putative indirect targets of c-Myc action could be identified. In 73 promoters, at least one binding site for ATF2 was identified. ATF2 belongs to the basic region leucine zipper (bZIP) family of transcription factors and is an important member of the activating protein 1 (AP-1) family [46]. ATF2 functions either as a homodimer or as a heterodimer with other members of the ATF family or other bZIP proteins to bind specific DNA sequences and activate gene expression. One major role of ATF2 is to regulate the response of cells to stress signals [47,48]. Furthermore, in 2001, Miethe et al. [49] identified crosstalk between Myc and activating transcription factor 2 (ATF2): Myc prolongs the half-life of ATF2 and increases phosphorylation of ATF2 at sites that have been shown to be crucial for the regulation of ATF2 activity [49]. Thus, ATF2 is activated by c-Myc at the protein level.
Here, we propose a novel mechanism of gene activation by c-Myc: transcriptional activation of the transcription factor ATF2, which in turn putatively activates the transcription of 36 genes (Figure 6). In addition, Tamura et al. demonstrated an interaction between ATF2 and c-Myc in rat fibroblasts by affinity chromatography and co-immunoprecipitation [50].
The members of the forkhead box (Fox) family of transcription factors play important roles in regulating transcription of genes involved in cellular proliferation, differentiation, and metabolic homeostasis [51]. Foxf1 RNA is expressed at mesenchymal-epithelial interfaces involved in lung and gut morphogenesis [52]. In the adult mouse, Foxf1 RNA is detected in smooth muscle layers of pulmonary bronchioles, lamina propria of the stomach and the intestine, and in alveolar endothelial cells. Foxf1 is further essential for normal lung repair and endothelial cell survival in response to pulmonary cell injury [53].
Here, we demonstrated transcriptional activation of the [...]

The SOX4 gene is highly expressed in human breast cancer cell lines, colon cancer cell lines, hepatocarcinoma, medulloblastomas, and adenoid cystic carcinomas [56][57][58]. SOX4 was also shown to be highly and differentially expressed in a substantial fraction of small-cell lung carcinoma (SCLC) samples and in a pool of primary lung adenocarcinoma samples, with very low expression in a number of essential normal tissues. Notably, evidence has been presented suggesting that SOX4 may be involved in tumorigenesis [59,60]. Here, we identified c-Myc as an indirect positive regulator of SOX4 expression. SOX4, in turn, mediates indirect effects of c-Myc (Figure 6).
Sp3 is a ubiquitous transcription factor closely related to Sp1 that contains activation and inhibitory domains. It can act as an activator or repressor of transcription [61,62]. In 2004, Abdelrahim et al. showed that Sp3 plays an important role in cell cycle progression of pancreatic cancer cells [63]. STAT5A is a transcription factor that mediates cytokine and hormone signals. Its constitutive activation has been observed in several human cancers, and it is oncogenic in cell culture models and transgenic animals [64]. Here, we identified c-Myc as an indirect positive regulator of SP3 and STAT5A expression. SP3 and STAT5A, in turn, mediate indirect effects of c-Myc (Figure 6).
General analysis of the promoters that contain neither a putative c-Myc binding site nor a putative binding site of a transcription factor (TF) transcriptionally induced by overexpression of c-Myc showed that some TF binding sites are overrepresented against the control promoter background. These are binding sites of the TFs OCT1, MZF1, PPARg, PLZF, ETS, and HMGIY (Figure 7).
Some of them are worth mentioning, because they may in part mediate the carcinogenic effect seen in mice after overexpression of the c-Myc oncogene. Oct1 modulates the activity of genes important for the cellular response to stress [65]. Although adipose tissue has been recognized as a principal site of PPAR gamma expression, PPAR gamma is now known to be expressed in many other types of tissues and cells. It has often been discussed in the context of cancer: its activation by ligands has been shown to be involved in the promotion or regression of colon tumors [66,67]. Furthermore, PPAR gamma agonists have been documented to modestly induce apoptosis in a variety of tumor types [68]. Notably, Yamakawa-Karakida et al. (2002) provided the first evidence of a link between PPAR gamma-mediated apoptosis and downregulation of c-Myc gene expression [69].
PLZF is known to be a transcriptional repressor which is associated with suppression of cellular proliferation.
McConnell et al. (2003) showed that PLZF expression maintains a cell in a quiescent state by repressing c-Myc expression and preventing cell cycle progression [70]. They suggested that loss of this suppression would have serious consequences for cell growth control and that growth suppression mediated by PLZF can be reversed by enforced expression of c-Myc. Here, upon overexpression of c-Myc, we found 37 putative target genes of PLZF. These genes, however, are transcriptionally induced, which might reflect the reversal described by McConnell et al. [70]. Under normal conditions, these genes would be transcriptionally repressed by PLZF. Loss of this repression might play a role in the development of the tumorigenic phenotype of c-Myc.
HMGIY has been shown to be a direct c-Myc target gene [71]. Several studies indicate an important role for HMGIY proteins in regulating gene expression [72]; for example, histone H1-mediated repression of transcription is reduced by HMGIY [73]. Like that of c-Myc, expression of HMGIY correlates with rapidly proliferating mammalian tissues as well as with neoplastic transformation [74]. Moreover, a longer residence time of HMGIY in heterochromatin and chromosomes, compared with euchromatic regions, correlates with an increased phosphorylation level of HMGIY [75].
The human Ets gene family includes 25 genes that code for transcription factors involved in the control of various aspects of cell proliferation, differentiation, and development. Ets domain transcription factors have been implicated in the development of various forms of leukemia and solid tumors. It is well established that their function can be controlled by phosphorylation-mediated effects on DNA binding; phosphorylation has been shown to positively regulate the transcriptional activities of Ets1 and Ets2 [76,77].
Binding sites of the transcription factors OCT1, MZF1, PPARg, PLZF, ETS, and HMGIY were found to be overrepresented in promoters of genes induced by overexpression of c-Myc. Their own gene expression, however, was unchanged. One explanation for this observation might be their regulation at the protein level. Nevertheless, some of these transcription factors seem to also participate in the Myc tumorigenic phenotype.

Conclusion
Taken together, after transcription profiling of lung adenocarcinomas of female c-Myc-transgenic mice, we were able to describe the c-Myc regulatory gene network in silico. Using positional weight matrices (PWMs), the most widely used method for recognizing transcription factor binding sites, we identified different mechanisms by which c-Myc putatively mediates its tumorigenic actions (see Figure 2).

Methods
Tissue samples
c-Myc-transgenic female mice displayed morphological alterations with varying degrees of nuclear atypia, such as bronchiolo-adenomas and bronchiolo-adenocarcinomas. Thus, different stages of malignant transformation of the alveolar epithelium were observed. In the non-transgenic control animals, no abnormalities in lung tissue were detected, with the exception of a single animal that showed a slight focal interstitial mononuclear cell infiltration.

Gene expression studies
For gene expression analysis, c-Myc-transgenic mice and non-transgenic controls were each pooled such that 4 pools of 4 mice per group could be analyzed. Each pool was analyzed in one microarray experiment. Only aliquots of individual RNA isolations were pooled, which allowed selected genes to be measured by quantitative RT-PCR in all individual animals. To this end, RNA was isolated from the lung tissue of each individual animal, and identical amounts of RNA from 4 individuals of one group were pooled.
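Pooling identical amounts of RNA from isolates with different concentrations means taking a different volume from each isolate. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the concentrations and the 2000 ng per animal are invented for illustration, since the per-animal amount is not given here.

```python
def pooling_volumes(conc_ng_per_ul: list[float], ng_per_animal: float) -> list[float]:
    """Volume (µl) of each individual RNA isolate that delivers ng_per_animal ng of RNA."""
    if any(c <= 0 for c in conc_ng_per_ul):
        raise ValueError("concentrations must be positive")
    return [ng_per_animal / c for c in conc_ng_per_ul]


# Hypothetical concentrations (ng/µl) of the four isolates that form one pool.
volumes = pooling_volumes([250.0, 400.0, 500.0, 320.0], ng_per_animal=2000.0)
print(volumes)  # prints: [8.0, 5.0, 4.0, 6.25]
```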
Transcriptome analysis was done according to the manufacturer's recommendations (Affymetrix GeneChip® Expression Analysis Technical Manual, Santa Clara, CA) using the GeneChip® Test Arrays and the GeneChip® Mouse Genome 430 2.0 array. The frozen lung tissues (10-15 mg) were disrupted and homogenized using a rotor-stator homogenizer, and total RNA was isolated using the RNeasy total RNA isolation kit (QIAGEN). RNA of individual samples was pooled as described above, and a second cleanup of the isolated RNA was done using the RNeasy Mini Kit (QIAGEN). RNA quantity and purity were checked with the NanoDrop ND-1000, and the integrity of the 18S and 28S ribosomal bands was checked by capillary electrophoresis on the Agilent 2100 Bioanalyzer.
8 μg of total RNA were used as starting material to prepare cDNA. Synthesis of double-stranded cDNA was done with the GeneChip ® one-cycle cDNA Kit (Affymetrix). The cleanup of double-stranded cDNA was done using the GeneChip ® Sample Cleanup module (Affymetrix).
12 μl of cDNA solution were used for in vitro transcription. The in vitro transcription was conducted with the GeneChip ® IVT Labeling Kit (Affymetrix). The total amount of the reaction product was purified with the GeneChip ® Sample Cleanup module (Affymetrix). Purified cRNA was quantified and checked for quality using the NanoDrop ND-1000 and the Agilent 2100 Bioanalyzer. Purified cRNA was cleaved into fragments of 35-200 bases by metal-induced hydrolysis. The degree of fragmentation and the length distribution of the fragmented biotinylated cRNA were checked by capillary electrophoresis using the Agilent 2100 Bioanalyzer.
Biotinylated, fragmented cRNA (10 μg) was hybridized onto the GeneChip® Mouse Genome 430 2.0 array according to the manufacturer's recommendations. Hybridization was performed for 16 hours at 60 rpm and 45°C in the GeneChip® Hybridization Oven 640 (Affymetrix). Arrays were washed and stained on the GeneChip® Fluidics Station 400 (Affymetrix) according to the manufacturer's recommendations, using the antibody signal amplification, washing, and staining protocol (Affymetrix) with streptavidin R-phycoerythrin (SAPE; Invitrogen, USA). To amplify the staining, SAPE solution was applied twice, with a staining step using a biotinylated anti-streptavidin antibody (Vector Laboratories, CA) in between.
The arrays were scanned using the GeneChip® Scanner 3000. Scanned image files were visually inspected for artifacts and then analyzed, with each image scaled to the same target value to allow comparison between chips. The GeneChip® Operating Software (GCOS) was used to control the fluidics station and the scanner, to capture probe array data, and to analyze hybridization intensity data, applying the default parameters of the Affymetrix data analysis software package.
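Scaling each chip to a common target value can be sketched as follows. This follows the commonly described MAS5/GCOS approach of dividing the target by a 2% trimmed mean of all probe-set signals on a chip; the target of 500 and the trim fraction are assumptions here, since the text does not report the target value used.

```python
def trimmed_mean(values, trim=0.02):
    """Mean after discarding the lowest and highest `trim` fraction of values."""
    v = sorted(values)
    k = int(len(v) * trim)
    kept = v[k:len(v) - k]
    return sum(kept) / len(kept)


def scale_to_target(signals, target=500.0, trim=0.02):
    """Return the chip's scale factor and its scaled signals."""
    sf = target / trimmed_mean(signals, trim)
    return sf, [s * sf for s in signals]


raw = [float(x) for x in range(1, 101)]  # toy chip with 100 probe sets
sf, scaled = scale_to_target(raw)
print(round(sf, 4))  # 9.901
```

The scale factor `sf` computed this way is the per-chip quantity whose ratio between chips is checked before each comparison analysis.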
After scanning, the GeneChip® Operating Software (GCOS; version 1.1) generated the expression data for each chip.
As detailed by the manufacturer, each transcript is interrogated by a probe set of 11 pairs of 25-mer oligonucleotides. In addition to each perfect-match (PM) probe, the array carries a mismatch (MM) probe that differs by a single base in the middle (13th) position, which serves to control for nonspecific hybridization. A statistical expression algorithm within GCOS uses the multiple PM and MM probes to determine the presence [a detection call "present" (P) or "absent" (A)] and the abundance (a signal value) of an individual transcript. The detection call (qualitative information) and the signal (numerical value) are calculated independently.
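To make the detection call concrete, here is a sketch of the detection algorithm as described in Affymetrix's statistical algorithm documentation for MAS5, on which GCOS is based: a discrimination score R = (PM - MM)/(PM + MM) per probe pair, a one-sided Wilcoxon signed-rank test of whether R exceeds a small threshold τ, and p-value cutoffs separating present, marginal (M), and absent calls. The constants (τ = 0.015, α1 = 0.04, α2 = 0.06) are the defaults as we understand them; computing the null distribution by exact enumeration is our simplification, and GCOS additionally handles saturated probes. The probe intensities are made up.

```python
from itertools import product

# Assumed MAS5-style defaults
TAU = 0.015                  # threshold subtracted from each discrimination score
ALPHA1, ALPHA2 = 0.04, 0.06  # p-value cutoffs for P / M / A calls


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_greater_p(diffs):
    """Exact one-sided Wilcoxon signed-rank p-value for median(diffs) > 0."""
    d = [x for x in diffs if x != 0]
    if not d:
        return 1.0
    ranks = average_ranks([abs(x) for x in d])
    observed = sum(r for r, x in zip(ranks, d) if x > 0)
    # Under H0 each rank carries a positive sign with probability 1/2.
    hits = sum(
        1
        for signs in product((False, True), repeat=len(d))
        if sum(r for r, s in zip(ranks, signs) if s) >= observed - 1e-9
    )
    return hits / 2 ** len(d)


def detection_call(pm, mm):
    """Return ('P' | 'M' | 'A', p-value) for one probe set."""
    scores = [(p - m) / (p + m) for p, m in zip(pm, mm)]  # discrimination scores
    p = wilcoxon_greater_p([r - TAU for r in scores])
    if p < ALPHA1:
        return "P", p
    if p < ALPHA2:
        return "M", p
    return "A", p


pm = [1200, 950, 1100, 800, 1500, 1300, 900, 1000, 1250, 1150, 1050]
mm = [300, 250, 400, 200, 350, 300, 280, 310, 330, 290, 260]
print(detection_call(pm, mm))  # ('P', 0.00048828125)
```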
To determine whether a gene is "significantly present", the average signal value (Signal-Avg) and the standard deviation [Stdev and Stdev(%)] were calculated using the Affymetrix® Data Mining Tool (version 3.1) and Microsoft® Excel 2003. Additionally, the number of "present" calls (P-count) of each gene across the four replicates was determined.
Criteria applied for a "significantly present" gene were, for example, an average signal value ≥ 100 and a "present" detection call in all four replicates (P-count = 4).
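The replicate statistics and the example presence criteria fit in two small functions. This sketch assumes that Stdev is the sample standard deviation and that Stdev(%) is the standard deviation relative to the mean; the text defines neither precisely, and the signal values are illustrative.

```python
from statistics import mean, stdev


def replicate_summary(signals, calls):
    """Summary statistics for one gene across replicate arrays."""
    avg = mean(signals)
    sd = stdev(signals)  # sample standard deviation (n - 1), an assumption
    return {
        "Signal-Avg": avg,
        "Stdev": sd,
        "Stdev(%)": 100.0 * sd / avg,  # assumed: SD relative to the mean
        "P-count": sum(1 for c in calls if c == "P"),
    }


def significantly_present(signals, calls, min_signal=100.0):
    """Example criteria: Signal-Avg >= 100 and 'P' in every replicate."""
    s = replicate_summary(signals, calls)
    return s["Signal-Avg"] >= min_signal and s["P-count"] == len(calls)


summary = replicate_summary([180.0, 200.0, 220.0, 200.0], ["P"] * 4)
print(significantly_present([180.0, 200.0, 220.0, 200.0], ["P"] * 4))  # True
print(significantly_present([300.0, 320.0, 310.0, 290.0], ["P", "P", "A", "P"]))  # False
```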
Data from the replicates were evaluated and compared statistically with the Affymetrix® GeneChip® Operating Software (GCOS) and the Data Mining Tool (DMT). The average and standard deviation statistics within DMT were used to summarize the expression level (signal value) of each transcript across the replicates. The unpaired t-test and comparison ranking were used to determine the direction and significance of change in a transcript's expression level between sets of replicates. Fold change values were calculated as the ratio of the average expression level of each gene in c-Myc-transgenic animals to that in the corresponding control group.
To extract genes with significantly altered expression, groups of animals were compared using GCOS. A comparison analysis was conducted for the female group within the c-Myc-transgenic line: transgenic versus non-transgenic animals.
For the comparison analysis, it was ensured that the scale factors of the compared chips did not differ by more than a factor of 3. The result of each single comparison between two arrays was reported for each gene as "increase" (I) or "decrease" (D), and the change in signal intensity was expressed as the signal log ratio (log2 ratio).
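The scale-factor requirement can be checked for every transgenic-versus-control pair before the comparisons are run. A minimal sketch; the scale factors are invented, and treating a ratio of exactly 3 as acceptable is our reading of the criterion.

```python
from itertools import product


def incompatible_pairs(tg_scale_factors, ctrl_scale_factors, max_ratio=3.0):
    """Return (transgenic_index, control_index) pairs whose scale factors
    differ by more than `max_ratio` and so should not be compared."""
    bad = []
    for (i, a), (j, b) in product(enumerate(tg_scale_factors),
                                  enumerate(ctrl_scale_factors)):
        if max(a, b) / min(a, b) > max_ratio:
            bad.append((i, j))
    return bad


# Hypothetical scale factors; control chip 2 is unusually dim
print(incompatible_pairs([1.2, 0.9, 1.5, 1.1], [1.0, 0.8, 4.0, 1.3]))
# [(0, 2), (1, 2), (3, 2)]
```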
With four replicates per group, 16 comparison analyses (4 transgenic × 4 non-transgenic arrays) were conducted. Comparison ranking analysis was additionally performed to assess the concordance of "increase" (I) and "decrease" (D) calls across replicates, i.e., by counting the number of I-calls and D-calls out of 16.
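Comparison ranking reduces to counting calls in a 4 × 4 matrix of pairwise results. This sketch assumes the five GCOS change calls (I, MI, NC, MD, D) and, as the text describes, counts only the strict I and D calls; the call matrix is hypothetical.

```python
def comparison_ranking(calls):
    """Count concordant change calls for one probe set.

    calls[i][j] is the change call of transgenic array i versus
    control array j, e.g. 'I', 'MI', 'NC', 'MD' or 'D'.
    """
    flat = [c for row in calls for c in row]
    return {"I": flat.count("I"), "D": flat.count("D"), "total": len(flat)}


# Hypothetical 4 x 4 call matrix for one induced probe set
calls = [
    ["I", "I", "I", "I"],
    ["I", "I", "MI", "I"],
    ["I", "I", "I", "I"],
    ["I", "NC", "I", "I"],
]
print(comparison_ranking(calls))  # {'I': 14, 'D': 0, 'total': 16}
```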
An unpaired t-test, with the one-sided p-value converted to a two-sided p-value, was used to determine the direction and significance of change in a transcript's expression level (Data Mining Tool, version 3.1). The signal values of each group served as the basis for calculation, with a p-value cutoff of 0.05.
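A sketch of the unpaired test, assuming a pooled-variance (Student) t-test; the text does not say whether DMT pools variances or applies Welch's correction. With 4 versus 4 arrays there are 6 degrees of freedom, and the two-sided p-value can be computed without external libraries from the closed-form series for Student's t with integer degrees of freedom. `one_to_two_sided` shows the conversion mentioned in the text (twice the smaller tail). The signal values are invented.

```python
import math


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t for integer df, via the closed-form
    series for P(|T| < t) with theta = atan(|t| / sqrt(df))."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        term = total = 1.0
        for k in range(2, df, 2):
            term *= (k - 1) / k * c * c
            total += term
        inside = s * total
    elif df == 1:
        inside = 2.0 * theta / math.pi
    else:
        term = total = 1.0
        for k in range(3, df, 2):
            term *= (k - 1) / k * c * c
            total += term
        inside = 2.0 / math.pi * (theta + s * c * total)
    return 1.0 - inside


def one_to_two_sided(p_one):
    """Convert a one-sided p-value to a two-sided one."""
    return 2.0 * min(p_one, 1.0 - p_one)


def unpaired_t_test(a, b):
    """Pooled-variance t-test; returns (t, df, two-sided p)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    df = na + nb - 2
    se = math.sqrt(ss / df * (1.0 / na + 1.0 / nb))
    t = (ma - mb) / se
    return t, df, t_two_sided_p(t, df)


transgenic = [220.0, 240.0, 260.0, 280.0]
control = [100.0, 110.0, 120.0, 130.0]
t, df, p = unpaired_t_test(transgenic, control)
print(df, round(t, 2), p < 0.05)  # 6 9.35 True
```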
When comparing groups, a fold change (FC) was calculated as the ratio of the average signal values of the groups. Ratios ≤ 1 were converted to their negative reciprocals, so that the magnitude reflects the extent of repression (e.g., a ratio of 0.5 becomes -2).
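The signed fold change is a one-line transformation of the ratio of group means. A minimal sketch with illustrative values:

```python
def signed_fold_change(mean_transgenic, mean_control):
    """Ratio of group means; ratios <= 1 become negative reciprocals."""
    ratio = mean_transgenic / mean_control
    return ratio if ratio > 1 else -1.0 / ratio


print(signed_fold_change(250.0, 125.0))  # 2.0
print(signed_fold_change(50.0, 100.0))   # -2.0
```

Because every ratio ≤ 1 is mapped to a value ≤ -1, signed fold changes between -1 and 1 never occur, and an unchanged gene (ratio exactly 1) is reported as -1.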
To select genes in the c-Myc experiments, the following criteria were applied to the comparison: for induced genes, the average signal value of the "treatment" group had to be higher than 100, and there had to be more than 13 (out of 16) increase calls in the comparison ranking. Applying these criteria, probe sets with significantly altered expression were selected. In a few cases, two or more of these probe sets targeted the same gene.
To avoid redundancy, only one probe set per gene was selected according to the following criteria: 1. First, probe sets not specific for one transcript (indicated in the Probe Set ID by an additional letter, e.g. 1370470_x_at) were eliminated.
2. If all probe sets were specific (Probe Set IDs without an additional letter, e.g. 138520_at) or all were non-specific, the probe set with the higher signal value was selected.
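The two de-duplication rules can be sketched as follows. Following the text, any Probe Set ID with an extra letter before the final `_at` (such as `_x_at`, `_s_at` or `_a_at`) is treated as non-specific; the gene names, IDs and signal values are hypothetical.

```python
import re
from collections import defaultdict

# An extra letter before the final "_at" marks a probe set not specific for one transcript
NON_SPECIFIC = re.compile(r"_[a-z]_at$")


def one_probe_set_per_gene(rows):
    """rows: iterable of (gene, probe_set_id, mean_signal) tuples.
    Returns {gene: selected probe_set_id}."""
    by_gene = defaultdict(list)
    for gene, probe_set_id, signal in rows:
        by_gene[gene].append((probe_set_id, signal))
    selected = {}
    for gene, candidates in by_gene.items():
        # Rule 1: drop non-specific probe sets if any specific one exists
        specific = [c for c in candidates if not NON_SPECIFIC.search(c[0])]
        pool = specific or candidates
        # Rule 2: among the remaining probe sets, keep the highest signal
        selected[gene] = max(pool, key=lambda c: c[1])[0]
    return selected


rows = [
    ("GeneA", "1418000_at", 400.0), ("GeneA", "1418001_x_at", 900.0),
    ("GeneB", "1420000_s_at", 150.0), ("GeneB", "1420001_x_at", 300.0),
    ("GeneC", "1430000_at", 200.0), ("GeneC", "1430001_at", 250.0),
]
print(one_probe_set_per_gene(rows))
# {'GeneA': '1418000_at', 'GeneB': '1420001_x_at', 'GeneC': '1430001_at'}
```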

Real-time PCR
Real-time PCR measurements were performed with the LightCycler® (Roche Diagnostics, Penzberg, Germany). RNA was treated with DNase and purified with the RNeasy Mini Kit, and its quality was analyzed on a denaturing agarose gel. Reverse transcription (RT) was performed with 2 μg of RNA using Omniscript (Qiagen), RNase inhibitor and hexamers (Promega) in a final volume of 20 μl. RT reactions were diluted 1:4, and 2 μl were used for real-time PCR. SYBR® Green I was used as a fluorescent dye to detect the amplified PCR products after each cycle, and the lengths of the PCR products were checked by gel electrophoresis. PCR primers were synthesized by Invitrogen (Karlsruhe, Germany). Fluorescence was measured at the end of each extension phase and used for quantification within the linear range of amplification, yielding concentrations in relative units. Quantification was based on serial dilutions of cDNA produced from total RNA extracts, using 1:5 or 1:3 dilution steps depending on the expression level of the gene. Six runs were necessary to measure the expression of the genes in all samples. To make the six independent runs comparable, each run included standards consisting of identical sample pools. The standardized sample values for each gene of interest were divided by the standardized values of the housekeeping gene Ppib (peptidylprolyl isomerase B; cyclophilin B).
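The quantification chain (standard curve, per-run standardization against the shared standard pool, normalization to Ppib) can be sketched as follows. This is an illustration under assumptions: crossing points (Cp) are fitted against log10 of relative concentration by least squares, and "standardized" is taken to mean divided by the value of the shared standard pool measured in the same run. All numbers are synthetic.

```python
import math


def fit_standard_curve(rel_conc, cp):
    """Least-squares fit of Cp = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in rel_conc]
    n = len(xs)
    mx, my = sum(xs) / n, sum(cp) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, cp))
    slope = sxy / sxx
    return slope, my - slope * mx


def relative_concentration(cp, slope, intercept):
    """Read a sample's relative concentration off the standard curve."""
    return 10 ** ((cp - intercept) / slope)


def normalized_expression(goi, goi_standard, ppib, ppib_standard):
    """Standardize gene-of-interest and Ppib values against the shared standard
    pool of the same run, then divide by the standardized Ppib value."""
    return (goi / goi_standard) / (ppib / ppib_standard)


# Synthetic 1:5 dilution series with ideal (100 %) amplification efficiency
dilutions = [1.0, 0.2, 0.04, 0.008]
cps = [20.0 - 3.321928 * math.log10(c) for c in dilutions]
slope, intercept = fit_standard_curve(dilutions, cps)
efficiency = 10 ** (-1.0 / slope) - 1.0  # about 1.0, i.e. ~100 %
sample = relative_concentration(21.0, slope, intercept)
print(round(slope, 4), round(efficiency, 3))  # -3.3219 1.0
```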

Sequence retrieval
The UCSC Genome Browser [78] was used to extract the promoter regions of regulated genes and of control genes with no change in expression. Only promoters of RefSeq-annotated genes were extracted. The beginning of the first exon, which also comprises the 5'UTR, was taken as the tentative transcription start site (TSS) [79]. Sequences from 1000 bp upstream to 100 bp downstream of the TSS were extracted. These boundaries were chosen because c-Myc has frequently been observed to bind within 1000 bp of the TSS [80,81]. Note, however, that c-Myc binding has also been demonstrated in the first intron of c-Myc target genes [82].
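The coordinate arithmetic for the -1000/+100 window has to respect strand. A sketch assuming 0-based, half-open coordinates (the convention of UCSC tables) and a TSS given as the 0-based position of the first transcribed base (for a refGene record, txStart on the plus strand and txEnd - 1 on the minus strand); here the TSS itself counts as the first of the 100 downstream bases. The toy chromosome is made up.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def promoter_interval(tss, strand, upstream=1000, downstream=100):
    """0-based, half-open [start, end) genomic interval around a TSS.

    `tss` is the 0-based coordinate of the first transcribed base; it is
    counted as the first of the `downstream` bases."""
    if strand == "+":
        return tss - upstream, tss + downstream
    # On the minus strand, "upstream" lies at higher genomic coordinates.
    return tss - downstream + 1, tss + upstream + 1


def promoter_sequence(chrom_seq, tss, strand, upstream=1000, downstream=100):
    """Promoter sequence read 5'->3' on the gene's own strand."""
    start, end = promoter_interval(tss, strand, upstream, downstream)
    seq = chrom_seq[max(start, 0):end]  # clip at the chromosome start
    return seq if strand == "+" else seq.translate(COMPLEMENT)[::-1]


chrom = "AAAACCCCGGGGTTTT"
print(promoter_sequence(chrom, 8, "+", upstream=4, downstream=2))  # CCCCGG
print(promoter_sequence(chrom, 3, "-", upstream=4, downstream=2))  # GGGGTT
```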

Process of promoter analysis
The most widely used method for recognizing transcription factor binding sites is the application of positional weight matrices (PWMs) [34]. TRANSFAC® Professional rel. 10.1 (BIOBASE GmbH, Wolfenbüttel, Germany) is the largest collection of weight matrices for eukaryotic transcription factors [83,84]. Here, the TRANSFAC®-integrated MATCH™ algorithm was employed, which scores matches using the so-called information vector [85]. The matrix profile "vertebrates_minSUM_highQual" was used. Default cutoff values were used for matrix similarity, whereas the cutoff for core similarity was always set to 0.75. The matrix similarity score describes the quality of a match between a matrix and an arbitrary part of the input sequence, and only matches scoring higher than or equal to the matrix similarity threshold appear in the output. The number of transcription factor binding sites identified in the analyzed promoter set was compared to the number identified in a control promoter set [promoters of 100 selected genes that were not regulated at all in any of the four different groups]. The list of non-regulated genes was prepared by applying the criteria described below and is included in Additional file 2.
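The scoring described here can be sketched as a simplified re-implementation of the published MATCH formulas, not the TRANSFAC® code: each matrix position i receives an information weight I(i) = Σ_B f_i(B) ln(4 f_i(B)), and a similarity score (Current - Min)/(Max - Min) is computed over the core positions and over the full matrix. Taking the core as the five consecutive positions with the highest summed information is our assumption. The E-box-like frequency matrix and the matrix similarity cutoff of 0.8 are hypothetical placeholders (in MATCH, matrix cutoffs come from the profile), while the core cutoff of 0.75 is the value stated above. Only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def peaked_matrix(consensus, p=0.85):
    """Hypothetical frequency matrix: frequency p for the consensus base."""
    q = (1.0 - p) / 3.0
    return [{b: (p if b == c else q) for b in BASES} for c in consensus]


def information_vector(freqs):
    """I(i) = sum_B f_i(B) * ln(4 * f_i(B)), skipping zero frequencies."""
    return [sum(f[b] * math.log(4.0 * f[b]) for b in BASES if f[b] > 0)
            for f in freqs]


def similarity(window, freqs, info, positions):
    """(Current - Min) / (Max - Min) over the given matrix positions."""
    cur = sum(info[i] * freqs[i][window[i]] for i in positions)
    lo = sum(info[i] * min(freqs[i].values()) for i in positions)
    hi = sum(info[i] * max(freqs[i].values()) for i in positions)
    return (cur - lo) / (hi - lo)


def core_positions(info, core_len=5):
    """Consecutive positions with the highest summed information."""
    core_len = min(core_len, len(info))
    best = max(range(len(info) - core_len + 1),
               key=lambda s: sum(info[s:s + core_len]))
    return list(range(best, best + core_len))


def scan(sequence, freqs, core_cut=0.75, matrix_cut=0.8):
    """Return (start, matrix similarity, core similarity) for each hit."""
    width = len(freqs)
    info = information_vector(freqs)
    core = core_positions(info)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows with N or other ambiguity codes
        css = similarity(window, freqs, info, core)
        if css < core_cut:
            continue
        mss = similarity(window, freqs, info, range(width))
        if mss >= matrix_cut:
            hits.append((start, round(mss, 3), round(css, 3)))
    return hits


ebox = peaked_matrix("CACGTG")
print(scan("TTCACGTGAA", ebox))  # [(2, 1.0, 1.0)]
```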

Selection of genes suitable as control promoters
For the analysis of promoters of significantly altered genes, promoters of genes with no change in expression were selected as controls. To qualify, a gene had to be expressed with a signal value above 100, and the detection call of all 4 replicates had to be "present". At the same time, the fold change could be neither greater than 1.1 nor less than -1.1; the change direction, i.e., the result of the t-test with a p-value greater than 0.5, had to be a "None" call; and of the 16 comparison analyses conducted, fewer than five were allowed to yield an "Induction" or a "Down" call. These criteria were applied to each comparison separately and, in each case, had to be met for all comparisons simultaneously.
In summary: the average signal value of the "treatment" group had to be higher than 100; all 4 "treatment" arrays had to yield P calls; the fold change (ratio of average signal values) had to lie between -1.1 and 1.1; the t-test had to yield a "None" change call; and there had to be fewer than 5 (out of 16) induced calls in the comparison ranking. Applying these criteria for c-Myc, 164 probe sets for genes with almost no change in expression were selected. In addition, 100 genes with a RefSeq transcript number were selected to extract promoter sequences for controls (Additional file 2).
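The control-gene criteria combine into a single predicate. In this sketch the fold change is the signed value defined for the group comparisons (so a ratio between 1/1.1 and 1.1 appears as a value between -1.1 and 1.1), and fewer than five increase calls and fewer than five decrease calls are both required; whether the authors counted the two call types separately or together is not stated, so this is our reading. The example values are invented.

```python
def is_control_gene(signals, calls, fold_change, ttest_call, n_increase, n_decrease):
    """True if a probe set meets the 'no change in expression' criteria."""
    return (
        sum(signals) / len(signals) > 100       # mean treatment signal above 100
        and calls.count("P") == 4               # present in all 4 treatment arrays
        and -1.1 <= fold_change <= 1.1          # signed fold change within +/-1.1
        and ttest_call == "None"                # t-test gives no change call
        and n_increase < 5 and n_decrease < 5   # out of 16 pairwise comparisons
    )


print(is_control_gene([300.0, 310.0, 290.0, 300.0], ["P"] * 4, 1.05, "None", 2, 1))  # True
print(is_control_gene([300.0, 310.0, 290.0, 300.0], ["P"] * 4, 1.25, "None", 2, 1))  # False
```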