### Theoretical Section: The information theoretic analysis

This section summarizes the essence of the information theory based method used for the analysis of mRNA array as described in detail in the Additional file 1 section "Surprisal analysis". For additional discussion of the motivation, see [23]. References 24-28 provide more background. Here we just emphasize that in general our approach, known as surprisal analysis, [19] is a method of analysis of systems in both equilibrium and not equilibrium that are subject to constraints. Surprisal analysis is an analysis of the logarithm of the expression level of each gene as in equation (1) below. This analysis determines the transcription patterns of the transformation process and the weights of these independent patterns at each stage (= time point) of the transformation. A transcription pattern is a set of transcripts that act coherently. We index the patterns by the Greek letter *α*, *α* = 1,2,.. label the different independent patterns. For each pattern we determine its importance. *λ*_{
α
} (*t*_{
T
} ) is the value (= the importance) of the contribution of the pattern *α* at time point *T*. We shall show that at any stage there are very few important patterns. The validation of this statement as well as the determination of the transcription patterns is quantitative. The information theoretic thermodynamic-like approach derives the logarithm of the expression level of each gene. For gene *i* at the time point *T* we obtain equation (1) for the expression level of gene *i* at the time point *T* where *G*_{
iα
} is the weight of that gene in the pattern *α*

The first term on the right side of the equation is the time-invariant part of the gene expression level for the particular transformation process. Typically this term makes a non-negligible contribution. According to the theory, this term is the gene expression level at maximal entropy. The time varying transcription patterns are represented by the terms in the sum. It is these terms that reduce the magnitude of the entropy. In the information theory approach *λ*_{
α
} (*t*_{
T
} ) is the extent of reduction of the entropy due to the particular gene transcription pattern *α*. Due to the presence of the constraints, represented in equation (1) by the action of expressed genes, the system does not reach a steady state.

We have an accurate representation of the transformation process when the measured left hand side in equation (1) is close in value to the theoretical representation on the right hand side of equation (1). By making a least squares match between the two sides of equation (1) we obtain the numbers *G*_{
iα
} and *λ*_{
α
} (*t*) with the necessary mathematical details provided in the Additional file 1 with background material provided in [24–28]. As already mentioned, only very few terms in the sum in equation (1) are required to achieve this. The mathematical technique we use insures that the patterns are orthogonal and independent. We do not seek a perfect fit because experimental data is invariably accompanied by some noise.

### Theoretical Section: Implementation of Surprisal analysis

By the end of the thermodynamic-like analysis we associate the deviations from steady state with a set of transcription patterns. Note that in our approach, each pattern is permanent and not varying in time. The list of coefficients *G*_{
iα
} is determined by our analysis for each value of *α*, see Additional file 1 and [23] for details. The weights *G*_{
iα
} are not a function of time. Only the weight *λ*_{
α
} (*t*_{
T
} ) of the transcription pattern varies with time. This is the background necessary for the analysis to be implemented below. We next proceed to make some additional points about the theory.

A technical point is that the theory expresses the weight of a pattern, that is its importance, as a product of two factors, *λ*_{
α
} (*t*_{
T
} ) = *ω*_{
α
}*P*_{
αT
} , see the Additional file 1 of this paper as well as [23]. Here *ω*_{
α
} is the time independent weight of transcription pattern *α* while *P*_{
αT
} is the fractional weight of the contribution of pattern *α* at time *T*. (The fractional weights sums up to unity as ). We are interested both in those transcription patterns with a large value of the absolute weight *ω*_{
α
} and in those patterns whose fractional weight changes significantly in the course of time. The factorization of *λ*_{
α
} (*t*_{
T
} ) is not just a technical matter because it shows that a transcription pattern can have a lower absolute weight *ω*_{
α
} yet its time-dependent weight can change significantly at some stage of the transformation.

The time invariant part is computed as that part of *X*_{
i
} (*t*_{
T
} ) that is not dependent on time. For notational reasons it is convenient to introduce a 'zeroth' pattern by writing the steady state term as . Unlike the other *λ*'s, from its definition *λ*_{0} does not depend on time. In our computation we allow *λ*_{0} to vary and use its theoretical constancy as a check and a numerical validation of the results. In the experiments of Rotter et al, [20, 21] there are several distinct trajectories that differ by which mutation was induced in the system at the last point in time. Because different trajectories can terminate at distinct biological fates, each such trajectory can possess its own time-invariant pattern.

In the Additional file 1 we provide full details on how the numerical values of the weights *λ*_{
α
} (*t*_{
T
} ) and of the transcription patters *G*_{
iα
} are determined from the measured values of the expression levels *X*_{
i
} (*t*_{
T
} ) of different genes, where *i* is an index of a gene, at different times *t*_{
T
} . It follows from that technical discussion that there is an upper value for the number of different transcription patterns that can be identified from the data.

The result (1) was first derived in the context of selectivity of energy requirements and specificity of energy disposal of chemical reactions [25, 28]. Using this expression to fit the observed data is often known as surprisal analysis. The term surprisal refers to the information provided by the deviation of the expression level from the time independent part.

The transcription patterns and constraints are identified by fitting equation (1) to the observed expression levels at different times along a particular trajectory. Say that there are A time points in that trajectory. When we use all A transcription patterns then the information theoretic expression (1) with *α* = 1,2,.., A-1 is an exact representation of the data, so at any time point the A, *λ*'s, *λ*_{0,}*λ*_{1},...,*λ*_{A-1} fully suffice to recover the data, noise and all. The surprising result is that, as we shall see, in practice one major transcription pattern often accounts for the measured expression levels, (see Figure 2). What it means is that transcription patterns can be ranked in terms of their importance. The details of the fitting procedure are described in the Additional file 1.

The functional form (1) is derived by imposing constraints that prevent the entropy of the distributions of gene expressions from being fully maximal. The details are provided in [23] and in reviews of the formalism [24–27]. Technically the constraints are imposed using the method of Lagrange's undetermined multipliers [29]. A multiplier *λ*_{
α
} (*t*_{
T
} ) is associated with each constraint *α* at each time point *T*. The value of the multiplier is determined by the value of the constraint at that time *T*. We here determine the value by fitting equation (1) to the data and we interpret *λ*_{
α
} (*t*_{
T
} ), the value of the multiplier *α* at time *T*, as the contribution of transcription pattern *α* at that time. We make the least square fit of the experimentally measured right hand side to the theory derived left hand side of equation (1) using the matrix technique of SVD. This provides a set conjugate eigenvectors that define both the weights *G*_{
iα
} of gene *i* in pattern *α* and the weight *λ*_{
α
} (*T*) of the pattern *α* at the time *T*. The distinct eigenvectors are orthogonal and this insures the independence of the patterns.

This interpretation *λ*_{
α
} (*T*) is directly seen when we compute the change in gene expression between two time points *T* and *T'*

Equation (2) plays a key role in the quantitative evaluation of the biological implications of the results of surprisal analysis as reported below. Specifically, equation (2) highlights the quantitative aspects of changes in the levels of gene expressions. Changes in expression patterns primarily require that the fractional weight *P*_{
αT
} changes significantly but it helps that the absolute weight *ω*_{
α
} is large. Also worth noting is that the changes in the fractional weights and in the absolute weight appear in the exponents of the levels of gene expressions. Particularly when the fractional weights change sign, see Figure 3 below, the levels of gene expressions can change by orders of magnitude. This is part of what we mean by an 'inversion' of the level of gene expressions. An example of an inversion is shown in Figure 4 below.

The Additional file 1 shows how to use equation (2) to compute the entropy of the gene transcription system at different points in time. Entropy is a state function meaning that it depends only on the current gene expression levels and not on how we arrived at these values. The equations given in the Additional file 1 provide an explicit illustration of this important property.

### Theoretical Section: Application of Surprisal analysis

In this study we examined the WI-38/hTERT cell system, which was guided to develop into a fully transformed cell, beginning with the normal WI-38 immortalized non-transformed fibroblasts. Cells underwent molecular manipulation such as hTERT insertion, many doublings, depression of p53 function and the insertion of oncogenic H-ras as depicted in Figure 1.

The model system was studied using the Human Genome Focus Array (Affymetrix, Santa Clara, CA) with 8500 verified human genes [10, 21]. The data is the gene expression level for each transcript, *X*_{
i
} (*t*_{
T
} ), at time *t = T* for a series of 12 time points as shown in Figure 1. The previous analysis [10, 21] of the data identified many transformation hallmarks. In particular down-regulation of the transcripts involved in cell development and differentiation at the early stages of transformation, induction of protein biosynthetic pathways, alteration in embryonic antigen expression and in apoptotic transcripts at the latter stages of transformation. In addition, a "tumor-forming" genetic signature reflected in high gene expression levels for cytokines and chemokines was identified. The major findings of this previous study were used to validate the information- theoretical approach used in the current analysis as discussed in detail in the Additional file 1. In order to examine which biological processes were most affected by the transformation, we used the EASE software [30] to group those transcripts that passed the t-test analysis and that changed by at least exp(+ 0.5) for each transcription pattern *α* between two time points. Biological categories that were significantly over-represented, as defined by EASE score < 0.05 are shown in the Additional file 1, Tables S2 to S19.

Since this system did not develop continuously from one point to the next we divided it into several trajectories representing the various possible processes. The expression levels were measured in duplicates for each time point in the trajectory (Figure 1). The data that we analyzed was the mean of the duplicates and included 5582 genes that had a 'present call' (according to Affymetrix calling procedure) in the two duplicates of at least one sample [10, 21]. We also performed the analysis using only those genes that were further filtered by the requirement that the variation between the duplicates is quite small (below 0.05 as judged by a paired t-test).

### Information-theoretic results of Surprisal Analysis of gene expression

Using the data reported by Milyavsky et al. [21], we implemented surprisal analysis and present some results of the analysis in Figure 3. *λ*_{
α
} (*t*_{
T
} ) represents the importance of the contribution, of gene expression pattern *α* at the time *T*. The trajectory 1-5-7-8-10-12 (Figure 1), includes 6 time points and therefore a maximum six values of *λ*_{
α
} (*t*_{
T
} ) can be calculated where *α* = 0,1,...,5. The *α* = 0 term is the time invariant gene expression pattern term and the five other are the varying patterns and we rank them in order of decreasing weight. Thereby, consecutive terms in the sum of terms in equation 1 make decreasing contribution. Figure 3 shows three curves that are the values of *λ*_{
α
} (*t*_{
T
} ) for the 3 constraints (or gene expression patterns) contributing most to the process of transformation, as a function of time.

The dominant transcription pattern *α* = 1 shown in Figure 3 undergoes a large change in value, accompanied by a change of the sign of its weight, between time points 7 and 8. The second pattern increases significantly between time points 5 and 7 and then drops to zero at point 8 and stays zero thereafter. The third pattern contributes only at the last three time points and the sign of its value changes between points 8 and 12. As seen in Figure 3 a weight of 'zero' is not exactly zero. This point is best discussed using the representation *λ*_{
α
} (*t*_{
T
} ) = *ω*_{
α
}*P*_{
αT
} . A weight of near zero at some values of time means that while the absolute weight *ω*_{
α
} is not necessarily small, the fractional weight *P*_{
αT
} is small at those time points. (For each pattern *α* the fractional weights are normalized to one as ). The weights for patterns *α* = 1 and 2 corresponding to Figure 3 are shown in the Additional file 1 Figures S3, S4 and S5. Some patterns do not have a large weight. The theory states that a gene expression pattern does not contribute, meaning *λ*_{
α
} (*t*_{
T
} ) ≈ 0, at such time points that its presence does not lower the entropy. From the point of view of the expression levels of genes, a pattern with a very low weight does not contribute to the gene expression levels at that time, see equation (1) and Figure 5 below. In this case patterns with higher weight will contribute to the measured expression network.

As shown in the Additional file 1 Table S1 the 'zeroth' multiplier *λ*_{0}(*t*_{
T
} ) does not depend on the time *t*_{
T
} of measurement. This value should be constant, since *λ*_{0}(*t*_{
T
} ) is the measure of the contribution of the time invariant gene expression pattern of the steady state, *α* = 0. The nearly exact constancy of the fitted value of *λ*_{0}(*t*_{
T
} ), at different times, is a validation of the concept of a time-invariant contribution, see equation (1). *α* = 0 is the pattern at the maximum of the entropy without time dependent constraints. It is the expression pattern at the limit of steady state. As expected from basic considerations, not all trajectories lead to the same cell fate. Therefore, different trajectories have different secular fates and can therefore differ in their *λ*_{0} values.

We rank the varying transcription patterns by their importance with *α* = 1 being the dominant one where importance is by the size of the absolute weight *ω*_{
α
} . The smaller the value of *ω*_{
α
} , the more likely it is that the fit is not to the real data but to the noise. So at a given point in time we have more confidence in the biological significance in those transcription patterns with larger weights. Even so, we will see that the fourth transcription pattern is very important at early times. Digital results for the weights *λ*_{
α
} (*t*_{
T
} )'s in different trajectories are given in the Additional file 1 Table S1.

The steady state term *λ*_{0}*G*_{0i}plus the other five time-varying transcription patterns exactly reproduce the measured levels of all gene expression. If we use fewer patterns in the expansion (1) we get a quite acceptable approximation when the dominant constraints are used. To highlight this point we show in Figure 2 a scatter plot of the measured levels vs. the prediction using just one or just two transcription patterns, *α* = 1,2. Also shown in Figure 2 is a solid line of unit slope. This is the prediction using all five transcription patterns.

The analysis of individual expression patterns shows that they can undergo an 'inversion' in their importance. Inversion means that genes that at the previously measured point had high expression levels now go down while genes that had a low level go up in their level at the present time point. Examined at the level of a particular pattern the change is an outright inversion. By this we mean that the logarithm of the expression level changes sign or equivalently that the exponent of the expression level changes sign as shown graphically in Figure 4. This inversion transformation that occurs when a constraint undergoes a qualitative change in its weight is somewhat reminiscent of the process known in physics and chemistry as a phase transition.

The results of analysis of different trajectories are shown in Figure 5. The plot is such to emphasize the similarity between different trajectories. So the abscissa is not the time as a running number, see Figure 1 but sequential events. Therefore the last point of trajectories 1-5-7-8-9, 1-5-7-8-10 and 1-5-7-8-11 is shown at the same point because the time points 9, 10 and 11 are all following time point 8.

### Biological results: Identification of transcripts participating in the first, major, transcription pattern (*α*= 1)

With minor variations we find the same major transcription pattern for many trajectories, see for example Figure 5. We begin by analyzing the main features of the transcription pattern *α*= 1 for trajectory 1-5-7-8-10-12. Among the over-represented categories with induced expression we find transcripts participating mostly in protein biosynthesis and the metabolism of DNA and RNA (rRNA, tRNA, mRNA). Note that these transcripts are limited to the induced and do not appear in the over-represented reduced categories (Additional file 1 Table S2, b). Using KEGG we identified in the first transcription pattern a group of induced spliceosome transcripts that was not reported earlier.

In addition to the identification of the processes that contribute significantly to the onset of carcinogenesis, this study aims at unraveling the rationale that drives this process. Therefore we seek an explanation for the increased activity of the high energy demanding processes - division and protein metabolism - in the late stages of transformation despite the unchanged energy metabolism (glycolysis and oxidative phosphorylation), as judged from mRNA levels. In transcription pattern *α* = 1, the information-theoretic approach points to a big group of reduced transcripts, participating in signal transduction category (110 genes with reduced expression out of the overall 404 genes with reduced expression; Additional file 1 Table S2, a). TGFβ-Smad4, JAK-STAT pathways are among the over-represented biological categories with reduced gene expression, and do not appear in the over-represented induced categories (Additional file 1 Table S2). Using KEGG, 28 gene products that function in the MAPK pathway were identified. Of these the expression of 18 gene products is reduced, compared with 10 that are induced. The same phenomenon of the signaling network contraction in general and specifically a reduction of the TGFβ-Smad4, JAK-STAT and MAPK pathways was observed previously in the in vitro model, based on the initiating event in cervical cancer [22]. We suggest that the contraction of signaling networks may reduce the energy requirements for cell maintenance, thereby diverting cellular resources towards rapid cell cycle progression and increased metabolism [22, 31]. The major transcription pattern of the trajectory 1-5-7-8-10-12 exhibits a similar picture of energy recycling where the energy is not invested in the signaling network but can be redirected towards cellular proliferation and protein metabolism as we discussed previously in the HPV16 model system [22, 31].

Interestingly, the major transcription pattern *α* = 1 in the trajectory 1-5-7-8-11 possesses the same disregulated transcription patterns as in 1-5-7-8-10-12 except for the cell death category, which is reduced significantly at point 11 (Table S3). Thus, the major transcription pattern, that has the biggest impact on the process of transformation of these two trajectories shows similar changes at gene expression levels. Reduction in signal transduction in the trajectories 1-5-7-8-11 and 1-5-7-8-10-12 is highly correlated with the enhanced rate of proliferation of the late stages that was measured experimentally [10] and is in line with our observations [31] in the HPV16 model system of the correlation between reduced signaling and enhanced rate of division.

Analysis of the trajectory 1-5-7-8-9 reveals that the major changes in the transcription patterns of the cells in this route of transformation are different from the previous two trajectories, 1-5-7-8-11 and 1-5-7-8-10-12. The long evolution of hTERT immortalized cells without opening the system (for H-RAS induction or p53 inactivation) leads to similar main changes, like reduction in development processes, induction of tRNA and rRNA metabolism and protein biosynthesis. However, the voraciously energy consuming category of signal transduction (Additional file 1 Table S4) is not among the over-represented reduced biological categories and DNA and protein metabolism does not appear among the induced categories.

### The changes in the major gene transcription pattern precedes the genetic alterations induced by p53 inactivation and H-RAS expression

Our purpose here is to characterize the major transcription pattern, *α* = 1, before the application of alterations and to check how far this transcription pattern is affected by the subsequent changes. This analysis enables us to recognize the cellular context that constitutes a necessary condition for tumor initiation. To do so we compared the *λ*_{1}(*t*)values of the 5 continuous routes: 1-5-7-8, 1-5-7-8-9, 1-5-7-8-10, 1-5-7-8-11 and 1-5-7-8-10-12 that branch out at point 8. The major transcription pattern of the 1-5-7-8, 1-5-7-8-10 and 1-5-7-8-11 routes included the reduced signal transduction category and induced protein metabolism. Reduction in signal transduction and induction of protein metabolism are found to be H-RAS/p53 independent, but play important role in cellular transformation. Since these alterations appear in the 1-5-7-8, 1-5-7-8-10, 1-5-7-8-11 and 1-5-7-8-10-12 trajectories we suggest that point 8 exhibits the appropriate cellular context for future oncogenic transformations. The trajectory ending at time point 9 represents a different route, where numerous cell divisions, which occurred between points 8 and 9, caused many additional alterations that not necessarily would lead to a cancerous phenotype.

The cell proliferation category in trajectory 1-5-7-8-10 is among the over-represented down regulated groups (Additional file 1 Table S5) of transcripts as opposed to trajectory 1-5-7-8-11, in which this category appears among the induced expression groups. This difference between the two trajectories might be explained by H-RAS induction that can inhibit cell proliferation through different mechanisms including induction of p21 through p53 [32]. The second difference between the two trajectories is the reduced expression of the transcripts participating in cell death in the trajectory 1-5-7-8-11, which might be caused by p53 inhibition. Interestingly the *α* = 1 transcription pattern of the trajectory 1-5-7-8 and of 1-5-7-8-10-12 have the largest number of the overlapping categories (see Tables S2 an S6).

The analysis of the trajectory 1-5-7-8-10, using the KEGG software reveals that MAPK pathway was reduced during this route (15 transcripts were reduced in comparison with 8 induced (Additional file 1 Table S5)). The analysis of the trajectory 1-5-7-8 showed the similar results (Additional file 1 Table S6). This pathway was also reduced independently from the RAS/p53 induced mutations. Moreover we suggest that this reduction might contribute significantly to the RAS transformation, since among the reduced transcripts we identified ASK1, the regulator of p38 pathway that provides negative feedback for RAS proliferative signaling [33].

### Identification of transcripts participating in the second transcription pattern (*α*= 2)

The second transcription pattern, *α* = 2, contributes at the 1, 5 and 7 time points. This minor transcription pattern switches its role between the two earlier time points and the point 7 as indicated by the change in sign between the 1,5 and 7 time points (See Figure 3). As shown in the Additional file 1 Table S7, in trajectory 1-5-7-8-10-12, the reduced transcripts participate mainly in the processes of cell cycle and proliferation or development. This finding is consistent with the bioinformatic analysis and experimental observation showing reduction in the expression level of the transcripts participating in cell proliferation in comparison with the point 5 and slow growth rate in comparison with the late stages of transformation [10]. Among the overrepresented induced categories we found transcripts participating in the cell communication, cell adhesion, lipid metabolism and enzyme linked receptor protein signaling pathway. The AMPK transcript was among the transcripts involved in the lipid metabolism category. AMPK is an energy sensor and its expression and activity is regulated by the cellular AMP/ATP ratio [34]. The induction of the AMPK in the point 7 might point to the stressful conditions of the cells at the point 7 and be correlated with reduced cell proliferation signature.

The analysis of the trajectories 1-5-7-8-9 and 1-5-7-8-11 reveals that the transcription pattern *α* = 2 contains similar altered transcription patterns, namely reduced expression of the transcripts involved in cell cycle and induced expression of the transcripts participating in lipid metabolism, including AMPK transcript (Additional file 1 Table S8 an S9). The analysis of the major transcription pattern (*α* = 1) of the trajectory 1-5-7 reveals similar results.

### Third transcription pattern (*α*= 3) features the transcripts that participate in the "tumor-forming" signature

The third transcription pattern, *α* = 3, identifies the particular changes that occurred between point 8 and the last point of the trajectories 1-5-7-8-9, 1-5-7-8-11 and 1-5-7-8-10-12. This transcription pattern onsets between point 8 and points 9, 10 and 12 (cf. Figure 3 and Figure 6). When it is on, the other transcription patterns are, by comparison, of lesser importance, cf. Figure 7.

As shown in the Additional file 1 Table S10, for trajectory 1-5-7-8-10-12, the reduced transcripts participating in the signal transduction and cell communication categories significantly contribute to the transition from point 8 to the cancerous point 12 of the trajectory 1-5-7-8-10-12. An additional reduction in these categories, that already occurred in the switch from point 7 to point 8 in the major transcription pattern *α* = 1, differentiates point 8 from point 12 (Additional file 1 Table S10). The switch between points 8 and 12 is also accompanied by an additional induction of the cell proliferation category.

New induced categories belonging to the cell cycle appear at point 12 as compared to point 8 (Additional file 1 Table S10). This induction might be explained by the induction of the chemokine signaling, as identified by the KEGG software. The induction in the expression of the chemokines and cytokines is highly correlated with the "tumor-forming" genetic signature according to Milyavskyet al. [21]. Moreover 12 of the 14 secreted molecules that comprised this "tumor-forming" genetic signature [21] also appear in this study as significantly contributing to the switch from point 8 to the cancerous stage.

The analysis of the third transcription pattern in the trajectory 1-5-7-8-10 reveals similar results. 10 of the 14 secreted molecules that comprised the "tumor-forming" genetic signature [21], contribute significantly to the switch from point 8 to point 10. In our analysis we observed almost 4-fold induction in the value of contribution of the CXCL1chemokine transcript to the third transcription pattern at point 12 as compared to point 10. This result strongly correlates with the observed experimental synergism between H-RAS induction and inactivation of p53 on the expression of CXCL1 [21].

We suggest that the third transcription pattern, as identified in our study, determines the "fine tuning" changes that specify the transition from point 8 through point 10, to the cancerous stage. It might be considered as the necessary stimulus that must happen together with the transformations identified in the major transcription pattern to bring the pre-malignant cell to the fully transformed stage.

The analysis of the trajectory 1-5-7-8-9 identifies overrepresented reduced categories in the third transcription pattern, which includes DNA metabolism and mitotic cell cycle categories (Additional file 1 Table S11). This result correlates with the reduced proliferation cluster in time point 9 as compared to point 8 [10]. The switch from point 8 to 9 is characterized by induced intracellular signaling cascade and death categories (Additional file 1 Table S11). This finding supports our previous hypothesis, that point 9 represents a different route that probably would not lead to a cancerous transcription pattern.

The third transcription pattern of the trajectory 1-5-7-8-11 inverts between points 8 and 11. This switch is accompanied by the reduction of categories, like cell adhesion and cell communication and induction of the similar categories that included cell proliferation and mitotic cell cycle (Additional file 1 Table S12).

### Minor transcription patterns

Quantitative results for the weights of all the different transcription patterns in all the trajectories that were analyzed are given in tabular form in the Additional file 1 (Tables S13, S14 and S15).

### Transcripts of the "tumor-forming" signature were induced in early stages of transformation

To follow the changes in the contribution of the transcripts to the "tumor-forming" signature in the course of transformation, we compared three trajectories that underwent p53 inactivation in their last points: 1-3-4, 1-5-6 and 1-5-7-8-11. The transcription pattern 2 of the trajectory 1-3-4 switched its sign at the point 3. This transcription pattern identifies the changes that occurred after p53 inactivation. 4 of the 14 secreted molecules that comprise "tumor-forming" genetic signature according to Milyavsky et al. [21], appear as significantly positively contributing to the "tumor-forming" signature. EASE analysis reveals induction in the cell proliferation and immune response as expected after p53 inactivation (Additional file 1 Table S19, [35]).

The analysis of the most contributing transcripts to the first transcription pattern of the trajectory 1-5-6, that switched its sign at the point 5 identifies the largest induced overrepresented immune category (71 transcripts out of the 297 are contributing to the major transcription pattern, see Additional file 1 Table S16.) and also induction in the NFκB pathway following the p53 inactivation (Additional file 1 Table S16). 10 of the 14 secreted molecules that comprise "tumor-forming" genetic signature according to Milyavsky et al. [21], appear already at point 6 as significantly positively contributing to the "tumor-forming" signature. Moreover KEGG analysis reveals the biggest group of the induced transcripts participating in the cytokine-cytokine receptor interaction (19 induced as compared to the 5 reduced), including cytokines and chemokines participating in cancer development [36]. 6 of the 19 induced transcripts are known to be regulated by NFκB [36].

Remarkably, the third transcription pattern of the trajectory 1-5-7-8-11, that identifies the most contributing transcripts in the switch from point 8 to point 11, reveals contraction of the immune response category and NFκB network as compared to the point 6 of the trajectory 156 (see Additional file 1 Table S12 and S16). KEGG analysis identifies only 3 induced transcripts participating in the cytokine-cytokine receptor interaction pathway. Only 4 of the 14 secreted molecules that comprise "tumor-forming" genetic signature according to Milyavsky et al. [21] appear at point 11 as significantly positively contributing to the "tumor-forming" signature.

The general contraction of the signaling pathway category, as identified by the major transcription pattern, that includes reduction in the one of the NFκB regulators - MAPK pathway, may explain a drop of the NFκB network activity after p53 inactivation at point 11. The contraction of the signaling network is one of the dominant processes that contribute to the process of transformation in the trajectories 1-5-7-8-11 and 1-5-7-8-10-12. NFκB down regulation might be a by-product of the overall contraction in signaling. We suggest that RAS induction at point 12, on the one hand, rescues the reduction of the NFκB pathway and renews the induced expression of chemokines and the cytokines, that contribute to cell proliferation and comprise the "tumor-forming" genetic signature according to Milyavsky et al. On the other hand, H-RAS activation did not change the major transcription pattern that includes reduced signal transduction. Moreover, as we indicated earlier, the reduction in the MAPK signaling that removes negative feedback control over the H-RAS oncogene might enable it to establish the cancer phenotype. Therefore we conclude that the appropriate combination of changes in several networks was needed in order to enable "the tumor signature" gene products to generate a tumor phenotype.