Increased entropy of signal transduction in the cancer metastasis phenotype
- Andrew E Teschendorff^{1}Email author and
- Simone Severini^{2}
DOI: 10.1186/1752-0509-4-104
© Teschendorff and Severini; licensee BioMed Central Ltd. 2010
Received: 12 April 2010
Accepted: 30 July 2010
Published: 30 July 2010
Abstract
Background
The statistical study of biological networks has led to important novel biological insights, such as the presence of hubs and hierarchical modularity. There is also a growing interest in studying the statistical properties of networks in the context of cancer genomics. However, relatively little is known as to what network features differ between the cancer and normal cell physiologies, or between different cancer cell phenotypes.
Results
Based on the observation that frequent genomic alterations underlie a more aggressive cancer phenotype, we asked if such an effect could be detectable as an increase in the randomness of local gene expression patterns. Using a breast cancer gene expression data set and a model network of protein interactions we derive constrained weighted networks defined by a stochastic information flux matrix reflecting expression correlations between interacting proteins. Based on this stochastic matrix we propose and compute an entropy measure that quantifies the degree of randomness in the local pattern of information flux around single genes. By comparing the local entropies in the non-metastatic versus metastatic breast cancer networks, we here show that breast cancers that metastasize are characterised by a small yet significant increase in the degree of randomness of local expression patterns. We validate this result in three additional breast cancer expression data sets and demonstrate that local entropy better characterises the metastatic phenotype than other non-entropy based measures. We show that increases in entropy can be used to identify genes and signalling pathways implicated in breast cancer metastasis and provide examples of de-novo discoveries of gene modules with known roles in apoptosis, immune-mediated tumour suppression, cell-cycle and tumour invasion. Importantly, we also identify a novel gene module within the insulin growth factor signalling pathway, alteration of which may predispose the tumour to metastasize.
Conclusions
These results demonstrate that a metastatic cancer phenotype is characterised by an increase in the randomness of the local information flux patterns. Measures of local randomness in integrated protein interaction mRNA expression networks may therefore be useful for identifying genes and signalling pathways disrupted in one phenotype relative to another. Further exploration of the statistical properties of such integrated cancer expression and protein interaction networks will be a fruitful endeavour.
Background
The statistical study of biological networks has shed substantial novel insights into how biological networks are organised and how this organisation may relate to cellular function [1]. An important class of biological networks are protein interaction networks (PINs), as determined from high-throughput interaction screens and literature curation [2–4]. Edges in these networks mainly describe interactions of proteins that are part of the same physical complex or posttranslational modifications mediating signal transduction flows. Topological studies of these PINs have shown that they have important properties such as scale-freeness of the degree distribution, characterised by most proteins having small degree constrasting with a small yet significant number of highly connected nodes, called hubs [1]. Hubs have been shown to encode for essential genes in the context of yeast knock-out experiments [5] and are also frequently altered in cancer [6, 7], highlighting that topological properties of these networks may carry important biological information. More recently, hierarchical modularity and betweenness centrality are other network properties that have been actively investigated in the context of biological networks [8, 9]. Specifically, the observation that proteins of high betweenness centrality tend to be also essential [9] and may also be frequently altered in cancer [10, 11] suggests that proteins of high betweenness centrality are also important for normal cellular physiology. However, it is recognised that wellknown disease genes are more frequently studied leading to literature bias and hence to a skew in the topological properties of such genes [12].
Existing PINs are also largely incomplete and therefore only represent very crude models of protein interactions [2]. Nevertheless they have provided useful structural models and their integration with gene expression data has led to novel insights, also in cancer genomics where they have helped identify candidate gene modules driving disease progression and novel diagnostic/prognostic markers [11, 13]. A distinct advantage of the integrative PIN-mRNA expression approach is that it can overcome some of the inherent problems associated with analysing gene expression data on its own, such as poor signal to noise ratios and the difficulty of interpreting gene expression signatures. Indeed, integrative PIN-mRNA expression studies have helped tease out relevant patterns of expression variation in the contextual framework of signalling pathways and protein complexes [14, 15]. Another potential advantage of using PINs is in relation to expression correlation networks to distinguish between direct and indirect correlations. For example, partial correlations and Graphical Gaussian Models have been proposed for deriving relevance networks where edges represent only potential direct influences between genes [16]. An alternative to partial correlations is to use the structural PIN to remove the (long-distance) indirect correlations between genes.
So far though the focus has been on the identification of specific subnetworks that may be causally involved in disease progression [11, 13, 17–19]. Much less has been done on the exploration of statistical properties of networks underlying the cancer phenotype with the exception of [12] where they performed a comprehensive analysis of cancer networks by first selecting genes differentially expressed between cancer and normal tissue (the "seeds") and then constructing the networks as the nearest-neighbor PIN expansion of these seeds. It has become clear however that using PIN information a posteriori does not make full justice of the potential offered by a genuine integrative PIN-mRNA expression analysis. Indeed, it is increasingly recognised that approaches based on single-gene differential expression analysis are too simplistic and have been gradually superseeded by network-based methods [11, 13]. A deeper understanding of the statistical properties of the underlying cancer networks is therefore important, as this may guide the choice of network metrics to use for better identifying signalling pathways or functional modules involved in disease progression.
In this work we investigate statistical properties [20] of integrated PIN-mRNA expression networks with the aim of identifying network properties that vary significantly between different cancer phenotypes. In particular, we hypothesized that since the frequency of genomic alterations is generally associated with a poor prognosis phenotype [21, 22], that this could be reflected by disruptions in the integrated PIN-mRNA expression networks, which would manifest as increases in the randomness of the local correlation patterns. To this end, we here introduce a local entropy measure that quantifies such randomness and demonstrate that (i) entropy is increased in primary tumours that metastasize (henceforth "metastatic phenotype"), (ii) that it better characterises the metastatic phenotype than other metrics which do not quantify randomness, and (iii) that this entropy measure can be used to discover genes and associated functional modules involved in cancer progression. To our knowledge this constitutes the first demonstration that metastatic cancer is characterised by a higher degree of randomness in the underlying signal transduction patterns and that this effect can be measured from integration of gene expression data with existing PINs.
Methods
The protein interaction network
We downloaded the Human Protein Reference Database (HPRD) interaction network [2] from Pathway Commons http://www.pathwaycommons.org The HPRD network has been manually extracted from the literature by expert biologists who read, interpret and analyze the published data. Protein interactions in this network include physical stable interactions such as those defining protein complexes, as well as transient interactions such as post-translational modifications and enzyme reactions found in signal-transduction pathways, including 20 highly curated immune and cancer signalling pathways ("NetPath"). The HPRD network used here consisted of 8396 proteins with 36877 documented interactions.
Breast cancer gene expression data sets
We downloaded the normalised data of four breast cancer gene expression data sets from GEO http://www.ncbi.nlm.nih.gov/geo/[23–26]. These data sets were all profiled with the Affymetrix platform. We refer to these data sets as "EMC" (Erasmus Medical Centre) [23], "Frid" [24], "LoiUnt" [25] and "Mainz" [26]. In the case of [25] we used the data from the untreated patient population. As clinical endpoint we used a binary outcome variable indicating if the tumour metastasized or not. This was possible for the Mainz, Frid and LoiUnt data sets. For the EMC cohort, we used recurrence as a surrogate. The number of samples and their breakup according to clinical outcome in each study were 286 (107 recurred, 179 no recurrence) for "EMC", 200 (46 metastasized, 154 no metastasis) for "Mainz", 129 (27 metastasized, 102 no metastasis) for "Frid" and 125 (28 metastasized, 97 no metastasis) for "LoiUnt". To simplify the jargon we also refer to samples that recurred (EMC cohort) as metastatic samples.
Integrated PIN-mRNA expression networks
Next, we built integrated PIN-mRNA expression networks from the HPRD PIN and a gene expression data matrix. From the PIN we first extracted the proteins for which there were corresponding gene expression profiles. Typically, this resulted in a reduced PIN over approximately 7000 common genes/proteins with a 1-1 correspondence between nodes in the PIN and gene expression profiles. The reduced PIN had several connected components, the maximally connected component containing approximately 95% of the original nodes. Let P_{ ij }denote the adjacency matrix of the PIN and let N denote the number of nodes.
Since we are interested in investigating differences in network properties between metastatic and non-metastatic cancer we constructed correlation expression networks by separately computing gene pairwise Pearson correlations across metastatic and non-metastatic cancers. Since typically the number of metastatic samples is less than that of non-metastatic tumours, to ensure that results are independent of system-size, we subselected a number of non-metastatic tumours at random from the original set. Thus, let denote the Pearson correlation of genes i and j across the n_{ M }metastatic samples, and let denote the corresponding correlation across n_{ M }randomly selected non-metastatic samples. We then assign to each pair of genes an edge weight w_{ ij }= C_{ ij }. Finally, since we are interested in the patterns of local correlation we set w_{ ij }= 0 whenever P_{ ij }= 0. Thus, we assign nonzero weights only to edges that are present in the corresponding PIN. The resulting network defines the integrated PIN-mRNA network, and while the structural networks are identical for the metastatic and non-metastic phenotypes, the values of the edge weights will differ.
The stochastic information flux matrix
so that ∑_{j∈N(i)}p_{ ij }= 1. We note that while this defines a stochastic matrix, it is not doubly stochastic, i.e ∑_{i∈N(j)}p_{ ij }≠ 1, because in general p_{ ij }≠ p_{ ji }.
We interpret this probability distribution as follows. Strong positive correlations in the PIN very likely describe interactions that are necessary for the cell to carry out a specific modular function. We call this a "positive information flux" interaction to indicate that this interaction describes propagation of an activating signal. Thus, although the positive interaction could be between members of a given complex without explicitly representing signal transduction flow, we nevertheless argue that the positive correlation is necessary for the protein complex to carry out a specific function within the (signal transduction) pathway in which it operates. Because of this, we interpret negative correlations as "negative information flux" interactions to indicate that complexes may be in an inactive state or to represent inhibitory interactions. Thus, these interactions are given the smallest "positive flux" weights. Weak or near-zero correlations are given a larger positive flux (p_{ ij }~ 0.5) than negative interactions (p_{ ij }~ 0) because we argue that these interactions have less evidence for inducing an inactivating signal. Thus, the integrated PIN-mRNA networks with the edge weights defined by p_{ ij }as in equation (1) describe positive flux interactions (as measured by gene-gene correlations in expression) subject to the structural constraint of the PIN. Applying this procedure to the two C^{ (M) } and C^{ (N) } correlation matrices, yields two integrated PIN-mRNA networks, one for the metastatic phenotype with stochastic matrix , and one for the non-metastatic phenotype with stochastic matrix Since the number of metastatic samples is less than that of non-metastatic tumours (true for all cohorts considered here), and since we constructed the weighted networks using the same number of samples from each phenotype, we have freedom in the choice of non-metastatic samples to use. We exploit this freedom by making in each cohort 10 random selections of non-metastatic samples (in each case selecting the same number of samples as there are of metastatic cases), called "bootstraps", which allows the robustness of our results to be evaluated.
We point out that gene expression gives only a rough proxy for protein activity and that therefore our integrated PIN-mRNA networks and stochastic information flux matrices only represent very crude models of signal transduction flow. In particular, we expect many egdes to be assigned low fluxes when in fact corresponding protein levels and activities may be highly correlated. Thus, it is clear that our networks do not represent truly integrated networks. However, in the absence of large-scale protein expression and phosphorylation data, specially in relation to cancer phenotypes, relying on correlations in mRNA expression between genes whose protein products have been documented to physically interact (including postranslational modifications) is one viable alternative.
Entropy of information flux distribution
Thus, in the extreme case where all the flux is constrained along one interaction, p_{ ij }= δ_{ ij }(where δ_{ ij }is the Kronecker-delta), S_{ i }= 0, while in the case where the flux is distributed equally across all edges i.e if p_{ ij }= 1/k_{ i }, the entropy attains its maximum value S_{ i }= 1.
We denote by S the distribution of local entropies across the whole network, i.e S = {S_{1},..., S_{ N }} where N is the number of nodes in the network. Since the stochastic matrix p_{ ij }is defined separately for metastatic and non-metastatic phenotypes we obtain separate distributions of local entropies S^{ (M) } and S^{ (N) }.
It is important to point out that the sampling properties of the entropy measure as defined here will depend on the degree of the node. To see this, consider a node of degree 10, with half of the neighbors having p_{ ij }= 1/5 while the other half have p_{ ij }= 0. Then, the entropy is S = log 5/log 10 ≈ 0.70. Consider now another node of degree 100, again with half of the neighbors having p_{ ij }= 1/50 and the rest p_{ ij }= 0. In this case, the entropy is S = log 50/log 100 ≈ 0.85. Thus, while the two nodes have, modulo the degree, the same stochastic flux distribution, the entropies are widely different with the higher connected node exhibiting much higher entropy. A consequence of this is that the sampling properties and in particular the expected variance of the entropy will decrease with increasing degree. This means therefore that when comparing entropy measures between the two phenotypes, the comparison must be done on a degree-by-degree basis. In particular, the paired Wilcoxon (non-parametric) test used above to evaluate differences in the global entropy distributions performs the comparison on a node by node basis and therefore circumvents the degree-dependence problem posed by the entropy measure.
Feature selection based on local differential entropy
- 1.
Randomise the gene expression profiles of each sample, reconstruct stochastic flux matrices for the two phenotypes and calculate differential entropy values.
- 2.
For nodes with the same degree, estimate sampling variance. Generally, for nodes of higher degree (i.e k_{ i }> 50), there were too few nodes to estimate sampling variance, so high-degree nodes were pooled together and a single variance estimate obtained for each pool. In particular, nodes with degrees in the following ranges were pooled in corresponding bins: (50, 75), (75, 100), (100, 150), (150, max k_{ i }). This procedure was justified as we observed that the sampling variance was fairly constant for high-degree nodes.
- 3.
Repeat steps 1 & 2 a total of 100 times to obtain 100 variance estimates for each degree or pooled degree.
- 4.
Obtain smoothed standard error estimates by averaging the standard errors over the 100 randomisations.
- 5.We observed that the resulting standard deviation, σ, was a decreasing monotonic function of the degree of the node. In particular, it took the power-law form(4)
- 6.
Estimate the parameters a and b using non-linear least squares [29].
- 7.
For node i of degree k_{ i }and observed differential entropy value dS_{ i }we computed a P-value by comparing dS_{ i }to the value expected under the null given by a Gaussian of mean 0 and standard deviation σ (k_{ i }) with the parameters as estimated in the previous step.
- 8.
Finally, nodes are ranked according to the significance of the P-values, and subsequent FDR estimation is applied [30].
Average local correlation
and use these metrics in the comparisons to entropy. In comparing all the metrics between the two phenotypes, the P-values obtained from the paired Wilcoxon rank sum tests are directly comparable, since the same number of values are being compared for each choice of metric.
Results
Using entropy to characterise the metastatic cancer phenotype
Based on the observation that more aggressive metastatic breast cancer is associated with a higher genomic instability (i.e a higher frequency of genomic alterations) [21, 22, 24, 31], we sought to investigate if this could be reflected in integrated PIN-mRNA expression networks. Specifically, we hypothesized that genomic (and epigenomic) alterations would lead to disruptions in the dynamics of information flow on protein interaction networks, and that this disruption would manifest itself as a higher degree of randomness in the patterns of local gene expression correlations.
Thus, nodes with significant changes in S between two phenotypes represent points in the gene network where the degree of randomness in the information flow changes.
Increased entropy in metastatic breast cancer
To test our hypotheses we first integrated a highly curated network of protein interactions (from the Human Protein Reference Database) [2] with a high-quality cancer gene expression data set from 286 breast cancer patients [23], of which 107 relapsed (Methods). There were 7279 protein coding genes with expression profiles and with representation in the HPRD PIN. Of these, 167 were isolated nodes, which were therefore removed from the analysis, leaving n = 7112 nodes. The PIN is sparse containing a total of only 31678 edges, representing 0.01% of the maximum possible number of edges. The degree distribution exhibited long-tails with the highest degree node having 222 neighbors. About 60% of nodes had a degree less or equal than 5, while ~75% of nodes had a degree less than 10. To ensure comparibility of correlation values, we sampled 107 samples at random from the 179 non-metastatic samples, to build the corresponding non-metastatic PIN-mRNA network. The metastatic PIN-mRNA network was constructed using all 107 metastatic samples.
Differences in local entropy between the metastatic and non-metastatic network were not substantial in absolute terms, however, the paired test results clearly showed statistical significance and a trend towards increased entropy. To obtain a handle on the statistical significance of individual entropy changes, we first noted that the variance in differential entropy values was dependent on the degree of the node (Additional file 1A), which was not unexpected (Methods). Specifically, high degree nodes are generally less likely to exhibit large entropy changes. To overcome this problem we derived degree-dependent variance estimates of the differential entropy, which allowed specific P-values to be derived (Methods). We observed that the variance decreased with the degree of the node in the form of a power-law allowing P-values to be estimated for nodes of any degree (Figure 2C). Confirming the paired test analysis, individual P-values exhibited a clear skew towards 0 suggesting many significant changes (Figure 2D). Among the top 200 ranked nodes (stringent FDR < 0.001) (Additional file 2), 133 showed entropy increases (Binomial test P < 10^{-6}). Using the P-values to rank the genes also confirmed that the selected nodes were now not skewed towards those of low-degree (Additional file 1B).
Because these results were obtained on the selected nodes of high degree (k_{ i }≥ 10), we asked if this result would remain significant had we included all nodes. We thus repeated the analysis using all 5592 nodes with at least two neighbors (entropy undefined for k_{ i }< 2). This showed that entropy was also significantly higher in metastatic breast cancer (Wilcoxon paired test P < 10^{-100}), despite the fact that non-hubs made up 75% of the network. Thus, while nodes of lower degree generally also showed increases in entropy we focused our analysis on the higher-degree nodes (k_{ i }≥ 10) as these are more prone to disruption in cancer [6, 7].
Validation in other breast cancer cohorts
It is natural to ask if the increase in entropy seen across all cohorts is due to the same genes or instead if in each cohort the subset of genes exhibiting increases is different. To address this we selected the 133 genes which showed the most significant increases in entropy in the EMC data set (out of the top 200 genes with most significant changes (FDR < 0.001), 133 showed increases) and verified that these generally also showed more significant increases in entropy in the validation sets (Figure 3B). This supports the view that there is gene-wise consistency in the metastasis-associated entropy increases seen across all four cohorts.
Randomness as an intrinsic property of the metastatic phenotype
Given the robust finding of increased entropy in metastatic breast cancer, we next asked if other local measures could discriminate between the two cancer phenotypes. Specifically, we compared entropy against the average local correlation, that is, the mean correlation of a node with its nearest neighbors, as well as the mean local absolute correlation (Methods).
One tailed paired Wilcoxon test P-values comparing distribution of local measures of disruption in information flow in metastatic and non-metastatic PIN-mRNA networks across four different breast cancer cohorts.
Metric | EMC (n= 1903) | Mainz (n= 1900) | Frid (n= 1270) | LoiUnt (n= 1899) |
---|---|---|---|---|
| 0.001 | 0.01 | 0.54 | 2e-13 |
| 1e-16 | 5e-10 | 1e-12 | 3e-70 |
dS | 2e-16 | 6e-13 | <1e-100 | <1e-100 |
Biological significance
Gene Set Enrichment Analysis of the top 200 genes of which 133 showed increases in entropy.
Pathway | P (dS > 0) | P(dS < 0) | Example genes |
---|---|---|---|
Apoptosis | 6e-7 | n.s | FAS, TP53, TP53BP2, BCL2, BCL2L1, CASP3 |
IL2 | 6e-4 | n.s | IL2RB, HRAS, FOS, SOS1, LCH, SHC1 |
AR | 8e-4 | n.s | SNX1, RBX1, BSG, NOTCH1, RAB27A, TAF9 |
IGF1 | 9e-4 | n.s | IGF1, IRS1, IGFBP7 |
Since GSEA was performed on the top 200 genes using the 1903 gene list as the background reference, the enrichment of any biological pathways and in particular those of apoptosis and IL2/IGF is very significant and not caused by the selection of nodes of high degree. As a control and to confirm this further, we observed no enrichment of any biological pathway among the genes showing decreases in entropy (Table 2). Although this could be explained in part by the smaller number of genes showing decreases (67 vs 133), we only obtained significance of one biological term (regulation of translation initiation) when we expanded the 67 gene set to the 133 with the most significant entropy decreases.
It is likely that the increases in entropy associated with specific genes are caused by a higher frequency of underlying mutations, losses or epigenetic silencing. Using copy-number and sequencing data of a large cohort of 171 breast tumours (38 metastasized, 130 no-metastasis) [31] we observed an increased frequency of losses in those tumours that metastasized for many of the top ranked genes including BCL2 (18% vs 13%), IGFBP7 (11% vs 6%), IGF1 (8% vs 2%) (Additional file 3). Similarly, TP53 was mutated in 31% of tumours that metastasized against 17% in those tumours that did not. Of the top 133 genes exhibiting increases in entropy, 90 could be mapped to the array CGH study and of these, 68 (75%) showed more frequent losses in primary tumours that metastasized (Additional file 4). In contrast, 65% of genes exhibiting no significant changes in entropy were more frequently lost in metastatic cases (Additional file 4). Thus, genes with significant increases in entropy were approximately 1.6 times more likely to undergo more frequent losses in disseminating tumours (Fisher test P = 0.08).
Discussion
In this work we have explored a specific statistical property of integrated protein interaction and cancer mRNA expression networks. Our hypothesis was that the metastatic cancer phenotype is characterised by a higher level of randomness in the sense of a more disordered local gene expression correlation pattern. In other words, since one may view the protein interactions as imposing to some degree constraints on the allowed local gene expression patterns [9, 11], we posited that these constraints would be violated to a higher extent in breast tumours which exhibit a more invasive phenotype. To test this we introduced a stochastic matrix on the PIN, modeling positive information fluxes around any given node, and from it we defined a local entropy measure which quantifies the degree of randomness of this flux distribution. Using this entropy measure we could show that the metastatic phenotype is indeed characterised by a higher level of entropy independently of the breast cancer cohort considered. While entropy changes were not large in absolute terms due to the focus on nodes of relatively high degree (degree ≥ 10), we were able to show using an empirically derived degree dependent variance estimator, that the changes in entropy were significant with a clear skew towards higher entropy in the metastatic phenotype. Moreover, we were able to show that genes exhibiting significant increases in local entropy in one data set, did so also in the independent cohorts, supporting the view that the disruptions in local expression patterns that we have found are biologically genuine and of relevance to the metastatic phenotype. It is interesting though that the increase in entropy was less marked when analysis was extended to all nodes of degree larger than 1, indicating that low-degree nodes showed less consistent directional changes, in line with the expectation that low-degree nodes may be functionally less important.
We also observed significant enrichment of relevant biological pathways, specially those with tumour suppressor functions, among genes showing increases in entropy in the metastatic phenotype, but not so for genes showing decreases in entropy. This supports the view that it is increases in randomness that may be of most functional consequence. A higher randomness in the co-expression patterns of genes important to the apoptotic cascade may indicate that this pathway is less functional in primary tumours that eventually metastasize. Similarly, our findings suggest that increased randomness in the patterns of expression of the IL2 pathway, a well-known tumour suppressor pathway mediating tumour inhibition through formation of natural killer cells [34], is a critical determinant of breast cancer metastasis. This confirms other reports that immune-response pathways are important in prognosis prediction of breast cancer [35, 36]. While the IGF-pathway has been implicated as a mediator of breast cancer progression and is also involved in the regulation of apoptosis [37–39], our results suggest a novel important role for IGFBP7 and IGF-signalling.
Generally, there were other instances where the identified genes formed interlinked subnetworks implicating whole gene modules that show increased entropy. It will therefore be interesting to extend our analysis to metrics that quantify the degree of randomness across whole subnetworks as opposed to single nodes. Among the larger modules, we identified one associated with the cell-cycle (CDC2, CDC20, MYBL2, MAD2L1) consistent with many other studies showing the importance of cell-cycle genes in breast cancer prognosis [32, 33]. Of note, we also found local entropy increases in subnetworks surrounding genes involved in invasion (SPARC, MMP1, MMP2, MMP3). SPARC expression itself has been associated with clinical outcome in breast cancer in a large number of studies, for example see [40, 41]. In this regard, it is worth pointing out that we have rediscovered these important cell-cycle and extracellular matrix modules without directly comparing expression levels between metastatic and non-metastatic breast cancer, but rather by using entropy to compare the randomness of expression patterns within each phenotype. Thus, our proposed entropy metric is able to identify biological mechanisms related to the cancer hallmarks and a poor prognosis phenotype.
Perhaps most remarkably is the fact that entropy outperformed other local measures of information flux, in the sense that entropy better characterized the metastatic phenotype. One competing measure we considered was the average nearest-neighbor correlation in gene expression, which clearly was less effective in distinguishing the two cancer phenotypes. Similarly, entropy characterised the metastatic phenotype more closely than the effect size statistic provided by a t-test. The t-test is similar to the method used in [11] to identify features disrupted in poor prognosis breast cancer. In that study, the authors identified BRCA1 as one of the top genes exhibiting changes in the mean local correlation. In terms of differential entropy however, BRCA1 was not highly ranked because differential entropy only measures the change in the degree of randomness of the local correlation distribution, while as shown in [11]BRCA1 exhibits a fairly uniform switch from positive to negative correlations with its interacting partners, a switch which is therefore not necessarily associated with an increase in disorder or randomness. This example highlights the fact that although differential entropy can miss important genes, that it constitutes a metric that is complementary to the one used in [11]. Therefore, it will be interesting to compare the alternative metrics in relation to the ranking of nodes. However, the result obtained here that increased entropy is a better distinguishing feature of the metastatic phenotype supports the view that changes in the randomness of the local correlation patterns may be more relevant than changes in mean correlation levels. To confirm this further, we took the mean of the absolute correlations as another metric and compared it to entropy. In line with the expectation that increases in entropy are generally caused by drops in absolute correlation values, we observed that changes in the mean absolute correlation, while not outperforming entropy, was a good surrogate for it. It will therefore also be interesting to compare our entropy metric to other network metrics that quantify randomness or to metrics that combine randomness and changes in mean correlation levels.
The observed increase in entropy in the metastatic breast cancer networks may have a clear biological interpretation, as genomic and epigenomic alterations are more frequent in primary tumours that metastasize [21, 24, 31]. This increased frequency of alterations is true for losses, mutations and amplifications (high-level gains) but does not necessarily hold for low-level gains [24, 31]. It was therefore encouraging to observe that genes undergoing the most significant increases in entropy were also more likely to undergo more frequent losses in the tumours that metastasized. Although this association was only marginal, this could be because the genomic alteration data came from a different cohort. Another potential difficulty in linking entropy changes to gene alterations is that a substantial number of alterations are not copy-number changes but instead may represent point mutations, rearrangements, or epigenetic aberrations, and large scale profiling of these types of alterations is still ongoing. Thus, the question if differential entropy selects for genes that are more frequently altered and how the directionality of change might relate to the type of alteration is still an open question that we hope to address in future studies. An easier context in which to study this would be to compare integrated PIN-mRNA networks for cancer and corresponding normal tissue, as normal tissue exhibits significantly much lower levels of alterations. However, this also requires relatively large normal and cancer tissue data sets profiled on the same platform and ideally as part of the same study. While in breast cancer such data sets are only now becoming available, it will be interesting in the meantime to pursue this question in other cancers.
Conclusions
Metastatic breast cancer is characterised by an increase in the randomness of the local expression correlation patterns. The entropy metric proposed here may therefore be a useful tool for identifying genes and signalling pathways implicated in the metastatic process of other cancers. We foresee that further network-theoretical studies of integrated PIN-mRNA expression networks in the context of cancer genomics will be fruitful.
Declarations
Acknowledgements
AET was supported by a Heller Research Fellowship. SS was supported by a Newton International Fellowship.
Authors’ Affiliations
References
- Barabasi AL, Oltvai ZN: Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004, 5: 101-113. 10.1038/nrg1272View ArticlePubMedGoogle Scholar
- Prasad TS, Kandasamy K, Pandey A: Human protein reference database and human proteinpedia as discovery tools for systems biology. Methods Mol Biol. 2009, 577: 67-79. full_textView ArticlePubMedGoogle Scholar
- Stelzl U, Wanker EE: The value of high quality protein-protein interaction networks for systems biology. Curr Opin Chem Biol. 2006, 10: 551-558. 10.1016/j.cbpa.2006.10.005View ArticlePubMedGoogle Scholar
- Brown KR, Jurisica I: Online predicted human interaction database. Bioinformatics. 2005, 21: 2076-2082. 10.1093/bioinformatics/bti273View ArticlePubMedGoogle Scholar
- Jeong H, Mason SP, Barabási AL, Oltvai ZN: Lethality and centrality in protein networks. Nature. 2001, 411: 41-42. 10.1038/35075138View ArticlePubMedGoogle Scholar
- Hornberg JJ, Bruggeman FJ, Westerhoff HV, Lankelma J: Cancer: a systems biology disease. Biosystems. 2006, 83: 81-90. 10.1016/j.biosystems.2005.05.014View ArticlePubMedGoogle Scholar
- Jonsson PF, Bates PA: Global topological features of cancer proteins in the human interactome. Bioinformatics. 2006, 22: 2291-2297. 10.1093/bioinformatics/btl390PubMed CentralView ArticlePubMedGoogle Scholar
- Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, et al.: Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature. 2004, 430: 88-93. 10.1038/nature02555View ArticlePubMedGoogle Scholar
- Yu H, Kim PM, Sprecher E, Trifonov V, Gerstein M: The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics. PLoS Comput Biol. 2007, 3: e59- 10.1371/journal.pcbi.0030059PubMed CentralView ArticlePubMedGoogle Scholar
- Ozgür A, Vu T, Erkan G, Radev DR: Identifying gene-disease associations using centrality on a literature mined gene-interaction network. Bioinformatics. 2008, 24: i277-i285. 10.1093/bioinformatics/btn182PubMed CentralView ArticlePubMedGoogle Scholar
- Taylor IW, Linding R, Warde-Farley D, Liu Y, Pesquita C, et al.: Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nat Biotechnol. 2009, 27: 199-204. 10.1038/nbt.1522View ArticlePubMedGoogle Scholar
- Platzer A, Perco P, Lukas A, Mayer B: Characterization of protein-interaction networks in tumors. BMC Bioinformatics. 2007, 8: 224- 10.1186/1471-2105-8-224PubMed CentralView ArticlePubMedGoogle Scholar
- Chuang HY, Lee E, Liu YT, Lee D, Ideker T: Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007, 3: 140- 10.1038/msb4100180PubMed CentralView ArticlePubMedGoogle Scholar
- Rapaport F, Zinovyev A, Dutreix M, Barillot E, Vert JP: Classification of microarray data using gene networks. BMC Bioinformatics. 2007, 8: 35- 10.1186/1471-2105-8-35PubMed CentralView ArticlePubMedGoogle Scholar
- Ulitsky I, Shamir R: Identification of functional modules using network topology and high-throughput data. BMC Syst Biol. 2007, 1: 8- 10.1186/1752-0509-1-8PubMed CentralView ArticlePubMedGoogle Scholar
- Schäfer J, Strimmer K: An empirical bayes approach to inferring large-scale gene association networks. Bioinformatics. 2005, 21: 754-764. 10.1093/bioinformatics/bti062View ArticlePubMedGoogle Scholar
- Tuck DP, Kluger HM, Kluger Y: Characterizing disease states from topological properties of transcriptional regulatory networks. BMC Bioinformatics. 2006, 7: 236- 10.1186/1471-2105-7-236PubMed CentralView ArticlePubMedGoogle Scholar
- Pujana MA, Han JD, Starita LM, Stevens KN, Tewari M, et al.: Network modeling links breast cancer susceptibility and centrosome dysfunction. Nat Genet. 2007, 39: 1338-1349. 10.1038/ng.2007.2View ArticlePubMedGoogle Scholar
- Nibbe RK, Koyutürk M, Chance MR: An integrative -omics approach to identify functional subnetworks in human colorectal cancer. PLoS Comput Biol. 2010, 6: e1000639- 10.1371/journal.pcbi.1000639PubMed CentralView ArticlePubMedGoogle Scholar
- Barrat A, Barthelemy M, Vespignani A: Dynamical Processes on Complex Networks. CUP. 2008,Google Scholar
- Carter SL, Eklund AC, Kohane IS, Harris LN, Szallasi Z: A signature of chromosomal instability inferred from gene expression profiles predicts clinical outcome in multiple human cancers. Nat Genet. 2006, 38: 1043-1048. 10.1038/ng1861View ArticlePubMedGoogle Scholar
- Negrini S, Gorgoulis VG, Halazonetis TD: Genomic instability--an evolving hallmark of cancer. Nat Rev Mol Cell Biol. 2010, 11: 220-228. 10.1038/nrm2858View ArticlePubMedGoogle Scholar
- Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, et al.: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005, 365: 671-679.View ArticlePubMedGoogle Scholar
- Chin K, DeVries S, Fridlyand J, Spellman PT, Roydasgupta R, et al.: Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell. 2006, 10: 529-541. 10.1016/j.ccr.2006.10.009View ArticlePubMedGoogle Scholar
- Loi S, Haibe-Kains B, Desmedt C, Lallemand F, Tutt AM, et al.: Definition of clinically distinct molecular subtypes in estrogen receptor-positive breast carcinomas through genomic grade. J Clin Oncol. 2007, 25: 1239-1246. 10.1200/JCO.2006.07.1522View ArticlePubMedGoogle Scholar
- Schmidt M, Böhm D, von Törne C, Steiner E, Puhl A, et al.: The humoral immune system has a key prognostic impact in node-negative breast cancer. Cancer Res. 2008, 68: 5405-5413. 10.1158/0008-5472.CAN-07-5206View ArticlePubMedGoogle Scholar
- Derrida B, Flyvbjerg H: Statistical properties of randomly broken objects and of multivalley structures in disordered systems. J Phys A:Math Gen. 1987, 20: 5273-5288. 10.1088/0305-4470/20/15/039.View ArticleGoogle Scholar
- Barthelemy M, Gondran B, Guichard E: Spatial structure of internet traffic. Physica A. 2003, 319: 633-642. 10.1016/S0378-4371(02)01382-1.View ArticleGoogle Scholar
- Bates DM, Watts DG: Nonlinear Regression Analysis and Its Applications. 1988, Wiley,View ArticleGoogle Scholar
- Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 2003, 100: 9440-9445. 10.1073/pnas.1530509100PubMed CentralView ArticlePubMedGoogle Scholar
- Chin SF, Teschendorff AE, Marioni JC, Wang Y, Barbosa-Morais NL, et al.: High-resolution acgh and expression profiling identifies a novel genomic subtype of er negative breast cancer. Genome Biol. 2007, 8: R215- 10.1186/gb-2007-8-10-r215PubMed CentralView ArticlePubMedGoogle Scholar
- Teschendorff AE, Naderi A, Barbosa-Morais NL, Caldas C: Pack: Profile analysis using clustering and kurtosis to find molecular classifiers in cancer. Bioinformatics. 2006, 22: 2269-2275. 10.1093/bioinformatics/btl174View ArticlePubMedGoogle Scholar
- Sotiriou C, Wirapati P, Loi S, Harris A, Fox S, et al.: Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006, 98: 262-272. 10.1093/jnci/djj052View ArticlePubMedGoogle Scholar
- Pardoll D: Metastasis-promoting immunity: when t cells turn to the dark side. Cancer Cell. 2009, 16: 81-82. 10.1016/j.ccr.2009.07.007View ArticlePubMedGoogle Scholar
- Teschendorff AE, Miremadi A, Pinder SE, Ellis IO, Caldas C: An immune response gene expression module identifies a good prognosis subtype in estrogen receptor negative breast cancer. Genome Biol. 2007, 8: R157- 10.1186/gb-2007-8-8-r157PubMed CentralView ArticlePubMedGoogle Scholar
- Teschendorff AE, Caldas C: A robust classifier of high predictive value to identify good prognosis patients in er-negative breast cancer. Breast Cancer Res. 2008, 10: R73- 10.1186/bcr2138PubMed CentralView ArticlePubMedGoogle Scholar
- Neuberg M, Buckbinder L, Seizinger B, Kley N: The p53/igf-1 receptor axis in the regulation of programmed cell death. Endocrine. 1997, 7: 107-109. 10.1007/BF02778075View ArticlePubMedGoogle Scholar
- Samani AA, Yakar S, LeRoith D, Brodt P: The role of the igf system in cancer growth and metastasis: overview and recent insights. Endocr Rev. 2007, 28: 20-47. 10.1210/er.2006-0001View ArticlePubMedGoogle Scholar
- Yee D, Wood TL: The igf system in mammary development and breast cancer. J Mammary Gland Biol Neoplasia. 2008, 13: 351-352. 10.1007/s10911-008-9096-2View ArticlePubMedGoogle Scholar
- Naderi A, Teschendorff AE, Barbosa-Morais NL, Pinder SE, Green AR, et al.: A gene-expression signature to predict survival in breast cancer across independent data sets. Oncogene. 2007, 26: 1507-1516. 10.1038/sj.onc.1209920View ArticlePubMedGoogle Scholar
- Farmer P, Bonnefoi H, Anderle P, Cameron D, Wirapati P, et al.: A stroma-related gene signature predicts resistance to neoadjuvant chemotherapy in breast cancer. Nat Med. 2009, 15: 68-74. 10.1038/nm.1908View ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.