Regulation patterns in signaling networks of cancer

Background: Formation of cellular malignancy results from the disruption of fine-tuned signaling homeostasis for proliferation, accompanied by malfunctioning signals for differentiation, cell cycle and apoptosis. We wanted to observe central signaling characteristics on a global view of malignant cells which have evolved to selfishness and independence in comparison to their non-malignant counterparts that fulfill well-defined tasks in their sample.
Results: We investigated the regulation of signaling networks with twenty microarray datasets from eleven different tumor types and their corresponding non-malignant tissue samples. Proteins were represented by their coding genes and regulatory distances were defined by correlating the gene-regulation between neighboring proteins in the network (high correlation = small distance). In cancer cells we observed shorter pathways, larger extension of the networks, a lower signaling frequency of central proteins and links and a higher information content of the network. Proteins of high signaling frequency were enriched with cancer mutations. These proteins showed motifs of regulatory integration in normal cells which were disrupted in tumor cells.
Conclusion: Our global analysis revealed a distinct formation of signaling-regulation in cancer cells when compared to cells of normal samples. From these cancer-specific regulation patterns novel signaling motifs are proposed.


Background
Endogenous signal transduction in cancer cells is systematically disturbed to redirect the cellular decisions from differentiation and apoptosis to proliferation and, later, invasion [1]. Cancer cells acquire their malignancy through accumulation of advantageous gene mutations by which the necessary steps to malignancy are obtained [2]. These selfish adaptations to independence can be described as a result from an evolutionary process of diversity and selection [3]. We were interested to observe the resulting cellular signal transduction on a global view. Experimental high throughput methods such as gene expression profiling with microarrays enable investigating the pathogenic function of tumors on a mesoscopic level. Large-scale gene expression profiles were successfully used to predict clinical outcome [4,5] and improved risk estimation [6]. However these studies didn't relate genes and their expression to a functional context. To gain an understanding on a systems view, gene expression can be mapped onto cellular networks. Several studies have been reported that used gene expression data from microarrays to describe specific characteristics of signaling networks in cancer. Discriminative components of a protein-protein interaction network were identified by comparing gene expression patterns of metastatic and non-metastatic tumors in breast cancer and suited as risk markers for metastasis of breast cancer [7]. New genetic mediators for prostate cancer were found with networks that were reversely engineered from gene expression profiles [8]. Besides this, insights into evolutionary principles were gained by the analysis of gene expression profiles. Gene expression differences were used to define phylogenetic relationships of several Drosophila species [9] and a molecular clock for primates [10]. Furthermore, the regulation of signaling in yeast was investigated on a global scale to observe regulatory adaptation to the cellular environment. 
Yeast responded to exogenous signals by shorter regulatory cascades to enable fast signal propagation [11].
The aim of our work was to detect characteristic signaling properties of cancer cells on a global scale. We compared the regulation of signaling pathways in cancer cells with that in normal cells by mapping gene expression data of tumors and their corresponding non-malignant ("normal") samples onto a comprehensive protein-protein interaction network. To infer regulation principles in cellular signal transduction, we used a graph searching algorithm that tracked pathways with the highest correlation in regulation. We investigated twenty tumor datasets comprising acute myeloid leukemia, esophageal squamous cell, lung adeno- and renal clear cell carcinoma, breast, cervical, head-and-neck, oral-tongue, pancreas and prostate cancer, and vulva interstitial neoplasia. The investigated tumors showed shorter pathways but a larger extension of the network. The tumors displayed a lower frequency of central proteins and links and a higher information entropy (Shannon's information content) in their networks. These findings were embedded into a novel signal-regulation motif which was observed considerably more often in normal cells than in tumor cells (Figure 1). Similar to the study of Cui and co-workers [12], central proteins (hubs) were enriched with cancer mutations. We observed that these proteins showed higher regulatory integrity in the normal samples, whereas the tumor samples showed motifs of regulatory maintenance among the neighbors of hubs.

Constructing the signaling networks
We assembled our signaling network employing a comprehensive data repository of known protein-protein interactions from the literature (HPRD: Human Protein Reference Database [13,14], version 9 from April 13th, 2010). Proteins were represented by their coding genes and are also denoted as nodes of the networks in the following. Gene expression data of each cancer dataset (malignant cells) and the corresponding set of normal samples (non-malignant cells) were mapped onto the nodes of the network. Depending on the coverage of the probes on the microarray chips, the intersection with the HPRD network comprised 5574 to 8651 nodes including 559 to 706 receptors and 505 to 617 transcription factors (Table 1). Similar to Luscombe and co-workers, we assumed that signal propagation is most likely between two neighboring proteins in the network whose genes are highly co-regulated [11]. We calculated a protein-protein distance for each link (link-distance) from the co-regulation of the two interacting proteins, defined as one minus the absolute value of Pearson's correlation (Additional file 1: Supplemental Figure S1). The link-distances were higher (lower absolute correlation) in cancer cells compared to normal cells (average of average link-distances in normal: 0.34, and tumor: 0.52, P = 1.53E-05, Table 1). We defined pathways for each pair of receptor (signal-operator) and transcription factor (signal-receiver) by their shortest paths, yielding a range of 282,295 to 435,602 pathways for each of the investigated cancer datasets. The tumor cells showed a distinctly higher coverage of the original protein-interaction network for these pathways. Table 1 gives an overview of the network data for the different datasets we analyzed and also the network coverage of all receptor-transcription-factor pathways for the tumors and the reference samples. From these pathways we constructed specific networks for each tumor and reference sample.
For each tumor and normal sample, the constructed networks consisted only of those links and nodes that appeared at least once in their receptor-transcription-factor pathways. Links and nodes that did not appear were discarded (Figure 2 shows the number of nodes in all constructed networks of normal and cancer tissues). We were interested in whether these networks were specific for the respective tumor type. For this, we extracted all somatically mutated genes for specific cancer tissues from a database (COSMIC [15]) and tested whether our tumor networks contained genes which have been described specifically for the respective tumors. We performed enrichment tests (Fisher's exact tests) and found that all tumor networks showed a significant enrichment of their corresponding mutated tumor genes (Additional file 1: Supplemental Table S1).

Figure 1 Comparative cancer motif. Two different signals are transmitted from two receptors (R1 and R2) to a transcription factor (TF). Green and grey arrows indicate the pathways for normal and cancer cells, respectively. The motif was defined for each pair of pathways (from R1 to TF, and from R2 to TF) such that the pathways of normal cells share at least one common link whereas the pathways for cancer cells do not share any link.
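The enrichment test described above can be illustrated with a minimal sketch. It computes a one-sided Fisher's exact test as the upper tail of the hypergeometric distribution. The function name, the argument names and the choice of the background gene set are assumptions made for illustration; the text does not state which background or which sidedness was used.

```python
from math import comb

def enrichment_p_value(network_genes, mutated_genes, background_genes):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Assumption: `background_genes` is the gene universe, e.g. all genes
    of the protein-interaction network covered by the microarray.
    """
    background = set(background_genes)
    network = set(network_genes) & background
    mutated = set(mutated_genes) & background
    n_background = len(background)        # size of the gene universe
    n_mutated = len(mutated)              # mutated genes in the universe
    n_network = len(network)              # genes in the tumor network
    n_overlap = len(network & mutated)    # mutated genes in the network
    # Probability of drawing at least n_overlap mutated genes by chance.
    tail = sum(
        comb(n_mutated, i) * comb(n_background - n_mutated, n_network - i)
        for i in range(n_overlap, min(n_mutated, n_network) + 1)
    )
    return tail / comb(n_background, n_network)
```

A small p-value indicates that the network contains more mutated genes than expected by chance for a network of that size.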

Tumors use shorter paths, more links and less hubs
We calculated a variety of different network-features to characterize specific differences in signaling-regulation of tumor cells and non-malignant cells. The results are given in Table 2 and Table 3 and will be explained in the following. For getting a reasonable estimate of the general tendency of tumors, we calculated the average out of all datasets for cancer and normal networks and performed a significance test of the pair-wise differences between tumor and normal (paired, non-parametric, Wilcoxon-rank test).
The average path-length of cancer networks was shorter than that of non-malignant networks (average for cancer: 4.58, and normal: 5.50, P = 3.82E-05). We wanted to know how often the same links (interactions) were used for different signaling pathways. For this, we defined the frequency of a link (link-frequency) as the number of receptor-transcription-factor pathways it was involved in. The average link-frequency was obtained by summing the number of links used in every single pathway from each receptor to each transcription factor and dividing by the number of all used links. The average link-frequency was higher in normal cells (average of average link-frequency for cancer: 122.6, and normal: 234.4, P = 1.53E-05). Similarly, the node frequency was calculated and showed the same tendency (average for cancer: 524.3, and normal: 723.4, P = 2.29E-05). Hence, networks of normal cells more often used the same central proteins and interactions for different signaling tasks. Such a hub-like structure is the central characteristic of scale free networks [16]. We were interested in whether the networks for cancer and normal samples followed these characteristics and whether their distributions differed. Indeed, the link-frequency distribution of the networks of both entities followed a power law (the probability to draw a link with frequency $f$ is proportional to $f^{-a}$ with $a > 1$). In comparison to the networks from normal cells, the distributions of tumors showed a steeper decline. We calculated the exponent $a$ of the distribution and observed larger exponents for cancer networks (P = 1.91E-06). As an example, Figure 3 shows the distributions and the regression function for cervical cancer 1; the distributions for all datasets are given in Additional file 1: Supplemental Figure S2. This agrees with the lower average of their link-frequency.
These distributions also show that proteins of high connectivity (hubs) are more abundant in the networks of normal cells (Additional file 1: Supplemental Figure S3 shows some illustrations of networks). The clustering coefficient has been employed as a measure of connectedness of networks [16]. We calculated the clustering coefficient and obtained lower values for the networks of cancer cells, supporting our finding that cancer showed a tendency for a less centralized, less hub-dependent formation (average of cancer: 0.118, and normal: 0.125, P = 4.20E-04). Specifically, the number of nodes with a clustering coefficient greater than zero was distinctly higher in cancer cells (average for cancer: 2208 and normal: 1956, P = 7.63E-05).
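As an illustration of the link- and node-frequency defined above, the following minimal sketch counts how many pathways use each link and each node. It assumes that every pathway is given as a list of node names from receptor to transcription factor; the function names are hypothetical.

```python
from collections import Counter

def link_frequencies(paths):
    """Number of pathways in which each (undirected) link is used."""
    freq = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            freq[frozenset((a, b))] += 1
    return freq

def node_frequencies(paths):
    """Number of pathways in which each node is involved."""
    freq = Counter()
    for path in paths:
        freq.update(set(path))
    return freq

def average_frequency(freq):
    """Sum of all usages divided by the number of used links (or nodes)."""
    return sum(freq.values()) / len(freq)
```

For example, the three pathways R1-A-TF1, R2-A-TF1 and R2-B-TF2 use six links in total, five of them distinct, which gives an average link-frequency of 1.2.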

Frequently involved genes are enriched with cancer mutated genes
Cui and co-workers compiled a selective list of 284 cancer mutated genes which were derived from large scale sequencing and the literature (Supplementary Table S10 in [12]). We compared this list with the 50 most frequently involved nodes (our hubs) of each network and found significant enrichment for 19 out of 20 normal and tumor datasets (Additional file 1: Supplemental Table S2). We then defined gene-lists of cancer mutated hubs for every cancer by intersecting the hubs of our network with the list of cancer mutated genes of Cui et al. (Additional file 1: Supplemental Table S3). Interestingly, most of the genes which showed up in the tumor networks were also present in the normal networks. This may indicate that normal cells intrinsically pave the way for their specific evolvement into malignancy.

Signaling-regulation in cancer is detached at cancer mutated hubs but maintained in their vicinity
Uri Alon and his co-workers studied occurrences of direction-motifs in triangles and revealed a large variety of substantial characteristics in signaling networks characterized by consistent and non-consistent feed-forward and feedback loops [16]. We were interested in local regulation patterns of the networks at cancer mutated hubs. For this, we analyzed regulation motifs of every triangle consisting of exactly one hub and two of its neighbors which also interact with each other. We defined two regulation motifs. The first motif reflected the degree of regulatory integration of a hub and its network vicinity and was defined by a high correlation of all pairs of nodes in the triangle (integration motif, motif A in Figure 4). We found this motif significantly more often in normal cells (P = 1.7E-03, Table 3). The second motif (maintenance motif, motif B in Figure 4) described triangles in which the hub-neighbor pairs (hub-$n_1$, hub-$n_2$) showed high correlation in one tissue type and no correlation in the other, while the mutual correlation of the neighbors $n_1$-$n_2$ stayed in the same category (no, low or high correlation). Such a scenario is plausible for a mutated cancer protein with loss of function that leaves its neighbors unaffected. Indeed, this motif occurred more often in the cancer networks (P = 6.34E-04, Table 3).

Tumor networks are more robust against directed attacks
Albert and co-workers showed that scale free networks are error tolerant only against attacks on randomly selected nodes but not against directed removals of central nodes (hubs) [17]. We were interested in the robustness of the networks when removing their hubs. For this, we removed the most frequently involved nodes of every network and calculated the average of pair-wise distances (average network diameter) as an estimate of the fragility of the networks [17]. The relative increase of the network diameter due to the removal was larger in normal cells compared to cancer cells (average for cancer: 1.59, average for normal: 1.64, P = 0.021, Table 2), indicating a higher robustness of the tumor networks against directed attacks at their hubs.
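The hub-removal experiment can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: path lengths are counted in hops (unweighted), unreachable pairs are skipped, and the fraction of removed hubs is rounded down with a minimum of one node. The network is assumed to be a dictionary mapping each node to the set of its neighbors; all function names are hypothetical.

```python
from collections import deque

def average_path_length(adj):
    """Mean hop distance over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("nan")

def top_hubs(node_frequency, fraction=0.1):
    """Most frequently involved nodes (rounding rule is an assumption)."""
    ranked = sorted(node_frequency, key=node_frequency.get, reverse=True)
    return ranked[: max(1, int(len(ranked) * fraction))]

def remove_nodes(adj, removed):
    """Copy of the network without the given nodes and their links."""
    removed = set(removed)
    return {u: {v for v in nbrs if v not in removed}
            for u, nbrs in adj.items() if u not in removed}

def diameter_increase(adj, hubs):
    """Ratio of the average path length after and before hub removal."""
    return average_path_length(remove_nodes(adj, hubs)) / average_path_length(adj)
```

A ratio close to one means that the remaining nodes stay nearly as well connected without the hubs, which corresponds to a more robust network.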

Lower information content in normal cells
We used the number of pathways each single link was involved in (link-frequency) as an estimate of the probability that information (such as a phosphorylation) was passed through this link. In this simplified model, every pathway was treated equally. With this, we calculated Shannon's information entropy [18] for each network as a measure of disorder. The cancer networks exhibited a higher information entropy (average for cancer: 11.98, average for normal: 11.38, P = 3.28E-04, Table 3), indicating their higher degree of dispersal.
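The entropy calculation can be sketched in a few lines. The sketch takes the link-frequencies as input, normalizes them to probabilities and applies Shannon's formula. The logarithm to base 2 is an assumption, since the text does not state the base; the function name is hypothetical.

```python
from math import log2

def network_entropy(link_frequency):
    """Shannon entropy of the link-usage distribution (base 2 assumed)."""
    total = sum(link_frequency.values())
    return -sum((f / total) * log2(f / total)
                for f in link_frequency.values() if f > 0)
```

A network in which all links are used equally often reaches the maximal entropy, whereas a network dominated by a few heavily used links has a lower entropy.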

A comparative network motif
Inspired by the results described above, we designed a comparative network motif which is illustrated in Figure 1. We wanted to set up a model in which cancer cells use different pathways for different tasks whereas normal cells use common signaling interactions for different tasks. Therefore, the model was designed such that two pathways (two operator-receiver pairs, $R_1$-TF and $R_2$-TF in Figure 1) of the normal tissue shared at least one common link, whereas the same operator-receiver pairs for the tumor did not share any link. We compared the abundance of this motif with the abundance of its counterpart in which the cancer cells used at least one common link and the normal cells did not share any link. We found a significantly higher number of our motif in which the normal cells share a common link (average counts for cancer: 15,333,384, average for normal: 29,618,238, P = 9.54E-06, Table 3).

Figure 4 The motifs were derived for each triple of nodes consisting of a hub and two of its neighbors in the network ($n_1$, $n_2$) which were also mutually connected. In the integration motif (motif A) all nodes are pair-wise highly coregulated. Accordingly, the motif is defined by high correlations (low distances) for the links hub-$n_1$, hub-$n_2$ and $n_1$-$n_2$. In contrast, the maintenance motif (motif B) consisted of a hub which was not coregulated with its neighbors $n_1$ and $n_2$. We counted triangles in which the hub-neighbor pairs (hub-$n_1$ and hub-$n_2$) showed high correlation in one tissue type and no correlation in the other, while the correlation of $n_1$-$n_2$ stayed in the same category. Motif C is a consistent feed-forward loop, taken from the literature [21].
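Counting the comparative motif and its counterpart can be sketched as follows. The sketch assumes that the pathways of each tissue are stored in a dictionary keyed by (receptor, transcription factor) and that only pairs present in both tissues are compared; these data structures and all names are hypothetical.

```python
from itertools import combinations

def links(path):
    """Set of undirected links used by a pathway."""
    return {frozenset(pair) for pair in zip(path, path[1:])}

def count_comparative_motifs(normal_paths, tumor_paths):
    """Return (motif, counterpart) counts.

    motif: the two normal pathways share a link, the tumor pathways do not.
    counterpart: the two tumor pathways share a link, the normal ones do not.
    """
    receptors_by_tf = {}
    for receptor, tf in normal_paths:
        if (receptor, tf) in tumor_paths:
            receptors_by_tf.setdefault(tf, []).append(receptor)
    motif = counterpart = 0
    for tf, receptors in receptors_by_tf.items():
        for r1, r2 in combinations(sorted(receptors), 2):
            shared_normal = bool(links(normal_paths[r1, tf]) & links(normal_paths[r2, tf]))
            shared_tumor = bool(links(tumor_paths[r1, tf]) & links(tumor_paths[r2, tf]))
            if shared_normal and not shared_tumor:
                motif += 1
            elif shared_tumor and not shared_normal:
                counterpart += 1
    return motif, counterpart
```

The number of receptor pairs grows quadratically with the number of receptors, so for networks of the size described here the counts quickly reach millions.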

Discussion
We investigated network properties of cancer signaling by looking at co-regulation patterns of genes for different cancer types. We analyzed the general regulatory behavior of correlating gene expression samples of one tumor type and study, rather than analyzing the regulatory behavior of single patients. For this, we calculated a gene to gene distance metric for all samples (patients) of normal and cancerous tissues. The networks of the investigated tumors showed distinctive mechanisms in the regulation of signal transduction when compared to normal cells and had shorter path lengths. Luscombe and co-workers analyzed the dynamics of regulatory networks in yeast [11]. In comparison to endogenously caused changes, they discovered a different topological adaptation of the network when yeast responded to environmental changes. For having quick responses, yeast reacted to environmental changes (nutrition depletion, stress response) by short regulatory cascades. Our investigated cancer cells showed a similar tendency as yeast under stress at which fine tuned endogenous homeostasis is of minor importance. Interestingly, for yeast, Luscombe et al. discovered a higher frequency of hubs for stress responses whereas we discovered that the tumors used hubs less frequently. Cells of normal sample had a more centralized network to regulate signals via common nodes and links. This was reflected by a smaller network, higher frequency of hubs, lower entropy and a higher number of our signaling motif in which the number of pathway-pairs with common links was counted. This makes sense, as fine-tuning and integrating diverse signals need to be coordinately transferred to the respective transcriptional response which is substantial for fine grained signaling homeostasis of normal cells to co-ordinate their signals in accordance to their cellular community in the tissue. Degenerated tumor cells do not need this any more. 
In turn, the tumors showed a higher connectedness of the whole network, which may strengthen their independence from exogenous perturbations. Similar to Cui and co-workers [12], we observed with our model that cancer-specific mutations occur distinctly more often at hubs for signal transduction. Such a mutation can cause a loss of function. This is beneficial for the cancer if the protein becomes insensitive to upstream signals and constitutively fires an oncogenic signal, as e.g. the ABL-BCR fusion protein in chronic myelogenous leukemia [19]. If the protein acts as a tumor suppressor, a complete loss of function is beneficial for oncogenesis. In both scenarios, the regulation for signaling homeostasis of the local network environment is detached from this malfunctioning protein, and a coordinated regulation between the environment and this protein is not necessary any more. We observed this by counting distinctly fewer integration motifs in tumors (motif A in Figure 4). Interestingly, tumors seem to sustain the original signals within the environment. We observed this by higher counts of the maintenance motif in tumors, which reflects the disruption of co-regulation of the hub but maintained regulation between the neighbors of the hub (motif B in Figure 4). Even though tumors may exhibit de-regulation of malfunctioning hubs with their neighbors, such a maintained co-regulation of their neighbors suggests that bypass regulations are still necessary. Ma'ayan and co-workers observed an accumulation of feedback and feed-forward loops at such hubs [20], which supports this idea. Tumors need to maintain the direct signal of e.g. a feed-forward loop which is necessary for the effect of the constitutive signal of an oncogenic hub (Figure 4C). Such oncogenic signaling motifs may have implications for drug therapy. If an oncogenic hub is treated (as e.g. ABL-BCR with imatinib [19]), resistance can occur by mutations of the target protein which reduce the affinity of the drug to the target. A combined therapy may avoid this development by additionally blocking the signaling maintenance of the neighbors. In addition, we found that the observed cancer networks showed higher error tolerance against directed attacks of hub removals. Hence, some maintenance signals may not only support cancer mutated hubs but also pave the way for the signaling network to become independent of them, specifically for proteins of cancer mutated genes with loss of function. It is challenging but highly relevant to shed light on these effects experimentally with cell lines exhibiting drug resistance at such hubs. We analyzed networks based on cohorts of patients and used the correlation of expression between gene pairs for the whole cohorts. This approach does not allow the analysis of a single sample and therefore cannot be employed for the diagnosis of a single patient, but rather for the analysis of tumor subgroups. It may be worthwhile to develop distance metrics of gene pairs for single samples, with which the investigated topology features could be employed to support diagnosis.
We proposed a novel comparative signaling motif for malignant signaling-regulation which sums up our findings (Figure 1). There have been elaborate studies on network motifs [21]. Our comparative cancer motif differs from these motifs in that it shows signaling-regulation in cancer reflecting a less centralized formation. The comparative cancer motif agrees with our findings of non-integration (motif A, Figure 4) but signaling maintenance (motif B, Figure 4) of proteins with higher involvement in signal propagation.

Conclusion
We analyzed network models that were based on the correlation of gene expression between interacting proteins, which enabled us to track basic principles of signaling by its regulation. The malignant signaling networks showed more diverse signaling pathways (average number of nodes in the networks of tumor: 3324, and normal tissue: 2973, P = 2.3E-03, Figure 2) and shorter pathways (average path-length for cancer: 4.58, and normal: 5.50, P = 3.82E-05, Figure 2). The networks were less centralized (average clustering coefficient of cancer: 0.118, and normal tissue: 0.125, P = 4.20E-04) and less dependent on hubs (average increase of network diameter after hub removal, for cancer: 1.59, and normal tissue: 1.64, P = 0.021). The cancer networks indicated signaling maintenance and increased error tolerance to punctual attacks even at hubs, which makes cancer treatment at specific targets challenging.

Methods
The general workflow of our approach is outlined in Figure 5. To investigate if our network features showed a statistically significant difference we performed paired Wilcoxon tests. We set the significance level to P ≤ 0.05 and considered all p-values below this threshold as statistically significant.
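For illustration, a paired Wilcoxon signed-rank test with an exact two-sided p-value can be sketched in plain Python. This is a sketch under assumptions and not necessarily the procedure of the statistics software used for the study: zero differences are dropped, ties among the absolute differences are not given averaged ranks, and the exact null distribution is enumerated. The function name is hypothetical.

```python
def wilcoxon_signed_rank_exact(x, y):
    """Return (W+, exact two-sided p-value) for paired samples x and y.

    Assumptions: zero differences are discarded and there are no ties
    among the absolute differences (no averaged ranks are computed).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    # Sum of the ranks (1..n) that belong to positive differences.
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    max_sum = n * (n + 1) // 2
    # counts[s] = number of sign assignments with positive rank sum s.
    counts = [1] + [0] * max_sum
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    w = min(w_plus, max_sum - w_plus)
    p_value = min(1.0, 2 * sum(counts[: w + 1]) / 2 ** n)
    return w_plus, p_value
```

With this exact distribution, the smallest attainable two-sided p-value for n pairs is 2 / 2^n, which is reached when all differences have the same sign.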

Gene expression analysis
We analyzed twenty different datasets of cancer and their corresponding normal or reference samples. For most of the tumors (8 tumors), we analyzed two datasets for each cancer type. We used two AML (acute myeloid leukemia) datasets containing 18 normal and 25 tumor samples (AML-1) [22] and 4 normal and 52 cancer samples (AML-2) [23]. The first breast cancer dataset (breast-1) was obtained from cancer and normal samples of 43 patients each [24]; breast-2 consisted of 143 normal and 42 cancer samples [25]. We analyzed two cervical cancer sets, cervical-1 [26] and cervical-2 [27], comprising data from 8 and 24 normal and 20 and 31 cancer samples, respectively. Data of esophageal squamous cell carcinomas (ESCC) were obtained from cancerous and normal tissue of 53 patients (taken from the NCBI database Gene Expression Omnibus, accession code GSE23400). We used a glioma dataset containing 23 normal and 153 cancer samples [28]. A head-and-neck dataset was taken from a study of head-and-neck squamous carcinoma consisting of data from 22 normal and cancer samples [29]. We used two lung cancer datasets, denoted as "lung-1" and "lung-2". Lung-1 was taken from a study by Bhattacharjee and co-workers [30] and contained data from 17 normal and 13 cancer samples of adenocarcinoma. Bhattacharjee and co-workers clustered the tumor datasets in their study. To obtain the most relevant data subsets with the necessary homogeneity, we selected their cluster of highly aggressive adenocarcinomas (cluster C2 of their cluster analysis) for our study. Lung-2 contained gene expression data of normal samples and adenocarcinoma tumors from 27 patients [31]. We analyzed two oral-tongue cancer datasets comprising data from 26 normal and 31 cancer samples (oral-tongue-1 [32]) and 12 normal and 26 cancer samples (oral-tongue-2 [33]).
We analyzed two datasets for pancreas cancer, pancreas-1 consisting of 39 normal and tumor tissues [34] and pancreas-2 having 15 normal and 36 cancer samples [35]. The first prostate cancer dataset (prostate-1) comprised data from 50 normal and 52 cancer samples [36], and the second (prostate-2) consisted of 50 normal and 52 cancer samples (taken from the NCBI database Gene Expression Omnibus, accession code GSE17951). The dataset renal-1 contained 23 normal renal samples and 69 samples of renal cancer [37], and renal-2 had 5 normal and 62 cancer samples [38]. For the first renal dataset we selected homogeneous samples by performing hierarchical clustering (Euclidean distance, complete linkage), yielding sets of nine clustered samples for normal tissue and 10 for cancerous tissue. We analyzed data from vulva interstitial neoplasia consisting of 10 normal and 9 cancer samples [39]. All datasets were stratified by randomly removing samples of the overrepresented class, yielding equal numbers of tumor and normal samples. For breast-1, ESCC, head-and-neck, lung-2, pancreas-1, and oral-tongue-1, normal and cancer samples were from the same patients (which was not the case for the other analyzed datasets). The data had been obtained using microarrays from Affymetrix of the following versions: HG-U133A for AML-1, breast-1, cervical-2, ESCC, lung-2 and renal-1; HG-U133 Plus 2 for breast-2, cervical-1, glioma, oral-tongue-2, pancreas-1, pancreas-2, prostate-2, renal-2 and vulva; HG-U95Av2 for AML-2, head-and-neck, lung-1, oral-tongue-1 and prostate-1. We normalized all datasets by Variance Stabilization Normalization [40,41].

Network construction
The protein-protein interaction network was constructed using the Human Protein Reference Database [13,14] (version 9 from April 13th, 2010). Interacting proteins were represented by their coding genes. The network was constructed for every gene that could be mapped to a microarray probe-set using BioMart [42]. Interactions were not taken into account if probe information for at least one gene was missing. For a link between nodes (genes) $x$ and $y$, we defined a link-distance $d_{xy}$ from Pearson's correlation coefficient $r_{xy}$ of the gene expression values of the interacting proteins,

$$d_{xy} = 1 - |r_{xy}|, \qquad r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}},$$

where $n$ is the number of samples (patients), $x_i$ and $y_i$ are the expression values of genes $x$ and $y$ in sample $i$, and $\bar{x}$ and $\bar{y}$ are their means over all samples. These distances were calculated for each dataset of normal and cancer tissues and used for the networks of the respective datasets. To handle induction and inhibition events equally, we used the absolute values of all correlation coefficients. Correlation values were subtracted from one to obtain low distances for paths with high correlation. Genes with the molecular function term "receptor activity" from the definitions of Gene Ontology [43] were used as receptors in the network. The definitions of transcription factors were taken from TRANSFAC [44]. We used Dijkstra's algorithm [45] for calculating the shortest paths for every pair of receptors and transcription factors in the normal and tumor networks. These shortest paths of all receptor-transcription-factor pairs served as the predicted pathways for each dataset and defined our tumor-specific interaction networks. Links and nodes that were not used by any shortest path were removed. The analyses were then performed on the largest connected component of the interaction network.
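The two central steps of the network construction, the link-distance and the shortest-path search, can be sketched as follows. The sketch assumes expression profiles given as lists of equal length and a network given as a dictionary that maps each node to its neighbors and their link-distances. Treating a gene with constant expression as uncorrelated (distance 1) is an assumption, as are all function names.

```python
import heapq
from math import sqrt

def link_distance(x, y):
    """One minus the absolute Pearson correlation of two profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # assumption: constant profiles count as uncorrelated
    return 1.0 - abs(sxy / sqrt(sxx * syy))

def shortest_path(adj, source, target):
    """Dijkstra's algorithm; returns the node list or None if unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in adj.get(u, {}).items():
            new_dist = d + w
            if new_dist < dist.get(v, float("inf")):
                dist[v] = new_dist
                prev[v] = u
                heapq.heappush(heap, (new_dist, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]
```

Calling `shortest_path` for every receptor and transcription factor pair and keeping only the links and nodes on the returned paths yields a pathway network of the kind described above.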

Defining the network features
Path length, link and node frequency, and the signaling motif are explained in the results. Note that link (and node) frequency is similar to betweenness centrality, which is the number of shortest paths going through the link (and node). While betweenness centrality considers shortest paths between all pairs of nodes, node and link frequency as defined here count only the shortest paths between pairs of receptors and transcription factors. The (average) network diameter has been described as a measure for error tolerance of a network against removals of nodes in scale free networks [17] and was used here in a similar way. The diameters of the networks were obtained as the average of the shortest paths of each pair of nodes in the network. The network diameter was calculated for undisturbed (whole) networks and for networks in which the top 10% of the hubs were removed. The ratio of these values was calculated to yield the increase of the average network diameter after hub removal. The calculation of the information content was based on the assumption that signals enter the network at any receptor with equal probability within a certain time interval. These signals are passed by the links of the network to the transcription factors via the defined pathways from the receptors, again with equal probability. We assumed that the signals vanish from the signaling network after having entered the corresponding transcription factor at the end of the path. Signals enter the receptors with a certain frequency, resulting in an equal distribution, and therefore we assumed a uniform density of the signals in each pathway. The probability of a signal to pass through the link between nodes $i$ and $j$ is then proportional to the number of pathways passing through this link. With this, we calculated the information content by Shannon's definition [18],

$$H = -\sum_{i=1}^{n} p_i \log p_i,$$

in which $n$ denotes the number of links and $p_i$ the probability of a signal to be passed through link $i$.
The clustering coefficient $C_i$ for node $i$ was given by

$$C_i = \frac{2\, n_{\mathrm{links}}}{k\,(k-1)},$$

in which $n_{\mathrm{links}}$ is the number of links connecting the neighbors of node $i$ and $k$ is the number of neighbors. This feature described how well the neighbors were mutually connected. If they were fully connected, the clustering coefficient was one; if they were not connected at all, the clustering coefficient was zero.
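As a small worked example of the clustering coefficient defined above, the following sketch computes it for one node of a network stored as a dictionary of neighbor sets. Returning zero for nodes with fewer than two neighbors is an assumption, since the formula is undefined in that case; the function name is hypothetical.

```python
def clustering_coefficient(adj, node):
    """Fraction of possible links among the neighbors of `node` that exist."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumption: undefined case is reported as zero
    n_links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]
    )
    return 2 * n_links / (k * (k - 1))
```

A hub with three neighbors of which only one pair interacts has a clustering coefficient of 2 * 1 / (3 * 2) = 1/3.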

Link-frequency distributions
The link-frequency distributions of normal and tumor cells followed a power law, i.e. the probability $P(f)$ of links with link-frequency $f$ was approximately given by

$$P(f) \propto f^{-a}.$$

To estimate the exponent $a$ we applied the method proposed by Newman [46], which determines the exponent of the cumulative distribution and thereby avoids noisy data at the tail of the original distribution (see the tail of the link-frequency distribution in Figure 3). For visualization we plotted the distribution and the corresponding linear function with slope $a$ on a log-log scale. The intersection of the plotted line with the y-axis was calculated using a least squares fit (see Figure 3 and Additional file 1: Supplemental Figure S2).
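A power-law exponent can be estimated without binning the noisy tail, for instance with a commonly used maximum-likelihood estimator for continuous power-law data. The sketch below shows this estimator as an illustration only; whether it matches the exact procedure used for the analysis is not stated in the text, and the lower cutoff `x_min` as well as the function name are assumptions.

```python
from math import log

def power_law_exponent(values, x_min):
    """Maximum-likelihood estimate of a for P(f) ~ f^(-a), f >= x_min.

    Assumption: the values are treated as continuous; link-frequencies
    are integers, so this is only an approximation.
    """
    tail = [v for v in values if v >= x_min]
    log_sum = sum(log(v / x_min) for v in tail)
    if log_sum == 0:
        raise ValueError("need at least one value above x_min")
    return 1.0 + len(tail) / log_sum
```

A steeper decline of the distribution, as described for the tumor networks, corresponds to a larger estimated exponent.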

Defining and counting the integration and the maintenance motif
We defined three correlation categories based on intervals of the absolute value of the correlation coefficient $|r_{xy}|$: no correlation for absolute values between zero and 0.3, low correlation for absolute values between 0.3 and 0.5, and high correlation for absolute values above 0.5. Hubs of cancer mutated genes were defined by intersecting the list of cancer genes from Cui and co-workers (Supplementary Table S10 in [12]) with the nodes that appeared in both tissue types (normal and tumor). From this intersection we selected the top 50 most frequently involved nodes from the normal and from the tumor network, resulting in up to 100 cancer mutated hubs for every cancer dataset. Hubs that were selected in both tissue types, and as such appeared twice in the union set, were used only once. For each dataset, we collected all triangles in which one node was such a cancer mutated hub and that appeared in both the normal and the tumor network, ensuring the comparability of our motif counts. Out of these triangles, we selected triangles having the motifs for integration (motif A in Figure 4) and maintenance (motif B in Figure 4). For motif A, we selected triangles in which the absolute correlations $|r_{xy}|$ between all pairs of nodes (hub-$n_1$, hub-$n_2$ and $n_1$-$n_2$, where $n_1$ and $n_2$ are the two other nodes in the triangle) were high. For motif B, we counted the triangles in which the hub-neighbor pairs showed high correlation in one tissue type and no correlation in the other, while the correlation of $n_1$-$n_2$ stayed in the same category (no correlation, low correlation or high correlation).
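The motif definitions above can be expressed as a small classification sketch. Two points are assumptions because the text leaves them open: the boundary values 0.3 and 0.5 are assigned to the low category, and the maintenance motif is counted for the tissue in which the hub is detached (no correlation), with the other tissue serving as reference. All function names are hypothetical.

```python
def correlation_category(r):
    """Map a correlation coefficient to 'none', 'low' or 'high'."""
    r = abs(r)
    if r > 0.5:
        return "high"
    if r >= 0.3:  # assumption: boundaries 0.3 and 0.5 count as 'low'
        return "low"
    return "none"

def is_integration_motif(r_hub_n1, r_hub_n2, r_n1_n2):
    """Motif A: all three pairs of the triangle are highly correlated."""
    return all(correlation_category(r) == "high"
               for r in (r_hub_n1, r_hub_n2, r_n1_n2))

def is_maintenance_motif(detached, reference):
    """Motif B for one triangle observed in two tissue types.

    Each argument is a tuple (r_hub_n1, r_hub_n2, r_n1_n2). The hub links
    must be 'high' in `reference` and 'none' in `detached`, while the
    n1-n2 link stays in the same category in both tissues.
    """
    hub_links_lost = all(
        correlation_category(ref) == "high" and correlation_category(det) == "none"
        for det, ref in zip(detached[:2], reference[:2])
    )
    neighbors_kept = (correlation_category(detached[2])
                      == correlation_category(reference[2]))
    return hub_links_lost and neighbors_kept
```

Counting motif B for the tumor then amounts to calling `is_maintenance_motif(tumor_triangle, normal_triangle)` for every shared triangle, and with swapped arguments for the normal tissue.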