Skip to main content

Urothelial cancer gene regulatory networks inferred from large-scale RNAseq, Bead and Oligo gene expression data



Urothelial pathogenesis is a complex process driven by an underlying network of interconnected genes. The identification of novel genomic target regions and gene targets that drive urothelial carcinogenesis is crucial in order to improve our current limited understanding of urothelial cancer (UC) on the molecular level. The inference of genome-wide gene regulatory networks (GRN) from large-scale gene expression data provides a promising approach for a detailed investigation of the underlying network structure associated to urothelial carcinogenesis.


In our study we inferred and compared three GRNs by the application of the BC3Net inference algorithm to large-scale transitional cell carcinoma gene expression data sets from Illumina RNAseq (179 samples), Illumina Bead arrays (165 samples) and Affymetrix Oligo microarrays (188 samples). We investigated the structural and functional properties of GRNs for the identification of molecular targets associated to urothelial cancer.


We found that the urothelial cancer (UC) GRNs show a significant enrichment of subnetworks that are associated with known cancer hallmarks including cell cycle, immune response, signaling, differentiation and translation. Interestingly, the most prominent subnetworks of co-located genes were found on chromosome regions 5q31.3 (RNAseq), 8q24.3 (Oligo) and 1q23.3 (Bead), which all represent known genomic regions frequently deregulated or aberated in urothelial cancer and other cancer types. Furthermore, the identified hub genes of the individual GRNs, e.g., HID1/DMC1 (tumor development), RNF17/TDRD4 (cancer antigen) and CYP4A11 (angiogenesis/ metastasis) are known cancer associated markers. The GRNs were highly dataset specific on the interaction level between individual genes, but showed large similarities on the biological function level represented by subnetworks. Remarkably, the RNAseq UC GRN showed twice the proportion of significant functional subnetworks. Based on our analysis of inferential and experimental networks the Bead UC GRN showed the lowest performance compared to the RNAseq and Oligo UC GRNs.


To our knowledge, this is the first study investigating genome-scale UC GRNs. RNAseq based gene expression data is the data platform of choice for a GRN inference. Our study offers new avenues for the identification of novel putative diagnostic targets for subsequent studies in bladder tumors.


Urothelial cancer (UC) is a heterogeneous disease with risk factors that include smoking, contact to chemicals and age [1]. Urothelial tumors originate from the epithelial lining of the bladder and can progress from non-invasive to more aggressive muscle-invasive subtypes which penetrate the deeper tissue layers of the bladder. The non-invasive tumor stages can be treated by transurethral resection, chemo- and intravesical therapy, whereas for invasive stages cystectomy, radiotherapy and chemotherapy are preferred [1,2]. Monitoring of UC is very expensive as its recurrence rate is high [3]. An understanding of the mechanistic interplay between individual genes and proteins that drive the development and progression of UC is therefore a high priority. System-based approaches allow us to investigate the underlying network structure associated with carcinogenesis and thus facilitate a novel perspective for the identification of molecular targets that drive urothelial carcinogenesis. The inference of gene regulatory networks (GRN) from large-scale gene expression data of tumor samples from various grades and stages is a promising approach for the identification of novel putative targets in cancer [4-6].

A GRN is a mathematical description of the dependencies within a gene expression dataset. Currently, a large arsenal of gene regulatory network inference methods have been developed [7-9]. The most popular methods are based on mutual information which is a dependency measure that can be estimated for all pairs of genes in a gene expression dataset. In this study we infer GRNs by the application of the BC3Net algorithm which is based on the aggregation of an ensemble of C3Net gene regulatory networks [4]. The C3Net algorithm selects a maximum of one gene neighbor for each gene on the basis of the strongest mutual dependency that is statistically significant. For a gene expression dataset with n genes we thus obtain a sparse network with at most n interactions. The BC3Net generates an ensemble of C3Net networks from bootstrap datasets, i.e., by sampling a dataset with replacement, that are subsequently aggregated to a weighted network. We have reported that BC3Net was shown to produce biological meaningful results [4-6,10,11]. The hub genes of GRNs that were inferred from large-scale cancer gene expression data were observed to provide promising putative novel target genes for cancer such as G-protein coupled receptors and transmembrane proteins [10,12].


Preprocessing and sample information for the Illumina RNAseq, Bead array and Affymetrix Oligo microarray gene expression dataset

We preprocessed three large-scale urothelial cancer gene expression datasets from a) Illumina RNAseq (179 samples) [13], b) Illumina Bead array (165 samples) GSE13507 [14,15] and c) Affymetrix oligo microarray (188 samples) platform [16-19]. An overview of the tumor stage information for the three datasets is shown in Table 1. In this table we distinguish 6 tumor stages, namely, pTcis, pTa, pT1, pT2, pT3 and pT4. For each of these stages we list the number of available samples provided by the three platforms.

Table 1 Tumor stages across the three datasets

RNAseq gene expression dataset from TCGA

The RNAseq gene expression dataset was retrieved from The Cancer Genome Atlas bladder cancer TCGA project [] [13]. We used the preprocessed RNAseqV2 normalized count expression values based on RSEM (RNA-Seq by Expectation-Maximization) [20,21] as provided by TCGA and clinical information such as the TCGA barcode identifier, sample type and tumor histology by the bcr_aliquot_uuid identifiers. We extracted gene expression data of primary solid tumors for a total of 179 samples with histology stage information (march 2014). A total of 177 of the 179 selected tumor samples represent muscle invasive carcinoma stage pT2 or above. We performed a log-transformation l o g e (1+p) on the count expression values. The resulting gene expression matrix consisted of 20,161 entrez genes and 179 samples. Genes with a zero standard deviation were removed from the dataset.

Illumina Bead array gene expression dataset

We used the processed matrix series Illumina Bead gene expression data from GEO GSE13507 [14,15]. The dataset comprises 257 samples from the Illumina human-6 v2.0 expression Beadchip microarray platform. The dataset consists of tumor samples from 62 muscle invasive, 104 non-muscle invasive, 23 recurrent non-invasive bladder cancer and samples that we excluded for the network inference representing 58 mucosae surrounding cancer and 10 normal mucosae. We assigned Illumina identifiers to entrez gene id and gene symbols using the illuminaHumanv2.db annotation bioconductor package. From the 43,148 Illumina identifiers for a total of 20,481 an entrez identifier was available. The remaining 22,667 features were not considered for the analysis. In total, we selected 165 primary bladder cancer samples (Table 1).

Affymetrix Oligo microarray gene expression dataset

We used a third UC dataset from Affymetrix gene expression data comprising 183 samples from 4 different datasets. We extracted 93 (U133plus2) samples from GSE31684 [16], 46 (U133A) samples from GSE3167 [17], 30 (U133A) samples from GSE5287 [18] and 19 (U133A) samples from GSE37317 [19]. We considered only probe sets that were present in both array types U133a and U133plus2. We combined the U133plus2 samples and the U133a samples using the matchprobes bioconductor package [22]. We normalized the microarray samples using RMA and quantile normalization [23] using l o g 2 expression intensities for each probe set. As a summary statistic for multiple probesets that match to the same entrez gene identifiers we used the median expression value. Entrez gene ID to Affymetrix probe set annotation was obtained from the hgu133plus2.db and hgu133a.db R package. We excluded all probe sets from our analysis that remained unmapped to entrez identifiers. The resulting expression dataset consisted of 12,495 genes and 188 samples.

BC3Net gene regulatory network inference

We inferred our bladder cancer GRN using C3Net and the “B”agging version of the C3Net [24,25] algorithm called BC3Net [4]. The BC3Net infers an ensemble of C3Net gene regulatory networks from bootstrap generated datasets that are subsequently aggregated to a weighted GRN. We defined an ensemble of B=100 independent bootstrap datasets \(\{{D^{b}_{k}}\}_{k=1}^{B}\) that were generated from a given gene expression dataset D. For each bootstrap data set \({D_{k}^{b}}\) a GRN \({G^{b}_{k}}\) was inferred using C3Net [24,25]. Edges with non-significant mutual information values were subsequently rejected using a non-parametric test with a Bonferroni multiple testing correction for a significance level α=0.05. The null distribution of mutual information is generated from sample-gene label permutations of the original gene expression matrix. For the network inference we used a Pearson Estimator [8,26]

$$ I(X,Y) = -\frac{1}{2} log(1 - \rho^{2}), $$

where ρ denotes the Pearson correlation coefficient. The inferred ensemble of GRNs \(\{{G^{b}_{k}}\}_{k=1}^{B}\) was aggregated into a weighted network \({G_{w}^{b}}\). The weights of the inferred interactions give the frequency how often an interaction was observed in the C3Net network ensemble and are denoted as ensemble consensus rate (ECR). For each inferred weighted edge in the network the statistical significance was estimated by a Binomial test. For multiple testing correction Bonferroni was used with a significance level α=0.05.

Relevance networks

For the inference of relevance networks [27] we used the WGCNA R Package [28] and the CLR [29] implementation provided in the minet R-Package [30]. Interactions were defined for WGCNA by hard thresholds on the absolute Pearson correlation matrix and for CLR by hard thresholds on the z-score transformed mutual information matrix that was estimated using a Pearson Estimator [8,26].

Cancer census genes

The Cancer Gene Census (CGC) [31] (version download 10-01-2014) [] provides information about genes with somatic mutations that are associated to different types of cancer. We used the entrez identifiers of the defined cancer census genes.

Gene ontology gene sets

For our analysis, we obtained the Gene Ontology [32] annotation for entrez gene IDs from Bioconductor [22] annotation packages and GO.db.

Gene family gene sets

We retrieved gene family protein tag information and entrez identifiers of the genes in the HGNC database []. We defined gene family gene sets for groups of genes that shared the same HGNC protein family tag. From the HGNC database we gathered a total of 587 gene family gene sets comprising a total of 16,722 entrez genes.

Gene sets of co-localized adjacent genes

For the identification of genomic regions with enriched subnetworks of interacting genes we defined gene sets of genes that were adjacently located within a chromosomal region (co-located) from overlaping sliding windows along the human chromosomes. We defined gene sets from 1M b (mega bases) sliding windows along the human chromosomes with a 500K b (kilo bases) overlap between adjacent windows. The gene sets of co-located genes were defined for chromosome regions of 1M b with 500K b overlap to mimic the extend for co-expressed gene clusters [33].

Gene pair enrichment analysis (GPEA)

The GPEA facilitates the identification and ranking of significant subnetworks of defined gene sets for a given network. For p genes there is a total of N=p(p−1)/2 different gene pairs. If there are p S genes for a particular gene set (S) then the total number of gene pairs for this gene set is m S =p S (p S −1)/2. When a network G contains n interactions, of which k interactions are among genes from the given gene set S, then a p-value for the enrichment of gene pairs of this gene set S can be calculated from the following hypergeometric distribution

$$ p(k|S) = \sum\limits_{i=k}^{m_{S}} P(X=i | S) = \sum\limits_{i=k}^{m_{S}} \frac{{m_{S} \choose {i}} {N-m_{S} \choose {n-i}}}{{N \choose {n}}} $$

This p-value gives an estimate for the probability to observe k or more interactions between genes from a given gene set S.

We performed a GPEA analysis for the inferred GRNs for 8,000 gene sets of GO biological process (≥3 and <1000 genes), 500 gene sets of gene families (≥3 genes) and 4,000 gene sets of co-located genes (≥3 genes). For the analysis the inferred networks are expected to show a strong association to gene sets of a biological functional and spatial context. Therefore, we considered a more stringent significance level of α=0.001 (103) relative to the number of performed test in the range of 103. Further, we considered a Bonferroni multiple testing correction.

Network centrality measures

For the network analysis we measured the degree centrality and edge density [34]. The degree centrality was defined as the total number of direct neighbors of a gene g i of an undirected gene regulatory network. The edge density of a network was the number of edges divided by the maximal number of possible edges. For an undirected network this number was given by n(n−1)/2, whereas n is the total number of genes.

Protein interaction databases

We gathered and processed interactions from Biogrid [35] (15,337 genes, 135,732 interactions; version biogrid.3.2.11), Intact [36] (10,029 genes, 63,968 interactions; version intact.230314), Mint [37] (7,106 genes, 26,834 interactions; version mint.2013-03-26), Hprd [38] (9,672 genes, 39,233; version hprd.072010), String [39] (20,770 genes, 4,850,628 interactions, version 9.1). Further, we considered the largest manually curated human signaling network [40] (6,306 genes and 57,090 interactions, version 6) which we denote in the text as SingNet (, a pathway protein interaction network extracted from the bioconductor package graphite [41] (6,243 genes, 78,201 interactions; KEGG, Reactome, NCI and Spike) and the integrative network from ConsensusPathDB (CPDB) [42] (16,619 genes, 485,277 interactions; version Dec 2014). We assigned their entrez gene identifiers mapping when available from the interaction database or converted the identifiers (e.g. uniprot identifiers) to entrez identifiers using the annotation from the bioconductor package and uniprot database [43].

Quantitative comparison of experimental interactions in RNAseq, Bead and Oligo UC GRN

We used the interactions from the Biogrid, Intact, Mint, Hprd, CPDB, SingNet, graphite and String database seperately as global reference networks for the GRN and measured the number of true, false positive (TP, FP), true, false negatives (TN, FN) and F-score to compare the performance of the three inferred gene regulatory networks. The F-score measure \(F=2\frac {PR}{P+R}\) gives a weighted average of the precision \(P= \frac {TP}{TP+FP}\) and recall \(R=\frac {TP}{TP+FN}\).

For the local subnetwork based network inference performance comparisons we used the String network as a reference network. We compared the cumulative log transformed F-score distribution separately for commonly significant GO Biological Process, genomic co-located genes and gene family subnetworks between the Oligo, Bead and RNAseq UC GRNs.

For all subnetworks and pairwise network comparisons we performed a hypergeometric test for the number of shared interactions between two networks is not larger than expected by random chance. For the subnetwork analysis we consider FDR multiple testing correction [44].


Urothelial cancer (UC) gene regulatory networks (GRN)

For the identification of molecular targets for UC from a network-based perspective we inferred BC3net GRNs from RNAseq, Bead and Oligo UC gene expression datasets. The giant connected component of the inferred RNAseq UC GRN consisted of 18,952 genes, the Bead UC GRN of 20,140 genes and the Oligo UC consisted of 12,492 genes (Figure 1A). In the following we compared the global network and local structural properties between the three networks.

Figure 1
figure 1

Overview of the comparisons between the inferred RNAseq (A), Bead (C) and Oligo (E) UC GRN. The pairwise comparisons of the three networks are shown in (B, D, F) and triple comparison in (G). We compared the networks on the gene interaction level and gene set subnetwork level for Gene Ontology biological processes, gene families and co-located gene sets of 1 Mb genomic regions.

The global network properties among the three networks were highly similar. The edge density (d0.001) of the Oligo has a slightly higher edge density compared to the RNAseq and Bead UC GRN. The degree distribution of the UC GRN follow a power law distribution with exponents α RNAseq=4.09, α Bead=4.16α Oligo=3.73. The average shortest path length for all three networks was p RNAseq=4.17, p Bead=4.45, p Oligo=4.43 genes (measured with the Dijkstra distance [45]).

On the local structural level of individual interactions we observed that the three networks are highly dissimilar. We performed a pairwise comparison and joint comparison of all three networks to quantify the number of shared interactions. The percentages of shared interactions were quantified from the union of all interactions of two networks and for the joint comparison from the union of all interactions of the three networks. The GRN networks shared only a total of 6,676 (2.17%, RNAseq/Bead), 5,321 (2.04%, RNAseq/Oligo) and 2,648 (1.22%, Bead/Oligo) interactions which corresponded to subnetworks among 3,312 to 7,361 genes. In total, we found that only 1,002 (0.25%) interactions were shared across the three GRNs and corresponded to a subnetwork among 1,539 genes. An overview of our gene expression data and inferred gene regulatory networks on the interaction level is shown in Figure 1.

Functional analysis of the inferred Oligo, Bead and RNAseq UC GRN

In this section we highlighted the key biological processes of the three UC GRNs and their association to known cancer genes and performed a comparative analysis between inferred GRN, relevance and PPI networks. We identified the most prominent subnetworks for known biological processes of the inferred Oligo, Bead and RNAseq UC GRN by a functional enrichment analysis for gene pairs (GPEA). The association of the identified biological processes to known cancer hallmarks was quantified by a subsequent enrichment analysis of cancer census genes.

The GPEA analysis was performed for Gene Onotology (GO) biological process for all terms with ≥3 and <1000 genes. For the RNAseq UC GRN we observed a total of 10.3% significant GO terms. In contrast, for the Bead and Oligo UC GRN only 4.94% and 5.38% of all tested GO terms were significant Figure 1. For all GRN networks we observed that 50% of the identified significant GO terms were also enriched by cancer census genes. A total of 91% (RNAseq), 88% (Bead) and 93% (Oligo) of the cancer genes were present in the selected set of significant Gene Ontology biological processes.

From all significant GO terms that we identified 299 (55.4%) GO terms were common across the three UC GRNs. We observed a wide variety of common biological processes with a pronounced representation of immune related processes, cell cycle, catabolic processes such as proteolysis, chromatin organization, metabolism, cell adhesion, cell migration, cell differentiation and development including keratinization and angiogenesis. A complete list of the significant terms for the individual analyses is given in the Additional file 1: Tables S1, S2 and S3. An overview of the functional landscape of the common significant terms among the GRN networks is shown in Additional file 1: Figure S1.

In order to evaluate our results we compared the fraction of cancer associated biological processes between the BC3Net and C3Net GRN, WGCNA and CLR relevance networks and PPI networks from graphite, SingNet and CPDB. The analysis was performed separately for the Oligo, Bead and RNAseq gene expression data. For the analysis we generated relevance networks by hard thresholds for 0.1 to 0.9 percentiles of the absolute correlation matrix (WGCNA) and the z-score transformed matrix (CLR). C3Net inferred interactions were weighted by the respective mutual information value and for BC3Net by the ensemble consensus rate (ECR). For C3Net and BC3Net the analysis was performed on the entire network and for an ensemble of hard thresholds ranging from 0.1 to 0.9 percentiles. Figure 2 shows the fraction of biological processes that were identified from the GPEA analysis (α=0.001, Bonferroni) with a significant enrichment of cancer census genes. For all 3 datasets the BC3Net showed the largest fraction of cancer associated significant biological processes (50%). CLR and WGCNA showed a low performance on the Oligo gene expression dataset (25% to 35%) that is comparable to PPI networks. We also observed that CLR shows a prominently improved performance compared to WGCNA.

Figure 2
figure 2

GPEA analysis of GO biological process gene sets for relevance networks, C3Net and BC3Net GRNs. Shown are the fraction of significant GO biological process terms with significant enrichment of cancer census genes.

Gene pair enrichment analysis of gene subnetworks of co-located genes

Gene expression profiles that are influenced by genomic and epigenomic alterations can elucidate dependency structures of co-located genes and link to novel genomic target regions which are specific to urothelial cancer. In order to identify genomic cancer target regions with significant subnetworks in the RNAseq, Bead and Oligo GRNs we perfromed an enrichment analysis for gene pairs in gene sets from genome-wide 1 Mb genomic regions of co-located genes.

We observed 11.68% significantly co-located gene subnetworks for the RNAseq UC GRN. In contrast, for the Bead and Oligo UC GRN we identified only 2−3%. Figure 3 shows the most prominent co-located gene subnetwork for the GRNs of the RNAseq, Bead array and Oligo UC dataset. For the three GRNs the top 50 chromosomal regions with a significant GRN subnetwork are shown in Tables 2, 3 and 4 (for full tables see Additional file 1: Tables S4, S5 and S6). We reviewed the literature for the most prominent identified genomic region and their association to UC for each GRN. For the RNA-seq UC GRN the most prominent gene subnetwork was located on chromosome locus 5q31.3 and represents a protocadherin gene cluster. In UC, the loci 5q31.2−q32 has been associated with losses in a low fraction of UC tumors [46,47]. In [48,49] an epigenetic analysis was performed on free DNA derived from blood serum samples from UC patients of the protocadherin PCDH10 an PCDH8. The studies of [48,49] showed that the methylation patterns of PCDH10 and PCDH8 were significantly associated with stage, grade, recurrence and tumor size. The 5q31.3 locus was also described in Wilms tumor to be epigentically silenced [50]. For the Bead UC GRN the most prominent co-located gene subnetwork was located on chromosome locus 8q24.3. The 8q24.3 locus is a common gain loci in multiple cancers and was also confirmed in multiple UC cell lines [51]. For the Oligo UC GRN the chromosome locus 1q23.3 contained the most prominent co-located gene subnetwork. In [52] a gain of 1q23.3 was identified from free DNA in urine samples from UC patients. Further, [52] validated a selected candidate PFND2 that is located in 1q23.3 in an independent set of urothelial cancer tumors. PFND2 was significantly amplified and overexpressed and showed association to increasing stage and tumor grade.

Figure 3
figure 3

Significant genomic UC GRN subnetworks defined from co-located genes of 1 Mb genomic regions.(A) Overview of the GPEA analysis showing all significant genomic subnetworks for the RNAseq (yellow), Bead (blue) and Oligo (red) UC GRNs. (B-D) Shown are the most prominent genomic subnetworks derived from the RNAseq (B), Bead (C) and Oligo (D) UC GRNs.

Table 2 GPEA analysis of 1 Mb genomic regions gene sets for the RNAseq UC GRN
Table 3 GPEA analysis of 1 Mb genomic regions gene sets for the Bead UC GRN
Table 4 GPEA analysis of 1 Mb genomic regions gene sets for the Oligo urothelial cancer GRN

Gene pair enrichment analysis of gene family subnetworks

A gene family is a group of duplicated genes with similar biological functions or biochemical activities which often form gene clusters of genes with chromosomal co-location. In this section we performed a GPEA for gene family gene sets and compared the results between the UC RNAseq, Bead and Oligo GRNs.

We found a total of 20.44% (103) significant gene family subnetworks in the RNAseq GRN (Additional file 1: Table S7), but only 8.2% (40) for the Bead UC GRN (Additional file 1: Table S8) and 9.1% (39) for the Oligo UC GRN were significantly enriched (Additional file 1: Table S9). However, we found a high agreement of the gene family subnetworks between our GRNs with a total pairwise overlap of 30−60% and among the three networks 25% of all identified gene family subnetworks (Figure 1D). It is noteworthy that 97.5% of the gene families identified by the Bead and 82% of the gene families identified by the Oligo network were also significantly enriched in the RNAseq GRN. In total 28 gene families were identified across all three UC GRNs which described CD molecules, keratin proteins (KRT), protocadherins (PCDHC), kalikrein proteins (KLK), zinc-finger transcription factors, metallothioneins and the immunoglobulin superfamily. An overview of the significant gene families that were common across all three GRNs are shown in Table 5.

Table 5 GPEA analysis of gene family gene sets for the UC RNAseq, Bead and Oligo GRN

Bladder cancer GRN degree centrality and hub genes

The identification of highly interactive central genes, i.e., hub genes of inferred and experimental network can provide promising targets for urothelial cancer. In this section we described individual hub genes of the gene regulatory network and review their functional role and relevance for the study of UC.

In order to compare the global structural properties of individual genes we performed a pairwise comparison of the degree centrality for 11,700 genes that are present among the three networks. The pairwise comparisons of the gene degree centrality across the three networks showed only a weak correlation. The degree ranks showed a slightly higher correlation between the RNAseq and Bead GRN of r=0.22 (RNAseq-Bead, p≤2.2e−16) compared to r=0.16 (Bead-Oligo, p≤2.2e−16) and r=0.16 (RNAseq-Oligo, p≤2.2e−16).

Hub genes of gene regulatory networks were observed to be highly dataset specific. Table 6 A, B, C shows the six most frequently observed hub genes for each of the inferred UC GRNs. In the following we describe the hub genes for which there is strong evidence for their relevance to cancer related properties. For example, the transmembrane protein HID1 that was observed as a major hubgene in the RNAseq GRN is reported to be downregulated in multiple cancers [53]; FER1L4 is a lncRNA reported to be prominently downregulated in gastric cancer [54], TTLL3 is described as a candidate cancer gene [55], RIF1 has been described to have anti-apoptotic properties in DNA repair [56] and SBNO1 (strawberry notch homolog 1) was reported in lung cancer [57]. For the UC Bead GRN, RNF17 (TDRD4) is a potential liver cancer CT antigen [58] and TMED7 was observed to be upregulated in a nasopharyngeal carcinoma cell line and described to act as a major immune system switch [59]. The Oligo GRN hubgene CYP4A11 was shown to promote angiogenesis and metastasis in lung cancer [60] and SLC38A3 (SNAT3) is a glutamine transporter and has been described as a marker for malignant glioma [61].

Table 6 The six major hub genes for the RNAseq, Bead and Oligo UC GRN

Overlapped only for a single term, i.e., for the regulation of neuron differentiation (GO:0045664). For the RNAseq and Oligo GRN we observed nine terms in agreement, e.g., G-protein coupled receptor signaling pathway (GO:0007186), ion transport (GO:0006811) and sensory perception (GO:0007600). We observed that the average degree centrality of the networks with randomized gene labels was similar across the gene sets and independent of the number of genes of a gene set. However, our RNAseq dataset mostly considered muscle-invasive UCs and shows terms associated to invasiveness. The Bead data comprised terms that were nuclear while the terms predicted from the Oligo data were predominantly extracellular membrane associated.

Quantitative comparison of experimental interactions in RNAseq, Bead and Oligo UC GRN

In this section we quantified the extend of local and globally shared interactions between the inferred GRNs and ppi networks. We assessed the global extent of interactions from PPI databases that were present in the UC GRNs by comparing the entire GRNs networks to String, Biogrid, Hprd, Intact, Mint, Graphite, CPDB and SingNet (Additional file 1: Table S10).

As observed on the subnetwork level using all String interactions the F-scores between the RNAseq and the Oligo UC GRN were similar. In contrast the Bead UC GRN showed a lower F-score for all PPI databases compared to the RNA-seq GRN and Bead GRN (Additional file 1: Table S10).

Further, we compared the relative quantity of PPI interactions for the identified significant subnetworks of the gene sets for biological processes, genomic co-located genes and gene families. To avoid the comparison to subnetworks with no known protein-protein associations we used String as reference as it was the largest collection of PPI interactions that we considered in our study. For each gene set we computed F-scores by comparing the corresponding subnetwork of the GRN to the subnetwork of the String network reference. Figure 4 shows the cumulative F-score distributions between the RNAseq, Bead and Oligo GRN separately for the 299 GO terms, 28 gene family subnetworks and 40 chromosomal 1 Mb regions. In addition, we repeated the analysis for each GRN 25 times using a reference subnetwork where the gene labels were randomized (Figure 4).

Figure 4
figure 4

Network inference performance using interactions from the STRING database. Shown are the F-score distributions for commonly significant GRN subnetworks of the GPEA analysis for A) 299 Gene Ontology Biological Process terms; B) co-located genes of 1 Mb genomic regions; C) 28 gene families; and D) p-value distribution (FDR) estimated for all gene sets.

The average F-scores were highest for the subnetworks of genomic co-located genes (F RNAseq =0.26, F Bead =0.18 and F Oligo =0.26) and for the gene family gene sets (F RNAseq =0.15, F Bead =0.12, F Oligo =0.15). The Gene Ontology gene sets had the lowest average F-scores compared to the genomic and gene family subnetworks (F RNAseq =0.044, F Bead =0.029, F Oligo =0.039) (Figure 4 and Additional file 1: Table S11). The observations are in agreement with the global analysis for RNAseq and Oligo GRN, where the Bead UC GRN has the tendency to perform worse. However, the RNAseq UC GRN shows the tendency for smaller p-values over the Bead and Oligo UC GRN (Figure 4D).

The reference network with randomized gene labels were significantly lower compared to the GRN for all comparisons. The Bead GRN had the tendency to show a significantly lower mean F-score compared to the RNAseq and Oligo GRN for the Gene Ontology (t-test, p R N A s e qB e a d =0.000007, p B e a dO l i g o =0.002390) and the genomic 1Mb window subnetworks (p R N A s e qB e a d =0.000675, p B e a dO l i g o =0.001697). For the gene family subnetworks we did not observe a significant difference among the three GRN.


In this paper, we have presented novel perspectives and applications for the identification of UC molecular targets using GRNs. Specifically, we performed a structural, functional and comparative analysis across three UC GRNs that were independently inferred from three large-scale RNAseq, Bead and Oligo gene expression datasets. Our objective was to identify putative prognostic UC targets for a subsequent investigation in UC tumors on the basis of their enrichment in functional subnetworks and hub gene analysis. Our results demonstrate that GRNs are highly dataset specific on the gene interaction level and showed large similarities across the functional subnetwork levels. The RNAseq based GRN showed the most prominent functional enrichment and is thus the data type of choice for a network inference. The RNAseq and Oligo GRN showed a similar inference performance based on public interaction databases and outperformed the Bead based GRN.

On the structural level, the three inferred GRNs were observed to follow a power law distribution [62] that is common for inferred and experimental biological networks [63-65]. Our results demonstrated that the network structure at the gene level of GRNs are highly dependent on the individual gene expression dataset. On the gene interaction level the pairwise comparison between the networks showed only an overlap of 2% and only 0.25% of all interactions are common among the three networks (Figure 1). There are three main explanations for this observation. The first reason is that the BC3Net algorithm considers only the strongest interaction neighbors for each gene and is thus highly dependent on the search space of the genes that are included in the dataset. The second reason is that the variations caused by concordance differences of the expression are dependent on technical properties of the individual gene expression platforms and platform dependent data processing procedures. The third reason is that the datasets represent varying proportions of different tumor grades and stages from individual patients that represent a complex condition phenotype. Further, gene expression profiles of tumor tissues are highly heterogeneous on the molecular and tissue level, i.e., tumor clonal variation within and between different patient samples [66].

In [67] a guilt-by-association approach was developped to predict molecular roles of genes with unknown functions. The “guilt-by-association” property of genes that are connected in a defined network can also be used for a functional enrichment analysis for gene pairs which have known functions and are involved in the same biological processes. We identified significant functional GRN subnetworks by performing a gene pair enrichment analysis (GPEA) for defined gene sets. We used the terminology gene pair enrichment analysis to distinguish the latter from the terminology for a gene-based enrichment analysis which has no structural constraint. The concept for the analysis was introduced from graph theory [68] and has been developped and applied for the identification of significant protein complex and ontology gene sets in PPI and inferred networks [69-72].

A total of 5 to 10% of all tested Gene Ontology Biological process terms, 2 to 10% of gene sets of co-located genes and 9 to 20% gene family gene sets showed a significant subnetworks by the enrichment of inferred interactions (Figure 1). RNAseq based network showed more than twice the proportion of significant subnetworks compared to the Bead and Oligo microarray based GRN. Our results showed in a quantative manner that RNAseq is beneficial for GRN inference compared to Bead and Oligo microarray based data. The major advantages of RNAseq are more accurate measurement of the dynamic range of low and highly expressed genes [73] and thus gives a better resolution of the underlying functional dependency structure between the genes.

In contrast to the low similarity that was observed between the GRNs on the structural interaction level, we observed high similarities on the functional subnetwork level (Figure 1). The fraction of significant biological process Gene Ontology terms that were common across the three UC GRN was above 55%. For the gene family subnetworks we observed a similarity for 25.45% and the lowest percentage for genomic co-located gene subnetworks 6.8% (Figure 1).

The networks described a prominent enrichment for known cancer hallmarks [74] with significant GO subnetworks related to immune response, cell cycle, signal transduction, DNA repair, translation, proteolysis, metabolic terms such as respiration and cell morphogenesis, adhesion and migration. Further, over 50% of the significant GO subnetworks were highly enriched by known cancer genes defined by the cosmic cancer census [31] across the three UC GRNs. We observed that the fraction of cancer associated subnetworks is prominently lower in relevance network inference methods. This may result from low dependency measures of relevant interactions of genes in a more complex context being excluded from a relevance network by a global threshold. For other GRN inference methods we expect similar results to the results presented by the BC3Net that is based on the C3Net. A C3Net infers a core structure of a GRN and thus infers only a subnetwork of other GRN inference methods based on mutual information [10]. For each gene in a C3Net at most one gene neighbor with strongest mutual dependency is selected, which results in a highly reduced time complexity for multiple testing of mutual information. The C3Net and BC3Net GRNs inference method is therefore less time consuming which makes the inference of very large GRNs (>20K genes) feasible in a reasonable time.

On the genomic level, the GRNs were investigated for genomic UC targets, where we identified genomic regions with known diagnostic and prognostic properties for urothelial cancer such as 1q23.3 [52] (Oligo GRN), 8q24.3 [51] (Bead GRN) and 5q31.3 [48-50] (RNAseq GRN). The identified genomic regions can link to chromosomal aberrations, histone modifications, changes in epigenetic regulation (methylation), regulatory elements and spatial chromosome organization in the nucleus. These processes are commonly deregulated in cancer. For example the impairment of DNA repair mechanisms leads to an accumulation of chromosomal aberrations that are frequently observed in the progression of UC [75]. The identification of subnetworks of genomic regions from co-located genes therefore provided a powerful tool to identify putative novel genomic targets from cancer gene expression datasets.

In the analysis of gene families we found CD molecules as the most prominent gene family. CD molecules are promising targets for novel cancer immunotherapies such as CD47 [76]. Some popular UC biomarkers target proteins of an entire gene family and not a single gene product such as Keratins [77] and Kalikrein proteins [78] which are popular tumor markers for UC. Gene families are crucial in cancer research [79] because they represent groups of genes that are functionally highly redundant and represent potential targets of the underlying molecular heterogeneity that is observed for malignant processes. We showed that the identification and ranking of functional and co-located gene sets and gene families using our GPEA on GRNs is a versatile approach for the generation of novel targets and molecular understanding of the properties of urothelial cancer from the perspective of large-scale tumor tissue gene expression data.

Hub genes of GRNs reflect the most prominent dependencies of the expression profile to a large number of genes. We identified hub genes such as HID1 [53] (RNAseq) RNF17 (TDRD4) [58] (Bead), CYP4A11 [60] (Oligo) for the individual GRNs which show in the literature strong evidence for cancer related diagnostic and prognostic properties. Further, we performed a degree centrality analysis of the GRNs that showed that the degree centrality of the genes allow to target promising mediators of cancer related cellular activities and signaling processes.

In addition, we performed a quantitative comparison of protein interactions for the RNAseq, Bead and Oligo UC GRNs. We note that the overlap for protein interaction data and GRNs is expected to be low and non-random. For example the most prominent PPI interactions that can be found in a GRNs are physical interactions of genes corresponding to large protein complexes (e.g. ribosome biogenesis and proteasome) in contrast to more transient protein interactions [11,80]. A GRN is inferred from gene expression data and thus can only detect indirect association to the protein level of a gene network. However, the analysis allowed to compare network properties between the Oligo, Bead and RNAseq data and pointed to the tendency that Oligo expression data should be prefered over Bead expression data for a GRN inference.


On the functional and structural level our results demonstrated that RNAseq based data is the preferred data type for a GRN inference. GRNs are highly dataset-specific at the interaction level, while at the global functional level they are highly similar. GRN inference is a powerful tool to provide a database of novel UC targets that can be studied for prognostic and diagnostic clinical applications [48,49,52,58,60].


  1. Kaufman D, Shipley W, Feldman A. Bladder cancer. Lancet. 2009; 374(9685):239–49.

    Article  CAS  PubMed  Google Scholar 

  2. Martyn-Hemphill C, Mak D, Khan M, Challacombe B, Bishop C. Recent advances in diagnosis and treatment of transitional cell carcinoma of the bladder. Int J Surg. 2013; 11(9):749–52.

    Article  PubMed  Google Scholar 

  3. Chamie K, Litwin M, Bassett J, Daskivich T, Lai J, Hanley J, et al.Recurrence of high-risk bladder cancer: a population-based analysis. Cancer. 2013; 119(17):3219–27.

    Article  PubMed Central  PubMed  Google Scholar 

  4. de Matos Simoes R, Tripathi S, Emmert-Streib F. Organizational structure and the periphery of the gene regulatory network in B-cell lymphoma. BMC Syst Biol. 2012; 6:38.

    Article  PubMed Central  PubMed  Google Scholar 

  5. Emmert-Streib F, de Matos Simoes R, Mullan P, Haibe-Kains B, Dehmer M. The gene regulatory network for breast cancer: integrated regulatory landscape of cancer hallmarks. Front Genet. 2014; 5:15.

    PubMed Central  PubMed  Google Scholar 

  6. Emmert-Streib F, de Matos Simoes R, Galina Glazko G, Simon McDade S, Benjamin Haibe-Kains B, Andreas Holzinger A, et al.Functional and genetic analysis of the colon cancer network. BMC Bioinf. 2014; 15 Suppl 6:6.

    Article  Google Scholar 

  7. Margolin A, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, et al.ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinf. 2006; 7 Suppl 1:7.

    Article  Google Scholar 

  8. Meyer PE, Kontos K, Lafitte F, Bontempi G. Information-theoretic inference of large transcriptional regulatory networks. EURASIP J Bioinf Syst Biol. 2007; 1:79879.

    Google Scholar 

  9. Huynh-Thu V, Irrthum A, Wehenkel L, Geurts P. Inferring regulatory networks from expression data using tree-based methods. PLoS One. 2010; 5(9):e12776.

    Article  PubMed Central  PubMed  Google Scholar 

  10. de Matos Simoes R, Dehmer M, Emmert-Streib F. B-cell lymphoma gene regulatory networks: biological consistency among inference methods. Front Genet. 2013; 4:281.

    Article  PubMed Central  PubMed  Google Scholar 

  11. de Matos Simoes R, Dehmer M, Emmert-Streib F. Interfacing cellular networks of S. cerevisiae and E. coli: connecting dynamic and genetic information. BMC Genomics. 2013; 14:324.

    Article  PubMed  Google Scholar 

  12. de Matos Simoes R, Emmert-Streib F. Influence of statistical estimators of mutual information and data heterogeneity on the inference of gene regulatory networks. PLoS One. 2011; 6(12):29279.

    Article  Google Scholar 

  13. Network CGAR. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature. 2014; 20(507(7492)):315–22.

    Google Scholar 

  14. Kim W, Kim E, Kim S, Kim Y, Ha Y, Jeong P, et al.Predictive value of progression-related gene classifier in primary non-muscle invasive bladder cancer. Mol Cancer. 2010; 9:3.

    Article  PubMed Central  PubMed  Google Scholar 

  15. Lee J, Leem S, Lee S, Kim S, Park E, Kim S, et al.Expression signature of E2F1 and its associated genes predict superficial to invasive progression of bladder tumors. J Clin Oncol. 2010; 28(16):2660–7.

    Article  CAS  PubMed  Google Scholar 

  16. Riester M, Taylor J, Feifer A, Koppie T, Rosenberg J, Downey R, et al.Combination of a novel gene expression signature with a clinical nomogram improves the prediction of survival in high-risk bladder cancer. Clin Cancer Res. 2012; 18(5):1323–33.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  17. Dyrskjot L, Kruhøffer M, Thykjaer T, Marcussen N, Jensen J, Moller K,et al.Gene expression in the urinary bladder: a common carcinoma in situ gene expression signature exists disregarding histopathological classification. Cancer Res. 2004; 64(11):4040–8.

    Article  CAS  PubMed  Google Scholar 

  18. Als A, Dyrskjot L, von der Maase H, Koed K, Mansilla F, Toldbod H, et al.Emmprin and survivin predict response and survival following cisplatin-containing chemotherapy in patients with advanced bladder cancer. Clin Cancer Res. 2007; 13(15 Pt 1):4407–14.

    Article  CAS  PubMed  Google Scholar 

  19. Smith S, Baras A, Owens C, Dancik G, Theodorescu D. Transcriptional signatures of Ral GTPase are associated with aggressive clinicopathologic characteristics in human cancer. Cancer Res. 2012; 72(14):3480–91.

    Article  CAS  PubMed  Google Scholar 

  20. Wang K, Singh D, Zeng Z, Coleman S, Huang Y, Savich G, et al.MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 2010; 38(18):178.

    Article  CAS  Google Scholar 

  21. Li B, Ruotti V, Stewart R, Thomson J, Dewey C. RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics. 2010; 26(4):493–500.

    Article  PubMed Central  PubMed  Google Scholar 

  22. Huber W, Gentleman R. matchprobes: a Bioconductor package for the sequence-matching of microarray probe elements. Bioinformatics. 2004; 20(10):1651–2.

    Article  CAS  PubMed  Google Scholar 

  23. Irizarry R, Bolstad B, Collin F, Cope L, Hobbs B, Speed T. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 2003; 31(4):15.

    Article  Google Scholar 

  24. Altay G, Emmert-Streib F. Inferring the conservative causal core of gene regulatory networks. BMC Syst Biol. 2010; 4:132.

    Article  PubMed Central  PubMed  Google Scholar 

  25. Altay G, Emmert-Streib F. Structural Influence of gene networks on their inference: Analysis of C3NET. Biol Direct. 2011; 6:31.

    Article  PubMed Central  PubMed  Google Scholar 

  26. Olsen C, Meyer PE, Bontempi G. On the impact of entropy estimation on transcriptional regulatory network inference based on mutual information. EURASIP J Bioinf Syst Biol. 2009; 1:308959.

    Google Scholar 

  27. Butte A, Kohane I. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac Symp Biocomput. 2000; 5:418–429.

    Google Scholar 

  28. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 2008; 9:559.

    Article  Google Scholar 

  29. Faith J, Hayete B, Thaden J, Mogno I, Wierzbowski J, Cottarel G, et al.Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 2007; 5(1):8.

    Article  Google Scholar 

  30. Meyer P, Lafitte F, Bontempi G. minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinf. 2008; 9:461.

    Article  Google Scholar 

  31. Futreal P, Coin L, Marshall M, Down T, Hubbard T, Wooster R, et al.A census of human cancer genes. Nat Rev Cancer. 2004; 4(3):177–83.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  32. Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry J, et al.Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000; 25(1):25–9.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  33. Hurst L, Pal C, Lercher M. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 2004; 5(4):299–310.

    Article  CAS  PubMed  Google Scholar 

  34. Newman M. Networks: An Introduction. Oxford: Oxford University Press; 2010.

    Book  Google Scholar 

  35. Chatr-Aryamontri A, Breitkreutz B, Heinicke S, Boucher L, Winter A, Stark C, et al.The BioGRID interaction database: 2013 update. Nucleic Acids Res. 2013; 41(Database issue):816–23.

    Article  Google Scholar 

  36. Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, et al.IntAct: an open source molecular interaction database. Nucleic Acids Res. 2004; 32(Database issue):452–5.

    Article  Google Scholar 

  37. Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G. MINT: a Molecular INTeraction database. FEBS Lett. 2002; 513(1):135–40.

    Article  CAS  PubMed  Google Scholar 

  38. Peri S, Navarro J, Amanchy R, Kristiansen T, Jonnalagadda C, Surendranath V, et al.Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res. 2003; 13(10):2363–71.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  39. Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, et al.STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 2013; 41(Database issue):808–15.

    Article  Google Scholar 

  40. Zaman N, Li L, Jaramillo M, Sun Z, Tibiche C, Banville M, et al.Signaling network assessment of mutations and copy number variations predict breast cancer subtype-specific drug targets. Cell Rep. 2013; 5(1):216–23.

    Article  CAS  PubMed  Google Scholar 

  41. Sales G, Calura E, Cavalieri D, Romualdi C. graphite - a Bioconductor package to convert pathway topology to gene network. BMC Bioinf. 2012; 13:20.

    Article  Google Scholar 

  42. Kamburov A, Pentchev K, Galicka H, Wierling C, Lehrach H, Herwig R. ConsensusPathDB: toward a more complete picture of cell biology. Nucleic Acids Res. 2011; 39(Database issue):712–7.

    Article  Google Scholar 

  43. Consortium TU. The Universal Protein Resource. Nucleid Acid Res. 2008; 36(November 2006):190–5.

    Google Scholar 

  44. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodological). 1995; 57:125–33.

    Google Scholar 

  45. Dijkstra EW. A note on two problems in connection with graphs. Numerische Math. 1959; 1:269–71.

    Article  Google Scholar 

  46. Simon R, Eltze E, Schäfer K, Bürger H, Semjonow A, Hertle L, et al.Cytogenetic analysis of multifocal bladder cancer supports a monoclonal origin and intraepithelial spread of tumor cells. Cancer Res. 2001; 61(1):355–62.

    CAS  PubMed  Google Scholar 

  47. Kram A, Li L, Zhang R, Yoon D, Ro J, Johnston D, et al.Mapping and genome sequence analysis of chromosome 5 regions involved in bladder cancer progression. Lab Invest. 2001; 81(7):1039–48.

    Article  CAS  PubMed  Google Scholar 

  48. Lin Y, Li Z, He Z, Guan T, Ma J. Clinical and prognostic significance of protocadherin-10 (PCDH10) promoter methylation in bladder cancer. J Int Med Res. 2012; 40(6):2117–23.

    Article  CAS  PubMed  Google Scholar 

  49. Lin Y, Ma J, Luo X, Guan T, Li Z. Clinical significance of protocadherin-8 (PCDH8) promoter methylation in bladder cancer. J Int Med Res. 2013; 41(1):48–54.

    Article  CAS  PubMed  Google Scholar 

  50. Dallosso A, Hancock A, Szemes M, Moorwood K, Chilukamarri L, Tsai H, et al.Frequent long-range epigenetic silencing of protocadherin gene clusters on chromosome 5q31 in Wilms’ tumor. PLoS Genet. 2009; 5(11):1000745.

    Article  Google Scholar 

  51. Matsuda R, Enokida H, Chiyomaru T, Kikkawa N, Sugimoto T, Kawakami K, et al.LY6K is a novel molecular target in bladder cancer on basis of integrate genome-wide profiling. Br J Cancer. 2011; 104(2):376–86.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  52. Lopez V, Gonzolez-Peramato P, Suela J, Serrano A, Algaba F, Cigudosa J, et al.Identification of prefoldin amplification (1q23.3-q24.1) in bladder cancer using comparative genomic hybridization (CGH) arrays of urinary DNA. J Transl Med. 2013; 11:182.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  53. Harada H, Nagai H, Tsuneizumi M, Mikami I, Sugano S, Emi M. Identification of DMC1, a novel gene in the TOC region on 17q25.1 that shows loss of expression in multiple human cancers. J Hum Genet. 2001; 46(2):90–5.

    Article  CAS  PubMed  Google Scholar 

  54. Song H, Sun W, Ye G, Ding X, Liu Z, Zhang S, et al.Long non-coding RNA expression profile in human gastric cancer and its clinical significances. J Transl Med. 2013; 11:225.

    Article  PubMed Central  PubMed  Google Scholar 

  55. Sjöblom T, Jones S, Wood L, Parsons D, Lin J, Barber T, et al.The consensus coding sequences of human breast and colorectal cancers. Science. 2006; 314(5797):268–74.

    Article  PubMed  Google Scholar 

  56. Wang H, Zhao A, Chen L, Zhong X, Liao J, Gao M, et al.Human RIF1 encodes an anti-apoptotic factor required for DNA repair. Carcinogenesis. 2009; 30(8):1314–9.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  57. Suzuki C, Takahashi K, Hayama S, Ishikawa N, Kato T, Ito T, et al.Identification of Myc-associated protein with JmjC domain as a novel therapeutic target oncogene for lung cancer. Mol Cancer Ther. 2007; 6(2):542–51.

    Article  CAS  PubMed  Google Scholar 

  58. Yoon H, Lee H, Kim H, You K, Park Y, Kim H, et al.Tudor domain-containing protein 4 as a potential cancer/testis antigen in liver cancer. Tohoku J Exp Med. 2011; 224(1):41–6.

    Article  CAS  PubMed  Google Scholar 

  59. Doyle S, Husebye H, Connolly D, Espevik T, O’Neill L, McGettrick A. The GOLD domain-containing protein TMED7 inhibits TLR4 signalling from the endosome upon LPS stimulation. Nat Commun. 2012; 3:707.

    Article  PubMed  Google Scholar 

  60. Yu W, Chen L, Yang Y, Falck J, Guo A, Li Y, et al.Cytochrome P450 w-hydroxylase promotes angiogenesis and metastasis by upregulation of VEGF and MMP-9 in non-small cell lung cancer. Cancer Chemother Pharmacol. 2011; 68(3):619–29.

    Article  CAS  PubMed  Google Scholar 

  61. Sidoryk M, Matyja E, Dybel A, Zielinska M, Bogucki J, Jaskolski D, et al.Increased expression of a glutamine transporter SNAT3 is a marker of malignant gliomas. Neuroreport. 2004; 15(4):575–8.

    Article  CAS  PubMed  Google Scholar 

  62. Barabási AL, Albert R. Emergence of scaling in random networks. Science. 1999; 206:509–12.

    Google Scholar 

  63. Albert R. Scale-free networks in cell biology. J Cell Sci. 2005; 118(21):4947–57.

    Article  CAS  PubMed  Google Scholar 

  64. van Noort V, Snel B, Huymen MA. The yeast coexpression network has a small-world, scale-free architecture and can be explained by a simple model. EMBO Rep. 2004; 5(3):280–4.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  65. Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A. Reverse engineering of regulatory networks in human b cells. Nat Genet. 2005; 37(4):382–90.

    Article  CAS  PubMed  Google Scholar 

  66. Marusyk A, Polyak K. Tumor heterogeneity: causes and consequences. Biochim Biophys Acta. 2010; 1805(1):105–17.

    CAS  PubMed Central  PubMed  Google Scholar 

  67. modENCODE Consortium, Roy S, Ernst J, Kharchenko P, Kheradpour P, Negre N, et al.Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science. 2010; 330(6012):1787–97.

    Article  Google Scholar 

  68. Alba RD. A graph-theoretic definition of a sociometric clique. J Math Sociology. 1973; 3:3–113.

    Article  Google Scholar 

  69. Hart G, Lee I, Marcotte E. A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality. BMC Bioinf. 2007; 8:236.

    Article  Google Scholar 

  70. Zanivan S, Cascone I, Peyron C, Molineris I, Marchio S, Caselle M, et al.A new computational approach to analyze human protein complexes and predict novel protein interactions. Genome Biol. 2007; 8(12):256.

    Article  Google Scholar 

  71. Wang J, Huang Q, Liu Z, Wang Y, Wu L, Chen L, et al.NOA: a novel Network Ontology Analysis method. Nucleic Acids Res. 2011; 39(13):87.

    Article  Google Scholar 

  72. de Matos Simoes R, Emmert-Streib F. Bagging statistical network inference from large-scale gene expression data. PLoS One. 2012; 7(3):33624.

    Article  Google Scholar 

  73. Malone J, Oliver B. Microarrays, deep sequencing and the true measure of the transcriptome. BMC Biol. 2011; 9:34.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  74. Hanahan D, Weinberg R. Hallmarks of cancer: the next generation. Cell. 2011; 144(5):646–74.

    Article  CAS  PubMed  Google Scholar 

  75. Apolo A, Kwiatkowski D. Targeting molecular aberrations in urothelial carcinoma: are we almost there?. Am Soc Clin Oncol Educ Book. 2013; 33:195–9.

    Article  Google Scholar 

  76. Tseng D, Volkmer J, Willingham S, Contreras-Trujillo H, Fathman J, Fernhoff N, et al.Anti-CD47 antibody-mediated phagocytosis of cancer by macrophages primes an effective antitumor T-cell response. Proc Natl Acad Sci USA. 2013; 110(27):11103–8.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  77. Karantza V. Keratins in health and cancer: more than mere epithelial cell markers. Oncogene. 2011; 30(2):127–38.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  78. Dorn J, Bayani J, Yousef G, Yang F, Magdolen V, Kiechle M, et al.Clinical utility of kallikrein-related peptidases (KLK) in urogenital malignancies. Thromb Haemost. 2013; 110(3):408–22.

    Article  CAS  PubMed  Google Scholar 

  79. Bardelli A, Velculescu V. Mutational analysis of gene families in human cancer. Curr Opin Genet Dev. 2005; 15(1):5–12.

    Article  CAS  PubMed  Google Scholar 

  80. Jansen R, Greenbaum D, Gerstein M. Relating whole-genome expression data with protein-protein interactions. Genome Res. 2002; 12(1):37–46.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  81. Liu Z, Shao Y, Tan L, Shi H, Chen S, Guo J. Clinical significance of the low expression of FER1L4 in gastric cancer patients. Tumour Biol. 2014; 35(10):9613–7.

    Article  CAS  PubMed  Google Scholar 

  82. Rocha C, Papon L, Cacheux W, Marques Sousa P, Lascano V, Tort O, et al.Tubulin glycylases are required for primary cilia, control of cell proliferation and tumor development in colon. EMBO J. 2014; 33(19):2247–60.

    Article  CAS  PubMed  Google Scholar 

Download references


RdMS and SD are funded 100 FTE, and KW 30 FTE by a grant from Invest NI RD0412515. We would like to thank Tamara Zoranovic and Shailesh Tripathi for fruitful discussions.

Author information

Authors and Affiliations


Corresponding authors

Correspondence to Kate E Williamson or Frank Emmert-Streib.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

RdMS conceived the study and analyzed the data. RdMS, KW, SD and FES interpreted the results and wrote the manuscript. All authors read and approved the final manuscript.

Additional file

Additional file 1

Supplementary Materials: Urothelial cancer gene regulatory networks inferred from large-scale RNAseq, Bead and Oligo gene expression data.

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit

The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Matos Simoes, R.d., Dalleau, S., Williamson, K.E. et al. Urothelial cancer gene regulatory networks inferred from large-scale RNAseq, Bead and Oligo gene expression data. BMC Syst Biol 9, 21 (2015).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: