Combinatorial network of transcriptional regulation and microRNA regulation in human cancer
© Yu et al.; licensee BioMed Central Ltd. 2012
Received: 27 September 2011
Accepted: 16 May 2012
Published: 12 June 2012
Skip to main content
© Yu et al.; licensee BioMed Central Ltd. 2012
Received: 27 September 2011
Accepted: 16 May 2012
Published: 12 June 2012
Both transcriptional control and microRNA (miRNA) control are critical regulatory mechanisms for cells to direct their destinies. At present, the combinatorial regulatory network composed of transcriptional regulations and post-transcriptional regulations is often constructed through a forward engineering strategy that is based solely on searching of transcriptional factor binding sites or miRNA seed regions in the putative target sequences. If the reverse engineering strategy is integrated with the forward engineering strategy, a more accurate and more specific combinatorial regulatory network will be obtained.
In this work, utilizing both sequence-matching information and parallel expression datasets of miRNAs and mRNAs, we integrated forward engineering with reverse engineering strategies and as a result built a hypothetical combinatorial gene regulatory network in human cancer. The credibility of the regulatory relationships in the network was validated by random permutation procedures and supported by authoritative experimental evidence-based databases. The global and local architecture properties of the combinatorial regulatory network were explored, and the most important tumor-regulating miRNAs and TFs were highlighted from a topological point of view.
By integrating the forward engineering and reverse engineering strategies, we manage to sketch a genome-scale combinatorial gene regulatory network in human cancer, which includes transcriptional regulations and miRNA regulations, allowing systematic study of cancer gene regulation. Our work establishes a pipeline that can be extended to reveal conditional combinatorial regulatory landscapes correlating to specific cellular contexts.
Transcriptional factors (TF) and microRNAs (miRNAs) are important regulation factors to determine the expression levels of mRNAs and miRNAs . TFs activate or repress gene transcription by binding to specific sites (transcription factor binding sites, or TFBSs) in promoter regions, thus regulating gene expression at the transcription level; miRNAs inhibit mRNA translation by inducing mRNA degradation and/or blocking the translation machinery, thus negatively regulates gene expression at the post-transcriptional level. Given the facts that the transcription of both mRNA and miRNA is regulated by TFs, and that mRNA expression, including TF’s, could be modulated by miRNAs, the cellular transcriptome is believed to be determined by combinatorial regulatory network of at least two interconnected layers, where TFs work as master regulators in the transcriptional layer and miRNAs as fine tuners in the post-transcriptional layer . It thus becomes critical to delineate and characterize the two-layered combinatorial regulatory networks, for the sake of understanding the regulatory mechanisms at a higher precision than what we can do with either layer alone.
Databases, such as TransFAC  on TF-to-mRNA regulation, TransmiR  on TF-to-miRNA regulation, and TarBase  on miRNA-to-mRNA regulation, provide experimentally validated regulation relationships between regulators and their targets. However, such data alone are too limited to enable large-scaled studies. Therefore, peers have resorted to a forward-prediction strategy to infer regulatory relationships between TFs or miRNAs and their putative targets based on the matching or complementary of motif or seed sequences [5, 6]. In this way, they built the two-layered combinatorial regulatory networks, and investigated the global and local architectural properties [7, 8]. It is imaginable that a high rate of false positive predictions is necessitated , and moreover, these forward works generate ‘reference networks’ that span across all spatiotemporal contexts – in concept all regulations that take place at different temporal points and different cells or tissues are combined unreasonably. That is, forward engineering cannot solve a conditional regulatory network that corresponds to a particular cellular context. The reverse engineering strategy therefore comes into use where the regulatory relationships between TFs or miRNAs and their putative targets (cause) are inferred from the observed expression correlations (consequence) (for a review see ).
Reverse engineering has been put into effect in inferring TF-controlled transcriptional regulation networks [11–13] as well as sifting miRNA potential targets [14, 15]. However, we have rarely seen successful applications of reverse engineering in inferring combinatorial networks involving TFs and miRNAs, except for a few works where small-scaled combinatory circuits of miRNAs and TFs were mapped around some selected genes prioritized from the expression data [16–18]. The major obstacle in this direction, lack of simultaneously measured miRNA expression data and mRNA expression data, is being relieved as parallel miRNA expression and mRNA expression datasets are being continuously released to public , such as those for epithelial samples [20, 21] or various tumor samples [22–24]. Having only been explored for confirming predicted miRNA targets [24, 25] or extracting tumor-classifying molecular signatures , these parallel expression datasets have far more potential to be exploited.
Previously, we integrated forward predicted gene regulation relationships with miRNA-perturbed gene expression datasets (MPGE datasets) and as a result elucidated miRNA-centered primary and secondary regulatory cascades in human cancer by using nonparametric test and linear regression modeling . Confined to the type of expression data - mRNA expression, the combinatorial regulatory networks mapped therein encompassed only the regulation of mRNA by TF and by miRNA (miRNA-to-mRNA, TF-to-mRNA), missing the regulation of miRNA by TF (TF-to-miRNA). This limitation is also existent in a contemporary study , which substitutes mRNA’s expression data for that of the embedded intragenic miRNA in order to identify miRNA-mediated feedback and feed-forward loops. We realize that studies on combinatorial gene regulatory network can be advanced significantly with the help of the aforementioned parallel miRNA expression and mRNA expression datasets. Due to our preceding work on human cancers , we are particularly interested in the NCI-60 data panel [23, 29] which involves 60 cancerous cell lines originating from breast, central nervous system, colon, leukemia, melanoma, Non-Small Cell Lung, ovarian, prostate, and renal tissues.
In the present work, we demonstrated an efficient integration of the forward-predicted candidate regulatory relationships with the NCI60 panel of parallel miRNA and mRNA expression datasets, giving rise to a genome-scale combinatorial network of transcriptional regulations and miRNA regulations in human cancer. The resultant combinatorial regulatory network makes a scaffold for systematic study of cancer gene regulation, and the demonstrated working pipeline can be extended to reveal conditional combinatorial regulatory landscapes in other cellular contexts.
From the NCI-60 data source CellMiner (http://discover.nci.nih.gov/cellminer/), we downloaded the NCI-60 mRNA expression dataset assayed with the Affymetrix HG-U133 44 K probeset microarray  and the parallel miRNA expression dataset assayed with the OSU-CCC-hsa-miRNA-chip-V3 array . A total of 59 cell lines, originating from breast, central nervous system, colon, leukemia, melanoma, Non-Small Cell Lung, ovarian, prostate, and renal tissues, were used in our study, ignoring the cell line “LC:NCI_H23” for lack of necessary meta-information. For a pre-filtration of non-informative molecules, we only analyzed ‘frequently expressed’ miRNAs and mRNAs whose log2-based expression values were larger than the dataset-specific minimum value (2.3 and 7 for miRNA and mRNA respectively) in more than 47 (~80%) arrays. Afterwards, values in each row (corresponding to each mRNA or miRNA) of the two expression datasets were centered to zero, and rows of synonymous probes were averaged to designate a unique miRNA or mRNA. Finally, we determined two parallel expression datasets across a same spectrum of cancer cell lines, one involving 195 microRNA genes and another involving 8388 protein-coding genes.
From the miRGen website (http://www.diana.pcbi.upenn.edu/miRGen, version 3.0), we obtained 118,408 human candidate miRNA-gene regulatory relationships involving 276 human microRNAs and 10,255 targets, which were a union of results from three forward predicting algorithms: PicTar , TargetScan  and miRanda . For more information, please refer to our related work  and its supplement 2.
A set of forward predicted TF-gene regulatory relationships were compiled by merging records from UCSC (http://genome.ucsc.edu/) and TRED (http://rulai.cshl.edu/TRED/), which included 130,338 binary tuples involving 214 human TFs and 16,534 targets. For more information, please refer to our related work  and its supplement 3.
The file ‘wgRna.txt’, downloaded from UCSC hg18, gave coordinate information of human microRNA genes. For all miRNAs, around 35% were embedded in protein-coding gene regions and on the same-strand of the host gene, i.e., intragenic, whereas the others were located outside protein-coding gene regions, i.e., intergenic. Note that miRNAs embedded in protein gene regions but on the opposite strand to the host gene’s were assigned to the so-called ‘intergenic’ group.
For intragenic miRNAs, we assume that they have the same transcriptional factors as their host genes considering the co-transcription of intragenic miRNAs and the hosts [32, 33]. This is a common tactic in this field , though some violations were lately observed . For intergenic miRNAs, we inherited pioneer operations  to group them into genomic clusters where in each cluster every two proximate miRNAs were separated by not more than 7.5 kb (more explanation is found in Additional file 1). Then we investigated the distribution of TF binding site (TFBS) near the first microRNA in each cluster and defined -3Kb to +1Kb of the first microRNA's transcription starting site (TSS) as the promoter region of the whole cluster (more explanation on the promoter definition is found in Additional file 1.) On these grounds, candidate TF-target microRNA relationships for intergenic miRNAs were established.
After regression of Equation 1, TFs and miRNAs without statistically significant relationship (p > α1, α 1 = 0.05) with their putative targets were disconnected with their putative targets. With regards to miRNA-to-target regulations, specially, we only kept the ones with negative regulation efficacy . In this way, the number of putative regulation relationships was largely cut down. The remaining regulators for a specific target, mRNA or miRNA, were represented as independent variables in the formal multivariate linear regression (Equation 2 or Equation 3).
In brief, expression levels of mRNAs (Equation 2) and miRNAs (Equation 3) were modeled by the regulators that passed the statistical test (p < =α 1 ) in the pre-filtering step (Equation 1), and the stepwise linear regression was implemented to determine the ultimate regulators of a particular target. As we did for Equation 1, if a positive miRNA-to-target regulation efficacy appears in the final regression model of Equation 2, all regulations reserved in the model for the same target gene g were eliminated.
An R function implementing this network modelling algorithm (miRNAII.regression) is provided in the supplementary source codes (Additional file 2).
Statistics of vertices and edges in the combinatorial gene regulatory network in human cancer
Number of unique objects
159 (101 as only regulator, 5 as only target, and 53 as both)
81 (22 as only regulator, 13 as only targets, and 46 as both)
Non-TF protein-coding genes
False discovery rates (FDRs) of predicted regulation relationships
Edges in shuffled network
Edges in predicted network
miRNA → gene
477.8(+ − 29.2)
29.4(+ − 1.8)
TF → gene
382.3(+ − 23.9)
11.2(+ − 0.7)
TF → miRNA
21(+ − 5.2)
21.5(+ − 5.3)
881.1(+ − 46.2)
17.2(+ − 0.9)
We also evaluated our methods in terms of the predictability of the resultant linear model (Equation 2 and Equation 3). While in real work we made use of all 59 expression data-points of a gene, in the jack-knife like evaluation procedure we excluded one data-point from the modeling (Equations 1, 2, and 3) and used the fitted model (Equation 2/Equation 3) to predict the left-out data-point. For each target gene we had 59 iterations, and we calculated the Pearson correlation coefficient (PCC) between the measured expression values and the predicted ones. As a result, significantly higher PCCs were obtained with real expression data than with randomly permuted data (Additional file 5), indicating that the obtained linear model, or roughly speaking the combination of included regulations, had a significant power to predict the targets’ expression values.
Validation of regulation relationships
Significance of validation fraction increase (binomial test p-value)
Forward and reverse
Forward and reverse
Forward and reverse
Since our modeling work was based on the NCI60 expression data panel, the genes in the combinatorial regulatory network should be substantially related to cancer. For a total of 427 cancer related genes downloaded from the “Cancer Gene Census" (http://www.sanger.ac.uk/genetics/CGP/Census/), 197 overlapped the 8,388 frequently expressed genes of the NCI-60 mRNA expression dataset, of which 121 were found in the 3,259 genes remaining in the resultant regulatory network (Additional file 3: Table S1). The fraction of cancer-related genes was significantly raised from 2.3% to 3.7% (hypergeometric t-test p < e-10), consistent with the expectation that our gene regulatory network should enrich genes related to cancer. In the following sections we pinpointed the most noteworthy cancer-related genes and miRNAs from the topological viewpoint (for a full table of vertex properties see Additional file 4: Table S2).
From the combinatorial regulatory network, we concentrated on the most common regulators according to vertices’ out-degrees and betweennesses  (Additional file 4: Table S2). Among these common ones, MYC has the largest number of regulating targets (301) and the highest betweenness (46215.9). Besides, of all regulators, MYC has the largest in-degree (six), implying that MYC is under strict control. These statistics are consistent with the unique role of MYC in tumorigenesis. As one of the most important cancer-related genes, MYC has been proved to participate in several essential functions, such as cell cycle progression and apoptosis. Particularly, MYC was found to be involved in the regulation of a broad range of miRNAs, many of which play key roles in cell proliferation and oncogenic transformation [38, 39].
In our network, MYC was predicted to regulate 10 miRNAs: miR-378, hsa-miR-17, hsa-miR-19a, hsa-miR-19b, hsa-miR-20b, hsa-miR-92, hsa-miR-106a, hsa-miR-25, and hsa-miR-106b, and hsa-miR-125b. Of them, hsa-miR-378 is located in the intron of protein-coding genes PPARGC1B, an experimentally validated transcriptional targets of MYC , and another eight miRNAs (hsa-miR-17, hsa-miR-19a, hsa-miR-19b, hsa-miR-20b, hsa-miR-92, hsa-miR-106a, hsa-miR-25, and hsa-miR-106b) belong to three paralogous clusters located on chromosome 13 (the hsa-miR-17 cluster), chromosome X (the hsa-miR-106a cluster), and chromosome 7 (the hsa-miR-106b cluster), with the former two clusters having been proved to be regulated by MYC . Finally, our prediction of hsa-miR-125b being repressed by MYC was in accordance with an independent observation . It is notable that, except hsa-miR-125b, the other nine miRNAs were all predicted to be ‘ACTIVATED’ by MYC, and this seems contradictory to a previous notion that ‘widespread microRNA REPRESSION by Myc contributes to tumorigenesis’ .
Aside from the 10 miRNA targets, MYC demonstrated another 291 protein-coding genes as its regulating targets. In order to evaluate the reliability of our refined MYC targets, we referred to a set of 3,455 c-Myc binding targets determined in human B lymphoid tumor using chromatin immunoprecipitation coupled with pair-end ditag sequencing analysis (ChIP–PET) , and found that the fraction of ChIP-PET confirmed MYC targets was increased from 18.0% (453 of 2508) in the forward prediction to 23.0% (67 of 291) in the modeling result, which was a statistical significant enrichment (hypergeometric test p = 0.009). The 67 MYC targets confirmed by ChIP-PET experiment were marked out in Additional file 3: Table S1.
By analogy to MYC, miR-106b, a target of MYC, is probably the most important miRNA since it has the largest number of targets among all miRNAs in the network. For the 44 predicted targets of miR-106b, 38 were covered in the public dataset GSE6838 (http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE6838) recording the gene expression changes in cells transfected with hsa-miR-106b, and 21 were among the top 5% down-regulated genes (Additional file 6). While the hsa-miR-106 family have been implicated in breast cancer  and gastrointestinal tumor , our results furthermore suggest it may be a central miRNA underpinning the general tumorigenesis mechanism. In-depth investigations of hsa-miR-106b and its regulations are necessary in future studies of fundamental cancer mechanisms.
In our network, there are some miRNAs that have betweenness comparable to TFs (Additional file 4: Table S2), among which hsa-let-7c is a typical example. The betweenness of hsa-let-7c is ranked after only four TFs in the combinatorial regulatory network, and its confirmed inhibition of MYC  happened to have the highest edge-wise betweenness. In addition, hsa-let-7c is regulated by NFE2L1, the 5th TF with the largest out-degree. All these observations indicate that hsa-let-7c is another important miRNA probably underpinning the general tumorigenesis mechanism.
It is of great interest how regulators coordinate to regulate their targets. In our combinatorial regulatory network, we identified coordinating regulator pairs which share a significantly large number of common targets (one-sided Fisher’s exact test, p < 0.01), and obtained 17 TF-TF pairs, 46 miR-miR pairs and 17 TF-miR pairs (Additional file 7: Table S3). To our expectation, coordinated regulations often take place among regulators who usually form complexes and play their roles as a whole, such as MYC and MAX; NFKB1 and RELA; JUN, JUND, and FOSL1. Regulators from a same protein family or miRNA family are also likely to form coordinating regulators, such as those from the E2F family, the STAT family, or the let-7 family.
Our 21 miR-miR co-regulating pairs were compared with the counterpart set of 199 pairs reported earlier in a forward-prediction work . There was not a single pair that perfectly matched between the two sets, and, if we relaxed the matching criteria to the family level, only three of our 46 pairs (hsa-mir-130a and hsa-mir-152; hsa-mir-130a and hsa-148b; has-mir-130b and has-mir-19a) showed up in the previous set of 199 pairs. It seems that important discoveries may differ between a forward-predicted reference network and a reverse-engineered context-specific network, which warrants the efforts to integrate the forward-predicted putative regulation relationships with various conditional expression datasets so as to construct conditional combinatorial gene regulatory networks that are specific to different experimental conditions.
By definition, there are 18 types of closed triple-vertex regulatory circuits that involve at least a miR and a TF, which can be classified into ‘feed-forward loops’ (FFLs) and ‘feed-backward-loops’ (FBLs) by considering the connecting ways of directed regulations . Previous studies found that TF-initiated or miRNA-initiated feed-forward loops (FFLs) may be characteristic regulatory motifs in tumor cells influencing a large number of target genes from specific biological pathways [53, 54]. Therefore we counted in our cancer regulatory network the occurrences of all possible fourteen triple-vertex FFLs and four FBLs, and estimated the corresponding p-values through counting the counterpart occurrences 1000 times in randomly shuffled networks. The randomization was achieved by randomly shuffling the actual regulatory relationships (TF-miRNA, TF-mRNA, and miRNA-mRNA), provided that the type-specific in-degree and type-specific out-degree of each vertex were fixed, with the type being miRNA or mRNA.
In this work, we made use of parallel miRNA and mRNA expression datasets from the NCI-60 data panel, and, by using the linear regression model as a tool to integrate the sequence-matching information and the expression data, we sifted the forward-predicted regulation relationships and constructed a human cancer gene regulatory network composed of transcriptional control and miRNA control. This differs from related peer works mostly in that we realized for the first time a global landscape of combinatorial gene regulatory network in a specific biological context. By analyzing the results originating from randomized expression datasets, we estimated that the false discovery rate of our selected regulations were 17.2%, and by taking the experimental evidences from TransFAC, TarBase, and TransmiR as benchmarks, we proved that our human cancer combinatorial gene regulatory network tended to enrich genuine regulations of all three types.
Two years ago, we integrated the forward predictions with the miRNA-perturbed gene expression datasets (MPGE datasets) to elucidate the miRNA-centered primary and secondary regulatory cascades in human cancer, which encompassed two types of mRNA regulation, miRNA-to-mRNA and TF-to-mRNA . By introducing parallel miRNA and mRNA expression datasets, here we manage to map a combinatorial gene regulatory network that encompass one more regulation relationship: the regulation of miRNA by TF, and as a result, TF control and miRNA control are comprehensively described in this genome-scale cancer-related network. With the microarrays becoming cheaper and next-generation sequencing platforms being rapidly developed, it is foreseeable that a large amount of parallel miRNA and mRNA expression datasets are attainable in the near future, and thus, our modeling strategy can be extended to enormous cell types, strains, tissues, and so on.
In human cancer, miRNAs are presumed to preferentially couple its post-transcriptional inhibition with TF-initiated transcription in combinatorial regulatory circuits [53, 54]. Our regulatory network provides a holistic background in which the important elements, relationships, and network motifs can be analyzed thoroughly. For instance, a quick topology analysis of this network highlights the very important cancer-related transcription factor MYC and two remarkable miRNAs hsa-miR-106b and hsa-let-7c. While these entities themselves have already been found to be cancer relevant, our network demonstrates their putative regulatory contexts comprising their coordinating partners and their targets, and such information may shed additional light on tumorigenesis mechanisms. For instance, we discriminated three significantly recurrent coherent FFL motifs from our combinatorial regulatory network, most of which involve the ascertained cancer-related transcription factor MYC. Coherent FFLs are often used to amplify target genes or reduce target genes to inconsequential levels. These types of regulation are often being used as an "on/off" switch during developmental transitions and cellular differentiation. Current work in model organisms suggested miRNAs and TFs are also involved in incoherent FFLs - miRNAs appear to buffer against biological noise by targeting several components within a network in an incoherent manner . The fact that incoherent FFL is missing in our cancer regulation network may imply an important and specific aspect of cancer cells.
While the overall working pipeline turns out to be effective in this work, some specific steps need to be discussed. For example, the prefiltration step (Equation 1) is incurred primarily for reducing computation complexity of the following steps (Equation 2/Equation 3) or circumventing the n < p problem (if the number of samples is less than the number of putative regulators, the stepwise linear regression is unsolvable). Taking into account the forthcoming reduction in microarray or RNA-seq costs, the number of samples in future parallel expression datasets may be larger, which may greatly alleviate the pre-filtration pressure. For another example, doubts have been arising with regards to the stepwise (STEP) linear regression , and therefore it is necessary to consider other variants for modeling Equation 2, say the ‘least absolute shrinkage and selection operator (LASSO)’. LASSO is a penalized linear regression model which shrinks the coefficients of some predictors to smaller values or zeroes, and therefore can be used as a variable selection tool. On the same input data materials, we implemented the algorithm using STEP and LASSO separately, and got two different combinatorial regulation networks. We found that overall LASSO gave rise to a network with denser edges (three edges per node for LASSO, in comparison to two edges per node for STEP). The estimated false discovery rates were 23.3% for LASSO, a little bit higher than that of STEP. When comparing the LASSO-based results and the step-based results, we found on the whole a high level of mutual consistency (data not shown). It should be noted that this comparison was limited to a particular sets of datasets relating to the NCI60 case, and more rigorous comparative experiments are needed to substantiate an optimal choice of the regression model in this step. In the supplementary source codes, we allow the user to decide whether to implement the prefiltration Equation 1 or not, and provide both the STEP portal and the LASSO portal for Equation 2.
Towards the ultimate goal of ‘differential networking’ analysis , we made the first step to map a TF and miRNA-involved combinatorial gene regulation network for a specific context, human cancer. Confined to the used expression dataset, we arrived at results for the general tumorigenesis but not for a specific cancer type; and the results reported herein would be consolidated if a direct comparison was made between the cancer-specific network and a ‘normal’ network. Given the ever-increasing resource of parallel miRNA and mRNA expression datasets enabled by rapidly developing RNA-seqs, improvements are expected to be made in forthcoming succeeding works.
In this work, we made an attempt to integrate the forward engineering and reverse engineering strategies and for the first time resulted in a global landscape of combinatorial gene regulatory network in a specific biological context (human cancer) that has a moderate false discovery rate and is enriched with confirmed regulations. The human cancer combinatory gene regulatory network is found to be a hierarchical scale-free network with MYC, hsa-miR-106b and has-let-7c being the most important regulators. From the network, 17 TF-TF pairs, 46 miR-miR pairs and 17 TF-miR pairs are identified as the significantly co-regulating regulator pairs, and four triple-vertex regulatory circuits (one FBL and three FFLs) turn out as significantly recurrent building motifs. We believe our work provides a scaffold combinatorial gene regulatory network allowing systematic study of cancer gene regulation, and that our pipeline can be extended to reveal conditional combinatorial regulatory landscapes correlating to specific cellular contexts.
We would like to thank Ms. Qian-Qian Shi from Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences for her technical assistance. We would also like to thank Prof. Dennis D. Boos for publicizing the lars-based LASSO wrapper function. This work was supported by grants from National key basic research program (funding numbers: 2011CB910204, 2012CB316501, 2010CB912702); Chinese Academy of Sciences (Innovation funding KSCX2-YW-R-112, KSCX2-EW-R-04); National Natural Science Foundation of China (31000380 to HY, 31070752 to LX, 31171268 to LYY, 90913009 to YXL).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.