Identification of responsive gene modules by network-based gene clustering and extending: application to inflammation and angiogenesis

Background Cell responses to environmental stimuli are usually organized as relatively separate responsive gene modules at the molecular level. Identification of responsive gene modules rather than individual differentially expressed (DE) genes will provide important information about the underlying molecular mechanisms. Most of current methods formulate module identification as an optimization problem: find the active sub-networks in the genome-wide gene network by maximizing the objective function considering the gene differential expression and/or the gene-gene co-expression information. Here we presented a new formulation of this task: a group of closely-connected and co-expressed DE genes in the gene network are regarded as the signatures of the underlying responsive gene modules; the modules can be identified by finding the signatures and then recovering the "missing parts" by adding the intermediate genes that connect the DE genes in the gene network. Results ClustEx, a two-step method based on the new formulation, was developed and applied to identify the responsive gene modules of human umbilical vein endothelial cells (HUVECs) in inflammation and angiogenesis models by integrating the time-course microarray data and genome-wide PPI data. It shows better performance than several available module identification tools by testing on the reference responsive gene sets. Gene set analysis of KEGG pathways, GO terms and microRNAs (miRNAs) target gene sets further supports the ClustEx predictions. Conclusion Taking the closely-connected and co-expressed DE genes in the condition-specific gene network as the signatures of the underlying responsive gene modules provides a new strategy to solve the module identification problem. The identified responsive gene modules of HUVECs and the corresponding enriched pathways/miRNAs provide useful resources for understanding the inflammatory and angiogenic responses of vascular systems.


Background
Understanding of cell responses to environmental stimuli is one of the central tasks of molecular biology. Genomewide gene expression profiling techniques, such as microarray and deep sequencing, are widely used to identify the responsive genes whose expressions are significantly changed after the stimulus. But identifying the responsive genes by differential expressions does not consider the complex gene-gene interactions or regulation information. Increasing evidences suggest that cell responses are usually organized as pathways or responsive gene modules consisting of a group of interacted genes at the molecular level [1][2][3][4]. Identification of the responsive gene modules rather than independent responsive genes can provide better understanding of the underlying molecular mechanisms. With the increasing content of the gene-gene interaction databases, such as protein-protein interaction (PPI) databases and pathway databases, several methods have been developed to identify the responsive gene modules by finding an active subnetwork in genome-wide gene networks (mostly PPI networks) [5][6][7][8][9][10][11][12][13][14]. The previous methods usually formulate the module identification task as an optimization problem: first, a module score evaluating the significance of differential expression [5][6][7][8][9][10] (a few methods also consider the gene-gene co-expression information in the objective function [11,12]) of any given gene sub-network is introduced as the objective function; then heuristic searching or exact computational methods (linear programming) are implemented to find the sub-networks optimizing the objective function. The obtained sub-networks are regarded as the responsive gene modules (see review in [13]). Related methods have been successfully applied for analyzing many physiological processes, such as type 2 diabetes [15], immunology [8], breast cancer metastasis [10] and drug response [5].
Here we presented a new formulation of the module identification task: a group of closely connected and coexpressed differentially expressed (DE) genes in genomewide gene networks are regarded as the signatures of the underlying responsive gene modules at the RNA expression level. Our method named ClustEx was designed to find those signatures in the first step. Many studies show that the genes which are co-expressed in RNA level and/ or interacted in protein level tend to involve in the same biological process, and promising new discoveries have been found by using the co-expression [16,17] and/or interaction information [18][19][20]. After getting the clustered DE genes as the signatures, the "missing parts" of the responsive gene modules are recovered in the second step by adding the intermediate genes, which may not be differentially expressed but are on the paths connecting the DE genes in the gene network.
Human umbilical vein endothelial cells (HUVECs) are widely used as in vitro models to study the vascular systems in inflammation and angiogenesis. We collected two time-course microarray datasets: one is for tumor necrosis factor alpha (TNF) stimulated HUVECs, an inflammation model [21][22][23][24], and the other one is for vascular endothelial growth factor A (VEGF) stimulated HUVECs, a canonical angiogenesis model [25][26][27][28]. Then ClustEx was applied to identify the responsive gene modules of TNF/VEGF stimulated HUVECs by integrating the timecourse microarray data and the genome-wide HPRD PPI data [29][30][31]. Results show that ClustEx has better performances than several available module identification tools on the reference responsive gene sets. The enriched KEGG pathways [32], microRNA (miRNA) target gene sets [33,34] and GO terms [35] identified by gene set analysis also support ClustEx predictions.

ClustEx overview: identify the responsive gene modules by network-based differentially expressed (DE) genes clustering and extending
ClustEx is a two-step method for identifying the responsive gene modules by combining gene expression and interaction information. In the clustering step, average linkage hierarchical clustering was used to cluster and partition the DE genes into different gene groups accord-ing to their distances in gene networks, based on the assumption that a group of closely-connected and coexpressed DE genes are the signatures of the underlying responsive gene modules. In the extending step, the intermediate genes on the k-shortest paths between the DE genes were added to form the final responsive gene modules ( Figure 1). The details of ClustEx are presented in Methods section.

Identification of the responsive gene modules of human umbilical vein endothelial cells (HUVECs) in inflammation
ClustEx was applied to identified the responsive gene modules of HUVECs in inflammation model using the 0~8 h time-course microarray expression profiling data (GSE9055, 0~8 h, 25 time points [36,37]) and the HPRD genome-wide PPI data [29][30][31], with the following settings: the minimum fold changes of DE genes is 2, the shortest path length is shorter than 0.8 for clustering and the "k" is 10 for adding the intermediate genes on the kshortest paths. The identified biggest responsive gene module has 284 genes including 130 DE genes ( Figure 2, Additional file 1) and the second has 34 genes including 18 DE genes. The top two modules are very significant according to the edge-based module score measurement defined by [11] (z-score = 50.279 for the biggest module; z-score = 9.72 for the second module).
To validate our predictions, three different TNF reference responsive gene sets were collected from 1) NetPath "TNF/NF-kB signaling pathway", 2) PID/BioCarta/Reactome annotated TNF signaling pathways, and 3) PubMed abstracts. We compared our predictions with several available module identification tools. The original nodebased approach using simulated annealing (CytoScape jActiveModules plug-in [7]) and the edge-based heuristic searching approach in [11] (the Matlab and Java scripts were obtained by personal contact with the authors) did not find any significant module larger than 30 genes using the parameter settings described in Method section. The other compared methods included the node-based approach using greedy search (jActiveModules), GXNA (Gene eXpression Network Analysis) [8], several methods revised from ClustEx and the simple DE gene approach with minimum fold change (FoldChange_ [fold]) ( Figure  3). Generally, ClustEx predictions are better both on sensitivity and signal-to-noise ratio (S/N) on the reference responsive gene sets, except that FoldChange_2.0 (with minimum fold change 2.0) exhibits much higher sensitivity on the literature reference gene set (TNFLitRef ). As the cutoff of the hierarchical clustering is gradually relaxed (from 0.5 to 1.0), the sensitivity of ClustEx increases but the S/N decreases. The other two module identification methods also show higher specificities than FoldChange_2.0, which suggests that the interaction data  Gene set analysis of KEGG pathways, GO biological processes and microRNA (miRNA) target genes were conducted to find additional supporting evidence. Sixteen pathways were enriched in the biggest responsive gene module identified by ClustEx, including many known pathways affected by TNF, such as Apoptosis, Notch signaling pathway, Jak-STAT signaling pathway, Toll-like receptor signaling pathway and Cell cycle ( Table  1, Additional file 2). Years ago, apoptosis in vascular endothelial cells has been reported after TNF stimulus [38,39]. Looking at the overlapped genes, it is found that caspase apoptosis cascade (CASP3, CASP6, CASP7 and CASP9 in the module) may be activated by TNF. Jak-STAT signaling pathway and Toll-like receptor signaling pathway are two signaling pathways activated by TNF [40][41][42]. Our previous study, which used another two micro-array datasets of TNF-stimulated vascular endothelial cells, also found that apoptosis, Toll-like receptor signaling pathway and Jak-STAT signaling pathway are enriched for the responsive process [43]. jActive-Modules found eleven enriched pathways, GXNA found five pathways and FoldChange_2.0 found nine pathways.  The average rank of the pathway enrichments was higher for ClustEx (average rank 1.86) than the other three methods (jActiveModules 2.32, GXNA 3.18, DE gene approach 2.64) ( Table 1).
For the enriched miRNA target gene sets (the target gene sets are downloaded from the TargetScan website [33]): comparing with five for jActiveModules, four for GXNA and six for FoldChange_2.0, ClustEx found eight miRNAs, more than the other methods ( Table 2, Additional file 3). These results suggest that ClustEx captures more signaling and regulatory information from the gene expression and interaction data of TNF stimulated HUVECs. In the enriched miRNAs, miR-221/222 is a well-studied miRNA which can significantly reduce tube formation and migration by directly targeting KIT (c-kit) [44,45]. In the identified biggest TNF responsive gene  module, ETS1, IRF2, ESR1 and SOCS3, which are important genes in inflammation and angiogenesis, are also predicted as the targets of miR-221/222. MiR-18 is located in a large miRNA cluster miR-17~92, which has been identified as an oncogene [46]. It functions as a proangiogenic factor by repressing THBS1 (Tsp-1). MiR-18 is also predicted to target ESR1, IRF2, KIT, NOTCH2, PAPPA and TNFAIP3 in our study. MiR-145 has recently been reported to regulate cell differentiation [47,48]. A set of inflammatory and/or angiogenic genes, including ADAM17, CD40, ETS1, FOXO1, SMAD3 and TLR4, are predicted as the targets of miR-145, which suggests that miR-145 may also play important role in the two processes.
We also analyzed the enriched GO terms of the biggest responsive gene module. The enriched terms for TNF are mainly divided into three classes: apoptosis, protein kinase cascade and I-kB kinase/NF-kB cascade. Apoptosis and I-kB kinase/NF-kB cascade are two main programs activated by TNF. These two GO terms are consistent with the enriched KEGG pathways. The detail information of the enriched GO terms is documented in Additional file 4.

Identification of the responsive gene modules of HUVECs in angiogenesis
Angiogenesis is an essential physiological process in vascular systems. ClustEx was applied to analyze a timecourse microarray dataset of VEGF stimulated HUVECs (GSE10778, 0~6 h, 5 time points [49]), a canonical angiogenesis model [25][26][27][28]. The biggest responsive gene module has 262 genes, including 106 DE genes (Figure 4, Additional file 1). The z-score of the biggest module is 39.81. On the literature reference gene set (VEGFLitRef ), FoldChange_2.0 achieves highest sensitivity and ClustEx show competitive performance with jActiveModules, while on the reference gene set collected from pathway databases (VEGFPathDBRef ), ClustEx achieves highest specificity and competitive sensitivity to FoldChange_2.0 ( Figure 5).
For the following gene set analysis: thirteen pathways and eight enriched miRNA target gene sets were found enriched in the biggest responsive gene module identified by ClustEx; nine pathways and eight miRNAs were found for jActiveModules; one pathway and six miRNAs were found for GXNA; and three pathways and six miRNAs were found for FoldChange_2.0 (Tables 3, Additional file  2 and Table 4, Additional file 3). In the enriched pathways, TGF-beta signaling pathway, Cell cycle and Wnt signaling pathway are frequently reported to be related to VEGF stimulus [50,51]. In the enriched miRNAs, miR-125 is detectable in HUVECs [52] and miR-200 has been reported to play an important role in angiogenesis and tumorigenesis [53]. MiR-132/212, ranked as the first for the VEGF dataset, may regulate angiogenesis by targeting EP300, MAP3K3, MAPK1 and MAPK3. The enriched GO biological processes are mainly (anti-)apoptosis and RNA/nucleic acid transport related terms (Additional file 4), which is consistent with VEGF pro-angiogenesis effect.

The cross-talk between inflammation and angiogenesis in Notch signaling pathway
Several studies have shown that endothelial cells are closely related to angiogenesis within an inflammatory environment [22,23]. Notch signaling pathway may play essential role in the cross-talk between inflammation and angiogenesis [25,[54][55][56][57]. This pathway was found enriched both in TNF and VEGF responsive gene modules identified by ClustEx. Several repressing signals of notch signaling pathway were found after TNF stimulus, which can promote angiogenesis sprouting with the following VEGF stimulus [25,54]. Some transcription factors in the identified responsive gene modules, such as RELA (NF-kB), YY1 and SMAD3, which are the direct and highly co-expressed neighbors of the genes in KEGG annotated Notch signaling pathway, may also participate in the signaling.

Limitation of the protein-protein interaction edges
Some cell adhesion molecules of HUVECs significantly up-regulated in inflammation, such as ICAM1, VCAM1 and SELE were not covered in the identified responsive gene modules. We manually checked the expression correlations between these genes with their neighbor genes and found that the correlations are relatively low. The  promoters of the three genes contain multiple transcription factor binding sites of the NF-kB complex (NFKB1, RELA), which are significantly up-regulated by TNF stimulus and covered in the biggest TNF responsive gene module (the annotations of the promoters and the transcription factor binding sites are obtain from Transcriptional Regulatory Element Database, TRED [58,59]). These observations suggest that the missed responsive genes are more likely to connect with the biggest responsive module by transcriptional regulation rather than protein-protein interaction. So the missing edges representing the transcriptional regulations (and other types of interactions or regulations) should be added in future studies.

Conclusions
Taking the closely-connected and co-expressed differentially expressed (DE) genes in condition-specific gene networks as the signatures of the underlying responsive gene modules provides a new strategy to solve the module identification problem. The responsive gene modules can be identified by finding the extended sub-networks from groups of clustered DE genes. Following this strategy, a two-step method named ClustEx was proposed and applied to identify the responsive gene modules of HUVECs within inflammation and angiogenesis. ClustEx shows better performances than several available module identification tools on reference responsive gene sets.
The following gene set analysis of pathways and miRNA target genes also support ClustEx predictions.

Time-course microarray and genome-wide protein-protein interaction (PPI) data
Two time-course datasets were downloaded from NCBI GEO database [60,61]: GSE9055, Affymetrix Human Genome U133 Plus 2.0 Array (U133Plus2.0), HUVECs stimulated with 10 ng/mL TNF, 0-8 h, 25 time points [36,37] and GSE10778, U133A, HUVECs stimulated with 100 ng/mL VEGF, 0-6 h, 5 time points [49]. Original CEL format files were downloaded and then processed by dChip [62]. The probe signals were collapsed as gene expression signals by the mean value if multiple probes hit the same gene. PPI data were downloaded from HPRD (Release 7) [29][30][31]. Only the genes both in the HPRD PPI dataset and the microarray platform were used in this study.

ClustEx workflow 1) Identification of the differentially expressed (DE) genes
First, the maximum fold change (according to non-logtransformed signals) respect to the 0 h00 m signal was calculated for each gene. Then the genes with minimum 2-fold changes (either up-regulated or down-regulated) were selected as the DE genes. We found 1421 DE genes (15.7%) in the TNF dataset and 709 DE genes (9.36%) in the VEGF dataset.

2) Clustering step: cluster and partition the DE genes into different groups based on their distances in condition-specific gene networks
Cell responses to environmental stimuli are usually organized as relatively separate responsive gene modules. We clustered and partitioned the DE genes into different groups based on their interactions and their dynamic expression correlations. Each edge of the gene network derived from HPRD PPIs was weighted as And the distance between two direct-interacting genes was defined as  The gene-gene distance was defined as the length of the shortest path between the two genes in the gene network. The shortest path length between any pair of DE genes was calculated using Dijkstra's algorithm. Then average linkage hierarchical clustering was used to cluster the DE genes according to the gene-gene distances. Distance cutoff was set to partition the DE gene into separate gene groups.
Hierarchical model analysis (HMA), a basic densitybased clustering algorithm, is also used to cluster the DE genes. The detail description of this algorithm and the corresponding results are presented in (Additional file 5 and 6).

3) Clustering step: select the cutoff for the hierarchical clustering of the DE genes
As observed in previous studies and in our analysis, a big module usually "dominates" the responsive process [7,11]. We traced the size expansion of the biggest DE gene group and the increase of the corresponding distance cutoff. The cutoff is selected at the point after which the cluster expansion becomes much slower. For the TNF dataset, we observed a sharp turn right before 0.8 and the expansion of the cluster is much slower after 0.8 ( Figure  6A), so we chose 0.8 as the cutoff to generate the DE gene clusters. For the VEGF dataset, a relative turn point exists around 0.14~0.15. We ran ClustEx with cutoff 0.14, 0.145, 0.15 and 0.155. The sizes of the final responsive gene modules are similar: 244, 247, 262 and 265, respectively. So we simply chose the cutoff at 0.15 ( Figure 6B).

4) Extending step: reconstruct the responsive gene modules by adding the intermediate genes connecting the DE genes
Microarray can detect the changes at the RNA expression level, but will miss many activity changes at protein level. It is assumed that the genes which are connecting the DE genes in the gene network are also important for cell responses. The final responsive gene modules were constructed by adding the intermediate genes to the DE gene groups found in the clustering step.
To reduce the false positives on the long paths and the huge computational cost for finding the k-shortest paths between all pairs of nodes in the whole gene network, the extending step was implemented as follows: first, the genes on the shortest paths between the DE genes were added to form a connected sub-network; then the subnetwork was extended by one step in the whole gene network (it means the search space of the extending is limited in the DE genes, the genes on DE genes' shortest paths and the genes directly interacted with the former two kinds of genes); finally, the responsive gene modules were identified by extracting all the genes and edges on the 10-shortest paths between all the pairs of the DE genes in the extended sub-network. The k-shortest paths were calculated using an implementation of Yen's algorithm (k-shortest paths mean the shortest k [1 st -k th short-est] paths connecting the gene pair in the weighted network) [63]. Necessary changes were made in the source codes.

5) Extending step: select "k" for the adding the genes on the kshortest paths
Similar to find the cutoff of the hierarchical clustering, we traced the size expansion of the biggest responsive gene module by increasing "k" from 1 to 20. No obvious cutoff was observed as in the curve of the size of the biggest DE gene cluster in the previous section. We empirically selected "k" as 10: the increased module size from 0 to 10 is more than 5 times as the increased size from 10 to 20 (for TNF dataset, 154/28 = 5.5; for VEGF dataset, 156/16 = 9.75) (Figure 7). The identified responsive gene modules are stable around the "k = 10": as the "k" reduces from 10 to 8, the size of the module is only reduced by 2.8% for the TNF dataset and by 0.8% for the VEGF dataset; as the "k" increases from 10 to 12, the size of the module is only increased by 2.1% for the TNF dataset and by 1.9% for the VEGF dataset. These small changes do little impact for the following analysis.

6) Evaluate the statistical significance of the responsive gene modules
The evaluation method described in [11] was used to estimate the statistical significance of the identified responsive gene modules. First, the score for the edge connecting gene x and gene y was defined as sd(x) and sd(y) are the standard derivations of the expressions of gene x and y in microarray datasets, respectively. |cor(x, y)| is the Pearson correlation of gene x and y (absolute value). The module score (mscore) was calculated by summing the escores of all edges in the module Then we randomly sampled the same number of edges in the whole network and calculated the shuffled module score The random sampling processes were repeated 10,000 times and the statistical significance was evaluated by zscore:

7) ClustEx package for download
To facilitate the usage of ClustEx, we prepared the ClustEx package including two network distance calculation programs (modified Yen source codes are included in the package), several Perl scripts and the installation script. Users can download the package via our website: http://bioinfo.au.tsinghua.edu.cn/member/~gujin/ clustex/ or via email: jgu@tsinghua.edu.cn . Current release requires huge computational cost, especially long waiting time. We will develop future version to solve this problem. We will also include the scripts to help determine the parameters of ClustEx (hierarchical clustering cutoff and "k" for the k-shortest path) in the future version.

Evaluation of computational methods' performances by reference responsive gene sets
We prepared several reference responsive gene sets to evaluate the performances of the computational approaches: TNFLitRef (TNF literature reference gene set), 376 genes. The gene symbols were analyzed and extracted from the 998 PubMed abstracts (before 2009/11/10) using keyword (TNF AND HUVEC*) by Agilent Literature Search (v2.71), a CytoScape plug-in. Then gene symbols were converted to Entrez Gene IDs by IDConverter [64] (a few genes not transferred by IDConverter were manually converted). The genes not covered by HPRD or Affy U133Plus2.0 array were removed. TNFNetPathRef (TNF NetPath pathway reference gene set), 184 genes. All Entrez Gene IDs were derived from "TNF signaling pathway" curated in NetPath database [65]. The genes not covered by HPRD or Affy U133Plus2.0 platform were removed. TNFPathDBRef (TNF pathway database reference gene set), 63 genes. Entrez Gene IDs of the reference genes were derived from following TNF related signaling pathways: BioCarta "TNF/stress related signaling", "TNFR1 signaling pathway and TNFR2 signaling pathway" [66], PID "TNF receptor signaling pathway" [67] and Reactome "TNF signaling" [68]. The genes not covered by HPRD or Affy U133Plus2.0 array were removed.
VEGFLitRef (VEGF literature reference gene set), 342 genes. The gene symbols were analyzed and extracted from the 871 PubMed abstracts (before 2009/11/10) using keyword (VEGF AND HUVEC*) by Agilent Literature Search (v2.71). Then gene symbols were converted to Entrez Gene IDs by IDConverter. The genes not covered by HPRD or Affy U133A array were removed. VEG-FPathDBRef (VEGF pathway database reference gene set), 109 genes. Entrez Gene IDs of the reference genes were derived from BioCarta "VEGF, Hypoxia, and Angiogenesis", PID "Signaling events mediated by VEGFR1 and VEGFR2" and KEGG "VEGF signaling pathway" [32]. The genes not covered by HPRD or Affy U133A array were removed.
We compared the gene lists between the identified responsive gene modules and the reference gene sets. The sensitivity is defined as the percentage of genes in the reference gene set covered by the identified responsive gene module: The signal-to-noise ratio (S/N) was used to evaluate the significance of overlapping. The signal is defined as the number of overlapped genes between the identified responsive gene module and the reference gene set; the noise is defined as the mean of the numbers of the overlapped genes between control modules and the reference gene set: 10,000 control gene sets each with the same size as the studied module were randomly sampling from the complete gene list and then S/N is calculated as the following definition: Comparison with other methods jActiveModules with simulated annealing searching [7] and edge-base scoring method with simulated annealing searching (Matlab + Java codes were obtained by personal communication) [11] were run multiple times with different starting seeds and parameters, but neither one reported significant modules larger than 30 genes. Heuristic searching methods can find the (sub-)optimal results for the objective function if the iterations are long enough. But when the search space is bigger or the structure of the search space is irregular, the searching process is very slow. Due to the high computational cost, we may not be able to find the optimal parameter settings of these programs. Their predictions were not included in the comparison. For jActiveModules with greedy search, the top-scoring module was used in the comparison. EDGE software [69] was used to calculate the p-values evaluating the significances of gene expression changes in timecourse microarray datasets, which were required as jAc-tiveModules inputs. For Gene eXpression Network Analysis (GXNA) [8], the pre-defined sizes of the responsive To fulfill GXNA input requirements, the 0 h00 m signals were repeated 24/4 times as control samples and the signals in the other 24/4 time points were used as case samples. Also due to the high computational cost, we may not be able to find the optimal parameter settings of these programs. The detail settings about the compared program were as follows: a) The edge-based scoring method. The Matlab and Java codes are obtained by email. The package was run as the following parameters: simulated annealing start temperature 1 (default), end temperature 0.01 (default)/0.001 and iteration 30000 (default)/10000. The package was run multiple times with different random seeds. The produced biggest gene modules are no larger than 20 genes for the TNF dataset. Similar results are observed for the VEGF dataset. b) jActiveModules with simulated annealing. This Cytoscape plug-in was run with the default parameter except changing the iteration to 100,000 (the parameter used in the original paper) and switching the Hubfinding On/Off. We ran multiple times with different random seeds. No significant modules were produced by the plug-in. c) jActiveModules with greedy search. The program was run with its default parameter ("search depth" = 1 and "max depth from start node" = 2). The produced modules with the highest scores were used in the comparisons. d) GXNA. The program was run with "-depth 300" for the TNF dataset (./gxna -name [tnf ] -mapFile [tnf ].ann -edgeFile [tnf ].gra -algoType 1-version 001depth 300) and "-depth 250" for the VEGF dataset (./ gxna -name [vegf ] -mapFile [vegf ].ann -edgeFile [vegf ].gra -algoType 1-version 001-depth 250).

Gene set analysis of KEGG pathways, GO terms and miRNA target gene sets
Meet/Min values, commonly used to evaluate the overlapping of the two gene sets [70], were adapted to calculate the pathway/GO enrichments in the responsive gene modules. The GO terms with smaller than 50 genes and larger than 500 genes were removed. Larger Meet/Min values mean higher enrichments: Degree preserving permutation methods were used to generate 1,000 random pathways and the z-scores of Meet/Min were calculated as: The pathways with z-score > 3.0 were reported as enriched in the corresponding responsive gene modules.
Based on the assumption that the genes with higher expression changes, higher correlation with their neighbors and higher connection degrees would be more important, the network-based gene importance scores (gscores) were proposed to evaluate the importance of gene x in the responsive gene module: To evaluate the enrichments of miRNA target gene sets, firstly the overlapped genes were found between the responsive gene modules and the miRNA target gene sets. Then the enrichments were calculated as the sums of the gscores of the overlapped target genes: Degree preserving permutation methods were used to generate 1,000 random miRNA target gene sets and the zscores of tscores were calculated as above. A looser cutoff was used to select enriched miRNA target gene sets (zscore > 2.0). TargetScan (v5.1) [33,34] miRNA target predictions were used in this analysis.