 Methodology Article
 Open Access
 Published:
Detection of statistically significant network changes in complex biological networks
BMC Systems Biology volume 11, Article number: 32 (2017)
Abstract
Background
Biological networks contribute effectively to unveil the complex structure of molecular interactions and to discover driver genes especially in cancer context. It can happen that due to gene mutations, as for example when cancer progresses, the gene expression network undergoes some amount of localized rewiring. The ability to detect statistical relevant changes in the interaction patterns induced by the progression of the disease can lead to the discovery of novel relevant signatures. Several procedures have been recently proposed to detect subnetwork differences in pairwise labeled weighted networks.
Methods
In this paper, we propose an improvement over the stateoftheart based on the Generalized Hamming Distance adopted for evaluating the topological difference between two networks and estimating its statistical significance. The proposed procedure exploits a more effective model selection criteria to generate pvalues for statistical significance and is more efficient in terms of computational time and prediction accuracy than literature methods. Moreover, the structure of the proposed algorithm allows for a faster parallelized implementation.
Results
In the case of dense random geometric networks the proposed approach is 1015x faster and achieves 510% higher AUC, Precision/Recall, and Kappa value than the stateoftheart. We also report the application of the method to dissect the difference between the regulatory networks of IDHmutant versus IDHwildtype glioma cancer. In such a case our method is able to identify some recently reported master regulators as well as novel important candidates.
Conclusions
We show that our network differencing procedure can effectively and efficiently detect statistical significant network rewirings in different conditions. When applied to detect the main differences between the networks of IDHmutant and IDHwildtype glioma tumors, it correctly selects subnetworks centered on important key regulators of these two different subtypes. In addition, its application highlights several novel candidates that cannot be detected by standard single networkbased approaches.
Background
The omnipresence of complex networks is reflected in wide variety of domains including social networks [1, 2], web graphs [3], road graphs [4], communication networks [5], financial networks [6] and biological networks [7–9]. Although we focus on biological networks many aspects of the method proposed in this paper can also be applied for networks in other contexts. In cancer research, the comparison between gene regulatory networks, protein interaction networks, and DNA methylation networks is performed to detect differences between two conditions, such as, healthy and disease [10, 11]. This can lead to discovery biological pathways related to the disease condition, and, in case of cancer, the gene regulatory changes as the disease progresses [12–14].
A central problem in cell biology is to model functional networks underlying interactions between molecular entities from high throughput data. One of the main question is how the cell globally changes its behavior in response to external stimuli or what is the effect of alterations such as, driver somatic mutations and changes in copy number. Signatures of differentially expressed and/or methylated genes are the downstream effect of global cell deregulation in different conditions such as cancer subtypes. Therefore, it is argued that driver mutations activate functional pathways described by different global rewiring of the underlying gene regulatory network.
The identification of significant changes induced by the presence or the progression of disease can help to discover novel molecular diagnostics and prognostic signatures. For example, it is known that, according to the mutation of the gene IDH [15, 16], the majority of malignant brain tumors can be divided two main macrocategories, which can be further divided in seven molecular and clinically distinct subtypes [17]. These two macrogroups are characterized by highly different global expression and epigenomic profiles. Hence, one of the main questions to understand the molecular basis of diseases is how to identify significant changes in the regulatory structure in different conditions.
Various techniques have been developed to compare two graphs including graph matching and graph similarity algorithms [18–20]. However, the problem addressed in this paper is different from popular graph theory problems including graph isomorphism [21] and subgraph matching [22]. Here the goal is to identify statistically significant differences between two weighted networks (with or without labels).
One common statistic used to distinguish one graph, A from another B, having the same number of nodes N, is the Mean Absolute Difference (MAD) metric, defined as: \(d(A,B)=\frac {1}{N(N1)}\sum _{i\neq j}a_{ij}b_{ij}\), where a _{ ij } and b _{ ij } are edge weights corresponding to the topology of networks A and B. This distance measure is equivalent to the Hamming distance [23] and has been extensively used in literature to compare networks [24, 25]. Another statistic used to test association between networks is the Quadratic Assignment Procedure (QAP) defined as: \(Q(A,B) = \frac {1}{N(N1)} \sum _{i=1} \sum _{j=1} a_{ij}b_{ij}\). The QAP metric is used in a permutationbased procedure to differentiate two networks [26, 27]. Ruan et al. showed that these metrics are not always sensitive to subtle topological variations [28].
Our aim is to detect statistically significant differences between two networks under the premise that any true topological difference between the two networks would involve only a small set of edges when compared to all the edges in the network. Recently, a Generalized Hamming Distance (GHD) based method was introduced to measure the distance between two labeled graphs [28], where it was shown that the GHD statistic is more robust than MAD and QAP metrics for identifying subtle variations in the topology of paired networks. In particular the authors showed that GHD permutation distribution follows a normal distribution with closedform expression for first two moments under the null hypothesis that networks A and B are independent. Utilizing the moments, corresponding pvalues were obtained in closedform. They also propose a differential subnetwork identification technique namely dGHD. The advantage of this technique is that – unlike previous differential network analysis techniques [25, 29, 30] – it provides a closedform solution for pvalues for the differential subnetwork left after iterative removal of the least differential nodes. We propose an improvement over dGHD, namely ClosedForm approach that exploits the conditions for asymptotic normality which is computationally cheaper and attains better prediction performance than the dGHD algorithm. Computational efficiency and prediction accuracy is crucial in cancer contexts where networks have a large number of nodes and the topological difference is associated to few driver genes.
Methods
Preliminaries on generalized hamming distance
The Generalized Hamming Distance is a way to estimate the distance between two weighted graphs [28]. Let A=(V,E _{ A }) and B=(V,E _{ B }) be two graphs, with the same set of nodes V={1,…,N}, and different sets of edges, E _{ A } and E _{ B }. The Generalized Hamming Distance (GHD) is defined as:
where \(a^{\prime }_{ij}\) and b ij′ are mean centered edgeweights defined as:
The edge weights, a _{ ij } and b _{ ij }, depend on the topology of the network and provide a measure of connectivity between every pair of nodes i and j in A and B. Different metrics have been adopted to measure the connectivity between pairs of nodes, including: topological overlap (TO) [31, 32], cosine similarity and Pearson correlation [33]. In our experiments, we used the cosine similarity to capture first order interactions between the nodes in the network. Cosine similarity computation scales well for large sparse networks and can be used in place of TO, as it has nearly perfect correlation with it.
Given two networks A and B, a permutation π of the labels of the vertices of A (keeping the edges unchanged) generates a permuted network A _{ π }. The quantity G H D _{ π }(A _{ π },B) represents the test statistics of an inferential problem having as null hypothesis \(\mathcal {H}_{o}\): Graphs A and B are independent [28]. The distribution of G H D _{ π } can be obtained through an exhaustive calculation which can be approximated by a Monte Carlo approach. The authors of [28], indeed, simplified this calculation showing that under the null hypothesis it can be approximated well by a normal distribution with moments that can be obtained analytically.
This can be shown as:
where μ _{ π } is the asymptotic value of the mean GHD statistic and σ _{ π } is the asymptotic value of the standard deviation of GHD statistic computed between A _{ π } and B. In order to calculate the μ _{ π } and σ _{ π } values we define:
Here \(a_{ij}^{t}\) and \(b_{ij}^{t}\) are the edge weights with the power t. Furthermore, we require the following terms:
Using these definitions the closedform expression for mean μ _{ π } and variance \(\sigma _{\pi }^{2}\) are expressed as:
Given a significance threshold α (e.g. 0.01), pvalues >α indicate that there is no sufficient evidence to reject the null hypothesis (\(\mathcal {H}_{o}\)) that graphs A and B are independent. Hence, higher pvalues indicate more probability that the two graphs under consideration are independent.
Differential subnetwork detection with GHD
The GHD distance is able to tell us to what extent are two graphs different but is not able to identify which parts of the graph are similar and which are different. In this work, we are interested in detecting which part of the graphs contribute to make the two graphs different. We call such different subgraphs differential subnetworks.
The notion of differential subnetworks is based on the idea that when comparing two networks only a subset of edges would present altered interaction. The goal is to identify the set of nodes, namely V ^{∗}, associated with such a subset of edges and the pvalues p ^{∗} corresponding to the nodes in V ^{∗}. This goal, formulated as a statistical test, requires that for such a subset V ^{∗} there is no sufficient evidence to reject the null hypothesis that the corresponding subnetworks \(A^{*}(V^{*},E_{A^{*}})\phantom {\dot {i}\!}\) and \(B^{*}(V^{*},E_{B^{*}})\phantom {\dot {i}\!}\) are statistically independent.
The idea here is to adopt an iterative technique to identify the set of nodes V ^{∗} which contributes more to the difference. We start from the dGHD algorithm proposed in [28]. The algorithm measures the edge connectivity with topological overlap metric and benefits from the closedform solution of pvalue (Eq. (3)). In the dGHD algorithm, an iterative procedure is followed where at each iteration the change in centralized GHD (cGHD) i.e. cGHD=GHD(A,B)−μ _{ π } is estimated after the removal of one node. The node where the change in cGHD (i.e. difference in cGHD before and after removal of a node) is maximum is removed. The GHD statistic is computed for remaining subnetworks and the pvalue is estimated. This process is repeated till a user specified minimal set size is reached or it is nolonger possible to have closedform representation for pvalues which happens for N≤3 as shown in Eq. 3. The pvalues are then adjusted for multiple testing by controlling the false discovery rate [34].
The dGHD algorithm suffers from the following limitations: a) During the i ^{th} iteration, the GHD measure is calculated N−i times on different subgraphs with an overall time complexity ∼O(N ^{2}×E) where E=E _{ A }∪E _{ B }; b) The algorithm is prone to discovery more false positives since it uses the change in cGHD (ΔcGHD) as a model selection criterion. We overcome such limitations by proposing the following improvements:

1.
Remove nodes by exploiting the ClosedForm. We use the idea that nodes which have similar topology in networks A and B will contribute the least to cGHD. So, we first calculate the closedform contribution of each node in cGHD once using Eq. 4 and then iteratively remove nodes with least contributions. However, this process is continued till we observe that the pvalue of the remaining subnetwork becomes greater than a threshold θ.

2.
Using a different model selection criterion. Once the pvalue reaches θ, we follow a procedure similar to the dGHD algorithm but use the more intuitive criterion of selecting the node that when removed makes the cGHD value maximum rather than using the change in the cGHD value (before and after removal of a node) as a model selection criterion. By using this model selection criterion, we iteratively identify and remove that node whose contribution is least in the cGHD.
The advantage of the ClosedForm approach is that we significantly reduce the computational complexity and improve the predictive performance. A simple alternative to the ClosedForm approach would be to sort all the nodes based on their contribution to cGHD and thus rank all the nodes based on their capability to differentiate the two networks with complexity (O(N logN)). However, then we will not be able to identify statistically different subnetworks between the two graphs as indicated in [28].
Closedform approach
We propose a fast approach to perform differential subnetwork analysis taking into consideration the contribution of each node to GHD and μ _{ π }. Using Eqs. (1) and (3) this can mathematically be represented as:
We observe that if we sum GHD(A,B)(i) and μ _{ π }(i)∀i∈V, we obtain GHD(A,B) and μ _{ π }. We use the idea that nodes which have similar topology in networks A and B will contribute the least to centralized GHD, i.e. GHD(A,B)−μ _{ π }. We calculate the ClosedForm contribution of each node in the centralized GHD (cGHD) once using Eq. (4) and then iteratively remove nodes with least contribution to the cGHD, i.e. nodes having similar topology in graphs A and B. Thus, we calculate cGHD once and sort all the nodes based on their contribution to the cGHD metric.
This process is continued till we observe that the pvalue of the remaining subnetwork becomes greater than a threshold θ. Once the pvalue reaches θ, we estimate \(\Delta _{V_{K}}=\text {GHD}\left (A(V_{K},E_{A}),B(V_{K},E_{B})\right)\mu _{V_{K}}\) where \(\mu _{V_{K}}\) is the mean of the permutation distribution for the nodes (V _{ K }) of the remaining subnetwork. Furthermore, we define \(\Delta _{V_{Ki}}\) as the value of cGHD after removal of node i. We adopt a different model selection criterion than that proposed in [28] to remove nondifferential nodes. We use the intuitive criterion of selecting that node after removal of which the cGHD value becomes maximum, i.e. the node which was most similar in terms of topology for the pairedgraphs. Finally, the obtained pvalues are adjusted for multiple testing by controlling the false discovery rate [34]. Provided the pairedgraphs A and B, the calculation of \(\Delta _{V_{Ki}}\) can be done independently for each i. Details of the ClosedForm method is provided in Algorithm 1. The sensitivity of the ClosedForm approach with the parameter θ is demonstrated in Experimental Results section. Table 1 summarizes the improvements with respect to the dGHD algorithm in terms of time complexity.
Alternative procedure (fast approximation)
We propose an alternative procedure to the ClosedForm approach namely the Fast Approximation method where we first calculate the cGHD value without including the i ^{th} node, ∀i∈V once. This helps to estimate the cGHD value after removal of the i ^{th} node and can be performed in parallel. Our aim is to quickly discard those nodes after removal of which the cGHD value becomes large thereby removing nodes which were contributing least to the cGHD value. This helps to reduce the dependence between the two subnetworks by removing nodes which have similar topology in graphs A and B. Again, the idea is motivated by the premise that only a subset of nodes will form the differential subnetworks in graph A and B.
In this approach, we iteratively discard those nodes after removal of which the cGHD value becomes maximal till the pvalue for the remaining subnetwork reaches a threshold θ. Once the pvalue reaches θ, we return back to the procedure of estimating \(\Delta _{V_{Ki}} \forall i \in V_{K}\) as described in the ClosedForm approach. We use the same model selection criterion of selecting that node after removal of which the cGHD value becomes maximum as used in the ClosedForm approach. We then adjust the obtained pvalues for multiple testing by controlling the false discovery rate [34]. We refer to this technique as a Fast Approximation to the dGHD [28]. We explain the Fast Approximation technique in detail in Algorithm 2.
From our experiments, we observe that the results of the ClosedForm approach and the Fast Approximation technique are identical. Although, in the case of ClosedForm approach, we calculate closedform contribution of each node in the cGHD value and remove the node with least contribution, while in case of Fast Approximation we select that node after removal of which cGHD value becomes maximum, the ordered list \(\mathcal {O}\) obtained for both the methods is identical. Moreover, the computational complexity of the FastApproximation technique is the same as that of ClosedForm approach.
Inference of the glioma networks and master regulator analysis
We used the TCGA panglioma samples dataset including 1250 samples (463 IDHmutant and 653 IDHwildtype), 583 of which profiled with Agilent microarray and 667 with RNASeq Illumina HiSeq (REF) downloaded from the TCGA portal. The batch effects between the two platform were corrected using the COMBAT algorithm [35]. The final gene expression data matrix includes 12,985 genes and 1250 samples. We reconstructed two gene regulatory networks belonging to two different glioma subtypes: IDHmutant and IDHwildtype. Both networks were reconstructed with a four step procedure that follows ARACNe [36]: i) Computation of mutual information between gene expression profiles to determine interaction between Transcription Factors (TFs) and target genes [37]; ii) data processing inequality to filter out indirect relationships [36], iii) permutation test with 1000 resamplings to keep only statistically significant relationships. We also assembled a global glioma network using all the available 1250 transcriptional profiles using the aforementioned method. In this last case we also used intersection with transcription factor (TF) binding sites to keep only relationships due to promoter binding. We used a set of 457 TF binding sites available in the MotifDB Bioconductor package.
Master Regulator Analysis (MRA) algorithm [38] was applied to the global glioma network in order to compute the statistical significance of the overlap between the regulon of each TF (i.e. its ARACNe inferred targets) and the differentially expressed gene list (WilcoxonMannWhitney test FDR≤0.05) between IDHmutant and IDHwildtype samples. Given a gene interaction network, generated by ARACNe and a gene phenotype signature (e.g. a set of differentially expressed genes), the MRA algorithm computes for each TF the enrichment of the phenotype signature in the regulon of that TF. The regulon of a TF is defined as its neighborhood in the gene interaction network. There are two different methods to evaluate the enrichment of the signature in the regulon. One method uses the statistical Fisher’s exact test, while the other approach uses Gene Set Enrichment Analysis (GSEA). Here we used this last method.
A Master Regulator (MR) gene is a TF which regulon exhibit a statistical significant enrichment of the given phenotype signature.
Validation in the Rembrandt dataset
We used an independent dataset to perform the same analysis of network differencing between IDHmutant and IDHwildtype gliomas and check the the overlap between the two analyses. Raw gene expression (Affymetrix U133 Plus 2.0) from the publically available Repository for Molecular Brain Neoplasia Data (Rembrandt) (https://caintegrator.nci.nih.gov/rembrandt/) included 444 samples divided in 218 Glioblastoma, 148 Astrocytoma, 67 Oligodendrogliomas and 11 mixed histologies. Expression subtype and IDH status was inferred from gene expression following the procedure in [39] resulting in 153 wildtype and 162 mutant samples. These two set of expression profiles were used to generate two regulatory networks using the same approach reported above.
Results and discussion
For all our experiments, we used the ClosedForm approach (since results obtained from ClosedForm and FastApproximation techniques are identical) and compare it with the dGHD method [28].
Cosine similarity and topological overlap
The onestep topological overlap measure used to estimate the edge weights is defined as:
In this work we use the cosine similarity to calculate the edge weights a _{ ij }. The cosine similarity takes into consideration onestep neighborhood of nodes i and j while constructing the edge weight and is very efficient to calculate for sparse matrices. The weights a _{ ij } are estimated as follows:
where A _{ ij } represents the adjacency matrix.
We perform an experiment to calculate the correlation between the onestep topological measure and the cosine similarity measure. For this experiment, we generated 250 random geometric networks using N=250 and the connectivity parameter d=0.15.
Figure 1 shows that the cosine similarity metric is nearly perfectly correlated (Pearson correlation = 0.952) to the topological overlap measure.
Sensitivity to θ
In this experiment, we check the sensitivity of the proposed ClosedForm approach w.r.t. the heuristic θ. For this experiment, we first generated 100 random geometric (RG) networks. In a RG network nodes are generated by uniformly sampling N points on [ 0,1]^{2}. An edge is then drawn between these points if the Euclidean distance between the points is less than a parameter d. This parameter d controls the density of the RG network where smaller values of d result in sparse networks while larger values of d generates dense networks. In our case, we conducted experiments using two different settings. In the first case, we use d=0.15, while in the second setting, we use d=0.3. For both experiments we fix N=250. For each value of d and for each generated RG network A, we permute the first 50 rows and columns of the network to generate network B. Therefore, the first 50 nodes in networks A and B form the goldstandard.
In order to test the sensitivity of the proposed approach w.r.t. θ, we estimate the fraction of permuted nodes correctly identified by the ClosedForm method for various values of θ. We used a grid of θ values varying from Θ={10^{−50},…,10^{−300}} in multiplicative steps of 10^{−20}. The goal of this experiment is to show that the fraction of correctly identified nodes w.r.t. various θ∈Θ remains nearly constant for smaller values of θ. Figure 2 shows the result for RG networks with density parameter d=0.15 and d=0.3. From Fig. 2, we observe that the median fraction of permuted nodes identified by the proposed approaches increases slowly before it converges to a nearly constant value as we decrease the threshold θ (i.e. increase absolute log of threshold θ).
From this experiment, we conclude that the fraction of truly differential nodes identified by the proposed methods increases as we decrease the threshold θ before it starts to converge for smaller values of threshold θ.
We performed further experiments using different θ for various values of N and observed that threshold θ behaves similarly independent of the value of N. We used the θ=10^{−50} as heuristic cutoff for future experiments.
Predictive performance comparison
Experimental Setup: The next simulation study that we carried out was to compare the predictive performance of the proposed approach w.r.t. the dGHD [28] technique. For this experiment, we generate 100 RG networks with N=1,000. For the first experiment we fix the density parameter d=0.15 and permute first 100 nodes in network A to obtain network B. Thus, these first 100 nodes form the differential subnetwork for the paired networks A and B.
In the second case, we use the density parameter d=0.3 to generate the edges for network A. We then generate a small RG network with 100 nodes using density parameter d ^{′}=0.5. This small dense subnetwork is then used to replace the network formed by first 100 nodes in the original network A to form network B. Thus, in the second experiment, these 100 nodes form the differential subnetwork for the paired networks A and B. This kind of mechanism can appear in reallife networks, for example, in case of cancer the transcription activity of some set of genes might get enhanced or suppressed in patients resulting in more or fewer edges in a subnetwork of the gene or DNA methylation network. Hence, the networks generated in the first case are much sparser in comparison to the networks generated in the second case.
Evaluation Metrics: We define the following terms to be used in our analysis:

True Positives (TP)  Refers to the nodes that are correctly identified as part of a differential network.

False Positives (FP)  Refers to the nodes that are incorrectly identified as part of a differential network.

False Negatives (FN)  Refers to the nodes that are part of the differential subnetwork but are not identified correctly as part of the subnetwork.

True Negatives (TN)  Refers to the nodes that are correctly identified as nodes which are not part of the differential subnetwork A ^{∗} and B ^{∗}.
ROC and PR curve comparisons: We generate two set of plots including the receiver operating characteristic (ROC) curves and the precisionrecall (PR) curves. To generate the plots as shown in Fig. 3, we use the ‘ROCR’ [40] package in R. It generates relatively smooth curves by automatically using different thresholds to estimate the true positive rate i.e. \(\frac {n(TP)}{n(TP)+n(FN)}\) and the false positive rate i.e. \(\frac {n(FP)}{n(FP)+n(TN)}\) for ROC plot and precision i.e. \(\frac {n(TP)}{n(TP)+n(FP)}\) and recall i.e. \(\frac {n(TP)}{n(TP)+n(FN)}\) for the PR plot. Here we use the true positive rate (TPR) and Recall interchangeably. Here n(·) represents the total number of nodes. For generating the plots we used the adjusted pvalue lists as obtained from the ClosedForm and dGHD approaches without specifying any threshold to generate smooth curves.
The data in Fig. 3 a and c shows that ClosedForm approach achieves better performance in case of differential subnetworks formed by permuted nodes and subnetworks with higher density. One of the reasons for relatively poor performance of the dGHD approach is that it has low true positive rate (TPR) and a high false positive rate (FPR) when the network has more edges. This is also reflected by the relatively low Recall and Precision values for the dGHD algorithm in Table 2 when d=0.3 and d ^{′}=0.5. From Fig. 3 c, we can observe that the performance of both the dGHD and ClosedForm algorithm improves w.r.t. ROC when the differential subnetwork is denser than the remaining network. However, the gap between the PR curves of ClosedForm and dGHD methods increases when the differential subnetwork is denser.
AUC comparison: For all further simulated experiments, we use pvalue 0.01 as cutoff in order to determine TP, TN, FP and FN respectively. We also evaluated the area under the ROC curve (AUC_ROC [41]) and area under PR curve (AUC_PR [41]) for 100 runs of ClosedForm and dGHD methods (using pvalue 0.01 as cutoff) as shown in Fig. 4.
We observe from Fig. 4 a and b that the dGHD method has lower variance w.r.t. AUC_ROC and AUC_PR metrics in comparison to ClosedForm approach in the case of permuted differential subnetwork. However, in case of denser differential subnetwork, the ClosedForm approach has much smaller variance in comparison to dGHD algorithm w.r.t. AUC_ROC and AUC_PR metrics as depicted in Fig. 4 c and d respectively. This suggests that the performance of ClosedForm technique is better than dGHD method when differential subnetworks are formed either using permuted nodes or higher density. In order to test for significance we performed the Student’s ttest under the null that the difference in the mean values of the two ROC distributions is zero i.e. \(\mu _{AUC\_ROC_{A}}\mu _{AUC\_ROC_{B}}=0\). At a significance level of 5%, we obtain pvalue of 0.48 in case of permuted subnetwork, thereby accepting the null i.e. the difference between the two distributions is not significant. In the case of paired networks with a denser differential subnetwork (i.e. d ^{′}=0.5), we obtain pvalue of 3.42×10^{−14} for the Student’s ttest, thereby rejecting the null. Similarly for the two PR distributions we obtained pvalue of 0.42 in case of permuted subnetwork and pvalue of 2.64×10^{−20} for the denser differential subnetwork.
Comparison with community detection techniques
The task of identifying differential subnetworks can also be rephrased as one of finding heavy subnetworks on a single network (say C) constructed by considering the absolute difference in the edge weights between the topological graph of network A and the topological graph of network B i.e. C _{ ij }=∥a _{ ij }−b _{ ij }∥,∀i,j∈V. This problem can then be construed as one of identifying dense modules in the network C i.e. from the previous experiments we want to discover a module corresponding to the set of nodes which have permuted or identify the denser subnetwork forming the differential subnetwork as a module.
The task of identifying dense/heavy modules in a network (C) is often referred as community detection or graph partitioning or graph clustering. There is a plethora of research associated with the problem of community detection including [42–49]. Several of these methods such as jActiveModules [50] and Spinglass algorithm [45] have also been applied to identify biologically meaningful modules (like functional modules, protein complexes, disease associated genes etc.) in biological networks as shown in [51, 52]. For our task of identifying dense modules in network C we applied 3 different community detection methods namely Louvain [43], Infomap [44] and Spinglass [45] techniques to have a comprehensive comparison with the proposed ClosedForm approach. We used the implementation of these methods available in the ‘igraph’ package in R and run each of these methods at their default settings.
We used the same set of RG networks as in the previous experiments to have a comparison with the community detection techniques. Since we are considering the difference in the topology of networks A and B in network C, we remove all the similarity between the two networks and the module with the maximum internal volume (i.e. total weight of edges within the community) is the one capturing the maximum difference between the topologies of networks A and B. Hence, we consider the densest inferred module as the one comprising the differential subnetwork and label all the nodes belonging to this cluster as differential while all the other modules are considered nondifferential. Using this notion to label the inferred communities, we compare the results obtained for the 3 different community detection techniques w.r.t. the gold standard (i.e. the actual set of labeled nodes which either belong to the permuted subnetwork or belong to the denser subnetwork) in a binary classification framework [53, 54]. These results are integrated in Table 2 along with the results of dGHD technique and the proposed ClosedForm (CF) approach. We assess the results obtained from the 3 community detection methods w.r.t. several quality metrics commonly used for binary classification including Precision, Recall, Kappa, Accuracy, Specificity, AUC_ROC and computational time. From Table 2, we observe that the Louvain method clearly outperforms the Infomap and Spinglass techniques in correctly identifying the differential subnetwork as a module with respect to the various evaluation metrics.
Simulated result analysis
Finally, the summary Table 2 highlights the computational efficiency and better predictive capabilities of the proposed technique in comparison to dGHD algorithm. For this comparison, we report the results obtained on 100 random runs of RG networks with N=1000,d=0.15 and d=0.3 respectively, where the first 100 nodes are permuted. We also report results when the first 100 nodes form the denser differential subnetworks i.e. in experiments where d=0.15 use d ^{′}=0.3 to form denser subnetwork and where d=0.3 use d ^{′}=0.5 to form denser subnetwork. We also conducted experiments on undirected Power Law (PL) graphs using N=1000 and E=10,000 with power law exponents α={2,3} respectively. We permuted the first 100 nodes of each PL network (B) to form the permuted network (A). We performed 100 random runs and report the mean values for various evaluation metrics.
Table 2 compares the ClosedForm, Louvain, Infomap, Spinglass and dGHD techniques w.r.t. various standard evaluation metrics like AUC, Precision, Recall, Accuracy, Specificity, Kappa statistic and computational Time for all the simulation experiments. Higher values of these evaluation metrics represents better quality results. Here the time required by dGHD algorithm is normalized to 1 and the time required by the other algorithms is scaled by the same normalization factor.
We observe from Table 2 that the ClosedForm approach performs exceedingly well in case of experiments on denser RG networks (d=0.3) and PL graphs. It emerges as the best method on these networks for various evaluation metrics. For this configuration, in case of both permuted and denser differential subnetworks, the mean AUC_ROC of ClosedForm approach is at least 10% higher than the dGHD algorithm. This is also reflected in higher values of Precision (0.714 and 0.771) and Recall (0.789 and 0.930) metrics for ClosedForm approach in comparison to low values of Precision (0.645 and 0.7) and Recall (0.577 and 0.731) for the dGHD algorithm in case of these experiments.
However, in case of sparse networks where its relatively easier to identify differential subnetworks ([28]), both ClosedForm and dGHD method have similar predictive performance. For sparse networks, the Louvain method nearly outperforms all other methods for the task of identifying the differential subnetwork as a module. From Table 2, we observe that the 3 community detection techniques have nearly perfect Recall scores but usually have relatively low Precision values. This indicates that these methods correctly identify all the nodes forming the differential subnetwork but also detect a large quantity of falsepositives in the densest module, thereby reducing the Precision values. The Louvain and Infomap methods are extremely fast and interestingly the Louvain method has highest Precision (0.887) which is at least 10% higher than dGHD algorithm and 5% higher than ClosedForm approach while identifying the dense differential subnetwork in a sparse network (d=0.15,d ^{′}=0.3) as shown in Table 2.
We observe that among the community detection techniques the Louvain method is the most efficient and is highly competitive with the dGHD algorithm but cannot outperform the ClosedForm approach on denser networks and Power Law graphs.
Case study in glioma
As a case study, we performed the differential subnetworks analysis of two gene regulatory networks reconstructed from the glioma dataset available on the TCGA. It is well known that the majority of gliomas are divided into two main macrocategories according to the mutation of the gene IDH1 [15, 17, 55]. Therefore, an important biological question, that motivated the development of the reported methodology, was to identify the subnetworks of of transcription factors (TFs) having a different regulatory program in these two major conditions. We reconstructed two gene regulatory networks belonging to two different glioma subtypes: IDHmutant and IDHwildtype as reported in the “Methods” Section.
In our final networks we have 457 TFs and 4,085 targets. We observe that these networks consist of 13,683 unique connections for IDHmutant and 14,158 for IDHwildtype between TFTF and TFtarget. Using these networks, we construct two unipartite topological graphs as described in the Methods section for the 457 TFs. We then perform the proposed differential subnetwork analysis to identify the TFs which are part of differential subnetworks in these topological graphs.
Figure 5 shows the significant differential subnetworks and Table 3 reports the topmost TFs which are part of differential subnetworks as detected by our algorithm. In this table, GHD and μ _{ π } represent the generalized Hamming Distance and its asymptotic mean between the subgraphs after removing the specific transcription factor in each row of the table. Additional file 1: Table S1 instead reports the results for all the 457 considered transcription factors.
In order to highlight the difference of ClosedForm approach with other standard network analysis methods, we also assembled a global glioma network using all the available transcriptional profiles using the same method described above and performed a master regulator analysis [38] with respect to the molecular phenotype under investigation, i.e. genes differentially expressed between IDH mutant and wild type. Master regulator analysis is extensively adopted to identify TFs that act as principal regulators in driving the phenotype from one condition to another.
Interestingly, among the topmost TFs (out of 457) forming the differential subnetworks, we found several genes known to have a central role in controlling specific glioma subtypes as well as novel candidates that deserve further biological validation. In particular, differential network analysis reveals that the subnetwork of STAT3 is one the most different between IDHmutant and IDHwildtype networks and a particularly significant Master Regulator of this wildtype phenotype. Members of our group have previously shown that STAT3, together with C/EBP β, is a key regulator of the mesenchymal differentiation and predicts the poor clinical outcome of IDHwildtype gliomas [38]. Another key regulator of the IDHwildtype gliomas was recently reported by using an integrative functional copy number analysis is the set of HOXA genes [17]. Moreover, another key network hub that the algorithm detects as different is SOX10 which appears to be an active master regulator of the IDHmutant phenotype. We recently reported that the GCIMPlow subgroup in the IDHmutant cohort can mediated by loss of CpG methylation and binding of SOX factors [17]. Furthermore, our algorithm identifies methylCpGbinding domain protein 2 (MBD2) as a differential network hub. In particular, MBD2 has no links in the IDHwildtype network whereas it is highly connected in the IDHmutant network where it is characterized by the CpG island methylator phenotype (GCIMP) [56]. Further investigation is needed to claim such a hypothesis as MBD2 is known also as a mediator of the epigenetic gene regulation and its role in Glioblastoma is being studied as its overexpression may drive tumor growth by suppressing the antiangiogenic activity of key tumor suppressors [57].
The differential network method highlights several other TFs as hubs of differential subnetworks which are not detected with standard MRA. For example, ETV1 and ETV4 which are overexpressed in gliomas of the Codel subtype carrying the mutation of the CIC gene [58]. Another differential subnetwork hub not detected by standard MRA is the tumor suppressor RFX1 whch has been identified as an important target/regulator of the malignancy of Glioblastoma [59], where as the cell cycle regulators such as E2F1 and E2F1, which play a role in progression of IDHmutant glioma are also detected by the ClosedForm algorithm [60].
An important warning that we want to mention is the presence of potential confounding effects due to the adopted dataset obtained by merging the expression profiles from two different platforms. With the additional difficulty that the distribution between IDHwildtype tumors and IDHmutated tumors is unequal between the two platforms (92% of microarray data are wildtype). We adopted this integrated dataset in order to build the two IDH networks and the global glioma network. The main computation in this case is the estimation of the mutual information between pairs of gene profiles (variables) in a set of observations (patients) and each individual pair of values is always extracted in the same platform. We used a robust knearest neighbor estimator proposed in [61] available in the PARMIGENE R package [62]. This estimator is not based on binning of values and is non parametric, working on the geometry of the scatterplot of each pair of gene expression values. Therefore, each observation (sample) can be seen as another evidence of dependency (or independency) between the variables regardless to the platform. Although, we found this merged dataset useful for the estimation of dependencies between genes, its adoption for deriving conclusions in terms of sample groups and pathway analysis should be made with caution.
As a further independent experiment, we performed the same analysis using the REMBRANDT dataset with the network differential analysis on the two networks independently built with ARACNe and the Master Regulator Analysis on the global network. The Table 4 reports the results for the most different TF subnetworks detected by the ClosedForm algorithm on this dataset. Interestingly of the top nine differential nodes obtained in the TCGA dataset five (FOXJ3, NFIA, CREB1, SOX10, KLF13) are also detected as significant in the REMBRANDT dataset suggesting that these TFs have a very different regulatory program in glioma subtypes. Moreover, differently from the TCGA experiment, we observe a significant overlap between the results of ClosedForm and that of the MRA. In particular 70 of the 75 nodes forming the differential subnetworks are also enriched in the MRA (pvalue of the Fisher exact test: 3.3810^{−9}. However, in this case the number of significant master regulators is considerably higher than that obtained in the TCGA case (297 vs. 144).
Conclusion
The comparison of gene expression profiles across different phenotypes is enabling the discovery of novel biomarkers for prognosis or diagnosis. They hold the key to identify novel targets for therapeutical intervention. In this paper, we proposed an improvement to the stateoftheart for comparing two labeled/unlabeled graphs that are representative of two conditions (e.g. the macrocategories according to the mutation of the gene IDH1 in our case study) and identifying statistically significant differences in their topology. We used the centralized GHD (cGHD) metric [28] to calculate the distance between the two labeled networks. We proposed a ClosedForm approach, an improvement to the dGHD algorithm, to detect localized topological differences between paired networks. The ClosedForm approach calculates the closedform contribution of each node in the cGHD metric and efficiently removes nodes with the smaller contributions in the cGHD value. From our experiments on scale free random geometric networks, we discovered that the ClosedForm approach was 1015x faster than dGHD from a computational complexity point of view. For differential subnetwork analysis in very sparse paired graphs, both the ClosedForm and dGHD methods had good predictive performance. They reached mean AUC values of ≈0.935 and ≈0.926 respectively for 100 random runs of simulation experiments. However, for relatively denser networks, the ClosedForm approach outperformed dGHD. The proposed method achieved a mean AUC of ≈0.877 while the dGHD technique reached a mean AUC of ≈0.724. The ClosedForm approach also achieved much higher Precision, Recall and Kappa values in comparison to the dGHD method for relatively denser networks.
We applied our algorithm to detect the main differences between the networks of IDHmutant and IDHwildtype glioma tumors and show that it correctly selects subnetworks centered on important key regulators of these two different subtypes. The adopted dataset is the result of the merging of two different profiling platforms and, as reported in the Results section, its use for other purposes should be made with caution. We also report the results on the same data using standard Master Regulator Analysis on a global network, and show the overlap between the experiments. Indeed, it is known that MRA tends to have many false positives due to correlations between TF profiles which could eventually attenuated with synergy and shadow analysis. On the contrary, the ClosedForm algorithm for network differencing tends to be more conservative as also suggested by the fact that only the significantly different subnetworks are detected in both datasets.
Abbreviations
 AUC:

Area under the curve
 GHD:

Generalized hamming distance
 FP:

False positive
 FPR:

False positive rate
 GSEA:

Gene set enrichment analysis
 MAD:

Mean absolute difference
 MR:

Master regulator
 MRA:

Master regulator analysis
 QAP:

Quadratic assignment procedure
 RG:

Random geometric
 ROC:

Receiver operating characteristic
 TF:

Transcription factor
 TO:

Topological overlap
 TP:

True positive
 TPR:

True positive rate
References
 1
Jin L, Chen Y, Wang T, Hui P, Vasilakos AV. Understanding user behavior in online social networks: a survey. Commun Mag IEEE. 2013; 51(9):144–50.
 2
Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B. Measurement and Analysis of Online Social Networks. In: Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. IMC ’07. San Diego: ACM: 2007. p. 29–42.
 3
Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, et al. Graph Structure in the Web. Comput Netw. 2000; 33(16):309–20.
 4
Erath A, Löchl M, Axhausen K. Graphtheoretical analysis of the swiss road and railway networks over time. Netw Spat Econ. 2009; 9(3):379–400.
 5
Kesidis G. An introduction to communication network analysis. Hoboken: Wiley; 2007.
 6
Boginski V, Butenko S, Pardolas PM. Statistical analysis of financial networks. Comput Stat Data Anal. 2005; 48(2):431–43.
 7
Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovery regulartory and signalling circuits in molecular interaction networks. Bioinformatics. 2002; 18:s233–40.
 8
Keller A, Bakes C, Gerasch A, Kaufmann M, Kohlbacher O, Meese E, et al. A novel algorithm for detecting differentially regulated paths based on gene enrichment analysis. Bioinfomatics. 2009; 25(21):2787–94.
 9
Nacu S, CritchleyThrone R, Lee R, Holmes S. Gene expression network analysis and applications to immunology. Bioinformatics. 2007; 23(7):850–8.
 10
Dehmer M, EmmertStreib F. Analysis of microarray data: a networkbased appraoch. Weinheim: John Wiley & Sons; 2008.
 11
D’haeseleer P, Liang S, Somogyi R. Genetic network inference: From coexpression clustering to reverse engineering. Bioinformatics. 2000; 16(8):707–26.
 12
Wallace TA, Martin DN, Ambs S. Interaction among genes, tumor biology and the environment in cancer health disparities: examining the evidence on a national and global scale. Carcinogenesis. 2011; 32(8):1107–21.
 13
Ahern TP, HorvathPuho E, Spindler KLG, Sorensen HT, Ording AG, Erichsen R. Colorectal cancer, comorbidity, and risk of venous thromboembolism: assessment of biological interactions in a Danish nationwide cohort. Br J Cancer. 2016; 114(1):96–102.
 14
Ceccarelli M, Cerulo L, Santore A. De novo reconstruction of gene regulatory networks from time series data, an approach based on formal methods. Methods. 2014; 69(3):298–305.
 15
Turcan S, Rohle D, Goenka A, Walsh LA, Fang F, Yilmaz E, et al. IDH1 mutation is sufficient to establish the glioma hypermethylator phenotype. Nature. 2012; 483(7390):479–83.
 16
Network CGAR, et al. Comprehensive, integrative genomic analysis of diffuse lowergrade gliomas. N Engl J Med. 2015; 2015(372):2481–98.
 17
Ceccarelli M, Barthel FP, Malta TM, Sabedot TS, Salama SR, Murray BA, et al. Molecular profiling reveals biologically discrete subsets and pathways of progression in diffuse glioma. Cell. 2016; 164(3):550–63.
 18
Brandes U, Eriebach T. Network Analysis: Methodological Foundations. Berlin: Springer; 2005, p. 3418.
 19
Lena PD, Wu G, Martelli P, Casadio R, Nardini MC. An efficient tool for molecular interaction maps overlap. BMC Bioinforma. 2013; 14(1):159.
 20
Yang Q, Sze S. Path matching and graph matching in biological networks. J Comput Biol. 2007; 14(1):56–67.
 21
Ramana MV, Scheinerman ER, Ullman D. Fractional isomorphism of graphs. Discrete Math. 1994; 132(1):247–65.
 22
Shervashidze N, Schweitzer P, van Leeuwen EJ, Mehlhorn K, Borgwardt KM. WeisfeilerLehman Graph Kernels. J Mach Learn Res. 2011; 12:2539–61.
 23
Hamming RW. The unreasonable effectiveness of mathematics. Am Math Monthly. 1980; 87(2):81–90.
 24
Butts C, Carley KM. Canonical labeling to facilitate graph comparison. Pittsburgh: Carnegie Mellon University Press; 1998.
 25
Gill R, Datta S, Datta S. A statistical framework for differential network analysis from microarrya data. BMC Bioinforma. 2010; 11(1):95.
 26
Mantel N. The detection of disease clustering and a generalized regression approach. Cancer Res. 1967; 27(2):209.
 27
Hubert LJ. Assignment methods in combinatorial data analysis. New York: Marcel Dekker; 1987, p. 1.
 28
Ruan D, Young A, Montana G. Differential analysis of biological networks. BMC Bioinforma. 2015; 16:327.
 29
Fuller TF, Ghazalpour A, Aten JE, Drake TA, Lusis AJ, Horvath S. Weighted gene coexpression network analysis strategies applied to mouse weight. Mammilian Genome. 2007; 18(6):463–72.
 30
Ha MJ, Baladandayuthapani V, Do KA. DINGO: differential network analysis in genomics. Bioinformatics. 2015; 31(21):3413–20.
 31
Zhang B, Horvath S. A general framework for weighted gene coexpression network analysis. Stat Appl Genet Mol Biol. 2005; 4(1):1128.
 32
Allen JD, Xie Y, Chen M, Girad L, Xao GH. Comparing statistical methods for constructing large scale gene networks. PLoS ONE. 2012; 7(1):e29348.
 33
Deshpande R, Vandersluis B, Myers CL. Comparison of profile similarity measures for genetic interaction networks. PLoS ONE. 2013; 8(7):e68664.
 34
Benjamini Y, Yekutieli D. The control of false discovery rate in multiple testing under dependency. Ann Stat. 2001; 29:1165–88.
 35
Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007; 8(1):118–27.
 36
Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera RD, et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinforma. 2006; 7(S1):1–15.
 37
Sales G, Romualdi C. parmigene  a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics [ISMB/ECCB]. 2011; 27(13):1876–7. Available from: http://dblp.unitrier.de/db/journals/bioinformatics/bioinformatics27.html#SalesR11.
 38
Carro MS, Lim WK, Alvarez MJ, Bollo RJ, Zhao X, Snyder EY, et al. The transcriptional network for mesenchymal transformation of brain tumours. Nature. 2010; 463(7279):318–25.
 39
Guan X, Vengoechea J, Zheng S, Sloan AE, Chen Y, Brat DJ, et al. Molecular subtypes of glioblastoma are relevant to lower grade glioma. PLoS ONE. 2014; 9(3):e91216.
 40
Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics. 2005; 21(20):3940–41.
 41
Mankiewicz R. The Story of Mathematics. Princeton: Princeton University Press; 2004.
 42
Girvan M, Newman ME. Community structure in social and biological networks. Proc Natl Acad Sci. 2002; 99(12):7821–6.
 43
Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech Theory Exp. 2008; 2008(10):P10008.
 44
Rosvall M, Bergstrom CT. Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PloS ONE. 2011; 6(4):e18209.
 45
Reichardt J, Bornholdt S. Statistical mechanics of community detection. Phys Rev E. 2006; 74(1):016110.
 46
Orman GK, Labatut V. A comparison of community detection algorithms on artificial networks. In: International Conference on Discovery Science. Berlin: Springer: 2009. p. 242–56.
 47
Mall R, Langone R, Suykens JA. Multilevel hierarchical kernel spectral clustering for reallife large scale complex networks. PloS ONE. 2014; 9(6):e99966.
 48
Mall R, Langone R, Suykens JA. Kernel spectral clustering for big data networks. Entropy. 2013; 15(5):1567–86.
 49
Mall R, Langone R, Suykens JA. Selftuned kernel spectral clustering for large scale networks. In: Big Data, 2013 IEEE International Conference on. Santa Clara: IEEE: 2013. p. 385–93.
 50
Dittrich MT, Klau GW, Rosenwald A, Dandekar T, Müller T. Identifying functional modules in protein–protein interaction networks: an integrated exact approach. Bioinformatics. 2008; 24(13):i223–31.
 51
West J, Beck S, Wang X, Teschendorff AE. An integrative network algorithm identifies ageassociated differential methylation interactome hotspots targeting stemcell differentiation pathways. Sci Rep. 2013; 3:1630.
 52
Jiao Y, Widschwendter M, Teschendorff AE. A systemslevel integrative framework for genomewide DNA methylation and gene expression data identifies differential gene expression modules under epigenetic control. Bioinformatics. 2014; 30(16):2360–66.
 53
Steinwart I, Hush D, Scovel C. A classification framework for anomaly detection. J Mach Learn Res. 2005; 6:211–32.
 54
Kumar A, NiculescuMizil A, Kavukcuoglu K, Daume III H. A binary classification framework for twostage multiple kernel learning. 2012:066428. arXiv preprint arXiv:12X00000.
 55
EckelPassow JE, Lachance DH, Molinaro AM, Walsh KM, Decker PA, Sicotte H, et al. Glioma groups based on 1p/19q, IDH, and TERT promoter mutations in tumors. N Engl J Med. 2015; 372(26):2499–508.
 56
Noushmehr H, Weisenberger DJ, Diefes K, Phillips HS, Pujara K, Berman BP, et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell. 2010; 17(5):510–22.
 57
Zhu D, Hunter SB, Vertino PM, Van Meir EG. Overexpression of MBD2 in glioblastoma maintains epigenetic silencing and inhibits the antiangiogenic function of the tumor suppressor gene BAI1. Cancer Res. 2011; 71(17):5859–70.
 58
Gleize V, Alentorn A, Connen de Kérillis L, Labussière M, Nadaradjane AA, Mundwiller E, et al. CIC inactivating mutations identify aggressive subset of 1p19q codeleted gliomas. Ann Neurol. 2015; 78(3):355–74.
 59
Feng C, Zhang Y, Yin J, Li J, Abounader R, Zuo Z. Regulatory factor X1 is a new tumor suppressive transcription factor that acts via direct downregulation of CD44 in glioblastoma. NeuroOncology. 2014; 16(8):1078–85.
 60
Bai H, Harmancı AS, ErsonOmay EZ, Li J, Coşkun S, Simon M, et al. Integrated genomic characterization of IDH1mutant glioma malignant progression. Nat Genet. 2016; 48(1):59–66.
 61
Kraskov A, Stögbauer H, Grassberger P. Estimating mutual information. Phys Rev E. 2004; 69(6):066138.
 62
Sales G, Romualdi C. parmigene–a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics. 2011; 27(13):1876–7.
Acknowledgements
None.
Funding
This work was funded by Qatar Foundation.
Availability of data and materials
The scripts implementing the proposed algorithms are available in R at https://sites.google.com/site/raghvendramallmlresearcher/codes. The gene expression data used in this paper were downloaded from the TCGA data portal https://cancergenome.nih.gov/and form the caintegrator portal https://caintergator.nci.nih.gov/rembrandt.
Authors’ contributions
RM conceived the methodology, developed the algorithms and drafted the manuscript. LC generated the data on glioma and helped to draft the manuscript. HB performed the statistical analysis. AI participated in the design of the study and to the critical analysis of the results. MC conceived of the study, participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Author information
Additional file
Additional file 1
Table S1. Description of data: GHD and MRA Results for all the 457 considered transcription factors on the TCGA and Rembrandt datasets. (XLSX 62.7 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Received
Accepted
Published
DOI
Keywords
 Differential networks
 Gene regulatory network inference
 Master regulators