Detection of statistically significant network changes in complex biological networks
 Raghvendra Mall^{1}Email author,
 Luigi Cerulo^{2, 3},
 Halima Bensmail^{1},
 Antonio Iavarone^{4} and
 Michele Ceccarelli^{1, 2}Email authorView ORCID ID profile
DOI: 10.1186/s1291801704126
© The Author(s) 2017
Received: 23 August 2016
Accepted: 22 February 2017
Published: 4 March 2017
Abstract
Background
Biological networks contribute effectively to unveil the complex structure of molecular interactions and to discover driver genes especially in cancer context. It can happen that due to gene mutations, as for example when cancer progresses, the gene expression network undergoes some amount of localized rewiring. The ability to detect statistical relevant changes in the interaction patterns induced by the progression of the disease can lead to the discovery of novel relevant signatures. Several procedures have been recently proposed to detect subnetwork differences in pairwise labeled weighted networks.
Methods
In this paper, we propose an improvement over the stateoftheart based on the Generalized Hamming Distance adopted for evaluating the topological difference between two networks and estimating its statistical significance. The proposed procedure exploits a more effective model selection criteria to generate pvalues for statistical significance and is more efficient in terms of computational time and prediction accuracy than literature methods. Moreover, the structure of the proposed algorithm allows for a faster parallelized implementation.
Results
In the case of dense random geometric networks the proposed approach is 1015x faster and achieves 510% higher AUC, Precision/Recall, and Kappa value than the stateoftheart. We also report the application of the method to dissect the difference between the regulatory networks of IDHmutant versus IDHwildtype glioma cancer. In such a case our method is able to identify some recently reported master regulators as well as novel important candidates.
Conclusions
We show that our network differencing procedure can effectively and efficiently detect statistical significant network rewirings in different conditions. When applied to detect the main differences between the networks of IDHmutant and IDHwildtype glioma tumors, it correctly selects subnetworks centered on important key regulators of these two different subtypes. In addition, its application highlights several novel candidates that cannot be detected by standard single networkbased approaches.
Keywords
Differential networks Gene regulatory network inference Master regulatorsBackground
The omnipresence of complex networks is reflected in wide variety of domains including social networks [1, 2], web graphs [3], road graphs [4], communication networks [5], financial networks [6] and biological networks [7–9]. Although we focus on biological networks many aspects of the method proposed in this paper can also be applied for networks in other contexts. In cancer research, the comparison between gene regulatory networks, protein interaction networks, and DNA methylation networks is performed to detect differences between two conditions, such as, healthy and disease [10, 11]. This can lead to discovery biological pathways related to the disease condition, and, in case of cancer, the gene regulatory changes as the disease progresses [12–14].
A central problem in cell biology is to model functional networks underlying interactions between molecular entities from high throughput data. One of the main question is how the cell globally changes its behavior in response to external stimuli or what is the effect of alterations such as, driver somatic mutations and changes in copy number. Signatures of differentially expressed and/or methylated genes are the downstream effect of global cell deregulation in different conditions such as cancer subtypes. Therefore, it is argued that driver mutations activate functional pathways described by different global rewiring of the underlying gene regulatory network.
The identification of significant changes induced by the presence or the progression of disease can help to discover novel molecular diagnostics and prognostic signatures. For example, it is known that, according to the mutation of the gene IDH [15, 16], the majority of malignant brain tumors can be divided two main macrocategories, which can be further divided in seven molecular and clinically distinct subtypes [17]. These two macrogroups are characterized by highly different global expression and epigenomic profiles. Hence, one of the main questions to understand the molecular basis of diseases is how to identify significant changes in the regulatory structure in different conditions.
Various techniques have been developed to compare two graphs including graph matching and graph similarity algorithms [18–20]. However, the problem addressed in this paper is different from popular graph theory problems including graph isomorphism [21] and subgraph matching [22]. Here the goal is to identify statistically significant differences between two weighted networks (with or without labels).
One common statistic used to distinguish one graph, A from another B, having the same number of nodes N, is the Mean Absolute Difference (MAD) metric, defined as: \(d(A,B)=\frac {1}{N(N1)}\sum _{i\neq j}a_{ij}b_{ij}\), where a _{ ij } and b _{ ij } are edge weights corresponding to the topology of networks A and B. This distance measure is equivalent to the Hamming distance [23] and has been extensively used in literature to compare networks [24, 25]. Another statistic used to test association between networks is the Quadratic Assignment Procedure (QAP) defined as: \(Q(A,B) = \frac {1}{N(N1)} \sum _{i=1} \sum _{j=1} a_{ij}b_{ij}\). The QAP metric is used in a permutationbased procedure to differentiate two networks [26, 27]. Ruan et al. showed that these metrics are not always sensitive to subtle topological variations [28].
Our aim is to detect statistically significant differences between two networks under the premise that any true topological difference between the two networks would involve only a small set of edges when compared to all the edges in the network. Recently, a Generalized Hamming Distance (GHD) based method was introduced to measure the distance between two labeled graphs [28], where it was shown that the GHD statistic is more robust than MAD and QAP metrics for identifying subtle variations in the topology of paired networks. In particular the authors showed that GHD permutation distribution follows a normal distribution with closedform expression for first two moments under the null hypothesis that networks A and B are independent. Utilizing the moments, corresponding pvalues were obtained in closedform. They also propose a differential subnetwork identification technique namely dGHD. The advantage of this technique is that – unlike previous differential network analysis techniques [25, 29, 30] – it provides a closedform solution for pvalues for the differential subnetwork left after iterative removal of the least differential nodes. We propose an improvement over dGHD, namely ClosedForm approach that exploits the conditions for asymptotic normality which is computationally cheaper and attains better prediction performance than the dGHD algorithm. Computational efficiency and prediction accuracy is crucial in cancer contexts where networks have a large number of nodes and the topological difference is associated to few driver genes.
Methods
Preliminaries on generalized hamming distance
The edge weights, a _{ ij } and b _{ ij }, depend on the topology of the network and provide a measure of connectivity between every pair of nodes i and j in A and B. Different metrics have been adopted to measure the connectivity between pairs of nodes, including: topological overlap (TO) [31, 32], cosine similarity and Pearson correlation [33]. In our experiments, we used the cosine similarity to capture first order interactions between the nodes in the network. Cosine similarity computation scales well for large sparse networks and can be used in place of TO, as it has nearly perfect correlation with it.
Given two networks A and B, a permutation π of the labels of the vertices of A (keeping the edges unchanged) generates a permuted network A _{ π }. The quantity G H D _{ π }(A _{ π },B) represents the test statistics of an inferential problem having as null hypothesis \(\mathcal {H}_{o}\): Graphs A and B are independent [28]. The distribution of G H D _{ π } can be obtained through an exhaustive calculation which can be approximated by a Monte Carlo approach. The authors of [28], indeed, simplified this calculation showing that under the null hypothesis it can be approximated well by a normal distribution with moments that can be obtained analytically.
Given a significance threshold α (e.g. 0.01), pvalues >α indicate that there is no sufficient evidence to reject the null hypothesis (\(\mathcal {H}_{o}\)) that graphs A and B are independent. Hence, higher pvalues indicate more probability that the two graphs under consideration are independent.
Differential subnetwork detection with GHD
The GHD distance is able to tell us to what extent are two graphs different but is not able to identify which parts of the graph are similar and which are different. In this work, we are interested in detecting which part of the graphs contribute to make the two graphs different. We call such different subgraphs differential subnetworks.
The notion of differential subnetworks is based on the idea that when comparing two networks only a subset of edges would present altered interaction. The goal is to identify the set of nodes, namely V ^{∗}, associated with such a subset of edges and the pvalues p ^{∗} corresponding to the nodes in V ^{∗}. This goal, formulated as a statistical test, requires that for such a subset V ^{∗} there is no sufficient evidence to reject the null hypothesis that the corresponding subnetworks \(A^{*}(V^{*},E_{A^{*}})\phantom {\dot {i}\!}\) and \(B^{*}(V^{*},E_{B^{*}})\phantom {\dot {i}\!}\) are statistically independent.
The idea here is to adopt an iterative technique to identify the set of nodes V ^{∗} which contributes more to the difference. We start from the dGHD algorithm proposed in [28]. The algorithm measures the edge connectivity with topological overlap metric and benefits from the closedform solution of pvalue (Eq. (3)). In the dGHD algorithm, an iterative procedure is followed where at each iteration the change in centralized GHD (cGHD) i.e. cGHD=GHD(A,B)−μ _{ π } is estimated after the removal of one node. The node where the change in cGHD (i.e. difference in cGHD before and after removal of a node) is maximum is removed. The GHD statistic is computed for remaining subnetworks and the pvalue is estimated. This process is repeated till a user specified minimal set size is reached or it is nolonger possible to have closedform representation for pvalues which happens for N≤3 as shown in Eq. 3. The pvalues are then adjusted for multiple testing by controlling the false discovery rate [34].
 1.
Remove nodes by exploiting the ClosedForm. We use the idea that nodes which have similar topology in networks A and B will contribute the least to cGHD. So, we first calculate the closedform contribution of each node in cGHD once using Eq. 4 and then iteratively remove nodes with least contributions. However, this process is continued till we observe that the pvalue of the remaining subnetwork becomes greater than a threshold θ.
 2.
Using a different model selection criterion. Once the pvalue reaches θ, we follow a procedure similar to the dGHD algorithm but use the more intuitive criterion of selecting the node that when removed makes the cGHD value maximum rather than using the change in the cGHD value (before and after removal of a node) as a model selection criterion. By using this model selection criterion, we iteratively identify and remove that node whose contribution is least in the cGHD.
The advantage of the ClosedForm approach is that we significantly reduce the computational complexity and improve the predictive performance. A simple alternative to the ClosedForm approach would be to sort all the nodes based on their contribution to cGHD and thus rank all the nodes based on their capability to differentiate the two networks with complexity (O(N logN)). However, then we will not be able to identify statistically different subnetworks between the two graphs as indicated in [28].
Closedform approach
We observe that if we sum GHD(A,B)(i) and μ _{ π }(i)∀i∈V, we obtain GHD(A,B) and μ _{ π }. We use the idea that nodes which have similar topology in networks A and B will contribute the least to centralized GHD, i.e. GHD(A,B)−μ _{ π }. We calculate the ClosedForm contribution of each node in the centralized GHD (cGHD) once using Eq. (4) and then iteratively remove nodes with least contribution to the cGHD, i.e. nodes having similar topology in graphs A and B. Thus, we calculate cGHD once and sort all the nodes based on their contribution to the cGHD metric.
Time complexity comparison
dGHD  Closedform 

O(N ^{2}E)  O(NE+N log(N)+K ^{2}E) 
Alternative procedure (fast approximation)
We propose an alternative procedure to the ClosedForm approach namely the Fast Approximation method where we first calculate the cGHD value without including the i ^{ t h } node, ∀i∈V once. This helps to estimate the cGHD value after removal of the i ^{ t h } node and can be performed in parallel. Our aim is to quickly discard those nodes after removal of which the cGHD value becomes large thereby removing nodes which were contributing least to the cGHD value. This helps to reduce the dependence between the two subnetworks by removing nodes which have similar topology in graphs A and B. Again, the idea is motivated by the premise that only a subset of nodes will form the differential subnetworks in graph A and B.
In this approach, we iteratively discard those nodes after removal of which the cGHD value becomes maximal till the pvalue for the remaining subnetwork reaches a threshold θ. Once the pvalue reaches θ, we return back to the procedure of estimating \(\Delta _{V_{Ki}} \forall i \in V_{K}\) as described in the ClosedForm approach. We use the same model selection criterion of selecting that node after removal of which the cGHD value becomes maximum as used in the ClosedForm approach. We then adjust the obtained pvalues for multiple testing by controlling the false discovery rate [34]. We refer to this technique as a Fast Approximation to the dGHD [28]. We explain the Fast Approximation technique in detail in Algorithm 2.
From our experiments, we observe that the results of the ClosedForm approach and the Fast Approximation technique are identical. Although, in the case of ClosedForm approach, we calculate closedform contribution of each node in the cGHD value and remove the node with least contribution, while in case of Fast Approximation we select that node after removal of which cGHD value becomes maximum, the ordered list \(\mathcal {O}\) obtained for both the methods is identical. Moreover, the computational complexity of the FastApproximation technique is the same as that of ClosedForm approach.
Inference of the glioma networks and master regulator analysis
We used the TCGA panglioma samples dataset including 1250 samples (463 IDHmutant and 653 IDHwildtype), 583 of which profiled with Agilent microarray and 667 with RNASeq Illumina HiSeq (REF) downloaded from the TCGA portal. The batch effects between the two platform were corrected using the COMBAT algorithm [35]. The final gene expression data matrix includes 12,985 genes and 1250 samples. We reconstructed two gene regulatory networks belonging to two different glioma subtypes: IDHmutant and IDHwildtype. Both networks were reconstructed with a four step procedure that follows ARACNe [36]: i) Computation of mutual information between gene expression profiles to determine interaction between Transcription Factors (TFs) and target genes [37]; ii) data processing inequality to filter out indirect relationships [36], iii) permutation test with 1000 resamplings to keep only statistically significant relationships. We also assembled a global glioma network using all the available 1250 transcriptional profiles using the aforementioned method. In this last case we also used intersection with transcription factor (TF) binding sites to keep only relationships due to promoter binding. We used a set of 457 TF binding sites available in the MotifDB Bioconductor package.
Master Regulator Analysis (MRA) algorithm [38] was applied to the global glioma network in order to compute the statistical significance of the overlap between the regulon of each TF (i.e. its ARACNe inferred targets) and the differentially expressed gene list (WilcoxonMannWhitney test FDR≤0.05) between IDHmutant and IDHwildtype samples. Given a gene interaction network, generated by ARACNe and a gene phenotype signature (e.g. a set of differentially expressed genes), the MRA algorithm computes for each TF the enrichment of the phenotype signature in the regulon of that TF. The regulon of a TF is defined as its neighborhood in the gene interaction network. There are two different methods to evaluate the enrichment of the signature in the regulon. One method uses the statistical Fisher’s exact test, while the other approach uses Gene Set Enrichment Analysis (GSEA). Here we used this last method.
A Master Regulator (MR) gene is a TF which regulon exhibit a statistical significant enrichment of the given phenotype signature.
Validation in the Rembrandt dataset
We used an independent dataset to perform the same analysis of network differencing between IDHmutant and IDHwildtype gliomas and check the the overlap between the two analyses. Raw gene expression (Affymetrix U133 Plus 2.0) from the publically available Repository for Molecular Brain Neoplasia Data (Rembrandt) (https://caintegrator.nci.nih.gov/rembrandt/) included 444 samples divided in 218 Glioblastoma, 148 Astrocytoma, 67 Oligodendrogliomas and 11 mixed histologies. Expression subtype and IDH status was inferred from gene expression following the procedure in [39] resulting in 153 wildtype and 162 mutant samples. These two set of expression profiles were used to generate two regulatory networks using the same approach reported above.
Results and discussion
For all our experiments, we used the ClosedForm approach (since results obtained from ClosedForm and FastApproximation techniques are identical) and compare it with the dGHD method [28].
Cosine similarity and topological overlap
where A _{ ij } represents the adjacency matrix.
We perform an experiment to calculate the correlation between the onestep topological measure and the cosine similarity measure. For this experiment, we generated 250 random geometric networks using N=250 and the connectivity parameter d=0.15.
Sensitivity to θ
In this experiment, we check the sensitivity of the proposed ClosedForm approach w.r.t. the heuristic θ. For this experiment, we first generated 100 random geometric (RG) networks. In a RG network nodes are generated by uniformly sampling N points on [ 0,1]^{2}. An edge is then drawn between these points if the Euclidean distance between the points is less than a parameter d. This parameter d controls the density of the RG network where smaller values of d result in sparse networks while larger values of d generates dense networks. In our case, we conducted experiments using two different settings. In the first case, we use d=0.15, while in the second setting, we use d=0.3. For both experiments we fix N=250. For each value of d and for each generated RG network A, we permute the first 50 rows and columns of the network to generate network B. Therefore, the first 50 nodes in networks A and B form the goldstandard.
From this experiment, we conclude that the fraction of truly differential nodes identified by the proposed methods increases as we decrease the threshold θ before it starts to converge for smaller values of threshold θ.
We performed further experiments using different θ for various values of N and observed that threshold θ behaves similarly independent of the value of N. We used the θ=10^{−50} as heuristic cutoff for future experiments.
Predictive performance comparison
Experimental Setup: The next simulation study that we carried out was to compare the predictive performance of the proposed approach w.r.t. the dGHD [28] technique. For this experiment, we generate 100 RG networks with N=1,000. For the first experiment we fix the density parameter d=0.15 and permute first 100 nodes in network A to obtain network B. Thus, these first 100 nodes form the differential subnetwork for the paired networks A and B.
In the second case, we use the density parameter d=0.3 to generate the edges for network A. We then generate a small RG network with 100 nodes using density parameter d ^{′}=0.5. This small dense subnetwork is then used to replace the network formed by first 100 nodes in the original network A to form network B. Thus, in the second experiment, these 100 nodes form the differential subnetwork for the paired networks A and B. This kind of mechanism can appear in reallife networks, for example, in case of cancer the transcription activity of some set of genes might get enhanced or suppressed in patients resulting in more or fewer edges in a subnetwork of the gene or DNA methylation network. Hence, the networks generated in the first case are much sparser in comparison to the networks generated in the second case.

True Positives (TP)  Refers to the nodes that are correctly identified as part of a differential network.

False Positives (FP)  Refers to the nodes that are incorrectly identified as part of a differential network.

False Negatives (FN)  Refers to the nodes that are part of the differential subnetwork but are not identified correctly as part of the subnetwork.

True Negatives (TN)  Refers to the nodes that are correctly identified as nodes which are not part of the differential subnetwork A ^{∗} and B ^{∗}.
Comparison of proposed ClosedForm (CF) approach with dGHD algorithm. We compared the proposed ClosedForm approach with dGHD, Louvain, Infomap and Spinglass techniques w.r.t. various evaluation metrics for random geometric (RG) and power law (PL) networks
Parameters  Method  AUC_ROC  Precision  Recall  Accuracy  Specificity  Kappa  Time 

Mean ± Sd  Mean ± Sd  Mean ± Sd  Mean ± Sd  Mean ± Sd  Mean ± Sd  Mean  
d=0.15 (RG)  CF  0.935 ± 0.051  0.849 ± 0.037  0.846 ± 0.102  0.969 ± 0.011  0.983 ± 0.004  0.828 ± 0.068  0.078 
d=0.15 (RG)  dGHD  0.926 ± 0.018  0.793 ± 0.021  0.878 ± 0.036  0.965 ± 0.005  0.974 ± 0.003  0.813 ± 0.026  1.0 
d=0.15 (RG)  Louvain  0.980 ± 0.016  0.767 ± 0.052  1.0 ± 0.0  0.965 ± 0.028  0.960 ± 0.031  0.841 ± 0.113  0.012 
d=0.15 (RG)  Infomap  0.843 ± 0.012  0.262 ± 0.015  1.0 ± 0.0  0.718 ± 0.022  0.685 ± 0.024  0.304 ± 0.024  0.018 
d=0.15 (RG)  Spinglass  0.832 ± 0.011  0.249 ± 0.012  1.0 ± 0.0  0.699 ± 0.018  0.665 ± 0.021  0.285 ± 0.020  0.85 
d=0.15, d ^{′}=0.3  CF  0.927 ± 0.048  0.839 ± 0.031  0.862 ± 0.098  0.969 ± 0.008  0.982 ± 0.005  0.825 ± 0.054  0.081 
d=0.15, d ^{′}=0.3  dGHD  0.922 ± 0.022  0.806 ± 0.027  0.868 ± 0.045  0.966 ± 0.006  0.977 ± 0.004  0.816 ± 0.032  1.0 
d=0.15, d ^{′}=0.3  Louvain  0.978 ± 0.018  0.887 ± 0.137  0.974 ± 0.042  0.982 ± 0.018  0.982 ± 0.023  0.916 ± 0.083  0.013 
d=0.15, d ^{′}=0.3  Infomap  0.849 ± 0.008  0.269 ± 0.009  1.0 ± 0.0  0.728 ± 0.015  0.698 ± 0.016  0.316 ± 0.016  0.020 
d=0.15, d ^{′}=0.3  Spinglass  0.859 ± 0.009  0.284 ± 0.013  1.0 ± 0.0  0.747 ± 0.016  0.719 ± 0.017  0.339 ± 0.019  0.92 
d=0.3 (RG)  CF  0.877 ± 0.067  0.714 ± 0.075  0.789 ± 0.135  0.947 ± 0.016  0.975 ± 0.011  0.716 ± 0.099  0.083 
d=0.3 (RG)  dGHD  0.724 ± 0.029  0.645 ± 0.049  0.577 ± 0.059  0.921 ± 0.007  0.971 ± 0.006  0.504 ± 0.051  1.0 
d=0.3 (RG)  Louvain  0.866 ± 0.019  0.406 ± 0.061  1.0 ± 0.0  0.850 ± 0.034  0.833 ± 0.038  0.505 ± 0.072  0.013 
d=0.3 (RG)  Infomap  0.677 ± 0.011  0.147 ± 0.004  1.0 ± 0.0  0.419 ± 0.019  0.354 ± 0.022  0.100 ± 0.008  0.021 
d=0.3 (RG)  Spinglass  0.678 ± 0.011  0.148 ± 0.004  1.0 ± 0.0  0.420 ± 0.018  0.355 ± 0.021  0.100 ± 0.008  0.90 
d=0.3, d ^{′}=0.5  CF  0.979 ± 0.005  0.771 ± 0.061  0.930 ± 0.082  0.965 ± 0.012  0.969 ± 0.011  0.821 ± 0.062  0.09 
d=0.3, d ^{′}=0.5  dGHD  0.848 ± 0.071  0.700 ± 0.038  0.731 ± 0.148  0.941 ± 0.010  0.964 ± 0.009  0.672 ± 0.078  1.0 
d=0.3, d ^{′}=0.5  Louvain  0.932 ± 0.029  0.478 ± 0.118  1.0 ± 0.0  0.879 ± 0.054  0.866 ± 0.059  0.582 ± 0.128  0.014 
d=0.3, d ^{′}=0.5  Infomap  0.674 ± 0.010  0.145 ± 0.004  1.0 ± 0.0  0.413 ± 0.018  0.348 ± 0.020  0.097 ± 0.008  0.023 
d=0.3, d ^{′}=0.5  Spinglass  0.711 ± 0.007  0.162 ± 0.003  1.0 ± 0.0  0.481 ± 0.013  0.423 ± 0.014  0.128 ± 0.006  0.94 
α=2 (PL)  CF  0.797 ± 0.046  0.307 ± 0.307  0.792 ± 0.099  0.801 ± 0.018  0.349 ± 0.051  0.802 ± 0.022  0.09 
α=2 (PL)  dGHD  0.797 ± 0.013  0.294 ± 0.009  0.809 ± 0.027  0.787 ± 0.008  0.333 ± 0.015  0.784 ± 0.009  1.0 
α=2 (PL)  Louvain  0.780 ± 0.014  0.212 ± 0.010  1.0 ± 0.0  0.703 ± 0.018  0.272 ± 0.016  0.690 ± 0.011  0.015 
α=2 (PL)  Infomap  0.665 ± 0.013  0.141 ± 0.004  1.0 ± 0.0  0.603 ± 0.018  0.162 ± 0.012  0.484 ± 0.019  0.026 
α=2 (PL)  Spinglass  0.687 ± 0.014  0.153 ± 0.006  1.0 ± 0.0  0.645 ± 0.021  0.194 ± 0.011  0.527 ± 0.016  0.90 
α=3 (PL)  CF  0.825 ± 0.019  0.345 ± 0.015  0.825 ± 0.035  0.826 ± 0.007  0.402 ± 0.024  0.826 ± 0.004  0.085 
α=3 (PL)  dGHD  0.808 ± 0.027  0.327 ± 0.018  0.799 ± 0.050  0.816 ± 0.008  0.375 ± 0.031  0.817 ± 0.004  1.0 
α=3 (PL)  Louvain  0.774 ± 0.015  0.233 ± 0.011  1.0 ± 0.0  0.736 ± 0.019  0.301 ± 0.009  0.732 ± 0.019  0.015 
α=3 (PL)  Infomap  0.670 ± 0.014  0.168 ± 0.005  1.0 ± 0.0  0.635 ± 0.017  0.210 ± 0.014  0.532 ± 0.014  0.027 
α=3 (PL)  Spinglass  0.694 ± 0.013  0.179 ± 0.007  1.0 ± 0.0  0.670 ± 0.023  0.232 ± 0.012  0.571 ± 0.017  0.94 
We observe from Fig. 4 a and b that the dGHD method has lower variance w.r.t. AUC_ROC and AUC_PR metrics in comparison to ClosedForm approach in the case of permuted differential subnetwork. However, in case of denser differential subnetwork, the ClosedForm approach has much smaller variance in comparison to dGHD algorithm w.r.t. AUC_ROC and AUC_PR metrics as depicted in Fig. 4 c and d respectively. This suggests that the performance of ClosedForm technique is better than dGHD method when differential subnetworks are formed either using permuted nodes or higher density. In order to test for significance we performed the Student’s ttest under the null that the difference in the mean values of the two ROC distributions is zero i.e. \(\mu _{AUC\_ROC_{A}}\mu _{AUC\_ROC_{B}}=0\). At a significance level of 5%, we obtain pvalue of 0.48 in case of permuted subnetwork, thereby accepting the null i.e. the difference between the two distributions is not significant. In the case of paired networks with a denser differential subnetwork (i.e. d ^{′}=0.5), we obtain pvalue of 3.42×10^{−14} for the Student’s ttest, thereby rejecting the null. Similarly for the two PR distributions we obtained pvalue of 0.42 in case of permuted subnetwork and pvalue of 2.64×10^{−20} for the denser differential subnetwork.
Comparison with community detection techniques
The task of identifying differential subnetworks can also be rephrased as one of finding heavy subnetworks on a single network (say C) constructed by considering the absolute difference in the edge weights between the topological graph of network A and the topological graph of network B i.e. C _{ ij }=∥a _{ ij }−b _{ ij }∥,∀i,j∈V. This problem can then be construed as one of identifying dense modules in the network C i.e. from the previous experiments we want to discover a module corresponding to the set of nodes which have permuted or identify the denser subnetwork forming the differential subnetwork as a module.
The task of identifying dense/heavy modules in a network (C) is often referred as community detection or graph partitioning or graph clustering. There is a plethora of research associated with the problem of community detection including [42–49]. Several of these methods such as jActiveModules [50] and Spinglass algorithm [45] have also been applied to identify biologically meaningful modules (like functional modules, protein complexes, disease associated genes etc.) in biological networks as shown in [51, 52]. For our task of identifying dense modules in network C we applied 3 different community detection methods namely Louvain [43], Infomap [44] and Spinglass [45] techniques to have a comprehensive comparison with the proposed ClosedForm approach. We used the implementation of these methods available in the ‘igraph’ package in R and run each of these methods at their default settings.
We used the same set of RG networks as in the previous experiments to have a comparison with the community detection techniques. Since we are considering the difference in the topology of networks A and B in network C, we remove all the similarity between the two networks and the module with the maximum internal volume (i.e. total weight of edges within the community) is the one capturing the maximum difference between the topologies of networks A and B. Hence, we consider the densest inferred module as the one comprising the differential subnetwork and label all the nodes belonging to this cluster as differential while all the other modules are considered nondifferential. Using this notion to label the inferred communities, we compare the results obtained for the 3 different community detection techniques w.r.t. the gold standard (i.e. the actual set of labeled nodes which either belong to the permuted subnetwork or belong to the denser subnetwork) in a binary classification framework [53, 54]. These results are integrated in Table 2 along with the results of dGHD technique and the proposed ClosedForm (CF) approach. We assess the results obtained from the 3 community detection methods w.r.t. several quality metrics commonly used for binary classification including Precision, Recall, Kappa, Accuracy, Specificity, AUC_ROC and computational time. From Table 2, we observe that the Louvain method clearly outperforms the Infomap and Spinglass techniques in correctly identifying the differential subnetwork as a module with respect to the various evaluation metrics.
Simulated result analysis
Finally, the summary Table 2 highlights the computational efficiency and better predictive capabilities of the proposed technique in comparison to dGHD algorithm. For this comparison, we report the results obtained on 100 random runs of RG networks with N=1000,d=0.15 and d=0.3 respectively, where the first 100 nodes are permuted. We also report results when the first 100 nodes form the denser differential subnetworks i.e. in experiments where d=0.15 use d ^{′}=0.3 to form denser subnetwork and where d=0.3 use d ^{′}=0.5 to form denser subnetwork. We also conducted experiments on undirected Power Law (PL) graphs using N=1000 and E=10,000 with power law exponents α={2,3} respectively. We permuted the first 100 nodes of each PL network (B) to form the permuted network (A). We performed 100 random runs and report the mean values for various evaluation metrics.
Table 2 compares the ClosedForm, Louvain, Infomap, Spinglass and dGHD techniques w.r.t. various standard evaluation metrics like AUC, Precision, Recall, Accuracy, Specificity, Kappa statistic and computational Time for all the simulation experiments. Higher values of these evaluation metrics represents better quality results. Here the time required by dGHD algorithm is normalized to 1 and the time required by the other algorithms is scaled by the same normalization factor.
We observe from Table 2 that the ClosedForm approach performs exceedingly well in case of experiments on denser RG networks (d=0.3) and PL graphs. It emerges as the best method on these networks for various evaluation metrics. For this configuration, in case of both permuted and denser differential subnetworks, the mean AUC_ROC of ClosedForm approach is at least 10% higher than the dGHD algorithm. This is also reflected in higher values of Precision (0.714 and 0.771) and Recall (0.789 and 0.930) metrics for ClosedForm approach in comparison to low values of Precision (0.645 and 0.7) and Recall (0.577 and 0.731) for the dGHD algorithm in case of these experiments.
However, in case of sparse networks where its relatively easier to identify differential subnetworks ([28]), both ClosedForm and dGHD method have similar predictive performance. For sparse networks, the Louvain method nearly outperforms all other methods for the task of identifying the differential subnetwork as a module. From Table 2, we observe that the 3 community detection techniques have nearly perfect Recall scores but usually have relatively low Precision values. This indicates that these methods correctly identify all the nodes forming the differential subnetwork but also detect a large quantity of falsepositives in the densest module, thereby reducing the Precision values. The Louvain and Infomap methods are extremely fast and interestingly the Louvain method has highest Precision (0.887) which is at least 10% higher than dGHD algorithm and 5% higher than ClosedForm approach while identifying the dense differential subnetwork in a sparse network (d=0.15,d ^{′}=0.3) as shown in Table 2.
We observe that among the community detection techniques the Louvain method is the most efficient and is highly competitive with the dGHD algorithm but cannot outperform the ClosedForm approach on denser networks and Power Law graphs.
Case study in glioma
As a case study, we performed the differential subnetworks analysis of two gene regulatory networks reconstructed from the glioma dataset available on the TCGA. It is well known that the majority of gliomas are divided into two main macrocategories according to the mutation of the gene IDH1 [15, 17, 55]. Therefore, an important biological question, that motivated the development of the reported methodology, was to identify the subnetworks of of transcription factors (TFs) having a different regulatory program in these two major conditions. We reconstructed two gene regulatory networks belonging to two different glioma subtypes: IDHmutant and IDHwildtype as reported in the “Methods” Section.
In our final networks we have 457 TFs and 4,085 targets. We observe that these networks consist of 13,683 unique connections for IDHmutant and 14,158 for IDHwildtype between TFTF and TFtarget. Using these networks, we construct two unipartite topological graphs as described in the Methods section for the 457 TFs. We then perform the proposed differential subnetwork analysis to identify the TFs which are part of differential subnetworks in these topological graphs.
The top most different transcription factors detected between IDHmutant and IDHwildtype in the TCGA dataset
TF  Zscore  GHD  μ  MRA fdr 

FOXD3  0.000  1.000  1.000  1.000E+00 
FOXJ3  0.000  1.000  1.000  8.442E03 
MLX  0.000  1.000  1.000  8.075E01 
NFIA  0.000  1.000  1.000  4.502E01 
ETV1  0.062  0.058  0.058  1.000E+00 
E2F1  0.085  0.058  0.058  1.007E01 
CREB1  0.208  0.058  0.058  8.580E01 
SOX10  0.234  0.058  0.058  8.442E03 
KLF13  0.338  1.000  0.278  1.240E02 
STAT3  0.354  0.058  0.058  8.442E03 
RUNX3  0.387  0.058  0.059  1.671E02 
IRF3  0.406  0.840  0.455  8.442E03 
ZNF354C  0.498  0.058  0.057  1.000E+00 
HOXD13  0.540  0.059  0.059  2.492E01 
ZIC1  0.622  0.058  0.058  5.787E02 
HOXA2  0.700  0.059  0.059  1.405E01 
FOXO1  0.743  0.058  0.058  8.183E02 
MAFG  0.817  0.862  0.467  6.857E01 
RFX1  0.865  0.059  0.059  3.131E01 
NR1H2  0.871  0.058  0.058  8.176E01 
PAX6  1.003  0.058  0.057  4.147E01 
GLIS2  1.035  0.058  0.059  8.442E03 
NR4A2  1.118  0.058  0.058  1.000E+00 
STAT4  1.137  0.848  0.486  9.615E01 
DLX6  1.208  0.058  0.059  1.000E+00 
SIX4  1.232  0.058  0.058  1.000E+00 
MEF2D  1.379  0.058  0.059  8.442E03 
MTF1  1.388  0.058  0.057  1.000E+00 
MBD2  1.480  0.820  0.495  1.969E01 
OTP  1.493  0.058  0.057  2.970E01 
ETV4  1.529  0.059  0.059  2.122E01 
ZBTB12  1.566  0.194  0.189  4.255E02 
HOXB4  1.595  0.058  0.057  3.019E01 
PLAG1  1.622  0.195  0.190  3.434E01 
E2F6  1.668  0.197  0.192  8.442E03 
CREM  1.674  0.765  0.506  2.122E01 
IRF9  1.700  0.058  0.057  5.950E02 
KLF6  1.709  0.059  0.059  8.442E03 
TFE3  1.716  0.199  0.193  1.049E01 
HSF2  1.759  0.201  0.195  1.671E02 
NR2C1  1.800  0.058  0.058  2.122E01 
ONECUT2  1.804  0.202  0.196  3.657E02 
HOXD3  1.847  0.204  0.198  1.000E+00 
BACH1  1.888  0.058  0.059  2.897E01 
GSX1  1.895  0.207  0.200  1.000E+00 
HOXA13  1.930  0.058  0.057  1.000E+00 
VAX2  1.937  0.208  0.201  1.609E01 
In order to highlight the difference of ClosedForm approach with other standard network analysis methods, we also assembled a global glioma network using all the available transcriptional profiles using the same method described above and performed a master regulator analysis [38] with respect to the molecular phenotype under investigation, i.e. genes differentially expressed between IDH mutant and wild type. Master regulator analysis is extensively adopted to identify TFs that act as principal regulators in driving the phenotype from one condition to another.
Interestingly, among the topmost TFs (out of 457) forming the differential subnetworks, we found several genes known to have a central role in controlling specific glioma subtypes as well as novel candidates that deserve further biological validation. In particular, differential network analysis reveals that the subnetwork of STAT3 is one the most different between IDHmutant and IDHwildtype networks and a particularly significant Master Regulator of this wildtype phenotype. Members of our group have previously shown that STAT3, together with C/EBP β, is a key regulator of the mesenchymal differentiation and predicts the poor clinical outcome of IDHwildtype gliomas [38]. Another key regulator of the IDHwildtype gliomas was recently reported by using an integrative functional copy number analysis is the set of HOXA genes [17]. Moreover, another key network hub that the algorithm detects as different is SOX10 which appears to be an active master regulator of the IDHmutant phenotype. We recently reported that the GCIMPlow subgroup in the IDHmutant cohort can mediated by loss of CpG methylation and binding of SOX factors [17]. Furthermore, our algorithm identifies methylCpGbinding domain protein 2 (MBD2) as a differential network hub. In particular, MBD2 has no links in the IDHwildtype network whereas it is highly connected in the IDHmutant network where it is characterized by the CpG island methylator phenotype (GCIMP) [56]. Further investigation is needed to claim such a hypothesis as MBD2 is known also as a mediator of the epigenetic gene regulation and its role in Glioblastoma is being studied as its overexpression may drive tumor growth by suppressing the antiangiogenic activity of key tumor suppressors [57].
The differential network method highlights several other TFs as hubs of differential subnetworks which are not detected with standard MRA. For example, ETV1 and ETV4 which are overexpressed in gliomas of the Codel subtype carrying the mutation of the CIC gene [58]. Another differential subnetwork hub not detected by standard MRA is the tumor suppressor RFX1 whch has been identified as an important target/regulator of the malignancy of Glioblastoma [59], where as the cell cycle regulators such as E2F1 and E2F1, which play a role in progression of IDHmutant glioma are also detected by the ClosedForm algorithm [60].
An important warning that we want to mention is the presence of potential confounding effects due to the adopted dataset obtained by merging the expression profiles from two different platforms. With the additional difficulty that the distribution between IDHwildtype tumors and IDHmutated tumors is unequal between the two platforms (92% of microarray data are wildtype). We adopted this integrated dataset in order to build the two IDH networks and the global glioma network. The main computation in this case is the estimation of the mutual information between pairs of gene profiles (variables) in a set of observations (patients) and each individual pair of values is always extracted in the same platform. We used a robust knearest neighbor estimator proposed in [61] available in the PARMIGENE R package [62]. This estimator is not based on binning of values and is non parametric, working on the geometry of the scatterplot of each pair of gene expression values. Therefore, each observation (sample) can be seen as another evidence of dependency (or independency) between the variables regardless to the platform. Although, we found this merged dataset useful for the estimation of dependencies between genes, its adoption for deriving conclusions in terms of sample groups and pathway analysis should be made with caution.
The top most different transcription factors detected between IDHmutant and IDHwildtype in the REMBRANDT dataset
TF  Zscore  GHD  μ  MRA fdr 

MGA  0.000  1.000  1.000  2.166E03 
TEAD1  0.000  1.000  1.000  8.017E04 
FOS  0.000  1.000  1.000  5.137E04 
JUNB  0.000  1.000  1.000  5.137E04 
MEF2C  0.015  0.015  0.015  8.001E04 
LEF1  0.058  0.014  0.014  5.137E04 
NEUROD2  0.096  0.016  0.016  1.221E03 
EGR2  0.110  0.013  0.013  6.263E03 
JUN  0.123  0.333  0.500  5.137E04 
ARX  0.144  0.012  0.012  9.301E02 
BBX  0.173  0.012  0.012  7.333E04 
TCF3  0.198  0.011  0.011  5.137E04 
LHX6  0.205  0.017  0.017  8.492E04 
EGR1  0.211  0.011  0.011  9.696E03 
BCL6B  0.214  0.011  0.011  5.137E04 
E2F2  0.217  0.011  0.011  7.786E04 
E2F7  0.220  0.012  0.012  5.137E04 
E2F8  0.223  0.012  0.012  5.137E04 
ELF4  0.226  0.012  0.012  5.137E04 
ETV5  0.229  0.013  0.012  5.137E04 
FLI1  0.232  0.013  0.013  5.137E04 
FOXG1  0.235  0.013  0.013  1.000E+00 
HOXD9  0.239  0.014  0.014  9.728E04 
ID4  0.242  0.014  0.014  7.786E04 
IRF8  0.246  0.014  0.014  2.393E02 
MYBL2  0.250  0.015  0.015  5.137E04 
NFIA  0.254  0.015  0.015  4.085E03 
NFIB  0.258  0.016  0.016  7.796E04 
KLF13  0.258  0.360  0.515  8.001E04 
OLIG2  0.262  0.016  0.016  7.893E02 
PROX1  0.266  0.017  0.017  1.020E02 
SOX2  0.270  0.017  0.017  2.995E03 
TEF  0.275  0.018  0.018  8.221E04 
ZBTB7A  0.280  0.019  0.018  7.700E04 
ZIC1  0.284  0.019  0.019  7.700E01 
SOX13  0.295  0.021  0.020  8.086E04 
TCF7L2  0.300  0.021  0.021  7.487E04 
BCL6  0.305  0.022  0.022  5.137E04 
MAF  0.317  0.024  0.024  5.137E04 
CEBPB  0.330  0.024  0.024  5.137E04 
CEBPD  0.337  0.025  0.025  5.137E04 
HLF  0.344  0.018  0.018  3.029E03 
ELK1  0.349  0.025  0.025  8.017E04 
FOXJ3  0.369  0.027  0.026  5.137E04 
MTF1  0.377  0.028  0.027  5.137E04 
TP53  0.388  0.028  0.028  5.137E04 
GABPA  0.407  0.030  0.029  5.137E04 
CDC5L  0.417  0.031  0.031  7.899E04 
RORA  0.422  0.329  0.467  7.796E04 
IRF9  0.426  0.031  0.031  3.062E03 
STAT1  0.437  0.033  0.032  5.137E04 
CREB1  0.456  0.035  0.034  5.137E04 
SOX10  0.462  0.036  0.035  8.250E04 
HOXD1  0.475  0.038  0.037  5.137E04 
SOX8  0.479  0.038  0.037  1.760E03 
HOXD11  0.480  0.047  0.046  2.975E02 
NR2F2  0.490  0.042  0.041  5.186E04 
DLX1  0.491  0.046  0.045  7.700E04 
TCF12  0.493  0.040  0.040  9.117E04 
THRB  0.495  0.051  0.050  9.850E04 
DLX2  0.496  0.045  0.044  8.492E04 
HOXD10  0.498  0.050  0.049  5.137E04 
ATF5  0.505  0.057  0.055  5.137E04 
STAT4  0.515  0.055  0.054  9.220E04 
TBR1  0.519  0.020  0.020  9.272E04 
MESP1  0.521  0.092  0.087  8.746E04 
POU3F2  0.523  0.063  0.061  5.137E04 
TFEC  0.530  0.082  0.079  5.137E04 
TCF4  0.533  0.071  0.069  7.487E04 
ETS2  0.543  0.176  0.163  9.728E04 
CREM  0.558  0.110  0.104  5.140E04 
TP63  0.561  0.105  0.099  9.220E04 
STAT6  0.563  0.091  0.087  5.137E04 
NPAS2  0.575  0.136  0.127  1.889E01 
GLI3  0.601  0.313  0.455  4.663E02 
Conclusion
The comparison of gene expression profiles across different phenotypes is enabling the discovery of novel biomarkers for prognosis or diagnosis. They hold the key to identify novel targets for therapeutical intervention. In this paper, we proposed an improvement to the stateoftheart for comparing two labeled/unlabeled graphs that are representative of two conditions (e.g. the macrocategories according to the mutation of the gene IDH1 in our case study) and identifying statistically significant differences in their topology. We used the centralized GHD (cGHD) metric [28] to calculate the distance between the two labeled networks. We proposed a ClosedForm approach, an improvement to the dGHD algorithm, to detect localized topological differences between paired networks. The ClosedForm approach calculates the closedform contribution of each node in the cGHD metric and efficiently removes nodes with the smaller contributions in the cGHD value. From our experiments on scale free random geometric networks, we discovered that the ClosedForm approach was 1015x faster than dGHD from a computational complexity point of view. For differential subnetwork analysis in very sparse paired graphs, both the ClosedForm and dGHD methods had good predictive performance. They reached mean AUC values of ≈0.935 and ≈0.926 respectively for 100 random runs of simulation experiments. However, for relatively denser networks, the ClosedForm approach outperformed dGHD. The proposed method achieved a mean AUC of ≈0.877 while the dGHD technique reached a mean AUC of ≈0.724. The ClosedForm approach also achieved much higher Precision, Recall and Kappa values in comparison to the dGHD method for relatively denser networks.
We applied our algorithm to detect the main differences between the networks of IDHmutant and IDHwildtype glioma tumors and show that it correctly selects subnetworks centered on important key regulators of these two different subtypes. The adopted dataset is the result of the merging of two different profiling platforms and, as reported in the Results section, its use for other purposes should be made with caution. We also report the results on the same data using standard Master Regulator Analysis on a global network, and show the overlap between the experiments. Indeed, it is known that MRA tends to have many false positives due to correlations between TF profiles which could eventually attenuated with synergy and shadow analysis. On the contrary, the ClosedForm algorithm for network differencing tends to be more conservative as also suggested by the fact that only the significantly different subnetworks are detected in both datasets.
Abbreviations
 AUC:

Area under the curve
 GHD:

Generalized hamming distance
 FP:

False positive
 FPR:

False positive rate
 GSEA:

Gene set enrichment analysis
 MAD:

Mean absolute difference
 MR:

Master regulator
 MRA:

Master regulator analysis
 QAP:

Quadratic assignment procedure
 RG:

Random geometric
 ROC:

Receiver operating characteristic
 TF:

Transcription factor
 TO:

Topological overlap
 TP:

True positive
 TPR:

True positive rate
Declarations
Acknowledgements
None.
Funding
This work was funded by Qatar Foundation.
Availability of data and materials
The scripts implementing the proposed algorithms are available in R at https://sites.google.com/site/raghvendramallmlresearcher/codes. The gene expression data used in this paper were downloaded from the TCGA data portal https://cancergenome.nih.gov/and form the caintegrator portal https://caintergator.nci.nih.gov/rembrandt.
Authors’ contributions
RM conceived the methodology, developed the algorithms and drafted the manuscript. LC generated the data on glioma and helped to draft the manuscript. HB performed the statistical analysis. AI participated in the design of the study and to the critical analysis of the results. MC conceived of the study, participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Jin L, Chen Y, Wang T, Hui P, Vasilakos AV. Understanding user behavior in online social networks: a survey. Commun Mag IEEE. 2013; 51(9):144–50.View ArticleGoogle Scholar
 Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B. Measurement and Analysis of Online Social Networks. In: Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. IMC ’07. San Diego: ACM: 2007. p. 29–42.Google Scholar
 Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, et al. Graph Structure in the Web. Comput Netw. 2000; 33(16):309–20.View ArticleGoogle Scholar
 Erath A, Löchl M, Axhausen K. Graphtheoretical analysis of the swiss road and railway networks over time. Netw Spat Econ. 2009; 9(3):379–400.View ArticleGoogle Scholar
 Kesidis G. An introduction to communication network analysis. Hoboken: Wiley; 2007.View ArticleGoogle Scholar
 Boginski V, Butenko S, Pardolas PM. Statistical analysis of financial networks. Comput Stat Data Anal. 2005; 48(2):431–43.View ArticleGoogle Scholar
 Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovery regulartory and signalling circuits in molecular interaction networks. Bioinformatics. 2002; 18:s233–40.View ArticlePubMedGoogle Scholar
 Keller A, Bakes C, Gerasch A, Kaufmann M, Kohlbacher O, Meese E, et al. A novel algorithm for detecting differentially regulated paths based on gene enrichment analysis. Bioinfomatics. 2009; 25(21):2787–94.View ArticleGoogle Scholar
 Nacu S, CritchleyThrone R, Lee R, Holmes S. Gene expression network analysis and applications to immunology. Bioinformatics. 2007; 23(7):850–8.View ArticlePubMedGoogle Scholar
 Dehmer M, EmmertStreib F. Analysis of microarray data: a networkbased appraoch. Weinheim: John Wiley & Sons; 2008.Google Scholar
 D’haeseleer P, Liang S, Somogyi R. Genetic network inference: From coexpression clustering to reverse engineering. Bioinformatics. 2000; 16(8):707–26.
 Wallace TA, Martin DN, Ambs S. Interaction among genes, tumor biology and the environment in cancer health disparities: examining the evidence on a national and global scale. Carcinogenesis. 2011; 32(8):1107–21.View ArticlePubMedPubMed CentralGoogle Scholar
 Ahern TP, HorvathPuho E, Spindler KLG, Sorensen HT, Ording AG, Erichsen R. Colorectal cancer, comorbidity, and risk of venous thromboembolism: assessment of biological interactions in a Danish nationwide cohort. Br J Cancer. 2016; 114(1):96–102.View ArticlePubMedGoogle Scholar
 Ceccarelli M, Cerulo L, Santore A. De novo reconstruction of gene regulatory networks from time series data, an approach based on formal methods. Methods. 2014; 69(3):298–305.View ArticlePubMedGoogle Scholar
 Turcan S, Rohle D, Goenka A, Walsh LA, Fang F, Yilmaz E, et al. IDH1 mutation is sufficient to establish the glioma hypermethylator phenotype. Nature. 2012; 483(7390):479–83.View ArticlePubMedPubMed CentralGoogle Scholar
 Network CGAR, et al. Comprehensive, integrative genomic analysis of diffuse lowergrade gliomas. N Engl J Med. 2015; 2015(372):2481–98.Google Scholar
 Ceccarelli M, Barthel FP, Malta TM, Sabedot TS, Salama SR, Murray BA, et al. Molecular profiling reveals biologically discrete subsets and pathways of progression in diffuse glioma. Cell. 2016; 164(3):550–63.View ArticlePubMedPubMed CentralGoogle Scholar
 Brandes U, Eriebach T. Network Analysis: Methodological Foundations. Berlin: Springer; 2005, p. 3418.View ArticleGoogle Scholar
 Lena PD, Wu G, Martelli P, Casadio R, Nardini MC. An efficient tool for molecular interaction maps overlap. BMC Bioinforma. 2013; 14(1):159.View ArticleGoogle Scholar
 Yang Q, Sze S. Path matching and graph matching in biological networks. J Comput Biol. 2007; 14(1):56–67.View ArticlePubMedGoogle Scholar
 Ramana MV, Scheinerman ER, Ullman D. Fractional isomorphism of graphs. Discrete Math. 1994; 132(1):247–65.View ArticleGoogle Scholar
 Shervashidze N, Schweitzer P, van Leeuwen EJ, Mehlhorn K, Borgwardt KM. WeisfeilerLehman Graph Kernels. J Mach Learn Res. 2011; 12:2539–61.Google Scholar
 Hamming RW. The unreasonable effectiveness of mathematics. Am Math Monthly. 1980; 87(2):81–90.View ArticleGoogle Scholar
 Butts C, Carley KM. Canonical labeling to facilitate graph comparison. Pittsburgh: Carnegie Mellon University Press; 1998.Google Scholar
 Gill R, Datta S, Datta S. A statistical framework for differential network analysis from microarrya data. BMC Bioinforma. 2010; 11(1):95.View ArticleGoogle Scholar
 Mantel N. The detection of disease clustering and a generalized regression approach. Cancer Res. 1967; 27(2):209.PubMedGoogle Scholar
 Hubert LJ. Assignment methods in combinatorial data analysis. New York: Marcel Dekker; 1987, p. 1.Google Scholar
 Ruan D, Young A, Montana G. Differential analysis of biological networks. BMC Bioinforma. 2015; 16:327.View ArticleGoogle Scholar
 Fuller TF, Ghazalpour A, Aten JE, Drake TA, Lusis AJ, Horvath S. Weighted gene coexpression network analysis strategies applied to mouse weight. Mammilian Genome. 2007; 18(6):463–72.View ArticleGoogle Scholar
 Ha MJ, Baladandayuthapani V, Do KA. DINGO: differential network analysis in genomics. Bioinformatics. 2015; 31(21):3413–20.View ArticlePubMedPubMed CentralGoogle Scholar
 Zhang B, Horvath S. A general framework for weighted gene coexpression network analysis. Stat Appl Genet Mol Biol. 2005; 4(1):1128.Google Scholar
 Allen JD, Xie Y, Chen M, Girad L, Xao GH. Comparing statistical methods for constructing large scale gene networks. PLoS ONE. 2012; 7(1):e29348.View ArticlePubMedPubMed CentralGoogle Scholar
 Deshpande R, Vandersluis B, Myers CL. Comparison of profile similarity measures for genetic interaction networks. PLoS ONE. 2013; 8(7):e68664.View ArticlePubMedPubMed CentralGoogle Scholar
 Benjamini Y, Yekutieli D. The control of false discovery rate in multiple testing under dependency. Ann Stat. 2001; 29:1165–88.View ArticleGoogle Scholar
 Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007; 8(1):118–27.View ArticlePubMedGoogle Scholar
 Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera RD, et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinforma. 2006; 7(S1):1–15.Google Scholar
 Sales G, Romualdi C. parmigene  a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics [ISMB/ECCB]. 2011; 27(13):1876–7. Available from: http://dblp.unitrier.de/db/journals/bioinformatics/bioinformatics27.html#SalesR11.View ArticleGoogle Scholar
 Carro MS, Lim WK, Alvarez MJ, Bollo RJ, Zhao X, Snyder EY, et al. The transcriptional network for mesenchymal transformation of brain tumours. Nature. 2010; 463(7279):318–25.View ArticlePubMedGoogle Scholar
 Guan X, Vengoechea J, Zheng S, Sloan AE, Chen Y, Brat DJ, et al. Molecular subtypes of glioblastoma are relevant to lower grade glioma. PLoS ONE. 2014; 9(3):e91216.View ArticlePubMedPubMed CentralGoogle Scholar
 Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics. 2005; 21(20):3940–41.View ArticlePubMedGoogle Scholar
 Mankiewicz R. The Story of Mathematics. Princeton: Princeton University Press; 2004.Google Scholar
 Girvan M, Newman ME. Community structure in social and biological networks. Proc Natl Acad Sci. 2002; 99(12):7821–6.View ArticlePubMedPubMed CentralGoogle Scholar
 Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech Theory Exp. 2008; 2008(10):P10008.View ArticleGoogle Scholar
 Rosvall M, Bergstrom CT. Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PloS ONE. 2011; 6(4):e18209.View ArticlePubMedPubMed CentralGoogle Scholar
 Reichardt J, Bornholdt S. Statistical mechanics of community detection. Phys Rev E. 2006; 74(1):016110.View ArticleGoogle Scholar
 Orman GK, Labatut V. A comparison of community detection algorithms on artificial networks. In: International Conference on Discovery Science. Berlin: Springer: 2009. p. 242–56.Google Scholar
 Mall R, Langone R, Suykens JA. Multilevel hierarchical kernel spectral clustering for reallife large scale complex networks. PloS ONE. 2014; 9(6):e99966.View ArticlePubMedPubMed CentralGoogle Scholar
 Mall R, Langone R, Suykens JA. Kernel spectral clustering for big data networks. Entropy. 2013; 15(5):1567–86.View ArticleGoogle Scholar
 Mall R, Langone R, Suykens JA. Selftuned kernel spectral clustering for large scale networks. In: Big Data, 2013 IEEE International Conference on. Santa Clara: IEEE: 2013. p. 385–93.Google Scholar
 Dittrich MT, Klau GW, Rosenwald A, Dandekar T, Müller T. Identifying functional modules in protein–protein interaction networks: an integrated exact approach. Bioinformatics. 2008; 24(13):i223–31.View ArticlePubMedPubMed CentralGoogle Scholar
 West J, Beck S, Wang X, Teschendorff AE. An integrative network algorithm identifies ageassociated differential methylation interactome hotspots targeting stemcell differentiation pathways. Sci Rep. 2013; 3:1630.View ArticlePubMedPubMed CentralGoogle Scholar
 Jiao Y, Widschwendter M, Teschendorff AE. A systemslevel integrative framework for genomewide DNA methylation and gene expression data identifies differential gene expression modules under epigenetic control. Bioinformatics. 2014; 30(16):2360–66.View ArticlePubMedGoogle Scholar
 Steinwart I, Hush D, Scovel C. A classification framework for anomaly detection. J Mach Learn Res. 2005; 6:211–32.Google Scholar
 Kumar A, NiculescuMizil A, Kavukcuoglu K, Daume III H. A binary classification framework for twostage multiple kernel learning. 2012:066428. arXiv preprint arXiv:12X00000.
 EckelPassow JE, Lachance DH, Molinaro AM, Walsh KM, Decker PA, Sicotte H, et al. Glioma groups based on 1p/19q, IDH, and TERT promoter mutations in tumors. N Engl J Med. 2015; 372(26):2499–508.View ArticlePubMedPubMed CentralGoogle Scholar
 Noushmehr H, Weisenberger DJ, Diefes K, Phillips HS, Pujara K, Berman BP, et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell. 2010; 17(5):510–22.View ArticlePubMedPubMed CentralGoogle Scholar
 Zhu D, Hunter SB, Vertino PM, Van Meir EG. Overexpression of MBD2 in glioblastoma maintains epigenetic silencing and inhibits the antiangiogenic function of the tumor suppressor gene BAI1. Cancer Res. 2011; 71(17):5859–70.View ArticlePubMedPubMed CentralGoogle Scholar
 Gleize V, Alentorn A, Connen de Kérillis L, Labussière M, Nadaradjane AA, Mundwiller E, et al. CIC inactivating mutations identify aggressive subset of 1p19q codeleted gliomas. Ann Neurol. 2015; 78(3):355–74.View ArticlePubMedGoogle Scholar
 Feng C, Zhang Y, Yin J, Li J, Abounader R, Zuo Z. Regulatory factor X1 is a new tumor suppressive transcription factor that acts via direct downregulation of CD44 in glioblastoma. NeuroOncology. 2014; 16(8):1078–85.View ArticlePubMedPubMed CentralGoogle Scholar
 Bai H, Harmancı AS, ErsonOmay EZ, Li J, Coşkun S, Simon M, et al. Integrated genomic characterization of IDH1mutant glioma malignant progression. Nat Genet. 2016; 48(1):59–66.View ArticlePubMedGoogle Scholar
 Kraskov A, Stögbauer H, Grassberger P. Estimating mutual information. Phys Rev E. 2004; 69(6):066138.View ArticleGoogle Scholar
 Sales G, Romualdi C. parmigene–a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics. 2011; 27(13):1876–7.View ArticlePubMedGoogle Scholar