Network topology-based detection of differential gene regulation and regulatory switches in cell metabolism and signaling

Background Common approaches to pathway analysis treat pathways merely as lists of genes, disregarding their topological structures, that is, ignoring the genes' interactions on which a pathway's cellular function depends. In contrast, PathWave has been developed for the analysis of high-throughput gene expression data and explicitly takes the topology of networks into account to identify both global dysregulation of and localized (switch-like) regulatory shifts within metabolic and signaling pathways. For this purpose, it applies adjusted wavelet transforms on optimized 2D grid representations of curated pathway maps.
Results Here, we present the new version of PathWave with several substantial improvements, including a new method for optimally mapping pathway networks onto compact 2D lattice grids, a more flexible and user-friendly interface, and pre-arranged 2D grid representations. These pathway representations are assembled for several species, now comprising H. sapiens, M. musculus, D. melanogaster, D. rerio, C. elegans, and E. coli. We show that PathWave is more sensitive than common approaches and apply it to RNA-seq expression data, identifying crucial metabolic pathways in lung adenocarcinoma, as well as to microarray expression data, identifying pathways involved in longevity of Drosophila.
Conclusions PathWave is a generic method for pathway analysis complementing established tools like GSEA, and the update comprises efficient new features. In contrast to the tested commonly applied approaches, which do not take network topology into account, PathWave enables identifying pathways that are either known to be involved in or very likely associated with such diverse conditions as human lung cancer or aging of D. melanogaster. The PathWave R package is freely available at http://www.ichip.de/software/pathwave.html.


Content
Pathway analysis with PathWave using the human pathways from BiGG. *up is the number of metabolic reactions up-regulated in lung adenocarcinoma; down is the number of down-regulated reactions; and no_ch is the number of reactions without notable changes. P is the Bonferroni corrected P-value for the entire pathway.

Stability of P-values and robustness of pathway rankings
The pathway P-values estimated by PathWave share a limitation common to all Monte Carlo resampling methods: the empirically determined P-values depend on which random permutations are actually drawn and may therefore change from run to run, even for the same dataset. The reason is that the number of random permutations executed (which yields an approximate P-value estimate) is usually much lower than the total number of possible permutations (which would yield the exact P-value).
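The run-to-run variation described here can be illustrated with a minimal Python sketch. It is not PathWave's implementation: the test statistic (absolute difference in group means), the toy data, and the names permutation_p_value, group_a, and group_b are assumptions chosen only for illustration. The same data are tested repeatedly, and only the random seed changes between runs.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, num_perm, seed):
    """Monte Carlo P-value for the absolute difference in group means.

    Illustrative only: a generic permutation test, not the statistic
    that PathWave computes.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(num_perm):
        rng.shuffle(pooled)  # one random relabeling of the samples
        permuted = abs(
            statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        )
        if permuted >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (num_perm + 1)


# Hypothetical toy data (not taken from the article).
group_a = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.2, 6.1]
group_b = [4.6, 5.0, 4.4, 5.3, 4.7, 4.9, 5.1, 4.5]

for num_perm in (100, 1000):
    estimates = [
        permutation_p_value(group_a, group_b, num_perm, seed)
        for seed in range(5)
    ]
    print(num_perm, [round(p, 4) for p in estimates])
```

Each printed row lists five estimates for identical input. With more permutations per run, the estimates would be expected to scatter less.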
We therefore evaluated the stability and reproducibility of the pathway P-values and the resulting pathway rankings by performing 100 identical PathWave runs for case study 2 (D. melanogaster aging; see article), using three measures:
1. The variation coefficient (or coefficient of variation) of the P-values of each pathway, defined as the ratio σ/μ of the standard deviation σ to the mean μ. Multiplied by 100, this ratio expresses the standard deviation as a percentage of the mean and indicates how strongly the P-values vary over 100 identical runs.
2. The "recall" of pathways (i.e. the number of times a pathway is declared significant) as a function of its mean P-value, indicating how frequently pathways with a P-value close to the threshold are missed or wrongly declared significant.
3. The average pairwise Spearman correlation between the 100 pathway rankings obtained from the 100 identical PathWave runs, indicating how stable pathway rankings are across multiple identical runs.
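The three measures can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the function names, the toy P-value matrix (rows are runs, columns are pathways), the use of the sample standard deviation, and the strict comparison P < 0.05 are all choices made here.

```python
import itertools
import statistics


def coefficient_of_variation(values):
    """sigma / mu, here with the sample standard deviation (an assumption)."""
    return statistics.stdev(values) / statistics.fmean(values)


def recall(values, threshold=0.05):
    """Number of runs in which the pathway is declared significant."""
    return sum(1 for p in values if p < threshold)


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical P-values: 3 runs (rows) x 4 pathways (columns).
runs = [
    [0.001, 0.020, 0.048, 0.300],
    [0.001, 0.022, 0.052, 0.280],
    [0.002, 0.055, 0.049, 0.310],
]

per_pathway = list(zip(*runs))  # one tuple of P-values per pathway
cvs = [coefficient_of_variation(p) for p in per_pathway]
recalls = [recall(p) for p in per_pathway]
pairwise = [spearman(a, b) for a, b in itertools.combinations(runs, 2)]

print("CV in percent:", [round(100 * cv, 1) for cv in cvs])
print("recall:", recalls)
print("mean pairwise Spearman:", round(statistics.fmean(pairwise), 4))
print("median pairwise Spearman:", round(statistics.median(pairwise), 4))
```

In the toy matrix, the third run swaps the order of the second and third pathway, which lowers its Spearman correlation with the other two runs to 0.8.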
To evaluate how these measures depend on the number of Monte Carlo samplings (parameter numperm) used for P-value estimation, we executed PathWave 100 times (identical runs) for each of numperm=100, numperm=1000, and numperm=10000.
The results show that for 10,000 random samplings (used in the article), the P-values of single pathways (Figure S6) are highly stable (standard deviation < 5% of mean for most pathways), as are the overall pathway rankings (Figure S8; median pairwise Spearman correlation of 0.9996) over 100 runs. Moreover, the results indicate an excellent recall (Figure S7; right). As expected, pathways with an average P-value very close to 0.05 (here ca. ±2%) are declared significant in roughly 50% of the runs. Pathways with average P<0.045 have a recall of >95% (100% for average P<0.04).
For 1,000 random samplings (used as a default in the PathWave R package), the results indicate a slightly lower but still sufficient stability. The P-values of most pathways have standard deviations of <20% of their mean (Figure S6), and the obtained pathway rankings are nearly as stable as for 10,000 random samplings (Figure S8; median pairwise Spearman correlation of 0.9970). Although pathways with an average P-value close to 0.05 are recalled with slightly more difficulty (Figure S7; middle), their overall recall is still good (>95% for average P<0.04; 100% for average P<0.03).
For 100 random samplings, in contrast, both the P-values (Figures S6 and S7) and the pathway rankings ( Figure S8) are not sufficiently stable.
Overall, these analyses suggest that 1,000 random samplings (default in the R package) are sufficient and offer reasonably high confidence in the obtained results. The 10,000 random samplings used in the article allow for even higher confidence as well as excellent stability.
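As a rough plausibility check on how stability scales with the number of samplings, the following sketch uses a simplifying assumption that is not part of the analysis itself: a Monte Carlo P-value is treated as a plain binomial proportion, so its coefficient of variation is sqrt((1 - p) / (p · numperm)). Any multiple-testing correction and all details of PathWave's statistic are ignored, and the function name binomial_cv is hypothetical.

```python
import math


def binomial_cv(p, num_perm):
    """Coefficient of variation of a binomial proportion estimate.

    Assumes the number of permutations at least as extreme as the
    observed statistic is Binomial(num_perm, p); an idealization.
    """
    return math.sqrt((1.0 - p) / (p * num_perm))


# Expected relative scatter for a true P-value at the 0.05 threshold.
for num_perm in (100, 1000, 10000):
    print(num_perm, round(100 * binomial_cv(0.05, num_perm), 1))
# prints:
# 100 43.6
# 1000 13.8
# 10000 4.4
```

Under this idealization, the relative scatter shrinks with the square root of the number of samplings, so a tenfold increase in numperm reduces it by a factor of about 3.2.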

Figure S6: Stability of pathway P-values as measured by their variation coefficients
The variation coefficients were determined for each single pathway over 100 identical PathWave runs. Boxplots depict the distribution of variation coefficients of the pathways for 100 (left), 1,000 (middle), and 10,000 (right) random samplings (numperm).

Figure S7: Recall of pathway significance as a function of the mean P-value
The number of times a pathway is declared significant (at a P-value threshold of 0.05) in 100 identical PathWave runs is drawn as a function of the pathway's mean P-value for 100 (left), 1,000 (middle), and 10,000 (right) random samplings (numperm).

Figure S8: Robustness of pathway rankings
The robustness of pathway rankings over 100 identical PathWave runs is shown as boxplots of the rankings' pairwise Spearman rank correlations for 100 (left), 1,000 (middle), and 10,000 (right) random samplings (numperm).