The pathway Crosstalk perturbation network model
In this work, the pathway crosstalk perturbation network (PXPN) is proposed as a model for integrating high-throughput perturbation biological data to gain insights into the changes in communication between functional biological processes. Figure 1 illustrates a schematic representation of the elements in the model, while the formal definitions of the elements in the PXPN model are provided in Additional file 1. The model is agnostic to the type of high-throughput data, the pathway description, and the statistical measure or algorithm used to define enrichment.
Basically, the PXPN model consists of four steps: 1) identification of perturbed pathways between two physiological states, 2) identification of crosstalk among the perturbed pathways, 3) identification of perturbations in the crosstalk regions between perturbed pathways, and 4) Network integration. A pseudocode representation of this model is available in Additional file 2. The scripts used in the current study are available in our Github repository (https://www.github.com/hurlab/pxpn_neuropathy).
Step 1 involves taking i) an expression dataset with information on two physiological conditions and ii) a list of pathways, defined by a single set of inclusion criteria and curatorial rules, such as those obtained from the same database. These are used as input for an enrichment function to obtain a list of pathways that are considered to be “enriched,” which in this context indicates a perturbation in pathway activity between the physiological states.
Step 2 involves looking for crosstalk between the pathways that were identified as perturbed in the previous step. Crosstalk between pathways is found if the pathways share genes, or, in other words, if the lists of genes for two pathways overlap (see definition in Additional file 1). The intersection between these two lists represents the genes that belong to the crosstalk region (or regions) between these pathways. Importantly, our research focused only on the crosstalk regions of pathways that were identified as perturbed in step 1.
Step 3 involves taking the list of crosstalk regions previously identified and using the expression data and the same enrichment function on it. Doing this allows identifying which crosstalk regions are perturbed themselves. Perturbation in the crosstalk region between two pathways indicates a change of activity in the molecules that are shared between pathways observed between the two physiological conditions, which in turn indicates a perturbation in the communication between pathways, as the expression of genes that connect the two pathways is being collectively perturbed.
Step 4 involves integrating the outputs of steps 1 and 3 into a network model. This is done by representing the perturbed pathways from step 1 as nodes in a graph, then establishing undirected links between pairs of pathways if the crosstalk region between them was identified as perturbed in step 3. The resulting undirected graph is referred to as a Pathway Crosstalk Perturbation Network, which represents the pathways that are perturbed between two biological states, along with the perturbations found in their crosstalk regions. This network model can be further analyzed from a graph-theoretic perspective.
The present research focused on the changes in pathway perturbation and communication through the analysis of changes in topology, modular structure, and connectivity, in PXPNs associated to physiological transitions. Given two phenotypes, such that one can give way to the other sequentially, the transition from the first to the second phenotype may involve the perturbation of a set of biological functions, which can be modeled using a PXPN. The progression from a physiologically healthy state to a pathological state of disease is an important biomedical case. This pathological state may, in turn, by using pharmacological agents, advance to a state of partially restored functionality. Hypothetically, a “perfect” drug could induce a final transition from the pharmacological state of partial functionality back to a healthy state indiscernible from the original physiological state. Each of these transitions can be modeled as three different PXPNs that represent the perturbations associated with each transition. As a case study, this model was implemented with data from a study of the effects of pioglitazone on murine T2DM-associated neuropathy.
RNA-Seq data
RNA-Seq raw data were obtained from a previous study on pioglitazone’s effects on diabetic complications [12] using leptin receptor-deficient db/db mice, a model of T2DM. In brief, male C57BLKS (BKS) db/+ and db/db mice (BKS.Cg-m+/+Leprdb/J) were purchased from the Jackson Laboratory (Bar Harbor, ME). Mice were fed with or without 15 mg/kg pioglitazone (112.5 mg pioglitazone/kg chow for a dose of 15 mg/kg to the mouse) for 11 weeks starting from 5 weeks of age. Pioglitazone treatment normalized renal function and improved small nerve functions but did not improve large fiber functions. Four complication-prone tissues—sciatic nerve (SCN), dorsal root ganglia (DRG), glomeruli, and kidney cortex—were collected at 16 weeks of age and examined for their genome-wide gene expression profiles using RNA-Seq (HiSeq 2000 paired-end read length of 100 bases). The current study focused on the three groups of SCN data, including db/+ (nondiabetic denoted as “healthy” group), db/db (diabetic denoted as “disease” group), and db/db + Pio (diabetic with pioglitazone treatment denoted as “treatment” group). There were n = 6 samples in each group.
The raw sequencing reads were first cleaned by removing reads containing the adapter or poly-N and removing low-quality (quality score < 30) reads using Trimmomatic [14]. FastQC version 0.10.1 [15] was used to assess the quality of raw reads. Clean reads were mapped to the mouse reference genome GRCm38 (mm10) using Hisat2 [16]. The mapping summaries—such as the percentage of reads that were uniquely mapped, multiple mapped, or unmapped—were then collected from the log files of Hisat2 runs. FeatureCounts [17] was used to count reads mapped to individual genes. Only uniquely mapped reads were used in the counting step. Then, the counting metrics were collected from the summary file of each FeatureCounts run. Genes were omitted with zero expression across all samples for the correlation and differential expression analysis. Fragments per kilobase of exon per million mapped reads (FPKM) as a measurement of transcript expression were calculated using an in-house script.
Pathway enrichment algorithm
The Reactome [18] collection of pathways was used in this study. We used the complete set of Reactome murine pathways available through the Graphite R/Bioconductor package [19]. The generally applicable gene set enrichment (GAGE) [20], a cutoff-free enrichment algorithm, was used to identify significantly enriched pathways that were perturbed by diabetes or treatment. The algorithm was run considering undirected perturbations, with an enrichment significance threshold set to q-value < 0.05.
Network analysis
Calculations for topological parameters—degree, clustering coefficient (CC), network density, average path length, and number of connected components (islands in the network)—were carried out using Igraph [21] for R, NetworkX [22] for Python, and Cytoscape 3.3.0 [23]. Additionally, communities (subsets of nodes with high intraconnectivity and low outbound connections) were detected using the Infomap algorithm [24], as implemented in the Igraph package.
Implementation for the study of physiological transitions in the murine diabetic neuropathy model
For this study of murine diabetic neuropathy, the previously described RNA-Seq expression dataset and pathways from the Reactome database were the inputs of the model. The differences between these groups represent the transitions observed in a patient. First, the patient transitions from a physiologically functional state to a pathological state (health to disease, denoted as HTD). Given therapy, the patient transitions from the pathological state to a pharmacologically modulated state (disease to treatment, denoted as DTT). Finally, with successful therapy the patient transitions back to the physiological state (treatment to health, denoted as TTH). Three networks, each representing one of these physiological transitions, were constructed. It is proposed that changes in pathway connectivity in different transitions indicate changes in the overall impact of a particular pathway activity in the observable phenotype.
Null model
To evaluate the significance of the topological parameters of these three PXPNs, an ensemble of 5000 networks was generated for each transition using a null model by randomly rewiring the edges, with a rewiring probability proportional to the size of the intersection between two pathways (measured as the Jaccard index). For each network, each topological parameter was compared against the null model using a Z-test. This model allowed assessing whether particular topological properties of the obtained network differed from a randomly generated network containing the same number of nodes and edges (not all edges are possible, as not all pathways crosstalk with each other).
A second null model, for evaluating the overall capacity of the method of obtaining non-trivial network structures from gene expression measurements, was employed. For each of the three comparisons (HTD, DTT, and TTH), an ensemble of 1500 random expression datasets was generated by shuffling the gene labels of the original RNA-seq data. These data were used to generate PXPNs using the established pipeline and compared against those of the comparisons.