Animal subjects and surgeries
Male Sprague-Dawley (SD) rats (275–300 g) were purchased from Charles River Laboratories (Wilmington, MA). All experimental procedures were approved by the local Institutional Animal Care and Use Committee and were conducted according to the recommendations provided in the Guide for the Care and Use of Laboratory Animals. Protocols were designed to minimize pain and discomfort during the injury procedure and recovery period. Our mCCI and mFPI injury models were described in [3]. After injury preparation, animals were placed in a warm chamber and allowed to completely recover from anesthesia, and then returned to their home cages.
Gene expression microarray
Using the mTBI animals (n = 4/group), ipsilateral cortical issues underlying the injury site was quickly dissected at 24 h post-injury. Total RNA from the cortical tissue was isolated using the mirVana miRNA Isolation Kit (Invitrogen, Carlsbad, CA), following the manufacturers’ recommended protocol, and amplified using the Illumina TotalPrep RNA Amplification Kit (Ambion, Austin, TX). RNA amplification and microarray hybridization were carried out by The University of Texas Health Science Center Houston Microarray Core Laboratory (Houston, TX). Briefly, first-strand complementary DNA (cDNA) was generated from total RNA by reverse transcription. Second strand cDNA synthesis was initiated by the addition of RNase H/DNA polymerase mix. The complementary RNA (cRNA) was amplified by the in vitro transcription reaction (IVT). cRNA (750 ng) was loaded onto RatRefSeq-12 Illumina Sentrix Beadchip Arrays (Illumina, Inc., San Diego, CA), hybridized overnight, washed, and incubated with streptavidin-Cy3 to detect hybridized biotin-labeled cRNA probes. Arrays were dried and scanned with a BeadArray Reader (Illumina). It was noted that most raw gene expression values were not normally distributed but highly skewed. Therefore, the Box-Cox transformation [14] was used to normalize the distribution for each gene expression values. The Kolmogorov-Smirnov test was used to test for normality of the transformed distribution at a 5% significance level.
Constructing a weighted network from protein interaction information and gene expression data
Experimentally detected protein-protein interactions (PPIs) were downloaded from BioGRID [15], DIP [16], and HPRD [17] databases. Since there are a limited number of experiments detecting PPIs in the rat genome, we also obtained predicted rat PPIs based on onthology, where orthologous interactions were generated by mapping experimentally detected PPIs in human or mouse genomes to pairs of orthologs in rat genome, if such orthologs are available in HomoloGene database [18]. Each edge of the protein interaction network was further weighted by overlaying gene expression information. Specifically, we calculated the absolute value of the Pearson correlation coefficient abs(cor(xi,xj)) as the edge weight, where xi and xj represented the normalized gene expression vectors for genes i and j, respectively. Therefore, each edge of the protein interaction network was weighted by the level of co-expression between its two corresponding genes, with the weights between 0 and 1.
Identifying significant subnetworks
We performed functional network analysis by following the protocol described in [19]. We defined the subnetwork scoring function S as a weighted sum of class relevance R and modularity M.
$$ \mathrm{S}=\beta \mathrm{M}+\mathrm{R} $$
(1)
Here M describes the subnetwork connectivity, and R is a measure of the discriminatory power of the subnetwork genes to differentiate classes. In addition, the parameter β allows us to trade off the effects of the gene expression information with the network modularity on the subnetwork score. To simplify the scoring algorithm, we set β = 1, assuming equal weights of network modularity and class relevance on calculating the network score.
To get a measure of how strongly the genes within a subnetwork are connected, the modularity M was calculated as the mean clustering coefficients Ci of the genes in a subnetwork,
$$ \mathrm{M}=\frac{\sum_{\mathrm{i}}{\mathrm{C}}_{\mathrm{i}}}{\mathrm{n}}\left[\mathrm{for}\ \mathrm{n}>=3;0\ \mathrm{otherwise}\right] $$
(2)
(3)| where Ci was defined as in Dong and Horvath [20, 21]. Ci is the clustering coefficient for node i, where nodes l and m are node neighbors of i, and w represents the weight of the edge between nodes in the subnetwork:
$$ {C}_i=\frac{\sum_{l\ne i}{\sum}_{m\ne i,m\ne l}{w}_{il}{w}_{lm}{w}_{mi}}{{\left({\sum}_{l\ne i}{w}_{il}\right)}^2-{\sum}_{l\ne i}{w}_{il}^2} $$
Intuitively, Ci is a ratio of the weighted triangles that can be made with node i and its neighbors over the sum of the weighted possible triangles extending off of node i. For a general weighted network with edge weights between 0 and 1, the clustering coefficient of node i, Ci also lies between 0 and 1. Ci equals 1 if and only if all neighbors of node i are connected to each other.
The class relevance R is a measure of the ability for a subnetwork to distinguish two classes. To calculate this, the expression values of each gene i in sample j were first normalized to z-scores, zij, which had a mean of 0 and standard deviation of 1 for each gene over all samples. The individual z-scores of each member gene in the subnetwork K for sample j were averaged into a summarized expression \( \mathrm{Vkj}=\frac{\sum_{i=1}^n{z}_{ij}}{n} \), where n is the number of genes in the subnetwork K. A t-test was then used to compare the summarized expression values of samples between two classes and the resulting t-value was denoted as the class relevance R. In this study, the two classes refer to mCCI and mFPI injury models.
A framework demonstrating the steps for finding significant subnetworks is described in Fig. 1. We applied the greedy search algorithm in searching subnetworks [22]. To identify significant subnetworks that discriminate mCCI and mFPI, candidate subnetworks were scored comparing two classes. First, individual differentially expressed genes were used as seeds for growing potential subnetworks. For each seed, two neighboring genes were iteratively added to the seed and subnetwork scores were recalculated. The pair of neighboring genes that yielded the biggest improvement in subnetwork score was added to the seed to form an initial subnetwork of three genes (i.e., an initial triangular subnetwork). Single neighbor nodes were then added iteratively until the subnetwork score could no longer be improved. It is likely that genes are shared across different subnetworks, resulting in potentially redundant subnetworks. The redundant subnetworks were removed by the following steps:
-
1.
Obtaining the scores of all subnetworks and sorting them in a descending order of scores.
-
2.
Iterating through the list of subnetworks and checking for redundancy.
-
a)
if a subnetwork was contained within a higher-scoring subnetwork, we discarded the lower-scoring subnetwork;
-
b)
if a subnetwork was a super set of a higher-scoring subnetwork, we discarded the super set;
-
c)
if there was an overlap in genes between a lower-scoring and higher-scoring subnetwork:
-
i.
if the overlap ≥50% (number of overlapping genes/total number of unique genes), we discarded lower-scoring subnetwork;
-
ii.
if the overlap < 50%, we kept both subnetworks (document them for manual inspection).
To select the significant subnetworks, we calculated the empirical p-values of the identified subnetworks. We first generated the null distribution by permuting the expression vector of genes in the full network. This permutation test dissociated the relationship between protein interaction and gene expression information. We then ran the same subnetwork identification procedure on the permuted data. This process was repeated 100 times and the scores of the resulted random subnetworks were recorded for each permutation. The empirical adjusted p-value for the real subnetwork score was calculated as the fraction of the random subnetworks having a higher score than that real subnetwork [23]. Only the subnetworks with empirical adjusted p-values smaller than 0.05 were selected for further evaluation and analysis.
Gene grouping strategies
To evaluate our approach for identifying biomarkers that distinguish different mTBI classes (mCCI vs. mFPI), we compared our approach with other gene grouping strategies to be used in classification. Two other gene grouping strategies were included in this study: 1) Pathway based gene sets using the list of canonical pathways extracted from the Molecular Signature Database (MSigDB) [24]. 2) Functionally related gene sets based on Gene Ontology (GO) annotations. GO gene sets were determined by retrieving genes for all GO terms that contained less than 50 genes, in order to eliminate the GO terms that were too general in function annotation. The resulted two groups of gene sets were evaluated for their discriminatory potential in classifying TBI classes. For each gene grouping strategy, expression values for each gene set were converted into summarized expression scores as described previously. These expression scores were used to test differential expression between mCCI and mFPI classes and gene sets were ranked according to their discriminatory powers. After redundant gene sets were removed, the resulting gene sets were used as features in training algorithms to build models for predicting TBI classes.