Meta-analysis of sex differences in gene expression in schizophrenia

Schizophrenia is a severe psychiatric disorder which influences around 1 % of the worldwide population. Differences between male and female patients with schizophrenia have been noted. There is an earlier age of onset in males compared with females with this diagnosis, and in addition, there are differences in symptom profiles between the sexes. The underlying molecular mechanism of sex difference remains unclear. Here we present a comprehensive analysis to reveal the sex differences in gene expression in schizophrenia with stringent statistics criteria. We compiled a data set consisting of 89 male controls, 90 male schizophrenia patients, 35 female controls and 32 female schizophrenia patients from six independent studies of the prefrontal cortex (PFC) in postmortem brain. When we tested for a sex by diagnosis interaction on gene expression, 23 genes were up-regulated and 23 genes were down-regulated in the male group (q-value < 0.05), several genes are related to energy metabolism, while 4 genes are located on sex chromosome. No genes were statistically significant in the female group when multiple testing correction were conducted (q-value <0.05), most likely due to the small sample size. Our protocol and results from the male group provide a starting point for identifying the underlying different mechanism between male and female schizophrenia patients. Electronic supplementary material The online version of this article (doi:10.1186/s12918-015-0250-3) contains supplementary material, which is available to authorized users.


Introduction
Schizophrenia is a severe psychiatric disorder with a population frequency of approximately 1 % [30]. Schizophrenia is a syndrome characterized by positive symptoms such as delusions, hallucinations, disorganized speech and grossly disorganized or catatonic behavior; negative symptoms such as affective flatterning, alogia, or avolition [26,30]. The etiology and pathophysiological mechanisms of the disorder are not well understood. Research to date indicates that schizophrenia is a multifactorial neurodevelopmental impairment of the brain that could be attributed to both genetic and environmental factors [2,7,21,23,28]. Gene expression is readout of both the genetic and the environmental factors that contribute to the pathophysiology of schizophrenia. Analysis of human postmortem brain is a powerful approach for the identification of risk factors for schizophrenia, because unlike studies of living patients, detailed molecular investigations can be performed directly in the critical brain regions of interest.
The pathophysiology of schizophrenia is likely to be different between males and females. Sex differences have been noted in several epidemiological analyses. For example, several studies indicate that men have a slightly higher incidence of schizophrenia compared with women. In addition, males have an earlier age of onset of schizophrenia, between 18-25 years of age, compared with the female age of onset which is 25-35 years [20]. The symptoms exhibited by male and female patients with schizophrenia also differ. Males tend to have a greater vulnerability to negative symptoms and traits of disorganization, while females more frequently exhibit depressive symptoms [20]. These findings suggest that different underlying mechanisms of schizophrenia occur in males and females. Therefore, we have investigated sex differences in schizophrenia to gain a better understanding of the pathophysiological mechanisms underpinning this disorder.
To identify the biological factors involved in the pathogenesis of schizophrenia and how they are differentially influenced in the sexes, we have investigated microarray expression data from the prefrontal cortex (PFC) in postmortem brain. The PFC region has been strongly associated with deficits of executive function and other cognitive symptoms that occur in patients with schizophrenia. Gene expression within the PFC has been studied extensively using the microarray approach [4-6, 8, 9, 11, 13, 14, 16, 19, 24]. However, statistical analyses of gene expression data from individual small cohorts have lacked sufficient statistical power to avoid conflicting data in these different studies [17,22]. Metaanalysis is a strategy by which these problems could be addressed, because the data from multiple studies can be combined, thus increasing the statistical power available. In this study, we have collected gene expression data generated from post-mortem cohorts of schizophrenia cases and psychiatrically healthy comparison groups that are included in publicly available databases. We have tested for sex differences in PFC gene expression in schizophrenia, using the meta-analysis paradigm.

Methods and materials
Public microarray datasets of postmortem gene expression in schizophrenia We searched the public database and literature on the study conducted on PFC region of postmortem brains and decided to use Mistry's merged expression dataset [17] for further analysis because this combined cohort contains the largest number of samples that could be accessed from available resources. In Mistry's study, the raw image data of 306 postmortem brain samples from seven different datasets were first pooled together, Robust Multi-array Average (RMA) normalization procedure was then applied on these pooled samples to obtain normalized expression value of each probe set. Out of 306 samples, 246 are available to the public. The RMA normalized expression data (http://www.chibi.ubc.ca/wp-content/uploads/2013/ 02/combined.data.txt) and corresponding clinical data (http://www.chibi.ubc.ca/wp-content/uploads/2013/02/ combined.design.txt) of these 246 samples are available for download on their website and will serve as a starting point in this study. The source of six studies available to the public in Mistry's dataset is summarized in Table 1. ComBat [12] batch effect adjustment was carried out in R environment. We used ComBat() function included in the "sva" package downloaded from BioConductor website (https://www.bioconductor.org/packages/release/bioc/html/ sva.html) to perform batch effect correction on the original dataset. Each study is treated as a batch and default parameter setup is used in running the ComBat function.

Differential expression analysis of each probe set
Expression values of each probe set were modeled using a fixed effect linear model approach, where disease status and imbalanced covariates between two groups are treated as fixed effects to be estimated from data. A model selection procedure was also employed for each probe set to address the confounding effect of imbalanced covariates. The details of the procedure will be described in next section. For each probe set, the t-statistic for the disease effect was then extracted from the selected model. P-values were computed using two-sided t-test. The resulting P-values were further converted to q-values using the qvalue package in R [25] which were defined as the minimum positive False Discovery Rate (q-value(t): Pr(H = 0|T > =t) in Bayesian interpretation over a nested rejection region containing observed statistics t) for multiple testing correction.

Covariate adjustment
The observed covariate imbalance between two groups was analyzed to avoid generating misleading result. However modelling covariates for every gene could unnecessarily diminish statistical power if the covariate does not influence the expression of the gene. In our study, we consider only modelling imbalanced covariates for each probe set. We first obtained a probe set (gene) list where the covariate influence on the gene expression is determined with confidence. This refined gene list was generated using a previous method in a postmortem brain gene expression study, in which a correlation analysis was used to evaluate covariates such as age, post-mortem interval, brain PH etc. that influenced the expression of specific genes across multiple post-mortem normal brain datasets [18]. We extracted genes with meta-Q ≤ 0.01 to indicate that the gene was significantly influenced by a particular covariate, and separate it into two lists: positively-correlated and negatively-correlated. Notice that one probe set could be mapped to multiple genes, we excluded those probe sets appearing in the two lists. For each predefined covariate influenced probe set, we first modelled the expression value with a linear model including that covariate. If the direction of the fitted covariate estimates was inconsistent with the pre-refined list, we exclude this covariate and re-estimate a reduced model for this probe set. For the rest of the probe sets, no covariate adjustment was performed.

Selection of sex-specific differentially expressed genes
To identify genes that had a sex by diagnosis interaction with schizophrenia, we used a strategy similar to [3]. Each individual in our dataset is assigned to one of four subgroups: Schizophrenia Male, Control Male, Schizophrenia Female and Control Female. Individuals could also be combined into two groups based on their diagnosis: Schizophrenia and Control group, or based on their sex: Male and Female group. Differential expression analysis was first performed using the procedure described in section 2.2 within each sex. After the initial probe set list was obtained from each sex, we further eliminated those probe sets that are associated with schizophrenia regardless of sex when all of the following criteria were met: (a) the difference between Schizophrenia and Control groups was statistically significant after multiple test correction (q-value < 0.05); (b) the fold change of Schizophrenia Female vs. Control Females, and Schizophrenia Male vs Control Males should be in the same direction i.e. both higher or both lower; (c) the expression difference was not significant between Male and Female groups (defined as p > 0.05 between Female and Male group). After removal of these probe sets from the initial probe set list, we sorted the remaining probe sets within each sex by q-value and report the top ranked probe sets (q-value < 0.05) as sexspecific differentially expressed genes.

Function enrichment of differentially expressed genes
All differentially expressed genes, along with their Affymetrix ID numbers were imported into EASE (Expression Analysis Systematic Explorer) in DAVID (Database for Annotation, Visualization and Integrated Discovery), and were used to identify functionally significant gene classes (https://david.ncifcrf.gov/) [10]. This webserver uses statistical methods to map and identify functional gene categories (for example, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) or BioCarta), which are enriched in the significant gene list compared with their presence on the array.

Batch and covariate adjustment
A total of 22277 probe sets is analyzed in this study. Before further analysis, by using hierarchical clustering analysis and Principal Component Analysis (PCA), we found it was necessary to correct for "batch effect" as samples from the same study were clustered together. ComBat [12] was used to correct this technical bias by treating each study as a batch. After ComBat adjustment, the hierarchical clustering and PCA results show that no significant clustering remained in the dataset and the adjusted expression data was suitable for further analysis (Additional file 1). Clinical variables associated with each patient sample were examined to determine possible confounding variables. The age and postmortem interval (PMI) is well matched between schizophrenia and control groups while brain pH shows a significant difference between two groups ( Table 2). Within each sex group, we observe that the brain PH is significantly different between schizophrenia males and control males, on the other hand, all covariates are well balanced in female group ( Table 2). Age at death was significantly different between male and female group (Additional file 2). In the predefined brain PH list, 2413 probe sets' expressions were positively correlated with the covariate and 893 were negatively correlated; in the predefined age related list, the number is 1907 and 3028 respectively. The imbalanced covariates and the proportion of probe sets subject to covariate adjustment in each differential analysis are summarized in Table 3.

Genes with altered PFC expression in schizophrenia
To validate our analytical approach used in this study, we performed differential analysis between Schizophrenia and Control group and compared the derived gene list with two published results in which similar meta-analysis were performed [17,22]. We identified 466 probe sets (representing 427 unique genes) that were significantly downregulated in the schizophrenia cases relative to the controls and 312 probe sets (representing 261 unique genes) significantly up-regulated in schizophrenia with q-value < 0.05. Our results show that a large number of overlapped probe sets were observed between our gene list and the other two studies (Fig. 1). All overlapped probe sets showed the same direction of fold difference between the schizophrenia cases and controls. In comparison with result of Mistry et al. [27], 86 out of 125 probe sets (68.8 %) were also identified by our approach. In comparison with result of Santiego et al., 98 out of 160 probe sets (61.3 %) overlapped with our result [22]. Our method identified a similar proportion of the probe sets from both studies.
We also examined whether the identified 778 probe sets are associated with schizophrenia in case control studies of genetic polymorphisms. We compared our gene list with those deposited in SZGene (www.SZgene.org) database which contains the most comprehensive review of schizophrenia association studies [31]. 80 probe sets representing 68 unique genes are found to be genetically associated with schizophrenia. Genes previously showing strong genetic evidence implicated in schizophrenia identified in this study include: regulator of G-protein signaling 4 (RGS4) [32]; discoidin domain receptor family, member 1(DDR1) [33]; and the selenium binding protein 1(SELENBP1) [34]. The full result is included in Additional file 3.

Sex differences in PFC gene expression in schizophrenia
In the male group, we first identified 138 differentially expressed probe sets with a q-value <0.05. We then removed 80 probe sets which shows differential expression regardless of sex based on the filter defined in the Method part. 50 probe sets representing 46 unique genes were identified as specifically different in male schizophrenia patients relative to male controls: 23 probe sets had lower expression (Table 4) and 27 probe sets have higher expression (Table 5) in males with schizophrenia. In the female group, we were not able to identify any differentially expressed probe sets with q-value <0.05 after multiple testing correction.

Function annotation of male-specific differentially expressed genes
The functions of 46 genes associated with schizophrenia in male patients were manually inspected. Of the genes with significantly lower expression in males with schizophrenia, several were related to energy metabolism (ATP5B, ATP5A1, MRPL23, AFG3L2, ABCG2). 4 genes (BEX1, UBL4A, CD99 and MID1) located on the sex chromosome are identified. We also detected differential expression of 4 genes in the males with schizophrenia that were previously identified by Mistry et al. [17] in which sex was not considered. These were Rho-Related BTB Domain-Containing Protein 3 (RHOBTB3), Bobby Sox homolog (BBX), H3 Histone, Family 3B (H3F3B) and pleckstrin homology domain containing, family B (evectins) member 2 (PLEKHB2). Finally, using DAVID webserver, we performed an enrichment analysis to systematically identify over-representation of biological processes or pathways that are altered in the PFC in male schizophrenia patients. After correction for multiple comparisons, we were unable to identify any significant biological process (False Discovery Rate <0.05) in the GO term Biological Process database. We achieved similarly negative results using the KEGG pathway database.

Discussion
We have reported the gene expression differences that show a sex by diagnosis interaction in the PFC in schizophrenia. To our knowledge, this is the first study using a meta-analytical approach to identify sex differences in this brain region in patients with schizophrenia. There are limited data on sex differences in schizophrenia at a molecular level [15,29], although evidence from epidemiological and animal studies indicates that sex differences exist in this disorder [20,27,30]. Individual post-mortem gene expression studies have low statistical power to identify gene expression differences in schizophrenia. This is, most often due to the small sample sizes and moderate gene expression differences between the diagnostic groups. Meta-analysis, on the other hand, addresses this problem and increases statistical power by combining samples from different subject cohorts. The results obtained from our meta-analysis are robust at the statistical level [17]. Our findings open a new window to understand the different pathophysiological mechanisms that lead to schizophrenia in males and females.
In the differential analyses between schizophrenia and control group, we identified the most number of genes most of which overlapped with the results of two   (Fig. 1). To find the reason for the difference, we run the same procedure as in Mistry's for our dataset. We find that the main difference comes from different treatment of "batch effect". In Mistry's study, they treated each experiment date and study as batch variable. A total of 50 batch date and 6 study was modelled in their linear model framework. Introduction of too many predictor variables will decrease the degrees  We only include a covariate in the model when its influence on the gene is confirmed with confidence; while in Mistry's study, they assumed that the covariate influenced every gene resulting in unnecessary inclusion of unrelated variables in the model. Therefore the approach taken in this study has more detection power and leads to discovery of more genes. Our analyses identified 46 genes that were differentially expressed specifically in male patients with schizophrenia. This finding of 50 probe sets is much larger than the expected number of false positives according to our selection procedure. The expected number of false positives was calculated to be 6.9 (138*0.05 = 6.9). Five genes were related to energy metabolism (ATP5B, ATP5A1, MRPL23, AFG3L2, ABCG2). Genes from this function category are consistently implicated in studies of schizophrenia. Another gene that had altered expression in the male schizophrenia group encodes γ-aminobutyric acid receptor-associated protein-like 1 (GABARAPL1) is an early estrogen-induced gene that when overexpressed, interacts with GABA-A or κ-opioid receptors, and plays a role in cell proliferation and cellular metabolic processes [1]. Function enrichment analysis generates negative results for these genes, as indicated in other microarray studies of the PFC in schizophrenia [17]. These results suggest that a diverse number of molecular functions are disrupted in males with schizophrenia.
No genes could be identified in female group after multiple testing correction (q-value < 0.05). To determine if this is due to a much smaller sample size than in male group, we randomly picked the same number of control and schizophrenia subjects from the male groups ten times and run differential analysis on these samples. The procedure was repeated 100 times. No significant difference could be detected in the expression level of any gene in the phenotype groups (data not shown). We then gradually increase the number of samples until genes could be detected with q-value < 0.05 in each run. We started to identify differentially expressed genes when there are 60 controls and 60 schizophrenia in the sample pool. This analysis showed that increasing the number of samples would improve the likelihood of identifying schizophrenia-associated genes in the females. Very few studies have been conducted of sex differences in gene expression in the PFC in schizophrenia. Vawter et.al. [29] reported that three genes (MDH1, HINT1 and SERPINI1) had decreased expression in PFC region of 13 male schizophrenia patients compared with 11 male controls and no expression difference was observed in comparing 9 female schizophrenia patients with 10 female controls using quantitative PCR. We then extracted the corresponding probe sets from our dataset and summarized the result in Table 6. We observed significantly lower expression levels of all three genes in the schizophrenia group compared with controls ( Table 6, Column 3 and 4). The expression difference of these 3 genes in was also tested in males and females separately ( Table 6, Column 5 to 8). Our analysis suggests that all three genes might be altered by schizophrenia and are not related to sex difference. For MDH1, the expression is significantly decreased in the schizophrenia group of both sexes suggesting that this gene might be down-regulated in schizophrenia regardless of sex. HINT1 and SERPINI1 do not show differential expression between the male and female groups ( Table 6, Column 9). Our study has a larger sample size than that of Vawter et al. thus we have greater statistical power to detect small effects, and we would argue that these genes are associated with schizophrenia but are not differentially expressed between the sexes.