Research article | Open | Published:
New insights about host response to smallpox using microarray data
BMC Systems Biologyvolume 1, Article number: 38 (2007)
Smallpox is a lethal disease that was endemic in many parts of the world until eradicated by massive immunization. Due to its lethality, there are serious concerns about its use as a bioweapon. Here we analyze publicly available microarray data to further understand survival of smallpox infected macaques, using systems biology approaches. Our goal is to improve the knowledge about the progression of this disease.
We used KEGG pathways annotations to define groups of genes (or modules), and subsequently compared them to macaque survival times. This technique provided additional insights about the host response to this disease, such as increased expression of the cytokines and ECM receptors in the individuals with higher survival times. These results could indicate that these gene groups could influence an effective response from the host to smallpox.
Macaques with higher survival times clearly express some specific pathways previously unidentified using regular gene-by-gene approaches. Our work also shows how third party analysis of public datasets can be important to support new hypotheses to relevant biological problems.
Large scale gene expression analysis with microarray technology is expanding and generating a large amount of high quality, publicly available data. In the present work we analyzed a dataset derived from monkeys infected by smallpox, published by Rubins et al . Smallpox is a lethal disease that was endemic in many parts of the world until eradicated by a massive immunization program developed by the World Health Organization. Its fatality rate was estimated to be 30%, and the survivors often had disfiguring scars .
There are serious concerns about the use of smallpox as a bioweapon [3, 4]. Recently, some health care workers were vaccinated by the UK government for the analysis of antibody responses . Pox viruses display unique abilities to interfere with the host immune system, producing immune modulators  and there are at least 16 viral genes involved in combating the host immune response . The original study's goal was to analyze the evolution of the gene expression of the peripheral blood cells of variola-infected monkeys, so as to clarify the biological processes associated with host-pathogen interactions .
Among the important results was the absence of a TNF-α/NF-κ B-activated transcriptional mechanism during systemic infection, which could suggest that variola interaction with the mammalian host immune system may be ablative to this response .
Given the importance of this dataset, we have analyzed the data with systems biology approaches considering the expression profiles of gene groups instead of individual genes. These approaches are more powerful, less vulnerable to common experimental variation and usually more advantageous for obtaining biologically meaningful results.
Results and discussion
The goal of our analysis was to identify differentially expressed modules in monkeys with early death, i.e., died in the first three days, and monkeys with a later death (ED and LD, respectively). Our hypothesis is that these modules play crucial roles in the host response to smallpox infection. The gene groups were defined according to the KEGG database (updated on 04/17/2006) , as described in the Methods section.
Initially, this analysis was based on the ED and LD monkey groups using all KEGG Orthology (KO) categories (at level 4), indicating three highly active modules in LD monkeys: Cytokines and Cell Adhesion Molecules (CAMs), which are consistent with the classical model of immune and inflammatory responses, where an activation and proliferation of leukocytes take place. The Alzheimer's Disease module also appeared as differentially activated, most likely due to its hybrid composition (20/22 genes that compose the AD module are present in the microarray platform used by Rubins et al , and 14/20 are also members of other modules).
In addition to the ED and LD analysis, modules were also analyzed during the disease course. We have found 27 modules presenting differential activation in at least one point (Figure 1). Several of those modules are related to immune response, like CD molecules, CAM ligands, antigen processing and presentation and the previously considered activated in a late death Cytokines and Cell adhesion molecules.
A novel finding highlighted by our systems biology approach for all macaques (Figure 1) was an altered expression in the Natural Killer (NK) mediated cytotoxicity module. NK cells play a major role in the host-rejection of virally infected cells. Due to their strong cytolytic activity and the potential for auto-reactivity, they are tightly regulated. The activation of NK cells requires the action of proinflammatory cytokines in combination with cell surface receptors . Under normal conditions, inhibitory NK cell receptors engage MHC class I (MHC-I) molecules. Such inhibitory activity normally predominates over that provided by the activating receptors, which bind to ligands expressed on virus-infected and other stressed cells . Cells with normal levels of MHC-I are generally protected from NK cell-mediated lysis. Virally infected cells often express reduced levels of MHC-I and this inability to engage the inhibitory receptors makes the cells susceptible to NK cell attack . The down regulation of Natural killer cell mediated cytotoxicity module reported here could indicate a susceptibility to a NK cell mediated lysis (days 1–3, Figure 1).
The contributions of individual molecules for each significantly altered module are available in the additional material [see Additional files 1 and 2], represented by scores and the respective p-values.
Relevance Networks (RNs)
To understand the profiles of interaction between the genes of the groups studied, we used a procedure known as relevance networks (RNs), initially proposed by Butte et al . Interaction here represents significant correlation between two variables (i.e. molecules), which may (but not necessarily will) physically interact with each other.
Cytokines and cell adhesion molecules (CAMs) modules were highly expressed in LD monkeys in the early phase of the infection (Figure 1) and were further explored in more detail by using the RN approach. A metamodule (CAMs and ECM receptors combined) was created to gain a better understanding of leukocyte migration to the infection locus. The connections between ICAM2 and ITGAV and ITGA6 shows a significant change (p value < 10-3) when comparing day 0 versus the earliest infection groups (days 1–3), see Figure 2A.
To establish a successful infection, pox viruses affect the release of immune modulators by the infected cells, so it is crucial to understand the dynamics of the host immune system . The population of the immune cells varies during the infection, as do their exposed membrane proteins. Therefore, the Relevance Networks approach can reveal a more detailed perspective on the dynamics of the molecules involved in the disease. A better assessment of the variation of those molecules during the onset of the infection could give a better understanding of the host response to smallpox, and ultimately of what may be the key factors that set the 30% mortality rate.
To reach the infection focus and kill the pathogen, the immune cells have to adhere and overpass the endothelium. These events are possible due to the interaction between the integrins and the extra cellular matrix receptors and are modulated by cytokines. More specifically, ESAM is an endothelial cell adhesion molecule and ITGAV, ITGB1 and ITGB5 are integrins, which are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. ITGAV encodes an alpha chain; ITGB1 and ITGB5 encode beta chain integrins, known to participate in the cell-surface mediated signaling. ICAM2 is an intercellular adhesion molecule that promotes an inflammatory response and the activation of AKT (a kinase). AKT phosphorylates components that block the apoptosis pathway [12, 13]. To explore further the graphs produced by RNs for the ECM receptors + CAMs the relations concerning ICAM2 and ESAM are detailed. The complete network is available in the additional material [see Additional files 3 and 4]. RNs of day 0 and days 1–3 show statistically significant alterations in the interactions between ICAM2 and the integrins ITGAV and ITGA6 (Figure 2A), which are important extracellular matrix receptors, involved in the adhesion, migration, differentiation, survival and proliferation of cells and cytoskeleton organization. Day 0 versus days 4–6 RNs revealed a significant relation between ICAM2 and other molecules that play crucial role in immune response against pathogens, PLXNC1  and CDH5  (Figure 2B). The relevance networks produced for cytokine group is available in the additional material [see Additional file 5].
ESAM shows a strong relation with ITGAV, ITGB1 and ITGB5 with variable patterns along the time course of infection. The quartet of ESAM, ITGAV, ITGB1 and ITGB5 reveals the relationship between alpha and beta chains in the composition of integrins and the relation of such heterodimeric proteins with ESAM. It is worth mentioning that ITGB3, ITGB6 and ITGB8, elements that interact with ITGAV, are present in the microarray platform design, but were excluded from analysis in the filtering steps (see Methods section).
Active modules and the RNs results pointed out drastic changes in the expression levels and interactions of important genes involved in essential cellular processes of the host response, such as cellular adhesion, apoptosis and immune response. These results constitute interesting new biological hypotheses about smallpox virus infection that could be further explored by expert virology groups.
Overlapping with Rubins et al and Jahrling et al
The data analyzed in our manuscript was previously analyzed by Rubins et al  and Jahrling et al . The former focused on the microarray data and the later on the clinical findings. Rubins et al  reported the expression profile in IFN response, cell cycle/proliferation response, TNF-α/NF-κ B regulated genes, dose response and lymphocyte and Ig responses. Meanwhile Jahrling et al  discuss the hemorrhagic diathesis, cytokines activation (IL6, IL8, INFG, CCL2 and CCL4) and lymphocyte depletion.
Rubins et al  performed a detailed analysis of IFN response genes. The KEGG module of cytokines includes IL6, IL8, IFN -α, -β and -γ, CCL2 (MCP-1), CCL4 (MIP1β), TNF-α and NF – κ B. The influence of each molecule in the modules activation was also computed [see Additional files 1 and 2]. Once KEGG separates the molecules into functional classes, it is not possible to define cascading effects, like Rubins et al did . Despite this fact, we manually compiled a module with the IFN-responsive genes detailed by Rubins et al  (in Figure 2A). For the ED and LD analysis, this module is significantly down regulated in monkeys that died early and up regulated in those which died later (see Methods for details). In addition, for the time course analysis, the IFN-responsive module is down regulated for day 0 and up regulated for days 1–3 and inactive for the other days (data not shown). Therefore, our results suggest that activation of IFN-responsive genes may play an important role in a successful immune response upon infection, as previously reported by Rubins et al .
The decrease in the relative abundance of B cells and MHC II reported by Rubins et al  are corroborated by the profile of module Antigen processing and presentation (Figure 1). The Cell cycle, Complement and coagulation cascades, Cytokines and Hematopoietic cell lineage modules identified here were also reported by Rubins et al  as differentially expressed using gene by gene approaches.
Altered D-dimer levels were interestingly investigated by Jahrling et al , which reflect the serious influence of smallpox virus inhibitors in the complement and coagulation cascades. Our active modules are also in agreement with such findings. Our results suggest an unbalanced cytokine levels, which is strongly associated with sepsis and organ failure death in infected animals.
In addition, we have also observed alterations in the hematopoietic cell lineage module which could be a consequence of the influx of immature leukocytes into the bloodstream, particularly neutrophils (aka "left shift", using medical jargon).
Rubins et al  observed a considerable depletion of T cells, and a large number of apoptotic cells in the splenic periarterial lymphatic sheath, possibly due to the viral replication in antigen presenting cells in the lymphatic tissues, which could lead to apoptosis in the lymphocytes. The lysis of vulnerable cells induced by NK cells (described earlier) and the down regulation of the NK cell mediated cytotoxicity module supports such hypothesis.
The usage of active modules pointed out the major role of the cytokines and ECM receptors in an effective host response. We have also pointed out that the population of the ECM receptors and cytokines significantly changes during the infection, possible due to a host response to eliminate the virus. In order to have a closer perspective of the dynamics of the molecules involved in this response we used the Relevance Networks. Despite the need of an experimental validation of these new biological hypotheses, one of our goals was to exemplify and stress the importance of public microarray datasets analysis by using different and novel bioinformatics models and thus expand knowledge and information about important biological problems.
In this work we used cDNA microarray data containing 37,632 elements that represents approximately 11,000 unique genes and approximately 12,000 genes without identification [1, 17]. The slides were produced with human cDNA clones and the test samples were extracted from primate models (cynomolgus macaques). The samples of interest were always labeled with Cy5 and the reference samples were labeled with Cy3. The original dataset was obtained from the SMD [18, 19] and is associated with the Experimenter and ExptDate keys "KRUBINS" and "2001-08-05", respectively. In this work, only data associated to studies 1 (Harper or India 7124 variola strains through a combination of aerosol exposure [5 × 108 plaque-forming units (pfu)] and high-titer i.v. inoculation [109 pfu]) and 2 (India 7124 strain via i.v. route alone [109 pfu])  were used. Our goal is to analyze the host response to a successful infection; therefore the third study (varied viral dose) data was not used here.
The raw data was submitted to exploratory analyses. The main problem found was a bias of log ratio values from left to right in the majority of the chips, see Figure 3A. To address this problem, we used a normalization procedure proposed by Futschik and Crompton  known as OLIN. The OLIN method has some clear advantages: (i) it relies on strong statistical formalism; (ii) has the capacity to remove spatial and intensity (systematic) errors in microarray data, which is the main problem we found in the original dataset; (iii) Futschick and Crompton  precisely addressed the importance of optimization of model parameters on microarray data normalization (instead of simply accepting the method default values); (iv) the method performance was conveniently compared to other proposed methods . Basically, the effect of log ratio values in relation to mean expression values (M and A, respectively) was is estimated using LOWESS regression with an optimization step for the selection of the parameter α A that control the smoothing of the regression. Subsequently, a new LOWESS regression was performed with log ratio values based on the x and y directions on the slide, and again, the optimization step of the parameters α x and α y was performed. This model successfully corrected the spatial-dependent artifact observed in the dataset (Figure 3B). Also, in a filtering step, we used only spots with signal intensity greater than background intensity in any sample and all spots with signal intensity greater than 2.5 times background intensity in 80% of the samples.
In order to find significantly altered gene groups we used the "active modules" approach, originally proposed by Segal et al . The rationale of this method is to evaluate differential activation or repression of gene modules, which represent the biological processes.
The first step was to organize the data in a matrix E ij , i = 1,...,n s , j = 1,...,n l , where e ij represents the gene i in the sample j. In the present study, 199 groups were selected according to the level 4 of KEGG  (updated on 04/17/2006) [see Additional file 6], so each group has n g genes, g = 1,...,199.
After the data normalization described earlier, expression values were standardized by subtracting the average expression calculated over all the arrays. This process adjusted the mean values of each gene to zero, which were then used to build a discrete version of E ij , named E ij *, by applying the indicator function e ij * = I(e ij ≥ 1) - I(e ij ≤-1) for all i and j (an indicator function assumes value 0 or 1 for false or true, respectively). Thus, E ij * is a matrix where each element can assume the discrete values 1 (induction), 0 (equal expression) or -1(repression).
Using the matrix E ij * the following operations are performed:
For each of the g groups of interest, each array j was then splitted in two disjoint sets of genes: one with exactly the n g elements of the group, and other constituted by the remaining n s -n g genes. One should also consider two random variables Y1j(g)and Y2j(g)representing the number of induced genes in the first and second group, respectively.
Under the null hypothesis of no biological activity for g in sample j, Segal et al  propose that Y1j(g)~hypergeometric(n g ; n s -n g ; q1j) and Y2j(g)~hypergeometric(n g ; n s -n g ; q2j). Intuitively, the absence of biological activity makes the probability of an induced (or repressed) gene to belong to the first group equal to 1/2 and p-values associated with the observed values y1j(g)and y2j(g)can be easily calculated. This procedure defined a new matrix P gj where the element p gj represents the fraction of induced genes of group g in the array j, or zero for the non-significant modules (we assumed a significance level α = 0.05). In the cases where the significance is for the repressed genes, we just made the fraction p gj negative. The arrays were then categorized in biological types t = 1,, t and for each pair of gene set g and biological type t, we count the number of arrays of type t that showed gene set g induced (repressed). We then used the hypergeometric distribution to classify each gene group as significantly induced or repressed across the biological types (p value < 0.05).
The biological types are defined as following: nt = 1 means that the macaque was not infected, nt = 2 means death 1 to 3 days after infection, nt = 3 means death 4 to 6 days after infection and nt = 4 means death after the 6th day. nt = 2 was also labeled Early Death (ED) and nt = 3 and nt = 4 were labeled Late Death (LD). The contribution of molecules for activation of a significantly altered module was computed for time course and "ED and LD" analyses [see Additional files 1 and 2]. Using these values, it is possible to evaluate the influence of each gene in the module activation or repression (p-value).
Relevance Networks (RNs)
For a better understanding of the Relevance Networks (RN) method, a definition of the term interaction is required. Here we are interested in a more general definition of protein interaction, which goes beyond physical interaction and includes also functional or phylogenetic relationships. Interaction here is inferred by significant correlation between two variables (i.e. molecules), which may (but not necessarily do) physically interact with each other. Aiming to identify such correlations, the Relevance Networks (RN) method, initially proposed by Butte et al , was employed. The method computes the squared linear Pearson correlation coefficient, , between all gene pairs for each biological type (as described in the active modules section of methods), defining a fully connected graph. Using a re-sampling method, the algorithm estimates a cut-off value, c, to split the graph into small sub-graphs where > c. These sub-graphs are the relevance networks (RNs).
In order to identify significant differences between two RNs, we adapted this method to contrast two distinct conditions. Therefore, the linear Pearson correlation coefficients between every pair of genes in each condition, and are computed and then a Fisher's Z-transformation  is used to translate the probability distribution of the random variable - into an approximated standard normally distributed random variable, which permits the identification of pairs of genes that displayed significant (positive or negative) differences between and (p < 0.001). This type of analysis was accomplished for the 199 gene groups used in the active modules procedure, linking the findings with biological knowledge.
Rubins KH, Hensley LE, Jahrling PB, Whitney AR, Geisbert TW, Huggins JW, Owen A, Leduc JW, Brown PO, Relman DA: The host response to smallpox: analysis of the gene expression program in peripheral blood cells in a nonhuman primate model. Proceedings of the National Academy of Sciences of the United States of America. 2004, 101: 15190-15195., 10.1073/pnas.0405759101
Fenner F, Anderson D, Arita I, Jezek Z, Ladnyi I: Smallpox and Its Eradication. World Health Organization. 1988
Henderson DA, Inglesby TV, Bartlett JG, Ascher MS, Eitzen E, Jahrling PB, Hauer J, Layton M, McDade J, Osterholm MT, O'Toole T, Parker G, Perl T, Russell PK, Tonat K: Smallpox as a biological weapon: medical and public health management. Working Group on Civilian Biodefense. Jama. 1999, 281: 2127-2137., 10.1001/jama.281.22.2127
Bozzette SA, Boer R, Bhatnagar V, Brower JL, Keeler EB, Morton SC, Stoto MA: A model for a smallpox-vaccination policy. The New England Journal of Medicine. 2003, 348: 416-425., 10.1056/NEJMsa025075
Putz MM, Midgley CM, Law M, Smith GL: Quantification of antibody responses against multiple antigens of the two infectious forms of Vaccinia virus provides a benchmark for smallpox vaccination. Nature Medicine. 2006, 12: 1310-1315., 10.1038/nm1457
Turner PC, Moyer RW: Poxvirus immune modulators: functional insights from animal models. Virus Research. 2002, 88: 35-53., 10.1016/S0168-1702(02)00119-3
Seet BT, Johnston JB, Brunetti CR, Barrett JW, Everett H, Cameron C, Sypula J, Nazarian SH, Lucas A, McFadden G: Poxviruses and immune evasion. Annual Review of Immunology. 2003, 21: 377-423.
Kanehisa M, Goto S, Kawashima S, Nakaya A: The KEGG databases at GenomeNet. Nucleic Acids Research. 2002, 30: 42-46., 10.1093/nar/30.1.42
Andoniou CE, Andrews DM, Degli-Esposti MA: Natural killer cells in viral infection: more than just killers. Immunological Reviews. 2006, 214: 239-250., 10.1111/j.1600-065X.2006.00465.x
Cerwenka A, Lanier LL: Natural killer cells, viruses and cancer. Nature Reviews Immunology. 2001, 1: 41-49., 10.1038/35095564
Butte AJ, Tamayo P, Slonim D, Golub TR, Kohane IS: Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proceedings of the National Academy of Sciences of the United States of America. 2000, 97: 12182-12186., 10.1073/pnas.220392197
Perez OD, Kinoshita S, Hitoshi Y, Payan DG, Kitamura T, Nolan GP, Lorens JB: Activation of the PKB/AKT pathway by ICAM-2. Immunity. 2002, 16: 51-65., 10.1016/S1074-7613(02)00266-2
Brunet A, Bonni A, Zigmond MJ, Lin MZ, Juo P, Hu LS, Anderson MJ, Arden KC, Blenis J, Greenberg ME: Akt promotes cell survival by phosphorylating and inhibiting a Forkhead transcription factor. Cell. 1999, 96: 857-868., 10.1016/S0092-8674(00)80595-4
Comeau MR, Johnson R, DuBose RF, Petersen M, Gearing P, VandenBos T, Park L, Farrah T, Buller RM, Cohen JI, Strockbine LD, Rauch C, Spriggs MK: A poxvirus-encoded semaphorin induces cytokine production from monocytes and binds to a novel cellular semaphorin receptor, VESPR. Immunity. 1998, 8: 473-482., 10.1016/S1074-7613(00)80552-X
Tzima E, Irani-Tehrani M, Kiosses WB, Dejana E, Schultz DA, Engelhardt B, Cao G, DeLisser H, Schwartz MA: A mechanosensory complex that mediates the endothelial cell response to fluid shear stress. Nature. 2005, 437: 426-431., 10.1038/nature03952
Jahrling PB, Hensley LE, Martinez MJ, Leduc JW, Rubins KH, Relman DA, Huggins JW: Exploring the potential of variola virus infection of cynomolgus macaques as a model for human smallpox. Proceedings of the National Academy of Sciences of the United States of America. 2004, 101: 15196-15200., 10.1073/pnas.0405954101
Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, Staudt LM: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000, 403: 503-11., 10.1038/35000501
Sherlock G, Hernandez-Boussard T, Kasarskis A, Binkley G, Matese JC, Dwight SS, Kaloper M, Weng S, Jin H, Ball CA, Eisen MB, Spellman PT, Brown PO, Botstein D, Cherry JM: The Stanford Microarray Database. Nucleic Acids Research. 2001, 29: 152-155., 10.1093/nar/29.1.152
SMD Advanced Search. http://smd.stanford.edu/cgi-bin/search/QuerySetup.pl
Futschik M, Crompton T: Model selection and efficiency testing for normalization of cDNA microarray data. Genome Biology. 2004, 5: R60-., 10.1186/gb-2004-5-8-r60
Cleveland W: Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association. 1979, 74: 8-10.2307/2286407., 10.2307/2286407
Segal E, Friedman N, Koller D, Regev A: A module map showing conditional activity of expression modules in cancer. Nature Genetics. 2004, 36: 1090-1098.
Biostatistical analysis: Prentice Hall. 1999
Special thanks to Lucas Perin for the English revision and the three reviewers for critical and insightful suggestions. The authors would like also to thank Dr. Eduardo Jordão Neves for discussions about the statistical models and for reading the manuscript; Dr. Luiz Fernando Reis, Dra. Erna Kroon, Dr. Ricardo Vêncio, Tobias Mann, Leonardo Varuzza, Elier Cristo and Dr. Roberto Hirata Jr. for helpful discussions. GHE is supported by CAPES grant; ES is supported by CNPq grant; ACQS, TMV, RAD and RO are supported by FAPESP grants. GHE, ACQS and TMV are PhD students in Bioinformatics at Universidade de São Paulo; ES and RO are PhD students in Statistics at Instituto de Matemática e Estatística, Universidade de São Paulo; RAD is PhD student in Computer Science at Instituto de Matemática e Estatística, Universidade de São Paulo.
All authors conceived the study. GHE implemented and adapted the methods. ACQS implemented the retrieval of the groups from KEGG. ACQS and TMV analyzed the results and drafted the latest version of the manuscript. ES, RAD, RO participated in the statistical modeling. All authors read and accepted the final version of this manuscript.