Insights into the pathogenesis of axial spondyloarthropathy from network and pathway analysis
© Zhao et al.; licensee BioMed Central Ltd. 2012
Published: 16 July 2012
Skip to main content
© Zhao et al.; licensee BioMed Central Ltd. 2012
Published: 16 July 2012
Complex chronic diseases are usually not caused by changes in a single causal gene but by an unbalanced regulating network resulting from the dysfunctions of multiple genes or their products. Therefore, network based systems approach can be helpful for the identification of candidate genes related to complex diseases and their relationships. Axial spondyloarthropathy (SpA) is a group of chronic inflammatory joint diseases that mainly affect the spine and the sacroiliac joints. The pathogenesis of SpA remains largely unknown.
In this paper, we conducted a network study of the pathogenesis of SpA. We integrated data related to SpA, from the OMIM database, proteomics and microarray experiments of SpA, to prioritize SpA candidate disease genes in the context of human protein interactome. Based on the top ranked SpA related genes, we constructed a SpA specific PPI network, identified potential pathways associated with SpA, and finally sketched an overview of biological processes involved in the development of SpA.
The protein-protein interaction (PPI) network and pathways reflect the link between the two pathological processes of SpA, i.e., immune mediated inflammation, as well as imbalanced bone modelling caused new boneformation and bone loss. We found that some known disease causative genes, such as TNFand ILs, play pivotal roles in this interaction.
Axial spondyloarthropathy (SpA) is a family of chronic inflammatory joint diseases of the spine and the sacroiliac joints. One of the major prototypes of SpA is ankylosing spondylitis (AS). The two central features of SpA are inflammation and new bone formation, especially in the spine . The inflammation first occurs around the sites where ligaments attach to the bone. As the inflammation heals, there is new bone formation in the ligament, causing the thickening or hardening of the underlying bone, and eventually the fusion of the vertebral bodies and even the spinal stiffness. It is known that SpA is associated with multiple genes, such as HLA-B27, TNF and IL23R . However, the pathogenesis of SpA remains largely unknown. The complexity of the disorder indicates a multifactorial etiology involving multiple biological processes or pathways.
The pathogenesis of complex chronic diseases such as SpA is believed to happen due to the malfunction of multiple genes and gene products. It has been fairly well confirmed that the propensity of many diseases can be reflected by altered gene and protein expression levels in particular cell types [3, 4]. High throughput experiments at transcriptomic and proteomic levels have been applied to screen potentially disease associated factors of SpA. Some genes involved in the innate immune system, such as SPARC, SLPI and NLRP2, and proteins known to be tumor necrosis factor (TNFa)inducible were identified to be up-expressed in SpA [4–7]. On the other hand, disease associated genes tend to share common functional features, be co-expressed in specific tissues, and their protein products have a tendency to interact with each other . Several computational methods have been developed accordingly to predict disease associated genes based on PPI [9–12], or the integration of gene expression data with PPI [13–15]. Furthermore, research has been conducted trying to identify pathogenic processes by the integrated computational analysis of heterogeneous data sources, including genetics, transcriptomics, proteomics and interactome data. Many specific disease-associated networks have been constructed, including those related to diabetes mellitus, cancers, asthma, Alzheimer's disease, and cardiovascular diseases [16–27]. In addition, some cellular network or signaling pathway databases have systematically collected pathways associated with specific diseases reported in the literature [28, 29]. The disease-associated networks have the promise of allowing for the better understanding of disease pathogenesis, as well as for the identification of potential target sets for therapeutic intervention in the corresponding diseases.
In this work, we integrated SpA-active genes from different resources (known disease genes in the OMIM database , proteomic and microarray experiments) and proposed an approach to prioritize candidate genes in the context of human interactome. We then took out the genes most likely associated with SpA to construct a PPI network of SpA and identified potential pathways involved in SpA. Finally, we drew an overview picture of biological processes involved in the development of SpA.
Our method to construct disease associated network is based on the observation that proteins coded by genes associated with the same disease tend to be closely located to each other in the protein interaction network [8, 31, 32]. Starting with a group of SpA-active genes as seeds, we applied a Katz' centrality based index  to prioritize candidate genes in the PPI network . Given a weighted human interactome represented as a matrix W corresponding to the interaction strength between genes, and a set D of k known disease-active genes as seeds, we define vector x = (x 1, x 2,..., x n )T as initially known activity of genes in the disease, with x i = 1 if gene i is in the set D, x i = 0 otherwise. It should be noted that our SpA-active gene set is a combination of proteomics and microarray data, each of which actually has different confidence levels for the study of disease pathogenesis. A good way is to set different weights for them. But determining the value of each weight makes the solution much more difficult. For simplicity, we just give them the same weight in this study.
where t indicates iteration time, and ϕ is a parameter that sets the relative contributions of the activity x and links in protein interaction networks W to the score. If ϕ is small, the known activity is more important; if ϕ is large, the coupling to the protein neighbours is more important. The parameter ϕ needs to be calibrated with real data. The score could be obtained by performing iterations until the algorithm converges, and then all genes in the PPI network could be ranked according to their s-scores.
where [I-ϕ W]-1 represents the inverse of the matrix I-ϕ W. We solved equation (3) by Jacobi iteration algorithm.
To get a set of SpA-active genes as seeds, we integrated the results of the proteomics and microarray experiments for SpA and obtained 161 genes potentially active in this disease (See Materials and Methods). Our protein interaction network was constructed from the STRING database . A total of 147 of the 161 genes were found present in the STRING PPI network, and these genes were used as seeds to construct the disease gene activity vector x in equation (3).
We searched the Online Mendelian Inheritance in Man (OMIM) database  with a keyword "Spondyloarthropathy" and found 7 causal genes with Entrez gene ID. These 7 genes were used to determine the parameter ϕ. For each of the genes, we determined a candidate gene set of size N(N is about 100), including this disease gene, which locate at, or near the cytogentic loci of the SpA-causative gene. For a known SpA-causative gene in a candidate gene set of size N, if its s-rank calculated by our algorithm is r, then r-ratio, defined as r/N, could reflect how strong this gene is predicted as a disease gene. We determined parameter ϕ as the one minimized the average r-ratios of the known SpA-causative genes.
For all the genes in the PPI network, we calculated the s-score vector by equation (3). Then we ranked the genes in each candidate disease gene set according to their s-cores and got their r-ratios. We repeated this computation by applying different values of ϕ in the area (0, 1/100) and checked the average r-ratios of all the 7 known OMIM disease genes. In this way, the best value of ϕ was determined as 10-4, which minimized the average r-ratios of known OMIM disease genes for SpA. For the optimum ϕ = 10-4, we found an average r-ratio of 0.186, suggesting that the known SpA causative genes were averagely ranked top 18.6% of the candidate genes.
It was known that the performance of our method could be enhanced when partial information about known disease genes was included . Having the optimum value of ϕ determined, we added the 7 known SpA-causative genes into the seeds. Then all the genes in the human PPI network were scored according to equation (3).
The s-score of a gene indicates its possibility associated with the disease. Setting 10% as a cut-off, we took out the genes whose s-scores were top 10% of all genes in the PPI network. We identified a total of 380 genes. Then we limited interactions in the STRING database to weights of at least 0.5, which corresponds to a medium-confidence human genome PPI network , and constructed a subnetwork spanned by these 380 genes. Finally, we obtained a PPI network associated with SpA, which included 367 nodes and 8887 edges.
Furthermore, we searched the DrugBank database  for drugs treating SpA and their protein targets. In fact, currently there is no cure for SpA. Three classes of drugs are used clinically for disease management, that is, to reduce inflammation, relieve pain and stiffness, suppress disease activity and slow disease progression . They are non-steroidal anti-inflammatory drug (NSAID), TNF blocker and disease-modifying antirheumatic drug, while the third class shows disappointing effects in practice . We mapped the protein targets of these drugs onto the SpA PPI network and found 7 targets in this network, i.e., PTGS2, MMP2, MAPK3, HRAS, EGF, NFKB1 and TNF, in which PTGS2 (Cyclooxygenase 2)and TNF (Tumor necrosis factor) are main therapeutic targets of NSAID and TNF blocker, respectively, TNF and NFKB1 are targets for one of the disease-modifying antirheumatic drugs Thalidomide. TNF is also known disease gene of SpA. PTGS1 (Cyclooxygenase 1), the other therapeutic target of NSAID, does not appear in this network because we only took top 10% ranked genes in the global PPI network. As shown in Figure 2, six of the seven targets are located in the inner core of the SpA PPI network, and only one situates at the outer layer, suggesting that the drugs may interfere with the disease by acting on proteins in the core.
Selection of the most significantly enriched GO terms in the SpA PPI network
positive regulation of granulocyte macrophage colony-stimulating factor production
positive regulation of natural killer cell proliferation
immune system process
response to stress
cellular response to stimulus
regulation of cell death
regulation of cell proliferation
regulation of leukocyte activation
regulation of lymphocyte activation
regulation of T cell activation
To identify SpA-relevant biological processes, we mapped the 380 SpA associated genes onto the KEGG and Biocarta pathways, respectively. We used p-value to measure if a pathway is more likely affected by SpA associated genes (see Method part). Given significance level α = 0.05, we found that a total of 40 KEGG pathways are significantly enriched with genes in this group (see Additional file 1, Table S1). Similar pathways in the Biocarta database are also enriched with SpA associated genes. In Additional file 1, Table S2 we just listed four SpA gene enriched pathways included in the Biocarta database but not in the KEGG database.
A central feature of SpA is inflammation, one of the first responses of the immune system to infection or irritation. As listed in Additional file 1, Table S1, SpA is related to a large fraction of pathways in immune system. Some other pathways, although not classified into immune system in the KEGG database, have been known to be highly associated with the function of immune response, such as apoptosis , MAPK signaling pathway , and cell adhesion molecule interactions . Specifically, Additional file 1, Table S1 includes several pathways related to pathogen recognition and inflammatory signalling in innate immune defences, in which the most important one is the Toll-like receptor (TLR) signalling pathway. The innate immune system relies on pattern recognition receptors (PRRs) to detect distinct pathogen-associated molecular patterns (PAMPs). Upon PAMP recognition, PRRs trigger a number of different signal transduction pathways. The pathways induced by PRRs ultimately result in the expression of a variety of proinflammatory molecules, such as cytokines, chemokines, cell-adhesion molecules, and immunoreceptors, which together orchestrate the early host response to infection, mediate the inflammatory response, and also bridge the adaptive immune response  together. The family of TLRs is the major class of PRRs . The association of TLR2 and TLR4 with SpA has been reported . It was noticed that three major signaling pathways were responsible for mediating TLR-induced responses including NF-kB, mitogen-activated protein kinases (MAPKs), and IFN regulatory factors (IRFs) , while we found that the two pathways, MAPKs and NF-kB, which play central roles in induction of a proinflammatory response, are involved in SpA. The tumor necrosis factor (TNFa) is an important upstream protein of the NF-kB pathway, which binds to its receptor to recruit TNF receptor death domain (TRADD) and thus activates NF-kB. TNF inhibitors have been proven highly effective for the treatment ofSpA . In addition, we also found that SpA is associated with some proinflammatory molecule involved pathways, such as the chemokine signaling pathway, natural killer-cell mediated cytotoxicity, Fc epsilon RI signaling pathway, and cell-adhesion molecules interaction. These pathways indicate the process of innate immune response in the progress of SpA. On the other hand, it is known that B and T lymphocytes are responsible for the adaptive immune response . Supplementary Table S1 shows the association of B and T cell receptor signalling pathways with SpA, implying their function in the adaptive immune response in SpA. In fact, it has been known that both the innate and adaptive immune responses are involved and interdependent with each other in SpA .
Another prominent feature of SpA is new bone formation; meanwhile bone loss is also a common finding in SpA . As can be seen in Additional file 1, Table S1 and S2, SpA is associated with osteoclast differentiationand bone remodelling pathways, biological processes that maintain bone density and structure through a balance of bone resorption by osteoclasts and bone deposition by osteoblasts. Both ossification and osteoporosis symptoms of SpA are consequences of an imbalance in the regulation of these two sub-processes of bone remodelling. It is known that the WNT pathway regulates the balance between osteoclast and osteoblast function , verifying our result that SpA is associated with the WNT pathway (see Additional file 1, Table S1). In Additional file 1, Figure S1 we show SpA associated genes involved in the osteoclast differentiation pathway. Only one SpA causative gene from the OMIM database and three SpA active genes from the proteomics and microarray experiments appeared in this pathway, whereas a great fraction of genes involved in this network were predicted by our algorithm.
We have extracted data related to SpA - known SpA causative genes from the OMIM database, proteomic experiments from literature, and microarray experiments from the GEO database. Using these genes as seeds, we developed a Katz'-centrality based index, s-score, to rank genes in the human PPI network. Then we considered 380 top-ranked genes as associated with SpA with high possibility. Based on these genes, we constructed a PPI network and identified potential pathways associated with SpA. The PPI network exhibits a core-periphery topology, in which most seed genes are located at the periphery part, while the inner core aggregates the non-seed genes enriched with innate immune genes and drug targets for SpA, suggesting that the core could be the modulating center of the network. The pathways we have discovered in this way represent the common knowledge of SpA, i.e., that it is an immune-mediated inflammation. Our data also reflect that an imbalanced bone modeling caused new bone formation and bone loss. We also illustrated the interplay between inflammation and bone injure. This network approach represents an alternative method for analyzing the complex effects of candidate genes related to complex diseases.
We collected genes associated with SpA from three resources as follows:
(1) The Online Mendelian Inheritance in Man (OMIM) database : The OMIM database contains information on all known diseases and associated genes. We searched the database with a keyword "Spondyloarthropathy" and found 8 causal genes: HLA-B, TNFA, IL23R, CYP2D6, TNFSF13, TNFSF13B, B2M and COL2A1, in which gene CYP2D6 does not have an Entrez gene ID. The other seven genes are used as SpA-causative genes in this study.
(2) Proteomic experiment results: quantitative proteomics approaches were applied to investigate changes in protein expression in AS (the most common prototype of SpA) monocytes in comparison with healthy controls . We used the 42 genes (Entrez ID) whose encoded proteins were differentially expressed  as potentially active genes in SpA.
(3) The NCBI Gene Expression Omnibus (GEO) database: we searched the GEO database and found 2 microarray experiments related to SpA: GSE1402 (Affymetrix U95Av2) and GSE18781 (AffymetrixHG_U133_Plus2). The GSE1402 experiment included a comparison of peripheral blood mononuclear cells (PBMC) from juvenile SpA with that of normal individuals, and the GSE18781 experiment was an investigation of peripheral blood cells from 18 subjects with SpA and 25 normal individuals. Samples in the GSE18781 experiment were processed as two separate sets at different times: 11 SpA + 12 control subjects in Set 1 and 7 SpA + 13 control subjects in Set 2. We treated the results of these two experiments as three separate datasets.
To integrate gene expression data from the two different platforms, we mapped the probe sets of the platforms to Entrez gene ID. This process yielded a set of 9448 genes common to the two platforms. For each gene in a dataset, we calculated the average expression level for probe sets associated with this gene, and filtered out genes whose mean expression ratios (SpA over control) in the two samples are greater than 0.67 (1/1.5) and less than 1.5. In the next step, we converted the expression value to its rank in the common genes . Thereafter, a nonparametric two sample test, the Wilcoxon rank-sum test, was used to test if a gene is differentially expressed in the SpA and control samples and the p-value of the Wilcoxon rank-sum test was obtained. For such a large number of genes being simultaneously tested, the FDR  corrected p-values were used for screening differentially expressed genes. Given FDR level of 0.10, we found genes that are differentially expresses in the two samples for each of the three datasets. We then combined differentially expressed genes in the three datasets and identified 119 distinct genes potentially active in SpA.
Finally, the combination of disease causative genes from OMIM database, differentially expressed genes from the proteomics and microarray experiments yielded a total of 168 potentially SpA-active genes, which were used in a subsequent analysis.
It is noted that the SpA associated genes collected from the three resources have no overlap, which is likely due to the different levels of data origins. OMIM genes were collected from literatures focused on single gene or protein studies, while microarray and proteomic data were generated from genome-scale experiments at different levels. It is known that the activity of gene and protein is highly dynamic and can change rapidly in response to changes in internal and external environments. However, most current genome-scale experiments could not capture the entire dynamics but only take snapshots at single time points in specific experimental settings. This is why in most cases these data have little overlap. In order to get a better view of the disease pathology, we need to integrate different data resources.
We downloaded human gene location data from the ftp of NCBI MapViewer , which include the chromosomal locations and chromosomal base pair ranges of human genes. For each of the 7 known SpA-causative genes, we determined a set of N candidate genes, including this disease gene, which locate at, or near the cytogentic loci of the disease gene. Here, the number of candidate genes is about 100. If the chromosomal region where the disease gene located has more than 100 genes, we took 100 genes near the disease gene as candidates; otherwise, we took all genes in this region.
Weighted protein-protein interactions (PPI) of human beings were downloaded from version 8.3 of STRING . STRING includes both physical and functional interactions integrated from numerous sources, including experimental repositories, computational prediction methods and public text collections; uses a scoring system to weigh the evidence of each interaction; and includes the interactions between 14532 proteins (Entrez gene ID) of human genome. We normalized the interaction scores in STRING to the area [0, 1] and represented the weighted PPI network as a matrix W.
Innate immunity-relevant human proteins and their interactions were downloaded from the InnateDB database  on April 25, 2011. Till the day we downloaded the data, this database includes 2310 human genes and 4819 interactions manually collected by literature review.
Protein expression data in human normal tissue were downloaded from the web of the Human Protein Atlas . Human Protein Atlas portal is a publicly available database including the spatial distribution and the relative abundance of proteins in 46 different normal human tissues and 20 different cancer types, as well as 47 different human cell lines. The protein abundance scales were combined into four levels: negative, weak, moderate and strong. We downloaded the file named normal_tissue.csv on September 16, 2011 and then extracted proteins expressed in hematopoietic cells of bone marrow and their abundance scales. This dataset includes 10798 gene-coded proteins (Entrez ID) in total.
We downloaded pathway data from the FTP service of KEGG  (Kyoto Encyclopedia of Genesand Genomes) on June 21, 2011. The KEGG PATHWAY section is a collection of manually drawn pathway maps representing the information on the molecular interaction and reaction networks. The hsa_pathway.list file in this section includes a list of known proteins encoded by H. sapiens's genome and the corresponding pathways in which these proteins are involved.
Given a significance level α, a p-value smaller than α demonstrates a low probability that the items with the desired feature are chosen by chance. Hence this p-value can be used to measure whether the n samples drawn from the set is more enriched with items of the desired feature than would be expected by chance .
mitogen-activated protein kinases
pattern recognition receptors
tumor necrosis factor
TNF receptor death domain.
This research was supported by the National Natural Science Foundation of China(10971227); the Special Program for New Drug Innovation of the Ministry of Science and Technology, China (2009ZX09311-001, 2008ZX09101-Z-029); the Swedish Foundation for Strategic Research, the Swedish Research Council and the WCU program through NRF Korea funded by MEST R31-2008-10029-0.
This article has been published as part of BMC Systems Biology Volume 6 Supplement 1, 2012: Selected articles from The 5th IEEE International Conference on Systems Biology (ISB 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/6/S1.