Volume 5 Supplement 1
Functional pathway mapping analysis for hypoxia-inducible factors
© Chuang et al; licensee BioMed Central Ltd. 2011
Published: 20 June 2011
Hypoxia-inducible factors (HIFs) are transcription factors that play a crucial role in response to hypoxic stress in living organisms. The HIF pathway is activated by changes in cellular oxygen levels and has significant impacts on the regulation of gene expression patterns in cancer cells. Identifying functional conservation across species and discovering conserved regulatory motifs can facilitate the selection of reference species for empirical tests. This paper describes a cross-species functional pathway mapping strategy based on evidence of homologous relationships that employs matrix-based searching techniques for identifying transcription factor-binding sites on all retrieved HIF target genes.
HIF-related orthologous and paralogous genes were mapped onto the conserved pathways to indicate functional conservation across species. Quantitatively measured HIF pathways are depicted in order to illustrate the extent of functional conservation. The results show that in spite of the evolutionary process of speciation, distantly related species may exhibit functional conservation owing to conservative pathways. The novel terms OrthRate and ParaRate are proposed to quantitatively indicate the flexibility of a homologous pathway and reveal the alternative regulation of functional genes.
The developed functional pathway mapping strategy provides a bioinformatics approach for constructing biological pathways by highlighting the homologous relationships between various model species. The mapped HIF pathways were quantitatively illustrated and evaluated by statistically analyzing their conserved transcription factor-binding elements.
hypoxia-inducible factor (HIF), hypoxia-response element (HRE), transcription factor (TF), transcription factor binding site (TFBS), KEGG (Kyoto Encyclopedia of Genes and Genomes), cross-species comparison, orthology, paralogy, functional pathway
Maintaining oxygen homeostasis is a challenge for aerobic life because of environmental changes and energy demands. In all metazoans, hypoxia-inducible factors (HIFs) are transcription factors (TFs) that play a central role in adaptive processes in hypoxic cellular environments [1, 2]. HIF-1 is a heterodimeric protein composed of an oxygen-sensitive α-subunit (HIF-1α) and a ubiquitously expressed β-subunit (HIF-1β), also called aryl hydrocarbon receptor nuclear translocator (ARNT). HIF-1α contains 4 functional domains: a basic-helix-loop-helix (bHLH) domain for DNA binding [4, 5], a PER-ARNT-SIM (PAS) domain for dimerization, an oxygen-dependent degradation (ODD) domain for targeting to proteasomes, and transactivation domains (N-TAD and C-TAD) for transcriptional activation. HIF-1β contains bHLH, PAS, and TADs. Under hypoxic conditions, accumulated HIF-1α translocates from the cytoplasm to the nucleus and dimerizes with HIF-1β via the bHLH and PAS domains to form the HIF-1 complex. HIF-1α may recruit the transcription co-activator P300/CBP and bind to a hypoxia response element (HRE) in the regulatory regions of hypoxia-inducible genes, thus mediating transcriptional activation. The consensus HRE motif is a cis-regulatory element with a core segment of 5′-RCGTG-3′ (where R is A or G) that governs the transcription of HIF-responsive target genes in the hypoxia-signaling pathway, such as those encoding proteins involved in oxygen transport, iron metabolism, glucose transport, cell proliferation, angiogenesis, invasion, and metastasis [7, 8]. Interestingly, overexpression of HIF-1α and activation of HIF pathways are observed in tumor cells of cancer patients owing to a lack of HIF-α ubiquitination and degradation. Hence, the inhibition of HIF-1 expression or activity is an alternative strategy in new cancer therapies.
A metabolic pathway represents a sequence of chemical reactions catalyzed by enzymes; most metabolic pathways retain the same functions in all growth stages of living cells. A signal transduction pathway starts with a signal to a receptor and ends with a change in cell behavior. Hence, signal transduction pathways are implicated in different stages of dynamic transitions of a network in living cells. Both metabolic and signal transduction pathways are important components of physiology and are regulated by diverse mechanisms. The HIF pathway is a signal transduction pathway triggered by low oxygen supply [12, 13]. The classical representation of a biological pathway provides various associations among genes and proteins as well as system-level insight for discovering functional information through molecular interactions. During the last decade, an increasing number of pathway datasets have been established in order to correlate functional interactions within a network and elucidate regulatory mechanisms. Hence, biological pathway analysis may facilitate the design and development of biological experiments.
Based on the evolutionary conservation of genomes of related organisms, the construction of a phylogeny, known as phylogenomics, can be carried out. Accordingly, phylogenetic analysis provides evolutionary relationships among species. Furthermore, phylogenetic inferences can facilitate the understanding of species derivation and the delineation of homologous genes. Random mutations accumulated over the course of many generations give rise to homologous genes, which fall into 2 major categories: orthologous and paralogous genes. The former evolve directly from an ancestral gene through speciation events into daughter species, and the latter diverge after gene duplication events within a single species. Most orthologous genes retain similar functions during the course of evolution, while paralogous genes may gain new functions. The history of orthologous genes reflects the developmental track of diversified species. Hence, functional divergence studies based on cross-species gene comparison may facilitate the prediction of protein functions and the identification of horizontal gene transfer events. Here, a methodology for mapping specific biological pathways across several model species and discovering conserved TF-binding motifs from related homologous genes is established for conserved pathway analysis. When part of the constructed pathway is missing in the target species, the empty nodes can be efficiently filled by retrieving corresponding paralogous genes that possess similar biological functions.
To evaluate the importance of retrieved orthologous and paralogous genes within the mapped functional HIF pathways, a conventional strategy of searching conserved transcription factor binding sites (TFBSs) by matrix-based search on all retrieved HIF target genes was performed. For a selected homologous gene set, the TRANSFAC database containing a comprehensive set of TF-binding specificities summarized as position-specific scoring matrices (PSSMs) was adopted to identify conserved motifs for transcriptional activities [19, 20]. All identified TFBSs from the retrieved homologous genes within a functional pathway among various species are statistically analyzed and ranked in priority order according to the total number of species possessing common motifs.
Mapping HIF pathways among various species
Statistics of the OrthRate and ParaRate parameters within mapped functional pathways between 2 model species with humans as the reference
Identification of HRE motifs in HIF target genes
Identification of HRE motifs for HIF orthologous target genes
HIF orthologous target genes (KEGG: Orthology K05448)
Paralogous and orthologous gene relationships between 2 species
The proposed methodology provides a new paradigm for investigating functional diversification through mapped biological pathways. A gene exhibits different biological functions when it participates in different functional pathways. In the example in Figure 1, each node in the HIF pathway denotes the proportional number of paralogous genes and the orthologous property between the reference species (HSA) and the query species (DRE). Black background squares indicate that DRE might possess greater flexibility and the possibility of gene substitution than HSA. Gray background squares indicate equivalent numbers of paralogous genes, while white squares indicate that DRE does not possess any correlated orthologous gene in this pathway. When a specific biological pathway is selected for a particular organism, a mapped functional pathway based on the orthologous relationship between the query and reference species is constructed; each node in the pathway is displayed with different shades of gray according to the ParaRate values. Hence, the constructed pathways enhance the visualization effects of the homology and functional conservation between 2 species. Although the biological pathways predicted in silico under hypothetical assumptions may not be accurate, the mapped functional pathways based on cross-species comparison indeed provide clues for explaining functional conservation and alternative solutions. At present, the functional pathway mapping analysis only focuses on 6 selected model species; more model species will be gradually included for comprehensive analyses. In summary, the proposed methods can systematically discover orthology at the level of biological pathways and not merely with individual genes. In addition to genes, functional pathways are also conserved in general evolutionary processes. Analyzing the conserved pathways through cross-species comparison may help biologists to discover and distinguish the diversity of functional pathways. 
The comparison results may serve as a powerful tool for understanding physiological mechanisms in order to suggest better model species for subsequent in vitro or in vivo experimental designs.
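The node-shading rule described above can be illustrated with a minimal Python sketch. It assumes that each node is summarized by a reference paralog count, a query paralog count, and a flag for whether the query species has a correlated orthologous gene; the function name `node_shade` and the "light gray" fallback for a query with fewer paralogs are hypothetical, because the text does not state how that case is drawn.

```python
def node_shade(ref_paralogs: int, query_paralogs: int, has_ortholog: bool) -> str:
    """Return an illustrative shade for one pathway node.

    ref_paralogs   -- paralogous genes at this node in the reference species (e.g. HSA)
    query_paralogs -- paralogous genes at this node in the query species (e.g. DRE)
    has_ortholog   -- whether the query species has a correlated orthologous gene
    """
    if not has_ortholog:
        # No correlated orthologous gene in the query species.
        return "white"
    if query_paralogs > ref_paralogs:
        # Query species has more paralogs, hence more room for gene substitution.
        return "black"
    if query_paralogs == ref_paralogs:
        # Equivalent numbers of paralogous genes.
        return "gray"
    # Assumption: the text does not specify this case; shown only as a placeholder.
    return "light gray"


# Hypothetical counts, for illustration only.
for ref, query, orth in [(2, 3, True), (2, 2, True), (2, 0, False)]:
    print(ref, query, orth, node_shade(ref, query, orth))
```

The ortholog check comes first so that a node missing from the query species is never reported as equivalent merely because both paralog counts happen to match.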
Conserved transcription factor analysis for orthologous and paralogous genes
Table: The number of identified HIF-related TFBSs within an orthologous gene set and a randomly selected gene set. For each set (HIF orthologous gene set; randomly selected gene set), the columns report the number of species possessing the identical TFBS (0.85/0.80/0.75) and the total identified motifs.
Table: The number of identified HIF-related TFBSs within a paralogous gene set and a randomly selected gene set of zebrafish species. For each set (paralogous genes; randomly selected genes), the columns report the number of genes possessing the identical TFBS (0.85/0.80/0.75) and the total identified motifs.
To understand the mechanism behind hypoxia signal transduction, the HIF pathway from KEGG was initially extracted and analyzed. By calculating a newly defined ParaRate, the proposed methodology can discover alternative genes that are likely to be able to replace the original genes at each node within a pathway without loss of biological function. According to the obtained OrthRate percentages, the system derives similar gene clusters through cross-species comparison and predicts the functional pathway for the query species. Comparing paralogous genes obtained from a set of target genes facilitates the discovery of probable substitute genes with respect to a specific biological pathway. Accordingly, it is possible to construct a novel functional subpathway from these identified alternative selections. When studying a functional pathway, combinatorial regulation by transcription factors and genes needs to be analyzed and discussed simultaneously, based on the assumption that genes within the same pathway can be controlled by common regulators. Hence, a set of genes from the mapped functional pathway can be verified by focusing on the identification of common transcription factor-binding motifs.
We propose a strategy for mapping functional pathways among various species that allows the retrieved orthologous and paralogous genes to be further verified by identifying common TF-binding motifs. Six model species including H. sapiens (HSA), M. musculus (MMU), G. gallus (GGA), D. rerio (DRE), X. tropicalis (XTR), and C. intestinalis (CIN) were used for cross-species comparison. The novel terms OrthRate and ParaRate of the mapped functional pathways were defined to quantitatively indicate the conservation and flexibility of homologous pathways. These calculated values can be applied to enhance the alternative selection of functional genes. To verify the gene replaceability within the inferred pathway through in silico analysis, a conventional strategy was executed by searching for TFBSs on all retrieved homologous genes. Here, the TRANSFAC database was adopted for discovering all conserved transcription elements based on the summarized PSSMs. From the cross-species analysis of mapped HIF pathways, 4 species (HSA, MMU, GGA, and DRE) possess highly similar mechanisms for maintaining HIF function compared to XTR and CIN. Furthermore, the corresponding TFBS analyses were evaluated with consistent performance.
The initial pathway information was acquired from KEGG, a knowledge database for the systematic analysis of gene functions, to link genomic information with higher-order functional information [21, 22]. Each map or pathway in KEGG was categorized into an existing taxonomy according to its function, and each pathway was supplemented with a set of orthologously grouped tables for cross-species information with respect to conserved pathways. The orthologous table summarizes functional correlations in the pathway, physical correlations in genomes, and evolutionary relationships among species. It provides useful information as a reference dataset for functional annotations. Using the HIF pathway as an example, if the user enters the keywords “hypoxia-inducible factor” into the main interface of KEGG (http://www.genome.jp/kegg/pathway.html), the system responds with only 1 entry: map05211. In addition to the searched pathways, users can adopt orthologous information from the KEGG and/or Ensembl databases. In particular, orthologous genes identified in KEGG are not only obtained by evaluating sequence similarity, but also by determining if all constituent members are verified within a functional group, such as a conserved subpathway or a molecular complex. The variations of derived datasets from these 2 resources provide hints for possible modifications of an accurate pathway.
Quantitative measurement of functional pathways
To compare the expression level of orthologous gene clusters in pathways against others by utilizing a quantitative measure, it is necessary to represent those associated genes as mathematical objects and provide measurable indices for effective representation. Here, we define 2 types of homology rate, OrthRate and ParaRate, which can quantitatively suggest alternative functional genes. OrthRate is defined as the total number of corresponding orthologous genes within a specified pathway from the query species divided by the total number of associated genes within the identical pathway from the reference species. OrthRate indicates the proportion of corresponding genes within cross-species biological pathways. ParaRate is similarly defined as the total number of corresponding paralogous genes within a specified pathway from the query species divided by the total number of associated genes within the identical pathway from the reference species. ParaRate is considered a replacement ratio of duplicated genes with respect to a specified biological function. According to statistical results, the system can be expected to retrieve possible missing subpathways within an individual species and predict extra direct and/or indirect pathways within each species. To demonstrate the functional conservation and alternative selection of genes among various species, 6 distantly related model species, C. intestinalis (CIN), X. tropicalis (XTR), G. gallus (GGA), M. musculus (MMU), D. rerio (DRE), and H. sapiens (HSA), were initially considered for orthologous analysis in this study. Users are required to define both the query and reference species in advance. The quantitative measurement of functional pathways can then be obtained by taking the total number of homologous genes within the pathway of the reference species as the denominator and the total number of homologous genes of the query species as the numerator.
In this study, both the numbers of orthologous and paralogous genes were obtained from the KEGG database.
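As a minimal sketch of the two ratios defined above, the following Python code computes OrthRate and ParaRate from three gene counts. The class name `PathwayCounts`, the function names, and the example numbers are hypothetical and are not taken from the study; in practice the counts would come from KEGG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathwayCounts:
    """Gene counts for one pathway and one query/reference species pair."""
    reference_genes: int   # associated genes in the pathway of the reference species
    query_orthologs: int   # corresponding orthologous genes in the query species
    query_paralogs: int    # corresponding paralogous genes in the query species


def _check(counts: PathwayCounts) -> None:
    # The reference count is the denominator of both ratios, so it must be positive.
    if counts.reference_genes <= 0:
        raise ValueError("reference pathway must contain at least one gene")


def orth_rate(counts: PathwayCounts) -> float:
    """Orthologous genes in the query species per reference pathway gene."""
    _check(counts)
    return counts.query_orthologs / counts.reference_genes


def para_rate(counts: PathwayCounts) -> float:
    """Paralogous genes in the query species per reference pathway gene."""
    _check(counts)
    return counts.query_paralogs / counts.reference_genes


# Hypothetical counts, for illustration only.
example = PathwayCounts(reference_genes=20, query_orthologs=15, query_paralogs=30)
print(f"OrthRate = {orth_rate(example):.2f}, ParaRate = {para_rate(example):.2f}")
```

Under these definitions ParaRate may exceed 1 when the query species carries more duplicated genes than the reference pathway has members, which fits its reading as a replacement ratio.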
Identification of HRE motifs
To demonstrate the validity of in silico pathways in the biological sense, the existence of TFBSs within HIF target genes was considered as the verification criterion. HIFs bind target genes at the functional hypoxia response elements (HREs). An overview of the known target genes of HIF reveals that the length of an HRE is nearly 18 base pairs. The mandatory core HRE sequence is “CGTG”, the minimal DNA motif required for interaction with HIFs. The appearance frequency of HREs located within the flanking sequence is randomly distributed, as reported previously. To perform alignment and identify whether the HREs are located within the paralogous genes, we created a position-specific scoring matrix (PSSM) to extract all HRE candidates from the retrieved HIF target genes. The PSSM matching mechanism scans through a DNA sequence with a fixed-length window and identifies the most probable motifs according to the sum of position-specific scores for each symbol in the examined substring [23, 24]. The score of each substring was obtained by summing the corresponding PSSM scores over the $j$-th substring $S_j$ within the DNA sequence, and the value was calculated as $\mathrm{Score}(S_j) = \sum_{i=1}^{n} p_{k,i}$, where $n$ is the length of the substring, $i$ represents the position in the substring, $S_i$ is the nucleotide symbol at position $i$ in the substring, $k$ is the PSSM row corresponding to $S_i$, and $p_{k,i}$ is the score value in row $k$, column $i$ of the PSSM of a specific TF-binding pattern $P$. The PSSM profile of the HRE patterns for HIF target genes was generated according to the paper published by R. H. Wenger et al. and was called the pattern matrix V$HIF_STKE_2005. The HRE motif searching mechanism allowed us to verify the functional conservation of HIF responses within the retrieved homologous gene set.
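The sliding-window scoring described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the system used in the study: the toy matrix `TOY_HRE_PSSM` simply gives a score of 1 to each base of the 5′-RCGTG-3′ core and is not the V$HIF_STKE_2005 matrix, the function names are hypothetical, and only the forward strand is scanned.

```python
from typing import Dict, List, Tuple

# A PSSM stored as one row per nucleotide and one column per motif position.
Pssm = Dict[str, List[float]]

# Toy matrix (assumption): 1.0 for each base allowed by the RCGTG core, else 0.0.
TOY_HRE_PSSM: Pssm = {
    "A": [1.0, 0.0, 0.0, 0.0, 0.0],
    "C": [0.0, 1.0, 0.0, 0.0, 0.0],
    "G": [1.0, 0.0, 1.0, 0.0, 1.0],
    "T": [0.0, 0.0, 0.0, 1.0, 0.0],
}


def score_window(window: str, pssm: Pssm) -> float:
    """Sum the PSSM entries selected by the symbol at each window position."""
    return sum(pssm[base][i] for i, base in enumerate(window))


def scan(sequence: str, pssm: Pssm, threshold: float) -> List[Tuple[int, str, float]]:
    """Return (0-based start, substring, score) for windows scoring >= threshold."""
    width = len(next(iter(pssm.values())))
    seq = sequence.upper()
    hits = []
    for j in range(len(seq) - width + 1):
        window = seq[j:j + width]
        # Skip windows containing symbols the matrix does not cover (e.g. N).
        if any(base not in pssm for base in window):
            continue
        score = score_window(window, pssm)
        if score >= threshold:
            hits.append((j, window, score))
    return hits


# Hypothetical sequence; a threshold of 5.0 keeps only perfect core matches.
print(scan("TTACGTGCCGCGTGA", TOY_HRE_PSSM, threshold=5.0))
```

A fuller search would also scan the reverse complement and normalize each raw score against the minimum and maximum achievable for the matrix before a relative cut-off is applied.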
Identification of common transcription factors
The identification of common TFBSs is an in silico analysis for ensuring that the target genes possess identical biological functions, under the assumption that a homologous gene set within the same pathway can be controlled by common regulators. Therefore, we employed the TRANSFAC database for identifying the common TFBSs of a specified HIF target gene set. The TRANSFAC library version 10.3, a comprehensive set of more than 800 TF-binding specificities (585 for vertebrate species) summarized as PSSMs, was adopted to search for transcription elements. The default PSSM cut-off value of each TF-binding pattern is set to 0.85 for examining all substrings from the selected candidate genes; the common TFBSs identified for all selected homologous genes are displayed in priority order according to how pervasively and how frequently they appear across the defined gene set. In the developed system for conserved TFBS analysis, users can restrict the TFBSs considered through keyword filtering functions, and the selected PSSMs can come from either the TRANSFAC library or user-defined matrices. In particular, 3 HIF-related PSSMs, V$HIF_Q3, V$HIF_Q5, and V$AHRHIF_Q6, were assigned along with the customized factor V$HIF_STKE_2005 for analyzing the orthologous and paralogous gene sets within the HIF pathway.
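The prioritization step described above can be sketched in Python. The sketch assumes that each gene has already been scanned and that normalized match scores are available per matrix; the function name `rank_common_tfbs`, the input layout, and the scores are hypothetical. Matrices are ordered first by the number of genes with at least one match at or above the cut-off (pervasiveness), then by the total number of such matches (frequency).

```python
from typing import Dict, List, Tuple


def rank_common_tfbs(
    scores: Dict[str, Dict[str, List[float]]],
    cutoff: float = 0.85,
) -> List[Tuple[str, int, int]]:
    """Rank matrices by gene coverage, then by total hits at or above the cut-off.

    scores maps gene name -> matrix name -> normalized match scores in that gene.
    Returns (matrix name, genes with a hit, total hits) tuples in priority order.
    """
    coverage: Dict[str, int] = {}
    frequency: Dict[str, int] = {}
    for per_matrix in scores.values():
        for matrix, values in per_matrix.items():
            n_hits = sum(1 for value in values if value >= cutoff)
            if n_hits:
                coverage[matrix] = coverage.get(matrix, 0) + 1
                frequency[matrix] = frequency.get(matrix, 0) + n_hits
    ranked = [(m, coverage[m], frequency[m]) for m in coverage]
    # Sort by coverage, then frequency (both descending); break ties by name.
    ranked.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return ranked


# Hypothetical normalized scores, for illustration only.
example_scores = {
    "geneA": {"V$HIF_Q3": [0.90, 0.86], "V$HIF_Q5": [0.80]},
    "geneB": {"V$HIF_Q3": [0.88], "V$HIF_Q5": [0.91]},
    "geneC": {"V$HIF_Q3": [0.70], "V$AHRHIF_Q6": [0.95]},
}
for name, n_genes, n_hits in rank_common_tfbs(example_scores):
    print(name, n_genes, n_hits)
```

Lowering `cutoff` to 0.80 or 0.75 would admit weaker matches and typically increases both counts.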
CSC developed the systems and performed evaluations. TWP conceived the study and drafted the manuscript. MDC validated promoter analysis and proofread the manuscript; CHH, WST, HTC, and CCC participated in the system design and evaluation.
This work was supported by the Center of Excellence for Marine Bioenvironment and Biotechnology (CMBB) in National Taiwan Ocean University and the National Science Council in Taiwan R.O.C. (NSC 99-2627-B-019 -007 to T.-W. Pai; NSC99-2627-B-039-002 and CMU98-CT-28 to H.-T. Chang; NSC99-2627-B-007-001 to M. D.-T. Chang).
This article has been published as part of BMC Systems Biology Volume 5 Supplement 1, 2011: Selected articles from the 4th International Conference on Computational Systems Biology (ISB 2010). The full contents of the supplement are available online at http://www.biomedcentral.com/1752-0509/5?issue=S1.
- Semenza GL: Life with oxygen. Science. 2007, 318: 62-64. 10.1126/science.1147949.
- Wenger RH: Cellular adaptation to hypoxia: O2-sensing protein hydroxylases, hypoxia-inducible transcription factors, and O2-regulated gene expression. FASEB J. 2002, 16: 1151-1162. 10.1096/fj.01-0944rev.
- Mahon PC, Hirota K, Semenza GL: FIH-1: a novel protein that interacts with HIF-1alpha and VHL to mediate repression of HIF-1 transcriptional activity. Genes Dev. 2001, 15: 2675-2686. 10.1101/gad.924501.
- Ke Q, Costa M: Hypoxia-inducible factor-1 (HIF-1). Mol Pharmacol. 2006, 70: 1469-1480. 10.1124/mol.106.027029.
- Jiang BH, Rue E, Wang GL, Roe R, Semenza GL: Dimerization, DNA binding, and transactivation properties of hypoxia-inducible factor 1. J Biol Chem. 1996, 271: 17771-17778. 10.1074/jbc.271.30.17771.
- Wenger RH, Stiehl DP, Camenisch G: Integration of oxygen signaling at the consensus HRE. Sci STKE. 2005, 2005: re12. 10.1126/stke.3062005re12.
- Chun YS, Kim MS, Park JW: Oxygen-dependent and -independent regulation of HIF-1alpha. J Korean Med Sci. 2002, 17: 581-588.
- Safran M, Kaelin WG: HIF hydroxylation and the mammalian oxygen-sensing pathway. J Clin Invest. 2003, 111: 779-783.
- Yeo EJ, Chun YS, Park JW: New anticancer strategies targeting HIF-1. Biochem Pharmacol. 2004, 68: 1061-1069. 10.1016/j.bcp.2004.02.040.
- Gough NR, Adler EM, Foley JF: Cell Signaling: Details, details, details. Sci STKE. 2007, 2007: eg9. 10.1126/stke.4072007eg9.
- Cakmak A, Ozsoyoglu G: Mining biological networks for unknown pathways. Bioinformatics. 2007, 23: 2775-2783. 10.1093/bioinformatics/btm409.
- Semenza G: Signal transduction to hypoxia-inducible factor 1. Biochem Pharmacol. 2002, 64: 993-998. 10.1016/S0006-2952(02)01168-1.
- Semenza GL: Hypoxia-inducible factor 1 (HIF-1) pathway. Sci STKE. 2007, 2007: cm8. 10.1126/stke.4072007cm8.
- Philippe H, Derelle R, Lopez P, Pick K, Borchiellini C, Boury-Esnault N, Vacelet J, Renard E, Houliston E, Queinnec E: Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 2009, 19: 706-712. 10.1016/j.cub.2009.02.052.
- Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV: The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 2001, 29: 22-28. 10.1093/nar/29.1.22.
- Tsuru T, Kawai M, Mizutani-Ui Y, Uchiyama I, Kobayashi I: Evolution of paralogous genes: Reconstruction of genome rearrangements through comparison of multiple genomes within Staphylococcus aureus. Mol Biol Evol. 2006, 23: 1269-1285. 10.1093/molbev/msk013.
- Raes J, Van de Peer Y: Gene duplication, the evolution of novel gene functions, and detecting functional divergence of duplicates in silico. Appl Bioinformatics. 2003, 2: 91-101.
- Yosef N, Sharan R, Noble WS: Improved network-based identification of protein orthologs. Bioinformatics. 2008, 24: i200-206. 10.1093/bioinformatics/btn277.
- Matys V, Fricke E, Geffers R, Gößling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV: TRANSFAC®: Transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 2003, 31: 374-378. 10.1093/nar/gkg108.
- Wingender E, Dietze P, Karas H, Knüppel R: TRANSFAC: A database on transcription factors and their DNA binding sites. Nucleic Acids Res. 1996, 24: 238-241. 10.1093/nar/24.1.238.
- Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, Yamanishi Y: KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008, 36: D480-484. 10.1093/nar/gkm882.
- Wixon J, Kell D: The Kyoto encyclopedia of genes and genomes--KEGG. Yeast. 2000, 17: 48-55. 10.1002/(SICI)1097-0061(200004)17:1<48::AID-YEA2>3.0.CO;2-H.
- Sen N, Mishra M, Khan F, Meena A, Sharma A: D-MATRIX: A web tool for constructing weight matrix of conserved DNA motifs. Bioinformation. 2009, 3: 415-418.
- Workman CT, Yin Y, Corcoran DL, Ideker T, Stormo GD, Benos PV: enoLOGOS: a versatile web tool for energy normalized sequence logos. Nucleic Acids Res. 2005, 33: W389-392. 10.1093/nar/gki439.
- Elnitski L, Jin VX, Farnham PJ, Jones SJ: Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques. Genome Res. 2006, 16: 1455-1464. 10.1101/gr.4140006.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.