Volume 6 Supplement 1
The preservation of bidirectional promoter architecture in eukaryotes: what is the driving force?
© Xu et al.; licensee BioMed Central Ltd. 2012
Published: 16 July 2012
The bidirectional gene architecture has been studied in many organisms, and the conservation of the bidirectional arrangement has received considerable attention. However, the evolutionary conservation of this genomic structure is still insufficiently explained. In this study, we performed large-scale identification and pathway enrichment analysis of bidirectional genes in several eukaryotes and compared this arrangement between human and mouse in order to identify the driving force behind the preservation of this genomic structure.
We identified bidirectional gene pairs in eight species and found this structure to be prevalent in eukaryotes. Pathway enrichment analysis indicated that, at the genome level, bidirectional genes are consistently enriched in certain pathways, such as DNA repair and other fundamental cellular pathways. Comparing the expression and function of human and mouse bidirectional genes, we observed that the selective force acting on this architecture does not derive from co-regulation between paired genes, whereas the functional bias of bidirectional genes at the whole-genome level appears to have strengthened during evolution.
Our results confirmed the co-expression of bidirectional genes but failed to support their functional relatedness. The conservation of bidirectional promoters appears to result not from a functional connection between paired genes but from a functional bias at the whole-genome level, which implies that genome-wide functional constraint is important for the conservation of the bidirectional structure.
Bidirectional promoters, a special arrangement of neighbouring genes, have been discussed in many previous studies. Bidirectional gene pairs were defined as divergent genes whose transcription start sites (TSSs) are less than 1 kb apart. The frequency distribution of distances between adjacent gene pairs showed that bidirectional promoters are prevalent in the human genome. This architecture was later found to be abundant in mouse, Arabidopsis thaliana, yeast and many other species [2–4]. Comparative genomic analyses suggested that this gene-pair structure is conserved in vertebrates [2, 5, 6]. It was therefore believed that bidirectional promoters have special biological significance [2–4, 7].
Co-regulation was believed to be the distinctive feature of bidirectional gene pairs, and the similarity of their expression profiles may be explained by shared regulatory elements. Li et al. concluded that this genomic arrangement is ancient and conserved during evolution, and also reported functional relatedness within this structure. Other comparative genomic studies of bidirectional gene pairs have been performed [2, 6], but the reason for the conservation of this structure remains unclear. Our genome-wide comparison of the expression and functional attributes of bidirectional gene pairs between human and mouse may provide potential explanations for this question.
In this study, we first performed large-scale identification and pathway enrichment analysis of bidirectional gene pairs in several eukaryotes. We then analyzed the general evolutionary tendency of this architecture. We found a functional preference of bidirectional genes at the whole-genome level, and this preference was conserved among species. Functional relatedness between paired genes was excluded as the driving force for the preservation of bidirectional promoters. The functional bias of bidirectional genes at the whole-genome level is stronger in human than in mouse, which may point to the genuine origin of the conservation of the bidirectional architecture.
Bidirectional promoters are prevalent in eukaryotic genomes
Statistical results of bidirectional promoters and bidirectional genes in the eight selected eukaryotes.
Number of bidirectional promoters
Number of bidirectional genes
Number of all protein coding genes
Percentage of bidirectional genes
The prevalence of bidirectional promoters suggests that this genomic architecture, or the genes involved in it, may have special properties that led to its preservation during evolution. We attempted to provide a potential explanation through a comparative analysis of bidirectional promoters among species, especially between the human and mouse genomes.
Co-regulation of bidirectional gene pairs hardly determines the fate of bidirectional promoters
It has been shown experimentally that a bidirectional promoter sequence can regulate both divergent genes. Co-regulation of paired bidirectional genes is therefore expected. Whole-genome microarray analyses have confirmed that the co-expression level of paired bidirectional genes is significantly higher than that of other neighbouring gene structures [1, 5]. Significant functional relatedness has also been observed between the paired genes.
However, the earlier analyses have two potential shortcomings. First, tandem duplications, which may cause co-regulation of paired genes for trivial reasons, must be removed to isolate the influence of bidirectional promoters. Tandem duplicates, that is, genes duplicated in tandem, have high sequence similarity and tend to be similar not only in expression but also in function. Second, similar expression patterns of neighbouring genes have been reported in human, Drosophila and C. elegans [11, 13, 14], and chromatin-level gene regulation is thought to be the most probable explanation for this phenomenon. Consequently, to exclude the contribution of chromatin-level regulation, the co-regulation level of bidirectional genes should be compared with that of other adjacent gene architectures as well as with random gene pairs.
As Yanai and co-workers pointed out, similar gene expression profiles do not necessarily imply similar functions, so the co-expression of bidirectional genes may be driven not by biological function but by shared regulatory elements. The co-regulation of bidirectional gene pairs may therefore be a consequence of this genomic architecture rather than the criterion by which it is selected.
The functional preference of bidirectional genes increases with the selection of the bidirectional architecture
Enriched KEGG pathways of bidirectional genes in eight different species.
Nucleotide excision repair
Base excision repair
Systemic lupus erythematosus
O-Mannosyl glycan biosynthesis
Metabolic pathways (Global Pathway)
Basal transcription factors
SNARE interactions in vesicular transport
Citrate cycle (TCA cycle)
A more interesting finding is that the conserved pathways enriched in bidirectional genes tend to involve basic cellular functions. To confirm this observation, we evaluated the tissue specificity of gene expression using gene expression profiles. Expression variation across large sets of samples has been used to select housekeeping genes in many earlier studies; genes with lower expression variation among tissues are regarded as potential housekeeping genes [12, 20, 21]. The formula for gene expression specificity is given in Materials and Methods. The tissue specificity of bidirectional genes is significantly lower than that of other genes (Wilcoxon rank-sum test, p = 2.459E-13 for human and p = 2.338E-13 for mouse), which indicates that bidirectional genes are broadly expressed across tissues and tend to perform fundamental functions.
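The comparison above is a two-sample Wilcoxon rank-sum test on per-gene tissue-specificity scores. The following is a minimal standard-library sketch of that test using the normal approximation without tie correction; it is not the authors' code, and the function names `average_ranks` and `rank_sum_test` are hypothetical. In practice, a library routine such as SciPy's would normally be used instead.

```python
import math

def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    x: e.g. tissue-specificity scores of bidirectional genes (hypothetical input)
    y: scores of all other genes
    Returns (W, z, p), where W is the rank sum of sample x.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p
```

A negative z means the first sample (here, the bidirectional genes) tends to have lower scores. With many tied values the uncorrected variance makes the p-value slightly conservative.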
Enriched KEGG pathways of human cBIP and sBIP gene classes.
KEGG pathway ID
Nucleotide excision repair
Base excision repair
Inositol phosphate metabolism
Base excision repair
Nucleotide excision repair
p53 signaling pathway
Enriched KEGG pathways of mouse cBIP and sBIP gene classes.
KEGG pathway ID
Nucleotide excision repair
Fatty acid elongation in mitochondria
Limonene and pinene degradation
Valine, leucine and isoleucine degradation
Porphyrin and chlorophyll metabolism
In this study, we found that bidirectional gene pairs are prevalent in eukaryotes and that the percentage of bidirectional genes declines as genome size increases. During evolution, genome size has increased much faster than gene number, which can be attributed to two causes. First, the growing number and length of introns make genes longer. Second, intergenic distances have also expanded greatly. The expansion of intergenic distance inevitably reduces the percentage of bidirectional promoters.
Only protein-coding genes were considered in this work. Non-coding transcripts, which are pervasive in many organisms, were recently found to be enriched upstream of protein-coding genes and to share promoters with the adjacent genes [4, 25, 26]. However, most pervasive non-coding transcripts at bidirectional promoters are considered unstable and are degraded soon after synthesis, and their function is not yet clear. To validate the distance distribution of head-to-head transcript pairs, we added the non-coding genes listed in Ensembl and re-identified the bidirectional promoters. Although this gene collection does not include all transcripts, the resulting distribution is similar to the original one (Additional file 2), and the minor peak representing bidirectional promoters does increase. However, these transcripts are rarely quantified for expression or functionally annotated. Because our research focused on the evolutionary force acting on efficient bidirectional promoters, only bidirectional promoters driving two protein-coding genes were considered.
The co-regulation of bidirectional gene pairs, including co-expression and functional relatedness, has been reported in many studies [1, 5]. Here we confirmed the co-expression of bidirectional genes but not their co-function. Bidirectionality has been shown to be an inherent feature of promoters, and the proposed divergent transcription model likewise holds that genes are transcribed in both directions simultaneously. However, the GO similarity analysis did not support functional relatedness of paired genes: bidirectional genes are transcribed simultaneously but perform different functions in the cell. The co-regulation of bidirectional pairs may stem from the shared promoter, but it is unlikely to affect the selection of bidirectional promoters, because natural selection on gene order is based on functional relatedness, as in prokaryotic operons. The shuffling of bidirectional linkages between invertebrates and vertebrates also indicates that bidirectional structures are not maintained by co-regulation.
Cross-species pathway enrichment analysis showed that the functions of bidirectional genes are highly conserved in certain fundamental functional classes, such as DNA repair and transcription-related pathways, and this functional preference may increase with the selection of the bidirectional structure. Because bidirectionality is an inherent feature of promoters, the < 1 kb interval between head-to-head gene pairs can largely determine the co-regulation of the paired genes. We hypothesize that the nucleotide composition surrounding these genes may be the genuine trigger: the upstream genomic regions of these genes may be more stable and resist the insertion of non-coding DNA or other genes, which keeps the interval between adjacent genes short. This hypothesis requires further validation.
Identification of bidirectional promoters in eight eukaryotic genomes
The chromosomal positions and sequence information of all protein-coding genes were retrieved from the Ensembl database (Ensembl gene build 58) using the BioMart system for eight selected organisms: Homo sapiens, Mus musculus, Rattus norvegicus, Bos taurus, Gallus gallus, Drosophila melanogaster, Caenorhabditis elegans and Saccharomyces cerevisiae. The mitochondrial genome and unmapped fragments were excluded from the subsequent analysis. Gene start sites in the Ensembl gene annotation were regarded as reliable transcription start sites (TSSs), because full-length cDNAs were used to confirm the gene boundaries. The protein-coding genes on each chromosome were sorted by TSS coordinate. Neighbouring genes on the same strand were recognized as head-to-tail gene pairs, while neighbouring genes on opposite strands whose 5' ends face each other were recognized as head-to-head gene pairs. The distances between the TSSs of head-to-head genes were then calculated for each of the eight organisms.
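The identification procedure can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes each gene record stores its lower and upper genomic coordinates plus strand (as in a BioMart gene table), and that the TSS is the lower coordinate for plus-strand genes and the upper coordinate for minus-strand genes. Adjacent genes on opposite strands can face each other at their 5' ends (divergent, head-to-head) or at their 3' ends (convergent, tail-to-tail), so the sketch separates the two. The `Gene` class and all function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int   # lower genomic coordinate
    end: int     # upper genomic coordinate
    strand: int  # +1 or -1

    @property
    def tss(self) -> int:
        # Assumed 5' end: start for plus-strand genes, end for minus-strand genes.
        return self.start if self.strand == 1 else self.end

def classify_pair(a: Gene, b: Gene) -> str:
    """Orientation of neighbouring genes a, b with a.tss <= b.tss."""
    if a.strand == b.strand:
        return "head-to-tail"
    if a.strand == -1 and b.strand == 1:
        return "head-to-head"  # 5' ends face each other (divergent)
    return "tail-to-tail"      # 3' ends face each other (convergent)

def adjacent_pairs(genes):
    """Yield (orientation, TSS distance, a, b) for neighbours on each chromosome."""
    ordered = sorted(genes, key=lambda g: (g.chrom, g.tss))
    for _, chrom_genes in groupby(ordered, key=lambda g: g.chrom):
        chrom_list = list(chrom_genes)
        for a, b in zip(chrom_list, chrom_list[1:]):
            yield classify_pair(a, b), b.tss - a.tss, a, b

def bidirectional_pairs(genes, max_distance=1000):
    """Head-to-head pairs whose TSSs are less than `max_distance` bp apart."""
    return [(a.gene_id, b.gene_id, d)
            for kind, d, a, b in adjacent_pairs(genes)
            if kind == "head-to-head" and d < max_distance]
```

For example, a minus-strand gene with TSS at 5000 followed by a plus-strand gene with TSS at 5600 forms a head-to-head pair 600 bp apart and would be reported as a bidirectional pair under the 1 kb threshold.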
Removal of tandem duplication
As indicated in previous work, tandem duplication can contribute to local similarity of gene attributes, which substantially affects analyses of neighbouring-gene effects. We therefore removed tandem duplicates from the neighbouring gene pairs before the co-expression and co-function analyses. For each adjacent gene pair, the corresponding protein sequences were obtained from the Ensembl database (build 58) and compared by pairwise BLAST to obtain the e-value of their sequence similarity (standard settings, word size 2). This method with a cut-off of 0.2 has been shown to remove ~90% of related genes from a dataset. Here we used a smaller cut-off to reduce the false-positive rate: pairs with e-value < 0.01 were regarded as tandem duplicates and excluded from the subsequent gene pair similarity analysis.
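A minimal sketch of the filtering step follows, assuming the pairwise comparisons were exported in BLAST+ tabular format (`-outfmt 6`, where the 11th column is the e-value). The original work may have used a different BLAST front end; the function names are hypothetical. Pairs with no reported hit are kept, since no detectable similarity was found.

```python
def best_evalues(blast_tab_lines):
    """Map frozenset({query, subject}) -> smallest e-value in BLAST+ -outfmt 6 rows."""
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        key = frozenset((query, subject))
        if key not in best or evalue < best[key]:
            best[key] = evalue
    return best

def remove_tandem_duplicates(pairs, evalues, cutoff=0.01):
    """Drop adjacent (protein_a, protein_b) pairs whose best e-value is below `cutoff`."""
    kept = []
    for a, b in pairs:
        e = evalues.get(frozenset((a, b)))
        if e is not None and e < cutoff:
            continue  # treated as a tandem duplicate
        kept.append((a, b))
    return kept
```

Keying on an unordered `frozenset` makes the lookup independent of which protein was used as the query.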
Extraction of conserved and species-specific bidirectional gene pairs by orthologous linkage between human and mouse
If both genes of a human bidirectional pair have one-to-one orthologs in the mouse genome and the orthologous genes are also arranged in a bidirectional architecture, the pair was counted as a conserved bidirectional gene pair (human cBIP pair); all other pairs were classified as human-specific bidirectional gene pairs (human sBIP pairs). Likewise, mouse bidirectional gene pairs were divided into mouse cBIP and sBIP pairs using the human-mouse orthologous linkage. A total of 14024 one-to-one orthologous relationships between human and mouse were extracted from the Ensembl database via BioMart. As a result, 540 conserved and 270 human-specific bidirectional promoters were identified in human, compared with 540 and 207, respectively, in mouse.
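The cBIP/sBIP rule above can be expressed as a short function. This is an illustrative sketch with hypothetical names: `orthologs` is assumed to map each gene of the focal species to its one-to-one ortholog, and the other species' bidirectional pairs are compared as unordered pairs.

```python
def split_conserved(pairs, orthologs, other_pairs):
    """Split bidirectional pairs into conserved (cBIP) and species-specific (sBIP).

    pairs:       (gene_a, gene_b) bidirectional pairs in the focal species
    orthologs:   dict, focal gene -> one-to-one ortholog in the other species
    other_pairs: bidirectional pairs in the other species
    """
    other = {frozenset(p) for p in other_pairs}
    conserved, specific = [], []
    for a, b in pairs:
        oa, ob = orthologs.get(a), orthologs.get(b)
        # Conserved only if both genes have orthologs AND those orthologs
        # are themselves a bidirectional pair in the other genome.
        if oa is not None and ob is not None and frozenset((oa, ob)) in other:
            conserved.append((a, b))
        else:
            specific.append((a, b))
    return conserved, specific
```

Running it once with human as the focal species and once with mouse gives the four classes; the conserved counts should agree between the two directions, as they do in the numbers reported above.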
Pathway enrichment analysis of bidirectional genes
The terms in parentheses denote binomial coefficients (combinatorial counts). A pathway was considered enriched in bidirectional genes if its p-value was lower than 0.05.
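The enrichment formula itself did not survive extraction. A one-sided hypergeometric test is a common choice that is built from binomial coefficients, so the sketch below uses it as an assumption rather than as the authors' exact formula; the variable names N, M, n and k are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(N, M, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, M, n).

    N: annotated genes in the background
    M: background genes annotated to the pathway
    n: bidirectional genes in the background
    k: bidirectional genes annotated to the pathway
    """
    upper = min(M, n)
    # Sum the probabilities of observing k or more pathway genes among n draws.
    hits = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def is_enriched(N, M, n, k, alpha=0.05):
    """Apply the p < 0.05 criterion described above."""
    return hypergeom_enrichment_p(N, M, n, k) < alpha
```

Summing integer products before a single division keeps the result exact up to the final floating-point step. No multiple-testing correction is applied here, matching the plain 0.05 threshold stated in the text.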
Gene expression specificity and coexpression level
In the specificity formula, n represents the number of expression datasets, Emax the maximum expression value of the gene across all cell types, and Ei the gene's expression value in the i-th microarray experiment. In the human and mouse genomes, co-expression was then evaluated separately for the mapped head-to-head gene pairs, head-to-tail gene pairs and 20000 randomly generated gene pairs, as the Pearson correlation coefficient between the expression profiles of the two genes in each pair.
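The specificity formula was lost in extraction. The definitions of n, Emax and Ei match the widely used tau index, τ = Σi (1 − Ei/Emax) / (n − 1), which is used in the sketch below as an assumption, not as a confirmed reconstruction of the authors' formula. The sketch also computes the Pearson co-expression score and a random-pair background. All function names are hypothetical.

```python
import math
import random

def tissue_specificity(expr):
    """Assumed tau index: sum_i (1 - E_i / E_max) / (n - 1).

    0 means uniform expression across datasets; 1 means expression in only one.
    """
    n = len(expr)
    e_max = max(expr)
    if n < 2 or e_max <= 0:
        raise ValueError("need at least 2 datasets and a positive maximum")
    return sum(1 - e / e_max for e in expr) / (n - 1)

def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined for constant profiles

def random_pair_coexpression(profiles, n_pairs=20000, seed=0):
    """Pearson r for randomly drawn gene pairs, as a background distribution.

    profiles: dict, gene id -> list of expression values (same datasets per gene)
    """
    rng = random.Random(seed)
    genes = sorted(profiles)
    return [pearson(profiles[a], profiles[b])
            for a, b in (rng.sample(genes, 2) for _ in range(n_pairs))]
```

The correlation distributions for head-to-head, head-to-tail and random pairs can then be compared, for example with a rank-based test.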
Gene Ontology association analysis
The GO annotation of each gene was extracted from the Gene Ontology database. For each gene, the direct annotations were extended to general annotations by adding all ancestor terms of each directly annotated term in the GO vocabulary tree. Details of the Resnik semantic similarity algorithm are discussed in Li's work. For all annotated neighbouring gene pairs, functional similarity was then calculated in each of the three GO sub-ontologies ("biological process", "molecular function" and "cellular component") using csbl.go, an R package for computing semantic similarity based on Gene Ontology annotations.
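The two steps described above, propagating annotations to ancestor terms and scoring term pairs by Resnik similarity, are sketched below on a toy DAG. This does not reproduce csbl.go. The `parents` dictionary, the toy annotations and the max-over-term-pairs gene score (one of several possible aggregation rules) are assumptions for illustration.

```python
import math

def ancestors(term, parents):
    """Return `term` plus all its ancestors (true-path propagation)."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def information_content(annotations, parents):
    """IC(t) = -log p(t), p(t) = fraction of genes annotated to t after propagation."""
    counts = {}
    for terms in annotations.values():
        propagated = set().union(*(ancestors(t, parents) for t in terms))
        for t in propagated:
            counts[t] = counts.get(t, 0) + 1
    total = len(annotations)
    return {t: -math.log(c / total) for t, c in counts.items()}

def resnik(t1, t2, parents, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic.get(t, 0.0) for t in common), default=0.0)

def gene_similarity(g1, g2, annotations, parents, ic):
    """Hypothetical gene-level score: max Resnik similarity over direct term pairs."""
    return max((resnik(a, b, parents, ic)
                for a in annotations[g1] for b in annotations[g2]), default=0.0)

# Toy ontology and annotations (hypothetical).
PARENTS = {"root": [], "A": ["root"], "B": ["root"],
           "A1": ["A"], "A2": ["A"], "B1": ["B"]}
ANNOTATIONS = {"g1": {"A1"}, "g2": {"A2"}, "g3": {"B1"}, "g4": {"A1"}}
```

In this toy example, terms A1 and A2 share the ancestor A, so their similarity is IC(A) = log(4/3), whereas A1 and B1 share only the root, whose IC is zero.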
This work was supported by the National Natural Science Foundation of China (31170795, 91029703), International S&T Cooperation Program of Suzhou (SH201120) and the Major State Basic Research Development Program of China (2010CB945600).
This article has been published as part of BMC Systems Biology Volume 6 Supplement 1, 2012: Selected articles from The 5th IEEE International Conference on Systems Biology (ISB 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/6/S1.
- Trinklein ND, Aldred SF, Hartman SJ, Schroeder DI, Otillar RP, Myers RM: An abundance of bidirectional promoters in the human genome. Genome Res. 2004, 14 (1): 62-66.
- Koyanagi KO, Hagiwara M, Itoh T, Gojobori T, Imanishi T: Comparative genomics of bidirectional gene pairs and its implications for the evolution of a transcriptional regulation system. Gene. 2005, 353 (2): 169-176. 10.1016/j.gene.2005.04.027.
- Wang Q, Wan L, Li D, Zhu L, Qian M, Deng M: Searching for bidirectional promoters in Arabidopsis thaliana. BMC Bioinformatics. 2009, 10 (Suppl 1): S29. 10.1186/1471-2105-10-S1-S29.
- Neil H, Malabat C, d'Aubenton-Carafa Y, Xu Z, Steinmetz LM, Jacquier A: Widespread bidirectional promoters are the major source of cryptic transcripts in yeast. Nature. 2009, 457 (7232): 1038-1042. 10.1038/nature07747.
- Li YY, Yu H, Guo ZM, Guo TQ, Tu K, Li YX: Systematic analysis of head-to-head gene organization: evolutionary conservation and potential biological relevance. PLoS Comput Biol. 2006, 2 (7): e74. 10.1371/journal.pcbi.0020074.
- Yang MQ, Taylor J, Elnitski L: Comparative analyses of bidirectional promoters in vertebrates. BMC Bioinformatics. 2008, 9 (Suppl 6): S9. 10.1186/1471-2105-9-S6-S9.
- Adachi N, Lieber MR: Bidirectional gene organization: a common architectural feature of the human genome. Cell. 2002, 109 (7): 807-809. 10.1016/S0092-8674(02)00758-4.
- Simonoff JS: Smoothing methods in statistics. Springer-Verlag, New York. 1996
- Gregory TR, Nicol JA, Tamm H, Kullman B, Kullman K, Leitch IJ, Murray BG, Kapraun DF, Greilhuber J, Bennett MD: Eukaryotic genome size databases. Nucleic Acids Res. 2007, 35: D332-338. 10.1093/nar/gkl828.
- Skrabanek L, Wolfe KH: Eukaryote genome duplication - where's the evidence? Curr Opin Genet Dev. 1998, 8 (6): 694-700. 10.1016/S0959-437X(98)80039-7.
- Lercher MJ, Blumenthal T, Hurst LD: Coexpression of neighboring genes in Caenorhabditis elegans is mostly due to operons and duplicate genes. Genome Res. 2003, 13 (2): 238-243. 10.1101/gr.553803.
- Lercher MJ, Urrutia AO, Hurst LD: Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet. 2002, 31 (2): 180-183. 10.1038/ng887.
- Caron H, van Schaik B, van der Mee M, Baas F, Riggins G, van Sluis P, Hermus MC, van Asperen R, Boon K, Voute PA: The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science. 2001, 291 (5507): 1289-1292. 10.1126/science.1056794.
- Spellman PT, Rubin GM: Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol. 2002, 1 (1): 5. 10.1186/1475-4924-1-5.
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004, 101 (16): 6062-6067. 10.1073/pnas.0400782101.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
- Yanai I, Korbel JO, Boue S, McWeeney SK, Bork P, Lercher MJ: Similar gene expression profiles do not imply similar tissue functions. Trends Genet. 2006, 22 (3): 132-138. 10.1016/j.tig.2006.01.006.
- Liu Bingchuan, Chen Jiajia, Shen B: Genome-wide Analysis of the Transcription Factor Binding Preference of Human Bidirectional Promoters and Functional Annotation of the Related Gene Pairs. The Fourth International Conference on Computational Systems Biology (ISB2010). 2010, 81-92.
- Bird AP: Gene number, noise reduction and biological complexity. Trends Genet. 1995, 11 (3): 94-100. 10.1016/S0168-9525(00)89009-5.
- de Jonge HJ, Fehrmann RS, de Bont ES, Hofstra RM, Gerbens F, Kamps WA, de Vries EG, van der Zee AG, te Meerman GJ, ter Elst A: Evidence based selection of housekeeping genes. PLoS One. 2007, 2 (9): e898. 10.1371/journal.pone.0000898.
- Eisenberg E, Levanon EY: Human housekeeping genes are compact. Trends Genet. 2003, 19 (7): 362-365. 10.1016/S0168-9525(03)00140-9.
- Liao BY, Scott NM, Zhang J: Impacts of gene essentiality, expression pattern, and gene compactness on the evolutionary rate of mammalian proteins. Mol Biol Evol. 2006, 23 (11): 2072-2080. 10.1093/molbev/msl076.
- Haider S, Ballester B, Smedley D, Zhang J, Rice P, Kasprzyk A: BioMart Central Portal--unified access to biological data. Nucleic Acids Res. 2009, 37: W23-27. 10.1093/nar/gkp265.
- Lewin B: Genes 9. 2008, Jones and Bartlett Publishers
- Core LJ, Waterfall JJ, Lis JT: Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science. 2008, 322 (5909): 1845-1848. 10.1126/science.1162228.
- Seila AC, Calabrese JM, Levine SS, Yeo GW, Rahl PB, Flynn RA, Young RA, Sharp PA: Divergent transcription from active promoters. Science. 2008, 322 (5909): 1849-1851. 10.1126/science.1162253.
- Seila AC, Core LJ, Lis JT, Sharp PA: Divergent transcription: a new feature of active promoters. Cell Cycle. 2009, 8 (16): 2557-2564. 10.4161/cc.8.16.9305.
- Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K, Bragin E: Ensembl 2009. Nucleic Acids Res. 2009, 37: D690-697. 10.1093/nar/gkn828.
- Kasprzyk A, Keefe D, Smedley D, London D, Spooner W, Melsopp C, Hammond M, Rocca-Serra P, Cox T, Birney E: EnsMart: a generic system for fast and flexible access to biological data. Genome Res. 2004, 14 (1): 160-169.
- Curwen V, Eyras E, Andrews TD, Clarke L, Mongin E, Searle SM, Clamp M: The Ensembl automatic gene annotation system. Genome Res. 2004, 14 (5): 942-950. 10.1101/gr.1858004.
- Lercher MJ, Chamary JV, Hurst LD: Genomic regionality in rates of evolution is not explained by clustering of genes of comparable expression profile. Genome Res. 2004, 14 (6): 1002-1013. 10.1101/gr.1597404.
- Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000, 28 (1): 27-30. 10.1093/nar/28.1.27.
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003, 4 (2): 249-264. 10.1093/biostatistics/4.2.249.
- Gautier L, Cope L, Bolstad BM, Irizarry RA: affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004, 20 (3): 307-315. 10.1093/bioinformatics/btg405.
- Ovaska K, Laakso M, Hautaniemi S: Fast gene ontology based clustering for microarray experiments. BioData Min. 2008, 1 (1): 11. 10.1186/1756-0381-1-11.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.