A regulatory similarity measure using the location information of transcription factor binding sites in Saccharomyces cerevisiae
© Wu et al.; licensee BioMed Central Ltd. 2014
- Published: 12 December 2014
Defining a measure for the regulatory similarity (RS) of two genes is an important step toward identifying co-regulated genes. To date, transcription factor binding sites (TFBSs) have been widely used to measure the RS of two genes because the binding of transcription factors (TFs) to TFBSs in promoters is the most crucial and best understood step in gene regulation. However, existing TFBS-based RS measures treat the relation of a TFBS to a gene as a Boolean (either 'presence' or 'absence') and do not use the locations of TFBSs in promoters.
Functional TFBSs of many TFs in yeast are known to have a strong positional preference for a small region of the promoter. This biological knowledge prompts us to develop a novel RS measure that exploits the TFBS location information. The performance of each RS measure is evaluated by the fraction of gene pairs that are co-regulated (validated by literature evidence) by at least one common TF under different RS scores. The experimental results show that the proposed RS measure is the best co-regulation indicator among the six compared RS measures. In addition, the co-regulated genes identified by the proposed RS measure are shown to benefit three co-regulation-based applications: detecting gene co-function, gene co-expression and protein-protein interactions.
The proposed RS measure is a good indicator of gene co-regulation. Moreover, its good performance reveals the importance of location information in TFBS-based RS measures.
- Gene Pair
- Transcription Factor Binding Site
- Regulatory Neighborhood
- Regulatory Similarity
- Yeast Gene
Identification of co-regulated genes is helpful for solving many biological problems, such as unraveling the molecular mechanisms underlying specific cellular functions, identifying functionally related proteins and dissecting gene regulatory networks [1–3]. The first step toward identifying co-regulated genes is to define the regulatory similarity (i.e., the degree of co-regulation) of two genes. Gene regulation is a complex process that involves various mechanisms: transcription factor (TF) binding, miRNA binding, epigenetic modifications, etc. Various data related to these mechanisms, such as TF binding sites, miRNA binding sites and histone modification patterns, are now available for studying gene regulation. Among them, TF binding sites (TFBSs) have been the most widely used, because the binding of TFs to TFBSs in promoters is the most crucial and best understood step in gene regulation.
To date, many studies have used TFBS data to measure the regulatory similarity (RS) of two genes [4–8]. However, existing TFBS-based RS measures treat the relation of a TFBS to a gene as a Boolean (either 'presence' or 'absence') and do not use the TFBS locations. In yeast and human, functional TFBSs of many TFs are known to have a strong positional preference for a small region of the promoter [9, 10]. This biological knowledge prompts us to develop a novel RS measure that exploits the TFBS location information. Following Allocco et al.'s approach, the performance of each RS measure is evaluated by the fraction of gene pairs that are co-regulated (validated by the literature evidence deposited in the YEASTRACT database) by at least one common TF under different RS scores. The experimental results show that the proposed RS measure is the best co-regulation indicator among the six compared RS measures. In addition, the co-regulated genes identified by the proposed RS measure are shown to benefit three co-regulation-based applications: detecting gene co-function, gene co-expression and protein-protein interactions.
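The evaluation criterion described above can be illustrated with a short Python sketch. It is not the authors' code: the function name, the dictionary layouts and the choice to apply "under different RS scores" as a minimum-score threshold are assumptions made for illustration only.

```python
def coregulated_fraction(rs_scores, documented_regulators, thresholds):
    """Fraction of gene pairs sharing at least one documented TF.

    rs_scores: dict mapping (gene_a, gene_b) -> RS score (hypothetical layout).
    documented_regulators: dict mapping gene -> set of TFs that the
        literature evidence reports as regulators of that gene.
    thresholds: iterable of RS cutoffs; a pair is selected when its
        score is >= the cutoff (an assumption, not stated in the text).
    Returns a dict: threshold -> fraction, or None if no pair is selected.
    """
    result = {}
    for th in thresholds:
        selected = [pair for pair, score in rs_scores.items() if score >= th]
        if not selected:
            result[th] = None
            continue
        # A pair counts as co-regulated if the two genes share >= 1 documented TF.
        hits = sum(
            1
            for a, b in selected
            if documented_regulators.get(a, set()) & documented_regulators.get(b, set())
        )
        result[th] = hits / len(selected)
    return result
```

A good co-regulation indicator should give a fraction that increases as the threshold rises.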
This study proposes a novel RS measure using the TFBS location information. This section first describes the datasets used in this study and five existing TFBS-based RS measures followed by the proposed RS measure.
Following previous studies, the promoter of a yeast gene is defined in this study as the intergenic region between the gene and its nearest non-overlapping upstream gene [13–18]. The genomic locations of the start and stop codons of 6604 genes of Saccharomyces cerevisiae (the budding yeast) were retrieved from Nagalakshmi et al.'s work. The genomic locations of 422576 TFBSs of 163 yeast TFs were collected from the SwissRegulon database, which provides high-quality TFBS datasets predicted using Bayesian probabilistic analysis. Users can choose different posterior probability cutoffs to control the quality of the retrieved TFBSs. This study adopted a moderate cutoff of 0.5; the influence of TFBS quality on the proposed RS measure is discussed in a separate section.
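As a rough illustration of the promoter definition and the posterior probability cutoff, the following Python sketch may help. The `Gene` record, the `posterior` field name and the inclusive `>=` comparison are hypothetical choices, not details taken from SwissRegulon or from this study.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    chrom: str
    start: int   # leftmost coordinate (inclusive)
    end: int     # rightmost coordinate (inclusive)
    strand: str  # "+" or "-"

def promoter_region(gene, genes):
    """Intergenic region between `gene` and its nearest non-overlapping
    upstream gene, as (left, right) coordinates; None if there is none."""
    others = [g for g in genes if g.chrom == gene.chrom and g.name != gene.name]
    if gene.strand == "+":
        # Upstream of a "+" gene lies at smaller coordinates.
        upstream = [g for g in others if g.end < gene.start]
        if not upstream:
            return None
        nearest = max(upstream, key=lambda g: g.end)
        left, right = nearest.end + 1, gene.start - 1
    else:
        # Upstream of a "-" gene lies at larger coordinates.
        upstream = [g for g in others if g.start > gene.end]
        if not upstream:
            return None
        nearest = min(upstream, key=lambda g: g.start)
        left, right = gene.end + 1, nearest.start - 1
    return (left, right) if left <= right else None

def filter_tfbs(sites, cutoff=0.5):
    """Keep TFBS records whose posterior probability reaches the cutoff."""
    return [s for s in sites if s["posterior"] >= cutoff]
```

Genes that overlap the gene of interest are skipped when searching upstream, which matches the "non-overlapping" wording of the definition.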
Existing TFBS-based RS measures
Five existing TFBS-based RS measures
Garten et al.
Veerla and Höglund
Shalgi et al.
Park et al.
The proposed RS measure
Equations (1)-(5) treat the relation of a TFBS to a gene as a Boolean (either 'presence' or 'absence') and do not use the locations of TFBSs in the promoters. Because the biological relevance of TFBSs is strongly related to their locations in the promoters [9, 10], we introduce the TFBS location information into the RS measure as follows:
, Eq. (6)
Small TFBS offset distances imply high regulatory similarity
This study is motivated by the biological knowledge that functional TFBSs of many TFs in yeast have a strong positional preference in the promoters. Because the critical promoter regions that make TFBSs functional are unknown, Eq. (6) is based on a derived hypothesis: if the offset distance between two TFBSs of a common TF in two genes' promoters is small, the two TFBSs are likely to both lie in the critical region and therefore to be co-functional. To test this hypothesis, the relation between co-functionality and the TFBS offset distance was analyzed as follows. As shown in Figure 1, a TFBS offset distance can be computed given a TF t and two genes a and b, denoted as a <t, a, b> tuple. In this analysis, the co-functionality associated with a TFBS offset distance was defined as the ratio of the tuples for which the literature evidence collected by the YEASTRACT database showed that TF t regulates both a and b to all tuples with that offset distance. The detailed steps are listed below:
1. For a TF t, all gene pairs <a, b> whose promoters both contain a TFBS of t were collected.
2. The TFBS offset distance (d_i in Figure 1) of t relative to <a, b> was calculated.
3. The tuple <t, a, b> was stored in the bucket B_d, where d is the TFBS offset distance of <t, a, b>.
4. After steps 1-3 were repeated for all TFs, each bucket contained all tuples having the same TFBS offset distance.
5. Finally, for each bucket B_d, the ratio of the tuples for which the literature evidence showed that TF t regulates both a and b to all tuples in the bucket was plotted against d.
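The bucket analysis listed above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. In particular, Figure 1 is not reproduced here, so the sketch assumes that TFBS positions are expressed as distances from each gene's start codon and that the offset distance of a <t, a, b> tuple is the smallest absolute difference between any TFBS position of t in the two promoters. All function names and data layouts are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def offset_distance(positions_a, positions_b):
    """Smallest absolute position difference between the TFBSs of one TF
    in two promoters (assumed definition; both lists must be non-empty)."""
    return min(abs(pa - pb) for pa in positions_a for pb in positions_b)

def cofunctionality_by_distance(tfbs_positions, documented_targets):
    """Ratio of literature-supported tuples per offset distance.

    tfbs_positions: dict tf -> dict gene -> list of TFBS positions.
    documented_targets: dict tf -> set of genes that the literature
        evidence reports as regulated by that TF.
    Returns a dict: offset distance d -> ratio within bucket B_d.
    """
    buckets = defaultdict(list)  # d -> one boolean per <t, a, b> tuple
    for tf, by_gene in tfbs_positions.items():
        targets = documented_targets.get(tf, set())
        # Step 1: all gene pairs whose promoters contain a TFBS of this TF.
        for a, b in combinations(sorted(by_gene), 2):
            # Step 2: offset distance; step 3: store the tuple in bucket B_d.
            d = offset_distance(by_gene[a], by_gene[b])
            buckets[d].append(a in targets and b in targets)
    # Step 5: ratio of supported tuples to all tuples in each bucket.
    return {d: sum(flags) / len(flags) for d, flags in sorted(buckets.items())}
```

Under the hypothesis stated above, the returned ratio would be expected to be higher for small d than for large d.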
The proposed RS measure is a good co-regulation indicator
Significance of performance difference of the proposed RS measure against five methods
5.36 × 10^-244
Veerla and Höglund
3.23 × 10^-83
Garten et al.
4.82 × 10^-213
Park et al.
4.88 × 10^-231
Shalgi et al.
8.04 × 10^-137
The effects of TFBS qualities
The SwissRegulon database, whose TFBS data were used in this study, provides a posterior probability parameter that lets users control the quality of the retrieved TFBSs. In fact, most resources of TFBS locations provide parameters such as ChIP-chip p-value and phylogenetic conservation and let users choose the most appropriate values for their applications [13, 17, 21]. This section examines whether TFBS quality affects the performance of the proposed RS measure and, if so, which TFBS quality settings are recommended.
TFBS qualities and quantities
Co-regulated genes of CCT8 identified by the proposed RS measure
RPN8, THI12, GTF1, GBP2, NOP7, YOR262W, NUP84, MDM32, TMA108, NUP85, URB2, MSO1
THR4, PRE8, SEC65, ISN1
RCF1, MRPL16, TIF11, RPN3, CYM1, YGL010W, URA7, RPA12, YNL144W-A, SCL1, EMC4
CSH1, YLR030W, RPL15A
Ranks of RPN8 and RSC1 against CCT8
Veerla and Höglund
Garten et al.
Park et al.
Shalgi et al.
To justify the correctness of the rank order, the biological relevance of the common TFs was analyzed. In this study, a TF is defined as biologically relevant to a gene if the literature evidence obtained from the YEASTRACT database shows that the TF regulates the gene. In Figure 5, all TFs with small TFBS offset distances are biologically relevant to both target genes (Rpn4 and Abf1 to both CCT8 and RPN8 in (a) and Abf1 to both CCT8 and RSC1 in (b)). Furthermore, none of the other TFs, which have large TFBS offset distances, is relevant to both downstream genes simultaneously. This supports the correctness of the proposed RS measure as well as the importance of incorporating the TFBS location information.
Good RS measure benefits co-regulation-based applications
Co-regulated genes are considered to influence many biological behaviors, and co-regulation measures have been used in various applications [22, 23]. The section "The proposed RS measure is a good co-regulation indicator" shows that the proposed RS measure is a better co-regulation indicator than the five competitors. This section examines whether this advantage leads to better results in three co-regulation-based applications: detecting gene co-function, gene co-expression and protein-protein interactions.
In this study, the scenario of detecting gene co-function, gene co-expression and protein-protein interactions using gene co-regulation was designed as follows. First, a user has a target gene of interest. The RS score of the target gene against each gene in the genome is calculated. The n genes with the highest RS scores are called the regulatory neighborhood (RN) of the target gene, and n is called the neighborhood size. Then the degree of co-function of the RN is evaluated using the functional enrichment score proposed by Reimand et al., denoted as FES in this study. In FES, genes are considered to perform similar biological functions if they have similar Gene Ontology (GO) terms. The degree of co-expression of the RN is evaluated by the co-expression score proposed by Yang and Wu, denoted as CES in this study. CES is the average of the pairwise expression correlations in the RN. The degree of protein-protein interactions of the RN is evaluated by the interaction enrichment score proposed by Reimand et al., denoted as IES in this study. IES measures the tendency of an RN to form protein complex modules.
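The construction of a regulatory neighborhood and the CES computation can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions: the text only says that CES is the average of the pairwise expression correlations, so the use of the Pearson correlation here is an assumption, and the function names and data layouts are hypothetical. FES and IES are not sketched because their definitions are not given in this text.

```python
from itertools import combinations
from math import sqrt

def regulatory_neighborhood(target, rs_to_target, n):
    """Return the n genes with the highest RS score against `target`.

    rs_to_target: dict gene -> RS score against the target gene.
    """
    candidates = [(g, s) for g, s in rs_to_target.items() if g != target]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [g for g, _ in candidates[:n]]

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x))
    sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (sd_x * sd_y)

def co_expression_score(genes, expression):
    """Average pairwise correlation of expression profiles in an RN.

    expression: dict gene -> list of expression values (one per condition).
    Requires at least two genes.
    """
    pairs = list(combinations(genes, 2))
    return sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)
```

For a target gene, `regulatory_neighborhood` would be called once with the RS scores of all other genes, and `co_expression_score` would then be applied to the returned list for each neighborhood size n of interest.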
Comparison of six regulatory similarities on three applications
Veerla and Höglund
Garten et al.
Park et al.
Shalgi et al.
This study proposed a novel measure that computes the regulatory similarity (RS) of two genes using the location information of transcription factor binding sites. Based on the documented regulatory associations between TFs and genes in the YEASTRACT database, this study has shown that the proposed RS measure is a good co-regulation indicator. Furthermore, its good performance can benefit three co-regulation-based applications. The proposed RS measure will be helpful for unraveling the molecular mechanisms underlying specific cellular functions and dissecting gene regulatory networks.
This work was supported by the Ministry of Science and Technology of Taiwan.
The publication charges of this article were funded by the Ministry of Science and Technology of Taiwan grant NSC 102-2221-E-006-085-MY2.
This article has been published as part of BMC Systems Biology Volume 8 Supplement 5, 2014: Proceedings of the 25th International Conference on Genome Informatics (GIW/ISCB-Asia): Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/8/S5.
1. Terai G, Takagi T, Nakai K: Prediction of co-regulated genes in Bacillus subtilis on the basis of upstream elements conserved across three closely related species. Genome Biol. 2001, 2 (11): research0048.0001-research0048.0012.
2. Polanski K, Rhodes J, Hill C, Zhang P, Jenkins DJ, Kiddle SJ, Jironkin A, Beynon J, Buchanan-Wollaston V, Ott S: Wigwams: identifying gene modules co-regulated across multiple biological conditions. Bioinformatics. 2014, 30 (7): 962-970. 10.1093/bioinformatics/btt728.
3. Lin TW, Wu JW, Chang DTH: Combining phylogenetic profiling-based and machine learning-based techniques to predict functional related proteins. PLoS One. 2013, 8 (9): e75940. 10.1371/journal.pone.0075940.
4. Garten Y, Kaplan S, Pilpel Y: Extraction of transcription regulatory signals from genome-wide DNA-protein interaction data. Nucleic Acids Research. 2005, 33 (2): 605-615. 10.1093/nar/gki166.
5. Veerla S, Höglund M: Analysis of promoter regions of co-expressed genes identified by microarray analysis. BMC Bioinformatics. 2006, 7 (1): 384. 10.1186/1471-2105-7-384.
6. Shalgi R, Lieber D, Oren M, Pilpel Y: Global and local architecture of the mammalian microRNA-transcription factor regulatory network. PLoS Computational Biology. 2007, 3 (7): e131. 10.1371/journal.pcbi.0030131.
7. Park PJ, Butte AJ, Kohane IS: Comparing expression profiles of genes with similar promoter regions. Bioinformatics. 2002, 18 (12): 1576-1584. 10.1093/bioinformatics/18.12.1576.
8. Van Helden J: Metrics for comparing regulatory sequences on the basis of pattern counts. Bioinformatics. 2004, 20 (3): 399-406. 10.1093/bioinformatics/btg425.
9. Hansen L, Mariño-Ramírez L, Landsman D: Many sequence-specific chromatin modifying protein-binding motifs show strong positional preferences for potential regulatory regions in the Saccharomyces cerevisiae genome. Nucleic Acids Research. 2010, 38 (6): 1772-1779. 10.1093/nar/gkp1195.
10. Tabach Y, Brosh R, Buganim Y, Reiner A, Zuk O, Yitzhaky A, Koudritsky M, Rotter V, Domany E: Wide-scale analysis of human functional transcription factor binding reveals a strong bias towards the transcription start site. PLoS One. 2007, 2 (8): e807. 10.1371/journal.pone.0000807.
11. Allocco DJ, Kohane IS, Butte AJ: Quantifying the relationship between co-expression, co-regulation and gene function. BMC Bioinformatics. 2004, 5 (1): 18. 10.1186/1471-2105-5-18.
12. Teixeira MC, Monteiro P, Jain P, Tenreiro S, Fernandes AR, Mira NP, Alenquer M, Freitas AT, Oliveira AL, Sá-Correia I: The YEASTRACT database: a tool for the analysis of transcription regulatory associations in Saccharomyces cerevisiae. Nucleic Acids Research. 2006, 34 (suppl 1): D446-D451.
13. MacIsaac KD, Wang T, Gordon DB, Gifford DK, Stormo GD, Fraenkel E: An improved map of conserved regulatory sites for Saccharomyces cerevisiae. BMC Bioinformatics. 2006, 7 (1): 113. 10.1186/1471-2105-7-113.
14. Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science. 2002, 298 (5594): 799-804. 10.1126/science.1075090.
15. Simon I, Barnett J, Hannett N, Harbison CT, Rinaldi NJ, Volkert TL, Wyrick JJ, Zeitlinger J, Gifford DK, Jaakkola TS: Serial regulation of transcriptional regulators in the yeast cell cycle. Cell. 2001, 106 (6): 697-708. 10.1016/S0092-8674(01)00494-9.
16. Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J: Transcriptional regulatory code of a eukaryotic genome. Nature. 2004, 431 (7004): 99-104. 10.1038/nature02800.
17. Chang DTH, Huang CY, Wu CY, Wu WS: YPA: an integrated repository of promoter features in Saccharomyces cerevisiae. Nucleic Acids Research. 2011, 39 (suppl 1): D647-D652.
18. Chang DTH, Li WS, Bai YH, Wu WS: YGA: Identifying distinct biological features between yeast gene sets. Gene. 2013, 518 (1): 26-34. 10.1016/j.gene.2012.11.089.
19. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008, 320 (5881): 1344-1349. 10.1126/science.1158441.
20. Pachkov M, Erb I, Molina N, Van Nimwegen E: SwissRegulon: a database of genome-wide annotations of regulatory sites. Nucleic Acids Research. 2007, 35 (suppl 1): D127-D131.
21. Tsai HK, Chou MY, Shih CH, Huang GTW, Chang TH, Li WH: MYBS: a comprehensive web server for mining transcription factor binding sites in yeast. Nucleic Acids Research. 2007, 35 (suppl 2): W221-W226.
22. Bhardwaj N, Lu H: Correlation between gene expression profiles and protein-protein interactions within and across genomes. Bioinformatics. 2005, 21 (11): 2730-2738. 10.1093/bioinformatics/bti398.
23. Gyenesei A, Wagner U, Barkow-Oesterreicher S, Stolte E, Schlapbach R: Mining co-regulated gene profiles for the detection of functional associations in gene expression data. Bioinformatics. 2007, 23 (15): 1927-1935. 10.1093/bioinformatics/btm276.
24. Reimand J, Vaquerizas JM, Todd AE, Vilo J, Luscombe NM: Comprehensive reanalysis of transcription factor knockout expression data in Saccharomyces cerevisiae reveals many new targets. Nucleic Acids Research. 2010, 38 (14): 4768-4777. 10.1093/nar/gkq232.
25. Gene Ontology Consortium: The gene ontology: enhancements for 2011. Nucleic Acids Research. 2012, 40 (D1): D559-D564.
26. Yang TH, Wu W-S: Identifying biologically interpretable transcription factor knockout targets by jointly analyzing the transcription factor knockout microarray and the ChIP-chip data. BMC Systems Biology. 2012, 6 (1): 102. 10.1186/1752-0509-6-102.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.