The overlap percentage varies among different TFs and different genes
Although on average only 4% of the TF binding dataset overlaps with the TF knockout effect dataset, the percentage varies among TFs and among genes. As shown in Figure 1, the percentage ranges from 0% to 36% across TFs and from 0% to 100% across genes (see Additional file 1 for details). Identifying biological features associated with the overlap percentage may help explain why so few of the binding targets of a TF are affected when that TF is knocked out.
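The two kinds of overlap percentage can be illustrated with a minimal sketch. Equations (2) and (3) are not reproduced in this section, so the sketch assumes that the per-TF percentage is the share of a TF's binding targets that respond to its knockout, and that the per-gene percentage is the share of a gene's promoter-binding TFs whose knockout affects the gene. The function name, the pair-based input format and the toy data are hypothetical.

```python
from collections import defaultdict

def overlap_percentages(binding_pairs, knockout_pairs):
    """Return (per_tf, per_gene) overlap percentages.

    binding_pairs:  set of (tf, gene) pairs, TF binds the gene's promoter.
    knockout_pairs: set of (tf, gene) pairs, gene expression changes
                    when the TF is knocked out.
    Assumed definitions; the paper's Equations (2) and (3) may differ.
    """
    targets_of_tf = defaultdict(set)
    binders_of_gene = defaultdict(set)
    for tf, gene in binding_pairs:
        targets_of_tf[tf].add(gene)
        binders_of_gene[gene].add(tf)

    # Per TF: fraction of its binding targets affected by its knockout.
    per_tf = {
        tf: 100.0 * sum((tf, g) in knockout_pairs for g in genes) / len(genes)
        for tf, genes in targets_of_tf.items()
    }
    # Per gene: fraction of its bound TFs whose knockout affects the gene.
    per_gene = {
        gene: 100.0 * sum((tf, gene) in knockout_pairs for tf in tfs) / len(tfs)
        for gene, tfs in binders_of_gene.items()
    }
    return per_tf, per_gene

# Toy data (made up for illustration only).
binding = {("TF_A", "g1"), ("TF_A", "g2"), ("TF_A", "g3"), ("TF_A", "g4"),
           ("TF_B", "g1"), ("TF_B", "g5")}
knockout = {("TF_A", "g1"), ("TF_B", "g9")}

per_tf, per_gene = overlap_percentages(binding, knockout)
print(per_tf)
print(per_gene)
```

In the toy data, the knockout pair ("TF_B", "g9") is not a binding pair, so it does not count toward any overlap percentage.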
Functional redundancy of TFs explains why most binding targets of a transcription factor are not affected when the transcription factor is knocked out
To test whether functional redundancy may lead to one TF compensating for another, thus masking the knockout effect on the binding targets of the knocked-out TF, we define two sets of TFs. The first is the set of TFs with high functional redundancy, defined as those TFs whose functional redundancy, calculated using Equation (1), is among the top X% (X = 10, 20, 30, 40 or 50) of the 173 TFs under study. The second is the set of TFs with low functional redundancy, defined as those TFs whose functional redundancy is among the bottom X% of the 173 TFs under study. As shown in Figure 2, TFs with high functional redundancy show a significantly lower overlap percentage (calculated using Equation (2)) than do TFs with low functional redundancy, suggesting that functional redundancy may explain why most binding targets of a TF are not affected when the TF is knocked out. This result is robust to the choice of X (10, 20, 30, 40 or 50) and to the source of functional annotation terms (MIPS or GO).
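The top X% versus bottom X% comparison can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the scores and overlap percentages are made-up toy values, and the sketch reports group means rather than the significance test used for Figure 2, which is not specified in this section.

```python
from statistics import mean

def compare_extremes(score, overlap, x_percent):
    """Mean overlap percentage of the bottom X% and top X% of items by score.

    score:   dict mapping item (e.g. a TF) to its ranking score
             (e.g. functional redundancy).
    overlap: dict mapping item to its overlap percentage.
    Returns (mean overlap of bottom X%, mean overlap of top X%).
    """
    # Rank only the items that have both a score and an overlap percentage.
    ranked = sorted((k for k in score if k in overlap), key=lambda k: score[k])
    n = max(1, len(ranked) * x_percent // 100)  # group size, at least one item
    low, high = ranked[:n], ranked[-n:]
    return (mean(overlap[k] for k in low), mean(overlap[k] for k in high))

# Toy data: ten hypothetical TFs whose overlap falls as redundancy rises.
redundancy = {f"TF{i}": float(i) for i in range(1, 11)}
overlap = {f"TF{i}": float(22 - 2 * i) for i in range(1, 11)}

# Repeat the comparison for each cutoff to check robustness to X.
results = {x: compare_extremes(redundancy, overlap, x) for x in (10, 20, 30, 40, 50)}
for x, (low_mean, high_mean) in results.items():
    print(f"X={x}%: low-redundancy mean={low_mean}, high-redundancy mean={high_mean}")
```

The same helper applies unchanged to any ranked property, for example ranking genes by expression level or by transcriptional plasticity instead of ranking TFs by functional redundancy.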
Lowly expressed genes have lower overlap percentage
Since both the ChIP-chip and the TF knockout experiments were performed on yeast cells grown in rich media, we speculate that genes lowly expressed in rich media have a lower overlap between the TF binding dataset and the TF knockout effect dataset than do highly expressed genes. To test this speculation, we define two sets of genes. The first is the set of lowly expressed genes, defined as those genes whose expression levels are among the bottom X% (X = 10, 20, 30, 40 or 50) of the 4065 genes under study. The second is the set of highly expressed genes, defined as those genes whose expression levels are among the top X% of the 4065 genes under study (see Additional file 2 for details). The gene expression data for the rich media condition were downloaded from Holstege et al.'s study [18] and Nagalakshmi et al.'s study [19]. As shown in Figures 3a and 3b, lowly expressed genes show a significantly lower overlap percentage (calculated using Equation (3)) than highly expressed genes, suggesting that a low expression level is associated with a gene being insensitive to the knockout of its promoter-binding TFs. This result is robust to the choice of X (10, 20, 30, 40 or 50) and to the source of gene expression data (Holstege et al.'s study or Nagalakshmi et al.'s study).
Ribosomal genes are known to be highly transcribed in the rich media condition. If our finding is biologically meaningful, we expect ribosomal genes to have a higher overlap percentage than the rest of the 4065 genes under study. To test this expectation, we downloaded two lists of ribosomal genes: one from the KEGG ribosome pathway (sce03010) [20] and one from the MIPS functional category 12.01.01 (ribosomal proteins) [17]. As expected, ribosomal genes show a significantly higher overlap percentage (calculated using Equation (3)) than the rest of the 4065 genes under study (see Figures 3c and 3d), further strengthening our finding. This result is robust to the source of the ribosomal gene list (KEGG or MIPS).
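A comparison between a gene list and the remaining genes can be sketched with a one-sided permutation test on the difference in mean overlap percentage. This is a swapped-in technique chosen for illustration: the statistical test behind Figures 3c and 3d is not named in this section, and the function name, parameters and toy values below are hypothetical.

```python
import random

def permutation_p_value(in_set, rest, n_perm=2000, seed=0):
    """One-sided permutation p-value for mean(in_set) > mean(rest).

    in_set: overlap percentages of genes in the list (e.g. ribosomal genes).
    rest:   overlap percentages of all other genes.
    Group labels are shuffled n_perm times; the p-value is the share of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    k = len(in_set)
    observed = sum(in_set) / k - sum(rest) / len(rest)
    pooled = list(in_set) + list(rest)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Toy overlap percentages (made up for illustration only).
ribosomal = [30, 40, 35, 45, 50]
others = [0, 5, 10, 0, 5, 10, 0, 5]

p = permutation_p_value(ribosomal, others)
print(f"one-sided permutation p-value: {p:.4f}")
```

A permutation test makes no distributional assumption about the overlap percentages, which is convenient here because many genes have a percentage of exactly 0%.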
TATA box-less genes have lower overlap percentage
TATA box-less and TATA box-containing genes are known to be distinctly regulated [21]. TATA box-less genes tend to be housekeeping genes, have a sharply peaked TF binding site (TFBS) distribution and are constitutively expressed, whereas TATA box-containing genes are usually associated with environmental stress responses, have a dispersed TFBS distribution and are variably expressed under different conditions [21–25]. We therefore asked whether these two classes of genes differ in their overlap percentage. The lists of TATA box-less genes and TATA box-containing genes were downloaded from Basehoar et al.'s study [21]. Depending on how stringent the criterion for defining a TATA box is, Basehoar et al. [21] defined three possible lists of TATA box-containing genes. As shown in Figure 4, TATA box-less genes show a significantly lower overlap percentage (calculated using Equation (3)) than TATA box-containing genes, suggesting that lacking a TATA box is associated with a gene being insensitive to the knockout of its promoter-binding TFs. This result is robust to the criterion used for defining TATA box-containing genes.
Genes containing a nucleosome-free region (NFR) have lower overlap percentage
In yeast, the capacity of a gene to modulate its expression upon changing conditions (i.e., its transcriptional plasticity) correlates with the organization of its promoter nucleosomes [26]. Genes containing an NFR immediately upstream of the transcriptional start site (TSS) are characterized by low transcriptional plasticity, whereas genes lacking an NFR immediately upstream of the TSS are characterized by high transcriptional plasticity. We therefore asked whether these two classes of genes differ in their overlap percentage. The lists of genes containing and lacking an NFR were both downloaded from Tirosh and Barkai's study [26]. As shown in Figure 5a, genes containing an NFR show a significantly lower overlap percentage (calculated using Equation (3)) than genes lacking an NFR, suggesting that containing an NFR immediately upstream of the TSS is associated with a gene being insensitive to the knockout of its promoter-binding TFs.
Genes lacking an NFR are known to be subject to greater regulation by specific chromatin remodelling factors than are genes containing an NFR [26]. If our finding is biologically meaningful, we expect TFs involved in chromatin remodelling to have a higher overlap percentage than the rest of the 173 TFs under study. To test this expectation, we downloaded the list of TFs involved in chromatin remodelling from Ozonov and van Nimwegen's study [27]. As expected, TFs involved in chromatin remodelling show a significantly higher overlap percentage (calculated using Equation (2)) than the rest of the 173 TFs under study (see Figure 5b), further strengthening our finding.
Genes with low transcriptional plasticity have lower overlap percentage
We have shown that two classes of genes (TATA box-less genes and genes containing an NFR) have a lower overlap percentage. Since both classes are known to have low transcriptional plasticity [21, 26], we speculate that genes with low transcriptional plasticity have a lower overlap between the TF binding dataset and the TF knockout effect dataset than do genes with high transcriptional plasticity. To test this speculation, we define two sets of genes. The first is the set of genes with low transcriptional plasticity, defined as those genes whose transcriptional plasticity is among the bottom X% (X = 10, 20, 30, 40 or 50) of the 4065 genes under study. The second is the set of genes with high transcriptional plasticity, defined as those genes whose transcriptional plasticity is among the top X% of the 4065 genes under study (see Additional file 2 for details). The transcriptional plasticity of each gene in the yeast genome was downloaded from Lin et al.'s study [22]. As shown in Figure 5c, genes with low transcriptional plasticity show a significantly lower overlap percentage (calculated using Equation (3)) than do genes with high transcriptional plasticity, suggesting that low transcriptional plasticity is associated with a gene being insensitive to the knockout of its promoter-binding TFs. This result is robust to the choice of X (10, 20, 30, 40 or 50).
Several gene properties are not associated with the overlap percentage
In the previous sections, we showed that four gene properties (expression level, presence of a TATA box, presence of an NFR, and transcriptional plasticity) are associated with the overlap percentage. We also tested five other gene properties, none of which showed a statistically significant association with the overlap percentage: 5'UTR length, 3'UTR length, gene essentiality, number of physical interaction partners and number of genetic interaction partners.
More analyses motivated by Cusanovich et al.'s study
Cusanovich et al. [29] reported that functional TF binding is enriched in regulatory regions with a larger number of bound TFs and more binding sites. Moreover, functional TF binding tends to occur further from the TSS (i.e., in enhancer regions). Motivated by their findings, we performed additional analyses and made three observations: (i) a low number of bound TFs in a gene, (ii) a low number of TFBSs in a gene, and (iii) a short average distance of TFBSs to the TSS in a gene are all associated with a gene being insensitive to the knockout of its promoter-binding TFs (see Additional file 3 for details).