Identifying gene triplets whose regulatory patterns obey logic functions (Figure 1, first step)
We applied logic analysis to expression data to identify gene triplets whose regulation obeys one of the eight possible logic functions (Additional file 1: Table S1 and Figure 3). The analysis was applied to the data of Gasch et al. [10], which measure the expression of all Saccharomyces cerevisiae genes in response to various environmental stresses. Initially, we constructed two binary state vectors for each gene: one describes whether the gene is repressed and the other whether the gene is induced across the microarray experiments. Vectors were retained for analysis only if induction or repression was seen in at least 10% of experimental conditions (see Methods and Figure 2). This resulted in 2,969 (~25%) gene vectors, 45% of which represent the induced state and 55% the repressed state. Next, using these binary vectors, we analyzed all possible gene triplets. We identified about nine million potentially significant gene triplets, based on thresholds on the associated uncertainty coefficient (U) and P-value. These thresholds were chosen to filter out triplets that are related only by pairwise correlations between two genes (see the detailed description in the Methods section). Some of the gene triplets were significant under more than one type of logic function. In these cases we assigned to each gene triplet the most significant logic function, defined as the one with the highest U value. This assignment reduced the number of non-redundant triplets for further grouping and analysis to 5,241,065.
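The binarization, the 10% retention filter and the uncertainty coefficient can be illustrated with a minimal sketch. It assumes a symmetric log-ratio cutoff (the `threshold` value is hypothetical; the actual cutoffs are given in Methods) and the standard entropy-based definition U(C | f(A,B)) = [H(C) + H(f(A,B)) - H(C, f(A,B))] / H(C). All function names are illustrative and not taken from the original analysis.

```python
import math
from collections import Counter

def binarize(log_ratios, threshold=1.0):
    """Return (induced, repressed) binary vectors; `threshold` is an assumed cutoff."""
    induced = [1 if x >= threshold else 0 for x in log_ratios]
    repressed = [1 if x <= -threshold else 0 for x in log_ratios]
    return induced, repressed

def keep_vector(vec, min_fraction=0.10):
    """Retain a state vector only if it is 1 in at least 10% of the conditions."""
    return sum(vec) >= min_fraction * len(vec)

def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def uncertainty_coefficient(c, predictor):
    """U(c | predictor) = [H(c) + H(predictor) - H(c, predictor)] / H(c)."""
    h_c = entropy(c)
    if h_c == 0:
        return 0.0  # a constant vector carries no uncertainty to explain
    joint = list(zip(c, predictor))
    return (h_c + entropy(predictor) - entropy(joint)) / h_c

def logic_u(a, b, c, f):
    """U of c given the logic combination f(a, b), evaluated per experiment."""
    combined = [f(x, y) for x, y in zip(a, b)]
    return uncertainty_coefficient(c, combined)

# Toy state vectors (not real data): c is exactly a AND b.
a = [1, 1, 0, 0, 1, 0, 1, 0]
b = [1, 0, 1, 0, 1, 1, 0, 0]
c = [x & y for x, y in zip(a, b)]
u_triplet = logic_u(a, b, c, lambda x, y: x & y)
u_pair = uncertainty_coefficient(c, a)
```

In this toy case the triplet-level U is much higher than the pairwise U of c given a alone, which is the kind of gap the thresholds are meant to detect.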
The eight possible types of triplet logic relationships described earlier [7] occur with different frequencies (Additional file 1: Table S1). Four types, namely A AND !B (together with its mirror form !A AND B), A XOR B, A OR B, and A AND B, represented 53.5%, 30.6%, 15.2%, and 0.7% of the cases, respectively, while the remaining four types almost never occurred. We believe certain logic types are rare because the binary microarray data we use are relatively sparse (they contain many more zeros than ones). In most experiments both input genes are therefore in state 0, so a function with f(0,0) = 1 would require the third gene to be in state 1 in most experiments, which sparse vectors cannot satisfy. As a result, only logic functions where f(0,0) = 0 are observed often, whereas functions where f(0,0) = 1 are not. Additional file 2: Figure S1 contains example heat maps of triplets of genes that obey the AND and XOR logic functions.
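The sparsity argument can be made concrete with a short sketch. It assumes that the four rare types are the complements of the four frequent ones, and that the two input vectors are independent with the same fraction of ones. The density of 0.1 is an illustrative value, not one taken from the data.

```python
from itertools import product

# The four frequently observed types; each has f(0,0) = 0.
BASE = {
    "A AND B": lambda a, b: a & b,
    "A OR B": lambda a, b: a | b,
    "A XOR B": lambda a, b: a ^ b,
    "A AND !B": lambda a, b: a & (1 - b),
}

# Assumed remaining four types: the complements, each with f(0,0) = 1.
LOGIC = dict(BASE)
for name, f in BASE.items():
    LOGIC[f"!({name})"] = lambda a, b, f=f: 1 - f(a, b)

def expected_density(f, p):
    """Expected fraction of ones in f(A, B) for independent A, B with P(1) = p."""
    total = 0.0
    for a, b in product((0, 1), repeat=2):
        prob = (p if a else 1 - p) * (p if b else 1 - p)
        total += f(a, b) * prob
    return total

for name, f in LOGIC.items():
    print(f"{name:14s} f(0,0)={f(0, 0)}  expected density={expected_density(f, 0.1):.2f}")
```

Under these assumptions, every function with f(0,0) = 1 predicts an output vector that is mostly ones, which a sparse third-gene vector cannot match.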
Mapping gene triplets to multi-protein complexes (Figure 1, second step)
We mapped all gene triplets to complexes as described in Methods. We identified 40,521 triplets that were composed of genes that mapped to multi-protein complexes. Of these triplets, 412 (1%) mapped to a single multi-protein complex, 40,109 (99%) triplets mapped to at least two different complexes and about 90% mapped to three different complexes. As mentioned above, the U value was used to filter out gene triplets that are associated only by pairwise correlations (see Methods). That most gene triplets mapped to more than one complex supports our choice of this threshold.
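The classification of triplets by the number of distinct complexes they hit can be sketched as follows. The sketch assumes a simplified mapping in which each gene belongs to at most one complex; the gene identifiers and the `gene_to_complex` dictionary are hypothetical, and the actual mapping procedure is the one described in Methods.

```python
def distinct_complexes(triplet, gene_to_complex):
    """Number of distinct complexes hit by a gene triplet, or None if any gene is unmapped."""
    complexes = [gene_to_complex.get(gene) for gene in triplet]
    if any(c is None for c in complexes):
        return None
    return len(set(complexes))

# Hypothetical gene identifiers and complex assignments, for illustration only.
gene_to_complex = {
    "g1": "ribosome 60S",
    "g2": "ribosome 60S",
    "g3": "ribosome 40S",
    "g4": "eIF2B",
}
triplets = [("g1", "g2", "g3"), ("g1", "g3", "g4"), ("g1", "g2", "g5")]
counts = [distinct_complexes(t, gene_to_complex) for t in triplets]
```

Here the first triplet spans two complexes, the second spans three, and the third is discarded because one of its genes has no complex assignment.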
Grouping gene triplets that map to three complexes (Figure 1, third step)
Next we grouped together gene triplets obeying the same logic function and mapping to the same set of three complexes (Figure 1, third step). We restricted our analysis to two logic functions: XOR and AND. These two functions were abundant in our data and were judged to have more intuitive biological interpretations than the other logic types (Figure 3). The logic function AND yielded 397 triplets of protein complexes. For each triplet of complexes, we assessed significance from the number of gene triplets that map to these complexes, computing a P-value using the hypergeometric distribution (see Methods section). Of these 397 triplets of protein complexes, 102 (25.7%) are significant (P ≤ 0.05 after Bonferroni correction). A total of 15,915 triplets of protein complexes were related through the logic function XOR, of which 729 (4.6%) are significant (P ≤ 0.05 after Bonferroni correction).
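A generic hypergeometric upper-tail test with a Bonferroni cutoff can be sketched as below. How the population, the number of "successes" and the number of draws are defined for a triplet of complexes is specified in Methods, so the parameter names here are assumptions. Using 397 as the number of tests is likewise an assumption made only for the example.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible terms drop out of the sum automatically.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

def bonferroni_significant(p_value, n_tests, alpha=0.05):
    """True if the P-value passes alpha after Bonferroni correction for n_tests tests."""
    return p_value <= alpha / n_tests

# Toy numbers: population of 10, 3 successes, 4 draws, all 3 successes observed.
p = hypergeom_sf(3, N=10, K=3, n=4)
significant = bonferroni_significant(p, n_tests=397)
```

With these toy numbers the tail probability is 7/210, which does not pass a Bonferroni cutoff of 0.05/397.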
The significant triplets of protein complexes related through the logic functions AND and XOR include 69 and 159 different protein complexes, which are supported by 230 and 3,775 gene triplets, respectively. The genes composing the triplets encode a subset of the subunits of each complex. This may be explained by the incompleteness of the microarray data (missing measurements in specific experiments) and the strict parameters we chose. To check whether the subunits we identify are representative of the entire complex, we calculated the expression coherence between the subunits of a complex. We found that in all complexes that appear in our study, the expression was indeed coherent (Additional file 3: Table S2).
The list of all triplets of protein complexes which have coordinated regulation (under the AND and XOR logic functions) appears in Additional files 4 and 5: Tables S3 and S4. Below we discuss examples of the triplets of protein complexes whose synchronized regulation has been previously described in the literature, as well as novel predictions of co-regulation of complexes.
Regulation of protein translation, autophagy degradation and N-linked glycosylation - examples of triplet complexes that have coordinated regulation obeying the AND logic function
Figure 4 is a schematic representation of various multi-protein complexes involved in processes related to translation. The figure caption provides a brief description of the function of the complexes whose co-regulation is described in the following sections.
Ribosome large subunit - 60S, eIF2B initiation factor and RNA polymerase I/III
Our results reveal that the transcription of the 60S ribosomal large subunit decreases if and only if (IFF) the transcription of the eIF2B initiation factor AND that of RNA polymerase I/III decrease as well. The three RNA polymerase subunits that participate in this logic relation, RPB5, RPC19, and RPO26, are components of both polymerase I and III. Figure 5(A) shows the subset of experiments (outlined rectangle) where the transcription of all three complexes decreases. Indeed, co-regulation between complexes involved in ribosome biogenesis (RNA polymerase I/III) and protein translation (eIF2B initiation factor) was shown recently to be mediated by TOR signaling, as reviewed in Wullschleger et al. [4]. In response to nutrients, TOR induces ribosome biogenesis, translation, and nutrient import, whereas stress conditions repress these functions [4]. Our results suggest that the stress conditions tested in these experiments inhibit TOR signaling and that this inhibition leads to the repression (either direct or indirect) of all three complexes. ChIP-chip data reveal that the genes encoding the subunits of ribosome 60S, RNA polymerase and eIF2B are bound by overlapping sets of transcription factors. Genes encoding RNA polymerase I/III subunits and ribosome large subunits are bound by the ABF1 transcription factor (ARS-Binding Factor 1), whereas genes encoding the RNA polymerase I subunit and the eIF2B subunit are bound by RPN4 (Regulatory Particle Non-ATPase) and DIG1 (Down-regulator of Invasive Growth).
Ribosome 60S and 40S subunits and the autophagy related complex
We find that the transcription of the 60S ribosome large subunit decreases only when the transcription of the 40S subunit decreases AND the transcription of the autophagy-related complex increases. Figure 5(B) shows that in a subset of experiments the transcription of the autophagy-related dimer complex Aut2P/Aut7P is increased when the transcription of both ribosomal complexes, 40S and 60S, is decreased. Although the relation includes only one subunit of the 60S ribosomal complex, it is known that ribosomal subunits are strongly co-expressed (average correlation coefficient of 0.87 (± 0.08) for 86% of possible pairs within the ribosome). All other subunits of the 60S ribosome were assigned lower scores due to incompleteness of the microarray data in the specified experiments and the strict parameters we chose. Aut2P/Aut7P has a role in protein degradation, while the two ribosomal complexes 40S and 60S have a role in protein synthesis. That the autophagy-related complex and the ribosome have opposite functions likely explains their opposite transcriptional regulation in this subset of experiments (outlined rectangle). In this example, as in the previous one, the TOR signaling pathway is known to mediate both the translation and the autophagy processes. When the cell experiences stress conditions, the TORC1 complex is inhibited. This inhibition leads to decreased transcription of genes involved in translation and also leads to activation of the autophagy process [4, 5].
We identified TFs that bind genes of both the 40S and 60S ribosomal complexes, but could not identify TFs that also bind genes encoding the Aut2/Aut7 complex, possibly because we employed strict filtering of the ChIP-chip data (see Methods). However, we did find that genes encoding subunits of TORC1 (TOR complex) and of the ribosomal 40S and 60S subunits are all bound by the REB1 (RNA polymerase I Enhancer Binding protein), PHO2 (PHOsphate metabolism regulator) and MSN4 (activated in stress conditions) TFs.
Ribosome 60S, RNA polymerase I/III and Mannosyltransferase glycosylation complex
The transcription of the 60S ribosomal large subunit decreases only when the transcription of RNA polymerase I/III AND the M-POL II complex are both decreased. Figure 5(C) shows coordinated reduction in the transcription of the ribosome complex, the RNA polymerase complex (I/III), and the M-POL II complex in a subset of stress conditions (outlined rectangle). M-POL II (mannosyltransferase II) is the third enzyme complex in the mannan modification step of N-linked glycan processing (elongating the α(1,6) mannan backbone) in the Golgi apparatus. The importance of N-linked glycan processing is underscored by the fact that mannoproteins make up about 40% of the yeast cell wall [11–13]. The substrates of the M-POL II enzyme are N-linked glycan modified proteins from the ER (Figure 4). Glycosylation in the ER has been shown to be important for many polypeptides to undergo proper or complete folding (reviewed in [14]). Thus, we expect tight regulation of ribosome translation in the cytoplasm, followed by modification of N-linked glycans (ER) and subsequent mannan modification of N-linked glycans by M-POL II (Golgi). The stress conditions for which the transcription of these three complexes decreases (see the legend of Figure 5) are all known to reduce overall protein synthesis.
We find that the ABF1 transcription factor binds to genes encoding subunits of all three complexes. Moreover, the YAP5 basic leucine zipper (bZIP) transcription factor was found to bind the genes encoding the ribosome 60S and the M-POL II subunits.
We are unaware of evidence in the literature of coordinated regulation between translation-related complexes and mannan modification in the Golgi. Our analysis therefore generates a novel prediction supported by TF binding data and the known biological roles of the complexes.
Ribosome synthesis and regulation - an example of coordinated regulation among complexes obeying the XOR logic function
One of the significant triplets of protein complexes that are related by an XOR (exclusive OR) logic function involves the processome. The example presented here results from combining two triplets of protein complexes: processome, proteasome and the 60S and 40S ribosomal subunits. In this triplet, processome transcription decreases if the transcription of the ribosome (40S and 60S) decreases, XOR the transcription of the proteasome increases. Prior experimental studies of these three multi-protein complexes support the logic relationship we find between them. It has been suggested that the rRNA processome SSU (Small Subunit) complex has two roles in the maturation process of the pre-ribosome 90S [15]. The first role of the rRNA processome is carried out by its subcomplex t-Utp (U3 proteins), which is recruited to the Pol I promoter upstream of the rDNA gene for transcription initiation. The second role of the processome is pre-rRNA cleavage of the pre-ribosome 90S before transcription is completed. In recent work with mammalian cells, Stavreva et al. found that complexes associated with pre-rRNA processing factors are ubiquitinated and hence labeled for processing by the proteasome, a step essential for proper activity in ribosome maturation. One of the factors found to be ubiquitinated is fibrillarin, a yeast NOP1 homolog that is a subunit of the rRNA splicing processome [16]. As the processome was found to regulate its own activity [17], reduction of its abundance may lead to a decrease of its own transcription. The co-regulation of these three complexes is reasonable given the proposed regulation mechanism by the proteasome.
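The stated XOR relation can be checked experiment by experiment on binary state vectors, as in the minimal sketch below. The vectors are toy values, not the measured data, and `logic_agreement` is an illustrative helper rather than the scoring used in the analysis (which relies on the U value).

```python
def logic_agreement(target, a, b, f):
    """Fraction of experiments in which target equals f(a, b)."""
    matches = sum(t == f(x, y) for t, x, y in zip(target, a, b))
    return matches / len(target)

# Toy binary state vectors over six experiments (1 = state observed).
ribosome_down = [1, 1, 0, 0, 0, 0]
proteasome_up = [0, 0, 1, 1, 0, 0]
processome_down = [1, 1, 1, 1, 0, 0]

xor_score = logic_agreement(processome_down, ribosome_down, proteasome_up,
                            lambda x, y: x ^ y)
and_score = logic_agreement(processome_down, ribosome_down, proteasome_up,
                            lambda x, y: x & y)
```

In this toy example the XOR relation holds in every experiment, whereas AND matches only the two experiments in which none of the three states is observed.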
Figure 6 presents the consensus mRNA expression vectors of the three complexes as a heat map showing the logic relationship between their transcription patterns. The subset of stress conditions for which the transcription of both the processome and the ribosome decreases (described in the legend of Figure 6) is likely to cause a drop in the "translation" rate. The subset of stress conditions for which proteasome transcription is induced while processome transcription is reduced might instead be related to processome degradation by the proteasome [16]. The relevant experiments in the second case include the response to 0.3 mM H2O2 in cells with deletions of stress-induced TFs. In fact, it was shown that high H2O2 concentrations result in an increased rate of ribosome biogenesis and maturation [18], substantiating our prediction. Two transcription factors were found to bind several of the genes encoding subunits of all three complexes: RAP1 (Repressor Activator Protein) and CBF1 (Centromere-Binding Factor).
Cellular network of all complexes having different triplet relationships with the ribosome
By grouping together all predicted complex triplets that obey the same type of logic function (AND) and involve the ribosomal small or large subunits, we were able to generate a network. Figure 7 shows a subset of this network that includes complexes belonging to the "energy", "transcription" and "translation" functional classes (as defined by MIPS functional categories [19]). The figure shows that complexes belonging to the same functional classes are regulated in the same direction. In response to stress conditions, complexes belonging to the functional class "energy" are positively regulated (induced), while complexes belonging to the functional classes "transcription" or "translation" are negatively regulated (repressed). This result suggests that the regulation of different complexes may be determined by a master regulatory mechanism that differentially controls multi-protein complex expression, based on function.