Efficiency of complex production in changing environment

Background Cell function requires the assembly of proteins into complexes, a process that demands regulation beyond the fairly well understood mechanisms controlling the transcription and translation of a single protein. However, little is known about how protein levels are controlled to achieve that regulation. Results We integrated data on the composition of yeast protein complexes with data on the concentration dynamics of their protein building blocks to show how the cell regulates protein levels to optimize complex formation. We find that proteins that are subunits of the same complex tend to have similar levels, which change similarly following a change in growth conditions, and that abundant proteins undergo a larger decrease in copy number when cells are grown in minimal media. We also study the fluctuations in protein levels and find them to be significantly smaller in large complexes and in the least abundant subunit of each complex. We use a mathematical model of complex synthesis to explain how all these observations increase the efficiency of complex synthesis, in terms of better utilization of the available molecules and greater resilience to stochastic variations. Conclusion These results indicate an intricate regulation at all levels of protein production for the purpose of optimizing complex formation.

Table 1. A list of 46 complexes whose protein levels change uniformly, along with the MIPS identifier, protein systematic names, and complex function (when available).
a Whenever the description of the complex was given in the MIPS data set, it is reported. Otherwise, we report the GO term with the lowest P-value common to all genes in the complex (www.yeastgenome.org).

Properties of uniform complexes
Our results for the uniform complexes are complementary to the results of ref [1], where it was shown that cell-cycle complexes are made up of both constitutively expressed (static) and periodically expressed (dynamic) proteins. We show that for steady-state growth in different environments (as opposed to periodic changes in the course of the cell cycle), the levels of all (or most) subunits tend to change in the same direction. It is interesting to compare the uniform complexes with the cell cycle information of ref [1], which provides a classification of proteins into dynamic and static, and the precise position in the cell cycle at which the dynamic proteins are expressed. We integrated that information with our 46 uniform complexes. In Figure S1, we show the fraction of dynamic proteins within each complex. It can be seen that only 12/46 uniform complexes contain any dynamic protein (compared to 22±3 expected if each protein in a complex were dynamic with probability 0.086, which is the fraction of complex proteins that are dynamic).
Figure S1. For each of the uniform complexes, we plot the fraction of proteins that are expressed periodically [1]. The complexes in each group (minimal (red), constant (green), and rich (blue)) were ordered by their fraction of periodic proteins.
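The null expectation quoted above (22±3 complexes containing a dynamic protein) follows from treating every subunit as independently dynamic with probability 0.086. The sketch below shows one way such an expectation could be computed. The complex sizes are hypothetical placeholders (the real sizes come from Table 1), and treating complexes as independent of one another is an assumption made for illustration.

```python
import math
import random

P_DYNAMIC = 0.086  # fraction of complex proteins that are dynamic (from the text)

def prob_any_dynamic(n_subunits, p=P_DYNAMIC):
    """Probability that a complex of n_subunits has at least one dynamic protein."""
    return 1.0 - (1.0 - p) ** n_subunits

def expected_count(sizes, p=P_DYNAMIC):
    """Mean and standard deviation of the number of complexes with >= 1 dynamic protein."""
    probs = [prob_any_dynamic(n, p) for n in sizes]
    mean = sum(probs)
    # Assumption: complexes are independent Bernoulli trials (shared subunits ignored).
    std = math.sqrt(sum(q * (1.0 - q) for q in probs))
    return mean, std

# Hypothetical sizes for 46 complexes; replace with the real subunit counts.
rng = random.Random(0)
example_sizes = [rng.randint(4, 12) for _ in range(46)]
null_mean, null_std = expected_count(example_sizes)
print(f"expected complexes with a dynamic protein: {null_mean:.1f} +/- {null_std:.1f}")
```

Comparing the observed count (12) with the null mean and standard deviation gives a rough sense of how depleted the uniform complexes are in dynamic proteins.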
In Figure S2, we plot the position within the cell cycle at which each dynamic protein is expressed (zero is the M/G1 boundary). The two complexes that are highly dynamic and whose protein expression is most concerted (no. 26 and 27 below) are the pre-replication and replication complexes, respectively.
Figure S2. For each of the uniform complexes, we plot the position within the cell cycle of all of its periodic proteins [1]. The complexes are ordered as in Figure S1.
Focusing on the function of the uniform complexes, we could identify 12/19 of the rich-state complexes as related to translation or ribosome biosynthesis, in accordance with previous studies [2][3][4] showing that yeast is very sensitive to deviations from uniformity in ribosomal proteins. Regulating the change in protein levels is probably harder to achieve as a complex gets larger. Indeed, the average number of subunits in uniform complexes is smaller than in non-uniform complexes (8.3 vs. 12.3 subunits per complex, P=0.02, t-test).

The dosage balance hypothesis
The dosage balance hypothesis [3] states that "an imbalance in the concentration of the subcomponents of a protein-protein complex can be deleterious". Thus, "underexpression and overexpression of protein complex subunits should lower fitness, and … the accuracy of transcriptional co-regulation of subunits should reflect the deleterious consequences of imbalance." Our finding that protein levels in a complex are uniform across different growth conditions demonstrates that regulation of subunit levels exists even at the protein level. This regulation, we argued, serves to optimize complex production. However, a natural question is whether these results can be explained solely by a potentially harmful effect of incomplete subcomplexes, or whether there is further regulation towards minimizing the waste of resources. Here we show that the dosage balance is unlikely to be attributable solely to the harmful effects of partial complexes, as it does not correlate with haplo-insufficiency or over-expression toxicity.
First, we used a data set that divides all proteins into haplo-sufficient and haplo-insufficient based on the growth pattern after deletion of one allele of their gene [2]. We call each complex with at least one haplo-insufficient protein a haplo-insufficient complex. If harmful effects of complex subunits were to explain our findings, one would expect the protein levels [5] of the haplo-insufficient complexes to be more tightly regulated, and thus more uniform. However, the converse is true: the variance in the (log of the) levels of the (245) haplo-insufficient complexes is slightly (but not significantly) higher than that of the (297) haplo-sufficient complexes (P=0.08).
Second, we studied the list of genes that are toxic when over-expressed [6]. As before, we designate as toxic complexes those complexes that contain at least one gene that is toxic when over-expressed. Here too, the variance in the (log of the) levels of the (381) toxic complexes is higher than that of the (161) non-toxic ones (P=0.003).
Finally, the harmful effects of dosage imbalance are also unlikely to explain the maintenance of protein levels across different conditions. To show that, we examine the fraction of haplo-insufficient and toxic complexes that show a uniform direction of change between minimal and rich conditions. We find that the fraction of complexes with a uniform direction of change is lower in haplo-insufficient and toxic complexes (considering only complexes with at least four subunits). While only 9.2% of the haplo-insufficient complexes exhibit uniform change, 12.5% of the haplo-sufficient complexes are uniform. Similarly, only 8.4% of the toxic complexes are uniform, compared to 17.9% of the non-toxic complexes. Thus, complexes that consist of proteins for which dosage imbalance is harmful do not show more uniform levels across different conditions than other complexes.
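The three comparisons in this section share one recipe: flag a complex if it contains at least one sensitive gene, then compare the within-complex variance of log levels and the fraction of complexes changing uniformly between the flagged and unflagged groups. The following is a minimal sketch of that recipe on toy data. The gene names, levels, choice of log base, and use of the rich-condition levels for the variance are all illustrative assumptions, and the significance test used in the text is not reproduced.

```python
import math
from statistics import mean, pvariance

def log_level_variance(levels):
    """Variance of log10 protein levels within one complex (log base is an assumption)."""
    return pvariance([math.log10(x) for x in levels])

def is_flagged(subunits, flagged_genes):
    """A complex is flagged if at least one subunit is in the flagged gene set."""
    return any(g in flagged_genes for g in subunits)

def is_uniform_change(subunits, level_min, level_rich):
    """True if all subunits change in the same direction between the two conditions."""
    # Unchanged levels are counted as "not increased" in this sketch.
    directions = {level_rich[g] > level_min[g] for g in subunits}
    return len(directions) == 1

def summarize(complexes, flagged_genes, level_min, level_rich, min_size=4):
    """Mean log-variance and uniform fraction for flagged (True) and unflagged (False) complexes."""
    summary = {}
    for flag in (True, False):
        group = [c for c in complexes if is_flagged(c, flagged_genes) == flag]
        large = [c for c in group if len(c) >= min_size]  # uniformity uses >= 4 subunits
        variances = [log_level_variance([level_rich[g] for g in c]) for c in group]
        n_uniform = sum(is_uniform_change(c, level_min, level_rich) for c in large)
        summary[flag] = {
            "n_complexes": len(group),
            "mean_log_variance": mean(variances) if variances else float("nan"),
            "uniform_fraction": n_uniform / len(large) if large else float("nan"),
        }
    return summary

# Toy data (hypothetical gene names and copy numbers).
complexes = [("g1", "g2", "g3", "g4"), ("g5", "g6", "g7", "g8")]
haplo_insufficient = {"g1"}
level_min = {"g1": 100, "g2": 200, "g3": 50, "g4": 80, "g5": 10, "g6": 20, "g7": 30, "g8": 40}
level_rich = {"g1": 200, "g2": 400, "g3": 100, "g4": 160, "g5": 20, "g6": 10, "g7": 60, "g8": 80}
summary = summarize(complexes, haplo_insufficient, level_min, level_rich)
print(summary)
```

The same `summarize` call can be reused for the toxicity analysis by passing the set of over-expression-toxic genes instead of the haplo-insufficient set.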
In conclusion, the uniformity of protein levels in a complex cannot be explained solely by tight regulation of those proteins that are highly sensitive to changes in their levels. Instead, our results suggest dosage optimization across the board, in accordance with the economic usage principle.
To show this, we fix the initial values of (A_0, B_0, C_0) and calculate the level of ABC from the model. We then fix the new ABC level to be 10% of the original ABC level, and allow A_0 and B_0 to change. We scan all possible values of A_0 and B_0 (C_0 is automatically determined by the restriction on the final level of ABC), and select the optimal set of values, namely the one that yields the largest decrease in the total number of molecules used (A_0+B_0+C_0) relative to the number originally used. We then repeated the analysis for several initial values of (A_0, B_0, C_0), and plotted the ratio between the new and original concentration (in the optimal configuration) for A_0, B_0, and C_0. It can be seen that the new concentration of A_0 (the most abundant component) relative to its original concentration is the smallest. The relative concentration of B_0 is larger, and that of C_0 (the least abundant component) is the largest. Therefore, in the optimal case, the more abundant a component is, the more it decreases.
Figure S3. The ratio (percent) between the new and old concentration of A, B, and C (bottom to top surfaces, respectively). The x- and y-axes denote the initial concentrations A_0 and C_0. All three components are allowed to change, constrained by the final concentration of the ABC complex being 10% of the original, and the optimal solution (in terms of the minimum total number of molecules) is chosen.
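The scan described in this section can be illustrated with a toy calculation. The model of complex synthesis is not restated here, so the sketch below substitutes a simple one-step equilibrium, k·(A_0−x)·(B_0−x)·(C_0−x) = x, where x is the ABC level. This is an assumed stand-in, not necessarily the model used for Figure S3. The constant `k`, the grid resolution, and the restriction of the scan to values no larger than the original A_0 and B_0 are likewise assumptions.

```python
def complex_level(a0, b0, c0, k=1.0, iters=200):
    """Equilibrium ABC level x solving k*(a0-x)*(b0-x)*(c0-x) = x, found by bisection."""
    lo, hi = 0.0, min(a0, b0, c0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if k * (a0 - mid) * (b0 - mid) * (c0 - mid) > mid:
            lo = mid  # too little complex formed at mid, the root lies above
        else:
            hi = mid
    return 0.5 * (lo + hi)

def required_c0(a0, b0, target, k=1.0):
    """C_0 needed to reach the target ABC level for given A_0 and B_0 (both > target)."""
    return target + target / (k * (a0 - target) * (b0 - target))

def optimal_reduction(a0, b0, c0, fraction=0.1, k=1.0, steps=200):
    """Scan A_0 and B_0 on a grid and return new/old ratios at the minimum total."""
    target = fraction * complex_level(a0, b0, c0, k)
    best = None
    for i in range(1, steps + 1):
        new_a = target + (a0 - target) * i / steps
        for j in range(1, steps + 1):
            new_b = target + (b0 - target) * j / steps
            new_c = required_c0(new_a, new_b, target, k)  # fixed by the ABC constraint
            total = new_a + new_b + new_c
            if best is None or total < best[0]:
                best = (total, new_a, new_b, new_c)
    _, new_a, new_b, new_c = best
    return new_a / a0, new_b / b0, new_c / c0

# Hypothetical initial levels with A most abundant and C least abundant.
ratio_a, ratio_b, ratio_c = optimal_reduction(100.0, 50.0, 10.0)
print(f"new/old ratios: A={ratio_a:.3f}, B={ratio_b:.3f}, C={ratio_c:.3f}")
```

In this toy model the cheapest way to reach the reduced ABC level uses nearly equal amounts of the three components, so the most abundant component ends up with the smallest new-to-old ratio, in line with the ordering described for Figure S3.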

Lower noise of complex proteins
In Figure S4 below, we plot the distribution of CV (noise) values for complex and noncomplex proteins. It can be seen that complex proteins exhibit lower values of noise (P<10 -

An alternative explanation for the low noise in large complexes
Large complexes contain a higher fraction of essential proteins [8] (7% more). This could provide an alternative explanation for the reduced noise in large complexes, as essential proteins are less noisy [7]. To check whether this effect might account for the lower noise, we compared the components of large complexes to random proteins, but controlled for essentiality by keeping the total number of essential proteins constant. This resulted in an average noise level (CV) of 18.52, compared to 19.0±0.12 expected by chance (P<10^-4). The lower noise in large complexes remains significant even when abundance is controlled for (P≈0.003). Thus we conclude that the low noise in large complexes cannot be explained solely by the higher fraction of essential or abundant proteins.
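The essentiality-controlled comparison amounts to a stratified randomization: each random set contains the same numbers of essential and non-essential proteins as the set of large-complex subunits. A minimal sketch follows, using synthetic CV values. The protein identifiers, the choice of the full proteome as the sampling pool, and the empirical one-sided P-value are illustrative assumptions.

```python
import random
from statistics import mean, pstdev

def controlled_random_mean_cv(cv, essential, n_essential, n_nonessential, rng):
    """Mean CV of a random protein set with fixed numbers of essential and non-essential proteins."""
    ess_pool = [p for p in cv if p in essential]
    non_pool = [p for p in cv if p not in essential]
    sample = rng.sample(ess_pool, n_essential) + rng.sample(non_pool, n_nonessential)
    return mean(cv[p] for p in sample)

def essentiality_controlled_test(subunits, cv, essential, n_rand=1000, seed=0):
    """Observed mean CV, null mean, null standard deviation, and empirical one-sided P-value."""
    rng = random.Random(seed)
    observed = mean(cv[p] for p in subunits)
    n_ess = sum(p in essential for p in subunits)
    n_non = len(subunits) - n_ess
    null = [controlled_random_mean_cv(cv, essential, n_ess, n_non, rng)
            for _ in range(n_rand)]
    p_value = (1 + sum(m <= observed for m in null)) / (1 + n_rand)
    return observed, mean(null), pstdev(null), p_value

# Synthetic data: 200 hypothetical proteins with CV between 10 and 29.
cv = {f"P{i}": 10 + (i % 20) for i in range(200)}
essential = {f"P{i}" for i in range(200) if i % 4 == 0}
large_complex_subunits = [f"P{i}" for i in range(200) if i % 20 < 3]  # low-noise by construction
observed, null_mean, null_sd, p_value = essentiality_controlled_test(
    large_complex_subunits, cv, essential)
print(observed, round(null_mean, 2), round(null_sd, 2), p_value)
```

Controlling for abundance as well would follow the same pattern, with the sampling pools additionally stratified by abundance bins.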
Note that even the usage of more essential proteins in large complexes does, by itself, increase efficiency. Here we observe an additional effect, which might be explained as a supplementary optimization towards more economic usage of resources.

An alternative explanation for the high level of the least abundant protein in a complex
The least abundant protein of each complex was shown to have higher concentration and, consequently, lower noise. We interpreted this observation in terms of a kinetic argument showing that the variation in the level of the goal complex is most sensitive to the concentration of the least abundant protein. If this explanation holds, it is yet another manifestation of efficiency.
However, the higher concentration of the least abundant protein might be explained as a consequence of complexes having more uniform subunit abundances than random: if the variance of intra-complex levels is low, then the lowest concentration subunit would always have higher average abundance than if the variance of intra-complex abundance were high.
To test this alternative explanation, we look at the concentration of the most abundant protein in a complex. If the alternative explanation is true, one expects the concentration of the most abundant protein to be lower than expected by chance. We find that the average concentration of the least abundant protein is 1310 (molecules/cell) compared to 960 after randomization, a 36% increase, while the concentration of the most abundant protein is 145,000, compared to 157,000 after randomization, a decrease of only 8%. Thus the protein of lowest abundance exhibits a much stronger shift in its level than the protein of highest abundance, indicating another level of regulation beyond the tendency towards overall low variance, which attests to the validity of our kinetic argument.
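The logic of this control can be illustrated with a small simulation: if low intra-complex variance were the only effect, the least abundant subunit would rise and the most abundant subunit would fall together. The sketch below draws subunit levels from a log-normal distribution, which is an assumption made purely for illustration, as are the subunit count and the two spread values. It also recomputes the percentage changes quoted in the text.

```python
import math
import random
from statistics import mean

def pct_change(observed, randomized):
    """Percent change of the observed value relative to the randomized value."""
    return 100.0 * (observed - randomized) / randomized

def mean_extremes(sigma, n_subunits=8, n_complexes=5000, mu=math.log(10000.0), seed=0):
    """Average minimum and maximum subunit level over simulated complexes."""
    rng = random.Random(seed)
    mins, maxs = [], []
    for _ in range(n_complexes):
        levels = [rng.lognormvariate(mu, sigma) for _ in range(n_subunits)]
        mins.append(min(levels))
        maxs.append(max(levels))
    return mean(mins), mean(maxs)

# Narrow vs. wide spread of log levels (hypothetical sigma values).
low_min, low_max = mean_extremes(sigma=1.0)
high_min, high_max = mean_extremes(sigma=2.0)
print(f"narrow spread: min={low_min:.0f}, max={low_max:.0f}")
print(f"wide spread:   min={high_min:.0f}, max={high_max:.0f}")

# Percent changes for the values reported in the text.
print(f"least abundant: {pct_change(1310, 960):+.0f}%")
print(f"most abundant:  {pct_change(145000, 157000):+.0f}%")
```

In the simulation, narrowing the spread moves both extremes towards the center, whereas the reported data show a large shift only for the least abundant subunit.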

An alternative explanation for the similarity of transcript length in a complex
It has been previously reported [9], and can be easily verified, that biochemical activity is correlated with protein length. Thus, the similarity of transcript length in a complex could alternatively be explained by the common function of complex members. To test this alternative hypothesis, we performed the following computation. We downloaded GO annotations for the yeast genome from SGD (www.yeastgenome.org). For each gene, we extracted all "molecular function" GO terms associated with it (considering terms at the third level of the tree, 79 terms altogether). To quantify the similarity of transcript lengths among complex subunits, we calculated the variance of the (log of the) transcript lengths and averaged over all complexes. We then repeated the same calculation after shuffling the complex proteins. However, when shuffling, we only allowed an exchange of two proteins that share at least one GO molecular function term. Thus, we control for the effect of the similar functions of the complex subunits (in addition to controlling for abundance, since gene length is also known to be related to abundance). While the real variance was 0.27, the variance after randomization was 0.316±0.012, yielding a P-value of about 10^-5. Without any controls, the variance after randomization is higher (0.334±0.014, P≈2×10^-7), leading to a stronger result. Nevertheless, we can conclude that the tendency of complex subunits to have similar transcript lengths is not entirely due to the similar function of the complex subunits.
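The function-controlled randomization can be sketched as a constrained shuffle in which two proteins are exchanged only if they share at least one GO molecular function term. The example below uses toy gene names, lengths, and GO labels, all of which are hypothetical. The number of swap attempts is an assumption, the additional control for abundance is omitted, and a swap may in principle place the same gene twice in one complex if genes belong to several complexes.

```python
import math
import random
from statistics import mean, pvariance

def mean_log_length_variance(complexes, length):
    """Variance of log transcript length within each complex, averaged over complexes."""
    return mean(pvariance([math.log(length[g]) for g in c]) for c in complexes)

def constrained_shuffle(complexes, go_terms, rng, n_swaps=1000):
    """Exchange pairs of proteins between slots only if they share a GO term."""
    shuffled = [list(c) for c in complexes]
    slots = [(i, j) for i, c in enumerate(shuffled) for j in range(len(c))]
    for _ in range(n_swaps):
        (i1, j1), (i2, j2) = rng.sample(slots, 2)
        g1, g2 = shuffled[i1][j1], shuffled[i2][j2]
        if go_terms[g1] & go_terms[g2]:  # at least one shared molecular function term
            shuffled[i1][j1], shuffled[i2][j2] = g2, g1
    return shuffled

def shuffle_null(complexes, length, go_terms, n_rand=200, seed=0):
    """Null distribution of the averaged variance under the constrained shuffle."""
    rng = random.Random(seed)
    return [mean_log_length_variance(constrained_shuffle(complexes, go_terms, rng), length)
            for _ in range(n_rand)]

# Toy data: each complex has similar lengths; GO labels cut across complexes.
complexes = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
length = {"a1": 500, "a2": 520, "b1": 1500, "b2": 1450, "c1": 3000, "c2": 3100}
go_terms = {"a1": {"F1"}, "b1": {"F1"}, "c1": {"F1"},
            "a2": {"F2"}, "b2": {"F2"}, "c2": {"F2"}}
real_variance = mean_log_length_variance(complexes, length)
null = shuffle_null(complexes, length, go_terms)
p_value = (1 + sum(v <= real_variance for v in null)) / (1 + len(null))
print(real_variance, mean(null), p_value)
```

Setting every gene's GO term set to the same single label recovers the uncontrolled shuffle, which allows the two null distributions to be compared directly.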