Calculation of the relative metastabilities of proteins in subcellular compartments of Saccharomyces cerevisiae

Background
Protein subcellular localization and differences in oxidation state between subcellular compartments are two well-studied features of the cellular organization of S. cerevisiae (yeast). Theories about the origin of subcellular organization are assisted by computational models that can integrate data from observations of compositional and chemical properties of the system.
Presentation and implications of the hypothesis
I adopt the hypothesis that the state of yeast subcellular organization is in a local energy minimum. This hypothesis implies that equilibrium thermodynamic models can yield predictions about the interdependence between populations of proteins and their subcellular chemical environments.
Testing the hypothesis
Three types of tests are proposed. First, there should be correlations between modeled and observed oxidation states for different compartments. Second, there should be a correspondence between the energy requirements of protein formation and the order of appearance of organelles during cellular development. Third, there should be correlations between the predicted and observed relative abundances of interacting proteins within compartments.
Results
The relative metastability fields of subcellular homologs of glutaredoxin and thioredoxin indicate a trend from less to more oxidizing as mitochondrion – cytoplasm – nucleus. Representing the overall amino acid compositions of proteins in 23 different compartments each with a single reference model protein suggests that the formation reactions for proteins in the vacuole (in relatively oxidizing conditions), ER and early Golgi (in relatively reducing conditions) are relatively highly favored, while that for the microtubule is the most costly. The relative abundances of model proteins for each compartment inferred from experimental data were found in some cases to correlate with the predicted abundances, and both positive and negative correlations were found for some assemblages of proteins in known complexes.
Conclusion
The results of these calculations and tests suggest that a tendency toward a metastable energy minimum could underlie some organizational links between the chemical thermodynamic properties of proteins and subcellular chemical environments. Future models of this kind will benefit from consideration of additional thermodynamic variables together with more detailed subcellular observations.

Much attention has been given to the use of thermodynamics in describing and understanding driving forces in biological evolution. Energy minimization imparts a direction for spontaneous change of a system, and response of a system in this direction can at times be tied to an increase in relative fitness [33,34,35,36,37]. A biological system that moves away from minimum energy does not break the laws of thermodynamics but couples its endergonic reactions with the exchange of matter and energy in its surroundings [38,39,40,41]. The thermodynamic characteristics of open systems are thus of particular interest to biological evolution [42,43,44]; in particular, the interactions of organisms with their environments are important influences on the stable compositions and distributions of genes or organisms [45,46,47,48].
Why are proteins not equally distributed inside cells? Physical separation of key enzymes is thought to be essential in the cytoskeletal network and in regulation of metabolic pathways and other cellular functions [49,50,51]. The patterns of subcellular structure persist even though populations of proteins turn over through continual degradation and synthesis in cells [52,53,54,55], and despite the endergonic, or energy-consuming, qualities of protein biogenesis [56,57]. It can be shown that the relative abundances of amino acids in proteins correlate inversely with the metabolic cost of synthesis of the amino acid [58,59], which is a temperature-dependent function [60]. The starting premise of this study, then, is that protein formation reactions are unfavorable to different degrees, depending on the environments and compositions of the biomolecules.
The application of equilibrium chemical thermodynamics as a way to characterize the relative stabilities of minerals as a function of temperature, pressure and oxidation-reduction potential [61,62,63], or to calculate the relative abundances of coexisting inorganic [64,65] and/or organic species [41,66], is well documented in the geochemical literature. An advantage of performing quantitative chemical thermodynamic calculations for many different model systems is that the equilibrium state serves as a frame of reference for describing both reversible and irreversible chemical changes. For example, the weathering of igneous rocks is an overall irreversible process but the sequences of minerals formed can nevertheless be predicted after initial formulation of the relative stability limits of the chemical species involved [67,68]. One of the motivations for this study is to see whether a similar approach could be used to describe the sequence of events in irreversible subcellular processes.
The thermodynamic calculations reported in this study are based on algorithms for calculating the standard molal Gibbs energies of ionized proteins [69] and a chemical reaction framework that is used to compute metastable equilibrium relative abundances of proteins [70]. The Supporting Information for this paper includes the software package (Text S1) and the program script and data files (Text S2) used to carry out these calculations. The theoretical approach adopted here is based on the description of a chemical system in terms of intensive variables. These variables are temperature, pressure and the chemical potentials of the system. It is convenient to denote the chemical potentials by the chemical activities or fugacities of basis species, for example the activity of H+ (which defines pH) or the fugacity of oxygen. This permits comparison of the parameters of the model with reference systems described in experimental and other theoretical biochemical studies.
A few notes on terminology follow. Formation of a protein refers to the overall process of protein biosynthesis and translocation to a specific compartment. Activity and species denote, respectively, chemical activity and chemical species, not enzyme activity or biological species. In the present study, activity coefficients are taken to be unity, so the chemical activities are equivalent to molal concentrations. Below, oxidation-reduction potential and oxygen fugacity are used synonymously, and redox refers specifically to Eh. The oxidation-reduction potential of a system can be expressed in terms of Eh using an equation given in the Methods. The overall compositions of proteins in compartments are referred to here as proteologs (or model proteologs). The interactions of proteins are processes in which the proteins come into physical contact, for example in transport processes between compartments and in the formation of complexes. If a process results in a change in the composition of a population of interacting proteins, then a chemical reaction has occurred. Protein-protein interactions do not necessarily correspond to chemical reactions. However, a population of interacting proteins does chemically react if the turnover rates of the proteins are not all the same or if, through evolution, the genes coding for the proteins undergo different non-synonymous mutations. Model systems consisting of interacting proteins are useful targets for assessing the potential for chemical reactivity, which might occur on evolutionary time scales longer than the physical interactions. The purpose of this study is to quantify, using a metastable equilibrium reference state, the responses of populations of model proteins for different subcellular compartments of S. cerevisiae to gradients of oxidation-reduction potential. There are two major parts to this paper. In the first part, the reactions corresponding to intercompartmental interactions between isoforms (or homologs) of particular enzymes and between proteologs are quantified by calculating the oxygen fugacities for equal chemical activities of the reacting proteins or proteologs in metastable equilibrium. A ranking of relative metastabilities of the proteologs is discussed. Specific known interactions between compartments are considered in order to derive values of the oxygen fugacity within compartments that best metastabilize the corresponding proteologs relative to those of other compartments. Equal-activity values of the oxygen fugacity in the reactions are used to predict a sequence of formation of model proteologs in response to a temporal oxidation-reduction gradient.

Table 1 notes. a. Amino acid compositions of subcellular isoforms of glutaredoxin (GLRX), thioredoxin (TRX) and thioredoxin reductase (TRXB) in S. cerevisiae were taken from the SWISS-PROT database [71] (accession numbers shown in the table). Chemical formulas of nonionized proteins, and calculated standard molal Gibbs energy of formation from the elements (∆G°, in kcal mol−1, at 25 °C and 1 bar) and net ionization state (Z) at pH = 7 of charged proteins are listed. Average nominal oxidation state of carbon (ZC) was calculated using Eqn. (12).
In the second part of this paper, the relative abundances of model proteins in metastable equilibrium are calculated and compared with measured abundances. The range of protein abundances in a metastable equilibrium population often approaches that seen in experiments over a narrow window of oxygen fugacity. Positive and negative correlations between the calculated and experimental relative abundances are found in some cases. Local energy minimization and its opposition in the cellular demands for selectivity in protein formation are discussed as possible processes leading to the observed patterns.

Results and Discussion
Calculated metastability relations are described below for intercompartmental interactions between the model homologs and proteologs, and for intracompartmental interactions among the most abundant proteins in compartments or the reference model complexes. Experimental comparisons and discussion of their implications are integrated with these results.

Relative metastabilities of subcellular homologs of redoxins
The cytoplasmic, nuclear and mitochondrial homologs of glutaredoxin [72,73,74] and thioredoxin/thioredoxin reductase [75,11] in yeast cells represent the first model systems for subcellular environments studied here. The names and chemical formulas of these proteins are listed in Table 1, together with some computed properties. The average nominal oxidation state of carbon (ZC) is a function of the relative proportions of the elements in the chemical formula (see Methods). These values are provided to give an initial indication of the differences in compositions of the proteins. In Table 1 the proteins with the lowest values of ZC are the mitochondrial homologs and those with the highest values of ZC are the nuclear homologs. Because the current objective is to describe the compositions of populations of proteins in terms of a variable like oxidation-reduction potential, a quantity such as ZC is not sufficient; it has no explicitly derivable relation to intensive properties that can be measured. The forces acting on chemical transformations among proteins can, however, be assessed by first writing chemical reactions denoting their formation. An example of this procedure is given further below for a specific model system. The basic methods that apply there were used throughout this study. The standard molal Gibbs energies (∆G°) and net charges of ionized proteins at pH = 7 are listed in Table 1 so that the results described below can be reproduced at this pH.
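As a rough illustration of how a quantity like ZC follows from the relative proportions of the elements, the sketch below uses the commonly cited expression ZC = (Z − h + 3n + 2o + 2s)/c for a species with formula CcHhNnOoSs and net charge Z. This expression is assumed here to agree with Eqn. (12) of the Methods, which is not reproduced in this section; the function name and the small-molecule examples are illustrative only and are not taken from Table 1.

```python
def carbon_oxidation_state(c, h, n, o, s, charge=0):
    """Average nominal oxidation state of carbon for a species C_c H_h N_n O_o S_s.

    Assumes nominal oxidation states of +1 for H and -3, -2, -2 for N, O, S,
    so that the charge balance gives Z_C = (Z - h + 3n + 2o + 2s) / c.
    This form is an assumption and may differ in notation from Eqn. (12).
    """
    if c <= 0:
        raise ValueError("the formula must contain carbon")
    return (charge - h + 3 * n + 2 * o + 2 * s) / c


# Illustrative small molecules (not proteins from Table 1):
# (c, h, n, o, s) element counts.
examples = {
    "methane CH4": (1, 4, 0, 0, 0),
    "glycine C2H5NO2": (2, 5, 1, 2, 0),
    "carbon dioxide CO2": (1, 0, 0, 2, 0),
}
for name, counts in examples.items():
    print(name, carbon_oxidation_state(*counts))
```

The same function could be applied to the nonionized protein formulas of Table 1 (with charge left at zero), which is how values in the range between those of highly reduced and highly oxidized carbon arise for proteins.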
In Figs. 1a and b the metastable equilibrium predominance limits of ionized proteins in the glutaredoxin and thioredoxin/thioredoxin reductase model systems are shown as a function of the logarithm of oxygen fugacity and pH. Here, the predominant protein in a population is taken to be the one with the greatest chemical activity. The computation of the relative metastabilities of the proteins included all five model proteins in the glutaredoxin system as candidates, but note regarding Fig. 1a that only two of the five proteins appear on the diagram. Those that do not appear are less metastable, or have greater energy requirements for their formation over the range of conditions represented in Fig. 1a, than either of the proteins appearing in the figure. The equal-activity lines in these pH diagrams are curved because the ionization states of the proteins depend on pH. The observation apparent in Fig. 1a that increasing log fO2(g) favors formation of the cytoplasmic protein homolog relative to its mitochondrial counterpart is also true for the thioredoxin/thioredoxin reductase system shown in Fig. 1b. In comparing Figs. 1a and b note that in the latter figure, predominance fields for a greater number of candidate proteins appear, and that the predominance field boundary between mitochondrial and cytoplasmic proteins occurs at a lower oxidation-reduction potential. The dashed lines shown in each diagram of Fig. 1 are reference lines denoting the reduction stability limit of H2O (log fO2(g) ≈ −83.1 at 25 °C and 1 bar [76]).
Predominance diagrams as a function of Eh and pH for the glutaredoxin and thioredoxin/thioredoxin reductase systems are shown in Figs. 1c and d. Like log fO2(g), Eh and pH together are a measure of the oxidation-reduction potential of the system; the different scales can be converted using Eqn. (11). The trapezoidal areas bounded by dotted lines in Figs. 1c and d show the ranges of Eh and pH corresponding to the log fO2(g)-pH diagrams of Figs. 1a and b. It can be deduced from these diagrams that if the upper log fO2(g) limit of Fig. 1a were extended upward, this diagram would include a portion of the predominance field for the nuclear protein GLRX3.
It appears from Figs. 1a-b that increasing log fO2(g) at constant pH, or increasing pH at constant oxidation-reduction potential, has similar consequences for the relative metastabilities of the cytoplasmic and mitochondrial homologs. In this analysis, however, pH does not appear to be a very descriptive variable; the magnitude of the effect of changing oxygen fugacity over several log units is greater than the effect of changing pH by several units. In further metastability calculations pH was set to 7. Also, because Eh itself is defined in terms of pH, the oxidation-reduction potential variable adopted below is log fO2(g), which is more directly related to the potential of a thermodynamic component.
In Figs. 1e and f the logarithm of activity of water (log aH2O) appears as a variable. In Fig. 1e it can be seen that the formation of a nuclear homolog of GLRX is favored relative to the cytoplasmic homologs by decreasing activity of water and/or increasing oxygen fugacity, and that increasing relative metastabilities of the mitochondrial proteins are consistent with lower oxidation-reduction potentials and to some extent higher activities of water. In Fig. 1f it appears that the formation of the thioredoxin reductases relative to thioredoxins in each compartment is favored by increasing fO2(g), and that for the TRX the relative metastabilities of the mitochondrial proteins increase with decreasing fO2(g).

Table 2 notes. a. [77] (Homo sapiens). b. The lower and upper values are taken from [78] and [79], respectively. c. The state of the GSSG/GSH couple in the nucleus is thought to be more reduced than in the cytoplasm [4]; see text. d. [10] (Homo sapiens HeLa [80] cells). e. [9] (Mus musculus: mouse hybridoma cells [81]). f. Calculated by combining the law of mass action for Fe+3 + e− ⇌ Fe+2 (standard molal Gibbs energies taken from [82]) with aFe+3 = aFe+2 (see text). g. [83] (Homo sapiens). h. [6] (yeast). i. [84] (organism unspecified). j. [7] (HeLa). k. [8]. l. [5]. m. Values of Eh and pH listed here were combined with Eqn. (11) at T = 25 °C, P = 1 bar and aH2O = 1 to generate the values of log fO2(g).

Comparison with subcellular redox measurements
Let us compare the positions of the predominance fields in Fig. 1 with measured subcellular redox states. The values of Eh derived from the concentrations of oxidized and reduced glutathione (GSSG and GSH, respectively) in extra- and subcellular environments reported in various studies [9,77,10,79,78] were converted to corresponding values of log fO2(g) using Eqn. (11) in the Methods and are listed in Table 2. In order to fill in the table as completely as possible, it was necessary to consider measurements performed on eukaryotic cells other than those of S. cerevisiae (e.g., HeLa [80] and mouse hybridoma [81] cells). The values of pH required for conversion of Eh to log fO2(g) were also retrieved from the literature [83,6,7]. The computation of log fO2(g) from Eh was performed at 25 °C and 1 bar and with log aH2O = 0. No measurements of vacuolar Eh have been reported, but it has been noted that Fe+3 predominates over Fe+2 in this compartment [85]. Hence, a nominal (and relatively very oxidizing) value of Eh for the vacuole was calculated that corresponds to equal activities of Fe+3 and Fe+2.
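The conversion from Eh and pH to log fO2(g) can be sketched as follows. The sketch assumes that the conversion rests on the half-reaction H2O ⇌ 1/2 O2(g) + 2 H+ + 2 e−, which gives log fO2(g) = 2 log K + 2 log aH2O + 4 pH + 4 pe with pe = F·Eh/(2.303RT). The value of log K is taken here as half of the reduction stability limit of H2O quoted in this paper (log fO2(g) ≈ −83.1 at 25 °C and 1 bar). Whether this matches the exact form of Eqn. (11) is an assumption, and the function and constant names are illustrative.

```python
import math

R = 8.314462618        # gas constant, J mol^-1 K^-1
F = 96485.33212        # Faraday constant, C mol^-1
T = 298.15             # 25 degrees C in kelvin; the sketch is valid only at this temperature
LOG_FO2_H2O_LIMIT = -83.1  # reduction stability limit of H2O quoted in the text


def eh_to_log_fo2(eh_volts, ph, log_a_h2o=0.0):
    """Convert Eh (volts) and pH to log fO2(g) at 25 degrees C and 1 bar.

    Assumed half-reaction: H2O = 1/2 O2(g) + 2 H+ + 2 e-.
    Its log K is inferred from the H2O reduction limit (where pe = -pH),
    i.e. log K = LOG_FO2_H2O_LIMIT / 2; this is an assumption of the sketch.
    """
    nernst_slope = math.log(10) * R * T / F   # about 0.05916 V per pe unit
    pe = eh_volts / nernst_slope
    log_k = LOG_FO2_H2O_LIMIT / 2.0
    return 2.0 * (log_k + log_a_h2o) + 4.0 * ph + 4.0 * pe


# Hypothetical inputs chosen only to show the direction of the effect:
# at fixed pH, a lower (more reducing) Eh gives a lower log fO2(g).
print(eh_to_log_fo2(-0.30, 7.0))
print(eh_to_log_fo2(-0.20, 7.0))
```

A simple consistency check on the assumed form is that an Eh near 1.229 V at pH 0 (the standard potential of the O2/H2O couple) returns log fO2(g) close to zero, that is, oxygen at unit fugacity.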
The available measurements of redox states in compartments of eukaryotic cells can be summarized as, from most reducing to most oxidizing, mitochondria – nucleus – cytoplasm – endoplasmic reticulum – extracellular [4]. Strong redox gradients within the mitochondrion are essential to its function [86], which is not captured by the single values listed in Table 2. Comparison with the computational results shown in Fig. 1 nevertheless indicates that a relatively reducing environment does metastably favor the mitochondrial homolog.
Measurements of GSH/GSSG concentrations point to a lower redox state in the nucleus than in the cytoplasm, but the chemical thermodynamic predictions show the nuclear proteins favored by relatively oxidizing conditions. Studies using nuclear magnetic resonance (NMR) showing that the hydration state of the nucleus is higher than that of the cytoplasm [16,13] bring into question the prediction consistent with Fig. 1e that the formation of the nuclear proteins is favored relative to their cytoplasmic counterparts by decreasing activity of water. Also, mitochondrial pH is somewhat higher than that of the cytoplasm [6,7], but in Figs. 1a and b it appears that the predicted energetic constraints favor the cytoplasmic proteins at higher pHs. These comparisons indicate that not all metastable equilibrium constraints are preserved in the spatial relationships of the homologous redoxins in the cell.

Table 3 notes. a. Chemical formulas of nonionized proteologs and standard molal Gibbs energy of formation from the elements (∆G°, in kcal mol−1, at 25 °C and 1 bar) and net ionization state (Z) at pH = 7 of ionized proteologs were calculated using the overall amino acid compositions given in Table S1. Values of the nominal oxidation state of carbon (ZC) were calculated using Eqn. (12). log fO2(g) values for compartments were determined from the metastable equilibrium limits of subcellular interactions listed in Table 4.

Relative metastabilities of proteologs
The chemical formulas and thermodynamic properties of the model proteologs (hypothetical proteins representing the overall amino acid compositions of compartments; see Methods) are listed in Table 3. The predominance diagrams in Fig. 2 depicting the relative metastabilities of the model proteologs as a function of log fO2(g) and log aH2O were generated in sequential order. The first diagram in this figure corresponds to a system in which all 23 proteologs were considered. Subsequent diagrams in Fig. 2 were generated by eliminating from consideration some or all of the proteologs represented by predominance fields in the immediately preceding diagram. It can be seen in Fig. 2a that consideration of 23 proteologs resulted in predicted predominance fields for six proteins over the ranges of log fO2(g) and log aH2O shown in the diagram. Subsequent diagrams in the sequence represent proteologs with lower predicted relative metastabilities, i.e., higher energy requirements for formation relative to proteologs appearing earlier in the sequence.
There is a large difference between the relatively oxidized conditions of the endoplasmic reticulum reported in the literature (see Table 2) and the theoretically relatively reduced environment of the ER proteolog shown in Fig. 2a. Also note the average nominal carbon oxidation state of the ER proteolog, which is the lowest of any in Table 3. A possible interpretation of these observations is that there is significant chemical heterogeneity within this compartment and a relatively high energy demand for the formation of these proteins in the oxidizing spaces. Nevertheless, the juxtaposition in the ER of very reduced proteins and high redox potential does permit a possible advantage: if the redox potential of the compartment were much lower, the proteins constituting the endoplasmic reticulum would become more favorable to produce than any other proteins (see below) ultimately localized to other compartments that are initially produced there. Perhaps in this way a high redox state could signal the production of cytoplasmic and secreted proteins and a drop in redox state the production of biosynthetic enzymes, i.e. the reproduction of the ER itself. The proteologs appearing in successive diagrams in Fig. 2 are characterized by increasingly higher predicted energy requirements for their formation.
Hence, the nuclear, cytoplasmic and mitochondrial proteologs appearing in Fig. 2c-d are relatively less metastable compared to those of actin, early Golgi and ER appearing in Fig. 2a. It is noteworthy that the proteologs representing the two cytoskeletal systems in yeast cells, actin and microtubule, appear at opposite ends of the energy spectrum. This prediction may be consistent with the observation that actin in different forms appears to be present at most stages of the cell cycle [87], but that the microtubule cytoskeleton grows during anaphase (i.e., the stage of the cell cycle characterized by physical separation of the chromosomes; [88]) and is degraded during other stages of the cell cycle [87,88].
The order of appearance of phases throughout a reaction sequence is determined by the relative stabilities of the phases [63]. Examples of the application of this notion in inorganic systems are the reaction series of metamorphic minerals, paragenetic sequences of mineralization [89], Ostwald ripening [90], and weathering reaction paths [91]. Can the relative metastabilities of proteins provide information about their order of appearance in the cell cycle?
The outcome of the mitotic cycle in S. cerevisiae is the growth of a new cell in the form of a bud [88]. Not all structures in the bud form simultaneously. Instead, it has been observed that [92] "the endoplasmic reticulum, Golgi, mitochondria, and vacuoles all begin to populate the bud well before anaphase and that their segregation into the bud does not require microtubules". The results in Fig. 2 indicate that the proteolog for bud is of comparable metastability relative to that of Golgi but is less metastable than the proteolog of ER. In the absence of energy input, it follows that there would be a chemical driving force to form the ER proteins at the expense of any of the bud that may be present. The appearance in the bud of the less-metastable mitochondrial proteins suggests that there is a source of energy to the bud that is nevertheless not sufficient to drive the formation of the proteins in the microtubule. The formation of these proteins may not be possible until the products of the mitochondrial reactions and other energy-rich metabolites have accumulated in the cell.

Table 4 notes. a. Interactions between proteins in different subcellular locations in S. cerevisiae were identified in the literature. The reaction coefficients on O2(g) and the metastable equilibrium values of log fO2(g) were calculated for each reaction between model proteologs. Names of locations shown in bold indicate that the model value of log fO2(g) for this compartment (Table 3) lies in the metastability range for the proteolog in the particular reaction.

Intercompartmental protein interactions
The diagrams in Fig. 2 show the predominant metastability interactions between proteologs for different subcellular compartments. However, many subcellular interactions may in fact be meta-metastable with respect to Fig. 2. For example, interactions occur between proteins in the cytoplasm and nucleus [93], but the proteologs for these compartments do not share a reaction boundary in Fig. 2c. Below, known intercompartmental interactions are combined with the oxygen fugacity requirements for (meta-)metastable equilibrium of the proteologs to characterize compartmental oxidation-reduction potentials. These are used in the next section to explore a possible developmental reaction path.
To assess the biochemical evidence for specific interactions between proteins in different compartments in yeast cells, a series of review papers was surveyed [87,94,95,93,96,97]. Statements implying interaction between proteins in different compartments were identified by scanning for action words including interact, are at, align, end at, organize, embed, move, associate, found, locate, extend, bisect, move, migrate, enter, attach, translocate, carry, sort, composed of, line, dock and fuse, recycle, transport, pinch, proceed, reach, degrade in, deliver, colocalize, contain, associate, separate, protrude, penetrate, cooperate, crosstalk, anchor, reside, continuous with, shuttle, oxidize, essential to, convey, arrange, import, and transcribe. The source statements are listed in Text S3 and simplified pairwise representations of the interactions are summarized in Table 4. Of 190 possible combinations between any two of the 20 subcellular compartments (this count excludes the ambiguous location and ER to Golgi and punctate composite, which did not appear in the literature survey), 46 interactions were identified through this survey.

Figure caption (beginning missing): ... 4. Reactions are grouped by a common proteolog, listed along the bottom of the plot. Reactions that were used to derive model values of oxygen fugacity of compartments listed in Table 3 are denoted by arrows and bold lines and labels. The position of the reaction labels denotes the direction of the reaction that favors formation of the corresponding proteolog. The actin-bud and ER-cell periphery interactions were omitted from this plot to aid in clarity of labeling; they overlap with actin-vacuolar membrane and ER-cytoplasm, respectively.
Chemical reactions corresponding to each of the interactions listed in Table 4 were written between residue equivalents of the proteologs, with the reactant proteolog being the one on the left-hand side of the interaction and the product proteolog the one on the right-hand side. The reactions are listed in Table S2. Corresponding values of ∆nO2(g) (reaction coefficient on O2(g)) are listed in Table 4 together with the values of log fO2(g) where the calculated chemical activities of the two proteologs in each reaction are equal. Note that there are some reactions where the absolute value of ∆nO2(g) is substantially smaller than the others; these include spindle pole-nuclear periphery, Golgi-early Golgi and nucleus-actin. Because of the small value of ∆nO2(g) in these reactions, the values of log fO2(g) for equal activities of these proteins tend to be more extreme than for other reactions. Note that the sign of ∆nO2(g) denotes the thermodynamically favored direction of the reaction as log fO2(g) is changed from its equal-activity value; for example, at log fO2(g) = −75.1, the proteologs of actin and bud metastably coexist with equal chemical activities, but at higher values that of actin predominates in metastable equilibrium.
The interactions listed in Table 4 were used to generate model values of the oxygen fugacity in each compartment that are listed in Table 3. The criterion used for this analysis was that the oxygen fugacity in a compartment should in as many cases as possible favor the formation of its proteolog relative to those of interacting compartments. For example, consider the proteolog for endosome, which occurs in three interactions listed in Table 4.
The endosomal proteolog is favored to form relative to that of actin by log fO2(g) < −76.6 and relative to that of vacuole by log fO2(g) < −74.1. In contrast, the endosomal proteolog is favored to form relative to the proteolog of Golgi by log fO2(g) > −74.3. A single value of log fO2(g) can satisfy at most two of these constraints; the model value for endosome is taken to be just below the limit for its interaction with actin, or log fO2(g) = −76.7 (Table 3). Because this value favors formation of the endosomal proteolog relative to those of actin and vacuole, the proteolog of endosome is listed in bold font in these interactions in Table 4, but is shown in normal font in the interaction with the Golgi proteolog. Similar reasoning was used to derive oxygen fugacities for the other subcellular compartments listed in Table 3, except for microtubule.
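The endosome example can be checked with a small bookkeeping sketch. It encodes only the three inequalities quoted above and counts how many of them a trial value of log fO2(g) satisfies. The data structure, function names and scanning range are illustrative assumptions, not the procedure used to construct Table 3.

```python
# Each entry: (interacting partner, relation, limiting log fO2(g)) for the
# endosomal proteolog, as quoted in the text.
ENDOSOME_CONSTRAINTS = [
    ("actin", "<", -76.6),
    ("vacuole", "<", -74.1),
    ("Golgi", ">", -74.3),
]


def satisfied(log_fo2, constraints):
    """Return the partners relative to which the proteolog is favored at log_fo2."""
    favored = []
    for partner, relation, limit in constraints:
        if (relation == "<" and log_fo2 < limit) or (relation == ">" and log_fo2 > limit):
            favored.append(partner)
    return favored


def best_count(constraints, low=-85.0, high=-65.0, step=0.05):
    """Largest number of constraints met by any trial value on a regular grid
    (the grid limits and spacing are arbitrary choices for this sketch)."""
    n_steps = int(round((high - low) / step))
    return max(len(satisfied(low + i * step, constraints)) for i in range(n_steps + 1))


print(satisfied(-76.7, ENDOSOME_CONSTRAINTS))   # model value adopted for endosome
print(best_count(ENDOSOME_CONSTRAINTS))
```

At the adopted model value of −76.7 the sketch reports actin and vacuole, and the scan finds no trial value meeting all three inequalities, in line with the statement that at most two constraints can be satisfied at once.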
The outcome of the above analysis is summarized in Fig. 2, where the values of log fO2(g) for interactions that fall between −79 and −71 are plotted. The interactions are grouped by a common interacting proteolog so that differences between them can be more easily visualized. To avoid clutter, the reaction labels are generally restricted to the name of a single proteolog to indicate the direction of log fO2(g) change that favors its formation in the reaction. Model interactions that were used to constrain the limits of oxygen fugacities for one compartment (such as the actin-endosome interaction noted above) or two compartments (such as Golgi-late Golgi) are identified with one or two arrows, respectively, and the names of the corresponding proteologs are shown in bold font.
If the model compartmental values of log fO2(g) all favored formation of the corresponding proteologs relative to their interacting partners, the name of every proteolog would appear in bold font in Table 4. This is only the case, however, for some proteologs such as that of actin, where log fO2(g) ≈ −75 favors formation of this proteolog relative to any of its interacting partners. At the same oxygen fugacity, it can be shown that the proteolog for microtubule is unmetastable with respect to any of its interacting partners except for bud neck. Notably, the proteolog for microtubule only becomes relatively metastable at high oxygen fugacities (with respect to bud, cell periphery and spindle pole) or at low oxygen fugacities (with respect to actin, cytoplasm and nucleus). Hence, the value of log fO2(g) ≈ −75 taken here for the microtubule compartment is different from all the others, in that this represents conditions where the formation of its proteolog is more unfavorable than that of any of its interacting partners.

Sequential formation driven by oxygen fugacity gradients
We have already seen theoretical evidence that the microtubule is a relatively unmetastable assemblage of proteins in the cell. It is known in spite of this that the microtubule as well as the spindle pole are essential in cellular division [87]. Can the metastable equilibrium relationships reveal anything about the origins of the interactions of the microtubule and spindle pole in this process? The following thought experiment explores why the irreversible formation of proteologs might follow a sequence that is related to metastable equilibrium thermodynamic relationships.
To start, consider a permeable sac consisting of the cytoplasmic proteolog, which we will expose to a changing oxidation-reduction environment. The oxidation-reduction program will begin at log fO2(g) = −75, drop to log fO2(g) = −83.5, increase to log fO2(g) = −69 and return to log fO2(g) = −75. At any point along this program the only reactions we will consider are those involving the proteologs of microtubule or spindle pole. Let us assume in addition that none of these reactions proceeds to completion, and that any reaction may only proceed while log fO2(g) is near the equal-activity value for the reaction. Keeping in mind that no mechanism for the reactions is implied here, it may still be worthwhile to note that others have observed near-equilibrium concentrations of substrates in a subset of enzymatically catalyzed reactions [98,99].
At log fO2(g) = −75, no reaction occurs because the conditions coincide with the metastability field of the cytoplasmic proteolog relative to either microtubule or spindle pole. As soon as log fO2(g) decreases below −78.7, some of the spindle pole proteolog may form irreversibly at the expense of the cytoplasmic proteolog. Below log fO2(g) = −83.3, the microtubular proteolog can begin to form at the expense of the cytoplasmic proteolog. At log fO2(g) = −83.5 both of these reactions may favorably proceed, and we begin now to increase log fO2(g). As we pass log fO2(g) = −83.3, then log fO2(g) = −78.7, going in the positive direction, some of the proteolog of microtubule, then spindle pole, can react irreversibly to form the cytoplasmic proteolog. These are the opposite of the first two irreversible reactions.
As long as the current and following reactions do not proceed to completion, there will be a population of the microtubule and spindle pole proteologs available to react. Above log fO2(g) = −78.7, where the formation of the cytoplasmic proteolog becomes favored relative to spindle pole (see above), the proteolog of actin may also favorably form at the expense of that of microtubule. The nuclear proteolog can form above log fO2(g) = −75.9 at the expense of the microtubular proteolog, and above log fO2(g) = −75.5 at the expense of the spindle pole proteolog. We now momentarily pass through our starting point, log fO2(g) = −75. So far, the proteologs of spindle pole, microtubule, actin and nucleus, in that order, may have formed as a result of irreversible reactions of the original cytoplasmic proteolog. Also, the proteologs of microtubule and spindle pole may have been partially degraded after their possible formation. Now, as log fO2(g) is increased above −74.3, the proteolog of spindle pole becomes unmetastable relative to that of microtubule. Above log fO2(g) = −69.4, the proteolog of bud neck may be formed irreversibly at the expense of that of microtubule. At our maximum log fO2(g) = −69 this reaction can continue, but as we drop below log fO2(g) = −69.2 it may be joined by formation of the proteolog of cell periphery. Below log fO2(g) = −69.4 any proteolog of bud neck that may have formed becomes unmetastable relative to that of microtubule. Below log fO2(g) = −72.3 any proteolog of microtubule that remains may degrade in favor of formation of the proteolog of bud. Finally, as we drop past log fO2(g) = −74.3 and return to our starting point of log fO2(g) = −75, the proteolog of spindle pole once again becomes relatively metastable instead of microtubule. In summary, at log fO2(g) > −75 the potential arises for formation of proteologs of the microtubule, bud neck, cell periphery, bud and spindle pole, as well as for retrograde reactions that may destroy the proteolog of microtubule.
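The bookkeeping of this thought experiment can be sketched in a few lines of Python. The sketch below is only an illustration: the equal-activity values and the direction in which each reaction becomes possible are transcribed from the preceding paragraphs, while the names `CROSSINGS`, `PROGRAM` and `walk` are hypothetical and are not part of the original calculations. No thermodynamic calculation is performed; the code merely orders the crossings along the oxidation-reduction program.

```python
# Equal-activity values of log fO2(g) quoted in the text. Each entry gives the
# value, the direction of travel in which the reaction becomes possible
# ("down" = decreasing log fO2, "up" = increasing), and a reaction label.
CROSSINGS = [
    (-78.7, "down", "cytoplasm -> spindle pole"),
    (-83.3, "down", "cytoplasm -> microtubule"),
    (-83.3, "up", "microtubule -> cytoplasm"),
    (-78.7, "up", "spindle pole -> cytoplasm; microtubule -> actin"),
    (-75.9, "up", "microtubule -> nucleus"),
    (-75.5, "up", "spindle pole -> nucleus"),
    (-74.3, "up", "spindle pole -> microtubule"),
    (-69.4, "up", "microtubule -> bud neck"),
    (-69.2, "down", "microtubule -> cell periphery"),
    (-69.4, "down", "bud neck -> microtubule"),
    (-72.3, "down", "microtubule -> bud"),
    (-74.3, "down", "microtubule -> spindle pole"),
]

# The oxidation-reduction program: start, minimum, maximum, return to start.
PROGRAM = [-75.0, -83.5, -69.0, -75.0]


def walk(program, crossings):
    """Return the crossings in the order they are met along the program."""
    events = []
    for start, end in zip(program, program[1:]):
        direction = "down" if end < start else "up"
        lo, hi = min(start, end), max(start, end)
        # Keep crossings that lie strictly inside this leg and that are
        # triggered by travel in this direction.
        leg = [c for c in crossings if c[1] == direction and lo < c[0] < hi]
        # Order them as they would be encountered along the leg.
        leg.sort(key=lambda c: c[0], reverse=(direction == "down"))
        events.extend(leg)
    return events


events = walk(PROGRAM, CROSSINGS)
for value, direction, label in events:
    print(f"{value:6.1f}  {direction:4s}  {label}")
```

Changing `PROGRAM` (for example, stopping the reducing excursion at −80) shows which reactions would never be reached under a narrower redox cycle.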
It is important to emphasize the qualified nature of these predictions; all we know from thermodynamics is that any of these reactions could have progressed in the direction of a local Gibbs energy minimum. Whether and to what extent they actually move forward is a consequence of the reaction mechanism. The purpose of this analysis is not to suggest any mechanism but to ask whether work performed by control of log fO2(g) may energize such a mechanism. The enzymatic properties of the proteins themselves are probably essential in any actual mechanism. It is encouraging to observe that at and below the starting log fO2(g) = −75 the proteolog of endoplasmic reticulum is favored to form relative to the cytoplasmic proteolog. Hence under these conditions there exists a potential for production of biosynthetic enzymes.
The results of this thought experiment are summarized in Table 5. The range of theoretical values of log fO2(g) required for the chemical transformations among the proteologs is between −83.5 and −69, which in terms of redox potential at 25 °C, 1 bar, pH = 7 and log aH2O = 0 correspond to Eh = −0.420 V and Eh = −0.205 V, respectively (Eqn. 11). The former value is just below the stability limit for water (log fO2(g) = −83.1), but the redox state of the NADPH/NADP+ pool in rat liver mitochondria might approach this value (Eh = −0.415 V [86]). The latter value is consistent with the state of human cells during differentiation (Eh = −0.200 V), which is about 0.040 V higher than that of proliferating cells [100].
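The conversion between log fO2(g) and Eh quoted above can be checked with a short Python sketch. Eqn. (11) is not reproduced in this section, so the form used here is an assumption: it follows from the law of mass action for the half-cell reaction 2 H2O = O2(g) + 4 H+ + 4 e−, with the logarithm of its equilibrium constant taken equal to the water stability limit quoted in the text (−83.1). The function name `eh_from_logfO2` is hypothetical.

```python
import math

R = 8.314462618       # gas constant, J mol^-1 K^-1
F = 96485.33212       # Faraday constant, C mol^-1
# Assumption: log K of 2 H2O = O2(g) + 4 H+ + 4 e- at 25 C and 1 bar, set
# equal to the water stability limit (log fO2 = -83.1) quoted in the text.
LOG_K_WATER = -83.1


def eh_from_logfO2(logfO2, pH=7.0, log_aH2O=0.0, T=298.15):
    """Return Eh in volts for a given log fO2(g), pH and log activity of water."""
    # Mass action: log K = log fO2 - 4 pH - 4 pe - 2 log aH2O, solved for pe.
    pe = (logfO2 - LOG_K_WATER - 4.0 * pH - 2.0 * log_aH2O) / 4.0
    # Eh = (2.303 R T / F) * pe
    return math.log(10) * R * T / F * pe


for logfO2 in (-83.5, -69.0):
    print(f"log fO2 = {logfO2:6.1f}  ->  Eh = {eh_from_logfO2(logfO2):.3f} V")
```

Under these assumptions the two limits of the program reproduce the quoted values of about −0.420 V and −0.205 V to within rounding.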
Oscillations in the redox state of yeast cells are coupled to many metabolic changes including protein transcription and turnover [101]. Reductive and oxidative phases in the metabolic cycle of yeast have been identified, with DNA replication occurring during the former and cell cycle initiation occurring at an advanced stage of the latter [102]. Oxidative stress was shown to hasten HeLa cells into anaphase by overcoming the normal spindle checkpoint mechanism [103]. Although the results shown in Table 5 do not directly address the synthesis of DNA, they do show that there is a potential for the formation of the nuclear proteolog during a relatively reducing part of the hypothetical log fO2(g) cycle. In the oxidizing part of this cycle, above log fO2(g) = −74.3, the metastability of the proteolog for spindle pole is decreased, and at the highest oxidation-reduction potentials a favorable chemical potential field exists for metastable formation of the proteolog for bud neck. Hence, the notion that "a fundamental redox attractor underpins ... core cellular processes" [104] is in principle supported by the changing relative metastabilities of the proteologs as a function of oxidation-reduction potential.

Figure 4 (caption, partial). ... 25 °C and 1 bar and with total activity of protein residues equal to unity for (a) the proteologs shown in Table 1 and (b) the five proteins localized to ER to Golgi whose experimental abundances were reported in [105]. The rightmost dotted line in (b) indicates conditions where the calculated abundance ranking of the proteins is identical to that found in the experiments, and the leftmost dotted line where the calculated logarithms of activities have a lower overall deviation from experimental ones, which are indicated by the points. This value of log fO2(g) (−78) was used to construct the corresponding diagram in Fig. 5.

Calculation of relative abundances of proteins
Above, the interactions between homologs (enzyme isoforms) in subcellular compartments and proteologs representing overall protein compositions in subcellular compartments were used to derive oxygen fugacity limits for metastable reaction of proteins in different compartments. In the second part of this study, attention is focused on the relative abundances and intracompartmental interactions of proteins.
The logarithms of activities of proteologs consistent with metastable equilibrium among all 23 model proteologs are plotted in Fig. 4a as a function of log fO2(g). This diagram was generated based on metastable equilibrium among the residues of the proteins [70] in the same manner as described in detail below for a smaller set of proteins (those appearing in Fig. 4b). The purpose of Fig. 4a is to recapitulate the relationships shown in Fig. 2. Note that the same proteins predominate at the extremes of oxygen fugacity represented in Fig. 4a and in Fig. 1a (reducing: ER; oxidizing: actin) and that the proteolog of microtubule appears with low relative abundance. More importantly, perhaps, there is a minimum in the range of calculated activities of the proteologs around log fO2(g) = −75; changing oxidation-reduction potential alters not only the identity of the predominant protein in a metastably interacting population but also the relative abundances of all the others. There is probably not a single value of log fO2(g) where the calculated relative abundances of the proteologs shown in Fig. 2 reflect the composition of the cell. Let us therefore look more closely at the relative abundances of proteins within compartments.
In Fig. 4b the relative abundances of the five model proteins localized exclusively to ER to Golgi are shown as a function of log fO2(g). A worked example of the calculations leading to this figure is presented in the following paragraphs; the same method underlies the generation of the other figures shown here.
The model proteins for ER to Golgi, in order of decreasing abundance in the cell reported by [105], are YLR208W, YHR098C, YDL195W, YNL049C and YPL085W. (For simplicity, the proteins are identified here by the names of the open reading frames (ORF).) The formula of the uncharged form of the first protein, YLR208W, is C1485H2274N400O449S4, and its amino acid sequence length is 297 residues. The standard molal Gibbs energy of formation from the elements (ΔG°) of this protein at 25 °C and 1 bar calculated using group additivity [69] is −10670 kcal mol−1. At this temperature and pressure and at pH = 7, group additivity can also be used [69] to calculate the charge of the protein (−10.8832) and the standard molal Gibbs energy of formation from the elements of the charged protein (−10880 kcal mol−1). The formation reactions of the residue equivalents of YLR208W and YHR098C from the basis species are referred to here as Reactions 1 and 2, respectively. The double arrows in these reactions signify that a priori one does not know the sign of the chemical affinity of either of them. At 929 residues, YHR098C is over 3 times as long as YLR208W, but in the formation reactions from the basis species of the residue equivalents of the two proteins, the coefficients on the basis species are similar. The difference between the coefficients of the same basis species in the reactions signifies the response (owing to moderation, i.e. Le Chatelier's principle [106]) of the metastable equilibrium assemblage to changes in the corresponding chemical activity or fugacity. For example, because ν_CO2,1 < ν_CO2,2, ν_NH3,1 < ν_NH3,2 and ν_O2,1 < ν_O2,2, increasing a_CO2(aq), a_NH3(aq) or fO2(g) at constant T, P and chemical activities of the other basis species shifts the metastable equilibrium in favor of YLR208W at the expense of YHR098C. Here, ν_i denotes the reaction coefficient of the ith basis species or protein, which is negative for reactants and positive for products as written. Conversely, because ν_H2O,1 > ν_H2O,2, ν_H2S,1 > ν_H2S,2 and ν_H+,1 > ν_H+,2, increasing a_H2O, a_H2S(aq) or a_H+ (decreasing pH) at constant T, P and chemical activities of the other basis species shifts the metastable equilibrium in favor of YHR098C at the expense of YLR208W. The magnitude of the effect is proportional to the size of the difference between the coefficients of the basis species in the reactions, and it can be quantified for a specific model system using the following calculations.
To assess the relative abundances of the proteins in metastable equilibrium, we proceed by calculating the chemical affinities of each of the formation reactions. The chemical affinity (A) is calculated by combining the equilibrium constant (K) with the reaction activity product (Q) according to [107]

$$A = 2.303RT \log (K/Q) = -\Delta G^{\circ}_{r} - 2.303RT \sum_i \nu_i \log a_i \qquad (3)$$

where 2.303 is the natural logarithm of 10, R stands for the gas constant, T is temperature in kelvins, ΔG°_r is the standard molal Gibbs energy of the reaction, and a_i and ν_i represent the chemical activity and reaction coefficient of the ith basis species or species of interest (i.e., residue equivalent of the protein) in the reaction. The value of ΔG°_r (in kcal mol−1) of Reaction 1 is calculated from the standard molal Gibbs energies of formation of the species in the reaction, written in general form as

$$\Delta G^{\circ}_{r} = \sum_i \nu_i \Delta G^{\circ}_{i} \qquad (4)$$

and the activity product of the reaction is calculated using

$$\log Q = \sum_i \nu_i \log a_i \qquad (5)$$

The values of a_i used in Eqn. (5) are the reference values listed in the Methods for a_CO2(aq), a_H2O, a_NH3(aq), a_H2S(aq) and a_H+. The value of fO2(g) used in Eqn. (5) (log fO2(g) = −75.3) is also a reference value that, it will be shown, characterizes a metastable equilibrium distribution of proteins that is rank-identical to the measured relative abundances of the proteins. Finally, the value of a of the residue equivalent of the protein in Eqn. (5) is set to a reference value of unity (log a = 0). If we are only concerned with the relative abundances of the proteins in metastable equilibrium, the actual value used here does not matter so long as it is the same in the analogous calculations for the other proteins. Combining Eqns. (3)-(5) yields A_1/2.303RT = −25.25 (a non-dimensional number). Following the same procedure for the other four proteins (YHR098C, YDL195W, YNL049C and YPL085W) results in A/2.303RT equal to −24.86, −24.74, −24.93 and −24.94, respectively. Now let us turn to the relative abundances of the proteins in metastable equilibrium, which we compute using a Boltzmann distribution for the relative abundances of the residue equivalents:

$$\frac{a_i}{a_t} = \frac{e^{A_i/RT}}{\sum_{j=1}^{n} e^{A_j/RT}} \qquad (6)$$

where a_t denotes the total activity of residue equivalents in the system and n stands for the number of proteins in the system. Note regarding the left-hand side of Eqn. (6) that because we are taking activity coefficients of unity, the ratio a_i/a_t is equal to the ratio of concentrations, or proportionally numbers, of residue equivalents in the system. There is no negative sign in front of A/RT in the exponents in Eqn. (6) because the chemical affinity is the negative of the Gibbs energy change of the reaction. Note in addition that the values of A/2.303RT given above must be multiplied by ln 10 = 2.303 before being substituted in Eqn. (6). By taking a_t = 1, we can combine Eqn. (6) with the values of A/RT for the five proteins to obtain the metastable equilibrium activities of their residue equivalents. If one now iterates calculation of the chemical affinities of the residue formation reactions using the calculated metastable equilibrium logarithms of activities of the residue equivalents (instead of the starting reference value of log a = 0), the resulting chemical affinities for each formation reaction will be all equal and generally non-zero. This property of metastable equilibrium was used in [70] to describe a method that uses a system of linear equations to find the metastable equilibrium state without explicitly writing Eqn. (6).
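The Boltzmann step of this worked example can be sketched in Python using the five values of A/2.303RT given above. The function name `residue_log_activities` is hypothetical. The final conversion from activities of residue equivalents to activities of whole proteins (division by sequence length) is an assumption made here for illustration, and it is applied only to the two proteins whose lengths are stated in the text.

```python
import math

# Values of A/2.303RT at log fO2(g) = -75.3, as given in the text.
A_OVER_2303RT = {
    "YLR208W": -25.25,
    "YHR098C": -24.86,
    "YDL195W": -24.74,
    "YNL049C": -24.93,
    "YPL085W": -24.94,
}


def residue_log_activities(a_scaled, log_at=0.0):
    """Boltzmann distribution over residue equivalents (cf. Eqn. 6).

    a_scaled maps protein name -> A/2.303RT, so e^(A/RT) = 10^(A/2.303RT).
    Returns log10 activities of residue equivalents with total activity a_t.
    """
    top = max(a_scaled.values())
    # Subtracting the maximum leaves the ratios unchanged and keeps the
    # powers of ten close to 1.
    weights = {k: 10.0 ** (v - top) for k, v in a_scaled.items()}
    total = sum(weights.values())
    return {k: math.log10(w / total) + log_at for k, w in weights.items()}


log_a_residue = residue_log_activities(A_OVER_2303RT)
for name, value in log_a_residue.items():
    print(f"{name}: log a (residue equivalent) = {value:.3f}")

# Assumption: activity of a protein = activity of its residue equivalent
# divided by its sequence length. Only these two lengths appear in the text.
LENGTHS = {"YLR208W": 297, "YHR098C": 929}
log_a_protein = {k: log_a_residue[k] - math.log10(n) for k, n in LENGTHS.items()}
```

With these inputs the residue equivalent of YLR208W has the lowest activity of the five (log a of about −1.03), yet under the stated length assumption the whole-protein activity of YLR208W exceeds that of YHR098C, in line with their experimental ranking.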
The results of the calculation described above correspond to the dotted line at log fO2(g) = −75.3 in Fig. 4b. At this oxygen fugacity, the ranks of abundance of the model proteins in metastable equilibrium are identical to the ranks of experimental abundances. The figure was generated in whole by carrying out this procedure for different reference values of log fO2(g). It can be seen in Fig. 4b that there is a narrow range on either side of log fO2(g) = −75.3 (ca. ±0.05) where the relative abundances of the proteins in metastable equilibrium occur in the same rank order. Beyond these limits, changing fO2(g) drives the composition of the metastable equilibrium assemblage to other states that do not overlap as closely with the experimental rankings. The experimental abundances of the proteins reported by [105] are 21400, 12200, 1840, 1720 and 358, respectively, in relative units. These abundances were scaled to the same total activity of residues (unity) used in the calculations to generate the experimental relative abundances plotted at the dotted line in Fig. 4b at log fO2(g) = −78. Under these conditions, the metastable equilibrium abundances of the proteins do not occur in exactly the same rank order as the experimental ones, but there is a greater overall correspondence with the experimental relative abundances.

Relative abundances of proteins within compartments
The procedure outlined above for calculating the relative abundances of model proteins in ER to Golgi was repeated for each of the other compartments identified in [22]. Up to 50 of the experimentally most abundant proteins were chosen to model each of the compartments. The relative abundances of the proteins were calculated at 0.5 log unit increments from log fO2(g) = −82 to −70.5. Scatterplots of the experimental vs. calculated relative abundances for each set of proteins are shown in Figure S1. These comparisons were visually assessed to regress values of log fO2(g), listed in Table 6, that yield the best fit between calculated and experimental relative abundances. The resulting calculated relative abundances are listed together with the experimental ones in Table S3; the best-fit scatterplots for each set of model proteins are shown in Fig. 5. The retrieval of optimal values of log fO2(g) was aided by also calculating the root mean square deviation (RMSD) of logarithms of activities using Eqn. (13) and the Spearman rank correlation coefficient (ρ; Eqn. 14) between experimental and calculated logarithms of activities. The dotted lines in Fig. 5 were drawn at one RMSD on either side of the one-to-one correspondence, denoted by the solid lines in this figure. The RMSD values were used to identify outliers that are identified in Fig. 5 by letters and open symbols and that are listed in Table 7. To aid in distinguishing the points, they were assigned colors on a red-blue scale that denotes the average nominal oxidation state of carbon of the protein (Eqn. 12).

Table 6, footnote a. Values of log fO2(g) in each location were regressed by comparing calculated and experimental logarithms of activities of the most abundant proteins in different subcellular locations and of selected complexes for each location (Figure S1). n denotes the number of model proteins used in the calculations. RMSD values were calculated using Eqn. (13), and ρ denotes the Spearman rank correlation coefficient, calculated using Eqn. (14).
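The average nominal oxidation state of carbon used to color the points in Fig. 5 can be illustrated with the formula of YLR208W given earlier. Eqn. (12) is not reproduced in this section, so the expression below is an assumption: it is the usual definition in which H counts as +1, N as −3, O and S as −2, and the net charge z is balanced by carbon. The function name is hypothetical.

```python
def average_carbon_oxidation_state(c, h, n, o, s, z=0.0):
    """Average nominal oxidation state of carbon for C_c H_h N_n O_o S_s with charge z.

    Assumed form: Z_C = (-h + 3n + 2o + 2s + z) / c
    """
    return (-h + 3 * n + 2 * o + 2 * s + z) / c


# Uncharged YLR208W: C1485 H2274 N400 O449 S4 (formula quoted in the text).
zc_ylr208w = average_carbon_oxidation_state(c=1485, h=2274, n=400, o=449, s=4)
print(f"Z_C(YLR208W) = {zc_ylr208w:.4f}")
```

For this protein the assumed formula gives a slightly negative value (about −0.113), i.e. carbon that is on average mildly reduced.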
There is a considerable degree of scatter apparent in many of the plots shown in Fig. 5, so a low significance is attached to the log fO2(g) values regressed from these comparisons. In specific cases such as late Golgi and nuclear periphery a lower overall deviation is apparent and there is a visual indication of a positive correlation between the calculated and experimental relative abundances. Because they were regressed from individual noisy data, the values of log fO2(g) listed in Table 6 are probably not as representative of subcellular oxidation-reduction conditions as those listed in Table 3, which have the additional benefit of being partly based on known subcellular interactions (see above).
The comparisons depicted in Fig. 5 and in Figure S1 are important because they reveal that the range of protein abundance observed in cells is accessible in a metastable equilibrium assemblage at some values of log fO2(g). For example, the range of experimental abundances of the model proteins in actin covers about 1.6 orders of magnitude, while the calculated abundances vary over about 2.2 orders of magnitude. Extreme values of log fO2(g) tend to weaken this correspondence (Figure S1). The lowest degree of correspondence occurs for the cytoplasmic proteins, where ∼6 orders of magnitude separate the predicted relative abundances of the top 50 most abundant proteins, which in the experiments have a dynamic range spanning about 1.2 orders of magnitude. The great degree of scatter apparent in many of the comparisons in Fig. 5a is troublesome. The scatter could be partly a consequence of including in the comparisons model proteins that do not actually interact with each other, despite their high relative abundances. To address this concern, a more selective approach is adopted below that considers smaller numbers of proteins that interact through the formation of complexes.

Figure 5 (caption, partial). Red and blue colors denote, respectively, low and high average nominal carbon oxidation states (Z_C) of the protein. Dotted lines are positioned at one RMSD above and below one-to-one correspondence, which is denoted by the solid lines. Outlying points are labeled with letters that are keyed to the proteins in Table 7. The values of log fO2(g) used in the calculations are listed in Table 6.
a. Proteins are listed whose calculated logarithm of activity differs from experimental values by more than the root mean square deviation shown in Table 6.

Relative abundances of proteins in complexes
The correspondence between the calculated and experimental relative abundances of the five model proteins in ER to Golgi raises the question of what characteristics of the proteins might be responsible for this result. Searching the functional annotations of these proteins reveals that they are part of the COPII coat complex [111]. The inclusion of the COPII complex above was largely unintentional, as the procedure there was to look at the most abundant proteins in given compartments. Nevertheless, the results for that model system suggested that focusing on specific complexes in other compartments could yield interesting results. Because the interactions of proteins to form complexes are essential in cellular structure and in regulating the functions of enzymes [51], factors that affect the relative abundances of the complexing proteins may be fundamental to the control of metabolic processes. The model complexes used in this study are identified in Table 8. Each complex was nominally associated with a subcellular compartment based on the names and descriptions of the complexes available in the literature. One exception is the cyclin-dependent protein kinase complex, the proteins of which are largely cytoplasmic and nuclear [22] but which is placed here in the slot for the ambiguous location because no definitely ambiguously localized complexes could be identified. For a similar reason, the proteins listed in Table 8 under punctate composite are not part of a named complex but were chosen because they are localized to early Golgi in addition to the punctate composite characterization [22]. Other exceptions are the vacuolar model proteins (proteases and other canonical vacuolar proteins [12]), enzymes of the ergosterol biosynthetic pathway, some of which are associated with the lipid particle [119], and proteins integral to the peroxisomal membrane, which were identified using the Gene Ontology (GO) annotations in the SGD [111]. Where they could be found, the ID numbers of the complexes in a yeast complex database [124] are listed in parentheses in Table 8, as are literature references that describe the composition and/or localization of the complexes. If any of the proteins in the complexes do not localize [22] to the compartment shown in Table 8 they are marked with an asterisk; those proteins that were not present in the YeastGFP database or that are lacking an abundance count therein [105] are marked with "X" and "NA", respectively.

Table 8, footnote a. Numbers in parentheses refer to the ID of the complex, if available, from http://yeast-complexes.embl.de [124]. Compositions and localizations of complexes were also taken from references listed in square brackets. Symbols: "*" the protein was not localized in the compartment [22]; "X" or "NA" not tagged or no abundance [105]; "a", "b", etc. refer to outliers in Fig. 7.
The calculated metastable equilibrium logarithms of activities of the proteins in each complex are shown as a function of log fO2(g) in Fig. 6. The calculated logarithms of activities of the proteins were compared with experimental ones by constructing scatterplots at 0.5 log unit intervals from log fO2(g) = −82 to −70.5, which are shown in Figure S1. As above, visual assessment of fit was the first resort to obtain values of log fO2(g) that maximize the correspondence with experimental relative abundances, but the RMSD and Spearman rank correlation coefficient were also considered in these comparisons. Because of the small sample size in many of the comparisons, the sign of the correlation coefficient is as useful as its magnitude in assessing the results. The resulting calculated relative abundances are listed together with the experimental ones in Table S4.
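The two goodness-of-fit measures used in these comparisons can be sketched as follows. Eqns. (13) and (14) are not reproduced in this section, so the definitions below are assumptions: the RMSD is taken over n points with divisor n, and the Spearman coefficient uses the simple formula that is valid only when there are no tied ranks. The function names are hypothetical, and the "calculated" values in the example are made up purely to show a sign change; only the experimental abundances come from the text.

```python
import math


def rmsd(calc, expt):
    """Root mean square deviation between two equal-length sequences."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, expt)) / len(calc))


def ranks(values):
    """Ranks (1 = smallest); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation, rho = 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Experimental abundances of the five ER to Golgi proteins (from the text).
expt_log = [math.log10(v) for v in (21400, 12200, 1840, 1720, 358)]
# Hypothetical calculated values: one in the same rank order, one reversed.
same_order = [v - 0.5 for v in expt_log]
reversed_order = list(reversed(expt_log))

print(spearman_rho(expt_log, same_order), rmsd(expt_log, same_order))
print(spearman_rho(expt_log, reversed_order))
```

The example shows why the sign of ρ is informative for small n: a uniform offset leaves ρ at +1 while the RMSD is 0.5, whereas a reversed ranking gives ρ = −1 regardless of the deviation.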
The number of model proteins in each of the complexes is smaller than the number of most abundant proteins in each compartment considered in the preceding section, so the visible decrease in scatter is expected. Some of the model complexes represented in Fig. 7 exhibit an apparent positive correlation between calculated and experimental logarithms of activities; these include translation initiation factor eIF3, the nuclear pore complex and proteins integral to the peroxisomal membrane. An inverse correlation between calculated and experimental logarithms of activities is apparent for proteins in the ESCRT I & II complexes, signal recognition complex, and DASH complex. A few of the other complexes (Golgi transport complex, sterol biosynthesis enzymes) exhibit very little overall correspondence between calculated and experimental logarithms of activities.
The results in Fig. 7 permit an interpretation of the relative energetic requirements for formation of different groups of interacting proteins. Take for example complex 14, which is the DASH complex that associates with the microtubule. An inverse correlation between the experimental and calculated relative abundances is apparent for this complex in Fig. 7. The RMSD between calculated and experimental logarithms of activities of proteins is 1.05, which is among the highest listed in Table 6. Note from Eqn. (3) that a ∼1 log unit change in the chemical activity of a chemical species corresponds to a Gibbs energy difference equal to 2.303RT. An average difference of ∼1 between calculated and experimental logarithms of activity indicates that the formation of the proteins requires 2.303RT = 1364 cal mol−1 per protein beyond what would be needed if the proteins formed in metastable equilibrium relative abundances. On the other hand, the formation in specific oxidation-reduction conditions of proteins making up translation initiation factor eIF3 and other assemblages where cellular abundances positively correlate with and span the same range as the metastable equilibrium distribution can proceed close to a local minimum energy required for protein formation.
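The energy equivalent of a one log unit difference in activity can be verified with a few lines of Python. The value of the gas constant in thermochemical calories and the function name `energy_per_log_unit` are choices made for this sketch.

```python
import math

R_CAL = 1.9872   # gas constant, cal mol^-1 K^-1
T_REF = 298.15   # 25 degrees C in kelvins


def energy_per_log_unit(delta_log_a, T=T_REF):
    """Gibbs energy (cal/mol) equivalent to a difference in log10 activity."""
    return math.log(10) * R_CAL * T * delta_log_a


print(f"1.00 log unit -> {energy_per_log_unit(1.0):.0f} cal/mol")
print(f"1.05 log unit -> {energy_per_log_unit(1.05):.0f} cal/mol")
```

A difference of one log unit corresponds to about 1364 cal mol−1 at 25 °C, and the RMSD of 1.05 quoted for the DASH complex to about 1.43 kcal mol−1.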
Because of their relatively high energy demands, proteins in complexes such as the DASH complex and the spindle pole body are likely to be more dynamic in the cell. Although a positive rank correlation coefficient for the latter complex is reported in Table 6, at a higher oxygen fugacity (log fO2(g) = −76) a strong inverse correlation obtains between experimental abundances and calculated metastable equilibrium relative abundances of the proteins in this complex (Figure S1). The finding made elsewhere of some inverse relationships between relative abundance of proteins and corresponding mRNA levels was also interpreted as evidence for additional effort on the part of the cell [125]. An inverse relationship that opposes equilibrium may be favored in evolution because of the strategic advantage of incorporating otherwise costly (rare) amino acids that increase enzymatic diversity [126]. The present results show that specific examples of inverse relationships in the relative abundances of proteins can be identified using a metastable equilibrium reference state that is conditioned by oxidation-reduction conditions. Chemical selectivity in the dynamic formation in the cell of high-energy proteins could lead to transient formation of complexes that function only under certain conditions. In contrast, complexing proteins that interact close to metastable equilibrium are more likely to be constitutively formed.

Figure 6 (caption, partial). Metastable equilibrium activities of proteins in the complexes were calculated as a function of log fO2(g) for total activity of residues set to unity. Dotted red lines denote values of log fO2(g) (listed in Table 6) and calculated relative abundances that were used in making Fig. 7.

Figure 7 (caption, partial). Symbols are as in Fig. 5; the model proteins and the outliers are listed in Table 8.

Concluding Remarks
This study was concerned with thermodynamic selectivity of protein formation primarily as a function of one variable: oxidation-reduction potential represented by the logarithm of the fugacity of oxygen (log fO2(g)). In reality, many variables are changing in cells, including the hydration state, pH, activities of CO2 and H2S, temperature and pressure. These all factor into the Gibbs energy changes accompanying the overall chemical transformation between proteins. All of these variables except oxygen fugacity were held constant in most of the calculations reported here. It is tempting to explore the effects of these variables on the compositions of metastable equilibrium assemblages. Another target for expanding the scope of the thermodynamic characterizations is to incorporate into the framework protein folding reactions and a non-ideality contribution, or excess Gibbs energy, that would encompass the effects of electrostatic interactions and macromolecular crowding.
The model results reported above were chosen in order to test specific predictions made using the hypothesis that the selection for or against metastable equilibrium has measurable consequences in organisms. The findings can be summarized as:
1. The oxidation-reduction potential (log fO2(g)) limits of relative metastabilities of redoxin isoforms overlap with measured Eh (redox potential) in the cytoplasm and mitochondrion but not the nucleus.
2. The model proteologs represent the overall amino acid compositions of proteins in different compartments. At relatively low oxidation-reduction potential, proteologs in order of decreasing relative metastability are those of ER, Golgi, cell periphery, mitochondrion, nuclear periphery and spindle pole. At higher oxidation-reduction potential, proteologs in order of decreasing relative metastability are those of actin, nucleolus, nucleus, vacuole, bud neck and microtubule. At intermediate oxygen fugacities, proteologs of lipid particle, peroxisome and early Golgi are relatively metastable compared to those of cytoplasm, vacuolar membrane and late Golgi.
3. In a chemically reacting system starting with the cytoplasmic proteolog where all interactions include the proteologs of microtubule or spindle pole, environmental shifts in log fO2(g) going from −75 to −83.5 to −69 to −75 can drive the sequential formation of proteologs of spindle pole, microtubule, cytoplasm, actin, nucleus, cell periphery, bud neck and bud.
4. Oxidation-reduction potentials within −78 < log fO2(g) < −74 give rise to metastable equilibrium populations of most abundant model proteins within compartments in which the range of protein abundance becomes closest to that seen in reported measurements. Substantial scatter is evident in the comparisons, but a moderate overall positive rank correlation was observed.
5. Closer fits between calculated and experimental relative abundances were obtained within −80 < log fO2(g) < −73 by considering fewer numbers of model proteins that interact in complex formation. Strong positive correlations were found for, among others, cytoplasmic translation initiation factor eIF3 and nuclear pore complex; negative correlations were found for the microtubule-associated DASH complex and the endosomal ESCRT I & II complexes.
This study contributes to understanding the products of evolution by quantifying the extent of departure from metastable equilibrium in populations of interacting proteins. The observed positive correlations are consistent with a trend of some populations of interacting proteins to be imprinted with the consequences of local energy minimization in chemical reactions. These results and observations also support the notion that changing oxidation-reduction potential can selectively promote or hold back the reactions leading to formation of complexing proteins in relative abundances seen in the cell. Combining proteomic data with metastable equilibrium calculations is therefore a promising avenue for predicting complexes that form in specific oxidation-reduction conditions that vary temporally and spatially in biochemical systems.

Methods
The essential steps in the calculations reported here are 1) defining standard states, 2) identifying model proteins for systems of interest, 3) assessing the relative abundances of model proteins in metastable equilibrium, 4) visualizing the results of the calculations on speciation or predominance diagrams and 5) comparing the computational results with experimental biochemical and proteomic data.

Standard states and chemical activities
The activity of a species is fundamentally related to the chemical potential of the species by

$$\mu = \mu^{\circ} + RT \ln a,$$

where $R$ and $T$ represent, respectively, the gas constant and the temperature, $\mu$ and $\mu^{\circ}$ stand for the chemical potential and standard chemical potential, respectively, and $a$ denotes activity. No provision for activity coefficients of proteins or other species was used in this study; under this approximation, the activity of an aqueous species is equal to its concentration (molality). The standard state for aqueous species including proteins specifies unit activity of the aqueous species in a hypothetical one molal solution referenced to infinite dilution. The standard molal Gibbs energies of the proteins were calculated with the CHNOSZ software package [70] using group additivity properties and parameters taken from [69].

Proteologs: overall compositions of proteins in compartments
The overall amino acid compositions of proteins in 23 subcellular locations in S. cerevisiae were calculated by combining localization [22] and abundance [105] data for proteins measured in the YeastGFP project with amino acid compositions of proteins downloaded from the Saccharomyces Genome Database (SGD) [111]. Of 4155 ORF names listed in the YeastGFP dataset, all but 12 are present in SGD (the missing ones are YAR044W, YBR100W, YDR474C, YFL006W, YFR024C, YGL046W, YGR272C, YJL012C-A, YJL017W, YJL018W, YJL021C and YPR090W).
To generate proteologs that are most representative of each compartment, proteins that were annotated in the YeastGFP study as being localized to more than one compartment were excluded from this analysis (except for bud; see below), as were those for which no abundance was reported. The names of the open reading frames (ORFs) corresponding to the proteins in the YeastGFP data set were matched against the SGD's protein_properties.tab file downloaded on 2008-08-04. This search yielded a number of model proteins for each compartment, ranging from 5 (ER to Golgi) to 746 (cytoplasm); see Table 3. The names of the compartments used throughout the tables and figures in this paper correspond to the notation used in the YeastGFP data files (where spaces are replaced with a period).
It was found that no proteins with reported abundances and localized to the bud were exclusive to that compartment, hence all of the proteins localized there (which also have localizations in other compartments) were taken as models for the bud proteolog. The amino acid composition of the proteolog for each compartment was calculated by taking the sum of the compositions of each model protein for a compartment in proportion to its fractional abundance in the total model protein population of the compartment. The resulting amino acid compositions are listed in Table S1. The corresponding chemical formulas of the nonionized proteologs and the calculated standard molal Gibbs energies of formation from the elements at 25 °C and 1 bar of the ionized proteologs are shown in Table 3.
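The abundance-weighted summation used to build a proteolog can be illustrated with a minimal Python sketch. It is not the script used in this study: the function name `proteolog_composition`, the ORF labels, the amino acid counts and the abundances are all hypothetical placeholders, and only three amino acids are shown to keep the example short.

```python
def proteolog_composition(compositions, abundances):
    """Abundance-weighted amino acid composition of a compartment.

    compositions: dict mapping protein name -> dict of amino acid counts
    abundances:   dict mapping protein name -> measured abundance
    (both inputs are hypothetical stand-ins for SGD and YeastGFP data)
    """
    total = sum(abundances.values())
    proteolog = {}
    for name, counts in compositions.items():
        fraction = abundances[name] / total  # fractional abundance of this protein
        for amino_acid, n in counts.items():
            proteolog[amino_acid] = proteolog.get(amino_acid, 0.0) + fraction * n
    return proteolog


# Made-up model proteins for one compartment (not real data)
compositions = {
    "ORF_A": {"Ala": 10, "Cys": 2, "Gly": 8},
    "ORF_B": {"Ala": 30, "Cys": 0, "Gly": 20},
}
abundances = {"ORF_A": 3000.0, "ORF_B": 1000.0}

result = proteolog_composition(compositions, abundances)
print(result)
```

Weighting by fractional abundance means that the proteolog composition is dominated by the most abundant proteins in a compartment, which is the intended behavior for a single representative model protein.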

Metastability calculations
Diagrams showing the predominant proteins and the relative abundances of proteins in metastable equilibrium were generated using the CHNOSZ software package [70]. These calculations take account of formation reactions of the proteins written for their residue equivalents [70]. This approach is demonstrated in the Results for a specific model system.
The basis species, or perfectly mobile components of an open system [61], appearing in the formation reactions studied here are $\mathrm{CO_{2(aq)}}$, $\mathrm{H_2O}$, $\mathrm{NH_{3(aq)}}$, $\mathrm{O_{2(g)}}$, $\mathrm{H_2S_{(aq)}}$ and $\mathrm{H^+}$. The reference activities used for the basis species were $10^{-3}$, $10^{0}$, $10^{-4}$, $10^{-7}$ and $10^{-7}$, respectively, for $\mathrm{CO_{2(aq)}}$, $\mathrm{H_2O}$, $\mathrm{NH_{3(aq)}}$, $\mathrm{H_2S_{(aq)}}$ and $\mathrm{H^+}$. In the case of diagrams showing Eh as a variable, the aqueous electron ($e^-$) was substituted for $\mathrm{O_{2(g)}}$ in the basis species. Reference values for $a_{e^-}$ or $f_{\mathrm{O_2(g)}}$ are not listed here because one or the other is used as an independent variable in each of the calculations described above.

Conversion between scales of oxidation-reduction potential
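Conversion between the $\log f_{\mathrm{O_2(g)}}$ and Eh scales of oxidation-reduction potential can be made by first writing the half-cell reaction for the dissociation of $\mathrm{H_2O}$ as

$$\mathrm{H_2O} \rightleftharpoons \tfrac{1}{2}\mathrm{O_{2(g)}} + 2\mathrm{H^+} + 2e^-. \tag{8}$$

Taking $\mathrm{pH} = -\log a_{\mathrm{H^+}}$ and $\mathrm{pe} = -\log a_{e^-}$, the logarithmic analog of the law of mass action for Reaction 8 can be written as

$$\log K_8 = \tfrac{1}{2}\log f_{\mathrm{O_2(g)}} - 2\,\mathrm{pH} - 2\,\mathrm{pe} - \log a_{\mathrm{H_2O}}, \tag{9}$$

where $\log K_8$ stands for the logarithm of the equilibrium constant of Reaction 8 as a function of temperature and pressure. Eh is related to pe by [127]

$$\mathrm{pe} = \frac{F}{2.303RT}\,\mathrm{Eh}, \tag{10}$$

where $F$ and $R$ denote the Faraday constant and the gas constant, respectively. Combining Eqns. (9) and (10) yields the following expression for Eh as a function of $\log f_{\mathrm{O_2(g)}}$ and other variables:

$$\mathrm{Eh} = \frac{2.303RT}{F}\left(\tfrac{1}{4}\log f_{\mathrm{O_2(g)}} - \mathrm{pH} - \tfrac{1}{2}\log a_{\mathrm{H_2O}} - \tfrac{1}{2}\log K_8\right). \tag{11}$$

At 25 °C and 1 bar, $F/2.303RT = 16.903$ volt$^{-1}$ and $\log K_8 = -41.55$; for pH = 7 and $\log a_{\mathrm{H_2O}} = 0$, a value of Eh = 0 V corresponds to $\log f_{\mathrm{O_2(g)}} = -55$. Eqn. (11) also permits the conversion between Eh and $\log f_{\mathrm{O_2(g)}}$ at other temperatures, pH values, and activities of $\mathrm{H_2O}$.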
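The conversion between the two scales can be sketched in a few lines of Python. This is an illustration only, not part of the published script; it assumes that Reaction 8 is H2O = 1/2 O2(g) + 2 H+ + 2 e-, which reproduces the correspondence quoted in the text (Eh = 0 V at pH 7 and unit activity of water gives a log oxygen fugacity close to -55). The function names are hypothetical, and the constants are the 25 °C, 1 bar values given above.

```python
# Constants at 25 degrees C and 1 bar, as quoted in the text
F_OVER_2303RT = 16.903  # volt^-1
LOG_K8 = -41.55         # assumed reaction: H2O = 1/2 O2(g) + 2 H+ + 2 e-


def eh_from_logfO2(logfO2, pH=7.0, log_aH2O=0.0):
    """Eh (volts) from log fO2(g), using the mass-action expression for Reaction 8."""
    pe = 0.25 * logfO2 - pH - 0.5 * log_aH2O - 0.5 * LOG_K8
    return pe / F_OVER_2303RT


def logfO2_from_eh(eh, pH=7.0, log_aH2O=0.0):
    """log fO2(g) from Eh (volts); the inverse of eh_from_logfO2."""
    pe = F_OVER_2303RT * eh
    return 4.0 * (pe + pH + 0.5 * log_aH2O + 0.5 * LOG_K8)


# Eh = 0 V at pH 7 and unit activity of water
print(logfO2_from_eh(0.0))
# Eh at the lower end of the log fO2 range discussed for protein complexes
print(eh_from_logfO2(-80.0))
```

Keeping pH and the activity of water as explicit arguments makes it straightforward to repeat the conversion for compartments with different conditions; other temperatures would require different values of both constants.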

Average nominal oxidation state of carbon
Let us write the chemical formula of a species of interest as $\mathrm{C}_{n_\mathrm{C}}\mathrm{H}_{n_\mathrm{H}}\mathrm{N}_{n_\mathrm{N}}\mathrm{O}_{n_\mathrm{O}}\mathrm{S}_{n_\mathrm{S}}^{Z}$, where $Z$ denotes the net charge. The average nominal oxidation state of carbon ($Z_\mathrm{C}$) of this species is given by

$$Z_\mathrm{C} = \frac{Z - n_\mathrm{H} + 2\,(n_\mathrm{O} + n_\mathrm{S}) + 3\,n_\mathrm{N}}{n_\mathrm{C}}. \tag{12}$$

Eqn. (12) is consistent with the electronegativity rules described in [128] and is compatible with the equation for average oxidation number of carbon used in [129]. For example, Eqn. (12) can be used to calculate the average nominal oxidation states of carbon in $\mathrm{CO_2}$ and $\mathrm{CH_4}$, which are +4 and −4, respectively. Note that the proportions of oxygen and other covalently-bonded heteroatoms contribute to the value of $Z_\mathrm{C}$ of a protein or other molecule, but that proton ionization does not alter the nominal carbon oxidation state, because of the opposite contributions from $Z$ and $n_\mathrm{H}$ in Eqn. (12). In the 4143 proteins identified in the YeastGFP subcellular localization study and found in the Saccharomyces Genome Database, the minimum and maximum of $Z_\mathrm{C}$ are −0.414 and 0.390, respectively. Of the proteins in this dataset, six have $Z_\mathrm{C} < -0.35$ (YDR193W, YDR276C, YEL017C-A, YJL097W, YML007C-A, YMR292W) and six have $Z_\mathrm{C} > 0.15$ (YCL028W, YHR053C, YHR055C, YKR092C, YMR173W, YPL223C). The points in the scatterplots in this paper (Figs.
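The calculation of the average nominal oxidation state of carbon can be checked with a short Python sketch. The expression used here, (Z − nH + 3 nN + 2 nO + 2 nS) / nC, is the form that reproduces the values of +4 for CO2 and −4 for CH4 stated in the text and that is unchanged by proton ionization; the function name is hypothetical.

```python
def carbon_oxidation_state(nC, nH, nN, nO, nS, Z=0.0):
    """Average nominal oxidation state of carbon for C(nC)H(nH)N(nN)O(nO)S(nS) with charge Z."""
    return (Z - nH + 3 * nN + 2 * nO + 2 * nS) / nC


# The two examples given in the text
zc_co2 = carbon_oxidation_state(nC=1, nH=0, nN=0, nO=2, nS=0)
zc_ch4 = carbon_oxidation_state(nC=1, nH=4, nN=0, nO=0, nS=0)
print(zc_co2, zc_ch4)

# Proton ionization lowers both nH and Z by one, so the result is unchanged
# (illustrated with acetic acid, C2H4O2, and acetate, C2H3O2 with Z = -1)
zc_acid = carbon_oxidation_state(nC=2, nH=4, nN=0, nO=2, nS=0)
zc_anion = carbon_oxidation_state(nC=2, nH=3, nN=0, nO=2, nS=0, Z=-1)
print(zc_acid, zc_anion)
```

Because the function accepts non-integer counts and charges, it can be applied directly to the formulas of ionized proteins or of residue equivalents.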

Comparison with experimental relative abundances
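In comparison, experimental abundances of proteins in each model system were scaled so that the total chemical activity of residues was equal to unity. The root mean square deviation (RMSD) between calculated and experimental logarithms of activities was calculated using

$$\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log a_{\mathrm{calc},i} - \log a_{\mathrm{expt},i}\right)^{2}},$$

where $\log a_{\mathrm{calc},i}$ and $\log a_{\mathrm{expt},i}$ denote the calculated and experimental logarithms of activity of the $i$th protein and $n$ stands for the number of proteins in the comparison.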
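The scaling of experimental abundances and the deviation measure can be sketched as follows. This is an illustrative Python example with made-up abundances and lengths, not the published script. It assumes that the activity of residues contributed by a protein equals its activity multiplied by its length, and that the root mean square deviation is taken over the n proteins with n (rather than n − 1) in the denominator; both function names are hypothetical.

```python
import math


def scaled_log_activities(abundances, lengths):
    """Scale abundances so that the total activity of residues is unity.

    Returns the base-10 logarithms of the scaled protein activities.
    """
    total_residues = sum(n * length for n, length in zip(abundances, lengths))
    return [math.log10(n / total_residues) for n in abundances]


def rmsd(calculated, experimental):
    """Root mean square deviation between two equal-length lists of log activities."""
    squared = [(c - e) ** 2 for c, e in zip(calculated, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical abundances (molecules per cell) and sequence lengths
abundances = [100.0, 1000.0, 10.0]
lengths = [300, 500, 200]
log_a_expt = scaled_log_activities(abundances, lengths)

# Hypothetical calculated values, offset from the experimental ones for illustration
log_a_calc = [x + 0.5 for x in log_a_expt]
print(rmsd(log_a_calc, log_a_expt))
```

Scaling both the calculated and the experimental populations to the same total activity of residues places them on a common basis, so that the deviation reflects differences in relative rather than absolute abundance.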

Text S2: Program script and data files for generating figures (GZ)
This program script and supporting files were used to generate the figures shown above. It includes the script itself (plot.R), protein compositions (generated from the protein_properties.tab file downloaded from the Saccharomyces Genome Database), calculated standard molal thermodynamic properties of the proteins (to speed up calculations), YeastGFP protein localization and abundance data [22,105], and a .csv version of Table 6. To generate the figures, the contents of the zip file should all be placed into the R working directory before loading CHNOSZ. Then read in the script with source('plot.R'). More details on the operation are provided at the top of the script file.

Text S3: Interactions between subcellular compartments in yeast (PDF)
This file lists statements from [87,94,95,93,96,97] used to identify the interactions between proteins in different compartments of Saccharomyces cerevisiae that are listed in Table 4.

Figure 1: Relative metastabilities of homologs of glutaredoxin and thioredoxin/thioredoxin reductase. Predominance diagrams were generated for homologs of (a,c,e) glutaredoxin and of (b,d,f) thioredoxin/thioredoxin reductase in S. cerevisiae. The letters in parentheses following the labels indicate the subcellular compartment to which the protein is localized (C - cytoplasm; M - mitochondrion; N - nucleus). Calculations were performed for ionized proteins at 25 °C and 1 bar and for reference activities of basis species noted in the Methods. Reduction stability limits of $\mathrm{H_2O}$ are shown by dashed lines; the dotted lines in (c) and (d) correspond to the plot limits of (a) and (b).

Figure 2: Relative metastabilities of proteologs of compartments. Predominance diagrams were generated as a function of $\log f_{\mathrm{O_2(g)}}$ and $\log a_{\mathrm{H_2O}}$ at 25 °C and 1 bar for the proteologs listed in Table 3. The diagram in (a) represents 23 model proteologs; diagrams in panels (b)-(f) represent successively fewer model proteologs.

Figure 3: Logarithms of oxygen fugacity for equal chemical activities of proteologs in intercompartmental interactions. Metastable equilibrium values of $\log f_{\mathrm{O_2(g)}}$ were obtained for the model reactions listed in Table 4. Reactions are grouped by a common proteolog, listed along the bottom of the plot. Reactions that were used to derive model values of oxygen fugacity of compartments listed in Table 3 are denoted by arrows and bold lines and labels. The position of the reaction labels denotes the direction of the reaction that favors formation of the corresponding proteolog. The actin-bud and ER-cell periphery interactions were omitted from this plot to aid in clarity of labeling; they overlap with actin-vacuolar membrane and ER-cytoplasm, respectively.

Figure 4: Metastable equilibrium abundances of model proteologs and proteins as a function of oxygen fugacity. Chemical speciation diagrams were generated as a function of $\log f_{\mathrm{O_2(g)}}$ at 25 °C and 1 bar and with total activity of protein residues equal to unity for (a) the proteologs shown in Table 1 and (b) the five proteins localized to ER to Golgi whose experimental abundances were reported in [105]. The rightmost dotted line in (b) indicates conditions where the calculated abundance ranking of the proteins is identical to that found in the experiments, and the leftmost dotted line where the calculated logarithms of activities have a lower overall deviation from experimental ones, which are indicated by the points. This value of $\log f_{\mathrm{O_2(g)}}$ (−78) was used to construct the corresponding diagram in Fig. 5.

Figure 5: Comparison of experimental and calculated logarithms of activities of proteins in compartments. Red and blue colors denote, respectively, low and high average nominal carbon oxidation states ($Z_\mathrm{C}$) of the protein. Dotted lines are positioned at one RMSD above and below one-to-one correspondence, which is denoted by the solid lines. Outlying points are labeled with letters that are keyed to the proteins in Table 7. The values of $\log f_{\mathrm{O_2(g)}}$ used in the calculations are listed in Table 6.

Figure 6: Calculated logarithms of activities of model proteins in complexes. The numbered complexes are identified in Table 8. Metastable equilibrium activities of proteins in the complexes were calculated as a function of $\log f_{\mathrm{O_2(g)}}$ for total activity of residues set to unity. Dotted red lines denote values of $\log f_{\mathrm{O_2(g)}}$ (listed in Table 6) and calculated relative abundances that were used in making Fig. 7.

Figure 7: Comparison of experimental and calculated logarithms of activities of interacting proteins. Symbols are as in Fig. 5; the model proteins and the outliers are listed in Table 8.

5 and 7 and Figure S1) are colored on a continuous red-blue scale according to the value of $Z_\mathrm{C}$ of the proteins, where maximum red occurs at $Z_\mathrm{C} = -0.35$ and maximum blue occurs at $Z_\mathrm{C} = 0.15$.

Table 1: Subcellular isoforms of glutaredoxin, thioredoxin and thioredoxin reductase in yeast.

Table 2: Nominal electrochemical characteristics of subcellular environments in eukaryotes. Values refer to yeast cells unless noted otherwise.

Table 3: Overall protein compositions (proteologs) of compartments in yeast cells.

Table 4: Major intercompartmental protein interactions in yeast.

Table 5: Hypothetical oxygen fugacity cycle and sequence of reactions of proteologs.
). The formula of the protein in this ionization state is $\mathrm{C_{1485}H_{2263.1168}N_{400}O_{449}S_{4}^{-10.8832}}$. Dividing by the length of the protein, we find that the formula and standard molal Gibbs energy of formation from the elements of the residue equivalent of YLR208W are $\mathrm{C_{5.0000}H_{7.6199}N_{1.3468}O_{1.5118}S_{0.0135}^{-0.0366}}$ and −36.633 kcal mol$^{-1}$, respectively. The formation from basis species of the residue equivalent of YLR208W is consistent with

$$5.0000\,\mathrm{CO_{2(aq)}} + 1.7946\,\mathrm{H_2O} + 1.3468\,\mathrm{NH_{3(aq)}} + 0.0135\,\mathrm{H_2S_{(aq)}} \rightleftharpoons \mathrm{C_{5.0000}H_{7.6199}N_{1.3468}O_{1.5118}S_{0.0135}^{-0.0366}} + 5.1414\,\mathrm{O_{2(g)}} + 0.0366\,\mathrm{H^+}. \tag{1}$$

Similar reasoning can be applied to write the formation reaction of the residue equivalent of YHR098C as

$$4.9720\,\mathrm{CO_{2(aq)}} + 1.8708\,\mathrm{H_2O} + 1.3240\,\mathrm{NH_{3(aq)}} + 0.0441\,\mathrm{H_2S_{(aq)}} \rightleftharpoons \mathrm{C_{4.9720}H_{7.7882}N_{1.3240}O_{1.5231}S_{0.0441}^{-0.0138}} + 5.1464\,\mathrm{O_{2(g)}} + 0.0138\,\mathrm{H^+}. \tag{2}$$

The values of $\Delta G^{\circ}$ of $\mathrm{O_{2(g)}}$ and $\mathrm{H^+}$ are both zero, which is consistent with the standard state conventions for gases and the hydrogen ion convention used in solution chemistry. The values of $\Delta G^{\circ}$ of the other basis species are taken from the literature [108,109,110]. The value of $\log K_1$ consistent with Eqn. (4) is −392.19.
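The coefficients in Reaction 1 follow from dividing the protein formula by its length and then balancing hydrogen, oxygen and charge against the basis species. The following Python sketch illustrates that bookkeeping for YLR208W using the formula and length (297) quoted in the text; it is not the CHNOSZ implementation, and the function name and sign convention (positive for species consumed, negative for species released) are assumptions made for this example.

```python
def formation_coefficients(nC, nH, nN, nO, nS, Z):
    """Basis-species coefficients for forming one mole of a species.

    Basis species: CO2(aq), H2O, NH3(aq), H2S(aq), O2(g), H+.
    Positive values are consumed (reactant side); negative values are released.
    """
    n_Hplus = Z                               # charge balance
    n_H2O = (nH - 3 * nN - 2 * nS - Z) / 2    # hydrogen balance
    n_O2 = (nO - 2 * nC - n_H2O) / 2          # oxygen balance
    return {"CO2": nC, "H2O": n_H2O, "NH3": nN, "H2S": nS, "O2": n_O2, "H+": n_Hplus}


# Ionized YLR208W as quoted in the text; its length is 297 residues
length = 297
protein = {"nC": 1485, "nH": 2263.1168, "nN": 400, "nO": 449, "nS": 4, "Z": -10.8832}

# Residue equivalent: divide every count (and the charge) by the protein length
residue = {key: value / length for key, value in protein.items()}
coeffs = formation_coefficients(**residue)

for species, value in coeffs.items():
    print(f"{species:4s} {value:9.4f}")
```

The negative coefficients obtained for O2 and H+ correspond to their appearance on the product side of Reaction 1.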
) with $A/RT$ of each of the formation reactions to calculate chemical activities of the residue equivalents of the proteins equal to 0.0905, 0.2248, 0.2994, 0.1944 and 0.1909, respectively. The lengths of the proteins are 297, 929, 1273, 876 and 2195, so the corresponding logarithms of activities of the proteins are, e.g., $\log(0.0905/297) = -3.52$ for YLR208W, and −3.61, −3.63, −3.65 and −4.06 for the remaining proteins, respectively.
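The final step of this worked example, converting activities of residue equivalents into logarithms of protein activities, is simple arithmetic that can be reproduced with the numbers quoted above. The Python sketch below is only a check of that step; the variable names are hypothetical, and the results agree with the quoted values to within the rounding of the four-decimal inputs.

```python
import math

# Activities of the residue equivalents and protein lengths, as quoted in the text
# (YLR208W is the first entry)
residue_activities = [0.0905, 0.2248, 0.2994, 0.1944, 0.1909]
lengths = [297, 929, 1273, 876, 2195]

# The residue activities of the five proteins sum to the total activity of residues
total_residue_activity = sum(residue_activities)

# Activity of each protein = activity of its residue equivalent / protein length
log_protein_activities = [
    math.log10(a / n) for a, n in zip(residue_activities, lengths)
]

print(round(total_residue_activity, 4))
for value in log_protein_activities:
    print(f"{value:.3f}")
```

Dividing by length is what makes longer proteins less abundant than shorter ones at equal activities of their residue equivalents, as seen for the 2195-residue protein in this set.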

Table 6: Oxygen fugacities, deviations and correlation coefficients in comparisons of intracompartmental protein interactions.

Table 8: Model proteins in complexes.