mRNA stability and the unfolding of gene expression in the long-period yeast metabolic cycle

Background In yeast, genome-wide periodic patterns associated with energy-metabolic oscillations have been shown recently for both short (approx. 40 min) and long (approx. 300 min) periods. Results The dynamical regulation due to mRNA stability is found to be an important aspect of the genome-wide coordination of the long-period yeast metabolic cycle. It is shown that for periodic genes, arranged in classes according either to expression profile or to function, the pulses of mRNA abundance have phase and width which are directly proportional to the corresponding turnover rates. Conclusion The cascade of events occurring during the yeast metabolic cycle (and their correlation with mRNA turnover) reflects to a large extent the gene expression program observable in other dynamical contexts such as the response to stresses/stimuli.


Other data sources
The protein complexes were downloaded from the MPACT subsection (http://mips.gsf.de/genre/proj/mpact/) of the CYGD database at MIPS. Only complexes manually annotated from the literature are considered; those obtained from high throughput experiments are disregarded (according to the MIPS classification scheme these last are labeled "550").

Time series analysis
As in [1], genes are filtered using a periodogram test. In order to retain only genes with a well-defined periodicity, we fixed a more selective p.value than [1] thereby reducing the number of genes to 1951.
The period of the YMC, computed in the time domain looking at the most impulsive-like categories (in Fig. S1 the 3 RNA polymerases), is estimated as 287.5 min (see Fig. S1 for a detailed description).
To each of the 1951 genes labeled as periodic, we associated a phase, computed maximizing the correlation with respect to a train of 360 shifted sinusoids (resolution of 1 • ). Means and standard deviations of the phases of periodic signals must be computed "mod 360 • ", and are normally subject to large numerical errors and illconditioning. A typical example is the following: assume two periodic genes are assigned the phases φ 1 = 350 • and φ 2 = 6 • . Owing to the 360 • periodicity, the peaks of the two genes are very close, but the average phase is (φ 1 + φ 2 )/2 = 178 • , which is obviously wrong. The correct answer requires a shift from the principal values of the periodic signal: (φ 1 − 360 • + φ 2 )/2 = −8 • . To avoid problems with biased mean values and/or the appearance of inelegant negative phases around the "crucial" transcription bursts, the 0 phase was chosen so as to anticipate of ∼ 30 • these events.
Given that the period is approximately 287.5 minutes, the phase delay φ can be transformed into time delay τ by means of the relation τ = φ 287. 5 360 . Under the convention for the 0 phase, each period "begins" approximately 24 min before the transcriptional bursts.
For each gene, the pulse width is computed estimating on each period the interval in which the expression levels stays above the median value across consecutive samples.

Least squares regressions
In Fig. 1 of the paper, the least square fitting in the HL/phase plot is given by the equation The corresponding p.value is computed via a Fisher test statistics. Since we have determined the period as 287.5 min and the zero phase ∼ 30 • before the impulsive bursts shown for example in Fig. S1, the equation in terms of time delay with respect to the bursts, τ t ≃ 0.8(φ − 30 • ), is τ t ≃ 7.38 HL − 128.8 (min).
Within most clusters, the standard deviation in terms of φ is minimal; it is higher in terms of HL, see Table 1  Repeating the linear regression for the three plots in Fig. 2(b) of the paper we get:

Clusterization
The clusterization of the time profiles in Fig.1 of the paper is performed via a k-means algorithm using as distance a nonnormalized correlation function. Varying the number of clusters and/or the (randomly chosen) initial cluster assignments, the results (in terms of the regressions) are basically unchanged. A k-means algorithm is also used to cluster the KEGG pathways in Fig. S3. In this case as distance we use the average of the pairwise correlations between all genes of every two pathways.

More compartmentalized categories
Other categories which can be associated with a particular cellular compartment emerge from the joint analysis of pathways and protein complexes. For example lipid biosynthesis, which is to a large extent localizable in the endoplasmic reticulum (ER) has a phase comparable to the translocon family of complexes (Fig. S5) which is composed of the Sec61 protein translocator, the signal recognition particle which binds the ER-specific sequence on the nascent polypeptide chain and the signal peptidase that cleaves it off.

Signaling proteins
Complexes with eminently intracellular signaling functions, such as the antagonistic cAMP-dependent protein kinase and serine/threonine phosphoprotein phosphatases (respectively phosphorylator and dephosphorylator of signaling proteins) have similar patterns of expression, similar timing during the YMC and high Pearson correlation (at least for what concerns periodic genes).

Weakly periodic categories
Several categories linked to transcriptional activation or RNA processing, like the histone acetyltransferase enzyme or the nuclear processing complex family (3'-end pre-RNA processing factors CFI and II and 3'-end polyadenylation factors PFI) or the chromatin assembly complex, seem to be evading the tight phase coordination. However, this is mostly due to the evanescent periodic pattern, if any, of the corresponding genes. Likewise for the nuclear pore complex, which assists the export of mature mRNA through the nuclear envelope: most of its genes in fact show bursts which are synchronous with the initial pulses but of very small amplitude, thus ambiguous in terms of temporal classification.

Double peak and anticorrelated isoenzymes.
Especially for mitochondrially localized pathways such as citric acid cycle and oxidative phosphorylation the pulses are very broad, with a neat downregulation only in correspondence of the bursts of transcription and an overall profile often exhibiting a double peak on each period (occurring with a phase lag of ∼ 100 • one from the other, see Fig. S8). The four respiratory chain complexes for example follow this pattern in a fairly precise manner. In order to investigate the meaning of this double peak characteristic, we consider genes whose products are classified as isoenzymes. If we look at the correlation for all pairs of isoenzymes, see Fig. S6(a), we see that restricting to periodic genes an almost bimodal distribution emerges, with a significant percentage (43 out of 210) of isoenzyme pairs being anticorrelated (R < −0.3). This behavior has no counterpart on the distribution of expected values (computed as above by means of a large collection of microarrays). In more than 50% of these anticorrelated pairs the pattern of activation in the time course is similar (see Fig. S6(c)), with one of the two isoenzymes exhibiting a deep and prolonged downregulation immediately following the transcription bursts. The majority of these pairs is involved in oxidoreductive processes, like, for example, SDH1-YJL045W, SDH3-YMR118C, SDH4-YLR164W (all subunits of succinate dehydrogenase), or the NADP-dependent isocitrate dehydrogenase pairs IDH1-IDP3, IDH2-IDP3, IDP1-IDP3, or the plasma membrane H+-ATPase isoenzymes PMA1-PMA2, or the NADH dehydrogenase pairs NDE1-NDE2. Three among the most anticorrelated pairs of isoenzymes showing this pattern are located along the pentose phosphate pathway, two on the cytosolic oxidative branch (SOL3-SOL4 and GND1-GND2), the third (the transketolases TKL1-TKL2) downstream. The pentose phosphate pathway synthesizes NADPH, which is the major source of reducing equivalents and, according to [2,3], plays a major role in the establishment of the cycle. Also the most anticorrelated isoenzymes in the glycolysis pathway, the alcohol dehydrogenases, have a similar pattern: ADH1 and ADH3 (reducing acetaldehyde to ethanol) versus ADH2 (catalyzing the reverse reaction), see section 4 for a more detailed analysis of the periodic pattern in the central metabolism.

Periodic pattern analysis in the central metabolism
From Fig. S8, it seems that the long bursts of the citric acid cycle and oxidative phosphorylation genes could be composed of two distinct adjacent phases for each period. Similarly, the profiles of the anticorrelated isoenzyme pairs mentioned in section 3 and reproduced in Fig. S9, show that of the two recurrent patterns described in Fig. S6(c), one resembles the mitochondrial transcription/translation burst (upregulation approximately in the interval 225÷375 min interval and periodically thereafter), the other is more delayed (interval 300÷450 min) and characterized by a deep downregulation during and after the main transcription bursts (200÷275 min). The alcohol dehydrogenases isoenzymes are "prototypes" of the 2 patterns: ADH1 and ADH3 (respectively cytosolic and mitochondrial, both reducing acetaldehyde to ethanol) follow the first, while ADH2 (using ethanol as substrate) follows the second.
The first pattern (ADH1/ADH3) is synchronous with the hexokinases catalyzing the initial glucose phosphorylation: of the three isoenzymes, HXK2 has the earliest response but is also more rapidly repressed, while HXK1 is more long-lived and is expressed, together with the glucokinase GLK1, also in the other interval [4]. Quite unexpectedly, the enzyme for the final irreversible step of glycolysis, pyruvate kinase (CDC19, as the isoenzyme PYK2 remains constantly basal), is neither synchronous with the ADH1/ADH3 and HXK2 profile, nor with the other one (ADH2 and GLK1), but rather delayed with respect to both modes (in Fig. S5 pyruvate kinase has the highest phase delay). The high level of expression of alcohol dehydrogenase in all metabolic modes suggests that pyruvate production may not be the rate-limiting step of the pathway, and that a delayed pyruvate kinase action may help meeting cellular ATP demand by distributing uniformly ATP production along the cycle (see Fig. S10). As a matter of fact, CDC19 peaks always precede the transcription bursts (in correspondence of the downregulation of the mitochondrial genes) and fall right after that. Most of the enzymes for the intermediate steps of glycolysis do not show any significant periodic trend (see Fig. S7 for an overall view of the phase of the genes on the central metabolic pathways), although on the other irreversible reaction, phosphofructokinase (both genes PFK1 and PFK2) has some degree of resemblance with CDC19. On the contrary, the gluconeogenesis enzymes pyruvate carboxylase (PYC1) and phosphoenolpyruvate carboxykinase (PCK1) show a strong correlation with the genes ADH1/ADH3 and HXK2.
The acetaldehyde-ethanol exchange is part of the so-called "PDH bypass" (i.e. route alternative to the pyruvate dehydrogenase complex) for acetyl-CoA production, see [5]. The supply to this pathway (through PDC5) is almost continuous (except in the "valleys" of the pyruvate kinase) and the rest of the pathway, aldehyde dehydrogenase (mostly isoenzyme ALD6, mitochondrial) and acetyl-CoA synthase (ACS1, cytosolic) coordinated with ADH2. On the contrary, the pyruvate carboxylase branch follows ADH1/ADH3, while the direct route pyruvate/acetyl-CoA (PDH complex) is unclear (more synchronous with ADH1/ADH3 though).
With the exclusion of the succinyl-CoA ligase (both LSC1 and LSC2) all the steps of the citric acid cycle are more in agreement with the ADH1/ADH3 pattern and are rigorously shut down during the transcription bursts. From Fig. S8, it seems that the long bursts of the oxidative phosphorylation genes overlap with both patterns. Looking at the trace of observed dO 2 (data taken from [1] and reproduced in Fig. S10(b)), citric acid cycle and oxidative phosphorylation activation seem to correspond to the maximum drop in dO 2 concentration (200÷300 min interval following the transcription burst), but they seem to persist also long after the recovery of dO 2 . It must be noticed that the trace of dO 2 resembles closely the expression profile of the catalase enzymes that produce O 2 detoxifying reactive oxygen species.

Glucose-regulated carbon metabolism
There is a consistent literature on the influence of glucose abundance on gene expression [6,7,8,9,10]. On the YMC, the standard glucose activated and/or repressed signaling pathways are not expressed. For example the Snf1 serine/threonine protein kinase complex subunits SNF1, SNF4, SIP1, SIP2, GAL83, as well as the other regulated genes on the same pathway MIG1, CAT8 and ADR1, do not show any significant pattern.

Direct versus clustered correspondences phase/HL
Unlike Fig. 1(b) of the paper, a plot HL/phase for all periodic genes looks very scattered (see Table 1 of the paper for the HL standard deviation on each cluster). One of the reasons may be that the HL measures are widely imprecise (see comparison between HL datasets in [11]), another that HL values should probably be considered as contextspecific, parametrically dependent on a set of physiological conditions and/or on trans-acting degradation factors [12,11], while the HL values available from [11,13,14] represent more a "built-in" turnover rate. Nonetheless, as we lump together genes according to either expression profiles (unsupervised approach) or functional annotation (supervised approach) the linear relationship neatly emerges. It is tempting to speculate that the process of grouping genes reduces the uncertainty of the turnover rate measures and that functionally similar genes might share a similar fate (e.g. being accumulated in specific P-bodies [15]) regardless of the exact value of the turnover rate.

Synchronization and pulse width
In [1] it is affirmed that broad profiles (like those associated here to "late" categories) may be due to loss of synchronization in the population of yeast cells as they progress through the cycle(s). Based on what we show in this paper, such an interpretation is problematic: loss of synchronization during a cycle would jeopardize the entire transcriptional program on the following cycles, while on the contrary, we still see thin and precisely coordinated pulses in the fast categories. Population synchronization is often said to be influenced by rapidly diffusing molecules such as H 2 S [16]. We notice here that the sulfate-related pathways are expressed only in the first part of each cycle. In spite of the high coordination and early phase, the peaks for these categories last in average 100 min, see Fig. S4.  Figure S1: Computation of the period of the bursts in the YMC. When the period is computed via Fourier analysis, as is done in [1], the answer is 300 min. However, a closer look at the genes having impulse-like behavior (in this Figure the three RNA polymerases) reveals that the sampling is not perfectly synchronized with the period observed: in a time window twice the period (200÷775 min) there are 23 samples instead of the expected 24, and the "11.5" samples per period ratio seems to yield a more accurate matching of the peaks. The resulting period is therefore 11.5 · 25 = (775 − 200)/2 = 287.5 min. Notice how this explains why the second peak is less resolved that the first and third one in basically all Figures shown in this paper.    Figure S4: Correlation between genes along the KEGG metabolic pathways (genetic information categories such as transcription, translation, and DNA replication and repair are not included), computed for the yeast metabolic cycle time series (top plot) and for a collection of 790 yeast experiments (bottom plot). The correlation is computed between enzymes that are neighbors in terms of metabolic reactions: from adjacent genes to genes separated by three intermediate reactions (green scales). Averages between all genes involved in a pathway is also shown in yellow. On the right panel of both figures average values of correlations along the pathways are shown (top part of the right plots); in the remaining three plots the pathways are further grouped by average connectivity degree.
Correlations are higher for more tightly connected pathways than for those having a low connectivity degree. Comparing the right hand sides of the two Figures, correlation among neighboring genes for the YMC is higher than for the collection, thus confirming the high level of functional coordination induced by the YMC along the metabolic pathways.    Information abound compartment and reaction direction is extrapolated from [17]. The expression of the enzymatic genes is taken as a measure of the flux of metabolites through the reaction node (scales are however not indicative). The peaks of consumption of ATP in the cytoplasm in correspondence of the main bursts are small but visible. More visible is the pattern of ATP-producing enzymes in both compartments. In the cytoplasm this is essentially due to the pyruvate kinase enzyme Cdc19 transforming phosphoenolpyruvate into pyruvate during anaerobic respiration, while in the mitochondria it is due to the oxidative phosphorylation pathway. The fermentative recharging of ATP in the cytoplasm is quite in antiphase with the respiratory mitochondrial one (scale here can be even misleading: aerobic ATP production is of course far more efficient than anaerobic one). Notice that during the bursts of transcription, ATP hydrolysis rather than peaks of ADP induces peaks in the production of AMP, as is expected for high energy demanding reactions such as RNA polynucleotide synthesis. (b): Time course of expression for enzymes of reactions involving O 2 . Color, line thickness and compartment subdivision is the same as above. The third plot is the trace of dissolved O 2 (blue line, data reproduced from [1]), and the dO 2 ratio (green line). Its trend follows closely the cytoplasmic "oxygen production" (blue curve in the middle plot), which essentially is the time course of the catalases, enzymes detoxifying reactive oxygen species such as H 2 O 2 . Qualitatively the main discrepancy between the two curves occurs in the 50 min interval following the bursts (e.g. 200 ÷ 250 min.) where d O 2 keeps decreasing while the concentration of the Catalases mRNAs remains basically at zero level. From the top plot, the explanation could be that this is the interval in which mitochondrial respiration starts, thereby consuming O 2 .