Bridging the gap between gene expression and metabolic phenotype via kinetic models

Background Despite the close association between gene expression and metabolism, experimental evidence shows that gene expression levels alone cannot predict metabolic phenotypes, indicating a knowledge gap in our understanding of how these processes are connected. Here, we present a method that integrates transcriptome, fluxome, and metabolome data using kinetic models to create a mechanistic link between gene expression and metabolism. Results We developed a modeling framework to construct kinetic models that connect the transcriptional and metabolic responses of a cell to exogenous perturbations. The framework allowed us to avoid extensive experimental characterization, literature mining, and optimization problems by estimating most model parameters directly from fluxome and transcriptome data. We applied the framework to investigate how gene expression changes led to observed phenotypic alterations of Saccharomyces cerevisiae treated with weak organic acids (i.e., acetate, benzoate, propionate, or sorbate) and the histidine synthesis inhibitor 3-aminotriazole under steady-state conditions. We found that the transcriptional response led to alterations in yeast metabolism that mimicked measured metabolic fluxes and concentration changes. Further analyses generated mechanistic insights of how S. cerevisiae responds to these stresses. In particular, these results suggest that S. cerevisiae uses different regulation strategies for responding to these insults: regulation of two reactions accounted for most of the tolerance to the four weak organic acids, whereas the response to 3-aminotriazole was distributed among multiple reactions. Moreover, we observed that the magnitude of the gene expression changes was not directly correlated with their effect on the ability of S. cerevisiae to grow under these treatments. In addition, we identified another potential mechanism of action of 3-aminotriazole associated with the depletion of tetrahydrofolate. Conclusions Our simulation results show that the modeling framework provided an accurate mechanistic link between gene expression and cellular metabolism. The proposed method allowed us to integrate transcriptome, fluxome, and metabolome data to determine and interpret important features of the physiological response of yeast to stresses. Importantly, given its flexibility and robustness, our approach can be applied to investigate the transcriptional-metabolic response in other cellular systems of medical and industrial relevance.


Derivation of generic kinetic expressions
Consider a set of metabolites that are transformed in a single irreversible reaction: where i a and j b denote the stoichiometric coefficients of species i A and j B , respectively. We Similarly, for a single reversible reaction: After normalizing the protein levels and metabolite concentrations, the reaction rate can be written as, ( ), where f v and b v are determined as follows: where β is a fitting parameter.

Thermodynamic constraints on β
By definition, β is constrained to be larger than 1.0. In addition, β must satisfy thermodynamic constrains for parallel pathways: the overall equilibrium constant should be the same for all parallel pathways. The relation between β and the equilibrium constants is given by: We have two cases of parallel pathways in the metabolic network that impose the following   constraints:   ,   31  ,  30  ,  33  ,  32  ,  25 , eq eq eq eq eq . 54 , 55 , 29 , eq eq eq Substituting the equilibrium constants as a function of β for the flux distribution of the ∆gnc4 mutant under 3-aminotriazole treatment, we obtained the constraints: ,we finally obtained: Similarly, for the flux distribution of the reference culture for the weak organic acid treatment experiments, we obtained the constraints:

Rate expressions for lumped reactions
To simplify the derivation, consider the sequence of two irreversible reactions: where k d denotes the stoichiometric coefficients of species .
where r denotes the rate of the lumped reaction and v represents the flux through the reactions.
Equations (S24) and (S25) can be solved for the product of concentrations of j B : This can be generalized for n sequential irreversible reactions between substrates i A and products k D as, For a sequence of reversible reactions: The rates of these two reactions are described by the expressions: Assuming that under steady-state conditions: , after algebraic manipulations of Eq. (S32) to Eq. (S35), the rate of the lumped reaction is given by: To further simplify the expressions, we used the following heuristic approximations: and the rate expression for the lumped reversible reaction: This approximate expression is close to the exact expression when β is much larger than 1.0 and 1 g and 2 g are similar (see Figure 1), and allows us to use the same expression to compute the overall gene expression changes for both reversible and irreversible reactions.
For sequential reactions involving irreversible and reversible reactions, the expressions became too convoluted. Therefore, we approximated their reaction rate as, where γ is the number of irreversible steps.

Overall gene expression change for reactions associated with multiple genes
Consider a single-step reaction associated with n genes. There are two basic cases for computing the overall gene expression change: 1. All genes are required for the reaction to occur (e.g., the genes code for subunits of an enzyme): 2. Genes code for independent isoenzymes: where i w is a weight given to gene i.
In the examples shown in the main text, the weights were a function of the expression intensity ( i h ) of each gene: Thus, highly expressed genes were given more weight than lowly expressed genes.
For more complicated reaction-gene associations, the overall gene expression change was computed by applying the appropriate mean (arithmetic or geometric) in a recursive way. For example, for a lumped reaction, first the overall gene expression change for each step was computed. Then, the overall expression change for the lumped reaction was computed using the geometric mean, because all steps of the lumped reaction were required.
Method to compute a reference flux distribution from measured uptake/production rates A reference flux distribution that satisfies the stoichiometric constrains and the measured uptake and production rates of extracellular metabolites can be obtained with the following four-step procedure: 1. Compute the lower and upper bound of every reaction in the network. The lower bound was computed by solving the optimization problem: Step 1: 4. The solution of the problem in Step 3 may have some of the reactions with zero flux, even if their lower or upper bounds are non-zero. However, we assumed that if the enzyme(s) and substrate(s) of a given reaction are present, then the reaction must have a non-zero flux. To compute a non-zero flux for such reactions we solved the following problem: where Z is the set of reactions with zero flux, 0 z is the objective function value for the solution of the problem in Step 3, and L is a slack parameter. This problem is solved iteratively until all possible reactions have a non-zero flux. Then, the reference flux distribution is computed from all solutions: where m is the number of solutions j v .
We tested the method to compute a reference flux distribution using only uptake and production

Estimation of fitting parameters
Estimation of α, β, and δ We estimated the parameters α, β, and δ by minimizing the sum of squared error between the simulated and the experimental amino acid concentration changes. Figure 4 shows

Parameters specific for the analysis of S. cerevisiae's response to WOA treatment
The constructed models for simulating the response of S. cerevisiae to treatment with WOAs only have one fitting parameter. This parameter is the uptake rate of the WOA, and it was assumed to be the same for all WOAs. We set the WOA uptake rate that minimizes the average distance of model predictions to the experimental data of the treatment conditions for all WOAs ( Figure 5): r , is the uptake rate of acetic acid by diffusion in the reference condition, estimated using the method for computing the reference distribution using the procedure described in Section 1 above.

Parameters specific for the analysis of S. cerevisiae's response to histidine starvation
The constructed models to simulate the response of S. cerevisiae to histidine starvation only have two fitting parameters. The first parameter was included to account for the different inhibition where gly ser r → denotes the rate expression without accounting for inhibition and the parameter gly ser k → was set such that the difference between the predicted and measured concentration changes of glycine and methionine was minimized.

Simulations under histidine starvation without gene expression data
Relatively large correlation coefficients between predicted and experimental free amino acid concentrations can be obtained even without gene expression data. Figure 8 shows the simulated response of the wild-type culture without using gene expression data.

Identification of candidate off-targets of 3-aminotriazole
We identified candidate off-targets of 3-aminotriazole using an in-house ligand-based target identification protocol [3]. This protocol determines if a query molecule is a potential inhibitor of a protein based on the structural overlap between the query molecule and known inhibitors of the protein. Currently, the database of target proteins contains mainly human proteins. Therefore, we looked for possible targets of 3-aminotriazole in this database. Subsequently, we looked for S. cerevisiae proteins with sequence similarity to the identified target human proteins. The top ten proteins with higher score in the database are given in Table 1. Five of the top 10 identified offtarget proteins of 3-aminotriazole have high sequence similarity with S. cerevisiae's proteins.
Notably, these five proteins are involved in the synthesis of tetrahydrofolate, a coenzyme required for the synthesis of glycine from serine. The genes encoding the potential targets of 3aminotriazole are shown in the last column of Table 1, and their function in the synthesis of tetrahydrofolate is shown in Figure 9.      The uncertainty propagation analysis was carried out by simulating the model with random metabolic fluxes and GED generated by sampling normal distributions with the mean and standard deviations of the experimental data.

Expression profiles of the two most determinant reactions for tolerance to weak organic acids (WOAs)
We estimated the gene expression profiles using the gene expression data from Abbott et al. [2].
As in the source article, gene expression levels lower than 12 were set to 12 and all microarrays were scaled to a target average signal of 150. We performed two sample t-test to evaluate differential expression between the treatments and the reference conditions and estimated the false discovery rate on the p-values from the t-test. The expression profiles for the genes associated with the two most important reactions of S. cerevisiae tolerance to WOA treatment are given in Table 2.   [2] (gene expression change larger than two-fold and a false discovery rate less than 0.5%). b Ratio of gene expression level between the treatment and reference condition. c p-values from a two-sample t-test to evaluate differentially expression of genes. d q-values for the estimated false discovery rate of differentially expressed genes.