Integration of time-resolved transcriptomics data with flux-based methods reveals stress-induced metabolic adaptation in Escherichia coli

Background Changes in environmental conditions require temporal effectuation of different metabolic pathways in order to maintain the organisms’ viability but also to enable the settling into newly arising conditions. While analyses of robustness in biological systems have resulted in the characterization of reactions that facilitate homeostasis, temporal adaptation-related processes and the role of cellular pathways in the metabolic response to changing conditions remain elusive. Results Here we develop a flux-based approach that allows the integration of time-resolved transcriptomics data with genome-scale metabolic networks. Our framework uses bilevel optimization to extract temporal minimal operating networks from a given large-scale metabolic model. The minimality of the extracted networks enables the computation of elementary flux modes for each time point, which are in turn used to characterize the transitional behavior of the network as well as of individual reactions. Application of the approach to the metabolic network of Escherichia coli in conjunction with time-series gene expression data from cold and heat stress results in two distinct time-resolved modes for reaction utilization—constantly active and temporally (de)activated reactions. These patterns contrast the processes for the maintenance of basic cellular functioning and those required for adaptation. They also allow the prediction of reactions involved in time- and stress-specific metabolic response and are verified with respect to existing experimental studies. Conclusions Altogether, our findings pinpoint the inherent relation between the systemic properties of robustness and adaptability arising from the interplay of metabolic network structure and changing environment.


Experimental Methods
A detailed description of the experimental procedure, the transcript data extraction and normalization can be found in the Materials and Methods section of [1]. To approximate the growth rate of the cells under stress condition, we assume that the change in optical density (OD) scales linearly with the biomass production of the cells. To avoid the introduction of additional uncertainties, we also make the following simplifying assumption: OD-specific cell concentrations do not differ significantly for different growth conditions and the biomass production is approximately the same after both heat and cold shock. From the OD data of the supplementary file [1], we calculate the mean OD values of the tree replicates for each time point and determine the differences between time intervals for the control, cold and heat condition. Based on these values, we approximate the growth rate with respect to the control condition. The results are summarized in Table S1. We consider a biomass production of the minimized network of at least 10% of the original network as a valid optimization criterion.

Weighting of reactions
Statistical analysis of the gene expression data is performed using the software environment R [2] and the packages "limma" [3] (for differential gene-expression analysis) and "mixtools" [4] (for bimodal distribution analysis). The data are log-normalized and differential gene expression is determined for each time-point between the replicates of the control and the respective stress condition. The expression values of gene i for all time-points and conditions (including also data from lactose shift and oxidative stress experiments for five time-points) are used as an approximation of the whole spectrum of possible expression values. As a threshold for calling a distribution bimodal or not, the median of all log-likelihoods of the fitted curves (normalmix ) is used. For those genes showing a bimodal distribution, the intersection of both fitted curves determines the gene-specific threshold value ϑ. If the gene shows no bimodality, the median of all expression values for the respective gene is used as a threshold (for an example see Fig.  S1).
We use the genome-scale metabolic network reconstruction of E. coli K-12 [5]. The network contains 153 external metabolites. To avoid that they become dead-end metabolites, i.e., are only consumed or produced, they are removed form the network. Out of the 1075 reactions of the network, 871 can be weighted by experimental data. After splitting reversible reactions into a forward and a backward reaction the resulting network consists of 618 metabolites and 1331 reactions. Two example distributions of weights mapped onto the network are shown in Figure S2.

Flux-based methods
Flux-based methods employ the stoichiometry of a metabolic network and constraints on the reaction velocities (e.g., upper and lower boundaries, reaction reversibility) to predict steady-state flux distributions. These methods allow the investigation of biochemical network, even when detailed kinetics of the reactions are unknown or uncertain, which is the case for most biochemical networks. The stoichiometry of all reactions of the network is represented by a matrix S, where S ij denotes the stoichiometric coefficient of metabolite i in reaction j. For a system of n metabolites and m reactions S is an n x m matrix. Depending on the metabolic state, each reaction j proceeds at a certain reaction velocity v j . At steady-state the system is mass-balanced and constrained to fulfill the following equation: where v is the vector of fluxes. Further constraints on the system are imposed by assigning boundaries to the reaction velocities: where v min and v max are the upper and lower boundaries, respectively. These constraints result from the availability of nutrients in the growth media, maximal enzyme activities or thermodynamic constraints. If a reaction v j is irreversible, the corresponding lower boundary v min j is zero. For most biological networks there are more reactions than metabolites and therefore the system is under-determined [6]. This implies that the imposed constraints do not define a unique solution, but rather a solution space Φ, which is a convex set in the m-dimensional space of fluxes.

Flux balance analysis
Flux balance analysis (FBA) is a flux-based method that aims at optimizing a certain objective function. For metabolic models of microorganisms the maximization of growth has been shown a valid optimization goal [7,8]. This optimality condition can be combined with the steady-state constraints by introducing a biomass reaction with flux v bm . This reaction produces biomass from a linear combination of several metabolic precursors in the network. The resulting mathematical formulation is the following: where c is a vector of weights, indicating how much each reaction contributes to the objective function.
In the case when only the biomass production is optimized, c is a vector of zeros with only one entry equal to one at the position of v bm . By using this formulation, one can determine points within Φ, that describe optimal flux distributions with respect to the objective function. Due to its linear formulation FBA is readily applicable to large-scale networks.

Elementary flux modes
An elementary flux mode (EFM) is defined as a minimal set of enzymes that can operate at steadystate, with all irreversible reactions involved used in the appropriate direction [9]. In contrast to FBA, EFM-based analysis does not predict optimal flux distributions, as it captures the whole set of nondecomposable fluxes. Therefore, any possible flux distribution of the metabolic system can be represented as a linear combination of its EFMs.

Finding the dual problem
The fist step in transforming a bi-level to a single-level mixed-integer linear optimization problem (MILP) is to find the dual for the inner linear program. From duality theory [10], it follows that for each linear problem that can be rewritten in the standard maximum problem form: there is a dual standard minimum form: where x and µ represent the vector of variables of the dual and the primal, respectively, c and b are vectors of coefficients and A is a matrix of coefficients. From the strong duality theorem [10], it follows that, if the primal has an optimal solution x * and the dual also has an optimal solution µ * , their objective function values must be equal : Our bilevel program is the following: where y j is the Boolean variable and w j the weight associated with reaction j, c j is the contribution of v j to the biomass production, f min is a requirement for a minimal biomass production and v max j is the upper boundary on v j , while D, N denote the set of dispensable and nondispensable reactions, respectively. We use that equality condition Eq. (1) to transform the inner optimization problem into a non-linear system of equality constraints.
This transformation leads to a single-level mixed-integer problem: where µ i , µ bio , µ max i are the dual variables associated with the coefficients of the stoichiometric matrix S ij , the biomass function and the upper boundaries for the reaction velocities v max j , respectively. By introducing new auxiliary variables [11], the non-linearities now occurring in the constraints, can be recast into equivalent linear expressions. For each non-linearity we introduce two new continuous variables q max and r max , where q max j = µ max j · y j . The continuous variable r max satisfies: where µ max j,low and µ max j,up are the lower and upper boundaries for µ max j,l , respectively. They can be calculated by minimizing and maximizing the value of µ max j,low and µ max j,up subject to the constraints of the dual problem, respectively [12].
The bi-level problem definition is now transformed into the following MILP: which is implemented and solved for different set of weights. The metabolic model of E. coli K-12 [5] is imported into Matlab [13] using the SBMLToolbox [14]. The optimization approaches (the network minimization program and FBA) are implemented in Matlab using Tomlab 7.8 [15]. EFMs in the time-and condition-specific minimal networks are computed using efmtool [16].

Kendall rank correlation coefficient
Let a = (a 1 , ...a n ) and b = (b 1 , ...b n ) be two sets of observations from two variables A and B respectively, such that all the values of a i and b i are unique. Any pair of observations ( and takes values between [−1, 1]. If the two sets are independent their Kendall τ coefficient is zero. If two rankings match perfectly, the coefficient takes a value of 1 and a value of -1, if the two sets have complete disagreement.

Clustering analysis
The clustering of the fractional appearance profiles is performed using the package "cluster" [17] and the function "PAM", using Pearson correlation as a distance measure. It is repeated 100 times for each number of clusters k between 2 and 10. The Silhouette index [18] is used to determine the most appropriate number of clusters.

GO enrichment analysis
To perform GO enrichment analysis for the selected reactions, we use the gene-reaction annotation of the metabolic network reconstruction to backtrack reaction related genes. For these genes, the blattner gene IDs are translated to Entrez IDs. The R package "hyperGTest" [19] and the org.EcK12.eg.db [20] are used to determine significant terms in the biological process ontology, with a significance level of 0.05. Table S1. Approximated growth rates from optical density measurements. Given are the delta mean OD values (∆∅OD) of three replicates for consecutive time-points t x/(x+10) and the approximated growth rate with respect to the control condition.  Table S2. Clustering of selected reactions by their fractional appearance profiles for cold and heat shock, respectively. Listed are the enzyme names for the respective reactions for each of the nine clusters. If a reaction is reversible, its proceeding direction is indicated by -f and -r, denoting forward-or backward, respectively.