### Common approaches to genome-scale optimization: FBA and MOMA

Figure 1 presents a geometric interpretation of the commonly used FBA and MOMA objective functions, contrasting them with the PSEUDO objective that we will describe below.

Standard FBA solves for a vector of metabolic fluxes within constrained flux space that maximizes cellular growth rate (Figure 1A). Because time is not represented in the model, predictions of growth rate or yield are treated identically. Mutations within the FBA framework are modeled as additional constraints that remove a region of the allowed flux space. For example, deletion of the *pgi* gene in *E. coli* eliminates phosphoglucose isomerase activity and constrains flux through that reaction to be zero. A mutant model is then re-solved to predict a new growth rate optimum (Figure 1B).

A popular alternative method for predicting mutant behavior is based on the Minimization of Metabolic Adjustment (MOMA) [15]. A mutant may not grow optimally if natural selection has not had a chance to act on the new genetic background. Instead, MOMA hypothesizes that a mutant will tend to approximate the wild-type state as closely as possible. Formally, a MOMA flux vector is found with minimum Euclidean distance to a single optimal wild-type profile, subject to the constraints of mutation (Figure 1C). This method therefore requires as input a unique optimal wild-type flux vector, which may be known from empirical measurements. In practice however, this point is often predicted with a standard FBA model.

Both FBA and MOMA use convex objective functions and convex constraints. Applications of these models can therefore access a powerful suite of convex programming algorithms [16]. A flux vector identified by convex programming is guaranteed to be unique, globally optimal, and can be computed in milliseconds. This is unlike most nonlinear optimization methods, which are computationally intensive and often can not guarantee a global optimum. The fast run times provided by convexity means that thousands of model variants can be rapidly re-solved in seconds on a conventional desktop. In metabolic engineering, for example, mutations can be screened combinatorially *in silico* for sets that improve production of a metabolite of interest.

### The problem of degeneracy in genome-scale models

Although solutions to FBA and MOMA problems are guaranteed to be globally optimum, they are not guaranteed to be unique. In practice, many different flux profiles allow equally optimum growth (Additional file 1: Figure S1 and Additional file 2: Figure S2). The problem of degeneracy is encountered frequently in the literature, and numerous attempts have been made to address it.

Degeneracy can be reduced by further constraining the model using known regulatory interactions [17], metabolite concentrations [18] or thermodynamic laws [19]. Previously measured flux rates, when available, are a particularly valuable guide for further predictions [15, 20, 21].

However, in many cases the additional information required to formulate these constraints is simply not available. Much of the power genome-scale methods is their potential to make predictions even in poorly characterized systems. Few metabolic flux measurements are available for most organisms, so here we have used only flux prediction techniques without flux measurements as inputs. Even in well known model organisms, regulatory interactions are only partially understood and imperfectly captured in a linear framework. In contrast, the approach we propose here uses only stoichiometric models and basic thermodynamic constraints, which can be inferred from any annotated genome sequence [22].

### PSEUDO: a new objective for microbial metabolism

A geometric interpretation of the PSEUDO method is presented in Figure 1D. We propose an objective function that explicitly accounts for a region of degenerate near-optimality. This region, **p**, is bounded as in a wild-type FBA model, with the additional constraint that it includes only flux configurations with nearly optimal growth. In this case, we set a threshold of at least 90% maximal growth rate on the vectors we will consider.

\begin{array}{c}\hfill {\mathrm{b}}_{L}\le \mathrm{p}\le {\mathrm{b}}_{U}\hfill \\ \hfill \mathrm{S}\xb7\mathrm{p}=\left[0\right]\hfill \\ \hfill {\mathrm{p}}_{\mathit{GROWTH}}\ge 0.90\xb7{\widehat{\mathrm{f}}}_{\mathit{GROWTH}}\hfill \end{array}

(1)

The flux bounds **b**_{
L
} and **b**_{
U
} constrain fluxes that are known to be thermodynamically irreversible or that are limited by media inputs. The matrix **S** represents the biochemical stoichiometries of all metabolic reactions. The product **S**·**p** yields the net production or consumption rate of each metabolite in the system, necessarily **0** in the steady state. The maximum growth rate, {\widehat{\mathrm{f}}}_{\mathit{GROWTH}}, derives from a standard FBA problem. Yet compared to a single-point FBA solution, the above is a more complete and conservative expression of what we can predict with confidence regarding growth-optimal fluxes. It expresses imperfection in the proposition that metabolism works only to maximize growth rate.

We then introduce a flux vector, **q**, representing a mutant version of the same organism. The **q** vector is not required to show near-optimal growth, but the mutation imposes the additional bounds, \mathsf{b}{\mathit{\text{'}}}_{U} and \mathsf{b}{\mathit{\text{'}}}_{L}, on a subset of fluxes, **q**_{
MUT
}. We propose that cellular regulation will push metabolism in the mutant, not towards a single optimal point as in MOMA, but towards a degenerate optimal region.

\begin{array}{l}\begin{array}{l}\mathrm{minimize}:\u2551\mathrm{p}-\mathrm{q}\u2551\hfill \\ \mathrm{subject}\phantom{\rule{0.25em}{0ex}}\mathrm{to}:\hfill \end{array}\hfill \\ \begin{array}{l}\phantom{\rule{3em}{0ex}}{\mathrm{b}}_{L}\le \mathrm{p}\le {\mathrm{b}}_{U}\hfill \\ {\mathrm{p}}_{\mathit{GROWTH}}\ge 0.90\xb7{\widehat{\mathrm{f}}}_{\mathit{GROWTH}}\hfill \\ \phantom{\rule{3em}{0ex}}\mathrm{S}\xb7\mathrm{p}=\left[0\right]\hfill \end{array}\phantom{\rule{2.5em}{0ex}}\begin{array}{l}\phantom{\rule{5em}{0ex}}{\mathrm{b}}_{L}\le \mathrm{q}\le {\mathrm{b}}_{U}\hfill \\ \phantom{\rule{2em}{0ex}}\mathrm{b}{\mathit{\text{'}}}_{L}\le \phantom{\rule{0.75em}{0ex}}{\mathrm{q}}_{\mathit{MUT}}\phantom{\rule{0.75em}{0ex}}\le \mathrm{b}{\mathit{\text{'}}}_{U}\hfill \\ \phantom{\rule{5em}{0ex}}\mathrm{S}\xb7\mathrm{q}=\left[0\right]\hfill \end{array}\phantom{\rule{0.25em}{0ex}}\hfill \end{array}

(2)

The above describes the geometric problem of finding the minimum distance between two polytopes: **p** representing the region of nearly optimal growth and **q** the space of possible fluxes limited by mutation. The point in **q** closest to the region **p** is the PSEUDO-predicted flux configuration for this metabolic mutant.

A PSEUDO solution will exist when the maximum mutant growth rate is less than the threshold set for the near-optimal region. Otherwise, the mutant and near-optimal flux polytopes overlap, describing a range of possible degenerate solutions. A minimum distance of 0 as a solution for (2) is immediately diagnostic for this degenerate case. A second form of degeneracy is possible in our model if the near-optimal and mutant polytopes align such that multiple solutions share the same non-zero Euclidean distance. This case is similar to classical FBA, in that an optimal objective value is defined but does not correspond to a unique solution.

As detailed in the Methods section, we can reformulate the objective ║p-q║ as a problem for either quadratic or conic convex programming. Therefore this formulation retains the desirable computational properties of FBA and MOMA, i.e. a guaranteed global optimum can be computed rapidly.

### PSEUDO growth predictions fall between FBA and MOMA predictions

Genome-scale optimization methods have a well-established utility for predicting growth rates of mutant strains under a variety of conditions. We compared growth rate predictions using the PSEUDO method to predictions from the FBA and MOMA techniques, both of which are commonly used for this purpose [23]. Predictions were compared to growth rates of *E. coli* deletion mutants from the Keio collection in defined glucose medium [24]. We compiled data for 795 mutant strains that could be represented in our model and for which growth rate data was available. The results of this comparison are shown in Figure 2.

The PSEUDO method was able to correctly predict the lethality phenotype for 88% of the deletion mutants examined. Results were comparable with FBA and MOMA (88% and 87%, respectively) and consistent with previous reports [4]. A deletion was predicted lethal if the growth rate was calculated to be less than 5% of the maximum. A deletion was empirically lethal if the measured OD yield was less than 5% of the maximum reported yield. The quantitative growth rate predictions made by the three methods were in general similar, with PSEUDO growth rates within 5% of the FBA optimum for 763 of the 795 mutants examined. Overall, FBA, MOMA and PSEUDO predictions correlated equally well with yield data (Spearman's *ρ* = 0.55, 0.54, 0.55, respectively). This general similarity was expected. Mutations which stoichiometrically block the production of biomass must be lethal by all three methods. Mutations in pathways that are not active in these media conditions will not affect growth predictions in any model.

However, PSEUDO growth predictions were found to differ by more than 5% from either FBA or MOMA predictions for 41 of the examined mutants (Figure 2A). This subset includes mutants bearing deletions of genes involved in glycolysis, the pentose phosphate pathway and the citric acid cycle. These disruptions in central carbon processing significantly impact cell fitness and require adaptations throughout the metabolic network, providing a challenging test case for predictive theories of metabolism. We therefore examined more closely behavior of our model in these cases.

Without exception, the PSEUDO growth prediction falls between those of FBA and MOMA. This pattern highlights an important feature of the PSEUDO objective function. FBA, by definition, calculates the highest possible growth rate consistent with thermodynamics and the conservation of mass. MOMA and PSEUDO growth predictions will both fall short of this maximum if there is a trade-off between matching the optimal growth rate and matching the rest of the wild-type flux vector. PSEUDO relaxes this trade-off by considering a range of targets within the degenerate optimal region and therefore PSEUDO growth rate predictions generally exceed those of MOMA.

The growth predictions that differ among the three models are plotted against measured growth data in Figure 2B. The mathematical requirement that PSEUDO predictions fall between FBA and MOMA correctly recapitulates the growth phenotypes of these strains. In all cases, a comparable rank correlation was found between predicted and observed growth rates (0.45, 0.42 and 0.63 for FBA, MOMA, PSEUDO respectively). However, we found the FBA predictions to consistently overestimate growth of these strains, while MOMA predictions were an underestimate in general. Figure 2C shows a histogram of prediction errors, the difference between measured and predicted yields for each method. The average PSEUDO prediction error was 0.06, and the mean of the error distribution was not significantly different from zero by either bootstrap resampling or ANOVA statistics (*p*-value: 0.2). In contrast, FBA systematically overestimated growth rates by 0.48 on average (*p*-value: 7.5·10^{-9}). MOMA predictions were lower than predicted values by 0.13 on average, and were significantly biased to underestimation (*p*-value: 1.0·10^{-8}).

### Precision and accuracy of PSEUDO flux predictions

We next sought to compare the biochemical flux predictions derived from our model to empirical flux measurements. The Metabolic flux rates of 31 central carbon reactions in 24 *E. coli* metabolic deletion mutants as determined by ^{13}C-tracer experiments were reported by Ishii and [25]. We found that 12 of these mutants carried enzymatic deletions that could be treated within our genome-scale optimization framework, and that 27 fluxes showed measureable variation between strains. Accounting for occasional omissions in the data set, we were able to curate a total of 320 flux measurements against which to compare our predictions.

Figure 3 compares measured fluxes values to predictions derived using the FBA, MOMA or PSEUDO objective functions. To facilitate comparison, both predicted and measured fluxes were normalized to the glucose uptake rate. Pearson correlation coefficients obtained for predictions within each of the 12 individual mutants ranged from 0.74 to 0.96. For 9 of the 12 strains, the PSEUDO method yielded a significantly higher correlation coefficient than either of the other two methods (p-value < 0.05). In one case, standard FBA produced the best predictions. In two cases, correlations from two or more methods were statistically equivalent. Statistical significance was assessed with Meng’s Z-test, which takes into account the high degree of correlation between the predictions from each method [26]. Similar significance results were obtained by bootstrap resampling. Exact correlation coefficients and p-values are reported in Additional file 3: Table S1.

For a global view of the metabolic behavior predicted by the FBA, MOMA and PSEUDO objective functions, we aggregated and examined predictions for all 320 measured fluxes across 12 mutants (Figure 4ABC). Across all 320 fluxes, PSEUDO was more predictive than FBA and MOMA (Pearson correlation coefficients of 0.86, 0.84 and 0.91 respectively for FBA, MOMA, and PSEUDO; *p*-value: 2.4·10^{-12}, Meng’s Z-test). Because measured flux values span several orders of magnitude, we also compared predictions to data using rank correlation coefficients that are less influenced by numerical outliers. An overall Spearman rank correlation of 0.82, 0.80, and 0.87 was obtained with FBA, MOMA and PSEUDO respectively. The higher coefficient from PSEUDO was again significant by bootstrap resampling (*p*-value < 1·10^{-6}).

Prediction accuracy was found to vary substantially for different pathways within central carbon metabolism. The histograms in Figure 4DEF compare prediction errors by each method in reactions belonging to glycolysis, the pentose phosphate cycle (PPP) and the tricarboxylic acid cycle (TCA) reactions. While prediction errors within glycolysis and the PPP were comparable among the 3 methods, the PSEUDO method produced significantly lower prediction errors within the TCA cycle. The FBA, MOMA and PSEUDO methods produced mean prediction errors of -41%, -42%, and -17% respectively. Flux variation within this class was also better predicted by PSEUDO. For TCA cycle reactions, FBA, MOMA and PSEUDO yielded Pearson correlations of 0.41, 0.38 and 0.60 respectively (*p*-value: 0.01). Thus, both the absolute level of TCA cycle flux and flux variations within the TCA cycle were better predicted by PSEUDO. Relatively high error in TCA cycle predictions when using growth as an objective in carbon-limited conditions has been reported in other models [15, 27]. We found predictions within the TCA cycle to be both the most error-prone and the most revised under our model, being responsible for most of the improved performance. We found no other significant differences in predictions from the 3 methods in glycolysis, the PPP, anapleurotic, or secretion reactions.

### Sensitivity analysis of TCA cycle predictions

To understand why the PSEUDO objective function improves flux predictions for the TCA cycle, we chose to investigate central carbon metabolism in greater detail using the *zwf* mutant as a case study (Figure 5). Carbon consumed in the form of glucose may be converted to biomass, fully oxidized to CO_{2} or secreted as reduced organic metabolites. We reasoned that the metabolic strategy used to fulfill a given objective function would be reflected in the way carbon is partitioned among these three final forms.

We observed significant differences in the ultimate fate of carbon in the *zwf* mutant as predicted with the FBA, MOMA and PSEUDO objective functions (Figure 5A). The FBA model predicts the largest flux of carbon into biomass (63%, 48%, 56% and for FBA, MOMA, PSEUDO, respectively). This is consistent with the mathematical requirement that FBA identify the highest possible growth rate. The MOMA model predicts a significant amount of carbon secreted in the form of organic metabolites (0%, 20%, and 2% for FBA, MOMA, PSEUDO, respectively). The profile of secreted metabolites also varies among the models. While PSEUDO predicts mainly acetate secretion, MOMA predicts significant secretion of 33 diverse metabolites including urea, glutamate, leucine and tryptophan. Organic secretion is likely driven by the MOMA objective to match the WT flux profile. In effect, the MOMA attempts to produce metabolites at WT levels that cannot be completely consumed and partially secretes the difference.

The PSEUDO objective function predicts the highest CO_{2} output (37%, 32% and 42% for FBA, MOMA, PSEUDO respectively). This increase in total carbon oxidation is consistent with the higher TCA cycle flux predicted by PSEUDO. It may be explained by the relaxation in PSEUDO of strict optimality requirements used in the other methods. PSEUDO is not explicitly driven, like FBA, to optimize biomass production. Nor does it produce excess metabolites and secrete them, like MOMA. With less carbon flux dedicated to these objectives, the PSEUDO model retains more carbon to fully oxidize.

We next performed a sensitivity analysis to characterize the effect of CO_{2} output on the behavior of each model. We constrained the total CO_{2} output flux to a series of specific values near the WT optimum and re-solved the *zwf* mutant model to determine the fluxes to growth and organic secretion. The resulting plots reveal the trade-offs confronted in each objective function when assigning central metabolic fluxes near optimality (Figure 5BC).

In the FBA model, the *zwf* mutant reaches a maximum growth rate of 87% WT with a slightly increased CO_{2} output. The secretion flux approaches zero as growth attains a maximum, indicating that secreting organic carbon is costly to growth. The MOMA model predicts less growth and more secretion, but also exhibits a trade-off between the two. Any deviation of CO_{2} output levels from the WT optimum results in less growth and more secretion. In contrast, the PSEUDO solution is not found near a local growth maximum. Instead, the PSEUDO method identifies a range of CO_{2} output fluxes that are consistent with near-optimal growth. In the absence of a trade-off with growth, the PSEUDO objective is free to match other features of the WT flux vector. In this case, it selects a high CO_{2} flux that coincides with low secretion, similar to the WT.

To further characterize the assignment of TCA cycle fluxes in PSEUDO, we extended our analysis to examine the CO_{2} produced by the three main carbon-oxidizing pathways: Glycolysis, the PPP and the TCA cycle (Figure 5EFG). We observed a striking qualitative difference in the behavior of PSEUDO and the other objective functions. Uniquely, the PSEUDO objective function predicts that no CO_{2} is generated through the PPP. In contrast, FBA and MOMA predict that significant CO_{2} production through the oxidative reactions of the PPP (2.5% and 10% of the total for FBA and MOMA, respectively).

In fact, the *zwf* mutant shows no oxidative PPP activity in published observations [28]. During normal glucose-limited growth, the *zwf* gene product (together with *pgl*) supplies 6-phosphogluconate to the oxidative reactions of the PPP. In the *zwf* mutant, the FBA and MOMA models alternately supply this molecule through the action of glucose dehydrogenase and glucokinase. While *E. coli* is capable of oxidizing glucose directly to gluconate, the activity appears only under a narrow range of conditions not found in laboratory cultures [29].

As revealed by sensitivity analysis, small alterations the in the PPP flux had significant consequences for the predictive power of each model. Constraining the oxidative PPP flux to zero only slightly decreased the growth predictions for the FBA WT and *zwf* models (Figure 5D). This indicates that a zero-PPP solution exists near the WT optimum. However, neither FBA nor MOMA identified this solution. In the case of FBA, glucokinase activity bestows a small growth advantage, while MOMA is driven to match the high PPP flux of the WT.

When PPP flux was decreased, the MOMA model predicted significantly higher growth and lower secretion, indicating that a zero-PPP solution alleviated the trade-off between these two fluxes in MOMA. As PPP fluxes approach zero, glycolytic and TCA cycle fluxes increase and converge in all models, indicating that similar solutions were found by FBA, MOMA and PSEUDO under reduced PPP flux. We conclude that the low TCA cycle predictions in the MOMA and, to a lesser extent, the FBA model are partially a consequence of positive oxidative PPP fluxes. In contrast, PSEUDO approaches a solution with near-zero PPP flux, still consistent with near-optimal WT growth, and much less disruptive for the *zwf* metabolic network.

In summary, several features of the PSEUDO objective appear to contribute to higher and more accurate predictions of TCA cycle fluxes for the *zwf* mutant. The PSEUDO model grows less than FBA and secrete less than MOMA, leaving more carbon available to oxidize. PSEUDO is less constrained in matching growth rates, predicting equal growth from a wide range of possible CO_{2} output fluxes. Within this range, the PSEUDO objective selects a flux profile that matches key features of the WT. High CO_{2} and TCA fluxes are consistent with glycolytic fluxes and a secretion profile that resemble the WT.

In the case of the *zwf* mutant, the qualitative features of the PSEUDO prediction including the high growth rate, the increased CO_{2} production and zero oxidative PPP flux agree with published phenotypic observations [28]. In contrast, the FBA and MOMA predictions fail to reproduce one or more of these three qualitative phenotypic features.

### PSEUDO predictions are robust to the near-optimal growth threshold

We next sought to characterize the global behavior of the PSEUDO model as a function of the threshold that we use to define near-optimal growth. This value, set to 90% optimal growth in Equation 2, is a required input parameter for our model that has no equivalent in either FBA or MOMA. Mathematically, this represents a degree of uncertainty in the hypothesis that metabolism optimizes only growth rate. Biologically, this threshold could be understood as the point at which selection for growth is counterbalanced by other, unknown, selection pressures or by inherent noise. The results presented above were made using a threshold parameter of 90% optimal growth. This selection was guided by growth variability reported in the literature. For example, the reported growth rates of metabolic mutants of *B. subtilis* suggest that metabolism may be sub-optimal for growth at roughly this level under laboratory conditions [13].

We found that PSEUDO predictions were remarkably stable as the near-optimal growth threshold was varied from 80-99%, as shown in Figure 6. Both Pearson and Spearman correlation values for PSEUDO predictions reached a maximum with the growth threshold set to 90%, and declined as near-optimal growth converged to maximum theoretical growth (Figure 6AB). We observed no qualitative differences in model behavior across this parameter range (Figure 6CDEF). This behavior is consistent with the convex shape of flux space. In a convex space, variability tends to increase rapidly for small deviations from optimality, then decelerate and plateau at moderate deviations [30–32]. Robustness with respect to the selected threshold is an important feature of the PSEUDO model, as this parameter may be difficult to measure in practice.

For very high or low values of the growth threshold, the practical application and biological interpretation of the PSEUDO model becomes more difficult. In practice, we were unable to compute solutions with threshold values higher than 99.9%, as extremely narrow range constraints on individual fluxes are known to challenge interior-point optimization solvers [33]. In this regime, we expect that PSEUDO predictions will become MOMA-like as the near-optimal region shrinks to become the optimal region. For low values of the growth threshold, the near-optimal region grows eventually to include the MOMA and FBA solutions and specific PSEUDO predictions are undefined.

### A cloud theory of metabolic regulation

The PSEUDO formulation includes a limit on the power of growth rate alone to determine metabolic behavior. This may represent simply a formal accounting for uncertainty, which is neglected in simpler models. Alternately, our objective function may be interpreted as expressing an organizational principal at work in metabolism. The metabolic network may not be perfectly adaptable in the service of growth, as imagined in FBA, nor strictly committed to a singular flux profile, as postulated by MOMA. The PSEUDO method mediates between these perspectives, attributing to metabolism an intermediate level of flexibility. Our model suggests that regulation drives metabolic fluxes to a certain range of values, but that fluctuations within that range are fitness-neutral and unregulated. On this hypothesis, we expect that regulation will allow fluxes to vary in proportion to the size of their near-optimal range.

To compare theoretical and observed flux variability, we first sought a properly normalized measure of the size of the degenerate optimal polytope in each dimension. While it is computationally infeasible to describe this space completely, [12, 34], it is possible to estimate its shape probabilistically [32]. As described in the methods, we used a Monte Carlo sampling technique to generate a set of 3000 random points uniformly distributed within the degenerate optimal region [35, 36]. These points allow an unbiased estimate of the distribution of values a given flux may attain without compromising near-optimal growth.

An ideal measure of flux variability *in vivo* would compile flux values from individual wild-type cells. Unfortunately, no such single-cell resolution dataset is currently available. Instead, we examined flux variability at the population level under two sources of genetic perturbation. The Tomita data set includes 31 fluxes measured in chemostat cultures of 24 deletion mutants for enzymes in central carbon processing [25]. Viable glycolysis and pentose phosphate pathway deletion mutants often exhibit substantially revised flux profiles and test the limits of metabolic plasticity. The Sauer data set measures 24 central carbon fluxes in 91 transcription factor deletion mutants [37]. These regulatory disruptions globally alter flux profiles while leaving the enzymatic network intact. We reasoned that these data sets together would allow us to characterize the tendency of individual fluxes to vary under perturbation.

Figure 7 compares our measures of theoretical and observed variation. For each flux in our data set, we compared the coefficient of variation (CV) derived from computational Monte Carlo sampling (Figure 7A) to the CV from published measurements (Figure 7B). Measured variability in both data sets was well matched by theoretical variability within the degenerate optimal region (Figure 7CD). The Sauer flux measurements produced a rank correlation of 0.72 (p-value: 6.7·10-5). For the Tomita data, Spearman's ρ was 0.87, (p-value: 5.6·10-9). Exact values for the estimated and measured variability of each flux are reported in Additional file 4: Table S2. The observed correlation between predicted degeneracy and measured variation supports a model in which metabolism may adopt many possible flux configurations without compromising growth rate.