Comparative multi-goal tradeoffs in systems engineering of microbial metabolism
© Byrne et al.; licensee BioMed Central Ltd. 2012
Received: 29 April 2012
Accepted: 29 August 2012
Published: 26 September 2012
Metabolic engineering design methodology has evolved from using pathway-centric, random and empirical-based methods to using systems-wide, rational and integrated computational and experimental approaches. Persistent during these advances has been the desire to develop design strategies that address multiple simultaneous engineering goals, such as maximizing productivity, while minimizing raw material costs.
Here, we use constraint-based modeling to systematically design multiple combinations of medium compositions and gene-deletion strains for three microorganisms (Escherichia coli, Saccharomyces cerevisiae, and Shewanella oneidensis) and six industrially important byproducts (acetate, D-lactate, hydrogen, ethanol, formate, and succinate). We evaluated over 435 million simulated conditions and 36 engineering metabolic traits, including product rates, costs, yields and purity.
The resulting metabolic phenotypes can be classified into dominant clusters (meta-phenotypes) for each organism. These meta-phenotypes illustrate global phenotypic variation and sensitivities, trade-offs associated with multiple engineering goals, and fundamental differences in organism-specific capabilities. Given the increasing number of sequenced genomes and corresponding stoichiometric models, we envisage that the proposed strategy could be extended to address a growing range of biological questions and engineering applications.
Keywords: Metabolism, Microorganisms, Metabolic engineering, Constraint-based modeling
Microorganisms possess metabolic capabilities that are essential to society, science, and industry. Today, most bulk and specialty chemicals are derived from crude oil. However, declining oil reserves, rising oil prices, and growing environmental concerns have prompted renewed interest in producing chemicals using microorganisms instead of fossil fuels. To transform microbial hosts into cellular factories, the applied discipline of systems metabolic engineering is using genome-scale approaches that redirect microbial metabolism to synthesize renewable and cost-effective biochemicals[1–3].
Classical metabolic engineering methods use localized metabolic intuition and random mutagenesis screening to develop microbial strains with improved biochemical production capabilities. For example, Escherichia coli does not naturally produce succinic acid as a major fermentative product; consequently, early metabolic engineering efforts targeted metabolic pathways that were thought to be involved in succinic acid synthesis. However, these perceived improvements were often ineffective or produced undesirable side-effects (e.g. large amounts of impurities were produced or cell growth was significantly inhibited)[4–7]. While some conventional strategies have shown a degree of success, production levels for succinic acid, as well as for many other valuable biochemical compounds, often fall considerably short of maximum theoretical production limits[1, 3, 8]. These shortcomings are due, in part, to the fact that metabolic pathways and related regulatory processes form complex molecular and functional interaction networks. When the focus is solely on one particular enzyme or metabolic pathway, interrelated and potentially undesirable effects elsewhere in the cell are easily missed.
Similarly, conventional metabolic designs often focus solely on achieving maximal production rates or yields of targeted compounds without accounting for adverse economic consequences (e.g., due to material costs or final product purification processes) that may ultimately make a design impractical or commercially infeasible. Many, or even most, real engineering problems have multiple engineering goals, such as maximizing operational performance, minimizing material cost, and maximizing experimental reproducibility. The criteria subsequently used for the design and optimization of engineering processes largely depend on which engineering goals are chosen. For industrial fermentation processes, four of the most important design-selection criteria are productivity, yield, final titer, and economic cost. Productivity is the rate of product generation and is important to ensure the effective utilization of production capacity (e.g. capacity of bioreactors). Yield is the ratio of product formed to substrate consumed and is used as a measure of production efficiency. Final titer reflects how concentrated and pure the product is in the fermentation medium, and it is important because further treatment of the fermentation medium, such as removal of impurities, may be necessary. Finally, economic cost is the monetary expenditure per unit of generated product. Economic costs may be associated with each component of the fermentation process and may ultimately dictate the viability of a product given current market conditions. Furthermore, engineering criteria may be condition-dependent (e.g. the criteria used for high-volume, low-value-added industrial fermentation products may differ significantly from those used for low-volume, high-value-added products) and conflicting (e.g. the goal of maximum productivity may adversely affect the goal of minimum economic cost). Thus, analyzing tradeoffs among engineering goals can help to differentiate and prioritize design-selection criteria.
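The four criteria above can be made concrete with simple flux-based formulas. The sketch below is a hypothetical illustration, not the metric definitions used in this study: it assumes productivity is the specific secretion flux of the target product, yield is product flux divided by carbon-source uptake flux, purity is the target's share of total secreted byproduct flux, and economic cost is the sum of nutrient uptake fluxes multiplied by assumed unit prices. The class `Phenotype`, the function `design_metrics` and all prices are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Phenotype:
    """Fluxes for one simulated design, all in mmol gDW^-1 hr^-1 (hypothetical example)."""
    product_flux: float      # secretion flux of the target product
    carbon_uptake: float     # uptake flux of the carbon source
    total_secretion: float   # summed secretion of all byproducts, including the target
    uptake_fluxes: dict = field(default_factory=dict)  # nutrient -> uptake flux


def design_metrics(p: Phenotype, prices_per_mmol: dict) -> dict:
    """Illustrative productivity, yield, purity and cost metrics (assumed definitions)."""
    productivity = p.product_flux
    product_yield = p.product_flux / p.carbon_uptake if p.carbon_uptake > 0 else 0.0
    purity = p.product_flux / p.total_secretion if p.total_secretion > 0 else 0.0
    # Assumed linear pricing; a nutrient without a price raises KeyError, so only
    # designs whose nutrients all have price data can be costed.
    cost_rate = sum(flux * prices_per_mmol[nutrient]
                    for nutrient, flux in p.uptake_fluxes.items())
    cost_per_product = cost_rate / p.product_flux if p.product_flux > 0 else float("inf")
    return {
        "productivity": productivity,
        "yield": product_yield,
        "purity": purity,
        "cost_rate": cost_rate,
        "cost_per_product": cost_per_product,
    }


# Hypothetical design and prices ($ per mmol), for illustration only
example = Phenotype(product_flux=10.0, carbon_uptake=20.0, total_secretion=25.0,
                    uptake_fluxes={"glc": 20.0, "nh4": 5.0})
metrics = design_metrics(example, {"glc": 0.001, "nh4": 0.0005})
print(metrics)
```

Because each metric is a plain function of stored fluxes and prices, new or nonlinear criteria can be added without re-running any simulation.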
To help evaluate and understand these complex biological and engineering relationships, system modeling is becoming an increasingly valuable tool for scientists and metabolic engineers alike. Kinetic modeling has been used to evaluate the dynamic enzymatic effects of metabolism. However, at whole-cell scales, kinetic modeling can become unwieldy due, in part, to its need for kinetic parameters that may be difficult to obtain experimentally. Consequently, constraint-based modeling has become a powerful alternative, since it obviates this prerequisite by approximating metabolism at steady state. Despite this simplification and some additional limitations, constraint-based modeling has been experimentally shown to provide valuable predictions of whole-cell metabolic fluxes and growth phenotypes under a variety of environmental and genetic conditions[13–15]. As a result, a growing number of constraint-based analysis methods are being developed to evaluate metabolic models and the corresponding mathematical solution space that characterizes the phenotypic potential of an organism. For example, flux balance analysis (FBA) uses a chosen objective function to search the edges of the mathematical solution space for a single optimal network state and its associated flux distribution. FBA has been used for a variety of applications, such as predicting the lethality of gene knockouts and quantitatively predicting cellular growth rates and fluxes under different conditions. Bi-level optimization approaches based on FBA have been developed to simultaneously optimize two hierarchically related objectives, such as primary and secondary metabolite production, in microbial strain design. In particular, an initial algorithm aimed at identifying optimal designs through multiple gene knockouts (OptKnock) was followed by more versatile approaches capable of taking into account gene up-regulation and down-regulation (OptReg), as well as existing flux measurements (OptForce).
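As a minimal sketch of FBA, the toy network below (two metabolites, four reactions; not one of the genome-scale models used in this study) is solved as a linear program with SciPy's `linprog`: maximize the biomass flux subject to the steady-state constraint S·v = 0 and the flux bounds. The network, its bounds and the helper name `fba` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def fba(S, bounds, objective_index):
    """Maximize one flux subject to S @ v = 0 and lower/upper flux bounds.

    Returns (optimal objective value, flux vector), or None if the LP is infeasible.
    """
    c = np.zeros(S.shape[1])
    c[objective_index] = -1.0  # linprog minimizes, so negate the objective to maximize
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    if res.status != 0:
        return None
    return -res.fun, res.x


# Hypothetical toy network. Rows: metabolites A, B. Columns: reactions
# R0: uptake -> A, R1: A -> B, R2: A -> secreted byproduct, R3: B -> biomass
S = np.array([
    [1.0, -1.0, -1.0, 0.0],
    [0.0, 1.0, 0.0, -1.0],
])
bounds = [(10, 10), (0, 6), (0, 1000), (0, 1000)]  # uptake fixed at 10; R1 capped at 6

growth, v = fba(S, bounds, objective_index=3)
print(growth, v)  # unique optimum: biomass flux 6, with 4 units of A secreted via R2
```

A gene knockout is simulated in the same framework by setting the bounds of the reactions it catalyzes to (0, 0) and re-solving.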
Rather than analyzing single network states, other constraint-based analysis techniques, such as extreme pathway analysis and uniform random sampling, may be used to assess global network properties, characterizing ranges of optimal or sub-optimal biochemical network states. In addition, to address multiple optimality goals that may conflict and cannot be optimized simultaneously, multi-objective optimization and trade-off analysis approaches have been recently developed[24–26]. Together, these methods are yielding new biological and engineering insights.
In this study, we develop an integrative computational framework that elucidates relationships between environmental and genetic perturbations and their system-wide effects on microbial metabolism and metabolic engineering design strategies. Prior metabolic engineering studies have primarily focused on either environmental or genetic perturbation strategies, a single organism, one or a few engineering goals (usually productivity or yield) and optimal design solutions. Conversely, our approach addresses the multifaceted nature of metabolic engineering design processes by exhaustively generating and systematically analyzing more than four hundred million designs that incorporate both extracellular (i.e. medium composition) and intracellular (i.e. genetic knockout) perturbations and multiple microorganisms and engineering goals. Although any biochemical reaction network and synthesized target metabolite can be incorporated into our methodology, we focus on three microorganisms (E. coli, S. cerevisiae and S. oneidensis) and six target metabolite by-products of industrial interest: acetate, ethanol[28, 29], formate, hydrogen, D-lactate[27, 32], and succinate[33–35]. Escherichia coli[13, 36] and Saccharomyces cerevisiae are perhaps the best characterized and studied prokaryotic and eukaryotic microorganisms, respectively, and are commonly used for a wide range of computational and experimental scientific studies and industrial applications. Shewanella oneidensis is, by comparison, a more recently sequenced and less well-studied bacterium, yet it possesses considerable potential for bioremediation, microbial fuel cells and other bioenergy applications. A set of 36 biological and economic traits is used to evaluate corresponding engineering design goals. 
Although economic considerations are paramount in evaluating the feasibility of any industrial design with commercial potential, a methodology for incorporating economic factors into constraint-based modeling had not previously been implemented. The resulting population of phenotypes provides a rich dataset that is used to assess local design considerations and biological causalities, as well as global perturbation effects. An experimental consistency score is used to assess the expected agreement of predictions with experimental data, such as mRNA expression arrays. Additionally, we present local tradeoffs between individual designs and engineering goals, and global tradeoffs of metabolic traits across and within organisms. We find distinctive phenotypic characteristics that differentiate innate organism-specific metabolic capabilities, making certain organisms more suitable for particular engineering applications. We also find specific and general metabolic design strategies that can be used to facilitate optimal engineering output.
Generation of engineering design candidates
In total, more than 435 million conditions were simulated: 133,420,920 for E. coli, 179,133,985 for S. cerevisiae, and 123,124,374 for S. oneidensis. A relatively small fraction of these conditions produce viable-growth phenotypes (Additional file 1: Figure S1): 15% for E. coli, 11% for S. cerevisiae, and 9% for S. oneidensis. Thirty-six metabolic metrics (18 of which are functions of economic variables) are computed for each viable-growth phenotype. Box-plot statistics for the complete data set are shown in Additional file 1: Figure S2. Economic data were available for 80% of E. coli nutrients, 71% of S. cerevisiae nutrients, and 63% of S. oneidensis nutrients. Unless specified otherwise, subsequent analyses are performed on the economic data subset (see Methods for more details). Experimental data used for estimating an experimental consistency score were available for 149 of the simulated conditions. While in this work we use an indirect measure of experimental consistency and do not present a direct comparison of predicted and measured fluxes, we wish to emphasize that flux balance models have undergone a number of experimental tests[40–43] and have been used successfully for specific metabolic engineering applications, such as the production of lycopene and vanillin[40–43].
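The generation step can be pictured as a nested loop over media and gene-deletion sets, with one FBA problem per condition and a viability threshold on the biomass flux. The sketch below is hypothetical: the toy network, the gene-to-reaction map `GENE_RXN`, the two media and the threshold `VIABLE = 1e-6` are illustrative assumptions, and real genome-scale models use Boolean gene-protein-reaction rules rather than a one-gene-one-reaction map.

```python
import itertools

import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network. Rows: metabolites A, B.
# Columns: EX_glc (-> A), EX_ac (-> B), AtoB_1 (A -> B), AtoB_2 (A -> B, isozyme), BIOMASS (B ->)
REACTIONS = ["EX_glc", "EX_ac", "AtoB_1", "AtoB_2", "BIOMASS"]
S = np.array([
    [1.0, 0.0, -1.0, -1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0, -1.0],
])
GENE_RXN = {"g1": "AtoB_1", "g2": "AtoB_2"}  # hypothetical one-gene-one-reaction map
MEDIA = {"glc": {"EX_glc"}, "ac": {"EX_ac"}}  # uptake reactions open in each medium
MAX_UPTAKE = 10.0
VIABLE = 1e-6  # assumed minimal biomass flux counted as viable growth


def growth_rate(medium, deleted_genes):
    """Maximal biomass flux for one (medium, genotype) condition."""
    blocked = {GENE_RXN[g] for g in deleted_genes}
    bounds = []
    for rxn in REACTIONS:
        if rxn.startswith("EX_"):
            bounds.append((0.0, MAX_UPTAKE if rxn in MEDIA[medium] else 0.0))
        elif rxn in blocked:
            bounds.append((0.0, 0.0))  # knocked-out reaction carries no flux
        else:
            bounds.append((0.0, 1000.0))
    c = np.zeros(len(REACTIONS))
    c[REACTIONS.index("BIOMASS")] = -1.0  # maximize biomass
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    return -res.fun if res.status == 0 else 0.0


# Genotypes: wild type, all single deletions and all double deletions
genotypes = [combo for k in range(3) for combo in itertools.combinations(GENE_RXN, k)]
results = {(m, g): growth_rate(m, g) for m, g in itertools.product(MEDIA, genotypes)}
viable = [cond for cond, mu in results.items() if mu > VIABLE]
print(f"{len(viable)} of {len(results)} conditions viable")
```

In this toy case only the glucose medium combined with the double deletion fails to grow, because both isozymes linking A to B are removed.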
Different organisms are better at achieving different goals
Once organized into clustered meta-phenotypes, our data reveal that different organisms possess distinct dominant phenotypic characteristics. Figure 2 shows the meta-phenotypes for the subset of data with economic pricing. E. coli can be characterized by the smallest number of dominant meta-phenotypes (10), followed by S. oneidensis (20) and S. cerevisiae (30). The associated number of phenotypes is 10,086,971 for E. coli, 10,080,733 for S. cerevisiae, and 12,632,536 for S. oneidensis. The largest phenotype cluster for E. coli accounts for more than 56% of all phenotypes. That meta-phenotype has relatively low biomass production rates and high biomass carbon yields, as well as low profit rates due to lower rates of targeted byproduct synthesis. The second largest meta-phenotype is similar to the largest; the only major difference is its considerably higher acetate and hydrogen revenue yields. Together, the two largest meta-phenotypes account for more than 70% of all phenotypes. This implies that, for the metabolic traits and conditions under consideration, E. coli has relatively low phenotypic variation compared to S. cerevisiae and S. oneidensis, whose phenotypes are much more broadly distributed. The two largest meta-phenotypes contain 30% and 36% of all phenotypes for S. cerevisiae and S. oneidensis, respectively. A comparison between the meta-phenotypes within and between organisms can be visualized in the form of a correlation matrix (Additional file 1: Figure S18).
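This section does not state which clustering algorithm produced the meta-phenotypes (see Methods). As one plausible way to group phenotypes by their metabolic traits, the sketch below z-scores each trait and runs plain k-means with a deterministic farthest-point initialization. The choice of k-means, the function name `meta_phenotypes` and the toy trait matrix are assumptions for illustration only.

```python
import numpy as np


def meta_phenotypes(traits, k, n_iter=100):
    """Cluster rows of a (phenotypes x traits) matrix into k groups with k-means (assumed method)."""
    traits = np.asarray(traits, dtype=float)
    std = traits.std(axis=0)
    std[std == 0] = 1.0                        # avoid division by zero for constant traits
    X = (traits - traits.mean(axis=0)) / std   # z-score so rates, yields and costs are comparable

    # Farthest-point initialization: deterministic and spreads the starting centroids out
    centroids = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d2)])
    C = np.array(centroids)

    for _ in range(n_iter):
        # Assign each phenotype to its nearest centroid
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty)
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels


# Hypothetical traits: columns = biomass production rate, succinate production rate
traits = [[0.10, 1.0], [0.12, 1.1], [0.11, 0.9], [0.50, 20.0], [0.52, 21.0]]
labels = meta_phenotypes(traits, k=2)
sizes = np.bincount(labels, minlength=2)
print(labels, sizes.max() / sizes.sum())  # share of phenotypes in the largest cluster
```

The share of the largest cluster printed here corresponds to statistics such as "the largest phenotype cluster accounts for more than 56% of all phenotypes".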
Table 1. Selected Pareto optimal engineering designs
Each design was characterized by acetate production rate (mmol gDW⁻¹ hr⁻¹), biomass production rate (hr⁻¹), succinate production rate (mmol gDW⁻¹ hr⁻¹), and total economic cost rate ($ hr⁻¹).
Each entry lists the design criteria (with weights in parentheses), any gene deletions, and the medium as carbon source, electron acceptor, nitrogen source, phosphorus source, sulfur source.
Design 1: Succinate production rate (0.99), Succinate purity (0.01); medium: malthx, fum, gam, pi, so4
Design 2: Succinate production rate (0.5), Succinate purity (0.5); medium: malthx, fum, gam, pi, so4
Design 3: Succinate purity (1); medium: ac, fum, nh4, pi, so4
Design 4: Succinate production rate (0.01), Total economic cost rate (−0.99); medium: sucr, o2, gam, ppt, so4
Design 5: Succinate purity (0.5), Total economic cost rate (−0.5); medium: glc, o2, urea, pi, so4
Design 6: Succinate production rate (0.33), Succinate purity (0.33), Total economic cost rate (−0.33); medium: glyclt, fum, nh4, pi, so4
Design 7: Succinate production rate (1); gene deletions: ptsG, pykFA, pfl; medium: glc, NA, nh4, pi, so4
Design 8: Acetate production rate (1); medium: glc, NA, nh4, pi, so4
Design 9: Acetate purity (1); medium: glc, o2, nh4, pi, so4
Design 10: Microarray consistency (1); medium: glc, NA, nh4, pi, so4
Design 11: Acetate production rate (0.33), Microarray consistency (0.33), Acetate purity (0.33); medium: glc, NA, nh4, pi, so4
It is apparent that different organisms are better at achieving different goals. E. coli tends to produce higher ethanol rates than the other organisms, whereas S. cerevisiae tends to produce higher ethanol purities and at better cost efficiency. Higher ethanol production rates in E. coli tend to be positively correlated with increased economic cost. E. coli also seems to be better at producing hydrogen, whereas S. oneidensis tends to be better at producing formate. Both E. coli and S. oneidensis are good at producing acetate. Under various conditions, all the organisms appear to be able to produce relatively high levels of succinate.
An additional outcome of this analysis is that economic considerations significantly affect optimal choices of engineering designs. This may seem obvious, since commercial industries usually develop engineering strategies based on economic considerations; however, to our knowledge, this is the first time high-throughput FBA has been combined with economic considerations. A very distinctive bimodal economic feature of the engineering phenotypic landscape (Figure 2) is that there are very expensive, high-growth-rate designs (with low profit rates and high ethanol production rates) and, conversely, cheaper, low-growth-rate designs. In E. coli, ethanol tends to be a costly product, whereas ethanol appears to be more profitable in S. cerevisiae. In E. coli, acetate production tends to be more profitable than in the other two organisms.
Pareto analyses and correlation maps reveal local and global trade-offs
While more detailed analyses for all six metabolic products are available in the Supplementary Materials and in the online tool (see Methods), we focus here, as a representative example, on all candidate designs relevant for the production of succinate in all three organisms (Figure 3(A–C)). Results indicate that E. coli tends to have the greatest range of succinate capabilities, followed by S. cerevisiae and then S. oneidensis. The two-dimensional Pareto optimal frontier contains all multi-goal optimal designs and extends around the periphery of the solution space (Figure 3(A, B, D, E)). Piece-wise linear trade-offs are computed over a range of Pareto designs to determine the marginal gain or cost of relative changes in weighted linear combinations of goals. For example, for succinate production rate versus succinate purity (Figure 3(A)), there are 8 Pareto optimal designs within a succinate purity range of 0 to 0.84 (succinate purity is defined as a ratio and is dimensionless; see Additional file 1: Table S2 for more details), and linear regression yields a trade-off of −214.6 units of succinate production rate for every unit increase in succinate purity. Above a succinate purity of 0.84, there are 16 Pareto optimal designs, with a trade-off of −7.2 units of succinate production rate for every unit increase in succinate purity. Thus, below a purity of 0.84 each unit increase in succinate purity costs a very large decrease in succinate production rate, whereas above 0.84, where succinate purity is relatively high and succinate production rate is low, further increases in succinate purity come at a relatively small additional cost in succinate production rate. Additionally, for succinate purity below 0.6 there are several E. coli Pareto optimal designs, between 0.6 and 1 there are many S. cerevisiae Pareto optimal designs, and for succinate purity close to 1 there is one S. oneidensis Pareto optimal design. Thus, E. coli is better suited for high succinate production rates and low succinate purity (with succinate purity sensitive to design changes), whereas S. cerevisiae (and to a small extent S. oneidensis) is better suited for high succinate purity and low succinate production rate (with comparatively low sensitivity to design changes). Similar logic can be applied to the Pareto optimal designs in three dimensions presented in Figure 3(B, C).
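The Pareto filtering and piece-wise linear trade-off computation described above can be sketched as follows. This is a minimal illustration under stated assumptions: all objectives are maximized (a cost would be negated first), dominance is checked by brute force, and the trade-off in each purity range is the slope of an ordinary least-squares line fitted with `numpy.polyfit`. The design values are hypothetical and are not taken from Figure 3.

```python
import numpy as np


def pareto_front(points):
    """Indices of non-dominated rows; every column is an objective to maximize."""
    points = np.asarray(points, dtype=float)
    front = []
    for i, p in enumerate(points):
        # p is dominated if another design is >= on all objectives and > on at least one
        dominated = np.any(np.all(points >= p, axis=1) & np.any(points > p, axis=1))
        if not dominated:
            front.append(i)
    return np.array(front)


def tradeoff_slope(x, y, lo, hi):
    """Least-squares slope dy/dx over Pareto designs with lo <= x <= hi."""
    mask = (x >= lo) & (x <= hi)
    slope, _intercept = np.polyfit(x[mask], y[mask], 1)
    return slope


# Hypothetical designs: columns = succinate purity, succinate production rate
designs = np.array([[0.10, 200.0], [0.50, 120.0], [0.84, 20.0], [0.90, 19.0],
                    [1.00, 18.0], [0.40, 100.0], [0.90, 10.0]])
front = designs[pareto_front(designs)]
purity, rate = front[:, 0], front[:, 1]
low = tradeoff_slope(purity, rate, 0.0, 0.84)   # steep region
high = tradeoff_slope(purity, rate, 0.84, 1.0)  # flat region
print(low, high)
```

The two slopes reproduce the qualitative pattern described above: a large loss in production rate per unit purity in the low-purity range and a much smaller one at high purity. The quadratic dominance check is fine for illustration but would need a sort-based algorithm for millions of designs.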
One may choose to prioritize simulated designs by their degree of consistency with available experimental data. High experimental consistency indicates that subsequent experimental validation is more likely to agree with the predicted design solution. We computed experimental consistency scores by mapping available mRNA microarray data to metabolic flux values, in analogy with previously developed approaches to integrate gene expression data with FBA modeling (see Methods). Figure 3(E) shows designs considered for maximal acetate production rate, acetate purity and experimental consistency. Although designs with higher experimental consistency are preferable, Figure 3(E) indicates that higher microarray consistency comes at the cost of a reduced acetate production rate. With additional, higher-resolution experimental data, such as metabolic flux measurements[45, 46], these insights could be improved and expanded. In principle, prediction-mapped experimental data could be used as a proxy for predicting sensitivity or accuracy and as a metric for ranking designs.
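The exact mapping from microarray data to fluxes is described in Methods and is not reproduced here. Purely as an illustration of the idea, the sketch below assumes each reaction has a binary expression call (1 = expressed, 0 = not expressed, NaN = no data) and scores a design by the fraction of reactions with data whose active or inactive flux state agrees with that call. The function name `consistency_score`, the tolerance and the scoring rule are hypothetical.

```python
import numpy as np


def consistency_score(fluxes, expression_calls, tol=1e-6):
    """Hypothetical score: fraction of reactions with data whose flux state matches the call.

    fluxes: predicted flux per reaction.
    expression_calls: 1.0 (expressed), 0.0 (not expressed) or NaN (no data) per reaction.
    """
    fluxes = np.asarray(fluxes, dtype=float)
    calls = np.asarray(expression_calls, dtype=float)
    has_data = ~np.isnan(calls)
    if not has_data.any():
        return float("nan")
    active = np.abs(fluxes[has_data]) > tol  # reaction carries flux in the design
    expressed = calls[has_data] == 1.0       # associated gene called "on" in the array
    return float(np.mean(active == expressed))


score = consistency_score([5.0, 0.0, 2.0, 0.0], [1.0, 0.0, 0.0, np.nan])
print(score)
```

Here two of the three reactions with data agree, and the reaction without expression data is ignored rather than counted as a mismatch.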
The Pareto frontiers discussed above allow one to visualize different trade-offs identifiable from our data. It is further possible to focus on specific sections of these frontiers and characterize engineering designs that are optimal for a specific linear combination of engineering goals. In general, we observe that different combinations of goals warrant very different design solutions. As illustrative examples, selected design criteria and associated optimal designs are presented in Table 1. Two of these designs yield maximal succinate production rate, both of which are for E. coli. Of the two, the design with higher succinate purity is an E. coli Δedd Δgnd mutant grown on minimal medium with maltohexaose as carbon source, fumarate as electron acceptor, D-glucosamine as nitrogen source, phosphate as phosphorus source, and sulfate as sulfur source. This design (hereafter referred to as Design 1, as specified in Table 1) produces 195.34 mmol succinate/gDW/hr with a succinate purity of 0.63. Compared to Design 7 in Table 1 (an experimentally validated succinate production design, an E. coli ΔptsG ΔpykFA Δpfl mutant fermented on glucose minimal medium), Design 1 has a more than 20-fold higher succinate production rate. However, its total economic cost rate is very high (19314.4 $/hr), perhaps impractically so. We may therefore choose design criteria that equally weight the maximization of succinate purity and the minimization of total economic cost rate. Design 5 in Table 1 shows that this combination of engineering goals produces a design for an S. cerevisiae ΔYBR196C ΔYMR256C mutant grown aerobically on minimal medium with glucose as carbon source, urea as nitrogen source, phosphate as phosphorus source, and sulfate as sulfur source. Design 5 produces a succinate purity of 0.43 at a total economic cost rate of $0.05/hr, and a succinate production rate of 14.77 mmol succinate/gDW/hr. As a result, this design has an economic cost comparable to that of the validated E. coli ΔptsG ΔpykFA Δpfl mutant design (Design 7), yet a 60% higher succinate production rate. In general, many of the resultant designs (including Design 1 and Design 5) do not appear in the published literature, and subsequent experimental validation of the simulation predictions will be warranted. It should also be noted that an effective implementation of Design 5 may be problematic, as it involves the deletion of a gene (YBR196C) previously reported to be essential under similar growth conditions, probably due to regulatory effects.
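Selecting a design for a weighted linear combination of goals, as in Table 1, amounts to scoring every design and taking the maximum, with negative weights expressing minimization (as with total economic cost rate). Because rates, purities and dollar costs have very different scales, the sketch below assumes min-max normalization of each trait before weighting; whether and how the study normalized traits is not stated in this section. Designs A to C, their values and the function name `best_design` are hypothetical.

```python
import numpy as np

TRAITS = ["succinate_rate", "succinate_purity", "cost_rate"]
# Hypothetical designs (rows) x traits (columns), for illustration only
DESIGNS = {
    "A": [200.0, 0.60, 20000.0],
    "B": [15.0, 0.45, 0.05],
    "C": [60.0, 0.90, 800.0],
}


def best_design(designs, weights):
    """Return the design maximizing sum(w_i * normalized trait_i).

    Negative weights minimize a trait; traits missing from `weights` get weight 0.
    """
    names = list(designs)
    X = np.array([designs[n] for n in names], dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xn = (X - X.min(axis=0)) / span  # assumed min-max scaling of each trait to [0, 1]
    w = np.array([weights.get(t, 0.0) for t in TRAITS])
    return names[int(np.argmax(Xn @ w))]


rate_first = best_design(DESIGNS, {"succinate_rate": 0.99, "succinate_purity": 0.01})
purity_vs_cost = best_design(DESIGNS, {"succinate_purity": 0.5, "cost_rate": -0.5})
cost_first = best_design(DESIGNS, {"succinate_rate": 0.01, "cost_rate": -0.99})
print(rate_first, purity_vs_cost, cost_first)
```

As in Table 1, shifting weight from production rate toward purity or cost moves the optimum to a completely different design. Because the scores depend on the chosen normalization, a different scaling could change which design wins for the same weights.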
Perturbation effects on engineering phenotypes are generalizable
So far, we have focused on specific engineering goals. To provide an overall comparison of design strategies, we assessed the relative phenotypic effects of specific types of environmental and genetic perturbations. From a practical standpoint, a metabolic engineer would like to know which types of perturbations (i.e. gene deletions or environmental changes) are the most effective at inducing desirable phenotypes. To address this issue, we developed a graph-based method to assess the frequency at which different types of perturbations induce changes in phenotype. The availability of a huge number of phenotypic states provided a unique opportunity to explore the global connectivity between phenotypes. In particular, given any two meta-phenotypes, we asked how many elementary changes in nutrient conditions (e.g. carbon sources) or genetic background (e.g. single gene deletions) could mediate a transition between these two phenotypes. By computing the relative frequency at which a phenotype transitions from one meta-phenotype to another, we can compare causal environmental and genetic perturbation types.
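The transition counting can be sketched under a few assumptions: each simulated condition is a dictionary of components (carbon source, nitrogen source, deleted genes, and so on) with a known meta-phenotype label; two conditions are linked when exactly one component differs; each link is counted in both directions; and frequencies are normalized per source meta-phenotype and per perturbation type. Diagonal entries then correspond to self-loops (robustness) and off-diagonal entries to edges between meta-phenotypes. The function names and toy conditions are hypothetical, and the pairwise loop is quadratic, so millions of phenotypes would require indexing conditions by their shared components instead.

```python
import itertools
from collections import defaultdict

import numpy as np


def perturbation_type(cond_a, cond_b):
    """Name of the single component that differs between two conditions, else None."""
    diffs = [key for key in cond_a if cond_a[key] != cond_b[key]]
    return diffs[0] if len(diffs) == 1 else None


def transition_frequencies(conditions, labels, k):
    """Row-normalized meta-phenotype transition matrices, one per perturbation type."""
    counts = defaultdict(lambda: np.zeros((k, k)))
    for i, j in itertools.combinations(range(len(conditions)), 2):
        ptype = perturbation_type(conditions[i], conditions[j])
        if ptype is None:
            continue  # not an elementary (single-component) perturbation
        # Assumption: elementary perturbations are reversible, so count both directions
        counts[ptype][labels[i], labels[j]] += 1
        counts[ptype][labels[j], labels[i]] += 1
    freqs = {}
    for ptype, m in counts.items():
        rows = m.sum(axis=1, keepdims=True)
        freqs[ptype] = np.divide(m, rows, out=np.zeros_like(m), where=rows > 0)
    return freqs


# Hypothetical conditions and their meta-phenotype labels
conditions = [
    {"carbon": "glc", "nitrogen": "nh4", "deletions": ()},
    {"carbon": "ac", "nitrogen": "nh4", "deletions": ()},
    {"carbon": "glc", "nitrogen": "urea", "deletions": ()},
    {"carbon": "glc", "nitrogen": "nh4", "deletions": ("pfl",)},
]
labels = [0, 1, 0, 0]
freqs = transition_frequencies(conditions, labels, k=2)
for ptype, matrix in freqs.items():
    print(ptype, matrix.tolist())
```

In this toy example, meta-phenotype 0 is robust to nitrogen-source changes and single deletions (pure self-loops) but always switches under a carbon-source change.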
Differences in the meta-phenotypic traits can be evaluated by comparing node colors. Each node face is divided into quadrants associated with the four selected engineering metrics: succinate production rate, succinate purity, biomass production rate, and total economic cost rate. For example, in Figure 6, meta-phenotype Nodes 9 and 10 are the most economically costly, whereas Nodes 7 and 8 have the highest succinate production rates. We analyzed Design 1 and Design 5 in detail. Table 1 lists the details of these designs and their associated phenotype cluster IDs. Design 1 is associated with Node 8 in Figure 6 and has a high succinate production rate. Design 5 is associated with Node 30 in Additional file 1: Figure S13 and has high succinate purity.
Phenotypic prevalence is a measure of how common a meta-phenotype is, given the imposed conditions. Phenotypic prevalence is represented by node size, which is scaled by phenotype cluster size. We see that both Designs 1 and 5 are phenotypes associated with less prevalent meta-phenotypes. Node 1 in Figure 6 is the largest, and thus most common, meta-phenotype for E. coli. We can similarly evaluate the distribution of phenotypic prevalence for each organism. The distribution for E. coli is the most skewed and, thus, the majority of E. coli phenotypes are associated with a few meta-phenotypes, whereas the distribution is comparatively more uniform for S. cerevisiae and S. oneidensis. Together with phenotypic variation, this further shows that for the studied traits E. coli has few overall dominant phenotypes and a single super-dominant meta-phenotype.
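The skewness of a prevalence distribution can be summarized with a few numbers. As an assumed choice (this section does not specify one), the sketch below reports the share of the largest and the two largest meta-phenotypes together with Pielou's evenness, i.e. the Shannon entropy of the cluster proportions divided by its maximum, ln(k). Values near 1 indicate a uniform distribution and values near 0 a distribution dominated by one meta-phenotype. The function name `prevalence_summary` and the cluster sizes are hypothetical.

```python
import numpy as np


def prevalence_summary(cluster_sizes):
    """Largest-cluster shares and Pielou's evenness of meta-phenotype prevalence (assumed metrics)."""
    sizes = np.asarray(cluster_sizes, dtype=float)
    p = sizes / sizes.sum()
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))  # Shannon entropy in nats
    evenness = entropy / np.log(len(p)) if len(p) > 1 else 1.0
    ranked = np.sort(p)[::-1]
    return {
        "largest_share": float(ranked[0]),
        "top2_share": float(ranked[:2].sum()),
        "evenness": float(evenness),
    }


skewed = prevalence_summary([97, 1, 1, 1])      # one super-dominant meta-phenotype
uniform = prevalence_summary([25, 25, 25, 25])  # evenly spread phenotypes
print(skewed, uniform)
```

Evenness lets organisms with different numbers of meta-phenotypes (10, 20 and 30 here) be compared on a common 0 to 1 scale.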
Some phenotypes are innately more robust or sensitive to different types of perturbations. Self-loops indicate the robustness of a phenotype cluster to genetic or environmental perturbations, whereas thick edges between meta-phenotypes indicate that those phenotypes are sensitive to the considered type of perturbation. In Figure 6, phenotypes are generally robust against changes in electron acceptors but are relatively more sensitive to changes in carbon or nitrogen sources. Design 1 (associated with Node 8 in Figure 6) is relatively robust to changes in phosphorus sources, but more sensitive to changes in carbon and nitrogen sources and to single gene deletions.
Network edges can be used to determine global transitions between meta-phenotypes. For example, Node 8 (meta-phenotype associated with Design 1) is connected to Nodes 6, 7, and 10 by nitrogen source perturbations and to Node 3 by carbon source perturbations. Thus, Node 8 is more closely related to those nodes and it is easier (i.e. fewer perturbations of those types are required) to transition between those nodes than between the other nodes that it is not connected to.
Interestingly, the different patterns of connectivity found in meta-phenotype graphs for different organisms suggest that, broadly speaking, the metabolic usefulness of different organisms may be best assessed through different types of perturbation analyses. Additional file 1: Figure S15 shows the relative influence of each perturbation type on global phenotype changes. Carbon and nitrogen source perturbations have the greatest relative effect on phenotype changes in E. coli, whereas S. cerevisiae and S. oneidensis are more uniformly sensitive to all types of environmental and genetic perturbations analyzed. Thus, to perturb E. coli metabolism, one might preferentially design carbon and nitrogen source perturbations.
We presented a high-throughput computational framework for generating and exploring an exhaustive landscape of in silico perturbations and metabolic engineering designs. Each design condition consists of an environment (medium composition), an organism (Escherichia coli, Saccharomyces cerevisiae, or Shewanella oneidensis) and a genotype (set of gene deletions). The vast population of the resultant design solutions produces a contextualized phenotypic map that is used to evaluate relationships between engineering goals and fundamental biological network properties. Using a set of metabolic traits, the large number of metabolic phenotypes is clustered into dominant meta-phenotypes. Whereas individual phenotypes are used to evaluate localized design considerations, causal biological mechanisms and design tradeoffs, the meta-phenotypes are used to evaluate global phenotypic diversity and relationships between metabolic traits and perturbation strategies. The proposed approach can help understand how environmental and genetic factors influence metabolism and metabolic engineering design.
A single unique optimal design solution may suffice for a single distinct set of weighted engineering goals. The resultant phenotypic map provides a regional and global context for this design solution relative to all other designs. If, for example, the exact values of the weights associated with the importance of the engineering goals are uncertain, we show that, by using the map, sub-optimal designs located in the proximity of optimality can be evaluated to assess the sensitivity of those weights. A designer can then assess whether or not it might be desirable to reprioritize engineering goals. By analogy, instead of having a single travel destination and navigational route to that destination, a map is very useful for assessing alternative destinations and routes that may, upon further inspection, be deemed more desirable than the original one.
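The weight-sensitivity idea can be illustrated with a two-goal sweep: vary the weight w on one goal (and 1 − w on the other), record which design is optimal at each w, and report the weights at which the optimum switches. The sketch assumes goals already normalized to [0, 1]; designs A to C, the grid of 11 weights and the function name `optimum_switches` are hypothetical.

```python
import numpy as np

# Hypothetical normalized traits: (succinate production rate, succinate purity)
DESIGNS = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.55, 0.55)}


def optimum_switches(designs, n_weights=11):
    """List (w, design) pairs where the optimum of w*goal1 + (1-w)*goal2 changes."""
    names = list(designs)
    X = np.array([designs[n] for n in names])
    switches = []
    for w in np.linspace(0.0, 1.0, n_weights):
        scores = X @ np.array([w, 1.0 - w])
        best = names[int(np.argmax(scores))]
        if not switches or switches[-1][1] != best:
            switches.append((float(w), best))
    return switches


switches = optimum_switches(DESIGNS)
print(switches)
```

Here the compromise design C is optimal only in a narrow band of weights around 0.5, so if the true relative importance of the two goals is uncertain, a designer would also want to inspect the neighboring designs A and B.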
Prior metabolic engineering studies have primarily focused on a single organism, engineering application, environment or genetic perturbation strategy, and optimal design solution. However, different organisms have different metabolic capabilities, due to diverse environmental adaptations and biochemical wiring. Thus, the interdependencies between the desirable organism, engineering application and design strategy are often unclear a priori. Here, we extended prior approaches by comparing organisms and systematically evaluating both environmental and genetic perturbation strategies. We illustrated inherent differences in the metabolic capabilities and phenotypic variations of E. coli, S. cerevisiae and S. oneidensis. To account for important economic considerations, we developed a methodology for integrating economic data. Furthermore, we showed that preexisting experimental data can be readily incorporated to help rank designs by how likely they are to be accurately reproduced experimentally.
After initial compilation of the phenotype population dataset, multiple complex combinatorial optimization problems can be solved (e.g. Pareto optimality design analysis). There is no restriction in terms of linearity or nonlinearity of metabolic traits and, if an engineering goal needs to be changed, there is no need to re-compute the phenotype dataset; one just needs to redefine the corresponding function and re-query the data. There are, however, limitations. For example, prediction accuracy of the metabolic designs is limited by the accuracy of the underlying models. Thus, accuracy can be further improved by improving the models (e.g. incorporating additional biological mechanisms, such as transcriptional regulation). Searching the design parameter space is also limited by the combinatorial nature of this “brute-force” approach. Here, we evaluated a comprehensive, but limited, subset of the theoretically infinite number of genetic and environmental parameter values and combinations. Simulating and processing the 435 million conditions took approximately 4 weeks of CPU time (see Methods for more details). By further optimizing the underlying programming code and by incorporating additional computing processors, the overall compute-time could be significantly reduced and many additional organisms and design strategies could be evaluated. Nevertheless, since an exhaustive search of the complete parameter space is not possible, prior knowledge will be useful in deciding which regions and level of granularity of the parameter space to explore.
Compared to other optimization approaches, our method potentially sacrifices depth (e.g. looking at triple and multiple knockouts) in favor of breadth (i.e. obtaining a snapshot of behaviors across an unprecedented number of perturbations and environments). Future studies may seek to further compare and contrast the spectrum of perturbation strategies to assess the advantages and limitations of each. For example, methods that can infer optimal combinations of more than two gene additions or deletions[19, 41] could be preceded by broad surveys across multiple organisms. In addition, our approach could provide useful preliminary indication of the suitability of specific organisms for nonlinear objectives that may not be easily addressed through other available optimization approaches.
We note that the biological details of our results can be conveniently accessed and visualized through the online tool that we present as part of this work. This tool is currently tied to predefined criteria for the choice of designs and engineering goals. However, future elaborations of our approach and of this tool could easily relax the existing constraints, for example by including weights for the importance of different objectives and thresholds for acceptable levels of violation of specific constraints. In addition, the process could be made iterative: an initial query across the entire space could be followed by a user-defined choice of specific criteria, leading to a deeper search in a restricted region of the space.
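The weights and violation thresholds mentioned above could be layered on the same kind of query. The following sketch is hypothetical: the helper, the trait names and the tolerance semantics are illustrative and are not part of the online tool. It scores designs by a weighted sum of traits and keeps only those designs whose bounds are violated by no more than a relative tolerance.

```python
# Hypothetical records; same illustrative traits as a stored phenotype table.
DESIGNS = [
    {"design": "d1", "yield": 0.80, "rate": 2.0, "cost": 1.0},
    {"design": "d2", "yield": 0.60, "rate": 5.0, "cost": 2.0},
    {"design": "d3", "yield": 0.90, "rate": 1.0, "cost": 1.0},
]


def weighted_query(records, weights, limits, tolerance=0.0):
    """Rank records by a weighted sum of traits after filtering on bounds.

    weights:   trait -> weight (positive rewards, negative penalizes)
    limits:    trait -> (lower, upper) bounds a design should satisfy
    tolerance: hypothetical relative slack allowed on each bound
    """
    def acceptable(record):
        for trait, (lower, upper) in limits.items():
            value = record[trait]
            if value < lower - tolerance * abs(lower):
                return False
            if value > upper + tolerance * abs(upper):
                return False
        return True

    def score(record):
        return sum(w * record[trait] for trait, w in weights.items())

    return sorted((r for r in records if acceptable(r)), key=score, reverse=True)


WEIGHTS = {"yield": 1.0, "rate": 0.25, "cost": -0.5}
strict = weighted_query(DESIGNS, WEIGHTS, {"cost": (0.0, 1.5)})
relaxed = weighted_query(DESIGNS, WEIGHTS, {"cost": (0.0, 1.5)}, tolerance=0.5)
```

With no tolerance, the cost bound excludes d2. Relaxing the bound by 50% readmits d2, which then ranks first. Re-running the query with tighter bounds around the top designs would correspond to the iterative, restricted search described above.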
Furthermore, while in the current work we focus mainly on the metabolic phenotypes relevant to metabolic engineering applications, a different type of analysis could provide complementary insight into the biological aspects of the data presented. For example, it would be interesting to determine, for each meta-phenotype, whether it can be associated with specific environmental or genetic properties (e.g. limitation of a specific nutrient). This type of analysis would require revisiting our large data set (i.e. the meta-phenotypes shown in Additional file 2: Table S4, and the complete list of designs they comprise, available online, see Methods) in search of meaningful biological patterns.
Given the increasing number of sequenced genomes, improved model accuracy and the growing available computing power, it is foreseeable that future extensions of our approach could help address a growing range of biological questions and engineering applications. Rapid growth of industrial biotechnology is helping to drive demand for a widening range of products, such as commodity chemicals (e.g., succinic acid and ethanol), fine chemicals (e.g. 6-aminopenicillanic acid and other antibiotics), and specialty chemicals (e.g., food and feed additives). In many application areas, however, production output of cellular factories falls significantly short of what is theoretically possible and may be insufficient for practical implementation. Systems engineering methods, including the approach presented, hold great promise in overcoming current engineering limitations and design challenges[1–3]. The exhaustive strategy we have explored, while combinatorially limited, enables complex searches across nonlinear objectives and multiple species, complementing other optimization methods, and providing a global portrait of the landscape of possible metabolic phenotypes.
where C (mol/L) is the concentration vector of m internal metabolites, x (mol/L/h) is the reaction rate (flux) vector of n reactions, A is the m × n stoichiometry matrix whose element a_ij is the stoichiometric coefficient of metabolite i in reaction j, and μ (1/h) is the specific dilution rate associated with the change in volume of the system. At steady state there is no accumulation of internal metabolites and, with the dilution term neglected as is usual in flux balance analysis, Equation (1) simplifies to Ax = 0. Additionally, because of thermodynamic restrictions, some reactions can effectively be considered irreversible, leading to additional constraints of the form x_i ≥ 0.
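The displayed equation that these definitions refer to is missing from this copy. The block below reconstructs it in the usual constraint-based form. This is an assumption, though it is consistent with the units given, since both A x and μC are in mol/L/h. The mass balance and its steady-state reduction read:

```latex
\begin{aligned}
\frac{d\mathbf{C}}{dt} &= A\,\mathbf{x} - \mu\,\mathbf{C}
  && \text{(mass balance, Equation (1))}\\
\mathbf{0} &= A\,\mathbf{x}
  && \text{(steady state, dilution term } \mu\mathbf{C} \text{ neglected)}\\
x_i &\ge 0
  && \text{for each irreversible reaction } i
\end{aligned}
```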
An objective function commonly used for microbial systems is the maximization of biomass formation (see Equation 3). To simulate changes in nutrient composition or gene-deletion effects over a range of parameter values, the parameters α_i and β_i in the LP problem (Equation 2) can be iteratively modified (e.g. both set to zero to simulate a gene knockout) and the problem solved again to obtain a new solution vector x. While this new solution achieves max f(x), the engineering objective (e.g. maximization of target-compound synthesis or minimization of media cost) may be suboptimal.
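Equations (2) and (3) are not reproduced in this copy. The block below gives the usual FBA form with the parameters named above, rather than the authors' exact notation. It assumes that α_i and β_i are lower and upper bounds on the flux x_i, and that the biomass reaction drains each component z_i with coefficient d_i.

```latex
\begin{aligned}
\max_{\mathbf{x}}\quad & f(\mathbf{x}) = \mathbf{c}^{\mathsf{T}}\mathbf{x}
  && \text{(e.g. } \mathbf{c} \text{ selects the biomass flux)}\\
\text{subject to}\quad & A\,\mathbf{x} = \mathbf{0},\\
 & \alpha_i \le x_i \le \beta_i, \qquad i = 1,\dots,n,\\[4pt]
\text{biomass reaction:}\quad & \textstyle\sum_i d_i\, z_i \;\longrightarrow\; \text{biomass}
\end{aligned}
```

In this form, deleting a gene corresponds to setting α_k = β_k = 0 for each reaction k it encodes, provided no isozyme remains. Removing a nutrient from the medium closes the bounds of its uptake reaction.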
The genome-scale metabolic models for E. coli, S. cerevisiae, and S. oneidensis are used to enumerate a comprehensive set of feedstock medium compositions and single and double gene deletions. The nutrient and gene-deletion parameter space explored is described in Additional file 1: Table S1. Single and double gene deletions are chosen from the genes associated with the citric acid cycle, glycolysis, gluconeogenesis, oxidative phosphorylation, pentose phosphate, and pyruvate metabolism pathways.
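Every medium composition is crossed with every deletion set, so the number of conditions grows multiplicatively: g candidate genes yield g single deletions and g(g-1)/2 double deletions. The Python sketch below shows this enumeration. It uses hypothetical, heavily reduced nutrient and gene lists (the real ones are in Additional file 1: Table S1) and includes the unmodified strain as a reference, which is an assumption.

```python
from itertools import combinations, product

# Hypothetical, heavily reduced parameter lists for illustration only.
CARBON = ["glucose", "glycerol", "acetate"]
NITROGEN = ["ammonium", "glutamine"]
GENES = ["pfkA", "pykF", "zwf", "sdhA"]


def deletion_sets(genes):
    """Yield the unmodified strain, every single deletion, and every
    unordered double deletion."""
    yield ()                                   # reference strain (assumed)
    for gene in genes:
        yield (gene,)                          # single deletions
    yield from combinations(genes, 2)          # double deletions


def conditions(carbon, nitrogen, genes):
    """Cross every medium choice with every deletion set."""
    for c, n, knockouts in product(carbon, nitrogen, deletion_sets(genes)):
        yield {"carbon": c, "nitrogen": n, "knockouts": knockouts}


n_conditions = sum(1 for _ in conditions(CARBON, NITROGEN, GENES))
```

With 3 carbon sources, 2 nitrogen sources and 4 genes, there are 1 + 4 + 6 = 11 deletion sets and 3 × 2 × 11 = 66 conditions. Each condition would then be passed to an FBA solver.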
where the stoichiometric coefficient d_i corresponds to the experimentally measured contribution of biomass component z_i to biomass. To quantify the engineering value of metabolic states under the various conditions, engineering metrics are defined as listed in Additional file 1: Table S2. Values for the maximum theoretical engineering goals are computed by FBA (Equation (2)) with the engineering metrics themselves, rather than the objective function in Equation (3), used as the objectives to be maximized or minimized. Linear programming is implemented using the GNU Linear Programming Kit software. Data processing is implemented as a distributed process run on a computing cluster with 192 processor dual-dual core 2.8 GHz computer nodes.
Metabolic traits are defined in Additional file 1: Table S1. Final titer is typically measured as a concentration; however, it may alternatively be expressed as the ratio of the target product to the total by-products being produced (i.e. titer ratio).
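The sketch below computes the titer ratio from the secretion fluxes of an FBA solution. It assumes that the denominator counts only by-products other than the target, a point the text leaves open. The helper name and flux values are hypothetical.

```python
def titer_ratio(secretion, target):
    """Target secretion flux divided by the summed secretion of all other
    by-products. Assumption: the denominator excludes the target itself and
    ignores non-positive (uptake or zero) fluxes."""
    others = sum(flux for name, flux in secretion.items()
                 if name != target and flux > 0)
    if others == 0:
        return float("inf")  # the target is the only secreted product
    return secretion[target] / others


# Hypothetical secretion fluxes (mmol/gDW/h) from one simulated condition.
fluxes = {"ethanol": 12.0, "acetate": 3.0, "formate": 1.0, "succinate": 0.0}
ratio = titer_ratio(fluxes, "ethanol")
```

Here 12 / (3 + 1) gives a titer ratio of 3 for ethanol.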
Unit prices for some nutrients were not available. Approximately 23% of the simulated medium compositions included one or more nutrients that did not have an assigned nutrient unit price. Additional file 1: Figures S2 and S4 show the results for all simulations. All other figures contain economic metric data based on nutrient unit prices and, therefore, omit those simulations with missing unit price data.
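The exclusion rule above can be sketched as follows. The sketch assumes the medium cost is the sum of unit price times amount supplied; prices, units and names are hypothetical placeholders. Any composition containing an unpriced nutrient gets no cost and is dropped from cost-based analyses.

```python
# Hypothetical unit prices (currency per unit amount); None = price unavailable.
UNIT_PRICE = {"glucose": 0.002, "ammonium": 0.001, "glutamine": None}


def medium_cost(medium, prices):
    """Sum of unit price x amount over the medium's nutrients.

    Returns None when any nutrient has no assigned price, so the
    composition can be left out of economic analyses.
    """
    total = 0.0
    for nutrient, amount in medium.items():
        price = prices.get(nutrient)
        if price is None:
            return None
        total += price * amount
    return total


simulations = [
    {"id": 1, "medium": {"glucose": 10.0, "ammonium": 5.0}},
    {"id": 2, "medium": {"glucose": 10.0, "glutamine": 2.0}},
]
priced = [s for s in simulations if medium_cost(s["medium"], UNIT_PRICE) is not None]
```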
To evaluate how closely the simulated phenotype and pathway activity predictions correspond to experimental values, a metric of experimental consistency is computed using a method called Gene Inactivity Moderated by Metabolism and Expression (GIMME). The GIMME algorithm provides a quantitative consistency score indicating how consistent a set of gene expression data is with a simulated flux solution under similar conditions; that is, it evaluates how closely pathway activity, as measured by microarray gene expression, matches the simulated pathway flux activity. For this purpose, a set of 149 Affymetrix microarrays for E. coli, processed using GC-RMA, was gathered.
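GIMME (Becker and Palsson) assigns each reaction whose expression falls below a cutoff a penalty equal to the shortfall. It scores a flux distribution by the sum of penalty times absolute flux, so lower values mean better agreement with the expression data. The sketch below only evaluates that penalty for a given flux vector. The full method also solves an LP to find the least-penalized flux distribution that still meets a required metabolic function, which is not reproduced here. Reaction names, expression values and the cutoff are hypothetical.

```python
def gimme_penalty(fluxes, expression, cutoff):
    """GIMME-style inconsistency of a flux vector with expression data.

    Each reaction with expression x below the cutoff is penalized by
    (cutoff - x) * |flux|; reactions at or above the cutoff, or without
    expression data, contribute nothing. Lower totals mean the fluxes
    rely less on weakly expressed reactions.
    """
    penalty = 0.0
    for reaction, flux in fluxes.items():
        x = expression.get(reaction)
        if x is not None and x < cutoff:
            penalty += (cutoff - x) * abs(flux)
    return penalty


# Hypothetical fluxes (mmol/gDW/h) and log2 expression levels per reaction.
fluxes = {"PGI": 5.0, "G6PDH": 2.0, "ACK": -1.0}
expression = {"PGI": 9.0, "G6PDH": 4.0, "ACK": 6.0}
score = gimme_penalty(fluxes, expression, cutoff=7.0)
```

With a cutoff of 7, only G6PDH ((7 - 4) × 2) and ACK ((7 - 6) × 1) are penalized, giving a total of 7.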
where m_i is the mean of cluster C_i and n is the number of objectives in the dataset. To find the optimal number of clusters, the gap statistic is used: it compares the within-cluster dispersion of the observed data with the expected within-cluster dispersion of reference data generated from a null distribution, and the optimal number of clusters is the one for which this deviation is maximized. This method, designed to be applicable to any clustering technique and distance measure, is in wide use[60–62]. We found that our optimal cluster numbers are fairly robust, particularly for the economic dataset discussed in the main text (Additional file 1: Figure S3B), where the deviation is significantly smaller for one cluster more or one cluster fewer than the computed optimal number.
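The following self-contained sketch computes the gap statistic on top of k-means, using only the Python standard library. It makes three assumptions:

- plain Lloyd iterations with k-means++ seeding stand in for the Hartigan-Wong algorithm;
- the null reference is uniform over the data's bounding box;
- k is chosen as the maximizer of the gap, as described above. Tibshirani et al. also propose a one-standard-error rule, which is not implemented here.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a cluster (m_i in the text)."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, k, rng, iters=100):
    """Lloyd's algorithm with k-means++ seeding; returns the clusters."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2)[0])
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters


def dispersion(clusters):
    """W_k: summed squared distances of points to their cluster mean."""
    total = 0.0
    for c in clusters:
        if c:
            m = centroid(c)
            total += sum(sq_dist(p, m) for p in c)
    return total


def best_dispersion(points, k, rng, restarts=3):
    """Smallest W_k over a few k-means restarts."""
    return min(dispersion(kmeans(points, k, rng)) for _ in range(restarts))


def gap_statistic(points, k_max, n_ref=10, seed=0):
    """Gap(k) = mean log W_k of uniform reference data - log W_k of the data."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps = {}
    for k in range(1, k_max + 1):
        log_w = math.log(best_dispersion(points, k, rng))
        ref_logs = []
        for _ in range(n_ref):
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                   for _ in points]
            ref_logs.append(math.log(best_dispersion(ref, k, rng)))
        gaps[k] = sum(ref_logs) / n_ref - log_w
    return gaps


# Hypothetical data: three tight, well-separated blobs in two objectives.
rng = random.Random(1)
BLOBS = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
data = [(cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1))
        for cx, cy in BLOBS for _ in range(20)]
gaps = gap_statistic(data, k_max=5)
best_k = max(gaps, key=gaps.get)  # number of clusters with the largest gap
```

On these three tight, well-separated blobs, the gap is expected to peak at k = 3.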
where y_i is meta-phenotype i, ȳ is the average meta-phenotype vector and σ is the meta-phenotype standard deviation. Instructions for downloading the phenotype metric data, the conditions and the associated meta-phenotype mappings are available at http://nets.bu.edu/about.
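The normalization defined above is presumably a per-trait z-score, (y_i - ȳ)/σ. The minimal sketch below assumes σ is computed per trait across meta-phenotypes as a population standard deviation, which the text does not specify. Constant traits are mapped to 0 to avoid division by zero.

```python
import statistics


def standardize(vectors):
    """z-score each trait (column): (y_i - mean) / sd across meta-phenotypes.

    Assumes population standard deviation per trait; traits with zero
    spread are mapped to 0.0 instead of dividing by zero.
    """
    columns = list(zip(*vectors))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in vectors
    ]


# Two hypothetical meta-phenotype vectors over two traits.
z = standardize([[1.0, 10.0], [3.0, 10.0]])
```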
Pareto optimal designs and trade-offs
Multi-goal optimization (also known as multi-objective optimization) is the process of simultaneously optimizing two or more conflicting goals (or objectives) subject to a set of constraints[63–65]. In our study, we have a vector of engineering goals f(v) = [f_1(v), f_2(v), …, f_m(v)], where v is the vector of computed engineering metrics shown in Additional file 1: Table S2. The associated multi-goal optimization problem is min f(v), restricted to the discrete set of available solutions. Each solution corresponds to a metabolic engineering design candidate (or multiple candidates if they have identical engineering phenotypes). If the individual goals in f(v) do not conflict, then a unique optimal solution can be found. If they do conflict, however, a unique solution will not exist; instead, there will be a set of Pareto optimal solutions. If a change (or tradeoff) from one solution to another improves at least one goal without making any other goal worse, that change is called a Pareto improvement and the initial solution is said to be dominated. A solution for which no Pareto improvement exists, so that any improvement in one goal requires degrading another, is called nondominated, or Pareto optimal. The set of Pareto optimal solutions is often called the Pareto frontier.
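Because the candidate set here is discrete, the nondominated designs can also be found by direct pairwise comparison. The sketch below implements that exhaustive O(N²) filter; it is not the NSGA-II procedure used in this study. It assumes all goals are minimized, as in min f(v), so a goal to be maximized (such as yield) is entered with its sign flipped. Design names and values are hypothetical.

```python
def dominates(a, b):
    """True if goal vector a is no worse than b everywhere and strictly
    better somewhere (all goals minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(designs):
    """Keep the designs (name -> goal vector) that no other design dominates."""
    return {
        name: goals
        for name, goals in designs.items()
        if not any(dominates(other, goals)
                   for other_name, other in designs.items() if other_name != name)
    }


# Hypothetical goals: (medium cost, -yield); yield is negated so both are minimized.
DESIGNS = {"d1": (1.0, -0.8), "d2": (2.0, -0.9), "d3": (2.0, -0.7), "d4": (0.5, -0.5)}
front = pareto_front(DESIGNS)
```

Design d3 is dominated by d1, which has lower cost and higher yield, so d3 is removed. Designs with identical goal vectors do not dominate each other and are both kept. This matches the case of multiple candidates sharing one engineering phenotype.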
To determine the Pareto optimal designs and frontiers in our study, we use a Pareto-compliant method called Nondominated Sorting Genetic Algorithm II (NSGA-II). This method, widely used in prior research[25, 67–69], combines a genetic algorithm with a ranking procedure that selects nondominated solutions. Specifically, we used the Matlab function gamultiobj from the Global Optimization Toolbox. We supply our set of engineering goals, f(v), to the function and obtain as a result the set of Pareto optimal designs. The parameter values used are: a maximum of 500 generations, a population size of 100 chromosomes, a crossover probability of 0.85, a mutation probability of 0.05, a distribution index of 10 for simulated crossover, a distribution index of 20 for simulated mutation, and a random seed of 0.6. Prior studies showed that these parameter values are generally satisfactory[25, 68], and we found that our results were not significantly sensitive to changes in these values. Subsequent Pareto tradeoff analysis (i.e. determining the marginal gain or cost of relative changes in weighted linear combinations of goals) is computed using piecewise linear differences between the Pareto designs associated with particular Pareto frontiers.
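For two goals, the piecewise linear differences along a frontier reduce to slopes between consecutive Pareto designs. The minimal sketch below uses hypothetical (cost, yield) points. Each slope is the marginal change in the second goal per unit change in the first.

```python
def marginal_tradeoffs(front):
    """Slopes between consecutive designs on a two-goal Pareto frontier.

    front: (goal1, goal2) points, e.g. (cost, yield).
    Returns (start, end, d_goal2 / d_goal1) for each linear segment.
    """
    points = sorted(front)
    segments = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 == x0:
            continue  # equal goal-1 values: slope undefined, skip segment
        segments.append(((x0, y0), (x1, y1), (y1 - y0) / (x1 - x0)))
    return segments


# Hypothetical Pareto designs: (medium cost, product yield).
segments = marginal_tradeoffs([(4.0, 0.9), (1.0, 0.5), (2.0, 0.8)])
slopes = [s[2] for s in segments]
```

Moving from cost 1 to cost 2 gains 0.3 units of yield per unit cost, but the next step gains only 0.05, a diminishing return. Each slope also marks the weight ratio at which a weighted linear combination of the two goals switches its preference between the adjacent designs.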
Meta-phenotype transition network
Transition frequencies are computed by varying a single perturbation type (carbon, electron acceptor, nitrogen, phosphorus, and sulfur sources, and single and double gene deletions) while holding the remaining perturbation types fixed. Environmental perturbations are imposed by toggling the presence or absence of a nutrient in the medium (as described in the Methods “Constraint-based modeling” subsection), whereas genetic perturbations are imposed by deleting single or double genes. A resulting meta-phenotype transition network can then be generated in which Nodes i and j represent two viable-growth engineering phenotype clusters, and Edge t_i,j represents the cumulative phenotype-cluster transition frequency between Nodes i and j due to either environmental or genetic perturbations. Node 0 represents the nonviable-growth phenotype; thus, edges t_i,0 and t_j,0 represent the cumulative transition frequencies from clusters i and j to the nonviable-growth phenotype. Edges are bidirectional, so t_i,j is equivalent to t_j,i. Edge thickness is proportional to the cumulative transition frequency for environmental or genetic perturbations. By performing this analysis systematically for all meta-phenotypes, we obtained a network of meta-phenotype transitions for each organism (E. coli, S. cerevisiae, and S. oneidensis). Because the nonviable meta-phenotype aggregates all possible nonviable phenotypes, it is far larger than any viable meta-phenotype and would dwarf all viable meta-phenotype nodes. Thus, the nonviable meta-phenotype is neither shown in nor included in the figures and results presented.
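The minimal sketch below shows how the edge weights t_i,j could be tallied. It rests on two assumptions: each simulated condition is a tuple of factors labeled with its meta-phenotype (0 for nonviable), and every pair of conditions that differ only in the varied factor counts as one transition. Self-transitions are ignored. Factor values and labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def transition_edges(labels, vary_index):
    """Count meta-phenotype transitions caused by one perturbation type.

    labels: condition tuple -> meta-phenotype id (0 = nonviable growth).
    Conditions that agree on every factor except `vary_index` form one
    group; each pair in a group with different labels adds one transition.
    Edges are unordered, so t_ij and t_ji are the same entry.
    """
    groups = {}
    for condition, label in labels.items():
        fixed = condition[:vary_index] + condition[vary_index + 1:]
        groups.setdefault(fixed, []).append(label)
    edges = Counter()
    for members in groups.values():
        for a, b in combinations(members, 2):
            if a != b:  # self-transitions ignored (assumption)
                edges[frozenset((a, b))] += 1
    return edges


def drop_nonviable(edges, nonviable=0):
    """Remove edges touching the nonviable node, as in the figures."""
    return {edge: n for edge, n in edges.items() if nonviable not in edge}


# Hypothetical conditions: (carbon source, deleted gene) -> meta-phenotype.
LABELS = {
    ("glucose", "none"): 1, ("glycerol", "none"): 2, ("acetate", "none"): 2,
    ("glucose", "pgi"): 1, ("glycerol", "pgi"): 0, ("acetate", "pgi"): 0,
}
carbon_edges = transition_edges(LABELS, vary_index=0)   # vary carbon source
genetic_edges = transition_edges(LABELS, vary_index=1)  # vary gene deletion
```

Storing edges as unordered pairs makes t_i,j and t_j,i the same entry. Dropping edges that touch node 0 mirrors the exclusion of the nonviable meta-phenotype from the figures.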
Multi-goal Metabolic Engineering Visualizer
A public website (Additional file 1: Figures S16 and S17), located at http://nets.bu.edu, was developed to make the optimal metabolic engineering design results available. The website provides an interface that may be used to submit customized search queries, choose engineering designs, and interact with the resulting metabolic network visualizations.
From the website’s main page (Additional file 1: Figure S16), a user can choose from a list of organisms, target products, and engineering goals. Based on the selected optimization criteria, the website generates a list of metabolic engineering designs. If multiple engineering goals are selected, the resulting set of Pareto optimal designs is tabulated so that alternative designs with competing metric values can be compared. The user may click on any one of the designs to generate a metabolic network map with the corresponding metabolic pathways and reactions color-coded by flux value. The map can be panned, zoomed, and searched, and its nodes and edges can be clicked to obtain additional information about metabolites and reactions.
An online tutorial for the website (Additional file 1: Figure S17) is located at http://nets.bu.edu/about. Alternatively, the tutorial can be obtained by clicking “Help” on the website’s main page. The tutorial explains the process of defining engineering optimization criteria, selecting resultant designs and visualizing metabolic pathway activity.
Abbreviations
- ec: E. coli
- FBA: Flux balance analysis
- GC-RMA: GC Robust Multi-array Average
- gDW: Gram dry weight
- GIMME: Gene Inactivity Moderated by Metabolism and Expression
- Multi-Goal Metabolic Engineering
- sc: S. cerevisiae
- so: S. oneidensis
- TCA: Tricarboxylic acid cycle
We are thankful for helpful conversations with Timothy Gardner and members of the Segrè lab. DB was supported by a fellowship from National Science Foundation Integrative Graduate Education and Research Traineeship grant 0654108. DB and DS were supported by grants from NIH (5R01GM089978-02) and the Office of Science (BER) of the US Department of Energy (DE-FG02-07ER64388).
- Keasling JD: Manufacturing molecules through metabolic engineering. Science. 2010, 330: 1355-1358. 10.1126/science.1193990.
- Kim I-K, Roldão A, Siewers V, Nielsen J: A systems-level approach for metabolic engineering of yeast cell factories. FEMS Yeast Res. 2011, 12: 228-248.
- Lee JW, Kim TY, Jang Y-S, Choi S, Lee SY: Systems metabolic engineering for chemicals and materials. Trends Biotechnol. 2011, 29: 370-378. 10.1016/j.tibtech.2011.04.001.
- Millard CS, Chao YP, Liao JC, Donnelly MI: Enhanced production of succinic acid by overexpression of phosphoenolpyruvate carboxylase in escherichia coli. Appl Environ Microbiol. 1996, 62: 1808-1810.
- Stols L, Donnelly MI: Production of succinic acid through overexpression of NAD(+)-dependent malic enzyme in an escherichia coli mutant. Appl Environ Microbiol. 1997, 63: 2695-2701.
- Hong SH, Lee SY: Metabolic flux analysis for succinic acid production by recombinant escherichia coli with amplified malic enzyme activity. Biotechnol Bioeng. 2001, 74: 89-95. 10.1002/bit.1098.
- Bunch PK, Mat-Jan F, Lee N, Clark DP: The ldhA gene encoding the fermentative lactate dehydrogenase of escherichia coli. Microbiology (Reading, Engl). 1997, 143 (Pt 1): 187-195.
- Lee SJ, Lee D-Y, Kim TY, Kim BH, Lee J, Lee SY: Metabolic engineering of escherichia coli for enhanced production of succinic acid, based on genome comparison and in silico gene knockout simulation. Appl Environ Microbiol. 2005, 71: 7880-7887. 10.1128/AEM.71.12.7880-7887.2005.
- Price ND, Reed JL, Palsson BØ: Genome-scale models of microbial cells: evaluating the consequences of constraints. Nat Rev Microbiol. 2004, 2: 886-897. 10.1038/nrmicro1023.
- Jarboe LR, Zhang X, Wang X, Moore JC, Shanmugam KT, Ingram LO: Metabolic engineering for production of biorenewable fuels and chemicals: contributions of synthetic biology. J Biomed Biotechnol. 2010, 2010: 761042.
- Tang YJ, Meadows AL, Keasling JD: A kinetic model describing shewanella oneidensis MR-1 growth, substrate consumption, and product secretion. Biotechnol Bioeng. 2007, 96: 125-133. 10.1002/bit.21101.
- Segrè D, Zucker J, Katz J, Lin X, D’haeseleer P, Rindone WP, Kharchenko P, Nguyen DH, Wright MA, Church GM: From annotated genomes to metabolic flux models and kinetic parameter fitting. OMICS. 2003, 7: 301-316. 10.1089/153623103322452413.
- Edwards JS, Palsson BO: Metabolic flux balance analysis and the in silico analysis of escherichia coli K-12 gene deletions. BMC Bioinforma. 2000, 1: 1. 10.1186/1471-2105-1-1.
- Edwards JS, Ibarra RU, Palsson BO: In silico predictions of escherichia coli metabolic capabilities are consistent with experimental data. Nat Biotechnol. 2001, 19: 125-130. 10.1038/84379.
- Famili I, Forster J, Nielsen J, Palsson BO: Saccharomyces cerevisiae phenotypes can be predicted by using constraint-based analysis of a genome-scale reconstructed metabolic network. Proc Natl Acad Sci U S A. 2003, 100: 13134-13139. 10.1073/pnas.2235812100.
- Orth JD, Thiele I, Palsson BØ: What is flux balance analysis?. Nat Biotechnol. 2010, 28: 245-248. 10.1038/nbt.1614.
- Suthers PF, Zomorrodi A, Maranas CD: Genome-scale gene/reaction essentiality and synthetic lethality analysis. Mol Syst Biol. 2009, 5: 301.
- Kim J, Reed JL, Maravelias CT: Large-scale bi-level strain design approaches and mixed-integer programming solution techniques. PLoS One. 2011, 6: e24162. 10.1371/journal.pone.0024162.
- Burgard AP, Pharkya P, Maranas CD: Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol Bioeng. 2003, 84: 647-657. 10.1002/bit.10803.
- Pharkya P, Maranas CD: An optimization framework for identifying reaction activation/inhibition or elimination candidates for overproduction in microbial systems. Metab Eng. 2006, 8: 1-13. 10.1016/j.ymben.2005.08.003.
- Ranganathan S, Suthers PF, Maranas CD: OptForce: an optimization procedure for identifying all genetic manipulations leading to targeted overproductions. PLoS Comput Biol. 2010, 6: e1000744. 10.1371/journal.pcbi.1000744.
- Price ND, Reed JL, Papin JA, Famili I, Palsson BO: Analysis of metabolic capabilities using singular value decomposition of extreme pathway matrices. Biophys J. 2003, 84: 794-804. 10.1016/S0006-3495(03)74899-1.
- Price ND, Schellenberger J, Palsson BO: Uniform sampling of steady-state flux spaces: means to design experiments and to interpret enzymopathies. Biophys J. 2004, 87: 2172-2186. 10.1529/biophysj.104.043000.
- Oh Y-G, Lee D-Y, Lee SY, Park S: Multiobjective flux balancing using the NISE method for metabolic network analysis. Biotechnol Prog. 2009, 25: 999-1008. 10.1002/btpr.193.
- Lee FC, Pandu Rangaiah G, Lee D-Y: Modeling and optimization of a multi-product biosynthesis factory for multiple objectives. Metab Eng. 2010, 12: 251-267. 10.1016/j.ymben.2009.12.003.
- Schuetz R, Zamboni N, Zampieri M, Heinemann M, Sauer U: Multidimensional Optimality of Microbial Metabolism. Science. 2012, 336: 601-604.
- Shanmugam KT, Ingram LO: Engineering biocatalysts for production of commodity chemicals. J Mol Microbiol Biotechnol. 2008, 15: 8-15. 10.1159/000111988.
- Bro C, Regenberg B, Förster J, Nielsen J: In silico aided metabolic engineering of saccharomyces cerevisiae for improved bioethanol production. Metab Eng. 2006, 8: 102-111. 10.1016/j.ymben.2005.09.007.
- Otero JM, Panagiotou G, Olsson L: Fueling industrial biotechnology growth with bioethanol. Adv Biochem Eng Biotechnol. 2007, 108: 1-40.
- Waks Z, Silver PA: Engineering a synthetic dual-organism system for hydrogen production. Appl Environ Microbiol. 2009, 75: 1867-1875. 10.1128/AEM.02009-08.
- Meshulam-Simon G, Behrens S, Choo AD, Spormann AM: Hydrogen metabolism in shewanella oneidensis MR-1. Appl Environ Microbiol. 2007, 73: 1153-1165. 10.1128/AEM.01588-06.
- Fong SS, Burgard AP, Herring CD, Knight EM, Blattner FR, Maranas CD, Palsson BO: In silico design and adaptive evolution of escherichia coli for production of lactic acid. Biotechnol Bioeng. 2005, 91: 643-648. 10.1002/bit.20542.
- Cox SJ, Shalel Levanon S, Sanchez A, Lin H, Peercy B, Bennett GN, San K-Y: Development of a metabolic network design and optimization framework incorporating implementation constraints: a succinate production case study. Metab Eng. 2006, 8: 46-57. 10.1016/j.ymben.2005.09.006.
- Zeikus JG, Jain MK, Elankovan P: Biotechnology of succinic acid production and markets for derived industrial products. Appl Microbiol Biotechnol. 1999, 51: 545-552. 10.1007/s002530051431.
- Branduardi P, Smeraldi C, Porro D: Metabolically engineered yeasts: “potential” industrial applications. J Mol Microbiol Biotechnol. 2008, 15: 31-40. 10.1159/000111990.
- Varma A, Boesch BW, Palsson BO: Biochemical production capabilities of escherichia coli. Biotechnol Bioeng. 1993, 42: 59-73. 10.1002/bit.260420109.
- Fredrickson JK, Romine MF, Beliaev AS, Auchtung JM, Driscoll ME, Gardner TS, Nealson KH, Osterman AL, Pinchuk G, Reed JL, Rodionov DA, Rodrigues JLM, Saffarini DA, Serres MH, Spormann AM, Zhulin IB, Tiedje JM: Towards environmental systems biology of shewanella. Nat Rev Microbiol. 2008, 6: 592-603. 10.1038/nrmicro1947.
- Heidelberg JF, Paulsen IT, Nelson KE, Gaidos EJ, Nelson WC, Read TD, Eisen JA, Seshadri R, Ward N, Methe B, Clayton RA, Meyer T, Tsapin A, Scott J, Beanan M, Brinkac L, Daugherty S, DeBoy RT, Dodson RJ, Durkin AS, Haft DH, Kolonay JF, Madupu R, Peterson JD, Umayam LA, White O, Wolf AM, Vamathevan J, Weidman J, Impraim M, Lee K, Berry K, Lee C, Mueller J, Khouri H, Gill J, Utterback TR, McDonald LA, Feldblyum TV, Smith HO, Venter JC, Nealson KH, Fraser CM: Genome sequence of the dissimilatory metal ion-reducing bacterium shewanella oneidensis. Nat Biotechnol. 2002, 20: 1118-1123. 10.1038/nbt749.
- Lovley DR: The microbe electric: conversion of organic matter to electricity. Curr Opin Biotechnol. 2008, 19: 564-571. 10.1016/j.copbio.2008.10.005.
- Lun DS, Rockwell G, Guido NJ, Baym M, Kelner JA, Berger B, Galagan JE, Church GM: Large-scale identification of genetic design strategies using local search. Mol Syst Biol. 2009, 5: 296.
- Patil KR, Rocha I, Förster J, Nielsen J: Evolutionary programming as a platform for in silico metabolic engineering. BMC Bioinforma. 2005, 6: 308. 10.1186/1471-2105-6-308.
- Xu P, Ranganathan S, Fowler ZL, Maranas CD, Koffas MAG: Genome-scale metabolic network modeling results in minimal interventions that cooperatively force carbon flux towards malonyl-CoA. Metab Eng. 2011, 13: 578-587. 10.1016/j.ymben.2011.06.008.
- Brochado AR, Matos C, Møller BL, Hansen J, Mortensen UH, Patil KR: Improved vanillin production in baker’s yeast through in silico design. Microb Cell Fact. 2010, 9: 84. 10.1186/1475-2859-9-84.
- Pinchuk GE, Hill EA, Geydebrekht OV, De Ingeniis J, Zhang X, Osterman A, Scott JH, Reed SB, Romine MF, Konopka AE, Beliaev AS, Fredrickson JK, Reed JL: Constraint-based model of shewanella oneidensis MR-1 metabolism: a tool for data analysis and hypothesis generation. PLoS Comput Biol. 2010, 6: e1000822. 10.1371/journal.pcbi.1000822.
- Sauer U: Metabolic networks in motion: 13C-based flux analysis. Mol Syst Biol. 2006, 2: 62.
- Yuan J, Bennett BD, Rabinowitz JD: Kinetic flux profiling for quantitation of cellular metabolic fluxes. Nat Protoc. 2008, 3: 1328-1340. 10.1038/nprot.2008.131.
- Aguilera A: Deletion of the phosphoglucose isomerase structural gene makes growth and sporulation glucose dependent in saccharomyces cerevisiae. Mol Gen Genet. 1986, 204: 310-316. 10.1007/BF00425515.
- Gavrilescu M, Chisti Y: Biotechnology-a sustainable alternative for chemical industry. Biotechnol Adv. 2005, 23: 471-499. 10.1016/j.biotechadv.2005.03.004.
- Clarke BL: Stoichiometric network analysis. Cell Biophys. 1988, 12: 237-253.
- Bertsimas D, Tsitsiklis JN: Introduction to Linear Optimization. 1997, Athena Scientific, New Hampshire.
- Feist AM, Palsson BO: The biomass objective function. Curr Opin Microbiol. 2010, 13: 344-349. 10.1016/j.mib.2010.03.003.
- Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, Karp PD, Broadbelt LJ, Hatzimanikatis V, Palsson BØ: A genome-scale metabolic reconstruction for escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst Biol. 2007, 3: 121.
- Mo ML, Palsson BO, Herrgård MJ: Connecting extracellular metabolomic measurements to intracellular flux states in yeast. BMC Syst Biol. 2009, 3: 37. 10.1186/1752-0509-3-37.
- Barrett CL, Herring CD, Reed JL, Palsson BO: The global transcriptional regulatory network for metabolism in escherichia coli exhibits few dominant functional states. Proc Natl Acad Sci U S A. 2005, 102: 19103-19108. 10.1073/pnas.0505231102.
- GNU Linear Programming Kit (GLPK): [http://www.gnu.org/software/glpk/].
- Becker SA, Palsson BO: Context-specific metabolic networks are consistent with experiments. PLoS Comput Biol. 2008, 4: e1000082. 10.1371/journal.pcbi.1000082.
- Irizarry RA, Wu Z, Jaffee HA: Comparison of affymetrix geneChip expression measures. Bioinformatics. 2006, 22: 789-794. 10.1093/bioinformatics/btk046.
- Hartigan J, Wong M: Algorithm AS 136: a k-means clustering algorithm. J R Stat Soc Ser C Appl Stat. 1979, 28: 100-108.
- Tibshirani R, Walther G, Hastie T: Estimating the number of clusters in a dataset via the Gap statistic. J R Stat Soc. 2000, 63: 411-423.
- Solovieff N, Hartley SW, Baldwin CT, Perls TT, Steinberg MH, Sebastiani P: Clustering by genetic ancestry using genome-wide SNP data. BMC Genet. 2010, 11: 108.
- Li M, Reilly MP, Rader DJ, Wang L-S: Correcting population stratification in genetic association studies using a phylogenetic approach. Bioinformatics. 2010, 26: 798-806. 10.1093/bioinformatics/btq025.
- Stegmayer G, Milone DH, Kamenetzky L, Lopez MG, Carrari F: A biologically inspired validity measure for comparison of clustering methods over metabolic data sets. IEEE/ACM Trans Comput Biol Bioinform. 2012, 9: 706-716.
- Bertsekas DP: Nonlinear programming. 1995, Athena Scientific, New Hampshire, 1.
- Censor Y: Pareto optimality in multiobjective problems. Appl Math Optim. 1977, 4: 41-59. 10.1007/BF01442131.
- Deb K: Multi-objective optimization using evolutionary algorithms. 2001, John Wiley and Sons, New Jersey.
- Deb K, Pratap A, Agarwal S, Meyarivan T: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput. 2002, 6: 182-197. 10.1109/4235.996017.
- Patnaik PR: Intelligent models of the quantitative behavior of microbial systems. Food Bioprocess Technol. 2009, 2: 122-137. 10.1007/s11947-008-0112-8.
- Tarafder A, Rangaiah GP, Ray AK: A study of finding many desirable solutions in multiobjective optimization of chemical processes. Comput Chem Eng. 2007, 31: 1257-1271. 10.1016/j.compchemeng.2006.10.010.
- Lee FC, Rangaiah GP, Ray AK: Multi-objective optimization of an industrial penicillin V bioreactor train using non-dominated sorting genetic algorithm. Biotechnol Bioeng. 2007, 98: 586-598. 10.1002/bit.21443.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.