Metabolic network reconstruction and genome-scale model of butanol-producing strain Clostridium beijerinckii NCIMB 8052
© Milne et al; licensee BioMed Central Ltd. 2011
Received: 3 May 2011
Accepted: 16 August 2011
Published: 16 August 2011
Solventogenic clostridia offer a sustainable alternative to petroleum-based production of butanol--an important chemical feedstock and potential fuel additive or replacement. C. beijerinckii is an attractive microorganism for strain design to improve butanol production because it (i) naturally produces the highest recorded butanol concentrations as a byproduct of fermentation; and (ii) can co-ferment pentose and hexose sugars (the primary products from lignocellulosic hydrolysis). Interrogating C. beijerinckii metabolism from a systems viewpoint using constraint-based modeling allows for simulation of the global effect of genetic modifications.
We present the first genome-scale metabolic model (i CM925) for C. beijerinckii, containing 925 genes, 938 reactions, and 881 metabolites. To build the model we employed a semi-automated procedure that integrated genome annotation information from KEGG, BioCyc, and The SEED, and utilized computational algorithms with manual curation to improve model completeness. Interestingly, we found only a 34% overlap in reactions collected from the three databases--highlighting the importance of evaluating the predictive accuracy of the resulting genome-scale model. To validate i CM925, we conducted fermentation experiments using the NCIMB 8052 strain, and evaluated the ability of the model to simulate measured substrate uptake and product production rates. Experimentally observed fermentation profiles were found to lie within the solution space of the model; however, under an optimal growth objective, additional constraints were needed to reproduce the observed profiles--suggesting the existence of selective pressures other than optimal growth. Notably, a significantly enriched fraction of actively utilized reactions in simulations--constrained to reflect experimental rates--originated from the set of reactions that overlapped between all three databases (P = 3.52 × 10-9, Fisher's exact test). Inhibition of the hydrogenase reaction was found to have a strong effect on butanol formation--as experimentally observed.
Microbial production of butanol by C. beijerinckii offers a promising, sustainable, method for generation of this important chemical and potential biofuel. i CM925 is a predictive model that can accurately reproduce physiological behavior and provide insight into the underlying mechanisms of microbial butanol production. As such, the model will be instrumental in efforts to better understand, and metabolically engineer, this microorganism for improved butanol production.
The diminishing supply of non-renewable feedstocks--and concern over environmental ramifications of their use in fuel and chemical production--highlights the need for technological advances to improve the economic viability of sustainable production methods. In particular, given its broad scope of applications as a chemical feedstock and compelling properties as an alternative transportation fuel , sustainable production of butanol is of particular industrial interest. Butanol production via microbial fermentation from lignocellulosic material (historically achieved using solventogenic clostridia, prior to petroleum refining ) represents a sustainable method for production of this important solvent.
Recently, the most common fermentation microorganisms--Escherichia coli and Saccharomyces cerevisiae-- have been engineered to produce butanol and its branched chain derivatives (e.g., isobutanol) [3–5]. However, relative to E. coli and S. cerevisiae, the solventogenic clostridia offer two clear advantages as butanol-producing microorganisms: (i) the evolved ability to produce and tolerate butanol at concentrations up to 21 g/L--important because butanol is highly toxic to microorganisms at even low concentrations [6–9], and (ii) the ability to co-ferment pentose and hexose sugars, the primary sugars found in lignocellulosic hydrolysates [10, 11]. These characteristics should reduce the number of genetic modifications needed to make biological butanol production economically competitive.
Among the solventogenic clostridia, Clostridium acetobutylicum ATCC 824 and C. beijerinckii produce the highest n-butanol concentrations; the mutant strain C. beijerinckii BA101 achieves the highest reported concentration (17-21 g/L) across all microorganisms [7–9]. The parent strain of BA101, C. beijerinckii strain NCIMB 8052, holds several advantages for industrial butanol production: (i) it has proven amenable to experimental modifications that increase butanol tolerance and production ; (ii) the solventogenic genes reside on the chromosome rather than on a separate megaplasmid (as is the case for C. acetobutylicum), potentially increasing its resistance to degeneration ; (iii) it can successfully produce butanol in continuous culture conditions ; and (iv) it has a broad substrate utilization spectrum [10, 11, 14]. Taken together, these traits give C. beijerinckii particular appeal as a clostridial catalyst for industrial butanol production.
Like other solventogenic clostridia, C. beijerinckii produces solvents (butanol and acetone) as products of a biphasic metabolism. Butyrate and acetate are produced first in acidogenesis; in solventogenesis, acids are re-assimilated and production of butanol and acetone begins. Central to improving butanol production is deciphering what causes this metabolic switch. Numerous phenomena--decreased pH, acid accumulation, intracellular ATP concentration, nutrient limitation, interplay between carbon and electron flow pathways, and sporulation--have been hypothesized to contribute [15, 16]. Most of these phenomena are directly tied to cellular metabolism, or more specifically, to changes among the enzymes and metabolites that comprise the intracellular metabolic network. Our aim is to develop an understanding of C. beijerinckii metabolism--through systems analysis of the metabolic network--that will enable us to optimally direct available carbon towards the production of butanol.
In silico reconstruction of metabolic networks--and subsequent analysis of genome-scale metabolic models through constraint-based modeling --enables a global interrogation of metabolism not possible with standard experiments. The genome-scale model allows one to analyze the cell from a systems viewpoint to predict whole-cell effects of genetic changes, and to simulate known and hypothesized phenotypes. Methods for reconstructing and analyzing metabolic networks have been well established for microorganisms, and genome-scale models have been built for all branches of life [18, 19]. Furthermore, numerous successes have been demonstrated for using these models to guide rational engineering in model microorganisms such as E. coli and S. cerevisiae[20–22]. Importantly, models of this type can be constructed for any organism with a sequenced genome, and thus hold particular utility for lesser-characterized organisms such as C. beijerinckii.
We present here the first genome-scale metabolic model (named i CM925) for C. beijerinckii, built based on the NCIMB 8052 strain. There have been four genome-scale models built for clostridia--two for C. acetobutylicum[23–25] and one each for the cellulolytic strains C. thermocellum and C. cellulolyticum. C. beijerinckii is distinct from these in that it is the most productive wild-type butanol-producing clostridia known to date [6–9]. Containing 925 genes, 938 reactions, 881 metabolites, and 67 membrane transport reactions, i CM925 is the largest genome-scale model for a clostridial species. The i CM925 model can simulate substrate uptake and product formation rates for typical batch culture experiments, and correctly captures the relationships between the formation of products such as butanol and hydrogen. As such, the C. beijerinckii model will be instrumental in our future efforts to engineer C. beijerinckii to produce higher titers of butanol.
Results & Discussion
The first step towards building a genome-scale metabolic model is reconstructing the genome-scale metabolic network--typically done using publically available annotation databases and published literature. A list is collected of reactions that are either catalyzed by enzymes encoded in the genome or have been defined experimentally, and then expanded to define relationships between genes, enzymes, reactions, metabolites, and pathways in the network. To establish the genome-scale metabolic model, the network reactions are subjected to a number of physico-chemical constraints--either calculated or based on physiological data--to simulate defined cultural conditions. Given the limited literature and biochemical data available for C. beijerinckii, we reconstructed our metabolic network first using a semi-automated approach to obtain annotation data from three major databases, and then utilized computational algorithms and manual curation to further refine the network. To test the ability of the i CM925 model to simulate experimentally-observed behavior, we conducted a series of batch fermentations to compare measured substrate uptake and product formation rates with model predictions. The model provides a solid basis with which to study the unique characteristics of C. beijerinckii metabolism and guide future metabolic engineering experiments for enhanced butanol production capability.
The initial genome-scale metabolic network
The available genome annotations for lesser-characterized organisms are largely generated by computational, informatics-based procedures (i.e., they often lack manual curation), and there is a paucity of experimentally-confirmed biochemical data. To facilitate reconstruction, expand the scope of our C. beijerinckii network, and evaluate confidence for each gene-protein-reaction (GPR) relationship included, we merged annotation data from three independent databases: KEGG (Kyoto Encyclopedia for Genes and Genomes) , BioCyc , and The SEED [30, 31]. To reduce the time required to assemble annotation data into a well-connected genome-scale network, we employed a semi-automated computational approach (see Methods) to retrieve and integrate information from each database.
The foundation for our network, comprising 525 reactions, was obtained from the KEGG database. We expanded this network to include an additional 75 and 136 unique reactions from The SEED and BioCyc databases, respectively. Careful reconciliation and integration of the obtained biochemical data was required because the three databases do not follow a uniform nomenclature for reactions, metabolites, and pathways. We chose to adhere to the nomenclature used by the BiGG database (the largest available repository for genome-scale metabolic models) in order to enable easier comparison with other in silico models . This mapping step was quickly accomplished by using a matrix formalism to overlay the different databases (see Methods) based on stoichiometry. The mapping between BioCyc and KEGG for reaction and metabolite names in C. beijerinckii is available in Additional File 1.
In addition to establishing reliability for each included reaction, we evaluated the predicted gene-associations for reactions found in two or more of the annotation databases (see Additional File 2A for database-based GPR comparison). In cases where annotations did not agree between databases, associations were selected for the model based on the strongest BLAST  evidence (i.e., genomic identity between the associated enzyme and similarly annotated database proteins). Reassuringly, we found that most annotation disagreements were due to a missing gene-reaction relationship rather than a contrasting association; this suggests that overlapping reactions comprise a well annotated area of the network.
The refined C. beijerinckii metabolic network
The draft metabolic network derived from genome annotation data--even with combined information from multiple databases--contained gaps (i.e., missing reactions) that prevented simulation of cell growth and accurate physiological behavior (e.g., butanol production). Gaps create unconnected sections/regions in the network, thereby preventing production or consumption of a metabolite. In turn, the "dead-end" metabolite has often been observed experimentally as consumed or produced, or is needed to simulate cell growth. Network gaps must therefore be filled using literature information and/or genomic evidence beyond what was included in the annotation databases.
Identifying network gaps and selecting candidate gap-filling reactions with strong supporting evidence can be time consuming, especially for lesser-characterized organisms like C. beijerinckii. Consequently, we used the GapFind and GapFill  algorithms to computationally identify and resolve gaps, thus minimizing the amount of manual curation needed. Candidate reactions suggested by GapFill were chosen from the BiGG database; this database contains genome-scale models that have undergone extensive refinement and validation, and thus is a resource of high-confidence reactions . After reviewing candidate reactions for sufficient BLAST  evidence, we identified an additional 22 putative annotations (and 22 additional network reactions) for the C. beijerinckii genome (a list of added reactions may be found in Additional File 1)--seven of which were required for simulated cell growth. While GapFind and GapFill are highly useful computational algorithms, we found that they are not guaranteed to suggest reactions with strong supporting evidence and do not focus specifically on fulfilling the model objective function. Therefore, manual addition of reactions based on literature evidence was still needed to fill important gaps in the C. beijerinckii network.
Notably, the draft network was missing a butanol dehydrogenase enzyme, a ferredoxin NAD+ reductase, and did not contain the necessary biochemical transformations for production of known phospholipids. Reactions for both an NAD+ and NADP+ butanol dehydrogenase enzyme (BUTOHDx and BUTOHDy), known to exist in solvent producing clostridia [2, 35], were added based on BLAST  scores for the C. beijerinckii gene Cbei_2421. We were unable to find a gene association for ferredoxin NAD+ reductase--even though the NADP+ reductase is matched to Cbei_0661 and Cbei_2182--but added the reaction (FDXNRx) based on literature evidence . The phospholipid pathway was characterized using a similar approach to Lee et al. , drawing upon experimental data for fatty acid biosynthesis . In total, 38 reactions were added as a result of our manual curation--11 of which were added based on BLAST  comparison with reactions from the Senger & Papoutsakis C. acetobutylicum model [24, 25] and 22 of which were added for the formation of phospholipid and biomass components. The source for manually added reactions (as well as all other reactions included in the model) can be found in Additional File 1).
One of the most significant gaps in the draft C. beijerinckii network prevented model-simulated production of oxoglutarate, a major component of central metabolism; this gap stemmed from missing genetic evidence for enzymatic reactions need to complete the TCA cycle. We completed the TCA cycle in the model based on conclusions from two recent experimental studies, in which carbon labeling showed that C. acetobutylicum uses a bifurcated TCA cycle culminating in succinate secretion [37, 38]. The initial reconstruction did not support a bifurcated TCA cycle: our network was missing a citrate synthase (CS), succinyl-CoA synthetase (SUCOAS), and a succinate transport (SUCCex) reaction. In addition, the directionality of existing reactions did not support the experimentally observed flux. To allow for simulation of the bifurcated cycle and enable oxoglutarate production, we added the three missing reactions (without genetic evidence), and restricted reaction directionality to that observed in the study.
The genome-scale model (i CM925)
Network statistics for genome-scale models of clostridia
We assessed the pathway contribution of each annotation database (see Additional File 2B) to determine (i) if any database exhibited more complete coverage in one area of metabolism (e.g., carbohydrate metabolism) and (ii) if one database contributed more blocked reactions to the model. For each of the 13 pathway categories depicted in Figure 2, we found similar coverage between KEGG, BioCyc, and the SEED; this indicates that the small overlap found between databases is not simply a result of one database contributing more heavily to a particular area of metabolism. Additionally, each database contributed a similar number of blocked reactions: 22% of the blocked reactions came from BioCyc, 21% from KEGG, 10% from The SEED, 18% from two or more databases, and 17% from all three databases (these percentages are not directly proportional to the total number of enzymes contributed by each database). Therefore, we did not find that one database outperformed another in terms of model connectivity.
Validation of i CM925
To evaluate the predictive accuracy of i CM925, we used Flux Balance Analysis (FBA, see Methods) to reproduce experimental fermentation behavior. The FBA formalism represents all known reactions in the cell as a stoichiometric matrix, and uses linear programming to maximize a user defined objective function (e.g., growth) under a steady state assumption [45, 46]. Importantly, FBA can be used to simulate experimental parameters such as growth rates, uptake rates, and byproduct secretion rates--enabling quantitative evaluation of model agreement with physiological behavior.
During fermentation, C. beijerinckii produces six primary carbon-containing byproducts: acetate, butyrate, acetone, butanol, ethanol, and carbon dioxide. Due to the biphasic nature of C. beijerinckii metabolism, not all five byproducts are produced at the same rates throughout fermentation. In a targeted gene expression study, Shi and Blaschek found that solvent formation began during mid-exponential growth (7-8 hours); this period was characterized by increased expression levels of solvent formation genes and accompanied by decreased expression of genes associated with acid formation . To validate the ability of the model to simulate byproduct secretion and growth rates, we conducted our own batch fermentation experiments using NCIMB 8052 cultures grown on minimal media. Similar to Shi and Blaschek, we observed the switch from butyrate to butanol formation at 8-10 hours; we chose to focus our simulations on the subsequent period of exponential growth in which butanol is produced.
We determined substrate uptake and product secretion rates for cultures grown at four temperatures (30°C, 33°C, 35°C, 40°C) to obtain multiple data sets with which to compare model simulations. Only results for 35°C are reported in the main text, as it is most representative of typical fermentation conditions (complete experimental results are available in Additional File 2C). Experimental rate estimates (in units of mmol/gDW/hr) were determined for butanol, acetone, ethanol, acetate and butyrate using product concentration and growth rate (see Methods, Additional File 2D). We observed a net consumption of glucose and acetate (the carbon containing compounds in our defined growth media) and net production of acetone, butanol and ethanol. When performing simulations, specified uptake and secretion rates were constrained to fall within one standard deviation of experimentally measured rates, while the remaining rates were determined by FBA. All model simulations were conducted with biomass production (defined by the biomass equation, see Methods and Additional File 1 for details) as the assumed cellular objective.
To confirm that i CM925 is capable of simulating production of all expected metabolites at experimentally determined rates, additional constraints were added to the product secretion reactions for butanol, acetone, ethanol, and butyrate (Figure 3B, see Additional File 2D for comparisons at different temperatures). As product formation is known to be associated with the generation of ATP in the cell , the effect of ATP production requirements was analyzed by altering the constraints on the non-growth associated ATP maintenance (NGAM) reaction. The first simulation assumed that no ATP is needed for non-growth associated maintenance, and resulted in a higher growth rate than expected; this is a biologically unrealistic assumption, but illustrates the dependency of growth rate on ATP maintenance. The latter simulation--with an NGAM value that guided the in silico growth rate to the experimentally calculated range--demonstrated that the expected experimental phenotype can be reproduced by the model. The selected NGAM value was 8.5 mmol/gDW/hr, which is encouragingly similar to the value used in the E. coli i AF1260 model . From these simulations, we concluded that all observed secretion patterns exist within the solution space of the model, even though solvent secretion patterns in C. beijerinckii are not very well described by the i CM925 model when using the optimal growth objective.
Analysis of the active reactions in i CM925
Contrary to experimental ethanol production which stems primarily from acetyl-CoA [2, 40], i CM925 predicted that about 70% of ethanol was made from threonine (derived from aspartate) by the enzyme threonine acetaldehyde-lyase (THRA). Butanol, butyrate, and acetone were produced using the experimentally characterized pathways, and acetate was consumed using CoA-transferase (COAT1) as expected [2, 40]. Intriguingly, the model predicted simultaneous production and consumption of butyrate using butyrate kinase (BUTK) and CoA-transferase (COAT2), respectively. The re-uptake of acids by solventogenic clostridia has been experimentally established, with one of the leading suggestions for this behavior being a means of de-toxification of the acidic environment . Given that the primary objective in our simulations is to maximize flux through the biomass equation within the imposed constraints, it is most likely that the motivation for re-uptake of butyrate by the i CM925 model is the generation of additional ATP--a major component of biomass. Previous experimental studies investigating acid re-uptake [41–43] do not support this suggested motivation, however--making additional exploration into the motivations of the model an interesting area of focus going forward. These experimental findings further suggest that selective pressures other than optimal growth may dominate phenotype under typical fermentation conditions.
Understanding butanol production: the role of molecular hydrogen
Additionally, our simulation shows that for high levels of hydrogen formation, acetate is the only byproduct and growth rate is at a maximum. The production of hydrogen eliminates the need for additional NADH consumption by butyrate, allowing ATP generation to occur exclusively via acetate formation--the most efficient method for the cell. Maximum growth rate is observed under these conditions because they represent the most energy efficient means of glucose utilization for the microorganism. At very low hydrogen consumption rates, we observe production of butanol rather than butyrate, as the production of butanol results in the consumption of two additional NADH molecules compared to butyrate. This observation supports the conclusion that without hydrogen production, excess electrons are disposed of via the production of acids and solvents. The overall observed effect of hydrogen formation is not only experimentally consistent, but it highlights the importance of this reaction in regulating butanol formation, and will be an area of focus in our continued efforts to improve butanol production in C. beijerinckii.
Comparison of i CM925 with C. acetobutylicum model
Butanol, currently produced as a byproduct of petroleum refining, is appealing in industry as both an important chemical feedstock and an alternative transportation fuel. We have built the first genome-scale model for C. beijerinckii to better understand the metabolic behavior of the microorganism, and to guide future metabolic engineering for increased butanol production. Having a genome-scale model for C. beijerinckii is advantageous because it helps provide a global picture of metabolism--enabling interrogation of the interplay between the various fermentation products of the microorganism from a systems viewpoint. Given the lack of detailed biochemical data available for C. beijerinckii, we integrated and cross-checked information from three major annotation databases to reconstruct the core metabolic network, and then further completed the network with computational algorithms and manual curation. We collected experimental fermentation data to determine production rates of acetone, ethanol, and butanol, and uptake rates of acetate and glucose, and these rates were used to confirm the ability of the model to accurately represent physiological behavior. Interestingly, reactions found in all three annotation databases proved to contribute significantly to the actively used reactions in validation simulations. Even though the observed experimental phenotypes were found to exist in the solution space of the model, optimal growth simulations on glucose did not predict the expected product profiles--suggesting the possibility of an alternative cellular objective or additional mechanisms not captured by the i CM925 model. One reaction found to have a strong impact on the predicted product formation rates was the hydrogenase reaction--a reaction that has been found to impact solvent formation experimentally as well. Going forward, this model will play a central role in understanding and engineering butanol production by C. beijerinckii.
Additionally, the construction of i CM925 highlighted important areas of investigation for future model-building efforts in other lesser-characterized microorganisms (e.g., genome annotation agreement), and for improved constraint-based simulation of non-growth phenotypes (e.g., alternative objective functions).
Genome-scale models are built using enzyme-catalyzed reaction information encoded in the genome of an organism, as well as experimentally characterized reaction information. This collection of all known reactions in the metabolism of the organism then serves as the foundation for the metabolic model. The model is represented mathematically by the stoichiometric (S) matrix. Each column in S represents a reaction in the network, where entries for each row indicate the stoichiometric relationship of corresponding metabolites (negative and positive coefficients denote reactants and products, respectively, and zero entries indicate a non-participating reaction). Through constraint-based modeling [17, 53], a series of balances and bounds (discussed below) are applied to the reactions in S, and the model is used to simulate cell growth by optimizing for a user-defined objective function. These simulations can then be used to examine the interplay between different reactions and pathways, and to predict resulting metabolic phenotypes from genetic modifications.
Semi-automated compilation of the draft metabolic network
The metabolic network describes the connectivity of metabolites and reactions in a cell and characterizes the link between genes, proteins and reactions (GPR relationship). We built the base metabolic network using the KEGG genome annotation for C. beijerinckii. This draft network contained annotation-based information available for C. beijerinckii from the KEGG database, including GPR relationships, pathway information, reaction stoichiometry, and reaction reversibility. To reduce the time needed to generate the initial draft network we developed a MATLAB based program to automatically collect and organize organism-specific biochemical information from the KEGG database.
using the between-database metabolite mapping, we identified the set of all compounds shared by BioCyc and KEGG;
we built temporary S matrices (one for the BioCyc C. beijerinckii reaction set, one for the KEGG/SEED draft network set, and one for the BiGG database--including only those reactions involving metabolites in the shared set (metabolites were ordered identically in each matrix);
we identified matching columns (reactions), assuming that identical stoichiometric relationships between a set of metabolites represented a matching reaction between BioCyc and KEGG.
we manually inspected and curated reactions not mapped in an automated fashion.
The resulting network, based on KEGG nomenclature, consisted of the list of reaction formulas, the corresponding enzymes (including enzyme commission number) and genes, reaction identifiers, pathway information, and a note on the source database. To enable comparison with other published genome-scale models (over 50 to date ), the metabolite and reaction identifiers were reformatted in accordance with models available in the BiGG database . This was done similarly to the BioCyc-KEGG mapping described above. Metabolite mapping was achieved using flat files from BiGG, and reactions were mapped using temporary S matrices for BiGG and the C. beijerinckii network. Manual matching was performed for reactions and metabolites for which no automated connection was found. Names were generated for any remaining reactions and metabolites for which no mapping existed.
After identifying active reactions in defined model simulations (described below), we examined whether there was a strong association between database overlap and inclusion in the active set. Specifically, we used the Fisher's exact probability test to determine whether the set of overlapping reactions (those found in all three annotation databases) was "enriched" for active use in a simulation. Using the 'fisherextest' function available for MATLAB, we defined a 2 × 2 table representing the following frequencies: (i) reactions belonging to all databases that were active in the simulation; (ii) overlapping reactions that were inactive; (iii) non-overlapping reactions (those belonging to two or less databases) that were active; and (iv) non-overlapping reactions that were inactive in the simulation. An exact P-value--indicating the probability of observing the same or higher frequency of overlapping, active reactions by random chance--was calculated by the function based on a hypergeometric distribution.
Building the genome-scale metabolic model
When building the model, application of physico-chemical constraints--namely mass and energy balance--were carefully enforced. To mass- and charge-balance model reactions, charge information for each molecule was determined using (in order): (i) the BiGG database; (ii) computational pKA based predictions at pH 7.2; and (iii) BioCyc (see Additional File 1 for complete list). The model was then mass balanced in a semi-automated fashion using charged molecular formulas. Reactions with a hydrogen imbalance were balanced by automatically altering the stoichiometric hydrogen relationship until both mass and charge balances were satisfied. Reactions that could not be balanced in this manner were inspected manually. Any reactions that ultimately could not be balanced were excluded from the model entirely.
We applied environmental constraints as bounds on individual fluxes (e.g., flux capacities, thermodynamics), defining the smaller solution space that represents the allowable phenotypes for the model. Irreversible reactions were constrained to positive or negative flux, depending on direction. Membrane transport and exchange reactions were used to transfer metabolites into and out of the cytosol and system boundaries, respectively. For metabolites whose uptake or output rates were experimentally determined, we specified individual bounds (e.g., glucose, acetate) on the corresponding exchange reactions, and these were varied depending on the simulation. Reversibility was determined by careful comparison of reaction direction in all databases, and the most common directionality was typically chosen. We used extreme pathway analysis [54, 55] to identify thermodynamically infeasible cycles, and eliminated these cycles by changing directionality or deleting one of the participating reactions.
As the constraint-based system is highly underdetermined, many solutions (i.e., flux distributions) exist that satisfy S v = 0. We therefore used Flux Balance Analysis (FBA) to determine the distribution of reaction fluxes that optimize a user-defined biological objective function (in our simulations, the commonly used biomass production objective) [46, 56]. To facilitate simulations, the model was formatted to be compatible with the COBRA Toolbox ; all model simulations were subsequently performed using COBRAToolbox-1.3.1 in MATLAB, with GLPK as the linear programming solver. For all optimizations, the 'minNorm' flag was turned on (related to the cost of enzyme production in the cell), and simulations were run with a negative lower bound representing a reversible reaction.
To simulate biomass production, a single equation representing all macromolecules comprising one C. beijerinckii cell was created using known experimental compositions and compositions inferred from the genome. Following the detailed supplemental information provided by Lee et al.  for their C. acetobutylicum genome-scale model, biomass was assumed to consist of: DNA, RNA, lipids, protein, peptidogylcan, and teichoic acid and trace metabolites. DNA, RNA and protein content were calculated directly from the genome sequence, and peptidoglycan, teichoic acid and trace metabolites were kept similar to C. acetobutlylicum. To determine the appropriate lipid composition, we performed a detailed analysis of lipid and fatty acid content in the cell using data from . See Additional File 1 for more details.
When performing Flux Variability Analysis [17, 50], we selected reactions that could increase or decrease by 25% of their maximal flux value for further analysis. To simulate the effect of hydrogen production on the production of acetate, butyrate, acetone, butanol, ethanol and growth we performed a robustness analysis [17, 50]. This was done by constraining the flux through the hydrogen exchange reaction and iteratively performing FBA to evaluate changes in flux through the exchange reactions of other products. The reaction flux through exchange reactions for all interested metabolites was then plotted versus flux through the hydrogen exchange reaction.
Refinement of the genome-scale metabolic model
As defined by Kumar et al., metabolites that participate in network gaps fall into two categories: non-produced or non-consumed. We first used GapFind/GapFill , which identifies network gaps and suggests reactions (from a user-specified database--in our case, the BiGG database) whose addition to the model would eliminate the gap. Suggested reactions were manually inspected for relevancy and homology evidence using BLAST ; reactions with an E-value of 1 × 10-8 or less for their associated gene were added to the model. This liberal cut off was used in an effort to achieve biomass growth--the identified gene associations for the added reactions were later compared against KEGG, BioCyc and The SEED. If one of the suggested gene associations was found to have a stronger annotation in one of these databases, the corresponding reaction from GapFill was not added to the model. To identify replacement reactions, we iteratively ran GapFind and GapFill until the suggested reactions all had poor supporting genetic evidence.
Additional model refinement was carried out using reactions described in published C. beijerinckii material, as well as the two published C. acetobutylicum genome-scale models [23–25]. Reactions added from the C. acetobutylicum models were added in the same manner as the GapFill suggestions, with a required BLAST  E-value of no more than 1 × 10-8. In only a few cases, reactions were added without any genomic evidence, given sufficient literature support for the reaction. Model refinement continued until the model was capable of simulating accurate growth and product formation.
Experimental data collection & analysis
The four fermentation studies were conducted at different temperatures: 30°C, 33°C, 35°C, and 40°C; each study was run in triplicate. Cultures of C. beijerinckii NCIMB 8052 were stored in spore form at 4°C in sterile H2O . Spores were heat shocked for 10 minutes at 80°C, immediately transferred into an ice bath for 5 minutes, and inoculated into a 6% glucose filter-sterilized P2YE medium [9, 57]. The inoculum was incubated in an anaerobic chamber under N2:CO2:H2 (volume ratio of 85:10:5) atmosphere for 14 hours at 35 ± 1°C. Cell cultures were then transferred into 1 L Sixfors Bioreactors (Appropriate Technical Resources, Inc.) containing 400 mL 6% glucose filter-sterilized P2 medium under anaerobic conditions for a 100 hour total fermentation period. Over this time period, samples were taken every 3 hours for the first 24 hours, 6 hours for the next 12 hours, 12 hours for the next 24 hours, and then every 24 hours for the remainder of the time. For each sample, optical density was measured using a UV-Visible Spectrophotometer (Thermo Scientific BioMate 3) and cell density was calculated using the relationship A600 = 1 equivalent to 0.28 mg/mL. Gas chromatography (Agilent Technologies 7890A GC System) was used to quantify acetic acid, acetone, butyric acid, ethanol, and butanol concentrations, and glucose concentration was determined using high pressure liquid chromatography (Agilent Technologies 1200 Series). The pH was recorded throughout the fermentation.
In this equation, [metabolite] is the metabolite concentration in mmol/L, [biomass] is the cell concentration in gDW/L and μ is the growth rate. The yield Δ[metabolite]/Δ[biomass] was determined by plotting metabolite concentration against biomass concentration. Growth rate was found using an exponential growth fit to the biomass vs. time plots. To test the ability of i CM925 to reproduce the experimental rates, an experimental "range" was defined as within one standard deviation above or below the mean. This range was used to constrain the upper and lower bounds of the relevant uptake and output reactions in the model, and the resulting in silico growth prediction was compared to the experimental growth rate.
The authors gratefully acknowledge funding from an NSF CAREER grant (NDP) and a Chemistry-Biology Interface NIH Training Grant Fellowship (CBM). Additionally, the authors would like to thank professors Aaron Best and Matt DeJongh from Hope College for their assistance with collecting annotation data from The SEED database, graduate students Matthew Benedict, Matthew Gonnerman, and Yi Wang for their feedback, ideas and assistance with model building, and undergraduate students Sanchit Beri, Keith Chavez, and Kanishka Desai for their contributions to building the model.
- Lee SY, Park JH, Jang SH, Nielsen LK, Kim J, Jung KS: Fermentative butanol production by Clostridia. Biotechnol Bioeng. 2008, 101: 209-228. 10.1002/bit.22003View ArticlePubMedGoogle Scholar
- Jones DT, Woods DR: Acetone-butanol fermentation revisited. Microbiol Rev. 1986, 50: 484-524.PubMed CentralPubMedGoogle Scholar
- Atsumi S, Hanai T, Liao JC: Non-fermentative pathways for synthesis of branched-chain higher alcohols as biofuels. Nature. 2008, 451: 86-89. 10.1038/nature06450View ArticlePubMedGoogle Scholar
- Bond-Watts BB, Bellerose RJ, Chang MC: Enzyme mechanism as a kinetic control element for designing synthetic biofuel pathways. Nat Chem Biol.
- Steen EJ, Chan R, Prasad N, Myers S, Petzold CJ, Redding A, Ouellet M, Keasling JD: Metabolic engineering of Saccharomyces cerevisiae for the production of n-butanol. Microb Cell Fact. 2008, 7: 36- 10.1186/1475-2859-7-36PubMed CentralView ArticlePubMedGoogle Scholar
- Ezeji T, Milne C, Price ND, Blaschek HP: Achievements and perspectives to overcome the poor solvent resistance in acetone and butanol-producing microorganisms. Appl Microbiol Biotechnol. 85: 1697-1712.
- Annous BA, Blaschek HP: Isolation and characterization of Clostridium acetobutylicum mutants with enhanced amylolytic activity. Appl Environ Microbiol. 1991, 57: 2544-2548.PubMed CentralPubMedGoogle Scholar
- Ezeji TC, Qureshi N, Blaschek HP: Butanol fermentation research: upstream and downstream manipulations. Chem Rec. 2004, 4: 305-314. 10.1002/tcr.20023View ArticlePubMedGoogle Scholar
- Formanek J, Mackie R, Blaschek HP: Enhanced Butanol Production by Clostridium beijerinckii BA101 Grown in Semidefined P2 Medium Containing 6 Percent Maltodextrin or Glucose. Appl Environ Microbiol. 1997, 63: 2306-2310.PubMed CentralPubMedGoogle Scholar
- Qureshi N, Ezeji TC, Ebener J, Dien BS, Cotta MA, Blaschek HP: Butanol production by Clostridium beijerinckii. Part I: Use of acid and enzyme hydrolyzed corn fiber. Bioresource Technology. 2008, 99: 5915-5922. 10.1016/j.biortech.2007.09.087View ArticlePubMedGoogle Scholar
- Ezeji T, Qureshi N, Blaschek HP: Butanol production from agricultural residues: Impact of degradation products on Clostridium beijerinckii growth and butanol fermentation. Biotechnology and Bioengineering. 2007, 97: 1460-1469. 10.1002/bit.21373View ArticlePubMedGoogle Scholar
- Wilkinson SR, Young M: Physical map of the Clostridium beijerinckii (formerly Clostridium acetobutylicum) NCIMB 8052 chromosome. J Bacteriol. 1995, 177: 439-448. 10.1002/path.1711770416PubMed CentralPubMedGoogle Scholar
- Kashket ER: Clostridial strain degeneration. FEMS Microbiology Review. 1995, 17 (3): 307-315. 10.1111/j.1574-6976.1995.tb00214.x.View ArticleGoogle Scholar
- Shi Y, Li YX, Li YY: Large number of phosphotransferase genes in the Clostridium beijerinckii NCIMB 8052 genome and the study on their evolution. BMC Bioinformatics. 11 (Suppl 11): S9-
- Shi Z, Blaschek HP: Transcriptional analysis of Clostridium beijerinckii NCIMB 8052 and the hyper-butanol-producing mutant BA101 during the shift from acidogenesis to solventogenesis. Appl Environ Microbiol. 2008, 74: 7709-7714. 10.1128/AEM.01948-08PubMed CentralView ArticlePubMedGoogle Scholar
- Mitchell WJ: Physiology of carbohydrate to solvent conversion by clostridia. Adv Microb Physiol. 1998, 39: 31-130.View ArticlePubMedGoogle Scholar
- Price ND, Reed JL, Palsson BO: Genome-scale models of microbial cells: evaluating the consequences of constraints. Nat Rev Microbiol. 2004, 2: 886-897. 10.1038/nrmicro1023View ArticlePubMedGoogle Scholar
- Feist AM, Herrgard MJ, Thiele I, Reed JL, Palsson BO: Reconstruction of biochemical networks in microorganisms. Nat Rev Microbiol. 2009, 7: 129-143.PubMed CentralView ArticlePubMedGoogle Scholar
- Thiele I, Palsson BO: A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc. 2010, 5: 93-121.PubMed CentralView ArticlePubMedGoogle Scholar
- Feist AM, Palsson BO: The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli. Nat Biotechnol. 2008, 26: 659-667. 10.1038/nbt1401PubMed CentralView ArticlePubMedGoogle Scholar
- Milne CB, Kim PJ, Eddy JA, Price ND: Accomplishments in genome-scale in silico modeling for industrial and medical biotechnology. Biotechnol J. 2009, 4: 1653-1670. 10.1002/biot.200900234PubMed CentralView ArticlePubMedGoogle Scholar
- Oberhardt MA, Palsson BO, Papin JA: Applications of genome-scale metabolic reconstructions. Mol Syst Biol. 2009, 5: 320-PubMed CentralView ArticlePubMedGoogle Scholar
- Lee J, Yun H, Feist AM, Palsson BO, Lee SY: Genome-scale reconstruction and in silico analysis of the Clostridium acetobutylicum ATCC 824 metabolic network. Appl Microbiol Biotechnol. 2008, 80: 849-862. 10.1007/s00253-008-1654-4View ArticlePubMedGoogle Scholar
- Senger RS, Papoutsakis ET: Genome-scale model for Clostridium acetobutylicum: Part I. Metabolic network resolution and analysis. Biotechnol Bioeng. 2008, 101: 1036-1052. 10.1002/bit.22010PubMed CentralView ArticlePubMedGoogle Scholar
- Senger RS, Papoutsakis ET: Genome-scale model for Clostridium acetobutylicum: Part II. Development of specific proton flux states and numerically determined sub-systems. Biotechnol Bioeng. 2008, 101: 1053-1071. 10.1002/bit.22009PubMed CentralView ArticlePubMedGoogle Scholar
- Roberts SB, Gowen CM, Brooks JP, Fong SS: Genome-scale metabolic analysis of Clostridium thermocellum for bioethanol production. BMC Syst Biol. 4: 31-Google Scholar
- Salimi F, Zhuang K, Mahadevan R: Genome-scale metabolic modeling of a clostridial co-culture for consolidated bioprocessing. Biotechnol J. 5: 726-738.Google Scholar
- Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000, 28: 27-30. 10.1093/nar/28.1.27PubMed CentralView ArticlePubMedGoogle Scholar
- Caspi R, Foerster H, Fulcher CA, Kaipa P, Krummenacker M, Latendresse M, Paley S, Rhee SY, Shearer AG, Tissier C, et al.: The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 2008, 36: D623-631.PubMed CentralView ArticlePubMedGoogle Scholar
- Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crecy-Lagard V, Diaz N, Disz T, Edwards R, et al.: The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 2005, 33: 5691-5702. 10.1093/nar/gki866PubMed CentralView ArticlePubMedGoogle Scholar
- Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, et al.: The RAST Server: rapid annotations using subsystems technology. BMC Genomics. 2008, 9: 75- 10.1186/1471-2164-9-75PubMed CentralView ArticlePubMedGoogle Scholar
- Schellenberger J, Park JO, Conrad TM, Palsson BO: BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinformatics. 11: 213-Google Scholar
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389PubMed CentralView ArticlePubMedGoogle Scholar
- Satish Kumar V, Dasika MS, Maranas CD: Optimization based automated curation of metabolic reconstructions. BMC Bioinformatics. 2007, 8: 212- 10.1186/1471-2105-8-212PubMed CentralView ArticlePubMedGoogle Scholar
- Chen JS: Alcohol dehydrogenase: multiplicity and relatedness in the solvent-producing clostridia. FEMS Microbiol Rev. 1995, 17: 263-273. 10.1111/j.1574-6976.1995.tb00210.xView ArticlePubMedGoogle Scholar
- Durre P, : Handbook of Clostridia. 2005, Taylor & Francis Group, LLC,View ArticleGoogle Scholar
- Amador-Noguez D, Feng XJ, Fan J, Roquet N, Rabitz H, Rabinowitz JD: Systems-level metabolic flux profiling elucidates a complete, bifurcated tricarboxylic acid cycle in Clostridium acetobutylicum. J Bacteriol. 192: 4452-4461.Google Scholar
- Crown SB, Indurthi DC, Ahn WS, Choi J, Papoutsakis ET, Antoniewicz MR: Resolving the TCA cycle and pentose-phosphate pathway of Clostridium acetobutylicum ATCC 824: Isotopomer analysis, in vitro activities and expression analysis. Biotechnol J. 2010 Nov 4,Google Scholar
- Reed JL, Vo TD, Schilling CH, Palsson BO: An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol. 2003, 4: R54- 10.1186/gb-2003-4-9-r54PubMed CentralView ArticlePubMedGoogle Scholar
- Markowitz VM, Ivanova NN, Szeto E, Palaniappan K, Chu K, Dalevi D, Chen IM, Grechkin Y, Dubchak I, Anderson I, et al.: IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Res. 2008, 36: D534-538.PubMed CentralView ArticlePubMedGoogle Scholar
- Markowitz VM, Korzeniewski F, Palaniappan K, Szeto E, Werner G, Padki A, Zhao X, Dubchak I, Hugenholtz P, Anderson I, et al.: The integrated microbial genomes (IMG) system. Nucleic Acids Res. 2006, 34: D344-348. 10.1093/nar/gkj024PubMed CentralView ArticlePubMedGoogle Scholar
- Henry CS, Zinner JF, Cohoon MP, Stevens RL: iBsu1103: a new genome-scale metabolic model of Bacillus subtilis based on SEED annotations. Genome Biol. 2009, 10: R69- 10.1186/gb-2009-10-6-r69PubMed CentralView ArticlePubMedGoogle Scholar
- Oh YK, Palsson BO, Park SM, Schilling CH, Mahadevan R: Genome-scale reconstruction of metabolic network in Bacillus subtilis based on high-throughput phenotyping and gene essentiality data. J Biol Chem. 2007, 282: 28791-28799. 10.1074/jbc.M703759200View ArticlePubMedGoogle Scholar
- Nishikawa T, Gulbahce N, Motter AE: Spontaneous reaction silencing in metabolic optimization. PLoS Comput Biol. 2008, 4: e1000236- 10.1371/journal.pcbi.1000236PubMed CentralView ArticlePubMedGoogle Scholar
- Edwards JS, Ibarra RU, Palsson BO: In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat Biotechnol. 2001, 19: 125-130. 10.1038/84379View ArticlePubMedGoogle Scholar
- Kauffman KJ, Prakash P, Edwards JS: Advances in flux balance analysis. Curr Opin Biotechnol. 2003, 14: 491-496. 10.1016/j.copbio.2003.08.001View ArticlePubMedGoogle Scholar
- Minton NP, Clarke DJ, : Clostridia. 1989, New York: Plenum Press,View ArticleGoogle Scholar
- Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, Karp PD, Broadbelt LJ, Hatzimanikatis V, Palsson BO: A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst Biol. 2007, 3: 121-PubMed CentralView ArticlePubMedGoogle Scholar
- Mitchell WJ: Carbohydrate uptake and utilization by Clostridium beijerinckii NCIMB 8052. Anaerobe. 1996, 2: 379-384. 10.1006/anae.1996.0048.View ArticleGoogle Scholar
- Becker SA, Feist AM, Mo ML, Hannum G, Palsson BO, Herrgard MJ: Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nat Protoc. 2007, 2: 727-738. 10.1038/nprot.2007.99View ArticlePubMedGoogle Scholar
- Datta R, Zeikus JG: Modulation of acetone-butanol-ethanol fermentation by carbon monoxide and organic acids. Appl Environ Microbiol. 1985, 49: 522-529.PubMed CentralPubMedGoogle Scholar
- Kim BH, Bellows P, Datta R, Zeikus JG: Control of Carbon and Electron Flow in Clostridium acetobutylicum Fermentations: Utilization of Carbon Monoxide to Inhibit Hydrogen Production and to Enhance Butanol Yields. Appl Environ Microbiol. 1984, 48: 764-770.PubMed CentralPubMedGoogle Scholar
- Price ND, Papin JA, Schilling CH, Palsson BO: Genome-scale microbial in silico models: the constraints-based approach. Trends Biotechnol. 2003, 21: 162-169. 10.1016/S0167-7799(03)00030-1View ArticlePubMedGoogle Scholar
- Papin JA, Price ND, Edwards JS, Palsson BB: The genome-scale metabolic extreme pathway structure in Haemophilus influenzae shows significant network redundancy. J Theor Biol. 2002, 215: 67-82. 10.1006/jtbi.2001.2499View ArticlePubMedGoogle Scholar
- Price ND, Papin JA, Palsson BO: Determination of redundancy and systems properties of the metabolic network of Helicobacter pylori using genome-scale extreme pathway analysis. Genome Res. 2002, 12: 760-769.PubMed CentralView ArticlePubMedGoogle Scholar
- Edwards JS, Covert M, Palsson B: Metabolic modelling of microbes: the flux-balance approach. Environ Microbiol. 2002, 4: 133-140. 10.1046/j.1462-2920.2002.00282.xView ArticlePubMedGoogle Scholar
- Lee J, Mitchell WJ, Blaschek HP: Glucose uptake in Clostridium beijerinckii NCIMB 8052 and the solvent-hyperproducing mutant BA101 (vol 67, pg 5025, 2001). Applied and Environmental Microbiology. 2002, 68: 3181-3181.PubMed CentralGoogle Scholar
- Varma A, Palsson BO: Stoichiometric flux balance models quantitatively predict growth and metabolic by-product secretion in wild-type Escherichia coli W3110. Appl Environ Microbiol. 1994, 60: 3724-3731.PubMed CentralPubMedGoogle Scholar