Towards a genome-scale kinetic model of cellular metabolism
- Kieran Smallbone†1, 2,
- Evangelos Simeonidis†1, 3,
- Neil Swainston1, 4 and
- Pedro Mendes1, 4, 5
© Smallbone et al; licensee BioMed Central Ltd. 2010
Received: 27 July 2009
Accepted: 28 January 2010
Published: 28 January 2010
Advances in bioinformatic techniques and analyses have led to the availability of genome-scale metabolic reconstructions. The size and complexity of such networks often mean that their potential behaviour can only be analysed with constraint-based methods. Whilst requiring minimal experimental data, such methods are unable to give insight into cellular substrate concentrations. Instead, the long-term goal of systems biology is to use kinetic modelling to characterize fully the mechanics of each enzymatic reaction, and to combine such knowledge to predict system behaviour.
We describe a method for building a parameterized genome-scale kinetic model of a metabolic network. Simplified linlog kinetics are used and the parameters are extracted from a kinetic model repository. We demonstrate our methodology by applying it to yeast metabolism. The resultant model has 956 metabolic reactions involving 820 metabolites and, whilst approximative, has a considerably broader remit than any existing model of its type. Control analysis is used to identify key steps within the system.
Our modelling framework may be considered a stepping-stone toward the long-term goal of a fully-parameterized model of yeast metabolism. The model is available in SBML format from the BioModels database (BioModels ID: MODEL1001200000) and at http://www.mcisb.org/resources/genomescale/.
Recent advances in genome sequencing techniques and bioinformatic analyses have led to an explosion of systems-wide biological data. In turn, the reconstruction of genome-scale networks for micro-organisms has become possible. Whilst the first stoichiometric models were limited to the central metabolic pathways, later efforts such as iFF708 and iND750 were much more comprehensive. A recent community-driven reaction network for S. cerevisiae (baker's yeast) consists of 1761 reactions and 1168 metabolites.
The ability to analyse, interpret and ultimately predict cellular behaviour is a long sought-after goal. The genome sequencing projects are defining the molecular components within the cell, but describing their integrated function will be a challenging task. Ideally, one would like to use enzyme kinetics to characterize fully the mechanics of each reaction, in terms of how changes in metabolite concentrations affect local reaction rates. However, a considerable amount of data and effort is required to parameterize even a small mechanistic model; the determination of such parameters is costly and time-consuming, and moreover much of the required information may be difficult or impossible to determine experimentally. Instead, genome-scale metabolic modelling has relied on constraint-based analysis, which uses physicochemical constraints such as mass balance, energy balance, thermodynamics and flux limitations to describe the potential behaviour of an organism. Such methods, however, ignore much of the dynamic nature of the system and are unable to give insight into cellular substrate concentrations. These methods are more suitable for defining the wider limits of systems behaviour than for making reliable and accurate predictions about metabolism.
In a previous paper, we presented a method for constructing a kinetic model for a metabolic pathway based only on the knowledge of its stoichiometry. Here, we present a first attempt at the creation of a parameterized, genome-scale kinetic model of metabolic networks, through appending existing kinetic models of constituent metabolic pathways from the BioModels database to a stoichiometric model of yeast metabolism. The results (see Additional file 1) are presented in SBML (Systems Biology Markup Language; http://sbml.org/), using MIRIAM-compliant annotations (Minimal Information Requested In the Annotation of Models; http://www.ebi.ac.uk/miriam/). Critically, such markup allows automated reasoning about the model's assumptions and provenance.
Results and Discussion
A number of reconstructions of the metabolic network of yeast based on genomic and literature data have been published. However, due to different approaches utilized in the reconstruction, as well as different interpretations of the literature, the earlier reconstructions differ significantly. A community effort resulted in a consensus network model of yeast metabolism, combining results from previous models (available from http://www.comp-sys-bio.org/yeastnet). In all, the resulting consensus network consists of 1857 reactions (of which 1761 are metabolic) involving 2153 chemical species (of which 1168 are metabolites). Species in the model are annotated using both database-dependent (e.g. ChEBI) and database-independent (e.g. InChI) references, generating for the first time a representation that allows computational comparisons to be performed.
In flux balance analysis (FBA), we define an objective function Z, a linear combination of the fluxes v_j, that we maximize over all possible steady state fluxes (N v = 0, where N is the m × n stoichiometric matrix) satisfying certain constraints. In many genome-scale metabolic models, a biomass production reaction is defined explicitly and may be taken as a natural form for the objective function. The metabolic reconstruction used here lacks such a sink for metabolism. We therefore add a pseudo-reaction representing cellular growth (sometimes referred to as "biomass production"). The biomass composition used here is taken from the iND750 model.
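For reference, the optimisation described above can be written in the standard flux balance form. This is a sketch in assumed notation: the weight vector f and the bound vectors v^min and v^max are symbols chosen here for illustration and are not taken from the paper.

```latex
\begin{aligned}
\max_{v}\quad & Z = f^{\mathsf T} v = \sum_{j=1}^{n} f_j v_j \\
\text{subject to}\quad & N v = 0, \\
& v^{\min} \le v \le v^{\max}.
\end{aligned}
```

When Z is simply the flux through the growth pseudo-reaction, f reduces to the standard basis vector that selects that reaction.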
In a previous paper, we defined a method for the generation of kinetic models of cellular metabolism, based solely on the knowledge of reaction stoichiometries. This modelling framework requires little experimental data regarding variables and no knowledge of the underlying mechanisms for each enzyme; nonetheless it allows inference of the dynamics of cellular metabolite concentrations. The fluxes found through FBA are allowed to vary dynamically. To create a kinetic model (of minimal complexity), four sets of information are required:
Network stoichiometry (N).
Reference fluxes (v*) through the network.
Reference metabolite concentrations (x*).
Elasticities (ε) -- changes in reaction rates with effector levels.
Selected reaction fluxes used in the model:
- alcohol dehydrogenase, reverse rxn (acetaldehyde → ethanol)
- glycerol-3-phosphate dehydrogenase (NAD)
- glucose transport (uniport)
- glycerol transport via channel
- alpha, alpha-trehalose-phosphate synthase (UDP-forming)
where BM denotes the subset of j that includes all the reactions with fluxes defined in BioModels. A unique reference flux (see Additional file 2) is chosen from the space of all solutions to the above problem, by finding the box that defines the maximum and minimum values attainable by each v_j, then choosing a flux as close as possible to the centre of the box. Iterating, the method minimizes and centres the flux through the network and, in this case, fixes all 956 fluxes to unique values. The algorithm that produces the unique solution from the available flux space is described briefly below.
A simple FBA formulation is solved, in order to identify the maximum achievable growth rate, Z*. For the first iteration, we minimize the total flux required to achieve Z*. This assumption (i.e. that the cell minimizes its total flux) may be posed as an LP problem by decomposing fluxes v_j into their positive and negative parts. The solution of this first iteration provides the minimal total flux through the network (Z1). We then find the bounds on each reaction flux, subject to the new constraint that the total flux through the network cannot be larger than Z1. The bounds are calculated by solving, for each reaction in turn, one optimisation problem that maximizes its flux and one that minimizes it. These limits are set as the new upper and lower bounds for the fluxes. The "centre" for each flux is the mean of the new bounds, taken as the most representative value of all solutions.
In the second iteration, we place a box around the hull (defining new bounds), before minimizing the distance between the flux of each reaction and the centre value, subject to the constraint that the total network flux cannot exceed Z1, as found in the first iteration. In turn, this leads to new bounds and a corresponding centre. Each iteration of the algorithm adds an additional constraint, and the flux is drawn towards the centre of the bounds. After a finite number of iterations, the bounds converge to a single solution, within a specified tolerance.
The algorithm is explained in detail in a previous paper, which described a method for finding a unique solution within the space of all possible flux distributions in FBA. In that paper, the algorithm is used on four recent genome-scale metabolic reconstructions. Using an iteration of linear programs, unique flux solutions are found in the available flux space for each organism.
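The first iteration described above can be illustrated on a small example. The sketch below is not the authors' implementation: the five-reaction network, the bounds and the names (`f`, `split`, `z_star`, `z1`) are hypothetical, and SciPy's `linprog` with the HiGHS backend is assumed as the LP solver. It solves plain FBA, minimizes total flux by splitting each flux into positive and negative parts, and then computes the bounding box and its centre.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (not from the paper): metabolites A, B, C.
# Columns: uptake (-> A), A -> B, A -> C, C -> B, growth (B ->).
N = np.array([
    [1.0, -1.0, -1.0,  0.0,  0.0],   # A
    [0.0,  1.0,  0.0,  1.0, -1.0],   # B
    [0.0,  0.0,  1.0, -1.0,  0.0],   # C
])
m, n = N.shape
lb = np.zeros(n)                                   # assumed lower bounds
ub = np.array([10.0, 1000.0, 1000.0, 1000.0, 1000.0])  # assumed upper bounds
f = np.zeros(n)
f[4] = 1.0                                         # objective: growth flux

# Step 1: plain FBA. linprog minimizes, so negate the objective.
fba = linprog(-f, A_eq=N, b_eq=np.zeros(m),
              bounds=list(zip(lb, ub)), method="highs")
z_star = -fba.fun

# Step 2: minimize total flux at growth rate z_star.
# Variables y = [v_plus, v_minus] >= 0, with v = v_plus - v_minus.
split = np.hstack([np.eye(n), -np.eye(n)])         # maps y to v
A_eq = np.vstack([N @ split, f @ split])           # steady state, Z = Z*
b_eq = np.append(np.zeros(m), z_star)
A_ub = np.vstack([split, -split])                  # lb <= v <= ub
b_ub = np.concatenate([ub, -lb])
total = np.ones(2 * n)                             # sum of v_plus + v_minus
nonneg = [(0, None)] * (2 * n)
res = linprog(total, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=nonneg, method="highs")
z1 = res.fun
v_min_total = split @ res.x

# Step 3: bounding box of each flux given total flux <= z1, then its centre.
A_box = np.vstack([A_ub, total])
b_box = np.append(b_ub, z1 + 1e-9)                 # small slack for round-off
lower = np.empty(n)
upper = np.empty(n)
for j in range(n):
    e_j = split[j]                                 # picks out v_j from y
    lower[j] = linprog(e_j, A_ub=A_box, b_ub=b_box, A_eq=A_eq, b_eq=b_eq,
                       bounds=nonneg, method="highs").fun
    upper[j] = -linprog(-e_j, A_ub=A_box, b_ub=b_box, A_eq=A_eq, b_eq=b_eq,
                        bounds=nonneg, method="highs").fun
centre = 0.5 * (lower + upper)
```

In this toy case the direct route A -> B carries all the flux, the detour through C is unused, and the box already collapses to a single point after one iteration. In a genome-scale network the box would generally remain wide, which is why further centring iterations are needed.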
Selected intracellular metabolite concentrations used in the model
2.75 × 10^-4
Nicotinamide adenine dinucleotide
Nicotinamide adenine dinucleotide - reduced
Extracellular metabolite concentrations used in the model
8.2 × 10^-5
5.3 × 10^-4
where c denotes the compartment volumes. The benefit of this approximation lies in the existence of analytic forms for steady states and their stability matrix, thus avoiding computational problems associated with models of this size. In a recent investigation, the linlog approximation was found to perform better than its alternatives (linear, power laws, generic and convenience) at describing E. coli sugar metabolism.
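As an illustration of the rate law, the sketch below evaluates linlog kinetics in a commonly used form, v = v* (1 + ε ln(x/x*)), with enzyme levels held at their reference values. The one-metabolite pathway, the numerical values and the function name `linlog_rates` are hypothetical, and compartment volumes are taken as unity for simplicity.

```python
import numpy as np

def linlog_rates(x, x_ref, v_ref, eps):
    """Linlog rates v = v* (1 + eps ln(x / x*)), elementwise in v*.

    eps is the n x m matrix of scaled elasticities at the reference state.
    """
    x = np.asarray(x, dtype=float)
    return v_ref * (1.0 + eps @ np.log(x / x_ref))

# Hypothetical pathway: -> S -> with one internal metabolite S.
N = np.array([[1.0, -1.0]])          # m x n stoichiometric matrix
v_ref = np.array([2.0, 2.0])         # reference fluxes (steady state: N v* = 0)
x_ref = np.array([1.0e-3])           # reference concentration of S
eps = np.array([[-0.5],              # S inhibits its own production
                [1.0]])              # S drives its own consumption

v_at_ref = linlog_rates(x_ref, x_ref, v_ref, eps)
v_perturbed = linlog_rates(2.0 * x_ref, x_ref, v_ref, eps)
dxdt_perturbed = N @ v_perturbed     # rate of change of S after doubling it
```

At the reference state the rates equal the reference fluxes by construction. After doubling S, production falls and consumption rises, so the net rate is negative and the concentration relaxes back towards x*.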
To test the resultant genome-scale model, and to try to identify key steps in the metabolic network of yeast, we calculate the flux control coefficients for reactions, as defined by metabolic control analysis (MCA). MCA studies how the control of fluxes and intermediate concentrations in a metabolic pathway is distributed among the different enzymes that constitute the pathway. Developed independently by Kacser and Burns and Heinrich and Rapoport, the main theorems of MCA were given rigorous theoretical backing by Reder. Of particular interest is the connectivity theorem, which highlights the close relationship between the local properties of individual reactions (elasticities) and the global properties of the system (control coefficients).
Whilst Reder's formula is often used in computational applications, it assumes that a certain matrix is invertible; this may not be true, especially if some reference reaction rates are zero. For example, the number of independent metabolites is often defined solely in terms of stoichiometry as rank(N) (here = 616). However, once kinetics are taken into account, this number drops drastically to rank(N·diag(v*)·ε) = 205. Reder's method only holds if these two values are identical. Thus, in Methods, we derive again the main results of MCA without relying on such an assumption.
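The drop in rank can be reproduced on a small example. The sketch below uses a hypothetical network with one active and one inactive branch (all values are invented for illustration) and compares rank(N) with rank(N·diag(v*)·ε) using NumPy.

```python
import numpy as np

# Hypothetical network: two independent branches, -> S1 -> and -> S2 ->.
N = np.array([
    [1.0, -1.0, 0.0,  0.0],   # S1
    [0.0,  0.0, 1.0, -1.0],   # S2
])
# Reference fluxes: the second branch carries no flux.
v_ref = np.array([1.0, 1.0, 0.0, 0.0])
# Scaled elasticities (n x m): each reaction responds to its own metabolite.
eps = np.array([
    [-0.5,  0.0],
    [ 1.0,  0.0],
    [ 0.0, -0.5],
    [ 0.0,  1.0],
])

rank_structural = np.linalg.matrix_rank(N)
rank_kinetic = np.linalg.matrix_rank(N @ np.diag(v_ref) @ eps)
```

Here the stoichiometry alone suggests two independent metabolites, but the zero reference fluxes remove the second branch from the local dynamics, leaving a rank of one.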
Reactions exerting most control over glucose transport:
- glucose transport (uniport)
- glycerol-3-phosphate dehydrogenase (NAD)
- adenylate kinase (GTP)
Reactions exerting most control over biomass production:
- H2O transport via diffusion
- glycerol-3-phosphate dehydrogenase (NAD)
- adenylate kinase (GTP)
- glucose transport (uniport)
- ribonucleoside-triphosphate reductase (UTP)
The systems biology approach often involves the development of mechanistic models, such as the reconstruction of dynamic systems from the quantitative properties of their elementary building blocks. Typically, this is performed in a 'bottom-up' manner, whereby models are assembled piece by piece as their individual elements are determined experimentally. Here we propose an alternative, 'top-down' approach, whereby an approximative model of the whole system is built initially; this model can then be used to guide experimental design and can subsequently be updated as specific knowledge becomes available from experimental results, following the iterative 'cycle of knowledge' approach. At any point in this iteration, detailed kinetic rate laws can be included if they become available, in which case the approach becomes a hybrid of top-down and bottom-up.
The genome-scale model that is produced with the presented methodology is offered in SBML format, with MIRIAM-compliant annotations. Such markup allows automated reasoning about the model's assumptions and provenance. A variety of software programs (e.g. COPASI) have been designed to interface with SBML, but do not generally encounter models of this size. Indeed, the kinetic model produced here has over an order of magnitude more metabolites and reactions than any other kinetic model found in the BioModels repository. As the field develops, larger models will be built, and software programs will be required to interface with models of at least this size. Thus, this methodology also allows software testing and advancement. The presence of analytic solutions facilitates validation of new tools, and avoids the high demands on computational power that models of this size usually impose.
In this paper, we present a novel methodology that can be used to create a parameterized, genome-scale kinetic model of the metabolic network of an organism. The methodology is demonstrated by its application to yeast metabolism, through appending existing kinetic submodels from the BioModels database to a stoichiometric model of yeast. The final model has 956 metabolic reactions involving 820 metabolites and, to our knowledge, has significantly wider scope than any previous model of comparable type. We demonstrate the usefulness of such a model by applying the principles of metabolic control analysis to identify key steps within the network.
Critically, both the original stoichiometric model and the kinetic model that constitutes the end-result of the method are available in SBML, using MIRIAM-compliant annotations. Models in BioModels are annotated with computer-readable references such as ChEBI or InChI, which made it possible to curate the mapping to the stoichiometric model in a semi-automated manner. While fully-automated mapping of BioModels reactions to those in our stoichiometric model would be preferable, inconsistencies such as unbalanced reactions in either data resource prevent this at the current time. As systems biology is still a new and emerging field, it should be expected that discrepancies and other annotation issues will diminish considerably. This, combined with greater availability of kinetic models for reactions and pathways in model repositories such as BioModels in the future, would mean that our methodology could be used to provide an increasingly accurate and detailed genome-scale kinetic model for an organism, in an efficient and automated manner. Furthermore, the approach should benefit from expanding its scope in order to exploit other resources containing kinetic data, such as SABIO-RK and BRENDA.
Our methodology clearly has limitations, in that the linlog framework is only valid in a region near the chosen reference state. Moreover, owing to the lack of available information, many of the parameters used in building the model are unknown and must be estimated through techniques such as flux balance analysis. Nonetheless, our modelling framework is a necessary stepping stone towards the creation of a genome-scale kinetic model, and may thus be considered the first step in the deductive-inductive 'cycle of knowledge' crucial for systems biology. We have demonstrated that this first model can be used to pinpoint, through sensitivity analysis, reactions that have the most control over the network, or reactions for which small perturbations of the values of their kinetic parameters lead to significant changes in the predictions of the model. Subsequent experimental work, such as kinetic assays, may be used to improve the model's resolution. In the present case, this includes glucosamine-6-phosphate deaminase, glutamine-fructose-6-phosphate transaminase and glutamine synthetase. The model (see Additional file 1) is publicly available for download in SBML format from the BioModels database (BioModels ID: MODEL1001200000) and at http://www.mcisb.org/resources/genomescale/.
where ε' is the n × m unscaled elasticity matrix.
In general, rank(N ε') = m0 < m and the system defined above will display moiety conservations - certain metabolites can be expressed as linear combinations of other metabolites in the system. Note that the number of independent metabolites is not given simply by rank(N), as is generally (and erroneously) suggested; rather the local dynamics of the system must also be taken into account via the elasticity matrix. The conservations may be removed through matrix decomposition, using an m × m0 link matrix L that relates the complete vector of internal metabolites to the vector of independent metabolites. Writing A = N ε' and letting A_r denote an m0 × m matrix composed of linearly independent rows of A, the corresponding link matrix is defined as L = A·A_r^+, where '+' denotes the Moore-Penrose pseudoinverse; hence A = L·A_r.
where the m0 × m0 matrix (N_r·ε'·L) is invertible through introduction of the link matrix L.
respectively. If we compare our expressions to those given in Reder, we see that they are identical, save that in her case r' is defined as the independent rows of N, leading to L' = N·(N_r')^+. If r = r' (i.e. if rank(N ε') = rank(N)), then L = L' and the two results are equivalent.
As such, we may see that we have extended Reder's work to encompass the possibility that rank(N ε') < rank(N), as is the case for our model (rank(N ε') = 205, whilst rank(N) = 616). From Equations (10) and (11), one may trivially deduce the summation and connectivity theorems.
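The construction above can be checked numerically on a small example. The sketch below assumes that the unscaled control coefficients take the usual structural form, C^S' = -L (N_r ε' L)^-1 N_r and C^J' = I + ε' C^S', with the link matrix built from the independent rows of N ε' as described in the text. The toy network, the elasticity values and the helper names (`independent_rows`, `control_coefficients`) are hypothetical, and the greedy row selection is only intended for small matrices.

```python
import numpy as np

def independent_rows(A, tol=1e-10):
    """Greedily pick indices of linearly independent rows of A."""
    rows = []
    for i in range(A.shape[0]):
        if np.linalg.matrix_rank(A[rows + [i]], tol=tol) > len(rows):
            rows.append(i)
    return rows

def control_coefficients(N, eps_u):
    """Unscaled control coefficients with the link matrix taken from N eps'.

    N is the m x n stoichiometric matrix and eps_u the n x m unscaled
    elasticity matrix. Returns (C_S, C_J, L).
    """
    A = N @ eps_u
    r = independent_rows(A)
    L = A @ np.linalg.pinv(A[r])          # m x m0 link matrix, A = L A_r
    N_r = N[r]
    M = N_r @ eps_u @ L                   # m0 x m0, assumed invertible
    C_S = -L @ np.linalg.solve(M, N_r)    # concentration control coefficients
    C_J = np.eye(N.shape[1]) + eps_u @ C_S  # flux control coefficients
    return C_S, C_J, L

# Hypothetical chain -> S1 -> S2 -> in which no reaction responds to S2,
# so that rank(N eps') = 1 while rank(N) = 2.
N = np.array([[1.0, -1.0,  0.0],
              [0.0,  1.0, -1.0]])
eps_u = np.array([[-1.0, 0.0],
                  [ 2.0, 0.0],
                  [ 0.0, 0.0]])

C_S, C_J, L = control_coefficients(N, eps_u)
connectivity_residual = C_J @ eps_u @ L   # connectivity theorem: should vanish
summation_check = C_J @ np.ones(3)        # equal reference fluxes: rows sum to 1
```

In this example a formula based on the independent rows of N alone would require inverting a singular 2 × 2 matrix, whereas the version built from N ε' remains well defined. The two final quantities illustrate the connectivity and summation relations for the unscaled coefficients.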
subset of j: reactions with fluxes defined in BioModels
subset of i: all independent metabolites
m × m
m × 1
scaled flux control coefficients
n × n
C^J'
unscaled flux control coefficients
n × n
C^S'
unscaled concentration control coefficients
m × n
denotes the jth standard basis vector
n × 1
vector specifying the optimized fluxes
n × 1
m × n
m × m0
m × 1
reference metabolite concentrations
m × 1
independent metabolite concentrations
m0 × 1
n × 1
reference flux vector
n × 1
lower bounds vector
n × 1
upper bounds vector
n × 1
fluxes defined in the BioModels database
55 × 1
maximum achievable growth rate
minimal total flux through the network
m × n
unscaled elasticity matrix
n × m
We are grateful for the financial support of the BBSRC and EPSRC through grant BB/C008219/1 "The Manchester Centre for Integrative Systems Biology (MCISB)". We also thank Michael Howard for invaluable discussions, and our MCISB colleagues.
- Forster J, Famili I, Fu P, Palsson BO, Nielsen J: Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network. Genome Research. 2003, 13 (2): 244-253. doi:10.1101/gr.234503
- Duarte NC, Herrgard MJ, Palsson BO: Reconstruction and validation of Saccharomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic model. Genome Research. 2004, 14 (7): 1298-1309. doi:10.1101/gr.2250904
- Herrgard MJ, Swainston N, Dobson P, Dunn WB, Arga KY, Arvas M, Bluthgen N, Borger S, Costenoble R, Heinemann M, et al.: A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology. Nature Biotechnology. 2008, 26 (10): 1155-1160. doi:10.1038/nbt1492
- Covert MW, Famili I, Palsson BO: Identifying constraints that govern cell behavior: A key to converting conceptual to computational models in biology?. Biotechnology and Bioengineering. 2003, 84 (7): 763-772. doi:10.1002/bit.10849
- Smallbone K, Simeonidis E, Broomhead DS, Kell DB: Something from nothing - bridging the gap between constraint-based and kinetic modelling. FEBS Journal. 2007, 274 (21): 5576-5585. doi:10.1111/j.1742-4658.2007.06076.x
- Le Novere N, Bornstein B, Broicher A, Courtot M, Donizelli M, Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, et al.: BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. Nucleic Acids Research. 2006, 34: D689-D691. doi:10.1093/nar/gkj092
- Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A, et al.: The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003, 19 (4): 524-531. doi:10.1093/bioinformatics/btg015
- Le Novere N, Finney A, Hucka M, Bhalla US, Campagne F, Collado-Vides J, Crampin EJ, Halstead M, Klipp E, Mendes P, et al.: Minimum information requested in the annotation of biochemical models (MIRIAM). Nature Biotechnology. 2005, 23 (12): 1509-1515. doi:10.1038/nbt1156
- Brooksbank C, Cameron G, Thornton J: The European Bioinformatics Institute's data resources: towards systems biology. Nucleic Acids Research. 2005, 33: D46-D53. doi:10.1093/nar/gki026
- Coles SJ, Day NE, Murray-Rust P, Rzepa HS, Zhang Y: Enhancement of the chemical semantic web through the use of InChI identifiers. Organic & Biomolecular Chemistry. 2005, 3 (10): 1832-1834. doi:10.1039/b502828k
- Kauffman KJ, Prakash P, Edwards JS: Advances in flux balance analysis. Current Opinion in Biotechnology. 2003, 14 (5): 491-496. doi:10.1016/j.copbio.2003.08.001
- Price ND, Reed JL, Palsson BO: Genome-scale models of microbial cells: Evaluating the consequences of constraints. Nature Reviews Microbiology. 2004, 2 (11): 886-897. doi:10.1038/nrmicro1023
- Visser D, Heijnen JJ: Dynamic simulation and metabolic re-design of a branched pathway using linlog kinetics. Metabolic Engineering. 2003, 5 (3): 164-176. doi:10.1016/S1096-7176(03)00025-9
- Smallbone K, Simeonidis E: Flux balance analysis: A geometric perspective. Journal of Theoretical Biology. 2009, 258 (2): 311-315. doi:10.1016/j.jtbi.2009.01.027
- Holzhütter H-G: The principle of flux minimization and its application to estimate stationary fluxes in metabolic networks. European Journal of Biochemistry. 2004, 271 (14): 2905-2922. doi:10.1111/j.1432-1033.2004.04213.x
- Allen J, Davey HM, Broadhurst D, Heald JK, Rowland JJ, Oliver SG, Kell DB: High-throughput classification of yeast mutants for functional genomics using metabolic footprinting. Nature Biotechnology. 2003, 21 (6): 692-696. doi:10.1038/nbt823
- Visser D, van der Heijden R, Mauch K, Reuss M, Heijnen S: Tendency modeling: a new approach to obtain simplified kinetic models of metabolism applied to Saccharomyces cerevisiae. Metab Eng. 2000, 2 (3): 252-275. doi:10.1006/mben.2000.0150
- Takahashi K, Yugi K, Hashimoto K, Yamada Y, Pickett CJF, Tomita M: Computational challenges in cell simulation: a software engineering approach. Intelligent Systems, IEEE. 2002, 17 (5): 64-71. doi:10.1109/MIS.2002.1039834
- Hadlich F, Noack S, Wiechert W: Translating biochemical network models between different kinetic formats. Metabolic Engineering. 2009, 11 (2): 87-100. doi:10.1016/j.ymben.2008.10.002
- Kacser H, Burns JA: The control of flux. Symp Soc Exp Biol. 1973, 27: 65-104.
- Heinrich R, Rapoport TA: Linear Steady-State Treatment of Enzymatic Chains - General Properties, Control and Effector Strength. European Journal of Biochemistry. 1974, 42 (1): 89-95. doi:10.1111/j.1432-1033.1974.tb03318.x
- Reder C: Metabolic control theory: a structural approach. J Theor Biol. 1988, 135 (2): 175-201. doi:10.1016/S0022-5193(88)80073-0
- Kell DB, Oliver SG: Here is the evidence, now what is the hypothesis? The complementary roles of inductive and hypothesis-driven science in the post-genomic era. Bioessays. 2004, 26 (1): 99-105. doi:10.1002/bies.10385
- Kell DB, Mendes P: The markup is the model: Reasoning about systems biology models in the Semantic Web era. Journal of Theoretical Biology. 2008, 252 (3): 538-543. doi:10.1016/j.jtbi.2007.10.023
- Hoops S, Sahle S, Gauges R, Lee C, Pahle J, Simus N, Singhal M, Xu L, Mendes P, Kummer U: COPASI-A COmplex PAthway SImulator. Bioinformatics. 2006, 22 (24): 3067-3074. doi:10.1093/bioinformatics/btl485
- Wittig U, Golebiewski M, Kania R, Krebs O, Mir S, Weidemann A, Anstein S, Saric J, Rojas I: SABIO-RK: Integration and curation of reaction kinetics data. Data Integration in the Life Sciences, Proceedings. 2006, 4075: 94-103.
- Schomburg I, Chang A, Schomburg D: BRENDA, enzyme data and metabolic information. Nucleic Acids Research. 2002, 30 (1): 47-49. doi:10.1093/nar/30.1.47
- Sauro HM, Ingalls B: Conservation analysis in biochemical networks: computational issues for software writers. Biophysical Chemistry. 2004, 109 (1): 1-15. doi:10.1016/j.bpc.2003.08.009
- Penrose R: A generalized inverse for matrices. Mathematical Proceedings of the Cambridge Philosophical Society. 1955, 51 (03): 406-413. doi:10.1017/S0305004100030401