Further developments towards a genome-scale metabolic model of yeast
- Paul D Dobson†1,
- Kieran Smallbone†2, 3,
- Daniel Jameson2, 4,
- Evangelos Simeonidis2, 5,
- Karin Lanthaler1, 2,
- Pınar Pir6,
- Chuan Lu7,
- Neil Swainston2, 4,
- Warwick B Dunn1, 2,
- Paul Fisher4,
- Duncan Hull1,
- Marie Brown1,
- Olusegun Oshota2, 5, 8,
- Natalie J Stanford2, 5, 8,
- Douglas B Kell1,
- Ross D King7,
- Stephen G Oliver6,
- Robert D Stevens4 and
- Pedro Mendes2, 4, 9
© Dobson et al; licensee BioMed Central Ltd. 2010
Received: 2 June 2010
Accepted: 28 October 2010
Published: 28 October 2010
To date, several genome-scale network reconstructions have been used to describe the metabolism of the yeast Saccharomyces cerevisiae, each differing in scope and content. The recent community-driven reconstruction, while rigorously evidenced and well annotated, under-represented metabolite transport, lipid metabolism and other pathways, and was not amenable to constraint-based analyses because of lack of pathway connectivity.
We have expanded the yeast network reconstruction to incorporate many new reactions from the literature and represented these in a well-annotated and standards-compliant manner. The new reconstruction comprises 1102 unique metabolic reactions involving 924 unique metabolites - significantly larger in scope than any previous reconstruction. The representation of lipid metabolism in particular has improved, with 234 out of 268 enzymes linked to lipid metabolism now present in at least one reaction. Connectivity is emphatically improved, with more than 90% of metabolites now reachable from the growth medium constituents. The present updates allow constraint-based analyses to be performed; viability predictions of single knockouts are comparable to results from in vivo experiments and to those of previous reconstructions.
We report the development of the most complete reconstruction of yeast metabolism to date that is based upon reliable literature evidence and richly annotated according to MIRIAM standards. The reconstruction is available in the Systems Biology Markup Language (SBML) and via a publicly accessible database http://www.comp-sys-bio.org/yeastnet/.
A central goal of integrative systems biology is the accurate representation of molecular interaction networks. Ultimately, such networks can be used to underpin mathematical models, consisting of stochastic or ordinary differential equations that permit the simulation of biological behaviour. The first step in generating such models is constructing a network of biochemical reactions and interactions between molecular components of the system to form a qualitative (unparameterised) model. Several groups have reconstructed the metabolic network of baker's yeast from genomic and literature data [1–3]. Variation in the approaches used, and contradictory interpretations of the available literature, mean that most reconstructions differ considerably. To resolve these problems, a cohort of the yeast systems biology community collaborated to create a consensus reconstruction. In April 2007, a large focused meeting brought together experts from various groups and disciplines in order to resolve discrepancies between the various reactions and metabolites described by other available reconstructions and form a consensus. The resultant reconstruction, subsequently referred to as "Yeast 1.0", removed the ambiguities inherent in its predecessors through the use of principled and computer-readable annotations. Whilst previous reconstructions had defined entities using subjective names, which lacked precision and resulted in ambiguities, Yeast 1.0 directly referenced chemical and protein descriptions to persistent databases or used standardised, database-independent, computer-readable representations. This removed the ambiguities and allowed the new reconstruction to be used effectively as the basis for automated analyses.
A limitation of Yeast 1.0 came about through the very generation of the consensus; the network became considerably fragmented as reactions that could not be readily annotated (due to the presence of structural ambiguities) were removed. This led to underrepresentation of a number of pathways, particularly those involved in lipid biosynthesis. Since Yeast 1.0, many improvements have been made to the reconstruction. The latest release, described here, is considerably larger (in terms of numbers of metabolites and reactions), of higher quality (by reference to literature evidence), exhibits greater coverage of known metabolic enzymes, and is better connected than all previous efforts.
Results and Discussion
The correct and complete representation of lipid metabolism is important, not only to meet the ultimate goal of genome-scale coverage, but also because understanding and engineering lipid metabolism through systems and synthetic biology is likely to play a major role in the replacement of fossil energy sources and chemical feedstocks with biofuels and bioplastics. In Yeast 1.0, lipid metabolism was poorly captured. To move towards a better representation, the literature, database annotations and homology relationships were used to identify the set of lipid-related yeast enzymes. Homology with mouse and human enzymes reported in LipidMaps, and with enzymes from all organisms reported in KEGG lipid pathways, was used to identify candidate lipid enzymes in yeast (homology relationships predefined by Ensembl). Further enzymes were added to the set manually by examination of SGD and Ensembl annotations. A total of 268 yeast enzymes were identified as likely to be part of lipid metabolism. Although the boundaries of this set are unavoidably subjective, it appears to capture the majority of lipid-related genes in yeast.
The 34 remaining lipid enzymes from the set (in Figure 3 these are the 31 not found in any reconstruction, plus three found in both iMM904 and iIN800) are either too poorly characterised functionally to be included or cannot be represented within the current description of the cell's compartmentalisation. Flippases, for example, require a more detailed description of membrane faces to capture their role in membrane asymmetry. Improving compartmental representation will be a goal for future releases.
Our approach towards structural improvement is also an example of the iterative "cycle of knowledge" approach, where the model is first used to guide biological research and can subsequently be updated and improved as specific new knowledge becomes available. In this case the iteration consisted of discovery and collation of experimental evidence previously obtained but which had never been identified in this context. Such discovery of knowledge was informed by the previous models and was unlikely to have happened in their absence.
New reconstructions are often validated through constraint-based approaches like Flux Balance Analysis (FBA) to assess their ability to predict experimental results. While there is clear utility in deploying such methods to explore biochemical capacity, using improved agreement with experimental observations to determine whether the reconstruction is, in some sense, 'better' than previous efforts is potentially misleading. In the current release, non-inferred reactions are supported by evidence from the literature and it is in this sense that the reconstruction is validated and improved. That said, the updates improved the connectivity considerably and, together with the inclusion of a reaction describing biomass composition, now allow FBA to be performed. The availability of the model in SBML means that it is accessible through many generic and systems-biology-specific software packages, including the COBRA (COnstraint-Based Reconstruction and Analysis) toolbox.
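For readers less familiar with the method, FBA is conventionally posed as a linear programme. The statement below is the standard textbook formulation, not a description specific to this reconstruction; the symbols S, v, c, l and u are introduced here purely for illustration.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& l \le v \le u .
\end{aligned}
```

Here S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c selects the objective (typically the biomass reaction), and l and u are lower and upper flux bounds. Under this formulation, a single-gene knockout is usually simulated by setting both bounds to zero for every reaction that can no longer be catalysed, and the knockout is predicted lethal when the optimal biomass flux falls to zero (or below a chosen threshold).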
Gene knockout analysis
[Table: gene knockout predictions compared with experiment. Rows: number of genes; true positive (%); true negative (%); false positive (%); false negative (%). Values not recoverable.]
Closer inspection of predictions reveals that relatively subtle network variations often underlie prediction differences. Four experimentally lethal knockouts were not initially predicted as such by the new reconstruction, but are correctly predicted using iMM904. Three of these genes encode enzymes that are essential to riboflavin biosynthesis. The capacity of iMM904 to predict lethality correctly is due to its biomass definition including a small contribution from riboflavin, whereas this was not part of the initial iIN800 or current network's biomass definition. Subsequent addition of riboflavin to the (empirical) biomass description has resolved these differences. Note that this is not therefore a reflection of the quality of the underlying network but only of the empirical biomass estimation, which is itself dependent on the growth conditions.
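The dependence of a lethality prediction on the biomass definition can be illustrated with a deliberately small sketch. It is not FBA: it uses simple forward reachability from the medium as a qualitative stand-in, and the three-reaction network, the gene names (geneA, geneB, ribX) and the metabolite names are all hypothetical, chosen only to mirror the riboflavin case described above.

```python
# Toy illustration: a knockout is only predicted lethal if the biomass
# definition demands a product that the knockout removes.
# NOTE: this is reachability analysis, not FBA, and every name below
# (reactions, genes, metabolites) is hypothetical.

# reaction id -> (substrates, products, catalysing gene)
REACTIONS = {
    "make_precursor": ({"glucose"}, {"precursor"}, "geneA"),
    "make_protein": ({"precursor"}, {"protein"}, "geneB"),
    "make_riboflavin": ({"precursor"}, {"riboflavin"}, "ribX"),
}
MEDIUM = {"glucose"}


def reachable(reactions, medium, deleted_gene=None):
    """Return all metabolites producible from the medium.

    A reaction fires once all of its substrates are available; reactions
    catalysed by the deleted gene are skipped.
    """
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for substrates, products, gene in reactions.values():
            if gene == deleted_gene:
                continue
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def predicted_viable(biomass, deleted_gene=None):
    """Viable if every biomass component is still reachable."""
    return set(biomass) <= reachable(REACTIONS, MEDIUM, deleted_gene)


biomass_without_riboflavin = {"protein"}
biomass_with_riboflavin = {"protein", "riboflavin"}

print(predicted_viable(biomass_without_riboflavin, "ribX"))  # True
print(predicted_viable(biomass_with_riboflavin, "ribX"))     # False
```

The network is identical in both calls; only the biomass definition differs, which is the point made in the text about the riboflavin-related knockouts.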
In places, the added richness of the new reconstruction combines with certain known limitations to defeat total agreement with experiment. An example is seen by knocking out the acs2 gene, encoding acetyl-CoA synthetase (Acs2p). Experimentally, this knockout is lethal, yet in the current network the cytoplasmic reaction is also catalysed by Acs1p, consistent with experimental data. When the Acs2p-catalysed reaction is eliminated, flux simply re-routes through the Acs1p reaction. Importantly, it is only the fortuitous incompleteness of iMM904, which lacks the cytosolic Acs1 isozyme, that reveals the inviability of the acs2 knockout. The proper basis of the inviability of the acs2 mutant is that ACS1 is transcriptionally repressed in the high glucose conditions of viability experiments and so is unable to compensate for the loss of ACS2. Transcriptional control is not captured in the metabolic network and thus cannot be captured in metabolic reconstructions of this type.
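The isozyme effect can be made concrete with a small sketch of how a gene-reaction association is typically evaluated. The rule encoding (a list of alternative gene sets) and the function name are assumptions made for illustration, not the representation used in the reconstruction. Treating a repressed gene as simply "unavailable" is likewise a simplification introduced here; as noted above, the reconstruction itself carries no regulatory information.

```python
# Sketch of gene-reaction rule evaluation for the cytosolic acetyl-CoA
# synthetase step. The data layout is hypothetical: a rule is a list of
# alternative enzymes, each given as the set of genes it requires.

RULE_WITH_BOTH_ISOZYMES = [{"ACS1"}, {"ACS2"}]  # ACS1 OR ACS2
RULE_ACS2_ONLY = [{"ACS2"}]                     # as if the ACS1 isozyme were absent


def reaction_active(rule, unavailable_genes):
    """A reaction stays active if at least one alternative enzyme
    has none of its genes in the unavailable set."""
    return any(not (genes & unavailable_genes) for genes in rule)


# acs2 deletion with both isozymes annotated: flux can re-route via Acs1p,
# so the reaction is retained (the false positive discussed in the text).
print(reaction_active(RULE_WITH_BOTH_ISOZYMES, {"ACS2"}))          # True

# Same deletion when only ACS2 is annotated: the reaction is lost.
print(reaction_active(RULE_ACS2_ONLY, {"ACS2"}))                   # False

# Assumption for illustration: marking ACS1 as unavailable mimics its
# repression on high glucose, and the reaction is again lost.
print(reaction_active(RULE_WITH_BOTH_ISOZYMES, {"ACS1", "ACS2"}))  # False
```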
Both these examples highlight the caution required when using approaches such as FBA to validate reconstructions. The added detail in the present network can naturally lead to an increase in false positive outcomes: in silico knockouts that are overcome by alternative routings in the network but are actually lethal in vivo. This is, however, tempered by a decrease in false negative outcomes (i.e. knockouts that appear lethal computationally but are viable in vivo, as presented in Table 3).
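The four outcome classes follow from the convention used above, in which a "positive" is a knockout predicted to be viable. A minimal sketch of the bookkeeping is given below; the gene identifiers and the viability calls are invented for illustration and are not data from this study.

```python
# Classify knockout predictions against experiment, using the convention
# in the text: positive = predicted viable, negative = predicted lethal.
# All gene names and outcomes below are made up for illustration.
from collections import Counter

CLASSES = ("true positive", "true negative", "false positive", "false negative")


def classify(predicted_viable, observed_viable):
    """Return the outcome class for one knockout."""
    if predicted_viable and observed_viable:
        return "true positive"
    if not predicted_viable and not observed_viable:
        return "true negative"
    if predicted_viable and not observed_viable:
        return "false positive"   # re-routed in silico, lethal in vivo
    return "false negative"       # lethal in silico, viable in vivo


def summarise(predictions, observations):
    """Percentage of genes in each class, over genes present in both inputs."""
    shared = [g for g in predictions if g in observations]
    if not shared:
        raise ValueError("no genes in common between predictions and observations")
    counts = Counter(classify(predictions[g], observations[g]) for g in shared)
    return {name: 100.0 * counts[name] / len(shared) for name in CLASSES}


predictions = {"g1": True, "g2": True, "g3": False, "g4": True, "g5": False}
observations = {"g1": True, "g2": True, "g3": False, "g4": False, "g5": True}

print(summarise(predictions, observations))
```

With these toy inputs, two of the five genes are true positives and the other three classes contain one gene each.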
Despite the much-increased coverage of the current reconstruction, 451 genes probably encode metabolic enzymes that still have no associated reaction (Additional file 2). For the majority of these, very little is known about their function and further characterisation is required. From the viewpoint of furthering systems biology reconstruction efforts, these enzymes are important targets for reductionist molecular biology studies, including, for instance, systematic analyses using the Robot Scientist approach. Their listing here is a motivation for further iterations on the cycle of knowledge.
The development of high quality, well annotated, genome-scale metabolic networks is an ambitious, challenging, but necessary step towards the realisation of integrative systems biology. While networks predicted through bioinformatics approaches are useful, particularly for the extension of systems biology approaches to less well-studied organisms, reconstructions built upon solid biochemical evidence provide a gold standard upon which predictions can be reliably based. For metabolic reconstructions, where the goal is to capture maximally our current understanding of metabolism, these problems are primarily of data integration and quality. It has proven essential to involve the extended systems biology and yeast communities in this process, both to establish the mechanisms and structures for acquiring and representing information, and also to tap into expert knowledge from the various sub-disciplines of biology and biochemistry. In the recent very large-scale reconstruction of the yeast molecular interaction network by Aho et al., genomic, transcriptomic, proteomic and metabolomic data were integrated. These authors note that incorporating the higher quality data of Yeast 1.0 (and, by extension, of the present contribution) would considerably improve their reconstruction over the metabolic information extracted from KEGG, and also that standards compliance is essential to this integration task.
Yeast 1.0 set standards and amalgamated existing networks, enhancing annotation and removing less reliable data. In this latest reconstruction, we have made significant headway on the process of filling gaps in the network. There is still some way to go before realising the goal of at least one reaction for each putative metabolic enzyme and, if one also considers enzyme promiscuity [37, 38], even this will represent an incomplete picture of metabolism. This latest reconstruction is a considerable improvement on previous releases, particularly in describing lipid metabolism and addressing gaps in the original reconstruction that hindered modelling efforts. Information from other reconstructions since Yeast 1.0 has been incorporated, although not indiscriminately, and very many reactions not found in other reconstructions have been garnered from the literature. It is considerably larger than all previous efforts, while maintaining compliance with community-defined standards.
While Yeast 1.0 represented a major advance, particularly through the definition of standards and by the involvement of the wider yeast community, a major flaw was that it was not amenable to constraint-based analysis. The current reconstruction rectifies this, mostly by filling in gaps but also by inclusion of an appropriately annotated "biomass" reaction, without compromising the strict evidence requirements of its predecessor. When compared to experimental knockout data, this reconstruction did not identify certain lethal knockouts that other yeast reconstructions correctly predicted, but proves better than them in recognising viable deletions. This is a direct result of the richness of the model; as with the example of the acetyl-CoA synthetases (above), the addition of isoenzymes for specific reactions, absent from earlier reconstructions, can reduce the predictive power of the model. Nonetheless, such enzymes are included because they are supported by the literature. This reconstruction continues the shift in focus, started with the consensus model Yeast 1.0, toward realistic representation and evidence-based selection of reactions, rather than creating a reconstruction with simulation in mind. Reactions with a lower level of confidence (e.g. biomass definition) are characterised with specialised evidence codes and SBO terms, allowing the easy extraction of subsets of the network from the SBML code for specific purposes.
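As a rough illustration of how such subsets might be pulled out, the sketch below filters reactions by their sboTerm attribute using only the Python standard library. The inline SBML fragment, the reaction identifiers, the SBML Level 2 Version 4 namespace and the two SBO identifiers are all assumptions made for the example; they are placeholders and do not reproduce the terms or structure of the released model.

```python
# Sketch: select reactions from an SBML document by SBO term.
# The fragment below is a hypothetical stand-in for a real model file;
# the namespace (SBML Level 2 Version 4) and the SBO identifiers are
# assumptions chosen for illustration only.
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"

EXAMPLE_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfReactions>
      <reaction id="r_0001" sboTerm="SBO:0000176"/>
      <reaction id="r_0002" sboTerm="SBO:0000176"/>
      <reaction id="r_biomass" sboTerm="SBO:0000397"/>
    </listOfReactions>
  </model>
</sbml>"""


def reaction_ids_by_sbo(sbml_text, sbo_term):
    """Return ids of reactions whose sboTerm attribute equals sbo_term,
    in document order."""
    root = ET.fromstring(sbml_text)
    reaction_tag = "{%s}reaction" % SBML_NS
    return [
        reaction.get("id")
        for reaction in root.iter(reaction_tag)
        if reaction.get("sboTerm") == sbo_term
    ]


print(reaction_ids_by_sbo(EXAMPLE_SBML, "SBO:0000176"))  # ['r_0001', 'r_0002']
print(reaction_ids_by_sbo(EXAMPLE_SBML, "SBO:0000397"))  # ['r_biomass']
```

A dedicated SBML library would be the more robust choice for real models; plain XML parsing is used here only to keep the example free of dependencies.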
To facilitate further improvements, we encourage the community to provide information and/or corrections to the current release. We have set up a dedicated point-of-contact to this end. We also highlight gaps in the network that cannot be resolved from current literature, as well as the little-studied enzymes for which we have not yet identified any function (see Additional file 2). These represent potentially important research opportunities for the community and we welcome efforts towards an improved understanding of their functions.
The Manchester groups thank the UK Biotechnology and Biological Sciences Research Council (BBSRC) and the Engineering and Physical Sciences Research Council (EPSRC) for financial support (grants BB/C008219/1 and BB/F006012/1). The Cambridge group acknowledges BBSRC grant BB/C505140/2. The Manchester, Aberystwyth and Cambridge groups all acknowledge support from the European Union FP7 project UNICELLSYS (Grant agreement no.: 201142) and from SysMO (MOSES). We thank Mike Hucka for advice on formatting SBML annotations, Rasmus Ågren for providing the iIN800 reconstruction and Steve Turner for help with ChEBI submissions. This is a contribution from the Manchester Centre for Integrative Systems Biology and the Cambridge Systems Biology Centre.
1. Förster J, Famili I, Fu P, Palsson BØ, Nielsen J: Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network. Genome Research. 2003, 13 (2): 244-253. doi:10.1101/gr.234503
2. Duarte NC, Herrgård MJ, Palsson BØ: Reconstruction and validation of Saccharomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic model. Genome Research. 2004, 14 (7): 1298-1309. doi:10.1101/gr.2250904
3. Kuepfer L, Sauer U, Blank LM: Metabolic functions of duplicate genes in Saccharomyces cerevisiae. Genome Research. 2005, 15 (10): 1421-1430. doi:10.1101/gr.3992505
4. Herrgård MJ, Swainston N, Dobson P, Dunn WB, Arga KY, Arvas M, Blüthgen N, Borger S, Costenoble R, Heinemann M, et al.: A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology. Nature Biotechnology. 2008, 26 (10): 1155-1160. doi:10.1038/nbt1492
5. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A, et al.: The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003, 19 (4): 524-531. doi:10.1093/bioinformatics/btg015
6. Wang XS, Gorlitsky R, Almeida JS: From XML to RDF: how semantic web technologies will change the design of 'omic' standards. Nature Biotechnology. 2005, 23 (9): 1099-1103. doi:10.1038/nbt1139
7. Kell DB, Mendes P: The markup is the model: reasoning about systems biology models in the Semantic Web era. Journal of Theoretical Biology. 2008, 252 (3): 538-543. doi:10.1016/j.jtbi.2007.10.023
8. Le Novere N, Finney A, Hucka M, Bhalla US, Campagne F, Collado-Vides J, Crampin EJ, Halstead M, Klipp E, Mendes P, et al.: Minimum information requested in the annotation of biochemical models (MIRIAM). Nature Biotechnology. 2005, 23 (12): 1509-1515. doi:10.1038/nbt1156
9. Laibe C, Le Novere N: MIRIAM resources: tools to generate and resolve robust cross-references in Systems Biology. BMC Systems Biology. 2007, 1: 58. doi:10.1186/1752-0509-1-58
10. Apweiler R, Martin MJ, O'Donovan C, Magrane M, Alam-Faruque Y, Antunes R, Barrell D, Bely B, Bingley M, Binns D, et al.: The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Research. 2010, 38: D142-D148. doi:10.1093/nar/gkp846
11. Weng S, Dong Q, Balakrishnan R, Christie K, Costanzo M, Dolinski K, Dwight SS, Engel S, Fisk DG, Hong E, et al.: Saccharomyces Genome Database (SGD) provides biochemical and structural information for budding yeast proteins. Nucleic Acids Research. 2003, 31 (1): 216-218. doi:10.1093/nar/gkg054
12. PubMed. http://www.ncbi.nlm.nih.gov/pubmed/
13. de Matos P, Alcantara R, Dekker A, Ennis M, Hastings J, Haug K, Spiteri I, Turner S, Steinbeck C: Chemical Entities of Biological Interest: An update. Nucleic Acids Research. 2009, 38: D249-254. doi:10.1093/nar/gkp886
14. YeastNet: A consensus reconstruction of yeast metabolism. http://www.comp-sys-bio.org/yeastnet/
15. B-Net: A schema for representing detailed biochemical knowledge. http://mendes.vbi.vt.edu/tiki-index.php?page=B-Net
16. Mo ML, Palsson BØ, Herrgård MJ: Connecting extracellular metabolomic measurements to intracellular flux states in yeast. BMC Systems Biology. 2009, 3: 37. doi:10.1186/1752-0509-3-37
17. Nookaew I, Jewett MC, Meechai A, Thammarongtham C, Laoteng K, Cheevadhanarak S, Nielsen J, Bhumiratana S: The genome-scale metabolic model iIN800 of Saccharomyces cerevisiae and its validation: a scaffold to query lipid metabolism. BMC Systems Biology. 2008, 2: 71. doi:10.1186/1752-0509-2-71
18. Heinisch JJ, Müller S, Schlüter E, Jacoby J, Rodicio R: Investigation of two yeast genes encoding putative isoenzymes of phosphoglycerate mutase. Yeast. 1998, 14 (3): 203-213. doi:10.1002/(SICI)1097-0061(199802)14:3<203::AID-YEA205>3.0.CO;2-8
19. Ratledge C, Cohen Z: Microbial and algal oils: Do they have a future for biodiesel or as commodity oils?. Lipid Technology. 2008, 20 (7): 155-160. doi:10.1002/lite.200800044
20. Fahy E, Sud M, Cotter D, Subramaniam S: LIPID MAPS online tools for lipid research. Nucleic Acids Research. 2007, 35: W606-612. doi:10.1093/nar/gkm324
21. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Research. 2006, 34: D354-D357. doi:10.1093/nar/gkj102
22. Hubbard TJP, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Clarke L, et al.: Ensembl 2009. Nucleic Acids Research. 2009, 37: D690-D697. doi:10.1093/nar/gkn828
23. Mahadevan R, Schilling CH: The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metabolic Engineering. 2003, 5 (4): 264-276. doi:10.1016/j.ymben.2003.09.002
24. Kell DB, Oliver SG: Here is the evidence, now what is the hypothesis? The complementary roles of inductive and hypothesis-driven science in the post-genomic era. Bioessays. 2004, 26 (1): 99-105. doi:10.1002/bies.10385
25. Kauffman KJ, Prakash P, Edwards JS: Advances in flux balance analysis. Current Opinion in Biotechnology. 2003, 14 (5): 491-496. doi:10.1016/j.copbio.2003.08.001
26. Becker SA, Feist AM, Mo ML, Hannum G, Palsson BØ, Herrgård MJ: Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nature Protocols. 2007, 2 (3): 727-738. doi:10.1038/nprot.2007.99
27. Le Novère N, Courtot M, Laibe C: Adding semantics in kinetics models of biochemical pathways. Proceedings of the 2nd International Symposium on experimental standard conditions of enzyme characterizations: 2006. 2006, 137-153. Rüdesheim, Germany: Beilstein Institut
28. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al.: Gene Ontology: tool for the unification of biology. Nature Genetics. 2000, 25 (1): 25-29. doi:10.1038/75556
29. Bornstein BJ, Keating SM, Jouraku A, Hucka M: LibSBML: An API library for SBML. Bioinformatics. 2008, 24 (6): 880-881. doi:10.1093/bioinformatics/btn051
30. Makhorin A: GNU Linear Programming Kit. 2001, Moscow: Moscow Aviation Institute
31. Giaever G, Chu AM, Ni L, Connelly C, Riles L, Véronneau S, Dow S, Lucau-Danila A, Anderson K, André B, et al.: Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002, 418 (6896): 387-391. doi:10.1038/nature00935
32. Snitkin ES, Dudley AM, Janse DM, Wong K, Church GM, Segrè D: Model-driven analysis of experimentally determined growth phenotypes for 465 yeast gene deletion mutants under 16 different conditions. Genome Biology. 2008, 9 (9): R140. doi:10.1186/gb-2008-9-9-r140
33. SGD project: ACS1/YAL054C. http://www.yeastgenome.org/cgi-bin/locus.fpl?dbid=S000000050
34. van den Berg MA, de Jong-Gubbels P, Kortland CJ, van Dijken JP, Pronk JT, Steensma HY: The two acetyl-coenzyme A synthetases of Saccharomyces cerevisiae differ with respect to kinetic properties and transcriptional regulation. Journal of Biological Chemistry. 1996, 271 (46): 28953-28959. doi:10.1074/jbc.271.46.28953
35. King RD, Rowland J, Oliver SG, Young M, Aubrey W, Byrne E, Liakata M, Markham M, Pir P, Soldatova LN, et al.: The Automation of Science. Science. 2009, 324 (5923): 85-89. doi:10.1126/science.1165620
36. Aho T, Almusa H, Matilainen J, Larjo A, Ruusuvuori P, Aho KL, Wilhelm T, Lähdesmäki H, Beyer A, Harju M: Reconstruction and validation of RefRec: a global model for the yeast molecular interaction network. PLoS ONE. 5 (5): e10662
37. Hult K, Berglund P: Enzyme promiscuity: mechanism and applications. Trends in Biotechnology. 2007, 25 (5): 231-238. doi:10.1016/j.tibtech.2007.03.002
38. Nobeli I, Favia AD, Thornton JM: Protein promiscuity and its implications for biotechnology. Nature Biotechnology. 2009, 27 (2): 157-167. doi:10.1038/nbt1519
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.