SSER: Species specific essential reactions database
- Abraham A. Labena†1, 2, 3,
- Yuan-Nong Ye†4,
- Chuan Dong1, 2,
- Fa-Z Zhang1, 2 and
- Feng-Biao Guo1, 2, 5
© The Author(s). 2017
Received: 23 November 2016
Accepted: 13 April 2017
Published: 19 April 2017
Essential reactions are vital components of cellular networks. They are the foundations of synthetic biology and are potential candidate targets for antimetabolic drug design. In particular, if a single reaction is catalyzed by multiple enzymes, inhibiting the reaction would be a better option than targeting the enzymes or the corresponding enzyme-encoding genes. Existing databases such as BRENDA, BiGG, KEGG, Bio-models, Biosilico, and many others offer useful and comprehensive information on biochemical reactions, but none of them focuses specifically on essential reactions. Therefore, building a centralized repository for this class of reactions would be of great value.
Here, we present a species-specific essential reactions database (SSER). The current version comprises essential biochemical and transport reactions of twenty-six organisms, which were identified via flux balance analysis (FBA) combined with manual curation on experimentally validated metabolic network models. The main contents of the database are quantitative data on the number of essential reactions, the number of essential reactions associated with their respective enzyme-encoding genes, and the essential reactions shared across organisms.
SSER is intended as a prime source of essential reaction data and related gene and metabolite information, and it can significantly facilitate the reconstruction and analysis of metabolic network models as well as drug target discovery studies. Users can browse, search, compare and download the essential reactions of organisms of interest through the website http://cefg.uestc.edu.cn/sser.
Keywords: SSER; Database; Essential reactions; Flux balance analysis (FBA); Metabolic networks
Despite their complexity, reconstructed metabolic networks are important tools for visualizing ‘omics’ data and for fostering the understanding and interpretation of these data in terms of biological functions. Reconstruction of such networks is time-intensive and requires extensive effort, taking several months to years depending on the genome size and the number of personnel involved. Although not all reactions in the network are equally indispensable, each reaction contributes in one way or another to the proper functioning of the organism's biological system. Consequently, reactions are classified as either essential or non-essential. Essential reactions are those that are vital for the viability of the organism under given living conditions. Some reactions are essential irrespective of the environment in which the organism is situated; such reactions have been identified for a model organism and termed “super-essential”.
Following whole-genome sequencing and biological systems modeling, the number of predictive metabolic network models has been growing significantly. Consequently, a large number of biological databases storing metabolic pathway information have been developed. Although these efforts have contributed greatly to the understanding of the systems biology of a considerable number of organisms, collecting reaction essentiality data in a centralized repository has received little attention. Existing databases such as KEGG (Kyoto Encyclopedia of Genes and Genomes), BiGG (Biochemical Genetic and Genomic, Systems Biology Research Group of the University of California, San Diego), BioCyc, MetaCyc, EcoCyc, Bio-models, the Model SEED, GSMN (Genome-Scale Models Database, Tianjin University), Biosilico and many others offer comprehensive information on biochemical reactions, but none of them focuses specifically on essential reactions. Therefore, building a centralized repository for this class of reactions would be of great value. Essential reactions are potential candidate targets for antimetabolic drug design [3, 13, 14]. In particular, if a single reaction is catalyzed by multiple enzymes, inhibiting the reaction would be a better option than targeting the enzymes themselves or the corresponding enzyme-coding genes. This was the key driving force for us to construct the species-specific essential reactions database (SSER).
The current version (version 1.0) of SSER includes essential biochemical and transport reactions of twenty-six organisms. The reactions were obtained by applying flux balance analysis (FBA) to experimentally validated metabolic network models under in-silico growth conditions, in combination with manual curation of each reaction. Besides storing biochemically essential reactions, SSER allows users to obtain information on the enzyme-coding genes, essential precursors, and products under defined in-silico growth conditions. The information in SSER can also play a significant role in biotechnology-based industries, as essential reactions can be used to increase production yields.
Construction and content
Data acquisition and source
Comprehensive, up-to-date and experimentally validated genome-scale metabolic network models were downloaded (Nov-Dec 2015) from publicly accessible model repositories, mainly BiGG and GSMN, and from authors’ publications (Additional file 1). That is, where multiple model versions exist for an organism, we selected the one that is the most up to date, contains comprehensive information and is experimentally validated. For instance, we chose iJO1366 because it was the most up-to-date model of Escherichia coli K-12 MG1655 at the time of model collection. Furthermore, iJO1366 represents a significant expansion of the E. coli reconstruction over iAF1260 and older versions, as it contains a greater number of genes, metabolic reactions and unique metabolites. These criteria were set only to limit the number of models considered in the first version of SSER; we plan to include more models and organisms in future versions. The degree of essentiality of a gene or reaction depends crucially on the growth environment, and hence each reaction in our database is supplemented with growth media information. This information was obtained by searching published articles reporting the experimentally validated reconstruction of each organism (see Additional file 2). To investigate the extent of association between essential reactions and essential genes, we downloaded the essential gene information of the two organisms selected for the case study, E. coli K-12 MG1655 and Bacillus subtilis 168, from the Database of Essential Genes (DEG, version 13.0). We chose these two microbes because they are the most studied and best characterized in terms of genome annotation, functional characterization, and knowledge of growth behavior [18–20]. See Additional file 3 for the whole workflow.
Computational approaches have recently become powerful alternatives to their experimental counterparts in reaction/gene essentiality analysis owing to their high sensitivity, speed, accuracy, and low cost. We took advantage of a constraint-based flux balance analysis (FBA) approach in conjunction with manual curation in constructing SSER, using the Constraint-Based Reconstruction and Analysis (COBRA 2.0) toolbox in the MATLAB environment. FBA is a powerful in-silico technique that has been widely used in genome-scale metabolic network reconstruction and analysis. A significant number of studies have also shown its capability to accurately predict cellular phenotypes from genotypes. For example, for the yeast reconstruction iND750, a comparison of 4,154 in-silico predicted growth phenotypes across multiple environmental conditions with two large-scale experimental deletion studies showed 83% agreement between the in-silico and the experimental results. As a second step towards constructing SSER, the models, as downloaded from their sources, were loaded into MATLAB (MathWorks® R2012b) with the COBRA 2.0 Toolbox [2, 21, 23], and a single reaction deletion simulation was then applied as described in the following section.
Flux balance analysis (FBA)
Flux balance analysis (FBA) is an established constraint-based approach that uses linear optimization to determine the steady-state reaction flux distribution in a metabolic network by maximizing an objective function [14, 24]. By definition, an essential reaction is a biochemical or transport reaction whose deletion abolishes or significantly decreases cellular growth [25, 26]. The essential reactions in a network model can be determined through single reaction deletion studies. In a single reaction deletion, the flux through the reaction to be removed is constrained to zero, which is equivalent to removing the reaction from the network or switching it off. The fate of each reaction under investigation can then be decided from the resulting value of the Biomass Objective Function (BOF) [2, 27, 28].
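The single reaction deletion described above can be written compactly as a pair of linear programs. The following is a generic sketch of the standard FBA formulation rather than a formula taken from the original methods. The symbols are notation introduced here for illustration: S is the stoichiometric matrix, v the flux vector, c the objective weights that select the biomass reaction, and v^min and v^max the flux bounds.

```latex
\begin{align}
\mu_{\mathrm{WT}} &= \max_{v}\; c^{\top} v
  \quad \text{s.t.}\quad S v = 0,\;\; v^{\min} \le v \le v^{\max}, \\
\mu_{j} &= \max_{v}\; c^{\top} v
  \quad \text{s.t.}\quad S v = 0,\;\; v^{\min} \le v \le v^{\max},\;\; v_{j} = 0, \\
\mathrm{grRatio}_{j} &= \frac{\mu_{j}}{\mu_{\mathrm{WT}}}.
\end{align}
```

Here the first program gives the wild-type growth rate, the second gives the growth rate after reaction j is switched off, and their ratio is the quantity compared against the essentiality threshold.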
The growth ratio of the mutant to the wild type, denoted “grRatio” in our database and on the “Browse” page of the website, was used to determine the essential reactions in each model. Different threshold values of the biomass production rate have been used for gene/reaction essentiality determination, ranging from 0 to 10% growth reduction of the mutant with respect to the wild type, depending on the substrate conditions and other imposed constraints [26, 29–31]. Yang and coworkers observed that computational gene essentiality predictions were consistent with experimental results when using biomass production ratio cutoffs of less than 1% and of 0, applied separately. They assumed that computationally zero growth corresponds to a biomass production of less than 1e−6, in order to eliminate computational noise. In another study, a 1% cutoff was used in synthetic reaction lethality analysis. In our work, a reaction is classified as essential if the growth ratio is less than 1%, and these reactions were extracted into a separate file for further curation. We reasoned that this strict cutoff reduces the risk of including false positives in our collection of essential reactions. A similar threshold was used in a case study validating the single gene deletion function of the COBRA toolbox, in which the maximum growth rate was defined to be greater than 99% in the yeast iND750 model (see Additional file 4).
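The thresholding step can be illustrated with a short Python sketch. It is not the authors' MATLAB code: the function names, the reaction IDs and the example growth values are hypothetical, and the 1e-6 noise floor is borrowed from the Yang and coworkers study mentioned above as an assumption rather than a documented SSER setting.

```python
ESSENTIAL_CUTOFF = 0.01   # grRatio below 1% marks a reaction as essential
NOISE_FLOOR = 1e-6        # assumption: growth below this is numerically zero


def growth_ratio(mutant_growth, wildtype_growth):
    """Return mutant/wild-type growth, treating failed or tiny solutions as zero."""
    if wildtype_growth <= NOISE_FLOOR:
        raise ValueError("wild-type model does not grow; grRatio is undefined")
    if mutant_growth is None or mutant_growth < NOISE_FLOOR:
        return 0.0
    return mutant_growth / wildtype_growth


def essential_reactions(deletion_growth, wildtype_growth):
    """Return the sorted IDs of reactions whose grRatio is below the cutoff."""
    return sorted(
        rxn
        for rxn, growth in deletion_growth.items()
        if growth_ratio(growth, wildtype_growth) < ESSENTIAL_CUTOFF
    )


# Hypothetical deletion results: reaction ID -> growth rate of the mutant.
# None stands for an infeasible optimization after the deletion.
example = {"RXN_A": 0.0, "RXN_B": 0.98, "RXN_C": 5e-7, "RXN_D": 0.005, "RXN_E": None}
print(essential_reactions(example, wildtype_growth=0.98))
```

In this toy input only RXN_B retains wild-type growth, so the other four reactions fall below the 1% cutoff and are reported as essential.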
Once the reactions that met the above criterion had been extracted, the next step was to unify the short names (abbreviations) of the reactions. Searching our database would be troublesome if the reactions were deposited as they appear in the models, because different researchers follow different nomenclatures for biochemical reactions in their reconstructions. Therefore, we reorganized and unified the names of the reactions identified as essential in each organism. This was achieved by searching the BiGG database for abbreviations, using the reaction names as query strings. A search does not always return a single value; some reaction names are associated with multiple abbreviations. In such cases, we chose the abbreviation whose pre-defined reaction parameters, such as metabolite type and compartment, match those of the query reaction. For example, searching the BiGG database for the reaction “2 succinyl 6 hydroxy 2 4 cyclohexadiene 1 carboxylate synthase” returned SHCHCS3, SHCHCS2, and 2S6HCCi. Among these results, 2S6HCCi exactly fit our search criteria and hence was taken as the short name for that reaction.
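The disambiguation rule, keeping the candidate whose metabolites and compartment match the query reaction, might be sketched as follows. The record layout, the placeholder abbreviations ABBR_1 to ABBR_3 and the metabolite names are all hypothetical; the curation itself was done manually and this is not the BiGG API.

```python
def pick_abbreviation(query, candidates):
    """Return the abbreviation of the single candidate whose metabolites and
    compartment both match the query reaction, or None if no unique match exists."""
    matches = [
        cand["abbreviation"]
        for cand in candidates
        if set(cand["metabolites"]) == set(query["metabolites"])
        and cand["compartment"] == query["compartment"]
    ]
    return matches[0] if len(matches) == 1 else None


# Hypothetical query reaction and search hits for one reaction name.
query = {"metabolites": {"met_1", "met_2"}, "compartment": "c"}
candidates = [
    {"abbreviation": "ABBR_1", "metabolites": {"met_1", "met_2"}, "compartment": "p"},
    {"abbreviation": "ABBR_2", "metabolites": {"met_1", "met_3"}, "compartment": "c"},
    {"abbreviation": "ABBR_3", "metabolites": {"met_1", "met_2"}, "compartment": "c"},
]
print(pick_abbreviation(query, candidates))
```

Returning None when zero or several candidates match leaves ambiguous cases for manual inspection instead of picking one silently.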
Utility and discussion
The “Contents” page presents the statistics of the database in the form of tables and graphs. It includes a table of the total number of essential reactions and the number of essential reactions associated with their corresponding enzyme-coding genes, as well as two graphs: one of the essential reactions shared across species and one of the association between essential reactions and essential genes. All supporting data files, such as the models used in this study (in SBML format) and all essential reactions comprised in SSER, can be downloaded from the “Download” page, which also contains information on programmatic (API) access to SSER. The “Help” page provides information on how to use the database and describes the column headings of the table on the “Browse” page. All the references reviewed for each organism are available on the “References” page.
Secondly, to investigate whether essential reactions are evolutionarily conserved, we identified the number of essential reactions shared across the organisms in our database. To facilitate this task, we developed a comparison function, which is available on the “Compare” page of the website. Users can compare the essential reactions of organisms with similar growth media conditions. As reaction essentiality is mainly determined by environmental conditions, the comparison function is limited to the prokaryotes grown in glucose minimal medium. Selecting two or more organisms from the list on the page and clicking the “Run” button at the bottom of the page returns a list of the short names and details of the reactions identified as essential in all of the selected organisms. The result can be opened in the browser or downloaded in “.txt” format. For instance, comparing E. coli K-12 MG1655 and Shigella flexneri 2a strain 301 returned 219 shared essential reactions, representing 82.3% and 83.2% of the essential reactions in the two organisms, respectively (see Additional file 5). We checked this result against the sequence similarity reported in the genome sequence paper for Shigella flexneri 2a, which shares 84.8% (3.9 Mb/4.6 Mb) of its genome with E. coli K-12 MG1655 and Escherichia coli O157. A study on the evolution of the metabolic networks of E. coli reported a similar result, showing that the six E. coli strains compared share 285 essential reactions.
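The comparison amounts to a set intersection over unified reaction short names, followed by a per-organism percentage. The sketch below illustrates that idea in Python under stated assumptions: the function name, organism labels and reaction IDs are hypothetical, and it is not the program that runs behind the “Compare” page.

```python
def shared_essential(reactions_by_organism):
    """Return (sorted shared reaction IDs, percent of each organism's
    essential reactions that are shared by all selected organisms)."""
    if len(reactions_by_organism) < 2:
        raise ValueError("select at least two organisms")
    sets = {org: set(rxns) for org, rxns in reactions_by_organism.items()}
    shared = set.intersection(*sets.values())
    coverage = {
        org: round(100.0 * len(shared) / len(rxns), 1) if rxns else 0.0
        for org, rxns in sets.items()
    }
    return sorted(shared), coverage


# Hypothetical essential-reaction lists for two organisms.
selection = {
    "org_1": ["R1", "R2", "R3", "R4"],
    "org_2": ["R2", "R3", "R4", "R5", "R6"],
}
shared, coverage = shared_essential(selection)
print(shared, coverage)
```

The percentages differ between organisms because each is computed against that organism's own total, which is why a single shared count can correspond to two figures such as 82.3% and 83.2%.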
A large number of essential reactions may be shared, particularly if the organisms are closely related on the tree of life. To test this, we calculated the evolutionary distances among 22 prokaryotes using the composition vector method and correlated these data with the number of shared essential reactions. For instance, for the same pair as above, E. coli K-12 MG1655 and Shigella flexneri 2a str. 301, we obtained the shortest calculated Composition Vector Distance (CVD = 0.165165606804). In contrast, E. coli K-12 MG1655 shared only 124 essential reactions with Yersinia pestis CO92, which is more distantly related to it (CVD = 0.500301106751) than Shigella flexneri 2a str. 301 (see Additional file 6). Recent studies have also shown that phylogenetically closely related organisms share an evolutionarily conserved core of essential reactions [20, 30, 34, 35]. All the calculated CVD values can be accessed on the “Compare” page of our website.
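One way to relate distance to shared reaction counts is a correlation coefficient over organism pairs. The text does not state which statistic was used, so the Pearson coefficient below is an assumption, and the input values are made-up toy numbers rather than SSER data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Toy values: pairwise distance vs. number of shared essential reactions.
toy_distance = [0.1, 0.2, 0.3, 0.4]
toy_shared = [200, 180, 150, 120]
print(pearson(toy_distance, toy_shared))
```

A strongly negative coefficient would indicate that more distant pairs share fewer essential reactions, which is the trend suggested by the E. coli, Shigella and Yersinia example.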
Availability and requirements
SSER is publicly accessible via http://cefg.uestc.edu.cn/sser and comprises 6077 essential biochemical reactions of twenty-six species. The website is written in HTML5, CSS3, PHP and SQL and has been tested with Internet Explorer 7 and 8, Firefox, Google Chrome and Safari 4.
The current version of SSER comprises 6077 essential biochemical and transport reactions of twenty-six organisms. The reactions were identified via flux balance analysis (FBA) in conjunction with manual curation on experimentally validated metabolic network models. SSER is intended as a prime source of essential reaction data and related gene and metabolite information. It can significantly facilitate the reconstruction and analysis of metabolic network models as well as drug target discovery studies. Furthermore, SSER provides a function for comparing essential reactions across organisms, thereby extending its applicability to evolutionary studies. Finally, we plan to update SSER on a regular basis.
Abbreviations
BOF: Biomass objective function
Constraint-based flux balance
COBRA: Constraint based reconstruction and analysis
FBA: Flux balance analysis
Genome-scale metabolic models
SBML: Systems biology markup language
SSER: Species specific essential reactions database
Our special thanks go to Mr. Korabza Shewarega for his generous help in developing the web interface and to all CEFG group members at UESTC for their overall support.
This work was supported by the National Natural Science Foundation of China [31470068, 31660320], the Sichuan Youth Science and Technology Foundation of China [2014JQ0051] and the Fundamental Research Funds for the Central Universities of China [ZYGX2015Z006 and ZYGX2015J144]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Availability of data and materials
All models analysed and data generated during this study are included in this article (Additional file 1).
AAL performed the data acquisition, data analysis and construction of the database, and wrote the web interface code and the draft manuscript. YNY contributed to writing the web page code. FBG designed the study, supervised the whole work, and revised the manuscript. CD wrote the built-in program for the Compare page of the website. CD and FZZ wrote MATLAB and Python programs for essential reaction extraction and file format conversion. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Francke C, Siezen RJ, Teusink B. Reconstructing the metabolic network of a bacterium from its genome. Trends Microbiol. 2005;13(11):550–8.
- Thiele I, Palsson BO. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc. 2010;5(1):93–121.
- Barve A, Rodrigues JF, Wagner A. Superessential reactions in metabolic networks. Proc Natl Acad Sci U S A. 2012;109(18):E1121–1130.
- Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010;38(Database issue):D355–360.
- King ZA, Lu J, Drager A, Miller P, Federowicz S, Lerman JA, Ebrahim A, Palsson BO, Lewis NE. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res. 2016;44(D1):D515–522.
- Caspi R, Altman T, Billington R, Dreher K, Foerster H, Fulcher CA, Holland TA, Keseler IM, Kothari A, Kubo A, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 2014;42(Database issue):D459–471.
- Keseler IM, Mackie A, Peralta-Gil M, Santos-Zavaleta A, Gama-Castro S, Bonavides-Martinez C, Fulcher C, Huerta AM, Kothari A, Krummenacker M, et al. EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res. 2013;41(Database issue):D605–612.
- Le Novere N, Bornstein B, Broicher A, Courtot M, Donizelli M, Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, et al. BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. Nucleic Acids Res. 2006;34(Database issue):D689–691.
- Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol. 2010;28(9):977–82.
- Meiyappan Lakshmanan BMaD-YL. Identifying essential genes/reactions of the rice photorespiration by in silico model-based analysis. Springer. 2013;6(20):1–5.
- Hou BK, Kim JS, Jun JH, Lee DY, Kim YW, Chae S, Roh M, In YH, Lee SY. BioSilico: an integrated metabolic database system. Bioinformatics. 2004;20(17):3270–2.
- Lang M, Stelzer M, Schomburg D. BKM-react, an integrated biochemical reaction database. BMC Biochem. 2011;12(1):42.
- Kim HU, Kim TY, Lee SY. Genome-scale metabolic network analysis and drug targeting of multi-drug resistant pathogen Acinetobacter baumannii AYE. Mol BioSyst. 2010;6(2):339–48.
- Lee JM, Gianchandani EP, Papin JA. Flux balance analysis in the era of metabolomics. Brief Bioinform. 2006;7(2):140–50.
- Sun Z-XXX. Constrain-based analysis of gene deletion on the metabolic flux redistribution of Saccharomyces Cerevisiae. J Biomedical Science and Engineering. 2008;1:121–6.
- Orth JD, Conrad TM, Na J, Lerman JA, Nam H, Feist AM, Palsson BO. A comprehensive genome-scale reconstruction of Escherichia coli metabolism--2011. Mol Syst Biol. 2011;7:535.
- Luo H, Lin Y, Gao F, Zhang CT, Zhang R. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res. 2014;42(Database issue):D574–580.
- Dotsch A, Klawonn F, Jarek M, Scharfe M, Blocker H, Haussler S. Evolutionary conservation of essential and highly expressed genes in Pseudomonas aeruginosa. BMC Genomics. 2010;11:234.
- Zuo G, Hao B. CVTree3 Web Server for Whole-genome-based and Alignment-free Prokaryotic Phylogeny and Taxonomy. Genomics, Proteomics & Bioinformatics. 2015;13(5):321–31.
- Luo H, Gao F, Lin Y. Evolutionary conservation analysis between the essential and nonessential genes in bacterial genomes. Scientific reports. 2015;5:13210.
- Becker SA, Feist AM, Mo ML, Hannum G, Palsson BO, Herrgard MJ. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nat Protoc. 2007;2(3):727–38.
- Doerks T, Copley RR, Schultz J, Ponting CP, Bork P. Systematic identification of novel protein domain families associated with nuclear functions. Genome Res. 2002;12(1):47–56.
- Schellenberger J, Que R, Fleming RM, Thiele I, Orth JD, Feist AM, Zielinski DC, Bordbar A, Lewis NE, Rahmanian S, et al. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat Protoc. 2011;6(9):1290–307.
- Raman K, Chandra N. Flux balance analysis of biological systems: applications and challenges. Brief Bioinform. 2009;10(4):435–49.
- Thykaer J, Andersen MR, Baker SE. Essential pathway identification: from in silico analysis to potential antifungal targets in Aspergillus fumigatus. Med Mycol. 2009;47 Suppl 1:S80–87.
- Navid A. Applications of system-level models of metabolism for analysis of bacterial physiology and identification of new drug targets. Briefings in functional genomics. 2011;10(6):354–64.
- Sun J, Sayyar B, Butler JE, Pharkya P, Fahland TR, Famili I, Schilling CH, Lovley DR, Mahadevan R. Genome-scale constraint-based modeling of Geobacter metallireducens. BMC Syst Biol. 2009;3:15.
- Mo ML, Palsson BO, Herrgard MJ. Connecting extracellular metabolomic measurements to intracellular flux states in yeast. BMC Syst Biol. 2009;3:37.
- Feist AM, Scholten JC, Palsson BO, Brockman FJ, Ideker T. Modeling methanogenesis with a genome-scale metabolic reconstruction of Methanosarcina barkeri. Mol Syst Biol. 2006;2:2006–0004.
- Suthers PF, Zomorrodi A, Maranas CD. Genome-scale gene/reaction essentiality and synthetic lethality analysis. Mol Syst Biol. 2009;5:301.
- Hong Yang EWK, Brutinel ED, Palani NP, Sadowsky MJ, Odlyzko AM, Gralnick JA, Igor G, Libourel L. Genome-Scale Metabolic Network Validation of Shewanella oneidensis Using Transposon Insertion Frequency Analysis. PLoS Comput Biol. 2014;10(9):e1003848.
- Qi Jin ZY, Jianguo X, Yu W, Yan S, Weichuan L, Jinhua W, Hong L, Jian Y, Fan Y, Xiaobing Z, Jiyu Z, Guowei Y, Hongtao W, Di Q, Jie D, Lilian S, Ying X, Ailan Z, Yishan G, Junping Z, Biao K, Keyue D, Shuxia C, Hongsong C, Zhijian Y, Bingkun H, Runsheng C, Dalong M, Boqin Q, Yumei W, Yunde H, Jun Y. Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Res. 2002;30(20):4432–41.
- Baumler DJ, Peplinski RG, Reed JL, Glasner JD, Perna NT. The evolution of metabolic networks of E. coli. BMC Syst Biol. 2011;5:182.
- Bergmiller T, Ackermann M, Silander OK. Patterns of evolutionary conservation of essential genes correlate with their compensability. PLoS Genet. 2012;8(6):e1002803.
- Gerdes SY, Scholle MD, Campbell JW, Balazsi G, Ravasz E, Daugherty MD, Somera AL, Kyrpides NC, Anderson I, Gelfand MS, et al. Experimental Determination and System Level Analysis of Essential Genes in Escherichia coli MG1655. J Bacteriol. 2003;185(19):5673–84.
- Price ND, Papin JA, Schilling CH, Palsson BO. Genome-scale microbial in silico models: the constraints-based approach. Trends Biotechnol. 2003;21(4):162–9.