SSER: Species specific essential reactions database
BMC Systems Biology volume 11, Article number: 50 (2017)
Essential reactions are vital components of cellular networks. They are the foundations of synthetic biology and are potential candidate targets for antimetabolic drug design. In particular, if a single reaction is catalyzed by multiple enzymes, inhibiting the reaction would be a better option than targeting the enzymes or the corresponding enzyme-encoding genes. Existing databases such as BRENDA, BiGG, KEGG, Bio-models, Biosilico, and many others offer useful and comprehensive information on biochemical reactions. However, none of these databases focuses specifically on essential reactions. Therefore, building a centralized repository for this class of reactions would be of great value.
Here, we present a species-specific essential reactions database (SSER). The current version comprises the essential biochemical and transport reactions of twenty-six organisms, identified via flux balance analysis (FBA) combined with manual curation on experimentally validated metabolic network models. The main contents of the database are quantitative data on the number of essential reactions, the number of essential reactions associated with their respective enzyme-encoding genes, and the essential reactions shared across organisms.
SSER is intended as a prime source of essential reaction data and related gene and metabolite information, and it can significantly facilitate the reconstruction and analysis of metabolic network models as well as drug target discovery studies. Users can browse, search, compare and download the essential reactions of organisms of interest through the website http://cefg.uestc.edu.cn/sser.
Despite their complexity, reconstructed metabolic networks are important tools for visualizing ‘omics’ data and for fostering the understanding and interpretation of these data in terms of biological functions. Reconstructing such networks is time-intensive and requires extensive effort, taking several months to years depending on the genome size and the number of personnel involved. Although the reactions in a network are not all equally indispensable, each reaction contributes in some way to the proper functioning of the organism's biological system. Consequently, reactions are classified as either essential or non-essential. Essential reactions are those that are vital for the viability of the organism under given living conditions. Some reactions are essential irrespective of the environment in which the organism is situated; such reactions have been identified for a model organism and termed “super-essential”.
Following advances in whole-genome sequencing and biological systems modeling, the number of predictive metabolic network models has been growing significantly. Consequently, a large number of biological databases storing metabolic pathway information have been developed. Although these efforts have contributed greatly to the understanding of the systems biology of a considerable number of organisms, collecting reaction essentiality data in a centralized repository has received little attention. Existing databases such as KEGG (Kyoto Encyclopedia of Genes and Genomes), BiGG (Biochemical Genetic and Genomic, Systems Biology Research Group of the University of California San Diego), Biocyc, Metacyc, Ecocyc, Bio-models, the model SEED, GSMN (Genome-Scale Models Database, Tian Jin University), Biosilico and many others offer comprehensive information on biochemical reactions, but none of them focuses specifically on essential reactions. Therefore, building a centralized repository for this class of reactions would be of great value. Essential reactions are potential candidate targets for antimetabolic drug design [3, 13, 14]. In particular, if a single reaction is catalyzed by multiple enzymes, inhibiting the reaction would be a better option than targeting the enzymes themselves or the corresponding enzyme-coding genes. This was the key driving force behind our construction of the species-specific essential reactions database (SSER).
The current version (version 1.0) of SSER includes the essential biochemical and transport reactions of twenty-six organisms. The reactions were obtained by applying flux balance analysis (FBA) to experimentally validated metabolic network models under in silico growth conditions, in combination with manual curation of each reaction. Besides storing biochemically essential reactions, SSER allows users to obtain information on the enzyme-coding genes, essential precursors, and products under defined in silico growth conditions. The information in SSER may also play a significant role in biotechnology-based industries, as essential reactions can be used to increase production yields.
Construction and content
Data acquisition and source
Comprehensive, up-to-date and experimentally validated genome-scale metabolic network models were downloaded (Nov-Dec 2015) from publicly accessible model repositories, mainly BiGG and GSMN, and from authors’ publications (Additional file 1). That is, when several model versions existed for an organism, we selected the one that was the most up to date, contained comprehensive information and was experimentally validated. For instance, we chose iJO1366 because it was the most up-to-date model of Escherichia coli K-12 MG1655 at the time of model collection. Furthermore, iJO1366 represents a significant expansion of the E. coli reconstruction over iAF1260 and older versions, as it contains a greater number of genes, metabolic reactions and unique metabolites. These criteria were set only to limit the number of models considered in the first version of SSER; we plan to include more models and organisms in future versions. The degree of essentiality of a gene or reaction depends crucially on the growth environment, and hence each reaction in our database is supplemented with growth media information. This information was obtained by searching published articles reporting the experimentally validated reconstructions of each organism (see Additional file 2). To investigate the extent of association between essential reactions and essential genes, we downloaded the essential gene information of the two organisms selected for the case study, E. coli K-12 MG1655 and Bacillus subtilis 168, from the Database of Essential Genes (DEG) version 13.0. We chose these two microbes because they are among the most studied and best characterized in terms of genome annotation, functional characterization, and knowledge of growth behavior [18–20]. See Additional file 3 for the whole workflow.
Computational approaches have recently become powerful alternatives to their experimental counterparts in reaction and gene essentiality analysis owing to their high sensitivity, speed, accuracy, and low cost. We took advantage of a constraint-based flux balance analysis (FBA) approach in conjunction with manual curation in constructing SSER, using the Constraint-Based Reconstruction and Analysis (COBRA 2.0) toolbox in the MATLAB environment. FBA is a powerful in silico technique that has been widely used in genome-scale metabolic network reconstruction and analysis. A significant number of studies have also shown its capability to accurately predict cellular phenotypes from genotypes. For example, when 4,154 growth phenotypes predicted in silico with the yeast iND750 reconstruction across multiple environmental conditions were compared with two large-scale experimental deletion studies, the in silico and experimental results showed 83% agreement. As a second step towards constructing SSER, the models, as downloaded from their sources, were loaded into the MATLAB (MathWorks® R2012b) environment with the COBRA 2.0 Toolbox [2, 21, 23], and a single reaction deletion simulation was then applied as described in the following section.
Flux balance analysis (FBA)
Flux balance analysis (FBA) is a well-established constraint-based approach that uses linear optimization to determine the steady-state reaction flux distribution in a metabolic network by maximizing an objective function [14, 24]. By definition, an essential reaction is a biochemical or transport reaction whose deletion abolishes or significantly decreases cellular growth [25, 26]. The essential reactions in a network model can be determined through single reaction deletion studies. In a single reaction deletion, the flux through the reaction to be removed is constrained to zero; equivalently, the reaction catalyzed by a particular enzyme is removed from the network or switched off. The fate of each reaction under investigation is then decided from the resulting value of the Biomass Objective Function (BOF) [2, 27, 28].
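As a compact restatement, the optimization described above can be written in the notation commonly used for FBA. The symbols are introduced here for illustration only and are assumptions, not definitions taken from the text: S is the stoichiometric matrix, v the vector of reaction fluxes, c the vector selecting the biomass objective, l and u the lower and upper flux bounds, and μ the optimal biomass flux.

```latex
\begin{aligned}
\text{wild type:}\quad \mu_{\mathrm{WT}} &= \max_{v}\; c^{\top} v
  \quad \text{s.t.}\quad S v = 0,\;\; l_i \le v_i \le u_i \;\;\forall i,\\
\text{deletion of reaction } j:\quad \mu_{\mathrm{KO}}^{(j)} &= \max_{v}\; c^{\top} v
  \quad \text{s.t.}\quad S v = 0,\;\; l_i \le v_i \le u_i \;\;\forall i,\;\; v_j = 0,\\
\text{growth ratio of reaction } j &= \mu_{\mathrm{KO}}^{(j)} / \mu_{\mathrm{WT}} .
\end{aligned}
```

The deletion problem differs from the wild-type problem only by the extra constraint on the deleted reaction, so its optimum can never exceed the wild-type optimum and the ratio lies between 0 and 1 whenever the wild-type optimum is positive.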
The growth ratio of the mutant to the wild type, denoted “grRatio” in our database and on the “Browse” page of the website, was used to determine the essential reactions in each model. Different threshold values of the biomass production rate have been used for gene and reaction essentiality determination, ranging from 0 to 10% growth reduction of the mutant with respect to the wild type, depending on the given substrate conditions and other imposed constraints [26, 29–31]. Yang and coworkers observed that computational gene essentiality predictions were consistent with experimental methods when a biomass production ratio of either less than 1% or 0 was used as the cutoff; that is, they obtained consistent results with the two cutoffs applied separately. They assumed that computationally zero growth corresponds to a biomass production of less than 1e−6, which eliminates computational noise. In another study, a 1% cutoff was used in synthetic reaction lethality analysis. In our work, a reaction is classified as essential if its growth ratio is less than 1%, and these reactions were extracted into a separate file for further curation. We reasoned that this stricter cutoff reduces the risk of including false positives in our collection of essential reactions. A similar threshold was used in a case study validating the single gene deletion function of the COBRA toolbox, in which the maximum growth rate was defined to be greater than 99% in the yeast iND750 model (see Additional file 4).
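The cutoff rule can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the class `DeletionResult`, the function `essential_reactions`, the reaction identifiers and the growth rates are all hypothetical, and only the rule itself (growth ratio below 1%) comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeletionResult:
    """Outcome of one simulated single reaction deletion (hypothetical container)."""
    reaction_id: str
    gr_rate_ko: float  # growth rate after deleting the reaction
    gr_rate_wt: float  # wild-type growth rate

    @property
    def gr_ratio(self) -> float:
        # Growth ratio of the mutant to the wild type.
        if self.gr_rate_wt <= 0:
            raise ValueError("wild-type growth rate must be positive")
        return self.gr_rate_ko / self.gr_rate_wt


def essential_reactions(results, cutoff=0.01):
    """Return the ids of reactions whose growth ratio is below the cutoff."""
    return [r.reaction_id for r in results if r.gr_ratio < cutoff]


# Made-up deletion results, for illustration only.
demo = [
    DeletionResult("RXN_A", 0.0, 0.9),    # no growth after deletion
    DeletionResult("RXN_B", 0.005, 0.9),  # ratio of about 0.56%
    DeletionResult("RXN_C", 0.45, 0.9),   # ratio of 50%
    DeletionResult("RXN_D", 0.9, 0.9),    # deletion has no effect
]
print(essential_reactions(demo))
```

With the default cutoff only the first two toy reactions are reported; raising `cutoff` makes the classification more permissive, which is the trade-off discussed above.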
Once the reactions that met the above criterion had been extracted, the next step was to unify the short names (abbreviations) of the reactions. Searching our database would be troublesome if the reactions were deposited as they appear in the models, because different researchers follow different nomenclature conventions for biochemical reactions in their reconstructions. Therefore, we looked for a way to reorganize and unify the reactions identified as essential in each organism. This was achieved by searching the BiGG database for abbreviations, using the names of the reactions as query strings. A search does not always return a single value; some reaction names are associated with multiple abbreviations. In such cases, we chose the abbreviation whose pre-defined reaction parameters, such as metabolite type and compartment, match those of the query reaction. For example, searching the BiGG database for the reaction “2 succinyl 6 hydroxy 2 4 cyclohexadiene 1 carboxylate synthase” returned SHCHCS3, SHCHCS2, and 2S6HCCi. Among these results, 2S6HCCi exactly fit our search criteria and hence was taken as the short name for that reaction.
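The disambiguation step can be sketched as follows. This is an illustrative sketch only: it does not call the BiGG database, and the record type `ReactionRecord`, the function `pick_abbreviation`, the candidate abbreviations and the metabolite identifiers are all hypothetical placeholders. It assumes that a candidate matches when its metabolites and compartments are identical to those of the query reaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionRecord:
    """One candidate returned by a name search (hypothetical structure)."""
    abbreviation: str
    metabolites: frozenset
    compartments: frozenset


def pick_abbreviation(query_metabolites, query_compartments, candidates):
    """Return the single candidate abbreviation whose metabolites and
    compartments match the query reaction, or None if the match is
    missing or ambiguous (such cases are left for manual curation)."""
    q_mets = frozenset(query_metabolites)
    q_comps = frozenset(query_compartments)
    matches = [
        c.abbreviation
        for c in candidates
        if c.metabolites == q_mets and c.compartments == q_comps
    ]
    return matches[0] if len(matches) == 1 else None


# Placeholder candidates; the identifiers below are invented.
candidates = [
    ReactionRecord("CAND1", frozenset({"met_a", "met_b", "met_x"}), frozenset({"c"})),
    ReactionRecord("CAND2", frozenset({"met_a", "met_b"}), frozenset({"p"})),
    ReactionRecord("CAND3", frozenset({"met_a", "met_b"}), frozenset({"c"})),
]
print(pick_abbreviation({"met_a", "met_b"}, {"c"}, candidates))
```

Returning `None` rather than guessing keeps ambiguous names visible, in line with the manual curation described in the text.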
The current version (version 1.0) of SSER contains 6077 essential biochemical and transport reactions of twenty-six organisms. It is a relational database built on seven tables, four of which are major tables, whereas the remaining three are bridging tables. The three most important tables are “reaction”, “reactions”, and “species”. The “reaction” table lists the SSER_ID, reaction abbreviation, reaction name and the PubMed ID (PMID) of the reference for each reaction. The “reactions” table lists the details of each reaction: growth rate of the knockout strain (grRateKO), growth rate of the wild type (grRateWT), growth ratio (grRatio), cutoff, reaction equation, subsystem, media condition, and the associated gene and gene name, if any. The “species” table describes the species name and the source of the data (see Fig. 1).
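To make the table layout concrete, the following sketch builds a three-table schema with Python's built-in `sqlite3` module. It is a hypothetical reconstruction, not the actual SSER schema: the key columns, data types, the direct `species_id` link (the real database uses bridging tables), the identifier `DEMO_0001` and all inserted values are assumptions for illustration; only the table names and the listed fields come from the description above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE species (
    species_id INTEGER PRIMARY KEY,   -- assumed surrogate key
    name       TEXT NOT NULL,
    source     TEXT
);
CREATE TABLE reaction (
    sser_id      TEXT PRIMARY KEY,
    abbreviation TEXT NOT NULL,
    name         TEXT NOT NULL,
    pmid         TEXT
);
CREATE TABLE reactions (
    sser_id    TEXT NOT NULL REFERENCES reaction(sser_id),
    species_id INTEGER NOT NULL REFERENCES species(species_id),
    grRateKO   REAL,
    grRateWT   REAL,
    grRatio    REAL,
    cutoff     REAL,
    equation   TEXT,
    subsystem  TEXT,
    media      TEXT,
    gene       TEXT,
    gene_name  TEXT
);
"""


def build_demo_db():
    """Create an in-memory database with one placeholder record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO species VALUES (1, 'Escherichia coli K-12 MG1655', 'BiGG')"
    )
    conn.execute(
        "INSERT INTO reaction VALUES "
        "('DEMO_0001', 'CHORS', 'Chorismate synthase', NULL)"
    )
    # Growth rates below are placeholders, not values from SSER.
    conn.execute(
        "INSERT INTO reactions (sser_id, species_id, grRateKO, grRateWT, "
        "grRatio, cutoff, media) VALUES ('DEMO_0001', 1, 0.0, 1.0, 0.0, 0.01, "
        "'glucose minimal medium')"
    )
    conn.commit()
    return conn


def essential_in_species(conn, species_name):
    """List (abbreviation, grRatio) for reactions below their cutoff."""
    return conn.execute(
        """SELECT r.abbreviation, d.grRatio
           FROM reactions AS d
           JOIN reaction AS r ON r.sser_id = d.sser_id
           JOIN species  AS s ON s.species_id = d.species_id
           WHERE s.name = ? AND d.grRatio < d.cutoff
           ORDER BY r.abbreviation""",
        (species_name,),
    ).fetchall()


conn = build_demo_db()
print(essential_in_species(conn, "Escherichia coli K-12 MG1655"))
```

Separating the reaction identity ("reaction") from its per-organism simulation details ("reactions") lets the same abbreviation be reused across species, which is what makes cross-organism comparison straightforward.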
Utility and discussion
SSER was established with the primary objective of delivering three vital functions to its users. The first and most important is to serve as a repository for quantitative data, names, formulae and stoichiometric equations of essential reactions, in comparison to the total number of reactions used in the reconstruction of each organism. Furthermore, quantitative data on the number of essential reactions associated with their respective enzyme-encoding genes can also be retrieved from SSER (see Fig. 2).
Users can search for the essential reactions of organisms of interest on the “Browse” page of the website by using a keyword. In addition, the details of each reaction can be browsed by following the link in the SSER_ID field of the reaction. The link returns the reaction equation, functional assignment (subsystem), growth media, growth rate of the knockout (KO), growth rate of the wild type (WT), growth ratio, cutoff, the SSER_ID and the names of the gene(s) associated with the reaction (see Fig. 3).
The “Contents” page presents the statistics of the database in the form of tables and graphs. It includes a table of the total number of essential reactions and the number of essential reactions associated with their corresponding enzyme-coding genes, as well as two graphs showing the essential reactions shared across species and the association between essential reactions and essential genes. All supporting data files, such as the models used in this study (in SBML format) and all essential reactions contained in SSER, can be downloaded from the “Download” page. The “Download” page also contains information on programmatic (API) access to SSER. The “Help” page provides information on how to use the database and describes the column headings of the table on the “Browse” page. All the references reviewed for each organism are available on the “References” page.
Secondly, to investigate whether essential reactions are evolutionarily conserved, we identified the number of essential reactions shared across the organisms in our database. To facilitate this task, we developed a comparison function, which is available on the “Compare” page of the website. Users can compare the essential reactions of organisms with similar growth media conditions. Because reaction essentiality is determined mainly by environmental conditions, the comparison function is limited to the prokaryotes grown in glucose minimal medium. Selecting two or more organisms from the list on the page and clicking the “Run” button at the bottom of the page returns a list of the short names and details of the reactions identified as essential in all of the selected organisms. The result can be opened in the browser or downloaded in “.txt” format. For instance, comparing E. coli K-12 MG1655 and Shigella flexneri 2a strain 301 returned 219 shared essential reactions, which represent 82.3% and 83.2% of the essential reactions of the two organisms, respectively (see Additional file 5). We validated this result against the sequence alignment reported in the genome sequence paper of Shigella flexneri 2a, according to which it shares 84.8% (3.9 Mb/4.6 Mb) of its genome with E. coli K-12 MG1655 and Escherichia coli O157. A study of the evolution of the metabolic networks of E. coli revealed a similar result, showing that the six E. coli strains compared shared 285 essential reactions.
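The core of such a comparison is a set intersection. The sketch below is a minimal illustration under stated assumptions, not the program behind the “Compare” page: the function `shared_essential`, the organism labels and the reaction lists are hypothetical (only the abbreviations CHORS, SHKK and PSCVT appear in this article), and the percentage is assumed to be the shared count divided by each organism's own number of essential reactions.

```python
def shared_essential(essential_by_org, organisms):
    """Return the reactions essential in every selected organism, plus the
    percentage of each organism's essential reactions that are shared."""
    if len(organisms) < 2:
        raise ValueError("select two or more organisms")
    sets = [set(essential_by_org[org]) for org in organisms]
    shared = set.intersection(*sets)
    coverage = {
        org: (100.0 * len(shared) / len(s)) if s else 0.0
        for org, s in zip(organisms, sets)
    }
    return sorted(shared), coverage


# Toy data: organism labels and the RXN_* identifiers are invented.
toy = {
    "organism_1": {"CHORS", "SHKK", "PSCVT", "RXN_1"},
    "organism_2": {"CHORS", "SHKK", "PSCVT", "RXN_2", "RXN_3"},
}
shared, coverage = shared_essential(toy, ["organism_1", "organism_2"])
print(shared)
print(coverage)
```

Because the denominators differ between organisms, one shared count yields a different percentage for each organism, which is why two percentages are reported for a single pairwise comparison.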
A large number of essential reactions may be shared, particularly if the organisms are closely related on the tree of life. To examine this, we calculated the evolutionary distances among 22 prokaryotes using the composition vector method and correlated them with the number of shared essential reactions. For instance, for the same pair of organisms as above, E. coli K-12 MG1655 and Shigella flexneri 2a str. 301, we obtained the shortest calculated Composition Vector Distance (CVD = 0.165165606804). In contrast, E. coli K-12 MG1655 shares only 124 essential reactions with Yersinia pestis CO92 (CVD = 0.500301106751), which is more distantly related to it than Shigella flexneri 2a str. 301 (see Additional file 6). Recent studies have also shown that phylogenetically closely related organisms share an evolutionarily conserved core of essential reactions [20, 30, 34, 35]. All the calculated CVD values can be accessed on the “Compare” page of our website.
Surprisingly, three reactions, namely CHORS (chorismate synthase), SHKK (shikimate kinase) and PSCVT (3-phosphoshikimate 1-carboxyvinyltransferase), were found to be essential in all 22 prokaryotes in our database, irrespective of the growth media condition and the phylogeny of the organisms. Inspired by this case, we searched our database for the organisms in which each reaction is essential and determined the number of essential reactions shared across all organisms. The reason behind this trend needs further investigation, which is beyond the scope of this article (see Fig. 4 and Additional file 7).
The third important type of information in SSER is quantitative data on the number of essential reactions associated with essential genes. Essential genes have been predicted by removing or switching off enzyme-catalyzed biochemical reactions: if switching off a reaction abolishes or significantly reduces cellular growth, the gene that encodes the protein catalyzing that reaction is considered essential [10, 30, 36]. In this case study, the analysis of the association between essential reactions and essential genes in E. coli K-12 MG1655 and Bacillus subtilis 168 revealed that 116 of 269 and 65 of 205 essential reactions, respectively, were catalyzed by enzymes encoded by essential genes. Thus, the number of essential genes is not equal to the number of essential reactions. This result suggests that cellular systems studies should consider the role of essential reactions rather than depend solely on essential gene information (see Fig. 5).
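Expressed as fractions, the counts reported for the two case-study organisms give the following proportions (simple arithmetic on the numbers above, rounded to one decimal place):

```latex
\frac{116}{269} \approx 43.1\% \quad (\textit{E. coli}\ \text{K-12 MG1655}),
\qquad
\frac{65}{205} \approx 31.7\% \quad (\textit{Bacillus subtilis}\ 168)
```

In both organisms, fewer than half of the essential reactions are catalyzed by enzymes encoded by genes recorded as essential.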
Availability and requirements
SSER is publicly accessible via http://cefg.uestc.edu.cn/sser and comprises 6077 essential biochemical reactions of twenty-six species. The website is written in HTML5, CSS3, PHP and SQL and was tested with Internet Explorer 8, Internet Explorer 7, Firefox, Google Chrome and Safari 4.
The current version of SSER comprises 6077 essential biochemical and transport reactions of twenty-six organisms. The reactions were identified via flux balance analysis (FBA) in conjunction with manual curation on experimentally validated metabolic network models. SSER is intended as a prime source of essential reaction data and related gene and metabolite information. It can significantly facilitate the reconstruction and analysis of metabolic network models as well as drug target discovery studies. Furthermore, SSER provides a function for comparing essential reactions across organisms, thereby extending its applicability to evolutionary studies. Finally, we plan to update SSER on a regular basis.
Biomass objective function
Constraint-based flux balance
Constraint based reconstruction and analysis
Flux balance analysis
Genome-scale metabolic models
Systems biology markup language
Species specific essential reactions database
Francke C, Siezen RJ, Teusink B. Reconstructing the metabolic network of a bacterium from its genome. Trends Microbiol. 2005;13(11):550–8.
Thiele I, Palsson BO. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc. 2010;5(1):93–121.
Barve A, Rodrigues JF, Wagner A. Superessential reactions in metabolic networks. Proc Natl Acad Sci U S A. 2012;109(18):E1121–1130.
Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010;38(Database issue):D355–360.
King ZA, Lu J, Drager A, Miller P, Federowicz S, Lerman JA, Ebrahim A, Palsson BO, Lewis NE. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res. 2016;44(D1):D515–522.
Caspi R, Altman T, Billington R, Dreher K, Foerster H, Fulcher CA, Holland TA, Keseler IM, Kothari A, Kubo A, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 2014;42(Database issue):D459–471.
Keseler IM, Mackie A, Peralta-Gil M, Santos-Zavaleta A, Gama-Castro S, Bonavides-Martinez C, Fulcher C, Huerta AM, Kothari A, Krummenacker M, et al. EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res. 2013;41(Database issue):D605–612.
Le Novere N, Bornstein B, Broicher A, Courtot M, Donizelli M, Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, et al. BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. Nucleic Acids Res. 2006;34(Database issue):D689–691.
Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol. 2010;28(9):977–82.
Meiyappan Lakshmanan BMaD-YL. Identifying essential genes/reactions of the rice photorespiration by in silico model-based analysis. Springer. 2013;6(20):1–5.
Hou BK, Kim JS, Jun JH, Lee DY, Kim YW, Chae S, Roh M, In YH, Lee SY. BioSilico: an integrated metabolic database system. Bioinformatics. 2004;20(17):3270–2.
Lang M, Stelzer M, Schomburg D. BKM-react, an integrated biochemical reaction database. BMC Biochem. 2011;12(1):42.
Kim HU, Kim TY, Lee SY. Genome-scale metabolic network analysis and drug targeting of multi-drug resistant pathogen Acinetobacter baumannii AYE. Mol BioSyst. 2010;6(2):339–48.
Lee JM, Gianchandani EP, Papin JA. Flux balance analysis in the era of metabolomics. Brief Bioinform. 2006;7(2):140–50.
Sun Z-XXX. Constrain-based analysis of gene deletion on the metabolic flux redistribution of Saccharomyces Cerevisiae. J Biomedical Science and Engineering. 2008;1:121–6.
Orth JD, Conrad TM, Na J, Lerman JA, Nam H, Feist AM, Palsson BO. A comprehensive genome-scale reconstruction of Escherichia coli metabolism--2011. Mol Syst Biol. 2011;7:535.
Luo H, Lin Y, Gao F, Zhang CT, Zhang R. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res. 2014;42(Database issue):D574–580.
Dotsch A, Klawonn F, Jarek M, Scharfe M, Blocker H, Haussler S. Evolutionary conservation of essential and highly expressed genes in Pseudomonas aeruginosa. BMC Genomics. 2010;11:234.
Zuo G, Hao B. CVTree3 Web Server for Whole-genome-based and Alignment-free Prokaryotic Phylogeny and Taxonomy. Genomics, Proteomics & Bioinformatics. 2015;13(5):321–31.
Luo H, Gao F, Lin Y. Evolutionary conservation analysis between the essential and nonessential genes in bacterial genomes. Scientific reports. 2015;5:13210.
Becker SA, Feist AM, Mo ML, Hannum G, Palsson BO, Herrgard MJ. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nat Protoc. 2007;2(3):727–38.
Doerks T, Copley RR, Schultz J, Ponting CP, Bork P. Systematic identification of novel protein domain families associated with nuclear functions. Genome Res. 2002;12(1):47–56.
Schellenberger J, Que R, Fleming RM, Thiele I, Orth JD, Feist AM, Zielinski DC, Bordbar A, Lewis NE, Rahmanian S, et al. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat Protoc. 2011;6(9):1290–307.
Raman K, Chandra N. Flux balance analysis of biological systems: applications and challenges. Brief Bioinform. 2009;10(4):435–49.
Thykaer J, Andersen MR, Baker SE. Essential pathway identification: from in silico analysis to potential antifungal targets in Aspergillus fumigatus. Med Mycol. 2009;47 Suppl 1:S80–87.
Navid A. Applications of system-level models of metabolism for analysis of bacterial physiology and identification of new drug targets. Briefings in functional genomics. 2011;10(6):354–64.
Sun J, Sayyar B, Butler JE, Pharkya P, Fahland TR, Famili I, Schilling CH, Lovley DR, Mahadevan R. Genome-scale constraint-based modeling of Geobacter metallireducens. BMC Syst Biol. 2009;3:15.
Mo ML, Palsson BO, Herrgard MJ. Connecting extracellular metabolomic measurements to intracellular flux states in yeast. BMC Syst Biol. 2009;3:37.
Feist AM, Scholten JC, Palsson BO, Brockman FJ, Ideker T. Modeling methanogenesis with a genome-scale metabolic reconstruction of Methanosarcina barkeri. Mol Syst Biol. 2006;2:2006–0004.
Suthers PF, Zomorrodi A, Maranas CD. Genome-scale gene/reaction essentiality and synthetic lethality analysis. Mol Syst Biol. 2009;5:301.
Hong Yang EWK, Brutinel ED, Palani NP, Sadowsky MJ, Odlyzko AM, Gralnick JA, Igor G, Libourel L. Genome-Scale Metabolic Network Validation of Shewanella oneidensis Using Transposon Insertion Frequency Analysis. PLoS Comput Biol. 2014;10(9):e1003848.
Qi Jin ZY, Jianguo X, Yu W, Yan S, Weichuan L, Jinhua W, Hong L, Jian Y, Fan Y, Xiaobing Z, Jiyu Z, Guowei Y, Hongtao W, Di Q, Jie D, Lilian S, Ying X, Ailan Z, Yishan G, Junping Z, Biao K, Keyue D, Shuxia C, Hongsong C, Zhijian Y, Bingkun H, Runsheng C, Dalong M, Boqin Q, Yumei W, Yunde H, Jun Y. Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Res. 2002;30(20):4432–41.
Baumler DJ, Peplinski RG, Reed JL, Glasner JD, Perna NT. The evolution of metabolic networks of E. coli. BMC Syst Biol. 2011;5:182.
Bergmiller T, Ackermann M, Silander OK. Patterns of evolutionary conservation of essential genes correlate with their compensability. PLoS Genet. 2012;8(6):e1002803.
Gerdes SY, Scholle MD, Campbell JW, Balazsi G, Ravasz E, Daugherty MD, Somera AL, Kyrpides NC, Anderson I, Gelfand MS, et al. Experimental Determination and System Level Analysis of Essential Genes in Escherichia coli MG1655. J Bacteriol. 2003;185(19):5673–84.
Price ND, Papin JA, Schilling CH, Palsson BO. Genome-scale microbial in silico models: the constraints-based approach. Trends Biotechnol. 2003;21(4):162–9.
Our special thanks go to Mr. Korabza Shewarega for his unlimited help in developing the web interface and all CEFG group members at UESTC for their overall support.
This work was supported by the National Natural Science Foundation of China [31470068, 31660320], the Sichuan Youth Science and Technology Foundation of China [2014JQ0051] and the Fundamental Research Funds for the Central Universities of China [ZYGX2015Z006 and ZYGX2015J144]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Availability of data and materials
All models analysed and data generated during this study are included in this article (Additional file 1).
AAL has performed the data acquisition, data analysis and construction of the database and wrote the web interface development codes and draft manuscript. YNY has contributed by writing the web page codes. FBG has designed the study, supervised the whole work, and revised the manuscript. CD has written a built-in program for a compare page of the website. CD and FZZ have written MATLAB and python programs for essential reactions extraction and for conversion of the file formats. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
All models and reactions in SSER. (RAR 3930 kb)
Growth media information. (XLSX 13 kb)
Workflow. (TIF 70 kb)
Names and coefficients of metabolites in BOF. (XLSX 36 kb)
Shared reactions between E. coli and Shigella flexneri 2a str. 301. (XLSX 23 kb)
Composition vector distance. (XLSX 14 kb)
Shared essential reactions across all organisms. (XLSX 870 kb)