Our comparison revealed that the five databases agree on only a small core of the metabolic network. The overlap is especially low at the reaction level: only 199 reactions could be found in all five databases. Our analysis shows that this small overlap is partly explained by conceptual differences, such as differences in the coverage of the metabolic network. One clear example is the large set of transport reactions and reactions in lipid metabolism in EHMN, which account for 23% of the unique reactions.
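Once every reaction has a comparable key, the consensus and the unique reactions come down to set operations. The minimal Python sketch below assumes that each reaction has already been reduced to a canonical key after metabolite matching, which is the main difficulty discussed in this section. The database contents are toy data, not the real reaction sets.

```python
from functools import reduce

# Toy data: each database maps to a set of reaction keys. The keys are
# hypothetical stand-ins for reactions whose metabolites have already been
# matched across databases ("T" = transport, "L" = lipid metabolism).
databases = {
    "BiGG":     {"R1", "R2", "R3", "T1"},
    "EHMN":     {"R1", "R2", "R4", "L1", "L2", "T2"},
    "HumanCyc": {"R1", "R2", "R3", "R5"},
    "KEGG":     {"R1", "R2", "R4"},
    "Reactome": {"R1", "R5"},
}


def consensus(dbs):
    """Reactions present in every database (intersection of all sets)."""
    return reduce(set.intersection, dbs.values())


def unique_to(dbs, name):
    """Reactions found in database `name` and in no other database."""
    others = set().union(*(rxns for db, rxns in dbs.items() if db != name))
    return dbs[name] - others


print(sorted(consensus(databases)))          # ['R1']
print(sorted(unique_to(databases, "EHMN")))  # ['L1', 'L2', 'T2']
```

A single database lacking a reaction is enough to remove it from the intersection, which is why the consensus shrinks quickly as more databases are compared.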
Our decision to compare five pathway databases also limits the consensus: the more databases one includes in the comparison, the lower the consensus is likely to be. We indeed observe a substantial increase in overlap when we compare pairs of databases (Additional file 4) instead of all five. However, even then the agreement at the reaction level remains relatively low, with a median pairwise consensus of around 15%. Two main factors can strongly bias the size of the detected consensus. First, the consensus is constrained by differences in database size, because a pair cannot share more reactions than the smaller database contains. This partly explains, for example, the consensus of only 11% between a large database such as EHMN and a small database such as Reactome. Second, the consensus is raised by the fact that the databases are not constructed independently of each other. For example, EHMN used KEGG as a starting point for its reconstruction, which explains their higher consensus of 28%. However, even if we restrict our comparison to the three most interdependent pathway databases, BiGG, EHMN, and KEGG [7, 9], the consensus at the reaction level is still only 14% when the transport reactions from BiGG and EHMN are not considered.
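The size effect can be made concrete with a short calculation. The sketch below assumes that pairwise consensus is measured as the number of shared reactions divided by the size of the pair's union; the database sizes in the example are hypothetical round numbers, not the actual sizes of EHMN or Reactome.

```python
def pairwise_consensus(a: set, b: set) -> float:
    """Shared reactions as a fraction of the pair's union (assumed definition)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def max_pairwise_consensus(size_a: int, size_b: int) -> float:
    """Upper bound on pairwise_consensus for databases of the given sizes.

    The intersection is at most as large as the smaller set and the union at
    least as large as the larger one; both limits are reached when the
    smaller database is a subset of the larger.
    """
    larger = max(size_a, size_b)
    return min(size_a, size_b) / larger if larger else 0.0


print(pairwise_consensus({"R1", "R2"}, {"R2", "R3"}))  # 0.3333333333333333
print(max_pairwise_consensus(1000, 4000))              # 0.25
```

Under this definition, a database four times smaller than its partner caps the pairwise consensus at 25%, however carefully the two are curated.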
Despite the observed lack of overlap, the GO enrichment analysis of the consensus and majority genes (Additional file 2) does provide evidence that there is a core of metabolic processes on which the databases agree. Examples of such processes are nucleotide metabolism and carbohydrate metabolism, which is also reflected at the reaction level (Additional file 3). The comparison restricted to these core metabolic processes indeed showed a considerable increase in the majority score at the gene level and, to a lesser extent, at the reaction level. However, the consensus at the reaction level remains low even for this more limited set.
The comparison at the reaction level in particular is clouded by several conceptual differences and technical difficulties. The main technical challenge is to establish the identity of metabolites across databases. This was also observed to be one of the main problems for the experts involved in building the consensus of two in silico metabolic network reconstructions of S. cerevisiae. Matching metabolites by name is not an ideal solution, as many, possibly ambiguous, synonyms and spelling variants exist for the same metabolite. Matching metabolites by identifier is, in our comparison, restricted by the relatively large number of metabolites that had not been linked to any of the four metabolite databases (KEGG, ChEBI, PubChem Compound, and CAS). One reason for the lack of metabolite identifiers is that the metabolite databases themselves are also works in progress. Metabolites that exist in a large number of structural variants, such as lipids, may not yet have been described in full detail in the metabolite databases. This was indeed observed for EHMN, where a large share of the unique metabolites without an identifier is involved in lipid metabolism. On the other hand, some metabolites in the pathway databases may not be described in any of the four metabolite databases we considered because they do not meet the criteria for inclusion; an example is the genome-encoded proteins that take part in reactions in Reactome. Furthermore, each pathway database prefers one of the metabolite databases, for which it curates the links. For example, BiGG mainly derived its identifiers from KEGG Compound, and for Reactome only ChEBI IDs have been manually curated. As a result, a metabolite may not link out to any metabolite database if it does not exist in the preferred reference database.
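One way to sidestep name matching is to treat two metabolite records as the same compound whenever they share at least one cross-reference, and to let such links chain transitively. The sketch below does this with a small union-find. The records and identifier values are illustrative, and this is not necessarily the matching procedure used in our comparison. It also shows the limitation described above: a record without any cross-reference, such as the hypothetical lipid entry, can never be linked.

```python
from collections import defaultdict

# Illustrative records: (database, local name, cross-references).
records = [
    ("KEGG",     "D-Glucose",     {"KEGG:C00031"}),
    ("HumanCyc", "glucose",       {"KEGG:C00031", "CHEBI:17634"}),
    ("Reactome", "D-glucose",     {"CHEBI:17634"}),
    ("EHMN",     "PC(16:0/18:1)", set()),  # lipid without any identifier
]


def group_by_shared_ids(records):
    """Cluster records that are connected through shared cross-references."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    first_seen = {}  # cross-reference -> index of the first record carrying it
    for i, (_, _, xrefs) in enumerate(records):
        for xref in xrefs:
            if xref in first_seen:
                parent[find(i)] = find(first_seen[xref])
            else:
                first_seen[xref] = i

    groups = defaultdict(list)
    for i, (db, name, _) in enumerate(records):
        groups[find(i)].append(f"{db}:{name}")
    return sorted(groups.values())


for group in group_by_shared_ids(records):
    print(group)
# ['EHMN:PC(16:0/18:1)']
# ['KEGG:D-Glucose', 'HumanCyc:glucose', 'Reactome:D-glucose']
```

The KEGG and Reactome records share no identifier directly, yet end up in one group through the HumanCyc record; a plain name comparison would have kept all three apart.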
Correctly assigning identifiers to each metabolite and establishing the correspondence of metabolites between databases will require considerable manual effort. An initiative that could help solve some of these problems is ChemSpider, which integrates a wide variety of metabolite databases. The use of database-independent structural representations such as SMILES and InChI strings has also been recommended. In our case, three databases (EHMN, HumanCyc, and KEGG) provide InChI strings for 77%, 58%, and 75% of their metabolites, respectively. However, of the 3475 InChI strings in total, only 66 are found in all three databases. This low consensus can partly be explained by differences in the level of detail with which metabolite structures are described and by differences in protonation state.
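Part of the InChI mismatch can be examined with a coarse normalization: strip the layers that encode protonation and stereochemistry before comparing strings. The sketch below applies this to standard InChI strings for acetic acid/acetate and alanine/L-alanine. It is an illustrative technique, not the procedure used in our comparison, and a production pipeline would more likely rely on a cheminformatics toolkit than on string handling.

```python
# Prefixes of the standard InChI layers that carry protonation (/p) and
# stereochemistry (/b, /t, /m, /s) information.
DROPPED_LAYERS = {"p", "b", "t", "m", "s"}


def coarse_inchi(inchi: str) -> str:
    """Drop protonation and stereo layers so that structures differing only in
    those respects compare equal. Deliberately lossy: enantiomers collapse too."""
    head, *layers = inchi.split("/")
    # The formula layer starts with an uppercase letter and is always kept.
    kept = [layer for layer in layers if layer[:1] not in DROPPED_LAYERS]
    return "/".join([head, *kept])


ACETIC_ACID = "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)"
ACETATE = "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)/p-1"
ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"
L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"

print(ACETIC_ACID == ACETATE)                              # False
print(coarse_inchi(ACETIC_ACID) == coarse_inchi(ACETATE))  # True
print(coarse_inchi(ALANINE) == coarse_inchi(L_ALANINE))    # True
```

The trade-off is an intentional loss of information: D- and L-alanine would be merged as well, so such a normalization is better suited to proposing candidate matches for review than to accepting them automatically.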
The question remains to what extent the reaction consensus would increase if all metabolites were properly described. As illustrated by our comparison of the TCA cycle, conceptual differences also play an important role in explaining the lack of overlap. A similar conclusion can be drawn from a comparison of the two yeast metabolic networks that were used to build a consensus network. Even after the identity of the metabolites between the two reconstructions had been established manually, the consensus at the reaction level was still only 36%. In a recent comparison of two metabolic networks of A. thaliana, only 33% of the total number of reactions could be matched unambiguously. Furthermore, even unambiguous descriptions for each metabolite would not guarantee a match. First, the databases, or more specifically their metabolites, are partly complementary. EHMN, for example, explicitly focused on expanding lipid metabolism compared with KEGG. Second, many of the reactants without a metabolite identifier take part in reactions that are peripheral to metabolism proper, such as precursor and degradation products in BiGG and proteins in Reactome, and are therefore unlikely to have a match in all five databases.
One example of a conceptual difference is the variation in the number of intermediate steps used to describe a specific metabolic conversion. This may reflect database-specific criteria for when the intermediate steps of a conversion should be described. A second example is the use of generic metabolites (e.g., alcohol) in reactions, as in HumanCyc. This may be done to model the broad substrate specificity of an enzyme or to indicate that the exact substrate specificity is unknown. Other databases, for example BiGG, focus more on specifying the exact metabolite, e.g., ethanol instead of alcohol. The effect of this difference on the overlap may be amplified by the number of specific instances that a database lists. More subtle conceptual differences also play a role, such as a different protonation state (neutral versus charged), the level of detail with which the structure of a metabolite is described (e.g., D-Glucose versus α-D-Glucose), or whether the metabolite is described as enzyme-bound or not (e.g., lipoamide-E versus lipoamide). Finally, our GO enrichment analysis showed that the scope of the metabolic networks described by the five databases differs. Compared with the genes found in the majority of the databases, the set of genes found in at most two databases is enriched for terms related to protein metabolic processes, such as protein phosphorylation and proteolysis, and for RNA metabolism (Additional file 2). EHMN and HumanCyc, for example, both include a generic reaction describing the phosphorylation of a protein, which is linked to a large set of 250 and 304 kinases, respectively. Differences in the metabolic processes covered by the databases also partly explain the differences in database size.
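Generic metabolites can in principle be reconciled with specific ones if a class hierarchy is available. The sketch below assumes a small, hypothetical mapping from generic compound classes to their instances. It treats a generic reaction as matching a specific one when each participant can be paired one-to-one with either itself or an instance of its class. Reaction direction is taken as given, and stoichiometric coefficients are ignored.

```python
from itertools import permutations

# Hypothetical class hierarchy: generic metabolite -> known specific instances.
INSTANCES = {
    "an alcohol": {"ethanol", "methanol", "1-propanol"},
    "an aldehyde": {"acetaldehyde", "formaldehyde", "propanal"},
}


def metabolite_matches(generic: str, specific: str) -> bool:
    """A name matches itself or any listed instance of its class."""
    return generic == specific or specific in INSTANCES.get(generic, set())


def side_matches(generic_side, specific_side) -> bool:
    """True if the participants can be paired one-to-one (sides are small,
    so trying all permutations is affordable)."""
    if len(generic_side) != len(specific_side):
        return False
    return any(
        all(metabolite_matches(g, s) for g, s in zip(generic_side, perm))
        for perm in permutations(specific_side)
    )


def reaction_matches(generic_rxn, specific_rxn) -> bool:
    """Compare substrates with substrates and products with products."""
    (g_in, g_out), (s_in, s_out) = generic_rxn, specific_rxn
    return side_matches(g_in, s_in) and side_matches(g_out, s_out)


GENERIC_ADH = (("an alcohol", "NAD+"), ("an aldehyde", "NADH", "H+"))
SPECIFIC_ADH = (("ethanol", "NAD+"), ("acetaldehyde", "NADH", "H+"))
LDH = (("lactate", "NAD+"), ("pyruvate", "NADH", "H+"))

print(reaction_matches(GENERIC_ADH, SPECIFIC_ADH))  # True
print(reaction_matches(GENERIC_ADH, LDH))           # False
```

The check is intentionally loose. It does not verify that the instances on the two sides correspond chemically, so methanol paired with acetaldehyde would also be accepted. A usable mapping would also need to come from a curated ontology rather than a hand-written dictionary.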
The differences mentioned above make it difficult not only to determine the consensus between databases, but also to distinguish conflicting from complementary content. This is especially so given that all five databases are works in progress. For example, a difference in the coverage of the metabolic network could reflect a fundamental disagreement on whether certain processes are part of the human metabolic network. Alternatively, a database may simply not have included these processes yet, in which case the difference represents complementary information. Similarly, for 45% of the consensus reactions the databases do not fully agree on the genes encoding the catalyst (Additional file 3), which may point to either complementary or conflicting information. Another example is the difference in the number of steps, which in most cases can be explained by a difference in the level of detail of the description. It could, however, also reflect disagreement on the number of intermediate steps required for a particular conversion.
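When two databases describe the same conversion with a different number of steps, comparing net conversions is one way to tell a difference in detail from a genuine disagreement. The sketch below sums the stoichiometry of a multi-step description and compares it with a single-step one. It uses a simplified, textbook-style decomposition of the conversion of 2-oxoglutarate to succinyl-CoA, chosen for illustration; it does not reproduce the exact entries of any of the five databases.

```python
from collections import Counter

# Simplified, illustrative decomposition of the 2-oxoglutarate dehydrogenase
# conversion; coefficients are negative for consumed and positive for
# produced metabolites. "-E" marks an enzyme-bound carrier.
DETAILED = [
    {"2-oxoglutarate": -1, "lipoamide-E": -1,
     "S-succinyldihydrolipoamide-E": 1, "CO2": 1},
    {"S-succinyldihydrolipoamide-E": -1, "CoA": -1,
     "succinyl-CoA": 1, "dihydrolipoamide-E": 1},
    {"dihydrolipoamide-E": -1, "NAD+": -1,
     "lipoamide-E": 1, "NADH": 1, "H+": 1},
]
LUMPED = {"2-oxoglutarate": -1, "CoA": -1, "NAD+": -1,
          "succinyl-CoA": 1, "CO2": 1, "NADH": 1, "H+": 1}


def net_reaction(steps):
    """Sum the stoichiometries of consecutive steps; intermediates that are
    produced and consumed again cancel out and are dropped."""
    total = Counter()
    for step in steps:
        total.update(step)  # Counter.update adds counts, negatives included
    return {met: coef for met, coef in total.items() if coef != 0}


print(net_reaction(DETAILED) == LUMPED)  # True
```

If the net reactions agree, the two descriptions differ only in level of detail; if they do not, the difference may reflect a real disagreement about the conversion. This presupposes that metabolite identities, including enzyme-bound forms, have already been matched.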
The low level of consensus provides compelling evidence that additional curation, and integration of the content of the five pathway databases into a single human metabolic network, is desirable and would improve the description of human metabolism. However, given the results of our comparison and the difficulties outlined above, what would be the way forward towards an integrated network? The consensus consists of only 199 reactions, even fewer when the connected genes and EC numbers are also considered, and is therefore not of direct practical use. Another option is to take the union of the reactions contained in the individual databases. This is the approach taken by, for example, ConsensusPathDB for integrating functional interactions, including metabolic reactions. Besides being subject to the same conceptual and technical issues that we described, combining the content of the databases is not the definitive answer. It will not resolve disagreements between databases regarding, for example, the gene product catalyzing a reaction or whether a reaction can take place in humans at all. Conflicting information would end up in the union and would ultimately require manual curation, or at least annotation of such conflicts. Reasons for disagreement are manifold and database-dependent. Some databases, for example HumanCyc, prefer to err on the side of false positives to bring potential pathways to the attention of the community. In BiGG, some reactions without evidence were included because they improved the performance of the in silico model. Different interpretations of the literature used in constructing the networks also cause disagreements. Moreover, some parts of the metabolic network are still subject to debate, and the current literature reflects these different opinions. In addition, a large part of the union will consist of data supported by only one of the databases.
A third option is to include only reactions on which the majority of the databases agree. This gives a higher level of confidence and, in our case, a considerably larger set of 1004 reactions instead of the 199 reactions in the consensus. However, caution is warranted because the databases are not strictly independent, as illustrated by our pairwise comparison of KEGG and EHMN. Erroneous data may, therefore, have propagated to multiple databases. Our case study of the TCA cycle also illustrates the problems of a majority-vote strategy (Additional file 12). If we retain all entities the majority agrees on, 40% of the reactions are included. However, the genes MDH1 and ACO1, which encode cytosolic proteins, are also part of the majority, as is the conversion of citrate to oxaloacetate (EC 220.127.116.11), which is also cytosolic. Moreover, there is no majority for any of the EC numbers that the databases propose for the conversion of 2-oxoglutarate to succinyl-CoA. Conceptual differences can also be observed: for example, we are left with two routes for the conversion of citrate to isocitrate. Furthermore, reactions that are not part of the majority but are found in only one or two databases are not necessarily incorrect; they could be valuable complementary information. For example, KEGG gives a more detailed description of the conversion of 2-oxoglutarate to succinyl-CoA.
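The union and majority options differ only in how many databases must support a reaction. The sketch below works on toy data and assumes that reactions already have comparable keys. Keeping the provenance of each reaction in the union lets single-source entries and disagreements be flagged for curation instead of being merged silently. The majority set is then a simple threshold on that support.

```python
from collections import defaultdict

# Toy data; reaction keys and database contents are hypothetical.
databases = {
    "BiGG":     {"R1", "R2", "R3"},
    "EHMN":     {"R1", "R2", "R4"},
    "HumanCyc": {"R1", "R2", "R5"},
    "KEGG":     {"R1", "R4"},
    "Reactome": {"R1", "R6"},
}


def support(dbs):
    """Union of all reactions, each mapped to the databases that contain it."""
    sources = defaultdict(set)
    for db, reactions in dbs.items():
        for rxn in reactions:
            sources[rxn].add(db)
    return dict(sources)


def majority(dbs, threshold=None):
    """Reactions found in at least `threshold` databases
    (default: a strict majority, e.g. 3 of 5)."""
    if threshold is None:
        threshold = len(dbs) // 2 + 1
    return {rxn for rxn, src in support(dbs).items() if len(src) >= threshold}


union = support(databases)
single_source = sorted(rxn for rxn, src in union.items() if len(src) == 1)
print(len(union), sorted(majority(databases)), single_source)
# 6 ['R1', 'R2'] ['R3', 'R5', 'R6']
```

A plain vote count treats the databases as independent evidence, which, as noted above for KEGG and EHMN, they are not. Weighting votes by database dependence would be one possible refinement.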
If the conceptual differences and technical issues we identified were resolved, the overlap would increase. It would, however, remain very difficult to (automatically) distinguish useful complementary information from conflicting information. In this respect, a more widespread use of evidence codes indicating the type of evidence supporting the data would make it possible to distinguish between high- and low-confidence data. However, extensive annotation of evidence is currently provided only by BiGG and HumanCyc.
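With shared evidence codes, separating high- from low-confidence data becomes a simple lookup. The sketch below assumes a hypothetical three-tier vocabulary. The codes, tier assignments, and database names are illustrative and do not correspond to the evidence schemes actually used by BiGG or HumanCyc.

```python
# Hypothetical evidence vocabulary; each database uses its own codes, so a
# real implementation would first map those onto tiers like these.
HIGH_CONFIDENCE = {"EXP"}           # direct experimental support
LOW_CONFIDENCE = {"COMP", "MODEL"}  # computational inference, or added to improve a model


def confidence(evidence_codes: set) -> str:
    """Assign the strongest tier supported by any of the evidence codes."""
    if evidence_codes & HIGH_CONFIDENCE:
        return "high"
    if evidence_codes & LOW_CONFIDENCE:
        return "low"
    return "unannotated"


# One hypothetical reaction as annotated by three hypothetical databases.
claims = {"db_a": {"MODEL"}, "db_b": {"EXP", "COMP"}, "db_c": set()}
print({db: confidence(codes) for db, codes in claims.items()})
# {'db_a': 'low', 'db_b': 'high', 'db_c': 'unannotated'}
```

Claims in the lower tiers would be natural candidates for manual review, and a conflict between a high- and a low-confidence claim could be resolved in favour of the former.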
Significant manual intervention will be needed to reach the ultimate goal of a single human metabolic network. A promising model is a community-based approach, such as WikiPathways or an annotation jamboree as advocated by Mo and Palsson. A wiki-based approach allows the community to curate existing pathways and add new ones. Annotation jamborees are organized around domain experts and facilitate the reconciliation and refinement of metabolic pathway databases. They have already been carried out successfully for various organisms [23–25]. The results of our comparison could serve as a stepping stone for such an effort, as it is crucial to understand the underlying causes of the differences in order to resolve them. For integration purposes, we also provide an automatically derived overview of all reactions in which matching reactions are aligned, along with their associated genes, EC numbers, and pathways (Additional file 13). The overviews of the comparison at the gene, EC number, and reaction level can also be found online at http://www.molgenis.org/humanpathwaydb. There, the results of the comparison can be queried, sorted, and exported in a number of ways. The web application was generated using the MOLGENIS toolkit and, besides the graphical user interface, also provides several scriptable interfaces, e.g., an R interface. Using, for example, the majority reactions as a starting point for curation, these overviews could help experts on the human metabolic network to reconcile the differences between the networks and arrive at a unified model of human metabolism.