### Network-based reaction-scoring system

Beyond experimental evidence, it is possible to assess the likelihood of reactions in genome-scale metabolic reconstructions using theoretical models. Given an experimentally derived metabolic reconstruction, a score can be computed for each reaction on the basis of a suitable stochastic network-based model, as we propose in the next section. In the context of bipartite network representations [24, 25], the model exploits the statistical regularities that underlie the structure of a metabolic network in order to ascertaining how well individual links between metabolites and reactions fit the observed topological patterns. In this way, it is possible to predict probabilities of connection for all potential interactions, those already present in the reconstruction and those absent. These probabilities are further integrated for the specific combination of metabolites involved in any particular reaction.

To define the network-based scores in quantitative terms, we interpret that a reaction is equivalent to the univocal combination of its associated metabolites, and that every metabolite *m* has a probability *p*_{
mr
} of being associated to a reaction *r*. Once the connection probabilities between metabolites and reactions are estimated from the model, and assuming their mutual independence, the probability that a certain combination *ν* of metabolites co-occur in a reaction *r* is

where the subindex *m* corresponds to metabolites in the predefined set *ν* and *m*' to those not included. The average number of co-occurrences can be calculated as the sum over all reactions

A network-based reaction score can then be defined as the average number of occurrences *n*_{
ν
} that the model associates to its particular combination of metabolites *ν*. Defined in this way, these scores break the degeneracy of reactions with identical experimental evidence.

### The Tree Distance Bipartite model

In the following, we introduce and discuss a stochastic network-based model to estimate the connection probabilities between metabolites and reactions. Taking advantage of their natural bipartite nature, we consider network representations where metabolites and reactions appear as two different classes of nodes, and metabolites are connected by edges to the reactions they participate in. We consider the simplest unweighted undirected representation, where substrates and products are not differentiated. See Figure 1 for an example.

Previous works have shown that the complex organization of metabolic networks displays characteristic features shared by other complex networks: short topological diameter [29], steady state cycles [30] or structural robustness [11], for instance. We implement a large-scale model that takes advantage of some of those organizing principles, in particular the heterogeneity in the number of connections per metabolite (degree) [31], to infer connection probabilities *p*_{
mr
} between metabolites and reactions. Network-based models are usually defined in terms of connection rules between the nodes. These laws are usually stated independently of observed systems to produce simulated networks that summarize their topological structure. Notice that here, in contrast, we compute the set of connection probabilities that has the maximum likelihood to reproduce the observed patterns in empirical metabolic networks, so we are solving the inverse problem.

Our first step is to assume a metric space underlying the structure of the empirical metabolic network. To this end, we fit to it a hierarchical random graph [27], once represented as a bipartite network with *M* metabolites and *R* reactions (see Figure 1). More specifically, we adjust the empirical bipartite network to a dendrogram, or binary tree structure, where metabolites and reactions appear as leafs. This tree represents the underlying metric space, and each of the *M* + *R* - 1 internal nodes *t* in the dendrogram has an associated distance *d*_{
t
} , so that each pair metabolite-reaction for which *t* is the lowest common ancestor is separated by a distance in the tree *d*_{
mr
} = *d*_{
t
} , independently from whether the link actually exists in the empirical network or not. We find these tree distances by fitting the tree to the empirical network data combining a maximum-likelihood approach with a Monte Carlo sampling method that explores the space of all possible dendrograms (see Methods). Our results are based on intensive numerical simulations that average a large number of samples in the stationary state when changes in the form of the dendrogram do not modify the likelihood function beyond fluctuations.

Once a distance *d*_{
mr
} is associated to a metabolite-reaction pair, we correct for heterogeneity in the degrees of metabolites. We compute estimated connection probabilities *p*_{
mr
} between all possible combinations metabolite-reaction as

where *k*_{
m
} is the degree of the metabolite and *μ* = 1/*R* ensures that network realizations with these connection probabilities have the same number of links as the observed network. From this equation, it is clear that the closer a pair is, the higher will be its connection probability. A Tree Distance Bipartite (TDB) score for every specific reaction can then be computed by applying Eq. (1) and Eq. (2).

### TDB scores for E. coli metabolism. Validation and Implementation

As an application of the methodology, we analyzed the *i*AF1260 version of the K12 MG1655 strain of *E. coli* [32] provided in the BiGG database [33, 34]. From the empirical data, we built a bipartite network representation (see Methods and Additional File 1) with 1479 reactions and 976 metabolites (network provided in Additional File 2). As expected, the number of metabolites entering into a reaction, *k*_{
r
} , follows a nearly homogeneous distribution with mean < *k*_{
r
} > = 4.82 and mode 5. In contrast, the number *k*_{
m
} of reactions in which a metabolite participates displays a heterogeneous degree distribution very close to a scale-free with and an average degree < *k*_{
m
} > = 7.30 (Additional File 1, Figure S2). The most highly connected substrates have more than a hundred and up to 841 connections (*h*^{+}, *h*2*o*, *atp*, *pi*, *adp*, *ppi*, *nad*, *nadh*).

The TDB probabilities *p*_{
mr
} were validated using standard tools in medical decision making and signal-detection theory [35], and compared with corresponding results from alternative models: the Configuration Model for bipartite networks [24, 36, 37] (CMB), the Hierarchical Random Graph model [27] generalized to bipartite networks (HRBG), and a local approach based on the computation of common neighbors (CN) [38] (reactions) between pairs of metabolites (see Methods). We checked two different aspects: the discrimination power of the models, and the behavior of the predictions under noisy conditions. We first measured the receiver operating characteristic (ROC) curve of predicted probabilities.

To calculate the ROC curves we ranked the TDB probabilities from highest to lowest. We took every value in the rank as a threshold, and for each threshold we computed the fraction of true connections and the fraction of false connections above the threshold, the True Positive Rate (TPR) and the False Positive Rate (FPR) respectively, understanding that a true connection is an observed link in the empirical network and a false connection is an absent link. When representing the TPR in front of the FPR, a completely random guess would give a point along the diagonal line and points above the diagonal represent improved classification results. In relation to this, the area under the ROC curve (AUC) is a scalar measure of accuracy [39]. In the present context, it is calculated as the probability that a randomly chosen empirical link in the metabolic network has higher probability than a randomly chosen non-existing one. The ROC curves are shown in Figure 2A. As also corroborated by the AUC value (*AUC*_{
TDB
} = 0.88, *AUC*_{
HRBG
} = 0.85, *AUC*_{
CMB
} = 0.85, and *AUC*_{
CN
} = 0.76), the TDB model has more discriminatory power over a wide range of values as compared to other alternatives.

The second test helps to evaluate robustness against noise. Links were removed from the empirical network in order to see whether our algorithm was able to distinguish those from non-existing links. After removing a subset of links uniformly at random in the original network, a new set of connection probabilities was calculated on the basis of the remaining part of the network. The new probabilities associated to the removed connections were compared one by one to that of non-existing links. This statistic ranges from 0.5 to 1 and indicates how much better our method performs as compared to a by chance baseline accuracy of 0.5. We calculated this accuracy statistic, that is shown in Figure 2B, for different fractions of removed links. For the TDB model, when 1% of the 7127 links in the network are removed the index takes a value of 0.87, meaning that 87% of the times removed links are ranked higher in probability than non-existing links.

In both tests, the TDB model outperforms all other strategies. Furthermore, TDB probabilities are well calibrated, meaning that the distributional forecasts and the observations are statistically consistent (Additional File 1, Figure S4). In view of these results, we accepted the accuracy at the statistical level of the predicted probabilities and we used them to compute theoretical scores following Eq. (1) and Eq. (2) (provided in Additional File 2). Very high values of the TDB scores are typically associated to non-specific reactions dominated by carrier metabolites (hubs in network-based terms). At the top of the rank, the four different reactions with the highest values form a group of outliers (short plateau in the top graph of Figure 3) corresponding to reactions whose metabolites are exclusively hubs. These reactions are the *inorganic diphosphatase*, which hydrolyzes the pyrophosphate anion into inorganic phosphate; the *hydrolase of ATP into ADP*, which appears in the *Nucleotides Salvage* pathway and as a reaction for ATP maintenance requirement according to flux balance analysis, and mediates also the uptake of phosphate; and the *ATP synthase* in the *Oxidative Phosphorylation* pathway that takes four protons in periplasm to produce one ATP in cytosol from ADP. These non-specific reactions are at the end of catabolic chains and are likely to be shared by many different organisms. For instance, we checked that, according to the BiGG database [33, 34], these reactions are also present in yeast (*S. cerevisiae i*MM904) and that *inorganic diphosphatase* and *ATP synthase* appear in human metabolism as well, where the *hydrolase of ATP into ADP* mediates the active transport of different metabolites across compartments. Even very simple organisms like *Mycoplasma pneumoniae* [20] use the *inorganic diphosphatase* reaction and the *hydrolase of ATP into ADP* in their metabolism. In [29], the presence of *inorganic diphosphatase* is reported in 101 out of 107 organisms spanning a wide taxonomical range, and the *hydrolase of ATP into ADP* is found into 25 of those. At the other extreme of the spectrum, low values of *S*_{
TDB
} scores are associated with very specific reactions involving rare metabolites, in the sense that they enter a small number of reactions, or with unlikely reactions, like those involving a large number of metabolites. As an example, the *thiazole phosphate* synthesis is the largest reaction in the database involving twelve metabolites and has the lowest *S*_{
TDB
} score, seventeen orders of magnitude below the maximum. In between, *S*_{
TDB
} scores adopt a broad distribution of continuous values.

In the BiGG database, every reaction (except for exchanges and transports through outer membrane) is annotated with a confidence score assessing its level of experimental evidence. These values are discrete and range from 4 at the top, when there is direct biochemical proof, to 0 at the bottom, if the reaction is included with no experimental evidence but only because it improves modeling results. In between, values of 3 correspond to genomic evidence, level 2 refers to sequence homology evidence, and 1 stands for physiological evidence. This confidence scoring system presents some shortcomings, one being the degeneracy implicit in the use of only five discrete values for lists of hundreds or thousands of reactions, another the fact that different categories are not disjoint but nested, meaning that one backs the other. An additional worth remarking feature concerns the mean degree of the metabolites participating the reactions in each scoring level. Quite unexpectedly, we found a strong bias as this measure monotonously increases when decreasing rank from 4 to 1. More specifically, the average degree of metabolites entering into reactions with high level of confidence score is, on average, smaller than the average degree of metabolites associated to reactions with low level of confidence score.

It is not completely surprising, as the just mentioned test seems to suggest, that absolute indices based on the architecture of the empirical network may go differently as compared to qualitative indicators assigned from updated experimental information. We, thus, aim at complementing the potentialities of both scoring systems by cross-comparing them. In this respect, first we look for reactions with strong experimental evidence (values 4 and 3) that at the same time score low in the TDB system. Those correspond to very specific reactions that could be functionally or evolutionary important. Examples are the five *FMNH2-dependent monooxygenase* reactions in the *Inorganic Ion Transport* pathway, the *Pyridoxine* 5'-*phosphate synthase* reaction in the *Cofactor and Prosthetic Group Biosynthesis* pathway, or the *Taurine dioxygenase* reaction in the *Alternate Carbon* metabolism. Conversely, a weak experimental evidence, scores 2 or 1 in the database, but a high value of the *S*_{
TDB
} score, qualifies the reaction as a good target for further experimental verification in standard conditions. Many examples are found within the transport subsystems, like reactions of transport via ABC system (*iron (II)* and *(III)*, *phosphatidylglycerol*, *phosphatidate*). If the *S*_{
TDB
} score is low, the reaction could be difficult to be observed experimentally except for very specific environments. Finally, high *S*_{
TDB
} scores for reactions that where required for modeling, score 0 in the database, denote consistency between our model and steady-state flux optimization solutions. It is worth remarking that these reactions involve more than one carrier metabolite and stand out as potential experimental targets in standard conditions. However, a variety of reactions manifest discrepancies between both model-based likelihoods. More specifically, we differentiate two situations. The first does not entail contradiction and refers to very specific reactions with low *S*_{
TDB
} scores involving rare metabolites that, on the other hand, seem to be essential for the viability of the bacteria according to flux balance analysis. This points out to potential experimental targets in non-standard conditions. Reactions in the subsystem of *Cofactor and Prosthetic Group Biosynthesis* are, for instance, in this category. In contrast, the second situation involves reactions like those with the highest number of participating metabolites, which according to confidence scores in the database were included in the reconstruction on the basis of modeling reasons while our methodology predicts very low scores. For them, we believe that low *S*_{
TDB
} scores point indeed to insufficient ne detail resolution in the database and we are suspicious that new experiments may show a split of the high degree reactions into a set of coupled lower degree ones.

### Relative TDB scores

Along with absolute scores, we also analyzed relative scores defined on the basis of the Configuration Model for bipartite networks (CMB) [24, 36, 37] (see Methods). The latter assumes the actual degree distributions for metabolites and reactions and it is otherwise maximally random in the assembly of connections. The *S*_{
CMB
} score of a reaction represents its probability of occurrence according to the configuration model and we use this value to compute the relative score as the ratio ∑ = *S*_{
TDB
} /*S*_{
CMB
} (provided in Additional File 2). Since differences between both scores are mainly related to the consideration of tree distances between metabolites and reactions in the TDB model, a relative score ∑ = *S*_{
TDB
} /*S*_{
CMB
} > 1 (< 1) points to the presence of tree distance correlations (anticorrelations) in the bipartite network, which are absent in the random case. In other words, a ratio higher (smaller) than one for a certain reaction indicates that its metabolites have a tendency to aggregate (avoid each other) as compared to the random case.

The ranking of relative scores is shown in the bottom graph of Figure 3, where several clusters can be differentiated. The first three clusters appear in slightly tilted plateaus with levels well separated by appreciable jumps. Each of them is formed by a subgroup of reactions that, according to the database, tend to belong to the same subsystem and share characteristic combinations of metabolites. The first group includes the *FMNH2-dependent monooxygenase* reactions, mentioned above as highly specific, with *flavin mononucleotide* and *sulfite* as reactants. The two reactions in the second plateau belong to the *Glycerophospholipid Metabolism* subsystem and are the only two in the database associated to *acyl phosphatidylglycerol*. The third cluster gathers together the eleven reactions containing *2-Demethylmenaquinone 8*. It is remarkable that, in general, the reactions in these clusters have attached a high DB confidence score. Exceptions appear in the third cluster, where one reaction has an experimental evidence score less than 3 while four reactions are included for modeling reasons, so that they become interesting targets for further experimental verification. For the rest, most of the scores have values above but close to one and there are also over two hundred reactions with ratios below one. At the very tail, one finds a set of reactions that share the common characteristic of being those with the highest reaction degree and with weak or just modeling evidence. In particular, *thiazole phosphate synthesis* is the sole reaction involving twelve metabolites in the database and has the lowest relative score ∑ = 0.3, and noticeably also the lowest absolute *S*_{
TDB
} score (its confidence score in the database is 2).

Relative scores conform better than absolute scores to the idea of pathways as functional modules, since they accentuate the e ect of tree distances that we expect to be related with the modular organization of the network. This topic will be explored in depth in future work.