# Metabolic Flux-Based Modularity using Shortest Retroactive distances

Gautham Vivek Sridharan^{1}, Michael Yi^{1}, Soha Hassoun^{2} and Kyongbum Lee^{1}

**6**:155

**DOI:** 10.1186/1752-0509-6-155

© Sridharan et al.; licensee BioMed Central Ltd. 2012

**Received:** 6 August 2012

**Accepted:** 27 November 2012

**Published:** 27 December 2012

## Abstract

### Background

Graph-based modularity analysis has emerged as an important tool to study the functional organization of biological networks. However, few methods are available to study state-dependent changes in network modularity using biological activity data. We develop a weighting scheme, based on metabolic flux data, to adjust the interaction distances in a reaction-centric graph model of a metabolic network. The weighting scheme was combined with a hierarchical module assignment algorithm featuring the preservation of metabolic cycles to examine the effects of cellular differentiation and enzyme inhibitions on the functional organization of adipocyte metabolism.

### Results

Our analysis found that the differences between various metabolic states primarily involved the assignment of two specific reactions in fatty acid synthesis and glycerogenesis. Our analysis also identified cyclical interactions between reactions that are robust with respect to metabolic state, suggesting possible co-regulation. Comparisons based on cyclical interaction distances between reaction pairs suggest that the modular organization of adipocyte metabolism is stable with respect to the inhibition of an enzyme, whereas a major physiological change such as cellular differentiation leads to a more substantial reorganization.

### Conclusion

Taken together, our results support the notion that network modularity is influenced by both the connectivity of the network’s components as well as the relative engagements of the connections.

### Keywords

Modularity, Metabolic networks, Metabolic flux, Edge-weighting, Adipocyte metabolism, Retroactivity

## Background

The topology of interactions in a biological network is often studied by modeling the network as a graph, which allows the use of established algorithms and metrics such as shortest path analysis[1] and betweenness centrality[2]. Graph theoretical models have yielded useful insights into not only the global topology of biological networks, but also local interactions that form distinct substructures, frequently referred to as modules[3, 4]. Indeed, there is growing consensus that many types of biological networks possess modular character. Hierarchically arranged modules have been identified in metabolic networks, where larger, more heterogeneous subnetworks comprise smaller, more cohesive subnetworks[5, 6]. Hierarchical modularity has also been observed for gene interaction networks[7] and protein interaction networks[8].

Despite the important insights obtained from topological analysis, almost all of the graph-based studies to date have examined a biological network under a single static condition[9]. For instance, Potapov and coworkers note that shortest path analysis, applied to a static network, may offer limited information because the length of an edge in the graph model may not correlate well with the overall efficiency of a particular biochemical transformation represented by the edge[10]. There is increasing evidence that biological network organization is dynamic and state dependent, which cannot be adequately studied from a static point of view. As a result, there has been growing interest in augmenting the topological information of biological networks for graph-based analysis with observed activity data. Recently, Tang and coworkers used gene expression data to construct time-course protein interaction networks, and found that functional modules detected in the time-course networks more closely matched known regulatory complexes than those detected in the static networks[11]. In another example, Greenblum and coworkers constructed a metagenomic network of the human gut microbiome using gene expression data, and showed that state-specific networks representing lean or obese individuals exhibited different topological properties, including modularity[12]. Similarly, Taylor and coworkers found that dynamic changes in the organization of the protein-protein interaction network, rather than expression levels of individual proteins, correlated strongly with breast cancer prognosis[13]. Interestingly, mutations in hub proteins connecting different modules were found to be more frequently associated with cancer phenotypes than mutations in hub proteins that are highly connected with other proteins in the same modules, suggesting that alterations in global modularity may occur in cancer.

In the case of a metabolic reaction network, gene or even protein expression data may not best capture the interactions between the network’s components, as mRNA levels or enzyme concentrations do not necessarily correlate with reaction rate or metabolite turnover. A more comprehensive snapshot of the physiological state may be provided by a metabolic network’s reaction flux distribution, which directly reflects the relative engagements of enzymes, integrating the various layers of regulatory processes active in the cell. Intuitively, the flux of a reaction can be used to weight the interaction mediated by the reaction. For example, Yoon and coworkers applied flux-based weights to adjust the edge distances in a graph model of murine adipocyte metabolism, and thereby reflect metabolic state-dependent variations in the interactions between metabolite pools[14, 15]. While intuitive, this weighting scheme assumes that the metabolic network is modeled as a metabolite centric graph, where the edges represent reactions. For the purpose of studying the interactions between enzymes, it is often useful to model the metabolic network as a reaction centric graph, where the nodes represent enzymes and edges represent interactions between the enzymes mediated by metabolite substrates and effectors[16]. The benefit of a reaction-centric graph, particularly in the context of modularity analysis, is that a metabolite is not constrained to a module. Instead, a metabolite is more appropriately modeled as a shared resource, and reactions define the functional identity of a module. To our knowledge, a scheme to weight the edges of a reaction-centric graph has not yet been described in the literature. The purpose of this study was therefore to develop a generally applicable method for incorporating activity data such as metabolic flux into modularity analysis using graph models where the nodes, rather than the edges, represent the network’s functional components.

Recently, we defined a new metric, termed Shortest Retroactive Distance (ShReD), to capture feedback and other cyclical interactions in a metabolic network[17]. Based on the earlier work of Saez-Rodriguez and coworkers on retroactivity[18], ShReD was used to solve for modular partitions that would minimize cyclical interactions between modules while maximizing such interactions within a module. While the earlier work on retroactivity focused on nearest neighbor interactions, for example mediated by the product of a reversible reaction, the ShReD-based analysis also considered interactions between distant parts of a network. In the present study, we further expand the use of ShReD as a modularity analysis metric by developing a weighting scheme to reflect phenotypic state-dependent variations in reaction-to-reaction interactions. We focus on flux data due to the integral nature of the information content in such data, reflecting the functional outcomes of transcriptional, translational, and post-translational mechanisms of enzyme activity regulation. Flux data can be obtained using a number of different methods, including isotopic (typically ^{13}C) labeling, metabolic flux analysis (MFA), and flux balance analysis (FBA). Generally, mathematical model-based analysis of isotopic enrichment of multiple metabolite pools offers the greatest resolution. Flux balance analysis is a constrained optimization based approach typically used to estimate fluxes in conjunction with a metabolic objective function. The problem is usually severely underdetermined in FBA. In the present study, we used a constrained optimization based approach to estimate metabolic fluxes, but without assuming a metabolic objective. Rather, we minimized the sum of squared differences between the measured and estimated exchange fluxes, as the problems were well constrained.

Applied to a model of adipocyte metabolism, ShReD-based modules obtained using flux weights more consistently reflected recognizable functions of established pathways compared to the modules obtained without the weights. Comparisons of modules obtained using several different flux sets representing distinct metabolic states identified robust reaction pairs that repeatedly partitioned into the same module across many levels of modular hierarchy, suggesting possible co-regulation.

## Results

### Effects of weighting edges on ShReD distribution

**V**, which ranks the ShReD between two reaction nodes relative to the distribution of all ShReDs involving either one of the two reaction nodes.

There is a positive correlation (R^{2}=0.35, p<0.01) between the un-weighted ShReD and the corresponding weighted ShReD. The correlation analysis was performed on reaction pairs with ShReD < 100, since the maximal ShReD value was capped at 100 (see Methods). The positive correlation suggests that the topology of the metabolic network as defined by the stoichiometry has some influence on the closeness of cyclical interactions between enzymes as defined by the fluxes of the reactions connecting the enzymes (Figure 1C). However, the correlation is not very strong, as there are many instances where a relatively short un-weighted ShReD corresponds to a relatively long weighted ShReD (Figure 1C, i), and a long un-weighted ShReD corresponds to a short weighted ShReD (Figure 1C, ii).

### Effects of edge-weighting on ShReD-based network partition

*a priori* assigned textbook associations largely remain intact (Figure 2B: Module #7224, Module #7287). Quantitatively, the flux-weighted partition has a greater average homogeneity index between heights 1–7 in the hierarchy (Figure 3), where height zero corresponds to terminal nodes. At height zero, the average homogeneity is similarly high for both the weighted and un-weighted cases due to the large number of single reaction modules.

### Comparing dynamic vs. static weighting schemes

In the absence of flux data, topological data other than cyclical connectivity could be used to guide modularity analysis. Metabolite degrees[19] were investigated as an example of connectivity-based weights reflective of network topology from a static perspective. Briefly, the edge distance from a reaction node R_{i} to reaction node R_{j} was determined as the number of reactions in the network that consume the intermediary metabolite connecting R_{i} and R_{j}. The rationale was that the influence of R_{i} on R_{j} would be strongest if R_{j} is the only reaction consuming the intermediary metabolite produced by R_{i}. The influence would be weaker if the intermediary metabolite was consumed not only by R_{j}, but also by many other reactions in the network. Applying this weighting scheme to the adipocyte model (Figure 3), we find that ShReD-based partitioning of the metabolite degree-weighted network results in average homogeneity index values that lie between the un-weighted network and the flux-weighted network. This result suggests that the metabolite degree-weighted network is an improvement over the un-weighted network, but is less effective than the flux-weighted network at capturing the relative engagements between the reactions.
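The metabolite degree weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the `reactions` dictionary, the toy reaction names, and the rule of keeping the shortest distance when two reactions are linked by more than one metabolite are all assumptions introduced here.

```python
from collections import defaultdict


def metabolite_degree_edges(reactions):
    """Return {(Ri, Rj): distance} for a reaction-centric graph.

    reactions maps a reaction name to (substrates, products).
    The distance from Ri to Rj is the number of reactions that consume
    the intermediary metabolite produced by Ri and consumed by Rj.
    """
    consumers = defaultdict(set)
    for name, (substrates, _products) in reactions.items():
        for met in substrates:
            consumers[met].add(name)

    edges = {}
    for ri, (_substrates, products) in reactions.items():
        for met in products:
            degree = len(consumers[met])
            for rj in consumers[met]:
                if rj == ri:
                    continue
                # Assumption: if several metabolites link the same pair,
                # keep the shortest of the candidate distances.
                key = (ri, rj)
                edges[key] = min(edges.get(key, degree), degree)
    return edges


# Hypothetical toy network: B is consumed by two reactions, C by one.
reactions_demo = {
    "R1": (["A"], ["B"]),
    "R2": (["B"], ["C"]),
    "R3": (["B"], ["D"]),
    "R4": (["C"], ["E"]),
}
edges_demo = metabolite_degree_edges(reactions_demo)
```

In the toy network, R1 reaches R2 and R3 at distance 2 because both compete for B, whereas R2 reaches R4 at distance 1 because R4 is the only consumer of C.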

### Robust interaction pairs

We examined whether the modularity score V_{ij} of two reaction nodes in the initial un-partitioned network could predict the degree to which the two reaction nodes remain together in the hierarchical partitioning. The degree to which two reaction nodes remain together was assessed by the partition score H_{ij}, which scales the number of modules shared by both reaction nodes with respect to the total depth of the partitions for each reaction node (see Methods for definition of depth). A scatter plot of the partition score and the modularity score for the Day 12 flux-weighted model shows a significant positive correlation (R^{2} = 0.45, p<0.01) for reaction pairs with a positive modularity score (Figure 4A). Of particular interest are the reaction pairs that fall in the upper right hand corner, chosen here to be reaction pairs with V_{ij} > 3.0 and H_{ij} > 0.7. Reaction pairs satisfying this criterion were selected from all four flux-weighted adipocyte models (Day 4, Day 12, and Day 12 with PCX or LDH inhibition). Forty reaction pairs, or roughly 1.5% of the possible 2556 reaction pairs, satisfied the criterion for at least one of the four models. A heat map displaying the number of models (of the four adipocyte models) for which a given reaction pair meets the criterion shows that 17 of the 40 reaction pairs robustly partition together across the different metabolic states (Figure 4B). One such reaction pair is [R32, R50] (for reaction definitions, see Additional file 1: Table S1), which corresponds to NADPH production from the pentose phosphate shunt and NADPH consumption for palmitate synthesis, respectively. To determine whether these robust reaction pairs could be identified solely based on stoichiometry in the absence of flux information, each of the 17 reaction pairs was mapped onto a corresponding plot of modularity and partition scores for the un-weighted adipocyte model (Figure 4C). Overall, the correlation between the partition and modularity scores, albeit still significant, was weaker for the model without flux weights (R^{2} = 0.11, p<0.01). Only 5 of the 17 robust reaction pairs identified in Figure 4B have partition scores > 0.7, and only 3 reaction pairs also have modularity scores > 3.0 in the un-weighted H-V plot. The three reaction pairs are [R34, R36], corresponding to pyruvate dehydrogenase/citrate synthase and isocitrate dehydrogenase in the TCA cycle, [R51, R52], corresponding to triglyceride synthesis and degradation, and [R57, R58], corresponding to glutamate synthesis and degradation. The remaining robust reaction pairs identified in the four flux-based partitions are not found in the un-weighted network partition. For example, the robust pair [R29, R30], which corresponds to reactions in glycolysis, has relatively low partition and modularity scores in the un-weighted case.

### Impact of metabolic state on modularity

## Discussion

In this paper, we present a novel methodology for investigating the impact of different metabolic states on the functional organization of metabolic networks. The methodology utilizes metabolic flux data as weights for a graph-based partitioning method that conserves cyclical interactions. Previously, we assessed the cyclical interactions based on ShReDs calculated by assuming static interactions, and thus a uniform graph distance, between each connected reaction pair. In the present study, we allow the interactions, and thus the graph distances, to vary with the metabolic state.

Unlike the un-weighted case, the weighted ShReD distribution displays significant skewness (Figure 1B), indicating that the arithmetic mean is not representative of the average or expected ShReD. A likely reason for the skewness is that some reactions, particularly those involved in amino acid metabolism, carry negligible flux compared to other parts of central carbon metabolism such as glycolysis and the TCA cycle. As the edge weights of a ShReD are inversely proportional to the fluxes of the reactions comprising the ShReD, a ShReD that includes one or more reactions carrying negligible flux can be very large, and thus skew the arithmetic mean of the distribution. For this reason, it is possible for a reaction pair to have a relatively small ShReD in an un-weighted network, but a relatively large ShReD in a corresponding weighted network. For example, the ShReD for the reaction pair [R28, R63] in the unweighted network is two (Figure 1C, i), since R28 (3-phosphoglycerate synthesis in glycolysis) and R63 (proline synthesis) interact cyclically via the production and consumption of NADH and NAD^{+}. However, in the weighted network, the directional interaction from R28 to R63 is very weak, since only a very small fraction of the NADH produced by glycolysis is used for proline synthesis. The corresponding edge distance is 1251, which is approximately 50 times the average of non-infinite ShReDs in the network (25). As a result, the weighted ShReD between these two reactions traverses an alternate sequence of reaction nodes, comprising 10 reactions spanning parts of glycolysis and the TCA cycle, 2-oxoglutarate synthesis, and glutamate synthesis (Additional file 1: Figure S2). The ShReD value of this cycle is ~60. This ShReD value is still relatively large compared to other weighted ShReD values in the distribution, implying a relatively weak cyclical interaction.

Conversely, a relatively long ShReD in the unweighted network can yield a relatively short ShReD in the weighted network. For example, the unweighted ShReD for the reaction pair [R40, R52], corresponding to mitochondrial malate synthesis and triglyceride degradation respectively, is the largest non-infinite ShReD at 11 (Figure 1C, ii). However, every edge in this cycle carries a relatively large flux, resulting in a weighted ShReD value of 22, which is close to the average ShReD of the weighted network.

A comparison of the hierarchical partition trees for the unweighted and weighted (Day 12) models shows that the weighted model yields greater functional homogeneity of modules based on the canonical pathway assignments of the constituent reactions (Figures 2 and 3). This suggests that the network topology alone, as defined by the network’s reaction stoichiometry, is insufficient to capture the functional associations between reactions that are reflected in the textbook pathway assignments. In our previous work, we augmented the stoichiometric information by including known regulatory interactions between reactions. Edges denoting regulatory interactions were drawn from one reaction node to another if the product metabolite of the first reaction allosterically regulated the second reaction. The presence of these regulatory edges had a significant impact on the modularity of the network. However, for many cell types, information regarding regulatory mechanisms is incomplete or difficult to obtain, requiring extensive manual searches of the literature. Therefore, an un-weighted network will almost certainly contain only partial information regarding functional interactions between reactions. One way to upgrade the information content is to incorporate metabolic flux data, which provides a snapshot of cellular metabolic state, and reflects the integral of various regulatory processes active in the cell. In this study, we found that incorporating flux data as weights for directed interactions between reactions resulted in homogeneous modules that are more in line with textbook knowledge on biochemical pathway organization.

However, we found that some module inhomogeneity persists deep into the hierarchy even for the weighted models. A majority of these inhomogeneous modules include one or more robust reaction pairs that consistently partition together across the different metabolic states examined in this study. One such module, found at depth 7 of the Day 12 model partition (Figure 2B, Module #7299), points to a tight coupling between carbohydrate metabolism, citrate malate cycle, and lipid metabolism, mediated through the production and consumption of NADPH. This module includes the reaction pair [R32, R50], corresponding to NADPH production via the pentose phosphate shunt and palmitate synthesis respectively, which was one of the 17 robust reaction pairs with both a high modularity score and a high partition score for all four flux-weighted partitions. We have previously observed that the interactions mediated by cofactors, which are ubiquitously present throughout metabolism, can couple reactions spanning seemingly distant pathways[17]. Prior studies have often removed cofactors or ‘currency metabolites’ prior to network modularization due to the difficulty of assigning them to distinct functional modules. While ShReD-based partitioning can also be performed after the removal of cofactors, our prior work suggests that cofactors are essential in mediating metabolic cycles and allosteric feedback loops[17], and should thus be retained if the goal is to identify modules based on cyclical interactions.

One possible biochemical basis underlying the robust reaction pairs is co-regulation. For example, reaction R50, catalyzed by 3-oxoacyl-(acyl-carrier-protein) reductase, requires NADPH as a cofactor for activity[20], while both enzymes catalyzing the lumped reaction R32, glucose 6-phosphate dehydrogenase and 6-phosphogluconate dehydrogenase, are allosterically regulated by NADPH[21, 22]. Similarly, reaction R44, catalyzed by malic enzyme, is product-inhibited by NADPH[23], which is a required cofactor for reaction R50. Another co-regulated robust reaction pair is [R34, R48], corresponding to the first steps in the TCA cycle (pyruvate dehydrogenase and citrate synthase) and oxidative phosphorylation, respectively. Oxaloacetate is a limiting substrate for citrate synthase, and also a competitive inhibitor of oxidative phosphorylation[24]. Reactions R34 (pyruvate dehydrogenase/citrate synthase), R36 (isocitrate dehydrogenase) and R41 (malate dehydrogenase) are steps in the TCA cycle regulated by ATP, which could explain the robustness of interactions between reaction pairs [R34, R36] and [R34, R41][23, 25, 26].

While the partitions of the four flux-weighted models share similar modules as exemplified by the robust reaction pairs, they also exhibit notable differences. For the Day 4 partition, reactions catalyzed by PCX and PEPCK both split off immediately from the parent network at depth one of the hierarchy. This split is due to the very low flux carried by these reactions at Day 4, which excludes them from significant cyclical interactions with any of the other reaction nodes. Day 4 represents an early stage of differentiation when an immature adipocyte phenotype is expected. While lipogenic genes are activated, the fluxes of lipid synthesis and triglyceride accumulation remain low at this stage relative to other parts of central carbon metabolism. Our results suggest that PCX and PEPCK, which catalyze upstream steps in glycerogenesis and fatty acid synthesis from glucose, are not yet integral to any major functional modules in the immature adipocyte. However, at Day 12 (Figure 2B), PEPCK is tightly coupled to the TCA cycle reactions, mediated through the consumption and production of ATP, and PCX is coupled to carbohydrate metabolism and triglyceride metabolism (Figure 2B Module #7252). Indeed, there is a striking difference between Day 4 and Day 12 partitions based on the relative distances between the corresponding pairs of reactions in the H-V space. In comparison, there is a more subtle difference between the partitions of Day 12 and Day 12 with LDH inhibition. These observations suggest that the inhibition of one enzyme is not enough to drastically alter modularity. In contrast, the transition from an immature phenotype on Day 4 to a mature phenotype on Day 12 represents a concerted set of changes across cellular metabolism, which is reflected in the broadly altered modularity.

## Conclusions

Taken together, our results support the notion that network modularity is influenced by both the connectivity of the network’s components as well as the relative engagements of the connections. The major contribution of the present study is a generally applicable methodology to incorporate activity data into a systematic partitioning framework featuring the conservation of cyclical, or retroactive, interactions. We found two key benefits of incorporating metabolic flux data. First, comparisons across different metabolic states can identify conserved modules comprising robustly interacting reactions that may be co-regulated by a common allosteric effector. Second, embedded in the flux data is information on the various layers of regulatory processes active in the cell, which can be used to augment connectivity relationships defined by stoichiometry. In the context of modularity analysis, the implication is that lack of detailed knowledge on regulatory mechanisms can be at least partially addressed using experimentally observable data. On the other hand, the reliance on experimental data is also a limitation in the scalability of our methodology. As modularity analysis is an approach to study complex networks, it is ideally applied to large-scale systems rich with complexity. Unfortunately, resolving the flux distribution of a large-scale metabolic network, for example using ^{13}C isotope labeling, remains experimentally demanding and technically challenging. One way to address this limitation in scalability could be to utilize solutions from constraint-based methods such as Flux Balance Analysis that require relatively few measurements. Rather than rely on flux data reflecting an observed metabolic state, flux data could be used that reflect an optimized state or a range of attainable states. An added benefit of using such model-derived flux data could be to enable efficient exploration of different module configurations accessible to a metabolic network.

In summary, we have extended a previously developed methodology for modularity analysis by considering non-uniform interactions between retroactively connected reactions. Whether the modules defined by cyclical interactions between their constituent reactions indeed contribute to some recognizable system property warrants further study. For example, a future study could examine whether modules comprising metabolic cycles serve to limit the propagation of perturbations through the network, and thereby add to the stability of the system.

## Methods

### Adipocyte model and fluxes

A stoichiometric network model of adipocyte central carbon metabolism was formulated by slightly modifying a previously published model[15]. The modifications were as follows. Reactions were removed for ketone body metabolism, because these reactions carried negligible flux. Reactions were added for glyceroneogenesis to allow the synthesis of glycerone-phosphate from phosphoenolpyruvate. The modified model comprised 72 reactions and 79 metabolites, with 48 independent steady state balances and 22 measured exchange rates. The system was therefore underdetermined by a degree of two (72 unknown fluxes less 48 balances and 22 measurements). Metabolic flux distributions were calculated for four different phenotypic states: immature adipocyte (day 4 post-induction), mature adipocyte (day 12 post-induction), mature adipocyte treated with an inhibitor for lactate dehydrogenase (LDH), and mature adipocyte treated with an inhibitor for pyruvate carboxylase (PCX). Rates of metabolite uptake and output (exchange rates) describing these phenotypic states were taken from our previous work[15, 27]. Fluxes were calculated by minimizing the sum of squared differences between measured and calculated metabolite exchange rates subject to stoichiometric balance constraints. The reaction definitions of the adipocyte model and flux distributions corresponding to the four phenotypic states are listed in Tables S1 and S2 (Additional file 1).
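The flux calculation described above is an equality-constrained least-squares problem, which the following Python sketch illustrates on a toy network. It is only an illustration under stated assumptions: the function name `estimate_fluxes`, the matrices `S_demo` and `E_demo`, and the measurements `m_demo` are hypothetical, and the sketch imposes the steady state balances only, with no bounds or irreversibility constraints, which the study may have included.

```python
import numpy as np


def estimate_fluxes(S, E, m, tol=1e-10):
    """Minimize ||E v - m||^2 subject to S v = 0.

    S: stoichiometric matrix (metabolites x reactions)
    E: selects the measured exchange fluxes from v
    m: measured exchange rates
    """
    # Basis of the null space of S: every steady-state flux vector is N @ c.
    _, sing, Vt = np.linalg.svd(S)
    rank = int(np.sum(sing > tol))
    N = Vt[rank:].T
    # Unconstrained least squares in the reduced coordinates c.
    c, *_ = np.linalg.lstsq(E @ N, m, rcond=None)
    return N @ c


# Hypothetical linear pathway: uptake -> A -> B -> secretion.
S_demo = np.array([[1.0, -1.0, 0.0],    # balance on A
                   [0.0, 1.0, -1.0]])   # balance on B
E_demo = np.array([[1.0, 0.0, 0.0],     # measured uptake
                   [0.0, 0.0, 1.0]])    # measured secretion
m_demo = np.array([10.2, 9.8])          # slightly inconsistent measurements
v_demo = estimate_fluxes(S_demo, E_demo, m_demo)
```

In this toy case the two measurements disagree slightly, and the balanced estimate reconciles them to a flux of 10 through every reaction.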

### Flux-based ShReD

Reaction R_{1} produces 100 mol/min of metabolite M_{2}, of which 60 mol/min is directed towards R_{2} and the remainder towards R_{3}. Assuming that the pool of M_{2} is homogeneous, we attribute a stronger influence of R_{1} on R_{2} relative to R_{3}. Intuitively, a stronger influence is modeled as a smaller edge weight (shorter path distance), whereas a weaker influence is modeled as a larger edge weight (longer path distance). Formally, we define the edge distance between a connected pair of reaction nodes as the *inverse of the fraction of the intermediate metabolite production flux that is directed towards the destination reaction node*. In the example of Figure 7A, the dimensionless edge distance D_{1,2} between R_{1} and R_{2} is given by:

$$D_{1,2} = \left(\frac{60}{100}\right)^{-1} \approx 1.67 \qquad (1)$$

Similarly, the edge distance D_{1,3} between R_{1} and R_{3} is given by:

$$D_{1,3} = \left(\frac{40}{100}\right)^{-1} = 2.5 \qquad (2)$$

When the intermediary metabolite is produced by two upstream reactions, R_{1} and R_{2}, the same reasoning applies to their combined flux. Suppose the total production flux of the intermediary metabolite is 100 mol/min, of which 70 and 30 mol/min are directed towards R_{3} and R_{4}, respectively. Based on the assumption that the metabolite pool is homogeneous and the contributions of the upstream reactions to this pool are indistinguishable, the directed interaction from R_{1} to R_{3} is the same as the interaction from R_{2} to R_{3}. The weighting would be the same if *v*_{1} and *v*_{2} are each 50, or if *v*_{1} is several orders of magnitude smaller than *v*_{2}. Even if *v*_{1} = 0.01, it has to be assumed that 70% of that small flux is directed towards R_{3}, because the source of the intermediary metabolite flux cannot be distinguished by the downstream enzymes. Similarly, the interaction from R_{1} to R_{4} is the same as the interaction from R_{2} to R_{4}. Generalizing for a pair of reactions R_{i} and R_{j} connected through an intermediary metabolite M produced by an arbitrary number N of reactions, the edge distance from node R_{i} to R_{j} is given by:

$$D_{i,j} = \frac{\sum_{k=1}^{N} v_{k}}{v_{j}} \qquad (3)$$

where the sum runs over the N reactions R_{k} that produce the intermediary metabolite M, *v*_{k} is the flux of R_{k}, and *v*_{j} is the flux of the reaction R_{j}. When *v*_{j} is close to zero, the corresponding edge distance is very large, as is any ShReD that includes this edge. In such cases, allowing a ShReD to reach an arbitrarily large value could exaggerate the numerical difference between reactions whose fluxes are not statistically different from zero. Therefore, the value of a flux-weighted ShReD was capped with an upper bound. For numerical convenience, the cap was set at 100, as fewer than 5% of all ShReDs calculated in this study exceeded this value. The calculation of ShReDs based on flux weights is illustrated in Figure 8. Distinct flux distributions (Figures 8A and 8C) can result in different ShReDs for the same reaction pair (Figures 8B and 8D).
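The flux-based edge distance and the capped ShReD can be sketched as follows. This is an illustrative Python sketch rather than the code used in the study. It assumes that the ShReD of a reaction pair is the shortest path distance from the first node to the second plus the shortest return path distance, that pairs with no return path keep an infinite ShReD rather than being capped, and that a non-positive consuming flux means no interaction. The function names and the toy distances are hypothetical.

```python
import math

CAP = 100.0  # upper bound on a finite flux-weighted ShReD, as stated in the text


def edge_distance(producer_fluxes, v_j):
    """Edge distance: total production flux of the intermediary metabolite
    divided by the flux directed towards the destination reaction."""
    if v_j <= 0:
        return math.inf  # assumption: no consuming flux means no interaction
    return sum(producer_fluxes) / v_j


def shortest_paths(nodes, dist):
    """All-pairs shortest directed path distances (Floyd-Warshall)."""
    d = {(a, b): 0.0 if a == b else dist.get((a, b), math.inf)
         for a in nodes for b in nodes}
    for k in nodes:
        for a in nodes:
            for b in nodes:
                if d[a, k] + d[k, b] < d[a, b]:
                    d[a, b] = d[a, k] + d[k, b]
    return d


def shred(nodes, dist, cap=CAP):
    """ShReD for every ordered pair: forward plus return shortest path."""
    d = shortest_paths(nodes, dist)
    out = {}
    for a in nodes:
        for b in nodes:
            if a != b:
                total = d[a, b] + d[b, a]
                out[a, b] = min(total, cap) if math.isfinite(total) else math.inf
    return out


# Distances of the Figure 7A example: 60 and 40 of 100 mol/min.
d12 = edge_distance([100.0], 60.0)
d13 = edge_distance([100.0], 40.0)

# Hypothetical return edges added to close the cycles.
nodes_demo = ["R1", "R2", "R3", "R4"]
dist_demo = {
    ("R1", "R2"): d12, ("R2", "R1"): 1.0,
    ("R1", "R3"): d13, ("R3", "R1"): edge_distance([100.0], 0.08),
    ("R1", "R4"): 1.0,  # no return path from R4
}
shred_demo = shred(nodes_demo, dist_demo)
```

In this toy graph the pair [R1, R3] closes its cycle through an edge carrying almost no flux, so its ShReD exceeds the cap and is reported as 100, while [R1, R4] has no return path and stays infinite.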

### Partition algorithm

Partitions of flux-weighted and unweighted network models were generated using Newman’s community detection algorithm[28], similar to our previous work. The overall algorithm flow is shown in Figure S3 (Additional file 1). Briefly, the partitioning algorithm begins by finding the connected subnetworks in the parent network using a breadth-first traversal algorithm[29], as it is possible that the parent network, represented as a reaction centric graph, may not be fully connected. Each connected subnetwork is then partitioned into two daughter subnetworks to maximize a modularity score. Applied recursively, the algorithm produces a hierarchical tree of modules. Unlike our previous work, we do not require each daughter subnetwork to contain at least one cycle as a criterion for partition. It is sufficient that at least one daughter subnetwork contains at least one cycle. This relaxation allows the algorithm to find solutions (reaction node assignments) that result in a partition where single reaction nodes peel off from a larger subnetwork. While the single reaction nodes obviously cannot possess a cycle, this should not preclude further partitioning of the larger subnetwork.
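Two building blocks of this procedure, the breadth-first search for connected subnetworks and the relaxed cycle criterion, can be sketched in Python as below. The sketch is illustrative only. It assumes that connectivity is judged with edge directions ignored and that a cycle means a directed cycle among the nodes of a daughter subnetwork; the function names and the toy graph are hypothetical.

```python
from collections import deque


def connected_subnetworks(nodes, edges):
    """Breadth-first search for connected subnetworks.
    Assumption: edge directions are ignored when testing connectivity."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return components


def has_cycle(nodes, edges):
    """True if the directed subgraph induced by nodes contains a cycle.
    Nodes are peeled off in topological order; any left over lie on a cycle."""
    nodes = set(nodes)
    successors = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            successors[a].append(b)
            indegree[b] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return removed < len(nodes)


def split_is_acceptable(daughter_a, daughter_b, edges):
    """Relaxed criterion: at least one daughter must contain a cycle."""
    return has_cycle(daughter_a, edges) or has_cycle(daughter_b, edges)


nodes_demo = ["R1", "R2", "R3", "R4", "R5"]
edges_demo = [("R1", "R2"), ("R2", "R1"), ("R2", "R3"), ("R4", "R5")]
components_demo = connected_subnetworks(nodes_demo, edges_demo)
```

With the relaxed criterion, peeling the single node R3 off the subnetwork [R1, R2, R3] is acceptable because the remaining daughter still contains the R1-R2 cycle.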

#### Modularity matrix

The modularity matrix **V** has a positive entry V_{ij} if the corresponding ShReD between a pair of reaction nodes is small relative to the expected ShReD, whereas it has a negative entry if the corresponding ShReD is large. Due to the skewing effect of the flux weights on the ShReD distribution, the determination of whether a weighted ShReD is small or large relative to expectation was based on a log ratio. Formally, we define an entry V_{ij} in the modularity matrix **V** as follows.

$$V_{ij} = \ln\left(\frac{p_{ij}}{1 - p_{ij}}\right) \qquad (4)$$

In equation 4, p_{ij} is the fraction of all weighted ShReDs involving reaction R_{i} or R_{j} that is longer than the ShReD between R_{i} and R_{j} (ShReD_{ij}). If exactly half of all ShReDs involving R_{i} or R_{j} are longer (or shorter) than ShReD_{ij}, then V_{ij} is zero. Otherwise, V_{ij} is positive or negative depending on the rank of ShReD_{ij} relative to all other ShReDs involving R_{i} or R_{j}. As an example, consider the subnetwork shown in Figure 2B (module #7249). The flux-weighted ShReD matrix for this subnetwork is shown in Figure S4 (Additional file 1). There are a total of 26 ShReDs involving R24 or R31, including the ShReD between R24 and R31 (ShReD_{24,31}). Of these, ShReD_{24,31} ranks 11th longest, so that 10 of the other 25 ShReDs are longer. Applying equation 4, p_{24,31} = 10/25 = 0.4, and V_{24,31} = −0.41. If p_{ij} = 0, p_{ij} is arbitrarily set to 0.01. The smallest V_{ij} value is thus −4.60, which is on the same order of magnitude as the other entries in the modularity matrix.
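The worked example can be checked numerically with a short sketch. It assumes the log ratio is the natural logarithm of p_{ij}/(1 − p_{ij}), which reproduces both values quoted above (−0.41 and −4.60). The function name is hypothetical, and the upper clip at 1 − 0.01 for the case p_{ij} = 1 is an assumption that the text does not address.

```python
import math


def modularity_entry(shred_ij, other_shreds, floor=0.01):
    """Log-ratio modularity entry for one reaction pair.

    other_shreds: all other ShReDs involving either reaction of the pair.
    """
    longer = sum(1 for s in other_shreds if s > shred_ij)
    p = longer / len(other_shreds)
    p = max(p, floor)        # p = 0 is set to 0.01, as in the text
    p = min(p, 1.0 - floor)  # assumption: symmetric guard for p = 1
    return math.log(p / (1.0 - p))


# 10 of the 25 other ShReDs are longer, mirroring the R24/R31 example
others = [5.0] * 10 + [1.0] * 15
print(round(modularity_entry(3.0, others), 2))
```

The ShReD values in `others` are placeholders chosen only to reproduce the rank in the example, not data from the study.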

#### Optimization of modularity score

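Each connected subnetwork is partitioned into two daughter subnetworks by finding an assignment vector, **s**, that maximizes the modularity score [28]. The score, Q, is defined as a sum based on the modularity matrix **V**:

$$Q = \sum_{i,j} V_{ij}\, s_{i}\, s_{j} \qquad (5)$$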

Each element s_{i} or s_{j} of vector **s** has a value of either −1 or 1. An increase in Q is obtained in two cases: if V_{ij} is positive and reactions i and j are assigned to the same subnetwork (s_{i} = s_{j} = 1 or s_{i} = s_{j} = −1), or if V_{ij} is negative and the two reactions are assigned to different subnetworks (s_{i} = 1 and s_{j} = −1 or vice versa). A solution to the maximization problem can be found using a number of different optimization methods. For example, an approximate solution can be obtained using eigenvalue decomposition [28]. In this study, we used a genetic algorithm (GA). While the GA was computationally less efficient than the eigenvalue decomposition method, it yielded superior solutions (**s** vectors) with larger Q scores.

The GA was implemented using custom code written in MATLAB with the following parameters. The initial population of solutions comprised 100 randomly generated **s** vectors. The population size was kept constant. A fixed fraction (60%) of the solutions was selected for reproduction based on fitness (Q score). New individuals were bred through crossover and mutation. During crossover, an element in the offspring **s** vector was assigned the same value as the corresponding elements in the parent **s** vectors if the values were the same in both parents. Otherwise, the element was randomly assigned either −1 or 1. The mutation (sign change) rate was set to 20%. The GA terminated when the average Q score of the population reached a plateau with an absolute slope < 0.05 with respect to generation number. The fittest solution (**s** vector with the largest Q score) generated over the course of the GA was used for the partition.

For the subnetworks encountered in this study, termination was generally reached within 200 generations. For the example subnetwork of Figure 2B (module #7249), the GA terminated in 117 generations and clearly outperformed the eigenvalue solution (Additional file 1: Figure S5). In cases where the subnetwork size was sufficiently small (< 9 reactions), an exhaustive search was performed to find a globally optimal solution. The runtime for the complete partitioning of the Day 12 model was 180 seconds using the GA and 85 seconds using the eigenvalue approximation on a laptop computer with a 2.2 GHz CPU (Intel Core 2 Duo) and 4 GB of physical memory.
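The scoring, crossover, mutation, and exhaustive search steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' MATLAB implementation: Q is taken to be the double sum of V_{ij} s_{i} s_{j} over all i and j, the 20% mutation rate is applied per element, and all function names are hypothetical.

```python
import itertools
import random


def modularity_score(V, s):
    """Q as the double sum of V[i][j] * s[i] * s[j] (assumed form).

    Diagonal terms add a constant, so they do not affect the optimum.
    """
    n = len(s)
    return sum(V[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def crossover(parent_a, parent_b, rng=random):
    """Keep elements on which the parents agree; randomize the rest."""
    return [a if a == b else rng.choice((-1, 1))
            for a, b in zip(parent_a, parent_b)]


def mutate(s, rate=0.2, rng=random):
    """Flip the sign of each element with probability `rate`.

    Assumption: the 20% rate in the text is a per-element rate.
    """
    return [-x if rng.random() < rate else x for x in s]


def exhaustive_partition(V):
    """Globally optimal bisection for a small subnetwork."""
    n = len(V)
    best_s, best_q = None, float("-inf")
    # Fixing s[0] = 1 halves the search: flipping every sign leaves Q unchanged.
    for rest in itertools.product((-1, 1), repeat=n - 1):
        s = (1,) + rest
        q = modularity_score(V, s)
        if q > best_q:
            best_s, best_q = list(s), q
    return best_s, best_q
```

For a subnetwork of 8 reactions the exhaustive search evaluates 2^7 = 128 assignments, which is why it is practical only below the size threshold given in the text.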

### Hierarchical tree of modules

The partitioning results are reported in the form of a hierarchical tree annotated with several properties. Each module is represented as a pie chart, where the size of each slice is proportional to the fraction of reactions that belong to the corresponding, pre-assigned canonical (textbook) grouping. The homogeneity index of a module corresponds to the fraction occupied by the largest slice in the pie chart. The homogeneity index therefore ranges from 0 to 1, where a larger number indicates greater homogeneity in terms of composition based on the canonical group assignments. The black lines connecting the nodes in the hierarchical tree represent ShReD-based partitions, whereas the red lines represent the formation of components from partitions that include disconnected components. The depth of a module is determined as the number of black edges traversed from the root node to the module. The height of a module is determined as the largest possible number of black edges traversed from the module to a terminal leaf node.
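As a small illustration of the homogeneity index, the sketch below computes the fraction occupied by the largest canonical group in a module. The function name and the group labels used in the usage note are hypothetical.

```python
from collections import Counter


def homogeneity_index(canonical_groups):
    """Fraction of a module's reactions in its largest canonical group.

    canonical_groups: one pre-assigned group label per reaction in the module.
    """
    counts = Counter(canonical_groups)
    return max(counts.values()) / len(canonical_groups)
```

A module with three reactions labeled "TCA cycle" and one labeled "pentose phosphate pathway" would have a homogeneity index of 0.75.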

### Partition score for reaction pairs

The partition score, H, of a reaction pair is defined in terms of *Shared*, the number of modules in the partition hierarchy that include both reactions i and j, and m_{i} and m_{j}, the maximal depths of reactions i and j, respectively. The numerical range of H is from 0 to 1. A value of zero indicates that the two reactions are immediately separated after the first partition operation, whereas a value close to one indicates that the two reactions remain together in the same module through many rounds of partition operations.

### Reaction pair H-V space Euclidean distance

To assess the impact of metabolic state and its corresponding flux distribution on the hierarchical partition of reaction modules, a Euclidean distance is computed for each reaction pair in the H-V (partition score – modularity score) coordinate space from its original location corresponding to the first metabolic state to its new location corresponding to the second state. All coordinates are normalized to the mean partition score and modularity score of the corresponding flux-weighted partition.
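The distance calculation can be sketched as below. The sketch assumes that normalization means dividing each coordinate by the mean partition score or mean modularity score of its own state's flux-weighted partition, which is one reading of the description above. The function and argument names are hypothetical.

```python
import math


def hv_distance(pair_state1, pair_state2, means_state1, means_state2):
    """Euclidean distance a reaction pair moves in normalized H-V space.

    pair_stateX:  (H, V) of the reaction pair in metabolic state X
    means_stateX: (mean H, mean V) over the flux-weighted partition of state X
    Assumption: each coordinate is divided by the mean of its own state;
    the means are taken to be nonzero.
    """
    h1 = pair_state1[0] / means_state1[0]
    v1 = pair_state1[1] / means_state1[1]
    h2 = pair_state2[0] / means_state2[0]
    v2 = pair_state2[1] / means_state2[1]
    return math.hypot(h2 - h1, v2 - v1)
```

A pair whose normalized coordinates move from (1.0, 0.5) to (4.0, 4.5) has a distance of 5.0 in this space; the inputs in that example are made-up values, not results from the study.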

## Declarations

### Acknowledgements

The authors gratefully acknowledge Ehsan Ullah for assistance with writing the file management portion of the program code and discussions of the manuscript.

## References

1. Floyd RW: Algorithm-97 - Shortest Path. Commun ACM. 1962, 5: 345.
2. Girvan M, Newman ME: Community structure in social and biological networks. Proc Natl Acad Sci U S A. 2002, 99: 7821-7826. 10.1073/pnas.122653799.
3. Zhao J, Yu H, Luo JH, Cao ZW, Li YX: Hierarchical modularity of nested bow-ties in metabolic networks. BMC Bioinformatics. 2006, 7: 386. 10.1186/1471-2105-7-386.
4. Holme P, Huss M, Jeong H: Subnetwork hierarchies of biochemical pathways. Bioinformatics. 2003, 19: 532-538. 10.1093/bioinformatics/btg033.
5. Papin JA, Reed JL, Palsson BO: Hierarchical thinking in network biology: the unbiased modularization of biochemical networks. Trends Biochem Sci. 2004, 29: 641-647. 10.1016/j.tibs.2004.10.001.
6. Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL: Hierarchical organization of modularity in metabolic networks. Science. 2002, 297: 1551-1555. 10.1126/science.1073374.
7. Trevino S, Sun Y, Cooper TF, Bassler KE: Robust detection of hierarchical communities from Escherichia coli gene expression data. PLoS Comput Biol. 2012, 8: e1002391. 10.1371/journal.pcbi.1002391.
8. Yook SH, Oltvai ZN, Barabasi AL: Functional and topological characterization of protein interaction networks. Proteomics. 2004, 4: 928-942. 10.1002/pmic.200300636.
9. Ideker T, Krogan NJ: Differential network biology. Mol Syst Biol. 2012, 8: 565.
10. Potapov AP, Goemann B, Wingender E: The pairwise disconnectivity index as a new metric for the topological analysis of regulatory networks. BMC Bioinformatics. 2008, 9: 227. 10.1186/1471-2105-9-227.
11. Tang X, Wang J, Liu B, Li M, Chen G, Pan Y: A comparison of the functional modules identified from time course and static PPI network data. BMC Bioinformatics. 2011, 12: 339. 10.1186/1471-2105-12-339.
12. Greenblum S, Turnbaugh PJ, Borenstein E: Metagenomic systems biology of the human gut microbiome reveals topological shifts associated with obesity and inflammatory bowel disease. Proc Natl Acad Sci U S A. 2012, 109: 594-599. 10.1073/pnas.1116053109.
13. Taylor IW, Linding R, Warde-Farley D, Liu Y, Pesquita C, Faria D, Bull S, Pawson T, Morris Q, Wrana JL: Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nat Biotechnol. 2009, 27: 199-204. 10.1038/nbt.1522.
14. Yoon J, Si Y, Nolan R, Lee K: Modular decomposition of metabolic reaction networks based on flux analysis and pathway projection. Bioinformatics. 2007, 23: 2433-2440. 10.1093/bioinformatics/btm374.
15. Si Y, Yoon J, Lee K: Flux profile and modularity analysis of time-dependent metabolic changes of de novo adipocyte formation. Am J Physiol Endocrinol Metab. 2007, 292: E1637-E1646. 10.1152/ajpendo.00670.2006.
16. Ma HW, Zhao XM, Yuan YJ, Zeng AP: Decomposition of metabolic network into functional modules based on the global connectivity structure of reaction graph. Bioinformatics. 2004, 20: 1870-1876. 10.1093/bioinformatics/bth167.
17. Sridharan GV, Hassoun S, Lee K: Identification of biochemical network modules based on shortest retroactive distances. PLoS Comput Biol. 2011, 7: e1002262. 10.1371/journal.pcbi.1002262.
18. Saez-Rodriguez J, Gayer S, Ginkel M, Gilles ED: Automatic decomposition of kinetic models of signaling networks minimizing the retroactivity among modules. Bioinformatics. 2008, 24: i213-i219. 10.1093/bioinformatics/btn289.
19. Croes D, Couche F, Wodak SJ, van Helden J: Inferring meaningful pathways in weighted metabolic networks. J Mol Biol. 2006, 356: 222-236. 10.1016/j.jmb.2005.09.079.
20. Carlisle-Moore L, Gordon CR, Machutta CA, Miller WT, Tonge PJ: Substrate recognition by the human fatty-acid synthase. J Biol Chem. 2005, 280: 42612-42618. 10.1074/jbc.M507082200.
21. Ozer N, Aksoy Y, Ogus IH: Kinetic properties of human placental glucose-6-phosphate dehydrogenase. Int J Biochem Cell Biol. 2001, 33: 221-226. 10.1016/S1357-2725(01)00011-5.
22. Rippa M, Giovannini PP, Barrett MP, Dallocchio F, Hanau S: 6-Phosphogluconate dehydrogenase: the mechanism of action investigated by a comparison of the enzyme from different species. Biochim Biophys Acta. 1998, 1429: 83-92. 10.1016/S0167-4838(98)00222-2.
23. Shearer HL, Turpin DH, Dennis DT: Characterization of NADP-dependent malic enzyme from developing castor oil seed endosperm. Arch Biochem Biophys. 2004, 429: 134-144. 10.1016/j.abb.2004.07.001.
24. Dervartanian DV, Veeger C: Studies on Succinate Dehydrogenase, I. Spectral Properties of the Purified Enzyme and Formation of Enzyme-Competitive Inhibitor Complexes. Biochim Biophys Acta. 1964, 92: 233-247.
25. Lehninger AL, Nelson DL, Cox MM: Lehninger principles of biochemistry. 2005, New York: W.H. Freeman, 4th edition.
26. Martinez-Rivas JM, Vega JM: Purification and characterization of NAD-isocitrate dehydrogenase from Chlamydomonas reinhardtii. Plant Physiol. 1998, 118: 249-255. 10.1104/pp.118.1.249.
27. Si Y, Shi H, Lee K: Impact of perturbed pyruvate metabolism on adipocyte triglyceride accumulation. Metab Eng. 2009, 11: 382-390. 10.1016/j.ymben.2009.08.001.
28. Newman MEJ: Modularity and community structure in networks. Proc Natl Acad Sci U S A. 2006, 103: 8577-8582. 10.1073/pnas.0601602103.
29. Cormen TH: Introduction to algorithms. 2009, Cambridge, Mass: MIT Press, 3rd edition.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.