# Analysis of complex metabolic behavior through pathway decomposition

- Kuhn Ip
^{1, 2, 3, 4, 5}, - Caroline Colijn
^{6}and - Desmond S Lun
^{1, 2, 3, 4, 5}Email author

**5**:91

**DOI: **10.1186/1752-0509-5-91

© Ip et al; licensee BioMed Central Ltd. 2011

**Received: **29 November 2010

**Accepted: **3 June 2011

**Published: **3 June 2011

## Abstract

### Background

Understanding complex systems through decomposition into simple interacting components is a pervasive paradigm throughout modern science and engineering. For cellular metabolism, complexity can be reduced by decomposition into pathways with particular biochemical functions, and the concept of elementary flux modes provides a systematic way for organizing metabolic networks into such pathways. While decomposition using elementary flux modes has proven to be a powerful tool for understanding and manipulating cellular metabolism, its utility, however, is severely limited since the number of modes in a network increases exponentially with its size.

### Results

Here, we present a new method for decomposition of metabolic flux distributions into elementary flux modes. Our method can easily operate on large, genome-scale networks since it does not require all relevant modes of the metabolic network to be generated. We illustrate the utility of our method for metabolic engineering of *Escherichia coli* and for understanding the survival of *Mycobacterium tuberculosis* (MTB) during infection.

### Conclusions

Our method can achieve computational time improvements exceeding 2000-fold and requires only several seconds to generate elementary mode decompositions on genome-scale networks. These improvements arise from not having to generate all relevant elementary modes prior to initiating the decomposition. The decompositions from our method are useful for understanding complex flux distributions and debugging genome-scale models.

## Background

Computational analysis of cellular metabolism has recently gained increasing prominence and importance. In particular, genome-scale computational models capturing stoichiometric and thermodynamic constraints have been published for over 30 organisms [1] ranging from relatively simple prokaryotes such as *E. coli*, to complex eukaryotes such as *Homo sapiens*[2, 3]. The application of such computational models is dependent on their accuracy and the tools developed for their analysis. The maintenance of model accuracy, or debugging, is an ongoing process by which model predictions are validated against experimental observations and discrepancies are identified and corrected. This process is clearly linked with the use and analysis of the models. These computational models can be analyzed in a number of ways. Flux-balance analysis (FBA) and elementary mode analysis are perhaps two of the most popular and powerful.

FBA determines a distribution of steady-state reaction fluxes that satisfies the constraints of the model and that optimizes a biological objective function such as biomass or adenosine triphosphate (ATP) production [4]. With appropriate constraints and a biological objective, FBA has been shown to be an effective method for prediction of phenotypes associated with genetic manipulations such as knockouts [5, 6] and of intracellular metabolic fluxes [7]. A significant application for FBA, therefore, is metabolic engineering-using computational predictions of metabolic phenotype under genetic manipulations to guide the engineering of metabolically optimized strains [8]-and computational methods have been devised to search the space of genetic manipulations *in silico* for those that yield the desired phenotype [9–13]. But, while FBA has had a number of successful applications, the method gives little insight into its predictions, hindering human understanding and model debugging.

Elementary mode analysis, on the other hand, yields no explicit predictions of metabolic behavior, and seeks primarily to allow understanding of an organism's metabolic capabilities. In elementary mode analysis, the elementary flux modes (EFMs) that define minimal sets of reactions capable of operating at steady state are generated [14]. Elementary flux modes formalize the concept of a biochemical pathway [15], and studying the modes associated with a given metabolic network has been shown to be effective for understanding its function, regulation, and robustness [16, 17]. The principal drawback of elementary mode analysis is that the number of EFMs in a network suffers from a combinatorial explosion [18], and the use of complete sets of EFMs gives rise to problems with scalability when applied to genome-wide models [19]. For example, more than two million modes exist for a simple model of *E. coli* central metabolism consisting of 112 reactions [20], which increases to more than 26 million modes when the possible substrates are expanded [21]. *i* AF1260 [6], the most recent genome-scale metabolic network of *E. coli*, consists of 2077 reactions, and the number of reactions in *E. coli* metabolic models has increased steadily for the past two decades [22]. Thus, the computation time and memory storage required to enumerate all the EFMs of full and detailed genome-scale metabolic networks are prohibitively large.

For many applications, however, our understanding of particular metabolic functions of an organism, such as its ability to produce a desired metabolite, is of greater interest than our understanding of its full metabolic capabilities. In this case, it is not necessary to know all the EFMs of the network, but simply those that combine to give rise to a particular behavior. Previous approaches to this problem have relied on first computing all modes of the network [23, 24]. More recently, an approach presented by de Figueiredo et al. [25], found biologically significant EFMs by identifying the K-shortest EFMs of the metabolic network without necessarily enumerating the whole set. Our motivation is similar in that we are not concerned with generating all the EFMs of a network. However, our approach differs greatly in that we wish to determine those elementary modes that combine to yield a given flux distribution. This flux distribution can consist of both measured and computationally predicted fluxes.

We present an algorithm to find the elementary modes that combine to produce any previously-specified flux distribution. This links the advantages of elementary mode analysis with the advantages of flux balance analysis, without requiring the prohibitive computation of all elementary modes. Our method is therefore applicable to genome-scale models and, because it can take as an input any flux distribution, it can be connected to particular experimental conditions [26] or genetic modifications [10].

To demonstrate the utility of our method and its applicability to genome-scale models, we apply it to genome-scale models of *E. coli* and MTB and show how its results can be used to interpret flux distributions related to the metabolic engineering of *E. coli* and to the survival of MTB during infection.

## Results and Discussion

### Overview

Our method takes a given steady-state metabolic flux distribution and the corresponding metabolic model, and produces a decomposition of the flux distribution into elementary flux modes. In this paper, we use flux distributions obtained by FBA, but our method can equally be applied to flux distributions obtained by alternative means, such as those derived from experimental measurement or obtained from genetic perturbation analysis methods such as MOMA [27] or ROOM [28]. As an elementary flux mode is itself a set of reactions operating at steady state, any flux distribution composed entirely of elementary flux modes must necessarily operate at steady state too. Therefore, an input flux distribution derived from experimental measurement may need to be balanced first to produce a steady state flux distribution, either by regression to fit the measured data or alternative means.

Our method operates by first selecting the reaction with non-zero flux of maximum magnitude from the given flux distribution. The algorithm then uses mixed-integer linear programming to find an elementary flux mode that both contains the selected reaction and is contained in the given distribution. The contribution of this elementary flux mode to the given distribution is determined before it is removed to give an updated flux distribution. The updated flux distribution is used as the input distribution for the next iteration of the algorithm, and this procedure is repeated until the updated flux distribution is zero (see Methods for details).

Elementary mode decompositions are not unique. Our goal is to assist in biological interpretation, and we are primarily interested in obtaining a valid decomposition rather than any specific decomposition. A valid decomposition will arise irrespective of the choice made at each iteration of the reaction with non-zero flux, but how this choice is made determines the specific decomposition. Our choice has some desirable properties. As the elementary flux mode found by the mixed-integer linear program includes the chosen reaction, the flux through this reaction will upper bound the contribution of the elementary flux mode. By selecting the reaction with non-zero flux of maximum magnitude, we avoid the possibility of generating many elementary flux modes with very small weightings and hence minimal contribution to the flux distribution. This choice also minimizes numerical instabilities arising from the calculations.

Although mixed-integer linear programming is NP-hard in general, some large mixed-integer linear programs (MILPs) can be solved with a modest amount of computation by solvers such as SCIP [29], IBM ILOG CPLEX (International Business Machines Corp., Armonk, New York), and Gurobi (Gurobi Optimization, Houston, Texas). In particular, our algorithm, implemented using Gurobi, successfully terminates in at most several seconds in all the genome-scale applications mentioned in this paper.

By contrast, previous approaches to decomposing flux distributions into elementary modes [23, 24] have relied on first generating the complete set of relevant elementary modes, then calculating a weight distribution among these modes. Since elementary mode decompositions are not unique [23], the principal advantage of these previous approaches in comparison to ours is that they are capable of selecting a particular unique decomposition based on some criterion (typically the minimization of the Euclidean norm of the weight vector), while our approach simply generates a valid decomposition among all possibilities. It is not clear, however, that criteria for selecting the weight distribution, such as the minimization of the Euclidean norm, are biologically meaningful. As we will see with the examples in this paper, simply having a valid decomposition is in itself useful.

The principal drawback of these previous approaches is that generating the complete set of relevant elementary modes can be prohibitive, particularly for genome-scale models. To demonstrate this fact, we used efmtool [21] to efficiently generate the relevant elementary modes for the genome-scale model of MTB by Jamshidi and Palsson [30], *i* NJ661, for growth on Middlebrook 7H9-a standard growth medium for MTB. With the elementary modes generated by efmtool, we then applied the quadratic programming approach proposed by Schwartz and Kanehisa [23] to obtain a distribution of weights to assign to these modes. The total computational time of this approach was 34 minutes and 27 seconds.

By contrast, our approach generates a valid elementary mode decomposition consisting of 19 modes in 1.0 seconds-a computational time improvement exceeding 2000-fold. *i* NJ661 consists of 1,028 reactions, which is modest for current genome-scale models [1]. With the ever increasing size and complexity of genome-scale models [22] and the exponential manner in which the number of elementary modes increases with the size of the network, it follows that in some cases, the advantages offered by our approach may not simply be a several-thousand fold improvement in computational time, but rather the difference between practical feasibility and infeasibility.

Indeed, *i* NJ661v [31], a recent model that modifies *i* NJ661 with the aim of more accurately modeling MTB in *in vivo* infection, is only slightly larger than *i* NJ661, with 1,049 reactions. However, a more complex growth medium resulted in a significantly larger number of relevant elementary modes for the flux distribution obtained by FBA than that for *i* NJ661, and we were unable to generate them all using efmtool because of memory limitations. Using our decomposition method, we generated a valid elementary mode decomposition consisting of 27 modes in 2.5 seconds.

*Metabolic engineering of* E. coli

To demonstrate the utility of our approach for metabolic engineering, we consider the metabolic engineering of *E. coli* for increased acetate production-a problem that has received attention owing to the value of acetic acid for its industrial and food uses [32]. Knockout strategies for increased production of acetate based on FBA have previously been reported by Lun et al. [10] using the genome-scale metabolic reconstruction of *E. coli* by Feist et al. [6], *i* AF1260. These knockout strategies were generated using GDLS (Genetic Design through Local Search) [10], an efficient heuristic for generating metabolic engineering strategies involving multiple knockouts from genome-scale models, extending the capabilities of the computationally-expensive optimal search proposed by OptKnock [9]. The strategies were chosen using FBA to have high predicted production of acetate whilst maintaining required energy production and growth to ensure viability. Specifically, a predicted non-growth-associated ATP maintenance (NGAM) flux of at least 8.39 h^{-1} and a biomass flux of at least 0.05 h^{-1} were required.

Number of changes to reactions used in each *E. coli* knockout mutant compared with the wild-type (WT)

Gene Knockouts | 1 | 2 | 3 | 4 | 5 | 6 | 8 |
---|---|---|---|---|---|---|---|

| 12 | 12 | 22 | 25 | 25 | 29 | 36 |

| 9 | 9 | 16 | 21 | 23 | 17 | 24 |

*E. coli*metabolic network displays a degree of robustness as evidenced by the varied pathways by which biomass and NGAM are produced. The knockouts force flux onto alternative pathways for producing these necessary metabolic components, and these pathways produce acetate as a side-product. By finding and examining the elementary modes associated with the acetate-enhancing knockout strategies, we uncover the mechanism responsible for the increased acetate flux, which was not apparent from the FBA analysis.

Elementary modes for acetate-producing *E. coli* knockout strategies^{a}

Knockouts (Number in brackets, Separated by semi-colons) | Mode | Weight | Overall conversion (Acetate production in bold) | Biomass (g/mmol Glc) | NGAM (mmol/mmol Glc) | Number of reactions |
---|---|---|---|---|---|---|

| 1 | 2.92 | 1.000 Glc + 0.005 K + 0.315 NH | 0.029 | 0.2 | 412 |

2 | 7.08 | 1.000 Glc + 0.297 H | 0.021 | 1.1 | 414 | |

Combined | 10.000 Glc + 2.101 H | 0.23 | 8.4 | 415 | ||

| 1 | 4.18 | 1.000 Glc + 0.667 H | 0 | 2 | 38 |

2 | 5.82 | 1.000 Glc + 0.004 K + 0.228 NH | 0.021 | 0.0045 | 414 | |

Combined | 10.000 Glc + 2.788 H | 0.12 | 8.4 | 418 | ||

| 1 | 4.07 | 1.000 Glc + 0.750 H | 0 | 2.1 | 54 |

2 | 5.93 | 1.000 Glc + 0.047 H | 0.022 | 0 | 417 | |

Combined | 10.000 Glc + 3.331 H | 0.13 | 8.4 | 418 | ||

| 1 | 2.82 | 1.000 Glc + 0.444 H | 0 | 1.3 | 59 |

2 | 7.18 | 1.000 Glc + 0.233 H | 0.0071 | 0.64 | 418 | |

Combined | 10.000 Glc + 2.926 H | 0.051 | 8.4 | 421 | ||

| 1 | 2.91 | 1.000 Glc + 0.444 H | 0 | 1.3 | 59 |

2 | 7.09 | 1.000 Glc + 0.233 H2O + 0.001 K + 0.077 NH4 + 0.007 Pi + 0.159 SO4 → 0.408 12PPD-R + | 0.0071 | 0.64 | 416 | |

Combined | 10.000 Glc + 2.946 H | 0.051 | 8.4 | 419 | ||

| 1 | 2.87 | 1.000 Glc + 0.444 H2O + 0.167 SO4 → 0.444 12PPD-R + | 0 | 1.3 | 59 |

2 | 7.13 | 1.000 Glc + 0.235 H2O + 0.001 K + 0.076 NH4 + 0.007 Pi + 0.158 SO4 → 0.408 12PPD-R + | 0.0071 | 0.64 | 413 | |

Combined | 10.000 Glc + 2.949 H | 0.05 | 8.4 | 417 | ||

| 1 | 6.53 | 1.000 Glc + 0.762 H2O → 0.071 12PPD-R + | 0 | 1.3 | 77 |

2 | 3.47 | 1.000 Glc + 0.280 H | 0.014 | 0 | 424 | |

Combined | 10.000 Glc + 5.945 H | 0.05 | 8.4 | 427 | ||

| 1 | 6.53 | 1.000 Glc + 0.762 H | 0 | 1.3 | 77 |

2 | 3.47 | 1.000 Glc + 0.282 H | 0.014 | 0 | 424 | |

Combined | 10.000 Glc + 5.953 H | 0.05 | 8.4 | 427 |

Most interestingly, when the modes are examined in terms of acetate production, we find that the most efficient modes are generally those that only contribute to NGAM flux. However, a biomass-producing mode is necessary to satisfy growth and viability requirements. Thus, the problem of selecting knockouts to maximize acetate production, given a limiting resource such as glucose, is not necessarily about finding a single optimal elementary mode. Rather, competing constraints demand that the chosen modes need to complement each other well. This can be seen in the reported decompositions for the various knockout mutants.

For example, of all the biomass-producing modes, the mode arising from the five-knockout mutant is the most efficient at 1.496 mmol of acetate per mmol of glucose. When more knockouts are allowed, the overall acetate production is improved despite a decrease in the acetate production efficiency of the biomass-producing mode. This decrease is offset by a shift towards using NGAM-producing modes with significantly more efficient production of acetate. For the six and eight knockout cases, the NGAM-producing mode generates 1.690 mmol of acetate per mmol of glucose.

Finally, as FBA does not necessarily yield a unique distribution, we implemented the recursive MILP algorithm from Lee et al. [33] to find alternate optima and then obtained decompositions for the corresponding flux distributions. Our results (not shown) indicate that the decomposition into two modes with distinct functions related to NGAM and biomass production is preserved for alternate optima. Thus, the decomposition of flux distributions into primarily NGAM-producing and biomass-producing modes is a robust quality.

### Understanding the survival of MTB during infection

To illustrate another application of our approach, we consider the utilization of the glyoxylate shunt in MTB. The glyoxylate shunt enzyme isocitrate lyase (ICL), present in MTB as two isoforms, is believed to be required by microorganisms to utilize fatty acids as a source of carbon and energy. This shunt has previously been shown to be required for the *in viv* o growth and virulence of MTB [34]. Since MTB is believed to subsist on fatty acids during infection [34, 35], it is argued that, by removing ICL, MTB is no longer able to utilize fatty acids for carbon and energy and therefore unable to grow *in vivo*. Indeed, strains of MTB absent in both isoforms of ICL are unable to grow on fatty acid substrates and unable to survive in macrophages and mice [34]. Therefore, ICL has attracted significant attention as a promising drug target for treatment of MTB infection [36].

Under the assumption that the metabolic reconstruction by Beste et al. is correct, our analysis implies that MTB is capable of producing both carbon and energy from fatty acids without the use of ICL but, for lower uptake rates of fatty acids, ICL allows for more efficient utilization of the fatty acids. Indeed, we found that the optimal growth rate predicted by FBA differs only slightly with ICL present and without it (see Figure 1d). This presents the intriguing possibility that MTB possesses the metabolic capability to grow on fatty acids without ICL, but does not do so after the knockout of ICL because it has not yet undergone the adaptive evolution necessary to make use of this metabolic capability. This possibility is consistent with the work of Fong and Palsson [38] showing that the growth rate of gene deletion strains of *E. coli* can change significantly after undergoing adaptive evolution. The possible existence of such inactive routes for metabolizing fatty acids without the use of ICL in MTB has been discussed elsewhere [39] and, if true, would imply that MTB could rapidly evolve resistance to drugs inhibiting ICL.

Closer examination of the elementary modes reveals how MTB grows *in silico* without ICL and provides a testable means of confirming or rejecting the model's predictions. All the elementary modes that do not use ICL use malate synthase, the second of the two enzymes that form the glyoxylate shunt. In all of these modes, the full flux through malate synthase comes from HtrA, a gene predicted to code for 4-hydroxy-2-oxoglutarate aldolase to complete the hydroxyproline degradation pathway in MTB using the Bayesian method of Green and Karp [40]. HtrA supplies the glyoxylate that is used as a substrate by malate synthase. MTB and other mycobacteria have been observed to grow using hydroxyproline as a carbon source [41, 42], suggesting that this pathway may indeed exist. Further studies confirming and characterizing this pathway will shed light on whether it does in fact provide MTB with a viable means of producing glyoxylate.

The elementary mode decomposition analysis we have presented sheds light on the mechanics of the FBA prediction that is difficult to obtain from the FBA results alone. Specifically, HtrA in combination with malate synthase can be used in GSMN-TB to generate biomass and NGAM. However, the space of possible biomass and NGAM production rates that can be achieved in this way is smaller than that which is possible using the glyoxylate shunt. At low octadecenoate uptake rates, this difference is important, leading to a lower biomass production to meet the NGAM requirement. Furthermore, it also demonstrates the utility of our method in identifying a potential source of the discrepancy between in silico predictions and observed experimental results. If growth is not experimentally possible, even after adaptive evolution, then it suggests that the model is incorrect and the likely error in the model comes from the inclusion of the predicted gene, HtrA.

## Conclusions

We have presented a method for decomposing a given flux distribution into a set of constituent elementary modes. In contrast to previous approaches, our method does not require the initial generation of a full set of elementary modes, which is often computationally demanding and, in some cases, computationally infeasible for practical purposes. In a moderately-sized instance, we observed a computational time improvement of over 2000-fold using our method.

Overall, we see that elementary mode analysis offers a great deal that flux-balance analysis alone does not. FBA yields predictions of overall metabolic behavior, while elementary mode analysis allows understanding of metabolic capabilities. By decomposing flux distributions obtained by FBA into elementary modes, we can gain insight into how metabolic networks achieve their predictions. We exploit modularity to decompose a complex entity into a simpler entity, which enables debugging and understanding and, ultimately, more sophisticated design and engineering.

## Methods

### Genome-scale FBA modeling

*E. coli, i*AF1260. This model consists of three parts. First, from

*m*metabolites and

*n*reactions, we form an

*m*×

*n*stoichiometric matrix

*S*, whose

*ij*th element

*S*

_{ ij }is the stoichiometric coefficient of metabolite

*i*in reaction

*j*. Second, the flux distribution

*v*, whose

*j*th element

*v*

_{ j }is the flux though reaction

*j*, is constrained by a lower bound vector

*a*, whose

*j*th element

*a*

_{ j }is the lower bound of the flux through reaction

*j*, and an upper bound vector

*b*, whose

*j*th element

*b*

_{ j }> 0 is the upper bound of the flux through reaction

*j*. Finally, the linear objective is formed by multiplying the fluxes by an objective vector

*f*, whose

*j*th element

*f*

_{ j }is the weight of reaction

*j*in the biological objective. Thus, a biologically optimal flux distribution is obtained by solving

### Elementary mode decomposition

For a given flux vector *ν*, we use *R*(*ν*) = {*i*:*ν*_{
i
}≠ 0}to denote the set of indices of the reactions participating in *ν*. We define an elementary flux mode using the following two definitions [4].

#### Definition

*S*and lower and upper bound constraints

*a*and

*b*is a non-zero

*n*× 1 vector

*p*satisfying the following two conditions:

- 1.
it is a valid steady-state flux distribution, i.e.

*Sp*= 0; - 2.
irreversible reactions flow in the right direction, i.e. for all

*j*such that*a*_{ j }≥ 0, we have*p*_{ j }≥ 0.

#### Definition

We say a flux mode is *elementary* if it is minimal among all flux modes, i.e. there does not exist any other flux mode such that *R*(*p*') ⊂ *R*(*p*).

Before stating the algorithm, we require one further definition.

#### Definition

We say a flux mode *p conforms* to a flux distribution *v* if *ν*_{
j
}> 0 for all reactions *j* with *p*_{
j
}> 0 and if *ν*_{
j
}< 0 for all reactions *j* with *p*_{
j
}< 0.

Our interest is in finding elementary modes that conform to a given flux distribution *v* since it ensures that *v* is decomposed into elementary modes that have reactions flowing in the same direction as *v*, for the purposes of biological interpretation.

*v*in the feasible set of optimization problem (1) and outputs an elementary mode decomposition for

*v*, i.e. a set of elementary flux modes {

*p*

^{(k)}} that conform to

*v*and a corresponding set of positive weights {

*w*

_{ k }} such that . Suppose we have a flux distribution

*v*that satisfies

*Sv*= 0. Let

*k*: = 1, and

*v*

^{(k):}=

*v*. Choose some

*j*

_{ k }such that , and then solve the following mixed-integer linear program (MILP):

where *M* is a large constant and sgn is the sign function, taking the value 1 if its argument is positive and -1 if its argument is negative. This MILP is similar to that used by de Figueiredo et al. [25] for finding the shortest elementary modes in a metabolic network. Our purpose, however, differs significantly: we seek to decompose a given flux distribution into constituent elementary modes. The specific choice of *j*_{
k
}is unimportant: all choices such that
will generate a valid decomposition, though the specific decomposition achieved will likely vary with the choice. As discussed in the overview, we choose *j*_{
k
}= argmax_{
j
}|*v*_{
j
}|.

Let (*p**,*q**) be an optimal solution. We then set *p*^{(k)}: = *p** and
. Finally, we set *v*^{(k + 1)}: = *v*^{(k)}- *w*_{
k
}*p*^{(k)}. If *v*^{(k + 1)}is the zero vector, then we are done. Otherwise, we choose *j*_{k + 1}such that
, increment *k* by one, and repeat the above procedure.

The following proposition establishes that this algorithm generates the desired output. In brief, the algorithm works because, at each iteration, the MILP finds a flux mode where *p*_{
jk
}is non-zero and where the number of non-zero elements in the flux mode is minimized. Because the number of non-zero elements in the flux mode is minimized, the flux mode is minimal and, hence, elementary. Since each reaction with non-zero flux must participate in at least one elementary mode in the decomposition, it does not matter how *j*_{
k
}is chosen, as long as
. A valid decomposition will be generated regardless of how *j*_{
k
}is chosen, though the particular decomposition that is generated among the non-unique possibilities will depend on this choice.

#### Proposition

The elementary mode decomposition algorithm stated above terminates after a finite number of iterations *K* and generates a set of elementary flux modes {*p*^{(k)}} that conform to *v* and a corresponding set of positive weights {*w*_{
k
}} such that
.

#### Proof

First, to establish that each *p*^{(k)}is in fact a mode, we observe that any *p*^{(k)}generated as a solution of problem (2) will meet the steady state condition of the system. Problem (2) has a solution since
and *q* such that *q*_{
j
}= 1 if *p*_{
j
}≠ 0 and *q*_{
j
}= 0 otherwise is in the feasible set of the problem. Further, by constraining the components of *p*^{(k)}to have the same sign as the corresponding elements of *v*, we ensure that irreversible reactions flow in the right direction since, for any *j*, if *a*_{
j
}≥ 0 then *v*_{
j
}≥ 0, which sets the constraint *p*_{
j
}≥ 0 in problem (2).

Second, from the constraints of problem (2), we can see that *p*^{(k)}conforms to *v*.

Lastly, we establish that each *p*^{(k)}is minimal among all flux modes conforming to *v* and, therefore, elementary in the set of all such modes. We first observe that the optimal cost of problem (2) at iteration *k* is |*R* (*p*^{(k)})|. Suppose there exists a mode *p*' that conforms to *v* with *R*(*p'*) ⊂ *R*(*p*^{(k)}). If
, then we assume without loss of generality that
, and (*p',q'*), where *q'*_{
j
}= 1 if *p'*_{
j
}≠ 0 and q'_{j} = 0 otherwise, is in the feasible set of problem (2) and
, in contradiction with *p*^{(k)}being an optimal solution. If
, then let *p"* = *p*^{(k)}- *wp*', where
. Now *p"* is a mode that conforms to *v* with *R*(*p'*) ⊂ *R*(*p*^{(k)}) and (*p*", *q"*), where *q"j* = 1 if *p"j* ≠ 0 and *q"j* = 0 otherwise, is in the feasible set of problem (2) and
, resulting in the same contradiction. Hence there does not exist a mode *p'* that conforms to *v* with *R*(*p*') ⊂ *R*(*p*^{(k)}), and we conclude that *p*^{(k)}is elementary.

It is straightforward to see that *w*_{
k
}> 0 owing to its definition and that, after each iteration of the algorithm, |*R*(*v*^{(k)})| will decrease by at least 1, i.e. |*R*(*v*^{(k + 1)})| < |*R*(*v*^{(k)})|. Thus the algorithm will terminate after a finite number of iterations *K* ≤ |*R*(*v*)|. □

### Characterization of optimal metabolic behavior using given elementary modes

When calculating the elementary mode decompositions for a range of related flux distributions, as in our MTB application, it is helpful to use only a subset of all elementary modes obtained, since it likely that a subset of the modes suffices to generate valid decompositions for all the distributions. To do so, we select a subset of *K* modes {*p*^{(1)}, ..., *p*^{(k)}} out of all those obtained and use the following approach to determine if they suffice to support all the flux distributions. We successively remove modes from the subset to arrive at one that is minimal in the sense that no additional modes can be removed and still support all the flux distributions.

*P*=[

*p*

^{(1)}...

*p*

^{(K)}] and define a non-negative weight vector

*w =*[

*w*

_{1}, ...,

*w*

_{ K }] such that a flux distribution

*v*=

*Pw*. Substitution of

*v = Pw*into (1) gives a means of finding a biologically-optimal weight vector over the given set of elementary flux modes. Specifically, we solve

If the biomass derived from solving (3) corresponds with that from (1), we conclude that the given elementary modes are sufficient to characterize the flux distribution of interest, and the given modes are utilized according to the weights *w** obtained from the optimal solution of (3).

### Implementation of FBA and elementary mode decomposition

FBA and our elementary mode decomposition method were implemented using MATLAB R2010b and Gurobi 4.0. This implementation is available in additional file 1.

### Comparison to previous decomposition methods

*i*NJ661 growing on Middlebrook 7H9. The resulting distribution

*v*contained 507 non-zero components. The reactions

*j*for which

*v*

_{ j }= 0 were removed from the metabolic network, generating a reduced

*S*that was used as input to efmtool. efmtool version 4.7.1 was used with default parameters in MATLAB R2010b to generate the elementary modes for the reduced metabolic network, resulting in a 507 × 131,558 matrix

*P*containing all the elementary modes. Finally, the quadratic program

as proposed by Schwartz and Kanehisa [23] was solved using MOSEK 6.0.0.91 (MOSEK ApS, Copenhagen, Denmark).

efmtool generated the 131,558 elementary modes for the network in 1 minutes 48 seconds, while the quadratic optimization step took 32 minutes and 39 seconds, resulting in a total computational time of 34 minutes and 27 seconds. Computations were carried out on the Mac OS X 10.6.4 platform using a computer with an Intel Core 2 Duo 2.53 GHz processor with 4 GB of RAM.

For *i* NJ661v, the biologically optimal flux distribution *v* obtained by solving optimization problem (1) contained 505 non-zero components. Again, a reduced *S* was generated by removing the reactions *j* for which *v*_{
j
}= 0, and the result used as input to efmtool. With a maximum heap size of 4 GB, efmtool failed before generating all elementary modes, with 657,447 modes generated.

## Declarations

### Acknowledgements

KI gratefully acknowledges support from the Australian Government through an Australian Postgraduate Award. The authors would like to thank Belinda Chiera and Melissa Yates for careful proofreading of the manuscript.

## Authors’ Affiliations

## References

- Milne CB, Kim PJ, Eddy JA, Price ND: Accomplishments in genome-scale
*in silico*modeling for industrial and medical biotechnology. Biotechnol J. 2009, 4: 1653-1670. 10.1002/biot.200900234PubMed CentralView ArticlePubMedGoogle Scholar - Duarte NC, Becker SA, Jamshidi N, Thiele I, Mo ML, Vo TD, Srivas R, Palsson BØ: Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proceedings of the National Academy of Sciences. 2007, 104: 1777-1782. 10.1073/pnas.0610772104.View ArticleGoogle Scholar
- Ma H, Sorokin A, Mazein A, Selkov A, Selkov E, Demin O, Goryanin I: The Edinburgh human metabolic network reconstruction and its functional analysis. Mol Syst Biol. 2007, 3:Google Scholar
- Kauffman KJ, Prakash P, Edwards JS: Advances in flux balance analysis. Curr Opin Biotechnol. 2003, 14: 491-496. 10.1016/j.copbio.2003.08.001View ArticlePubMedGoogle Scholar
- Edwards JS, Ibarra RU, Palsson BØ:
*In silico*predictions of*Escherichia coli*metabolic capabilities are consistent with experimental data. Nat Biotechnol. 2001, 19: 125-130. 10.1038/84379View ArticlePubMedGoogle Scholar - Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, Karp PD, Broadbelt LJ, Hatzimanikatis V, Palsson BØ: A genome-scale metabolic reconstruction for
*Escherichia coli*K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst Biol. 2007, 3: 121-PubMed CentralView ArticlePubMedGoogle Scholar - Schuetz R, Kuepfer L, Sauer U: Systematic evaluation of objective functions for predicting intracellular fluxes in
*Escherichia coli*. Mol Syst Biol. 2007, 3: 119-PubMed CentralView ArticlePubMedGoogle Scholar - Alper H, Miyaoku K, Stephanopoulos G: Construction of lycopene-overproducing
*E. coli*strains by combining systematic and combinatorial gene knockout targets. Nat Biotechnol. 2005, 23: 612-616. 10.1038/nbt1083View ArticlePubMedGoogle Scholar - Burgard AP, Pharkya P, Maranas CD: OptKnock: A bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol Bioeng. 2003, 84: 647-657. 10.1002/bit.10803View ArticlePubMedGoogle Scholar
- Lun DS, Rockwell G, Guido NJ, Baym M, Kelner JA, Berger B, Galagan JE, Church GM: Large-scale identification of genetic design strategies using local search. Mol Syst Biol. 2009, 5: 296-PubMed CentralView ArticlePubMedGoogle Scholar
- Pharkya P, Burgard AP, Maranas CD: OptStrain: A computational framework for redesign of microbial production systems. Genome Res. 2004, 14: 2367-2376. 10.1101/gr.2872004PubMed CentralView ArticlePubMedGoogle Scholar
- Pharkya P, Maranas CD: An optimization framework for identifying reaction activation/inhibition or elimination candidates for overproduction in microbial systems. Metab Eng. 2006, 8: 1-13. 10.1016/j.ymben.2005.08.003View ArticlePubMedGoogle Scholar
- Rocha M, Maia P, Mendes R, Pinto JP, Ferreira EC, Nielsen J, Patil KR, Rocha I: Natural computation meta-heuristics for the in silico optimization of microbial strains. BMC Bioinformatics. 2008, 9: 499- 10.1186/1471-2105-9-499PubMed CentralView ArticlePubMedGoogle Scholar
- Schuster S, Hilgetag C: On elementary flux modes in biochemical reaction systems at steady state. J Biol Syst. 1994, 2: 165-182. 10.1142/S0218339094000131.View ArticleGoogle Scholar
- Schuster S, Fell DA, Dandekar T: A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nat Biotechnol. 2000, 18: 326-332. 10.1038/73786View ArticlePubMedGoogle Scholar
- Stelling J, Klamt S, Bettenbrock K, Schuster S, Gilles ED: Metabolic network structure determines key aspects of functionality and regulation. Nature. 2002, 420: 190-193. 10.1038/nature01166View ArticlePubMedGoogle Scholar
- Trinh CT, Wlaschin A, Srienc F: Elementary mode analysis: a useful metabolic pathway analysis tool for characterizing cellular metabolism. Appl Microbiol Biotechnol. 2009, 81: 813-826. 10.1007/s00253-008-1770-1PubMed CentralView ArticlePubMedGoogle Scholar
- Klamt S, Stelling J: Combinatorial Complexity of Pathway Analysis in Metabolic Networks. Mol Biol Rep. 2002, 29: 233-236. 10.1023/A:1020390132244View ArticlePubMedGoogle Scholar
- Acuña V, Chierichetti F, Lacroix V, Marchetti-Spaccamela A, Sagot MF, Stougie L: Modes and cuts in metabolic networks: Complexity and algorithms. Biosystems. 2009, 95: 51-60. 10.1016/j.biosystems.2008.06.015View ArticlePubMedGoogle Scholar
- Gagneur J, Klamt S: Computation of elementary modes: a unifying framework and the new binary approach. BMC Bioinformatics. 2004, 5: 175- 10.1186/1471-2105-5-175PubMed CentralView ArticlePubMedGoogle Scholar
- Terzer M, Stelling J: Large-scale computation of elementary flux modes with bit pattern trees. Bioinformatics. 2008, 24: 2229-2235. 10.1093/bioinformatics/btn401View ArticlePubMedGoogle Scholar
- Feist AM, Palsson BØ: The growing scope of applications of genome-scale metabolic reconstructions using
*Escherichia coli*. Nat Biotechnol. 2008, 26: 659-667. 10.1038/nbt1401PubMed CentralView ArticlePubMedGoogle Scholar - Schwartz JM, Kanehisa M: A quadratic programming approach for decomposing steady-state metabolic flux distributions onto elementary modes. Bioinformatics. 2005, 21: ii204-205. 10.1093/bioinformatics/bti1132View ArticlePubMedGoogle Scholar
- Poolman MG, Venkatesh KV, Pidcock MK, Fell DA: A method for the determination of flux in elementary modes, and its application to
*Lactobacillus rhamnosus*. Biotechnol Bioeng. 2004, 88: 601-612. 10.1002/bit.20273View ArticlePubMedGoogle Scholar - de Figueiredo LF, Podhorski A, Rubio A, Kaleta C, Beasley JE, Schuster S, Planes FJ: Computing the shortest elementary flux modes in genome-scale metabolic networks. Bioinformatics. 2009, 25: 3158-3165. 10.1093/bioinformatics/btp564View ArticlePubMedGoogle Scholar
- Colijn C, Brandes A, Zucker J, Lun DS, Weiner B, Farhat MR, Cheng TY, Moody DB, Murray M, Galagan JE: Interpreting expression data with metabolic flux models: Predicting
*Mycobacterium tuberculosis*mycolic acid production. PLoS Comp Biol. 2009, 5: e1000489-10.1371/journal.pcbi.1000489.View ArticleGoogle Scholar - Segrè D, Vitkup D, Church GM: Analysis of optimality in natural and perturbed metabolic networks. Proc Natl Acad Sci USA. 2002, 99: 15112-15117. 10.1073/pnas.232349399PubMed CentralView ArticlePubMedGoogle Scholar
- Shlomi T, Berkman O, Ruppin E: Regulatory on/off minimization of metabolic flux changes after genetic perturbations. Proc Natl Acad Sci USA. 2005, 102: 7695-7700. 10.1073/pnas.0406346102PubMed CentralView ArticlePubMedGoogle Scholar
- Achterberg T: Constraint Integer Programming. 2007, Technische Universität BerlinGoogle Scholar
- Jamshidi N, Palsson BO: Investigating the metabolic capabilities of
*Mycobacterium tuberculosis*H37Rv using the*in silico*strain*i*NJ661 and proposing alternative drug targets. BMC Syst Biol. 2007, 1: 26- 10.1186/1752-0509-1-26PubMed CentralView ArticlePubMedGoogle Scholar - Fang X, Wallqvist A, Reifman J: Development and analysis of an in vivo-compatible metabolic network of Mycobacterium tuberculosis. BMC Syst Biol. 2010, 4: 160- 10.1186/1752-0509-4-160PubMed CentralView ArticlePubMedGoogle Scholar
- Causey TB, Zhou S, Shanmugam KT, Ingram LO: Engineering the metabolism of
*Escherichia coli*W3110 for the conversion of sugar to redox-neutral and oxidized products: Homoacetate production. Proc Natl Acad Sci USA. 2003, 100: 825-832. 10.1073/pnas.0337684100PubMed CentralView ArticlePubMedGoogle Scholar - Lee S, Phalakornkule C, Domach MM, Grossmann IE: Recursive MILP model for finding all the alternate optima in LP models for metabolic networks. Computers & Chemical Engineering. 2000, 24: 711-716.View ArticleGoogle Scholar
- Munoz-Elias EJ, McKinney JD: Mycobacterium tuberculosis isocitrate lyases 1 and 2 are jointly required for in vivo growth and virulence. Nat Med. 2005, 11: 638-644. 10.1038/nm1252PubMed CentralView ArticlePubMedGoogle Scholar
- Schnappinger D, Ehrt S, Voskuil MI, Liu Y, Mangan JA, Monahan IM, Dolganov G, Efron B, Butcher PD, Nathan C, Schoolnik GK: Transcriptional Adaptation of
*Mycobacterium tuberculosis*within Macrophages: Insights into the Phagosomal Environment. J Exp Med. 2003, 198: 693-704. 10.1084/jem.20030846PubMed CentralView ArticlePubMedGoogle Scholar - Sharma V, Sharma S, Hoener zu Bentrup K, McKinney JD, Russell DG, Jacobs WR, Sacchettini JC: Structure of isocitrate lyase, a persistence factor of
*Mycobacterium tuberculosis*. Nat Struct Biol. 2000, 7: 663-668. 10.1038/77964View ArticlePubMedGoogle Scholar - Beste DJ, Hooper T, Stewart G, Bonde B, Avignone-Rossa C, Bushell ME, Wheeler P, Klamt S, Kierzek AM, McFadden J: GSMN-TB: a web-based genome-scale network model of
*Mycobacterium tuberculosis*metabolism. Genome Biol. 2007, 8: R89- 10.1186/gb-2007-8-5-r89PubMed CentralView ArticlePubMedGoogle Scholar - Fong SS, Palsson BØ: Metabolic gene-deletion strains of
*Escherichia coli*evolve to computationally predicted growth phenotypes. Nat Genet. 2004, 36: 1056-1058. 10.1038/ng1432View ArticlePubMedGoogle Scholar - Purohit HJ, Cheema S, Lal S, Raut CP, Kalia VC: In Search of Drug Targets for Mycobacterium tuberculosis. Infect Disord Drug Targets. 2007, 7: 245-250. 10.2174/187152607782110068View ArticlePubMedGoogle Scholar
- Green ML, Karp PD: A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics. 2004, 5: 76- 10.1186/1471-2105-5-76PubMed CentralView ArticlePubMedGoogle Scholar
- Indira M, Sirsi M: Nutritional studies on Mycobacterium tuberculosis--effect of amino acids on the 'in vitro' growth of Mycobacterium tuberculosis. Indian J Tuberculosis. 1960, 7: 75-82.Google Scholar
- Krulwich TA, Pelliccione NJ: Catabolic pathways of coryneforms, nocardias, and mycobacteria. Annu Rev Microbiol. 1979, 33: 95-111. 10.1146/annurev.mi.33.100179.000523View ArticlePubMedGoogle Scholar

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.