Incorporation of enzyme concentrations into FBA and identification of optimal metabolic pathways
© De et al. 2008
Received: 31 March 2008
Accepted: 18 July 2008
Published: 18 July 2008
Skip to main content
© De et al. 2008
Received: 31 March 2008
Accepted: 18 July 2008
Published: 18 July 2008
In the present article, we propose a method for determining optimal metabolic pathways in terms of the level of concentration of the enzymes catalyzing various reactions in the entire metabolic network. The method, first of all, generates data on reaction fluxes in a pathway based on steady state condition. A set of constraints is formulated incorporating weighting coefficients corresponding to concentration of enzymes catalyzing reactions in the pathway. Finally, the rate of yield of the target metabolite, starting with a given substrate, is maximized in order to identify an optimal pathway through these weighting coefficients.
The effectiveness of the present method is demonstrated on two synthetic systems existing in the literature, two pentose phosphate, two glycolytic pathways, core carbon metabolism and a large network of carotenoid biosynthesis pathway of various organisms belonging to different phylogeny. A comparative study with the existing extreme pathway analysis also forms a part of this investigation. Biological relevance and validation of the results are provided. Finally, the impact of the method on metabolic engineering is explained with a few examples.
The method may be viewed as determining an optimal set of enzymes that is required to get an optimal metabolic pathway. Although it is a simple one, it has been able to identify a carotenoid biosynthesis pathway and the optimal pathway of core carbon metabolic network that is closer to some earlier investigations than that obtained by the extreme pathway analysis. Moreover, the present method has identified correctly optimal pathways for pentose phosphate and glycolytic pathways. It has been mentioned using some examples how the method can suitably be used in the context of metabolic engineering.
Metabolism is a complex process that takes place for producing energy and forms the driving force for cellular activity. It involves a large number of chemical reactions/conversions carried out by living organisms as they feed, grow and reproduce. A cascade of such reactions/conversions form a highly branched network. A metabolic network consists of many reactions and transport processes associated with the production and depletion of cellular metabolites. Metabolic pathways are defined as coordinated series of biochemical reactions in which the product of one reaction is the reactant of the subsequent one in the chain. Examples of metabolic pathways include Glycolysis, the Krebs cycle and the Pentose phosphate pathways.
There exist various categories of data models for analyzing metabolic pathways. The huge amount of genomic data, available at present, has led to the construction of genome-scale models of metabolism . The biological information from genomes can be extracted by constructing computational models and subsequently making predictions from them [2, 3]. Flux balance analysis is a constraint-based approach [4–6] that spans the closed solution space within which many steady state solutions would lie. Optimization techniques are used to find out a single state, within this space of allowed states, which reflects the actual flux distribution of the cell under a defined set of nutrient conditions [7, 8]. The utilities of such modeling include predicting systems behavior, identifying crucial steps in systems regulation. In , Cascante et. al. have shown how this kind of modeling can be used for characterizing fermentation pathway of S. cerevisiae. Moreover, modeling and analysis of metabolic networks may be useful to perform rational drug design .
Reactions in a metabolic pathway are mostly enzymatic. That is, for a reaction A → B catalyzed by an enzyme E, the rate of production of B depends not only on the concentration of the substrate A but also on the concentration of E that is available for catalyzing the reaction. Assuming that sufficient amount of the substrate A being present, if the concentration of E is low (high) then the rate of production of B will also be low (high). In the extreme pathway analysis (one of the methods under flux balance approach) , the authors have considered the reaction flux but not the enzyme concentration. This motivates us to develop a new method that considers both the substrate and enzyme concentration, thereby it becomes somewhat closer to real life situations than what extreme pathway analysis offers. We intend to undertake this endeavor in the present article.
The effectiveness of the present method is demonstrated on two synthetic systems designed in [12, 13], on two pentose phosphate, two glycolytic pathways , one large carotenoid biosynthesis pathway  and a network of core carbon metabolism  of various organisms belonging to different phylogeny. The method is compared with the existing extreme pathway analysis . The major differences of the present method from the existing extreme pathway analysis have been pointed out. Finally, we provide biological relevance of the results. A possible validation from biological point of view along with the salient features of the method is also included. It has been demonstrated with a few examples that the method can be appropriately applied to the problems of metabolic engineering.
The proposed method is described in the methodology section. Here we provide a comparative analysis of the present method with extreme pathway analysis using two synthetic [12, 13] and four different real life pathways. Real life pathways include pentose phosphate and glycolytic pathways of E. coli K-12 MG1655, T. pallidum and P. falciparum, a large network of carotenoid biosynthesis [16–20] and a network of core carbon metabolism . All these real life pathways are obtained from the KEGG database . In order to restrict the size of the article, we have provided a brief account on these real-life pathways in the Additional File 1. Some of the results are included here while the others are provided in the Additional File 1 for restricting the size of the article.
For the system in Fig. 2, we want to maximize the rate of yield of the metabolite P, starting from the substrate A. The system is composed of 10 reactions R 1, R 2,...,R 10 involving 6 metabolites A, B, C, D, E and P. The reactions R 2 and R 8 are reversible. We associate the weighting factors c 1, c 2,...,c 10 corresponding to the enzymes catalyzing these reactions respectively. There are 6 internal fluxes R 5, R 6,...,R 10 and 4 exchange fluxes R 1, R 2,...,R 4 as depicted in Fig. 2. The constraints as mentioned in the methodology section, and following  are as follows:
α = (0, -∞, 0, 0, 0, 0, 0, -∞, 0, 0); β = (1, 0, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞).
The terms α 2 and α 8 are -∞ because R 2 and R 8 are reversible. Moreover, β 2 = 0 because exclusive growth on substrate A is considered. Finally, we assume that the maximal uptake rate of A is 1 (β 1 = 1). Following the method described in the methodology section, we have generated a set of flux vectors.
In order to maximize the rate of yield of P for growth on substrate A, the objective function y (Eq. (7)) is minimized, where z is given by z = c 9 v 9 + c 10 v 10 - c 3 v 3. We vary the value of λ from 0.1 to 1.0. Initially, we should always give stress on the maximization of the rate of yield rather than on the constraint. That is, initially λ should be kept small. As we go from λ = 0.1 to λ = 1.0, it implies that we are increasing the stress on the constraint, and finally both the rate of yield (z) and the constraint are treated equally. For each value of λ, we minimize y, and consider that set of c i -values corresponding to the λ-value as the final solution, for which y becomes minimum. Here we have obtained an optimal pathway as R 1 → R 5 → R 9 → R 3, which is in accordance with earlier investigations . The optimal pathway is obtained for λ = 1.0 in 85 iterations.
Some possible paths
Average quantity (z) of P
R1 → R5 → R9 → R3
c1 = 0.88, c5 = 0.80, c9 = 0.80, c3 = 0.86
R1 → R6 → R8 → R9 → R3
c1 = 0.88, c6 = 0.56, c8 = 0.57, c9 = 0.80, c3 = 0.86
R1 → R5 → R8 → R10 → R3
c1 = 0.88, c5 = 0.80, c8 = 0.57, c10 = 0.04, c3 = 0.86
R1 → R6 → R10 → R3
c1 = 0.88, c6 = 0.56, c10 = 0.04, c3 = 0.86
R1 → R7 → R10 → R3
c1 = 0.88, c7 = 0.18, c10 = 0.04, c3 = 0.86
Variation of c-values and average z with the upper bound on reaction fluxes for the optimal path R 1 → R 5 → R 9 → R 3 of the system in Fig. 2
Upper bound on flux value
Average quantity (z) of P
c1 = 0.93, c5 = 0.83, c9 = 0.91, c3 = 0.85
c1 = 0.91, c5 = 0.92, c9 = 0.82, c3 = 0.98
c1 = 0.91, c5 = 0.85, c9 = 0.83, c3 = 0.91
c1 = 0.88, c5 = 0.84, c9 = 0.81, c3 = 0.86
c1 = 0.87, c5 = 0.82, c9 = 0.86, c3 = 0.83
c1 = 0.84, c5 = 0.89, c9 = 0.87, c3 = 0.82
c1 = 0.94, c5 = 0.87, c9 = 0.89, c3 = 0.81
c1 = 0.89, c5 = 0.86, c9 = 0.85, c3 = 0.85
c1 = 0.86, c5 = 0.84, c9 = 0.81, c3 = 0.82
c1 = 0.98, c5 = 0.93, c9 = 0.91, c3 = 0.87
The glycolytic pathway in T. pallidum consists of 13 metabolites and 25 fluxes (Fig. 3). The starting metabolite is α -D-Glucose-1P and the target product is phosphoenolpyruvate. Thus we maximize the rate of yield z = c 25 v 25 - c 24 v 24 of phosphoenolpyruvate, starting from the substrate α-D-Glucose-1P. Here an optimal pathway has been obtained as α - D - Gluucose - 1P → α - D - Gluucose - 6P → β - D - Fructose - 6P → β - D - Fructose - 1, 6P2 → Glyceraldehyde - 3P → Glycerate - 1, 3P2 → Glycerate - 3P → Glycerate - 2P → Phosphoenolpyruvate in 100 iterations as shown by bold (black) arrows. The optimal pathway is obtained for λ = 0.9.
We have applied the method to analyze a simple example that represents the skeleton network of core carbon metabolism [15, 23]. The network includes 23 reactions, seven of which are regulated by four regulatory proteins. The internal metabolites are A, B, C, D, E, F, O2, NADH, ATP and the external metabolites are Carbon 1, Carbon 2, D ext , E ext , F ext , H ext and Oxygen. This network is a highly simplified representation of core metabolic processes including a glycolytic pathway with carbon 1 (C1) and carbon 2 (C2) as primary substrates, as well as a pentose phosphate pathway and a TCA cycle, through which amino acid H enters into the system. Fermentation pathways, amino acid biosynthesis, cell growth along with corresponding regulation (e.g. catabolite repression, aerobic/anaerobic regulation, and carbon storage regulation) are also included. The growth reaction is indicated by white arrows. For further details of the pathway, one may refer to [15, 23].
Our methodology aims at obtaining the optimal metabolic flux distribution within the solution space. In this study, R z , the biomass flux, is defined as C + F + H + 10AT P → 1 Biomass, and needs to be maximized. Applying the proposed methodology, the optimal pathway has been found to be: A → B → C → G → C → Biomass for λ = 0.8 in 95 iterations.
Extreme pathways are a set of generating vectors that describe the conical steady-state solution space for flux distributions through an entire metabolic network. These cone-generating vectors correspond to biochemical pathways. The optimal metabolic pathways are calculated using linear optimization and are then interpreted using the extreme pathways. For details of the method, one may refer to [11, 24]. Here we demonstrate a comparative analysis of the present method with the extreme pathway analysis [11, 24]. The comparative analysis has been done on all the above mentioned pathways. Optimal pathways obtained by extreme pathway analysis for the two synthetic systems (Fig. 2, and Fig. 8 in Additional File 1) are the same as that obtained by the present method. Similarly, for pentose phosphate and glycolytic pathways in E. coli K-12MG1655 (Figs. 10 and 11 in Additional File 1), optimal pathways are the same as obtained by both the methods.
For P. falciparum, an optimal pentose phosphate pathway α - D - Glucose - 6P → β - D - Fructose - 6P → D - Xylulose - 5P → D - Glyceraldehyde - 3P as obtained by the extreme pathway analysis was found to be different from that obtained by the present method as shown in Fig. 12 in Additional File 1. The glycolytic pathway in the organism T. pallidum as obtained by the present algorithm has been found to be different from the previously identified optimal pathway by the extreme pathway analysis. For the latter case, it was found to be α - D - Glucose - 1P → α - D - Glucose - 6P → β - D - Glucose - 6P → β - D - Fructose - 6P → β - D - Fructose - 1, 6P2 → Glyceraldehyde - 3P → Glycerate - 1, 3P2 → Glycerate - 3P → Glycerate - 2P → Phosphoenolpyruvate as shown by bold (white) arrows in Fig. 3. Using the extreme pathway analysis we have obtained a different carotenoid biosynthesis pathway: Phytoene → 9, 9' - Di - cis - ζ - Carotene → 7, 9, 7', 9' - Tetra - cis - lycopene → Lycopene → γ - Carotene → β - Carotene → β - Cryptoxanthin → Zeaxanthin → Antheraxanthin → Violaxanthin → 9 - cis - Violaxanthin → Xanthoxin → Abscisic aldehyde → Abscisic alcohol as shown by bold (white) arrows in Fig. 4.
For the carotenoid biosynthesis pathway, the path obtained by the proposed method is different from the extreme pathway analysis at the starting branch and at the end of the branch of the optimal path.
Starting from Phytoene there are two paths to arrive at the intermediate metabolite Lycopene. Of the two paths, the proposed method found three metabolites, Phytofluene, ζ-Carotene and Neurosporene while the extreme pathway analysis identified two metabolites, 9,9'-Di-cis-ζ-Carotene and 7,9,7',9'-Tetra-cis-lycopene, to reach Lycopene, as can be seen from Fig. 4. The paths identified by the above two methods are different till it reaches the metabolite Lycopene. From Lycopene both the methods follow the same path till they arrive at the other intermediate metabolite Violaxanthin. From Violaxanthin extreme pathway method found one intermediate metabolite, 9-cis-Violaxanthin to arrive at the other intermediate metabolite Xanthoxin. The proposed method found two intermediate metabolites Neoxanthin and 9'-cis-Neoxanthin to reach Xanthoxin. So we can conclude that from Violaxanthin the paths obtained by the above two methods are differnt till it arrives at the metabolite Xanthoxin. From Xanthoxin both the methods follow the same path to reach the target metabolite Abscisic alcohol.
Comparison of the flux values obtained by the proposed method and the extreme pathway analysis for the core carbon metabolic network in Fig. 5
(Extreme path- way analysis)
|v av - v max |/v av × 100%
The values of v max in Table 3 corresponding to extreme pathway analysis were obtained by imposing certain environmental and regulatory constraints mentioned in , while the proposed method, for simplicity, does not consider such constraints. Moreover, for computing an average flux value by our method, we have taken average of a distribution of such flux values while the method of extreme pathway analysis determines the flux value. Finally, it may be mentioned here that we have developed the methodology accommodating certain characteristics of a system (i.e. within a specific metabolic system).
We did not consider the characteristics outside the system, from which the external fluxes enter into it. However, the method developed here has produced consistent results that have been validated by randomizing the starting point in generating flux vectors.
Comparison of flux values obtained by the proposed method and the extreme pathway analysis for the system in Fig. 2
Average flux value (proposed method)
Flux value (Extreme pathway analysis)
(v av )
(v epa )
|v av - v epa |/v av × 100%
This section provides how the results obtained by the present method as well as extreme pathway analysis are relevant to some biological facts already observed by other researchers. Salient features of the present method along with true/false positive/negative scenarios are also depicted.
Here we demonstrate how the results obtained by the present method are biologically more relevant than those obtained by the extreme pathway analysis.
In the carotenoid biosynthesis pathway, there are two possible paths from the initial metabolite phytoene, producing 9,9'-Di-cis-ζ-Carotene in one branch and phytofluene in another branch. Of the two possible paths, the branch that produces phytofluene as the intermediate metabolite is observed in [25, 26], which is the same as obtained by our proposed method (Fig. 4). It is to be mentioned here that the other path has been identified by the extreme pathway analysis.
As we proceed along the path, we observe that there are 4 possible paths emerging from the intermediate metabolite Neurosporene (Fig. 9 in Additional File 1). The path that produces α-Zeacarotene and Hydroxy-neurosporene are not biochemically feasible as they do not lead to the target metabolite Abscisic alcohol. Of the remaining two paths, the path producing Lycopene is obtained by the present method (Fig. 4). The path that leads from phytoene to lycopene through the intermediate paths as mentioned above can be found in fungi [27, 28]. Lycopene is also found to be an intermediate in the biosynthesis of other carotenoids, in some bacteria, fungi and green plants . Thus both the present and extreme pathway analysis have correctly identified Lycopene as an intermediate over the other alternative β-Zeacarotene (Fig. 4).
There are 4 possible paths emerging from the intermediate metabolite Lycopene (Fig. 9 in Additional File 1). The paths producing δ-Carotene, 3,4-Dehydrolycopene and Rhodopin as the intermediate metabolites are not possible as they do not lead to the final product Abscisic alcohol. We can reach the target metabolite Abscisic alcohol only through the path that produces γ-carotene. There are 7 possible paths from the intermediate metabolite γ-carotene (Fig. 9 in Additional File 1). The paths yielding Chlorobactene, 1'-Hydroxy- γ-carotene, Myxol, Deoxymyxol, (2'S)-Deoxymyxol 2'-α-L- fucoside and 1'2'-Dihydro-γ-carotene do not terminate to the target metabolite Abscisic alcohol, and hence they are not biochemically feasible. The target metabolite can be obtained through the path producing β-carotene. The biosynthesis pathway for β-carotene has been determined for fungi such as Phycomyces blakesleeanus and Neurospora crassa . β-carotene is also synthesized by a number of bacteria, fungi, and most green plants . The path from β-carotene producing Echinenone terminates at the end products Adonixanthin and Astaxanthin. The other path terminates at the end product Isorenieratene. The above mentioned two paths are not possible as none of them yields Abscisic alcohol that is the desired target metabolite of carotenoid biosynthesis (Fig. 9 in Additional File 1). The path producing β-Cryptoxanthin is the only feasible path as it finally leads to the target metabolite.
From β-Cryptoxanthin there are two paths producing Thermo-biszeaxanthin and Zeaxanthin as the two products. The path producing Zeaxanthin is followed as it terminates to the desired end metabolite. The path from β-carotene to Zeaxanthin can be found in Flavobacterium Species . The conversion between Zeaxanthin and Antheraxanthin is a reversible one, and as the forward reaction rate is greater than the reverse rate, the pathway from Zeaxanthin to Antheraxanthin is favored. As Capsanthin obtained from Antheraxanthin is not the desired end product (Fig. 9 in Additional File 1), this path is not followed. Violaxanthin obtained from Antheraxanthin via the reversible reaction ultimately leads to the target metabolite Abscisic alcohol. Capsorubin is obtained from Violaxanthin but this path is not followed as this does not yield the desired target metabolite Abscisic alcohol. Neoxanthin is produced from Violaxanthin. Arabidopsis is the best-characterized plant system of carotenoid biosynthesis. The path from β-carotene to Neoxanthin for xanthophyll biosynthesis has been observed in Arabidopsis. It may be emphasized that this path was identified by the present method but not by the extreme pathway analysis (Fig. 4).
Similarly, there exists a single path from Neoxanthin to 9'-cis-Neoxanthin and from 9'-cis-Neoxanthin to Xanthoxin. Two paths are emerging from Xanthoxin producing Abscisic aldehyde in one branch and Xanthoxic acid in the other. The path leading to Xanthoxic acid is not followed as this does not lead to the final metabolite of carotenoid biosynthetis. The other one producing Abscisic aldehyde is followed as it terminates to the target metabolite Abscisic alcohol by the subsequent reaction. Two intermediate metabolites, 9'-cis-neoxanthin and 9-cis-violaxanthin, have been identified in light-grown and etiolated leaves, and in roots of a variety of species . Biochemical evidence has suggested the occurrence of this pathway in various eukaryotes and in archaea .
The existence of the sugar phosphates Glyceraldehyde-3P, Ribulose-5P, Xylulose-5P, Fructose-6P and Glucose-6P in the pentose-phosphate pathway (PPP) are found in [34–36] (Figs. 10 and 12 in Additional File 1). The major pathway for glucose-6P metabolism in E. coli in  is the same as obtained from our proposed methodology (Fig. 11 in Additional File 1).
Here we highlight some of the salient features of our method and try to argue that the results obtained thereof might not be a mathematical artefact. The present method maximizes the rate of production of biomass which in a way should be consistent with the law of mass action and subsumes the free energy minimization principle. Then we use the stoichiometric matrix based on these two fundamental and model independent principles. In our results, we are able to identify the true positive and the true negative scenarios correctly which could be a pointer to the fact that our method has not introduced any artefact in its formulation. As for the intermediate scenario our method for real systems so far has not produced any false positive or false negative results.
Considering the pentose phosphate pathway in the organism E. coli K-12 MG1655 (Fig. 10 in Additional File 1) we have found the path starting from the metabolite α-D-Glucose-6P to reach the target metabolite D-Glyceraldehyde-3P and D-Fructose-6P. We have observed the relative values of the components of the flux vector v that are involved in the aforesaid resulting pathway. The value of v 6 is greater than v 5, and the value of v 21 is greater than v 22. This leads us to obtain the target metabolite.
Starting from any intermediate metabolite, e.g., β-D-Glucose-6P, D-Glucono-1,5 lactone-6P and 6-Phospho-D-Gluconate, we were able to reach the target metabolite D-Glyceraldehyde-3P and D-Fructose-6P. While considering a particular intermediate metabolite, we noted the relative values of the components of the flux vector v. We observed that the relative values of the components of the flux vector v while obtaining the path from any intermediate metabolite to the target are of the same order of magnitude as that obtained by considering the original path from α-D-Glucose-6P to reach the target metabolite.
Moreover, we considered some other metabolite as starting substrate that are not on optimal path, and found that they did not lead to the target metabolite. For example, starting from D-Ribose-5P, we were able to reach the metabolite PRPP for most of the values of λ lying between 0.1 and 1.0, in steps of 0.1. For low values of λ, we could reach the metabolite D-Erythrose-4P and D-Fructose-6P. Thus we can conclude that any intermediate metabolite on optimal path produces the target metabolite, and it is independent of the starting metabolite. As desired, the metabolite not on optimal path do not lead to the target metabolite D-Glyceraldehyde-3P and D-Fructose-6P even via the reverse path. Similar observations were found for the other pathways.
We took two intermediate metabolites β-D-Glucose-6P and D-Glucono-1,5 lactone-6P that are indeed on the final path as identified by the present method, and we found that in most of the cases it led to the target. This shows that the method that we have developed is independent of choice of the initial substrate. Then we chose another substrate which also belongs to the path identified by us but this time the substrate could lead to various branches, of which one would eventually lead us to the target. We found that for certain range of values of the parameter λ, this would always lead us to the target, picking up the correct branch that is most of the time followed by the organism. Now for certain other ranges of the values of the parameter λ, the other branches in the pathway were picked up. In all the cases, the relative strength of the vector v reflects the correct strength that would drive the path from the starting substrate to the target. We have observed this feature for all the real life paths.
The kind of constraint that we have imposed on to the systems must have captured the essential biochemistry of the systems. That is why the method becomes independent of the choice of the various substrates within the conscensus pathway and makes our methodology quite a general one without centering around any specific model of the system.
The method developed in this article may be useful for manipulating a metabolic pathway to achieve some desired goals constituting some tasks of metabolic engineering. Here we describe a few examples where this method may be useful.
Let us consider the synthetic pathway in Fig. 2 and redraw it in Fig. 6 incorporating the enzymes e 1, e 2,...e 10 mediating the reactions R 1, R 2,...R 10, respectively. For this system, we have already determined R 1 → R 5 → R 9 → R 3 as an optimal pathway through which the amount of the product P becomes maximum. We have also found that the concentration of the enzymes e 1, e 2,...e 10 that is required to get this optimal pathway is 0.88 for e 1, 0.80 for e 5, 0.80 for e 9 and 0.86 for e 3. For some reasons, let us say, the concentration of the enzyme e 5 becomes low (~0). Under this situation, the amount of the target product will also be less. On application of the present method on this system, we would be able to identify an optimal pathway and thereby the reason behind the situation. Then we can make necessary arrangement to activate the corresponding gene and thereby leading to the formation of the enzyme in higher concentration.
Consider the example of reducing the amount of acetate in glycolytic pathway as done by Yang et al . Here the authors have proposed a method of adding a new path of forming Acetoin for reducing the amount of acetate. However, this problem may boil down to determining an optimal metabolic pathway through which the amount of acetate is minimum. Then we can apply our method to this problem for determining this optimal pathway and finally inhibit the other paths but only express this optimal path. This will lead to the formation of acetate to a minimum amount. The amount of acetate will be minimum if this optimal path is only expressed and the others are inhibited.
For the third example, let us assume there are n paths P 1, P 2,...P n starting from a given substrate to yield a given target metabolite. Also assume that out of these n pathways, there are multiple optimal pathways P 1, P 2,...P m (m <n) through which the amount of the target metabolite becomes maximum. Now if we want to avoid a particular pathway, we may inhibit (by some means) the genes producing some of the enzymes catalyzing the reactions in that pathway.
Here we have developed a simple method for identifying an optimal metabolic pathway through which a metabolite attains a maximum rate of growth on a given substrate. The method involves formulation of the rate of yield of a metabolite incorporating weighting coefficients indicating the concentration levels of enzymes catalyzing biochemical reactions in the pathway. A new constraint incorporating these weighting coefficients has been defined. Using the method, a set of flux vectors has been generated, which has then been used to determine a set of above-mentioned weighting coefficients giving rise to a maximum rate of yield of a metabolite, starting from a given substrate. The entire method is based on well known flux balancing approach.
It is to be mentioned here that the extreme pathway analysis  does not consider the effect of enzymes catalyzing the reactions in a metabolic pathway. On the other hand, the method developed in this article involves enzyme concentration in its formulation; thereby it is closer to the real life situations than the extreme pathway analysis. The other difference between the said two methods is that extreme pathway analysis finds the flux vectors through optimization, whereas the present method generates a subset of possible flux vectors and finds an optimal pathway in terms of weighting coefficients reflecting enzyme concentration. Moreover, the extreme pathway analysis considers individual reactions in the pathway in a sequential manner, whereas the present method considers all the reactions in parallel.
It has been observed that the method though simple enough, is able to identify the optimal pathways which conform to the results of some earlier studies. The method can suitably be used using reaction databases without going into complex mathematical calculations, and without using various kinetic parameters that are hard to be estimated. Comparative analysis of the results obtained by the present method with that of the extreme pathway analysis shows that the present method has been able to identify optimal pathways correctly for almost all the pentose phosphate and glycolytic pathways considered here. The present method has identified a carotenoid biosynthesis pathway that is closer to some earlier investigations than that obtained by the extreme pathway analysis. All the optimal real life pathways have been biologically validated. Finally, possible direct impact of the method on certain problems of metabolic engineering has been pointed out.
Here we have assumed that a large amount of substrate is present. This assumption implies that any influx of the substrate from the other pathways does not have any effect on the rate of production of the corresponding product, due to limited supply of enzymes. Moreover, for simplicity, we have not considered any feedback inhibition on the enzyme activity. In other words, we are considering only the fraction of enzyme molecules that have not been inactivated due to feedback inhibitions. Incorporation of feedback inhibitions on enzyme activity forms a scope for further investigation.
In biochemical networks, crosstalk often occurs, which deals with multiple inputs and overlapping outputs . Here we are dealing with metabolic networks. If crosstalk occurs in the networks under consideration then there may be more than one disjoint sources (metabolites) from which the target or any other intermediate metabolites on the pathway under consideration are found. In this case, we have to consider all these input metabolites of the other networks while constructing the stoichiometric matrix and generating the flux vectors. However, this forms a scope for further investigation.
In this section we describe the proposed method. First of all, we make some assumptions based on which we describe the method subsequently.
Here we assume that a large amount of substrate is present. Thus any sudden influx of the substrate from other pathways does not effect any change of the rate of production of the corresponding product. This is due to the limited availability of the enzymes in a pathway. In other words, the ratio of enzyme concentration to substrate concentration is very low. For simplicity, we have not considered any feedback inhibition on the enzyme activity. In other words, we are considering only the fraction of enzyme molecules that have not been inactivated due to feedback inhibitions.
A metabolic reaction network is a collection of enzymatic reactions and transport processes. A system boundary can be drawn around all these types of reactions that constitute internal fluxes operating inside the network. The system is closed to the passage of certain metabolites while others are allowed to enter and/or exit the system based on external sources and/or sinks that are operating on the network. The existence of an external source/sink on a metabolite necessitates the introduction of an exchange flux, which allows a metabolite to enter or exit the system boundary. These fluxes represent the inputs and outputs of the system.
Thus the term z needs to be maximized for yielding maximum rate of growth of B. Here v k is the flux of the reaction R k involving only the metabolite B . The term c k in [0,1] denotes the weighting factor corresponding to this reaction R k . c k indicates the level of concentration of the enzyme catalyzing the reaction R k . c k = 1 indicates that the required amount of the enzyme catalyzing the reaction R k is present. On the other hand, c k → 0 indicates that sufficient amount of enzyme is not present to carry out the reaction. Higher the value of c k , higher is the concentration of the enzyme and vice-versa. The term v k is considered to be positive if the reaction R k yields the metabolite B, otherwise it is negative. A reversible reaction is considered as two separate reactions corresponding to forward and backward reactions. It is to be mentioned here that the role of c i in Eq. (1) in extreme pathway analysis  is different from that in the present method. In the earlier case, c is a unit vector, along a particular flux, whereas in the present method, c indicates the level of concentration of the various enzymes catalyzing the reactions in the network.
For solving the above-mentioned maximization problem, we require the values of the flux vectors v = [v 1, v 2,...,v n ] T that cannot be obtained easily as the full dynamics is not known or becomes intrackable in most of the scenarios. In order to overcome this situation, we now propose an algorithm for generating flux vectors that satisfy approximately the quasi-steady state condition . That is, we generate those v which satisfies
S.v ≈ 0 (2)
and the inequalities in (4) and (5). Here S is the m × n stoichiometric matrix  with m as the number of metabolites and n as the number of reactions. From a reaction database, S can be computed. Then the flux vectors v form the null space of S. In extreme pathway analysis, the approximately equality sign in Eq. (2) is replaced by the strict equality sign as the system is in a steady state scenario. The proposed method generates the flux vectors v as linear combinations of the basis vectors spanning the null space of S, and those satisfying the inequalities (4) and (5). We cannot always guarantee (due to finite memory, and the problem of overflow or underflow for representing a numerical value) the strict equality in Eq. (2) for the flux vectors that are generated. Thus we have considered approximate equality sign instead of strict equality. Since in practical situations, n > m, Eq. (2) is under determined. So we proceed as follows:
Generate basis vectors v b that form the null space of the stoichiometric matrix S. Let the number of such basis vectors be l. (This is done by using standard routines and toolboxes of MATLAB.)
(i) Generate l number of positive random numbers a p , p = 1, 2,...,l.
until certain inequality constraint on v is satisfied for all its components. All the internal fluxes are non-negative yielding 
v i ≥ 0, ∀i (4)
The constraints on the exchange fluxes depends on their directions . These constraints can be expressed as
α j ≤ v j ≤ β j (5)
where α j and β j are either zero, or negative and positive infinity, respectively, based on the direction of the exchange flux. The activity of these exchange fluxes is considered to be positive if the metabolite is exiting or being produced by the system, and negative if the metabolite is entering or being consumed by the system. For all metabolites in which a source or sink may be present, the exchange flux can operate in a bidirectional manner and is unconstrained. Under the existence of a source (input), α j is set to negative infinity and β j to zero. On the other hand, if only a sink (output) exists on the metabolite, α j is set to zero and β j to positive infinity. If both a source and a sink are present for the metabolite then the exchange flux is bidirectional with α j set to negative infinity and β j to positive infinity leaving the exchange flux unconstrained. For further details on these issues, one may refer to .
Thus we generate a large number of flux vectors, satisfying the inequality constraints, which form the data set. The flux vectors along with their corresponding weighting factors are used to determine z. The optimization algorithm searches through this generated data set, and estimates the values of the weighting coefficients c k (Eq. (7)) on convergence.
Eq. (2) describes the quasi-steady state condition, which assumes that the concentration of the enzymes catalyzing various reactions in the network is present in the system at the required level. In other words, the genes that produce these enzymes need to be expressed at the required level. For a variety of reasons, in real systems, the genes that produce these enzymes may not be expressed at the required level. This imposes restrictions on the system, and for this purpose, we define a new constraint as
S.(C.v) = 0 (6)
Here C is an n × n diagonal matrix whose diagonal elements are the components of the vector c. That is, if C = [γ ij ]n × n, then γ ij = δ ij c i , where δ ij is the Kronecker delta. Note that c i is the weighting factor corresponding to the enzyme catalyzing ith reaction in the network, irrespective of whether the reaction involves the metabolite B or not.
Thus the problem of determining a metabolic pathway yielding maximum rate of production of a metabolite B starting from a substrate A, boils down to a maximization problem, where z is maximized with respect to c, subject to satisfying the constraint given in Eq. (6).
Combining Eq. (1) and Eq. (6), we can reformulate the objective function as
y = 1/z + Λ T .(S.(C.v)) (7)
Thus the modified value of c i is given by
c i (t + 1) = c i (t) + Δc i , ∀i, t = 0, 1, 2,...
c i (t + 1) is the value of c i at iteration (t + 1), which is computed based on the c i -value at the iteration t. Regularization parameter λ is chosen empirically. Here we are varying the value of λ from 0.1 to 1.0 in steps of 0.1. Using the above mentioned method, for each value of λ, we finally get c i -values for which y attains a minimum value. For each value of λ, y is minimized. We choose the specific λ-value for which the y-value is the minimum over all the minima obtained for different values of λ. The c-values corresponding to this minimum y are finally considered.
The vector c corresponds to the flux vector v. That is, its ith component c i (c i ε [0,1]) is associated with the flux v i of the ith reaction of a metabolic network. On minimization of y, some of the c i values will attain non-zero values in [0,1] and the others are very close to zero. We consider a metabolic path to be an optimal one, if y is minimum and the c i -values of all the enzymes catalyzing the reactions in that path are greater than zero. Otherwise, a low c-value (close to zero) corresponding to an enzyme catalyzing an intermediate reaction may result in an insufficient product. This will reduce the rate of the next reactions and hence the amount of the target metabolite. In other words, these non-zero c i -values indicate an optimal pathway through which the rate of yield of metabolite B, being grown on the substrate A, becomes maximum. The major differences of the method from the existing extreme pathway analysis  are as follows.
Unlike the extreme pathway analysis, the present method considers the presence of enzymes.
Extreme pathway analysis finds the flux vectors upon optimization, whereas the present method generates a set of some possible flux vectors and finds an optimal pathway in terms of weighting coefficients reflecting enzyme concentration.
Extreme pathway analysis considers individual reactions in the pathway in a sequential manner, whereas the present method considers all the reactions in parallel.
The value of c corresponding to an enzyme E may be estimated in vitro in the following way. Let us assume the following enzymatic reactions
where S and P stand for substrate and product respectively. The terms k 1, k 2 and k 3 are rate constants. Let us also assume that x mole of S can produce y mole of P. Under this situation, assume that [E min ] is the minimum concentration of the enzyme E that is required to obtain the maximal rate (V max ) of product formation. Thus an estimate of c may be taken as
c = [E]/[E min ]
where [E] is the concentration level of the enzyme E which is required to get an optimal path. Note that the values of V max , and the rate constants can be estimated in vitro . Thus E min can also be determined through 
V max = k3[E min ]
and thereby [E] using c-value obtained by our method. On the other hand, if [E] can be determined in vitro, then the theoretical value of c (obtained by the proposed method) can be verified with the experimental value.
Dr. S. Mukhopadhyay gratefully acknowledges the financial assistance received in the form of a grant, BT/BI/04/001/93 and BT/BI/10/019/99 from the Department of Biotechnology, Government of India.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.