In this section we present two alternative procedures to obtain steady state solutions of possibly large under-determined metabolic networks. The first approach uses a symbolic GJE algorithm that guarantees exact solutions and the second approach uses SVD implemented in a numerical algorithm that ought to give better performance. In addition, we describe a method for applying constraints to the steady state solution.
Statement of the steady state problem
Every chemical reaction and thus reaction system has the strict requirement of conservation of mass. A system of mass balances around each species has the form:
where
is a length m vector of the time derivative for each mass density of metabolic species, N is the m × n stoichiometric matrix that links metabolites to their reactions via stoichiometry, and ν is a length n vector that describes the flux through each reaction. For a system at steady state with n reactions and m species, the system of chemical reactions becomes:
The number of flux variables that need to be specified to calculate a viable steady state is f = n - r where r is the rank of N. Let us denote the vectors of dependent and independent flux variables as νdep and νindep of length r and f, respectively. Then with a n × n permutation matrix P that reorders the columns of N such that columns corresponding to dependent flux variables appear earliest, the steady state Equation (2) reads
where
and N = [N1N2] PT. Clearly, when the m × r matrix N1 is regular (m = r and det N1 ≠ 0), the relation between νdep and νindep vectors can be computed directly:
However, for many metabolic networks the stoichiometric matrix N may contain linearly dependent rows (r < m). In addition, νindep or P are not known in advance.
In the following we consider two methods based on GJE and SVD procedures that solve Equation (2) for the relation between dependent and independent flux variables: The general solution is written as:
We identify R as a kernel of the steady state solution where the columns are flux basis vectors and νindep are flux coordinates.
Solving the steady state problem via GJE
Solving the steady state problem via GJE is based on transforming the stoichiometric matrix N to a row-echelon form NGJE where all columns corresponding to dependent flux variables would have exactly one nonzero element and Equation (5) can be easily composed (
is identity matrix and hence
). The column permutation matrix P is constructed during the GJE process while applying the leading row and column selection rules (pivot element selection). One of the advantage of using GJE is that it allows one to influence the pivot element selection rules so that a preferred flux basis for the system will be obtained. If the selected flux variables cannot form a basis, the routine will move one or more of the preselected independent variables to become dependent.
Note that in numerical GJE algorithms the typical leading row and column selection rule consists of choosing a pivot element with largest absolute value for maximal numerical stability. Symbolic GJE algorithms that calculate in fractions avoid numerical rounding errors and can implement more optimal selection rules that take into account the sparsity of the system. In SympyCore [11] the leading row and column selection rule consists of choosing such a pivot element that minimizes the number of row operations for minimal computation time.
Solving the steady state problem via SVD
Solving the steady state problem via SVD is based on decomposing the stoichiometric matrix N into a dot product of three matrices:
where u, v = [VimVker] are orthogonal matrices and σ is a diagonal matrix with nonzero values on the diagonal. The solution to the steady state Equation (2) is
where α is a f vector of arbitrary parameters. Note that the SVD approach does not provide a numerically reliable and efficient way to determine the vectors of dependent and independent flux variables and in the following we use these in the form of the permutation matrix P found from the GJE approach:
which gives Equation (5b):
where Vindep is a regular f × f matrix.
Processing and analysis of metabolic networks
SBML models of metabolic networks were obtained from the BiGG database [42]. During the parsing all floating point numbers were converted to fractional numbers. All species that did not participate in any reactions were excluded. Species that are appear as both a reactant and product, i.e. in polymerization reactions, were removed from the list of reactants, and an additional reaction transporting this species across the system boundary was added.
Each metabolic network was transformed into open form using the following rule: if a species participated in exactly one reaction, a reaction transporting this species across the system boundary was added. As an alternative rule used in the example yeast network, if all transport reactions out of the system are known, then transformation to open form is accomplished by removing rows for the species that are external to the system.
Both of these approaches result in equivalent steady state solutions because adding extra reactions extends linear pathways that each contain a species that exits the system. Both approaches were applied to the example yeast network: external species were removed to calculate the flux distribution in Figure 3 and additional file 2: yeast_example.pdf while the algorithm to add extra transport reactions was used to calculate the values in Table 1.
Composing the example yeast network
The example yeast network given in Figure 3 was manually composed for analyzing carbon isotope dynamics, and thus excludes metabolites that do not participate in carbon rearrangement, i.e. cofactors. To simplify the model, Carbon 3 of histidine (by InChI carbon number) was assumed to come from bicarbonate, and not Carbon 2 of ATP. Similarly, Carbon 1 of methionine was also assumed to come from bicarbonate, and not 5-Methyltetrahydropteroyltri-L-glutamate. In addition, the glyoxylate cycle and thus the third pathway for producing glycine was removed. All relevant details of the network including metabolite abbreviations, reaction definitions, the steady state solution, and substituted flux values used to constrain the system are given in additional file 2: yeast_example.pdf.
The example yeast network makes use of fictitious metabolites that link the stoichiometry of coupled reactions. The three pentose phosphate pathway reactions are broken into two parts each linked with a fictitious metabolite that represents the carbon skeleton that is broken off of one metabolite in the first step and transferred to the next. In the additional file 2: yeast_example.pdf fictitious metabolite names start with either a capital X, Y, or Z, followed by a lower case Greek number indicating the number of carbons they contain followed by a section indicating their use. This latter section is either the yeast enzyme they participate in, the code for the metabolite they are derived from, or GOG indicating the transfer of glutamate to 2-Oxoglutarate.
Applying constraints to the steady state solution
The GJE routine provides a flux based coordinate system to describe the steady state flux space while SVD provides an orthogonal coordinate system. When specifying a flux value that is part of a flux coordinate system, one dimension from the steady state flux space is removed.
Many different sets of independent flux variables can form a coordinate system for the steady state flux space. The GJE routine allows the researcher to specify which flux variables forms a flux coordinate system and thus can match the choice of coordinate system with the experimental data available. The basis vectors formed from a flux coordinate system are often sparse and tend to span connected portions of the metabolism.
Let us assume a relation between dependent and independent flux variables as given in Equation (5). In addition to that, let us assume some constraining knowledge about the dependent variables, for example, the flux positivity for irreversible reactions:
for some i ∈ [1; r]. The problem being solved is how the constraints on νdep constrain the independent flux variables νindep. The goal is to determine how the steady state flux space is bounded. This is useful for many techniques used to analyze the properties of metabolic networks, for example in optimization procedures that must scan the steady state flux space while avoiding regions that are not feasible [32].
To find the constraints for νindep, we set up the following system:
where g × f matrix G and g vector b define g measured data constraints for νindep; q × r matrix Q that defines q positivity constraints for νdep. The system in Equation (10) defines a convex polytope and due to the constraining parts it is redundant. The redundancy can be removed by using the following geometric computation algorithm: solve the vertex enumeration problem for the convex polytope defined by Equation (10) and then using the obtained vertexes and rays solve the facet enumeration problem. The solution to the facet enumeration problem is a set of inequalities that has no redundancies and defines the same convex polytope as Equation (10). Note that the intermediate result of the vertex enumeration problem (polytope vertexes and rays) provides convenient information to volume scanning applications.
Computational software and error analysis
The GJE results of this paper are obtained using a Python package SympyCore [11] that implements both memory and processor efficient sparse matrix structures and manipulation algorithms. For solving the steady state problem we are using the symbolic matrix object method get_gauss_jordan_elimination_operations that allows one to specify the list of preferred leading columns (that is, the preferred list of dependent flux variables) for the GJE algorithm and after applying the GJE process the method returns a matrix object that is in row-echelon form. In addition to that, the method returns also a list of all applied row operations that can be later efficiently applied to other matrix objects. This feature is especially useful for adding extra columns to a stoichiometric matrix and then applying GJE process without the need to recompute the row-echelon form of the original matrix. One could use this to add transport reactions to a metabolic network during the constraint process.
The SVD results of this paper were obtained using a Python package NumPy [43] that provides a function numpy.linalg.svd for computing SVD of an array object. NumPy was built with LAPACK and ATLAS (version 3.8.3) libraries that provide a state-of-the-art routine (dgesdd) for computing SVD.
Since the results obtained with the symbolic GJE routine are correct and the results of the numerical SVD routine contain numerical rounding errors then in the error analysis we are using maximal relative flux error
where
and
are matrix elements in Equation (5) obtained with GJE and SVD routines, respectively. Note that εSVD characterizes relative errors in dependent flux variables introduced by the numerical SVD routine.
For solving vertex and facet enumeration problems we use a Python package pycddlib [44], a wrapper of the cddlib (version 094g) that implements the double description method [40].
The Python scripts used for computing the results are available in SympyCore [45]. The performance timings were obtained on a Ubuntu Linux dual-core (AMD Phenom(tm) II X2 550) computer with 4GB RAM.