The generalized mass action model of cellular metabolism describes the mass balance of metabolites, taking into account all metabolic influxes and effluxes and their stoichiometric ratios, as follows:
(1)
where X(t,p) is the vector of metabolic concentration time profiles, S ∈ Rm × n is the stoichiometric matrix for m metabolites that participate in n reactions, and v(X,p) denotes the vector of metabolic fluxes (i.e. reaction rates). Here, each flux is described by a power-law equation:
(2)
where γ
j
is the rate constant of the j-th flux and f
ji
is the kinetic order parameter, representing the influence of metabolite X
i
on the j-th flux (positive: X
i
is an activating factor or a substrate, negative: X
i
is an inhibiting factor). In incremental parameter identification, a data pre-processing step (e.g. smoothing or filtering) is usually applied to the noisy time-course concentration data X
m
(t
k
), in order to improve the time-slope estimates. Subsequently, the dynamic metabolic fluxes v(t
k
) are estimated from Equation (1) by substituting with Finally, the kinetic parameters associated with the j-th flux (i.e. γ
j
and f
ji
’s) can be calculated using a least square regression of the power law flux function in Equation (2) against the estimated v
j
(t
k
). Note that for GMA models, the least square parameter regressions in the last step are linear in the logarithmic scale and thus, can be performed very efficiently.
A unique set of dynamic flux values v(t
k
) can only be computed from when the number of metabolites exceeds that of fluxes. However, a metabolite in general can participate in more than one metabolic flux (m < n). In such a situation, there exist an infinite number of dynamic flux combinations v(t
k
) that satisfy The dimensionality of the set of flux solutions is equal to the degree of freedom (DOF), given by the difference between the number of fluxes and the number of metabolites: n
DOF
= n-m >0 (assuming S has a full row rank, i.e. there is no redundant ODE in Equation (1)). The positive DOF means that the values of n
DOF
selected fluxes can be independently set, from which the remaining fluxes can be computed. This relationship forms the basis of the proposed estimation method, in which the model goodness of fit to data is optimized by adjusting only a subset of parameters associated with the independent fluxes above.
Specifically, we start by decomposing the fluxes into two groups: v(t
k
) = [ v
I
(t
k
)T v
D
(t
k
)T ]T , where the subscripts I and D denote the independent and dependent subset, respectively. Then, the parameter vector p and the stoichiometric matrix S can be structured correspondingly as p = [ p
I
p
D
] and S = [ S
I
S
D
]. The relationship between the independent and dependent fluxes can be formulated by rearranging into:
(3)
In this case, given p
I
, one can compute the independent fluxes v
I
(X
m
(t
k
),p
I
) using the concentration data X
m
(t
k
), and subsequently obtain v
D
(t
k
) from Equation (3). Finally, p
D
can be estimated by a simple least square fitting of v
D
(X
m
(t
k
),p
D
) to the computed v
D
(t
k
) one flux at a time, when there are more time points than the number of parameters in each flux.
In this study, two formulations of the parameter estimation of ODE models in Equation (1) are investigated, involving the minimization of concentration and slope errors. The objective function for the concentration error is given by
(4)
and that for the slope error is given by
(5)
where K denotes the total number of measurement time points and X(t
k
,p) is the concentration prediction (i.e. the solution to the ODE model in Equation (1)). Figure1 describes the formulation of the incremental parameter estimation and the procedure for computing the objective functions. Note that the computation of ΦC requires an integration of the ODE model and thus, the estimation using this objective function is expected to be computationally costlier than that using ΦS. On the other hand, metabolic mass balance is only approximately satisfied at discrete time points t
k
during the parameter estimation using ΦS, as the ODE model is not integrated.
There are several important practical considerations in the implementation of the proposed method. The first consideration is on the selection of the independent fluxes. Here, the set of these fluxes is selected such that (i) the m × m submatrix S
D
is invertible, (ii) the total number of the independent parameters p
I
is small, and (iii) the prior knowledge of the corresponding p
I
is maximized. The last two aspects should lead to a reduction in the parameter search space and the cost of finding the global optimal solution of the minimization problem in Figure1. The second consideration is regarding constraints in the parameter estimation. Biologically relevant values of parameters are often available, providing lower and/or upper bounds for the parameter estimates. In addition, enzymatic reactions in the ODE model are often assumed to be irreversible and thus, dynamic flux estimates are constrained to be positive. Hence, the parameter estimation involves a constrained minimization problem, for which many global optimization algorithms exist.
So far, we have assumed that the time-course concentration data are available for all metabolites. However, the method above can be modified to accommodate more general circumstances, in which data for one or several metabolites are missing. In this case, the ODE model is first rewritten to separate the mass balances associated with measured and unmeasured metabolites, such that
(6)
where the subscripts M and U refer to components that correspond to measured and unmeasured metabolites, respectively. Again, if the fluxes are split into two categories v
I
and v
D
as above, the following relationship still applies for the measured metabolites:
(7)
Naturally, the degree of freedom associated with the dynamic flux estimation is higher by the number of component in X
U
than before. Figure2 presents a modification of the parameter estimation procedure in Figure1 to handle the case of missing data, in which an additional step involving the simulation of unmeasured metabolites will be performed. In this integration, X
M
is set as an external variable, whose time-profiles are interpolated from the measured concentrations. The set of independent fluxes v
I
are now selected to include all fluxes that appear in and those that lead to a full column ranked S
D,M
. If S
D,M
is a non-square matrix, then a pseudo-inverse will be done in Equation (7). Of course, the same considerations mentioned above are equally relevant in this case. Note that the initial conditions of X
U
will also need to be estimated.