Like most methods of kinetic parameter estimation, we assume that temperature and pressure are constant, so rate constants in mass action kinetic equations are constant, and the Gibbs Free Energy of Formation is constant. We also assume that the measured system is at steady state, meaning that the time derivatives of metabolite concentrations and reaction rates are zero.

Also, we assume that there are measurements of stable reactants and products of enzyme reactions, but not substrate-enzyme complexes, product-enzyme complexes and free enzyme concentrations, as they are generally difficult to measure experimentally. It is assumed that metabolite concentrations are obtained by high-throughput methods, such as chromatography, mass spectroscopy, or nuclear magnetic resonance spectroscopy [15]. For example, reasonably accurate concentration data can be obtained by mass spectroscopy with internal standards. Normally, average value of coefficient of variation for mass spectrometry below 0.2 is considered as good measurements [16–18], and thus it is not unreasonable to expect such data to have a constant coefficient of variation (i.e., normally distributed relative error) of 20 *%*.

We also assume that it is possible to obtain measurements of reaction rates. For steady state reaction rate measurement, one widely used method is *C*
^{13} labeling, which uses a cell culture at steady state in a medium with labeled-carbon substrates. Reaction rates can be determined by analyzing the labeling pattern of targeted metabolites from mass spectrometry [19]. In addition, we assume that the Gibb’s Free Energies of Formation of metabolites are known, since these are used to compute the equilibrium constants (*K*
_{
eq
}) for enzymatic reactions.

### Single substrate and product reversible reactions

We use a standard simple but general reaction mechanism to represent most metabolic reversible reactions [20]. This subsection considers single reactant/product case. The more general case consisting of multiple reactants and multiple products will be discussed later. The reaction is a three step process, namely binding, conversion and release:

$$ \mathrm{a} + \mathrm{E}\; {\underset{k_{\text{-}1}}{\overset{k_{1}}\rightleftharpoons}} \; \text{aE} \;{\underset{k_{\text{-}2}}{\overset{k_{2}}\rightleftharpoons}}\; \text{bE} \; {\underset{k_{\text{-}3}}{\overset{k_{3}}\rightleftharpoons}} \; \mathrm{b} + \mathrm{E} $$

((1))

where *a* is the reactant, *b* is the product, *E* is the free enzyme, *aE* and *bE* are the intermediate complexes, and *k*
_{
i
} and *k*
_{−i
} are reaction rate constants for *i*∈{1,2,3}.

Assuming the reaction is at steady state, an equation for the reaction rate can be written as:

$$ v = \frac{K_{eq}[\!a]-[\!b]}{c_{1} + c_{2}[\!a] + c_{3}[\!b]} $$

((2))

where \(K_{\textit {eq}} = \frac {k_{1}k_{2}k_{3}}{k_{-1}k_{-2}k_{-3}}\) is an equilibrium constant, obtained from the Standard Gibbs Free Energy of Formation of the reactants and products.

*c*
_{1} is

$$\left(\frac{k_{2}k_{3}}{k_{-1}k_{-2}k_{-3}} + \frac{k_{3}}{k_{-2}k_{-3}} + \frac{1}{k_{-3}}\right)/\left[E_{tot}\right], $$

*c*
_{2} is

$$\left(\frac{k_{1}k_{2}}{k_{-1}k_{-2}k_{-3}} + \frac{k_{1}k_{3}}{k_{-1}k_{-2}k_{-3}} + \frac{k_{1}}{k_{-1}k_{-3}}\right)/[\!E_{tot}], $$

*c*
_{3} is

$$\left(\frac{1}{k_{-2}} + \frac{1}{k_{-1}} + \frac{k_{2}}{k_{-1}k_{-2}}\right)/[\!E_{tot}], $$

and [ *E*
_{
tot
}], the total enzyme, is [ *E*]+[ *a*
*E*]+[ *b*
*E*].

If *K*
_{
eq
} is very large and the reversible reactions’ rate constants (*k*
_{−2} and *k*
_{−3}) are small, *c*
_{3} can be neglected and the rate Eq. 2 can be reduced to standard irreversible Michaelis Menten equation.

This rate equation can be derived from the ordinary differential equations for mass action kinetics of a reaction (1), by setting the derivatives of the concentrations of all chemical species to zero (since the system is assumed to be at steady state) and solving for [ *E*
_{
tot
}]. The detailed derivation and calculation for the steady state equation and equilibrium constant are presented in Additional files 2 and 3.

### Parameter estimation by maximum likelihood for single substrate/product reversible reaction

The InVEst method estimates the parameters of kinetic rate Eq. (2) using maximum likelihood, assuming relative error in all measurements. Parameters are estimated from a set of *n* experiments, each with data values for *a*
_{
i
} (substrate), *b*
_{
i
} (product), *v*
_{
i
} (reaction rate), for experiment *i*.

Each data value has some known relative error. Specifically, we have *a*
_{
i
}=*a*
_{
i0}
*ε*
_{
a
}, *b*
_{
i
}=*b*
_{
i0}
*ε*
_{
b
} and *v*
_{
i
}=*v*
_{
i0}
*ε*
_{
v
}, where *a*
_{
i0}, *b*
_{
i0}, and *v*
_{
i0} are latent variables representing the data values without measurement error, multiplied by a normally distributed error with mean 1 and standard deviation *σ*: \(\epsilon _{x} \sim N \left (1,{\sigma _{x}^{2}}\right)\) (where *x* is *a*, *b*, or *v*).

The likelihood function is:

$$\begin{aligned} L(a_{i0},b_{i0},v_{i0},c_{1},c_{2},c_{3} ; a_{i},b_{i},v_{i})= \\ f(a_{i},b_{i},v_{i} ; a_{i0},b_{i0},v_{i0},c_{1},c_{2},c_{3}) \end{aligned} $$

Since each data acquisition can be carried out independently [21], errors in *a*, *b* and *v* can be assumed to be independent of *c*
_{1},*c*
_{2} and *c*
_{3} and each other, the likelihood function can be written as

$$\begin{aligned} & f(a_{i},b_{i},v_{i} ; a_{i0},b_{i0},v_{i0}) = \\ & \prod f(a_{i} ; a_{i0}) \prod f(b_{i} ; b_{i0}) \prod f(v_{i} ; v_{i0}) \end{aligned} $$

The distribution of *a*
_{
i
} is

$$N\left(a_{i0},a_{i0}^{2}{\sigma_{a}^{2}}\right) = \frac{1}{\sqrt{2\pi a_{i0}^{2}{\sigma_{a}^{2}}}}\exp\left(-\frac{(a_{i}-a_{i0})^{2}}{2a_{i0}^{2}{\sigma_{a}^{2}}}\right) $$

The distributions of the other data values are similar.

The parameters that maximize the likelihood also maximize the log of the likelihood, which is

$${} {\fontsize{9.4pt}{9.6pt}\selectfont{\begin{aligned} \log(L)= &\sum_{i=1}^{n}\left(-\log\left(a_{i0}\sigma_{a}\sqrt{2\pi}\right)\right) + \sum_{i=1}^{n}\left(-\frac{(a_{i}-a_{i0})^{2}}{2a_{i0}^{2}{\sigma_{a}^{2}}}\right) \\ &+\sum_{i=1}^{n}\left(-\log\left(b_{i0}\sigma_{b}\sqrt{2\pi}\right)\right) + \sum_{i=1}^{n}\left(-\frac{(b_{i}-b_{i0})^{2}}{2b_{i0}^{2}{\sigma_{b}^{2}}}\right) \\ &+\sum_{i=1}^{n}\left(-\log\left(v_{i0}\sigma_{v}\sqrt{2\pi}\right)\right) +\sum_{i=1}^{n}\left(-\frac{(v_{i}-v_{i0})^{2}}{2v_{i0}^{2}{\sigma_{v}^{2}}}\right) \end{aligned}}} $$

Negating the log likelihood and dropping constant factors yields an objective function to minimize, subject to the constraints of Eq. 2.

$$\begin{aligned} \min &\left(\sum_{i=1}^{n}(\log(a_{i0})+\log(b_{i0})+\log(v_{i0})) \right.\\ &+ \frac{1}{2{\sigma_{a}^{2}}}\sum_{i=1}^{n}\left(\frac{a_{i}}{a_{i0}}-1\right)^{2} + \frac{1}{2{\sigma_{b}^{2}}}\sum_{i=1}^{n}\left(\frac{b_{i}}{b_{i0}}-1\right)^{2} \\ &\left. + \frac{1}{2{\sigma_{v}^{2}}}\sum_{i=1}^{n}\left(\frac{v_{i}}{v_{i0}}-1\right)^{2} \right)\\ \text{s.t.}& \quad v_{i0} = \frac{K_{eq}a_{i0} - b_{i0}}{c_{1} + c_{2}a_{i0} +c_{3}b_{i0}}, \text{\ where \(i = 1,2,\cdots,n\)} \end{aligned} $$

where all the *a*
_{
i
}, *b*
_{
i
} and *v*
_{
i
} are experimental measurements, all the relative errors *σ* are known and *a*
_{
i0}, *b*
_{
i0}, *v*
_{
i0} are latent variables, and *c*
_{1}, *c*
_{2} and *c*
_{3} are the parameters to be estimated by solving the optimization problem.

In the implementation, this is simplified to an unconstrained optimization problem by substituting the right-hand side of Eq. 2 for *v*
_{
i0}.

### Generalization to multiple substrates and products

For reactions with multiple substrates and products, there are two possible mechanisms, namely single-displacement and double-displacement. For single-displacement reactions, the order of substrates binding to the enzyme can be random or ordered. Those two type of reactions can be approximated by following reaction [22]:

$$\begin{aligned} & \mathrm{a}_{1} + \mathrm{a}_{2} + \cdots + \mathrm{a}_{\mathrm{m}} + \mathrm{E} \; {\underset{k_{\text{-}1}}{\overset{k_{1}}\rightleftharpoons}} \; \mathrm{a}_{1}\mathrm{a}_{2}\cdots \mathrm{a}_{\mathrm{m}}\mathrm{E} \\ & {\underset{k_{\text{-}2}}{\overset{k_{2}}\rightleftharpoons}} \; \mathrm{b}_{1}\mathrm{b}_{2}\cdots \mathrm{b}_{\mathrm{p}}\mathrm{E}\; {\underset{k_{\text{-}3}}{\overset{k_{3}}\rightleftharpoons}} \;\mathrm{b}_{1} + \mathrm{b}_{2} + \cdots + \mathrm{b}_{\mathrm{p}} + \mathrm{E} \end{aligned} $$

where *m* is the number of reactants and *p* is the number of products in this reaction.

A steady state equation can be derived as in the single reactant/product case:

$$ v = \frac{K_{eq}\prod\limits_{j=1}^{m}[\!a_{j}]-\prod\limits_{j=1}^{p}[\!b_{j}]} {c_{1} + c_{2}\prod\limits_{j=1}^{m}[\!a_{j}] + c_{3}\prod\limits_{j=1}^{p}[\!b_{j}]} $$

((3))

where *c*
_{1}, *c*
_{2}, *c*
_{3}, *K*
_{
eq
}, and *E*
_{
tot
} are as before.

The derivation of the objective function to minimize in order to find the parameters that maximize the likelihood is a straightforward generalization of the single substrate/product case.

$${} {\fontsize{9.4pt}{9.6pt}\selectfont{\begin{aligned} \min &\left(\sum_{i=1}^{n}\sum_{j=1}^{m}\log(a_{ij0}) +\sum_{i=1}^{n}\sum_{j=1}^{p}\log(b_{ij0}) +\sum_{i=1}^{n}\log(v_{i0}) \right.\\ &+ \frac{1}{2{\sigma_{a}^{2}}}\sum_{i=1}^{n}\sum_{j=1}^{m}\left(\frac{a_{ij}}{a_{ij0}}-1\right)^{2} + \frac{1}{2{\sigma_{b}^{2}}}\sum_{i=1}^{n}\sum_{j=1}^{p}\left(\frac{b_{ij}}{b_{ij0}}-1\right)^{2} \\ &+ \left. \frac{1}{2{\sigma_{v}^{2}}}\sum_{i=1}^{n}\left(\frac{v_{i}}{v_{i0}}-1\right)^{2} \right) \end{aligned}}} $$

which is maximized subject to the constraints of Eq. 3.

In the implementation, this can also be simplified to an unconstrained optimization problem by substituting the right-hand side of Eq. 3 for *v*
_{
i0}.

### Parameter identifiability

It is sometimes not possible to obtain *in vivo* data whose values are well enough distributed to estimate all parameters accurately. In this section, we characterize some cases when parameters cannot be accurately estimated. From Eq. (2), it is clear that when one term in the denominator is much smaller than the others, *v* is relatively insensitive to the corresponding parameter. For example, if *c*
_{1},*c*
_{2}
*a*≫*c*
_{3}
*b*, then Eq. 2 will be approximately

$$ v = \frac{K_{eq}[\!a]-[\!b]}{c_{1} + c_{2}[\!a]}, $$

So changes in *c*
_{3} will have little effect on *v*. More importantly, changes in data values resulting from erroneous estimates of *c*
_{3} will be small relative to the noise in the data, so estimates of *c*
_{3} tend to have large errors. Similarly, estimates of *c*
_{1} tend to have large errors when *c*
_{2}
*a*+*c*
_{3}
*b*≫*c*
_{1} and estimates of *c*
_{2} have large errors when *c*
_{1}+*c*
_{3}
*b*≫*c*
_{2}
*a*.

For illustration, consider the simpler case when *K*
_{
eq
} is very large and the rate Eq. (2) can be approximated by the standard Michaelis Menten equation. In Fig. 1(a), two data sets derived from the same actual parameters have large *a*
_{
i
}, so the *v*
_{
i
} values lie near the maximum value of the curve. We call this region as saturation region since reaction rates asymptotically approach a maximum level, and additional increases in the substrate concentration do not lead to an increase in the reaction rates. In this case, *c*
_{2}, which determines the maximum value, is the only parameter that affects the curve fit, so estimates of *c*
_{1} from both data sets have large errors. In Fig. 1(b), all of the substrate concentration *a*
_{
i
} values are small, so the points lie near the region where the curve is increasing linearly. We call this region as linear region since reaction rates increase in almost a linear fashion with increasing substrate concentrations. The slope in this region is determined by *c*
_{1} almost independently of *c*
_{2} so estimates of *c*
_{2} have large errors.

Estimates of the accuracy of parameter estimates must be obtained using the available data. InVEst uses bootstrapping to estimate the variance of the parameter estimates.

### Bootstrap estimation of standard error

The *c* parameter estimates can vary widely in accuracy, depending on the experimental data. Bootstrapping [23] is used to estimate the relative standard errors and bias of the parameter estimates, so users can tell whether the parameter estimation is good or not. Let \(\hat {c}\) be the estimate from the data, and \(\hat {c_{i}}^{*}\) be the estimate from a bootstrap sample. A typical recommendation is to use *N*=*n*
^{2} bootstrap samples for *n* experimental measurements [24]. The bootstrap estimation of standard errors is calculated from \(SE_{B}(\hat {c}) = \left [\frac {1}{N}\sum (\hat {c_{i}}^{*}-\hat {c})^{2}\right ]^{\frac {1}{2}}\) and bias estimation is calculated by \(\mathit {Bias} = \frac {1}{N}\sum \hat {c_{i}}^{*}-\hat {c}\)[25]. As the *c* parameters have a large range of possible values, it is more appropriate to use relative errors and relative bias to describe the estimate. The relative standard error is calculated by \(SE_{B}/\hat {c}\) and the relative bias is calculated by \(\mathit {Bias}/\hat {c}\).

### Estimation of total enzyme change

Estimating kinetic parameters can be useful for identifying the effects of genetic changes or drug treatments that target metabolic enzymes. The total concentration of the enzyme in the cell may change because of changes in gene expression or loss of function in one or more copies of the gene coding for the enzyme, or the activity may change because of changes in the protein sequence or post translational modifications. Estimating these changes for specific enzymes in each sample can help identify the target of a mutation or drug (it’s the enzyme whose activity changes the most), and may be useful for estimating the impact of such a change on flux through a network.

Since each of the kinetic parameters *c*
_{
i
} is of the form *c*
*i*′/*E*
_{
tot
}, where *c*
*i*′ is independent of the enzyme concentration, *E*
_{
tot
} can be estimated from the ratio

$$ \frac{E_{tot}^{wt}}{E_{tot}^{mt}} = \frac{c_{i}^{mt}}{c_{i}^{wt}} $$

where \(c_{i}^{wt}\) and \(c_{i}^{mt}\) are corresponding *c*
_{
i
} parameters (*i* = 1, 2 or 3) for wild type and mutant (or drug treated) samples. Note that it is possible to obtain a reliable estimate for *E*
_{
tot
} whenever there are reliable estimates for one of the three parameters in both samples.