- Methodology article
- Open Access

# A method for zooming of nonlinear models of biochemical systems

- Mikael Sunnåker
^{1, 2, 3}Email author, - Gunnar Cedersund
^{4, 5, 6}and - Mats Jirstrand
^{1}

**5**:140

https://doi.org/10.1186/1752-0509-5-140

© Sunnåker et al; licensee BioMed Central Ltd. 2011

**Received:**3 February 2011**Accepted:**7 September 2011**Published:**7 September 2011

## Abstract

### Background

Models of biochemical systems are typically complex, which may complicate the discovery of cardinal biochemical principles. It is therefore important to single out the parts of a model that are essential for the function of the system, so that the remaining non-essential parts can be eliminated. However, each component of a mechanistic model has a clear biochemical interpretation, and it is desirable to conserve as much of this interpretability as possible in the reduction process. Furthermore, it is of great advantage if we can translate predictions from the reduced model to the original model.

### Results

In this paper we present a novel method for model reduction that generates reduced models with a clear biochemical interpretation. Unlike conventional methods for model reduction our method enables the mapping of predictions by the reduced model to the corresponding detailed predictions by the original model. The method is based on proper lumping of state variables interacting on short time scales and on the computation of fraction parameters, which serve as the link between the reduced model and the original model. We illustrate the advantages of the proposed method by applying it to two biochemical models. The first model is of modest size and is commonly occurring as a part of larger models. The second model describes glucose transport across the cell membrane in baker's yeast. Both models can be significantly reduced with the proposed method, at the same time as the interpretability is conserved.

### Conclusions

We introduce a novel method for reduction of biochemical models that is compatible with the concept of zooming. Zooming allows the modeler to work on different levels of model granularity, and enables a direct interpretation of how modifications to the model on one level affect the model on other levels in the hierarchy. The method extends the applicability of the method that was previously developed for zooming of linear biochemical models to nonlinear models.

## Background

One of the main reasons for the rapid growth of the field of systems biology is that it makes extensive use of mathematical modeling [1–3]. This allows for a better handling of high complexity, which is an inherent property of all living systems. Using modeling, complex hypotheses can be formulated and tested in a more systematic manner than is possible using only biochemical reasoning [4–6]. However, even if one can obtain a detailed model of the system with a high predictive power, the model in itself does not automatically lead to a full understanding of the underlying biochemistry. One should for instance analyze the model to single out its essence, i.e., to identify those parts of the model that can be eliminated, while still preserving the model's crucial behavior. This latter task is referred to as model reduction, and it is the topic of this paper. There is an extensive literature available on the topic of model reduction. However, most of these studies have been done outside the field of systems biology, and since systems biology brings about new types of challenges, reduction of biochemical models is still in its early stages. Traditional engineering approaches like balanced truncation have focused on preserving the input-output profile in an optimal manner, both for linear [7–10], and for nonlinear [11] systems. However, these methods are not suitable for systems biology, because the reduced model has no natural interpretation in itself (nevertheless, some special cases where this problem can be circumvented have been identified [12, 13]). This lack of interpretation is a problem because systems biology models are usually developed to help characterizing the dominating parts and structure of the system, and not only to obtain a black-box predictor. Methods have therefore been developed with traditional chemical approaches that are more centered on reducing the internal dynamics of the system. These methods are typically based on a sensitivity analysis [14–17], on time-scale separation [18–21], or on the lumping of state variables [22–26] (see [20] for a general review on model reduction). The perhaps most widely used method is lumping. Two of the main reasons for this are that an effective lumping scheme can be identified from basic properties of the model (e.g., the stoichiometry), and that lumped state variables are formed as easily interpretable pools of state variables in the original model. However, lumping does normally not come with the possibility of back-translation from the lumped state variables in the reduced model to the original state variables. In [27] we provided such relations. This means that we can take the result from a simulation of a reduced model, and without performing a new simulation, directly compute the corresponding trajectories of the desired original state variables. Because of this back-translation possibility, we refer to the resulting two models as two degrees of zooming of the same model. Nevertheless, like in other recent model reduction papers in systems biology [28–32], the results in [27] were mainly developed with linear systems in mind. Linear systems virtually only appear in the cases of mono-molecular reaction networks and for models describing the probabilistic evolution of a single protein complex [27, 33]. However, already in [27] we proposed that zooming may in principle also be applicable to nonlinear models, but we did not derive formulae for back-translation. Note that a majority of the currently available systems biology models are in fact nonlinear.

With the method introduced in this paper, we provide the extension of the previously proposed method in [27] to nonlinear models. We show that new challenges arise due to the nonlinearities, but also how these challenges can be overcome, for instance with a wise choice of state variables in the reduced model. The method is demonstrated by application to two closed models of metabolic systems.

## Methods

In this paper we present a more general version of the method that was introduced in [27], which is applicable to nonlinear models. We start with some basic definitions and key observations that are illustrated on a small example model, before we turn to the details of the method.

### Basic Definitions and Assumptions

*t*denotes time; the dot over

*x*in Eq. (1) denotes derivative w.r.t. to time; the state vector

*x*∈ ℝ

^{ n }; the parameters

*p*∈ ℝ

^{ p }; the inputs

*u*∈ ℝ

^{ m }; the outputs

*y*∈ ℝ

^{ l }; and

*f*and

*h*are in general nonlinear functions. The state vector, whose individual elements are referred to as state variables, typically represents amounts or concentrations of chemical species, and the parameters commonly represent kinetic constants, initial conditions, or scaling factors. In this paper we are primarily interested in a comparison of the state variables between models (the original model and the reduced model), which means that the form of the nonlinear function in Eq. (2) is irrelevant for the application of our method. The right-hand side of Eq. (1) can be expressed as the stoichiometric matrix

*S*∈ ℝ

^{n×q}times a vector of reaction rates

*r*=

*r*(

*x*,

*p*,

*u*,

*t*);

*r*∈ ℝ

^{ q }

*x*

_{ i }is in steady-state for

*t*≥ 0 it holds by definition that

*x*

_{ i }is in QSS, there are terms on the right-hand side of the ODE that are much larger than the negligible term on the left-hand side. The approximation

is then commonly used to reduce the model. We refer to a state as fast in the time interval ≤ *T*_{0} ≤ *t* < *T*_{1} if Eq. (4) is valid in this time interval, and holds for the class of all considered inputs to the system. Note that *T*_{1} = ∞ in the case that the systems remains in QSS, which may for example not be the case for models with switches (where, e.g., the values of a subset of the state variables may change when a certain condition is fulfilled) [34, 35].

Note that Eq. (3) (steady state) necessitates that Eq. (4) (quasi-steady state) is fulfilled, but not *vice versa*. Although QSS implies that (some of) the terms of the right-hand side of the ODE are large and leaves the left-hand side (derivative term) negligible, the derivative term may still be large enough for the state variable in QSS to change considerably during the time-span of a simulation; the key is that these changes mainly occur on a slow manifold.

### Zooming of Linear Models

*A*∈ ℝ

^{n × n},

*B*∈ ℝ

^{n × m},

*C*∈ ℝ

^{l × n}, and

*D*∈ ℝ

^{l × m}. The method is based on the existence of at least one subset of state variables in ${\mathcal{M}}_{o}$ for which the internal dynamics is very fast with respect to the current time-scale of interest. An algorithm for automatic reduction of linear models that is based on the detection of such subsets, which are referred to as fast clusters, is presented in [27]. If the

*w*state variables of a fast cluster are replaced by a single state variable ${x}_{lL}^{\prime}={\sum}_{i=1}^{w}{x}_{{l}_{i}}$, we obtain a reduced version of the original model

where *x*_{
r
} ∈ ℝ^{(n - w + 1)}, *A*_{
r
} ∈ ℝ^{(n - w + 1) × (n - w + 1)}, *B*_{
r
} ∈ ℝ^{(n - w + 1) × m}, and *C*_{
r
} ∈ ℝ^{l × (n - w + 1)}.

where ${k}_{jL}^{\prime}$ is the rate parameter in the reaction from state variable ${x}_{lL}^{\prime}$ to state variable *x*_{
j
} in the reduced model, and *k*_{
ji
} is the rate parameter in the reaction from state variable *x*_{
i
} to state variable *x*_{
j
} in the original model.

Finally, note that Eqs. (5) and (6) provide a link between ${\mathcal{M}}_{o}$ and ${\mathcal{M}}_{r}$, which constitute two different levels of granularity. It is this link between the models that make us consider them as two different degrees of zooming, and the primary goal of this paper is to establish such a link also for nonlinear models.

### Extension to Nonlinear Models - Initial Observations

We will now present some key observations that are used in the derivation of the method for zooming of nonlinear models.

*m*∈ ℕ

^{ n }for which

*mS*= 0, which also implies that

*n*

_{ r }, the number of linearly independent vectors for which Eq. (7) holds is equal to

*n*-

*n*

_{ r }, which implies the existence of a matrix

*M*

where $M\in {\mathbb{N}}^{n\phantom{\rule{0.3em}{0ex}}-\phantom{\rule{0.3em}{0ex}}{n}_{r}\phantom{\rule{0.3em}{0ex}}\times \phantom{\rule{0.3em}{0ex}}n}$.

*T*

_{0}≤

*t*<

*T*

_{1}. For simplification we will assume that there are no inputs to the system, although it would in principle be possible to incorporate inputs in the following discussion. The right-hand side of the ODEs for these fast state variables, if there are no inputs, can be separated into two parts. The first part contains reactions between fast state variables that are significant for the fast dynamics;

*r*

_{ f }(

*x*

_{ f },

*p*), and the second part contains all other reactions,

*r*

_{ s }(

*x*,

*p*), i.e.,

*S*

_{ f }and

*S*

_{ s }are the corresponding stoichiometric matrices. Let us now consider the fast stoichiometric matrix,

*S*

_{ f }, and especially the conserved moieties that are implied by

*S*

_{ f }. Since these moieties are only (approximately) conserved on a fast enough time-scale, we refer to such moiety conservations as apparent conservations. Let

*M*

_{ f }be a matrix with

*a*linearly independent rows such that

*l*, so that

*l*with respect to time, we get

where Eqs. (9) and (10) were used.

*M*

_{ f }is not unique, but that in fact any matrix $\widehat{M}=N{M}_{f}$ can be used for lumping, where

*N*∈ ℝ

^{a×a}is non-singular. This observation allows us to choose a matrix $\widehat{M}$ for which a maximal number of rows in $\widehat{M}\phantom{\rule{0.3em}{0ex}}{S}_{s}$ vanish, which results in the greatest possible reduction in the number of state variables. Finally note that Eq. (4), in the absence of inputs, gives

since the term *S*_{
f
}*r*_{
f
} (*x*_{
f
} , *p*) dominates the term *S*_{
s
}*r*_{
s
} (*x*, *p*). Note that Eq. (4) and consequently Eq. (13) only hold in *T*_{0} ≤ *t* < *T*_{1}, since the state is only known to be fast in this time span.

Eq. (12) defines the ODEs of the reduced model (the lumped state variables), and Eqs. (11) and (13) can in principle be used to calculate back-translation formulae, as is demonstrated with the small example model in the next section. However, as we shall see, this approach requires the explicit algebraic solution to a system of nonlinear equations, which is typically an infeasible task. Furthermore, there is not a clear one-to-one mapping between the state variables of the original and reduced models as in the case of proper lumping [27].

### A Small Example Model

We will now present a small example model, with three fast state variables, which is reduced with the approach discussed above. An alternative approach is then demonstrated with the advantage that it scales better to larger models.

*C*from a substrate

*A*and an enzyme

*B*

*x*

_{ f }= (

*A B C*)

^{ T }, where the bullets (•) represent the slow state variables surrounding the three fast state variables in the model. The ODEs for the fast state variables take the form

where the lumped state variables *L*_{1} and *L*_{2} are introduced. Note that Eq. (12) defines the dynamics of the lumped state variables.

*A*,

*B*, and

*C*are given by the non-negative solution to Eqs. (16)-(18)

where ${K}_{1}=\frac{{k}_{-1}}{{k}_{1}}$.

We can now employ Eq. (12) to solve the ODEs for the lumped state variables *L*_{1} and *L*_{2}, and use Eqs. (19)-(21) as back-translation formulae to compute the trajectories of the original state variables *A*, *B*, and *C*. However, note that for even slightly larger clusters of fast species than the one discussed here it would not be possible to calculate algebraic expressions of the original state variables with this approach, since it builds on the explicit solution of a system of nonlinear equations, which quickly becomes infeasible with growing problem size.

*A*and

*C*

*A*and

*C*is

*η*

_{ A }(

*B*,

*p*) and

*η*

_{ C }(

*B*,

*p*) are defined in Eqs. (23) and (24), respectively. The ODE for

*L*

_{1}is defined by Eq. (12), and the ODE for

*B*can be derived by differentiation of

*L*

_{2}in Eq. (18), which gives that

The reduced model consists of the two state variables *L*_{1} and *B* (note that *L*_{2} does not appear in the reduced model), and the dynamics is described by Eqs. (12) and (25), respectively. Note that the state variables *A* and *C* can be back-translated from the reduced model with Eqs. (23) and (24). This approach is a bit more intricate than the first, but comes with the advantage that we do not need to solve a system of nonlinear equations.

### A Method for Zooming of Nonlinear Models

We will now step-by-step present a method that can be used to construct zoomable nonlinear biochemical models. This involves two sub-goals: i) to identify a reduced model that shares important characteristics with the original model, ii) to derive back-translation formulae that can be used to compute the original state variables and parameters from the reduced model.

In an initialization step of the method for a model ${\mathcal{M}}_{o}$ we first formulate mathematical equations for all conservation relations Eq. (8), state variables in steady-state Eq. (3), and quasi-steady state assumptions Eq. (4). If additional properties of the system are known, we also formulate the corresponding equations.

### Step 1

The first step of the method is to identify the apparent conservation relations in the model.

**Definition 1:** Let *S*_{
f
} be the stoichiometric matrix for the reactions *r*_{
f
} (*x*_{
f
} , *p*) as defined in Eq. (9). Each subset of state variables for which the corresponding rows of *S*_{
f
} are linearly dependent constitutes an apparent conservation relation. Hence the apparent conservation relations lie in the left null space of *S*_{
f
} and the dimension of this space is *n* - rank(*S*_{
f
} ).

Note that the apparent conservation relations are defined in Eq. (11). It is trivial to identify the set of all linearly dependent rows of *S*_{
f
} with a mathematical computing software (e.g., SBtoolbox for Matlab [36]).

### Step 2

The second step of the method is to define the state variables of the reduced model, which we refer to as modified lumped state variables.

**Definition 2:** Let *x* be a lumped state variable corresponding to a subset of the state variables in an apparent conservation relation. Then *x* is a modified lumped state variable if the lumping scheme with respect to the state variables of the original model is proper.

where *M*_{
m
} is *a* × *n*_{
f
} matrix with elements equal to 0 or 1 and column sums equal to 1, and *l*_{
m
} denotes the modified lumped state variables. We typically have a large freedom in the choice of *M*_{
m
} . The number of state variables is maximally reduced if all exact conservation relations in the model are retained as modified lumped state variables (and replaced by constants).

### Step 3

*k*:th modified lumped state variable ${l}_{{m}_{k}}$ be denoted by ${x}_{{m}_{k}}$, so that

*n*

_{ m }equations that are linear w.r.t. ${x}_{{m}_{k}}$, and linearly independent, are required to calculate fraction parameters. The existance of

*n*

_{ m }such equations results in an equation system

*A*(

*l*

_{ m },

*p*) is invertible since the equations are linearly independent, and we have that

where we used Eq. (27) in the last step.

A modified lumped state variable for which an insufficient number of linear and linearly independent equations are available may still be used in the reduced model. However, the back-translation of the modified lumped state variable to the original state variables is then not possible, and step 3 of the method is ignored.

### Step 4

**Theorem:**The dynamics of the modified lumped state variables is given by

*S*

_{ s }and

*r*

_{ s }(

*x*,

*p*) were defined in Eq. (9), the matrix

*M*

_{ f }is defined in Eq. (11), and

where the matrix *M*_{
m
} is defined in Eq. (26), and ${g}_{i}\left({l}_{m},p\right)\equiv {x}_{{f}_{i}}$ is introduced to simplify the notation.

**Proof:**

*l*with respect to time, which gives

*I*is the identity matrix and

*J*(

*l*

_{ m },

*p*) is the Jacobian of (

*M*

_{ f }-

*M*

_{ m })

*g*(

*l*

_{ m },

*p*) with respect to

*l*

_{ m }. The element

*J*

_{ ij }of

*J*(

*l*

_{ m },

*p*) is given by

where Eq. (12) was used in the last step. □

The matrix *I* + *J*(*l*_{
m
} ; *p*) is symbolically invertible, but may in general contain singularities for particular combinations of parameters values and state variable values. However, the matrix is always invertible for the models discussed in this paper, since the corresponding determinants are strictly positive.

### Step 5

The final step of the method is to back-translate the modified lumped state variables to the original state variables with the fraction parameters derived in step 3. This allows a comparison between the predictions by the reduced model to those of the original model.

The implementation of the method is straight-forward, and we have used Matlab (R2008b) together with the SBtoolbox [36] as computing software for the models in this paper.

## Results

We will now demonstrate the method through application to two example models.

### Enzyme Kinetics Model

*S*, into a product,

*P*, which is catalyzed by an enzyme,

*E*.

Note that the complexes *C*_{
s
} and *C*_{
p
} are formed by *S* bound to *E*, and *P* bound to *E*, respectively. This model is frequently occurring as part of larger models of biological systems, although the reaction from *C*_{
S
} to *C*_{
P
} is sometimes neglected, or reversible. The ODEs for the model are listed in Appendix A.1, where the three reactions are defined as: *r*_{1} = *k*_{1}*SE* - *k*_{-1}*C*_{
S
} , *r*_{2} = *k*_{2}*C*_{
S
} , and *r*_{3} = *k*_{3}*C*_{
P
} - *k*_{-3}*PE*.

*r*

_{1}(

*x*,

*p*) and

*r*

_{3}(

*x*,

*p*) are assumed to be dominating, and the reaction term

*r*

_{2}(

*x*,

*p*) to be insignificant in the ODEs. This results in that all state variables are in QSS, which gives

which is constant since the total amount of the enzyme *E* is conserved in the system.

#### Reduction of the Enzyme Kinetics Model

*S*

_{ f }. Since

*r*

_{2}(

*x*,

*p*) is dominated by

*r*

_{1}(

*x*,

*p*) and

*r*

_{3}(

*x*,

*p*) the model ODEs can be written on the form of Eq. (9)

*S*

_{ f }is given by the row vectors of

*M*

_{ f }, which is defined by

where *L*_{
S
} and *L*_{
P
} are apparent conservation relations and *LE* is an exact conservation relation.

*L*

_{ E }is retained as a state variable in the reduced model (i.e., since

*L*

_{ E }, unlike

*L*

_{ S }and

*L*

_{ P }, can be replaced by a constant). The vector of modified lumped state variables then is defined

*L*

_{ E }. There are five equations (Eqs. (34)-(35) and (37)) that are linear w.r.t. the state variables

*E*,

*C*

_{ S }, and

*C*

_{ P }, which are lumped into the state variable

*L*

_{ E }. Note that only

*n*

_{ m }= 3 equations are required to derive fraction parameters, and we use Eqs. (34)-(36) to formulate an equation system as in Eq. (28) with the solution

where ${M}_{1}=\frac{{k}_{1}}{k-1}$ and ${M}_{3}=\frac{{k}_{3}}{k-3}$. The two remaining modified lumped state variables correspond to *S* and *P* in the original model, so we define that ${l}_{{m}_{1}}=S\equiv {g}_{1}\left({l}_{m},p\right)$ and ${l}_{{m}_{2}}=P\equiv {g}_{3}\left({l}_{m},p\right)$.

The two ODEs in Eqs (41)-(42) define the dynamics of the state variables in the reduced model. We finally note that the exact conservation relation for the substrate, *L*_{
T
} = *S* + *P* + *C*_{
S
} + *C*_{
P
} , together with Eqs. (39)-(40) can be used to reduce the model further to a single state.

*Additional files*.

*r*

_{1}(

*x*,

*p*) and

*r*

_{3}(

*x*,

*p*) dominate the reaction term

*r*

_{2}(

*x*,

*p*), which results in that all state variables are in QSS. To assess the impact of these assumptions on the reduced model we compute the relative difference between the state variables in the original and in the reduced model

*i*in the original model, ${x}_{i}^{r}$ is the corresponding back-translated state variable in the reduced model, and |

*x*| denotes the elementwise absolute values of

*x*. The maximal mean and infinity norm of

*ε*

_{ i }(

*t*) in Eq. (43) over time is presented in Table 1 for parameter values over five orders of magnitude. In general, the reduced model appears to be robust to changes in the parameter values, although slightly more sensitive to some parameters (e.g., small values of

*k*

_{-1}, large values of

*k*

_{1}, or large values of

*k*

_{2}, which violate the assumptions used in the reduction). However, note that the validity of the QSS assumption may also depend on the state variables, for example the total concentration of the enzyme. Interestingly, we observed that the reduced model can well approximate the original model over several order of magnitudes around the nominal enzyme concentration (

*L*

_{ E }= 1). It is well-known that the QSS approximation is only valid for sufficiently small enzyme concentrations, and as expected the performance of the reduced model starts to decrease for immense enzyme concentrations.

Robustness of the reduced model for large deviations from the nominal parameter point are presented for the enzyme kinetics model, with a sampling frequency of 0.1 (starting from 0.1) time units.

Param./Factor | 10 | 10 | 10 | 10 | 10 |
---|---|---|---|---|---|

| 0.00053/0.0082 | 0.0018/0.0071 | 0.0010/0.0059 | 0.019/0.18 | 0.19/1.8 |

| 0.19/1.9 | 0.0061/0.079 | 0.0010/0.0059 | 0.0017/0.0025 | 0.00024/0.0028 |

| 0.00019/0.0060 | 0.00082/0.0055 | 0.0010/0.0059 | 0.035/0.23 | 0.21/0.39 |

| 0.0068/0.014 | 0.0053/0.010 | 0.0010/0.0059 | 0.0067/0.044 | 0.035/0.29 |

| 0.020/0.30 | 0.0048/0.032 | 0.0010/0.0059 | 0.0051/0.0093 | 0.0069/0.015 |

All | 0.012/1.2 | 0.0014/0.062 | 0.0010/0.0059 | 0.025/0.41 | 0.010/0.045 |

where the fraction parameters specify the distribution of the enzyme among the corresponding original state variables.

### Glucose Transport in Budding Yeast

*S. cerevisiae*), which constitutes the first step of glycolysis, is presented in [37]. The inflow of glucose is modeled as a facilitated diffusion process, in which a carrier enzyme is responsible for the transport between the inner and outer regions of the cellular membrane. It is assumed that glucose 6-phosphate (G6P) has an inhibitory role in the glucose transport process by binding to the transporter. A graphical representation of the model is shown in Figure 2, and the ODEs for the state variables are listed in Appendix A.2.

In [27] we described how the calculation of fraction parameters, based on a set of assumptions, leads to the same reaction rates in the reduced model as were reported in [37]. The assumptions are that state variables participating in reactions for uptake and release of glucose and G6P across the cell membrane are in QSS, that the transporter is conserved, and that the concentrations of the transporter in the inner and outer regions of the cellular membrane are constant.

*L*

_{ E },

*L*

_{ Glc }, and

*L*

_{G 6P}are constant over time. The assumption in [37] that the concentrations of the transporter in the inner and outer regions of the cell membrane are constant is formulated

It is not clear how equations for back-translation of the state variables in the reduced model in [27, 37] can be derived. The reduced model has three state variables; external- and internal glucose and G6P, but only two differential equations for the in- and outflow of glucose, since the ODE for G6P is replaced by a representative function that is inferred from the G6P data. Our method does not rely on that such information is available, although it would in principle be possible to utilize data fitted functions for the state variables. Also note that the equations in the reduced model describe the total influx and efflux of glucose across the membrane [27], which cannot be interpreted w.r.t. the state variables of the original model. Other assumptions in [37] that complicates a comparison with our method are that the efflux of glucose is negligible, that the concentration of glucose in the cytosol is negligible, and that the concentrations of the transporter are constant in the inner and outer regions of the cell membrane (Eq. (50)).

It is not possible to generate a reduced model by direct substitution of the fraction parameters that were derived in [27] into the ODEs of the original model, since this would lead to the prediction that the state variables are constant (as discussed in [27]). We will now instead illustrate how our method can be used to derive a reduced and zoomable version of the glucose transport model.

#### Reduction of the Glucose Transport Model

Before applying our method to the glucose transport model we tried an alternative approach. Eqs. (43)-(46) were solved w.r.t. the state variables in QSS, and the resulting expressions were then substituted into the remaining ODEs. The details of the derivation of the reduced model are presented in Appendix A.3. The reduced model does not produce satisfactory predictions for any other state variable than ${x}_{G6P}^{i}$, which remains approximately constant during the simulation. Implementations of the original model (Additional file 4), the reduced model (Additional file 5), and a script for simulation with SBtoolbox2 for MATLAB [36] (Additional file 6), are available in *Additional files*.

Since the first approach turned out to be insufficient for reduction of the glucose transport model we applied our method to the same model. Following [37], we initially assumed constant transporter concentrations in the inner and outer regions of the cellular membrane, as defined by Eq. (50). For the details on the derivation of the reduced model we refer to Appendix A.4. The reduced model clearly performs better than the model resulting from the first approach, but it is still not satisfactory. However, the assumption of constant regional concentrations of the transporter may not be valid since the transport of glucose across the cell membrane is a rate limiting step in the model, and appears to be important for the state variable dynamics. We therefore decided to neglect Eq. (50) in the reduction process.

Implementations of the original model (Additional file 4), the reduced model (Additional file 7), and a script for simulation with SBtoolbox2 for MATLAB [36] (Additional file 8), are available in *Additional files*. In the first step of the method we identify the following apparent conservation relations

We note that there are two disjoint clusters of fast reactions in the model, corresponding to the outer- and inner parts of the cell membrane.

*L*

_{G 6P}as modified lumped state variables. The choice to keep

*L*

_{G 6P}leads to the largest possible reduction in the number of state variables, since the conservation of

*L*

_{G 6P}is exact (which is not true for any other state variable in

*l*). The modified lumped state variables are defined true for any other state variable in

*l*). The modified lumped state variables are defined

Note that two of the state variables in the original model, ${x}_{Glc}^{e}$ and ${x}_{Glc}^{i}$, are also modified lumped state variables.

*L*

_{G 6P}, and ${L}_{{E}_{3}}$). All of the modified lumped state variables satisfy the requirement that at least

*n*

_{ m }of Eqs. (43)-(49) and Eq. (52) are linear, and linearly independent, with respect to the corresponding original state variables. Eqs. (43) and (52) form a nonlinear equation system with the solution

where $\xi ={K}_{3}{K}_{4}+{K}_{4}{x}_{E-Glc}^{i}+{K}_{3}{x}_{E}^{i}$. We define that ${g}_{3}\equiv {x}_{E-G6P}^{i}$, ${g}_{4}\equiv {x}_{E-Glc-G6P}^{i}$, and ${g}_{5}\equiv {x}_{G6P}^{i}$. For the two original state variables that are kept as modified lumped state variables in the reduced model we define that ${g}_{1}\equiv {x}_{Glc}^{e}$ and ${g}_{2}\equiv {x}_{Glc}^{i}$.

*L*

_{G 6P}can be replaced by a constant in the model, since ${\stackrel{\xb0}{L}}_{G6P}=0$. The ODEs of the modified lumped state variables are derived with Eq. (31). In the inner region the ODEs are

*Additional files*.

*L*

_{G 6P}as a modified lumped state variable, the reduced model will have the same state variables as the reduced model in [27, 37] (i.e., ${x}_{Glc}^{e},\phantom{\rule{2.77695pt}{0ex}}{x}_{Glc}^{i}$, and ${x}_{G6P}^{i})$ two additional state variables for the transporter. This gives a reduced model with five state variables, but equally many parameters as in the previous case. A comparison between the original model and the reduced model, w.r.t. the original state variables, is shown in Figure 5 and Figure 6. As can be seen the comparison is very good, in fact it is even slightly better than for the reduced model with four state variables. The details of the derivation of the reduced model are presented in Appendix A.5. Implementations of the original model (Additional file 4), the reduced model (Additional file 11), and a script for simulation with SBtoolbox2 for MATLAB [36] (Additional file 12), are available in

*Additional files*.

*k*

_{-4}and to large values of

*k*

_{4}. This is mainly due to that a large proportion of the transporter

*E*is absorbed in ${x}_{E-G6P}^{i}$, which leads to that some of the QSS assumptions are invalid. We also observed that relative difference between the models is insensitive to the total concentration of the transporter for several orders of magnitude around the nominal value (

*L*

_{ E }= 0.01). However, note that this observation is specific to the studied model and may not be generalizable to other similar biochemical models.

Robustness of the reduced model for large deviations from the nominal parameter point are presented for the glucose transport model, with a sampling frequency of 1 (starting from 1) time units.

Param./Factor | 10 | 10 | 10 | 10 | 10 |
---|---|---|---|---|---|

| 0.079/0.091 | 0.079/0.10 | 0.081/0.091 | 0.083/0.10 | 0.084/0.11 |

| 0.085/0.23 | 0.084/0.093 | 0.081/0.091 | 0.078/0.10 | 0.078/0.089 |

| 0.16/0.93 | 0.080/0.55 | 0.081/0.091 | 0.063/0.083 | 0.054/0.062 |

| 0.054/0.10 | 0.062/0.074 | 0.081/0.091 | 0.097/0.10 | 0.10/0.10 |

| 0.081/0.091 | 0.082/0.091 | 0.081/0.091 | 0.079/0.089 | 0.075/0.082 |

| 0.070/0.080 | 0.079/0.089 | 0.081/0.091 | 0.082/0.091 | 0.081/0.091 |

| 0.23/0.32 | 0.21/0.30 | 0.081/0.091 | 6.36/6.93 | 310/336 |

| 310/336 | 6.36/6.93 | 0.081/0.091 | 0.21/0.30 | 0.23/0.32 |

| 0.10/0.10 | 0.095/0.17 | 0.081/0.091 | 0.080/0.089 | 0.083/0.088 |

| 0.076/0.088 | 0.091/0.096 | 0.081/0.091 | 0.075/0.15 | 0.074/0.19 |

All | 0.16/0.97 | 0.081/0.69 | 0.081/0.091 | 0.060/0.091 | 0.054/0.081 |

## Discussion

In this paper we have presented a novel method for reduction of biochemical models that is compatible with the concept of zooming. Several methods for reduction of biochemical models already exist in the literature. However, few of these methods result in biochemically interpretable models, and to our knowledge there are no nonlinear lumping methods for which the state variables and parameters of the reduced model can be back-translated (mapped) to the original model.

The application of the QSS assumption has been a commonly used tool in the modeling of biochemical networks since the late 1960s, and in chemical kinetics for more than 80 years [38]. The validity of the QSS approximation is well studied both for specific biochemical mechanisms [38, 39] and for more complex models [40, 41]. The resulting equations, together with conservation relations, are typically used to eliminate some of the state variables in the model (e.g., see [40]). However, with the examples in this paper we have showed that such an approach is not always sufficient, and we propose to use proper lumping of state variables in combination with back-translation.

Our method has several important advantages when applied to biochemical models. The most important advantage is that we end up with reduced models with a clear biological interpretation, meaning that each state variable of the original model corresponds to a fraction of exactly one of the state variables in the reduced model. A consequence is that neighboring species in the original model remain neighbors in the reduced model. Hence we can consider the original and reduced models as two different degrees of zooming; a concept that we discussed in some detail in [27] for linear models.

The work in this paper can be seen as an extension of the theory introduced for linear models in [27] to nonlinear models. The method is based on assumptions regarding the dynamics that result in a sufficient number of equations that are linear w.r.t. the state variables to be back-translated. Such equations are typically a natural result of QSS assumptions and conservations relations in models based on mass action kinetics [42], and in particular in models that involve transporters and enzymes (e.g., the models in this paper). However, note that our method may also be applicable to models with other types of reaction kinetics. We also note that if too few linear relations are available for calculation of fraction parameters for a part of a model, this part can still be reduced and the reduced model can be simulated, although we cannot back-translate the corresponding modified lumped state variables since no fraction parameters are available. However, depending on the purpose of the model it may be enough to calculate fraction parameters for a subset of the state variables in the reduced model. Linearization of the model around a steady state operating point may also be a feasible approach to calculate fraction parameters with the method in [27].

The proposed method enables mapping of the state variables and parameters of the reduced model to those of the original model. In [27] we referred to this mapping as back-translation. Back-translation is of great importance, since we can directly observe how modifications to the reduced model impact the original model. It also gives the modeler an opportunity to check whether the assumptions underlying the reduction are acceptable. To illustrate the power of back-translation we provide plots for comparison of simulations of the original and reduced models, w.r.t. the original state variables, for the models to which the method is applied in this paper.

Back-translation of state variables typically requires the solution of a system of nonlinear equations, which often results from the assumption of state variables in QSS and conservation relations. Unfortunately, analytic solutions to systems of nonlinear equations do in general not exist. An advantage with the proposed method is that such solutions are not required, since they are replaced by computation of the inverse of a matrix for each cluster of fast state variables, which is in general a more feasible task.

Our method was applied to a small model with five state variables that commonly appears as part of larger biochemical models, and to a previously published model for the transport of glucose in baker's yeast (*S. cerevisiae*) [37]. The first model was reduced from five to one state variable, and from five to three parameters. However, note that our focus has been on the reduction of the number of state variables and not the number of parameters, which are reduced as a side-effect of the QSS assumptions.

The model for glucose transport was first reduced with an approach in which the QSS equations and conservation relations were directly substituted into the remaining ODEs. The results of this approach are not satisfactory since the reduced model gives predictions that are different from the original model for most state variables. Our method was then applied to the same model both together with the assumption of equal concentration of the transporter in the inner and outer regions of the cell membrane used in [37], and without any additional assumptions. The application of our method together with the assumptions used in [37] results in a model with three state variables. The state dynamics is significantly better preserved than with the first approach, although still not satisfactory. We then decided to reduce the original model without the assumption regarding the localization of the transporter, with two different definitions of modified lumped state variables. While one of these definitions results in a reduced model with four state variables and gave rather accurate predictions, the other choice reduces the number of state variables to five and gives an excellent description of the state dynamics. It is therefore apparent that there is a tradeoff between accuracy and the number of state variables in the reduction process. The glucose transport model corresponds to the first part of glycolysis, in which glucose is transported into the cell. We therefore propose that it might be rewarding to carefully re-investigate the assumptions underlying the reaction rate equations in complete models of glycolysis (see [43] for one example).

We have also observed a few issues regarding the implementation of the method. The symbolic inversion of the matrix that is necessary to compute the dynamics of the modified lumped state variables may be expensive. However, this is typically only a practical limitation for large matrices, which result from large clusters of fast state variables. In our experience large clusters of fast state variables are relatively rare also in large biochemical models. Another option, if it is not practically feasible to invert the symbolic matrix, is to solve the system of linear equations in Eq. (31) numerically. We also observed that the symbolic right-hand side of the resulting differential equations may be long. However, these are usually not practical limitations for the applicability of the method, e.g., the simulations of all examples in this paper are very fast on a modern computer. Available methods to reduce the analytic reaction rate expressions include sensitivity analysis w.r.t. state variables and parameters, and the method proposed in [17].

There is still no consensus method for automatic identification of state variables in QSS, although criteria for the detection of state variables in QSS have been proposed, for example in [30]. A simple approach is to simulate the original model and investigate for which state variables the corresponding in- and outflow reaction rates are approximately equal. State variables for which this condition holds are then considered to be in QSS. Note that for the models in this paper it was already clear from the biochemical understanding of the corresponding systems which of the state variables that could be considered fast (see [37] for the glucose transport example). However, an appropriate general criterion for automatic identification of state variables in QSS is still lacking.

Although the theory presented in this paper constitutes a great leap forward for construction of zoomable models, more research is required to make the method fully automatic. An important challenge is to define a meaningful measure for the similarity between the hierarchical model layers (degrees of zooming). Another interesting, although trivial, observation that deserves further attention is that QSS assumptions typically do not hold in the whole parameter space. Although the reduced models in this paper appear to be robust to varying parameter values it may not be the case in general. It may therefore be revealing to compare the original model and the reduced model to characterize the parameter space regions in which the QSS-assumptions are valid.

## Conclusions

We have presented a novel method for reduction of biochemical models that is compatible with the concept of zooming. Zooming allows the modeler to operate on different levels of model granularity, and enables a direct interpretation of how modifications to the model on one level affect the same model on other levels in the hierarchy. The proposed method is based on the application of proper lumping in combination with the identification of linear relations in nonlinear equations.

The method was applied to two example models. The first model is small and commonly occurring as a part of larger biochemical models. The second example is a model for glucose transport in baker's yeast, which constitutes the starting point for glycolysis. Both models could be significantly reduced with the proposed method, and the resulting state variables could be back-translated to the original state variables. The method that is presented in this paper constitutes an extension of the method that was previously developed for linear biochemical models to its nonlinear counterpart. Since most models in the systems biology community are in fact nonlinear, our method constitutes an important step towards zoomable biochemical models.

## A Appendix

### A.1 Appendix 1

where *r*_{1} = *k*_{1}*SE* - *k*_{-1}*C*_{
S
} , *r*_{2} = *k*_{2}*C*_{
S
} , and *r*_{3} = *k*_{3}*C*_{
P
} - *k*_{-3}*PE*.

*k*

_{1}= 1000,

*k*

_{-1}= 2000,

*k*

_{2}= 1,

*k*

_{3}= 3000, and

*k*

_{-3}= 1000, together with the initial conditions

*S*(0) =

*E*(0) = 1 and

*P*(0) =

*C*

_{ S }(0) =

*C*

_{ P }(0) = 0. This gives the parameter values

*M*

_{1}= 0.5,

*M*

_{3}= 3 and

*L*

_{ E }= 1 in the reduced model. Initial conditions can in general be obtained from a short simulation of the original model, until the fast state variables reach QSS, but in this case Eq. (19) gives an analytic expression of

*S*(0) (

*P*(0) = 0)

where ${M}_{1}^{-1}={K}_{1}$, *S* = *A*, *L*_{
S
} = *L*_{1}, and *L*_{
E
} = *L*_{2}.

### A.2 Appendix 2

which was introduced in [27].

### A.3 Appendix 3

In this section we investigate an alternative (naive) approach to reduce the glucose transport model. The first step is to identify state variables for which the QSS assumption holds, and the mass conservation relations in the model. In the second step of this approach we then substitute the corresponding system of equations into the ODEs corresponding to slow state variables.

Note that the state variables ${x}_{E-Glc}^{e}$ and ${x}_{E-Glc}^{i}$ are decoupled from the state variables ${x}_{E}^{e}$ and ${x}_{E}^{i}$ in Eqs. (61)-(64).

There are three molecules (moieties) whose mass is conserved in the model as a whole, i.e., Glc, G6P, and E. However, we can not substitute any of the conservation relations into the remaining ODEs without re-introducing state variables that were already eliminated. So the final reduced model takes the form of Eqs. (61) - (64). However, the sum of the state variables ${x}_{E-Glc}^{e}$ and ${x}_{E-Glc}^{i}$, and ${x}_{E}^{e}$ and ${x}_{E}^{i}$ is conserved in the reduced model, which makes it possible to reduce the model to two state variables.

Unfortunately, due to the form of the ODEs and the initial conditions of the state variables in the reduced model, the state variables ${x}_{E-Glc}^{e}$ and ${x}_{E-Glc}^{i}$ remain equal to zero at all times, and only the state variables ${x}_{E}^{e}$ and ${x}_{E}^{i}$ take non-zero values. We therefore decided to simulate the original model for a short time until the fast state variables reach QSS, and to use the final state variable values as initial conditions in the reduced model.

which can be used for back-translation of the state variables of the reduced model to those of the original model.

The predictions of the state variables of the original model, resulting from simulations of the original model and the reduced model with the parameter values set as in [37], is not satisfactory for any other state variable than *G* 6*P*, which remains approximately constant over time. Implementations of the original model and the reduced model in SBtoolbox2 for MATLAB [36] are included in *Additional files*.

### A.4 Appendix 4

In this section we apply our method to the glucose transport model, and following [37] we will assume that the concentrations of the transporter are constant in the inner and outer regions of the cellular membrane. With this assumption the distribution among the transporter state variables of the original model, which constitute the lumped state variables *L*_{
E
} , is uniquely defined.

where $x={\left({x}_{Glc}^{e}\phantom{\rule{1em}{0ex}}{x}_{Glc}^{i}\phantom{\rule{1em}{0ex}}{x}_{E-G6P}^{i}\phantom{\rule{1em}{0ex}}{x}_{E-Glc-G6P}^{i}\phantom{\rule{1em}{0ex}}{x}_{G6P}^{i}\phantom{\rule{1em}{0ex}}{x}_{E-Glc}^{e}\phantom{\rule{1em}{0ex}}{x}_{E-Glc}^{i}\phantom{\rule{1em}{0ex}}{x}_{E}^{e}\phantom{\rule{1em}{0ex}}{x}_{E}^{i}\right)}^{T}$.

*L*

_{ E }in the reduced model since it corresponds to an exact conservation, and therefore results in the largest reduction possible (note that the exact conservation relations,

*L*

_{ E }and

*L*

_{G 6P}, can not simultaneously be used since the lumping would then not be proper). The modified lumped state variables take the form

We note that Eqs. (43)-(47), and (50) are all linear w.r.t. the state variables that constitute state *L*_{
E
} , so the requirement for the existence of at least 6 (*n*_{
m
} ) linear relations is satisfied, which enables back-translation in step three of the method.

*L*

_{ E }. Eqs. (43)-(47), and (50) form an equation system, corresponding to Eq. (28)

where ${l}_{{m}_{1:3}}$ denotes the first three states variables in *l*_{
m
} . Note that there are three state variables in the reduced model, which is the same number as for the reduced model in [37].

In the fifth step of our method we compare predictions of the original state variables between the original and reduced models, where *L*_{
E
} is back-translated with the fraction parameters defined in Eqs. (66)-(67).

The simulation results are clearly more accurate than with the approach in Appendix A.3, although still not satifying. We refer to *Additional files* for implementations of the original and reduced models in SBtoolbox2 for MATLAB [36].

### A.5 Appendix 5

In this section we apply our method to the glucose transport model, but with an alternative definition of the modified lumped state variables. We do not use the assumption of constant regional concentrations of the transporter (Eq. (50)).

In the first step of the method we note that the apparent conservations are given by Eq. (51).

*L*

_{G 6P}, in the reduced model. This leads to the following definition of the modified lumped state variables

Where $x={\left({x}_{Glc}^{e}\phantom{\rule{1em}{0ex}}{x}_{Glc}^{i}\phantom{\rule{1em}{0ex}}{x}_{E-G6P}^{i}\phantom{\rule{1em}{0ex}}{x}_{E-Glc-G6P}^{i}\phantom{\rule{1em}{0ex}}{x}_{G6P}^{i}\phantom{\rule{1em}{0ex}}{x}_{E-Glc}^{e}\phantom{\rule{1em}{0ex}}{x}_{E-Glc}^{i}\phantom{\rule{1em}{0ex}}{x}_{E}^{e}\phantom{\rule{1em}{0ex}}{x}_{E}^{i}\right)}^{T}$, and we note that ${x}_{Glc}^{e}$, ${x}_{Glc}^{i}$, and ${x}_{G6P}^{i}$ are state variables both in the original and reduced models. Also note that the requirement of at least *n*_{
m
} equations, that are linear w.r.t. the original state variables and linearly independent, is satified for each of the modified lumped state variables by Eqs. (43)-(49).

and the fraction parameters were calculated with Eq. (30).

*l*

_{3:5}and ${l}_{{m}_{3:5}}$ are the last three state variables of

*l*and

*l*

_{ m }, respectively, and

In the fifth step of the method we simulate the reduced model with Eqs. (56)-(57) and (70)-(71) and we then use Eqs. (53) and (69) for back-translation of the state variables. A comparison between the original model and the reduced model, w.r.t. the state variables of the original model, is presented in Figure 5 and Figure 6. The agreement between the models is very good. We refer to *Additional files* for implementations of the original and reduced models in SBtoolbox2 for MATLAB [36].

## Additional files

The original and reduced versions of the models presented in this paper, and scripts for simulation and comparison between the original and reduced versions of the models. Note that the systems biology toolbox for MATLAB [36] and the symbolic math toolbox for MATLAB must be installed on the system for simulation of the attached models.

## Declarations

### Acknowledgements

This work was supported by grants from the European Commission 6th Framework Programme (BIOSIM, grant No 005137) and 7th Framework Programme (UNICELLSYS, grant No 201142), SystemsX.ch (RTD project YeastX), the Swedish Foundation for Strategic Research through the Gothenburg Mathematical Modelling Centre, the Swedish Research Council, and by Lions. We would also like to thank John Johansson for his contributions to the development of the method in his M.Sc. thesis at the Fraunhofer-Chalmers Centre.