- Methodology article
- Open Access
- Published:

# A method for zooming of nonlinear models of biochemical systems

*BMC Systems Biology*
**volume 5**, Article number: 140 (2011)

## Abstract

### Background

Models of biochemical systems are typically complex, which may complicate the discovery of cardinal biochemical principles. It is therefore important to single out the parts of a model that are essential for the function of the system, so that the remaining non-essential parts can be eliminated. However, each component of a mechanistic model has a clear biochemical interpretation, and it is desirable to conserve as much of this interpretability as possible in the reduction process. Furthermore, it is of great advantage if we can translate predictions from the reduced model to the original model.

### Results

In this paper we present a novel method for model reduction that generates reduced models with a clear biochemical interpretation. Unlike conventional methods for model reduction our method enables the mapping of predictions by the reduced model to the corresponding detailed predictions by the original model. The method is based on proper lumping of state variables interacting on short time scales and on the computation of fraction parameters, which serve as the link between the reduced model and the original model. We illustrate the advantages of the proposed method by applying it to two biochemical models. The first model is of modest size and is commonly occurring as a part of larger models. The second model describes glucose transport across the cell membrane in baker's yeast. Both models can be significantly reduced with the proposed method, at the same time as the interpretability is conserved.

### Conclusions

We introduce a novel method for reduction of biochemical models that is compatible with the concept of zooming. Zooming allows the modeler to work on different levels of model granularity, and enables a direct interpretation of how modifications to the model on one level affect the model on other levels in the hierarchy. The method extends the applicability of the method that was previously developed for zooming of linear biochemical models to nonlinear models.

## Background

One of the main reasons for the rapid growth of the field of systems biology is that it makes extensive use of mathematical modeling [1–3]. This allows for a better handling of high complexity, which is an inherent property of all living systems. Using modeling, complex hypotheses can be formulated and tested in a more systematic manner than is possible using only biochemical reasoning [4–6]. However, even if one can obtain a detailed model of the system with a high predictive power, the model in itself does not automatically lead to a full understanding of the underlying biochemistry. One should for instance analyze the model to single out its essence, i.e., to identify those parts of the model that can be eliminated, while still preserving the model's crucial behavior. This latter task is referred to as model reduction, and it is the topic of this paper. There is an extensive literature available on the topic of model reduction. However, most of these studies have been done outside the field of systems biology, and since systems biology brings about new types of challenges, reduction of biochemical models is still in its early stages. Traditional engineering approaches like balanced truncation have focused on preserving the input-output profile in an optimal manner, both for linear [7–10], and for nonlinear [11] systems. However, these methods are not suitable for systems biology, because the reduced model has no natural interpretation in itself (nevertheless, some special cases where this problem can be circumvented have been identified [12, 13]). This lack of interpretation is a problem because systems biology models are usually developed to help characterizing the dominating parts and structure of the system, and not only to obtain a black-box predictor. Methods have therefore been developed with traditional chemical approaches that are more centered on reducing the internal dynamics of the system. These methods are typically based on a sensitivity analysis [14–17], on time-scale separation [18–21], or on the lumping of state variables [22–26] (see [20] for a general review on model reduction). The perhaps most widely used method is lumping. Two of the main reasons for this are that an effective lumping scheme can be identified from basic properties of the model (e.g., the stoichiometry), and that lumped state variables are formed as easily interpretable pools of state variables in the original model. However, lumping does normally not come with the possibility of back-translation from the lumped state variables in the reduced model to the original state variables. In [27] we provided such relations. This means that we can take the result from a simulation of a reduced model, and without performing a new simulation, directly compute the corresponding trajectories of the desired original state variables. Because of this back-translation possibility, we refer to the resulting two models as two degrees of zooming of the same model. Nevertheless, like in other recent model reduction papers in systems biology [28–32], the results in [27] were mainly developed with linear systems in mind. Linear systems virtually only appear in the cases of mono-molecular reaction networks and for models describing the probabilistic evolution of a single protein complex [27, 33]. However, already in [27] we proposed that zooming may in principle also be applicable to nonlinear models, but we did not derive formulae for back-translation. Note that a majority of the currently available systems biology models are in fact nonlinear.

With the method introduced in this paper, we provide the extension of the previously proposed method in [27] to nonlinear models. We show that new challenges arise due to the nonlinearities, but also how these challenges can be overcome, for instance with a wise choice of state variables in the reduced model. The method is demonstrated by application to two closed models of metabolic systems.

## Methods

In this paper we present a more general version of the method that was introduced in [27], which is applicable to nonlinear models. We start with some basic definitions and key observations that are illustrated on a small example model, before we turn to the details of the method.

### Basic Definitions and Assumptions

The method is developed for models of biochemical reaction systems on state space form that are based on nonlinear ordinary differential equations (ODEs)

where *t* denotes time; the dot over *x* in Eq. (1) denotes derivative w.r.t. to time; the state vector *x* ∈ ℝ ^{n} ; the parameters *p* ∈ ℝ ^{p} ; the inputs *u* ∈ ℝ ^{m} ; the outputs *y* ∈ ℝ ^{l} ; and *f* and *h* are in general nonlinear functions. The state vector, whose individual elements are referred to as state variables, typically represents amounts or concentrations of chemical species, and the parameters commonly represent kinetic constants, initial conditions, or scaling factors. In this paper we are primarily interested in a comparison of the state variables between models (the original model and the reduced model), which means that the form of the nonlinear function in Eq. (2) is irrelevant for the application of our method. The right-hand side of Eq. (1) can be expressed as the stoichiometric matrix *S* ∈ ℝ^{n×q}times a vector of reaction rates *r* = *r*(*x*, *p*, *u*, *t*); *r* ∈ ℝ ^{q}

The existence of separate time-scales are commonly utilized for reduction of biochemical models (e.g., by reduction of mass action kinetics to Michaelis-Menten kinetics). The typical approach is to investigate if subsets of the state variables are in steady state or in quasi-steady state (QSS). If state variable *x*_{
i
} is in steady-state for *t* ≥ 0 it holds by definition that

which implies that

which efficiently removes the state variable from the model, since it can be substituted for constant. If on the other hand the state variable *x*_{
i
} is in QSS, there are terms on the right-hand side of the ODE that are much larger than the negligible term on the left-hand side. The approximation

is then commonly used to reduce the model. We refer to a state as fast in the time interval ≤ *T*_{0} ≤ *t* < *T*_{1} if Eq. (4) is valid in this time interval, and holds for the class of all considered inputs to the system. Note that *T*_{1} = ∞ in the case that the systems remains in QSS, which may for example not be the case for models with switches (where, e.g., the values of a subset of the state variables may change when a certain condition is fulfilled) [34, 35].

Note that Eq. (3) (steady state) necessitates that Eq. (4) (quasi-steady state) is fulfilled, but not *vice versa*. Although QSS implies that (some of) the terms of the right-hand side of the ODE are large and leaves the left-hand side (derivative term) negligible, the derivative term may still be large enough for the state variable in QSS to change considerably during the time-span of a simulation; the key is that these changes mainly occur on a slow manifold.

### Zooming of Linear Models

The concept of zooming was introduced in [27], and a method was presented that is applicable to linear time-invariant (LTI) models, which on state space form reads:

where *A* ∈ ℝ^{n × n}, *B* ∈ ℝ^{n × m}, *C* ∈ ℝ^{l × n}, and *D* ∈ ℝ^{l × m}. The method is based on the existence of at least one subset of state variables in ${\mathcal{M}}_{o}$ for which the internal dynamics is very fast with respect to the current time-scale of interest. An algorithm for automatic reduction of linear models that is based on the detection of such subsets, which are referred to as fast clusters, is presented in [27]. If the *w* state variables of a fast cluster are replaced by a single state variable ${x}_{lL}^{\prime}={\sum}_{i=1}^{w}{x}_{{l}_{i}}$, we obtain a reduced version of the original model

where *x*_{
r
} ∈ ℝ^{(n - w + 1)}, *A*_{
r
} ∈ ℝ^{(n - w + 1) × (n - w + 1)}, *B*_{
r
} ∈ ℝ^{(n - w + 1) × m}, and *C*_{
r
} ∈ ℝ^{l × (n - w + 1)}.

The fraction parameters, which are typically computed from QSS assumptions and mass conservation relations, take the form

The fraction parameters are used for back-translation of the lumped state variable to the original state variables. Note that the fraction parameters are functions of the model parameters only, and therefore time-invariant; as we will see, these fraction parameter properties do in general not hold for nonlinear models. By comparing the reactions of the original and reduced models, we see that

where ${k}_{jL}^{\prime}$ is the rate parameter in the reaction from state variable ${x}_{lL}^{\prime}$ to state variable *x*_{
j
} in the reduced model, and *k*_{
ji
} is the rate parameter in the reaction from state variable *x*_{
i
} to state variable *x*_{
j
} in the original model.

Finally, note that Eqs. (5) and (6) provide a link between ${\mathcal{M}}_{o}$ and ${\mathcal{M}}_{r}$, which constitute two different levels of granularity. It is this link between the models that make us consider them as two different degrees of zooming, and the primary goal of this paper is to establish such a link also for nonlinear models.

### Extension to Nonlinear Models - Initial Observations

We will now present some key observations that are used in the derivation of the method for zooming of nonlinear models.

First observe that the mass, which corresponds to a weighted (w.r.t. the molecular weight) sum of the state variables, of a closed (no exchange of matter with the surroundings) nonlinear model is conserved. However, the total number of molecules is in general not conserved in such a model as it is for linear models. This is for example due to the formation and dissociation of complexes, which alters the total number of molecules in the system. For instance, the binding of A to B reduces the number of molecules, as the product AB only counts as one molecule; binding reactions cannot occur in linear models. A second, and related, observation is that another type of conservations appears in nonlinear models; conserved moieties. A moiety is a specific functional part of a molecule, and the weighted sum of the number of molecules that contain this functional part is constant in a closed system. The presence of such a conserved moiety is equivalent to the existence of a row vector *m* ∈ ℕ ^{n} for which *mS* = 0, which also implies that

If we let the rank of S be denoted by *n*_{
r
} , the number of linearly independent vectors for which Eq. (7) holds is equal to *n* - *n*_{
r
} , which implies the existence of a matrix *M*

where $M\in {\mathbb{N}}^{n\phantom{\rule{0.3em}{0ex}}-\phantom{\rule{0.3em}{0ex}}{n}_{r}\phantom{\rule{0.3em}{0ex}}\times \phantom{\rule{0.3em}{0ex}}n}$.

Let us now make some remarks regarding fast state variables in a nonlinear model. Let ${x}_{f}\in {\mathbb{R}}^{{n}_{f}}$ be the vector of all fast state variables in *T*_{0} ≤ *t* < *T*_{1}. For simplification we will assume that there are no inputs to the system, although it would in principle be possible to incorporate inputs in the following discussion. The right-hand side of the ODEs for these fast state variables, if there are no inputs, can be separated into two parts. The first part contains reactions between fast state variables that are significant for the fast dynamics; *r*_{
f
} (*x*_{
f
} , *p*), and the second part contains all other reactions, *r*_{
s
} (*x*, *p*), i.e.,

where *S*_{
f
} and *S*_{
s
} are the corresponding stoichiometric matrices. Let us now consider the fast stoichiometric matrix, *S*_{
f
} , and especially the conserved moieties that are implied by *S*_{
f
} . Since these moieties are only (approximately) conserved on a fast enough time-scale, we refer to such moiety conservations as apparent conservations. Let *M*_{
f
} be a matrix with *a* linearly independent rows such that

where ${M}_{f}\in {\mathbb{N}}^{a\times {n}_{f}}$. Each row of this matrix thus implies an apparent conserved moiety in the system. Let the sums of state variables that correspond to apparent moiety conservations (i.e., lumps of state variables) be denoted by *l*, so that

If we differentiate *l* with respect to time, we get

where Eqs. (9) and (10) were used.

It is interesting to note that the matrix *M*_{
f
} is not unique, but that in fact any matrix $\widehat{M}=N{M}_{f}$ can be used for lumping, where *N* ∈ ℝ^{a×a}is non-singular. This observation allows us to choose a matrix $\widehat{M}$ for which a maximal number of rows in $\widehat{M}\phantom{\rule{0.3em}{0ex}}{S}_{s}$ vanish, which results in the greatest possible reduction in the number of state variables. Finally note that Eq. (4), in the absence of inputs, gives

since the term *S*_{
f
}*r*_{
f
} (*x*_{
f
} , *p*) dominates the term *S*_{
s
}*r*_{
s
} (*x*, *p*). Note that Eq. (4) and consequently Eq. (13) only hold in *T*_{0} ≤ *t* < *T*_{1}, since the state is only known to be fast in this time span.

Eq. (12) defines the ODEs of the reduced model (the lumped state variables), and Eqs. (11) and (13) can in principle be used to calculate back-translation formulae, as is demonstrated with the small example model in the next section. However, as we shall see, this approach requires the explicit algebraic solution to a system of nonlinear equations, which is typically an infeasible task. Furthermore, there is not a clear one-to-one mapping between the state variables of the original and reduced models as in the case of proper lumping [27].

### A Small Example Model

We will now present a small example model, with three fast state variables, which is reduced with the approach discussed above. An alternative approach is then demonstrated with the advantage that it scales better to larger models.

Consider the reversible formation of a complex *C* from a substrate *A* and an enzyme *B*

consisting of the fast state variables *x*_{
f
} = (*A B C*) ^{T} , where the bullets (•) represent the slow state variables surrounding the three fast state variables in the model. The ODEs for the fast state variables take the form

The three state variables in the model constitute a fast cluster with two apparent conserved moieties, which may be represented by the following relations

where the lumped state variables *L*_{1} and *L*_{2} are introduced. Note that Eq. (12) defines the dynamics of the lumped state variables.

The distribution of mass among the fast state variables is given by Eq. (15) and by applying Eq. (13) to (14), which results in an equation system with the three fast state variables as unknowns

Analytic expressions for the fast state variables *A*, *B*, and *C* are given by the non-negative solution to Eqs. (16)-(18)

where ${K}_{1}=\frac{{k}_{-1}}{{k}_{1}}$.

We can now employ Eq. (12) to solve the ODEs for the lumped state variables *L*_{1} and *L*_{2}, and use Eqs. (19)-(21) as back-translation formulae to compute the trajectories of the original state variables *A*, *B*, and *C*. However, note that for even slightly larger clusters of fast species than the one discussed here it would not be possible to calculate algebraic expressions of the original state variables with this approach, since it builds on the explicit solution of a system of nonlinear equations, which quickly becomes infeasible with growing problem size.

Alternatively, we can take an approach to the problem that is inspired by the method for linear systems in [27]. The first step is to express Eqs. (16) and (17) as a linear system w.r.t. the state variables *A* and *C*

The solution to Eq. (22) w.r.t. *A* and *C* is

where ${K}_{1}=\frac{{k}_{-1}}{{k}_{1}}$, and the fraction parameters *η*_{
A
} (*B*, *p*) and *η*_{
C
} (*B*, *p*) are defined in Eqs. (23) and (24), respectively. The ODE for *L*_{1} is defined by Eq. (12), and the ODE for *B* can be derived by differentiation of *L*_{2} in Eq. (18), which gives that

The reduced model consists of the two state variables *L*_{1} and *B* (note that *L*_{2} does not appear in the reduced model), and the dynamics is described by Eqs. (12) and (25), respectively. Note that the state variables *A* and *C* can be back-translated from the reduced model with Eqs. (23) and (24). This approach is a bit more intricate than the first, but comes with the advantage that we do not need to solve a system of nonlinear equations.

### A Method for Zooming of Nonlinear Models

We will now step-by-step present a method that can be used to construct zoomable nonlinear biochemical models. This involves two sub-goals: i) to identify a reduced model that shares important characteristics with the original model, ii) to derive back-translation formulae that can be used to compute the original state variables and parameters from the reduced model.

In an initialization step of the method for a model ${\mathcal{M}}_{o}$ we first formulate mathematical equations for all conservation relations Eq. (8), state variables in steady-state Eq. (3), and quasi-steady state assumptions Eq. (4). If additional properties of the system are known, we also formulate the corresponding equations.

### Step 1

The first step of the method is to identify the apparent conservation relations in the model.

**Definition 1:** Let *S*_{
f
} be the stoichiometric matrix for the reactions *r*_{
f
} (*x*_{
f
} , *p*) as defined in Eq. (9). Each subset of state variables for which the corresponding rows of *S*_{
f
} are linearly dependent constitutes an apparent conservation relation. Hence the apparent conservation relations lie in the left null space of *S*_{
f
} and the dimension of this space is *n* - rank(*S*_{
f
} ).

Note that the apparent conservation relations are defined in Eq. (11). It is trivial to identify the set of all linearly dependent rows of *S*_{
f
} with a mathematical computing software (e.g., SBtoolbox for Matlab [36]).

### Step 2

The second step of the method is to define the state variables of the reduced model, which we refer to as modified lumped state variables.

**Definition 2:** Let *x* be a lumped state variable corresponding to a subset of the state variables in an apparent conservation relation. Then *x* is a modified lumped state variable if the lumping scheme with respect to the state variables of the original model is proper.

Note that the original state variables have a clear interpretation in the reduced model (i.e., that the lumped variables form disjoint sets) if the lumping scheme is proper, i.e.,

where *M*_{
m
} is *a* × *n*_{
f
} matrix with elements equal to 0 or 1 and column sums equal to 1, and *l*_{
m
} denotes the modified lumped state variables. We typically have a large freedom in the choice of *M*_{
m
} . The number of state variables is maximally reduced if all exact conservation relations in the model are retained as modified lumped state variables (and replaced by constants).

### Step 3

The third step of the method is to derive fraction parameters, which constitute the link between the reduced model and the original model. Let the original state variables that constitute the *k*:th modified lumped state variable ${l}_{{m}_{k}}$ be denoted by ${x}_{{m}_{k}}$, so that

A number of *n*_{
m
} equations that are linear w.r.t. ${x}_{{m}_{k}}$, and linearly independent, are required to calculate fraction parameters. The existance of *n*_{
m
} such equations results in an equation system

where both $A\left({l}_{m},p\right)\in {{\mathsf{\text{R}}}^{{n}_{m}}}^{\times {n}_{m}}$ and ${b}_{k}\left({l}_{m},p\right)\in {\mathsf{\text{R}}}^{{n}_{m}}$ are known, although some of the equations may in general be approximate (e.g., QSS). The matrix *A*(*l*_{
m
} , *p*) is invertible since the equations are linearly independent, and we have that

The fraction parameters can then be calculated

where we used Eq. (27) in the last step.

A modified lumped state variable for which an insufficient number of linear and linearly independent equations are available may still be used in the reduced model. However, the back-translation of the modified lumped state variable to the original state variables is then not possible, and step 3 of the method is ignored.

### Step 4

The fourth step of the method is to derive the rate of change of the modified lumped state variables. **Theorem:** The dynamics of the modified lumped state variables is given by

where *S*_{
s
} and *r*_{
s
} (*x*, *p*) were defined in Eq. (9), the matrix *M*_{
f
} is defined in Eq. (11), and

where the matrix *M*_{
m
} is defined in Eq. (26), and ${g}_{i}\left({l}_{m},p\right)\equiv {x}_{{f}_{i}}$ is introduced to simplify the notation.

**Proof:**

First subtract Eq. (26) from Eq. (11)

and differentiate *l* with respect to time, which gives

where *I* is the identity matrix and *J*(*l*_{
m
} , *p*) is the Jacobian of (*M*_{
f
} - *M*_{
m
} )*g*(*l*_{
m
} , *p*) with respect to *l*_{
m
} . The element *J*_{
ij
} of *J*(*l*_{
m
} , *p*) is given by

From Eq. (33) it is straight-forward to derive ${\stackrel{\xb0}{l}}_{m}$, which takes the form

where Eq. (12) was used in the last step. □

The matrix *I* + *J*(*l*_{
m
} ; *p*) is symbolically invertible, but may in general contain singularities for particular combinations of parameters values and state variable values. However, the matrix is always invertible for the models discussed in this paper, since the corresponding determinants are strictly positive.

### Step 5

The final step of the method is to back-translate the modified lumped state variables to the original state variables with the fraction parameters derived in step 3. This allows a comparison between the predictions by the reduced model to those of the original model.

The implementation of the method is straight-forward, and we have used Matlab (R2008b) together with the SBtoolbox [36] as computing software for the models in this paper.

## Results

We will now demonstrate the method through application to two example models.

### Enzyme Kinetics Model

The model below describes the process of conversion of a substrate, *S*, into a product, *P*, which is catalyzed by an enzyme, *E*.

Note that the complexes *C*_{
s
} and *C*_{
p
} are formed by *S* bound to *E*, and *P* bound to *E*, respectively. This model is frequently occurring as part of larger models of biological systems, although the reaction from *C*_{
S
} to *C*_{
P
} is sometimes neglected, or reversible. The ODEs for the model are listed in Appendix A.1, where the three reactions are defined as: *r*_{1} = *k*_{1}*SE* - *k*_{-1}*C*_{
S
} , *r*_{2} = *k*_{2}*C*_{
S
} , and *r*_{3} = *k*_{3}*C*_{
P
} - *k*_{-3}*PE*.

The reaction terms *r*_{1}(*x*, *p*) and *r*_{3}(*x*, *p*) are assumed to be dominating, and the reaction term *r*_{2}(*x*, *p*) to be insignificant in the ODEs. This results in that all state variables are in QSS, which gives

We denote the sum of the state variables containing the enzyme by

which is constant since the total amount of the enzyme *E* is conserved in the system.

#### Reduction of the Enzyme Kinetics Model

The first step of the method is to identify the apparent conservation relations from the matrix *S*_{
f
} . Since *r*_{2}(*x*, *p*) is dominated by *r*_{1}(*x*, *p*) and *r*_{3}(*x*, *p*) the model ODEs can be written on the form of Eq. (9)

A basis of the left null space of *S*_{
f
} is given by the row vectors of *M*_{
f
} , which is defined by

where *L*_{
S
} and *L*_{
P
} are apparent conservation relations and *LE* is an exact conservation relation.

The second step is to define the modified lumped state variables on the form of Eq. (26). The number of state variables is maximally reduced if *L*_{
E
} is retained as a state variable in the reduced model (i.e., since *L*_{
E
} , unlike *L*_{
S
} and *L*_{
P
} , can be replaced by a constant). The vector of modified lumped state variables then is defined

In the third step of the method we calculate fraction parameters for the modified lumped state variable *L*_{
E
} . There are five equations (Eqs. (34)-(35) and (37)) that are linear w.r.t. the state variables *E*, *C*_{
S
} , and *C*_{
P
} , which are lumped into the state variable *L*_{
E
} . Note that only *n*_{
m
} = 3 equations are required to derive fraction parameters, and we use Eqs. (34)-(36) to formulate an equation system as in Eq. (28) with the solution

where ${M}_{1}=\frac{{k}_{1}}{k-1}$ and ${M}_{3}=\frac{{k}_{3}}{k-3}$. The two remaining modified lumped state variables correspond to *S* and *P* in the original model, so we define that ${l}_{{m}_{1}}=S\equiv {g}_{1}\left({l}_{m},p\right)$ and ${l}_{{m}_{2}}=P\equiv {g}_{3}\left({l}_{m},p\right)$.

In the fourth step we derive the rate of change of the modified lumped state variables. Eq. (32) gives that ${\stackrel{\xb0}{L}}_{E}=0$, which is replaced by a constant, and

where

The two ODEs in Eqs (41)-(42) define the dynamics of the state variables in the reduced model. We finally note that the exact conservation relation for the substrate, *L*_{
T
} = *S* + *P* + *C*_{
S
} + *C*_{
P
} , together with Eqs. (39)-(40) can be used to reduce the model further to a single state.

In the fifth step of the method we use the fraction parameters, defined in Eqs. (38)-(40), to back-translate the modified lumped state variables to the state variables of the original model. A comparison between predictions of the original state variables, from simulations of the original model and the reduced model, is presented in Figure 1. Implementations of the original model (Additional file 1), the reduced model (Additional file 2), and a script for simulation with SBtoolbox2 for MATLAB [36] (Additional file 3), are available in *Additional files*.

The only assumption that was used in the derivation of the reduced model is that the reaction terms *r*_{1}(*x*, *p*) and *r*_{3}(*x*, *p*) dominate the reaction term *r*_{2}(*x*, *p*), which results in that all state variables are in QSS. To assess the impact of these assumptions on the reduced model we compute the relative difference between the state variables in the original and in the reduced model

where ${x}_{i}^{o}$ is state variable *i* in the original model, ${x}_{i}^{r}$ is the corresponding back-translated state variable in the reduced model, and |*x*| denotes the elementwise absolute values of *x*. The maximal mean and infinity norm of *ε*_{
i
} (*t*) in Eq. (43) over time is presented in Table 1 for parameter values over five orders of magnitude. In general, the reduced model appears to be robust to changes in the parameter values, although slightly more sensitive to some parameters (e.g., small values of *k*_{-1}, large values of *k*_{1}, or large values of *k*_{2}, which violate the assumptions used in the reduction). However, note that the validity of the QSS assumption may also depend on the state variables, for example the total concentration of the enzyme. Interestingly, we observed that the reduced model can well approximate the original model over several order of magnitudes around the nominal enzyme concentration (*L*_{
E
} = 1). It is well-known that the QSS approximation is only valid for sufficiently small enzyme concentrations, and as expected the performance of the reduced model starts to decrease for immense enzyme concentrations.

Note that all the state variables of the original model have a direct biological interpretation also in the reduced model, and that Eqs. (38)-(40) can be used to back-translate the state variables. The reduced model may be depicted

where the fraction parameters specify the distribution of the enzyme among the corresponding original state variables.

### Glucose Transport in Budding Yeast

A model for the transport of glucose into a cell of baker's yeast (*S. cerevisiae*), which constitutes the first step of glycolysis, is presented in [37]. The inflow of glucose is modeled as a facilitated diffusion process, in which a carrier enzyme is responsible for the transport between the inner and outer regions of the cellular membrane. It is assumed that glucose 6-phosphate (G6P) has an inhibitory role in the glucose transport process by binding to the transporter. A graphical representation of the model is shown in Figure 2, and the ODEs for the state variables are listed in Appendix A.2.

In [27] we described how the calculation of fraction parameters, based on a set of assumptions, leads to the same reaction rates in the reduced model as were reported in [37]. The assumptions are that state variables participating in reactions for uptake and release of glucose and G6P across the cell membrane are in QSS, that the transporter is conserved, and that the concentrations of the transporter in the inner and outer regions of the cellular membrane are constant.

The assumption that the state variables ${x}_{Glc}^{e},\phantom{\rule{2.77695pt}{0ex}}{x}_{Glc}^{i},\phantom{\rule{2.77695pt}{0ex}}{x}_{E-G6P}^{i}$, and ${x}_{E-Glc-G6P}^{i}$, which participate in the uptake and release of G6P and glucose across the cell membrane, are in QSS gives that

We have the following exact conservation relations in the model

where *L*_{
E
} , *L*_{
Glc
} , and *L*_{G 6P}are constant over time. The assumption in [37] that the concentrations of the transporter in the inner and outer regions of the cell membrane are constant is formulated

It is not clear how equations for back-translation of the state variables in the reduced model in [27, 37] can be derived. The reduced model has three state variables; external- and internal glucose and G6P, but only two differential equations for the in- and outflow of glucose, since the ODE for G6P is replaced by a representative function that is inferred from the G6P data. Our method does not rely on that such information is available, although it would in principle be possible to utilize data fitted functions for the state variables. Also note that the equations in the reduced model describe the total influx and efflux of glucose across the membrane [27], which cannot be interpreted w.r.t. the state variables of the original model. Other assumptions in [37] that complicates a comparison with our method are that the efflux of glucose is negligible, that the concentration of glucose in the cytosol is negligible, and that the concentrations of the transporter are constant in the inner and outer regions of the cell membrane (Eq. (50)).

It is not possible to generate a reduced model by direct substitution of the fraction parameters that were derived in [27] into the ODEs of the original model, since this would lead to the prediction that the state variables are constant (as discussed in [27]). We will now instead illustrate how our method can be used to derive a reduced and zoomable version of the glucose transport model.

#### Reduction of the Glucose Transport Model

Before applying our method to the glucose transport model we tried an alternative approach. Eqs. (43)-(46) were solved w.r.t. the state variables in QSS, and the resulting expressions were then substituted into the remaining ODEs. The details of the derivation of the reduced model are presented in Appendix A.3. The reduced model does not produce satisfactory predictions for any other state variable than ${x}_{G6P}^{i}$, which remains approximately constant during the simulation. Implementations of the original model (Additional file 4), the reduced model (Additional file 5), and a script for simulation with SBtoolbox2 for MATLAB [36] (Additional file 6), are available in *Additional files*.

Since the first approach turned out to be insufficient for reduction of the glucose transport model we applied our method to the same model. Following [37], we initially assumed constant transporter concentrations in the inner and outer regions of the cellular membrane, as defined by Eq. (50). For the details on the derivation of the reduced model we refer to Appendix A.4. The reduced model clearly performs better than the model resulting from the first approach, but it is still not satisfactory. However, the assumption of constant regional concentrations of the transporter may not be valid since the transport of glucose across the cell membrane is a rate limiting step in the model, and appears to be important for the state variable dynamics. We therefore decided to neglect Eq. (50) in the reduction process.

Implementations of the original model (Additional file 4), the reduced model (Additional file 7), and a script for simulation with SBtoolbox2 for MATLAB [36] (Additional file 8), are available in *Additional files*. In the first step of the method we identify the following apparent conservation relations

We note that there are two disjoint clusters of fast reactions in the model, corresponding to the outer- and inner parts of the cell membrane.

In the second step we define the modified lumped state variables. We decide to keep the lumped state variables ${L}_{{E}_{1}}$ and *L*_{G 6P}as modified lumped state variables. The choice to keep *L*_{G 6P}leads to the largest possible reduction in the number of state variables, since the conservation of *L*_{G 6P}is exact (which is not true for any other state variable in *l*). The modified lumped state variables are defined true for any other state variable in *l*). The modified lumped state variables are defined

where

Note that two of the state variables in the original model, ${x}_{Glc}^{e}$ and ${x}_{Glc}^{i}$, are also modified lumped state variables.

In the third step of the method we calculate fraction parameters for the modified lumped state variables that correspond to more than one of the original state variables (i.e., ${L}_{{E}_{1}}$, *L*_{G 6P}, and ${L}_{{E}_{3}}$). All of the modified lumped state variables satisfy the requirement that at least *n*_{
m
} of Eqs. (43)-(49) and Eq. (52) are linear, and linearly independent, with respect to the corresponding original state variables. Eqs. (43) and (52) form a nonlinear equation system with the solution

Let us define that ${g}_{6}\equiv {x}_{E-Glc}^{e}$ and ${g}_{8}\equiv {x}_{E}^{e}$. Similarly, the fractions of the two carrier state variables in the inner regions of the cell to the lumped state variable ${L}_{{E}_{3}}$ can be computed from Eqs. (44) and (52) (Eq. (28))

We define that ${g}_{7}\equiv {x}_{E-Glc}^{i}$ and ${g}_{9}\equiv {x}_{E}^{i}$. The fraction parameters for the G6P-state variables can be computed from Eqs. (45)-(46) and Eq. (28) with the solution

where $\xi ={K}_{3}{K}_{4}+{K}_{4}{x}_{E-Glc}^{i}+{K}_{3}{x}_{E}^{i}$. We define that ${g}_{3}\equiv {x}_{E-G6P}^{i}$, ${g}_{4}\equiv {x}_{E-Glc-G6P}^{i}$, and ${g}_{5}\equiv {x}_{G6P}^{i}$. For the two original state variables that are kept as modified lumped state variables in the reduced model we define that ${g}_{1}\equiv {x}_{Glc}^{e}$ and ${g}_{2}\equiv {x}_{Glc}^{i}$.

The fourth step of the method is to derive rate equations for the modified lumped state variables. Since the apparent conservations are separated into two disjoint clusters of fast state variables, we can treat the model for the inner and outer regions of the membrane separately. Let the modified lumped state variables corresponding to the outer region of the cell membrane be denoted by ${l}_{{m}_{1}}={\left({x}_{Glc}^{e}\phantom{\rule{1em}{0ex}}{L}_{{E}_{1}}\right)}^{T}$, and the variables in the inner region of the cell membrane by ${l}_{{m}_{2}}={\left({x}_{Glc}^{i}\phantom{\rule{1em}{0ex}}{L}_{{E}_{3}}\right)}^{T}$. Note that the state variable *L*_{G 6P}can be replaced by a constant in the model, since ${\stackrel{\xb0}{L}}_{G6P}=0$. The ODEs of the modified lumped state variables are derived with Eq. (31). In the inner region the ODEs are

where

In the larger outer region of the cell membrane the ODEs take the form

where

In the fifth step of the method the reduced model, which is defined by the four ODEs in Eqs. (56)-(59), is simulated. The trajectories of the state variables of the reduced model can be back-translated to the original state variables with the fraction parameters defined in Eqs. (53), (54), and (55). The simulation results are shown in Figure 3 and Figure 4. All the state variables can be back-translated properly, which shows that the model properties that are important for recovery of the state variables are retained in the reduction. Implementations of the original model (Additional file 4), the reduced model (Additional file 9), and a script for simulation with SBtoolbox2 for MATLAB [36] (Additional file 10), are available in *Additional files*.

If we use ${L}_{{E}_{2}}$ instead of *L*_{G 6P}as a modified lumped state variable, the reduced model will have the same state variables as the reduced model in [27, 37] (i.e., ${x}_{Glc}^{e},\phantom{\rule{2.77695pt}{0ex}}{x}_{Glc}^{i}$, and ${x}_{G6P}^{i})$ two additional state variables for the transporter. This gives a reduced model with five state variables, but equally many parameters as in the previous case. A comparison between the original model and the reduced model, w.r.t. the original state variables, is shown in Figure 5 and Figure 6. As can be seen the comparison is very good, in fact it is even slightly better than for the reduced model with four state variables. The details of the derivation of the reduced model are presented in Appendix A.5. Implementations of the original model (Additional file 4), the reduced model (Additional file 11), and a script for simulation with SBtoolbox2 for MATLAB [36] (Additional file 12), are available in *Additional files*.

Note that the only assumption used to derive the reduced model is that states that are involved in reactions at the membrane are in QSS. To investigate the parameter space region in which the QSS assumptions are valid we use the measure defined in Eq. (43). The maximal mean and infinity norm of the relative difference between the original and the reduced model in Eq. (43) over time is presented in Table 2. The reduced model appears to be relatively robust to changes in the parameters, although sensitive to small values of *k*_{-4} and to large values of *k*_{4}. This is mainly due to that a large proportion of the transporter *E* is absorbed in ${x}_{E-G6P}^{i}$, which leads to that some of the QSS assumptions are invalid. We also observed that relative difference between the models is insensitive to the total concentration of the transporter for several orders of magnitude around the nominal value (*L*_{
E
} = 0.01). However, note that this observation is specific to the studied model and may not be generalizable to other similar biochemical models.

## Discussion

In this paper we have presented a novel method for reduction of biochemical models that is compatible with the concept of zooming. Several methods for reduction of biochemical models already exist in the literature. However, few of these methods result in biochemically interpretable models, and to our knowledge there are no nonlinear lumping methods for which the state variables and parameters of the reduced model can be back-translated (mapped) to the original model.

The application of the QSS assumption has been a commonly used tool in the modeling of biochemical networks since the late 1960s, and in chemical kinetics for more than 80 years [38]. The validity of the QSS approximation is well studied both for specific biochemical mechanisms [38, 39] and for more complex models [40, 41]. The resulting equations, together with conservation relations, are typically used to eliminate some of the state variables in the model (e.g., see [40]). However, with the examples in this paper we have showed that such an approach is not always sufficient, and we propose to use proper lumping of state variables in combination with back-translation.

Our method has several important advantages when applied to biochemical models. The most important advantage is that we end up with reduced models with a clear biological interpretation, meaning that each state variable of the original model corresponds to a fraction of exactly one of the state variables in the reduced model. A consequence is that neighboring species in the original model remain neighbors in the reduced model. Hence we can consider the original and reduced models as two different degrees of zooming; a concept that we discussed in some detail in [27] for linear models.

The work in this paper can be seen as an extension of the theory introduced for linear models in [27] to nonlinear models. The method is based on assumptions regarding the dynamics that result in a sufficient number of equations that are linear w.r.t. the state variables to be back-translated. Such equations are typically a natural result of QSS assumptions and conservations relations in models based on mass action kinetics [42], and in particular in models that involve transporters and enzymes (e.g., the models in this paper). However, note that our method may also be applicable to models with other types of reaction kinetics. We also note that if too few linear relations are available for calculation of fraction parameters for a part of a model, this part can still be reduced and the reduced model can be simulated, although we cannot back-translate the corresponding modified lumped state variables since no fraction parameters are available. However, depending on the purpose of the model it may be enough to calculate fraction parameters for a subset of the state variables in the reduced model. Linearization of the model around a steady state operating point may also be a feasible approach to calculate fraction parameters with the method in [27].

The proposed method enables mapping of the state variables and parameters of the reduced model to those of the original model. In [27] we referred to this mapping as back-translation. Back-translation is of great importance, since we can directly observe how modifications to the reduced model impact the original model. It also gives the modeler an opportunity to check whether the assumptions underlying the reduction are acceptable. To illustrate the power of back-translation we provide plots for comparison of simulations of the original and reduced models, w.r.t. the original state variables, for the models to which the method is applied in this paper.

Back-translation of state variables typically requires the solution of a system of nonlinear equations, which often results from the assumption of state variables in QSS and conservation relations. Unfortunately, analytic solutions to systems of nonlinear equations do in general not exist. An advantage with the proposed method is that such solutions are not required, since they are replaced by computation of the inverse of a matrix for each cluster of fast state variables, which is in general a more feasible task.

Our method was applied to a small model with five state variables that commonly appears as part of larger biochemical models, and to a previously published model for the transport of glucose in baker's yeast (*S. cerevisiae*) [37]. The first model was reduced from five to one state variable, and from five to three parameters. However, note that our focus has been on the reduction of the number of state variables and not the number of parameters, which are reduced as a side-effect of the QSS assumptions.

The model for glucose transport was first reduced with an approach in which the QSS equations and conservation relations were directly substituted into the remaining ODEs. The results of this approach are not satisfactory since the reduced model gives predictions that are different from the original model for most state variables. Our method was then applied to the same model both together with the assumption of equal concentration of the transporter in the inner and outer regions of the cell membrane used in [37], and without any additional assumptions. The application of our method together with the assumptions used in [37] results in a model with three state variables. The state dynamics is significantly better preserved than with the first approach, although still not satisfactory. We then decided to reduce the original model without the assumption regarding the localization of the transporter, with two different definitions of modified lumped state variables. While one of these definitions results in a reduced model with four state variables and gave rather accurate predictions, the other choice reduces the number of state variables to five and gives an excellent description of the state dynamics. It is therefore apparent that there is a tradeoff between accuracy and the number of state variables in the reduction process. The glucose transport model corresponds to the first part of glycolysis, in which glucose is transported into the cell. We therefore propose that it might be rewarding to carefully re-investigate the assumptions underlying the reaction rate equations in complete models of glycolysis (see [43] for one example).

We have also observed a few issues regarding the implementation of the method. The symbolic inversion of the matrix that is necessary to compute the dynamics of the modified lumped state variables may be expensive. However, this is typically only a practical limitation for large matrices, which result from large clusters of fast state variables. In our experience large clusters of fast state variables are relatively rare also in large biochemical models. Another option, if it is not practically feasible to invert the symbolic matrix, is to solve the system of linear equations in Eq. (31) numerically. We also observed that the symbolic right-hand side of the resulting differential equations may be long. However, these are usually not practical limitations for the applicability of the method, e.g., the simulations of all examples in this paper are very fast on a modern computer. Available methods to reduce the analytic reaction rate expressions include sensitivity analysis w.r.t. state variables and parameters, and the method proposed in [17].

There is still no consensus method for automatic identification of state variables in QSS, although criteria for the detection of state variables in QSS have been proposed, for example in [30]. A simple approach is to simulate the original model and investigate for which state variables the corresponding in- and outflow reaction rates are approximately equal. State variables for which this condition holds are then considered to be in QSS. Note that for the models in this paper it was already clear from the biochemical understanding of the corresponding systems which of the state variables that could be considered fast (see [37] for the glucose transport example). However, an appropriate general criterion for automatic identification of state variables in QSS is still lacking.

Although the theory presented in this paper constitutes a great leap forward for construction of zoomable models, more research is required to make the method fully automatic. An important challenge is to define a meaningful measure for the similarity between the hierarchical model layers (degrees of zooming). Another interesting, although trivial, observation that deserves further attention is that QSS assumptions typically do not hold in the whole parameter space. Although the reduced models in this paper appear to be robust to varying parameter values it may not be the case in general. It may therefore be revealing to compare the original model and the reduced model to characterize the parameter space regions in which the QSS-assumptions are valid.

## Conclusions

We have presented a novel method for reduction of biochemical models that is compatible with the concept of zooming. Zooming allows the modeler to operate on different levels of model granularity, and enables a direct interpretation of how modifications to the model on one level affect the same model on other levels in the hierarchy. The proposed method is based on the application of proper lumping in combination with the identification of linear relations in nonlinear equations.

The method was applied to two example models. The first model is small and commonly occurring as a part of larger biochemical models. The second example is a model for glucose transport in baker's yeast, which constitutes the starting point for glycolysis. Both models could be significantly reduced with the proposed method, and the resulting state variables could be back-translated to the original state variables. The method that is presented in this paper constitutes an extension of the method that was previously developed for linear biochemical models to its nonlinear counterpart. Since most models in the systems biology community are in fact nonlinear, our method constitutes an important step towards zoomable biochemical models.

## A Appendix

### A.1 Appendix 1

The ordinary differential equations for the enzyme kinetics model take the form

where *r*_{1} = *k*_{1}*SE* - *k*_{-1}*C*_{
S
} , *r*_{2} = *k*_{2}*C*_{
S
} , and *r*_{3} = *k*_{3}*C*_{
P
} - *k*_{-3}*PE*.

The parameters are set to values that satisfy the assumptions of dominating and insignificant reaction terms, with *k*_{1} = 1000, *k*_{-1} = 2000, *k*_{2} = 1, *k*_{3} = 3000, and *k*_{-3} = 1000, together with the initial conditions *S*(0) = *E*(0) = 1 and *P*(0) = *C*_{
S
} (0) = *C*_{
P
} (0) = 0. This gives the parameter values *M*_{1} = 0.5, *M*_{3} = 3 and *L*_{
E
} = 1 in the reduced model. Initial conditions can in general be obtained from a short simulation of the original model, until the fast state variables reach QSS, but in this case Eq. (19) gives an analytic expression of *S*(0) (*P*(0) = 0)

where ${M}_{1}^{-1}={K}_{1}$, *S* = *A*, *L*_{
S
} = *L*_{1}, and *L*_{
E
} = *L*_{2}.

### A.2 Appendix 2

The ordinary differential equations for the glucose transport model take the form

which was introduced in [27].

### A.3 Appendix 3

In this section we investigate an alternative (naive) approach to reduce the glucose transport model. The first step is to identify state variables for which the QSS assumption holds, and the mass conservation relations in the model. In the second step of this approach we then substitute the corresponding system of equations into the ODEs corresponding to slow state variables.

Now consider the model for glucose transport in yeast. We assume that the state variables ${x}_{Glc}^{e}$, ${x}_{Glc}^{i}$, ${x}_{E-G6P}^{i}$ and ${x}_{E-Glc-G6P}^{i}$ are in QSS, which gives Eqs. (43)-(46). Note that Eqs. (45)-(46) indirectly imply that ${x}_{G6P}^{i}$ is in steady state. The substitution of Eqs. (43) - (46) into the ODEs of the original model gives