We start with some preparatory background on the chemical master equation (CME) and parametric model order reduction. This serves in particular to fix the notation used throughout the remainder of the article. Then the application of parametric model order reduction to the CME is introduced.

### The chemical master equation

The structure of a biochemical reaction network is characterized completely by the list of involved species, denoted as *X*_{1}, *X*_{2}, …, *X*_{n}, and the list of reactions, denoted as

$$\sum_{i=1}^{n} s_{ij} X_i \;\longrightarrow\; \sum_{i=1}^{n} r_{ij} X_i, \qquad j = 1, \ldots, m, \qquad (1)$$

where *m* is the number of reactions in the network, and the factors *s*_{ij} and *r*_{ij} are the stoichiometric coefficients of the reactant and product species, respectively [12]. The net change in the amount of species *i* occurring through reaction *j* is given by

$$v_{ij} = r_{ij} - s_{ij}. \qquad (2)$$

Reversible reactions can always be written in the form (1) by splitting the forward and reverse path into two separate irreversible reactions.

For a stochastic network model, the variables of interest are the probabilities that the network is in any of the possible states, which are characterized by the molecular copy numbers of the individual species *X*_{1}, *X*_{2}, …, *X*_{n}. We denote the molecular copy number of *X*_{i} by *x*_{i}. Then, the state variables of the stochastic model are given by the real numbers

$$p(t, x_1, \ldots, x_n) = \mathrm{Prob}\bigl(\text{copy number of } X_i \text{ equals } x_i \text{ at time } t,\ i = 1, \ldots, n\bigr) \qquad (3)$$

for $x_i \in \mathbb{N}_0$, *i* = 1,…,*n*. As a short-hand notation for (3), we write *p*(*t*,*x*), with $x = (x_1, \ldots, x_n)^T \in \mathbb{N}_0^n$.

The transitions from one state to another are determined by chemical reactions according to (1). The changes in the molecule numbers are described by the stoichiometric reaction vectors

$$v_j = (v_{1j}, \ldots, v_{nj})^T, \qquad j = 1, \ldots, m. \qquad (4)$$

To avoid needlessly complicated cases, we assume *v*_{j} ≠ *v*_{k} for *j* ≠ *k*.

The probabilities of the network being in any of the possible states *x* evolve over time, and their evolution is governed by the chemical master equation (CME) as derived in [1]. From a given molecular state *x*, one can compute the propensity *ν*_{j} that reaction *j* takes place according to the law of mass action as

$$\nu_j(x, \theta) = \theta_j \prod_{i=1}^{n} \binom{x_i}{s_{ij}}, \qquad (5)$$

where $\theta = (\theta_1, \ldots, \theta_m)^T$ is the vector of reaction rate constants, which are model parameters depending on the physical properties of the molecules involved in the reactions. The propensities are related to the probability that reaction *j* will occur in a short time interval of length *dt* when the system is in state *x*:

$$\mathrm{Prob}\bigl(\text{reaction } j \text{ occurs in } [t, t + dt)\bigr) = \nu_j(x, \theta)\, dt + \mathcal{O}(dt^2). \qquad (6)$$

Taking the possible transitions and the corresponding reaction propensities together yields the CME, a linear differential equation where the variables are the probabilities that the system is in each of the possible molecular states *x*:

$$\frac{\mathrm{d}}{\mathrm{d}t} p(t, x) = \sum_{j=1}^{m} \bigl( \nu_j(x - v_j, \theta)\, p(t, x - v_j) - \nu_j(x, \theta)\, p(t, x) \bigr) \qquad (7)$$

for $x \in \mathbb{N}_0^n$. The CME (7) is subject to an initial condition *p*(*t*_{0},*x*) = *p*_{0}(*x*) for $x \in \mathbb{N}_0^n$.

Despite being linear, the CME is hard to solve numerically. The problem is that the state space is infinite for most systems, since in general all possible states $x \in \mathbb{N}_0^n$ of the reaction network (1) must be considered. Instead of directly solving the CME (7), a number of alternative approaches to study the stochastic dynamics of biochemical reaction networks have been suggested. The most common approach is to generate simulated realizations of the stochastic process described by the reaction network (1), using for example the Gillespie algorithm [13]. In this approach, the probabilities *p*(*t*,*x*) for the possible system states are estimated from many simulated realizations. However, since this requires a large number of realizations, it is computationally expensive.
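For illustration, the following sketch implements Gillespie's direct method for a hypothetical birth–death network (∅ → *X* with propensity *θ*_{1}, *X* → ∅ with propensity *θ*_{2}*x*). The network, parameter values and function names are illustrative assumptions, not taken from the article.

```python
import random

def gillespie_birth_death(theta1, theta2, x0, t_end, seed=0):
    """Direct-method SSA for an assumed birth-death network:
    0 -> X with propensity theta1, X -> 0 with propensity theta2 * x.
    Returns the trajectory as a list of (time, copy number) pairs."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    traj = [(t, x)]
    while t < t_end:
        nu = (theta1, theta2 * x)       # mass-action propensities in state x
        nu_total = nu[0] + nu[1]
        if nu_total == 0.0:
            break                       # no reaction can fire any more
        t += rng.expovariate(nu_total)  # exponential waiting time to next event
        # choose the firing reaction proportionally to its propensity
        x += 1 if rng.random() < nu[0] / nu_total else -1
        traj.append((t, x))
    return traj

traj = gillespie_birth_death(theta1=10.0, theta2=1.0, x0=0, t_end=50.0)
```

Averaging many such realizations (with different seeds) yields empirical estimates of the state probabilities, which is exactly why this route becomes expensive.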

As a more direct approach, Munsky and Khammash [3] have proposed the finite state projection (FSP), where the CME is solved on a finite subset of the state space. Here, this subset is denoted by *Ω*, and is defined as

$$\Omega = \{ x^{(1)}, \ldots, x^{(d)} \} \subset \mathbb{N}_0^n, \qquad (8)$$

where the *x*^{(i)} are the system states for which the probabilities are computed in the projected model. The underlying assumption is that the probabilities for other states will be very low on the time scale of interest—otherwise the FSP may not yield good approximations to the solution of the CME. In particular, we assume the time interval of interest to be given by [0,*T*] for a final time *T* > 0. The probabilities for the states *x*^{(i)} in *Ω* are written in the vector *P*(*t*), approximating *p*(*t*,*x*) at the finite number of states in *Ω*:

$$P(t) = \bigl( p(t, x^{(1)}), \ldots, p(t, x^{(d)}) \bigr)^T. \qquad (9)$$

The equation to be solved with the FSP approximation is

$$\dot P(t) = A(\theta)\, P(t), \qquad P(t_0) = P_0, \qquad (10)$$

where $A(\theta) \in \mathbb{R}^{d \times d}$ is the matrix of state transition propensities, and $P_0 \in \mathbb{R}^{d}$ is a vector of initial probabilities for the states in *Ω*. The elements of the matrix *A*(*θ*) are computed as

$$A_{ki}(\theta) = \begin{cases} -\sum_{j=1}^{m} \nu_j(x^{(i)}, \theta) & \text{if } k = i, \\ \nu_j(x^{(i)}, \theta) & \text{if } x^{(k)} = x^{(i)} + v_j, \\ 0 & \text{otherwise.} \end{cases} \qquad (11)$$

We will frequently omit the parameter dependence of the solution (and other parametric quantities). Hence the solution *P*(*t*), as an abbreviation of *P*(*t*,*θ*), of (10) is an approximation to the solution *p*(*t*,*x*) of the original CME on the domain *Ω*. Munsky and Khammash [3] have also derived an upper bound on the error between the solution *P*(*t*) computed via the FSP and the solution *p*(*t*,*x*) of the original CME on *Ω*.
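A minimal sketch of assembling the FSP matrix *A*(*θ*), again for the hypothetical birth–death network used above (a single species, *Ω* truncated to {0, …, *d* − 1}); the network and all names are illustrative assumptions.

```python
import numpy as np

def fsp_matrix_birth_death(theta, d):
    """FSP matrix A(theta) of (10) on the truncated state set
    Omega = {0, 1, ..., d-1} for the assumed birth-death network
    0 -> X (propensity theta[0]) and X -> 0 (propensity theta[1] * x)."""
    A = np.zeros((d, d))
    for i in range(d):                     # column i corresponds to state x = i
        nu_birth, nu_death = theta[0], theta[1] * i
        A[i, i] = -(nu_birth + nu_death)   # total outflow from state i
        if i + 1 < d:
            A[i + 1, i] = nu_birth         # transition i -> i + 1
        if i > 0:
            A[i - 1, i] = nu_death         # transition i -> i - 1
    return A

theta = np.array([10.0, 1.0])
A = fsp_matrix_birth_death(theta, d=40)
```

Note that the column sums of *A*(*θ*) are non-positive: probability can only leak out of the truncated state set, which is the basis of the FSP error bound.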

Here, we consider in addition an output vector $y(t) \in \mathbb{R}^{p}$ defined by

$$y(t) = C\, P(t) \qquad (12)$$

with $C \in \mathbb{R}^{p \times d}$. Examples of relevant outputs are the probability that the molecular copy numbers are in a certain domain $\mathcal{D} \subset \Omega$, which is achieved by the row vector output matrix *C* defined by *C*_{i} = 1 if $x^{(i)} \in \mathcal{D}$, otherwise *C*_{i} = 0, with *p* = 1, or the expected molecular copy numbers, given by

$$E(x(t)) \approx \sum_{i=1}^{d} x^{(i)} P_i(t), \qquad (13)$$

i.e. *C* = (*x*^{(1)},…,*x*^{(d)}) with *p* = *n*.
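A small illustration of both output choices for an assumed single-species model (*n* = 1, *Ω* = {0, …, 39}) with a hypothetical domain {0, …, 9} and a uniform placeholder probability vector; all concrete values are assumptions for the sketch.

```python
import numpy as np

d = 40
states = np.arange(d, dtype=float)     # states x^(1), ..., x^(d), one species

# Output 1: probability that the copy number lies in the hypothetical
# domain {0, ..., 9}; C is a single row (p = 1) of zeros and ones.
C_domain = (states < 10).astype(float).reshape(1, d)

# Output 2: expected copy number, C = (x^(1), ..., x^(d)) with p = n = 1.
C_mean = states.reshape(1, d)

P = np.full(d, 1.0 / d)                # uniform placeholder probabilities
y_domain = C_domain @ P                # output y = C P
y_mean = C_mean @ P
```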

The basic motivation for the model reduction presented here is that we are interested in parametric analysis of the model, where the model (10) has to be solved many times with different values of the parameters *θ*. Due to the typically high dimension of the matrix *A*(*θ*), even a single simulation is computationally expensive, and analysis tasks requiring many repeated simulations are often computationally infeasible. Thus, the primary goal is to derive a reduced model which is rapidly solvable and provides an approximation *y*_{r}(*t*) to the output *y*(*t*), potentially without any consideration of the original state vector *P*(*t*).

### Order reduction of parametric models

Model order reduction of parametric problems is a very active research field in systems theory, engineering and applied mathematics. We refer to
[8, 10, 11] and references therein for more information on the topic.

Here, we apply the reduction technique for parametric problems presented in [9], adapted to our notation. It is based on two biorthogonal global projection matrices $V, W \in \mathbb{R}^{d \times r}$ with *r* ≪ *d* and *W*^{T}*V* = *Id*, where *r* is the dimension of the reduced model. The matrix *V* is assumed to span a space that approximates the system state variation for all parameters and times. The construction of such matrices will be detailed in the next subsection.

The gain of computational efficiency in repeated simulations comes from a separation of the simulation task into a computationally expensive “offline” phase and a computationally cheap “online” phase. In the offline phase, suitable projection matrices *V* and *W* are computed without fixing specific parameter values. With the projection matrices, a reduced model with the same free parameters as the original model is computed. In the online phase, the reduced model is simulated with the actually chosen parameter values, which is typically several orders of magnitude faster than the simulation of the original model. For analysis tasks with repeated simulations, only the online phase has to be repeated for different choices of the parameter values, yielding an overall gain in computational efficiency.

#### Decomposition into parametric and non-parametric parts

The reduction technique assumes a separable parameter dependence of the full system matrices and the initial condition. This means we assume that there exist a suitably small constant $Q_A \in \mathbb{N}$, parameter-independent components $A^{[q]} \in \mathbb{R}^{d \times d}$, and parameter-dependent scalar coefficient functions $\sigma_A^{[q]}(\theta)$ for *q* = 1,…,*Q*_{A} such that

$$A(\theta) = \sum_{q=1}^{Q_A} \sigma_A^{[q]}(\theta)\, A^{[q]}, \qquad (14)$$

and similarly for the output matrix *C* and the initial condition *P*_{0}. We assume that *θ* stems from some domain $\Theta$ of admissible parameters. In the next step, the reduced component matrices and initial conditions are determined by

$$A_r^{[q]} := W^T A^{[q]} V, \qquad P_{r0}^{[q]} := W^T P_0^{[q]}, \qquad C_r^{[q]} := C^{[q]} V \qquad (15)$$

for *q* = 1,…,*Q*_{A}. The resulting quantities $A_r^{[q]}$, $P_{r0}^{[q]}$, and $C_r^{[q]}$ are *r*-dimensional vectors or matrices and independent of the high dimension *d*. The basis computation and the computation of these reduced system components are performed once and parameter-independently in the offline phase. Then, in the online phase, for any new parameter *θ* the reduced system matrices and the initial condition are assembled by

$$A_r(\theta) = \sum_{q=1}^{Q_A} \sigma_A^{[q]}(\theta)\, A_r^{[q]}, \qquad (16)$$

and similarly for *P*_{r0}(*θ*) and *C*_{r}(*θ*). The low-dimensional reduced system that remains to be solved is

$$\dot P_r(t) = A_r(\theta)\, P_r(t), \qquad P_r(t_0) = P_{r0}(\theta), \qquad y_r(t) = C_r(\theta)\, P_r(t). \qquad (17)$$

From the reduced state *P*_{r}(*t*), an approximate state for the full system can be reconstructed at any desired time by $P(t) \approx V P_r(t)$. Also, the difference between the approximated output *y*_{r}(*t*) and the output *y*(*t*) of the original model can be bounded by so-called error estimators. A-posteriori error bounds for the reduced systems as considered here are given in [9].
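The offline/online split can be sketched as follows. The component matrices here are random placeholders standing in for a parameter-separable system, and *W* = *V* with orthonormal columns; all dimensions and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, Q = 200, 10, 2                   # full dim, reduced dim, # components

# Placeholder parameter-independent components A^[q] and system data.
A_comp = [rng.standard_normal((d, d)) / d for _ in range(Q)]
P0 = rng.random(d); P0 /= P0.sum()
C = np.ones((1, d))

# Offline phase: biorthogonal bases (here W = V, orthonormal columns)
# and reduced components that no longer depend on the high dimension d.
V, _ = np.linalg.qr(rng.standard_normal((d, r)))
W = V
Ar_comp = [W.T @ Aq @ V for Aq in A_comp]   # r x r matrices
Pr0 = W.T @ P0
Cr = C @ V

# Online phase: assemble the reduced matrix for a concrete theta, as in
# the separable expansion (14), and integrate with explicit Euler.
theta = np.array([0.5, 2.0])
Ar = sum(th * Aq for th, Aq in zip(theta, Ar_comp))
Pr, dt = Pr0.copy(), 1e-3
for _ in range(1000):
    Pr = Pr + dt * (Ar @ Pr)
yr = Cr @ Pr                           # approximate output y_r(t)
P_approx = V @ Pr                      # reconstructed full state V P_r(t)
```

Only the last block is repeated for new parameter values; its cost scales with *r*, not *d*.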

#### Basis generation

Different methods for the computation of the projection bases *V* and *W* exist. In systems theory, methods like balanced truncation, Hankel-norm approximation or moment matching are applied, which approximate the input–output behaviour of a linear time-invariant system [6]. The resulting reduced models can be applied for varying input signals. Extensions to parametric problems exist, e.g. [8, 11]. As we do not have varying inputs in the problem studied here, we consider snapshot-based approaches to be more suitable. This means the projection bases are constructed from solution snapshots, i.e. special solutions computed for selected parameter values.

The generation of the projection matrices *V* and *W* must be done in such a way that they globally well approximate the system states over the parameter and time domain. A possible way to achieve this is the POD-Greedy algorithm, which has been introduced in [14] and is meanwhile a standard procedure in reduced basis methods [15, 16]. The algorithm makes use of a repeated proper orthogonal decomposition (POD) of trajectories *E*(*t*), which for our purposes can be defined as

$$\mathrm{POD}(E) := \arg\min_{v \in \mathbb{R}^d,\ \|v\| = 1} \int_0^T \| E(t) - v v^T E(t) \|^2 \, dt. \qquad (18)$$

Intuitively, POD(*E*) is a state space vector representing the single dominant mode that minimizes the squared mean projection error. Computationally, this minimization task is solved by a reformulation as a suitable eigenvalue problem. Consider the correlation matrix $C := \int_0^T E(t) E(t)^T \, dt$. Then, POD(*E*) is an eigenvector corresponding to the largest eigenvalue *λ*_{max} of *C*, i.e., $C \cdot \mathrm{POD}(E) = \lambda_{\max}\, \mathrm{POD}(E)$. For additional theoretical and computational details on POD we refer to [17, 18]. We further require a finite subset of parameters $\mathcal{M} \subset \Theta$ that are used in the basis generation process. As error indicator Δ(*θ*, *V*) we use the projection error of the full system trajectory on the reduced space spanned by the orthonormal columns of *V*, i.e.

$$\Delta(\theta, V) := \int_0^T \| P(t, \theta) - V V^T P(t, \theta) \|^2 \, dt. \qquad (19)$$
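The eigenvalue reformulation can be sketched in a few lines, with the time integrals replaced by a finite sampling so that a trajectory becomes a snapshot matrix. The snapshot data here is random and purely illustrative.

```python
import numpy as np

def pod_dominant_mode(E):
    """Dominant POD mode of a snapshot matrix E (columns = time samples):
    the leading left singular vector of E, i.e. the eigenvector of the
    correlation matrix E @ E.T for its largest eigenvalue."""
    U, _, _ = np.linalg.svd(E, full_matrices=False)
    return U[:, 0]

def projection_error(P_traj, V):
    """Discretized projection-error indicator: squared error of projecting
    the snapshot columns onto the span of V's orthonormal columns."""
    R = P_traj - V @ (V.T @ P_traj)
    return float(np.sum(R ** 2))

rng = np.random.default_rng(0)
E = rng.standard_normal((50, 20))      # hypothetical snapshot matrix
v = pod_dominant_mode(E)
```

The returned mode indeed satisfies the eigenvector relation for the sampled correlation matrix, which is how the minimization is solved in practice.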

The POD-Greedy procedure, which is given in the pseudo-code below, starts with an arbitrary orthonormal initial basis *V*_{N_0} and performs an incremental basis extension. The algorithm repeatedly identifies the currently worst resolved parameter (a), orthogonalizes the corresponding full trajectory against the current reduced space (b), computes a POD of the error trajectory (c), and inserts the dominant mode into the basis (d).

function *V* = POD-Greedy

1. *N* := *N*
_{0}

2. while $\max_{\theta \in \mathcal{M}} \Delta(\theta, V_N) > \varepsilon_{\mathrm{tol}}$

(a) $\theta^{*} := \arg\max_{\theta \in \mathcal{M}} \Delta(\theta, V_N)$

(b) $E(t) := P(t, \theta^{*}) - V_N V_N^{T} P(t, \theta^{*})$

(c) *v*_{N + 1} := *POD*(*E*)

(d) *V*_{N + 1} := [*V*_{N}, *v*_{N + 1}]

(e) *N* := *N* + 1

3. end while

Note that the algorithm is implemented such that the simulation of the full model, yielding *P*(*t*,*θ*) in (19), is only performed once for each *θ* in the training set $\mathcal{M}$.

For concluding the basis generation, we set *W* := *V*. This satisfies the biorthogonality condition *W*^{T}*V* = *Id*, as *V* has orthonormal columns by construction. In practice the time-integrals in (18) are realized by a finite sampling of the time interval.
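The full loop can be sketched as follows, operating on pre-computed snapshot matrices; the training data here is synthetic (all trajectories lie in a common low-dimensional subspace), and the tolerance and names are assumptions for the sketch.

```python
import numpy as np

def pod_greedy(trajectories, tol, n_max=50):
    """POD-Greedy sketch. `trajectories` maps each training parameter to
    its full-model snapshot matrix (columns = time samples). The basis V
    is extended by the dominant POD mode of the worst-resolved
    projection-error trajectory until the indicator drops below `tol`."""
    d = next(iter(trajectories.values())).shape[0]
    V = np.zeros((d, 0))                         # empty initial basis
    while V.shape[1] < n_max:
        # (a) find the currently worst resolved training parameter
        errors = {}
        for th, P in trajectories.items():
            R = P - V @ (V.T @ P)                # (b) orthogonalized trajectory
            errors[th] = (float(np.sum(R ** 2)), R)
        th_star = max(errors, key=lambda th: errors[th][0])
        err, E = errors[th_star]
        if err <= tol:
            break
        U, _, _ = np.linalg.svd(E, full_matrices=False)
        v = U[:, 0]                              # (c) dominant POD mode
        v = v - V @ (V.T @ v)                    # re-orthogonalize for safety
        V = np.column_stack([V, v / np.linalg.norm(v)])  # (d) extend basis
    return V

rng = np.random.default_rng(2)
base = rng.standard_normal((100, 4))             # trajectories span a 4-dim space
trajs = {th: base @ rng.standard_normal((4, 30)) for th in range(5)}
V = pod_greedy(trajs, tol=1e-10)
```

Since the synthetic trajectories span a four-dimensional subspace, the loop terminates after a handful of extensions with all training errors below the tolerance.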

A theoretical underpinning for the POD-Greedy algorithm has recently been provided by the analysis of convergence rates [19]. This is based on the approximation-theoretical notion of the *Kolmogorov n*-width $d_n(S)$ of a given set *S*, which quantifies how well the set can be approximated by arbitrary *n*-dimensional linear subspaces of $\mathbb{R}^d$. The convergence statement for the case of exponential convergence can then be summarized as follows: If the set of solutions $S := \{P(t, \theta) : t \in [0, T],\ \theta \in \Theta\}$ is compact and has an exponentially decaying Kolmogorov *n*-width $d_n(S) \leq M e^{-a n^{\alpha}}$ for some *M*, *a*, *α* > 0 and all $n \in \mathbb{N}$, then the error sequence $E_N := \max_{\theta \in \mathcal{M}} \Delta(\theta, V_N)$ generated by the POD-Greedy procedure (cf. the definition in Step 2 of the pseudo-code) also decays with an exponential rate, $E_N \leq C M e^{-c N^{\beta}}$, with suitable constants *β*, *c*, *C* > 0 depending on *M*, *a*, *α*. Thus, if the set of solutions can be approximated by linear subspaces with an exponentially decaying error term, then the POD-Greedy algorithm will in fact find an approximation with an exponentially decaying error term, though possibly with suboptimal parameters in the error bound.

Extensions of the POD-Greedy algorithm exist, e.g. allowing more than one mode per extension step, performing adaptive parameter and time-interval partitioning, or enabling training-set adaptation
[15, 16, 20].

### Reduced models of the parametrized chemical master equation

In this section, we describe how to apply the reduction method for parametrized models presented in the previous section to FSP models for the chemical master equation.

As discussed in the previous section, the first step in the proposed reduction method is a decomposition of the *d*-dimensional system matrix *A*(*θ*) as in (14). Such a decomposition is possible for the case of mass action reaction propensities, as defined in (5), or generalized mass action, as recently suggested for the chemical master equation [21]. In this case, the length of the parameter vector *θ* is equal to the number of reactions *m*, and we decompose *A*(*θ*) into *m* terms as

$$A(\theta) = \sum_{q=1}^{m} \theta_q A^{[q]}. \qquad (20)$$

Hence, concerning the notation given before, we have *Q*_{A} = *m* components *A*^{[q]} and coefficient functions $\sigma_A^{[q]}(\theta) = \theta_q$. Each matrix *A*^{[q]} in this decomposition comes from just the transition propensities corresponding to reaction *q*, and is defined by

$$A_{ki}^{[q]} = \begin{cases} -\nu_q(x^{(i)}, \theta) / \theta_q & \text{if } k = i, \\ \nu_q(x^{(i)}, \theta) / \theta_q & \text{if } x^{(k)} = x^{(i)} + v_q, \\ 0 & \text{otherwise.} \end{cases} \qquad (21)$$
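The per-reaction decomposition (20) can be checked numerically on the hypothetical birth–death network used in the earlier sketches; the network and all dimensions are illustrative assumptions.

```python
import numpy as np

def fsp_components_birth_death(d):
    """Parameter-independent components of A(theta) = theta_1 A^[1]
    + theta_2 A^[2], cf. (20), for the assumed birth-death network:
    each A^[q] holds the transitions of reaction q with theta_q = 1."""
    A1 = np.zeros((d, d))       # birth 0 -> X, propensity theta_1
    A2 = np.zeros((d, d))       # death X -> 0, propensity theta_2 * x
    for i in range(d):
        A1[i, i] = -1.0
        if i + 1 < d:
            A1[i + 1, i] = 1.0
        A2[i, i] = -float(i)
        if i > 0:
            A2[i - 1, i] = float(i)
    return A1, A2

d = 30
A1, A2 = fsp_components_birth_death(d)
theta = np.array([10.0, 1.0])
A = theta[0] * A1 + theta[1] * A2    # online assembly of A(theta)
```

Since the parameters enter only as the scalar coefficients, the components *A*^{[1]}, *A*^{[2]} (and their projections) can be computed once offline and reused for every new *θ*.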

More generally, such a decomposition is also possible if the reaction propensities can be decomposed into the product of two terms, with the first term depending on parameters only, and the second term on molecule numbers only. This case is for example encountered when the temperature dependence of the reaction rate constant is relevant, and the temperature *T* is a variable parameter in the Arrhenius equation $k(T) = k_0\, e^{-E_a / (R T)}$. Since the output matrix *C* and the initial condition *P*_{0} usually do not depend on parameters in this framework, a decomposition of *C* and *P*_{0} is not considered.

The situation is more difficult for reaction propensities involving, for example, rational terms with parameters in the denominator. The denominator parameters cannot be included in the reduced order model by the decomposition outlined in (20) and (21). If variations in these parameters are not relevant to the planned analysis, however, they can be set to their nominal values, and the decomposition can be done directly as described above. Alternatively, approximation steps can be performed, such as Taylor series expansion or empirical interpolation [22], which generate an approximating parameter-separable expansion.