The methodology presented in the following is general, i.e. not restricted to ODEs. We therefore first introduce the prediction profile likelihood as well as prediction confidence intervals, and then illustrate their applicability for ODE models.

### The prediction profile likelihood

For additive Gaussian noise *ε* ∼ *N*(0, *σ*^{2}) with known variance *σ*^{2}, two times the negative log-likelihood of the data *y* for the parameters *θ* is, except for a constant offset, identical to the residual sum of squares RSS(*θ*|*y*) = ∑_{i}(*y*_{i} − *F*(*t*_{i}, *u*, *θ*))^{2}/*σ*^{2}. In this case, maximum likelihood estimation is equivalent to standard least-squares estimation, i.e. to minimizing the residual sum of squares.
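This equivalence can be checked numerically. The following sketch uses a hypothetical one-parameter model *F*(*t*, *θ*) = *θt* (invented for illustration, not from the paper): since −2 log-likelihood and RSS differ only by a *θ*-independent constant, both objectives have the same minimizer.

```python
import numpy as np
from scipy.optimize import minimize

# Toy one-parameter model F(t, theta) = theta * t with known sigma
# (a hypothetical example for illustration, not taken from the paper).
rng = np.random.default_rng(1)
t = np.linspace(0, 5, 20)
sigma = 0.5
y = 2.0 * t + rng.normal(0, sigma, t.size)

def rss(theta):
    """Residual sum of squares RSS(theta | y), scaled by the known variance."""
    return np.sum((y - theta * t) ** 2) / sigma ** 2

def neg2_loglik(theta):
    """-2 log-likelihood for Gaussian noise: RSS plus a theta-independent constant."""
    return rss(theta) + t.size * np.log(2 * np.pi * sigma ** 2)

theta_ml = minimize(lambda p: neg2_loglik(p[0]), x0=[0.0]).x[0]
theta_ls = minimize(lambda p: rss(p[0]), x0=[0.0]).x[0]
print(theta_ml, theta_ls)  # same minimizer, since the objectives differ by a constant
```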

*F* = *g*(*x*(*t*, *u*, *θ*), *θ*) denotes the model response, which in the case of a *state space model* is given after integration of a system of differential equations with an externally controlled input function *u* and a mapping *g* to experimentally observable quantities.

The parameter vector *θ* comprises the kinetic parameters as well as the initial values, and additional offset or scaling parameters for the observations. Note that the presented methodology is general, i.e. also applicable to other types of models like regression models or partial differential equations, delay differential equations and differential algebraic equations.

It has been shown [6] that the profile likelihood for a parameter *θ*_{j} given a data set *y* yields reliable confidence intervals for the estimation of a single parameter. Here, *α* is the confidence level and icdf(*χ*^{2}_{1}, *α*) denotes the *α* quantile of the chi-square distribution with one degree of freedom, which is given by the respective inverse cumulative distribution function. LL^{∗} is the maximum of the log-likelihood function after all parameters are optimized. In (5), the optimization is performed for all parameters except *θ*_{j}. The analogy of likelihood-based parameter and prediction confidence intervals is discussed in the Additional file 1.
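Equations (5) and (6) are not rendered in this excerpt. Based on the surrounding description and the standard profile-likelihood formulation, a consistent reconstruction (not the paper's own rendering) is:

```latex
% Profile likelihood for parameter theta_j (cf. (5)):
% all remaining parameters are re-optimized for each fixed theta_j
\mathrm{PL}(\theta_j) = \max_{\theta_{k \neq j}} \mathrm{LL}(y \mid \theta)

% Likelihood-based confidence interval (cf. (6))
\mathrm{CI}_{\alpha}(\theta_j \mid y)
  = \left\{\, \theta_j \;\middle|\; -2\,\mathrm{PL}(\theta_j)
      \le -2\,\mathrm{LL}^{*} + \mathrm{icdf}(\chi^2_1, \alpha) \,\right\}
```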

This coverage property (7), i.e. the probability that the true parameter value is inside the confidence interval, holds for (6) if the magnitude of the decrease of the residual sum of squares obtained by fitting *θ*_{j} is *χ*^{2}-distributed. This is given asymptotically as well as for linear parameters, and is a good approximation under weak assumptions [20, 21]. If the assumptions are violated, the distribution of the magnitude of the decrease has to be generated empirically, i.e. by Monte-Carlo simulations, as discussed in the Additional file 1.
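The construction can be sketched numerically. The following toy example (a linear model *a* + *bt* with unit variance, invented for illustration) profiles over the nuisance parameter *a*, which here has an analytic optimum, and reads off the likelihood-based confidence interval from the *χ*^{2}_{1} threshold:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

# Toy linear model a + b*t with unit variance (invented for illustration).
rng = np.random.default_rng(2)
t = np.linspace(0, 10, 21)
y = 1.0 + 0.5 * t + rng.normal(0, 1.0, t.size)

def profile(b):
    """-2 log-likelihood profiled over the nuisance parameter a:
    for fixed b the optimal offset is a = mean(y - b*t)."""
    a_opt = np.mean(y - b * t)
    return np.sum((y - (a_opt + b * t)) ** 2)

b_hat = minimize_scalar(profile, bounds=(-5, 5), method="bounded").x
threshold = profile(b_hat) + chi2.ppf(0.95, df=1)  # icdf(chi2_1, 0.95) ~ 3.84

# Scan b: the likelihood-based 95% CI is where the profile stays under the threshold
bs = np.linspace(b_hat - 1, b_hat + 1, 2001)
inside = bs[np.array([profile(b) for b in bs]) <= threshold]
print(inside.min(), inside.max())  # lower and upper confidence bounds for b
```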

The *experimental design* *D* = {*t*,*g*,*u*} comprises all environmental conditions which can be controlled by the experimenter, like the measurement times *t*, the observables *g*, and the input functions *u*. A *prediction* *z* = *F*(*D*_{pred},*θ*) is the response of the model *F* for a prediction condition *D*_{pred} = {*t*_{pred},*g*_{pred},*u*_{pred}} specifying a prediction observable *g*_{pred} evaluated at time point *t*_{pred} given the externally controlled stimulation *u*_{pred}. In principle, every quantity which can be computed based on the model can serve as a model prediction *z*. Typical examples comprise concentrations of dynamic compounds, but also concentration ratios or integrals, or characteristics of a time course like the height or width of a peak.

In some cases the observable *g*_{pred} corresponds to measuring a dynamic variable *x*(*t*) directly, i.e. it corresponds to a compound whose concentration dynamics is modeled by the ODEs. In a more general setting the observable is defined by an observational function *g*_{pred}(*x*(*t*),*θ*) depending on several dynamic variables *x*. Therefore, *g*_{pred} has to coincide neither with a dynamic variable nor with an observational function *g* of the measurements performed to build the model.

In analogy to (7), the desired property of a prediction confidence interval PCI_{α}(*D*|*y*) derived from an experimental data set *y* with a given significance level *α* is that the probability that the true value of *F*(*D*_{pred},*θ*_{true}) is inside the prediction confidence interval PCI_{α} is equal to *α*. In other words, the PCI covers the model response for the true parameters with a proportion *α* of the noise realizations which would yield different data sets *y*.

The prediction profile likelihood (9) is obtained by maximization over the model parameters satisfying the constraint that the model response *F*(*D*_{pred},*θ*^{∗}) after fitting equals the considered value *z* for the prediction. The prediction confidence interval (10) is, in analogy to (6), the set of predictions *z* = *F*(*D*_{pred},*θ*) for which −2 PPL is below a threshold given by the *χ*^{2}-distribution. In analogy to likelihood-based confidence intervals for parameters, such a PCI yields the smallest unbiased confidence intervals for predictions for a given coverage *α* [22].

Instead of sampling a high-dimensional parameter space, the prediction profile likelihood calculation comprises sampling of a one-dimensional prediction space by evaluating several predictions *z*. Evaluating the maximum of the likelihood satisfying the prediction constraint does in general not require an unambiguous point in the parameter space, as in the case of structural non-identifiabilities. In analogy to the profile likelihood for parameter estimates, the significance level determines the threshold for the PPL, which is given asymptotically by the quantiles (6) of the *χ*^{2}-distribution [23]. In the Additional file 1, a Monte-Carlo algorithm is presented which can be used to calculate the threshold in cases where the asymptotic assumption is violated.
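The one-dimensional prediction scan can be sketched on a toy linear model (the model and the prediction condition *t*_{pred} are invented for this illustration). The equality constraint *F*(*D*_{pred},*θ*) = *z* is eliminated analytically, which is possible here because the model is linear; for nonlinear ODE models a constrained or penalized optimizer would be needed instead.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

# Toy linear model a + b*t; the prediction condition t_pred is hypothetical.
rng = np.random.default_rng(3)
t = np.linspace(0, 10, 21)
y = 1.0 + 0.5 * t + rng.normal(0, 1.0, t.size)
t_pred = 15.0

def neg2_ppl(z):
    """Maximize the likelihood under the constraint a + b*t_pred = z.
    The constraint is eliminated analytically (a = z - b*t_pred), which is
    possible here because the toy model is linear in its parameters."""
    obj = lambda b: np.sum((y - ((z - b * t_pred) + b * t)) ** 2)
    return minimize_scalar(obj, bounds=(-5, 5), method="bounded").fun

# One-dimensional scan over the prediction z instead of the parameter space
zs = np.linspace(4, 13, 901)
vals = np.array([neg2_ppl(z) for z in zs])
pci = zs[vals <= vals.min() + chi2.ppf(0.95, df=1)]
print(pci.min(), pci.max())  # the 95% prediction confidence interval
```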

### The validation profile likelihood

Likelihood-based confidence intervals like (6) or (10) correspond to the set of predictions which are not rejected by a likelihood ratio test. Given a prediction confidence interval, the question arises whether a model has to be rejected if a validation measurement is outside the predicted interval. This, in fact, would hold if a “perfect” validation measurement were available, i.e. a data point without measurement noise. For validation experiments, however, the outcome is always noisy and is therefore expected to lie outside the PCI more frequently than the true value. Hence, the prediction confidence interval (10) has to be generalized for application to a validation experiment.

For a validation experiment, we therefore introduce a *validation profile likelihood* VPL and a corresponding *validation confidence interval* in the following. In such a setting, a confidence interval should have a coverage (11) for the validation data point *z* ∼ *N*(*μ*,SD^{2}) with expectation *μ* = *F*(*D*_{vali},*θ*_{true}) and variance SD^{2}. Here, *D*_{vali} denotes the design for the validation experiment. A validation confidence interval satisfying (11) allows a rejection of the model if a noisy validation measurement with error SD is outside the interval.

The validation profile likelihood for validation data can be calculated by relaxing the constraint in (9) used to compute the prediction profile likelihood, because in this case the model prediction does not necessarily have to coincide with the data point *z*. Instead, the deviation from the validation data point is penalized equivalently to the data *y*. The agreement of the model with the data *y* and the validation measurement *z* is then given by (12).
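The joint likelihood (12) itself is not reproduced in this excerpt. From the description above (the validation point is penalized like the data), a consistent form in terms of −2 log-likelihood, up to constants, is:

```latex
% Joint agreement with the data y and the validation measurement z (cf. (12))
-2\,\mathrm{LL}(z, y \mid \theta)
  = \mathrm{RSS}(\theta \mid y)
  + \frac{\bigl(z - F(D_{\mathrm{vali}}, \theta)\bigr)^2}{\mathrm{SD}^2}
```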

We now define the validation profile (log-)likelihood with *θ*^{∗} = *θ*^{∗}(*z*,*y*) = arg max_{θ} LL(*z*,*y*|*θ*) as the maximized joint log-likelihood in (12), read as a function of *z*. The corresponding validation confidence interval is given in analogy to (10).

Optimization of the likelihood (12) minimizes both the mismatch with the existing data, RSS(*θ*|*y*), and the mismatch with the fixed validation data point *z*. The model response obtained after this parameter optimization can be interpreted as a prediction *z*^{′} satisfying the constrained optimization problem (9) considered for the prediction profile likelihood. Hence the validation profile likelihood LL^{∗} can be scaled to the prediction likelihood via (16), where *z*^{′} = *F*(*D*_{vali},*θ*^{∗}(*z*,*y*,SD > 0)) is the model response for *θ*^{∗} estimated from *z* and *y*.
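Equation (16) is not rendered in this excerpt. A reconstruction consistent with the penalized joint likelihood (12) and the constrained problem (9) would be: at the optimum *θ*^{∗} the joint objective splits into the data mismatch and the penalty, so subtracting the penalty recovers −2 PPL at *z*^{′}:

```latex
% Rescaling of the validation profile likelihood to the prediction profile
% likelihood at z' = F(D_vali, theta*) (cf. (16))
-2\,\mathrm{PPL}(z') = -2\,\mathrm{VPL}(z) - \frac{(z - z')^2}{\mathrm{SD}^2}
```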

Optimization with nonlinear constraints is a numerically challenging issue. Therefore, (16) provides a helpful way to avoid constrained optimization: the VPL can be calculated with SD > 0 like a common least-squares minimization and is afterwards rescaled to obtain the PCI for the true value.
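This rescaling trick can be checked on a toy linear model (an invented example, not from the paper): the penalized, unconstrained fit reproduces the constrained −2 PPL at *z*^{′} after subtracting the penalty term.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

# Toy linear model a + b*t with unit measurement variance; the validation
# design t_vali and error SD are invented for this sketch.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 11)
y = 1.0 + 0.5 * t + rng.normal(0, 1.0, t.size)
t_vali, SD = 15.0, 0.3

def rss(a, b):
    return np.sum((y - (a + b * t)) ** 2)

def neg2_vpl(z):
    """-2 VPL(z): unconstrained least squares with the validation point z
    penalized like a data point (cf. (12))."""
    obj = lambda th: rss(th[0], th[1]) + (z - (th[0] + th[1] * t_vali)) ** 2 / SD ** 2
    res = minimize(obj, x0=[0.0, 0.0])
    z_prime = res.x[0] + res.x[1] * t_vali  # model response at the optimum
    return res.fun, z_prime

def neg2_ppl(z):
    """-2 PPL(z): constrained fit, with a = z - b*t_vali eliminated analytically."""
    res = minimize_scalar(lambda b: rss(z - b * t_vali, b), bounds=(-5, 5), method="bounded")
    return res.fun

v, z_prime = neg2_vpl(9.0)
# Rescaling as described for (16): subtract the penalty to recover -2 PPL at z'
print(abs((v - (9.0 - z_prime) ** 2 / SD ** 2) - neg2_ppl(z_prime)))  # ~ 0
```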