A continuous-time adaptive particle filter for estimations under measurement time uncertainties with an application to a plasma-leucine mixed effects model
© Krengel et al.; licensee BioMed Central Ltd. 2013
Received: 31 October 2011
Accepted: 7 January 2013
Published: 19 January 2013
When mathematical modelling is applied to many different application areas, a common task is the estimation of states and parameters based on measurements. In this kind of inference, uncertainties in the times at which the measurements were taken are often neglected, but especially in applications from the life sciences, errors of this kind can considerably influence the estimation results. As an example in the context of personalized medicine, the model-based assessment of the effectiveness of drugs is beginning to play an important role. Systems biology may help here by providing good pharmacokinetic and pharmacodynamic (PK/PD) models. Inference on these systems based on data gained from clinical studies with several patient groups becomes a major challenge. Particle filters are a promising approach to tackle these difficulties but are by themselves not ready to handle uncertainties in measurement times.
In this article, we describe a variant of the standard particle filter (PF) algorithm which allows state and parameter estimation with the inclusion of measurement time uncertainties (MTU). The modified particle filter, which we call MTU-PF, also allows the application of an adaptive stepsize choice in the time-continuous case to avoid degeneracy problems. The modification is based on the model assumption of uncertain measurement times. While the assumption of randomness in the measurements themselves is common, the corresponding measurement times are generally taken as deterministic and exactly known. Especially in cases where the data are gained from measurements on blood or tissue samples, a relatively high uncertainty in the true measurement time seems to be a natural assumption. Our method is appropriate in cases where relatively few data are used from a relatively large number of groups or individuals, which introduce mixed effects in the model. This is a typical setting of clinical studies. We demonstrate the method on a small artificial example and apply it to a mixed effects model of plasma-leucine kinetics with data from a clinical study which included 34 patients.
Comparisons of our MTU-PF with the standard PF and with an alternative Maximum Likelihood estimation method on the small artificial example clearly show that the MTU-PF obtains better estimates. For the data from the clinical study, the MTU-PF performs similarly to the standard particle filter with respect to the quality of the estimated parameters, but in addition it proves to be less prone to degeneration than the standard particle filter.
Keywords: Particle filter; Sequential Monte Carlo methods; Nonlinear filtering; Parameter estimation; Measurement time uncertainties; PK/PD; Mixed effects; Leucine kinetics
Measurement time uncertainties
Uncertainty in the time at which a measurement is taken is an often neglected source of random error. While in many application areas this kind of error is generally small and indeed negligible (due to automated measurements and precise timings), in others it may have a real influence, especially in the life sciences. As a prominent example, one may consider pharmacokinetic and pharmacodynamic (PK/PD) models which are used to describe the metabolic interactions and the effects of a chemical agent (like a drug or a labelled substance) over time inside an organism, respectively.
A typical population experiment in the PK/PD context consists of analysing the blood plasma of several individuals with respect to the concentrations of certain molecules of interest. For this purpose, blood samples have to be taken from each individual at certain (fixed) time points after a certain event has occurred (e.g. a drug or a labelled substance has been administered). It is clear from the experimental setting that there is some variation in the actual point in time at which the blood sample is taken: the true time at which the measurement value was obtained might be shortly before or after the intended time, and this true measurement time is not known to us. Since including these time uncertainties in the model usually makes the analysis more difficult, it is standard practice to lump the time uncertainties together with the measurement error. But especially at early times when concentrations change quickly, this may easily lead to wrong estimates, even if one assumes very high variances of the measurement error (we will demonstrate this later on a simple example). On the other hand, the inclusion of measurement time uncertainties (MTU) in algorithms for inference in complex models is not straightforward. In this article, we present a modification of the Particle Filter (PF) algorithm (which we call MTU-PF) which is able to fully include a statistical model of the time uncertainties.
Inference in complex systems
In the past, the effectiveness of a drug in a clinical study has been assessed by directly computing relatively simple statistical values. The enormous increase in complexity of the underlying models, due to present developments in medicine and biology, for instance in the areas of personalized medicine or systems biology, also increases the need for more sophisticated model-based inference methods.
The estimation of unobservable internal variables or model parameters from data which have been obtained from blood or tissue samples at several time points can reveal information on the concentrations and effectiveness of the substance in question. If these data come from individuals who belong to two or more different groups, e.g. test and control group, mixed effects are introduced in the underlying models. The inherent non-linearity and high variability of biological processes add considerably to the difficulties one faces during the inference step. Inference in connection with dynamic models plays a major role in many other application areas. State and parameter estimation as well as model discrimination and validation are most common, but optimal control problems should also be mentioned.
It is often not enough to consider (independent) measurement noise. Correlations between residuals are not uncommon, and the violation of this statistical assumption may lead to wrong estimates. A natural way to include correlated noise is to model two different types of noise: the dynamic (process or system) noise which is present in the dynamics of the system states and originates either from true random fluctuations in the system or from unmodelled dynamics in the system, and the measurement noise which is introduced by the measurement procedure or equipment and modelled by independent residuals. One possible approach is to use state space models which consist of a time-continuous model for the system states, e.g. based on Stochastic Differential Equations (SDEs), and a separate model for the time-discrete measurements.
Parameter estimation with Maximum Likelihood approach
Parameter estimation in state space systems is a difficult problem. In a context where the system dynamics are modelled by Ordinary Differential Equations (ODEs) without correlated noise, the problem is most often considered as a (deterministic) optimization problem based on a Maximum Likelihood (ML) formulation. An overview of these approaches can be found in  and ; see also , which consider also other aspects like identifiability. A generalization of the ML approach including more flexible cost functions is given by the prediction error estimation method (). The introduction of system noise in the state variables leads to optimization problems with SDE constraints. In this case, internal system states which cannot be directly observed need to be estimated jointly with the parameters, given the data. For this purpose, the parameter estimation methods must be augmented by appropriate state filtering methods. An overview of ML parameter estimation in these types of models is given in . If the SDEs are non-linear, linearizations to the Kalman Filter, like the Extended Kalman Filter (EKF) or the Unscented Kalman Filter (UKF), are used to establish approximations to the means and covariances of the filter distributions over time. All those approximations suffer from the fact that they approximate the filtering distributions of the states by a Gaussian distribution at all time points and cannot adequately approximate skewed or multimodal distributions. Better approximations are provided by simulation based methods like Sequential Monte Carlo (SMC) algorithms where good convergence results have been established (). Nevertheless, they suffer from several drawbacks when applied to the joint estimation of dynamic states and fixed parameters ([8–10], see also ).
Parameter estimation in a Bayesian context
In a Bayesian context, in contrast to the “classical” ML approach, a prior distribution is assigned to the parameter vector, hence the parameters can be treated as random variables. In this sense, parameter estimation is done by evaluating the so-called posterior distribution which can be computed (at least theoretically) by Bayes’ theorem given the observations (measurements) and the prior distribution. In the context of high-dimensional spaces, this requires the computation of high-dimensional integrals which is not possible to do analytically. For this purpose, Markov Chain Monte Carlo (MCMC) methods provide powerful tools for the computation of simulation-based approximations to the posterior distribution. Again, in the context of the joint estimation of dynamic states and fixed parameters, the design of good proposal densities is a very difficult problem which renders the use of standard MCMC methods like the Metropolis-Hastings sampler impractical for the purposes of parameter estimation in state space systems.
It has long been a wish to combine both (dynamic) SMC and (static) MCMC methods to provide a general tool for the joint estimation of dynamic states and static parameters. Only recently, Andrieu et al.  proposed a very promising combination of both types of Monte Carlo approaches called Particle Markov Chain Monte Carlo (PMCMC) which is generally applicable and where also convergence has been proved.
In the present article, even though the PMCMC approach might be the preferred method for parameter estimation in state space systems, we will concentrate solely on the SMC methods, since our modification affects only this part. However, to be able to do parameter estimation in a pure SMC context, we rely on an approach that is very often used to avoid problems with the estimation of constant parameters. This approach consists in the introduction of artificial dynamics in the parameters, that means the parameters are allowed to slightly change their values over time. In this way, and in a Bayesian context, the parameters can be treated exactly in the same way as the system states. After building an augmented system state by concatenating the parameter vector and the state vector, the joint estimation of states and parameters reduces to filtering of the augmented state vector which makes SMC methods directly applicable to the problem.
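The state augmentation with artificial parameter dynamics described above can be sketched in a few lines. This is a minimal illustration only: the Gaussian random-walk jitter, the function names `augment` and `jitter_parameters`, and the value of `jitter_sd` are assumptions made for the example and are not taken from the article.

```python
import random

def augment(state, params):
    """Concatenate state and parameter vectors into one augmented particle."""
    return list(state) + list(params)

def jitter_parameters(particle, n_state, jitter_sd, rng):
    """Artificial dynamics (assumed here to be a Gaussian random walk):
    perturb only the parameter part of an augmented particle and leave
    the state part untouched."""
    state, params = particle[:n_state], particle[n_state:]
    params = [p + rng.gauss(0.0, jitter_sd) for p in params]
    return state + params

rng = random.Random(0)
particle = augment([1.0], [0.5, 2.0])   # one state, two parameters
moved = jitter_parameters(particle, n_state=1, jitter_sd=0.01, rng=rng)
```

After this step, a filter can treat every entry of the augmented particle in the same way, which is what makes SMC methods directly applicable to joint state and parameter estimation.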
Particle filters for state and parameter estimation
Particle filters ([12–14]) belong to the class of SMC methods for state filtering in state space models. Using the state augmentation approach, the method is also capable of estimating system parameters. The standard particle filter is designed for discrete, non-linear, and non-Gaussian models and can routinely be adapted to the continuous case with measurements at discrete times. The idea of the particle filter is that, at each time point, there is a sample based representation (the weighted particles) of the current estimate of the inner states and parameters which is based on the measurements that have been obtained up to the current time point. The particle cloud is then propagated through time, and the particles and weights are updated accordingly at each time point where measurements are available.
Non-Linear Mixed Effects models
Estimation in a Non-linear Mixed Effects model (NLME) involves the estimation of both global and individual parameters. With classical maximum likelihood estimation, the individual parameters are random variables equipped with a distribution while the global parameters remain constants with a “true” but unknown value. If the underlying model equations are non-linear, this leads to likelihood functions which are not analytically accessible and one has to rely on approximations. In the context where the system dynamics are modelled by ODEs, the most popular algorithm for NLME parameter estimation in the PK/PD context is the tool NONMEM (). In  an estimation algorithm for NLME models based on Stochastic Differential Equations (SDEs) was proposed that uses the First-Order Conditional Estimation (FOCE) method to approximate the likelihood in combination with the EKF estimation in the SDEs. This has been added to NONMEM (). In , a comparison between ODE and SDE based parameter estimation has been performed which showed that the interindividual variabilities were in general estimated to be smaller for the SDE model. Donnet and Samson () proposed a stochastic version of the Expectation-Maximization (SAEM) algorithm (for the estimation of the global parameters) in combination with MCMC methods (for the estimation of states and individual parameters). However, since MCMC exhibits slow mixing properties in the context of the estimation of states and parameters in state space models, in  MCMC has been replaced by the more promising PMCMC approach of Andrieu et al. ().
On the other hand, in a Bayesian context, also the global parameters are equipped with a (prior) probability distribution, and the conceptual difference between global and individual parameters vanishes. The mixed effects model can then be considered as a hierarchical model with dependent parameters ([20, 21], see also  for a more recent population-based Bayesian approach to PK/PD modelling). Simulation-based (Monte Carlo) methods can easily be adapted to this case. Nevertheless, the above mentioned challenges to both SMC and MCMC methods are even higher due to the increased number of states and parameters in NLME models (the number of states and individual parameters has to be multiplied by the number of individuals).
Aim of the article
Our goal is two-fold: Firstly, we want to show that the particle filter algorithm is applicable (with our modifications) also to more complex models when time uncertainties are formulated explicitly. Secondly, we want to show that the modification may even provide the possibility for further enhancement of the performance of the algorithm by presenting an adaptive time-stepping scheme which is only possible in the context of the new algorithm.
We do not claim that our MTU algorithm generally performs better or worse than the standard filter, nor that it should be the preferred method for estimation in non-linear mixed effects models. Rather, we provide a method which is usable for models where time uncertainties may play a major role. In these cases, it may indeed lead to better estimations. On the other hand, our method transfers the time-discrete particle filter approach, where updates based on the measurements very strictly depend on the measurement times, to a truly time-continuous approach, where updates to the filtering distributions can be performed at every point on the time-scale. Since we want to focus on the time uncertainties, we neglect discussing further issues like identifiability, model evaluation and model discrimination. In our application to the model of plasma-leucine kinetics, we try to avoid these issues by providing ad-hoc values to some of the parameters (especially to the variances of the system states).
Parameters for the motivating example
true value of α
true value of β
distribution of T j
truncated at and at t0
Comparing Figures 1(a) and 1(b-d), we observe that the distributions of the measurements exhibit clearly different shapes. For the “true” model depicted in Figure 1(a), if we consider a single point in time that lies in a time segment where the state values change quickly, the distribution of the measurement at this point in time is quite broad. The variance in the measured value is very high, whereas it is small in time segments where the state values change slowly. In contrast, for the standard particle filter, the measurement variance is constant and hence the assumed measurement distributions differ remarkably from the “true” distributions, regardless of how the value of this variance is chosen. It must be expected that this leads to difficulties when inference on the states and parameters needs to be done based on these models. We will resume our example after having presented the MTU particle filter and will show that this is indeed the case.
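The dependence of the measurement spread on the rate of change of the state can be reproduced with a small simulation. The sketch below assumes, purely for illustration, an exponential decay x(t) = α·exp(−βt), normally distributed true measurement times truncated at t0, and a small additive Gaussian measurement error. The function names and all numerical values are hypothetical and are not the ones used in the motivating example.

```python
import math
import random

def simulate_measurements(t_nominal, n, alpha=10.0, beta=1.0,
                          sd_time=0.2, sd_noise=0.1, t0=0.0, seed=1):
    """Draw n measurements intended for time t_nominal when the true
    sampling time is uncertain (all defaults are illustrative)."""
    rng = random.Random(seed)
    ys = []
    while len(ys) < n:
        t_true = rng.gauss(t_nominal, sd_time)
        if t_true < t0:
            continue  # truncation at t0, implemented by rejection
        state = alpha * math.exp(-beta * t_true)
        ys.append(state + rng.gauss(0.0, sd_noise))
    return ys

def variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Early time: the state changes quickly; late time: it is almost flat.
var_early = variance(simulate_measurements(0.5, 5000))
var_late = variance(simulate_measurements(5.0, 5000))
```

Under these assumptions `var_early` is dominated by the time uncertainty and is much larger than `var_late`, which is close to the pure measurement noise variance. A single constant measurement variance cannot match both regimes.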
We divide this section into three subsections. In the first subsection, we fix the state and observation model we want to consider. In the second subsection entitled “Standard case” we outline the standard particle filter algorithm in the context of time-continuous states with time-discrete measurements, and the various probability distributions involved. Although nothing is new in this subsection, it serves several purposes. Firstly, the time-continuous case is relatively rarely considered in the literature; secondly, the derivation of our modification needs a slightly more general formulation than is standard for the discrete-time filter; and lastly, the comparison of our modified version with the standard case may reveal the differences between the two approaches more clearly. In the third subsection entitled “MTU particle filter”, we present our new modification of the particle filter. In the following section “Results and Discussion”, we compare the new MTU particle filter to the standard particle filter and to an alternative Maximum Likelihood estimation method on a simple artificial example. We also present an application of our MTU-PF method to a PK/PD study in a non-linear mixed-effects setting in direct comparison with the standard particle filter.
Note: a list of all used symbols with a short explanation can be found at the end of this paper.
the state space restricted to the interval [t0,t], and denote by the corresponding pushforward measure. For each s and t with t>s≥t0, let Ks,t(x s , dx t ) be the Markov kernel of the process from time s to time t.
with drift a(x,t), diffusion matrix B(x,t), multidimensional standard Wiener process , and initial variable . In this case, it is possible to sample directly (at least approximately) from the kernels Ks,t when a suitable discretization method is applied, for instance the Euler-Maruyama method.
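Approximate sampling from the kernel Ks,t with the Euler-Maruyama method can be sketched as follows. The sketch assumes a scalar SDE of the usual form dX = a(X,t) dt + b(X,t) dW, so that the diffusion matrix reduces to a scalar function; the function names and the example drift and diffusion are illustrative choices, not the model of the article.

```python
import math
import random

def euler_maruyama_step(x, t, dt, drift, diffusion, rng):
    """One Euler-Maruyama step of length dt for a scalar SDE."""
    dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment, variance dt
    return x + drift(x, t) * dt + diffusion(x, t) * dw

def sample_kernel(x_s, s, t, n_steps, drift, diffusion, rng):
    """Approximate draw from the transition kernel from time s to time t,
    using n_steps equidistant Euler-Maruyama steps."""
    dt = (t - s) / n_steps
    x, tau = x_s, s
    for _ in range(n_steps):
        x = euler_maruyama_step(x, tau, dt, drift, diffusion, rng)
        tau += dt
    return x

# Illustrative example: linear decay with constant noise intensity.
rng = random.Random(3)
x_t = sample_kernel(1.0, 0.0, 1.0, 100,
                    drift=lambda x, t: -x,
                    diffusion=lambda x, t: 0.2,
                    rng=rng)
```

With the diffusion set to zero the scheme reduces to the explicit Euler method for the corresponding ODE, which gives a simple way to check an implementation.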
Observations / measurements
Let the process be observed via M random variables Y1:M with values in measurable spaces . Each single observation (measurement) y j depends on the state variable at some time T j and on the observation time (measurement time) T j itself. We assume that, given the observation time T j and the state , the variable y j is independent of all other variables, and the conditional measure can be expressed via some conditional probability density with respect to a reference measure on . We do not require any further restrictions on g such as linear dependence on the states or Gaussianity.
Observation / measurement times
The observation times (measurement times) t j for j=1,…,M are usually assumed to be deterministically given and known. Our variant of the particle filter is based on the assumption that the observation times t j are themselves realizations of random variables T j . These variables model the uncertainty about the exact observation times. In contrast to the observation variables Y j , the observation times T j are never observed (measured). We assume that all information available to us is their probability distribution on the half axis [t0,∞), while in the case of the observations Y j , we know both the densities and the observed values y j .
In this article, we will only consider the simplest case where each variable T j is independent of all others. Dependencies between the T j ’s, especially concerning the order of the observation times, may be considered natural but would lead to more complicated algorithms. However, order dependencies can easily be introduced via restrictions on the support of the variables. In general, the probability distribution of every single variable T j shall be given by a density γ j (t j ) with respect to the Lebesgue measure on the interval [t0,∞).
In the following, we will consider the two cases mentioned, where either all T j are deterministic and known or all T j are random and unknown. Note that the first case formally coincides with the case that T j is random but observed. We will therefore stick to the notation for the observation densities in both cases.
Standard case: measurement times deterministic and known
We will first consider the standard case, where the observation times t j are known. For simplicity, we assume here that the observation times t1:M are strictly increasing, i.e. t0<t1<⋯<t M .
The standard case of the particle filter is usually formulated for discrete-time Markov processes with general state space where the state variables are only defined at the initial time t0 and at the times t1,…,t M when measurements occur. Nevertheless, this case is included in our more general framework where X t is defined for all t≥t0. One just focuses on the state variables at those times only. In view of the later generalization to random observation times, we will consider the fixed values t j as realizations of random variables T j and condition all occurring densities on them. As mentioned above, this assumption leads to the same results as if we assumed the values t j to be given deterministically.
Full model and filter model
for each , and to compute as well as pointwise.
Note that if we can sample from the Markov kernels of , we can choose (at least in law), whence and . This is a standard choice, but in terms of efficiency of the particle filter algorithm not always the best one. On the other hand, finding a suitable Markov chain different from is not an easy task.
are the normalized weights. It obtains its maximal value N if all weights are equal, and it approaches 1 if the variance of the weights and thus the degree of degeneracy increases. To avoid this degeneration of the samples, a resampling step needs to be done when the ESS drops below a threshold NThreshold (which is usually chosen to be N/2).
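The ESS criterion and the resampling step can be sketched as follows. The sketch assumes the commonly used estimate ESS = 1 / Σ w̃ᵢ², where w̃ᵢ are the normalized weights; it has the properties stated above (value N for equal weights, value near 1 for a degenerate sample). Multinomial resampling and the function names are illustrative assumptions.

```python
import random

def normalize(weights):
    """Scale unnormalized weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def effective_sample_size(weights):
    """ESS estimate 1 / sum of squared normalized weights (assumed form)."""
    return 1.0 / sum(w * w for w in normalize(weights))

def resample_if_needed(particles, weights, rng, threshold=None):
    """Multinomial resampling when the ESS drops below the threshold
    (default N/2). After resampling, all unnormalized weights equal 1."""
    n = len(particles)
    if threshold is None:
        threshold = n / 2
    if effective_sample_size(weights) >= threshold:
        return particles, weights
    idx = rng.choices(range(n), weights=weights, k=n)
    return [particles[i] for i in idx], [1.0] * n

rng = random.Random(0)
particles, weights = resample_if_needed(
    ["a", "b", "c", "d"], [1.0, 0.0, 0.0, 0.0], rng)
```

In this degenerate example only the first particle carries weight, so the resampled set consists of copies of it.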
(using (16)). The necessary correction is therefore achieved if the unnormalized weights are replaced by the corrected unnormalized weights .
such that after the resampling step the unnormalized weights are all equal to 1. Nevertheless, in general their choice is free and may be based on the observations (which is used in the so-called auxiliary particle filter ).
Particle filter algorithm
The particle filter computes the state realizations and weights recursively through time. In its standard form, the particle filter can be stated as in algorithm 1.
Algorithm 1 Standard particle filter
with initial estimate (see e.g. ).
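For orientation, the recursion of a particle filter with known measurement times can be sketched as below. This is a generic bootstrap-type filter for a scalar state, in which the states are propagated with the model dynamics and reweighted with the observation density; it is not a transcription of Algorithm 1. The function names, the placement of the resampling step, and the toy model in the usage part are assumptions made for illustration.

```python
import math
import random

def gaussian_density(y, mean, sd):
    """Density of a normal distribution, used as observation density."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def bootstrap_filter(y_obs, t_obs, t0, init, propagate, likelihood, n, rng):
    """Return the filter means at the (known) measurement times t_obs."""
    particles = [init(rng) for _ in range(n)]
    weights = [1.0] * n
    t_prev, means = t0, []
    for y, t in zip(y_obs, t_obs):
        # Propagate every particle from the previous time to time t.
        particles = [propagate(x, t_prev, t, rng) for x in particles]
        # Reweight with the observation density evaluated at y.
        weights = [w * likelihood(y, x, t)
                   for w, x in zip(weights, particles)]
        total = sum(weights)
        norm = [w / total for w in weights]
        means.append(sum(w * x for w, x in zip(norm, particles)))
        # Resample when the effective sample size falls below N/2.
        if 1.0 / sum(w * w for w in norm) < n / 2:
            idx = rng.choices(range(n), weights=norm, k=n)
            particles = [particles[i] for i in idx]
            weights = [1.0] * n
        t_prev = t
    return means

# Toy usage: slowly diffusing state observed with Gaussian noise.
rng = random.Random(42)
means = bootstrap_filter(
    y_obs=[1.0, 1.1, 0.9, 1.0, 1.05], t_obs=[1, 2, 3, 4, 5], t0=0.0,
    init=lambda r: r.gauss(0.0, 2.0),
    propagate=lambda x, s, t, r: x + r.gauss(0.0, 0.1 * math.sqrt(t - s)),
    likelihood=lambda y, x, t: gaussian_density(y, x, 0.5),
    n=500, rng=rng)
```

Note that the weight update happens only at the measurement times, which is the point where the MTU variant differs.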
MTU particle filter: Uncertain measurement times
We now assume that each observation time t j is a realization of a random variable T j . Its distribution is expressed via a density γ j with respect to the Lebesgue measure . The observation times T j themselves are not observed.
Note that we cannot use the simple notation of the standard case, where for filtering only the first k observations are taken into consideration at time t k , since the observations are not ordered in time and the times T j are not fixed in advance. For this reason we have to include all measurements Y1:M in the filter model as well. Note that even though we use the complete data Y1:M=y1:M in the notation, only those y j for which T j ≤t holds have to be known at time t. To avoid confusion, we mark all densities connected to the filter model at time t by a hat superscript (and by the index t).
is the data likelihood with respect to the measure .
Effective computation of the filter distributions
In the following paragraph, we will show how the densities of the filter distributions given by (26) can be effectively computed. This is the basis for the formulation of our MTU particle filter method.
This is what we wanted to show.
depends on the path . It is even more convenient to compute by evaluating the antiderivative of γ j , if it is computationally available.
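When the antiderivative of γ j is available in closed form, integrals of γ j over a time interval reduce to a difference of two function values. The sketch below assumes, for illustration only, that the measurement time follows a normal distribution truncated to [t0,∞); the distribution family, the function names, and the numerical values are assumptions.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, expressed through math.erf."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def truncated_cdf(t, mean, sd, t0):
    """Antiderivative of the density of a normal distribution that is
    truncated to [t0, infinity)."""
    if t <= t0:
        return 0.0
    below = normal_cdf(t0, mean, sd)
    return (normal_cdf(t, mean, sd) - below) / (1.0 - below)

def interval_probability(s, t, mean, sd, t0):
    """Integral of the truncated density over the interval (s, t]."""
    return truncated_cdf(t, mean, sd, t0) - truncated_cdf(s, mean, sd, t0)

# Probability that a measurement intended for time 5.0 is actually
# taken within one standard deviation of the intended time.
p = interval_probability(4.5, 5.5, mean=5.0, sd=0.5, t0=0.0)
```

Compared with numerical quadrature of the density along the time grid, this evaluation is exact up to floating-point error and independent of the stepsize.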
Note that the definition of the filter distribution is dependent on the reference measure . A suitable change of this measure may help to further increase the efficiency of the algorithm. This issue still has to be explored.
The MTU particle filter algorithm
for each s,t∈[t0,t] with s<t exist and can be evaluated pointwise. (In fact, it suffices that ϱt | s(x t | x s ) exists for all states which are reachable from some initial state with ). In the special case that we sample from the Markov kernels of directly, ϱt | s≡1 for all s<t. Our MTU particle filter is described in algorithm 2. Here we suppress the indices s1,…,s ℓ in the notations of states and weights (see last paragraph).
Algorithm 2 MTU particle filter
for the computation of the values .
To be able to fully exploit our MTU particle filter method, the discretization stepsize must be chosen appropriately. One simple possibility is to use a very small stepsize throughout the complete procedure, but this results in a rather high computation time. The computation time can be reduced if an adaptive stepsize is chosen. We propose to determine the stepsize Δ τ d online depending on the ESS estimate. The stepsize should decrease when the ESS drops rapidly, and it should increase again if the ESS estimate changes only marginally. In detail, the following procedure can be applied.
In each step of the algorithm, we obtain an initial guess of the stepsize by linear interpolation between a maximal stepsize, used if the ESS has not changed since the last step, and a minimal stepsize, used if the ESS has dropped by the number N of samples (actually, the maximal difference that can be obtained is N-1). With this initial guess, we compute the increments of the partial weights, and from them we predict the ESS in the next step. If the predicted ESS falls below the current ESS by more than a certain relative amount (we use 10%), a new guess of the stepsize is computed by dividing the current guess by 2. With this new guess, a new predicted ESS is computed, and the test is applied again. This procedure is repeated until either the predicted ESS falls below the current ESS by less than the prescribed amount, or the stepsize guess falls below a prescribed minimal stepsize. The current stepsize or the minimal stepsize, respectively, is then accepted, and the algorithm proceeds with this stepsize. See algorithm 3 for a formal description of the stepsize determination.
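The stepsize determination described in this paragraph can be sketched as follows. The sketch follows the verbal description only; it is not a transcription of algorithm 3. The function name `choose_stepsize`, the callback `predict_ess` (which stands for the tentative weight update and ESS prediction for a given stepsize), and the toy ESS model in the usage part are assumptions.

```python
import math

def choose_stepsize(ess_prev, ess_curr, n, predict_ess,
                    dt_min, dt_max, rel_tol=0.10):
    """Pick the next stepsize from the recent and the predicted ESS."""
    # Initial guess: linear interpolation between dt_max (ESS unchanged
    # since the last step) and dt_min (ESS dropped by n).
    drop = min(max(ess_prev - ess_curr, 0.0), float(n))
    dt = dt_max - (dt_max - dt_min) * drop / n
    while True:
        predicted = predict_ess(dt)
        # Accept if the predicted relative ESS drop is small enough.
        if ess_curr - predicted <= rel_tol * ess_curr:
            return dt
        dt /= 2.0
        # Fall back to the minimal stepsize once the guess undershoots it.
        if dt < dt_min:
            return dt_min

# Toy usage: hypothetical ESS model that decays exponentially in dt.
ess_now = 100.0
dt = choose_stepsize(
    ess_prev=100.0, ess_curr=ess_now, n=200,
    predict_ess=lambda step: ess_now * math.exp(-1.0 * step),
    dt_min=0.01, dt_max=1.0)
```

In the toy usage the guess is halved from 1.0 down to 0.0625, the first value for which the predicted ESS drop stays within 10% of the current ESS.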
If we sample from the Markov kernels of directly, then and the update of the weights does not depend on the new states . Hence we only need to compute the weight update and the corresponding ESS estimate until we find an adequate stepsize. In this case, it is not necessary to sample the new states in each iteration which renders the algorithm computationally more effective.
Note that this procedure cannot be performed when the measurement times are fixed (i.e. in the standard particle filter). In this case the ESS does not depend on the stepsize, and a reduction of it will not improve the ESS. The application of the MTU particle filter with distributed measurement times is essential to be able to use this adaptive stepsize procedure.