Automatic validation of computational models using pseudo-3D spatio-temporal model checking

Background
Computational models play an increasingly important role in systems biology for generating predictions and in synthetic biology as executable prototypes/designs. For real-life (clinical) applications there is a need to scale up and build more complex spatio-temporal multiscale models; these could enable investigating how changes at small scales are reflected at large scales and vice versa. Results generated by computational models can be applied to real-life applications only if the models have been validated first. Traditional in silico model checking techniques only capture how non-dimensional properties (e.g. concentrations) evolve over time and are suitable for small-scale systems (e.g. metabolic pathways). The validation of larger-scale systems (e.g. multicellular populations) additionally requires capturing how spatial patterns and their properties change over time, which traditional non-spatial approaches do not consider.
Results
We developed and implemented a methodology for the automatic validation of computational models with respect to both their spatial and temporal properties. Stochastic biological systems are represented by abstract models which assume a linear structure of time and a pseudo-3D representation of space (2D space plus a density measure). Time series data generated by such models are provided as input to parameterised image processing modules which automatically detect and analyse spatial patterns (e.g. cells) and clusters of such patterns (e.g. cellular populations). To capture how spatial and numeric properties change over time, the Probabilistic Bounded Linear Spatial Temporal Logic (PBLSTL) is introduced. Given a collection of time series data and a formal spatio-temporal specification, the model checker Mudi (http://mudi.modelchecking.org) determines probabilistically whether the formal specification holds for the computational model. Mudi is an approximate probabilistic model checking platform which lets users choose between frequentist and Bayesian validation approaches, and between estimate-based and statistical hypothesis testing based ones. We illustrate the expressivity and efficiency of our approach with two biological case studies, namely phase variation patterning in bacterial colony growth and the chemotactic aggregation of cells.
Conclusions
The formal methodology implemented in Mudi enables the validation of computational models against spatio-temporal logic properties and is a precursor to the development and validation of more complex multidimensional and multiscale models.
Electronic supplementary material
The online version of this article (doi:10.1186/s12918-014-0124-0) contains supplementary material, which is available to authorized users.

Table 1: Considered approximate probabilistic model checking approaches. Bayesian methods consider prior knowledge about the parameters and variables in the model when deciding if a logic property holds. Conversely, frequentist approaches assume no prior knowledge is available. All methods except probabilistic black-box take as input a user-defined upper bound on the approximation error and request additional model executions until the result is sufficiently accurate. Probabilistic black-box model checking takes a fixed number of model simulations as input and computes a p-value as the confidence measure of the result.

                     Frequentist                      Bayesian
Estimate             Chernoff-Hoeffding bounds [1]    Mean and variance [3]
Hypothesis testing   Statistical [7]                  Statistical [2]
                     Probabilistic black-box [5,6]

In the initialisation step of Algorithm 1, nrOfTimeoutSeconds, the number of seconds to wait between re-executions of the extra evaluation program, is fixed. This variable is introduced so that the model checker pauses and allows the model simulator to finish its execution before checking whether new simulations were provided. Afterwards the collection of valid model simulations is initialised based on the given simulationsInputSet. The model checker of type modelCheckingType is then executed to verify if the logic property logicProperty holds considering the available simulations and set [...]

Algorithm 1 The wrapper algorithm employed to call specific model checking algorithms (see Table 1 for the considered approaches). If sufficient model simulations are available, or are generated and evaluated within extraEvaluationTime minutes, then the chosen specific model checking algorithm is used to provide an answer. Otherwise the user is informed that the maximum extra evaluation time threshold was reached and the answer is provided using the probabilistic black-box model checking approach. Model simulations are generated and stored in an input set simulationsInputSet using the external model simulation program extraEvaluationProgram. The logic property to be verified is stored in the variable logicProperty.
Require: modelCheckingType is the specific model checking approach, modelCheckingParameters is the collection of parameters required by the chosen modelCheckingType, extraEvaluationTime is the maximum number of minutes allowed for generating and evaluating additional model simulations, extraEvaluationProgram is the model simulation program which is called whenever new simulations are required, simulationsInputSet is the set containing the simulations and logicProperty is the PBLSTL logic property to be verified
Ensure: A true/false answer together with a measure of confidence is provided
[...] The default number of seconds to wait between re-executing the extra [...]
[...] GenerateModelSimulations(extraEvaluationProgram);
4. The modelCheckingType model checker execution is resumed considering the additional simulations.
The loop is exited when either extraEvaluationTime minutes have elapsed or enough model simulations have been provided. In the former case the probabilistic black-box model checker is executed to provide a result. Otherwise the result is computed using the modelCheckingType model checker. In the end both the result and the confidence measure are reported to the user.
The main advantages of Algorithm 1 are:
• The model checking execution time and the number of generated and evaluated simulations are finite. Depending on the parameters of the model checker, the distribution of the data and the number of required simulations, the answer is provided using either the desired model checker type or the default probabilistic black-box model checker.
• In contrast to traditional model checking methods, in our approach the model checking task is decoupled from a specific model and model simulation environment (e.g. Matlab [4]). An external program which can generate simulations is provided as input to the model checker. Whenever additional model simulations are required this external program is executed. For the algorithm implementation we recommend that the external program be a script (e.g. Bash [UNIX], Batch [Windows]) which calls the model simulator and stores the output in the specified location.
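The control flow of the wrapper described above can be sketched in Python as follows. This is only an illustrative sketch and not the Mudi implementation: the function names (`run_wrapper`, `external_program`), the callback signatures and the convention that a checker returns a triple (enough simulations, answer, confidence) are all assumptions made for the example. The parameters `extra_evaluation_minutes` and `timeout_seconds` play the roles of extraEvaluationTime and nrOfTimeoutSeconds.

```python
import subprocess
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical convention: a checker returns (enough_simulations, answer, confidence).
CheckResult = Tuple[bool, Optional[bool], Optional[float]]


def run_wrapper(
    model_checker: Callable[[List[str]], CheckResult],
    black_box_checker: Callable[[List[str]], Tuple[bool, float]],
    generate_simulations: Callable[[], List[str]],
    simulations: List[str],
    extra_evaluation_minutes: float,
    timeout_seconds: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
):
    """Return (answer, confidence, label of the checker that produced them)."""
    valid = list(simulations)  # initialise the collection of valid simulations
    deadline = clock() + 60.0 * extra_evaluation_minutes
    enough, answer, confidence = model_checker(valid)
    while not enough and clock() < deadline:
        valid.extend(generate_simulations())  # run the external program
        sleep(timeout_seconds)                # pause before checking again
        enough, answer, confidence = model_checker(valid)  # resume checking
    if enough:
        return answer, confidence, "requested"
    # Time budget exhausted: fall back to the probabilistic black-box approach.
    answer, confidence = black_box_checker(valid)
    return answer, confidence, "black-box"


def external_program(command: List[str]) -> Callable[[], List[str]]:
    """Wrap a script; assumes it prints one new simulation file path per line."""
    def generate() -> List[str]:
        completed = subprocess.run(
            command, capture_output=True, text=True, check=True
        )
        return completed.stdout.split()
    return generate
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting. In practice `generate_simulations` could be built with `external_program(["bash", "simulate.sh"])`, where `simulate.sh` is a hypothetical script name.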

Finite number of state transitions
Logic properties are evaluated with respect to simulations of computational models. To decide whether a logic property is satisfied, the model simulation must cover a sufficiently long time frame. Stopping the simulation early could render the evaluation of temporal logic properties undecidable. Therefore a mechanism is needed to decide when a simulation execution can be stopped. When verifying BLSTL logic properties an upper bound can be placed on the required simulation time because all temporal logic operators are bounded. Let us denote the upper bound corresponding to a BLSTL logic property $\psi$ by $\lceil \psi \rceil$.
Definition 9. The upper bound $\lceil \psi \rceil \in \mathbb{N}$ corresponding to a BLSTL logic property $\psi$ considering an execution $\sigma$ is defined recursively on the structure of the logic property as follows:
• $\lceil nsm \asymp nm \rceil = 0$ because the value of nsm and nm is computed considering only $\sigma[0]$;
• $\lceil nsv \asymp nm \rceil = 0$ because the value of nsv and nm is computed considering only $\sigma[0]$;
• $\lceil d(nm1) \asymp nm2 \rceil = 1$ because the value of nm1 is computed considering both $\sigma[0]$ and $\sigma[1]$;
Thus the minimum simulation time frame to be covered by model executions when verifying a BLSTL logic property $\psi$ is $[0, \lceil \psi \rceil]$.
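The recursive computation of the upper bound can be illustrated with a short Python sketch. The class names and the formula representation are hypothetical and chosen only for this example. The cases for negation, conjunction, until and eventually are taken from the identities quoted in the proof of Lemma 1 ("By Definition 9 ..."); operators whose bound is not stated in this text are deliberately left out.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:            # nsm or nsv compared with nm: needs only sigma[0]
    label: str


@dataclass(frozen=True)
class Difference:      # d(nm1) compared with nm2: needs sigma[0] and sigma[1]
    label: str


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Until:           # left U[a, b] right
    left: "Formula"
    right: "Formula"
    a: int
    b: int


@dataclass(frozen=True)
class Eventually:      # F[a, b] sub
    sub: "Formula"
    a: int
    b: int


Formula = Union[Atom, Difference, Not, And, Until, Eventually]


def bound(psi: Formula) -> int:
    """Upper bound on the simulation time needed to evaluate psi."""
    if isinstance(psi, Atom):
        return 0
    if isinstance(psi, Difference):
        return 1
    if isinstance(psi, Not):
        return bound(psi.sub)
    if isinstance(psi, And):
        return max(bound(psi.left), bound(psi.right))
    if isinstance(psi, Until):
        return psi.b + max(bound(psi.left), bound(psi.right))
    if isinstance(psi, Eventually):
        return psi.b + bound(psi.sub)
    raise TypeError(f"unsupported formula: {psi!r}")


# F[0, 10]( d(x) > 0  AND  NOT( y U[2, 5] z ) )
example = Eventually(
    And(Difference("d(x) > 0"), Not(Until(Atom("y"), Atom("z"), 2, 5))),
    0,
    10,
)
print(bound(example))  # 10 + max(1, 5 + max(0, 0)) = 15
```

For the example property a simulation therefore only needs to cover the time frame [0, 15].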
Proof. We will prove the results of Lemma 1 recursively on the structure of the logic property $\psi$ as described below:
1. $\sigma \models nsm \asymp nm$ if and only if $\hat{\sigma} \models nsm \asymp nm$
Proof. [...]

(c) By Definition 9 $\lceil \psi \rceil = 1$, which means that according to the assumptions of Lemma 1 $\hat{s}_0 = s_0$ and $\hat{s}_1 = s_1$. Hence the symbols $nm1_0$, $nm1_1$ and $nm2$ are evaluated to the same values for both $\sigma$ and $\hat{\sigma}$.

(c) By Definition 9 $\lceil \neg\psi \rceil = \lceil \psi \rceil$, which means that according to the assumptions of Lemma 1 $\hat{s}_0 = s_0, \hat{s}_1 = s_1, \ldots, \hat{s}_m = s_m$, where the value of $m$ is determined such that sufficient time points are recorded for the evaluation of $\psi$. Hence the semantics of $\psi$ considering $\sigma$ is equivalent to the semantics of $\psi$ considering $\hat{\sigma}$.

(c) By Definition 9 $\lceil \psi_1 \wedge \psi_2 \rceil = \max(\lceil \psi_1 \rceil, \lceil \psi_2 \rceil)$, which means that according to the assumptions of Lemma 1 $\hat{s}_0 = s_0, \hat{s}_1 = s_1, \ldots, \hat{s}_m = s_m$, where the value of $m$ is determined such that sufficient time points are recorded for the evaluation of both $\psi_1$ and $\psi_2$. Hence the semantics of $\psi_1$ and $\psi_2$ is the same considering both $\sigma$ and $\hat{\sigma}$.

(a) $\sigma \models \psi_1 \, U[a, b] \, \psi_2$ if and only if $\exists i$, $a \le i \le b$ such that $\sigma^{i} \models \psi_2$, and for all $j$, $a \le j < i$, $\sigma^{j} \models \psi_1$;
(b) [...] and for all $j'$, $a \le j' < i'$, $\hat{\sigma}^{j'} \models \psi_1$;
(c) By Definition 9 $\lceil \psi_1 \, U[a, b] \, \psi_2 \rceil = b + \max(\lceil \psi_1 \rceil, \lceil \psi_2 \rceil)$. This means that according to the assumptions of Lemma 1 $\hat{s}_0 = s_0, \hat{s}_1 = s_1, \ldots, \hat{s}_m = s_m$, where the value of $m$ is determined such that sufficient time points are recorded for the evaluation of both $\psi_1$ and $\psi_2$ considering [...]
(d) From 9c it follows that for any suffix execution $\sigma^{h}$ / $\hat{\sigma}^{h}$, $a \le h \le b$, the semantics of $\psi_1$ and $\psi_2$ is the same.
(e) From 9d it follows that $\exists i$, [...]
(f) From 9d it follows that $\forall j$, $a \le j < i \le b$ such that $\sigma^{j} \models \psi_1$ if and only if $\forall j'$, $a \le j' < i' \le b$, $i' = i$, $j' = j$ such that $\hat{\sigma}^{j'} \models \psi_1$.
(g) From 9e and 9f it follows that $\exists i$, $a \le i \le b$ such that $\sigma^{i} \models \psi_2$ and $\forall j$, $a \le j < i \le b$ such that $\sigma^{j} \models \psi_1$ if and only if $\exists i'$, $a \le i' \le b$, $i' = i$ such that $\hat{\sigma}^{i'} \models \psi_2$ and $\forall j'$, $a \le j' < i' \le b$, $j' = j$ such that $\hat{\sigma}^{j'} \models \psi_1$.
(i) From 9b and 9h it follows that $\sigma \models$ [...]

(c) By Definition 9 $\lceil F[a, b] \, \psi \rceil = b + \lceil \psi \rceil$. This means that according to the assumptions of Lemma 1 $\hat{s}_0 = s_0, \hat{s}_1 = s_1, \ldots, \hat{s}_m = s_m$, where the value of $m$ is determined such that sufficient time points are recorded for the evaluation of $\psi$ considering any execution suffix [...]
(d) From 10c it follows that the semantics of $\psi$ is equivalent for suffix [...]

(d) From 12c it follows that the semantics of $\psi$ is equivalent for suffix executions $\sigma^{1}$ and $\hat{\sigma}^{1}$.

(d) From 13c it follows that the semantics of $\psi$ is equivalent for suffix executions $\sigma^{k}$ and $\hat{\sigma}^{k}$.

(d) From 14c it follows that the semantics of $\psi$ is equivalent for both $\sigma$ and $\hat{\sigma}$.
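The practical consequence of Lemma 1 for the bounded until operator can be illustrated with a small Python sketch. It assumes discrete time points indexed 0, 1, 2, ... and restricts the two subformulas to predicates on a single state (upper bound 0), so that the upper bound of the until property is b; the function names and the `density` field are hypothetical and used only for this example.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Predicate = Callable[[State], bool]


def holds_until(trace: List[State], p1: Predicate, p2: Predicate,
                a: int, b: int) -> bool:
    """Evaluate p1 U[a, b] p2 on a trace, for state predicates p1 and p2.

    Follows the semantics quoted in the proof: there is an i in [a, b] where
    p2 holds, and p1 holds at every j with a <= j < i.
    """
    if len(trace) <= b:
        raise ValueError("the trace must contain the states 0..b")
    for i in range(a, b + 1):
        if p2(trace[i]):
            # Checking the first such i suffices: a later witness would
            # require p1 on an even longer range starting at a.
            return all(p1(trace[j]) for j in range(a, i))
    return False


def finite_prefix(trace: List[State], upper_bound: int) -> List[State]:
    """Keep only the states needed to cover the time frame [0, upper_bound]."""
    return trace[: upper_bound + 1]


# A toy execution in which the density grows by one unit per time point.
full_trace = [{"density": float(k)} for k in range(20)]
low = lambda s: s["density"] < 5.0
high = lambda s: s["density"] >= 5.0

a, b = 2, 6
prefix = finite_prefix(full_trace, b)  # the upper bound is b for state predicates
same = holds_until(full_trace, low, high, a, b) == holds_until(prefix, low, high, a, b)
print(same)  # the verdict on the prefix matches the verdict on the full trace
```

The evaluator never reads a state beyond index b, which is why the simulation can be stopped once the time frame given by the upper bound has been covered.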
Lemma 2. The number of state transitions required to verify a BLSTL logic property is finite.
Proof. From Lemma 1 it follows that a BLSTL logic property $\psi$ can be verified against a model simulation $\sigma$ based on a finite prefix $\hat{\sigma}$. The minimum time interval captured by $\hat{\sigma}$ is bounded and can be computed using Definition 9. Since we assume that the time divergence property holds for all the considered systems, only a finite number of state transitions can occur in a bounded interval of time.
Well-defined model checking problem
Theorem 1. The spatio-temporal model checking problem is well-defined.
Proof. It was shown that the number of model executions required to verify whether a PBLSTL logic property φ holds is finite. Moreover, considering Lemmas 1 and 2, only a finite prefix and a finite number of state transitions have to be considered for each model execution. Thus the evaluation of φ is reduced to the problem of evaluating non-temporal properties over a finite number of states for each model execution. This involves evaluating arithmetic expressions and/or detecting spatial entities, both of which are decidable. Hence the model checking problem is well-defined.