Bayesian uncertainty analysis for complex systems biology models: emulation, global parameter searches and evaluation of gene functions
 Ian Vernon^{1}Email authorView ORCID ID profile,
 Junli Liu^{2}Email author,
 Michael Goldstein^{1},
 James Rowe^{2, 3},
 Jen Topping^{2} and
 Keith Lindsey^{2}
https://doi.org/10.1186/s1291801704843
© The Author(s) 2017
Received: 28 July 2017
Accepted: 9 November 2017
Published: 2 January 2018
Abstract
Background
Many mathematical models have now been employed across every area of systems biology. These models increasingly involve large numbers of unknown parameters, have complex structure which can result in substantial evaluation time relative to the needs of the analysis, and need to be compared to observed data of various forms. The correct analysis of such models usually requires a global parameter search, over a high dimensional parameter space, that incorporates and respects the most important sources of uncertainty. This can be an extremely difficult task, but it is essential for any meaningful inference or prediction to be made about any biological system. It hence represents a fundamental challenge for the whole of systems biology.
Methods
Bayesian statistical methodology for the uncertainty analysis of complex models is introduced, which is designed to address the high dimensional global parameter search problem. Bayesian emulators that mimic the systems biology model but which are extremely fast to evaluate are embeded within an iterative history match: an efficient method to search high dimensional spaces within a more formal statistical setting, while incorporating major sources of uncertainty.
Results
The approach is demonstrated via application to a model of hormonal crosstalk in Arabidopsis root development, which has 32 rate parameters, for which we identify the sets of rate parameter values that lead to acceptable matches between model output and observed trend data. The multiple insights into the model’s structure that this analysis provides are discussed. The methodology is applied to a second related model, and the biological consequences of the resulting comparison, including the evaluation of gene functions, are described.
Conclusions
Bayesian uncertainty analysis for complex models using both emulators and history matching is shown to be a powerful technique that can greatly aid the study of a large class of systems biology models. It both provides insight into model behaviour and identifies the sets of rate parameters of interest.
Keywords
Background
Fundamental challenges facing systems biology
Recent advances in genome sequencing techniques, a variety of ‘omic’ techniques and bioinformatic analyses, have led to an explosion of systemswide biological data. Thus, identification of molecular components at the genome scale based on biological data has become possible. However, a major challenge in biology is to analyse and predict how functions in cells emerge from interactions between molecular components. Computational and mathematical modelling provide compelling tools to study the nonlinear dynamics of these complex interactions [1]. A particular example is kinetic modelling, in which the kinetics of each biological reaction are described in accordance with the corresponding biological process, and the properties of the whole system are described using differential equations: a common tool for analysing biological systems [2–4].
A critical problem found in the mathematical modelling of many complex biological systems, which is of particular severity in kinetic modelling, is that the models often contain large numbers of uncertain parameters (a common type being reaction rate parameters). In most cases, such kinetic parameters cannot be directly measured as experiments typically measure concentrations rather than rates. Even when such parameters can be measured ‘directly’, this is usually in experimental conditions that are significantly different from the cellular environment we wish to study. Therefore, we have to compare the mathematical model’s outputs with experimental observations, often in the form of measured concentrations and trends, and determine which values of the input or rate parameters will achieve an acceptable match between model and reality. This involves consideration of several sources of uncertainty including observation error, biological variability and the tolerance we place on the model’s accuracy, known as the model discrepancy. It is vital that we perform a global parameter search for all input parameter settings that achieve an acceptable match. This is because a single solution for the rate parameter values may suggest certain biological implications and give particular predictions for future experiments, both of which could be gravely misleading were we to explore the parameter space further and find several alternative solutions that give radically different implications and predictions. This is a mistake that is disturbingly common.
Unfortunately, performing global parameter searches over high dimensional spaces can be extremely challenging for several reasons, most notably: (a) the complex structure of the model and hence the complex way it imposes constraints on the parameters, (b) the substantial model evaluation time relative to the needs of the analysis, (c) the need for a careful assessment of an “acceptable match” that incorporates appropriately all the complexities and uncertainties of the comparison between the model and the real system, and (d) high dimensional spaces, being extremely large, require vast numbers of model evaluations to explore. For example, some spatial models of root development [5] require at least several minutes for a single evaluation. It is worth considering how large high dimensional spaces are: were we just to evaluate the model in question at the corners of the initial input space, in say 32 dimensions, we would require 2^{32}≃4.3 billion evaluations, which would take approximately 136 years if the model took 1 second per evaluation. However, global parameter searches are essential for any meaningful inference or prediction to be made about the biological system. Therefore this represents a fundamental challenge for the whole of systems biology. This article describes practical methodology to address this problem, based on Bayesian statistics methodology for the uncertainty analysis of complex models [6–9].
Bayesian emulation and uncertainty analysis
The issues surrounding the analysis of complex models under uncertainty, and specifically the global parameter search problem, are not unique to systems biology, and have been encountered in many different scientific disciplines. An area of Bayesian statistics has arisen to meet the demand of such analyses. This area, sometime referred to as the uncertainty analysis of computer models, centres around the construction of Bayesian emulators [6–9]. An emulator is a statistical construct that mimics the scientific model in question, providing predictions of the model outputs with associated uncertainty, at as yet unevaluated input parameter settings. The emulator is however, extremely fast to evaluate [10]. It provides insight into the model’s structure and, thanks to its speed, it can be used to help perform the global parameter search far more efficiently than approaches that just use the comparatively slow scientific model itself (for examples see [6, 8, 11–14]).
Many analyses and corresponding parameter searches still fail because an appropriate measure of an acceptable match between model and reality is not defined. This can lead to the use of badly behaved objective functions that do not properly capture the desired match criteria, and which are often harder to explore in high dimensions, due to increased numbers of ridges, spikes and local minima. The Bayesian emulation methodology we introduce naturally incorporates more detailed statistical models of the difference between the model outputs and the observed data, which allow the inclusion of important sources of uncertainty such as observational error and model discrepancy, the later being the upfront acknowledgement of the limitations of the current model. Various structures of increasing complexity are available for the representation of these uncertainties, depending on the requirements and importance of the study (see [6, 15–18] for examples and discussion).
It is worth noting that due to their speed, the use of emulators would greatly improve the efficiency of many forms of analysis that a modeller may wish to perform, e.g. for a fully Bayesian MCMC analysis [9, 19, 20] or for more direct global parameter searches such as [21]. However for high dimensional models, the particular strategy chosen for a parameter search is vital. Many approaches struggle due to being trapped in local minima (of which there may be many) or because they chase the scientifically spurious best match parameter setting. Here, we describe an efficient global parameter search method known as Bayesian history matching, which has proved very successful across a wide range of scientific disciplines including cosmology [6, 7, 15, 22–24], epidemiology [11, 25], oil reservoir modelling [8, 26–28], climate modelling [12], environmental science [16] and traffic modelling [29].
It utilises Bayesian emulators to reduce efficiently the input parameter space in iterations or waves, by identifying regions that are implausible as matches to the observed data, with the objective of identifying all acceptable input parameter settings. It is a careful approach that avoids many of the traps of common parameter search techniques.
Hormonal crosstalk network in Arabidopsis root development
Understanding how hormones and genes interact to coordinate plant growth is a major challenge in developmental biology. The activities of auxin, ethylene and cytokinin depend on the cellular context and exhibit either synergistic or antagonistic interactions. Previously, three of our authors developed a hormonal crosstalk network for a single Arabidopsis cell by iteratively combining modelling with experimental analysis [30]. Kinetic modelling was used to analyse how such a network regulates auxin concentration in the Arabidopsis root, by controlling the relative contribution of auxin influx, biosynthesis and efflux; and by integrating auxin, ethylene and cytokinin signalling [30]. Although some of the parameters in the model were based on experimental data, most parameters were chosen in an ad hoc way, by adjusting them to fit experimental data. Conditional on those somewhat ad hoc choices, it was shown that the hormonal crosstalk network quantitatively describes how the three hormones (auxin, ethylene, and cytokinin) interact via POLARIS peptide (PLS) [31, 32] to regulate plant root growth [30].
In this work we demonstrate the power of the Bayesian emulation methodology by applying it to the hormonal crosstalk network in Arabidopsis root development. Specifically, we explore the model’s 32dimensional parameter space, and identify the set of all acceptable matches between model outputs and experimental data, taking into account major sources of uncertainty. This provides much insight into the model’s structure and the constraints imposed on the rate parameters by the current set of observed data. We apply the methodology to a second, competing model, and hence are able to investigate gene functions robustly. As an example, our analysis suggest that, in the context of the hormonal crosstalk network, POLARIS peptide (PLS) must have a role in positively regulating auxin biosynthesis.
The paper is organised as follows. In the “Methods” section we begin by defining a simple 1dimensional toy model that we use to illustrate our definitions and to demonstrate the three main parts of the Bayesian methodology: linking the model to reality, Bayesian emulation, and history matching, before going on to compare the strengths and weaknesses of Bayesian history matching to more standard approaches. In the “Results” and “Discussion” section we describe in detail the application of this methodology to the full 32 dimensional Arabidopsis model, and discuss the relevant insights and biological implications obtained.
Methods
Simple 1dimensional exponential example

Linking the model to reality

Bayesian Emulation

History matching: a global parameter search
We will temporarily assume the initial conditions are f _{0}=f _{ t=0}(x)=1. The system runs from t=0 to t=5 and we are at first interested in the value of f _{ t }(x) at t=3.5. This mathematical model features an input or rate parameter x, which we wish to learn about. We do this using a measurement of the real biological system at t=3.5 which we denote z, which corresponds to, but is not the same as, the model output f _{ t=3.5}(x). Note that usually, for models of realistic complexity, we would not have the analytic solution for f _{ t }(x) given by Eq. (2). Instead we would resort to a numerical integration method to solve Eq. (1) that might require significant time for one model evaluation, ranging from less than a second to hours, days or even weeks, depending on the full complexity of the model [6, 12]. Such a computational cost, for a single evaluation of the model, means that a full global parameter search is computationally infeasible, especially when the model has many rate parameters and therefore a high dimensional input space, which may require vast numbers of evaluations to explore fully.
 1
When comparing the model to observed data from the real biological system, an adequate statistical description of the link between model and reality, covering all major uncertainties, is required.
 2
For complex models, the time taken to evaluate the model numerically is so long that an exhaustive exploration of the model’s behaviour is not feasible.
 3
The appropriate scientific goal should be to identify all locations in input parameter space that lead to acceptable fits between model and data, and not just find the location of a single good match.
Methods to address these three fundamental issues are described in the next three sections.
Model discrepancy and linking the model to reality
Most systems biology models are developed to help explain and understand the behaviour of corresponding real world biological systems. An essential part of determining whether such models are adequate for this task is the process of comparing the model to experimental data. As a comparison of this kind involves several uncertainties that cannot be ignored, it is therefore vital to develop a clearly defined statistical model for the link between systems biology model f(x) and reality z. This allows for a meaningful definition of an ‘acceptable’ match between a model run and the observed data. Here we describe a simple yet extremely useful statistical model for the link between the biological model and reality, that has been successfully applied in a variety of scientific disciplines, for example climate, cosmology, oil reservoir modelling and epidemiology [6, 8, 11, 12].
where we represent the errors as additive, although more complex forms could be used. Note that z, y and e here represent vectors of random quantities, which will reduce to scalar random quantities if there is only one quantity of interest. A common specification [6] that we will employ here is to judge the errors to be independent from y, and unbiased with expectation E(e)=0 and, for the scalar case, \(\text {Var}(e) = \sigma _{e}^{2}\).
where the ε is the model discrepancy: a vector of uncertain quantities that represents directly the difference between the model and the real system. Again we treat y, f, x ^{∗} and ε as vectors of random quantities. A simple and popular specification [6] would be to judge that ε is independent of f(x ^{∗}), x ^{∗} and e, with E(ε)=0 and, in the scalar case, \(\text {Var}(\epsilon) = \sigma _{\epsilon }^{2}\). In a multivariate setting, where f(x) describes a vector of outputs (for example, with each output labelled by time t), the vector ε may have an intricate structure, possessing nonzero covariances between components of ε. This could capture the related deficiencies of the model across differing time points. Various structures of increasing complexity are available (for examples see [6, 8, 9]), along with methods for specification of their components [6, 16].
While the explicit inclusion of the model discrepancy term ε is unfamiliar, it is now standard practice in the statistical literature for complex models [7, 9, 17, 33]. Furthermore, any analysis performed without such a term is implicitly conditioned with the statement “given the model is a perfect representation of reality for some value of the inputs x”, a statement that is rarely true. The model discrepancy allows us to perform a richer analysis than before as we can now include any extra knowledge we have about the model’s deficiencies to improve our modelling of reality y, through the joint structure of ε (see for example [6, 16]). This is especially important in prediction: as a simple example, if our model undershot every auxin output we have measured so far, we may suspect that it will undershoot future measurements of auxin also, and may wish to build this into our prediction for future y [33].
We can specify probabilistic attributes of ε a priori, or learn about them by comparing to observed data. For direct specification, there are often various simple experiments that can be performed on the model itself to obtain assessments of σ _{ ε } and other aspects if necessary. For example, often models are run from exact initial conditions, so performing a set of exploratory model evaluations with the initial conditions appropriately perturbed would provide a lower bound on σ _{ ε }. See [16] where several such assessment methods are demonstrated, for more details.
1dimensional example
Figure 2 (right panel) shows the far more realistic situation where we include both observation error e (the black dashed lines represent z±3σ _{ e }) and model discrepancy ε (the purple dashed lines show f(x)±3σ _{ ε }). As we have taken into account both major types of uncertainty there is now a range of acceptable values for x (green points) with borderline/unacceptable points in yellow/red. If we were to consider additional outputs of the model, we still have a chance to match them simultaneously to data for a subset of the currently acceptable points. If on the other hand we cannot find any acceptable points x even given the uncertainties represented by e and ε, then we can state that the model is inconsistent with the observed data and therefore most likely based on incorrect biological principles. Further, inclusion of the observation error and model discrepancy terms often aids a global parameter search as they tend to smooth the likelihood surface (or comparable objective function), making it both easier to explore while simultaneously more robust. They also help reduce the temptation to chase the often scientifically misleading global minimum, such as the lone green point in Fig. 2 (left panel), instead suggesting that the identification of a set of acceptable input points is the appropriate goal for such a search (see the green points in Fig. 2 (right panel)).
Bayesian emulation of systems biology models
Many complex mathematical models have been developed and employed within the area of systems biology. Often these models have high dimensional input spaces in that they posses several input parameters, for example reaction rate parameters, that must be specified in order to evaluate the model. We represent the list of such inputs as the vector x, with individual inputs as x _{ k } with k=1,…,d. The model may have any number of outputs, denoted as the vector f(x), with individual outputs as f _{ i }(x) with i=1,…,q, the behaviour of which we want to investigate, possibly comparing some of these to observed data. For example, the index i may label the different times we are interested in, or the different chemical outputs of the model, or both. Most models are complex enough that they require numerical integration methods to solve, and hence take appreciable time to evaluate. This evaluation time can range anywhere from less than a second to minutes, hours or even days for highly sophisticated models: our approach is applicable in any of these cases, and adds more value as the dimensionality and evaluation time of the model increases.
A Bayesian emulator is a fast, approximate mimic of the full systems biology model. It gives insight into the structure of the model’s behaviour and can be used instead of the model in many complex calculations. The emulator gives a prediction of what the model’s output f(x) will be at a yet to be evaluated input point x, and additionally provides an associated uncertainty for that prediction (these are often expressed as posterior distributions, or simply expectations and variances in some cases). Critically an emulator is extremely fast to evaluate as it only requires a few matrix multiplications, and hence can be used to explore the input space more fully, as for example in a global parameter search.
with expectation zero and \(\text {Var} (w_{i}(x)) = \sigma ^{2}_{w_{i}}\), that represents the effects of the remaining inactive input variables [6].
The emulator, as given by Eq. (5), possesses various desirable features. The regression term, given by \(\sum _{j} \beta _{ij} g_{ij}(x_{A_{i}})\), is often chosen to represent say a third order polynomial in the active inputs. This would attempt to mimic the large scale global behaviour of the function f _{ i }(x), and in many cases, will capture a large proportion of the model’s structure. (It is worth noting that reasonably accurate emulators can often be constructed just using regression models, for example using the lm() function in R. This can be a sensible first step, before one attempts the construction of a full emulator of the form given in Eq. (5).) The second term \(\phantom {\dot {i}}u_{i}(x_{A_{i}})\), the Gaussian process, mimics the local behaviour of f _{ i }(x) and specifically its local deviations from the third order polynomial given by the regression terms. We can choose the list of active inputs \(x_{A_{i}}\) using various statistical techniques for example, classical linear model fitting criteria such as AIC or BIC [6]. A list of say p active inputs for a particular output f _{ i }(x) means that we have reduced the input dimension from d to p dimensions, which can result in large efficiency gains. The small remaining effect of the inactive inputs is captured by the third term w _{ i }(x) in Eq. (5).
We then fit the emulator given by Eq. (5) to the set of model runs using our favourite statistical tools, guided by expert judgement. Specifically we would prefer a fully Bayesian approach if we required full probability distributions [9], and a Bayes Linear approach [38, 39], which we will describe below, if we required purely expectations, variances and covariances of f(x). We make certain pragmatic choices in the emulator construction process, for example, while we keep the regression coefficients β _{ ij } uncertain, we may directly specify \(\sigma ^{2}_{u_{i}}\), \(\sigma ^{2}_{w_{i}}\) and θ _{ i } a priori, or use suitable plugin estimates [6].
The emulators then provide an expectation and variance for the value of f(x) at an unexplored input point x. We can test the emulators using a series of diagnostics, for example checking their prediction accuracy over a new batch of runs [40]. See [10] for an introduction and [6, 7, 9] for detailed descriptions of emulator construction.
All quantities on the right hand side of Eqs. (8) and (9) can be calculated from Eqs. (5) and (6) combined with prior specifications for E(β _{ ij }), Var(β _{ ij }), \(\sigma ^{2}_{u_{i}}\), \(\sigma ^{2}_{w_{i}}\) and θ _{ i }. Note that we could have used the entire collection of model outputs D={D _{1},D _{2},…,D _{ q }} instead of just D _{ i } in Eqs. (8) and (9), if we had specified a more complex, multivariate emulator [41].
\(\mathrm {E}_{D_{i}}(f_{i}(x))\) and \(\text {Var}_{D_{i}}(f_{i}(x))\) are used directly in the implausibility measures used for the global parameter searches described below.
1dimensional example
We now demonstrate the construction of an emulator for the simple one dimensional exponential model. As there is only one output dimension, f(x) is now a scalar, so we drop the i index from Eqs. (59).
where again the output of interest has t=3.5.
For simplicity we treat the constant term β _{0} as known and hence set Var(β _{0})=0, and choose prior expectation E(β _{0})=β _{0}=3.5, a value which we expect the function outputs to be approximately centred around. We specify the parameters in the covariance function for u(x) given by Eq. (6) to be σ _{ u }=1.5 and θ=0.14 representing curves of moderate smoothness: this process will be discussed in more detail for the full Arabidopsis model.
We can now calculate the adjusted expectation and variance E_{ D }(f(x)) and Var_{ D }(f(x)) from Eqs. (8) and (9) respectively.
Figure 4 (left panel) shows E_{ D }(f(x)) as a function of x as the blue line. We can see that it precisely interpolates the five known runs at outputs D, which is desirable as f(x) is a deterministic function. The blue line also gives a satisfactory estimate of the true function f(x)= exp(3.5x). The red pair of lines give the credible interval \(\mathrm {E}_{D}(f(x)) \pm 3 \sqrt {\text {Var}_{D}(f(x))}\) as a function of x. This defines a region between the lines that we believe is highly likely to contain the true function f(x). Another desirable feature of the emulator is that these credible intervals decrease to zero width at the five known run locations, as is appropriate for a deterministic function, as we precisely know the value of f(x) there (and because we have no inactive variables). Therefore when x is close to a known run we are more certain about the possible values of f(x), compared to when x is far from any such runs.
which has been simulated at only 10 input points evenly spread between x ^{(1)}=0.1 and x ^{(10)}=0.5. Here the prior emulator specifications were as in the previous example, but with E(β _{0})=0, σ _{ u }=0.6 and θ=0.06 allowing for functions with more curvature, centred around zero. As before the blue and red lines show E_{ D }(f(x)) and \(\mathrm {E}_{D}(f(x)) \pm 3 \sqrt {\text {Var}_{D}(f(x))}\) as functions of x. The true function f(x) is given by the solid black line and it can be seen that it lies within the credible region for all x, only getting close to the boundary for x>0.5. This demonstrates the power of the emulation process: with the use of only 10 points the emulator accurately mimics a reasonably complex function with five turning points. We will demonstrate the effectiveness of emulators in higher dimensions for the main Arabidopsis model example.
History matching: an efficient global parameter search
Bayesian emulation is very useful in a variety of situations. As emulators are extremely fast to evaluate, they can replace the original model in any larger calculation, for example when designing future experiments. They can also provide much structural insight into the behaviour of the model. One of the most important applications of emulation is to the problem of performing a global parameter search. In this section we describe a powerful iterative global search method known as history matching, which has been successfully employed in a variety of scientific disciplines including galaxy formation [6, 7, 15, 22, 24], epidemiology [11, 25, 42, 43], oil reservoir modelling [8, 26–28], climate modelling [12], environmental science [16, 44] and traffic modelling [29]. Many of these applications involved models with substantial runtime, for which the process of emulation is vital.
 1
Are there any input parameter settings that lead to acceptable matches between the model output and observed data?
 2
If so, what is the full set \(\mathcal {X}\) that contains all such input parameter settings?
History matching is designed to answer these questions. It proceeds iteratively and employs implausibility measures to determine parts of the input space that can be discarded from further investigation.
The numerator of Eq. (18) gives the distance between the emulator expectation \(\mathrm {E}_{D_{i}}(f_{i}(x))\) and the observation z _{ i }, while the denominator standardises this quantity by all the relevant uncertainties regarding this distance: the emulator variance \(\text {Var}_{D_{i}}(f_{i}(x))\), the model discrepancy variance Var(ε _{ i }) and the observation error variance Var(e _{ i }). This structure is a direct consequence of Eqs. (3) and (4). A large value of I _{ i }(x) for a particular x implies that we would be unlikely to obtain an acceptable match between f _{ i }(x) and z _{ i } were we to run the model there. Hence we can discard the input x from the parameter search if I _{ i }(x)>c, for some cutoff c, and refer to such an input as implausible. We may choose the cutoff c by appealing to Pukelsheim’s 3sigma rule [45], which is the powerful result that states that for any continuous, unimodal distribution, 95% of its probability must lie within ± 3σ, regardless of asymmetry, skew, or heavy tails, which suggests that a choice of c=3 could be deemed reasonable [6].
where Q represents the collection of all outputs, or some important subset of them (often we will only emulate a small subset of outputs in early iterations). A more robust approach would be to consider the second or third maximum implausibility, hence allowing for some inaccuracy of the emulators [6]. Also, multivariate implausibility measures are available (see [6] for details), but these require a more detailed prior specification, for example this requires covariances between different components of e and ε. Note that a low value of the implausibility I _{ M }(x) does not imply that the input point x is ‘good’ or ‘plausible’ as it still may lead to a poor fit to outputs that have not been included in Q yet. Also, low implausibility at x may occur because of a high emulator variance \(\text {Var}_{D_{i}}(f_{i}(x))\) which once resolved following further runs of the model, may then lead to a high implausibility at x. Hence we refer to low implausibility inputs x as “nonimplausible", consistent with the literature [6–8, 11, 24, 25, 28].
 1
Design and evaluate a well chosen set of runs over the current nonimplausible space \(\mathcal {X}_{k}\). e.g. using a maximin Latin hypercube with rejection [6].
 2
Check to see if there are new, informative outputs that can now be emulated accurately (that were difficult to emulate well in previous waves) and add them to the previous set Q _{ k−1}, to define Q _{ k }.
 3
Use the runs to construct new, more accurate emulators defined only over the region \(\mathcal {X}_{k}\) for each output in Q _{ k }.
 4
The implausibility measures I _{ i }(x), i∈Q _{ k }, are then recalculated over \(\mathcal {X}_{k}\), using the new emulators.
 5
Cutoffs are imposed on the Implausibility measures I _{ i }(x)<c and this defines a new, smaller nonimplausible volume \(\mathcal {X}_{k+1}\) which should satisfy \(\mathcal {X} \subset \mathcal {X}_{k+1} \subset \mathcal {X}_{k}\).
 6
Unless a) the emulator variances for all outputs of interest are now small in comparison to the other sources of uncertainty due to the model discrepancy and observation errors, or b) the entire input space has been deemed implausible, return to step 1.
 7
If 6 a) is true, generate as large a number as possible of acceptable runs from the final nonimplausible volume \(\mathcal {X}\), sampled depending on scientific goal.
We then analyse the form of the nonimplausible volume \(\mathcal {X}\), the behaviour of model evaluations from different locations within it and the corresponding biological implications. If the entire input space has been deemed implausible in step 6 b), this may imply that the model is inconsistent with the observed data, with respect to the specified uncertainties. This could be because the biological principles that underlie the model’s structure are incorrect, and hence remodelling is required. Or that we may have underestimated the observation errors, the model discrepancy or even the emulator uncertainty, although emulator diagnostics [40] combined with the choice of fairly conservative cutoffs should make the latter unlikely. Note that concluding that a far larger model discrepancy is needed is essentially stating that the model is highly inaccurate, and may, for example be judged unfit for purpose.

As we progress through the waves and reduce the volume of the region of input space of interest, we expect the function f(x) to become smoother, and hence to be more accurately approximated by the regression part of the emulator \(\phantom {\dot {i}}\beta _{ij} g_{ij}(x_{A_{i}})\), which is often composed of low order polynomials (see Eq. 5).

At each new wave we have a higher density of points in a smaller volume and hence the Gaussian process term \(\phantom {\dot {i}}u_{i}(x_{A_{i}})\) in the emulator will be more effective, as it depends mainly on the proximity of x to the nearest runs.

In later waves the previously strongly dominant active inputs from early waves will have their effects curtailed, and hence it will be easier to select additional active inputs, unnoticed before.

There may be several outputs that are difficult to emulate in early waves (perhaps because of their erratic behaviour in uninteresting parts of the input space) but simple to emulate in later waves once we have restricted the input space to a much smaller and more biologically realistic region.
1dimensional example
The nonimplausible space \(\mathcal {X}_{1}\) at wave 1 is the full initial range of the rate parameter x, which is 0.075<x<5.25. If we impose cutoffs of I(x)<3 then this defines the wave 2 nonimplausible space \(\mathcal {X}_{2}\) as shown by the green region of the xaxis in Fig. 5 (left panel).
We then perform the second wave by designing a set of two more runs over \(\mathcal {X}_{2}\), reconstructing the emulator over this region, and recalculating the implausibility measure I(x). The results of this second wave are shown in Fig. 5 (right panel). It can be seen that the emulator is now highly accurate over the \(\mathcal {X}_{2}\) region and that the nonimplausible region in green has been further reduced. As the emulator is now far more accurate than the corresponding observation error, we may stop the analysis with this wave as \(\mathcal {X}_{3} \simeq \mathcal {X}\), implying that further runs will do little to reduce the nonimplausible region further. Note that providing we have enough runs in each wave, we would often create new emulators at each wave, defined only over the current green nonimplausible region [6], instead of updating emulators from previous waves, as in Fig. 5. See the Additional file 1 for R code to reproduce the example model output, discrepancy, emulation and history matching plots of Figs. 1, 2, 4, and 5 respectively.
History matching and Bayesian MCMC
Here we discuss the standard form of a full Bayesian analysis, and compare it to the above history matching approach, highlighting the relative strengths and weaknesses of each method.
History matching attempts to answer efficiently some of the most important questions that a modeller may have, identifying if the model can match the data, and where in input space such acceptable matches can be found. It requires only a limited specification related to the key uncertain quantities, in terms of means, variances and covariances. A fully Bayesian approach goes further [46], and delivers a posterior distribution across all uncertain quantities, which has the benefit of providing probabilistic answers to most scientific questions e.g. in this context it gives the posterior distribution of the location of the true input x ^{∗}. However, it requires a more detailed prior specification of joint probability distributions across all these quantities, and critically, it also assumes the existence of a single true x ^{∗} (and the accuracy of the statistical model that defines it). This may not be judged appropriate, for example, in a situation where two different x ^{∗}s were found that gave good matches to two different subsets of the observed data. In more detail, a fully Bayesian specification in the context of Eqs. (3) and (4) requires the multivariate distributions π(zy),π(yf(x ^{∗})),π(f(x ^{(j)})x ^{(j)}) for a collection of inputs x ^{(j)}, and a prior distribution π(x ^{∗}) over the true input x ^{∗}. Meaningful specifications of this form can be difficult to make, for example, often familiar distributional forms are assumed such as the multivariate normal distribution, but such choices are often made for mathematical convenience or computational tractability. Really, choices of this kind demand a careful justification, without which results such as the posterior for x ^{∗} rapidly lose meaning. For example, the fully Bayesian approach will, after substantial calculation, return a posterior for x ^{∗}, which may be quite narrow, even if the model cannot fit the observed data at all. History matching however may quickly discover this mismatch after a few waves, making further analysis unnecessary.
The second drawback of the fully Bayesian approach is that it is often hard to perform the necessary calculations, and therefore various numerical schemes are required, the most popular being MCMC [47]. While MCMC has enjoyed much success, issues still remain over the convergence of an MCMC algorithm for even modest dimensional problems [48]. Often the likelihood may be highly multimodal, and therefore vast numbers of model evaluations are usually required to reach convergence, making MCMC prohibitively expensive for models of even moderate evaluation time. In contrast, the calculations for history matching are relatively fast and simple.
A third issue is that of robustness: small changes in the full Bayesian specification, especially involving the likelihood, can lead to substantial changes in the posterior. These sensitivities can go unnoticed and can be hard to analyse [49–52], but will call into question the resulting scientific conclusions.
Due to these issues, we support the fully Bayesian approach, but only for cases where such detailed calculations are warranted, say for a well tested, accurate biological model, which possesses well understood model deficiencies, and which is to be combined with data that have a trusted observation error structure, and critically, where full probabilistic results are deemed essential. If, instead, the main concern is to check whether the model can fit the data; to see what regions of input parameter space give acceptable matches, and for this to be used for further model development, then a history match may be the more appropriate analysis. Even if one wishes to forecast the results of future experiments [33, 53] and to make subsequent decisions, history matching can be sufficient, as one can either reweight appropriately the samples generated in the final wave, as was done in [11], or as stated in step 7 of the history matching algorithm, use more sophisticated sampling in the nonimplausible region depending on the scientific question.
Now the use of emulators can of course facilitate the large number of model evaluations required for Bayesian MCMC algorithms, admittedly at the expense of increased uncertainty (see for example [9, 54]). However, a more serious problem is then encountered. The likelihood function, which is a core component of Bayesian calculations, is constructed from all outputs of interest (and therefore attempts to describe both the ‘good’ and ‘bad’ inputs simultaneously). Hence we need to be able to emulate with sufficient accuracy all such outputs, including their possibly complex joint behaviour. This may be extremely challenging as often, especially in early waves, there may exist several erratically behaved outputs that are extremely difficult to emulate, which will dramatically fail emulator diagnostics [40]. Unfortunately, the likelihood and hence the posterior may be highly sensitive to such poorly constructed emulators. Therefore, from a purely practical perspective, employing the full Bayesian paradigm using inadequate emulators constructed at wave 1, may be unwise [6, 15].
History matching as a precursor to MCMC
If one does wish to perform a fully Bayesian analysis on a well tested biological model, we would usually recommend performing a history match first [15]. This would identify the nonimplausible region \(\mathcal {X}\) which should contain the vast majority of the posterior. Then MCMC or an equivalent Bayesian calculation (such as importance sampling) can be performed within \(\mathcal {X}\), using the accurate emulators defined in the final wave (for an example of this see [11]).
This is because if the model is of modest to high dimension, the posterior may often only occupy a tiny fraction of the original input space \(\mathcal {X}_{1}\). Unless the model is very fast to evaluate, we would need to use emulators to overcome the MCMC convergence issues, but performing enough model runs to construct sufficiently accurate emulators over the whole of \(\mathcal {X}_{1}\) would be extremely inefficient. Iterative history matching naturally provides the accurate emulators that we would need, defined only over \(\mathcal {X}\), which should contain the posterior.
Note that the history match cuts out space efficiently, based on small numbers of easy to emulate outputs in early waves, and designs appropriate runs for the next wave that have good coverage properties for the current nonimplausible region. Alternative iterative strategies, such as using MCMC at each wave to generate samples from the current posterior (which includes high emulator uncertainty), for use as model runs in the next wave, may be highly inefficient and could run into a series of difficulties. Such strategies would not fully exploit the smoothness of the model output, and may tend to cluster points together around the current posterior mode (hence squandering runs, for smooth deterministic models), leading to poor coverage of the full nonimplausible space. Such strategies may also be highly misled by inaccurate emulators and the subsequent posterior sensitivity combined with multimodal likelihood issues, leading to clustered designs in the wrong parts of the input space, and in some cases a lack of convergence.
To conclude, if one really desires a fully Bayesian analysis, then performing a history match first can greatly improve efficiency. In this way we view history matching not as a straight competitor to alternative approaches, but instead a complimentary technique, which has many benefits in its own right. See [6] and the extended discussion in [15] for more details of this argument.
History matching and ABC
Another Bayesian technique that has been developed more recently, and compared (somewhat cautiously) to history matching, by for example [55], is that of Approximate Bayesian Computation or ABC [56]. While the two approaches seem to share some superficial similarities, they are fundamentally different in their goal and the principled way each approach is set up and implemented. For example, ABC attempts to approximate full Bayesian inference and hence to obtain an approximate Bayesian posterior distribution (critically, using a tolerance that tends to zero). History matching is not an inference procedure, as it is simply attempting to rule out all the input space that is clearly inconsistent with the data given the model discrepancy and observation error (which are meaningful ‘tolerances’ that critically will never tend to zero). It is worth noting that if one attempts to specify a meaningful minimum size for the tolerance in ABC, one is arguably not really employing ABC anymore, but is instead just back using Bayesian inference (as shown by [56]) in the form of a samplingresampling algorithm (as described for example by [57]). History matching does not attempt to probabilize the remaining input space in any way, which can result in increased efficiency of the parameter search. We have directly compared and contrasted the two approaches in [58], where we demonstrated that a powerful version of ABC failed to find any part of parameter space that matched the observed data for a 22dimensional stochastic epidemiology model, while history matching found the correct part of input space and many good matches using approximately half the number of runs that the (failed) ABCSMC approach used.
Results
Application to the hormonal crosstalk network in Arabidopsis root development model
We now describe the relevant features of the hormonal crosstalk in Arabidopsis root development model [30], in preparation for the application of the Bayesian emulation and history matching processes introduced above.
The input or rate parameter ranges that define the initial search region \(\mathcal {X}_{1}\) over which the history match is performed
Input rate parameters  Minimum  Maximum 

k _{1}  0.1  10 
k _{2}/k _{1a }  0.02  2 
k _{2a }/k _{1a }  0.28  28 
k _{2b }  0.1  10 
k _{2c }  1 × 10^{−6}  1 
k _{3}/k _{1a }  0.2  20 
k _{3a }/k _{1a }  0.045  4.5 
k _{5}/k _{4}  0.1  10 
k _{6}  Control: 0 (pls mutant) or 0.3 (wildtype)  
k _{6a }  0.002  2000 
k _{7}  0.1  10 
k _{9}/k _{8}  0.1  10 
k _{10a }/k _{10}  166  1.66 × 10^{4} 
k _{11}/k _{10}  166  1.66 × 10^{5} 
k _{12a }/k _{12}  0.1  10 
k _{13}/k _{12}  1  1000 
k _{15}/k _{14}  2.83 × 10^{−4}  0.283 
k _{16a }/k _{16}  0.33  33.3 
k _{17}/k _{16}  0.033  3.33 
k _{18}  0.01  10 
k _{19}/k _{18a }  0.01  10 
k _{1v a u x i n }/k _{1a }  0.1  100 
k _{1v C K }/k _{18a }  0.1  10 
k _{1v e t h }/k _{12}  1  100 
The list of 15 original model outputs, their initial conditions and whether measurements are available
Model output  Initial concentration  Measurement available 

Auxin  0.1  Yes 
X  0.1  
PLSp  0.1  
Ra  0  
Ra*  1  
CK  0.1  Yes 
ET  0.1  Yes 
PLSm  0.1  Yes 
Re  0  
Re*  0.3  
CTR1  0  
CTR1*  0.3  
IAA  0 or 1  
cytokinin  0 or 1  
ACC  0 or 1 
Summaries of the direction of observed trends of the four measurable chemicals, relative to wild type for the four types of experiment: pls mutant, feeding auxin, ethylene and cytokinin respectively (first four columns)
Trend relative to wild type with no feeding  Trend relative to pls mutant with no feeding  

(k _{6}=0.3, IAA=ACC=cytokinin=0)  (k _{6}=0, IAA=ACC=cytokinin=0)  
Chemical output  pls mutant  Feed Auxin  Feed Ethylene  Feed Cytokinin  pls mutant + Feed Ethylene 
(k _{6}=0)  (IAA=1)  (ACC=1)  (cytokinin=1)  (k _{6}=0 and ACC=1)  
Auxin  Down  Up  Up  Down  Down 
PLSm    Up  Down  Down   
ET  No change  Up  Up  Up   
CK  Up  Down  Down  Up   
where we have introduced a control parameter a that represents the combined choice of plant type and feeding action, the subscript j indexes each of the four measurable chemicals, the vector x represents the vector of rate parameters as before and t represents time.
where the subscript i indexes the elements of the list {j,a _{1},a _{2}} corresponding to the 16 trends that were actually measured, as presented in Table 4. It is this function f _{ i }(x) that will be directly compared to the observed trends. Again, note the analogy with f _{ t }(x) as defined by Eqs. (1) and (2). We also append to f _{ i }(x) two additional outputs of interest which are not ratios: log(h _{ a u x i n,w t }(x,t)) and log(h _{ C K,w t }(x,t)), again evaluated as t→∞. These will ensure the acceptable matches found will not have unrealistic concentrations of auxin and cytokinin. Note that the Bayesian emulation and history matching methods we propose could be applied to outputs at any time point, and not just to the equilibrium points of primary interest here (see for example [11, 24, 25, 27, 28]).
The primary question that the modeller may ask at this point is whether the outputs of the model, in the form of f _{ i }(x), match the observed trends given in Table 4, to within an acceptable level of tolerance, and what is the set \(\mathcal {X}\) of all rate or input parameters corresponding to such acceptable matches.
The initial input space \(\mathcal {X}_{1}\) that we choose to perform the global parameter search or history match over is defined in Table 2. This was constructed by specifying ranges on the 23 inputs that covered at least one order of magnitude above and below the single input parameter setting found in [30]. The ranges of some parameters of particular interest were subsequently increased to allow a wider exploration. This means we will explore a biologically plausible space that covers at least two orders of magnitude in every dimension, centred (on a log scale) around the original parameter point. This gives rise to a large space \(\mathcal {X}_{1}\), of suitable size to demonstrate our methodology. Note that we could make these ranges wider still if this was deemed plausible, which would simply result in us having to perform more waves to complete the history match.
Linking the Arabidopsis model to reality
represent an increase of between 20% to ten fold for the “Up” trends, a decrease also of between 20% to ten fold for the “Down” trends, and an interval of 40% decrease to 40% increase for the “No Change” trend. These intervals, assumed symmetric on the log scale, were formulated by answering the natural question: where would each model output have to lie to avoid violating the trends given in Table 4, considering relevant observational and model uncertainties? This specification captures the main features of the trend data and is sufficient for our purposes of demonstrating the Bayesian history matching methodology. Obviously, a more detailed treatment would involve having more information regarding the observations z _{ i } themselves, and their associated measurement errors represented by Var(e _{ i }). Also, were we to consider in more detail the known deficiencies of the model, we could give a more detailed specification of the model discrepancy Var(ε _{ i }), which would most likely include correlations between different outputs that exploited the joint structure suggested by the choice of chemical, choice of mutant and choice of feeding regime, or even including a simple dependence on certain input parameters. See [6, 8, 16] for examples of more detailed model discrepancy specifications in alternative applications, and [15, 18] for further discussions.
The specification of z _{ i },Var(e _{ i }) and Var(ε _{ i }) or equivalently σ _{ i }, can be used to define an ‘acceptable match’ between model output and observed data via the implausibility measures of Eq. (18), as any model evaluation that satisfies I _{ i }(x)<c for some cutoff c. A common choice is c=3, based on Pukelsheim’s 3sigma rule (see [45]). We may impose this constraint simultaneously across all of the 18 outputs shown in Fig. 7, by demanding that I _{ M }(x)<c where I _{ M }(x) is the maximum implausibility defined by Eq. (19), or we could impose a less stringent criteria by constraining the second or third maximum implausibility instead, which would allow model runs to deviate from one or two outputs respectively.
Bayesian emulation of the Arabidopsis model
We can now proceed with the first wave of emulation of the Arabidopsis model as follows. Note that several packages are available that perform standard Gaussian Process emulation (see for example the BACCO [60] and GPfit [61] packages in R, or GPy [62] for Python) which may be of use to the uninitiated, as an alternative to the slightly more sophisticated emulators we describe here.
First we design a set of 2000 wave 1 runs over the initial search region \(\mathcal {X}_{1}\) based on a maximin Latin hypercube design (see Fig. 3 and [34, 35]), using for example the lhs() function in R [37]. Each of these runs specifies a distinct set of values of all the rate parameters in x, and therefore for each run the differential equations given in Table 1 were solved numerically using the lsoda() function again in R, with initial conditions given in Table 3, up to t=10000 seconds to ensure equilibrium is reached (equilibrium was then checked). Each run took approximately 1 second of real time to evaluate, implying that although this is a relatively fast model, it is still too slow to exhaustively search the full 23 dimensional input space, which would likely require a vast number of runs. The emulators that we develop turn out to be 4 orders of magnitude faster than the model, and hence allow a much more detailed and efficient exploration. This ratio of emulator speed versus model speed actually improves as the model complexity increases, as the speed of an emulator is a function of the number of runs used to construct it [6]. Note that when choosing the number of wave 1 runs, the computer model literature tentatively suggests that at least 10d are required for emulator construction, where d is the dimension of the input space. Of course, depending on the complexity of the model, far more may be needed. Here, as the Arabidopsis model is of reasonable speed, we could afford to run 2000 runs per wave, and this allows the fitting of higher order polynomial terms such as cubics, once a restricted set of active inputs has been identified. Also, 2000 runs allows for a tractable inverse Var(D _{ i })^{−1} that is computed in the emulator Eqs. (8) and (9).
The wave 1 run outputs f _{ i }(x) for all 18 outputs considered (see Eq. (22)) are shown in Fig. 7 as the purple lines, with the observed data intervals z _{ i }±3σ _{ i } given as the black error bars, and the best run previously found by and discussed in [30] shown as the light blue line. As these runs were generated from a space filling design, they can give substantial insight into the broad behaviour of the model over the initial search region \(\mathcal {X}_{1}\). We can see that some outputs are seemingly constrained to give only positive (e.g. Auxin_fa) or negative (e.g. Auxin_mu) trends, and that many of the runs are far from the target ranges (as the yaxis is on a loge scale). We also find that no individual wave 1 run passes through every one of the target intervals. This all suggests that the volume of the nonimplausible space \(\mathcal {X}\) containing only acceptable runs may be small or indeed zero, and hence we may need several waves for the history match.
We employ the more general emulator structure as represented by Eq. (5). For each output f _{ i }(x), we identify the list of active input parameters \(x_{A_{i}}\) by fitting first order polynomials in x and selecting the active inputs based on AIC criteria (using for example the lm() and step() functions in R [37]). We choose the set of deterministic functions \(\phantom {\dot {i}}g_{ij}(x_{A_{i}})\) by selecting terms from the complete third order polynomials in the active inputs, discarding terms again based on AIC criteria (see [6, 7, 15, 22] for more details). We show the structure of these wave 1 emulators in terms of the deterministic functions \(\phantom {\dot {i}}g_{ij}(x_{A_{i}})\), and the choice of active variables \(x_{A_{i}}\), in the Additional file 2. Due to the large number of runs and in the absence of strong prior information, we set E(β _{ ij })=0 and take a large Var(β _{ ij }) limit. The β _{ ij } terms will hence behave, after the Bayes Linear update represented by Eqs. (8) and (9), approximately like their Ordinary Least Squares linear model fits (see [6] for details). Note that the linear models formed at this point, without the inclusion of the Gaussian process part below, would already give reasonably effective emulators (see for example [42]).
We choose the combination of the Gaussian process and nugget variances \(\sigma ^{2}_{c_{i}} = \sigma ^{2}_{u_{i}} + \sigma ^{2}_{w_{i}}\) to be equal to the residual standard error from the OLS linear model fit [6], and set \(\sigma ^{2}_{w_{i}} = p \sigma ^{2}_{c_{i}}\) where p is a parameter governing the proportion of variance explained by the inactive variables, taken to be between 0.05 to 0.1, and checked with emulator diagnostics described below. Note that we could design more runs that vary the inactive variables to assess p more accurately, as is done in [6]. We utilise the argument presented in full in [6] for choosing correlation lengths for emulators that contain low order polynomials, which states that as we are fitting third order polynomials in the active inputs, we expect the residual surface to look approximately like a fourth (or higher) order surface, and hence we can choose the correlation length accordingly. We hence set the (scaled) correlation lengths θ _{ i }, required for Eq. (6), to be 0.35, in agreement with [6], where the inputs x have all been scaled to the range [−1,1]. Note that one can go further and estimate the individual correlation lengths θ _{ i } using say maximum likelihood, which may improve the emulators accuracy, but provided the polynomial surface is fitting well, as judged say by the adjusted R ^{2} of the linear model, this improvement would only be modest.
Finally, we constructed the emulators by using the Bayes Linear update Eqs. (8) and (9) to obtain \(\mathrm {E}_{D_{i}}(f_{i}(x))\) and \(\text {Var}_{D_{i}}(f_{i}(x))\) for each output i, where D _{ i } is the corresponding vector of 2000 run output values. Emulator diagnostics were then performed by evaluating 200 new diagnostic runs and comparing them to the corresponding emulator predictions, in the form of prediction intervals \(\mathrm {E}_{D_{i}}(f_{i}(x)) \pm 3\sqrt {\text {Var}_{D_{i}}(f_{i}(x))}\) (here 200 runs was deemed sufficient but see [40] for detailed emulator diagnostics). In the first wave we found that 13 out of the 18 outputs were straightforward to emulate, in that their emulators were of sufficient accuracy to allow reasonably large parts of the input space to be removed, while simultaneously satisfying emulator diagnostics. The remaining 5 outputs were left to be considered in later waves. Each of these 13 outputs (that define Q _{1}) required between 7–13 active inputs \(x_{A_{i}}\), out of a total of 31 full or 23 reduced input parameters x, which represents a substantial dimensional reduction and hence a large benefit to the emulator construction process and subsequent parameter search, as discussed in [15]. This is in addition to the speed increase of using emulators as they are in this case 10^{4} times faster to evaluate than the full Arabidopsis model. Note that each one of the 23 inputs featured in at least one of the 13 emulators.
History matching the Arabidopsis model
We now employ the iterative history matching strategy described above to the Arabidopsis model.
As well as the maximised implausibility I _{ M }(x) defined by Eq. (19), we also use the more robust second and third maximum implausibilities denoted I _{2M }(x) and I _{3M }(x) respectively, defined using the set Q _{ k } of outputs considered in wave k, as these implausibility measures are more robust to emulator failure. In the first wave, only I _{2M }(x) and I _{3M }(x) were used with conservative cutoffs c of 3.25 and 3 imposed respectively. This defined \(\mathcal {X}_{2}\): the nonimplausible space remaining after wave 1, which had a volume of 2.06×10^{−1} of the original input space \(\mathcal {X}_{1}\).
The plot has the following interpretation: the red/dark grey regions correspond to high implausibility and imply that no matter what values we choose for the remaining 21 inputs, the Arabidopsis model will not give good matches to the data in these regions of (k _{1v a u x i n }/k _{1a }, k _{1v e t h }/k _{12}) space. The green/yellow regions imply that somewhere within the 21dimensional space there are low implausibility points with these values of k _{1v a u x i n }/k _{1a } and k _{1v e t h }/k _{12}. We are therefore looking at the silhouette of \(\mathcal {X}_{2}\) for various different cutoffs represented as colours [7]. The green and yellow regions will be investigated further in subsequent waves.
where V_{21}{.} denotes the 21dimensional volume of the remaining space. ρ(x ^{′}) can therefore show where large or small amounts of nonimplausible points can be found, conditioned on x ^{′}, providing further insight into the structure of \(\mathcal {X}_{2}\). Both I _{ P }(x ^{′}) and ρ(x ^{′}) are generalisable to higher dimensions if necessary, and various computational shortcuts in the emulator calculations can be exploited (see [6, 7, 15, 22] for details).
Summary of the 4 waves of emulation
Wave  Outputs emul.  Active inputs  Cutoffs c used  Prop. space nonimp.  

I _{ M }  I _{2M }  I _{3M }  
1  13  713    3.25  3  2.06×10^{−1} 
2  18  615    3.1  2.8  4.83×10^{−3} 
3  18  616  5  2.9  2.7  4.34×10^{−4} 
4  18  1119  3.2  2.8  2.65  2.69×10^{−5} 
5      3.2      1.21×10^{−6} 
Figure 8 middle and right columns, show the minimised implausibility and optical depth plots after wave 2 and wave 4 respectively, again for the inputs k _{1v a u x i n }/k _{1a } and k _{1v e t h }/k _{12}, and highlight the progression of the history match and the sequential reduction of the nonimplausible space. The minimised implausibility plots also show the sensitivity of the size and location of the nonimplausible region to the choice of cutoff motivated by Pukelsheim’s rule, and given in Table 5. Note that in the optical depth plot after wave 4 (bottom right panel), the depth of the nonimplausible region is now very small. Even if we were to set the inputs k _{1v a u x i n }/k _{1a } and k _{1v e t h }/k _{12} to values corresponding to the largest depth (given by the dark red region), the chances of finding a nonimplausible point by randomly choosing the other inputs is approximately 2.3×10^{−4}, highlighting the difficulty of manual or ad hoc searches of the input space.
The history matching process is terminated with the evaluation of a wave 5 set of uniformly drawn acceptable runs. As shown in Table 5, the nonimplausible space \(\mathcal {X}\) was now 1.21×10^{−6} smaller than that of the original \(\mathcal {X}_{1}\): a small target, which would require on average a total of 830000 runs chosen at random to obtain 1 single acceptable run, requiring approximately 230 hours of processor time. In contrast, our history matching approach generated hundreds of acceptable runs using only 10000 model evaluations, requiring approximately 2.7 hours of processor time. For a more expensive model in terms of evaluation time, such efficiency gains would be even more dramatic [6, 7, 15].
We now go on to describe the results of the parameter search and discuss their implications.
Discussion of the results of the parameter search
Figure 10 also informs as to the class of possible observed data sets that the model could have matched, and hence gives insight into the model’s flexibility. We see that 6 out of the 16 trend outputs could have predicted either positive or negative (or zero) trends, and hence could possibly have fitted many different data sets, although further investigation of the joint structure of these outputs would be required to confirm this. For example, if these 6 outputs were found to vary independently, then they could be adjusted to fit any combination of positive, negative (or zero) trends. However the remaining 10 trend outputs are restricted to giving the ‘correct’ trend, and hence seem not to be flexible at all. In general, we may be concerned about an overly flexible model, possessing say a high number of rate parameters, and specifically about claims that it has been validated based purely on a comparison to data, as it would be no surprise when it fits the observed data well, and therefore it may not contain much inherent biological structure at all. This is clearly not the case for the Arabidopsis model considered here. Only by performing a global parameter search such as described here, can one guard against such issues.
Discussion
Evaluation of gene functions using Bayesian emulation methodology
In the previous sections, we have shown that Bayesian emulation and history matching methodology allows extensive exploration of the input rate parameter space, giving multiple insights into the model’s structure, constraints placed upon it by the observed data and on the corresponding biological consequences.
Here we further demonstrate that this methodology can be used to evaluate regulatory relationships and gene functions in hormonal signalling systems, by examining both the above results and the results obtained from a second history match of an alternative model.
Therefore increasing k _{6a } decreases this regulatory strength. Thus, low values of k _{6a } indicate that a regulatory relationship of ethylene inhibiting PLSm production is required. The optical depth and minimised implausibility plots corresponding to k _{6a } in Fig. 9 show that high values of k _{6a } are ruled out. Our analysis suggests that no acceptable parameter combinations with large k _{6a } can be found that are consistent with the target data, and hence our results strongly support the assertion that the inhibition of PLSm production by ethylene is required for predicting known experimental trends, conditional on the remaining specifications made in the analysis.
Comparing the results of the k _{2c }=0 case (Fig. 12) with the results of the k _{2c }>0 case (Fig. 10) we can immediately see some important differences between the two models. For the Auxin_mufe output (7th error bar from the left), the k _{2c }>0 model always returns the correct negative trend. In contrast the k _{2c }=0 model returned the incorrect positive trend for the vast majority of the wave 1, 2 and 3 runs, implying that there is only a very small region of input space that returns the correct negative trend, a region located by the history match analysis and explored by the wave 5 runs. Without such an analysis it would be easy to incorrectly conclude that the k _{2c }=0 model is inconsistent with the data. This demonstrates perhaps the most important difficulty in exploring high dimensional models: there may be one (or more) extremely small regions of input space of scientific interest, and conventional optimisation techniques may easily get stuck in local minima far away from these regions. Our Bayesian history matching approach however is specifically designed to combat such difficulties by carefully exploring the space using efficient emulator based global search methods, as we have demonstrated here.
After considering that the PLS gene is required for the response of ethylene downstream based on experimental observations (mathematically this is equivalent to the response of ethylene downstream, X, remaining constant for the pls mutant (k _{6}=0) fed with ethylene), previous research [30] deduced that the PLS gene does indeed have a function in auxin biosynthesis. However, the history match of the k _{2c }=0 model (Fig. 12) suggests that, given the specification of the trends and their relevant uncertainties, the k _{2c }=0 model is consistent with observed data, and hence it may not be essential for the PLS gene to play a role in auxin biosynthesis.
It shows a comparison of the spread of input parameter locations of the acceptable runs found for the k _{2c }>0 model (red box plots) and the k _{2c }=0 model (blue box plots) in terms of individual input rate parameters as labelled along the xaxis. Note that the two sets of acceptable runs being compared correspond to the red lines in Figs. 10 and 12 respectively. The yaxis is on a log10 scale, and the grey rectangles show the initial ranges that define the original search region \(\mathcal {X}_{1}\) as given in Table 2. The light blue horizontal lines show the input parameter values of the previous best run as found by [30]. The main differences between the two models’ acceptable runs are exhibited by the following parameters or ratios of parameters: k _{2}/k _{1a }, k _{2b }, k _{3}/k _{1a }, k _{3a }/k _{1a }, k _{12a }/k _{12}, k _{15}/k _{14}, k _{18}, k _{19}/k _{18}. To the best of our knowledge, the biological significance of many of these differences cannot be judged using current biological insight. However, two ratios, k _{2}/k _{1a } and k _{3a }/k _{1a }, do reveal some important results. k _{1a } is the maximal rate of transporting auxin from shoot to root; k _{2} is the background auxin biosynthesis rate; k _{3a } is the rate constant describing the control of ethylene downstream over auxin transport from root to shoot. First, biologically k _{2}/k _{1a } must be very small. This is because the background auxin biosynthesis rate, k _{2}, must be very small and usually biologically negligible, as it represents the nonenzymatic process in auxin biosynthesis. Moreover, auxin transport from shoot to root, whose maximal rate is k _{1a }, is an important process, as evidenced experimentally [63]. Therefore, k _{1a } should be large. However, for the k _{2c }=0 model to match target data, the majority of acceptable runs have relatively large k _{2}/k _{1a }, while for the k _{2c }>0 model much smaller and more realistic values are preferred. Second, biologically k _{3a }/k _{1a } should be small. This is because it is known that auxin more predominantly transports from shoot to root, to form an auxin concentration maximum in the root tip [64, 65]. However, for the k _{2c }=0 model to match target data, the set of acceptable runs suggest that relatively large k _{3a }/k _{1a } is required. Therefore, the differences between the two models’ parameter ratios highlight that, although we have found acceptable matches for the k _{2c }=0 model, these matches have not been found at biologically realistic parameter values. While we must be cautious about such conclusions that are based on the finite sampling of the nonimplausible regions, we have generated hundreds of approximately uniformly sampled acceptable runs from each model that do indeed exhibit the features discussed. Therefore our results suggest that biological insight clearly favours the model with k _{2c }>0, i.e. that the PLS gene does have a function in auxin biosynthesis. More detailed measurements of the key outputs that restrict k _{2}/k _{1a } and k _{3a }/k _{1a } would of course further clarify this issue.
Our results show that Bayesian emulation and history matching methodology can be used to evaluate regulatory relationships and gene functions in hormonal signalling systems. To further improve the accuracy of the results of this methodology, the following aspects should be considered. First, experimental data should be more quantitatively measured, to define more accurate trends. The example trends we have used in this work, as summarised in Table 4 and the associated discussion, are mainly formulated based on qualitative or semiquantitative experimental data, combined with scientific judgement. Second, model development should include more components, to better describe the experimental systems. Third, Bayesian emulation methodology should be used to study the effects of additional experiments, such as the response of ethylene downstream when feeding ethylene, etr1 mutant and etr1pls double mutant, on the evaluation of regulatory relationships and gene functions. Fourth, Bayesian emulation methodology should also be used to explore the effects of the uncertainty of quantitative trends on the evaluation of regulatory relationships and gene functions, as in most cases trends of biological data are not sufficiently quantitative.
Conclusions

A more formal statistical model linking the biological model to reality, which encompasses major sources of uncertainty such as observational errors and model discrepancy.

A Bayesian emulator allowing a very fast exploration of model behaviour, applicable to models even with very long evaluation times.

A careful history match using implausibility measures that performs an iterative global exploration of the input parameter space using the emulators, to find the region containing all acceptable matches to the observed data.
We applied this methodology to two versions of the hormonal crosstalk in Arabidopsis root development model, and in each case identified the small region of input space containing scientifically interesting matches. The two models and their biological implications were then compared in a robust manner and used to discuss gene functions. We found that although some acceptable matches to the specified trends could be found for the k _{2c }=0 model, these were only found at parameter settings that violated other known biological evidence, whereas the k _{2c }>0 model’s acceptable matches seemed far more realistic. This implied that PLS does indeed play a role in auxin biosynthesis. Our results also strongly supported the assertion that the inhibition of PLSm production by ethylene is required for consistency with known experimental trends.
We would stress that searching for all acceptable matches between model output and observed data is vital for several reasons. It avoids the danger of false conclusions being made, based on the analysis of a single run (or a small number of runs) consistent with the data: conclusions that could easily change if an alternative run was found instead, that also matched the data but which provided different biological implications. If we want to use the model to make predictions for the results of future biological experiments, all acceptable matches must be found and the corresponding range of predictions examined. A narrow range of predictions from the acceptable runs for a particular proposed future experiment, for example, would imply that it would be a good test of the model as it could possibly rule it out, while a large range implies that this experiment would most likely be informative for the model’s rate parameters. Model predictions, using all the acceptable runs, can then be used to design efficient sets of future experiments that are most likely to realise particular scientific goals, such as learning about all or subsets of the rate parameters, testing the model or distinguishing between certain biological hypotheses. We would assert that this design problem is also a fundamental challenge to the area of systems biology, but leave a detailed exposition to future work.
Since plant root development is regulated by multiple hormones in a coordinated way [66], unravelling the regulatory relationships and gene functions for root development is a difficult task that requires the investigation of how biological information is spatiotemporally integrated and communicated [67]. Modelling hormonal crosstalk as an integrative system is an important aspect for integrating information in plant root development [5, 30, 68–71]. This work demonstrates that a combination of experimental data, a model of hormonal crosstalk in Arabidopsis root development, and Bayesian emulation and history matching methodology is able to evaluate regulatory relationships and gene functions in a hormonal signalling system. In particular, Bayesian emulation and history matching methodology is a useful method for performing a global parameter search to attempt to find all input parameter settings that achieve an acceptable match.
Declarations
Acknowledgements
JL and KL gratefully acknowledge the Biotechnology & Biological Sciences Research Council (BBSRC) for funding in support of this study. JR is in receipt of a BBSRC studentship. IV and MG gratefully acknowledge MRC and EPSRC funding.
Funding
This work was initiated as part of an EPSRC seed corn grant, administered by Durham University. In addition, JL and KL gratefully acknowledge the Biotechnology & Biological Sciences Research Council (BBSRC) for funding in support of this study. JR is in receipt of a BBSRC studentship. IV and MG gratefully acknowledge MRC and EPSRC funding.
Availability of data and materials
All data generated or analysed during this study are included in this published article.
Authors’ contributions
IV, JL, MG and KL conceived the idea. IV and JL designed this research and carried out the data analysis. IV and MG developed and performed the Bayesian emulation uncertainty analysis. All authors established the links between the Bayesian emulation methodology and hormonal crosstalk in plant development. IV, JL, MG and KL wrote the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Boogerd FC, Bruggeman F, Hofmeyr JHS, Westerhoff HV, (eds).Systems Biology Philosophical Foundations. Amsterdam: Elsevier; 2007.Google Scholar
 Alves R, Antunes F, Salvador A. Tools for kinetic modeling of biochemical networks. Nat Biotech. 2006; 24(6):667–72.View ArticleGoogle Scholar
 Jamshidi N, Palsson BØ. Formulating genomescale kinetic models in the postgenome era. Mol Syst Biol. 2008; 4:171. doi:10.1038/msb.2008.8.View ArticlePubMedPubMed CentralGoogle Scholar
 Smallbone K, Simeonidis E, Swainston N, Mendes P. Towards a genomescale kinetic model of cellular metabolism. BMC Syst Biol. 2010; 4:6.View ArticlePubMedPubMed CentralGoogle Scholar
 Moore S, Zhang X, Mudge A, Rowe JH, Topping JF, Liu J, Lindsey K. Spatiotemporal modelling of hormonal crosstalk explains the level and patterning of hormones and gene expression in arabidopsis thaliana wildtype and mutant roots. New Phytol. 2015; 207(4):1110–22. doi:10.1111/nph.13421.201519023.View ArticlePubMedPubMed CentralGoogle Scholar
 Vernon I, Goldstein M, Bower RG. Galaxy formation: a bayesian uncertainty analysis. Bayesian Anal. 2010; 5(4):619–70.View ArticleGoogle Scholar
 Vernon I, Goldstein M, Bower RG. Galaxy formation: Bayesian history matching for the observable universe. Stat Sci. 2014; 29(1):81–90.View ArticleGoogle Scholar
 Craig PS, Goldstein M, Seheult AH, Smith JA. Pressure matching for hydrocarbon reservoirs: a case study in the use of bayes linear strategies for large computer experiments (with discussion) In: Gatsonis C, Hodges JS, Kass RE, McCulloch R, Rossi P, Singpurwalla ND, editors. Case Studies in Bayesian Statistics. New York: Springer;1997. p. 36–93.Google Scholar
 Kennedy MC, O’Hagan A. Bayesian calibration of computer models. J R Stat Soc Ser B. 2001; 63(3):425–64.View ArticleGoogle Scholar
 O’Hagan A. Bayesian analysis of computer code outputs: A tutorial. Reliab Eng Syst Saf. 2006; 91:1290–300.View ArticleGoogle Scholar
 Andrianakis I, Vernon I, McCreesh N, McKinley TJ, Oakley JE, Nsubuga R, Goldstein M, White RG. Bayesian history matching of complex infectious disease models using emulation: A tutorial and a case study on HIV in Uganda. PLoS Comput Biol. 2015; 11(1):1003968.View ArticleGoogle Scholar
 Williamson D, Goldstein M, Allison L, Blaker A, Challenor P, Jackson L, Yamazaki K. History matching for exploring and reducing climate model parameter space using observations and a large perturbed physics ensemble. Clim Dyn. 2013; 41(7–8):1703–29.View ArticleGoogle Scholar
 Heitmann K, Higdon D, et al. The coyote universe ii: Cosmological models and precision emulation of the nonlinear matter power spectrum. Astrophys J. 2009; 705(1):156–74.View ArticleGoogle Scholar
 Oakley J, O’Hagan A. Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika. 2002; 89(4):769–84.View ArticleGoogle Scholar
 Vernon I, Goldstein M, Bower RG. Rejoinder for Galaxy formation: a bayesian uncertainty analysis. Bayesian Anal. 2010; 5(4):697–708.View ArticleGoogle Scholar
 Goldstein M, Seheult A, Vernon I. Assessing Model Adequacy In: Wainwright J, Mulligan M, editors. Environmental Modelling: Finding Simplicity in Complexity, 2nd edn. Chichester: Wiley;2013. doi:10.1002/9781118351475.ch26.Google Scholar
 Brynjarsdottir J, O’Hagan A. Learning about physical parameters: The importance of model discrepancy. Inverse Probl. 2014; 30(114007):24.Google Scholar
 Goldstein M, Rougier JC. Reified bayesian modelling and inference for physical systems (with discussion). J Stat Plan Infer. 2009; 139(3):1221–39.View ArticleGoogle Scholar
 Higdon D, Kennedy M, Cavendish JC, Cafeo JA, Ryne RD. Combining field data and computer simulations for calibration and prediction. SIAM J Sci Comput. 2004; 26(2):448–66.View ArticleGoogle Scholar
 Henderson DA, Boys RJ, Krishnan KJ, Lawless C, Wilkinson DJ. Bayesian emulation and calibration of a stochastic computer model of mitochondrial dna deletions in substantia nigra neurons. J Am Stat Assoc. 2009; 104(485):76–87.View ArticleGoogle Scholar
 ZamoraSillero E, Hafner M, Ibig A, Stelling J, Wagner A. Efficient characterization of highdimensional parameter spaces for systems biology. BMC Syst Biol. 2011; 5(1):1–22. doi:10.1186/175205095142.View ArticleGoogle Scholar
 Bower RG, Vernon I, Goldstein M, Benson AJ, Lacey CG, Baugh CM, Cole S, Frenk CS. The parameter space of galaxy formation. Mon Not Roy Astron Soc. 2010; 96(454):717–29.Google Scholar
 Vernon I, Goldstein M. Bayes linear analysis of imprecision in computer models, with application to understanding galaxy formation In: Augustin T, Coolen FPA, Moral S, Troffaes MCM, editors. ISIPTA’09: Proceedings of the Sixth International Symposium on Imprecise Probability: Theories and Applications. Durham: SIPTA;2009. p. 441–50.Google Scholar
 Rodrigues LFS, Vernon I, Bower RG. Constraints to galaxy formation models using the galaxy stellar mass function. MNRAS. 2017; 466(2):2418–35.View ArticleGoogle Scholar
 Andrianakis I, Vernon I, McCreesh N, McKinley TJ, Oakley JE, Nsubuga RN, Goldstein M, White RG. History matching of a complex epidemiological model of human immunodeficiency virus transmission by using variance emulation. J R Stat Soc: Ser C: Appl Stat. 2017; 66(4):717–40. doi:10.1111/rssc.12198.View ArticleGoogle Scholar
 Craig PS, Goldstein M, Seheult AH, Smith JA. Bayes linear strategies for history matching of hydrocarbon reservoirs In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian Statistics 5. Oxford: Clarendon Press: 1996. p. 69–95.Google Scholar
 Cumming JA, Goldstein M. Bayes linear uncertainty analysis for oil reservoirs based on multiscale computer experiments In: O’Hagan A, West M, editors. Handbook of Bayesian Analysis. Oxford: Oxford University Press;2009.Google Scholar
 Cumming JA, Goldstein M. Small sample bayesian designs for complex highdimensional models based on information gained using fast approximations. Technometrics. 2009; 51(4):377–88.View ArticleGoogle Scholar
 Boukouvalas A, Sykes P, Cornford D, MaruriAguilar H. Bayesian precalibration of a large stochastic microsimulation model. IEEE Trans Intell Transp Syst. 2014; 15(3):1337–47.View ArticleGoogle Scholar
 Liu J, Mehdi S, Topping J, Tarkowski P, Lindsey K. Modelling and experimental analysis of hormonal crosstalk in arabidopsis. Mol Syst Biol. 2010; 6(1):373. doi:10.1038/msb.2010.26. http://arxiv.org/abs/http://msb.embopress.org/content/6/1/373.full.pdf.PubMedPubMed CentralGoogle Scholar
 Casson SA, Chilley PM, Topping JF, Evans IM, Souter MA, Lindsey K. The polaris gene of arabidopsis encodes a predicted peptide required for correct root growth and leaf vascular patterning. Plant Cell. 2002; 14(8):1705–21.View ArticlePubMedPubMed CentralGoogle Scholar
 Chilley PM, Casson SA, Tarkowski P, Hawkins N, Wang KLC, Hussey PJ, Beale M, Ecker JR, Sandberg GK, Lindsey K. The polaris peptide of arabidopsis regulates auxin transport and root growth via effects on ethylene signaling. Plant Cell. 2006; 18(11):3058–72.View ArticlePubMedPubMed CentralGoogle Scholar
 Craig PS, Goldstein M, Rougier JC, Seheult AH. Bayesian forecasting for complex systems using computer simulators. J Am Stat Assoc. 2001; 96(454):717–29.View ArticleGoogle Scholar
 Sacks J, Welch WJ, Mitchell TJ, Wynn HP. Design and analysis of computer experiments. Stat Sci. 1989; 4(4):409–35.View ArticleGoogle Scholar
 Santner TJ, Williams BJ, Notz WI. The Design and Analysis of Computer Experiments. New York: Springer; 2003.View ArticleGoogle Scholar
 Currin C, Mitchell T, Morris M, Ylvisaker D. Bayesian prediction of deterministic functions with applications to the design and analysis of computer experiments. J Am Stat Assoc. 1991; 86(416):953–63.View ArticleGoogle Scholar
 Team RC. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: R Foundation for Statistical Computing; 2015. http://www.Rproject.org/.Google Scholar
 Goldstein M. Bayes linear analysis. In: Kotz S, et al, editors. Encyclopaedia of Statistical Sciences. Hoboken: Wiley;1999. p. 29–34.Google Scholar
 Goldstein M, Wooff DA. Bayes Linear Statistics: Theory and Methods. Chichester: Wiley; 2007.View ArticleGoogle Scholar
 Bastos TS, O’Hagan A. Diagnostics for gaussian process emulators. Technometrics. 2008; 51:425–38.View ArticleGoogle Scholar
 Rougier J. Efficient emulators for multivariate deterministic functions. J Comput Graph Stat. 2008; 17(4):827–43. doi:10.1198/106186008X384032. http://dx.doi.org/10.1198/106186008X384032 View ArticleGoogle Scholar
 Andrianakis I, McCreesh N, Vernon I, McKinley TJ, Oakley JE, Nsubuga R, Goldstein M, White RG. Efficient History Matching of a High Dimensional IndividualBased HIV Transmission Model. J Uncertaint Quantif; 5(1):694–719.Google Scholar
 McCreesh N, Andrianakis I, Nsubuga RN, Strong M, Vernon I, McKinley TJ, Oakley JE, Goldstein M, Hayes R, White RG. Universal test, treat, and keep: improving art retention is key in costeffective hiv control in uganda. BMC Infect Dis. 2017; 17(1):322. doi:10.1186/s128790172420y.View ArticlePubMedPubMed CentralGoogle Scholar
 Goldstein M, Huntley N. In: Ghanem R, Higdon D, Owhadi H, (eds).Bayes Linear Emulation, History Matching, and Forecasting for Complex Computer Simulators. Cham: Springer; 2016, pp. 1–24. https://doi.org/10.1007/9783319112596_141.Google Scholar
 Pukelsheim F. The three sigma rule. Am Stat. 1994; 48:88–91.Google Scholar
 Bernardo JM, Smith AFM. Bayesian Theory.Wiley Series in Probability and Statistics. Wiley; 2006. https://books.google.co.uk/books?id=cl6nAAAACAAJ.
 Brooks S, Gelman A, Jones G, Meng XL. Handbook of Markov Chain Monte Carlo. Florida: CRC press; 2011.View ArticleGoogle Scholar
 Geyer C. Introduction to markov chain monte carlo. In: Brooks S, Gelman A, Jones G, Meng XL, editors. Handbook of Markov Chain Monte Carlo. Florida: CRC press;2011. p. 3–48.Google Scholar
 Berger JO. An overview of robust Bayesian analysis. Test. 1994; 3(1):5–59.View ArticleGoogle Scholar
 Berger JO, Insua DR, Ruggeri F. Bayesian robustness In: Insua DR, Ruggeri F, editors. Robust Bayesian Analysis. Lecture Notes in Statistics. New York: Springer;2000. p. 1–31.Google Scholar
 In: Insua DR, Ruggeri F, (eds).Robust Bayesian Analysis. Lecture Notes in Statistics. New York: Springer; 2000.Google Scholar
 Vernon I, Gosling JP. A bayesian computer model analysis of robust bayesian analyses. Bayesian Anal. 2017. (In submission) arXiv:1703.01234.Google Scholar
 Goldstein M, Rougier JC. Bayes linear calibrated prediction for complex systems. J Am Stat Assoc. 2006; 101(475):1132–43.View ArticleGoogle Scholar
 Higdon D, Gattiker J, Williams B, Rightley M. Computer model calibration using highdimensional output. J Am Stat Assoc. 2008; 103(482):570–83.View ArticleGoogle Scholar
 Holden PB, Edwards NR, Hensman J, Wilkinson RD. In: Sisson S, Fan L, Beaumont M, (eds).ABC for climate: dealing with expensive simulators: Handbook of Approximate Bayesian Computation (ABC); 2016. arXiv:http://arxiv.org/abs/1511.03475.Google Scholar
 Wilkinson RD. Approximate bayesian computation (abc) gives exact results under the assumption of model error. Stat Approaches Genet Mol Biol. 2013; 12(2):129–41.Google Scholar
 Smith AFM, Gelfand AE. Bayesian statistics without tears: A samplingresampling perspective. Am Stat. 1992; 46(2):84–8.Google Scholar
 McKinley TJ, Vernon I, Andrianakis I, McCreesh N, Oakley JE, Nsubuga RN, Goldstein M, White RG. Approximate bayesian computation and simulationbased inference for complex stochastic epidemic models. Stat Sci Rev J Inst Math Stat. 2017. To appear http://dro.dur.ac.uk/22953/.
 Wilkinson DJ. Stochastic Modelling for Systems Biology. Taylor and Francis Group, LLC: Chapman and Hall; 2006.Google Scholar
 Hankin RKS. Introducing bacco, an r bundle for bayesian analysis of computer code output. J Stat Softw. 2005; 14(16):1–21.View ArticleGoogle Scholar
 MacDonald B, Ranjan P, Chipman H. Gpfit: An r package for fitting a gaussian process model to deterministic simulator outputs. J Stat Softw Artic. 2015; 64(12):1–23. doi:10.18637/jss.v064.i12.Google Scholar
 GPy: GPy: A Gaussian process framework in python. 2012. http://github.com/SheffieldML/GPyhttp://github.com/SheffieldML/GPy. Accessed 2017.
 Vanneste S, Friml J. Auxin: a trigger for change in plant development. Cell. 2009; 136(6):1005–16. doi:10.1016/j.cell.2009.03.001.View ArticlePubMedGoogle Scholar
 Friml J, Benková E, Blilou I, Wisniewska J, Hamann T, Ljung K, Woody S, Sandberg G, Scheres B, Jürgens G, Palme K. Atpin4 mediates sinkdriven auxin gradients and root patterning in arabidopsis. Cell. 2002; 108(5):661–73. doi:10.1016/S00928674(02)006566.View ArticlePubMedGoogle Scholar
 Sabatini S, Beis D, Wolkenfelt H, Murfett J, Guilfoyle T, Malamy J, Benfey P, Leyser O, Bechtold N, Weisbeek P, Scheres B. An auxindependent distal organizer of pattern and polarity in the arabidopsis root. Cell. 1999; 99(5):463–72. doi:10.1016/S00928674(00)815354.View ArticlePubMedGoogle Scholar
 GarayArroyo A, De La Paz Sánchez M, GarcíaPonce B, Azpeitia E, ÁlvarezBuylla ER. Hormone symphony during root growth and development. Dev Dyn. 2012; 241(12):1867–85. doi:10.1002/dvdy.23878.View ArticlePubMedGoogle Scholar
 Chaiwanon J, Wang W, Zhu JY, Oh E, Wang ZY. Information integration and communication in plant growth regulation. Cell. 2016; 164(6):1257–68.View ArticlePubMedPubMed CentralGoogle Scholar
 Liu J, Mehdi S, Topping J, Friml J, Lindsey K. Interaction of pls and pin and hormonal crosstalk in arabidopsis root development. Front Plant Sci. 2013; 4(75). doi:10.3389/fpls.2013.00075.
 Lindsey K, Rowe J, Liu J. Hormonal crosstalk for root development: a combined experimental and modelling perspective. Front Plant Sci. 2014; 5(116). doi:10.3389/fpls.2014.00116.
 Moore S, Zhang X, Liu J, Lindsey K. Modelling Plant Hormone Gradients. In: eLS. Chichester: Wiley;2015.Google Scholar
 Rowe JH, Topping JF, Liu J, Lindsey K. Abscisic acid regulates root growth under osmotic stress conditions via an interacting hormonal network with cytokinin, ethylene and auxin. New Phytol. 2016. doi:10.1111/nph.13882. 201520300. 201520300