Incorporating prior knowledge improves detection of differences in bacterial growth rate

Background Robust statistical detection of differences in the bacterial growth rate can be challenging, particularly when dealing with small differences or noisy data. The Bayesian approach provides a consistent framework for inferring model parameters and comparing hypotheses. The method captures the full uncertainty of parameter values, whilst making effective use of prior knowledge about a given system to improve estimation. Results We demonstrated the application of Bayesian analysis to bacterial growth curve comparison. Following extensive testing of the method, the analysis was applied to the large dataset of bacterial responses which are freely available at the web-resource, ComBase. Detection was found to be improved by using prior knowledge from clusters of previously analysed experimental results at similar environmental conditions. A comparison was also made to a more traditional statistical testing method, the F-test, and Bayesian analysis was found to perform more conclusively and to be capable of attributing significance to more subtle differences in growth rate. Conclusions We have demonstrated that by making use of existing experimental knowledge, it is possible to significantly improve detection of differences in bacterial growth rate. Electronic supplementary material The online version of this article (doi:10.1186/s12918-015-0204-9) contains supplementary material, which is available to authorized users.

1 Background on the model of Baranyi

and Roberts for bacterial growth
A classical family of one-variable growth models states that in a constant physical environment, the expected number of individuals, x(t), in the population at time t, can be described by the first order autonomous differential equation with the initial value x(0) = x 0 and maximum population density, x max . Here, µ(x) is the so-called specific rate, which depends on the population size and has the following properties (see [1,2]): µ(x) is continuously differentiable on (0, x max ), µ(x 0 ) > 0 and µ(x max ) = 0, dµ(x) dx is strictly negative on (0, x max ).
The last two conditions express that (i) at the initial conditions, however small the population is, it grows at a positive rate and (ii) the larger the population size, the smaller the specific rate, until the population size becomes x max , when growth stops. Under these conditions the above differential equation has a unique solution. This solution is monotone increasing and converges to x max as t → ∞.
For our practical task, it is more useful to study the logarithm of the population size and henceforth our main time-dependent variable is y = ln x(t). Accordingly, we use the notation y 0 = ln x 0 and y max = ln x max . With the above substitution, the specific growth rate becomes the slope of the y(t) growth curve, and our model is: dy dt =ẏ = µ(e y ), y(0) = y 0 .
For a bacterial population, it is assumed that, at division, two viable daughter cells replace the mother cell, and the distribution of the division times (i.e. the times from birth to division for each single cell) does not depend on the number of ancestral generations. This is expressed by the fact that the above equation is autonomous; that is the time does not explicitly appear on the right hand side of the differential equation. Given constraints (2)-(5), the (ẏ, y) phase-plane portrait of the initial value problem, (6), follows the pattern shown by the black dashed line in Figure 1 of the main text; it decreases monotonically as the population size grows. The model given in (6) expresses the transition from the exponential to stationary phases, but not the lag phase. The right hand side of our equation needs to be modified by an adjustment factor which indicates the time elapsed from the inoculation. This function should be small close to the inoculation and should increase monotonically, converging to 1 as the process arrives at the exponential phase. This factor makes the system non-autonomous (the left-hand side,ẏ, depending explicitly on time too, rather than implicitly through y). Such a factor was introduced in reference [3] in the form of a Michaelis-Menten function, α(P ), where the "bottle-neck" substance, P , was produced at a constant rate, lending α(t) the well-known logistic shape. The blue line in Figure 1 in the main text shows the desired pattern of the phase-plane portrait for the non-autonomous model, with lag phase included. Assuming α(0) is sufficiently small, the solution of the initial value problem (6) follows a sigmoid pattern for y = ln x, where a relatively linear phase is both preceded and followed by a transition phase characterised by small specific growth rates.

Prescribing versus inferring the noise level, σ
When deciding whether to infer or fix the noise level, σ, in our analysis, it is important to investigate how much difference its value makes to our results. For this reason we performed a comparison using the large dataset of growth curves available at ComBase. The Bayesian analysis was run twice over the dataset, fitting every curve using the 4 parameter model of Baranyi and Roberts as described in the main text (Equations (1) and (2)). The first of these times, the noise level was inferred (allowing us to obtain the value σ = σ I ), whilst the second time σ was fixed at σ F = 0.5. The evidence was then compared for the two cases. The results are shown in Figure S1. Unsurprisingly, it was found that if σ is chosen correctly, there is not much difference between the two cases. However, if σ is prescribed incorrectly, the Bayes factor shows a large difference in evidence compared to the case where σ is inferred; the evidence being strongly in favour of the latter. Therefore, a good general rule would be that if we are not sure of the value of the noise level, we are best to infer it. Log Bayes for Sigma Inferred vs. Fixed Figure S1: If we are not confident in our estimate for the noise level then it should be inferred, since this can have a big impact on the results. A large dataset comparison of the difference in evidence between the case where σ is inferred (with value σ I ) and prescribed at σ F = 0.5. The inset shows a close up of the plot. Here we compare our results by plotting the log Bayes factor, ln β 12 . These results show that choosing to prescribe σ incorrectly can have a detrimental effect on the evidence, and therefore the results.

An explanation of the choice of control parameters used in the analysis
We investigated the best control parameters of our nested sampling code for accurate estimation of the evidence. For this, we used the 4 parameter model of Baranyi and Roberts as given in the main text (Equations (1) and (2)). In low-dimensional cases, the evidence can be approximately calculated by bruteforce numerical integration. Three bacterial growth curves in the ComBase database were fitted using the 4 parameter model of Baranyi and Roberts and the evidence was evaluated on a fine grid with small step sizes for each of these. The output from 50 runs of nested sampling compare very well to the brute-force values for a range of prior sizes ( Figure S2). On average nested sampling produces an estimate for the evidence that is accurate for practical purposes ( Figure S3), where small differences in evidence suggests the data does not strongly discriminate between competing models. We note that there is variability in this stochastic algorithm due to differences in the random seed ( Figure S4), but results are similar for enough samples, around 100. For the nested sampling algorithm a greater sampling density from the prior distribution will increase the chances of highly probable areas being explored. The current literature of nested sampling applications in biological problems use a large range of different prior sizes. For example, in the study of protein folding [4] a set of 20000 prior objects was used to provide a wide selection of conformations. At the other end of the scale it has been shown that maintaining a set of 25 active points can produce accurate parameter mean and standard deviations that are relatively insensitive to the prior size [5]. In our testing phase we found that 100 prior samples were often sufficient for the estimation of bacterial growth curve parameters and evidence. There is no rigorous termination criterion to suggest when we have accumulated the bulk of the evidence [6], however one plausible criterion is that termination is decided by approximating the remaining evidence that can be accumulated from the posterior. This amount can be estimated as ∆Z i = L max X i , Avg. log evidence Figure S3: A larger number of prior samples increases the accuracy and decreases the variance of the evidence, but the gain is not large above around 100 points. The average log evidence plotted against the number of prior samples from 10 runs of nested sampling is shown as a thick orange line. The ribbon represents the average of the associated standard deviation of each run. Notice how it tightens around the mean for more prior samples. The dark line is the approximate value from brute-force numerical integration. Noting the scale of the y-axis, we see that it is not worth the expense of the extra computation time that is required for only slightly more accurate evidence calculation at larger numbers of prior samples. where L max is the maximum likelihood value of the active set and X i is the remaining prior volume [7,8].
Using a selection of tolerance levels, it was found that one particular choice did not affect the accuracy of the evidence calculation (see Figure S5, for which each panel looks the same). To be safe, a tolerance of 0.1 was chosen in the log-evidence calculation. The number of posterior samples dictated by this criteria was checked against test cases (inferring bacterial curve parameters) with increasing numbers of posterior samples, and was found to be consistent.  (1) and (2)), we may instead consider a fairly all-encompassing set of growth models, which may be described as subsets of the 6 parameter model of Baranyi and Roberts [9,3]. If we let the bacterial concentration at time t be given by x(t), then this model is given by where y 0 = ln x(0) and y max = ln x max , for x max the maximum bacterial concentration. Here, µ max denotes the maximum specific growth rate in (t, ln x)-space and m controls the curvature of the growth curve during the transition to the stationary phase. This model includes an adjustment function, which takes into account some intracellular substance that must be built up at some rate, ν, during the lag phase in order for bacterial growth to occur and which is assumed to follow Michaelis-Menten kinetics. This function introduces a lag phase of length λ. We note that although that the bacterial concentration is transformed to the ln x scale for use in the model, all results are transformed back afterwards to the log 10 x scale. Simpler models that include the adjustment function may be obtained given certain assumptions. In particular, if we assume that bacterial growth occurs at the same rate as growth of the rate-limiting substance so that µ max = ν, as well as setting m = 1 (a common choice for a transition to stationary phase that is not too sharp) and h 0 = λµ max (so that the lag time is defined by the point at which the tangent to the growth rate crosses the y axis divided by the growth rate), then the 4 parameter model is obtained, with parameter set {y 0 , y max , µ max , h 0 }. Furthermore, if we take m = 0 instead of 1, then the stationary phase is lost, resulting in a 3 parameter model with parameter set {y 0 , µ max , h 0 }.
We also consider a couple of classical growth models. These may be obtained by first replacing A(t) by t, so that we lose the lag phase. Then, in the case of letting m = 1, the logistic model is obtained (a classical sigmoidal model) with parameter set {y 0 , y max , µ max }, whilst the choice m = 0 results in a linear model with parameter set {y 0 , µ max }. This subsetting of the 6 parameter model is summarised in Figure S6  Excel add-in, response to fluctuating conditions Figure S6: Simpler growth models can be considered as subsets of the 6 parameter model of Baranyi and Roberts [9,3]. A summary of the bacterial growth models studied. The table gives a colour key for the curve shape for each model (the 4 and 6 parameter models share the same curve shape) and the parameters used in each model. Here y 0 = ln x 0 , where x 0 is the initial bacterial concentration and y max = ln x max where x max is the maximum of the bacterial concentration. In addition, µ max is the maximum specific growth rate, λ is the lag time, h 0 = µ max λ, and ν and m control the curvatures from the lag to exponential and exponential to stationary phases respectively.

Model comparison for comparing bacterial growth models
There are a number of software solutions available to implement the various growth models (as well as microbial survival or inactivation curves), but these offer only limited advice on model choice (usually through visual comparison of fitted curves combined with a statistical evaluation such as the coefficient of determination and goodness of fit). A summary of currently available resources is given in Table S1, including various tools available for the purposes of risk assessment. As mentioned in the main text, by making use of the model comparison framework described in the methods section we may also compare bacterial growth models. We therefore take the opportunity to compare the above growth models over a large dataset. Since every extra parameter must be justified by the data [29], we must be careful not to inadvertently favour a more complicated model than necessary as this will lead to indeterminate parameters. A number of papers have used various statistical tests to compare  [17] Growth models in vegetable and meat products FISHMAP [18] Growth models in fish products Food Spoilage & Safety Predictor [19] Growth models in food products MLA Index Calculator [20,21] Log growth calculation in meat products Shelf Stability Predictor [22] Growth models in ready-to-eat meals Salmonella Predictions [23,24] Excel add-in, response to fluctuating conditions E. coli SafeFerment [25] Excel add-in, response to fluctuating conditions GInaFiT [26] Excel add-in, predictive survival models E. coli Inactivation Model [27] Predictive survival models Sym'previus [28] Database of responses, predictive growth/survival models models [30,31,32]. These papers have indicated that the most appropriate model may vary depending on the data, but that for minimal datasets, a simpler model may often be preferred over one with more parameters.
As discussed above, in the literature there has been a tendency to set µ max = ν and m = 1 [9,3,33]. However, recent work using Escherichia coli suggests that in arabinose, µ max > ν [34]. Using the 6 parameter model of Baranyi and Roberts, it is found that over a large dataset the means of the parameter samples are in agreement with µ max = ν (see Figure S7). It is also found that typically m = 1. A further assumption  Figure S7: Contrary to the assumptions often made in growth models, µ max = ν, m = 1 over a large dataset. Using the 6 parameter model of Baranyi and Roberts and inferring the noise level, σ, we plot the mean values of the parameters µ max , ν and m over the dataset. We find that contrary to the simplifying assumptions made in several growth curve models, µ max is not always equal to ν and m is not always equal to 1.
(also discussed above) is to let h 0 = µ max λ. In our analysis we find that as a direct result of this (and the assumption that µ max = ν), µ max and h 0 are more highly (positively) correlated in the 4 parameter model of Baranyi and Roberts than their counterparts, µ max and λ, in the 6 parameter model (in which λ and ν are also found to be significantly negatively correlated). This can be seen, for example, in Figure S8. Despite these apparent differences between models, calculation of the evidence when fitting the various models to each single growth curve over a large dataset suggests that it is far from clear that one model fits better than another. Figure S9 shows the results of calculating the preferred model as a function of the number of data-points in a given curve. Here Bayes' factor is used together with the principle of Occam's razor, so that a more complicated model (with more parameters) is preferred only if there is substantial evidence for it over a simpler model. We also note that there are less data curves in the dataset as we tend towards large and small numbers of data points, explaining the bell-shaped pattern that can be seen in these plots. There are found to be some general rules in play. For instance, for just two or possibly three data points the most often preferred model is the simpler logistic model. Once there are more than two or three data points, the 6 parameter model is the most often favoured, followed by the 4 parameter model. However, the best model is not always obvious and certainly there is no clear rule as to which growth model is the most appropriate for a given set of data. Therefore, in agreement with the work of reference [30], we would recommend that a new model comparison analysis be undertaken for each new set of data, in order to find the optimum fit.
On the other hand, we must be careful not to be misled by statistical methods while comparing growth models and certainly should not blindly apply them without forethought. Often the knowledge and experience of the scientist may bring circumstantial evidence to the problem. For example, if the experiment is run over  Figure S8: The growth rate µ max is highly correlated with h 0 . A typical example (using curve A 1 from the ComBase database and prescribing the noise level, σ, at σ F = 0.5) of correlation matrices for the (a) 4 and (b) 6 parameter models of Baranyi and Roberts. Here the orientation of ellipses represents the sign of the correlation. The correlation between µ max and h 0 in the 4 parameter model is stronger than the correlation between µ max and λ in the 6 parameter model.  Figure S9: There is no single preferred growth model, but the number of data-points plays a part. The number of times each model is the most preferred over a large dataset as a function of the number of data points (where we note that there are less data curves in the dataset as we tend towards large and small numbers of data points). A model with more parameters is preferred only if there is substantial evidence for it over a model with fewer parameters. The noise level, σ, is inferred. a long time, the bacteria in question may be growing under stress conditions and the observed curve may actually be a combination of both growing and dying subpopulations. This sort of knowledge can only be added by the experimentalist and confirmed via experience during other similar experiments. Therefore we suggest that this kind of analysis be used as a complementary tool, together with the user's own experience, to ultimately determine the best model to be used in each situation and for each application.

°C)
Figure S14: There is a general correlation between temperature and bacterial growth rate. Fitting all of the growth curves available at ComBase using the 4 parameter model of Baranyi and Roberts, we find an overall correlation of 0.63 between the temperature of each experiment and our inferred growth rate for that experiment.  Figure S15: Results may be clustered by estimated growth rate instead of temperature. Growth curves for the organism Salmonella spp. at pH values of between 5 and 5.1 were clustered by their inferred growth rates instead of temperature and the coloured by the temperature. For each cluster entry, the first number is the temperature and the second is the growth rate. Again clusters with similar estimated growth rates are found to grow in similar temperatures (high temperature is purple, low temperature is green). Probability density Figure S16: Clusters of curves inform the prior. A cluster of four curves is illustrated by plotting a Gaussian distribution using the mean and variance of each curve (dashed lines). The means and variances are combined using Equations (13) and (14) of the main text to give an overall mean and variance which is used to build both a Gaussian (solid black line) and Cauchy (solid blue line) prior.