### The standard two stage approach

In STS, the following model structure is used

$$ \dot{x}^{i} = f\left(x^{i},u^{i},p^{i}\right) $$

((1))

$$ y^{i} = h\left(x^{i},u^{i},p^{i}\right) $$

((2))

where *x*
^{i} is the state vector for the *i*:th individual; *u*
^{i} is the input signal vector for individual *i*; *p*
^{i} is the parameter vector for the *i*:th individual; *f* and *g* are nonlinear vector functions; and *y*
^{i} is the vector of observations for individual *i*.

#### Parameter estimation

In Stage 1, the parameters for each individual *p*
_{
i
} are estimated. Note that in a modelling framework such as eqs. (1)-(2), no information is shared between the individuals, which makes the parameter estimation problem for each individual a separate estimation problem. In Stage 2, the variability of the parameter estimates are calculated (Fig. 1
e).

In the case of only estimating the parameters, we minimize the following cost function, based on the sum of squares of the residuals

$$ cost_{\chi^{2}}(p) = \sum^{n_{y}}_{j=1}\sum^{N}_{i=1}\frac{\left(y_{ij}-\hat{y}_{ij}(p)\right)^{2}}{\sigma_{ij}^{2}} $$

((3))

where *i* and *j* denotes the *i*:th observation in the *j*:th state; *y*
_{
ij
} is the experimental data; \(\hat {y}_{\textit {ij}}\) is the corresponding simulated output from the model; and *σ*
_{
ij
} is the standard deviation of the experimental measurement. In other words, the kinetic parameters *p* are given by

$$ \hat{p} = \arg\min\left[cost_{\chi^{2}}(p)\right] $$

((4))

Note that this approach cannot be used to estimate *σ*, since in eq. (3) *σ* only appears in the denominator, and the optimum for *σ* therefore lies at +*∞*. For estimation of the noise, we therefore use the more general approach of maximizing the full log-likelihood function

$$ \begin{aligned} -l(p,\sigma) &= \sum^{n_{y}}_{j=1}\sum^{N}_{i=1}\frac{\left(y_{ij}-\hat{y}_{ij}(p)\right)^{2}}{\sigma_{ij}^{2}} + \sum^{n_{y}}_{j=1}\sum^{N}_{i=1}log\left(\sigma_{ij}^{2}\right)\\ & \quad+ n_{y}Nlog(2\pi) \end{aligned} $$

((5))

where

$$ ({\hat p,\hat \sigma}) = \textrm{arg max}\left[l(p,\sigma)\right] $$

((6))

where *σ* is the vector of all *σ*
_{
ij
}, which now normally have their optima different from 0 or infinity, since they appear both in the numerator and in the denominator.

#### Software

Modelling and parameter estimation with the STS approach was performed using MATLAB (Mathworks). The simulation of the model was performed with the function SPBDsimulate in The Systems Biology Toolbox2 (SBTB2) (sbtoolbox2.org). Optimisation was done both using the built-in local optimisation algorithm *lsqnonlin* (which uses Gauss-Newton methods), and using the global simulated annealing and nonlinear simplex approach available in SBTB2. All simulations and optimisations, except the ones made for the noise estimation and the JLH were done using a PC (Processor: Intel Core i5-3470 3.20 GHz, Memory: 8.00 GB, manufacturer: Hewlett-Packard). The simulations and optimisations for the noise estimation and the JLH were done using a laptop (Processor: Intel Core i5-560M 2.667 GHz, Memory: 2x 2048 MB, manufacturer: Samsung, DDR3-10600S, 1333 MHz). All MATLAB-files used (including datasets) are available in Additional file 1.

### The nonlinear mixed-effects approach

NLME is a general modelling approach that can be applied to analyse any type of system that can be described by eqs. (7)-(11), i.e. to a system that is made up of individuals belonging to a joint population, and where the individuals’ parameter values belong to the parameter distribution of the population. This link between the parameter values among the individuals allows information to be shared between the individuals. The idea is that this information sharing may result in better estimates for both the individual parameters and for the covariance matrices. More specifically, in NLME models, the following general model structure is used

$$\begin{array}{@{}rcl@{}} \dot{x}^{i} & = & f\left(x^{i},u^{i},\phi^{i}\right) \end{array} $$

((7))

$$\begin{array}{@{}rcl@{}} y^{i} & = & h\left(x^{i},u^{i},\phi^{i},\varepsilon^{i}\right) \end{array} $$

((8))

$$\begin{array}{@{}rcl@{}} \phi^{i} & = & g\left(\Theta,\eta^{i},Z^{i}\right) \end{array} $$

((9))

$$\begin{array}{@{}rcl@{}} \eta^{i} &\in& N(0,\Omega) \end{array} $$

((10))

$$\begin{array}{@{}rcl@{}} \varepsilon^{i}&\in& N(0,\Sigma) \end{array} $$

((11))

where *x*
^{i}, *u*
^{i}, and *y*
^{i} are, just as for STS, the state, input, and measurement vectors for individual *i*; where *ϕ*
^{i} is the parameter vector for the *i*:th individual, which now no longer is a free variable, but instead depends on *Θ*, the population parameter vector describing the typical individual in the population, *Z*
^{i} the covariates (not used in this paper), and *η*
^{i}, the random effects; and where *Ω* and *Σ* describes the covariance matrices of the random effects, *η*
^{i}, and the measurement noise, *ε*
^{i}, respectively.

#### Parameter estimation

In NLME there are two types of parameters to estimate: the fixed effects, *Θ*, and the variances of the random effects, *Ω* and *Σ*.

The fixed effects describe the main trend, i.e. the typical value of the model parameters. For a single-cell model, such fixed effects could be the typical population value of a kinetic parameter.

There are two types of random effects; between-cell, *η*, and between-sample, *ε*, variability. The between-sample variabilities are related to the residuals, in that they describe the differences between the predicted, \(\hat {y}\), and the observed, *y*, measurement values. In practice, the software packages used herein, NONMEM and Monolix, can handle e.g. additive

$$ y = \hat{y} + \varepsilon $$

((12))

proportional, and a combination of additive and proportional between-sample variability

$$ y = \hat{y} \cdot (1 + \varepsilon_{1}) + \varepsilon_{2} $$

((13))

The effect of the between-cell random effects, *η*, on the cell-specific parameters, *ϕ*
^{i}, can in principle be described by any function, *g*, and the used software packages support both normal, log-normal, and user-specified distributions (specific examples are provided in the two test cases below).

The marginal likelihood (L) of the model parameters for the data is the product of the individual marginal likelihoods of the cells, *j* according to

$$ \begin{aligned} P(y^{\,j}|\Theta,\Sigma,\Omega) &= L^{j}\left(\Theta,\Sigma,\Omega|y^{\,j}\right) = \int{P(y^{\,j},\eta^{\,j}|\Theta,\Sigma,\Omega)d\eta^{\,j}} \\ & = \int{P\left(y^{\,j}|\eta^{\,j},\Theta,\Sigma\right) \cdot P\left(\eta^{\,j}|\Omega\right)d\eta^{\,j}} \end{aligned} $$

((14))

$$ \begin{aligned} L^{j}(\Theta) &= \int^{+\infty}_{-\infty} {\left(\frac{1}{\sqrt{2\pi\Sigma}}\right)^{m}e^{-\frac{\sum\limits_{j=1}^{\eta^{\,j}} \left(y^{j}-\hat{y}^{\,j}\right)^{2}}{\Sigma}}}\frac{1}{\sqrt{2\pi\Omega}}e^{-\frac{\eta^{{\,j}^{2}}} {\Omega}}d\eta^{j} \\ &= \int^{+\infty}_{-\infty}{h^{\,j}(\eta,\theta)d\eta^{\,j}} \end{aligned} $$

((15))

Where *m* is the number of observations per cell, i.e., *m* =*N*·*n*
_{
y
}. The parameters are estimated through minimizing -2 log of the likelihood (-2LL). Since there is no closed form solution to the marginal likelihood various approximations are available. The most commonly used approximation is the first-order conditional estimation method where L is linearised with a first order Taylor expansion around the estimates of the random effects, i.e. around *η*. In the NONMEM example (Model 1), the numerical search for the minimum of the -2LL is implemented according to a modified algorithm by [24] which is a derivate-free quasi-Newton type minimization algorithm. The objective function value (OFV) reported by the software is proportional to -2LL. In the Monolix example (Model 2), the numerical search is done by a stochastic approximation of the expectation maximisation algorithm [25].

### Software

The NLME approach has been implemented in several software packages [26]. In this paper, the two software packages NONMEM and Monolix are both used. This is done to show consistency in terms of results across different software packages, but also as a way of presenting several choices to the reader.

For the analysis with NONMEM version 7.2.0 [27] was used. The interaction with NONMEM is made through NM-TRAN, a language translating user-defined code and datasets into FORTRAN77. ODEs can be defined by the user and for this particular project a differential equation solver, for non-stiff systems, was used (ADVAN6) together with the first-order conditional estimation (FOCE) method. Perl-speaks-NONMEM (PsN)3.5.3 [28, 29] was used for execution of models. NONMEM and PsN were installed on a PC and a laptop, which systems details were the same as described in the section The standard two stage approach, with the fortran compiler GNU gfortran.

For the analysis with Monolix, version 4.3.2 was used, as implemented in MATLAB version 8.1 [30]. Importantly, for users who do not have access to MATLAB, a stand alone version also exists. The models are written in a language called mlxtran, which apart from having a rich library of PKPD-models also allows the users to define their own ODEs [31]. SBTB2 also contains functions for translating SBTB2-models into mlxtran, using the addon-package SBPOP. Compared to NONMEM, Monolix offers a more user friendly environment, including a graphical user interface. For the beginner, we therefore recommend to use Monolix, and we have also developed a small tutorial, which explains how to us it for single-cell models. This tutorial, together with all scripts used to perform the analysis in this paper, is available in the Additional file 1. All the analysis regarding Monolix was performed on the same PC as described in the section The standard two stage approach. All NONMEM-files and Monolix-files used (including datasets) are available in Additional file 1.

### An in-between approach: the joint likelihood function

There are two main differences between STS and NLME which both potentially could lead to improvements in the parameter estimation: i) in NLME one forms a likelihood function for the parameter estimation to the combined data set for all cells, and ii) in NLME one postulates a distribution for the variation of the parameter values across the cell population.

To analyse where the improvement in the parameter estimation originates from, we also did some analysis with an in-between approach: the joint likelihood function (JLH) approach. In JLH, we only use improvement aspect i), the single likelihood function for the combined data set, but do not postulate a joint parameter distribution. In other words, instead of eqs. (5)-(6), we use the following two equations:

$$ \begin{aligned} &-l_{\textrm{JLH}}(p^{1},p^{2}\ldots,\sigma^{1},\sigma^{2},\ldots) = \sum^{N_{c}}_{k=1}\sum^{n_{y}}_{j=1}\sum^{N}_{i=1}\frac{\left(y^{k}_{ij}-\hat{y}^{k}_{ij}(p)\right)^{2}}{\left(\sigma^{k}_{ij}\right)^{2}}\\ &\qquad + \sum^{N_{c}}_{k=1}\sum^{n_{y}}_{j=1}\sum^{N}_{i=1}log\left(\left(\sigma_{ij}^{k}\right)^{2}\right) + N_{c}n_{y}Nlog(2\pi) \end{aligned} $$

((16))

where

$$ {\fontsize{9.2pt}{9.6pt}\selectfont{\begin{aligned} \left({\hat p^{1},\hat p^{2},\ldots\hat \sigma^{1}, \hat \sigma^{2},\ldots}\right) = \arg \max\left[l_{\textrm{JLH}}\left(p^{1},p^{2} \ldots,\sigma^{1},\sigma^{2}\right), \ldots \right] \end{aligned}}} $$

((17))

where *N*
_{
c
} denotes the number of cells.

#### Software

The software and optimization and model formulation tools used for the JLH analysis is the same as for STS.

### Experimental data

The experimental procedures are further discussed in [32]. FRAP experiments were performed using YFP. We used yeast cells of the BY4741 genetic background expressing YFP under the control of the constitutively expressed ACT1 promoter (P _{ACT1}-YFP). In addition, this strain expressed two extra fluorescent protein reporters: a CFP-Ace2 fusion and a Myo1-mCherry fusion, both driven by their own promoters. We used Ace2 to locate the nucleus and determine the cells position in the cell cycle (Ace2 is nuclear only at the end of mitosis and early G1), and we used Myo1 to confirm mother-daughter separation (Myo1 forms a ring at the bud-neck during mitosis, which disappears when cytokinesis is complete). In this way, we minimized cell-to-cell variation in our measurements related to cell cycle position, without disturbing the system with synchronization procedures.

We used exponentially growing cells cultured in synthetic medium (BSM-TRP,LEU,URA 2 % glucose). For the experiments, we attached cells to the bottom of 384-well glass bottom plates (MGB101-1-1-LG, Matrical Biosciences, Spokane, Washington, USA). To prevent cells from moving during the experiment, we pre-treated the wells with concanavalin A (type V; Sigma-Aldrich, St. Louis, MO, USA), as previously described (Colman-Lerner 2005).

Photobleaching of individual nuclei was performed using an Olympus IX-81 inverted microscope with a FV1000 confocal module with an oil immersion Olympus UplanSapo 63X objective (numerical aperture, NA 1.35). We used an automatic z-axis control, a motorized x-y stage, a 458-488-515 argon laser, a 543 He-Ne laser and photomultiplier (PMT) Hamamatsu R6353.

For train-of-FRAP experiments, we imaged YFP with the 515 nm laser. In each cell, we repeated four times the following procedure: we first took 5 images, then performed a partial photobleaching (roughly reducing the signal by 50 %), then measured the recovery for 7 sec (30 images, time resolution 0.22 sec). From the photobleaching step, we used a 100 % laser power on a small sub-area of the nucleus (radius 0.25 m) during 0.16 sec. Subsequently, we imaged using 1 % laser power and 4 sec/pixel scanning speed. We set the confocal microscope pinhole to wide open (500 m).

Quantification of total fluorescence in each compartment was performed in ImageJ, by manually drawing regions of interest (ROIs) in the nucleus and the cytosol, and quantifying it using the ImageJ plugin “Time Series Analyzer V2”. A typical ROI was a circle of 200 nm in radius. We applied photobleaching and autofluorescence corrections to all images as described in [32].

### Two case studies: a linear and a nonlinear model

Let us now introduce the two examples considered in this paper. The first, Model 1, is a linear model, describing transport of YFP, and the second, Model 2, is a nonlinear model describing the osmo-regulation of yeast cells.

#### Linear model

The state-space description of Model 1 is given by the following four equations

$$\begin{array}{*{20}l} \frac{dn}{dt} &= -k_{1}\cdot n + k_{2}\cdot c \end{array} $$

((18))

$$\begin{array}{*{20}l} \frac{dc}{dt} &= k_{1}\cdot n - k_{2}\cdot c \end{array} $$

((19))

$$\begin{array}{*{20}l} y_{n} &= n + \epsilon \end{array} $$

((20))

$$\begin{array}{*{20}l} y_{c} &= c + \epsilon \end{array} $$

((21))

where *n* and *c* are the amounts of YFP in the nucleus and in the cytosol, respectively; where *k*
_{1} and *k*
_{2} are the transports from and to the nucleus, respectively; and where *y*
_{
n
} and *y*
_{
c
} are the two measurement signals (Fig. 3
b and c). A sketch of the model is given in Fig. 3
a.

In STS estimation, data from each cell were analysed separately, potentially yielding as many *k*
_{1} values as there were cells in the experiment. For simulation of the data using eqs. (18)-(21), the following equations were used to obtain the initial conditions

$$ n(t_{FRAP,j}) = p_{n,j}\cdot y_{n}(t_{FRAP,j}) $$

((22))

$$ c(t_{FRAP,j}) = p_{c,j}\cdot y_{c}(t_{FRAP,j}) $$

((23))

where *t*
_{
F
R
A
P,j
} is the first time point after FRAP *j*. There were four FRAPs, two states, and two kinetic parameters, and thus 10 unknown parameters in the parameter vector *p*

$$ p = (k_{1},k_{2},p_{n,1},p_{n,2},p_{n,3},p_{n,4},p_{c,1},p_{c,2},p_{c,3},p_{c,4}) $$

((24))

In the NLME estimation, the individual rate constants \({k_{1}^{j}}\) and \({k_{2}^{j}}\) were described by the following equations

$$ {k_{1}^{\,j}} = \theta_{k_{1}} \cdot e^{\eta_{k_{1}}^{\,j}} $$

((25))

$$ {k_{2}^{\,j}} = \theta_{k_{2}} \cdot e^{\eta_{k_{2}}^{\,j}} $$

((26))

where \(\theta _{k_{1}}\) and \(\theta _{k_{1}}\) are the typical values of *k*
_{1} and *k*
_{2} in the cell population, respectively, and \(\eta _{k_{1}}^{\,j}\) and \(\eta _{k_{2}}^{\,j}\) are random effects describing the difference between the typical and individual values. \(\eta _{k_{1}}\) and \(\eta _{k_{2}}\) belongs to normal distributions with mean 0 and estimated variances, \(\omega _{k_{1}}^{2}\) and \(\omega _{k_{2}}^{2}\), respectively. The estimated variances \(\omega _{k_{1}}^{2}\) and \(\omega _{k_{2}}^{2}\) can be correlated. Both the variances \(\omega _{k_{i}}^{2}\) and their correlations are collected in the variance-covariance matrix *Ω*. Thus, five parameters are needed to describe the individual \({k_{1}^{\,j}}\) and \({k_{2}^{\,j}}\) for the entire population: \(\theta _{k_{1}}\), \(\theta _{k_{2}}\), \(\omega _{k_{1}}^{2}\), \(\omega _{k_{2}}^{2}\), and the off-diagonal element in *Ω*. Unlike in STS, this number is always five, independent of the number of analysed cells.

The initial conditions were modelled as

$$ n(t_{FRAP,j}) = e^{(\eta_{j}\cdot\Theta_{3})}\cdot y_{n}(t_{FRAP,j}) $$

((27))

$$ c(t_{FRAP,j}) =e^{(\eta_{j}\cdot\Theta_{3})}\cdot y_{c}(t_{FRAP,j}) $$

((28))

where *η*
_{
j
}∈*N*
_{
j
}(0,1) and *Θ*
_{3} is the standard error of the residual error.

Finally, the noise is for both STS and NLME assumed to be additive, which also is how the simulated data was generated. In other words, simulations were done for different values of the parameters, additive non-correlated noise was added, and the ability of STS and NONMEM to retrieve the parameter values and the standard deviation of the noise was evaluated.

#### The nonlinear model

Model 2 is a model published by Gennemark et al. [33] (equations for the model can be found in Additional file 2). The model describes how the yeast cell reacts to an osmolarity shock by producing glycerol via activation of the protein Hog1 (Fig. 4
a). A description of all the equations can be found in the Supplementary material. The model consists of 4 ODEs, 10 parameters, and 3 algebraic equations, including several nonlinearities, both products of states, and events switching between two expressions depending on the value of a state. 4 parameters (Ve, kp2, kHOG, td) were optimized from the simulated data, and the remaining 6 parameters were kept at their literature values. The initial conditions were assumed to be known, as was the noise level. The input of the model is the addition of salt to the cells (Fig. 4
b) and the measured output signal of the model is intracellular (Fig. 4
c) and total (intracellular + extracellular) glycerol concentrations (Fig. 4
d).

In the NLME estimation, the model parameters are described by the following equations.

$$\begin{array}{@{}rcl@{}} \text{Ve}^{\,j} & = & \theta_{\text{Ve}} \cdot e^{\eta_{\text{Ve}}^{\,j}} \end{array} $$

((29))

$$\begin{array}{@{}rcl@{}} \textrm{kp2}^{\,j} & =& \theta_{\textrm{kp2}} \cdot e^{\eta_{\textrm{kp2}}^{\,j}} \end{array} $$

((30))

$$\begin{array}{@{}rcl@{}} \textrm{kHOG}^{\,j}& =& \theta_{\textrm{kHOG}} \cdot e^{\eta_{\textrm{kHOG}}^{\,j}} \end{array} $$

((31))

$$\begin{array}{@{}rcl@{}} \text{td}^{j} & =& \theta_{\text{td}} \cdot e^{\eta_{\text{td}}^{\,j}} \end{array} $$

((32))

where *θ*
_{
x
} is the typical value of *x* in the cell population, and \({\eta _{x}^{j}}\) is the random effect describing the difference between the typical and individual values for parameter *x* for cell *j*. These random effects (*η*
_{Ve}, *η*
_{kp2}, *η*
_{kHOG}, and *η*
_{td}) belongs to normal distributions with mean 0 and estimated variances (\(\omega _{\text {Ve}}^{2}\), \(\omega _{\textrm {kp2}}^{2}\), \(\omega _{\textrm {kHOG}}^{2}\), and *ω*
_{td}). There are no covariances, i.e. the matrix *Ω* is set to be diagonal. Both measurements of the model, intracellular and total glycerol concentrations, have additive noise

$$\begin{array}{@{}rcl@{}} y_{\text{ic}} & = & \hat{y}_{\text{ic}} + \varepsilon_{\text{ic}} \end{array} $$

((33))

$$\begin{array}{@{}rcl@{}} y_{\text{tot}}& = & \hat{y}_{\text{tot}} + \varepsilon_{\text{tot}} \end{array} $$

((34))

where index ic means intracellular, tot means total, and where *ε*
_{ic} and *ε*
_{tot} are normally distributed, with mean 0 and estimated variances.

#### Comparison of performance

The performance of STS and NLME are analyzed by comparing the relative deviation from the true parameter value (estimated parameter/true parameter) (Figs. 5, 7). Apart from figures showing the relative deviation for each parameter for each artificial cell, the differences between STS and NLME is also accompanied by a Student’s t-test. The t-test is pairwise, and tests whether the relative deviation (the distance from 1) is significantly different between STS and NLME.

### Ethics

The experiments and data collection were carried out on *Saccharomyces Cerevisiae*, which has no associated ethical issues.