### Optimization problem formulation.

Model parameters were estimated by minimizing the difference between model simulations and \(\mathcal {E}\) experimental measurements. Simulation error is quantified by an objective function *K*(**p**) (typically the Euclidean norm of the difference between simulations and measurements) subject to problem and parameter constraints:

$$ \begin{aligned} & \min_{\mathbf{p}} K(\mathbf{p}) & =& \sum\limits_{i=1}^{\mathcal{E}} \left(g_{i}(t_{i},\mathbf{x,p,u})-y_{i}\right)^{2} \\ & \text{subject to} &&\dot{\mathbf{x}}=\mathbf{f}(t,\mathbf{x}(t,\mathbf{p}),\mathbf{u}(t),\mathbf{p})\\ &&&\mathbf{x}(t_{0}) = \mathbf{x}_{0}\\ &&&\mathbf{c}(t,\mathbf{x,p,u}) \geqslant \mathbf{0} \\ &&& \mathbf{p}^{L} \leqslant \mathbf{p} \leqslant \mathbf{p}^{U}\\ \end{aligned} $$

(1)

The term *K*(**p**) denotes the objective function (sum of squared error), t denotes time, *g*_{i}(*t*_{i},**x****,****p****,****u**) is the model output for experiment *i*, while *y*_{i} denotes the measured value for experiment *i*. The quantity **x**(*t*,**p**) denotes the state variable vector with an initial state **x**_{0}, **u**(*t*) is a model input vector, **f**(*t*,**x**(*t*,**p**),**u**(*t*),**p**) is the system of model equations (e.g., differential equations or algebraic constraints) and **p** denotes the model parameter vector (quantity to be estimated). The parameter search (or model simulations) can be subject to **c**(*t*,**x****,****p****,****u**) linear or non-linear constraints, and parameter bound constraints where **p**^{L} and **p**^{U} denote the lower and upper parameter bounds, respectively. Optimal model parameters are then given by:

$$ \mathbf{p^{*}}= \arg\min_{\mathbf{p}} K\left(\mathbf{\mathbf{p}}\right) $$

(2)

In this study, we considered only parameter bound constraints, and did not include the **c**(*t*,**x****,****p****,****u**) linear or non-linear problem constraints. However, additional these constraints can be handled, without changing the approach, using a penalty function method.

### Dynamic optimization with particle swarms (DOPS).

DOPS combines multi-swarm particle swarm optimization with dynamically dimensioned search (Fig. 1) and (Algorithm 1). The goal of DOPS is to estimate optimal or near optimal parameter vectors for high-dimensional biological models within a specified number of function evaluations. Toward this objective, DOPS begins by using a particle swarm search and then dynamically switches, using an adaptive switching criteria, to a DDS search phase.

#### Phase 1: Particle swarm phase.

Particle swarm optimization is an evolutionary algorithm that uses a population of particles (solutions) to find an optimal solution [54, 55]. Each particle is updated based on its experience (particle best) and the experience of all other particles within the swarm (global best). The particle swarm phase of DOPS begins by randomly initializing a swarm of \(\mathcal {K}\)-dimensional particles (represented as *z*_{i}), wherein each particle corresponded to a \(\mathcal {K}\)-dimensional parameter vector. After initialization, particles were randomly partitioned into *k* equally sized sub-swarms \(\mathcal {S}_{1},\hdots,\mathcal {S}_{k}\). Particles within each sub-swarm \(\mathcal {S}_{k}\) were updated according to the rule:

$$ {z}_{i,j} = \theta_{1,j-1}{z}_{i,j-1} + \theta_{2}{r}_{1}\left(\mathcal{L}_{i} - {z}_{i,j-1}\right) + \theta_{3}{r}_{2}\left(\mathcal{G}_{k} - {z}_{i,j-1}\right) $$

(3)

where (*θ*_{1},*θ*_{2},*θ*_{3}) were adjustable parameters, \(\mathcal {L}_{i}\) denotes the best solution found by particle *i* within sub-swarm \(\mathcal {S}_{k}\) for function evaluation 1→*j*−1, and \(\mathcal {G}_{k}\) denotes the best solution found over all particles within sub-swarm \(\mathcal {S}_{k}\). The quantities *r*_{1} and *r*_{2} denote uniform random vectors with the same dimension as the number of unknown model parameters (\(\mathcal {K}\times {1}\)). Equation 3 is similar to the general particle swarm update rule, however, it does not contain velocity terms. In DOPS, the parameter *θ*_{1,j−1} is similar to the inertia weight parameter for the velocity term described by Shi and Eberhart [56]; Shi and Eberhart proposed a linearly decreasing inertia weight to improve convergence properties of particle swarm optimization. Our implementation of *θ*_{1,j−1} is inspired by this and the decreasing perturbation probability proposed by Tolson and Shoemaker [34]. It is an analogous equivalent to inertia weight on velocity. However *θ*_{1,j−1} places inertia on the position rather than velocity and uses the same rule described by Shi and Eberhart to adaptively change with the number of function evaluations:

$$\begin{array}{@{}rcl@{}} \mathbf \theta_{1,j}&=&\frac{(\mathcal{N}-{j})*({w}_{max}-{w}_{min})}{(\mathcal{N}-{1})} + {w}_{min} \end{array} $$

(4)

where \(\mathcal {N}\) represents the total number of function evaluations, *w*_{max} and *w*_{min} are the maximum and minimum inertia weights, respectively. In this study, we used *w*_{max}= 0.9 and *w*_{min}= 0.4, however, these values are user configurable and could be changed depending upon the problem being explored. Similarly, *θ*_{2} and *θ*_{3} were treated as constants, where *θ*_{2} = *θ*_{3}= 1.5; the values of *θ*_{2} and *θ*_{3} control how heavily the particle swarm weighs the previous solutions it has found when generating a new candidate solution. If *θ*_{2}≫*θ*_{3} the new parameter solution will resemble the best *local* solution found by particle *i* (\(\mathcal {L}_{i}\)), while *θ*_{3}≫*θ*_{2} suggests the new parameter solution will resemble the best *global* solution found so far. While updating the particles, parameter bounds were enforced using reflection boundary conditions (Algorithm 2).

After every \(\mathcal {M}\) function evaluations, particles were randomly redistributed to a new sub-swarm, and updated according to Eq. (3). This process continued for a maximum of \(\mathcal {F}\mathcal {N}\) functions evaluations, where \(\mathcal {F}\) denotes the fraction of function evaluations used during the particle swarm phase of DOPS:

$$ \mathcal{F} = \left(\frac{\text{NP}}{\mathcal{N}}\right)j $$

(5)

The quantity NP denotes the total number of particles in the swarm, \(\mathcal {N}\) denotes the total possible number of function evaluations, while the counter *j* denotes the number of successful particle swarm iterations (each costing NP function evaluations). If the simulation error stagnated e.g., did not change by more than 1% for a specified number of evaluations (default value of 4), the swarm phase was terminated and DOPS switched to exploring parameter space using the DDS approach using the remaining \(\left (1-\mathcal {F}\right)\mathcal {N}\) function evaluations.

#### Phase 2: DDS phase.

Dynamically Dimensioned Search (DDS) is a single solution based search algorithm. DDS is used to obtain good solutions to high-dimensional search problems within a fixed number of function evaluations. DDS starts as a global search algorithm by perturbing all the dimensions. Later the number of dimensions that are perturbed is decreased with a certain probability. The probability that a certain dimension is perturbed reduces (a minimum of one dimension is always perturbed) as the iterations increase. This causes the algorithm to behave as a local search algorithm as the number of iterations increase. The perturbation magnitude of each dimension is from normal distribution with zero mean. The standard deviation that was used in the original DDS paper and the current study is 0.2. DDS performs a greedy search where the solution is updated only if it is better than the previous solution. The combination of perturbing a subset of dimensions along with greedy search indirectly relies on model sensitivity to a specific parameter combination. The reader is requested to refer to the original paper by Tolson and Shoemaker for further detail [34].

At the conclusion of the swarm phase, the overall best particle, \(\mathcal {G}_{k}\), over the k sub-swarms was used to initialize the DDS phase. DOPS takes at least \(\left (1-\mathcal {F}\right)\mathcal {N}\) function evaluations during the DDS phase and then terminates the search. For the DDS phase, the best parameter estimate was updated using the rule:

$$ \mathcal{G}_{new}({J})=\left\{\begin{array}{ll} \mathcal{G}(\mathbf{J})+\mathbf{r}_{normal}(\mathbf{J})\sigma(\mathbf{J}), &\ \text{if} \mathcal{G}_{{new}}(\mathbf{J})<\mathcal{G}(\mathbf{J}).\\ \mathcal{G}(\mathbf{J}), &\ \text{otherwise}. \end{array}\right. $$

(6)

where **J** is a vector representing the subset of dimensions that are being perturbed, **r**_{normal} denotes a normal random vector of the same dimensions as \(\mathcal {G}\), and *σ* denotes the perturbation amplitude:

$$ \sigma = {R}\left(\mathbf{p}^{U} - \mathbf{p}^{L}\right) $$

(7)

where *R* is the scalar perturbation size parameter, **p**^{U} and **p**^{L} are (\(\mathcal {K}\times {1}\)) vectors that represent the maximum and minimum bounds on each dimension. The set **J** was constructed using a probability function \(\mathcal {P}_{i}\) that represents a threshold for determining whether a specific dimension *j* was perturbed or not; \(\mathcal {P}_{i}\) is monotonically decreasing function of function evaluations:

$$ \mathcal{P}_{i}={1}-\log\left[\frac{i}{(1-\mathcal{F})\mathcal{N}}\right] $$

(8)

where *i* is the current iteration. After \(\mathcal {P}_{i}\) is determined, we drew \(\mathcal {P}_{j}\) from a uniform distribution for each dimension *j*. If \(\mathcal {P}_{j}<\mathcal {P}_{i}\) was included in **J**. Thus, the probability that a dimension *j* was perturbed was inversely proportional to the number of function evaluations. DDS updates are greedy; \(\mathcal {G}_{new}\) becomes the new solution vector only if it is better than \(\mathcal {G}\).

### Multiswitch DOPS

We investigated whether switching search methods more than once would result in better performance; this DOPS variant is referred to as multiswitch DOPS or msDOPS. msDOPS begins with the PSO phase and uses the same criteria as DOPS to switch to the DDS phase. However, msDOPS can switch back to a PSO search when the DDS phase has reduced the functional value to 90% of its initial value. Should the DDS phase fail to improve the functional value sufficiently, this version is identical to DOPS. When the switch from DDS to PSO occurs, we use the best solution from DDS to seed the particle swarm. DOPS and msDOPS source code is available for download under a MIT license at http://www.varnerlab.org.

### Comparison techniques

The implementations of particle swarm optimization, simulated annealing, and genetic algorithms are the ones given in Matlab R2017A (particleswarm, simulannealbnd and ga). The implementation of DE used was developed by R.Storn and available at http://www1.icsi.berkeley.edu/~storn/code.html. The version of eSS used was Release 2014B - AMIGO2014bench VERSION WITH eSS MAY-2014-BUGS FIXED - JRB, released by the Process Engineering Group IIM-CSIC. The genetic algorithm, particle swarm, and differential evolution algorithms were run with a 40 particles to be directly comparable to the number of particles used in the PSO phase of DOPS. For comparison, the version of CM-AES used was cmaes.m, Version 3.61.beta from http://cma.gforge.inria.fr/cmaes_sourcecode_page.html. The scripts used to run the comparison methods are also available at http://www.varnerlab.org.