### Doubly weighted stochastic simulation

We begin with a brief review of the dwSSA; detailed derivation and applications can be found in Daigle *et al*. [17]. Throughout this paper we assume a well-stirred system at constant temperature with *N* species *S*
_{1},…,*S*
_{
N
} and *M* reactions *R*
_{1},…,*R*
_{
M
}. The state of the system at time *t* is represented as **X**(*t*)=[*X*
_{1},…,*X*
_{
N
}], where *X*
_{
i
} is the population of species *S*
_{
i
}. Using the “direct method” implementation of Gillespie’s Stochastic Simulation Algorithm (SSA) [19], the system moves forward in time by sequentially firing one of *R*
_{
j
}, *j*Є {1,…,*M*} reactions, whose propensity at time *t* is *a*
_{
j
}(**X**(*t*)) and its sum . Here, the next reaction is chosen with a categorical random variable *j*
^{′} and the time to the next reaction with an exponential random variable *τ*. We also assume that all trajectories are run until the smaller of the final simulation time *t*
_{
f
} and the first time is observed, where is the event of interest. Denoting the stopping time of a trajectory as , the probability of a complete trajectory given **X**(0)=**x**
_{0} under SSA is as follows:

with and the total number of reactions fired in .

The dwSSA uses predilection functions to increase the number of trajectories that reach :

where is a biasing parameter for *R*
_{
j
}. The probability of the same trajectory **J** under the dwSSA is then given by

and the bias incurred by using the predilection function is corrected with

It is straightforward to confirm that the product of (3) and (4) equals (1).

The Monte Carlo estimator for , the probability that the system reaches by time *t* given the initial state **x**
_{0}, is

where *N* is the total number of trajectories, **J**
_{
i
} represents the *i*
^{th} simulated dwSSA trajectory, and takes a value of 1 if is visited by **J**
_{
i
} and 0 otherwise. The quantity in (5) can be interpreted as the weighted average of successful trajectories, *i.e.*, trajectories reaching , where the weight is computed according to (4). A good set of biasing parameters would yield successful trajectories with weights close to the true probability and thus reduce variance in the probability estimator. The dwSSA computes low variance biasing parameters by minimizing cross entropy using a modified version of Rubinstein’s multilevel cross-entropy method [17],[18]. The advantage of minimizing cross entropy over minimizing variance is that the former yields biasing parameters with a closed-form solution for (2) where the latter does not. Having a closed-form solution is of practical necessity, as the alternative would be to solve a large set of nonlinear equations, significantly decreasing the efficiency of the algorithm if not making the simulation infeasible.

Following derivations presented in Daigle *et al*. [17], the dwSSA biasing parameter for *R*
_{
j
} is computed as

where *n*
_{
ij
} is the total number of times reaction *j* fires in the *i*
^{th} trajectory, iterates only over trajectories reaching , and *l* is the stage index in multilevel cross-entropy method. Computation for terminates when intermediate rare event reaches , at which point we set .

The objective of the dwSSA necessitates computation of the likelihood ratio (4) as its probability estimator is with respect to the initial reaction rates *k*
^{(0)}. On the other hand, the objective of SParSE is to compute a set of reaction rates such that

where is the desired probability of observing by time *t*, an indicator function for observing during *i*
^{th} trajectory, and a user-specified absolute error tolerance on . Unlike the dwSSA where the biasing parameters are updated each level of the multilevel cross-entropy method according to (6), in SParSE reaction rates are updated instead. We note that it is possible to use the dwSSA Monte Carlo estimator (5) in (7) and update *γ*
^{(l)} instead of *k*
^{(l)}. However unless *k*
^{(0)} is sufficiently close to *k*
^{*}, the likelihood ratio (4) may become extremely small, *i.e.*, degenerate, and updating reaction rates avoids this problem. We discuss the criteria for updating *k* in the following section.

### Multilevel cross-entropy method

The modification of multilevel cross-entropy method in SParSE is similar to that of Ref [17]. However, there are three major differences between the multilevel cross-entropy method employed by dwSSA and by SParSE: (i) dwSSA only computes a single intermediate event *ξ*
^{(l)} and the corresponding set of biasing parameters *γ*
^{(l)} while SParSE may compute multiple such quantities, (ii) SParSE can calculate biasing parameters for initial reaction rates that either over- or under-perturb the system with respect to . For an over-perturbed system, it applies inverse biasing to reaction rates to convert ε from a “sure event” to a “rarer event”, and (iii) dwSSA updates biasing parameters *γ*
^{(l)} while SParSE updates reaction rates *k*
^{(l)}. The following subsection explains the first two differences and highlights how SParSE achieves the same time complexity as dwSSA for computing quantities in (i). The next subsection focuses on (iii) and its effect on simulation details.

#### Concurrent computation of multiple intermediate events and biasing parameters

In dwSSA, *N* trajectories in level *l* of multilevel cross-entropy method are run until either the final simulation time *t*
_{
final
} or the first occurrence of *ξ*
^{(l)}, where *ξ*
^{(l)} is an intermediate rare event chosen by the top ⌈*ρ* *N*⌉trajectories that evolve farthest in the direction of ε. Typical value of *ρ* used in dwSSA is 0.01 for *N*=10^{5}, although any value *ρ* Є (0,1) can be used in theory [17]. The role of *ρ* can be thought as a knob that controls the tradeoff between the speed of convergence to and accuracy in . For *ρ*
^{′}<*ρ*, we get , thus smaller values for *ρ* can potentially drive the system toward faster. However the number of trajectories reaching is less than the number of trajectories reaching *ξ*
^{(l)} since ⌈*ρ*
^{′}
*N*⌉<⌈*ρ* *N*⌉. Having fewer data to compute reduces the confidence on the estimate, therefore it is advised to keep ⌈*ρ* *N*⌉ above a set threshold (*e.g.*, 200) in practice. On the other hand, larger value of *ρ* (*e.g.*, *ρ*>0.3) implies a less selective intermediate rare event. The resulting biasing parameters may not push the system closer to , causing a failure in convergence to the target event. In our experience, *ρ*<0.2 and ⌈*ρ* *N*⌉>100 yield both reliable computation of biasing parameters and acceptable convergence to .

In order to determine *ξ*
^{(l)}, it is necessary to determine the direction of bias in addition to *ρ*. This is done by grouping the initial state **x**
_{0} into two categories according to its distance with respect to the event of interest:

where *f*(**x**(*t*)) is an event function. Two requirements for *f*(**x**(*t*)) are that it takes **x**(*t*) as an input and can be used to evaluate the distance between the current state and (*i.e.*, it can be used to compute extreme values of each trajectory to determine the next intermediate event). The value of *φ*
_{type} indicates *initial* position of **x**(*t*) with respect to at ε *t*=0. When *φ*
_{type} is equal to 1, maximum value of *f*(**x**(*t*)) in each trajectory is recorded, and *N* such values are sorted in descending order at the end of the simulation. The reasoning for this is that since , we need to encourage higher *f*(**x**(*t*)) values to get closer to . Similarly, minimum *f*(**x**(*t*)) values are recorded and sorted in ascending order when *φ*
_{type} is -1. For convenience we refer to the sorted array of extreme *f*(**x**(*t*)) values as **v**
_{
k
}, where *k* is the reaction rates used to generate **x**(*t*).

We now define a SParSE probability estimator for :

where *f*
_{
i
}(**x**(*t*|*k*)) are event function evaluated with *i*
^{th} SSA trajectory generated with reaction rates *k* and takes a value of 1 if *f*
_{
i
}(**x**(*t*|*k*)) contains and 0 otherwise. Once *N* trajectories are simulated, we can expect one of the following outcomes: (a) the inequality in (7) is satisfied, (b) , or (c) .

In the first case, SParSE exits and returns *k* as a successful output, i.e., a point in the solution hyperplane (*k*≡*k*
^{*}). In the second case, we need to choose extreme values of *f*(**x**(*t*)) evolving furthest to , and we can view as a “rarer event” as in the dwSSA. Thus intermediate events and its respective biasing parameters are computed iteratively, each time taking the system closer to with success rate . The last case corresponds to parameter sets that “over-perturb” the system, as was reached with probability greater than . The method used to determine an intermediate event in the classical multilevel cross-entropy method cannot be applied here because we do not want trajectories that produce extreme values of *f*(**x**(*t*)). However, the information gathered from such trajectories can be used to quantify the behavior we *do not* want to observe. We achieve this by collecting the extreme values of *f*(**x**(*t*)) as in case (b), except that each SSA simulation is run until the final simulation time without stopping when is observed. Once intermediate events are chosen and their corresponding biasing parameters are computed, we update *j*
^{th} reaction rate *k*
_{
j
} with 1/*γ*
_{
j
}. This inverse biasing discourages over-perturbation with respect to . Algorithms 2 and 3 in Appendix B of Additional file 1 contain pseudocode for (b) and (c), respectively.

Unlike the multilevel cross-entropy method used by dwSSA, where only one intermediate event is computed in each level of multilevel CE method, SParSE may choose multiple intermediate events. While it is not necessary to compute multiple intermediate events to reach the solution hyperplane, doing so greatly improves algorithm efficiency. The caveat here is that the efficiency gain occurs only when the biasing parameters for the multiple intermediate events are computed simultaneously. We start by describing the method SParSE uses to choose multiple intermediate events, all of which can be reached by *N* trajectories with sufficient frequency. This is attained by choosing multiple values for *ρ* that is a function of the distance to the desired target event probability . Denoting the distance as , we have two different methods for choosing *ρ*(*δ*): one for case (b) and the other for case (c). Handling of the two cases differ, as the result of inverse biasing is not as obvious as the normal biasing for case (b) is. In normal biasing strategy, updating intermediate reaction rates with a particular set of biasing parameters redistributes **v**
_{
k
} such that the corresponding intermediate event becomes the mode. However, the inverse biasing operates on the heuristics of discouraging over-perturbation without knowing the exact effect on **v**
_{
k
}. Thus more conservative values for *ρ* are used in (11) to compensate for this difference. Lastly, we note that each of these cases can be detected by comparing the sign of *δ* to the value of *φ*
_{type}, where the equality represents case (b).

For sgn(*δ*(*k*))==*φ*
_{type}

For sgn(*δ*(*k*))≠*φ*
_{type}

As the distance to the target event decreases, SParSE selects less extreme values for intermediate events and vice versa. This reduces the risk of over- and under-perturbations. We note that the number of elements in *ρ*(*δ*) does not necessarily correspond to the number of intermediate events chosen. For example, elements corresponding to positions ⌈0.005 * *N*⌉ and ⌈0.01**N*⌉ of **v**
_{
k
} may be the same. We also note that a custom function can be used to compute *ρ* to better suit a specific system. However, the above default values work well for all examples presented in this paper. Lastly, *N* can be chosen as a function of min(*ρ*) and *c*, where *c* is the minimum number of data points desired to reliably compute *γ*
^{(l)}, *i.e.*, *N*≥*c*/ min(*ρ*).

Once intermediate events are computed, they are sorted in ascending order of its probability, *i.e.*, *P* *r* *o* *b*(*ξ*
^{(l,1)}) ≤ … ≤*P* *r* *o* *b*(*ξ*
^{(l,q)}), where *q* is the number of unique intermediate events chosen at level *l*. We note that this sorting is done automatically if elements of *ρ* are sorted in ascending order, which (10, 11) are.

Now we describe how biasing parameters for all intermediate events are computed concurrently in a single ensemble of *N* simulations. In each simulation, we check for *ξ*
^{(l,q)}. If *ξ*
^{(l,q)} is observed, the statistics gathered up to the time at which *ξ*
^{(l,q)} was reached are used to compute *γ*
^{(l,q)}. Then the trajectory continues its course of simulation, this time checking for *ξ*
^{(l,q-1)} while keeping the cumulative statistics. This process repeats until the smaller of *t*
_{
final
} and the time at which *ξ*
^{(l,1)} is observed (*i.e.*, all intermediate events are observed). When *q*==1, this method is identical to the one used by dwSSA. Although a single trajectory runtime for *q*>1 is slightly longer than the runtime for *q*==1, the additional resources spent on concurrent computation is negligible compared to the savings of (*q*-1).*N* simulations. We note that this process yields biasing parameter sets that are correlated because *γ*
^{(l,i)} is computed with a subset of data used to compute *γ*
^{(l,i+1)}. However, this correlation does not affect the validity or the accuracy of the final output as only one set is selected at each level to update the reaction rates, the process for which we explain in the next section.

#### Updating intermediate reaction rates

SParSE propagates the system towards the solution hyperplane by iteratively updating reaction rates during the modified multilevel cross-entropy method. The update process requires choosing one set of biasing parameters from possibly many sets, where the set size is determined by the number of unique intermediate events. The current intermediate reaction rates are then multiplied element-wise by the chosen set to produce the next intermediate reaction rates. The criterion SParSE adopts is straightforward; at level *l* it chooses the biasing parameter set that, when multiplied to the current intermediate reaction rates, takes the system closest to while preserving the sign of *δ*(*k*
^{(g)}),*g*=0,…,*l*.

Without loss of generality, we define *k*
^{(cur)} as the intermediate reaction rates at an arbitrary level *l*. In order to update the intermediate reaction rates for the next stage, we evaluate how each candidate biasing parameter set *γ*
^{(l,γ)} performs with respect to the update criterion. We define *k*
^{(int,i)} as

where *q* is the number of unique intermediate events. We recall that sgn(*δ*(*k*
^{(cur)}))≠*φ*
_{type} corresponds to the case when , which requires inverse biasing to reduce over-perturbation.

Starting with *i*=1, we compute . If , then the algorithm exits with *k*
^{*}=*k*
^{(int,i)}. Otherwise, we traverse through available sets of biasing parameters to find
for sgn(*δ*(*k*
^{(cur)}))==*φ*
_{type} and for sgn(*δ*(*k*
^{(cur)}))≠*φ*
_{type}. Since *ξ*
^{(l,.)} are sorted in ascending order of its probability, *k*
^{(int,i)} is expected to produce more extreme *f*(**x**(*t*)) values than *k*
^{(int,i+1)}. Thus it is not necessary to evaluate all possible . For the case of under-perturbation we can stop the evaluation at the first occurrence of *k*
^{(int,i)} that satisfies the inequality and set *k*
^{(l+1)}←*k*
^{(int,i)}. For the case of over-perturbation, however, we stop the simulation at the first occurrence of *k*
^{(int,i)} that *violates* the inequality and set *k*
^{(l+1)}←*k*
^{(int,i-1)}.

It is possible that all candidate biasing parameter sets fail to satisfy the update criterion. The failure indicates lies between and . Furthermore, this failure is a direct result of many-to-one relationship between and . Trajectories simulated with two different sets of reaction rates *k* and *k*
^{′}=*k*+*є* are likely to differ from each other, resulting in . However, both **v**
_{
k
} and may yield the same intermediate events, since they are determined solely by the value of the sorted array at positions ⌈*ρ* *N*⌉. Despite the identical *ξ*, SParSE estimates computed with *k* and *k*
^{′} will differ if the proportion of occurrences of is not the same in the two arrays.

In summary, the modified multilevel cross-entropy method for SParSE comprises of 3 steps. First we determine intermediate events for the current reaction rates using the SSA. We then employ dwSSA simulations to compute biasing parameters for each of the intermediate events. Lastly we follow steps described in this section to choose one set of biasing parameters to update reaction rates for the next iteration. This process repeats until either *k*
^{*} is found or until intermediate reaction rates cannot be updated any more. For computational efficiency, we can combine the first and the last steps by computing at the same time as computing . We discard if the estimate does not satisfy the required inequality or if *k*
^{(int,i)} is not the best candidate for the next intermediate reaction rates.

Lastly we point out that the original dwSSA formula (6) for computing *γ*
^{(l)} requires computation of a trajectory weight, which is the product of the likelihood ratio between the propensity function and the predilection function. We recall that the predilection function in the multilevel cross-entropy method used by dwSSA is characterized with the biasing parameters from level (*l*-1). However, we do not need to compute the trajectory weight in SParSE because its objective is to find a new set of reaction rates that confer the event specifications regardless of *k*
^{(0)}, which is used only as a starting position in the path to find *k*
^{*}. The intermediate reaction rates in level *l* of SParSE reflect cumulative amount of bias applied to the original system up to stage *l*-1 as quantified in (12). Thus the propensity function at level *l* does not require additional biasing. This leads to using SSA to determine intermediate events and dwSSA to compute *γ*
^{(l)} with . Therefore the formula (6) for SParSE simplifies to

### Exponential interpolation of biasing parameters

Iteratively updating intermediate reaction rates via the modified multilevel cross-entropy method described in the previous section may not find *k* that satisfies (7). Possible reasons for the failure include poor choice of *ρ*, insufficient *N*, and nonexistence of candidate intermediate reaction rates that satisfy the update criterion. The first two aligned can lead to slow convergence to , especially for systems near a deterministic regime or for simulations that demand high accuracy (*i.e.*, small values of . Setting a limit on the maximum number of iterations for the multilevel cross-entropy method can detect slow or non-converging reaction rates, and increasing *N* and/or modifying *ρ* will increase the rate of convergence in most aligned. The last phenomenon occurs when no suitable biasing parameters exist to update the reaction rates. Here we have , where and , *i* =1,…,*q*. The target probability lies between the two estimates and , and the multilevel cross-entropy method is unable to fine-tune intermediate reaction rates to achieve within the specified error tolerance .

It is reasonable to assume that a linear combination of *k*
^{(u)} and *k*
^{(v)} may result in *k*
^{*}. A more sophisticated method for approximating *k*
^{*} would be to fit an interpolant through past intermediate reaction rates. By making the following two assumptions, SParSE computes candidate biasing parameters such that when multiplied to *k*
^{(0)}, they may satisfy (7).

#### Assumption1

*k*
^{*} exists such that for or for , *j*є {1,…,*M*}.

#### Assumption 2.

can be computed independently from for *j*≠*h*.

We note that a single dwSSA trajectory likelihood ratio at level *l* of the multilevel cross-entropy method is

where and are the propensity sum of the trajectory **J** at time *t* generated with *k*
^{(l-1)} and *k*
^{(l)}, respectively, and *γ*
^{(l-1)} is the biasing parameter used to update the intermediate reaction rates, *i.e.*, . The quantity inside the exponential term is a function of the system state, which is in turn a function of intermediate reaction rates. In order to compare SParSE estimates generated with different intermediate reaction rates, we rewrite them as a function of the initial reaction rates and normalized intermediate biasing parameters, *i.e.*, , and . The purpose here is to quantify the relationship between different values of *γ*
^{(0,.)} and its corresponding estimates . Considering the form of the likelihood ratio in (14), a natural form for the interpolant is

where *p*
_{
j
} and *q*
_{
j
} are constants and are normalized version of the intermediate biasing parameters used to compute past SParSE estimates. Output data used in constructing interpolants are the corresponding SParSE estimates multiplied by *N*, the total number of simulations. This particular form allows for fast solving of *p*
_{
j
} and *q*
_{
j
} with a first order polynomial curve fitting method. We first transform the data to logarithmic scale, compute for two coefficients in the first order polynomial, and then retransform the output with exponentiation. The reason for scaling the output data with *N* is to preserve as many significant digits as possible, as logarithmic y-scale is used to compute the polynomial coefficients. While other forms of interpolant may yield more accurate interpolation, (15) allows for fast computation while satisfying Assumption (1), as an exponential function is monotonic.

The number of past intermediate reaction rates available for interpolation varies by factors such as *k*
^{(0)}, , and *N*. Although all past estimates can be used for interpolation, confining the number of data to *X* closest estimates of (*e.g.* *X*=5) while having at least one estimate on either side of the target probability is recommended, as the accuracy of interpolation may degrade with estimates that are far from the target probability. Due to the construction of the algorithm, there exists at least one estimate on either side of when the algorithm enters the exponential interpolation stage. However, we note that the total number of past intermediate reaction rates can be as few as 2. Once the values of *p*
_{
j
} and *q*
_{
j
} in (15) are determined for all *M*interpolants, SParSE executes the following steps to further increase the efficiency of simulation.

Step 1: For each *j*є{1,…,*M*}, project onto the x axis of the interpolant to compute candidate biasing parameters , where the first element (.) in the superscript is the interpolation iteration index and *s*є{1,…,7} is the index of the candidates.

Step 2: Compute candidate intermediate reaction rates , where

Step 3: Constrain to satisfy for or for , *j*є{1,…,*M*}, if necessary. Reverse the signs in inequalities for sgn(*δ*(*k*))≠*φ*
_{type}.

Starting with *s*=4, we compute . We note that corresponds to the reaction rates computed from projecting the exact target probability to the interpolating function. If executing Step 3 results in duplicate candidates, we eliminate the duplicate set(s) and assign the starting index to *s* such that .

If confers the target event probability within , SParSE exits with . Otherwise, we compute the next estimate with for , and for
. The interpolation stage continues until either *k*
^{*} is found or the end of candidate reaction rates is reached, at which point an additional interpolation may be executed with updated data. On a rare occasion, *k*
^{*} lies between two candidate reaction rates without satisfying the error tolerance. This can lead to infinite loop of incrementing and decrementing *s* by 1 without converging to *k*
^{*}, but the cycle can easily be detected with a mask vector. SParSE implements one by creating a zero vector whose size is equal to the number of candidate biasing parameter sets. Every time a SParSE estimate is computed with candidate reaction rates at index *s*, we increment *s*
^{th} position of the mask vector. Once the magnitude of any mask vector position becomes greater than 2, we conclude that *k*
^{*} lies between two candidate biasing reaction rates. At this point, we have refined an upper and lower bound on *k*
^{*}, as all candidate biasing parameters computed satisfy the inequality in Assumption 1. Once the bounds are sufficiently small, an alternative to an additional interpolation is to take a weighted average of the two candidate reaction rates, where the weight is the distance between the SParSE estimate and . For the examples presented in the following section, we did not encounter any initial reaction rates that required such treatment.