Stochastic parameter search for events

Background With recent increase in affordability and accessibility of high-performance computing (HPC), the use of large stochastic models has become increasingly popular for its ability to accurately mimic the behavior of the represented biochemical system. One important application of such models is to predict parameter configurations that yield an event of scientific significance. Due to the high computational requirements of Monte Carlo simulations and dimensionality of parameter space, brute force search is computationally infeasible for most large models. Results We have developed a novel parameter estimation algorithm—Stochastic Parameter Search for Events (SParSE)—that automatically computes parameter configurations for propagating the system to produce an event of interest at a user-specified success rate and error tolerance. Our method is highly automated and parallelizable. In addition, computational complexity does not scale linearly with the number of unknown parameters; all reaction rate parameters are updated concurrently at the end of each iteration in SParSE. We apply SParSE to three systems of increasing complexity: birth-death, reversible isomerization, and Susceptible-Infectious-Recovered-Susceptible (SIRS) disease transmission. Our results demonstrate that SParSE substantially accelerates computation of the parametric solution hyperplane compared to uniform random search. We also show that the novel heuristic for handling over-perturbing parameter sets enables SParSE to compute biasing parameters for a class of rare events that is not amenable to current algorithms that are based on importance sampling. Conclusions SParSE provides a novel, efficient, event-oriented parameter estimation method for computing parametric configurations that can be readily applied to any stochastic systems obeying chemical master equation (CME). Its usability and utility do not diminish with large systems as the algorithmic complexity for a given system is independent of the number of unknown reaction rate parameters. Electronic supplementary material The online version of this article (doi:10.1186/s12918-014-0126-y) contains supplementary material, which is available to authorized users.


Background
Stochastic modeling of biochemical and ecological systems has become increasingly popular due to its ability to represent system dynamics correctly at a detailed level, especially when species are present at low population. Deterministic models, on the other hand, are easier to analyze, yet they may fail to capture even the average behavior when the represented system exhibits nonlinearity [1] or is near extinction. Recent advancements in cloud computing platforms [2,3] and GPU computing [4][5][6][7] have significantly increased the affordability of computational resources. This enables development and use of stochastic algorithms that would have been deemed computationally infeasible in the past. However, there is *Correspondence: mroh@intven.com Numerical Methods, Institute for Disease Modeling, 1575 132 Ave. NE, 98005 Bellevue, USA still a void in stochastic methods that can answer scientifically interesting questions. One such application is in determining reaction rate configurations that yield an event of interest with a set success probability. Most parameter estimation algorithms in stochastic chemical kinetics setting take time-series data as an input and compute a set of reaction rate parameters that most closely reproduce the data. Methods used to determine these reaction rate parameters include maximum likelihood ratio [8][9][10], gradient decent [11], and moment closure [12]. While these algorithms are useful in its own right, scientists are often interested in knowing all parameter combinations that yield a specific event of interest. For gene regulatory models, knowledge of all pathways to achieve a specific event, such as bistable transition of lac operon in E. coli [13][14][15], may be used to guide laboratory experiments. In epidemiological models, all intervention http://www.biomedcentral.com/1752-0509/ 8/126 parameter combinations that achieve eradication can be combined with econometrics in computing the most costeffective strategy for eradicating a disease [16]. To authors' knowledge, no algorithm has been developed in stochastic chemical kinetics setting that computes such parameter combinations.
In this paper, we present Stochastic Parameter Search for Events (SParSE) that finds a parametric hyperplane of reaction rates conferring a user-specified event with prescribed success rate and error tolerance. Our algorithm is robust in that it accurately computes the solution hyperplane for low probability events as well as high probability events. It is also trivial to parallelize the algorithm; initial parameter sets do not need to communicate with each other to find the direction to the unknown solution hyperplane. Once the algorithm finds a point in the solution hyperplane, the ratio between the initial and final rates can be used as the biasing parameters by the doubly weighted stochastic simulation algorithm (dwSSA) [17] to compute the probability of observing the target event with its success rate under the original system description. This allows calculation of the target event probabilities under the original parameters as a powerful side benefit of the algorithm. Lastly, the SParSE runtime per parameter sample is of the same order as that of the stochastic simulation algorithm (SSA), i.e., the algorithm complexity is independent of the number of unknown parameters for a given system. This is achieved by combining a novel modification of dwSSA [17], Rubinstein's cross-entropy method [18], and exponential interpolation of biasing parameters. This feature provides substantial benefits when searching multi-dimensional parameter space.

Doubly weighted stochastic simulation
We begin with a brief review of the dwSSA; detailed derivation and applications can be found in Daigle et al. [17]. Throughout this paper we assume a well-stirred system at constant temperature with N species S 1 , . . . , S N and M reactions R 1 , . . . , R M . The state of the system at time t is represented as X(t) = [X 1 , . . . , X N ], where X i is the population of species S i . Using the "direct method" implementation of Gillespie's Stochastic Simulation Algorithm (SSA) [19], the system moves forward in time by sequentially firing one of R j , j ∈ {1, . . . , M} reactions, whose propensity at time t is a j (X(t)) and its sum a 0 = M j=1 a j (X(t)). Here, the next reaction is chosen with a categorical random variable j and the time to the next reaction with an exponential random variable τ . We also assume that all trajectories are run until the smaller of the final simulation time t f and the first time E is observed, where E is the event of interest. Denoting the stopping time of a trajectory as T , the probability of a complete trajectory J = t 1 , j 1 , . . . , t N T , j N T given X(0) = x 0 under SSA is as follows: with t i ≡ i j=1 τ j and N T the total number of reactions fired in [0, T ].
The dwSSA uses predilection functions to increase the number of trajectories that reach E: b j (X(t)) ≡ γ j a j (X(t)), b 0 (X(t)) = M j=1 b j (X(t)), (2) where γ j ∈ R + is a biasing parameter for R j . The probability of the same trajectory J under the dwSSA is then given by and the bias incurred by using the predilection function is corrected with It is straightforward to confirm that the product of (3) and (4) equals (1). The Monte Carlo estimator for p dwSSA (x 0 , E; t), the probability that the system reaches E by time t given the initial state x 0 , iŝ where N is the total number of trajectories, J i represents the i th simulated dwSSA trajectory, and I {J i ∩E} takes a value of 1 if E is visited by J i and 0 otherwise. The quantity in (5) can be interpreted as the weighted average of successful trajectories, i.e., trajectories reaching E, where the weight is computed according to (4). A good set of biasing parameters would yield successful http://www.biomedcentral.com/1752-0509/8/126 trajectories with weights close to the true probability and thus reduce variance in the probability estimator. The dwSSA computes low variance biasing parameters by minimizing cross entropy using a modified version of Rubinstein's multilevel cross-entropy method [17,18]. The advantage of minimizing cross entropy over minimizing variance is that the former yields biasing parameters with a closed-form solution for (2) where the latter does not. Having a closed-form solution is of practical necessity, as the alternative would be to solve a large set of nonlinear equations, significantly decreasing the efficiency of the algorithm if not making the simulation infeasible. Following derivations presented in Daigle et al. [17], the dwSSA biasing parameter for R j is computed aŝ where n ij is the total number of times reaction j fires in the i th trajectory, i iterates only over trajectories reaching E, and l is the stage index in multilevel crossentropy method. Computation forγ (l) terminates when intermediate rare event reaches E, at which point we set γ (l) ≡γ * .
The objective of the dwSSA necessitates computation of the likelihood ratio (4) as its probability estimator is with respect to the initial reaction rates k (0) . On the other hand, the objective of SParSE is to compute a set of reaction rates k * ∈ R M+ such that where P E is the desired probability of observing E by time t, I {f i (x(t|k * ))∩E} an indicator function for observing E during i th trajectory, and P E a user-specified absolute error tolerance on P E . Unlike the dwSSA where the biasing parameters are updated each level of the multilevel crossentropy method according to (6), in SParSE reaction rates are updated instead. We note that it is possible to use the dwSSA Monte Carlo estimator (5) in (7) and update γ (l) instead of k (l) . However unless k (0) is sufficiently close to k * , the likelihood ratio (4) may become extremely small, i.e., degenerate, and updating reaction rates avoids this problem. We discuss the criteria for updating k in the following section.

Multilevel cross-entropy method
The modification of multilevel cross-entropy method in SParSE is similar to that of Ref [17]. However, there are three major differences between the multilevel crossentropy method employed by dwSSA and by SParSE: (i) dwSSA only computes a single intermediate event ξ (l) and the corresponding set of biasing parameters γ (l) while SParSE may compute multiple such quantities, (ii) SParSE can calculate biasing parameters for initial reaction rates that either over-or under-perturb the system with respect to P E . For an over-perturbed system, it applies inverse biasing to reaction rates to convert E from a "sure event" to a "rarer event", and (iii) dwSSA updates biasing parameters γ (l) while SParSE updates reaction rates k (l) . The following subsection explains the first two differences and highlights how SParSE achieves the same time complexity as dwSSA for computing quantities in (i). The next subsection focuses on (iii) and its effect on simulation details.

Concurrent computation of multiple intermediate events and biasing parameters
In dwSSA, N trajectories in level l of multilevel crossentropy method are run until either the final simulation time t final or the first occurrence of ξ (l) , where ξ (l) is an intermediate rare event chosen by the top ρN trajectories that evolve farthest in the direction of E. Typical value of ρ used in dwSSA is 0.01 for N = 10 5 , although any value ρ ∈ (0, 1) can be used in theory [17]. The role of ρ can be thought as a knob that controls the tradeoff between the speed of convergence to E and accuracy inγ * . For ρ < ρ, we get |E − ξ (l ) | ≤ |E − ξ (l) |, thus smaller values for ρ can potentially drive the system toward E faster. However the number of trajectories reaching ξ (l ) is less than the number of trajectories reaching ξ (l) since ρ N < ρN .
Having fewer data to computeγ (l) reduces the confidence on the estimate, therefore it is advised to keep ρN above a set threshold (e.g., 200) in practice. On the other hand, larger value of ρ (e.g., ρ > 0.3) implies a less selective intermediate rare event. The resulting biasing parameters may not push the system closer to E, causing a failure in convergence to the target event. In our experience, ρ < 0.2 and ρN > 100 yield both reliable computation of biasing parameters and acceptable convergence to E. In order to determine ξ (l) , it is necessary to determine the direction of bias in addition to ρ. This is done by grouping the initial state x 0 into two categories according to its distance with respect to the event of interest: where f (x (t)) is an event function. Two requirements for f (x (t)) are that it takes x(t) as an input and can be used to evaluate the distance between the current state and E (i.e., it can be used to compute extreme values of each http://www.biomedcentral.com/1752-0509/8/126 trajectory to determine the next intermediate event) . The value of φ type indicates initial position of x(t) with respect to E at t = 0. When φ type is equal to 1, maximum value of f (x (t)) in each trajectory is recorded, and N such values are sorted in descending order at the end of the simulation. The reasoning for this is that since f (x (t 0 )) ≤ E, we need to encourage higher f (x (t)) values to get closer to E. Similarly, minimum f (x (t)) values are recorded and sorted in ascending order when φ type is -1. For convenience we refer to the sorted array of extreme f (x (t)) values as v k , where k is the reaction rates used to generate x (t).
We now define a SParSE probability estimator for E: where f i (x(t|k)) are event function evaluated with i th SSA trajectory generated with reaction rates k and I {f i (x(t|k))∩E} takes a value of 1 if f i (x(t|k)) contains E and 0 otherwise. Once N trajectories are simulated, we can expect one of the following outcomes: (a) the inequality in In the first case, SParSE exits and returns k as a successful output, i.e., a point in the solution hyperplane (k ≡ k * ). In the second case, we need to choose extreme values of f (x (t)) evolving furthest to E, and we can view E as a "rare event" as in the dwSSA. Thus intermediate events and its respective biasing parameters are computed iteratively, each time taking the system closer to E with success rate P E . The last case corresponds to parameter sets that "over-perturb" the system, as E was reached with probability greater than P E . The method used to determine an intermediate event in the classical multilevel cross-entropy method cannot be applied here because we do not want trajectories that produce extreme values of f (x (t)). However, the information gathered from such trajectories can be used to quantify the behavior we do not want to observe. We achieve this by collecting the extreme values of f (x (t)) as in case (b As the distance to the target event decreases, SParSE selects less extreme values for intermediate events and vice versa. This reduces the risk of over-and underperturbations. We note that the number of elements in ρ(δ) does not necessarily correspond to the number of intermediate events chosen. For example, elements corresponding to positions 0.005 * N and 0.01 * N of v k may be the same. We also note that a custom function can be used to compute ρ to better suit a specific system. However, the above default values work well for all examples presented in this paper. Lastly, N can be chosen as a function of min (ρ) and c, where c is the minimum number of data points desired to reliably compute γ (l) , i.e., N ≥ c/ min (ρ). http://www.biomedcentral.com/1752-0509/8/126 Once intermediate events are computed, they are sorted in ascending order of its probability, i.e., Prob(ξ (l,1) ) ≤ · · · ≤ Prob(ξ (l,q) ), where q is the number of unique intermediate events chosen at level l. We note that this sorting is done automatically if elements of ρ are sorted in ascending order, which (10,11) are. Now we describe how biasing parameters for all intermediate events are computed concurrently in a single ensemble of N simulations. In each simulation, we check for ξ (l,q) . If ξ (l,q) is observed, the statistics gathered up to the time at which ξ (l,q) was reached are used to compute γ (l,q) . Then the trajectory continues its course of simulation, this time checking for ξ (l,q−1) while keeping the cumulative statistics. This process repeats until the smaller of t final and the time at which ξ (l,1) is observed (i.e., all intermediate events are observed). When q == 1, this method is identical to the one used by dwSSA. Although a single trajectory runtime for q > 1 is slightly longer than the runtime for q == 1, the additional resources spent on concurrent computation is negligible compared to the savings of (q − 1) · N simulations. We note that this process yields biasing parameter sets that are correlated because γ (l,i) is computed with a subset of data used to compute γ (l,i+1) . However, this correlation does not affect the validity or the accuracy of the final output as only one set is selected at each level to update the reaction rates, the process for which we explain in the next section.

Updating intermediate reaction rates
SParSE propagates the system towards the solution hyperplane by iteratively updating reaction rates during the modified multilevel cross-entropy method. The update process requires choosing one set of biasing parameters from possibly many sets, where the set size is determined by the number of unique intermediate events. The current intermediate reaction rates are then multiplied elementwise by the chosen set to produce the next intermediate reaction rates. The criterion SParSE adopts is straightforward; at level l it chooses the biasing parameter set that, when multiplied to the current intermediate reaction rates, takes the system closest to E while preserving the sign of δ(k (g) ), g = 0, · · · , l.
Without loss of generality, we define k (cur) as the intermediate reaction rates at an arbitrary level l. In order to update the intermediate reaction rates for the next stage, we evaluate how each candidate biasing parameter set γ (l,·) performs with respect to the update criterion. We define k (int,i) as where q is the number of unique intermediate events.
We recall that sgn δ(k (cur) ) = φ type corresponds to the case when P E + P E <p SParSE (x 0 , k (cur) , E; t), which requires inverse biasing to reduce over-perturbation. (int,i) . Otherwise, we traverse through available sets of biasing parameters to find min For the case of under-perturbation we can stop the evaluation at the first occurrence of k (int,i) that satisfies the inequality and set k (l+1) ← k (int,i) . For the case of overperturbation, however, we stop the simulation at the first occurrence of k (int,i) that violates the inequality and set It is possible that all candidate biasing parameter sets fail to satisfy the update criterion. The failure indicates P E lies betweenp SParSE k (l) andp SParSE k (int,·) . Furthermore, this failure is a direct result of many-to-one relationship between k ∈ R M >0 and ξ ∈ R. Trajectories simulated with two different sets of reaction rates k and k = k + are likely to differ from each other, resulting in v k = v k . However, both v k and v k may yield the same intermediate events, since they are determined solely by the value of the sorted array at positions ρN . Despite the identical ξ , SParSE estimates computed with k and k will differ if the proportion of occurrences of E is not the same in the two arrays.
In summary, the modified multilevel cross-entropy method for SParSE comprises of 3 steps. First we determine intermediate events for the current reaction rates using the SSA. We then employ dwSSA simulations to compute biasing parameters for each of the intermediate events. Lastly we follow steps described in this section to choose one set of biasing parameters to update reaction rates for the next iteration. This process repeats until either k * is found or until intermediate reaction rates cannot be updated any more. For computational efficiency, we can combine the first and the last steps by computing Lastly we point out that the original dwSSA formula (6) for computing γ (l) requires computation of a trajectory weight, which is the product of the likelihood ratio between the propensity function and the predilection function. We recall that the predilection function in the multilevel cross-entropy method used by dwSSA is characterized with the biasing parameters from level (l − 1). However, we do not need to compute the trajectory weight in SParSE because its objective is to find a new set of reaction rates that confer the event specifications regardless of k (0) , which is used only as a starting position in the path to find k * . The intermediate reaction rates in level l of SParSE reflect cumulative amount of bias applied to the original system up to stage l − 1 as quantified in (12). Thus the propensity function at level l does not require additional biasing. This leads to using SSA to determine intermediate events and dwSSA to com-

Exponential interpolation of biasing parameters
Iteratively updating intermediate reaction rates via the modified multilevel cross-entropy method described in the previous section may not find k that satisfies (7). Possible reasons for the failure include poor choice of ρ, insufficient N, and nonexistence of candidate intermediate reaction rates that satisfy the update criterion. The first two aligned can lead to slow convergence to E, especially for systems near a deterministic regime or for simulations that demand high accuracy (i.e., small values of P E ). Setting a limit on the maximum number of iterations for the multilevel cross-entropy method can detect slow or non-converging reaction rates, and increasing N and/or modifying ρ will increase the rate of convergence in most aligned. The last phenomenon occurs when no suitable biasing parameters exist to update the reaction rates.
The target probability lies between the two estimatesp(k (u) ) andp(k (v) ), and the multilevel cross-entropy method is unable to fine-tune intermediate reaction rates to achieve P E within the specified error tolerance P E .
It is reasonable to assume that a linear combination of k (u) and k (v) may result in k * . A more sophisticated method for approximating k * would be to fit an interpolant through past intermediate reaction rates. By making the following two assumptions, SParSE computes candidate biasing parameters such that when multiplied to k (0) , they may satisfy (7).
We note that a single dwSSA trajectory likelihood ratio at level l of the multilevel cross-entropy method is where a are the propensity sum of the trajectory J at time t generated with k (l−1) and k (l) , respectively, and γ (l−1) is the biasing parameter used to update the intermediate reaction rates, i.e., k (l) The quantity inside the exponential term is a function of the system state, which is in turn a function of intermediate reaction rates. In order to compare SParSE estimates generated with different intermediate reaction rates, we rewrite them as a function of the initial reaction rates and normalized intermediate biasing parameters, i.e., k The purpose here is to quantify the relationship between different values of γ (0,·) and its corresponding estimateŝ p SParSE (x 0 , k (·) , E; t). Considering the form of the likelihood ratio in (14), a natural form for the interpolant is where p j and q j are constants and γ (0,·) j are normalized version of the intermediate biasing parameters used to compute past SParSE estimates. Output data used in constructing interpolants are the corresponding SParSE estimatesp SParSE (x 0 , k (·) , E; t) multiplied by N, the total number of simulations. This particular form allows for fast solving of p j and q j with a first order polynomial curve fitting method. We first transform the data to logarithmic scale, compute for two coefficients in the first order polynomial, and then retransform the output with exponentiation. The reason for scaling the output data with N is to preserve as many significant digits as possible, http://www.biomedcentral.com/1752-0509/8/126 as logarithmic y-scale is used to compute the polynomial coefficients. While other forms of interpolant may yield more accurate interpolation, (15) allows for fast computation while satisfying Assumption (1), as an exponential function is monotonic.
The number of past intermediate reaction rates available for interpolation varies by factors such as k (0) , P E , and N. Although all past estimates can be used for interpolation, confining the number of data to X closest estimates of P E (e.g. X = 5) while having at least one estimate on either side of the target probability is recommended, as the accuracy of interpolation may degrade with estimates that are far from the target probability. Due to the construction of the algorithm, there exists at least one estimate on either side of P E when the algorithm enters the exponential interpolation stage. However, we note that the total number of past intermediate reaction rates can be as few as 2. Once the values of p j and q j in (15) are determined for all M interpolants, SParSE executes the following steps to further increase the efficiency of simulation.
Step 1: onto the x axis of the interpolant to compute candidate biasing parametersγ (·,s) j , where the first element (·) in the superscript is the interpolation iteration index and s ∈ {1, · · · , 7} is the index of the candidates.
Starting with s = 4, we computep SParSE (k (·,s) ). We note thatk (·,4) corresponds to the reaction rates computed from projecting the exact target probability to the interpolating function. If executing Step 3 results in duplicate candidates, we eliminate the duplicate set(s) and assign the starting index to s such that q j · exp p j ×γ Ifp SParSE (k (·,s) ) confers the target event probability within P E , SParSE exits with k * =k (·,s) . Otherwise, we compute the next estimate withk (·,s+1) for ,s) ). The interpolation stage continues until either k * is found or the end of candidate reaction rates is reached, at which point an additional interpolation may be executed with updated data. On a rare occasion, k * lies between two candidate reaction rates without satisfying the error tolerance. This can lead to infinite loop of incrementing and decrementing s by 1 without converging to k * , but the cycle can easily be detected with a mask vector. SParSE implements one by creating a zero vector whose size is equal to the number of candidate biasing parameter sets. Every time a SParSE estimate is computed with candidate reaction rates at index s, we increment s th position of the mask vector. Once the magnitude of any mask vector position becomes greater than 2, we conclude that k * lies between two candidate biasing reaction rates. At this point, we have refined an upper and lower bound on k * , as all candidate biasing parameters computed satisfy the inequality in Assumption 1. Once the bounds are sufficiently small, an alternative to an additional interpolation is to take a weighted average of the two candidate reaction rates, where the weight is the distance between the SParSE estimate and P E . For the examples presented in the following section, we did not encounter any initial reaction rates that required such treatment.

Results and discussion
We illustrate SParSE performance on the following three examples of increasing complexity: a birth-death process, a reversible isomerization process and a Susceptible-Infectious-Recovered-Susceptible (SIRS) disease transmission model. The first two examples were chosen to demonstrate the algorithm's accuracy against the exact solution, which for these examples can be computed using the master equation or the infinitesimal generator matrix [1]. We then progress to a more complex SIRS model, which has no closed-form analytical solution.
For each system, we analyze the SParSE performance on all possible combinations of P E ∈ {0.40, 0.60, 0.80} and P E ∈ {0.01, 0.05, 0.10}, where P E and P E denote a desired probability for event E and its error tolerance, respectively. We then compare the result with that from comparable SSA simulations whose reaction rates are selected using uniform random sampling (URS). SParSE also employs URS but only to generate a number of initial reaction rates as a starting point, here set to 30. The number of simulations, N, used to estimate P E per parameter sample is set to 5 × 10 4 unless mentioned otherwise. We also test the robustness of SParSE by assessing its performance on a low probability event, P E = 0.010 and P E = 0.001 for the birth-death process, and a high probability event, P E = 0.95 and P E = 0.005 for the reversible isomerization process. used for computing an intermediate event and a SParSE estimate, it is straightforward to compare the two strategies with computational fairness. For each simulation scenario, we provide four metrics on performance: the total number of SParSE estimates needed for all 30 initial parameter samples, the number of initial parameters that did not reach the solution hyperplane within 10 iterations of multilevel cross-entropy method or 3 iterations of exponential interpolation, the number of parameter sets that required interpolation in addition to the multilevel cross-entropy method, and the number of successful parameter sets generated by SSA simulations using URS for sampling reaction rates. Lastly, we provide movie files of SParSE ensemble simulations for two test scenarios: birth-death with P E = 0.010 and P E = 0.001 and SIRS with P E = 0.40 and P E = 0.01. All computations were run on a desktop with Intel® Xeon® CPU E5-2680, 2.70 GHz processor with 8 cores and 32 GB of RAM. We utilized Matlab's Parallel Computing Toolbox™ (PCT) and the Coder™. The PCT™ was used to simulate 8 SParSE ensembles in parallel while the Coder™ was used to convert frequently-used custom Matlab functions into low-level C functions for faster computation.

Birth-death process
Our first example is a birth-death process.  Table 1 summarizes the results for the 9 standard test aligned, where SParSE achieved 100% success rate in finding k * , a vector of reaction rates k * 1 k * 2 that confers desired P E and P E , for 8 test aligned. For P E = 0.60 and P E = 0.01, two of thirty samples, k We picked one of 30 initial reaction rates for P E = 0.60 and P E = 0.05 to illustrate a complete progression of the algorithm.  Figure 2 shows an illustration of the interpolation results. We see from Table 1  The first column denotes the target probability, the second column absolute error tolerance, the third column the total number of SParSE samples computed for the 30 initial parameter sets with the number inside the parenthesis indicating the total number of intermediate event computations, the fourth the number of initial reaction rate parameter sets that required exponential interpolation, the fifth the number of initial sets that did not converge to the solution hyperplane, and the sixth the number of successful parameter sets generated with URS. For the fourth column, four numbers inside the bracket indicate the number of parameter sets that required 0, 1, 2, and 3 interpolations, respectively. N = 5 × 10 4 . Figure 3, where the values of z axis are set to the probability of the target event, which is computed using k 1 and k 2 values defined by the data's x and y coordinates, respectively. Together the figure shows the contour of the event probability surface for different values of k 1 and k 2 . Despite a rapid change in the event probability around P E = 0.60, SParSE was able to find a point in the solution hyperplane for all 30 sets of initial reaction rates. Next we illustrate the robustness of SParSE by choosing a very small target probability P E = 0.010 and P E = 0.001 (animated illustration of SParSE simulations for this scenario is provided as Additional file 2). For this problem, we increased N to 2 × 10 5 to reduce the relative uncertainty in the estimate [20]. Table 2 summarizes the results. We see that all 30 initial sets of reaction rates successfully converged to the solution hyperplane while SSA-URS yielded only 3 successful samples.  Figure 5 displays result of the same simulation scenario using SSA-URS, except that it contains 36 additional data to accommodate the total number of intermediate event computations SParSE required. We note that the parameter ranges shown in Figure 4 differ from that in Figure 5, whose data obey parametric constraints specified in the model description (i.e., 1.0 ≤ k 1 ≤ 1.7 and 0.0125 ≤ k 2 ≤ 0.025). These constraints are shown as white dashed lines in contains data outside the perimeter of white dashed lines is that our implementation of SParSE does not utilize the parametric constraints other than to generate initial sets of reaction rates. Changing the implementation of SParSE to enforce the parametric constraints throughout the simulation requires the user to provide a parametric region that contains the solution hyperplane. In this alternate implementation, if the solution hyperplane does not exist within the user-specified region, all computations are wasted. The current implementation allows for computation of the solution hyperplane regardless of its location while exploiting the user's knowledge in generating initial reaction rates.

Reversible isomerization process
Our next example concerns a reversible isomerization process, where two conformational isomers A and B are interconverted by rotation about single bonds:  Table 3 summarizes the results from 9 standard test scenarios, all of which attained 100% convergence to the solution hyperplane. We see that the total number of SParSE samples required for P E ∈ {0.40, 0.60} is comparable between the birth-death and the reversible isomerization processes. However, the latter required considerably more number of samples for P E = 0.80. This is due to the difference in the contour of target event probability surface between the two processes. Figure 6 compares ensemble results between the two processes for P E = 0.80 and P E = 0.05. Figure 6A represents the birth-death process and Figure 6B the reversible isomerization process. We see that the reversible isomerization process contains a significantly larger parametric region that corresponds to P E > 0.80, and that the probability in this region changes slowly (i.e., plateau-like contour). Only two over-perturbing initial reaction rates (i.e., orange squares above the green dotted line corresponding to P E + P E = 0.85) were generated for the birth-death http://www.biomedcentral.com/1752-0509/8/126 process while eleven such rates were generated for the reversible isomerization process. Intermediate reaction rates (white squares) of these eleven samples are close together due to the slowly changing probability in their vicinity. Lastly we note that none of the data in Figure 6B left the original parameter ranges stated in the model description. This confirms that even for simple systems such as birth-death process and reversible isomerization process, it is nontrivial to predict parameter ranges that form a convex bound.
Next we choose a high probability target of P E = 0.95 and P E = 0.005 with N = 10 5 . Table 4 summarizes the result. We see that one of 30 samples failed to converge. SParSE was not able to find k * for k (0) 27 = [0.205 0.414] after 3 rounds of exponential interpolations. The qualitative explanation for the failure is the same as with the birth-death process for P E = 0.60 and P E = 0.01, which is discussed in Appendix C of Additional file 1.
It is worth pointing out that the number of successful parameter sets generated with SSA-URS varies widely from simulation to simulation. The expected number, however, is the volume of the solution hyperplane (which changes with different values of P E ) divided by the total  Tables 1, 3, and 5).

Simple SIRS disease dynamics
The final example is Susceptible-Infectious-Recovered-Susceptible (SIRS) disease transmission model, which consists of the following three reactions: . This model describes a homogenous, fixed population setting where members of S become infected by members of I, who recover from the infection at rate γ . Once recovered, members of R have immunity against the infection. However, the immunity wanes at rate ω, and this transition from recovered to susceptible compartment replenishes http://www.biomedcentral.com/1752-0509/8/126 the population of S. The target event for this system is set to the population of I reaching 50 before t = 30. Unlike the first two examples, there is no closed-form analytical solution for this model. In order to construct the probability voxel for the specified parameter ranges, we divided each parameter region into 30 uniformly-spaced grids and computed each combination with the SSA, where each of 27, 000(30 3 ) ensembles was simulated with N = 10 5 . We then further refined the resolution of the probability volume to a 70 × 70 × 70 grid using interp3 function in Matlab, which applied linear interpolation to the 3dimensional mesh data from SSA simulations. Figure 7 displays the final solution volume, where the color of each point represents the target event probability according to the color bar on the right of the figure.
As with the previous examples, we tested SParSE on all possible combinations of P E ∈ {0.40, 0.60, 0.80} and P E ∈ {0.01, 0.05, 0.10} and measured the same quantities as in Table 1. Table 5 summarizes the results. We see that SParSE achieved 100% success rate for all scenarios tested. However, statistics on column 1 demonstrates that the total number of estimates computed for any SIRS scenario is greater than the one for the first two examples with the same target probability and error tolerance.  These results imply that the multilevel cross-entropy method applied to the SIRS model made conservative moves to reach the solution hyperplane; the algorithm required many intermediate event computations to approach the vicinity of P E but fewer fine-tuning steps (i.e., exponential interpolations). Although the same ρ(δ) values were used for all three examples, we see that its effect differs depending on the underlying system.
Two expected trends emerge from Table 5; the total number of SParSE samples and the total number of exponential interpolations required to reach the solution hyperplane decrease with increasing P E . Although numbers in columns 3 and 4 differ among Tables 1, 3, and 5, qualitative algorithmic behavior as a function of P E remain the same for all three examples. As for its performance, SParSE outperformed SSA-URS (by a factor of 1.15 to 10) on all scenarios except one. For P E = .80 and P E = 0.10, SSA with URS yielded 41 successful sets, while SParSE yielded 30. We note that the maximum number of successful sets for SParSE cannot exceed the number of initial parameter sets, which is 30 for all examples presented in this paper. Also, the parameter ranges we chose for the SIRS model result in an uneven distribution of the target probability. From Figure 7, we see that a significant portion of the probability volume belongs to high (> 0.8) or low (< 0.2) probability region. Since the SSA-URS success probability is determined solely by the ratio between the volume of the solution hyperplane and the total volume defined by the specified parameter ranges, this particular scenario is biased to be more favorable toward SSA-URS. For general applications involving a target event, however, we cannot expect the solution hyperplane to lie within the user-specified parameter ranges, to which SSA-URS samples are confined. If this region does not contain the solution hyperplane, SSA-URS is unable to produce k * regardless of the number of samples generated. The current implementation of SParSE, on the other hand, is highly likely to find the closest point (cross-entropy metric) in the solution hyperplane through multilevel cross-entropy method and exponential interpolation stages, both of which are not limited by the user-specified parameter ranges. In practical situations, it is likely that the user does not have enough systematic  The column identities match those of Table 1. N = 1 × 10 5 .
insight to identify a region that contains the solution hyperplane for a particular target event. We expect SParSE to be more efficient than SSA-URS by orders of magnitude in such aligned, as the performance of SParSE is much less sensitive to the dimensionality of the search space and the volume within P E of the solution hyperplane than the performance of SSA-URS is. We picked one scenario, P E = 0.40 and P E = 0.05, for visual comparison between SParSE and SSA-URS outcomes outcomes (animated illustration of SParSE simulations for this scenario is provided in Additional file 3). Figure 8 display the ensemble result for each method. The solution hyperplane for P E = 0.40 is represented by a cyan-colored surface, which was obtained by applying isosurface function in Matlab to the probability volume. We have omitted displaying the region corresponding to P E ± P E for clear visualization of data. We see that the volume of the solution hyperplane for this particular scenario is small relative to the volume of the entire voxel. Thus we expect poor performance from SSA-URS, which is confirmed by statistics in Table 5. SSA-URS generated only 5 successful parameter combinations out of 467 samples, while SParSE generated 30. Since one point in the solution hyperplane corresponds to one set of initial reaction rates in SParSE, this indicates 100% convergence. The number of data in Figure 8A and 8B are 305 and 467, respectively. Figure B contains 162 more data to compensate for the total number of intermediate event computations required by SParSE. Despite having fewer data, SParSE  produced not only 6 times the number of k * but also data that are closely scattered around the solution hyperplane. The latter fact offers couple advantages. First, having good resolution near P E ± P E enables more accurate mapping of the solution hyperplane, as it is unknown or computationally infeasible to be computed even for moderately sized systems (e.g., 5-20 parameters). In addition if we were to run another set of simulations on the identical scenario, we can generate initial reaction rates that are near the solution hyperplane using the past simulation results. Lastly, we chose one of 30 initial reaction rates to illustrate a complete progression of SParSE on the SIRS model. Unlike the set of initial reaction rates chosen for the birthdeath process (k (0) 14 = [1.2414 0.02445] with P E = 0.60 and P E = 0.01) in Figure 1, which under-perturbs the system, the set of initial reaction rates chosen here, k (0) 26 = [0.0942 1.7150 0.6196], over-perturbs the system. Figure 9 displays the flow chart of SParSE simulations for this scenario, and Figure 10 illustrates the results from the first and second exponential interpolations, respectively. The interpolants for all three reactions exhibit a good fit with respect to the past biasing parameters, and the quality of fit improves in the second iteration with updated past estimates closer to the target probability. According to the flow chart and Figure 10A, SParSE entered the first iteration of exponential interpolation with four past estimates, three of which under-perturbed the system (from multilevel cross-entropy method). After exhausting the candidate biasing parameters from the first iteration, all of which produced estimates greater than 0.61, SParSE entered a second iteration of exponential interpolation. At this point, the top five closest estimates to P E = 0.60 all came from over-perturbing biasing parameter sets. SParSE then removed the most over-perturbing set and inserted the least under-perturbing set in attempt to http://www.biomedcentral.com/1752-0509/8/126 improve the quality of the interpolant. Figure 10B reflects these modifications. The last candidate from the second interpolation produced an estimate within P E = 0.01, at which point the algorithm exited with the final reaction rates (k * (26) = [0.0917 1.899 0.587]). We note that the slope of the interpolants in Figure 10 are opposite from the ones in Figure 2. This is because inverse biasing technique is used for over-perturbing reaction rates, as described by Equation 12.

Conclusions
In this paper, we presented SParSE-a novel stochastic parameter estimation algorithm for events. SParSE contains two main research contributions. First, it presents a novel modification of the multilevel cross-entropy method that (1) concurrently computes multiple intermediate events as well as their corresponding biasing parameters, and (2) handles over-perturbing initial reaction rates as well as under-perturbing ones. Second, it uses information from past simulations to automatically find a path to the parametric hyperplane corresponding to the target event with user-specified probability and absolute error tolerance.
By introducing a novel heuristic for handling reaction rates that over-perturb the system, SParSE can handle target events whose probability does not need to be rare with respect to the initial reaction rates k (0) . If the user wishes to compute the probability of observing P E with respect to k (0) , however, it can be done by simply running the dwSSA with biasing parameters that are the ratio between the final reaction rates k * from SParSE and k (0) . No additional multilevel cross-entropy simulations are required by the dwSSA to determine biasing parameters since the final set of reaction rates computed by SParSE contains this information. For this reason, SParSE improves upon the dwSSA in that it can handle an additional type of rare event. The only class of rare events whose probability dwSSA can estimate is the one that is seldom reached by the system using the original reaction rates. SParSE, on the other hand, can also compute the probability of events that are reached too often with respect to the target probability using the original reaction rates. Average frequency of observing such target event with k (0) would be much higher than the desired frequency (i.e., (P E ± P E ) × N), and therefore the probability of observing E with success rate P E ± P E and reaction rates k (0) would be very small, yet its biasing parameters are uncomputable with the dwSSA, but are computable with SParSE.
It is important to note that the computational complexity of SParSE is independent of the number of parameters to be estimated. Like the dwSSA [17], SParSE utilizes information-theoretic concept of cross-entropy to concurrently compute biasing parameters for all reactions in the system. Moreover, SParSE avoids serial computation of biasing parameters for multiple intermediate events at any given stage of multilevel cross-entropy method by introducing a clever ordering of intermediate events and data management. Figures 4, 5 and 8 illustrate that SParSE not only is more efficient than SSA-URS in finding k * but also gives a better resolution of the area near the solution hyperplane. This is because intermediate reaction rates computed by SParSE are guaranteed to be closer http://www.biomedcentral.com/1752-0509/8/126 to k * than k (0) is. Thus intermediate reaction rates near k * can be used to improve the quality of interpolation in constructing the solution hyperplane. Another computational asset of SParSE is that it is highly parallelizable. In large scale application, multiple sets of initial reaction rates can be dispatched separately since each set finds its way to the solution hyperplane independently from each other. In smaller scale, SParSE estimate computation or an ensemble of multilevel cross-entropy method simulations also can be parallelized. In simulating examples presented in this paper, we have chosen the latter method; each set of N simulations was distributed among 8 cores using the Parallel Computing Toolbox™ in Matlab. Lastly, a single SParSE trajectory from the multilevel cross-entropy method without any biasing (i.e.,γ = − → 1 ) generates the same number of uniform random numbers as the SSA does. The only difference is that SParSE requires additional data management for recording biasing parameter information (two floating point numbers for each reaction [17]), which is used in the next round of multilevel cross-entropy method. It is difficult to compare the exact computational cost between the two methods when SParSE utilizes γ = − → 1 ; depending on the amount of bias applied per reaction, the number of random numbers generated per trajectory will differ between the two methods even if the same reaction rates were used. For the exponential interpolation stage in SParSE, SSA is used to computep SParSE , thus the computational cost of SParSE and SSA trajectory are identical for a given set of reaction rates.
One of the inputs required by SParSE is a range of values each parameter can take. There is no theoretical limit on the parameter range SParSE can manage; however, it is required for the following practical reasons. First, the http://www.biomedcentral.com/1752-0509/8/126 volume of the solution hyperplane could be infinite if we do not confine parameter ranges. For the reversible isomerization process presented in the previous section, all solution hyperplanes from the 9 standard test scenarios are defined by the ratio between the two reaction rate parameters; infinitely many pairs exist that keep this ratio conserved. In addition, a range is required to sample initial reaction rates. If a user wishes to use a distribution other than the uniform distribution to generate initial reaction rates, different statistics (mean, standard deviation, etc.) may be needed.
We remind our readers that although parameter ranges are used to constrain the position of initial reaction rates, the same ranges are not enforced on the final reaction rates on the solution hyperplane. The main reason for this is that there is no guarantee the solution hyperplane intersects with the volume defined by the user-specified parameter ranges. By not limiting the final reaction rates to reside within the user-specified region, SParSE is able to find a set of reaction rates that lie on the solution hyperplane that are close to the user-specified parameter ranges. For example, in Figure 3, white dashed lines represent parameter ranges specified prior to the simulation. We see that 3 of 30 initial sets reached the solution hyperplane but are outside this region. We also see that some intermediate reaction rates (white squares) escape the region but return to it by the time k * (red square) is found. For most practical applications, we know neither the curvature of the solution hyperplane nor the existence of it within the prescribed parameter ranges. The parameter ranges for all examples in this paper were chosen such that all possible values in (0 1) are captured while the volume of a solution hyperplane for any particular P E is well-defined within this region. Therefore we expect the computational gain from employing SParSE over SSA-URS to be much higher for an arbitrary problem where the user is unable to provide informative parameter ranges for the target event of interest and its desired probability.
Future work will focus on two main areas whose improvement will substantially benefit the algorithm. First, the multilevel cross-entropy method for SParSE can improve from employing an adaptive ρ(δ) function, whose values for determining intermediate events would change as the simulation progresses. While SParSE proved to be computationally efficient for all three examples presented in this paper, their results demonstrated that the same ρ(δ) function can produce qualitatively different behavior on how the system approaches the solution hyperplane. We can use past values of ρ(δ) and its effect onp SParSE to estimate the speed of convergence toward the solution hyperplane. This can potentially reduce the number of multilevel cross-entropy method iterations, where reduction of each iteration saves 2 × N simulations. The second area of future research will be on efficient sampling of initial reaction rates. Once SParSE finishes simulating first sets of k (0) , positions of resulting k * may be far away from each other and thus insufficient to construct an accurate picture of the solution hyperplane. Instead of randomly sampling the next set of initial reaction rates, we can utilize information from the prior ensemble of SParSE simulations to improve the positioning of the next set of k (0) . For example, we can construct a rough interpolation (e.g., linear interpolation) of the solution hyperplane using k * s from the first ensemble, and sample the next set from the estimated solution hyperplane, which could be constrained by the user-specified parameter ranges if necessary. A more sophisticated method would be required