### G1/S transition model

To test the proposed algorithm for computing uncertainty-based weights, the G1/S transition model was used [12, 24]. This model consists of 2 variables, pRB (Retinoblastoma protein) and E2F1 (Activator), and 10 model parameters. We also consider the initial concentrations of both variables as parameters, and therefore, the total number of model parameters is 12. The ordinary differential equations of the model are described in Eq. (7). Since pRB inhibits E2F1 activation, the concentration of E2F1 decreases as the concentration of pRB increases as shown in Fig. 3a. Afterwards, the concentrations of both variables approach steady state gradually.

$$ {\begin{aligned} \frac{d}{dt}[pRB]&=K_{1}\frac{[E2F1]}{K_{n1}+[E2F1]}\frac{J_{11}}{J_{11}+[pRB]}-\varphi_{pRB}[pRB]\\ \frac{d}{dt}[E2F1]&\,=\,K_{p}\,+\,K_{2}\frac{a^{2}+[E2F1]^{2}}{K_{n2}^{2}+[E2F1]^{2}}\frac{J_{12}}{J_{12}+[pRB]}\,-\,\varphi_{E2F1}[E2F1] \end{aligned}} $$

(7)

To generate experimental data, we simulated Eq. 7 using the parameter setting in [12] as the true parameter, with observed time points evenly spaced every 25 min from 0 to 800 min. The simulated data was perturbed with a small amount of multiplicative and additive Gaussian noise [14]. Figure 3a shows the simulated experimental data. The total number of experimental data points is 66 (33 data points for each variable). Based on this experimental data, we examined two versions of the model, one with 6 unknown parameters, and the other with 12 unknown parameters. For both versions of the model, the initial concentrations of the two variables were treated as unknown parameters.

#### G1/S transition with 6 unknown parameters

To consider a simple version of the G1/S transition model, four model parameters (*K*_{1},*J*_{11},*K*_{n2} and *J*_{12}) and the initial concentrations of both variables were treated as unknown parameters. Using the iterative algorithm described in the “Methods” section, weights of the 66 data points were calculated and visualized in Fig. 3a. The solid curve represents the weight of each data point and the horizontal dotted line represents the equal-weight formulation where every data point receives the same weight 1. As shown in Fig. 3b, the initial data point (*t*=0) of both variables received the highest weights among all data points. This is because the initial data points directly reflect values of two unknown parameters, which are the initial concentrations. Among other data points, those in dynamically changing regions (from ∼25 to ∼250 mins) received relatively larger weights compared to those in flat regions (from ∼250 to ∼800 mins).

To compare the equal-weight cost function and the weighted cost function, the sampling algorithm introduced in the “Methods” section was applied to visualize the collection of acceptable parameter settings. For each cost function, the interior-point algorithm [22] was used to estimate the underlying parameter. The estimated parameter setting was then used as the seed for the sampling algorithm to generate collections of acceptable parameters. We defined the threshold for acceptable to be three times the cost value of the estimated parameter setting. Finally, we simulated the model using the acceptable parameters. Figure 4 visualizes the model predictions generated from the acceptable parameters for both cost functions. The gray belt represents the range of acceptable model predictions and the black curve is the experimental data (Fig. 3a).

Figure 4a represents the model predictions of the collection of acceptable parameters derived by the sampling algorithm with the equal-weight cost function. Figure 4b corresponds to the weighted cost function. From this figure, we can see that the belts corresponding to the second variable (E2F1) are quite different between the equal-weight and weighted cost functions. The equal-weight cost function resulted in a belt that is relatively thick in the dynamic region, and relatively thin in the flat region. This is because the large number of data points in the flat region essentially carries the same information, the steady state, forcing the parameter estimation algorithm to focus on finding the steady state accurately, even at the expense of errors in the dynamic region. This unbalanced belt width illustrates the limitation of the equal-weight cost function described in Fig. 1. The belt width of the first variable (pRB) is more balanced, because the flat region of the first variable is shorter than the second variable, and the data points for pRB are noisier than those for E2F1. For the weighted cost function, the belt is much thinner in the dynamic region. This is because data points in the dynamic region received larger weights while data points in the flat region received smaller weights. By redistributing the weights according to the relative importance of each data point, we can derived the weighted cost function, giving larger weights to the data points in dynamically changing regions and lower weights to flat regions.

To test the sensitivity of the algorithm for computing the weights, we generated 100 experimental datasets by randomly perturbing the noise-free simulation in Fig. 3a. The variation among the 100 datasets is shown in Fig. 5a. Using the iterative algorithm, weights are computed based on each experimental dataset. The variation among the 100 sets of weights is shown in Fig. 5b. The first measurement time point for both variables consistently receive large weights across the 100 datasets, very robust to the noise. For other measurement time points, we can observe the same pattern as in Fig. 3b, where the dynamically change regions consistently receive higher weights than flat regions.

#### G1/S transition with 12 unknown parameters

To consider a more challenging situation, we assumed that all 12 parameters are unknown. Now we have to estimate 12 unknown model parameters with the same experimental data (Fig. 3a), and perform the iterative algorithm to calculate the weights. The resulting weights are shown in Additional file 1: Figure S1, which is qualitatively similar as before (larger in dynamic regions and lower in flat regions), but not exactly the same. This shows that the uncertainty-based weights do not just simply encode the curvature of the given data. They are also influenced by the mathematical structure of the ODE-model and which parameters need to be estimated.

Additional file 1: Figure S2 illustrates the model predictions of acceptable parameters for both cost functions, using the sampling algorithm. The comparison is qualitatively the same as the previous 6-parameter example. The equal-weight cost function (Additional file 1: Figure S2A) led to highly unbalanced belt width because of the large number of data points in the steady state region, whereas the weighted cost function (Additional file 1: Figure S2B) led to relatively balanced belt width by assigning larger weights to data points in dynamic regions and lower weights to the data points in flat regions.

To examine the sensitivity of the weights in the 12-parameter model, we used the same 100 experimental datasets introduced in the previous example (Fig. 5a), and calculated weights based on the 100 datasets. The variation of the weights is shown in Additional file 1: Figure S3. Comparing Fig. 5b and Additional file 1: Figure S3, we can see that weights of the data points in this 12-parameter model are not as robust as the weights in the previous 6-parameter model. This is caused by the higher complexity of the 12-parameter model, making the parameter estimation component of the iterative algorithm to overfit to the noise and subsequently lead to different weights.

#### Unevenly spaced time points

In the 6-parameter and 12-parameter models above, the experimental data points are evenly spaced along the time axis. As shown in Fig. 3b and Additional file 1: Figure S1, data point in different time periods receive different weights, and the magnitudes of weights decrease as time increases, indicating that the data points located at later time points may be redundant. Therefore, to reduce the effect of these redundant data points, we manually selected an unevenly spaced time points as the experimental observations. Data points in dynamic regions are densely sampled, while data points in steady state region are sparsely sampled. The selected time points are 0, 5, 10, 15, 20, 30, 40, 50, 75, 100, 125, 150, 175, 200, 300, 400, 600, and 800. With the measurement time points changed, we generated an experimental dataset by randomly perturbing noise-free simulation of the model, as shown in Fig. 6a where the circles represent the unevenly spaced measurement time points and the simulated experimental observations. Using this data, the iterative algorithm was performed to obtain the weights shown in Fig. 6b. In contrast to the weights in the previous examples, all data points except the first data point (*t*=0) receive very similar weights regardless of the region, dynamic or steady state. This is because the measurement time points are selected strategically make the data points roughly equally important, so that redundancy among the data points is reduced.

Using the sampling algorithm, we visualized the model predictions of acceptable parameters for these two cost functions, under unevenly spaced time points. As shown in Additional file 1: Figure S4, the belts associated to the two cost functions are quite similar to each other. This is due to the low variation among the weights shown in Fig. 6b, which makes the weighted cost function (Additional file 1: Figure S4B) and the equal-weight cost function (Additional file 1: Figure S4A) almost equivalent to each other. This example shows that the proposed weighted cost function can also be achieved by strategically selecting measurement points to avoid redundancy in the resulting experimental data.

To test the sensitivity of the iterative algorithm, we randomly simulated 100 experimental datasets, shown in Additional file 1: Figure S5A. We applied the iterative algorithm to compute weights based on each experimental dataset. Additional file 1: Figure S5B describes the variations among 100 sets of weight, where weights for the majority of the simulated datasets are close to “1” (0 in log scale), corresponding to the equal-weight cost function.

### MAPK module

To test the proposed weighted cost function in a more complex model, the MAPK module was considered [25]. This module consists of 5 variables and 9 model parameters. The ordinary differential equations of this module are depicted in Eq. (8), where *X* represents MAPK, *XE* represents the complex of *X* with the enzyme *E*, *XP* is singly phosphorylated form of MAPK, *XPE* is complex of *XP* with the enzyme, and *XPP* is doubly phosphorylated form [25]. In this example, we assume that the concentration of the enzyme *E* is initially 0.01, changes to 10 at t=1, and changes back to 0.01 at t=5. When the enzyme concentration is increased at t=1, the dynamic variables either increase or decrease, in respond to the enzyme change. To generate the experimental data, we used the model parameters in [25] as the true underlying parameters. All 9 parameters and 5 initial conditions were considered as unknown model parameters, thus the total number of parameters is 14.

$$ {\begin{aligned} \frac{d}{dt}[X]&=-K_{1}[X]E+ K_{2}[XE]+K_{7}[XP] \\ \frac{d}{dt}[XE]&=K_{1}[X]E-(K_{2}+k_{3})[XE] \\ \frac{d}{dt}[XP]&=K_{3}[XE]-K_{7}[XP]-K_{4}[XP]E+K_{5}[XPE]+K_{8}[XPP] \\ \frac{d}{dt}[XPE]&=K_{4}[XP]E-(K_{5}+K_{6})[XPE] \\ \frac{d}{dt}[XPP]&=K_{6}[XPE]-K_{8}[XPP] \end{aligned}} $$

(8)

Figure 7a shows the simulated noise-free data generated from the true parameter and noisy experimental data obtained by randomly perturbing the noise-free data. The measurement time points are evenly spaced, every 0.5 h from 0 to 10 h. Therefore, the total number of experimental data is 105 (21 data points for each variable). When the catalyzing enzyme *E* concentration was changed at t=1 and t=5, the dynamic variables responded to the changesu. For example, the concentration of *X* decreased rapidly after t=1, while the concentrations of remaining four variables increased. As the catalyzing enzyme *E* decreased back at t=5, all variables returned to the initial condition gradually.

Using the iterative algorithm, we calculated weights for the data points, shown in Fig. 7b. Similar to the previous models, the initial data point at t=0 receives the largest weight because it is directly related to some of the unknown model parameters. The data points in the dynamic region (from t=1 to t=2) of all five variables receive the large weights. Since all five variables exhibit little dynamics from t=2 to t=5, weights of data points in these flat regions are relatively small. After t=5, weights of *X*, *XP*, *XPE*, and *XPP* slightly increased because their model predictions are changed dynamically, whereas the concentration of the *XE* barely changed and hence its data points after t=5 received small weights.

Figure 8 shows the results of the sampling algorithm for both cost functions. In this example, the acceptance threshold was defined as five times the cost value of the optimal parameter setting. In Fig. 8a, equal-weight cost function, the belt of variable *XE* (the second row) is quite thick in the dynamic region and thin in the flat region. This imbalanceness of acceptable model predictions between dynamic and flat regions is consistent with results in the previous examples. In Fig. 8b, the corresponding belt of variable *XE* generated with the weighted cost function is much thinner in the dynamic region, compared to the equal-weight cost function. This is because the dynamic regions receive larger weights than the flat regions. For all five dynamic variables, the belt width (variation in acceptable model predictions) from the weighted cost function is smaller than or equal to that from the equal-weight cost function.

To test the noise sensitivity of the weights in the MAPK module, we randomly generated 100 noisy time series data, as shown in Fig. 9a. We applied the iterative algorithm to compute weights starting from each of the 100 times series data. The resulting weights are shown in Fig. 9b. The first measurement time points of all variables consistently receive high weight, because they directly reveal the initial condition parameters. The variation of each weight across the 100 noisy datasets is small compared to the variation of weights across different data points, indicating robustness of the algorithm with respect to noise.