### Synthetic time series

The proposed method was tested on synthetic time series generated by reference test models [11, 18, 19] of 2, 4, and 5 state variables (Equations 2, 3, and 4 respectively). Each system was simulated with different initial concentrations of its variables in order to imitate different biological stimulus-response experiments as described in [18]. All specifications of the simulations with different initial conditions can be found in Additional file 1.

In all three case studies, no knowledge about the pathway was assumed and all parameters were considered freely variable. Even so, the correct network topology was extracted in all cases, with a mean error magnitude of 10^{-5} for each numerically integrated state variable.

The 2-dimensional system

\begin{aligned}
\dot{X}_1 &= 3X_2^{-2} - X_1^{0.5}X_2\\
\dot{X}_2 &= X_1^{0.5}X_2 - X_2^{0.5}
\end{aligned}
\qquad (2)

exhibits oscillatory behavior that is challenging for estimation purposes and causes difficulties for standard algorithms in finding good solutions. The reason is that even a small shift in oscillation phase between the dynamics of the estimated system and the true target system results in significant cumulative errors. By contrast, the 4-dimensional system

\begin{aligned}
\dot{X}_1 &= 12X_3^{-0.8} - 10X_1^{0.5}\\
\dot{X}_2 &= 8X_1^{0.5} - 3X_2^{0.75}\\
\dot{X}_3 &= 3X_2^{0.75} - 5X_3^{0.5}X_4^{0.2}\\
\dot{X}_4 &= 2X_1^{0.5} - 6X_4^{0.8}
\end{aligned}
\qquad (3)

(see Figure 1 for the corresponding pathway) is relatively well behaved and will be used to identify problems that are likely to emerge even for the inference of less complicated dynamic models. The third system (Equation 4) describes an artificial genetic network and has been used as a benchmark [11, 18, 20] for S-system inference algorithms.

\begin{aligned}
\dot{X}_1 &= 5X_3X_5^{-1} - 10X_1^{2}\\
\dot{X}_2 &= 10X_1^{2} - 10X_2^{2}\\
\dot{X}_3 &= 10X_2^{-1} - 10X_2^{-1}X_3^{2}\\
\dot{X}_4 &= 8X_3^{2}X_5^{-1} - 10X_4^{2}\\
\dot{X}_5 &= 10X_4^{2} - 10X_5^{2}
\end{aligned}
\qquad (4)
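For reference, the test systems can be reproduced by direct numerical integration. Below is a minimal sketch for the 2-dimensional system (Equation 2) using SciPy; the time span of [0, 10] is an assumption for illustration (the actual simulation specifications are given in Additional file 1), while the initial conditions *X*_{1}(0) = 3 and *X*_{2}(0) = 1 are those used for the grid search described later.

```python
import numpy as np
from scipy.integrate import solve_ivp

def s_system_2d(t, x):
    """Right-hand side of the 2-dimensional S-system (Equation 2)."""
    x1, x2 = x
    dx1 = 3.0 * x2**-2.0 - x1**0.5 * x2
    dx2 = x1**0.5 * x2 - x2**0.5
    return [dx1, dx2]

# 100 evenly spaced samples, matching the number of data points
# used in the optimization experiments; the span [0, 10] is assumed.
t_eval = np.linspace(0.0, 10.0, 100)
sol = solve_ivp(s_system_2d, (0.0, 10.0), [3.0, 1.0],
                t_eval=t_eval, rtol=1e-8, atol=1e-10)

X = sol.y.T                                                  # time series, shape (100, 2)
S = np.array([s_system_2d(t, x) for t, x in zip(sol.t, X)])  # true slopes
```

Because the exponents include negative kinetic orders, the integration relies on the state variables remaining strictly positive, which they do for these initial conditions.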

The results of the algorithm on the 2-, 4-, and 5-dimensional systems, presented in Additional file 1, demonstrate that the proposed method retrieves the correct parameter values for noise-free time series. Three different data sets were created for each test system (Equations 2, 3 and 4) using different initial conditions in the system's numerical integration (see Additional file 1). These three data sets allowed us to assess the ability of the algorithm to deal with different time series dynamics. Using each data set, we performed 10 trials for each system variable (*X*_{i}). The runs differed in the random initial guess for *β* (see the *Initial parameter guesses* section for the initialization of the kinetic order values), which was chosen from the range [0.1, 12]. The search space for kinetic orders was limited to a reasonable range of [-2, 3], which is consistent with collective experience in the field (see Chapter 5 in [4]).

As an example, the experiment with the 5-dimensional system on the first data set illustrates the success rate of the algorithm: the exact parameter values were found for all variables in all trials except for variable *X*_{5} in one of the trials. Notably, for the 5-dimensional system the proposed algorithm found the correct parameter set, overcoming the problematic identification of the kinetic orders *g*_{32} and *h*_{32} of the state variable *X*_{3} reported for most algorithms in the literature.

The procedure is computationally efficient, requiring 3 minutes to perform 40 optimizations for the 4-dimensional system (10 optimizations for each state variable, or approximately 5 seconds per case) on a personal computer with a 2.00 GHz processor and 1 GB RAM. Thanks to the numerical decoupling, the complexity of the algorithm is of the order *O*(*M* · *N*), where *M* is the number of state variables and *N* is the number of data points used in the optimization. All experiments were performed with 100 data points. If a stop criterion is defined as a value of 10^{-12} for the sum of the squared errors between the slopes of the optimized system and the true slopes, the time required to identify the parameters of the 5-dimensional system was 23 sec on the machine described above. An experiment with a 10-dimensional system was also performed; the total time consumed was 75 sec (see Additional file 1).

Similar results were achieved with the optimization of the 2-dimensional system. Importantly, the correct parameter set was found, although not with the same regularity as in the 4- and 5-dimensional system optimizations. Difficulties in finding the correct solutions appeared to be caused by a combination of features of the system: the position of the optimal point within the feasible parameter space, which in the 5-variable case is situated right on the border of the infeasible region (see Figure 1 of Additional file 1); multiple local minima; and the particular choice of initial parameter guesses. These peculiarities of the algorithm and the problem itself lead to different parameter values, although the errors of the decoupled and integrated systems remain small (typically on the order of 10^{-5}; see, for instance, Tables 23, 29 and 30 in Additional file 1).

The proposed algorithm calculates the initial guesses for the kinetic orders as close to zero as possible, given an initial *β* value (see section *Initial parameter guesses*). However, in this specific case study, near-zero values of the kinetic orders *h*_{11} and *h*_{12} for the rate constant *β*_{1} = 1 fall into the infeasible parameter region, which complicates the parameter optimization. For instance, the smallest feasible value for *h*_{12} is 0.8636. The proposed algorithm overcomes this initial problem by adjusting itself, and it returns correct solutions when the system is rescaled in time [21]. This rescaling is most easily achieved by multiplying the alphas (*α*_{1} and *α*_{2}) and betas (*β*_{1} and *β*_{2}) by a positive factor (see example in Additional file 1), which enlarges the feasible parameter space. This step is, in fact, equivalent to multiplying the slope vector by the same positive number. Thanks to the modularity of the decoupled system, the scaling can be performed separately for each state variable without affecting the kinetic order values. Only the rate constants change, and they are easily recovered by dividing them by the scaling factor. It was observed that this strategy often, but not always, enhances algorithmic performance; it appears to help most when the rate constants have small values.
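The equivalence between rescaling the rate constants and rescaling the slope vector can be verified numerically. The sketch below uses the first equation of the 2-dimensional system (Equation 2) with hypothetical sample points; only the rate constants are scaled, while the kinetic orders are untouched.

```python
import numpy as np

# Slope of the first equation of the 2-dimensional system (Equation 2):
# alpha1 = 3 with kinetic orders (0, -2); beta1 = 1 with orders (0.5, 1).
def slope_1(x1, x2, alpha1, beta1):
    return alpha1 * x2**-2.0 - beta1 * x1**0.5 * x2

rng = np.random.default_rng(0)
x1 = rng.uniform(0.5, 3.0, 100)   # hypothetical sample points
x2 = rng.uniform(0.5, 3.0, 100)

c = 10.0                          # positive scaling factor
s = slope_1(x1, x2, 3.0, 1.0)                  # original slopes
s_scaled = slope_1(x1, x2, c * 3.0, c * 1.0)   # alpha and beta scaled by c

# The scaled slope vector is exactly c times the original one; the
# original rate constants are recovered by dividing the fitted ones by c.
```

Because each decoupled equation can be scaled independently, this trick applies per state variable without disturbing the others.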

Initially, all experiments were performed with noise-free time series; in a second set of experiments, we added noise. Because the proposed algorithm uses the decoupled, algebraic form, a signal extraction procedure was employed for the noisy data to provide smooth time series and slopes [22]. The results show that combining the two strategies (smoother and proposed algorithm) generates accurate dynamical responses for the case studies used in this report (Figure 2).

### Error surfaces of decoupled S-systems

To explore the results of the proposed algorithm visually and to investigate patterns of convergence, we performed a grid search on the parameters of the 2-dimensional system (Equation 2). Specifically, we searched a 100 × 100 grid in which each point represented the kinetic orders *h*_{11} and *h*_{12} over the range [-2.5, 2.0]. Correspondingly, 100 time points for *X*_{1} and *X*_{2} and their corresponding slopes *S*_{1} and *S*_{2} were generated by numerical integration of the 2-dimensional system (Equation 2) with *X*_{1}(0) = 3 and *X*_{2}(0) = 1 as initial conditions. Methods described in a later section were applied to the time series of *X*_{1} and *X*_{2} to calculate the regression matrix *L*; then, for each given initial value of the rate constant *β*_{1} (uniformly spaced over the interval [1, 6]) and for each point of the grid, the error surface for the variable *X*_{1} was constructed. The algorithm started with the degradation term D{T}_{1}={\beta}_{1}{X}_{1}^{{h}_{11}}{X}_{2}^{{h}_{12}} for the first grid point, using the given value for *β*_{1} and the time series points for *X*_{1} and *X*_{2}. Subsequently, the production vector *Vp*_{1} = [log(*α*_{1}), *g*_{11}, *g*_{12}] was obtained from the slope vector *S*_{1}, the regression matrix *L*, and the degradation term *DT*_{1} via Equations (7)–(10). Once all parameter values for variable *X*_{1} in the production and degradation vectors were determined, the estimated slopes were calculated as {\widehat{S}}_{1}=P{T}_{1}-D{T}_{1}, and the logarithm of the sum of squared errors between these slopes and the target slopes was computed as \mathrm{error}=\mathrm{log}\left(\sum {({S}_{1}-{\widehat{S}}_{1})}^{2}\right). This process was repeated for all points on the grid, yielding an error surface for each *β*_{1} value. In this manner, ten surfaces were constructed using different *β*_{1} values; they are shown superimposed in Figure 3.
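Since Equations (7)–(10) are not reproduced in this excerpt, the following is only a sketch of one plausible form of the regression step: for a fixed degradation term *DT*_{1}, the production parameters follow from a linear regression of log(*S*_{1} + *DT*_{1}) on [1, log *X*_{1}, log *X*_{2}], and a grid point is infeasible whenever *S*_{1} + *DT*_{1} has a non-positive entry. Hypothetical random sample points (not the integrated series) and a single *β*_{1} value are used here; all names are illustrative.

```python
import numpy as np

def error_at_gridpoint(beta1, h11, h12, x1, x2, s1):
    """Error of the decoupled equation for X1 at one (h11, h12) grid point.

    Returns np.nan for infeasible points, where the argument of the
    logarithm (s1 + degradation term) is not strictly positive.
    """
    dt1 = beta1 * x1**h11 * x2**h12          # degradation term DT1
    y = s1 + dt1                             # must be > 0 to take logs
    if np.any(y <= 0):
        return np.nan                        # infeasible grid point
    L = np.column_stack([np.ones_like(x1), np.log(x1), np.log(x2)])
    coef, *_ = np.linalg.lstsq(L, np.log(y), rcond=None)
    log_alpha1, g11, g12 = coef              # production vector Vp1
    pt1 = np.exp(log_alpha1) * x1**g11 * x2**g12   # production term PT1
    s1_hat = pt1 - dt1                             # estimated slopes
    return np.log(np.sum((s1 - s1_hat) ** 2))

# synthetic samples consistent with X1's true equation:
# alpha1 = 3, g = (0, -2); beta1 = 1, h = (0.5, 1)
rng = np.random.default_rng(1)
x1 = rng.uniform(0.5, 3.0, 100)
x2 = rng.uniform(0.5, 3.0, 100)
s1 = 3.0 * x2**-2.0 - x1**0.5 * x2

# 100 x 100 grid of (h11, h12) over [-2.5, 2.0] for one beta1 value;
# sweeping beta1 over [1, 6] would yield the ten superimposed surfaces
hs = np.linspace(-2.5, 2.0, 100)
surface = np.array([[error_at_gridpoint(1.0, h11, h12, x1, x2, s1)
                     for h12 in hs] for h11 in hs])
```

The `nan` entries of `surface` correspond to the unfilled regions of the plotted surfaces, and the minimum lies near the true kinetic orders (0.5, 1).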

The first observation is that most of the search region is infeasible (the unfilled *X-Y* space), even though there is *a priori* no hint that solutions in the open range should not converge. In retrospect, these are regions where the argument of the logarithm on the right side of Equation 7 is negative, owing to negative slope values. Also worth noting is that a similarly shaped surface ("bowl") was found for each *β*, but not all surfaces have the same minimum point (Figures 3 and 4). This information will be of critical importance in the discussion of the convergence profile of the proposed method.

The same strategy was applied to noisy time series, resulting in a new set of surfaces (data not shown). Gaussian noise with 15% variance was added to the *X*_{1} and *X*_{2} time series, and a refined Whittaker's filter [22] was used to smooth the data and estimate slopes.
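The smoothing step can be illustrated with a minimal Whittaker-type smoother, which balances fidelity to the data against a roughness penalty on the d-th differences; the penalty weight `lam` and the sine test signal below are illustrative choices, not the settings of [22].

```python
import numpy as np

def whittaker_smooth(y, lam=100.0, d=2):
    """Whittaker-type smoother: minimize ||y - z||^2 + lam * ||D z||^2,
    where D is the d-th order difference matrix.  Dense version for
    clarity; sparse matrices are preferable for long series."""
    n = len(y)
    D = np.diff(np.eye(n), n=d, axis=0)      # d-th difference operator
    A = np.eye(n) + lam * D.T @ D
    return np.linalg.solve(A, y)

rng = np.random.default_rng(2)
t = np.linspace(0.0, 2.0 * np.pi, 200)
clean = np.sin(t)                            # hypothetical smooth signal
noisy = clean + rng.normal(scale=0.15, size=t.size)

smooth = whittaker_smooth(noisy, lam=100.0)
slopes = np.gradient(smooth, t)              # slope estimates from the smoothed series
```

Smoothing before slope estimation matters because numerical differentiation amplifies noise; the decoupled algorithm then consumes the smoothed series and slopes directly.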

The error surfaces obtained using noisy data (Figure 5) exhibit the same shapes as those for the noise-free data, except that the average error is higher and the global minimum is shifted; the shifted minimum is, however, essentially indistinguishable in value from the local optima (see Additional file 1 for details).

### Convergence problems

It would be unreasonable to assume that the algorithm converges to the global optimum under all imaginable conditions and initial settings: no estimation algorithm for nonlinear systems can – or should be expected to – measure up to such a high standard. For instance, if the ranges of initial guesses are changed, or if the number of initial guesses is reduced, the algorithm may converge to an acceptable local minimum that is, however, not global. This is not surprising, given the complicated nature of the error surfaces of realistic systems and the fact that nonlinear systems often exhibit almost flat, banana-shaped or ellipsoidal valleys in which the minimum is centered [23–27]. At this point, a comprehensive picture of potential obstacles to convergence is not available. One prominent reason for lacking or faulty convergence is that some problems are ill-posed, for instance because of collinearity between columns of the regression matrix *L*. This situation occurs when two or more metabolites have similar dynamics [25] or when at least one variable is essentially constant and is therefore collinear with the first column of the *L* matrix. In these and some other cases, the regression matrix *L* has a high condition number, which the proposed procedure flags. It might be possible to remedy some of these ill-posed problems with a regularization algorithm for multiple linear regression and by redesigning the algorithm around the regularized solution. In any event, it seems advisable to remove model redundancies, for instance by pooling or eliminating collinear variables, or by merging essentially constant variables with the rate constants of the corresponding term.
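A minimal sketch of the condition-number flag and a ridge fallback is given below; the threshold, ridge weight, and function name are hypothetical, and two metabolites with essentially identical dynamics are simulated to trigger the ill-posed branch.

```python
import numpy as np

def fit_production_log(X, y_log, cond_limit=1e8, ridge=1e-6):
    """Least-squares fit of log-linear production parameters
    [log(alpha), g1, g2], with a condition-number flag and a ridge
    fallback for ill-posed (e.g., collinear) regression matrices."""
    L = np.column_stack([np.ones(X.shape[0]), np.log(X)])
    cond = np.linalg.cond(L)
    if cond > cond_limit:
        # ill-posed: regularized normal equations instead of plain lstsq
        A = L.T @ L + ridge * np.eye(L.shape[1])
        coef = np.linalg.solve(A, L.T @ y_log)
    else:
        coef, *_ = np.linalg.lstsq(L, y_log, rcond=None)
    return coef, cond

# two metabolites with essentially identical dynamics -> collinear columns
rng = np.random.default_rng(3)
x1 = rng.uniform(0.5, 3.0, 50)
x2 = x1 * (1.0 + 1e-9 * rng.normal(size=50))   # near-copy of x1
X = np.column_stack([x1, x2])
y_log = np.log(2.0) + 0.5 * np.log(x1) + 0.25 * np.log(x2)

coef, cond = fit_production_log(X, y_log)
```

With collinear columns, only the sum of the two kinetic orders is well determined; the ridge solution recovers that sum while keeping the individual coefficients finite.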

### Parameter estimation of constrained networks

The proposed method was extended to address parameter identification for systems with topological constraints. This extension allows the algorithm to account for precursor-product relationships, which mandate that the degradation term of the precursor be identical to the production term of the product [28]. Thus, instead of optimizing the parameters for each metabolite separately, a set of terms is optimized simultaneously, consisting of one of the parameter vectors (production or degradation) of each metabolite. As a simple illustrative example, consider a linear pathway with feedback, where constraints between the production and degradation terms of subsequent metabolites must be taken into account (Figure 6). Specifically, in the example system the efflux from *X*_{1} is identical to the influx into *X*_{2}, and the efflux from *X*_{2} is identical to the influx into *X*_{3}. Consequently, the degradation term of *X*_{1} is exactly the same as the production term of *X*_{2}, and the degradation term of *X*_{2} must be the same as the production term of *X*_{3}. The amendment of the proposed method toward simultaneous estimation readily satisfies these types of constraints.
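The idea of sharing a term between consecutive equations can be sketched with a hypothetical two-variable chain (not the Figure 6 system): the degradation term of *X*_{1} reappears as the production term of *X*_{2}, so its parameters enter both residual vectors and are fitted once in a single combined optimization.

```python
import numpy as np
from scipy.optimize import least_squares

# hypothetical 2-variable chain:
#   dX1 = a1 - b1*X1^h1          (degradation of X1)
#   dX2 = b1*X1^h1 - b2*X2^h2    (same term, now production of X2)
def residuals(p, x1, x2, s1, s2):
    a1, b1, h1, b2, h2 = p
    shared = b1 * x1**h1                 # one term, used in both equations
    r1 = s1 - (a1 - shared)
    r2 = s2 - (shared - b2 * x2**h2)
    return np.concatenate([r1, r2])      # combined residual vector

rng = np.random.default_rng(4)
x1 = rng.uniform(0.5, 3.0, 80)
x2 = rng.uniform(0.5, 3.0, 80)
true = np.array([2.0, 1.5, 0.5, 1.0, 0.75])    # a1, b1, h1, b2, h2
s1 = true[0] - true[1] * x1**true[2]
s2 = true[1] * x1**true[2] - true[3] * x2**true[4]

fit = least_squares(residuals, x0=[1.0, 1.0, 0.1, 1.0, 0.1],
                    args=(x1, x2, s1, s2))
```

Because the shared term appears only once in the parameter vector, the precursor-product constraint is satisfied by construction rather than enforced as a penalty.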

The extended algorithm was applied to the 3-dimensional linear pathway system in Figure 6, and some of the results are shown in Additional file 1. The algorithm found the correct parameter set, and all 10 optimizations, each of which now combines all variables into a single optimization that accounts for the constraints, were completed in 37 sec on a 2.00 GHz processor with 1 GB RAM.

### Graphical user interface

An open source MATLAB toolbox and a stand-alone compiled Graphical User Interface (GUI) application were developed as an exploratory tool (see Section *Availability and requirements*). The application was developed as a modular extension of our previous work and constitutes a critical component within our long-term effort of advancing a data processing pipeline for S-system estimation from metabolomic time series [13, 22]. A snapshot of the GUI is shown in Figure 7. All computational results and graphics described in this report can be reproduced using this application.