Dynamic estimation of specific fluxes in metabolic networks using non-linear dynamic optimization
- Dominique Vercammen^{1},
- Filip Logist^{1} and
- Jan Van Impe^{1}Email author
https://doi.org/10.1186/s12918-014-0132-0
© Vercammen et al.; licensee BioMed Central. 2014
Received: 31 March 2014
Accepted: 13 August 2014
Published: 3 December 2014
Abstract
Background
Metabolic network models describing the biochemical reaction network and material fluxes inside microorganisms open interesting routes for the model-based optimization of bioprocesses. Dynamic metabolic flux analysis (dMFA) has lately been studied as an extension of regular metabolic flux analysis (MFA), rendering a dynamic view of the fluxes, also in non-stationary conditions. Recent dMFA implementations suffer from some drawbacks, though. More specifically, the fluxes are not estimated as specific fluxes, which are more biologically relevant. Also, the flux profiles are not smooth, and additional constraints like, e.g., irreversibility constraints on the fluxes, cannot be taken into account. Finally, in all previous methods, a basis for the null space of the stoichiometric matrix, i.e., which set of free fluxes is used, needs to be chosen. This choice is not trivial, and has a large influence on the resulting estimates.
Results
In this work, a new methodology based on a B-spline parameterization of the fluxes is presented. Because of the high degree of non-linearity due to this parameterization, an incremental knot insertion strategy has been devised, resulting in a sequence of non-linear dynamic optimization problems. These are solved using state-of-the-art dynamic optimization methods and tools, i.e., orthogonal collocation, an interior-point optimizer and automatic differentiation. Also, a procedure to choose an optimal basis for the null space of the stoichiometric matrix is described, discarding the need to make a choice beforehand. The proposed methodology is validated on two simulated case studies: (i) a small-scale network with 7 fluxes, to illustrate the operation of the algorithm, and (ii) a medium-scale network with 68 fluxes, to show the algorithm’s capabilities for a realistic network. The results show an accurate correspondence to the reference fluxes used to simulate the measurements, both in a theoretically ideal setting with no experimental noise, and in a realistic noise setting.
Conclusions
Because, apart from a metabolic reaction network and the measurements, no extra input needs to be given, the resulting algorithm is a systematic, integrated and accurate methodology for dynamic metabolic flux analysis that can be run online in real-time if necessary.
Keywords
Background
Metabolic network models describing the biochemical reaction network and material fluxes inside microorganisms open interesting routes for the model-based optimization of bioprocesses. The estimation of these fluxes is called metabolic flux analysis (MFA). Based on measurements of exchange fluxes between the environment and the cell, and possibly thermodynamic, physiological, statistical [1] or loop-law constraints [2] and/or measurements from ^{13}C labeling experiments, an accurate estimate of the full set of fluxes can be obtained (e.g., [3],[4]). Dynamic metabolic flux analysis (dMFA) has lately been studied as an extension of regular MFA, rendering a dynamic view of the fluxes, also in non-stationary conditions [5]. Recent dMFA implementations suffer from some drawbacks, though. In this work, a new methodology based on a B-spline parameterization of the fluxes is presented. These are estimated using state-of-the-art dynamic optimization methods and tools, i.e., orthogonal collocation, an interior-point optimizer and automatic differentiation. The resulting algorithm is also fully contained, resolving the fact that in previous methods, a choice of the set of free fluxes was required. As will be shown, the choice of this set has a significant influence on the resulting estimates, highlighting the need for a more reasoned determination of this set. In this algorithm this set of free fluxes is chosen optimally, alleviating the need for an a priori (non-optimal) choice and improving the estimates.
In the Background section, the basics of metabolic reaction networks are covered, together with the derivation of the dMFA model structure. Existing implementations of dMFA are described, along with their features and drawbacks. The main contribution of this work is in the Methods section. Here, the proposed methodology of incremental flux estimation using B-splines is described. This methodology is validated on the case studies in the Results and discussion section. Finally, the Conclusions section summarizes the main results of this work.
Metabolic reaction networks
Modeling of intracellular dynamics
with t the time [h], c _{int} the (m _{int}×1) vector of intracellular concentrations [mmol/gDW], S _{int} the m _{int} rows of the stoichiometric matrix which correspond to the intracellular metabolites, v the (n×1) vector of specific fluxes [mmol/gDW/h] and μ the scalar specific growth rate of the organism [1/h] which equals the flux of the pseudo-reaction to biomass. The first term on the right hand side is the reaction term, the second term is a dilution term which arises due to the growth of the biomass. To get a fully defined model, expressions for all intracellular fluxes are needed.
Multi-scale model for microbial dynamics
with c _{ext} the (m _{ext}×1) vector of extracellular concentrations [mmol/L], c _{bio} the scalar biomass concentration [gDW/L], S _{ext} the m _{ext} rows of the stoichiometric matrix which correspond to the extracellular metabolites and ${\mathbf{\text{s}}}_{\text{bio}}^{T}$ the row of the stoichiometric matrix which corresponds to the biomass pseudo-metabolite. This system is for the description of concentration evolution in a bioreactor operating in batch-mode, but of course transport terms can be added to make it suitable for fed-batch or continuous operation of bioreactors. It is important to notice that c _{ext} and c _{bio} are defined per liter of medium, while c _{int} is defined per gram of biomass dry weight. As the fluxes are specific, i.e., defined per gram of biomass dry weight, which is more descriptive from a kinetics point of view, the fluxes are multiplied with the biomass concentration in Equations (2) and (3). Again, to fully define this model structure, expressions for all fluxes are needed. These expressions are typically estimated from experimental data.
Simplifying the multi-scale model: assuming a pseudo steady-state
Note that to fully define this simplified model, only expressions for the free fluxes need to be identified, resulting in a substantial reduction in experimental and numerical cost.
with S _{e} of size (m _{ext}+1×n), and ${\mathbf{\text{q}}}_{\text{bio}}^{T}$ a (1×m _{ext}+1) row vector which selects the last element of x, the biomass concentration, i.e., ${\mathbf{\text{q}}}_{\text{bio}}^{T}=\left[0\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}0\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}\dots \phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}0\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}1\right]$. This more concise representation will be used in the remainder of this text.
Dynamic metabolic flux analysis
with n _{time} the total number of time points at which measurements were taken and n _{out} the number of outputs of the system. This system includes algebraic states z(t), which represent the irreversible fluxes. By constraining these states to be positive, the irreversible fluxes are also kept positive. Furthermore, y(t) is the (n _{out}×1) vector of outputs of the system, which can be any non-linear function f of the states and free fluxes, and y _{ j }(t _{ i }) is the model output j at time t _{ i }. The objective function is a weighted sum of least squares, with m _{ ij } and σ _{ ij } respectively the average and the standard deviation for the measurement of output j at time point t _{ i }, and x _{0} is the vector of initial values for the states, which is typically also an optimization variable.
This problem has been treated in literature in different ways. Antoniewicz [9] identifies four approaches for dMFA. (i) The first approach [10] divides the experimental time domain in metabolic phases, after which in each phase a classical, static MFA problem is solved, based on averaged measurements of exchange fluxes. This method does not produce time-resolved fluxes. (ii) In another approach [11], the measurements themselves are approximated by spline functions, which are then differentiated, resulting in a set of extra flux measurements. These measurements are then used to estimate the (free) fluxes using a series of standard, static MFA problems at different points in time. This approach is easy to use, but presents a number of disadvantages [5]. The most important disadvantage is the fact that every set of data is fitted with its own, independent set of parameters, disregarding the correlation between the different measurements in the estimation process itself. Due to this independent estimation, it is also not possible to use a consistent criterion for assessing the goodness of the fit, as not all data are taken into account in every estimation problem. Furthermore, by representing the dynamic problem as a series of disconnected, static problems, important information on the dynamic nature of the system is lost. Further information loss also occurs when taking derivatives of the spline functions. Also the dynamic, possibilistic framework of Llaneras et al. [12] can be categorized in this class. In this framework, dynamic extracellular concentration measurements are taken into account into a possibilistic MFA strategy by approximating the derivatives of the concentrations. Again, the need for numerical differentiation is an important drawback of also this methodology. (iii) Most of these drawbacks have been overcome using the approach described in [5]. In this method, the free fluxes are not estimated as specific fluxes, but are combined with the biomass concentration to non-specific free fluxes. These are then parameterized as piecewise linear functions, which ascertains that the dynamic system can be solved analytically, resulting in a non-dynamic, non-linear parameter estimation problem. This approach, unfortunately, introduces some new drawbacks, i.e., the fact that specific fluxes, which are most descriptive from a biological kinetics point of view, cannot be estimated directly, again resulting in a loss of information when these need to be calculated, the non-smoothness of the flux profiles because of the piecewise linear description, and the fact that the irreversibility constraints on the fluxes are not taken into account. (iv) The most recent approach, called Dynamic Flux Estimation (DFE) [13], uses power-law or Michaelis-Menten kinetic functions to describe the fluxes, which must be a priori postulated. This kinetic information is not yet available for all metabolic reactions in a range of environmental conditions, so an approach which does not need these kinetic functions is preferred. The interested reader is referred to [9] for a review of applications of these four classes of methods for dMFA.
The approach which is presented in this work addresses the disadvantages of previous methods by establishing the true non-linear, dynamic nature of the dMFA problem and using state-of-the-art tools for solving this kind of problems. More specifically, the dynamic optimization problem is solved using (i) direct collocation on finite elements to obtain a finite dimensional optimization problem, and (ii) automatic differentiation to calculate exact first and second order information. The smoothness of the free flux profiles is ensured by using B-spline functions of second order. B-splines have already been used to discretize the state variables for parameter estimation in biological models [14], but not yet to discretize the fluxes in dMFA models. By using all data at once in the parameter estimation process, the goodness-of-fit of the resulting model can be assessed in a consistent way. Furthermore, no knowledge of the kinetics of the different metabolic reactions is needed. However, if this information is available, it can easily be integrated in this methodology in all possible functional forms, including non-linear expressions, as the problem is solved as a non-linear optimization problem.
Methods
This section is organized in the following way. First, the dMFA problem is transformed into a dynamic parameter estimation problem by use of the B-spline parameterizations. This problem is then further discretized by applying the orthogonal collocation technique, rendering the definition of an NLP subproblem for a fixed number of internal knots in the different spline functions. An adaptive, incremental algorithm to generate a sequence of these subproblems is then defined, in which the number of internal knots is systematically increased, and the experimental horizon is elongated until the full experiment is described. After this, an extension of the algorithm is described in which the K matrix is chosen optimally, i.e., in such a way that the sum of squared errors (SSE) is minimized. This extension makes the algorithm fully integrated, once a network is chosen and measurements are provided. Finally, the determination of the confidence algorithms for the flux estimates is outlined.
Parameterization of the free fluxes using B-splines
However, based on the recursive definition, only g+2−(k+1)=g−k+1 basis functions can be defined from the g+2 knots. This means that 2k extra knots need to be added to fully define the spline function. This is usually done by adding k knots at the beginning and at the end of the knot sequence, equal to the starting and ending knot, respectively [15],[16]. For k=2, e.g., the total sequence of knots is t _{0},t _{0},t _{0},t _{1},…,t _{ g },t _{ g+1},t _{ g+1},t _{ g+1}. Based on this total knot vector and the order, the spline functions can be efficiently evaluated using the Cox-de Boor algorithm [17].
In the final sequential algorithm, three main operations on the splines are used. The first operation is constraining a knot to the specific measurement interval it is in. This is done to prevent knots from straying too far from their initial optimal location. The second operation is inserting a knot at the end of a specified time frame. In this operation, knot insertion, a feature inherent to B-splines, is used to insert a knot without changing the spline profile. This way, the next optimization can be started from a good initial guess, with an extra knot inserted. This operation also takes into account the bounds which were placed on previously added knots, i.e., the knot is inserted in the time frame at the end, where no knot has yet been inserted. The last operation prolongates the splines, which only changes the ending knot to the new value. This means that the spline profile is slightly changed, because the spline parameters stay the same, but is still close to the previous profile.
The vector g contains the number of internal knots for the d free fluxes, which are integer variables. The vector t _{knot} contains the internal knot locations for all free fluxes, i.e., this vector contains n _{g} elements, with ${n}_{\text{g}}=\sum _{i=1}^{d}{g}_{i}$. The last vector p _{u} contains all spline parameters or control points for all free fluxes. The number of elements in this vector equals n _{g}+d·(k+1). In total, there are d integer parameters, the numbers of internal knots, and 2·n _{g}+d·(k+1) continuous parameters, the knot locations and spline parameters, to estimate.
Formulation of the dynamic estimation problem
Discretization of the dynamic parameter estimation problem using collocation
The resulting dynamic optimization problem must be discretized in some way to be able to solve it [18]. In this work, direct collocation on finite elements was chosen [19]. For a full overview on the direct collocation method, the reader is referred to [18]. For the methodology in this work, cubic Lagrange polynomials were chosen, with collocation points situated at the Radau roots. The finite element borders were chosen at the time points of the measurements.
with p _{x} the collocation variables, including the initial values for the states.
Dimensions of the resulting non-linear programming problems
Variables | |
---|---|
Differential state variables p _{x} | 4·(n _{time}−1)·(m _{ext}+1) |
Algebraic state variables p _{z} | 4·(n _{time}−1)·n _{irr} |
Spline parameters p _{u} | n _{g}+d·(k+1) |
Internal knot locations t _{knot} | n _{g} |
Initial values x _{0} | m _{ext}+1 |
K matrix values | n·d |
Equality constraints | |
Differential state collocation constraints | 3·(n _{time}−1)·(m _{ext}+1) |
Differential state continuity constraints | (n _{time}−2)·(m _{ext}+1) |
Algebraic state collocation constraints | (3·(n _{time}−1)+1)·n _{irr} |
Algebraic state continuity constraints | (n _{time}−2)·(n _{irr}) |
Initial value constraints | m _{ext}+1 |
K null space constraints | m _{int}·d |
K orthogonality constraints | $\frac{d\xb7(d+1)}{2}$ |
The resulting NLP (31) is solved using the interior-point optimization routine IPOPT [20]. Gradient, Jacobians and Hessian are generated exactly using automatic differentiation with CasADi [21].
Adaptive incremental free flux estimation
as otherwise the denominator can become zero or negative. In the remainder of this text, A I C _{c} is indicated just as AIC, for simplicity.
After selecting the correct number of timepoints, the polynomials are fitted and the optimal AIC value is saved as F, together with the optimal splines U. Then, it is checked if a new knot can be inserted based on the number of measurements available at this point. If so, d optimization problems are generated, in which every time one knot is inserted into one free flux spline at a time. The three problems are solved, and the minimum AIC value over these problems is saved as A I C ^{∗}, along with the corresponding optimal splines U ^{∗}, in which there is now one knot in one of the splines. Now, two possibilities arise. If A I C ^{∗} is smaller than F, a new, better solution is found, and another knot can be added using the same steps just described. If A I C ^{∗} is higher than F, however, the old optimum was better than the new one, and so the old one is kept. At this point, an optimal solution for this number of time points has been found, and a new time point can be added if there is still one left to be added. After adding the time point, the splines are prolongated, and the prolongated problem is solved to get the new starting values for F and U for the next iteration. Once all time points have been added, U contains the final, optimal set of free flux profiles for the specified dataset.
Although the Akaike criterion can compare different models, it cannot assess the absolute goodness-of-fit. To this end, a χ ^{2}-test is also executed on the resulting model [24]. The resulting model is accepted if the variance weighted sum of squared errors is smaller than the critical χ ^{2}-value for the specified confidence level (95%) and number of degrees of freedom, which is the number of measurements minus the number of independent parameters.
This methodology ensures that each optimization problem is initialized with an excellent initial guess for the knot locations and spline parameters, rendering shorter optimization times and convergence to at least a decent local minimum in each iteration.
Choice of the null space basis K
with I a (d×d) identity matrix. In this last constraint, only the diagonal and one of the two off-diagonal parts are independent, since K ^{ T }·K is symmetric. In total, n·d variables are added (all elements of K), along with m _{int}·d null space constraints and $\frac{d\xb7(d+1)}{2}$ orthogonality constraints, rendering $\frac{d\xb7(d-1)}{2}$ extra degrees of freedom for the optimization.
Formulation of the non-linear estimation problem with an optimal K
Determination of confidence bounds on the estimated flux profiles
After the optimal model is estimated, uncertainty of the parameters and free flux profiles is estimated using a Monte Carlo bootstrapping methodology [25]. The Monte Carlo approach is used frequently in MFA studies [26]. In this work, the method is used because of the non-linearity and constraints in the optimization problem. In this case, the Fisher information approach can give highly different confidence intervals because of these non-linearities and bounds. Based on the assumption of normally distributed measurements with known variances, 1000 sets of measurements were sampled from these distributions and for each set of measurements, parameters were estimated, resulting in 1000 sets of parameter values and 1000 free flux profiles. 95% confidence intervals were generated by sorting these parameters and taking the 2.5^{th} and 97.5^{th} percentiles as respectively lower and upper confidence bounds.
Results and discussion
This results section is structured in the following way. First, the small-scale case study is presented, along with the network, simulated measurements and reference fluxes used to simulate these measurements. For the small-scale network, different cases with and without measurement noise, and with different K matrices, are introduced. Then, a detailed description of the iterations made by the algorithm for one case is given, to further clarify the operation. After this, the results of all cases in the small-scale case study are shown and discussed, highlighting the most important features of this methodology. Then, the medium-scale case study is treated, again by showing the network, simulated measurements and reference fluxes. Finally, the results for this case study are presented, with both the fixed and optimal (free) K matrices, along with an analysis of the computational complexity for both case studies.
Description of the small-scale case study
The methodology was executed for 4 different K matrices: (i) the one corresponding to free fluxes 1, 4 and 5, i.e., the same one as was used to generate the simulated measurements, (ii) the one corresponding to free fluxes 3, 6 and 7, of which free fluxes 3 and 6 are reversible, as opposed to the first case where all free fluxes are irreversible, (iii) an orthonormal basis obtained through the Matlab command null(S _{int}), and (iv) a variable basis which is optimized using the additional constraints as described before. These cases are referred to as 145, 367, orthonormal and optimal, respectively. In total, the algorithm was run 8 times: 4 times for the different cases with low noise, and 4 times for the different cases with the realistic noise realization.
All reactions, all stoichiometric matrices, the irreversibility matrix, the null space basis matrices, and the simulated measurements and measurement variances for this case study are given in the Additional files 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12.
Description of the algorithm iterations
Overview of the iterations done by the incremental flux estimation algorithm for the 145 case
Iter. no. | n _{time} | n _{p} | AIC before insertion | Min. AIC | Comment |
---|---|---|---|---|---|
1 | 4 | 13 | 208.0 | — | No new knot possible |
2 | 5 | 13 | 86.7 | 150.0 | No better minimum |
... | ... | ... | ... | ... | ... |
7 | 10 | 13 | 41.0 | 50.3 | No better minimum |
8 | 11 | 15 | 58.6 | 48.0 | Knot inserted in flux 1, interval 7 |
15 | 48.0 | 57.9 | No better minimum | ||
9 | 12 | 17 | 88.2 | 58.4 | Knot inserted in flux 1, interval 9 |
17 | 58.4 | 65.5 | No better minimum | ||
10 | 13 | 19 | 1562.6 | 1400.7 | Knot inserted in flux 1, interval 10 |
21 | 1400.7 | 100.5 | Knot inserted in flux 1, interval 11 | ||
23 | 100.5 | 85.8 | Knot inserted in flux 3, interval 8 | ||
23 | 85.8 | 100.4 | No better minimum | ||
11 | 14 | 23 | 81.1 | 93.8 | No better minimum |
12 | 15 | 23 | 78.2 | 89.2 | No better minimum |
13 | 16 | 23 | 78.8 | 85.5 | No better minimum |
14 | 17 | 25 | 92.0 | 82.4 | Knot inserted in flux 3, interval 12 |
25 | 82.4 | 92.3 | No better minimum | ||
15 | 18 | 25 | 80.1 | 88.9 | No better minimum |
... | ... | ... | ... | ... | ... |
18 | 21 | 25 | 79.9 | 84.1 | No better minimum |
For this network, n _{out}=4 and d=3, so the number of time points needed to start is 4, based on Equation (39). In the first iteration, basically three second degree polynomials are fitted for each of the three fluxes. At this point it is not yet possible to insert a knot because after insertion the number of parameters would be 11, and Equation (38) would not be satisfied anymore. So a new time point is added at the end, and the problem is solved again for this extended dataset. Now, it is possible to insert a knot. Three subproblems are generated, one for each flux spline in which a knot is inserted. The minimum AIC for these subproblems (150.0) is however bigger than the previous one (86.7), so no new minimum is found, and the dataset is extended again. This same pattern goes on until iteration 8, which is shown in more detail in Figure 7. In step 8A, the prolongated problem of the previous iteration is solved, giving the base value for F (58.6), and the starting set of free flux profiles U for this iteration. The dashed lines indicate the profiles before optimization, i.e., after the prolongation step of iteration 7, and the solid lines are the profiles after optimization. Step 8B consists of three optimizations, one for each free flux. In step 8B _{1}, a knot is inserted into the spline corresponding to free flux 1, which, as is shown by the dashed lines, does not change the profile of this flux. The profiles are then used as good starting values for the optimization, after which again the solid profiles result. The optimal location of this knot is 7.94. This same procedure is repeated in steps 8B _{2} and 8B _{3}, with insertion in free flux 2 and 3 respectively. After this, the minimum AIC value over steps 8B _{1}, 8B _{2} and 8B _{3} is collected (48.0), along with the corresponding profiles. In this case, the minimum is found when inserting a knot in free flux 1. Since the minimum AIC value is also lower than F, the profiles from step 8B _{1} form the starting values for a new round of insertions, after a constraint has been added that constrains the location of the newly added knot to its corresponding measurement interval, in this case between 7.0 and 8.0. This new round of insertions does not yield a better minimum (57.9). This means the profiles after step 8B _{1} are the best possible in this iteration, and these profiles are prolongated in step 8C. As can be seen in the figure, the prolongation changes the profiles slightly, but they are still good starting values for the base optimization of iteration 9 (step 9A), yielding an F value of 88.2. Because of the knot that is already present in step 9B _{1}, the new knot for free flux 1 is inserted after time 8.0. The new minimum (58.4) is again found after insertion in free flux 1, and a second insertion in this iteration does not result in an improvement (65.5), so the profiles after step 9B _{1} are prolongated in step 9C. The algorithm ends in iteration 18, because the full dataset is used at that point, rendering the optimal flux profiles for the full experiment.
The fact that this procedure results in a good sequence of starting values for the different optimization problems is clearly shown in Figure 6. The dashed lines are again the profiles before each iteration, the solid lines are the profiles after the iteration. Because of the high degree of non-linearity resulting from the free knot locations, these good starting values are essential for the efficient estimation of the fluxes.
Results in the low noise setting
These figures indicate that, except in the orthonormal case, the methodology gives an accurate representation of the fluxes in the theoretically optimal low noise setting. The big differences with the reference profiles in the orthonormal case are mainly due to numerical problems, since in this low noise setting, a lot of knots are inserted (about 10 per free flux), resulting in harder optimization problems and longer optimization times. Although the other methods for the choice of K can cope with these difficulties, this is apparently not the case for the orthonormal method. The integral numbers for the other three methods are very close, so there is no clear difference between them in this low noise setting.
Results for the realistic noise realization
Results for the different cases: the intervals where knots are inserted for the three free fluxes, values for SSE and AIC and the number of parameters
Choice of K | Knot intervals for free flux | SSE | AIC | n _{p} | ||
---|---|---|---|---|---|---|
1 | 2 | 3 | ||||
1, 4 and 5 | 7, 9, 10, 11 | — | 8, 12 | 7.49 | 79.9 | 25 |
3, 6 and 7 | 10, 11, 12 | 10, 11, 12 | 7, 8, 10, 11 | 49.50 | 160.4 | 33 |
Orthonormal | 8, 11, 12 | 9, 10, 11 | 7, 8, 10, 11 | 14.41 | 125.3 | 33 |
Optimal | 7, 9, 10, 11 | — | 8, 12 | 1.33 | 86.9 | 28 |
First of all, it is clear that the algorithm puts the knots in locations where they are most needed. In the 145 case, four knots are chosen for flux 1, which exhibits the highest degree of curvature, two knots for flux 5, and no knots for flux 4, which has the flattest profile. Fluxes 3, 6 and 7, on the other hand, all exhibit profiles with a much higher degree of curvature, resulting in a higher total number of knots for the 367 case. Both the SSE and the number of parameters are higher for this case, resulting in a higher AIC value. Based on these observations, a possible strategy for choosing the basis K could be to choose free fluxes with a “flat” profile, i.e., with a low degree of curvature. This is in practice not feasible as the flux profiles are of course not known beforehand. A possible solution would be to choose a basis, estimate the fluxes, and choose a new basis based on the results of the estimation. For this to work, one would need a way of quantifying the degree of curvature of a function, which is not trivial. It would also be computationally very demanding, as the full estimation procedure would have to be repeated a number of times.
The choice of a random orthonormal basis is also clearly not satisfactory. For this case, the knot locations cannot be displayed anymore on the flux profiles, as the free fluxes are in this case linear combinations of the different fluxes. In the final, optimal case, from the set of possible orthonormal bases, the one which minimizes the goodness-of-fit is chosen during the course of the algorithm, introducing a minimal number of extra parameters for the optimization. The estimated profiles for this case, as well as for the 145 case, accurately resemble the reference profiles, although the integral numbers indicate that the optimal case is slightly better. In the optimal case, however, three extra parameters are introduced, resulting in a lower SSE, but a slightly higher AIC value and computational time. This is only a small penalty, though, not outweighing the benefits of having an accurate estimate using only one run of the algorithm, and the fact that the algorithm is completely contained, as the user does not have to make any choices once the measurements and the details of the network are given. In this simulated case study, the results for the 145 case are probably better because the same basis was used to simulate the measurements. In a real-life setting, though, there is no generating set of free fluxes, and a choice regarding this set of free fluxes cannot be made. Thus, the addition of the determination of the optimal K basis is a welcome addition in real-life settings, as the optimal basis is always found, without any choices required by the user. The operation of the algorithm in optimal K mode thus makes the algorithm fully contained.
Description of the medium-scale case study
The medium-scale network was adapted from [5], and all reactions are available in the supplementary data of that work. A few alterations were made, though. The pseudo-reaction of the glucose feed to the glucose in the medium (reaction 59 in [5]), was removed, as this flux was considered constant and thus does not need to be estimated. Furthermore, reaction 41 in [5] (42 in Additional file 13), threonine to glycine, was set to be reversible, as otherwise no biomass could be formed, and also reaction 61 in [5] (66 in Additional file 13), the citrate exchange reaction, was considered reversible. This resulted in a network with in total 68 fluxes, 62 intracellular metabolites and 10 extracellular metabolites (glucose, glycerol, ammonia, sulphate, citrate, 1,3-propanediol, acetate, carbon dioxide, oxygen and biomass). The number of free fluxes is 6, and these were chosen to be (numbering according to Additional file 13) flux 62 (glycerol exchange), 63 (glucose uptake), 65 (ammonia uptake), 66 (citrate exchange), 67 (acetate uptake) and 68 (oxygen uptake). Furthermore, there are 44 irreversible fluxes.
All reactions, all stoichiometric matrices, the irreversibility matrix, the null space basis matrix, and the simulated measurements and measurement variances for this case study are given in the Additional files 13, 14, 15, 16, 17, 18, 19, 20, 21 and 22.
Results for the medium-scale network
The estimation of the fluxes in the medium-scale network was carried out with both a fixed K matrix, again the same one as used to simulate the measurements, and an optimized K matrix. The estimated fluxes for the fixed K are shown in Figure 16, the results for the optimal K matrix are shown in Figure 17. Also, the fitted measurements for both cases are presented in Figures 14 and 15. It is clear that the proposed methodology is also able to estimate the fluxes in a larger network successfully. The confidence bounds are in this case, however, much wider than in the small-scale case. This is due to the larger amount of measurement noise added to the simulated measurements. Due to the minimization of the AIC criterion, this is the best possible fit while at the same time keeping the uncertainty as low as possible. Also, the difference between the estimates with a fixed and an optimal K matrix is again very small. This again confirms the fact that the algorithm can be successfully used in the optimal K mode, i.e., without making an a priori choice on the free fluxes, as the results are very similar in the two cases. This is an important advantage of this method as in practice the set of free fluxes is not known.
Computational complexity of the algorithm
CPU times for the two case studies, both with fixed K and optimized K
Case study | K matrix | Running time in seconds |
---|---|---|
Small-scale | Fixed | 31.2 |
Small-scale | Optimal | 51.0 |
Medium-scale | Fixed | 494.0 |
Medium-scale | Optimal | 1771.4 |
Conclusions
In this contribution, a novel systematic methodology for dynamic metabolic flux analysis, based on B-spline parameterizations, has been presented. Because of the high degree of non-linearity in the estimation of the knot locations, an incremental knot insertion algorithm is proposed. By using this algorithm, at least an excellent local minimum is found. Furthermore, the algorithm is fully contained, as the user does not have to make any choices regarding the null space basis of the intracellular stoichiometric matrix. This methodology tackles the disadvantages of previous methods for dMFA by making sure that the estimates are smooth, that specific fluxes are estimated and that extra constraints can be taken into account. The algorithm is validated on a small-scale simulated case study in both a low noise and a realistic noise setting. In both cases, an accurate dynamic estimation of the fluxes is obtained. The algorithm was also tested on a more realistic network with 68 fluxes and 6 free fluxes. Although CPU times are longer, mainly due to a larger number of measurements, the algorithm was also able to successfully estimate the fluxes in this larger case study. The algorithm can still be run in real-time as biological processes are typically slow. To keep CPU times under control, the total number of measurements should be reduced if possible, as the CPU time per iteration tends to grow exponentially over the iterations. This can be done, e.g., by only considering time horizons which are of specific interest to the researcher. Possible further improvements of the algorithm, mainly in the regions of parallellization and subproblem initialization, are indicated.
Additional files
Declarations
Acknowledgements
Dominique Vercammen holds a PhD grant of the Agency for Innovation through Science and Technology in Flanders (IWT). Jan Van Impe holds the chair Safety Engineering sponsored by the Belgian Chemistry and Life Sciences Federation essenscia. The research was supported by the KU Leuven Research Fund: PFV/10/002 (OPTEC), OT/10/035; the Research Foundation Flanders (FWO): FWO KAN2013 1.5.189.13, FWO-G.0930.13 and the Belgian Federal Science Policy Office: IAP VII/19 (DYSCO).
Authors’ Affiliations
References
- Carinhas N, Bernal V, Teixeira AP, Carrondo MJT, Alves PM, Oliveira R: Hybrid metabolic flux analysis: combining stoichiometric and statistical constraints to model the formation of complex recombinant products . BMC Syst Biol. 2011, 5: 34-10.1186/1752-0509-5-34.PubMed CentralView ArticlePubMedGoogle Scholar
- Noor E, Lewis NE, Milo R: A proof for loop-law constraints in stoichiometric metabolic networks . BMC Syst Biol. 2012, 6: 140-10.1186/1752-0509-6-140.PubMed CentralView ArticlePubMedGoogle Scholar
- Zamorano F, Vande Wouwer A, Bastin G: A detailed metabolic flux analysis of an underdetermined network of CHO cells . J Biotechnol. 2010, 150 (4): 497-508. 10.1016/j.jbiotec.2010.09.944.View ArticlePubMedGoogle Scholar
- Fischer E, Zamboni N, Sauer U: High-throughput metabolic flux analysis based on gas chromatography–mass spectrometry derived 13C constraints . Biotechnol Bioeng. 2001, 76 (2): 144-56. 10.1002/bit.1154.View ArticleGoogle Scholar
- Leighty RW, Antoniewicz MR: Dynamic metabolic flux analysis (DMFA): a framework for determining fluxes at metabolic non-steady state . Metab Eng. 2011, 13 (6): 745-55. 10.1016/j.ymben.2011.09.010.View ArticlePubMedGoogle Scholar
- Van Impe JF, Vercammen D, Van Derlinden E: Toward a next generation of predictive models: a systems biology primer . Food Control. 2013, 29 (2): 336-42. 10.1016/j.foodcont.2012.06.019.View ArticleGoogle Scholar
- Llaneras F, Picȯ J: Stoichiometric modelling of cell metabolism . J Biosci Bioeng. 2008, 105 (1): 1-11. 10.1263/jbb.105.1.View ArticlePubMedGoogle Scholar
- Stephanopoulos GN, Aristidou AA, Nielsen J: Metabolic Engineering: Principles and Methodologies . 1998, Academic Press, San Diego, USAGoogle Scholar
- Antoniewicz MR: Dynamic metabolic flux analysis - tools for probing transient states of metabolic networks . Curr Opin Biotech. 2013, 24: 973-8. 10.1016/j.copbio.2013.03.018.View ArticlePubMedGoogle Scholar
- Lequeux G, Beauprez J, Maertens J, Van Horen E, Soetaert W, Vandamme E, Vanrolleghem PA: Dynamic metabolic flux analysis demonstrated on cultures where the limiting substrate is changed from carbon to nitrogen and vice versa . J Biomed Biotechnol. 2010, 621645: 1-19. 10.1155/2010/621645.View ArticleGoogle Scholar
- Niklas J, Schrȧder E, Sandig V, Noll T, Heinzle E: Quantitative characterization of metabolism and metabolic shifts during growth of the new human cell line AGE1.HN using time resolved metabolic flux analysis . Bioproc Biosyst Eng. 2011, 34 (5): 533-45. 10.1007/s00449-010-0502-y.View ArticleGoogle Scholar
- Llaneras F, Sala A, Picȯ J: Dynamic estimations of metabolic fluxes with constraint-based models and possibility theory . J Process Contr. 2012, 22 (10): 1946-55. 10.1016/j.jprocont.2012.09.001.View ArticleGoogle Scholar
- Chou I, Voit EO: Estimation of dynamic flux profiles from metabolic time series data . BMC Syst Biol. 2012, 6: 84-10.1186/1752-0509-6-84.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhan C, Yeung LF: Parameter estimation in systems biology models using spline approximation . BMC Syst Biol. 2011, 5: 14-10.1186/1752-0509-5-14.PubMed CentralView ArticlePubMedGoogle Scholar
- Dierckx P: Curve and Surface Fitting with Splines. New York . 1993, Oxford University Press, USAGoogle Scholar
- Van Loock W, Pipeleers G, De Schutter J, Swevers J: A convex optimization approach to curve fitting with, B-splines . Proceedings of the 18th IFAC World Congress: 28 August - 2 September 2011; Milano, Italy . Edited by: Bittanti S, Cenedese A, Zampieri S. 2011, International Federation of Automatic Control, Milano, Italy, 2290-5.Google Scholar
- de Boor C: A Practical Guide to Splines . 2001, Springer, New York, USAGoogle Scholar
- Biegler LT: Nonlinear Programming: Concepts, Algorithms, and Applications to Chemical Processes . 2010, MPS-SIAM, Philadelphia, USAView ArticleGoogle Scholar
- Biegler LT: Solution of dynamic optimization problems by successive quadratic programming and orthogonal collocation . Comput Chem Eng. 1984, 8: 243-248. 10.1016/0098-1354(84)87012-X.View ArticleGoogle Scholar
- Wächter A, Biegler LT: On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming . Math Program. 2006, 106 (1): 25-57. 10.1007/s10107-004-0559-y.View ArticleGoogle Scholar
- Andersson J, Åkesson J, Diehl M: CasADi – A symbolic package for automatic differentiation and optimal control . Recent Advances in Algorithmic Differentiation. Lecture Notes in Computational Science and Engineering. Volume 87 . Edited by: Forth S, Hovland P, Phipps E, Utke J, Walther A. 2012, Springer, Berlin Heidelberg, 297-307.View ArticleGoogle Scholar
- Burnham KP, Anderson DR: Multimodel inference: understanding AIC and BIC in model selection . Social Method Res. 2004, 33: 261-304. 10.1177/0049124104268644.View ArticleGoogle Scholar
- Motulsky H, Christopoulos A: Fitting Models to Biological Data Using Linear and Nonlinear Regression: A Practical Guide to Curve Fitting . 2004, Oxford University Press, New York, USAGoogle Scholar
- Snedecor GW, Cochran WG: Statistical Methods, Eighth Edition . 1989, Iowa State University Press, Ames, USAGoogle Scholar
- Efron B, Tibshirani R: An Introduction to the Bootstrap . 1994, Chapman and Hall/CRC, London, UKGoogle Scholar
- Schellenberger J, Zielinski DC, Choi W, Madireddi S, Portnoy V, Scott DA, Reed JL, Osterman AL, Palsson BØ: Predicting outcomes of steady-state ^{13}C isotope tracing experiments using Monte Carlo sampling . BMC Syst Biol. 2012, 6: 9-10.1186/1752-0509-6-9.PubMed CentralView ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.