 Software
 Open Access
DMPy: a Python package for automated mathematical model construction of largescale metabolic systems
 Robert W. Smith†^{1, 2},
 Rik P. van Rosmalen†^{1},
 Vitor A. P. Martins dos Santos^{1, 2} and
 Christian Fleck^{1}Email author
Received: 18 December 2017
Accepted: 14 May 2018
Published: 19 June 2018
Abstract
Background
Models of metabolism are often used in biotechnology and pharmaceutical research to identify drug targets or increase the direct production of valuable compounds. Due to the complexity of large metabolic systems, a number of conclusions have been drawn using mathematical methods with simplifying assumptions. For example, constraintbased models describe changes of internal concentrations that occur much quicker than alterations in cell physiology. Thus, metabolite concentrations and reaction fluxes are fixed to constant values. This greatly reduces the mathematical complexity, while providing a reasonably good description of the system in steady state. However, without a large number of constraints, many different flux sets can describe the optimal model and we obtain no information on how metabolite levels dynamically change. Thus, to accurately determine what is taking place within the cell, finer quality data and more detailed models need to be constructed.
Results
In this paper we present a computational framework, DMPy, that uses a network scheme as input to automatically search for kinetic rates and produce a mathematical model that describes temporal changes of metabolite fluxes. The parameter search utilises several online databases to find measured reaction parameters. From this, we take advantage of previous modelling efforts, such as Parameter Balancing, to produce an initial mathematical model of a metabolic pathway. We analyse the effect of parameter uncertainty on model dynamics and test how recent fluxbased model reduction techniques alter system properties. To our knowledge this is the first time such analysis has been performed on large models of metabolism. Our results highlight that good estimates of at least 80% of the reaction rates are required to accurately model metabolic systems. Furthermore, reducing the size of the model by grouping reactions together based on fluxes alters the resulting system dynamics.
Conclusion
The presented pipeline automates the modelling process for large metabolic networks. From this, users can simulate their pathway of interest and obtain a better understanding of how altering conditions influences cellular dynamics. By testing the effects of different parameterisations we are also able to provide suggestions to help construct more accurate models of complete metabolic systems in the future.
Keywords
Background
Quantitative modelling of metabolic networks has helped design and improve the production of compounds relevant for the bioindustrial and pharmaceutical sectors [1, 2]. Many of the favoured methods to model large metabolic networks relate system inputs (e.g. growth conditions) to phenotypic outputs (for example, growth rate or compound secretion) using directed graphs, i.e. internal dynamics are not directly considered [3, 4]. However, to obtain a more detailed understanding of how system perturbations or alterations influence metabolic networks and, hence, the observed and predicted phenotypes, it is desirable to have a method of constructing consistent kinetic models of largescale metabolism. Here, we present a computational pipeline that brings together a range of recently published methods to convert a genomescale reaction network into a detailed mathematical model, usable to study and predict metabolic functions [5].
where x is the vector of metabolite concentrations, S is the stoichiometry matrix of the system detailing how one metabolite converts into another, and v(x,k) is the flux vector that describes the rate of metabolite conversion. Notably, the fluxes will generally depend on the concentrations of metabolites in the network and the parameters, or kinetic rates, k. Methods to solve Eq. 1 have approached the problem from two directions. First, by using the quasisteadystate assumption, the metabolite concentrations are believed to be constant relative to the time taken for a cell to grow [4, 6]. This method requires data such as growth and production rates of the species under study [7, 8]. Furthermore, gene expression and essentiality data, or metabolic flux measurements (for example from ^{13}Clabelling experiments), can be used to further constrain and validate the model. Second, by using mathematical functions (based on generalised MichaelisMenten kinetics) to describe v(x,k) with measured or estimated values of k and approximating the system solution numerically [9–11]. The model simulations can then be compared to measured timeseries profiles of metabolite concentrations. We shall briefly review both approaches here.
Constraint based metabolic models
which is a set of linear equations that can be solved for an optimal vector v when the system is optimized to, for example, maximize biomass production (although there are a variety of other functions, as shown in [7]). This method is commonly referred to as Flux Balance Analysis (FBA) [4, 6, 12]. Notable examples highlighting the utility of FBA approaches can be found in [3, 13–15] whilst [16] have recently used FBA to obtain a genomescale model of the human metabolic network, which has seen use in applications such as the discovery of anticancer drugs [17, 18]. Over recent years, a number of extensions to the FBA method have been developed. However, systematic analysis of these methods has shown that there is not one single bestperforming method that provides a reasonable match between simulated and measured reaction fluxes [2, 7, 8].
One notable development of FBA is dynamic FBA (dFBA) that aims to match timedependent changes in system outputs with internal flux dynamics [19–21]. In principle, dFBA is a set of FBA computations conducted independently at several timepoints. This results in a set of dynamically changing optimal flux vectors that cover the analysed timeperiod. However, such an approach can be problematic as the solution of FBA is often nonunique, i.e. there are multiple optimal flux vectors that can solve Eq. 2 for a given set of objective functions [22, 23]. Consequently, discontinuities in the flux vector can be observed when v is plotted against time, which is suggestive of a jump from one optimal flux vector to another [20, 21]. One method of solving this issue is to constrain the flux optimisation such that the timedependent change from one flux vector to another is not allowed to be large, thus limiting the search space for the optimisation routine [19, 20, 24]. The addition of constraints such as these enforces the reaction fluxes of a network to change continuously through time, in a similar manner to trajectories obtained from Eq. 1, but it is not clear how the trajectories depend on reaction rates or concentrations.
Dynamic metabolic models
with u_{ l } representing the enzyme concentration for reaction R_{ l } and \(\mathbf {k} =\allowbreak \{k^{cat+},\allowbreak k^{cat},\allowbreak k^{M}_{a},\allowbreak k^{M}_{b},\allowbreak \boldsymbol {\alpha },\allowbreak \boldsymbol {\beta }\}\), where k^{ cat } are the catalytic rates, k^{ M } are the Michaelis constants and α_{ i } (or β_{ j }) the reaction stoichiometry associated with concentration a_{ i } of reactant A_{ i } (or b_{ j } of product B_{ j }) [9]. What one should notice immediately is that, for even a simple reaction, a large number of kinetic rates need to be known or accurately estimated to ensure that Eq. 3 matches observed data (at least 4 rates for a reaction with a single reactant and product). Thus, whereas the pitfall of dFBA is that it requires a large number of system constraints to provide an accurate solution, the downside to kinetic approaches is that a vast amount of data is required to accurately parameterise models of large metabolic networks.
Parameter balancing
Fortunately, an increasing number of experimentallymeasured parameters required for the use of MichaelisMenten approximations are becoming available in online databases (for example BRENDA [29], SABIORK [30] and eQuilibrator [31]). Parenthetically, even if only in vitro estimates of kinetic rates are available, in certain cases the relationship to their in vivo value has been shown [32]. Furthermore, the Parameter Balancing (referred to as PB from hereon) method has been developed to utilise Bayesian inference techniques and include constraints on thermodynamic relationships between different parameters [33, 34]. Thus, one could obtain either measured or realistic estimates for a number of the parameters required to construct a kinetic model and simulate changes in metabolite concentration. Examples of such steps can be found in [1, 11], whilst [28] have shown that relatively small models constructed with fluxes given by Eq. 3 together with measured or estimated parameter values provides better matches to data than other tested functions.
Model reduction
The examples of [1, 11] show two different desired cases of modelling metabolism. In the first instance, [1] construct a detailed model describing the central metabolic pathways of L. lactis whilst, in the second, [11] produce a genomescale model of yeast metabolism. In principle, the conclusions drawn from larger models should be consistent with those of smaller, more detailed systems and vice versa when created in the same species. Thus one important consideration is that of model reduction. Based on current methods, model reduction could occur at one of three stages: pruning unimportant reactions directly from the genomescale network [35], grouping subsets of reactions into a single effective reaction on the subsystem of interest [36], or by assessing the kinetic rates of the system using timescale separation arguments fixing a subset of system components to constant values [37]. Notably, the methods of [35, 36] require only the reaction network and no information about the dynamics of the pathway in order to reduce the system size.
DMPy
Here, inspired by the workflows of [5, 11], we present an automated pipeline that can translate a (genomescale) reaction network into a dynamic reaction equation model. Given the network as an input, our pipeline automatically integrates measured kinetic rates from different sources (both online databases and measured or estimated values) with the PB technique, optionally reduces model size and, finally, translates the network into ordinary differential equations using Eq. 3. Importantly, this method differs from other (semiautomatic) methods of constructing ODE models (such as CellNetAnalyzer and COPASI [38, 39]) as parameters are obtained from readilyavailable experimental measurements rather than including an optimisation step to timeseries data that can prove very difficult for largescale systems. By generating simulated data of both the L. lactis central metabolic pathway and randomly generated reaction networks, we go on to analyse the accuracy of our pipeline and determine how many kinetic rates must be measured experimentally to obtain an accurate model. We also show the effect of fluxbased model reduction techniques on the resulting system dynamics, extensions to larger networks, and the utility of our pipeline by including compartmentalisation and regulation within metabolic pathways. The presented framework is intended to provide a first approximation of largescale metabolic dynamics within which further details and computational methods can be added as more data and information come to light. In the Discussion we will highlight how our pipeline can be extended to improve the resulting models (by incorporating different model reduction and parameter optimisation strategies) if the appropriate datasets are available.
Implementation and methods
Pipeline
In this section we shall provide an overview of the computational workflow. For further implementation details and required inputs please refer to Additional file 1. All simulations and testing were performed using Python version 2.7 (Python Software Foundation, www.python.org) and MATLAB R2012b (MathWorks, Massachusetts, USA).
In the following we shall discuss each stage of the pipeline and its purpose.
Parameter gathering (steps 14, Fig. 1)
Each of these pathways are exhaustively examined (Step 4, Fig. 2) until all the useful experimental measurements have been obtained. Full provenance and, if available in the original database, literature references are saved and can be manually examined if needed, for example, in the case of conflicts between identifiers.
Parameter balancing (step 5, Fig. 1)
Upon obtaining measured kinetic rates from online sources (and combining these with experimental measurements), a table of found values and their reference is automatically constructed, which is used as an input into the PB software. In order to improve the scalability of the parameter balancing algorithm and to support the parameter balancing of large genome scale models, the parameter balancing algorithm was implemented directly into the pipeline based on the original implementation by Lubitz et al. [34]. Here we shall briefly review the PB method and further details can be found in [33, 34, 41].
where k^{cat+} is the forward catalytic rate, k^{cat−} is the backwards catalytic rate, \(k_{j}^{V} = \sqrt {k_{j}^{cat+}k_{j}^{cat}}\) (s^{1}) is the geometric mean of catalytic rates, h_{ j } is the cooperativity factor for sigmoidal kinetics, n_{ ij } is the stoichiometric coefficient of metabolite i in reaction j, R is the gas constant (J ·mol^{1} ·K^{1}) and T is the temperature in Kelvin [34].
where ΔG is the chemical potential through a reaction [34]. The concentrations, c, obtained from the balancing routine are used as initial conditions when simulating the resulting ordinary differential equations (see below).
Mathematical translation (step 6, Fig. 1)
with x_{ A } representing the vector of components that activate the reaction and x_{ I } the vector of components that inhibit the reaction [33, 47]. Here, f_{ reg } is known as allosteric reaction regulation, whilst D_{ reg } is specific reaction regulation [33]. Thus, Eqs. 5 and (6) represent a generalised version of Eq. 3.
where v_{ f } and v_{ r } are the sum of forward and reverse fluxes, respectively. In the case of irreversible reactions, their forward or reverse rate can be fixed to zero to remove these fluxes. To include the effect of compartmentalisation one needs to appropriately rescale concentrations within the different subsystems. In the pipeline this can be done by assigning the metabolite to a compartment directly in the initial SBML file, which will be preserved in the final output model and can subsequently be compensated for by the numerical integration tool of choice.
Simulation (step 7, Fig. 1)
Now that we have a set of ordinary differential equations and kinetic rates to determine their functions, we can simulate the metabolic models given a set of initial conditions. Furthermore, system inputs can either be fixed constant or perturbed during simulation (e.g. a pulse of glucose uptake). Simulations were conducted using libRoadRunner [48] and Scipy [49]. As has been discussed previously, due to the size and stiffness of large genomescale ordinary differential equation models, numerical instabilities can make simulation difficult as processes occur on different timescales [5]. In all the presented results, we simulate the test models for 400s, providing a pulse in the level of one of the metabolites after 200s (Glyceraldehyde3Phosphate for the L. lactis and E. coli models). The initial conditions for the simulations are obtained from the PB routine, using the c’s in Eq. 4. The parameter values used are the median values obtained from each individual parameter distribution.
Model reduction (optional step)
The optional model reduction step is currently employed before Step 1 of our pipeline. To do this, we used the NetworkReducer algorithm that has been implemented in the MATLAB toolbox CellNetAnalyzer [35, 39]. NetworkReducer decreases the size of metabolic networks by iteratively removing the reaction with the lowest flux variability determined using Flux Variability Analysis (FVA) [24]. The iterative process ends when no more reactions can be removed without violating the behaviours of the full model (i.e. specific growth or production rates on a predefined medium). Finally, a compression step is used that compresses linear pathways into a single effective reaction. We applied both the pruning measure that removes reactions with the smallest range of possible fluxes whilst maintaining phenotypes and protected pathways, and the network compression step that lumps reactions from connected pathways to the subsystem of interest into a single reaction. For details on how the algorithm achieves this, please see [35]. In the analysed networks, we set parts of the glycolysis pathway to be protected, whilst a resulting growth rate within 1% of the growth rate obtained from the full model had to be maintained. The growth rate of the full system was approximated using COBRApy [50].
Test models
To highlight the utility of our pipeline, we will illustrate our analysis for multiple systems. In order to fully analyse the accuracy of the pipeline, we have used the central metabolic pathway of L. lactis [1]. The initial mathematical model of this system is already in the correct form of Eq. 7 for most of the reactions and provides ranges for kinetic rates within the system that we can include in our pipeline. To show that the pipeline can also be used for larger scale systems, we use the E. coli core and the iJO1366 genome scale models [51, 52] in combination with the optional reduction steps.
In order to show how our automated search for kinetic rates improves the number of parameters found from online databases, we also included the S. cerevisiae model iTO977 and the iJO1366 E. coli model that is well annotated relative to other species [52, 53].
Finally we have included randomly generated reaction networks in order to facilitate testing effects of including regulation and compartmentalization. See Additional file 1 for details on the generation of these models.
Pipeline analysis tests

how many times the posterior distributions need to be sampled before there is convergence in the mean square error between the samples and the simulated ‘gold standard’ data;

the amount of prior distributions that need to be known to obtain minimal differences between the model obtained from the pipeline and simulated data, and;

the effect of altering the width of the posterior distribution sampled from.
Convergence was tested by taking the standard error of deviation relative to the mean score. After an initial period of 25 samples, the simulation was marked as converged when \((\sigma _{s} / \sqrt {n}) / \mu _{s} < 0.05\), where σ_{ s } is the standard deviation of the score, μ_{ s } is the mean score and n is the number of simulations. If convergence was not obtained within 10000 samples, the simulation run was halted and marked as unconverged.
Results
Parameter gathering is improved by including identification translation and previously estimated rates
Coverage of kinetic rates obtained from the databases with (+) and without () the use of identifier translators
Model  Database  k ^{ M }  k ^{ cat }  k ^{ eq }  

+    +    +    
E. coli  # rates  10183  2583  2583  
iJO1366  BRENDA  637  591  286  259  /  / 
eQuilibrator  /  /  /  /  491  0  
SABIORK  228  0  4  0  0  0  
# found  700  591  290  259  491  0  
E. coli  # rates  380  95  95  
Core model  BRENDA  29  0  10  0  /  / 
eQuilibrator  /  /  /  /  37  0  
SABIORK  10  0  2  0  0  0  
# found  33  0  12  0  37  0  
S. cerevisiae  # rates  5509  1612  1612  
iTO977  BRENDA  342  330  113  108  /  / 
eQuilibrator  /  /  /  /  450  0  
SABIORK  104  96  3  0  0  0  
# found  370  356  116  108  450  0  
L. lactis  # rates  73  21  21  
BRENDA  2  2  1  1  /  /  
eQuilibrator  /  /  /  /  12  0  
SABIORK  2  2  0  0  0  0  
# found  3  3  1  1  12  0 
From Table 2, there are two results that should be highlighted. First, upon inclusion of the identification translating databases, the number of parameters for which values are found always increases, suggesting that our automated searching strategy functions correctly and finds a larger subset of parameter values within the databases. Second, as a percentage of the total number of kinetic rates being searched for, there is a higher fraction of equilibrium constants, k^{ eq }, available online than other reaction parameters. Unlike enzyme specific properties such as the catalytic rate constant or the Michaelis constant, the equilibrium constant is only specific to the chemical reaction and the environment in which it takes place. Thus they are more readily available then other kinetic rates, which have to be individually measured for each enzyme. Furthermore, the equilibrium constants can be predicted using methods such as component contribution [54], as implemented by eQuilibrator [31], which can also take into account factors such as the pH and ionic strength of the cellular environment. Despite this minor success, though, the number of measured reaction rates is only a small fraction (roughly 1%) of the total needed to fully parameterise a model. We discuss in Additional file 1 how our parameter searching algorithm can also filter results for specific experimental conditions, such as pH or temperature. This suggests that efforts to experimentally determine these rates in a highthroughput manner should be continued in order to improve our knowledge of parameter distributions required for model development. However, for all unmeasured kinetic rates, the PB algorithm constructs pseudodistributions to estimate the feasible range of certain parameters under specified experimental conditions. Alternatively, parameter estimation methods can be used when the appropriate data is available (see Discussion).
In Fig. 3ac, we show the distribution of each individual parameters median value obtained from the parameter distributions before and after using the PB algorithm. When looking at the distribution of median values found for the iJO1366 E. coli model, we observe that the distributions both before and after balancing are approximately lognormally distributed (Fig. 3ac), as is assumed by the PB algorithm. It is notable that the resulting posterior distribution of median values shows good overlap with parameter values used in models of E. coli central carbon metabolism that have been obtained using different parameter optimisation methods (Fig. 3df) [55, 56].
The amount and quality of prior knowledge influences accuracy of resulting model
Given the observation that altering the amount and quality of experimentally measured or computationally estimated parameter values influences the prior and posterior distributions obtained we wished to understand the effects of these changes on system dynamics more thoroughly. In order to do this we took the parameter balanced model of L. lactis metabolism and generated timeseries data of all compounds. This simulated ‘gold standard’ reference dataset was then used to compare the generated dynamics obtained using different varieties of input into our framework (Additional file 1: Figure S1). Notably, the dynamics are qualitatively different with our pipeline compared to the original model (compare blue line with orange and green lines) [1]. In our model, a glucose pulse leads to a decrease in FBP and G3P concentrations. This reflects two factors. First, the mathematical form of reactions in our model is constrained leading to a different model structure to that published by [1]. Second, in our balanced model we find MichaelisMenten constants and maximal velocity rates that differ from the original model. The net effect of these alterations is that the balance between F6P and FBP conversion is altered such that, after a glucose input, our model simulates an increase in FBP to F6P conversion rather than FBP to G3P. This leads to a drop in FBP and a corresponding decrease in G3P concentrations.
As one would intuitively expect, our analysis shows that having a larger number of experimentally measured or optimised kinetic rates results in system dynamics that better reflect the underlying biological network (Fig. 5). However, it is interesting to note that the number of samples required before convergence of the mean square error of the system dynamics compared to the ‘gold standard’ increased with the number of known parameters (Additional file 1: Figure S2, green lines). This likely reflects that, when little information is known, most sampled parameter sets from the posterior distribution yield equally poor simulations. For random networks we obtained more intuitive results where more accurate posterior distributions require fewer samples to converge (Additional file 1: Figure S2, orange lines).
Finally, it can be observed that there are multiple distinct peaks in the error distributions (Fig. 5). This can be an indication of local minima in the parameter landscape where the structure of the network has an inherent tendency towards certain concentration states. In Additional file 1 (Figs. 2 and 3) the same analysis was performed on a randomly generated metabolic network. This supports the observation that with less than approximately 80% of the parameters known the simulation error steeply increases.
In addition, it has to be noted that not only decreasing the fraction of known parameters causes an increase in error, but also increasing the sampling width of the posterior distribution (Fig. 6). This indicates that having high quality measurements or estimates of the parameters is another essential factor. However, simulated dynamics are more sensitive to decreased fractions of known parameters than a decrease in parameter quality (compare the improvement of simulation accuracy across rows of Fig. 6 to down columns). Thus, having a rough estimate of most parameters can be considered better than knowing few parameters with high accuracy.
Performing fluxbased model reduction techniques alters system dynamics
One aspect of model construction that may ease the requirement for measuring/approximating a large number of parameter sets is to reduce the size of metabolic networks being analysed. Thus, if the size of the metabolic network can be reduced, whilst maintaining an accurate description of experimentally observed phenomena, then the resulting mathematical model will contain lower numbers of reactions and kinetic rates. We explored the effect of reducing model size such that systems still maintained an optimal flux through the system using NetworkReducer (see Implementation and methods) [35]. This method allows one to not only prune reactions from the metabolic network that do not influence the optimal flux vector, but also to compress sidereactions into single, lumped reaction nodes.
Figure 7 highlights that system dynamics can be quantitatively altered through the use of NetworkReducer. Furthermore, there are clear qualitative differences between the dynamics of the full and reduced models for several system components. Therefore, this suggests that using fluxbased constraints for model reduction may not be the best method when one is interested in system dynamics and that networks should be reduced by other means (see Discussion).
Comparing pipeline utility for models of varying size
Comparison of run times and memory usage of pipeline for models of different sizes
Model  Reactions  Peak  Parameter  Parameter  Model  Total 

(metabolites)  memory  search  balancing  building  
(GB)  (min:sec)  (min:sec)  (min:sec)  (min:sec)  
E. coli  2583  29.6  22:05  33:14  2:12  57:33 
iJO1366  (1805)  
E. coli  95  0.2  0:58  0:04  0:07  1:10 
Core model  (92)  
S. cerevisiae  1612  19.1  13:18  11:44  1:14  26:18 
iTO977  (2245)  
L. lactis  21  0.2  0:34  0:01  0:01  0:41 
(26) 
Discussion
In this work we have introduced an automated computational pipeline that translates a genomescale network of cellular metabolism into a parameterised set of ordinary differential equations that can simulate the dynamic behaviour of system components. Whilst this pipeline is inspired by the works of [5, 11], these steps have previously required manual efforts to collect appropriate datasets, reaction rates and to translate the metabolic network into mathematical equations. Thus, bringing these processes together into a single package, DMPy, will have beneficial consequences for many researchers interested in the dynamics of metabolic pathways.
The pipeline has three key steps and one optional (model reduction) process (Fig. 1). Upon parsing a genomescale metabolic network, we have developed an algorithm that automatically searches through a set of prescribed online databases to find all available experimentally measured reaction rates (Fig. 2, Tables 1 and 2 and Additional files 1 and 2). Where parameters are unmeasured, pseudo distributions are constructed estimating the approximate value that these rates would take upon measurement. These parameter distributions based on our prior expectations are then used as an input into the previously published PB algorithm (Fig. 3) [34]. The resulting posterior parameter distributions then contain reaction rates that satisfy a range of thermodynamic criteria such as Haldane and Wegscheider relationships. Conveniently, the distribution of median parameter values that we use for model simulations closely resemble parameter distributions found using other methods to model E. coli carbon metabolism [55, 56]. The metabolic network is subsequently translated into a generalised set of differential equations using the reversible MichaelisMenten approximation (Eqs. 5 and (6)) that can be simulated to explore the dynamics of metabolism [9]. We show the utility of our framework by analysing the results of the central metabolic pathway of L. lactis, randomized networks, and both the full and the core model of E. coli metabolism [1, 51, 52]. In the following subsections we shall describe where we think efforts could be made to improve the effectiveness of our initial pipeline.
Precise measurements of kinetic rates in highthroughput shall improve model accuracy
We analysed the results of the algorithm in comparison to simulated ‘gold standard’ data for the central metabolic pathway of L. lactis that has been previously modelled using differential equations (Fig. 5) [1]. Notably, we found that the accuracy of parameterised dynamic models using our pipeline is dependent on the amount and quality of measurements of the kinetic rates within the L. lactis system. By resampling the posterior distributions 10000 times and computing the average error in comparison to the simulated dataset, we found that accurate estimates for >80% of the reaction rates are required to obtain an approximately close match between simulated and target metabolic timeseries data (Fig. 5). Whilst we note that this result may differ for other systems, this finding is notable as it is often the case that less than 1% of the required reaction rates have been measured (Table 2). This suggests that efforts of obtaining measurements or estimates of reaction rates are necessary to help construct accurate parameterised dynamic models of metabolism.
One way of improving dynamic models is through improved high throughput measurement techniques. By altering the width of prior parameter distributions we can analyse the effect of measurement precision of kinetic rates whereby narrower prior distributions imply more precise measurements. What we observed was that increasing the sampling width of the prior distributions quickly led to erroneous dynamics being simulated from the resulting model and parameter set (Fig. 6). This was supported both when a high or low fraction of the required reaction rates had been measured. Since in vitro parameter measurements can often have errors of an order of magnitude or more compared to in vivo, great care has to be taken when directly integrating these measurements. Techniques, such as that by [32], aim to alleviate this problem by integrating multiple sources of data from different conditions and could be integrated into our pipeline. In conclusion, the experimental setup used to obtain kinetic rates requires high precision in order to decrease the width of parameter distributions and improve model accuracy. We suggest that such an idealised highthroughput precise measurement method should be one of the key targets for future research in this area.
A second option to obtain good estimates of reaction rates from limited data is the use of parameter optimisation methods (as in [55, 56]). Recently, Fröhlich et al. have proposed a method that can find estimates for 100’s of parameters faster than previous methods [57]. Thus, one could include this method in our pipeline after the construction of the ODEs to finetune model dynamics. However, two aspects should be noted. First, any parameter estimation method should include constraints such that parameters satisfy the Haldane and Wegscheider relationships used in the PB algorithm (see Implementation and methods). Second, it is hard to predict how including such measures will increase the run time of our pipeline as presented in Table 3. Hence, finding an efficient parameter optimisation method for large systems is a key aspect of future research.
Model reduction should incorporate knowledge of system dynamics
In this work we have shown how fluxconstrained model reduction methods result in smaller systems that have qualitatively different dynamics to the original complete model (Fig. 7) [35, 36]. Essentially, these model reduction techniques look to remove reactions from larger networks that have little influence on the resulting reaction fluxes important for observed phenotypes. For example, in NetworkReducer (see Implementation and methods) [35], flux variability analysis (FVA) is used to find reactions that have minimal impact on resulting flux estimates. Thus, these minimalimpact reactions can be removed without altering system fluxes. However, since these methods are performed using the genomescale reaction network, the model reduction step is performed before the creation of a dynamic model and, subsequently, does not take into account any information about temporal system dynamics. Consequently, we have found that the dynamics of metabolic components are influenced by fluxbased model reduction techniques (Fig. 7). This suggests that any fluxbased reduction method is only useful as a first approximation when no data is available. The reduced model could then be used to aid direct experiments to obtain better quality data for key components in the system — conveniently also aiding the parameter estimation problem discussed above.
This raises the issue of whether methods can be automated that are able to reduce a largescale metabolic network to maintain specific dynamic requirements, such as a match to timeseries data. One could envisage two possible techniques; first, by reducing the differential equations based on timescale arguments (for example, the quasisteady state assumption) [37], and second, by using fluxbased reduction methods and appropriately reapproximating the parameters of reduced modules such that the remaining parts of the system are dynamically consistent with the full model. In the second instance this would require generating both a model of the full and reduced networks and then dynamically comparing their output. Thus, rather than developing a single model with our pipeline, two would need to be produced, thus increasing the computational time to generate a model. This is undesirable and, consequently, reducing dynamic models by focussing on a specific timescale may be more appropriate.
Within biological systems, processes occur across a range of timescales (roughly from femtoseconds to hours). Thus, one is generally interested in understanding what happens at a specific timescale and ignoring or simplifying those processes that happen too quickly or slowly, consequently reducing the systems complexity [37]. Importantly, the speed of reactions could be approximated by their maximal velocity. Upon adaptation of our pipeline, one could in principle search for all component dynamics that occur either too quickly or slowly compared to the components of interest and fix their concentrations as constants within the system. This reduces the number of components and equations that require parameterisation and simulation. Such ideas are the focus of future work and pipeline developments.
Expanding the mathematical functions within the model
One other limitation of our current pipeline is that every reaction is mathematically described by the same reversible MichaelisMenten approximation, including transport reactions or genetic interactions. In Additional file 1, we have presented comparative effects of including compartmentalisation within metabolic networks and the effects of including or altering regulatory mechanisms (Additional file 1: Figures S4 and S5), automating a process of selecting mathematical functions from a library of specific reactions may also increase the accuracy of the resulting models. This would allow for the appropriate depiction of known transport reactions between compartments and the addition of transport regulators or saturation effects. Tools such as SBMLSqueezer [58], allow for automated selection of rate laws based on the components and annotation of the reaction and could be integrated into the pipeline at the model generation step, although it has to be considered how different rate laws can fit into the assumptions of the PB framework. Additionally, recent work by [59] shows how to generate a genome scale map of regulatory interactions of small molecules. This could further improve the quality of the generated networks, especially when starting from existing genome scale metabolic networks, where this information is often lacking.
Furthermore, by allowing a user to easily manipulate and alter mathematical functions, submodules of larger networks can be easily replaced by known, moredetailed kinetic models of specific pathways. Importantly, our pipeline allows users to fix parameter values by using narrow prior distributions. Consequently, the resulting models will maintain the dynamics of a more detailed model whilst allowing one to observe the effects of the metabolic pathway on a genomescale reaction network. Future developments of this computational framework will aim to incorporate these ideas to provide more flexibility to users such that finer details can be added to largescale models of metabolism.
Conclusions
In conclusion, we have developed a modular framework that provides an initial approximation of temporal metabolic changes within a cell, which can easily be extended with additional data as required. Due to the modular approach, individual methods within the framework can be replaced or updated as they are enhanced. Furthermore, by generating more detailed data, we have shown that the accuracy of these dynamic models will improve given the current methods used within the pipeline. In addition to this, through the use of model reduction techniques and compartmentalisation within a cell, individual subnetworks or compartments within the dynamic model can be easily manipulated and replaced as metabolic pathways are studied in more detail. We envision that this framework will be of great use to the metabolic community as attempts continue to unravel the complex relationship between system inputs and physiological outputs that are relevant for the bioindustry sector.
Availability and requirements
DMPy can be found at gitlab.com/wurssb/DMPy along with all computer scripts and models used in this study. DMPy is written in Python, further implementation details can be found in Additional file 1.
Progject name: DMPy.
Homepage: gitlab.com/wurssb/DMPy.
Programming Language: Python.
Notes
Declarations
Acknowledgements
The authors would like to thank Agnieszka Wegrzyn, Barbara Bakker (University of Groningen, The Netherlands), Timo Lubitz, Wolfram Liebermeister (Humboldt Universität zu Berlin, Germany) and Niels Zondervan (Wageningen UR, The Netherlands) for helpful discussions about implementation and potential extensions/uses of modules within the algorithm.
Funding
RWS and VAPMdS are funded by FP7 Marie Curie Initial Training Network grant agreement number 316723 and Horizon 2020 project number 634942.
VAPMdS is funded by Horizon 2020 grant agreement number 635536. CF is supported by HFSP Research grant RGP0025/2013. RPvR is funded by the intramural funding Investment Theme Synthetic Biology at Wageningen UR.
Authors’ contributions
RWS, VAPMdS and CF designed the study. RPvR performed simulations. RWS and RPvR analysed results. RWS, RPvR and CF wrote the paper. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Costa RS, Hartmann A, Gaspar P, Neves AR, Vinga S. An extended dynamic model of lactococcus lactis metabolism for mannitol and 2,3butanediol production. Mol Biosyst. 2014; 10:628–39.View ArticlePubMedGoogle Scholar
 Rienksma RA, SuarezDiaz M, Spina L, Schaap PJ, dos Santos VAPM. Systemslevel modeling of mycobacterial metabolism for the identification of new (multi)drug targets. Semin Immunol. 2014; 26:610–22.View ArticlePubMedGoogle Scholar
 Varma A, Palsson BØ. Metabolic capabilities of escherichia coli: Ii. optimal growth patterns. J Theor Biol. 1993; 165:503–22.View ArticleGoogle Scholar
 Kauffman KJ, Prakash P, Edwards JS. Advances in flux balance analysis. Curr Opin Biotechnol. 2003; 14:491–6.View ArticlePubMedGoogle Scholar
 Smallbone K, Mendes P. Largescale metabolic models: from reconstruction to differential equations. Ind Biotechnol. 2013; 9:179–184.View ArticleGoogle Scholar
 Orth JD, Thiele I, Palsson BØ. What is flux balance analysis?. Nat Biotechnol. 2010; 28:245–248.View ArticlePubMedPubMed CentralGoogle Scholar
 Scheutz R, Kuepfar L, Sauer U. Systematic evaluation of objective functions for predicting intracellular fluxes in escherichia coli. Mol Syst Biol. 2007; 3:119.Google Scholar
 Machado D, Herrgard M. Systematic evaluation of methods for integration of transcriptomic data into constraintbased models of metabolism. PLoS Comput Biol. 2014; 10:1003580.View ArticleGoogle Scholar
 Liebermeister W, Klipp E. Bringing metabolic networks to life: convenience rate law and thermodynamic constraints. Theor Biol Med Model. 2006; 3:41.View ArticlePubMedPubMed CentralGoogle Scholar
 Liebermeister W, Klipp E. Bringing metabolic networks to life: integration of kinetic, metabolic, and proteomic data. Theor Biol Med Model. 2006; 3:42.View ArticlePubMedPubMed CentralGoogle Scholar
 Stanford NJ, Lubitz T, Smallbone K, Klipp E, Mendes P, Liebermeister W. Systematic construction of kinetic models from genomescale metabolic networks. PLoS ONE. 2013; 8:79195.View ArticleGoogle Scholar
 Bordbar A, Monk JM, King ZA, Palsson BO. Constraintbased models predict metabolic and associated cellular functions. Nat Rev Genet. 2014; 15(2):107–20.View ArticlePubMedGoogle Scholar
 Varma A, Palsson BØ. Metabolic capabilities of escherichia coli: I. synthesis of biosynthetic precursors and cofactors. J Theor Biol. 1993; 165:477–502.View ArticlePubMedGoogle Scholar
 Kholodenko BN, Cascante M, Hoek JB, Westerhoff HV, Schwaber J. Metabolic design: how to engineer a living cell to desired metabolite concentrations and fluxes. Biotechnol Bioeng. 1998; 59:239–47.View ArticlePubMedGoogle Scholar
 van Heck RGA, Ganter M, Martins dos Santos VAP, Stelling J. Efficient Reconstruction of Predictive Consensus Metabolic Network Models. PLOS Comput Biol. 2016; 12:1005085.View ArticleGoogle Scholar
 Swainston N, Smallbone K, Hefzi H, Dobson PD, Brewer J, Hanscho M, Zielinski DC, Ang KS, Gardiner NJ, Gutierrez JM, Kyriakopoulos S, Lakshmanan M, Li S, Liu JK, Martinez VS, Orellana CA, Quek LE, Thomas A, Zanghellini J, Borth N, Lee DY, Nielsen LK, Kell DB, Lewis NE, Mendes P. Recon 2.2: from reconstruction to model of human metabolism. Metabolomics. 2016; 12:109.View ArticlePubMedPubMed CentralGoogle Scholar
 Robinson JL, Nielsen J. Anticancer drug discovery through genomescale metabolic modeling. Curr Opin Syst Biol. 2017; 4:1–8.View ArticleGoogle Scholar
 Geng J, Nielsen J. In silico analysis of human metabolism: Reconstruction, contextualization and application of genomescale models. Curr Opin Syst Biol. 2017; 2:29–38.View ArticleGoogle Scholar
 Giuseppin MLF, van Riel NAW. Metabolic modeling of saccharomyces cerevisiae using the optimal control of homeostasis: a cybernetic modle definition. Metab Eng. 2000; 2:14–33.View ArticlePubMedGoogle Scholar
 van Riel NAW, Giuseppin MLF, Verrips CT. Dynamic optimal control of homeostasis: an integrative system approach for modeling of the central nitrogen metabolism in saccharomyces cerevisiae. Metab Eng. 2000; 2:49–68.View ArticlePubMedGoogle Scholar
 Mahadevan R, Edwards JS, III FJD. Dynamic flux balance analysis of diauxic growth in escherichia coli. Biophys J. 2002; 83:1331–40.View ArticlePubMedPubMed CentralGoogle Scholar
 Lee S, Phalakornkule C, Domach MM, Grossmann IE. Recursive milp model for finding all the alternate optima in lp models for metabolic networks. Comput Chem Eng. 2000; 24:711–6.View ArticleGoogle Scholar
 Maarleveld TR, Wortel MT, Olivier BG, Teusink B, Bruggeman FJ. Interplay between constraints, objectives, and optimality for genomescale stoichiometric models. PLoS Comput Biol. 2015; 11:1004166.View ArticleGoogle Scholar
 Mahadevan R, Schilling CH. The effects of alternate optimal solutions in constraintbased genomescale metabolic models. Metab Eng. 2003; 5(4):264–76.View ArticlePubMedGoogle Scholar
 Steuer R, Gross T, Selbig J, Blasius B. Structural kinetic modeling of metabolic networks. Proc Natl Acad Sci U S A. 2006; 103:11868–73.View ArticlePubMedPubMed CentralGoogle Scholar
 Khodayari A, Zomorrodi AR, Liao JC, Maranas CD. A kinetic model of escherichia coli core metabolism satisfying multiple sets of mutant flux data. Metab Eng. 2014; 25:50–62.View ArticlePubMedGoogle Scholar
 Murabito E, Verma M, Bekker M, Bellomo D, Westerhoff HV, Teusink B, Steuer R. Montecarlo modeling of the central carbon metabolism of lactococcus lactis: insights into metabolic regulation. PLoS ONE. 2014; 9:106453.View ArticleGoogle Scholar
 Du B, Zielinski DC, Kavvas ES, Dräger A, Tan J, Zhang Z, Ruggiero KE, Arzumanyan GA, Palsson BØ. Evaluation of rate law approximations in bottomup kinetic models of metabolism. BMC Syst Biol. 2016; 10:40.View ArticlePubMedPubMed CentralGoogle Scholar
 Schomburg I, Chang A, Ebeling C, Gremse M, Heldt C, Huhn G, Schomburg D. Brenda, the enzyme database: updates and major new developments. Nucleic Acids Res. 2004; 32:431–3.View ArticleGoogle Scholar
 Wittig U, Kania R, Golebiewski M, Rey M, Shi L, Jong L, Algaa E, Weidemann A, SauerDanzwith H, Mir S, Krebs O, Bittkowski M, Wetsch E, Rojas I, Muller W. Sabiork  a database for biochemical reaction kinetics. Nucleic Acids Res. 2012; 40:790–6.View ArticleGoogle Scholar
 Flamholz A, Noor E, BarEven A, Milo R. equilibrator  the biochemical thermodynamics calculator. Nucleic Acids Res. 2012; 40:770–5.View ArticleGoogle Scholar
 Davidi D, Noor E, Liebermeister W, BarEven A, Flamholz A, Tummler K, Barenholz U, Goldenfeld M, Shlomi T, Milo R. Global characterization of in vivo enzyme catalytic rates and their correspondence to in vitro kcat measurements. Proc Natl Acad Sci U S A. 2016; 113:3401–6.View ArticlePubMedPubMed CentralGoogle Scholar
 Liebermeister W, Uhlendorf J, Klipp E. Modular rate laws for enzymatic reactions: thermodynamics, elasticities and implementation. Bioinformatics. 2010; 26:1528–34.View ArticlePubMedGoogle Scholar
 Lubitz T, Schulz M, Klipp E, Liebermeister W. Parameter balancing in kinetic models of cell metabolism. J Phys Chem B. 2010; 114:16298–303.View ArticlePubMedPubMed CentralGoogle Scholar
 Erdrich P, Steuer R, Klamt S. An algorithm for the reduction of genomescale metabolic network models to meaningful core models. BMC Syst Biol. 2015; 9:48.View ArticlePubMedPubMed CentralGoogle Scholar
 Rao S, van der Schaft A, van Eunen K, Bakker BM, Jayawardhana B. A model reduction method for biochemical reaction networks. BMC Syst Biol. 2014; 8:52.View ArticlePubMedPubMed CentralGoogle Scholar
 Kuntz J, Oyarzun D, Stan GB. Model reduction of geneticmetabolic networks via time scale separation. In: A systems theoretic approach to systems and synthetic biology I: models and system characterization. Dordrecht: Springer: 2014.Google Scholar
 Hoops S, Sahle S, Gauges R, Lee C, Pahle J, Simus N, Singhal M, Xu L, Mendes P, Kummer U. Copasi: a complex pathway simulator. Bioinformatics. 2006; 22:3067–74.View ArticlePubMedGoogle Scholar
 Klamt S, SaezRodriguez J, Gilles ED. Structural and functional analysis of cellular networks with cellnetanalyzer. BMC Syst Biol. 2007; 1:2.View ArticlePubMedPubMed CentralGoogle Scholar
 Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H. The systems biology markup language (sbml): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003; 19:524–31.View ArticlePubMedGoogle Scholar
 Lubitz T, Hahn J, Bergmann FT, Noor E, Klipp E, Liebermeister W. SBtab: a flexible table format for data exchange in systems biology. Bioinformatics. 2016; 32(16):2559–61.View ArticlePubMedPubMed CentralGoogle Scholar
 Morgat A, Axelsen KB, Lombardot T, Alcantara R, Aimo L, Zerara M, Niknejad A, Belda E, HykaNouspikel N, Coudert E, Redaschi N, Bougueleret L, Steinbeck C, Xenarios I, Bridge A. Updates in rhea  a manually curated resource of biochemical reactions. Nucleic Acids Res. 2015; 43:459–64.View ArticleGoogle Scholar
 Caspi R, Altman T, Billington R, Dreher K, Foerster H, Fulcher CA, Holland TA, Keseler IM, Kothari A, Kubo A, Krummenacker M, Latendresse M, Mueller LA, Ong Q, Paley S, Subhraveti P, Weaver DS, Weerasinghe D, Zhang P, Karp PD. The metacyc database of metabolic pathways and enzymes and the biocyc collection of pathway/genome databases. Nucleic Acids Res. 2014; 42:459–71.View ArticleGoogle Scholar
 Moretti S, Martin O, Van Du Tran T, Bridge A, Morgat A, Pagni M. MetaNetX/MNXref – reconciliation of metabolites and biochemical reactions to bring together genomescale metabolic networks. Nucleic Acids Res. 2016; 44(D1):523–6.View ArticleGoogle Scholar
 Bornstein BJ, Keating SM, Jouraku A, Hucka M. Libsbml: an api library for sbml. Bioinformatics. 2008; 24:880–1.View ArticlePubMedPubMed CentralGoogle Scholar
 Meurer A, Smith CP, Paprocki M, Čertík O, Kirpichev SB, Rocklin M, Kumar A, Ivanov S, Moore JK, Singh S, Rathnayake T, Vig S, Granger BE, Muller RP, Bonazzi F, Gupta H, Vats S, Johansson F, Pedregosa F, Curry MJ, Terrel AR, Roučka V, Saboo A, Fernando I, Kulal S, Cimrman R, Scopatz A. Sympy: symbolic computing in python. PeerJ Comput Sci. 2017; 3:103.View ArticleGoogle Scholar
 Hackett SR, Zanotelli VRT, Xu W, Goya J, Park JO, Perlman DH, Gibney PA, Botstein D, Storey JD, Rabinowitz JD. Systemslevel analysis of mechanisms regulating yeast metabolic flux. Science. 2016; 354(6311):2786–6.View ArticleGoogle Scholar
 Somogyi ET, Bouteiller JM, Glazier JA, König M, Medley JK, Swat MH, Sauro HM. libRoadRunner: a high performance SBML simulation and analysis library. Bioinformatics. 2015; 31(20):3315–21.View ArticlePubMedPubMed CentralGoogle Scholar
 Jones E, Oliphant T, Peterson P. Scipy: open source scientific tools for python. 2001. www.scipy.org.
 Ebrahim A, Lerman JA, Palsson BØ, Hyduke DR. Cobrapy: constraintsbased reconstrunction and analysis for python. BMC Syst Biol. 2013; 7:74.View ArticlePubMedPubMed CentralGoogle Scholar
 Orth JD, Palsson BØ, Fleming RMT. Reconstruction and Use of Microbial Metabolic Networks: the Core Escherichia coli Metabolic Model as an Educational Guide. EcoSal Plus. 2010; 4(1).Google Scholar
 Orth JD, Conrad TM, Na J, Lerman JA, Nam H, Feist AM, Palsson BØ. A comprehensive genomescale reconstruction of escherichia coli metabolism. Mol Syst Biol. 2011; 7:535.View ArticlePubMedPubMed CentralGoogle Scholar
 Österlund T, Nookaew I, Bordel S, Nielsen J. Mapping conditiondependent regulation of metabolism in yeast through genomescale modeling. BMC Syst Biol. 2013; 7:36.View ArticlePubMedPubMed CentralGoogle Scholar
 Noor E, Haraldsdóttir HS, Milo R, Fleming RMT. Consistent Estimation of Gibbs Energy Using Component Contributions. PLoS Comput Biol. 2013; 9(7):1003098.View ArticleGoogle Scholar
 Jahan N, Maeda K, Matsuoka Y, Sugimoto Y, Kurata H. Development of an accurate kinetic model for the central carbon metabolism of escherichia coli. Microb Cell Factories. 2016; 15:112.View ArticleGoogle Scholar
 Millard P, Smallbone K, Mendes P. Metabolic regulation is sufficient for global and robust coordination of glucose uptake, catabolism, energy production and growth in escherichia coli. PLoS Comput Biol. 2017; 13:1005396.View ArticleGoogle Scholar
 Frölich F, Kaltenbacher B, Theis FJ, Hasenauer J. Scalable parameter estimation for genomescale biochemical reaction networks. PLoS Comput Biol. 2017; 13:1005331.View ArticleGoogle Scholar
 Dräger A, Zielinski DC, Keller R, Rall M, Eichner J, Palsson BØ, Zell A. SBMLsqueezer 2: contextsensitive creation of kinetic equations in biochemical networks. BMC Syst Biol. 2015; 9(1):68.View ArticlePubMedPubMed CentralGoogle Scholar
 Reznik E, Christodoulou D, Goldford JE, Briars E, Sauer U, Segrè D, Noor E. GenomeScale architecture of small molecule regulatory networks and the fundamental tradeoff between regulation and enzymatic activity. Cell Rep. 2017; 20(1):2666–77.View ArticlePubMedPubMed CentralGoogle Scholar