RMBNToolbox: random models for biochemical networks
© Aho et al. 2007
Received: 22 February 2007
Accepted: 24 May 2007
Published: 24 May 2007
Skip to main content
© Aho et al. 2007
Received: 22 February 2007
Accepted: 24 May 2007
Published: 24 May 2007
There is an increasing interest to model biochemical and cell biological networks, as well as to the computational analysis of these models. The development of analysis methodologies and related software is rapid in the field. However, the number of available models is still relatively small and the model sizes remain limited. The lack of kinetic information is usually the limiting factor for the construction of detailed simulation models.
We present a computational toolbox for generating random biochemical network models which mimic real biochemical networks. The toolbox is called Random Models for Biochemical Networks. The toolbox works in the Matlab environment, and it makes it possible to generate various network structures, stoichiometries, kinetic laws for reactions, and parameters therein. The generation can be based on statistical rules and distributions, and more detailed information of real biochemical networks can be used in situations where it is known. The toolbox can be easily extended. The resulting network models can be exported in the format of Systems Biology Markup Language.
While more information is accumulating on biochemical networks, random networks can be used as an intermediate step towards their better understanding. Random networks make it possible to study the effects of various network characteristics to the overall behavior of the network. Moreover, the construction of artificial network models provides the ground truth data needed in the validation of various computational methods in the fields of parameter estimation and data analysis.
Modeling and analysis of large biochemical networks is in its infancy. Networks' intrinsic capabilities and behavior arise both from the numerous network components and their complex interactions, thereby making the modeling task very challenging. In the field of computational systems biology, researchers modeling these networks often aim at predicting the system behavior in response to a given treatment. For example, lethality prediction for gene deletions [1, 2] and maximization of the yield of a metabolic product [3, 4] provide interesting applications.
Currently the structures of various biochemical networks are under extensive research. Best known are the structures of metabolic networks which are reconstructed on the basis of genome annotation, and biochemical and physiological evidence . Metabolic network models are reconstructed e.g. for yeast Saccharomyces cerevisiae [1, 6], bacteria Escherichia coli [7, 8] and Streptomyces coelicolor , and a number of other organisms . The structures of other intracellular networks types than metabolic networks are poorer known. Gene regulatory networks are explored in large scale in gene deletion studies [11–13] and transcription factor binding experiments , but various uncertainties relate to those studies. On the other hand, much information is available for protein-protein interaction networks and signal transduction networks [15–17] but, for example, the modular composition of proteins retards their reconstruction [18, 19].
Besides structural information, the modeling of biochemical network behavior needs information about reaction kinetics, too. Reaction kinetics is much studied in biochemistry but, unfortunately, it still remains mostly unknown because of the difficult quantification of reaction velocities, especially in vivo [20, 21]. In some cases it has been possible to construct kinetic models for reaction pathways . In these situations, both the network structure and reaction kinetics are known, and the network model can be simulated using a system of ordinary differential equations (ODEs). However, in most cases the lack of kinetic information prevents the construction of ODE models or the model sizes remain very limited.
The usual approach to construct an ODE model for a biochemical pathway is to collect the needed information from literature piece by piece. The process is time consuming, and uncertainties appear in model construction because of natural complexity of cellular systems and the varying conditions in which they are examined.
A complementary method to construct ODE models is to adopt the available information, and then randomly generate the lacking information. These partially random models have several applications. First, they provide the ground truth data for objective evaluation of methods in data analysis and parameter estimation. The fundamental problem in those fields is that the goodness of the methods cannot be evaluated because data from real biological measurements is always noisy and the correct values remain unknown (see, e.g., [23–25]). Second, a researcher can generate a practically unlimited number of networks in which given features are varied. This makes it possible to study interrelationship between network structure and function, and to obtain statistical significance on the results (see, e.g., ). Third, the approach allows gradual model construction in which randomness is decreased after more information becomes available. For example, if the parameters of kinetic rate laws were previously drawn from a distribution, their values can be fixed when they become known. Thus, the model becomes more similar to its biological example.
There are many software for time series simulation, parameter estimation, and other analysis of biochemical network models (see, e.g. [27–34]). To authors' knowledge, however, there is no freely available an easily extendable software toolbox for generation of random ODE models for biochemical networks. The existing network generation softwares [35, 36] have different modeling approaches and principles.
The reaction rates v j , j = 1, ..., n are determined by kinetic laws f j as
v j = f j (c j , p j ), (2)
in which c j includes concentrations of species taking part in the reaction j, and p j contains the parameter values of the kinetic law.
In addition to the basic scheme shown in Eqs. 1 and 2, the model may contain other details such as assignment rules. An assignment rule makes it possible to assign a specific value for a variable independently from the system of differential equations above. The value may depend on time, species amounts, or whatever other model variables.
The toolbox provides various methods for constructing network structures. The user can generate and import graph models as well as stoichiometric models. RMBNToolbox uses an incidence matrix representation to store a directed bipartite graph which describes the network structure. In the graph, species and reactions are nodes connected with directed edges. Edges indicate the direction of mass flow or controlling activity. Next we introduce the main approaches for setting up network structures.
The toolbox provides functions that make use of statistical rules in network structure generation. The user may specify the number of reactions, the number of species, and a probability density function. The probability density function defines the number of species that are connected to each reaction. For example, it may be required that the probabilities for reactions to have 1, 2 and 3 substrates, are 50%, 30%, and 20%, respectively. The method is useful when reactions have a known indegree distribution of substrates or outdegree distribution of products which is used as a determining feature for the structure generation. In various network systems, the structure of the network determines its stability. The toolbox offers a possibility to specifically generate stable or unstable linear systems as models for biochemical networks. Methods with different structure generation principles are implemented for this purpose. The first method generates network structures and tests their stability until a network structure with a stable (or unstable) behavior is found. The other methods generate network structures iteratively. One by one they connect random species to random reactions and check whether the network remains stable (or unstable). All the methods examine the model stability using the eigenvalues of the constructed system matrix. Further theoretical details are presented in an example of the Section 'Results'.
A network analysis study may be based on graph theoretical approaches, too. A tree is a graph in which no loops nor unconnected nodes exist. The toolbox makes it possible to generate trees as models of network structures. A tree sets up a network sceleton to which more reactions, species, or their connections can be added later on, or which can be analyzed further as such.
In addition to random structure generation, the user can specify any pre-defined network structure by providing a bipartite graph in the form of an m × n matrix M. In that case, the m rows represent species and n columns represent reactions, and the element M (i, j) equals one if species i is connected to reaction j. If a reaction and a species are not connected, then the respective element in M equals zero. This approach makes it possible for the user to easily generate any kind of network structure using her own methods, and to process the model further using the toolbox functions.
The toolbox supports the import of stoichiometric matrices. The user may find the import feature especially useful in the cases in which the structure of a metabolic network is known, but kinetics not. Stoichiometric matrices S are provided as m × n matrices (see, Equation 1), possibly along with the names for the m species and n reactions.
The main task in the generation of network kinetics is to choose and set kinetic laws for reactions in the network model (see, Eq. 2). RMBNToolbox has a function that randomly chooses kinetic laws from KineticLawLibrary  which contains many of the basic kinetic laws from biochemistry textbooks [20, 21, 39]. Kinetic laws have different forms depending on various features on their reaction mechanisms, such as the numbers of substrates and products, compulsory or arbitrary binding order of multiple substrates, and reversibility. Two features related to network structure determine if a specific kinetic law can be set for a specific reaction in the network model. First, the numbers of subsrates and products must be the same in the kinetic law and in the reaction it is applied to. Second, the reversibility of the kinetic law must match with the reversibility of the reaction. The choice of a kinetic law can be made randomly among those kinetic laws which fulfill these two requirements.
Kinetic laws have various parameters for which values need to be determined. By default, the parameter values are random numbers from uniform distributions. The user can redefine the distributions, and she can set new values separately for individual parameters if needed.
In addition to reactions, the amounts of species may be determined by assignment rules. In this case, the user writes an assignment rule as a Matlab M-file, and specifies the variables which are used for its evaluation. With a similar procedure, assignment rules can be set for parameters of kinetic rate laws. Thus, assignment rules make it possible for species and parameter values to be functions of any other variables.
An initial state has to be given for a network model before its dynamical behavior can be simulated. This includes defining the initial amounts for species, but also the values of other time-dependent variables which may exist. The toolbox provides a function for this task. On the other hand, there are many network analysis methods that do not need the state information (e.g., flux balance analysis ). For those cases, the user can generate and export models without the state information.
The network models created with the help of RMBNToolbox can be exported in the format of Systems Biology Markup Language (Level 2, version 1) . An increasing amount of software tools support SBML for model exchange, and therefore the user can choose her favourite tool for further analysis of the generated models. RMBNToolbox bases its SBML support on other software. The network model generated by RMBNToolbox is converted to the format of SBMLToolbox  which is another toolbox working in Matlab. After kinetic laws are read from KineticLawLibrary  and added to the model, SBMLToolbox makes it possible to export the model in SBML format. The export is done with the help of the LibSBML library which is written in ISO C and C++ .
In this section we present examples of the intended use of RMBNToolbox. In the first example we generate a large model for a genetic regulatory network that can be used to produce ground truth data for a microarray simulation . In the second example the structure and stoichiometry of a metabolic network are known, and the kinetic laws are randomly generated. Furthermore, the example demonstrates how metabolic fluxes in a steady state can be decomposed by elementary flux modes. The third example studies network stability using a control theoretic approach. The example generates small networks for which the network structure determines the stability. All the Matlab scripts that are used to generate the following example networks can be found in the examples folder of RMBNToolbox. All the generated example networks can be downloaded as additional files of this article.
In gene regulatory networks a set of genes produce proteins called transcriptional regulators. Transcriptional regulators bind to the promoter areas of genes, thereby activating or inhibiting their transcription. Most of the genes do not produce transcriptional regulators but their functions may be related to other processes, such as metabolism or cellular growth. Transcriptional regulators are usually thought as the key to the cellular control. In this example we produce a large network model with simple structural characteristics [see Additional file 1]. The model mimics a gene regulatory network.
In the generated network there are 1000 transcription reactions which produce one product each. The total of 200 of the products act as transcriptional regulators which control the network by activating and inhibiting the transcription reactions. Each of the transcription reactions has one activatory and one inhibitory regulator which are selected randomly from the 200 regulators.
where V basal is the rate of transcription in the absence of activators and inhibitors, I and A represent the concentrations of inhibitor and activator, K I and K A represent the concentrations with which the inhibitor and the activator have the effect of half of their maximal effects, and n I and n A act as Hill coeffcients. The parameter values are random numbers from the following uniform distributions: V basal ∈ U (5,10), K I ∈ U (2,3), K A ∈ U (1,2), n I ∈ U (1,2), n A U (1,2). The initial concentrations I and A are random numbers from the uniform distributions I ∈ U (0, 1) and A ∈ U (0, 1).
The degradation kinetics of each gene product follows the mass action law
r = k P, (4)
where k is a rate parameter and P is the concentration of the gene product. Similarly to the kinetic laws of transcription reactions, the parameter values are unique for each degradation reaction. In this case, the value of parameter k is drawn from the uniform distribution U (0.01, 0.02).
For a comparison, we additionally simulate a duplicate network which mimics a gene deletion [see Additional file 2]. In the duplicated network, the production of a randomly chosen regulator is stopped by setting the parameter V basal of its transcription reaction to zero. Otherwise the duplicated network is identical to the original network.
The behavioral differences are illustrated by a time series simulation. After constructed, both networks are exported in SBML format and SBML ODE Solver  is used to simulate them. Figure 2 gives an overview to differences in simulation results. For each of the species, the figure shows concentration differences between the two simulations. For each time point t, the difference d (t) is calculated as
d (t) = c (t) - c* (t), (5)
where c (t) and c* (t) are concentration values of the species in the first and in the second simulation, respectively. Although most of the species act similarly in both simulations, there are large and unforeseeable dynamic variations too. The effects of the inactivation of the regulator do not fade away or relax to a constant but the inactivation seems to have complex behavioral consequences.
In this example, a time series simulation is used to illustrate a result from stoichiometric network analysis. As presented in , any feasible steady state flux distribution of a metabolic network is a linear combination of so-called elementary flux modes (EFM). We show using random reaction kinetics that this holds in a small examplary metabolic network model. Extreme pathways  and elementary fluxes  are similar concepts to EFMs, and they would be equally valid for this analysis.
i.e., there is no accumulation or depletion of internal metabolites. Specific reaction velocities (flux distributions) are needed to maintain steady states.
Elementary flux modes describe such reversible and irreversible pathways in the network which maintain steady states when working. In an elementary flux mode, each reaction is assigned with its relative velocity compared to other reactions in the same EFM. EFMs are minimal in the sense that the active reactions in an EFM cannot be a subset of the active reactions in another EFM. Elementary flux modes can be calculated based on a stoichiometric matrix and the respective reaction irreversibilities . Let vector e denote an elementary flux mode in which element e i = 0 if reaction i is inactive and e i ≠ 0 if the reaction i is active. Further, let the set of all N elementary flux modes of the network be in matrix E = [e 1, e 2, ..., e N ]. Then any flux distribution v, which results a steady state into the network, can be described as a linear combination of the EFMs as
v = E β, β j ≥ 0 if EFM j is irreversible (7)
where the vector β weigths each of the elementary fluxes by a scalar. The weigths are non-negative for EFMs describing irreversible pathways.
The objective function is set to find the maximum of the sum of the weigths. Rather than the maximum value, we are now interested in the existence of any solution. In the following, we utilize the fact that the maximum can be found only if any solutions exist.
Neither RMBNToolbox nor Systems Biology Markup Language take care of the rationality of the generated network models. Possible unstability of the generated model is a typical issue the user has to consider. In this example we look how to exploit control theory for generating models which are unstable and biologically unreasonable and, on the other hand, stable and biologically more reasonable.
in which the vector c contains states of variables, and the system matrix A determines the system properties. The system is known to be stable if the eigenvalues of A have nonpositive real parts, and every eigenvalue with zero as the real part has an associated Jordan block of order one [49, 50].
Next we derive a model for which the user can determine if this stability requirement is fulfilled or not. For this purpose, the biochemical network model described by Equations 1 and 2 needs to be represented in the format of Equation 9.
We begin the model reformulation from kinetic laws, i.e., Equation 2. Because the intended model in Equation 9 is linear, we can utilize for its construction such kinetic laws which are linear too. Kinetic laws of the form first-order mass-action fulfill this need. For example, the kinetic law for reaction j is v j = k j c j where k j is a reaction-specific rate constant and c j is the concentration of the subsrate. Kinetic laws of this form make it possible to represent the reaction rate vector v of Equation 1 by a matrix-vector multiplication
v = Γc, (10)
where Γ is a diagonal n × m matrix storing rate constants k j , for each reaction j = 1, ..., n, on its main diagonal.
During the model construction, matrices S and Γ are randomly generated after which they are multiplied to produce the matrix A. The eigenvalues of A are calculated, and the values of their real parts are examined. The generation is repeated until the eigenvalues indicate the required stable or unstable behavior of the model.
In network model generation, the user has to define various network characteristics that include network components, their connections, stoichiometries, kinetic rate laws, etc. RMBNToolbox helps the user in these tasks by providing functions that make it possible to create and modify various structural and kinetic features. Many generation tasks are automated and, on the other hand, randomization can be exploited efficiently. The most of the features specified in Systems Biology Markup Language are supported by the toolbox. The toolbox does not oversee the rationality of the generated models, because an unreasonable model in one context may be reasonable in another one.
The model generation times are fair even for large models. For example, the genetic regulatory network model presented in Section 'Results' has 1,000 species and 2,000 reactions. The model and the corresponding SBML file were generated in appr. 40 seconds using a PC with 1GB RAM and Pentium M 1,3 GHz processor. Small network models, such as the one used in the stoichiometric analysis example, are generated within one second.
We have presented a software called RMBNToolbox that can be used to generate random models for biochemical networks. The toolbox functions make it possible to generate network models with various user specified characteristics. For example, network structure, stoichiometric coefficients, kinetic laws and parameter values can be easily generated and manipulated. With the help of SBMLToolbox and LibSBML, the models can be translated into the format of Systems Biology Markup Language. The generated network models can be simulated and analyzed using any software that is able to use models provided in SBML format. The toolbox can be easily extended and modified, because it has a modular structure, it is implemented in Matlab environment, and it is freely available under GNU General Public Licence. Random network models can be applied to various purposes in the field of biochemical network modeling. Artificial models are needed to produce noise free data in which the characteristics are precisely known. Only that kind of data can be used for objective evaluation of various data analysis and parameter estimation methods. On the other hand, the new information acquired from biochemical networks can be included into the network model generation. This makes it possible to refine the model gradually while preserving the variations of the unknown parts of the network. Further, it is possible to study various emergent properties in network behavior, such as the effects of varying network connectivity. For these kinds of purposes, a sufficiently large number of network models is generated and the features of interest are varied.
Project name: RMBNToolbox
Project home page: http://sourceforge.net/projects/rmbntoolbox
Operating system(s): Platform independent
Programming language: Matlab
Other requirements: LibSBML 2.3.2 or higher, SBMLToolbox 2.0.0 or higher
License: GNU GPL
This work was funded by the National Technology Agency of Finland. In addition, the work was supported by the Academy of Finland, (application number 213462, Finnish Programme for Centres of Excellence in Research 2006–2011). TA aknowledges Tampere Graduate School in Information Science and Engineering, Signe and Ane Gyllenberg Foundation, and Tamperelaisen tutkimustyön tukisäätiö.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.