ALC: automated reduction of rulebased models
 Markus Koschorreck^{1}Email author and
 Ernst Dieter Gilles^{1}
DOI: 10.1186/17520509291
© Koschorreck and Gilles; licensee BioMed Central Ltd. 2008
Received: 20 June 2008
Accepted: 31 October 2008
Published: 31 October 2008
Abstract
Background
Combinatorial complexity is a challenging problem for the modeling of cellular signal transduction since the association of a few proteins can give rise to an enormous amount of feasible protein complexes. The layerbased approach is an approximative, but accurate method for the mathematical modeling of signaling systems with inherent combinatorial complexity. The number of variables in the simulation equations is highly reduced and the resulting dynamic models show a pronounced modularity. Layerbased modeling allows for the modeling of systems not accessible previously.
Results
ALC (Automated Layer Construction) is a computer program that highly simplifies the building of reduced modular models, according to the layerbased approach. The model is defined using a simple but powerful rulebased syntax that supports the concepts of modularity and macrostates. ALC performs consistency checks on the model definition and provides the model output in different formats (C MEX, MATLAB, Mathematica and SBML) as readytorun simulation files. ALC also provides additional documentation files that simplify the publication or presentation of the models. The tool can be used offline or via a form on the ALC website.
Conclusion
ALC allows for a simple rulebased generation of layerbased reduced models. The model files are given in different formats as readytorun simulation files.
Background
Combinatorial complexity
The association of a few proteins can result in an enormous amount of possible protein complexes [1, 2]. This phenomenon is referred to as combinatorial complexity. A general observation is that there are four possible sources of combinatorial complexity in signaling systems that can also occur in combination with each other:

Binding events at scaffold proteins with more than one binding site

Modification (e.g. phosphorylation) of proteins with several sites

Oligomerization (as a special case: dimerization)

Chain formation
In a simplified form, we previously used the insulin signaling system as a demonstration object for combinatorial complexity [3]. The complexity (145,156,469 possible species) results from a combination of the first three items listed above. Many other signaling systems also show combinatorial complexity [1]. As outlined below, one ODE (ordinary differential equation) is required for the balance of each possible complex. This leads to enormous problems in the modeling of systems with inherent combinatorial complexity.
Structure of the introduction
In the following sections, we introduce general modeling concepts which are characterized by significant differences in their ability to cope with combinatorial complexity. We outline the conventional approach to model biochemical systems, discuss its limitations and review recent techniques that allow for the modeling of systems with inherent combinatorial complexity. A very promising approach is that of layerbased modeling [3], for which we present ALC (Automated Layer Construction), a tool for rulebased automated modeling. The main motivation of this comprehensive introduction is to provide simplified but accurate information regarding layerbased modeling for those who are not familiar with the method. More involved discussions and several examples can be found in our previous publication introducing layerbased modeling [3]. Readers who are already familiar with rules and layerbased modeling should read the "general concepts and definitions" section but may skip the remainder of the introduction and directly proceed to the "implementation" chapter where we introduce the software tool ALC.
General concepts and definitions
In this section, we introduce the notation for the variables and their dependencies that we use later on. Proceeding in such a formal way may not be necessary for conventional modeling, however, introducing the layerbased approach and ALC benefits from a precise terminology.
Modules
Systems can often be decomposed into selfcontained subsystems that interact with other subsystems in a welldefined way. Such subsystems are called modules. The process of defining the modules and their connections is called modularization.
Molecule definitions
The molecule definition introduced here is not equivalent to the strict chemical understanding of molecules where all atoms have to be linked by covalent bonds. A molecule definition represents the class of possible modifications of a protein (or another molecule). It consists of the molecule name that is optionally followed by the successive definition of all sites (if there are any). A site definition consists of a commaseparated sequence of all possible modifications of this site, enclosed by curly brackets. As an example for a molecule definition, R{0, L}{0, P} defines the molecule R. The possible modifications (or configurations) of R are '0' and 'L' at the first site as well as '0' and 'P' at the second site. Note that we allow for the consideration of a bound ligand (e.g. L) as a site configuration of the receptor molecule. There can also be molecules that exist in only one configuration. The modification of them results in new molecules.
Species and concentrations
Species correspond to a specific configuration of a molecule and can be considered as instances of the corresponding molecule definitions. Notations for species consist of the molecule name followed by the commaseparated sequence of site configurations within one pair of squared brackets. As an example, the species R[L, P] is a specified configuration of the molecule R. The ligand L is bound to the first site, the second site is phosphorylated. The species R[0, 0] defines the unphosphorylated receptor without ligand. If a molecule A can occur in just one state (i.e. the molecule definition is A), this single species is also denoted as A. For the sake of simplicity, we use the same notation for species and their concentrations.
Macroscopic variables
In exact model reduction [4, 5] and layerbased modeling [3] there exist variables that correspond to sums of species of a corresponding conventional model. We call these variables macroscopic variables (or lumped states). Note that species can be macroscopic variables.
Microscopic species
All species that are not macroscopic variables are microscopic species (or microstates). Microscopic species correspond to a distinct modification of a molecule and do not represent a pool of chemical molecules or complexes with common properties. Microscopic species are species that could also occur in a conventional model. Therefore, in the conventional modeling approach, microscopic species are equivalent to species.
Macrostates and patterns
Macrostates describe macroscopic variables, e.g. degrees of phosphorylation or occupancy which correspond to sums of species. Their notation is analogous to that for species, with the sole difference being that the modification 'X' at a specific site indicates that all distinct modifications at this site are included. Therefore, a site modification 'X' of a molecule indicates the sum of all possible modifications of this site. Macrostates in rules are interpreted as patterns. Each species of the class defined by the pattern occurs in a separate reaction when the rule is evaluated. Therefore, patterns do not represent sums of species (as macrostates do) but each of the corresponding species individually (see section "rulebased modeling"). Like species, macrostates can also be considered as instances of the corresponding molecule definitions. Note that not all macroscopic variables are macrostates. The hallmark of macrostates and patterns is that they must contain at least one site modification 'X'.
As an example, the macrostate ERB[L, X, X] defines the sum of all species of the molecule ERB that have a bound ligand at the first site, regardless of the state of the other sites.
Note that the term macrostate is used with a slightly different meaning in other contributions [4, 5], where it represents what we call a macroscopic variable (or macroscopic species). We reserve the term macrostate for sums of species that are defined by at least one site modification 'X'. The so defined macrostates are also macrostates or mesoscopic states as defined by Borisov et al. [4] and Conzelmann et al. [5].
Complexes
A complex is represented by a list of commaseparated species within curly brackets where the binding partner is indicated at each occupied binding site. Indices have to be used if it is required to achieve uniqueness.
As an example, consider the binding of a molecule R with n sites and a molecule S with k sites. The complex of R and S is {R[m_{1},..,S,...,m_{ n }], S[m_{1},...,R,...,m_{ k }]}, where m_{ i }denotes the modification at the site position i. If we also consider another molecule T that has q sites, the complex of R, S and T, where S and T bind to R is denoted as {R[m_{1},..,S,...,T,...,m_{ n }], S[m_{1},...,R,...,m_{ k }], T[m_{1},...,R,...,m_{ q }]}. This definition of complex notation is very general but cumbersome in many cases.
We introduce a simplified notation that will be used from now on and is supported by ALC. In many cases, this notation is less cumbersome and more intuitive than the general one. Consider the case that the molecules S and T both have only one site which is a binding site for R. The complex of R, S and T then is R[m_{1},..,S,...,T,...,m_{ n }] and can be treated as a species of the molecule R. If S and T have more sites, the configuration of S and T in the complex with R can be indicated by introducing additional site modifications of the corresponding site on R. If two of the associating molecules have several sites, the class of possible complexes can be defined by a new molecule definition which consists of a new molecule name and the sequence of site definitions of the corresponding molecule definitions for the monomers. We exemplify this for the general case where R has a binding site for S and n other sites, while S has a binding site for R and k other sites. A species representing a complex of R and S is $RS[{m}_{1}^{R},\mathrm{...}{m}_{n}^{R},{m}_{1}^{S},\mathrm{...}{m}_{k}^{S}]$, where the superscripts indicate the origin of the sites. Dimerization can be treated analogously.
Due to the modular structure of layerbased models, this simplified notation is usually more convenient than the general one, even if the signaling network is highly branched and contains many scaffold proteins.
Summary
Modules are selfcontained subsystems of a system that interact with other subsystems in a welldefined way. Species and macrostates are instances of molecule definitions which in turn define classes of species. Species can be microscopic species that correspond to a distinct modification of a chemical molecule or a complex. Species can also be macroscopic variables that correspond to sums of microscopic species. Macrostates are macroscopic variables that correspond to sums of species and whose hallmark is that they must contain at least one site modification 'X'. Patterns occurring in rules have the same notation as macrostates and define classes of species. Each species of such a class occurs in a separate reaction when the rule is evaluated. Following the simple notation of complexes, they are treated as species of one of the participating molecules or as species of newly defined molecules.
Reactions and kinetics
Mass action kinetics
If chemical systems are modeled at the molecular level, mass action kinetics are often a good description of the chemical processes [6] and are therefore frequently used. We use a representation of reaction equations which includes the dynamic information by adding the kinetic parameters of the reaction. In a reaction such as
aA + bB... ⇋ cC + dD... k 1 k 1d
A, B, C and D are the participating species and a, b, c and d are their stoichiometric coefficients. The kinetic parameters of the forward and backward reactions are k 1 and k 1d, respectively. According to the law of mass action [6], the reaction rate r for this reaction (Equation 1) is given as
r = k 1·A^{ a }·B^{ b }...  k 1d·C^{ c }·D^{ d }...
where A, B, C and D are the concentrations of the corresponding species. Taking the stoichiometric coefficients as the exponents in the rate law is only justified if the system is homogeneous and if the mixture is ideal [6, 7]. We will always assume that these assumptions are fulfilled.
Reactions may be irreversible which can be indicated by a reaction parameter that equals zero or a reaction symbol that indicates the direction of the reaction ('→' or '←' instead of '⇋'). In the latter case, only one kinetic parameter has to be given.
Generalized mass action kinetics
In the classical law of mass action for ideal mixtures, k 1 and k 1d are constants [6] which only depend on the temperature and on the pressure [7]. In the generalized law of mass action presented in [6], the constants k 1 and k 1d are multiplied by a common positive nonlinear function (that may also depend on concentrations of species). We use a generalized law of mass action that is more general where both reaction parameters may be general nonlinear functions.
Enzyme kinetics
Note that the power of S that is 'missing' in the numerator of the parameter expression in Equation 3 is reintroduced by building the rate law according to the generalized law of mass action (Equation 4). There exist several extension of MichaelisMenten kinetics that can also be formulated by the generalized law of mass action [6].
Conventional modeling and its limitations
The conventional (mechanistic) modeling approach for biochemical networks is to define each chemical reaction by a reaction equation. A rate law has to be assigned for each reaction and one ODE is required for the balance of each microscopic species [6]. This approach is frequently applied in systems biology [8–17].
Conventional modeling is very powerful for small reaction systems however its limitations become clearly visible when considering systems with inherent combinatorial complexity. For these systems, the numbers of necessary reactions and ODEs are very high [2]. As an example, 1.5·10^{8} ODEs would be necessary to get a conventional model of the insulin signaling system [3]. Therefore, the modeling of systems with inherent combinatorial complexity is very difficult, or even impossible using the conventional modeling approach. A frequently used solution of this problem is to simplify the combinatorial variety by focusing on small subsets of the possible reactions and complexes. This method can work well and is often applied [8–15], but there are also cases in which the resulting models show poor accuracy when compared to a model which considers all possible complexes and reactions.
The reason for these differences in approximation quality is that it is very difficult to decide which reactions and complexes can be neglected and which cannot. Apparently wellfounded assumptions may result in large approximation errors [5]. The structure of the reduced model (i.e. the decision of which species and reactions can be neglected; the assumed temporal order of the processes) depends on the parameter values of the reaction network [18]. However, these parameters are often not known and can only be estimated after the setup of the model equations.
A few systematic approaches for the modeling of systems with inherent combinatorial complexity exist. In the following sections, first of all we outline the exact modeling techniques and then give an introduction to layerbased modeling; a recent approximative technique.
Rulebased modeling
A single molecular event, e.g the binding of an effector, is often described by a long list of reactions. Large subsets of these reactions or even all reactions are usually parameterized by the same kinetic constants. Such subsets can be represented using generalized reactions with patterns instead of species [2, 19, 20]. These generalized reactions are called rules and each of them represents a class of reactions that are parameterized by the same kinetic constants. In many cases, a relatively short list of rules is sufficient to describe the reactions of a signaling system. Software tools exist [20–26] which automatically generate the simulation files from a rulebased model definition.
These reactions can be expressed by a single rule containing patterns on both sides.
R[0, X] + L ⇋ R[L, X] k 1 k 1d
Sites with a modification 'X' at the same position on both sides of the reaction equation correspond to each other. When rules are evaluated, the species in each reaction have the same modification at this site on both sides of the reaction equation.
Rules can also be parameterized by generalized mass actions kinetics in the same way as reactions. We give an example for enzyme kinetics of rules in the ALC user guide (Additional file 1).
Altogether, rules highly simplify the model representation and the generation of the model equations. A major drawback of rulebased modeling is that the resulting model may be very large as one ODE is required for each feasible species. The concept of rules however, is very powerful and is also used in stochastic simulation, exact model reduction and layerbased modeling, all of which are introduced below.
Stochastic simulation
Chemical processes at the molecular level show a stochastic behavior which is described by the chemical master equation [27]. Approaches directly reflecting the stochasticity describe integer populations of distinct species and assign to each reaction a probability for its occurrence in a certain time interval. Software tools exist that allow for the stochastic simulation of a rulebased model [21–26].
There are two major approaches for the stochastic simulation; namely onthefly and generatefirst [28]. The generatefirst approach uses rules to define the total reaction network of the system which is used for the stochastic simulation. The onthefly approach allows for the reduction of the model size. During the simulation, all complexes are only balanced if they are populated. The reactions of the complexes are also only generated if their occurrence is possible, i.e. if the complexes that act as reactants are populated. The advantage of the reduced model size comes at the price of a high computational effort if many simulation runs are necessary, e.g. for parameter estimation.
We do not discuss the stochastic approaches in detail and focus on deterministic modeling using ODEs. This approach describes chemical processes well, if the number of molecules and the volume are large enough [27].
Exact model reduction
Exact model reduction aims to achieve a macroscopic description of biochemical systems. This is motivated by the view that domains (functional components of proteins) and not species are the fundamental elements of signal transduction [29]. State variables in exactly reduced models are macroscopic quantities, such as levels of occupancy or degrees of phosphorylation that represent the state of the domains. These macroscopic quantities are preserved in exactly reduced models. Information about all single species is usually not of interest and is no longer explicitly contained in the models. In the last few years, two exact techniques for the reduction of deterministic models have been developed [4, 5].
Borisov et al. proposed the first exact approach to reduce the enormous amount of possible model equations [4]. The starting point is a rulebased model definition from which the reduced model is directly generated. However, the method only considers modularity within a single molecule with many sites. A major deficiency of this method is that the modularity of the entire reaction network cannot be considered. Therefore, this method is very valuable for describing complex formation at one large scaffold protein, however, it cannot be applied to many real signaling systems.
Conzelmann et al. [5] extended and generalized the approach by proposing an exact domainoriented lumping technique which can be applied to a large class of systems. However, for the application of this method it is necessary to evaluate the rulebased model definition and to use a conventional ODE model as the starting point. The conventional ODE model (that can be very large) is subjected to a state space transformation which results in the exactly reduced model if the unobservable states are omitted. The formal reduction method presented by Conzelmann et al. [5] has recently been extended and improved [30, 31]. The modified approach facilitates the generation of the exactly reduced model equations and does not require the former generation of a complete conventional model any more.
As in the approach of Borisov et al. [4], the performance of domainoriented lumping [5, 31] strongly depends on the interactions in the network. Enormous reduction is possible in some cases. In other cases no reduction is possible [3]. In many cases, the resulting reduced models still consist of too many ODEs for efficient simulation or analysis.
Layerbased modeling
Layerbased modeling allows for a macroscopic and highly reduced description of signaling systems [3]. A very convenient feature of layerbased modeling is that the reduced model can be directly obtained, which means in particular that the preceding generation of a conventional model is not necessary. The number of necessary ODEs in layerbased models is dramatically reduced. As an example, modeling of the insulin signaling system is possible with only 214 ODEs, instead of the 1.5·10^{8} needed when using conventional mechanistic modeling [3].
A graph of all processes and their interactions serves to define the modules that are called layers. The different layers are connected in a highly standardized way. They either exchange signals that correspond to macroscopic quantities, or they are not connected. No reaction proceeds across layer boundaries, therefore it is possible to model the layers separately from each other once their connections are defined. In special cases, a layerbased model exactly describes the dynamics of macroscopic quantities. In typical biochemical scenarios, it provides an approximative, but highly accurate description.
In this section, we give a detailed introduction to layerbased modeling. We start by introducing the concepts of processes and interactions necessary for the building of interaction graphs, which define the modularity of layerbased models. Afterwards we describe how the layers can be modeled and how their connections are defined. At the end of the introduction we discuss the mathematical background and the approximation quality of layerbased modeling.
Processes
In combinatorial reaction networks, the same molecular event often occurs in many different reactions. The high number of reactions results from the high amount of different species that participate in the reactions. All reactions that describe the same molecular event define the corresponding process. As an example, all reactions that change a certain level of occupancy or degree of phosphorylation belong to the same process. Note that the inverse molecular event also belongs to the same process (e.g. the process effector binding also contains all reactions describing effector dissociation).
The reactions of a process may either be parameterized by the same or by different kinetic constants depending on the presence of process interactions.
Interactions
Interactions between processes result in different parameter values for different reactions of the same process. This means that the reactions of a process that is influenced by other processes are not parameterized by the same kinetic constants. If the reaction parameters of a process depend on a modification that is performed by another process, these two processes interact with each other. If reaction parameters of two processes do not depend on the other process, these two processes do not interact.
There exist three structurally different types of interactions between processes. They can interact via graded interactions, allornone interactions or they do not interact [3].
Graded interaction
Allornone interaction
Allornone interactions are a limiting case of graded interactions, where both processes can only occur under certain preconditions, provided by the other process. A typical case is the interaction of the processes of binding site phosphorylation and effector binding (Figure 1B). The effector can only bind if the binding site is phosphorylated. Additionally, the dephosphorylation of the binding site is only possible if the site is not occupied. A hallmark of allornone interactions is the degeneration of reaction cycles to reaction chains (Figure 1C).
Phosphorylation and effector binding are the most common examples of processes that undergo an allornone interaction. Such an interaction is also possible for other pairs of processes, e.g ligand binding and receptor dimerization (dimerization occurs only if the ligand is bound, ligand dissociation is only possible for monomers), or receptor dimerization and receptor phosphorylation (phosphorylation is only possible if the receptor is dimerized, dissociation is only possible if the receptor is unphosphorylated). To simplify the discussion, and without loss of generality, we will always assume that the processes that undergo allornone interactions are phosphorylations and effector bindings.
No interaction
The third type of interaction, also a limiting case of graded interactions, is that the processes do not interact. Kinetic parameters of each process are not influenced by the other process (Figure 1A).
The interaction graph and layers
The third step in layerbased modeling consists of determining the modularity of the model from the interaction graph. All processes that are directly or indirectly connected by graded interactions form a module that is called layer. The set of these processes can be directly obtained from the interaction graph, as all processes that are connected by red lines (representing graded interactions) are to be described within the same layer. Note that layers may only contain a single process, and that there may be many layers. The connectivity of the layers is defined by allornone interactions between the processes of the different layers. After the definition of the connections, the layers are modeled independently from each other, as if all processes belonging to other layers do not exist.
In contrast to other modularization techniques [32–35] where the modularity is deduced from the entire model, layers can be modeled independently from each other once their connections are defined. This may give the impression that the user defines the modularity, however, the modularity is uniquely defined by the interaction pattern of the processes. Once the modeler has defined which processes interact in which way, they have only one way of influencing the modularity, which is to merge two or more layers into a larger one.
Mass flow and signal flow
In this subsection, we discuss the characteristics of mass flows and signal flows, and their occurrence in layerbased modeling. This helps in the understanding of basic principles of layerbased modeling, as the flows within layers and between layers are qualitatively different.
Mass flow
Each reaction defines a transition between species whose quantity is given by the reaction rate. We refer to these transitions as mass flows. Mass flows occur within layers and do not cross layer boundaries. Note that fluxes without spatial direction (e.g. within a compartment) are also considered as mass flows.
Signal flow
Signal flows define information transfers. In contrast to mass flow, signal flow is not associated to reactions. In layerbased modeling, signal flows occur between layers and can be represented by macroscopic variables, usually sums of species with phosphorylated binding sites x_{ i }and corresponding sums of species with occupied binding sites x_{ i }b. These signals are exchanged between layers containing processes that interact via allornone interactions. This will be discussed in detail below.
Flows and layers
Layers are either only connected by signal flows or are not connected at all. No mass flows cross layer boundaries as no reactions transport substance from one layer to the other. Within layers there are mass flows defined by reaction equations and the corresponding rates as in conventional modeling. The connections between layers show retroactivity [34] because the signal flow is bidirectional. However, changes in a layer (e.g. the introduction of an additional regulatory site) that do not affect the notation of x_{ i }and x_{ i }b are usually not followed by changes in another layer. Therefore, the layers can be modeled separately from each other as long as the notations of the signal flows between the layers are defined.
Modeling of layers
In the fourth step of layerbased modeling, the signal flows between the layers are defined and the layers are modeled separately from each other. Modeling within layers shows remarkable similarities to conventional modeling, however, there are differences that mainly result from the presence of allornone interactions between processes of different layers.
Phosphorylation of binding sites
Allornone interactions between binding site phosphorylations and the corresponding effector bindings bring much of the reduction potential of layerbased modeling. An allornone interaction implies that a phosphorylated binding site remains phosphorylated while the effector is bound, and that the effector binds only to phosphorylated binding sites. The phosphorylation of a binding site is often described in a different layer than the layer in which effector binding to the phosphorylated binding site is described. However, if there are graded interactions that connect the binding site phosphorylation and effector binding processes indirectly, these two processes have to be described in the same layer. It is obvious that the phosphorylation of binding sites to which effector binding is described in another layer has to be treated differently than the phosphorylation of regulatory sites or the phosphorylation of binding sites to which the effectors bind in the same layer.
To indicate the phosphorylation of a binding site to which the effector binds in another layer, the site notation (uppercase) 'P' is used. The phosphorylation of other sites can be indicated by other notations, e.g. (lowercase) 'p'. Species with a site modification (uppercase) 'P' comprise microscopic species with a phosphorylated but unoccupied binding site as well as microscopic species with a phosphorylated and occupied binding site.
Effector binding to phosphorylated binding sites in other layers does not directly change the concentration of species with a site modification 'P'. Therefore, layers only exchange signal flows and not mass flows across layer boundaries. The signal flows between the layers are used to guarantee that only unoccupied binding sites can be dephosphorylated (see below).
Reactions
The processes of each layer are described by reactions as if there were no other layers. Note that this also holds for the phosphorylation of binding sites to which effector binding is described in another layer. The sole difference is that the phosphorylation of binding sites to which the effectors bind in other layers has to be indicated by a site modification (uppercase) 'P'.
Effectors can only bind to a subset of phosphorylated binding sites, namely those that are not occupied. Effector binding to such a binding site in another layer is performed by introducing a new species that represents the sum of all microscopic species with phosphorylated but unoccupied binding sites. This new species acts as a binding partner for the effector and is defined by an algebraic equation. It is defined as the difference of the sum of species with phosphorylated binding sites x_{ i }and the corresponding sum of species with occupied binding sites x_{ i }b. These sums define the signal flows between the layers and are discussed in the next paragraph.
Layer connections
For the realization of allornone interactions between two processes of different layers, signal flows have to be exchanged between the layers. These signal flows are typically defined by the sums of species with phosphorylated binding sites x_{ i }and the corresponding sums of species with occupied binding sites x_{ i }b. Note that the signal flows x_{ i }and x_{ i }b often correspond to experimental readouts. These signals are used to compute correction terms assuring that only unoccupied binding sites are dephosphorylated (see below, e.g. Equation 7). Their differences define the concentrations of binding partners for effectors whose binding site phosphorylations are described in different layers.
Rates and ODEs
It represents the fraction of species with phosphorylated binding sites that are not occupied. The need for these correction terms results from the loss of information and the corresponding reduction of the model size that is due to the modularization of the network. The correction terms always have the same structure, however, different x_{ i }and the corresponding x_{ i }b are taken for each binding site.
The rate for each dephosphorylation reaction of a binding site with a site modification 'P' is multiplied by the appropriate c_{ i }which guarantees that only unoccupied binding sites are dephosphorylated.
All dephosphorylation rates for a specific site with a modification 'P' are multiplied by the same c_{ i }. The implicit assumption behind this is that all species with this phosphorylated binding site (that all count to the same x_{ i }) have the same fraction of unoccupied phosphorylated binding sites. This is the core of the approximation and the reduction and results from the rapid equilibrium assumptions that are discussed in the mathematical background.
The ODEs are generated as they would be in conventional modeling. Note that no ODE is necessary for the binding partners in binding reactions that are defined by algebraic equations as differences of x_{ i }and x_{ i }b. The sums x_{ i }and x_{ i }b are also defined by algebraic equations.
A small example system
In a small example system that has already been intensively discussed [3], it is assumed that a receptor R has two sites. The first is a binding site for the ligand L, the second is a binding site for the effector E. This binding site has to be phosphorylated before E can bind. A conventional representation of this system is shown in Figure 2. To simplify the discussion, the receptor is denoted as D in the conventional detailed model and is denoted as R in the layerbased reduced model.
The processes of ligand binding and autophosphorylation perform a graded interaction (Figure 1A). In this special case, ligand binding stimulates the autophosphorylation, whereas the autophosphorylation does not influence ligand binding. The processes of autophosphorylation and effector binding undergo an allornone interaction (Figure 1B). Note that another reaction chain that is analogous to the one in Figure 1C and includes microscopic receptor species with a bound L exists.
where totR is the total concentration of the receptor. In this example, the concentration of the (extracellular) ligand L is set to the constant value 10.
The model equations of the effector layer are
r_{4} = k 4·RXp·E  k 4d·RXE
Equations 8 and 10 show that the layers can be modeled independently from each other as long as the connections between the layers (Equation 11) are defined.
Note that the letter 'X' only indicates a macrostate if it is a site configuration. In the species RXE and RXp, the 'X' is a part of the molecule name. Therefore, RXE and RXp are not macrostates. However, RXE represents the sum of all microscopic species where E is bound to the receptor, and RXp represents the sum of all microscopic species where the binding site is phosphorylated and not occupied. Therefore, both are macroscopic species.
Advanced strategies
Using the descriptions that have been provided above, it is possible to model a large class of systems by applying the step by step procedure below. Here we introduce advanced strategies that allow for an additional reduction of layerbased models. All strategies are exemplified by model definitions for ALC resulting in executable models (Additional file 2).
Equivalent binding sites
If n binding sites on a molecule are exactly equivalent and do not influence each other (and if this molecule is not an oligomer that can dissociate) it is sufficient to model only one binding site. Note that this holds only if the initial conditions of the degrees of phosphorylation are also identical.
If it is desired to have a model output representing the sum of all phosphorylated binding sites (for both linear and nonlinear kinetics), the factor n has to be considered and the output is n·R[P].
Equivalent sites that are not binding sites and that do not influence each other can be treated in an analogous way. The reduction potential resulting from equivalent binding sites can be exploited to exactly reduce the model of insulin signaling with 214 ODEs [3] to 51 ODEs (Additional file 3).
One effector binding to different binding sites
RXp represents the sum of all phosphorylated but unoccupied binding sites on the dimers and monomers of the receptor. The Binding of E[0] to RXp leads to E[R].
The correction term c_{ R }for the dephosphorylation of all these binding sites is the same and equals c_{ R }= (x_{ R } x_{ R }b)/x_{ R }. This corresponds to the assumption that the same fraction c_{ R }of phosphorylated binding sites is occupied for all those sites with a site modification 'P'.
This strategy also prevents the danger of negative concentrations in the effector layers that may occur if e.g. receptor dimerization is faster than effector dissociation and R_{1}[P] rapidly 'vanishes' due to dimerization. In an example (where the actual advanced strategy is not applied) x_{R 1}= R[P, X,...] is the sum of monomers with phosphorylated binding sites and the associated x_{R 1}b represents all effectors that are bound to receptor monomers. Let the system be in a state where a large fraction of the binding sites is occupied, i.e. x_{R 1} x_{R 1}b ≈ 0. Assume that an external stimulus induces the rapid dimerization of the receptor which rapidly lowers the value of x_{R 1}because the dimers are not included in x_{R 1}. Note that the dimerization has no direct influence on x_{R 1}b. Therefore, it is possible that the concentration of the species R 1Xp = x_{R 1} x_{R 1}b representing the concentration of the sum of unoccupied binding sites of the monomer becomes smaller than zero. As mentioned above, this problem can and should be avoided by using a common x_{ i }for corresponding sites on the monomers and dimers.
Additional signals between layers
Additional signals between layers are allowed as long as no graded interactions are introduced by them. In a typical example, the processes of receptor activation and receptor phosphorylation are described in the receptor layer, whereas the binding and phosphorylation of effectors is described in different layers. The phosphorylation of effectors that are bound to receptors is performed by activated receptors. It is important that one specific activated receptor does not selectively phosphorylate effector molecules that are bound to it. This would introduce a graded interaction between the processes of receptor activation and effector phosphorylation and therefore destroy the modular structure of the model. Instead of this, it is advantageous to assume that the pool of activated receptors acts as an enzyme and phosphorylates receptorbound effectors. In this case, the signal representing receptor activity can be transferred to the effector layers as an additional signal.
In general, the additional coupling between processes of different layers must occur in the form of a meanfield assumption. This means that the rate of an event in a downstream layer (e.g. effector phosphorylation) is proportional to the average value of some property computed from the upstream layer (e.g. receptor activation).
Step by step procedure: layerbased modeling
 1.
Identify all processes and their interactions.
 2.
Draw the interaction graph.
 3.
Deduce from the interaction graph how many layers are in the model and which processes form part of which layer. All processes that are coupled by graded interactions are part of the same layer. Processes of different layers are coupled by allornone interactions or do not interact.
 4.
Model each layer separately as if the processes of all other layers do not exist.
 (a)
Define rules and reactions, as in conventional modeling. The site modification (uppercase) 'P' is reserved for indicating the phosphorylation of binding sites to which effectors bind in other layers. A different notation (e.g. lowercase 'p') has to be used to indicate the phosphorylation of other sites.
 (b)
Define all sums of species with phosphorylated binding sites x_{ i }and all sums of species with occupied binding sites x_{ i }b. This only has to be done for binding sites with a site modification 'P'. Note that x_{ i }b has the same notation as the corresponding x_{ i }, followed by (lowercase) 'b'.
 (c)
Use algebraic equations to define the concentrations of the species that act as binding partners in effector binding to binding sites whose phosphorylation is described in other layers. These species are defined as differences of x_{ i }and x_{ i }b.
 (d)
Assign a rate to each reaction, as in conventional modeling. A special case is the dephosphorylation of binding sites with a site modification (uppercase) 'P'. The rate expressions for these dephosphorylation reactions have to be multiplied by the appropriate correction term c_{ i }= (x_{ i } x_{ i }b)/x_{ i }. Using the correction terms c_{ i }guarantees that the dephosphorylation is only possible for unoccupied binding sites.
 (e)
Conservation relations can be used to replace one ODE for each conserved moiety (e.g. a protein that is not degraded or synthesized).
 (f)
Use reactions and rates, as in conventional modeling, to build the ODEs for all species that are not defined by an algebraic equation.
 (g)
Assign initial conditions guaranteeing x_{ i }> 0 (prevents division by zero) and x_{ i }≥ x_{ i }b ∀ i (prevents negative concentrations of binding partners).
Mathematical background
In the following subsection, we give an overview about the mathematical background of layerbased modeling. We do not recapitulate the detailed discussions provided elsewhere [3], but give the main results and some additional descriptive interpretations. Understanding the mathematical background is not necessary for layerbased modeling. Therefore, this subsection may be skipped if the reader is not interested in mathematical details.
We start the discussion by describing the rapid equilibrium assumption, which is an important part of the theoretical basis for the reduction potential of layerbased modeling.
Rapid equilibrium assumption
This reaction is almost in equilibrium though a nonzero flux r_{ i }goes through it. Equation 17 can be used to replace one ODE of the system. The resulting reduced model provides a good approximation of the dynamics after a short initial time span (or for all times if the initial conditions already fulfill Equation 17) [6].
Diagonal reactions
which implies that the processes of L binding and E binding are completely independent. However, this is an approximation as there is an indirect interaction between these processes (Figure 3) that results in (dynamic) deviations from Equation 19.
Equation 19 can be rewritten as
D[0, p]·D[L, E]  D[L, p]·D[0, E] = 0.
This corresponds to a rapid equilibrium assumption for the reaction
D[0, p] + D[L, E] ⇋ D[L, p] + D[0,E]
which is not part of the reaction scheme (Figure 2) and where the equilibrium constant equals one. This reaction fills the diagonals of the square representing the reaction cycle (Figure 5). Therefore, we refer to this kind of fast virtual reaction as a diagonal reaction.
Assuming that the two processes are indeed independent, the equations following from the calculus of probability as discussed by Borisov et al. [4] can be transformed to Equation 20 which leads to a different interpretation of the same equations [3].
As Equation 20 is only one of three equilibrium conditions, assuming equilibrium for the diagonal reaction is a weaker assumption than assuming equilibrium for the whole reaction cycle.
The approximation in layerbased modeling is based on the (implicit) equilibrium assumption for diagonal reactions connecting processes of different layers that do not interact directly. Note that such diagonal reactions can connect more than two processes (for an example see Additional file 4, p.3 in [3]). Each such equilibrium assumption reduces the number of necessary ODEs by one.
Nonuniqueness of diagonal reactions
The list of diagonal reactions is not unique. In almost all larger models there are several conditions such as Equation 19, which leads to
q_{1} = q_{2} = ... = q_{ n }
where q_{ i }are quotients of microscopic species of the conventional model (see again Additional file 4, p.3 in [3]). Equation 22 provides n  1 independent equations. It can be easily seen that the list of the corresponding diagonal reactions depends on the choice of the n  1 independent equations that result from Equation (22). However, the set of microscopic species connected by the diagonal reactions is unique.
Approximation quality
We introduce a measure for the approximation error g as the deviation from Equation 20.
D[0, p]·D[L, E]  D[L, p]·D[0, E] = g
The error g(t) quantifies the distance of the diagonal reaction (Equation 21, Figure 5) from equilibrium. It follows a first order ODE
A very important result from Equation 26 is that the approximation error declines and is bounded because u(t) is bounded from above and a(t) is bounded from both below and above. Similar expressions to Equation 26 can be given for u(t) ≤ 0 and g_{0} ≤ 0.
There is one error equation (with g_{ i }, a_{ i }and u_{ i }) as Equation 24 for each diagonal reaction. In the most simple case, a_{ i }equals the sum of the four kinetic parameters of the reaction cycle. In each case, a_{ i }is large if the reaction parameters in the corresponding cycle are large and u_{ i }is a weighted sum of external fluxes (that can be positive or negative) entering the reaction cycle. When unbalanced (u_{ i }(t) ≠ 0), these incoming fluxes can lead to fluxes through the diagonal reaction that lower the approximation quality.
All u_{ i }(t) vanish in thermodynamic equilibrium or in the absence of graded interactions. Therefore, layerbased modeling provides an exactly reduced model if there are no graded interactions (i.e. if all processes are connected by allornone interactions or do not interact) or if the layers containing graded interactions are not coupled to the other layers. In both cases, the initial conditions of the corresponding conventional model have to fulfill the equilibrium conditions for the diagonal reactions (e.g. Equation 20). If they do not, the approximation is only stationarily exact.
The approximation is also stationarily exact if all fluxes vanish in the stationary case, i.e. if the system reaches thermodynamic equilibrium. Note that a system is able to reach thermodynamic equilibrium if the Wegscheider condition [6] is fulfilled for the entire reaction network and if no species or fluxes are set to a constant value (or follow an externally imposed trajectory). Another possibility for stationary exactness is the presence of a stationary flux distribution, characterized by u_{ i }(t) ≡ 0 ∀ i even if the system does not reach equilibrium.
In all other cases, layerbased modeling is a purely approximative method. However, for typical scenarios the approximation error is very low. We showed this for an example system where parameter variations of four orders of magnitude around values from literature were not able to cause a low approximation quality [3].
Evaluation of the approximation quality
Up until now there has been no easy method to quantify the approximation error of large layerbased models. A very laborious method (which is not applicable for very large systems) is to build the conventional model and compare typical simulation results for macroscopic variables. Another method which also needs to simulate the corresponding conventional model is to check all errors g_{ i }. If all g_{ i }decline rapidly, the approximation quality is high. However, a large error for some g_{ i }(e.g. Equation 23) is not a direct evidence for poor approximation quality. If a state variable that is approximated by an equilibrium assumption for a diagonal reaction (as Equation 20) has only little influence on the state variables of the layerbased model, even a large error g_{ i }will not result in a large approximation error. This depends on the observability of this state variable, which is difficult to quantify and discussed elsewhere [36]. We recommend that more qualitative checks, described below, be performed in order to assess the approximation quality.
In the typical case, effector binding and the corresponding binding site phosphorylation are described in different layers. If the binding of the effector is relatively fast (as the binding of most effectors is) the approximation quality is usually high. If there are fast processes in the layer describing the binding site phosphorylation that connect all or most of the different phosphorylated species, the approximation quality is usually high as well. High parameter values for these fast processes result in high values of the exponents a_{ i }and therefore in a high approximation quality (Equation 26). The process of ligand binding is a candidate for such a fast process in the example system (Figure 4).
If the binding of an effector is slow, it should be considered to merge the layers describing this effector binding and the corresponding binding site phosphorylation to a single one. In this case, the erroneous equilibrium assumptions for the slow process are not introduced and do not lead to an approximation error (but do also not contribute to the model reduction).
If one is not sure about the approximation quality resulting from certain parameter combinations, one could build a conventional model of a subsystem comprising the processes belonging to the considered parameters and all processes that directly or indirectly interact with them via graded interactions. The approximation error can be analyzed quantitatively for the comparison of simulation results for macroscopic quantities of the conventional subsystem with those of a corresponding layerbased model.
guaranteeing x_{ i }≥ x_{ i }b ∀ i. External fluxes ${f}_{{x}_{i}}$ that reflect the changes in x_{ i }have to be included in the conventional model. These fluxes ${f}_{{x}_{i}}$ add or remove the corresponding species with phosphorylated and unoccupied binding sites from the conventional model. Note that this species is defined by x_{ i } x_{ i }b in the layerbased model. The velocity with which a certain x_{ i }may change depends on the kinetic parameters for phosphorylation and dephosphorylation of the corresponding binding site.
Reduction of the model size
A typical situation in cellular signaling is that there are multiple sites within the same protein. In this case, the size of the conventional model is a product of the number of states of each site, where each possible state of a binding partner must be considered as a separate state of the site. In the layerbased model, however, each binding site that is phosphorylated has only two states (unphosphorylated and phosphorylated). The states of the binding partner are modeled in a separate module (layer) and do not increase the number of states of the original protein. Another typical case is that binding partners are phosphorylated on binding sites and bind other effectors. In this case, the argumentation above can be applied to each additional effector in such an association chain.
The exponential growth of the number of state variables with the number of binding sites and binding partners in conventional models is partly replaced by linear growth with the number of layers. Within the layers there is still exponential growth with respect to the number of molecular sites included in the layer. Both scenarios mentioned above lead to implicitly assumed equilibrium conditions for diagonal reactions in layerbased models. Each such equilibrium condition reduces the number of ODEs by one. In the small example system, one diagonal reaction (Equation 21) is assumed to be in equilibrium (Equation 20), which reduces the number of ODEs by one compared to a conventional model.
Stochastic simulation of layerbased models
Layerbased modeling has been introduced as a technique for deterministic modeling [3]. However, stochastic simulation of layerbased models is in principle possible.
A general problem in using stochastic simulation tools for layerbased models is that there are species that are defined by algebraic assignments. Additionally, it is crucial that the reaction parameters are allowed to be functions of species which becomes apparent when considering the correction terms. Therefore, only simulators can be used that support algebraic assignments and flexible parameter formats. Note that correcting the reaction parameters by the molecular weight of complexes (as is proposed for Moleculizer [24]) is not applicable for layerbased models as this information is not contained in layerbased models, due to the modularization.
Implementation
ALC is a computer program that converts model definitions for layerbased models into model files in different formats as well as into documentation files. Its intuitive syntax is simple but powerful and supports the concepts of rules, macrostates and modularity.
ALC can be used offline or via a form on the ALC website [37]. ALC performs detailed consistency checks on the model definition. These checks result in informative error messages and warnings ensuring that most errors are easy to find and easy to correct. The output files of ALC are readytorun simulation files in the formats C MEX, MATLAB (The MathWorks), Mathematica (Wolfram Research) and SBML [38]. ALC also provides the model in LATEX [39, 40] and plain text format to simplify its publication or presentation.
Architecture
ALC is written in the programming language Perl [41, 42]. When used offline, the model definition is read in by the script file ALC.pl which also starts the modeling procedure. The source code necessary for the generation of output files is organized in three separate Perl modules. The Perl module Procedures.pm contains procedures that guide the application through the general steps of the model generation process. The Perl module Functions.pm contains functions that are called by the procedures from the module Procedures.pm and functions that are called by other functions from the same module. The Perl module Output.pm contains procedures that generate all output files after the model generation process is finished. The form on the ALC website [37] is linked to a Perl CGI script that is running on an Apache web server [43]. This CGI script extracts the model definition from the uploaded model definition file or from the text field in which the model definition was pasted and starts the model generation. It also displays the results and provides the download links for the output files. The model generation and the generation of the output files is performed by the same Perl modules that are used offline.
Constructing the model definition
The model definition is constructed in plain text and consists of distinct sections that are encapsulated by #name and #end name, where 'name' is the name of the section. All sections may occur as often as desired in the model definition. Therefore, each layer can be defined separately, supporting the modular structure of layerbased models (see e.g. Equations 8 and 10).
The syntax of ALC and the distinct sections are briefly introduced below. Detailed descriptions can be found in the ALC user guide (Additional file 1). A tutorial demonstrates the generation of the model definition for an example system in detail (Additional file 4).
The section '#molecules'
All molecule definitions are given in the section #molecules (Figure 7). A molecule definition consists of the molecule name that is optionally followed by the successive definition of all sites. The molecule name has to start with a capital letter which is optionally followed by an arbitrary sequence of alphanumeric symbols (letters, digits and underscores are allowed). Each site definition consists of a commaseparated sequence of all possible modifications which is encapsulated by curly brackets. A site modification is defined by a sequence of alphanumeric symbols. The definition of a site modification 'X' is forbidden, since 'X' is reserved for indicating macrostates and patterns. A site modification 'P' has a predefined meaning. It marks the site as a binding site to which effector binding is described in another layer. In this case, 'P' indicates the phosphorylated state.
As an example, a molecule definition R{0, L}{0, P} defines the molecule R that can have the modifications '0' and 'L' at the first site and the modifications '0' and 'P' at the second site. If a molecule is defined, its instances (e.g. species like R[L, 0] or macrostates like R[X, P]) can be used in other sections of the model definition.
Complexes
ALC only supports the simple notation of complexes introduced above. Complexes of species can be treated as species of one of the corresponding molecule definitions, or as species of new molecules that are defined for complexes of the molecules. As an example, the species R[L, 0] is the complex of the species L and R[0, 0] as this is already contained in the molecule definition R{0, L}{0, P}, whereas the dimer of R{0, L}{0, P} has to be explicitly defined, e.g. as R2{0, L}{0, P}{0, L}{0, P}.
The section '#reactions'
The reaction symbol is '<>' for reversible reactions and '>' or '<' for irreversible reactions. For irreversible reactions only one parameter has to be given. ALC builds the corresponding rates following the generalized law of mass action where the parameters can be general nonlinear functions.
If a reaction describes the transition of a site modification 'P' to anything but 'P' (i.e. the dephosphorylation of a binding site to which effector binding is described in another layer), the reaction rate has to be modified using a correction term c_{ i }.
To guarantee that only the unoccupied fraction of binding sites with a site modification 'P' becomes dephosphorylated, ALC automatically searches the corresponding x_{ i }and x_{ i }b in the model definition, builds the appropriate c_{ i }= (x_{ i } x_{ i }b)/x_{ i }and considers c_{ i }in the rate laws for dephosphorylation. As the correction terms c_{ i }in the reaction rates are automatically assigned by ALC, one need not include them in the reaction parameters. The algorithm for this assignment is described below.
The section '#layer connections'
The signal flow between the layers is defined in the section #layer connections (Figure 7). Notations of the sums of species with a phosphorylated binding site x_{ i }have to start with a lowercase 'x' and are optionally followed by a sequence of letters and digits. Notations of the sums of species with occupied binding sites x_{ i }b start with the notation of the corresponding x_{ i }and end with a lowercase 'b'. Note that notations of x_{ i }are not allowed to end with a lowercase 'b'.
The phosphorylation of binding sites (site modification 'P') and the corresponding effector bindings are described in different layers. In this case, the binding partners for the effectors are defined by algebraic equations, typically as differences of x_{ i }and x_{ i }b. This is also done in the section #layer connections. The notation of these binding partners is analogous to the notation of species, however, the binding partners do not have to be defined in the section #molecules.
The sections '#parameters' and '#initial conditions'
The sections #parameters and #initial conditions are used to define parameters and initial conditions, respectively (Figure 7). In contrast to species notation, parameter notations have to start with a lowercase letter (which may not be 'r' or 'x', as these letters are reserved for the notation of reaction rates and layer connections, respectively). Parameters can also be functions of other parameters (e.g. k1 = k2/k3).
Each species that is not defined by an algebraic equation requires an initial condition. An initial condition for a species is defined by an equation that assigns a value to the species. Note that initial conditions can be functions of parameters (e.g. R[0, P] = a/b+1).
Initial conditions and parameters for which no value is assigned, are set to the corresponding default values. The default values can be arbitrarily chosen (see below) and a warning is given for each omitted parameter definition or initial condition.
Setting the default value for initial conditions to zero may result in division by zero in the correction terms c_{ i }if no initial condition for phosphorylated species is set manually to a nonzero value. One could use a default value very close to zero (e.g. 10^{20}) to avoid this problem. During the assignment of initial conditions, one has to assure that x_{ i }≥ x_{ i }b and x_{ i }> 0 hold for all i. It is not possible that more binding sites are occupied than are phosphorylated, as in these cases the phosphorylation is a precondition for effector binding. Therefore, it is obvious that the condition x_{ i }≥ x_{ i }b ∀ i has to be fulfilled for the initial conditions.
Optional sections
Conserved moieties are chemical entities that participate in a reaction system without loss of integrity and always remain in the system [6]. As an example, the sum of all species of a protein that is not degraded or synthesized is a conserved moiety. Conservation relations (molecular balances) allow for the replacement of the ODE of one species for each conserved moiety by an algebraic equation, and can be defined in the section #molecular balances. Concentrations of species can be set to a constant value (which can be a function of parameters) in the section #clamped concentrations. The section #algebraic relations allows for the assignment of arbitrary algebraic expressions (that may contain species, macrostates, numerical values and parameters) to variables whose notation is analogous to that for species. The section #remove allows for the complete removal of defined species from the reaction network and from all macrostates. This simplifies the use of rules for association reactions and is discussed in the user guide. The section #output allows for the declaration of species, macrostates, variables from the section #algebraic relations, or arbitrary algebraic expressions as outputs. All outputs are visualized in the resulting Mathematica, C MEX and MATLAB models and defined as Output_{ i }in the SBML model.
Assignment of correction terms
In the following section, we describe the conditions that are checked in order to decide whether a correction term c_{ i }is necessary and if so, the algorithm to find the appropriate correction term. ALC detects the need for a correction term in the rate law of a reaction if the following two conditions are simultaneously fulfilled:

A molecule M occurs on both sides of the reaction equation.

The site modification of the molecule M is 'P' on the j th site on one side of the reaction equation and not 'P' on the other side of the reaction equation.
As an example, these conditions are fulfilled for the second and for the third reaction in Figure 7, whereas they are not fulfilled simultaneously in the remaining reactions.
 1.
Replace macrostates by the corresponding sums in each definition of x_{ i }.
 2.
Count for all x_{ i }how often the molecule M occurs with a site modification 'P' on the j th site (factors before the species are considered).
 3.
Take the x_{ i }with the highest score and use c_{ i }= (x_{ i } x_{ i }b)/x_{ i }as the correction term.
The choice of the correction term is described in the ALC user guide for an example system. This algorithm is implemented much faster than described above. The first step is only performed once. The second step is performed for all molecules and positions with the site modification 'P' that occur in at least one x_{ i }simultaneously. This is done by building a hash (an associative array, a major data type in Perl) [41] that has the keys: molecule name and position. The values are the corresponding x_{ i }that have the highest score. Using this hash, the third step is trivial.
Running ALC
There are two possible ways of running ALC. The first is to use the form on the ALC website [37], where ALC can be accessed using a browser without any additional software.
The second possibility is to download the latest release of ALC from the same website [37], from SourceForge.net [44], or to use the first release that is given in Additional file 5. Perl (freely available, e.g. [42]) is required for the offline use. For the installation of ALC it is only necessary to unpack the file 'download_ALC.zip' (or Additional file 5) in the directory where the model definition files will be stored. If ALC is installed, store the model definition in the file 'layer.alc', open a command line interface, go to the ALC directory and type
ALC.pl
followed by the return key. Another, equivalent possibility is to use
perl ALC.pl
to perform the generation of the output files. These commands execute ALC with default values, e.g. for undefined parameters or initial conditions and for the filenames of the model definition file and the resulting model files. Without changing the default values, the model definition has to be stored in the file 'layer.alc' and the notation of the resulting output files starts with 'model'. All default values can be changed in the file Config_ALC.txt or via the command line for each call of ALC separately. A detailed description of default values and command line parameters is given in the ALC user guide. Note that all options have their equivalent as text fields or select elements in the form on the ALC website [37].
Step by step procedure: using ALC
 1.
Identify all processes and their interactions.
 2.
Draw the interaction graph.
 3.
Deduce from the interaction graph how many layers are in the model and which processes form part of which layer. All processes that are coupled by graded interactions are part of the same layer. Processes of different layers are coupled by allornone interactions or do not interact.
 4.
Model each layer separately as if the processes of all other layers do not exist.
 (a)
Define all molecules in the section #molecules (e.g. R{0, L}{0, P}). The site modification (uppercase) 'P' is reserved for indicating the phosphorylation of binding sites to which the effectors bind in other layers. A different notation (e.g. lowercase 'p') has to be used to indicate the phosphorylation of other sites.
 (b)
Define rules and reactions, as in conventional modeling, in the section #reactions (e.g. R[0, X]+L<>R[L, X] k1 k2). Define variables in the section #algebraic relations that represent frequently used expressions in the parameter part of the reactions.
 (c)
Define all sums of species with phosphorylated binding sites x_{ i }(e.g. x = R[X, P]) and all sums of species with occupied binding sites x_{ i }b (e.g. xb = RXE) in the section #layer connections. Note that x_{ i }b has the same notation as the corresponding x_{ i }, followed by (lowercase) 'b'.
 (d)
Define the binding partners for effectors whose binding sites are phosphorylated in other layers in the section #layer connections (e.g. RXp = xxb).
 (e)
Assign values to parameters in the section #parameters (e.g. k1 = 3). Undefined parameters are set to the default value (the value of the option Param) which can be defined in the file Config_ALC.txt or via the command line.
 (f)
Assign initial conditions in the section #initial conditions (e.g. R[0,0] = 2). Undefined initial conditions are set to the default value (the value of the option InCond) which can be defined in the file Config_ALC.txt or via the command line. The initial conditions have to guarantee x_{ i }> 0 and x_{ i }≥ x_{ i }b ∀ i.
 (g)
Concentrations of species can be set to a constant value in the section #clamped concentrations (e.g. L = 10).
 (h)
Conservation relations (e.g. R[0,0] = totRR[X, X]+R[0,0]) can be defined in the section #molecular balances. Using macrostates and the species whose ODE is to be replaced highly simplifies this step.
 5.
Define outputs (observables) to be visualized in the C MEX, MATLAB and Mathematica models in the section #output. All variables defined in the section #layer connections are visualized in the C MEX, MATLAB and Mathematica models without declaring them as outputs. This can be disabled by changing the default value (the value of the option OutLC) which can be defined in the file Config_ALC.txt or via the command line.
 6.
Store the model definition in a file, e.g. 'layer.alc', which is the default file name for the model definition file.
 7.
Run ALC offline or use the form on the ALC website.
Output files
The resulting model is given in MATLAB (default: modelM call.m and modelM.m), C MEX (default: modelC call.m, modelC mdl.mdl and modelC.c), Mathematica notebook (default: model.nb) and Mathematica input file (default: model.mma.m) formats as directly executable files. They include the visualization of userdefined output variables and – in the default case – all variables defined in the section #layer connections.
The core of the C MEX model is a MATLAB Sfunction written in C, which results in a much faster simulation of the model compared to the standard MATLAB format. A recent release of MATLAB (earliest version successfully tested: 7.0.4.352) is required for using the C MEX models. The Mathematica input file contains the same code as the notebook file but simplifies the integration of layerbased models into existing mathematica code.
The model is also given in SBML (default: model.xml), LATEX (default: model.tex) and plain text (default: model.txt) formats, which allows for the direct usage of model equations for presentations or publications. Additionally, the model definition is converted to the LATEX format (default: model.input.tex).
Note that the SBML standard [38] allows for explicit algebraic assignments, which are used in layerbased models. However, some SBMLsupporting tools will not work with layerbased SBML models as these tools do not support algebraic assignments. An important tool that works with layerbased SBML models is MathSBML [45] (release 2.7.0.3 or higher).
Results and discussion
ALC: Functionality and performance
Simplification of the model generation
ALC converts model definitions given in a simple but powerful rulebased syntax to computational models in different formats, as well as documentation files. The main benefit of ALC is that it dramatically simplifies layerbased modeling and reduces the risk of creating erroneous model equations.
The assignment of the correction terms c_{ i }to dephosphorylation rates is one of the more difficult and errorprone steps in manual layerbased modeling. ALC performs these assignments automatically, such that errors in this step are avoided. This strengthens the analogy of layerbased modeling to conventional modeling and rulebased modeling as the reaction network within layers can now be defined using rules without considering correction terms.
ALC also supports the usage of macrostates. This highly simplifies the definition of the signals between the layers (x_{ i }and x_{ i }b) as well as the definition of conservation relations and the use of enzyme kinetics for rules and reactions.
Portability of models
In many cases, it is comfortable to have the model in different formats. As an example, SBML [38] is becoming the de facto standard for model representation in systems biology. Though by far not all modeling and simulation projects use SBML, it is often desired to provide SBML models for publications. In many cases, the model equations in plain text are part of the manuscript or are provided as supplementary material. ALC automatically exports the model to several formats including SBML and provides the model equations in LATEX format as well as in plain text format.
The manual format conversion of models is probably the major reason for errors in published model descriptions. Automation of this step, as provided by ALC, is not only convenient but also lowers the probability of errors in the provided models.
Application range of ALC
ALC is optimized for building layerbased models. It supports features of layerbased modeling that are not present in conventional or rulebased modeling, and that are not supported by other tools. ALC is also well suited for conventional modeling of reaction networks. Although the functionality of ALC is related to other tools that also allow for rulebased modeling [20–26], the application range is different. Note that rulebased conventional models can be built using ALC. However, due to limitations in the description of complexes, ALC is not as well suited for this task as e.g. BioNetGen [20–22]. ALC shows its optimal performance when building reduced modular models according to the layerbased approach.
Computational aspects
The output files for small layerbased models are generated in less than one second on a desktop PC. The generation of larger layerbased models that can correspond to extremely large conventional models is also very rapid. As an example, the output files for the layerbased model of insulin signaling (214 ODEs) that replaces a conventional model with 1.5·10^{8} ODEs [3] are generated in about one second on a desktop PC. A few thousand ODEs are usually generated within seconds.
The model size in the offline version of ALC is only restricted by CPU time, disk space and memory capacity. The online version of ALC which is accessible via a form on the ALC website [37] provides full functionality, but is restricted to model definitions that define no more than 500 species. This is done to keep the traffic and the processor load on the server at a reasonable size.
The choice of the modeling method
An important question one has to answer at the beginning of the modeling process is that of which modeling method is the most suited for the considered system. A principle decision is necessary between stochastic and deterministic simulation.
If stochastic simulation is chosen, it depends on the model size, as well as on the branching of the network and on the number of simulation runs as to whether onthefly or generatefirst modeling is more suited. Note that layerbased modeling is also suited for stochastic simulation. In the following paragraph, we give a rule of thumb for choosing the appropriate deterministic modeling technique.
The exact techniques should be used if it is possible and applicable. Whether one should use conventional modeling, rulebased modeling or exact model reduction simply depends on the size of the resulting model and on the number of necessary simulations. Additional assumptions (e.g. assuming a temporal order of processes) that are only made to keep the numbers of species and reactions low but have no physiological justification should be avoided. Such assumptions may lead to a nearly unpredictable approximation error [5, 18].
If the conventional model (which may be the result of rulebase modeling) is too large for efficient simulation or parameter estimation, and if the exactly reduced model is also too large or is difficult to generate, layerbased modeling is a powerful possibility. In addition, the layerbased formalism is especially suited for comparing many model variants, as in many cases only a few equations in a single layer have to be changed. Note that rulebased modeling is also possible for layerbased models when using ALC. If the layerbased model is still too large for efficient simulation or parameter estimation, the layerbased model should be subjected to exact model reduction. In this case, the exact model reduction is performed on each layer separately. This combination of layerbased modeling and exact model reduction results in the fewest ODEs without the introduction of an additional approximation error as introduced by layerbased modeling.
Conclusion
The layerbased approach allows for a highly reduced macroscopic description of signaling systems with inherent combinatorial complexity. It is an approximative, but accurate modeling technique that results in a reduced model with a pronounced modular structure. An interaction graph which represents the interactions between the processes defines the modularity of the model. The resulting modules, called layers, are connected in a standardized way and can be modeled separately from each other once their connections are defined.
ALC highly simplifies the generation of layerbased models. The tool can be used offline or via a form on the ALC website. The simple but powerful syntax of the model definition supports the concepts of modularity, rules and macrostates. The model definition is divided into distinct sections and can be structured to mirror the modularity of the model. ALC provides the resulting dynamic models in different formats (C MEX, MATLAB, Mathematica and SBML) as readytorun simulation files, and provides documentation files that simplify the presentation and publication of the models.
Availability and requirements
Project name: ALC
Project home page: http://layer.mpimagdeburg.mpg.de
ALC on SourceForge.net: http://sourceforge.net/projects/alc/
Operating system(s): Platform independent
Programming language: Perl
License: GNU Lesser General Public License
Any restrictions for use by nonacademics: None
Declarations
Acknowledgements
This work was supported by the German Federal Ministry of Education and Research (BMBF) and the Network Systems Biology HepatoSys. We thank Michael Ederer, Johannes Witt, Holger Conzelmann and Duncan Payne for proofreading the manuscript and betatesting.
Authors’ Affiliations
References
 Hlavacek WS, Faeder JR, Blinov ML, Perelson AS, Goldstein B: The complexity of complexes in signal transduction. Biotechnol Bioeng. 2003, 84 (7): 78394.View ArticlePubMedGoogle Scholar
 Hlavacek WS, Faeder JR, Blinov ML, Posner RG, Hucka M, Fontana W: Rules for modeling signaltransduction systems. Sci STKE. 2006, 2006 (344): re6PubMedGoogle Scholar
 Koschorreck M, Conzelmann H, Ebert S, Ederer M, Gilles ED: Reduced modeling of signal transduction – a modular approach. BMC Bioinformatics. 2007, 8: 336PubMed CentralView ArticlePubMedGoogle Scholar
 Borisov NM, Markevich NI, Hoek JB, Kholodenko BN: Signaling through receptors and scaffolds: independent interactions reduce combinatorial complexity. Biophys J. 2005, 89 (2): 951966.PubMed CentralView ArticlePubMedGoogle Scholar
 Conzelmann H, SaezRodriguez J, Sauter T, Kholodenko B, Gilles E: A domainoriented approach to the reduction of combinatorial complexity in signal transduction networks. BMC Bioinformatics. 2006, 7: 34PubMed CentralView ArticlePubMedGoogle Scholar
 Heinrich R, Schuster S: The Regulation of Cellular Systems. 1996, Chapman & HallView ArticleGoogle Scholar
 Atkins P, Paula Jd: Physical Chemistry. 2006, Oxford University PressGoogle Scholar
 Asthagiri A, Lauffenburger D: A computational study of feedback effects on signal dynamics in a mitogenactivated protein kinase (MAPK) pathway model. Biotechnol Prog. 2001, 17: 227239.View ArticlePubMedGoogle Scholar
 Sedaghat AR, Sherman A, Quon MJ: A mathematical model of metabolic insulin signaling pathways. Am J Physiol Endocrinol Metab. 2002, 283 (5): E1084101.View ArticlePubMedGoogle Scholar
 Hatakeyama M, Kimura S, Naka T, Kawasaki T, Yumoto N, Ichikawa M, Kim JH, Saito K, Saeki M, Shirouzu M, Yokoyama S, Konagaya A: A computational model on the modulation of mitogenactivated protein kinase (MAPK) and Akt pathways in heregulininduced ErbB signalling. Biochem J. 2003, 373 (Pt 2): 451463.PubMed CentralView ArticlePubMedGoogle Scholar
 Haugh J, Schooler K, Well A, Wiley H, Lauffenburger D: Effect of epidermal growth factor receptor internalization on regulation of the phosphoslipase Cgamma1 signaling pathway. J Biol Chem. 1999, 274: 89588965.View ArticlePubMedGoogle Scholar
 Haugh J, Well A, Lauffenburger D: Mathematical modeling of epidermal growth factor receptor signaling through the phospholipase C pathway: mechanistic insights and predictions for molecular interventions. Biotechnol Bioeng. 2000, 70: 225238.View ArticlePubMedGoogle Scholar
 Kholodenko BN, Demin OV, Moehren G, Hoek JB: Quantification of Short Term Signaling by the Epidermal Growth Factor Receptor. J Biol Chem. 1999, 274 (42): 3016930181.View ArticlePubMedGoogle Scholar
 Moehren G, Markevich N, Demin O, Kiyatkin A, Goryanin I, Hoek JB, Kholodenko BN: Temperature dependence of the epidermal growth factor receptor signaling network can be accounted for by a kinetic model. Biochemistry. 2002, 41: 30620.View ArticlePubMedGoogle Scholar
 Schoeberl B, EichlerJonsson C, Gilles ED, Müller G: Computational modeling of the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors. Nat Biotechnol. 2002, 20 (4): 370375.View ArticlePubMedGoogle Scholar
 Liu G, Swihart MT, Neelamegham S: Sensitivity, principal component and flux analysis applied to signal transduction: the case of epidermal growth factor mediated signaling. Bioinformatics. 2005, 21 (7): 1194202.View ArticlePubMedGoogle Scholar
 Koschorreck M, Gilles ED: Mathematical modeling and analysis of insulin clearance in vivo. BMC Systems Biology. 2008, 2: 43PubMed CentralView ArticlePubMedGoogle Scholar
 Faeder J, Blinov M, Goldstein B, Hlavacek W: Combinatorial complexity and dynamical restriction of network flows in signal transduction. Syst Biol (Stevenage). 2005, 2: 515.View ArticleGoogle Scholar
 Faeder JR, Blinov ML, Goldstein B, Hlavacek WS: Rulebased modeling of biochemical networks. Complexity. 2005, 10: 2241.View ArticleGoogle Scholar
 Blinov ML, Faeder JR, Goldstein B, Hlavacek WS: BioNetGen: software for rulebased modeling of signal transduction based on the interactions of molecular domains. Bioinformatics. 2004, 20 (17): 328991.View ArticlePubMedGoogle Scholar
 Faeder JR, Blinov ML, Hlavacek WS: Graphical rulebased representation of signal transduction networks. Proc ACM Symp Appl Computing. 2005, 133140.Google Scholar
 Blinov ML, Yang J, Faeder JR, Hlavacek WS: Graph theory for rulebased modeling of biochemical networks. Lect Notes Comput Sci. 2006, 4230: 89106.View ArticleGoogle Scholar
 Novère NL, Shimizu T: STOCHSIM: modelling of stochastic biomolecular processes. Bioinformatics. 2001, 17 (6): 5756.View ArticlePubMedGoogle Scholar
 Lok L, Brent R: Automatic generation of cellular reaction networks with Moleculizer 1.0. Nat Biotechnol. 2005, 23: 1316.View ArticlePubMedGoogle Scholar
 Calzone L, Fages F, Soliman S: BIOCHAM: an environment for modeling biological systems and formalizing experimental knowledge. Bioinformatics. 2006, 22 (14): 18057.View ArticlePubMedGoogle Scholar
 Sedwards S, Mazza T: CytoSim: a formal language model and stochastic simulator of membraneenclosed biochemical processes. Bioinformatics. 2007, 23 (20): 28002.View ArticlePubMedGoogle Scholar
 Gillespie DT: A rigorous derivation of the chemical master equation. Physica A. 1992, 188: 404425.View ArticleGoogle Scholar
 Blinov ML, Faeder JR, Yang J, Goldstein B, Hlavacek WS: 'Onthefly' or 'generatefirst' modeling?. Nat Biotechnol. 2005, 23 (11): 13445.View ArticlePubMedGoogle Scholar
 Pawson T, Nash P: Assembly of cell regulatory systems through protein interaction domains. Science. 2003, 300 (5618): 445452.View ArticlePubMedGoogle Scholar
 Conzelmann H, Gilles ED: Functional Proteomics: Methods and Protocols. 557576. Humana Press 2008 chap. Dynamic pathway modeling of signal transduction networks – A domainoriented approachGoogle Scholar
 Conzelmann H, Fey D, Gilles E: Exact model reduction of combinatorial reaction networks. BMC Syst Biol. 2008, 2: 78PubMed CentralView ArticlePubMedGoogle Scholar
 Hartwell L, Hopfield J, Leibler S, Murray A: From molecular to modular cell biology. Nature. 1999, 402 (6761 Suppl): C4752.View ArticlePubMedGoogle Scholar
 SaezRodriguez J, Kremling A, Conzelmann H, Bettenbrock K, Gilles ED: Modular Analysis of Signal Transduction Networks. IEEE Contr. Syst Mag. 2004, 24 (4): 3552.View ArticleGoogle Scholar
 SaezRodriguez J, Kremling A, Gilles ED: Dissecting the puzzle of life: Modularization of signal transduction networks. Comput Chem Eng. 2005, 29 (3): 619629.View ArticleGoogle Scholar
 Ederer M, Sauter T, Bullinger E, Gilles ED, Allgöwer F: An Approach for Dividing Models of Biological Reaction Networks into Functional Units. Simulation. 2003, 79 (12): 703716.View ArticleGoogle Scholar
 Isidori A: Nonlinear control systems. 77105. Springer 1995 chap. Global decomposition of control systemsGoogle Scholar
 ALC website. http://layer.mpimagdeburg.mpg.de
 Hucka M, Finney A, Sauro H, Bolouri H, Doyle J, Kitano H, Arkin A, Bornstein B, Bray D, CornishBowden A, Cuellar A, Dronov S, Gilles E, Ginkel M, Gor V, Goryanin I, Hedley W, Hodgman T, Hofmeyr JH, Hunter P, Juty N, Kasberger J, Kremling A, Kummer U, Novère NL, Loew L, Lucio D, Mendes P, Minch E, Mjolsness E, Nakayama Y, Nelson M, Nielsen P, Sakurada T, Schaff J, Shapiro B, Shimizu T, Spence H, Stelling J, Takahashi K, Tomita M, Wagner J, Wang J, Forum S: The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003, 19 (4): 52431.View ArticlePubMedGoogle Scholar
 Lamport L: LaTeX: A Document Preparation System: User's Guide and Reference Manual. 1994, AddisonWesley Longman, AmsterdamGoogle Scholar
 Comprehensive TeX Archive Network. http://www.ctan.org
 Wall L, Christiansen T, Orwant J: Programming Perl. There's More Than One Way To Do It. 2000, O'Reilly MediaGoogle Scholar
 Comprehensive Perl Archive Network. http://www.cpan.org
 Apache HTTP Server Project. http://httpd.apache.org
 ALC on SourceForge.net. http://sourceforge.net/projects/alc/
 Shapiro BE, Hucka M, Finney A, Doyle J: MathSBML: a package for manipulating SBMLbased biological models. Bioinformatics. 2004, 20 (16): 282931.PubMed CentralView ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.