An enzymecentric approach for modelling nonlinear biological complexity
 ChinRang Yang^{1}Email author
DOI: 10.1186/17520509270
© Yang; licensee BioMed Central Ltd. 2008
Received: 27 February 2008
Accepted: 01 August 2008
Published: 01 August 2008
Abstract
Background
The current challenge of Systems Biology is to integrate high throughput data sets for simulating the complexity of biological networks, exploit the evolution of naturedesigned networks that maintain the robustness of a biological system, and thereby generate novel, experimentally testable hypotheses. In order to simulate nonlinear biological complexities, we have previously developed an EnzymeCentric mechanistic modeling approach and validated it using metabolic network in E. coli. The idea is to use prior knowledge of catalytic and regulatory mechanisms of each enzyme within the metabolic network to build a dynamic model for investigating the network level regulation and thus understand the nature design principle behind the network.
Results
In this paper, we further demonstrate the application of complex enzyme catalytic and regulatory modules to simulate nonlinear network regulatory patterns vs. simple linear conversion model. We learned and validated that it is essential to incorporate prior knowledge from the literature to simulate nonlinear biological complexities. The network expandability is demonstrated and validated with the complex amino acid biosynthetic network with multiregulations. Also, we demonstrated the compatibility of mechanistic models within close species. Furthermore, the eukaryotic protein factory model for insuring steady mRNA production is simulated and the coupling of RNA transcription and splicing is validated by both mathematical simulation and experimental analysis.
Conclusion
We demonstrated the importance of modeling complex enzyme catalytic and regulatory mechanisms to further understand nonlinear network regulatory patterns. The simulations presented in this paper reveal how a living system maintains homeostasis and its robustness to continue functioning while facing environmental stresses or genetic mutations.
Background
Remarkable advances in the high throughput technologies enable researchers to broaden their research focuses from a single gene/protein to global gene/protein expression profiles. The daunting challenge is how to turn these overwhelmingly data into real information and gain meaningful insights of how the information is processed in a living system. The goal of Systems Biology is to integrate these high throughput data sets through mathematical models that can computationally simulate the complexity of a biological network, explore the design principles during the evolution of a biological circuit to insure its robustness and thereby generate novel, experimentally testable hypotheses. It is the mission in the postgenomic era to understand how all the parts of cells – genes, proteins, and many other molecules – work in concert to create complex living organisms and analyze how entire biological systems function, both in health and in sickness.
The main challenge for systems biologists is to develop a mathematical modeling technique suitable for modeling the complexity of biological systems. The "EnzymeCentric" mechanistic modeling approach has been developed in order to integrate expert knowledge in the mechanistic understanding of enzyme catalytic and regulatory mechanisms [1, 2]. Various enzyme models can be assembled into a pathway such that the combination of various pathways is automatically extended into a larger biological network. The modularity of the model and the feature of automatic model generation enables flexible updates. This approach incorporates nonlinear biological properties which are essential for simulating the network level regulation and predicting responses to perturbations.
In contrast, commonly seen modeling approaches either ignore the dynamic properties of enzymes by assuming they are constant or, without considering regulation, simply assign probability linkage among components through training sets. The term "nonlinear" here refers to the biological nonlinearity of enzymes that are variables in the equations (not mathematical nonlinearity). The amino acid biosynthetic network and the interaction of transcription and RNA splicing are presented here to show the advantages of using the nonlinear EnzymeCentric approach vs. a linear conversion model. These examples reveal the value of EnzymeCentric modeling to help understand how a living system maintains homeostasis and continues to function (robustness) while facing environmental stresses or genetic mutations.
Results
The Modularity of Nonlinear EnzymeCentric Modeling of the Amino Acid Biosynthetic Network
Nonlinear Network Level Regulation vs. Linear Conversion Model: Prediction of Isoleucine Production over Pyruvate Perturbation
To illustrate what happens in the nonlinear model, in Fig 3B, AHAS, IR, DAD and TB are multifunctional enzymes shared by valine and isoleucine biosynthetic pathways. Excess Pyr disturbs the balance of enzyme partition and shifts most of enzymes to the valine biosynthesis. Decreased enzymes for the isoleucine pathway produce less isoleucine and the feedback inhibition of threonine deaminase (TDA) is relieved. TDA is more active to consume and decrease threonine level. To validate the response of the feedback regulation, the product of TDA, αKB, is a toxic metabolite that blocks glucose transport and cell growth. When TDA is activated, αKB accumulates, and AHAS enzymes become a rate limit step due to the shift of enzyme partition between two pathways. Using a simple growth assay, we validated that excess Pyr inhibits E. coli K12 growth (the right image of Petri dish in Fig 3D). This phenomenon cannot be explained by the simple linear conversion model.
It is essential to realize that under the network regulation, increasing substrate alone can not guarantee increased product amount. This is especially important for metabolic engineering. What we demonstrated here is the effect of multifunctional enzyme partition under increasing substrate condition on preventing overproduction and maintaining homeostasis of an amino acid.
Nonlinear Network Level Regulation vs. Linear Conversion Model: Prediction of Valine Production over Pyruvate Perturbation
While excess Pyr shifts enzymes to the valine biosynthesis, an interesting question is whether this affects valine production. Given more substrates and enzymes, the obvious answer is valine will increase, which is true if the reaction is in a test tube with purified enzymes. However, the simulated valine response pattern (dashed line in Fig 3E) tells a different story. Initially, valine increases in concert with Pyr level until saturation occurs. While Pyr keeps increasing and passes a certain threshold, an isozyme (transaminase C or TC) reaction is turned on (Fig 3F). TC converts valine reversibly back to its intermediate, αKIV. Once turned on, the TC reaction becomes a dominant reaction over the major transamination of transaminase B (TB), so that valine decreases as Pyr continues to increase.
This biological circuit is designed to automatically switch the direction of the metabolic flux to prevent overproducing and thus maintain homeostasis of valine. While the substrate levels increase, the outcome could be increasing, unchanged or decreasing, depending upon the given condition. Although the difference may not be obvious within the physiological levels of Pyr in both models (1000 uM in Fig 3C&D), under the extreme perturbation (e.g. drug treatments or gene mutations), the enzymecentric model clearly shows a distinct response pattern.
Flexibility and Compatibility of Mechanistic Model within Close Species: Prediction of Drug Sensitivity in Salmonella LT2
The ability to reuse models within close species or cell lines is another advantage of the mechanistic modeling over statistic modeling. The model can be reused by changing a few parameters that are known to be different among species and validated without rebuilding the model. In contrast, statistical models require an entirely new training data set to train a new model.
We demonstrate here that Escherichia coli (E. coli) K12 and Salmonella typhimurium (S. typh.) LT2 are two closely related model organisms. However, several enzyme kinetic properties are known to be different: (i) E. coli K12 has nonfunctional, mutated AHAS isozyme II; S. typh. LT2 has normal AHAS II, but does not express AHAS III; (ii) LT2 has much higher TDA enzyme activity (kcat is over 10 fold increased) than K12; and, (iii) although the allosteric TDA is a tetramer in LT2 the same as in K12, it has only two functional catalytic sites. The allosteric parameters (c and L) were refitted to the kinetic data from LT2 as described in [3] [curve fitting shown in Additional file 2A]. The following are values of parameters in the models to reflect this knowledge: LT2: [AHAS II] = 8 uM, [AHAS III] = 0 uM, c = 0, L = 0.27 vs. K12: [AHAS II] = 0 uM, [AHAS III] = 2 uM, c = 0.013, L = 1.05.
EnzymeCentric Modeling of the Coupling Effect of Spliceosome and RNA Polymerase
Discussion
In this paper, we demonstrate the application of complex enzyme catalytic and regulatory modules to simulate nonlinear network regulatory patterns. In order to simulate the nonlinear biological complexities, it is essential to incorporate prior knowledge of enzyme mechanism from the literature. Simulation/perturbation of the integrated mathematical model from each individual enzyme models will help us to understand the network level regulation and the purpose of the circuit design. The model expandability is demonstrated; each individual amino acid metabolic pathway is simulated/validated independently and then adds into a larger network model to form a complex amino acid biosynthetic network with layers of multiregulations. These regulatory mechanisms, including multifunctional enzyme partition and reversible isozyme reaction, contribute to prevent overproduction and maintain homeostasis of amino acids under environmental changes. The purposes of these metabolic circuit designs are never realized by investigators when the focus is on studying the mechanism of a single enzyme. Simulations using the integrated, nonlinear EnzymeCentric model uncover the purposes of these designs. To simulate and understand these properties are especially important for metabolic engineers to design mutant strains that remove these regulatory mechanisms that protect microorganism from overproduction to increase yield of amino acid production.
We also demonstrated the flexibility of mechanistic models within close species by changing a few parameter values. The model can easily be reused from one species to the other. Furthermore, the eukaryotic protein factory model for ensuring steady mRNA production is simulated and the coupling of RNA transcription and splicing is validated by both mathematical simulation and experimental analysis. This circuit design guarantees an extended halflife for the proper processing of nascent premRNAs and ensures the quality and steady production of mature mRNA production, which does not occur in viral RNA polymerase.
Comparison of Nonlinear EnzymeCentric Approach to Other Models: The Key for Modeling the Biological NonLinearity is "Enzyme is a Variable"
The network level regulation involving multiple regulatory mechanisms is beyond the concept of a simple feedback loop, linear conversion [8] or statistic model. Regulatory proteins which are critical for the robustness and integrity of the system can be identified using numerical perturbations as shown above. In general, such regulatory proteins are located at the upstream branch points of a network. Usually, multiple regulators or isoforms control the direction or partition of metabolic flux (e.g. TDA and AHAS). Nevertheless, downstream regulators are also observed (e.g. TC) under certain conditions. Computers can generate spectra of dynamic responses over biological perturbations to facilitate the identification of regulatory proteins and thereby explicate the design principles of natureoccurring biological circuits.
Unfortunately, biology is more complicated than those models can realistically reflect without the introduction of numerous extra dynamic variables. For example, in the metabolic network model below, to model the isozymes (AKI&III and AHASI&II&III) that are controlled by different modes of regulation, the MCA models must allow multiple fluxes for the conversion of the same substrate to the same product. To model the bifunctional enzyme (AKIHDHI, the same protein carrying two enzyme activities), the fluxes of these two enzyme reactions must be "dependent" on each other. Also, several enzymes are involved in multipathways and their fluxes depend on the enzyme partition between these pathways in a given condition. These facts point out the fundamental drawback of the MCA approach in considering metabolic networks as a collection of "independent" chemical conversions. With enzymes remaining as variables, the EnzymeCentric approach allows the temporal patterns of enzyme state/partition and networkspecific regulatory patterns to be identified and is able to incorporate nonlinear properties of a biological network, such as positive/negative feedback regulation, allosteric regulation [3], reversible enzyme reaction, posttranslational modification (intermediate enzyme states), biological redundancy (isozymes/isoforms), multifunctional enzyme and its partition between the multifunctional pathways [1]. In contrast, commonly seen modeling approaches mainly focus on discovering correlations or probability linkages between molecular species using specific training sets. Regulation is not considered to simplify the complexity of their models (Fig 6). Nevertheless, simulations presented here demonstrate these nonlinear regulatory properties are essential for modeling the network level regulation that maintains the robustness of a biological system.
Conclusion
In conclusion, we recognize biological complexity and develop a novel modeling tool to integrate prior mechanistic knowledge into a mathematical model. With perturbing the parameters in the model systematically, regulatory factors critical to maintaining the functional integrity of the system were found. We demonstrated the importance of modeling complex enzyme catalytic and regulatory mechanisms to further understand the nonlinear network regulatory patterns and circuit design for preventing of overproduction and thereby maintaining homeostasis. The simulations presented in this paper reveal how a living system maintains homeostasis and robustness to continue functioning while facing environmental stresses and also strengthen the idea of bringing knowledge and regulatory mechanisms into computer simulation [8] to make a model smart enough, and, as such, become an engine of discovery and prediction.
Methods
EnzymeCentric Modeling Approach
The idea of "EnzymeCentric" modeling is to understand common enzyme catalytic and regulatory mechanisms in biological processes, and then integrate individual enzyme models into a pathway so various pathways assemble into a larger biological network [1, 2]. The focus is on mining the expert knowledge of individual enzymes studies by different laboratories from the literature to identify molecular interactions and regulatory patterns of each enzyme (e.g. feedback and allosteric regulation). Special attention is given to enzyme isoforms and multifunctional enzymes which are essential for the reactive flux distribution within the network [1]. The mathematical tools for the EnzymeCentric modeling, including kMech (enzyme kinetics), GMWC (Generalized MWC model for multiligand allosteric regulation) and Cellerator, are freely available to noncommercial users http://www.cellerator.org. They can be executed in Windows, MacOS, or Linux within Mathematica™. Alternatively, Cellerator can generate ordinary differential equations (ODEs) in System Biology Markup Language (SBML).
Network Extension: The Modularity of EnzymeCentric Modeling
Yang et al. 2005 [1] and Najdi et al. 2006 [3] have used the EnzymeCentric approach to model the regulated flow of metabolites through the multifunctional branched chain amino acids (BCAA, isoleucine, valine and leucine) biosynthetic pathways and their upstream threonine biosynthesis in the model organism, and E. coli K12, respectively (Fig 1A). Both models have been validated with several known genetic and biochemical perturbations. To demonstrate the modularity and expandability of kMech, we simply put these two models together in the simulator to form a network of four interacting metabolic pathways. Cellerator automatically regenerates 131 ODEs with 189 rate parameters for the new model consisting of 17 enzymes [Additional file 1: Mathematica™ codes and parameter values]. The enzyme mechanisms of these pathways include simple catalytic, Bi Bi, Ping Pong Bi Bi, and Bi Ter mechanisms that are regulated by either allosteric, competitive, noncompetitive inhibition or activation mechanisms (Fig 1B). The connecting points of these four pathways (Thr, Pyr and αKIV) and eight feedback loops are automatically found and connected by the computer. In the reported simulations, steadystate enzyme activity levels were optimized to properly channel the steadystate flow of metabolic intermediates through these pathways at levels that match their reported in vivo levels and the concentrations of enzyme cofactors, ATP and NAD(P)(H) were kept as constants [3, 1].
Rate Constant Approximation
One important issue of mechanistic modeling is how to obtain kinetic rate constants for simulation. It is a difficult task to measure the forward and reverse rate constants (kf, kr) experimentally. Alternatively, the rate constants of metabolic enzymes are approximated from easily measured kinetic constants Km (MichaelisMenten constant) and kcat (catalytic constant or enzyme turnover number) using the Lambda (Λ) approximation which is previously developed and implemented in kMech [1, 2]. In brief, the approximation introduces a new parameter Λ that represents the ratio of forward reaction flux of the enzymesubstrate complex formation to the catalytic flux of product production. In other words, when Λ is large, the enzymesubstrate complex approaches steady state very fast. This is the same as the MichaelisMenten pseudosteady state assumption [9]. The flexibility of data fitting using the Λ approximation over the MichaelisMenten equation is also illustrated in Additional file 3A. The values of Λ are empirically adjusted to fit experimental data. In the case of metabolic network simulation, the values of Λ can be varied from 10 to 1,000,000 with no significant changes in the steady levels of intermediates and endproducts. Therefore, the Λ is set to 100 for all enzymes in the model [Additional file 1: Mathematica™ codes and parameter values]. However, in the RNA splicing model, the binding of RNA polymerase to the DNA template is a slow process, and the Λ is set to 1 to fit the measured results [Additional file 4: Mathematica™ codes and parameter values]. However, not all biological pathways are as well studied as the metabolic network (i.e. Km and kcat are not always available). One solution for this challenge is to apply the quantitative time course data after certain treatments to constrain the model and approximate parameter spaces for kinetic constants as demonstrated in the RNA splicing model below.
The other issue of dealing with a large number of parameters is how to prevent overfitting. The integrated metabolic network model has total of 189 rate parameters. To avoid the overfitting problem, the key is to understand the parameters. For each enzyme, at least three parameters are needed: total enzyme concentration ([En]total), affinity to all of its substrates (Km), and reaction rate (kcat). If the enzyme is regulated by additional factors, more parameters are added (e.g. Ki for affinity to an inhibitor; Ka for an activator). The Monod, Wyman, Changeux model [3] is used for modeling an allosteric enzyme with two additional allosteric parameters: L (partition of active and inactive enzymes) and c (affinity of substrate to the inactive enzyme). In other words, we only introduced biological meaningful parameters into the model to prevent the problems of overfitting.
The Interactome of RNA Transcription and Spliceosome: The Mathematical Model of RNA Splicing
The mathematical model of RNA splicing was built using the EnzymeCentric approach. Each enzyme mechanism is parsed by kMech into a set of fundamental associationdissociation reactions that are translated by Cellerator into ordinary differential equations (ODEs) that are numerically solved by Mathematica™. The pathway diagram of the interaction between transcription and RNA splicing is shown below:
The 1st reaction represents "Transcription of DNA" modeled by the kMech generalized BiBi (twosubstrate, twoproduct) reaction. NTP represented four nucleotides required for transcription. The 2nd reaction represents spliceosome binds to premRNA modeled by the Cellerator simple catalytic model. The 3rd reaction represents spliceosome_premRNA complex releases mRNA and free spliceosome. The 4th to 6th reactions represent premRNA and mRNA degradation by RNase. The above model was translated by Cellerator into 13 ODEs with 15 rate constants that describe the rates of change of 13 reactants involved in the model [Additional file 4: Mathematica™ codes and parameter values]. The forward rate constants (variable names with kfprefix) and reverse rate constants (variable names with krprefix) were not available experimentally and approximated from experimental measurements (Km and kcat) of enzymes by Λ approximation method. The plausible values of the kinetic measurements (Km and kcat) are optimized from the quantitative time course measurements of premRNA and spliced mRNA using in vitro RNA splicing assay for the Pol II and T7 polymerase as shown in Additional file 4.
Analysis of the Nonlinear System using Systematic Perturbation
The rule of thumb for the EnzymeCentric approach is that as more factors are introduced into the model, the more we can study how these factors affect the robustness of the system and why the system evolves to have these factors. However, the common bistability or bifurcation analysis requires reducing the complex nonlinear model (by assuming many variables as constants) to a simplified nearlinear model with a few parameters. To maintain the biological complexity, an alternative way is to apply the systematic perturbation, which is commonly used in other contexts (e.g., bridge building, or automobile and airplane manufacturing) to test newly designed products through extensive computer simulations before prototyping. The goal of this task is to identify the regulatory proteins or controlling factors which are critical for the robustness and integrity of the biological system by iterating simulations with altered values of substrate/enzyme concentrations or kinetic constants. All enzymatic parameters including [En]total (total enzyme concentration), Km (the affinity of enzyme to substrate) and kcat (the rate of catalysis) were perturbed numerically.
Abbreviations
 BCAA:

branched chain amino acids
 E. coli :

Escherichia coli
 S. typh :

Salmonella typhimurium
 SM:

Sulfometuron Methyl
 Pol II:

RNA polymerase II
 MCA:

metabolic control analysis
 GMWC:

Generalized Monod, Wyman, Changeux model
 ODE:

ordinary differential equation.
Declarations
Acknowledgements
This project was supported by the Biomedical Informatics Training (BIT) Program from National Institutes of Health, National Research Service Award (5 T15 LM007443) from the National Library of Medicine, USA. Author thanks Hatfield, GW and Mjolsness, ED for mentoring; Hertel, KJ for experimental support on the RNA splicing assay; Wan, F for consulting in mathematical modeling and analysis (UCIrvine); Shapiro, BE (Cal Tech) for the software support; Dong, Y (UTSW) for the artwork and Strawderman, T (UTSW) for copyediting the manuscript. This is manuscript CSCN006 from the 'Cell Stress and Cancer Nanomedicine' program in the Simmons Comprehensive Cancer Center at the University of Texas Southwestern Medical Center at Dallas.
Authors’ Affiliations
References
 Yang CR, Shapiro BE, Hung SP, Mjolsness ED, Hatfield GW: A mathematical model for the branched chain amino acid biosynthetic pathways of Escherichia coli K12. J Biol Chem. 2005, 280 (12): 1122411232. 2005/01/20 10.1074/jbc.M411471200View ArticlePubMedGoogle Scholar
 Yang CR, Shapiro BE, Mjolsness ED, Hatfield GW: An enzyme mechanism language for the mathematical modeling of metabolic pathways. Bioinformatics. 2005, 21 (6): 774780. 2004/10/29 10.1093/bioinformatics/bti068View ArticlePubMedGoogle Scholar
 Najdi TS, Yang CR, Shapiro BE, Hatfield GW, Mjolsness ED: Application of a generalized MWC model for the mathematical simulation of metabolic pathways regulated by allosteric enzymes. J Bioinform Comput Biol. 2006, 4 (2): 335355. 2006/07/05 10.1142/S0219720006001862View ArticlePubMedGoogle Scholar
 Hicks MJ, Yang CR, Kotlajich MV, Hertel KJ: Linking splicing to Pol II transcription stabilizes premRNAs and influences splicing patterns. PLoS Biol. 2006, 4 (6): e1472006/04/28 10.1371/journal.pbio.0040147PubMed CentralView ArticlePubMedGoogle Scholar
 Hatzimanikatis V, Bailey JE: MCA has more to say. J Theor Biol. 1996, 182 (3): 233242. 1996/10/07 10.1006/jtbi.1996.0160View ArticlePubMedGoogle Scholar
 Alves R, Savageau MA: Comparing systemic properties of ensembles of biological networks by graphical and statistical methods. Bioinformatics. 2000, 16 (6): 527533. 2000/09/12 10.1093/bioinformatics/16.6.527View ArticlePubMedGoogle Scholar
 Wu L, Wang W, van Winden WA, van Gulik WM, Heijnen JJ: A new framework for the estimation of control parameters in metabolic pathways using linlog kinetics. Eur J Biochem. 2004, 271 (16): 33483359. 2004/08/05 10.1111/j.00142956.2004.04269.xView ArticlePubMedGoogle Scholar
 Schadt EE, Sachs A, Friend S: Embracing complexity, inching closer to reality. Sci STKE. 2005, 2005 (295): pe402005/08/04 10.1126/stke.2952005pe40PubMedGoogle Scholar
 Murray JD: Mathematical Biology. Edited by: SA L. 1993, 19: 111118. Reaction Kinetics , Springer, 2nd, BiomathematicsGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.