- Open Access
Sig2GRN: a software tool linking signaling pathway with gene regulatory network for dynamic simulation
BMC Systems Biologyvolume 10, Article number: 123 (2016)
Linking computational models of signaling pathways to predicted cellular responses such as gene expression regulation is a major challenge in computational systems biology. In this work, we present Sig2GRN, a Cytoscape plugin that is able to simulate time-course gene expression data given the user-defined external stimuli to the signaling pathways.
A generalized logical model is used in modeling the upstream signaling pathways. Then a Boolean model and a thermodynamics-based model are employed to predict the downstream changes in gene expression based on the simulated dynamics of transcription factors in signaling pathways.
Our empirical case studies show that the simulation of Sig2GRN can predict changes in gene expression patterns induced by DNA damage signals and drug treatments.
As a software tool for modeling cellular dynamics, Sig2GRN can facilitate studies in systems biology by hypotheses generation and wet-lab experimental design.
One of the major forms of cellular responses to extracellular perturbations is to change the gene expression in response to the cellular signals transmitted by signaling pathways. Diverse stimuli can be converted into a series of intercellular reactions through signal transduction pathways which generate various transcription factor activities, thereby producing different gene expression patterns that result in subsequent cellular behaviors.
Over the past few decades, many studies have presented various computational strategies, such as data-driven, logic-based and biochemical kinetic methods, in modeling signaling pathways or gene regulatory networks separately. Data-driven methods [1–4], which are constructed mainly based on statistical modeling, show great potential when the underlying biological mechanisms are unclear. Logic-based models, such as Boolean Network [5–11] and generalized logic models [12–16] are suitable formalisms for modeling relatively large networks in which the detailed kinetic parameters are not fully available. If the underlying biochemical mechanisms are known, biochemical kinetic modeling [17–20] is a well-established strategy for describing the dynamic sub-cellular systems using a set of mathematical equations. In the field of gene regulation, thermodynamic models have also been successfully applied [21–23] besides the aforementioned methods.
Despite the many models of signaling pathways and gene regulatory network (GRN), it is still a big challenge to link the models of signal transduction with the downstream gene expression regulation. To address this challenge, Peng et al.  used a set of differential equations to do forward simulations of the NF-kB signaling pathway and then used Network Component Analysis [25–27], a data-driven method based on matrix decomposition, to reversely engineer a gene regulatory network (GRN). Then they matched the forward simulations and reverse engineering results and successfully linked the signaling profiles with the subsequent gene expression profiles. However, their method needs detailed kinetic parameters which may not be available as yet in many cases. A similar study of Melas et al.  first employed a multi linear regression algorithm to identify correlation-based relationships between signaling proteins and cellular responses (e.g., cytokine releases) and connected them using “non-canonical” edges. Integrating a canonical network of the signaling pathway from prior knowledge, the whole network was then converted into a Boolean model. Next, they optimized the network against the experimental data using Integer Linear Programming  and identified the pathway activities that induced the diverse cellular responses. Their reconstructed model is able to predict the dynamics of signaling pathways and cellular responses; however, because the biological meaning of the “non-canonical” edges learnt from the data is difficult to interpret, their model can hardly reveal the molecular mechanisms of how signal transduction regulates gene expression.
Here, we present Sig2GRN, a software tool which links the models of signaling pathway with gene regulatory networks (GRNs). A generalized logical model, which we proposed previously in  and is based on network topology, is employed here to capture the dynamical trends of transcription factors in cellular signaling pathways. Then two different models, i.e., a Boolean model and a thermodynamics-based model , are integrated to predict the downstream gene expression patterns based on the predicted transcription factor activities. As a Java plugin for Cytoscape  (version 2.8.3), Sig2GRN is able to simulate the dynamics of the signaling pathways and the subsequent time-series gene expression data. We first provide an overview of Sig2GRN’s core functionalities, and then describe two case studies to illustrate the usage and performance of Sig2GRN.
Methods and implementation
Generalized logical modeling of signaling pathways for predicting transcription factor activities
Sig2GRN takes a directed graph as the input where each vertex x in the network represents a molecular species (e.g., a signaling protein, a transcription factor or a gene) and each directed edge (u, v) denotes a biological interaction (e.g., protein phosphorylation or transcriptional regulation) from node u to node v. The input network is divided into two layers, i.e., the upstream signaling pathways and the downstream gene regulatory network, according to the type of biological interactions from prior knowledge (Fig. 1). The simulation starts from the user-defined perturbations and generates the dynamical trends of the signaling proteins using a generalized logical model in our previous work . The goal of the upstream simulation is to generate the dynamical trends of the transcription factors under the perturbations. Then the simulated transcription factor activities are employed to predict the gene expression patterns over time, using either a Boolean model or a thermodynamic-based model , which can be selected by users. Therefore, the two layers of network are linked together by the transcription factors, and the cellular responses can be predicted given the extracellular perturbations.
In the upstream signaling pathways, the state S t (with value ∈[0,1], where 0 means fully inhibited and 1 means fully activated) of node s at the t-th simulation iteration is updated based on its previous state at the (t−1)-th iteration and the incoming signals from its parent nodes, according to Eq. (1) ,
where A i (or B j ) represents the amount of signals transmitted from the i-th activating (or j-th blocking) parent node upstream of s, and d is the degradation rate (value ∈(0,1)) at each iteration. Using this model, we have successfully predicted the dynamics of a cancer signaling pathway under various perturbations . In this work, we select the simulated transcription factor activities (e.g., the proportion of the concentration of transcription factor in the active form) as the output of the upstream generalized logical mode and use them as the input to the downstream models to further predict the gene expressions as shown in Fig. 1.
Boolean modeling of transcriptional regulation
Once the time-series data of the transcription factor activities (value ∈[0,1] at each simulation iteration) are generated, users can select either a Boolean model  or a thermodynamic model  to predict the subsequent gene expression patterns.
Under the Boolean scenario, the AND logical relation is assigned to the transcription factors that have the same transcriptional regulation type (e.g., activation or inhibition) for a gene, so that the gene will be switched ON (or OFF) when the maximum activity level of activating (or inhibiting) transcription factors surpasses a user-defined threshold (value ∈(0,1)). When both activation and inhibition regulations are present on the same gene, the inhibition is assumed to precede the activation. The simulation result of the Boolean model is a list of 0s and 1s, over the course of time.
Thermodynamic modeling of transcriptional regulation
Since the Boolean simulation only refers to whether a gene up-regulated or down-regulated without revealing to what extent they will be expressed or not, we implement a thermodynamic (also termed fractional occupancy) model  to describe the gradual responses of gene expression to signal transduction. The thermodynamic model is derived under the assumption that the system is in the thermodynamic equilibrium. As such, the gene expression level is defined as a function of the activity levels of the bound transcription factors as shown in Eq. (2) ,
where [ E] is the gene expression level, N is the number of all possible arrangements of transcription factors attaching to their corresponding binding sites, G is the set of transcription factor arrangements that turn the gene on, n i (n m ) is the number of transcription factor binding sites employed in the i-th (m-th) arrangement, K j and [ TF j ] represent the binding affinity of binding site j and the activity level of the transcription factor corresponding to binding site j, and Q i is the probability of the gene being expressed when the i-th arrangement comprises the binding of both activating and inhibiting transcription factors (Q i =1 when only activating transcription factors are included).
Case Study 1: DNA damage induced cell apoptosis. DNA damage caused by ionizing radiation will activate ATM, while that by UV light will activate ATR and DNA-PK [31–33]. The stimulated kinases ATM, ATR and DNA-PK can phosphorylate p53 and E2F1 transcription factors directly or indirectly via Chk1 and Chk2. The activated p53 and E2F1 can regulate transcription of apoptosis regulator Bax, Bcl-2 and Apaf-1. Figure 2 shows the regulatory cascade of DNA damage induced apoptosis regulation. The network is constructed using GeneGo MetaCore database .
Given the input data (value ∈[0,1]) associated with the receptors of the network (i.e., ATM, ATR and DNA-PK), the user-specified edge weights (value ∈[0,1]) and the number of iterations, Sig2GRN will first generate the dynamics of all the nodes’ activities in the network based on Eq. (1). By manually selecting the transcription factors that regulate the transcription of the genes of interest, we can run Sig2GRN to further predict this gene’s expression status over time. Figure 3 shows the simulated expression of Bax and Bcl-2 as an example. Here ATR and DNA-PK are selected as input nodes to simulate the exposure of the cells to UV light. The input levels of the input nodes were both set to 1; the edge weights of activation and inhibition interactions were set to 0.7 and 0.8, respectively; and the number of iterations was set to 100. In the Boolean model, for Bax, the selected transcription factor was p53 and the interaction type from p53 to Bax was set to activation; for Bcl-2, the transcription factors were E2F1 and p53 and the interaction types were activation and inhibition, respectively. In the thermodynamic model, the binding affinities of E2F1 and p53 were both set to 2. The parameter settings used here are only for purpose of demonstration because the prior knowledge available for parameter settings is insufficient in this case study. Moreover, the robustness of our model to the variations of the parameters (including the edge weights and the initial values of the nodes) have been empirically demonstrated in our previous work . It can be seen from Fig. 3 a that Bax is expressed (in the Boolean model, 1 means the gene is expressed) after about 12 iterations when the DNA damage signals are transmitted from p53. Similar conclusion can be drawn from the thermodynamic model in Fig. 3 b that the expression of Bax increases rapidly to a plateau. In Fig. 3 d, Bcl-2 is also turned on after about 15 iterations. Compared with Bax, the simulated expression of Bcl-2 using the thermodynamic model (Fig. 3 e) increases more smoothly and the maximum expression is less than that of Bax because of incoming inhibiting signals from p53.
To validate the simulation, we use a dataset in which human TK6 cells were treated with UV light and then the gene expression was measured at three time points, i.e., 4, 8 and 24 hrs . Figures 3 c and 3 f give the experimental data (the ratio of the gene expression levels between UV light treated and control groups) of Bax and Bcl-2 expression over the three time points. These two genes are the overlap between the network (Fig. 2) and the dataset , the dataset has measurements of many other genes which cannot be included in the network and the gene Apaf-1 in the network has no measurements in the dataset. It can be seen from the real data that the expression levels of both Bax and Bcl-2 increase over time when the cells are exposed to UV light; the slope of Bcl-2 curve is smoother and the height of the Bcl-2 curve is lower than that of the Bax curve. This suggests that, to some extent, our simulation tool is able to link the signal transduction with the gene expression regulation through transcription factors.
Case Study 2: apoptotic signaling network treated by different combinations of drugs. Predicting the efficacy of drugs and the design of combination therapy is a major endeavor for biomedical research and pharmaceutical industry. Lee et al.  studied the effects of different combinations of drugs in enhancing cell death in human breast cancer cells (cell line BT20). Here we construct a network based on their experiments and simulate the cell responses under different combinations of drug treatments to evaluate the performance of our simulator.
The network (Fig. 4) comprises 35 nodes and 57 edges [13, 34, 36]. In the 35 nodes, 32 represent signaling proteins and 3 represent cell fates (e.g., apoptosis, proliferation and cell cycle). From the dataset in , we select four samples, i.e., treated with (1) EGFR inhibitor, (2) DNA damage activator, (3) both drugs and (4) the control group. The dataset has no measurement of gene expression, instead, the numbers of cells that fall into each cell fate were measured at 5 time points (i.e., 0, 6, 8, 12 and 24 hrs). Therefore, no interaction of transcriptional regulation is included in the network. We directly calculate the proportion of the dead cells at each time point as the cells response to the perturbations.
As shown in Table 1, four different types of simulation input are defined to correspond to the experimental settings in . For example, half activating (0.5) signals are assigned to both TNFR and EGFR to simulate the control group; full blocking (0), half activating (0.5) and full activating (1) signals are assigned to EGFR, TNFR and DNA damage, respectively, to simulate the addition of both EGFR inhibitor and DNA damage activator together. The edge weights of activation and inhibition interactions are 0.7 and 0.8, respectively; and the number of iterations is 100. Since the network in Fig. 4 does not involve transcriptional regulation, the predicted dynamics of Caspase 3 (the only upstream node of Apoptosis) is considered as the predicted cell responses to the perturbations.
Figure 5 a shows the simulated proportions of cell death over time. Compared with the control group (the blue curve), the addition of drugs (the orange, yellow and purple curves) enhances cell death. While the EGFR inhibitor (the orange curve) increases cell death to a small extent, the effect of DNA damage activator (the yellow curve) is significant. Furthermore, the treatment with both drugs together (the purple curve) performs the best in enhancing the cell death. Compared with the real data in Fig. 5 b, the predictions are consistent with the experimental measurements of the drug effects in terms of trends and ranking of the curves. However, there is a synergistic effect of the co-treatment in the real data, e.g., the cell death proportion induced by the co-treatment exceeds the sum of the cell death proportions induced by the two treatments separately, which has not been captured by the simulation. Moreover, mapping the simulation iterations to the real time points remains a challenge for our simulator.
In spite of the promising performance of our computational simulations, limitations have also been noticed. For example, in case study 2, the simulation did not reveal the synergistic effect of the co-treatment by two drugs. Possible reasons include the insufficient prior knowledge of the input networks and an oversimplification of the computational model of the nonlinear regulatory system. Moreover, since the simulation is iterated over discrete time points, it is hard to assign real time to simulation steps, which is a major obstacle for linking the two biological processes (e.g., signal transduction and transcriptional regulation) with different time scales. Techniques of multiscale modeling and simulation will be incorporated into the software in near future.
Computational simulation is an important systems biology approach to the analysis of signaling pathways and gene regulatory networks. In this work, we present a software tool called Sig2GRN which is able to link the cellular signaling pathways with the downstream gene expression regulation. A generalized logical model is used in modeling the upstream signaling pathways, while a Boolean Network and a thermodynamic model are employed in modeling the downstream gene expression based on the simulated activities of transcription factors. We have shown two case studies on simulating the cell responses to the extracellular perturbations and validated the simulations with wet-lab experimental data. As a Cytoscape plugin, Sig2GRN is designed to be extensible so that more computational models of gene regulation (e.g., epigenetic modifications) can be integrated to facilitates studies in systems biology. Compared with existing methods to link signaling pathways with gene regulation, such as in , Sig2GRN is a parameter-free software which requires no kinetic parameters of the pathways, and thus it is still applicable when only insufficient prior knowledge of the underlying mechanisms is available. Moreover, Sig2GRN is able to predict the gene expression time-course data given the perturbations to the signaling pathways, whereas in  the gene expression data are required as the input of their model, which is therefore unable to predict new gene expression patterns.
Janes KA, Kelly JR, Gaudet S, Albeck JG, Sorger PK, Lauffenburger DA. Cue-signal-response analysis of TNF-induced apoptosis by partial least squares regression of dynamic multivariate data. J Comput Biol. 2004; 11(4):544–61.
Janes KA, Yaffe MB. Data-driven modelling of signal-transduction networks. Nat Rev Mol Cell Biol. 2006; 7(11):820–8.
Jaqaman K, Danuser G. Linking data to models: data regression. Nat Rev Mol Cell Biol. 2006; 7(11):813–9.
Duren Z, Wang Y. A systematic method to identify modulation of transcriptional regulation via chromatin activity reveals regulatory network during mESC differentiation. Scientific Reports 6. 2016; 22656. doi:10.1038/srep22656.
Calzone L, Tournier L, Fourquet S, Thieffry D, Zhivotovsky B, Barillot E, Zinovyev A. Mathematical modelling of cell-fate decision in response to death receptor engagement. PLoS Comput Biol. 2010; 6(3):1000702.
Kauffman SA. Homeostasis and differentiation in random genetic control networks. Nature. 1969; 224(5215):177–8.
Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol. 1969; 22(3):437–67.
Mitsos A, Melas IN, Siminelakis P, Chairakaki AD, Saez-Rodriguez J, Alexopoulos LG. Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data. PLoS Comput Biol. 2009; 5(12):1000591.
Saez-Rodriguez J, Simeoni L, Lindquist JA, Hemenway R, Bommhardt U, Arndt B, Haus UU, Weismantel R, Gilles ED, Klamt S, Schraven B. A logical model provides insights into T cell receptor signaling. PLoS Comput Biol. 2007; 3(8):163.
Sharan R, Karp RM. Reconstructing boolean models of signaling. J Comput Biol. 2013; 20(3):249–57.
Thomas R. Boolean formalization of genetic control circuits. J Theor Biol. 1973; 42(3):563–85.
Aldridge BB, Saez-Rodriguez J, Muhlich JL, Sorger PK, Lauffenburger DA. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling. PLoS Comput Biol. 2009; 5(4):1000340.
Zhang F, Chen H, Zhao LN, Liu H, Przytycka TM, Zheng J. Generalized logical model based on network topology to capture the dynamical trends of cellular signaling pathways. BMC Syst Biol. 2016; 10(Suppl 1):7.
Huang ZY, Hahn J. Fuzzy modeling of signal transduction networks. Chem Eng Sci. 2009; 64(9):2044–056.
Morris MK, Saez-Rodriguez J, Clarke DC, Sorger PK, Lauffenburger DA. Training signaling pathway maps to biochemical data with constrained fuzzy logic: Quantitative analysis of liver cell responses to inflammatory stimuli. PLoS Comput Biol. 2011; 7(3):1001099.
Zheng J, Zhang D, Przytycki PF, Zielinski R, Capala J, Przytycka TM. Simboolnet–a cytoscape plugin for dynamic simulation of signaling networks. Bioinformatics. 2010; 26(1):141–2.
Albeck JG, Burke JM, Aldridge BB, Zhang M, Lauffenburger DA, Sorger PK. Quantitative analysis of pathways controlling extrinsic apoptosis in single cells. Mol Cell. 2008; 30(1):11–25.
Michaelis L, Menten ML. Die kinetik der invertinwirkung. Biochem. 1913; Z(49):333–69.
Neumann L, Pforr C, Beaudouin J, Pappa A, Fricker N, Krammer PH, Lavrik IN, Eils R. Dynamics within the CD95 death-inducing signaling complex decide life and death of cells. Mol Syst Biol. 2010; 6(352). doi:10.1038/msb.2010.6.
Novák B, Tyson JJ. A model for restriction point control of the mammalian cell cycle. J Theor Biol. 2004; 230(4):567–79.
Dresch JM, Thompson MA, Arnosti DN, Chiu C. Two-layer mathematical modeling of gene expression: Incorporating DNA-level information and system dynamics. SIAM J Appl Math. 2013; 73(2):804–26.
Dresch JM, Liu X, Arnosti DN, Ay A. Thermodynamic modeling of transcription: sensitivity analysis differentiates biological mechanism from mathematical model-induced effects. BMC Syst Biol.2010;4(142). doi:10.1186/1752-0509-4-142.
He X, Samee MAH, Blatti C, Sinha S. Thermodynamics-based models of transcriptional regulation by enhancers: The roles of synergistic activation, cooperative binding and short-range repression. PLoS Comput Biol. 2010; 6(9):e1000935. doi:10.1371/journal.pcbi.1000935.
Peng SC, Wong DS, Tung KC, Chen YY, Chao CC, Peng CH, Chuang YJ, Tang CY. Computational modeling with forward and reverse engineering links signaling network and genomic regulatory responses: NF-kB signaling-induced gene expression responses in inflammation. BMC Bioinforma. 2010; 11:308.
Chang C, Ding Z, Hung YS, Fung PC. Fast network component analysis (FastNCA) for gene regulatory network reconstruction from microarray data. Proc Natl Acad Sci USA. 2008; 24(11):1349–1358.
Liao JC, Boscolo R, Yang YL, Tran LM, Sabatti C, Roychowdhury VP. Network component analysis: reconstruction of regulatory signals in biological systems. Proc Natl Acad Sci USA. 2003; 100(26):15522–15527.
Noor A, Ahmad A, Serpedin E, Nounou M, Nounou H. ROBNCA: robust network component analysis for recovering transcription factor activities. Bioinformatics. 2013; 29(19):2410–418.
Melas IN, Mitsos A, Messinis DE, Weiss TS, Alexopoulos LG. Combined logical and data-driven models for linking signalling pathways to cellular response. BMC Syst Biol. 2011; 5:107.
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003; 13(11):2498–504.
Ay A, Arnosti DN. Mathematical modeling of gene expression: a guide for the perplexed biologist. Crit Rev Biochem Mol Biol. 2011; 46(2):137–51.
Bakkenist CJ, Kastan MB. Dna damage activates ATM through intermolecular autophosphorylation and dimer dissociation. Nature. 2003; 421(6922):499–506.
Norbury CJ, Zhivotovsky B. DNA damage-induced apoptosis. Oncogene. 2004; 23(16):2797–808.
Dhanalakshmi S, Agarwal C, Singh RP, Agarwal R. Silibinin up-regulates DNA-protein kinase-dependent P53 activation to enhance UVB-induced apoptosis in mouse epithelial JB6 cells. J Biol Macromol. 2005; 280(21):20375–0383.
Ekins S, Nikolsky Y, Bugrim A, Kirillov E, Nikolskaya T. Pathway mapping tools for analysis of high content data. Methods Mol Biol. 2007; 356(356):319–350.
Glover KP, Chen Z, Markell LK, Han X. Synergistic gene expression signature observed in TK6 cells upon co-exposure to UVC-irradiation and protein kinase c-activating tumor promoters. PLoS ONE. 2015; 10(10):0139850.
Lee MJ, Ye AS, Gardino AK, Heijink AM, Sorger PK, MacBeath G, Yaffe MB. Sequential application of anticancer drugs enhances cell death by rewiring apoptotic signaling networks. Cell. 2012; 149(4):780–94.
We would like to thank Ms. Jing Guo, a Ph.D. student at the School of Computer Science and Engineering, Nanyang Technological University, for her help with testing the software.
This article has been published as part of BMC Systems Biology Volume 10 Supplement 4, 2016: Proceedings of the 27th International Conference on Genome Informatics: systems biology. The full contents of the supplement are available online at http://bmcsystbiol.biomedcentral.com/articles/supplements/volume-10-supplement-4.
This project is supported by MOE AcRF Tier 2 Grant ARC39/13 (MOE2013-T2-1-079) and MOE AcRF Tier 1 seed grant on complexity (RGC2/13), Ministry of Education, Singapore.
The publication cost is supported by MOE AcRF Tier 2 Grant ARC39/13 (MOE2013-T2-1-079), Ministry of Education, Singapore.
Availability of supporting data
Software availability: http://histone.scse.ntu.edu.sg/Sig2GRN/.
FZ designed study; acquired, analysed and interpreted data; implemented main experiments; drafted manuscript. RSL implemented the software. JZ conceived the study, participated in conceptualization and discussion, critically reviewed and revised the manuscript and gave final approval for submission. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate