 Research Article
 Open Access
 Published:
A new efficient approach to fit stochastic models on the basis of highthroughput experimental data using a model of IRF7 gene expression as case study
BMC Systems Biology volume 11, Article number: 26 (2017)
Abstract
Background
Mathematical models are used to gain an integrative understanding of biochemical processes and networks. Commonly the models are based on deterministic ordinary differential equations. When molecular counts are low, stochastic formalisms like Monte Carlo simulations are more appropriate and well established. However, compared to the wealth of computational methods used to fit and analyze deterministic models, there is only little available to quantify the exactness of the fit of stochastic models compared to experimental data or to analyze different aspects of the modeling results.
Results
Here, we developed a method to fit stochastic simulations to experimental highthroughput data, meaning data that exhibits distributions. The method uses a comparison of the probability density functions that are computed based on Monte Carlo simulations and the experimental data. Multiple parameter values are iteratively evaluated using optimization routines. The method improves its performance by selecting parameters values after comparing the similitude between the deterministic stability of the system and the modes in the experimental data distribution. As a case study we fitted a model of the IRF7 gene expression circuit to timecourse experimental data obtained by flow cytometry. IRF7 shows bimodal dynamics upon IFN stimulation. This dynamics occurs due to the switching between active and basal states of the IRF7 promoter. However, the exact molecular mechanisms responsible for the bimodality of IRF7 is not fully understood.
Conclusions
Our results allow us to conclude that the activation of the IRF7 promoter by the combination of IRF7 and ISGF3 is sufficient to explain the observed bimodal dynamics.
Background
Computer models contribute to the integrative understanding of complex molecular processes in the cell. The most commonly used approach is deterministic modeling based on ordinary differential equations (ODEs). When the studied system comprises species with a low molecular count, stochastic formalisms, e.g. Gillespie’s algorithm that simulates trajectories and uses discrete molecule numbers are more appropriate [1]. In recent years, the use of stochastic models has substantially increased [2].
Considering stochasticity in biological systems has changed the quantitative and qualitative understanding obtained by previous deterministic models [3]. Examples of biological phenomena discovered by stochastic modeling include gene expression in burstlike patterns [4], productive or latent cell decision after HIVinfection [5], and the presence of oscillatory behavior induced by noise [6]. However, in contrast to the plethora of methods used to fit and analyze deterministic models [7–10], there are only very limited sets of methods available to do the same with stochastic models.
Thus, only recently, methods for parameter fitting of stochastic models have been developed and are still under development. Parameter estimation methods for stochastic models have been so far designed for single time course data [11–16]. For experiments leading to data distributions, they include the moment closure [17–19], and the comparison of experimental and simulation distributions [20–22]. Despite the current efforts, there are still major problems that limit the full applicability of those methods. The main drawback is the high computational execution time. To solve this problem novel efficient strategies to fit stochastic models are needed.
In the following we introduce a new approach that is a variant of existing methods based on the comparison of distributions. Differently to the existing methods we use the experimental data to define a condition that the model must fulfill in the deterministic regime. This condition tests, if the deterministic steady states are in the close proximity to the modes of the experimental distribution. Only if the model fulfills this condition, the parameter set is evaluated in the stochastic regime. In this way, the algorithm directs the evaluation of parameter sets towards regions in parameter space with high potential to reproduce the experimental data. Additionally, this method applies to nonlinear models and complex experimental data distributions.
To test the potential of this new method, we selected an open scientific question and real experimental data. We investigated the molecular mechanisms responsible for the experimentally observed bimodal dynamics of IRF7 expression after Interferon (IFN) stimulation.
It is well documented that in isogenic cell populations not all cells produce an antiviral response when infected by a virus [23]. The mechanisms behind this heterogeneity have been related to stochastic events in the signaling pathways responsible to elicit the antiviral response [24–26]. IFN is a cytokine that activates the JAKSTAT signaling pathway in virusinfected cells. The JAKSTAT signaling pathway induces the translocation of ISGF3 (a transcription factor) into the nucleus to directly activate the transcription of a set of several hundred IFNstimulated genes (ISGs) [27]. IRF7 is an ISG with a central role in the immune response [28, 29]. Recently, it was observed that after IFN stimulation murine fibroblasts show a switchlike pattern of IRF7 expression, which is reflected at the population level as bimodality [30].
We developed a model to describe the observed bimodal dynamics of IRF7 expression after IFN stimulation. We hypothesize that the binding of IRF7 and ISGF3 to the IRF7 promoter is the mechanism responsible for this bimodal behavior. Our simulation results reproduced IRF7 switchlike dynamics at single cell level, and bimodality was achieved at the population level. Hereby, we used the newly developed method to fit the model and correctly reproduce the experimental data. Our results allow us to conclude that the IRF7 promoter activation by the combination of IRF7 dimers and ISGF3 is sufficient to quantitatively explain the observed IRF7 bimodal dynamics.
Methods
Experimental data
Published experimental data were produced by Rand et al. [30]. The data described the expression of IRF7 after IFN β stimulation in a population of murine NIH3T3 fibroblasts. Rand’s experiments were done in the following way: First, cells were transfected with a BAC (Bacterial Artificial Chromosome) containing IRF7 and reporter mCherry genes fused, subsequently cultures were treated with different concentrations of murine IFN β. For illustrative purposes we selected the treatment with 150U of IFN β, a concentration where bimodality is prevalent. Then, fluorescence in the cultures was monitored using flow cytometry at different time points during 48 hours.
Numerical methods
In the deterministic regime the model was simulated using symbolic methods in Matlab [31] and/or using the LSODA algorithm in COPASI 4.11 [32]. Stability was calculated by making the righthand side of the differential equations equal to zero and subsequently by finding the roots of the system using function solve in Matlab. Eigenvalues were calculated using function eig in Matlab. For solving the model under stochastic dynamics we used the Gibson and Bruck algorithm [33] coded in COPASI. The random search and genetic algorithm were coded in Matlab. Raw experimental data was analyzed using the function FCS data reader coded in Matlab [34]. The modes in the distributions were calculated using the function PeakFinder coded in Matlab [35]. The source code of the project can be accessed via: https://sourceforge.net/projects/irf7bimodaldynamics/.
Results
New algorithm to fit stochastic biological models
Comparing experimental and simulation distributions
The measurements of fluorescence were made comparable with the corresponding observable chemical species S ^{o} in the model by a function h that maps state S ^{o}(t _{ i }) to observation S ^{†}(t _{ i }) as follows:
at each measurement time point t _{ i }, i=1,…,n.
Subsequently, a process to compare distributions was developed based on [20]. First, considering a specific set of parameter values θ={θ _{1},…,θ _{ d }}, we performed ns repetitions of the stochastic simulations s(t _{ i })={s _{1}(t _{ i }),…,s _{ ns }(t _{ i })}. The total of those stochastic simulations were used to build histograms with a fixed number of bins L for each t _{ i }. Subsequently, the (discrete) probability density function (PDF) of the simulations P _{ s }(s(t _{ i }),θ,b _{ l }) were computed using the center of each bin b _{ l }, for l=1,…,L, and the normalized form of the histograms. On the other hand, having nm repetitions of single cell experimental data m(t _{ i })={m _{1}(t _{ i }),…,m _{ nm }(t _{ i })} histograms and PDFs P _{ e }(m(t _{ i }),b _{ l }) were built. Even though, complex formulas can be use to calculate the number of bins in an histogram, here L was kept constant for P _{ s } and P _{ e } and was calculated as \(L = \sqrt {nm}\), this is a simple approach used by default by most programming languages (i.e. Matlab). Finally, an approach using the difference of squares was used as a metric to calculate the distance between P _{ s }(s(t _{ i }),θ,b _{ l }) and P _{ e }(m(t _{ i }),b _{ l }), as follows:
Algorithm implementation
The objective function F(θ,m) can be used to calculate a parameter estimate
As optimization usually requires plenty of function evaluations and each evaluation of F requires ns stochastic simulations, a direct optimization strategy is computationally unfeasible even for simple models. Therefore, we introduce a new strategy that selects good candidate parameters and only performs the stochastic simulations for these parameters.
In large systems where fluctuations can be discarded, the stochastic system can be reduced to the deterministic one [36, 37]. For this reason, in most cases the deterministic dynamics can be associated with a measure of central tendency in the PDFs obtained after the stochastic simulations. Using this reasoning we will introduce the strategy to efficiently estimate parameters for stochastic models making use of deterministic dynamics as an initial indicator.
Assuming that the experimental data is in equilibrium at the last measurement point t _{ n }, we used the modes from P _{ e }(m(t _{ n }),b _{ l }) as a central measure of tendency, denoting the modes by α(t _{ n })=(α _{1}(t _{ n }),…,α _{ q }(t _{ n })), with q being the total number of modes.
Then, using the ODE version of the model (\(\dot {\boldsymbol {X}}(\theta,t)\)) we determine its stability. Here we tested whether the system has stable steady states and denote them with \(\boldsymbol {X}^{*}(\theta) = \left (X^{*}_{1}(\theta),\ldots,X^{*}_{ss}(\theta) \right)\), being s s the number of stable steady states. If the system has no stable steady state, it holds s s=0 and the vector X ^{∗}(θ) is empty. The calculation of steady states can either be performed by solving \(\dot {\boldsymbol {Y}}(\theta) = 0\) where \(\dot {\boldsymbol {Y}}\) is the right hand side of the ODE system with X(θ,t) replaced by Y(θ). Solving this equation can be impossible for large systems. Alternatively, one can numerically solve the ODE systems until the flux is zero for all components. Varying the initial conditions leads to the different steady states. The stability of the system was calculated after determining the sign of the eigenvalues (λ<0), for stable steady states [38].
Having X ^{∗}(θ) and α(t _{ n }) we introduce the deterministic precondition:
Equation (4) means that the number of modes equals the number of stable steady states: s s=q, and that each of the stable steady states is close to its corresponding mode. Only, if this is fulfilled, we calculate stochastic simulations and evaluate the objective function F(θ,m). If the precondition is not fulfilled, we do not carry out stochastic simulations. In this case, we assign a high (bad) objective function value to direct the parameter search towards parameters that pass the deterministic precondition. A graphical representation of the deterministic precondition is given in Fig. 1.
At this point, it is important to notice that the deterministic precondition is not stating that the stable steady states form the ODE system are equal to the mean obtained by solving the stochastic system. Rather, the deterministic precondition tests if the deterministic steady state lies around a certain range.
Our extended objective function with the deterministic check reads as:
and the estimate is defined as:
As previously discussed, our method is based on the central assumption that the modes in the experimental data are related to the stable steady states in the deterministic mathematical model. If this assumption should fail, either of the following could happen: a) The method accepts a parameter that passed the deterministic precondition but does not show a good agreement of the distributions. If it does not show good agreement, the objective function F(θ,m) will show a high value which means that this parameter does not lead to a good fit. Therefore, this case leads to a loss of computational time, but not to wrong results and it overall is still less expensive than evaluating every parameter set. b) We reject a parameter that fails the deterministic precondition but would have lead to a good fit. In this case, we lose indeed a good parameter. This shows that our method is obviously a heuristic strategy that does not guarantee to find the exact global minimum, but this is true for almost any optimization strategy. The algorithmic steps are depicted in Fig. 2.
Parameter estimation methods
The deterministic precondition was first implemented using a random search algorithm. Random search is a global optimization strategy that tests random combinations of parameter values. The successful output in this method is dependent on the total number of evaluated parameter sets [39]. The more evaluated parameter values, the higher the probability to find the global minimum. The pseudocode for the random search is given in Algorithm 1.
The second implementation of the deterministic precondition was using the more directed Genetic Algorithm. Genetic Algorithm mimics evolution and is based on reproduction and selection. This algorithm is made of a population of individuals (parameter sets), and each contains a genome that is defined by the number of parameters to optimize. The individuals are ranked after solving the objective function, and a population of parental individuals is selected according to an elitism rate (ε). New individuals (offspring) are generated by pairing and recombining the parental genomes (crossover). Variability is introduced in the population by adding mutations in the new individuals according to a given mutation rate (μ). By the continuous process of selecting the best parameters after each generation, the algorithm evolves towards the regions in the parameter space that reduce the values in the objective function. The pseudocode for the Genetic Algorithm is given in Additional file 1.
Identifying parameters for a constitutive gene expression circuit with insilico data
To prove the functionality and benefits of the new proposed algorithm, we first applied it to a simple example where the parameters are known a priori. The model describes the stochastic dynamics of two variables, protein and mRNA of a gene with constitutive expression [40], a graphical representation of the constitutive gene expression circuit is given in Fig. 3a. The model is described by the following reactions:
To estimate the parameter values for this model we carried out the following procedure: First, we generated the insilico data by selecting the following true parameter values θ ^{o}=(5,0.03,0.1,0.03) and running 10000 independent stochastic simulations from which PDFs were built. Four observable time points were defined at 50 min, 100 min, 150 min and 200 min (see Fig. 3 b). From the PDFs a mode was obtained at α _{1}(t _{ n }) = [543] a.u.(arbitrary units), where t _{ n }=200 min. Then, we defined the deterministic precondition using Eq. (4), and assigning minimum and maximum acceptance ranges by setting \(\beta _{1}^{low}\) = 0.95 and \(\beta _{1}^{up}\) = 1.05, respectively. A deeper analysis of the stability of the constitutive gene expression circuit is given in Additional file 2. To build simulated PDFs we used 1000 repetitions of the stochastic model, this number was empirically calculated according to Additional file 2. For illustrative purposes, we assumed that the values for the parameters responsible for the mRNA transcription and protein translation (θ _{1}, θ _{3}) were unknown, and the new algorithm was used to estimate those parameters. The deterministic precondition was applied using the RS strategy obtaining that the algorithm only evaluates stochastic dynamics in 3.1% of the tested parameter values, reducing in this way the total simulation time. Additionally, the complete algorithm was repeated 1000 times and histograms of the estimated parameter were computed to determine whether they are close to θ ^{(0)}. As can be observed in Fig. 3c the deterministic precondition reduces the evaluation of different parameters under stochastic dynamics by selecting only those parameters that are in a welldefined region in the proximity of θ ^{(0)}. For this model, the main benefits of using the deterministic precondition was the reduction of the number of parameters evaluated under stochastic dynamics. This rejection of parameters was made in an area outside the true parameter values, and hence no difference in accuracy is expected in comparison with a method without using the deterministic precondition. A complete description of the analysis of the performance, accuracy and error for this example is given in Additional file 2. The model for the constitutive gene expression circuit can be obtained from BioModels database under reference MODEL1608100000.
Mathematical model for IRF7 expression dynamics
Subsequently we applied our algorithm to a real problem with flow cytometry data. Here we studied the dynamics of murine IRF7 gene expression upon IFN stimulation. For this reason, we developed a model that comprises known key components and feedback mechanisms. The overall system describes the active IRF7 promoter (Pa) by the binding of IRF7 dimer and ISGF3 to the DNA binding sites ISRE and IRFE, respectively [41], the transcription and translation of IRF7, and its subsequent phosphorylation and dimerization. IRF7 protein binding to the IRFE binding site in the promoter results in the production of more IRF7 protein, constituting a positive feedback loop [42]. A graphical representation of the IRF7 gene expression dynamics is given in Fig. 4.
Chemical reactions
The developed model consists of 7 species and 13 reactions (reactions (11) to (23)). In the model, reaction (11) describes the IFN activation of the JAKSTAT signaling pathway. For the description of this reaction and according to [30] a saturable function was used. Downstream reactions of the pathway were lumped in reaction (12), so that the same variable was used to describe the output of the JAKSTAT signaling pathway, namely ISGF3, in this highly simplified model. Reaction (13) describes the IRF7 promoter activation by the binding of IRF7 dimer and ISGF3 to ISRE and IRFE, respectively. Subsequently, we incorporated IRF7 mRNA transcription by the active promoter, and to a lesser extent by the basal promoter state, as reactions (14) and (15), respectively. Reaction (16) considers the translation of IRF7 mRNA to produce IRF7 protein. IRF7 protein is phosphorylated in reaction (17). Two phosphorylated IRF7 proteins form a IRF7 dimer in reaction (18). Finally, IRF7 promoter inactivation, degradation of mRNAs, ISGF3, IFN and IRF7 proteins are represented by reactions (19) to (23), respectively:
The reaction rates are given in Table 1.
Mathematical equations
To evaluate the deterministic precondition we consider the corresponding ODEs (Eq. (24) to (30)):
The model for the IRF7 circuit can be obtained from BioModels database under reference MODEL1608100001.
Fitting the stochastic IRF7 gene expression model to experimental data
All parameters of the above described model for the IRF7 gene circuit were fitted by using experimental data of IRF protein expression. In the model, a global quantity to describe the different forms of the IRF7 protein was defined as follows:
and this variable was mapped to IRF7^{†} (t _{ i }) flow cytometry measurements as follows:
where φ is a scaling factor.
Using the experimental data, PDFs were build and the modes in the distributions were determined using the PeakFinder function [35] obtaining two elements of α(t _{ n }) = [77, 1000] a.u. (arbitrary units), where t _{ n } = 48 h. φ relates the values of fluorescence with the molecular count described by the mathematical model. Unfortunately, no calibration curve is provided with the data to calculate this parameter. For this reason, multiple values were tested for φ obtaining consistent results, for illustrative purposes we report φ = 1. We defined the deterministic precondition using Eq. (4), and assigning minimum and maximum acceptance ranges by setting \(\beta _{k}^{low}\) = 0.95 and \(\beta _{k}^{up}\) = 1.05, for k = 1, 2. The allowed ranges for the parameter values are given in Table 2. The deterministic precondition was introduced in two different optimization strategies, random search, and genetic algorithms. In both optimization strategies 1000 realizations of the stochastic simulations were performed if the parameter set fulfilled the deterministic precondition.
Using the random search strategy, we tested 10000 parameter sets from which less than the 1% passed the deterministic precondition and were stochastically evaluated. The reduction of parameter values allowed us to efficiently find a set of parameter values that reproduced the experimental data (the fitting for the random search algorithm is given in Additional file 3). A complete analysis of the performance of the use of the deterministic precondition with the random search algorithm is given in Additional file 4. To test the reproducibility of the estimated parameter values, the method was repeated 100 times obtaining welldefined parameter distributions that show the predominant values, see Additional file 5.
Then, we implemented the deterministic precondition using a genetic algorithm with adaptive population size. Here we implemented an initial population of 3000 individuals for the first generation, and 5 subsequent generations with 20 individuals. Notice, that a large initial population of parameters was needed by the expected high rejection percentage of parameters by the deterministic precondition during the first generation. As parameters for the algorithm we used ε=0.4 and μ=0.2. Our simulation results showed a constant decrease in the value of the objective function value during the generations, which indicates progress during fitting. By the use of the deterministic precondition 99% of the parameters were rejected in the first generation and in the subsequent generations around 30% of the parameter values were rejected, improving in this way the efficiency of the algorithm (see Fig. 5). The comparison between the experimental data and the model simulation distributions is given in Fig. 6 showing a high degree of agreement. In addition, to check the validity of the methodology further we fit the model to two additional flow cytometry measurements that describe the stimulation of the cell culture with 100U and 250U of IFN. The respective results are given in Additional file 6. Complete analysis of the performance of the use of the deterministic precondition in the genetic algorithm is given in Additional file 4.
IRF7 temporal dynamics
The simple model of IRF7 gene expression described above is sufficient to explain IRF7 bimodality. Using the optimized parameter values given in Table 2 and the initial conditions given in Table 3, the stochastic temporal dynamics of the model were simulated and are given in Fig. 7. In Fig. 7a it can be seen that the IFN concentration evolves to a steady state. Subsequently, the first affected variable is ISGF3 that equally evolves towards a stable steady state, see Fig. 7b. Figure 7c shows the IRF7 promoter dynamics exhibiting transitions between active/basal states. This promoter state transition is a characteristic of genes with regulated expression [43]. Subsequently, IRF7 mRNA expression displays a pattern that is affected by this stochastic switching between two possible states, one with basal expression and the other with active expression, see Fig. 7d. For singlecell trajectories, IRF7 protein expression shows a switchlike expression. For the whole population of those trajectories bimodality is observed (Fig. 7e), the same stands for the different forms of the IRF7 protein, phosphorylated (Fig. 7f) dimer (Fig. 7g), and the total amount of IRF7 proteins (Fig. 7h). The system’s steady states are given in Table 4.
Discussion
The promise of systems biology is to achieve a quantitative understanding of the molecular processes in the cell with the aid of computational models. However, a bottleneck is the availability of reliable parameter values needed in those models. Often, it is very difficult or even impossible to measure all of these parameters. For deterministic models, this problem has been well tackled by the development of efficient methods for parameter estimation. Contrarily, for stochastic dynamics the landscape of methods for parameter estimation is poorer and still under development. For this reason, innovative and efficient algorithms are needed to fit and validate realistic stochastic models.
During the last years important advances have been achieved in the field of parameter inference for stochastic models. On one hand, strategies that involve mathematical procedures of moment estimation have been suggested [17–19]. On the other, Bayesian methods that test multiple parameters values to find approximated solutions that represent the data have been successfully implemented [20]. Currently, the main problems observed in most methods include the computational cost, the accuracy of the obtained solution, and its potential to be implemented in large and nonlinear systems. Different strategies have been suggested to improve the accuracy and alleviate the computational performance [21, 22].
In our method, we tackled the computational cost by introducing a deterministic precondition that works as a bypass in the algorithm avoiding large amounts of unnecessary stochastic simulations. The concept of reducing the parameter space by introducing a precondition defined by the experimental data has been suggested previously by Hori et al. [21]. In their method, a small order linear model is used to optimize the experimental data by finding the root of a Lyapunov equation. In contrast, we use a deterministic precondition. Both approaches have advantages and disadvantages. Thus, Hori’s method is constrained by the need to find the Lyapunov equation. Our method is based on steady state calculation and Monte Carlo simulations that are standard and wellknown methodologies used in systems biology.
Recently a new algorithm developed by Lillacci et al., has been shown to significantly reduce the computational cost and achieve a high accuracy fit for stochastic models and flow cytometry data [22]. This method uses the Kolmogorov distance as a metric to calculate the difference between model simulation and experimental data. By choosing this metric, it is possible to estimate the minimal number of simulations needed to compare experimental and model simulations under a certain tolerance value. This estimated value decreases as the number of experimental data increases. In typical flow cytometry experiments the number of measured cells is in the order of tens of thousands and this number in Lillacci’s Algorithm is translated in a reduction of at least one order of magnitude in the number of required stochastic simulations. This algorithm has been applied to a model with 18 reactions and 20 free parameters obtaining a good fit to flow cytometry experimental data that reproduces bimodal dynamics. Comparing our method with Lillacci’s algorithm it is important to point out that both methods tackle the computational cost issue in two fundamentally different ways. Lillacci’s algorithm minimizes the number of needed stochastic simulations, whereas our algorithm minimizes the number of parameters sets evaluated with stochastic dynamics. A powerful new algorithm may be the result of combining both methodologies.
It is well known from nonlinear dynamics that the model architecture and parameter values determine the behavior of the system. Hence, more complex stable dynamics such as limit cycles, and highermultistability (a system with more than two stable steady states) can be obtained. Unstable systems are usually not of interest in the biological context. Our method was designed and tested for monostable and bistable systems, all other cases being rejected by the deterministic precondition. The case of limit cycles can be better approached by existing parameter estimation methods for single timecourse data [12–16]. The case of systems with highermultistability can in principle be treated by our method, but the application is limited by the costly need to find multiple stable steadystates in the system. Finally, it is relevant to mention that comparing stable states of the ODE model with modes of the measured flow cytometry distribution in some models can present some inherent problems. Bimodal fluorescence distributions may have no relation to bistability, e.g. when the systems has large differences in transitions rates for the promoter states [44].
Our method was implemented using two wellestablished optimization strategies, random search and genetic algorithms. Using a random search algorithm we observed that less than 1% of the population of parameters are subject to stochastic evaluations, this reduction in the tested parameters allowed us to explore a larger proportion of the parameter space, which is especially relevant for systems with multiple unknown parameters and no initial parameter guesses. On the other hand, using genetic algorithms we achieved a convergence to a minimal value in the optimization function after a few generations (see Fig. 5). Additionally, during each generation in the genetic algorithm a large percentage of the total evaluated parameters is rejected by the deterministic precondition. This percentage is dependent on the parameter of the algorithm (rate of elitism and mutation rate). In both strategies the introduction of the deterministic precondition significantly improved the parameter estimation process. Moreover, the implementation of deterministic precondition is not restricted to random search and genetic algorithms, it can also be implemented in other optimization algorithms, e.g. particle swarm or simulated annealing. A complete analysis of the performance of using the deterministic condition is given in Additional file 4.
The accuracy of the method is given by the agreement between experimental data and simulations. As can be observed in Fig. 6 a good fit was found, albeit not a perfect one. We observed that even increasing the number of tested parameter values during the random search and/or increasing the number of generations during the genetic algorithm did not improve the fit. For this reason, we consider that the differences between the model and the experimental data might be explained by the fact that our system is a highly simplified model that was built by lumping some steps in the biological system. Our team is working to fit more complex models in a future publication. To test the reproducibility of the obtained parameter values, the method was repeated 100 times and distributions of the obtained parameters were built. As can be observed in Additional file 5, when a parameter is identifiable this is reflected in a narrow distribution, on the other hand, when a parameter is nonidentifiable this is reflected in a wide distribution. In the example given by reactions (11) to (23) we observed that most parameters are not identifiable. Thus, nonidentifiability is expected in the parameters contained in reactions (11) and (20). Those reactions have opposite effects on the IFN dynamics. For this reason, multiple combinations of values in parameters V _{ IFN }, k _{ IFN } and k _{ dIFN } have similar effects on the overall system dynamics. Contrarily, the parameters involved in the production of the mRNA (k _{ Active } and k _{ Basal }) are better confined.
Rand et al. described bimodal gene expression of IRF7 after IFN stimulation. This phenomenon was explained at the cell population level by the effects of the IFN paracrine response [30]. Here we observed that bimodality also can be explained in absence of paracrine response and only taking into account the molecular mechanism responsible for the IRF7 promoter dynamics. Bistability in geneexpression circuits is commonly associated with the switching between active/basal states in the gene promoter. In most cases, DNA cooperativity in the promoter is the basis of promoterstate switching [45]. However, for typeI IFN responses a promoter activation in a cooperative manner has recently been discarded [46]. Taking the recent literature into account, we developed a model on the basis of a positive feedback loop circuit comprising the independent activation of ISRE and IRFE elements by one ISGF3 molecule and one IRF7 dimer, respectively (see Fig. 4). This model has the needed and sufficient elements to sustain a bistable system (that is a positive feedback loop and nonlinear dynamics in the reaction rates) [47, 48]. Additionally, multiple model structures testing different biological scenarios including: additional feedback loops, additional intermediate elements, and promoter cooperative activation were tested, obtaining that the presented model reproduces the experimental data best. Our simulation results agree with the experimental data obtained from flow cytometry [30]. Additional characteristics observed in the experimental data are also reproduced by the model, such as the basal expression of IRF7 without IFN stimulation [29], see Additional file 7. The number of molecules in the active state of the system (given by the obtained 2 ^{nd} steady state, see Table 4) agrees with the order of magnitude reported for mammal mRNAs (average = 17 Molecules/Cell with a range between 1 to 200 Molecules/Cell) and the range of protein concentration (average = 50,000 Molecules/Cell with a range between 100 to 10^{8} Molecules/Cell) [49].
Conclusion
Here, we present a method to fit stochastic models to experimental data. The method is based on the comparison of distributions. The central idea of the method is to use a deterministic precondition that is defined by the experimental data as a filter avoiding large amounts of costly stochastic simulations. Using this idea, the number of parameters evaluated under stochastic dynamics is reduced, resulting in a significant improvement in the performance of the algorithm. As a case study, we used a model of IRF7 gene expression investigating the origin of bimodality in its dynamics upon IFN stimulation. Our results allowed us to conclude that a circuit with IRF7 promoter activation by one IRF7 dimer and one ISGF3 molecule is sufficient to explain the observed bimodal dynamics.
Abbreviations
 au:

Arbitrary units
 BAC:

Bacterial artificial chromosome
 DP:

Deterministic precondition
 GA:

Genetic algorithm
 IFN:

Interferon
 ISG:

Interferon stimulated genes
 ODE:

Ordinary differential equation
 PDF:

Probability density function
 RS:

Random search
 sss:

Stable steady state
References
 1
Gillespie DT. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys. 1976; 22(4):403–34. doi:10.1016/00219991(76)900413.
 2
Adrews SS, Dinh T, Arkin AP. Stochastic models of biological processes. Encycl Complex Syst Sci. 2009:8730–49. doi:10.1007/9780387304403_524.
 3
Raj A, van Oudenaarden A. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell. 2008; 135(2):216–26. doi:10.1016/j.cell.2008.09.050.
 4
Golding I, Paulsson J, Zawilski SM, Cox EC. Realtime kinetics of gene activity in individual bacteria. Cell. 2005; 123(6):1025–36. doi:10.1016/j.cell.2005.09.031.
 5
Weinberger LS, Burnett JC, Toettcher JE, Arkin AP, Schaffer DV. Stochastic gene expression in a lentiviral positivefeedback loop: HIV1 Tat fluctuations drive phenotypic diversity,. Cell. 2005; 122(2):169–82. doi:10.1016/j.cell.2005.06.006.
 6
Samoilov M, Plyasunov S, Arkin AP. Stochastic amplification and signaling in enzymatic futile cycles through noiseinduced bistability with oscillations,. Proc Natl Acad Sci U S A. 2005; 102(7):2310–5. doi:10.1073/pnas.0406841102.
 7
Moles CG, Mendes P, Banga JR. Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Res. 2003:2467–74. doi:10.1101/gr.1262503..
 8
RodriguezFernandez M, Egea JA, Banga JR. Novel metaheuristic for parameter estimation in nonlinear dynamic biological systems. BMC Bioinformatics. 2006:483. doi:10.1186/147121057483.
 9
Bock HG, Kostina E, Schlöder JP. Numerical Methods for Parameter Estimation in Nonlinear Differential Algebraic Equations. GAMMMitteilungen. 2007; 30(2):376–408. doi:10.1002/gamm.200790024.
 10
Marquardt DW. An algorithm for leastsquares estimation of nonlinear parameters. J Soc Ind Appl Math. 1963; 11(2):431–41. doi:10.1137/0111030.
 11
Zimmer C. Reconstructing the hidden states in time course data of stochastic models. Math Biosci. 2015; 269:117–29. doi:10.1016/j.mbs.2015.08.015.
 12
Zimmer C, Sahle S. Deterministic inference for stochastic systems using multiple shooting and a linear noise approximation for the transition probabilities. IET Syst Biol. 2015:1–12. doi:10.1049/ietsyb.2014.0020.
 13
Komorowski M, Costa MJ, Rand DA, Stumpf MPH. Sensitivity, robustness, and identifiability in stochastic chemical kinetics models. Proc Natl Acad Sci. 2011; 108(21):8645–50. doi:10.1073/pnas.1015814108.
 14
Golightly A, Wilkinson DJ. Bayesian inference for stochastic kinetic models using a diffusion approximation. Biometrics. 2005; 61(3):781–8. doi:10.1111/j.15410420.2005.00345.x.
 15
Tian T, Xu S, Gao J, Burrage K. Simulated maximum likelihood method for estimating kinetic rates in gene expression. Bioinformatics. 2007; 23(1):84–91. doi:10.1093/bioinformatics/btl552.
 16
Boys RJ, Wilkinson DJ, Kirkwood TBL. Bayesian inference for a discretely observed stochastic kinetic model. Stat Comput. 2008; 18(2):125–35. doi:10.1007/s112220079043x.
 17
Zechner C, Ruess J, Krenn P, Pelet S, Peter M, Lygeros J, Koeppl H. Momentbased inference predicts bimodality in transient gene expression. Proc Natl Acad Sci. 2012; 109(21):8340–345. doi:10.1073/pnas.1200161109.
 18
Kügler P. Moment fitting for parameter inference in repeatedly and partially observed stochastic biological models. PLoS ONE. 2012; 7(8):1–15. doi:10.1371/journal.pone.0043001.
 19
MillerJensen K, Skupsky R, Shah PS, Arkin AP, Schaffer DV. Genetic selection for contextdependent stochastic phenotypes: Sp1 and TATA mutations increase phenotypic noise in HIV1 gene expression. PLoS Comput Biol. 2013; 9(7). doi:10.1371/journal.pcbi.1003135.
 20
Poovathingal SK, Gunawan R. Global parameter estimation methods for stochastic biochemical systems,. BMC Bioinformatics. 2010; 11:414. doi:10.1186/1471210511414.
 21
Hori Y, Khammash MH, Hara S. Efficient parameter identification for stochastic biochemical networks using a reducedorder realization. In: Proceedings of the European Control Conference 2013. Zurich: European Control Conference, ECC: 2013. p. 4154–49.
 22
Lillacci G, Khammash M. The signal within the noise: efficient inference of stochastic gene regulation models using fluorescence histograms and stochastic simulations. Bioinformatics. 2013; 29(18):2311–9. doi:10.1093/bioinformatics/btt380.
 23
Ivashkiv LB, Donlin LT. Regulation of type I interferon responses,. Nat Rev Immunol. 2014; 14(1):36–49. doi:10.1038/nri3581.
 24
Hu J, Sealfon SC, Hayot F, Jayaprakash C, Kumar M, Pendleton AC, Ganee A, FernandezSesma A, Moran TM, Wetmur JG. Chromosomespecific and noisy IFNB1 transcription in individual virusinfected human primary dendritic cells. Nucleic acids Res. 2007; 35(15):5232–41. doi:10.1093/nar/gkm557.
 25
Zhao M, Zhang J, Phatnani H, Scheu S, Maniatis T. Stochastic expression of the interferon β gene. PLoS Biol. 2012; 10(1):1001249. doi:10.1371/journal.pbio.1001249.
 26
Levin D, Harari D, Schreiber G. Stochastic receptor expression determines cell fate upon interferon treatment. Mol Cell Biol. 2011; 31(16):3252–66. doi:10.1128/MCB.0525111.
 27
Schneider WM, Chevillotte MD, Rice CM. Interferonstimulated genes: a complex web of host defenses. Annu Rev Immunol. 2014; 32:513–45. doi:10.1146/annurevimmunol032713120231.
 28
Vasquez K, Sigrist K, Kucherlapati R, Demant P, Dietrich WF, Agoulnik S, Plus S. IRF7 is the master regulator of. Nature. 2005; 434(April):772–7. doi:10.1038/nature03419.1..
 29
Honda K, Taniguchi T. IRFs: master regulators of signalling by Tolllike receptors and cytosolic patternrecognition receptors. Nat Rev Immunol. 2006; 6(9):644–58. doi:10.1038/nri1900.
 30
Rand U, Rinas M, Schwerk J, Nöhren G, Linnes M, Kröger A, Flossdorf M, KályKullai K, Hauser H, Höfer T, Köster M. Multilayered stochasticity and paracrine signal propagation shape the typeI interferon response. Mol Syst Biol. 2012; 584:584. doi:10.1038/msb.2012.17.
 31
MATLAB version (R2015a); 2015.
 32
Hoops S, Sahle S, Gauges R, Lee C, Pahle J, Simus N, Singhal M, Xu L, Mendes P, Kummer U. COPASI–a COmplex PAthway SImulator. Bioinformatics (Oxford, England). 2006; 22(24):3067–74. doi:10.1093/bioinformatics/btl485.
 33
Gibson MA, Bruck J. Efficient exact stochastic simulation of chemical systems with many species and many channels. J Phys Chem A. 2000; 104(9):1876–89. doi:10.1021/jp993732q.
 34
Balkay L. FCS data reader. MATLAB Central File Exchange. 2006.
 35
Yoder N. peakfinder. MATLAB Central File Exchange. 2009.
 36
Gillespie DT. The chemical langevin equation. J Chem Phys. 2000; 113(1):297–306. doi:10.1063/1.481811.
 37
Gillespie DT. Stochastic simulation of chemical kinetics. Annu Rev Phys Chem. 2007; 58:35–55. doi:10.1146/annurev.physchem.58.032806.104637.
 38
Strogatz SH. Nonlinear dynamics and chaos with applications to physics, biology, chemistry, and engineering. New York: Westview Press; 2000. doi:10.1137/1037077.
 39
Brownlee J. Clever algorithms natureinspired programming recipes. Melbourne: Lulu Enterprises; 2011.
 40
Legewie S, Herzel H, Westerhoff HV, Blüthgen N. Recurrent design patterns in the feedback regulation of the mammalian signalling network. Mol Syst Biol. 2008; 4(1):190. doi:10.1038/msb.2008.29.
 41
Ning S, Huye LE, Pagano JS. Regulation of the transcriptional activity of the IRF7 promoter by a pathway independent of interferon signaling. J Biol Chem. 2005; 280(13):12262–70. doi:10.1074/jbc.M404260200.
 42
Litvak V, Ratushny AV, Lampano AE, Schmitz F, Huang AC, Raman A, Rust AG, Bergthaler A, Aitchison JD, Aderem A. A FOXO3IRF7 gene regulatory circuit limits inflammatory sequelae of antiviral responses. Nature. 2012; 490(7420):421–5. doi:10.1038/nature11428.
 43
Kaufmann DE, Walker BD. Treatment interruption to boost specific HIV immunity in acute infection. Curr Opin HIV AIDS. 2007; 2(1):21–5.
 44
Dobrzyński M, Nguyen LK, Birtwistle MR, von Kriegsheim A, Fernández AB, Cheong A, Kolch W, Kholodenko BN. Nonlinear signalling networks and celltocell variability transform external signals into broadly distributed or bimodal responses. J R Soc Interface. 2014; 11(98):20140383. doi:10.1098/rsif.2014.0383.
 45
Kaern M, Elston TC, Blake WJ, Collins JJ. Stochasticity in gene expression: from theories to phenotypes. Nat Rev Genet. 2005; 6(6):451–64. doi:10.1038/nrg1615.
 46
Begitt A, Droescher M, Meyer T, Schmid CD, Baker M, Antunes F, Knobeloch KP, Owen MR, Naumann R, Decker T, Vinkemeier U. STAT1cooperative DNA binding distinguishes type 1 from type 2 interferon signaling. Nat Immunol. 2014; 15(2):168–76. doi:10.1038/ni.2794.
 47
Cherry JL, Adler FR. How to make a biological switch. J Theor Biol. 2000; 203(2):117–33. doi:10.1006/jtbi.2000.1068.
 48
Craciun G, Tang Y, Feinberg M. Understanding bistability in complex enzymedriven reaction networks. Proc Natl Acad Sci. 2006; 103(23):8697–702. doi:10.1073/pnas.0602767103.
 49
Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M. Global quantification of mammalian gene expression control. Nature. 2011; 473(7347):337–42. doi:10.1038/nature10098.
 50
Yang E, van Nimwegen E, Zavolan M, Rajewsky N, Schroeder M, Magnasco M, Darnell JE. Decay rates of human mrnas: correlation with functional characteristics and sequence attributes. Genome Res. 2003; 13(8):1863–72. doi:10.1101/gr.1272403.
Acknowledgements
We gratefully thank Dr. Frank T. Bergmann for providing a modified version of Copasi that is used as part of this project. In the same way, we express our sincere thanks to Dr. Sven Sahle for his helpful comments helping us to increase the algorithm performance by suggesting changes in the data acquisition and analysis. We sincerely thank Dr. Ulfert Rand and Dr. Mario Köster for providing the experimental data.
Funding
This research was supported by Consejo Nacional de Ciencia y Tecnología (CONACYT) under grant 263795 in the program “Estancias postdoctorales al extranjero para la consolidación de grupos de investigación convocatoria 2015” to LUA. CZ was supported by BIOMS. The project was also supported by BMBF (Immunoquant).
Availability of data and materials
The data sets supporting the result of the article are included within the article and in additional files. The source code of the project can be accessed via: https://sourceforge.net/projects/irf7bimodaldynamics/. The models can be accessed via: https://www.ebi.ac.uk/biomodelsmain/
Authors’ contributions
LUA, UK and CZ conceived and designed the research. LUA wrote the manuscript, designed and performed the simulations. UK suggested extensions and modifications of the research. UK and CZ participated in discussions and revised the manuscript critically. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Author information
Additional files
Additional file 1
Pseudocode for the genetic algorithm with the deterministic precondition. Additional file with the pseudocode for the genetic algorithm. (PDF 85.9 kb)
Additional file 2
Identifying parameters for a constitutive gene expression circuit with insilico data. Additional file with an example where the method is applied to a simulation case with synthetic data. The file contains Figures A1A3, and Tables A1 and A2. (PDF 930 kb)
Additional file 3
Random search strategy. Additional file with Figure A4. Results obtained by using the random search strategy. (PDF 930 kb)
Additional file 4
Algorithm’s speed and performance analysis. Additional file with an analysis of the algorithm’s speed and performance. The file contains Figure A5, and Tables A3 and A4. (PDF 75.9 kb)
Additional file 5
Testing the reproducibility of the selected parameter values. Additional file containing Figure A6 that shows the parameter distributions. (PDF 215 kb)
Additional file 6
Fitting the model to different doses of IFN. Additional file with figure A7 and A8 that shows the fits of the model to flow cytometry data describing the stimulation of cells with 100U and 250U of IFN. (PDF 188 kb)
Additional file 7
Model dynamics without IFN stimulation. Additional file with Figure A9 that shows the time courses for the model in absence of IFN stimulation. (PDF 693 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Aguilera, L., Zimmer, C. & Kummer, U. A new efficient approach to fit stochastic models on the basis of highthroughput experimental data using a model of IRF7 gene expression as case study. BMC Syst Biol 11, 26 (2017). https://doi.org/10.1186/s1291801704064
Received:
Accepted:
Published:
Keywords
 IRF7
 IFN
 Stochastic models
 Parameter estimation