 Methodology article
 Open Access
 Published:
Principal process analysis of biological models
BMC Systems Biology volume 12, Article number: 68 (2018)
Abstract
Background
Understanding the dynamical behaviour of biological systems is challenged by their large number of components and interactions. While efforts have been made in this direction to reduce model complexity, they often prove insufficient to grasp which and when model processes play a crucial role. Answering these questions is fundamental to unravel the functioning of living organisms.
Results
We design a method for dealing with model complexity, based on the analysis of dynamical models by means of Principal Process Analysis. We apply the method to a wellknown model of circadian rhythms in mammals. The knowledge of the system trajectories allows us to decompose the system dynamics into processes that are active or inactive with respect to a certain threshold value. Process activities are graphically represented by Boolean and Dynamical Process Maps. We detect model processes that are always inactive, or inactive on some time interval. Eliminating these processes reduces the complex dynamics of the original model to the much simpler dynamics of the core processes, in a succession of submodels that are easier to analyse. We quantify by means of global relative errors the extent to which the simplified models reproduce the main features of the original system dynamics and apply global sensitivity analysis to test the influence of model parameters on the errors.
Conclusion
The results obtained prove the robustness of the method. The analysis of the submodel dynamics allows us to identify the source of circadian oscillations. We find that the negative feedback loop involving proteins PER, CRY, CLOCKBMAL1 is the main oscillator, in agreement with previous modelling and experimental studies. In conclusion, Principal Process Analysis is a simpletouse method, which constitutes an additional and useful tool for analysing the complex dynamical behaviour of biological systems.
Background
Mathematical modelling has been used for decades as an approach to understand the functioning of biological systems in terms of their internal processes and components. The latter form complex networks that vary in nature. For instance, biochemical networks include processes controlling the intracellular level of metabolites, RNAs and proteins, which allow cells to live and grow. A process either corresponds to a single biochemical reaction, for example protein phosphorylation, or encompasses many biochemical reactions like those involved in general cell functions (translation of proteins, transcription of RNAs...). In ecological networks, the processes can refer to events influencing the distribution and abundance of organisms, or to fluxes of energy and matter.
Numerous kinetic models of these networks have been developed in computational biology, of increasing complexity due to advances in modelling and parameter estimation approaches (see [1, 2] for an example). Complexity arises from the high dimension of the networks, the large number of biological processes involved and their non linearity due to the complex feedback loops that regulate them.
One approach often used to tackle the problem of complexity is model reduction (see [3] for a recent review). The simplified models are easier to analyse, while retaining the main features of the original ones and their biological significance. Briefly, methods of model reduction shorten the list of network species or of network reactions (e.g. [4, 5]), lump state variables (e.g. [6]) or decompose the system into slow and fast dynamics (e.g. [7–9]). The often used quasisteadystate approximation falls in the latter category (e.g. [10]). Other approaches simplify the mathematical functions describing the molecular processes. For instance, piecewise affine differential equations approximate by step functions the sigmoidal functions used to describe the regulation of gene expression. The dynamics of the simplified system can be easily analysed by means of state transition graphs [11]. However, these simplifications are generally restricted to models of gene expression and are more difficult to apply to other types of networks [12].
Reduction approaches have proven successful to significantly reduce model complexity, but they do not provide a mean to understand how the system dynamics emerges from the cascade of biological processes and regulatory mechanisms at work. This is especially true when the reduced models remain complex, with many coupled equations sharing common processes and involving complex feedback loops. For instance, regulatory mechanisms switch on certain biological processes at some times and off at others. It is thus important for a good understanding of the system behaviour to identify which and when processes significantly influence the system dynamics. In other words, instead of analysing a single reduced model in place of the original one, valid on the whole time interval, we may want to analyse series of simplified models highlighting the important processes of the original model during the periods of time in which they are active.
This is how we address the problem of high dimensional model analysis in this study. We develop a mathematical and numerical approach based on the boolean concept of activity/inactivity. The method, called Principal Process Analysis (PPA), determines the contribution of each biological process to the output of the dynamical system. In models of biological networks, these processes appear in a linear additive manner in each ODE. We first identify the inactive processes and neglect them. In a second step, we treat processes whose activity varies along time: we define time windows in which these processes are either always active or always inactive. We eventually create submodels for each time window that only contain the active processes. This procedure leads to the simplification of the system to its core mechanisms. The simplified system can be further studied, to understand the role of each active process in the system dynamics.
PPA is a general approach that can be easily applied to any biological system described by ordinary differential equations (ODEs). It shares common features with a model reduction method focusing on major model parameters rather than processes [4], in which parameters that are not required for the system behaviour are removed. Another approach dedicated to chemical reactions identifies and removes chemical species that contribute less to the model output [5]. In this case, the problem is solved using optimization approaches (see also [13]). Despite these similarities, PPA is not a model reduction approach. It provides a mean to access to and dissect the more complex dynamics of the original model through the analysis of simplified versions in given time windows. Results are easily interpretable and do not require additional and complicated computations.
Preliminary work on PPA has been described in an earlier conference paper [14], in which we applied PPA to two ODE models of biochemical networks whose simplification preserved their dynamical behaviour: the model of circadian rhythms in Drosophila [15] and the model of the regulation of the ERK signalling pathway [16]. Questions remained open though, concerning the scalability of the approach and its robustness: to which extent does PPA preserve model dynamics in systems of higher dimension, with many more biological processes involved and including interlocked feedback loops? And since the approach requires a priori knowledge of the parameter values, how sensitive are process activities or inactivities to the value of these parameters? In this study, we address these questions by studying a much more complex model of circadian rhythms in mammals, including 16 variables, 76 processes, and intertwined positive and negative feedback loops [17]. Parameter sensitivity analysis of the global relative error between the original and reduced systems allows us to assess the quality and robustness of our approach.
The paper is organized as follows. “Methods” section describes the principle of Principal Process Analysis as well as global sensitivity analysis. “Model description” section introduces the model of mammalian circadian clock. We apply our approach to this complex model in “Principal Process Analysis of the circadian clock model” to “Influence of parameter values” sections, before concluding in “Conclusions” section.
Methods
We summarize below the basics of the method of Principal Process Analysis. We will use as running example the 14^{th} variable of the mammalian circadian clock model analysed in “Model description” section (see also Appendix B. It describes how the concentration of the nuclear form of protein BMAL1 (B_{ N }=x_{14}) changes:
Principal Process Analysis (PPA)
Consider the following ODE model of biological network:
where \(x=\left (x_{1},x_{2},\ldots,x_{n}\right) \epsilon \mathbb {R}^{n}\) is the vector of component concentrations, \(x0=\left (x0_{1},x0_{2},\ldots,x0_{n}\right) \epsilon \mathbb {R}^{n}\) the vector of their initial values and p\(\epsilon \mathbb {R}^{b}\) the vector of parameters. Each equation is decomposed into a sum of biological processes:
where f_{ ij } represents the j^{th} process involved in the dynamical evolution of the i^{th} variable of the system over a period of time [ t_{0},T].
Example 1
Equation (1) includes seven processes, each associated with a specific biological function. They take a positive or negative value, depending on whether they affect positively or negatively the variation of BMAL1 concentration. The equation of the protein is rewritten as:
where \(f_{14,1}=  V_{3B} \frac {B_{N}}{K_{p}+B_{N}},...,f_{14,7}=  k_{dn} B_{N} \).
Figure 1a shows the dynamical evolution of processes f_{14,1} to f_{14,7} during a day. Nuclear import of BMAL1 is the fastest process of Eq. (1), while the basal degradation of the protein is the slowest.
Comparison criteria are needed to weigh the influence of the different processes f_{ ij } on the time evolution of each variable x_{ i }. There are several alternatives. For instance, we can compare their absolute value (f_{ ij }(x,p)), scale it by the i^{th} initial condition \(\left (\frac {f_{ij}(x(t),p)}{x_{0i}} \right)\), or scale it by the solution of the i^{th} ODE \(\left (\frac {f_{ij}(x(t),p)}{x(t)_{i}}\right)\). In this work we associate a relative weight with each process to make it dimensionless:
where 0≤W_{ ij }(t,p)≤ 1 and \(\sum _{j}W_{ij}(t,p)=1\).
Definition 1
Let the continuous function f_{ ij }(x(t),p) be the j^{th} process of \(\dot {x}_{i}(t)\) in t ε[t_{0},T] and let the threshold δε [0,1]. We call a process f_{ ij }(x(t),p) always inactive when W_{ ij }(t,p)<δ∀ t ε [0,T]. We call a process f_{ ij }(x(t),p) inactive at time t when W_{ ij }(t,p)<δ. We call a process f_{ ij }(x(t),p) active at time t when W_{ ij }(t,p)≥δ. Switching time for a process f_{ ij }(x(t),p) is the time \(t_{ij}^{s}\) at which W_{ ij }(t,p)=δ. A process can have 0,1,…,z switching times. The switching time set S_{ i } for the i^{th} variable contains all the switching times \(t_{ij}^{s}\) where j=1,..,k and s=1,…,z. The global switching time set S is the union of all S_{ i }.
The choice of δ is important, since it determines above which weight a process can be considered active or inactive and, as we will see it later, if the process should be kept or omitted in the simplified model. An excessively high value might lead to an oversimplified model, without many dynamical features of the original model. Conversely a very low value might result in a model insufficiently simplified, which remains too complicated to analyse. From our experience, a convenient value is δε [0, 0.1], where the value of δ can be adjusted to the number of processes. For instance, if an ODE contains numerous processes of similar value, each individual process weighs little. In this case, δ should not be chosen too high to avoid omitting all these processes; it can be inversely proportional to the total number N of processes in the equation: \(\delta \propto \frac {1}{N}\). In this paper, finetuning the threshold value is not justified: there are not many processes per equation and they have very different values. We will always take δ=0.1.
Example 2
We apply Eq. (5) to determine the dynamical weight of the seven processes in Eq. (1). Results are shown in Fig. 1b. As expected, the nuclear import, which is the fastest process, weighs more in the dynamical evolution of BMAL1 concentration, while the basal degradation of the protein weighs little. We determine the process activities using δ=0.1:

The weight of processes W_{14,2}, W_{14,6}, W_{14,7} is always below δ: their related processes f_{14,2}, f_{14,6}, f_{14,7} are thus always inactive;

The processes W_{14,1} and W_{14,3} are always above δ: f_{14,1} and f_{14,3} are active during the whole system dynamics;

The weight of processes W_{14,4} and W_{14,5} crosses the threshold twice and the switching times \(t_{14,4}^{1}=4.4\)h, \(t_{14,4}^{2}=20.7\)h, \(t_{14,5}^{1}=0.8\)h and \(t_{14,5}^{2}=20.3\)h are collected in the set S_{14}.
Visualization of process activities
Graphical tools turn out to be useful to analyse the dynamical weights of complex systems such as the mammalian circadian clock model. We use three of them in PPA, which are described below.

The Boolean Process Map summarizes qualitatively the knowledge of the process activity or inactivity along time for each variable. A black bar means that the process is active, while the white bar indicates an inactive process.

Example: The Boolean Process Map in Fig. 2a represents the process activities deduced from the dynamical weights in Fig. 1b. We observe that there is always an active phosphorylation of BMAL1 in the nucleus, while the basal degradation can be considered always inactive. The nuclear export is solely active in the first and last periods of time.

The Dynamical Process Map is a network representation of the process activities. Variables (represented by boxes) are connected by processes (arrows). Three cases arise, which depend on the activity of processes shared by several variables: blackcoloured arrows represent processes that are inactive for all variables involved, while active processes are displayed as red arrows. Yellow arrows are used for processes shared by several variables that have different activities: for instance, one process is considered active in one equation, but inactive in another one. Note that the model simplification by elimination of inactive processes, as will be described in “Model simplification by elimination of always inactive processes” section, will have for effect to remove black arrows in the Dynamical Process Map.

Example 3 Figure 2 b represents the Dynamical Process Map for x_{14}, the nuclear concentration of Bmal1, in the time interval between \(t^{1}_{14,4}\) and \(t^{1}_{14,5}\). Phosphorylation is an example of active process for the nuclear BMAL1 concentration (see the Boolean Process Map in Panel A). It is shown in red because it is also considered active at the same moment for the other variable sharing this process, the concentration of phosphorylated BMAL1.

The 3D Process Map represents the timedependent evolution of the intensity of each process. Process activities are averaged per hour, which leads to a discretisation of time. Vertical bars represent process weights for each hour. Their color code represents the intensity of process weights relatively to the other weights.

Example 4 Figure 2 c describes the 3D Process Map for the concentration of nuclear BMAL1. The phosphorylation of the protein, its nuclear import and its consumption for the formation of a large complex are the processes the most active over time.
Model simplification by elimination of always inactive processes
Eliminating processes that play a minor role in the system dynamics facilitates the analysis of large models. Since in the previous steps of PPA we have determined the process activities in system (2), we now neglect processes that are considered always inactive. This will give us g(x^{r}), the function approximating f(x) in (2) with less processes.
We thus introduce the ODE system (6), which approximates system (2):
where \(x^{r}=\left (x_{1}^{r},x_{2}^{r},\ldots,x_{n}^{r}\right) \epsilon \mathbb {R}^{n}\) is the vector of component concentrations, \(x0=\left (x0_{1},x0_{2},\ldots,x0_{n}\right) \epsilon \mathbb {R}^{n}\) the vector of their initial values, and \(p^{r} \epsilon \mathbb {R}^{c}\), where c≤b is the vector of parameters. The model simplification approach relies basically on the following theorem: if the vector fields of two systems are close (f(x)≈g(x)), then the solutions of the original and approximated systems are close during some time interval under the assumptions on the Lipschitz conditions listed in [18, p. 79, Th. 2.5].
Based on the dynamical weights determined in “Principal Process Analysis (PPA)” section and the threshold value δ, we apply the following rule to define g(x^{r},p^{r}): if W_{ ij }(x(t),p) < δ∀ t ε [ t_{0},T]theng_{ ij } = 0; if not, g_{ ij }≡f_{ ij }.
We thus define x^{r} as an approximation of x and p^{r} as a subset of p.
Example 5
We proceed to the simplification of processes in Eq. 1. Because f_{14,2}, f_{14,6}, f_{14,7} are always inactive, g_{14,2}=0, g_{14,6}=0, g_{14,7}=0 and g_{14,1}≡f_{14,1}, g_{14,3}≡f_{14,3}, g_{14,4}≡f_{14,4}, g_{14,5}≡f_{14,5}. The resulting ODE for \(x^{r}_{14}\) is:
Note that Principal Process Analysis is applied to each ODE separately. As a consequence, processes shared by two equations can be active in one equation, but inactive in the other. Elimination of the inactive processes breaks mass balance in the simplified model. For our purpose, this is not an issue: the simplification does not aim at reducing the model, but rather analysing a submodel of the original one, which describes the dynamics of the important phenomena.
It is interesting to quantify the extent to which the simplified system (6) preserves the behaviour of the original one. This gives a better sense of how the active processes kept in the simplified model are responsible for the dynamics of the original system. In addition, this helps identifying potential problems related to the model simplification, for instance involving a wrong choice of the δ value. One can also imagine pathological cases, when the simplified system does not reproduce the main dynamical features of the original model: for instance, if it evolves towards a different basin of attraction or if the removal of a consumption term does not compensate a synthesis term any more, leading the simplified system to explode in finite time. It is non nonsensical in all these cases to analyse simplified models that behave so differently from the original ones. The δ threshold should be adjusted to a new value and Principle Process Analysis rerun until model simplification proves satisfactory according to the criteria described below.
We present in Appendix A an a priori analysis of the error made when removing some inactive processes. This analysis gives a theoretical, but very conservative, bound on the error. In practice, we numerically compute the global relative error between the original and simplified models. Several forms of error are possible. We have chosen the following one, analysed over a period of time [ t_{0},T], in which y_{ h } and \(y^{r}_{h}\) are the h^{th} outputs of the original and simplified systems, respectively:
How to choose the model outputs? They can correspond to all model variables or combinations of them, if the latter are involved in some biological phenomena of interest for instance. In the case of the circadian clock model, six variables were specifically studied in the original papers [17, 19], which we will use as outputs to determine the global relative error between the original and simplified models: the concentrations of Per mRNA (M_{ P }), Cry mRNA (M_{ C }), Bmal1 mRNA (M_{ B }), total PER protein (P_{ Tot }), total CRY protein (C_{ Tot }) and total BMAL1 protein (B_{ Tot })^{Footnote 1}.
Creation of sequences of submodels
In the previous step of PPA, the models are simplified by elimination of always inactive processes. Here we go one step further in the simplification, by eliminating processes that are inactive at times. This is achieved by decomposing the period of time during which the system evolves into time intervals. To that end we use the switching times t_{ b } (with b=1,…,d) determined in “Principal Process Analysis (PPA)” section: this allows creating a succession of submodels for each time interval, which contain the core mechanisms in that period of time.
To avoid creating large sequences of submodels, we reduce the number of time windows by grouping proximal switching times with the easytocompute kmeans clustering [20]. Hence the dswitching times included in the global switching time set S=[t_{1},t_{2},...,t_{ d }] are grouped into z (≤d) clusters C= {C_{1},C_{2},...,C_{ z }}, so as to minimize the withincluster sum of square (or withincluster inertia):
where μ_{ v } is the mean of the switching times in C_{ v }. The consequence is that processes with switching times belonging to cluster C_{ v } are assumed to switch together at the same time \(t^{r}_{v}=\mu _{v}\), the mean switching time in cluster C_{ v }.
How to define the right number of clusters? A too large number of clusters will result in a low error, but also in numerous time windows that make the simplified models still too complex to analyse. Equation (10) describes how to take into account this tradeoff between the number z of clusters and the error. It is related to the difference between the maximum and the minimum number of active processes during the temporal evolution of the system: if this difference is low, z should be chosen low as well. We thus define z, rounded to the nearest number, as:
where \(n^{v}_{act}\) denotes the number of active processes in the v^{th} time window.
We eventually end up with a sequence of z+1 submodels in the time interval [ 0,T], the first one being valid in \(\left [0,t^{r}_{1}\right ]\) and the last one, in \(\left [t^{r}_{z},T\right ]\).
Similarly to the global errors determined in “Model simplification by elimination of always inactive processes” section, we can also assess how the newly simplified models reproduce the dynamical behaviour of the original model in each time window \(\left [t^{r}_{v1},t^{r}_{v}\right ]\), by measuring the error:
We compute the error (11) between the original model and each submodel, with or without propagating errors: in the first case, for each time window \(\left [t^{r}_{v1},t^{r}_{v}\right ]\) (v=1,…,z+1 with \(t^{r}_{0}=t_{0}\) and \(t^{r}_{z+1}=T\)), the initial values of the h outputs of submodel SM_{ v } are equal to the final values at \(t^{r}_{v1}\) of submodel SM_{v−1}; in the second case, they are equal to the values of the original model at \(t^{r}_{v1}\).
Global sensitivity analysis
Principal Process Analysis is applied to models with given parameter values and initial conditions. It may be questioned whether the uncertainty of their values influences the simplification of the model and thus, the analysis of the system dynamics. While we have shown PPA to be robust to variations of initial conditions in [21], the question remains open for parameter values.
To that aim, we perform global sensitivity analyses on the global relative errors between the original model and the reduced model (defined in Eq. (11)). Such an analysis consists in quantifying the parameter influence on the error, while varying the parameters simultaneously in given ranges. In contrast, in a local sensitivity analysis, parameters would vary oneatatime in the neighbourhood of their nominal value. First, we perform an analysis on each of the six errors defined for the six model outputs \( \left (e^{v}_{M_{P}},e^{v}_{M_{C}},e^{v}_{M_{B}},e_{P_{Tot}},e^{v}_{C_{Tot}},e^{v}_{B_{Tot}}\right)\). Then, in a more detailed analysis, we compute the global relative error for each state variable, according to Eq.(11) (with y_{ h }=x_{ i },i=1,…,16); sensitivity analyses are also performed on each of these 16 errors. The method used is based on factorial design [22], analysis of variance (ANOVA) and principal component analysis (PCA) [23].
We first explore the parameter space using a factorial design. We vary N_{ f }=51 parameters of the model [17] (see “Model description” section). We choose N_{ l }=2 levels for each parameter p_{ f } (or factor): \(p^{}_{f}= 0.8\: p_{f}\) and \(p^{+}_{f}=1.2\: p_{f}\). A full factorial design, defined as all possible combinations of the parameter levels, would be necessary to estimate the main effects and interactions of all parameters. Such a full design corresponds to \(N_{l}^{N_{f}}=2^{51}\) parameter combinations and would necessitate the same number of model simulations to compute the corresponding outputs, which are far too many. Thus we implement a fractional factorial design [24], which is a subset (fraction) of the full design of size \(N_{j}<N_{l}^{N_{f}}\). The design is determined according to a given statistical model linking the error e_{ h } to the parameters p_{ f }, for each time window \(\left [t^{r}_{v1},t^{r}_{v}\right ]\). We choose a second order linear model, which incorporates all main effects and twoway interactions as follows:
where \(e^{v}_{h,j}\) is the error computed according to Eq. (8) for output (or state variable) h, time window v, and parameter combination j (j=1,…,N_{ j }) of the fractional factorial design; \(\mu ^{v}_{h}\) is the grand mean; \(\alpha ^{v}_{h,f(j)}\) is the main effect of parameter p_{ f } for parameter combination j; \(\beta ^{v}_{h,f(j)k(j)}\) is the interaction effect between parameters p_{ f } and p_{ k } (k≠f) for parameter combination j; and \(\epsilon ^{v}_{h,j}\) is the residual. Each main effect \(\alpha ^{v}_{h,f(j)}\) can take two values, according to the level of parameter p_{ f } in combination j: \(\alpha ^{v}_{h,f^{+}}\) or \(\alpha ^{v}_{h,f^{}}\). Similarly, each twoway interaction effect can take four values: \(\beta ^{v}_{h,f^{+}k^{+}},\: \beta ^{v}_{h,f^{+}k^{}},\: \beta ^{v}_{h,f^{}k^{+}},\: \beta ^{v}_{h,f^{}k^{}}\). The fractional factorial design determines the parameter combinations needed to estimate all main effects and twoway interactions. It is obtained using R package planor^{Footnote 2} and consists of N_{ j }=2^{12} parameter combinations, yielding as many simulations. According to the sparsityofeffects principle, a system is usually dominated by main effects and low order interactions, so neglecting thirdorder and higher interactions can still provide good estimates.
An ANOVA is then performed on these simulations, for each error e_{ h }. It consists in estimating the grand mean, main effects and interaction terms of model (12), using a leastsquare criterion to minimise the residuals. It is based on the following variance decomposition:
where \(\overline {e^{v}_{h}}\) is the mean error computed over all N_{ j } simulations of the fractional factorial design; and \(\widehat {e^{v}_{h,j}}=\widehat {\mu ^{v}_{h}}+\sum _{f}\widehat {\alpha ^{v}_{h,f(j)}}+\sum _{f}\sum _{k\neq f}\widehat {\beta ^{v}_{h,f(j)k(j)}}\) (\(\widehat {~}\) denoting an estimated value) is the error estimated from the linear model (12) for parameter combination j. The total sum of squares \(SS_{T}^{h,v}\) is split into the sum of squares attributed to the model \(SS_{M}^{h,v}\) and the residual sum of squares \(SS_{r}^{h,v}\), the latter corresponding to the criterion that is minimised. In turn, \(SS_{M}^{h,v}\) is split into sum of squares attributed to each main effect \(\alpha ^{v}_{h,f}\) and twoway interaction term \(\beta ^{v}_{h,fk}\), denoted respectively \(SS_{f}^{h,v}\) and \(SS_{fk}^{h,v}\). The total sensitivity index of parameter p_{ f } is then defined as follows:
Noting that the variance of error \(e^{v}_{h}\) computed over all N_{ j } simulations of the fractional factorial design is \(\sigma ^{2}_{e^{v}_{h}}=\frac {1}{N_{j}1}SS_{T}^{h,v}\), the total sensitivity index \(tSI_{f}^{h,v}\) represents the fraction of the variance explained by parameter p_{ f }. As an ANOVA requires a scalar variable, separate sensitivity analyses are performed for each scalar error \(e^{v}_{h}\) and separate indices are computed for each error \(e^{v}_{h}\). To compare the parameter influence on the different errors \(e^{v}_{h}\), we use non normalised indices, obtained by multiplying each \(tSI_{f}^{h,v}\) by the variance of the error:
To obtain synthetic sensitivity indices that represent the influence of each parameter on the errors for all 16 state variables, we decompose the error vector (e_{ h }:h=1…,16) by PCA (without normalisation). As a result, an inertia proportion ω_{ c } can be attributed to each component c (a component is a linear combination of the 16 errors e_{ h }). It represents the variability among all simulations carried by this component. Only the N_{ c } first components whose cumulated inertia add up to 95% or more are retained. Moreover, each simulation is given a score on each component, a scalar representing the projection of the simulation on the component. Then, for each component retained, an ANOVA is performed on the scores and total sensitivity indices \(tSI^{c}_{f}\) are computed, as described in Eq. (14). Finally, a total generalised sensitivity index is calculated for each parameter p_{ f } as the sum of the total sensitivity indices on each component, weighted by the inertia of the component:
We use the multisensi R package^{Footnote 3} for this analysis.
In what follows, we show how Principal Process Analysis can help with the analysis of complex biological models. We apply the approach to a model of the circadian clock developed in [17, 19], which we describe in the following section.
Results
Model description
Periodic fluctuations of the environment subject living organisms to biological rhythms. The latter are endogenous by nature, but entrained by environmental variations. For instance, circadian rhythms are generated by a molecular clock within cells, which synchronizes daily physiological variations to the daynight alternance. The model we study here describes the circadian clock in mammals [17, 19]. In this model, the clock forms a complicated network of intertwined positive and negative feedback loops involving four clock genes: Per, Cry, Bmal1, and Clock. Their mRNA and protein produce sustained oscillations with a period of 24 hours. Light affects expression of gene Per at the transcriptional level: the first twelve hours of day light increase its transcription rate (up to 1.8 [ μM/h]), while it is lowered in the next twelve hours of darkness (down to 1.5 [ μM/h]). The system functions as follows (for the complete schema, see Fig. 3):

Transcription of genes Per, Cry and Bmal1 occurs in the nucleus. The newly synthesized mRNAs are exported to the cytosol.

In the cytosol, the mRNAs can be either degraded or translated into proteins, which ones are subsequently phosphorylated (the process is reversible). Unphosphorylated proteins PER and CRY form the complex PERCRY, which reversibly enters the nucleus. The nuclear and cytosolic forms of the complex can be phosphorylated. Likewise, protein BMAL1 is reversibly phosphorylated and reversibly enters the nucleus, but sole its unphosphorylated form makes a complex with protein CLOCK. Phosphorylated proteins and complexes in the nucleus or the cytosol are subject to degradation.

In the nucleus, the complex CLOCKBMAL1 activates the transcription of Per and Cry genes. Activation is stopped by binding of the PERCRY complex to CLOCKBMAL1, which indirectly inhibits Per and Cry transcription.

The concentration of CLOCK protein is not a variable in the model because it is constitutively expressed at high levels and considered to be not limiting [17].
The 16 model equations, 56 parameter and 16 initial condition values are shown in Appendix B. The model dynamics is difficult to analyse though, as the circadian clock involves numerous processes, including interlocked positive and negative feedback loops responsible for the oscillatory behaviour of the clock proteins. Reducing the original model around its core active processes can facilitate the model analysis, without changing significantly the original dynamics, in particular the sustained oscillations of the solutions.
Principal Process Analysis of the circadian clock model
We apply Principal Process Analysis to identify major processes of the circadian clock model. To this end we decompose each ordinary differential equation in processes, as shown in Eq. (4) for BMAL1. Each process has a biological interpretation and corresponds to a regulatory mechanism or a biochemical reaction.
We then calculate the relative weight of each process using Eq. (5) and the threshold value δ=0.1.
We collect the switching times (values given in Appendix C) and then build a Boolean Process Map to visualize the activity/inactivity of each process, shown in Fig. 4. We simplify the model by neglecting 24 out of 76 processes, which are always inactive (32% of all processes). They correspond to mRNA and protein basal degradations; cytosolic dephosphorylations of CRY, BMAL1, and PERCRY; PERCRYCLOCKBMAL1 dissociation in the nucleus; and BMAL1 dephosphorylation in the nucleus. The list of neglected processes is shown in Appendix D.
We now determine the global relative errors between the original and the simplified model using Eq. (8) for all six outputs (see Table 1). The dynamics of the two models are compared in Fig. 7a. The simplified model preserves qualitatively the trend of the original solutions, as well as their sustained oscillations. The most noticeable difference concerns the peak of the total concentration of protein PER (P_{ Tot }), which corresponds also to the highest error in Table 1 (26.48%): the peak is lower with the reduced model, which also explains the delay between the original and the simplified solutions. Nevertheless the simplified model reproduces qualitatively the oscillatory behaviour of protein PER observed in the original model. The concentrations in the original and simplified models peak at almost the same time. These global relative errors do not call for an adjustment of the threshold value δ. In the next section, we proceed to the second step of the Principal Process Analysis.
Creation of submodels
The simplified model obtained above can be further reduced if we also neglect processes that are sometimes inactive during the system dynamics. Based on the Boolean Process Map and the collected switching times, we identify between 38 and 45 active processes along time (Fig. 5) and a total of 46 switching times (see Fig. 6a). Clustering the switching times into 4 clusters (Fig. 6b) allows us to generate the five submodels described below. The number of clusters has been chosen according to Eq. (10).

SM1, valid from \(t_{0}^{r}=0\) to \(t_{1}^{r}=0.9\) h: neglected processes for this model are always inactive (32% of the total). This model corresponds to the simplified model obtained in “Principal Process Analysis of the circadian clock model” section.

SM2, from \(t_{1}^{r}=0.9\) h to \(t_{2}^{r}=6\) h: 46% of the processes are neglected. In addition to the always inactive listed in “Principal Process Analysis of the circadian clock model” section, we have the following inactive processes in this model: cytosolic dephosphorylation of PER, CRY, and PERCRY; cytosolic dissociation of PERCRY; nuclear dephosphorylation of PERCRY; PERCRY export from the nucleus; and formation of the large complex PERCRYCLOCKBMAL1.

SM3, from \(t_{2}^{r}=6\) h to \(t_{3}^{r}=12.5\) h, in which 50% of processes are neglected. In addition to the processes listed in “Principal Process Analysis of the circadian clock model” section, inactive processes are in this case: transcription of Per and Cry mRNAs; cytosolic phosphorylations and dephosphorylations of PER and CRY; cytosolic dephosphorylation of PERCRY; nuclear phosphorylation and dephosphorylation of PERCRY; and nuclear export of BMAL1.

SM4, from \(t_{3}^{r}=12.5\) h to \(t_{4}^{r}=20\) h, which neglects 42% of processes. The processes include the processes listed in “Principal Process Analysis of the circadian clock model” section, as well as: PER and CRY translation; formation of the PERCRY complex in the cytosol; PERCRY dephosphorylation in the cytosol and the nucleus; and export of BMAL1 from the nucleus.

SM5, from \(t_{4}^{r}=20\) h to \(t_{5}^{r}= 24\) h, in which 46% of the processes are neglected. With the processes listed in “Principal Process Analysis of the circadian clock model” section, other neglected processes are: cytosolic dephosphorylation of PER and CRY; PERCRY dissociation in the cytosol; export of PERCRY; PERCRY dephosphorylation both in the cytosol and the nucleus; and PERCRYCLOCKBMAL1 formation.
See also Appendix D for the list of neglected processes in each submodel.
Table 2 gives the global relative errors (11) without propagation error, between the original model and the submodels for the six outputs and for each time window. Figure 7b illustrates the six model outputs for the original model and the submodels without propagation errors, while Fig. 7c compares the coupled submodels with and without propagation error.The simplified models preserve the oscillatory behaviour of the total concentrations of PER, CRY, and BMAL1, albeit with some discrepancies in the amplitude of the oscillations. It is in the third time window that the approximated solution differs the most from the original one (Table 2). This is visible in Fig. 7b in the third time window where the total concentrations of PER and CRY form a much higher peak in the solution of the simplified model. Recall that this error is not an issue, since our objective is primarily the qualitative analysis of the model. It is sufficient that the remaining processes in the simplified model produce a dynamical behaviour qualitatively similar and relatively close to the original model. This shows their important contribution to the system dynamics.
Applying a Dynamical Process Map to the third submodel (Fig. 8; see also “Visualization of process activities” section) shows that the transcription of Per and Cry genes is inactive (black arrow) and that both PER and CRY phosphorylations in the cytosol and in the nucleus are not entirely active (they are not active for all the variables in which they are involved, yellow arrow). In the other time windows these processes are always entirely active (red arrows). This probably explains why we had an higher error in Table 2 for the variable M_{ P }, M_{ C }, P_{ Tot } and C_{ Tot } in SM3. The global sensitivity analysis, presented in the next Section, will confirm the validity of this assumption.
Since the dynamics of the coupled submodels remain close to the original one, we can further analyse the behaviour of the network simplified to its core processes. We use the Dynamical Process Maps for the different submodels (Appendix E), together with the process activities in Fig. 4 and the model outputs in Fig. 7. The simplified models preserve the three main interlocked feedback loops described in the original model, one positive and two negative loops. The functioning of these loops is directly affected by changes of process activities. Among the two negative feedback loops, which one is the main oscillator? One negative feedback loop involves the inhibition of Bmal1 transcription by the nuclear form of BMAL1 associated to the protein CLOCK. If this mechanism is the main source of oscillations, we should observe wide changes in process activities controlling BMAL1 levels. The total concentration of the protein does not vary much in amplitude (Fig. 7). It mainly decreases in SM2 and SM3, when the concentration of PERCRY is also high and forms a complex with CLOCKBMAL1, which is subsequently degraded. This degradation process is active most of the time (Fig. 4 and Appendix E), but variations of the total BMAL1 concentration do not modify strongly the transcription of Bmal1 mRNA, which remains always active. As well, the other processes of translation, phosphorylation and degradation for this variable almost never switch between inactive and active states over time (Fig. 4 and Appendix E). Overall, this suggests that the negative feedback loop involving CLOCKBMAL1 is not the main oscillator. A similar conclusion was drawn for the original model in [17].
The other negative feedback loop inhibits Per and Cry transcription through the titration of CLOCKBMAL1 by PERCRY to form the inhibitory complex PERCRYCLOCKBMAL1. The total concentration of BMAL1 peaks before that of PER and CRY, as can be seen in Fig. 7 for SM2 and SM3. When its concentration is maximal in SM1 and SM2, the nuclear form of the protein associated to the protein CLOCK stimulates the transcription of Per and Cry genes, in conditions where light has also a stimulatory effect on the transcription of these two genes. The processes of transcription and translation of Per and Cry are active in both models, as a result of which levels of PER and CRY raise to reach their maximal concentration in SM3. As can be seen from the process activities in Fig. 4 and the Dynamical Process Maps in Appendix E, conditions are favourable for the accumulation of high levels of complexes PERCRY and CLOCKBMAL1PERCRY in the nucleus. For instance, numerous processes decreasing PER, CRY and PERCRY concentrations in the cytosol and the nucleus are inactive: their phosphorylation is reduced (the process is inactive for the dephosphorylated forms but still active for the phosphorylated ones), which limits their degradation, and the nuclear import of PERCRY is always active. During the same period of time, the formation of the large complex CLOCKBMAL1PERCRY, which is active for both CLOCKBMAL1 and PERCRY (Fig. 4 and Appendix E), suggests that the nuclear forms of PERCRY and CLOCKBMAL1 bind as soon as they accumulate in the nucleus. The large complex is immediately degraded since its degradation process is always active and its dissociation, always inactive.
In SM2 and SM3, the degradation of the large complex is not compensated for by other mechanisms allowing BMAL1 accumulation in the nucleus: the cytosolic form of the protein is actively phosphorylated and then degraded, while its dephosphorylation is inactive, which reduces the quantity of protein to be imported in the nucleus (see Fig. 4 and the Dynamical Process Maps in Appendix E). In this compartment, the absence of active dephosphorylation, together with the active protein phosphorylation, also contribute to decrease pools of CLOCKBMAL1 complexes (Fig. 4, Appendix E). This halts transcription of Per and Cry mRNAs in SM3 (the processes are inactive and light is also switched off towards the end of SM3). This also affects the translation of PER and CRY, which becomes inactive in SM4. Altogether these observations suggest that the negative feedback loop inhibiting Per and Cry transcription via the complex CLOCKBMAL1PERCRY is the main source of circadian oscillations. This is consistent with conclusions in [17], where a second oscillator based on the autoinhibition of BMAL1 has been obtained for specific parameter values only. These results are also consistent with the observation of arrhythmic behaviours in mutant mice with double knockout of the Per and Cry genes [25, 26].
The positive feedback loop activates Per and Cry transcription through a control of protein stability mediated by the phosphorylation processes. In the model, sole the phosphorylated forms of the proteins are degraded. We observed that the reversible phosphorylation reactions are often displaced in the forward sense, as dephosphorylation processes are often found inactive. In particular, they contribute to decrease the concentration of PER, CRY and PERCRY, which also diminishes the concentration of the large complex CLOCKBMAL1PERCRY and thus relieves the inhibition exerted by the complex on transcription of Per and Cry genes. Kinetic modelling of the circadian clock in Drosophila has shown the importance of this positive feedback loop for circadian rhythms [27].
Influence of parameter values
In order to check the robustness of the five submodels, we perform a global sensitivity analysis on the output errors for each time window \(\left (e^{v}_{h}\right)\). We perform the analysis without propagating the errors because each submodel is valid for a specific time window, independently from the other time windows. We vary 51 among the 56 parameters of the model: the Hill coefficients m and n are kept fixed because they represent the degree of cooperativity in gene repression/activation, while k_{ stot }, v_{ stot }, V_{ phos } are function of other parameters (see Appendix B). We hence compute the non normalised total sensitivity indices for all parameters according to Eq. (15) (see Fig. 9, first column). Because the last three outputs (P_{ Tot },C_{ Tot },B_{ Tot }) are the sum of model variables that interact, some processes have no impact on these outputs and the information on the parameter influence is lost. We also perform the global sensitivity analysis on the 16 global relative errors between the original model and the submodel variables without propagating errors (see Fig. 9, second column). The complex PERCRY plays an important role in every time window: its variability is due mostly to its maximal phosphorylation velocity (V_{1PC}) and its degradation parameter (v_{ dPCC }). In the third and fourth time window the other important variation is due to the CRY protein: in SM3 the variation is mostly due to the binding constants in the transcription of Per and Cry mRNAs (K_{ AP } and K_{ AC }) and in SM4, to the maximal translation rate of BMAL1 (k_{ sB }) that stimulates Per and Cry mRNA transcription. In the last time window, lots of variables contribute to the system variation: the most important parameter for the variability of the outputs is the maximal velocity of BMAL1 phosphorylation in the nucleus (V_{3B}).
To get a more global view of the model simplification, we calculate, for each parameter combination and for each time window, the average error (averaged over the 16 variables) between the original model and the submodel variables as follows:
Results are shown in Fig. 10. The variability is higher in the third and four submodel, although the difference between the lower and upper quartiles is low in all submodels.
Then, for each timewindow, we compute the total generalised sensitivity indices according to Eq. (16), which represents the fraction of error variability explained by each parameter when parameter values vary. The results are shown in Fig. 11: we obtain similar results to the ones in Fig. 9 (column 2): in SM1 and SM2, the maximal phosphorylation velocity (V_{1PC}) and degradation (v_{ dPCC }) of PERCRY complex play the main role; in SM3, the binding constants of Per and Cry proteins (K_{ AP } and K_{ AC }); in SM4, the translation of BMAL1 protein (k_{ sB }) and in SM5, the maximal phosphorylation velocity of BMAL1 protein in the nucleus (V_{3B}). In order to check whether the error variations between the original model and the submodels are due to parameters appearing in neglected processes, we determine the following ratio: \(R^{v}_{h}=\frac {\sum \limits _{f\in \{ inactive~processes \}}tGSI_{f}^{h,v}}{\sum \limits _{f}tGSI_{f}^{h,v}}\). We only use the 10 most informative parameters, with higher tGSI, as they explain most variability. We choose a conservative option: if a parameter is neglected in an inactive process but still appears in other active processes, we still consider that it belongs to the neglected process parameters (worst case). Results are shown in Table 3. In most time windows, the variability is mainly due to parameters still contained in the reduced submodels, i.e. the parameters of the active processes. In the third timewindow, however, parameters appearing in neglected processes generate more than 50% of the variability. It is consistent with Fig. 7b: the peaks of the total concentration of PER and CRY are overestimated by the submodel and some of the most important parameters that lead to the output variability for this time window are the translation rate of PER and CRY proteins, the maximal phosphorylation velocity of PERCRY complex in the cytosol and nucleus (as it has been shown in Fig. 11). This confirms what we have supposed when applying the Dynamical Process Map to SM3 (see discussion about Fig. 8 at the end of “Creation of submodels” section).
Discussion
A challenging task when analysing the dynamics of biological networks is to understand the relation between the network behaviour and its numerous processes, the activity of which is switched on and off by regulatory mechanisms. Model reduction is one possible approach to deal with model complexity and help deciphering the design principle of these networks. However, the reduced models can be still too complex to study and this does not answer the question on the role of each individual process in the network behaviour. Ideally, one would like to identify the major processes, quantify and then understand their contribution to the system dynamics. Principal Process Analysis was developed with this objective in mind, and with the final goal of simplifying the original model in one or several submodels around core active processes that are responsible for the dynamics of the original system. The dynamics of the core processes is much more tractable in the submodels than in the original one. Questions remained open though concerning the scalability and robustness of this approach.
In this paper we tested the scalability of Principal Process Analysis by applying the approach on a model of high dimension, the mammalian circadian clock model, which incorporates numerous processes and complex interlocked feedback loops responsible for oscillatory behaviours. Simplification of the original system dynamics to as much as 50% of its processes in five coupled submodels helped us relate the dynamics of the simplified models to the system components and their active interactions. We hence observed that the negative feedback loop controlling Per and Cry transcription through the formation of the large complex PERCRYCLOCKBMAL1 is the main oscillator, in agreement with previous experimental and modelling studies [17, 25, 26]. Principal Process Analysis has been also applied with success to diverse biogeochemical and biochemical models in our group and elsewhere, see [14, 21, 28, 29]. These case studies and the present one exemplify the applicability and scalability of the approach to models of diverse nature and and complexity.
In this paper, the quantification of global errors allowed us to conclude that the simplified models reproduce well the behaviour of the original ones. Even in the case of the largest errors observed on the model output, did the simplified models preserve the oscillations of the clock proteins. Since Principal Process Analysis is based on the a priori knowledge of the model parameters, it was important to assess the robustness of the approach to uncertainties on these parameter values. Through a global sensitivity analysis, we studied the impact of of parameter values on the error between the original model and the simplified submodels. Not only was the variation of the error small, but it was mostly due to parameters of the neglected processes. With this analysis, we proved the robustness of PPA to parameter uncertainty. In addition, we provided clues to identify and solve potential troubles related to the model simplification, in order to decrease errors between the original model and the simplified submodels.
In a recent study, we also showed by other methods the robustness of PPA to initial conditions [21]. The latter were supposed to lie in rectangles contained in a region of the variable space varying by one order of magnitude in each coordinate. Under additional assumptions on the monotonicity of biological processes within rectangles, the maximal bound of process weights was computed, which allowed identifying active processes in each rectangle, similarly to “Principal Process Analysis (PPA)” section, for which weight is above the threshold value δ. Based on the behaviour of processes on the edges of the rectangles, it was then possible to determine the transitions between rectangles and deduce the evolution of process activities along the different transitions. The method has been applied on a small gene expression network containing a negative feedback loop [21]. The same principles could be applied to show the robustness of the model to larger variations of its parameters. In this case, the parameter space should be divided in rectangles in which the activity/inactivity of processes is studied. Such extension of the method is part of a future work.
In the current state of development, Principal Process Analysis is not a model reduction approach. For instance, the elimination of the inactive processes from the original model breaks down the mass conservation relations when eliminating a process in one equation that is considered active in others. As long as the purpose of PPA is to analyse the important processes in the original model, this is not an issue. Nevertheless, the approach could be extended so as to preserve mass conservation relations. In addition, simplified models with much smaller global relative errors could be obtained, so that the simplified submodels represent more accurately the original model. We are currently studying a refinement of PPA by considering three different levels of activities (inactive, active, fully active), defined by two different thresholds in order to improve the quality of the model simplification and model analysis. Such improvements could bring PPA closer to a reduction method, since the simplified models become accurate representations of the original model.
Conclusions
Mathematical models of biological systems have grown in complexity to include large numbers of processes. As a consequence, their contribution to the system dynamics becomes hardly tractable. The current manuscript contributes to this problem with the development of Principal Process Analysis. Provided the ODE model of the system is composed of a linear sum of terms describing each a process, the method enables the identification of the major processes contributing to the system dynamics and when they play a key role. Removing inactive processes allows restricting the dynamical analysis of the system to its core processes and facilitates the understanding of the system functioning. The conclusions derived with the method are robust to fluctuations in the parameter values. As such, Principal Process Analysis can be applied to any type of ODE models with the same form.
Appendix A: Estimate of errors
We give in this appendix a rough estimate of the a priori error, based on bounds in the model and simple lemmas to compare the solutions of two differential equations. We refer to [18, Chapter 3], for the basic notions.
Consider the following ODE model of biological network, as given in Eq. (2):
Variable x is supposed to live in a bounded domain D of \(\mathbb {R}^{n}\), and all functions in f are supposed to be smooth enough (at least C^{2}) and Lipshitz on D. We denote by L the Lipschitz constant of f(x,p) on D. The decomposition into processes gives:
where f_{ ij } represents the j^{th} process involved in the dynamical evolution of the i^{th} variable of the system over a period of time [ t_{0},T], and n_{ i } is the number of processes for \(\dot {x}_{i}\).
The weights are computed during [ t_{0},T]. For the sake of simplicity, we suppose that, for each variable, the weight of the first process and only this weight is lower than threshold δ:
It means that processes f_{i1},i=1,…,n are inactive during period [ t_{0},T], and thus eliminated from the system, giving the new simpler system (the new variable is denoted by y for simplicity):
As variables are assumed to be bounded, vector f_{1}=(f_{11},…,f_{1n})^{t} is such that:
where B_{ i } is an upper bound for \(\sum _{j=1}^{n_{i}}f_{ij}(x(t),p)\) obtained from the variable bounds on domain D. All B_{ i } form vector B.
Therefore:
If the initial conditions are the same (x(t_{0})=y(t_{0})), then Theorem 3.4 in [18], which is based on Gronwall’s Lemma, gives a bound between the two solutions x and y:
The same proof applies when several f_{ ij } are inactive for some variables. One just need to sum the errors in Eq. (22).
This gives a rough bound between the two solutions; this bound is theoretical and conservative, and is not used in the practical a posteriori computation of the error in our work. Nevertheless, it shows that the error is roughly proportional to the threshold δ used in the weight computations.
Appendix B: Full mammalian model
Model equations
Equations listed in [17, 19].mRNAs of Per gene
mRNAs of Cry gene
mRNAs of Bmal1 gene
Nonphosphorylated PER protein in the cytosol
Nonphosphorylated CRY protein in the cytosol
Phosphorylated PER protein in the cytosol
Phosphorylated CRY protein in the cytosol
Nonphosphorylated PERCRY complex in the cytosol
Nonphosphorylated PERCRY complex in the nucleus
Phosphorylated PERCRY complex in the cytosol
Phosphorylated PERCRY complex in the nucleus
Nonphosphorylated BMAL1 protein in the cytosol
Phosphorylated BMAL1 protein in the cytosol
Nonphosphorylated BMAL1 protein in the nucleus
Phosphorylated BMAL1 protein in the nucleus
Inactive complex between PERCRY and CLOCKBMAL1 in the nucleus
Model parameters
Parameters listed in [17, p.546]: Set 1.
k_{1}(h^{−1})=0.4, k_{2}(h^{−1})=0.2, k_{3}(nM^{−1}h^{−1})=0.4, k_{4}(h^{−1})=0.2, k_{5}(h^{−1})=0.4, k_{6}(h^{−1})=0.2, k_{7}(nM^{−1}h^{−1})=0.5, k_{8}(h^{−1})=0.1, K_{ AP }(nM)=0.7, K_{ AC }(nM)=0.6, K_{ IB }(nM)=2.2, k_{ dmb }(h^{−1})=0.01, k_{ dmc }(h^{−1})=0.01, k_{ dmp }(h^{−1})=0.01, k_{ dnc }(h^{−1})=0.12, k_{ dn }(h^{−1})=0.01, K_{ d }(nM)=0.3, K_{ dp }(nM)=0.1, K_{ p }(nM)=0.1, K_{ mB }(nM)=0.4, K_{ mC }(nM)=0.4, K_{ mP }(nM)=0.31, k_{ stot }(h^{−1})=1.0, k_{ sB }(h^{−1})=0.12k_{ stot }, k_{ sC }(h^{−1})=1.6k_{ stot }, k_{ sP }(h^{−1})=0.6k_{ stot }, n=4, m=2, V_{ phos }(nMh^{−1})=0.4, V_{1B}(nMh^{−1})=0.5, V_{1C}(nMh^{−1}) = 0.6, V_{1P}(nMh^{−1}) = V_{ phos },V_{1PC}(nMh^{−1})=V_{ phos }, V_{2B}(nMh^{−1})=0.1, V_{2C}(nMh^{−1})=0.1, V_{2P}(nMh^{−1})=0.3, V_{2PC}(nMh^{−1})=0.1, V_{3B}(nMh^{−1})=0.5, V_{3PC}(nMh^{−1})=V_{ phos }, V_{4B}(nMh^{−1})=0.2, V_{4PC}(nMh^{−1})=0.1, v_{ dBC }(nMh^{−1})=0.5, v_{ dBN }(nMh^{−1})=0.6v_{ dCC }(nMh^{−1})=0.7, v_{ dIN }(nMh^{−1})=0.8, v_{ dIN }(nMh^{−1})=0.8, v_{ dPC }(nMh^{−1})=0.7, v_{ dPCC }(nMh^{−1}) = 0.7,v_{ dPCN }(nMh^{−1}) = 0.7,v_{ mB }(nMh^{−1}) =0.8, v_{ mC }(nMh^{−1})=1.0, v_{ mP }(nMh^{−1})=1.1, vstot(nMh^{−1})=1.0, v_{ sB }(nMh^{−1})=v_{ stot }, v_{ sB }(nMh^{−1})=v_{ stot }, v_{ sC }(nMh^{−1})=1.1v_{ stot }, v_{ sP }(nMh^{−1})=1.5v_{ stot }
Initial conditions
The unit of the initial conditions is nM.
M_{ P }(0)=2.188M_{ C }(0)=1.633,M_{ B }(0)=9.498,P_{ C }(0)=2.008,C_{ C }(0)=1.884,P_{ CP }(0)=0.129,C_{ CP }(0)=0.473, PC_{ C }(0)=1.228,PC_{ N }(0)=0.177,PC_{ CP }(0)=0.203, PC_{ NP }(0)= 0.101,B_{ C }(0) =2.523,B_{ CP }(0)=0.929,B_{ N }(0)=1.787,B_{ NP }(0)=0.318,I_{ N }(0)=0.051
Appendix C: Switching times
See Table 4.
Appendix D: Neglected processes
First reduced model
Neglected processes are: f_{1,3}, f_{2,3}, f_{3,3}, f_{4,6}, f_{5,3}, f_{5,6}, f_{6,4}, f_{7,4}, f_{8,2}, f_{8,7}, f_{9,6}, f_{9,7}, f_{10,4}, f_{11,4}, f_{12,3}, f_{12,6}, f_{13,2}, f_{13,4}, f_{14,2}, f_{14,6}, f_{14,7}, f_{15,4}, f_{16,1}, f_{16,4}.
Second reduced model: submodels
Neglected processes in SM1 are: f_{1,3}, f_{2,3}, f_{3,3}, f_{4,6}, f_{5,3}, f_{5,6}, f_{6,4}, f_{7,4}, f_{8,2}, f_{8,7}, f_{9,6}, f_{9,7}, f_{10,4}, f_{11,4}, f_{12,3}, f_{12,6}, f_{13,2}, f_{13,4}, f_{14,2}, f_{14,6}, f_{14,7}, f_{15,4}, f_{16,1}, f_{16,4}.
In SM2, we supposed that processes switching state from t_{1}=0.33 until t_{6}=1.5 change simultaneously at time \(t_{1}^{r}=0.9\). Deleted processes are common to those removed in SM1, as well as: f_{4,3}, f_{4,4}, f_{5,4}, f_{7,2}, f_{8,3},f_{8,5}, f_{9,2}, f_{9,3}, f_{10,2}, f_{14,5}.
In SM3, we supposed that processes switching state from t_{7}=3.8 until t_{6}=1.5 change simultaneously at time \(t_{2}^{r}=6\). Deleted processes are common to those removed in SM1, as well as: f_{1,1}, f_{2,1}, f_{4,2}, f_{4,3}, f_{5,2}, f_{7,2}, f_{8,1}, f_{9,1}, f_{9,2}, f_{10,2}, f_{11,2}, f_{12,5}, f_{14,4}.
In SM4, we supposed that processes switching state from t_{16}=9.8 until t_{26}=15.6 change simultaneously at time \(t_{3}^{r}=12.5\). Deleted processes are common to those removed in SM1, as well as: f_{4,1}, f_{5,1}, f_{8,4}, f_{9,2}, f_{10,2}, f_{11,2},f_{12,5}, f_{14,4}.
In SM5, we supposed that processes switching state from t_{27}=17.3 until t_{46}=22.9 change simultaneously at time \(t_{4}^{r}=20.0\). Deleted processes are common to those removed in SM1, as well as: f_{4,3}, f_{4,4}, f_{5,4}, f_{7,2}, f_{8,3}, f_{8,5}, f_{9,2}, f_{9,3}, f_{10,2}, f_{14,5}.
Notes
 1.
Total protein concentrations are defined as follows: P_{ Tot }=P_{ C }+P_{ CP }+PC_{ C }+PC_{ N }+PC_{ CP }+PC_{ NP }+I_{ N }, C_{ Tot }=C_{ C }+C_{ CP }+PC_{ C }+PC_{ N }+PC_{ CP }+PC_{ NP }+I_{ N }, B_{ Tot }=B_{ C }+B_{ CP }+B_{ N }+B_{ NP }+I_{ N }.
 2.
planor: Generation of regular factorial designs https://CRAN.Rproject.org/package=planor
 3.
multisensi: Multivariate Sensitivity Analysis https://CRAN.Rproject.org/package=multisensi
References
 1
Bettenbrock K, Fischer S, Kremling A, Jahreis K, Sauter T, Gilles ED. A quantitative approach to catabolite repression in Escherichia coli. J Biol Chem. 2006; 281(5):2578–84.
 2
Kuepfer L, Peter M, Sauer U, Stelling J. Ensemble modeling for analysis of cell signaling dynamics. Nat Biotechnol. 2007; 25(9):1001–6.
 3
Snowden TJ, van der Graaf PH, Tindall MJ. Methods of model reduction for largescale biological systems: a survey of current methods and trends. Bull Math Biol. 2017; 79(7):1449–86.
 4
Apri M, de Gee M, Molenaar J. Complexity reduction preserving dynamical behavior of biochemical networks. J Theor Biol. 2012; 304:16–26.
 5
Petzold L, Zhu W. Model reduction for chemical kinetics: An optimization approach. AIChE J. 1999; 45(4):869–86.
 6
Sunnåker M, Cedersund G, Jirstrand M. A method for zooming of nonlinear models of biochemical systems. BMC Syst Biol. 2011; 5(1):140.
 7
Gorban AN, Karlin IV. Method of invariant manifold for chemical kinetics. Chem Eng Sci. 2003; 58(21):4751–68.
 8
Anderson J, Chang YC, Papachristodoulou A. Model decomposition and reduction tools for largescale networks in systems biology. Automatica. 2011; 47(6):1165–74.
 9
Hangos KM, Gábor A, Szederkényi G. Model reduction in biochemical reaction networks with MichaelisMenten kinetics. In: Control Conference (ECC), 2013 European. Zürich: IEEE: 2013. p. 4478–4483.
 10
Segel LA, Slemrod M. The quasisteadystate assumption: a case study in perturbation. SIAM Rev. 1989; 31(3):446–77.
 11
de Jong H, Gouzé JL, Hernandez C, Page M, Sari T, Geiselmann J. Qualitative simulation of genetic regulatory networks using piecewiselinear models. Bull Math Biol. 2004; 66(2):301–40.
 12
Baldazzi V, Ropers D, Markowicz Y, Kahn D, Geiselmann J, de Jong H. The carbon assimilation network in Escherichia coli is densely connected and largely signdetermined by directions of metabolic fluxes. PLoS Comput Biol. 2010; 6(6):1000812.
 13
Bhattacharjee B, Schwer DA, Barton PI, Green WH. Optimallyreduced kinetic models: reaction elimination in largescale kinetic mechanisms. Combust Flame. 2003; 135(3):191–208.
 14
Casagranda S, Ropers D, Gouzé JL. Model reduction and process analysis of biological models. In: 2015 23rd Mediterranean Conference on Control and Automation (MED). Torremolinos: IEEE: 2015. p. 1132–9.
 15
Leloup JC, Goldbeter A. A model for circadian rhythms in Drosophila incorporating the formation of a complex between the PER and TIM proteins. J Biol Rhythm. 1998; 13(1):70–87.
 16
KwangHyun C, SungYoung S, HyunWoo K, Wolkenhauer O, McFerran B, Kolch W. Mathematical modeling of the influence of RKIP on the ERK signaling pathway In: Priami C, editor. Computational Methods in Systems Biology. Rovereto: Springer: 2003. p. 127–41.
 17
Leloup JC, Goldbeter A. Modeling the mammalian circadian clock: sensitivity analysis and multiplicity of oscillatory mechanisms. J Theor Biol. 2004; 230(4):541–62.
 18
Khalil HK. Nonlinear Systems, Second edn. New Jersey: Prentice Hall; 1996.
 19
Leloup JC, Goldbeter A. Toward a detailed computational model for the mammalian circadian clock. Proc Natl Acad Sci. 2003; 100(12):7051–6.
 20
Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY. An efficient kmeans clustering algorithm: Analysis and implementation. IEEE Trans Pattern Anal Mach Intell. 2002; 24(7):881–92.
 21
Casagranda S, Gouzé JL. Principal Process Analysis and reduction of biological models with order of magnitude. IFACPapersOnLine. 2017; 50(1):12661–6.
 22
Kobilinsky A, Monod H, Bailey RA. Automatic generation of generalised regular factorial designs. Comput Stat Data Anal. 2017; 113:311–29.
 23
Lamboni M, Monod H, Makowski D. Multivariate sensitivity analysis to measure global contribution of input factors in dynamic models. Reliab Eng Syst Saf. 2011; 96(4):450–9.
 24
Box GE, Hunter JS. The 2 kp fractional factorial designs. Technometrics. 1961; 3(3):311–51.
 25
Zheng B, Albrecht U, Kaasik K, Sage M, Lu W, Vaishnav S, Li Q, Sun ZS, Eichele G, Bradley A, et al. Nonredundant roles of the mPer1 and mPer2 genes in the mammalian circadian clock. Cell. 2001; 105(5):683–94.
 26
Van Der Horst GT, Muijtjens M, Kobayashi K, Takano R, Kanno Si, Takao M, de Wit J, Verkerk A, Eker AP, van Leenen D, et al. Mammalian Cry1 and Cry2 are essential for maintenance of circadian rhythms. Nature. 1999; 398(6728):627–630.
 27
Tyson JJ, Hong CI, Thron CD, Novak B. A simple model of circadian rhythms based on dimerization and proteolysis of PER and TIM. Biophys J. 1999; 77(5):2411–7.
 28
Pagel H, Poll C, Ingwersen J, Kandeler E, Streck T. Modeling coupled pesticide degradation and organic matter turnover: From gene abundance to process rates. Soil Biol Biochem. 2016; 103:349–64.
 29
RoblesRodriguez C, Bideaux C, Guillouet S, Gorret N, Roux G, MolinaJouve C, AcevesLara C. Multiobjective particle swarm optimization (MOPSO) of lipid accumulation in fedbatch cultures. In: 2016 24th Mediterranean Conference on Control and Automation (MED). Athens: IEEE: 2016. p. 979–984.
Acknowledgements
The authors would thank the research program LABEX SIGNALIFE (ANR11LABX002801) and the anonymous reviewers for helpful comments.
Funding
We acknowledge the Conseil Régional PACA and the Investissements d’Avenir Bioinformatique programme under project RESET (ANR11BINF0005) for funding the PhD thesis of S. Casagranda.
Availability of data and materials
All data generated or analysed during this study are included in the published article and the Appendix. Matlab scripts to run Principal Process Analysis on the circadian clock model are available at the following link: http://wwwsop.inria.fr/members/JeanLuc.Gouze/BMBcode/code.zip.
Author information
Affiliations
Contributions
SC, DR and JLG designed the study and developed the methodology. SC performed the analysis. ST contributed to parameter sensitivity analysis. All authors discussed the results and contributed to the final manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Ethics approval and consent to participate
Not applicable.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Casagranda, S., Touzeau, S., Ropers, D. et al. Principal process analysis of biological models. BMC Syst Biol 12, 68 (2018). https://doi.org/10.1186/s1291801805866
Received:
Accepted:
Published:
Keywords
 Dynamical systems
 Biological networks
 Process analysis
 Model reduction
 Parameter sensitivity analysis
 Circadian clock