Clustering reveals limits of parameter identifiability in multi-parameter models of biochemical dynamics

Background Compared to engineering or physics problems, dynamical models in quantitative biology typically depend on a relatively large number of parameters. Progress in developing mathematics to manipulate such multi-parameter models and so enable their efficient interplay with experiments has been slow. Existing solutions are significantly limited by model size. Results In order to simplify analysis of multi-parameter models a method for clustering of model parameters is proposed. It is based on a derived statistically meaningful measure of similarity between groups of parameters. The measure quantifies to what extend changes in values of some parameters can be compensated by changes in values of other parameters. The proposed methodology provides a natural mathematical language to precisely communicate and visualise effects resulting from compensatory changes in values of parameters. As a results, a relevant insight into identifiability analysis and experimental planning can be obtained. Analysis of NF- κB and MAPK pathway models shows that highly compensative parameters constitute clusters consistent with the network topology. The method applied to examine an exceptionally rich set of published experiments on the NF- κB dynamics reveals that the experiments jointly ensure identifiability of only 60 % of model parameters. The method indicates which further experiments should be performed in order to increase the number of identifiable parameters. Conclusions We currently lack methods that simplify broadly understood analysis of multi-parameter models. The introduced tools depict mutually compensative effects between parameters to provide insight regarding role of individual parameters, identifiability and experimental design. The method can also find applications in related methodological areas of model simplification and parameters estimation. Electronic supplementary material The online version of this article (doi:10.1186/s12918-015-0205-8) contains supplementary material, which is available to authorized users.


Background
In the first section we provide details of conventional modelling techniques that we incorporate in our framework.

Calculating sensitivities of ODE models
The methodology developed in this paper can in principle be applied to analyse any ordinary differential equation model. Precisely, we consider where y = (y 1 , ..., y k ) ∈ R k is a system's state; F () is a law, which determines temporal evolution of y; and θ = (θ 1 . . . , θ l ) ∈ θ ⊂ R l is a vector of model parameters. For simplicity, instead of writing solution of (1) as a function of the time, t; an initial condition, y 0 ; and the parameters, θ we write simply y(t). Without loss of generality, we assume that the first q components of y denoted by y (q) = (y 1 , ..., y q ) contain model variables that are of interest. These components evaluated at specified times are denoted by Y = (y (q) (t 1 ), ..., y (q) (t n )).
The relationship between the mechanistic model (1) and experimental observations, X, is defined as X = (y q (t 1 ) + 1 , . . . , y q (t n ) + n ), 1 where i is a multivariate measurement noise.
In the examples studied, without loss of generality, we assume the noise to be multivariate normal (MVN), i ∼ M V N (0, D i ) with D i , being a diagonal matrix of measurement variances. In this setting X ∼ M V N (Y, D), where D is a matrix that contains matrices D i on its diagonal. The derivative of solution of the equation (1), y(t), with respect to the parameter θ i , z i (t) = ∂y(t) ∂θ i is described by an another ODE [1] where ∇ y F (y(t), θ) is the Jacobian matrix of F () with respect to y. Evaluation of z i (t) at the times and components of interests defines the sensitivity vector S i = z (q) (t 1 ), ..., z (q) (t n ) of the parameter θ i . A collection of the sensitivity vectors for all i = 1, ..., l constitutes the sensitivity matrix S = (S 1 , ..., S l ). The Fisher information matrix (FIM) of the MVN with density P (X|θ) defined as F I i,j (θ) = E ∂ log(P (X|θ)) ∂θ i ∂ log(P (X|θ)) ∂θ j can be written in terms of sensitivity vectors [2] F I(θ) = 1 σ 2 S T D −1 S.
For convenience and without loss of generality in the paper we assumed D = I, where I is the identity matrix.

Logarithmic parametrisation
Throughout the paper we parametrise models in terms of logarithms of their parameters, log(θ i ). For notational convenience we write simply θ i instead of log(θ i ).

Shannon Mutual Information
In the paper we use mutual information to measure similarity between groups of parameters. Parameters estimates are treated as random variables. First we introduce definition of mutual information and consider general random variables U and V having a joint probability distribution P (U, V ) and marginal distributions P (U ), P (V ). Using the Shannon entropy [3], uncertainty associated with a random variable is given as H(U ) = − P (U ) log(P (U ))dU.
Analogously the uncertainty of U conditioned on a given value of V = v is given by conditional entropy Averaging over all possible values of V we have average conditional entropy Using the above notions the average reduction in entropy of U resulting from knowing V is given by the Shannon mutual information 2 Mutual information -canonical correlation analysis (MI-CCA) Below we explain calculation of I(θ A , θ B ) using CCs directly from Fisher information matrix. Assume, data X (i) , i = 1, ..., N are independent measurements of Y for given parameters θ. Under certain general assumptions the distribution of the maximum likelihood estimator or equivalently maximum a posteriori estimatorθ is asymptotically multivariate normal [4] See conditions A1-A9 in [5] or Bernstein-von Mises Theorem in [6] for details. Suppose θ = (θ A , θ B ), θ A = (θ 1 , ..., θ m ) and θ B = (θ m+1 , ..., θ l ). Given the equation (9) we can divide the Fisher information matrix and the covariance matrix Σ = F I −1 into components corresponding to θ A and θ B and The mutual information formula for the multivariate normal can be easily computed using one of the following well known formulae [7] or where | · | denotes the determinant of a matrix and ρ M i are canonical correlations calculated from Σ. Precisely and j < i. The formulae (12) and (13) have however two strong disadvantages: involve inversion of the FIM, and requires division by the matrix determinant, which can be close to zero. Therefore we propose a new approach that avoids these difficulties. Specifically we show that a formula based on canonical correlations calculated directly from the FIM instead of the covariance matrix also holds where ρ i are canonical correlations calculated directly from F I(θ). These are defined as for i = min(m, l − m) and j < i.

Proof
It is sufficient to prove that Recall that As Σ is positively defined then |Σ AA | > 0 and |Σ BB | > 0. In particular Σ AA and Σ BB are nonsingular and we can take advantage of the formula for determinant of a block matrix have nonzero determinants so they are invertible. This allows us to use formula for block-wise inversion Combining this with equation (18) we obtain (19) to observe that After short transformations and replacing one |Σ| with |F I| −1 we finally get mark θ * in red; end while clusters ← clusters − A − B + C; end while Non-identifiable parameters are marked in red.

Interpreting (δ, ζ)-identifiability
As discussed in the subsection 2.3 of the MP the parameter θ i is said to be (δ, ζ)-identifiable if ρ(θ i , θ −i ) < δ and ||S i || > ζ. We purposely require the parameter to satisfy two independent conditions. The ζ-condition requires the parameter to have individual sensitivity above a threshold. The δ-condition requires the parameter not to be correlated with remaining parameters above the threshold. Such a definition is constructed to ensure that repeating the same experiment several times cannot make non-identifiable parameters identifiable and to determine which very parameters are not identifiable. In addition our focus on having non-correlated parameters results from advantages of manipulating models with a non-correlated parameters. These advantages are widely discussed in the literature [1,[8][9][10][11] .
In our approach, incorporating additional experiments that yield highly correlated parameters may decreasing the number of identifiable parameters. For illustration consider a model with two parameters θ = (θ 1 , θ 2 ). Assume that the parameters can be informed by two experiments yielding the following FIMs Given independence of the two experiments the joint experiment results with the FIM Comparing correlation in the three scenarios, denoted here by ρ (1) , ρ (2) , ρ (1,2) , we have that correlation is averaged over experiments Therefore adding "bad" i.e. highly correlated data (experiment) may cause parameters fail to satisfy δ-condition. It may seem an undesirable property. From practical point of view in many cases it is however desirable. When performing inference in multi-parameter models the main difficulty arrises from the likelihood surface having the shape of a long thin ellipsoid [8]. This makes the search in parameter space computationally demanding. Our approach, tailored to deal with multi-parameter models, is aimed to recognise this difficulty.
4.1 Relationship of (δ, ζ)-identifiability to "alphabetical" experimental design criteria The criterium of (δ, ζ)-identifiability is different from the conventional Fisher information based approaches known as alphabetic optimality (A-optimality, D-optimality, E-optimality etc.), which maximise overall information about model parameters [12]. These criteria do not serve to determine identifiability of individual parameter. We are focused on maximising the number of individual parameters. In addition, as opposed to our method, conventional approaches of alphabetic optimality predict increase of information as the same experiment is repeated.

Low pairwise correlations do not ensure identifiability
Measuring parameters similarity using MI-CCA has a qualitative advantage over pairwise correlations. Parameters that have low pairwise correlations may be non-identifiable and 7 have CC equal to 1. This can be easily exemplified by the following FIM with respect to parameters θ 1 , θ 2 , θ 3 Straightforward calculation shows that we have the following pairwise correlations However, CCA detects that θ 3 is a linear combination of θ 1 and θ 2 as

Relationship of (δ, ζ)-identifiability to profile likelihoods based identifiability analysis
The concept of (δ, ζ)-identifiability is related to a concept of identifiability defined based on profile likelihoods [13,14]. Our framework is complementary and should be applied in different scenarios. To demonstrate the link between the two approaches we introduce a standard statistical notation. Assume as before, data X have a distribution with a density P (X|θ). Identical independent copies of X are denoted by X k . We also introduce: log-density l(X|θ) = log(P (X|θ)); expected log-density L(θ) = E θ * (l(X|θ)), where E θ * () denotes expected value with respect to P (X|θ * ); log-likelihood L n (θ) = N k=1 l(X i |θ). Log-profile likelihood, P L n (·), of a parameter θ i estimated along with parameters θ −i = θ\{θ i } is conventionally defined as [13,15,16] We consider the second order Taylor expansion of L(θ) around a true value θ * The first order term is missing as L(θ) has the maximum for θ = θ * and therefore ∇ θ L(θ * ) = 0. The second term is by definition described by Fisher information. The function defined below we call asymptotic profile likelihood (AP L) It is well known that assuming general regularity conditions, we have convergence L n (θ) → L(θ) for each θ and therefore also convergence P L n (θ) → AP L(θ) as n → ∞. Consider the following transformation where ρ is the CC between θ i and θ −i .
Therefore we have the formula that allows us to understand the link between (δ, ζ)-identifiability and P L based identifiability Consider curvature κ of AP L measured as the signless, second order derivative with respect The assumption of (δ, ζ)-identifiability imposes conditions on the curvature of AP L. The contribution resulting from F I ii must be greater than ζ and the contribution coming from correlation must be greater than (1 − δ 2 ). We purposefully neglect the actual value of F I ii and verify only if it is above a threshold, as it can be artificially inflated by scaling of the parameter or repeating the same experiment several times. The correlation ρ is affected by neither of these factors.
The identifiability based on profile likelihoods [13] is determined by confidence intervals whereθ is the argument maximising L n (θ), and ∆ is a constant selected based on χ 2 statistics. If confidence interval extends infinitely parameter is non-identifiable.
The two methods are applicable in different practical situations. If data X is available and L n (θ) can be calculated profile likelihood is the superior method. Our method, however, is tailored to deal with a situation where only L(θ) is available, whereas L n (θ) is not. It is the case in a number of scenarios. For instance, if data X is not (yet) available or L n (θ) 9 cannot be evaluated analysing AP L is the best one can do. As we demonstrate clustering provides informative tool to explain the curvature of APLs, which cannot be delivered by available methods. In the case of our meta-analysis of the NF-κB experiments calculating L n (θ) would hardly be possible due to reasons related to availability / comparability of data as well as computational efficiency.

Proposition
The minimum min Differentiation with respect to θ −i given θ i shows that the minimum is achieved for where lower indices denote corresponding elements of the FIM. We have therefore Calculating CC, ρ, between θ i and θ −i we obtain that (C) Analysis of the system out-of-steady-state, the initial condition was increased 30-fold compared to the steady state. See text for a discussion. The dendrograms were generated using k r = 100, k p = 2, γ r = 1.2, γ p = 0.8.

Analysis of the NF-κB system
The NF-κB dynamical model together with used parameter values are taken directly from [17] and reader is referred to the supplementary information of that paper for more details.
Below we reproduce equations of the model and its parameter values (Table 1).

Variable normalisation
In the entire analysis of the NF-κB system we normalised model variables by squared root of their maxima. We also assumed that the normalised variables were measured with unit variance normal measurement error. Precisely where I is an identity matrix. Indices i j select for variables that are of interest in the context of the presented examples. These two assumptions are equivalent to assume that each Y i j is measured with the variance equal to the max(Y i j ).

Available experimental data -identifiability analysis
In the Table 2 we summarise experiments presented in the 9 papers [17][18][19][20][21][22][23][24][25] used to study identifiability of the NF-κB model. We simplistically assumed that all experiments were comparable and measurements across experiments were taken in the same units equivalent to those used in the model (31). We used algorithm (3) to find out the number of identifiable parameters using the (δ, ζ)-identifiability criterium. We set δ = 0.95 and ζ = 1. Starting from the first listed experiment we iteratively added subsequent experiments that maximised the number of identifiable parameters. We repeated the procedure using experiments published in each of the papers and prior to its publication. Obtained cumulative number of identifiable parameters is presented in Figure 2. The dendrogram with 21 identifiable parameters for the total of available data is presented in Figure 4A in MP.

Dependance of the results on δ and ζ values
In order to verify how the identifiability predictions depend on choice of values of δ and ζ we have plotted the number of identifiable parameters as a function of δ and ζ in Figure 3. All parameters are well above considered ζ thresholds therefore the number of identifiable parameters does not change as ζ is varied. The number of identifiable parameters increases as higher correlation can be accepted and reaches the total number of model parameters (39) at δ = 1. The number of identifiable parameters is sensitive to δ. However the main conclusions of our analysis that the rich set of available experiments yields highly correlated parameters holds for all values of δ smaller the one. The correlations are moderately decreased by the proposed experiments.
We found that adding new protocols cannot increase the number of identifiable parameters. In Figure 4 we show the distribution of the number of identifiable parameters for the literature experiments considered jointly with one randomly generated experiment. The number of identifiable parameter does not increase. It is the result of some parameters being highly correlated with the remaining model parameters in the randomly generated experiments. Histograms show CCs between each of the selected 19 highly correlated parameters and the remaining parameters in the sampled 1000 random experiments.

Experiments to increase the number of identifiable parameters
Here we provide technical details regarding experiments that increase the number of identifiable parameters as well as describe reasoning that lead us to propose them. Among non-identifiable parameters in Figure 4A in MP, we have selected 6 parameters ki, KN , ka, c3, c4 and c3a to be identified. As these refer to different modules of the pathway we divide them into two subsets: ki, KN , ka and c3,c4, c3a. Experiments to estimate parameters of each subset are described separately below.
In order to make inference for this equation independently of other equations variables y 1 , y 9 , y 16 must be measured. Therefore we assume these variables are measured in WT cells for 60 minutes every 5 minutes. We also assume the parameter k20 is estimated along with the other three parameters. Throughout first 5 minutes of 60 minutes long experiment cells are stimulated with 10ng TNF-α. The dendrogram that corresponds to such experiment is presented in Figure 7A. The parameters exhibit high correlation as can be expected from the structure of the equation (32). In order to improve identifiability we notice that using A20 knockout cells would eliminate k20 from the equations. Similarly, blocking dephosphorylation with a phosphatase inhibitor would eliminate ki. Therefore measuring the assumed three variables after TNF-α stimulation in three types of cells: wild type, A20 knockout, and A20 knockout with blocked phosphatases activity, would render all parameters identifiable. This is indeed reflected in the corresponding dendrogram ( Figure 7C).

Predicions are confirmed by profile likelihoods
In order to verify identifiability predictions, we generated data from the model. The generated data were used to plot profile likelihoods. There is an ideal correspondence between 24 what is predicted by our method and identifiability based on profile likelihoods. This is shown in Figure 7B,D.   Figure 7: Dendrogram for experiments aiming to estimate parameters ki, KN , ka along with k20. Dynamics of IKKK activity (y 1 ) receptor activity (y 16 ), and cytoplasmic A20 (y 9 ) was assumed to be measured in (A) WT cells; (C) jointly in WT cells, A20 knockouts, and A20 knockouts with phosphatase activity inhibitor. Predictions of the dendrograms (A) and (C) are verified by profile likelihoods in (B) and (D) respectively. (E) presents histogram of the number identifiable of parameters in experiment (C) when values of the parameters k20, ki, KN , ka were randomly perturbed by upto 50%. This is to demonstrate that the prediction does not depend on actual value of the parameters.

Identifiability predictions do not depend on parameter values
To verify if our predictions depend on actual values of the parameters we have randomly generated 1000 sets of parameter values. For each set we calculated the number of identifiable parameters using our (δ,ζ)-criterium with ζ = 1 and δ = 0.95. In each of the 1000 sets all parameters were identifiable. This is depicted in the histogram (7D). Parameters were sampled according to the formula is the primary value of one of the four parameters, i is indexing the four parameters, j is indexing the 1000 generated sets and U is a random variable sampled from a uniform distribution on [0,1]. 6.6.2 Experimental design for parameters c3, c4, c3a Estimation of the parameter c3a, which is the degradation rate of IκBα transcript, present in the equation IκBα transcript y 13 dt = c1a y 18 − c3a y 13 is relatively straightforward [26,27]. We assume that after 10 min long 10ng TNF-α stimulation and 50 minutes of waiting for transcripts to be produced transcription is blocked with a transcription inhibitor. The above equation is then fitted to data assuming c1a = 0. In our numerical simulations we assumed transcripts were measured every 5 minutes for 2h starting at 1h after initial TNF-α stimulation.
We assume parameter c5 is estimated along with c3 and c4. In order to obtain information about translation rate and degradation rates we considered again cells being stimulated for 10 minutes with 10ng TNF-α. In order to avoid dissect the equation from the rest of the system we assume transcription is blocked just after 10 min of TNF-α stimulation. Three modified versions of this experiments were jointly required to obtain sufficiently decelerated parameters. In version one only A20 transcript is measured, in version two only A20 protein is quantified, in the third scenario both A20 mRNA and protein are measured. Such combination of measurements allowed us to break parameter correlations. This is depicted in Figure 8. Parameters of this experiment are plotted jointly with the experiment to estimate ki, KN , ka. The dendrogram plotted for literature experiments together with the proposed experiments is plotted in Figure 4B in MP. In order to demonstrate that the described methodology is suitable for models that involve hundreds of parameters, and to verify if correspondence between similarity and network topology is not limited to the NF-κB system, we analyzed a computational model of dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors [28]. The EGF receptor belongs to the tyrosine kinase family of receptors and is expressed in virtually all organs of mammals. It plays a complex role during embryonic and postnatal development and in the progression of tumor. Binding of EGF to the extracellular domain of the EGF receptor initialize complex process. These lead to phosphorylation, transmission of conformational change, and proximal translocation to membrane-associated target molecules. It induces two pathways: Shc-dependent and Shc-independent, leading to activation of Ras-GTP and eventually MAP kinase cascade through the kinases Raf, MEK and ERK. Activated ERK phosphorylates and regulates several cellular proteins and nuclear transcription factors. One of the widely accepted mathematical models of MAPK signalling was published in [28].
It involves approximately 100 ODEs and 200 parameters. Authors optimised parameters so that the model generates behaviour observed experimentally. Detailed analysis of the model is beyond the scope of this paper. The dendrogram plotted for this model (Figure 9), reveals that correspondence between network topology is not limited to the NF-κB model, but is likely to be more general feature of biochemical models. The latter statement however requires further verification. Parameters in the dendrogram 9 are colour coded and correspond to the arrows in the networks' schematics in Figure 10. Clearly clustered parameters describe functional modules proximate in the netwok's structure. Even though the model involves 200 parameters the dendrogram can be computed within minutes on a standard desktop machine. 0 C1 C2 C3 C4 C5 C6 Figure 9: Dendrogram of MAPK signalling model [28]. The model was imported as an SMBL file directly from the BioModels database under the ID B IOMD0000000019 -Schoeberl2002 -EGF MAPK Clusters are described in the caption of Figure 10.  Figure 10: Structure of the MAPK signalling model [28]. Reaction arrows and labels are colour coded as in the corresponding dendrogram. The similarity of the model parameters highly resembles topology of the network.
Parameter clusters correspond to the following functional modules: cluster C1 -ERK activation; clusters C2 and C3 -part of Shc-dependent Ras-GTP activation pathway; cluster C4 -final part of Shc-independent Ras-GTP activation pathway without internalization mechanism; cluster C5 -complex (EGF-EGFR)2 phosphorylation and Shc-independent Ras-GTP activation pathway and cluster C6 -binding EGF to EGF Receptor and MEK kinase activation pathway.