In this section we first introduce the mathematical system class and give a concise definition of identifiability. We focus on local at-a-point identifiability and describe the four methods for testing local at-a-point identifiability that are compared here. Finally, we summarize the three signaling pathways and the corresponding models.
System class
The concept of identifiability applies to a large system class. Here we treat models of the form

$$\dot{x}(t) = f\bigl(x(t), p, u(t)\bigr), \quad x(0) = x_0, \qquad y(t) = h\bigl(x(t), p, u(t)\bigr), \tag{1}$$

since many biological signaling pathways can be represented by systems of this form. In equation (1), x(t) ∈ ℝ^{n_x}, p ∈ ℝ^{n_p}, u(t) ∈ ℝ^{n_u}, and y(t) ∈ ℝ^{n_y} denote the state variables, the parameters, the inputs, and the outputs of the dynamical system, respectively. A biological parameter, for example a kinetic constant, corresponds to a component p_k of the parameter vector. The functions f and h map from an open subset of ℝ^{n_x} × ℝ^{n_p} × ℝ^{n_u} onto ℝ^{n_x} and ℝ^{n_y}, respectively, and are assumed to be smooth. Note that both a lab experiment and a simulation are uniquely defined by the initial conditions x(0) = x_0 and the values of the inputs u(t) from t = 0 to the final time t = t_f. Both an experiment and a simulation result in values of the outputs y(t) at successive points in time,

$$y(t_i), \quad i \in \{1, \ldots, n_t\}, \tag{2}$$

where n_t denotes the number of measurements or stored simulation results, respectively. In the present paper, output data of the form (2) are obtained from simulations with models that have previously been adjusted to experimental data by other authors [1–3].
The solution of equation (1) that results for a particular choice of initial conditions x_0, parameters p, and inputs u(t) is denoted by

$$x(t; x_0, p, u). \tag{3}$$

The solution of equation (1) and its derivatives with respect to the parameters are computed with the integrator DDASPK [33]. The output behavior of the model is given by the response function

$$r(t; x_0, p, u) = h\bigl(x(t; x_0, p, u), p, u(t)\bigr). \tag{4}$$
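To make the system class (1)–(4) concrete, the following minimal sketch simulates a hypothetical two-state model and samples its output at discrete times. It uses scipy's solve_ivp as a stand-in for DDASPK; the model, parameter values, input, and sampling grid are purely illustrative and are not taken from the pathway models treated in this paper.

```python
# Minimal sketch of a model of the form (1): two states, two parameters, one
# input, one output. All names and values are illustrative only.
import numpy as np
from scipy.integrate import solve_ivp

def f(t, x, p, u):
    """Right-hand side f(x, p, u) of equation (1)."""
    u_t = u(t)
    return [-p[0] * x[0] * u_t,
             p[0] * x[0] * u_t - p[1] * x[1]]

def h(x, p):
    """Output function h(x, p) of equation (1); here the single output is x2."""
    return np.array([x[1]])

def response(x0, p, u, t_grid):
    """Response function (4): output values y(t_i) at the sampling times t_i."""
    sol = solve_ivp(f, (t_grid[0], t_grid[-1]), x0, t_eval=t_grid, args=(p, u))
    return np.array([h(x, p) for x in sol.y.T])   # shape (n_t, n_y)

if __name__ == "__main__":
    x0 = [1.0, 0.0]                      # initial conditions x(0) = x0
    p = [0.3, 0.1]                       # nominal parameters
    u = lambda t: 1.0                    # constant input u(t)
    t_grid = np.linspace(0.0, 60.0, 31)  # n_t = 31 measurement times
    y = response(x0, p, u, t_grid)       # output data of the form (2)
    print(y.shape)                       # (31, 1)
```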
Concept of identifiability
Here we summarize the notions of identifiability as necessary for the paper (see [4] for a review). Assuming that the inputs u(t), the initial conditions x_0, and the measurement times t_i are given, the different notions of identifiability can be defined as follows.
Definition 1: The parameter p_k of the model (1) is called globally structurally identifiable if, for all admissible values p̂ ∈ P ⊆ ℝ^{n_p} and all p̃ ∈ P,

$$r(t_i; x_0, \hat{p}, u) = r(t_i; x_0, \tilde{p}, u), \quad i \in \{1, \ldots, n_t\}, \tag{5}$$

implies

$$\hat{p}_k = \tilde{p}_k. \tag{6}$$
Definition 2: The parameter p_k of the model (1) is called locally structurally identifiable if, for all p̂ ∈ P, there exists a neighborhood V(p̂) ⊆ P such that for all p̃ ∈ V(p̂), equations (5) imply equation (6).
The parameter vector p is called globally structurally or locally structurally identifiable if all its components p_k are globally structurally or locally structurally identifiable, respectively. Global and local at-a-point identifiability are defined as follows.
Definition 3: Let p* ∈ P be a point in the parameter space of the model (1). The parameter p_k of the model (1) is called globally at-a-point identifiable at the point p* if, for all p̃ ∈ P, equations (5) with p̂ = p* imply equation (6).
Definition 4: Let p* ∈ P be a point in the parameter space of the model (1). The parameter p_k of the model (1) is called locally at-a-point identifiable at the point p* if there exists a neighborhood V(p*) ⊆ P such that for all p̃ ∈ V(p*), equations (5) with p̂ = p* imply equation (6).
The following subsection will introduce the four methods for at-a-point identifiability testing compared in this work.
Methods for local at-a-point identifiability testing
The compared numerical methods for local at-a-point identifiability testing are based on the sensitivity of the model outputs at discrete time points t_k, k ∈ {1,..., n_t}, with respect to the parameters. The sensitivity information is stored in the (n_y n_t) × n_p dimensional sensitivity matrix S. S is a block matrix that consists of time dependent blocks s(t_i) of size n_y × n_p [34]:

$$S = \begin{pmatrix} s(t_1) \\ \vdots \\ s(t_{n_t}) \end{pmatrix}. \tag{7}$$

The entries of s(t_i) are called sensitivity coefficients. For a nominal parameter vector p*, given x_0, and fixed time t_k with k ∈ {1,..., n_t}, they are defined as

$$s_{ij}(t_k) = \left.\frac{\partial r_i(t_k; x_0, p, u)}{\partial p_j}\right|_{p = p^*}, \quad i \in \{1, \ldots, n_y\}, \ j \in \{1, \ldots, n_p\}. \tag{8}$$
Essentially, the sensitivity coefficients describe how sensitive the system output is to changes in a single parameter. If the model output is highly sensitive to a perturbation in one parameter, we can consider this parameter to be important for the system behavior. In contrast, a parameter that has no influence on the outputs is a candidate for an unidentifiable parameter. Linearly dependent columns of the sensitivity matrix imply that a change in the system outputs due to a change in one parameter, say p_j, can be compensated by changing some or all of the dependent parameters p_k, with k ≠ j. If dependencies exist, parameter estimation will fail or result in non-unique parameter values [26]. The correlation and the orthogonal method are based on this idea.
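As an illustration of how the sensitivity matrix S can be assembled, the sketch below approximates its columns by central finite differences around the nominal parameters. This is only a stand-in for the sensitivity computation via DDASPK used in this paper; the response function is assumed to be the one from the sketch above.

```python
# Finite-difference approximation of the sensitivity matrix S of equation (7).
# Column j holds the derivatives of all outputs at all sampling times with
# respect to parameter p_j, evaluated at the nominal parameters p*.
import numpy as np

def sensitivity_matrix(response, x0, p_star, u, t_grid, rel_step=1e-6):
    """Return S with shape (n_y * n_t, n_p)."""
    p_star = np.asarray(p_star, dtype=float)
    columns = []
    for j in range(p_star.size):
        dp = rel_step * max(abs(p_star[j]), 1.0)
        p_plus, p_minus = p_star.copy(), p_star.copy()
        p_plus[j] += dp
        p_minus[j] -= dp
        y_plus = response(x0, p_plus, u, t_grid)     # shape (n_t, n_y)
        y_minus = response(x0, p_minus, u, t_grid)
        columns.append(((y_plus - y_minus) / (2.0 * dp)).ravel())
    return np.column_stack(columns)
```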
For ease of presentation, we first give a detailed introduction to the eigenvalue method proposed here. The remaining methods are only summarized briefly, and the reader is referred to the appendix for more details. Since all methods analyze at-a-point identifiability, nominal values for the parameters are required. For the test cases treated here, parameter values are available from the literature [1–3] and are denoted by p^lit.
Eigenvalue method
The eigenvalue method is similar to the method published by Vajda et al. [26]. The eigenvalue method proposed here differs from the approach of Vajda and coworkers in that a manual inspection of eigenvalues and eigenvectors is not necessary. As a result, an automatic analysis of large models becomes feasible. We first describe the eigenvalue method and subsequently discuss the differences to the method published by Vajda et al. [26] in more detail.
We consider the least squares parameter estimation problem that amounts to minimizing cost functions of the form

$$\phi(p) = \sum_{i=1}^{n_y} \sum_{j=1}^{n_t} \bigl( y_i(t_j) - r_i(t_j; x_0, p, u) \bigr)^2 \tag{9}$$

with respect to p, where the y_i(t_j) are the measured or simulated data introduced in equation (2). In Gaussian approximation, the Hessian matrix H of equation (9) has the entries

$$H_{kl} = 2 \sum_{i=1}^{n_y} \sum_{j=1}^{n_t} \frac{\partial r_i(t_j)}{\partial p_k}\, \frac{\partial r_i(t_j)}{\partial p_l} \tag{10}$$

for k, l ∈ {1,..., n_p}. As indicated in equation (10), H can be expressed in terms of the sensitivity matrix S introduced in equation (7), H = 2 SᵀS. H is a symmetric matrix and therefore its eigenvalues are real. Furthermore, H is positive semi-definite.
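A minimal sketch of this construction, assuming the sensitivity matrix S has been computed beforehand (for example with the finite-difference sketch above):

```python
# Gaussian approximation of the Hessian (10) assembled from S, and its spectrum.
import numpy as np

def gauss_newton_hessian(S):
    """H = 2 S^T S, the Gaussian approximation of the Hessian of (9)."""
    return 2.0 * S.T @ S

def smallest_eigenpair(H):
    """Smallest eigenvalue of the symmetric H and its normalized eigenvector."""
    eigvals, eigvecs = np.linalg.eigh(H)   # eigh returns ascending eigenvalues
    return eigvals[0], eigvecs[:, 0]
```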
In order to study the identifiability of the system in equation (1) for given x_0, u(t) and for nominal parameters p*, we first solve equation (1) by numerical integration and subsequently calculate H as given by equation (10). We set p* to the parameter values taken from the literature, p^lit, in all calculations. All identifiability tests carried out in the present paper therefore amount to asking whether the investigated signaling models are identifiable at the literature parameter values p^lit. As already pointed out in the Background section, identifiability tests of this type clarify whether a given model is reasonably detailed even before experiments are carried out. In this sense, the identifiability of a model at nominal or literature values for its parameters is considered to be a necessary condition for its practical identifiability with measured data.
Let λ_j and u_j denote the j-th eigenvalue and the corresponding eigenvector of H, respectively. Assume the eigenvalues to be ordered such that λ_1 ≤ λ_2 ≤ ... ≤ λ_{n_p}, assume the eigenvectors u_j to be normalized such that u_jᵀu_j = 1, and assume p* to minimize equation (9). Consider the change of ϕ when moving from p* in a direction αu_j for some real α. Gaussian approximation yields

$$\phi(p^* + \alpha u_j) - \phi(p^*) \approx \tfrac{1}{2}\, \alpha^2\, u_j^T H u_j = \tfrac{1}{2}\, \alpha^2 \lambda_j, \tag{11}$$

where the linear order term is zero, since p* is a minimum by assumption. Since Hu_j = λ_j u_j and u_jᵀu_j = 1, the last equality in equation (11) holds. Equation (11) implies that ϕ does not change when moving from p* to p* + Δp if Δp = αu_j for any eigenvector u_j with λ_j = 0 and any real α. In words, the directions u_j corresponding to λ_j = 0 are those directions in the parameter space along which the least squares cost function is constant. We call these directions degenerate for short. In the particular case of

$$u_j = (0, \ldots, 0, 1, 0, \ldots, 0)^T, \tag{12}$$

where the entry 1 is in position k, the model is not identifiable with respect to the k-th component p_k of p*. The approach proposed here will therefore remove this parameter from subsequent calculations by fixing p_k to its literature value p_k^lit. In general, however, u_j will not be of the special form (12), but a vector with more than one non-zero entry, so that the choice of k is not obvious. In this case, we select a k such that |u_{jk}| ≥ |u_{jl}| for all l, and remove the k-th component of p* from the parameter estimation problem by fixing it to its literature value. In the examples treated below this choice of k turns out to be appropriate in all cases. In general, this might not be the case, however. There might be entries u_{jl}, l ≠ k, with an absolute value |u_{jl}| close or equal to the maximal value |u_{jk}|. We call such entries u_{jl} and the corresponding parameters co-dominant, since together with u_{jk} they dominate the degenerate direction. A simple example for a system with co-dominant parameters is given in Appendix A. We will discuss the issue of co-dominant parameters for the examples treated here in the Results section.
In practical applications the smallest eigenvalue will typically not be zero but close to zero. In this case an eigenvalue cut-off value ϵ ≈ 0, ϵ > 0, needs to be specified. An eigenvalue λ_1 < ϵ is considered to be small enough to be treated as zero. Note that λ_j ≥ 0 for all j since H is positive semi-definite.
The proposed algorithm splits the set of parameter indices {1,..., n_p} of a given model into two disjoint subsets which we denote by I and U. Upon termination, the sets I and U = {1,..., n_p} \ I contain the indices of the identifiable and the unidentifiable parameters, respectively. We denote the current number of elements in I by n_I and the current elements of I by k_1,..., k_{n_I}. The algorithm proceeds as follows.
1. Choose Δt, t_f, and ϵ. Set p* = p^lit. Set I = {1,..., n_p}, n_I = n_p, and U = ∅.

2. If I is empty, stop. The model is not identifiable with respect to any parameter in this case.

3. Fix the parameters p_k, k ∈ U, to their literature values and consider only the p_k, k ∈ I, to be variable. Formally, this corresponds to setting p to (p_{k_1},..., p_{k_{n_I}})ᵀ and treating all remaining parameters p_k, k ∈ U, as fixed numbers.

4. Calculate ϕ(p*) according to equation (9). Calculate the Hessian matrix of ϕ with respect to the parameters p_k with k ∈ I and evaluate it at p = p*. Calculate the eigenvalues λ_j and corresponding eigenvectors u_j of the Hessian matrix. Assume the eigenvalues to be ordered such that λ_1 ≤ λ_2 ≤ ... ≤ λ_{n_I}. Assume the eigenvectors to be normalized.

5. If λ_1 ≥ ϵ, stop. The model is identifiable with respect to the parameters p_k for all k ∈ I.

6. If λ_1 < ϵ, select k such that |u_{1k}| ≥ |u_{1l}| for all l ∈ I. Remove k from the set I, add k to the set U, set n_I to the current number of elements in I, and return to step 2.
The order in which parameters are removed from I determines the ranking of parameters from least identifiable to most identifiable.
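A compact sketch of steps 1 to 6 is given below. The callback sensitivities(I) is hypothetical: it is assumed to return the sensitivity matrix restricted to the parameter indices in I, recomputed with the parameters indexed by U fixed to their literature values, and stands in for the DDASPK-based computation used here.

```python
# Sketch of the iterative eigenvalue method (steps 1-6).
import numpy as np

def eigenvalue_method(sensitivities, n_p, eps):
    I = list(range(n_p))   # step 1: all parameters are candidates
    U = []                 # unidentifiable parameters, least identifiable first
    while I:               # step 2: stop if no candidate parameter is left
        S = sensitivities(I)                    # steps 3-4: sensitivities of the free parameters
        H = 2.0 * S.T @ S                       # Gaussian approximation of the Hessian (10)
        eigvals, eigvecs = np.linalg.eigh(H)    # ascending eigenvalues, normalized eigenvectors
        if eigvals[0] >= eps:                   # step 5: remaining parameters identifiable
            return I, U
        u1 = eigvecs[:, 0]                      # eigenvector of the smallest eigenvalue
        k_local = int(np.argmax(np.abs(u1)))    # step 6: dominant entry of the degenerate direction
        U.append(I.pop(k_local))                # fix the corresponding parameter and iterate
    return I, U            # I empty: not identifiable with respect to any parameter
```

The order in which indices are appended to U reproduces the ranking from least to most identifiable described above.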
The method is similar to the approach introduced by Vajda et al. [26]. These authors also analyze the eigenvectors that correspond to small eigenvalues of the Hessian matrix, but they focus on the dependencies between eigenvectors that arise due to special parameter combinations of the form p_1/p_2 or p_1·p_2 in the model. In a two-step procedure Vajda et al. first decide which parameters to lump together (for example, p_new = p_1·p_2) and subsequently recalculate the eigenvalues and eigenvectors for the system with the new lumped parameters. These two steps are repeated until the smallest eigenvalue is sufficiently large. The lumping step requires manual inspection of the eigenvalues and eigenvectors of the Hessian matrix. While the approach proposed by Vajda et al. worked very well for their example with 5 parameters, it becomes infeasible for models with considerably more parameters. The examples treated in the next section demonstrate that our approach, in contrast, can be applied to models with up to about a hundred parameters. Moreover, our approach can be carried out automatically, while the approach suggested by Vajda et al. was never intended for this purpose.
Correlation method
The correlation method was first introduced by Jacquez and Greif [25] who compared identifiability results achieved with the correlation method to analytical results obtained with a transfer function method (see [14] for a review). Jacquez and Greif [25] consider only small linear compartmental models with up to 3 states and 5 parameters. In most of the examples the correlation and the transfer function method were in agreement.
In the systems biology literature the correlation method was applied by Zak et al. [9] to investigate identifiability of a large genetic regulatory network (nonlinear, 44 states and 97 parameters). More recently, Rodriguez-Fernandez et al. [10] embedded the correlation method into their framework for robust parameter estimation in order to exclude unidentifiable parameters from parameter estimation.
The central idea of the correlation method is to find unidentifiable parameters by investigating the linear dependence of the columns of S (for details see Appendix B). The correlation method approximates linear dependence by calculating the sample correlation [35] of two columns S.i, S.j of S. The sample correlation is given by

$$\operatorname{corr}(S_{\cdot i}, S_{\cdot j}) = \frac{\operatorname{cov}(S_{\cdot i}, S_{\cdot j})}{\sigma(S_{\cdot i})\, \sigma(S_{\cdot j})}, \tag{13}$$

where

$$\operatorname{cov}(S_{\cdot i}, S_{\cdot j}) = \frac{1}{n_y n_t - 1} \sum_{k=1}^{n_y n_t} \bigl(S_{k i} - \bar{S}_{\cdot i}\bigr)\bigl(S_{k j} - \bar{S}_{\cdot j}\bigr), \tag{14}$$

$$\sigma(S_{\cdot i}) = \sqrt{\operatorname{cov}(S_{\cdot i}, S_{\cdot i})}, \tag{15}$$

$$\bar{S}_{\cdot i} = \frac{1}{n_y n_t} \sum_{k=1}^{n_y n_t} S_{k i}. \tag{16}$$

In equations (13)–(16), corr(S.i, S.j), cov(S.i, S.j), σ(S.i), and S̄.i denote the sample correlation between S.i and S.j, the sample covariance between S.i and S.j, the sample standard deviation of S.i, and the mean of the entries of S.i, respectively.
Two linearly dependent columns S.i and S.j of the sensitivity matrix give rise to an absolute value of the correlation |corr(S.i, S.j)| = 1. In the examples treated here, however, none of the parameters exhibit a correlation of exactly ±1. In the numerical implementation of the method, two columns of S are therefore considered correlated if the absolute value of their correlation is equal to or larger than 1 − ϵ_c, where ϵ_c ∈ [0, 1] is a parameter of the algorithm. We stress that while ϵ_c is a tuning parameter, the comparison of the identifiability methods is not affected by the choice of ϵ_c. As explained below, we systematically vary ϵ_c over its entire range 0 ≤ ϵ_c ≤ 1 in the comparison.
When applying the correlation method to the pathway model examples, two problems arise regularly that have not been discussed by Jacquez and Greif [25]. First, the method detects pairs of correlated parameters, but there is no criterion for deciding which parameter of a pair to consider unidentifiable. Second, if more than one pair of correlated parameters is detected, there is no criterion for choosing among the pairs.
In order to mitigate these ambiguities we introduce the total correlation

$$c_i = \sum_{\substack{j \in L \\ j \neq i}} \chi_{ij}, \qquad \chi_{ij} = \begin{cases} 1 & \text{if } |\operatorname{corr}(S_{\cdot i}, S_{\cdot j})| \geq 1 - \epsilon_c, \\ 0 & \text{otherwise}, \end{cases} \tag{17}$$

where L denotes the set of indices of those parameters that have not been found to be unidentifiable in any previous iteration; c_i thus counts the remaining parameters that are correlated with parameter i at the current threshold. We select the parameter with the highest total correlation for removal. Some cases remain, however, in which two parameters have the same total correlation. In these cases we pick one parameter at random.
In order to be able to compare results to those of the other methods, we use the correlation method to rank model parameters from least identifiable to most identifiable. More precisely, we initially set ϵ_c = 0 and L = {1,..., n_p} and calculate the total correlations (17) for all parameters in L. If parameters with nonzero total correlations exist, we select the parameter with the highest total correlation to be unidentifiable and remove its index from L. If no such parameter exists we increase ϵ_c until nonzero total correlations occur. The order in which parameters are removed from L creates a ranking of parameters from least to most identifiable. The algorithm is described in detail in Appendix B. Based on the ranking of parameters the method is compared to the other three approaches. The comparison is explained in detail in the subsection entitled Method comparison.
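The ranking loop can be sketched as follows. The indicator-count form of the total correlation used below is one possible reading of equation (17), and ties are broken by taking the first index instead of a random one.

```python
# Sketch of the correlation-method ranking from least to most identifiable.
import numpy as np

def correlation_ranking(S, eps_step=1e-3):
    n_p = S.shape[1]
    corr = np.nan_to_num(np.corrcoef(S, rowvar=False))  # column correlations; constant columns -> 0
    np.fill_diagonal(corr, 0.0)                          # ignore trivial self-correlations
    L = list(range(n_p))                                 # parameters not yet declared unidentifiable
    ranking = []                                         # least identifiable parameters first
    eps_c = 0.0
    while len(L) > 1 and eps_c <= 1.0:
        sub = np.abs(corr[np.ix_(L, L)])
        totals = (sub >= 1.0 - eps_c).sum(axis=1)        # total correlations, cf. equation (17)
        if totals.max() == 0:
            eps_c += eps_step                            # no correlated pair yet: relax the threshold
            continue
        worst = L[int(np.argmax(totals))]                # highest total correlation
        ranking.append(worst)
        L.remove(worst)
    return ranking + L                                   # remaining parameters are most identifiable
```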
Principal component analysis (PCA) based method
Degenring et al. [27] introduce three criteria based on PCA that rank the parameters by their influence on the model output. Parameters that do not affect the model outputs are unidentifiable by definition and can be removed from the model equations. Degenring et al. successfully use their approach to simplify a complex metabolism model of Escherichia coli K12. A complete description of how to obtain a full ranking using the three criteria can be found in [28]. A pseudo code description of the method can be found in Appendix C.
The PCA based method differs from the other methods described here in that it does not use the sensitivity matrix S as defined in equation (7) but a truncated matrix which we refer to as Ŝ^(i). This matrix is defined as

$$\hat{S}^{(i)}_{kj} = \frac{\partial r_i(t_k)}{\partial p_j}, \quad k \in \{1, \ldots, n_t\},\ j \in \{1, \ldots, n_p\}, \tag{18}$$

where i ∈ {1,..., n_y}. While the sensitivity matrix S as defined in equation (7) describes all responses in one matrix, a separate matrix Ŝ^(i) is created for each response r_i.
Three PCA based criteria are applied to each of the n_y matrices Ŝ^(i). As a result, 3n_y parameter rankings are created, which we represent by 3n_y lists I^(i)_j of integers I^(i)_{j,q}, where j ∈ {1, 2, 3} and i ∈ {1,..., n_y}. The first ranking position I^(i)_{j,1} and the last ranking position I^(i)_{j,n_p} contain the index of the parameter with the lowest and the highest influence on the model output r_i, respectively. The lower index j ∈ {1, 2, 3} denotes the criterion by which the ranking is calculated. We explain one of the criteria in more detail in order to give the reader an idea of the method. The remaining two criteria are summarized in Appendix C. All 3n_y rankings are integrated into one ranking by applying a strategy described in [28], which we briefly summarize below.
Let λ^(i)_j and v^(i)_j denote the j-th eigenvalue and corresponding eigenvector of (Ŝ^(i))ᵀŜ^(i), respectively, and assume the eigenvalues to be ordered such that λ^(i)_1 ≤ λ^(i)_2 ≤ ... ≤ λ^(i)_{n_p}. For each i ∈ {1,..., n_y} evaluate the first criterion as follows.
1. Set L_1 = {1,..., n_p}.

2. For q from 1 to n_p do:
   - Find r_l ∈ L_q such that |v^(i)_{q,r_l}| ≥ |v^(i)_{q,m}| for all m ∈ L_q.
   - Set I^(i)_{1,q} = r_l.
   - Set L_{q+1} = L_q \ {r_l}.
The first criterion loops over the eigenvectors, starting with the eigenvector that belongs to the smallest eigenvalue. It identifies the largest absolute entry of each eigenvector and selects the corresponding parameter for removal, if this parameter has not been selected yet. The index of the k-th parameter selected for removal is ranked at position k, where k ∈ {1,..., n_p}.
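A short sketch of this first criterion for a single output r_i, assuming its truncated sensitivity matrix is available as a NumPy array; the eigendecomposition of (Ŝ^(i))ᵀŜ^(i) stands in for the PCA step.

```python
# Sketch of the first PCA criterion: rank parameters by walking through the
# eigenvectors from the smallest to the largest eigenvalue and, for each one,
# selecting the not-yet-ranked parameter with the largest absolute loading.
import numpy as np

def pca_criterion_1(S_i):
    """S_i: truncated sensitivity matrix of one output, shape (n_t, n_p)."""
    n_p = S_i.shape[1]
    _, eigvecs = np.linalg.eigh(S_i.T @ S_i)    # eigenvectors, ascending eigenvalues
    remaining = set(range(n_p))
    ranking = []                                # least influential parameters first
    for q in range(n_p):
        loadings = np.abs(eigvecs[:, q])
        r_l = max(remaining, key=lambda m: loadings[m])   # largest entry not ranked yet
        ranking.append(r_l)
        remaining.remove(r_l)
    return ranking
```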
The remaining two criteria of the PCA based method are similar but differ with respect to the process of choosing r_l. For brevity we omit the details and refer the reader to Appendix C. A final ranking J that incorporates all 3n_y previously calculated rankings is obtained as follows. Set N_q = ∅ for all q = 1,..., n_p, and set J to an empty list.
For q from 1 to n_p do:

- For all j ∈ {1, 2, 3} and all i ∈ {1,..., n_y} set N_q = N_q ∪ {I^(i)_{j,q}}.
- Set N_q = N_q \ (N_1 ∪ ... ∪ N_{q-1}).
- Only if N_q ≠ ∅, append N_q to the list J.
First note that an iteration q might exist in which no N_q is appended to J. Therefore the final number n_J of elements in J is not necessarily equal to n_p but might be smaller. Further note that entries of J do not necessarily correspond to indices of single parameters but to sets of parameter indices. Therefore, a ranking position J_i might contain several parameter indices, an internal ranking of which cannot be obtained. This ambiguity hampers not only the interpretation of the ranking result but also the comparison with the rankings produced by the other methods. We describe a strategy to deal with the latter problem in the section entitled Method comparison.
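The integration step can be sketched as the small routine below; it assumes the 3n_y individual rankings are given as Python lists, each a permutation of the parameter indices.

```python
# Sketch of merging the 3*n_y rankings into the final ranking J: position q of
# J collects all parameter indices ranked at position q by any criterion and
# output, minus the indices that already appeared at an earlier position.
def merge_rankings(rankings, n_p):
    """rankings: list of 3*n_y lists, each a permutation of range(n_p)."""
    J, seen = [], set()
    for q in range(n_p):
        N_q = {ranking[q] for ranking in rankings} - seen
        if N_q:                    # some positions stay empty, so len(J) <= n_p
            J.append(N_q)
            seen |= N_q
    return J
```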
Orthogonal method
The orthogonal method was developed by Yao et al. [24] to analyze parameter identifiability of an ethylene/butene copolymerization model with 50 parameters. In the field of systems biology the method has been applied in a framework for model identification [36] and in the identifiability analysis of an NF-κB model [37].
We summarize the concept of the method and refer the reader to Appendix D for details. The method iterates over the columns S.k of the sensitivity matrix S, with k ∈ {1,..., n_p}. Those columns of S that correspond to identifiable parameters are collected in a matrix X_q, where q is the iteration counter. The algorithm starts by selecting the column of S with the highest sum of squares in the first iteration. In iteration q + 1, q columns of S have been selected. In the order of their selection these columns form the matrix X_q. The next step essentially amounts to selecting the column S.k that exhibits the highest independence of the vector space V spanned by the columns of X_q. More precisely, an orthogonal projection Ŝ.k of S.k onto V is calculated, and the residual R.k = S.k − Ŝ.k is interpreted as the shortest connection from V to S.k. The squared length ||R.k||² is used as a measure of independence. If the length of R.k is near zero, S.k is nearly linearly dependent on the columns of X_q. Conversely, a large value of ||R.k||² indicates that parameter p_k is linearly independent of the columns of X_q. In the orthogonal method the parameter p_l with the largest squared residual length, l = arg max_k ||R.k||², is selected as the next identifiable parameter.
Essentially, the orthogonal method finds those columns of S that are as independent as possible. The quantity ||R.k||² can be interpreted as a measure of independence that has been freed from dependencies on previously selected parameters. Therefore not only the influence of a parameter on the outputs, but also the dependencies between the parameters are accounted for by using ||R.k||² as a measure. A visualization of the orthogonal method can be found in Appendix D.
Yao et al. [24] stop the selection process once the length of the maximal residual R.k drops below a cut-off value ϵ_o ≈ 0, the choice of which is rather arbitrary. In the present paper we do not apply this cut-off criterion but merely rank all parameters from most to least identifiable, where the terms most identifiable and least identifiable are used in the same sense as in the correlation method. A comparison to the results of the other methods is therefore not affected by the choice of ϵ_o. Our approach to comparing the methods is explained in the section entitled Method comparison.
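A sketch of this ranking variant of the orthogonal method (no cut-off ϵ_o, parameters ordered from most to least identifiable) might look as follows; the projection onto the span of the already selected columns is computed with the Moore-Penrose pseudo-inverse.

```python
# Sketch of the orthogonal method as a pure ranking procedure.
import numpy as np

def orthogonal_ranking(S):
    n_p = S.shape[1]
    selected = [int(np.argmax((S ** 2).sum(axis=0)))]  # start with the largest sum of squares
    while len(selected) < n_p:
        X = S[:, selected]                             # X_q: columns selected so far
        proj = X @ np.linalg.pinv(X)                   # orthogonal projector onto span(X_q)
        best_k, best_norm = None, -1.0
        for k in range(n_p):
            if k in selected:
                continue
            residual = S[:, k] - proj @ S[:, k]        # shortest connection from span(X_q) to S.k
            norm2 = float(residual @ residual)         # squared residual length
            if norm2 > best_norm:
                best_k, best_norm = k, norm2
        selected.append(best_k)                        # most independent column is selected next
    return selected                                    # most identifiable parameters first
```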
Method comparison
The central idea behind our comparison of the methods is the equivalence between the local identifiability of a model and the local positive definiteness of the Hessian matrix (10) at a minimum of the least squares parameter cost function (9) for this model. Due to this theoretical result, which has been established by Grewal and Glover [38], we can test local identifiability at a point in the model parameter space by testing the Hessian matrix (10) for positive definiteness at this point. Since the Hessian matrix is positive definite if and only if all its eigenvalues are strictly positive, we consider a model to be identifiable if the eigenvalues are bounded below by a small strictly positive number ϵ. Note the eigenvalue method proposed here is based on the same idea in that it selects those parameters to be unidentifiable that cause eigenvalues smaller than ϵ.
By virtue of the relation between local at-a-point identifiability and the eigenvalues of the Hessian matrix, we can compare all four identifiability testing methods to one another with the same criterion and a unique parameter value ϵ. For a given model we first create a ranking of the parameters from least to most identifiable with each method, as described in the previous four sections. For each parameter ranking we then fix the parameter considered to be the least identifiable to its literature value, calculate the Hessian matrix (10) with respect to the remaining parameters, and determine the smallest eigenvalue. In all following steps we additionally fix the next least identifiable parameter and recalculate the Hessian and its smallest eigenvalue. This process is carried out until the smallest eigenvalue exceeds ϵ. The parameters that need to be fixed in order for the smallest eigenvalue to exceed ϵ are considered to be the unidentifiable parameters. Clearly, the smaller the set of unidentifiable parameters, the larger the set of identifiable parameters or, equivalently, the larger the number of parameters for which the least squares parameter estimation problem can be solved. In this sense, the method that results in the smallest number of unidentifiable parameters is considered the best one in our comparison.
Note that, in contrast to the other methods, a ranking position created with the PCA based method does in general not contain one but several parameters. In such a case we fix all parameters ranked at the same position at once and reevaluate the smallest eigenvalue of the Hessian matrix. The last ranking position J_{n_J} created by the PCA based method contains the indices of all parameters that have not been ranked previously. Fixing the parameters p_k, with k ∈ J_{n_J}, would therefore not leave any parameter unfixed. For this reason we consider the PCA method to have failed if fixing all parameters with indices in J_{n_J} does not lead to a value of the smallest eigenvalue larger than or equal to ϵ.
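The comparison loop can be sketched as follows. The callback hessian_for(free) is hypothetical and is assumed to return the Hessian (10) computed with respect to the still-free parameter indices; ranking entries may be single indices or, for the PCA based method, sets of indices.

```python
# Sketch of the comparison criterion: fix ranked parameters group by group,
# starting with the least identifiable, until the smallest eigenvalue of the
# Hessian of the remaining parameters reaches eps.
import numpy as np

def count_unidentifiable(ranking, n_p, hessian_for, eps):
    free = set(range(n_p))
    fixed = []
    for entry in ranking:
        group = set(entry) if isinstance(entry, (set, frozenset, list, tuple)) else {entry}
        H = hessian_for(sorted(free))
        if np.linalg.eigvalsh(H)[0] >= eps:   # remaining free parameters are identifiable
            break
        if not free - group:                  # fixing this group would leave nothing free
            return None                       # the method is considered to have failed
        free -= group
        fixed.extend(sorted(group))
    return len(fixed)                         # number of unidentifiable parameters (smaller is better)
```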
Pathway model description
Having introduced the methods to be compared, we briefly summarize the three pathways and the corresponding models used for the identifiability studies.
JAK-STAT pathway
The Janus Kinase (JAK) – signal transducer and activator of transcription (STAT) pathway is triggered by cytokines and growth factors. The pathway has an impact on the expression of genes that regulate diverse cellular processes, such as cellular proliferation, differentiation, and apoptosis. A detailed model of the Interferon-induced JAK-STAT pathway is given by Yamada et al. [2]. This model describes the following signal transducing steps. Upon ligand binding, the receptor dimerizes, thus triggering the activation of receptor-associated JAK. Activated JAK immediately phosphorylates the receptor complex, enabling the binding and subsequent phosphorylation of cytoplasmic STAT. Phosphorylated STAT can dimerize, and finally, dimerized STAT is imported into the nucleus. There it activates target genes, one of which is the suppressor of cytokine signaling 1 (SOCS1), an important mediator of negative feedback.
EGF induced MAP kinase pathway
Mitogen-activated protein (MAP) kinases are involved in mitosis and the regulation of cellular proliferation. The Epidermal Growth Factor (EGF)-induced MAP kinase pathway is centered on a three-kinase cascade that terminates with the dual phosphorylation of the third, so-called MAP kinase. EGF stimulation leads to dimerization of the EGF receptor. The dimerized EGF receptor triggers the phosphorylation of Raf, the first kinase of the cascade. Raf, in turn, phosphorylates the mitogen extracellular kinase, MEK, the second kinase in the cascade, which finally phosphorylates the MAP kinase (extracellular signal-regulated kinase, ERK). Phosphorylated ERK regulates several proteins and nuclear transcription factors and controls the expression of target genes. The model published by Schoeberl et al. [3] additionally includes the internalization of the EGF receptor and two further modules: an SH2-domain containing protein (SHC) dependent module and an SHC independent one.
NF-κB pathway
Nuclear Factor-κB (NF-κB) is a transcription factor which regulates genes involved in inflammation, immune response, cell proliferation, and apoptosis. In the unstimulated cell, NF-κB is captured in the cytoplasm by the inhibitor protein IκBα (inhibitor of NF-κB). Upon stimulation by pathway activating signals such as the Tumor Necrosis Factor (TNF), the IκB kinase (IKK) is activated. Activated IKK phosphorylates IκBα, thus inducing its degradation. As a consequence, NF-κB is released, translocates into the nucleus, and regulates effector genes. NF-κB induces its own downregulation by inducing the production of two proteins: 1) IκBα, which inhibits NF-κB by relocating it to the cytoplasm, and 2) the zinc-finger protein A20, which represses IKK and consequently indirectly inhibits NF-κB.