 Research
 Open Access
 Published:
The decrease of consistence probability: at the crossroad of catastrophic transition of a biological system
BMC Systems Biology volume 10, Article number: 50 (2016)
Abstract
Background
Unlike traditional detection of a disease state in which there are clear phenomena, it is usually a challenge to identify the predisease state during the progression of a complex disease just before the serious deterioration, not only because of the high complexity of the biological system, but there may be few clues and apparent changes appearing until the catastrophic critical transition occurs.
Results
In this work, by exploiting the different dynamical features between the normal and predisease states, we present a hiddenMarkovmodel (HMM) based computational method to identify the predisease state and elucidate the essential mechanisms during the critical transition at the network level. Specifically, by considering the network variation and regarding that the predisease state is the end or shiftpoint of a stationary Markov process, a consistence score is proposed to measure the probability that a system is in consistency with the normal state. As validation, this approach is applied to detect the upcoming critical transition of complex systems based on both the dataset generated from a simulated network and the rich information provided by highthroughput microarray data. The effectiveness of our method has been demonstrated by the identification of the predisease states for two real datasets including HCVinduced hepatocellular carcinoma and virusinduced influenza infection.
Conclusion
From dynamical view point, the criticaltransition phenomena in many biological processes are of some generic properties, which can be detected by the established method.
Background
Recently, evidence suggests that the deterioration of many complex diseases is not necessarily smooth but abrupt, that is, the sudden change of system state exists widely during the progression of complex diseases. For example, some chronic diseases such as cancer, the malignant deterioration may arise within a period of shorttime progression, while before such catastrophic transitions the disease such as chronic inflammation may progress gradually for years of long incubative duration [1–5]. In other words, during the progression of illness there is a sudden critical state transition from a relatively healthy stage to a seriously diseased stage. For many complex diseases, it is crucial to detect such critical state transition in advance so as to prevent or at least get ready for such a catastrophic event. However, it is still a challenge work to signal the upcoming critical deterioration since the state of the system may show little apparent change before the tipping point is really reached. This is also the reason why diagnosis based on traditional biomarkers may fail to indicate a predisease state. A possible approach to study the warning signal of the sudden deterioration is to explore and analyze the dynamical features generated from the early abnormalities in distinct timeseries prior to the emergence of the apparent malignancy. Therefore, in order to describe the underlying dynamical mechanism of complex diseases, their evolutions are often modeled as timedependent nonlinear dynamical systems, in which the abrupt deterioration or qualitative transition is viewed as the state transition or phase shift at a bifurcation point [6]. We particularly focus on the complex diseases with sudden deterioration phases or critical transition points during their progressions.
It was previously hypothesized that the disease progression can be modeled into three states (Fig. 1 a): (A) a normal state (or a beforetransition stage), representing a relatively healthy stage with high stability to external perturbations; (B) a predisease state (or a pretransition stage), defined as the prelude to catastrophic deterioration into the disease state, occurring before the imminent phase transition point is reached, therefore, with low stability due to its dynamical structure; (C) a disease state (or an aftertransition stage), representing a seriously deteriorated stage possibly with high stability, because the system usually finds it difficult to recover or return to the normal state even after treatment [7–9]. This is supported by the observations that there is usually sudden health catastrophic shift during the gradual progression of many chronic diseases [10–13]. Recently, a concept called dynamical network biomarker (DNB) was presented to detect the impending critical transition, or equivalently, the predisease state [14, 15]. The DNB method and its subsequent modifications have been successfully applied to real biological and clinical data, and identified the earlywarning signals of the sudden deterioration of several complex diseases [16–21].
In this work, by exploring the distinct dynamical features between the correlation networks respectively generated in normal and predisease state, we developed a computational method on the basis of the hidden Markov model (HMM) for identifying the predisease state before the critical point is really reached during the biological process of complex diseases. Specifically, it is natural to model the progression of a biological system in a normal state as a stationary Markov process, since the normal state is a stable state and with high resilience. The predisease state is modelled as the timevarying Markov process due to its unstable nature and high sensitivity to even small perturbation. The disease state is another stationary Markov process in view of its high stability (see Fig. 1 a). Identifying the predisease state is then equivalent to detecting the end of the stationary Markov process. Utilizing the timecourse data, we presented the computational method and algorithm on estimating the possibility of supposed termination of Markov process at each candidate sampling point. Specifically, by exploring the critical phenomena of network structure in dynamics (Fig. 1 b), a consistence score (Cscore) was proposed to signal the upcoming critical transition, i.e., the drastic decrease of Cscore implies the onset of a predisease state, in contrast to the relatively smooth Cscore in either a normal or disease state (Fig. 1 c). To demonstrate the effectiveness of our method, we applied the algorithm to a simulated regulation network and two sets of real data, the microarray dataset of HCVinduced dysplasia and hepatocellular carcinoma (HCC) (GSE6764) and live influenza infection (humans) caused by H3N2 virus (GSE30550). The predisease states were successfully identified for both numerical simulation and real datasets, and thus signaling the imminent critical transitions.
Methods
We first present the theoretical basis, i.e., the dynamical properties of a complex system near the tipping point, and then illustrate the preprocessing of real datasets and the detail algorithm.
Theoretical basis
Disease progression or its biological process can be generally divided into three states or stages, i.e., (A) the normal stage, (B) the predisease stage, and (C) the disease stage (Fig. 1 a). The normal stage is a stable state with high resilience and robustness stage, during which the state may change slowly and thus is modelled as a stationary Markov process. The predisease stage is unstable and defined as the limit of the normal stage just before the occurrence of catastrophic phase shift. It is sensitive to perturbation including noise or external interference that leads to the change of system parameters, thus still reversible to the normal stage given appropriate interventions. Therefore, the system progression during a predisease stage is considered as a timevarying Markov process, during which the statetransition probability may fluctuate from time to time. However, further progression of the illness led by persistent effects of perturbation may trigger a drastic state change into the disease stage, the other stable state described as the second stationary Markov process, which is usually difficult to return to the normal state even with intensive interventions. Hence, it is crucial to detect the predisease state so as to prevent qualitative deterioration into an irreversible stage. On the basis of the above settings, detecting the imminent critical transition is equivalent to identifying the end point (or switching point) of the stationary Markov process (Fig. 1). Besides, we investigate the different dynamical features between the correlation network respectively generated from normal and predisease state, i.e., comparing the differential links from adjacent time points.
Based on such study design, we carry out theoretical derivation in the following sections.
Markov process of the network evolution near the critical point
We describe the theoretical derivation of our computational method, and introduce the qualitative behaviors in dynamics of biological variables to characterize the critical transition. The dynamics for the progression of complex diseases is very complicated either before or after the critical transition, and therefore the state equations are generally constructed in a highdimensional space with a large number of variables and parameters. Therefore, it is a difficult task to construct an accurate mathematical model describing the dynamical behavior of the system during the biological process. Thus we aim at developing a modelfree method to detect the critical signal.
We consider a discretetime dynamical system in generic form
where Z(k)=(z _{1}(k),…,z _{ n }(k)) is an ndimensional state vector or variables at time instant k that represents gene or protein expressions, while P=(p _{1},…,p _{ s }) is a parameter vector or driving factors that represent slowly changing factors, e.g., genetic factors (SNP, CNV, etc.) and epigenetic factors (methylation, acetylation, etc.). f:R ^{n}×R ^{s}→R ^{n} are generally nonlinear functions. Furthermore, the following conditions are assumed to be held for system (1). (1) \(\bar {Z}\) is a fixed point of system (1) such that \(\bar {Z}= f(\bar {Z}; P)\). (2) There is a value P _{ c } such that one or a pair of eigenvalues of the Jacobian matrix \(\left.\frac {\partial f(Z; P_{c})}{\partial Z}\right _{Z=\bar {Z}}\) is equal to 1 in the modulus. (3) When P≠P _{ c }, the eigenvalues of (1) are not always equal to 1 in the modulus. These three conditions with other transversal conditions imply that the system undergoes a phase change at \(\bar {Z}\) or a codimensionone bifurcation when P reaches the threshold P _{ c }.
For system (1) near \(\bar {Z}\), before P reaches P _{ c }, the system is supposed to stay at a stable fixed point \(\bar {Z}\) and therefore all the eigenvalues are within (0,1) in modulus. The parameter value P _{ c } at which the state shift of the system occurs is called a bifurcation parameter value, or a critical transition value.
Now we consider the linearized approximate equations of Eq. (1). Specifically, by introducing new variables Y(t)=(y _{1}(t),…,y _{ n }(t)) and a fullrank transformation matrix S=(s _{ ij })_{ n×n } satisfying J=S Λ S ^{−1}, i.e.,
we have
where ζ=(ζ _{1},…,ζ _{ n }) are small Gaussian noise with zero means. ζ _{ i } has a small standard deviation σ _{ i } for all i, and covariances κ _{ ij }=Cov(ζ _{ i },ζ _{ j }).
Without loss of generality, the diagonalized matrix Λ=(λ _{1},…,λ _{ n }) is assumed to have each λ _{ i } between 0 and 1. Among the eigenvalues of Λ, the largest one (in modulus), say λ _{1}, first approaches to 1 in modulus when parameter transition P→P _{ c } occurs. The eigenvalue λ _{1} characterizes the system’s rate of change around the fixed point and is called the dominant eigenvalue. The normal state corresponds to the period with λ _{1}<1, whereas the predisease stage corresponds to the period with λ _{1}→1. Without the loss of generality, the first variable y _{1} in Y is assumed to be associated with λ _{1}. Calculating the statistical indices, it is clear that the Pearson’s correlation coefficient (PCC) is of the following expression
Obviously, there are three cases as follows.

when s _{ i1}≠0 and s _{ j1}≠0, \(\lim \limits _{\lambda _{1}\rightarrow 1}\text {PCC}(z_{i}, z_{j})\rightarrow 1\);

when s _{ i1}≠0 and s _{ j1}=0, \(\lim \limits _{\lambda _{1}\rightarrow 1}\text {PCC}(z_{i}, z_{j})\rightarrow 0\);

when s _{ i1}=0 and s _{ j1}=0, \(\lim \limits _{\lambda _{1}\rightarrow 1}\text {PCC}(z_{i}, z_{j})\rightarrow P_{ij}\), where P _{ ij } is a bounded value.
Hence, close to a tipping point, among the original variables Z=(z _{1},...,z _{ n }) there is a dominant group which is composed of dominant variables z _{ i }=s _{ i1} y _{1}(k)+⋯+s _{ in } y _{ n }(k) with s _{ i1}≠0. It is clear from the above derivation that the correlation between a pair of dominant variables increases sharply as the dominant eigenvalue λ _{1}→1, while the correlation between a dominant variable and any other molecule decreases sharply. It also should be noted that such critical change of the correlation only appears when the system approaches to the critical tipping point, or equivalently, the system is in a predisease state. By employing this dynamical feature between a normal state (when the system is far from the tipping point) and a predisease state (when the system is in the vicinity of the tipping point), it is possible to detect the earlywarning signal of the critical transition based on the hidden Markov model.
Identifying the end of Markov process and the algorithm of HMMbased method
Based on the dynamical characteristics of a complex biological system and the discussion above, it is natural to regard the critical transition as the switch from a stationary Markov process (i.e., the normal state) to a timevarying Markov process (i.e., the predisease state). Therefore, identifying the predisease state is equivalent to detecting the end point or switch point of a stationary Markov process. To present the computational method, we first introduce the following symbols.

Denote the stationary Markov process as M _{1}, and the timevarying Markov process as M _{2}.

Denote the time variable as t, and the progression of the system along time series as t∈{1,2,...,T−1,T,...}.

Denote the observed sequence up to time point t as O={o _{1},o _{2},...,o _{ t−1},o _{ t }}, where o _{ t } represents the sample set derived at time point t.

Denote the state sequence up to time point T as {s _{1},s _{2},...,s _{ T−1},s _{ T }}, i.e., the state of the system is s _{ T } at time point t=T, or equivalently, s _{ T }=S t a t e(o _{ T }).
Specifically, it is assumed that a biological system is initially in the normal state, or equivalently, the progression of the system is in a stationary Markov process M _{1}. Then for the progression of the system along a time series {1,2,...,T−1,T,...}, we propose a consistence score (Cscore) to measure the probability of the system being in the same stationary Markov process, i.e.,
For each candidate time point t=T, the high value of Cscore presents that the progression of the system at t=T is consistent with the stationary Markov process M _{1}, i.e., it is still in the stationary Markov process, while the sudden decrease of Cscore illustrates the low consistence with M _{1} (Fig. 1 c), and the progression of the system is no longer in the stationary Markov process. Therefore, the abrupt change of Cscore identifies the predisease state and indicates the upcoming critical transition.
From the third time point or sampling stage during a time series, we regard each point/stage as a candidate transition point/stage. In order to validate whether a candidate time point t=T (T=3,4...) is the changing or switching point from the stationary Markov process to the timevarying Markov process, we carry out an iterative process as the following two steps.

1.
Train a hidden Markov model (HMM) θ _{ T−1}=(A,B,π) on the basis of an observed sequence {o _{1},o _{2},...,o _{ T−1}}, i.e., the preceding T−1 sets of samples generated from time points 1,2,...,T−1. The stationary Markov process in the normal state is actually described by the trained HMM.

2.
Calculating the Cscore based on the observation {o _{ T }} and the trained HMM θ _{ T−1}. If there is a drastic decrease of Cscore, then the iterative process end up with t=T being the switching point, at which the biological system is in the predisease stage. Otherwise go to back to the training step for next time point t=T+1.
First, to train an HMM θ _{ T−1}=(A,B,π) where the subscript T−1 of θ represents that the HMM is derived from the training samples up to time point t=T−1, we need to estimate a state transition matrix A, an emission matrix B, and a probability vector for the initial state π. For a network with n nodes and m links where each node represents a biomolecule and each link represents the correlation between two nodes, suppose at a sampling time point t∈{1,2,...,T−1,T,...} there are w samples for each node z _{ i }, i.e., \(\{{z_{i}^{1}} (T1),{z_{i}^{2}} (T1),...,{z_{i}^{w}} (T1)\}\). Then through leavingoneout procedure we obtain w Pearson’s correlation coefficients (PCCs) between any two nodes z _{ i } and z _{ j }, i.e., {P C C _{1}(z _{ i },z _{ j }),P C C _{2}(z _{ i },z _{ j }),...,P C C _{ w }(z _{ i },z _{ j })} where each P C C _{ k }(z _{ i },z _{ j }) (k=1,2,...,w) is calculated based on w−1 samples for z _{ i } and z _{ j }. To train A and B based on an unsupervised learning procedure, we have the following steps.
A. Estimate the distribution of each link at a former time point (T−2) Under the assumption that each correlation coefficient follows Gaussian distribution, we obtain the estimation of the distribution for link P C C(z _{ i },z _{ j })_{ T−2} between two nodes z _{ i } and z _{ j } at time point T−2, i.e., based on the w correlation coefficients, we estimate the mean μ _{ k }(T−2) and standard deviation σ _{ k }(T−2) for each link P C C(z _{ i },z _{ j })_{ T−2}. Then we have the distribution \(N(\mu _{k} (T2), {\sigma ^{2}_{k}} (T2))\).
B. Determine the consistence vector for each variable at (T−1) At time point t=(T−1), we have m links for the network, i.e., l i n k _{ k }(T−1)=P C C(z _{ i },z _{ j })_{ T−1} between two nodes z _{ i } and z _{ j } with k=1,2,...,m, and for each link there are w samples through leaveoneout procedure, i.e., \(link_{k}(T1)=\{lin{k_{k}^{1}},lin{k_{k}^{2}},...,lin{k_{k}^{w}}\}\). Let an index \({L_{k}^{s}} (T1)\in \{0,1\}\) describe whether a correlation \(lin{k_{k}^{s}}\) is consistent comparing with its former distribution \(N(\mu _{k} (T2), {\sigma ^{2}_{k}} (T2))\), that is, whether the appearance of link at time T−1 is with large probability in the distribution \(N(\mu _{k} (T2), {\sigma ^{2}_{k}} (T2))\). For each correlation l i n k _{ k }(T−1) at time point T−1, we have
Obviously, \({L_{k}^{s}} (T1)=0\) represents that the correlation l i n k _{ k }(T−1) is consistent with the former distribution \(N(\mu _{k} (T2), {\sigma ^{2}_{k}} (T2))\), while x _{ k }(T−1)=1 represents that the correlation l i n k _{ k }(T−1) is inconsistent with the former distribution \(N(\mu _{k} (T2), {\sigma ^{2}_{k}} (T2))\). Thus, for each sample of correlation \((lin{k_{1}^{s}} (T1),lin{k_{2}^{s}} (T1),...,lin{k_{m}^{s}} (T1))\), the vector \(L^{s} (t1)=({L_{1}^{s}} (T1),..., {L_{m}^{s}} (T1))\) is the consistence vector at time T−1.
Let #0(T−1) and #1(T−1) respectively denote the number of value 0 and that of value 1 in an consistence vector L ^{s}(T−1) at T−1. Obviously, #0(T−1) + #1(T−1)=m, where m is the number of links in the network, among which there are #0(T−1) variables consistent with the former distribution \(N(\mu _{k} (T2), {\sigma ^{2}_{k}} (T2))\), while #1(T−1) variables inconsistent with the former distribution \(N(\mu _{k} (T2), {\sigma ^{2}_{k}} (T2))\).
According to above settings, we actually transform the observed correlation sample set o _{ T−1}=(l i n k _{1}(T−1),l i n k _{2}(T−1),...,l i n k _{ m }(T−1)) into the corresponding consistence vector o _{ T−1}=(L ^{1}(T−1),L ^{2}(T−1),...,L ^{m}(T−1)).
C. Training the HMM at T−1 In this step, we need to identify the state transition matrix A and the emission matrix B at (T−1), that is, training the HMM θ _{ T−1}=(A(T−1),B(T−1),π) on the basis of an observed sequence {o _{1},o _{2},...,o _{ T−1}}.
There are two possible states W _{0} and W _{1} in time point t−1. Then, we calculate the possibilities for each possible state transition and thus obtain the state transition matrix A(T−1)=(a _{ ij }(T−1))_{2×2}, where
with i,j∈{1,2}.
Besides, for the emission matrix B(T−1)=(b _{ jk }(T−1))_{2×(m+1)} where b _{ jk }(T−1) is the probability of the kth possible observation under the assumption that the system state is W _{ j } at time t−1, i.e.,
where j∈{1,2} and k∈{0,1,2,...,m}. Obviously, there are m+1 possible observable cases for any correlation sample at t−1, i.e., case #1(T−1)=k with k∈{0,1,2,...,m}. In the case of an mlink biological network, case #1(T−1)=k reflects that there are k links differentially expressed in one observation (i.e., one sample) at T−1 comparing with their former expressions.
The initial state distribution π={π _{1},π _{2}} is defined at time T−2, where
with i∈{1,2}.
According to BaumWelch algorithm, we build A, B, and π based on the training set {o _{1},o _{2},...,o _{ T−1}}, i.e., sample sets up to time T−1. The training process at time T−1 includes the following three steps.

Initialization For h=0, set initial values for \(a_{ij}^{0}\), \(\,b_{jk}^{0}\), and \({\pi _{i}^{0}}\), we have the HMM θ ^{0}=(A ^{0},B ^{0},π ^{0}).

Update For h=1,2,..., we have the update for \(a_{ij}^{h}\), \(\,b_{jk}^{h}\), and \({\pi _{i}^{h}}\) by recursion
$${} a_{ij}^{h} =\frac{\sum\limits_{t=1}^{T1}\xi_{t}(i,j)}{\sum\limits_{t=1}^{T1}\gamma_{t}(i)}, \quad \!\!\!b_{jk}^{h} =\frac{\sum\limits_{t=1, \#1(T1)=k}^{T1}\gamma_{t}(k)}{\sum\limits_{t=1}^{T1}\gamma_{t}(k)}, \quad{\pi_{i}^{h}} =\gamma_{1}(i), $$(9)where
$$ \gamma_{t}(i)=P(s_{t}=M_{i}\,\,O,\, \theta_{p})=\frac{P\left(s_{t}=M_{i},\, O\,\,\theta_{p}\right)}{P(O\,\,\theta_{p})} $$(10)and
$$ {} {{\begin{aligned} \xi_{t}(i,j)=P\left(s_{t1}=M_{i}, s_{t}=M_{j}\,\,O,\, \theta_{p}\right)=\frac{P\left(s_{t1}=M_{i}, s_{t}=M_{j},\, O\,\,\theta_{p}\right)}{P\left(O\,\,\theta_{p}\right)} \end{aligned}}} $$(11)with i,j∈{0,1}. For γ _{ t }(i) and ξ _{ t }(i,j), the HMM θ _{ p } used in the prior knowledge is that updated from the preceding step. For example, at the first iterative step, the HMM θ _{ p } is θ ^{0}=(A ^{0},B ^{0},π ^{0}) based on the initial values. The observation sequence used in the prior knowledge is O={o _{1},o _{2},...,o _{ T−1}}.

Ending When h=H, i.e., the Hthupdating step, the recursion is terminated. Then
$$ {\theta_{i}^{H}} =\left(A^{H},B^{H},\pi^{H}\right). $$(12)The HMM used in the testing process follows \(\theta _{T1}={\theta _{i}^{H}}\).
Under the assumption that the transition point is at T, or in other word, time point T is hypothesized as the end point of a stationary Markov process of the normal stage (see Fig. 1). Thus the onset of a predisease stage is the end of the stationary Markov process described as the trained HMM θ _{ T−1}. Therefore, at testing step in a candidate transition point T, we calculate the consistence score, i.e., Cscore, based on the trained HMM θ _{ T−1}=(A(T−1),B(T−1),π).
According to the Markov chain, the Cscore is
The numerator
and the denominator
where a _{11} and a _{ ij } is from the state transition matrix A=(a _{ ij })_{2×2} in Eq. (6), b _{1k } and b _{ jk } is from the emission matrix B=(b _{ jk })_{2×(m+1)} in Eq. (7) while k=#1(T) represents that for the sample set o _{ T } there are k variables with consistence index 1 in average, Q is the forward probability calculated based on standard forward algorithm. It should be noticed that in Eqs. (13) and (14) the backward probability is set to be 1, since samples o _{ T+1},⋯ are not available when T is the testing time point.
According to above settings, given the HMM θ _{ T−1}, the calculation of HMM probability P _{ T } at a candidate time point T only relies on the samples from T−1 and T. Obtaining the Cscore P _{ t } for every candidate time point, the time point a r g _{ t }[m a x(P _{ t })]_{ t=1,2,...,T }, is the transition point.
Results
Identifying the pretransition state for a sevennode network
To demonstrate the effectiveness of the computational method and the consistence score, we used a sevennode gene regulatory network (Fig. 2 a) to show the detection of earlywarning signals near a critical point. These types of gene regulatory networks are often used to study transcription, translation, diffusion, and translocation processes that affect gene regulatory activities [22]. The following seven differential equations represent the gene regulation of seven genes in a network where gene regulation is represented in a MichaelisMenten form as the following Eq. (15), with the exception of the degradation rates, which are linearly proportional to the concentrations of the corresponding biomolecules.
where q is a scalar control parameter and ζ _{ i }(t) (i=1,2,...,10) are Gaussian noises with zero means and covariances κ _{ ij }=Cov(ζ _{ i },ζ _{ j }). z _{ i } (i=1,...,10) represent the concentrations of mRNAi. In Eq.(15), the degradation rates of mRNAs are \(\left (\frac {2+q}{5}, \frac {2+q}{5}, 1, \frac {6}{5}, \frac {7}{5}, \frac {9}{5}, \frac {9}{5}\right)\). There is a stable equilibrium point \(\bar {Z}=(\bar {z_{1}}, \bar {z_{2}},...,\bar {z_{10}})=(0,0, 0, 0, 0, 0, 0, 0, 0, 0)\). The differential equations Eq. (15) can be transformed into the difference equations Z(k+1)=f(Z(k),P) using Euler scheme with a small time interval 1. It is clear that there are seven distinct eigenvalues (0.67^{q},0.45,0.37,0.30,0.24,0.20,0.13) for the linearized system. Thus, the equilibrium point \(\bar {Z}\) is stable when q∈(0,1]. There is a critical value q _{ c }=0. We aim to detect early warning signals that indicate the critical transition as a control parameter q approaches the critical value 0 from q>0. Applying the HMMbased approach to the system, we obtain the Cscore curve as in Fig. 2 b.
The numerical simulation shows that a drastic boost of the Cscore, i.e., HMM probability, indicates the upcoming critical transition at parameter q=0 (Fig. 2 b). To demonstrate the different dynamics of the system between the normal state and the predisease state, we illustrate the underlying frequency of links with different correlation values (Fig. 2 c), from which it can be seen that there is a significant change in the frequency distribution of the links when the system is near a tipping point.
Predicting critical transitions in real datasets
We applied the HMMbased method in three real experimental datasets, i.e., the microarray data for HCVinduced dysplasia and hepatocellular carcinoma (HCC) (GSE6764) and live influenza infection (humans) caused by H3N2 virus (GSE30550).
We first present the application on HCVinduced HCC dataset, in which there are 7 sampling stages, i.e., cirrhosis, lowgrade dysplastic stage, highgrade dysplastic stage, very early HCC stage, early HCC stage, advanced HCC stage and very advanced HCC stage. In these sampling stages, gene expression profiles of 75 tissue samples were analyzed representing the stepwise carcinogenic process from preneoplastic lesions (cirrhosis and dysplasia) to HCC, including four neoplastic stages (“very early HCC" to metastatic tumors). According to the presented method above, we regard that each sampling stage is a candidate transition point, i.e., the end point of a stationary Markov process in the normal state. To validate whether a candidate point is the transition one, there are the following four dataspecific steps. First, to decrease the computational complexity, at each candidate point we selected top 5 % differentialexpression genes through the rank of Pvalues from student ttest. Second, a network was constructed by mapping these selected genes to human proteinprotein interaction (PPI) network from STRING (http://stringdb.org/). Third, the normalized correlation values, i.e., PCCs, were calculated for the corresponding links at each stage. Fourth, at each candidate point, the Cscore is then calculated (Fig. 3 a). Clearly, there are seven probability curves respectively corresponding to seven groups of genes selected in distinct candidate points, i.e., each group of genes is differentially expressed at one time point. Among the probability curves in Fig. 3 a, the red one presents the Cscore calculated based on the network constructed at the “very early HCC" stage (the 4th sampling stage), from which it can be seen that at the 4th sampling stage the Cscore shows the minimum probability which presents the least consistence of the system with the preceding state. This is in coincided with the previous result [9] and the observed biological phenotypes [23]. Thus the abrupt decrease of the Cscore reflects the presence of a predisease state and indicates the imminent critical transition into a disease state (HCC stage). To further carry out functional analysis and elucidate the relation between top differentialexpression genes and dysfunctional pathways, in Fig. 3 c we also employed the clustering analysis through correlation at the identified predisease state (“very early HCC" stage), that is, we selected a clustering group of genes related to the differentiallyexpressed links (with Pvalue 1.91E03 and around 3fold change comparing with the control group) for further functional analysis.
Figure 4 a presents the dynamical evolution in the gene network based on the human molecular network with their functional interactions (proteinprotein interactions and TFtarget regulations). The selected gene group in Fig. 3 c is placed at the top corner of each network. Clearly, at “very early HCC" stage the selected gene group are strongly correlated with wild fluctuation, which provides a significant signal from a network viewpoint and indicates the predisease state just before the deterioration into HCC, while other genes show no significant signal. Clearly, when the deterioration is impending, these selected genes form a special subnetwork, which actually guarantees the successful application of the HMMbased method, that is, this subnetwork exhibits the most significant changes in the links when the system is near a critical transition point. It can also be seen that, oppositely, neither the whole gene network nor the selected differentialexpression genes present a signal before or after the transition, which shows the sensitivity of the Cscore at the predisease state. In fact, the Cscore reveals the existence of the predisease state, which, however, cannot be shown by any single biomolecule. Therefore, the benefits brought by the HMMbased method in signaling the predisease state make the identification and management of highrisk cases more effective.
The functional analysis shows that some of the selected genes are highly relevant to the corresponding complex diseases, which validates the effectiveness of our method in a way. In the HCC study, many genes included in the top significant subnetwork relate to the response to HCV infection, especially the activation of the immune system and the dysfunctions associated with basic cell metabolism of hosts [23–25]. In the enrichment analysis, the most significant enriched pathways are related to the function of cell growth and cell metabolism, such as transcriptional misregulation in cancer, the Wnt signaling pathway and purine metabolism. The enriched pathways in cancer and hepatitis provide evidence that most of the genes related to differentiallyexpressed links may relate to the deterioration into HCC. These functional analysis implies the involvement of the differentialexpression genes and links in the dysfunctional pathways and other HCVrelated biological processes.
Figures 3 b, 3 d and 4 b shows another application of Cscore in the dataset of H3N2 virusinduced influenza infection, in which there are 16 sampling time points over the whole study period (132 hours). Nine subjects were diagnosed as having influenza infection or corresponding clinic symptoms 45 hours after the exposure to influenza viruses [26]. The specific procedure of data processing, gene filtering and computation are similar to the previous application. It can be seen that the Cscore curves based on the human PPI network for live influenza infection in Fig. 3 b, with eight probability curves respectively corresponding to the first eight candidate points. Among the probability curves in Fig. 3 b, the red curve presents the Cscore based on the top 5 % differentialexpression genes at 29 hr, while the orange one shows that calculated at 36 hr, the adjacent time point after 29 hr. Both these two curves show a sudden decrease of consistence probability during the progression, which implies that the onset of predisease state is around a period spanned from 29 to 36 hr, i.e., the upcoming deterioration into a disease state might be after 36 hr, which is in coincidence with the fact that the early symptoms of influenza infection arises after 45 hr. Furthermore, to show the significance of the selected genes whose collective dynamics results in the significant changes in the links and thus generate the earliest signal at 29 hr, in Fig. 3 d we carried out the clustering analysis based on the correlation values at 29 hr. We see that the genes in the selected group show a large fold change and a significant Pvalue in average. For these genes, the enrichment analysis shows that the most significant pathway is influenza A pathway, which shows the involvement of the selected genes in the biological processes of infection.
Figure 4 b presents the dynamical evolution of the gene network respectively at 5, 12, 29, 60 hr for live influenza infection. The selected gene group in Fig. 3 d is placed in the top corner. It can been seen that at 29 hr the structure of the subnetwork of the selected genes changes significantly and thus signals the upcoming deterioration into a disease state which is also in coincidence with the clinic observation.
Therefore, our application results are in coincidence with the experimental observation and successfully detect the earlywarning signal of the impending critical transition.
Discussion and conclusions
Complex diseases significantly damage the health of people all over the world. Detecting the earlywarning signal of the sudden deterioration provides an opportunity to interrupt and prevent the continuing costly cycle of managing these diseases and their complications. Although it is crucial to detect the predisease state so as to prevent the qualitative deterioration by taking appropriate intervention actions, it is a challenging task to reliably identify the predisease state because the state of the system may show neither apparent change nor clear phenomenon before this critical transition during the disease progression. This is also the reason why diagnosis based on traditional biomarkers may fail to indicate a predisease state.
In this work, by detecting the dynamical change of links in a network, we presented a computational method and corresponding algorithm based on HMM to measure the dynamical difference of the system progression, and thus identify the imminent critical transition. It is worth noting that this method aims to detect the earlywarning signal generating from the predisease state (or pretransition state), rather than to find the indication of disease state (or aftertransition state) in which the qualitative deterioration has already taken place.
We applied our method to the identification of the predisease state based on a simulated dataset and two microarray datasets, which demonstrate the sensitivity and effectiveness of our method. For both two diseases, we constructed biomolecular networks (Fig. 4) to gauge the dynamical regulation among genes at different sampling point along a timecourse progression. Both the functional and enrichment analyses validate the computational results. Therefore, the HMMbased method provides a computational possibility of prying into the underlying mechanism of biological processes of the disease progression, and thus may help to achieve the timely intervention. Our dynamic network analysis also suggests, in regard to the diseases, to focus on the specific predisease states to probe the in situ external perturbation (such as environment changes) preceding the development into a badly ill stage. This may lead to not only insights of external environment interactions, but also an effective time window for novel intervention or therapeutic strategies in specific diseases. The main difference between our work and previous ones is that rather than screen out some variables (genes or proteins), the proposed method mainly focuses on the direct identification of critical transition point, by calculating and comparing the consistence probability of each candidate end point of the Markov model in the normal state. Therefore, the accuracy of HMMbased approach is not limited by the selection of features. This is the main value in the potential applications of the HMMbased method from a network point of view.
There are limitations of this work. First, the validity of the identified predisease state and the accurate result needs further supports from animal experiments or clinical studies. Second, the method is insensitive when the correlations are not differentially expressed. Although this work is merely a step towards detecting the earlywarning signals of critical transition during disease progression and the algorithm is expected to be improved in both sensitive and accurate ways, it opens a window of opportunity for the applicable approach to the earlywarning system of critical transition during the biological processes of complex diseases.
References
Venegas JG, Winkler T, Musch G, Melo MF, Layfield D, Tgavalekos N,et al.Selforganized patchiness in asthma as a prelude to catastrophic shifts. Nature. 2005; 434(7034):777–82.
McSharry PE, Smith LA, Tarassenko L.Prediction of epileptic seizures: are nonlinear methods relevantNat Med. 2003; 9(3):241–2.
Pastor BR, Guallar E, Coresh J.Transition models for changepoint estimation in logistic regression. Stat Med. 2003; 22(7):1141–62.
Paek SH, Chung HT, Jeong SS, Park C, Kim C, Kim JE, et al.Hearing preservation after gamma knife stereotactic radiosurgery of vestibular schwannoma[J]. Cancer. 2005; 104(3):580–90.
Liu JK, Rovit RL, Couldwell WT. Pituitary apoplexy: Seminars in Neurosurgery. New York: 2001. p. 315–20.
Tanaka G, Tsumoto K, Tsuji S, Aihara K.Bifurcation analysis on a hybrid systems model of intermittent hormonal therapy for prostate cancer. Physica D: Nonlinear Phenomena. 2008; 237(20):2616–27.
Achiron A, Grotto I, Balicer R, Magalashvili D, Feldman A, Gurevich M.Microarray analysis identifies altered regulation of nuclear receptor family members in the predisease state of multiple sclerosis. Neurobiol Dis. 2010; 38(2):201–9.
Chen L, Liu R, Liu ZP, Li M, Aihara K.Detecting earlywarning signals for sudden deterioration of complex diseases by dynamical network biomarkers. Sci Rep. 2012;2. doi:http://dx.doi.org/10.1038/srep00342.
Liu R, Li M, Liu ZP, Aihara K, Chen L.Identifying critical transitions and their leading biomolecular networks in complex diseases. Sci Rep. 2012;2. doi:http://dx.doi.org/10.1038/srep00813.
Scheffer M, Bascompte J, Brock WA, Brovkin V, Carpenter SR, Dakos V,et al.Earlywarning signals for critical transitions. Nature. 2009; 461(7260):53–9.
Liu R, Yu X, Liu X, Xu D, Aihara K, Chen L.Identifying critical transitions of complex diseases based on a single sample. Bioinformatics. 2014; 30(11):1579–86.
Litt B, Esteller R, Echauz J, Alessandro MD, Shor R, Henry T, et al.Epileptic seizures may begin hours in advance of clinical onset: a report of five patients. Neuron. 2001; 30(1):51–64.
He D, Liu ZP, Honda M, Kaneko S, Chen L.Coexpression network analysis in chronic hepatitis B and C hepatic lesions reveals distinct patterns of disease progression to hepatocellular carcinoma. J Mol Cell Biol. 2012; 4(3):140–52.
Liu R, Wang X, Aihara K, Chen L.Early diagnosis of complex diseases by molecular biomarkers, network biomarkers, and dynamical network biomarkers. Med Res Rev. 2014; 34(3):455–78.
Liu R, Aihara K, Chen L.Dynamical network biomarkers for identifying critical transitions and their driving networks of biologic processes. Quant Biol. 2013; 1(2):105–14.
Liu X, Liu R, Zhao XM, Chen L.Detecting earlywarning signals of type 1 diabetes and its leading biomolecular networks by dynamical network biomarkers. BMC Med Genom. 2013; 6(Suppl 2):S8.
Tan Z, Liu R, Zheng L, Hao S, Fu C, Li Z, et al.Cerebrospinal fluid protein dynamic driver network: At the crossroads of brain tumorigenesis. Methods. 2015; 83:36–43.
Zeng T, Zhang C, Zhang W, Liu R, Liu J, Chen L.Deciphering early development of complex diseases by progressive module network. Methods. 2014; 67(3):334–43.
Liu R, Chen P, Aihara K, Chen L.Identifying earlywarning signals of critical transitions with strong noise by dynamical network markers. Sci Rep. 2015;5. doi:http://dx.doi.org/10.1038/srep17501.
Li M, Zeng T, Liu R, Chen L.Detecting tissuespecific early warning signals for complex diseases based on dynamical network biomarkers: study of type 2 diabetes by crosstissue analysis. Brief Bioinformatics. 2014; 15(2):229–43.
Chen P, Liu R, Chen L, Aihara K.Identifying critical differentiation state of MCF7 cells for breast cancer by dynamical network biomarkers. Front Genet. 2015;6. doi:http://dx.doi.org/10.3389/fgene.2015.00252.
Chen L, Wang RS, Zhang XS. Biomolecular networks: methods and applications in systems biology. New Jersey: John Wiley & Sons; 2009.
Wurmbach E, Chen Y, Khitrov G, Zhang W, Roayaie S, Schwartz M, et al.Genomewide molecular profiles of HCVinduced dysplasia and hepatocellular carcinoma. Hepatology. 2007; 45(4):938–47.
Bruix J, Boix L, Sala M, Llovet JM. Focus on hepatocellular carcinoma. Cancer Cell. 2004; 5(3):215–9.
Farazi PA, DePinho RA. Hepatocellular carcinoma pathogenesis: from genes to environment. Nat Rev Cancer. 2006; 6(9):674–87.
Huang Y, Zaas AK, Rao A, Dobigeon N, Woolf PJ, Veldman T, et al.Temporal dynamics of host molecular responses differentiate symptomatic and asymptomatic influenza a infection. PLoS Genet. 2011; 7(8):e1002234.
Acknowledgements
Publication of this article was funded by National Natural Science Foundation of China (Grant numbers 91530320, 91439103, 11401222 and 61370228); Science and Technology Planning Project of Guangdong Province, China (Grant number 2015B010128008, 2014B090903008, 2015B010109006); Fundamental Research Funds for the Central Universities (Grant number 2014ZZ0064); Pearl River Science and Technology Nova Program of Guangzhou (Grant Number 201610010029).
Declarations
This article has been published as part of BMC Systems Biology Vol 10 Suppl 2 2016: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2015: systems biology. The full contents of the supplement are available online at http://bmcsystbiol.biomedcentral.com/articles/supplements/volume10supplement2.
Author’s contributions
PC and YL conceived the research. PC performed the numerical simulation and real data analysis. All authors wrote the paper and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Author information
Authors and Affiliations
Corresponding author
Additional information
From IEEE International Conference on Bioinformatics and Biomedicine 2015 Washington, DC, USA.912 November 2015
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Chen, P., Li, Y. The decrease of consistence probability: at the crossroad of catastrophic transition of a biological system. BMC Syst Biol 10 (Suppl 2), 50 (2016). https://doi.org/10.1186/s129180160295y
Published:
DOI: https://doi.org/10.1186/s129180160295y
Keywords
 Dynamical network biomarker
 Hidden Markov process
 Predisease states