Forecasting influenza A pandemic outbreak using protein dynamical network biomarkers

Background Influenza A virus mutates readily, humans are susceptible to it, and it spreads through crowds under the influence of the external environment and other factors. Forecasting influenza A pandemic outbreaks is therefore necessary. Methods This paper studies the different states of influenza A using the method of dynamical network biomarkers. By establishing protein dynamical network biomarkers of influenza A virus proteins, we obtain a composite index to forecast influenza A pandemic outbreaks. Results The composite index varies as the pandemic influenza virus moves from a relatively steady state to the critical state before outbreak and then to the outbreak state. When the composite index decreases continuously for 2 years and then suddenly increases by more than 0.1, the next year is normally in the outbreak state. Therefore, by observing the variation of the index value, we can predict and identify whether a certain year is in the critical state before an influenza A outbreak or in the outbreak state. Meanwhile, by analyzing data for individual countries, influenza A pandemic outbreaks in those countries can also be forecast. Conclusions The composite index can provide significant warning information for detecting the stage of influenza A, which is meaningful for the warning and prevention of influenza A pandemics.


Background
It has been shown that a common critical phenomenon exists in many complex biological processes: a relatively stable state passes a critical point and then enters another state within a very short period of time [1,2]. Influenza A exhibits this kind of critical phenomenon, because after a critical point it needs only a very short time to move from a relatively stable state to the outbreak state. Thus, to prevent and control an influenza A pandemic outbreak in a timely and effective way, the key lies in predicting the critical point before the outbreak.
Influenza A is currently being studied from many angles. Using empirical analysis and modeling, Pan et al. found that the spatio-temporal network connecting the cities with human cases in the order of outbreak timing exhibits a two-section power-law edge-length distribution [3]. Chang et al. studied influenza vaccines with the aim of preventing influenza [4]. Banerjee et al. compared the structural features of all H1N1 HA gene sequences and the global amino acid composition, making it possible to depict the developing trend of influenza A [5]. He et al. carried out in-depth studies to identify HA protein epitopes of avian influenza virus [6].
This paper studies the different states of influenza A using dynamical network biomarkers (DNB). By establishing protein dynamical network biomarkers (PDNB) of influenza A virus proteins and using the properties of DNB, we obtain a composite index to forecast influenza A pandemic outbreaks. The composite index varies as the pandemic influenza virus moves from a relatively steady state to the critical state before outbreak and then to the outbreak state. Therefore, by observing the variation of the index value, we can predict and identify whether a certain year is in the critical state before an influenza A outbreak or in the outbreak state. The composite index can thus provide significant warning information for detecting the stage of influenza A, which is meaningful for the warning and prevention of influenza A pandemics. Meanwhile, by analyzing data for individual countries, influenza A pandemic outbreaks in those countries can also be forecast.

DNB analysis
The concept of network biomarkers arose with the development of high-throughput genomic technologies and the systematic, multidimensional study of molecular expression profiling [7,8]. It refers to a set of markers together with their mutual relations and has been proposed as a new marker type [9]. Compared with traditional biomarkers, network biomarkers can distinguish disease states more accurately because they take the links between molecules into consideration [10,11]. However, they are used to diagnose the state of a disease, not to detect the critical point before its outbreak.
The method of dynamic network biomarkers focuses on detecting and assessing the different stages in the development of a disease, and is therefore a time-dependent method [12]. It studies how the locations of the markers and the relationships among the network markers change over time, and then constructs three-dimensional images showing the interactions between the markers. Thus the study of network markers focuses on molecular interactions and distinguishes normal from disease states, whereas the study of dynamic network markers focuses on dynamic changes, which helps to discover markers accurately and comprehensively and further to identify the state of a disease before outbreak. The method does not depend on mining markers from small samples, and it is also easier to apply clinically. It can also be used in future studies to find early warning signals in any biological process, such as differentiation, senescence, each phase of the cell cycle, and other key changes.

Defining PDNB
First, taking the hemagglutinin (HA) protein as an example, suppose that an HA protein $y$ is formed by $t$ amino acids linked sequentially. Its amino acid sequence is represented as $y = x_1 x_2 \cdots x_t$, in which $x_i \in \{A, V, L, I, P, F, W, M, D, E, G, S, T, C, Y, N, Q, K, R, H\}$, $i = 1, 2, \cdots, t$. Suppose that the $(s-1)$-th year has $m$ influenza virus HA proteins worldwide, with amino acid sequences $y_{s-1,1}, y_{s-1,2}, \cdots, y_{s-1,m}$, and that the $s$-th year has $n$ influenza virus HA proteins worldwide, with amino acid sequences $y_{s,1}, y_{s,2}, \cdots, y_{s,n}$. The number of amino acids in $y_{i,j}$ is denoted $c_{i,j}$, where $i = s-1, s$; $j = 1, 2, \cdots, q$; $q = \max\{m, n\}$. Taking the $i$-th amino acid of each of $y_{s-1,1}, y_{s-1,2}, \cdots, y_{s-1,m}$ in turn forms a new amino acid sequence $Z_{s-1,i}$, from which we select the amino acid that occurs most often. If two or more amino acids share the maximum count, we take the first one, without loss of generality. The selected amino acids are connected in order to form a new amino acid sequence $U_{s-1} = x_1 x_2 \cdots x_k$, which is then compared one by one with the corresponding amino acids of each of $y_{s,1}, y_{s,2}, \cdots, y_{s,n}$. Where the two amino acids differ, the position is assigned 1; otherwise it is assigned 0. This yields $n$ new sequences $E_{s,1}, E_{s,2}, \cdots, E_{s,n}$ for the $s$-th year. We then calculate their mean (M), standard deviation (SD) and coefficient of variation (CV) from $f(s, i)$, the frequency of occurrence of 1 in sequence $E_{s,i}$. Similarly, we calculate M, SD and CV for the other nine proteins. The three proteins with the largest values of $CV_s$ are defined as core proteins (CP), and the others as non-core proteins (NP).
The core proteins form a set of proteins with high-confidence interactions, which constitutes a subnetwork called the protein dynamical network biomarkers of influenza A virus proteins (see Fig. 1).
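The procedure in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, sequences are compared only up to the length of the shortest one, ties are broken by the first amino acid encountered, f(s, i) is taken as the fraction of positions assigned 1, and the population standard deviation is used because the text does not state the normalization.

```python
from collections import Counter
from statistics import mean, pstdev


def most_frequent_sequence(sequences):
    """Build U_{s-1}: the most frequent amino acid at each position.

    Assumption: positions beyond the shortest sequence are ignored.
    Ties go to the amino acid encountered first in the column.
    """
    length = min(len(seq) for seq in sequences)
    chosen = []
    for i in range(length):
        column = [seq[i] for seq in sequences]  # plays the role of Z_{s-1,i}
        # most_common orders equal counts by first occurrence
        chosen.append(Counter(column).most_common(1)[0][0])
    return "".join(chosen)


def mismatch_frequency(reference, sequence):
    """f(s, i): fraction of compared positions assigned 1 (amino acids differ)."""
    pairs = list(zip(reference, sequence))
    return sum(a != b for a, b in pairs) / len(pairs)


def variation_stats(previous_year, current_year):
    """Return (M, SD, CV) of f(s, i) over the current year's sequences."""
    reference = most_frequent_sequence(previous_year)
    f = [mismatch_frequency(reference, seq) for seq in current_year]
    m = mean(f)
    sd = pstdev(f)  # assumption: population SD
    cv = sd / m if m > 0 else 0.0  # CV is undefined when M is zero
    return m, sd, cv


def core_proteins(cv_by_protein, k=3):
    """Names of the k proteins with the largest CV (core proteins)."""
    return sorted(cv_by_protein, key=cv_by_protein.get, reverse=True)[:k]
```

For example, `variation_stats(["ACDE", "ACDF", "GCDE"], ["ACDE", "ACGG"])` compares two toy sequences against the reference "ACDE"; the input strings are invented for illustration and are not real HA sequences.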

Defining forecasting index
The frequencies of the 20 kinds of amino acids are calculated as $f_{x_i}(s, j)$, the frequency of occurrence of amino acid $x_i$ in amino acid sequence $y_{s,j}$. From these we obtain a 23-dimensional characteristic value vector for the HA protein. In the same way, $f_{x_i}(s)$ can be calculated in turn for the other nine proteins, which gives a characteristic value matrix $X = [V_1(s), V_2(s), \cdots, V_{10}(s)]$, where $V_t(s)$ represents the characteristic value vector of the $t$-th influenza A protein, $t = 1, 2, \cdots, 10$. The characteristic distance between proteins is then defined for each pair consisting of the $v$-th and the $w$-th protein.
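A small Python sketch of the amino acid frequency calculation is given below. Several details are assumptions, since the text does not specify them: the yearly frequency of a protein is taken as the average of the per-sequence frequencies, and the characteristic distance is taken to be the Euclidean distance between two characteristic value vectors. The function names are hypothetical.

```python
import math

# The 20 amino acids, in the order listed in the text
AMINO_ACIDS = "AVLIPFWMDEGSTCYNQKRH"


def amino_acid_frequencies(sequence):
    """Frequency of each of the 20 amino acids in one sequence y_{s,j}."""
    total = len(sequence)
    return [sequence.count(aa) / total for aa in AMINO_ACIDS]


def yearly_frequencies(sequences):
    """Assumption: the yearly value is the mean of the per-sequence frequencies."""
    per_sequence = [amino_acid_frequencies(seq) for seq in sequences]
    return [sum(column) / len(per_sequence) for column in zip(*per_sequence)]


def characteristic_distance(v, w):
    """Assumption: Euclidean distance between two characteristic value vectors."""
    return math.dist(v, w)
```

With this layout, each protein contributes one vector per year, and the pairwise distances between the ten vectors can be averaged within the core proteins and between core and non-core proteins.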
The core proteins are not only universal indicators for detecting the complex outbreak signal of influenza A, but also the dominant, or driving, network of the whole protein system during the critical stages of development, mutation and outbreak. In fact, the dominant network is the first to break through the limits of variation and enter the state of variation; it then affects the other proteins and leads to the transition of the entire system. Therefore, determining the dominant network not only detects the system in the critical state before outbreak, but also helps to reveal the underlying mechanism of influenza A virus proteins from the perspective of the dynamic network. Combining these properties of the core proteins, we obtain a composite index $I$ built from three quantities: $CV_k$, the average value of the core proteins' $CV_s$; $CP_{cd}$, the average value of the characteristic distance between the core proteins; and $NP_{cd}$, the average value of the characteristic distance between the core and non-core proteins.
When $I_{s-3} > I_{s-2} > I_{s-1}$ and $I_s - I_{s-1} > 0.1$, it can be concluded that year $s + 1$ is in the outbreak state.
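The warning rule above can be checked mechanically once the composite index has been computed for each year. The sketch below assumes the index values are held in a dictionary keyed by year; the function name and the example values are hypothetical, and years with missing index values are simply skipped.

```python
def outbreak_forecast_years(index_by_year, jump=0.1):
    """Return the years s + 1 flagged by the rule
    I_{s-3} > I_{s-2} > I_{s-1} and I_s - I_{s-1} > jump.
    """
    flagged = []
    for s in sorted(index_by_year):
        window = [index_by_year.get(s - k) for k in (3, 2, 1, 0)]
        if any(value is None for value in window):
            continue  # the rule needs four consecutive years of data
        i3, i2, i1, i0 = window
        if i3 > i2 > i1 and i0 - i1 > jump:
            flagged.append(s + 1)
    return flagged


# Invented index values for illustration only, not data from this study
example = {2001: 0.90, 2002: 0.70, 2003: 0.50, 2004: 0.75}
print(outbreak_forecast_years(example))
```

Because the rule needs four consecutive index values, any gap in the yearly data suppresses a warning for the years around it.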
Although the amino acid sequence of each protein will fluctuate randomly, the composite index can provide significant early warning information when the influenza A virus is close to the critical state before the outbreak or the outbreak state.

Forecasting influenza A pandemic outbreak
The ten proteins of influenza A virus are hemagglutinin (HA), matrix protein, matrix protein 2, neuraminidase, non-structural protein 1, non-structural protein 2, nucleocapsid protein, PA RNA polymerase, PB1 RNA polymerase and PB2 RNA polymerase. They are polymers formed by linking 20 different amino acids. This paper selects influenza A virus protein sequences from 1934 to September 2016 from the NCBI website (http://www.ncbi.nlm.nih.gov/genomes/FLU/Database/nph-select.cgi?go=database); much of the data before 1934 is absent. Table 1 shows the composite index calculated with the above methods for 1934 to September 2016. However, the composite index cannot be calculated for some years, because some data for 1937-1942, 1944-1945 and 1952-1956 are absent.

Forecasting influenza A pandemic outbreak in pandemic occurrence place
By analyzing influenza A virus protein data for individual countries, influenza A pandemic outbreaks in those countries can also be forecast. Taking China as an example, this paper selects protein sequences of influenza A viruses that occurred in China from the NCBI website to forecast influenza A pandemic outbreaks in China. Much of the data for 1954-1956, 1958-1963, 1965 and 1967 is absent, and all data before 1954 and for 2016 are absent.
Table 2 shows the composite index calculated with the above methods for 1954 to 2015. The composite index cannot be calculated for the years with absent data listed above.

Forecasting influenza A pandemic outbreak
The dynamic network markers of pandemic influenza virus vary throughout the process from a relatively stable state to the critical state before outbreak and then to the outbreak state. This causes a state transition of the entire network and ultimately produces fluctuations in the composite index. Therefore, by observing the changes in the composite index, we can predict the critical state before the outbreak of pandemic influenza as well as the outbreak state.
The flu broke out in Hong Kong in 1968 and continued until 1969, of which 7.5 million people died. In 1972, influenza broke out in Henan Province and quickly spread to the entire province. As shown in Fig. 2 …
Influenza A broke out in the United States, Russia and Japan in 1976 and 1977. Although the prevalence of this flu was typical of an outbreak, adults were only slightly infected, and the incidence rate …
Influenza A broke out in India in 2015, of which 1.5 thousand people died [15]. As shown in Fig. 6 …
In general, the composite index varies as the pandemic influenza virus moves from a relatively steady state to the critical state before outbreak and then to the outbreak state. When the composite index decreases continuously for 2 years and then suddenly increases by more than 0.1, the next year is normally in the outbreak state. Therefore, by observing the variation of the index value, we can predict and identify whether a certain year is in the critical state before an influenza A outbreak or in the outbreak state.

Conclusions
We selected protein amino acid sequence data of pandemic influenza virus from 1934 to September 2016, together with data for individual countries, such as China's data from 1957 to 2015, in which data are absent for only a few years, and obtained a composite index by using PDNB. Although the amino acid sequence of each protein fluctuates randomly, the composite index can still provide reliable and significant early warning information when pandemic influenza is close to the critical state or the outbreak state. Unlike the dynamic network biomarker, network markers and other traditional markers cannot provide an early warning signal of the critical state before a pandemic outbreak. This shows that the dynamic network biomarker is more stable and accurate in determining the state of the pandemic influenza virus, particularly the critical state. It makes early warning possible so that preventive measures can be strengthened in advance, which is of great significance for the research and warning of pandemic influenza virus.