 Research article
 Open Access
 Published:
Impact of environmental inputs on reverseengineering approach to network structures
BMC Systems Biology volume 3, Article number: 113 (2009)
Abstract
Background
Uncovering complex network structures from a biological system is one of the main topic in system biology. The network structures can be inferred by the dynamical Bayesian network or Granger causality, but neither techniques have seriously taken into account the impact of environmental inputs.
Results
With considerations of natural rhythmic dynamics of biological data, we propose a system biology approach to reveal the impact of environmental inputs on network structures. We first represent the environmental inputs by a harmonic oscillator and combine them with Granger causality to identify environmental inputs and then uncover the causal network structures. We also generalize it to multiple harmonic oscillators to represent various exogenous influences. This system approach is extensively tested with toy models and successfully applied to a real biological network of microarray data of the flowering genes of the model plant Arabidopsis Thaliana. The aim is to identify those genes that are directly affected by the presence of the sunlight and uncover the interactive network structures associating with flowering metabolism.
Conclusion
We demonstrate that environmental inputs are crucial for correctly inferring network structures. Harmonic causal method is proved to be a powerful technique to detect environment inputs and uncover network structures, especially when the biological data exhibit periodic oscillations.
Background
One of the main topics in system biology is to uncover the complex network structures in a biological system [1, 2]. In comparison with simple systems, nowadays the researchers always face larger and more complex dynamic interactive systems (e.g., neural networks and gene networks). Traditional techniques, such as the crosscorrelation and partial coherence analysis [3–7], are inadequate to clearly and explicitly reveal the true network structures for such a complex system. These techniques neither take time dimension into consideration nor reveal the directional interactions, thus they cannot configure a dynamic interactive system with time. Over the past few decades several advanced techniques such as dynamic Bayesian networks [8] and Granger causality [9–13] have been developed to identify network structures in dynamic systems. Granger causality only reveals direct causality between pairwise units with linear interactions, thus conditional and partial Granger causality [14, 15] and kernel Granger causality [16] have been proposed to deal with indirect causal interactions among multiple variables.
In multivariable (more than two) situations, one time series can be connected to another time series in a direct or an indirect manner, raising the important question of whether there exists a (direct) causal influence between two time series when the influence of other time series are taken into account. In such cases, repeated bivariate analysis can be misleading. For example, one time series may falsely appear to cause another if they are both influenced by a third time series but with different time delays. The conditional Granger causality [14, 15] aim to deal with the causal interactions among multiple variables. However, the applicability of the kernel Granger causality or the conditional Granger causality largely depends on the experimental ability to measure all relevant variables in the system, but it is usually not feasible in the biological recordings. Environmental inputs, including exogenous inputs from external sources and unmeasured endogenous variables, cannot be all captured by experimental techniques, but such environmental inputs can confound the accuracy of causal influences and thus degrade the credibility of the uncovered network structure. For example, in our experimental data recorded from the inferotemporal (IT) cortex of sheep, every measured neuron receives common exogenous inputs from the visual cortex and feedbacks from the prefrontal cortex [7, 15]. Even with advanced multielectrode array (MEA) techniques, it is only able to record a tiny subset of interacting neurons in a single area [15, 17] and there are bound to be endogenous variables. Hence controlling environmental inputs is a critical issue when applying Granger causality to experimental data. Recently, partial Granger causality [15, 18] is developed to eliminate the influences of exogenous inputs and latent variables, but a full elimination is only possible if all common inputs have a more or less identical influence on all measured variables. It is generally not realistic that all measured variables can receive an almost identical and common influence in experimental recordings. In fact, the common influence degrades due to spatial increment. In such cases, it is critical to identify which measured variables received environmental inputs, and what is the impact of the environmental inputs on configuring network structures?
We take a system biology approach to answer the questions above. Most current techniques largely ignore the natural dynamical characteristics of the biological data, which usually exhibits highly rhythmic (periodic) oscillations, especially under periodic environmental influence, e.g. lightdark condition [19–23]. Such natural periodic dynamics of experimental data can provide important information in model fitting and error estimation. To overcome the limitations of current causality techniques and make full use of harmonic oscillation characteristic of experimental data, we consider a harmonic oscillator, or a set of harmonic oscillators, to represent the environmental inputs. The harmonic oscillators can be mathematically formulated by the hidden periodic model [24, 25]. We extend the current linear Granger causality model (Autoregressive model) by inclusion of the typical harmonic oscillators embedded in the experimental recordings. If the inclusion of harmonic oscillators can significantly reduce the variance of the prediction error, then the environmental causal influence can be reduced or eliminated. The mathematical representation of a harmonic oscillator model is derived in Method section and the application of the harmonic Granger causality approach is elucidated in Result section. Although the techniques of Granger causality are based on time series data, additional useful information can be revealed when the analysis is performed in the frequency domain [14, 18, 26, 27]. Investigating the causal interactions between different frequencies adds another dimension to the already complex identification of spatiotemporal and frequencyspecific rhythmic oscillations. Conventional crossfrequency interactions are characterized by the synchrony of phase, recognized as 'n:m phase synchrony' [28, 29]. Phase synchrony indicates amplitudeindependent phaselocking of n cycles of one oscillation to m cycles of another oscillation, however, this method largely ignores the information carried by the amplitude and the coupling effects between phase and amplitude. Importantly, phase information can be sensitive to random noise [30], while in most experimental data the true signal is heavily masked by random noise. In this study, as a byproduct, we also assess whether it is reliable to use the phase information between two (oscillating) units to estimate the causality. Our simulation results clearly demonstrate that it can be very insufficient and inaccurate to use only the phase information to characterize a causal interaction, but the approach developed in the current paper works. We first apply the harmonic oscillator idea to a toy model and validate it by comparing with the conventional Granger causality. Then we investigate the effect of multiple oscillators by employing a small sparsely connected network. Finally we apply the harmonic Granger causality to a real biological network of microarray data of the flowering genes of the model plant Arabidopsis Thaliana. We aim to identify which genes are directly affected by the presence of the sunlight, and uncover the causal interactions among genes. Although tens of thousands of known genes within Arabidopsis are collected with Microarray, only those genes known to be involved in the flowering of the plant (8 genes in this case) are analyzed by our harmonic Granger causality. The method successfully reveals the genes that most possibly receive environmental inputs. We finally compare our causal network with other candidate models in the literatures [31–33]. With this system biology approach, our causal network depicts all possible connections reported in the literatures [32, 33], and also reveals two more connections that do not exist in the known candidate models.
Results
Toy Model & Validation
In this simple toymodel example, we compare the performance of traditional Granger causality and harmonic Granger causality on four simple model configurations. We show that the traditional Granger Causality analysis is not sufficient to describe the influence of one time series upon another in the presence of an external driving force. The full simulated model under consideration is described as follows:
Within this model we consider four configurations, firstly where both node X_{ t }and Y_{ t }experience an environmental input, secondly where X_{ t }experiences the environmental input and Y_{ t }does not, thirdly where Y_{ t }experiences the environmental input and X_{ t }does not, and lastly where neither X_{ t }or Y_{ t }undergo the influence from the external driving oscillation. Despite the external driving oscillation exerted on either X or Y, there is a coupling frequency of 34 Hz between X and Y for all four configurations. In each of the simulations, f_{ x }and f_{ y }are equal to 10 Hz, the input phases, ϕ_{ x }and ϕ_{ y }are set to zero, and the variances of inherent noises, ε_{ t }and η_{ t }, are set to 0.1. In all the following simulations, the values of C_{ x }and C_{ y }are 0.07 and 0.05, respectively.
In the first case where both nodes experience the environmental inputs, Fig. 1Bi and 1Ci shows that both X(ω) and Y(ω) have components at the driving oscillation frequency of 10 Hz. The causal link known to be present from X to Y should mean that there is a spike in the causality spectra at the driving frequency. Fig. (1Di) and (1Ei) show that the normal Granger AR method can only detect the causality spectra at the intrinsic coupling frequency around 34 Hz, but the Harmonic method is capable of detecting the causal influence at the external driving frequency at 10 Hz. The reason is that the driving force from X to Y comes from the factor 0.3X_{ t } 2 (Eq. 1), which contains both intrinsic coupling frequency and external driving frequency. Harmonic method can better fit and predict the data at external driving frequency such that it can detect the external driving frequency at 10 Hz better than normal Granger AR method.
The second case is where X_{ t }receives the driving input and Y_{ t }does not. Fig. 1Bii shows that the peak in the spectra can be seen in the Xchannel at 10 Hz representing the external driving oscillation, and although there is no inherent 10 Hz oscillation in the Y_{ t }channel, the spectra of Y (ω) (Fig. 1Cii) shows that there is an induced oscillation at this frequency, hence one would expect that the causality of F_{X→Y}will show a large component at the driving frequency. Fig. (1Dii) and 1Eii show the causality, F_{X→Y}calculated using both AR and harmonic methods. The AR method shows little peak at the driving frequency, while the harmonic method produces a large spike in the causality at the frequency 10 Hz. In third case where Y_{ t }receives the external input, Fig. (1Ciii) shows that the peak in the spectra can be seen in the Ychannel at 10 Hz representing the external driving oscillation. We would expect that in this case the causality shows no peak at the driving frequency as X_{ t }contains no driving oscillation: Fig. (1Diii) and (1Eiii) show the causality, F_{X→Y}calculated using AR (panel D_{ iii }) and harmonic methods (panel E_{ iii }). In this model configuration, both the AR and the harmonic methods produce similar results. However there is a drastic decrement in the spectra of the harmonic method at the driving frequency in the causality, qualitatively, this cannot be the case, so in this instance it is preferable to use the AR method to calculate the causality spectra as X_{ t }contains no driving oscillation. The causality decrement near the driving frequency may be caused by the inclusion of the harmonic term, which may extract the intrinsic power of the signal Y at the driving frequency of 10 Hz. The decrement means that signal Y was mainly driven by the harmonic term at 10 Hz and signal X did not contribute much at this frequency. For other frequency range, harmonic term did not influence signal Y and the driving force mainly came from signal X. The final model configuration considered is where neither X_{ t }or Y_{ t }has the driving input. Fig. (1Div) and (1Eiv) show that in this case there is very little difference in the causality spectra obtained using the two methods.
Through this simple toy model we have demonstrated that the normal Granger Causality in the frequency domain is not sufficient to detect interactions at all frequencies in the presence of an environmental input. Furthermore we have demonstrated that the causal method with the additional harmonic term produces more consistent and accurate results than the traditional Granger causality method at most instances. The harmonic causal method and the traditional Granger causality method can be a good complement to each other when applying to the time series with or without oscillatory environmental influences.
Investigating the Effects of Phase
It is a common scenario in physics to determine the driving relationship between two oscillators. The phases of the two oscillators can be interleaved throughout the time, thus phase may not be an accurate indication of the causal interactions between the two oscillators. One of the main motivations of introducing the harmonic term is to investigate the effects of phase and to determine whether it can be used as an indication of the amount of causal interactions between oscillatory signals. Consider again the case where X_{ t }drives Y_{ t }, Eq. 1 shows that the influence of the time series X_{ t }upon Y_{ t }is encapsulated in the oscillatory and noise terms. However, the amplitude of the resultant oscillation will depend on the phase difference of the two harmonic terms. This can be seen by considering the addition of the two oscillatory terms. Let O_{ y }= C_{ y }cos(2πf_{ y }t + ϕy) and the harmonic component from X_{ t }be , then in the case that the frequency of the oscillations are equal, f_{ x }= f_{ y }= f, the two oscillation terms can be combined as follows:
where
and
It is able to analyze the effect of input phase upon the level of influence of one time series upon another. Eq. 3 reveals that the magnitude of the resultant harmonic term is indeed a function of both and ϕ_{ y }. The oscillating term, , oscillates sinusoidally and can take values in the range [1, 1].
The relative effect of this oscillation upon depends on the values of C_{ y }and . The extremes of are given by , and it can be shown that the maximum effect of the phase differences happens when C_{ y }= and the minimal difference happens when either C_{ y }or equals zero.
In order to demonstrate the effects of phase changes, the same model as Eq. 1 is used. Fig. 2 shows the enormous consequences of the magnitude of the resultant signal by simply varying the phase of the environmental input, ϕ_{ x }and ϕ_{ y }. Fig. 2A shows that the configuration of the model with both nodes of the system receiving external environmental inputs and a causal link from X to Y. Fig. 2B shows two examples in the time domain traces obtained in the absence of environmental noise. The upper panel Fig. 2B1 shows the time domain traces when the phase difference of the external inputs is zero (ϕ_{ x }= ϕ_{ y }) and the lower panel Fig. 2B2 shows the scenario when ϕ_{ x } ϕ_{ y }= π. In each of these diagrams the blue trace, which is the node X_{ t }, is unchanged, however the time series associated with Y_{ t }change considerably. The difference of the maximum amplitudes of the two time series is denoted by Δ. The colourmap figure shown in Fig. 2C plotted the value of Δ against the values of the input noise and the input phase differences. It can be seen that the effects of the phase differences is lessened by increasing the noise in the system. Fig. 2D reveals the interdependence between noise and phase in this system. The upper panel Fig. 2D1 shows the amplitude difference against the phase differences in the absence of noise. The reason why the peak does not happen at p is because C_{ x }and C_{ y }are different, and there is a phase difference. If C_{ x }= C_{ y }, then the peak would be exactly at p. The lower panel Fig. 2D2 shows how a measure of the difference varies with increasing noise. The measure of the difference is defined as . As the noise increases the measure difference tends to zero, confirming that the noise levels can mask the phase effects.
We have shown that the phase of the external driving oscillation has an effect on the amplitudes of the resultant time series, then the real question is if this effect can be detected as a difference in the level of causality. Recall that the actual level of causal interaction is not varying and the influence that X_{ t }exerts over Y_{ t }does not alter throughout the simulations. The investigation into the effects of phase continued with a series of calculations determining the level of causality. Fig. 3 shows the causality detected in the system obtained by four different methods. Fig. 3A1 shows the time domain causality detected using the normal Granger method, Fig. 3B1 shows the time domain causality detected using the harmonic method, Fig. 3A2 and 3B2 show the causality detected using the AR and harmonic methods in the frequency domain. For consistency we require that:
where F_{X→Y}is the causality on the time domain and f_{X→Y}(ω) is the causality at frequency ω. Eq. 6 is the Kolmogorov condition that indicates the equivalence of the causality in time and frequency domain.
Comparing the AR and harmonic causality results, the amplitude differences influenced by the phase differences are represented in the AR method but not in the harmonic causality results, which imply that the harmonic approach must be used for a true indication of the level of causal interaction as it is robust in presence of the external driving oscillation.
Comparison of Fig. (3A1) and Fig. (3A2) shows that the Eq. 6 holds for the AR method and there is a high level of correspondence between the causality calculated in the time domain and frequency domain. Similarly, comparison of Fig. (3B1) and Fig. (3B2) shows that to a large extent the harmonic method is consistent with the Kolmogorov condition in Eq. 6. The causality calculated by harmonic method does not depend on the phase difference no matter it is in the time domain or frequency domain.
We further investigated the relationship between the phase difference of input and output signals, and the influence of noise level. The same configuration model presented in Fig. 2A was used to demonstrate this interrelation. The phase difference between two input signals varied from 0 to 2p and the noise level (variance of the white noise) increased from 0 to 0.2. The phase difference of signal X and Y was plotted as a function against the phase difference of the input signals and the noise level. Fig. 4A shows the colourmap of the interrelation. The color intensity represents the phase difference for the output signals. Fig. 4B and Fig. 4C show the averaged intensity along noise level and the input phase difference, respectively. It is clear that the phase difference of the output signals does not depend on the phase difference of the input signals and the noise level. The results indicate that the phase information cannot be used alone to accurately determine the causal relationship between any two signals. The interpretation of the causality based on phase should be cautious as the causality may not reflect the true relationship.
Investigating Effect of Multiple Oscillators
In experimental recordings, the measured variables are usually influenced by many environmental inputs, thus multiple oscillators have to be considered. In order to reveal the power and limitations associated with the additional oscillators, a simple system was considered and the errors of seven different connection schemes was compared. The schematic plot for seven connection schemes is displayed in Fig. 5A, and the error terms corresponding to each schemes are described as follows:

i)
Σ_{ NC }: AR + harmonic oscillations, no causality
ii) Σ_{C 1}: AR + harmonic oscillations, influence from causal node
iii) Σ_{C 2}: AR + harmonic oscillations, influence from noncausal nodes
iv) Σ_{ NAR }: harmonic oscillations only

v)
Σ_{ ARNC }: AR only
vi) Σ_{ARCA 1}: AR, influence from causal node
vii) Σ_{ARCA 2}: AR, influence from noncausal nodes
To investigate the effect of multiple oscillators on the goodness of fit, we consider a simple network consisting of five nodes in various random configurations. The variances are estimated for each of the connection scheme in Fig. 5A, and the number of harmonic oscillators varies from 1 to 11. The results of this simulation are shown in Fig. 5B. Inspection of Fig. 5B reveals some of the issues involved in using the harmonic oscillators to provide a full description of the time domain evolution. Firstly looking at those variance estimates with no harmonic oscillatory components (Σ_{ ARNC }, Σ_{ARCA 1}and Σ_{ARCA 2}), as expected in Fig. 5B that these are constants for increasing numbers of oscillators (invariant as they are independent of oscillators). The estimates of Σ_{ ARNC }and Σ_{ARCA 2}are very similar, because the noncausal nodes do not provide useful information for fitting the target node. The causal estimate (Σ_{ARCA 1}) is the best estimate when no harmonic oscillators are considered.
Inspection of estimates in which oscillations are included reveals an important trend; one would expect that as the number of oscillators increases, the estimates become more and more accurate. Theoretically speaking this should be the case, however realistically it is shown not to be the case as in Fig. 5B. When the number of oscillators goes beyond seven, the estimated variances drastically increase. The reason for this situation is the overfitting problem. The total number of parameters needed to be estimated depends on the number of oscillators (three extra parameters per oscillator), and the known data cannot fit the model if the parameters go beyond the amount of data. As expected, the causal estimate (Σ_{C 1}) provides the best estimate for small number of oscillators (minimum at oscillators = 3). This simulation reveals that the number of oscillators does not exceed certain value (seven in this case) if we do indeed obtain a good estimate far surpassing the accuracy of the estimate using only AR. We have to perform goodness of fit test to determine the number of oscillators that can help to fit the data best.
Gene Data & Network
Having shown the necessity of applying the harmonic approach to identifying causality in data sets where it is known that an external environmental oscillation is driving the time series, it is necessary to apply this system approach to real experimental data. The microarray gene data pertaining to the flowering clock cycle of the Arabidopsis is one such example where this methodology may prove enlightening. The plants (Arabidopsis) are grown in laboratory conditions, where they are subjected to 12 hours of artificial daylight followed by 12 hours of no light representing night time. Gene microarray data is collected at regular intervals (twice a day) throughout the experiment. Inspection of the time series of this gene expression data reveals that there is a clear periodic oscillation which corresponds to day/night time periods, suggesting that the expression levels of the genes depend upon the amount of sunlight present (see Fig. 6 panel A).
We consider the time domain change of eight genes involved in the flowering system of the Arabidopsis, namely CCA1, ELF4, GI, LHY, PRR5, PRR7, PRR9 and TOC1. The time domain trace of the expression of these genes is shown in Fig. 6 panel A. Each of the genes with the exception of GI exhibits highly oscillatory behaviour with period of one day. This periodicity is attributed to the presence of incident sunlight during the day time and its absence during the night. Some of the genes are directly affected by the light and are expressed to a greater or less extent during the day. The experimental data used for this analysis is over a period of 11 days with two measurements per day, hence data for each gene consists of 22 data points and there are 16 repetitions of each time point (4 biological repetitions and 4 technical repetitions for each measurement).
The task regarding this data set is twofold, firstly we wish to identify which of the genes are driven by the external oscillation. And secondly, we wish to determine how the genes are connected to form the causal network governing flowering of the plant. The method to determine environmental input and network connectivity is as follows. There are 56 pairwise combinations possible with eight genes; for each of these 56 gene pairs the parameters of four candidate models were calculated. These models are presented below:
where M is the number of genes in the network, p is the number of lagged observations used in the model. p can be determined by a quantity called Akaike Information Criteria (AIC) [34]. The four candidate models are descriptions of an estimation of X_{ g }used to determine the effect of X_{ h }with or without the external driver. The variance of the errors associated with each of the models are: Σ_{1}, Σ_{2}, Σ_{3}, Σ_{4} respectively.
The errors (Σ_{ i })  associated with each model were estimated for each gene pair. Using these errors we can infer both the presence of an external environmental driver and the possibility of a connection between the pair of genes.
Comparison of Σ_{1} and Σ_{3} and comparison of Σ_{2} and Σ_{4} reveal that whether particular gene may have an external input as these estimates differ only by the presence of the oscillatory input. If the estimate is improved appreciably by the addition of the harmonic term, then it is possible that this gene receives an environmental input. Therefore for each gene, a measure of the likelihood of input presence is obtained as follows: . Fig. 6B shows the oscillation metric for each of the 8 genes in the network. This method merely states which of the genes is more likely than others to have an input, so a decision must be made as the value of the oscillation metric is the cutoff point. At a first estimate, the value of M = 0.5 has been selected. Figure 6B shows that selecting a cutoff value of 0.5 for the oscillation metric leads to the following genes having an external oscillatory input; ELF4, PRR7, TOC1, LFY.
Having calculated whether the genes have an external input, it is possible to obtain the causality of each pair of genes. Consider the causality between gene X and Y. If gene X has an external input, then the causal influence X exerting upon Y is: . Whereas there is no external input to gene X, then the causality will be given by: . The errors (Σ_{ i })  and hence the causality associated with each model were found for each pair of genes and then sorted in descending order. Those with the highest level of causality deemed more likely to have a connection.
The table of errors (Σ_{ i }) shown in Fig. 7 is used to find the most likely connection in the gene network. One such candidate network is shown in Fig. 8A.
To validate the gene network generated by harmonic Granger causality, we compare it with other candidate network in the literature of circadian clock cycle by Ueda [31, 32]. The candidate network is reproduced in Fig. 8B. The three loop model was first proposed by Locke [31] and then modified by Ueda [32, 33]. In Ueda's model two hypothesized unnamed genes are omitted in our model and the genes LHY and CCA1 are treated as one entity. Our analysis reveals that four of the genes in this network receive external inputs: PRR7, CCA1, CCA1 and ELF4. The first two of these agree with Ueda's network. The Ueda's network states that TOC1 does not receive external influence but the hypothesized gene does. It is possible that this influence has been included in the time series of TOC1. The structure of the two networks also are very closely related, both showing a high level of connectivity. Perhaps the biggest difference is that our method shows that there are connections between PRR7/PRR9 and TOC1, while Ueda's model does not reveal such connections.
In addition to finding the likely connections between the genes, the frequency domain analysis allows us to investigate the frequencies at which one gene drives another. Fig. 8C shows the causality spectra calculated for each of the 15 connections believed to exist in the gene network shown in Figure 8. f_{X→Y}(ω) is shown on frequency domain between [0, 0.5]/day^{1}, whether this is calculated using the harmonic or AR method depends upon whether Gene X is thought to have a external input, i.e., those with an external input are calculated with the harmonic method, and those with no external input are calculated using the AR method. It is often the case that the two methods produce very different causality spectra, so selecting the correct method is essential, in many instances the AR method predicts a causal influence at the driving frequency, yet the harmonic method does not.
Discussion
Although harmonic causal method has greatly improved the performance of causal connection detection, there are several issues that harmonic causal method cannot answer or infer at this stage. First, the application of harmonic Granger causality has a precondition that the signal is influenced by harmonic environmental inputs. Most biological data exhibits harmonic oscillatory behavior, while there may also exist other form of nonlinear exogenous input other than harmonic form. The harmonic method cannot deal with such nonlinear interactions, and its application to nonlinear exerts would depend on specific problems. Second, harmonic causal method is developed to detect the directional causal interactions between any two elements, but it has no ability to determine the influence is positive or negative. For example, one neuron can exert an excitatory or inhibitory effect on another neuron; or one gene can cause another gene's expression level up or down. It is not possible at this stage to figure out the positive or negative effect by only determining the directional causal interactions. Third, one should take extra caution when applying multiple harmonic causal method as the overfitting problem can easily happen. Excessive number of harmonic oscillator will generate inaccurate model estimations and predictions.
Conclusion
We have presented a system biology approach to study the impact of environmental inputs on recovering network structures. The harmonic modification of the Granger causality is essential if we want to have the complete picture of causal interactions between elements in a system in the presence of a periodic environmental oscillation. The toy model example demonstrated that the conventional Granger causality was not sufficient to reveal the level of causal influence in the presence of an oscillatory driver. Furthermore, the toy model was able to validate the estimates used in the definition of the frequency domain harmonic causality. One of the motivations for the introduction of the driving oscillation was to investigate whether it is feasible to use phase differences between oscillatory signals to assess the causality. We also showed that the apparent level of influence on the conventional Granger causality was tightly related to the phase difference and noise intensity, and this artefact was enough to render the estimation of the conventional Granger causality. The harmonic Granger causality was not sensitive to these phase effects and produces more accurate estimate of the true causality. We also applied the harmonic method to detect external drivers and causal connections in a gene network. We were able to predict which genes receive an environmental input from the sun and these results are in agreement with the experimental results to a large degree. Furthermore, we were able to reproduce the network, which not only reveals known connections but also predicts new connections comparing with classical candidate models.
Our approach clearly demonstrates that by including appropriate environmental (oscillatory) inputs in a conventional reverseengineering approach could significantly improve its accuracy. Obviously the same idea could be applied to other approaches such as the Bayesian network inferences and information theory approach.
Methods
Causality in the Time Domain
In order to infer the connections between the elements of a system constituting a network, we propose an extended Granger Causality whereby a harmonic oscillatory term is added to the normal autoregressive (AR) and error terms of the conventional Granger analysis, and such simple modification can yield surprising and useful results. To appreciate the effect of the proposed modification and the power of the addition of the harmonic oscillation to the Granger causality analysis, we provide the conventional Granger causality in the supplementary material (Additional file 1) and proceed directly to the formulation of harmonic Granger Causality.
Consider two time series X_{ t }and Y_{ t }, a general form of an autoregressive model with environment inputs (sinusoidal form) has the following vector autoregressive representation:
A joint autoregressive representation having information of past measurements of both time series X_{ t }and Y_{ t }can be written as
where p is the maximum number of lagged observations in the model. ε_{ it }, i = 1, 2, 3, 4, are prediction errors with variance Σ_{ i }, which are uncorrelated over time. The value of Σ_{1} measures the accuracy of the autoregressive prediction of X based on its previous values and the harmonic term, whereas the value of Σ_{3} represents the accuracy of predicting present value of X based on previous measurements of both X and Y and the harmonic term. According to the causality definition of Granger, if the prediction of one process is improved by incorporating past information of the second process, then the second process causes the first process. In other words, if the variance of the prediction error for the first process is reduced by the inclusion of the past histories of the second process then a causal relation from the second process to the first process exists. This causal influence is quantified by
It is clear that F_{Y→X}= 0 when there is no causal influence from Y to X and F_{Y→X}> 0 when there is. Similarly, define causal influence from X to Y as
Due to the natural rhythmic dynamics of the experimental recordings, the environmental inputs (denote as E) are represented by the harmonic terms. While the inclusion of the harmonic terms can exclude the periodic influence caused by the environmental inputs, thus the prediction error can be better estimated and truly reflect the interaction between two processes. We can quantify the influence of environmental inputs (E) by recalling the joint autoregressive model of X_{ t }and Y_{ t }.
By definition of Granger causality, the causal influence from E to X or Y can be defined as
Causality with multiple Oscillators
In many aspects, the addition of a single oscillator is a generalization of the Granger causality, however the application of the adaptation to the normal autoregressive approach is limited to only one external driving force. A further generalization considered here consists of adding more oscillators to the AR model. The interpretation of this approach is as follows: the first and most simple interpretation is that the additional oscillators represent more external oscillatory driving forces, thus being a mere extension of the single oscillator case. A more enlightening and more useful interpretation is that multiple oscillators represent a 'field' of unknown influences upon the network. We know from Fourier theory that any function or signal can be represented by a (possibly infinite) summation of sinusoids, therefore the addition of multiple oscillators in this fashion can, in theory, account for any incident influence upon each of the variables within the system. This interpretation has some rather useful applications. Consider a large sparsely connected network. It is a typical scenario that due to some experimental limitations, we can only record a small proportion of information in the network as a whole. Ideally we wish to reconstruct the structure of the subnetwork for which we have recorded. Hopefully by considering multiple oscillators, this Fourierlike method will provide an avenue to recover the structure of the subnetwork.
By analogy with the single oscillator case, in multiple oscillator scenario there exist a number of unknown external inputs about which we can obtain no information. These unknown inputs and their influences are to be approximated by the summation of many oscillators. In order to calculate the interaction from Y_{ i }to X with many external oscillators and known variables, we can write the equations as
In Eq. 14, all known variables (Y) are included in the AR terms, while in Eq. 13, the variable Y_{ i }is excluded in the AR terms. M is the number of known variables, p is the total number of previous time steps included and N is the number of oscillators considered in the estimation. The errors associated with noncausal and causal estimations are ε_{ nc }and ε_{ c }respectively.
The level of causality from Y_{ i }→ X is quantified as:
If then there is no causal influence from Y_{ i }to X. If F_{Y→X}> 0 then there is a causal influence from Y_{ i }to X.
Causality in the Frequency Domain
The key of information extraction is to switch from temporal domain to frequency domain in which their information content can usually become more obvious. Fourier transform provides spectral power that identifies the amplitudes of sine functions of various frequencies that exist throughout the entire duration of the signal. The time domain Granger causality and the harmonic modification can be transformed into the frequency domain, whereby we can obtain the causality spectra showing the frequencies at which the influence of one variable is exerted on another. Expressing the harmonic time series approximations in matrix format leads to the following expression:
where the summation of over the time lags is implied such that . L is the lag operator. And the zeroth terms of the coefficient matrix are such that a(0) = 1, b(0) = 0, c(0) = 0 and d(0) = 1. O_{ x }and O_{ y }are the harmonic terms. Take the Fourier transform on both sides of this matrix equation and then multiply by the inverse of the matrix, then express X(ω) and Y(ω) in terms of the error and harmonic oscillations, we can obtain the transfer function:
Now the spectra of X(ω) and Y (ω) can be can be derived as
Thus the spectra are given by:
It is instructive to investigate the components which constitute the spectra of X(ω) and Y(ω). Expanding the expression for S_{ xx }and S_{ yy }yields an equation with 16 terms dependent upon the errors terms, (E_{ x }(ω), E_{ y }(ω)), the harmonic terms, (O_{ x }(ω), O_{ y }(ω)), the transfer functions, H_{ xx }, H_{ xy }, H_{ yx }, H_{ yy }, and their complex conjugates. For the X channel, these components are as follows:
Each element of the spectra of S_{ xx }can be thought of as either intrinsic (caused by the X_{ t }), causal (caused by Y_{ t }) or cross terms (caused by X_{ t }and Y_{ t }). Thus
where
In the absence of the harmonic oscillators, there are only four terms in the expression for S_{ xx }and S_{ yy }, and the cross term can be eliminated using the transformation proposed by Geweke [35, 36]. Eliminating the cross term is essential for a consistent definition of the frequency domain causality, however the addition of the harmonic terms makes the prospect of removing the cross term rather troublesome. In the harmonic case, it is not possible, in general to eliminate the cross terms by means of transformation due to the presence of O_{ x }and O_{ y }terms. The oscillation terms O_{x, y}are sinusoidal indicating that the Fourier transforms of these functions are delta functions, and the discontinuous nature of the delta functions makes it impossible to find a transformation eliminate all the cross terms. The method we use to eliminate the cross terms is as follows: firstly we apply the approximation of the Geweke transformation:
where γ^{2} is the covariance matrix between X and Y, S is the variance of either error E_{ x }or E_{ y }.
The step from Eq. 22 to Eq. 23 suggests that the transfer matrix H is an approximate to the true value. This estimate is necessary to ensure that the causality has a consistent definition. The transfer equation is now as follows:
We combined Eq. 18 and Eq. 24 to define X(ω) as follows:
Where and . This has the effect of nullifying the cross terms which contain the element of error. Then the cross terms (components of S_{ cross }) are reallocated either to S_{ causal }or S_{ intrinsic }in the following fashion:
After the reallocation of the components of the spectrum the resultant causality is an approximation rather than a precise calculation, yet it can be shown to yield convincing and consistent results. Using these two methods to approximate the spectrum, we have obtained the spectrum in such a format as the term is negligible. In the case where the harmonic term is not present, the causality is defined as:
Yet, it is essential that the causality in the harmonic case is defined in terms of both S_{ intrinsic }and S_{ causal }. Therefore, by analogy to the normal frequency domain causality (without harmonic terms), the frequency domain causality in the harmonic case is defined as:
In the Results section, we will show through examples that this approximation produces an excellent estimate of the frequency domain causality.
References
 1.
Alon U: Biological networks: the tinkerer as an engineer. Science. 2003, 301 (5641): 18661867. 10.1126/science.1089072
 2.
Gardner TS, di Bernardo D, Lorenz D, Collins JJ: Inferring genetic networks and identifying compound mode of action via expression profiling. Science. 2003, 301 (5629): 102105. 10.1126/science.1081900
 3.
Lopes da Silva F, Pijn JP, Boeijinga P: Interdependence of eeg signals: linear vs. nonlinear associations and the significance of time delays and phase shifts. Brain Topogr. 1989, 2 (12): 918. 10.1007/BF01128839
 4.
Kocsis B, Bragin A, Buzsaki G: Interdependence of multiple theta generators in the hippocampus: a partial coherence analysis. J Neurosci. 1999, 19 (14): 62006212.
 5.
Albo Z, Di Prisco GV, Chen YH, Rangarajan G, Truccolo W, Feng JF, Vertes RP, Ding MZ: Is partial coherence a viable technique for identifying generators of neural oscillations?. Biological Cybernetics. 2004, 90 (5): 318326. 10.1007/s0042200404755
 6.
Schelter B, Winterhalder M, Eichler M, Peifer M, Hellwig B, Guschlbauer B, Lucking CH, Dahlhaus R, Timmer J: Testing for directed influences among neural signals using partial directed coherence. J Neurosci Methods. 2006, 152 (12): 210219. 10.1016/j.jneumeth.2005.09.001
 7.
Wu J, Kendrick K, Feng J: Detecting correlation changes in electrophysiological data. J Neurosci Methods. 2007, 161 (1): 155165. 10.1016/j.jneumeth.2006.10.017
 8.
Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. Journal of Computational Biology. 2000, 7 (34): 601620. 10.1089/106652700750050961
 9.
Granger CWJ: Investigating Causal Relations by Econometric Models and CrossSpectral Methods. Econometrica. 1969, 37 (3): 414428. 10.2307/1912791.
 10.
Granger CWJ: Testing for Causality  A Personal Viewpoint. Journal of Economic Dynamics & Control. 1980, 2 (4): 329352. 10.1016/01651889(80)90069X.
 11.
Gourevitch B, Le BouquinJeannes R, Faucon G: Linear and nonlinear causality between signals: methods, examples and neurophysiological applications. Biological Cybernetics. 2006, 95 (4): 349369. 10.1007/s0042200600980
 12.
Ding M, Chen , Yonghong M, Bressler SL: Granger Causality: Basic Theory and Application to Neuroscience. Handbook of Time Series Analysis. Edited by: Schelter BWM, Timmer J. 2006, 451474. WileyVCH Verlage
 13.
Seth AK: Causal networks in simulated neural systems. Cogn Neurodyn. 2008, 2 (1): 4964. 10.1007/s115710079031z
 14.
Chen YH, Bressler SL, Ding MZ: Frequency decomposition of conditional Granger causality and application to multivariate neural field potential data. J Neurosci Methods. 2006, 150 (2): 228237. 10.1016/j.jneumeth.2005.06.011
 15.
Guo S, Seth AK, Kendrick KM, Zhou C, Feng J: Partial Granger causalityeliminating exogenous inputs and latent variables. J Neurosci Methods. 2008, 172 (1): 7993. 10.1016/j.jneumeth.2008.04.011
 16.
Marinazzo D, Pellicoro M, Stramaglia S: KernelGranger causality and the analysis of dyanmical networks. Physical Review E. 2008, 77 (056215): 19.
 17.
Wu J, Kendrick K, Feng J: A novel approach to detect HotSpots in largescale multivariate data. BMC Bioinformatics. 2007, 8: 331 10.1186/147121058331
 18.
Guo S, Wu J, Ding M, Feng J: Uncovering interactions in the frequency domain. PLoS Comput Biol. 4 (5): e1000087
 19.
Chance B, Estabrook RW, Ghosh A: Damped sinusoidal oscillations of cytoplasmic reduced pyridine nucleotide in yeast cells. Proc Natl Acad Sci USA. 1964, 51: 12441251. 10.1073/pnas.51.6.1244
 20.
Chrobak JJ, Buzsaki G: Gamma oscillations in the entorhinal cortex of the freely behaving rat. J Neurosci. 1998, 18 (1): 388398.
 21.
Reppert SM, Weaver DR: Molecular analysis of mammalian circadian rhythms. Annu Rev Physiol. 2001, 63: 647676. 10.1146/annurev.physiol.63.1.647
 22.
Buzsaki G: Rhythms of the Brain. 2006, Oxford: Oxford University Press
 23.
Yan J, Wang HF, Liu YT, Shao CX: Analysis of Gene Regulatory Networks in the Mammalian Circadian Rhythm. PLoS Comput Biol. 2008, 4 (10): 1310.1371/journal.pcbi.0040013.
 24.
Tiao GC, Grupe MR: Hidden Periodic AutoregressiveMoving Average Models in TimeSeries Data. Biometrika. 1980, 67 (2): 365373.
 25.
He S: Estimation of the mixed ar and hidden periodic model. Acta Mathematicae Applicatae Sinica. 1997, 13 (2): 196208. 10.1007/BF02015141.
 26.
Brovelli A, Ding M, Ledberg A, Chen Y, Nakamura R, Bressler SL: Beta oscillations in a largescale sensorimotor cortical network: directional influences revealed by Granger causality. Proc Natl Acad Sci USA. 2004, 101 (26): 98499854. 10.1073/pnas.0308538101
 27.
Wu J, Liu X, Feng J: Detecting causality between different frequencies. J Neurosci Methods. 2008, 167 (2): 367375. 10.1016/j.jneumeth.2007.08.022
 28.
Tass P, Rosenblum MG, Weule J, Kurths J, Pikovsky A, Volkmann J, Schnitzler A, Freund HJ: Detection of n:m phase locking from noisy data: Application to magnetoencephalography. Physical Review Letters. 1998, 81 (15): 32913294. 10.1103/PhysRevLett.81.3291.
 29.
Lachaux JP, Rodriguez E, Martinerie J, Varela FJ: Measuring phase synchrony in brain signals. Human Brain Mapping. 1999, 8 (4): 194208. 10.1002/(SICI)10970193(1999)8:4<194::AIDHBM4>3.0.CO;2C
 30.
Rubiola E: Phase Noise and Frequency Stability in Oscillators. The Cambridge RF and Microwave Engineering Series. 2008, Cambridge University Press
 31.
Locke JC, KozmaBognar L, Gould PD, Feher B, Kevei E, Nagy F, Turner MS, Hall A, Millar AJ: Experimental validation of a predicted feedback loop in the multioscillator clock of Arabidopsis thaliana. Mol Syst Biol. 2006, 2: 59 10.1038/msb4100102
 32.
Ueda HR: Systems biology flowering in the plant clock field. Mol Syst Biol. 2006, 2: 60 10.1038/msb4100105
 33.
Ueda HR, Chen WB, Adachi A, Wakamatsu H, Hayashi S, Takasugi T, Nagano M, Nakahama K, Suzuki Y, Sugano S, et al.: A transcription factor response element for gene expression during circadian night. Nature. 2002, 418 (6897): 534539. 10.1038/nature00906
 34.
Akaike H: New Look at StatisticalModel Identification. IEEE Transactions on Automatic Control. 1974, AC19 (6): 716723. 10.1109/TAC.1974.1100705.
 35.
Geweke J: Measurement of lineardependence and feedback between multiple time series. Journal of the American Statistical Association. 1982, 77 (378): 304313. 10.2307/2287238.
 36.
Geweke J: Measures of conditional lineardependence and feedback between time series. Journal of the American Statistical Association. 1984, 79 (388): 907915. 10.2307/2288723.
Acknowledgements
We thank Jing Kang to read the manuscript and give critical comments. JF was supported by an EPSRC (UK) grant CARMAN and an EU grant BION.
Author information
Affiliations
Corresponding author
Additional information
Authors' contributions
JHW carried out model development and data analysis and wrote up the manuscript. JLS carried out the model development and data analysis. VBW provided the leaf gene data. JFF conceived of the study, participated in design, supervised the studies. All authors read and approved the final manuscript.
Jianhua Wu, James L Sinfield contributed equally to this work.
Electronic supplementary material
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
About this article
Cite this article
Wu, J., Sinfield, J.L., BuchananWollaston, V. et al. Impact of environmental inputs on reverseengineering approach to network structures. BMC Syst Biol 3, 113 (2009). https://doi.org/10.1186/175205093113
Received:
Accepted:
Published:
Keywords
 Harmonic Oscillator
 Granger Causality
 Cross Term
 Causal Influence
 Causal Interaction