# Impact of environmental inputs on reverse-engineering approach to network structures

- Jianhua Wu†
^{1}, - James L Sinfield†
^{2}, - Vicky Buchanan-Wollaston
^{3}and - Jianfeng Feng
^{2, 4}Email author

**3**:113

https://doi.org/10.1186/1752-0509-3-113

© Wu et al; licensee BioMed Central Ltd. 2009

**Received: **14 April 2009

**Accepted: **4 December 2009

**Published: **4 December 2009

## Abstract

### Background

Uncovering complex network structures from a biological system is one of the main topic in system biology. The network structures can be inferred by the dynamical Bayesian network or Granger causality, but neither techniques have seriously taken into account the impact of environmental inputs.

### Results

With considerations of natural rhythmic dynamics of biological data, we propose a system biology approach to reveal the impact of environmental inputs on network structures. We first represent the environmental inputs by a harmonic oscillator and combine them with Granger causality to identify environmental inputs and then uncover the causal network structures. We also generalize it to multiple harmonic oscillators to represent various exogenous influences. This system approach is extensively tested with toy models and successfully applied to a real biological network of microarray data of the flowering genes of the model plant Arabidopsis Thaliana. The aim is to identify those genes that are directly affected by the presence of the sunlight and uncover the interactive network structures associating with flowering metabolism.

### Conclusion

We demonstrate that environmental inputs are crucial for correctly inferring network structures. Harmonic causal method is proved to be a powerful technique to detect environment inputs and uncover network structures, especially when the biological data exhibit periodic oscillations.

## Background

One of the main topics in system biology is to uncover the complex network structures in a biological system [1, 2]. In comparison with simple systems, nowadays the researchers always face larger and more complex dynamic interactive systems (e.g., neural networks and gene networks). Traditional techniques, such as the cross-correlation and partial coherence analysis [3–7], are inadequate to clearly and explicitly reveal the true network structures for such a complex system. These techniques neither take time dimension into consideration nor reveal the directional interactions, thus they cannot configure a dynamic interactive system with time. Over the past few decades several advanced techniques such as dynamic Bayesian networks [8] and Granger causality [9–13] have been developed to identify network structures in dynamic systems. Granger causality only reveals direct causality between pairwise units with linear interactions, thus conditional and partial Granger causality [14, 15] and kernel Granger causality [16] have been proposed to deal with indirect causal interactions among multiple variables.

In multivariable (more than two) situations, one time series can be connected to another time series in a direct or an indirect manner, raising the important question of whether there exists a (direct) causal influence between two time series when the influence of other time series are taken into account. In such cases, repeated bivariate analysis can be misleading. For example, one time series may falsely appear to cause another if they are both influenced by a third time series but with different time delays. The conditional Granger causality [14, 15] aim to deal with the causal interactions among multiple variables. However, the applicability of the kernel Granger causality or the conditional Granger causality largely depends on the experimental ability to measure all relevant variables in the system, but it is usually not feasible in the biological recordings. Environmental inputs, including exogenous inputs from external sources and unmeasured endogenous variables, cannot be all captured by experimental techniques, but such environmental inputs can confound the accuracy of causal influences and thus degrade the credibility of the uncovered network structure. For example, in our experimental data recorded from the inferotemporal (IT) cortex of sheep, every measured neuron receives common exogenous inputs from the visual cortex and feedbacks from the prefrontal cortex [7, 15]. Even with advanced multielectrode array (MEA) techniques, it is only able to record a tiny subset of interacting neurons in a single area [15, 17] and there are bound to be endogenous variables. Hence controlling environmental inputs is a critical issue when applying Granger causality to experimental data. Recently, partial Granger causality [15, 18] is developed to eliminate the influences of exogenous inputs and latent variables, but a full elimination is only possible if all common inputs have a more or less identical influence on all measured variables. It is generally not realistic that all measured variables can receive an almost identical and common influence in experimental recordings. In fact, the common influence degrades due to spatial increment. In such cases, it is critical to identify which measured variables received environmental inputs, and what is the impact of the environmental inputs on configuring network structures?

We take a system biology approach to answer the questions above. Most current techniques largely ignore the natural dynamical characteristics of the biological data, which usually exhibits highly rhythmic (periodic) oscillations, especially under periodic environmental influence, e.g. light-dark condition [19–23]. Such natural periodic dynamics of experimental data can provide important information in model fitting and error estimation. To overcome the limitations of current causality techniques and make full use of harmonic oscillation characteristic of experimental data, we consider a harmonic oscillator, or a set of harmonic oscillators, to represent the environmental inputs. The harmonic oscillators can be mathematically formulated by the hidden periodic model [24, 25]. We extend the current linear Granger causality model (Autoregressive model) by inclusion of the typical harmonic oscillators embedded in the experimental recordings. If the inclusion of harmonic oscillators can significantly reduce the variance of the prediction error, then the environmental causal influence can be reduced or eliminated. The mathematical representation of a harmonic oscillator model is derived in Method section and the application of the harmonic Granger causality approach is elucidated in Result section. Although the techniques of Granger causality are based on time series data, additional useful information can be revealed when the analysis is performed in the frequency domain [14, 18, 26, 27]. Investigating the causal interactions between different frequencies adds another dimension to the already complex identification of spatiotemporal and frequency-specific rhythmic oscillations. Conventional cross-frequency interactions are characterized by the synchrony of phase, recognized as 'n:m phase synchrony' [28, 29]. Phase synchrony indicates amplitude-independent phase-locking of n cycles of one oscillation to m cycles of another oscillation, however, this method largely ignores the information carried by the amplitude and the coupling effects between phase and amplitude. Importantly, phase information can be sensitive to random noise [30], while in most experimental data the true signal is heavily masked by random noise. In this study, as a by-product, we also assess whether it is reliable to use the phase information between two (oscillating) units to estimate the causality. Our simulation results clearly demonstrate that it can be very insufficient and inaccurate to use only the phase information to characterize a causal interaction, but the approach developed in the current paper works. We first apply the harmonic oscillator idea to a toy model and validate it by comparing with the conventional Granger causality. Then we investigate the effect of multiple oscillators by employing a small sparsely connected network. Finally we apply the harmonic Granger causality to a real biological network of microarray data of the flowering genes of the model plant Arabidopsis Thaliana. We aim to identify which genes are directly affected by the presence of the sunlight, and uncover the causal interactions among genes. Although tens of thousands of known genes within Arabidopsis are collected with Microarray, only those genes known to be involved in the flowering of the plant (8 genes in this case) are analyzed by our harmonic Granger causality. The method successfully reveals the genes that most possibly receive environmental inputs. We finally compare our causal network with other candidate models in the literatures [31–33]. With this system biology approach, our causal network depicts all possible connections reported in the literatures [32, 33], and also reveals two more connections that do not exist in the known candidate models.

## Results

### Toy Model & Validation

Within this model we consider four configurations, firstly where both node *X*_{
t
}and *Y*_{
t
}experience an environmental input, secondly where *X*_{
t
}experiences the environmental input and *Y*_{
t
}does not, thirdly where *Y*_{
t
}experiences the environmental input and *X*_{
t
}does not, and lastly where neither *X*_{
t
}or *Y*_{
t
}undergo the influence from the external driving oscillation. Despite the external driving oscillation exerted on either *X* or *Y*, there is a coupling frequency of 34 Hz between *X* and *Y* for all four configurations. In each of the simulations, *f*_{
x
}and *f*_{
y
}are equal to 10 *Hz*, the input phases, *ϕ*_{
x
}and *ϕ*_{
y
}are set to zero, and the variances of inherent noises, *ε*_{
t
}and *η*_{
t
}, are set to 0.1. In all the following simulations, the values of *C*_{
x
}and *C*_{
y
}are 0.07 and 0.05, respectively.

*X*(

*ω*) and

*Y*(

*ω*) have components at the driving oscillation frequency of 10

*Hz*. The causal link known to be present from X to Y should mean that there is a spike in the causality spectra at the driving frequency. Fig. (1Di) and (1Ei) show that the normal Granger AR method can only detect the causality spectra at the intrinsic coupling frequency around 34 Hz, but the Harmonic method is capable of detecting the causal influence at the external driving frequency at 10 Hz. The reason is that the driving force from

*X*to

*Y*comes from the factor 0.3

*X*

_{ t }- 2 (Eq. 1), which contains both intrinsic coupling frequency and external driving frequency. Harmonic method can better fit and predict the data at external driving frequency such that it can detect the external driving frequency at 10 Hz better than normal Granger AR method.

The second case is where *X*_{
t
}receives the driving input and *Y*_{
t
}does not. Fig. 1Bii shows that the peak in the spectra can be seen in the X-channel at 10 *Hz* representing the external driving oscillation, and although there is no inherent 10 *Hz* oscillation in the *Y*_{
t
}channel, the spectra of *Y* (*ω*) (Fig. 1Cii) shows that there is an induced oscillation at this frequency, hence one would expect that the causality of *F*_{X→Y}will show a large component at the driving frequency. Fig. (1Dii) and 1Eii show the causality, *F*_{X→Y}calculated using both AR and harmonic methods. The AR method shows little peak at the driving frequency, while the harmonic method produces a large spike in the causality at the frequency 10 Hz. In third case where *Y*_{
t
}receives the external input, Fig. (1Ciii) shows that the peak in the spectra can be seen in the Y-channel at 10 *Hz* representing the external driving oscillation. We would expect that in this case the causality shows no peak at the driving frequency as *X*_{
t
}contains no driving oscillation: Fig. (1Diii) and (1Eiii) show the causality, *F*_{X→Y}calculated using AR (panel *D*_{
iii
}) and harmonic methods (panel *E*_{
iii
}). In this model configuration, both the AR and the harmonic methods produce similar results. However there is a drastic decrement in the spectra of the harmonic method at the driving frequency in the causality, qualitatively, this cannot be the case, so in this instance it is preferable to use the AR method to calculate the causality spectra as *X*_{
t
}contains no driving oscillation. The causality decrement near the driving frequency may be caused by the inclusion of the harmonic term, which may extract the intrinsic power of the signal *Y* at the driving frequency of 10 Hz. The decrement means that signal *Y* was mainly driven by the harmonic term at 10 Hz and signal *X* did not contribute much at this frequency. For other frequency range, harmonic term did not influence signal *Y* and the driving force mainly came from signal *X*. The final model configuration considered is where neither *X*_{
t
}or *Y*_{
t
}has the driving input. Fig. (1Div) and (1Eiv) show that in this case there is very little difference in the causality spectra obtained using the two methods.

Through this simple toy model we have demonstrated that the normal Granger Causality in the frequency domain is not sufficient to detect interactions at all frequencies in the presence of an environmental input. Furthermore we have demonstrated that the causal method with the additional harmonic term produces more consistent and accurate results than the traditional Granger causality method at most instances. The harmonic causal method and the traditional Granger causality method can be a good complement to each other when applying to the time series with or without oscillatory environmental influences.

### Investigating the Effects of Phase

*X*

_{ t }drives

*Y*

_{ t }, Eq. 1 shows that the influence of the time series

*X*

_{ t }upon

*Y*

_{ t }is encapsulated in the oscillatory and noise terms. However, the amplitude of the resultant oscillation will depend on the phase difference of the two harmonic terms. This can be seen by considering the addition of the two oscillatory terms. Let

*O*

_{ y }=

*C*

_{ y }

*cos*(2

*πf*

_{ y }

*t*+

*ϕy*) and the harmonic component from

*X*

_{ t }be , then in the case that the frequency of the oscillations are equal,

*f*

_{ x }=

*f*

_{ y }=

*f*, the two oscillation terms can be combined as follows:

It is able to analyze the effect of input phase upon the level of influence of one time series upon another. Eq. 3 reveals that the magnitude of the resultant harmonic term is indeed a function of both
and *ϕ*_{
y
}. The oscillating term,
, oscillates sinusoidally and can take values in the range [-1, 1].

The relative effect of this oscillation upon
depends on the values of *C*_{
y
}and
. The extremes of
are given by
, and it can be shown that the maximum effect of the phase differences happens when *C*_{
y
}=
and the minimal difference happens when either *C*_{
y
}or
equals zero.

*ϕ*

_{ x }and

*ϕ*

_{ y }. Fig. 2A shows that the configuration of the model with both nodes of the system receiving external environmental inputs and a causal link from X to Y. Fig. 2B shows two examples in the time domain traces obtained in the absence of environmental noise. The upper panel Fig. 2B1 shows the time domain traces when the phase difference of the external inputs is zero (

*ϕ*

_{ x }=

*ϕ*

_{ y }) and the lower panel Fig. 2B2 shows the scenario when

*ϕ*

_{ x }-

*ϕ*

_{ y }=

*π*. In each of these diagrams the blue trace, which is the node

*X*

_{ t }, is unchanged, however the time series associated with

*Y*

_{ t }change considerably. The difference of the maximum amplitudes of the two time series is denoted by Δ. The colourmap figure shown in Fig. 2C plotted the value of Δ against the values of the input noise and the input phase differences. It can be seen that the effects of the phase differences is lessened by increasing the noise in the system. Fig. 2D reveals the interdependence between noise and phase in this system. The upper panel Fig. 2D1 shows the amplitude difference against the phase differences in the absence of noise. The reason why the peak does not happen at

*p*is because

*C*

_{ x }and

*C*

_{ y }are different, and there is a phase difference. If

*C*

_{ x }=

*C*

_{ y }, then the peak would be exactly at

*p*. The lower panel Fig. 2D2 shows how a measure of the difference varies with increasing noise. The measure of the difference is defined as . As the noise increases the measure difference tends to zero, confirming that the noise levels can mask the phase effects.

*X*

_{ t }exerts over

*Y*

_{ t }does not alter throughout the simulations. The investigation into the effects of phase continued with a series of calculations determining the level of causality. Fig. 3 shows the causality detected in the system obtained by four different methods. Fig. 3A1 shows the time domain causality detected using the normal Granger method, Fig. 3B1 shows the time domain causality detected using the harmonic method, Fig. 3A2 and 3B2 show the causality detected using the AR and harmonic methods in the frequency domain. For consistency we require that:

where *F*_{X→Y}is the causality on the time domain and *f*_{X→Y}(*ω*) is the causality at frequency *ω*. Eq. 6 is the Kolmogorov condition that indicates the equivalence of the causality in time and frequency domain.

Comparing the AR and harmonic causality results, the amplitude differences influenced by the phase differences are represented in the AR method but not in the harmonic causality results, which imply that the harmonic approach must be used for a true indication of the level of causal interaction as it is robust in presence of the external driving oscillation.

Comparison of Fig. (3A1) and Fig. (3A2) shows that the Eq. 6 holds for the AR method and there is a high level of correspondence between the causality calculated in the time domain and frequency domain. Similarly, comparison of Fig. (3B1) and Fig. (3B2) shows that to a large extent the harmonic method is consistent with the Kolmogorov condition in Eq. 6. The causality calculated by harmonic method does not depend on the phase difference no matter it is in the time domain or frequency domain.

*p*and the noise level (variance of the white noise) increased from 0 to 0.2. The phase difference of signal

*X*and

*Y*was plotted as a function against the phase difference of the input signals and the noise level. Fig. 4A shows the colourmap of the interrelation. The color intensity represents the phase difference for the output signals. Fig. 4B and Fig. 4C show the averaged intensity along noise level and the input phase difference, respectively. It is clear that the phase difference of the output signals does not depend on the phase difference of the input signals and the noise level. The results indicate that the phase information cannot be used alone to accurately determine the causal relationship between any two signals. The interpretation of the causality based on phase should be cautious as the causality may not reflect the true relationship.

### Investigating Effect of Multiple Oscillators

- i)
Σ

_{ NC }: AR + harmonic oscillations, no causality

ii) Σ_{C 1}: AR + harmonic oscillations, influence from causal node

iii) Σ_{C 2}: AR + harmonic oscillations, influence from non-causal nodes

_{ NAR }: harmonic oscillations only

- v)
Σ

_{ ARNC }: AR only

vi) Σ_{ARCA 1}: AR, influence from causal node

vii) Σ_{ARCA 2}: AR, influence from non-causal nodes

To investigate the effect of multiple oscillators on the goodness of fit, we consider a simple network consisting of five nodes in various random configurations. The variances are estimated for each of the connection scheme in Fig. 5A, and the number of harmonic oscillators varies from 1 to 11. The results of this simulation are shown in Fig. 5B. Inspection of Fig. 5B reveals some of the issues involved in using the harmonic oscillators to provide a full description of the time domain evolution. Firstly looking at those variance estimates with no harmonic oscillatory components (Σ_{
ARNC
}, Σ_{ARCA 1}and Σ_{ARCA 2}), as expected in Fig. 5B that these are constants for increasing numbers of oscillators (invariant as they are independent of oscillators). The estimates of Σ_{
ARNC
}and Σ_{ARCA 2}are very similar, because the non-causal nodes do not provide useful information for fitting the target node. The causal estimate (Σ_{ARCA 1}) is the best estimate when no harmonic oscillators are considered.

Inspection of estimates in which oscillations are included reveals an important trend; one would expect that as the number of oscillators increases, the estimates become more and more accurate. Theoretically speaking this should be the case, however realistically it is shown not to be the case as in Fig. 5B. When the number of oscillators goes beyond seven, the estimated variances drastically increase. The reason for this situation is the overfitting problem. The total number of parameters needed to be estimated depends on the number of oscillators (three extra parameters per oscillator), and the known data cannot fit the model if the parameters go beyond the amount of data. As expected, the causal estimate (Σ_{C 1}) provides the best estimate for small number of oscillators (minimum at oscillators = 3). This simulation reveals that the number of oscillators does not exceed certain value (seven in this case) if we do indeed obtain a good estimate far surpassing the accuracy of the estimate using only AR. We have to perform goodness of fit test to determine the number of oscillators that can help to fit the data best.

### Gene Data & Network

We consider the time domain change of eight genes involved in the flowering system of the Arabidopsis, namely CCA1, ELF4, GI, LHY, PRR5, PRR7, PRR9 and TOC1. The time domain trace of the expression of these genes is shown in Fig. 6 panel A. Each of the genes with the exception of GI exhibits highly oscillatory behaviour with period of one day. This periodicity is attributed to the presence of incident sunlight during the day time and its absence during the night. Some of the genes are directly affected by the light and are expressed to a greater or less extent during the day. The experimental data used for this analysis is over a period of 11 days with two measurements per day, hence data for each gene consists of 22 data points and there are 16 repetitions of each time point (4 biological repetitions and 4 technical repetitions for each measurement).

where *M* is the number of genes in the network, *p* is the number of lagged observations used in the model. *p* can be determined by a quantity called Akaike Information Criteria (AIC) [34]. The four candidate models are descriptions of an estimation of *X*_{
g
}used to determine the effect of *X*_{
h
}with or without the external driver. The variance of the errors associated with each of the models are: Σ_{1}, Σ_{2}, Σ_{3}, Σ_{4} respectively.

The errors (Σ_{
i
}) - associated with each model were estimated for each gene pair. Using these errors we can infer both the presence of an external environmental driver and the possibility of a connection between the pair of genes.

Comparison of Σ_{1} and Σ_{3} and comparison of Σ_{2} and Σ_{4} reveal that whether particular gene may have an external input as these estimates differ only by the presence of the oscillatory input. If the estimate is improved appreciably by the addition of the harmonic term, then it is possible that this gene receives an environmental input. Therefore for each gene, a measure of the likelihood of input presence is obtained as follows:
. Fig. 6B shows the oscillation metric for each of the 8 genes in the network. This method merely states which of the genes is more likely than others to have an input, so a decision must be made as the value of the oscillation metric is the cut-off point. At a first estimate, the value of *M* = 0.5 has been selected. Figure 6B shows that selecting a cut-off value of 0.5 for the oscillation metric leads to the following genes having an external oscillatory input; ELF4, PRR7, TOC1, LFY.

Having calculated whether the genes have an external input, it is possible to obtain the causality of each pair of genes. Consider the causality between gene *X* and *Y*. If gene *X* has an external input, then the causal influence *X* exerting upon *Y* is:
. Whereas there is no external input to gene *X*, then the causality will be given by:
. The errors (Σ_{
i
}) - and hence the causality associated with each model were found for each pair of genes and then sorted in descending order. Those with the highest level of causality deemed more likely to have a connection.

_{ i }) shown in Fig. 7 is used to find the most likely connection in the gene network. One such candidate network is shown in Fig. 8A.

To validate the gene network generated by harmonic Granger causality, we compare it with other candidate network in the literature of circadian clock cycle by Ueda [31, 32]. The candidate network is reproduced in Fig. 8B. The three loop model was first proposed by Locke [31] and then modified by Ueda [32, 33]. In Ueda's model two hypothesized unnamed genes are omitted in our model and the genes LHY and CCA1 are treated as one entity. Our analysis reveals that four of the genes in this network receive external inputs: PRR7, CCA1, CCA1 and ELF4. The first two of these agree with Ueda's network. The Ueda's network states that TOC1 does not receive external influence but the hypothesized gene does. It is possible that this influence has been included in the time series of TOC1. The structure of the two networks also are very closely related, both showing a high level of connectivity. Perhaps the biggest difference is that our method shows that there are connections between PRR7/PRR9 and TOC1, while Ueda's model does not reveal such connections.

In addition to finding the likely connections between the genes, the frequency domain analysis allows us to investigate the frequencies at which one gene drives another. Fig. 8C shows the causality spectra calculated for each of the 15 connections believed to exist in the gene network shown in Figure 8. *f*_{X→Y}(*ω*) is shown on frequency domain between [0, 0.5]/day^{-1}, whether this is calculated using the harmonic or AR method depends upon whether Gene *X* is thought to have a external input, i.e., those with an external input are calculated with the harmonic method, and those with no external input are calculated using the AR method. It is often the case that the two methods produce very different causality spectra, so selecting the correct method is essential, in many instances the AR method predicts a causal influence at the driving frequency, yet the harmonic method does not.

## Discussion

Although harmonic causal method has greatly improved the performance of causal connection detection, there are several issues that harmonic causal method cannot answer or infer at this stage. First, the application of harmonic Granger causality has a precondition that the signal is influenced by harmonic environmental inputs. Most biological data exhibits harmonic oscillatory behavior, while there may also exist other form of nonlinear exogenous input other than harmonic form. The harmonic method cannot deal with such nonlinear interactions, and its application to nonlinear exerts would depend on specific problems. Second, harmonic causal method is developed to detect the directional causal interactions between any two elements, but it has no ability to determine the influence is positive or negative. For example, one neuron can exert an excitatory or inhibitory effect on another neuron; or one gene can cause another gene's expression level up or down. It is not possible at this stage to figure out the positive or negative effect by only determining the directional causal interactions. Third, one should take extra caution when applying multiple harmonic causal method as the overfitting problem can easily happen. Excessive number of harmonic oscillator will generate inaccurate model estimations and predictions.

## Conclusion

We have presented a system biology approach to study the impact of environmental inputs on recovering network structures. The harmonic modification of the Granger causality is essential if we want to have the complete picture of causal interactions between elements in a system in the presence of a periodic environmental oscillation. The toy model example demonstrated that the conventional Granger causality was not sufficient to reveal the level of causal influence in the presence of an oscillatory driver. Furthermore, the toy model was able to validate the estimates used in the definition of the frequency domain harmonic causality. One of the motivations for the introduction of the driving oscillation was to investigate whether it is feasible to use phase differences between oscillatory signals to assess the causality. We also showed that the apparent level of influence on the conventional Granger causality was tightly related to the phase difference and noise intensity, and this artefact was enough to render the estimation of the conventional Granger causality. The harmonic Granger causality was not sensitive to these phase effects and produces more accurate estimate of the true causality. We also applied the harmonic method to detect external drivers and causal connections in a gene network. We were able to predict which genes receive an environmental input from the sun and these results are in agreement with the experimental results to a large degree. Furthermore, we were able to reproduce the network, which not only reveals known connections but also predicts new connections comparing with classical candidate models.

Our approach clearly demonstrates that by including appropriate environmental (oscillatory) inputs in a conventional reverse-engineering approach could significantly improve its accuracy. Obviously the same idea could be applied to other approaches such as the Bayesian network inferences and information theory approach.

## Methods

### Causality in the Time Domain

In order to infer the connections between the elements of a system constituting a network, we propose an extended Granger Causality whereby a harmonic oscillatory term is added to the normal autoregressive (AR) and error terms of the conventional Granger analysis, and such simple modification can yield surprising and useful results. To appreciate the effect of the proposed modification and the power of the addition of the harmonic oscillation to the Granger causality analysis, we provide the conventional Granger causality in the supplementary material (Additional file 1) and proceed directly to the formulation of harmonic Granger Causality.

*X*

_{ t }and

*Y*

_{ t }, a general form of an autoregressive model with environment inputs (sinusoidal form) has the following vector autoregressive representation:

*X*

_{ t }and

*Y*

_{ t }can be written as

*p*is the maximum number of lagged observations in the model.

*ε*

_{ it },

*i*= 1, 2, 3, 4, are prediction errors with variance Σ

_{ i }, which are uncorrelated over time. The value of Σ

_{1}measures the accuracy of the autoregressive prediction of

*X*based on its previous values and the harmonic term, whereas the value of Σ

_{3}represents the accuracy of predicting present value of

*X*based on previous measurements of both

*X*and

*Y*and the harmonic term. According to the causality definition of Granger, if the prediction of one process is improved by incorporating past information of the second process, then the second process causes the first process. In other words, if the variance of the prediction error for the first process is reduced by the inclusion of the past histories of the second process then a causal relation from the second process to the first process exists. This causal influence is quantified by

*F*

_{Y→X}= 0 when there is no causal influence from

*Y*to

*X*and

*F*

_{Y→X}> 0 when there is. Similarly, define causal influence from

*X*to

*Y*as

*E*) are represented by the harmonic terms. While the inclusion of the harmonic terms can exclude the periodic influence caused by the environmental inputs, thus the prediction error can be better estimated and truly reflect the interaction between two processes. We can quantify the influence of environmental inputs (

*E*) by recalling the joint autoregressive model of

*X*

_{ t }and

*Y*

_{ t }.

### Causality with multiple Oscillators

In many aspects, the addition of a single oscillator is a generalization of the Granger causality, however the application of the adaptation to the normal autoregressive approach is limited to only one external driving force. A further generalization considered here consists of adding more oscillators to the AR model. The interpretation of this approach is as follows: the first and most simple interpretation is that the additional oscillators represent more external oscillatory driving forces, thus being a mere extension of the single oscillator case. A more enlightening and more useful interpretation is that multiple oscillators represent a 'field' of unknown influences upon the network. We know from Fourier theory that any function or signal can be represented by a (possibly infinite) summation of sinusoids, therefore the addition of multiple oscillators in this fashion can, in theory, account for any incident influence upon each of the variables within the system. This interpretation has some rather useful applications. Consider a large sparsely connected network. It is a typical scenario that due to some experimental limitations, we can only record a small proportion of information in the network as a whole. Ideally we wish to reconstruct the structure of the subnetwork for which we have recorded. Hopefully by considering multiple oscillators, this Fourier-like method will provide an avenue to recover the structure of the subnetwork.

*Y*

_{ i }to

*X*with many external oscillators and known variables, we can write the equations as

In Eq. 14, all known variables (*Y*) are included in the AR terms, while in Eq. 13, the variable *Y*_{
i
}is excluded in the AR terms. *M* is the number of known variables, *p* is the total number of previous time steps included and *N* is the number of oscillators considered in the estimation. The errors associated with noncausal and causal estimations are *ε*_{
nc
}and *ε*_{
c
}respectively.

If
then there is no causal influence from *Y*_{
i
}to *X*. If F_{Y→X}> 0 then there is a causal influence from *Y*_{
i
}to *X*.

### Causality in the Frequency Domain

*L*is the lag operator. And the zeroth terms of the coefficient matrix are such that

*a*(0) = 1,

*b*(0) = 0,

*c*(0) = 0 and

*d*(0) = 1.

*O*

_{ x }and

*O*

_{ y }are the harmonic terms. Take the Fourier transform on both sides of this matrix equation and then multiply by the inverse of the matrix, then express

*X*(

*ω*) and

*Y*(

*ω*) in terms of the error and harmonic oscillations, we can obtain the transfer function:

*X*(

*ω*) and

*Y*(

*ω*). Expanding the expression for

*S*

_{ xx }and

*S*

_{ yy }yields an equation with 16 terms dependent upon the errors terms, (

*E*

_{ x }(

*ω*),

*E*

_{ y }(

*ω*)), the harmonic terms, (

*O*

_{ x }(

*ω*),

*O*

_{ y }(

*ω*)), the transfer functions,

*H*

_{ xx },

*H*

_{ xy },

*H*

_{ yx },

*H*

_{ yy }, and their complex conjugates. For the

*X*channel, these components are as follows:

*S*

_{ xx }can be thought of as either intrinsic (caused by the

*X*

_{ t }), causal (caused by

*Y*

_{ t }) or cross terms (caused by

*X*

_{ t }and

*Y*

_{ t }). Thus

*S*

_{ xx }and

*S*

_{ yy }, and the cross term can be eliminated using the transformation proposed by Geweke [35, 36]. Eliminating the cross term is essential for a consistent definition of the frequency domain causality, however the addition of the harmonic terms makes the prospect of removing the cross term rather troublesome. In the harmonic case, it is not possible, in general to eliminate the cross terms by means of transformation due to the presence of

*O*

_{ x }and

*O*

_{ y }terms. The oscillation terms

*O*

_{x, y}are sinusoidal indicating that the Fourier transforms of these functions are delta functions, and the discontinuous nature of the delta functions makes it impossible to find a transformation eliminate all the cross terms. The method we use to eliminate the cross terms is as follows: firstly we apply the approximation of the Geweke transformation:

where *γ*^{2} is the covariance matrix between *X* and *Y*, S is the variance of either error *E*_{
x
}or *E*_{
y
}.

*H*is an approximate to the true value. This estimate is necessary to ensure that the causality has a consistent definition. The transfer equation is now as follows:

*S*

_{ cross }) are reallocated either to

*S*

_{ causal }or

*S*

_{ intrinsic }in the following fashion:

*S*

_{ intrinsic }and

*S*

_{ causal }. Therefore, by analogy to the normal frequency domain causality (without harmonic terms), the frequency domain causality in the harmonic case is defined as:

In the Results section, we will show through examples that this approximation produces an excellent estimate of the frequency domain causality.

## Notes

## Declarations

### Acknowledgements

We thank Jing Kang to read the manuscript and give critical comments. JF was supported by an EPSRC (UK) grant CARMAN and an EU grant BION.

## Authors’ Affiliations

## References

- Alon U: Biological networks: the tinkerer as an engineer. Science. 2003, 301 (5641): 1866-1867. 10.1126/science.1089072View ArticlePubMedGoogle Scholar
- Gardner TS, di Bernardo D, Lorenz D, Collins JJ: Inferring genetic networks and identifying compound mode of action via expression profiling. Science. 2003, 301 (5629): 102-105. 10.1126/science.1081900View ArticlePubMedGoogle Scholar
- Lopes da Silva F, Pijn JP, Boeijinga P: Interdependence of eeg signals: linear vs. nonlinear associations and the significance of time delays and phase shifts. Brain Topogr. 1989, 2 (1-2): 9-18. 10.1007/BF01128839View ArticlePubMedGoogle Scholar
- Kocsis B, Bragin A, Buzsaki G: Interdependence of multiple theta generators in the hippocampus: a partial coherence analysis. J Neurosci. 1999, 19 (14): 6200-6212.PubMedGoogle Scholar
- Albo Z, Di Prisco GV, Chen YH, Rangarajan G, Truccolo W, Feng JF, Vertes RP, Ding MZ: Is partial coherence a viable technique for identifying generators of neural oscillations?. Biological Cybernetics. 2004, 90 (5): 318-326. 10.1007/s00422-004-0475-5View ArticlePubMedGoogle Scholar
- Schelter B, Winterhalder M, Eichler M, Peifer M, Hellwig B, Guschlbauer B, Lucking CH, Dahlhaus R, Timmer J: Testing for directed influences among neural signals using partial directed coherence. J Neurosci Methods. 2006, 152 (1-2): 210-219. 10.1016/j.jneumeth.2005.09.001View ArticlePubMedGoogle Scholar
- Wu J, Kendrick K, Feng J: Detecting correlation changes in electrophysiological data. J Neurosci Methods. 2007, 161 (1): 155-165. 10.1016/j.jneumeth.2006.10.017View ArticlePubMedGoogle Scholar
- Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. Journal of Computational Biology. 2000, 7 (3-4): 601-620. 10.1089/106652700750050961View ArticlePubMedGoogle Scholar
- Granger CWJ: Investigating Causal Relations by Econometric Models and Cross-Spectral Methods. Econometrica. 1969, 37 (3): 414-428. 10.2307/1912791.View ArticleGoogle Scholar
- Granger CWJ: Testing for Causality - A Personal Viewpoint. Journal of Economic Dynamics & Control. 1980, 2 (4): 329-352. 10.1016/0165-1889(80)90069-X.View ArticleGoogle Scholar
- Gourevitch B, Le Bouquin-Jeannes R, Faucon G: Linear and nonlinear causality between signals: methods, examples and neurophysiological applications. Biological Cybernetics. 2006, 95 (4): 349-369. 10.1007/s00422-006-0098-0View ArticlePubMedGoogle Scholar
- Ding M, Chen , Yonghong M, Bressler SL: Granger Causality: Basic Theory and Application to Neuroscience. Handbook of Time Series Analysis. Edited by: Schelter BWM, Timmer J. 2006, 451-474. Wiley-VCH VerlageGoogle Scholar
- Seth AK: Causal networks in simulated neural systems. Cogn Neurodyn. 2008, 2 (1): 49-64. 10.1007/s11571-007-9031-zPubMed CentralView ArticlePubMedGoogle Scholar
- Chen YH, Bressler SL, Ding MZ: Frequency decomposition of conditional Granger causality and application to multivariate neural field potential data. J Neurosci Methods. 2006, 150 (2): 228-237. 10.1016/j.jneumeth.2005.06.011View ArticlePubMedGoogle Scholar
- Guo S, Seth AK, Kendrick KM, Zhou C, Feng J: Partial Granger causality-eliminating exogenous inputs and latent variables. J Neurosci Methods. 2008, 172 (1): 79-93. 10.1016/j.jneumeth.2008.04.011View ArticlePubMedGoogle Scholar
- Marinazzo D, Pellicoro M, Stramaglia S: Kernel-Granger causality and the analysis of dyanmical networks. Physical Review E. 2008, 77 (056215): 1-9.Google Scholar
- Wu J, Kendrick K, Feng J: A novel approach to detect Hot-Spots in large-scale multivariate data. BMC Bioinformatics. 2007, 8: 331- 10.1186/1471-2105-8-331PubMed CentralView ArticlePubMedGoogle Scholar
- Guo S, Wu J, Ding M, Feng J: Uncovering interactions in the frequency domain. PLoS Comput Biol. 4 (5): e1000087-Google Scholar
- Chance B, Estabrook RW, Ghosh A: Damped sinusoidal oscillations of cytoplasmic reduced pyridine nucleotide in yeast cells. Proc Natl Acad Sci USA. 1964, 51: 1244-1251. 10.1073/pnas.51.6.1244PubMed CentralView ArticlePubMedGoogle Scholar
- Chrobak JJ, Buzsaki G: Gamma oscillations in the entorhinal cortex of the freely behaving rat. J Neurosci. 1998, 18 (1): 388-398.PubMedGoogle Scholar
- Reppert SM, Weaver DR: Molecular analysis of mammalian circadian rhythms. Annu Rev Physiol. 2001, 63: 647-676. 10.1146/annurev.physiol.63.1.647View ArticlePubMedGoogle Scholar
- Buzsaki G: Rhythms of the Brain. 2006, Oxford: Oxford University PressView ArticleGoogle Scholar
- Yan J, Wang HF, Liu YT, Shao CX: Analysis of Gene Regulatory Networks in the Mammalian Circadian Rhythm. PLoS Comput Biol. 2008, 4 (10): 13-10.1371/journal.pcbi.0040013.View ArticleGoogle Scholar
- Tiao GC, Grupe MR: Hidden Periodic Autoregressive-Moving Average Models in Time-Series Data. Biometrika. 1980, 67 (2): 365-373.Google Scholar
- He S: Estimation of the mixed ar and hidden periodic model. Acta Mathematicae Applicatae Sinica. 1997, 13 (2): 196-208. 10.1007/BF02015141.View ArticleGoogle Scholar
- Brovelli A, Ding M, Ledberg A, Chen Y, Nakamura R, Bressler SL: Beta oscillations in a large-scale sensorimotor cortical network: directional influences revealed by Granger causality. Proc Natl Acad Sci USA. 2004, 101 (26): 9849-9854. 10.1073/pnas.0308538101PubMed CentralView ArticlePubMedGoogle Scholar
- Wu J, Liu X, Feng J: Detecting causality between different frequencies. J Neurosci Methods. 2008, 167 (2): 367-375. 10.1016/j.jneumeth.2007.08.022View ArticlePubMedGoogle Scholar
- Tass P, Rosenblum MG, Weule J, Kurths J, Pikovsky A, Volkmann J, Schnitzler A, Freund HJ: Detection of n:m phase locking from noisy data: Application to magnetoencephalography. Physical Review Letters. 1998, 81 (15): 3291-3294. 10.1103/PhysRevLett.81.3291.View ArticleGoogle Scholar
- Lachaux JP, Rodriguez E, Martinerie J, Varela FJ: Measuring phase synchrony in brain signals. Human Brain Mapping. 1999, 8 (4): 194-208. 10.1002/(SICI)1097-0193(1999)8:4<194::AID-HBM4>3.0.CO;2-CView ArticlePubMedGoogle Scholar
- Rubiola E: Phase Noise and Frequency Stability in Oscillators. The Cambridge RF and Microwave Engineering Series. 2008, Cambridge University PressGoogle Scholar
- Locke JC, Kozma-Bognar L, Gould PD, Feher B, Kevei E, Nagy F, Turner MS, Hall A, Millar AJ: Experimental validation of a predicted feedback loop in the multi-oscillator clock of Arabidopsis thaliana. Mol Syst Biol. 2006, 2: 59- 10.1038/msb4100102PubMed CentralView ArticlePubMedGoogle Scholar
- Ueda HR: Systems biology flowering in the plant clock field. Mol Syst Biol. 2006, 2: 60- 10.1038/msb4100105PubMed CentralView ArticlePubMedGoogle Scholar
- Ueda HR, Chen WB, Adachi A, Wakamatsu H, Hayashi S, Takasugi T, Nagano M, Nakahama K, Suzuki Y, Sugano S, et al.: A transcription factor response element for gene expression during circadian night. Nature. 2002, 418 (6897): 534-539. 10.1038/nature00906View ArticlePubMedGoogle Scholar
- Akaike H: New Look at Statistical-Model Identification. IEEE Transactions on Automatic Control. 1974, AC19 (6): 716-723. 10.1109/TAC.1974.1100705.View ArticleGoogle Scholar
- Geweke J: Measurement of linear-dependence and feedback between multiple time series. Journal of the American Statistical Association. 1982, 77 (378): 304-313. 10.2307/2287238.View ArticleGoogle Scholar
- Geweke J: Measures of conditional linear-dependence and feedback between time series. Journal of the American Statistical Association. 1984, 79 (388): 907-915. 10.2307/2288723.View ArticleGoogle Scholar

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.