Highly sensitive inference of time-delayed gene regulation by network deconvolution
© Chen et al.; licensee BioMed Central Ltd. 2014
Published: 8 December 2014
Gene regulatory networks (GRNs) are a fundamental topic in systems biology. The dynamics of a GRN can shed light on cellular processes, which facilitates understanding of disease mechanisms when those processes are dysregulated. Accurate reconstruction of GRNs could also provide guidelines for experimental biologists. Therefore, inferring gene regulatory networks from high-throughput gene expression data is a central problem in systems biology. However, due to the inherent complexity of gene regulation, measurement noise, and the short length of time-series data, it is very challenging to reconstruct accurate GRNs. On the other hand, a better understanding of gene regulation could help to improve the performance of GRN inference. Time delay is one of the most important characteristics of gene regulation. By incorporating information about time delays, we can achieve more accurate inference of GRNs.
In this paper, we propose a method to infer time-delayed gene regulation based on cross-correlation and network deconvolution (ND). First, we employ cross-correlation to obtain the probable time delays for the interactions between each target gene and its potential regulators. Then based on the inferred delays, the technique of ND is applied to identify direct interactions between the target gene and its regulators. Experiments on real-life gene expression datasets show that our method achieves overall better performance than existing methods for inferring time-delayed GRNs.
By taking into account the time delays among gene interactions, our method is able to infer GRN more accurately. The effectiveness of our method has been shown by the experiments on three real-life gene expression datasets of yeast. Compared with other existing methods which were designed for learning time-delayed GRN, our method has significantly higher sensitivity without much reduction of specificity.
The inference of a gene regulatory network (GRN) is a vital step in understanding many biological systems in detail. However, the inference of GRN is known to be challenging due to several facts: (1) gene regulation is inherently complicated, (2) the measurements of gene expression levels are usually noisy, (3) the datasets for GRN inference are often incomplete, (4) time-series gene expression datasets have short time series compared to the number of genes measured. Generally, a GRN is inferred using machine learning algorithms on a time-series gene-expression dataset. Given the time-series data, the gene regulation could be inferred in two ways: one is assuming instantaneous or first order regulation, and the other is considering higher order regulation. In many cases, a gene regulates the expression of another gene by its products (RNAs or proteins). Since it takes time to generate those products and different processes (e.g. transcription, translation) need different amounts of time, time-delayed regulation is ubiquitous in cellular processes. Thus, inferring time-delayed gene interactions is essential to accurately reconstructing GRN.
The problem of inferring higher-order time delays is challenging because the search space is very large when the number of time lags is unknown. For an r-th order system with a total of T time points in the dataset, the number of time points available for inference reduces to T − r. With time series that are already short, this loss of samples makes the estimates less reliable and results in more false predictions.
While many methods have been introduced to reconstruct first-order gene regulation (e.g. DBN-MCMC [1–3], dynamic RandomForest [4]), there are only a few methods for inferring time-delayed GRNs. In 2010, a dynamic version of ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks) was introduced to infer time-delayed dependencies among genes [5]. This method, called TimeDelay-ARACNE (or TD-ARACNE), is able to reconstruct time-delayed dependencies effectively. In 2012, Morshed et al. proposed a framework to infer instantaneous and time-delayed genetic interactions at the same time [6]. Their approach was shown to outperform some existing methods such as TD-ARACNE and BANJO. In 2013, Li et al. presented a method to infer high-order gene regulation, named MMHODBN (max-min high-order dynamic Bayesian network) [7]. MMHODBN is a hybrid Bayesian network method which includes two steps: first it learns the skeleton (i.e. an undirected network) of the GRN using constraint-based Bayesian learning (Spirtes et al., 2001); then it applies a search-and-score technique to orient the edges in the skeleton. It was shown that MMHODBN was able to learn high-order gene interactions effectively. Mundra et al. proposed a method for inferring time-delayed GRNs based on cross-correlation and LASSO [8]. This method has been tested on real-life yeast pathways in G1 phase to show its effectiveness in identifying time-delayed regulation among genes. Despite all these efforts, the performance of inferring time-delayed genetic regulation still leaves room for improvement.
In this paper, we propose a simple yet effective and efficient method to tackle the challenges of inferring high-order time-delayed gene regulation. Using cross-correlation [9, 10] and data manipulations, we first determine the probable time lags and then use the algorithm of network deconvolution (ND) [11] to infer the time-delayed GRN. ND is a technique to identify direct dependencies in an observed network (e.g. a correlation-based network) which contains both direct and indirect interactions. By assuming that the indirect edges can be estimated from the products of direct edges and that the observed network is the sum of the direct and indirect edges, ND can recover the direct network from the observed network through the process of deconvolution. However, the authors of ND did not consider time delays, i.e. they assume all direct interactions take equal time, which is unlikely in real biological systems. Our method integrates time delay inference and adjustment into the ND approach to further increase its power. Running on three real-life datasets of yeast, the proposed method achieves better performance than existing methods.
Results and discussion
We proposed a method to infer time-delayed gene regulation based on cross-correlation and network deconvolution. We first identified the probable time delays for the interactions between each target gene and its potential regulators, using cross-correlation [9, 10]. Then, we adapted the algorithm of network deconvolution [11] to infer time-delayed genetic interactions. Network deconvolution has been shown to be very promising in learning gene regulation [11]. However, ND does not consider time delays, which are essential in gene interactions. Besides, the network inferred by ND is undirected (i.e. its edges have no direction). Here we introduced time delays into ND to enhance its strength in GRN inference. Based on the time delays identified with cross-correlation, we aligned the samples and calculated correlations of genes using the aligned samples. Then we applied ND to the correlation matrix and identified the direct interactions between the target gene and its regulators. The network inferred by our method is directed and includes time delays. We have evaluated the performance of our method on three gene expression datasets, described as follows.
There are two parameters in time-delayed ND (i.e. the proposed method). One is the threshold θ (0 ≤ θ ≤ 1) for the matrix output by ND. Since this matrix is weighted, with each entry representing the strength of interaction between the corresponding gene pair, we need a threshold to obtain a connectivity matrix from it. Then we can compare the connectivity matrix with the benchmark network and calculate performance metrics such as sensitivity and F-measure. In all the following experiments (except the ROC curves), we set the threshold θ to a moderate value (i.e. 0.5). The other parameter is the maximum time lag r. Since a large r would leave only a small number of samples available to infer gene interactions, we usually set r ≤ 5. We have done experiments with r taking values from 2 to 5. The results of comparing to other methods are similar (data not shown). Here we only show the results with r = 5. In addition, we show the results with r = 3 for the yeast5off dataset for comparison.
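The thresholding step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `to_connectivity`, the use of `>=` at the cut-off, the handling of the diagonal, and the example weights are all assumptions made for this sketch.

```python
def to_connectivity(weights, theta=0.5):
    """Binarize a weighted interaction matrix with cut-off theta.

    Assumptions (not stated in the paper): weights are already scaled so
    that their absolute values lie in [0, 1], an edge is kept when
    |weight| >= theta, and self-edges on the diagonal are ignored.
    """
    n = len(weights)
    return [
        [1 if i != j and abs(weights[i][j]) >= theta else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 3-gene weight matrix; entry [i][j] scores the edge
# between gene i and gene j.
S = [
    [0.0, 0.8, 0.2],
    [0.5, 0.0, 0.49],
    [0.9, 0.1, 0.0],
]
A = to_connectivity(S, theta=0.5)
```

Raising `theta` keeps fewer edges, which is the trade-off between recovered true positives and false positives that the choice of a moderate θ is meant to balance.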
To compare time-delayed ND with other existing methods designed for inferring time-delayed gene regulation, we apply these methods to the same datasets and compare their performance using sensitivity, precision, and F-measure. Three existing methods are available for our comparisons, namely time-delayed ARACNE (TD-ARACNE) [5], Xcorr+LASSO [8, 15], and MMHODBN [7].
Comparison based on yeast9 dataset.
Comparison based on yeast5on dataset.
Comparison based on yeast5off dataset.
As the results of the three experiments suggested, time-delayed ND has a high sensitivity in detecting time-delayed gene interactions, which yields a better performance (in terms of F-measure) than other methods.
The network deconvolution algorithm is a nonlinear filter which can be applied to any symmetric (and some asymmetric) network matrix to filter out the indirect edges. Using correlation as in the original method, only an undirected network can be inferred. In the proposed method, the correlation is calculated between each target gene and the remaining genes. The correlation matrix also includes correlations among all the other genes, computed from the time samples aligned according to the inferred time delays between these genes and the target gene (see Algorithms 1 and 2). ND is then applied to this correlation matrix, i.e., the filter for indirect edges is determined using not only the correlations between the target gene and the remaining genes but also the correlations among the remaining genes. This step helps to determine the regulatory direction while accounting for redundancy among the possible regulators.
In the proposed method, the networks are inferred with a maximum time lag r and a threshold θ that are fixed a priori. Given the short time series, increasing the maximum possible time lag reduces the number of time points available to compute the correlations. This may ultimately increase the number of false predictions. Hence, we have restricted the maximum possible time lag to 5 in our experiments. An increase in the value of θ means raising the cut-off for inferring a regulatory edge. Generally, this increase will result in only partial recovery of true positive edges. Hence in our experiments, we have kept θ at a moderate level so that most of the true positive edges can be recovered while keeping the false positives in check.
In this paper, we proposed a method named time-delayed ND to infer time-delayed gene interactions based on cross-correlation and network deconvolution. We first infer the probable time delays for the interactions between each target gene and its potential regulators, using cross-correlation. Then based on the inferred time delays, we align the time samples for each target gene. After that, we employ the algorithm of network deconvolution to identify direct interactions between the target gene and its regulators. The performance of time-delayed ND has been evaluated on three real-life gene expression datasets. Compared with three other methods for inferring time-delayed GRNs, our method achieved overall better performance in the inference of time-delayed GRN.
The time-delayed gene regulatory network is inferred from time-series gene expression data. Let $X_{T \times N} = (x_1, x_2, \ldots, x_N)$ be a time-series gene expression dataset, where $N$ is the number of genes and $T$ is the number of time samples. Let $x_{t,i}$ denote the expression level of the $i$-th gene at time $t$. Then $x_i = (x_{1,i}, x_{2,i}, \ldots, x_{T,i})^{\top}$, where $1 \le i \le N$, denotes the expression profile of the $i$-th gene across the $T$ time points.
To infer regulatory interactions among genes, the most straightforward way is by using correlations. However, there are two major issues about correlations: (1) time-delayed regulation is not likely to be inferred by simple correlations; (2) the relationships based on correlations are not direct and would suffer from a large number of false positive predictions. The two issues can be coped with by cross-correlation and network deconvolution respectively, which are described in the following two sections.
To infer a time-delayed regulation between the i-th gene and j-th gene, we need to determine the number of time lags first. This could be achieved by applying cross-correlation [9, 10] on the expression profiles of these two genes. The lag that gives the maximum absolute cross-correlation is the most likely time lag.
Here, $x_i$ and $x_j$ are the expression profiles of the $i$-th and $j$-th genes. Note that we normalized the expression data of each gene to have zero mean and unit standard deviation before calculating the cross-correlation.
In this paper, we utilize the MATLAB function xcorr, which adopts the former approach (i.e. Eq. 2) to calculate cross-correlations. For each target gene, cross-correlation is calculated between this gene and all the other genes. Among the $r$ candidate time lags, we identify the time delay $\tau$ that corresponds to the maximum absolute value of $C(x_i, x_j, \tau)$. This time delay is denoted by $l_{ij}$ and represents the probable time lag of regulation between the $i$-th gene and the $j$-th gene.
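The lag selection described above can be illustrated with a small standard-library sketch. It is not a re-implementation of MATLAB's xcorr: it uses a direct time-domain sum, and it assumes that the candidate lags are 1, ..., r and that the regulator leads the target. The helper names (`zscore`, `lagged_xcorr`, `best_lag`) and the toy profiles are hypothetical.

```python
from statistics import mean, pstdev


def zscore(profile):
    """Scale one expression profile to zero mean and unit standard deviation."""
    m, s = mean(profile), pstdev(profile)
    return [(v - m) / s for v in profile]


def lagged_xcorr(target, regulator, lag):
    """Time-domain cross-correlation: sum over t of target[t] * regulator[t - lag]."""
    return sum(target[t] * regulator[t - lag] for t in range(lag, len(target)))


def best_lag(target, regulator, r):
    """Return the lag in 1..r (an assumed range) with maximum absolute cross-correlation."""
    xt, xr = zscore(target), zscore(regulator)
    return max(range(1, r + 1), key=lambda lag: abs(lagged_xcorr(xt, xr, lag)))


# Toy profiles: the target repeats the regulator two time points later.
regulator = [0, 0, 5, 0, 0, 0, 1, 0, 0, 0, 0, 0]
target = [0, 0, 0, 0, 5, 0, 0, 0, 1, 0, 0, 0]
l_ij = best_lag(target, regulator, r=5)
```

Taking the absolute value lets the same procedure pick up repressive (negatively correlated) relationships as well as activating ones.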
After determining the probable time lag for each gene pair, we can proceed to determine the possible regulators of each target gene from the remaining genes in the dataset. As mentioned above, using correlations is a natural way to identify interactions among genes, but such an approach may suffer from a large number of false positive predictions. The main reason is that most correlations represent indirect dependencies instead of direct dependencies. A direct dependency between two variables means that the interaction between the two variables does not depend on any intermediary. An indirect dependency, on the other hand, arises from direct dependencies through some intermediate nodes. For example, if A regulates B and B regulates C, the correlation between A and C could be high even though there is no direct relationship between them, because B acts as an intermediate node. Network deconvolution (ND) [11] is a technique to infer direct dependencies among variables. Let us use a matrix to represent a network. Starting from the matrix of correlations (or other similarity metrics), which could include both direct and indirect dependencies, ND is able to filter out the indirect dependencies through a process called network deconvolution.
In our code, the scaled version (i.e. Eq. 8) of ND was implemented. We used the default value for the parameter α.
Experimental results in [11] show that when the assumptions hold (the indirect edges can be derived from products of direct edges, and the observed network is the sum of direct and indirect edges), the method can remove all indirect edges and recover all direct edges. Even when the assumptions do not hold, it can still infer most of the direct interactions, as shown by simulation experiments on various network structures. For more details about ND, please refer to [11].
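To make the two ND assumptions concrete, the sketch below uses the unscaled closed form that follows from them: if the observed network is G_obs = G_dir + G_dir^2 + G_dir^3 + ..., then G_dir = G_obs (I + G_obs)^{-1}. This is not the scaled version with parameter α used in the paper, whose equation is not reproduced in this text. The matrix helpers and the three-node chain A, B, C are hypothetical and written with the standard library only.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def add_scaled(a, b, scale):
    """Return a + scale * b for two equally sized matrices."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(m)
    aug = [list(row) + unit for row, unit in zip(m, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * w for v, w in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def deconvolve(g_obs):
    """Unscaled network deconvolution: G_dir = G_obs (I + G_obs)^-1."""
    n = len(g_obs)
    return matmul(g_obs, inverse(add_scaled(identity(n), g_obs, 1.0)))


# Chain A - B - C with no direct A - C edge (edge weights are made up).
g_dir = [[0.0, 0.5, 0.0], [0.5, 0.0, 0.4], [0.0, 0.4, 0.0]]
# Observed network = sum of all powers of g_dir = g_dir (I - g_dir)^-1.
g_obs = matmul(g_dir, inverse(add_scaled(identity(3), g_dir, -1.0)))
recovered = deconvolve(g_obs)
```

In this toy case the observed matrix contains a sizeable indirect A - C entry, while the deconvolved matrix returns it to zero up to rounding error. Real correlation matrices only approximately satisfy the assumptions, so the recovery is approximate in practice.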
Algorithm 1 shows more technical details about the procedure of time-delayed ND. For each gene in the set of N genes, the same process is carried out, which consists of three steps. First, with a fixed maximum time delay r, we identify the most likely time delays with the maximum absolute cross-correlation for the interaction between the target gene and each of the other genes. Second, we align the time samples for the target gene based on the time delays (see Algorithm 2), and compute correlations between this gene and the other genes based on the aligned samples. Third, we apply ND on the matrix of correlations and obtain a new matrix with direct dependencies among genes. Then the direct dependencies between the target gene and the other genes are extracted and stored. In this way, we infer time-delayed regulation among genes from time-series gene expression data.
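The second step, computing correlations among all genes from the aligned samples, might look like the sketch below. It assumes Pearson correlation, that rows of the aligned matrix are time samples and columns are genes, and that a constant column is given correlation 0; the function names are hypothetical.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    norm = sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return cov / norm if norm > 0 else 0.0


def correlation_matrix(aligned):
    """Correlate every pair of columns (genes) of an aligned (T - r) x N matrix."""
    columns = list(zip(*aligned))
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


# Three aligned samples (rows) for three genes (columns); values are made up.
aligned = [
    [1, 2, 3],
    [2, 4, 1],
    [3, 6, 2],
]
C_star = correlation_matrix(aligned)
```

The resulting symmetric matrix plays the role of the observed network that is passed to ND in the third step.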
The performance of the methods is evaluated using Sensitivity, Precision and F-measure. We define "positive" as the presence of an edge, and "negative" as the absence of an edge. The numbers of true positives, true negatives, false positives, and false negatives are denoted as TP, TN, FP, and FN, respectively. Then Sensitivity (denoted as Se; also known as recall), Precision (Pr), F-measure (Fm), and Specificity (Sp) are defined as $Se = \frac{TP}{TP + FN}$, $Pr = \frac{TP}{TP + FP}$, $Fm = \frac{2 \cdot Se \cdot Pr}{Se + Pr}$, and $Sp = \frac{TN}{TN + FP}$. F-measure provides a balanced criterion to evaluate the performance of methods in GRN inference. A high F-measure implies that a method recovers most true edges while most of the edges it infers are correct. Here we use F-measure as the major criterion for comparing different methods.
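As a worked example of these metrics, the sketch below computes them from the four counts using the standard confusion-matrix definitions (Se = TP/(TP+FN), Pr = TP/(TP+FP), Fm as the harmonic mean of Se and Pr, Sp = TN/(TN+FP)). The function name and the example counts are made up for illustration, and returning 0.0 for an empty denominator is an assumption of this sketch.

```python
def evaluate(tp, tn, fp, fn):
    """Return sensitivity, precision, F-measure and specificity from edge counts."""
    se = tp / (tp + fn) if tp + fn else 0.0      # fraction of true edges recovered
    pr = tp / (tp + fp) if tp + fp else 0.0      # fraction of inferred edges that are true
    fm = 2 * se * pr / (se + pr) if se + pr else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0      # fraction of absent edges left out
    return {"Se": se, "Pr": pr, "Fm": fm, "Sp": sp}


# Hypothetical counts: 10 true edges, of which 8 are found, plus 4 false edges.
scores = evaluate(tp=8, tn=80, fp=4, fn=2)
```

With these counts the sensitivity is 0.8 and the precision is 2/3, so the F-measure (about 0.727) sits between the two, which is why it is a convenient single criterion when a method trades precision for sensitivity.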
Algorithm 1. Inferring time-delayed regulation based on Network Deconvolution
INPUT: Time-series gene expression data $X_{T \times N}$ with $N$ genes and $T$ time samples; the maximum possible time lag $r$
OUTPUT: An $N \times N$ matrix $S$ of weights showing the strength of interactions among genes; an $N \times N$ matrix $D$ holding the time delay of each interaction
Normalize $X$ so that the expression data for each gene have zero mean and unit standard deviation
for each gene $i$ do
    Initialize a temporary vector of time delays $L_{i*}$
    for each gene $j$ among the remaining genes do
        Calculate the cross-correlation $C(x_i, x_j, \tau)$ for gene $i$ and gene $j$ based on Eq. 3
        Identify the time lag $l_{ij}$ with the maximum absolute cross-correlation among the $r$ choices of time lag
        Store $l_{ij}$ into $L_{i*}$
    end for
    Obtain the matrix of aligned time samples for target gene $i$ according to Algorithm 2
    Initialize a temporary $N \times N$ matrix $C^* = [\,]$
    for each gene $i'$ do
        for each gene $j'$ do
            Compute the correlation $corr_{i'j'}$ between gene $i'$ and gene $j'$ based on the corresponding time samples in the aligned matrix
            Store $corr_{i'j'}$ into $C^*$
        end for
    end for
    Apply ND on $C^*$ to obtain $S^*$, which denotes the direct interactions among genes based on the aligned time samples
    Extract from $S^*$ the vector of direct dependencies between gene $i$ and its potential regulators and append it to $S$
    Append the time lags in $L_{i*}$ to $D$
end for
Return $S$ and $D$
Algorithm 2. Aligning time samples based on the delays inferred by cross-correlation
INPUT: Time-series gene expression data $X_{T \times N}$; the time delays $L_{i*}$ for the target gene (denoted as the $i$-th gene); the maximum possible time lag $r$
OUTPUT: A $(T - r) \times N$ matrix of aligned time samples for the target gene
Initialize the aligned matrix as empty
Extract the vector of time samples for the target gene (the $i$-th gene) from $X$: $y = X(r + 1 : T, i)$
Append the expression of the target gene, $y$, to the aligned matrix
for each gene $j$ ($j \ne i$) do
    Find the time delay $l_{ij}$ of the interaction between the target gene $i$ and gene $j$ from $L_{i*}$
    Extract and align the time samples for gene $j$ from $X$ based on $l_{ij}$: $x_j = X(r - l_{ij} + 1 : T - l_{ij}, j)$
    Append the expression of the $j$-th gene, $x_j$, to the aligned matrix
end for
Return the aligned matrix
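The index arithmetic of Algorithm 2 is easy to get wrong when moving from the 1-based, inclusive ranges used in the pseudocode to 0-based indexing, so a small sketch may help. It assumes `X` is a list of T rows with N values each, that `lags` maps each non-target gene index to its delay (between 0 and r), and that the target gene is placed in the first column; the function name and the toy data are hypothetical.

```python
def align_samples(X, i, lags, r):
    """Build the (T - r) x N aligned matrix for target gene i.

    The 1-based range X(r + 1 : T, i) becomes rows r .. T - 1 here, and
    X(r - l + 1 : T - l, j) becomes rows r - l .. T - l - 1, so each target
    sample at time t is paired with the regulator sample at time t - l.
    """
    T, N = len(X), len(X[0])
    columns = [[X[t][i] for t in range(r, T)]]          # target gene first
    for j in range(N):
        if j == i:
            continue
        l = lags[j]
        columns.append([X[t][j] for t in range(r - l, T - l)])
    return [list(row) for row in zip(*columns)]         # back to row-major


# Toy data: 6 time points, 2 genes; gene 1 is assumed to act on gene 0 with delay 2.
X = [[t, 100 + t] for t in range(6)]
aligned = align_samples(X, i=0, lags={1: 2}, r=3)
```

Every column has exactly T − r entries regardless of its individual delay, which is why a larger maximum lag r shortens all aligned profiles at once.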
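1 The derivation of Eq. 2: Using the convolution expression, we have $\phi_{xy}(\tau) = x(-\tau) * y(\tau)$. Converting $x(-\tau)$ to the frequency domain using the Fourier transform gives $FT[x(-\tau)] = \int x(-\tau) e^{-j 2 \pi f \tau} \, d\tau$. Substituting $\tau' = -\tau$, we have $FT[x(-\tau)] = \int x(\tau') e^{j 2 \pi f \tau'} \, d\tau' = X^*(f)$ for a real-valued signal $x$. Combining the equations above, we have $\Phi_{xy} = FT[\phi_{xy}(\tau)] = X^*(f) Y(f)$. Applying the inverse Fourier transform, we obtain $\phi_{xy}$ as in Eq. 2.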
This project is supported by AcRF Tier 2 grant MOE2010-T2-1-056 (ARC9/10) and AcRF Tier 1 Seed Fund on Complexity (RGC 2/13), Ministry of Education, Singapore.
The publication charges for this article were funded by AcRF Tier 1 Seed Fund on Complexity (RGC 2/13).
This article has been published as part of BMC Systems Biology Volume 8 Supplement 4, 2014: Thirteenth International Conference on Bioinformatics (InCoB2014): Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/8/S4.
1. Husmeier D: Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic bayesian networks. Bioinformatics. 2003, 19 (17): 2271-2282. 10.1093/bioinformatics/btg313.
2. Wilczyński B, Dojer N: Bnfinder: exact and efficient method for learning bayesian networks. Bioinformatics. 2009, 25 (2): 286-287. 10.1093/bioinformatics/btn505.
3. Chen H, Maduranga D, Mundra PA, Zheng J: Integrating epigenetic prior in dynamic bayesian network for gene regulatory network inference. Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). 2013, 2013 IEEE Symposium, 76-82. IEEE.
4. Maduranga D, Zheng J, Mundra PA, Rajapakse JC: Inferring gene regulatory networks from time-series expressions using random forests ensemble. Pattern Recognition in Bioinformatics. 2013, Springer, Berlin Heidelberg, 13-22.
5. Zoppoli P, Morganella S, Ceccarelli M: TimeDelay-ARACNE: Reverse engineering of gene networks from time-course data by an information theoretic approach. BMC Bioinformatics. 2010, 11 (1): 154. 10.1186/1471-2105-11-154.
6. Morshed N, Chetty M, Vinh NX: Simultaneous learning of instantaneous and time-delayed genetic interactions using novel information theoretic scoring technique. BMC Systems Biology. 2012, 6 (1): 62. 10.1186/1752-0509-6-62.
7. Li Y, Ngom A: The max-min high-order dynamic bayesian network learning for identifying gene regulatory networks from time-series microarray data. Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). 2013, 2013 IEEE Symposium, 83-90. IEEE.
8. Mundra PA, Zheng J, Niranjan M, Welsch RE, Rajapakse JC: Inferring time-delayed gene regulatory networks using cross-correlation and sparse regression. Bioinformatics Research and Applications. 2013, Springer, Berlin Heidelberg, 64-75.
9. Orfanidis SJ: Optimum Signal Processing. An Introduction. 1996, Prentice-Hall, United States.
10. Rhudy M, Bucci B, Vipperman J, Allanach J, Abraham B: Microphone array analysis methods using cross-correlations. ASME 2009 International Mechanical Engineering Congress and Exposition. 2009, American Society of Mechanical Engineers, 281-288.
11. Feizi S, Marbach D, Médard M, Kellis M: Network deconvolution as a general method to distinguish direct dependencies in networks. Nature Biotechnology. 2013.
12. Simon I, Barnett J, Hannett N, Harbison CT, Rinaldi NJ, Volkert TL, Wyrick JJ, Zeitlinger J, Gifford DK, Jaakkola TS, et al: Serial regulation of transcriptional regulators in the yeast cell cycle. Cell. 2001, 106 (6): 697-708. 10.1016/S0092-8674(01)00494-9.
13. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell. 1998, 9 (12): 3273-3297. 10.1091/mbc.9.12.3273.
14. Cantone I, Marucci L, Iorio F, Ricci MA, Belcastro V, Bansal M, Santini S, di Bernardo M, di Bernardo D, Cosma MP: A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell. 2009, 137 (1): 172-181. 10.1016/j.cell.2009.01.055.
15. ElBakry O, Ahmad M, Swamy M: Inference of gene regulatory networks with variable time delay from time-series microarray data. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2013, 10 (3): 671-687.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.