 Research
 Open Access
 Published:
MICRAT: a novel algorithm for inferring gene regulatory networks using time series gene expression data
BMC Systems Biology volumeÂ 12, ArticleÂ number:Â 115 (2018)
Abstract
Background
Reconstruction of gene regulatory networks (GRNs), also known as reverse engineering of GRNs, aims to infer the potential regulation relationships between genes. With the development of biotechnology, such as gene chip microarray and RNAsequencing, the highthroughput data generated provide us with more opportunities to infer the genegene interaction relationships using gene expression data and hence understand the underlying mechanism of biological processes. Gene regulatory networks are known to exhibit a multiplicity of interaction mechanisms which includeÂ functional and nonfunctional, and linear and nonlinear relationships. Meanwhile, the regulatory interactions between genes and gene products are not spontaneous since various processes involved in producing fully functional and measurable concentrations of transcriptional factors/proteins lead to a delay in gene regulation. Many different approaches for reconstructing GRNs have been proposed, but the existing GRN inference approaches such as probabilistic Boolean networks and dynamic Bayesian networks have various limitations and relatively low accuracy. Inferring GRNs from time series microarray data or RNAsequencing data remains a very challenging inverse problem due to its nonlinearity, high dimensionality, sparse and noisy data, and significant computational cost, which motivates us to develop more effective inference methods.
Results
We developed a novel algorithm, MICRAT (Maximal Information coefficient with Conditional Relative Average entropy and Timeseries mutual information), for inferring GRNs from time series gene expression data. Maximal information coefficient (MIC) is an effective measure of dependence for twovariable relationships. It captures a wide range of associations, both functional and nonfunctional, and thus has good performance on measuring the dependence between two genes. Our approach mainly includes two procedures. Firstly, it employs maximal information coefficient for constructing an undirected graph to represent the underlying relationships between genes. Secondly, it directs the edges in the undirected graph for inferring regulators and their targets. In this procedure, the conditional relative average entropies of each pair of nodes (or genes) are employed to indicate the directions of edges. Since the time delay might exist in the expression of regulators and target genes, time series mutual information is combined to cooperatively direct the edges for inferring the potential regulators and their targets. We evaluated the performance of MICRAT by applying it to synthetic datasets as well as real gene expression data and compare with other GRN inference methods. We inferred five 10gene and five 100gene networks from the DREAM4 challenge that were generated using the gene expression simulator GeneNetWeaver (GNW). MICRAT was also used to reconstruct GRNs on real gene expression data including part of the DNAdamaged response pathway (SOS DNA repair network) and experimental dataset in E. Coli. The results showed that MICRAT significantly improved the inference accuracy, compared to other inference methods, such as TDBN, etc.
Conclusion
In this work, a novel algorithm, MICRAT, for inferring GRNs from time series gene expression data was proposed by taking into account dependence and time delay of expressions of aÂ regulator and its target genes. This approach employed maximal information coefficients for reconstructing an undirected graph to represent the underlying relationships between genes. The edges were directed by combining conditional relative average entropy with time course mutual information of pairs of genes. The proposed algorithm was evaluated on the benchmark GRNs provided by the DREAM4 challenge and part of theÂ real SOS DNA repair network in E. Coli. The experimental study showed that our approach was comparable to other methods on 10gene datasets and outperformed other methods on 100gene datasets in GRN inference from time series datasets.
Background
Reconstruction of gene regulatory networks (GRNs), also known as reverse engineering [1] of GRNs, aims to infer the potential relationship between genes using gene expression data. Sometimes the expression of a gene is affected by others (named as its regulators) and meanwhile regulates its downstream target genes. Reconstruction of gene regulatory networks can help people understand the mechanism of gene interactions in the cell and reveal the mystery of life. The discovering of the regulation relationship between genes was initially achieved by biological experiments [2]. However, this approach was costly and progressed slowly in research, thus becoming a bottleneck that restricted the development of biological systems. With the development of biotechnology, such as gene chip microarray, the highthroughput data generated provides us with more opportunities to understand the potential regulatory relationships of genes. In recent years, developing machine learning and data mining algorithms and applying them to reconstruction of gene regulatory networks has become an active research topic in bioinformatics.
Different computational methods have been proposed to tackle the gene regulatory networks identification problems. The typical methods include regression models [3, 4], Bayesian networks [5,6,7,8,9,10], state space models [11,12,13,14] and information theory models [15,16,17,18,19,20], etc.
Several existing methods formulate gene regulatory networks as regression problems which are based on the assumption that all regulatory interactions are linear, meaning that the gene expression level of a target gene is a combination of the expression levels of its transcriptional factors. However, biological networks are known to exhibit multiple regulation mechanisms including nonlinear interactions.
The Bayesian network model employed the joint probability distribution of gene expression data to construct a directed acyclic graph to represent the relationship between genes. A relative change ratio (RCR) method was proposed by Li et al. to preprocess the nullmutants steady state data in order to find the key genes and build GRNs, in which these selected key genes have a higher potential than other genes to play very critical roles [10]. The LBN algorithm (local Bayesian network) [21] was designed by Liu et al. to improve the accuracy of GRN inference from gene expression data by exploring advantages of Bayesian network (BN) and conditional mutual information (CMI) methods. The LBN algorithm first uses CMI to construct an initial network or GRN. Then, the BN method is employed to generate a series of local BNs by selecting the knearest neighbors of each gene as its candidate regulatory genes. Integrating these local BNs forms a tentative network by performing CMI. The final network, or GRN, can be obtained by iteratively performing CMI and local BN on the tentative network. TDBN [7] was used to infer GRNs from time series trajectory data which were combined with the previous knowledge gained in the above step. Wang et al. developed the pLasso [22] method from a Bayesian perspective for the reconstruction of gene networks. This method assigns different prior distributions to different edge subsets according to a modified Bayesian information criterion that incorporates prior knowledge on both the network structure and the pathway information. It used information from the Pathway Commons (PC) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as prior information for the network reconstruction. However, in reality there is no prior knowledge while inferring the interaction relationship between genes. Moreover, signaling pathways are dynamic events that take place over a given period of time. In order to identify these pathways, expression data over time is required. Dynamic Bayesian network (DBN) is an important and widely used approach for predicting the GRNs from time series expression data but has high computational costs and may not be able to infer a large GRN.
Furthermore, other methods for inferring GRNs have been proposed. Patrilia et al. proposed a new algorithm, namely iRafNet [23], which integrated different data types under a unified random forest framework. The key idea of iRafNet is to introduce a weighted sampling scheme within random forest to incorporate information from other sources of data. Specifically, the model considers the expression of each gene as a function of the expression of other genes. In literature [24], a fast method was developed for inferring gene regulatory relationships from just knockdown data. It used a simple linear regression model focusing on single regulatortarget gene pairs based on knockdown data, and allowed the incorporation of prior knowledge about the relationships and generates posterior probabilities.
Mutual information (MI) is another measurement of the dependence of two variables and is often used to assess the relationship between genes. Margolin [16] and Sales [17] employed MI between two genes to measure their associations in order to reconstruct gene regulatory networks. But they could not tackle the problem that one gene is regulated by multiple genes. PCA_CMI [18] was proposed by Zhang et al. for inferring GRNs from gene expression data considering the nonlinear dependence and topological structure of GRNs by employing path consistency algorithm based on conditional mutual information (CMI). It discovered the associations between related genes without identifying the regulators and their targets.
Reshef et al. presented a measure of dependence for twovariable relationships, named the maximal information coefficient (MIC) [25], which captures a wide range of functional and nonfunctional associations. MIC has better performance on measuring the dependence of twovariable relationships in comparison with other methods. Thus, we concentrate our effort on identifying the interactions between genes based on this method.
In this paper, we developed a novel algorithm, MICRAT (Maximal Information coefficient with Conditional Relative Average entropy and Time series mutual information), for inferring GRNs from time series gene expression data. This approach employed MIC as measurement for assessing the correlation between each pair of genes while constructing an undirected graph to represent the underlying relationship between genes. It subsequently combined conditional relative average entropy with time series mutual information to determine the potential regulator and its target genes.
Methods
Information theory has been used to reconstruct GRNs from gene expression data. Mutual information is generally used as a useful criterion for measuring the dependence between variables (genes) X and Y [18]. For gene expression data, variable X is a vector, in which the element x denotes gene expression value in different conditions (samples/time points). In this section, we introduced several basic concepts of information theory and graph theory which will be used in the proposed method.
Mutual information and conditional mutual information
For two random variables (genes) X and Y, the mutual information of X and Y is defined as:
where H(X) and H(Y) are the entropy of the random variables X and Y, respectively; H(X,â€‰Y) is the joint entropy of X and Y.
For a discrete variable X, the entropy H(X) measures the average uncertainty of variable X, and can be computed by
where p(x) is the probability that the variable X takes x;
The joint entropy of X and Y is denoted by
where p(x,â€‰y) is the joint probability of X and Y.
The mutual information of X and Y is calculated by
The conditional mutual information [16] of X and Y given Z, I(X, Yâ€‰Z) is defined as
where p(x,â€‰y,â€‰z) is the joint probability of variables X, Y and Z; p(xâ€‰z) is the conditional probability that the variable x holds under the condition that the variable z is established; similarly, p(x, yâ€‰z) is the conditional probability that the variable x and the variable y are established under the condition that the variable z is satisfied.
In this paper, the probability of a variable (gene) is estimated with the Gaussian kernel probability density estimator [26] as follows [18].
where C is the covariance matrix of variable X, âˆ£Câˆ£ is the determinant of matrix C, n is the number of samples (time points), and m is the number of variables (genes).
Then the entropy of variable X can be computed by using (2) and (6).
With formulas (1) and (7), the mutual information of variables X and Y can be obtained as given in Eq. (8).
Similarly, the conditional mutual information of variables X and Y given variable Z can be calculated by Eq. (9).
MI can be used for measuring the dependence of two variables (genes) and CMI can help determine the dependence between two variables given another variable. Generally speaking, a higher MI (or CMI) indicates closer relationship between variables.
Maximal information coefficient
The maximal information coefficient (MIC) is an effective measurement of interesting relationship between pairs of variables. MIC captures a wide range of functional and nonfunctional associations, and has two heuristic properties: generality and equitability. The generality ensures that with sufficient sample size MIC captures a wide range of interesting relationships, not limited to specific function types, or even to all functional relationships; and the equitability means that MIC gives similar scores to equally noisy relationships of different types. These two heuristic properties of MIC indicate that it is suitable to infer various underlying relationships.
The Maximal Information Coefficient of a set D of twovariable data with sample size n and grid size less than B(n) is given by literature [25].
where the dataset D is composed of the pairs of values <x, y>, and xâ€‰âˆˆâ€‰X, yâ€‰âˆˆâ€‰Y; M(D)_{X, Y} is the characteristic matrix of dataset D of variable X and Y with entries
where I^{âˆ—}(D,â€‰x,â€‰y) is the maximum mutual information achieved by x and y; n is size of D; log(â€‰min(x,â€‰y)) is used to normalize the maximum mutual information; B(n) is the grid size, and B(n)â€‰=â€‰n^{0.6}, which was found to work well in practice [25].
Conditional relative average entropy
The conditional relative average entropy (CRAE) [27] is defined as:
where âˆ£Yâˆ£ is the number of values of variable Y, and H(Y) is the entropy of the variable Y; H(Yâ€‰X) is the conditional entropy of the variable Y under the condition of the given variable X.
Time series mutual information
In most cases, there is a time delay during the gene regulations or at least the gene expressions change simultaneously for regulatory and target genes. Thus the time series mutual information could contribute to the orientation of the regulation.
Suppose that we have T time points and the expression levels of N genes are measured at each time point. The time series gene expression data can be summarized as a Tâ€‰Ã—â€‰N matrix Xâ€‰=â€‰(x_{1,} x_{2,}â€¦,â€‰x_{T})^{T} whose i^{th} row vector x_{i}â€‰=â€‰(x_{i1},â€‰x_{i2},â€‰â€¦x_{iN})^{T}corresponds to a gene expression level vector measured at timepoint i.
For given genes X and Y, the time series mutual information from X to Y during time points 1 to T is defined as
where tdelay(0â€‰â‰¤â€‰tdelayâ€‰â‰¤â€‰T) represents the time delay for gene X regulating gene Y; X_{1â€‰â†’â€‰(Tâ€‰âˆ’â€‰tdelay)} is a vector representing the gene expression level of X from timepoint 1 to timepoint (Tâ€‰âˆ’â€‰tdelay). Similarly, Y_{1â€‰+â€‰tdelayâ€‰â†’â€‰T} represents the gene expression level of Y from timepoint 1â€‰+â€‰tdelay to timepoint T. In this paper, we set the time delay to 1 since from the observation of our experiments there is no significant improvement to the accuracy with different settings.
Gene regulatory networks
In this work, a gene regulatory network is described by a directed graph (digraph). A graph G is defined by the pair (V(G),â€‰E(G)), where V(G) denotes the set of vertices (nodes) and E(G)â€‰âŠ†â€‰V(G)â€‰Ã—â€‰V(G) denotes the set of edges. In a digraph, an edge is defined by an ordered pair of vertices (x,â€‰y) denoting the edge direction, from vertex x pointing to vertex y. Here, the nodes represent genes while the edges describe the gene regulatory interactions. A directed edge (x,â€‰y) indicates that gene x regulates gene y, while an undirected edge (x,â€‰y) only indicates that there is an association between gene x and gene y.
Algorithm of MICRAT
We propose a novel algorithm called MICRAT for reconstructing gene regulatory networks using time series gene expression profiles. The algorithm mainly consists of two procedures: First, an undirected graph is generated representing the associations between genes based on the maximal information coefficient (MIC) of each pair of genes. Then, the conditional relative average entropy combining with time series mutual information is used to determine the directions of the edge in the undirected graph. The flowchart of the method for reconstructing GRN is shown in Fig. 1, and more details are described in the following subsections.
Generating an undirected gene interaction graph
The types of relationship between genes vary diversely. It is difficult, even impossible to identify the gene associations with only one function. The maximal information coefficient could capture a wide range of dependence of two variables, both functional and nonfunctional. Thus it is suitable for measuring the associations between genes.
We refer to two gene expression data G_{X} and G_{Y} as variables X and Y, respectively. The association between them is measured by
where MIC(X,â€‰Y) denotes the maximal information coefficient of X and Y. A higher value of I_{D}(X,â€‰Y) indicates a closer relationship between X and Y, vice versa. The edges in the undirected graph are added based on a given a threshold Î¸ of I_{D}(X,â€‰Y).
The generation of an undirected gene interaction graph includes the following five steps:

(1)
Initialize the gene regulatory network, G_{D}, as an undirected graph with no vertices and no edges. There are n genes in the dataset D.

(2)
For a given time series gene expression dataset D, calculate I_{D}(X,â€‰Y) for each pair of genes (x,â€‰y).

(3)
Rank the pairs of genes by descending order of I_{D}(X,â€‰Y) obtained by step (2) to a list called RLI.

(4)
From the top of RLI, check each pair of genes (x,â€‰y), add an undirected edge between vertices x and y if I_{D}(X,â€‰Y)â€‰â‰¥â€‰Î¸ where Î¸ is a threshold and will be discussed in the experiment section.

(5)
Delete redundant edges. There might be some redundant edges in the undirected graph especially when there are triangle cycles. Conditional mutual information of genes is used to determine whether to delete edges or not. For example, there is a triangle cycle including three genes X, Y and Z in the graph since I_{D}(X,â€‰Y), I_{D}(X,â€‰Z) and I_{D}(Y,â€‰Z) are not lower than Î¸ and three edges between X, Y and Z are included. We calculated three conditional mutual information values of I(X,â€‰Yâ€‰Z), I(X,â€‰Zâ€‰Y) and I(Y,â€‰Zâ€‰X), and deleted edge (Y,â€‰Z) if I(Y, Zâ€‰X)â€‰=â€‰0, which indicates that gene Y has no relationship with gene Z.
Data discretization and edgeorientation
In our algorithm, the edges in the undirected graph are oriented by combining the conditional relative average entropy with time series mutual information of each pair of nodes. From the observation of our experiments, better performance was obtained on discretized data. Therefore, the gene expression data were discretized before being used for edge orientation.
The gene expression data are first normalized by standard fraction ZScore and then discretized according to a given threshold. The standard fraction Zâ€‰_â€‰score of the gene X at the time point t_{j} is defined as:
where Î¼ denotes the mean value of the gene expression data at all time points of gene X, Ïƒ is the standard deviation, and x_{j} denotes the expression value of the gene X at the time point t_{j}. Given a threshold k, if Z_{x, j}â€‰â‰¥â€‰k, then the gene expression data at time t_{j} is denoted as 1; otherwise the gene expression data is denoted as 0.
All edges in the undirected graph are oriented by the following procedure: Given two vertices X and Y of an edge in the undirected graph, if CRAE(Xâ€‰â†’â€‰Y)â€‰+â€‰I_{T}(Xâ€‰â†’â€‰Y)â€‰>â€‰CRAE(Yâ€‰â†’â€‰X)â€‰+â€‰I_{T}(Yâ€‰â†’â€‰X), then the orientation is Xâ€‰â†’â€‰Y, indicating that gene X regulates gene Y; While if CRAE(Xâ€‰â†’â€‰Y)â€‰+â€‰I_{T}(Xâ€‰â†’â€‰Y)â€‰<â€‰CRAE(Yâ€‰â†’â€‰X)â€‰+â€‰I_{T}(Yâ€‰â†’â€‰X), then the orientation is Yâ€‰â†’â€‰X, which means gene Y regulates gene X. In the case of CRAE(Xâ€‰â†’â€‰Y)â€‰+â€‰I_{T}(Xâ€‰â†’â€‰Y)â€‰=â€‰CRAE(Yâ€‰â†’â€‰X)â€‰+â€‰I_{T}(Yâ€‰â†’â€‰X), the higher value of I_{T}(Xâ€‰â†’â€‰Y) (or I_{T}(Yâ€‰â†’â€‰X)) indicates Xâ€‰â†’â€‰Y (or Yâ€‰â†’â€‰X). The pseudo code of algorithm MICRAT is given in Fig. 2.
Results and discussion
In this section, we evaluate the performance of algorithm MICRAT by applying it to synthetic datasets as well as a real gene expression dataset for inferring GRNs. Several experiments were carried out to demonstrate the performance of our approach by comparison with other related methods.
For synthetic datasets, five 10genes and five 100genes time series expression datasets from the DREAM4 challenge were used, respectively [28]. The real life gene expression dataset involves an eightgene subnetwork, part of a DNAdamage response pathway (SOS pathway) in the bacteria E. Coli.
Generally, there are several measurements for evaluating the performance of GRN inference methods, including Precision, Recall, Accuracy (ACC), Fâ€‰_â€‰score, Matthews correlation coefficient (MCC), etc. They are defined as follows:
where TP, FP, TN, FN are the numbers of true positives, false positives, true negatives and false negatives, respectively. In this study, true positives indicate correctly inferred edges which exist in golden standard networks, while false positives denote wrongly inferred edges which do not exist in golden standard networks, and so on for true negatives and false negatives, respectively. Usually, there exists an inverse relationship between precision and recall. It is possible to increase one at the cost of reducing the other. Therefore, Accuracy, Fâ€‰_â€‰score and Matthews correlation coefficient are widely used as balanced evaluation measures. Here we use ACC, Fâ€‰_â€‰score and MCC as measurements for evaluating the performance of our method.
In order to set the best value of Î¸, the threshold of MIC of two gene expression data while measuring the independence between genes, we test MICRAT by varying Î¸ from 0 to 1 with interval of 0.01 on both 10gene expression datasets and 100gene expression datasets. Empirical results showed that with the increase of Î¸, the precision of the model becomes higher while the recall goes down. According to the experimental results, the comparatively good performance of MICRAT could be obtained while the thresholds were set to 0.43 for 10gene datasets and 0.15 for 100gene datasets, respectively.
Experiments on simulation datasets
We first evaluated MICRAT on simulation datasets which were generated based on benchmarking networks. Many tools have been developed for assessing the effectiveness of GRN inference methods. The DREAM4 challenge introduced a critical performance assessment framework of methods for GRN inference and presented an in silico benchmark suite as a blinded, communitywide challenge within the context of the DREAM project. In this challenge, the gene expression datasets with noise and their golden standard (benchmark) networks were given. The golden standard networks were selected from source networks of real species [28]. They are widely used as a benchmark for the evaluation of GRN inference methods. Here, we tested our algorithm on the DREAM4 challenge time series gene expression data in networks of size 10 and 100 genes, respectively.
As mentioned above, MICRAT found the associations between pairs of genes in the beginning. Then the conditional relative average entropy in cooperation with time series mutual information was used to orient the edges in the undirected graph and obtain the GRN. For evaluating the performance of our approach, we compared our algorithm with TDBN [7, 10] and NARROMI [19]. TDBN is a typical method with significant impact on inferring GRNs from time series gene expression data. NARROMI is quite related to our method and it significantly outperformed other methods, such as LP [29], LASSO [30, 31], ARACNE [16] and GENIE3 [32]. This procedure was tested on DREAM4 time series gene expression datasets with 10gene networks and 100gene networks, respectively. Experimental results were given in Table 1. First, we carried out MICRAT on five 10gene datasets. There are 21â€‰Ã—â€‰5â€‰=â€‰105 time points in each of the datasets. We computed ACC, Fâ€‰_â€‰score and MCC on each of the five datasets as evaluation measures for the compared methods. Here, we chose 0.43 as the threshold value of MIC to determine the dependence between genes in the inferred network for our method, and the threshold of Zâ€‰_â€‰score was set to kâ€‰=â€‰1.2 for discretization of gene expression data when the edge is oriented. For NARROMI, the threshold is set to 0.05 which is the default value given in literature [19]. For TDBN, all the parameters were chosen as default in their literature. The accuracy of our approach is higher than those of the other methods in four datasets and slightly lower than that of NARROMI for dataset 5. The left part of Fig. 3 shows the average accuracies on five 10gene datasets for three methods. It can be observed that MICRAT performed better than the others. For the Fâ€‰_â€‰score measurement, MICRAT has the best performance on two datasets, but the worst performance on one dataset. Figure 4 shows that the average Fâ€‰_â€‰scores on five 10gene datasets is somewhat inferior to NARROMI but superior to TDBN. From Table 1 it can be observed that MICRAT is only slightly inferior to NARROMI on one dataset but has better performance on others for MCC. Figure 5 gives the average Matthews correlation coefficients of the three methods on five 10gene datasets and five 100gene datasets, respectively. We can see that MICRAT is superior to NARROMI and significantly outperforms TDBN.
We also applied our algorithm to the five DREAM4 100gene datasets. Each dataset has 21â€‰Ã—â€‰10â€‰=â€‰210 time points. We also computed ACC, Fâ€‰_â€‰score and MCC on each of the five datasets as evaluation measures for MICRAT and the other two methods. Experimental results were showed in Table 1 and the right part of Figs. 3, 4 and 5. In these experiments, we chose 0.15 as the threshold of MIC to determine the regulation relationship between genes in the inferred network. For NARROMI, the threshold is also set to 0.05, the same value as in literature [19]. As shown in Table 1 and the right part of Figs. 3, 4 and 5, all the three measures of ACC, Fâ€‰_â€‰score and MCC in our method are superior to NARROMI and TDBN on each of the five 100gene time series dataset. The results show that our approach has better performance than other methods on the 100gene datasets. Thus MICRAT has the potential to infer large GRNs using time series expression data with a large number of genes.
Experiments on real gene expression dataset
To validate our method on a real biological gene regulatory network, we analyzed the wellknown SOS DNA repair network in E. Coli. This GRN is well known for its responsibility of repairing the DNA if it gets damaged. It is the largest, most complex and best understood DNA damageinducible network to be characterized to date [33]. We applied our MICRAT algorithm to an eightgene network, part of the SOS DNA repair network in the bacteria E. coli [34,35,36]. The expression data sets of the network were obtained from Uri Aron Lab [36]. These data are kinetics of 8 genes namely uvrD, lexA, umuD, recA, uvrA, uvrY, ruvA and polB. Four experiments were done at various UV light intensities (Exp.1 and 2: 5Jm^{âˆ’â€‰2}, Exp. 3 and 4: 20Jm^{âˆ’â€‰2}). In each experiment, the 8 genes were monitored at 50 instants which are evenly spaced by 6 min intervals. In order to assess the effectiveness of our algorithm, we compared the inferred network with the known interactions between these eight genes. In our experiment, the threshold of MIC was set to Î¸=0.1 for measuring the dependence of the pair of genes, and the threshold of Zâ€‰_â€‰score was also set to kâ€‰=â€‰1.2 for discretization of gene expression data while the edge is oriented. Figure 6 gives the real SOS DNA repair network. Figure 7 shows part of inferred GRN inferred by our approach. In this Figure, one can see that our method finds 6 out of the 9 edges in the target network and identifies lexA as the â€˜hubâ€™ gene for this network. The exact ground truth for this network is not precisely known, hence it is not possible to calculate wellknown performance measures [33]. By using the known interactions obtained from literature [33, 37, 38], Table 2 showed the comparison of our algorithm with other methods such as Perrin [39], BANJO [40], GlobalMIT [41] and Morshed [33] for correct predictions on the regulatory relationships between these eight genes. From Table 2, it could be observed that our method identified most interactions except lexAâ†’ruvA, lexAâ†’lexA and recAâ†’lexA, where the latter two are not inferred since our method does not consider the double regulation between two genes and the gene that is selfregulated. That is the limitation of our method, which motivates us to continue further study for improvement.
Conclusions
In this work, a novel algorithm, MICRAT, for inferring GRNs from time series gene expression data was proposed by taking into account dependence and time delay of expressions of a regulator and its target genes. This approach employed maximal information coefficients for reconstructing an undirected graph to represent the underlying relationships between genes. The edges were directed by combining conditional relative average entropy with time course mutual information of pairs of genes. The performance of our proposed algorithm was evaluated using synthetic datasets from benchmark GRNs provided by the DREAM4 challenge and the real, experimental dataset SOS DNA repair network in E. Coli. Experimental study showed that our approach was comparable to other methods on 10gene datasets and outperformed other methods on 100gene datasets in GRN inference from time series datasets. In the followup study, we plan to test the algorithm on largescale datasets, including time course mRNASeq, human gene expression data, etc. and compare our approach with more recently proposed algorithms so as to evaluate its robustness and improve its performance.
Abbreviations
 ACC:

Accuracy
 CMI:

Conditional mutual information
 CRAE:

Conditional relative average entropy
 GRN:

Gene regulatory network
 MCC:

Matthews correlation coefficient
 MI:

Mutual information
 MIC:

Maximal information coefficient
References
Vijesh N, Chakrabarti SK, Sreekumar J. Modeling of gene regulatory networks: a review. J Biomed Sci Eng. 2013;06(1):223â€“31. http://file.scirp.org/pdf/JBiSE_2013022716483315.pdf. Accessed 03 July 2018.
Lemmens K, Dhollander T, Bie TD, et al. Inferring transcriptional modules from ChIPchip, motif and microarray data. Genome Biol. 2006;7(5):R37. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.279.1345&rep=rep1&type=pdf. Accessed 03 July 2018.
Singh N, Vidyasagar M. bLARS: an algorithm to infer gene regulatory networks. IEEE/ACM Trans Comput Biol Bioinform. 2016;13(2):1. https://www.computer.org/csdl/trans/tb/2016/02/07138615.pdf. Accessed 03 July 2018.
Haury AC, et al. TIGRESS: trustful inference of gene regulation using stability selection. BMC Syst Biol. 2012;6(1):145. https://core.ac.uk/download/pdf/51228782.pdf. Accessed 03 July 2018.
Friedman N, Linial M, Nachman I, et al. Using Bayesian network to analyze expression data. J Comput Biol. 2000;7(3â€“4):601â€“20. www.cs.huji.ac.il/~nir/Papers/FLNP1Full.pdf. Accessed 03 July 2018.
Murphy K, Mian S. Modeling gene expression data using dynamic Bayesian networks. Technical Report, Computer Science Division. Berkeley: University of California; 1999. www.cs.ubc.ca/~murphyk/Papers/ismb99.pdf. Accessed 03 July 2018
Zou M, Conzen SD. A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics. 2005;21(1):71â€“9. https://academic.oup.com/bioinformatics/article/21/1/71/212416. Accessed 03 July 2018.
Yeung KY, Dombek KM, Lo K, et al. Construction of regulatory networks using expression timeseries data of a genotyped population. Proc Natl Acad Sci U S A. 2011;108(48):19436â€“41. www.pnas.org/content/108/48/19436.full. Accessed 03 July 2018.
Haoni L, et al. Learning the structure of gene regulatory networks from time series gene expression data. BMC Genomics. 2011;12(Suppl 5):S13. https://www.ncbi.nlm.nih.gov/pubmed/22369588. Accessed 03 July 2018.
Li P, et al. Gene regulatory network inference and validation using relative chang tratio analysis and timedelayed dynamic Bayesian network. EURASIP J Bioinform Syst Biol. 2014;2014:12. https://link.springer.com/article/10.1186/s1363701400123. Accessed 03 July 2018.
Wu X, et al. State space model with hidden variables for reconstruction of gene regulatory networks. BMC Syst Biol. 2011;5(Suppl 3):S3. http://europepmc.org/articles/PMC3287571. Accessed 03 July 2018.
Wu FX. Gene regulatory network modelling: a statespace approach. Int J Data Min Bioinform. 2008;2(1):1â€“14. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.192.4775&rep=rep1&type=pdf. Accessed 03 July 2018.
Osamu H, Ryo Y, Seiya I, Rui Y, Tomoyuki H, CharnockJones DS, Cristin P, Satoru M. Statistical inference of transcriptional modulebased gene networks from time course gene expression profiles by using state space models. Bioinformatics. 2008;24:932â€“42. https://academic.oup.com/bioinformatics/article/24/7/932/295736. Accessed 03 July 2018.
Kojima K, Rui Y, Seiya I, Mai Y, Masao N, Ryo Y, Teppei S, Kazuko U, Tomoyuki H, Noriko G, Satoru M. A state space representation of VAR models with sparse learning for dynamic gene networks. Genome Inform. 2009;22:56â€“68. https://www.jsbi.org/pdfs/journal1/IBSB09/IBSB09006.pdf. Accessed 03 July 2018.
Wang J, Chen B, Wang Y, et al. Reconstructing regulatory networks from the dynamic plasticity of gene expression by mutual information. Nucleic Acids Res. 2013;41(8):395â€“408. https://academic.oup.com/nar/article/41/8/e97/2409387. Accessed 03 July 2018.
Margolin AA, Nemenman I, Basso K, et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006;7(1):S7. https://arxiv.org/pdf/qbio/0410037. Accessed 03 July 2018.
Sales G, Romualdi C. Parmigenea parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics. 2011;27(13):1876â€“7 (2). https://academic.oup.com/bioinformatics/article/27/13/1876/184634. Accessed 03 July 2018.
Zhang X, Zhao XM, He K, et al. Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information. Bioinformatics. 2012;28(1):98â€“104. https://dl.acm.org/citationcfm?id=2139373 Accessed 03 July 2018.
Zhang X, et al. NARROMI: a noise and redundancy reduction technique improves accuracy of gene regulatory network inference. Bioinformatics. 2013;29(1):106â€“13. https://core.ac.uk/download/pdf/52414020.pdf. Accessed 03 July 2018.
Chaitankar V, Ghosh P, Perkins EJ, et al. A novel gene network inference algorithm using predictive minimum description length approach. BMC Syst Biol. 2010;4(Suppl 1):S7. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2880413. Accessed 03 July 2018.
Liu F, Zhang SW, Guo WF, Wei ZG, Chen L. Inference of gene regulatory network based on local Bayesian networks. PLoS Comput Biol. 2016;12(8):e1005024. https://doi.org/10.1371/journal.pcbi.1005024 http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005024. Accessed 03 July 2018.
Wang Z. Incorporating prior knowledge into gene network study. Bioinformatics. 2013;29(20):2633â€“40. https://academic.oup.com/bioinformatics/article/29/20/2633/277562. Accessed 03 July 2018.
Petralia F, Wang P, Yang J, Zhidong T. Integrative random forest for gene regulatory network inference. Bioinformatics. 2015;31:197â€“205. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4542785/. Accessed 03 July 2018.
Young WC, Raftery AE, Yeung KY. A posterior probability approach for gene regulatory network inference in genetic perturbation data. Math Biosci Eng. 2017;13(6):1241â€“51. https://arxiv.org/abs/1603.04835. Accessed 03 July 2018.
Reshef DN, Reshef YA, Finucane HK, et al. Detecting novel associations in large data sets. Science. 2011;334(6062):1518â€“24. http://science.sciencemag.org/content/334/6062/1518. Accessed 03 July 2018.
Basso K, et al. Reverse engineering of regulatory networks in human B cells. Nat Genet. 2005;37:382â€“92. https://s3uswest2.amazonaws.com/owwfilespublic/b/b2/Basso.pdf. Accessed 03 July 2018.
Jiang J, Wang J, Yu H, et al. Poison identification based on Bayesian network: a novel improvement on K2 algorithm via Markov blanket[M]// advances in swarm intelligence: Springer Berlin Heidelberg; 2013. p. 173â€“82. https://link.springer.com/chapter/10.1007/9783642387159_21. Accessed 03 July 2018
Dream4 In Silico Network Challenage. http://dreamchallenges.org/project/dream4insiliconetworkchallenge/. Accessed 03 July 2018.
Wang Y, Joshi T, Zhang XS, et al. Inferring gene regulatory networks from multiple microarray datasets. Bioinformatics. 2006;22(22):2413â€“20. https://academic.oup.com/bioinformatics/article/22/19/2413/240982. Accessed 03 July 2018.
Tibshirani R. Regression shrinkage and selection via the lasso. J Royal Stat Soc. 1996;58(1):267â€“88. http://statweb.stanford.edu/~tibs/ftp/lassoretro.pdf. Accessed 03 July 2018.
Geeven G, et al. Identification of contextspecific gene regulatory networks with GEMULAgene expression modeling using LAsso. Bioinformatics. 2012;28(2):214â€“21. https://academic.oup.com/bioinformatics/article/28/2/214/198473. Accessed 03 July 2018.
HuynhThu VA, et al. Infering regulatory networks from expression data using treebased methods. PLoS One. 2010;5(9):e12776. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0012776. Accessed 03 July 2018.
Morshed, et al. Simultaneous learning instantaneous and timedelayed genetic interactions using novel information theoretic scoring technique. BMC Syst Biol. 2012;6:62. https://link.springer.com/chapter/10.1007%2F9783642249587_29. Accessed 03 July 2018.
Ronen M, Rosenberg R, Shraiman BI, et al. Assigning numbers to the arrows: parameterizing a gene regulation network by using accurate expression kinetics. Proc Natl Acad Sci U S A. 2002;99(16):10555â€“60. www.pnas.org/lookup/doi/10.1073/pnas.152046799. Accessed 03 July 2018.
Shenorr SS, Milo R, Mangan S, et al. Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet. 2002;31(1):64â€“8. https://www.weizmann.ac.il/mcb/UriAlon/sites/mcb.UriAlon/files/network_motifs_in_coli_0.pdf. Accessed 03 July 2018.
Uri Alonâ€™s SOS Dataset webpage. https://www.weizmann.ac.il/mcb/UriAlon/download/downloadabledata. Accessed 03 July 2018.
Noman N, Iba H. Inferring gene regulatory networks using differential evolution with local search heuristics. IEEE/ACM Trans Comput Biol Bioinform. 2007;4(4):643â€“7. https://www.ncbi.nlm.nih.gov/pubmed/17975274. Accessed 03 July 2018.
Kamura S, et al. Inference of Ssystem models of genetic networks using a cooperative coevolutionary algorithm. Bioinformatics. 2005;21(7):1154â€“63. https://academic.oup.com/bioinformatics/article/21/7/1154/268773. Accessed 03 July 2018.
Perrin BE, Ralaivola L, Mazurie A, et al. Gene networks inference using dynamic Bayesian networks. Bioinformatics. 2003;19(suppl 2):ii138â€“48. https://academic.oup.com/bioinformatics/article/19/suppl_2/ii138/180436. Accessed 03 July 2018.
Yu J, Smith VA, Wang PP, et al. Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics. 2004;20(18):3594â€“603. https://www.semanticscholar.org/paper/AdvancestoBayesiannetworkinferenceforcausalYuSmith/adcbcb725490d64d12b4f795e1e381ca6b8de4b4. Accessed 03 July 2018.
Vinh NX, Chetty M, Coppel R, et al. GlobalMIT: learning globally optimal dynamic bayesian network with the mutual information test criterion. Bioinformatics. 2011;27(19):2765â€“6. https://academic.oup.com/bioinformatics/article/27/19/2765/231220. Accessed 03 July 2018.
Funding
This work was partially supported by a subaward of the National Science Foundation EPS 0903787 and the U.S. Army Environmental Quality/Installation (EQI) Basic Research Program. Permission was granted by the Chief of Engineers to publish this information. It was also partially supported by the Key Research Projects of Universities in Henan Province of China under grant number 17A520015. The publication cost of this article was funded by the U.S. Army Environmental Quality/Installation (EQI) Basic Research Program.
Availability of data and materials
The datasets used and/or analysed during the current study are available at http://dreamchallenges.org/project/dream4insiliconetworkchallenge/. The codes of the algorithm of MICRAT are available from the corresponding author on reasonable request.
About this supplement
This article has been published as part of BMC Systems Biology Volume 12 Supplement 7, 2018: From Genomics to Systems Biology. The full contents of the supplement are available online at https://bmcsystbiol.biomedcentral.com/articles/supplements/volume12supplement7.
Author information
Authors and Affiliations
Contributions
BY, CZ and PG conceived the project. YX and BY developed and implemented the MICRAT algorithms and conducted the experimental study. BY and CZ wrote the paper. PG, AM and WK participated in the analysis and provided suggestions for the document. All authors read and approved the final manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisherâ€™s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Yang, B., Xu, Y., Maxwell, A. et al. MICRAT: a novel algorithm for inferring gene regulatory networks using time series gene expression data. BMC Syst Biol 12 (Suppl 7), 115 (2018). https://doi.org/10.1186/s1291801806351
Published:
DOI: https://doi.org/10.1186/s1291801806351
Keywords
 Gene regulatory networks
 Maximal information coefficient
 Conditional relative average entropy
 Timeseries mutual information