Simultaneous learning of instantaneous and time-delayed genetic interactions using novel information theoretic scoring technique
- Nizamul Morshed^{1}Email author,
- Madhu Chetty^{1} and
- Nguyen Xuan Vinh^{1}
DOI: 10.1186/1752-0509-6-62
© Morshed et al; licensee BioMed Central Ltd. 2012
Received: 17 December 2011
Accepted: 6 June 2012
Published: 12 June 2012
Abstract
Background
Understanding gene interactions is a fundamental question in systems biology. Currently, modeling of gene regulations using the Bayesian Network (BN) formalism assumes that genes interact either instantaneously or with a certain amount of time delay. However in reality, biological regulations, both instantaneous and time-delayed, occur simultaneously. A framework that can detect and model both these two types of interactions simultaneously would represent gene regulatory networks more accurately.
Results
In this paper, we introduce a framework based on the Bayesian Network (BN) formalism that can represent both instantaneous and time-delayed interactions between genes simultaneously. A novel scoring metric having firm mathematical underpinnings is also proposed that, unlike other recent methods, can score both interactions concurrently and takes into account the reality that multiple regulators can regulate a gene jointly, rather than in an isolated pair-wise manner. Further, a gene regulatory network (GRN) inference method employing an evolutionary search that makes use of the framework and the scoring metric is also presented.
Conclusion
By taking into consideration the biological fact that both instantaneous and time-delayed regulations can occur among genes, our approach models gene interactions with greater accuracy. The proposed framework is efficient and can be used to infer gene networks having multiple orders of instantaneous and time-delayed regulations simultaneously. Experiments are carried out using three different synthetic networks (with three different mechanisms for generating synthetic data) as well as real life networks of Saccharomyces cerevisiae, E. coli and cyanobacteria gene expression data. The results show the effectiveness of our approach.
Background
Bayesian network and its extension, dynamic Bayesian network (DBN), has found significant applications in the modeling of genetic interactions [1, 2]. To the best of our knowledge, prior works on inter and intra-slice connections in the dynamic probabilistic network formalism [3, 4] have modelled a DBN using an initial network and a transition network employing the 1st-order Markov assumption, where the initial network exists only during the initial period of time and subsequently the dynamics is expressed using only the transition network. Realising that a d-th order DBN has variables replicated d times, a 1st-order DBN for this task^{b} is therefore usually limited to around 10 variables. Alternately, if a 2nd-order DBN model is chosen, it can mostly deal with 6-7 variables [5]. Thus, prior works on DBNs were either unable to discover these two interactions simultaneously or were unable to fully exploit its potential, thereby restricting studies to simpler network configurations. However, since our proposed approach does not replicate variables, we can study any complex network configuration without limitations on the number of nodes. Zou et al. [2], while highlighting the existence of both instantaneous and time-delayed interactions among genes while considering the parent-child relationships of a particular order, did not account for the regulatory effects of other parents (having different order of regulation than the current one) on that particular child. This is in violation of the biological reality that parents with various orders of regulation can jointly regulate a child. Our proposed method supports multiple parents to regulate a child simultaneously, with different orders of regulation. Moreover, the limitation of detecting genetic interactions like $A\iff B$, which are prevalent in genetic networks [6], is also overcome with the proposed method. Experiments conducted using both synthetic and real-life GRNs show the effectiveness of our approach.
Results and discussion
- 1.Sensitivity(Se): It measures the proportion of true connections which are correctly inferred. It is defined as follows:$\phantom{\rule{1em}{0ex}}\mathit{\text{Se}}=\frac{\mathit{\text{TP}}}{\mathit{\text{TP}}+\mathit{\text{FN}}}$(1)
- 2.Specificity (Sp): Specificity is defined by the following equation:$\phantom{\rule{1em}{0ex}}\mathit{\text{Sp}}=\frac{\mathit{\text{TN}}}{\mathit{\text{TN}}+\mathit{\text{FP}}}$(2)
- 3.Precision (Pr): Precision is proportional to the inferred connections which are correct. It is defined as follows:$\phantom{\rule{1em}{0ex}}\mathit{\text{Pr}}=\frac{\mathit{\text{TP}}}{\mathit{\text{TP}}+\mathit{\text{FP}}}$(3)
- 4.F-score (F): Biologically, a good reconstruction algorithm should infer as many correct arcs as possible, in addition to the criteria that most of the inferred arcs should be correct. The F-score measure is the harmonic mean of Se and Pr [7] and represents a compromise between these two objectives:$\phantom{\rule{1em}{0ex}}\mathit{\text{F}}=\frac{2\mathit{\text{Pr}}\phantom{\rule{.50em}{0ex}}\mathit{\text{Se}}}{\mathit{\text{Pr}}+\mathit{\text{Se}}}$(4)
Since our method uses discrete data for the statistical significance tests embedded in the scoring function, we applied the Persist [8] algorithm to discretize the data into 3 levels. The confidence level (α) is set to 0.9. We will use a local search in the DAG space with the classical operators of arc addition, arc deletion and arc reversal. The starting point of the search is always an empty graph. The parameters for all the other methods that are used for comparison are set to their default values mentioned in their user manuals.
Synthetic network
Synthetic network using differential equation based models
Probabilistic network of yeast
Comparison of proposed method with BANJO and BNFinder on the yeast sub-network
N=30 | N=50 | N=100 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Se | Sp | Pr | F | Se | Sp | Pr | F | Se | Sp | Pr | F | |
Proposed | 0.62± | 0.992± | 0.57± | 0.59± | 0.80± | 1.0± | 0.79± | 0.79± | 0.82± | 1.0± | 0.76± | 0.79± |
Method | 0.12 | 0.0045 | 0.11 | 0.11 | 0.04 | 0.0 | 0.07 | 0.05 | 0.06 | 0.0 | 0.03 | 0.04 |
BNFinder | 0.53± | 0.996± | 0.68± | 0.59± | 0.62± | 0.997± | 0.74± | 0.67± | 0.69± | 0.997± | 0.74± | 0.72± |
+BDe | 0.04 | 0.0006 | 0.02 | 0.02 | 0.04 | 0.0019 | 0.13 | 0.06 | 0.08 | 0.0007 | 0.06 | 0.07 |
BNFinder | 0.51± | 0.996± | 0.63± | 0.56± | 0.60± | 0.996± | 0.68± | 0.63± | 0.65± | 0.996± | 0.69± | 0.67± |
+MDL | 0.08 | 0.0006 | 0.07 | 0.08 | 0.05 | 0.0022 | 0.15 | 0.09 | 0.0 | 0.0 | 0.04 | 0.02 |
BANJO | 0.51± | 0.987± | 0.49± | 0.46± | 0.55± | 0.993± | 0.57± | 0.55± | 0.60± | 0.995± | 0.61± | 0.61± |
0.08 | 0.01 | 0.2 | 0.15 | 0.09 | 0.0049 | 0.23 | 0.16 | 0.08 | 0.0014 | 0.09 | 0.08 |
Synthetic network of glucose homeostasis
In higher eukaryotes, glucose homeostasis is maintained via a complex system involving many organs and signaling mechanisms. The liver plays a crucial role in this system by storing glucose as glycogen when blood glucose levels are high, and releasing glucose into the bloodstream when blood glucose levels are low. To accomplish its task, the liver responds to circulating levels of hormones, mainly insulin, epinephrine, glucagon, and glucocorticoids [18].
Comparison of proposed method with BANJO and BNFinder on the glucose homeostasis network
N=50 | N=75 | N=100 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Se | Sp | Pr | F | Se | Sp | Pr | F | Se | Sp | Pr | F | |
Proposed | 0.50 | 0.9812 | 0.54 | 0.52 | 0.46 | 0.9914 | 0.71 | 0.56 | 0.54 | 0.9906 | 0.72 | 0.62 |
Method | ||||||||||||
BNFinder | 0.48 | 0.9488 | 0.29 | 0.37 | 0.52 | 0.9506 | 0.32 | 0.39 | 0.56 | 0.9557 | 0.36 | 0.44 |
+BDe | ||||||||||||
BNFinder | 0.54 | 0.948 | 0.31 | 0.40 | 0.56 | 0.9395 | 0.29 | 0.38 | 0.54 | 0.9369 | 0.27 | 0.37 |
+MDL | ||||||||||||
BANJO | 0.52 | 0.97 | 0.44 | 0.47 | 0.48 | 0.9838 | 0.57 | 0.52 | 0.54 | 0.9881 | 0.67 | 0.60 |
Real-life biological data of saccharomyces cerevisiae(IRMA)
To validate our method with a real-life biological gene regulatory network, we investigate a recent network reported in [21]. In that significant work, the authors built a network, called IRMA, of the yeast Saccharomyces cerevisiae[21]. They tested the transcription of network genes by culturing the cells in presence of galactose and glucose. The network is composed of five genes regulating each other; it is also negligibly affected by endogenous genes. It is one of the first attempts at building a reference data set having an accurately known target network [7]. There are two sets of gene profiles called Switch ON and Switch OFF for this network, each containing 16 and 21 time series data points, respectively. A ’simplified’ network, ignoring some internal protein level interactions, is also reported in [21]. To compare our reconstruction method, we consider 3 other methods, namely, TDARACNE [7], BANJO [16] and BNFinder [17].
IRMA ON dataset
Performance comparison based on IRMA ON dataset
Original Network | Simplified Network | |||||||
---|---|---|---|---|---|---|---|---|
Se | Sp | Pr | F | Se | Sp | Pr | F | |
Proposed Method | 0.63 | 1.0 | 1.0 | 0.77 | 0.67 | 1.0 | 1.0 | 0.80 |
TDARACNE | 0.63 | 0.88 | 0.71 | 0.67 | 0.67 | 0.90 | 0.80 | 0.73 |
BNFinder+BDe | 0.13 | 0.82 | 0.25 | 0.17 | 0.17 | 0.80 | 0.33 | 0.22 |
BNFinder+MDL | 0.13 | 0.82 | 0.25 | 0.17 | 0.17 | 0.80 | 0.33 | 0.22 |
BANJO | 0.25 | 0.76 | 0.33 | 0.27 | 0.50 | 0.70 | 0.50 | 0.50 |
IRMA OFF dataset
Comparison based on IRMA OFF dataset
Original Network | Simplified Network | |||||||
---|---|---|---|---|---|---|---|---|
Se | Sp | Pr | F | Se | Sp | Pr | F | |
Proposed Method | 0.50 | 0.94 | 0.80 | 0.62 | 0.50 | 0.90 | 0.75 | 0.60 |
TDARACNE | 0.60 | - | 0.37 | 0.46 | 0.75 | - | 0.50 | 0.60 |
BNFinder+BDe | 0.13 | 0.82 | 0.25 | 0.17 | 0.33 | 0.80 | 0.50 | 0.40 |
BNFinder+MDL | 0.13 | 0.82 | 0.25 | 0.17 | 0.33 | 0.80 | 0.50 | 0.40 |
BANJO | 0.38 | 0.88 | 0.60 | 0.46 | 0.33 | 0.90 | 0.67 | 0.44 |
Yeast KEGG pathway reconstruction
In order to test the proposed method’s performance on yeast S. cerevisiae cell cycle, we selected a eleven gene network of the G1-phase: Cln3, Cdc28, Mbp1, Swi4, Clb6, Cdc6, Sic1, Swi6, Cln1, Cln2, Clb5. The data used was obtained from the cdc28 experiment of Spellman et al. [22]. In the later stage of the G1-phase, the Cln3-Cdc28 protein kinase complex activates two transcription factors, MBF and SBF, and these promote the transcription of some genes important for budding and DNA synthesis [7, 23]. Entry into the S-phase requires the activation of the protein kinase Cdc28p through binding with Clb5 or Clb6, and also the destruction of Sic1 [24]. Also, Swi4 becomes associated with Swi6 to form the SCB complex that activates CLN1 and CLN2 in late G1. Mbp1 forms the MCB-binding factor complex with Swi6, which activates DNA synthesis genes and S-phase cyclin genes CLB5 and CLB6 in late G1 [7]. In budding yeast, commitment to DNA replication during the normal cell cycle requires degradation of the cyclin-dependent kinase (CDK) inhibitor Sic1. The G1 cyclin-CDK complexes Cln1-Cdk1 and Cln2-Cdk1 initiate the process of Sic1 removal by directly catalyzing Sic1 phosphorylation at multiple sites [7, 25].
SOS DNA repair network of E. coli
The expression of the genes in the SOS regulatory network is controlled by a complex circuitry which involves the RecA and LexA proteins [27]. Normally LexA acts as the master repressor of more than 20 genes, including lexA and recA genes. This repression is done by its binding to the interaction sites in the promoter regions of these genes. When DNA damage occurs, one of the SOS proteins, RecA, acts as a sensor. By binding to single-stranded DNA, it becomes activated, senses the damage and mediates LexA autocleavage [27]. The drop in LexA levels in turn stops the repression of the SOS genes and activates them. When the damage has been repaired, the level of activated RecA drops and it stops mediating LexA autocleavage. LexA level in turn increases, starting repression of the SOS genes, and the cell then returns to its normal state.
The expression data sets of the SOS DNA repair system were obtained from Uri Alon Lab [28]. These data are expression kinetics of 8 genes namely uvrD, lexA, umuD, recA, uvrA, uvrY, ruvA and polB. Four experiments were done for various UV light intensities (Exp. 1 and 2:5J m^{−2}, Exp. 3 and 4:20J m^{−2}). In each experiment, the above 8 genes were monitored at 50 instants which are evenly spaced by 6 minutes intervals.
Analysis of individual interactions inferred by proposed method
Regulator | Target | correct/incorrect |
---|---|---|
LexA | uvrD | correct |
umuD | correct | |
recA | correct | |
uvrA | correct | |
RecA | uvrY | correct^{a} |
polB | correct^{a} | |
uvrD | ruvA | incorrect |
Network analysis of strongly cycling genes in cyanobacteria, Cyanothece sp. ATCC 51142
Similar to other large scale datasets (e.g. the Human HeLa cell data [31], the Arabidopsis L. Heynth dataset [32]), the microarray data set for cyanobacteria also has very few samples. Moreover, being not a well-studied organism, it requires caution in the interpretation of results. We note that GRN reconstruction studies of cyanobacteria reported earlier (e.g. [29, 33, 34]) commonly emphasise an evaluation criteria, namely “functional enrichment” analysis of sub-networks. Further, another common feature noted for genetic networks [35–37] is that transcriptional regulatory networks possess the scale free nature of the network topology^{d}. Since we have limited samples and also because the ground truth is unknown, we have carried out the evaluation of the inferred network using both: (i) statistical means i.e. GO functional enrichment analysis (using both p = 0.05 and p = 0.10), and (ii) the R^{2} measure of the power-law fit of the network to establish its scale-free nature.
The enrichment analysis was done by using gene ontology (GO) database (compiled using two sources: one from the Cyanobase database [38], and another from genome-wide amino sequence matching using the Blast2GO software suite [39]; the the compiled database is available in Additional file 7), where every GO terms appearing in each sub-network is assessed to find out whether a certain functional category is significantly over-represented in a certain sub-network/cluster, more than what would be expected by chance. The Cytoscape [40] plugin BiNGO [41] was used for GO functional category enrichment analysis. For BiNGO, we use the combined and filtered gene set as the reference set, the hypergeometric test as the test for functional over-representation, and False Discovery Rate (FDR) as the multiple hypothesis testing correction scheme.
First, we present the results corresponding to p = 0.05. The network obtained by BNFinder+BDe has 16 sub-networks each containing at least 3 genes. Of these, 6 sub-networks have significantly enriched functionalities (as determined by the GO functional enrichment test). Of the other 10, we compute the 3 most densely connected hubs for each sub-network, and in 2 of 10 such sub-networks, the hubs have defined significantly enriched functionalities. On the other hand, in our result, there are 14 sub-networks in total having at least 3 genes. Of these, 3 sub-networks have defined enriched functions (the largest sub-network has the role of nitrogen fixation according to the enrichment test). Of the other 11, we compute the 3 most densely connected hubs for each sub-network, and in 5 of the 11 such sub-networks, the hubs have defined significantly enriched functionalities.
The results corresponding to p = 0.10 show that for BNFinder+BDe, 7 sub-networks have enriched functionalities (as determined by the test). Of the other 9, we compute the 3 most densely connected hubs for each sub-network, and in 2 of the 9 such sub-networks, the hubs have defined enriched functionalities. On the contrary, the result using our approach has 5 sub-networks with defined significantly enriched functions (the largest sub-network has the role of nitrogen fixation, similar to the p = 0.05 case). Of the other 9, we compute the 3 most densely connected hubs for each sub-network, and in 6 of the 9 such sub-networks, the hubs have defined significantly enriched functionalities.
We also test the networks to assess whether they are scale free, using a power law fit. The R^{2} value of the fit corresponding to our network is 0.93, which is a better fit compared to BNFinder+BDe (0.62).
Conclusion
In this paper, we propose a framework that can simultaneously represent instantaneous and time-delayed genetic interactions. The proposed scoring metric uses information theoretic quantities having not only relevant properties but also implicitly includes the biological truth that some genes may jointly regulate other genes. Incorporating these novel features, we have implemented a score+search based GRN reconstruction algorithm. Experiments have been performed on different synthetic networks of varying complexities and also on real-life biological networks. Our method shows improved performance compared to other recent methods, both in terms of reconstruction accuracy and number of false predictions and at the same time maintaining comparable or better true predictions. A natural extension of the described method can be incorporation of a-priori knowledge from sources like protein-protein interactions databases and fusing the knowledge with existing regulatory networks to make the inferred networks much more reliable, and we are pursuing this objective. Along with these extensions, the proposed approach would improve the accuracy of gene regulatory network reconstruction and enhance research in systems biology.
Methods
The representational framework
Let us model a gene network containing n genes (denoted by ${X}_{1},{X}_{2}\dots ,{X}_{n}$) with a corresponding microarray dataset having N time points. A basic DBN-based GRN reconstruction method would try to find associations between genes X_{ i } and X_{ j } by taking into consideration the data ${x}_{i1},\dots ,{x}_{i(N-\delta )}$ and ${x}_{j(1+\delta )},\dots ,{x}_{\mathit{\text{jN}}}$ or vice versa (small case letters mean data values in the microarray), where 1 ≤ δ ≤ d. That is, it will take into consideration the d-th order Markov rule, for a gene having a maximum order of regulation d with its parents. This will effectively enable this model to capture at most d-step time delayed interactions. Conversely, a basic BN-based strategy would use the entire set of N time points and it will capture regulations that are effective instantaneously.
Complications in the alignment of data samples can arise if the parents have different orders of regulation with the child node. To make this notion clear, we describe an example where we have already assessed the degree of interest in adding two parents (gene B and C, having third and first order regulations, respectively) to the gene under consideration, X. Now, we want to assess the degree of interest in adding gene A as a parent of X with a second order regulatory relationship, that is we want to compute^{e}MI(X, A^{2}|{B^{3}, C^{1}}), where superscripts on the parent variables denote the order of regulation it has with the child node.
The √ symbol inside a cell denotes that this data sample will be used for MI computation, whereas empty cells denote that these data samples will not be considered for computing the MI. Similar alignments will need to be done for the other case, where the data is considered to be periodic (e.g., datasets of yeast compiled by [42] show such cyclic behavior [43]). However, we can use all the N data samples in this case, where the data is shifted in a circular manner.
- 1.
The network must be a directed acyclic graph.
- 2.
The inter-slice arcs must go in the correct direction (no backward arc).
- 3.
Interactions remain existent independent of time (stationarity assumption).
Our proposed scoring metric, CCIT
The term Pa(X_{ i }[t]) in the above equation represents the parents of gene X_{ i } at time t, which can be in the same time-slice or in one of the d previous time-slices (d is the maximum order of regulation) of gene X_{ i } at time t. Thus, using our approach, the scoring function for B will calculate $\mathit{\text{MI}}(B,\{{A}^{0},{D}^{0}\}\cup \{{C}^{1}\left\}\right)$. Scoring in this manner enables us to score both intra and inter-slice interactions simultaneously, rather than considering these two types of interactions in an isolated manner, making it specially suitable for problems like reconstructing GRNs, where occurrence of joint regulation is a common phenomenon.
The idea of penalizing complex structures is ubiquitous, finding its place in most of the scores like BIC, MIT and MDL. The penalization component for BIC and MDL are global, whereas for MIT it is specific for each variable and its parents. Being local in nature, the MIT scheme usually outperforms the other two [44]. In this scheme, the localised penalization is based on a theorem of Kullback [45], which says that for a particular confidence level α, the quantity 2N.MI(X_{ i }, X_{ j }|Pa(X_{ i })) − χ α,l_{ ij } represents a statistical test of conditional independence, where l_{ ij } is the degrees of freedom of a chi-squared distribution, and ${\chi}_{\alpha ,{l}_{\mathit{\text{ij}}}}$ is the statistical significance threshold. The more positive the value is, the more likely is that X_{ i } and X_{ k } are related (given the current parent set, Pa(X_{ i })) and vice-versa. Thus, adding up the MI quantities for all the genes (multiplied by 2*number of samples) and subtracting the corresponding local penalization measures effectively constitute a series of conditional independence (CI) tests, and this scheme is used for scoring using MIT.
However, porting this idea of local penalization directly to a gene regulatory network which is cursed with dimensionality (there are a large number of variables (genes), but only a few samples are available), has the problem of over-penalization. This can be exemplified using Figure 1. The penalization component for gene B according to MIT, will be: χ_{α,4} + χ_{α,12} + χ_{α,36}, assuming the special case where we have 3 levels of discrete data (the details of how these penalization components can be computed will be shown later). For a Bayesian network design having thousands of samples available, this penalization is not a problem. However, but for GRN reconstruction with samples ranging between 20-50, this penalization is too high. To remedy this situation, we propose to apply the penalization only on a per-order of regulation basis. Using this modified scheme, the penalization will be 2χ_{α,4} + χ_{α,12}, which constitutes considerable savings, thereby increasing better prediction ratio (in terms of sensitivity and specificity).
where r_{ p } denotes the number of possible values that gene X_{ p } can take (after discretization, if the data is continuous). If the number of possible values that the genes can take is not the same for all the genes, the quantity ${\sigma}_{i}^{k}$ denotes the permutation of the parent set $P{a}^{k}\left({X}_{i}\right)$ where the first parent gene has the highest number of possible values, the second gene has the second highest number of possible values and so on.
Some properties of CCIT Score
In this section we study several useful properties of the proposed scoring metric. The first among these is the decomposability property, which is especially useful for local search algorithms:
Proposition 1
CCIT is a decomposable scoring metric.
Proof
This result is evident as the scoring function is, by definition, a sum of local scores. □
Next, we show in Theorem 1 that CCIT takes joint regulation into account while scoring and it is different than three related approaches, namely MIT [44] applied to: a Bayesian Network (which we call MI T_{0}); a dynamic Bayesian Network (called MI T_{1}); and also a naive combination of these two, where the intra and inter-slice networks are scored independently (called MI T_{0 + 1}). For this, we make use of the decomposition property of MI, defined next:
Property 1
Theorem 1
CCIT scores intra and inter-slice arcs concurrently, and is different from MI T_{0}, MI T_{1} and MI T_{0 + 1} since it takes into account the fact that multiple regulators may regulate a gene simultaneously, rather than in an isolated manner.
Proof
- 1.Application of MIT in a BN based framework:$\phantom{\rule{1em}{0ex}}{s}_{\mathit{\text{MI}}{T}_{0}}=2\mathit{\text{N.MI}}(B,\{{A}^{0},{D}^{0}\left\}\right)-\left({\chi}_{\alpha ,4}+{\chi}_{\alpha ,12}\right)$(12)
- 2.Application of MIT in a DBN based framework:$\phantom{\rule{1em}{0ex}}{s}_{\mathit{\text{MI}}{T}_{1}}=2N\left\{\mathit{\text{MI}}\right(B,{C}^{1})+\mathit{\text{MI}}(A,{D}^{1}\left)\right\}-2{\chi}_{\alpha ,4}$(13)
- 3.A naive application of MIT in a combined BN and DBN based framework:$\phantom{\rule{1em}{0ex}}\begin{array}{l}{s}_{\mathit{\text{MI}}{T}_{0+1}}=2N\left\{\mathit{\text{MI}}\right(B,\{{A}^{0},{D}^{0}\})+\mathit{\text{MI}}(B,{C}^{1})\hfill \\ \phantom{\rule{8em}{0ex}}+\mathit{\text{MI}}(A,{D}^{1})\}-\left(3{\chi}_{\alpha ,4}+{\chi}_{\alpha ,12}\right)\hfill \end{array}$(14)
- 4.Our proposed scoring metric:$\phantom{\rule{1em}{0ex}}\begin{array}{l}{s}_{\mathit{\text{CCIT}}}=2N\left\{\mathit{\text{MI}}\right(B,\{{A}^{0},{D}^{0}\}\cup \left\{{C}^{1}\right\})\\ \phantom{\rule{6em}{0ex}}+\mathit{\text{MI}}(A,{D}^{1})\}-\left(3{\chi}_{\alpha ,4}+{\chi}_{\alpha ,12}\right)\phantom{\rule{2em}{0ex}}\end{array}$(15)
The concurrent scoring behavior of CCIT is evident from the first term in RHS of (15). Also, the inclusion of C in the parent set in the first term of the RHS of the equation exhibits the manner by which it achieves the objective of taking into account the biological fact that multiple regulators may regulate a gene jointly (the calculation, however, needs to be carried out in accordance with the process we described in the Methods Section).
Considering (12) and (13), it is also obvious that CCIT is different from both MIT_{0} and MIT_{1}. To show that CCIT is different from MI T_{0 + 1}, we consider (14) and (15). It suffices to consider whether MI(B, {A^{0},D^{0}}) + MI(B, C^{1}) is different from $\mathit{\text{MI}}(B,\{{A}^{0},{D}^{0}\}\cup \{{C}^{1}\left\}\right)$. Using (11), this becomes equivalent to considering whether MI(B, {A^{0},D^{0}}|C^{1}) is the same as MI(B, {A^{0}, D^{0}}), which are clearly inequal. This completes the proof. □
Endnotes
^{a}The time-delay will always be greater than zero. However, if the delay is small enough so that the regulated gene is effected before the next data sample is taken, it can be considered as an instantaneous interaction
^{b}a tutorial can be found in http://www.cs.ubc.ca/∼murphyk/Software/BDAGL/dbnDemo_hus.htm
^{c}The method works as follows: for a variable X_{ i } with k states, a basis vector is constructed for P(X_{ i }|Pa(X_{ i })) by normalizing the vector $\left(\frac{1}{1},\frac{1}{2},\cdots \phantom{\rule{0.3em}{0ex}},\frac{1}{k}\right)$. For the j-th instantiation pa(X_{ i }) of Pa(X_{ i }), samples are obtained for the probability corresponding to this instantiation by using θ_{ ij } ∼ Dirichlet(s α_{ ij }) where s is the equivalent sample size and the α_{ ij }’s are obtained by shifting the basis vector to the right j places where j modulo k is not one.
^{d}We clarify that different processes including genetic networks will generate scale free networks. However, if a network obtained using microarray data is scale free, it indicates that it is modelling the underlying biological process more accurately
^{e}in this paper, we use Mutual Information (MI)/log-likelihood based Conditional Independence tests for analysis of regulatory interactions
^{f}it should be noted here that MIT/MDL are basic scoring metric for BNs, which can be extended to score both Static and Dynamic BNs separately. Here, we are discussing MIT/MDL applied to a network having both zero and higher-order interactions
Declarations
Acknowledgments
This research is a part of the larger project on genetic network modeling supported by Monash University and Australia-India Strategic Research Fund. Comments and suggestions from Assoc. Prof. Manzur Murshed from Gippsland School of IT is gratefully acknowledged.
Authors’ Affiliations
References
- Ram R, Chetty M: A markov-blanket-based model for gene regulatory network inference. Comput Biol Bioinf, IEEE/ACM Trans 2011,8(2):353-367.View ArticleGoogle Scholar
- Zou M, Conzen S: A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics 2005, 21: 71. 10.1093/bioinformatics/bth463View ArticleGoogle Scholar
- de Campos, Ji Q: Efficient structure learning of Bayesian networks using constraints. J Machine Learning Res 2011, 12: 663-689.Google Scholar
- Friedman N, Murphy K, Russell S: Learning the structure of dynamic probabilistic networks. Morgan Kaufmann Publishers Inc.; 1998.Google Scholar
- Eaton D, Murphy K: Bayesian structure learning using dynamic programming and MCMC. Proceedings of the Twenty-Third Confererence on Uncertainty in Artificial Intelligence (UAI 2007) 2007.Google Scholar
- Chaitankar V, Ghosh P, Perkins E, Gong P, Deng Y, Zhang C: A novel gene network inference algorithm using predictive minimum description length approach. BMC Syst Biol 2010,4(Suppl 1):S7. 10.1186/1752-0509-4-S1-S7View ArticleGoogle Scholar
- Zoppoli P, Morganella S, Ceccarelli M: TimeDelay-ARACNE: Reverse engineering of gene networks from time-course data by an information theoretic approach. BMC Bioinf 2010, 11: 154. 10.1186/1471-2105-11-154View ArticleGoogle Scholar
- Mörchen F, Ultsch A: Optimizing time series discretization for knowledge discovery. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery and data mining. Chicago, IL, USA: ACM; 2005:660-665.View ArticleGoogle Scholar
- Schaffter T, Marbach D, Floreano D: GeneNetWeaver: In silico benchmark generation and performance profiling of network inference methods. Bioinformatics 2011,27(16):2263-2270. Oxford Univ Press 10.1093/bioinformatics/btr373View ArticleGoogle Scholar
- Prill R, Marbach D, Saez-Rodriguez J, Alexopoulos L, Sorger P, Xue X, Clarke N, Altan-Bonnet G, Stolovitzky G: Towards a rigorous assessment of systems biology models: the DREAM3 challenges. PloS one 2010,5(2):e9202. 10.1371/journal.pone.0009202View ArticleGoogle Scholar
- Prill R, Saez-Rodriguez J, Alexopoulos L, Sorger P, Stolovitzky G: Crowdsourcing network inference: the DREAM predictive signaling network challenge. Science’s STKE 2011,4(189):mr7.Google Scholar
- Marbach D, Schaffter T, Mattiussi C, Floreano D: Generating realistic in silico gene networks for performance assessment of reverse engineering methods. J Comput Biol 2009,16(2):229-239. 10.1089/cmb.2008.09TTView ArticleGoogle Scholar
- Noman N, Iba H: Inferring gene regulatory networks using differential evolution with local search heuristics. IEEE/ACM Trans Comput Biol Bioinf (TCBB) 2007,4(4):634-647.View ArticleGoogle Scholar
- Kimura S, Ide K, Kashihara A, Kano M, Hatakeyama M, Masui R, Nakagawa N, Yokoyama S, Kuramitsu S, Konagaya A: Inference of S-system models of genetic networks using a cooperative coevolutionary algorithm. Bioinformatics 2005,21(7):1154-1163. IEEE Computer Society Press 10.1093/bioinformatics/bti071View ArticleGoogle Scholar
- Husmeier D: Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics 2003,19(17):2271. 10.1093/bioinformatics/btg313View ArticleGoogle Scholar
- Yu J, Smith V, Wang P, Hartemink A, Jarvis E: Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics 2004,20(18):3594. 10.1093/bioinformatics/bth448View ArticleGoogle Scholar
- Wilczyński B, Dojer N: BNFinder: exact and efficient method for learning Bayesian networks. Bioinformatics 2009,25(2):286. 10.1093/bioinformatics/btn505View ArticleGoogle Scholar
- Le P, Bahl A, Ungar L: Using prior knowledge to improve genetic network reconstruction from microarray data. Silico Biol 2004,4(3):335-353.Google Scholar
- Murphy K, et al., et al.: The bayes net toolbox for matlab. Comput Sci Stat 2001,33(2):1024-1034.Google Scholar
- Chickering D, Meek C: Finding optimal Bayesian networks. Proc. UAI 2002.Google Scholar
- Cantone I, Marucci L, Iorio F, Ricci M, Belcastro V, Bansal M, Santini S, Di Bernardo M, Di Bernardo D, Cosma M: A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell 2009, 137: 172-181. 10.1016/j.cell.2009.01.055View ArticleGoogle Scholar
- Spellman P, Sherlock G, Zhang M, Iyer V, Anders K, Eisen M, Brown P, Botstein D, Futcher B: Comprehensive identification of cell cycle–regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 1998,9(12):3273-3297.View ArticleGoogle Scholar
- Cross F: Starting the cell cycle: what’s the point? Curr Opin Cell Biol 1995,7(6):790-797. 10.1016/0955-0674(95)80062-XView ArticleGoogle Scholar
- Chun K, Goebl M: Mutational analysis of Cak1p, an essential protein kinase that regulates cell cycle progression. Mol Gen Genet MGG 1997,256(4):365-375. 10.1007/s004380050580View ArticleGoogle Scholar
- Sawarynski K, Kaplun A, Tzivion G, Brush G: Distinct activities of the related protein kinases Cdk1 and Ime2. Biochimica Et Biophysica Acta 2007,1773(3):450-456. 10.1016/j.bbamcr.2006.10.003View ArticleGoogle Scholar
- Kanehisa M, Goto S, Kawashima S, Nakaya A: The KEGG databases at GenomeNet. Nucleic Acids Res 2002, 30: 42-46. 10.1093/nar/30.1.42View ArticleGoogle Scholar
- Kabir M, Noman N, Iba H: Reverse engineering gene regulatory network from microarray data using linear time-variant model. BMC Bioinf 2010,11(Suppl 1):S56. BioMed Central Ltd 10.1186/1471-2105-11-S1-S56View ArticleGoogle Scholar
- Uri Alon’s SOS Dataset webpage [http://www.weizmann.ac.il/mcb/UriAlon/Papers/SOSData/] []
- Stöckel J, Welsh E, Liberton M, Kunnvakkam R, Aurora R, Pakrasi H: Global transcriptomic analysis of Cyanothece 51142 reveals robust diurnal oscillation of central metabolic processes. Proc Nat Acad Sci 2008,105(16):6156. 10.1073/pnas.0711068105View ArticleGoogle Scholar
- Wang W, Ghosh B, Pakrasi H: Identification and modeling of genes with Diurnal Oscillations from microarray time series data. Comput Biol Bioinf, IEEE/ACM Trans 2011, 8: 108-121.View ArticleGoogle Scholar
- Whitfield M, Sherlock G, Saldanha A, Murray J, Ball C, Alexander K, Matese J, Perou C, Hurt M, Brown P, et al., et al.: Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell 2002,13(6):1977-2000. 10.1091/mbc.02-02-0030.View ArticleGoogle Scholar
- Yuan Y, Li C, Windram O: Directed partial correlation: inferring large-scale gene regulatory network through induced topology disruptions. PLoS One 2011,6(4):e16835. 10.1371/journal.pone.0016835View ArticleGoogle Scholar
- McDermott J, Oehmen C, McCue L, Hill E, Choi D, Stöckel J, Liberton M, Pakrasi H, Sherman L: A model of cyclic transcriptomic behavior in the cyanobacterium Cyanothece sp. ATCC 51142. Mol BioSyst 2011,7(8):2407-2418. 10.1039/c1mb05006kView ArticleGoogle Scholar
- Toepel J, Welsh E, Summerfield T, Pakrasi H, Sherman L: Differential transcriptional analysis of the cyanobacterium Cyanothece sp. strain ATCC 51142 during light-dark and continuous-light growth. J Bacteriol 2008,190(11):3904-3913. 10.1128/JB.00206-08View ArticleGoogle Scholar
- Jeong H, Tombor B, Albert R, Oltvai Z, Barabási A: The large-scale organization of metabolic networks. Nature 2000,407(6804):651-654. 10.1038/35036627View ArticleGoogle Scholar
- Jeong H, Mason S, Barabasi A, Oltvai Z: Lethality and centrality in protein networks. Arxiv preprint cond-mat/0105306 2001.Google Scholar
- Guelzim N, Bottani S, Bourgine P, Képès F: Topological and causal structure of the yeast transcriptional regulatory network. Nat Genet 2002, 31: 60-63. 10.1038/ng873View ArticleGoogle Scholar
- Kazusa DNA Research Institute: The cyanobacteria database . [http://genome.kazusa.or.jp/cyanobase] [].
- Götz S, García-Gómez J, Terol J, Williams T, Nagaraj S, Nueda M, Robles M, Talón M, Dopazo J, Conesa A: High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res 2008,36(10):3420-3435. 10.1093/nar/gkn176View ArticleGoogle Scholar
- Shannon P, Markiel A, Ozier O, Baliga N, Wang J, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003,13(11):2498-2504. 10.1101/gr.1239303View ArticleGoogle Scholar
- Maere S, Heymans K, Kuiper M: BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 2005,21(16):3448-3449. 10.1093/bioinformatics/bti551View ArticleGoogle Scholar
- Cho R, Campbell M, Winzeler E, Steinmetz L, Conway A, Wodicka L, Wolfsberg T, Gabrielian A, Landsman D, et al., et al.: A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 1998, 2: 65-73. 10.1016/S1097-2765(00)80114-8View ArticleGoogle Scholar
- Xing Z, Wu D: Modeling multiple time units delayed gene regulatory network using dynamic Bayesian network. In Proceedings of the Sixth IEEE International Conference on Data Mining - Workshops. IEEE; 2006:190-195.View ArticleGoogle Scholar
- de Campos L: A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. J Machine Learning Res 2006, 7: 2149-2187.Google Scholar
- Kullback S: Information Theory and Statistics. Volume. New York: Dover Publications; 1968.Google Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.