Seeded Bayesian Networks: Constructing genetic networks from microarray data
 Amira Djebbari^{1} and
 John Quackenbush^{1}Email author
DOI: 10.1186/17520509257
© Djebbari and Quackenbush; licensee BioMed Central Ltd. 2008
Received: 03 December 2007
Accepted: 04 July 2008
Published: 04 July 2008
Abstract
Background
DNA microarrays and other genomicsinspired technologies provide large datasets that often include hidden patterns of correlation between genes reflecting the complex processes that underlie cellular metabolism and physiology. The challenge in analyzing largescale expression data has been to extract biologically meaningful inferences regarding these processes – often represented as networks – in an environment where the datasets are often imperfect and biological noise can obscure the actual signal. Although many techniques have been developed in an attempt to address these issues, to date their ability to extract meaningful and predictive network relationships has been limited. Here we describe a method that draws on prior information about genegene interactions to infer biologically relevant pathways from microarray data. Our approach consists of using preliminary networks derived from the literature and/or proteinprotein interaction data as seeds for a Bayesian network analysis of microarray results.
Results
Through a bootstrap analysis of gene expression data derived from a number of leukemia studies, we demonstrate that seeded Bayesian Networks have the ability to identify highconfidence genegene interactions which can then be validated by comparison to other sources of pathway data.
Conclusion
The use of network seeds greatly improves the ability of Bayesian Network analysis to learn gene interaction networks from gene expression data. We demonstrate that the use of seeds derived from the biomedical literature or highthroughput proteinprotein interaction data, or the combination, provides improvement over a standard Bayesian Network analysis, allowing networks involving dynamic processes to be deduced from the static snapshots of biological systems that represent the most common source of microarray data. Software implementing these methods has been included in the widely used TM4 microarray analysis package.
Background
DNA microarrays and other genomeinspired, highthroughput technologies allow us to capture information regarding gene expression across the entire collection of genes in an organism. While it has long been argued that such genomewide profiles should allow the identification of networks and pathways, deducing such interactions for even a small number of genes remains a daunting task. Although several gene network modeling techniques have been applied to microarray data, including weight matrices [1], Boolean networks [2], and differential equations [3], Bayesian Networks (BNs), BN seemed to show the greatest promise in the analysis of expression data as they provided the ability to learn network structures and develop predictive models of system response [4].
BNs were first applied to gene expression studies in the analysis of the yeast cell cycle [4], a dataset that consisted of expression data collected over a carefully planned timecourse [5]. Friedman and colleagues were able to deduce a predictive model of the cell cycle machinery in yeast from this data, a result that generated a great deal of excitement within the research community. However, application of BN analysis to more "realistic" datasets (i.e. tumor vs. normal, treated vs. control) failed to provide similarly useful insight and as such is rarely applied in analysis of expression profiling data.
Application of Bayesian Network analysis in genomics is challenging for a number of reasons. The first problem is that learning BNs is computationally expensive as, ideally, one must assess all potential network topologies corresponding to all possible sets of directed acyclic graphs linking the genes. This results in a combinatorial explosion of the number of possible structures and parameters; formally this has been shown to be an NPComplete problem [6]. As an alternative, general purpose heuristic search algorithms, such as greedy hill climbing, can be used to explore the "state space" of the problem (here, the relative expression state for each gene in each sample) in an attempt to optimize some scoring function. The problem with these approaches is that they often find local maxima and do not converge to the globally optimal solution. This accounts, in most instances, for the failure of BN analysis to find "realistic" networks in datasets that lacked the richness of the cell cycle time course.
A potential solution to the limitations of the type of generalpurpose search algorithms was proposed by Wolpert and Macready, who noted that the use of domainspecific knowledge can provide a useful bias that can lead to nearoptimal solutions in exploring the state space of a particular problem [7]. In the context of BN analysis of microarray data, a useful bias can be introduced through the use of preliminary network topologies as soft constraints to seed the search for a network graph [8–10], an approach that has been applied in a variety of related problems [11–15]. Although a network seed biases the search for the best topology, it does not limit it so that new potential interactions between genes can be identified.
A number of possibilities exist to provide prior seeds for the network topologies, including pathway/interaction databases, networks deduced from the published biomedical literature indexed in PubMed, and those constructed from highthroughput interaction screens such as the proteinprotein interaction (PPI) described by Rual and colleagues [16]. As our goal, in part, is to discover new interactions, we chose to concentrate on prior networks deduced from the literature and PPI data which often include potential interactions not yet annotated in curated pathway databases. As such, this approach has potential to discover new interactions by combining diverse sources of data and information.
A second challenge in the application of methods such as BN analysis to microarrays is the typical design of genomics studies. As noted previously, most microarray studies do not involve uniform temporal sampling of the state of a system where inferred relationships in the gene expression state space of genes can more easily be detected. Rather, typical studies involve static comparisons of different phenotypic or treatment groups where relationships between gene states can be much more subtle. Further, and possibly more importantly, the large number of genes assayed on a single array and the relatively small number of samples profiled generally provide too few measurements to constrain potential models, and arrive at an optimal solution. To address this, we implement model averaging through bootstrapping which allows us to compute confidence estimates for network features in the models we derive.
Ultimately, the question we must resolve is whether application of BN analysis to gene expression data can reveal useful networks that can lead to testable hypotheses about the state and response of the systems under study. In particular, our goal in this manuscript is to assess the use of prior network structures as seeds for a search of the gene expression state space. To do this, we present an analysis of two leukemia datasets [17–19]. As described below, we find that by combining microarray data with prior network structures deduced from the literature, PPI, or a combination of these, we can better recover known pathways and relationships between genes than when analyzing microarray data without a network seed. To estimate the robustness of our approach, we compare the learned networks to known networks from the KEGG database [20] and show that networks constructed with high confidence edges have a very small falsepositive rate, but at the expense of missing true edges. This suggests that even when applied to imperfect data, our approach provides a conservative way of recovering pathways and of identifying potential new interactions that can be further evaluated in the laboratory.
Results
Our analytical pipeline closely follows that outlined by Friedman and colleagues [4], with important modifications. One starts with normalized DNA microarray data from an appropriate study and identifies a set of candidate genes that will be further analyzed. For example, one might use a variety of statistical methods to identify genes correlating with the various phenotypic states under study or select a set of genes representing a particular pathway or process thought relevant to the system being analyzed. The expression data for these genes are then discretized using a multinomial model [4] and assigned to three mutually exclusive and exhaustive bins (underexpressed, unchanged, and overexpressed) by equalwidth binning. This discretized data then provides the "raw material" that is subsequently used to learn a Bayesian Network. Our modifications of the Friedman et al. approach involve the introduction of a prior network seed to initiate the learning stage and the use of bootstrapping to estimate confidence in the equivalence classes of final networks. As a means of comparing a BN analysis both with and without network seeds, we compared the resulting networks to "known" pathways and evaluated our ability to reproduce documented interactions between genes.
Deducing prior network structure from the published literature
where w(A, B) and w(e) denote the weight of edge (A, B) and the weight of edge e respectively in the set of edges E.
The choice of limiting edge assignments to genes appearing pairwise in articles indexed in PubMed was based on an analysis of the network properties that can be deduced from the literature. Specifically, limiting networks to papers containing two and only two genes results in network topologies exhibiting a scalefree behavior. In contrast, increasing the threshold beyond two genes introduced deviations from scalefree behavior, leading to networks that have properties of a complete graph (results not shown). Further, setting some threshold is necessary to remove publications including wholegenome studies that mention thousands of genes and consequently produce networks where each node is connected to every other node. The choice of two, while conservative, allows for a prior network with the highest possible confidence without resorting to more ambitious textmining approaches.
Deducing prior network structure from highthroughput screens
There are many sources of network priors other than the literature, including the growing collection of interactome data available from highthroughput yeast two hybrid proteinprotein interaction (PPI) screens such as that recently released by Rual and colleagues [16]. While these datasets are still sparse – current interactome datasets are thought to represent only about 1% of the human interactome (2754 edges from approximately 64 million protein pairs) – they represent an unbiased screen for interactions and have identified many not catalogued in the published literature. Since we were interested in exploring the usefulness of PPIbased networks as priors, we chose to compensate for their relative sparseness by using the "significant gene" set derived from our initial microarray analysis as a starting point to search the PPI data and allowed the initial networks to expand through protein interactions at a distance at most k = 3 interactions away using Floyd's allpairs shortest paths algorithm [23]. For PPI networks, we use a uniform distribution for the prior probabilities for all edges.
Setting an initial directionality for network edges
As a BN is a directed acyclic graph (DAG), the initial network used to seed the search must have directions assigned to each edge. To address this for the undirected literature and PPI networks, we developed an approach based on a modified depthfirst search (DFS) algorithm to traverse the graph and make assignments. Simply, one starts at some randomly selected node and explores the graph as far as possible along each branch before backtracking, assigning directions as the graph is traversed. Because DFS is commonly used for cycle detection [23], we modified the standard algorithm to keep edges in the depthfirst tree but direct back edges in increasing order of visiting timestamp to ensure the resulting graph is a DAG (the modified DFS, along with an alternate approach, is described in detail in Supplement S1; see Additional File 1). While all of the resulting edges may not be directed appropriately, this is not a problem as the BN search considers not only additions and deletions of edges but also edge reversals and therefore provides a means of correcting an error made at this stage. As this method cannot estimate the conditional probabilities on the edges, we assume a flat distribution such that all interactions are, initially, equally likely.
Bootstrapping
As noted previously, overfitting is a potential problem in BN analysis, particularly given the small number of samples profiled in a typical microarray experiment. To compensate for this, we perform model averaging through nonparametric bootstrapping (resampling with replacement) to estimate the confidence in various network features learned [24]. As this requires comparison of the DAGs derived from each bootstrap iteration, one must consider the equivalence class of DAGs. An equivalence class is a partition of a set into subsets that are equivalent under some operation, such as the partition of the integers into evens and odds under the modulotwo operation. Here we encode the independencies within the DAGs and use these as way of partitioning the entire collection into subsets in which each member is equivalent. As described in greater detail in the Supplement S2 (see Additional File 1), we transform the DAG from each bootstrap iteration into a complete partially directed acyclic graph (CPDAG) which is a representative of an equivalence class of DAGs through the DAGtoCPDAG algorithm described by Chickering [25]. For each CPDAG, we check whether the features of interest [directed edge, undirected edge, order relation (one variable is the ancestor of the other variable), and Markov relation (if two variables are connected either way or if they are both parents of another variable)] are found. By counting how often a particular feature occurs relative to the total number of iterations, we can estimate the overall bootstrap confidence and consequently select features that are strongly supported by the data.
In the work presented here, we perform 100 bootstrap operations for each of four cases: gene expression data alone (with no priors), or using priors derived from the literature, from PPI data, or from a combination of the two. For each iteration, we use the opensource WEKA package [26, 27] with a greedy hill climbing algorithm to optimize the BDe score [28], a Bayesian metric that uses an explicit prior over networks and is proportional to the posterior probability of the network given the data; as a default, we set the maximum number of parents for each node equal to three. In this framework, both the network topology and the conditional probabilities associated with each edge are learned from the data, but starting from our initial seed network.
To evaluate the performance of this approach, we investigate a number of features that can be learned from the data and estimate the confidence, as noted, based on the frequency with which it occurs. We use this to address three questions: Can we recover known gene interactions? Are the confidence estimates different in prior (combined literature and microarray data) versus no prior (microarray data alone)? Is the performance better when using prior rather than the no prior approach?
Finally, it should be noted that there are a number of alternative scoring metrics and search algorithms. The choice here in part reflects the initial presentation by Friedman [4] but also represents the defaults defined by WEKA. We investigated a number of alternatives and each produced a result consistent with that described below; an overall improvement in the resulting networks when using an initial seed network structure.
BN analysis of gene expression in Leukemia I: an example
As a first test of seeded BN approach, we analyzed a wellknown example of a typical microarray expression study, the 1999, Golub et al. comparison of Acute Lymphoblastic Leukemia (ALL) with that in Acute Myeloid Leukemia (AML) patients (38 samples, 27 ALL and 11 AML) that used the Affymetrix Hu6800 GeneChip™ (containing 7129 genespecific probe sets representing approximately 6817 genes) to identify signatures that could differentiate these disease subtypes [17]. As is the case in most analyses of this sort, we first performed a statistical filtering to reduce the complexity of the data. Here we simply used a betweensubjects ttest with a standard Bonferroni correction to identify the 40 genes most significant for distinguishing the ALL and AML samples. As noted previously, we expanded the seed networks to take better advantage of the limited PPI data by including those genes a distance of three or fewer from the initial set (based on the PPI data). This PPIextended dataset consisted of 63 genes and served as the starting point for analysis (See Supplement S3; see Additional File 1); the resulting literature plus PPI "seed" network, included 44 gene nodes connected by 48 edges and 19 singleton (unconnected) genes.
Scores for learned networks using top 40 genes and those at distance at most 3 from them in PPI using the leukemia dataset from Golub et al. [17] using microarray data only or priors from combinations of the literature and PPI data.
Prior Seed Source  None  Literature prior  PPI prior  Literature and PPI prior 

Score  29.1340899  29.82373161  29.90025881  30.99743368 
Average confidence values for directed edge features for the 7 highestconfidence gene pairs derived from the Golub et al. data [17] in a BN analysis using no prior seeds or priors derived from the literature. The ttest pvalue measures the likelihood that prior and noprior results are equivalent.
Gene pairs  Average no prior  Average prior  ttest pvalue  

SMARCA4  SMARCA4_2  0.4075  0.959583  0.001059 
CBX1  SRP9  0.155833  0.944583  1.18 × 10^{6} 
TMPO  GTF2E2  0.017083  0.934583  6.54 × 10^{10} 
SMARCA4_2  RBBP4  0.005833  0.894583  7.67 × 10^{8} 
CD79A  DHPS  0.011667  0.894167  1.32 × 10^{7} 
FAH  MCM3  0.02375  0.877083  5.90 × 10^{7} 
FAH  PSMA6  0.02875  0.790833  5.80 × 10^{6} 
Average confidence values for Markov relations features for the 7 highestconfidence gene pairs derived from the Golub et al. data [17] in a BN analysis using no prior seeds or priors derived from the literature.
Gene pairs  Average no prior  Average prior  ttest pvalue  

SMARCA4_2  SMARCA4  0.4575  0.965  0.001473 
SRP9  CBX1  0.476667  0.96  0.001508 
GTF2E2  TMPO  0.148333  0.938333  2.54 × 10^{6} 
RBBP4  SMARCA4_2  0.15375  0.912083  1.43 × 10^{5} 
CD79A  DHPS  0.135417  0.895833  1.35 × 10^{5} 
FAH  MCM3  0.187917  0.881667  4.93 × 10^{5} 
FAH  PSMA6  0.283333  0.79625  0.001547 
BN analysis of gene expression in Leukemia II: pathway reconstruction and validation
Although the analysis of the Golub et al. data provides some level of confidence that we can faithfully recover pairwise interactions between genes using a seeded BN approach, we developed this approach in hopes of deducing the structure of biological networks from gene expression data. The Hu6800 GeneChip™ used in the Golub et al. study surveys only a limited subset of the genes within the genome and has significant gaps in nearly every pathway relevant to understanding the cellular mechanisms that distinguish ALL from AML.
Consequently, we turned our attention to the analysis of two more extensive Leukemia datasets from Ross and colleagues [18, 19], who collected expression profiles on the Affymetrix U133A GeneChip™, which contains probes for more than 22,000 transcripts representing nearly the entirety of the proteincoding genes encoded within the human genome. The raw CEL files were normalized and the resulting dataset was filtered to eliminate adult low quality samples (details in Supplement S4; see Additional File 1); this left 120 pediatric ALL and 120 pediatric AML samples. To evaluate network reconstruction, we limited our analysis to genes mapping to the KEGG cell cycle pathway. For these genes, we performed a BN analysis using no prior information as well as seeds derived from the literature, PPI data, and the combination; in each case, 100 bootstrap iterations were performed using hill climbing and the BDe score. Networks were constructed using gene pairs with bootstrap confidence greater than a threshold value in the directed edge, Markov relation, and order relation features.
Given a set of genes and a predicted network describing their associations, the optimal method for validating the network would be to experimentally test each interaction that comprise the associated graph. As this is often impractical, as an alternative, we will rely the KEGG pathway database [20] as a source of "verified" interactions and network topologies that can be used to assess the fidelity of our predicted networks. For the analysis presented here, we assume that interactions represented in KEGG are true and that those not in KEGG are false. While the former is probably a good assumption, the latter is likely not; there may be as yet undiscovered interactions and, indeed, the goal of a BN analysis is to discover these. A second potential limitation of using KEGG pathways is that they are typically represented as undirected graphs while the results of a BN analysis are directed acyclic graphs. Consequently, we cannot assess the directionality of predicted edges, only their presence or absence. Further, KEGG represents physical interactions between proteins, not the RNA transcripts which are assayed in microarray studies. The seeded BN approach that we describe here models interactions between gene expression states and these may not be identical to the interaction states of the corresponding proteins. Despite these limitations, we chose to compare our results to known pathways annotated in KEGG because it is a conservative way to evaluate the performance of our approach as it does not rely on the validation of potentially novel interactions learned from the data.
Contingency table for comparing learned networks to KEGG pathways.
Learned Network  

Ideal (KEGG pathway)  Edge in network  Yes  No 
Yes  TP = E_{L} ⋂ E_{I}  FN = E_{I}  TP  
No  FP = E_{L}  TP  TN = [(V_{I}•V_{I}  1)/2]  FN 
Taken together, the evidence here, based on a number of measures, suggests that use of prior network seeds, compared to the analysis of gene expression alone, improves our ability to learn interactions known to occur between gene products and consequently to reconstruct networks in a BN framework. In addition, our analysis suggests that, ultimately, the network model that one chooses to use requires a balance between relearning known edges and discovering new potential interactions; the choice of which to emphasize depends of the goal of the analysis. Regardless of the choice, the value of a Bayesian Network framework is that it not only provides a network graph, but conditional probabilities on the edges that result in predictive models that can be used in a variety of applications.
Discussion
Much of the excitement generated by microarrays and other highthroughput technologies was based on the expectation that they would lead us to uncover the pathways and networks that provide the link between genotype and phenotype. Application of Bayesian Network analysis, although computationally expensive, represented one possible way to discover these important links in expression data, but failed to provide much insight when applied to the vast majority of expression datasets. Here we present a way to recover at least some of that lost promise.
In an analysis of two gene expression datasets, both representing the types of classcomparisons that are typical in microarray studies, we established that the use of domainspecific knowledge in the form of prior network seeds can improve the ability of Bayesian Network analyses to learn known interactions between genes. We demonstrated this by showing improvements in recovery of known pairwise interactions between gene products as defined by pathways in the KEGG database. Through the use of bootstrapping, we are able to assign confidence values to the individual interactions. Finally, these interactions, represented by an edge between two gene nodes, can be used to reconstruct predictive networks at any confidence level.
Generally, one would hope to use such an analysis to discover new potential interactions and to build testable models that can be validated. Not surprisingly, we find a tradeoff between sensitivity and specificity in detecting interactions when varying the bootstrap confidence threshold. Networks constructed with high confidence edges give high specificity but miss many interactions, resulting in lower sensitivity and fewer potentially novel interactions. Setting the confidence threshold too low, on the other hand, may lead to the identification of many spurious edges and limit the ultimate utility of any network model. Managing this tradeoff must, therefore, be done in the context of the experimental problem being investigated and one's ability to validate the results.
There are a number of potential limitations to what we present here. First, in the analysis of the Ross data, we chose to focus on genes within the cell cycle regulatory pathway rather than looking at the entire collection of genes in the array. However, even with this focused collection of genes, we were able to validate our hypothesis, that prior network seeds derived from the literature and PPI data improve our ability to detect known interactions relative to a standard, unseeded BN analysis. What is most exciting about the work we present here is that we were able to learn the structure of a dynamic process, the cell cycle, from a dataset consisting of static snapshots of two phenotypic states. This suggests that directed experiments where a particular network or pathway is perturbed and followed over time may further improve the overall performance of a BN approach. Using such an approach in an iterative manner, in which a network is first learned, then perturbed and the resulting data used to refine the predicted network structure, may allow us to discover novel players in many known networks and to learn previously unknown networks from DNA microarray expression profiles.
Second, it should be noted that the networks we learn from BN analysis do not represent physical networks but instead capture subtle relationships between the states of various genes and their ability to influence the states of others. Nevertheless, in our evaluation of BN performance, we compared our results to physical interaction networks between proteins represented in the KEGG database. We believe this is justified as the hypothesis underlying our work is that the physical relationship captured in geneproduct interactions is reflected in transcriptional response and the results we present, showing good correlation between the physical and BN networks provide some validation for this hypothesis.
Finally, one may consider whether the use of seeds biases us toward simply relearning known networks documented in the literature. While we believe the evidence presented in this manuscript suggests otherwise, even if it were the case, our approach represents an automated way of extracting network graphs from a gene list, refining the graph through the use of expression data, and learning conditional probabilities that can be used to make predictions as to how the system may respond to perturbations. The implications of this for a wide range of problems ranging from mechanistic studies to drug target prioritization remain to be explored.
Conclusion
Ultimately, the value of what we present here will be judged based on its utility. The goal of most systems biology research is the development of predictive, testable models that can be used to infer the properties of biological systems. The seeded BN approach we describe here yields such models and the data we present indicates that the networks we learn do, indeed, reflect the properties of biological systems. While we await further validation of the approach, we believe that seeded Bayesian Networks will be an important new approach for the analysis of genomescale expression data.
Abbreviations
 ALL:

Acute Lymphoblastic Leukemia
 AML:

Acute Myeloid Leukemia
 BNs:

Bayesian Networks
 CPDAG:

complete partially directed acyclic graph
 DAG:

directed acyclic graph
 DFS:

depthfirst search
 FN:

false negatives
 FP:

false positives
 KEGG:

Kyoto Encyclopedia of Genes and Genomes
 PPI:

proteinprotein interaction
 PPV:

positive predictive value
 ROC:

ReceiverOperator Characteristic
 TP:

true positives
 TN:

true negatives.
Declarations
Acknowledgements
The authors would like to thank Simon Berkovich and Fenglong Liu for useful discussions. We would also like to thank Weidong (Wendy) Wong, Eleanor Howe, Joseph White, and Raktim Sinha for assistance with MeV Graphical User Interface implementation and Remco Bouckaert of the WEKA team for helpful guidance with software implementation. This work was supported by grants from the National Heart, Lung, and Blood Institute of the National Institutes of Health, the National Library of Medicine of the National Institutes of Health, and from the DanaFarber Cancer Institute High Tech Fund.
Authors’ Affiliations
References
 Weaver DC, Workman CT, Stormo GD: Modeling regulatory networks with weight matrices. Pac Symp Biocomput. 1999, 112123.Google Scholar
 Akutsu T, Miyano S, Kuhara S: Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. Pac Symp Biocomput. 1999, 1728.Google Scholar
 Chen T, He HL, Church GM: Modeling gene expression with differential equations. Pac Symp Biocomput. 1999, 2940.Google Scholar
 Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. J Comput Biol. 2000, 7 (34): 601620.View ArticlePubMedGoogle Scholar
 Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycleregulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell. 1998, 9 (12): 32733297.PubMed CentralView ArticlePubMedGoogle Scholar
 Chickering DM: Learning Bayesian networks is NPComplete. Learning from Data: Artificial Intelligence and Statistics V. Edited by: Fisher D, Lenz H. 1996, 121130. SpringerVerlagView ArticleGoogle Scholar
 Wolpert DH, W.G. M: No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation. 1997, 1 (1): 6782.View ArticleGoogle Scholar
 Hartemink AJ, Gifford DK, Jaakkola TS, Young RA: Combining location and expression data for principled discovery of genetic regulatory network models. Pac Symp Biocomput. 2002, 437449.Google Scholar
 Imoto S, Higuchi T, Goto T, Tashiro K, Kuhara S, Miyano S: Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks. Proc IEEE Comput Soc Bioinform Conf. 2003, 2: 104113.PubMedGoogle Scholar
 Castelo R, Siebes APJM: Priors on network structures: Biasing the search for Bayesian networks. Centrum voor Wiskunde informaticae. 1998Google Scholar
 Le Phillip P, Bahl A, Ungar LH: Using prior knowledge to improve genetic network reconstruction from microarray data. In Silico Biol. 2004, 4 (3): 335353.PubMedGoogle Scholar
 Husmeier D, Werhli AV: Bayesian integration of biological prior knowledge into the reconstruction of gene regulatory networks with Bayesian networks. Comput Syst Bioinformatics Conf. 2007, 6: 8595.View ArticlePubMedGoogle Scholar
 Bastos G, Guimarães KS: Analyzing the Effect of Prior Knowledge in Genetic Regulatory Network Inference. Pattern Recognition and Machine Intelligence. 2005, 611616. Berlin / Heidelberg , SpringerView ArticleGoogle Scholar
 Gevaert O, Van Vooren S, De Moor B: A framework for elucidating regulatory networks based on prior information and expression data. Ann N Y Acad Sci. 2007, 1115: 240248.View ArticlePubMedGoogle Scholar
 Werhli AV, Husmeier D: Reconstructing gene regulatory networks with bayesian networks by combining expression data with multiple sources of prior knowledge. Stat Appl Genet Mol Biol. 2007, 6: Article15PubMedGoogle Scholar
 Rual JF, Venkatesan K, Hao T, HirozaneKishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, AyiviGuedehoussou N, Klitgord N, Simon C, Boxem M, Milstein S, Rosenberg J, Goldberg DS, Zhang LV, Wong SL, Franklin G, Li S, Albala JS, Lim J, Fraughton C, Llamosas E, Cevik S, Bex C, Lamesch P, Sikorski RS, Vandenhaute J, Zoghbi HY, Smolyar A, Bosak S, Sequerra R, DoucetteStamm L, Cusick ME, Hill DE, Roth FP, Vidal M: Towards a proteomescale map of the human proteinprotein interaction network. Nature. 2005, 437 (7062): 11731178.View ArticlePubMedGoogle Scholar
 Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999, 286 (5439): 531537.View ArticlePubMedGoogle Scholar
 Ross ME, Zhou X, Song G, Shurtleff SA, Girtman K, Williams WK, Liu HC, Mahfouz R, Raimondi SC, Lenny N, Patel A, Downing JR: Classification of pediatric acute lymphoblastic leukemia by gene expression profiling. Blood. 2003, 102 (8): 29512959.View ArticlePubMedGoogle Scholar
 Ross ME, Mahfouz R, Onciu M, Liu HC, Zhou X, Song G, Shurtleff SA, Pounds S, Cheng C, Ma J, Ribeiro RC, Rubnitz JE, Girtman K, Williams WK, Raimondi SC, Liang DC, Shih LY, Pui CH, Downing JR: Gene expression profiling of pediatric acute myelogenous leukemia. Blood. 2004, 104 (12): 36793687.View ArticlePubMedGoogle Scholar
 Kanehisa M: The KEGG database. Novartis Found Symp. 2002, 247: 91101; discussion 1013, 11928, 24452.View ArticlePubMedGoogle Scholar
 Jenssen TK, Laegreid A, Komorowski J, Hovig E: A literature network of human genes for highthroughput analysis of gene expression. Nat Genet. 2001, 28 (1): 2128.PubMedGoogle Scholar
 McEntyre J, Lipman D: PubMed: bridging the information gap. Cmaj. 2001, 164 (9): 13171319.PubMed CentralPubMedGoogle Scholar
 Cormen TH, Leiserson CE, Rivest RL: Introduction to Algorithms. 1990, Cambridge, Massachusetts, MIT PressGoogle Scholar
 Friedman N, Goldszmidt M, Wyner A: Data analysis with bayesian networks: A bootstrap approach. Edited by: Laskey K, Prade H. 1999, 196205. Morgan KaufmannGoogle Scholar
 Chickering DM: Learning equivalence classes of Bayesian network structures. Edited by: Horvitz E, Jensen F. 1996, 150157. Morgan KaufmannGoogle Scholar
 Frank E, Hall M, Trigg L, Holmes G, Witten IH: Data mining in bioinformatics using Weka. Bioinformatics. 2004, 20 (15): 24792481.View ArticlePubMedGoogle Scholar
 WEKA: , http://www.cs.waikato.ac.nz/~ml/weka/index.html
 Heckerman D, Geiger D, Chickering DM: Learning Bayesian networks: the combination of knowledge and statistical data. Machine Learning. 1995, 20 (3): 197243.Google Scholar
 Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13 (11): 24982504.PubMed CentralView ArticlePubMedGoogle Scholar
 Cytoscape :, http://www.cytoscape.org
 Harbour JW, Dean DC: The Rb/E2F pathway: expanding roles and emerging paradigms. Genes Dev. 2000, 14 (19): 23932409.View ArticlePubMedGoogle Scholar
 Nevins JR: The Rb/E2F pathway and cancer. Hum Mol Genet. 2001, 10 (7): 699703.View ArticlePubMedGoogle Scholar
 Aref S, Mabed M, ElSherbiny M, Selim T, Metwaly A: Cyclin D1 expression in acute leukemia. Hematology. 2006, 11 (1): 3134.View ArticlePubMedGoogle Scholar
 Acid S, de Campos LM, FernandezLuna JM, Rodriguez S, Maria Rodriguez J, Luis Salcedo J: A comparison of learning algorithms for Bayesian networks: a case study based on data from an emergency medical service. Artif Intell Med. 2004, 30 (3): 215232.View ArticlePubMedGoogle Scholar
 Fawcett T: ROC Graphs: Notes and Practical Considerations for Data Mining Researchers. HP Labs Tech Report HPL20034. 2003Google Scholar
 Saeed AI, Bhagabati NK, Braisted JC, Liang W, Sharov V, Howe EA, Li J, Thiagarajan M, White JA, Quackenbush J: TM4 microarray software suite. Methods Enzymol. 2006, 411: 134193.View ArticlePubMedGoogle Scholar
 Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, Sturn A, Snuffin M, Rezantsev A, Popov D, Ryltsov A, Kostukovich E, Borisovsky I, Liu Z, Vinsavich A, Trush V, Quackenbush J: TM4: a free, opensource system for microarray data management and analysis. Biotechniques. 2003, 34 (2): 374378.PubMedGoogle Scholar
 MeV:, http://www.tm4.org/mev.html
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.