Machine learning based analyses on metabolic networks supports high-throughput knockout screens
© Plaimas et al; licensee BioMed Central Ltd. 2008
Received: 29 April 2008
Accepted: 24 July 2008
Published: 24 July 2008
Computational identification of new drug targets is a major goal of pharmaceutical bioinformatics.
This paper presents a machine learning strategy to study and validate essential enzymes of a metabolic network. Each enzyme was characterized by its local network topology, gene homologies and co-expression, and flux balance analyses. A machine learning system was trained to distinguish between essential and non-essential reactions. It was validated with a comprehensive experimental dataset consisting of the phenotypic outcomes of single knockout mutants of Escherichia coli (KEIO collection). We obtained reliable results with high accuracy (93%) and precision (90%). We show that topologic, genomic and transcriptomic features describing the network are sufficient for defining the essentiality of a reaction. These features do not substantially depend on specific media conditions, which enabled us to apply our approach to less specific media conditions as well, such as the lysogeny broth rich medium.
Our analysis can be used to validate experimental knockout data from high-throughput screens, to improve flux balance analyses, and to support experimental knockout screens that define drug targets.
Defining drug targets and designing drugs are major goals in biomedical research. In particular, metabolic enzymes have been successfully targeted by specific drugs to inhibit essential processes of pathogenic organisms in the human host. Analyzing the metabolic network in silico helps to identify enzymes that are essential for the survival of the organism [2, 3]. A general model for the metabolic network has been described by graph theoretical approaches and was applied to identify drug targets in pathogenic organisms. The term 'damage' was used to assess enzymes that may serve as drug targets when their inhibition influences a substantial number of downstream metabolic reactions and products. Furthermore, the concepts of choke points and load points were successfully applied to estimate the essentiality of an enzyme [2, 3]. Load points were defined as hot spots in the metabolic network (enzymes/metabolites) based on the ratio of the number of k-shortest paths passing through a metabolite/enzyme (in/out) to the number of nearest neighbor links (in/out) attached to it. This ratio was compared to the average load value in the network. Choke points uniquely consume or produce a certain metabolite, which may make them indispensable. For example, in Plasmodium falciparum, δ-aminolevulinate dehydratase (ALAD) has been considered a choke point and was proven experimentally to serve as a valid antimalarial target.
Flux balance analysis (FBA) is a widely used and well established method to assess the essentiality of genes [7, 8]. However, FBA approaches need clear definitions of nutrient availability and biomass production under specifically given environmental conditions (for a good overview of these aspects see e.g. ). High-throughput experiments have been performed to investigate the essentiality of a major portion of, or all, genes in an organism [10–12]. For Escherichia coli, the essentiality of virtually all open reading frames was assessed in a comprehensive knockout screen (KEIO collection). These data make it possible to test the performance of an in silico metabolic model that predicts essential genes. Analyzing flux balances under aerobic glucose condition using the COBRA toolbox and a newly reconstructed metabolic network of E. coli yielded 92% accuracy when predicting the essentiality of genes. Feist and co-workers compared their predictions with the KEIO collection and obtained 88% for rich media conditions. In another study, FBA and the corresponding experimental knockout screen were performed to study the opportunistic behavior of the pathogen Pseudomonas aeruginosa with a systems view.
In this paper we propose an integrative machine learning approach that applies a broad list of the described tools. The machine learning system was supplied with qualitative and quantitative descriptors derived from biochemical knowledge, genomic and transcriptomic data, and flux balance analyses. Using the KEIO collection as the gold standard, we obtained an overall accuracy of 93% for rich media conditions. A comparative analysis of the flux balance approach and our machine learning approach suggested some improvements for FBA, namely to consider aminyl-tRNA reactions in modeling. Predictions that contradicted the KEIO collection were experimentally tested and successfully used to detect errors in the experimental data. Predicted reactions that match the experimental screen are strengthened in their candidacy as potential drug targets. Supporting this claim, 19 out of 37 predictions for novel targets were found in other literature with reported experimental evidence.
The data for the metabolic network were taken from a previous study and reconstructed in the same way (iAF1260, see Feist and co-workers). The metabolic network was represented as an undirected bipartite graph consisting of metabolites and reactions as alternating nodes. This network was used for our flux balance analyses. For all other analyses, unspecific compounds such as water, ATP, etc. were discarded.
The gold standard
In order to demonstrate the efficiency of our approach we used data from the KEIO collection as the gold standard. The dataset consisted of the phenotypic outcomes from a set of knockout mutants of single genes and was used to define the classes "essential" and "non-essential" for our reactions. Genes were knocked out by in-frame replacement with a PCR product containing a kanamycin resistance gene. The start codon and the upstream translational signal were not replaced and remained fully intact. After kanamycin treatment, in-frame single gene deletions were verified by PCR with locus-specific primers. When no mutant that formed colonies on a plate could be created, the mutated gene was considered to be essential. Knockout experiments were performed in LB rich medium and in glucose minimal medium, resulting in two datasets (denoted as rich medium and glucose minimal medium, respectively). For the rich medium, no mutants were found for 303 of the 4,288 tested genes, and these genes were therefore defined as essential. Genes that were considered to be essential under rich medium condition were also considered essential under glucose minimal medium condition. In addition to these genes, 119 genes were assigned to be essential in glucose minimal medium as they showed very slow growth in minimal media (growth rate ≤ 0.0926 in 24 hours). Experimental criteria for gene essentiality on glucose minimal medium are described in detail in [8, 12]. Genes were mapped to the corresponding proteins, enzymes and reactions using the gene-protein-reaction table from Feist et al. The reaction(s) associated with each gene were defined as essential or non-essential if there was no other way to activate the reaction(s) by other genes and if the coding gene was experimentally essential or non-essential, respectively. Otherwise they were discarded from our training and testing analysis.
Furthermore, 133 reactions were discarded from the analysis, as the corresponding genes could not be defined. Finally, from 303 essential genes we determined a set of 231 essential and 1125 non-essential reactions under rich medium. Out of these 1125 non-essential reactions under rich medium, 107 reactions were defined as essential under glucose minimal medium. In total, 1356 reactions were used, and the experimental results (KEIO) for their essentiality were taken as class labels of the reactions (samples) for training and validating the classifiers. Note that we did not use these experimental data for any features of the reactions.
Defining the features
List of all features
Topology features: local structures
Reachable/Unreachable Products (RUP): whether at least one product cannot be produced when blocking a reaction
Percentage of Unreachable Products (PUP): the percentage of products which cannot be produced when blocking a reaction
Number of Substrates (NS)
Number of Products (NP)
Number of Neighbouring Reactions (NNR)
Number of Neighbours of Neighbouring Reactions (NNNR)
Clustering Coefficient Value (CCV): clustering coefficient of a reaction
Directionality of a reaction (DIR)
Topology features: deviations, choke points, load scores and damage
Number of Deviations (ND)
Average Path Length (APL): the average path length of the deviations
Length of Shortest Path (LSP): the length of the shortest path of the deviations
Choke Point (CP): a reaction is a choke point or not (Rahman et al, 2006)
Load Score (LS): load score of a reaction (Rahman et al, 2006)
Number of Damaged Reactions (NDR): the number of damaged reactions after blocking a reaction (Lemke et al, 2004)
Number of Damaged Compounds (NDC): the number of damaged compounds after blocking a reaction (Lemke et al, 2004)
Number of Damaged Reactions having no Deviations (NDRD): the number of damaged reactions that have no other alternative paths to be reached after blocking a reaction
Number of Damaged Compounds having no Deviations (NDCD): the number of damaged compounds that have no other alternative paths to be reached after blocking a reaction
Number of Damaged Choke point Reactions (NDCR): the number of damaged choke point reactions after blocking a reaction
Number of Damaged Choke point Compounds (NDCC): the number of damaged choke point compounds after blocking a reaction
Number of Damaged Choke point Reactions having no Deviations (NDCRD): the number of damaged choke point reactions that have no other alternative paths to be reached after blocking a reaction
Number of Damaged Choke point Compounds having no Deviations (NDCCD): the number of damaged choke point compounds that have no other alternative paths to be reached after blocking a reaction
Gene expression data, genomic data and miscellaneous
Number of Coding Genes (NCG): the number of coding genes for a reaction
Homology at 10^-10 (H10): the number of homologous genes with e-value cutoff 10^-10
Homology at 10^-7 (H7): the number of homologous genes with e-value cutoff 10^-7
Homology at 10^-5 (H5): the number of homologous genes with e-value cutoff 10^-5
Homology at 10^-3 (H3): the number of homologous genes with e-value cutoff 10^-3
Number of Reactions from Same Genes (NRSG): the number of reactions derived from the same genes
Number of Reactions having Similar Expression (NRSE): the number of reactions that have similar expression (correlation coefficient > 0.8)
Maximum of Correlation Coefficients (MCC): maximum value of the correlation coefficients for all neighbouring reactions
Biomass Flux Value (BFV): biomass flux value when blocking a reaction (under aerobic glucose condition)
Topology based features
We set up a breadth first algorithm to investigate the network when a single reaction was blocked. We defined a reaction as essential for survival when the mutated network could not yield the products of the reaction from its upstream substrates. Hence, features were defined to describe whether the knocked out reaction was indispensable for producing its downstream metabolites or whether these products could still be produced by other pathways. Each knocked out reaction was investigated by the following algorithm.
i. All metabolites acting as input nodes (substrates) and output nodes (products) of the knocked out reaction were selected. The set of substrates S defined the input nodes and the set of products P defined the output nodes. To get a broader list of available substrates, we extended these sets: the substrates of the upstream reactions and the products of the downstream reactions were included in S and P, respectively. Substrates of reactions that had at least one of the substrates in S as a substrate were included in S. Further, substrates of reactions that had a metabolite from P as a substrate were also included in S.
ii. Reactions were selected which used only available compounds as substrates.
iii. These selected reactions and their products were incorporated into the network. These products were set as new available metabolites in the network.
iv. Steps ii and iii were repeated until no further reactions could be identified for incorporation.
v. The output nodes that could be produced were counted (reachable products P).
After finishing the process, we used the number of defined output nodes that could be produced within the mutated network for two features: a qualitative feature defining whether at least one product could not be produced (RUP, reachable/unreachable products), and the percentage of products that could not be produced (PUP, percentage of unreachable products).
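Steps ii to v can be illustrated with a minimal Python sketch. It is not the original implementation: the dictionary representation of the network, the function names `expand` and `rup_pup`, and the toy network are assumptions made for illustration, and the extended seed set S from step i is expected to be supplied by the caller. The loop is a fixed-point iteration that adds every reaction whose substrates are all available, which reaches the same set of metabolites as a layer-by-layer expansion.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical representation: reaction id -> (substrates, products).
Network = Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]]


def expand(network: Network, seeds: Iterable[str], blocked: str) -> Set[str]:
    """Return all metabolites producible from `seeds` when `blocked` is removed."""
    available = set(seeds)
    changed = True
    while changed:  # step iv: repeat until no further reaction can be added
        changed = False
        for rid, (substrates, products) in network.items():
            if rid == blocked:
                continue
            # step ii: the reaction may only use available compounds
            if substrates <= available and not products <= available:
                available |= products  # step iii: products become available
                changed = True
    return available


def rup_pup(network: Network, seeds: Iterable[str],
            targets: Iterable[str], blocked: str) -> Tuple[bool, float]:
    """Return (RUP, PUP) for the products `targets` of the blocked reaction."""
    targets = set(targets)
    if not targets:
        raise ValueError("the knocked out reaction needs at least one product")
    unreachable = targets - expand(network, seeds, blocked)  # step v
    return bool(unreachable), 100.0 * len(unreachable) / len(targets)


# Toy network (illustrative only): R1 is knocked out, R2 + R3 form a bypass.
toy: Network = {
    "R1": (frozenset({"A"}), frozenset({"B"})),
    "R2": (frozenset({"A"}), frozenset({"C"})),
    "R3": (frozenset({"C"}), frozenset({"B"})),
    "R4": (frozenset({"B", "D"}), frozenset({"E"})),  # D is never available
}
```

In the toy network, blocking R1 leaves B reachable through R2 and R3, so RUP is false and PUP is 0 for the product set {B}.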
We ran a second breadth first search on the network to estimate possible deviations. This time we focused on relevant pathways by using the similarity measure from the SIMCOMP software. SIMCOMP was used to define the most relevant substrates and products of each reaction. Starting from S, the breadth first search explored the network to find the direct products of the knocked out reaction. When the algorithm visited these products, it stored the corresponding pathway and continued its search for further alternative paths until the network was entirely explored or a maximal path length of 10 reactions was reached. The deviation features describe alternative pathways that produce the products of the knocked out reaction from its substrates S. In the metabolic network, these substrates can also be consumed by other reactions, whose products can in turn be consumed further. Therefore, we kept track of alternative paths in the metabolic network as a measure of the potential of the organism to survive when a reaction was blocked. The organism may have many pathways to produce the products, making the system more robust. Thus, we counted the number of possible alternative paths (ND, number of deviations) and took the average path length (APL) and the shortest path length (LSP, length of shortest path) of the deviations as further features for the classifier.
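The deviation search can be sketched as a breadth first enumeration of alternative reaction paths. The sketch below is a simplification under stated assumptions: it walks from one metabolite to the next through any reaction that consumes it, it does not reproduce the SIMCOMP-based selection of the most relevant substrates and products, and the names `deviations` and `deviation_features` as well as the toy network are hypothetical. How the original study encoded APL and LSP when no deviation exists is not stated, so `None` is used here.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical representation: reaction id -> (substrates, products).
Network = Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]]


def deviations(network: Network, sources: Iterable[str], targets: Iterable[str],
               blocked: str, max_len: int = 10) -> List[Tuple[str, ...]]:
    """Enumerate reaction paths from any source to any target, avoiding `blocked`."""
    targets = set(targets)
    found = set()
    # Queue items: (current metabolite, reactions used so far, metabolites seen).
    queue = deque((s, (), frozenset({s})) for s in sorted(sources))
    while queue:
        metabolite, used, seen = queue.popleft()
        if len(used) == max_len:  # maximal path length reached
            continue
        for rid, (substrates, products) in sorted(network.items()):
            if rid == blocked or rid in used or metabolite not in substrates:
                continue
            for product in sorted(products):
                if product in seen:
                    continue
                path = used + (rid,)
                if product in targets:
                    found.add(path)  # store the pathway, keep searching others
                else:
                    queue.append((product, path, seen | {product}))
    return sorted(found)


def deviation_features(paths: List[Tuple[str, ...]]) -> Dict[str, Optional[float]]:
    """Compute ND, APL and LSP from the enumerated deviations."""
    if not paths:
        return {"ND": 0, "APL": None, "LSP": None}
    lengths = [len(p) for p in paths]
    return {"ND": len(paths), "APL": sum(lengths) / len(lengths), "LSP": min(lengths)}


# Toy network (illustrative only): two bypasses around the knocked out R1.
toy: Network = {
    "R1": (frozenset({"A"}), frozenset({"B"})),
    "R2": (frozenset({"A"}), frozenset({"C"})),
    "R3": (frozenset({"C"}), frozenset({"B"})),
    "R4": (frozenset({"A"}), frozenset({"D"})),
    "R5": (frozenset({"D"}), frozenset({"E"})),
    "R6": (frozenset({"E"}), frozenset({"B"})),
}
```

Exhaustive path enumeration grows quickly with network size, which is one reason a length limit such as the 10 reactions mentioned above is useful.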
Choke points, load points and damage
A reaction that uniquely consumes or produces a certain metabolite in the metabolic network is considered a choke point. Such a reaction shows high potential for essentiality [2, 3]. We checked whether an observed reaction was a choke point (CP, choke points). According to the concept of load scores from Rahman and Schomburg, we computed the load score (LS) of a reaction from the average number of pathways passing through the reaction, in comparison to the number of pathways for all metabolites in the network. We used the definition of damaged compounds/reactions reported by Lemke et al. Damage was defined by determining the potentially affected metabolites and reactions downstream of the knocked out reaction. We applied their definition for calculating the features NDR (number of damaged reactions) and NDC (number of damaged compounds). In turn, some damaged compounds/reactions might still have been produced by alternative pathways. Therefore, we calculated the number of damaged compounds/reactions that did not have an alternative way to be reached from the substrates of the knocked out reaction (NDRD, number of damaged reactions having no deviations; NDCD, number of damaged compounds having no deviations). In addition to our analysis of damaged compounds/reactions, we also included the number of damaged choke points (NDCR, number of damaged choke point reactions; NDCC, number of damaged choke point compounds; NDCRD, number of damaged choke point reactions having no deviations; NDCCD, number of damaged choke point compounds having no deviations).
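The choke point criterion can be written down directly from its definition. The following sketch assumes the same hypothetical dictionary representation of the network as a mapping from reaction identifiers to substrate and product sets. The function name `choke_points` is illustrative, and the sketch ignores reaction reversibility and any special handling of dead-end metabolites.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical representation: reaction id -> (substrates, products).
Network = Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]]


def choke_points(network: Network) -> Set[str]:
    """Return reactions that are the only consumer or only producer of a metabolite."""
    producers = defaultdict(set)
    consumers = defaultdict(set)
    for rid, (substrates, products) in network.items():
        for metabolite in substrates:
            consumers[metabolite].add(rid)
        for metabolite in products:
            producers[metabolite].add(rid)
    result: Set[str] = set()
    for table in (producers, consumers):
        for reactions in table.values():
            if len(reactions) == 1:  # unique producer or unique consumer
                result |= reactions
    return result


# Toy network (illustrative only).
toy: Network = {
    "R1": (frozenset({"A"}), frozenset({"B"})),
    "R2": (frozenset({"A"}), frozenset({"C"})),
    "R3": (frozenset({"C"}), frozenset({"B"})),
    "R4": (frozenset({"B"}), frozenset({"D"})),
}
```

In the toy network, R1 is not a choke point because A has a second consumer (R2) and B has a second producer (R3), whereas R2, R3 and R4 each uniquely produce or consume a metabolite.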
Local topology features
The numbers of substrates and products of the knocked out reaction were counted (NS, number of substrates, and NP, number of products, respectively). Further, we defined features for the number of neighbouring reactions (NNR, number of neighbouring reactions), the number of neighbours of neighbouring reactions (NNNR, number of neighbours of neighbouring reactions) and the clustering coefficient (CCV, clustering coefficient values) [16, 17] of the knocked out reaction. The reaction direction (DIR, directionality of a reaction) was taken from the model of Feist et al.
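A minimal sketch of some of these local features is given below. It assumes that two reactions are neighbours when they share at least one metabolite and that the clustering coefficient is the usual fraction of neighbour pairs that are themselves neighbours; the text does not spell out either definition, so both are assumptions, as are the function names and the toy network.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical representation: reaction id -> (substrates, products).
Network = Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]]


def reaction_neighbours(network: Network) -> Dict[str, Set[str]]:
    """Map each reaction to the reactions sharing at least one metabolite with it."""
    metabolites = {rid: subs | prods for rid, (subs, prods) in network.items()}
    return {
        rid: {other for other in metabolites
              if other != rid and metabolites[rid] & metabolites[other]}
        for rid in metabolites
    }


def local_features(network: Network, rid: str) -> Dict[str, float]:
    """Compute NS, NP, NNR and CCV for one reaction."""
    substrates, products = network[rid]
    neighbours = reaction_neighbours(network)
    mine = neighbours[rid]
    k = len(mine)
    # Count pairs of neighbours that are neighbours of each other.
    links = sum(1 for a, b in combinations(sorted(mine), 2) if b in neighbours[a])
    ccv = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
    return {"NS": len(substrates), "NP": len(products), "NNR": k, "CCV": ccv}


# Toy network (illustrative only).
toy: Network = {
    "R1": (frozenset({"A"}), frozenset({"B"})),
    "R2": (frozenset({"A"}), frozenset({"C"})),
    "R3": (frozenset({"C"}), frozenset({"B"})),
    "R4": (frozenset({"B"}), frozenset({"D"})),
}
```

Discarding unspecific compounds such as water or ATP before building the neighbour map matters for this kind of feature, since such compounds would otherwise link almost every pair of reactions.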
Gene expression data, genomic data and miscellaneous
For our case study, we collected gene expression data from a study observing the regulation during oxygen deprivation. This dataset was chosen because its regulation is rather unspecific, i.e. it affects not a small band but a broad range of metabolic pathways. The gene expression data were mapped onto the corresponding reactions. For a reaction that was catalysed by a complex of proteins, we took the mean of the gene expression values of the corresponding genes (for more details see ). Genes in the same pathway often show co-regulation. Therefore, the maximum correlation coefficient over all neighbouring reactions of the knocked out reaction (MCC, maximum correlation coefficient) and the number of reactions having similar gene expression (correlation coefficient > 0.8) were calculated (NRSE, number of reactions having similar expression). Together with the number of reactions coming from the same gene (NRSG, number of reactions from same genes), these features helped the classifier to estimate whether the knocked out reaction was in a biosynthesis or a degradation pathway. We also included the number of homologous genes that might have taken over the function of the knocked out gene. Homologous genes were searched using BLAST against all open reading frames of E. coli with four different e-value cutoffs, i.e. 10^-3, 10^-5, 10^-7 and 10^-10, yielding the features H3, H5, H7 and H10, respectively. The method of the flux balance simulations is described in Results and discussion (section Comparing the performance of the machine learning approach to flux balance analyses).
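The two co-expression features can be sketched as follows. The sketch assumes that the correlation coefficient is the Pearson correlation and that NRSE is counted over all other reactions in the network; the text specifies neither, so both choices are assumptions, as are the function names and the toy expression profiles.

```python
from math import sqrt
from typing import Dict, Iterable, List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long profiles (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def reaction_expression(gene_expr: Dict[str, Sequence[float]],
                        genes: Iterable[str]) -> List[float]:
    """Mean expression profile over the genes of one reaction (e.g. a complex)."""
    profiles = [gene_expr[g] for g in genes]
    return [sum(values) / len(values) for values in zip(*profiles)]


def coexpression_features(expr: Dict[str, Sequence[float]], rid: str,
                          neighbours: Iterable[str],
                          threshold: float = 0.8) -> Dict[str, Optional[float]]:
    """Compute MCC over the neighbours and NRSE over all other reactions."""
    mcc = max((pearson(expr[rid], expr[n]) for n in neighbours), default=None)
    nrse = sum(1 for other in expr
               if other != rid and pearson(expr[rid], expr[other]) > threshold)
    return {"MCC": mcc, "NRSE": nrse}


# Toy expression profiles per reaction (illustrative only).
toy_expr = {
    "R1": [1.0, 2.0, 3.0, 4.0],
    "R2": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with R1
    "R3": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with R1
    "R4": [2.0, 1.0, 4.0, 3.0],   # moderately correlated with R1
}
```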
We applied Support Vector Machines from the R package e1071 to classify between essential and non-essential reactions of the metabolic network. A radial basis function was used as the kernel function. Parameter optimization was performed for the regularization term that defined the costs for false classifications (5 steps, range: 2^n, n = -4, -2, 0, 2, 4). The same range was taken for the kernel width γ. This optimization was realized by training with a grid search over all combinations of these parameters. The sizes of the two classes differed significantly in our data set (essential: 17%, non-essential: 83%). To obtain a broad spectrum of different precisions and sensitivities, we varied the weight factor for the positive instances in the range of 0.1 to 5.0, using the data set with the optimized feature set. We performed a leave-one-out cross validation to measure the effectiveness of the machine learning method. A single reaction was selected as the validation data to be predicted and the remaining reactions as the training data. This was repeated for each reaction in the data set. For assessing the performance of the classifiers, we calculated the standard measures accuracy (number of correctly predicted reactions/number of all predicted reactions), sensitivity (number of true positives/(number of true positives + number of false negatives)), specificity (number of true negatives/(number of true negatives + number of false positives)), positive predictive value or precision (number of true positives/number of positively predicted reactions), and negative predictive value (number of true negatives/number of negatively predicted reactions).
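The parameter grid, the leave-one-out splits and the performance measures can be made concrete with a short sketch. It does not train a Support Vector Machine and does not use the e1071 package; it only illustrates the bookkeeping around the classifier. The function names are hypothetical, and labels are assumed to be coded as 1 (essential) and 0 (non-essential).

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple


def parameter_grid(exponents: Sequence[int] = (-4, -2, 0, 2, 4)) -> List[Tuple[float, float]]:
    """All (cost, gamma) combinations with both parameters in the range 2^n."""
    values = [2.0 ** n for n in exponents]
    return list(product(values, values))


def leave_one_out(n_samples: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (training indices, held-out index) for each sample."""
    for held_out in range(n_samples):
        yield [i for i in range(n_samples) if i != held_out], held_out


def _ratio(numerator: int, denominator: int) -> float:
    # Undefined ratios (empty denominator) are reported as NaN.
    return numerator / denominator if denominator else float("nan")


def performance(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, PPV and NPV from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

With five values for each of the two parameters, the grid contains 25 combinations, each of which is evaluated once per leave-one-out run.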
The feature selection was done by a top-down approach. We first trained the Support Vector Machines using all features, maximizing the overall accuracy. Then each single feature in turn was discarded from the data set and the performance of the machine was observed. The performance of each machine was tested by a leave-one-out cross validation. The accuracies of the machines missing one feature were compared and the best machine was kept for the next iteration. This was repeated until the accuracy did not increase any further. The machine with the best accuracy was selected as the best classifier and its features as the optimized feature set.
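This top-down selection corresponds to a greedy backward elimination, which can be sketched independently of the classifier. In the sketch below, `score` stands for any function that returns the cross-validated accuracy for a given feature list; this callback, the function name `backward_elimination` and the toy scoring function are assumptions for illustration and do not reproduce the original R code.

```python
from typing import Callable, List, Sequence, Tuple


def backward_elimination(features: Sequence[str],
                         score: Callable[[List[str]], float]) -> Tuple[List[str], float]:
    """Greedily drop one feature per round while the score strictly improves."""
    current = list(features)
    best = score(current)  # accuracy with all features
    while len(current) > 1:
        # Evaluate every machine that is missing exactly one feature.
        candidates = [
            (score([f for f in current if f != dropped]), dropped)
            for dropped in current
        ]
        candidate_score, dropped = max(candidates, key=lambda c: c[0])
        if candidate_score <= best:  # stop when the accuracy does not increase
            break
        best = candidate_score
        current.remove(dropped)
    return current, best


def toy_score(selected: List[str]) -> float:
    """Illustrative stand-in for cross-validated accuracy: noise features hurt."""
    useful = {"a", "b"}
    hits = sum(1 for f in selected if f in useful)
    noise = sum(1 for f in selected if f not in useful)
    return hits - 0.1 * noise
```

Because each round retrains one machine per remaining feature, and each machine is evaluated by a full leave-one-out cross validation, the procedure is computationally demanding for large feature sets.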
Experimental protocol for the knockout verification
Knockout mutations were verified by PCR amplification of genomic loci expected to contain the 1327 base pair gene replacement cassette with specific primers (Table S4 in Additional file 1). Primers were chosen to have equal predicted melting temperatures of ~60°C and hybridised at specific distances upstream and downstream of the target gene. PCR reactions were performed directly from freshly grown bacterial colonies for 30 cycles at the annealing temperature of 54°C. The product sizes obtained from the KEIO collection strains were compared to those from the wild-type E. coli strain MG1655 on 1% agarose gels.
Assembling a list of drug targets
To map enzymes to drug targets, drugs and their corresponding drug targets were selected from the drug database DrugBank. We took into account drugs that affect any organism except humans and other mammals. Entries that were found as metabolites of a reaction in the KEGG database were discarded to restrict our drug list to non-endogenous compounds. The annotated EC numbers of the targets of the remaining drugs were collected as our validated drug targets.
Results and discussion
Performance of the machine learning algorithm
[Table: Performance of machine learning based predictions on rich media condition. Columns: all 30 features; 25 optimized features. Rows include positive predictive values (precision) and negative predictive values.]
Identifying drug targets
Comparing the performance of the machine learning approach to flux balance analyses
[Table: Comparison of our machine learning method and Flux Balance Analyses on glucose minimal media condition. Rows include positive predictive values and negative predictive values.]
Improving flux balance simulations
Using our approach as a means to validate the experimental knockout screen
A prediction that differs from the experimental high-throughput screen (KEIO) may be due to either an error in our algorithm or an error within the experimental knockout screen. We examined our lists of false positives and false negatives with two experimental set-ups. Our list of false negatives contained 71 genes that our algorithm predicted to be non-essential under glucose minimal condition, in contradiction to the outcome of the KEIO experiment. For 33 of them we obtained the corresponding knockout clones from the KEIO library (growing on rich media) and grew them on M9 glucose medium. Indeed, we were able to grow 9 out of 33 clones with good growth rates (OD600 ≥ 0.2 after 48 hours) and 3 clones with reasonable growth rates (OD600 between 0.07 and 0.2 after 48 hours). The complete list is given in Table S3 [in Additional file 1]. In turn, we also tested the list of false positives, for which our algorithm predicted 33 genes to be essential, in contrast to the experimental high-throughput screen. We assumed that some of these genes were not knocked out correctly. Baba et al. (2006) provided a validity estimation for their clones. We compared our results to their estimations and selected 6 genes whose mutants they estimated to be at most 37.5% correct. For 5 out of these 6 genes (alaS, coaA, coaE, glyS and hemE), PCR with specific primer pairs (Table S4 in Additional file 1) yielded two products with sizes corresponding to the wild-type and knockout alleles, respectively. This indicated that the genes were not correctly knocked out and the wild-type gene was still present. No PCR product was observed for the ileS knockout. Additionally, we tested another 4 genes from our list, for which mutations were stated to be 100% correct by Baba et al. Indeed, for all of those genes (aspC, epd, luxS, thiE) only the correct PCR product corresponding to the knockout allele was observed.
Defining drug targets is a challenging task. Many experiments rely on a conditional essentiality screen of genes to define the associated enzymes as possible drug targets. Machine learning methods can help to validate these experimental data. Our approach used the experimental knockout data for E. coli from KEIO. The machine was trained with these data and predicted the experimental outcomes quite accurately. Most methods based on graphical networks aim at finding weak points in the network. We set up a machine learning system that integrates features describing the network topology and functional genomics properties in an elaborate way. By this we gained two valuable insights. Firstly, we could see that the topologic, genomic and transcriptomic data describing the network attributes were sufficient for defining the essentiality of a certain reaction. For pathogens it is often hard to define the environmental parameters, which are complex and changeable, e.g. for intestinal infections. Our approach can, in principle, handle all media conditions, as shown for rich media conditions in this study. Rich media conditions may better reflect the situation of the pathogens in the host (e.g. in the gut) than minimal media conditions with clearly defined carbon sources, for which flux balance analyses can be well adapted. A second benefit of our study is the experimental validation and support for estimations of potential drug targets. Regarding the intersection of our results and the KEIO collection, we found 37 potential targets for novel drugs, for 19 of which we could find some reported experimental evidence in the literature. An advantage of machine learning approaches is that the stringency can easily be changed, e.g. the weight factor for the positive instances can be increased to avoid losing potential candidates. We used gene expression data from E. coli wild-type and single knockout strains.
The single knockouts were regulators of respiration affecting a large number of genes, and the treatment was also rather unspecific (growth in oxygen rich and oxygen deprived conditions). Hence, a large portion of the pathways of the metabolic network was differentially expressed. Within the presented approach, data from such pathway-unspecific examinations were well suited to let the classifier learn which neighboring enzymes work together. Therefore, multiple gene co-expression datasets for a variety of conditions may also be well suited for our approach. However, it still needs to be investigated which gene expression data are best suited to optimize the performance.
We have presented a system that could be broadly applied to the search for potential drug targets for a variety of substantial bacterial infections and other organisms. For E. coli we benefited from a rich data pool including a well elaborated metabolic network, a genome-wide knockout viability screen, the genome sequence and a feasible gene expression dataset. Nowadays, the genomic sequence may not be the limiting factor for most applications, as a remarkable number of genomes has been sequenced or will be sequenced in the near future. As our approach uses unspecific gene expression data, these can also be obtained from publicly available resources or by rather straightforward experiments. Very well elaborated metabolic networks have been assembled for some organisms (e.g. B. subtilis, H. pylori, M. barkeri, M. tuberculosis, S. cerevisiae), to which we expect that our method can be transferred without major difficulties. Further networks can be obtained for a large number of organisms from existing excellent databases like BioCyc and KEGG. It will be challenging to exploit these networks with our method. Finally, our approach still depends on a genome-wide essentiality screen, which is laborious. A methodologically very challenging task remains to employ our approach across different organisms, e.g. by using the essentiality screen of one organism to infer the information for another.
We are very grateful to Adam Feist for helping us to understand and use the COBRA Toolbox and his flux balance analyses. We thank Christopher Dyer for stylistic corrections. This work was funded within the BMBF-FORSYS consortium Viroquant (# 0313923), the Deutscher Akademischer Auslandsdienst, the Helmholtz Alliance on Systems Biology of Cancer and the Commission on Higher Education (CHE) of Thailand.
- Hopkins AL, Groom CR: The druggable genome. Nature reviews. 2002, 1 (9): 727-730. 10.1038/nrd892PubMedGoogle Scholar
- Rahman SA, Schomburg D: Observing local and global properties of metabolic pathways: 'load points' and 'choke points' in the metabolic networks. Bioinformatics. 2006, 22 (14): 1767-1774. 10.1093/bioinformatics/btl181View ArticlePubMedGoogle Scholar
- Yeh I, Hanekamp T, Tsoka S, Karp PD, Altman RB: Computational analysis of Plasmodium falciparum metabolism: organizing genomic information to facilitate drug discovery. Genome Res. 2004, 14 (5): 917-924. 10.1101/gr.2050304PubMed CentralView ArticlePubMedGoogle Scholar
- Schuster S, Fell DA, Dandekar T: A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nat Biotechnol. 2000, 18 (3): 326-332. 10.1038/73786View ArticlePubMedGoogle Scholar
- Lemke N, Heredia F, Barcellos CK, Dos Reis AN, Mombach JC: Essentiality and damage in metabolic networks. Bioinformatics. 2004, 20 (1): 115-119. 10.1093/bioinformatics/btg386View ArticlePubMedGoogle Scholar
- Bonday ZQ, Dhanasekaran S, Rangarajan PN, Padmanaban G: Import of host delta-aminolevulinate dehydratase into the malarial parasite: identification of a new drug target. Nat Med. 2000, 6 (8): 898-903. 10.1038/78659View ArticlePubMedGoogle Scholar
- Edwards JS, Palsson BO: Metabolic flux balance analysis and the in silico analysis of Escherichia coli K-12 gene deletions. BMC Bioinformatics. 2000, 1: 1- 10.1186/1471-2105-1-1PubMed CentralView ArticlePubMedGoogle Scholar
- Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, Karp PD, Broadbelt LJ, Hatzimanikatis V, Palsson BO: A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Molecular systems biology. 2007, 3: 121- 10.1038/msb4100155PubMed CentralView ArticlePubMedGoogle Scholar
- Schuetz R, Kuepfer L, Sauer U: Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Molecular systems biology. 2007, 3: 119- 10.1038/msb4100162PubMed CentralView ArticlePubMedGoogle Scholar
- Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H: Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Molecular systems biology. 2006, 2: 2006 0008- 10.1038/msb4100050PubMed CentralView ArticlePubMedGoogle Scholar
- Oh YK, Palsson BO, Park SM, Schilling CH, Mahadevan R: Genome-scale reconstruction of metabolic network in Bacillus subtilis based on high-throughput phenotyping and gene essentiality data. J Biol Chem. 2007, 282 (39): 28791-28799. 10.1074/jbc.M703759200View ArticlePubMedGoogle Scholar
- Joyce AR, Reed JL, White A, Edwards R, Osterman A, Baba T, Mori H, Lesely SA, Palsson BO, Agarwalla S: Experimental and computational assessment of conditionally essential genes in Escherichia coli. J Bacteriol. 2006, 188 (23): 8259-8271. 10.1128/JB.00740-06PubMed CentralView ArticlePubMedGoogle Scholar
- Becker SA, Feist AM, Mo ML, Hannum G, Palsson BO, Herrgard MJ: Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nature protocols. 2007, 2 (3): 727-738. 10.1038/nprot.2007.99View ArticlePubMedGoogle Scholar
- Oberhardt MA, Puchalka J, Fryer KE, Dos Santos VA, Papin JA: Genome-scale metabolic network analysis of the opportunistic pathogen Pseudomonas aeruginosa PAO1. J Bacteriol. 2008
- Hattori M, Okuno Y, Goto S, Kanehisa M: Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J Am Chem Soc. 2003, 125 (39): 11853-11865. 10.1021/ja036030u
- Barabasi AL, Oltvai ZN: Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004, 5 (2): 101-113. 10.1038/nrg1272
- Wagner A, Fell DA: The small world inside large metabolic networks. Proceedings. 2001, 268 (1478): 1803-1810.
- Covert MW, Knight EM, Reed JL, Herrgard MJ, Palsson BO: Integrating high-throughput and computational data elucidates bacterial networks. Nature. 2004, 429 (6987): 92-96. 10.1038/nature02456
- König R, Schramm G, Oswald M, Seitz H, Sager S, Zapatka M, Reinelt G, Eils R: Discovering functional gene expression patterns in the metabolic network of Escherichia coli with wavelets transforms. BMC Bioinformatics. 2006, 7: 119. 10.1186/1471-2105-7-119
- Samal A, Singh S, Giri V, Krishna S, Raghuram N, Jain S: Low degree metabolites explain essential reactions and enhance modularity in biological networks. BMC Bioinformatics. 2006, 7: 118. 10.1186/1471-2105-7-118
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389
- Dimitriadou E, Hornik K, Leisch F, Meyer D, Weingessel A: Misc Functions of the Department of Statistics (e1071), TU Wien. 2006
- Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J: DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006, 34 (Database issue): D668-72. 10.1093/nar/gkj067
- Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, Yamanishi Y: KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008, 36 (Database issue): D480-4.
- Schramm G, Zapatka M, Eils R, König R: Using gene expression data and network topology to detect substantial pathways, clusters and switches during oxygen deprivation of Escherichia coli. BMC Bioinformatics. 2007, 8 (1): 149. 10.1186/1471-2105-8-149
- Thiele I, Vo TD, Price ND, Palsson BO: Expanded metabolic reconstruction of Helicobacter pylori (iIT341 GSM/GPR): an in silico genome-scale characterization of single- and double-deletion mutants. J Bacteriol. 2005, 187 (16): 5818-5830. 10.1128/JB.187.16.5818-5830.2005
- Feist AM, Scholten JC, Palsson BO, Brockman FJ, Ideker T: Modeling methanogenesis with a genome-scale metabolic reconstruction of Methanosarcina barkeri. Molecular systems biology. 2006, 2: 2006.0004. 10.1038/msb4100046
- Jamshidi N, Palsson BO: Investigating the metabolic capabilities of Mycobacterium tuberculosis H37Rv using the in silico strain iNJ661 and proposing alternative drug targets. BMC systems biology. 2007, 1: 26. 10.1186/1752-0509-1-26
- Duarte NC, Herrgard MJ, Palsson BO: Reconstruction and validation of Saccharomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic model. Genome Res. 2004, 14 (7): 1298-1309. 10.1101/gr.2250904
- Caspi R, Foerster H, Fulcher CA, Kaipa P, Krummenacker M, Latendresse M, Paley S, Rhee SY, Shearer AG, Tissier C, Walk TC, Zhang P, Karp PD: The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic acids research. 2008, 36 (Database issue): D623-31.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.