Identifying essential genes in bacterial metabolic networks with machine learning methods
© Plaimas et al; licensee BioMed Central Ltd. 2010
Received: 17 December 2009
Accepted: 3 May 2010
Published: 3 May 2010
Identifying essential genes in bacteria helps to find potential drug targets and to understand the minimal requirements for a synthetic cell. However, experimentally assaying the essentiality of their coding genes is resource intensive and not feasible for all bacterial organisms, in particular if they are infective.
We developed a machine learning technique to identify essential genes using the experimental data of genome-wide knock-out screens from one bacterial organism to infer essential genes of another related bacterial organism. We used a broad variety of topological features, sequence characteristics and co-expression properties potentially associated with essentiality, such as flux deviations, centrality, codon frequencies of the sequences, co-regulation and phyletic retention. An organism-wise cross-validation on bacterial species yielded reliable results with good accuracies (area under the receiver-operator-curve of 75% - 81%). Finally, it was applied to drug target predictions for Salmonella typhimurium. We compared our predictions to the viability of experimental knock-outs of S. typhimurium and identified 35 enzymes, which are highly relevant to be considered as potential drug targets. Specifically, we detected promising drug targets in the non-mevalonate pathway.
Using elaborate features that characterize network topology, sequence information and microarray data makes it possible to transfer essential-gene predictions from a bacterial reference organism to a related query organism without any knowledge about the essentiality of genes of the query organism. In general, such a method is beneficial for inferring drug targets when experimental data from genome-wide knockout screens are not available for the investigated organism.
By definition, essential proteins of a cellular organism are necessary to live and replicate, and are therefore attractive targets for antimicrobial treatments. However, experimentally assaying the essentiality of their coding genes is error prone when done in high throughput. Additionally, experimental screens are resource intensive and not feasible for all organisms, as typically a knock-out strain needs to be constructed for each gene. Besides this, pathogenic bacteria are hazardous to cultivate and therefore require greater laboratory effort. The metabolism of a cell is vital for maintaining life and growth, and hence metabolic enzymes have been successfully targeted by antibiotics that inhibit essential processes in bacteria.
Several computational techniques have been developed to identify essential genes in silico. Flux balance analysis (FBA) is widely used to assess the essentiality of genes. However, FBA approaches need clear definitions of nutrient availability and biomass production under specifically given environmental conditions. Descriptors for enzymes in the metabolic network were derived from graph theoretical approaches and were used to identify drug targets in micro-organisms. Concepts of choke-points and load-points were successfully applied to estimate the essentiality of an enzyme [4–6]. The term 'damage' was used to assess enzymes that may serve as drug targets when their inhibition influences a substantial number of downstream reactions and products. In a previous study, we examined the ability of the network to obtain the products of a knocked-out reaction from its educts via alternative pathways and used this method to detect potential drug targets for P. falciparum. Various descriptors for the centrality of a node in a network have been successfully applied to detect essential proteins in protein-protein interaction networks [8–12]. It was shown that proteins which have a more central position evolve more slowly and are more likely to be essential for survival. Sequence features like codon usage, GC-content and localization signals were also used for predicting essential genes. They were successfully applied to transfer essential-gene predictions from S. cerevisiae to the less studied yeast strain S. mikatae. Additionally, phyletic retention has been found to be a valuable predictive feature for gene essentiality in E. coli and S. cerevisiae [14, 15]. Although protein-protein interaction networks may provide a global view of cellular signaling, we were rather interested in identifying drug targets in pathogens inferred from properties of the malfunctioning metabolism after an enzymatic function has been knocked out.
In a previous study, we developed and applied an integrative machine learning method that combined these topology-based methods to validate an experimental knock-out screen of Escherichia coli. We now used the basic concepts of this strategy to predict essential genes in an organism for which no experimental training data is available. For defining the essentiality of a gene, we additionally integrated sequence characteristics such as codon usage, length of the sequences and phyletic retention. Furthermore, we incorporated several centrality descriptors for a node in a network. We used experimental datasets of comprehensive genome-wide knock-out screens of Escherichia coli [17, 18] and Pseudomonas aeruginosa [19, 20] to train the machines with a large variety of attributes, including the topology characteristics mentioned above, our own developments for evaluating possible flux deviations, and genomic and transcriptomic information. To develop a classification system that is readily applicable for predicting essential genes of a new query organism, the system needs to make accurate predictions for an organism on which it was not trained. Therefore, we performed a cross-validation across the organisms E. coli and P. aeruginosa, i.e. we trained with E. coli and validated with P. aeruginosa (and vice versa) to assess the performance of this approach. We then applied the trained and validated classifiers to the pathogenic bacterium Salmonella typhimurium. We compared our results with the literature and with experimental data of a large knock-out study for S. typhimurium. Furthermore, we analyzed our predictions with gene set enrichment tests for metabolic pathways and identified proteins of the entire non-mevalonate pathway as relevant drug targets. Its reactions showed topological characteristics typical of essential reactions. Using our prediction results and the experimental knock-out screen, we defined 35 enzymes as drug targets for S. typhimurium, 23 of which have been described previously as drug targets in other micro-organisms. We suggest these and the remaining twelve as potential new drug targets for S. typhimurium.
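The organism-wise cross-validation described above can be illustrated with a minimal sketch. It is not the published pipeline: the nearest-centroid scorer merely stands in for the trained machines, and the function names, the dataset dictionary and all numbers are hypothetical. The area under the ROC curve is computed with the rank-sum identity.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid_scorer(X, y):
    """Stand-in classifier (an assumption, not the method of the paper):
    a gene scores high if it lies closer to the centroid of the essential
    genes than to the centroid of the non-essential genes."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    c1 = centroid([x for x, t in zip(X, y) if t == 1])
    c0 = centroid([x for x, t in zip(X, y) if t == 0])
    return lambda x: dist(x, c0) - dist(x, c1)

def organism_wise_cv(datasets, fit=centroid_scorer):
    """Train on one organism, evaluate on every other organism."""
    results = {}
    for train_name, (X_train, y_train) in datasets.items():
        score = fit(X_train, y_train)
        for test_name, (X_test, y_test) in datasets.items():
            if test_name != train_name:
                results[(train_name, test_name)] = auc(
                    [score(x) for x in X_test], y_test)
    return results

# Hypothetical two-feature toy data; 1 = essential, 0 = non-essential.
TOY_DATA = {
    "eco": ([[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]], [1, 1, 0, 0]),
    "pae": ([[0.7, 0.9], [0.3, 0.2], [0.6, 0.6], [0.1, 0.4]], [1, 0, 1, 0]),
}
cv_results = organism_wise_cv(TOY_DATA)
```

The essential point of the design is that the labels of the test organism are used only for scoring, never for fitting, which mimics the situation of a query organism without knock-out data.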
Results and discussion
Predicting essential genes with an organism-wise cross validation
Table 1: List of all features.
Topology based features
a) Deviation features
Reachable/Unreachable Products (RUP): equals one if all products could be produced when blocking the reaction, otherwise zero
Percentage of Unreachable Products (PUP): the percentage of products which cannot be produced when blocking the reaction
Number of Deviations (ND)
Average Path Length (APL): the average path length of the deviations
Length of the Shortest Path (LSP): the length of the shortest path of the deviations
b) Local topology
Number of Substrates (NS)
Number of Products (NP)
Number of Neighboring Reactions (NNR)
Number of Neighbors of Neighboring Reactions (NNNR)
Clustering Coefficient Value (CCV): clustering coefficient of a reaction
Directionality of a reaction (DIR)
c) Choke points and load scores
Choke Point (CP): a reaction is a choke point or not (Rahman et al, 2006)
Load Score (LS): load score of a reaction (Rahman et al, 2006)
Number of Damaged Reactions (NDR) (Lemke et al, 2004)
Number of Damaged Compounds (NDC) (Lemke et al, 2004)
Number of Damaged Reactions having no Deviations (NDRD): the number of damaged reactions that have no other alternative paths to be reached after blocking a reaction
Number of Damaged Compounds having no Deviations (NDCD): the number of damaged compounds that have no other alternative paths to be reached after blocking a reaction
Number of Damaged Choke point Reactions (NDCR)
Number of Damaged Choke point Compounds (NDCC)
Number of Damaged Choke point Reactions having no Deviations (NDCRD): the number of damaged choke point reactions that have no other alternative paths to be reached after blocking a reaction
Number of Damaged Choke point Compounds having no Deviations (NDCCD): the number of damaged choke point compounds that have no other alternative paths to be reached after blocking a reaction
Genomic and transcriptomic features
Number of Associated Reactions (NAR): the number of reactions associated with the knocked-out gene
Homology at different expectation values: the number of homologous genes with E-value cutoffs of 10^-30, 10^-20, 10^-10, 10^-7, 10^-5 and 10^-3 (H30, H20, H10, H7, H5, H3)
g) Gene expression
Number of Genes having Similar Expression (NGSE): the number of genes that have similar expression (correlation coefficient >0.8)
Maximum of Correlation Coefficients (MCC): maximum value of the correlation coefficients for all neighboring genes
h) Phyletic retention
Phyletic Retention (PR): the number of orthologs in the other prokaryotes
i) Codon usage
Number of codons
Base composition at silent sites (T3s, C3s, A3s, G3s)
The frequency of individual amino acids (glutamine as an example)
Examining the features
The efficiency of flux deviations was estimated by the features RUP and PUP, which indicated whether all products of the knocked-out reaction could be produced without the reaction (RUP) and how large the percentage of non-producible products was (PUP). RUP was a Boolean feature: it was set to one if the mutant could produce all downstream products of the knocked-out reaction and to zero if at least one downstream product could not be produced. RUP was strongly negatively correlated and PUP strongly positively correlated with the essentiality of a gene (P = 1.2E-10 and P = 2.4E-09, respectively), as shown in Figure 3A. If the (in silico) mutant could not produce one or more downstream products, RUP was zero, whereas PUP was increased in comparison to the situation in which all products could be produced. The higher the percentage of unreachable products of the mutant, the fewer products of the knocked-out enzyme could be covered by alternative pathways. The numbers of substrates and products of the reactions of the knocked-out gene (NS, NP) were positively correlated with gene essentiality (P = 4.3E-06 and P = 0.0172, respectively), showing that essential enzymes metabolize a larger number of different compounds. Interestingly, the number of neighboring reactions (NNR) and the number of neighbors of neighboring reactions (NNNR) showed a weak negative correlation with essentiality (P = 0.14 and P = 0.091, respectively). This is reasonable, as a reaction with a high number of neighboring reactions may have more products that can be produced by alternative enzymes. The clustering coefficient (CCV) showed the same tendency (negatively correlated, P = 0.018), also pointing to advantageous alternative pathways.
We estimated the feasibility of possible flux deviations by a set of features describing alternative pathways: the number of alternative pathways (ND), the average path length of the deviations (APL) and the length of the shortest alternative path (LSP). As expected, all of them were negatively correlated with essentiality (P = 0.15, P = 3.4E-04 and P = 0.0063, respectively), i.e. enzymes for which alternative pathways existed were less likely to cause a lethal phenotype if knocked out. Choke-points (CP) are reactions that uniquely consume or produce a compound in the metabolism; they showed a positive correlation with essentiality (P = 2.8E-04), as choke-points are often difficult for the rest of the metabolism to replace. Load-scores (LS) give an estimate of how often a reaction is involved in metabolic processes. They were also positively correlated with essentiality (P = 9.4E-04). Betweenness centrality (BW) and eccentricity (EC) were strongly positively correlated with essentiality (P = 1.3E-14 and P = 7.6E-08, respectively), showing that enzymes have a higher influence on viability if placed in the center of the network. Closeness centrality (CN) also showed a positive correlation (P = 0.0020). Interestingly, the eigenvector centrality (EV) showed a negative correlation (P = 0.0013). Betweenness, closeness and eccentricity centrality are global centrality measures that consider the whole network, while the eigenvector centrality is a measure of local centrality and is computed from a node's neighbors. Note that a node with a high eigenvector centrality is typically a hub (a node with high connectivity) with other hubs connected to it. Hence, flux deviations may be more likely for local hubs that have hubs in their vicinity, making the node replaceable, whereas globally central nodes seemed to be generally vital for maintaining the metabolic flow in the network. Therefore, the eigenvector centrality may describe the network topology in a sense similar to the clustering coefficient, specifically with respect to the likelihood of alternative pathways.
Genomic and transcriptomic features
As expected, the number of homologous genes (H30, H20, H10, H7, H5, H3) showed a negative correlation with essentiality (P = 3.2E-04, 6.3E-04, 1.4E-06, 4.7E-09, 1.1E-10 and 1.5E-09, respectively). Interestingly, an E-value cut-off of 10^-5 (H5) worked best, showing that imperfectly matching sequences may also take over functions of the knocked-out gene. The number of genes having similar expression (NGSE) also exhibited a negative correlation with essentiality (P = 1.7E-04), which may be due to co-expression of genes with analogous function. For the feature phyletic retention (PR), the number of prokaryotes having orthologs of the knocked-out gene showed a positive correlation with essentiality (P = 2.1E-16), supporting the finding of a previous study that evolutionary conservation of a gene hints at its essentiality.
We analyzed the codon usage for each gene and related it to the essentiality of the gene. We found that genes with a high number of thymine nucleotides at the third codon position were more likely to be essential for cell viability (feature T3s in Figure 3, see Additional file 3: SupplementS3 for histograms). The third codon position is the most redundant position in the genetic code. Matching of mRNA codons to tRNA anticodons is less robust at the third position, and translational errors are therefore more likely to occur at that position. However, essential genes need to be stable and protected in their sequence. Thymine in the genetic code might compensate for this, as it was shown that thymine protected DNA and improved the efficiency of DNA replication. Conserved genes are more likely to be essential, and a thymine at the third codon position facilitates stable genetic inheritance by offspring and cellular replicates. Interestingly, we observed a larger difference of T3s in E. coli when compared to P. aeruginosa. It was found that a high average G and C content at the third codon position is common for all genes in P. aeruginosa. This results in a low T content at the third codon position, which we observed, and may explain the larger difference of T3s between essential and non-essential genes in E. coli compared to P. aeruginosa (see also Additional file 3: SupplementS3 for histograms).
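To make the third-position measure concrete, the following sketch computes a simplified proxy for the base composition at silent sites. It is an approximation under stated assumptions: codons without a silent third site (ATG, TGG) and stop codons are skipped, and all four bases share one denominator, whereas the published T3s, C3s, A3s and G3s indices may be normalized differently. The function name and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
SINGLE_CODON = {"ATG", "TGG"}  # Met and Trp: no synonymous alternative

def third_position_composition(cds):
    """Fraction of T, C, A and G at the third position of codons that
    have at least one synonymous alternative (simplified proxy)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    silent = [c for c in codons
              if c not in STOP_CODONS
              and c not in SINGLE_CODON
              and set(c) <= set("ACGT")]      # skip ambiguous bases
    total = len(silent)
    return {base + "3": (sum(1 for c in silent if c[2] == base) / total
                         if total else 0.0)
            for base in "TCAG"}

# Hypothetical coding sequence: ATG GCT GCC AAA TTT TGG TAA
composition = third_position_composition("ATGGCTGCCAAATTTTGGTAA")
```

In this toy sequence four codons contribute (GCT, GCC, AAA, TTT), two of which end in T, so the T fraction is 0.5.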
Even though decision trees performed worse than Support Vector Machines (data not shown), we used decision trees as an alternative approach to determine the most discriminating features for gene essentiality. Decision trees were trained with the datasets ecoB, ecoG, paeL and paeJ. It is known that the first decision of a decision tree is made with the best discriminating feature. Therefore, we counted how often each of our features occurred at the top position across runs. Out of 400 runs, we got T3s (137×), BW (81×), Nc (38×), H3 (21×), H5 (15×), val (12×), H7 (11×), NGSE (11×) and PUP (10×) as the first decision (features with fewer than 10 counts are not shown). This again showed that the feature T3s substantially supported the classification.
Identifying drug targets for S. typhimurium
Predicted essential genes and potential drug targets.
a) Intersection of our predictions with the experimental screen
Undecaprenyl pyrophosphate synthase
Bifunctional 5,10-methylene-tetrahydrofolate dehydrogenase
Acyl carrier protein S-malonyltransferase
Enoyl-(acyl carrier protein) reductase
DNA-directed RNA polymerase subunit alpha
ATP-dependent DNA helicase Rep
DNA-directed RNA polymerase subunit beta
DNA-directed RNA polymerase subunit beta'
b) Predictions for the non-mevalonate pathway
4-hydroxy-3-methylbut-2-enyl diphosphate reductase
1-deoxy-D-xylulose 5-phosphate reductoisomerase
4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase
2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase
2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase
** murE: UDP-N-acetylmuramoylalanyl-D-glutamate-2,6-diaminopimelate ligase is an essential enzyme and a well-known target against bacterial cell walls of Staphylococcus aureus .
* murG: N-acetylglucosaminyl transferase is a potential antibiotic target in the biosynthesis of bacterial peptidoglycans. However, it is difficult to design inhibitors for this enzyme. The identification of inhibitors is the subject of current research [29, 30].
** murC: UDP-N-acetylmuramate-L-alanine ligase catalyzes an essential step in the pathway for synthesizing peptidoglycan precursors. Recently, new inhibitors of the MurC enzyme have been successfully tested for Escherichia coli, Proteus mirabilis and Klebsiella pneumoniae .
* pyrH: The gene for uridylate kinase is essential in Mycobacterium tuberculosis .
** uppS: Undecaprenyl pyrophosphate synthase (UPPS) is a novel antibacterial target of Streptococcus pneumoniae .
* hemH: Ferrochelatase is essential for multiplication and intracellular survival of Brucella abortus .
* kdsB: Analogs of 3-deoxy-D-manno-octulosonate (KDO) were designed to inhibit CTP:CMP-KDO cytidylyltransferase (CMP-KDO synthetase). It is a potential target in Haemophilus influenzae and E. coli.
* fabD: Acyl carrier protein S-malonyltransferase is a potential target of Mycobacterium bovis BCG .
** fabG: 3-ketoacyl-(acyl-carrier-protein) reductase is a well-known drug target of E. coli, B. subtilis, and S. aureus .
* pssA: Phosphatidylserine synthetase is required for motility and chemotaxis in E. coli . Furthermore, mutants of Escherichia coli K12 which were defective in phosphatidylserine synthetase, were isolated as temperature-sensitive, conditional lethals .
* kdtA: 3-deoxy-D-manno-octulosonic-acid transferase is the enzyme encoded by kdtA. In E. coli, it is essential for cell growth and accounts for conditional lethality associated with mutations in KDO biosynthesis.
** dfp: Pantothenate kinase is upstream of phosphopantothenoylcysteine decarboxylase in the biosynthesis of pantothenate and CoA. It is a well-known target for antimicrobial drugs against E. coli and Mycobacterium tuberculosis.
* rpoB: DNA-directed RNA polymerase is a promising target for the discovery of new antimicrobial agents against E. coli .
The non-mevalonate pathway and fatty acid biosynthesis are highly enriched with essential genes of S. typhimurium
We established a machine learning approach that predicts the essentiality of genes for an organism when no experimental knock-out data is available. The classifiers were trained with essentiality information for genes of one organism (e.g. E. coli) and were employed to predict essential genes of the other organism (e.g. P. aeruginosa). These predictions did not depend on essentiality information of the query organism for which the predictions were made, but solely on features that were calculated from the metabolic network and genomic and transcriptomic information of the query organism. Such data is abundantly available for many pathogenic bacteria. We applied this method to predict essential genes of Salmonella typhimurium as the query organism of interest and proposed 35 potential drug targets. 27 targets resulted from the intersection between our predictions and an experimental study  and 8 targets of the non-mevalonate pathway which we found by a statistical enrichment analysis. The non-mevalonate pathway is essential in algae, plants and several eubacteria including pathogenic bacteria. Enzymes of this pathway have been described to be potential targets for the development of novel antibiotics and herbicidal agents [50, 52]. For S. typhimurium, we also inferred this by our machine learning approach. We discovered interesting correlations of our features to the essentiality of a gene. Various features describing the network topology served the machine to select reactions that showed no possible pathways for flux deviations, as e.g. in the linear non-mevalonate pathway. 
An intelligent combination of these features may be seen as an alternative approach to the established methods of flux balance analysis (FBA) and elementary flux modes (EFM) if detailed growth and nutrient information, which is needed for FBA, is lacking and if an in-depth refinement of the metabolic network is considered to be too labor intensive (in EFM the enzymes need to be separated into internal nodes and external nodes to reduce the computational complexity).
Our method served well to estimate the essentiality of genes and their corresponding enzymes in the bacterial organisms E. coli, P. aeruginosa and S. typhimurium. Inferring essentiality information for an organism from another organism may be facilitated by investigating a closely related organism, as we did with S. typhimurium, which is rather closely related to E. coli. To apply this method to other micro-organisms, the metabolism may need to be well described, and the method may need adaptation for organisms that are less studied or that show special metabolic capabilities. For applying this method to eukaryotic genomes, the cellular compartment in which a reaction occurs may need to be considered. It will be very challenging to apply our method for inferring multiple drug targets using experimental double knock-out screens such as those of the synthetic lethal project of eSGA. For this, attributes that are related to single players (e.g. sequence features) might be of less relevance, while network features in particular might be much more related to synergistic knock-out effects. The challenge for the future remains to integrate such topological descriptive approaches with genetic information to systematically explore the network effects of enzyme treatments and combinations thereof.
The metabolic networks of E. coli, P. aeruginosa, and S. typhimurium were reconstructed using the KEGG database. Unspecific compounds such as water, ATP, etc. were discarded. Additionally, only the main compounds of the reactions as annotated in the KGML files of KEGG were used. Except for the centrality features and clustering coefficients, the topology features were calculated on a representation of the network as a bipartite graph consisting of metabolites and reactions as alternating nodes. For calculating the centrality features and the clustering coefficients, we represented the metabolic network as an undirected graph, also known as a reaction-pair network. It consisted of reactions as nodes and metabolites as edges connecting two reactions of the graph. Two reactions were connected by a metabolite if the metabolite was a product of one of the reactions and a substrate of the other reaction. Edges were discarded such that the network had no loops and no more than one edge between any two different nodes. Reactions were mapped to enzymes and enzymes to their corresponding genes using the association tables from KEGG. Genes that corresponded to dead-end reactions in the network were not included in the datasets for training and validation. If a gene corresponded to more than one reaction, the mean value of the reaction features was taken. For the Boolean features (RUP, DIR, CP, see below) we used the Boolean OR-operation, i.e. a gene feature was set to one if at least one reaction feature equaled one.
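As an illustration of the two steps described above, the sketch below builds the undirected reaction graph from a toy reaction list and aggregates reaction features to the gene level. The toy network, the gene mapping and the function names are hypothetical; parsing of KGML files is not shown.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy network: reaction id -> (substrates, products).
TOY_REACTIONS = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"B"}, {"D"}),
    "R4": ({"C", "D"}, {"E"}),
}

def reaction_graph(reactions):
    """Undirected simple graph: two reactions are linked if a product of
    one is a substrate of the other. Neighbor sets rule out loops and
    multiple edges by construction."""
    adjacency = {r: set() for r in reactions}
    for r1, r2 in combinations(reactions, 2):
        subs1, prods1 = reactions[r1]
        subs2, prods2 = reactions[r2]
        if (prods1 & subs2) or (prods2 & subs1):
            adjacency[r1].add(r2)
            adjacency[r2].add(r1)
    return adjacency

def gene_features(gene_to_reactions, reaction_features,
                  boolean_features=("RUP", "DIR", "CP")):
    """Mean over the reactions of a gene; Boolean OR for Boolean features."""
    result = {}
    for gene, reactions in gene_to_reactions.items():
        result[gene] = {}
        for name in reaction_features[reactions[0]]:
            values = [reaction_features[r][name] for r in reactions]
            if name in boolean_features:
                result[gene][name] = int(any(values))
            else:
                result[gene][name] = mean(values)
    return result

graph = reaction_graph(TOY_REACTIONS)
features = gene_features(
    {"g1": ["R1", "R2"]},
    {"R1": {"NS": 1, "CP": 0}, "R2": {"NS": 3, "CP": 1}},
)
```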
The gold standards
To train and validate our predictions, we used published datasets from genome-wide experimental knock-out screens. All were performed in LB rich medium. Two datasets were of E. coli and two of P. aeruginosa. Additionally, we used data from a knock-out study of S. typhimurium which was also performed in LB rich medium. All datasets were taken from the NMPDR database. For E. coli, we used the KEIO collection of Baba and co-workers, which we denoted as 'ecoB'. It consisted of 104 essential and 641 non-essential genes for the metabolic network. The other dataset of E. coli was from Gerdes and co-workers, which we denoted as 'ecoG'. It consisted of 147 essential genes and 533 non-essential genes for our network. For P. aeruginosa we used the data of Liberati et al., denoted as 'paeL'. It consisted of 92 essential genes and 615 non-essential genes for the network. The other dataset for P. aeruginosa was taken from the study by Jacobs et al. We denoted it as 'paeJ'. It consisted of 150 essential genes and 579 non-essential genes. The experimental dataset for S. typhimurium was from Knuth and co-workers and was based on insertion-duplication mutagenesis (IDM). Small, randomly generated genomic fragments were cloned into a conditionally replicating vector, and the resulting library of single S. typhimurium clones was grown under permissive conditions. Upon switching to non-permissive temperature, discrimination between lethal and non-lethal insertions following homologous recombination allowed the trapping of genes with essential functions. With this method, genes were detected that were indispensable for growth. However, non-essential genes could not be determined. For the metabolism, 53 genes were found to be essential, and for the remaining 711 the essentiality could not be determined by this method.
Defining the features
Features were obtained from network topology properties and genomic and transcriptomic information. Table 1 shows an overview of all features and their abbreviations.
Topology based features
a) Deviation features
For the following features, we used the metabolic network in the representation of a bipartite graph consisting of two different alternating nodes, i.e. metabolites and reactions. As reported recently [6, 16], we implemented a breadth first algorithm to investigate the network when a single reaction was blocked. We defined a reaction as essential for survival when basically the mutated network could not yield the products of the reaction from upstream substrates of the reaction. Hence, features were defined to describe if the knocked out reaction was substantial for producing its downstream metabolites or if these products could still be produced by other pathways. The investigation for each tested knocked out reaction was performed by the following algorithm:
i. All metabolites acting as input nodes (substrates) and output nodes (products) of the knocked out reaction were selected. The set of substrates S defined the input nodes and the set of products P defined the output nodes. To get a broader list of available substrates we integrated other substrates into S. We included the substrates of the upstream reactions and the products of the downstream reactions into the sets S and P, respectively. Substrates of reactions that had at least one of the substrates S as a substrate were included into S. Further, substrates of reactions that had a metabolite out of P as a substrate were also included into S.
ii. Reactions were selected which used only available compounds as substrates.
iii. These selected reactions and their products were incorporated into the list of discovered reactions and products. The products were set as newly available metabolites in the network.
iv. Steps ii and iii were repeated until no further reactions could be identified.
v. The output nodes that could be produced were counted (reachable products P).
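Steps ii to v can be sketched as a simple network expansion. This is a simplified illustration and not the original implementation: the seed set contains only the substrates of the blocked reaction (plus optional extra seeds) rather than the broadened set S of step i, all reactions are treated as irreversible, and the toy network and function names are hypothetical. The two return values correspond to the features RUP and PUP as defined in the feature list.

```python
# Hypothetical toy network: reaction id -> (substrates, products).
TOY_NETWORK = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"A"}, {"C"}),
    "R3": ({"C"}, {"B"}),
    "R4": ({"B"}, {"D", "E"}),
    "R5": ({"B"}, {"D"}),
}

def expand(reactions, seeds, blocked):
    """Steps ii-iv: repeatedly fire every reaction (except the blocked
    one) whose substrates are all available and add its products."""
    available = set(seeds)
    fired = set()
    changed = True
    while changed:
        changed = False
        for rid, (subs, prods) in reactions.items():
            if rid == blocked or rid in fired:
                continue
            if subs <= available:
                fired.add(rid)
                available |= prods
                changed = True
    return available

def rup_pup(reactions, blocked, extra_seeds=()):
    """Step v: RUP is 1 if all products of the blocked reaction are still
    reachable; PUP is the percentage of unreachable products."""
    substrates, targets = reactions[blocked]
    available = expand(reactions, set(substrates) | set(extra_seeds), blocked)
    reachable = targets & available
    pup = 100.0 * (len(targets) - len(reachable)) / len(targets)
    rup = int(len(reachable) == len(targets))
    return rup, pup
```

In the toy network, blocking R1 is compensated by the detour via R2 and R3, whereas blocking R4 leaves product E unreachable.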
After finishing the process, we used the number of defined output nodes that could be produced within the mutated network for two features, i.e. a qualitative feature indicating whether all products could be produced (RUP, reachable/unreachable products), and the percentage of products that could not be produced (PUP, percentage of unreachable products). We then ran a second breadth first search on the network to estimate possible deviations. Starting from S, the breadth first search explored the network to find the direct products of the knocked-out reaction. When the algorithm visited these products, it stored the corresponding pathway and continued its search for further alternative paths until the network was entirely explored or a maximal path length of 10 reactions was reached. The organism may have many pathways to produce the products, making the system more robust. Thus, we counted the number of possible alternative paths, yielding the feature ND (number of deviations). We took the average path length (APL) and the shortest path length (LSP, length of shortest path) of the deviations as features for the classifier. The deviation features were used to find alternative pathways that produce the products of the knocked-out reaction from its substrates S. In the metabolic network, these substrates can also be consumed by other reactions yielding their products, and so on. Therefore, we kept track of alternative paths in the network as a measure of the potential of the organism to survive when a reaction was blocked.
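The search for deviations can be sketched as a breadth first enumeration of reaction paths. The sketch makes several assumptions that are not taken from the original implementation: a reaction is traversed as soon as the current metabolite is one of its substrates (co-substrates are ignored), a reaction is used at most once per path, a path ends when it reaches a product of the blocked reaction, and the features are set to None when no deviation exists. The toy network and all names are hypothetical.

```python
from collections import deque

# Hypothetical toy network: reaction id -> (substrates, products).
TOY_NETWORK = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"A"}, {"C"}),
    "R3": ({"C"}, {"B"}),
    "R6": ({"A"}, {"F"}),
    "R7": ({"F"}, {"G"}),
    "R8": ({"G"}, {"B"}),
}

def deviation_paths(reactions, blocked, max_len=10):
    """Enumerate reaction paths from the substrates of the blocked
    reaction to any of its products, shortest paths first."""
    substrates, targets = reactions[blocked]
    found = []
    queue = deque((m, ()) for m in sorted(substrates))
    while queue:
        metabolite, used = queue.popleft()
        if len(used) >= max_len:
            continue                      # path length limit reached
        for rid, (subs, prods) in reactions.items():
            if rid == blocked or rid in used or metabolite not in subs:
                continue
            path = used + (rid,)
            if prods & targets:           # deviation found, stop extending
                if path not in found:
                    found.append(path)
                continue
            for product in sorted(prods):
                queue.append((product, path))
    return found

def deviation_features(reactions, blocked, max_len=10):
    paths = deviation_paths(reactions, blocked, max_len)
    if not paths:
        return {"ND": 0, "APL": None, "LSP": None}
    lengths = [len(p) for p in paths]
    return {"ND": len(paths),
            "APL": sum(lengths) / len(lengths),
            "LSP": min(lengths)}
```

For the toy network, blocking R1 leaves two deviations, one of two reactions (R2, R3) and one of three reactions (R6, R7, R8).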
b) Local topology
The number of substrates and products of the knocked out reaction were counted (NS: number of substrates; NP: number of products). We defined features for the number of Neighboring reactions (NNR) and the number of Neighbors of Neighboring reactions (NNNR). We calculated the clustering coefficient (CCV) as described in [57, 58] of the knocked out reaction to estimate the local density of the network. The reaction direction (DIR, directionality of reactions) was taken from KEGG and set as reversible if no other information was available.
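A minimal sketch of the neighborhood features on the undirected reaction graph follows. It assumes that NNNR counts the distinct reactions two steps away (excluding the reaction itself and its direct neighbors) and that CCV is the standard local clustering coefficient, i.e. the fraction of pairs of neighbors that are themselves connected; the exact counting conventions of the cited definitions may differ. The adjacency dictionary is a hypothetical toy example.

```python
# Hypothetical undirected reaction graph: reaction id -> set of neighbors.
TOY_GRAPH = {
    "R1": {"R2", "R3"},
    "R2": {"R1", "R3", "R4"},
    "R3": {"R1", "R2"},
    "R4": {"R2", "R5"},
    "R5": {"R4"},
}

def local_topology(adjacency, reaction):
    """Return (NNR, NNNR, CCV) for one reaction."""
    neighbors = adjacency[reaction]
    nnr = len(neighbors)
    # Second shell: neighbors of neighbors, without the first shell and
    # without the reaction itself.
    second_shell = set().union(*(adjacency[n] for n in neighbors))
    second_shell -= neighbors | {reaction}
    nnnr = len(second_shell)
    # Links among the neighbors, each unordered pair counted once.
    links = sum(1 for a in neighbors for b in neighbors
                if a < b and b in adjacency[a])
    ccv = 2.0 * links / (nnr * (nnr - 1)) if nnr > 1 else 0.0
    return nnr, nnnr, ccv
```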
c) Choke-points and load-scores
A reaction that uniquely consumes or produces a certain metabolite in the metabolic network is considered a choke point. Such a reaction shows high potential for essentiality [4, 5]. We used this as a feature (CP, choke points). According to the concept , load scores were defined as hot spots in the network based on the ratio of the number of k-shortest paths passing through a reaction, and the number of nearest neighbor links attached to it. This ratio was compared to the average load value in the network.
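The choke point criterion can be written down in a few lines. The sketch covers only the feature CP; load scores require k-shortest paths and are not shown. The toy network and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy network: reaction id -> (substrates, products).
TOY_NETWORK = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"A"}, {"B"}),
    "R3": ({"B"}, {"C"}),
}

def choke_points(reactions):
    """Reactions that are the only producer or the only consumer of at
    least one metabolite."""
    producers = defaultdict(set)
    consumers = defaultdict(set)
    for rid, (subs, prods) in reactions.items():
        for metabolite in subs:
            consumers[metabolite].add(rid)
        for metabolite in prods:
            producers[metabolite].add(rid)
    result = set()
    for table in (producers, consumers):
        for metabolite, rids in table.items():
            if len(rids) == 1:
                result |= rids
    return result
```

In the toy network only R3 is a choke point: it is the sole consumer of B and the sole producer of C, while R1 and R2 share both their substrate and their product.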
The damage was defined to determine potentially affected metabolites and reactions downstream of the knocked-out reaction. We used the definition of damaged compounds and reactions from Lemke and co-workers, yielding the features NDR (number of damaged reactions) and NDC (number of damaged compounds). As half of our reactions were annotated as being reversible, some compounds and reactions might have been inferred as damaged although they are actually just back-traced alternative pathways. Therefore, we calculated the number of damaged compounds and reactions for a network in which all alternative pathways were discarded, yielding NDRD (number of damaged reactions without deviations) and NDCD (number of damaged compounds without deviations). In addition, we calculated the number of damaged choke points (NDCR, number of damaged choke point reactions; NDCC, number of damaged choke point compounds; NDCRD, number of damaged choke point reactions without deviations; NDCCD, number of damaged choke point compounds without deviations).
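The eigenvector centrality (EV) of a reaction is conventionally defined through the centralities of its neighbors. In its standard form, written with assumed symbols $x_i$ for the centrality of reaction $i$ and $A_{ij}$ for the adjacency matrix of the reaction graph ($A_{ij} = 1$ if reactions $i$ and $j$ are neighbors, otherwise 0), it reads

```latex
x_i \;=\; \frac{1}{\lambda} \sum_{j \in \mathrm{Neighbor}(i)} x_j
    \;=\; \frac{1}{\lambda} \sum_{j=1}^{n} A_{ij}\, x_j
```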
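The eigenvector centrality x_i of a reaction i is proportional to the sum of the centralities of its neighboring reactions,
\[ x_i = \frac{1}{\lambda} \sum_{j \in \mathrm{Neighbor}(i)} x_j = \frac{1}{\lambda} \sum_{j=1}^{n} A_{ij} x_j , \]
in which Neighbor(i) is the set of neighboring reactions of reaction i, n is the total number of reactions, A is the adjacency matrix of the reaction network (A_ij = 1 if reactions i and j are neighbors and 0 otherwise) and λ is a constant. This leads directly to the well-known eigenvector equation, Ax = λx. Normally, there are different eigenvalues λ for which an eigenvector solution exists. By the Perron-Frobenius theorem, only the eigenvector belonging to the largest eigenvalue has exclusively non-negative entries, and this eigenvector is taken as the eigenvector centrality.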
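One common way to obtain the eigenvector of the largest eigenvalue of Ax = λx is power iteration. The sketch below is an assumption-laden illustration, not the code used in this work: it assumes a connected, undirected reaction graph given as a dictionary, and it iterates with A + I instead of A, which leaves the eigenvectors unchanged but avoids oscillation on bipartite graphs.

```python
import math


def eigenvector_centrality(adjacency, iterations=200, tol=1e-10):
    """Power iteration for the dominant eigenvector of the adjacency matrix.

    adjacency maps each node to the set of its neighbors (undirected,
    connected graph assumed). Returns a unit-length centrality vector.
    """
    nodes = sorted(adjacency)
    x = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # One multiplication by (A + I): own score plus the neighbors' scores.
        new = {v: x[v] + sum(x[u] for u in adjacency[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in nodes) < tol:
            return new
        x = new
    return x


# Toy star graph: hub "C" connected to three leaves.
star = {"C": {"L1", "L2", "L3"}, "L1": {"C"}, "L2": {"C"}, "L3": {"C"}}
centrality = eigenvector_centrality(star)
```

For the star graph, the hub receives a score of 1/√2 and each leaf 1/√6, which satisfies Ax = λx with λ = √3.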
Genomic and transcriptomic features
We calculated the number of homologous genes that might have taken over the function of the knocked out gene. Homologous genes were searched using BLAST against all open reading frames of the respective organism (E. coli, P. aeruginosa, S. typhimurium). We used different E-value cutoffs, i.e. 10^-3, 10^-5, 10^-7, 10^-10, 10^-20 and 10^-30, to obtain the features H3, H5, H7, H10, H20 and H30, respectively. Sequences of all open reading frames were taken from the NCBI database (http://www.ncbi.nlm.nih.gov/; E. coli: [GenBank:NC_000913], P. aeruginosa: [GenBank:NC_002516], and S. typhimurium: [GenBank:NC_003197]).
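Given a list of BLAST hits for one query gene, the six homology features reduce to counting hits below each cutoff. The following Python sketch shows this under several assumptions that are not stated in the text: hits are supplied as (subject, E-value) pairs, the self-hit is excluded, each subject gene is counted once using its best E-value, and a hit qualifies when its E-value is less than or equal to the cutoff.

```python
# Feature names and E-value cutoffs as listed in the text.
CUTOFFS = {"H3": 1e-3, "H5": 1e-5, "H7": 1e-7,
           "H10": 1e-10, "H20": 1e-20, "H30": 1e-30}


def homolog_features(query_gene, hits):
    """Count distinct homologous genes of query_gene per E-value cutoff.

    hits is a list of (subject_gene, e_value) pairs, e.g. parsed from
    tabular BLAST output (parsing is not shown here).
    """
    best = {}
    for subject, e_value in hits:
        if subject == query_gene:
            continue  # skip the trivial self-hit (assumption)
        best[subject] = min(e_value, best.get(subject, float("inf")))
    return {name: sum(1 for e in best.values() if e <= cutoff)
            for name, cutoff in CUTOFFS.items()}


# Made-up hits for a hypothetical query gene "geneA".
toy_hits = [("geneA", 0.0), ("geneB", 1e-25), ("geneC", 1e-6),
            ("geneB", 1e-4), ("geneD", 0.5)]
toy_homologs = homolog_features("geneA", toy_hits)
```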
g) Gene expression
We collected gene expression data for all three investigated organisms from public resources. The datasets were selected to reflect rather unspecific regulation, i.e. they came from treatments affecting a broad range of metabolic pathways rather than a narrow band. For E. coli, we used gene expression data from a study in which the regulation during oxygen deprivation was investigated; for P. aeruginosa, we used data from a study observing the response to agmatine and putrescine treatment and from a study of the quorum-sensing response to environmental conditions. For S. typhimurium, we used data of cells subjected to nutrient limitation at different time points and data from a study that captured the regulatory response in the environment of the host. The data was normalized by variance stabilization normalization. Genes with similar functionality in the same pathway often show co-regulation. Therefore, the maximum correlation coefficient (MCC) of all neighboring reactions of the knocked out reaction was used as a feature. Additionally, we calculated the number of reactions with similar gene expression (NGSE, correlation coefficient > 0.8) and used it as a feature to estimate the number of co-regulated analogous genes.
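The two co-expression features can be illustrated with a short Python sketch. It assumes that one normalized expression profile per reaction is already available and that both MCC and NGSE are evaluated over the same list of candidate profiles; which set of reactions enters NGSE is not specified in the text, so this choice, like the function names, is an assumption of the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0
    return sxy / math.sqrt(sxx * syy)


def coexpression_features(profile, other_profiles, threshold=0.8):
    """Return MCC (maximum correlation) and NGSE (count above threshold)."""
    correlations = [pearson(profile, other) for other in other_profiles]
    if not correlations:
        return {"MCC": 0.0, "NGSE": 0}
    return {"MCC": max(correlations),
            "NGSE": sum(1 for r in correlations if r > threshold)}


# Made-up expression profiles over four conditions.
knocked_out = [1.0, 2.0, 3.0, 4.0]
neighbors = [[2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0], [0.0, 1.0, 0.0, 1.0]]
toy_coexpression = coexpression_features(knocked_out, neighbors)
```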
h) Phyletic retention
We selected 177 prokaryotic organisms (excluding E. coli, P. aeruginosa and S. typhimurium) as described in Gustafson et al., out of which we counted the number of organisms having an open reading frame that was homologous to the sequence of the knocked out gene. This was performed for E. coli, P. aeruginosa and S. typhimurium using bi-directional best BLAST hits (E-value cutoff of 0.1).
i) Codon usage
Codons were counted for each investigated gene from its coding region. We counted base compositions at silent sites (third position of the codons), yielding the features T3s, C3s, A3s and G3s for thymine, cytosine, adenine and guanine, respectively. Additionally, the number of codons coding for each of the encoded amino acids (phe, ser, tyr, cys, leu, trp, pro, his, arg, gln, ile, met, thr, asn, lys, val, ala, asp, glu, gly) was counted. All codon counts were normalized by dividing by the total number of codons (Nc). Nc was also used as a feature.
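A minimal Python sketch of the third-position features is given below. It is a simplification and not the procedure used in the study: every codon in the reading frame is counted, including the stop codon and codons without silent sites (Met, Trp), and the function name and example sequence are hypothetical.

```python
def third_position_composition(cds):
    """Return T3s, C3s, A3s, G3s (as fractions) and Nc for a coding sequence.

    cds is a DNA string in reading frame; trailing bases that do not
    fill a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    total = len(codons)
    if total == 0:
        raise ValueError("sequence contains no complete codon")
    counts = {base: 0 for base in "TCAG"}
    for codon in codons:
        if codon[2] in counts:  # ambiguous bases such as N are skipped
            counts[codon[2]] += 1
    # Normalize by the total number of codons, as described in the text.
    features = {base + "3s": counts[base] / total for base in "TCAG"}
    features["Nc"] = total
    return features


# Made-up coding sequence: ATG GCT AAA GGC TAA
toy_codon_features = third_position_composition("ATGGCTAAAGGCTAA")
```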
The machine learning system
We used Support Vector Machines from the R package 'e1071' (http://www.r-project.org) to classify essential and non-essential genes of metabolism. A radial basis function was used as the kernel. Parameter optimization was performed on the training data for the regularization term and the kernel width. The regularization term defined the costs for false classifications and was optimized using the values 2^n with n = -4, -2, 0, 2, 4. The same range was taken for the kernel width. This optimization was realized by training with a grid search over all combinations of these parameters. The sizes of the two classes differed considerably in our data sets (essential genes: 8 - 15%, non-essential genes: 85 - 92%). For a broad spectrum of different sensitivities and specificities, we applied a voting scheme. We trained 100 Support Vector Machines (SVMs), each with all essential genes and an equal number of randomly selected non-essential genes. With this, we stratified the training data. For the classification of a query gene, the outputs of all machines were summed up and used as a voting score for the gene being essential for the cell.
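The voting scheme can be sketched independently of the classifier. The Python example below is an illustration under clearly stated assumptions: it uses a simple nearest-centroid classifier as a stand-in for the SVM with radial basis kernel, omits the grid search, and all names and data are made up. Only the balanced subsampling and the summation of votes follow the description above.

```python
import random


def train_centroid(samples, labels):
    """Stand-in for SVM training: returns a nearest-centroid predictor."""
    def centroid(cls):
        rows = [s for s, lab in zip(samples, labels) if lab == cls]
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos, neg = centroid(1), centroid(0)

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return 1 if d_pos < d_neg else 0

    return predict


def train_voting_ensemble(essential, non_essential, n_models=100,
                          train_fn=train_centroid, seed=0):
    """Train n_models classifiers on all essential genes plus an equally
    sized random subset of the non-essential genes."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        subset = rng.sample(non_essential, len(essential))
        samples = essential + subset
        labels = [1] * len(essential) + [0] * len(subset)
        models.append(train_fn(samples, labels))
    return models


def voting_score(models, x):
    """Number of classifiers voting 'essential' for feature vector x."""
    return sum(model(x) for model in models)


# Made-up two-dimensional feature vectors.
essential = [[2.0, 2.0], [2.5, 1.5], [3.0, 2.5]]
non_essential = [[0.0, 0.0], [0.5, -0.5], [-1.0, 0.2], [0.3, 0.8],
                 [-0.4, -0.6], [1.0, 0.0], [0.2, 0.1], [-0.7, 0.9]]
models = train_voting_ensemble(essential, non_essential)
high_score = voting_score(models, [2.4, 2.0])
low_score = voting_score(models, [0.1, 0.0])
```

Thresholding the voting score at different values between 0 and the number of classifiers then gives the range of sensitivities and specificities mentioned in the text.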
Defining the most discriminating features with decision trees
The first decision in a decision tree applies the best discriminating feature. Therefore, we used decision trees as an alternative approach for defining the most discriminating features. We applied the method of decision trees using the R package 'rpart' (http://www.r-project.org) to classify essential and non-essential genes of the metabolism. Gini impurity was used for splitting the data. The minimum number of observations that had to exist in a node was 20 and the maximum depth was 30. We stratified the training data in the same manner as for training the Support Vector Machines. One hundred decision trees were generated for each gold standard (ecoB, ecoG, paeL and paeJ). To get the most discriminating features, the first decision of each tree was selected for each run.
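To make the role of the first split concrete, the following Python sketch searches for the single feature and threshold that give the largest decrease in Gini impurity for binary labels. It is a didactic stand-in, not the 'rpart' algorithm: constraints such as the minimum node size are ignored, and the feature values and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)  # equals 1 - p^2 - (1 - p)^2


def best_first_split(rows, labels, feature_names):
    """Return (feature, threshold, impurity decrease) of the best root split."""
    n = len(labels)
    parent = gini(labels)
    best = (None, None, 0.0)
    for j, name in enumerate(feature_names):
        values = sorted(set(row[j] for row in rows))
        # Candidate thresholds lie midway between consecutive observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [lab for row, lab in zip(rows, labels) if row[j] < threshold]
            right = [lab for row, lab in zip(rows, labels) if row[j] >= threshold]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - weighted
            if gain > best[2]:
                best = (name, threshold, gain)
    return best


# Made-up data: the first column separates the classes perfectly.
toy_rows = [[1, 5], [1, 2], [1, 7], [0, 6], [0, 3], [0, 4]]
toy_labels = [1, 1, 1, 0, 0, 0]
first_split = best_first_split(toy_rows, toy_labels, ["CP", "NNR"])
```

Repeating this over many resampled training sets and tallying the selected feature corresponds to collecting the first decision of each tree.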
Performance measures and statistics
For assessing the performance of the classifier, the predictions were compared to the gold standard from the experimental screens. A prediction was either true positive (tp, prediction: essential, gold standard: essential), false positive (fp, prediction: essential, gold standard: non-essential), true negative (tn, prediction: non-essential, gold standard: non-essential) or false negative (fn, prediction: non-essential, gold standard: essential). We calculated the standard measures accuracy = (tp+tn)/(tp+tn+fp+fn), sensitivity = tp/(tp+fn), and specificity = tn/(tn+fp). A receiver operating characteristic (ROC) curve was used to measure the performance of the classifier system at various thresholds. In the ROC curve, the sensitivity is plotted against 1 - specificity, and the area under the curve (AUC) yields a performance estimate across the entire range of thresholds. P-values for Pearson's correlation coefficients for the features were calculated as described in .
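The measures defined above translate directly into code. The Python sketch below computes accuracy, sensitivity and specificity from binary predictions, and the AUC from voting scores via its rank interpretation (the probability that a randomly chosen essential gene scores higher than a randomly chosen non-essential gene, with ties counted as one half). Function names and data are made up, and both classes are assumed to be present.

```python
def confusion_measures(predicted, gold):
    """Return accuracy, sensitivity and specificity for binary labels
    (1 = essential, 0 = non-essential)."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p == 1 and g == 1)
    fp = sum(1 for p, g in pairs if p == 1 and g == 0)
    tn = sum(1 for p, g in pairs if p == 0 and g == 0)
    fn = sum(1 for p, g in pairs if p == 0 and g == 1)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(scores, gold):
    """Area under the ROC curve computed from pairwise score comparisons."""
    pos = [s for s, g in zip(scores, gold) if g == 1]
    neg = [s for s, g in zip(scores, gold) if g == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up voting scores and gold standard for six genes.
toy_scores = [90, 80, 60, 40, 30, 10]
toy_gold = [1, 1, 0, 1, 0, 0]
toy_predicted = [1 if s >= 50 else 0 for s in toy_scores]  # threshold 50
toy_measures = confusion_measures(toy_predicted, toy_gold)
toy_auc = roc_auc(toy_scores, toy_gold)
```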
This work was funded within the BMBF-FORSYS consortium Viroquant [#0313923]; the Nationales Genom-Forschungs-Netz (NGFN+) for the neuroblastoma project ENGINE, and the Commission on Higher Education (CHE) of Thailand.
1. Hopkins AL, Groom CR: The druggable genome. Nat Rev Drug Discov. 2002, 1: 727-730. 10.1038/nrd892
2. Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, Karp PD, Broadbelt LJ, Hatzimanikatis V, Palsson BO: A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst Biol. 2007, 3: 121. 10.1038/msb4100155
3. Schuetz R, Kuepfer L, Sauer U: Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Mol Syst Biol. 2007, 3: 119. 10.1038/msb4100162
4. Rahman SA, Schomburg D: Observing local and global properties of metabolic pathways: 'load points' and 'choke points' in the metabolic networks. Bioinformatics. 2006, 22: 1767-1774. 10.1093/bioinformatics/btl181
5. Yeh I, Hanekamp T, Tsoka S, Karp PD, Altman RB: Computational analysis of Plasmodium falciparum metabolism: organizing genomic information to facilitate drug discovery. Genome Res. 2004, 14: 917-924. 10.1101/gr.2050304
6. Fatumo S, Plaimas K, Mallm JP, Schramm G, Adebiyi E, Oswald M, Eils R, Konig R: Estimating novel potential drug targets of Plasmodium falciparum by analysing the metabolic network of knock-out strains in silico. Infect Genet Evol. 2009, 9: 351-358. 10.1016/j.meegid.2008.01.007
7. Lemke N, Heredia F, Barcellos CK, Dos Reis AN, Mombach JC: Essentiality and damage in metabolic networks. Bioinformatics. 2004, 20: 115-119. 10.1093/bioinformatics/btg386
8. Gursoy A, Keskin O, Nussinov R: Topological properties of protein interaction networks from a structural perspective. Biochem Soc Trans. 2008, 36: 1398-1403. 10.1042/BST0361398
9. Hahn MW, Kern AD: Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol Biol Evol. 2005, 22: 803-806. 10.1093/molbev/msi072
10. Estrada E: Virtual identification of essential proteins within the protein interaction network of yeast. Proteomics. 2006, 6: 35-40. 10.1002/pmic.200500209
11. Przulj N, Wigle DA, Jurisica I: Functional topology in a network of protein interactions. Bioinformatics. 2004, 20: 340-348. 10.1093/bioinformatics/btg415
12. Acencio ML, Lemke N: Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information. BMC Bioinformatics. 2009, 10: 290. 10.1186/1471-2105-10-290
13. Seringhaus M, Paccanaro A, Borneman A, Snyder M, Gerstein M: Predicting essential genes in fungal genomes. Genome Res. 2006, 16: 1126-1135. 10.1101/gr.5144106
14. Gustafson AM, Snitkin ES, Parker SC, DeLisi C, Kasif S: Towards the identification of essential genes using targeted genome sequencing and comparative analysis. BMC Genomics. 2006, 7: 265. 10.1186/1471-2164-7-265
15. Hwang YC, Lin CC, Chang JY, Mori H, Juan HF, Huang HC: Predicting essential genes based on network and sequence analysis. Mol Biosyst. 2009, 5: 1672-8. 10.1039/b900611g
16. Plaimas K, Mallm JP, Oswald M, Svara F, Sourjik V, Eils R, Konig R: Machine learning based analyses on metabolic networks supports high-throughput knockout screens. BMC Syst Biol. 2008, 2: 67. 10.1186/1752-0509-2-67
17. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H: Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol. 2006, 2: 2006.0008. 10.1038/msb4100050
18. Gerdes SY, Scholle MD, Campbell JW, Balazsi G, Ravasz E, Daugherty MD, Somera AL, Kyrpides NC, Anderson I, Gelfand MS, et al.: Experimental determination and system level analysis of essential genes in Escherichia coli MG1655. J Bacteriol. 2003, 185: 5673-5684. 10.1128/JB.185.19.5673-5684.2003
19. Jacobs MA, Alwood A, Thaipisuttikul I, Spencer D, Haugen E, Ernst S, Will O, Kaul R, Raymond C, Levy R, et al.: Comprehensive transposon mutant library of Pseudomonas aeruginosa. Proc Natl Acad Sci USA. 2003, 100: 14339-14344. 10.1073/pnas.2036282100
20. Liberati NT, Urbach JM, Miyata S, Lee DG, Drenkard E, Wu G, Villanueva J, Wei T, Ausubel FM: An ordered, nonredundant library of Pseudomonas aeruginosa strain PA14 transposon insertion mutants. Proc Natl Acad Sci USA. 2006, 103: 2833-2838. 10.1073/pnas.0511100103
21. Knuth K, Niesalla H, Hueck CJ, Fuchs TM: Large-scale identification of essential Salmonella genes by trapping lethal insertions. Mol Microbiol. 2004, 51: 1729-1744. 10.1046/j.1365-2958.2003.03944.x
22. Leon PE: Inhibition of ribozymes by deoxyribonucleotides and the origin of DNA. J Mol Evol. 1998, 47: 122-126. 10.1007/PL00006368
23. Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B, et al.: Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002, 418: 387-391. 10.1038/nature00935
24. Grocock RJ, Sharp PM: Synonymous codon usage in Pseudomonas aeruginosa PA01. Gene. 2002, 289: 131-139. 10.1016/S0378-1119(02)00503-6
25. Witten IH, Frank E: Data Mining, Practical Machine Learning Tools and Techniques. 2005, San Francisco: Morgan Kaufmann
26. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389
27. Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, et al.: Ensembl 2007. Nucleic Acids Res. 2007, 35: D610-617. 10.1093/nar/gkl996
28. Bratkovic T, Lunder M, Urleb U, Strukelj B: Peptide inhibitors of MurD and MurE, essential enzymes of bacterial cell wall biosynthesis. J Basic Microbiol. 2008, 48: 202-206. 10.1002/jobm.200700133
29. Kotnik M, Anderluh PS, Prezelj A: Development of novel inhibitors targeting intracellular steps of peptidoglycan biosynthesis. Curr Pharm Des. 2007, 13: 2283-2309. 10.2174/138161207781368828
30. Helm JS, Hu Y, Chen L, Gross B, Walker S: Identification of active-site inhibitors of MurG using a generalizable, high-throughput glycosyltransferase screen. J Am Chem Soc. 2003, 125: 11168-11169. 10.1021/ja036494s
31. Zawadzke LE, Norcia M, Desbonnet CR, Wang H, Freeman-Cook K, Dougherty TJ: Identification of an inhibitor of the MurC enzyme, which catalyzes an essential step in the peptidoglycan precursor synthesis pathway. Assay Drug Dev Technol. 2008, 6: 95-103. 10.1089/adt.2007.114
32. Robertson D, Carroll P, Parish T: Rapid recombination screening to test gene essentiality demonstrates that pyrH is essential in Mycobacterium tuberculosis. Tuberculosis (Edinb). 2007, 87: 450-458. 10.1016/j.tube.2007.05.015
33. Peukert S, Sun Y, Zhang R, Hurley B, Sabio M, Shen X, Gray C, Dzink-Fox J, Tao J, Cebula R, Wattanasin S: Design and structure-activity relationships of potent and selective inhibitors of undecaprenyl pyrophosphate synthase (UPPS): tetramic, tetronic acids and dihydropyridin-2-ones. Bioorg Med Chem Lett. 2008, 18: 1840-1844. 10.1016/j.bmcl.2008.02.009
34. Tong L, Harwood HJ: Acetyl-coenzyme A carboxylases: versatile targets for drug discovery. J Cell Biochem. 2006, 99: 1476-1488. 10.1002/jcb.21077
35. Tong L: Acetyl-coenzyme A carboxylase: crucial metabolic enzyme and attractive target for drug discovery. Cell Mol Life Sci. 2005, 62: 1784-1803. 10.1007/s00018-005-5121-4
36. Almiron M, Martinez M, Sanjuan N, Ugalde RA: Ferrochelatase is present in Brucella abortus and is critical for its intracellular survival and virulence. Infect Immun. 2001, 69: 6225-6230. 10.1128/IAI.69.10.6225-6230.2001
37. Capobianco JO, Darveau RP, Goldman RC, Lartey PA, Pernet AG: Inhibition of exogenous 3-deoxy-D-manno-octulosonate incorporation into lipid A precursor of toluene-treated Salmonella typhimurium cells. J Bacteriol. 1987, 169: 4030-4035.
38. Ku MJ, Yoon HJ, Ahn HJ, Kim HW, Baek SH, Suh SW: Crystallization and preliminary X-ray crystallographic studies of 3-deoxy-manno-octulosonate cytidylyltransferase from Haemophilus influenzae. Acta Crystallogr D Biol Crystallogr. 2003, 59: 180-182. 10.1107/S0907444902019698
39. Jelakovic S, Schulz GE: The structure of CMP:2-keto-3-deoxy-manno-octonic acid synthetase and of its complexes with substrates and substrate analogs. J Mol Biol. 2001, 312: 143-155. 10.1006/jmbi.2001.4948
40. Sohn MJ, Zheng CJ, Kim WG: Macrolactin S, a New Antibacterial Agent with FabG-inhibitory Activity from Bacillus sp. AT28. J Antibiot (Tokyo). 2008, 61: 687-691.
41. Shi W, Bogdanov M, Dowhan W, Zusman DR: The pss and psd genes are required for motility and chemotaxis in Escherichia coli. J Bacteriol. 1993, 175: 7711-7714.
42. Raetz CR, Foulds J: Envelope composition and antibiotic hypersensitivity of Escherichia coli mutants defective in phosphatidylserine synthetase. J Biol Chem. 1977, 252: 5911-5915.
43. Belunis CJ, Clementz T, Carty SM, Raetz CR: Inhibition of lipopolysaccharide biosynthesis and cell growth following inactivation of the kdtA gene in Escherichia coli. J Biol Chem. 1995, 270: 27646-27652. 10.1074/jbc.270.46.27646
44. Kumar P, Chhibber M, Surolia A: How pantothenol intervenes in Coenzyme-A biosynthesis of Mycobacterium tuberculosis. Biochem Biophys Res Commun. 2007, 361: 903-909. 10.1016/j.bbrc.2007.07.080
45. Kobayashi K, Ehrlich SD, Albertini A, Amati G, Andersen KK, Arnaud M, Asai K, Ashikaga S, Aymerich S, Bessieres P, et al.: Essential Bacillus subtilis genes. Proc Natl Acad Sci USA. 2003, 100: 4678-4683. 10.1073/pnas.0730515100
46. Ji Y, Zhang B, Van Horn SF, Warren P, Woodnutt G, Burnham MK, Rosenberg M: Identification of critical staphylococcal genes using conditional phenotypes generated by antisense RNA. Science. 2001, 293: 2266-2269. 10.1126/science.1063566
47. Chopra I: Bacterial RNA polymerase: a promising target for the discovery of new antimicrobial agents. Curr Opin Investig Drugs. 2007, 8: 600-607.
48. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 1999, 27: 29-34. 10.1093/nar/27.1.29
49. Hunter WN: The non-mevalonate pathway of isoprenoid precursor biosynthesis. J Biol Chem. 2007, 282: 21573-21577. 10.1074/jbc.R700005200
50. Illarionova V, Kaiser J, Ostrozhenkova E, Bacher A, Fischer M, Eisenreich W, Rohdich F: Nonmevalonate terpene biosynthesis enzymes as antiinfective drug targets: substrate synthesis and high-throughput screening methods. J Org Chem. 2006, 71: 8824-8834. 10.1021/jo061466o
51. Singh N, Cheve G, Avery MA, McCurdy CR: Targeting the methyl erythritol phosphate (MEP) pathway for novel antimalarial, antibacterial and herbicidal drug discovery: inhibition of 1-deoxy-D-xylulose-5-phosphate reductoisomerase (DXR) enzyme. Curr Pharm Des. 2007, 13: 1161-1177. 10.2174/138161207780618939
52. Cornish RM, Roth JR, Poulter CD: Lethal mutations in the isoprenoid pathway of Salmonella enterica. J Bacteriol. 2006, 188: 1444-1450. 10.1128/JB.188.4.1444-1450.2006
53. Testa CA, Cornish RM, Poulter CD: The sorbitol phosphotransferase system is responsible for transport of 2-C-methyl-D-erythritol into Salmonella enterica serovar typhimurium. J Bacteriol. 2004, 186: 473-480. 10.1128/JB.186.2.473-480.2004
54. Dandekar T, Moldenhauer F, Bulik S, Bertram H, Schuster S: A method for classifying metabolites in topological pathway analyses based on minimization of pathway number. Biosystems. 2003, 70: 255-270. 10.1016/S0303-2647(03)00067-4
55. Butland G, Babu M, Diaz-Mejia JJ, Bohdana F, Phanse S, Gold B, Yang W, Li J, Gagarinova AG, Pogoutse O, et al.: eSGA: E. coli synthetic genetic array analysis. Nat Methods. 2008, 5: 789-795. 10.1038/nmeth.1239
56. McNeil LK, Reich C, Aziz RK, Bartels D, Cohoon M, Disz T, Edwards RA, Gerdes S, Hwang K, Kubal M, et al.: The National Microbial Pathogen Database Resource (NMPDR): a genomics platform based on subsystem annotation. Nucleic Acids Res. 2007, 35: D347-353. 10.1093/nar/gkl947
57. Barabasi AL, Oltvai ZN: Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004, 5: 101-113. 10.1038/nrg1272
58. Wagner A, Fell DA: The small world inside large metabolic networks. Proc Biol Sci. 2001, 268: 1803-1810. 10.1098/rspb.2001.1711
59. Csardi G, Nepusz T: The igraph software package for complex network research. InterJournal. 2006, Complex Systems: 1695.
60. Bonacich P: Power and Centrality: A Family of Measures. American Journal of Sociology. 1987, 92: 1170-1182. 10.1086/228631
61. Koschützki D, Schreiber F: Comparison of Centralities for Biological Networks. Proc German Conf Bioinformatics (GCB 2004). 2004, 199-206. Springer-Verlag
62. Bonacich P: Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology. 1972, 2: 113-120.
63. Covert MW, Knight EM, Reed JL, Herrgard MJ, Palsson BO: Integrating high-throughput and computational data elucidates bacterial networks. Nature. 2004, 429: 92-96. 10.1038/nature02456
64. Chou HT, Kwon DH, Hegazy M, Lu CD: Transcriptome analysis of agmatine and putrescine catabolism in Pseudomonas aeruginosa PAO1. J Bacteriol. 2008, 190: 1966-1975. 10.1128/JB.01804-07
65. Schuster M, Greenberg EP: Early activation of quorum sensing in Pseudomonas aeruginosa reveals the architecture of a complex regulon. BMC Genomics. 2007, 8: 287. 10.1186/1471-2164-8-287
66. Kim CC, Falkow S: Significance analysis of lexical bias in microarray data. BMC Bioinformatics. 2003, 4: 12. 10.1186/1471-2105-4-12
67. Detweiler CS, Monack DM, Brodsky IE, Mathew H, Falkow S: virK, somA and rcsC are important for systemic Salmonella enterica serovar Typhimurium infection and cationic peptide resistance. Mol Microbiol. 2003, 48: 385-400. 10.1046/j.1365-2958.2003.03455.x
68. Huber W, von Heydebreck A, Sultmann H, Poustka A, Vingron M: Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics. 2002, 18 (Suppl 1): S96-104.
69. Samal A, Singh S, Giri V, Krishna S, Raghuram N, Jain S: Low degree metabolites explain essential reactions and enhance modularity in biological networks. BMC Bioinformatics. 2006, 7: 118. 10.1186/1471-2105-7-118
70. Breiman L, Friedman JH, Olshen R, Stone CJ: Classification and Regression Trees. 1984, Monterey, CA: Wadsworth International Group
71. Weiss NA: Introductory Statistics. 2007, New York: Addison Wesley
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.