A multi-level multi-scale approach to study essential genes in Mycobacterium tuberculosis
- Soma Ghosh†1,
- Priyanka Baloni†1,
- Sumanta Mukherjee1,
- Praveen Anand1 and
- Nagasuma Chandra1Email author
© Ghosh et al.; licensee BioMed Central Ltd. 2013
Received: 25 April 2013
Accepted: 20 November 2013
Published: 5 December 2013
The set of indispensable genes that are required by an organism to grow and sustain life are termed as essential genes. There is a strong interest in identification of the set of essential genes, particularly in pathogens, not only for a better understanding of the pathogen biology, but also for identifying drug targets and the minimal gene set for the organism. Essentiality is inherently a systems property and requires consideration of the system as a whole for their identification. The available experimental approaches capture some aspects but each method comes with its own limitations. Moreover, they do not explain the basis for essentiality in most cases. A powerful prediction method to recognize this gene pool including rationalization of the known essential genes in a given organism would be very useful. Here we describe a multi-level multi-scale approach to identify the essential gene pool in a deadly pathogen, Mycobacterium tuberculosis.
The multi-level workflow analyses the bacterial cell by studying (a) genome-wide gene expression profiles to identify the set of genes which show consistent and significant levels of expression in multiple samples of the same condition, (b) indispensability for growth by using gene expression integrated flux balance analysis of a genome-scale metabolic model, (c) importance for maintaining the integrity and flow in a protein-protein interaction network and (d) evolutionary conservation in a set of genomes of the same ecological niche. In the gene pool identified, the functional basis for essentiality has been addressed by studying residue level conservation and the sub-structure at the ligand binding pockets, from which essential amino acid residues in that pocket have also been identified. 283 genes were identified as essential genes with high-confidence. An agreement of about 73.5% is observed with that obtained from the experimental transposon mutagenesis technique. A large proportion of the identified genes belong to the class of intermediary metabolism and respiration.
The multi-scale, multi-level approach described can be generally applied to other pathogens as well. The essential gene pool identified form a basis for designing experiments to probe their finer functional roles and also serve as a ready shortlist for identifying drug targets.
KeywordsEssential genes Multi-scale analysis Mycobacterium tuberculosis Protein structure
An essential gene is defined as a gene necessary for the growth and maintenance of an organism . Identifying a set of essential genes for a given condition, is of significant importance for a number of reasons: (a) for understanding pathogen biology: identification will help in prioritizing the set of genes for functional studies; (b) from an evolutionary perspective: since a distinct correlation is suggested between essentiality and extent of conservation in a given family or class of organisms ; (c) in drug discovery: essential genes in a pathogenic species form a ready shortlist of possible drug targets . Essentiality to the pathogen is in fact one of the key criteria for defining a drug target ; (d) essentiality also serves as a useful parameter in biomarker identification, since by definition, an essential gene is necessarily present in the pathogen ; (e) from a synthetic biology perspective, an essential gene set should overlap extensively with the minimal gene set required for survival of the organism and hence an identification of the essential set also forms a starting point for future work towards synthetic reconstruction of the organism . As whole genome sequences of a large number of species are being deciphered and omics data covering various aspects are accumulated, a number of both experimental and computational methods are being explored to identify essential genes [7–11]. There are reports based on functional genomics studies of several organisms, which indicate that only 15 - 20% of the genes are essential to the organism under a given condition .
Approaches used for identification of essential gene set include classical forward genetic screens , genome-wide RNA interference screens  and targeted gene knockouts . Typically, a given gene is deleted or inactivated by one of the listed techniques and if the resulting mutant strain leads to ‘loss of function’ , in terms of the loss of viability of the organism, the given gene is said to be essential. Whole genome transposon mutagenesis, which involves exploitation of transposon or mobile DNA elements, as mutagens, so as to inactivate the gene, has been applied to a number of organisms [16–19]. The basic premise in these is that, the inability of the bacterium to survive due to the disruption of gene expression by the inserted transposon indicates essentiality of the disrupted gene. Databases such as DEG , and OGEE  have also been developed, that combine information about essential genes in a number of organisms based on extensive literature survey. While, each of these methods has tremendous advantages, they come with their own limitations as well and hence cannot be expected to identify essential genes comprehensively. Even when successful, these techniques do not provide any mechanistic insights about why a particular gene is essential.
Computational methods to probe essentiality have mainly involved identification of orthologs in related organisms and assessing phyletic retention. Sequence features such as GC-content, codon usage, and localization signals have also been used for inferring essentiality, although these have been found to be less accurate [1, 22]. Rio and coworkers have reported a method for identification of essential genes from large interaction networks and report significant prediction accuracy by using a combination of network parameters and centrality measures . Plaimas and coworkers  apply a machine learning approach to identify essential genes in bacterial metabolic networks, wherein features characterizing network topology, sequence information and co-expression profiling were utilized for training, which was further applied to identify drug targets in Salmonella typhimurium. Constraint-based metabolic modeling approaches such as flux balance analysis have been used to analyse genome-scale metabolic networks in several organisms such as Staphylococcus aureus, Helicobacter pylori, Mycoplasma genitalium and Pseudomonas aeruginosa. Simulation of virtual knockouts of genes in these networks is helpful in inferring essentiality [23–26]. Very recently Karr et al. have built a whole cell computational model of the life cycle of Mycoplasma genitalium, wherein the cell was divided into different modules based on the functional capacity of each module. Each module was then independently modeled using mathematical tools that best suits the model: FBA for metabolism, Poisson processes for RNA and protein degradation. The built model thus could accurately reproduce experimental data and provide insight into many biological processes. Such studies highlight the importance of integrating different mathematical or computational aspects for different biological processes. Different computational methods capture different aspects that define why a gene is essential and hence, no method individually is sufficient to provide a powerful predictive tool, warranting exploration of newer approaches for studying essentiality.
Essentiality is inherently a systems property . A gene known to have an important function may not be essential in the whole system due to genetic redundancy or functional pleiotropy . The loss of some genes can be compensated by alternate pathways to reach the same biochemical goal. In other cases the function of the gene may not be essential for growth or survival, but may have some specialized function such as imparting virulence . A systems approach, therefore, becomes necessary to address these issues.
There are nearly two million deaths every year, translating to one death every 15-18 seconds, due to tuberculosis . Tuberculosis (TB) has unfortunately retained the status, for number of decades, as being the leading killer among all infectious diseases. The strong synergy of the causative organism Mycobacterium tuberculosis (M.tb) with the deadly virus HIV has made the problem more acute. Although a handful of drugs and vaccines are available for the treatment of this disease, the problem has remained acute due to difficulties in diagnosis, long periods of treatment, inability to tackle latent forms of the bacteria and more importantly the emergence of drug resistant varieties of M.tb such as MDR, XDR and TDR strains . Further research to understand the biology of the pathogen in a more wholistic approach and to apply the knowledge for the identification of newer and more efficient drug targeting strategies is thus urgently required. The whole genome sequencing of M.tb, about 15 years ago has triggered intense functional genomics studies on M.tb, leading to the accumulation of several types of omics data. Several computational studies are also available [33, 34], adding to the resource base, making it feasible to address complex issues such as gene essentiality for this pathogen from an integrated perspective. Methods to integrate various omics data into networks are also beginning to be described in literature .
Here we seek to study gene essentiality at multiple levels and at multiple scales of spatial resolution, to identify with high confidence, a set of essential genes in M.tb. The study uses a range of models at each level, including genome-scale interactome, metabolic network, individual pathways and biochemical reactions and finally individual amino acid residues in the proteins. Computational approaches at systems, structural and sequence levels have thus been employed.
Microarray data analysis
Microarray analysis provides a global picture of the expression profiles of all the genes in an organism under a given condition. It is reasonable to assume that an essential gene ought to be expressed in the cell in sufficient quantities. Although, expression by itself does not dictate function in a bacterial cell, consistent expression patterns of individual genes can be considered to provide a rough indication of the functional capacity of that gene in the cell. In this study, genes that exhibit consistent and high expression patterns in many samples of the same wild type (WT) growth condition are grouped into a shortlist of probable candidates for essentiality. This is based on the premise that a cell expends its limited energy and resources towards protein synthesis only if the protein has certain functional significance for the survival of the cell .
Microarray based gene expression dataset considered for the present study consists of a total of 39 samples of the same WT condition, with 3923 genes in each sample . A scoring scheme was devised for all the genes in the genome that forms a continuous scaling measure (Equation 4). The score captures consistent expression in the 39 different microarrays used for the study and for a given expression level, those which show consistent expression, will have the highest scores. It is important to note here, that while high expression is not a necessary condition for the gene to be termed essential, consistent expression across biological samples of the same kind is required for a gene to be considered for essentiality. Jeong et al. have earlier shown the importance of consistency in gene expression data for predicting gene essentiality in yeast, Saccharomyces cerevisiae.
In this work, genes which appear up to the 50th percentile in the ranked list (based on Equation 4) were selected for further analysis. At this stage, since the intention is to obtain a shortlist for further analysis in the pipeline, a lenient cut-off of 50th percentile was chosen so as to avoid any false negatives. As many as 2968 genes (listed in Additional file 1: Table S1) out of 3923 genes were identified by this criteria indicating that a large number of genes showed consistent and significant levels of expression. Although it is theoretically possible to miss some genes that may be consistently expressed at very low levels, the rather broad cut-off used at this stage implies that whatever maybe missed will only be a small fraction, since of the 955 genes eliminated, most (97.5%; 932 out of 955) showed generally low expression and large variations (SD > 0.5).
These 2968 genes were further classified into different Tuberculist  functional classes. As can be seen from Additional file 2: Figure S1, genes involved in metabolism and respiration form a major portion of the pie-chart (24%). M.tb is known to have developed sophisticated mechanisms for cell-wall synthesis which is also reflected in the figure, with 20% of the genes falling in this category. 24% of this set is formed of conserved hypotheticals, highlighting the knowledge gaps that exist for M.tb. About 10% of the genome from this list is dedicated for PE and PPE families of proteins, which are unique to mycobacteria . The functions of some of these are now understood and it is believed that they are important for virulence and antigenic diversity of the pathogen . Variable patterns of gene expression for these genes during infection have been discussed earlier [41, 42]. These set of proteins are involved in the modulation of antigen processing and presentation and thus very successful in evading immune responses. These set of genes have been reported as immune-dominant antigens and are ideal drug targets. About 4% of the shortlisted genes belonged to this functional category.
Flux balance analysis
Metabolism is a fundamental requirement in an organism to sustain life. Flux balance analysis (FBA) captures the metabolic state of a system at steady state condition. This is achieved by reconstructing a genome-scale metabolic network  and computing the relative fluxes of individual biochemical reactions within the constraints of mass balance and defined stoichiometric coefficients of the network, so as to obtain maximal growth. The method enables systematic perturbations through simulation of KOs of individual genes in the model. An essential gene and hence the protein can easily be identified through simulations that lead to zero or reduced growth .
Genome scale metabolic reconstruction for M.tb H37Rv used here consists of 661 genes and 939 reactions  and is referred to as i NJ661. Gene expression values were incorporated into this model to guide the bounds of the individual reaction fluxes. The method of incorporating gene expression values is based on E-Flux reported in the literature . Bounds for each reaction were assigned based on the Gene-Protein-Reaction relationships (GPRs) provided with the model. Integrating expression data into FBA makes the network much more biologically realistic, since dynamics of a metabolic network would also depend on the protein concentrations available for the reaction to occur. Fluxes are calculated through each reaction so as to maximise the biomass function. The objective value obtained for the original model was 0.0398 hr-1, while for the modified model it was 0.0039 hr-1. Around 46 - 47% non-zero flux reactions were obtained in both the cases. Single gene deletions were performed by systematically deleting each gene in the model and calculating the optimal fluxes using the same biomass function. The ratio of the objective value of the KO and WT (grRatio) was calculated and plotted for the original and the modified model. The value of grRatio determines the importance of a particular gene for the growth of the organism and hence can be used to predict essentiality.
i NJ661 and GSMN-TB  are two metabolic networks reported almost simultaneously, both sharing a high amount of similarity. Besides these, newer models have been developed recently for the same organism with more number of reactions and genes. One such model is i NJ661v  which contains 663 genes, 838 metabolites, and 1,049 reactions that incorporates reactions from both i NJ661 and GSMN-TB, and shown to be more sensitive towards predicting essentiality. The objective value obtained for the model was 0.07 hr-1, with 49.85% of the reactions showing non-zero fluxes. Therefore, in addition to i NJ661 model, i NJ661v model was also considered for analysis. The method for predicting gene essentiality was similar to the i NJ661 model, and this lead to the addition of 59 more genes to the already identified 260 genes. The grRatio for KO simulations for this model is shown in Figure 2b and a comparison of the impact of gene deletions in the two models is shown in Additional file 3: Figure S2.
The genes obtained were further classified into functional groups using the scheme provided in Jamshidi et al.. As seen in Additional file 4: Figure S3(a), genes from membrane metabolism (12%) and fatty acid metabolism (9%) were significant, reiterating the importance of lipid metabolism and cell wall synthesis in M.tb. Purine metabolism (10%) and redox metabolism (7%) also contribute significantly to this set of genes.
Apart from the single gene deletions, which characterised genes required for the survival of the organism, double gene deletions were also performed to look for pairs of genes which when knocked out simultaneously are lethal or unfavourable to the organism. 110 such gene pairs (Additional file 5: Table S2) were obtained and could serve as potential drug targets. Additional file 4: Figure S3(b) gives the pathway classification of the genes identified from double KO analysis. In this case, genes belonging to fatty acid metabolism (19%) account for higher contribution, followed by citric acid cycle pathway (14%) and purine metabolism (8%) while glutamate metabolism and glycine, serine, threonine metabolism account for 7% of the total list. About 6% of genes are involved in redox metabolism and 5% of genes contributed to pyruvate metabolism and transport respectively.
From this analysis, 319 genes (hypergeometric p-value < 0.0001) which showed lethal or unfavourable growth on performing single gene deletions for the gene expression integrated models were selected as candidate essential genes (Additional file 1: Table S1). The hypergeometric test was performed to establish the significance of the subset of genes identified as essential from all the genes analysed using FBA. The genes identified as essential through this approach were later categorised into different functional classes based on the schema provided by Palsson and co-workers .
Protein–protein interaction networks are known to capture global as well as the local behaviour of a system, despite having the limitation of being static in nature. This can be resolved to some extent by integrating gene expression data into the network and then deriving various insights through calculation of network properties . Shortest paths between all pairs of nodes in a network serve as useful pointers to understand the overall topology of the network and the extent of interaction between the nodes . Adding weights based on expression level of a node leads to construction of response networks and hence identification of biologically significant paths .
An extensively curated, weighted and directed protein–protein interaction network was generated using standards described in the methods section. It consisted of a total of 1240 nodes and 7844 edges (Additional file 6: Table S3). This was a high confidence network and an interaction was included only if sufficient evidence was available for that interaction in literature and databases. To make the network more biologically significant, microarray data was integrated into the network as node weights. Each edge was assigned a weight that was a function of the betweenness centrality of that edge and the node weights connecting that edge (Equation 5). Shortest paths were exploited to study the importance of each protein by performing systematic KOs in the network. Importance of a gene or protein is then simply a measure of the number of shortest paths disrupted upon knockout of that node in the network. Number of disrupted paths also indicates whether a given protein causes a global disruption to the network or whether it brings about a local effect.
Conservation of a protein across species can also broadly imply essentiality as they have been retained through evolution. BLAST  searches were performed for all the M.tb H37Rv sequences against a genus-specific dataset of 63 species, representing a set of evolutionarily related organisms. The dataset was prepared by considering all Mycobacterium species, whose whole genome sequences were available, excluding all the M.tb strains. Thus, this dataset includes species that are pathogenic, non-pathogenic, fast and slow growers as well as those pathogenic to non-human hosts. Sequences with identity ≥ 30% with sequence coverage ≥ 70% and e-value of ≤ 0.001, with respect to the query protein were chosen for studying essentiality.
Proteins showing high phyletic retention were shortlisted. Quantitatively, a score was assigned to each protein based on Equation 7, explained in the methods section. Those with a score ≥ 50 were selected to be essential, which in turn implied that the gene is conserved in at least 50% of the species in the mycobacterial genus. 1902 sequences were thus identified and are listed in Additional file 1: Table S1.
A functional classification of this set was also performed based on Tuberculist classification scheme (Additional file 8: Figure S4) and it is observed that large number of genes belong to intermediary metabolism and respiration (29%), conserved hypotheticals (23%), cell wall and cell processes (19%) and lipid metabolism (11%).
Deriving the final list of essential genes (EGs) and comparison with experimental datasets
EG = essential genes,
F = genes identified using gene expression integrated FBA,
N = genes identified using network,
P = genes identified using phyletic retention,
M = genes identified using microarray,
The Boolean logic (Equation 1) selects those genes that were predicted as essential by at least two of the three methods (FBA, phyletic retention and network analysis) in addition to the necessary condition of consistent expression pattern. The rule considers expression data as an essential but not a sufficient condition to predict essentiality. If a gene is considered essential, it is important that it expresses sufficiently and consistently. A final set of 283 genes were identified using Equation 1 and is provided in Additional file 9: Table S7.
An earlier study had experimentally identified a set of 654 genes, of which 614 were considered as essential while 40 more were found to disrupt the growth of the organism using transposon site hybridization method . The same group has subsequently used high density mutagenesis and deep sequencing methods to identify a set of 774 genes (p < 0.05) as essential . The latter is reported to be a more sensitive method and apart from identifying those reported by the former approach, it adds many new ones. A recent study by Zhang et al., that identified essential genes by dividing the whole-genome sequence into windows of different length and studied essentiality by performing TraSH analysis have also become available. Thus, the set of 283 genes obtained from our study is compared to these datasets, from which we observe an agreement for 182 genes. A few other datasets of experimentally identified EGs under different conditions or using different approaches are also available [17, 53–60]. From these, an agreement is observed for another 26 genes. Put together, the agreement with our analysis is around 73.5%. It must be noted that the experimental approaches too have their limitations and fail to capture essentiality in some situations. Despite these limitations, it is encouraging to observe a high correlation with these datasets.
Sequence and structural analysis
Biological function to any essential protein is imparted by its amino acid sequence and structure. Strong interdependence exists between protein essentiality, its function and the corresponding sequence and structural details of the protein. Therefore, an in-depth residue and structural level analysis is carried out for the final list of 283 proteins to obtain structural and sequence level insights that impart function and hence essentiality to the protein product of the gene.
Residue level conservation
Binding sites in a protein structure are important for the function of the protein. Residues in a binding site are usually conserved and have functional importance attached to them. Using algorithms previously developed in the laboratory  binding sites were identified in the structural models for the set of EGs identified here. The models were obtained from M.tb structural proteome database, available in the laboratory . Structural models were available for 269 of the 283 identified EGs, in which a total of 919 binding pockets were identified. The conserved residues identified from sequence conservation analysis were mapped on the binding sites or pockets. Figure 8c shows an example of binding pockets for Rv3436c and mapping of the conserved residues on it.
From this analysis, it was observed that all the 269 proteins had at least some conserved residues in their binding sites, amounting to 810 of the 919 sites studied. The binding sites were selected based on a consensus method that considered geometry, conservation as well as energy based cut-offs. This ensures that the identified pockets have sufficient volume and geometry to recognize a meaningful biological ligand. Figure 9 shows that more than 70% of the binding sites contain conserved residues. Considering the fact, that the set of sites studied includes crystallographically known binding sites as well as consensus site predictions, the extent of conservation is remarkable indeed. This was also verified statistically by calculating the quantile variation of conservation scores in the protein as a whole as compared to the binding sites alone in each case. The conservation of residues in the binding sites was significantly higher than for other regions in the protein (p-value < 2.2 e-16). The p-value was calculated between the conservation score distribution of the residues in the whole protein and that at the binding sites. Identification of these sites also provides a list of important residues and hence important interactions for the function of that protein. Our definition of essentiality implies that a given gene is essential if it has an essential function, which is conserved across other members in its close ecological niche. This means that the functional site residues that impart a function to a protein ought to be closely conserved. Hence the set of conserved residues in conserved pockets in the set of EGs indicate a basis for the conservation of function and hence for the essentiality of that protein.
The complexity of living organisms can be probed by understanding which genes play an essential role for the growth and survival and which are conditionally essential. Essentiality is studied here at multiple levels, at the levels of modules in the topological network of protein-protein interactions, at the pathway level by identifying essential metabolic, regulatory and signalling pathways, at the genome level, first by identifying high expression in multiple arrays of the same condition and second, by selecting genes that are evolutionarily conserved.
Consistent expression at a genome-scale level is used for weighting both the genome-scale metabolic network as well as the topological interactome, making the models resemble the actual experimental parameters more closely. An often asked question is whether gene expression levels can be used to understand protein abundance. While, gene expression may not always translate to corresponding protein abundances, there are many studies which show significant correlation between gene expression and protein abundances [71, 72]. In any case, given the paucity of proteome-wide protein abundance data, this issue is difficult to address. In this study, we have tried to resolve this issue by considering multiple transcriptome profiles of essentially the same condition, simply to augment confidence levels. This is based on the premise that if a gene is consistently expressed across multiple profiles, it is unlikely to vary significantly in that condition and hence can be expected to correlate reasonably well with protein abundance. It should however be noted that focussing only on consistently expressed genes may not capture all genes that are essential for the organism, but in a systematic genome-scale study such as this, the decision has been to minimise false positives, with the belief that the subset of genes identified as essential should be of high confidence even if a few genes are missed out.
Two different types of systems approaches are used here, the interactome architecture analysis and the flux balance analysis of the genome-scale metabolic model, both integrating functional genomics data in terms of expression levels. The topological analysis utilizing graph theoretical methods of a weighted directed network explore the inherent organization of the connectedness of the individual molecular players in the cell and provide a basis to understand dynamics and the flow of chemical information using those connections. These approaches provide insights into the emergent properties arising in the system as a whole comprising specific interactions among the large number of components in a cell.
Detailed comparison of the various methods used to predict essentiality
The three experimentally determined methods have identified 654 , 774  and 688  EGs. It must be noted that the set of EGs common to them are only 362, which corresponds to a commonality of only 47%, indicating lacunae with the experimental methods as well. Nevertheless for lack of any other comprehensive data, union of the EGs from these three methods, amounting to 1093 EGs are used for comparison with the predictions made in this study. The accuracies observed with individual methods are, 30.7% for phyletic retention analysis, 73% with FBA, 29.3% for gene expression and 62.2% with network analysis.
139 genes were identified by this method and are listed in Additional file 11: Table S6. This set showed 60% correlation with experimental dataset. Overall, on comparing EGs identified by the different Boolean logics, Equation 1 described in the results, fares better as it a) has higher coverage as compared to Equation 2 and b) it is indeed not skewed towards any one method as seen from the analysis of Equation 3. Finally, the 283 EGs derived from Equation 1 were chosen as the final list from this study, which had an overall accuracy of 73.5%.
Summary of datasets used in the study
Transcriptome profiling of M.tb under different combinations of drug treatment using whole genome microarrays.
(GSE1642 from GEO)
(transcriptome, FBA and network analysis)
Database for linking genomic, proteomic and pathway level information. Also hosts resources for metabolites and ligands.
Open-source database for pathway information. It is manually curated and peer-reviewed.
Provides a list of protein-protein interactions based on computational, high-throughput experiments and literature survey.
Wang et al.
Protein – protein interactions were identified based on high-throughput bacterial two hybrid method.
Vashisht et al.
M.tb interactome pathway re-annotated and reconstructed, containing a total of 1434 proteins linked through 2575 functional relationship.
Jamshidi et al.
Reconstruction of metabolic network of the in silico strain of M.tb. Consists of 661 genes and 939 reactions. FBA was performed and essential genes were identified.
Fang et al.
Reconstruction of metabolic network of the in silico strain of M.tb. Consists of 663 genes and 1049 reactions. FBA was performed and essential genes were identified.
Anand et al.
Structural models were obtained and validated for 70% of the M.tb genome. Fold based functional annotations were assigned and novel binding sites identified.
(binding site analysis)
Sassetti et al.
Transposon site hybridization was carried to identify set of essential genes.
Griffin et at.
High density mutagenesis and deep sequencing carried out to identify essential genes.
Zhang et al.
Identification of essential genes by dividing the whole genome sequence into windows of different length and performing TraSH analysis.
Further, as a way of removing reliance on any cut-off by any method, all 3923 genes in the genome were assigned a score based on an integrated scoring scheme (listed in Additional file 12: Table S5) devised to quantify the results obtained from different approaches to infer essentiality. 608 genes as obtained from the integrated score correspond to the top 50th percentile were also compared with individual experimental and theoretical method and is listed in Table 1. The highest confidence set of 51 EGs from Equation 2 were indeed the top 51 ranks based on the integrated score. Moreover, the final list of 283 genes identified by Boolean logic 1 (Equation 1) were also observed to have high ranks in this list, again demonstrating that there was no bias from any one approach.
Conditional essentiality has not been explicitly addressed in this study since the main focus was to identify those that are consistently essential in the WT condition. The same framework and techniques with small modifications can however be utilized, to probe condition-specific essentiality, when sufficient data becomes available. Yet, it is interesting to note that some of the EGs identified here are reported in literature to be conditionally essential genes based on individual experimental studies (indicated in Additional file 9: Table S7).
The final list of 283 genes was further compared to an earlier study by Raman et al. that identified drug targets for M.tb based on an integrated pipeline approach. It is observed that about 31.4% of the essential genes identified in this study can also be used as drug targets for the pathogen. The subset of genes in M.tb belonging to fatty acid metabolism, purine metabolism, amino acid metabolism and redox metabolism are seen to dominate the list of essential genes.
Some of the genes belonging to the PE/PPE class are reported to be essential for M.tb H37Rv [16, 17]. In this study, genes from this class were identified as essential from microarray analysis; however these were not identified by other methods. This is because, i ) FBA studies genes belonging to metabolic pathways only, ii) these class of genes are not evolutionarily conserved across the species and iii) no high confidence interaction data was available for these genes and thus was not included in the network. As a result, these genes were filtered out on applying the Boolean logic.
Extensive lists of essential genes identified using experimental whole-genome approaches are available in literature for several organisms such as Staphylococcus aureus[9, 74], Helicobacter pylori, Mycoplasma genitalium, and Pseudomonas aeruginosa. There are reports of such lists identified through computational approaches as well [8, 78, 79]. Such studies have primarily involved phylogenetic comparisons [1, 9, 22]. There are also a few reports recently, where flux balance analysis and related methods are used to study essential genes and reactions in metabolic networks [19, 23, 26, 66]. However, apart from studies such as , there have not been sufficient efforts in the direction of integrated studies to infer essentiality. This study bridges perspective from several levels by considering the set of genes that are required for maintaining interactome integrity, maintaining metabolism and enabling growth as well as conservation through evolution of various organisms in the genus.
The set of genes identified have been probed further to predict amino acid residues that are important for the function of the gene product again by utilizing evolutionary information at a finer scale and integrating with the three dimensional structural information of protein molecules combined with the functional information of ligand binding pockets. The methodology used here can serve as a generic framework for identifying essential gene lists in other pathogens as well. The set of genes identified have the potential to be applied in drug discovery, taking target identification to a much more rational and wholistic level.
A robust methodology to identify essential genes in Mycobacterium tuberculosis has been developed that integrates data obtained from different levels of abstraction. Towards this, microarray data has been used for essential gene prediction, knockout analysis using FBA to identify essential genes from metabolic networks, phylogenetic analysis to identify evolutionarily conserved genes and systematic knockout analysis of protein-protein interaction network has been performed. The results are validated using experimental datasets. Structural analysis of the proteins of the predicted essential genes is further analysed at the sequence and structural level to provide a basis for essentiality. Overall, the method recognises the importance of a multi–scale analysis and provides a framework for prediction of essential genes.
Multiple approaches at different levels of abstraction are utilized in this study, as described in Figures 1 and 5. Methods for systems level analyses include flux balance analysis and topological analysis using graph theoretical methods of the protein-protein interaction network. Genomic microarray data from multiple sets has been analysed using microarray analysis. Sequence analyses and database searches are carried out using standard sequence alignment tools while structural analysis is carried out using algorithms previously developed in the laboratory. A summary of the datasets used in these analyses is given in Table 2.
Microarray data analysis
An exhaustive transcriptome profiling for M.tb under many different conditions is publicly available in Gene Expression Omnibus (GEO) [80, 81]. The GEO series, GSE1642 was selected for this study . The authors have deposited the transcriptome profile of the pathogen under various drug treated and culture conditions. Of the 430 microarray sets available from this study, 39 samples that represented the control or WT conditions were hand-picked and the control chip (Cy3) was selected for the analysis. Since the focus here is to study gene essentiality under normal growth conditions, expression data from only the control samples were used. Cultures grown in MiddleBrook 7H9 media were considered as control and used for the analysis. Expression values for 3923 genes were available from this study.
i = gene index
j = sample index
The shortlisted genes were further classified into functional classes using Tuberculist.
Flux balance analysis
Genome-scale reconstructions of M.tb metabolism have been reported earlier [45, 47]. The i NJ661 model was obtained from BiGG database , while the i NJ661v model was obtained from the study done by Fang et al.. Fluxes through each reaction were calculated using the COBRA toolbox and glpk solver interfaced with MATLAB . Single and double gene deletions were calculated using the COBRA toolbox. To make the computational model closer to the actual scenario, an integrated flux balance approach known as E-Flux was utilized . In this, the median of gene expression values are used to bound the fluxes of each reaction in the model. Thus the relative ratios of the steady-state fluxes would indirectly take into account differences in protein abundances. The reaction fluxes, effects of single and double gene deletions were then computed using the COBRA toolbox.
Eij = edge weight between node i and j
Bij = betweenness centrality of Eij
Ni = node weight of node i
Nj = node weight of node j
Visualization of the network and calculation of betweenness centrality for each edge was done using Cytoscape .
Random network generation
To study the statistical significance of gene knockout analysis, random networks were generated using the ‘Random Network’ plugin in Cytoscape (apps.cytoscape.org/apps/randomnetworks). The networks were generated such that the degree distribution of the network was preserved. 100 such random networks were generated each with the same number of nodes (1240) as used in the original network. A second degree of randomness was added to the network, by weighing each node in the network with gene expression data randomly. For each random network, edge weights were calculated by randomising node weights and systematic knockout analysis performed to calculate the statistical significance (p-value) of gene knockout in the original network.
Broken path analysis
D i = Disruption caused by KO of gene i
PP i = Paths perturbed upon KO of gene i
#paths WT = Total paths in the WT network
Multiple sequence alignment
BLAST hits i = number of unique species for which hits were obtained
N = number of species in the dataset = 63
S(j) = entropy calculated at sequence position j, where the sequence position is defined with respect to the multiple sequence alignment,
i = amino acids, in case of a gap insertion in the sequence alignment each amino acid is given equal count, while evaluating the frequency.
p i (j) = probability of occurrence of amino acid i in position j. This is calculated based on the MSA obtained
Structural analysis of ligand binding pockets
Crystal structures where available for the shortlisted proteins were obtained from PDB . For others, high confidence structural models for the essential genes were obtained from the M.tb Structural Proteome database, where available . The database provides putative binding sites or functional residues of M.tb protein structures, derived from a consensus approach using three different algorithms - (i) LIGSITEcsc , that utilizes solvent exposed surface conservation information, (ii) PocketDepth , an in-house geometric-based algorithm to detect cavities on protein surface and, (iii) SiteHound , a probe-based energy calculation method. A total of 269 structures were thus considered for this analysis.
All the pockets predicted by PocketDepth that were within the radius of 5 Å from LIGSITEcsc prediction and had an overlap with SiteHound predicted clusters were obtained. These pockets were further scanned to study the overlap between the residues forming pockets and the residues that are conserved as determined from MSA to derive the residue level attributes of an essential gene.
Integrated scores were also assigned to individual genes as follows:
PP KO = number of paths perturbed upon knockout (inclusive of broken paths and paths with higher path cost)
#paths WT = Total paths in the WT network
which is further normalized to obtain a percentile score.
We thank the Department of Biotechnology (DBT) and the Government of India for financial support. PB is a CSIR (Council of Scientific and Industrial Research, India) fellow in MBU.
- Arigoni F, Talabot F, Peitsch M, Edgerton MD, Meldrum E, Allet E, Fish R, Jamotte T, Curchod ML, Loferer H: A genome-based approach for the identification of essential bacterial genes. Nat Biotechnol. 1998, 16 (9): 851-856. 10.1038/nbt0998-851.PubMedView ArticleGoogle Scholar
- Jordan IK, Rogozin IB, Wolf YI, Koonin EV: Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 2002, 12 (6): 962-968.PubMedPubMed CentralView ArticleGoogle Scholar
- Cole ST: Comparative mycobacterial genomics as a tool for drug target and antigen discovery. Eur Respir J Suppl. 2002, 36: 78s-86s.PubMedView ArticleGoogle Scholar
- Miesel L, Greene J, Black TA: Genetic strategies for antibacterial drug discovery. Nat Rev Genet. 2003, 4 (6): 442-456. 10.1038/nrg1086.PubMedView ArticleGoogle Scholar
- Maertzdorf J, Weiner J, Kaufmann SH: Enabling biomarkers for tuberculosis control. Int J Tuberc Lung Dis. 2012, 16 (9): 1140-1148. 10.5588/ijtld.12.0246.PubMedView ArticleGoogle Scholar
- Joyce AR, Palsson BO: Toward whole cell modeling and simulation: comprehensive functional genomics through the constraint-based approach. Progress in drug research Fortschritte der Arzneimittelforschung Progres des recherches pharmaceutiques. 2007, 64 (265): 267-309.Google Scholar
- Chalker AF, Minehart HW, Hughes NJ, Koretke KK, Lonetto MA, Brinkman KK, Warren PV, Lupas A, Stanhope MJ, Brown JR, et al: Systematic identification of selective essential genes in Helicobacter pylori by genome prioritization and allelic replacement mutagenesis. J Bacteriol. 2001, 183 (4): 1259-1268. 10.1128/JB.183.4.1259-1268.2001.PubMedPubMed CentralView ArticleGoogle Scholar
- del Rio G, Koschutzki D, Coello G: How to identify essential genes from molecular networks?. BMC Syst Biol. 2009, 3: 102-10.1186/1752-0509-3-102.PubMedPubMed CentralView ArticleGoogle Scholar
- Forsyth RA, Haselbeck RJ, Ohlsen KL, Yamamoto RT, Xu H, Trawick JD, Wall D, Wang L, Brown-Driver V, Froelich JM, et al: A genome-wide strategy for the identification of essential genes in staphylococcus aureus. Mol Microbiol. 2002, 43 (6): 1387-1400. 10.1046/j.1365-2958.2002.02832.x.PubMedView ArticleGoogle Scholar
- Gerdes SY, Scholle MD, Campbell JW, Balazsi G, Ravasz E, Daugherty MD, Somera AL, Kyrpides NC, Anderson I, Gelfand MS, et al: Experimental determination and system level analysis of essential genes in escherichia coli MG1655. J Bacteriol. 2003, 185 (19): 5673-5684. 10.1128/JB.185.19.5673-5684.2003.PubMedPubMed CentralView ArticleGoogle Scholar
- Plaimas K, Eils R, Konig R: Identifying essential genes in bacterial metabolic networks with machine learning methods. BMC Syst Biol. 2010, 4: 56-10.1186/1752-0509-4-56.PubMedPubMed CentralView ArticleGoogle Scholar
- Gil R, Silva FJ, Pereto J, Moya A: Determination of the core of a minimal bacterial gene set. Microbiol Mol Biol Rev. 2004, 68 (3): 518-537. 10.1128/MMBR.68.3.518-537.2004.PubMedPubMed CentralView ArticleGoogle Scholar
- Peters JL, Cnudde F, Gerats T: Forward genetics and map-based cloning approaches. Trends Plant Sci. 2003, 8 (10): 484-491. 10.1016/j.tplants.2003.09.002.PubMedView ArticleGoogle Scholar
- Boutros M, Ahringer J: The art and design of genetic screens: RNA interference. Nat Rev Genet. 2008, 9 (7): 554-566. 10.1038/nrg2364.PubMedView ArticleGoogle Scholar
- Green RA, Kao HL, Audhya A, Arur S, Mayers JR, Fridolfsson HN, Schulman M, Schloissnig S, Niessen S, Laband K,. et al: A high-resolution C. elegans essential gene network based on phenotypic profiling of a complex tissue. Cell. 2011, 145 (3): 470-482. 10.1016/j.cell.2011.03.037.PubMedPubMed CentralView ArticleGoogle Scholar
- Griffin JE, Gawronski JD, Dejesus MA, Ioerger TR, Akerley BJ, Sassetti CM: High-resolution phenotypic profiling defines genes essential for mycobacterial growth and cholesterol catabolism. PLoS Pathog. 2011, 7 (9): e1002251-10.1371/journal.ppat.1002251.PubMedPubMed CentralView ArticleGoogle Scholar
- Sassetti CM, Rubin EJ: Genetic requirements for mycobacterial survival during infection. Proc Natl Acad Sci USA. 2003, 100 (22): 12989-12994. 10.1073/pnas.2134250100.PubMedPubMed CentralView ArticleGoogle Scholar
- Sassetti CM, Boyd DH, Rubin EJ: Genes required for mycobacterial growth defined by high density mutagenesis. Mol Microbiol. 2003, 48 (1): 77-84. 10.1046/j.1365-2958.2003.03425.x.PubMedView ArticleGoogle Scholar
- Zhang YJ, Ioerger TR, Huttenhower C, Long JE, Sassetti CM, Sacchettini JC, Rubin EJ: Global assessment of genomic regions required for growth in mycobacterium tuberculosis. PLoS Pathog. 2012, 8 (9): e1002946-10.1371/journal.ppat.1002946.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang R, Lin Y: DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes. Nucleic Acids Res. 2009, 37 (Database issue): D455-D458.PubMedPubMed CentralView ArticleGoogle Scholar
- Chen WH, Minguez P, Lercher MJ, Bork P: OGEE: an online gene essentiality database. Nucleic Acids Res. 2012, 40 (Database issue): D901-D906.PubMedPubMed CentralView ArticleGoogle Scholar
- Gustafson AM, Snitkin ES, Parker SC, DeLisi C, Kasif S: Towards the identification of essential genes using targeted genome sequencing and comparative analysis. BMC Genomics. 2006, 7: 265-10.1186/1471-2164-7-265.PubMedPubMed CentralView ArticleGoogle Scholar
- Lee D-S, Burd H, Liu J, Almaas E, Wiest O, Barabási A-L, Oltvai ZN, Kapatral V: Comparative genome-scale metabolic reconstruction and flux balance analysis of multiple staphylococcus aureus genomes identify novel antimicrobial drug targets. J Bacteriol. 2009, 191 (12): 4015-4024. 10.1128/JB.01743-08.PubMedPubMed CentralView ArticleGoogle Scholar
- Suthers PF, Dasika MS, Kumar VS, Denisov G, Glass JI, Maranas CD: A genome-scale metabolic reconstruction of mycoplasma genitalium, iPS189. PLoS Comput Biol. 2009, 5 (2): e1000285-10.1371/journal.pcbi.1000285.PubMedPubMed CentralView ArticleGoogle Scholar
- Schilling CH, Covert MW, Famili I, Church GM, Edwards JS, Palsson BO: Genome-scale metabolic model of helicobacter pylori 26695. J Bacteriol. 2002, 184 (16): 4582-4593. 10.1128/JB.184.16.4582-4593.2002.PubMedPubMed CentralView ArticleGoogle Scholar
- Oberhardt MA, Puchałka J, Fryer KE, Dos Santos VAM, Papin JA: Genome-scale metabolic network analysis of the opportunistic pathogen pseudomonas aeruginosa PAO1. J Bacteriol. 2008, 190 (8): 2790-2803. 10.1128/JB.01583-07.PubMedPubMed CentralView ArticleGoogle Scholar
- Karr JR, Sanghvi JC, Macklin DN, Gutschow MV, Jacobs JM, Bolival B, Assad-Garcia N, Glass JI, Covert MW: A whole-cell computational model predicts phenotype from genotype. Cell. 2012, 150 (2): 389-401. 10.1016/j.cell.2012.05.044.PubMedPubMed CentralView ArticleGoogle Scholar
- Chandra N: Computational approaches for drug target identification in pathogenic diseases. Expert Opin Drug Discov. 2011, 6 (10): 975-979. 10.1517/17460441.2011.611128.PubMedView ArticleGoogle Scholar
- Conant GC, Wagner A: Duplicate genes and robustness to transient gene knock-downs in caenorhabditis elegans. Proc Biol Sci. 2004, 271 (1534): 89-96. 10.1098/rspb.2003.2560.PubMedPubMed CentralView ArticleGoogle Scholar
- Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE, et al: Deciphering the biology of mycobacterium tuberculosis from the complete genome sequence. Nature. 1998, 393 (6685): 537-544. 10.1038/31159.PubMedView ArticleGoogle Scholar
- Global Tuberculosis Control. 2011, World Health Organisation,http://www.who.int/tb/publications/global_report/2011/en/,
- Goldman RC, Plumley KV, Laughon BE: The evolution of extensively drug resistant tuberculosis (XDR-TB): history, status and issues for global control. Infect Disord Drug Targets. 2007, 7 (2): 73-91. 10.2174/187152607781001844.PubMedView ArticleGoogle Scholar
- Anand P, Sankaran S, Mukherjee S, Yeturu K, Laskowski R, Bhardwaj A, Bhagavat R, Brahmachari SK, Chandra N: Structural annotation of mycobacterium tuberculosis proteome. PLoS One. 2011, 6 (10): e27044-10.1371/journal.pone.0027044.PubMedPubMed CentralView ArticleGoogle Scholar
- Vashisht R, Mondal AK, Jain A, Shah A, Vishnoi P, Priyadarshini P, Bhattacharyya K, Rohira H, Bhat AG, Passi A, et al: Crowd sourcing a new paradigm for interactome driven drug target identification in mycobacterium tuberculosis. PLoS One. 2012, 7 (7): e39808-10.1371/journal.pone.0039808.PubMedPubMed CentralView ArticleGoogle Scholar
- Bebek G, Koyuturk M, Price ND, Chance MR: Network biology methods integrating biological data for translational science. Brief Bioinform. 2012, 13 (4): 446-459. 10.1093/bib/bbr075.PubMedPubMed CentralView ArticleGoogle Scholar
- Vilaprinyo E, Alves R, Sorribas A: Minimization of biosynthetic costs in adaptive gene expression responses of yeast to environmental changes. PLoS Comput Biol. 2010, 6 (2): e1000674-10.1371/journal.pcbi.1000674.PubMedPubMed CentralView ArticleGoogle Scholar
- Boshoff HI, Myers TG, Copp BR, McNeil MR, Wilson MA, Barry CE: The transcriptional responses of mycobacterium tuberculosis to inhibitors of metabolism: novel insights into drug mechanisms of action. J Biol Chem. 2004, 279 (38): 40174-40184. 10.1074/jbc.M406796200.PubMedView ArticleGoogle Scholar
- Jeong H, Oltvai ZN, Barabsi A: Prediction of protein essentiality based on genomic data. ComPlexUs. 2003, 1: 19-28. 10.1159/000067640.View ArticleGoogle Scholar
- Lew JM, Kapopoulou A, Jones LM, Cole ST: TubercuList--10 years after. Tuberculosis (Edinb). 2011, 91 (1): 1-7. 10.1016/j.tube.2010.09.008.View ArticleGoogle Scholar
- Banu S, Honore N, Saint-Joanis B, Philpott D, Prevost MC, Cole ST: Are the PE-PGRS proteins of mycobacterium tuberculosis variable surface antigens?. Mol Microbiol. 2002, 44 (1): 9-19. 10.1046/j.1365-2958.2002.02813.x.PubMedView ArticleGoogle Scholar
- Mukhopadhyay S, Balaji KN: The PE and PPE proteins of mycobacterium tuberculosis. Tuberculosis (Edinb). 2011, 91 (5): 441-447. 10.1016/j.tube.2011.04.004.View ArticleGoogle Scholar
- Sampson SL: Mycobacterial PE/PPE proteins at the host-pathogen interface. Clin Dev Immunol. 2011, 2011: 497203-PubMedPubMed CentralView ArticleGoogle Scholar
- Orth JD, Thiele I, Palsson BO: What is flux balance analysis?. Nature Biotechnol. 2010, 28 (3): 245-248. 10.1038/nbt.1614.View ArticleGoogle Scholar
- Raman K, Chandra N: Flux balance analysis of biological systems: applications and challenges. Brief Bioinform. 2009, 10 (4): 435-449. 10.1093/bib/bbp011.PubMedView ArticleGoogle Scholar
- Jamshidi N, Palsson BO: Investigating the metabolic capabilities of mycobacterium tuberculosis H37Rv using the in silico strain iNJ661 and proposing alternative drug targets. BMC Syst Biol. 2007, 1: 26-10.1186/1752-0509-1-26.PubMedPubMed CentralView ArticleGoogle Scholar
- Colijn C, Brandes A, Zucker J, Lun DS, Weiner B, Farhat MR, Cheng TY, Moody DB, Murray M, Galagan JE: Interpreting expression data with metabolic flux models: predicting mycobacterium tuberculosis mycolic acid production. PLoS Comput Biol. 2009, 5 (8): e1000489-10.1371/journal.pcbi.1000489.PubMedPubMed CentralView ArticleGoogle Scholar
- Beste DJ, Hooper T, Stewart G, Bonde B, Avignone-Rossa C, Bushell ME, Wheeler P, Klamt S, Kierzek AM, McFadden J: GSMN-TB: a web-based genome-scale network model of mycobacterium tuberculosis metabolism. Genome Biol. 2007, 8 (5): R89-10.1186/gb-2007-8-5-r89.PubMedPubMed CentralView ArticleGoogle Scholar
- Fang X, Wallqvist A, Reifman J: Development and analysis of an in vivo-compatible metabolic network of mycobacterium tuberculosis. BMC Syst Biol. 2010, 4: 160-10.1186/1752-0509-4-160.PubMedPubMed CentralView ArticleGoogle Scholar
- Barabasi AL, Oltvai ZN: Network biology: understanding the cell’s functional organization. Nat Rev Genet. 2004, 5 (2): 101-113. 10.1038/nrg1272.PubMedView ArticleGoogle Scholar
- Cabusora L, Sutton E, Fulmer A, Forst CV: Differential network expression during drug and stress response. Bioinformatics. 2005, 21 (12): 2898-2905. 10.1093/bioinformatics/bti440.PubMedView ArticleGoogle Scholar
- Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000, 28 (1): 27-30. 10.1093/nar/28.1.27.PubMedPubMed CentralView ArticleGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.PubMedView ArticleGoogle Scholar
- Golby P, Nunez J, Cockle PJ, Ewer K, Logan K, Hogarth P, Vordermeier HM, Hinds J, Hewinson RG, Gordon SV: Characterization of two in vivo-expressed methyltransferases of the mycobacterium tuberculosis complex: antigenicity and genetic regulation. Microbiology. 2008, 154 (Pt 4): 1059-1067.PubMedPubMed CentralView ArticleGoogle Scholar
- Hsu T, Hingley-Wilson SM, Chen B, Chen M, Dai AZ, Morin PM, Marks CB, Padiyar J, Goulding C, Gingery M, et al: The primary mechanism of attenuation of bacillus Calmette-Guerin is a loss of secreted lytic function required for invasion of lung interstitial tissue. Proc Natl Acad Sci USA. 2003, 100 (21): 12420-12425. 10.1073/pnas.1635213100.PubMedPubMed CentralView ArticleGoogle Scholar
- Parish T, Smith DA, Roberts G, Betts J, Stoker NG: The senX3-regX3 two-component regulatory system of mycobacterium tuberculosis is required for virulence. Microbiology. 2003, 149 (Pt 6): 1423-1435.PubMedView ArticleGoogle Scholar
- Rengarajan J, Bloom BR, Rubin EJ: Genome-wide requirements for mycobacterium tuberculosis adaptation and survival in macrophages. Proc Natl Acad Sci USA. 2005, 102 (23): 8327-8332. 10.1073/pnas.0503272102.PubMedPubMed CentralView ArticleGoogle Scholar
- Rickman L, Scott C, Hunt DM, Hutchinson T, Menendez MC, Whalan R, Hinds J, Colston MJ, Green J, Buxton RS: A member of the cAMP receptor protein family of transcription regulators in mycobacterium tuberculosis is required for virulence in mice and controls transcription of the rpfA gene coding for a resuscitation promoting factor. Mol Microbiol. 2005, 56 (5): 1274-1286. 10.1111/j.1365-2958.2005.04609.x.PubMedPubMed CentralView ArticleGoogle Scholar
- Savvi S, Warner DF, Kana BD, McKinney JD, Mizrahi V, Dawes SS: Functional characterization of a vitamin B12-dependent methylmalonyl pathway in mycobacterium tuberculosis: implications for propionate metabolism during growth on fatty acids. J Bacteriol. 2008, 190 (11): 3886-3895. 10.1128/JB.01767-07.PubMedPubMed CentralView ArticleGoogle Scholar
- Venugopal A, Bryk R, Shi S, Rhee K, Rath P, Schnappinger D, Ehrt S, Nathan C: Virulence of mycobacterium tuberculosis depends on lipoamide dehydrogenase, a member of three multienzyme complexes. Cell Host Microbe. 2011, 9 (1): 21-31. 10.1016/j.chom.2010.12.004.PubMedPubMed CentralView ArticleGoogle Scholar
- Zahrt TC, Deretic V: Mycobacterium tuberculosis signal transduction system required for persistent infections. Proc Natl Acad Sci USA. 2001, 98 (22): 12706-12711. 10.1073/pnas.221272198.PubMedPubMed CentralView ArticleGoogle Scholar
- Ducati RG, Breda A, Basso LA, Santos DS: Purine salvage pathway in mycobacterium tuberculosis. Curr Med Chem. 2011, 18 (9): 1258-1275. 10.2174/092986711795029627.PubMedView ArticleGoogle Scholar
- Lyon RH, Hall WH, Costas-Martinez C: Utilization of amino acids during growth of mycobacterium tuberculosis in rotary cultures. Infect Immun. 1970, 1 (6): 513-520.PubMedPubMed CentralGoogle Scholar
- Parker WB, Long MC: Purine metabolism in mycobacterium tuberculosis as a target for drug development. Curr Pharm Des. 2007, 13 (6): 599-608. 10.2174/138161207780162863.PubMedView ArticleGoogle Scholar
- Timm J, Post FA, Bekker LG, Walther GB, Wainwright HC, Manganelli R, Chan WT, Tsenova L, Gold B, Smith I: Differential expression of iron-, carbon-, and oxygen-responsive mycobacterial genes in the lungs of chronically infected mice and tuberculosis patients. Proc Natl Acad Sci. 2003, 100 (24): 14321-10.1073/pnas.2436197100.PubMedPubMed CentralView ArticleGoogle Scholar
- Metris A, Reuter M, Gaskin D, Baranyi J, van Vliet A: In vivo and in silico determination of essential genes of campylobacter jejuni. BMC Genomics. 2011, 12 (1): 535-10.1186/1471-2164-12-535.PubMedPubMed CentralView ArticleGoogle Scholar
- Xu P, Ge X, Chen L, Wang X, Dou Y, Xu JZ, Patel JR, Stone V, Evans K, Kitten T: Genome-wide essential gene identification in streptococcus sanguinis. Sci Rep. 2011, 1: 125-PubMedPubMed CentralGoogle Scholar
- Dotsch A, Klawonn F, Jarek M, Scharfe M, Blocker H, Haussler S: Evolutionary conservation of essential and highly expressed genes in pseudomonas aeruginosa. BMC Genomics. 2011, 11: 234-View ArticleGoogle Scholar
- Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al: Clustal W and Clustal X version 2.0. Bioinformatics. 2007, 23 (21): 2947-2948. 10.1093/bioinformatics/btm404.PubMedView ArticleGoogle Scholar
- Padiadpu J, Mukherjee S, Chandra N: Rationalization and prediction of drug resistant mutations in targets for clinical anti-tubercular drugs. J Biomol Struct Dyn. 2013, 31 (1): 44-58. 10.1080/07391102.2012.691361.PubMedView ArticleGoogle Scholar
- Kalidas Y, Chandra N: PocketDepth: a new depth based algorithm for identification of ligand binding sites in proteins. J Struct Biology. 2008, 161 (1): 31-42. 10.1016/j.jsb.2007.09.005.View ArticleGoogle Scholar
- Futcher B, Latter GI, Monardo P, McLaughlin CS, Garrels JI: A sampling of the yeast proteome. Mol Cell Biol. 1999, 19 (11): 7357-7368.PubMedPubMed CentralView ArticleGoogle Scholar
- Orntoft TF, Thykjaer T, Waldman FM, Wolf H, Celis JE: Genome-wide study of gene copy numbers, transcripts, and protein levels in pairs of non-invasive and invasive human transitional cell carcinomas. Mol Cell Proteomics. 2002, 1 (1): 37-45. 10.1074/mcp.M100019-MCP200.PubMedView ArticleGoogle Scholar
- Raman K, Yeturu K, Chandra N: targetTB: a target identification pipeline for mycobacterium tuberculosis through an interactome, reactome and genome-scale structural analysis. BMC Syst Biol. 2008, 2: 109-10.1186/1752-0509-2-109.PubMedPubMed CentralView ArticleGoogle Scholar
- Ji Y, Zhang B, Van Horn SF, Warren P, Woodnutt G, Burnham MK, Rosenberg M: Identification of critical staphylococcal genes using conditional phenotypes generated by antisense RNA. Science. 2001, 293 (5538): 2266-2269. 10.1126/science.1063566.PubMedView ArticleGoogle Scholar
- Salama NR, Shepherd B, Falkow S: Global transposon mutagenesis and essential gene analysis of helicobacter pylori. J Bacteriol. 2004, 186 (23): 7926-7935. 10.1128/JB.186.23.7926-7935.2004.PubMedPubMed CentralView ArticleGoogle Scholar
- Glass JI, Assad-Garcia N, Alperovich N, Yooseph S, Lewis MR, Maruf M, Hutchison CA, Smith HO, Venter JC: Essential genes of a minimal bacterium. Proc Natl Acad Sci USA. 2006, 103 (2): 425-430. 10.1073/pnas.0510013103.PubMedPubMed CentralView ArticleGoogle Scholar
- Liberati NT, Urbach JM, Miyata S, Lee DG, Drenkard E, Wu G, Villanueva J, Wei T, Ausubel FM: An ordered, nonredundant library of pseudomonas aeruginosa strain PA14 transposon insertion mutants. Proc Natl Acad Sci USA. 2006, 103 (8): 2833-2838. 10.1073/pnas.0511100103.PubMedPubMed CentralView ArticleGoogle Scholar
- Barve A, Rodrigues JF, Wagner A: Superessential reactions in metabolic networks. Proc Natl Acad Sci USA. 2012, 109 (18): E1121-E1130. 10.1073/pnas.1113065109.PubMedPubMed CentralView ArticleGoogle Scholar
- Li M, Zhang H, Wang JX, Pan Y: A new essential protein discovery method based on the integration of protein-protein interaction and gene expression data. BMC Syst Biol. 2012, 6: 15-10.1186/1752-0509-6-15.PubMedPubMed CentralView ArticleGoogle Scholar
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, et al: NCBI GEO: archive for functional genomics data sets—10 years on. Nucleic Acids Res. 2011, 39 (suppl 1): D1005-D1010.PubMedPubMed CentralView ArticleGoogle Scholar
- Edgar R, Domrachev M, Lash AE: Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002, 30 (1): 207-210. 10.1093/nar/30.1.207.PubMedPubMed CentralView ArticleGoogle Scholar
- Schellenberger J, Park JO, Conrad TM, Palsson BO: BiGG: a biochemical genetic and genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinformatics. 2010, 11: 213-10.1186/1471-2105-11-213.PubMedPubMed CentralView ArticleGoogle Scholar
- Schellenberger J, Que R, Fleming RM, Thiele I, Orth JD, Feist AM, Zielinski DC, Bordbar A, Lewis NE, Rahmanian S, et al: Quantitative prediction of cellular metabolism with constraint-based models: the COBRA toolbox v2.0. Nature protocols. 2011, 6 (9): 1290-1307. 10.1038/nprot.2011.308.PubMedPubMed CentralView ArticleGoogle Scholar
- Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, Doerks T, Stark M, Muller J, Bork P, et al: The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011, 39 (Database issue): D561-D568.PubMedPubMed CentralView ArticleGoogle Scholar
- Haw R, Hermjakob H, D’Eustachio P, Stein L: Reactome pathway analysis to enrich biological discovery in proteomics data sets. Proteomics. 2011, 11 (18): 3598-3613. 10.1002/pmic.201100066.PubMedPubMed CentralView ArticleGoogle Scholar
- Wang Y, Cui T, Zhang C, Yang M, Huang Y, Li W, Zhang L, Gao C, He Y, Li Y, et al: Global protein-protein interaction network in the human pathogen mycobacterium tuberculosis H37Rv. J Proteome Res. 2010, 9 (12): 6665-6677. 10.1021/pr100808n.PubMedView ArticleGoogle Scholar
- Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T: Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011, 27 (3): 431-432. 10.1093/bioinformatics/btq675.PubMedPubMed CentralView ArticleGoogle Scholar
- Robert WF: Algorithm 97: shortest path. Commun ACM. 1962, 5 (6): 345-Google Scholar
- Kramer LD, Presser SB, Houk EJ, Hardy JL: Effect of the anesthetizing agent triethylamine on western equine encephalomyelitis and St. Louis encephalitis viral titers in mosquitoes (Diptera: Culicidae). J Med Entomol. 1990, 27 (6): 1008-1010.PubMedView ArticleGoogle Scholar
- Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, et al: The protein data bank. Acta Crystallogr D. 2002, 58 (Pt 6 No 1): 899-907.PubMedView ArticleGoogle Scholar
- Huang B, Schroeder M: LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation. BMC Struct Biol. 2006, 6: 19-10.1186/1472-6807-6-19.PubMedPubMed CentralView ArticleGoogle Scholar
- Hernandez M, Ghersi D, Sanchez R: SITEHOUND-web: a server for ligand binding site identification in protein structures. Nucleic Acids Res. 2009, 37 (Web Server issue): W413-W416.PubMedPubMed CentralView ArticleGoogle Scholar
- Oliveros JC: VENNY. An interactive tool for comparing lists with Venn Diagrams. 2007,http://bioinfogp.cnb.csic.es/tools/venny/,Google Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.