It is now well-established that better insights into biological systems may be obtained by considering large-scale system-level models, since biological systems are complex networks of many processes. The conventional method of focussing on a single protein at a time, however important the protein may be, would mean losing perspective of its larger context and hence may not provide the right answers, especially in drug discovery. Broader insights about the appropriateness of a potential target can be obtained by considering pathways and whole-system models relevant to that disease. For example, an enzyme that may be identified as a good target for a particular disease may not actually be critical or essential, when viewed in the context of the entire metabolism in the cell. Analysing system-level models can help in assessing criticality of the individual proteins by studying any alternate pathways and mechanisms that may naturally exist to compensate for the absence of that protein. This study has demonstrated how systems biology can be used in drug target identification and drug discovery.
As the necessity of systems-level studies becomes increasingly obvious, a wide spectrum of techniques has been developed and applied for the simulation and analysis of biochemical systems [62–65]. These include stoichiometric techniques that rely on reaction stoichiometry and other constraints, kinetic pathway modelling techniques that rely on comprehensive mechanistic models and interaction-based analyses, as well as Petri nets and qualitative modelling formalisms. The FBA carried out here in conjunction with gene knock-outs indicates the criticality of individual reactions and hence of the associated proteins. In FBA, a knock-out can be viewed as an extreme inhibition in which the target is totally inhibited by a drug. Of the 661 proteins in the Mtb iNJ661 model, 188 resulted in lethal phenotypes when knocked out, indicating their essentiality for producing the required biomass and hence for bacterial growth. FBA can also consider multiple knock-outs, which again amount to total inhibition at multiple points. Such multi-point inhibition is known to be produced by some drugs individually and, more commonly, by a cocktail of drugs. For example, isoniazid is thought to act at two points in the pathway by inhibiting both InhA and KasA [4, 67]. The FBA study presents a ready framework to analyse the effects of such drug inhibitions, which would be extremely difficult to judge by inspection of the reaction maps alone. Combinations of the non-lethal gene deletions, leading to about 111,628 different double knock-outs, were generated and tested with FBA using the same objective function. Of these, 49 were found to lead to lethal phenotypes, with a growth ratio of zero relative to the wild type. The proteins in such a pair can be targeted simultaneously to achieve an excellent antibacterial effect, although neither would be a good target individually.
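The double knock-out count quoted above is consistent with simple combinatorics. The following is a minimal arithmetic sketch, assuming that the pairs were enumerated as unordered pairs drawn from the proteins whose single knock-out was not lethal; the variable names are illustrative and not taken from the study.

```python
from math import comb

# Figures quoted in the text.
total_proteins = 661   # proteins in the iNJ661 model
single_lethal = 188    # single knock-outs with a lethal phenotype
lethal_pairs = 49      # double knock-outs with a lethal phenotype

# Assumption: every unordered pair of individually non-lethal proteins was tested.
non_lethal = total_proteins - single_lethal
double_knockouts = comb(non_lethal, 2)

# Fraction of tested pairs that turned out to be lethal in combination.
lethal_fraction = lethal_pairs / double_knockouts

print(non_lethal, double_knockouts)
print(f"{lethal_fraction:.4%}")
```

Under this assumption the 473 individually non-lethal proteins give exactly 111,628 pairs, so the 49 lethal pairs make up roughly 0.04% of the combinations tested.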
Some examples of such pairs are Rv0505c (SerB1, non-essential)-Rv3042c (SerB2, in H-List), both phosphoserine phosphatases; Rv2243 (FabD, H-List)-Rv0649 (FabD2, non-essential), both malonyl CoA-ACP transacylases; and Rv3273–Rv3588c, both carbonic anhydrases that are individually non-essential by systems analyses. It is conceivable that each of these pairs, which appear to be isozymes, produces a lethal phenotype on double deletion because the pathway step they catalyse can proceed in the absence of one enzyme but is arrested in the absence of both. Another example is the pair Rv0363c (Fba, a fructose-bisphosphate aldolase)-Rv1237 (SugB, a sugar-transport membrane protein ABC transporter). Such studies using FBA, however, can be carried out only for the annotated reactome component of the bacterial cell.
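The isozyme argument above can be illustrated on a toy network. The sketch below is not FBA and does not use the iNJ661 model: it replaces the linear-programming step with a simple reachability check (growth is possible if a "biomass" metabolite can still be produced), and all gene and metabolite names are hypothetical placeholders.

```python
from itertools import combinations

# Hypothetical toy network: (substrates, products, catalysing gene).
REACTIONS = [
    ({"nutrient"}, {"A"}, "geneU"),
    ({"A"}, {"B"}, "isoX1"),        # isozyme 1 for the step A -> B
    ({"A"}, {"B"}, "isoX2"),        # isozyme 2 for the same step
    ({"B"}, {"biomass"}, "geneV"),
]

def can_grow(knocked_out):
    """Return True if biomass is still producible with the given genes removed."""
    available = {"nutrient"}
    changed = True
    while changed:
        changed = False
        for substrates, products, gene in REACTIONS:
            if gene in knocked_out:
                continue  # a knock-out removes every reaction of that gene
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return "biomass" in available

genes = sorted({gene for _, _, gene in REACTIONS})

# Single knock-outs: genes without a backup are lethal on their own.
single_lethal = [g for g in genes if not can_grow({g})]

# Double knock-outs are tested only among individually non-lethal genes.
viable = [g for g in genes if g not in single_lethal]
synthetic_lethal = [pair for pair in combinations(viable, 2)
                    if not can_grow(set(pair))]

print(single_lethal)
print(synthetic_lethal)
```

In this toy case neither isozyme is essential alone, but deleting both arrests the only route to biomass, which is the pattern proposed for pairs such as SerB1-SerB2.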
Networks obtained by considering various protein-protein interactions and influences, on the other hand, are much more comprehensive and nearly complete in their coverage, especially because of the availability of an integrated database that considers experimentally mapped interactions and those predicted from one or more of the four well-established computational methods [17, 18]. A drawback of such a network, however, could be a large number of false positives. To minimise the introduction of false positives, we have eliminated all low-confidence interactions from our study. The number of broken paths introduced by a knock-out is taken here as a measure of the essentiality of the protein in maintaining the network. Biological networks typically display a power-law degree distribution. We explore the importance of the disruption of network connectivity that occurs when nodes lying on many shortest paths in the network are attacked. The advantage of interaction-based modelling such as this is that interaction networks can be generated from existing databases and that it is not constrained by a lack of quantitative mechanistic data.
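The idea of scoring a protein by the paths its removal breaks can be sketched on a small graph. The example below is a simplification under stated assumptions: a path between two remaining nodes is counted as broken only when the pair becomes completely disconnected (lengthened shortest paths are ignored), and the network and node names are hypothetical rather than taken from the interaction database used in the study.

```python
from collections import deque

# Hypothetical undirected toy interaction network.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E"), ("E", "D"), ("D", "F")]

def adjacency(edges, removed=None):
    """Build an adjacency map, optionally leaving out one knocked-out node."""
    nodes = {n for edge in edges for n in edge} - {removed}
    graph = {n: set() for n in nodes}
    for u, v in edges:
        if u in graph and v in graph:
            graph[u].add(v)
            graph[v].add(u)
    return graph

def reachable(graph, source):
    """Breadth-first search returning every node reachable from source."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def connected_pairs(graph):
    """Return the set of unordered node pairs joined by at least one path."""
    return {(a, b) for a in graph for b in reachable(graph, a) if a < b}

def broken_paths(edges, knocked_out):
    """Count pairs of surviving nodes that lose all paths after the knock-out."""
    before = connected_pairs(adjacency(edges))
    after = connected_pairs(adjacency(edges, removed=knocked_out))
    surviving = {pair for pair in before if knocked_out not in pair}
    return len(surviving - after)

scores = {node: broken_paths(EDGES, node) for node in sorted(adjacency(EDGES))}
print(scores)
```

Here the nodes B and D, each the sole link to a peripheral node, score highest, whereas C and E score zero because they back each other up on the route between B and D.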
Besides essentiality to the pathogen, an ideal target should have several other properties, such as non-similarity with human proteins whose inhibition could lead to adverse drug effects, an aspect that has been analysed at multiple levels in this study (see Fig. 1). The simplest level, of course, is to check for sequence similarity of the target being queried with all the proteins in the human proteome. Sequence information is readily available for hundreds of bacteria, and this type of analysis has been reported earlier for pathogenic genomes such as Burkholderia pseudomallei, Helicobacter pylori, Pseudomonas aeruginosa [70, 71] and even Mtb. However, such sequence filtering, while important, cannot be the sole criterion for identifying high-quality targets, since two proteins that are considerably dissimilar in their sequences could have very similar binding sites [73, 74]. Thus, while sequence similarity very often leads to structural and hence functional similarity, it is not a necessary condition for two proteins to have similar ligand binding profiles.
In the process of target identification, what really matters for a good target is that its binding site is sufficiently different from that of any host protein. This ensures that a given drug is available in the intended quantities to the intended target and, perhaps more importantly, avoids adverse effects caused by the drug binding to a host protein and altering its function in unintended and unanticipated ways. For this purpose, it is not very informative to look at structural classes and overall properties, such as the structural family or secondary structural types, that might describe a structure. Instead, it is important to study the possible binding profile of a given drug against all those proteins to which it is likely to be exposed. Towards this goal, we first identified possible pockets in the set of Mtb and human structures using PD, a validated algorithm that was recently developed in our laboratory. All such putative pockets were tested against criteria such as size and volume, retaining only those that were likely to bind small molecules. The filtered pockets from the preliminarily shortlisted Mtb targets were then screened for similarity against pockets from the human proteins, which involved over 245 million comparisons, using PM, a site-matching algorithm recently developed in our laboratory. From this, 145 putative targets were eliminated due to high similarity with one or more human proteins. Interestingly, well-known molecules such as AlrA, PanD and GyrB are observed to have high similarities with human proteins, perhaps explaining the side effects caused by the drugs targeting them. With a PMScore cut-off of 60%, molecules such as InhA, EmbA and EmbC would all have been eliminated from the list for not having the properties of a safe target.
However, since it is, in principle, possible to design inhibitors that bind only to the intended target by exploiting subtle structural differences between the sites of the bacterial target in question and those of the human proteins obtained as hits with PM, we chose to use a high cut-off of 80%, so as to remove only those targets with a very high risk of causing side effects. Some examples of molecules that failed at this stage are DdlA, GyrB, AftA and AlrA. It must be noted that some of these were ranked as high-priority targets by other studies that did not consider the structural aspect explicitly, again emphasising the need for structural-level analysis. Eliminating proteins with high similarity to proteins in the gut flora also helps in ultimately reducing the risk of side effects.
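The effect of the cut-off choice discussed above can be sketched as a simple threshold filter. This is an illustration only: the target names and scores are hypothetical placeholders rather than results from the study, and treating a score equal to the cut-off as a failure is an assumption.

```python
# Hypothetical best PMScore (%) of each candidate's pockets against human pockets.
best_human_match = {
    "target_1": 45.0,
    "target_2": 65.0,
    "target_3": 72.5,
    "target_4": 88.0,
}

def eliminated(scores, cutoff):
    """Return candidates whose best human-pocket match reaches the cut-off."""
    return sorted(name for name, score in scores.items() if score >= cutoff)

# A 60% cut-off removes every candidate with moderate similarity to a human site.
strict = eliminated(best_human_match, 60.0)

# An 80% cut-off removes only candidates with very high similarity.
lenient = eliminated(best_human_match, 80.0)

print(strict)
print(lenient)
```

With these placeholder values the 60% cut-off discards three of the four candidates, whereas the 80% cut-off discards only the one with the closest human match. This mirrors the trade-off described in the text between retaining druggable targets and limiting the risk of side effects.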
The last stages of filtering and post-identification analysis resulted in identifying two categories of targets: broad-spectrum targets and Mtb-specific targets. It is necessary to identify targets in both categories, since they are required in different situations. Mtb-specific targets are believed to be safer, since they would not lead to many organisms developing resistance against drugs directed at such targets. Broad-spectrum targets, on the other hand, would be extremely useful when multiple infections co-exist or in some cases where a specific diagnosis is not possible. A comprehensive phylogenetic analysis of the shortlisted targets against 228 different pathogenic genomes has been carried out in this study, leading to the identification of broad-spectrum targets. Identifying the pathways and proteins involved in generating drug resistance, and then targeting them simultaneously as co-targets along with the primary broad-spectrum targets, would reduce the risk of drug resistance significantly, making many more molecules accessible for therapeutic intervention.