- Methodology article
- Open Access
Predicting the points of interaction of small molecules in the NF-κB pathway
© Patel et al; licensee BioMed Central Ltd. 2011
Received: 8 April 2010
Accepted: 22 February 2011
Published: 22 February 2011
The similar property principle has been used extensively in drug discovery to identify small compounds that interact with specific drug targets. Here we show it can be applied to identify the interactions of small molecules within the NF-κB signalling pathway.
Clusters containing compounds with a predominant interaction within the pathway were created and then used to predict the interaction of compounds not included in the clustering analysis.
The technique successfully predicted the points of interaction of compounds that are known to interact with the NF-κB pathway. The method was also shown to be successful when compounds for which the interaction points were unknown were included in the clustering analysis.
One of the most studied cellular signalling systems is the Nuclear Factor κB (NF-κB) network. The NF-κB family of transcription factors controls the transcription of at least 300 genes, but has different transcriptional and cell fate outcomes in different cells and in response to different stimuli. As well as being a critical component of the innate immune response, NF-κB controls cell division and apoptosis in most cell types. While the NF-κB signalling pathway has been studied in many papers (nearly 30,000 are returned by a PubMed search for "Nuclear Factor kappa B"), there is still a great deal about the system which is not understood. Recently, NF-κB proteins have been shown to oscillate between the cytoplasm and nucleus of stimulated cells and the frequency of these oscillations has been suggested to alter the pattern of gene expression. The discovery of the importance of these dynamic processes requires a re-interpretation of the previous literature.
NF-κB has been a much studied drug target in the pharmaceutical industry. Numerous traditional medicines have been shown to contain compounds that affect NF-κB activity. Many of these are now being investigated for pharmaceutical development, for example gambogic acid, caffeic acid phenethyl ester and green tea polyphenols (reviewed by Khan and Mukhtar). In addition, NF-κB antisense oligonucleotides have recently been shown to affect outcome in a murine endotoxic shock model, and NF-κB decoy oligonucleotides are of interest as a potential therapy for inflammatory diseases (for review see De Stefano et al.). The effects of NF-κB modulating drugs have mostly been measured using assays for NF-κB function that are limited to easily available endpoints such as IκB degradation or DNA binding. As a result, the interpretation of the site of action of these compounds may require re-analysis. The limited characterisation of the site of action, combined with the limited understanding of the NF-κB network, has made it difficult to interpret and compare the action of different NF-κB inhibitors. Here we use chemoinformatic approaches to cluster a set of known NF-κB modulatory compounds.
The methodology is based on the similar property principle (structurally similar compounds have similar properties), although it must be noted that the principle has flaws. The main flaw is that a small structural change can lead to a dramatic change in property (e.g. exchanging a hydrogen bond donor for an acceptor can greatly increase activity against a target), which has a major impact when studying quantitative structure-activity/property relationships. In this study we use it as a general rule rather than a specific rule. In addition to identifying relationships between clusters of compounds and their biological functions, the clusters were also used to identify the points of interaction of compounds that are known to interact with the NF-κB pathway but were not used in the clustering analysis.
The compounds were obtained from a literature search, which in many instances involved manually searching for chemical structures using chemical names present in the literature. Structures for the compounds can be found in Additional File 1 (chiral information has been included where known; however, only 2D information was used in the work presented here). Since the creation of this list, advances in text mining mean it is now possible to automatically extract names of compounds from the literature and associate the names with structures, for example using Pipeline Pilot's ChemMining Collection or OSCAR3. The list of compounds resulting from such a search could resemble the diverse set collated here. Such lists obtained for a cellular pathway could be used (as here) to identify compounds which interact in a similar manner in a given pathway. Note that this technique has not been used here to identify novel compounds that interact within the pathway, but rather to identify the point of interaction of compounds which are known to affect the pathway. As an additional aim of this work, we have used all the compounds obtained from the literature search in this analysis, including those for which no specific point of interaction in the NF-κB pathway has been suggested, in order to investigate whether this step (or a similar first step in the drug discovery process) could also be automated.
For 297 compounds the type of interaction within the pathway was also taken from the literature. The interactions were defined as: interacting via a ROS mechanism (this can be at any point in the pathway); inhibiting IKK (at point 1 in Figure 2); inhibiting degradation/phosphorylation of IκB (point 2 in Figure 2); increasing degradation/phosphorylation of IκB (point 2); inhibiting translocation (point 3); and interfering with DNA binding (point 4). Four compounds had more than one interaction (see Additional File 1 for structures and references).
Table: Compounds in the Various Training and Test Sets (interaction categories listed: Interfering with DNA binding; Inhibits IKK activation; Inhibits IκB degradation or phosphorylation; Activates IκB degradation or phosphorylation; sets listed: Training sets 1 to 5 and Test sets 1 to 5).
Each training set was clustered using Pipeline Pilot with the following descriptors: Extended Connectivity Fingerprints with a maximum diameter of four bonds (ECFP4); Property descriptors (AlogP, molecular weight, number of hydrogen bond acceptors, number of hydrogen bond donors, number of atoms, number of rotatable bonds, number of rings, and number of aromatic rings); ECFP4 with Property descriptors; BCUT (descriptors obtained from the eigenvalues of the adjacency matrix, weighting the diagonal elements with atom weights); GCUT (obtained from the eigenvalues of a modified graph distance adjacency matrix); BCUT with GCUT; and GCUT with Property descriptors. The BCUT and GCUT descriptors were calculated using MOE. Clustering was based on maximal dissimilarity partitioning, with the clusters derived by imposing a distance threshold between a molecule and its cluster representative. As the clustering algorithm in Pipeline Pilot is dependent upon a seed compound, five different seeds were chosen for each descriptor and dataset combination (i.e. there were five different sets of clusters for each descriptor of each dataset, giving a total of 25 different sets of clusters for each descriptor).
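The clustering step described above can be illustrated with a minimal sketch. It is not the Pipeline Pilot implementation, whose internals are not given here. It assumes that each compound is represented by a binary fingerprint stored as a set of integer feature identifiers, that distance is one minus the Tanimoto similarity, and that new cluster representatives are added (most dissimilar compound first) until every compound lies within the distance threshold of a representative. The names `tanimoto`, `cluster_by_max_dissimilarity`, `seed` and `max_distance` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence

# Assumption: a fingerprint is a set of hashed substructure identifiers.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity between two binary fingerprints held as sets."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def cluster_by_max_dissimilarity(
    fps: Sequence[Fingerprint], seed: int, max_distance: float
) -> Dict[int, List[int]]:
    """Return {representative index: member indices}.

    Starting from the seed compound, the compound farthest from all
    current representatives becomes a new representative, until every
    compound is within max_distance of some representative.
    """
    if max_distance < 0:
        raise ValueError("max_distance must be non-negative")
    centres = [seed]
    # Distance from each compound to its nearest representative so far.
    nearest = [1.0 - tanimoto(fp, fps[seed]) for fp in fps]
    while True:
        farthest = max(range(len(fps)), key=lambda i: nearest[i])
        if nearest[farthest] <= max_distance:
            break
        centres.append(farthest)
        for i, fp in enumerate(fps):
            nearest[i] = min(nearest[i], 1.0 - tanimoto(fp, fps[farthest]))
    # Assign every compound to its most similar representative.
    clusters: Dict[int, List[int]] = {c: [] for c in centres}
    for i, fp in enumerate(fps):
        best = max(centres, key=lambda c: tanimoto(fp, fps[c]))
        clusters[best].append(i)
    return clusters
```

Running the function with different values of `seed` on the same fingerprints mirrors the use of five seeds per descriptor and dataset: the representatives, and therefore the clusters, can change with the starting compound.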
The number of clusters used in the clustering was chosen by using the following method: first the training set compounds were clustered into a set number of clusters (n), which varied from one to 200 (which would give an average of 2 compounds per cluster). For each n, the average self-similarity (avg-s) of the clusters was calculated. The value of n was chosen so that the biggest decrease in avg-s was seen between (n-1) and n clusters.
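The selection rule above can be sketched as follows. The sketch assumes that the average self-similarity has already been computed for each candidate number of clusters and is supplied as a mapping from n to avg-s; how avg-s itself is calculated is not specified here and is left outside the example. The function name `choose_cluster_count` is hypothetical.

```python
from typing import Mapping, Optional


def choose_cluster_count(avg_self_similarity: Mapping[int, float]) -> int:
    """Return the n with the largest decrease in avg-s between n-1 and n."""
    best_n: Optional[int] = None
    best_drop = float("-inf")
    for n in sorted(avg_self_similarity):
        if n - 1 not in avg_self_similarity:
            continue  # no previous value to compare against
        drop = avg_self_similarity[n - 1] - avg_self_similarity[n]
        if drop > best_drop:
            best_n, best_drop = n, drop
    if best_n is None:
        raise ValueError("need avg-s for at least two consecutive values of n")
    return best_n
```

For example, with illustrative (made-up) values `{1: 0.30, 2: 0.42, 3: 0.40, 4: 0.31, 5: 0.33}`, the largest decrease occurs between three and four clusters, so the function selects n = 4.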
For each query compound in a test set, the most similar cluster was identified in three ways:
The cluster which had the most similar compound;
The cluster which had the most similar cluster centre;
The cluster with the highest average similarity.
Each of these was repeated considering only clusters with a minimum of 1 (i.e. singletons), 2, 3, 4 or 5 compounds.
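The three ways of picking a cluster for a query compound can be sketched as below. This is an illustration under stated assumptions rather than the procedure as implemented in the study: fingerprints are sets of integer feature identifiers, similarity is the Tanimoto coefficient, and the `Cluster` class, the rule names ("compound", "centre", "average") and the `min_size` parameter are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence

Fingerprint = FrozenSet[int]  # assumption: set of hashed feature identifiers


@dataclass
class Cluster:
    name: str
    centre: Fingerprint
    members: List[Fingerprint]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity between two binary fingerprints held as sets."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# One scoring function per selection rule described in the text.
SCORERS: Dict[str, Callable[[Fingerprint, Cluster], float]] = {
    # similarity to the single most similar member of the cluster
    "compound": lambda q, c: max(tanimoto(q, m) for m in c.members),
    # similarity to the cluster centre
    "centre": lambda q, c: tanimoto(q, c.centre),
    # mean similarity to all members of the cluster
    "average": lambda q, c: mean(tanimoto(q, m) for m in c.members),
}


def most_similar_cluster(
    query: Fingerprint, clusters: Sequence[Cluster], rule: str, min_size: int = 1
) -> Optional[Cluster]:
    """Pick the best cluster under one rule, ignoring clusters below min_size."""
    eligible = [c for c in clusters if len(c.members) >= min_size]
    if not eligible:
        return None
    return max(eligible, key=lambda c: SCORERS[rule](query, c))
```

Calling `most_similar_cluster` once per rule and per `min_size` value (1 to 5) reproduces the grid of comparisons described above; the three rules need not agree for a given query.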
The compounds with unidentified points of interaction in the pathway were included in the training sets used in the clustering analysis in order to investigate how their inclusion affected the ability of this technique to predict the interactions of the query compounds in the test sets.
Each dataset was then analysed to see how many of the clusters contain compounds with a predominant interaction. The levels of predominance used in this analysis were 50%, 66%, and 75%, and considered clusters with a minimum size of 1 (singletons), 2, 3, 4 or 5 compounds. For example, the datasets were analysed to see how many clusters have at least 50% of their members with the same interaction.
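The predominance analysis can be sketched as follows. The sketch assumes that each cluster is given as a list of interaction labels, one per member, with `None` for compounds whose point of interaction is unknown, and that the fraction is taken over all members of the cluster, including unlabelled ones. Both choices are assumptions of this example, as are the function names.

```python
from collections import Counter
from typing import Mapping, Optional, Sequence


def predominant_interaction(
    labels: Sequence[Optional[str]], level: float, min_size: int = 1
) -> Optional[str]:
    """Return the interaction shared by at least `level` of the members.

    Returns None if the cluster is smaller than min_size or if no
    interaction reaches the required fraction. If two interactions tie
    at the top, the one seen first is returned.
    """
    if len(labels) < min_size:
        return None
    counts = Counter(label for label in labels if label is not None)
    if not counts:
        return None
    label, count = counts.most_common(1)[0]
    return label if count / len(labels) >= level else None


def count_predominant_clusters(
    clusters: Mapping[str, Sequence[Optional[str]]], level: float, min_size: int = 1
) -> int:
    """Count the clusters that have a predominant interaction."""
    return sum(
        predominant_interaction(members, level, min_size) is not None
        for members in clusters.values()
    )
```

Evaluating `count_predominant_clusters` with `level` set to 0.50, 0.66 and 0.75 and `min_size` from 1 to 5 corresponds to the combinations of predominance level and minimum cluster size considered in the text.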
These results show that some of the descriptors and clustering levels used are able to classify compounds into clusters that have predominantly one type of interaction. Next, we look at whether the clusters can be used to identify the interactions of compounds in the test sets.
As before, varying the percentage of compounds in a cluster that must have the same interaction for that interaction to be assigned to a query compound has little effect on the order of performance of the descriptors, although there is a slight drop in the number of correctly identified interactions. Similarly, no difference is seen whether the most similar cluster is identified using the most similar compound, the most similar cluster centre, or the highest average similarity.
Table: Average Number of Queries with a Similarity ≥ 0.7 (by minimum cluster size; descriptors listed: BCUT with GCUT, GCUT with Properties, ECFP4 with Properties).
The drawback of this technique is that identification of the point of interaction is limited to compounds that are similar to those used in the initial clustering analysis. If a novel compound that is structurally distinct from the known compounds is found to interact with the pathway, the technique used here may not be sufficient to identify its point of interaction. Incorporating techniques used in scaffold-hopping, such as reduced graphs, may help to overcome such limitations. Representing molecules as a set of connected features (e.g. an aromatic ring system or an aliphatic link joining two other features together) and using these representations in a search would allow molecules with the same connections of features to be retrieved. Such molecules would be less structurally similar than those considered in the work presented here whilst (hopefully) having the same functionality, allowing more diverse molecules with similar interactions to be found. Other methods may include creating pharmacophores from molecules with the same interactions and finding compounds which fit the pharmacophore.
In this analysis we have shown that it is possible to use noisy data obtained from the literature to link together chemoinformatics and network biology, specifically a cellular pathway network. The clusters produced from such data have been shown to be fairly robust. The information gained from clustering can help to decide on the mechanism of action for compounds that are known to interact somewhere in the NF-κB pathway, and could be used to help infer which other untested compounds interact with the pathway, and where. Here, ECFP4 and ECFP4 with Property descriptors have been shown to be the best at producing clusters which can be used to identify the interactions of an external set of compounds. An interesting question is whether the techniques used here could find compounds which alter the timings, and hence the function, of the system. The results presented also show the general applicability of the similar property principle.
YP, CH and MW would like to thank the BBSRC (SCIBS grant codes: BBE01366X1 and BBE0136001; SABR grant code BBF0059381) for financial support for this project. This work was directly supported by Jim Thomas, Dave Spiller, Clare Vickers and Paul Dobson and indirectly by Dean Jackson and William Rowe.
- Alon U: An introduction to systems biology: design principles of biological circuits. London: Chapman and Hall/CRC; 2006.
- Klipp E, et al.: Systems Biology in Practice: Concepts, Implementation and Clinical Application. Berlin: Wiley/VCH; 2005.
- Palsson BØ: Systems biology: properties of reconstructed networks. Cambridge: Cambridge University Press; 2006.
- Lehár J, et al.: Combination chemical genetics. Nature Chemical Biology 2008, 4: 674-681.
- Smukste I, Stockwell BR: Advances in chemical genetics. Annual Review of Genomics and Human Genetics 2005, 6: 261-286. 10.1146/annurev.genom.6.080604.162136
- Stockwell BR: Chemical genetics: ligand-based discovery of gene function. Nature Reviews Genetics 2000, 1: 116-125. 10.1038/35038557
- Nelson DE, et al.: Oscillations in Transcription Factor Dynamics: A New Way to Control Gene Expression. Biochemical Society Transactions 2004, 32(6): 1090-1092. 10.1042/BST0321090
- Ashall L, et al.: Pulsatile Stimulation Determines Timing and Specificity of NF-κB-Dependent Transcription. Science 2009, 324: 242-246. 10.1126/science.1164860
- He D, et al.: The NF-kappa B inhibitor, celastrol, could enhance the anti-cancer effect of gambogic acid on oral squamous cell carcinoma. BMC Cancer 2009, 9: 343. 10.1186/1471-2407-9-343
- Andrade-Silva AR, et al.: Effect of NFkappaB inhibition by CAPE on skeletal muscle ischemia-reperfusion injury. J Surg Res 2009, 153(2): 254-62. 10.1016/j.jss.2008.04.009
- Khan N, Mukhtar H: Multitargeted therapy of cancer by green tea polyphenols. Cancer Lett 2008, 269(2): 269-80. 10.1016/j.canlet.2008.04.014
- Siwale RC, et al.: The effect of intracellular delivery of catalase and antisense oligonucleotides to NF-kappaB using albumin microcapsules in the endotoxic shock model. J Drug Target 2009, 17(9): 701-9. 10.3109/10611860903062070
- De Stefano D, De Rosa G, Carnuccio R: NFkappaB decoy oligonucleotides. Curr Opin Mol Ther 2010, 12(2): 203-13.
- Johnson MA, Maggiora GM (Eds): Concepts and Applications of Molecular Similarity. New York: Wiley; 1990.
- Maggiora GM: On Outliers and Activity Cliffs - Why QSAR Often Disappoints. Journal of Chemical Information and Modeling 2006, 46(4): 1535. 10.1021/ci060117s
- Accelrys: Pipeline Pilot. 2009.
- Corbett P, Murray-Rust P: High-Throughput Identification of Chemistry in Life Science Texts. In Computational Life Sciences II. Edited by: Berthold MR, Glen R, Fischer I. Springer Berlin/Heidelberg; 2006: 107-118.
- Hughes TR, et al.: Functional discovery via a compendium of expression profiles. Cell 2000, 102: 109-126. 10.1016/S0092-8674(00)00015-5
- Kell DB, King RD: On the optimization of classes for the assignment of unidentified reading frames in functional genomics programmes: the need for machine learning. Trends in Biotechnology 2000, 18: 93-98. 10.1016/S0167-7799(99)01407-9
- Patel Y, et al.: Assessment of Additive/Nonadditive Effects in Structure-Activity Relationships: Implications for Iterative Drug Design. Journal of Medicinal Chemistry 2008, 51(23): 7552-7562. 10.1021/jm801070q
- Chemical Computing Group Inc: Molecular Operating Environment. 2009.
- Nobeli I, et al.: A Structure-Based Anatomy of the E. coli Metabolome. Journal of Molecular Biology 2003, 336: 697-719. 10.1016/j.jmb.2003.10.008
- Gillet VJ, Willett P, Bradshaw J: Similarity Searching Using Reduced Graphs. Journal of Chemical Information and Computer Sciences 2003, 43(2): 338-45.
- Meng QJ, et al.: Ligand modulation of REV-ERBα function resets the peripheral circadian clock in a phasic manner. Journal of Cell Science 2008, 121(21): 3629-3635. 10.1242/jcs.035048
- Heynekamp JJ, et al.: Substituted trans-stilbenes, including analogues of the natural product resveratrol, inhibit the human tumor necrosis factor α-induced activation of transcription factor nuclear factor κB. Journal of Medicinal Chemistry 2006, 49(24): 7182-7189. 10.1021/jm060630x
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.