Predicting the points of interaction of small molecules in the NF-κB pathway
© Patel et al; licensee BioMed Central Ltd. 2011
Received: 8 April 2010
Accepted: 22 February 2011
Published: 22 February 2011
The similar property principle has been used extensively in drug discovery to identify small molecules that interact with specific drug targets. Here we show that it can also be applied to identify the points of interaction of small molecules within the NF-κB signalling pathway.
Clusters that contain compounds with a predominant interaction within the pathway were created, which were then used to predict the interaction of compounds not included in the clustering analysis.
The technique successfully predicted the points of interaction of compounds that are known to interact with the NF-κB pathway. The method remained successful when compounds with unknown interaction points were included in the clustering analysis.
One of the most studied cellular signalling systems is the Nuclear Factor κB (NF-κB) network. The NF-κB family of transcription factors controls the transcription of at least 300 genes, but has different transcriptional and cell fate outcomes in different cells and in response to different stimuli. As well as being a critical component of the innate immune response, NF-κB controls cell division and apoptosis in most cell types. While the NF-κB signalling pathway has been studied in many papers (nearly 30,000 are returned by a PubMed search for "Nuclear Factor kappa B"), there is still a great deal about the system which is not understood. Recently, NF-κB proteins have been shown to oscillate between the cytoplasm and nucleus of stimulated cells and the frequency of these oscillations has been suggested to alter the pattern of gene expression. The discovery of the importance of these dynamic processes requires a re-interpretation of the previous literature.
NF-κB has been a much studied drug target in the pharmaceutical industry. Numerous traditional medicines have been shown to contain compounds that affect NF-κB activity. Many of these are now being investigated for pharmaceutical development, for example gambogic acid, caffeic acid phenyl ester, green tea polyphenols (reviewed by Khan and Mukhtar). In addition, NF-κB antisense oligonucleotides have recently been shown to affect outcome in a murine endotoxic shock model and NF-κB decoy oligonucleotides are of interest as potential therapy for inflammatory diseases (for review see ). The effects of NF-κB modulating drugs have been measured mostly using assays for NF-κB function that have been limited to easily available endpoints such as IκB degradation or DNA binding. As a result the interpretation of the site of action of these compounds may require re-analysis. The combination of the limited characterisation of the site of action, as well as the limited understanding of the NF-κB network, has meant that it has been difficult to interpret and compare the action of different NF-κB inhibitors. Here we use chemoinformatic approaches to cluster a set of known NF-κB modulatory compounds.
The methodology is based on the similar property principle (structurally similar compounds have similar properties), although the principle has known flaws. The main flaw is that a small structural change can lead to a dramatic change in a property (e.g. exchanging a hydrogen bond donor for an acceptor can greatly increase activity against a target), which has a major impact on the study of quantitative structure-activity/property relationships. In this study we therefore use the principle as a general rule rather than a specific one. In addition to identifying relationships between clusters of compounds and their biological functions, the clusters were also used to identify the points of interaction of compounds that are known to interact with the NF-κB pathway but were not used in the clustering analysis.
The compounds were obtained from a literature search, which in many instances involved manually searching for chemical structures using the chemical names present in the literature. Structures for the compounds can be found in Additional File 1 (chiral information has been included where known; however, only 2D information was used in the work presented here). Since this list was created, advances in text mining have made it possible to extract compound names from the literature automatically and to associate the names with structures, for example using Pipeline Pilot's ChemMining Collection or OSCAR3. A list produced in this way would be expected to resemble the diverse set collated here. Such lists, obtained for a cellular pathway, could be used (as here) to identify compounds that interact in a similar manner in a given pathway. Note that the technique has not been used here to identify novel compounds that interact within the pathway, but rather to identify the point of interaction of compounds that are already known to affect the pathway. As an additional aim of this work, we used all the compounds obtained from the literature search in this analysis, including those for which no specific point of interaction in the NF-κB pathway has been suggested, in order to investigate whether this step (or a similar first step in the drug discovery process) could also be automated.
For 297 compounds the type of interaction within the pathway was also taken from the literature. The interactions were defined as: interacting via a ROS mechanism (this can be at any point in the pathway); inhibiting IKK (at point 1 in Figure 2); inhibiting degradation/phosphorylation of IκB (point 2 in Figure 2); increasing degradation/phosphorylation of IκB (point 2); inhibiting translocation (point 3); and interfering with DNA binding (point 4). Four compounds had more than one interaction (see Additional File 1 for structures and references).
Table: Compounds in the various training and test sets. Column headings: Interfering with DNA binding; Inhibits IKK activation; Inhibits IκB degradation or phosphorylation; Activates IκB degradation or phosphorylation. Rows: Training set 1 and Test set 1 through Training set 5 and Test set 5. (Cell values are not present in this text.)
Each training set was clustered using Pipeline Pilot with the following descriptors: Extended Connectivity Fingerprints with a maximum feature diameter of 4 bonds (ECFP4), Property descriptors (AlogP, molecular weight, number of hydrogen bond acceptors, number of hydrogen bond donors, number of atoms, number of rotatable bonds, number of rings, and number of aromatic rings), ECFP4 with Property descriptors, BCUT (descriptors obtained from the eigenvalues of the adjacency matrix, weighting the diagonal elements with atom weights), GCUT (obtained from the eigenvalues of a modified graph distance adjacency matrix), BCUT with GCUT, and GCUT with Property descriptors. The BCUT and GCUT descriptors were calculated using MOE. Clustering was based on maximal dissimilarity partitioning, with the clusters derived by imposing a distance threshold between a molecule and its cluster representative. As the clustering algorithm in Pipeline Pilot depends on a seed compound, five different seeds were chosen for each descriptor and dataset combination (i.e. there were five different sets of clusters for each descriptor of each of the five datasets, giving a total of 25 different sets of clusters for each descriptor).
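The partitioning step can be illustrated with a minimal sketch. This is not Pipeline Pilot's implementation: it assumes fingerprints are held as sets of integer feature ids, that distance is one minus the Tanimoto coefficient, and that new cluster centres are added by maximal dissimilarity until every molecule lies within a chosen distance of some centre. The names `tanimoto`, `max_dissimilarity_clusters`, `seed` and `max_distance` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity between two fingerprints held as sets of feature ids."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def max_dissimilarity_clusters(
    fps: Sequence[Fingerprint], seed: int, max_distance: float
) -> Dict[int, List[int]]:
    """Map each centre index to the indices of the molecules assigned to it.

    Assumption: distance = 1 - Tanimoto. Starting from the seed molecule,
    the molecule farthest from all current centres becomes a new centre
    until no molecule is farther than max_distance from its nearest centre.
    """
    if max_distance < 0:
        raise ValueError("max_distance must be non-negative")
    centres = [seed]
    # Distance from every molecule to its nearest centre so far.
    nearest = [1.0 - tanimoto(fp, fps[seed]) for fp in fps]
    while True:
        farthest = max(range(len(fps)), key=lambda i: nearest[i])
        if nearest[farthest] <= max_distance:
            break
        centres.append(farthest)
        for i, fp in enumerate(fps):
            nearest[i] = min(nearest[i], 1.0 - tanimoto(fp, fps[farthest]))
    # Assign each molecule to its most similar centre.
    clusters: Dict[int, List[int]] = {c: [] for c in centres}
    for i, fp in enumerate(fps):
        best = max(centres, key=lambda c: tanimoto(fp, fps[c]))
        clusters[best].append(i)
    return clusters
```

Because the first centre is the seed, different seeds can give different partitions, which is consistent with repeating the clustering over several seeds.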
The number of clusters used in the clustering was chosen by using the following method: first the training set compounds were clustered into a set number of clusters (n), which varied from one to 200 (which would give an average of 2 compounds per cluster). For each n, the average self-similarity (avg-s) of the clusters was calculated. The value of n was chosen so that the biggest decrease in avg-s was seen between (n-1) and n clusters.
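The selection rule described above can be sketched as follows, assuming the avg-s values have already been computed for each candidate n and are supplied as a mapping from n to avg-s. The function name `choose_cluster_count` and the example values are hypothetical.

```python
from typing import Mapping


def choose_cluster_count(avg_s: Mapping[int, float]) -> int:
    """Return the n with the largest decrease in avg-s between (n-1) and n clusters."""
    candidates = sorted(n for n in avg_s if n - 1 in avg_s)
    if not candidates:
        raise ValueError("need avg-s for at least two consecutive cluster counts")
    # The decrease at n is avg_s[n-1] - avg_s[n]; ties resolve to the smallest n.
    return max(candidates, key=lambda n: avg_s[n - 1] - avg_s[n])


# Illustrative values only, not taken from the study.
example = {1: 0.30, 2: 0.28, 3: 0.20, 4: 0.19}
chosen = choose_cluster_count(example)
```

With these illustrative values the largest drop is between two and three clusters, so `chosen` is 3.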
The clusters were analysed to see whether the compounds they contained had predominantly one type of interaction. The interactions used to define the predominance of a cluster are those given above. The accuracy of the clustering procedure was validated by taking each compound in a test set in turn as a query compound, finding the cluster most similar to it, and assigning the predominant interaction of that nearest cluster to the test compound. The nearest cluster was found in one of the following ways:
1. The cluster which had the most similar compound;
2. The cluster which had the most similar cluster centre;
3. The cluster with the highest average similarity;
4. Each of the above, repeated considering only clusters with a minimum of 1 (i.e. including singletons), 2, 3, 4 or 5 compounds.
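The options listed above can be sketched in a few lines. This is an illustration only: it assumes Tanimoto similarity on fingerprints held as sets of feature ids, and it assumes that predominance is measured as a fraction of all cluster members, with compounds of unknown interaction stored as `None`. The names `Cluster`, `cluster_score`, `predominant_interaction` and `predict` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence

Fingerprint = FrozenSet[int]


@dataclass
class Cluster:
    centre: Fingerprint
    members: List[Fingerprint]
    interactions: List[Optional[str]]  # None marks an unknown interaction


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity between two fingerprints held as sets of feature ids."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_score(query: Fingerprint, cluster: Cluster, method: str) -> float:
    """Similarity of a query to a cluster under one of the three criteria."""
    sims = [tanimoto(query, m) for m in cluster.members]
    if method == "nearest_member":  # option 1: most similar compound
        return max(sims)
    if method == "centre":          # option 2: most similar cluster centre
        return tanimoto(query, cluster.centre)
    if method == "average":         # option 3: highest average similarity
        return sum(sims) / len(sims)
    raise ValueError(f"unknown method: {method}")


def predominant_interaction(cluster: Cluster, level: float) -> Optional[str]:
    """Most common known interaction if it covers at least `level` of all members."""
    known = [x for x in cluster.interactions if x is not None]
    if not known:
        return None
    label, count = Counter(known).most_common(1)[0]
    return label if count / len(cluster.members) >= level else None


def predict(query: Fingerprint, clusters: Sequence[Cluster],
            method: str = "nearest_member", min_size: int = 1,
            level: float = 0.5) -> Optional[str]:
    """Assign the predominant interaction of the nearest eligible cluster (option 4 via min_size)."""
    eligible = [c for c in clusters if len(c.members) >= min_size]
    if not eligible:
        return None
    nearest = max(eligible, key=lambda c: cluster_score(query, c, method))
    return predominant_interaction(nearest, level)
```

A return value of `None` means that no eligible cluster exists or that the nearest cluster has no interaction reaching the requested level of predominance.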
The compounds with unidentified points of interaction in the pathway were included in the training sets used in the clustering analysis in order to investigate how their inclusion affected the ability of using this technique to predict the interactions of the query compounds in the test sets.
Each dataset was then analysed to see how many of the clusters contain compounds with a predominant interaction. The levels of predominance used in this analysis were 50%, 66%, and 75%, and considered clusters with a minimum size of 1 (singletons), 2, 3, 4 or 5 compounds. For example, the datasets were analysed to see how many clusters have at least 50% of their members with the same interaction.
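The counting described above can be written compactly. The sketch below assumes each cluster is represented simply as a list of interaction labels, with `None` for compounds whose interaction is unknown, and that the fraction is taken over all members of the cluster. The function names and the example labels are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence

Labels = List[Optional[str]]


def has_predominant_interaction(labels: Labels, level: float) -> bool:
    """True if one known interaction accounts for at least `level` of the members."""
    known = [x for x in labels if x is not None]
    if not known:
        return False
    top_count = Counter(known).most_common(1)[0][1]
    return top_count / len(labels) >= level


def count_predominant(clusters: Sequence[Labels], level: float, min_size: int = 1) -> int:
    """Number of clusters of at least min_size members with a predominant interaction."""
    return sum(
        1
        for labels in clusters
        if len(labels) >= min_size and has_predominant_interaction(labels, level)
    )


# Illustrative labels only, not taken from the study.
example = [["ROS", "ROS", "IKK"], ["IKK"], ["DNA", "ROS"], [None, None, "ROS", "ROS"]]
counts = {level: count_predominant(example, level) for level in (0.5, 0.66, 0.75)}
```

Note that a singleton with a known interaction always counts as predominant, which is one reason to repeat the analysis with larger minimum cluster sizes.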
These results show that some of the descriptors and clustering levels used are able to classify compounds into clusters that have predominantly one type of interaction. Next, we look at whether the clusters can be used to identify the interactions of compounds in the test sets.
As before, varying the percentage of compounds in a cluster that must share an interaction before that interaction is assigned to a query compound has little effect on the relative performance of the descriptors, although there is a slight drop in the number of correctly identified interactions. Similarly, no difference is seen whether the nearest cluster is taken to be the one containing the most similar compound, the one with the most similar cluster centre, or the one with the highest average similarity.
Table: Average number of queries with a similarity >= 0.7, by minimum cluster size, for the descriptor sets BCUT with GCUT, GCUT with Properties, and ECFP4 with Properties. (Cell values are not present in this text.)
The drawback of this technique is that identification of the point of interaction is limited to compounds that are similar to those used in the initial clustering analysis. If a novel compound that is structurally distinct from the known compounds is found to interact with the pathway, the technique used here may not be sufficient to identify its point of interaction. Incorporating techniques used in scaffold-hopping, such as reduced graphs, may help to overcome this limitation. Representing molecules as a set of connected features (e.g. an aromatic ring system, or an aliphatic link joining two other features together) and using these representations in a search would retrieve molecules with the same connections of features. Such molecules would be less structurally similar than those retrieved in the work presented here whilst (hopefully) having the same functionality, allowing more diverse molecules with similar interactions to be found. Other methods may include creating pharmacophores from molecules with the same interactions and finding compounds that fit the pharmacophore.
In this analysis we have shown that it is possible to use noisy data obtained from the literature to link chemoinformatics with network biology, specifically a cellular pathway network. The clusters produced from such data have been shown to be fairly robust: the information gained from clustering can help to decide on the mechanism of action of compounds that are known to interact somewhere in the NF-κB pathway, and could be used to help infer which other untested compounds interact with the pathway, and where. Here, ECFP4 and ECFP4 with Property descriptors have been shown to be the best at producing clusters that can be used to identify the interactions of an external set of compounds. An interesting extension would be to test whether the techniques used here can find compounds that alter the timing, and hence the function, of the system. The results presented also show the general applicability of the similar property principle.
YP, CH and MW would like to thank the BBSRC (SCIBS grant codes: BBE01366X1 and BBE0136001; SABR grant code BBF0059381) for financial support for this project. This work was directly supported by Jim Thomas, Dave Spiller, Clare Vickers and Paul Dobson and indirectly by Dean Jackson and William Rowe.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.