- Research
- Open Access
Multi-target drug repositioning by bipartite block-wise sparse multi-task learning
- Limin Li^{1},
- Xiao He^{2, 3} and
- Karsten Borgwardt^{2, 3}
https://doi.org/10.1186/s12918-018-0569-7
© The Author(s) 2018
- Published: 24 April 2018
Abstract
Background
Finding potential drug targets is a crucial step in drug discovery and development. Recently, resources such as the Library of Integrated Network-Based Cellular Signatures (LINCS) L1000 database provide gene expression profiles induced by various chemical and genetic perturbations and thereby make it possible to analyze the relationship between compounds and gene targets at a genome-wide scale. Current approaches for comparing the expression profiles are based on pairwise connectivity mapping analysis. However, this method makes the simple assumption that the effect of a drug treatment is similar to knocking down its single target gene. Since many compounds can bind multiple targets, the pairwise mapping ignores the combined effects of multiple targets, and therefore fails to detect many potential targets of the compounds.
Results
We propose an algorithm to find sets of gene knock-downs that induce gene expression changes similar to a drug treatment. Assuming that the effects of gene knock-downs are additive, we propose a novel bipartite block-wise sparse multi-task learning model with super-graph structure (BBSS-MTL) for multi-target drug repositioning that overcomes the restrictive assumptions of connectivity mapping analysis.
Conclusions
The proposed method BBSS-MTL is more accurate for predicting potential drug targets than the simple pairwise connectivity mapping analysis on five datasets generated from different cancer cell lines.
Availability
The code can be obtained at http://gr.xjtu.edu.cn/web/liminli/codes.
Keywords
- Drug repositioning
- Multi-task learning
- L1000
Background
In recent years, multi-target drugs - that is, drugs that affect more than one gene or protein - have moved into the focus of drug discovery and development [1, 2]. The first reason for this shift is that multi-target drugs have been found to be more effective than single-target alternatives for several complex diseases, such as cancer and metabolic diseases [1, 3–5]. The rationale behind this observation is that inhibiting a single target is often not strong enough to affect an entire biological process, whereas weaker inhibition of multiple targets may have a stronger combined effect than blocking a single target. A second reason to study multi-target drugs is that many drugs fail to be approved because of severe side effects in clinical trials [2, 6], which can arise when more than one target is affected. Therefore, finding potential compound targets is a crucial step in drug profiling, the process of seeking compounds that have a desired target or that lack undesired side effects.
Many machine learning methods have been proposed for finding potential drug targets based on compound structure [5]. The rationale is that structurally similar compounds may have similar targets, so the targets of a compound can be inferred by comparing its structure to those of known drugs. However, it has been shown that many compounds with similar structures have different effects [7]. Structure information alone is therefore not sufficient to detect potential drug targets accurately. Several other types of information have also been used for drug target prediction, such as drug sensitivity, drug side effects, gene expression, gene/protein structure, gene/protein function, protein-protein interaction (PPI) networks and metabolic networks [8–12]. In recent work, Liu et al. [13] approached the drug target problem using a new type of information, the LINCS L1000 dataset [14], which contains expression levels measured after single-gene knock-downs and after drug treatments. These data connect drugs and gene knock-downs directly through their regulatory effects on all the genes in a cell.
The LINCS L1000 dataset [14] is part of the Library of Integrated Network-Based Cellular Signatures (LINCS) Program (http://www.lincsproject.org/), which generates and publishes large datasets of measurements that quantify how cells respond to a variety of perturbing agents. Specifically, the LINCS L1000 platform (http://www.lincscloud.org/) provides large-scale gene expression assays in which cultured cells have been exposed to various chemical and genetic perturbations [14]. The LINCS L1000 dataset includes 20,413 small-molecule compounds and 18,493 shRNA knock-downs tested in 18 different cancer cell lines. After each perturbation, a gene expression profile is measured for the cell line. This large dataset makes it possible to analyze the relationship between compounds and gene targets at a genome-wide level.
Liu et al. [13] explored this relationship based on the assumption that a drug treatment and the knock-down of a target gene of this drug will induce similar gene expression changes in a sample. Using this idea, drug targets can be inferred by connectivity mapping analysis [15], that is, by finding knock-downs and drugs with similar gene expression profiles. Similarity between gene expression profiles is determined using gene set enrichment analysis [16], which quantifies whether a drug and a gene knock-down up- or down-regulate the same set of genes.
Connectivity mapping-based approaches [13, 15] produce a one-to-one mapping between drugs and gene knock-downs. However, the effect of a drug may not resemble that of knocking down a single target gene. Many drugs inhibit several known target genes as well as many closely related genes in various biological pathways. If a drug inhibits many genes, the gene expression measured after the drug treatment may differ from that measured after any single gene knock-down. Connectivity mapping ignores the additive effects of gene knock-downs, which exist in many biological systems [17–19].
Our goal in this paper is therefore to develop an approach for multi-target drug repositioning using the LINCS L1000 dataset that overcomes the restrictive assumptions of connectivity mapping analysis. We model the problem as finding combinations of gene knock-downs that induce gene expression changes similar to a drug treatment. Furthermore, we assume that the effect of a drug treatment can be modelled as the sum of the effects of knocking down each of its target genes individually, which is reasonable since additive effects of gene knock-downs exist in many biological systems [17–19]. Finally, we propose an efficient and effective multi-task machine learning approach for detecting potential drug targets that uses both expression data and compound structure information. The assumption of additive knock-down effects may not reflect the true underlying biological system. However, our experiments show that, in practice, it works much better than pairwise connectivity mapping for predicting potential drug targets.
The analysis of the LINCS L1000 dataset is further complicated by the fact that each drug treatment is replicated on several plates, each of which yields one gene expression signature of the drug treatment. The same holds for gene knock-downs: each genetic perturbation is performed with one of several shRNAs targeting the gene, so each gene knock-down is also represented by several signatures, which may vary across shRNAs. These replicates make the dataset more reliable, as redundant measurements reduce noise and better represent the spectrum of a drug's effects. However, they also make the analysis more complicated. For example, enrichment analysis-based methods cannot be directly applied to test the association between drugs and gene knock-downs, since drug treatments and gene knock-downs are represented by sets of signatures.
We propose a novel bipartite block-wise sparse multi-task learning method that detects the relationships between groups of drug signatures and groups of gene signatures in an unsupervised manner. The optimization problem can be solved with the accelerated proximal gradient method, which is more efficient than the computationally demanding enrichment analysis-based test. In terms of effectiveness, in extensive experiments on five cancer cell lines from the LINCS L1000 data [14], our method provided more accurate predictions of potential drug targets than the simple connectivity mapping-based test, as validated by known drug targets from the DrugBank database [20] together with either Gene Ontology (GO) function information from the GO database [21] or PPI information from the HPRD PPI network [22]. The predictions yield a unified, connected bipartite graph of drugs and genes across different cell lines, in which we find co-modules of drugs and genes, edges that recur across multiple cell lines, and meaningful connections between genes in the same pathways. These novel findings from the L1000 database support the effectiveness of our approach.
Methods
Materials
Data information for the five datasets
Cell line | No. drugs | No. drug treatments | Drug dose | Drug time | No. genes | No. knock-down treatments | Knock-down time |
---|---|---|---|---|---|---|---|
HCC515 | 144 | 504 | 10 μM | 24h | 156 | 1715 | 96h |
HT29 | 44 | 160 | 10 μM | 24h | 174 | 2543 | 96h |
PC3 | 329 | 2513 | 10 μM | 24h | 223 | 2954 | 96h |
SW480 | 4 | 8 | 10 μM | 6h | 6 | 36 | 96h |
MCF7 | 293 | 1608 | 10 μM | 24h | 219 | 2655 | 96h |
We also downloaded the drug structures from the KEGG database [23] and computed the structural similarities among the drugs by applying the SIMCOMP software [24] to the drug structures.
Approach
List of notations
Notation | Description |
---|---|
A | \(p\times m\) matrix of gene expressions for gene knock-downs. |
B | \(p\times n\) matrix of gene expressions for drug treatments. |
K | \(b\times b\) similarity matrix among the b drugs. |
p | number of landmark genes whose expression is measured. |
a | number of knocked-down genes, i.e. the number of column blocks in A. |
b | number of drugs, i.e. the number of column blocks in B. |
\(m_{i}\) | number of signatures for knocking down gene i. |
\(n_{j}\) | number of signatures for treatments with drug j. |
m | \(\sum_{i} m_{i}\), total number of gene knock-down experiments. |
n | \(\sum_{j} n_{j}\), total number of drug treatment experiments. |
W | \(m\times n\) weight matrix. |
\(W^{ij}\) | \(m_{i}\times n_{j}\) matrix, the (i,j) block of W. |
\(W^{:j}\) | \(m\times n_{j}\) matrix, the (·,j) sub-matrix of W. |
\(W^{i:}\) | \(m_{i}\times n\) matrix, the (i,·) sub-matrix of W. |
K(s,t) | \(k_{st}\), the (s,t) entry of K. |
\(w^{j}_{l}\) | the lth column of the block \(W^{:j}\). |
\(\bar{w}_{j}\) | mean of the columns of \(W^{:j}\). |
As mentioned, the aim of this paper is to find sets of gene knock-downs that induce gene expression changes similar to those of a drug treatment. In other words, we would like to learn a weight matrix W (see Fig. 1) such that each block of B can be approximated by a combination of blocks in A, which leads to an optimization problem of the general form

\[ \min_{W}\; L(A,B,W) + \Phi(W), \]

where L(A,B,W) is a loss function and Φ(W) is a regularization term. This is a typical multi-task learning problem [25], in which a task corresponds to a column of W and a feature corresponds to a row of W.
In our biological application, a task represents a specific signature for a drug treatment and a feature represents a specific signature for a gene knock-down. Thus, each drug treatment with multiple signatures corresponds to a group of tasks, and each gene knock-down with multiple signatures corresponds to a group of features.
However, in our scenario we want neither sparse entries in W, as induced by ℓ_{1} regularization, nor sparse rows in W, as induced by ℓ_{2,1} group sparsity regularization. As illustrated in Fig. 1, we would like W to be block-wise sparse in a bipartite way, such that a few groups of features are selected for each group of tasks. As a result, each drug has only a small number of potential targets.
In “Bipartite block-wise sparse multi-task learning” section, we propose a bipartite block-wise sparse multi-task learning model and an efficient optimization algorithm to solve the problem. In “Graph structure on group of tasks” section we integrate the compound structure information into the proposed bipartite block-wise sparse multi-task learning model. In “Association stability score” section we introduce the stability selection strategy for parameters.
Bipartite block-wise sparse multi-task learning
where \(\alpha_{ij}\) is a weight for each block, which can simply be chosen as the number of entries in the (i,j) block of W. Note that when \(\alpha_{ij}=1\), the first regularization term is equal to the ℓ_{1} regularization.
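Model (2) is not reproduced in this section, so the following is only a sketch under an assumption: that the block term has the group-lasso form \(\sum_{i,j}\alpha_{ij}\|W^{ij}\|_F\) with \(\alpha_{ij}=m_{i}n_{j}\), the number of entries in the block as suggested above. It shows how the blocks are indexed from the sizes \(m_i\) and \(n_j\), and how the proximal operator of this term sets whole gene-drug blocks to zero at once. The function names are illustrative and are not taken from the authors' code.

```python
import numpy as np


def block_slices(sizes):
    """Turn group sizes such as [m_1, ..., m_a] into consecutive index slices."""
    edges = np.concatenate(([0], np.cumsum(sizes)))
    return [slice(int(lo), int(hi)) for lo, hi in zip(edges[:-1], edges[1:])]


def block_penalty(W, row_sizes, col_sizes):
    """Assumed block term: sum_ij alpha_ij * ||W^{ij}||_F with alpha_ij = #entries."""
    total = 0.0
    for rs in block_slices(row_sizes):
        for cs in block_slices(col_sizes):
            blk = W[rs, cs]
            # np.linalg.norm of a 2-D array with default ord is the Frobenius norm
            total += blk.size * np.linalg.norm(blk)
    return total


def block_prox(W, row_sizes, col_sizes, lam):
    """Proximal operator of lam * block_penalty: block-wise soft-thresholding.

    Each block is shrunk towards zero by lam * alpha_ij in Frobenius norm and
    set exactly to zero when its norm does not exceed that threshold."""
    out = np.zeros_like(W)
    for rs in block_slices(row_sizes):
        for cs in block_slices(col_sizes):
            blk = W[rs, cs]
            norm = np.linalg.norm(blk)
            thresh = lam * blk.size
            if norm > thresh:
                out[rs, cs] = (1.0 - thresh / norm) * blk
    return out
```

Because the blocks partition W, the proximal step decomposes into independent block-wise shrinkage, which is what allows a proximal gradient solver to switch an entire (gene, drug) block off in a single step.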
Graph structure on group of tasks
The chemical structure of drugs is commonly used in drug target discovery, since structurally similar drugs often share similar targets [5]. Suppose we are given a matrix \(K\in\mathbb{R}^{b\times b}\) that contains the structural similarities among all b drugs in our scenario. K can be regarded as the weighted adjacency matrix of a graph over the drugs. To integrate drug structure information into the above model (2), we further develop a novel multi-task learning model with a super-graph structure.
In general multi-task learning, imposing a graph structure on the tasks can increase accuracy [28]. In our application, however, the graph structure should be imposed on groups of tasks rather than on individual tasks; we refer to this as a super-graph structure.
The first term ensures that the tasks in each group are as close to the group center as possible, and the second term ensures that the group centers have the graph structure that is represented by K.
Optimization algorithm
where \(\tilde {L} = \tilde {H}+ELE^{T}\).
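The super-graph regularizer itself is not reproduced in this section. Assuming it takes the form \(\sum_{j}\sum_{l}\|w^{j}_{l}-\bar{w}_{j}\|^{2} + \tfrac{1}{2}\sum_{s,t}k_{st}\|\bar{w}_{s}-\bar{w}_{t}\|^{2}\), which matches the two terms described above, it can be rewritten as \(\mathrm{tr}(W\tilde{L}W^{T})\) with \(\tilde{L}=\tilde{H}+ELE^{T}\). In this sketch we read \(\tilde{H}\) as the block-diagonal within-group centering matrix, E as the \(n\times b\) averaging matrix with \(E_{lj}=1/n_{j}\) for tasks l in group j, and L = D − K as the graph Laplacian of K; these readings are our assumptions, not the authors' stated definitions. The code evaluates both forms so that they can be compared numerically.

```python
import numpy as np


def group_matrices(col_sizes):
    """Assumed H-tilde (n x n centering) and E (n x b averaging) for task groups."""
    n, b = sum(col_sizes), len(col_sizes)
    H = np.zeros((n, n))
    E = np.zeros((n, b))
    start = 0
    for j, nj in enumerate(col_sizes):
        idx = slice(start, start + nj)
        H[idx, idx] = np.eye(nj) - np.full((nj, nj), 1.0 / nj)
        E[idx, j] = 1.0 / nj
        start += nj
    return H, E


def supergraph_penalty_explicit(W, col_sizes, K):
    """Within-group spread + 0.5 * sum_st K[s, t] * ||wbar_s - wbar_t||^2."""
    _, E = group_matrices(col_sizes)
    Wbar = W @ E                      # column j is the mean of the tasks in group j
    spread, start = 0.0, 0
    for j, nj in enumerate(col_sizes):
        block = W[:, start:start + nj]
        spread += np.sum((block - Wbar[:, [j]]) ** 2)
        start += nj
    diffs = Wbar[:, :, None] - Wbar[:, None, :]        # shape m x b x b
    graph = 0.5 * np.sum(K * np.sum(diffs ** 2, axis=0))
    return spread + graph


def supergraph_penalty_trace(W, col_sizes, K):
    """The same quantity as tr(W Ltilde W^T), Ltilde = H + E L E^T, L = D - K."""
    H, E = group_matrices(col_sizes)
    L = np.diag(K.sum(axis=1)) - K
    Ltilde = H + E @ L @ E.T
    return float(np.trace(W @ Ltilde @ W.T))
```

The trace form is convenient for optimization because its gradient with respect to W is simply \(2W\tilde{L}\) for symmetric K.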
The pseudocode of the proposed method BBSS-MTL is shown in Algorithm 1.
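Algorithm 1 is not reproduced in this section. As a hedged illustration of the accelerated proximal gradient method mentioned in the Background, the sketch below runs a FISTA-type iteration on an assumed objective \(\tfrac{1}{2}\|B-AW\|_F^2+\lambda_{3}\,\mathrm{tr}(W\tilde{L}W^{T})+g(W)\), where g is the nonsmooth block penalty and `prox` is a caller-supplied proximal operator for it (for instance, block-wise soft-thresholding). The fixed step size of one over the Lipschitz constant and the name `apg` are our choices, not taken from the paper.

```python
import numpy as np


def apg(A, B, Ltilde, prox, lam3=0.0, n_iter=500):
    """FISTA-style accelerated proximal gradient (sketch, assumed objective).

    Minimizes 0.5*||B - A W||_F^2 + lam3 * tr(W Ltilde W^T) + g(W), where
    prox(V, step) must return argmin_W g(W) + ||W - V||_F^2 / (2 * step).
    Ltilde (n x n) is assumed symmetric positive semidefinite."""
    m, n = A.shape[1], B.shape[1]
    # Lipschitz constant of the gradient of the smooth part
    lip = np.linalg.norm(A, 2) ** 2 + 2.0 * lam3 * np.linalg.norm(Ltilde, 2)
    step = 1.0 / lip
    W = np.zeros((m, n))
    V, t = W.copy(), 1.0
    for _ in range(n_iter):
        # gradient of the smooth part evaluated at the extrapolated point V
        grad = A.T @ (A @ V - B) + 2.0 * lam3 * V @ Ltilde
        W_next = prox(V - step * grad, step)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        V = W_next + ((t - 1.0) / t_next) * (W_next - W)   # momentum step
        W, t = W_next, t_next
    return W
```

With `prox` set to the identity and `lam3=0`, the iteration reduces to accelerated gradient descent on the least-squares loss, which is a convenient sanity check before plugging in the block penalty.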
Association stability score
In this section, we describe a stability selection strategy for handling the parameters of the proposed models. Suppose we have the sets Λ_{1}, Λ_{2} and Λ_{3} of candidate values for the parameters λ_{1}, λ_{2} and λ_{3}, respectively. For each parameter combination \(\lambda=\{\lambda_{1},\lambda_{2},\lambda_{3}\}\in\Lambda_{1}\times\Lambda_{2}\times\Lambda_{3}=\Lambda\), we define a probability score \(P_{\lambda}\in\mathbb{R}^{a\times b}\) for all blocks of W as follows. We first draw a subsample \(\{A_{t},B_{t}\}\) of p/2 rows from the data \(\{A,B\}\). Applying our BBSS-MTL model to this subsample, we obtain \(W_{t}\).
We repeat this procedure T times and compute the selection probability of each block {i,j} as \(P_{\lambda}(i,j)=\frac{1}{T}\sum_{t}\mathbb{1}\left(W_{t}^{ij}\neq 0\right)\), where \(\mathbb{1}(\cdot)\) is the indicator function. We compute these probabilities for each parameter combination λ∈Λ and define the association stability score of block {i,j} as their average over all parameter combinations, \(\mathrm{score}(i,j)=\frac{1}{|\Lambda|}\sum_{\lambda\in\Lambda}P_{\lambda}(i,j)\).
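The stability score amounts to a double loop over the parameter grid and the T subsamples. In this sketch, `fit(A_sub, B_sub, lam)` is a hypothetical placeholder for any BBSS-MTL solver that returns W, rows are subsampled without replacement (our assumption, since the text does not say), and a block counts as selected when any of its entries is nonzero.

```python
import numpy as np


def stability_scores(A, B, fit, lambdas, row_sizes, col_sizes, T=50, seed=0):
    """Association stability score for every (gene i, drug j) block.

    fit(A_sub, B_sub, lam) is a hypothetical solver returning W (m x n)."""
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    r_edges = np.concatenate(([0], np.cumsum(row_sizes)))
    c_edges = np.concatenate(([0], np.cumsum(col_sizes)))
    a, b = len(row_sizes), len(col_sizes)
    score = np.zeros((a, b))
    for lam in lambdas:
        hits = np.zeros((a, b))
        for _ in range(T):
            rows = rng.choice(p, size=p // 2, replace=False)   # half of the p genes
            W = fit(A[rows], B[rows], lam)
            for i in range(a):
                for j in range(b):
                    block = W[r_edges[i]:r_edges[i + 1], c_edges[j]:c_edges[j + 1]]
                    hits[i, j] += np.any(block != 0)           # block selected or not
        score += hits / T                                      # P_lambda(i, j)
    return score / len(lambdas)                                # mean over the grid
```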
Results
In this section, we first evaluate our approach on eight simulated datasets and show the effectiveness of BBSS-MTL's bipartite block sparsity and super-graph structure. We then apply our approach to find potential drug targets on datasets from five cell lines.
Simulation
A. Data Generation
We simulate data using the following scenario with p=50, m=200, n=80, a=20 and b=10. We first simulate \(A\in\mathbb{R}^{p\times m}\) and \(W\in\mathbb{R}^{m\times n}\), and then generate \(B=AW+E\in\mathbb{R}^{p\times n}\), where the elements of E are sampled from the standard normal distribution N(0,1). We assume that the groups of correlated input variables in A all have size 10.
To generate A, we first generate a prototype column vector \(\bar{A}_{i}\sim N(0,5I_{50})\) for each group i, where I_{50} is the 50-dimensional identity matrix. We then generate the columns in this group as \(A_{i_{k}}=\bar{A}_{i}+\epsilon\), where ε∼N(0,I_{50}), \(i_{k}\in\mathcal{I}_{i}\), and \(\mathcal{I}_{i}\) denotes the indices of the i-th column group of A. Repeating this procedure for each column group yields a matrix A with column group structure.
To generate \(W\in\mathbb{R}^{m\times n}\), we first generate a prototype \(W_{0}\in\mathbb{R}^{a\times b}\) as follows. We assume three groups of columns in W_{0}, with sizes 4, 3 and 3, respectively. First, input features are randomly selected for each of the three output groups (two for the first group, three for the second and three for the third). We then randomly choose one additional feature that is used by all three groups, and one further feature used only by the second and third groups. Hence, three features in total are chosen for the first group, five for the second and five for the third, so that the three groups overlap with each other to different degrees. We then generate W by setting the entries of its (i,j)-th block to W^{ij}=W_{0}(i,j)+ε, where ε∼N(0,0.1), in either a sparse or a dense way. Two datasets, Data1.0 and Data2.0, are generated with the sparse and the relatively dense scenarios, respectively.
We then obtain the similarity among the groups of tasks with a Gaussian kernel computed on the columns of W_{0}. To show the effectiveness of the super-graph structure, we randomly perturb W_{0} by changing t nonzero entries to zero and t zero entries to nonzero values, at different levels t=2, 5, 10. Using the same A and the perturbed W_{0}s, we generate datasets Data1.1, Data1.2, Data1.3 and Data2.1, Data2.2, Data2.3 from Data1.0 and Data2.0, respectively. Note that the first digit in a dataset's name indicates whether the entries in the nonzero blocks of W are sparse, while the last digit indicates the perturbation level.
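Three of the simulation steps can be sketched directly: building A with column-group structure, perturbing W_{0}, and computing the Gaussian kernel among its columns. This is a sketch under stated assumptions: N(0,5I_{50}) is read as covariance 5I (standard deviation √5), the values given to newly nonzero entries of W_{0} (Uniform(0.5, 1.5) here) and the kernel bandwidth `sigma` are not stated in the text and are our own choices.

```python
import numpy as np


def simulate_A(p, a, group_size, rng):
    """Build A (p x a*group_size): each column = group prototype + N(0, I_p) noise."""
    cols = []
    for _ in range(a):
        # N(0, 5 I_p) read as covariance 5 I_p, i.e. standard deviation sqrt(5)
        proto = rng.normal(0.0, np.sqrt(5.0), size=p)
        for _ in range(group_size):
            cols.append(proto + rng.normal(0.0, 1.0, size=p))
    return np.column_stack(cols)


def perturb_W0(W0, t, rng, low=0.5, high=1.5):
    """Set t random nonzero entries to zero and t random zero entries to nonzero.

    The Uniform(low, high) values for the new nonzeros are an assumption."""
    W = W0.copy()
    nonzero_idx = np.flatnonzero(W0)
    zero_idx = np.flatnonzero(W0 == 0)
    W.flat[rng.choice(nonzero_idx, size=t, replace=False)] = 0.0
    W.flat[rng.choice(zero_idx, size=t, replace=False)] = rng.uniform(low, high, size=t)
    return W


def gaussian_kernel_columns(W0, sigma=1.0):
    """K[s, t] = exp(-||W0[:, s] - W0[:, t]||^2 / (2 sigma^2)); sigma is assumed."""
    sq_dist = np.sum((W0[:, :, None] - W0[:, None, :]) ** 2, axis=0)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))
```

For example, `simulate_A(50, 20, 10, np.random.default_rng(0))` returns a 50 × 200 matrix, matching p=50 and m=200 above.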
B. Evaluating the BBSS-MTL without super-graph structure
With the eight datasets, we first check the prediction performance of the proposed block sparse multi-task learning method BBSS-MTL without super-graph structure, i.e. λ_{3}=0. We compare BBSS-MTL with other multi-task methods, including ℓ_{1} regularization and ℓ_{2,1} regularization. For each dataset, we first randomly split the p=50 samples into five folds. Each fold in turn is taken as test data, and the remaining four folds are taken as training data. Parameters are chosen by cross-validation on the training data only. Once W^{∗} has been calculated from the training data, the mean squared error on the test data, \(\mathrm{MSE}_{W^{*}}=\|B-AW^{*}\|_{F}\), is used to measure the learning performance.
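The evaluation protocol above splits the rows (samples) of A and B, not the columns. A minimal sketch, assuming `fit(A_train, B_train)` is a hypothetical solver that performs its own inner parameter tuning, could look as follows; it reports the mean and standard deviation of the test-fold Frobenius error, and whether the ± values in the table below are standard deviations is not stated here.

```python
import numpy as np


def cv_test_error(A, B, fit, n_folds=5, seed=0):
    """Split the p rows into folds and return mean and std of ||B_test - A_test W*||_F.

    fit(A_train, B_train) is a hypothetical solver returning W (m x n)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(A.shape[0]), n_folds)
    errors = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[q] for q in range(n_folds) if q != k])
        W = fit(A[train], B[train])          # tuning on training rows only
        errors.append(np.linalg.norm(B[test] - A[test] @ W))   # Frobenius norm
    return float(np.mean(errors)), float(np.std(errors))
```

Plugging in plain least squares, `fit = lambda As, Bs: np.linalg.lstsq(As, Bs, rcond=None)[0]`, gives a regularization-free baseline for comparison.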
The mean squared error of different regularization methods
Dataset | ℓ_{1} | ℓ_{2,1} | BBSS-MTL (λ_{1}, λ_{3}=0) | BBSS-MTL (λ_{3}=0) |
---|---|---|---|---|
Data1.0 | 3.472 ±0.023 | 4.272 ±0.055 | 3.328 ±0.018 | 3.325 ±0.017 |
Data1.1 | 3.401 ±0.020 | 4.324 ±0.058 | 3.269 ±0.016 | 3.268 ±0.018 |
Data1.2 | 3.409 ±0.019 | 4.238 ±0.072 | 3.265 ±0.016 | 3.266 ±0.016 |
Data1.3 | 3.513 ±0.019 | 4.430 ±0.076 | 3.388 ±0.019 | 3.391 ±0.020 |
Data2.0 | 1.313 ±0.031 | 1.315 ±0.018 | 1.014 ±0.007 | 1.016 ±0.007 |
Data2.1 | 1.331 ±0.037 | 1.337 ±0.025 | 1.031 ±0.007 | 1.029 ±0.007 |
Data2.2 | 1.372 ±0.028 | 1.383 ±0.026 | 1.087 ±0.008 | 1.089 ±0.008 |
Data2.3 | 1.506 ±0.025 | 1.529 ±0.030 | 1.199 ±0.011 | 1.200 ±0.008 |
C. Evaluating the BBSS-MTL with super-graph structure
To further evaluate our BBSS-MTL approach, we designed the following experiment. Suppose B is generated from A by a ground-truth W; then the nonzero blocks of W can be discovered by BBSS-MTL without super-graph structure. However, if the given B is instead generated by a perturbation \(\tilde{W}\) of W, can we still recover the useful information in W from A and B?
Connectivity mapping analysis
We now consider comparing two signature vectors of differential gene expression: one from a drug treatment and the other from a gene knock-down. We follow the method used in [30]. First, we rank the genes in each of the two vectors to obtain its up- and down-regulated gene sets. We then apply the connectivity mapping enrichment test twice, to test whether the up- and down-regulated gene sets from one vector are enriched in the other vector. Finally, the two p-values are combined using Fisher's inverse chi-square test statistic, whose null distribution is obtained by random permutations.
As mentioned, this method cannot be directly applied to compare a drug treatment and a gene knock-down, as both are represented by groups of signatures in the L1000 dataset. We therefore compared all pairs of signatures from a drug treatment and a gene knock-down and kept the smallest p-value. Finally, we performed this one-to-one mapping for all pairs of drug treatments and gene knock-downs and ranked the pairs by their p-values.
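The combination and aggregation steps can be sketched as below. Two points are assumptions of the sketch: the up- and down-regulation enrichment p-values are taken as given (the enrichment test itself is not implemented), and the combined p-value uses the parametric χ² reference distribution rather than the permutation null described above. For two p-values, Fisher's statistic \(X=-2(\ln p_{\mathrm{up}}+\ln p_{\mathrm{down}})\) has 4 degrees of freedom, whose survival function \(e^{-x/2}(1+x/2)\) simplifies to \(q(1-\ln q)\) with \(q=p_{\mathrm{up}}p_{\mathrm{down}}\).

```python
import math


def fisher_combine_two(p_up, p_down):
    """Fisher's method for two p-values with a chi-square(4 df) reference.

    The 4-df survival function exp(-x/2) * (1 + x/2) evaluated at
    x = -2 ln(q) equals q * (1 - ln q), with q = p_up * p_down."""
    q = p_up * p_down
    if q == 0.0:
        return 0.0
    return q * (1.0 - math.log(q))


def rank_drug_gene_pairs(pair_pvalues):
    """pair_pvalues maps (drug, gene) -> list of (p_up, p_down), one per signature pair.

    Each drug-gene pair keeps its smallest combined p-value over signature
    pairs; the pairs are then ranked from most to least significant."""
    best = {}
    for key, pvals in pair_pvalues.items():
        best[key] = min(fisher_combine_two(pu, pd) for pu, pd in pvals)
    return sorted(best.items(), key=lambda kv: kv[1])
```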
Experimental results
For each of the five real datasets, we applied our BBSS-MTL approach to find associations between the drugs and the knocked-down genes. We set λ_{1}=1 and λ_{3}=10^{4}, and chose the parameter set Λ_{2} such that its smallest value yields a dense W and its largest value yields a W with all zeros. With these parameter settings, stability analysis was used to obtain a stability score for each drug-gene pair.
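One common way to find the upper end of such a grid is the smallest penalty weight at which W = 0 is optimal. A sketch under an assumption: ignoring the λ_{1} term (whose exact form is not reproduced in this section) and taking the block term as \(\lambda_{2}\sum_{i,j}\alpha_{ij}\|W^{ij}\|_F\) with a least-squares loss, the optimality condition at W = 0 gives \(\lambda_{2}^{\max}=\max_{i,j}\|(A^{T}B)^{ij}\|_F/\alpha_{ij}\). The super-graph trace term does not change this value, since its gradient vanishes at W = 0. The ratio defining the dense end of the grid is an arbitrary choice.

```python
import numpy as np


def lambda2_grid(A, B, row_sizes, col_sizes, n_values=10, ratio=1e-3):
    """Log-spaced grid from lambda_max (all blocks zero) down to ratio * lambda_max.

    Assumes the penalty lam * sum_ij alpha_ij ||W^{ij}||_F, alpha_ij = #entries,
    with loss 0.5 * ||B - A W||_F^2 and no separate l1 term."""
    G = A.T @ B                                  # negative gradient of the loss at W = 0
    r = np.concatenate(([0], np.cumsum(row_sizes)))
    c = np.concatenate(([0], np.cumsum(col_sizes)))
    lam_max = 0.0
    for i in range(len(row_sizes)):
        for j in range(len(col_sizes)):
            blk = G[r[i]:r[i + 1], c[j]:c[j + 1]]
            lam_max = max(lam_max, np.linalg.norm(blk) / blk.size)
    return np.geomspace(lam_max, ratio * lam_max, n_values)
```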
Since the number of approved drug-target pairs is limited, we evaluated our predicted potential drug targets by integrating gene function information from the Gene Ontology or protein-protein interactions from a PPI network. We define the potential targets of a drug as genes that lie up- or downstream of, and are closely related to, its known targets in pathways. We first computed semantic similarity scores among all genes in each of our five datasets using the R package GOSim [31]. Two genes with a high semantic score are considered to have similar functions. If a predicted gene's semantic similarity score with any known target gene of a drug is higher than 0.8, it is considered a potential target of the drug. By comparing our predictions with these GO-based labels, we calculated the AUC (area under the ROC curve) value AUC_GO for each cell line dataset. We also evaluated our results using a human PPI network, the Human Protein Reference Database (HPRD) [22]. We computed pairwise distances among all genes in each of our datasets as shortest-path lengths. Two genes connected in the PPI network are considered to interact physically. Thus, if a predicted gene is connected to any known target gene of a drug, it is considered a potential target of the drug. We computed AUC_PPI by comparing our predictions with these PPI-based labels.
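The GO-based evaluation can be sketched as two steps: label each (drug, gene) pair using the 0.8 semantic-similarity threshold, then compute the AUC of the stability scores against those labels. In this sketch `semsim(gene, target)` is a hypothetical callable standing in for precomputed GOSim scores, and the AUC is computed with the Mann-Whitney statistic (ties count one half). PPI-based labels could be built the same way by replacing the similarity test with a connectivity test.

```python
def auc_from_scores(scores, labels):
    """ROC AUC via the Mann-Whitney statistic; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


def go_labels(pairs, known_targets, semsim, threshold=0.8):
    """A (drug, gene) pair is positive if the gene's GO semantic similarity to
    any known target of the drug exceeds the threshold (0.8 in the text).

    semsim(gene, target) is a hypothetical lookup of precomputed scores."""
    return [any(semsim(gene, t) > threshold for t in known_targets.get(drug, ()))
            for drug, gene in pairs]
```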
The area under the ROC curve (AUC) for the prediction
Cell line | Lasso AUC_GO | Lasso AUC_PPI | Connectivity mapping AUC_GO | Connectivity mapping AUC_PPI | BBSS-MTL AUC_GO | BBSS-MTL AUC_PPI |
---|---|---|---|---|---|---|
HCC515 | 0.528 | 0.510 | 0.456 | 0.549 | 0.592 | 0.537 |
HT29 | 0.458 | 0.563 | 0.479 | 0.556 | 0.558 | 0.603 |
PC3 | 0.491 | 0.503 | 0.509 | 0.561 | 0.550 | 0.580 |
SW480 | 0.500 | 0.400 | 0.444 | 0.609 | 0.769 | 0.719 |
MCF7 | 0.541 | 0.588 | 0.492 | 0.541 | 0.571 | 0.606 |
Discussion
Additive effects of NFKB1 and IKBKB for immunologic drugs
Top predictions of potential drug targets on SW480 cell line
Drug | Gene | Stability score |
---|---|---|
Thalidomide | IKBKB | 0.927 |
Valproic acid | IKBKB | 0.920 |
Thalidomide | NFKB1 | 0.913 |
Valproic acid | NFKB1 | 0.900 |
Sirolimus | IKBKB | 0.880 |
Sirolimus | NFKB1 | 0.873 |
Auranofin | IKBKB | 0.847 |
Auranofin | NFKB1 | 0.847 |
A unified bipartite graph of drugs and targets in three cell lines
Some of the novel findings are supported by the literature or by pathway analysis. For example, in the HT29 cell line dataset, the drugs Fluorometholone, Tropicamide, Budesonide, Indapamide, Nabumetone, Warfarin and Fluocinonide are predicted to be associated with the genes VDR and TEK. VDR encodes the vitamin D receptor, which is involved in lipid metabolism and calcium reabsorption. In the presence of a ligand, VDR binds to vitamin D response elements to either increase or repress transcription of target genes. Two known targets of Nabumetone, PTGS1 and PTGS2, are also known to be in the lipid metabolism pathway. Nabumetone is used to treat osteoarthritis and rheumatoid arthritis. McAlindon et al. [35] showed that vitamin D may prevent progression of osteoarthritis, and Athanassiou et al. [36] pointed out that reduced vitamin D intake has been linked to increased susceptibility to rheumatoid arthritis. This suggests that knocking down VDR might have effects similar to those of taking Nabumetone, and that VDR could be a potential target of Nabumetone. The known targets of Tropicamide, CHRM1, CHRM2, CHRM3 and CHRM4, belong to the calcium signaling pathway, which is likely to be related to VDR. The primary target of Fluorometholone is the glucocorticoid receptor (GR). The bound receptor-ligand complex further binds to glucocorticoid response elements (GREs) in the promoter regions of many target genes. The VDR gene contains a number of putative GREs, and it has been shown that VDR could regulate the expression of glucocorticoid [37]. Therefore, VDR could contribute an additive effect to Fluorometholone treatment. The gene TEK, a receptor tyrosine kinase, is also likely to have an effect on this drug. Kuo et al. [38] showed that many glucocorticoid-regulated genes affect receptor tyrosine kinase signalling. GR primary targets could inhibit the insulin/IGF1 pathway, which propagates through receptor tyrosine kinases. Knocking down TEK may modulate the activity of this pathway and further regulate the expression of GR. There is little evidence that VDR is close to TEK in the same pathway, and our results suggest that the two genes might be affected together, through different pathways, by the five drugs, including Fluorometholone and Nabumetone.
It is also interesting that BBSS-MTL found edges between drugs and genes that recur in multiple cell lines. For example, the genes NTRK1, PTGER4, POLA1, VKORC1, GPRC5A, VDR, KCNMA1, IMPDH2 and PNP are predicted as potential targets of the drug Bortezomib in both the MCF7 and PC3 cell lines. The stability score of the Bortezomib-NTRK1 pair is ranked first in the MCF7 cell line and second in the PC3 cell line. Bortezomib is known to inhibit the 26S proteasome, thereby affecting the division of multiple myeloma and leukemic cells and inducing apoptosis. The gene NTRK1 also modulates the activity of the apoptosis pathway. Therefore, NTRK1 may be a target of Bortezomib or lie close to its targets in downstream pathways. TEK is also predicted to be related to Bortezomib in the PC3 cell line, so the results from the two cell lines connect NTRK1 and TEK through Bortezomib. In fact, NTRK1 and TEK are both receptor tyrosine kinases, and the 26S proteasome is known to degrade receptor tyrosine kinases [39]. The epidermal growth factor receptor (EGFR) is a known target of the drug Gefitinib. From the PC3 cell line, we recovered a connection between Gefitinib and the gene PTGER4. The association of PTGER4 with β-arrestin 1 and the c-Src signaling complex is known to result in the transactivation of EGFR [40], which suggests that PTGER4 may act together with EGFR in the response to Gefitinib.
To summarize, the results across different cell lines in the whole graph presented in Fig. 4 show many interesting and novel findings, including the unified connected bipartite graph, the co-modules of drugs and genes in the graph, the edges that recur across multiple cell lines, and meaningful connections between genes.
Conclusion
In this paper, we have proposed BBSS-MTL, a bipartite block-wise sparse multi-task learning approach for discovering multiple targets of drugs using the LINCS L1000 dataset. We assume that the effect of a drug treatment can be approximated by the sum of the effects of knocking down each of its targets individually, thereby accounting for the additive effects of multiple targets. Our results show that our model achieves higher accuracy in detecting potential drug targets than the widely used but simpler pairwise connectivity mapping. Novel discoveries made by our method, such as new biologically meaningful drug target candidates, the modules of drugs and genes from the three different cell lines, and the edges predicted in multiple cell lines, also reflect the effectiveness of our approach.
However, our approach has some limitations. For example, agonistic effects cannot be captured by knock-down experiments. A drug generally inhibits only part of a protein's functions, while knocking down a gene may eliminate all of them. In addition, our additivity assumption simplifies the complexity of the underlying biological system. Still, its better performance suggests that it describes the biological system underlying the L1000 data much better than the widely used pairwise mapping. It might be possible to drop the additivity assumption by constructing a more complex nonlinear model that considers the interactions among multiple targets of a drug. However, this would lead to a much higher computational load, which currently renders genome-wide analyses infeasible. We leave this problem to future work.
Declarations
Acknowledgements
The work was supported by the NSFC projects 11471256, 11631012 and the SNSF Starting Grant “Significant Pattern Mining”.
Funding
The publication charges for this article were funded by SNSF Starting Grant “Significant Pattern Mining”.
Availability of data and materials
The L1000 dataset is available at http://www.lincscloud.org/. The drug structure data were obtained from the KEGG database. The code is available at http://gr.xjtu.edu.cn/web/liminli/codes.
About this supplement
This article has been published as part of BMC Systems Biology Volume 12 Supplement 4, 2018: Selected papers from the 11th International Conference on Systems Biology (ISB 2017). The full contents of the supplement are available online at https://bmcsystbiol.biomedcentral.com/articles/supplements/volume-12-supplement-4.
Authors’ contributions
LL designed the work, LL and XH performed the experiments, LL, XH and KB wrote the manuscript. All authors revised and approved the manuscript.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
- Lu JJ, et al. Multi-target drugs: The trend of drug research and development. PLoS ONE. 2012; 7(6):e40262. https://doi.org/10.1371/journal.pone.0040262.
- Reddy AS, Zhang S. Polypharmacology: drug discovery for the future. Expert Rev Clin Pharmacol. 2013; 6(1). https://doi.org/10.1586/ecp.12.74.
- Paolini GV, et al. Global mapping of pharmacological space. Nat Biotechnol. 2006; 24(7):805–815.
- Csermely P, et al. The efficiency of multi-target drugs: the network approach might help drug design. Trends Pharmacol Sci. 2005; 26(4):178–82. https://doi.org/10.1016/j.tips.2005.02.007.
- Koutsoukas A, et al. From in silico target prediction to multi-target drug design: Current databases, methods and applications. J Proteomics. 2011; 74(12):2554–74. https://doi.org/10.1016/j.jprot.2011.05.011.
- Lounkine E, et al. Large-scale prediction and testing of drug activity on side-effect targets. Nature. 2012; 486(7403):361–7.
- Martin YC, et al. Do structurally similar molecules have similar biological activity? J Med Chem. 2002; 45(19):4350–8. https://doi.org/10.1021/jm020155c.
- Csermely P, et al. Structure and dynamics of molecular networks: A novel paradigm of drug discovery: A comprehensive review. Pharmacol Ther. 2013; 138(3):333–408. https://doi.org/10.1016/j.pharmthera.2013.01.016.
- Li L, et al. Predicting enzyme targets for cancer drugs by profiling human metabolic reactions in NCI-60 cell lines. BMC Bioinformatics. 2010; 11(1):1–16.
- Li L. MPGraph: multi-view penalised graph clustering for predicting drug-target interactions. IET Syst Biol. 2014; 8:67–73.
- Isik Z, et al. Drug target prioritization by perturbed gene expression and network information. Sci Rep. 2015; 5:17417.
- Laenen G, et al. Finding the targets of a drug by integration of gene expression data with a protein interaction network. Mol BioSyst. 2013; 9:1676–85. https://doi.org/10.1039/C3MB25438K.
- Liu C, et al. Compound signature detection on LINCS L1000 big data. Mol BioSyst. 2015; 11:714–22. https://doi.org/10.1039/C4MB00677A.
- Subramanian A, et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell. 2017; 171(6):1437–1452.e17.
- Lamb J, et al. The connectivity map: Using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006; 313(5795):1929–35. https://doi.org/10.1126/science.1132939.
- Subramanian A, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. PNAS. 2005; 102(43):15545–50. https://doi.org/10.1073/pnas.0506580102.
- Additive roles of XPA and MSH2 genes in UVB-induced skin tumorigenesis in mice. DNA Repair. 2002; 1(11):935–40.
- Alper H, Miyaoku K, Stephanopoulos G. Construction of lycopene-overproducing E. coli strains by combining systematic and combinatorial gene knockout targets. Nat Biotechnol. 2005; 23(5):612–6.
- Phillips PC. Epistasis — the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 2008; 9:855–67.
- Wishart DS, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006; 34(Database issue):D668–72. https://doi.org/10.1093/nar/gkj067.
- Ashburner M, et al. Gene Ontology: tool for the unification of biology. Nat Genet. 2000; 25(1):25–29.
- Keshava Prasad TS, et al. Human protein reference database—2009 update. Nucleic Acids Res. 2009; 37(suppl 1):D767–72.
- Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000; 28(1):27–30. https://doi.org/10.1093/nar/28.1.27.
- Prabhakara S, Acharya R. SimComp: A hybrid soft clustering of metagenome reads. In: PRIB’10. Berlin, Heidelberg: Springer; 2010. p. 113–124. http://dl.acm.org/citation.cfm?id=1887854.1887866.
- Swirszcz G, Lozano AC. Multi-level lasso for sparse multi-task regression. In: ICML. New York: ACM; 2012. p. 361–8. http://icml.cc/2012/papers/207.pdf.
- Zhou J, et al. Modeling disease progression via fused sparse group lasso. In: SIGKDD. Beijing: ACM; 2012. p. 1095–103. https://doi.org/10.1145/2339530.2339702.
- Goncalves AR, et al. Multi-task sparse structure learning. In: CIKM. New York: ACM; 2014. p. 451–60. https://doi.org/10.1145/2661829.2662091.
- Chen X, et al. Graph-structured multi-task regression and an efficient optimization method for general fused lasso. 2010. arXiv:1005.3579v1.
- Hollander M, Wolfe DA. Nonparametric Statistical Methods. Wiley Series in Probability and Statistics. New York: Wiley; 1999. A Wiley-Interscience publication. http://opac.inria.fr/record=b1095753.
- Hoshida Y, Brunet J-P, Tamayo P, Golub TR, Mesirov JP. Subclass mapping: Identifying common subtypes in independent disease data sets. PLoS ONE. 2007; 2(11):e1195. https://doi.org/10.1371/journal.pone.0001195.
- Yu G, et al. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics. 2010; 26(7):976–978. https://doi.org/10.1093/bioinformatics/btq064.
- Jeon K-I, et al. Gold compound auranofin inhibits IkappaB kinase (IKK) by modifying Cys-179 of IKKbeta subunit. Exp Mol Med. 2003; 35:61–66.
- Bennett G, et al. Valproic acid-induced alterations in growth and neurotrophic factor. Reprod Toxicol. 2000; 14(1):1–11. https://doi.org/10.1016/S0890-6238(99)00064-7.
- Deeb SA, et al. Vitamin E decreases valproic acid induced neural tube defects in mice. Neurosci Lett. 2000; 292(3):179–82. https://doi.org/10.1016/S0304-3940(00)01457-9.
- McAlindon TE, et al. Relation of dietary intake and serum levels of vitamin D to progression of osteoarthritis of the knee among participants in the Framingham study. Ann Intern Med. 1996; 125(5):353–9. https://doi.org/10.7326/0003-4819-125-5-199609010-00001.
- Kostoglou-Athanassiou A, et al. Vitamin D and rheumatoid arthritis. Ther Adv Endocrinol Metab. 2012; 3(6):181–7. https://doi.org/10.1177/2042018812471070.
- Hidalgo AA, et al. Glucocorticoid regulation of the vitamin D receptor. J Steroid Biochem Mol Biol. 2010; 121(1-2):372–5.
- Kuo T, et al. Genome-wide analysis of glucocorticoid receptor-binding sites in myotubes identifies gene networks modulating insulin signaling. PNAS. 2012; 109(28):11160–65.
- Sepp-Lorenzino L, et al. Herbimycin A induces the 20S proteasome- and ubiquitin-dependent degradation of receptor tyrosine kinases. J Biol Chem. 1995; 270(28):16580–7.
- Buchanan FG, et al. Role of β-arrestin 1 in the metastatic progression of colorectal cancer. PNAS. 2006; 103(5):1492–7.