Multi-target drug repositioning by bipartite block-wise sparse multi-task learning

Background Finding potential drug targets is a crucial step in drug discovery and development. Recently, resources such as the Library of Integrated Network-Based Cellular Signatures (LINCS) L1000 database provide gene expression profiles induced by various chemical and genetic perturbations and thereby make it possible to analyze the relationship between compounds and gene targets at a genome-wide scale. Current approaches for comparing the expression profiles are based on pairwise connectivity mapping analysis. However, this method makes the simple assumption that the effect of a drug treatment is similar to knocking down its single target gene. Since many compounds can bind multiple targets, the pairwise mapping ignores the combined effects of multiple targets, and therefore fails to detect many potential targets of the compounds. Results We propose an algorithm to find sets of gene knock-downs that induce gene expression changes similar to a drug treatment. Assuming that the effects of gene knock-downs are additive, we propose a novel bipartite block-wise sparse multi-task learning model with super-graph structure (BBSS-MTL) for multi-target drug repositioning that overcomes the restrictive assumptions of connectivity mapping analysis. Conclusions The proposed method BBSS-MTL is more accurate for predicting potential drug targets than the simple pairwise connectivity mapping analysis on five datasets generated from different cancer cell lines. Availability The code can be obtained at http://gr.xjtu.edu.cn/web/liminli/codes.


Background
In recent years, multi-target drugs -that is, drugs that affect more than one gene or protein -have been moving into the focus of drug discovery and development [1,2]. The first reason for this phenomenon is that multi-target drugs have been found to be more effective than single-target alternatives for several complex diseases, such as cancer and metabolic diseases [1,[3][4][5]. The rationale behind this observation is that the efficacy of the inhibition of a single target may often not be strong enough to affect the entire biological process, *Correspondence: xiao.he@bsse.ethz.ch 2 Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland 3 Swiss Institute of Bioinformatics (SIB), Basel, Switzerland Full list of author information is available at the end of the article which means that multiple targets with weaker inhibition may have a stronger combined effect than a single blocked target. A second reason to study multi-target drugs is that many drugs fail to be approved because of their severe side effects in clinical trials [2,6], which is a negative consequence of more than one target being affected. Therefore, finding potential compound targets is a crucial step in drug profiling, the process that seeks those compounds with a desired target or those without undesired side effects.
Many machine learning methods have been proposed for finding potential drug targets based on compound structure [5]. The rationale is that if two compounds are similar in structure, they may have similar targets. The targets of the compounds are inferred by comparing their structures to known drugs. However, it has been shown that many compounds with similar structure have different effects [7]. Therefore, considering only structure information is not sufficient to accurately detect potential drug targets. Several other types of information are also used for drug target prediction, such as drug sensitivity, drug side effects, gene expression, gene/protein structure, gene/protein function, Protein Protein Interaction (PPI) or metabolic network [8][9][10][11][12]. In recent work, Liu et al. [13] sought to solve the drug targeting problem by using a new type of information in form of the LINCS L1000 dataset [14], which includes expressions levels of single gene knock-downs and drug treatments. This information connect drugs and gene knock-downs directly through their regulation effects on all the genes in a cell.
This LINCS L1000 dataset [14] is a part of the Library of Integrated Network-Based Cellular Signatures (LINCS) Program (http://www.lincsproject.org/) that generates and publishes large datasets of measurements that quantify how cells respond to a variety of perturbing agents. Specifically, the LINCS L1000 platform (http://www.lincscloud.org/) provides large-scale gene expression assays in which cultured cells have been exposed to various chemical and genetic perturbations [14]. The LINCS L1000 dataset includes 20,413 smallmolecule compounds and 18,493 shRNAs knock-downs tested in 18 different cancer cell lines. After each perturbation, a gene expression profile for each cell line is obtained. This huge dataset creates the opportunity to analyze the relationship between compounds and gene targets at a genome-wide level.
Liu et al. [13] explored this relationship based on the assumption that a drug treatment and the knock-down of a target gene of this drug will induce similar gene expression changes in a sample. Using this idea, drug targets can be inferred by connectivity mapping analysis [15], that is, by finding knock-downs and drugs with similar gene expression profiles. Similarity between gene expression profiles is determined using the gene set enrichment analysis [16] that quantifies whether a drug and a gene knock-down up-or down-regulate the same set of genes.
Connectivity mapping-based approaches [13,15] lead to a one-to-one mapping between drugs and gene knockdowns. However, the effect of a drug may not resemble that of only knocking down its single-target gene. Many drugs are able to inhibit several known target genes and many closely related genes on various biology pathways. If a drug inhibits many genes, the gene expression measured after the drug treatment may be different from those measured after each of the gene knock-down experiments. Connectivity mapping ignores the additive effects of gene knock-downs which exist in many biological systems [17][18][19].
Therefore, our goal in this paper is to develop an approach for multi-target drug repositioning using the LINCS L1000 dataset that could overcome the restrictive assumptions of connectivity analysis. We model the problem as finding combinations of gene knock-downs that induce gene expression changes similar to a drug treatment. Furthermore, we assume that the effect of a drug treatment can be modelled as the additive effects of all its single target gene knock-downs, which is reasonable since additive effects of gene knock-downs exist in many biological systems [17][18][19]. Finally, we propose an efficient and effective multi-task machine learning approach for detecting the potential drug targets, using both expression data and compound structure information. The assumption of additive effects of gene knock-downs may not reveal the true underlying biology system. However, our experiments show that, in a practical sense, it works much better than pairwise connectivity mapping in predicting the potential drug targets.
The analysis of the LINCS L1000 dataset is further complicated by the fact that each drug treatment is replicated in several plates, each of which represents one gene expression signature of the drug treatment. This is similar for the gene knock-down, where each genetic perturbation is performed as a knock-down of one of the shRNAs of the gene. Therefore, each gene knock-down is represented by several signatures as well, which may vary for different shRNAs. These replication experiments make the data set more reliable, as the redundancy in measurements will lead to noise reduction and to a better representation of the spectrum of the effects of a drug. However, this also makes the data analysis more complicated. For example, the enrichment analysis-based methods cannot be directly applied to test the association between drugs and gene knock-downs, since drug treatments and gene knock-downs are represented by set of signatures.
We propose a novel bipartite block-wise sparse multitask learning method that detects the relationships between groups of drug signatures and groups of gene signatures in an unsupervised manner. The optimization problem can be solved based on the accelerated proximal gradient method, which is more efficient than the computationally demanding enrichment analysis-based test. In terms of effectiveness, our extensive experiments on five cancer cell lines from the LINCS L1000 data [14] provided more accurate predictions of potential drug targets than the simple connectivity mapping-based test, validated by known drug targets from the DrugBank database [20], together with the Gene ontology (GO) function information from the GO database [21], or with PPI information from the HPRD PPI network [22]. The prediction results generate an interesting unified connected bipartite graph of drugs and genes across different cell lines, where we can find co-modules of drugs and genes, duplicate edges across multiple cell lines, and meaningful connections of genes in the same pathways. These novel and meaningful discoveries from the L1000 database demonstrate the effectiveness of our approach.

Materials
We downloaded the small-molecule compound and shRNA data in the L1000 dataset, which was released by the Broad Institute LINCS Data Generation Center [14]. In our experiments, we used the expressions for the 978 landmark genes in the five cell lines of SW480, HT29, HCC515, MCF7, and PC3. A landmark gene is one whose gene expression has been determined as being informative to characterize the transcriptome and which is measured directly in the L1000 assay. The 978 landmark genes were selected as those widely expressed across lineage and were found to have good predictive power for inferring the expression of other genes that are not directly measured in the assay. For each of these cell lines, we generated a smaller dataset, which included only the treatment effects for the approved drugs and shRNA genes in the Drug-Bank Database [20]. For the drug treatments or the gene knock-downs in each dataset, the same treatment conditions were used. For all the cell lines except SW480, the drug treatment with dose of 10μm and duration of 24h was used. For the SW480, there are no experiments for drug treatments with duration 24h, so we used the duration of 6h instead. The details of the data information are shown in Table 1.
We also downloaded the drug structure from the KEGG database [23] and computed the structure similarities among the drugs by applying the software Simcomp [24] on the drug structures.

Approach
Suppose we have tested the responses of a cell line after b treatments with small-molecule compounds and a gene knock-downs. After the treatments, the expression levels of p selected landmark genes are evaluated. As there are several replicates of each perturbation, there are several signatures for each drug treatment or gene knock-down treatment. For gene knock-downs we obtain a p × m differential gene expression matrix A, where m is the total number of experiments with a gene knock-down. The effect of each gene therapy i ∈ {1, · · · , a} is represented by a column block of A, as demonstrated in Fig. 1, with m i replicating experiments, where i m i = m. Similarly, we get another p × n differential gene expression matrix B for the drug treatment. The effect of each drug therapy j ∈ {1, · · · , b} is represented by a column block of B with n j replicating experiments, where j n j = n. For the sake of simplicity, we have assumed the columns with the same gene knock-down in A or the same drug in B are grouped together. Table 2 summarizes the notations used in this paper.
As mentioned, the aim of this paper is to find sets of gene knock-downs that could induce gene expression changes similar to those of a drug treatment. In other words, we would like to learn a weight matrix W (see Fig. 1), such that each block of B can be approximated by a combination of blocks in A.
Therefore, our objective function can be represented as: where L(A, B, W ) is a loss function and (W ) is a regularization term. This is a typical multi-task learning problem [25], where a task corresponds to a column of W and a feature corresponds to a row of W. In our biological application, a task represents a specific signature for a drug treatment and a feature represents a specific signature for a gene knock-down. Thus, each drug treatment with multiple signatures corresponds to a group of tasks, and each gene knock-down with multiple signatures corresponds to a group of features.
As we consider a multivariate regression model, the loss function can be represented as F . Since a drug often affects a few genes [20], we would like W to have a sparsity structure. Two common regularization terms for sparsity are which enforces sparse entries in W, and which enforces row sparsity in W. Here, W (k, l) and W (k, :) represent the (k, l) entry and the k-th row of W, respectively. The 1 and 2,1 regularizations are widely  1 Illustration of the proposed model. Each block of A and B represents differential gene expression data after gene knock-down and drug treatment. W is the association matrix we would like to learn used in multi-task learning approaches in various applications, including bioinformatics [25][26][27][28]. 2,1 sparsity is usually called group sparsity or block sparsity in multitask learning since it is generalized from group lasso and assumes that only a few features should be selected for all the tasks. However, in our scenario we do not want sparse entries in W with 1 regularization or sparse rows in W with 2,1 group sparsity regularization. As demonstrated in Fig. 1, we would like W to be sparse block-wise in a bipartite way, such that a few groups of features are selected for a group of tasks. As a result, each drug only has a small number of potential targets.  In "Bipartite block-wise sparse multi-task learning" section, we propose a bipartite block-wise sparse multitask learning model and an efficient optimization algorithm to solve the problem. In "Graph structure on group of tasks" section we integrate the compound structure information into the proposed bipartite block-wise sparse multi-task learning model. In "Association stability score" section we introduce the stability selection strategy for parameters.

Bipartite block-wise sparse multi-task learning
In this section, we propose a new type of sparsity called bipartite block sparsity that enforces sparse blocks in the W matrix such that a few groups of features are selected for each group of tasks. Suppose W ij ∈ R m i ×n j represents the (i, j) block in W corresponding to the i-th gene and jth drug. Our bipartite block sparsity ( bb ) regularization is represented as With slightly changed 1 regularization and our bb regularization together, we proposed a bipartite block-wise sparse multi-task model as where α ij is a weight factor for each block and can be simply chosen as number of entries in the (i, j) block in W.
Note that when α ij = 1, the first regularization term is equal to the 1 regularization.

Graph structure on group of tasks
The chemical structure of the drugs is commonly used in drug target discovery, since similar drugs often share similar targets [5]. Suppose we are given a matrix of K ∈ R b×b that contains structure similarity among all the b drugs in our scenario. K can be considered as a graph matrix or adjacent matrix of a graph. To integrate drug structure information in the above model (2), we further develop a novel multi-task learning model with a super-graph structure. In general multi-task learning, graph structure on tasks may increase the accuracy for multi-task learning [28]. Instead of graph structure on tasks, our application should consider the graph structure on groups of tasks, which we refer to as super-graph structure.
Suppose the j 1 -th task group (j 1 -th drug) has a weight score of k j 1 j 2 = K(j 1 , j 2 ) with the j 2 -th task group (j 2 -th drug). Denote the submatrix in W corresponding to the jth drug by W :j , the lth column in the matrix W :j by w j l , and the mean of the columns in W :j byw j . The super-graph structure can be used for another type of regularization: (3) The first term ensures that the tasks in each group are as close to the group center as possible, and the second term ensures that the group centers have the graph structure that is represented by K.
Putting all these components together, we propose the bipartite block-wise sparse multi-task learning model with super-graph structure (BBSS-MTL)

Optimization algorithm
To solve this optimization problem, we first simplify the terms in (3). The first term in (3) can be rewritten as where H j = I n j − e n j e T n j /n j is a centering matrix, e n j is an n j -dimensional column vector with all ones, andH The second term in (3) can be simplified as tr(W LW T ), whereW = [w 1 , · · · ,w b ]. Note thatW can be further simplified as where E = diag(e n 1 /n 1 , · · · , e n b /n b ) is a block diagonal matrix. Thus the second term in (3) can be simplified as From (5) and (6), we can obtain that the equation in (3) is equivalent to whereL =H + ELE T . Thus the BBSS-MTL in model (4) can be rewritten as To solve the optimization problem in Eq. 8, we propose an accelerated proximal gradient-based algorithm.
The key step is to generate the proximal operator: It can be proved that the above operator exhibits a certain decomposition property, based on which we can efficiently obtain the proximal operator in two stages.
where the 1 proximal operator on the matrix V ij is prox b1 and the 2 proximal operator on the matrix V ij is prox b2 The pseudocode of the proposed method BBSS-MTL is shown in Algorithm 1.

Association stability score
In this section, we provide a stability selection strategy to deal with parameters in the proposed models. Suppose we have the sets 1 , 2 and 3 for the parameters λ 1 , λ 2 and λ 3 , respectively. For each combination of the parameters λ = {λ 1 , λ 2 , λ 3 } ∈ 1 × 2 × 3 = , we define a probability score P λ ∈ R a×b for all blocks in W in the following way. We first subsample {A t , B t } from the for i ∈ 1 : a, j ∈ 1 : b do 8: Update (i, j)-block of W by two steps: Compute (W k+1 ) using the equation in (8) 13: We repeat the procedure T times and compute the probability of hitting for each block {i, j} as P λ (i, j) = t W ij t = 0 /T. We then compute these probability values for each combination of the parameters λ ∈ , and define the association stability score of block {i, j} by averaging these probabilities over different parameters as score(i, j) = mean λ∈ P λ (i, j).

Results
In this section, we evaluate our approaches on eight simulated datasets and show the effectiveness of the BBSS-MTL for bipartite block sparsity and the super-graph structure. We then apply our approach to find potential targets for drugs on datasets from five cell lines.

A. Data Generation
We simulate data using the following scenario with p = 50, m = 200, n = 80, a = 20, b = 10. We first simulate A ∈ R p×m , W ∈ R m×n , and then generate B ∈ R p×n = AW + E, where the elements of E are sampled from a standard normal distribution N(0, 1). We assume the groups of correlated input variables in A have an equal size of 10.
To generate A, we first generate a prototype column vec-torĀ i for each group i, whereĀ i ∼ N(0, 5I 50 ) and I 50 is a 50-dimensional identity matrix. We then generate the columns in this group by A i k =Ā i + , where ∼ N(0, I 50 ), i k ∈ I i , I i represents the indices in i-th column group of A. The procedure is repeated for each column group of A, and we get A with column group structure.
To generate W ∈ R m×n , we first generate a prototype W 0 ∈ R a×b using the following scenario. We assume three groups of columns with sizes {4, 3, 3} in W 0 , respectively. First, the input features are randomly selected for the three output groups (two for the first group, three for the second group, and three for the third group). We then randomly choose another feature, which is used for all the three groups, and further choose another feature for only the second and third groups. Hence, three features in total are chosen for the first group, five for the second group, and five for the third group, such that the spatial relationships between the three groups are different. We then generate W by putting its entries in (i, j)-th block W ij = W 0 (i, j) + , where ∼ N(0, 0.1), in either a sparse or dense way. Two datasets, Data1.0 and Data2.0, are generated with the sparse and the relatively dense scenarios, respectively.
We then obtain the similarity among the groups of tasks by the Gaussian kernel calculated among the columns of W 0 . To show the effectiveness of using the supergraph structure, we randomly perturb W 0 by changing t nonzero entries to zero and changing t zeros entries to nonzero, in different levels with t = 2, 5, 10. With the same A and the perturbed W 0 s, we generate datasets Data1.1, Data1.2, Data1.3 and Data2.1, Data2.2, Data2.3 based on Data1.0 and Data2.0, respectively. Note that the first digit in the name of the dataset indicates whether the entries in the nonzero blocks of W is sparse or not, while the last digit represents the levels of the perturbation.

B. Evaluating the BBSS-MTL without super-graph structure
With the eight datasets, we first check the prediction performance of the proposed block sparse multi-task learning method BBSS-MTL without super-graph structure, i.e. λ 3 = 0. We compare BBSS-MTL with other multi-task methods, including 1 regularization and 2,1 regularization. For each dataset, we first randomly split the p = 50 samples into five folds. Four of the folds are taken as training data and the remaining fold is taken as test data, in turn. Parameters are chosen by cross-validation on the training data only. Once W * is calculated from the training data, the mean squared error on the test data (MSE W * = B − AW * F ) is used to measure the performance of the learning. Table 3 shows the mean and standard deviation of the MSEs calculated by 50 different split of the datasets. The results show that our BBSS-MTL method performs best for all of the datasets. For Data1.*, where each block of W has sparse entries, 1 regularization is the second best and 2,1 is the worst. For Data2.*, both 1 and 2,1 perform poorly, but our approach performs well. In Fig. 2, we show the calculated W for Data2.0, Data2.1, Data2.2, and Data2.3, using a connectivity mapping analysis test (introduced in "Connectivity mapping analysis" section) and different types of regularization. The selected regularization parameters are chosen to be 0.005, 0.01 and 0.2 for 1 , 2,1 and BBSS-MTL, respectively. We observe that both 1 regularization and 2,1 regularization have false positive discoveries, and that BBSS-MTL could enforce the non-interesting blocks in W to be exactly zero for all the four datasets.

C. Evaluating the BBSS-MTL with super-graph structure
To further evaluate the performance of our BBSS-MTL approach, we designed the following experiments. Suppose we have A and B, the latter of which is generated by a ground truth W : the nonzero blocks in W can be discovered by BBSS-MTL without super-graph structure. However, if the given B is generated by a perturbationW of W, can we recover the useful information in W from A and B?
Note that Data2.1, Data2.2, and Data2.3 are generated based on the perturbations from the W in Data2.0. For this W, we also have the similarity matrix K among the task groups, which was computed as a Gaussian kernel among the columns of its prototype W 0 . With the side information K, we could apply our BBSS-MTL with super-group structure on the three datasets to determine whether the computed W * matches the ground truth W. The Fig. 3 shows the results, in which the nonzero blocks in the true W can be recovered by BBSS-MTL from all the three levels of perturbation.

Connectivity mapping analysis
As a baseline to compare the effect of the compound treatment and the gene knock-down, our adopted method is based on the connectivity mapping analysis test [15], which is based on the Kolmogorov-Smirnov statistic [29]. Given a set of up-and down-regulated genes S up and S dn and a vector of differential gene expressions E with length n, where S up and S dn are subsets of n genes whose differential expression are measured in the given vector, the connectivity mapping analysis tests whether the given gene set is enriched in the given gene expression vector using the Kolmogorov-Smirnov test. Specifically, the analysis starts by ranking the expression vector E and then constructs a new vector V indicating the position of each gene. The up-connectivity score CS up is then calculated from the following two values: Similarly, we can calculate CS dn for the down-regulated gene set. The connectivity score CS = 0 if CS up and CS dn have the same sign; otherwise CS = CS up − CS dn . The null distribution is obtained by randomly permuting E 1000 times and then calculating the test statistic.
We then consider the case in which we compare two signature vectors of differential gene expressions: one from a drug treatment and the other one from a gene knock-down. We follow the method used in [30]. Firstly, we rank the two vectors to get the two upand down-regulated gene sets. We then use the connectivity mapping analysis test twice to test whether the up-and down-regulated gene sets from one vector are enriched in the other ones. Finally, the p-values are summarized using the Fisher inverse chi-square test statistic. The null distribution is also obtained by random permutations.
As mentioned, the above-described method cannot be directly applied to compare a drug treatment and a gene knock-down, as the drug treatment and gene knockdown are represented by groups of signatures in L1000 dataset. Therefore, we have compared all the pairs of signatures from a drug treatment and a gene knock-down and chosen the smallest p-value. Finally, we performed one-to-one mapping on all the pairs of drug treatments and gene knock-downs and ranked them using the p-values.

Experimental results
For each of the induced real datasets, we applied our BBSS-MTL approach to find the associations among the drugs and the knock-down genes. For the parameters, we set λ 1 = 1 and λ 3 = 1e + 4. We chose the parameter set 2 such that the smallest one could get dense W and the largest one could get W with all zeros. With these parameter setting, stability analysis was used to obtain a stability score for each drug-gene pair.
Since the approved drug-target pairs are limited, we evaluated our predicted potential drug targets by integrating the gene function information in the Gene Ontology or the protein-protein interactions in the PPI network. We define potential drug targets of a drug as those genes that are down-or upstream and closely related to the real targets in pathways. We first computed the semantic scores among all the genes in each of our five datasets using the R software package Gosim [31]. Two genes with a high semantic score are considered to have similar gene functions. If a predicted gene's semantic similarity score with any known target gene of a drug is higher than 0.8, it is considered as a potential target for the drug. By comparing our results with the GO-based scores, we could calculate the AUC (area under curve) values AUC_GO for each cell line dataset. We also evaluated our results by using a human PPI network, Human Protein Reference Database (HPRD) [22]. We computed pairwise distances among all genes in each of our induced datasets by their shortest paths. Two genes with a connection on the PPI network are considered to have physical interactions with each other. Thus, if a predicted gene is connected to any known target gene of a drug, it is considered as a potential target for a drug. We computed AUC_PPI by comparing our results with the PPI-based scores. Table 4 shows the AUC values using Lasso, connectivity mapping and our BBSS-MTL for the five cell line datasets. For Lasso, we simply average the expression levels of different treatments (different knock-downs) for each drug (gene), and apply single task lasso to calculate association stability scores similar to the BBSS-MTL approach. We can see that for the small dataset of cell line SW480, our method could achieve the highest AUC values among all the five datasets, based on either GO information or PPI information. The results obtained for the other four cell lines are slightly lower. For almost all the datasets, our approach performs significantly better than the connectivity mapping approach, which attempts to find the associations by one-to-one mapping. Our approaches obtained higher AUC values based on either GO information or PPI information, which may suggest that the multi-task learning could discover useful multiple targets that are affected by the same drugs.

Additive effects of NFKB1 and 1KBKB for immunologic drugs
We first investigated the results obtained from the smallest dataset generated from the SW480 cell line. Table 5 shows a list of potential targets for the drugs discovered from the SW480 cell lines. The SW480 dataset only includes four drugs and six genes. The four drugs are Thalidomide, Valproic acid, Sirolimus and Auranofin and the six genes are IKBKB, NFKB1, MTOR, HDAC2, PPARG and AR. Among these genes and drugs, NFKB1 is an approved target for Thalidomide, and IKBKB is an approved target for Auranofin. The results show that these known targets for the two drugs are discovered (in rows 3 and 7 of the table, respectively), and NFKB1 and IKBKB could both have additive effects on the drug treatment of Thalidomide or Auranofin. This is highly likely because NFKB1 and IKBKB are close in the NF-Kappa-B signaling pathway, which is a critical pathway for immune response. NFKB1 forms the NF-kappa-B complex and is known to be inhibited by I-kappa-B proteins, while IKBKB is a kinase which phosphorylates serine residues on the I-kappa-B proteins and further activates the NF-kappa-B complex. Our approach successfully recovers the connection between these two genes. The discovered link between the drug Auranofin and the gene NFKB1 could be further supported by studies on Auranofin [32]. In addition, Sirolimus is also known to be also highly related to the immune system. Also, studies have shown that Valproic acid can induce neural tube defects, which is related with maternal immune system [33,34]. Therefore, NFKB1 and IKBKB may also have additive effects for the treatments with immunologic agents. The top eight  predicted pairs show strong connections between the two above-mentioned genes and all four immunologic drugs. Figure 4 shows our findings across the three cell line datasets of HT29, MFC7 and PC3. The top 30 associations between drugs and genes detected by the proposed method BBSS-MTL on the datasets from the three cell lines are depicted in a bipartite graph of drugs and genes. Interestingly, all the drugs and genes in the three datasets are connected in a unified bipartite graph, and co-modules of drugs and genes could be observed. There are some novel findings from the unified bipartite graph. There are roughly five co-modules of drugs and targets in the graph, including the co-module of the drugs Fluorometholone, Tropicamide, Budesonide, Indapamide, Nabumetone, Warfarin and Fluocinonide and the genes VDR and TEK, the co-module of Bortezomib and its several potential targets CA1, ADA,FASN PRKDC,RRM2, PPARD, COMT,GGCX, PRKCZ, the comodule of PTGER4 and several drugs of Amlodipine, Gefitinib, Nilutamide, Norfloxacin, Acetylcysteine, Biperiden, Flutamide, Mycophenolatemofetil, Amifostine, Disopyramide and Bortezomib, the co-module of the gene NTRK1 and drugs Amifostine, Disopyramide, Bortezomib, Felodipine, Fenofibrate, Dimenhydrinate, Astemizole, Lofexidine, Acitretin, Acetazolamide, Bendroflumethiazide, Amifostine and Disopyramide, the comodule of the gene PRKCZ and the drugs Bortezomib, Amifostine, Acitretin, Acetazolamide and Bendroflumethiazide. Some of the novel findings can be supported by other references or pathway analysis. For example, using the dateset of HT29 cell line, a group of drugs Fluorometholone, Tropicamide, Budesonide, Indapamide, Nabumetone, Warfarin and Fluocinonide are predicted to be related with a group of genes VDR and TEK. The gene VDR is a vitamin D receptor in lipid metabolism and calcium reabsorption. In the presence of a ligand, VDR binds to vitamin D response elements to either increase or repress transcription of target genes. Two known targets of Nabumetone -PTGS1 and PTGS2 -are also known to be in the pathway of lipid metabolism. Nabumetone is used for the treatment of osteoarthritis and rheumatoid arthritis. McAlindon et al. [35] showed that vitamin D may prevent progression of osteoarthritis, and Athanassiou et al. [36] pointed out that reduced vitamin D intake has been linked to increased susceptibility to the development of rheumatoid arthritis. This implies that knocking down VDR might have similar effects to taking Nabumetone, and that VDR could be a potential target for Nabumetone. The known targets of Tropicamide -CHRM1, CHRM2, CHRM3 and CHRM4 -are known to be in the pathway of calcium signaling pathway, which is likely to be related to VDR. The primary target of the drug Fluorometholone is the glucocorticoid receptor (GR). The bound receptorligand complex further binds to many glucocorticoid response elements (GREs) in the promoter region of the target genes. The VDR gene contains a number of putative GREs, and it has been proven that VDR could regulate the expression of glucocorticoid [37]. Therefore, the gene VDR could have an additive effect for the treatment of Fluorometholone. The gene TEK, a receptor tyrosine kinase, is also likely to have an effect on this drug. Kuo et al. [38] showed that many glucocorticoid-regulated genes affect receptor tyrosine kinase signalling. GR primary targets could inhibit the insulin/IGF1 pathway, which propagates through receptor tyrosine kinases. Knocking down TEK may modulate the activity of this pathway and further regulate the expression of GR. There is little evidence that VDR is close to TEK in the same pathway, and our results suggest that the two genes might be affected together in different pathways by the five drugs, including Fluorometholone and Nabumetone.

A unified bipartite graph of drugs and targets in three cell lines
It is also interesting that we found edges between drugs and genes in multiple cell lines by BBSS-MTL. For example, we found that genes NTRK1, PTGER4, POLA1, VKORC1, GPRC5a, VDR, KCNMA1, IMPDH2 and PNP could be potential targets of the drug Bortezomib in both cell lines of MCF7 and PC3. The stability score of Bortezomib and NTRK1 is ranked first in the MCF7 cell line and second in the PC3 cell line by BBSS-MTL. Bortezomib is known to inhibit the 26S proteasome, which modulates the activity in the division of multiple myeloma and leukemic cells, and further induces apoptosis. The gene NTRK1 also modulates the activity of the apoptosis pathway. Therefore, it is likely that NTRK1 is the potential target of the drug Bortezomib or close to its target in the downstream pathways. We note that TEK is also predicted to be related to the drug Bortezomib, in the cell line of PC3, which means that the results from the two cell lines connect NTRK1 and TEK through Bortezomib. In fact, the genes NTRK1 and TEK are both receptor tyrosine kinases and it is known that 26S proteasome can degrade receptor tyrosine kinases [39]. The epidermal growth factor receptor (EGFR) is a known target for the drug Gefitinib. From the PC3 cell line, we could recover the connection between Gefitinib and the gene PGE4. It is known that the association of PGE4 with β-arrestin 1 and c-Src signaling complex could result in the transactivation of EGFR [40], which shows the high probability that PGE4 could take effect with EGFR together for the treatment of the drug Gefitinib.
To summarize, the results across different cell lines in whole graph presented in Fig. 4 show plenty of interesting and novel findings, including the unified connected bipartite graph, the co-modules of drugs and genes in the graph, the duplicate edges across multiple cell lines, and meaningful connections of genes.

Conclusion
In this paper, we have proposed a bipartite block-wise sparse multi-task learning approach BBSS-MTL for discovering multiple targets for drugs using the LINCS L1000 dataset. We assume that the effect of a drug treatment can be approximated by adding the effects of all its singletarget knock-downs and considering additive effects of multiple targets for a drug treatment. Our results show that our model could achieve higher accuracy in detecting potential drug targets than the widely used but simpler pairwise connectivity mapping. Interesting and novel discoveries by our methods, such as new biologically meaningful drug target candidates, the modules of the drugs and genes from the three different cell lines, and the duplicate edges predicted from different cell lines, also reflect the effectiveness of our approaches.
However, there are some limitations of our proposed approaches. For example, the agonistic effect cannot be reflected from the knock-down experiments. A drug generally only inhibits part of the protein functions, while knocking down a gene may reduce all its functions. Besides, our additivity assumption is a simplification of the complexity of the underlying biological system. Still, its better performance implies that it describes the biological system in the L1000 problem much better than the widely used pairwise mapping. It might be possible to get rid of the additivity assumption by constructing a more complicated nonlinear model that considers the interactions among multiple targets of a drug treatment. However, this will lead to a much higher computational load, which currently renders genome-wide analyses infeasible. This problem will be a topic of future work.