 Research
 Open Access
Computational drug repositioning using meta-path-based semantic network analysis
BMC Systems Biology volume 12, Article number: 134 (2018)
Abstract
Background
Drug repositioning is a promising and efficient way to discover new indications for existing drugs, and it holds great potential for precision medicine in the post-genomic era. Many network-based approaches have been proposed for drug repositioning based on similarity networks that integrate multiple sources of drug and disease data. However, these methods may simply treat all nodes as the same type and neglect the semantic meanings of different meta-paths in the heterogeneous network. A rational method for inferring new indications for approved drugs is therefore urgently needed.
Results
In this study, we propose a novel methodology named HeteSim_DrugDisease (HSDD) for predicting drug repositioning candidates. First, we build a drug-drug similarity network and a disease-disease similarity network by integrating information on drugs and diseases. Second, we construct a drug-disease heterogeneous network that combines the drug similarity network, the disease similarity network and the known drug-disease association network. Finally, HSDD predicts novel drug-disease associations based on the HeteSim scores of different meta-paths. The experimental results show that HSDD performs significantly better than existing state-of-the-art approaches, achieving an AUC of 0.8994 in leave-one-out cross validation. Moreover, case studies on selected diseases further illustrate the practical usefulness of HSDD.
Conclusions
HSDD can be an effective and feasible way to infer associations between drugs and diseases using meta-path-based semantic network analysis.
Background
Over the past decades, de novo drug development has been expensive, time-consuming and limited to a relatively small number of targets [1,2,3]. By a conservative estimate, developing a new drug costs about $1.8 billion and takes about 15 years [4]. To overcome these problems, researchers and pharmaceutical companies have begun to look for new medical indications among approved drugs [5]. Drug repositioning (or drug repurposing), which identifies new indications for existing drugs, offers a promising way to reduce the costs and risks of drug discovery [6, 7]. Several successfully repositioned drugs, such as minoxidil, have shown that drug repositioning is effective [8, 9]. Moreover, since elucidating the molecular basis of disease at a personalized level has become an attainable goal, drug repositioning will play a key role in drug discovery and in the precision medicine paradigm [10, 11].
With the generation of large-scale genomic, transcriptomic and proteomic data, it has become feasible to predict new drug-disease associations with computational models [12]. These methods fall mainly into three categories: machine learning-based approaches, network-based approaches, and text mining and semantic inference approaches [13]. Here, we briefly review each category; a detailed review is beyond the scope of this paper and has been presented by Li [13] and Shahreza [14].
Machine learning-based models make the best use of biological data in public databases to predict novel drug-disease associations [15]. First, drugs are represented by feature vectors derived from their properties, such as drug fingerprints, chemical structures and side effects, while diseases are characterized by phenotype data [16]. Then, machine learning models are trained on these drug and disease features. Finally, the trained models are used to predict drug-disease associations.
Gottlieb et al. [5] first proposed a method called PREDICT for the large-scale prediction of drug indications. It uses multiple drug-drug and disease-disease similarity measures to construct a logistic regression classifier for drug repositioning. Menden [17] combined genomic features of cell lines with chemical properties of the drugs to build a feed-forward perceptron neural network for the drug repositioning problem. Inspired by Menden, Napolitano et al. [18] put forward a drug-centered computational approach that integrates drug chemical structure similarity, drug molecular target similarity and drug-gene expression similarity to make predictions. Zhang [19], Yang [20], Wu [21] and Liang [22] have also put forward their own machine learning models to infer drug-disease associations.
Network-based methods are another widely used strategy for computational drug repositioning [23,24,25]. While traditional studies mostly focus on exploring the characteristics shared among drug compounds, such as chemical structures [26] and side effects [9], recent network-based approaches [27] take pharmacological, genetic and clinical data into account to explore the relationships between drugs and diseases from a network point of view. Network-based methods assume that similar drugs are normally associated with similar diseases and vice versa. Therefore, measuring the similarity between disease phenotypes is essential for drug repositioning [28]. One of the most commonly used rules in association prediction is guilt-by-association (GBA) [29].
Cheng [30] developed three supervised inference methods, drug-based similarity inference (DBSI), target-based similarity inference (TBSI) and network-based inference (NBI), to predict both drug-target interactions and drug-disease associations. These methods make use of drug structural similarity, target genomic sequence similarity and drug-target network topology similarity, respectively. Wu et al. [31] built a weighted disease-drug heterogeneous network from the disease-gene and drug-target relationships in the KEGG database. They clustered the weighted network to identify modules and then assembled all possible drug-disease pairs based on the resulting modules. Huang [32] adopted the idea of data fusion and integrated three networks (drug, genomic and disease phenotype) with available experimental data and knowledge; the method inferred drug-disease associations by network propagation. More recently, Luo [33] proposed a computational method named MBiRW to identify potential novel indications for a given drug. MBiRW develops comprehensive similarity measures for drugs and diseases and uses them to infer drug-disease associations, and experiments on various datasets demonstrated its reliable prediction performance. Other methods [1, 8, 12, 34] have also used biological networks to predict novel drug-target associations with considerable success.
Besides machine learning-based and network-based approaches, text mining and semantic inference methods are also effective for predicting drug-disease associations. In particular, the rapid development of text mining research has made it possible to detect novel indications for existing drugs [35, 36]. Mining drug-disease associations from the biomedical literature, MEDLINE and knowledge bases about genes has become a meaningful approach. Like machine learning-based and network-based methods, these methods [37, 38] can be effective in addressing drug repositioning problems.
Although network-based methods have been used successfully for drug repositioning, most of them simply treat the objects (nodes) in drug-disease heterogeneous networks as the same type. Moreover, they do not consider the different semantic meanings of meta-paths, which are crucial for the prediction performance of network-based methods. For example, Luo [33] built a heterogeneous network by integrating drug and disease similarities with the known drug-disease association network, and developed a novel bi-random walk algorithm to identify new indications for existing drugs. However, the algorithm treated all edges in the heterogeneous network equally. In fact, edges in the drug similarity network and the disease similarity network represent similarity relationships between drugs and between diseases, while edges in the drug-disease association network represent association relationships. Edge values in the similarity networks range from 0 to 1, whereas edge values in the drug-disease association network are either 0 or 1. Ignoring this distinction may bias the prediction results.
Machine learning-based models require drug information such as fingerprints and chemical structures, from which each drug is represented by a comprehensive feature vector. In this way, drug repositioning problems can be solved with any effective machine learning model, such as deep learning. However, machine learning-based models first need a highly credible negative dataset, which is difficult to build from current data. Network-based methods measure the similarities between drugs and between diseases to construct comprehensive similarity networks, and then apply similarity measurement models to the drug repositioning problem. Although these methods do not require negative samples, they must mine potential associations in depth. Text mining and semantic inference methods mainly extract drug-disease associations from the biomedical literature, so every association they find is supported by the literature, which offers an alternative route to drug repositioning. The three kinds of methods therefore complement one another.
HeteSim [39] is a path-based measure that can accurately measure the relatedness of nodes of the same or different types in a heterogeneous network. It effectively captures the semantics of meta-paths, which is crucial for measuring the relevance of nodes in heterogeneous networks [40,41,42,43].
Methods such as Katz [44] and CATAPULT [45] use only walk counts to measure the similarity between objects, as illustrated in Fig. 1. The walk count between a and c is larger than that between b and c, which suggests that a is closer to c than b is: based on walk counts, the association strengths of (a, c) and (b, c) are 3 and 2, respectively. However, the connections starting from node a carry less meaning than those starting from node b. Intuitively, the connectivity between b and c should be stronger than that between a and c, which agrees with the HeteSim results: the HeteSim association strengths of (a, c) and (b, c) are 0.567 and 0.707, respectively. The similarity calculated by HeteSim therefore seems more reasonable, because it captures the semantic meaning of different meta-paths.
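The walk-count idea can be made concrete with a small sketch. The Python code below is a minimal, hypothetical illustration of a truncated Katz-style score on a plain undirected adjacency matrix; the three-node path graph, the damping factor β = 0.5 and the maximum walk length of 4 are arbitrary choices for illustration, not the network of Fig. 1, whose edges are not reproduced here.

```python
import numpy as np

def walk_count(adj: np.ndarray, src: int, dst: int, length: int) -> int:
    """Number of walks with exactly `length` edges from src to dst,
    read off entry (src, dst) of the matrix power adj**length."""
    return int(np.linalg.matrix_power(adj, length)[src, dst])

def truncated_katz(adj: np.ndarray, src: int, dst: int,
                   beta: float = 0.5, max_len: int = 4) -> float:
    """Katz-style score: walk counts of length 1..max_len, damped by beta**length."""
    return sum(beta ** k * walk_count(adj, src, dst, k) for k in range(1, max_len + 1))

# Hypothetical path graph 0 - 1 - 2 (illustration only, not Fig. 1).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
print(walk_count(adj, 0, 2, 2))   # 1
print(truncated_katz(adj, 0, 2))  # 0.375
```

Nothing in these counts depends on how many neighbors the endpoints have, which is the weakness the text attributes to walk-count measures; HeteSim instead normalizes by the out-neighbors of the source and the in-neighbors of the target.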
In this paper, we propose a novel method called HeteSim_DrugDisease (HSDD) that uses HeteSim scores to measure drug-disease associations. We first construct a heterogeneous network consisting of the drug-drug similarity network, the drug-disease association network and the disease-disease similarity network. We then apply HeteSim to compute relatedness scores for drug-disease pairs, taking the semantic meaning of meta-paths into account. Finally, we use HSDD to predict drug-disease associations. HSDD is described in detail in the Methods section.
Methods
Datasets
To construct the drug-disease heterogeneous network, we downloaded drug and disease information from several data sources, mainly a phenotype similarity network, a drug similarity network and known drug-disease associations. We briefly introduce these data below; the experimental data are summarized in Table 1.
Disease similarity network
We derived the disease similarity network from MimMiner [46], which measures similarity based on disease phenotypes. Each disease has one or more phenotypes in the OMIM database [47]. According to the MimMiner description, the phenotype similarities in the network are computed with text mining approaches and normalized to the range [0, 1]. Furthermore, we apply the logistic transformation proposed by Vanunu [48] to adjust the phenotypic similarities. The logistic function is defined as
\( L(x)=\frac{1}{1+{e}^{cx+d}} \)
where x denotes the similarity value between phenotypes in the MimMiner database, and c and d are parameters. In this study, we set c and d to −15 and log(9999), respectively. With these settings, small similarity values are transformed to values close to zero, while large similarity values are enlarged.
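The effect of the parameters is easiest to see numerically. The sketch below assumes the standard logistic form L(x) = 1/(1 + exp(cx + d)) with c = −15 and d = log(9999), and assumes log is the natural logarithm; the function name `logistic_transform` is a hypothetical helper, not part of MimMiner or any library.

```python
import numpy as np

def logistic_transform(x, c: float = -15.0, d: float = np.log(9999)):
    """Logistic rescaling L(x) = 1 / (1 + exp(c*x + d)), applied element-wise.

    Accepts a scalar or a whole similarity matrix (NumPy broadcasting)."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(c * x + d))

sims = np.array([0.0, 0.3, 0.6, 1.0])
print(logistic_transform(sims))
```

At x = 0 the output is 1/(1 + 9999) = 0.0001, and the curve crosses 0.5 where cx + d = 0, i.e. at x = d/15 ≈ 0.614. Similarities below that point are pushed toward 0 and those above it toward 1, which is the "shrink small, enlarge large" behavior described in the text.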
Drugdisease association network
The drug-disease association network used in this study was obtained from Gottlieb et al. [5]. This gold standard dataset contains 1933 known drug-disease associations involving 593 drugs registered in DrugBank [49] and 313 diseases listed in Online Mendelian Inheritance in Man (OMIM) [47]. In this study, we use 1776 associations involving 540 drugs and 306 diseases.
Drug similarity network
The drug similarity network was obtained from the supplementary material of [33]. Its authors combined the chemical structures of drugs, similarity correlation analysis and information shared between drugs to construct a comprehensive drug similarity network covering 663 drugs. The similarity values of drugs range from 0 to 1.
Construction of the drug-disease heterogeneous network
In the drug similarity network, let DR = {dr_{1}, dr_{2}, …, dr_{m}} denote the set of m drugs; the similarity between dr_{i} and dr_{j} is denoted by sim(dr_{i}, dr_{j}). Similarly, let DI = {di_{1}, di_{2}, …, di_{n}} denote the set of n diseases in the disease similarity network, with similarity values sim(di_{i}, di_{j}).
The drug-disease association network can be represented by a bipartite graph G(V, E), where V(G) = {DR, DI} and E(G) is the set of drug-disease association edges. If dr_{i} is known to be associated with di_{j}, the weight of the edge between them is 1; otherwise, it is 0. Combining this bipartite graph with the drug and disease similarity networks yields the drug-disease heterogeneous network shown in Fig. 2.
Let the matrices D, Q and P denote the drug similarity network, the drug-disease association network and the disease similarity network, respectively. The drug-disease heterogeneous network can then be expressed as the block matrix
\( \begin{bmatrix} D & Q\\ {Q}^{T} & P\end{bmatrix} \)
where Q^{T} denotes the transpose of Q, and D, Q and P are m × m, m × n and n × n matrices, respectively.
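Assembling the heterogeneous network is a single block-matrix operation. The sketch below assumes the network is the block matrix [[D, Q], [Qᵀ, P]] with drugs indexed first and diseases second; `heterogeneous_matrix` is a hypothetical helper, and the 2-drug, 3-disease values are made up for illustration.

```python
import numpy as np

def heterogeneous_matrix(D: np.ndarray, Q: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Assemble [[D, Q], [Q^T, P]] from the drug similarity matrix D (m x m),
    the drug-disease association matrix Q (m x n) and the disease similarity
    matrix P (n x n). Rows/columns 0..m-1 are drugs, m..m+n-1 are diseases."""
    m, n = Q.shape
    if D.shape != (m, m) or P.shape != (n, n):
        raise ValueError("D must be m x m and P must be n x n for an m x n matrix Q")
    return np.block([[D, Q], [Q.T, P]])

# Hypothetical toy data: 2 drugs, 3 diseases.
D = np.array([[1.0, 0.4],
              [0.4, 1.0]])
Q = np.array([[1, 0, 0],
              [0, 1, 1]], dtype=float)
P = np.eye(3)
H = heterogeneous_matrix(D, Q, P)
print(H.shape)  # (5, 5)
```

Because D and P are symmetric similarity matrices and the lower-left block is the transpose of the upper-right one, the assembled matrix is symmetric, which matches the undirected nature of the network in Fig. 2.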
HeteSim description
Given a relevance path S = (A, R), denoted by \( {A}_1\overset{R_1}{\to }{A}_2\overset{R_2}{\to}\cdots \overset{R_l}{\to }{A}_{l+1} \), the composite relation between A_{1} and A_{l + 1} is defined as R = R_{1} ∘ R_{2} ∘ ⋯ ∘ R_{l}. Here A_{i} is a node type in the heterogeneous network and R_{i} is the relation between A_{i} and A_{i + 1}. For simplicity, when there is only one relation between each pair of types, the relevance path can be written with type names alone, e.g., P = (A_{1}A_{2}⋯A_{l + 1}).
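The HeteSim score between two objects s (s ∈ R_{1}.A_{1}) and t (t ∈ R_{l}.A_{l + 1}) based on the relevance path R = R_{1} ∘ R_{2} ∘ ⋯ ∘ R_{l} is
\( \mathrm{HeteSim}\left(s,t\mid {R}_1\circ {R}_2\circ \cdots \circ {R}_l\right)=\frac{1}{\left|O\left(s\mid {R}_1\right)\right|\left|I\left(t\mid {R}_l\right)\right|}\sum \limits_{i=1}^{\left|O\left(s\mid {R}_1\right)\right|}\sum \limits_{j=1}^{\left|I\left(t\mid {R}_l\right)\right|}\mathrm{HeteSim}\left({O}_i\left(s\mid {R}_1\right),{I}_j\left(t\mid {R}_l\right)\mid {R}_2\circ \cdots \circ {R}_{l-1}\right) \)   (1)
where O(s|R_{1}) is the set of out-neighbors of s under relation R_{1} and I(t|R_{l}) is the set of in-neighbors of t under relation R_{l}. Eq. (1) shows that computing HeteSim(s, t|P) requires iterating over all pairs (O_{i}(s|R_{1}), I_{j}(t|R_{l})) of (s, t) along the path and summing the relatedness of these pairs [39]. The sum is then normalized by the product of the number of out-neighbors of s and the number of in-neighbors of t. In other words, the relevance between s and t is the average relevance between the out-neighbors of s and the in-neighbors of t.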
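In particular, the HeteSim score between two same-typed objects s and t under the self-relation I is
\( \mathrm{HeteSim}\left(s,t\mid I\right)=\delta \left(s,t\right) \)
where δ(s, t) = 1 if s and t are the same object, and δ(s, t) = 0 otherwise. This definition is not appropriate for our study, because it discards the similarity values between different drugs (or different diseases). Therefore, we follow Yang [50], who redefined the HeteSim score on a self-relation as the similarity or association strength between s and t if they are linked, and 0 otherwise.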
Because meta-paths in heterogeneous networks carry semantic meanings, the relatedness of two objects depends on the given relevance path. HeteSim can therefore measure the similarity of two nodes in a heterogeneous network accurately.
Calculation of HeteSim scores
Definition 1. Transition probability matrix. Suppose A and B are two object types in a heterogeneous network and (W_{AB})_{n × m} is the adjacency matrix between types A and B. The transition probability matrices are defined as
\( {U}_{AB}\left(i,j\right)=\frac{{W}_{AB}\left(i,j\right)}{\sum_k{W}_{AB}\left(i,k\right)},\quad {V}_{AB}\left(i,j\right)=\frac{{W}_{AB}\left(i,j\right)}{\sum_k{W}_{AB}\left(k,j\right)} \)
That is, U_{AB}, the transition probability matrix of A → B, is W_{AB} normalized along its rows, and V_{AB} is W_{AB} normalized along its columns. It is easy to prove that U_{AB} equals \( {V}_{BA}^{\prime } \).
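Row and column normalization are one-liners in NumPy, and the identity U_AB = V′_BA can be checked directly. In the sketch below, `row_normalize` and `col_normalize` are hypothetical helpers; leaving all-zero rows or columns at zero (rather than dividing by zero) is an assumption, since Definition 1 does not address isolated nodes. The matrix W_AB is made up.

```python
import numpy as np

def row_normalize(W: np.ndarray) -> np.ndarray:
    """U_AB: divide each row of W_AB by its row sum (all-zero rows stay zero)."""
    W = np.asarray(W, dtype=float)
    sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums != 0)

def col_normalize(W: np.ndarray) -> np.ndarray:
    """V_AB: divide each column of W_AB by its column sum (all-zero columns stay zero)."""
    W = np.asarray(W, dtype=float)
    sums = W.sum(axis=0, keepdims=True)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums != 0)

# Hypothetical adjacency matrix between types A (2 objects) and B (3 objects).
W_AB = np.array([[1, 1, 0],
                 [0, 2, 2]])
U_AB = row_normalize(W_AB)
V_BA = col_normalize(W_AB.T)   # W_BA is the transpose of W_AB
print(np.allclose(U_AB, V_BA.T))  # True
```

The check works because column-normalizing Wᵀ divides each entry by the same sum as row-normalizing W, so transposing back gives U_AB.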
Definition 2. Reachable probability matrix. In a heterogeneous network, given a relevance path P = A_{1}A_{2}⋯A_{l + 1} and two objects s ∈ A_{1} and t ∈ A_{l + 1}, the reachable probability matrix for path P is defined as
\( {PM}_P={U}_{A_1{A}_2}{U}_{A_2{A}_3}\cdots {U}_{A_l{A}_{l+1}} \)
where PM_{P}(s, t) is the probability that s reaches t along P.
When s follows the path and t goes against it, the two objects meet at the middle type, denoted M. When the length of P is even, s and t meet at the middle type A_{(l/2) + 1}, and P = (A_{1}A_{2}⋯A_{l + 1}) can be divided into two equal-length parts, P = (P_{L}P_{R}), where P_{L} = (A_{1}A_{2}⋯A_{mid − 1}A_{mid}) and P_{R} = (A_{mid}A_{mid + 1}⋯A_{l + 1}), with mid = (l/2) + 1. When the length of P is odd, s and t do not meet at the same node type; in this case, we adopt the compromise method proposed by Zeng [42].
Finally, the HeteSim score between s (s ∈ R_{1}.A_{1}) and t (t ∈ R_{l}.A_{l + 1}) based on the path P is calculated as
\( \mathrm{HeteSim}\left({A}_1,{A}_{l+1}\mid P\right)={PM}_{P_L}{PM}_{P_R^{-1}}^{\prime } \)   (2)
where \( {PM}_{P_L}={U}_{A_1{A}_2}{U}_{A_2{A}_3}\cdots {U}_{A_{mid-1}M} \) and \( {PM}_{P_R^{-1}}={U}_{A_{l+1}{A}_l}{U}_{A_l{A}_{l-1}}\cdots {U}_{A_{mid+1}M} \). In Eq. (2), \( {U}_{A_i{A}_j} \), the transition probability matrix of A_{i} → A_{j}, is the row-normalized adjacency matrix \( {W}_{A_i{A}_j} \), and \( {V}_{A_i{A}_j} \), the transition probability matrix of A_{j} → A_{i}, is the column-normalized \( {W}_{A_i{A}_j} \). The HeteSim score between s and t along the path P is the corresponding entry:
\( \mathrm{HeteSim}\left(s,t\mid P\right)={PM}_{P_L}\left(s,:\right){PM}_{P_R^{-1}}^{\prime}\left(t,:\right) \)
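Definition 3. Normalization of HeteSim. The normalized HeteSim score between two objects s and t based on the relevance path P is
\( \mathrm{HeteSim}\left(s,t\mid P\right)=\frac{{PM}_{P_L}\left(s,:\right){PM}_{P_R^{-1}}^{\prime}\left(t,:\right)}{\left\Vert {PM}_{P_L}\left(s,:\right)\right\Vert \left\Vert {PM}_{P_R^{-1}}\left(t,:\right)\right\Vert } \)
As stated by Shi [39], the normalized HeteSim is the cosine of the probability distributions of the source object s and the target object t reaching the middle type M. Because these distributions are non-negative, the normalized HeteSim score ranges from 0 to 1.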
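Definitions 1 to 3 combine into a short computation for even-length paths. The sketch below is a minimal illustration under stated assumptions: paths are given as lists of adjacency matrices, the odd-length compromise of Zeng [42] is not implemented, and `reachable` and `normalized_hetesim` are hypothetical helper names. The S-D-T matrices are made up and are not the matrices of Fig. 3.

```python
import numpy as np

def row_normalize(W):
    """Transition probabilities: row-normalize an adjacency matrix (zero rows stay zero)."""
    W = np.asarray(W, dtype=float)
    sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums != 0)

def reachable(adjacencies):
    """Reachable probability matrix PM = U_1 U_2 ... U_k along a path."""
    pm = row_normalize(adjacencies[0])
    for W in adjacencies[1:]:
        pm = pm @ row_normalize(W)
    return pm

def normalized_hetesim(left, right_reversed):
    """Normalized HeteSim for an even-length path.

    left           -- adjacency matrices along P_L, from A_1 to the middle type M
    right_reversed -- adjacency matrices along P_R^{-1}, from A_{l+1} back to M
    Returns a |A_1| x |A_{l+1}| matrix of cosines between reaching distributions.
    """
    L = reachable(left)            # row s: distribution of s over M
    R = reachable(right_reversed)  # row t: distribution of t over M
    dots = L @ R.T
    norms = np.outer(np.linalg.norm(L, axis=1), np.linalg.norm(R, axis=1))
    return np.divide(dots, norms, out=np.zeros_like(dots), where=norms != 0)

# Hypothetical S-D-T network (not the matrices of Fig. 3).
W_SD = np.array([[1, 1, 0],
                 [0, 1, 1]])
W_TD = np.array([[1, 1, 0],
                 [0, 0, 1]])
H = normalized_hetesim([W_SD], [W_TD])
print(np.round(H, 3))
```

Here s1 and t1 reach D with identical distributions, so their score is 1, while s1 and t2 share no middle node and score 0. For a drug-side meta-path such as drug-drug-disease, one would presumably pass the drug similarity matrix as `left` and the transposed association matrix as `right_reversed`, with similarities serving as edge weights per the self-relation redefinition above.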
Example for HeteSim
A heterogeneous network containing three object types is shown in Fig. 3. Here, we show how to compute the HeteSim scores between s_{1} and t_{1}, and between s_{1} and t_{2}, under the relevance path P = (SDT). The path P = (SDT) can be divided into two parts, P_{L} = (SD) and P_{R} = (DT).
The adjacency matrices W_{SD} and W_{TD} are:
We then normalize these matrices along their rows to obtain the transition probability matrices of S → D and T → D:
According to Eq. (2), the reachable probability matrices for P_{L} and \( {P}_R^{-1} \) are simply these transition probability matrices, since \( {V}_{DT}={U}_{TD}^{\prime } \) [39]. Therefore, the HeteSim scores of (s_{1}, t_{1}) and (s_{1}, t_{2}) based on path P are calculated as:
HeteSim_DrugDisease method
In the drug-disease heterogeneous network used in this study, drugs and diseases are connected by different meta-paths. For example, a drug and a disease phenotype can be connected via the “drug-disease phenotype” path, the “drug-drug-disease phenotype” path, and so on. These meta-paths carry different semantic meanings. For example, the “drug-drug-disease phenotype” path indicates that if a drug is associated with a disease, other drugs similar to that drug can be regarded as potential drugs for the disease. The “drug-disease-disease” path means that if a disease is associated with a drug, other diseases similar to that disease may also be associated with the drug. Next, we describe systematically how to measure the similarity between drugs and diseases connected by meta-paths.
HSDD uses HeteSim to compute the similarity between drugs and diseases in the drug-disease heterogeneous network. HeteSim captures the subtle semantics of each meta-path, and, as is usual for path-based measures, the scores of different meta-paths are combined with a constant β that dampens the contributions of longer paths. In this paper, the value of β is determined experimentally.
The HSDD similarity S(s, t) can be expressed as
Here s and t denote a drug and a disease, respectively, and Ψ_{l} denotes the set of paths of length l connecting drug s to disease phenotype t. Shorter paths are generally believed to contribute more than longer ones, so in this study we consider only meta-paths of length less than five for HSDD. All 14 paths used to measure drug-disease associations are listed in Table 2.
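The count of 14 paths can be cross-checked with a short enumeration. The Python sketch below writes a meta-path as a string of node types, R for drug and D for disease, and assumes that all four transitions (R-R, R-D, D-R, D-D) exist in the heterogeneous network. It also sketches a combined score under an assumed form of the HSDD equation, S(s, t) = Σ_l β^l Σ_{p ∈ Ψ_l} HeteSim(s, t | p); that form is an assumed reading of the text, not a quotation of it, and `drug_disease_metapaths` and `hsdd_score` are hypothetical helpers.

```python
from itertools import product

def drug_disease_metapaths(length: int) -> list[str]:
    """Type sequences from a drug (R) to a disease (D) with `length` edges.

    Assumes every transition R-R, R-D, D-R and D-D exists in the network,
    so each intermediate type can be chosen freely.
    """
    return ["R" + "".join(middle) + "D" for middle in product("RD", repeat=length - 1)]

def hsdd_score(hetesim_by_path: dict[str, float], beta: float) -> float:
    """Assumed combination: sum of beta**l * HeteSim(s, t | path) over paths,
    where l = len(path) - 1 is the number of edges in a string such as "RRD"."""
    return sum(beta ** (len(p) - 1) * score for p, score in hetesim_by_path.items())

paths = [p for l in range(2, 5) for p in drug_disease_metapaths(l)]
print(len(paths))  # 14
print(paths[:2])   # ['RRD', 'RDD']
```

Lengths 2, 3 and 4 give 2, 4 and 8 type sequences, 14 in total. This matches the count stated above if the direct drug-disease edge (length 1) is left out, as the "combined path with length 2, 3, 4" setting in the Results suggests.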
Given a drug s and a disease phenotype t, the association strength is measured by
Results
In this section, we first introduce the metrics used to evaluate the performance of the prediction methods. Next, we comprehensively compare HSDD with other representative methods using datasets of diseases with known and unknown drugs. After that, we investigate the effect of the parameter β and of path lengths on HSDD. Finally, we conduct case studies to verify the effectiveness of HSDD in inferring drug-disease associations.
Evaluation measures
First, to evaluate the performance of different methods systematically, we conduct leave-one-out cross validation (LOOCV). For each drug, in each iteration, one of its known drug-disease associations is held out as test data and all remaining associations are used as training data. After prediction, the tested association is ranked together with the other candidate associations in descending order of predicted score. For a given ranking threshold, if the tested association ranks above the threshold, it is counted as a true positive; the fraction of tested associations ranked above the threshold gives the true-positive rate at that threshold. Conversely, if an unknown association ranks above the threshold, it is counted as a false positive. Computing the true-positive and false-positive rates over varying ranking thresholds yields the receiver operating characteristic (ROC) curve, and the area under the curve (AUC) summarizes the overall performance of an algorithm.
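Sweeping the ranking threshold as described traces the ROC curve, and its area equals the probability that a randomly chosen held-out association outscores a randomly chosen unknown pair, with ties counted as one half. The sketch below uses that equivalent form; it assumes the LOOCV scores of held-out positives and of unknown pairs have been pooled, which is an assumption since the text does not say whether per-iteration curves were averaged instead. `auc_from_scores` is a hypothetical helper and the scores are made up.

```python
import numpy as np

def auc_from_scores(pos_scores, neg_scores) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    held-out positive scores higher than the unknown pair; ties count 1/2."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return float(wins / (pos.size * neg.size))

# Hypothetical scores: two held-out associations, two unknown pairs.
print(auc_from_scores([0.9, 0.8], [0.1, 0.85]))  # 0.75
```

The pairwise form avoids choosing a threshold grid and gives the exact area of the step-wise ROC curve, which is convenient when thousands of candidate pairs are ranked per iteration.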
Second, the top-ranked predictions are generally the most useful in practice. We therefore compare the prediction methods in terms of the top 100 predictions, counting the true drug-disease associations correctly retrieved within specified top-rank thresholds. The top-rank thresholds used in this article are discrete, ranging from 0 to 1 in steps of 0.1. The more true associations a method places in the top portion of its ranking, the more effective it is.
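Counting correctly retrieved associations reduces to comparing ranks with cut-offs. The sketch below assumes that the rank of each held-out association (1 = highest score) has been recorded during LOOCV and uses the top 1, 10, 20, 50 and 100 cut-offs reported in the Results; `retrieved_at_top` is a hypothetical helper and the ranks are made up.

```python
def retrieved_at_top(ranks, thresholds=(1, 10, 20, 50, 100)):
    """Count held-out associations whose rank (1 = best) falls within each top-k cut-off."""
    return {k: sum(1 for r in ranks if r <= k) for k in thresholds}

# Hypothetical ranks of five held-out associations.
print(retrieved_at_top([1, 3, 15, 60, 200]))
# {1: 1, 10: 2, 20: 3, 50: 3, 100: 4}
```

The counts are cumulative by construction, so a method that dominates at the top 1 need not dominate at the top 100; reporting several cut-offs shows both ends of the ranking.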
Third, meta-paths of different lengths contribute differently to the relatedness of drugs and diseases, and the parameter β in Eq. (3) dampens the contributions of longer paths. We systematically evaluate the effect of β on HSDD and tune its value by cross validation.
Finally, we conduct case studies that predict the top ten related drugs for five common diseases and seek supporting evidence in the biomedical literature to verify the effectiveness of HSDD.
Comparison with existing methods on diseases with known drugs
We compare HSDD with four representative methods: NBI [30], HGBI [34], DrugNet [8] and MBiRW [33]. As mentioned earlier, NBI can prioritize candidate drugs for a given target and candidate targets for a given drug. HGBI predicts new drug-disease relationships in a three-layer heterogeneous model using an information flow-based method. DrugNet is a network-based drug repositioning method that supports both drug-disease and disease-drug prioritization. MBiRW is a state-of-the-art method for inferring potential novel indications for drugs. We compare HSDD with these four methods in the LOOCV experiment and in de novo drug-disease prediction. For HSDD, we use the combined path with lengths 2, 3 and 4 and β = 0.8.
We first conduct the LOOCV experiment for predicting drug-disease associations, using 1776 drug-disease associations involving 540 drugs and 306 diseases. The ROC curves and AUC values of all methods are presented in Fig. 4a. HSDD performs best overall among the five methods, with an AUC of 0.8994, compared with 0.5824, 0.8376, 0.7717 and 0.8710 for NBI, HGBI, DrugNet and MBiRW, respectively.
We further investigate the number of correctly retrieved drug-disease associations. A true drug-disease association is considered correctly retrieved if it is ranked above the specified top-rank threshold [33]. As shown in Fig. 4b, HSDD significantly outperforms the other four methods. HSDD ranks 386 associations at the top 1, compared with 15, 77, 69 and 346 for NBI, HGBI, DrugNet and MBiRW, respectively. HSDD also performs best at the top 10, top 20, top 50 and top 100 thresholds, followed by MBiRW. HSDD may therefore be more useful in practice than the other four approaches.
De novo drug–disease prediction
To evaluate the ability of HSDD to predict potential indications for new drugs, we conduct a de novo drug-disease prediction test. In this experiment, we select drugs that have only one known associated disease and evaluate HSDD by its ability to recover that association. This yields 153 drugs, 132 diseases and 153 drug-disease associations. We also evaluate NBI, HGBI, DrugNet and MBiRW in the same setting; the results are presented in Fig. 5.
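Selecting the de novo test set is a simple filter on the known associations. The sketch below assumes associations are stored as (drug, disease) pairs; presumably each selected pair is then hidden before scoring so that the model must recover it from the rest of the network. `single_association_pairs` is a hypothetical helper and the identifiers are made up.

```python
from collections import Counter

def single_association_pairs(pairs):
    """Keep (drug, disease) pairs whose drug has exactly one known association."""
    counts = Counter(drug for drug, _ in pairs)
    return [(drug, disease) for drug, disease in pairs if counts[drug] == 1]

# Hypothetical known associations.
known = [("dr1", "di1"), ("dr1", "di2"), ("dr2", "di2"), ("dr3", "di3")]
print(single_association_pairs(known))  # [('dr2', 'di2'), ('dr3', 'di3')]
```

Once such a pair is hidden, its drug has no remaining known indication, so any signal must come through drug-drug similarity paths, which is what makes this setting harder than ordinary LOOCV.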
As shown in Fig. 5a, HSDD achieves an AUC of 0.8296, outperforming the other four methods in the same setting; the AUC values of NBI, HGBI, DrugNet and MBiRW are 0.5668, 0.7629, 0.7375 and 0.8163, respectively.
We also investigate the number of correctly retrieved drug-disease associations (Fig. 5b). HSDD again outperforms the other four methods. Among the 153 known drug-disease associations, HSDD ranks 8 at the top 1, compared with 1, 4, 3 and 6 for NBI, HGBI, DrugNet and MBiRW, respectively. At the top 10, HSDD successfully predicts 68 associations, compared with 17, 27, 22 and 56, respectively. Overall, the de novo prediction results indicate that HSDD achieves superior performance.
The effect of parameters on HSDD
In this section, we investigate the effect of the parameter β on HSDD. β dampens the contributions of paths of different lengths, and previous research has found that the longer the path, the smaller the inhibiting factor [51]. We therefore evaluate combinations of β values and path lengths, with β ranging from 0.1 to 1.0 in steps of 0.1. Relevance paths are divided into two types: combined paths and independent paths. For each combination of β and path lengths, we conducted the LOOCV experiment and calculated the AUC value; the combinations and results are shown in Table 3.
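A sweep over β like the one summarized in Table 3 can be sketched as a small grid search. In the code below, `evaluate` is a hypothetical callable that would run LOOCV for a given β (and a fixed path combination) and return the AUC; the stand-in evaluator is a made-up function peaking at 0.9 and does not reproduce the values in Table 3. `grid_search_beta` is likewise a hypothetical helper.

```python
def grid_search_beta(evaluate, betas=None):
    """Return the beta with the highest score and the score for every beta.

    evaluate -- hypothetical callable mapping beta to a LOOCV AUC.
    """
    if betas is None:
        betas = [round(0.1 * i, 1) for i in range(1, 11)]  # 0.1, 0.2, ..., 1.0
    scores = {beta: evaluate(beta) for beta in betas}
    best = max(scores, key=scores.get)
    return best, scores

# Stand-in evaluator (illustration only, not the values in Table 3).
best, scores = grid_search_beta(lambda b: 1.0 - (b - 0.9) ** 2)
print(best)  # 0.9
```

Rounding the grid values keeps keys such as 0.9 exact, so the selected β can be compared directly with the values listed in a results table.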
The results in Table 3 show that, as β increases from 0.1 to 0.9, the AUC values of the combined path with lengths 2, 3 and 4 generally increase, and they decrease slightly from 0.9 to 1.0. HSDD therefore performs best with β = 0.9 and the combined path with lengths 2, 3 and 4. The best β for the other path combinations can also be read from Table 3.
We also evaluate the effect of path combination on HSDD. The results in Table 3 show that combined paths perform better than independent paths, and the combined path with lengths 2, 3 and 4 achieves the best performance among all path combinations. This is likely because the combined path with lengths 2, 3 and 4 captures more meaningful meta-paths than the combined paths with lengths 2 and 3 or with lengths 3 and 4. We therefore set β to 0.9 and select the combined path with lengths 2, 3 and 4 as the best path combination for HSDD, since it measures drug-disease associations most effectively. The variation of AUC with path combination is consistent with previous research on path-based algorithms [51].
Case studies
To further verify the effectiveness of HSDD, we use it to infer novel drug-disease associations. We select five common diseases [12] and predict their related drugs with HSDD: Huntington disease (HD, OMIM 143100), non-small-cell lung cancer (NSCLC, OMIM 211980), alcohol dependence (AD, OMIM 103780), small-cell lung cancer (SCLC, OMIM 182280) and polysubstance abuse, susceptibility to (PSAB, OMIM 606581). For each disease, we first obtain its known drugs and then present the top ten predicted drugs in Table 4. We use Huntington disease as an example to explain the case study results.
Huntington’s disease (HD), also known as Huntington’s chorea, is an autosomal-dominant, progressive neurodegenerative disorder with a distinct phenotype that results in the death of brain cells [52, 53]. HD has many phenotypes in the OMIM database; here we select 141,300 as its phenotype to predict its related drugs.
HSDD predicted ten drugs for HD. Quetiapine (DB01224) was studied in five consecutive patients with Huntington’s disease in a long-term care facility; these patients showed improvement in behavioral symptoms without worsening of motor function [54]. Paleacu conducted a study of eleven HD patients, and the results demonstrate that olanzapine (DB00334) is safe and effective in treating the behavioral disturbances, and frequently the chorea, seen in HD patients [55]. In addition, to evaluate the efficacy and safety of bupropion (DB01156) in treating apathy in HD, Gelderblom conducted a multicenter, randomized, double-blind, placebo-controlled, prospective crossover trial [56]. The trial showed that bupropion does not alleviate apathy in HD; however, the authors observed effects of participation and placebo, which underline the need for carefully controlled trials. The predicted drugs for the other diseases are presented in Table 4. In this experiment, all information in the network, including all known drugs, was used when computing the HeteSim scores of drug-disease pairs. Most of the drugs predicted by HSDD are supported by the literature, which indicates its good performance.
Discussion
In this study, we proposed HSDD to infer the associations between drugs and diseases.
Compared with other effective methods, HSDD shows the best performance on all datasets. HSDD captures the semantic meaning of metapaths in the heterogeneous network. In addition, the experimental results show that HSDD performs best when metapaths of lengths 2, 3 and 4 are combined. This is because the combined setting extracts more meaningful metapaths from the drug-disease heterogeneous network than the other path settings. Finally, the case-study results, which are validated by the literature, further indicate its good performance.
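To make the idea of combining metapaths of lengths 2, 3 and 4 concrete, the sketch below computes a normalized HeteSim score (following the general definition of Shi et al. [39]) for every drug-to-disease metapath of a given length and averages them. Several choices here are assumptions, not details taken from HSDD: the two node types are drugs ("R") and diseases ("D"), relations are dense NumPy matrices, odd-length paths split the middle relation into edge objects that are binarized for simplicity, and the per-metapath scores are combined by a plain average. All function names are hypothetical.

```python
import itertools

import numpy as np


def row_normalize(m):
    """Row-normalize a relation matrix into transition probabilities (zero rows stay zero)."""
    m = np.asarray(m, dtype=float)
    s = m.sum(axis=1, keepdims=True)
    return np.divide(m, s, out=np.zeros_like(m), where=s != 0)


def split_edges(w):
    """Split relation W (A x B) into A->E and E->B, one edge object per nonzero entry.

    Simplifying assumption: the middle relation is binarized.
    """
    rows, cols = np.nonzero(w)
    idx = np.arange(len(rows))
    w_ae = np.zeros((w.shape[0], len(rows)))
    w_eb = np.zeros((len(rows), w.shape[1]))
    w_ae[rows, idx] = 1.0
    w_eb[idx, cols] = 1.0
    return w_ae, w_eb


def reach(mats):
    """Reachable-probability matrix: product of row-normalized relation matrices."""
    out = row_normalize(mats[0])
    for m in mats[1:]:
        out = out @ row_normalize(m)
    return out


def hetesim(path, W):
    """Normalized HeteSim matrix for a metapath string such as 'RRD'."""
    mats = [W[(path[i], path[i + 1])] for i in range(len(path) - 1)]
    half = len(mats) // 2
    if len(mats) % 2 == 0:
        left, right = mats[:half], mats[half:]
    else:
        # Odd length: route the middle relation through edge objects E.
        w_ae, w_eb = split_edges(mats[half])
        left, right = mats[:half] + [w_ae], [w_eb] + mats[half + 1:]
    pm_left = reach(left)
    # Walk the right half backwards, from the target type to the middle.
    pm_right = reach([m.T for m in reversed(right)])
    num = pm_left @ pm_right.T
    norm = np.outer(np.linalg.norm(pm_left, axis=1), np.linalg.norm(pm_right, axis=1))
    return np.divide(num, norm, out=np.zeros_like(num), where=norm != 0)


def metapaths(length, start="R", end="D", types="RD"):
    """All metapaths with `length` relations from `start` to `end`."""
    for mid in itertools.product(types, repeat=length - 1):
        yield start + "".join(mid) + end


def combined_score(W, lengths=(2, 3, 4)):
    """Average HeteSim over all drug->disease metapaths of the given lengths (assumed combination)."""
    paths = [p for k in lengths for p in metapaths(k)]
    return sum(hetesim(p, W) for p in paths) / len(paths)


# Toy network (made-up numbers, not data from the paper): 3 drugs, 2 diseases.
drug_sim = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]])
dis_sim = np.array([[1.0, 0.5], [0.5, 1.0]])
assoc = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
W = {("R", "R"): drug_sim, ("D", "D"): dis_sim, ("R", "D"): assoc, ("D", "R"): assoc.T}
scores = combined_score(W)  # shape (3, 2): one score per drug-disease pair
```

With symmetric similarity matrices, `hetesim("RRD", W)` equals the transpose of `hetesim("DRR", W)`, reflecting the symmetry of HeteSim. The plain average only keeps the sketch short; a weighted or learned combination of metapath scores would be an equally plausible design.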
Conclusions
Drug repositioning is a promising and efficient way to discover new associations between drugs and diseases. With the rise of precision medicine, drug repositioning will play an increasingly important role. In this study, we proposed a novel method called HSDD to address the drug repositioning problem. HSDD makes the best use of metapaths of different lengths in the drug-disease heterogeneous network and measures association strength based on HeteSim scores. The results of all the cross-validation experiments show that HSDD outperforms the other methods and effectively improves prediction performance. In addition, case studies on several typical diseases indicate that HSDD is an efficient and useful way to predict potential drug-disease associations.
HSDD can easily be extended to other problems, provided that suitable data are available. For example, RNA-protein association prediction is another meaningful problem for which, as in drug repositioning, network-based methods have already achieved good performance. Likewise, identifying microRNAs associated with diseases is very important for understanding disease pathogenesis at the molecular level. HSDD could be applied to these problems as well.
We also plan to address two issues in future work. First, this study only considers paths of length less than five, yet longer paths may also carry meaningful semantics. We therefore plan to investigate the effect of longer paths on HSDD more comprehensively. Second, this study only considers the direct associations between drugs and diseases, and thus uses only two types of objects. Some studies have incorporated drug-target relationships into drug repositioning. For example, drug-disease associations could be predicted from a drug-target-disease three-layer heterogeneous network, an idea inspired by data fusion.
Abbreviations
DBSI: Drug-based similarity inference
GBA: Guilt-by-association
HSDD: HeteSim_DrugDisease
NBI: Network-based inference
TBSI: Target-based similarity inference
References
 1.
Yu L, Wang B, Ma X, Gao L. The extraction of drugdisease correlations based on module distance in incomplete human interactome. BMC Syst Biol. 2016;10(4):531.
 2.
DiMasi JA, Seibring MA, Lasagna L. New drug development in the United States from 1963 to 1992. Clin Pharmacol Ther. 1994;55(6):609–22.
 3.
Guney E, Menche J, Vidal M, Barabási AL. Network-based in silico drug efficacy screening. Nat Commun. 2016;7:10331.
 4.
Paul SM, Mytelka DS, Dunwiddie CT, Persinger CC, Munos BH, Lindborg SR, Schacht AL. How to improve R&D productivity: the pharmaceutical industry's grand challenge. Nat Rev Drug Discov. 2010;9(3):203–14.
 5.
Gottlieb A, Stein GY, Ruppin E, Sharan R. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol Syst Biol. 2011;7(1):496.
 6.
Ashburn TT, Thor KB. Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov. 2004;3(8):673–83.
 7.
Ammad-ud-din M, Khan SA, Malani D, Murumagi A, Kallioniemi O, Aittokallio T, Kaski S. Drug response prediction by inferring pathway-response associations with kernelized Bayesian matrix factorization. Bioinformatics. 2016;32(17):455–63.
 8.
Martínez V, Navarro C, Cano C, Fajardo W, Blanco A. DrugNet: network-based drug–disease prioritization by integrating heterogeneous data. Artif Intell Med. 2015;63(1):41–9.
 9.
Von Eichborn J, Murgueitio MS, Dunkel M, Koerner S, Bourne PE, Preissner R. PROMISCUOUS: a database for network-based drug-repositioning. Nucleic Acids Res. 2011;39(suppl 1):D1060–6.
 10.
Jin G, Wong ST. Toward better drug repositioning: prioritizing and integrating existing methods into efficient pipelines. Drug Discov Today. 2014;19(5):637–44.
 11.
Shameer K, Readhead B, Dudley JT. Computational and experimental advances in drug repositioning for accelerated therapeutic stratification. Curr Top Med Chem. 2015;15(1):5–20.
 12.
Wang W, Yang S, Zhang X, Li J. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics. 2014;30(20):2923–30.
 13.
Li J, Zheng S, Chen B, Butte AJ, Swamidass SJ, Lu Z. A survey of current trends in computational drug repositioning. Brief Bioinform. 2016;17(1):2–12.
 14.
Lotfi Shahreza M, Ghadiri N, Mousavi SR, Varshosaz J, Green JR. A review of network-based approaches to drug repositioning. Brief Bioinform. 2018;19(5):878–92.
 15.
Kinnings SL, Liu N, Tonge PJ, Jackson RM, Xie L, Bourne PE. A machine learning-based method to improve docking scoring functions and its application to drug repurposing. J Chem Inf Model. 2011;51(2):408–19.
 16.
Wang Y, Chen S, Deng N, Wang Y. Drug repositioning by kernel-based integration of molecular structure, molecular activity, and phenotype data. PLoS One. 2013;8(11):e78518.
 17.
Menden MP, Iorio F, Garnett M, McDermott U, Benes CH, Ballester PJ, SaezRodriguez J. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS One. 2013;8(4):e61318.
 18.
Napolitano F, Zhao Y, Moreira VM, Tagliaferri R, Kere J, D’Amato M, Greco D. Drug repositioning: a machine-learning approach through data integration. J Cheminform. 2013;5(1):30.
 19.
Zhang P, Wang F, Hu J. Towards drug repositioning: a unified computational framework for integrating multiple aspects of drug similarity and disease similarity. In: AMIA Annual Symposium Proceedings: American medical informatics association; 2014. p. 1258.
 20.
Yang J, Li Z, Fan X, Cheng Y. Drug–disease association and drug-repositioning predictions in complex diseases using causal inference–probabilistic matrix factorization. J Chem Inf Model. 2014;54(9):2562–9.
 21.
Wu G, Liu J, Wang C. Semi-supervised graph cut algorithm for drug repositioning by integrating drug, disease and genomic associations. In: Bioinformatics and Biomedicine (BIBM): IEEE International Conference on: 2016. IEEE; 2016. p. 223–8.
 22.
Liang X, Zhang P, Yan L, Fu Y, Peng F, Qu L, Shao M, Chen Y, Chen Z. LRSSL: predict and interpret drug–disease associations based on data integration using sparse subspace learning. Bioinformatics. 2017;33(8):1187–96.
 23.
Yildirim MA, Goh KI, Cusick ME, Barabasi AL, Vidal M. Drug-target network. Nat Biotechnol. 2007;25(10):1119.
 24.
Chandrasekaran SN, Huan J. Weighted multi-view learning for predicting drug-disease associations. In: Bioinformatics and Biomedicine (BIBM): IEEE International Conference on: 2016. IEEE; 2016. p. 699–702.
 25.
Wang J, Kribelbauer J, Rabadan R. Network propagation reveals novel features predicting drug response of Cancer cell lines. Curr Bioinforma. 2016;11(2):203–10.
 26.
Keiser MJ, Setola V, Irwin JJ, Laggner C, Abbas AI, Hufeisen SJ, Jensen NH, Kuijer MB, Matos RC, Tran TB. Predicting new molecular targets for known drugs. Nature. 2009;462(7270):175–81.
 27.
Chen H, Zhang H, Zhang Z, Cao Y, Tang W. Network-based inference methods for drug repositioning. Comput Math Methods Med. 2015;2015.
 28.
Peng J, Hui W, Shang X. Measuring phenotype-phenotype similarity through the interactome. BMC Bioinformatics. 2018;19(5):114.
 29.
Zeng X, Liu L, Lü L, Zou Q, Valencia A. Prediction of potential disease-associated microRNAs using structural perturbation method. Bioinformatics. 2018;1:8.
 30.
Cheng F, Liu C, Jiang J, Lu W, Li W, Liu G, Zhou W, Huang J, Tang Y. Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Comput Biol. 2012;8(5):e1002503.
 31.
Wu C, Gudivada RC, Aronow BJ, Jegga AG. Computational drug repositioning through heterogeneous network clustering. BMC Syst Biol. 2013;7(Suppl 5):S6.
 32.
Huang YF, Yeh HY, Soo VW. Inferring drug-disease associations from integration of chemical, genomic and phenotype data using network propagation. BMC Med Genet. 2013;6(3):S4.
 33.
Luo H, Wang J, Li M, Luo J, Peng X, Wu FX, Pan Y. Drug repositioning based on comprehensive similarity measures and bi-random walk algorithm. Bioinformatics. 2016;32(17):2664–71.
 34.
Wang W, Yang S, Li J. Drug target predictions based on heterogeneous graph inference. In: Pacific Symposium on Biocomputing: NIH Public Access; 2013. p. 53.
 35.
Hahn U, Cohen KB, Garten Y, Shah NH. Mining the pharmacogenomics literature—a survey of the state of the art. Brief Bioinform. 2012;13(4):460–94.
 36.
Frijters R, Van Vugt M, Smeets R, Van Schaik R, De Vlieg J, Alkema W. Literature mining for the discovery of hidden connections between drugs, genes and diseases. PLoS Comput Biol. 2010;6(9):e1000943.
 37.
Yang HT, Ju JH, Wong YT, Shmulevich I, Chiang JH. Literature-based discovery of new candidates for drug repurposing. Brief Bioinform. 2016.
 38.
Chen B, Ding Y, Wild DJ. Assessing drug target association using semantic linked data. PLoS Comput Biol. 2012;8(7):e1002574.
 39.
Shi C, Kong X, Huang Y, Yu PS, Wu B. HeteSim: a general framework for relevance measure in heterogeneous networks. IEEE Trans Knowl Data Eng. 2014;26(10):2479–92.
 40.
Li C, Sun J, Xiong Y, Zheng G. An efficient drug-target interaction mining algorithm in heterogeneous biological networks. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer; 2014. p. 65–76.
 41.
Yang J, Li A, Ge M, Wang M. Prediction of interactions between lncRNA and protein by using relevance search in a heterogeneous lncRNA-protein network. In: Control Conference (CCC): 34th Chinese: 2015. IEEE; 2015. p. 8540–4.
 42.
Zeng X, Liao Y, Liu Y, et al. Prediction and validation of disease genes using HeteSim scores. IEEE/ACM Trans Comput Biol Bioinform. 2017;14(3):687–95.
 43.
Zhang X, Zou Q, Rodriguez-Paton A, Zeng X. Metapath methods for prioritizing candidate disease miRNAs. IEEE/ACM Trans Comput Biol Bioinform. https://doi.org/10.1109/TCBB.2017.2776280.
 44.
Katz L. A new status index derived from sociometric analysis. Psychometrika. 1953;18(1):39–43.
 45.
Singh-Blom UM, Natarajan N, Tewari A, Woods JO, Dhillon IS, Marcotte EM. Prediction and validation of gene-disease associations using methods inspired by social network analyses. PLoS One. 2013;8(5):e58977.
 46.
van Driel MA, Bruggeman J, Vriend G, Brunner HG, Leunissen JA. A text-mining analysis of the human phenome. Eur J Hum Genet. 2006;14(5):535–42.
 47.
Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian inheritance in man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005;33(suppl 1):D514–7.
 48.
Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R. Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol. 2010;6(1):e1000641.
 49.
Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008;36(suppl_1):D901–6.
 50.
Yang J, Li A, Ge M, Wang M. Relevance search for predicting lncRNA-protein interactions based on heterogeneous network. Neurocomputing. 2016;206:81–8.
 51.
Xiao Y, Zhang J, Deng L. Prediction of lncRNA-protein interactions using HeteSim scores based on heterogeneous networks. Sci Rep. 2017;7:3664.
 52.
Vonsattel JPG, DiFiglia M. Huntington disease. J Neuropathol Exp Neurol. 1998;57(5):369.
 53.
Walker FO. Huntington’s disease. Lancet. 2007;369(9557):218–28.
 54.
Alpay M, Koroshetz WJ. Quetiapine in the treatment of behavioral disturbances in patients with Huntington’s disease. Psychosomatics. 2006;47(1):70–2.
 55.
Paleacu D, Anca M, Giladi N. Olanzapine in Huntington's disease. Acta Neurol Scand. 2002;105(6):441–4.
 56.
Gelderblom H, Wüstenberg T, McLean T, Mütze L, Fischer W, Saft C, Hoffmann R, Süssmuth S, Schlattmann P, van Duijn E. Bupropion for the treatment of apathy in Huntington’s disease: a multicenter, randomised, double-blind, placebo-controlled, prospective crossover trial. PLoS One. 2017;12(3):e0173872.
Acknowledgements
Not applicable.
Funding
This work was supported by the Natural Science Foundation of China (Grant No. 61532014, 61571163, 61671189 and 61801432) and the National Key Research and Development Plan Task of China (Grant No. 2016YFC0901902). Publication of this article was sponsored by the Natural Science Foundation of China (Grant No. 61801432).
Availability of data and materials
Not applicable.
About this supplement
This article has been published as part of BMC Systems Biology Volume 12 Supplement 9, 2018: Proceedings of the 29th International Conference on Genome Informatics (GIW 2018): systems biology. The full contents of the supplement are available online at https://bmcsystbiol.biomedcentral.com/articles/supplements/volume-12-supplement-9.
Author information
Contributions
ZT proposed the idea, implemented the experiments and drafted the manuscript. ZT and SC helped with data analysis and revised the manuscript. MG initiated the idea, conceived the whole process and finalized the paper. All authors have read and approved the final manuscript.
Corresponding author
Correspondence to Maozu Guo.
Ethics declarations
Ethics approval and consent to participate
There are no ethics issues. No human participants or individual clinical data are involved in this study.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Keywords
 Semantic network analysis
 Drug repositioning
 Metapath-based
 HeteSim
 HSDD