
FCMDAP: using miRNA family and cluster information to improve the prediction accuracy of disease related miRNAs

Abstract

Background

Biological experiments have confirmed the association between miRNAs and various diseases. However, such experiments are costly and time consuming. Computational methods help select potential disease-related miRNAs to improve the efficiency of biological experiments.

Methods

In this work, we develop a novel method, FCMDAP, which uses multiple types of data to calculate miRNA and disease similarity based on mutual information and adds miRNA family and cluster information to predict human disease-related miRNAs. The method does not depend solely on known miRNA-disease associations; it also measures miRNA and disease similarity accurately and addresses the problem of overestimation. FCMDAP uses the k most similar neighbor recommendation algorithm to predict the association score between a miRNA and a disease. Information about miRNA clusters is also used to improve prediction accuracy.

Result

FCMDAP achieves an average AUC of 0.9165 based on leave-one-out cross validation. In case studies on colorectal, lung, and pancreatic neoplasms, 100%, 98%, and 96% of the top 50 predicted miRNAs, respectively, are confirmed by existing reports. FCMDAP also exhibits satisfactory performance in predicting diseases without any related miRNAs and miRNAs without any related diseases.

Conclusions

In this study, we present a computational method FCMDAP to improve the prediction accuracy of disease related miRNAs. FCMDAP could be an effective tool for further biological experiments.

Background

MicroRNAs (miRNAs) are small endogenous non-coding RNAs about 22 nt in length that regulate gene expression mainly at the post-transcriptional level [1]. The latest version of miRBase contains 1881 human miRNAs, and miRNAs regulate more than 60% of human protein-coding genes. Through their target genes, miRNAs take part in biological processes such as cell growth, proliferation, differentiation, and apoptosis. miRNAs play a critical role in the development of various diseases, including cancers [2]. Takamizawa et al. [3] found that the expression level of let-7 decreases in lung neoplasms in vivo and in vitro, resulting in shortened post-operative survival of the patients. Moreover, let-7 is a potential therapeutic miRNA for prevention of tumorigenesis. Lung neoplasms are characterized by several key oncogene mutations, including p53, RAS, and MYC, some of which may be directly related to the decreased expression of let-7 and may be inhibited by introducing this miRNA [3]. miRNAs can also be used as biomarkers to identify the tissue of origin of cancers of unknown primary origin [4, 5]. Therefore, identification of disease-related miRNAs would benefit research on pathogenesis and diagnosis.

Many disease-related miRNAs have been identified through biological experiments. Researchers have collected data from existing literature to build miRNA-related databases, such as miRBase [6], miRGen [7], miRTarBase [8], miRWalk [9], microRNA.org [10], miRCancer [11], HMDD [12], miR2Disease [13], dbDEMC [14], and PhenomiR [15]. These databases provide solid data foundation for study of miRNAs. However, methodologies for screening of miRNA-disease associations are costly and time consuming. In this regard, computational methods are used to predict miRNAs that are most likely associated with a disease and provide experimental targets for biological experiments to save cost and time.

Computational methods are classified into two main categories, namely, network-based methods and machine-learning-based methods [16]. Network-based methods predict unknown miRNA-disease associations by using miRNA- and disease-related data resources to construct miRNA and disease similarity networks [17]; these networks are then combined with experimentally validated (or known) miRNA-disease networks. Jiang et al. [18] proposed a miRNA prediction algorithm based on a hypergeometric distribution scoring system, in which the scores are ranked to select candidate disease-related miRNAs. Chen et al. [19] proposed the WBSMDA method, which integrates the Within-Score of miRNA and disease similarity and the Between-Score of unknown miRNA-disease associations to predict potential miRNA-disease associations. However, the two methods make assumptions about probability distribution, and their prediction performance is affected when the data resources are inconsistent with the assumptions. Xuan et al. [20] proposed the HDMP method, which considers the weighted k most similar neighboring miRNAs and combines miRNA functional similarity to predict miRNAs associated with human diseases. The RWRMDA [21] and MIDP [22] methods use random walk to calculate the similarity of miRNAs and diseases. However, these methods cannot predict related miRNAs for diseases without any related miRNAs or for new diseases (isolated diseases). Zou et al. [23] proposed KATZ to calculate the prediction score of different walking lengths between miRNAs and diseases through social network analysis. However, the performance of KATZ is poor because the known associations are sparse. KATZ also cannot predict related diseases for miRNAs without known related diseases or for new miRNAs (isolated miRNAs), nor can it predict related miRNAs for isolated diseases. NCPMDA [24] develops network consistency projection to calculate the potential miRNA–disease association score from miRNA and disease vector space projection scores. Li et al. [25] proposed a network similarity integration method (NSIM) for predicting potential miRNA-disease associations. However, NSIM is overly dependent on known miRNA-disease associations. HGIMDA [26] utilizes a heterogeneous graph iterative algorithm based on known miRNA–disease associations to predict miRNA–disease associations. However, the parameters of HGIMDA are difficult to select.

Machine learning-based methods aim to predict reliable miRNA-disease associations by extracting effective features or solving specific optimization problems with powerful machine-learning algorithms. Xu et al. [27] built a support vector machine (SVM) classifier by using four topological features based on the miRNA target-dysregulated network to predict potential miRNAs related to prostate cancer. The main disadvantage of Xu’s method is that negative samples cannot be obtained, which decreases the prediction performance. Chen and Yan [28] proposed the RLSMDA method, which uses regularized least squares to predict miRNA-disease associations. This method is based on semi-supervised learning and avoids using negative samples, but its parameters are intricate to tune. Li et al. [29] proposed the MCMDA method, which uses the matrix completion algorithm. Luo et al. [30] proposed the CPTL method, which uses a transduction learning collective prediction model to predict miRNA-disease associations. However, these methods cannot be applied to predict potential miRNAs for isolated diseases.

The above methods use only a single piece of information related to miRNAs or diseases, such as associations of miRNAs and diseases verified by biological experiments, resulting in overestimation [31]. Therefore, researchers have investigated different types of miRNA- and disease-related a priori biological information to construct miRNA–disease associations through intermediaries. For example, Mørk et al. [32] developed a miRNA–protein–disease heterogeneity-related network, namely, miRPD, which uses protein-related associations as a bridge to link miRNAs and diseases. However, the prediction accuracy of miRPD is unsatisfactory because of its high false positive/negative rates. Xu et al. [33] used the network of interactions between miRNAs and target genes derived from matched miRNA and mRNA expression data and the network of interactions between specific miRNAs and diseases to rank and identify miRNAs most likely associated with multiple diseases. Liu et al. [31] integrated miRNA-target gene and miRNA-lncRNA data sources, established disease and miRNA similarity subnets, and predicted miRNA-disease associations in heterogeneous networks by using random walk with restart. Zeng et al. [34] used gene functional information, four main parameters of miRNAs, and miRNA-disease associations to construct a bilayer network. They then used structural consistency as an indicator to estimate the link predictability of the bilayer network and used the structural perturbation method (SPM) to predict potential miRNA-disease associations. SRMDAP [35] builds miRNA and disease similarity subnetworks by using the SimRank algorithm and a density-based clustering recommender model based on known miRNA-mRNA interaction data, disease-gene data, and miRNA-disease association data. However, these methods lead to incomplete calculation of similarity and low prediction accuracy.

In our work, we propose a novel computational method, namely, FCMDAP, which uses miRNA family and cluster information to improve the prediction accuracy of disease-related miRNAs. FCMDAP uses information entropy and mutual information (MI) to measure similarity between miRNAs based on miRNA–mRNA interactions and adds miRNA family information to reconstruct a miRNA similarity network. FCMDAP obtains functional similarity between diseases based on disease–gene interactions and semantic similarity between diseases based on the disease directed acyclic graph (DAG). FCMDAP then integrates functional and semantic similarity into disease similarity. Based on the k-most similar neighboring recommendation algorithm, FCMDAP uses experimentally verified miRNA–disease associations, miRNA similarity, and cluster information to predict potential miRNA–disease associations in miRNA space. FCMDAP also uses experimentally verified miRNA–disease associations and disease similarity to predict potential miRNA–disease associations in disease space. The two predicted association scores are linearly integrated. We implemented leave-one-out cross validation (LOOCV) and achieved an AUC of 0.9165. Analysis of the miRCancer, dbDEMC, or PhenomiR databases confirmed 50, 49, and 48 of the top 50 predicted miRNAs in case studies of colorectal, lung, and pancreatic neoplasms, respectively. The average AUC values of FCMDAP in predicting isolated diseases and miRNAs were 0.8417 and 0.8944, respectively. For isolated lung neoplasms, all of the top 50 predicted miRNAs were confirmed. For isolated hsa-mir-93, 9 of the top 10 diseases were confirmed. In conclusion, FCMDAP outperforms the compared methods.

Materials

Data

Data used in FCMDAP are obtained from five data sets:

  1. Experimentally verified miRNA-disease association data from the HMDD v2.0 database (http://www.cuilab.cn/hmdd, Jun-14-2014 Version) [12]. After filtering out invalid records with erroneous disease or miRNA names and removing redundant miRNA-disease associations, we obtained 5048 experimentally verified miRNA-disease associations involving 475 miRNAs and 334 diseases as the benchmark dataset [see Additional file 1]. We use M = {m1, m2, …, mnm} to represent the miRNA set and D = {d1, d2, …, dnd} to represent the disease set, where nm is the number of miRNAs and nd is the number of diseases. We also use the matrix AS to represent the known associations of miRNAs and diseases. When miRNA i is associated with disease j, AS(i, j) is 1. Otherwise, AS(i, j) is 0.

  2. Experimentally verified miRNA-mRNA interactions from the miRTarBase database (http://mirtarbase.mbc.nctu.edu.tw/, Release 6.0: Sept-15-2015) [36]. We use these data to measure the functional similarity of miRNAs.

  3. Experimentally verified disease-gene interactions from the DisGeNET database (http://www.disgenet.org, Release 4.0: Oct-2016) [37]. We use these data to measure the functional similarity of diseases.

  4. Data on the relationships among diseases from the MeSH (http://www.nlm.nih.gov/, 2017 Version) descriptors of Category C, which are described as a DAG. We use these data to measure the semantic similarity of diseases.

  5. Information on the families and clusters of human miRNAs from miRBase (http://www.mirbase.org, Release 21) [6]. We established the miRNA family information matrix FAM for the 475 miRNAs in the benchmark. FAM(i, j) = 1 if miRNA i and j are in the same family; otherwise, FAM(i, j) = 0. We also established the miRNA cluster information matrix CLU for the 475 miRNAs. If the distance between miRNA i and j is less than 20 kb, we consider the two miRNAs to be in the same cluster and set CLU(i, j) = 1; otherwise, CLU(i, j) = 0.
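As a minimal sketch of how the three binary matrices described above could be assembled, the following Python code builds AS, FAM, and CLU from toy records. The record layout, the miRNA and disease names, and the requirement that clustered miRNAs lie on the same chromosome are assumptions made for illustration; they are not taken from HMDD or miRBase file formats.

```python
def build_matrices(mirnas, diseases, known_pairs, family, location, max_dist=20_000):
    """Return (AS, FAM, CLU) as nested lists of 0/1 values."""
    row = {m: i for i, m in enumerate(mirnas)}
    col = {d: j for j, d in enumerate(diseases)}
    n = len(mirnas)

    # AS(i, j) = 1 when miRNA i has a known association with disease j.
    AS = [[0] * len(diseases) for _ in range(n)]
    for m, d in known_pairs:
        AS[row[m]][col[d]] = 1

    FAM = [[0] * n for _ in range(n)]
    CLU = [[0] * n for _ in range(n)]
    for a in mirnas:
        for b in mirnas:
            i, j = row[a], row[b]
            fam_a = family.get(a)
            # miRNAs without a family annotation never match (assumption).
            if fam_a is not None and fam_a == family.get(b):
                FAM[i][j] = 1
            chrom_a, pos_a = location[a]
            chrom_b, pos_b = location[b]
            # Assumption: distance is only compared on the same chromosome.
            if chrom_a == chrom_b and abs(pos_a - pos_b) < max_dist:
                CLU[i][j] = 1
    return AS, FAM, CLU


# Hypothetical toy input (names and coordinates are invented).
mirnas = ["mir-a", "mir-b", "mir-c"]
diseases = ["disease-x", "disease-y"]
known_pairs = [("mir-a", "disease-x"), ("mir-b", "disease-x"), ("mir-c", "disease-y")]
family = {"mir-a": "fam1", "mir-b": "fam1", "mir-c": "fam2"}
location = {"mir-a": ("chr13", 1_000), "mir-b": ("chr13", 9_000), "mir-c": ("chr13", 50_000)}

AS, FAM, CLU = build_matrices(mirnas, diseases, known_pairs, family, location)
```

Note that this sketch follows the definitions literally, so an annotated miRNA is in the same family and cluster as itself and the diagonals of FAM and CLU are 1.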

miRNA similarity network

Information entropy and mutual information (MI) are used to calculate similarity between miRNAs based on the set of mRNAs interacting with miRNAs.

For an event set X, information entropy is a measure of the average information content obtained when one of the events actually occurs [38]. It is defined as

$$H(X)=\sum \limits_{x\in X}p(x)\mathit{\log}\frac{1}{p(x)}=-\sum \limits_{x\in X}p(x)\mathit{\log}\left(p(x)\right)$$
(1)

where p(x) is the probability of x.

For two discrete random variables X and Y, their MI can be described as

$$I\left(X;Y\right)={\sum}_{x\in X}{\sum}_{y\in Y}p\left(x,y\right)\mathit{\log}\frac{p\left(x,y\right)}{p(x)p(y)}$$
(2)

where p(x) is the marginal probability distribution function of X, p(y) is the marginal probability distribution function of Y, and p(x, y) is the joint probability function of X and Y.

If the mRNA set of miRNA A is \({T}_m^A=\left\{{T}_m^A(1),{T}_m^A(2),\dots, {T}_m^A(ma)\right\}\), and the mRNA set of miRNA B is \({T}_m^B=\left\{{T}_m^B(1),{T}_m^B(2),\dots, {T}_m^B(mb)\right\}\) (where ma and mb are the numbers of target genes of miRNA A and miRNA B, respectively), then the information entropy of \({T}_m^A\) can be calculated as

$$\left\{\begin{array}{c}H\left({T}_m^A\right)=-{\sum}_{i=1}^{ma}p\left({T}_m^A(i)\right){\log}_2\left(p\left({T}_m^A(i)\right)\right)\\ {}p\left({T}_m^A(i)\right)=n\left({T}_m^A(i)\right)/N\end{array}\right.$$
(3)

where N is the total number of the known miRNA–mRNA interactions in the dataset. \(n\left({T}_m^A(i)\right)\) is the known number of interactions between the ith target gene in the target gene set of miRNA A and all miRNAs. \(p\left({T}_m^A(i)\right)\) is the rate of the ith target gene in the target gene set of miRNA A with the known miRNA-mRNA interactions.

The similarity between miRNA A and miRNA B can use the normalized MI of \({T}_m^A\) and \({T}_m^B\) denoted as

$$SM\left(A,B\right)=\frac{2\ast H\left({T}_m^A\cap {T}_m^B\right)}{H\left({T}_m^A\right)+H\left({T}_m^B\right)}$$
(4)

where \(H\left({T}_m^A\cap {T}_m^B\right)\) is the information entropy of the intersection of \({T}_m^A\) and \({T}_m^B\). The similarity of miRNA A and miRNA B therefore depends on two factors: the information entropy of each miRNA's own target set and the information entropy of their shared target mRNAs. The frequency of occurrence of each target mRNA is also considered, because the occurrence probability of every target gene enters the entropy. A target gene with higher probability is more universal and carries less information, whereas a target gene with lower probability is more specific and carries more information, so target genes contribute differently to the similarity according to their probabilities. By comparing the similarity data, we find that the metric is determined by the above two factors and that the similarity between two miRNAs can be appropriately measured.
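The entropy-based similarity of Eqs. 3 and 4 can be sketched in a few lines of Python. This is an illustrative reading of the formulas, not the authors' implementation: the input is assumed to be a dictionary mapping each miRNA to its set of target genes, and a pair whose entropies sum to zero is assigned a similarity of 0 by convention.

```python
import math
from collections import Counter


def entropy(genes, counts, total):
    """Eq. 3: H(T) = -sum_g p(g) * log2 p(g), with p(g) = n(g) / N."""
    h = 0.0
    for g in genes:
        p = counts[g] / total
        h -= p * math.log2(p)
    return h


def mi_similarity(targets, a, b):
    """Eq. 4: SM(A, B) = 2 * H(T_A & T_B) / (H(T_A) + H(T_B))."""
    # n(g): number of miRNAs that interact with gene g.
    counts = Counter(g for genes in targets.values() for g in genes)
    # N: total number of known miRNA-mRNA interactions.
    total = sum(counts.values())
    denom = entropy(targets[a], counts, total) + entropy(targets[b], counts, total)
    if denom == 0:
        return 0.0  # convention chosen for this sketch
    shared = targets[a] & targets[b]
    return 2 * entropy(shared, counts, total) / denom


# Hypothetical toy interactions (gene and miRNA names are invented).
targets = {
    "mir-a": {"g1", "g2"},
    "mir-b": {"g1", "g2"},
    "mir-c": {"g2", "g3"},
    "mir-d": {"g4"},
}
```

With identical target sets the similarity is 1, with disjoint target sets it is 0, and partially overlapping sets fall in between. The same two functions apply unchanged to disease gene sets.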

Disease similarity network

In building disease similarity network, we first calculate the functional similarity of disease on the basis of disease-gene interaction dataset. We then calculate the semantic similarity of disease on the basis of disease DAG. Finally, we integrate both data into disease similarity to build a disease similarity network.

Disease functional similarity of known disease–gene interactions

If the set of genes interacting with disease A is \({T}_d^A=\left\{{T}_d^A(1),{T}_d^A(2),\dots, {T}_d^A(da)\right\}\), and that of disease B is \({T}_d^B=\left\{{T}_d^B(1),{T}_d^B(2),\dots, {T}_d^B(db)\right\}\) (where da and db are the numbers of target genes of disease A and disease B, respectively), then the information entropy of \({T}_d^A\) can be calculated as

$$\left\{\begin{array}{c}H\left({T}_d^A\right)=-{\sum}_{i=1}^{da}p\left({T}_d^A(i)\right){\log}_2\left(p\left({T}_d^A(i)\right)\right)\\ {}p\left({T}_d^A(i)\right)=n\left({T}_d^A(i)\right)/N\end{array}\right.$$
(5)

where N is the total number of known disease–gene interactions in the dataset, \(n\left({T}_d^A(i)\right)\) is the known number of the interactions between the ith target gene in the target gene set of disease A and all diseases, and \(p\left({T}_d^A(i)\right)\) is the rate of the ith target gene in the target gene set of disease A with known disease–gene interactions.

The functional similarity between disease A and disease B can use the normalized MI of \({T}_d^A\) and \({T}_d^B\) denoted as

$$SDF\left(A,B\right)=\frac{2\ast H\left({T}_d^A\cap {T}_d^B\right)}{H\left({T}_d^A\right)+H\left({T}_d^B\right)}$$
(6)

where \(H\left({T}_d^A\right)\) and \(H\left({T}_d^B\right)\) are the information entropies of the gene sets \({T}_d^A\) and \({T}_d^B\) of disease A and disease B, respectively. \(H\left({T}_d^A\cap {T}_d^B\right)\) is the information entropy of the intersection of \({T}_d^A\) and \({T}_d^B\). When calculating the functional similarity of disease A and disease B, both the information entropy of each disease and the common information entropy of their shared genes are considered.

Disease semantic similarity

Disease semantic similarity DD is built from the disease DAG as reported in the literature [39].

$$DD\left(A,B\right)=\frac{\sum_{t\in {T}_A\cap {T}_B}{D}_A(t)+{\sum}_{t\in {T}_A\cap {T}_B}{D}_B(t)}{2\ast \mathit{\min}\left( DV(A), DV(B)\right)}$$
(7)

where DD(A, B) is the semantic similarity value between disease A and disease B in the disease DAG. For the meaning of the symbols, please refer to the literature [39].
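Because the symbols of Eq. 7 are deferred to [39], the sketch below rests on explicitly stated assumptions: T_A is taken to be disease A together with all of its DAG ancestors, D_A(A) = 1, each step up to a parent multiplies the contribution by a decay factor (0.5 here), the largest value over all paths is kept, and DV(A) is the sum of D_A(t) over T_A. The `parents` mapping and the toy terms are hypothetical.

```python
def semantic_contributions(disease, parents, decay=0.5):
    """Assumed D_A(t) for disease A and each of its ancestors t."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            value = decay * contrib[node]
            # Keep the largest contribution reachable over any path.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib


def semantic_similarity(a, b, parents, decay=0.5):
    """Eq. 7 with DV(X) assumed to be the sum of D_X(t) over T_X."""
    da = semantic_contributions(a, parents, decay)
    db = semantic_contributions(b, parents, decay)
    shared = da.keys() & db.keys()
    numerator = sum(da[t] + db[t] for t in shared)
    return numerator / (2 * min(sum(da.values()), sum(db.values())))


# Hypothetical three-term DAG: two sibling diseases under one root.
parents = {"disease-x": ["root"], "disease-y": ["root"]}
```

In this toy DAG each sibling has DV = 1 + 0.5 = 1.5, the only shared term is the root, and the similarity of the two siblings is (0.5 + 0.5) / (2 × 1.5) = 1/3.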

Integrating disease similarity

We integrate disease functional similarity and semantic similarity to obtain disease similarity.

$$SD\left(A,B\right)=\gamma \bullet SD F\left(A,B\right)+\left(1-\gamma \right)\bullet DD\left(A,B\right)$$
(8)

where γ ∈ (0, 1) is the balance factor that tunes the contribution of disease functional similarity and semantic similarity. The results are shown in Additional file 2.

miRNA similarity network reconstruction

miRNA family information is obtained from miRBase database. We establish the miRNA family information matrix FAM for 475 miRNAs in the benchmark dataset. FAM(A, B) = 1 if miRNA A and B are in the same family; otherwise, FAM(A, B) = 0. We recalculate the miRNA similarity by adding miRNA family information as follows

$$miRNAsim\left(A,B\right)= SM\left(A,B\right)\ast \left(1+ FAM\left(A,B\right)\right)$$
(9)

We then reconstruct the miRNA similarity network. The results are shown in Additional file 3.
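Equations 8 and 9 are both element-wise matrix operations. The following sketch shows them on nested Python lists; the function names are chosen for illustration only.

```python
def integrate_disease_similarity(SDF, DD, gamma=0.5):
    """Eq. 8: SD = gamma * SDF + (1 - gamma) * DD, element by element."""
    return [
        [gamma * f + (1 - gamma) * s for f, s in zip(row_f, row_s)]
        for row_f, row_s in zip(SDF, DD)
    ]


def add_family_information(SM, FAM):
    """Eq. 9: miRNAsim = SM * (1 + FAM), element by element."""
    return [
        [s * (1 + f) for s, f in zip(row_s, row_f)]
        for row_s, row_f in zip(SM, FAM)
    ]
```

Since FAM is binary, Eq. 9 doubles the similarity of miRNA pairs from the same family and leaves all other pairs unchanged, so entries of miRNAsim can exceed 1.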

FCMDAP prediction method

The flowchart of FCMDAP to predict disease-related miRNAs is shown in Fig. 1.

Fig. 1 The flowchart of FCMDAP

miRNA space score calculation

Calculating the recommendation score of neighboring miRNAs and disease

Wang et al. [39] proposed that functionally similar miRNAs tend to be related to similar diseases, and vice versa. In the miRNA space, the association score between a miRNA and a disease therefore depends on the association scores between the disease and the most similar neighbor nodes of the miRNA. Hence, if a similar neighbor of a miRNA is related to a disease, then the miRNA may be related to the disease. According to the collaborative recommendation algorithm, the association score of miRNA i and disease j is calculated from the similarity scores of the top k1 nearest neighbor nodes of miRNA i and the association scores of these nodes with disease j. We normalize the association score of the top k1 most similar neighbor nodes of miRNA i and disease j by using the following:

$$s1\left(i,j\right)=\frac{\sum_{k=1}^{k1} SM1\left(i,k\right)\bullet AS\left(k,j\right)}{\sum \limits_{k=1}^{k1} SM1\left(i,k\right)}$$
(10)

where SM1 is the row vector of miRNA i in the miRNA similarity matrix miRNAsim, sorted in descending order so that more similar miRNAs are ranked higher. SM1(i, k) is the similarity between miRNA i and its kth most similar neighbor node. The numerator sums the similarity scores of those top k1 neighbors that are related to disease j, and the denominator sums the similarity scores of all top k1 neighbors of miRNA i.
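A minimal sketch of Eq. 10 follows. It assumes that miRNA i itself is excluded from its own neighbor list and that the score is 0 when all neighbor similarities are 0; neither convention is stated in the text.

```python
def neighbor_score(sim, AS, i, j, k1):
    """Eq. 10: similarity-weighted vote of the k1 most similar miRNAs."""
    others = [k for k in range(len(sim)) if k != i]  # assumption: skip self
    neighbors = sorted(others, key=lambda k: sim[i][k], reverse=True)[:k1]
    denom = sum(sim[i][k] for k in neighbors)
    if denom == 0:
        return 0.0  # convention chosen for this sketch
    return sum(sim[i][k] * AS[k][j] for k in neighbors) / denom


# Hypothetical toy data: 4 miRNAs, 1 disease.
sim = [
    [1.0, 0.8, 0.2, 0.5],
    [0.8, 1.0, 0.3, 0.4],
    [0.2, 0.3, 1.0, 0.6],
    [0.5, 0.4, 0.6, 1.0],
]
AS = [[0], [1], [1], [0]]
```

For miRNA 0 with k1 = 2, the neighbors are miRNA 1 (0.8) and miRNA 3 (0.5); only miRNA 1 is associated with the disease, so s1 = 0.8 / (0.8 + 0.5).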

Calculating the prediction score in the same miRNA cluster

Baskerville and Bartel [40] found significant coexpression among proximal pairs of miRNAs (< 50 kb). Closely clustered miRNAs are usually expressed as a polycistronic unit under common regulation, and intronic miRNAs are usually coexpressed with their host genes, presenting complex miRNA expression patterns. Lu et al. [41] performed statistical analysis and found that miRNAs in 46% of diseases have at least one neighboring member. For example, all of the 6 miRNAs (miR-17, miR-18a, miR-19a, miR-20a, miR-19b-1 and miR-92a-1) involved in hematopoietic malignancies are located in the miR-17 cluster. This result shows that neighboring miRNAs may be regulated by a common regulator under the same conditions and interactions, and their dysfunction may lead to the same disease. Wang et al. [39] confirmed that miRNAs are more likely to be associated with similar diseases when they are clustered within 20 kb of each other in the genome. We downloaded the genomic locations of human miRNAs from miRBase v.21 and selected clustered miRNAs within a distance of 20 kb. A miRNA cluster matrix CLU is built for the 475 miRNAs in the benchmark dataset. Based on the collaborative recommendation algorithm, we calculate the normalized related scores between miRNA i and disease j as

$$s2\left(i,j\right)=\frac{\sum_{k=1}^n SM2\left(i,k\right)\bullet AS\left(k,j\right)}{\sum \limits_{k=1}^n SM2\left(i,k\right)}$$
(11)

where SM2(i, k) is the similarity score miRNAsim(i, k) of miRNA i and miRNA k in the same cluster, and n is the number of miRNAs in the same cluster as miRNA i. The numerator sums the similarity scores of the cluster members that are related to disease j, and the denominator sums the similarity scores of all miRNAs in the same cluster as miRNA i. The formula shows that the more closely the miRNAs in the same cluster are related to disease j, the more closely miRNA i is related to disease j.
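Eq. 11 has the same shape as Eq. 10, with the neighbor set replaced by the cluster members. The sketch below assumes that miRNA i is not counted as its own cluster partner and that a miRNA with no cluster partners receives a score of 0.

```python
def cluster_score(sim, CLU, AS, i, j):
    """Eq. 11: similarity-weighted vote of miRNAs clustered with miRNA i."""
    members = [k for k in range(len(sim)) if k != i and CLU[i][k] == 1]
    denom = sum(sim[i][k] for k in members)
    if denom == 0:
        return 0.0  # no cluster partners, or all similarities are zero
    return sum(sim[i][k] * AS[k][j] for k in members) / denom


# Hypothetical toy data: miRNAs 0-2 form one cluster, miRNA 3 is alone.
sim = [
    [1.0, 0.6, 0.3, 0.1],
    [0.6, 1.0, 0.5, 0.2],
    [0.3, 0.5, 1.0, 0.4],
    [0.1, 0.2, 0.4, 1.0],
]
CLU = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
AS = [[0], [1], [0], [0]]
```

For miRNA 0 the cluster partners are miRNAs 1 and 2, of which only miRNA 1 is associated with the disease, so s2 = 0.6 / (0.6 + 0.3).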

Integrating similarity score in miRNA space

In the miRNA space, the recommendation score of a miRNA–disease association is calculated by integrating the score from the top k1 most similar neighboring miRNAs of miRNA i (s1) and the score from the miRNAs in the same cluster as miRNA i (s2) with respect to disease j. The formula is as follows:

$$S\_ miRNA\left(i,j\right)=\alpha \ast s1\left(i,j\right)+\left(1-\alpha \right)\ast s2\left(i,j\right)$$
(12)

where α is a tradeoff factor. Experiments show that FCMDAP gets the best performance when α is 0.5.

Calculating disease space score

In the disease space, we also use the k-nearest neighbor-based recommendation algorithm to calculate the predicted association score between a disease and a miRNA. If the nearest neighbors of a disease are related to a miRNA, then the disease may be related to that miRNA.

According to the collaborative recommendation algorithm, for miRNA i with disease j, their recommendation score is calculated by the normalized similarity score between the k2-nearest neighbors of disease j and miRNA i. The formula is shown as follows

$$S\_ disease\left(i,j\right)=\frac{\sum_{k=1}^{k2} AS\left(i,k\right)\bullet SD1\left(k,j\right)}{\sum \limits_{k=1}^{k2} SD1\left(k,j\right)}$$
(13)

where SD1 is the column vector of disease j in the disease similarity matrix SD, sorted in descending order so that the most similar disease is ranked highest. SD1(k, j) is the similarity between disease j and its kth nearest neighbor.
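The disease-space score of Eq. 13 can be sketched in the same way as the miRNA-space score. As before, excluding disease j from its own neighbor list and returning 0 for a zero denominator are assumptions of this sketch.

```python
def disease_space_score(SD, AS, i, j, k2):
    """Eq. 13: similarity-weighted vote of the k2 diseases most similar to j."""
    others = [k for k in range(len(SD)) if k != j]  # assumption: skip self
    neighbors = sorted(others, key=lambda k: SD[k][j], reverse=True)[:k2]
    denom = sum(SD[k][j] for k in neighbors)
    if denom == 0:
        return 0.0  # convention chosen for this sketch
    return sum(AS[i][k] * SD[k][j] for k in neighbors) / denom


# Hypothetical toy data: 3 diseases, 2 miRNAs.
SD = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.4],
    [0.1, 0.4, 1.0],
]
AS = [
    [0, 1, 0],  # miRNA 0 is associated with disease 1
    [0, 0, 1],  # miRNA 1 is associated with disease 2
]
```

For disease 0 with k2 = 1, the single neighbor is disease 1, so miRNA 0 scores 1 and miRNA 1 scores 0. With k2 = 2 both neighbors count, and miRNA 1 scores 0.1 / (0.9 + 0.1).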

Calculating the final prediction score of disease-related miRNAs

The final prediction score of disease-related miRNAs of miRNA i with disease j is obtained by integrating the scores in miRNA space and disease space as follows

$$FC\_md\left(i,j\right)=\beta \ast S\_miRNA\left(i,j\right)+\left(1-\beta \right)\ast S\_disease\left(i,j\right)$$
(14)

where β is the factor used to balance the weight of two spaces. Experiments show that the optimal performance of FCMDAP can be obtained when the value of β is 0.8.

FCMDAP can predict miRNAs related to isolated diseases and diseases related to isolated miRNAs. Isolated diseases (miRNAs) are diseases (miRNAs) without any known related miRNAs (diseases), such as newly discovered diseases (miRNAs). When we use FCMDAP to predict miRNAs related to an isolated disease j, no miRNA is known to be related to disease j, so the prediction score S_miRNA(i, j) is 0. We can still calculate S_disease(i, j) from two parts, namely, the known associations between miRNA i and other diseases and the similarity between these diseases and disease j. Thus, FCMDAP can predict the association between isolated diseases and miRNAs. When we predict diseases related to an isolated miRNA i, no disease is known to be related to miRNA i, so S_disease(i, j) = 0. We can still calculate S_miRNA(i, j) from the known associations between other miRNAs and disease j and the similarity between miRNAs to predict the association of miRNA i and disease j.
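The two linear integration steps (Eqs. 12 and 14) and the isolated cases discussed above can be illustrated with a short sketch. The function name is hypothetical; the default values of α and β are those reported in the text.

```python
def final_score(s1, s2, s_disease, alpha=0.5, beta=0.8):
    """Combine the miRNA-space and disease-space scores (Eqs. 12 and 14)."""
    s_mirna = alpha * s1 + (1 - alpha) * s2             # Eq. 12
    return beta * s_mirna + (1 - beta) * s_disease      # Eq. 14


# Isolated disease: no miRNA is known for disease j, so s1 = s2 = 0
# and only the disease-space term contributes.
isolated_disease = final_score(0.0, 0.0, 0.6)

# Isolated miRNA: no disease is known for miRNA i, so S_disease = 0
# and only the miRNA-space term contributes.
isolated_mirna = final_score(0.4, 0.2, 0.0)
```

In the isolated-disease case the score reduces to (1 − β) · S_disease(i, j), and in the isolated-miRNA case it reduces to β · S_miRNA(i, j), so the candidates can still be ranked in both cases.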

Results

Characteristics of the miRNA-disease association network

The benchmark data set includes 5048 known miRNA–disease associations between 475 miRNAs and 334 diseases. The characteristics of these associations are shown in Table 1. The average degrees of diseases and miRNAs are 15.11 and 10.63, respectively.

Table 1 Global characteristics of the known miRNA-disease association network

Performance evaluation of FCMDAP

LOOCV on the known miRNA-disease associations is used to evaluate the performance of FCMDAP. For a given disease d, each known association of disease d is removed in turn as a test sample, and the other known associations are used as the training set. The miRNAs without experimental evidence of a relation with disease d comprise the candidate miRNA set. The association prediction scores between these candidate miRNAs and the disease are calculated and ranked. If the rank of the test sample is above a given threshold, then we consider that FCMDAP successfully predicts the association of the miRNA and the disease. By varying the threshold, we draw the receiver operating characteristic (ROC) curve and calculate the area under the curve (AUC) to evaluate prediction performance.
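The LOOCV loop described above can be sketched as follows. The scoring function is passed in as a parameter; the degree-based `toy_score` used here is a placeholder for illustration and is not FCMDAP. Ranking ties optimistically (only strictly larger scores push the test sample down) is also an assumption of this sketch.

```python
def loocv_ranks(AS, score_fn):
    """Hide each known association in turn and rank it among the candidates."""
    ranks = []
    for i, row in enumerate(AS):
        for j, known in enumerate(row):
            if not known:
                continue
            train = [r[:] for r in AS]
            train[i][j] = 0  # remove the test sample from the training set
            # Candidates: miRNAs with no (remaining) known link to disease j.
            candidates = [m for m in range(len(AS)) if train[m][j] == 0]
            scores = {m: score_fn(train, m, j) for m in candidates}
            rank = 1 + sum(1 for m in candidates if scores[m] > scores[i])
            ranks.append((i, j, rank))
    return ranks


def toy_score(train, m, j):
    """Placeholder scorer: number of diseases known for miRNA m."""
    return sum(train[m])


# Hypothetical association matrix: 3 miRNAs x 3 diseases.
AS = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
ranks = loocv_ranks(AS, toy_score)
```

Each entry of `ranks` gives the position of a held-out association among the candidate miRNAs of its disease; sweeping a threshold over these positions yields the ROC curve.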

The ROC plots indicate the relationship between the true positive rate (TPR) and the false positive rate (FPR) at different thresholds. If TP, FP, TN, and FN represent true positive, false positive, true negative, and false negative, respectively, then TPR and FPR are calculated as

$$TPR=\frac{TP}{TP+ FN}$$
(15)

and

$$FPR=\frac{FP}{TN+ FP}$$
(16)

In each round of LOOCV, one association between a miRNA and a disease is excluded, and the prediction scores are calculated from the remaining associations. All these scores are sorted, and a specific ranking position is selected as the threshold. TP and FP are the numbers of experimentally verified and unverified associations above the threshold, respectively. TN and FN are the numbers of unverified and verified associations below the threshold, respectively.
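The following sketch shows how TPR, FPR (Eqs. 15 and 16), and the AUC can be obtained by sweeping the threshold down a ranked list. It assumes that the list contains at least one positive and one negative sample and that there are no tied scores; the trapezoidal rule is used for the area.

```python
def roc_auc(scores, labels):
    """Return the ROC points (FPR, TPR) and the area under the curve."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        # Lowering the threshold by one position admits one more sample.
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    auc = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    return points, auc


# Hypothetical scores; label 1 marks an experimentally verified association.
points, auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In this toy example three of the four positive-negative pairs are ordered correctly, which corresponds to an AUC of 0.75.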

We compared FCMDAP with SRMDAP [35], RLSMDA [28], KATZ [23], and Liu’s method [31] in terms of prediction performance, AUC value, and ROC shape on the benchmark data set. The values of the four parameters of FCMDAP are α = 0.5, β = 0.8, k1 = 50, and k2 = 30. The optimal parameters of SRMDAP, RLSMDA, KATZ, and Liu’s method are set as previously described. The comparison of the overall ROC curves and AUCs of all methods is shown in Fig. 2. The average AUC value of FCMDAP is 0.9165, which is 3.72, 5.81, 6.43, and 11.82% higher than those of SRMDAP, RLSMDA, KATZ, and Liu’s method, respectively. When the FPR is lower than 0.2, the ROC curve of FCMDAP is more convex toward the upper left corner, indicating higher prediction accuracy. Therefore, FCMDAP shows higher prediction accuracy than the other methods.

Fig. 2 The ROC curve and AUC value of FCMDAP and other compared methods

To obtain a reliable judgment, we tested 18 human diseases associated with at least 70 miRNAs. The results are shown in Table 2. FCMDAP obtained the highest AUC value of 0.8837 for pancreatic neoplasms and the lowest AUC value of 0.7572 for hepatocellular carcinoma. The average AUC value for the 18 diseases is 0.8195. The average AUC values for the 18 diseases obtained from SRMDAP, RLSMDA, KATZ, and Liu’s method are 0.8057, 0.6671, 0.6901, and 0.5178, respectively. The average AUC value obtained by FCMDAP is thus 1.38, 15.24, 12.94, and 30.17 percentage points higher than those of the four methods, respectively. Hence, FCMDAP exhibits better performance than SRMDAP, RLSMDA, KATZ, and Liu’s method.

Table 2 AUC values of the five compared methods for 18 diseases

Parameter effect

The five parameters in FCMDAP are α, β, γ, k1, and k2. We focus on miRNA space. In the miRNA space, α balances the tradeoff between the recommendation score from the neighboring miRNAs and the score from the miRNA cluster. β is the entire space balancing factor that sets different weights of recommendation scores from the miRNA and disease spaces. To obtain optimal parameters, we assign different values to α and β starting from 0.1 to calculate the recommendation scores of miRNA–disease association and evaluate the performance of FCMDAP by calculating AUC value. We repeat this work by increasing α and β in steps of 0.1 and calculating the AUC value until α and β are both 1. We obtain the best performance when α = 0.5 and β = 0.8, and the AUC of FCMDAP is 0.9165. The results are shown in Fig. 3.
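The parameter sweep over α and β amounts to a small grid search. In the sketch below, `evaluate` is a hypothetical callable that would run LOOCV for a given (α, β) pair and return the average AUC; the quadratic `toy_auc` merely stands in for that expensive step so the example is runnable.

```python
def grid_search(evaluate, step=0.1):
    """Try alpha, beta in {step, 2*step, ..., 1.0}; return the best pair."""
    steps = round(1 / step)
    # Rounding avoids floating-point drift such as 0.30000000000000004.
    grid = [round(step * n, 10) for n in range(1, steps + 1)]
    best_auc, best_alpha, best_beta = max(
        (evaluate(alpha, beta), alpha, beta)
        for alpha in grid
        for beta in grid
    )
    return best_alpha, best_beta, best_auc


def toy_auc(alpha, beta):
    """Stand-in objective with a single peak (not a real AUC)."""
    return 1.0 - (alpha - 0.5) ** 2 - (beta - 0.8) ** 2


best_alpha, best_beta, best_auc = grid_search(toy_auc)
```

With a step of 0.1 the grid contains 10 × 10 = 100 parameter pairs, each of which requires one full LOOCV run in the real setting.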

Fig. 3 Average AUC values for different values of α and β

In Fig. 3, the ordinate is the average AUC value, and the abscissa is the value of β multiplied by 10. Each curve connects the average AUC values obtained for a fixed α as β varies. The average AUC value varies from 0.8712 to 0.9165. When α = 0.1 and β = 0.1, the average AUC reaches its minimum value of 0.8712. When α = 0.5 and β = 0.8, the average AUC reaches its maximum value of 0.9165. The general trend is that the average AUC value increases with increasing α and β. γ denotes the balance factor in the disease similarity network between disease functional similarity based on disease–gene interactions and disease semantic similarity based on the disease DAG. k1 and k2 denote the numbers of neighboring miRNAs and neighboring diseases in the recommendation algorithm, respectively. The values of γ, k1, and k2 are set to 0.5, 50, and 30, respectively, according to experience.

Case studies

Three important diseases (colorectal neoplasms, lung neoplasms, and pancreatic neoplasms) were selected to evaluate the performance of FCMDAP. The top 50 miRNA candidates of these three diseases were analyzed and verified using miRCancer (v. Oct. 2017), dbDEMC (v. 2.0), and PhenomiR (v. 2.0) databases and findings in the literature.

Colorectal neoplasms, the third most common cancer worldwide, severely affect human health. In this regard, understanding colorectal neoplasm-related miRNAs is important for the diagnosis and prognosis of colorectal neoplasms. For example, patients with early colorectal neoplasms can be discriminated from healthy people by using serum miR-21, miR-29a, and miR-125b levels [42]. We used experimentally identified miRNA–disease associations as training samples to calculate the recommendation scores of all candidate miRNAs through FCMDAP. We then ranked them in descending order and selected the top 50 miRNAs for verification. The top 50 candidate miRNAs and the corresponding evidence of their association with colorectal neoplasms are listed in Table 3. All of the top 50 miRNAs were confirmed by analysis of the miRCancer, dbDEMC, and PhenomiR databases.

Table 3 The top 50 candidate miRNAs associated with colorectal neoplasms predicted by FCMDAP and the confirmation for their associations by miRCancer, PhenomiR or dbDEMC databases are listed here. All of them have been confirmed
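The case-study procedure (score all candidates, rank in descending order, keep the top k, then look each one up in the evidence databases) can be sketched as follows. This is an illustrative sketch under assumptions: the function names, the toy miRNA names and scores, and the choice to exclude miRNAs already in the training set from the candidate list are not taken from the paper.

```python
def top_k_candidates(scores, known, k=50):
    """Rank miRNAs by score, skipping those already known for the disease."""
    candidates = [(m, s) for m, s in scores.items() if m not in known]
    # Sort by descending score; break ties by name for reproducibility.
    candidates.sort(key=lambda ms: (-ms[1], ms[0]))
    return [m for m, _ in candidates[:k]]

def confirmation_rate(candidates, databases):
    """Fraction of candidates found in at least one evidence database."""
    if not candidates:
        return 0.0, []
    confirmed = [m for m in candidates
                 if any(m in db for db in databases.values())]
    return len(confirmed) / len(candidates), confirmed

# Toy data: the names and scores are illustrative only.
scores = {"mir-a": 0.91, "mir-b": 0.85, "mir-c": 0.40, "mir-d": 0.77}
known = {"mir-a"}  # already in the training set
databases = {"db1": {"mir-b"}, "db2": {"mir-c"}}

top = top_k_candidates(scores, known, k=2)
rate, confirmed = confirmation_rate(top, databases)
```

With the real data, `databases` would hold the miRNA sets reported for the disease in miRCancer, dbDEMC, and PhenomiR, and a rate of 1.0 corresponds to all top 50 candidates being confirmed.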

Lung neoplasms are malignant lung tumors caused by uncontrolled growth of lung tissue cells. Lung tumor cells can also rapidly spread from the lungs to nearby tissues or other parts of the body. According to the World Health Organization’s 2014 World Cancer Report [43], the number of patients with lung tumors worldwide reached 1.8 million in 2012. Lung neoplasms are the main cause of cancer-related death in men and women (other than breast neoplasms). In the United States, the 5-year survival rate for patients diagnosed with lung neoplasms is only 17.4%, which is lower than that in developing countries. Thus, effective methods for early diagnosis and treatment of lung neoplasms are important. Evidence indicates the important role of miRNAs in the pathogenesis, migration, and spread of lung neoplasms. For example, in a study of 143 cases of lung neoplasms, Takamizawa et al. [3] first found that the expression levels of let-7 are often reduced in lung neoplasms in vitro and in vivo. The decrease in let-7 expression may affect the survival of surgically treated patients with lung neoplasms. Johnson et al. [44] found that let-7 acts as a tumor suppressor in lung cells and negatively regulates the expression of the oncogene RAS. Hence, miRNAs can be used to develop drugs for treatment of lung tumors.

In our work, we used experimentally identified miRNA–disease associations as training samples to calculate recommendation scores of all candidate miRNAs based on FCMDAP. We then ranked them in descending order and selected the top 50 miRNAs for verification. The top 50 candidate miRNAs and the corresponding evidence of their association with lung neoplasms are listed in Table 4. Among these miRNAs, 48 were confirmed in the miRCancer, dbDEMC, or PhenomiR databases, and only two (hsa-mir-520g and hsa-mir-147a) were not. Of these two, hsa-mir-147a is supported by a recent study (PMID: 29033588) [45], which brings the number of confirmed miRNAs to 49. That study showed that the lncRNA HOXD-AS1 is specifically upregulated in non-small-cell lung cancer (NSCLC) tissues and promotes cancer cell growth by targeting miR-147a.

Table 4 The top 50 candidate miRNAs associated with lung neoplasms predicted by FCMDAP and the confirmation for their associations by miRCancer, PhenomiR or dbDEMC databases are listed here. 49 of them have been confirmed

Pancreatic neoplasms are cellular masses caused by uncontrollable pancreatic cell proliferation. The most common symptoms of pancreatic neoplasms include yellowing of the skin, abdominal or back pain, unexplained weight loss, and loss of appetite. Early pancreatic neoplasms are small and have no symptoms. Most pancreatic neoplasms are large when they are found and can metastasize to other parts of the body. According to reports, 411,600 people worldwide died of various pancreatic neoplasms in 2015. Pancreatic neoplasms most often occur in developed countries; for example, they rank as the fifth most common cancer in the UK and the fourth most common cancer in the United States [43, 46]. The prognosis of pancreatic neoplasms is very poor, with a 25% survival rate at 1 year after diagnosis and a 5% survival rate at 5 years. Thus, effective methods for early diagnosis, treatment, and prognosis of pancreatic neoplasms must be developed. At present, evidence supports the role of miRNA differential expression in the diagnosis, treatment, and prognosis of pancreatic neoplasms. For example, Sadakari et al. [47] found that the relative expression levels of miR-21 and miR-155 in tissues and pancreatic juice of patients with pancreatic ductal adenocarcinoma are significantly higher than those in patients with chronic pancreatitis; thus, miR-21 and miR-155 in pancreatic juice may be potential biomarkers for diagnosis of pancreatic ductal adenocarcinoma. Lodygin et al. [48] reported that the expression of miR-34a is silenced in several types of cancers, including pancreatic neoplasms, due to CpG methylation. Re-expression of miR-34a in the MiaPaCa2 pancreatic neoplasm cell line induces cellular senescence and cell cycle arrest, in part by targeting CDK6. This observation indicates that miR-34a is a tumor suppressor gene that is inactivated by CpG methylation and subsequent transcriptional silencing in various tumors, such as pancreatic neoplasms. Thus, miR-34a can be used as a therapeutic target for malignant neoplasms, such as pancreatic neoplasms.

In our work, we also calculated the recommendation score of all candidate miRNAs based on FCMDAP, ranked them in descending order, and selected the top 50 miRNAs for verification. The top 50 candidate miRNAs and the corresponding evidence of their associations with pancreatic neoplasms are listed in Table 5. Among the top 50 miRNAs, 48 miRNAs were confirmed in the miRCancer, dbDEMC, and PhenomiR databases, and only two miRNAs (miR-378a and miR-365a) were not confirmed.

Table 5 The top 50 candidate miRNAs associated with pancreatic neoplasms predicted by FCMDAP and the confirmation for their associations by miRCancer, PhenomiR or dbDEMC databases are listed here. 48 of them have been confirmed

Predicting isolated diseases and isolated miRNAs

FCMDAP can predict miRNAs related to isolated diseases, that is, diseases without any known related miRNAs. To simulate this case, we removed all experimentally verified disease-miRNA associations for a given disease, calculated the recommendation scores with FCMDAP, and ranked the miRNAs by score. The average AUC of FCMDAP for predicting an isolated disease is 0.8417. For lung neoplasms treated as an isolated disease, the top 50 predicted miRNAs are listed in Table 6. All of the top 50 miRNAs were confirmed by one or more databases (miRCancer, dbDEMC, or PhenomiR). Hence, FCMDAP exhibits satisfactory performance in predicting isolated diseases.

Table 6 The top 50 miRNAs associated with isolated lung neoplasms predicted by FCMDAP and their evidence
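The isolated-disease evaluation can be illustrated with a short sketch: hide every known association of one disease, score all miRNAs for that disease, and compute the AUC against the hidden labels. The code below is a sketch under assumptions rather than the authors' implementation. `predict` is a hypothetical scorer standing in for FCMDAP, the association matrix is a plain list of 0/1 rows (miRNAs by diseases), and the AUC is computed in its rank-based form with ties counted as one half.

```python
def auc(scores, labels):
    """Rank-based AUC: P(score of a positive > score of a negative).

    Ties count as half. Quadratic in the worst case, fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def isolate_disease(assoc, d):
    """Copy the miRNA x disease 0/1 matrix with column d zeroed out."""
    return [[0 if j == d else v for j, v in enumerate(row)] for row in assoc]

def evaluate_isolated(assoc, d, predict):
    """AUC for disease d after hiding all of its known associations.

    `predict(masked, d)` is a hypothetical scorer returning one score
    per miRNA; it stands in for the prediction method.
    """
    masked = isolate_disease(assoc, d)
    scores = predict(masked, d)
    labels = [row[d] for row in assoc]
    return auc(scores, labels)

# Toy example: 4 miRNAs x 2 diseases. The scorer just counts each
# miRNA's remaining associations (illustrative, not the paper's method).
assoc = [[1, 1], [1, 1], [0, 0], [0, 1]]
toy_predict = lambda masked, d: [sum(row) for row in masked]
isolated_auc = evaluate_isolated(assoc, 0, toy_predict)
```

Averaging `evaluate_isolated` over all diseases gives a figure comparable in kind to the reported average AUC. The same masking applied to a row instead of a column gives the isolated-miRNA case.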

FCMDAP also shows satisfactory performance in predicting diseases related to isolated miRNAs. We removed all disease association information for a given miRNA and used FCMDAP to calculate the recommendation scores of all diseases for that miRNA. We then ranked these diseases and verified them in the databases. The average AUC of FCMDAP for predicting isolated miRNAs is 0.8944. For hsa-mir-93, the top 10 related diseases predicted by FCMDAP are listed in Table 7. Among the 10 diseases, eight were confirmed to be related to hsa-mir-93 by the dbDEMC or PhenomiR databases. Adrenocortical carcinoma, which ranked eighth, was not confirmed by these two databases. Heart failure, which ranked first, was confirmed to be related to hsa-mir-93 in the literature: Ke et al. [49] found that miR-93 is related to cardiomyocyte apoptosis and can prevent cardiomyocyte apoptosis induced by myocardial ischemia/reperfusion by inhibiting PI3K/AKT/PTEN signaling.

Table 7 The top 10 diseases related to hsa-mir-93 predicted by FCMDAP and their evidence

Discussion

In this work, we developed FCMDAP to predict human disease-related miRNAs. FCMDAP calculates the similarity between miRNAs by using mutual information based on known miRNA-mRNA interactions and adds miRNA family information to construct a miRNA space. It integrates disease functional similarity, based on disease-gene interactions, with disease semantic similarity, based on the DAG from MeSH, to construct a disease space. The association scores between miRNAs and diseases are calculated with the k most similar neighbor recommendation algorithm, with miRNA cluster information added to the miRNA space, and the scores from the miRNA and disease spaces are then integrated. Like NSIM and other methods, FCMDAP predicts unknown associations by constructing a miRNA network and a disease network. In FCMDAP, however, the similarity calculations for miRNAs and for diseases are independent of each other. Multiple types of data (miRNA-mRNA interactions, miRNA family information, disease-gene interactions, and the DAG from MeSH) are used to calculate miRNA similarity and disease similarity, so the prediction does not depend only on known miRNA–disease associations, which improves the accuracy of the similarity calculations. Using the k most similar neighbor recommendation algorithm together with miRNA cluster information makes the prediction results more reasonable and improves predictive performance.
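To make the neighbor-based recommendation idea concrete, the following sketch scores one miRNA-disease pair in the miRNA space as a similarity-weighted vote over the k most similar miRNAs. This is one common form of k-nearest-neighbor recommendation and is given only as an illustration: the exact weighting used by FCMDAP, the way cluster information enters the score, and the integration with the disease space are defined in the Methods and may differ. The function name and the toy matrices are assumptions.

```python
def knn_score(m, d, sim, assoc, k):
    """Score miRNA m for disease d from m's k most similar miRNAs.

    sim[m][n] is the miRNA-miRNA similarity and assoc[n][d] is 1 when
    miRNA n is known to be associated with disease d, else 0.
    """
    neighbors = sorted((n for n in range(len(sim)) if n != m),
                       key=lambda n: sim[m][n], reverse=True)[:k]
    total = sum(sim[m][n] for n in neighbors)
    if total == 0:
        return 0.0
    # Similarity-weighted vote of the neighbors' known associations.
    return sum(sim[m][n] * assoc[n][d] for n in neighbors) / total

# Toy data (illustrative only): 3 miRNAs, 1 disease.
sim = [[1.0, 0.8, 0.2],
       [0.8, 1.0, 0.5],
       [0.2, 0.5, 1.0]]
assoc = [[0], [1], [0]]

score_k2 = knn_score(0, 0, sim, assoc, k=2)  # both neighbors vote
score_k1 = knn_score(0, 0, sim, assoc, k=1)  # only the closest votes
```

A symmetric function over disease-disease similarity would give the score from the disease space. With k1 = 50 and k2 = 30 as in the experiments, only the closest neighbors in each space contribute.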

LOOCV and the case studies show that FCMDAP exhibits excellent performance in predicting miRNA–disease associations. FCMDAP also shows satisfactory performance in predicting diseases without any related miRNA information and miRNAs without any related disease information. The average AUCs of FCMDAP for predicting isolated diseases and isolated miRNAs are 0.8417 and 0.8944, respectively. For isolated lung neoplasms, all of the top 50 predicted miRNAs were confirmed (100%). For the isolated hsa-mir-93, nine of the top 10 predicted diseases were confirmed (90%).

However, FCMDAP has the following limitations. First, the miRNA similarity could be further improved by considering other biomolecules that interact with miRNAs. Second, because FCMDAP is built on experimentally verified miRNA–disease associations, its performance depends on how many such associations are available and should improve as more are experimentally verified.

Conclusion

To support experimental research on miRNAs, we proposed FCMDAP, a computational method for finding potential disease-related miRNAs. FCMDAP exhibits excellent performance in predicting potential disease-related miRNAs. FCMDAP could be extended to the study of other biomolecular networks and could help decipher the pathogenesis and support the diagnosis of complex human diseases.

Abbreviations

AUC:

Area under the curve

DAG:

Directed acyclic graph

LOOCV:

Leave-one-out cross validation

MI:

Mutual information

References

  1. Bartel DP. MicroRNAs. Genomics, biogenesis, mechanism, and function. Cell. 2004;116(2):281–97.

  2. He L, Thomson JM, Hemann MT, Hernando-Monge E, Mu D, Goodson S, Powers S, Cordon-Cardo C, Lowe SW, Hannon GJ. A microRNA polycistron as a potential human oncogene. Nature. 2005;435(7043):828–33.

  3. Takamizawa J, Konishi H, Yanagisawa K, Tomida S, Osada H, Endoh H, Harano T, Yatabe Y, Nagino M, Nimura Y. Reduced expression of the let-7 microRNAs in human lung cancers in association with shortened postoperative survival. Cancer Res. 2004;64(11):3753–6.

  4. Rosenfeld N, Aharonov R, Meiri E, Rosenwald S, Spector Y, Zepeniuk M, Benjamin H, Shabes N, Tabak S, Levy A, et al. MicroRNAs accurately identify cancer tissue origin. Nat Biotechnol. 2008;26:462.

  5. Tang W, Wan S, Yang Z, Teschendorff AE, Zou Q. Tumor origin detection with tissue-specific miRNA and DNA methylation markers. Bioinformatics. 2018;34(3):398–406.

  6. Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2013;42(D1):D68–73.

  7. Megraw M, Sethupathy P, Corda B, Hatzigeorgiou AG. miRGen: a database for the study of animal microRNA genomic organization and function. Nucleic Acids Res. 2006;35(suppl_1):D149–55.

  8. Hsu S-D, Lin F-M, Wu W-Y, Liang C, Huang W-C, Chan W-L, Tsai W-T, Chen G-Z, Lee C-J, Chiu C-M. miRTarBase: a database curates experimentally validated microRNA–target interactions. Nucleic Acids Res. 2010;39(suppl_1):D163–9.

  9. Dweep H, Sticht C, Pandey P, Gretz N. miRWalk – database: prediction of possible miRNA binding sites by “walking” the genes of three genomes. J Biomed Inform. 2011;44(5):839–47.

  10. Betel D, Wilson M, Gabow A, Marks DS, Sander C. The microRNA.org resource: targets and expression. Nucleic Acids Res. 2008;36(suppl_1):D149–53.

  11. Xie B, Ding Q, Han H, Wu D. miRCancer: a microRNA-cancer association database constructed by text mining on literature. Bioinformatics. 2013;29(5):638–44.

  12. Li Y, Qiu CX, Tu J, Geng B, Yang JC, Jiang TZ, Cui QH. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2014;42(D1):D1070–4.

  13. Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, Li M, Wang G, Liu Y. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009;37(Database):D98–104.

  14. Yang Z, Wu LC, Wang AQ, Tang W, Zhao Y, Zhao HT, Teschendorff AE. dbDEMC 2.0: updated database of differentially expressed miRNAs in human cancers. Nucleic Acids Res. 2017;45(D1):D812–8.

  15. Ruepp A, Kowarsch A, Schmidl D, Buggenthin F, Brauner B, Dunger I, Fobo G, Frishman G, Montrone C, Theis FJ. PhenomiR: a knowledgebase for microRNA expression in diseases and biological processes. Genome Biol. 2010;11(1):R6.

  16. Zeng X, Zhang X, Zou Q. Integrative approaches for predicting microRNA function and prioritizing disease-related microRNA using biological interaction networks. Brief Bioinform. 2016;17(2):193–203.

  17. Zou Q, Li J, Song L, Zeng X, Wang G. Similarity computation strategies in the microRNA-disease network: a survey. Brief Funct Genomics. 2015;15(1):55–64.

  18. Jiang QH, Hao YY, Wang GH, Juan LR, Zhang TJ, Teng MX, Liu YL, Wang YD. Prioritization of disease microRNAs through a human phenome-microRNAome network. BMC Syst Biol. 2010;4.

  19. Chen X, Yan CC, Zhang X, You ZH, Deng LX, Liu Y, Zhang YD, Dai QH. WBSMDA: within and between score for MiRNA-disease association prediction. Sci Rep. 2016;6:21106.

  20. Xuan P, Han K, Guo M, Guo Y, Li J, Ding J, Liu Y, Dai Q, Li J, Teng Z, et al. Prediction of microRNAs associated with human diseases based on weighted k most similar neighbors. PLoS One. 2013;8(8):e70204.

  21. Chen X, Liu MX, Yan GY. RWRMDA: predicting novel human microRNA-disease associations. Mol BioSyst. 2012;8(10):2792–8.

  22. Xuan P, Han K, Guo YH, Li J, Li X, Zhong YL, Zhang ZG, Ding J. Prediction of potential disease-associated microRNAs based on random walk. Bioinformatics. 2015;31(11):1805–15.

  23. Zou Q, Li J, Hong Q, Lin Z, Wu Y, Shi H, Ju Y. Prediction of MicroRNA-disease associations based on social network analysis methods. Biomed Res Int. 2015;2015:810514.

  24. Gu C, Liao B, Li X, Li K. Network consistency projection for human miRNA-disease associations inference. Sci Rep. 2016;6:36054.

  25. Li XY, Lin YP, Gu CL. A network similarity integration method for predicting microRNA-disease associations. RSC Adv. 2017;7(51):32216–24.

  26. Chen X, Yan CC, Zhang X, You ZH, Huang YA, Yan GY. HGIMDA: Heterogeneous graph inference for miRNA-disease association prediction. Oncotarget. 2016;7(40):65257.

  27. Xu J, Li CX, Lv JY, Li YS, Xiao Y, Shao TT, Huo X, Li X, Zou Y, Han QL, et al. Prioritizing candidate disease miRNAs by topological features in the miRNA target-dysregulated network: case study of prostate cancer. Mol Cancer Ther. 2011;10(10):1857–66.

  28. Chen X, Yan GY. Semi-supervised learning for potential human microRNA-disease associations inference. Sci Rep. 2014;4:5501.

  29. Li JQ, Rong ZH, Chen X, Yan GY, You ZH. MCMDA: matrix completion for MiRNA-disease association prediction. Oncotarget. 2017;8(13):21187.

  30. Luo J, Ding P, Liang C, Cao B, Chen X. Collective prediction of disease-associated miRNAs based on transduction learning. IEEE/ACM Trans Comput Biol Bioinform. 2017;14(6):1468–75.

  31. Liu Y, Zeng X, He Z, Zou Q. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Trans Comput Biol Bioinform. 2016.

  32. Mork S, Pletscher-Frankild S, Caro AP, Gorodkin J, Jensen LJ. Protein-driven inference of miRNA-disease associations. Bioinformatics. 2014;30(3):392–7.

  33. Xu C, Ping Y, Li X, Zhao H, Wang L, Fan H, Xiao Y, Li X. Prioritizing candidate disease miRNAs by integrating phenotype associations of multiple diseases with matched miRNA and mRNA expression profiles. Mol BioSyst. 2014;10(11):2800–9.

  34. Zeng X, Liu L, Lü L, Zou Q. Prediction of potential disease-associated microRNAs using structural perturbation method. Bioinformatics. 2018;34(14):2425–32.

  35. Li X, Lin Y, Gu C, Li Z. SRMDAP: SimRank and density-based clustering recommender model for miRNA-disease association prediction. Biomed Res Int. 2018;2018:11.

  36. Chou CH, Chang NW, Shrestha S, Hsu SD, Lin YL, Lee WH, Yang CD, Hong HC, Wei TY, Tu SJ, et al. miRTarBase 2016: updates to the experimentally validated miRNA-target interactions database. Nucleic Acids Res. 2016;44(D1):D239–47.

  37. Pinero J, Queralt-Rosinach N, Bravo A, Deu-Pons J, Bauer-Mehren A, Baron M, Sanz F, Furlong LI. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database (Oxford). 2015:bav028.

  38. Shannon CE. A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review. 2001;5(1):3–55.

  39. Wang D, Wang JA, Lu M, Song F, Cui QH. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics. 2010;26(13):1644–50.

  40. Baskerville S, Bartel DP. Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA. 2005;11(3):241–7.

  41. Lu M, Zhang Q, Deng M, Miao J, Guo Y, Gao W, Cui Q. An analysis of human microRNA and disease associations. PLoS One. 2008;3.

  42. Yamada A, Horimatsu T, Okugawa Y, Nishida N, Honjo H, Ida H, Kou T, Kusaka T, Sasaki Y, Makato Y, et al. Serum miR-21, miR-29a and miR-125b are promising biomarkers for the early detection of colorectal neoplasia. Clin Cancer Res. 2015;21(18):4234–42.

  43. McGuire S. World Cancer report 2014. Geneva, Switzerland: World Health Organization, International Agency for Research on Cancer, WHO press, 2015. Adv Nutr. 2016;7(2):418–9.

  44. Johnson SM, Grosshans H, Shingara J, Byrom M, Jarvis R, Cheng A, Labourier E, Reinert KL, Brown D, Slack FJ. RAS is regulated by the let-7 microRNA family. Cell. 2005;120(5):635–47.

  45. Wang Q, Jiang S, Song A, Hou S, Wu Q, Qi L, Gao X. HOXD-AS1 functions as an oncogenic ceRNA to promote NSCLC cell progression by sequestering miR-147a. OncoTargets Ther. 2017;10:4753–63.

  46. Wang H, Naghavi M, Allen C, Barber RM, Bhutta ZA, Carter A, Casey DC, Charlson FJ, Chen AZ, Coates MM. Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980–2015: a systematic analysis for the global burden of disease study 2015. Lancet. 2016;388(10053):1459–544.

  47. Sadakari Y, Ohtsuka T, Ohuchida K, Tsutsumi K, Takahata S, Nakamura M, Mizumoto K, Tanaka M. MicroRNA expression analyses in preoperative pancreatic juice samples of pancreatic ductal adenocarcinoma. JOP. 2010;11(6):587–92.

  48. Lodygin D, Tarasov V, Epanchintsev A, Berking C, Knyazeva T, Körner H, Knyazev P, Diebold J, Hermeking H. Inactivation of miR-34a by aberrant CpG methylation in multiple types of cancer. Cell Cycle. 2008;7(16):2591–600.

  49. Ke Z-P, Xu P, Shi Y, Gao A-M. MicroRNA-93 inhibits ischemia-reperfusion induced cardiomyocyte apoptosis by targeting PTEN. Oncotarget. 2016;7(20):28796.

Acknowledgements

Not applicable.

Funding

Publication costs were supported by National Natural Science Foundation of China (No. 61472127).

Availability of data and materials

All data generated or analysed during this study are included in this published article (and its supplementary information files).

About this supplement

This article has been published as part of BMC Systems Biology Volume 13 Supplement 2, 2019: Selected articles from the 17th Asia Pacific Bioinformatics Conference (APBC 2019): systems biology. The full contents of the supplement are available online at https://bmcsystbiol.biomedcentral.com/articles/supplements/volume-13-supplement-2.

Author information

Contributions

CG and XL conceived of and designed the approach. XL carried out the experiments and wrote the manuscript. CG, YL and JY participated in revising the manuscript critically. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Changlong Gu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Known miRNA-disease associations. (XLSX 146 kb)

Additional file 2:

Integrated disease similarity. (XLSX 1379 kb)

Additional file 3:

Integrated miRNA similarity. (XLSX 1850 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Li, X., Lin, Y., Gu, C. et al. FCMDAP: using miRNA family and cluster information to improve the prediction accuracy of disease related miRNAs. BMC Syst Biol 13 (Suppl 2), 26 (2019). https://doi.org/10.1186/s12918-019-0696-9

  • Published:

  • DOI: https://doi.org/10.1186/s12918-019-0696-9

Keywords