FCMDAP: using miRNA family and cluster information to improve the prediction accuracy of disease related miRNAs

Background: Biological experiments have confirmed associations between miRNAs and various diseases. However, such experiments are costly and time consuming. Computational methods help select potential disease-related miRNAs and thereby improve the efficiency of biological experiments.
Methods: In this work, we develop FCMDAP, a novel method that uses multiple types of data to calculate miRNA and disease similarity based on mutual information and adds miRNA family and cluster information to predict human disease-related miRNAs. The method does not depend solely on known miRNA-disease associations; it also measures miRNA and disease similarity accurately and resolves the problem of overestimation. FCMDAP uses the k most similar neighbor recommendation algorithm to predict the association score between a miRNA and a disease. miRNA cluster information is also used to improve prediction accuracy.
Results: FCMDAP achieves an average AUC of 0.9165 in leave-one-out cross validation. In case studies on colorectal, lung, and pancreatic neoplasms, 100%, 98%, and 96% of the top 50 predicted miRNAs, respectively, were confirmed. FCMDAP also exhibits satisfactory performance in predicting diseases without any related miRNAs and miRNAs without any related diseases.
Conclusions: In this study, we present a computational method, FCMDAP, to improve the prediction accuracy of disease-related miRNAs. FCMDAP could be an effective tool for guiding further biological experiments.
Electronic supplementary material: The online version of this article (10.1186/s12918-019-0696-9) contains supplementary material, which is available to authorized users.


Background
MicroRNAs (miRNAs) are small endogenous non-coding RNAs about 22 nt in length that regulate gene expression mainly at the post-transcriptional level [1]. The latest version of miRBase contains 1881 human miRNAs, and most of them regulate more than 60% of human protein-coding genes. miRNAs regulate target genes in biological processes such as cell growth, proliferation, differentiation, and apoptosis. miRNAs play a critical role in the development of various diseases, including cancers [2]. Takamizawa et al. [3] found that the expression level of let-7 decreases in lung neoplasms in vivo and in vitro, resulting in shortened post-operative survival of the patients. Moreover, let-7 is a potential therapeutic miRNA for prevention of tumorigenesis. Lung neoplasms are characterized by several key oncogene mutations, including p53, RAS, and MYC, some of which may be directly related to the decreased expression of let-7 and may be inhibited by introducing this miRNA [3]. miRNAs can also be used as biomarkers to identify the tissue of origin of cancers of unknown primary origin [4,5]. Therefore, identification of disease-related miRNAs would benefit research on pathogenesis and diagnosis.
Many disease-related miRNAs have been identified through biological experiments. Researchers have collected data from the existing literature to build miRNA-related databases, such as miRBase [6], miRGen [7], miRTarBase [8], miRWalk [9], microRNA.org [10], miRCancer [11], HMDD [12], miR2Disease [13], dbDEMC [14], and PhenomiR [15]. These databases provide a solid data foundation for the study of miRNAs. However, experimental methodologies for screening miRNA-disease associations are costly and time consuming. Computational methods are therefore used to predict the miRNAs most likely to be associated with a disease and to provide targets for biological experiments, saving cost and time.
Computational methods are classified into two main categories, namely, network-based methods and machine-learning-based methods [16]. Network-based methods predict unknown miRNA-disease associations by using miRNA- and disease-related data resources to construct miRNA and disease similarity networks [17]; these networks are then combined with experimentally validated (or known) miRNA-disease networks. Jiang et al. [18] proposed a miRNA-prediction algorithm based on a hypergeometric distribution scoring system, in which the scores are ranked to select candidate disease-related miRNAs. Chen et al. [19] proposed the WBSMDA method, which integrates the Within-Score of miRNA and disease similarity and the Between-Score of unknown miRNA-disease associations to predict potential miRNA-disease associations. However, these two methods make assumptions about the probability distribution, and their prediction performance is affected when the data resources are inconsistent with the assumptions. Xuan et al. [20] proposed the HDMP method, which considers the weighted k most similar neighboring miRNAs and combines miRNA functional similarity to predict miRNAs associated with human diseases. The RWRMDA [21] and MIDP [22] methods use random walk to calculate the similarity of miRNAs and diseases. However, these methods cannot predict related miRNAs for diseases without any related miRNAs or for new diseases (isolated diseases). Zou et al. [23] proposed KATZ, which uses social network analysis to calculate prediction scores over different walking lengths between miRNAs and diseases. However, the performance of KATZ is poor because the known associations are sparse. KATZ also cannot predict related diseases for miRNAs without known related diseases or for new miRNAs (isolated miRNAs), nor can it predict related miRNAs for isolated diseases. NCPMDA [24] uses network consistency projection to calculate potential miRNA-disease association scores from miRNA and disease vector space projection scores. Li et al. [25] proposed a network similarity integration method (NSIM) for predicting potential miRNA-disease associations. However, NSIM is overly dependent on known miRNA-disease associations. HGIMDA [26] applies a heterogeneous graph iterative algorithm based on known miRNA-disease associations to predict miRNA-disease associations. However, the parameters of HGIMDA are difficult to select.
Machine-learning-based methods aim to predict reliable miRNA-disease associations by extracting effective features or by solving specific optimization problems with powerful machine-learning algorithms. Xu et al. [27] built a support vector machine (SVM) classifier that uses four topological features based on the miRNA target-dysregulated network to predict potential miRNAs related to prostate cancer. The main disadvantage of Xu's method is that negative samples cannot be obtained, which decreases prediction performance. Chen and Yan [28] proposed the RLSMDA method, which uses regularized least squares to predict miRNA-disease associations. This method is based on semi-supervised learning and avoids using negative samples, but its parameters are intricate to adjust. Li et al. [29] proposed the MCMDA method, which uses a matrix completion algorithm. Luo et al. [30] proposed the CPTL method, which uses a transduction learning collective prediction model to predict miRNA-disease associations. However, these methods cannot be applied to predict potential miRNAs for isolated diseases.
The methods above use only a single type of information related to miRNAs or diseases, such as miRNA-disease associations verified by biological experiments, which results in overestimation [31]. Therefore, researchers have investigated different types of miRNA- and disease-related a priori biological information to construct miRNA-disease associations through intermediaries. For example, Mørk et al. [32] developed a miRNA-protein-disease heterogeneous network, namely, miRPD, which uses protein-related associations as a bridge to link miRNAs and diseases. However, the prediction accuracy of miRPD is unsatisfactory because of its high false positive and false negative rates. Xu et al. [33] used the network of interactions between miRNAs and target genes derived from matched miRNA and mRNA expression data, together with the network of interactions between specific miRNAs and diseases, to rank and identify the miRNAs most likely associated with multiple diseases. Liu et al. [31] integrated miRNA-target gene and miRNA-lncRNA data sources, established disease and miRNA similarity subnets, and predicted miRNA-disease associations in heterogeneous networks by using random walk with restart. Zeng et al. [34] used gene functional information, four main parameters of miRNAs, and miRNA-disease associations to construct a bilayer network. They then used structural consistency as an indicator to estimate the link predictability of the bilayer network and used the structural perturbation method (SPM) to predict potential miRNA-disease associations. SRMDAP [35] builds miRNA and disease similarity subnetworks by using the SimRank algorithm and a density-based clustering recommender model based on known miRNA-mRNA interaction data, disease-gene data, and miRNA-disease association data. However, these methods lead to incomplete calculation of similarity and low prediction accuracy.
In our work, we propose a novel computational method, namely, FCMDAP, which uses miRNA family and cluster information to improve the prediction accuracy of disease-related miRNAs. FCMDAP uses information entropy and mutual information (MI) to measure the similarity between miRNAs based on miRNA-mRNA interactions and adds miRNA family information to reconstruct a miRNA similarity network. FCMDAP obtains functional similarity between diseases based on disease-gene interactions and semantic similarity between diseases based on the disease directed acyclic graph (DAG). FCMDAP then integrates functional and semantic similarity into a single disease similarity. Based on the k most similar neighbor recommendation algorithm, FCMDAP uses experimentally verified miRNA-disease associations, miRNA similarity, and cluster information to predict potential miRNA-disease associations in the miRNA space. FCMDAP also uses experimentally verified miRNA-disease associations and disease similarity to predict potential miRNA-disease associations in the disease space. The two predicted association scores are linearly integrated. We implemented leave-one-out cross validation (LOOCV) and achieved an AUC of 0.9165. In case studies of colorectal, lung, and pancreatic neoplasms, 50, 49, and 48 of the top 50 predicted miRNAs, respectively, were confirmed by analysis of the miRCancer, dbDEMC, and PhenomiR databases. The average AUC values of FCMDAP for predicting isolated diseases and isolated miRNAs were 0.8417 and 0.8944, respectively. For isolated lung neoplasms, all of the top 50 predicted miRNAs were confirmed. For isolated hsa-mir-93, 9 of the top 10 diseases were confirmed. In conclusion, FCMDAP outperforms the other methods.

Data
Data used in FCMDAP are obtained from five data sets:
(1) Experimentally verified miRNA-disease association data from the HMDD v2.0 database (http://www.cuilab.cn/hmdd, Jun-14-2014 Version) [12]. After filtering invalid data with erroneous disease names or miRNA names and removing redundant miRNA-disease associations, we obtained 5048 experimentally verified miRNA-disease associations involving 475 miRNAs and 334 diseases as the benchmark dataset [see Additional file 1]. We use M = {m_1, m_2, ..., m_nm} to represent the miRNA set and D = {d_1, d_2, ..., d_nd} to represent the disease set, where nm is the number of miRNAs and nd is the number of diseases. We also use the matrix AS to represent the known associations of miRNAs and diseases. When miRNA i is associated with disease j, AS(i, j) is 1. Otherwise, AS(i, j) is 0.
(2) Experimentally verified miRNA-mRNA interactions from the miRTarBase database (http://mirtarbase.mbc.nctu.edu.tw/, Release 6.0: Sept-15-2015) [36]. We use these data to measure functional similarity of miRNAs.
(3) Disease-gene interaction data [37]. We use these data to measure functional similarity of diseases.
(4) Data on the relationships among diseases from the MeSH (http://www.nlm.nih.gov/, 2017 Version) descriptors of Category C, which are described as a DAG. We use these data to measure semantic similarity of diseases.
(5) Information on the families and clusters of human miRNAs from miRBase (http://www.mirbase.org, Release 21) [6]. We established the miRNA family information matrix FAM for the 475 miRNAs in the benchmark. FAM(i, j) = 1 if miRNA i and j are in the same family; otherwise, FAM(i, j) = 0. We also established the miRNA cluster information matrix CLU for the 475 miRNAs. If the distance between miRNA i and j is less than 20 kb, we consider the two miRNAs to be in the same cluster and set CLU(i, j) = 1; otherwise, CLU(i, j) = 0.
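As an illustration of the 20 kb rule for CLU, the following minimal Python sketch builds a cluster matrix from genomic coordinates. The input format (a list of `(chromosome, start)` tuples), the use of start positions on the same chromosome as the distance, the zero diagonal, and the function name `build_cluster_matrix` are all assumptions made for this sketch; the text above states only the 20 kb threshold.

```python
def build_cluster_matrix(positions, max_dist=20_000):
    """Build a symmetric 0/1 cluster matrix.

    positions: list of (chromosome, start) tuples, one per miRNA
               (hypothetical input format, not taken from miRBase files).
    Two miRNAs are placed in the same cluster when they lie on the same
    chromosome and their start positions differ by less than max_dist.
    The diagonal is left at 0 (assumption: a miRNA is not its own neighbor).
    """
    n = len(positions)
    clu = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            same_chrom = positions[i][0] == positions[j][0]
            close = abs(positions[i][1] - positions[j][1]) < max_dist
            if same_chrom and close:
                clu[i][j] = clu[j][i] = 1
    return clu
```

With toy coordinates, two miRNAs 14 kb apart on one chromosome would share a cluster, while miRNAs 35 kb apart or on different chromosomes would not.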

miRNA similarity network
Information entropy and mutual information (MI) are used to calculate similarity between miRNAs based on the set of mRNAs interacting with miRNAs.
In an event set X, information entropy measures the average information content that is obtained when one of the events actually occurs [38]. It is defined as

$$H(X) = -\sum_{x \in X} p(x) \log p(x)$$

where p(x) is the probability of x.
For two discrete random variables X and Y, their MI is

$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}$$

where p(x) is the marginal probability distribution function of X, p(y) is the marginal probability distribution function of Y, and p(x, y) is the joint probability function of X and Y.
For miRNA A, let $T_m^A$ denote its set of target genes. The information entropy $H(T_m^A)$ is computed from the probabilities of these target genes, where N is the total number of known miRNA-mRNA interactions in the dataset, $n(T_m^A(i))$ is the number of known interactions between the ith target gene in the target gene set of miRNA A and all miRNAs, and $p(T_m^A(i)) = n(T_m^A(i))/N$ is the rate of the ith target gene in the target gene set of miRNA A among the known miRNA-mRNA interactions.
The similarity between miRNA A and miRNA B is measured by the normalized MI of $T_m^A$ and $T_m^B$, which is computed from $H(T_m^A)$, $H(T_m^B)$, and $H(T_m^A \cap T_m^B)$, the information entropy of the intersection of $T_m^A$ and $T_m^B$. When calculating the similarity of miRNA A and miRNA B, both their individual information entropies and the common information entropy of their shared mRNAs are therefore considered, as is the frequency of occurrence of the target mRNAs. In other words, the measure uses MI to quantify the similarity between miRNAs according to the occurrence probability of their target genes. A target gene with higher probability is more universal and carries less information, while a target gene with lower probability is more specific and carries more information. By comparing the similarity data, we find that the metric is determined by these two factors and that the similarity between two miRNAs can be appropriately measured.
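The entropy-based similarity can be illustrated with a minimal Python sketch. Two choices here are assumptions, not statements of the method's exact equations: the entropy of a gene set is taken as the sum of -p log p over its genes, and the normalization is taken as 2H(A ∩ B) / (H(A) + H(B)). The function names and the toy data format are hypothetical.

```python
import math
from collections import Counter


def gene_probabilities(interactions):
    """interactions: iterable of (miRNA, gene) pairs.

    Returns p(gene) = n(gene) / N, where N is the number of known
    interactions and n(gene) counts the interactions involving the gene.
    """
    interactions = list(interactions)
    total = len(interactions)
    counts = Counter(gene for _, gene in interactions)
    return {gene: count / total for gene, count in counts.items()}


def set_entropy(genes, prob):
    # Assumed form: sum of -p * log(p) over the genes in the set.
    return -sum(prob[g] * math.log(prob[g]) for g in genes)


def mi_similarity(targets_a, targets_b, prob):
    """Assumed normalization: 2 * H(A & B) / (H(A) + H(B))."""
    h_a = set_entropy(targets_a, prob)
    h_b = set_entropy(targets_b, prob)
    if h_a + h_b == 0:
        return 0.0
    h_shared = set_entropy(targets_a & targets_b, prob)
    return 2 * h_shared / (h_a + h_b)
```

Under these assumptions the score is 1 for identical target sets, 0 for disjoint sets, and lies in between otherwise, because the intersection can never carry more entropy than either set.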

Disease similarity network
To build the disease similarity network, we first calculate the functional similarity of diseases on the basis of the disease-gene interaction dataset. We then calculate the semantic similarity of diseases on the basis of the disease DAG. Finally, we integrate both similarities into a single disease similarity to build the disease similarity network.

If the set of genes interacting with disease A is $T_d^A$, its information entropy $H(T_d^A)$ is computed from the probabilities of these genes, where N is the total number of known disease-gene interactions in the dataset, $n(T_d^A(i))$ is the number of known interactions between the ith target gene in the target gene set of disease A and all diseases, and $p(T_d^A(i)) = n(T_d^A(i))/N$ is the rate of the ith target gene in the target gene set of disease A among the known disease-gene interactions.
The functional similarity between disease A and disease B is measured by the normalized MI of $T_d^A$ and $T_d^B$, where $H(T_d^A)$ and $H(T_d^B)$ are the information entropies of the gene sets $T_d^A$ and $T_d^B$ of disease A and disease B, respectively, and $H(T_d^A \cap T_d^B)$ is the information entropy of the intersection of $T_d^A$ and $T_d^B$. When calculating the functional similarity of disease A and disease B, both the information entropy of each disease and the common information entropy of their shared genes are considered.

Disease semantic similarity
Disease semantic similarity DD is built from the disease DAG as reported in the literature [39], where DD(A, B) is the semantic similarity value between disease A and disease B in the disease DAG. For the formula and the meaning of its symbols, please refer to the literature [39].
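Because the formula itself is deferred to [39], the following Python sketch shows one widely used DAG-based formulation: each ancestor of a disease receives a semantic contribution that decays by a fixed factor per level, and the similarity is the shared contribution divided by the total contribution. The decay factor of 0.5, the `parents` dictionary format, and the function names are assumptions of this sketch and may differ from the exact definition in [39].

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Contribution of each term in the DAG of `disease` (itself plus ancestors).

    parents: dict mapping a term to the list of its parent terms
             (hypothetical format for the MeSH hierarchy).
    The disease contributes 1.0 to itself; an ancestor receives delta times
    the largest contribution among its children within the DAG.
    """
    contrib = {disease: 1.0}
    stack = [disease]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            value = delta * contrib[node]
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                stack.append(parent)  # revisit so the update propagates upward
    return contrib


def semantic_similarity(a, b, parents, delta=0.5):
    """Shared contributions divided by the summed semantic values of a and b."""
    ca = semantic_contributions(a, parents, delta)
    cb = semantic_contributions(b, parents, delta)
    shared = ca.keys() & cb.keys()
    numerator = sum(ca[t] + cb[t] for t in shared)
    return numerator / (sum(ca.values()) + sum(cb.values()))
```

For a toy hierarchy in which A and B share the parent P and the grandparent R, each disease has contributions 1, 0.5, and 0.25, so the similarity is (1 + 0.5) / 3.5, and the similarity of a disease with itself is 1.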

Integrating disease similarity
We integrate disease functional similarity and disease semantic similarity to obtain the disease similarity, where γ ∈ (0, 1) is the balance factor that tunes the contributions of disease functional similarity and semantic similarity. The results are shown in Additional file 2.

miRNA similarity network reconstruction
miRNA family information is obtained from the miRBase database. We establish the miRNA family information matrix FAM for the 475 miRNAs in the benchmark dataset. FAM(A, B) = 1 if miRNA A and B are in the same family; otherwise, FAM(A, B) = 0. We recalculate the miRNA similarity by adding the miRNA family information and then reconstruct the miRNA similarity network. The results are shown in Additional file 3.

FCMDAP prediction method
The flowchart of FCMDAP to predict disease-related miRNAs is shown in Fig. 1.
miRNA space score calculation
Calculating the recommendation score of neighboring miRNAs and disease
Wang et al. [39] proposed that miRNAs with similar functions tend to be related to similar diseases, and vice versa. In the miRNA space, the association score between a miRNA and a disease therefore depends on the association scores between the disease and the nearest neighbor nodes of the miRNA. Hence, if a similar neighbor of a miRNA is related to a disease, then the miRNA may be related to the disease. According to the collaborative recommendation algorithm, the association score of miRNA i and disease j is calculated from the similarity scores of the top k1 nearest neighbor nodes of miRNA i and the associations of these nodes with disease j. The score is normalized over the top k1 most similar neighbor nodes of miRNA i. Here SM1 is the row vector of each miRNA in the miRNA similarity matrix miRNAsim, sorted in descending order so that more similar miRNAs rank higher, and SM1(i, k) is the similarity between miRNA i and its kth closest neighbor node.
For every neighbor miRNA k that is related to disease j, we sum the similarity scores between miRNA i and miRNA k, and we then divide this sum by the sum of the similarity scores of all top k1 similar neighbor nodes of miRNA i.
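The normalized neighbor score described above can be sketched in a few lines of Python. The function name, the nested-list matrix layout (`assoc[k][j]` for miRNA k and disease j), and the exclusion of miRNA i from its own neighbor list are assumptions of this sketch.

```python
def knn_mirna_score(i, j, mirna_sim, assoc, k1):
    """Score for miRNA i and disease j from the k1 most similar miRNAs.

    mirna_sim: square miRNA similarity matrix (list of lists).
    assoc:     0/1 association matrix indexed as assoc[miRNA][disease].
    Assumption: miRNA i itself is excluded from its neighbors.
    """
    neighbors = sorted(
        (k for k in range(len(mirna_sim)) if k != i),
        key=lambda k: mirna_sim[i][k],
        reverse=True,
    )[:k1]
    total = sum(mirna_sim[i][k] for k in neighbors)
    if total == 0:
        return 0.0
    # Only neighbors with a known association to disease j contribute.
    related = sum(mirna_sim[i][k] for k in neighbors if assoc[k][j] == 1)
    return related / total
```

The score lies between 0 and 1 and equals 1 only when every one of the k1 neighbors is already associated with disease j.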
Fig. 1 The flowchart of FCMDAP

Calculating the prediction score in the same miRNA cluster
Baskerville and Bartel [40] found significant coexpression among proximal pairs of miRNAs (< 50 kb). Closely clustered miRNAs are usually expressed as a common polycistronic regulatory unit, and intronic miRNAs are usually coexpressed with their host genes, presenting complex miRNA expression patterns. Lu et al. [41] performed statistical analysis and found that miRNAs in 46% of diseases have at least one neighboring member. For example, all of the 6 miRNAs (miR-17, miR-18a, miR-19a, miR-20a, miR-19b-1 and miR-92a-1) involved in hematopoietic malignancies are located in the miR-17 cluster. This result shows that neighboring miRNAs may be regulated by a common regulator under the same conditions and interactions, and that their dysfunction may lead to the same disease. Wang et al. [39] confirmed that miRNAs are more likely to be associated with similar diseases when they are clustered and located within 20 kb of each other in the genome. We downloaded the genomic locations of human miRNAs from miRBase v.21 and selected clustered miRNAs within a distance of 20 kb. A miRNA cluster matrix CLU is built for the 475 miRNAs in the benchmark dataset. Based on the collaborative recommendation algorithm, we calculate the normalized association score between miRNA i and disease j from the miRNAs in the same cluster, where SM2(i, k) is the similarity score of miRNA i and miRNA k in the same cluster, and n is the number of miRNAs in the same cluster as miRNA i. For every miRNA k in the cluster that is related to disease j, we add the similarity score miRNAsim(i, k) of miRNA i and miRNA k, and we then divide this sum by the sum of the similarity scores between miRNA i and all miRNAs in the same cluster. It follows that the more closely the miRNAs in the same cluster are related to disease j, the more closely miRNA i is related to disease j.
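The cluster-based score follows the same pattern as the neighbor score, except that the neighbors are the members of the same cluster. The following Python sketch is illustrative only; the function name, the matrix layouts, and the convention that a miRNA is not counted as a member of its own cluster are assumptions.

```python
def cluster_score(i, j, mirna_sim, clu, assoc):
    """Score for miRNA i and disease j from miRNAs in the same cluster as i.

    clu:   0/1 cluster matrix, clu[i][k] == 1 if i and k share a cluster.
    assoc: 0/1 association matrix indexed as assoc[miRNA][disease].
    Returns 0.0 when miRNA i has no cluster partners.
    """
    members = [k for k in range(len(clu)) if k != i and clu[i][k] == 1]
    total = sum(mirna_sim[i][k] for k in members)
    if total == 0:
        return 0.0
    # Similarities to cluster members already associated with disease j.
    related = sum(mirna_sim[i][k] for k in members if assoc[k][j] == 1)
    return related / total
```

A miRNA that has no cluster partner simply receives a cluster score of 0, so only the neighbor-based score contributes for it in the miRNA space.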
Integrating similarity score in miRNA space
In the miRNA space, the recommendation score of a miRNA-disease association is calculated by integrating the score from the top k most similar neighboring miRNAs of miRNA i with the recommendation score from the miRNAs in the same cluster as miRNA i for disease j. The two scores are balanced by a tradeoff factor α. Experiments show that FCMDAP achieves the best performance when α is 0.5.

Calculating disease space score
In the disease space, we also use the k-nearest-neighbor-based recommendation algorithm to calculate the predicted association score between a disease and a miRNA. If the k nearest neighbors of a disease are related to a miRNA, then the disease may be related to the miRNA.
According to the collaborative recommendation algorithm, the recommendation score for miRNA i and disease j is calculated from the normalized similarity scores between disease j and those of its k2 nearest neighbors that are related to miRNA i. Here SD1 is the column vector of each disease in the disease similarity matrix SD, sorted in descending order so that the most similar disease ranks highest, and SD1(k, j) is the similarity between disease j and its kth nearest neighbor.
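A minimal Python sketch of the disease-space score mirrors the miRNA-space neighbor score with the roles of miRNAs and diseases exchanged. The function name, the matrix layouts, and the exclusion of disease j from its own neighbor list are assumptions of this sketch.

```python
def knn_disease_score(i, j, disease_sim, assoc, k2):
    """Score for miRNA i and disease j from the k2 diseases most similar to j.

    disease_sim: square disease similarity matrix (list of lists).
    assoc:       0/1 association matrix indexed as assoc[miRNA][disease].
    Assumption: disease j itself is excluded from its neighbors.
    """
    neighbors = sorted(
        (k for k in range(len(disease_sim)) if k != j),
        key=lambda k: disease_sim[k][j],
        reverse=True,
    )[:k2]
    total = sum(disease_sim[k][j] for k in neighbors)
    if total == 0:
        return 0.0
    # Only neighbor diseases already associated with miRNA i contribute.
    related = sum(disease_sim[k][j] for k in neighbors if assoc[i][k] == 1)
    return related / total
```

Because the score uses row i of the association matrix for the neighboring diseases, it stays informative even when disease j has no known miRNAs of its own.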
Calculating the final prediction score of disease-related miRNAs
The final prediction score for miRNA i and disease j is obtained by integrating the scores in the miRNA space and the disease space, where β is the factor used to balance the weights of the two spaces. Experiments show that the optimal performance of FCMDAP is obtained when the value of β is 0.8.
FCMDAP can predict isolated disease-related miRNAs and isolated miRNA-related diseases. Isolated diseases and isolated miRNAs are diseases without any related miRNAs and miRNAs without any related diseases, respectively, such as newly discovered diseases or miRNAs. When we use FCMDAP to predict miRNAs related to an isolated disease j, no miRNAs related to disease j exist, so the prediction score S_miRNA(i, j) is 0. We calculate S_disease(i, j) from two parts, namely, the associations between miRNA i and other diseases and the similarity between diseases. Thus, FCMDAP can predict the associations between isolated diseases and miRNAs. When we predict diseases related to an isolated miRNA i, no diseases related to miRNA i exist, so S_disease(i, j) = 0. We can calculate S_miRNA(i, j) from the relationship between other miRNAs and disease j and the similarity between miRNAs to predict the association of miRNA i and disease j.
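The two integration steps can be sketched as plain linear blends. Which term carries each weight is an assumption here: α is assumed to weight the neighbor score against the cluster score, and β is assumed to weight the miRNA-space score against the disease-space score. The function names are hypothetical.

```python
def combine(first, second, weight):
    """Linear blend: weight * first + (1 - weight) * second."""
    return weight * first + (1 - weight) * second


def final_score(s_knn, s_cluster, s_disease, alpha=0.5, beta=0.8):
    """Blend the three partial scores into one prediction score.

    Assumed weighting: alpha on the neighbor score within the miRNA space,
    beta on the miRNA-space score in the final integration.
    """
    s_mirna = combine(s_knn, s_cluster, alpha)  # miRNA-space score
    return combine(s_mirna, s_disease, beta)    # integrate both spaces
```

For an isolated disease both miRNA-space scores are 0, so under this weighting the final score reduces to (1 - β) times the disease-space score; the ranking of candidate miRNAs is then determined entirely by the disease space.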

Characteristics of the miRNA-disease association network
The benchmark data set includes 5048 known miRNA-disease associations of 475 miRNAs and 334 diseases. The characteristics of these associations are shown in

Performance evaluation of FCMDAP
LOOCV on the known miRNA-disease associations is used to evaluate the performance of FCMDAP. For a given disease d, each known association of disease d is removed in turn as a test sample, and the other known associations are used as the training set. The remaining miRNAs without experimental evidence of a relation with disease d comprise the candidate miRNA set. The association prediction scores of these candidate miRNAs and diseases are calculated and ranked. If the rank of the test sample is above a given threshold, then we consider FCMDAP to have successfully predicted the association of miRNA and disease. By varying the threshold, we draw the receiver operating characteristic (ROC) curve and calculate the area under the curve (AUC) to evaluate prediction performance.
The ROC curve plots the relationship between the true positive rate (TPR) and the false positive rate (FPR) at different thresholds. If TP, FP, TN, and FN represent true positives, false positives, true negatives, and false negatives, respectively, then TPR and FPR are calculated as

$$TPR = \frac{TP}{TP + FN}$$

and

$$FPR = \frac{FP}{FP + TN}$$

In each round of LOOCV, one association between a miRNA and a disease is excluded, and the prediction score is calculated from the remaining associations. All these scores are sorted, and a specific ranking position is selected as the threshold. TP and FP are the numbers of experimentally verified and unverified associations above the threshold, respectively. TN and FN are the numbers of unverified and verified associations below the threshold, respectively.
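The threshold sweep can be made concrete with a short Python sketch that walks down the ranked scores, records one (FPR, TPR) point per position, and integrates the curve with the trapezoidal rule. The function names are hypothetical, and the sketch assumes distinct scores (no tie handling) and at least one verified and one unverified association.

```python
def roc_points(scores, labels):
    """Return the ROC curve as a list of (FPR, TPR) points.

    scores: predicted association scores.
    labels: 1 for experimentally verified associations, 0 otherwise.
    Each ranking position is used once as the threshold.
    """
    order = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for idx in order:
        if labels[idx] == 1:
            tp += 1  # verified association above the threshold
        else:
            fp += 1  # unverified association above the threshold
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

For example, if four candidates are ranked verified, unverified, verified, unverified, three of the four verified/unverified pairs are ordered correctly and the AUC is 0.75.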
We compared FCMDAP with SRMDAP, RLSMDA [28], KATZ [23], and Liu's method [31] in terms of prediction performance, AUC value, and ROC shape on the benchmark data set. The values of the four parameters of FCMDAP are α = 0.5, β = 0.8, k1 = 50, and k2 = 30. The optimal parameters of SRMDAP, RLSMDA, KATZ, and Liu's method are set as previously described. The comparison of the overall ROC curves and AUCs of all methods is shown in Fig. 2. The average AUC value of FCMDAP is 0.9165, which is 3.72, 5.81, 6.43, and 11.82% higher than those of SRMDAP, RLSMDA, KATZ, and Liu's method, respectively. When the FPR is lower than 0.2, the ROC curve of FCMDAP lies closer to the upper left corner, indicating higher prediction accuracy. Therefore, FCMDAP shows higher prediction accuracy than the other methods.
To obtain a reliable judgment, we tested 18 human diseases associated with at least 70 miRNAs. The results are shown in Table 2. FCMDAP obtained the highest AUC value of 0.8837 for pancreatic neoplasms and the lowest AUC value of 0.7572 for hepatocellular carcinoma. The average AUC value for the 18 diseases is 0.8195. The average AUC values for the 18 diseases obtained from SRMDAP, RLSMDA, KATZ, and Liu's method are 0.8057, 0.6671, 0.6901, and 0.5178, respectively. The average AUC value obtained by FCMDAP is thus 1.38, 15.24, 12.94, and 30.17 percentage points higher than those of the four methods, respectively. Hence, FCMDAP exhibits better performance than SRMDAP, RLSMDA, KATZ, and Liu's method.

Parameter effect
The five parameters in FCMDAP are α, β, γ, k1, and k2. We focus on miRNA space. In the miRNA space, α balances the tradeoff between the recommendation score from the neighboring miRNAs and the score from the miRNA cluster. β is the entire space balancing factor that sets different weights of recommendation scores from the miRNA and disease spaces. To obtain optimal parameters, we assign different values to α and β starting from 0.1 to calculate the recommendation scores of miRNA-disease association and evaluate the performance of FCMDAP by calculating AUC value. We repeat this work by increasing α and β in steps of 0.1 and calculating the AUC value until α and β are both 1. We obtain the best performance when α = 0.5 and β = 0.8, and the AUC of FCMDAP is 0.9165. The results are shown in Fig. 3.
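The parameter sweep described above amounts to an exhaustive grid search over α and β in steps of 0.1. The following Python sketch shows the loop only; the `evaluate` callback, which would have to run the full LOOCV and return the average AUC for a given (α, β), is hypothetical and is not implemented here.

```python
def grid_search(evaluate, steps=10):
    """Try alpha and beta in {1/steps, 2/steps, ..., 1} and keep the best.

    evaluate: callable (alpha, beta) -> average AUC (hypothetical callback
              standing in for a full LOOCV run).
    Returns a tuple (best_auc, best_alpha, best_beta).
    """
    best = None
    for a in range(1, steps + 1):
        for b in range(1, steps + 1):
            alpha, beta = a / steps, b / steps
            score = evaluate(alpha, beta)
            if best is None or score > best[0]:
                best = (score, alpha, beta)
    return best
```

With 10 steps per parameter the search evaluates 100 combinations, so the cost is dominated by the cross-validation inside `evaluate`.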
As shown in Fig. 3, the ordinate is the average AUC value, and the abscissa is the value of β magnified 10 times. Each curve in the figure represents the line connecting the points of the corresponding average

Colorectal neoplasms, the third most common cancer worldwide, severely affect human health. Understanding colorectal-related miRNAs is therefore important for the diagnosis and prognosis of colorectal neoplasms. For example, patients with early colorectal neoplasms can be discriminated from healthy people by using serum miR-21, miR-29a, and miR-125b levels [42].
We used experimentally identified miRNA-disease associations as training samples to calculate the recommendation scores of all candidate miRNAs through FCMDAP. We then ranked them in descending order and selected the top 50 miRNAs for verification. The top 50 candidate miRNAs and the corresponding evidence of their association with colorectal neoplasms are listed in Table 3. All of the top 50 miRNAs were confirmed by analysis of the miRCancer, dbDEMC, and PhenomiR databases.
A lung neoplasm is a malignant lung tumor caused by uncontrolled growth of lung tissue cells. Lung tumor cells can also rapidly spread from the lungs to nearby tissues or other parts of the body. According to the World Health Organization's 2014 World Cancer Report [43], the number of patients with lung tumors worldwide reached 1.8 million in 2012. Lung neoplasms are the main cause of cancer-related death in men and women (other than breast neoplasms). In the United States, the 5-year survival rate for patients diagnosed with lung neoplasms is only 17.4%, which is lower than that in developing countries. Thus, effective methods for early diagnosis and treatment of lung neoplasms are important. Evidence indicates that miRNAs play an important role in the pathogenesis, migration, and spread of lung neoplasms. For example, in a study of 143 cases of lung neoplasms, Takamizawa et al. [3] first found that the expression levels of let-7 are often reduced in lung neoplasms in vitro and in vivo. The decrease in let-7 expression may affect the survival of patients with lung neoplasms who were surgically treated. Johnson et al. [44] found that let-7 acts as a tumor

In our work, we used experimentally identified miRNA-disease associations as training samples to calculate the recommendation scores of all candidate miRNAs based on FCMDAP. We then ranked them in descending order and selected the top 50 miRNAs for verification. The top 50 candidate miRNAs and the corresponding evidence of their association with lung neoplasms are listed in Table 4. Among these miRNAs, 48 were confirmed in the miRCancer, dbDEMC, and PhenomiR databases, and only two miRNAs (hsa-mir-520g and hsa-mir-147a) were not confirmed. A recent study (PMID: 29033588) [45] showed that hsa-mir-147a is related to lung neoplasms. In that study, the lncRNA HOXD-AS1 was found to be specifically upregulated in non-small-cell lung cancer (NSCLC) tissues and to promote cancer cell growth by targeting miR-147a.
Table 3 The top 50 candidate miRNAs associated with colorectal neoplasms predicted by FCMDAP and the confirmation for their associations by miRCancer, PhenomiR or dbDEMC databases are listed here. All of them have been confirmed

Pancreatic neoplasms are cellular masses caused by uncontrollable pancreatic cell proliferation. The most common symptoms of pancreatic neoplasms include yellowing of the skin, abdominal or back pain, unexplained weight loss, and loss of appetite. Early pancreatic neoplasms are small and have no symptoms. Most pancreatic neoplasms are large when they are found and can metastasize to other parts of the body. According to reports, 411,600 people worldwide died of various pancreatic neoplasms in 2015. Pancreatic neoplasms most often occur in developed countries; that is, these malignancies rank as the fifth most common cancer in the UK and the fourth most common cancer in the United States [43,46]. The prognosis of pancreatic neoplasms is very poor, with 25% survival rate for 1 year after diagnosis and 5% survival rate for 5 years. Thus, effective methods for early diagnosis, treatment, and prognosis of pancreatic neoplasms must be developed. At present, evidence supports the role of miRNA differential expression in the diagnosis, treatment, and prognosis of pancreatic neoplasms. For example, Sadakari et al. [47] found that the relative expression levels of miR-21 and miR-155 in tissues and pancreatic juice of patients with pancreatic ductal adenocarcinoma are significantly higher than those in patients with chronic pancreatitis; thus, miR-21 and miR-155 in pancreatic juice may be potential biomarkers for diagnosis of pancreatic ductal adenocarcinoma. Lodygin et al. [48] reported that the expression of miR-34a is silenced in several types of cancers, including pancreatic neoplasms, due to CpG methylation.
By partially targeting CDK16, the re-expression of miR-34a in the MiaPaCa2 pancreatic neoplasm cell line induces cellular senescence and cell cycle arrest. This observation indicates that miR-34a is a neoplasm suppressor gene, which is inactivated by CpG methylation and subsequent transcriptional silencing in various tumors, such as pancreatic neoplasms. Thus, miR-34a can be used as a therapeutic target for malignant neoplasms, such as pancreatic neoplasms.
In our work, we also calculated the recommendation scores of all candidate miRNAs based on FCMDAP, ranked them in descending order, and selected the top 50 miRNAs for verification. The top 50 candidate miRNAs and the corresponding evidence of their associations with pancreatic neoplasms are listed in Table 5. Among the top 50 miRNAs, 48 were confirmed in the miRCancer, dbDEMC, and PhenomiR databases, and only two miRNAs (miR-378a and miR-365a) were not confirmed.

Predicting isolated diseases and isolated miRNAs
FCMDAP can predict miRNAs related to isolated diseases, that is, diseases without any known related miRNAs. To test this, we removed all experimentally verified disease-miRNA associations for a given disease, calculated the recommendation scores with FCMDAP, and ranked the miRNAs by score. The average AUC of FCMDAP for predicting an isolated disease is 0.8417. For lung neoplasms, the top 50 miRNAs identified by FCMDAP are listed in Table 6. All of the top 50 miRNAs were confirmed by one or more databases (miRCancer, dbDEMC, or PhenomiR). Hence, FCMDAP exhibits satisfactory performance in predicting isolated diseases.

FCMDAP also shows satisfactory performance in predicting diseases related to isolated miRNAs. We removed all disease association information for a given miRNA, calculated the recommendation scores of all diseases for that miRNA with FCMDAP, ranked the diseases, and verified them in the databases. The average AUC of FCMDAP for predicting an isolated miRNA is 0.8944. For hsa-mir-93, the top 10 related diseases predicted by FCMDAP are listed in Table 7. Among the 10 diseases, eight were confirmed to be related to hsa-mir-93 by the dbDEMC or PhenomiR databases. Adrenocortical carcinoma, ranked 8th, was not confirmed by these two databases. Heart failure, ranked 1st, was confirmed to be related to hsa-mir-93 in the literature: Ke et al. [49] found that miR-93 is related to cardiomyocyte apoptosis and can prevent cardiomyocyte apoptosis induced by myocardial ischemia/reperfusion by inhibiting PI3K/AKT/PTEN signaling.
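The isolated-disease experiment can be sketched as follows. This is a minimal illustration rather than FCMDAP itself: the toy association matrix `A` (rows are miRNAs, columns are diseases), the similarity matrix `disease_sim`, the helper names, and the simple similarity-weighted vote over the k most similar diseases are all assumptions made for the example.

```python
import numpy as np


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative
    (ties count as 0.5)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))


def score_isolated_disease(A, disease_sim, d, k=2):
    """Score every miRNA for disease d after hiding all of d's known
    associations, using only the k most similar other diseases."""
    masked = A.astype(float)            # astype returns a copy; A is untouched
    masked[:, d] = 0.0                  # the disease becomes "isolated"
    sim = disease_sim[d].astype(float)
    sim[d] = -np.inf                    # a disease is not its own neighbour
    neighbours = np.argsort(sim)[::-1][:k]
    weights = disease_sim[d, neighbours]
    # Similarity-weighted average of the neighbours' association columns.
    return masked[:, neighbours] @ weights / weights.sum()


# Toy data (assumed): 5 miRNAs x 3 diseases, 1 = verified association.
A = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
])
disease_sim = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])

scores0 = score_isolated_disease(A, disease_sim, d=0, k=2)
auc0 = auc(scores0, A[:, 0])            # evaluate against the hidden truth
print(np.round(scores0, 3), round(auc0, 3))
```

Repeating this for every disease and averaging the per-disease AUC values gives a summary figure of the kind reported above. The isolated-miRNA experiment is the same procedure applied to a row of `A` with a miRNA similarity matrix.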

Discussion
In this work, we developed FCMDAP to predict human disease-related miRNAs. FCMDAP calculates the similarity between miRNAs using mutual information derived from known miRNA-mRNA interactions and adds miRNA family information to construct a miRNA space. It integrates disease functional similarity, based on disease-gene interactions, with disease semantic similarity, based on the directed acyclic graph (DAG) from MeSH, to construct a disease space. The association scores between miRNAs and diseases are calculated in each space with the k most similar neighbor recommendation algorithm, with miRNA cluster information added to the miRNA space, and the scores from the two spaces are then integrated. Like NSIM and other methods, FCMDAP predicts unknown associations by constructing a miRNA network and a disease network. In FCMDAP, however, the miRNA and disease similarities are calculated independently of each other. Because multiple types of data (miRNA-mRNA interactions, miRNA family information, disease-gene interactions, and the DAG from MeSH) are used to calculate miRNA and disease similarity, the prediction does not depend only on known miRNA-disease associations, which improves the accuracy of the similarity calculations. Using the k most similar neighbor recommendation algorithm together with miRNA cluster information makes the prediction results more reasonable and improves predictive performance. LOOCV and the case studies show that FCMDAP exhibits excellent performance in predicting miRNA-disease associations. FCMDAP also shows satisfactory performance in predicting diseases without any related miRNA information and miRNAs without any related disease information: the average AUCs for predicting isolated diseases and isolated miRNAs are 0.8417 and 0.8944, respectively. For isolated lung neoplasms, the prediction accuracy reached 100% in the top 50 predicted miRNAs. For the isolated hsa-mir-93, the prediction accuracy reached 90% in the top 10 diseases.
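To make the idea of a mutual-information similarity between miRNAs concrete, the sketch below computes the mutual information between two binary target profiles, where each position records whether a miRNA is known to interact with a given mRNA. This is an illustrative assumption, not the formula used by FCMDAP: the profiles, the miRNA names, and the function `mutual_information` are hypothetical.

```python
import math


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length binary profiles."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            # Joint and marginal frequencies estimated from the profiles.
            p_ab = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            p_a = sum(1 for xi in x if xi == a) / n
            p_b = sum(1 for yi in y if yi == b) / n
            if p_ab > 0:
                mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical target profiles over four mRNAs (1 = known interaction).
mir_a = [1, 1, 0, 0]
mir_b = [1, 1, 0, 0]   # same targets as mir_a
mir_c = [1, 0, 1, 0]   # targets statistically independent of mir_a

print(mutual_information(mir_a, mir_b))
print(mutual_information(mir_a, mir_c))
```

Identical profiles share the maximum amount of information, whereas profiles with independent targets share none, so the value can serve as a similarity that is not driven solely by known miRNA-disease associations.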
However, FCMDAP has the following limitations. First, miRNA similarity could be further improved by considering other biomolecules that interact with miRNAs. Second, because FCMDAP relies on experimentally verified miRNA-disease associations, its performance is limited by the number of such associations and should improve as more of them are experimentally verified.

Conclusion
To provide effective support for experimental research on miRNAs, we proposed a computational method, FCMDAP, to find potential disease-related miRNAs. FCMDAP exhibits excellent performance in predicting potential disease-related miRNAs. FCMDAP could be extended to the study of other biomolecular networks and could help to decipher the pathogenesis and diagnosis of complex human diseases.