Dysregulation of genetic factors such as microRNAs (miRNAs) and mRNAs has been widely shown to be associated with cancer progression and development. In particular, miRNAs and mRNAs cooperate to affect biological processes, including tumorigenesis. The complexity of miRNA-mRNA interactions presents a major barrier to identifying their co-regulatory roles and functional effects. Thus, by computationally modeling these complex relationships, it may be possible to infer the gene interaction networks underlying complicated biological processes.
We propose a data-driven, hypergraph structural method for constructing higher-order miRNA-mRNA interaction networks from cancer genomic profiles. The proposed model explicitly characterizes higher-order relationships among genetic factors, from which cooperative gene activities in biological processes may be identified. The proposed model is learned by iteration of structure and parameter learning. The structure learning efficiently constructs a hypergraph structure by generating putative hyperedges representing complex miRNA-mRNA modules. It adopts an evolutionary method based on information-theoretic criteria. In the parameter learning phase, the constructed hypergraph is refined by updating the hyperedge weights using the gradient descent method. From the model, we produce biologically relevant higher-order interaction networks showing the properties of primary and metastatic prostate cancer, as candidates of potential miRNA-mRNA regulatory circuits.
Our approach focuses on potential cancer-specific interactions reflecting higher-order relationships between miRNAs and mRNAs from expression profiles. The constructed miRNA-mRNA interaction networks show oncogenic or tumor suppression characteristics, which are known to be directly associated with prostate cancer progression. Therefore, the hypergraph-based model can assist hypothesis formulation for the molecular pathogenesis of cancer.
miRNA-mRNA interaction networksHypergraph-based modelHigher-order gene modulesEvolutionary learningCancer genomics data analysis
Prostate cancer is a common disease in the male population, induced by complex interactions among various genetic factors . As such, the pathological causes of this disease are not easily identified. Recent human cancer studies have demonstrated that most cancer regulations are related to modular construction and combinatorial control by multiple genetic factors. This module-based view of higher-order relationships can provide new insights into the behavior of complex biological systems [2, 3].
Recently, miRNAs have caused great excitement as diagnostic and therapeutic signatures of prostate cancer [4–8]. They play important roles in cancer pathogenesis, including disease onset, progression, and metastasis, by regulating the stability and translation efficiency of their target mRNAs. Thus, the functional relationships between miRNAs and mRNAs should be elucidated to identify key transcriptional circuits involved in cancer regulation. However, analyzing higher-order miRNA-mRNA relationships is rendered as a challenging problem due to the complexity of their interactions.
Modern cancer research has progressed from identifying biomarkers to systemically exploring gene interactions [9–11]. Many studies have focused on the interaction of genetic components at the systems level. Computational methods, which analyze gene regulatory interactions on a genome-wide scale from high-throughput biological data, have flourished in recent decades [12–14]. In addition, systems biology has proposed to build miRNA regulation networks underlying the development of many human diseases [15–17]. Moreover, miRNA regulatory mechanisms are now thought to be inferable from miRNA-mRNA interactions [18–20]. Several studies have attempted to identify groups of coherent miRNAs and mRNAs that cooperate in biological processes from heterogeneous data sources via various computational approaches, including probabilistic methods [21–28], rule-based learning [29, 30], matrix factorization , and statistical methods [32–35]. These approaches have simplified complex biological mechanisms by systematically analyzing the relationships between genetic elements at the genome level. Typically, however, bi-relationships between only two factors are assumed in many previous studies [21, 30–35]. Such restrictions are unsuitable for complex genetic interactions because information is lost under the assumption, and biological regulation is controlled by the interaction of multiple genetic components. Many studies have also investigated miRNA-mRNA regulatory interactions using biological information, especially miRNA-target information [21–25, 29–33]. Biological information reduces the number of false positives, since it provides the predictive model with prior knowledge. In contrast, unknown or hidden interactions not involved in the prior knowledge may be difficult to identify from this information. To avoid this problem, some probabilistic models which infer miRNA-mRNA modules from expression profiles only, without relying on target information, have been proposed [26–28]. Bonnet’s model, called LeMoNe [26, 27], consists of two major steps; the generation of gene clusters based on a feature-sample co-clustering method, and the inference of regulatory modules from generated clusters and regulators based on probabilistically optimized trees. In the clustering approach of Bonnet’s method, gene regulatory modules underlying a specific cancer stage are not easily identified. Liu’s approach infers functional miRNA regulatory modules using Correspondence Latent Dirichlet Allocation (Corr-LDA) . The Corr-LDA based model requires discretized data. Since the Corr-LDA model infers probability distributions from latent variables, moreover, miRNAs can be annotated to any functional modules, while mRNAs are restricted to the miRNA-inferred modules.
Here we introduce a data-driven model for identifying cancer stage-specific interactions that reflects the high-order relationships between miRNAs and mRNAs (Figure 1). The proposed model is a hypergraph comprising numerous hyperedges, representing the multi-variable combinations corresponding to miRNAs and mRNAs. Each hyperedge is formally defined as cancer-stage specific statistical figures, and thus our model can deal with real-valued data without discretization. The weight of a hyperedge reflects the strength of the higher-order dependency among the variables of the hyperedge. Therefore, each hyperedge potentially behaves as a gene module. The model explicitly constructs a complex interaction network from many such gene modules. The model is learned by finding a highly-discriminate hypergraph structure from expression profiles using data relevant to a certain stage of prostate cancer.
The learning process involves the iteration of two learning phases; structure and parameter. The structure learning phase constructs a hypergraph of putative hyperedges for discovering potential gene interactions, from a huge feature space represented by the combinations of many miRNAs and mRNAs. Because the miRNA-mRNA interactions are intractably complex, we adopt an evolutionary strategy based on an information theoretic co-regulatory measure, called mutual information. This strategy is used to select genetic variables for generating hyperedges. During the parameter learning phase, the hypergraph is refined by updating the weights of the hyperedges (representing higher-order miRNA-mRNA modules). To this end, we employ a gradient descent method similar to the back-propagation algorithm for learning artificial neural networks. The learned model is then converted into a network structure reflecting the cooperative higher-order gene activities by connecting the extracted hyperedges. Data-driven learning allows the model to build new miRNA-mRNA interaction networks which display the hidden properties of primary and metastatic prostate cancers from a given dataset, which are not known a priori.
We construct cancer stage-specific miRNA-mRNA interaction networks reflecting their higher-order relationships using the MSKCC Prostate Oncogenome Project dataset  from the model. We demonstrate that the proposed model can build several biologically significant miRNA-mRNA interaction networks, including potential modules associated with primary and metastatic prostate cancer. Moreover, cancer-related miRNAs and genes dominate the identified interactions. Some of these interactions, such as hsa-miR-1, hsa-miR-133a, hsa-miR-143, hsa-miR-145, hsa-miR-221, hsa-miR-222, act as hubs in the constructed networks. We also confirm the biological relevance of the constructed networks through literature review and functional analysis.
Data and experimental settings
In this study, miRNA and mRNA expression profiles obtained from the MSKCC Prostate Oncogenome Project  were matched at three stages of prostate cancer. The dataset contains 373 miRNAs and 19,780 mRNAs from 27 normal, 98 primary and 13 metastatic stages. During preprocessing, sample-wise and feature-wise normalization was conducted, and miRNAs and mRNAs were separately normalized. The experimental parameter settings are listed in Table 1. The parameters are those yielding optimal performance in empirical experiments. A hypergraph can include hyperedges with different number of genetic variables but we fixed the number of variables for all hyperedges of a hypergraph in this study.
Parameter settings for experiments
# of miRNA
# of mRNA
# of modules
β in (5)
Epochs of structure learning
Epochs of parameter learning
η in (10)
κ in (11)
γ in (13)
Rmax , Rmin
Classification performance was evaluated using three standard classification models; support vector machines (SVMs) with the 2nd polynomial kernel and sequential minimal optimization (SMO), k-th nearest neighbor classifiers (k-NNs), and naïve Bayes classifiers (NBs) implemented in Weka . The MATLB algorithms lasso and elastic net (α=0.5) were also used. All results were averaged over 10 experiments. Figure 2 presents the classification accuracy of our model compared to other models. As revealed by the p-values of the t-test, the proposed hypergraph-based model competes on-par with SVMs and outperforms the k-NN, NB and Lasso-based methods. In addition, by comparing the results of 3–5 HG (a hypergraph model whose hyperedges consist of three miRNAs and five mRNAs) and 1–1 HG, we observe that higher-order relationships are more important for discriminating cancer stages than pair-wise relationships between a single miRNA and mRNA.
The proposed hypergraph-based learning method is evaluated on simulation data for verifying whether the method finds true solutions. The data consist of 500 instances with 7 variables whose mean is zero and the class label of each instance is determined as follows:
where xi and c(n) denote the i-th random variable and the class label of the n-th instance. Table 2 illustrates the classification accuracy and predefined modules in the learned model. The accuracy is averaged after 10 experiments by 10-fold cross validation, and each hypergraph includes 20 hyperedges with four variables. In Table 2, Module 1 and 2 means the number of case when there exist hyperedges involving a predefined-set 1 (x2, x3, x4) and 2 (x5, x6, x7) in a learned hypergraph. Because we conducted 10-fold cross validation, the maximum values of Module 1 and 2 are ten. Therefore, we indicate that our method can find true solutions from small combinatorial spaces, considering the accuracy and the number of found variable modules.
Verification result on the simulation dataset
Figure 3 presents two learning curves under various conditions of the structure (a) and the parameter (b) learning phases. As the measure for structure learning, we used mean multivariate mutual information (MMI) of all hyperedges in the model because the goal of the structure learning is to find the significant higher-order cancer-specific gene interaction modules, and an MMI is the measure reflecting the strength of interactions among genetic factors in the hyperedges considering the stage of cancer. On the other hand, classification accuracy is used as the measure for the parameter learning phase since the weight for each cancer stage is updated to minimize the error in the phase. Figure 3(a) presents the increase of mean MMI under various Rmin which is the minimum ratio of the hyperedges replaced in the iteration, and plays a role of the structure learning rate. We indicate that too large an Rmin causes low MMI by replacing too many hyperedges and too small an Rmin leads slow increase of the MMI from Figure 3(a). Figure 3(b) presents similar results to (a) with respect to the effect of learning rate γ.
Moreover, Figure 4 shows the classification accuracy according to the number of genetic factors in the hyperedges. The classification accuracy is the best when a hypergraph consists of hyperedges with three miRNAs and five mRNAs. We indicate that small number of genetic variables show worse performance because various processes of prostate cancer is influenced on the complex interactions among many features. Furthermore, the accuracy of the hypergraphs including hyperedges with more than ten genetic variables is low since the models consist of too specific information and thus have the low generalization property.
Table 3 and Figure 5 show that the proposed learning method can stably extract significant genetic factors despite its random selection approach. We define a measure as the number of appearance of a gene in the model, A(xi), for verifying the stability of the model as follows:
where xi denotes the i-th miRNA or mRNA, and Hm is the m-th learned model. δ(xi, Hm) is an indicator function and it returns one when xi appears at least once in Hm, otherwise zero. The proposed method is compared to randomly generated hypergraphs each comprising 200 hyperedges involving three miRNAs and five mRNAs. The results are derived from 100 models learned by 10 experiments of 10-fold cross validations, and 100 randomly generated hypergraphs. According to Figure 5(a), our method extracts significant miRNAs only, while almost all of the miRNAs are involved in random graphs. Moreover, whereas the learning method selects several significant mRNAs, all mRNAs appear at low frequency in the random graphs, as shown to Figure 5(b). The stability and reproducibility of the proposed model is evident from the high-frequency occurrence of high ranked miRNAs and mRNAs, indicating that certain genes persist in the models. Table 3 lists the miRNAs and mRNAs that appear frequently and rarely in 100 learned models and in randomly generated graphs. Given that several key genes decisively affect a specific cancer, we posit that the proposed model consistently selects essential factors, in contrast to a random selection.
Frequently and rarely appearing miRNAs and mRNAs in the 100 learned models
Constructed higher-order miRNA-mRNA interaction networks in prostate cancer
The miRNA-mRNA interaction network constructed from the proposed model is illustrated in Figure 6(a) and (b) for primary and metastatic prostate cancer respectively . The constructed interaction networks comprise putative miRNA-mRNA modules associated with each stage of prostate cancer, and reflect their higher-order relationships. The primary prostate cancer network includes 67 miRNAs and 233 mRNAs, while the metastatic prostate cancer network involves 65 miRNAs and 180 mRNAs.
Many of the miRNAs in the constructed networks have been significantly associated with prostate cancer in the literature, and are thus termed prostate cancer-related miRNAs . In addition, many of the genes in the constructed networks overlap with cancer-related genes, including transcription factors. To confirm this finding, we compiled a list of 496 oncogenes and 874 tumor suppressor genes from the Cancer Genes of Memorial Sloan-Kettering Cancer Center  and 1476 human transcription factors . We investigated cancer gene enrichment in the constructed interaction networks by hypergeometric test. As shown in Figure 7, most of the significant genes (p-value close to 0) in the constructed networks are overrepresented in the compiled list. This result unambiguously demonstrates that our model can build interaction networks of genetic factors associated with cancer processes.
Interestingly, the enriched hyperedges, and the expression levels of the miRNAs and mRNAs, differ considerably between the primary and metastatic networks. Up- and down-expressed miRNAs and genes are determined by their means at each stage. The red boxed miRNAs and genes are known to be associated with the various stages of prostate cancer [4–8, 42, 43]. The triangles rectangles, diamonds and circles denote miRNAs, oncogenes/ tumor suppressor genes, transcription factors, and other genes in the network, respectively.
Functional analysis of the constructed interaction networks
The constructed miRNA-mRNA interaction networks were validated by functional analyses based on a literature review and gene set analysis. As mentioned above, many of the miRNAs and mRNAs involved in the identified interactions are known indicators of prostate cancer [4–8]. In addition, the mRNAs comprise a portion of their predicted target genes , some of which have been experimentally validated. In particular, several miRNAs are known as ‘oncomiRs’ which function as oncogenes or tumor suppressors, including has-miR-1, -133a, -143, -145, -221, and −222 [45–48]. Many hyperedges in the constructed networks contain the above miRNAs as their components; these particular miRNAs also act as hubs in the networks.
Especially, hsa-miR-143 and hsa-miR-145 play a crucial role in metastatic prostate cancer, and are recognized as a clinicopathological signature of prostate cancer . Interaction modules involving hsa-miR-143 and −145 occupy a large portion of the networks constructed by our model. In addtion, the identified interactions in metastatic prostate cancer contain several experimentally confirmed targets of hsa-miR-143 and −145, including CLINT1, CDKN1A, IRS1, MAPK7, PPM1D and SOD2. Furthermore, hsa-miR-143 and −145 are expressed at low levels in the metastatic network, as has been experimentally validated .
Moreover, hsa-miR-200c emerges as a distinct miRNA in the network of primary prostate cancer. According to several studies, hsa-miR-200c overexpression inhibits metastasis prostate cancer, while aberrant regulation triggers the invasion and migration of prostate cancer at the post-transcriptional level .
Our model identified several transcription factors associated with prostate cancer metastasis, such as ETS2, HOXC4, STAT3, STAT5B, SOX4 and ZEB2. Among these, SOX4, STAT3 and STAT5B are known regulators of metastatic prostate cancer through the regulation of genes involved in miRNA processing, transcriptional regulation, and developmental pathways [50–52]. Indeed, SOX4 is directly regulated by hsa-miR-335 in cancer progression , while hsa-miR-125b coordinates STAT3 regulation in the proliferation of tumor cells [51, 53].
Interactions involving hsa-miR-29b/MMP2 and hsa-miR-335/SOX4 appear concurrently in the constructed metastatic network (Table 4). This finding is consistent with previous studies, in which-miR-29b and −335 were found to suppress tumor metastasis and migration by regulating MMP2 and SOX4, respectively [42, 54]. Interestingly, both of these interactions involve hsa-miR-143, which is closely linked to prostate cancer progression. Furthermore, the well-known cancer-associated genetic factors MMP2 and SOX4 co-emerged in the identified interactions. Although the interactions identified by our model have not been previously reported, they clearly reflect higher-order relationships between miRNAs and mRNAs. As such, they may signify unknown regulatory circuits in prostate cancer development and progression. This result suggests the utility of the proposed model in identifying undiscovered miRNA-mRNA interactions.
Examples of modules (hyperedges) in primary and metastatic prostate cancer
miRNAs [exp. levels: up (+), down (−)]
mRNAs [exp. levels: up (+), down (−)]
Primary prostate cancer
Metastatic prostate cancer
miRNAs and their predicted targets are given in bold font. The underlined genes are the cancer genes archived in the Memorial Sloan-Kettering Cancer Center.
To confirm the biological relevance of the constructed interaction networks, we analyzed the functional correlations among the network genes by canonical pathway analysis . The significant (low p-value) results of the analysis for the primary and metastatic prostate cancer networks are summarized in Table 5. Many of the enriched pathways are closely associated with prostate tumorigenesis and metastasis. In particular, the β-catenin degradation pathway, the Wnt/β-catenin pathway and the Wnt canonical pathway are associated with Wnt signaling, which regulates many genes implicated in prostate cancer. These pathways were identified as significant in the primary prostate cancer network. Deregulation of the Wnt-related pathway reportedly affects prostate cell proliferation and differentiation . Moreover, the annotated genes in the constructed network, such as APC, AXIN1, AKT2, CCND2, CAV1, TLE2 and TCF4, are essential regulatory components of these pathways in prostate cancer. ErbB-related pathways were identified in the metastatic network, including the ErbB network pathway, ErbB4 pathway, Her2 pathway, ErbB2/ErbB3 signaling pathway and the EGFR pathway, which are implicated in prostate cancer progression and metastasis [43, 57]. The FOXM1 pathway also regulates tumor metastasis (including that of prostate cancer) by stimulating the expression of several genes involved in the proliferation of tumor cells and cell cycle progression . The top-ranked pathway in the metastatic network is the MYC activation pathway. MYC reportedly promotes the metastatic phenotype by altering the epigenetic landscape of cancer cells, and is overexpressed in ~75% of advanced prostate cancer patients . Thus, the MYC pathway is a putative key feature of metastatic progression .
Canonical pathway analysis of the constructed interaction networks in primary and metastatic prostate cancer
Canonical pathway analysis
Primary prostate cancer
Pathways in cancer
Retinoic acid pathway
Aurora A pathway
Beta-catenin degradation pathway
Wnt canonical signaling pathway
Met pathway (signaling of HGF receptor)
P38-alpha/beta downstream pathway
Beta-catenin nuclear pathway
Aurora B pathway
EPHB forward pathway
P53 hypoxia pathway
MYC repress pathway
Progesterone mediated oocyte maturation
Rac CycD pathway (Ras and Rho protein on G1/S transition)
IL-6 (interleukin-6) pathway
FGFR2C ligand binding and activation
PDGFR-beta signaling pathway
Metastatic prostate cancer
MYC activate pathway
ErbB network pathway
KIT receptor signaling pathway
Pathways in cancer
Her2 pathway (ErbB2 in signal transduction and oncology)
Yap1 and Wwtr1/Taz stimulated gene expression
Smooth Muscle Contraction
IL-6 signaling pathway
ErbB2/ErbB3 signaling pathway
Integrin signaling pathway
HDAC class I pathway
The proposed hypergraph-based model characterizes higher-order interactions among heterogeneous genetic factors from archived data. Human cancers are typically caused by the modular control of multiple genetic factors. By analyzing gene relationships at higher-order levels, thus, we can better understand the behavior of complex cancer mechanisms. Moreover, the cooperative activities and the combinatorial regulations governed by miRNAs and mRNAs are largely unknown. We have demonstrated that higher-order relationships discriminate between specific cancer stages more precisely than pair-wise analyzes of single miRNA and mRNA interactions. From this viewpoint, we can construct a more complete interaction network consisting of putative biologically significant miRNA-mRNA modules.
In addition, our method focuses on discovering potential interactions in unknown miRNA-mRNA regulatory circuits related to specific cancer stages without the known biological information [60, 61]. The proposed model finds statistically significant gene modules from given expression profiles using a data-driven approach with co-regulatory measure (mutual information). However, a similar hypergraph structure could be readily constructed from other types of quantitative biological information, such as miRNA-target information and gene sequence similarity values. Furthermore, the hypergraph-based model more flexibly represents miRNA-RNA interactions than other methods (which assume that the expression states of miRNAs and mRNAs are linearly proportional to each other), because it isolates significant modules from the statistical co-expressed pattern among genes at a higher-order level.
The proposed hypergraph-based model is similar to Bonnet’s et al.[26, 27] and Li et al., where higher-order relationships governed by miRNA-mRNA interactions are inferred solely from expression profiles. Bonnet’s method is based on a clustering approach, it cannot readily infer gene regulatory modules at a specific cancer stage. In contrast to Bonnet’s method, our method explicitly considers the sample status, (the primary or metastatic state of prostate cancer), from which it constructs cancer stage-specific networks. Liu’s approach is based on Corr-LDA, which requires that data are discretized. By contrast, our method uses intact real-valued data, thus preventing the information loss caused by the discretization.
Furthermore, the proposed model finds the true solution in a small subset of the features, because the problem space is small enough to search exhaustively. Also, unlike other models, our model can efficiently handle the very high-dimensional data required for complex higher-order interactions among features. However, the limitation of the proposed hypergraph-based model emerges at small sample sizes. If the data are few, the reliability of the mean and covariance defined in a hyperedge is reduced.
We have proposed a hypergraph-based model consisting of higher-order miRNA-mRNA modules, which allows the construction of biologically meaningful interaction networks associated with specific cancer stages. For identifying potential significant interactions and refining model performance, we introduced a two-phase learning approach comprising structure and parameter learning. Finally, we constructed cancer stage-specific interaction networks reflecting higher-order miRNA and mRNA relationships by converting the hypergraph structure into an ordinary graph.
We constructed higher-order miRNA-mRNA interaction networks associated with the specific stage of prostate cancer from a matched dataset using the proposed model. The performance of the proposed model is similar to that of SVMs and superior to other classification models (outperforming them by approximately 6–10%). More importantly, our model can construct carcinogenic miRNA-hubbed networks that characterize primary and metastatic prostate cancer. Furthermore, we demonstrated that a large proportion of the miRNAs and mRNAs identified in the constructed interaction networks are indeed involved in prostate cancer progression and development. The proposed hypergraph-based model therefore presents as an alternative method for discovering potential gene regulatory circuits. Such discoveries will greatly assist our understanding of cancer pathogenesis.
A hypergraph-based model characterizes complex interactions among many genetic factors using hypergraph structures. A hypergraph generalizes the edge concept to a hyperedge by which more than two variables can be connected simultaneously [62, 63]. As such, it is suitable for representing higher-order relationships among heterogeneous features (e.g. miRNAs and mRNAs). In our model, a hyperedge contains two or more variables corresponding to miRNAs and mRNAs, weighted by the strength of the higher-order dependency among its elements for each class (where the class denotes a specific cancer stage). Thus, each hyperedge implies a set of miRNA-mRNA modules associated with a certain stage of cancer. The proposed model therefore facilitates the construction of higher-order miRNA-mRNA interaction networks among a population of candidate gene modules related to a specific cancer stage.
A hypergraph-based model H is formally defined as a triple H = (X, Z, E) where X, Z, and E denote the sets of miRNAs, mRNAs, and hyperedges, respectively. A hyperedge is represented by a set of statistical values, including mean and covariance for the class label corresponding to a cancer stage. The mean gene expression values differ widely among the class labels, implying that gene expression depends on cancer progression, as shown in Figure 8. The hyperedge approach enhances the discriminative capability by combining miRNAs and mRNAs (Figure 8). Given an expression dataset with N instances , where x(n) and z(n) are real-valued vectors of miRNA and mRNA expressions in the n-th instance, and y is an element of a cancer stage set Y, the i-th hyperedge ei contains the mean vectors and the covariance of its miRNAs and mRNAs for the given cancer stage:
where and denote the means calculated from the expression profiles of the j-th miRNA and the k-th mRNA, respectively, in the i-th hyperedge (whose elements comprise l miRNA and m mRNAs). l and m are called the degrees of miRNA and mRNA of the hyperedge, respectively. By the definition of a hyperedge, each hyperedge has |Y| mean vector /covariance pairs, and |Y| weights. The hypergraph-based model is considered as a population of hyperedges. Given a gene expression profile (x, z), the cancer stage of the profile is classified as y*, for which the summation of the expected values (the products of the hyperedge weight and the probability of (x, z) matching the hyperedge), is highest among the elements of Y. “(x, z) matches ei|y” means that (x, z) has similar expression values to ones of the i-th hyperedge with respect to the genetic variables involved in ei|y at cancer stage y, and we introduce a Gaussian kernel into the hyperedge to calculate the matching probability of (x, z) and ei|y, P(u=1|x, z, ei|y). The matching probability is calculated by the normalized subdimensional distance between ei|y and (x, z):
where u=1 denotes that (x, z) matches ei|y , and are the standard deviations of xij and zik (the j-th miRNA and k-th mRNA, respectively) in the i-th hyperedge for a given y, and β is a constant for adjusting the probability. Larger β implies smaller matching probability, and therefore a smaller number of hyperedges influence on classifying the data. Specifically, the cancer stage y* of (x, z) is computed as follows:
1. Calculate cy ', the sum of the expected values for each y ' in Y over all hyperedges of H:
where |H| denotes the number of hyperedges and w(ei|y) is the weight of ei|y, explained in the next subsection.
2. Predict the cancer stage as y*:
In terms of distance-based connectionist models, our model is related to radial basis function networks (RBFNs) . Whereas RBFNs use kernelized distance for all variables, the proposed hypergraph model uses the probability derived from the subdimensional distance on the projected space corresponding to each hyperedge. Unlike RBFNs, therefore, the hypergraph-based model can detect embedded subpatterns reflecting higher-order relationships among the components. Because these embedded subpatterns influence the classification, we can intuitively analyze the complex interactions of genetic factors that contribute to classifying a specific cancer stage.
Learning hypergraph-based models
The proposed model learns by finding a hypergraph structure with high discriminative capability at a specific cancer stage. This is achieved by maximizing the conditional likelihood for a model H and the gene expression profiles and a log function is adopted for convenience. To minimize the error of classifying the cancer stage, ED,H , the log conditional likelihood is maximized by least mean square criteria using (7) and a sigmoidal function:
where (x(n), z(n)) denotes the n-th miRNA-mRNA expression and y(n) is the cancer stage of the example. is the label predicted by H and δ(y(n), ) is an indicator function, equal to 1 if y(n) equals , and 0 otherwise. To enhance the classification accuracy, it is essential that the population comprises hyperedges with high discriminative capability, and the hyperedge weights must be refined to minimize (9) in the generated hypergraph.
To meet these requirements, the learning iterates two phases: structure learning and parameter learning. The structure learning constructs a hypergraph from hyperedges that identify potential miRNA-mRNA modules. The weights of the hyperedges are updated to minimize the classification error of the generated gene module population during the parameter learning phase. Because the hypergraph-based model represents a huge combinatorial feature space (size 2|x|+|z|) of many miRNAs and mRNAs, exhaustively searching for the optimal population is infeasible. Instead we adopt an evolutionary learning method based on information-theoretic criteria to generate putative hyperedges for the structure learning.
We assume that a hyperedge consisting of strongly interactive miRNAs and mRNAs is highly discriminative for classification in this study. Mutual information is used as a co-regulatory measuring criterion for efficiently selecting genes for hyperedge generation. Mutual information (MI) is an information-theoretic measure that specifies the degree of conditional independency between two random variables. When a genetic factor more strongly determines the cancer stage, the MI between the gene and the cancer stage is increased. A hyperedge is generated by probabilistically selecting miRNAs and mRNAs, and the MI between each gene and the class label determines the probability of selecting the genes. The probability PI(Xi) of selecting the i-th gene Xi is defined such that miRNAs or mRNAs with high MI are selected more frequently:
where I(Xi; Y) denotes the MI between the i-th genetic factor and the cancer stage, and η is a nonnegative constant that regularizes the influence of MIs on the gene selection. When η is zero, all variables may be selected with equal probability. Once the hyperedges have been generated, the mean vectors and covariance of the hyperedges are calculated from the training dataset. To identify putative strongly-interacting miRNA-mRNA modules, the initial weight of the i-th hyperedge is computed using the variances of each genetic factor and the multivariate MI  among all variables, including the class label involved in the hyperedge. A gene with a particular mean expression value but small variance likely possesses higher discriminative capability than one with larger variance. Moreover, by the definition of MI, large multivariate MI implies more relationships among the genes. Thus the initial weight of a hyperedge is defined as
where k is the number of variables of ei and κ denotes the ratio of the variance to MI.
In the parameter learning phase, the weights of the hyperedges are updated using the gradient descent method for all training data. The aim is to minimize the error in terms of the classification probability in (9) and the matching probability in (5):
where is the real cancer stage of a miRNA-mRNA expression sample, and t and γ denote the epoch number in the parameter learning and the parameter learning rate, respectively. The epoch is the number of weight updates for the built hypergraph during parameter learning, and γ controls the extent of weight change during parameter learning. Thus, the weight becomes high when the hyperedge consists of miRNAs and mRNAs with strong higher-order interactions and when the variances of the gene variables are small at all cancer stages. Following parameter learning, low weighted hyperedges are removed from the population, and the next structure learning step is performed. To prevent the removal of highly discriminating hyperedges, the number of replaced hyperedges decreases to a specific value as the iterations proceed, as follows:
where t is the iteration number of the structure learning phase, and Rmax and Rmin denote the maximum and minimum number of replaced hyperedges, respectively. Therefore, the number of replaced hyperedges consecutively decreases as the structure learning proceeds, while high-discriminative modules are preserved. The algorithm for learning the hypergraph-based model is presented in Figure 9.
Representing interaction networks from hypergraphs
We construct a higher-order miRNA-mRNA interaction network at a specific cancer stage from the learned model. When analyzing complex biological networks based on graph mining, frequently occurring subgraphs in the networks are generally regarded as important building blocks which are merged to create the functional network [66–69]. Since a high-weight hyperedge corresponds to a significant subgraph reflecting a higher-order relationship among genetic variables, the interaction network is constructed by connecting cliques sharing common genes. A hyperedge is assigned separate weights for each cancer stage and it is merged into the graph of the highest weighted cancer stage. Formally, a cancer-stage y ' and a cancer stage-specific interaction network G|y ' =(V, E), where V and E denote a vertex set and an edge set, respectively, is constructed by merging the hyperedges as follows (where y ' is the class label with the largest weight value):
and Ci is a clique corresponding to the i-th hyperedge ei (Figure 10). This dividing and remerging approach enables the constructed interaction networks to be easy-to-visualized without impairing the higher-order property of the model since the weight of edges in the constructed networks are derived from the hyperedge weights reflecting the strength of the higher-order interaction.
This work was supported by the National Research Foundation (NRF) Grant funded by the Korea government (MSIP) (NRF-2010-0017734, NRF-2013M3B5A2035921, and the Bio & Medical Technology Development Program, No.2012M3A9D1054622), supported by KEIT grant funded by the Korea government (MKE) (KEIT-10035348 and KEIT- 10044009), supported by AOARD R&D grant funded by AFORS (124087).
Interdisciplinary Program in Bioinformatics, Seoul National University
Center for Biointelligence Technology (CBIT), Seoul National University
School of Computer Science and Engineering, Seoul National University
Jemal A, Siegel R, Xu J, Ward E: Cancer statistics, 2010.CA Cancer J Clin 2010,60(5):277–300.PubMedView Article
Hartwell LH, Hopfield JJ, Leibler S, Murray AW: From molecular to modular cell biology.Nature 1999, 402:47–52.View Article
Klamt S, Haus U, Theis F: Hypergraphs and cellular networks.PLoS Comput Biol 2009,5(5):e1000385.PubMedView Article
Coppola V, Maria RD, Bonci D: MicroRNAs and prostate cancer.Endocr Relat Cancer 2010, 17:F1-F17.PubMedView Article
Pang Y, Young CY, Yuan H: MicroRNAs and prostate cancer.Acta Biochim Biophys Sin 2010, 42:363–69.PubMedView Article
Gordanpour A, Nam RK, Sugar L, Seth A: MicroRNAs in prostate cancer: from biomarkers to molecularly-based therapeutics.Prostate Cancer Prostatic Dis 2012, 15:314–319.PubMedView Article
Watahiki A, Wang Y, Morris J, Dennis K, O'Dwyer HM, Gleave M, Gout PW, Wang Y: MicroRNAs associated with metastatic prostate cancer.PLoS One 2011,6(9):e24950.PubMedView Article
Schaefer A, Jung M, Mollenkopf HJ, Wagner I, Stephan C, Jentzmik F, Miller K, Lein M, Kristiansen G, Jung K: Diagnostic and prognostic implications of microRNA profiling in prostate carcinoma.Int J Cancer 2010, 126:1166–1176.PubMed
Hornberg JJ, Bruggeman FJ, Westerhoff HV, Lankelma J: Cancer: a systems biology disease.Biosystems 2006, 83:81–90.PubMedView Article
Wang E, Lenferink A, O'Connor-McCourt M: Cancer systems biology: exploring cancer-associated genes on cellular networks.Cell Mol Life Sci 2007,64(14):1752–1762.PubMedView Article
Liu ZP, Wang Y, Zhang XS, Chen L: Network-based analysis of complex diseases.IET Syst Biol 2012,6(1):22–33.PubMedView Article
Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N: Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data.Nat Genet 2003,34(2):166–76.PubMedView Article
Bar-Joseph Z, Gerber GK, Lee TI, Rinaldi NJ, Yoo JY, Robert F, Gordon DB, Fraenkel E, Jaakkola TS, Young RA, Gifford DK: Computational discovery of gene modules and regulatory networks.Nat Biotechnol 2003,21(11):1337–42.PubMedView Article
Lee WP, Tzou WS: Computational methods for discovering gene networks from expression data.Brief Bioinform 2009,10(4):408–423.PubMed
Wang E: RNA technologies in cardiovascular medicine and research. Edited by: Erdmann VA, Poller W, Barciszewski J. Germany: Springer; 2008:69–86.View Article
Bandyopadhyay S, Mitra R, Maulik U, Zhang MQ: Development of the human cancer microRNA network.Silence 2010,1(1):6.PubMedView Article
Volinia S, Galasso M, Costinean S, Tagliavini L, Gamberoni G, Drusco A, Marchesini J, Mascellani N, Sana ME, Jarour RA, Desponts C, Teitell M, Baffa R, Aqeilan R, Iorio V, Taccioli C, Garzon R, Leva GD, Fabbri M, Catozzi M, Previati M, Ambs S, Palumbo T, Garofalo M, Veronese A, Bottoni A, Gasparini P, Harris C, Visone R, Pekarsky P, Chapelle A, Bloomston M, Dillhoff M, Rassenti LZ, Kipps TJ, Huebner K, Pichiorri F, Lenze D, Cairo S, Buendia M, Pineau P, Dejean A, Zanesi N, Rossi S, Calin GA, Liu C, Palatini J, Negrini M, Vecchione A, Rosenberg A, Croce CM: Reprogramming of miRNA networks in cancer and leukemia.Genome Res 2010, 20:589–599.PubMedView Article
Satoh J, Tabunoki H: Comprehensive analysis of human microRNA target networks.BioData Mining 2011, 4:17.PubMedView Article
Liu B, Li J, Cairns MJ: Identifying miRNAs, targets and functions.Brief Bioinform 2012. doi:10.1093/bib/bbs075.
Muniategui A, Pey J, Planes FJ, Rubio A: Joint analysis of miRNA and mRNA expression data.Brief Bioinform 2012. doi:10.1093/bib/bbs028.
Yoon S, De Micheli G: Prediction of regulatory modules comprising microRNAs and target genes.Bioinformatics 2005,21(Suppl. 2):ii93–100.PubMedView Article
Huang J, Morris Q, Frey B: Detecting microRNA targets by linking sequence, microRNA and gene expression data. 2006, 114–129.
[Research in Computational Molecular Biology]
Joung JG, Hwang KB, Nam JW, Kim SJ, Zhang BT: Discovery of microRNA-mRNA modules via population-based probabilistic learning.Bioinformatics 2007,23(9):1141–1147.PubMedView Article
Joung JG, Fei Z: Identification of microRNA regulatory modules in Arabidopsis via a probabilistic graphical model.Bioinformatics 2009,25(3):387–393.PubMedView Article
Liu B, Li J, Tsykin A, Liu L, Gaur AB, Goodall GJ: Exploring complex miRNA-mRNA interactions with Bayesian networks by splitting-averaging strategy.BMC Bioinformatics 2009,10(1):408.PubMedView Article
Bonnet E, Michoel T, Van de Peer Y: Prediction of a gene regulatory network linked to prostate cancer from gene expression, microRNA and clinical data.Bioinformatics 2010,26(18):638–644.View Article
Bonnet E, Tatari M, Joshi A, Michoel T, Marchal K, Berx G, Van de Peer Y: Module network inference from a cancer gene expression data set identifies microRNA regulated modules.PLoS One 2010,5(4):e10162.PubMedView Article
Liu B, Liu L, Tsykin A, Goodall GJ, Green JE, Zhu M, Kim CH, Li J: Identifying functional miRNA-mRNA regulatory modules with correspondence latent Dirichlet allocation.Bioinformatics 2010,26(24):3105–3111.PubMedView Article
Tran D, Satou K, Ho T: Finding microRNA regulatory modules in human genome using rule induction.BMC Bioinformatics 2008,9(Suppl. 12):S5.PubMedView Article
Liu B, Li J, Tsykin A: Discovery of functional miRNA-mRNA regulatory modules with computational methods.J Biomed Inform 2009,42(4):685–691.PubMedView Article
Zhang S, Li Q, Liu J, Zhou XJ: A novel computational framework for simultaneous integration of multiple types of genomic data to identify microRNA-gene regulatory modules.Bioinformatics 2011,27(13):i401–409.PubMedView Article
Peng X, Li Y, Walters KA, Rosenzweig ER, Lederer SL, Aicher LD, Proll S, Katze MG: Computational identification of hepatitis C virus associated microRNA-mRNA regulatory modules in human livers.BMC Genomics 2009,10(1):373.PubMedView Article
Nunez-Iglesias J, Liu CC, Morgan TE, Finch CE, Zhou XJ: Joint genome-wide profiling of miRNA and mRNA expression in Alzheimer’s disease cortex reveals altered miRNA regulation.PLoS One 2010,5(2):e8898.PubMedView Article
Lu Y, Zhou Y, Qu W, Deng M, Zhang C: A Lasso regression model for the construction of microRNA-target regulatory networks.Bioinformatics 2011,27(17):2406–2413.PubMedView Article
Zhang W, Edwards A, Fan W, Flemington EK, Zhang K: MiRNA-mRNA correlation-network modules human prostate cancer and the differences between primary and metastatic tumor subtypes.PLoS One 2012,7(6):e40130.PubMedView Article
Taylor BS, Schultz N, Hieronymus H, Gopalan A, Xiao Y, Carver BS, Arora VK, Kaushik P, Cerami E, Reva B, Antipin Y, Mitsiades N, Landers T, Dolgalev I, Major JE, Wilson M, Socci ND, Lash AE, Heguy A, Eastham JA, Scher HI, Reuter VE, Scardino PT, Sander C, Sawyers CL, Gerald WL: Integrative genomic profiling of human prostate cancer.Cancer Cell 2010, 18:11–22.PubMedView Article
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH: The WEKA data mining software: an update.SIGKDD Explor 2009,11(1):10–18.View Article
Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T: Cytoscape 2.8: new features for data integration and network visualization.Bioinformatics 2011,27(3):431–432.PubMedView Article
Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, Li M, Wang G, Liu Y: miR2Disease: a manually curated database for microRNA deregulation in human disease.Nucleic Acids Res 2009, 37:D98–104.PubMedView Article
Higgins ME, Claremont M, Major JE, Sander C, Lash AE: CancerGenes: a gene selection resource for cancer genome projects.Nucleic Acids Res 2007,35(Database issue):D721-D726.PubMedView Article
Zhang HM, Chen H, Liu W, Liu H, Gong J, Wang H, Guo AY: AnimalTFDB: a comprehensive animal transcription factor database.Nucleic Acids Res 2012,40(Database issue):D144-D149.PubMedView Article
Triulzi T, Iorio MV, Tagliabue E, Casalini P: MicroRNA: new players in metastatic process. In Oncogene and Cancer - From Bench to Clinic. Edited by: Siregar Y. InTech; 2013:391–414.
Dasgupta S, Srinidhi S, Vishwanatha JK: Oncogenic activation in prostate cancer progression and metastasis: molecular insights and future challenges.J Carcinog 2012,11(1):4.PubMedView Article
Betel D, Koppal A, Agius P, Sander C, Leslie C: mirSVR predicted target site scoring method: Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites.Genome Biol 2010, 11:R90.PubMedView Article
Esquela-Kerscher A, Slack FJ: Oncomirs: microRNAs with a role in cancer.Nat Rev Cancer 2006, 6:259–269.PubMedView Article
Kojima S, Chiyomaru T, Kawakami K, Yoshino H, Enokida H, Nohata N, Fuse M, Ichikawa T, Naya Y, Nakagawa M, Seki N: Tumour suppressors miR-1 and miR-133a target the oncogenic function of purine nucleoside phosphorylase (PNP) in prostate cancer.Br J Cancer 2012,106(2):405–413.PubMedView Article
Peng X, Guo W, Liu T, Wang X, Tu X, Xiong D, Chen S, Lai Y, Du H, Chen G, Liu G, Tang Y, Huang S, Zou X: Identification of miRs-143 and −145 that is associated with bone metastasis of prostate cancer and involved in the regulation of EMT.PLoS One 2011,6(5):e20341.PubMedView Article
Galardi S, Mercatelli N, Giorda E, Massalini S, Frajese GV, Ciafrè SA, Farace MG: miR-221 and miR-222 expression affects the proliferation potential of human prostate carcinoma cell lines by targeting p27Kip1.J Biol Chem 2007,282(32):23716–23724.PubMedView Article
Vrba L, Jensen TJ, Garbe JC, Heimark RL, Cress AE, Dickinson S, Stampfer MR, Futscher BW: Role for DNA methylation in the regulation of miR-200c and miR-141 expression in normal and cancer cells.PLoS One 2010,5(1):e8697.PubMedView Article
Scharer CD, McCabe CD, Ali-Seyed M, Berger MF, Bulyk ML, Moreno CS: Genome-wide promoter analysis of the SOX4 transcriptional network in prostate cancer cells.Cancer Res 2009,69(2):709–717.PubMedView Article
Gu L, Vogiatzi P, Puhr M, Dagvadorj A, Lutz J, Ryder A, Addya S, Fortina P, Cooper C, Leiby B, Dasgupta A, Hyslop T, Bubendorf L, Alanen K, Mirtti T, Nevalainen MT: STAT5 promotes metastatic behavior of human prostate cancer cells in vitro and in vivo.Endocr Relat Cancer 2010,17(2):481–493.PubMedView Article
Haghikia A, Hoch M, Stapel B, Hilfiker-Kleiner D: STAT3 regulation of and by microRNAs in development and disease.JAK STAT 2012,1(3):143–105.View Article
Steele R, Mott JL, Ray RB: MBP-1 upregulates miR-29b that represses Mcl-1, collagens, and matrix-metalloproteinase-2 in prostate cancer cells.Genes Cancer 2010,1(4):381–387.PubMedView Article
Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P, Mesirov JP: Molecular signatures database (MSigDB) 3.0.Bioinformatics 2011,27(12):1739–1740.PubMedView Article
Schwartz S Jr, Caceres C, Morote J, De Torres I, Rodriguez-Vallejo JM, Gonzalez J, Reventos J: Gains of the relative genomic content of ErbB1 and ErbB2 in prostate carcinoma and their association with metastasis.Int J Oncol 1999,14(2):367–371.PubMed
Raychaudhuri P, Park HJ: FoxM1: a master regulator of tumor metastasis.Cancer Res 2011,71(13):4329–4333.PubMedView Article
Wolfer A, Ramaswamy S: MYC and metastasis.Cancer Res 2011,71(6):2034–2037.PubMedView Article
Ivan A, Halfon M, Sinha S: Computational discovery of cis-regulatory modules in Drosophila without prior knowledge of motifs.Genome Biol 2008,9(1):R22.PubMedView Article
Zhang BT: Hypernetworks: A molecular evolutionary architecture for cognitive learning and memory.IEEE Computational Intelligence Magazine 2008,3(3):49–63.View Article
Kim SJ, Ha JW, Zhang BT: Evolutionary layered hypernetworks for identifying microRNA-mRNA regulatory modules. (WCCI-CEC 2010; 2010:2299–2306.
[Proceedings of IEEE World Congress Computational Intelligence]
Buhmann MD: Radial basis functions: theory and implementations. Cambridge University Press; 2003.
[Cambridge Monographs on Applied and Computational Mathematics (Vol.12)]
Kraskov A, Stögbauer H, Grassberger P: Estimating mutual information.Phys Rev E 2004,69(6):066138.View Article
Hu H, Yan X, Huang Y, Han J, Zhou XJ: Mining coherent dense subgraphs across massive biological networks for functional discovery.Bioinformatics 2005,21(suppl 1):i213-i221.PubMedView Article
Mason O, Verwoerd M: Graph theory and networks in biology.IET Syst Biol 2007,1(2):89–119.PubMedView Article
Yan X, Mehan MR, Huang Y, Waterman MS, Yu PS, Zhou XZ: A graph-based approach to systematically reconstruct human transcriptional regulatory modules.Bioinformatics 2007,23(13):i577-i586.PubMedView Article
Ramadan E, Perincheri P, Tuck D: A hyper-graph approach for analyzing transcriptional networks in breast cance. 2010, 556–562.
[Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.