 Research article
 Open Access
Constructing higher-order miRNA-mRNA interaction networks in prostate cancer via hypergraph-based learning
BMC Systems Biology volume 7, Article number: 47 (2013)
Abstract
Background
Dysregulation of genetic factors such as microRNAs (miRNAs) and mRNAs has been widely shown to be associated with cancer progression and development. In particular, miRNAs and mRNAs cooperate to affect biological processes, including tumorigenesis. The complexity of miRNA-mRNA interactions presents a major barrier to identifying their co-regulatory roles and functional effects. By computationally modeling these complex relationships, it may therefore be possible to infer the gene interaction networks underlying complicated biological processes.
Results
We propose a data-driven, hypergraph structural method for constructing higher-order miRNA-mRNA interaction networks from cancer genomic profiles. The proposed model explicitly characterizes higher-order relationships among genetic factors, from which cooperative gene activities in biological processes may be identified. The model is learned by iterating structure learning and parameter learning. Structure learning efficiently constructs a hypergraph by generating putative hyperedges that represent complex miRNA-mRNA modules, using an evolutionary method based on information-theoretic criteria. In the parameter learning phase, the constructed hypergraph is refined by updating the hyperedge weights with gradient descent. From the learned model, we produce biologically relevant higher-order interaction networks showing the properties of primary and metastatic prostate cancer, as candidate miRNA-mRNA regulatory circuits.
Conclusions
Our approach focuses on potential cancer-specific interactions reflecting higher-order relationships between miRNAs and mRNAs inferred from expression profiles. The constructed miRNA-mRNA interaction networks show oncogenic or tumor-suppressive characteristics, which are known to be directly associated with prostate cancer progression. The hypergraph-based model can therefore assist hypothesis formulation about the molecular pathogenesis of cancer.
Background
Prostate cancer is a common disease in men, induced by complex interactions among various genetic factors [1]. As such, the pathological causes of this disease are not easily identified. Recent human cancer studies have demonstrated that most cancer regulation involves modular organization and combinatorial control by multiple genetic factors. This module-based view of higher-order relationships can provide new insights into the behavior of complex biological systems [2, 3].
Recently, miRNAs have attracted great interest as diagnostic and therapeutic signatures of prostate cancer [4–8]. They play important roles in cancer pathogenesis, including disease onset, progression, and metastasis, by regulating the stability and translation efficiency of their target mRNAs. The functional relationships between miRNAs and mRNAs should therefore be elucidated to identify key regulatory circuits involved in cancer. However, the complexity of their interactions makes the analysis of higher-order miRNA-mRNA relationships a challenging problem.
Modern cancer research has progressed from identifying biomarkers to systematically exploring gene interactions [9–11]. Many studies have focused on the interaction of genetic components at the systems level. Computational methods that analyze gene regulatory interactions on a genome-wide scale from high-throughput biological data have flourished in recent decades [12–14]. In addition, systems biology approaches have been proposed to build the miRNA regulatory networks underlying the development of many human diseases [15–17]. Moreover, miRNA regulatory mechanisms are now thought to be inferable from miRNA-mRNA interactions [18–20]. Several studies have attempted to identify groups of coherent miRNAs and mRNAs that cooperate in biological processes from heterogeneous data sources via various computational approaches, including probabilistic methods [21–28], rule-based learning [29, 30], matrix factorization [31], and statistical methods [32–35]. These approaches have simplified complex biological mechanisms by systematically analyzing the relationships between genetic elements at the genome level. Typically, however, many previous studies assume pairwise relationships between only two factors [21, 30–35]. This restriction is ill-suited to complex genetic interactions: biological regulation is controlled by the interplay of multiple genetic components, so information is lost under the pairwise assumption. Many studies have also investigated miRNA-mRNA regulatory interactions using biological information, especially miRNA-target information [21–25, 29–33]. Such information reduces the number of false positives because it provides the predictive model with prior knowledge. However, unknown or hidden interactions not covered by the prior knowledge may be difficult to identify this way. To avoid this problem, probabilistic models that infer miRNA-mRNA modules from expression profiles alone, without relying on target information, have been proposed [26–28].
Bonnet’s model, called LeMoNe [26, 27], consists of two major steps: the generation of gene clusters by a feature-sample co-clustering method, and the inference of regulatory modules from the generated clusters and regulators using probabilistically optimized trees. Because Bonnet’s method relies on clustering, gene regulatory modules underlying a specific cancer stage are not easily identified. Liu’s approach infers functional miRNA regulatory modules using Correspondence Latent Dirichlet Allocation (Corr-LDA) [28]. The Corr-LDA-based model requires discretized data. Moreover, since the Corr-LDA model infers probability distributions from latent variables, miRNAs can be annotated to any functional module, whereas mRNAs are restricted to the miRNA-inferred modules.
Here we introduce a data-driven model for identifying cancer stage-specific interactions that reflect the higher-order relationships between miRNAs and mRNAs (Figure 1). The proposed model is a hypergraph comprising numerous hyperedges, each representing a multivariable combination of miRNAs and mRNAs. Each hyperedge is formally defined by cancer stage-specific statistics, so our model can handle real-valued data without discretization. The weight of a hyperedge reflects the strength of the higher-order dependency among its variables, so each hyperedge can potentially behave as a gene module. The model explicitly constructs a complex interaction network from many such gene modules. The model is learned by finding a highly discriminative hypergraph structure from expression profiles, using data relevant to a given stage of prostate cancer.
The learning process iterates two phases: structure learning and parameter learning. The structure learning phase constructs a hypergraph of putative hyperedges for discovering potential gene interactions from the huge feature space spanned by combinations of many miRNAs and mRNAs. Because miRNA-mRNA interactions are intractably complex, we adopt an evolutionary strategy based on an information-theoretic co-regulatory measure, mutual information, which is used to select the genetic variables that form each hyperedge. During the parameter learning phase, the hypergraph is refined by updating the weights of the hyperedges (which represent higher-order miRNA-mRNA modules). To this end, we employ a gradient descent method similar to the backpropagation algorithm used to train artificial neural networks. The learned model is then converted into a network structure reflecting cooperative higher-order gene activities by connecting the extracted hyperedges. Data-driven learning allows the model to build new miRNA-mRNA interaction networks from a given dataset that display hidden properties of primary and metastatic prostate cancer not known a priori.
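How the extracted hyperedges are connected into a network is not specified in detail here. The sketch below assumes the simplest option, a clique expansion: every pair of genes that share a hyperedge becomes an undirected edge, weighted by the number of hyperedges containing that pair. The function name `clique_expand` and the gene labels are hypothetical, and whether the published networks keep miRNA-miRNA and mRNA-mRNA edges is an open assumption.

```python
from collections import Counter
from itertools import combinations

def clique_expand(hyperedges):
    """Hypothetical hypergraph-to-graph conversion (clique expansion).
    Each hyperedge is a set of gene names; every pair inside it becomes an edge.
    Returns a Counter mapping (gene_a, gene_b) -> number of hyperedges sharing that pair."""
    edges = Counter()
    for members in hyperedges:
        for a, b in combinations(sorted(set(members)), 2):  # sorted so each pair has one key
            edges[(a, b)] += 1
    return edges

# toy usage: "m*" stand for miRNAs and "g*" for mRNAs (made-up labels)
hyperedges = [{"m1", "m2", "g1"}, {"m1", "g1", "g2"}]
network = clique_expand(hyperedges)
```

Counting shared pairs gives a simple edge weight; pairs that co-occur in many hyperedges then stand out as candidate hubs.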
Using the model, we construct cancer stage-specific miRNA-mRNA interaction networks reflecting their higher-order relationships from the MSKCC Prostate Oncogenome Project dataset [36]. We demonstrate that the proposed model can build several biologically significant miRNA-mRNA interaction networks, including potential modules associated with primary and metastatic prostate cancer. Moreover, cancer-related miRNAs and genes dominate the identified interactions. Some of these miRNAs, such as hsa-miR-1, hsa-miR-133a, hsa-miR-143, hsa-miR-145, hsa-miR-221 and hsa-miR-222, act as hubs in the constructed networks. We also confirm the biological relevance of the constructed networks through literature review and functional analysis.
Results
Data and experimental settings
In this study, matched miRNA and mRNA expression profiles covering three stages of prostate cancer were obtained from the MSKCC Prostate Oncogenome Project [36]. The dataset contains 373 miRNAs and 19,780 mRNAs from 27 normal, 98 primary and 13 metastatic samples. During preprocessing, sample-wise and feature-wise normalization was conducted, with miRNAs and mRNAs normalized separately. The experimental parameter settings, listed in Table 1, are those that yielded the best performance in empirical experiments. A hypergraph can include hyperedges with different numbers of genetic variables, but in this study we fixed the number of variables for all hyperedges of a hypergraph.
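The type of normalization is not specified beyond being sample-wise and feature-wise. As a minimal sketch, assuming z-scoring along each sample and then along each feature (both the method and the order are assumptions), the miRNA and mRNA blocks could be standardized independently as follows; `zscore` and `normalize` are illustrative names.

```python
import statistics

def zscore(values):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [0.0] * len(values) if sd == 0 else [(v - mu) / sd for v in values]

def normalize(matrix):
    """matrix[s][g] = expression of feature g in sample s.
    Assumed scheme: sample-wise z-scoring, then feature-wise z-scoring."""
    by_sample = [zscore(row) for row in matrix]
    by_feature = [zscore(list(col)) for col in zip(*by_sample)]
    return [list(row) for row in zip(*by_feature)]  # transpose back to samples x features

# miRNA and mRNA blocks are normalized separately (toy values, not MSKCC data)
mirna = normalize([[1.0, 2.0, 3.0], [2.0, 4.0, 9.0], [0.5, 0.7, 0.2]])
mrna = normalize([[5.0, 1.0], [3.0, 2.0], [4.0, 8.0]])
```

Normalizing the two blocks separately keeps differences in platform scale between miRNA and mRNA arrays from dominating later distance computations.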
Classification performance
Classification performance was evaluated against three standard classifiers implemented in Weka [37]: support vector machines (SVMs) with a second-degree polynomial kernel trained by sequential minimal optimization (SMO), k-nearest neighbor classifiers (kNNs), and naïve Bayes classifiers (NBs). The MATLAB implementations of lasso and elastic net (α=0.5) were also used. All results were averaged over 10 experiments. Figure 2 presents the classification accuracy of our model compared with the other models. As indicated by the p-values of the t-test, the proposed hypergraph-based model performs on par with SVMs and outperforms the kNN, NB and lasso-based methods. In addition, comparing 3–5 HG (a hypergraph model whose hyperedges consist of three miRNAs and five mRNAs) with 1–1 HG shows that higher-order relationships are more important for discriminating cancer stages than pairwise relationships between a single miRNA and a single mRNA.
Model evaluation
The proposed hypergraph-based learning method was evaluated on simulated data to verify whether it finds the true solutions. The data consist of 500 instances with 7 zero-mean variables, and the class label of each instance is determined as follows:
where $x_i$ and $c^{(n)}$ denote the ith random variable and the class label of the nth instance, respectively. Table 2 shows the classification accuracy and the predefined modules found in the learned models. The accuracy is averaged over 10 runs of 10-fold cross validation, and each hypergraph includes 20 hyperedges with four variables. In Table 2, Module 1 and Module 2 give the number of cases in which a learned hypergraph contains hyperedges involving predefined set 1 ($x_2$, $x_3$, $x_4$) or set 2 ($x_5$, $x_6$, $x_7$), respectively. Because we conducted 10-fold cross validation, the maximum value of Module 1 and Module 2 is ten. Considering both the accuracy and the number of recovered variable modules, we conclude that our method can find the true solutions in small combinatorial spaces.
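The Module 1 and Module 2 counts can be obtained mechanically once the learned hypergraphs are available. The sketch below assumes that a hyperedge "involves" a predefined set when it contains every variable of that set; `count_module_hits` and the toy folds are hypothetical and are not the data behind Table 2.

```python
def count_module_hits(fold_models, module):
    """Number of cross-validation folds whose learned hypergraph contains at least one
    hyperedge that includes every variable of the predefined module (assumed criterion).
    fold_models: one hypergraph per fold, each a list of hyperedges (sets of variable names)."""
    module = set(module)
    return sum(any(module <= set(he) for he in model) for model in fold_models)

# toy example: two folds with four-variable hyperedges (made-up)
folds = [[{"x1", "x2", "x3", "x4"}, {"x1", "x2", "x5", "x6"}],
         [{"x2", "x3", "x5", "x7"}]]
module1 = count_module_hits(folds, {"x2", "x3", "x4"})
module2 = count_module_hits(folds, {"x5", "x6", "x7"})
```

With ten folds per run, the count is bounded by ten, matching the maximum stated above.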
Figure 3 presents learning curves of the structure learning phase (a) and the parameter learning phase (b) under various conditions. As the measure for structure learning, we used the mean multivariate mutual information (MMI) over all hyperedges in the model, because the goal of structure learning is to find significant higher-order cancer-specific gene interaction modules, and the MMI reflects the strength of interactions among the genetic factors in a hyperedge with respect to the cancer stage. For the parameter learning phase, classification accuracy is used as the measure, since in that phase the weight for each cancer stage is updated to minimize the classification error. Figure 3(a) shows the increase of the mean MMI under various values of $R_{min}$, the minimum ratio of hyperedges replaced in each iteration, which plays the role of a structure learning rate. As Figure 3(a) shows, too large an $R_{min}$ yields low MMI because too many hyperedges are replaced, whereas too small an $R_{min}$ slows the increase of the MMI. Figure 3(b) shows analogous behavior with respect to the learning rate γ.
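$R_{min}$ acts as a structure learning rate, but which hyperedges are replaced is not stated here. The following minimal sketch assumes that in each iteration the $\lceil R_{min}\,|H|\rceil$ lowest-scoring hyperedges (scored, for instance, by their multivariate MI) are discarded and refilled with newly generated ones. `replace_step`, `score` and `generate` are hypothetical names.

```python
import math
import random

def replace_step(population, score, generate, r_min, rng):
    """One hypothetical structure-learning step: drop the ceil(r_min * |H|) lowest-scoring
    hyperedges and refill the population with freshly generated ones."""
    # small epsilon guards against float products such as 0.1 * 30 = 3.0000000000000004
    n_replace = math.ceil(r_min * len(population) - 1e-9)
    ranked = sorted(population, key=score, reverse=True)   # best hyperedges first
    kept = ranked[:len(population) - n_replace]
    return kept + [generate(rng) for _ in range(n_replace)]

# toy usage: integers stand in for hyperedges and also serve as their own scores
rng = random.Random(0)
new_pop = replace_step(list(range(10)), score=lambda h: h,
                       generate=lambda r: 100 + r.randrange(5), r_min=0.2, rng=rng)
```

A larger `r_min` explores more new hyperedges per iteration but discards more accumulated structure, which is consistent with the trade-off described for Figure 3(a).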
Figure 4 shows the classification accuracy as a function of the number of genetic factors in the hyperedges. The accuracy is highest when a hypergraph consists of hyperedges with three miRNAs and five mRNAs. Hyperedges with few genetic variables perform worse, which we attribute to the various processes of prostate cancer being influenced by complex interactions among many features. Conversely, hypergraphs whose hyperedges include more than ten genetic variables also show low accuracy, since such models encode overly specific information and therefore generalize poorly.
Table 3 and Figure 5 show that the proposed learning method stably extracts significant genetic factors despite its randomized selection approach. To verify the stability of the model, we define $A(x_i)$, the number of learned models in which a gene appears, as follows:
$$A(x_i)=\sum_{m}\delta(x_i, H_m),$$
where $x_i$ denotes the ith miRNA or mRNA and $H_m$ is the mth learned model. $\delta(x_i, H_m)$ is an indicator function that returns one when $x_i$ appears at least once in $H_m$, and zero otherwise. The proposed method is compared with randomly generated hypergraphs, each comprising 200 hyperedges involving three miRNAs and five mRNAs. The results are derived from 100 models learned in 10 runs of 10-fold cross validation, and from 100 randomly generated hypergraphs. According to Figure 5(a), our method extracts only significant miRNAs, whereas almost all miRNAs are involved in the random graphs. Likewise, whereas the learning method selects several significant mRNAs, all mRNAs appear at low frequency in the random graphs, as shown in Figure 5(b). The stability and reproducibility of the proposed model are evident from the high-frequency occurrence of top-ranked miRNAs and mRNAs, indicating that certain genes persist across models. Table 3 lists the miRNAs and mRNAs that appear most and least frequently in the 100 learned models and in the randomly generated graphs. Given that several key genes decisively affect a specific cancer, we posit that the proposed model consistently selects essential factors, in contrast to random selection.
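The stability measure amounts to counting, for each gene, how many learned models contain it at least once. A short sketch, assuming each learned model is stored as a list of hyperedges and each hyperedge as a set of gene names (`appearance_counts` is an illustrative name):

```python
from collections import Counter

def appearance_counts(models):
    """A(x) = sum_m delta(x, H_m): number of learned models that contain gene x at least once.
    models: list of hypergraphs; each hypergraph is a list of hyperedges (sets of gene names)."""
    counts = Counter()
    for model in models:
        counts.update(set().union(*model))  # union over hyperedges, so each gene counts once per model
    return counts

# toy usage: three hypothetical learned models
models = [[{"a", "b"}, {"b", "c"}], [{"a", "d"}], [{"b"}]]
A = appearance_counts(models)
```

Taking the union per model is what implements the indicator δ: a gene appearing in several hyperedges of the same model still contributes only one.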
Constructed higher-order miRNA-mRNA interaction networks in prostate cancer
The miRNA-mRNA interaction networks constructed from the proposed model for primary and metastatic prostate cancer are illustrated in Figure 6(a) and (b), respectively [38]. The constructed networks comprise putative miRNA-mRNA modules associated with each stage of prostate cancer and reflect their higher-order relationships. The primary prostate cancer network includes 67 miRNAs and 233 mRNAs, while the metastatic prostate cancer network involves 65 miRNAs and 180 mRNAs.
Many of the miRNAs in the constructed networks have been significantly associated with prostate cancer in the literature and are thus termed prostate cancer-related miRNAs [39]. In addition, many of the genes in the constructed networks overlap with cancer-related genes, including transcription factors. To confirm this finding, we compiled a list of 496 oncogenes and 874 tumor suppressor genes from the Cancer Genes resource of Memorial Sloan-Kettering Cancer Center [40], together with 1476 human transcription factors [41], and tested the constructed networks for enrichment of these genes using the hypergeometric test. As shown in Figure 7, most of the significant genes (p-value close to 0) in the constructed networks are over-represented in the compiled list. This result indicates that our model can build interaction networks of genetic factors associated with cancer processes.
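For reference, the one-sided hypergeometric p-value for over-representation of listed cancer genes among n network genes is $P(X \ge k)=\sum_{i=k}^{\min(K,n)}\binom{K}{i}\binom{N-K}{n-i}\big/\binom{N}{n}$, where N is the size of the background gene population, K the number of listed genes in it, and k the observed overlap. The sketch below evaluates this with exact integer arithmetic. The background population actually used (for example, all profiled mRNAs) is not stated here and is an assumption, and `hypergeom_sf` is an illustrative name.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K listed genes, n drawn genes).
    Uses exact integer binomials, so large gene counts do not lose precision."""
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)
```

For example, `hypergeom_sf(k, N, K, n)` with the network size as `n` and the overlap with the compiled cancer-gene list as `k` gives the enrichment p-value for one network.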
Interestingly, the enriched hyperedges and the expression levels of the miRNAs and mRNAs differ considerably between the primary and metastatic networks. In Figure 6, up- and down-expressed miRNAs and genes are determined by their mean expression at each stage, and red-boxed miRNAs and genes are those known to be associated with various stages of prostate cancer [4–8, 42, 43]. Triangles, rectangles, diamonds and circles denote miRNAs, oncogenes/tumor suppressor genes, transcription factors, and other genes in the network, respectively.
Functional analysis of the constructed interaction networks
The constructed miRNA-mRNA interaction networks were validated by functional analyses based on literature review and gene set analysis. As mentioned above, many of the miRNAs and mRNAs involved in the identified interactions are known indicators of prostate cancer [4–8]. In addition, the mRNAs include some of the predicted target genes of these miRNAs [44], some of which have been experimentally validated. In particular, several of the miRNAs, including hsa-miR-1, -133a, -143, -145, -221 and -222, are known as ‘oncomiRs’ that function as oncogenes or tumor suppressors [45–48]. Many hyperedges in the constructed networks contain these miRNAs, which accordingly act as hubs in the networks.
In particular, hsa-miR-143 and hsa-miR-145 play a crucial role in metastatic prostate cancer and are recognized as a clinicopathological signature of prostate cancer [47]. Interaction modules involving hsa-miR-143 and -145 occupy a large portion of the networks constructed by our model. In addition, the interactions identified in metastatic prostate cancer contain several experimentally confirmed targets of hsa-miR-143 and -145, including CLINT1, CDKN1A, IRS1, MAPK7, PPM1D and SOD2. Furthermore, hsa-miR-143 and -145 are expressed at low levels in the metastatic network, consistent with experimental validation [7].
Moreover, hsa-miR-200c emerges as a distinct miRNA in the primary prostate cancer network. According to previous studies, hsa-miR-200c overexpression inhibits prostate cancer metastasis, while its aberrant regulation triggers the invasion and migration of prostate cancer cells at the post-transcriptional level [49].
Our model identified several transcription factors associated with prostate cancer metastasis, such as ETS2, HOXC4, STAT3, STAT5B, SOX4 and ZEB2. Among these, SOX4, STAT3 and STAT5B are known regulators of metastatic prostate cancer through the regulation of genes involved in miRNA processing, transcriptional regulation, and developmental pathways [50–52]. Indeed, SOX4 is directly regulated by hsa-miR-335 in cancer progression [50], while hsa-miR-125b coordinates STAT3 regulation in the proliferation of tumor cells [51, 53].
Interactions involving hsa-miR-29b/MMP2 and hsa-miR-335/SOX4 appear concurrently in the constructed metastatic network (Table 4). This finding is consistent with previous studies in which miR-29b and -335 were found to suppress tumor metastasis and migration by regulating MMP2 and SOX4, respectively [42, 54]. Interestingly, both of these interactions involve hsa-miR-143, which is closely linked to prostate cancer progression. Furthermore, the well-known cancer-associated genetic factors MMP2 and SOX4 co-emerged in the identified interactions. Although these higher-order interactions identified by our model have not been previously reported, they reflect higher-order relationships between miRNAs and mRNAs and may therefore signify unknown regulatory circuits in prostate cancer development and progression. This result suggests the utility of the proposed model in identifying undiscovered miRNA-mRNA interactions.
To confirm the biological relevance of the constructed interaction networks, we analyzed the functional correlations among the network genes by canonical pathway analysis [55]. The significant (low p-value) results for the primary and metastatic prostate cancer networks are summarized in Table 5. Many of the enriched pathways are closely associated with prostate tumorigenesis and metastasis. In particular, the β-catenin degradation pathway, the Wnt/β-catenin pathway and the Wnt canonical pathway are associated with Wnt signaling, which regulates many genes implicated in prostate cancer; these pathways were identified as significant in the primary prostate cancer network. Deregulation of Wnt-related pathways reportedly affects prostate cell proliferation and differentiation [56]. Moreover, annotated genes in the constructed network, such as APC, AXIN1, AKT2, CCND2, CAV1, TLE2 and TCF4, are essential regulatory components of these pathways in prostate cancer. ErbB-related pathways, including the ErbB network pathway, the ErbB4 pathway, the Her2 pathway, the ErbB2/ErbB3 signaling pathway and the EGFR pathway, were identified in the metastatic network; these pathways are implicated in prostate cancer progression and metastasis [43, 57]. The FOXM1 pathway also regulates tumor metastasis (including that of prostate cancer) by stimulating the expression of several genes involved in tumor cell proliferation and cell cycle progression [58]. The top-ranked pathway in the metastatic network is the MYC activation pathway. MYC reportedly promotes the metastatic phenotype by altering the epigenetic landscape of cancer cells, and is overexpressed in ~75% of advanced prostate cancer patients [43]. Thus, the MYC pathway is a putative key feature of metastatic progression [59].
Discussion
The proposed hypergraph-based model characterizes higher-order interactions among heterogeneous genetic factors from archived data. Human cancers are typically caused by the modular control of multiple genetic factors, so analyzing gene relationships at higher-order levels can improve our understanding of complex cancer mechanisms. Moreover, the cooperative activities and combinatorial regulation governed by miRNAs and mRNAs are largely unknown. We have demonstrated that higher-order relationships discriminate between specific cancer stages more precisely than pairwise analyses of single miRNA-mRNA interactions. From this viewpoint, we can construct a more complete interaction network consisting of putative, biologically significant miRNA-mRNA modules.
In addition, our method focuses on discovering potential interactions in unknown miRNA-mRNA regulatory circuits related to specific cancer stages without using known biological information [60, 61]. The proposed model finds statistically significant gene modules from given expression profiles by a data-driven approach with a co-regulatory measure (mutual information). Nevertheless, a similar hypergraph structure could readily be constructed from other types of quantitative biological information, such as miRNA-target information and gene sequence similarity values. Furthermore, the hypergraph-based model represents miRNA-mRNA interactions more flexibly than methods that assume the expression states of miRNAs and mRNAs are linearly proportional to each other, because it isolates significant modules from statistical co-expression patterns among genes at a higher-order level.
The proposed hypergraph-based model is similar to the models of Bonnet et al. [26, 27] and Liu et al. [28], in which higher-order relationships governed by miRNA-mRNA interactions are inferred solely from expression profiles. Because Bonnet’s method is based on clustering, it cannot readily infer gene regulatory modules at a specific cancer stage. In contrast, our method explicitly considers the sample status (the primary or metastatic state of prostate cancer), from which it constructs cancer stage-specific networks. Liu’s approach is based on Corr-LDA, which requires discretized data. By contrast, our method uses the original real-valued data, thus avoiding the information loss caused by discretization.
Furthermore, as the simulation study shows, the proposed model recovers the true solution when the feature subset is small enough for the problem space to be searched exhaustively. Also, unlike other models, our model can efficiently handle the very high-dimensional data required for complex higher-order interactions among features. However, the proposed hypergraph-based model is limited at small sample sizes: with few samples, the mean and covariance estimated for each hyperedge become less reliable.
Conclusions
We have proposed a hypergraph-based model consisting of higher-order miRNA-mRNA modules, which allows the construction of biologically meaningful interaction networks associated with specific cancer stages. To identify potentially significant interactions and refine model performance, we introduced a two-phase learning approach comprising structure and parameter learning. Finally, we constructed cancer stage-specific interaction networks reflecting higher-order miRNA-mRNA relationships by converting the hypergraph structure into an ordinary graph.
Using the proposed model, we constructed higher-order miRNA-mRNA interaction networks associated with specific stages of prostate cancer from a matched dataset. The performance of the proposed model is similar to that of SVMs and superior to that of the other classification models (by approximately 6–10%). More importantly, our model can construct carcinogenic miRNA-hubbed networks that characterize primary and metastatic prostate cancer. Furthermore, we demonstrated that a large proportion of the miRNAs and mRNAs in the constructed interaction networks are indeed involved in prostate cancer progression and development. The proposed hypergraph-based model therefore offers an alternative method for discovering potential gene regulatory circuits, and such discoveries can help advance our understanding of cancer pathogenesis.
Methods
Hypergraph-based models
A hypergraph-based model characterizes complex interactions among many genetic factors using hypergraph structures. A hypergraph generalizes the edge concept to a hyperedge, which can connect more than two variables simultaneously [62, 63]. It is therefore suitable for representing higher-order relationships among heterogeneous features (e.g. miRNAs and mRNAs). In our model, a hyperedge contains two or more variables corresponding to miRNAs and mRNAs, and is weighted by the strength of the higher-order dependency among its elements for each class (where a class denotes a specific cancer stage). Each hyperedge thus represents a candidate miRNA-mRNA module associated with a certain stage of cancer, and the proposed model facilitates the construction of higher-order miRNA-mRNA interaction networks from a population of candidate gene modules related to a specific cancer stage.
A hypergraph-based model H is formally defined as a triple H = (X, Z, E), where X, Z, and E denote the sets of miRNAs, mRNAs, and hyperedges, respectively. A hyperedge is represented by a set of statistics, namely the mean and covariance of its variables for each class label (cancer stage). As shown in Figure 8, the mean gene expression values differ widely among the class labels, implying that gene expression depends on cancer progression; combining miRNAs and mRNAs in a hyperedge further enhances the discriminative capability (Figure 8). Given an expression dataset with N instances $D=\{d^{(n)}\}_{n=1}^{N}=\{(\mathbf{x}^{(n)},\mathbf{z}^{(n)},y^{(n)})\}_{n=1}^{N}$, where $\mathbf{x}^{(n)}$ and $\mathbf{z}^{(n)}$ are real-valued vectors of miRNA and mRNA expression in the nth instance and $y^{(n)}$ is an element of the cancer stage set Y, the ith hyperedge $e_i$ contains the mean vectors and the covariance of its miRNAs and mRNAs for each cancer stage:
where $\mu_{ij}^{x}$ and $\mu_{ik}^{z}$ denote the means of the expression profiles of the jth miRNA and the kth mRNA, respectively, in the ith hyperedge (whose elements comprise l miRNAs and m mRNAs); l and m are called the miRNA degree and the mRNA degree of the hyperedge, respectively. By this definition, each hyperedge has |Y| mean-vector/covariance pairs and |Y| weights. The hypergraph-based model can be regarded as a population of hyperedges. Given a gene expression profile (x, z), the profile is classified as the cancer stage y* in Y with the highest sum of expected values, where each expected value is the product of a hyperedge weight and the probability that (x, z) matches that hyperedge. "(x, z) matches $e_{iy}$" means that (x, z) has expression values similar to those of the ith hyperedge for the genetic variables involved in $e_{iy}$ at cancer stage y. To calculate this matching probability, $P(u=1\mid\mathbf{x},\mathbf{z},e_{iy})$, we introduce a Gaussian kernel into the hyperedge; the probability is computed from the normalized sub-dimensional distance between $e_{iy}$ and (x, z):
where u=1 denotes that (x, z) matches $e_{iy}$; $\sigma_{ijy}^{x}$ and $\sigma_{iky}^{z}$ are the standard deviations of $x_{ij}$ and $z_{ik}$ (the jth miRNA and the kth mRNA, respectively) in the ith hyperedge for a given y; and β is a constant for adjusting the probability. A larger β yields a smaller matching probability, so fewer hyperedges influence the classification of the data. Specifically, the cancer stage y* of (x, z) is computed as follows:

1. Calculate $c_{y'}$, the sum of the expected values for each $y'$ in $Y$ over all hyperedges of $H$:
$$c_{y'}=\sum_{i=1}^{|H|} w\left(e_{iy=y'}\right)P\left(u=1\mid\mathbf{x},\mathbf{z},e_{iy=y'}\right),$$ (7)
where $|H|$ denotes the number of hyperedges and $w(e_{iy})$ is the weight of $e_{iy}$, explained in the next subsection.

2. Predict the cancer stage as $y^{*}$:
$$y^{*}=\underset{y'\in Y}{\arg\max}\; c_{y'}.$$ (8)
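The classification rule in (7) and (8) can be sketched in a few lines. Because the exact kernel in (5) is not reproduced here, this sketch assumes $P(u=1\mid\mathbf{x},\mathbf{z},e_{iy})=\exp(-\beta d)$, where d is the mean squared standardized deviation of the profile from the hyperedge's stage-y means over the hyperedge's own genes only. It also uses per-gene standard deviations rather than a full covariance, and concatenates miRNA and mRNA values into one vector for simplicity. All names (`Hyperedge`, `fit_hyperedge`, `match_prob`, `classify`) and the toy data are hypothetical.

```python
import math
from dataclasses import dataclass, field
from statistics import fmean, pstdev

@dataclass
class Hyperedge:
    genes: tuple                                # positions of its miRNAs/mRNAs in the joint (x, z) vector
    mean: dict = field(default_factory=dict)    # stage y -> per-gene means
    sd: dict = field(default_factory=dict)      # stage y -> per-gene standard deviations
    weight: dict = field(default_factory=dict)  # stage y -> w(e_iy)

def fit_hyperedge(genes, samples, labels, init_weight=1.0):
    """Estimate the stage-wise mean and s.d. of each gene in the hyperedge from training data."""
    he = Hyperedge(tuple(genes))
    for y in set(labels):
        rows = [s for s, lab in zip(samples, labels) if lab == y]
        he.mean[y] = [fmean(r[g] for r in rows) for g in genes]
        # guard against zero variance so the standardized distance stays finite
        he.sd[y] = [pstdev([r[g] for r in rows]) or 1e-6 for g in genes]
        he.weight[y] = init_weight
    return he

def match_prob(he, sample, y, beta):
    """Assumed form of P(u=1 | x, z, e_iy): Gaussian kernel of the mean squared
    standardized distance, computed only over the hyperedge's own genes."""
    d = fmean(((sample[g] - m) / s) ** 2
              for g, m, s in zip(he.genes, he.mean[y], he.sd[y]))
    return math.exp(-beta * d)

def classify(hypergraph, sample, stages, beta=1.0):
    """Eqs. (7)-(8): c_y = sum_i w(e_iy) * P(u=1 | x, z, e_iy); predict argmax_y c_y."""
    c = {y: sum(he.weight[y] * match_prob(he, sample, y, beta) for he in hypergraph)
         for y in stages}
    return max(c, key=c.get), c

# toy usage with two genes and two stages (made-up numbers)
samples = [[0.0, 0.1], [0.2, -0.1], [2.0, 2.1], [2.2, 1.9]]
labels = ["primary", "primary", "metastatic", "metastatic"]
H = [fit_hyperedge([0, 1], samples, labels)]
pred, scores = classify(H, [0.05, 0.0], ["primary", "metastatic"])
```

Restricting the distance to each hyperedge's genes is what separates this from an RBF unit, which would measure distance over every variable.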
In terms of distance-based connectionist models, our model is related to radial basis function networks (RBFNs) [64]. Whereas RBFNs apply a kernelized distance over all variables, the proposed hypergraph model uses a probability derived from the sub-dimensional distance in the projected space corresponding to each hyperedge. Unlike RBFNs, therefore, the hypergraph-based model can detect embedded sub-patterns reflecting higher-order relationships among the components. Because these embedded sub-patterns drive the classification, we can intuitively analyze the complex interactions of genetic factors that contribute to classifying a specific cancer stage.
Learning hypergraph-based models
The proposed model is learned by finding a hypergraph structure with high discriminative capability for a specific cancer stage. This is achieved by maximizing the conditional likelihood of the cancer stages given the gene expression profiles under the model H, using its logarithm for convenience. To minimize the cancer stage classification error $E_{D,H}$, the log conditional likelihood is maximized under the least-mean-square criterion using (7) and a sigmoidal function:
where $(\mathbf{x}^{(n)},\mathbf{z}^{(n)})$ denotes the nth miRNA-mRNA expression profile and $y^{(n)}$ is its cancer stage. $y'_{H}$ is the label predicted by H, and $\delta(y^{(n)}, y'_{H})$ is an indicator function equal to 1 if $y^{(n)}$ equals $y'_{H}$ and 0 otherwise. To enhance the classification accuracy, the population must comprise hyperedges with high discriminative capability, and the hyperedge weights of the generated hypergraph must be refined to minimize (9).
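Since (9) is not reproduced here, the following sketch assumes one common least-mean-square form: $E=\tfrac12\sum_n\sum_{y}\big(t_{ny}-o_{ny}\big)^2$ with $o_{ny}=\mathrm{sig}(c_{ny})$, $c_{ny}$ computed by (7), and $t_{ny}=1$ only for the true stage. Its gradient with respect to a weight is $\partial E/\partial w(e_{iy})=-\sum_n (t_{ny}-o_{ny})\,o_{ny}(1-o_{ny})\,P_{niy}$, the delta rule familiar from backpropagation. The matching probabilities $P_{niy}$ are taken as precomputed inputs. `squared_error` and `train_weights` are hypothetical names, and `gamma` plays the role of the learning rate γ mentioned for Figure 3(b).

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def squared_error(P, labels, stages, weights):
    """Assumed objective: E = 1/2 * sum_n sum_y (t_ny - sigmoid(c_ny))^2,
    with t_ny = 1 if y is the true stage of instance n, else 0."""
    total = 0.0
    for n, y_true in enumerate(labels):
        for y in stages:
            c = sum(w * p for w, p in zip(weights[y], P[n][y]))   # Eq. (7) for stage y
            total += 0.5 * (float(y == y_true) - sigmoid(c)) ** 2
    return total

def train_weights(P, labels, stages, weights, gamma=0.5, epochs=200):
    """Batch gradient descent on w(e_iy). P[n][y][i] is the precomputed matching probability
    of instance n with hyperedge i at stage y; weights[y][i] is w(e_iy)."""
    for _ in range(epochs):
        grad = {y: [0.0] * len(weights[y]) for y in stages}
        for n, y_true in enumerate(labels):
            for y in stages:
                o = sigmoid(sum(w * p for w, p in zip(weights[y], P[n][y])))
                delta = (float(y == y_true) - o) * o * (1.0 - o)   # equals -dE/dc_ny
                for i, p in enumerate(P[n][y]):
                    grad[y][i] -= delta * p                        # accumulates dE/dw_iy
        for y in stages:
            weights[y] = [w - gamma * g for w, g in zip(weights[y], grad[y])]
    return weights

# toy usage: two instances, two stages, two hyperedges (made-up probabilities)
P = [{"A": [0.9, 0.1], "B": [0.2, 0.1]},
     {"A": [0.1, 0.2], "B": [0.8, 0.9]}]
labels = ["A", "B"]
w = {"A": [0.0, 0.0], "B": [0.0, 0.0]}
before = squared_error(P, labels, ["A", "B"], w)
w = train_weights(P, labels, ["A", "B"], w)
after = squared_error(P, labels, ["A", "B"], w)
```

Keeping the matching probabilities fixed during this phase mirrors the separation described above: structure learning changes which hyperedges exist, while parameter learning only moves their weights.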
To meet these requirements, learning iterates two phases: structure learning and parameter learning. Structure learning constructs a hypergraph from hyperedges that identify potential miRNA-mRNA modules. During the parameter learning phase, the weights of the hyperedges are updated to minimize the classification error of the generated gene module population. Because the hypergraph-based model represents a huge combinatorial feature space (of size $2^{|X|+|Z|}$) over many miRNAs and mRNAs, an exhaustive search for the optimal population is infeasible. Instead, we adopt an evolutionary learning method based on information-theoretic criteria to generate putative hyperedges in the structure learning phase.
In this study, we assume that a hyperedge consisting of strongly interacting miRNAs and mRNAs is highly discriminative for classification. Mutual information (MI) is used as the criterion of coregulation for efficiently selecting genes during hyperedge generation. MI is an information-theoretic measure of the statistical dependence between two random variables; it is zero if and only if the variables are independent. The more strongly a genetic factor determines the cancer stage, the larger the MI between that gene and the cancer stage. A hyperedge is generated by probabilistically selecting miRNAs and mRNAs, where the MI between each gene and the class label determines the probability of selecting that gene. The probability $P_I(X_i)$ of selecting the $i$th gene $X_i$ is defined such that miRNAs or mRNAs with high MI are selected more frequently:
where $I(X_i; Y)$ denotes the MI between the $i$th genetic factor and the cancer stage, and $\eta$ is a nonnegative constant that regulates the influence of the MI on gene selection. When $\eta$ is zero, all variables are selected with equal probability. Once the hyperedges have been generated, the mean vector and covariance matrix of each hyperedge are calculated from the training dataset. To identify putative strongly interacting miRNA-mRNA modules, the initial weight of the $i$th hyperedge is computed from the variance of each genetic factor and the multivariate MI [65] among all variables in the hyperedge, including the class label. A gene whose expression has a characteristic mean but a small variance is likely to be more discriminative than one with a larger variance. Moreover, by the definition of MI, a large multivariate MI implies stronger dependencies among the genes. Thus, the initial weight of a hyperedge is defined as
s.t.
where $k$ is the number of variables in $e_i$ and $\kappa$ denotes the ratio of the variance to the MI.
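The Python sketch below illustrates MI-guided hyperedge generation. It rests on several assumptions:

- The exact selection formula is not reproduced in this section, so the sketch uses $P_I(X_i) \propto \exp(\eta\, I(X_i; Y))$. This form reduces to uniform selection when $\eta = 0$, as the text requires, but it may differ from the authors' equation.
- MI is estimated with a simple plug-in histogram estimator. This is a swapped-in technique, not the estimator of [65].
- `plugin_mi`, `selection_probs`, and `sample_hyperedge` are hypothetical names.

The hyperedge's mean vector and covariance are then computed from the training data, as described above.

```python
import numpy as np

def plugin_mi(feature, labels, n_bins=3):
    """Plug-in estimate of I(X_i; Y) in nats after equal-width binning."""
    edges = np.histogram_bin_edges(feature, bins=n_bins)
    x = np.digitize(feature, edges[1:-1])        # bin index 0..n_bins-1
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(labels):
            p_xy = np.mean((x == xv) & (labels == yv))
            if p_xy > 0:
                p_x = np.mean(x == xv)
                p_y = np.mean(labels == yv)
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return float(mi)

def selection_probs(mi, eta):
    """ASSUMED form: P_I(X_i) proportional to exp(eta * I(X_i; Y)).
    With eta = 0 every gene is equally likely, as stated in the text."""
    logits = eta * np.asarray(mi, dtype=float)
    p = np.exp(logits - logits.max())            # shift for numerical stability
    return p / p.sum()

def sample_hyperedge(data, probs, order, rng):
    """Draw `order` distinct genes according to `probs`; return their indices
    plus the mean vector and covariance of those genes on the training data."""
    idx = rng.choice(data.shape[1], size=order, replace=False, p=probs)
    sub = data[:, idx]
    return idx, sub.mean(axis=0), np.cov(sub, rowvar=False)

# Usage on a toy dataset: one informative gene and two noise genes.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)
data = np.column_stack([labels * 5.0 + rng.normal(0, 0.1, 20),
                        rng.normal(0, 1, 20), rng.normal(0, 1, 20)])
mi = [plugin_mi(data[:, j], labels) for j in range(data.shape[1])]
print(selection_probs(mi, eta=0.0))  # uniform: each entry is 1/3
probs = selection_probs(mi, eta=5.0)
idx, mean, cov = sample_hyperedge(data, probs, order=2, rng=rng)
```

Raising $\eta$ concentrates sampling on the informative gene, while $\eta = 0$ recovers uniform selection.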
In the parameter learning phase, the hyperedge weights are updated by gradient descent over all training data. The aim is to minimize the error, expressed in terms of the classification probability in (9) and the matching probability in (5):
where $\tilde{y}$ is the true cancer stage of a miRNA-mRNA expression sample, and $t$ and $\gamma$ denote the epoch number and the learning rate of parameter learning, respectively. The epoch number counts the weight updates applied to the built hypergraph during parameter learning, and $\gamma$ controls the magnitude of each weight change. Thus, a hyperedge receives a high weight when it consists of miRNAs and mRNAs with strong higher-order interactions and when the variances of its gene variables are small at all cancer stages. After parameter learning, low-weight hyperedges are removed from the population, and the next structure learning step is performed. To avoid removing highly discriminative hyperedges, the number of replaced hyperedges decreases toward a minimum value as the iterations proceed:
where $t$ is the iteration number of the structure learning phase, and $R_{\max}$ and $R_{\min}$ denote the maximum and minimum numbers of replaced hyperedges, respectively. The number of replaced hyperedges therefore decreases gradually as structure learning proceeds, while highly discriminative modules are preserved. The algorithm for learning the hypergraph-based model is presented in Figure 9.
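The parameter-learning step and the replacement schedule can be sketched as follows. The error, update, schedule, and pruning equations are not reproduced in this section, so every formula in the code is an assumption:

- **Error.** The least-mean-square error between the indicator $\delta(\tilde{y}^{(n)}, y)$ and a sigmoid of the weighted matching probabilities.
- **Update.** One batch gradient-descent step per epoch with learning rate $\gamma$.
- **Replacement schedule.** An exponential decay from $R_{\max}$ toward $R_{\min}$ with a hypothetical time constant `tau`.
- **Pruning.** Removes the hyperedges whose best per-stage weight is lowest.

All function names are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def squared_error(weights, match, labels):
    """ASSUMED error: 0.5 * sum_n sum_y (delta(y_n, y) - sigmoid(m_n . w_y))**2."""
    target = np.eye(weights.shape[1])[labels]    # one-hot rows = delta(y~^(n), y)
    return 0.5 * float(np.sum((target - sigmoid(match @ weights)) ** 2))

def lms_epoch(weights, match, labels, gamma):
    """One batch gradient-descent epoch on `squared_error`.

    weights: (K, C); match: (N, K) matching probabilities; labels: (N,) stages.
    """
    target = np.eye(weights.shape[1])[labels]
    out = sigmoid(match @ weights)               # (N, C) sigmoid outputs
    # dE/dw[i, y] = -sum_n (target - out) * out * (1 - out) * match[n, i]
    grad = -match.T @ ((target - out) * out * (1.0 - out))
    return weights - gamma * grad

def n_replaced(t, r_max, r_min, tau=5.0):
    """ASSUMED schedule: r_max at t = 0, decaying toward r_min."""
    return r_min + int(round((r_max - r_min) * np.exp(-t / tau)))

def lowest_weight(weights, r):
    """Indices of the r hyperedges whose best per-stage weight is lowest."""
    return np.argsort(weights.max(axis=1))[:r]

# Usage on random toy inputs (no real data): 30 samples, 4 hyperedges, 2 stages.
rng = np.random.default_rng(1)
match = rng.uniform(size=(30, 4))
labels = rng.integers(0, 2, size=30)
w0 = rng.normal(size=(4, 2))
w = w0.copy()
for epoch in range(10):
    w = lms_epoch(w, match, labels, gamma=0.01)
```

A small $\gamma$ keeps each epoch a descent step. The decaying replacement count mirrors the text's intent: the population is explored broadly early on and protected later.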
Representing interaction networks from hypergraphs
We construct a higher-order miRNA-mRNA interaction network for a specific cancer stage from the learned model. In graph-mining analyses of complex biological networks, frequently occurring subgraphs are generally regarded as important building blocks that are merged to form the functional network [66–69]. Since a high-weight hyperedge corresponds to a significant subgraph reflecting a higher-order relationship among genetic variables, the interaction network is constructed by connecting cliques that share common genes. Each hyperedge carries a separate weight for each cancer stage and is merged into the network of the stage for which its weight is highest. Formally, for a cancer stage $y'$, the stage-specific interaction network $G_{y'} = (V, E)$ is constructed by merging the hyperedges whose largest weight corresponds to $y'$, where $V$ and $E$ denote the vertex set and the edge set, respectively:
where $C_i$ is the clique corresponding to the $i$th hyperedge $e_i$ (Figure 10). This dividing-and-remerging approach makes the constructed interaction networks easy to visualize without sacrificing the higher-order property of the model. The edge weights in the constructed networks are derived from the hyperedge weights, which reflect the strength of the higher-order interactions.
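The following Python sketch illustrates the merging step. Each hyperedge is expanded into a clique and assigned to the stage with its largest weight. Cliques that share genes are joined through their common vertices. One detail is an assumption: when several hyperedges contribute the same gene pair, the sketch sums their weights to get the edge weight. The text states only that edge weights derive from hyperedge weights. The gene names and weights in the usage lines are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np

def build_stage_networks(hyperedges, weights, stage_names):
    """Merge hyperedge cliques into one weighted graph per cancer stage.

    hyperedges:  list of gene-name tuples (the variables of each e_i)
    weights:     (K, C) array, one weight per hyperedge and stage
    stage_names: list of C stage labels
    Edge weight = sum of weights of contributing hyperedges (ASSUMPTION).
    """
    networks = {s: {"V": set(), "E": defaultdict(float)} for s in stage_names}
    for genes, w in zip(hyperedges, weights):
        best = int(np.argmax(w))                 # stage y' with the largest weight
        graph = networks[stage_names[best]]
        graph["V"].update(genes)                 # vertices of clique C_i
        for a, b in combinations(sorted(genes), 2):
            graph["E"][(a, b)] += float(w[best])  # shared genes merge the cliques
    return networks

# Usage with hypothetical gene names and weights for two stages.
hyperedges = [("miR-A", "GENE1", "GENE2"), ("miR-A", "GENE2"), ("miR-B", "GENE3")]
weights = np.array([[0.9, 0.1], [0.6, 0.2], [0.1, 0.7]])
nets = build_stage_networks(hyperedges, weights, ["primary", "metastatic"])
print(sorted(nets["primary"]["V"]))  # ['GENE1', 'GENE2', 'miR-A']
```

The first two hyperedges share miR-A and GENE2, so they merge into one connected component of the "primary" network. The third goes to "metastatic" because its weight is highest there.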
References
1. Jemal A, Siegel R, Xu J, Ward E: Cancer statistics, 2010. CA Cancer J Clin. 2010, 60 (5): 277-300. 10.3322/caac.20073.
2. Hartwell LH, Hopfield JJ, Leibler S, Murray AW: From molecular to modular cell biology. Nature. 1999, 402: 47-52. 10.1038/46972.
3. Klamt S, Haus U, Theis F: Hypergraphs and cellular networks. PLoS Comput Biol. 2009, 5 (5): e1000385. 10.1371/journal.pcbi.1000385.
4. Coppola V, Maria RD, Bonci D: MicroRNAs and prostate cancer. Endocr Relat Cancer. 2010, 17: F1-F17. 10.1677/ERC-09-0172.
5. Pang Y, Young CY, Yuan H: MicroRNAs and prostate cancer. Acta Biochim Biophys Sin. 2010, 42: 363-369. 10.1093/abbs/gmq038.
6. Gordanpour A, Nam RK, Sugar L, Seth A: MicroRNAs in prostate cancer: from biomarkers to molecularly-based therapeutics. Prostate Cancer Prostatic Dis. 2012, 15: 314-319. 10.1038/pcan.2012.3.
7. Watahiki A, Wang Y, Morris J, Dennis K, O'Dwyer HM, Gleave M, Gout PW, Wang Y: MicroRNAs associated with metastatic prostate cancer. PLoS One. 2011, 6 (9): e24950. 10.1371/journal.pone.0024950.
8. Schaefer A, Jung M, Mollenkopf HJ, Wagner I, Stephan C, Jentzmik F, Miller K, Lein M, Kristiansen G, Jung K: Diagnostic and prognostic implications of microRNA profiling in prostate carcinoma. Int J Cancer. 2010, 126: 1166-1176.
9. Hornberg JJ, Bruggeman FJ, Westerhoff HV, Lankelma J: Cancer: a systems biology disease. Biosystems. 2006, 83: 81-90. 10.1016/j.biosystems.2005.05.014.
10. Wang E, Lenferink A, O'Connor-McCourt M: Cancer systems biology: exploring cancer-associated genes on cellular networks. Cell Mol Life Sci. 2007, 64 (14): 1752-1762. 10.1007/s00018-007-7054-6.
11. Liu ZP, Wang Y, Zhang XS, Chen L: Network-based analysis of complex diseases. IET Syst Biol. 2012, 6 (1): 22-33. 10.1049/iet-syb.2010.0052.
12. Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N: Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet. 2003, 34 (2): 166-176. 10.1038/ng1165.
13. Bar-Joseph Z, Gerber GK, Lee TI, Rinaldi NJ, Yoo JY, Robert F, Gordon DB, Fraenkel E, Jaakkola TS, Young RA, Gifford DK: Computational discovery of gene modules and regulatory networks. Nat Biotechnol. 2003, 21 (11): 1337-1342. 10.1038/nbt890.
14. Lee WP, Tzou WS: Computational methods for discovering gene networks from expression data. Brief Bioinform. 2009, 10 (4): 408-423.
15. Wang E: RNA technologies in cardiovascular medicine and research. Edited by: Erdmann VA, Poller W, Barciszewski J. 2008, Germany: Springer, 69-86.
16. Bandyopadhyay S, Mitra R, Maulik U, Zhang MQ: Development of the human cancer microRNA network. Silence. 2010, 1 (1): 6. 10.1186/1758-907X-1-6.
17. Volinia S, Galasso M, Costinean S, Tagliavini L, Gamberoni G, Drusco A, Marchesini J, Mascellani N, Sana ME, Jarour RA, Desponts C, Teitell M, Baffa R, Aqeilan R, Iorio V, Taccioli C, Garzon R, Leva GD, Fabbri M, Catozzi M, Previati M, Ambs S, Palumbo T, Garofalo M, Veronese A, Bottoni A, Gasparini P, Harris C, Visone R, Pekarsky P, Chapelle A, Bloomston M, Dillhoff M, Rassenti LZ, Kipps TJ, Huebner K, Pichiorri F, Lenze D, Cairo S, Buendia M, Pineau P, Dejean A, Zanesi N, Rossi S, Calin GA, Liu C, Palatini J, Negrini M, Vecchione A, Rosenberg A, Croce CM: Reprogramming of miRNA networks in cancer and leukemia. Genome Res. 2010, 20: 589-599. 10.1101/gr.098046.109.
18. Satoh J, Tabunoki H: Comprehensive analysis of human microRNA target networks. BioData Mining. 2011, 4: 17. 10.1186/1756-0381-4-17.
19. Liu B, Li J, Cairns MJ: Identifying miRNAs, targets and functions. Brief Bioinform. 2012. 10.1093/bib/bbs075.
20. Muniategui A, Pey J, Planes FJ, Rubio A: Joint analysis of miRNA and mRNA expression data. Brief Bioinform. 2012. 10.1093/bib/bbs028.
21. Yoon S, De Micheli G: Prediction of regulatory modules comprising microRNAs and target genes. Bioinformatics. 2005, 21 (Suppl. 2): ii93-100.
22. Huang J, Morris Q, Frey B: Research in Computational Molecular Biology. Detecting microRNA targets by linking sequence, microRNA and gene expression data. 2006, 114-129.
23. Joung JG, Hwang KB, Nam JW, Kim SJ, Zhang BT: Discovery of microRNA-mRNA modules via population-based probabilistic learning. Bioinformatics. 2007, 23 (9): 1141-1147. 10.1093/bioinformatics/btm045.
24. Joung JG, Fei Z: Identification of microRNA regulatory modules in Arabidopsis via a probabilistic graphical model. Bioinformatics. 2009, 25 (3): 387-393. 10.1093/bioinformatics/btn626.
25. Liu B, Li J, Tsykin A, Liu L, Gaur AB, Goodall GJ: Exploring complex miRNA-mRNA interactions with Bayesian networks by splitting-averaging strategy. BMC Bioinformatics. 2009, 10 (1): 408. 10.1186/1471-2105-10-408.
26. Bonnet E, Michoel T, Van de Peer Y: Prediction of a gene regulatory network linked to prostate cancer from gene expression, microRNA and clinical data. Bioinformatics. 2010, 26 (18): 638-644. 10.1093/bioinformatics/btq395.
27. Bonnet E, Tatari M, Joshi A, Michoel T, Marchal K, Berx G, Van de Peer Y: Module network inference from a cancer gene expression data set identifies microRNA regulated modules. PLoS One. 2010, 5 (4): e10162. 10.1371/journal.pone.0010162.
28. Liu B, Liu L, Tsykin A, Goodall GJ, Green JE, Zhu M, Kim CH, Li J: Identifying functional miRNA-mRNA regulatory modules with correspondence latent Dirichlet allocation. Bioinformatics. 2010, 26 (24): 3105-3111. 10.1093/bioinformatics/btq576.
29. Tran D, Satou K, Ho T: Finding microRNA regulatory modules in human genome using rule induction. BMC Bioinformatics. 2008, 9 (Suppl. 12): S5.
30. Liu B, Li J, Tsykin A: Discovery of functional miRNA-mRNA regulatory modules with computational methods. J Biomed Inform. 2009, 42 (4): 685-691. 10.1016/j.jbi.2009.01.005.
31. Zhang S, Li Q, Liu J, Zhou XJ: A novel computational framework for simultaneous integration of multiple types of genomic data to identify microRNA-gene regulatory modules. Bioinformatics. 2011, 27 (13): i401-409. 10.1093/bioinformatics/btr206.
32. Peng X, Li Y, Walters KA, Rosenzweig ER, Lederer SL, Aicher LD, Proll S, Katze MG: Computational identification of hepatitis C virus associated microRNA-mRNA regulatory modules in human livers. BMC Genomics. 2009, 10 (1): 373. 10.1186/1471-2164-10-373.
33. Nunez-Iglesias J, Liu CC, Morgan TE, Finch CE, Zhou XJ: Joint genome-wide profiling of miRNA and mRNA expression in Alzheimer’s disease cortex reveals altered miRNA regulation. PLoS One. 2010, 5 (2): e8898. 10.1371/journal.pone.0008898.
34. Lu Y, Zhou Y, Qu W, Deng M, Zhang C: A Lasso regression model for the construction of microRNA-target regulatory networks. Bioinformatics. 2011, 27 (17): 2406-2413. 10.1093/bioinformatics/btr410.
35. Zhang W, Edwards A, Fan W, Flemington EK, Zhang K: MiRNA-mRNA correlation-network modules in human prostate cancer and the differences between primary and metastatic tumor subtypes. PLoS One. 2012, 7 (6): e40130. 10.1371/journal.pone.0040130.
36. Taylor BS, Schultz N, Hieronymus H, Gopalan A, Xiao Y, Carver BS, Arora VK, Kaushik P, Cerami E, Reva B, Antipin Y, Mitsiades N, Landers T, Dolgalev I, Major JE, Wilson M, Socci ND, Lash AE, Heguy A, Eastham JA, Scher HI, Reuter VE, Scardino PT, Sander C, Sawyers CL, Gerald WL: Integrative genomic profiling of human prostate cancer. Cancer Cell. 2010, 18: 11-22. 10.1016/j.ccr.2010.05.026.
37. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH: The WEKA data mining software: an update. SIGKDD Explor. 2009, 11 (1): 10-18. 10.1145/1656274.1656278.
38. Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T: Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011, 27 (3): 431-432. 10.1093/bioinformatics/btq675.
39. Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, Li M, Wang G, Liu Y: miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009, 37: D98-104. 10.1093/nar/gkn714.
40. Higgins ME, Claremont M, Major JE, Sander C, Lash AE: CancerGenes: a gene selection resource for cancer genome projects. Nucleic Acids Res. 2007, 35 (Database issue): D721-D726.
41. Zhang HM, Chen H, Liu W, Liu H, Gong J, Wang H, Guo AY: AnimalTFDB: a comprehensive animal transcription factor database. Nucleic Acids Res. 2012, 40 (Database issue): D144-D149.
42. Triulzi T, Iorio MV, Tagliabue E, Casalini P: MicroRNA: new players in metastatic process. Oncogene and Cancer - From Bench to Clinic. Edited by: Siregar Y. 2013, InTech, 391-414.
43. Dasgupta S, Srinidhi S, Vishwanatha JK: Oncogenic activation in prostate cancer progression and metastasis: molecular insights and future challenges. J Carcinog. 2012, 11 (1): 4. 10.4103/1477-3163.93001.
44. Betel D, Koppal A, Agius P, Sander C, Leslie C: mirSVR predicted target site scoring method: Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites. Genome Biol. 2010, 11: R90. 10.1186/gb-2010-11-8-r90.
45. Esquela-Kerscher A, Slack FJ: Oncomirs: microRNAs with a role in cancer. Nat Rev Cancer. 2006, 6: 259-269.
46. Kojima S, Chiyomaru T, Kawakami K, Yoshino H, Enokida H, Nohata N, Fuse M, Ichikawa T, Naya Y, Nakagawa M, Seki N: Tumour suppressors miR-1 and miR-133a target the oncogenic function of purine nucleoside phosphorylase (PNP) in prostate cancer. Br J Cancer. 2012, 106 (2): 405-413. 10.1038/bjc.2011.462.
47. Peng X, Guo W, Liu T, Wang X, Tu X, Xiong D, Chen S, Lai Y, Du H, Chen G, Liu G, Tang Y, Huang S, Zou X: Identification of miRs-143 and -145 that is associated with bone metastasis of prostate cancer and involved in the regulation of EMT. PLoS One. 2011, 6 (5): e20341. 10.1371/journal.pone.0020341.
48. Galardi S, Mercatelli N, Giorda E, Massalini S, Frajese GV, Ciafrè SA, Farace MG: miR-221 and miR-222 expression affects the proliferation potential of human prostate carcinoma cell lines by targeting p27Kip1. J Biol Chem. 2007, 282 (32): 23716-23724. 10.1074/jbc.M701805200.
49. Vrba L, Jensen TJ, Garbe JC, Heimark RL, Cress AE, Dickinson S, Stampfer MR, Futscher BW: Role for DNA methylation in the regulation of miR-200c and miR-141 expression in normal and cancer cells. PLoS One. 2010, 5 (1): e8697. 10.1371/journal.pone.0008697.
50. Scharer CD, McCabe CD, Ali-Seyed M, Berger MF, Bulyk ML, Moreno CS: Genome-wide promoter analysis of the SOX4 transcriptional network in prostate cancer cells. Cancer Res. 2009, 69 (2): 709-717. 10.1158/0008-5472.CAN-08-3415.
51. Abdulghani J, Gu L, Dagvadorj A, Lutz J, Leiby B, Bonuccelli G, Lisanti MP, Zellweger T, Alanen K, Mirtti T, Visakorpi T, Bubendorf L, Nevalainen MT: STAT3 promotes metastatic progression of prostate cancer. Am J Pathol. 2008, 172 (6): 1717-1728. 10.2353/ajpath.2008.071054.
52. Gu L, Vogiatzi P, Puhr M, Dagvadorj A, Lutz J, Ryder A, Addya S, Fortina P, Cooper C, Leiby B, Dasgupta A, Hyslop T, Bubendorf L, Alanen K, Mirtti T, Nevalainen MT: STAT5 promotes metastatic behavior of human prostate cancer cells in vitro and in vivo. Endocr Relat Cancer. 2010, 17 (2): 481-493. 10.1677/ERC-09-0328.
53. Haghikia A, Hoch M, Stapel B, Hilfiker-Kleiner D: STAT3 regulation of and by microRNAs in development and disease. JAK-STAT. 2012, 1 (3): 143-105. 10.4161/jkst.19573.
54. Steele R, Mott JL, Ray RB: MBP-1 upregulates miR-29b that represses Mcl-1, collagens, and matrix-metalloproteinase-2 in prostate cancer cells. Genes Cancer. 2010, 1 (4): 381-387. 10.1177/1947601910371978.
55. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P, Mesirov JP: Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011, 27 (12): 1739-1740. 10.1093/bioinformatics/btr260.
56. Kypta RM, Waxman J: Wnt/β-catenin signalling in prostate cancer. Nat Rev Urol. 2012, 9: 418-428. 10.1038/nrurol.2012.116.
57. Schwartz S, Caceres C, Morote J, De Torres I, Rodriguez-Vallejo JM, Gonzalez J, Reventos J: Gains of the relative genomic content of ErbB1 and ErbB2 in prostate carcinoma and their association with metastasis. Int J Oncol. 1999, 14 (2): 367-371.
58. Raychaudhuri P, Park HJ: FoxM1: a master regulator of tumor metastasis. Cancer Res. 2011, 71 (13): 4329-4333. 10.1158/0008-5472.CAN-11-0640.
59. Wolfer A, Ramaswamy S: MYC and metastasis. Cancer Res. 2011, 71 (6): 2034-2037. 10.1158/0008-5472.CAN-10-3776.
60. Friedman N: Inferring cellular networks using probabilistic graphical models. Science. 2004, 303 (5659): 799-805. 10.1126/science.1094068.
61. Ivan A, Halfon M, Sinha S: Computational discovery of cis-regulatory modules in Drosophila without prior knowledge of motifs. Genome Biol. 2008, 9 (1): R22. 10.1186/gb-2008-9-1-r22.
62. Zhang BT: Hypernetworks: A molecular evolutionary architecture for cognitive learning and memory. IEEE Computational Intelligence Magazine. 2008, 3 (3): 49-63.
63. Kim SJ, Ha JW, Zhang BT: Proceedings of IEEE World Congress on Computational Intelligence. Evolutionary layered hypernetworks for identifying microRNA-mRNA regulatory modules. 2010, (WCCI-CEC 2010), 2299-2306.
64. Buhmann MD: Cambridge Monographs on Applied and Computational Mathematics (Vol. 12). Radial basis functions: theory and implementations. 2003, Cambridge University Press.
65. Kraskov A, Stögbauer H, Grassberger P: Estimating mutual information. Phys Rev E. 2004, 69 (6): 066138.
66. Hu H, Yan X, Huang Y, Han J, Zhou XJ: Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics. 2005, 21 (Suppl 1): i213-i221. 10.1093/bioinformatics/bti1049.
67. Mason O, Verwoerd M: Graph theory and networks in biology. IET Syst Biol. 2007, 1 (2): 89-119. 10.1049/iet-syb:20060038.
68. Yan X, Mehan MR, Huang Y, Waterman MS, Yu PS, Zhou XZ: A graph-based approach to systematically reconstruct human transcriptional regulatory modules. Bioinformatics. 2007, 23 (13): i577-i586. 10.1093/bioinformatics/btm227.
69. Ramadan E, Perincheri P, Tuck D: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology. A hypergraph approach for analyzing transcriptional networks in breast cancer. 2010, 556-562.
Acknowledgements
This work was supported by National Research Foundation (NRF) grants funded by the Korea government (MSIP) (NRF20100017734, NRF2013M3B5A2035921, and the Bio & Medical Technology Development Program, No. 2012M3A9D1054622), by KEIT grants funded by the Korea government (MKE) (KEIT10035348 and KEIT10044009), and by an AOARD R&D grant funded by AFOSR (124087).
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
SJK proposed the idea, analyzed the data, and wrote the manuscript. JWH implemented the method and performed the computational experiments. BTZ supervised the study and revised the manuscript. All authors read and approved the final manuscript.
Keywords
 miRNA-mRNA interaction networks
 Hypergraph-based model
 Higher-order gene modules
 Evolutionary learning
 Cancer genomics data analysis