- Proceedings
- Open Access
- Published:

# Investigating the utility of clinical outcome-guided mutual information network in network-based Cox regression

*BMC Systems Biology*
**volume 9**, Article number: S8 (2015)

## Abstract

### Background

Network-based approaches have recently gained considerable popularity in high- dimensional regression settings. For example, the Cox regression model is widely used in expression analysis to predict the survival of patients. However, as the number of genes becomes substantially larger than the number of samples, the traditional Cox or *L*_{2}-regularized Cox models are still prone to noise and produce unreliable estimations of regression coefficients. A recent approach called the network-based Cox (Net-Cox) model attempts to resolve this issue by incorporating prior gene network information into the Cox regression. The Net-Cox model has shown to outperform the models that do not use this network information.

### Results

In this study, we demonstrate an alternative network construction method for the outcome-guided gene interaction network, and we investigate its utility in survival analysis using Net-Cox regression as compared with conventional networks, such as co-expression or static networks obtained from the existing knowledgebase. Our network edges consist of gene pairs that are significantly associated with the clinical outcome. We measure the strength of this association using mutual information between the gene pair and the clinical outcome. We applied this approach to ovarian cancer patients' data in The Cancer Genome Atlas (TCGA) and compared the predictive performance of the proposed approach with those that use other types of networks.

### Conclusions

We found that the alternative outcome-guided mutual information network further improved the prediction power of the network-based Cox regression. We expect that a modification of the network regularization term in the Net-Cox model could further improve its prediction power because the properties of our network edges are not optimally reflected in its current form.

## Background

The Cox regression model [1] has been commonly used in survival analysis to detect important features and to predict patient survival. Due to the advance of sequencing technology, the number of genes or features in these analyses is becoming substantially larger than the number of samples. Although several variant models, such as *L*_{1} −regularization [2] or *L*_{2} −regularization in Hilbert space [3], have been proposed to resolve this issue, those variant models are still prone to noise and over-fitting [4]. Recently, network-based approaches have gained considerable popularity for expression quantitative trait loci (eQTL) studies and clinical outcome predictions in high-dimensional regression settings. These approaches incorporate prior network information of either the features [5] or outcomes [6], or both [7, 8]. For survival analysis, Zhang *et al*. [9] recently proposed a variant of the *L*_{2} −regularized Cox model called the Net-Cox model, which is a network-regularized Cox regression model. The typical types of networks used in such approaches are either coexpression networks, which are constructed by computing the correlation between every pair of variables, or static networks, such as the protein-protein interaction (PPI) network, which can be obtained from the existing knowledgebase. In the Net-Cox model, co-expression and functional linkage networks were incorporated in the survival analysis, and the results showed enhanced performance when compared with the conventional methods, which do not use the network information.

A potential limitation of these conventional networks is that the edges only reflect the information of within-features or within-outcomes relations, and do not consider the association between features and outcomes, which may be useful in improving the predictive power. In this study, we show that the outcome-guided mutual information network improves the performance of survival analysis in the Net-Cox regression [9]. We demonstrate the utility of this outcome-guided gene network with the analysis of a TCGA ovarian cancer dataset and compare its performance with those of survival analyses that use other types of networks.

## Methods

### Background on survival analysis using Cox regression

First, we describe the basic formulation of Cox regression [1] for survival analysis using expression and survival data. Given a gene expression profile *X*, which consists of *n* patients and *p* genes, the risk of an event at time *t* for the *i*th patient with gene expression X_{
i
} = (X_{i1} ,...,*X*_{
ip
})′ is defined as h\left(t|{X}_{i}\right)={h}_{0}\left(t\right){e}^{{{X}^{\prime}}_{i}\beta}, where *β* = (*β*_{1},...,*β*_{
p
})′ is a regression coefficient vector, and *h*_{0}(*t*) denotes the baseline hazard function at *t*. The coefficient *β* and the function *h*_{0} are generally unknown and need to be estimated. In a traditional Cox regression model, the estimation of *β* is based on the maximization of the partial log-likelihood, *pl*(*β*):

Here, *δ*_{
i
} is the observed status (*δ*_{
i
} = 1 implies observed; *δ*_{
i
} = 0 implies censored), *t*_{
i
} represents the event time of the *i*th patient, and *R*(*t*_{
i
}) is the subset of patients who survived to time *t*_{
i
}. Once the coefficient vector \widehat{\beta} is obtained, a Breslow estimator can estimate the baseline hazard function *h*_{0} as follows:

### Regularized Cox regression in high-dimensional setting

When *p* ≫ 1, the Cox regression model is prone to noise and tends to produce unreliable estimations of regression coefficients. Several solutions that shrink the coefficients have been proposed. A common solution is the *L*_{2} −Cox model, which uses the penalized total log-likelihood:

where \lambda {\sum}_{j=1}^{p}{\beta}_{j}^{2} is the regularization term and *λ* controls the amount of shrinkage. Net-Cox regression is an extension of the *L*_{2} −Cox model and uses the following penalized log-likelihood:

in which *λβ*′Γ*β* is the penalty term and Γ = (1 - α) *L* + *αI* conveys prior network information between genes. Here, *L* = *I* − *S*, where *I* is the identity matrix and *S* is a normalized Laplacian matrix, which comes from the gene network.

The parameter α ∈ (0,1] controls the contribution of network information to the model. When α is small, more network information is incorporated into the penalty term than when α is large. We note that when α = 1, the network-based cox regression model reduces to the *L*_{2} −Cox model that does not use any network information.

The typical network used for this type of network-based approach is the coexpression network, in which the edge weights correspond to the correlation between expression vectors of two genes [6]. In [9], a functional linkage network was constructed from the existing knowledgebase. We note that both types of networks do not reflect information about the relation between the features and the outcomes inferred from a given dataset. Therefore, we consider an alternative approach of constructing and utilizing outcome-guided mutual information network in survival analysis. The detailed procedures are explained in the following sections.

### Gene interaction networks associated with clinical outcome

Mutual information is an information-theoretic measure that reveals the dependency or association between two random variables [10] and is defined as follows:

where *H*(*X*) and *H*(*Y*) denote the entropies of the respective variables, *X* and *Y*, and *H*(*X*, *Y*) denotes the joint entropy of *X* and *Y*. It can be used to detect both linear and nonlinear relations between two random variables [11]. While many previous studies generally used mutual information to detect dependency between two features of the same type, more recent approaches in [12, 13] have also extended the mutual information to assess the association between expressions of two genomic features, *X*_{1} and *X*_{2}, and the clinical outcome *Y* as follows:

Note that genomic feature values, such as gene expressions or clinical outcome values, are often numeric. To compute the mutual information, we discretize the genomic feature values using a histogram-based technique as in [14]. In the case of survival data, a simple thresholding scheme converts the outcome variable of the survival month to a binary variable. In the case of the TCGA dataset used in this study, we split the patients into short-term living (≤ 3 years) and long-term living (> 3 years). Patients who are reported as *living* and have an overall survival time less than 3 years are filtered out of this study.

Computation of mutual information for every pair of genes and the clinical outcome variable produce a complete network between genes. We can further filter out less significant edges by using the permutation testing scheme proposed in [14]. Through the repeated permutation of clinical outcome labels and the re-computation of mutual information values, a threshold θ is defined as the maximum of the average mutual information values for each gene pair. This θ can be used as a base threshold to remove insignificant edges. However, this threshold still leaves a large number of edges that may not be fully beneficial to the downstream analysis. In this study, another parameter *σ*, which amplifies the significance level by a factor of (1 + *σ*), was proposed to create a stricter cut-off. In particular, the filtered mutual information network with significance level *σ* is defined by

where *P* is the set of all genes, *Y* denotes the binary survival status of patients, and *I*(*g*_{
i
}, *g*_{
j
}; *Y*) is the mutual information of the gene pair (*g*_{
i
}, *g*_{
j
} ) and *Y*. We adopt this thresholding scheme and test the performance of different parameter values. In order to apply the network to the Net-Cox model, we also normalize G_{
σ
}and create a normalized Laplacian matrix \mathsf{\text{S}}={\mathsf{\text{R}}}^{-\frac{1}{2}}{G}_{\sigma}{C}^{-\frac{1}{2}}, where {\mathsf{\text{R}}}_{ii}={\displaystyle {\sum}_{j}}{G}_{{\sigma}_{ij}} and {\mathsf{\text{C}}}_{ii}={\sum}_{j}{G}_{{\sigma}_{ij}}^{\prime} both of which are diagonal matrices.

### Dataset

We use genomic profiles and clinical outcome data from patients with *ovarian serous cystadenocarcinoma* in TCGA. Genomic profiles of copy number alterations (CNAs), messenger RNA (mRNA), and methylation (METH) are used in the experiments. We remove genes or patients with missing values and extract genes that are common to all three profiles. As a result, our data matrix consists of expressions and alterations of 10,022 genes and 340 patients for each of the three profiles.

### Evaluation measure

We compare the performance of the Net-Cox model using the proposed mutual information networks with those of models using a gene co-expression and gene functional linkage network as seen in [9]. In order to create a baseline, an analysis of the *L*_{2} − Cox model, which does not use any network information, is performed. This is the same as setting α = 1 in the Net-Cox model.

With each run of the Net-Cox analysis, the prognostic indices PI={X}^{\prime}\widehat{\beta} are calculated as prediction markers, where *X* is the test data not used in training, and the regression coefficient \widehat{\beta} is estimated from given training data, network information, and the parameters. We evaluate the prediction performance using a time-dependent area under the curve (AUC) [15].

Specifically, time-dependent sensitivity and specificity functions are defined as follows:

in which *c* is the cut-off point, *t* is the survival time, *f*(*X*) are the prognostic indices, f\left(X\right)={\mathsf{\text{X}}}^{\prime}\betaand *δ*(*t*) is the event indicator at time *t* [16]. Upon examination of *sensitivity*[*c*, *t*|*f*(*X*)] and 1 - *specificity*[*c*, *t*|*f*(*X*)], we can define ROC[*t*|*f*(*X*)] as receiving operating characteristic (ROC) curves at any time *t* and AUC[*t*|*f* (*X*) ] as the area under the ROC curves at any time *t*. The larger AUC[*t*|*f*(*X*)] is, the better our prediction model performs at time *t*.

In addition, we evaluate the performance, using a validation set and the log-rank test [17], to analyze whether the patients are properly classified: high-risk or low-risk. We rank patients in descending order by their prognostic indices (*PI)* and divide them into a high-risk group containing the top 40% of patients and a low-risk group containing the bottom 40%. We then test the validity of this group assignment by using the log-rank test and the true survival information.

### Cross validation

We select the optimal parameters, λ and α, that respectively control the degree of sparsity and the amount of network constraint in the Net-Cox model using a 5-fold cross validation process. We reserve 20% of all the samples in this study for validation (the validation set) and use the remaining 80% (the training set) for cross validation in order to optimize the parameters and check the cross-validation error. In the case of mutual information networks, we also choose the optimal *σ* using cross-validation. The average time-dependent AUC is used as an evaluation measure for each genomic profile and the different kinds of networks.

We trained each model five times using a 5-fold cross validation process on the training set. At each run, we estimate the regression coefficient \widehat{\beta} and the mutual information network (or co-expression network) on four folds of the training data. The estimated coefficient \widehat{\beta} and the remaining fold, consisting of 20% of the training data, are then used to calculate the prognostic indices *PI* = *X*′*β*, which are, in turn, used to rank the patients according to their expected survival time, or month.

To examine how the mutual information network filtered by significance level *σ* contributes to the performance of the Net-Cox model, we experiment with the complete mutual information network and those networks filtered by *σ*, where *σ* = [0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3] for each profile. This corresponds to eight mutual information networks on each profile, including the complete network. We vary the parameters λ = [10^{−4},10^{−3}, 10^{−2}, 10^{−1}] and α = [0.1, 0.3, 0.5, 0.7, 0.9, 1]. As a result, we choose the optimal (λ, α) pair with the best performance, in terms of the time-dependent ROC curves, for each network and each profile.

### Enrichment analysis

In order to assess the biological significance of the results, we performed a gene list enrichment analysis for the 100 genes with the largest regression coefficients for each profile using *ToppGene* http://toppgene.cchmc.org[18]. The enrichment analysis is based on Gene Ontology [19] and Pathway and Disease [20]. The terms that have adjusted *p*-values under 0.05 when using the Benjamini-Hochberg correction [21] are considered biologically significant. We also ran the enrichment test on the genes in the network consisting of the 100 edges with the largest mutual information values for each profile.

## Results and discussion

### Optimal parameter selection using cross validation

We first examined the performance behavior of the mutual information network based Net-Cox model as a function of the parameter *σ* (Figure 1). Note that as the value of *σ* increases, fewer edges remain in the network. In Figure 1, the leftmost *x*-tick label "comp" represents the result obtained using the complete network, and the following *x*-ticks correspond to different *σ* values. Overall, the best performance resulted not from the complete network but from the network filtered by *σ*. Therefore, reducing less significant information and using only significant edges seem to improve the performance of the network-regularized regression model. When the CNA and methylation profiles were examined, the best result was obtained using the largest test value of *σ* = 0.3, for which 9,896 and 6,011 edges, respectively, remained, which was about 0.01% of the total number of pairs. In contrast, the mRNA profile preferred a larger network consisting of 568,486 edges (about 1%) with the optimum at *σ* = 0.1.

We also optimized the parameters *λ* and *α* using a 5-fold cross validation process (Table 1). We found that the optimal *α* value for the mutual information network was smaller than those of the co-expression and functional linkage networks. This implies that the mutual information network contributed more to the regression model than the other approaches. Moreover, the average time-dependent AUC was highest when the mutual information based network was used in all profiles. However, we also note that the median time-dependent AUC was not always highest for the mutual information network, which may imply complementary properties of different types of networks. In Figure 2, the bar plot for the average time-dependent AUC and the boxplot for the distribution of the time-dependent AUC across all the time points and the 5-fold experiments are shown.

### Performance comparison on validation set

After selection of the best parameters for each network, we re-trained each model using the entire training set and then applied the obtained regression model to the holdout validation set. Figure 3 shows a comparison between the resulting time-dependent AUCs. The mutual information based network had the best performance for the mRNA profile and comparable results for the other profiles. Figure 4 shows the Kaplan-Meier survival curves and *p*-values from the log-rank tests with respect to the patient group assignment for each approach. Examining the CNA and mRNA profiles, every method revealed significant results (*p*-value < 0.05) with the log-rank test, except for the functional linkage network on the CNA profile. However, all the methods showed insignificant results for the methylation profile. It appears that the interaction effect in the methylation profile is substantially less than in other profiles.

### Signature genes for each profile

In order to examine the genes that have the strongest marginal association with survival, we displayed the five largest regression coefficients in each profile as a heatmap (Figure 5). The top five genes from each profile were all distinct with no overlap, but some genes had large coefficients in multiple profiles. The top genes from the CNA profile had smaller overall regression coefficients than the top genes in other profiles. Moreover, the top genes in the CNA profile had larger regression coefficients than those in other profiles, which suggests that the roles of these genes are more prominent in other genomic levels.

We find that the identified genes are associated with ovarian cancer in many previous studies. The gene with the largest coefficient in the CNA profile was *URI1, prefoldin-like chaperone* (URI1), which is reported to be an oncogene that amplified in ovarian cancer cells [22]. *Dual-Specificity Tyrosine-(Y)-Phosphorylation Regulated Kinase 1B* (DYRK1B), which is over-expressed in a wide spectrum of ovarian cancer cell lines and human specimens [23], was the fourth highest gene. The largest coefficient gene from the mRNA profile, *Oviductal Glycoprotein 1* (OVGP1), is reported to be a more accreted detection marker than other markers of ovarian epithelial cancers [24]. Over-expression of the *Elafin/Peptidase Inhibitor 3* (PI3) gene from the mRNA profile is associated with poor overall survival [25]. We found no reported associations of the top five genes in the methylation profile with ovarian cancer from previous studies.

An enrichment analysis was performed for the 100 genes having the largest coefficients in each profile (Table 2). Many significantly enriched terms demonstrated an association with ovarian cancer. For example, the genes URI1, *Processing Of Precursor 4* (POP4), *Pleckstrin Homology Domain Containing, Family F Member 1*(PLEKHF1), and DYRK1B from the CNA profile were enriched in the Ovarian Neoplasms term (ctd:D010051) of Disease. For Gene Ontology, the top 100 genes in the CNA profile were primarily enriched in the Biological Process (BP) terms. The Hox family of homeobox genes was enriched in the terms related with the embryonic skeletal system (GO:0048706; GO:0048704), and these genes are critical for cell migration and DNA repair [26]. The genes in the mRNA profile that were enriched in the Chitinases-related terms were already known for their cancer indication roles [27]. Therefore, we can conclude that the genes with large regression coefficients are expected to be related to ovarian cancer.

### Network analysis

To illustrate the general topology of the mutual information network and its effect on prediction performances more closely, we constructed gene interaction sub-networks by using the 100 edges with the largest mutual information values for each profile (Figure 6). The color of each node represents the strength of its marginal effect. We also see many genes with weak marginal effects appear in the network. Topologies of the networks were analyzed by Cytoscape [28] and summarized in Table 3. The three networks reveal different network structures and topologies. The CNA network consists of a smaller number of connected components than the others (7 versus 54 and 50). It also shows a denser connection between the genes. The network centralization value is about five times higher, and its average number of neighbors is about twice as much. In addition, the overall coefficients of the genes in the CNA network were smaller than those in the other profiles. Considering that those genes have high mutual information values and, hence, strong interactive effects, this may imply that the interaction effect on survival in the CNA profile is more dominant than the marginal effect with each gene. The *R*^{2} value for the power-law distribution of the mRNA and methylation networks were 0.922 and 0.909, respectively, which shows strong scale freeness [29], as is the case with many other biological networks [30–33].

Table 4 summarizes the enrichment test results for the genes in each sub-network. The network from the methylation profile was mainly enriched in GO Molecular Function (MF) terms. Networks for the CNA and mRNA profiles were not enriched in GO terms but in terms of diseases related to ovarian cancer, such as Neoplasms (ctd:D009369), Carcinoma (ctd:D002277), and Neoplasm Recurrence, Local (ctd:D009364). Further, the CNA profile network was enriched in other related disease terms, such as Obesity (ctd:D009756) and BMI (Body Mass Index) [34], and Insulin Resistance (ctd:D007333) [35] and Hyperlipidaemia (ctd:D006949) [36].

## Conclusions

In this study, we investigated the utility of an alternative network construction approach based on mutual information in network-based Cox regression. Our results show that the mutual information based network can further improve prediction performance in survival analyses. Moreover, the permutation testing scheme used to discard insignificant pairs improved the prediction performance.

Overall, the performance gain of this alternative approach over existing methods was rather marginal. It seems due to a mismatch between the high mutual information value and the small value of the penalty term \left(\psi \left(\beta \right)\right)=\frac{1}{2}{\sum}_{i,j}^{p}{S}_{i,j}{\left({\beta}_{i}-{\beta}_{j}\right)}^{2}) in the Net-Cox model--it does not necessarily mean that the gene pairs containing high mutual information with respect to survival should have similar marginal effects because the mutual information measure is more concerned with the interaction effect. Even with this discrepancy, results based on the mutual information network are still promising. In future studies, the regularization term could be modified to better reflect the information contained in the mutual information network and, hence, further improve the performance. Another direction would be to apply this network scheme to network-based approaches in other domains.

## References

Cox D: Regression Models and Life-Tables. Breakthroughs in Statistics. Edited by: Kotz S, Johnson N. 1992, Springer New York, 527-541.

Tibshirani R: The lasso method for variable selection in the Cox model. Statistics in medicine. 1997, 16 (4): 385-395. 10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3.

Li H, Luan Y: Kernel Cox regression models for linking gene expression profiles to censored survival data. Pacific Symposium on Biocomputing. 2002, 65-

Gui J, Li H: Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics. 2005, 21 (13): 3001-3008. 10.1093/bioinformatics/bti422.

Li C, Li H: Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics. 2008, 24 (9): 1175-1182. 10.1093/bioinformatics/btn081.

Kim S, Sohn K-A, Xing EP: A multivariate regression approach to association analysis of a quantitative trait network. Bioinformatics. 2009, 25 (12): i204-i212. 10.1093/bioinformatics/btp218.

Cheng W, Zhang X, Guo Z, Shi Y, Wang W: Graph-regularized dual Lasso for robust eQTL mapping. Bioinformatics. 2014, 30 (12): i139-i148. 10.1093/bioinformatics/btu293.

Lee S, Xing EP: Leveraging input and output structures for joint mapping of epistatic and marginal eQTLs. Bioinformatics. 2012, 28 (12): i137-i146. 10.1093/bioinformatics/bts227.

Zhang W, Ota T, Shridhar V, Chien J, Wu B, Kuang R: Network-based survival analysis reveals subnetwork signatures for predicting outcomes of ovarian cancer treatment. PLoS computational biology. 2013, 9 (3): e1002975-10.1371/journal.pcbi.1002975.

Steuer R, Kurths J, Daub CO, Weise J, Selbig J: The mutual information: detecting and evaluating dependencies between variables. Bioinformatics (Oxford, England). 2002, 18 (Suppl 2): S231-240. 10.1093/bioinformatics/18.suppl_2.S231.

Goebel B, Dawy Z, Hagenauer J, Mueller JC: An approximation to the distribution of finite sample size mutual information estimates. IEEE International Conference on Communications, 2005 ICC. 2005, 2: 1102-1106.

Leem S, Jeong H-h, Lee J, Wee K, Sohn K-A: Fast detection of high-order epistatic interactions in genome-wide association studies using information theoretic measure. Computational Biology and Chemistry. 2014

Hu T, Sinnott-Armstrong Na, Kiralis JW, Andrew AS, Karagas MR, Moore JH: Characterizing genetic interactions in human disease association studies using statistical epistasis networks. BMC bioinformatics. 2011, 12: 364-10.1186/1471-2105-12-364.

Butte AJ, Kohane IS: Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pacific Symposium on Biocomputing Pacific Symposium on Biocomputing. 2000, 426: 418-429.

Heagerty PJ, Lumley T, Pepe MS: Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics. 2000, 56 (2): 337-344. 10.1111/j.0006-341X.2000.00337.x.

Li H, Gui J: Partial Cox regression analysis for high-dimensional microarray gene expression data. Bioinformatics. 2004, 20 (Suppl 1): i208-215. 10.1093/bioinformatics/bth900.

Mantel N: Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer chemotherapy reports Part 1. 1966, 50 (3): 163-170.

Chen J, Bardes EE, Aronow BJ, Jegga AG: ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 2009, W305-311. 37 Web Server

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature genetics. 2000, 25: 25-29. 10.1038/75556.

Davis AP, Murphy CG, Johnson R, Lay JM, Lennon-Hopkins K, Saraceni-Richards C, Sciaky D, King BL, Rosenstein MC, Wiegers TC, et al: The Comparative Toxicogenomics Database: update 2013. Nucleic Acids Res. 2013, D1104-1114. 41 Database

Benjamini Y: Discovering the false discovery rate. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2010, 72 (4): 405-416. 10.1111/j.1467-9868.2010.00746.x.

Theurillat JP, Metzler SC, Henzi N, Djouder N, Helbling M, Zimmermann AK, Jacob F, Soltermann A, Caduff R, Heinzelmann-Schwarz V, et al: URI is an oncogene amplified in ovarian cancer cells and is required for their survival. Cancer Cell. 2011, 19 (3): 317-332. 10.1016/j.ccr.2011.01.019.

Gao J, Yang X, Yin P, Hu W, Liao H, Miao Z, Pan C, Li N: The involvement of FoxO in cell survival and chemosensitivity mediated by Mirk/Dyrk1B in ovarian cancer. International journal of oncology. 2012, 40 (4): 1203-1209.

Maines-Bandiera S, Woo MM, Borugian M, Molday LL, Hii T, Gilks B, Leung PC, Molday RS, Auersperg N: Oviductal glycoprotein (OVGP1, MUC9): a differentiation-based mucin present in serum of women with ovarian cancer. Int J Gynecol Cancer. 2010, 20 (1): 16-22. 10.1111/IGC.0b013e3181bcc96d.

Clauss A, Ng V, Liu J, Piao H, Russo M, Vena N, Sheng Q, Hirsch MS, Bonome T, Matulonis U, et al: Overexpression of elafin in ovarian carcinoma is driven by genomic gains and activation of the nuclear factor kappaB pathway and is associated with poor overall survival. Neoplasia. 2010, 12 (2): 161-172.

Liu Y, Devescovi V, Chen S, Nardini C: Multilevel omic data integration in cancer cell lines: advanced annotation and emergent properties. BMC Systems Biology. 2013, 7 (1): 14-10.1186/1752-0509-7-14.

Kzhyshkowska J, Gratchev A, Goerdt S: Human chitinases and chitinase-like proteins as indicators for inflammation and cancer. Biomarker insights. 2007, 2: 128-

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Research. 2003, 13: 2498-2504. 10.1101/gr.1239303.

Zhang B, Horvath S: A general framework for weighted gene co-expression network analysis. Statistical applications in genetics and molecular biology. 2005, 4: Article17-

Jeong H, Tombor B, Albert R, Oltvai ZN, Barabási aL: The large-scale organization of metabolic networks. Nature. 2000, 407: 651-654. 10.1038/35036627.

Diez D, Wheelock AM, Goto S, Haeggström JZ, Paulsson-Berne G, Hansson GK, Hedin U, Gabrielsen A, Wheelock CE: The use of network analyses for elucidating mechanisms in cardiovascular disease. Molecular bioSystems. 2010, 6: 289-304. 10.1039/B912078E.

Carter SL, Brechbühler CM, Griffin M, Bond AT: Gene co-expression network topology provides a framework for molecular characterization of cellular state. Bioinformatics (Oxford, England). 2004, 20: 2242-2250. 10.1093/bioinformatics/bth234.

Reguly T, Breitkreutz A, Boucher L, Breitkreutz B-J, Hon GC, Myers CL, Parsons A, Friesen H, Oughtred R, Tong A, et al: Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. Journal of biology. 2006, 5: 11-10.1186/jbiol36.

Engeland A, Tretli S, Bjørge T: Height, Body Mass Index, and Ovarian Cancer: A Follow-Up of 1.1 Million Norwegian Women. Journal of the National Cancer Institute. 2003, 95 (16): 1244-1248. 10.1093/jnci/djg010.

Arcidiacono B, Iiritano S, Nocera A, Possidente K, Nevolo MT, Ventura V, Foti D, Chiefari E, Brunetti A: Insulin resistance and cancer risk: an overview of the pathogenetic mechanisms. Experimental diabetes research. 2012, 2012: 789174-

Parazzini F, Moroni S, La Vecchia C, Negri E, dal Pino D, Bolis G: Ovarian cancer risk and history of selected medical conditions linked with female hormones. European journal of cancer. 1997, 33 (10): 1634-1637. 10.1016/S0959-8049(97)00011-7.

## Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation (NRF) of Korea funded by the Ministry of Science, ICT, and Future Planning (MSIP) (No. 2010-0028631), and by the Ministry of Education (2012R1A1A2042792).

**Declarations**

The publication costs of this article were funded by Ajou University, S. Korea.

This article has been published as part of *BMC Systems Biology* Volume 9 Supplement 1, 2015: Selected articles from the Thirteenth Asia Pacific Bioinformatics Conference (APBC 2015): Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/9/S1

## Author information

### Affiliations

### Corresponding author

## Additional information

### Competing interests

The authors declare that they have no competing interests.

### Authors' contributions

HJ, KW, and KS designed the study. HJ and SK implemented the idea and performed the experiments. HJ, SK, KW, and KS developed the idea and performed the analysis. HJ, SK, KW, and KS wrote the paper.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

## About this article

### Cite this article

Jeong, Hh., Kim, S.Y., Wee, K. *et al.* Investigating the utility of clinical outcome-guided mutual information network in network-based Cox regression.
*BMC Syst Biol* **9, **S8 (2015). https://doi.org/10.1186/1752-0509-9-S1-S8

Published:

DOI: https://doi.org/10.1186/1752-0509-9-S1-S8

### Keywords

- Mutual Information
- Coexpression Network
- Methylation Profile
- mRNA Profile
- Serous Cystadenocarcinoma