 Proceedings
 Open Access
 Published:
Pathwaygene identification for pancreatic cancer survival via doubly regularized Cox regression
BMC Systems Biology volume 8, Article number: S3 (2014)
Abstract
Background
Recent global genomic analyses identified 69 gene sets and 12 core signaling pathways genetically altered in pancreatic cancer, which is a highly malignant disease. A comprehensive understanding of the genetic signatures and signaling pathways that are directly correlated to pancreatic cancer survival will help cancer researchers to develop effective multigene targeted, personalized therapies for the pancreatic cancer patients at different stages. A previous work that applied a LASSO penalized regression method, which only considered individual genetic effects, identified 12 genes associated with pancreatic cancer survival.
Results
In this work, we integrate pathway information into pancreatic cancer survival analysis. We introduce and apply a doubly regularized Cox regression model to identify both genes and signaling pathways related to pancreatic cancer survival.
Conclusions
Four signaling pathways, including Ion transport, immune phagocytosis, TGFβ (spermatogenesis), regulation of DNAdependent transcription pathways, and 15 genes within the four pathways are identified and verified to be directly correlated to pancreatic cancer survival. Our findings can help cancer researchers design new strategies for the early detection and diagnosis of pancreatic cancer.
Background
Pancreatic cancer [1] is a devastating disease with a very poor prognosis and a fiveyear survival rate around 35%. The most common form of pancreatic cancer is the pancreatic ductal adenocarcinoma (PDAC, a malignant exocrine cancer). In the past 30 years, no substantial progress has been made in PDAC diagnosis and treatment. New techniques and methods to investigate the dynamics of PDAC are urgently needed. Modern microarray technology has revolutionized the way that we study the complex biological systems, allowing pancreatic cancer researchers to make genomewide expression profiling and measure other features for patients in a fast, precise, and costeffective way. One aim of systems biologists is to correctly decipher and interpret the highdimensional complex gene expression data, that is, to identify the key genetic signatures and signaling pathways implicated in the diseases.
Pancreatic cancer is characterized by rapid growth, early local and distant invasion, interactions with stromal cells (e.g., pancreatic stellate cells) [2] and fibrous tissue, and a high resistance to chemotherapy and radiotherapy. The evolution of pancreatic cancer is partially stimulated by the overexpression of several growth factors, cytokines, and genetic alterations [3, 4] at different stages of PDAC. Recent global genomic analyses identified 69 gene sets and 12 core signaling pathways genetically altered in the pancreatic cancer [1]. Most of the previous genomic analyses and microarray studies focused on the identification of the differentially expressed and metastasisassociated genes at different stages of pancreatic cancer [3, 5], ignoring an important clinical factor  survival time. Stratford et al.'s work identified six genetic signatures [6] associated with metastatic pancreatic cancer using a sequence of statistical techniques, including the significance analysis of microarray (SAM) [7], centroidbased predictor [8], Pearson correlation, XTile [9], KaplanMeier estimator [10] and Cox model [11]. Though these genes could help discriminate high and lowrisk patients, the prediction was not based on survival time. A comprehensive understanding of the genetic signatures and signaling pathways that are directly correlated to pancreatic cancer survival will help cancer researchers to develop effective multigene targeted, personalized therapies for the pancreatic cancer patients at different stages and improve survival rate.
The Cox proportional hazards model [11] is the most popular survival model used to describe the relationship between the patient's survival time and predictor variables. When we have highdimensional data (e.g. in a microarray study) where the number of predictors (genes) far exceeds the number of subjects (patients), the Cox model cannot be fitted directly unless the highdimensionality is properly handled. The regularization approach has been widely used to select important variables from a large pool of candidate variables [12–14]. For example, a Lasso (least absolute shrinkage and selection operator) penalty can be imposed to individual variables to automatically remove unimportant ones by shrinking their regression coefficients to be exactly zero [15]. In our previous work [16], we applied a lasso penalized Cox regression method, for the first time, to investigate the signature genes that are correlated to the pancreatic cancer survival time. We identified 12 genes associated with the pancreatic cancer survival and eight of them have been confirmed to be genetically altered and differentially expressed in the cancer of gastric, colorectal, ovarian, breast, skin, kidney, colon, lung, and pancreatic in in vivo and in vitro experiments [17–25]. It has been shown that these survivalassociated genes can also help to grade the stage and estimate the survival time of the PDAC patients.
However, the genes may perform as groups rather than individuals since some genes belong to the same pathways and get involved in the same biological processes. The pathway information is biologically important to our understanding of gene regulatory networks and cancer development [1]. The previous work [16] performs gene selection based on the strength of individual genes solely and ignores the information of signaling pathways. Recently, several variable selection methods have been introduced to consider group effects. For example, the group lasso method penalizes the L_{2}norm (Euclidean norm) of the coefficients within each group in linear regression [26] and Cox proportional hazards model [12]. Based on the boosting technique, a group additive regression model [27] and a nonparametric pathwaybased regression model [28] were developed to identify groups of genomic features that are related to several clinical phenotypes, including the survival outcome. However, those group selection methods only conduct "group selection" without "withingroup selection", since they select variables in an "allinorallout" fashion. That is, if one variable in a group is selected, all the other variables in the same group will also get selected.
Although pathways as a whole are involved in the development of pancreatic cancer, according to the global genomic analyses, not all the genes in the same pathway are involved in the process. In this work, we employ a doubly regularized Cox (DrCox) regression model [29] that integrates both genes and signaling pathways for the pancreatic cancer survival analysis. Both nonoverlap and overlap cases of DrCox are considered. Cyclic coordinate descent algorithms are derived for parameter estimation. We analyze the highdimensional microarray data of pancreatic cancer patients with localized and resected PDAC collected between 1999 and 2007 [6] using DrCox. Four signaling pathways, including Ion transport, immune phagocytosis, TGFβ (spermatogenesis), regulation of DNAdependent transcription pathways, and 15 genes within these four pathways are identified and verified to be directly correlated to pancreatic cancer survival. Compared with other methods, the DrCox model can provide more accurate and useful prediction of survival time [29]. These findings can help cancer researchers design new strategies for the early detection and diagnosis of pancreatic cancer at different stages.
Methods
In this section, we describe the doubly regularized Cox (DrCox) regression and derive the parameter estimates via cyclic coordinate descent algorithms. We first present the case where the groups do not overlap, i.e., each variable belongs to only one group. Then we discuss the overlap case, i.e., variables are allowed to belong to multiple groups.
Doubly regularized Cox (DrCox) regression for nonoverlap cases
Assume that the p variables (genes) occur in K groups (pathways). We further assume the k th group has p_{ k } variables and denote the p_{ k } variables in the k th group by X_{(k)}= (X_{k 1}, . . . , X_{ kpk })^{T}, with the corresponding regression coefficients β_{(k)}= (β_{k 1}, . . . , β_{ kpk })^{T}. For a sample of n subjects, let T_{ i } and C_{ i } denote the survival time and the censoring time for subject i = 1, . . . , n. The observed survival time is defined by Y_{ i } = min{T_{ i }, C_{ i }} and the censoring indicator is δ_{ i } = I(T_{ i } ≤ C_{ i }). The p predictor variables of the i th subject is denoted by ${X}_{i}={\left({X}_{i,\left(1\right)}^{T},\phantom{\rule{2.77695pt}{0ex}}\dots ,\phantom{\rule{2.77695pt}{0ex}}{X}_{i,\left(K\right)}^{T}\right)}^{T}$, where ${X}_{i,\left(k\right)}={\left({X}_{i,k1},\phantom{\rule{2.77695pt}{0ex}}\dots ,\phantom{\rule{2.77695pt}{0ex}}{X}_{i,k{p}_{k}}\right)}^{T}$. The survival time T_{ i } and the censoring time C_{ i } are conditionally independent given X_{ i }. The censoring mechanism is assumed to be noninformative. The observed data can be represented by the triplets {(Y_{ i }, δ_{ i }, X_{ i }), i = 1, . . . , n}.
The Cox proportional hazards model [11] composed of p genes and K pathways is written by
where ${\sum}_{k=1}^{K}{p}_{k}=p$. The partial likelihood of the Cox model is
where D is the set of indices of observed failures, and R_{ i } is the set of indices of the subjects who are at risk at time Y_{ i }.
To achieve the goal of both group and withingroup variable selection and to overcome the nonconvexity drawback, the doubly regularized Cox regression model imposes a mixture of lasso penalty and group lasso penalty to the logpartial likelihood ℓ_{ n }(β) = log L_{ n }(β)
where $\parallel {\beta}_{\left(k\right)}{\parallel}_{1}={\sum}_{j=1}^{px}\left{\beta}_{kj}\right$ is the lasso penalty on individual parameters, $\parallel {\beta}_{\left(k\right)}{\parallel}_{2}=\sqrt{{\sum}_{j=1}^{{p}_{k}}{\beta}_{kj}^{2}}$ is the group penalty on groups of parameters, and λ_{1} and λ_{2} are two nonnegative tuning constants controlling the strength of variables selection. The larger are the tuning constants, the fewer variables are retained in the model. In this paper, the value of the tuning constants are determined using kfold cross validation (datadriven) technique to select a subset of relevant genes and signaling pathways for accurate and robust prediction.
Coordinate descent for nonoverlap cases
Since there are more predictor variables than subjects (p > n), to tackle the highdimensionality problem we use a cyclic coordinate descent algorithm, which has been shown to be computationally efficient [30–33]. The idea is to break a large optimization problem into a sequence of small ones. In other words, instead of estimating all the parameters at the same time, we can update each parameter one by one. Readers can refer to [31, 32] for more details.
In the nonoverlap case, where each variable belongs to only one group, estimation of parameters and selection of important variables can be conducted via the minimization of (1) iteratively w.r.t. one parameter by one parameter. The first step is to calculate the forward and backward directional derivatives of each parameter. If e_{ kj } is the coordinate direction along which β_{ kj } varies, then the forward and backward directional derivatives of β_{ kj } are
and
where I(·) is an indicator function equal to 1 if the condition in the parentheses is satisfied and 0 otherwise, and
After obtaining the directional derivatives, we then need to decide which parameters to be updated and the direction for updating. If both of the directional derivatives ${d}_{{e}_{kj}}g\left(\beta \right)$ and ${d}_{{e}_{kj}}g\left(\beta \right)$ are nonnegative, then the update for β_{ kj } is skipped. If either directional derivative is negative, then we solve for the minimum along the corresponding direction. It is impossible for both directional derivatives to be negative due to the convexity of g(β). After identifying the direction to update the parameter, one can use Newton's method to solve for the minimum. The update at iteration m + 1 is given by
where β m is the estimate at iteration m, I_{1}(·) = I(ǁ · ǁ_{2} = 0), and I_{2}(·) = I(ǁ · ǁ_{2} > 0).
DrCox regression via coordinate descent for overlap cases
However, in reality, one gene can get involved in different pathways. To consider overlapping, we modify the notation and objective function (1). We denote the p variables by X_{1}, . . . , X_{ p } with the corresponding regression coefficients β_{1}, . . . , β_{ p }. Let V_{ k } ⊆ {1, 2, . . . , p} be the set of indices of variables in the k th group. The objective function designed for the overlap case can be written as
Note that predictor X_{ j } can belong to several pathways but it is only associated with one coefficient β_{ j }.
The parameter estimation needs to be modified accordingly. If we consider the coordinate direction e_{ j } for β_{ j }, the forward and backward directional derivatives of β_{ j } are
and
where G_{ j } ⊆ {1, 2, . . . , K} are the indices of groups that X_{ j } belongs to. After determining the direction for updating, the coefficient can be updated by
Results and discussion
The DrCox model with the cyclic coordinate descent algorithm is applied to analyze the PDAC data collected between 1999 and 2007. The aim of this work is to identify core signaling pathway sets and genetic signatures within those pathways related to pancreatic cancer survival. The microarray data of pancreatic cancer include 102 samples [6], which are publicly available at Gene Expression Omnibus (access code 21501). According to [6], among these 102 PDAC patients, 66 died at the end of the study (censoring rate 35%). The survival time ranges from 1 month to 5 years. The KaplanMeier curve is plotted in Figure 1 to show the probability of survival in 5 years for the 102 PDAC patients. Each step means an actual event happens, i.e. a pancreatic cancer patient dies. A short vertical line without a drop means a patient gets censored for different reasons, drops off the study or the study ends. Additionally, two stage variables, T stage and N stage, are given to describe the stages of pancreatic cancer, where T stage describes the size of the primary tumor ranging from 1 to 4 and N stage describes the spread to nearby (regional) lymph nodes with values 0 or 1.
The whole dataset is randomly split into the training, validation, and testing sets with equal sizes. The training set is used for model fitting, and the validation set is used for tuning constants selection. Using the 3fold crossvalidation, we got the optimal values of λ_{1} = 0.3 and λ_{2} = 0.1, which minimize the logpartial likelihood function. Figure 2 shows the 3D plots of the logpartial likelihood function and the number of selected genes vs. (λ_{1}, λ_{2}), respectively. Under the optimal tuning constants, 4 pathways and 15 genes are selected from the pool of 12660 probes of 6910 genes in 130 pathway sets organized in [1], which belong to 15 core groups in the pancreatic cancer studies. The selected pathways include the pathways of "regulation of DNAdependent transcription" (6 out of 2096 genes are selected), "Ion transport" (7 out of 555 genes are selected), "immune phagocytosis" (1 out of 215 genes is selected), and "TGFβ (spermatogenesis)" (1 out of 268 genes is selected) pathways. These identified pathways and genes are biologically meaningful and consistent with the existing scientific findings. In particular, three genes  ZNF233, SLC22A8, and PCYT1B  were identified in the previous work [16] using a Lasso penalized Cox model when considering gene signatures only.
Regulation of DNAdependent transcription pathway is wellknown to be related to the development of cancer. It regulates the frequency and rate of cellular DNAdependent transcription. This work identified three families of six genes that are related to pancreatic cancer survival. The six genes are DENND4A, KLF13, ZNF229, ZNF233, ZNF395, and ZNF432.

DENND4A is a cmyc promoterbinding protein [34], which mediates signal transduction in the nucleus and regulate the DNA replication and transcription. DENND4A can also activate the RAB10 protein, which is a key regulator of polarized sorting in epithelial cells, from an inactive GDPbound form to an active GTPbound form through promoting GDP → GTP exchange.

KLF13 belongs to the KLF family of transcription factors for several oncogenes and tumor suppressor genes [35, 36] and it plays an important role in the tumor progression [36]. Recent study shows that KLF13 is overexpressed in the oral cancer cells. Inhibiting KLF13's expression can decrease the proliferation of cancer cell and increase its sensitivity to ionizing radiation [36]. In pancreatic cancer, KLF13 can suppress the cell growth and neoplastic transformation mediated by KRAS, which is mutated in more than 90% of pancreatic tumors [35]. Our work suggests that KLF13 may be a useful biomarker for early detection and possible targets for the pancreatic cancer therapy.

Zinc finger protein family members: ZNF229, ZNF233, ZNF395 and ZNF432 are DNAbinding protein domains consisting of zinc fingers. Many of these zinc finger proteins, including ZNF233 (also identified in the previous work [16]), have been found to be associated with the abnormality of chromosome 19 in the studies of kidney [23] and pancreatic cancers [1]. Our analysis reveals that Zinc finger proteins and the corresponding pathway might be associated with the survival of pancreatic cancer.

Ion transport pathway plays integral roles in the development of cancer. Since the plasma membrane ion channels contribute to all basic cellular process [37, 38], many ion channels are implicated in the uncontrolled proliferation, decreased apoptosis, and unorganized angiogenesis. According to [37], the ion channels also contribute to the six hallmarks of cancer [39]: "1) selfsufficiency in growth signals, 2) insensitivity to antigrowth signals, 3) evasion of programmed cell death (apoptosis), 4) limitless replicative potential, 5) sustained angiogenesis and 6) tissue invasion and metastasis."
We identified seven genes from three different channels or families, including the TRP channel (TRPV5 and TRPM6) regulating the transcellular Ca^{2+} transport, KCNK channel (KCNK3 and KCNK18) regulating the K^{+} transport, and solute carrier (SLC) family (SLC22A8, SLC8A3, and SLC24A6). Recent experimental studies have indicated that these three families play important roles in the cancer development.

TRP ( Ca^{2}+) channel and TRPV5, TRPM6 genes regulate the Calciummediated signal transduction that is frequently altered in cancer [40]. Several genes in TRPV channel have been detected to be upregulated in prostate, colon, and breast cancer cells [40–42]. Particularly, TRPV5 and TRPV6 genes exhibit unusually high levels of single nucleotide polymorphisms (SNPs) in African populations as compared to other populations [41]. Moreover, the genes TRPM6 and TRPM7 in the TRPM channel can enhance the secretion of angiogenic factors, for example VEGF [40], resulting in a sustained unorganized angiogenesis process. The TRP channel and TRPV5, TRPM6 genes identified in pancreatic cancer survival data could be possible targets for the future cancer diagnosis and treatment.

KCNK ( K+) channel and KCNK3, KCNK18 genes regulate the potassium (K+) transport and membrane potential (Vm) in response to different physical and chemical factors [38, 40]. Several KCNK channel genes, for example, KCNK9 [43], are overexpressed in breast and lung cancers, and the gene KCNK2 can promote prostate cancer cell's growth [40, 44].

SLC family: SLC22A8, SLC8A3, SLC24A6 are membrane transport proteins that are involved in the transport and excretion of many organic ions, drugs and toxicants. Some genes in SLC family are cancerrelated, for example, SLC43A2 whose overexpression is associated with the adenocarcinomas and squamous cell carcinoma [45], which was identified in the previous work [16].
Immune phagocytosis pathway and CYBA gene: One prominent hallmark feature of cancer is the evasion of immune destruction [39]. The immune system is important in preventing tumor initiation and controlling tumor growth through identifying and eliminating the cancer cells (i.e., tumor immune surveillance) [46]. Macrophages and other phagocytic cells are important players in the innate immune system whose functions include phagocytosis (homeostatic cell clearance), antigen presentation (pathogen defense), and cytokine production (inflammatory responses). Recent evidence [46–48] revealed that the active immune phagocytosis pathway could inhibit tumor growth through phagocytic clearance, i.e., programmed cell removal in clearing damaged and foreign cells. The CYBA gene is a tumor suppressor [49], which regulates the immune system cells  phagocytes, involved in autophagy. The phagocytosis and superoxide production is primary regulated by the cytochrome b 245, (light) alpha subunit (also known as p 22^{phox}), which is encoded by the gene CYBA. CYBA's mutation will cause the failure of phagocytosis and immune defects [50]. This observation supports our prediction that the immune phagocytosis and tumor suppressor gene CYBA might be associated with pancreatic cancer survival and tumor immune evasion. Targeting this pathway might lead to effective cancer immunotherapies.
TGF β core pathway (spermatogenesis signaling set) and PCYT1B gene: The transforming growth factor beta (TGFβ) signaling pathway is critical in regulating many cellular processes, including the cell growth, differentiation and apoptosis. It has genetic alterations in 100% of pancreatic cancers [1]. The gene PCYT1B (phosphate cytidylyl transferase 1 choline β) was identified to be associated with pancreatic cancer survival, which is consistent with the previous work [16]. The expression of PCYT1B is frequently deregulated in cancer cells of epithelial ovarian [21], high grade gliomas [51], and pancreatic ductal adenocarcinoma [22]. Moreover, PCYT1B is a key regulator in the choline phospholipid metabolism, which is altered in the cancers of breast [19], colon [20], ovarian [21], and gliomas [51]. These observations support our prediction that PCYT1B and TGFβ pathway are correlated with pancreatic cancer survival and they might help to grade the stage of pancreatic cancer patients.
Compared with the previous work [16], which selected 12 survivalrelevant genes using a Lasso penalized Cox model, the DrCox model identified 4 pathways and 15 genes related to pancreatic cancer survival. We divide the patients into long and shortsurvival groups based on the selected pathways and genes and conduct the logrank test to compare the two groups. The survival probabilities of these two groups are plotted in the Figure 3. The logrank test gives a pvalue of 0.0179, which means the two groups can be well separated and our finding of 4 pathways and 15 genes is significant.
Conclusions
In this work, we employed the doubly regularized Cox (DrCox) regression coupled with the coordinate descent algorithm to analyze the highdimensional gene expression data of patients with localized and resected PDAC. Different from the previous work [16], this DrCox model can incorporate both gene and pathway information and simultaneously infer genetic signatures and important signaling pathways that are related to the pancreatic cancer survival. The proposed cyclic coordinate descent algorithm can quickly remove irrelevant genes and signaling pathways, so the prediction of survival time is more accurate and robust than other methods. Other group selection models select variables in an "allinorallout" fashion with no withingroup selection, that is, if one variable in a group (pathway) is selected, all the other variables in the same group will get selected. For example, if gene PCYT1B in the TGFβ pathway is selected, all the rest of genes in the TGFβ pathway will be selected as well. However, not all the genes in the TGFβ pathway are involved in the development of pancreatic cancer. The advantage of our DrCox method is that it can conduct both group selection and withingroup selection simultaneously and eliminate the irrelevant.
This work identified four signaling pathways, including Ion transport, immune phagocytosis, TGFβ (spermatogenesis), regulation of DNAdependent transcription pathways, and 15 genes within these four pathways, which are directly correlated to pancreatic cancer survival. Pancreatic cancer patients with these deregulated signaling pathways and mutated genes might have a shorter survival time. Several inferred signaling components have been confirmed to be altered frequently in the cancer of pancreatic, oral, prostate, colon, breast and lung in the in vivo or in vitro experiments. Our finding predicts that, the TRP (Ca^{2}+) channelrelated genes (TRPV5 and TRPM6) and KCNK (K+) channelrelated genes in the ion transport pathway are possible biomarkers of pancreatic cancer survival. The Immune phagocytosis pathway with the tumor suppressor CYBA gene, which regulates the immune system cells and autophagy through phagocytic clearance, have not received enough attention in the existing pancreatic cancer research literature. The gene PCYT1B in the TGFβ pathway is frequently deregulated in cancer cells compared with normal cells, which might help to grade the stage of pancreatic cancer patients. The KLF13 in the regulation of DNAdependent transcription pathway could regulate the cell growth through regulating KRAS pathway. These findings demonstrate that these survivalassociated genetic signatures and pathways could be useful biomarkers for early cancer detection and diagnosis and help pancreatic cancer researchers to grade the cancer stage and select appropriate therapies to prolong the patient's survival time at different stages.
This work is the first attempt to infer the pancreatic cancer survivalassociated signaling pathway sets and genetic signatures within those pathways using statistical techniques. However, any statistical findings need to be tested by the further clinical and wet lab experiments of pancreatic cancer. We are unable to test our results with other independent datasets in this paper due to the data source limitation. We do expect our results can get verified or falsified by further investigation. We hope the genetic signatures and pathways found in this paper could help cancer researchers design new strategies for the early detection and diagnosis and lead to effective treatments and immunotherapies for pancreatic cancer.
References
 1.
Jones S, Zhang X, Parsons DW, Lin JC, Leary RJ, Angenendt P, Mankoo P, Carter H, Kamiyama H, Jimeno A, Hong SM, Fu B, Lin MT, Calhoun ES, Kamiyama M, Walter K, Nikolskaya T, Nikolsky Y, Hartigan J, Smith DR, Hidalgo M, Leach SD, Klein AP, Jaffee EM, Goggins M, Maitra A, IacobuzioDonahue C, Eshleman JR, Kern SE, Hruban RH, Karchin R, Papadopoulos N, Parmigiani G, Vogelstein B, Velculescu VE, Kinzler KW: Core Signaling Pathways in Human Pancreatic Cancers Revealed by Global Genomic Analyses. Science. 2008, 321: 18011806. 10.1126/science.1164368.
 2.
Vonlaufen A, Joshi S, et al.: Pancreatic Stellate Cells: Partners in Crime with Pancreatic Cancer Cells. Cancer Res. 2008, 68: 20852093. 10.1158/00085472.CAN072477.
 3.
Bardeesy N, DePinho RA: Pancreatic cancer biology and genetics. Nature Reviews Cancer. 2002, 2 (12): 897909. 10.1038/nrc949.
 4.
Maitra A, Hruban R: Pancreatic Cancer. Annu Rev Pathol Mech Dis. 2008, 3: 157188. 10.1146/annurev.pathmechdis.3.121806.154305.
 5.
Friess H, Ding J, Kleeff J, et al: Microarraybased identification of differentially expressed growth and metastasisassociated genes in pancreatic cancer. Cell Mol Life Science. 2003, 60: 11801199.
 6.
Stratford JK, Bentrem DJ, Anderson JM, et al: A SixGene Signature Predicts Survival of Patients with Localized Pancreatic Ductal Adenocarcinoma. PLoS Med. 2010, 7: e100030710.1371/journal.pmed.1000307.
 7.
Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci. 2001, 98: 51165121. 10.1073/pnas.091062498.
 8.
Hu Z, Fan C, Oh DS, Marron JS, He X, et al: The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics. 2006, 7: 9610.1186/14712164796.
 9.
Camp RL, DolledFilhart M, Rimm DL: Xtile: a new bioinformatics tool for biomarker assessment and outcomebased cutpoint optimization. Clin Cancer Res. 2004, 10: 72527259. 10.1158/10780432.CCR040713.
 10.
Kaplan EL, Meier P: Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958, 53: 457481. 10.1080/01621459.1958.10501452.
 11.
Cox DR: Regression models and lifetables (with discussion). J R Statist Soc B. 1972, 34: 187220.
 12.
Ma S, Song X, Huang J: Supervised group Lasso with applications to microarray data analysis. BMC Bioinformatics. 2007, 8: 6076. 10.1186/14712105860.
 13.
Sohn I, Kim J, Jung SH, Park C: Gradient lasso for Cox proportional hazards model. Bioinformatics. 2009, 25: 17751781. 10.1093/bioinformatics/btp322.
 14.
Tibshirani R: Univariate Shrinkage in the Cox Model for High Dimensional Data. Stat Appl in Genet and Mol Biol. 2009, 8: 21
 15.
Tibshirani R: The lasso method for variable selection in the Cox model. Statist Med. 1997, 16: 385395. 10.1002/(SICI)10970258(19970228)16:4<385::AIDSIM380>3.0.CO;23.
 16.
Wu TT, Gong H, Clarke EM: A Transcriptome Analysis by Lasso Penalized Cox Regression for Pancreatic Cancer Survival. Journal of Bioinformatics and Computational Biology. 2011, 9: 6310.1142/S0219720011005744.
 17.
Shi Y, Zhai H, Wang X, et al: Ribosomal proteins S13 and L23 promote multidrug resistance in gastric cancer cells by suppressing druginduced apoptosis. Exp Cell Res. 2004, 296: 33746. 10.1016/j.yexcr.2004.02.009.
 18.
Denis M, Chadeneau C, et al: Overexpression of the S13 ribosomal protein in actively growing cells. Int J Cancer. 1993, 55: 275280. 10.1002/ijc.2910550218.
 19.
KatzBrull R, Seger D, RivensonSegal D, Rushkin E, Degani H: Metabolic markers of breast cancer: enhanced choline metabolism and reduced cholineetherphospholipid synthesis. Cancer Res. 2002, 62: 196670.
 20.
Mori N, Delsite R, Natarajan K, Kulawiec M, Bhujwalla Z, Singh K: Loss of p53 function in colon cancer cells results in increased phosphocholine and total choline. Mol Imaging. 2004, 3: 31923. 10.1162/1535350042973517.
 21.
Iorio E, Ricci A, Bagnoli M, et al: Activation of phosphatidylcholinecycle enzymes in human epithelial ovarian cancer cells. Cancer Res. 2010, 70: 21262135. 10.1158/00085472.CAN093833.
 22.
von Forstner C, Egberts JH, Ammerpohl O, et al: Gene Expression Patterns and Tumor Uptake of 18F FDG, 18FFLT, and 18FFEC in PET/MRI of an Orthotopic Mouse Xenotransplantation Model of Pancreatic Cancer. J Nucl Med August. 2008, 49: 13621370. 10.2967/jnumed.107.050021.
 23.
Koeman J, Russell R, et al: Somatic Pairing of Chromosome 19 in Renal Oncocytoma is Associated with Deregulated ELGN2Mediated OxygenSensing Response. PLoS Genetics. 2008, 4: e100017610.1371/journal.pgen.1000176.
 24.
Cuezva J, Chen G, Alonso A, Isidoro A, Misek D, Hanash S, Beer D: The bioenergetic signature of lung adenocarcinomas is a molecular marker of cancer diagnosis and prognosis. Carcinogenesis. 2004, 25: 115763. 10.1093/carcin/bgh113.
 25.
Hernlund E, Hjerpe E, AvallLundqvist E, Shoshan M: Ovarian carcinoma cells with low levels of betaF1ATPase are sensitive to combined platinum and 2 deoxyDglucose treatment. Mol Cancer Ther. 2009, 8: 191623. 10.1158/15357163.MCT090179.
 26.
Yuan M, Lin Y: Model selection and estimation in regression with grouped variables. J R Statist Soc B. 2006, 68: 4967. 10.1111/j.14679868.2005.00532.x.
 27.
Luan Y, Li H: Group additive regression models for genomic data analysis. Biostatistics. 2008, 9: 100113. 10.1093/biostatistics/kxm015.
 28.
Wei Z, Li H: Nonparametric pathwaybased regression models for analysis of genomic data. Biostatistics. 2007, 8: 265284. 10.1093/biostatistics/kxl007.
 29.
Wu TT, Wang S: Doubly Regularized Cox Regression for Highdimensional Survival Data with Group Structures. Statisstics and Its Interface. 2013, 6: 175186. 10.4310/SII.2013.v6.n2.a2.
 30.
Fu WJ: Penalized regressions: the bridge versus the lasso. J Comp and Graph Stat. 1998, 7: 397416.
 31.
Friedman J, Hastie T, Hoefling H, Tibshirani R: Pathwise coordinate optimization. Ann Appl Stat. 2007, 1: 302332. 10.1214/07AOAS131.
 32.
Wu TT, Lange K: Coordinate descent algorithms for lasso penalized regression. Ann Appl Stat. 2008, 2: 224244. 10.1214/07AOAS147.
 33.
Wu TT, Chen YF, Hastie T, Sobel E, Lange K: Genomewide association analysis by lasso penalized logistic regression. Bioinformatics. 2009, 25: 714721. 10.1093/bioinformatics/btp041.
 34.
Ray R, Miller D: Cloning and characterization of a Human cmyc promoterbinding protein. Molecular and Cellular Biology. 1991, 21542161.
 35.
FernandezZapico ME, Billadeau DD, Urrutia R: Klf13 Suppresses the Transforming Activity of KRas By Direct Downregulation the Cyclin B Gene. Pancreas. 2004, 29: 360
 36.
Henson B, Gollin S: Overexpression of KLF13 and FGFR3 in oral cancer cells. Cytogenet Genome Res. 2010, 128: 192198. 10.1159/000308303.
 37.
Prevarskaya N, Skryma R, Shuba Y: Ion channels and the hallmarks of cancer. Trends in Molecular Medicine. 2010, 16: 107121. 10.1016/j.molmed.2010.01.005.
 38.
Goldstein S, Bockenhauer D, O'Kelly I, Zilberberg N: Potassium leak channels and the KCNK family of twopdomain subunits. Nature Reviews Neuroscience. 2001, 2: 175184. 10.1038/35058574.
 39.
Hanahan D, Weinberg RA: The hallmarks of cancer. Cell. 2000, 100: 5770. 10.1016/S00928674(00)816839.
 40.
Bodding M: TRP proteins and cancer. Cell Signal. 2007, 19: 617624. 10.1016/j.cellsig.2006.08.012.
 41.
Peng J: TRPV5 and TRPV6 in transcellular Ca(2+) transport: regulation, gene duplication, and polymorphisms in African populations. Adv Exp Med Biol. 2011, 704: 23975. 10.1007/9789400702653_14.
 42.
Dong H, et al: Molecular mechanisms underlying Ca2+mediated motility of human pancreatic duct cells. Am J Physiol Cell Physiol. 2010, 299 (6): C1493503. 10.1152/ajpcell.00242.2010.
 43.
Mu D, et al: Genomic amplification and oncogenic properties of the KCNK9 potassium channel gene. Cancer Cell. 2003, 3: 297302. 10.1016/S15356108(03)000540.
 44.
Voloshyna I, et al: TREK1 is a novel molecular target in prostate cancer. Cancer Res. 2008, 68: 11971203. 10.1158/00085472.CAN075163.
 45.
Haase C, Bergmann R, Fuechtner F, Hoepping A, Pietzsch J: Ltype amino acid transporters LAT1 and LAT4 in cancer: uptake of 3Omethyl618FfluoroLdopa in human adenocarcinoma and squamous cell carcinoma in vitro and in vivo. J Nucl Med. 2007, 48: 20632071. 10.2967/jnumed.107.043620.
 46.
Jaiswal S, MP Chao RM, Weissman I: Macrophages as mediators of tumor immunosurveillance. Trends Immunol. 2010, 31: 212219. 10.1016/j.it.2010.04.001.
 47.
Chao MP, W I, Majeti R: Programmed cell removal: A new obstacle in the road to developing cancer. Nat Rev Cancer. 2011, 12 (1): 5867.
 48.
Chao MP, Weissman IL, Majeti R: The CD47SIRPalpha Pathway in Cancer Immune Evasion and Potential Therapeutic Implications. Curr Opin Immunol. 2012, 24: 225232. 10.1016/j.coi.2012.01.010.
 49.
Powell JA, et al: Sequencing, Transcript Identification, and Quantitative Gene Expression Profiling in the Breast Cancer Loss of Heterozygosity Region 16q24.3 Reveal Three Potential TumorSuppressor Genes. Genomics. 2002, 80: 303310. 10.1006/geno.2002.6828.
 50.
Nakano Y, LongoGuess CM, Bergstrom DE, et al: Mutation of the Cyba gene encoding p22phox causes vestibular and immune defects in mice. J Clin Invest. 2008, 118: 11761185.
 51.
Righi V, Roda J, Paz J, et al: 1H HRMAS and genomic analysis of human tumor biopsies discriminate between high and low grade astrocytomas. NMR in Biomedicine. 2009, 22: 629637. 10.1002/nbm.1377.
Acknowledgements
HG was supported by the new faculty startup grant from the Saint Louis University, TTW and EMC were supported by NSF grant CCF0926181.
Declarations
Publication of this article was funded by the Saint Louis University to HG and NSF grant CCF0926181 to TTW.
This article has been published as part of BMC Systems Biology Volume 8 Supplement 1, 2014: Selected articles from the Twelfth Asia Pacific Bioinformatics Conference (APBC 2014): Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/8/S1.
Author information
Additional information
Competing interests
The authors declare that there are no competing interests.
Authors' contributions
HG and TTW are joint first authors and both authors contributed equally. HG, TTW, and EMC proposed the study, TTW prepared the computational code, HG and TTW analyzed the results and wrote the manuscript. All authors read and approved the final manuscript.
Haijun Gong, Tong Tong Wu contributed equally to this work.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Gong, H., Wu, T.T. & Clarke, E.M. Pathwaygene identification for pancreatic cancer survival via doubly regularized Cox regression. BMC Syst Biol 8, S3 (2014). https://doi.org/10.1186/175205098S1S3
Published:
Keywords
 Pancreatic Cancer
 Lasso
 Pancreatic Cancer Patient
 Group Lasso
 PDAC Patient