Cliques for the identification of gene signatures for colorectal cancer across population

Pradhan, Meeta P; Nagulapalli, Kshithija; Palakal, Mathew J

doi:10.1186/1752-0509-6-S3-S17

Volume 6 Supplement 3

The International Conference on Intelligent Biology and Medicine (ICIBM): Systems Biology

Research
Open access
Published: 17 December 2012

Cliques for the identification of gene signatures for colorectal cancer across population

Meeta P Pradhan¹,
Kshithija Nagulapalli¹ &
Mathew J Palakal¹

BMC Systems Biology volume 6, Article number: S17 (2012) Cite this article

3005 Accesses
9 Citations
Metrics details

Abstract

Background

Colorectal cancer (CRC) is one of the most commonly diagnosed cancers worldwide. Studies have correlated risk of CRC development with dietary habits and environmental conditions. Gene signatures for any disease can identify the key biological processes, which is especially useful in studying cancer development. Such processes can be used to evaluate potential drug targets. Though recognition of CRC gene-signatures across populations is crucial to better understanding potential novel treatment options for CRC, it remains a challenging task.

Results

We developed a topological and biological feature-based network approach for identifying the gene signatures across populations. In this work, we propose a novel approach of using cliques to understand the variability within population. Cliques are more conserved and co-expressed, therefore allowing identification and comparison of cliques across a population which can help researchers study gene variations. Our study was based on four publicly available expression datasets belonging to four different populations across the world. We identified cliques of various sizes (0 to 7) across the four population networks. Cliques of size seven were further analyzed across populations for their commonality and uniqueness. Forty-nine common cliques of size seven were identified. These cliques were further analyzed based on their connectivity profiles. We found associations between the cliques and their connectivity profiles across networks. With these clique connectivity profiles (CCPs), we were able to identify the divergence among the populations, important biological processes (cell cycle, signal transduction, and cell differentiation), and related gene pathways. Therefore the genes identified in these cliques and their connectivity profiles can be defined as the gene-signatures across populations. In this work we demonstrate the power and effectiveness of cliques to study CRC across populations.

Conclusions

We developed a new approach where cliques and their connectivity profiles helped elucidate the variation and similarity in CRC gene profiles across four populations with unique dietary habits.

Background

Colon rectal cancer (CRC) is the third most commonly diagnosed cancer worldwide. It is the second leading cause of cancer death in the United States, and worldwide, nearly 608,000 deaths are reported every year due to CRC. The CRC incidence rate varies across the globe. For example, low incidence rates for CRC have been associated with Asian and African populations. Dietary and environmental factors have also been known to contribute to CRC patterns [1]. Therefore, we postulate that there are some common as well as some unique key gene signatures that can discriminate CRC across populations.

Due to the advent of high through-put technologies, a multitude of public domain expression datasets are now available for CRC research. These datasets are generated worldwide and deposited with the objective of identifying key molecules that play an important role in different stages of CRC. Gene-expression profiling and meta-analysis have been extensively used to: a) understand the mechanisms that drive a normal cell to become a cancer cell, b) understand different metastatic levels [2–6], and c) identify biomarkers [7]. Differentially expressed genes have been identified as biomarkers in leukemia, B-cell lymphoma, breast and lung cancers [8–11]. Gene signatures are a set of genes that might play an important role in a given disease. Using gene expression datasets, gene signatures were identified in different cancers [12–14]. First attempts to identify gene signatures from gene expression were done in breast cancer [10]. Genes combine together and act as pathways to perform a biological function and genes in a given pathway are co-expressed [15]. Large-scale efforts are being made to identify the biomarkers associated with specific pathways and biological function using gene expression profiles [16–21]. A single pathway can be deregulated by different mechanisms or combination of genes. Also, a set of genes can target one or many pathways. Gene signatures can help to identify these patterns in pathways and also the relationships among them [22]. First attempts for identifying gene signatures were done for breast cancer [10] and have since been used in various other cancers as well [12–14].

Network based approaches have been used to identify subnetwork markers (gene signatures) that are more reproducible than individual markers [23–25]. Functional modules extracted from networks are groups of genes with same functions [26]. The genes in the subnetworks are co-expressed (high/low) and they share more interactions among them, than with other genes in the larger network [27, 28]. These functional modules can be used to identify both similar and unique biological characteristics among different species datasets [29] and are also considered to be subnetworks [30]. In protein-protein interaction networks, these functional modules are present as sub-graphs or tightly connected sub-graphs [31, 32] and can be analyzed with respect to their individual characteristics using either Gene Ontology similarities or Pathway significance [33–35]. Identification of regulatory modules or gene subnetworks is important as they play critical roles in biological processes [36] and their associated pathways can provide potential targets for drug intervention in cancers [37].

Though gene signatures can improve understanding of a disease, identification of these signatures across populations is difficult, as gene expression is known to vary between populations [38]. Even though modules have been effectively used for the identification of gene signatures, this approach is computationally complex because the modules are open subnetworks [39], meaning that within a disease network, a very large number of modules will be identified [40]. Therefore, use of modules for comparing gene signatures across populations is computationally an intractable problem. Though attempts have recently been made to understand the difference in CRC between African-Americans and European-Americans [41] using a systems biology approach. However, not much work has been done in the area of gene signature identification across populations with respect to CRC.

Due to the complexity of gene signature identification, we propose the use of cliques as an alternative to modules for the comparison of gene signatures across populations. Cliques are closed, fully-connected subnetworks. The genes that are identified as part of these cliques are functionally related and highly co-expressed [33]. Since cliques are closed networks, they are both computationally tractable and more conserved in the biological networks [42]. A clique consists of molecules that can be associated with one or many pathways and these molecules are related with their Gene Ontologies [43]. A recent study reported the use of cliques in elucidating the mechanisms involved in breast cancer [44].

In this paper we have attempted to understand CRC gene signatures across four different populations: USA, Germany (GER), China (CHN), and Saudi Arabia (SA). The studies on each of these populations were conducted separately, and the data was downloaded from public repositories GEO http://www.ncbi.nlm.nih.gov/geo. For the study model, we hypothesized that tumors target biological modules that execute specific biological processes [45]. Since cliques are fully connected conserved subnetworks within biological networks, our hypothesis is that they are conserved across populations and can be understood as gene signatures. Therefore we propose to understand these cliques in CRC across populations. In this work we integrated the expression data along with network topological features and biological features. Cliques were then scored based on these features. Our work identified the common and unique cliques across populations that were important with respect to CRC. To identify the important cliques we analyzed the networks based on the following perspectives: (i) identification of genes from individual datasets based on p-value; (ii) construction of gene networks for each population; (iii) annotation of nodes and edges of networks with topological and biological features; (iv) identification of cliques across networks; (v) comparison of the cliques in all the networks based on their strength and connectivity profiles; and, (vi) evaluation of the cliques as gene signatures based on their biological significance in CRC.

Results and discussion

jIn order to decipher the gene signatures and identify the similarity-uniqueness among the four different populations of CRC (USA, GER, CHN, SA), we developed a methodology as described in Figure 1. Our methodology involved identifying genes in each dataset that satisfied the two sample t-test, construction of the gene networks using Human Protein Reference Database (HPRD) [46], obtaining the gene expression profiles (up- and down-regulated genes), identifying cliques in each dataset and comparing them across the populations, and connecting the cliques in each network to identify a Clique Connectivity Profile (CCP) and comparing them across populations.

Data analysis

The gene expression in all the four datasets was first normalized using the R-package RMA algorithm [47]. The two-sample t-test was used to identify the differentially expressed genes in each dataset. The genes satisfying the t-test (p-value < 0.05 and q-value with FDR < 0.1) in each dataset were then used to construct the networks. Figure 2(a) shows the profile of gene expression across the population dataset.

Network construction

To construct the gene network for each population, we used only those genes that coded for proteins present in the HPRD database [46]. The networks were compared with respect to their node similarity. Table 1 shows the node (i.e., gene) similarity across the four populations. As shown in Table 1, a large number of genes were common among USA, CHN and GER, but there were fewer genes common with SA.

Table 1 Node similarity across population

Full size table

Analysis of population specific networks

To analyze these population-specific networks with respect to their topological and biological features, these networks were first compared with the HPRD network for their interactions, degree, diameter, and average path length. Table 2 shows the results of this comparison. The average path-length is the overall ease with which the genes in the network communicate with each other. Though the degree and number of interactions differ for GER, USA, and SA with HPRD, the diameter and the average path lengths of these networks is in accordance with HPRD. Therefore these networks have the ability to generate functional complexes or modules and can be analyzed with respect to their biological processes.

Table 2 Comparison of population networks with HPRD network

Full size table

For further analysis of the networks, Pearson Correlation Coefficients (PCC) were computed for each edge, and correlations greater than 0.6 and less than -0.6 were considered. This reduced the size and the complexity in the networks.

Analysis of node strength based on topological properties

The node strength for each node in the population specific network was computed using the topological parameters - namely, eccentricity, closeness and betweenness (see method section, eqns. (i), (ii), (iii), and (iv)). The common high scoring nodes identified in all the population were: TP53, SRC, ESR1, SMAD3, GRB2, EP300, CREBBP, EGFR, SMAD2, and CSN2KA1. Of these, the transcription factors TP53, ESR1, SMAD3, and SMAD2 were identified as important in CRC [48]. GRB2 overexpression has also been identified in CRC [49] It was reported that binding of GRB2 with GAB2 plays an important role in CRC [50]. GAB2 additionally interacts with BCR-ABL, activating BCR-ABL-associated CRC significant pathways - specifically, PI3K-mTOR, RAS/RAF/MEK/ERK, and JAK-STAT [51]. EP300 has also been identified to be involved in CRC [52]. There were also a few unique top scoring nodes: ATXN1 for GER and PRKCA, PRKACA and UBQLN4 for SA. Not many references were available for UBQLN4 and ATXN1 in CRC, although ATXN1 has been associated with cancer pathways [53–55]. The topological analysis revealed top scoring genes that are common (known) as well as unique (not fully understood) with their significance in CRC across the four populations.

Analysis of biological features for population specific networks

The biological feature analysis provided the edge strength for all the genes in all four population networks. The three biological features considered for the analysis were: PCC, Gene Ontology distance, and Pathway similarity score. The PCC had already been computed during the network construction process as described earlier (method section - equation (v)). The number of edges satisfying the PCC in each population network was: 15876 (USA), 12769 (CHN), 11664 (GER), and 9025 (SA). As it is known that, biological processes are essentially a series of events accomplished by one or more molecular functions. Each node in the network was associated with its biological processes. Gene Ontology distance was computed across an edge between two nodes in the network (method section: equation (vi)). The number of unique Gene Ontology biological processes terms identified in each of the population networks were: 2806 (CHN), 2801 (GER), 2674 (SA), and 2791 (USA). The following Gene Ontology biological processes were associated with maximum number of genes in all networks: GO: 0007165 (signal transduction), GO: 0006468 (protein phosphorylation), GO: 0006955 (immune response), and GO: 0055114 (oxidation-reduction process). Of these processes, signal transduction pathways are currently used as therapeutic targets in CRC [56], and immune response has been associated with CRC progression [57]. Since these biological processes are known to be important in CRC, we concluded that GO biological process should be a key feature for computing the EdgeStrength (method section - equation (vi)). Another key biological feature that we use was the Pathway similarity score between two nodes. The pathways were identified using the KEGG database [53, 54], and the number of unique pathways identified for the respective populations were: 99 (CHN), 92 (USA), 54 (GER), and 87 (SA). There were a total of 105 unique pathways across all the four populations: forty-nine pathways were common to all four countries, thirty-five were common to three countries, seven were common to two countries, and five were unique to one country. The CRC pathways identified across all the countries were: Chemokine signaling pathways, Wnt-signaling pathway, MAPK signaling pathway, JAK-STAT pathway, Calcium-signaling pathway, ErbB signaling pathway, and Pathways in cancer [54, 58, 59]. The association of major CRC pathways with the nodes thus justified the use of the pathway similarity score as an essential feature for computing edge strength. Each node in the network was annotated with its pathway, and the pathway similarity score was computed across an edge of two nodes in the network (method section - equation (vii)).

Through these various analyses, we obtained the topological and biological features for all four populations to compute the NodeStrength and EdgeStrengths for their respective genes in the networks.

Identification of cliques

Genes with similar expression patterns across various networks perform similar functions [27, 28]. Both functional modules and interacting modules have similar co-expressed genes [60]. Based on this understanding, we designed an algorithm that identified the cliques (described in method section) in each population network. Figure 2(b) shows the distribution of the number of identified cliques of different sizes in each of the population-specific network. Figure 2 (c) shows the total number of unique cliques identified for all four populations. The largest number of cliques (all sizes included) was identified for USA and minimum for SA respectively. In this analysis, we considered only cliques of node size seven, as this size was found to be consistent across all four population networks, while cliques of higher sizes were not found across all the populations. For the specified clique size of seven, a total of 650 cliques were identified across the four populations. These cliques were then further analyzed with respect to their distribution in the different populations. There were 49 cliques common to all populations, while 20, 10, and 1 unique clique were identified in USA, GER, and CHN, respectively. Figure 2 (d) shows the Venn diagram for the distribution of size seven cliques across the four populations. The total number of genes identified in these cliques within each population network was: 126 (USA), 114 (CHN), 108 (GER), and 95 (SA). We identified 137 genes in total, with 57 of those genes common among all cliques across the populations.

Analysis of cliques common across the populations

To understand the significance of the cliques across the populations, we first analyzed all the cliques with respect to their GO biological processes and pathway associations. The numbers of GO biological processes associated with cliques for each population network were: 247 (USA), 235 (GER), 222 (CHN), and 192 (SA), 247 (USA). GO terms with hyper-geometric p-values < 0.05 for each population were identified (method section- equation (xi)); GO: 0007165, GO: 0006468 and GO: 007049 were identified as the processes with the smallest hyper-geometric p-values across all cliques in all population networks. GO:0007165 was associated with signaling pathways, which have been identified as the targets for CRC [61]. GO: 007049 was associated with cell division cycle. Previous studies found genes involved in cell cycle, apoptosis, and invasion to play an important role in CRC [62]. GO: 0006468 was associated with protein phosphorylation. TGF-beta is a key pathway involved in CRC, and progression of this pathway is known to be dependent on protein phosphorylation [48, 61, 63]. The validation of these pathways with respect to involvement in CRC supports that clique nodes are involved in important GO biological processes, and these enriched GO terms are key features by which CRC can be evaluated across populations.

The identified cliques were further analyzed using GO Term Finder [64], and DAVID level 3 [65]. Cliques identified as significant (p < 0.05) were further evaluated based on literature. Table 3 shows the details of a few common cliques identified in all the population networks and their gene significance in CRC. It was also observed that some of the genes in these common cliques have been widely studied in terms of CRC, while others have relatively sparse available literature.

Table 3 Common cliques across the four population datasets

Full size table

The clique {EGFR, ESR1, GRB2, PIK3R1, PTPN6, SHC1, SRC} in Table 3 was enriched in the following GO biological processes: EGFR signaling pathway, cellular response to growth, and signal transduction; most of the genes in this clique have been identified as significant in CRC. For example, targeted therapy using EGFR is currently available for CRC [66]. EGFR is a trans-membrane tyrosine kinase receptor belonging to HER family of cell surface receptor; it is triggered by ligands and leads to the activation of many intracellular signal transduction pathways (e.g., RAS, PI3K-AKT, STAT) [67] which are known to effect the activation of many transcription factors involved in cellular response (differentiation, apoptosis, proliferation, and migration). ESR1 is used as a epigenetic marker [64], while activation of GRB2/SOS, leads to a cytoplasmic phosphorylation cascade involving KRAS [68, 69].

KRAS pathways are the targeted pathways in CRC [70], PTPN6 mutation has not been identified specifically in CRC, but it has been found in other cancers, including lymphoma and leukemia. Similarly, although SHC1 has not been identified directly in CRC, it has been identified in lung cancer [71].

The clique {BRCA1, CREBBP, EP300, ESR1, SMAD2, SMAD3, TP53} was enriched with the following GO biological processes: regulation of transcription, response to lipid, and positive regulation of cellular metabolic process. Again, these genes have all been studied in terms of CRC or other cancers. CREBBP and EP300 have been identified as prognostic markers for CRC [72], and EP300 is additionally been identified in lipid metabolism in CRC [65]. Some studies have identified BRCA1 [73], up-regulation of CREBBP [52, 72], mutation of EP300, loss of SMAD2 signaling [74] in CRC. The TGF-Beta signaling pathway is known to play an important role as tumor suppressor and tumor promoters in CRC by activating the SMAD2/SMAD3 complex, which enters the nucleus to further regulate transcription [61, 75]. Overexpression and mutation of TP53 has also been associated with CRC [76–78].

The clique {CSN2, CSN3, CSN4, CSN5, CSN6, CSN7A, CSN8, and TP53} is a part of signalosome of CSN9, which acts as protein kinase. The enriched GO biological processes were protein deneddylation and cellular protein metabolism. CSN-mediated protein deneddylation has been identified in the literature to promote Hedgehog-pathways, though it has not been reported specifically in CRC [79]. CSN3 has been identified as essential for cell proliferation in hepatocellular carcinoma, while CSN5 is known to be a regulator of TP53 and MDM2; additionally, CSN6 is known to be important for regulating DNA-damage-associated apoptosis and tumor genesis, as well as enhancing p53-mediated tumor suppression http://www.genedistiller.org. MDM2 has been identified as a probable therapeutic target for CRC [62, 80].

In the clique {DIS3, EXOSC2, EXOSC4, EXOSC5, EXOSC7, EXOSC8, EXOSC9, and MPP6}, the enriched GO biological processes identified were: DNA modification, DNA deamination, and mRNA catabolic process. A form of DNA modification that is used to identify many cancers is DNA methylation in the promoter regions, which causes silencing of many genes [81]. DIS3, which has been identified in cancer genomes, stabilizes RNA and its translation into proteins [82], while EXOSC4 is involved in ribosome biogenesis and is highly up-regulated in CRC [83]. Though not much has been reported about the other genes in this clique with respect to CRC, they have been identified in other cancers.

One of the unique cliques identified for USA was {LSM1, LSM2, LSM3, LSM5, LSM6, LSM7, SMN1}. LSM1 is mapped on chromosome 8p11.2, which has been identified in both prostate cancer [78] and CRC. Although SMN1 has not been identified in cancer directly, it has been proposed to interact with BCL-2, which is associated with CRC, and has a high prognostic value [84, 85]. From this analysis, it can be stated that common and unique cliques identified across population networks are involved in important biological processes in CRC. These cliques include genes that are both well-studied and less-studied in CRC, as well as those known to play a role in other cancers, indicating their importance in CRC networks and in better understanding CRC across populations. This analysis also demonstrates the importance of cliques in the CRC disease and can be used to understand the four population-specific networks.

Analysis of pathways associated with genes in cliques for all populations

Cliques identified in the population-specific networks were further analyzed using the KEGG database for their pathway similarity score. Figure 3 shows the profile of pathways associated with the maximum number of genes in each network. This association varies across populations. For example, Pathways in Cancer is associated with the highest number of genes in all the populations - 26 (CHN), 26 (GER), 18 (USA), and 10 (SA);many of the different pathways that belong to the domain of Pathways in Cancer in the KEGG database were discussed in the previous section. Though JAK-STAT pathways were identified to be associated with clique-genes in all the population, the level of association, as defined by number of clique-genes identified, was higher in GER (9) than in SA (3). Similar observations of varying levels of association were made for many pathways, as can be seen in Figure 3.

Based on our analysis of the common cliques, unique cliques, and pathways associated with the cliques for all four populations, gene signatures for CRC can then be developed from the genes identified in these cliques. Specific gene signatures for each individual population could also be developed using the unique cliques that were identified in each population.

Analysis of CliqueStrength

The parameter CliqueStrength was computed for all cliques in the population-specific networks based on both topological and biological features (method section - equation (ix)). Cliques associated with high CliqueStrength were considered important in these networks.

To assess the usefulness of the CliqueStrength parameter, we analyzed two top-scored cliques that were common or unique across the populations. Table 4 shows the top scored cliques in each network, their associated biological processes (GOTerm Finder, David level -3 terms), and the genes associated with each process. Table 4 additionally shows that the CliqueStrength of a common clique varies across population, as can be seen by comparing scores for the first clique in USA (5.91) and CHN (5.85); this is due to the fact that the parameter is a function of topological and biological features, which, along with network size, is variable across populations. The genes identified in the biological processes in the top scored cliques were either transcription factors (SMAD2, JUN, SMAD3), hub nodes, or those genes discussed in Table 5 that are known to play an important role in CRC. The top scoring clique identified in SA network was {MCM10, MCM2, MCM3, MCM4, MCM6, MCM7, ORC2L}. Interaction of highly expressed ORC2 and MCM6 is responsible for the initiation of DNA replication [46]. Moreover, ORC2L is not yet identified in CRC but is associated with breast cancer [43]. These results suggest that the top scored cliques indeed are associated with genes of significance in CRC.

Table 4 Top scored cliques in each population network

Full size table

Table 5 Analysis of clique connectivity profile MaxCliques

Full size table

The high scoring cliques were further considered as a seed to identify the clique connectivity profiles (CCP) for gene-signature identification. The overall connectivity using the top scoring clique can help to identify the CRC gene signature profile for a specific population.

Discovering clique connectivity profile (CCP)

Cliques cannot carry out biological processes in isolation, but rather, they interact with other cliques in order to perform a biological process. These interactions can also help to identify the interacting pathways between cliques. Identifying a clique's connectivity profile (CCP) is important for better understanding the biological processes and pathways. To decipher how these cliques interacted in the network, we analyzed them based on their connectivity profile. In our algorithm, we considered the connectivity of cliques based on two parameters: (i) identification of common (links) genes, and (ii) CliqueStrength (based on topological and biological features- equation (viii)). The initial condition of the CCP algorithm was the identification of common genes across cliques. For this analysis, the connectivity between two cliques was computed based on the following two conditions: (a) maximum number of common genes across cliques and Highest CliqueStrength (MaxCliques), and (b) minimum number of common genes across cliques and Highest CliqueStrength (MinCliques), (nn < = 4, where nn = number of common genes across cliques).

Analysis based on MaxCliques condition

The clique with maximum CliqueStrength was selected as a seed, and the CCP was determined based on maximum common nodes and highest CliqueStrength until no new cliques could be added. For each iteration, the CliqueConnectivityScore was computed as described in the algorithm (Method section equation (x)). Figure 4 depicts the connectivity profile for one of the top common scoring cliques identified in USA, GER and CHN. These populations had cliques that were common and unique to all three connectivity profiles. Although the three CCPs shown in Figure 4 originated from the same seed, their connectivity profiles diverged at cliques #17 {EP300, ESR1, SMAD4, SMAD3, CBP, AR, CTNNB1}, #39 {BRCA1, EP300, ESR1, SMAD3, SMAD, TP53, SP1}, and #GC1 {EGFR, GRB2, PIK3R1, PTPN6, SHC, SRC, ESR1}. Clique#17 was the first divergent point where the profiles differed for CHN when compared to USA and GER. Expression of AR, which was included in Clique#17, has been associated with BRCA1 mutations in breast cancer [86]. Clique#39 includes BRCA1, whose mutations are associated with early-onset of colorectal cancer [48].

Transcription factors (SMAD2, SMAD3, P53, SMAD4, SP1, JUN) significant in CRC [48] were also identified both as hub nodes in our analysis and in these connectivity profiles. Using GOTerm finder and David-level 3, the biological processes for these CCPs were identified, and pathways associated with these CCPs were obtained from the KEGG database. Table 5 shows the enrichment with respect to GO biological processes and pathway analysis for these clique connectivity profiles.

The biological processes enriched in all three clique connectivity profiles (CCP) included Cellular process, and Cell differentiation. These were analyzed in earlier section of this paper and proved to be significant in CRC. The total numbers of pathways identified for genes present in the CCPs for each population were: 6 (USA), 22 (GER), and 25 (CHN). All pathways identified in USA and GER were also present in CHN. In USA, the pathway with lowest E-value was the Wnt signaling pathway; however, for GER, the ErbB signaling pathway was the lowest. Wnt signaling was identified in CHN along with the MAPK and Chemokine signaling pathways. These pathways are all known to be associated with biological processes in CRC [59, 87–89].

Figure 5 depicts the CCPs constructed using MaxCliques for top scored cliques common to USA and SA. From this figure it can be observed that the CCP diverges at the seed itself, indicating divergence in gene regulation between USA and SA. The biological processes associated with USA CCP were: positive regulation of cellular processes (1.2E-7), regulation of cell differentiation (1.1E-4), regulation of DNA binding (0.0062), and regulation of cell growth (1.3E-3) associated with CRC pathways. The biological processes associated with SA CCP were: regulation of metabolic process (4.8E-06), regulation of cell differentiation (6.9E-4), regulation of immune response (9.83E-09), and their associated pathways were: TGF-beta signaling, Wnt signaling, NOD-like receptor pathways, Toll-like signaling pathways, and MyD88 induced toll-like receptor pathways. Most of the biological processes identified across the CCP were common to both and are known to be associated with CRC, but the pathways associated with these processes were not overlapping. In the SA CCP, toll-like signaling pathway is identified. Toll-like receptor pathways play a key role in all the immune responses in CRC and are identified for cancer therapy [90]. While common cliques and pathways were identified for the populations of interest, subsequent analysis was also able to determine points of divergence within the connectivity profiles across all four populations.

This analysis depicts the importance of cliques and their connectivity profiles with respect to the important biological processes, and pathways and it helps to demonstrate the divergence of them across the four populations.

Analysis based on MinCliques condition

Using the same seed as given in Figure 4, we found the CCPs for MinCliques as shown in Figure 6. The USA CCPs identifed a new clique that contains the EGFR gene, whereas for GER and CHN, the CCP identified the same divergent clique as shown in Figure 4 {EGFR, GRB2, PIK3R1, PTPN6, SHC, SRC, ESR1}. The CCP then diverged for GER and CHN. For CHN, the new connected clique contained the genes {ZAP70, VAV1, FYN, and CRK} while the GER clique had {STAT1, PTPN11, EGFR, JAK2, STAT5B, STAT5A, and STAT3}. ZAP 70 and VAV1 are known to be over-expressed in CRC [91], and STAT1, JAK2, and others are related to the JAK-STAT pathways associated with CRC. Figure 6 depicts the advantages and disadvantages of using a threshold for overlapping nodes to identify CCP - the altering the number of overlapping nodes.

When all clique sizes were considered for the MaxCliques algorithm, our study identified cliques with TAF1, TAF10, JUN, and FOS in all the populations. TAF1 is a regulator of apoptosis in cancer [92] and has been identified to be up-regulated in NCI-60 cell lines [93]. KRAS, which was present in a size three clique, was identified in USA, SA, and China populations. KRAS clique had its CCPs connected with the clique of BCL2 (size five-six). KRAS pathways are known to be associated with CRC [54]. The clique {MCM10, MCM2, MCM3, MCM4, MCM6, MCM7, ORC2L} identified in Table 4 was associated with cliques of CDKN1A (size four, five) in all the populations and down-regulation of CDKN1A plays a role in CRC [94]. The clique {DIS3, EXOSC2, EXOSC4, EXOSC5, EXOSC7, EXOSC8, EXOSC9, MPP6} identified in Table 3 does not have any CCP with any other size cliques in any population networks.

Genes present in cliques can form signatures for CRC specific to population. The CCP algorithm, if preceded with either MaxCliques or MinCliques, will still identify the important genes associated with CRC. Analysis of cliques of all sizes can identify the divergence in CCPs across population. As the definition of cliques is more stringent than that of modules, networks have fewer cliques than modules, allowing for more manageable analysis. Our analysis showed that CCPs can identify the commonality and divergence across populations. The ability of both cliques and CCPs to identify commonalities and divergences allows for them to be considered as gene signatures for CRC and can be evaluated further in the laboratory.

Conclusions

In this paper we developed a methodology for identification of commonalities and variations in CRC across populations by evaluating cliques and their connectivity profiles. In this study, we considered four distinct populations across the world. We used both topological and biological features - specifically co-expression, GO distances for biological process, and pathway similarity scores - in our network analysis. We additionally introduced the concept of using cliques to capture gene signatures for CRC across populations. The methodology developed for joining cliques is powerful for finding the commonalities and divergences among populations with respect to their gene signatures. Using the CCP, we were able to capture important network components, including biological processes, pathways, and genes, and use these to elucidate the gene signature of CRC. The advantage of using cliques as opposed to functional modules is that although there are fewer cliques in a network, they are still able to capture the key gene signatures of a disease. Though the current study only applied the use of clique analysis to small datasets, we plan to validate the procedure in larger datasets. We additionally plan to make our CCP algorithm more stringent with respect to overlapping nodes. As our methodology is scalable with respect to annotation, different features such as static and dynamic profiles, literature score, and phenotypes can give in-depth stratification of CRC across populations. Comparison of all cliques (through their CCP) as gene signatures across populations may ultimately aid the advancement of personalized medicine and the identification of efficient drug targets.

Methods

In order to decipher the gene signatures and identify the similarity/uniqueness among the four different populations of CRC, the following methodology, as illustrated in Figure 1, was adopted.

Datasets

Four independent microarray studies available in the public domain repository GEO http://www.ncbi.nih.gov/geo were considered for this study. These studies were performed on the GPL 570 platform. The datasets from four different food habitats were considered - CHN, GER, SA and USA. These populations are quite distinct with respect to each other as there is less commonality in their diet and environmental conditions. The statistics for these different datasets are: (i) GER (GSE4183): 23 disease and 8 healthy control samples; (ii) SA (GSE23878): 35 disease and 24 healthy control samples; (iii) USA (GSE 13471): 4 disease and 4 healthy control samples; and, (iv) CHN (GSE22242): 1 disease and 1 healthy control sample. Raw data in each case was processed using the RMA algorithm in R Bio conductor http://www.r-project.org[47]. The normalized datasets were then analyzed by two-sample t-test. The genes satisfying the t-test (p-value < 0.05 and q-value with FDR < 0.1) were further considered for differential expression analysis across the populations.

Construction of the interaction network

For the above genes the population specific networks, were constructed using the protein-protein interactions obtained from the HPRD database [46].

Analysis of population specific networks

Networks were first analyzed individually based on their topological and biological features. Each node in the network was first annotated for its topological properties, with the edges providing the biological significance.

Node strength based on topological properties

Using the statistical computing tool R, each node in the network was scored for its Degree, Eccentricity, Closeness, and Betweenness properties. Degree was defined by the number of connections a given node had with other nodes in the network. Eccentricity of a node was defined by the ease with which it could be accessed by all the other nodes in the network. Eccentricity of a node v was calculated by computing the shortest path between the node v and all other nodes in the network as,

E c c e n t r i c i t y (E_{e c c} (v)) = \frac{1}{max \{d i s t (v, w) : w \in V\}}

(i)

Where w represents the number of nodes in set V of nodes and has the shortest distance to node v.

Closeness of a node v is the average of the shortest path between the node v and all other nodes in the network and was given by,

C l o s e n e s s (C_{c l s o} (v)) = \frac{1}{\sum_{w \in V} d i s t (v, w)}

(ii)

Betweenness of a node v is the inverse of the ratio of total number of shortest paths from node s to node t given by σ_st to the number of total paths passing through node v (σ_st (v)). This was computed as,

B e t w e e n n e s s (B_{b e t} (v)) = \sum_{s \neq v \neq t} \frac{σ_{s t} (v)}{σ_{s t}}

(iii)

Each of the above features was computed and normalized to obtain the node strength for v, which was given as,

N o d e S t r e n g t h_{v} = \frac{(D e g r e e + E_{c c} + C_{c l s o} + B_{b e t})}{4}

(iv)

Edge strength based on biological properties

The edge strength was defined as the weight assigned to an edge connecting the two nodes (v_i, v_j ) in the network. Edge strength was computed based on three biological features: PCC, Gene ontology distance, and pathway similarity score. PCC was used as a similarity measure between the two nodes as it identified the co-expressed genes, which encode interacting proteins and help in understanding cellular patterns [95], in the network.

The PCC was calculated for nodes (v_i, v_j ) as,

PearsonCor r_{vi vj} = \frac{\sum_{k = 1}^{n} (v i_{k} - v i_{m e a n}) (v j_{k} - v j_{m e a n})}{\sqrt{\sum_{k = 1}^{n} {(v i_{k} - v i_{m e a n})}^{2} {(v j_{k} - v j_{m e a n})}^{2}}}

(v)

Where vi_mean, vj_mean of the sample is means for the genes i and j, and n is number of samples.

The genes in the network were annotated based on GO biological process and evaluated for their similarity. The GO distance similarity for nodes (v_i, v_j ) was computed as [96]:

GeneOntologyDistance (v_{i} v_{j}) = \frac{# (G O (G_{i}) Δ G O (G_{j}))}{# (G O (G i) \cup G O (G_{j})) + # (G O (G i) \cap G O (G_{j})}

(vi)

Where, Δ is the symmetric set difference, and GO(G_i ) is the number of GO annotations for v_i . Similarly, we computed GO(G_i )) for v_j . If the GO distance between (v_i, v_j ) was less than 1.0, they were considered interacting. The interacting nodes are considered for constructing the network.

The Pathway similarity score was computed using pathways in KEGG database [53, 54]. Each gene was annotated with its associated pathway, and the gene-pathway similarity score was computed as follows:

Let (v_i, v_j ) represent the two nodes in the network. Let P_N represent a set of pathways where gene v_i is present, and P_M represent the set of pathways where gene v_j is present. P_common then equals the number of common pathways identified in P_N and P_M , and Unique equals the unique number of pathways present in P_N and P_M . The pathway similarity score between (v_i, v_j ) is defined as:

PathwaySimilarityScor e_{(v_{i} v_{j})} = \frac{\frac{P c o m m o n}{P_{N}} + \frac{P c o m m o n}{P_{M}}}{U n i q u e (P_{N} + P_{M})}

(vii)

The three biological features were further normalized, and each interaction in the network was scored based on the average score for each of the features and given as,

EdgeStrengt h_{v_{i} v_{j}} = \frac{P e a r s o n C o r r_{N o r m} + G e n e O n t o l o g y D i s t a n c e_{N o r m} + P a t h w a y S i m i l a r i t y S c o r e_{N o r m}}{3}

(viii)

Identification of cliques

Cliques are fully connected, conserved, and co-expressed in the networks [33, 42]. We developed a graph-based approach to identify cliques in the networks with the purpose of understanding them as gene signatures across population. A clique was defined as a fully connected graph, as shown in Figure 1 (a). Let G = (V, E) be any arbitrary undirected graph with V = {1,2,3 ... N} as its vertices, and E = {(1, 2), (1, 3), ..., (1, N)}, the set of corresponding edges. A clique C is a sub-graph of V such that C ∈ V and every vertex of C in the sub-graph is connected to all the other C - 1 vertcies. Each population network was then analyzed for cliques of various sizes, ranging from 3 to M nodes. For our analysis, M = 7. The strength of a clique was defined based on the associated node strength (eqn. (iv)) and edge strengths (viii)), and computed as:

CliqueStrengt h_{i} = \sum_{N o d e j = 1}^{n} \sum_{E d g e k = 1}^{e} N o d e S t r e n g t h + E d g e S t r e n g t h

(ix)

We used the greedy algorithm to first identify three-node cliques in the networks as a seed. The seed was then used for identifying cliques of higher sizes, ranging from four to seven nodes.

Clique connectivity profile algorithm (CCP)

To understand the profile of the cliques across population, we developed an algorithm to discover the connectivity profile of the cliques based on the number of common nodes. Our hypothesis for this connectivity rule was that cliques with common nodes may have similar pathways and Gene Ontology biological processes. Each clique may traverse the network by taking different paths. Identification of the clique connection profile (CCP) was important to understanding the gene signature of CRC as the interacting genes in these cliques might be important for a function in a given biological process. The CCP algorithm annotated each clique with its total CliqueStrength (equation (ix)), and then identified its closest clique connection based on the number of common nodes and CliqueStrength. This CCP algorithm iteratively progressed until no new clique could be added to the path. The clique connectivity strength was computed as,

CliqueConnectivityScor e_{i, j} = \sum \frac{C l i q u e S t r e n g t h_{i j}}{2}

(x)

The CCP algorithm first identified the clique (for similar size) with highest strength common to all the population. Using this as a seed, the algorithm proceeded ultimately produced a network of cliques that provided the gene signatures that are present across the populations for CRC. The smaller size cliques were added to this network of cliques. These CCPs could then be used to understand the commonality and uniqueness as gene-signatures in CRC across populations.

Statistical evaluation of cliques using gene enrichment analysis

Hyper-geometric distribution based on p-value was used for identifying the significance of GOTerms in the network and cliques [97] and was computed as,

p - value = \sum_{i = x}^{N} \frac{(\begin{matrix} g \\ g g \end{matrix}) (\begin{matrix} G - g \\ n - g g \end{matrix})}{(\begin{matrix} G \\ n \end{matrix})}

(xi)

where, significance of a given GOTerm x, gg genes in the n genes of cliques, and that is associated to g genes from G genes in the population network. GO Terms with p-values less than 0.05 were further used for analyzing the biological significance of the cliques.

Abbreviations

CRC:: Colorectal cancer
CCP:: clique connectivity profile
GO:: Gene Ontology
GER:: Germany
SA:: Saudi Arabia
CHN:: China
PCC:: Pearson correlation coefficient.

References

Burkitt D: Possible relationships between bowel cancer and dietary habbits. Proc R Soc Med. 1971, 64: 964-5.
PubMed Central CAS PubMed Google Scholar
Sridhar R, et al: A molecular signature of metastasis in primary solid tumors. Nature Genetics. 2003, 33: 49-54. 10.1038/ng1060.
Google Scholar
Rhodes DR, et al: Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci USA. 2004, 101 (25): 9309-14. 10.1073/pnas.0401994101.
PubMed Central CAS PubMed Google Scholar
Kavak E, et al: Meta-analysis of cancer gene expression signatures reveals new cancer genes, SAGE tags and tumor associated regions of co-regulation. Nucleic Acids Research. 2010, 38 (20): 7008-21. 10.1093/nar/gkq574.
PubMed Central CAS PubMed Google Scholar
Xu L, Geman D, Winslow RL: Large-scale integration of cancer microarray data identifies a robust common cancer signature. BMC Bioinformatics. 2007, 8: 275-10.1186/1471-2105-8-275.
PubMed Central PubMed Google Scholar
Axelsen JB, et al: Genes overexpressed in different human solid cancers exhibit different tissue-specific expression profiles. Proc Natl Acad Sci USA. 2007, 104 (32): 13122-7. 10.1073/pnas.0705824104.
PubMed Google Scholar
Altman RB, Raychaudhuri S: Whole-genome expression analysis: challenges beyond clustering. Current Opinion in Structural Biology. 2001, 11 (3): 340-347. 10.1016/S0959-440X(00)00212-8.
CAS PubMed Google Scholar
Golub T, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. Science. 1999, 286 (5439): 531-537. 10.1126/science.286.5439.531.
CAS PubMed Google Scholar
Alizadeh AA, et al: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000, 403 (6769): 503-11. 10.1038/35000501.
CAS PubMed Google Scholar
Bernards R, et al: A gene-expression signature as a predictor of survival in breast cancer. New England Journal of Medicine. 2002, 347 (25): 1999-2009. 10.1056/NEJMoa021967.
PubMed Google Scholar
Beer DG, et al: Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med. 2002, 8 (8): 816-24.
CAS PubMed Google Scholar
Andrisani OM, Studach L, Merle P: Gene signatures in hepatocellular carcinoma (HCC). Semin Cancer Biol. 2011, 21 (1): 4-9. 10.1016/j.semcancer.2010.09.002.
PubMed Central CAS PubMed Google Scholar
Jurchott K, et al: Identification of Y-Box Binding Protein 1 As a Core Regulator of MEK/ERK Pathway-Dependent Gene Signatures in Colorectal Cancer Cells. Plos Genetics. 2010, 6 (12): e1001231-10.1371/journal.pgen.1001231.
PubMed Central PubMed Google Scholar
Fu LM, Fu-Liu CS: Multi-class cancer subtype classification based on gene expression signatures with reliability analysis. Febs Letters. 2004, 561 (1-3): 186-190. 10.1016/S0014-5793(04)00175-9.
CAS PubMed Google Scholar
Ideker T, Ozier O, Schwikowski B, Siegal AF: Discovering regulatory and signaling circuits in molecular interaction networks. Bioinformatics. 2002, 18 (Suppl 1): S233-S240. 10.1093/bioinformatics/18.suppl_1.S233.
PubMed Google Scholar
Oda K: Targeting Ras-PI3K/mTOR pathway and the predictive biomarkers in endometrial cancer. Gan To Kagaku Ryoho. 2011, 38 (7): 1084-7.
CAS PubMed Google Scholar
Kakkar R, Lee RT: The IL-33/ST2 pathway: therapeutic target and novel biomarker. Nat Rev Drug Discov. 2008, 7 (10): 827-40. 10.1038/nrd2660.
PubMed Central CAS PubMed Google Scholar
Andersen JN, et al: Pathway-based identification of biomarkers for targeted therapeutics: personalized oncology with PI3K pathway inhibitors. Sci Transl Med. 2010, 2 (43): 43ra55-10.1126/scitranslmed.3001065.
PubMed Google Scholar
Zhang F, Chen JY: Discovery of pathway biomarkers from coupled proteomics and systems biology methods. BMC Genomics. 2010, 11 (Suppl 2): S12-10.1186/1471-2164-11-S2-S12.
Google Scholar
Bandyopadhyay N, et al: Pathway-based feature selection algorithm for cancer microarray data. Adv Bioinformatics. 2009, 532989-
Google Scholar
Muller H, et al: Graph-based identification of cancer signaling pathways from published gene expression signatures using PubLiME. Nucleic Acids Research. 2007, 35 (7): 2343-2355. 10.1093/nar/gkm119.
PubMed Central PubMed Google Scholar
Bild AH, et al: Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature. 2006, 439 (7074): 353-357. 10.1038/nature04296.
CAS PubMed Google Scholar
Chuang HY, et al: Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007, 3: 140-
PubMed Central PubMed Google Scholar
Li J, et al: Identification of high-quality cancer prognostic markers and metastasis network modules. Nature Communications. 2010, 1 (4): 1-8.
Google Scholar
Swami M: CANCER GENOMICS A modular approach to signalling. Nature Reviews Genetics. 2009, 10 (6): 348-348. 10.1038/nrg2602.
CAS Google Scholar
Hartwell LH, et al: From molecular to modular cell biology. Nature. 1999, 402 (6761 Suppl): C47-52.
CAS PubMed Google Scholar
Ulitsky I, Shamir R: Identification of functional modules using network topology and high-throughput data. BMC Systems Biology. 2007, 1: 8-10.1186/1752-0509-1-8.
PubMed Central PubMed Google Scholar
Maraziotis IA, Dimitrakopoulou K, Bezerianos A: Growing functional modules from a seed protein via integration of protein interaction and gene expression data. BMC Bioinformatics. 2007, 8: 408-10.1186/1471-2105-8-408.
PubMed Central PubMed Google Scholar
Tornow S, Mewes HW: Functional modules by relating protein interaction networks and gene expression. Nucleic Acids Research. 2003, 31 (21): 6283-9. 10.1093/nar/gkg838.
PubMed Central CAS PubMed Google Scholar
Dittrich MT, et al: Identifying functional modules in protein-protein interaction networks: an integrated exact approach. Bioinformatics. 2008, 24 (13): i223-31. 10.1093/bioinformatics/btn161.
PubMed Central CAS PubMed Google Scholar
Chen TC, et al: Cliques in mitotic spindle network bring kinetochore-associated complexes to form dependence pathway. Proteomics. 2009, 9 (16): 4048-62. 10.1002/pmic.200900231.
CAS PubMed Google Scholar
Hu H, Yan X, Huang Y, Han J, Zhou X: Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics. 2005, 21 (Suppl 1): i213-i221. 10.1093/bioinformatics/bti1049.
CAS PubMed Google Scholar
Shen Q-JJaH-B: Maximum-clique algorithm: an Effective Method to Mine Large-scale Co-expressed Genes in Arabidopsis Anther. Proceeding of the 30th Chinese Control Conference. 2011
Google Scholar
Wu H, et al: Prediction of functional modules based on comparative genome analysis and Gene Ontology application. Nucleic Acids Research. 2005, 33 (9): 2822-37. 10.1093/nar/gki573.
PubMed Central CAS PubMed Google Scholar
Zhao S, Li Y: Extracting functional modules from biological pathways. Nature precedings. 2007, 1457: 1-
Google Scholar
Li Y, Agarwal P, Rajagopalan D: A global pathway crosstalk network. Bioinformatics. 2008, 24 (12): 1442-1447. 10.1093/bioinformatics/btn200.
CAS PubMed Google Scholar
Clements M, et al: Integration of known transcription factor binding site information and gene expression data to advance from co-expression to co-regulation. Genomics Proteomics Bioinformatics. 2007, 5 (2): 86-101. 10.1016/S1672-0229(07)60019-9.
CAS PubMed Google Scholar
Storey JD, et al: Gene-expression variation within and among human populations. Am J Hum Genet. 2007, 80 (3): 502-509. 10.1086/512017.
PubMed Central CAS PubMed Google Scholar
Minguez P, Dopazo J: Assessing the Biological Significance of Gene Expression Signatures and Co-Expression Modules by Studying Their Network Properties. PLOs One. 2011, 6 (3): e17474-10.1371/journal.pone.0017474.
PubMed Central CAS PubMed Google Scholar
Mao LY, et al: Arabidopsis gene co-expression network and its functional modules. BMC Bioinformatics. 2009, 10: 346-10.1186/1471-2105-10-346.
PubMed Central PubMed Google Scholar
Jovov B, Araujo-Perez F, Sigel CS, Stratford JK, McCoy AN, Yeh JJ, Keku T: Differential Gene Expression between African American and European American Colorectal Cancer Patients. PLOs One. 2012, 7 (1): e30168-10.1371/journal.pone.0030168.
PubMed Central CAS PubMed Google Scholar
Koyuturk M, et al: Detecting conserved interaction patterns in biological networks. J Comput Biol. 2006, 13 (7): 1299-322. 10.1089/cmb.2006.13.1299.
PubMed Google Scholar
Mostafavi S, et al: GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol. 2008, 9 (Suppl 1): S4-10.1186/gb-2008-9-s1-s4.
PubMed Central PubMed Google Scholar
Shi Z, Derow CK, Zhang B: Co-expression module analysis reveals biological processes, genomic gain, and regulatory mechanisms associated with breast cancer progression. Bmc Systems Biology. 2010, 4: 74-10.1186/1752-0509-4-74.
PubMed Central PubMed Google Scholar
Cerami E, et al: Automated Network Analysis Identifies Core Pathways in Glioblastoma. PLoS One. 2010, 5 (2): e8918-10.1371/journal.pone.0008918.
PubMed Central PubMed Google Scholar
Prasad TSK, et al: Human Protein Reference Database-2009 update. Nucleic Acids Research. 2009, 37: D767-D772. 10.1093/nar/gkn892.
CAS Google Scholar
Bolstad BM, et al: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003, 19 (2): 185-93. 10.1093/bioinformatics/19.2.185.
CAS PubMed Google Scholar
Pradhan MP, Prasad NK, Palakal MJ: A Systems Biology Approach to the Global Analysis of Transcription Factors in Colorectal Cancer. BMC Cancer. 2012, 12: 331-10.1186/1471-2407-12-331.
PubMed Central CAS PubMed Google Scholar
Xiang H, Wang Q, Hu Z, Xu J, Wang W, Wei Z: Expression of EGFR, Grb2, p-mTOr and VEGFR in human colorectal cancer. Academic Journal of Second Military Medical University. 2009, 29 (7): 775-779.
Google Scholar
Seiden-Long I, Navab R, Shih W, Li M, Chow J, Zhu CQ, Radulovich N, Caroline Saucier C, Tsao MS: Gab1 but not Grb2 mediates tumor progression in Met overexpressing colorectal cancer cells. Carcinogenesis. 2008, 29 (3): 647-655.
CAS PubMed Google Scholar
Vaughan TY, Verma S, Bunting KD: Grb2-associated binding (Gab) proteins in hematopoietic and immune cell biology. Am J Blood Res. 2011, 1 (2): 130-134.
PubMed Central CAS PubMed Google Scholar
Bryan EJ, et al: Mutation analysis of EP300 in colon, breast and ovarian carcinomas. Int J Cancer. 2002, 102 (2): 137-41. 10.1002/ijc.10682.
CAS PubMed Google Scholar
Kanehisa MaGS: KEGG: Kyoto Enclyopedia of Genes and Genomes. Nucleic Acids Research. 2000, 28: 27-30. 10.1093/nar/28.1.27.
PubMed Central CAS PubMed Google Scholar
Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M: KEGG for integration and interpretation of large-scale molecular datasets. Nucleic Acids Research. 2012, 40: D109-D114. 10.1093/nar/gkr988.
PubMed Central CAS PubMed Google Scholar
Lee Y, et al: ATXN1 protein family and CIC regulate extracellular matrix remodeling and lung alveolarization. Dev Cell. 2011, 21 (4): 746-57. 10.1016/j.devcel.2011.08.017.
PubMed Central CAS PubMed Google Scholar
Cohen SJ, Cohen RB, Meropol NJ: Targeting signal transduction pathways in colorectal cancer - More than skin deep. Journal of Clinical Oncology. 2005, 23 (23): 5374-5385. 10.1200/JCO.2005.02.194.
CAS PubMed Google Scholar
Waldner M, Schimanski CC, Neurath MF: Colon cancer and the immune system: the role of tumor invading T cells. World J Gastroenterol. 2006, 12 (45): 7233-8.
PubMed Central CAS PubMed Google Scholar
Ambs S, et al: Relationship between p53 mutations and inducible nitric oxide synthase expression in human colorectal cancer. Journal of the National Cancer Institute. 1999, 91 (1): 86-88. 10.1093/jnci/91.1.86.
CAS PubMed Google Scholar
Fang JY, Richardson BC: The MAPK signalling pathways and colorectal cancer. Lancet Oncology. 2005, 6 (5): 322-327. 10.1016/S1470-2045(05)70168-6.
CAS PubMed Google Scholar
Danning He Z-PL, Luonan Chen: Identification of dysfunctional modules and disease genes in congential heart disease by a network-based approach. BMC Genomics. 2011, 12: 592-10.1186/1471-2164-12-592.
PubMed Central PubMed Google Scholar
Xu Y, Pasche B: TGF-beta signaling alterations and susceptibility to colorectal cancer. Hum Mol Genet. 2007, 16 (Spec No 1): R14-20.
PubMed Central CAS PubMed Google Scholar
Ray RM, Bhattacharya S, Johnson LR: Mdm2 inhibition induces apoptosis in p53 deficient human colon cancer cells by activating p73- and E2F1-mediated expression of PUMA and Siva-1. Apoptosis. 2011, 16 (1): 35-44. 10.1007/s10495-010-0538-0.
CAS PubMed Google Scholar
Pradhan M, Palakal M: Identifying CRC specific pathways and drug targets from literature augmented proteomics data. Proceedings of BioCOMP. 2010, II: 323-329.
Google Scholar
Boyle EI, et al: GO::TermFinder - open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics. 2004, 20 (18): 3710-3715. 10.1093/bioinformatics/bth456.
PubMed Central CAS PubMed Google Scholar
Dennis G, et al: DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol. 2003, 4 (5): P3-10.1186/gb-2003-4-5-p3.
PubMed Google Scholar
Valentini AM, Pirrelli M, Caruso ML: EGFR-targeted therapy in colorectal cancer: Does immunohistochemistry deserve a role in predicting the response to cetuximab?. Current Opinion in Molecular Therapeutics. 2008, 10 (2): 124-131.
CAS PubMed Google Scholar
Shankaran V, Obel J, Benson AB: Predicting response to EGFR inhibitors in metastatic colorectal cancer: current practice and future directions. Oncologist. 2010, 15 (2): 157-67. 10.1634/theoncologist.2009-0221.
PubMed Central CAS PubMed Google Scholar
Bardelli A, Siena S: Molecular mechanisms of resistance to cetuximab and panitumumab in colorectal cancer. J Clin Oncol. 2010, 28 (7): 1254-1261. 10.1200/JCO.2009.24.6116.
CAS PubMed Google Scholar
Horii Joichiro, et al: Methylation of estrogen receptor 1 in colorectal adenomas is not age-dependent, but is correlated with K-ras mutation. Cancer Science. 2009, 100 (6): 1005-1011. 10.1111/j.1349-7006.2009.01140.x.
CAS PubMed Google Scholar
Mlakar V, et al: Presence of activating KRAS mutations correlates significantly with expression of tumour suppressor genes DCN and TPM1 in colorectal cancer. Bmc Cancer. 2009, 9: 282-10.1186/1471-2407-9-282.
PubMed Central PubMed Google Scholar
Lockwood W, Chari R, Coe BP, Girard L, MacAulay C, Lam S, Gazdar AF, Minna JD, Lam WL: DNA amplification is a ubiquitous mechanism of oncogene activation in lung and other cancers. Oncogene. 2008, 27 (33): 4615-4624. 10.1038/onc.2008.98.
PubMed Central CAS PubMed Google Scholar
Ishihama K, et al: Expression of HDAC1 and CBP/p300 in human colorectal carcinomas. J Clin Pathol. 2007, 60 (11): 1205-10. 10.1136/jcp.2005.029165.
PubMed Central CAS PubMed Google Scholar
Suchy J, et al: BRCA1 mutations and colorectal cancer in Poland. Fam Cancer. 2010, 9 (4): 541-4. 10.1007/s10689-010-9378-x.
CAS PubMed Google Scholar
Xie W, Rimm DL, Lin Y, Shhih WJ, Reiss M: Loss of Smad Signaling in Human Colorectal cancer is associated with advanced disease and poor prognosis. Cancer Journal. 2003, 9 (4): 302-312. 10.1097/00130404-200307000-00013.
CAS Google Scholar
Li F, Cao Y, Townsend CM, Ko CK: TGF-b Signaling in Colon Cancer Cells. World Journal of Surgery. 2005, 29 (3): 306-11. 10.1007/s00268-004-7813-6.
PubMed Google Scholar
Rodrigues NR, et al: P53 Mutations in Colorectal-Cancer. Proceedings of the National Academy of Sciences of the United States of America. 1990, 87 (19): 7555-7559. 10.1073/pnas.87.19.7555.
PubMed Central CAS PubMed Google Scholar
Schmitz M, et al: PCR-SSCP a sensitive and rapid method to detect mutations in the P53 tumor suppressor gene of patients with advanced colorectal cancer. European Journal of Cancer. 1995, 31A: 448-448.
Google Scholar
Liu Y, Bodmer WF: Analysis of P53 mutations and their expression in 56 colorectal cancer cell lines. Proceedings of the National Academy of Sciences of the United States of America. 2006, 103 (4): 976-981. 10.1073/pnas.0510146103.
PubMed Central CAS PubMed Google Scholar
Wu J, Lin W, Chen W, Huang Y, Tang C, Ho MS, Pi H, Chien C: CSN-mediated deneddylation differentially modulates Ci155 proteolysis to promote Hedgehog signalling responses. Nat Commun. 2011, 2: 182-
PubMed Central PubMed Google Scholar
Chene P: Inhibiting the p53-MDM2 interaction: an important target for cancer therapy. Nat Rev Cancer. 2003, 3 (2): 102-9. 10.1038/nrc991.
CAS PubMed Google Scholar
Das PM, Singal R: DNA methylation and cancer. J Clin Oncol. 2004, 22 (22): 4632-42. 10.1200/JCO.2004.07.151.
CAS PubMed Google Scholar
Lim J, et al: Isolation of murine and human homologues of the fission-yeast dis3+ gene encoding a mitotic-control protein and its overexpression in cancer cells with progressive phenotype. Cancer Res. 1997, 57 (5): 921-5.
CAS PubMed Google Scholar
Francisco M, Philippe L, Torben FØ, Karin B: Genes Involved in Human Ribosome Biogenesis are Transcriptionally Upregulated in Colorectal Cancer. Scholarly Research Exchange. 2009, 2009: Article ID 657042
Google Scholar
Takahashi S, Suzuki S, Inaguma S, Cho Y-M, Ikeda Y, Hayashi N, Inoue T, Sugimura Y, Nishiyama N, Fujita T, Ushijima T, Shirai T: Down-regulation of Lsm1 is involved in human prostate cancer progression. Br J Cancer. 2002, 86 (6): 940-946. 10.1038/sj.bjc.6600163.
PubMed Central CAS PubMed Google Scholar
Coovert D, Thanh TLe, Morris GE, Man NT, Kralewski M, Sendtner M, Burghes AHM: Does the survival motor neuron protein (SMN) interact with Bcl-2?. J Med Genet. 2000, 37: 536-539. 10.1136/jmg.37.7.536.
PubMed Central CAS PubMed Google Scholar
Pristauz G, et al: Androgen receptor expression in breast cancer patients tested for BRCA1 and BRCA2 mutations. Histopathology. 2010, 57 (6): 877-84. 10.1111/j.1365-2559.2010.03724.x.
PubMed Google Scholar
Bienz M, Clevers H: Linking colorectal cancer to Wnt signaling. Cell. 2000, 103 (2): 311-320. 10.1016/S0092-8674(00)00122-7.
CAS PubMed Google Scholar
Torsello A, et al: P53 and bcl-2 in colorectal cancer arising in patients under 40 years of age: Distribution and prognostic relevance. European Journal of Cancer. 2008, 44 (9): 1217-1222. 10.1016/j.ejca.2008.03.002.
CAS PubMed Google Scholar
Hembruff SL, Cheng N: Chemokine signaling in cancer: Implications on the tumor microenvironment and therapeutic targeting. Cancer Ther. 2009, 7 (A): 254-267.
PubMed Central CAS PubMed Google Scholar
So EY, Ouchi T: The application of Toll like receptors for cancer therapy. International Journal of Biological Sciences. 2010, 6 (7): 675-681.
PubMed Central CAS PubMed Google Scholar
Huang MY, et al: CDC25A, VAV1, TP73, BRCA1 and ZAP70 gene overexpression correlates with radiation response in colorectal cancer. Oncology Reports. 2011, 25 (5): 1297-306.
CAS PubMed Google Scholar
Kimura J, et al: A functional genome-wide RNAi screen identifies TAF1 as a regulator for apoptosis in response to genotoxic stress. Nucleic Acids Research. 2008, 36 (16): 5250-5259. 10.1093/nar/gkn506.
PubMed Central CAS PubMed Google Scholar
Amundson SA, et al: Integrating global gene expression and radiation survival parameters across the 60 cell lines of the National Cancer Institute Anticancer Drug Screen. Cancer Res. 2008, 68 (2): 415-24. 10.1158/0008-5472.CAN-07-2120.
CAS PubMed Google Scholar
Ogino S, Kawasaki T, Kirkner GJ, Ogawa A, Dorfman I, Loda M, fuchs CS: Down-regulation of p21 (CDKN1A/CIP1) is inversely associated with microsatellite instability and CpG island methylator phenotype (CIMP) in colorectal cancer. J Pathol. 2006, 210 (2): 147-54. 10.1002/path.2030.
CAS PubMed Google Scholar
Patil A, Nakai K, Kinoshita K: Assessing the utility of gene co-expression stability in combination with correlation in the analysis of protein-protein interactions. BMC Genomics. 2011, 12 (Suppl 3): S19-10.1186/1471-2164-12-S3-S19.
PubMed Central CAS PubMed Google Scholar
Martin D, et al: GOToolBox: functional analysis of gene datasets based on Gene Ontology. Genome Biol. 2004, 5 (12): R101-10.1186/gb-2004-5-12-r101.
PubMed Central PubMed Google Scholar
Wang J, Chen G, Li M, Pan Y: Integration of breast cancer gene signatures based on graph centrality. Bmc Systems Biology. 2011, 5 (Suppl 3): S10-10.1186/1752-0509-5-S3-S10.
PubMed Central PubMed Google Scholar

Download references

Acknowledgements

This work was funded in part by a grant from the Department of Defense as part of the Cancer Care Engineering Project. We also want to thank Deepali Jhamb and Yogesh Pandit, members of the TiMAP laboratory at IUPUI, for their valuable suggestions and Jeff Hostetler for editing the manuscript.

This article has been published as part of BMC Systems Biology Volume 6 Supplement 3, 2012: Proceedings of The International Conference on Intelligent Biology and Medicine (ICIBM) - Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/6/S3.

Author information

Authors and Affiliations

School of Informatics, Indiana University Purdue University Indianapolis, IN, USA
Meeta P Pradhan, Kshithija Nagulapalli & Mathew J Palakal

Authors

Meeta P Pradhan
View author publications
You can also search for this author in PubMed Google Scholar
Kshithija Nagulapalli
View author publications
You can also search for this author in PubMed Google Scholar
Mathew J Palakal
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mathew J Palakal.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

MPP: conceptualizing and developing methodology, writing and analysis of all the algorithms, writing manuscript.

KN: Data collection and data filtering, drawing figures, critical reading and editing of the manuscript.

MJP: PI of the project, conceptualizing the objective, writing manuscript, valuable inputs at all times.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Pradhan, M.P., Nagulapalli, K. & Palakal, M.J. Cliques for the identification of gene signatures for colorectal cancer across population. BMC Syst Biol 6 (Suppl 3), S17 (2012). https://doi.org/10.1186/1752-0509-6-S3-S17

Download citation

Published: 17 December 2012
DOI: https://doi.org/10.1186/1752-0509-6-S3-S17

The International Conference on Intelligent Biology and Medicine (ICIBM): Systems Biology

Cliques for the identification of gene signatures for colorectal cancer across population

Abstract

Background

Results

Conclusions

Background

Results and discussion

Data analysis

Network construction

Analysis of population specific networks

Analysis of node strength based on topological properties

Analysis of biological features for population specific networks

Identification of cliques

Analysis of cliques common across the populations

Analysis of pathways associated with genes in cliques for all populations

Analysis of CliqueStrength

Discovering clique connectivity profile (CCP)

Analysis based on MaxCliques condition

Analysis based on MinCliques condition

Conclusions

Methods

Datasets

Construction of the interaction network

Analysis of population specific networks

Node strength based on topological properties

Edge strength based on biological properties

Identification of cliques

Clique connectivity profile algorithm (CCP)

Statistical evaluation of cliques using gene enrichment analysis

Abbreviations

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors' contributions

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Systems Biology

Contact us