Modularity-based credible prediction of disease genes and detection of disease subtypes on the phenotype-gene heterogeneous network
© Yao et al; licensee BioMed Central Ltd. 2011
Received: 4 February 2011
Accepted: 20 May 2011
Published: 20 May 2011
Protein-protein interaction networks and phenotype similarity information have been synthesized together to discover novel disease-causing genes. Genetic or phenotypic similarities are manifested as certain modularity properties in a phenotype-gene heterogeneous network consisting of the phenotype-phenotype similarity network, protein-protein interaction network and gene-disease association network. However, the quantitative analysis of modularity in the heterogeneous network and its influence on disease-gene discovery are still unaddressed. Furthermore, the genetic correspondence of the disease subtypes can be identified by marking the genes and phenotypes in the phenotype-gene network. We present a novel network inference method to measure the network modularity, and in particular to suggest the subtypes of diseases based on the heterogeneous network.
Based on a measure which is introduced to evaluate the closeness between two nodes in the phenotype-gene heterogeneous network, we developed a Hitting-Time-based method, CIPHER-HIT, for assessing the modularity of disease gene predictions and credibly prioritizing disease-causing genes, and then identifying the genetic modules corresponding to potential subtypes of the queried phenotype. The CIPHER-HIT is free to rely on any preset parameters. We found that when taking into account the modularity levels, the CIPHER-HIT method can significantly improve the performance of disease gene predictions, which demonstrates modularity is one of the key features for credible inference of disease genes on the phenotype-gene heterogeneous network. By applying the CIPHER-HIT to the subtype analysis of Breast cancer, we found that the prioritized genes can be divided into two sub-modules, one contains the members of the Fanconi anemia gene family, and the other contains a reported protein complex MRE11/RAD50/NBN.
The phenotype-gene heterogeneous network contains abundant information for not only disease genes discovery but also disease subtypes detection. The CIPHER-HIT method presented here is effective for network inference, particularly on credible prediction of disease genes and the subtype analysis of diseases, for example Breast cancer. This method provides a promising way to analyze heterogeneous biological networks, both globally and locally.
Disease gene prediction is one of the most important aims in biological and medical sciences. Network-based evidence as well as inference approaches has become more and more attractive in the research field of disease-causing gene discovery, and a variety of methods have been developed recently from this point of view [1–5]. Researchers also attach great importance to special features embedded in biological networks especially the protein-protein interaction (PPI) network for deeply understanding molecular mechanism of common human diseases [6–15]. Since genetic diseases are genetically or phenotypically similar, it is promising to combine the protein-protein interactions and the phenotype similarities to a phenotype-gene heterogeneous network to infer the candidate disease genes [1–4]. The so-called "phenotype-gene heterogeneous network" reflects a holistic view of complex relationships among various phenotypes and phenotypes, phenotypes and genes, as well as genes and genes, which consists of the phenotype-phenotype similarity network, gene-disease association network and protein-protein interaction network, respectively. Based on such a heterogeneous network, we propose a regression model named CIPHER (Correlating protein Interaction network and PHEnotype network to pRedict disease genes) to quantify the concordance between candidate genes and target phenotypes . The algorithm of random walk is also proposed to prioritize the candidate disease genes in protein-protein interaction networks  and then a random walk with restarts (RWR) method is extended to the above heterogeneous network .
In general, the network-based disease-gene discovery methods make use of information from both the topological structure and the associations between diseases and genes. The basic assumption is that similar disease phenotypes are caused by functionally related genes and these genes are likely to be close to each other on the protein-protein interaction networks, so that network modules are formed [15–18, 5]. Here the network module in computation refers to a group of genes exhibiting network proximity, and in biology refers to certain functional units such as protein complexes, signaling or metabolic pathways and transcriptional programs [16–19, 5]. Therefore, the algorithms in  prioritize candidate genes based on their closeness to known disease genes. After the similarity information between the phenotypes is provided by van Driel et al. through text mining technology , the phenotype similarity and the protein-protein interactions are combined together for the prioritization of the candidate disease genes [1, 2, 4].
However, so far little modularity analysis on the phenotype-gene heterogeneous networks has been done. The predicted results from the network inference methods need to be tested to see whether they form the modules and to which corresponding biological function they are related. In this paper, the network inference methods are further developed to measure the modularity property of the disease-gene prediction results. Furthermore, we also provide the method to infer the relationship between the subtypes of diseases and the modules formed by these predicted results.
Inference on the phenotype-gene heterogeneous network
For the network-based inference, a candidate gene g is prioritized to be a potential disease-causing gene of the target phenotype p if one or both of the followings are satisfied:
1. The gene g is close to some disease-causing genes associated with p.
2. The gene g is close to some phenotypes which are highly similar to p.
Hence one key point is to define the closeness between two nodes in the network, and this will be used to measure the similarity between the nodes based on the network topology [1, 2]. Currently the nearest-neighbor method considers the direct interactions information and ignores the long-range interactions. The shortest-path method considers the length of the shortest path connecting two nodes but ignores the number of short paths between them. The random walk with restart method [3, 4] combines the local and global network information to enhance the prediction performance.
Another key point is the priori information known about each target phenotype, the known disease-genes and the similar phenotypes. In the phenotype-gene heterogeneous network, for each given phenotype p, its known causing genes and the similar phenotypes are represented as the nodes which link to p directly, and these nodes are termed as the adjacent nodes of the target phenotype p in the heterogeneous network. The paths between p and any other nodes have to cross this adjacency set. Therefore, the prioritization can be carried out by measuring the closeness between the candidate genes (namely all genes in the protein-protein interaction network) and these adjacent nodes.
Candidate disease genes prioritization: which are the most credible?
Gene sets inference for the disease subtypes
Identifying subtypes of diseases such as cancer is of critical importance for predicting clinical outcomes as well as designing more-specific therapies for patients, facilitating a new era of translational medicine and personalized medicine [20, 21]. The intrinsic cancer subtypes have been studied in different ways by using histology, molecular pathology, genetic mutation and gene-expression information . The classification of human cancer has become more and more informative as the detailed molecular analysis is provided. For example, the molecular heterogeneity in tumor can be recognized according to the different patterns of the gene expression information [20–22]. Interestingly, Li et al. recently reported an integrative network analysis method to identify recurrent network modules that contribute to Breast cancer metastasis by using a set of tumour gene microarrays . Since molecular network modules have been detected in cancer subtypes , it is possible to use network modules to further classify Breast cancer into subtypes.
It is well accepted that similar phenotypes may be caused by functionally close related genes [1–16]. An extension of this assumption would be that genes related to different subtypes are likely to form distinct protein-protein interaction modules, which is a common indicator of gene functional relationship .
Thus, our CIPHER-HIT method is further used to identify the sub-groups of genes corresponding to the cancer subtypes. Such groups of genes are called sub-modules in the network, and the main task of our method is to identify the gene sets related to different subtypes of a target disease (or phenotype). In cases where the heterogeneity information of a phenotype is included in its adjacent nodes, it is promising to further classify the prioritized genes based on such information. The similar phenotypes and their associated genes have also provided information for identifying the sub-modules. For example, the phenotype node representing FANCONI ANIMIA has high topological similarity to the phenotype node BREAST CANCER. Recent studies demonstrate that genes FANCA, FANCB, FANCC, FANCD2, FANCE, FANCF and FANCG associated with Fanconi animia are closely related to the susceptibility of Breast cancer [25, 26]. These genes can be prioritized to be associated with Breast cancer by CIPHER-HIT successfully. In addition, by discriminating the adjacent nodes through which these genes are prioritized, they can also be marked as the sub-module corresponding to the subtype of Fanconi animia related Breast cancer.
Thus, in this work, we develop a method to reveal the relationship between each prioritized gene and each adjacent node so that the hierarchical clustering method is applied to discover the potential subtypes of the target phenotype. These results are meaningful for further biomedical and experimental researches, since they help to focus on the genes which are likely to form the sub-modules corresponding to the potential subtypes of diseases.
Results and Discussion
CIPHER-HIT: the topological closeness measure based on the Mean-Hitting-Time
The CIPHER method  and the random walk with restart method (RWR) [3, 4] are the approaches which reflect the global structural information of the phenotype-gene heterogeneous network, while the parameters such as the restart rate in RWR, which are related to the performance, are required to be pre-set. In the CIPHER-HIT method, we present a new closeness measure between two nodes based on the Mean-Hitting-Time of the random walk on the heterogeneous network. Although this measure is developed from the same mathematical background as the random walk with restart method [3, 4], it both reflects the global topological information very well and refrains from setting up a difficult-to-explain priori parameter. Moreover, one extension of this measure - the conditional Mean-Hitting-Time can be used to discover modularity characteristics on the phenotype-gene heterogeneous network and contribute to disease subtype inference.
where ℙ a (τ B = k) refers to the possibility that a random walk starting form node a hits the set B at a time point k, and k is the summing target ranging from 1 to positive infinite.
Here can be inconveniently large in actual calculation, so we scale it to ensure the range of MHT is between 0 and 1.
where ℙ a (τ B = k|τ B <τ A ) refers to the possibility that a random walk starting form node a hits the set B at a time point k, conditioning on the same random walk hits the set B before it hits the set A.
We also scale CMHT to the range between 0 and 1 to avoid the inconvenient large in actual calculation. Both of the closeness measures defined in Equation (2) and Equation (4) can be computed explicitly without any preset parameters (see detailed computational methods in Material and Methods).
Performance of CIPHER-HIT in credibly predicting diseases-causing genes
which can be used to test the connectivity strength between u i and other adjacency nodes. Note that a smaller value of the conditional Mean-Hitting-Time (M p (u i )) indicates a higher modularity level, namely a stronger connection between the adjacent node (u i ) and other nodes in the adjacency set . By calculating the minimum conditional Mean-Hitting-Time, we assess the modularity level of one node p on the network with regard to its adjacency node u i as the maximum connectivity strength between other adjacency nodes u and u i . Different from the concept of topological similarity between two nodes, the modularity level of one node with regard to another takes the other adjacency nodes into consideration, and serves as the measure of connectivity strength among more than two connected nodes. Then we set a threshold θ M to distinguish the adjacent nodes so that which satisfies M p (u i ) < θ M will be marked as the one with high connectivity strength to the other adjacent nodes.
According to the definition above, denotes the adjacent nodes u including disease-genes associated with p or phenotypes similar to p that are strongly connected with each other. For any , the random walk starting from u i will reach u j easily without passing p. This feature is illustrated in Figure 2B.
Next, we analyze the prioritized genes for target phenotype p. We measure the closeness between each gene to the nodes in without the influence of the nodes in . We compute for each gene g and then rank results from the smallest to the largest, so that we record the ranking position r' p (g). By comparison of RANK p (g) and RANK' p (g) for each prioritized genes, and if RANK p (g)/RANK' p (g) > 1, we conclude that gene g is in association with the node p because it is close to the adjacent nodes in set , and these genes are marked as the most credible predicted results.
Note that though we mark the prioritized genes that are close to the adjacent node in , we do not exclude the other prioritized genes. The nodes in are also available to form modules with other genes but they might not be exhibited because of the incompleteness of the network information. Since the genes in already exhibit the inclination to have tight relationship, we suggest the marked genes be selected for further biological investigation with high priority.
Disease subtype inference by CIPHER-HIT
The development of a reliable method to identify disease subtypes will not only enhance our understanding of disease mechanism, but also provide principles for designing a tailored diagnosis and treatment for patients. For a long time, identification of disease subtypes by phenotype associations of patients is of highly importance for assigning individual treatments in the medical community, especially in traditional Chinese medicine which holds "Bian-ZHENG-Lun-Zhi" (Syndrome differentiation and treatment for disease) as its core concept . Inspired by such a rationale , we further note that in the heterogeneous networks, the adjacent set of a target phenotype can be used not only to predict potential disease-causing genes, but also to reveal further structural relationships among the genes with regard to their contributions to disease phenotypes. If the prioritized genes of a query phenotype can be further grouped into several classes according to different functions, then the sub-modules in the network are expected to be distinguished to correspond to these sub-groups of genes.
Next, the classification of the prioritized genes can be done by diagonalization of the matrix C in Equation (9) by using the hierarchical clustering method. Furthermore, after matrix diagonalization, suppose the genes are divided into groups G1, G2 ···,G l , and the adjacent nodes are divided into , then it is promising to analyze the subtypes of the phenotype p based on such divisions. And the resulted sub-groups of disease genes are likely to be related to the functional units of disease subtypes.
Finally, we statistically analyze the subgroups of genes to evaluate whether they are separable in terms of network topology. We calculate the Mean-Hitting-Time between pairs of predicted disease-causing genes, either within the same subgroup or between different subgroups, to assess the topological similarity. The Fisher's exact test  is employed to access whether gene pairs within the same subgroup are more topologically similar than gene pairs in separate subgroups.
A case study on Breast cancer subtype detection
Breast cancer is known to be a carcinoma with highly heterogeneous  and its heterogeneity is more complicate than the results suggested by histopathological analysis alone , so it became necessary to find more molecular evidence to distinguish Breast cancer subtypes. Therefore, we take "Breast cancer" as a typical case to evaluate the performance of CIPHER-HIT for detection of disease subtypes.
Statistical measures for the predicted two subtypes of Breast cancer*.
Number of gene pairs with high topological similarity
MHT(g, g') <θ R )
Number of gene pairs with low topological similarity
MHT(g, g') >θ R )
Within subgroup 1
Within subgroup 2
Between subgroups 1 and 2
Thus, the case study of Breast cancer shown in Figure 4 provides evidence that the connectivity features of the phenotype-gene heterogeneous network can be used to distinguish the molecular bases related to different disease subtypes and lead to novel findings. And the CIPHER-HIT method could serve as an important complementarity to current approaches for identification of cancer subtypes. If the prioritized genes of a queried phenotype are further divided into sub-groups which are related to subtypes of the disease, then we call each sub-group of genes as the susceptible modules of disease subtypes.
From the above example, it can be seen that the polymorphism of the cancer is related to a group of genes, instead of a single gene. We propose to characterize the subtypes of a disease by distinguishing the associated gene groups. If the adjacent nodes of a given phenotype exhibit a genetic or phenotypic difference, namely the prioritized genes can be divided into several sub-groups according to their relations to the adjacent nodes, it is likely to reveal subtypes according to a sub-division. Our work demonstrates that the disease subtype analysis can be carried out in the network context and benefit from the integration of phenotype and gene heterogeneous information. We also show that the modularity-based method, CIPHER-HIT, is a promising way to discover the subtype-associated genes based on the heterogeneous network structure. Based on the prioritization information on the gene sets, the results will allow for further clinical and experimental researches.
For the limitations of the present work, the CIPHER-HIT method currently only restricts on the genetic level, makes use of relatively simple data resources, and does not consider the quantitative analysis for gene expressions. As one of the future research directions, more efforts are still need to be made to evaluate the performance our method on different data, especially include quantitative information such as microarray and proteomics data for discovering disease mechanism in the gene expression level or protein level. An extension of our method to the systematic identification of disease subtypes also needs to be developed. Moreover, we believe that the method can also be easily generalized to enable the credible prediction of drug targets and detect the pleiotropic effects of drugs in our drugCIPHER framework  if we combine drug targets information into the phenotype-gene heterogeneous network.
In summary, in this work, we introduce a concept of modularity level and propose a CIPHER-HIT method to use the Mean-Hitting-Time to measure global closeness between nodes of the heterogeneous network that consists of both genes and phenotypes. This measure has solid mathematics foundations and is easy to calculate. Based on this measure, we proposed a method to select high confident neighbors of a phenotype and detect gene modules that are highly connected to these high confident neighbors. Therefore the modularity of prioritized genes can be revealed, which may provide more mechanistic insights to the phenotype-genotype association. We also demonstrate that the performance of disease gene predictions is improved significantly by combining the modularity measure into the network inference, suggesting modularity is one of key features for network-based credible prioritization of candidate disease genes. Moreover, by detecting the sub-modules in the heterogeneous network, we revealed the potentially genetic and phenotypic properties of cancer subtypes. We believe this method can also be explored to predict biomarkers associated with disease subtypes in the gene expression and protein levels, as well as detect the pleiotropic drug actions in the future.
Materials and methods
Dataset and the heterogeneous network
We used the following three data sets to form the three parts, namely the phenotype-phenotype similarity network, protein-protein interaction network and gene-disease association network, of the phenotype-gene heterogeneous network based on which the prediction was carried out.
The Human Protein Reference Database (HPRD)  was adopted to construct the protein-protein interaction network. The largest component of the HPRD protein-protein interaction network contains 34364 edges and 8503 vertices.
The phenotype similarity came from the results calculated by van Driel et al.. The phenotype similarity network contains 5080 phenotypes.
The associations between the phenotypes and genes were from the OMIM (Online Mendelian Inheritance in Man, http://www.ncbi.nlm.nih.gov/omim) records as described in precious studies [2, 4]. The edge weights of this phenotype-gene sub-network will be defined in Equation (10).
The heterogeneous network was described by the weight matrix. We constructed it by merging the weight matrices of the sub-networks into one matrix. Let W G denote the weight matrix of the HPRD network. For any two genes g1 and g2, if there was a corresponding protein-protein interaction recorded in the HPRD database, then W G (g1, g2) = 1, otherwise W G (g1, g2) = 0.
The phenotype similarities were used as the description of the diseases relations. The same data as previous works [2, 4] were used, where the phenotype similarity data were calculated by van Driel et al.. Since the high similarities were only present between parts of phenotype pairs, we set a threshold to filter out very low similarity values. Let W p denote the weight matrix of the phenotype-phenotype similarity network. If the similarity value between two phenotypes p1 and p2 was larger than the threshold 0.4, then the weight W p (p1, p2) took this similarity value, otherwise W p (p1, p2) = 0.
by which we can achieve that for each pair of associated gene and phenotype (g, p), the average possibility of "walking" onto a different sub-network at the point g and p in the random walk process will equal 0.5.
where refers to the transpose of W A .
We defined the random walk according to the weight matrix described as Equation (11) and carried out the network inference on it.
The Mean-Hitting-Time and conditional Mean-Hitting-Time in CIPHER-HIT
In the previous random walk with restart method [3, 4], the stationary distribution is used to define closeness between two nodes on a network. Here we define the topological properties on the phenotype-gene heterogeneous network in the same mathematical background using the Mean-Hitting-Time of the random walk. This definition is more suitable in solving the problem of both disease-causing gene inference and disease subtype inference, because by adopting this measure, we no longer have to choose the priori parameter required in the former method (which was always assumed to be arbitrary), and this measure leads us to a natural way of discovering modularity characteristics on the heterogeneous network. The math formula expressions below are mainly adopted from [40, 41].
where I refers to the identity matrix, and x(v) refers to the v th component of vector x.
This work is supported by the National Natural Science Foundation of China (Nos. 60934004, 90709013 and 61021063) and the innovation scientific fund of Tsinghua University.
- Lage K, Karlberg EO, Størling ZM, Olason PI, Pedersen AG, Rigina O, Hinsby AM, Tümer Z, Pociot F, Tommerup N, Moreau Y, Brunak S: A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat Biotechnol. 2007, 25: 309-316. 10.1038/nbt1295View ArticlePubMedGoogle Scholar
- Wu X, Jiang R, Zhang MQ, Li S: Network-based global inference of human disease genes. Mol Syst Biol. 2008, 4: 189-PubMed CentralView ArticlePubMedGoogle Scholar
- Köhler S, Bauer S, Horn D, Robinson PN: Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet. 2008, 82: 949-958. 10.1016/j.ajhg.2008.02.013PubMed CentralView ArticlePubMedGoogle Scholar
- Li Y, Patra JC: Genome-wide inferring gene-phenotype relationship by walking on the heterogeneous network. Bioinformatics. 2010, 26: 1219-1224. 10.1093/bioinformatics/btq108View ArticlePubMedGoogle Scholar
- Wu X, Li S: Cancer gene prediction using a network approach. Cancer Systems Biology. Edited by: Edwin Wang. 2010, 191-212. Series: Chapman & Hall/CRC Mathematical & Computational Biology, USA: CRC PressView ArticleGoogle Scholar
- Lim J, Hao T, Shaw C, Patel AJ, Szabó G, Rual JF, Fisk CJ, Li N, Smolyar A, Hill DE, Barabási AL, Vidal M, Zoghbi HY: A protein-protein interaction network for human inherited ataxias and disorders of Purkinje cell degeneration. Cell. 2006, 125: 801-814. 10.1016/j.cell.2006.03.032View ArticlePubMedGoogle Scholar
- Goehler H, Lalowski M, Stelzl U, Waelter S, Stroedicke M, Worm U, Droege A, Lindenberg KS, Knoblich M, Haenig C, Herbst M, Suopanki J, Scherzinger E, Abraham C, Bauer B, Hasenbank R, Fritzsche A, Ludewig AH, Büssow K, Coleman SH, Gutekunst CA, Landwehrmeyer BG, Lehrach H, Wanker EE: A protein interaction network links GIT1, an enhancer of huntingtin aggregation, to Huntington's disease. Mol Cell. 2004, 15: 853-865. 10.1016/j.molcel.2004.09.016View ArticlePubMedGoogle Scholar
- Xu J, Li Y: Discovering disease-genes by topological features in human protein-protein interaction network. Bioinformatics. 2006, 22: 2800-2805. 10.1093/bioinformatics/btl467View ArticlePubMedGoogle Scholar
- Bortoluzzi S, Romualdi C, Bisognin A, Danieli GA: Disease genes and intracellular protein networks. Physiol Genomics. 2003, 15: 223-227.View ArticlePubMedGoogle Scholar
- George RA, Liu JY, Feng LL, Bryson-Richardson RJ, Fatkin D, Wouters MA: Analysis of protein sequence and interaction data for candidate disease gene prediction. Nucl Acids Res. 2006, 34: e130- 10.1093/nar/gkl707PubMed CentralView ArticlePubMedGoogle Scholar
- Gonzalez G, Uribe JC, Tari L, Brophy C, Baral C: Mining gene-disease relationships from biomedical literature: weighting protein-protein interactions and connectivity measures. Pac Symp Biocomput. 2007, 28-39.Google Scholar
- Kann MG: Protein interactions and disease: computational approaches to uncover the etiology of diseases. Brief Bioinform. 2007, 8: 333-346. 10.1093/bib/bbm031View ArticlePubMedGoogle Scholar
- Limviphuvadh V, Tanaka S, Goto S, Ueda K, Kanehisa M: The commonality of protein interaction networks determined in neurodegenerative disorders (NDDs). Bioinformatics. 2007, 23: 2129-2138. 10.1093/bioinformatics/btm307View ArticlePubMedGoogle Scholar
- Pattin KA, Moore JH: Exploiting the proteome to improve the genome-wide genetic analysis of epistasis in common human diseases. Hum Genet. 2008, 124: 19-29. 10.1007/s00439-008-0522-8PubMed CentralView ArticlePubMedGoogle Scholar
- Oti M, Snel B, Huynen MA, Brunner HG: Predicting disease genes using protein-protein interactions. J Med Genet. 2006, 43: 691-698. 10.1136/jmg.2006.041376PubMed CentralView ArticlePubMedGoogle Scholar
- Brunner HG, van Driel MA: From syndrome families to functional genomics. Nat Rev Genet. 2004, 5: 545-551.View ArticlePubMedGoogle Scholar
- van Driel MA, Bruggeman J, Vriend G, Brunner HG, Leunissen JA: A text-mining analysis of the human phenome. Eur J Hum Genet. 2006, 14: 535-542. 10.1038/sj.ejhg.5201585View ArticlePubMedGoogle Scholar
- Jiang X, Liu B, Jiang J, Zhao H, Fan M, Zhang J, Fan Z, Jiang T: Modularity in the genetic disease-phenotype network. FEBS Lett. 2008, 582: 2549-2554. 10.1016/j.febslet.2008.06.023View ArticlePubMedGoogle Scholar
- Qi Y, Ge H: Modularity and dynamics of cellular networks. PLoS Comp Biol. 2006, 2: e174-10.1371/journal.pcbi.0020174.View ArticleGoogle Scholar
- van't Veer LJ, Bernards R: Enabling personalized cancer medicine through analysis of gene-expression patterns. Nature. 2008, 452: 564-570. 10.1038/nature06915View ArticlePubMedGoogle Scholar
- Sims AH, Howell A, Howell SJ, Clarke RB: Origins of breast cancer subtypes and therapeutic implications. Nat Clin Pract Oncol. 2007, 4: 516-525. 10.1038/ncponc0908View ArticlePubMedGoogle Scholar
- Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, Staudt LM: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000, 403: 503-511. 10.1038/35000501View ArticlePubMedGoogle Scholar
- Li J, Lenferink AE, Deng Y, Collins C, Cui Q, Purisima EO, O'Connor-McCourt MD, Wang E: Identification of high-quality cancer prognostic markers and metastasis network modules. Nat Commun. 2010, 1: 34-PubMedGoogle Scholar
- Sharan R, Ulitsky I, Shamir R: Network-based prediction of protein function. Mol Syst Biol. 2007, 3: 88-PubMed CentralView ArticlePubMedGoogle Scholar
- Levy-Lahad E: Fanconi anemia and breast cancer susceptibility meet again. Nat Genet. 2010, 42: 368-369. 10.1038/ng0510-368View ArticlePubMedGoogle Scholar
- D'Andrea AD: Susceptibility pathways in Fanconi's anemia and breast cancer. N Engl J Med. 2010, 362: 1909-1919. 10.1056/NEJMra0809889PubMed CentralView ArticlePubMedGoogle Scholar
- Li S, Zhang ZQ, Wu LJ, Zhang XG, Li YD, Wang YY: Understanding ZHENG in traditional Chinese medicine in the context of neuro-endocrine-immune network. IET Syst Biol. 2007, 1: 51-60. 10.1049/iet-syb:20060032View ArticlePubMedGoogle Scholar
- Upton JG Graham: Fisher's Exact Test. J Royal Statistical Society A. 1992, 155: 395-402. 10.2307/2982890.View ArticleGoogle Scholar
- Korkola JE, DeVries S, Fridlyand J, Hwang ES, Estep AL, Chen YY, Chew KL, Dairkee SH, Jensen RM, Waldman FM: Differentiation of lobular versus ductal breast carcinomas by expression microarray analysis. Cancer Res. 2003, 63: 7167-7175.PubMedGoogle Scholar
- Hsu HM, Wang HC, Chen ST, Hsu GC, Shen CY, Yu JC: Breast cancer risk is associated with the genes encoding the DNA double-strand break repair Mre11/Rad50/Nbs1 complex. Cancer Epidemiol Biomarkers Prev. 2007, 16: 2024-2032. 10.1158/1055-9965.EPI-07-0116View ArticlePubMedGoogle Scholar
- Low YL, Li Y, Humphreys K, Thalamuthu A, Li Y, Darabi H, Wedrén S, Bonnard C, Czene K, Iles MM, Heikkinen T, Aittomäki K, Blomqvist C, Nevanlinna H, Hall P, Liu ET, Liu J: Multi-Variant Pathway Association Analysis Reveals the Importance of Genetic Determinants of Estrogen Metabolism in Breast and Endometrial Cancer Susceptibility. PLoS Genet. 2010, 6: e1001012- 10.1371/journal.pgen.1001012PubMed CentralView ArticlePubMedGoogle Scholar
- Chisamore MJ, Wilkinson HA, Flores O, Chen JD: Estrogen-related receptor-alpha antagonist inhibits both estrogen receptor-positive and estrogen receptor-negative breast tumor growth in mouse xenografts. Mol Cancer Ther. 2009, 8: 672-681. 10.1158/1535-7163.MCT-08-1028View ArticlePubMedGoogle Scholar
- Chisamore MJ, Cunningham ME, Flores O, Wilkinson HA, Chen JD: Characterization of a novel small molecule subtype specific estrogen-related receptor alpha antagonist in MCF-7 breast cancer cells. PLoS ONE. 2009, 4: e5624- 10.1371/journal.pone.0005624PubMed CentralView ArticlePubMedGoogle Scholar
- Fasching PA, Loehberg CR, Strissel PL, Lux MP, Bani MR, Schrauder M, Geiler S, Ringleff K, Oeser S, Weihbrecht S, Schulz-Wendtland R, Hartmann A, Beckmann MW, Strick R: Single nucleotide polymorphisms of the aromatase gene (CYP19A1), HER2/neu status, and prognosis in breast cancer patients. Breast Cancer Res Treat. 2008, 112: 89-98. 10.1007/s10549-007-9822-2View ArticlePubMedGoogle Scholar
- Talbott KE, Gammon MD, Kibriya MG, Chen Y, Teitelbaum SL, Long CM, Gurvich I, Santella RM, Ahsan H: A CYP19 (aromatase) polymorphism is associated with increased premenopausal breast cancer risk. Breast Cancer Res Treat. 2008, 111: 481-487. 10.1007/s10549-007-9794-2View ArticlePubMedGoogle Scholar
- Arpino G, Weiss H, Lee AV, Schiff R, Placido SD, Osborne CK, Elledge RM: Estrogen Receptor-Positive, Progesterone Receptor-Negative Breast Cancer: Association With Growth Factor Receptor Expression and Tamoxifen Resistance. J Natl Cancer Inst. 2005, 97: 1254-1261. 10.1093/jnci/dji249View ArticlePubMedGoogle Scholar
- Bauer KR, Brown M, Cress RD, Parise CA, Caggiano V: Descriptive analysis of estrogen receptor (ER)-negative, progesterone receptor (PR)-negative, and HER2-negative invasive breast cancer, the so-called triple-negative phenotype: a population-based study from the California cancer Registry. Cancer. 2007, 109: 1721-1728. 10.1002/cncr.22618View ArticlePubMedGoogle Scholar
- Zhao S, Li S: Network-based relating pharmacological and genomic spaces for drug target identification. PLoS One. 2010, 5: e11764- 10.1371/journal.pone.0011764PubMed CentralView ArticlePubMedGoogle Scholar
- Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M, Ibarrola N, Deshpande N, Shanker K, Shivashankar HN, Rashmi BP, Ramya MA, Zhao Z, Chandrika KN, Padma N, Harsha HC, Yatish AJ, Kavitha MP, Menezes M, Choudhury DR, Suresh S, Ghosh N, Saravana R, Chandran S, Krishna S, Joy M, Anand SK, Madavan V, Joseph A, Wong GW, Schiemann WP, Constantinescu SN, Huang L, Khosravi-Far R, Steen H, Tewari M, Ghaffari S, Blobe GC, Dang CV, Garcia JG, Pevsner J, Jensen ON, Roepstorff P, Deshpande KS, Chinnaiyan AM, Hamosh A, Chakravarti A, Pandey A: Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res. 2003, 13: 2363-2371. 10.1101/gr.1680803PubMed CentralView ArticlePubMedGoogle Scholar
- Bovier A: Metastability: A Potenial Theoretical approach. Proceedings of ICM Madrid, European Mathematical Society. 2006, 498-518.Google Scholar
- Norris JR: Markov Chain. 1997, Cambridge CB2 2RU, United Kingdom: Cambridge University PressView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.