Mining breast cancer genes with a network based noise-tolerant approach
© Nie and Yu; licensee BioMed Central Ltd. 2013
Received: 28 November 2012
Accepted: 21 June 2013
Published: 25 June 2013
Mining novel breast cancer genes is an important task in breast cancer research. Many approaches prioritize candidate genes based on their similarity to known cancer genes, usually by integrating multiple data sources. However, different types of data often contain varying degrees of noise. For effective data integration, it’s important to design methods that work robustly with respect to noise.
Gene Ontology (GO) annotations are often used in cancer gene mining work. However, the vast majority of GO annotations are computationally derived and thus not completely accurate. A set of genes annotated with breast cancer enriched GO terms was adopted here as a set of source data with realistic noise. A novel noise tolerant approach was proposed to rank candidate breast cancer genes using noisy source data within the framework of a comprehensive human Protein-Protein Interaction (PPI) network. Performance of the proposed method was quantitatively evaluated by comparing it with the more established random walk approach. Results showed that the proposed method exhibited better performance in ranking known breast cancer genes and higher robustness against data noise than the random walk approach. When noise started to increase, the proposed method was able to maintain relatively stable performance, while the random walk approach showed a drastic performance decline; when noise increased to a large extent, the proposed method still achieved better performance than random walk did.
A novel noise tolerant method was proposed to mine breast cancer genes. Compared to the well established random walk approach, it showed better performance in correctly ranking cancer genes and worked robustly with respect to noise within source data. To the best of our knowledge, it’s the first such effort to quantitatively analyze noise tolerance between different breast cancer gene mining methods. The sorted gene list can be valuable for breast cancer research. The proposed quantitative noise analysis method may also prove useful for other data integration efforts. It is hoped that the current work can lead to more discussions about influence of data noise on different computational methods for mining disease genes.
Keywords: Network; Breast cancer; Data noise; Noise tolerance
Novel disease genes remain difficult to identify in most genetic diseases, particularly in highly polygenic disorders. Not all genes have been detected even for diseases whose molecular mechanisms are partially known, for instance, breast cancer. Breast cancer is a common cancer and a major cause of cancer death among females around the world, accounting for 23% of total cancer cases and 14% of cancer deaths. Mining breast cancer genes is conducive to understanding its pathogenic mechanism and to the search for effective treatments. With the rapid growth of disease-related genomic and functional data, computational approaches can be utilized to mine for new cancer genes.
In the past two decades, a number of computational methods have been developed to mine potential disease-related genes. Most of those methods rank candidate genes based on the idea that proteins similar to each other tend to cause the same or similar diseases. They involve setting up a candidate gene set to be compared with a known disease gene set on physical or functional attributes. On one hand, physical attribute-based methods include screening direct neighbors of known disease genes in the PPI network [7, 8], comparing shortest path lengths between candidate genes and known disease genes, and clustering or graph partitioning to uncover disease modules in the interaction network [10–12]. Some approaches also used global network features to find genes similar to known disease genes [13, 14]. On the other hand, several methods rely on functional similarities between candidate and disease genes; for example, some methods measured similarity between genes by their functional annotations (e.g., Gene Ontology (GO)). Methods using other data sources have also been developed, such as gene expression [18, 19], biological pathways and sequence features.
Cancers such as breast cancer are complex and heterogeneous in nature, and cancer-related genes often do not function in isolation but interact with one another. Integrating multiple data types was found to be effective for gene mining because it alleviates problems caused by incomplete information [21–23]. For instance, ENDEAVOUR is an online tool that uses multiple data sources. It integrates candidate gene rankings from different data sources into a final ranking with the order statistic algorithm. However, different data categories usually contain inherent noise or systematic errors. For instance, data from computational predictions inevitably contain some amount of uncertainty. Experimental data obtained from different labs or experimental platforms can contain an appreciable amount of noise. Noise in source data can push computed results away from their true values and lead to erroneous reporting.
A better method must be able to tolerate a certain amount of noise, which makes the integration of different data sources more applicable to real-life scenarios. Although some approaches can work with precision when presented with highly accurate data, few studies have shown that those methods work robustly when faced with increasingly noisy data. A number of papers have discussed the task of balancing noise and precision when using multiple data sources for cancer gene mining; however, hardly any have analyzed the noise problem quantitatively [26–29]. It is important to calibrate how robustly a method works with respect to noise, namely, how fast a method deteriorates when the percentage of noise in source data goes up. With that knowledge, users can be confident about the method’s effectiveness when it is applied to real-life data sets.
Results and discussion
Table 1 (flattened in extraction). Surviving column headers: Volume of input data; Volume of original data. Surviving row labels: Human Signalling Network; Gene expression data (Apr. 7, 2011); Known cancer genes; GO term (BP); Cancer-hallmark GO terms. All other surviving dates read Mar. 3, 2013.
After removing redundancy, a comprehensive human PPI network was constructed with data obtained from multiple interaction databases. The resultant network contained a total of 156,459 PPIs among 15,494 genes. A noise tolerant method was designed to rank potential breast cancer genes.
Rationale for data integration
Evaluation of performance in ranking known cancer genes
Table 2 Ranking performance comparison. Columns: The proposed method; Random walk approach. Summary rows: top 10% average; all 11 test genes average.
Robustness with respect to realistic data noise
Approaches that exhibit robust performance with regard to noise are needed if they are to prove useful in cancer gene hunting endeavours. Nevertheless, as mentioned before, few projects had specifically analysed data noise effects quantitatively. A network based noise tolerant method was proposed here to mine breast cancer genes. Its performance was compared with that of the well performing random walk approach by five-fold cross-validation. The results confirmed the proposed method’s robust performance with respect to data noise.
The set of known breast cancer genes (KnownSet, see Methods) was enlarged by including genes sharing GO annotations with those in the KnownSet. The enlarged set was called the GOSet (GO enriched gene set, see Methods), which was adopted as a noisy set of likely breast cancer genes. The GOSet was utilized to check an algorithm’s robustness with respect to data noise. Data were sampled from the GOSet and combined with the KnownSet to generate a noisy set of training data. This way of synthesizing a noisy data set is unique in that it does not simply use random data as noise, which is too artificial. The GOSet contains enriched but still imperfect data, which can better mimic data noise in real-life scenarios. An algorithm’s ability to retain its performance was checked as the fraction of noisy data in the training set went up.
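The sampling step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the definition of the noise fraction (share of the final training set drawn from GOSet-only genes) and the fixed seed are all assumptions.

```python
import random


def make_noisy_training_set(known_set, go_set, noise_fraction, seed=0):
    """Return KnownSet plus GOSet-only genes so that roughly
    `noise_fraction` of the returned set comes from the GOSet.

    Assumption: noise fraction = n_noise / (n_known + n_noise).
    """
    if not 0 <= noise_fraction < 1:
        raise ValueError("noise_fraction must be in [0, 1)")
    known = set(known_set)
    # Only genes that are not already known can act as noise.
    pool = sorted(set(go_set) - known)
    # Solve n_noise / (len(known) + n_noise) = noise_fraction for n_noise.
    n_noise = round(noise_fraction * len(known) / (1 - noise_fraction))
    n_noise = min(n_noise, len(pool))
    rng = random.Random(seed)
    return known | set(rng.sample(pool, n_noise))


# Toy example with placeholder gene names (not real data).
known = {"K1", "K2", "K3", "K4"}
go_enriched = known | {f"G{i}" for i in range(10)}
noisy = make_noisy_training_set(known, go_enriched, noise_fraction=0.5)
```

Sweeping `noise_fraction` from 0 upward and re-running a ranking method on each resulting set gives the kind of robustness curve discussed in this section.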
The random walk approach exhibited a sharp decrease in its performance, while our method was able to maintain relatively stable performance.
When noise increased to a large extent, the proposed method was able to perform about twice as well as the random walk approach did.
It can thus be stated that the proposed method was more robust with respect to noise in input data than the state-of-the-art random walk based approach. The results also confirmed the power of data integration, which allowed different data sets to complement each other [22, 23].
Robustness with respect to completely random noise
Cancers are highly complex processes, and the majority of cancer genes are yet to be mapped. Currently available data (known breast cancer genes) are too limited to be really effective for cancer gene searching purposes. Broadening the scope of input data (both volume and type) should enable better use of available data to mine for new cancer genes. Approaches that work robustly against data noise are needed.
A novel noise tolerant breast cancer gene mining method was presented here, which integrated a comprehensive PPI network, gene expression data, prior knowledge of breast cancer and GO annotations to rank potential breast cancer genes. From each data source, a ranking of candidate genes was computed, and the rankings were then combined into a final ranking order. Influence of data noise was quantitatively evaluated. The random walk approach performed better than the proposed method using 100% accurate input data (known breast cancer genes). However, the proposed method showed much greater noise tolerance. To the best of our knowledge, this is the first effort to quantitatively analyse noise tolerance between different cancer gene mining methods. The framework of the proposed mining method and the quantitative way of appraising noise effects are flexible enough to be useful for other data sources and, hopefully, will lead to more discussion of the data noise issue for different computational methods in the cancer gene mining field.
Figure 1 presents a schematic view of our approach. A comprehensive PPI network was obtained by integrating data from different interaction databases (Box A). A set of known breast cancer genes (KnownSet) was extracted from the OMIM and CGC databases (Box B). Candidate genes were first ranked by three network topological attributes: node degree, node betweenness and closeness to known cancer genes in the network (Box C). GO term enrichment analyses were performed for the KnownSet, producing a GO term set enriched with breast cancer related terms, to which a group of cancer-hallmark GO terms was also added. A set of genes annotated with terms in the obtained GO set was generated, which was called the GO enriched gene set (GOSet) (Box D). A batch of breast cancer-related expression data was extracted from the GEO database on April 7, 2011, and expression profiles in those data files were clustered based on their similarity with each other (Box E). Expression clusters were intersected with the GOSet. Overlap significance was represented by a p-value computed with the normal distribution. The p-value was utilized to rank genes in expression clusters (Box F). All individual rankings from different data sources were finally combined into a final ranking, which represented a gene’s overall probability of being involved in breast cancer (Box G).
Deriving known breast cancer gene set
The known breast cancer genes
Ranking by PPI network
The human PPI data were derived from five sources: HPRD, BioGRID, homoMINT, IntAct and a manually curated human signalling network. Protein identifiers were mapped to uniform coding gene identifiers. Official gene symbols were used as identifiers. Redundant interactions were removed, along with interactions with identifiers that could not be mapped to gene symbols (Table 1). The final PPI network was represented by an undirected graph in which nodes represent genes and edges represent interactions. The graph contained 156,459 interactions connecting 15,494 genes.
Similarities between proteins were found to be correlated with their proximity in the PPI network. It was assumed that when a gene in the PPI network exhibited topological features similar to those of known breast cancer genes, it was more likely to be involved in breast cancer processes. Several papers have shown that cancer genes can be effectively distinguished from others by their topological attributes in the PPI network, such as node degree, betweenness centrality and shortest path length. These three network topological indices were computed and used to assess gene similarity in the PPI network. Genes were then sorted according to the values of the topological indices.
Let G(V,E) be the PPI network, where V is the set of genes, and E the set of interactions in the network.
where σ_jk is the number of shortest paths from a source j∈V to a target k∈V, and σ_jk(v) is the number of those paths passing through some node v other than j and k. If j=k, σ_jk=1, and if v∈{j,k}, σ_jk(v)=0.
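The path counts defined above are those used in the standard shortest-path betweenness centrality. Assuming that standard form (the symbol B(v) is a label chosen here, not necessarily the authors' notation), the index for a node v can be written as:

```latex
B(v) = \sum_{j,k \in V} \frac{\sigma_{jk}(v)}{\sigma_{jk}}
```

Each term is the fraction of shortest paths between j and k that pass through v, so a gene lying on many shortest routes through the PPI network receives a high value.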
where G is the KnownSet, d(v,t) is the shortest path length between node v and t. n is the number of known breast cancer genes which can be reached by v.
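A minimal sketch of two of the topological indices, node degree and closeness to known cancer genes, is given below in plain Python. It assumes that closeness is the mean shortest path length from v to the n known genes reachable from v, and that v itself is excluded from that mean; both are assumptions, and the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def topological_scores(adj, known_set):
    """adj maps each gene to the set of its neighbours (undirected graph).

    Returns, per gene, its degree and the mean distance to reachable
    known genes (infinity when no known gene can be reached).
    """
    scores = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        # Assumption: only reachable known genes other than v are averaged.
        reach = [dist[t] for t in known_set if t in dist and t != v]
        mean_d = sum(reach) / len(reach) if reach else float("inf")
        scores[v] = {"degree": len(adj[v]), "mean_dist_to_known": mean_d}
    return scores


# Toy path graph A-B-C-D plus an isolated node E; A is the only known gene.
toy_adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}, "E": set()}
toy_scores = topological_scores(toy_adj, {"A"})
```

On a network of the size reported here, a graph library such as NetworkX (cited in the reference list) would be a more practical choice than one breadth-first search per gene.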
Deriving GO enriched gene set
Ranking by gene expression and GO
All breast cancer-related gene expression datasets (keywords: Homo sapiens & breast cancer) were downloaded from the GEO database. Data sets with fewer than five samples or conditions were deleted. Only data sets of normal versus cancer samples were used, so those containing recurred versus non-recurred samples were deleted. 53 GDSes (GEO data sets) were thus obtained.
where n is the set of expression profiles for gene i in a GDS, m is the number of samples/conditions in one of those profiles, and e_k(j) is the corresponding expression value of sample j.
After the above-mentioned preprocessing steps, genes in each GDS were clustered by the APCluster algorithm according to their expression profiles. APCluster is an algorithm based on affinity propagation. It considers all data points as potential cluster centers at the same time and sets up messages of similarity between any two data points; messages are exchanged among data points until all clusters are determined. APCluster has been shown to perform well compared to other clustering approaches [58, 59]. The Pearson correlation coefficient between gene expression profiles was used as the similarity metric for APCluster. It was assumed that genes within a cluster would have a higher probability of being involved in the same biological processes than genes in different clusters.
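The similarity input for the clustering step can be sketched as below. The code only builds the Pearson correlation matrix between expression profiles; the affinity propagation step itself (performed with the APCluster R package in this work) is not reproduced. Function names and the handling of flat profiles are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # Assumption: a flat profile has undefined correlation, treated as 0.
        return 0.0
    return cov / (sd_x * sd_y)


def similarity_matrix(profiles):
    """profiles maps gene -> list of expression values across samples.

    Returns the sorted gene list and a square matrix of pairwise
    correlations in the same order.
    """
    genes = sorted(profiles)
    matrix = [[pearson(profiles[g], profiles[h]) for h in genes] for g in genes]
    return genes, matrix


# Toy profiles with placeholder gene names.
toy_profiles = {"g1": [1.0, 2.0, 3.0], "g2": [2.0, 4.0, 6.0], "g3": [3.0, 2.0, 1.0]}
toy_genes, toy_sim = similarity_matrix(toy_profiles)
```

In this toy case g1 and g2 are perfectly correlated and would tend to fall into one cluster, while g3 is anti-correlated with both.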
S_E(v) was the expression-based ranking score of gene v, which was computed from breast cancer-related gene expression data and GO annotations. λ (0≤λ≤1) is a coefficient that weighs the contributions of topological attributes and expression information in ranking breast cancer genes. The average ranking of the genes in the KnownSet that were sorted into the top 10% was computed as the P-score. A smaller P-score meant better performance, that is, true breast cancer genes were more likely to be found near the top of the sorted list. S(v) is the final ranking for a gene v, which reflected the belief that a specific gene was a potential breast cancer gene. The higher a gene was ranked, the more likely it was involved in breast cancer related processes.
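The weighting and the P-score can be illustrated with a short sketch. The P-score follows the definition in the text (mean rank of the known genes that fall within the top 10%). The combination rule is an assumption: a convex combination λ·topology + (1−λ)·expression with higher scores ranking first. All function names are hypothetical.

```python
def combine_scores(topo, expr, lam):
    """Assumed combination: lam * topological score + (1 - lam) * expression score."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    genes = set(topo) | set(expr)
    return {g: lam * topo.get(g, 0.0) + (1.0 - lam) * expr.get(g, 0.0) for g in genes}


def rank_genes(scores):
    """Rank 1 is the highest score; ties are broken by gene name."""
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    return {g: i for i, g in enumerate(ordered, start=1)}


def p_score(ranks, known_set, top_percent=10):
    """Mean rank of known genes that fall within the top `top_percent` percent.

    Returns None when no known gene reaches the top of the list.
    """
    cutoff = len(ranks) * top_percent // 100
    hits = [ranks[g] for g in known_set if g in ranks and ranks[g] <= cutoff]
    return sum(hits) / len(hits) if hits else None


# Toy example: 20 placeholder genes, g0 has the highest score.
toy = {f"g{i}": float(20 - i) for i in range(20)}
toy_ranks = rank_genes(toy)
toy_p = p_score(toy_ranks, {"g0", "g1", "g9"})
```

Here only g0 and g1 fall within the top 10% (ranks 1 and 2), so the P-score is their mean rank; g9 does not contribute.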
Random walk approach
where p_0 is the initial probability, which is 1/37 for the 37 genes in the KnownSet and 0 for all others; r represents the probability of remaining at the same node at the next step. Random walk has been reported to work robustly against different r values, which was also confirmed by our computation (data not shown). r was taken to be 0.7 in the current work. Details of the random walk approach are described elsewhere.
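For orientation, a random walk with restart can be sketched in plain Python as below. The update rule is an assumption based on the commonly used formulation, p_(t+1) = (1 − r)·W·p_t + r·p_0 with W the degree-normalized adjacency matrix; here r is read as the weight placed on p_0 at every step. The function name, tolerance and iteration limit are hypothetical, and all seed genes are assumed to be present in the network.

```python
def random_walk_with_restart(adj, seeds, r=0.7, tol=1e-10, max_iter=1000):
    """adj maps each gene to the set of its neighbours (undirected graph).

    Returns the steady-state visiting probability of every gene.
    """
    nodes = sorted(adj)
    # Uniform initial probability over the seed (known) genes, 0 elsewhere.
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: r * p0[v] for v in nodes}
        for u in nodes:
            if adj[u]:
                # Spread the walking mass evenly over the neighbours of u.
                share = (1.0 - r) * p[u] / len(adj[u])
                for w in adj[u]:
                    nxt[w] += share
            else:
                # An isolated node keeps its walking mass.
                nxt[u] += (1.0 - r) * p[u]
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A-B-C with A as the only seed.
rwr = random_walk_with_restart({"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}, {"A"})
```

Candidate genes are then sorted by their steady-state probability, so genes close to many seeds rank first.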
Ranking performance comparison
The OMIMSet contained 26 known breast cancer genes, and the CGCSet contained an additional 11. The 26 known breast cancer genes in the OMIMSet were used as the KnownSet. The procedure in Figure 1 was followed to build the model, which was then used to rank the 11 known breast cancer genes in the CGCSet. Ranking values in italics indicate genes that were ranked in the top 10% of the final list. The row “top 10% average” gives the average ranking of those known breast cancer genes in the CGCSet that ranked in the top 10%, while “all 11 test genes average” gives the average ranking of all 11 genes in the CGCSet (Table 2). In later computation, the OMIMSet and CGCSet were combined into a KnownSet of 37 genes.
Performance evaluation against realistic data noise
One F-score was computed for each fold; averaging the five F-scores (for five-fold cross-validation) produced the final F-score.
Performance evaluation against completely random noise
Random genes were first sampled from the PPI network and added to the KnownSet. The procedure in Figure 1 and the random walk computation were then performed. Figure 5 plots the proportion of added random genes that ranked within the top 10% against the ratio of the number of random genes to the number of genes in the KnownSet (37).
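The quantity plotted in this evaluation can be sketched as follows. The helper names, the fixed seed and the integer 10% cutoff are assumptions made for illustration; the ranking itself would come from either method under test.

```python
import random


def sample_random_genes(network_genes, known_set, ratio, seed=0):
    """Draw round(ratio * |KnownSet|) random genes that are not already known."""
    pool = sorted(set(network_genes) - set(known_set))
    n = min(round(ratio * len(known_set)), len(pool))
    return set(random.Random(seed).sample(pool, n))


def fraction_in_top(ranked_genes, added_genes, top_percent=10):
    """Proportion of the added random genes found in the top of a ranked list.

    ranked_genes is ordered from best to worst.
    """
    if not added_genes:
        return 0.0
    cutoff = len(ranked_genes) * top_percent // 100
    top = set(ranked_genes[:cutoff])
    return len(top & set(added_genes)) / len(added_genes)


# Toy example with placeholder gene names.
toy_ranked = [f"g{i}" for i in range(20)]
toy_fraction = fraction_in_top(toy_ranked, {"g0", "g5"})
toy_added = sample_random_genes(toy_ranked[:10], {"g0", "g1", "g2", "g3"}, ratio=0.5)
```

A method that tolerates random noise should keep this fraction low as the ratio grows, because the added genes carry no real evidence of involvement in breast cancer.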
This work was partly supported by the National Natural Science Foundation of China (No. 61179008).
- Wang X, Gulbahce N, Yu H: Network-based methods for human disease gene prediction. Brief Funct Genomics. 2011, 10: 280-293. 10.1093/bfgp/elr024.
- Wu X, Li S: Cancer gene prediction using a network approach. Cancer Systems Biology. 2010, 191-212.
- Siegel R, Naishadham D, Jemal A: Cancer statistics, 2012. CA Cancer J Clin. 2012, 62: 10-29. 10.3322/caac.20138.
- Materi W, Wishart DS: Computational systems biology in cancer: modeling methods and applications. Gene Regul Syst Bio. 2007, 1: 91-110.
- Ideker T, Sharan R: Protein networks in disease. Genome Res. 2008, 18: 644-652. 10.1101/gr.071852.107.
- Chuang HY, Hofree M, Ideker T: A decade of systems biology. Annu Rev Cell Dev Biol. 2010, 26: 721-744. 10.1146/annurev-cellbio-100109-104122.
- Oti M, Snel B, Huynen MA, Brunner HG: Predicting disease genes using protein–protein interactions. J Med Genet. 2006, 43: 691-698. 10.1136/jmg.2006.041376.
- Östlund G, Lindskog M, Sonnhammer ELL: Network-based Identification of Novel Cancer Genes. Mol Cell Proteomics. 2010, 9: 648-655. 10.1074/mcp.M900227-MCP200.
- Franke L, Bakel H, Fokkens L, De Jong ED, Egmont-Petersen M, Wijmenga C: Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. Am J Hum Genet. 2006, 78: 1011-1025. 10.1086/504300.
- Wu X, Jiang R, Zhang MQ, Li S: Network-based global inference of human disease genes. Mol Syst Biol. 2008, 4: 189.
- Navlakha S, Kingsford C: The power of protein interaction networks for associating genes with diseases. Bioinformatics. 2010, 26: 1057-1063. 10.1093/bioinformatics/btq076.
- Navlakha S, Schatz MC, Kingsford C: Revealing biological modules via graph summarization. J Comput Biol. 2009, 16: 253-264. 10.1089/cmb.2008.11TT.
- Köhler S, Bauer S, Horn D, Robinson PN: Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet. 2008, 82: 949-958. 10.1016/j.ajhg.2008.02.013.
- Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R: Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol. 2010, 6: e1000641. 10.1371/journal.pcbi.1000641.
- Tiffin N, Andrade-Navarro M, Perez-Iratxeta C: Linking genes to diseases: it's all in the data. Genome Med. 2009, 1: 77. 10.1186/gm77.
- Perez-Iratxeta C, Bork P, Andrade MA: Association of genes to genetically inherited diseases using data mining. Nat Genet. 2002, 31: 316-319.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene Ontology: tool for the unification of biology. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
- Nitsch D, Gonçalves J, Ojeda F, De Moor B, Moreau Y: Candidate gene prioritization by network analysis of differential expression using machine learning approaches. BMC Bioinforma. 2010, 11: 460. 10.1186/1471-2105-11-460.
- Linh T, Bin Z, Zhan Z, Chunsheng Z, Tao X, John L, Hongyue D, Eric S, Jun Z: Inferring causal genomic alterations in breast cancer using gene expression data. BMC Syst Biol. 2011, 5: 121. 10.1186/1752-0509-5-121.
- George RA, Liu JY, Feng LL, Bryson-Richardson RJ, Fatkin D, Wouters MA: Analysis of protein sequence and interaction data for candidate disease gene prediction. Nucleic Acids Res. 2006, 34: e130. 10.1093/nar/gkl707.
- Gortzak-Uzan L, Ignatchenko A, Evangelou AI, Agochiya M, Brown KA, St. Onge P, Kireeva I, Schmitt-Ulms G, Brown TJ, Murphy J, Rosen B, Shaw P, Jurisica I, Kislinger T: A proteome resource of ovarian cancer ascites: integrated proteomic and bioinformatic analyses to identify putative biomarkers. J Proteome Res. 2007, 7: 339-351.
- Fortney K, Jurisica I: Integrative computational biology for cancer research. Hum Genet. 2011, 4: 465-481.
- Nibbe RK, Koyutürk M, Chance MR: An integrative-omics approach to identify functional sub-networks in human colorectal cancer. PLoS Comp Biol. 2010, 6: e1000639. 10.1371/journal.pcbi.1000639.
- Aerts S, Lambrechts D, Maity S, Van Loo P, Coessens B, De Smet F, Tranchevent LC, De Moor B, Marynen P, Hassan B: Gene prioritization through genomic data fusion. Nat Biotechnol. 2006, 24: 537-544. 10.1038/nbt1203.
- Xia Z, Wu L-Y, Zhou X, Wong ST: Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces. BMC Syst Biol. 2010, 4: S6.
- De Bie T, Tranchevent LC, Van Oeffelen LMM, Moreau Y: Kernel-based data fusion for gene prioritization. Bioinformatics. 2007, 23: i125-i132. 10.1093/bioinformatics/btm187.
- Chen Y, Wang W, Zhou Y, Shields R, Chanda SK, Elston RC, Li J: In silico gene prioritization by integrating multiple data sources. PLoS One. 2011, 6: e21137. 10.1371/journal.pone.0021137.
- Barabási AL, Gulbahce N, Loscalzo J: Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011, 12: 56-68. 10.1038/nrg2918.
- Huan T, Wu X, Bai Z, Chen JY: Seed-weighted random walks ranking method and its application to leukemia cancer biomarker prioritizations. Proceedings of the 2009 IEEE International Conference on Bioinformatics and Biomedicine Workshop. 2009, 220.
- Chen X, Yan G, Ren W, Qu JB: Modularized random walk with restart for candidate disease genes prioritization. Syst Biol (Stevenage). 2009, 353-360.
- Li Y, Patra JC: Genome-wide inferring gene–phenotype relationship by walking on the heterogeneous network. Bioinformatics. 2010, 26: 1219-1224. 10.1093/bioinformatics/btq108.
- Yu J, Finley RL: Combining multiple positive training sets to generate confidence scores for protein–protein interactions. Bioinformatics. 2009, 25: 105-111. 10.1093/bioinformatics/btn597.
- Berardini TZ, Khodiyar VK, Lovering RC, Talmud P: The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Res. 2010, 38: D331-D335.
- Rhee SY, Wood V, Dolinski K, Draghici S: Use and misuse of the gene ontology annotations. Nat Rev Genet. 2008, 9: 509-515. 10.1038/nrg2363.
- Harris M, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, 32: D258-D261. 10.1093/nar/gkh036.
- Murali T, Pacifico S, Yu J, Guest S, Roberts GG, Finley RL: DroID 2011: a comprehensive, integrated resource for protein, transcription factor, RNA and gene interactions for Drosophila. Nucleic Acids Res. 2011, 39: D736-D743. 10.1093/nar/gkq1092.
- Li J, Lenferink AE, Deng Y, Collins C, Cui Q, Purisima EO, O'Connor-McCourt MD, Wang E: Identification of high-quality cancer prognostic markers and metastasis network modules. Nat Commun. 2010, 1: 34.
- Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002, 30: 207-210. 10.1093/nar/30.1.207.
- Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA: Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005, 33: D514-D517.
- Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton MR: A census of human cancer genes. Nat Rev Cancer. 2004, 4: 177-183. 10.1038/nrc1299.
- Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, Balakrishnan L, Marimuthu A, Banerjee S, Somanathan DS, Sebastian A, Rani S, Ray S, Harrys Kishore CJ, Kanth S, Ahmed M, Kashyap MK, Mohmood R, Ramachandra YL, Krishna V, Rahiman BA, Mohan S, Ranganathan P, Ramabadran S, Chaerkady R, Pandey A: Human Protein Reference Database - 2009 update. Nucleic Acids Res. 2009, 37: D767-D772. 10.1093/nar/gkn892.
- Stark C, Breitkreutz BJ, Chatr-Aryamontri A, Boucher L, Oughtred R, Livstone MS, Nixon J, Van Auken K, Wang X, Shi X: The BioGRID interaction database: 2011 update. Nucleic Acids Res. 2011, 39: D698-D704. 10.1093/nar/gkq1116.
- Persico M, Ceol A, Gavrila C, Hoffmann R, Florio A, Cesareni G: HomoMINT: an inferred human network based on orthology mapping of protein interactions discovered in model organisms. BMC Bioinforma. 2005, 6: S21.
- Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, Margalit H, Armstrong J, Bairoch A, Cesareni G, Sherman D, Apweiler R: IntAct: an open source molecular interaction database. Nucleic Acids Res. 2004, 32: D452-D455. 10.1093/nar/gkh052.
- Cui Q, Ma Y, Jaramillo M, Bari H, Awan A, Yang S, Zhang S, Liu L, Lu M, O'Connor-McCourt M, Purisima EO, Wang E: A map of human cancer signaling. Mol Syst Biol. 2007, 3: 152.
- Wang PI, Marcotte EM: It's the machine that matters: predicting gene function and phenotype from protein networks. J Proteomics. 2010, 73: 2277-2289. 10.1016/j.jprot.2010.07.005.
- Xu J, Li Y: Discovering disease-genes by topological features in human protein-protein interaction network. Bioinformatics. 2006, 22: 2800-2805. 10.1093/bioinformatics/btl467.
- Özgür A, Vu T, Erkan G, Radev DR: Identifying gene-disease associations using centrality on a literature mined gene-interaction network. Bioinformatics. 2008, 24: i277-i285. 10.1093/bioinformatics/btn182.
- Brandes U: On variants of shortest-path betweenness centrality and their generic computation. Social Netwks. 2008, 30: 136-145. 10.1016/j.socnet.2007.11.001.
- Hagberg AA, Schult DA, Swart PJ: Exploring network structure, dynamics, and function using NetworkX. Proceedings of the 7th Python in Science Conference (SciPy2008). 2008, Pasadena CA USA, 11-15.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. PNAS. 2005, 102: 15545-15550. 10.1073/pnas.0506580102.
- Rivals I, Personnaz L, Taing L, Potier M-C: Enrichment or depletion of a GO category within a class of genes: which test? Bioinformatics. 2007, 23: 401-407. 10.1093/bioinformatics/btl633.
- Sherman BT, Tan Q, Kir J, Liu D, Bryant D, Guo Y, Stephens R, Baseler MW, Lane HC, Lempicki RA: DAVID bioinformatics resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 2007, 35: W169-W175. 10.1093/nar/gkm415.
- Falcon S, Gentleman R: Using GOstats to test gene lists for GO term association. Bioinformatics. 2007, 23: 257-258. 10.1093/bioinformatics/btl567.
- Zheng Q, Wang XJ: GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Res. 2008, 36: W358-W363. 10.1093/nar/gkn276.
- Davis JW: Bioinformatics and computational biology solutions using R and Bioconductor. J Amer Statistical Assoc. 2007, 102: 388-389.
- Yu J, Murali T, Finley RL: Assigning confidence scores to protein–protein interactions. Two Hybrid Technologies. 2012, Springer, 161-174.
- Frey BJ, Dueck D: Clustering by passing messages between data points. Science. 2007, 315: 972-976. 10.1126/science.1136800.
- Bodenhofer U, Kothmeier A, Hochreiter S: APCluster: an R package for affinity propagation clustering. Bioinformatics. 2011, 27: 2463-2464. 10.1093/bioinformatics/btr406.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.