Gene Ontology Function prediction in Mollicutes using Protein-Protein Association Networks
© Gómez et al; licensee BioMed Central Ltd. 2011
Received: 24 January 2011
Accepted: 12 April 2011
Published: 12 April 2011
Many complex systems can be represented and analysed as networks. The recent availability of large-scale datasets has made it possible to elucidate some of the organisational principles and rules that govern their function, robustness and evolution. However, one of the main limitations in using protein-protein interactions for function prediction is the availability of interaction data, especially for Mollicutes. If we could harness predicted interactions, such as those from Protein-Protein Association Networks (PPAN), and combine several protein-protein network function-inference methods with semantic similarity calculations, protein-protein interactions would become more useful for functional inference in these species.
In this work we show that using PPAN data combined with other approximations, such as functional module detection, orthology exploitation methods and Gene Ontology (GO)-based information measures helps to predict protein function in Mycoplasma genitalium.
To our knowledge, the proposed method is the first that combines functional module detection among species, exploiting an orthology procedure and using information theory-based GO semantic similarity in PPAN of the Mycoplasma species. The results of an evaluation show a higher recall than previously reported methods that focused on only one organism network.
Sequence similarity has proven useful for many years in attempting to annotate genomes [1, 2]. A simple way to infer the possible function of a protein is to use an alignment procedure such as PSI-BLAST to find possible homologues in sequence databases such as UniProt. However, sequence homology has its limitations. Only a fraction of newly discovered sequences have identifiable homologous genes in current databases, and the approach is viable only where substantial sequence similarity to annotated proteins can be found. Moreover, the growing number of annotations extrapolated from sequence similarity is prone to errors [6–8]; hence, new bioinformatics methods are being developed to complement traditional sequence homology-based methods.
The development of high-throughput technologies has resulted in large numbers of predicted Protein-Protein Interaction (PPI) networks for different genomes and, subsequently, methods using PPI data for functional inference [6, 9–12] have been developed. It has been demonstrated that the semantics of gene annotations can be exploited [13, 14] and that new annotations can be predicted with greater precision using Gene Ontology (GO) information inside PPI networks [9, 10, 15]. Several semantic similarity measures using the GO database have been applied to annotated gene products, with high prediction accuracy [13, 15–19].
Recently, other methods using PPI to predict functions for individual genes or proteins have been developed by considering modularisation in biological networks. These methods attempt to first identify coherent groups of proteins and then assign functions to all of the proteins in each group. In terms of topology, a functional module can typically be understood as a group of proteins that are densely interconnected and contribute to a specific biological function. Once a module is obtained, function prediction within the module is usually conducted in a straightforward way by simple methods such as orthology exploitation.
However, most of these approaches use PPI and, while very useful, they have limitations. PPI data are a rich, but quite noisy and still incomplete, source of information. Also, PPIs are only available for a reduced group of organisms due to the problems of using high-throughput technologies in important study organisms such as Mycoplasma.
This last case implies a significant restriction, since Mycoplasma genitalium is one of the most studied species: it is among the smallest organisms, with a very small genome [23, 24] and limited metabolic capabilities. Because of this, it has become a close approximation to the minimal set of genes required for bacterial growth [2, 24]. In order to study the proteomes of these species, another type of protein-protein network is therefore necessary.
In many ways, Protein-Protein Association Networks (PPAN) are a more informative way of describing proteins and their mutual interactions. In contrast to PPI, PPAN make no assertion as to how exactly two proteins interact: proteins can show a productive functional interaction without physically interacting with each other (e.g. by performing subsequent reactions in the same metabolic pathway). Therefore, whenever two proteins form a specific functional partnership, they can be thought of as being associated, independently of the actual mechanism of their association. PPAN have been widely used recently to predict protein function [27–31].
In this work, a new protocol is described for predicting new gene annotations that combines different approaches using PPAN for Mycoplasmas: functional module identification, orthology exploitation and Gene Ontology (GO) semantic similarity-based measures. We have developed a simple, but effective, procedure that assigns function by transferring GO terms to unannotated protein nodes inside functionally conserved modules between two Mycoplasma species.
1. Calculations and algorithm procedure
1.1. Protein-protein association networks for Mycoplasma species
We have performed this study on the proteomes of seven Mycoplasma species. One of the main limitations in using protein-protein interactions for function prediction is the availability of interaction data, especially if one wishes to work with Mollicutes. The entries in STRING correspond to protein-protein functional associations for more than 600 organisms, including Mycoplasmas. A protein-protein functional association can mean either a direct physical binding or an indirect interaction, such as participation in the same metabolic pathway or cellular process. The associations are derived from high-throughput experimental data, from the mining of databases and literature, and from predictions based on genome context analysis. STRING assesses and integrates all of these data in order to obtain a single confidence score for each protein interaction, taking a more generalised perspective on proteins and their associations than other databases, whose main purpose is to collect and curate direct experimental evidence about protein-protein physical interactions.
Moreover, the benefit of using STRING in network-based protein function prediction procedures has been reported, and in a recent work STRING was used to study the proteome organisation of Mycoplasma pneumoniae.
1.2. Functional Module identification
Table: Number of interactions and functional modules detected for each genome and M. genitalium (column: Number of interactions).
1.3. Gene Ontology (GO) terms assignation to proteins inside the modules
Table: Genomes, their number of genes and genes with GO annotations (columns: Number of genes; Number of GO annotations).
1.4. Transferring GO terms to unannotated protein
The orthology value was calculated between each potential pair of orthologs in conserved modules using OrthoMCL. If the value exceeded a threshold, the pair was considered to be real orthologs, and the procedure of Jaeger and Leser was followed for transferring GO terms specifically between those ortholog protein pairs and generating a list for the unannotated M. genitalium protein.
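The thresholded transfer step can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Jaeger and Leser itself: the function name `transfer_go_terms`, the protein identifiers, the scores and the threshold value are all hypothetical, and the orthology scores are assumed to have been computed beforehand.

```python
from typing import Dict, List, Set, Tuple


def transfer_go_terms(
    ortholog_scores: List[Tuple[str, str, float]],
    annotations: Dict[str, Set[str]],
    threshold: float,
) -> Dict[str, Set[str]]:
    """Collect candidate GO terms for unannotated target proteins.

    ortholog_scores holds (target_protein, partner_protein, score) triples
    for protein pairs located in the same conserved module (assumed input).
    """
    transferred: Dict[str, Set[str]] = {}
    for target, partner, score in ortholog_scores:
        if score <= threshold:
            continue  # the pair is not treated as real orthologs
        if annotations.get(target):
            continue  # only unannotated target proteins receive terms
        partner_terms = annotations.get(partner, set())
        if partner_terms:
            transferred.setdefault(target, set()).update(partner_terms)
    return transferred


# Hypothetical identifiers, scores and annotations, for illustration only.
pairs = [
    ("MG_001", "MPN_001", 0.9),
    ("MG_001", "MPN_002", 0.2),
    ("MG_002", "MPN_003", 0.8),
]
known = {
    "MPN_001": {"GO:0006260"},
    "MPN_002": {"GO:0003677"},
    "MPN_003": {"GO:0006412"},
    "MG_002": {"GO:0006412"},
}
result = transfer_go_terms(pairs, known, threshold=0.5)
print(result)
```

In this toy input only the first pair both exceeds the threshold and targets an unannotated protein, so only that protein receives a candidate term list.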
1.5. Semantic similarity analysis of transferred GO terms
The transferred GO annotations list of unannotated M. genitalium proteins was analyzed, comparing them with the GO annotations from annotated proteins inside the module that they belong to. The Information theory-based semantic similarity (ITTS) for predicting function through interacting proteins was followed [10, 14]:
• To calculate the semantic similarity between the GO terms transferred to the unannotated protein and the GO terms of its interacting neighbours inside the same functional module.
• To consider GO terms with a similarity value above a threshold, and then associate the unannotated protein with these GO terms.
However, predictions can be accepted only if they are similar to the annotations in the module, i.e., if they have similar GO annotations; a "manual curation" procedure is therefore needed. A schematic view of the ITTS procedure is depicted in Figure 1.
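The two ITTS steps can be sketched as a simple filter. The sketch assumes that a transferred term is kept when its best similarity to any module term exceeds the threshold; the source does not state how similarities to several module terms are aggregated, so taking the maximum is an assumption. The function names, the toy term identifiers and the toy similarity table are hypothetical.

```python
from typing import Callable, Set


def filter_by_module_similarity(
    transferred_terms: Set[str],
    module_terms: Set[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> Set[str]:
    """Keep transferred GO terms that resemble the module's annotations."""
    accepted = set()
    for term in transferred_terms:
        # Assumption: use the best match against the module's GO terms.
        best = max((similarity(term, m) for m in module_terms), default=0.0)
        if best > threshold:
            accepted.add(term)
    return accepted


# Toy similarity table standing in for a real GO semantic similarity measure.
toy_table = {
    frozenset({"GO:A", "GO:X"}): 0.9,
    frozenset({"GO:B", "GO:X"}): 0.3,
}


def toy_similarity(a: str, b: str) -> float:
    if a == b:
        return 1.0
    return toy_table.get(frozenset({a, b}), 0.0)


accepted = filter_by_module_similarity(
    {"GO:A", "GO:B"}, {"GO:X"}, toy_similarity, threshold=0.7
)
print(accepted)
```

Any pairwise GO similarity function with values between 0 and 1 could be passed in place of `toy_similarity`.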
2. Performance measures
2.1. Effectiveness of the method
The performance of the method was measured as an average value in a five-fold cross-validation analysis, in which the GOA dataset for M. genitalium was randomly divided into five parts: four parts were used for model learning (training) and the remaining part for validation (testing). Known GO annotations were removed from the test set, and the terms of the proteins in the test set were then predicted using the remaining sets (training sets).
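The five-fold split described above can be sketched as follows. The helper name `five_fold_splits`, the fixed random seed and the toy protein identifiers are assumptions made for illustration; the source does not describe how the random partition was implemented.

```python
import random
from typing import Iterable, Iterator, List, Tuple


def five_fold_splits(
    proteins: Iterable[str], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) protein lists for k-fold cross-validation."""
    shuffled = sorted(proteins)
    random.Random(seed).shuffle(shuffled)  # reproducible random partition
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test


# Hypothetical protein identifiers, for illustration only.
toy_proteins = [f"P{i}" for i in range(10)]
splits = list(five_fold_splits(toy_proteins))
for train, test in splits:
    print(len(train), len(test))
```

For each split, the annotations of the test proteins would be hidden, predicted from the training proteins, and then compared with the hidden annotations.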
Finally, the predicted terms were compared with the original annotations to determine the number of correctly predicted annotations. Effectiveness was validated using standard information retrieval measures: recall and precision. Several terms are defined:
A: set of annotated GO functions (in test set)
P: set of predicted GO functions
F: GO functions in train set
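Using the sets A and P defined above, the standard information retrieval definitions of precision and recall can be sketched as below. The exact formulas used in the study are not reproduced in this text, and the role of F in them is not recoverable here, so the sketch uses only A and P and should be read as the textbook definitions rather than the study's own. The function name and the toy GO identifiers are hypothetical.

```python
from typing import Set, Tuple


def precision_recall(annotated: Set[str], predicted: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for predicted versus annotated GO terms."""
    correct = annotated & predicted
    # Precision: fraction of predicted terms that are truly annotated.
    precision = len(correct) / len(predicted) if predicted else 0.0
    # Recall: fraction of annotated terms that were recovered.
    recall = len(correct) / len(annotated) if annotated else 0.0
    return precision, recall


# Toy sets, for illustration only.
A = {"GO:1", "GO:2", "GO:3", "GO:4"}
P = {"GO:1", "GO:2", "GO:9"}
precision, recall = precision_recall(A, P)
print(precision, recall)
```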
The performance of the method measured as an average value in a 5-fold cross-validation analysis for two ontologies: Biological Process and Molecular Function.
The high recall in both ontology cases indicates that a large number of GO terms is recovered by our method from all of the GO terms that are relevant to the search. The high precision in the Molecular Function ontology case indicates that a large proportion of the GO terms are relevant to the search among all of the GO terms recovered by our method.
2.2. Predictions in the M. genitalium dataset using different GOA files
Table: Content of GO terms and genes for each Gene Ontology Annotation (GOA) file used in our experiments (columns: >10 annotations (%); 3-9 annotations (%); <2 annotations (%)).
We found that our procedure performed well. For the Molecular Function Ontology, high precision scores (more than 80%) and recall values of nearly 60% were obtained. In the Biological Process Ontology, precision scores were also high (more than 90%), and recall values of nearly 85% were obtained for those GO terms that were associated with two or more genes. However, in both experiments (Molecular Function and Biological Process Ontology), poor values were obtained for M. mycoides compared with the other genome predictions. We believe that the cause could be that the highest semantic similarity values obtained between the predicted GO terms and those that are part of the modules are barely 0.6 (data not shown), while the highest semantic similarity values for the other genomes range between 0.7 and 0.85.
2.3. Predictions of informative GO terms in the M. genitalium dataset using different approaches
We also wish to determine whether our method can be compared with other similar methods in terms of precision and recall when predicting informative GO terms. Informative GO terms were defined as those terms that are annotated to at least n proteins (n = 10) and are at level 4 or higher. Defining:
t: Relevant/Informative GO terms
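The selection of informative GO terms can be sketched as a filter over annotation counts and term levels. The function name `informative_terms` and the toy term names and levels are hypothetical; the sketch assumes that the level of each term in the GO hierarchy is already known.

```python
from collections import Counter
from typing import Dict, Set


def informative_terms(
    annotations: Dict[str, Set[str]],
    term_level: Dict[str, int],
    min_proteins: int = 10,
    min_level: int = 4,
) -> Set[str]:
    """Return GO terms annotated to enough proteins and deep enough in GO."""
    counts = Counter(term for terms in annotations.values() for term in terms)
    return {
        term
        for term, n in counts.items()
        if n >= min_proteins and term_level.get(term, -1) >= min_level
    }


# Toy data: ten proteins share one deep and one shallow term, and an
# eleventh protein carries a deep but rarely used term.
toy_annotations = {f"P{i}": {"GO:deep", "GO:shallow"} for i in range(10)}
toy_annotations["P10"] = {"GO:rare"}
toy_levels = {"GO:deep": 5, "GO:shallow": 2, "GO:rare": 6}
selected = informative_terms(toy_annotations, toy_levels)
print(selected)
```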
Here, we study how well our procedure detects informative GO terms compared to other similar approximations. FS-Weight is an algorithm that predicts the function of a protein based on the idea that indirect neighbours inside a PPI network are likely to share common functions. These indirect neighbours may interact with the same protein due to some common physical or biochemical characteristics, especially if they share many common interaction neighbours. The FS-Weight algorithm was chosen for several reasons: first, because the method is one of the latest PPI-based algorithms for predicting protein function using GO annotations; secondly, because it was tested for predicting protein function using the STRING database, showing significant improvement; and thirdly, because FS-Weight can predict protein function effectively for all three categories of GO across different genomes, indicating that it is a robust approach.
We also compared our procedure with two other approximations. The first one attempts to learn a linear model of how likely a protein is to have a function, given the frequency with which the protein's neighbours have that term. Parameters of the model are estimated using quasi-likelihood estimation techniques. The second, included in order to provide a fair comparison, is BYPASS, a sequence-based annotation method based on the transfer of annotations from a closely related species. This method predicts the putative function of a protein from its sequence by integrating the results from the PSI-BLAST programme and a fuzzy logic algorithm that uses several protein sequence characteristics, which have been checked with regard to their ability to rearrange a PSI-BLAST profile so that it better reflects biological function.
Different semantic similarity measures were used to obtain scores between the GO terms included in each predicted module and the GO terms obtained using our procedure. For PPI predictions, modules were assigned using NetworkBLAST, and the semantic similarity between the GO terms included in each module and the GO terms predicted was then calculated. For BYPASS predictions, the GO terms were assigned to the predictions using the Gene Ontology database directly. The semantic similarity values were then used as variable thresholds (see Material and Methods for details).
It can be seen in Figure 4 that, for the three semantic similarity calculations, our algorithm makes predictions with better precision and recall than the PPI (Chua and Deng) and sequence-analysis methods. The main advantage of our procedure is that, despite yielding a low number of predicted GO terms, these terms are related to the terms inside each module, whereas the PPI predictions use only one PPAN (in this case that of M. genitalium), so their predicted GO terms have lower similarity values than our predictions. It can be seen, however, that our results have the same problem as the PPI approximations. Due to the lack of annotation information and our previous parsing to avoid inconsistency in GO terms, the number of informative terms chosen is low (nearly 50) and may not provide statistically strong comparisons.
Table: M. genitalium using different genomes and number of genes unannotated. Coverage of annotations using our method (predictions over number of unannotated genes) (columns: Coverage of annotations; Number of GO terms assigned).
It has been shown herein that our procedure outperforms other similar algorithms for predicting GO-based annotations using protein-protein networks, with equal or higher overall precision across a significantly broader range of GO terms. Incorporating other approximations, such as the detection of functional modules conserved between species and orthology exploitation, allows function to be predicted with higher precision and recall in two ontologies of the GO database. As compared to other GO search engines, our algorithm is capable of finding GO terms with high semantic similarity values because it uses orthology information between proteins predicted inside functional modules conserved between species. It has also been shown to recover, with high precision and recall, known GO annotations that were artificially removed from the dataset in five-fold cross-validation.
We believe that our combined approach could be applied in future as a high-precision Mollicute genome annotation procedure: moderately well GO-annotated Mycoplasma genomes could help to improve protein function annotation in other, less-annotated Mycoplasma genomes more effectively than classic ab initio annotation methods. However, caution must be exercised when using this technique. We have shown how critical neighbour genomes with good GO annotation are: the performance of this procedure is limited because it needs Mycoplasma genomes with a high degree of GO annotation and a high number of predicted conserved modules. If genomes with few predicted modules and few GO terms annotated within those modules were used, a reliable number of predictions could not be achieved (data not shown). Future work will address this by using the latest release of the GO database and by determining which evolutionary distances between Mollicute genomes still allow a reliable number of conserved functional modules to be predicted between them.
Gene Ontology annotation file
The standardised Gene Ontology (GO) vocabulary was used as a standard to annotate the proteins inside the PPAN. The annotation for each gene in each proteome was obtained from the Gene Ontology Annotation (GOA) file available for each Mycoplasma proteome in the GO database http://www.geneontology.org. The version of GO used in this study dated from October 2008 and was parsed for each of the species, thus avoiding several problems:
• Terms annotated as obsolete and terms that have no relation to Mycoplasma, such as GO:0000004 (Biological process unknown), were excluded.
• Nearly all Mycoplasma proteomes are annotated in GOA files mainly with GO terms with evidence code IEA (Inferred from Electronic Annotation). To be retained, IEA annotations must be manually reviewed in order to be assigned an upgraded evidence code such as ISS (Inferred from Sequence or Structural Similarity). All of the IEA GO terms that do not have an associated ISS code were removed.
• IEA annotations are generalised to apply to a diverse range of species and usually only represent very broad functions such as 'Protein binding' and 'Enzyme binding'. In effect, this means that when functional genomics data are modelled using GO annotation, a large proportion of the remaining data describes only very broad biological concepts. GO terms are arranged in a hierarchical manner, with more general terms at the lower levels and more specific terms at the higher levels. The GO term "biological process" is defined as level 0, its children terms as level 1, and so on. Wong's work was followed, and a protein was considered to be an "annotated protein" only if it had at least one GO term of level 4 or higher.
GO semantic similarity calculations
The information content of each GO term for each protein inside the detected modules was calculated, and three measures were then applied to estimate the semantic similarity between the GO terms assigned to unannotated proteins using orthology exploitation and the GO terms assigned to the annotated proteins inside the same functional module. The information content of a term is derived from the frequency with which the term occurs in the GO corpus.
where S(go1, go2) is the set of parent terms shared by go1 and go2.
Depending on the similarity between the GO annotations of the protein and the GO annotations of the module to which it belongs, this score ranges between 0 and 1, where 1 indicates functional equality and 0 indicates maximal functional distance. This makes it possible to discriminate between low values, indicating low precision, and higher values, indicating high precision.
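The formulas for the three measures are not reproduced in this text. As a hedged illustration, the sketch below implements the information content and the widely used similarity measures of Resnik, Lin, and Jiang and Conrath, which appear in the reference list and are assumed here to be the three measures in question. The argument `shared` plays the role of S(go1, go2). The Jiang and Conrath distance is converted to a similarity as 1 / (1 + distance), which is one common normalisation and is an assumption; note also that the Resnik value is not bounded by 1. Term names and counts are toy values.

```python
import math
from typing import Dict, Set


def information_content(term_counts: Dict[str, int], total: int) -> Dict[str, float]:
    """IC(t) = -log p(t), where p(t) is the relative frequency of t.

    Counts are assumed to include occurrences of the term's descendants.
    """
    return {t: -math.log(c / total) for t, c in term_counts.items() if c > 0}


def resnik(ic: Dict[str, float], shared: Set[str]) -> float:
    """IC of the most informative shared parent term."""
    return max((ic[t] for t in shared), default=0.0)


def lin(ic: Dict[str, float], go1: str, go2: str, shared: Set[str]) -> float:
    denom = ic[go1] + ic[go2]
    return 2 * resnik(ic, shared) / denom if denom > 0 else 0.0


def jiang_conrath(ic: Dict[str, float], go1: str, go2: str, shared: Set[str]) -> float:
    distance = ic[go1] + ic[go2] - 2 * resnik(ic, shared)
    return 1 / (1 + distance)  # assumed distance-to-similarity conversion


# Toy corpus: 100 annotations in total, root term used by all of them.
counts = {"root": 100, "parent": 20, "go1": 5, "go2": 10}
ic = information_content(counts, total=100)
shared_terms = {"root", "parent"}
sim_resnik = resnik(ic, shared_terms)
sim_lin = lin(ic, "go1", "go2", shared_terms)
sim_jc = jiang_conrath(ic, "go1", "go2", shared_terms)
print(sim_resnik, sim_lin, sim_jc)
```

With these definitions, comparing a term with itself gives a Lin and a Jiang and Conrath similarity of 1, matching the interpretation of 1 as functional equality.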
PPAN: Protein-Protein Association Networks
IEA: Inferred from Electronic Annotation.
This research was supported by Grants BIO2007-67904-C02 and BFU2010-22209-C02 from the MCYT (Ministerio de Ciencia y Tecnología, Spain), and from the Centre de Referència de R+D de Biotecnologia de la Generalitat de Catalunya. The English of this manuscript has been corrected by Mr. Chuck Simmons, a native English-speaking Instructor of English of Autonomous University of Barcelona (UAB). EQ is holder of a Chair of Knowledge and Technology Transfer Parc de Recerca UAB-Santander. AG acknowledges his postdoctoral fellowship from Parc de Recerca UAB-Santander.
- Galperin MY, Koonin EV: Who's your neighbor? New computational approaches for functional genomics. Nat Biotechnol. 2000, 18: 609-613. 10.1038/76443
- Koonin EV: Bridging the gap between sequence and function. Trends Genet. 2000, 16: 16- 10.1016/S0168-9525(99)01927-7
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389
- The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res. 2009, 37: D169-174.
- Gomez A, Cedano J, Espadaler J, Hermoso A, Pinol J, et al.: Prediction of protein function improving sequence remote alignment search by a fuzzy logic algorithm. Protein J. 2008, 27: 130-139. 10.1007/s10930-007-9116-x
- Deng M, Zhang K, Mehta S, Chen T, Sun F: Prediction of protein function using protein-protein interaction data. J Comput Biol. 2003, 10: 947-960. 10.1089/106652703322756168
- Devos D, Valencia A: Practical limits of function prediction. Proteins. 2000, 41: 98-107. 10.1002/1097-0134(20001001)41:1<98::AID-PROT120>3.0.CO;2-S
- Devos D, Valencia A: Intrinsic errors in genome annotation. Trends Genet. 2001, 17: 429-431. 10.1016/S0168-9525(01)02348-4
- Chua HN, Sung WK, Wong L: Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions. Bioinformatics. 2006, 22: 1623-1630. 10.1093/bioinformatics/btl145
- Jaeger S, Leser U: High-Precision Function Prediction using Conserved Interaction. Proc of the German Conference on Bioinformatics. 2007, 146-163.
- Samanta MP, Liang S: Predicting protein functions from redundancies in large-scale protein interaction networks. Proc Natl Acad Sci USA. 2003, 100: 12579-12583. 10.1073/pnas.2132527100
- Wu H, Su Z, Mao F, Olman V, Xu Y: Prediction of functional modules based on comparative genome analysis and Gene Ontology application. Nucleic Acids Res. 2005, 33: 2822-2837. 10.1093/nar/gki573
- Couto FM, Silva MJ, Lee V, Dimmer E, Camon E, et al.: GOAnnotator: linking protein GO annotations to evidence text. J Biomed Discov Collab. 2006, 1: 19- 10.1186/1747-5333-1-19
- Tao Y, Sam L, Li J, Friedman C, Lussier YA: Information theory applied to the sparse gene ontology annotation network to predict novel gene function. Bioinformatics. 2007, 23: i529-538. 10.1093/bioinformatics/btm195
- Chua HN, Sung WK, Wong L: Using indirect protein interactions for the prediction of Gene Ontology functions. BMC Bioinformatics. 2007, 8 Suppl 4: S8- 10.1186/1471-2105-8-S4-S8
- Arnaud MB, Costanzo MC, Shah P, Skrzypek MS, Sherlock G: Gene Ontology and the annotation of pathogen genomes: the case of Candida albicans. Trends Microbiol. 2009, 17: 295-303. 10.1016/j.tim.2009.04.007
- Giglio MG, Collmer CW, Lomax J, Ireland A: Applying the Gene Ontology in microbial annotation. Trends Microbiol. 2009, 17: 262-268. 10.1016/j.tim.2009.04.003
- Pesquita C, Faria D, Bastos H, Ferreira AE, Falcao AO, et al.: Metrics for GO based protein semantic similarity: a systematic evaluation. BMC Bioinformatics. 2008, 9 Suppl 5: S4- 10.1186/1471-2105-9-S5-S4
- Wang JZ, Du Z, Payattakool R, Yu PS, Chen CF: A new method to measure the semantic similarity of GO terms. Bioinformatics. 2007, 23: 1274-1281. 10.1093/bioinformatics/btm087
- Newman ME: Modularity and community structure in networks. Proc Natl Acad Sci USA. 2006, 103: 8577-8582. 10.1073/pnas.0601602103
- Hartwell LH, Hopfield JJ, Leibler S, Murray AW: From molecular to modular cell biology. Nature. 1999, 402: C47-52. 10.1038/35011540
- Sharan R, Ulitsky I, Shamir R: Network-based prediction of protein function. Mol Syst Biol. 2007, 3: 88- 10.1038/msb4100129
- Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, et al.: The minimal gene complement of Mycoplasma genitalium. Science. 1995, 270: 397-403. 10.1126/science.270.5235.397
- Glass JI, Assad-Garcia N, Alperovich N, Yooseph S, Lewis MR, et al.: Essential genes of a minimal bacterium. Proc Natl Acad Sci USA. 2006, 103: 425-430. 10.1073/pnas.0510013103
- Dybvig K, Voelker LL: Molecular biology of mycoplasmas. Annu Rev Microbiol. 1996, 50: 25-57. 10.1146/annurev.micro.50.1.25
- von Mering C: Protein-Protein Interaction Networks: Assembly and Analysis. Edited by: Ron Appel, Ernest Feytmans, Lausanne. 2008, SIoB.
- Alexeyenko A, Sonnhammer EL: Global networks of functional coupling in eukaryotes from comprehensive data integration. Genome Res. 2009, 19: 1107-1116. 10.1101/gr.087528.108
- Gabow AP, Leach SM, Baumgartner WA, Hunter LE, Goldberg DS: Improving protein function prediction methods with integrated literature data. BMC Bioinformatics. 2008, 9: 198- 10.1186/1471-2105-9-198
- Hu P, Janga SC, Babu M, Diaz-Mejia JJ, Butland G, et al.: Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins. PLoS Biol. 2009, 7: e96- 10.1371/journal.pbio.1000096
- Lee SA, Chan CH, Tsai CH, Lai JM, Wang FS, et al.: Ortholog-based protein-protein interaction prediction and its application to inter-species interactions. BMC Bioinformatics. 2008, 9 Suppl 12: S11- 10.1186/1471-2105-9-S12-S11
- Mavromatis K, Chu K, Ivanova N, Hooper SD, Markowitz VM, et al.: Gene context analysis in the Integrated Microbial Genomes (IMG) data management system. PLoS One. 2009, 4: e7979- 10.1371/journal.pone.0007979
- Couto FM, Coutinho PM, Silva MJ: Measuring semantic similarity between Gene Ontology terms. Data & Knowledge Engineering. 2007, 61: 137-152. 10.1016/j.datak.2006.05.003
- Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, et al.: STRING 8--a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 2009, 37: D412-416. 10.1093/nar/gkn760
- Sharan R, Ideker T: Modeling cellular machinery through biological network comparison. Nat Biotechnol. 2006, 24: 427-433. 10.1038/nbt1196
- Sharan R, Suthram S, Kelley RM, Kuhn T, McCuine S, et al.: Conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci USA. 2005, 102: 1974-1979. 10.1073/pnas.0409522102
- Barrell D, Dimmer E, Huntley RP, Binns D, O'Donovan C, et al.: The GOA database in 2009--an integrated Gene Ontology Annotation resource. Nucleic Acids Res. 2009, 37: D396-403. 10.1093/nar/gkn803
- Camon E, Magrane M, Barrell D, Lee V, Dimmer E, et al.: The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res. 2004, 32: D262-266. 10.1093/nar/gkh021
- Chen F, Mackey AJ, Stoeckert CJ, Roos DS: OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 2006, 34: D363-368. 10.1093/nar/gkj123
- Lord PW, Stevens RD, Brass A, Goble CA: Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics. 2003, 19: 1275-1283. 10.1093/bioinformatics/btg153
- McCarthy FM, Bridges SM, Burgess SC: GOing from functional genomics to biological significance. Cytogenet. Genome Res. 2007, 117: 278-287.
- Resnik P: Using information content to evaluate semantic similarity in a taxonomy. Proceedings of the 14th International Joint conference on Artificial Intelligence. 1995.
- Lin D: An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning. 1998.
- Jiang J, Conrath D: Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the 10th International Conference on Research on Computational linguistics. 1997.