Cumulative semantic similarity. The graph diffusion kernel (GDK) was used to calculate scores for gene pairs (compartment model, diffusion parameter γ = 1). For each gene pair, the semantic similarity was also calculated for Biological Process, Cellular Component, and Molecular Function, and the largest of the three values was retained. The cumulative average of this maximum value was then calculated over gene pairs ordered by GDK score, at thresholds of decreasing stringency. (A) The threshold is shown as the GDK score, from least stringent (score = 5 × 10⁻⁶, the smallest GDK score) to most stringent (617.2, the largest GDK score). (B) The threshold is shown as the rank order, from most stringent (rank = 1) to least stringent (rank = 279,000, the total number of pairs).
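The calculation described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `cumulative_semantic_similarity`, the array arguments, and the toy values are all hypothetical, and the sketch assumes every pair has a semantic similarity value for all three ontologies.

```python
import numpy as np

def cumulative_semantic_similarity(gdk, bp, cc, mf):
    """Cumulative average of the per-pair maximum semantic similarity,
    with pairs ordered from the most to the least stringent GDK score.

    All argument names are hypothetical; each is one value per gene pair.
    """
    gdk = np.asarray(gdk, dtype=float)
    # Keep the largest of the three ontology similarities for each pair.
    best = np.maximum.reduce([
        np.asarray(bp, dtype=float),
        np.asarray(cc, dtype=float),
        np.asarray(mf, dtype=float),
    ])
    # Rank 1 is the pair with the largest GDK score (most stringent).
    order = np.argsort(-gdk, kind="stable")
    ranks = np.arange(1, gdk.size + 1)
    # Mean of the maximum similarity over all pairs at or above each threshold.
    cum_avg = np.cumsum(best[order]) / ranks
    return gdk[order], ranks, cum_avg

# Toy data for four gene pairs (made-up values, for illustration only).
gdk = [0.5, 617.2, 5e-6, 12.0]
bp = [0.2, 0.9, 0.1, 0.4]
cc = [0.3, 0.7, 0.0, 0.6]
mf = [0.1, 0.8, 0.2, 0.5]

thresholds, ranks, cum_avg = cumulative_semantic_similarity(gdk, bp, cc, mf)
```

Plotting `cum_avg` against `thresholds` would correspond to the view in panel A, and plotting it against `ranks` to the view in panel B.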