Optimal graph alignment between VZV and KSHV
The protein interaction network of the herpesvirus VZV consists of 76 open reading frames (ORFs) and 173 protein-protein interactions; 19 of these ORFs have no detected interactions and are excluded from the subsequent analysis. The protein interaction network of KSHV consists of 84 ORFs and 123 interactions (34 ORFs have no detected interactions) [4]; see Figure 2a. Thirty-four ORFs in VZV have reciprocally best-matching sequence homologs among the reading frames of KSHV. Between pairs of ORFs with such homologous partners, there are 44 interactions in VZV and 25 interactions in KSHV. Of these interactions, 8 occur in both species; that is, the overlap between the interaction networks is about 13% when the alignment is given by sequence homology. The optimal alignment of the two networks is shown in Figure 2b. The list of aligned ORFs and details on the scoring are given in the supplementary text [see Additional file 1]. The alignment consists of 26 pairs of aligned ORFs, spanning one third of the protein interaction networks of VZV and KSHV. The alignment contains 44 interactions, 10 of which are self-interactions. Of the 34 interactions between distinct ORFs, 11 are matching interactions occurring in both protein interaction networks, whereas only one of the 10 self-interactions matches. Of the 26 pairs of aligned ORFs, 24 pairs have detectable sequence similarity. The remaining 2 aligned pairs involve ORFs which have no detectable sequence similarity with each other or with any other ORF. The mean connectivity of the aligned part of the protein interaction network is 3.0 interactions per ORF, compared with a mean connectivity of 2.4 for VZV and 1.5 for KSHV.
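The overlap of about 13% quoted for the homology-based alignment is consistent with dividing the number of shared interactions by the number of interactions found in at least one species (a Jaccard index). This definition is an assumption inferred from the numbers in the text, not one stated explicitly; the function name below is hypothetical. A minimal sketch:

```python
def interaction_overlap(n_first: int, n_second: int, n_shared: int) -> float:
    """Shared interactions divided by the union of interactions.

    Assumption: the overlap is a Jaccard index. The text only quotes the
    resulting percentage, so this definition is inferred, not confirmed.
    """
    union = n_first + n_second - n_shared
    return n_shared / union


# Counts taken from the text: 44 interactions in VZV, 25 in KSHV, 8 in both.
overlap = interaction_overlap(44, 25, 8)
print(f"{overlap:.1%}")  # 13.1%
```

The same function applies to any alignment once the three counts are known.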
Additional file 2: Supplementary Animation. The Supplementary Animation [Additional file 2] illustrates the network alignment algorithm and shows the intermediate steps between Figure 2a and Figure 2b. See the caption of Figure 2a,b for the colour coding of the nodes and links. (MP4 3 MB)
The quality of the alignment we have obtained can be tested by comparing the genomic positions of the aligned ORFs. We count the ranks of ORFs from the initial terminal repeats of the two genomes (left TR of KSHV, TRL of VZV). In Figure 3a the ranks of reading frames in VZV are plotted against the ranks of their alignment partners in KSHV. Aligned ORFs without any sequence similarity fit very well into the sequence of ORFs in their respective genomes. The molecular weights of the aligned nodes are highly correlated (see Figure 3b). In addition, we find that interactions among the aligned ORFs are more likely to be conserved across several other herpesvirus species, including herpes simplex virus (HHV-1) and murine cytomegalovirus (mCMV). The mutual information on the interactions in different species within the alignment is 6.6 times higher than for the interactions among ORFs outside of the alignment [see Additional file 1 for details].
In some cases, sequence-similar pairs of ORFs are not aligned because of mismatched interactions. In the extreme case, an ORF may have several interactions in one species but none in the other, most likely indicating an unsuccessful yeast two-hybrid (Y2H) experiment. Examples are KSHV ORF64/VZV ORF22, 22/37, 42/53, 36/47, and 33/44.
Functional relationships detected by interaction similarity
Some ORFs are aligned because of their matching interactions, despite low or no detectable sequence similarity. We discuss these cases separately.
KSHV ORF67.5/VZV ORF25
These ORFs have a sequence identity of only 18% over 76 aa (see Methods for details). They are listed as homologs in the VIDA3 database [20], and both of them are thought to be homologs of the HHV-1 protein UL33 [21]. The alignment of these ORFs largely results from 4 matching links out of 5 in KSHV and 12 in VZV (p-value of 4 × 10⁻³, [see Additional file 1]) with a local link score S_L = 4.57 versus node score S_N = 4.20. Our alignment thus confirms the homology.
KSHV ORF28/VZV ORF65
These ORFs have a sequence identity of only 11% over 102 aa. They are not listed as sequence homologs in the databases VOCS [22], VIDA3 [20] and NCBI [23]. However, the sequence alignment extends over their complete length, with no gaps. Again, the alignment of these nodes results from 4 matching links out of 4 in KSHV and out of 5 in VZV (p-value of 10⁻³) with a local link score S_L = 6.30 versus node score S_N = 3.50. Functional annotation is available only for VZV ORF65; it belongs to the membrane/glycoprotein class and is most likely a type-II membrane protein [24]. The alignment of KSHV ORF28 with VZV ORF65 leads us to predict that KSHV ORF28 also codes for a membrane glycoprotein (see Figure 2c for illustration).
Several experimental studies support this prediction. Gene expression studies show that ORF28 is co-expressed with tertiary lytic ORFs and hence probably falls in the classes of structural or host-virus-interaction genes [25, 26]. The expression of ORF28 is affected by blocking DNA replication [27], showing that ORF28 is a secondary or tertiary gene. Furthermore, ORF28 has been detected in the virion by mass spectrometry, leading to a tentative functional classification as a glycoprotein-envelope protein [28]. Finally, ORF28 is a positional homolog of the Epstein-Barr virus ORF BDLF3, which is known to encode glycoprotein gp150.
KSHV ORF23/VZV ORF39
These ORFs have no significant sequence similarity: although the alignment obtained with ClustalW [29] has a sequence identity of 18% over 240 aa, it is statistically insignificant; a randomised test yields a p-value of 0.43. A systematic analysis involving a wide range of different scoring parameters does not yield a statistically significant sequence alignment either [see Additional file 1]. The reading frames KSHV ORF23 and VZV ORF39 are aligned purely because of 3 matching interactions out of 4 in KSHV and 4 in VZV (p-value of 2 × 10⁻²). The local link score equals 4.47 versus a node score of -0.49. Functional classification is available only for VZV ORF39, as a membrane/glycoprotein [20]. The alignment thus leads us to predict that KSHV ORF23 also codes for a membrane glycoprotein.
This prediction is supported by several experimental studies. Again, ORF23 is co-expressed with tertiary lytic ORFs [25] and is sensitive to blocked DNA replication [27], so it is a late gene. The expression patterns of ORF23 are similar to those of structural and packaging genes.
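The randomised test used to judge the sequence alignment of KSHV ORF23 and VZV ORF39 can be illustrated with a shuffling test. The text does not specify the randomisation scheme or the score, so the sketch below rests on assumptions: it shuffles the residues of one sequence, and it scores with a toy global alignment (Needleman-Wunsch with illustrative match, mismatch and gap values) in place of ClustalW. All function names and the example sequences are hypothetical.

```python
import random


def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2) -> int:
    """Global alignment score (Needleman-Wunsch, linear gap penalty).

    Toy stand-in for a real aligner; parameter values are illustrative.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffle_p_value(a: str, b: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Fraction of shuffled versions of b scoring at least as well as b itself.

    Shuffling preserves the amino-acid composition of b and destroys its order.
    The +1 terms keep the estimate away from an exact zero.
    """
    rng = random.Random(seed)
    observed = alignment_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if alignment_score(a, "".join(letters)) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical toy sequences, not the actual ORF sequences.
p = shuffle_p_value("MKTAYIAKQRQISFVKSHFSRQ", "MKSAYLAKQRNISFVRSHWSRQ")
print(f"empirical p-value: {p:.3f}")
```

A large p-value, such as the 0.43 reported in the text, means that shuffled sequences score as well as the real one about that often, so the observed identity is not evidence of homology.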
KSHV ORF41/VZV ORF60
These ORFs have 3 matching interactions out of 3 in KSHV and 6 in VZV (p = 2 × 10⁻²), but no significant sequence similarity (the ClustalW sequence alignment has an identity of 12% over 160 aa with a p-value of 0.94). They are aligned with a local link score of 4.39 versus a node score of -0.49. Both ORFs are functionally annotated. KSHV ORF41 codes for a helicase/primase associated factor [30] and is not affected by blocking DNA replication [27]. On the other hand, VZV ORF60 codes for glycoprotein L [20, 31]. It may be that either of them has a so-far unknown function, leading to the matching protein interactions. This idea finds support in [25], where the expression maximum of ORF41 was found to come after the secondary lytic phase. This is surprising because the transcript is needed already during the secondary lytic phase (DNA replication). No other DNA-replicating gene controlled by a different operon to KSHV ORF41 has expression dynamics with this property. Such a delay of the maximum of expression may have two reasons: either the transcription of ORF41 is not controlled after its role is finished, or ORF41 indeed has a hitherto uncharacterised function in the tertiary lytic phase, possibly a structural one.
We also note that ORF41 is specific to the class of γ-herpesviruses, of which KSHV is a member. Analogously, ORF60 is α-herpesvirus specific. It is possible that the homolog of ORF41 in VZV and the homolog of ORF60 in KSHV were lost as a result of either of these proteins acquiring a new function. This would be an example of non-orthologous gene displacement [19].
Interaction clusters
The alignment shown in Figure 2 contains a cluster of proteins all interacting with each other. This cluster comprises the aligned pairs KSHV ORF23/VZV ORF39, 28/65, 29b/42, and 67.5/25, connected by matching links only. The p-value for such a fully connected cluster (a clique) to emerge at random is approximately 5 × 10⁻¹¹. The pair KSHV ORF41/VZV ORF60 discussed above is connected to this cluster by two matching links, forming an almost fully connected cluster of 5 ORF pairs with 8 of 10 possible links present and matching. Surprisingly, while all the other ORFs in the cluster code for structural proteins (virion assembly and structure proteins), ORF41 of KSHV is annotated as a helicase/primase associated factor, and hence codes for a protein involved in DNA replication. The association with structure-related genes may be interpreted as further evidence for another function of ORF41 as a structural protein.
This cluster of interacting proteins is also found in a third species, the Epstein-Barr virus (EBV), which is of the same viral family as KSHV. Three of the four ORFs of the cluster in KSHV have sequence homologs in EBV, namely ORF23, ORF67.5, and ORF29b. All of the corresponding ORFs in EBV are found to interact with each other (Peter Uetz, private communication).
The individual species KSHV and VZV contain further clusters, but these are not conserved across species. For instance, the cluster comprising ORFs 28, 29b, 41 and K10 in KSHV contains genes coding for predicted virion proteins, virion assembly and host-virus interaction proteins. ORFs 25, 19, 27, and 38 forming a fully connected cluster in VZV code for proteins involved in virion assembly, nucleotide repair, metabolism, and host-virus interaction.
Interaction conservation and protein function
Protein interactions which are conserved across species shed further light on the functional relationship of the interaction partners. We compare the functions of interacting proteins (i) when the interaction is conserved between KSHV and VZV, and (ii) regardless of conservation.
Each annotated protein can be assigned to one of two functional classes: it is either a 'structural' protein (its functional annotation is one of capsid/core protein, membrane/glycoprotein, virion protein, virion assembly), or an 'information-processing' protein (DNA replication, gene expression regulation, nucleotide repair/metabolism, host-virus interaction). We take the functions of two proteins to be similar if both their functional annotations fall into the same class. Based on this classification, we measure the correlation between functional annotations of interacting proteins by mutual information. For conserved interactions, this is nearly 20 times higher than for the set of all interactions (0.107 bits vs. 0.006 bits). Hence, conserved interactions are more likely to connect functionally similar proteins. Conversely, functionally similar proteins have more conserved interactions than functionally unrelated proteins. The mutual information between interactions in the two species is about ten times higher for pairs of functionally similar proteins than for pairs of functionally different proteins (0.071 bits vs. 0.007 bits).
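The mutual information between the functional classes of interacting proteins can be computed from the class labels at the two ends of each interaction. The sketch below uses the standard plug-in estimate in bits; the exact estimator used for the values quoted above is not given in this section, so the symmetrisation step, the function names and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information_bits(pairs):
    """Plug-in mutual information (in bits) of a list of (x, y) label pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with the ratio written in counts.
    return sum(
        (c / n) * math.log2(c * n / (left[x] * right[y]))
        for (x, y), c in joint.items()
    )


def functional_class_pairs(interactions, functional_class):
    """Class labels at both ends of each annotated interaction.

    Interactions are undirected, so each one is counted in both orientations
    (an assumption of this sketch). Unannotated proteins are skipped.
    """
    pairs = []
    for a, b in interactions:
        if a in functional_class and b in functional_class:
            ca, cb = functional_class[a], functional_class[b]
            pairs.append((ca, cb))
            pairs.append((cb, ca))
    return pairs


# Hypothetical toy network: 'S' = structural, 'I' = information-processing.
toy_class = {"P1": "S", "P2": "S", "P3": "I", "P4": "I"}
toy_links = [("P1", "P2"), ("P3", "P4"), ("P1", "P3")]
mi = mutual_information_bits(functional_class_pairs(toy_links, toy_class))
print(f"{mi:.3f} bits")
```

Applying the same function once to the conserved interactions and once to all interactions gives the kind of comparison reported in the text.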