Exploring virus relationships based on virus-host protein-protein interaction network
© Xu et al. 2011
Published: 23 December 2011
Currently, several systems have been proposed to classify viruses and indicate the relationships between different ones, though each system has its limitations because of the complexity of viral origins and their rapid evolution rate. We hereby propose a new method to explore the relationships between different viruses.
A new method, which is based on the virus-host protein-protein interaction network, is proposed in this paper to categorize viruses. The distances between 114 human viruses, including 48 HIV-1 and HIV-2 viruses, are estimated according to the protein-protein interaction network between these viruses and humans.
The results demonstrated that our method can disclose not only relationships consistent with the taxonomic results of currently used systems of classification but also the potential relationships that the current virus classification systems have not revealed. Moreover, the method points to a new direction where the functional relationships between viruses and hosts can be used to explore the virus relationships on a systematic level.
Viruses can be classified according to different aspects [1, 2], such as their geometry, whether they have an envelope, the identity of the host organism they can infect, the mode of transmission, or the type of disease they cause. One of the most widely accepted and useful classification systems is based on the combination of nucleic acid type (DNA or RNA), strandedness (single-stranded or double-stranded), sense, and method of replication. This classification system was proposed by David Baltimore. In contrast, systems based on the diseases caused by viruses or on viral morphology are not generally accepted, because different viruses can cause the same disease and their morphologies can look very similar under the microscope. It is the instability of classification systems based on such viral characteristics that keeps them from general acceptance. The Baltimore classification offers a way to place viruses in a given category, and viruses in the same category behave in a definite pattern. Meanwhile, the International Committee on Taxonomy of Viruses (ICTV) [4–6] has also devised and implemented rules for the classification of viruses. The ICTV system shares many features with the taxonomy system of cellular organisms, such as its taxon structure. This classification uses the regular succession of Order, Family, Subfamily, Genus, and Species. However, the code of nomenclature regulated by the ICTV differs from other codes on several points. Most notably, names of orders and families are italicized, and species names are not binomial but instead generally take the form of [Host] virus. To date, 84 families and more than 2,000 species of virus have been defined by the ICTV classification system. In addition, other virus classification systems, such as the Holmes classification, the LHT System of Virus Classification, and the Casjens and Kings classification of viruses, have also been proposed.
The three main classification systems mentioned above (the Baltimore classification, the LHT System, and the Casjens and Kings classification) are all based on certain chemical or physical characteristics of viruses. The ICTV classification system, on the other hand, is based on the hypothesis that the members of an order might have a common ancestor. However, 64 of the 84 families in the ICTV classification system are still unplaced. The Holmes classification sorts viruses into Phaginae (attacks bacteria), Phytophaginae (attacks plants), and Zoophaginae (attacks animals) according to host type. These classifications are accepted by some virologists, but not all virologists are satisfied with the current virus classification systems. Some do not even take virus taxonomy into consideration in their research because of the limitations of each classification method mentioned above. As the current taxonomies do not reflect the phylogenetic relationships between different viruses, it is hard to build a classification system that satisfies all virologists. Nevertheless, different methods can be set up to show the relationships between viruses, which might be helpful to virologists.
A virus needs to use the DNA replication and protein synthesis systems of its host to complete its life cycle and proliferate. Consequently, some viruses exhibit strong co-evolutionary relationships with their mammalian host organisms [12, 13]. The study of co-evolution dates back more than 40 years. Co-evolution models, including "gene-for-gene", "matching allele", and "matching genotype", have already been proposed [15–18], and evidence of co-evolution is available in both temporal and spatial patterns. As essential viral proteins usually interact with host proteins, we extend the "guilt by association" hypothesis of functional association within an organism to the protein interactions between organisms, and explore the relationships between different viruses by examining the virus-host protein-protein interaction network. The Gene Ontology (GO), which comprises terms in a hierarchical tree structure and is adopted for gene function assignment in most studied organisms, is now extending its realm to the field of microbial annotation. The PAMGO (Plant-Associated Microbe Gene Ontology) project has already extended the GO system to describe various processes related to microbe-host interactions. Currently, the project has assigned more than 800 new GO terms for microbe-host interactions or other symbiotic interactions. The controlled vocabulary of GO offers researchers a consistent framework for obtaining and processing biological information, thus minimizing the problems caused by variation in human language and its inconsistency across research communities. In this paper, we hypothesize that the relationships between viruses can be explored through the functional relationships of the host proteins with which their proteins interact.
To demonstrate this, we used viruses and human as the model, since far more protein-protein interaction (PPI) data have accumulated between viruses and human than between viruses and other organisms. First, the PPI data between viruses and human are collected. Next, the relationship between distinct viruses is inferred from the functional similarity between the GO terms of their proteins' interacting partners in human. Our estimation offers a different perspective on the relationships between viruses that attack the same host and could be regarded as complementary to the traditional virus classification systems.
In evolution research, different indicators, such as the similarity between conserved sequences, have been used to determine the distances between organisms. In our method, the distance between two viruses is defined as the smallest special score, derived from the smallest shared biological process (SSBP) measure, between the two sets of human proteins that interact with the two viruses. The mathematical definition of the distance between two sets is the infimum of the distances between any elements of the two sets, so our definition of the distance between two viruses is consistent with the mathematical definition of the distance between two point sets. Moreover, some human proteins with which a viral protein interacts exert their function in relatively general processes. These general proteins contribute less to differentiating viruses. Ideally, we should use proteins that have more specific functions and participate in more specialized processes to reflect the relationships between different viruses. The infimum represents the most specific similarity between two protein sets and can reflect the relationship between two viruses at the most specific level. Because many GO terms are defined at a general level, the smallest special score also tends to exclude the non-specific GO terms in our sets.
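The definition above can be written out as a short formula. The symbols are illustrative shorthand rather than notation taken from the original method: H_i denotes the set of human proteins that interact with the proteins of virus V_i, and s(p, q) denotes the special score of the human protein pair (p, q).

```latex
% Distance between two point sets A and B
d(A, B) = \inf_{a \in A,\; b \in B} d(a, b)

% Distance between two viruses V_i and V_j (assumed notation)
D(V_i, V_j) = \min_{p \in H_i,\; q \in H_j} s(p, q)
```

Because each H_i is a finite, non-empty set, the infimum is attained and coincides with the minimum over all protein pairs.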
In our approach, the ability to detect the relationships between distinct viruses depends directly on the quality of the virus-host protein-protein interaction network. If the network is reliable and contains enough information to bridge the connection between viruses and their hosts, the relationships disclosed from the PPI network will reveal more functional associations to virologists who are interested in the relationships between different viruses. In total, 9683 human proteins are confirmed to interact with the viral proteins of the 114 viruses. Of these, 8249 human proteins are verified to interact with the 48 HIV viruses, while the 66 non-HIV viruses correspond to only 1434 human proteins, a relatively low number compared with the number of human proteins that interact with HIV proteins. As discussed above, the classification of the HIV viruses gave much more reliable results than that of the remaining groups of viruses. This might be caused by the difference in the amount of data currently available in the two corresponding datasets. More verified virus-host protein-protein interaction data for other viruses are expected to lead to more reliable and valuable results when exploring potential relationships between distinct viruses. Our method points to a new direction for elucidating the relationships between viruses at a systematic level and provides rich information for virologists studying the relationships among viruses.
The protein-protein interaction network used in this paper is constructed mainly from the results of Dyer et al., which focus on the interactions between human proteins and pathogen proteins, and from a database (http://molvis.vbi.vt.edu/pig/index.php) describing pathogen-human protein-protein interactions. We combined the data from these two sources to build the protein interaction network between each virus and its host, human. Each interaction pair in the network is composed of a viral protein and a human protein. This network is the foundation that we used to evaluate the relationships between different viruses. The distance between distinct viruses is calculated according to the functional similarity of the host proteins that interact with the corresponding viral proteins in the network. In the GO system, each human protein is normally assigned GO terms in three categories: 'Cellular Component', 'Biological Process' and 'Molecular Function'. We used the 'Molecular Function' terms to carry out the analysis, since researchers are usually more interested in the molecular function of a protein when they study virus-host interactions.
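As a minimal sketch of how two interaction sources might be combined, the snippet below merges (virus, viral protein, human protein) records and derives the set of human partners for each virus. The record layout, the function names and all identifiers are hypothetical and chosen only for illustration; they do not reflect the actual file formats of the sources.

```python
from collections import defaultdict


def build_network(*sources):
    """Merge interaction records from several sources, dropping duplicate pairs."""
    network = defaultdict(set)
    for records in sources:
        for virus, viral_protein, human_protein in records:
            # Each edge of the network is one viral-protein / human-protein pair.
            network[virus].add((viral_protein, human_protein))
    return dict(network)


def human_partners(network):
    """Map each virus to the set of human proteins its proteins interact with."""
    return {
        virus: {human for _, human in pairs}
        for virus, pairs in network.items()
    }


# Toy records with made-up identifiers (not real data).
literature_records = [
    ("virusA", "vA1", "H1"),
    ("virusA", "vA2", "H2"),
    ("virusB", "vB1", "H2"),
]
database_records = [
    ("virusA", "vA1", "H1"),  # duplicate of a literature record
    ("virusB", "vB1", "H3"),
]

network = build_network(literature_records, database_records)
partners = human_partners(network)
```

Storing the pairs in sets means an interaction reported by both sources is counted only once.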
The smallest shared biological process (SSBP) method, which was used to measure the functional similarity between proteins in previous work [24, 25], is the reference for the special score system applied in this paper. The following procedure is used to quantify the functional similarity between two proteins according to SSBP: first, identify all Gene Ontology terms shared by the two proteins; next, count how many other proteins are also assigned to each of the shared terms; finally, identify the shared biological process term with the smallest count (SSBP). The SSBP score of each protein-protein pair is recorded. As shown in Figure 3A, the numbers 8 and 18 in the red and green circles are the SSBP scores of the red protein pair and the green protein pair, respectively. The special score system used in this article is derived from the SSBP. Instead of calculating the protein relationships from their 'Biological Process' GO terms, we focused on the GO terms in the 'Molecular Function' category. This is the only difference between our method and SSBP. Finally, the infimum of the special scores between two sets, as described in the previous section, is stored in the distance matrix to measure the relationship between two viruses.
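The scoring procedure and the distance matrix can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original implementation: the function names and toy identifiers are hypothetical, the count for a term includes every protein annotated to it (the two proteins of the pair as well), and a pair with no shared term is given an infinite score.

```python
from itertools import combinations
from math import inf


def term_counts(annotations):
    """Count how many proteins are annotated to each GO term."""
    counts = {}
    for terms in annotations.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    return counts


def special_score(p, q, annotations, counts):
    """Smallest protein count among the GO terms shared by proteins p and q."""
    shared = annotations.get(p, set()) & annotations.get(q, set())
    if not shared:
        return inf  # assumption: no shared term means maximally distant
    return min(counts[term] for term in shared)


def virus_distance(partners_a, partners_b, annotations, counts):
    """Minimum special score over all pairs of human partners of two viruses."""
    return min(
        (special_score(p, q, annotations, counts)
         for p in partners_a for q in partners_b),
        default=inf,
    )


def distance_matrix(partners, annotations):
    """Symmetric virus-by-virus distances, keyed by (virus, virus) tuples."""
    counts = term_counts(annotations)
    matrix = {}
    for a, b in combinations(sorted(partners), 2):
        d = virus_distance(partners[a], partners[b], annotations, counts)
        matrix[(a, b)] = matrix[(b, a)] = d
    return matrix


# Toy 'Molecular Function' annotations with made-up identifiers.
annotations = {
    "P1": {"GO:A", "GO:B"},
    "P2": {"GO:A", "GO:C"},
    "P3": {"GO:B"},
    "P4": {"GO:A", "GO:C"},
}
partners = {"virusX": {"P1"}, "virusY": {"P2", "P3"}, "virusZ": {"P4"}}
matrix = distance_matrix(partners, annotations)
```

In this toy example GO:A is assigned to three proteins and GO:B and GO:C to two each, so virusX and virusY are at distance 2 (through the shared term GO:B of P1 and P3), while virusX and virusZ are at distance 3 because their partners share only the more general term GO:A.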
This work was supported by the National 973 Key Basic Research Program (Grant Nos. 2010CB945401, 2008CB713807 and 2007CB108800), the National Natural Science Foundation of China (Grant Nos. 30870575, 31071162 and 31000590), and the Science and Technology Commission of Shanghai Municipality (11DZ2260300).
This article has been published as part of BMC Systems Biology Volume 5 Supplement 3, 2011: BIOCOMP 2010 - The 2010 International Conference on Bioinformatics & Computational Biology: Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/1752-0509/5?issue=S3.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.