Identifying potential survival strategies of HIV-1 through virus-host protein interaction networks
© van Dijk et al. 2010
Received: 9 February 2010
Accepted: 15 July 2010
Published: 15 July 2010
The National Institute of Allergy and Infectious Diseases has launched the HIV-1 Human Protein Interaction Database in an effort to catalogue all published interactions between HIV-1 and human proteins. To systematically investigate these interactions functionally and dynamically, we have constructed an HIV-1 human protein interaction network. This network was analyzed for important proteins and processes that are specific to the HIV life-cycle. To expose viral strategies, we carried out a network motif analysis, which revealed recurring patterns in virus-host dynamics.
Our analyses show that human proteins interacting with HIV form a densely connected and central sub-network within the total human protein interaction network. The evaluation of this sub-network for connectivity and centrality resulted in a set of proteins essential for the HIV life-cycle. Remarkably, we were able to associate proteins involved in RNA polymerase II transcription with hubs and proteasome formation with bottlenecks. Inferred network motifs show significant over-representation of positive and negative feedback patterns between virus and host. Strikingly, such patterns have never been reported in combined virus-host systems.
HIV infection results in a reprioritization of cellular processes, reflected by an increase in the relative importance of the transcriptional machinery and of proteasome formation. We conclude that during the evolution of HIV, some patterns of interaction have been selected for, resulting in a system in which virus proteins preferentially interact with central human proteins for direct control and with proteasomal proteins for indirect control over cellular processes. Finally, the patterns described by network motifs illustrate how virus and host interact with one another.
Recent advances in high-throughput genome-wide screening techniques have increased not only the amount of generated data, but also its quality. Together with the completion of the human genome project, this led to early expectations that medicine would be revolutionized. However, as is often the case in the life sciences, the devil is in the details. We have learned that before we can efficiently use genome-wide data to develop the next generation of drugs and treatments, we have to revolutionize the way we use our data. Recognizing that we are not yet equipped with the right tools to interpret this unprecedented amount of data, we have been building large databases in which data await processing into information. Today, interpreting these data stands as the grand challenge for bioinformatics in the post-genomic era.
Meanwhile, hoping to solve this problem, we have broadened our view and looked elsewhere for answers. One such place is the field of network science. This relatively new field emerged from graph theory and physics and has proved to be a powerful approach to the mathematical representation, visualization and analysis of complex data involving many interacting components. Powerful concepts have been developed in this area, such as network centrality, scalability and network motifs, which have enabled us to understand a system through its network topology [2–9]. Many fields have since benefited from these advances. For example, in epidemiology, mapping human interactions into social networks gave insight into how sexually transmitted diseases spread in a population [10–12]. In developmental biology, the representation of interactions among genes as gene regulatory networks has been widely accepted [13–17], and in the social sciences, the analysis of human mobility patterns using a human interaction network has helped shed light on the dynamics of our society.
However, virology has not yet received the attention it deserves from network research, despite the availability of data and of ready-to-use scientific methodology. Only recently, Dyer and colleagues described a network of human proteins interacting with viruses and other pathogens, based on manually curated data from the literature as well as publicly available databases. In their work they give an overview of the human proteins commonly targeted by pathogens ranging from viruses such as HIV, Influenza and Measles to pathogen groups such as Toxoplasma and Plasmodium. Their findings emphasize that pathogens preferentially interact with two kinds of proteins: hubs (proteins that interact with many other proteins) and bottlenecks (proteins that lie on many shortest paths). They also provide evidence from Gene Ontology (GO) annotation that different sets of pathogens target the same processes even though they interact with different proteins. One remarkable feature of their data is that it is highly biased towards HIV: approximately eighty percent of all interactions are specific to the Human Immunodeficiency Virus (HIV).
Human immunodeficiency virus (HIV) is recognized as responsible for one of the most destructive pandemics in recorded history. It causes thousands of deaths and substantially decreases the quality of life of millions of individuals each year, most of whom live in Sub-Saharan Africa.
Since HIV was first isolated in 1983, scientists have been investigating every aspect of the virus in the hope of finding a vaccine. Genomic research has revealed that HIV has a compact genome consisting of nine open reading frames (leading to nine primary translation products) that code for fifteen different translational products, represented by nineteen proteins. Most of the coding regions of HIV overlap, with the exception of the genes rev and tat, which are split by introns.
Despite the compactness of its genome, HIV has a very high nucleotide substitution rate, several million times higher than that of the average eukaryotic genome. Such a high substitution rate enables a virus population to exist as a cloud of genotypes, called a quasispecies, and to adapt rapidly to environmental changes by means of this diversity. Varying conditions, such as different humoral and innate immune responses within and between hosts or varying treatment regimens, exert selection pressures that shift the dominant virus genotype. This led to the understanding that the persistence of the virus in the host relies on the complex web of interactions it has, rather than on the fitness of its structural components. In other words, HIV's strategy for dealing with environmental stress lies in its ability to change its structural components while maintaining their function. This is also the main reason why a universal vaccine is unlikely to be developed using conventional methods such as targeting anchor proteins. Therefore, before we can expect to develop a cure, we need to invest more in understanding the interplay between the virus and the host.
Fourteen most frequent types of interactions between HIV and human proteins (including, e.g., "induces phosphorylation of").
In addition to the NCBI database, three other independent datasets are available from small interfering RNA (siRNA) screens [23–25]. However, there is surprisingly little overlap between these four resources.
A very recent review by Bushman et al. addresses this issue by comparing the results of these three siRNA screens. There were 34 genes called in at least two siRNA screens, whereas as few as three genes were common to all three screens. Furthermore, of the 34 genes on two or three lists, only 11 were reported in the NCBI database. The authors discuss several factors that could contribute to this variation. In addition, they combined the interactions from the NCBI database with other related work to assemble a "host-pathogen" interaction network. Analysis of this combined host-pathogen network revealed ten clusters, each associated with a distinct biochemical or cellular function. The identified clusters not only confirm current understanding of known processes such as the immune response and Tat activation/transcriptional elongation, but also suggest previously overlooked processes such as proteasome and mediator complex activity.
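Comparing the screens amounts to simple set counting. The Python sketch below uses hypothetical gene symbols rather than the actual screen results; it counts genes called in at least two screens, genes common to all screens, and how many of the former also appear in a curated database (the `database_genes` set is likewise a placeholder).

```python
from collections import Counter


def overlap_summary(screens):
    """Return (genes called in >= 2 screens, genes called in every screen).

    `screens` is a list of sets of gene symbols, one set per siRNA screen.
    """
    # Each gene is counted at most once per screen because screens are sets.
    counts = Counter(gene for hits in screens for gene in hits)
    at_least_two = {gene for gene, n in counts.items() if n >= 2}
    in_all = set.intersection(*screens)
    return at_least_two, in_all


# Hypothetical toy screens; these are NOT the published screen results.
screen_a = {"G1", "G2", "G3", "G4"}
screen_b = {"G2", "G3", "G5"}
screen_c = {"G3", "G4", "G6"}
database_genes = {"G3", "G7"}  # placeholder for genes listed in a curated database

two_plus, all_three = overlap_summary([screen_a, screen_b, screen_c])
print(sorted(two_plus))                   # ['G2', 'G3', 'G4']
print(sorted(all_three))                  # ['G3']
print(sorted(two_plus & database_genes))  # ['G3']
```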
Nevertheless, there are two important shortcomings associated with siRNA screening. First, the siRNA method cannot identify genes whose knockdown is toxic (i.e. results in cell death). Hence the method can be argued to be biased towards identifying genes that have a phenotype yet lie on the periphery of a pathway within the total HIV-1-human interactome. Second, it does not explain the type of interaction that the suggested gene might have with HIV proteins. Therefore we argue that if one aims to identify "core proteins" involved in processes important for viral survival, and also wants to analyze the resulting dynamics, one has to rely on relatively less biased and well-annotated data such as the NCBI database. However, the quality of the published manuscripts curated in the database varies. In this report, all individual calls reporting interactions are treated equally in the computational analyses.
In the remainder of this paper we introduce the HIV-1 Human Protein-Protein Interaction Network, based on the database of the National Institute of Allergy and Infectious Diseases (NIAID) called the HIV-1, Human Protein Interaction Database. In the Results section we present our findings from network centrality and network motif analysis. In the Discussion section we discuss the analysis of network topology and patterns that has led to the identification of HIV-specific proteins and processes associated with viral survival. In the Methods section we explain how our network was inferred and annotated with publicly available human protein interaction data and Gene Ontology (GO) terms, and we describe the newly developed algorithms.
The National Institute of Allergy and Infectious Diseases' (NIAID) HIV-1, Human Protein Interaction Database offers comprehensive data on nineteen HIV proteins (fifteen structural and four intermediate proteins) interacting with 1452 human proteins via 3959 interactions. The most frequent interaction types are summarized in Table 1 together with their frequencies. Regulatory (up-regulates, down-regulates, regulated by) and activation/inhibition (activates, inhibits, inhibited by) interactions are among the most common.
This explains their overrepresentation in Figure 2-A. To correct for this bias, we calculated a relative connectivity distribution of the activation/inhibition and regulatory sub-networks using normalization (see the Methods section for details). This allows direct comparison of connectivities between HIV proteins and between the two sub-networks (see Figure 2-B).
Top ten most highly connected HDFs, considering only HIV-HDF connections.
mitogen-activated protein kinase 1
protein kinase C, alpha
mitogen-activated protein kinase 3 isoform 1
actin, gamma 1 propeptide
major histocompatibility complex, class I, A precursor
CD4 antigen precursor
interleukin 10 precursor
interferon, alpha 1
We hypothesize that central genes or proteins in the human protein interaction network are more likely to be important players in the life cycle of the virus than non-central ones. Therefore, after constructing the HIV-1 human protein interaction network we have measured three types of network centrality: degree, betweenness and eigenvector centrality on both local and global networks.
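The three centrality measures can be computed with a few lines of code. The sketch below is a plain-Python illustration, not the authors' implementation, on an undirected adjacency dictionary `{protein: set_of_partners}`; the helper `induced_subgraph` builds a local network such as the HDF network, so the same functions yield local and global scores. Betweenness uses Brandes' algorithm and eigenvector centrality uses power iteration; libraries such as networkx provide comparable (possibly differently normalized) functions.

```python
from collections import deque


def degree_centrality(adj):
    """Degree: the number of direct interaction partners of each protein."""
    return {v: len(partners) for v, partners in adj.items()}


def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair {s, t} was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


def eigenvector_centrality(adj, max_iter=1000, tol=1e-10):
    """Power iteration on A + I, which has the same leading eigenvector as A
    but avoids oscillation on bipartite graphs."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(max_iter):
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = sum(val * val for val in new.values()) ** 0.5
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


def induced_subgraph(adj, nodes):
    """Local network: keep only the given proteins and the edges among them."""
    nodes = set(nodes)
    return {v: {w for w in adj[v] if w in nodes} for v in nodes}


# Hypothetical toy interactome; h1..h3 play the role of HDFs.
human = {"h1": {"h2", "x1"}, "h2": {"h1", "h3"}, "h3": {"h2", "x2"},
         "x1": {"h1"}, "x2": {"h3"}}
hdf_net = induced_subgraph(human, {"h1", "h2", "h3"})
for name, measure in [("degree", degree_centrality),
                      ("betweenness", betweenness_centrality),
                      ("eigenvector", eigenvector_centrality)]:
    local, global_ = measure(hdf_net), measure(human)
    print(name, {p: (round(local[p], 3), round(global_[p], 3)) for p in sorted(hdf_net)})
```

Each printed pair gives a protein's local (HDF network) and global (whole network) score, which is the comparison the analysis relies on.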
Mean values of centrality measures on HDFs and on proteins of the whole human protein interaction network (total human network), with standard deviations in brackets (HDF > total).
Set of proteins that are found to be hubs by both the degree and eigenvector centrality metrics.
tumor protein p53
breast cancer 1, early onset isoform 1
estrogen receptor 1
CREB binding protein isoform a
v-rel reticuloendotheliosis viral oncogene homolog A
proto-oncogene tyrosine-protein kinase SRC
TATA box binding protein
myc proto-oncogene protein
E1A binding protein p300
Table 4 summarizes the top one percent of the highest-ranked HDFs in the total network. Both centrality metrics result in very similar sets of top-ranked proteins. The extended table with the top five percent of proteins identified with the different measures can be found in Additional file 6. P53, Brca-1 and Retinoblastoma-1 have been identified as highly central by both metrics. This result is not surprising, since all three are well-established cancer-related genes (tumor suppressors) that have been extensively studied; their connections with other proteins are therefore expected to be better documented.
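Selecting the top one percent per metric and intersecting the resulting lists, as in the hub table above, can be sketched as follows. The rounding rule (`math.ceil`, keeping at least one protein) and the alphabetical tie-break are assumptions, since the text does not specify how the cut-off was applied; the example scores are hypothetical.

```python
import math


def top_fraction(scores, fraction=0.01):
    """Proteins in the top `fraction` of a ranking (at least one protein).

    Ties are broken alphabetically so the result is deterministic.
    """
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return set(ranked[:k])


def hubs_by_both(degree, eigen, fraction=0.01):
    """Proteins in the top fraction by both degree and eigenvector centrality."""
    return top_fraction(degree, fraction) & top_fraction(eigen, fraction)


# Hypothetical scores for four proteins; fraction=0.5 keeps the top two.
degree = {"P1": 9, "P2": 7, "P3": 3, "P4": 1}
eigen = {"P1": 0.8, "P2": 0.3, "P3": 0.5, "P4": 0.1}
print(hubs_by_both(degree, eigen, fraction=0.5))  # {'P1'}
```

Here P2 ranks high only by degree and P3 only by eigenvector centrality, so only P1 qualifies as a hub by both metrics.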
We define a protein with a high betweenness score as a bottleneck.
Top one percent of proteins that have the highest score from the betweenness centrality metric.
tumor protein p53
growth factor receptor-bound protein 2 isoform 1
breast cancer 1, early onset isoform 1
proto-oncogene tyrosine-protein kinase
epidermal growth factor receptor isoform a (EGFR, GenBank: NP_005219.2)
signal transducer and activator of transcription 3 isoform 1
estrogen receptor 1
phosphoinositide-3-kinase, regulatory subunit, polypeptide 1 isoform 1
DNA directed RNA polymerase II polypeptide A
myc proto-oncogene protein
Sp1 transcription factor
v-rel reticuloendotheliosis viral oncogene homolog A
Src homology 2 domain containing transforming protein 1 isoform p52Shc
It is not surprising that our centrality analysis identifies proteins that are important for the functioning of the cell as also crucial for viral survival. The question that remains is: "Are there HIV-specific processes that are crucial for viral existence but not as important for the cell?"
Because of this strong correlation between local and global properties almost any protein that is identified as highly essential using a ranking based on local properties is also important globally. To counteract this effect we filter out proteins of global importance by re-ranking them using an adjusted metric (see methods for details).
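Both steps, checking how strongly local and global scores correlate and then re-ranking by an adjusted metric, can be sketched in a few lines. The exact adjusted metric is defined in the paper's Methods and is not reproduced in this section; this sketch assumes the simple ratio of local (HDF network) to global (whole interactome) centrality mentioned in the Discussion, and uses Pearson correlation as an assumed choice of coefficient. All scores are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def adjusted_centrality(local, global_, eps=1e-12):
    """Divide each protein's local score by its global score.

    Proteins that are central only because they are central in the whole
    interactome move down the ranking; `eps` avoids division by zero.
    """
    return {p: local[p] / (global_[p] + eps) for p in local}


# Hypothetical local (HDF network) and global (whole network) scores.
local = {"A": 4, "B": 3, "C": 1}
global_ = {"A": 40, "B": 6, "C": 4}
proteins = sorted(local)
r = pearson([local[p] for p in proteins], [global_[p] for p in proteins])
adjusted = adjusted_centrality(local, global_)
ranking = sorted(adjusted, key=adjusted.get, reverse=True)
print(ranking)  # ['B', 'C', 'A']
```

Protein A tops the local ranking but drops to last place once its much higher global score is taken into account.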
Set of proteins that are identified as central using both adjusted centrality metrics (degree and eigenvector centrality).
TBP-associated factor 1 isoform 1
activating transcription factor 2
general transcription factor IIB
signal transducer and activator of transcription 1 isoform alpha
TATA box binding protein
cyclin-dependent kinase inhibitor 1A
CCAAT/enhancer binding protein beta
Top ten bottlenecks after normalization.
proteasome (prosome, macropain) 26S subunit, non-ATPase, 6
proteasome alpha 2 subunit
proteasome 26S non-ATPase subunit 10 isoform 1
DEAH (Asp-Glu-Ala-His) box polypeptide 9
CD4 antigen precursor
CD82 antigen isoform 1 (CD82, GenBank: NP_002222.1)
IKK-related kinase epsilon
protein tyrosine phosphatase, receptor type, C isoform 1 precursor
chemokine (C-C motif) receptor 5
It has been remarked that "some of the virus-host interaction studies have been done on individual subunits of a complex, but at other times a complex is implicated in a virus-host interaction and all subunits of that complex are linked to a virus protein even though only a few subunits might be involved in the interaction. This might lead to spurious over-represented motifs." On the other hand, discarding data that describe interactions of complexes rather than of individual subunits might lead to an under-representation, in the motif analysis, of complexes that are present in reality. Because the HIV-1 human protein interaction data are already sparse, we have chosen to include these data, accepting the risk of over-represented motifs.
Complex networks in general and biological networks specifically have been found to consist of small recurring patterns, so-called network motifs [2, 4, 36]. These building blocks have been used to study the structure and dynamic behavior of networks.
Co-regulation, or co-activation/inhibition, is what we call the pattern in which two HIV proteins regulate, activate or inhibit one human protein (see Figure 8). The two interactions can be of the same type (e.g. both up-regulation, or both inhibition), in which case they may indicate redundancy in the system. For the co-regulation motif, we found six types of regulatory and two types of activation/inhibition motifs to be significantly over-represented.
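Counting co-regulation motifs reduces to enumerating pairs of distinct HIV proteins that act on the same human protein. In this sketch, edges are hypothetical `(hiv_protein, human_protein, interaction_type)` triples directed from virus to host, and motifs are tallied by their pair of interaction types, so identical pairs correspond to the potential redundancy described above. Handling of reverse-direction types such as "regulated by", and the significance test against randomized networks, are left out.

```python
from collections import Counter
from itertools import combinations


def count_coregulation(edges):
    """Tally co-regulation motifs by their (sorted) pair of interaction types.

    `edges` holds (hiv_protein, human_protein, interaction_type) triples,
    all directed from the HIV protein to the human protein.
    """
    incoming = {}
    for hiv, human, kind in sorted(set(edges)):  # drop duplicate records
        incoming.setdefault(human, []).append((hiv, kind))
    motifs = Counter()
    for sources in incoming.values():
        for (hiv1, kind1), (hiv2, kind2) in combinations(sources, 2):
            if hiv1 != hiv2:  # a motif needs two different HIV proteins
                motifs[tuple(sorted((kind1, kind2)))] += 1
    return motifs


# Hypothetical edges; P1 and P2 stand in for human proteins.
edges = [
    ("Tat", "P1", "up-regulates"),
    ("Nef", "P1", "up-regulates"),
    ("Vpr", "P1", "inhibits"),
    ("Tat", "P2", "activates"),
]
for types, n in sorted(count_coregulation(edges).items()):
    print(types, n)
# ('inhibits', 'up-regulates') 2
# ('up-regulates', 'up-regulates') 1
```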
Inclusion of interactions between HDFs (collected from human protein interaction databases, see the Methods section) makes it possible to study the relationship between HDFs that share an interacting HIV protein. The network motif associated with this pattern is what we call a "clique" (see Figure 8). Traditionally, the term clique has been used to denote a group of fully interconnected nodes, but it has also been used to describe network motifs of the fully connected three-node sub-graph. In this work we study such a clique consisting of two human proteins and one HIV protein. As the interactions between HIV and human nodes are directed, a number of different clique patterns arise, similar to the ones without HDF-HDF interactions.
A feed-forward type [2–4, 36] (or self-regulatory) motif occurs when two connected HDFs also interact indirectly via an HIV protein. Co-regulation (or co-activation/inhibition) is also observed in the clique: two interacting human proteins both regulate or signal to the same HIV protein. Again, when the two interactions are of the same type, this might indicate redundancy (see Discussion). Nine different clique patterns were observed in the regulatory network and five in the activation/inhibition network. We have also conducted a Gene Ontology analysis for each motif that was identified (see Additional file 12).
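The clique motif can be located by checking, for every HDF-HDF interaction, which HIV proteins interact with both partners. This sketch ignores edge direction and interaction type for brevity, whereas the analysis above distinguishes clique patterns by both; the protein names H1 to H3 are hypothetical.

```python
def find_cliques(hiv_edges, hdf_edges):
    """Three-node cliques of one HIV protein and two interacting HDFs.

    `hiv_edges` holds (hiv_protein, hdf) pairs (direction ignored here);
    `hdf_edges` holds undirected (hdf, hdf) pairs from human PPI data.
    Returns (hiv_protein, hdf_a, hdf_b) tuples with the HDFs sorted.
    """
    partners = {}
    for hiv, hdf in hiv_edges:
        partners.setdefault(hdf, set()).add(hiv)
    cliques = set()
    for a, b in hdf_edges:
        # HIV proteins that touch both ends of this human interaction.
        for hiv in partners.get(a, set()) & partners.get(b, set()):
            cliques.add((hiv, *sorted((a, b))))
    return cliques


hiv_edges = [("Tat", "H1"), ("Tat", "H2"), ("Nef", "H2")]
hdf_edges = [("H1", "H2"), ("H2", "H3")]
print(find_cliques(hiv_edges, hdf_edges))  # {('Tat', 'H1', 'H2')}
```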
In this study we have analyzed a pathogen-host protein interaction network in an effort to relate network topology to biological functioning. Topologically central proteins have been shown to be crucial for HIV functioning, and network motifs appear to be the result of the complex virus-host interplay. In this section we discuss the results from the network centrality metrics and the network motif analysis.
First, we conducted a meta-analysis of the HIV-human protein interaction network to examine the distribution of interactions among HIV proteins as well as HDFs. Network analysis identified key components in the life cycle of HIV.
The normalized relative connectivity analysis revealed involvement of viral proteins in distinct sub-functions (activation/inhibition and regulatory).
Integrase is a viral enzyme that enables the viral genome to be integrated into the DNA of the host cell. In addition, it is present in only small amounts at the time of the initial infection of a cell. One can speculate that any dual function of an activation/inhibition or regulatory nature would reduce its efficiency and probably lead to early detection by the human immune machinery before integration is complete. This might be the reason why integrase is involved in neither the activation/inhibition network nor the regulatory network.
HIV proteins that are exposed to the extracellular environment (Gp120, Gp41, Tat and Vpr) have approximately the number of interactions in each sub-network that their global connectivity in the total network would predict. This is probably due to the wide variety of functions associated with these proteins. Tat, Vpr and possibly Gp120 are indeed hyperactive in terms of their roles in different processes. These three proteins are also directly exposed to extracellular factors such as antibodies. Gp41, on the other hand, is initially buried in the viral envelope and is exposed only after Gp120 binds to a CD4 receptor. In addition, Gp41 has been associated with a specific role in viral membrane fusion, so it is puzzling that Gp41 shares this generic connectivity profile. At the other end of the spectrum, the viral enzymes RT, retropepsin and integrase all show interaction profiles that are highly specific for activation and inhibition interactions. These enzymes are reaction-specific and functional changes are likely to be too costly for the virus, so it might be favorable to keep these proteins uni-functional.
Similar connectivity analysis for human proteins revealed mitogen-activated protein kinase 1 (Mapk1), interferon gamma (Ifng), protein kinase C alpha (Prkca) and mitogen-activated protein kinase 3 (Mapk3) as the nodes most connected to HIV in the HIV-human protein interaction network, with degrees 10, 9, 9 and 9, respectively. Mapk1 has been identified as the integration point for multiple pathways and takes part in a wide variety of cellular processes. Ifng is an important cytokine for innate and adaptive immunity. Prkca and Mapk3 are both known to be involved in various critical cellular processes. It is therefore not unexpected that we find them over-represented in the HIV-1 human protein interaction network.
Meta-analysis of the HIV-human protein interaction network revealed that HDFs interacting with HIV constitute a non-random sub-network (the HDF network) within the human interactome. We employed three centrality measures (degree, betweenness and eigenvector centrality) to analyze the HDF sub-network in detail, and calculated the average centrality measures for the HDF network as well as for the total human protein interaction network. These averages show that the HDF network occupies a topologically central position in the human protein interaction network and is significantly densely connected.
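Whether a set of proteins forms an unusually dense sub-network can also be checked with a permutation test: compare the edge density among the HDFs with that of random protein sets of the same size. This is one possible test, sketched here as an assumption; the comparison described in the text is based on average centrality values. The toy network is hypothetical.

```python
import random


def make_adj(pairs):
    """Undirected adjacency sets from a list of interacting pairs."""
    adj = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def edge_density(adj, nodes):
    """Fraction of possible pairs within `nodes` that interact."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    links = sum(1 for v in nodes for w in adj[v] if w in nodes) / 2
    return links / (n * (n - 1) / 2)


def density_p_value(adj, nodes, trials=1000, seed=0):
    """Observed density and empirical p-value against random node sets."""
    rng = random.Random(seed)
    observed = edge_density(adj, nodes)
    universe = sorted(adj)
    hits = sum(
        edge_density(adj, rng.sample(universe, len(nodes))) >= observed
        for _ in range(trials)
    )
    return observed, (hits + 1) / (trials + 1)


# Hypothetical toy interactome: a dense group h1..h4 attached to a long chain.
pairs = [("h1", "h2"), ("h1", "h3"), ("h1", "h4"), ("h2", "h3"),
         ("h2", "h4"), ("h3", "h4"), ("h4", "x1")]
pairs += [(f"x{i}", f"x{i + 1}") for i in range(1, 10)]
adj = make_adj(pairs)
observed, p = density_p_value(adj, {"h1", "h2", "h3", "h4"})
print(observed)  # 1.0
```

Adding one to both counts keeps the empirical p-value strictly positive, which is the usual convention for permutation tests.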
Hub analysis of the HDF network resulted in fifteen proteins that are central for at least one of the two centrality metrics (degree and eigenvector centrality), six of which are oncogenes.
Bottleneck analysis, based on betweenness centrality, resulted in a list similar to that of the hub analysis. Further inspection showed that the proteins in both lists were also highly central in the total human protein interaction network.
For each centrality metric, we calculated the correlation between local and global centrality and found it to be high. This means that the centrality assigned to each protein in the HDF network was a result of its high connectivity in the total network. To overcome this problem and identify HIV-specific processes, we normalized each centrality measure from the HDF network by its global network counterpart. In the normalized list, the highly studied oncogenes are replaced by transcription factors, transcription factor subunits (TBP) and transcription activators. This finding is important because, although transcription is important for the cell, it is probably the vital process for HIV to synthesize the proteins necessary for forming progeny. Notably, in the normalized bottleneck list, three proteasome subunits constitute the most important bottlenecks specific to the HDF network. Proteasome subunits were also identified as one of the important processes by Bushman et al. The cellular proteasome is known to act negatively on HIV infection by destroying viral proteins, but its overall effect on the infection is not clear. Our results show that the importance of the proteasome stems from the close interaction between proteasomal proteins and proteins vital for the regulation of gene expression and cell communication. The proteasome therefore seems to connect the processes governed by these proteins with the rest of the HDF network. All biochemical reactions in the cell are dynamic, and their equilibrium depends on the concentrations of the available substrates. Proteasomes have a unique role in this scenario as regulators of the concentration of particular proteins. A strong line of evidence for HIV's exploitation of proteasomal pathways comes from the innate restriction host factors that inhibit viral replication at the cellular level.
Human CD317/Tetherin and APOBEC proteins (APOBEC3G and APOBEC3F) have been identified to inhibit HIV replication and confer resistance to HIV infection. There is growing evidence that the HIV proteins Vpu and Vif accelerate proteasomal degradation of CD317/Tetherin [40–43] and APOBEC3G/F [44, 45], respectively, thus suppressing their expression and overcoming the innate resistance. Strikingly, the human restriction factor tetherin is not curated in the NIAID database. Yet the importance of proteasomal degradation for HIV infection has been identified independently in this work. Given the critical role of HIV's Vif and Vpu in suppressing APOBEC3G/F and CD317 activity, we argue that pharmacologic compounds designed to restore the activity of these intrinsic antiviral factors in infected cells in vivo could have strong therapeutic benefits and therefore deserve serious attention.
As a result, we hypothesize that after infection, apart from degrading HIV proteins, re-prioritization of proteasomal pathways is an indirect control mechanism actively engaged by the virus to manage the concentrations of pivotal proteins in the cell. We have shown that regulation of gene expression and cell communication are major processes that are directly linked to proteasome functioning.
Traditionally, network motifs have been used to study networks of single systems (e.g. the gene regulatory network of yeast). Discovered patterns, in terms of over-represented network motifs, hold information on the network structure and dynamics of that system. HIV infection and its life-cycle are based on the interplay between two systems, namely the virus itself and the human host. Consequently, motif analysis of the combined network yields an understanding of the dynamics and structure of this interplay, as opposed to the functioning of the two systems independently.
By interpreting the inferred network motifs (see Figures 7 and 8) we gain insight into this interplay. Self-regulation or feedback is a pattern commonly found in gene regulatory systems (see [2–4, 36]). Generally, these patterns indicate a response mechanism, in which a signal such as gene regulation (up-regulation, down-regulation) or phosphorylation (activation, inhibition) of a protein A triggers a similar signal to protein B. In the two-node case, the source of the signal to A is B, potentially resulting in a positive or negative feedback loop. In the three-node case (two different HIV proteins), interpretation is less trivial. When we consider all HIV proteins that make up the virus as a unit, we may consider the motif as feedback or self-regulation. Since the currently available data lack information on interactions between HIV proteins, we are not able to interpret it as a loop. Yet interactions between HIV proteins, especially involving the regulatory protein Tat, are known to be prevalent. It is therefore plausible to assume the existence of three-node feedback loops.
One limitation of the network motif analysis is the absence of temporal (or causal) and spatial information associated with each event in the database. Therefore, reconstructing pathway dynamics by means of network motifs is not possible. One way to overcome this problem, at least for some motifs, is to include interactions among human proteins that indicate shared compartments and time. For instance, co-regulation, specifically in the case of two interactions of the same type, points to a potential redundancy. This only holds when we assume that the two similar interactions occur in a shared spatial and temporal frame, i.e. in the same cellular compartment and at roughly the same time. This assumption becomes more plausible when HDF-HDF interactions are incorporated, as these serve as evidence for the co-occurrence of the two proteins in time and space. Co-regulation that occurs within a clique thus points more strongly to redundancy. Such redundancies are known to contribute to the robustness of regulatory networks in general [46–48] and suggest a potential cause of the robust nature of HIV infections.
Studying HIV-human interaction in terms of network motifs gives us the opportunity to reconstruct dynamics at the protein level. It is known that under selection pressure from the immune system, HIV undertakes a number of actions to evade this defense. This interplay, in which the host tries to undermine virus reproduction and the virus evades the immune response, is the key concept for understanding virus-host relations.
Network motifs that are significantly over-represented, i.e. whose frequency cannot be accounted for by chance alone, show patterns that apparently have been selected for. By investigating these motifs individually, we observe these strategies at the protein interaction level.
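Over-representation is typically assessed by comparing the observed motif count with counts in randomized networks. The sketch below assumes a degree-preserving null model for the bipartite network (repeated edge swaps that keep every protein's number of interactions) and reports a z-score; the paper's own randomization procedure is not described in this section. Any count fully determined by node degrees, such as the untyped number of co-regulated pairs, is unchanged by this null model, so in practice edge types would have to be carried along or shuffled. The example motif, the number of human proteins targeted by both hypothetical proteins A and B, is for illustration only.

```python
import math
import random
import statistics


def swap_randomize(edges, n_swaps, rng):
    """Degree-preserving randomization of (hiv, human) edges by edge swaps."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, x), (b, y) = edges[i], edges[j]
        if a == b or x == y or (a, y) in present or (b, x) in present:
            continue  # this swap would create a duplicate edge; skip it
        present -= {(a, x), (b, y)}
        present |= {(a, y), (b, x)}
        edges[i], edges[j] = (a, y), (b, x)
    return edges


def motif_z_score(edges, count_motif, n_random=100, seed=0):
    """Observed motif count and its z-score against randomized networks."""
    rng = random.Random(seed)
    observed = count_motif(edges)
    null = [count_motif(swap_randomize(edges, 10 * len(edges), rng))
            for _ in range(n_random)]
    mean, sd = statistics.mean(null), statistics.pstdev(null)
    if sd == 0:
        z = 0.0 if observed == mean else math.copysign(math.inf, observed - mean)
    else:
        z = (observed - mean) / sd
    return observed, z


def shared_targets_ab(edges):
    """Toy motif: human proteins targeted by both HIV proteins A and B."""
    present = set(edges)
    humans = {x for _, x in present}
    return sum(1 for x in humans if ("A", x) in present and ("B", x) in present)


edges = [(h, x) for h in ("A", "B") for x in ("x1", "x2", "x3")]
edges += [(h, x) for h in ("C", "D") for x in ("x4", "x5", "x6")]
observed, z = motif_z_score(edges, shared_targets_ab)
print(observed)  # 3
```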
One such motif is a two-node feedback loop found in the HIV-host activation/inhibition network (see motif B2 in Figure 7). Significant over-representation of this network motif reflects the inhibitory behavior of HIV proteins towards human proteins that in turn inhibit the HIV protein. We therefore refer to these patterns as "indirect positive feedback", and in this specific case as "self-activation", since inhibition of an inhibitor results in (relative) activation. Closer inspection of all occurrences of this network motif shows that the HIV proteins Tat and Gp120 and the human protein interferon gamma (Ifng) are the most frequently involved. Gene Ontology analysis of the observed network motif indicates that the human proteins taking part in it are involved in the immune response (see Additional file 12).
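Finding two-node feedback loops such as motif B2 amounts to looking for reciprocal edges of the required type. This sketch assumes that records such as "X inhibited by Y" have already been rewritten as a directed edge from Y to X of type "inhibits", and that `viral` lists the HIV protein names; counting loops per protein shows which proteins are most involved. The example edges are hypothetical.

```python
from collections import Counter


def feedback_loops(edges, viral, kind="inhibits"):
    """Pairs (hiv, human) where each protein acts on the other with `kind`.

    `edges` holds directed (source, target, interaction_type) triples.
    """
    present = set(edges)
    return {(s, t) for s, t, k in present
            if s in viral and k == kind and (t, s, kind) in present}


def involvement(loops):
    """Number of loops each protein takes part in."""
    return Counter(p for loop in loops for p in loop)


# Hypothetical edges; P1..P3 stand in for human proteins.
edges = [
    ("Tat", "P1", "inhibits"), ("P1", "Tat", "inhibits"),
    ("Tat", "P2", "inhibits"), ("P2", "Tat", "inhibits"),
    ("Gp120", "P3", "inhibits"), ("P3", "Gp120", "activates"),
]
loops = feedback_loops(edges, viral={"Tat", "Gp120"})
print(sorted(loops))                      # [('Tat', 'P1'), ('Tat', 'P2')]
print(involvement(loops).most_common(1))  # [('Tat', 2)]
```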
Ifng, or type II interferon, is a cytokine critical for innate and adaptive immunity against viral and intracellular bacterial infections and for tumor control. The importance of Ifng in the immune system stems in part from its ability to inhibit viral replication directly, but most importantly derives from its immunostimulatory and immunomodulatory effects [49, 50].
We want to acknowledge that the results presented in this paper are based on annotated protein interaction data from the NIAID database. These data vary strongly in quality and can be argued to contain a bias resulting from the translation of individual reports into a structured database. The results presented above should therefore be interpreted qualitatively rather than as quantitatively accurate. Nonetheless, to our knowledge, the presented work is the first to incorporate network centrality analysis and network motifs in a virus-host protein interaction network. We encourage experimental testing of the results in this paper to study their potential role in HIV infection.
We have demonstrated that infection with HIV results in re-prioritization of cellular processes such as transcription and proteasome activity. The primary success of the virus depends on the synthesis of new virions in a reasonable amount of time. This has to be accomplished before the infected cells are detected by patrolling CD8+ T cells or a humoral response has emerged. Therefore it is highly plausible that hijacking of the transcriptional machinery is one of the key processes that has a pronounced role post-infection.
In addition, proteasomes not only gain significant importance for the survival of the cell by degrading HIV proteins early in the infection, but arguably also for HIV, since they regulate the concentration of the innate antiviral host factors such as APOBEC3G/F and CD317 and can be targeted by HIV proteins Vpu and Vif.
We have shown that using network motifs one can identify recurring patterns that have consequences in the virus-host dynamics. Specifically, we observed patterns that show strategies of the virus used to evade the host immune system. Finally, we conclude that the survival of HIV within the host requires direct control of the cellular machinery via the pivotal human proteins and indirect control via the proteasomes. Network motifs and complex network theory provide a promising framework to study these dynamics.
The NCBI HIV-Human Protein Interaction database is used to construct a protein interaction network. The resulting network consists of nineteen HIV proteins that interact with 1,452 human proteins through 3,959 interactions (see Figure 1). In this protein interaction network, nodes represent either HIV or human proteins, and edges represent interactions between them. Because interactions between HIV and human proteins are annotated (see Figure 8 for the most common interaction types), edges in our network are directed and carry an interaction type. As interactions occur only between HIV and human proteins, the resulting network is bipartite.
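A directed, typed, bipartite network of this kind can be represented in a few lines. The sketch below is only an illustration under our own assumptions: each record is a hypothetical `(source, target, interaction_type)` tuple, the protein names and interaction types are made up for the example (they are not drawn from the NCBI database), and a directed pair may carry several types, so types are stored as a set per edge.

```python
from collections import defaultdict

# Hypothetical example records (source, target, interaction_type).
# Names and interaction types are illustrative, not taken from the database.
RECORDS = [
    ("Tat", "CDK9", "binds"),
    ("Tat", "CCNT1", "binds"),
    ("Vif", "APOBEC3G", "degrades"),
    ("APOBEC3G", "Vif", "inhibits"),
    ("Vpu", "BST2", "downregulates"),
]
HIV_PROTEINS = {"Tat", "Vif", "Vpu"}


def build_network(records, hiv_proteins):
    """Map each directed (source, target) pair to its set of interaction types.

    Raises ValueError for an edge that does not join an HIV protein and a
    human protein, because the network is expected to be bipartite.
    """
    edges = defaultdict(set)
    for source, target, itype in records:
        if (source in hiv_proteins) == (target in hiv_proteins):
            raise ValueError(f"{source}->{target} is not an HIV-human edge")
        edges[(source, target)].add(itype)
    return dict(edges)


network = build_network(RECORDS, HIV_PROTEINS)
human_proteins = {p for pair in network for p in pair} - HIV_PROTEINS
print(len(network), "directed edges,", len(human_proteins), "human proteins")
```

For these toy records the script prints `5 directed edges, 4 human proteins`. The bipartite check makes an annotation error (for example an HIV-HIV record) fail loudly instead of silently entering the network.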
Figure 2 shows the connectivity of the nineteen HIV proteins in the HIV-human protein interaction networks. Figure 2-A shows the absolute number of interactions per HIV protein for each of the two subnetworks and for the total network. Figure 2-B shows the normalized relative connectivity. This was obtained by first calculating the relative connectivity, i.e. dividing the number of interactions of each protein in a network by the total number of interactions in that network. These values were then normalized by dividing the relative connectivity of each protein in each of the two subnetworks by the relative connectivity of that protein in the total network. This normalization permits the comparison of proteins and subnetworks.
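The two-step normalization described above can be sketched as follows. The interaction counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the values plotted in Figure 2.

```python
def relative_connectivity(counts):
    """Interactions per protein divided by all interactions in that network."""
    total = sum(counts.values())
    return {protein: n / total for protein, n in counts.items()}


def normalized_relative_connectivity(sub_counts, total_counts):
    """Relative connectivity in a subnetwork divided by that in the total network.

    A value above 1 means the protein accounts for a larger share of the
    subnetwork's interactions than of the total network's interactions.
    """
    rel_sub = relative_connectivity(sub_counts)
    rel_total = relative_connectivity(total_counts)
    return {p: rel_sub[p] / rel_total[p] for p in rel_sub if rel_total.get(p)}


# Hypothetical interaction counts per HIV protein (not the study's data).
total = {"Tat": 40, "Vif": 10, "Env": 50}
subnetwork = {"Tat": 30, "Vif": 5, "Env": 15}
norm = normalized_relative_connectivity(subnetwork, total)
for protein, value in norm.items():
    print(f"{protein}: {value:.2f}")
```

Here Tat holds 60% of the subnetwork's interactions but 40% of the total network's, giving 1.50; Vif is equally represented (1.00) and Env is under-represented (0.60).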
To incorporate interactions among HDFs and between HDFs and non-HDF human proteins, protein interaction data were collected from several databases (BIND, BioGRID, HPRD) and added to the network [51–53]. As a result, the network consists of nineteen HIV proteins, 1,452 HDFs and 12,557 non-HDF human proteins, connected by 3,959 HIV-HDF interactions, 4,540 HDF-HDF interactions and 13,189 interactions between HDFs and non-HDF human proteins.
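Merging several interaction databases mainly requires removing duplicate reports and sorting edges into the categories listed above. The sketch below assumes that human-human interactions are undirected, that only edges with at least one HDF endpoint are kept, and that self-interactions are dropped; these are our assumptions for illustration, and the HDF set and edge lists are hypothetical stand-ins for BIND, BioGRID and HPRD exports.

```python
def merge_human_ppi(hdfs, *edge_lists):
    """Merge undirected human PPI edge lists, keeping edges that touch an HDF.

    Each interaction is stored as a frozenset, so an interaction reported by
    several databases (in either orientation) is counted once. Returns the
    sets of HDF-HDF and HDF-non-HDF interactions.
    """
    hdf_hdf, hdf_non_hdf = set(), set()
    for edges in edge_lists:
        for a, b in edges:
            if a == b:
                continue  # assumption: self-interactions are ignored
            pair = frozenset((a, b))
            n_hdf = (a in hdfs) + (b in hdfs)
            if n_hdf == 2:
                hdf_hdf.add(pair)
            elif n_hdf == 1:
                hdf_non_hdf.add(pair)
            # edges between two non-HDF proteins are not kept in this sketch
    return hdf_hdf, hdf_non_hdf


# Hypothetical HDF set and two toy "database" edge lists.
HDFS = {"CDK9", "CCNT1", "APOBEC3G"}
db_a = [("CDK9", "CCNT1"), ("CDK9", "TP53")]
db_b = [("CCNT1", "CDK9"), ("TP53", "MDM2"), ("APOBEC3G", "CUL5")]
hdf_hdf, hdf_non_hdf = merge_human_ppi(HDFS, db_a, db_b)
print(len(hdf_hdf), "HDF-HDF,", len(hdf_non_hdf), "HDF-non-HDF interactions")
```

The CDK9-CCNT1 interaction appears in both toy lists, in opposite orientations, but is counted once, so the script prints `1 HDF-HDF, 2 HDF-non-HDF interactions`.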
The metrics used to rank HDFs according to their importance in the network are based on three network centrality measures, computed per node: degree, eigenvector centrality and betweenness centrality.
In contrast to the degree, which measures direct connectedness (here, the number of interacting proteins), the eigenvector centrality measures both direct and indirect connectedness. Because well-connected nodes contribute more to the scores of their neighbors than poorly connected nodes, a relatively high eigenvector centrality indicates not only that a protein takes part in many different interactions, but also that it is active in important pathways. Betweenness centrality, by contrast, measures only pathway activity. A protein with high betweenness occupies a central position in the network, because relatively many shortest paths pass through it. This does not necessarily imply a high degree: a poorly connected protein may still have high betweenness. In this way, important "crossroads" in the network can be identified that would go unnoticed in a standard degree analysis.
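The three measures can be sketched in plain Python. This is an illustration under our own assumptions (an unweighted, undirected graph given as an adjacency dict, eigenvector scores scaled so the maximum is 1, unnormalized betweenness computed with Brandes' algorithm), not the implementation used in this study. The toy graph reproduces the point made above: a bridge node with few interactions can still lie on many shortest paths.

```python
from collections import deque


def degree(adj):
    """Number of interaction partners of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def eigenvector_centrality(adj, max_iter=1000, tol=1e-10):
    """Power iteration on A + I. For a connected graph this has the same
    leading eigenvector as A, and the shift avoids oscillation on bipartite
    graphs. Scores are scaled so the highest score is 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        top = max(new.values())
        new = {v: s / top for v, s in new.items()}
        converged = max(abs(new[v] - x[v]) for v in adj) < tol
        x = new
        if converged:
            break
    return x


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized):
    for every node, the sum over pairs of other nodes of the fraction of
    shortest paths between them that pass through the node."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice


# Toy graph: two hubs (A, B), each with three leaves, joined by a bridge X.
edges = [("A", "a1"), ("A", "a2"), ("A", "a3"), ("B", "b1"), ("B", "b2"),
         ("B", "b3"), ("A", "X"), ("X", "B")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

deg, eig, btw = degree(adj), eigenvector_centrality(adj), betweenness(adj)
for node in ("A", "X", "a1"):
    print(node, deg[node], round(eig[node], 3), btw[node])
```

The bridge X has degree 2 but a betweenness of 16, close to the 18 of hub A, while the leaves score 0. Its eigenvector centrality is also higher than that of the leaves, because both of its neighbors are hubs.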
Using these three metrics, we seek to measure the importance of human proteins that interact with HIV proteins (HDFs). To distinguish between HDFs that are important to human cellular functioning as a whole and HDFs that are specifically important to the HIV life-cycle, we normalize our centrality ranking using a distinction between "local" and "global" metrics.
For instance, we define the local degree of an HDF as the number of its edges to other HDFs, and the global degree of an HDF as the number of its edges to any other human protein (including HDFs). Thus, local degree measures connectivity within the HDF network, whereas global degree measures connectivity in the whole human protein interaction network. Local and global eigenvector centrality and betweenness are defined analogously.
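For degree, the local/global distinction reduces to intersecting each HDF's neighbor set with the set of HDFs. In the minimal sketch below, the adjacency dict and HDF set are hypothetical, and HIV proteins are assumed to be excluded from the adjacency so that only human partners are counted.

```python
def local_global_degree(adj, hdfs):
    """Return {hdf: (local_degree, global_degree)}.

    adj maps each human protein to the set of its human interaction partners
    (HIV proteins excluded). Local degree counts partners that are HDFs;
    global degree counts all human partners, HDFs included.
    """
    result = {}
    for p in hdfs:
        partners = adj.get(p, set())
        result[p] = (len(partners & hdfs), len(partners))
    return result


# Hypothetical human interaction partners.
adj = {
    "CDK9": {"CCNT1", "TP53", "HSP90AA1"},
    "CCNT1": {"CDK9"},
    "TP53": {"CDK9", "MDM2"},
}
hdfs = {"CDK9", "CCNT1"}
degrees = local_global_degree(adj, hdfs)
for p in sorted(degrees):
    print(p, *degrees[p])
```

CDK9 has one HDF partner out of three human partners, so its local degree is 1 and its global degree 3.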
To apply this normalization, we first select the proteins in the top five percent for local degree, eigenvector centrality and betweenness, which yields 73 proteins for each metric. Second, we calculate the adjusted centrality metrics by dividing the local value by the global value. This results in three lists of proteins that are specifically important for HIV with respect to these three metrics (see Tables 6 and 7).
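The filter-then-divide procedure can be sketched as below. Rounding the five-percent cutoff up is our assumption; it is one reading that reproduces the 73 proteins reported for 1,452 HDFs. The protein names and values in the example are hypothetical.

```python
import math


def adjusted_ranking(local, global_, fraction=0.05):
    """Keep the top `fraction` of proteins by local value, then rank them by
    local / global (highest first). Ties in the local value are broken by
    input order."""
    k = math.ceil(fraction * len(local))
    top = sorted(local, key=local.get, reverse=True)[:k]
    ratios = {p: local[p] / global_[p] for p in top if global_[p] > 0}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Rounding up the five-percent cutoff for 1,452 HDFs gives 73 proteins.
print(math.ceil(0.05 * 1452))

# Hypothetical local and global values for four proteins; fraction=0.5 keeps two.
local = {"A": 10, "B": 8, "C": 6, "D": 1}
global_ = {"A": 100, "B": 10, "C": 12, "D": 1}
print(adjusted_ranking(local, global_, fraction=0.5))
```

Protein B ranks first because most of its connectivity lies within the HDF network. Protein D has the highest ratio of all, but it is excluded because its local value is not in the top fraction: the first filtering step prevents poorly connected proteins from dominating the ratio.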
The HIV-host protein interaction network was analyzed for network motifs using a motif detection algorithm implemented in Prolog (see Additional files 13, 14, 15 and 16). Prolog, a declarative language used for logic programming, is a useful alternative for network motif finding because network patterns can be defined and detected in a highly intuitive way. In contrast to the motif detection tools MAVisto and Mfinder, our Prolog implementation and the FANMOD motif finding tool can find any annotated network pattern consisting of any number of nodes and edges. This means that we can specify the types of edges and nodes, and thereby distinguish between functionally different motifs that share the same topology (i.e. distinguish between regulatory and activation/inhibition motifs). Motif detection was carried out for all possible two- and three-node patterns. To determine the significance of the observed motifs, motif detection was repeated on one thousand randomized networks generated with a strict randomization algorithm, which ensures that the connectivity distribution is unchanged.
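To illustrate what detecting an annotated pattern means, the Python sketch below counts a single typed two-node pattern: a feedback loop in which an HIV protein acts on a human protein and that human protein acts back on it, keyed by the pair of interaction types. It is not the Prolog implementation used in this study and covers only one of the two- and three-node patterns; the edges and types are hypothetical.

```python
from collections import Counter


def count_feedback_motifs(edges, hiv_proteins):
    """Count typed two-node feedback loops between an HIV and a human protein.

    edges: iterable of (source, target, interaction_type).
    Returns a Counter keyed by (type of HIV->human edge, type of human->HIV edge).
    """
    types = {}
    for source, target, itype in edges:
        types.setdefault((source, target), set()).add(itype)
    motifs = Counter()
    for (source, target), forward in types.items():
        backward = types.get((target, source))
        if source in hiv_proteins and backward:
            for f in forward:
                for b in backward:
                    motifs[(f, b)] += 1
    return motifs


# Hypothetical typed edges (illustrative names and types only).
edges = [
    ("Vif", "APOBEC3G", "degrades"),
    ("APOBEC3G", "Vif", "inhibits"),
    ("Tat", "CDK9", "binds"),
    ("Tat", "CCNT1", "binds"),
    ("CCNT1", "Tat", "binds"),
]
motifs = count_feedback_motifs(edges, {"Vif", "Tat"})
print(motifs)
```

Both loops have the same topology, yet they are counted as different motifs because their edge types differ; this is the distinction that topology-only tools cannot make.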
Compared against fully randomized networks, almost any detected pattern would appear significant. A randomized network should therefore be as similar to the original network as possible while still being randomized. In [2, 3, 36] this is achieved with a rewiring algorithm that iteratively switches the sources or targets of two random edges until the network is sufficiently randomized. The result is a network in which the edges are randomized without changing the number of nodes or edges or the degree distribution. We used a similar algorithm (see Figure 6) to randomize our networks. Because edges can be of different types, we switch either the sources or the targets of two randomly chosen edges, with equal probability.
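A minimal version of such a rewiring step is sketched below. It is not the algorithm of Figure 6; the rejection rules (no self-loops, no duplicate directed pairs, no edges within one partition, so the network stays bipartite), the number of swaps and the example network are our own assumptions. Each edge keeps its interaction type, and every accepted swap preserves each node's in-degree and out-degree.

```python
import random
from collections import Counter


def rewire(edges, side, n_swaps, rng=None):
    """Degree-preserving randomization of a directed, typed edge list.

    edges: list of (source, target, interaction_type); each (source, target)
    pair is assumed to occur only once. side: maps each node to its partition
    (e.g. "HIV" or "human"). At each step two edges are chosen and, with equal
    probability, either their sources or their targets are exchanged; each
    edge keeps its type. Swaps that would create a self-loop, a duplicate
    pair or an edge within one partition are rejected.
    """
    rng = rng or random.Random()
    edges = list(edges)
    present = {(s, t) for s, t, _ in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (s1, t1, k1), (s2, t2, k2) = edges[i], edges[j]
        if rng.random() < 0.5:
            new1, new2 = (s2, t1, k1), (s1, t2, k2)  # switch sources
        else:
            new1, new2 = (s1, t2, k1), (s2, t1, k2)  # switch targets
        if all(a != b and side[a] != side[b] and (a, b) not in present
               for a, b, _ in (new1, new2)):
            present -= {(s1, t1), (s2, t2)}
            present |= {new1[:2], new2[:2]}
            edges[i], edges[j] = new1, new2
    return edges


# Hypothetical directed, typed bipartite network (H* = HIV, P* = human).
edges = [("H1", "P1", "binds"), ("H1", "P2", "binds"), ("H2", "P2", "inhibits"),
         ("H2", "P3", "binds"), ("H3", "P4", "activates"), ("P1", "H2", "inhibits"),
         ("P3", "H1", "binds"), ("P4", "H3", "binds")]
side = {n: ("HIV" if n.startswith("H") else "human") for e in edges for n in e[:2]}
randomized = rewire(edges, side, n_swaps=1000, rng=random.Random(42))
same_out = Counter(s for s, _, _ in edges) == Counter(s for s, _, _ in randomized)
same_in = Counter(t for _, t, _ in edges) == Counter(t for _, t, _ in randomized)
print(same_out, same_in)
```

Switching sources keeps every node's out-degree (each source still emits the same number of edges) and every target's in-degree; switching targets does the reverse. The script therefore prints `True True` regardless of the random seed.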
As described in [3, 36], the significance of network motifs is determined using the P value and Z score, which are calculated from the number of occurrences of a specific motif in the original network ($N_{\mathrm{real}}$) and the average number found in the randomized networks ($N_{\mathrm{rand}}$), with standard deviation SD. A network motif is considered significant if the probability of finding the motif $N_{\mathrm{real}}$ times in the randomized networks (the P value) is smaller than 0.02 and $N_{\mathrm{real}}$ lies at least two standard deviations from $N_{\mathrm{rand}}$, i.e. $Z = (N_{\mathrm{real}} - N_{\mathrm{rand}})/\mathrm{SD} \geq 2$. As a result, the network motifs found to be significant cannot simply be attributed to chance.
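The two significance criteria can be combined in a small helper. We assume that the P value is estimated empirically as the fraction of randomized networks containing the motif at least $N_{\mathrm{real}}$ times and that SD is the sample standard deviation of the randomized counts; the counts below are hypothetical, and only ten randomized networks are used for brevity, whereas the study used one thousand.

```python
import math
import statistics


def motif_significance(n_real, n_rand, p_max=0.02, z_min=2.0):
    """Return (Z, P, significant) for one motif.

    n_real: motif count in the real network.
    n_rand: motif counts in the randomized networks (at least two).
    Z uses the mean and sample standard deviation of n_rand; P is estimated
    as the fraction of randomized networks with at least n_real occurrences.
    """
    mean = statistics.mean(n_rand)
    sd = statistics.stdev(n_rand)
    if sd > 0:
        z = (n_real - mean) / sd
    elif n_real == mean:
        z = 0.0
    else:
        z = math.inf if n_real > mean else -math.inf
    p = sum(n >= n_real for n in n_rand) / len(n_rand)
    return z, p, p < p_max and z >= z_min


# Hypothetical counts from ten randomized networks (the study used 1,000).
n_rand = [10, 12, 11, 9, 13, 10, 12, 11, 10, 12]
print(motif_significance(20, n_rand))  # far above the random counts
print(motif_significance(12, n_rand))  # within the random spread
```

Requiring both conditions guards against two failure modes: a small P value alone can arise when the randomized counts barely vary, and a large Z score alone can be driven by a skewed distribution of randomized counts.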
HIV: Human Immunodeficiency Virus
NCBI: The National Center for Biotechnology Information
NIAID: The National Institute of Allergy and Infectious Diseases
HDF: HIV Dependency Factor
The authors would like to thank Dr. Marten Postma and our reviewers for their valuable feedback. This research was supported by the European Union through the ViroLab project (http://www.virolab.org, EU project no. IST-027446) and the DynaNets project (http://www.dynanets.org, EU grant agreement no. 233847).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.