Identifying potential survival strategies of HIV-1 through virus-host protein interaction networks
- David van Dijk†1,
- Gokhan Ertaylan†1,
- Charles AB Boucher2 and
- Peter MA Sloot1
© van Dijk et al; licensee BioMed Central Ltd. 2010
Received: 9 February 2010
Accepted: 15 July 2010
Published: 15 July 2010
The National Institute of Allergy and Infectious Diseases has launched the HIV-1 Human Protein Interaction Database in an effort to catalogue all published interactions between HIV-1 and human proteins. In order to systematically investigate these interactions functionally and dynamically, we have constructed an HIV-1 human protein interaction network. This network was analyzed for important proteins and processes that are specific for the HIV life-cycle. In order to expose viral strategies, network motif analysis was carried out showing reoccurring patterns in virus-host dynamics.
Our analyses show that human proteins interacting with HIV form a densely connected and central sub-network within the total human protein interaction network. The evaluation of this sub-network for connectivity and centrality resulted in a set of proteins essential for the HIV life-cycle. Remarkably, we were able to associate proteins involved in RNA polymerase II transcription with hubs and proteasome formation with bottlenecks. Inferred network motifs show significant over-representation of positive and negative feedback patterns between virus and host. Strikingly, such patterns have never been reported in combined virus-host systems.
HIV infection results in a reprioritization of cellular processes reflected by an increase in the relative importance of transcriptional machinery and proteasome formation. We conclude that during the evolution of HIV, some patterns of interaction have been selected for resulting in a system where virus proteins preferably interact with central human proteins for direct control and with proteasomal proteins for indirect control over the cellular processes. Finally, the patterns described by network motifs illustrate how virus and host interact with one another.
Recent advances in high throughput genome-wide screening techniques have increased not only the amount of generated data, but also its quality. In combination with the completion of the human genome project, this has led to early expectations of revolutionizing medicine. However, as is often the case in life science, the devil is in the details. We have learned that before we can efficiently use genome-wide data for developing the next generation of drugs and treatments we have to revolutionize the way we use our data. Since we have recognized that we are not yet equipped with the right tools to interpret this unprecedented amount of data, we have been building large databases where data is waiting to be processed into information. Today interpreting this data stands as the grand challenge for bioinformatics in the post-genomic era.
Meanwhile, hoping to solve this problem, we have been broadening our view and have been looking elsewhere for answers. One of these is the field of network science. This relatively new field has emerged from graph theory and physics and has proved to be a powerful method for the mathematical representation, visualization and analysis of complex data that involves many interacting components. In this area powerful concepts have been developed, such as network centrality, scalability and network motifs, that have enabled us to understand a system through its network topology [2–9]. Subsequently many fields have benefited from these advances. For example, in epidemiology the mapping of human interactions into social networks gave insight into how sexually transmitted diseases spread in a population [10–12]. In developmental biology the representation of interactions among different genes as gene regulatory networks has been widely accepted [13–17], and in social sciences the analysis of human mobility patterns using a human interaction network helped us shed light on the dynamics of our society.
However, the field of virology has not yet received the full attention it deserves from network research, despite the availability of data and ready to use scientific methodology. Only recently Dyer and colleagues have described a network between human proteins interacting with viruses and other pathogens based on manually curated data from literature as well as publicly available databases. In their work they give an overview of the common interacting proteins of pathogens ranging from viruses such as HIV, Influenza and Measles to pathogen groups like Toxoplasma and Plasmodium. Their findings emphasize that pathogens preferentially interact with two kinds of proteins: hubs (ones that interact with many other proteins) and bottlenecks (ones that lie on many shortest paths). They also provide evidence from Gene Ontology (GO) annotation that different sets of pathogens target the same processes even though they interact with different proteins. One remarkable feature of their data is that it is highly biased towards HIV interactions. Approximately eighty percent of all interactions are specific to Human Immunodeficiency Virus (HIV).
Human Immunodeficiency Virus
Human immunodeficiency virus (HIV) is recognized to be responsible for one of the most destructive pandemics in recorded history. It causes thousands of deaths and substantially decreases the life quality of millions of individuals each year, most of whom live in Sub-Saharan Africa.
Since the first isolation of HIV in 1981, scientists have been investigating every aspect of the virus hoping to find a vaccine. Genomic research has revealed that HIV has a compact genome, which consists of nine open reading frames (leading to nine primary translation products) that code for fifteen different translational products, represented by nineteen proteins. Most of the coding regions of HIV overlap, except for the genes rev and tat that are split by introns.
Despite the compactness of its genome, HIV has a very high nucleotide substitution rate, several million times faster than that of the average eukaryotic genome. Such a high substitution rate enables a virus population to exist in a cloud of genotypes called quasispecies and to rapidly adapt to environmental changes by means of this diversity. Varying conditions such as different humoral and innate immune system responses within and between hosts or varying treatment regimens result in selection pressures, thereby shifting the dominant virus genotype. This led to the understanding that the persistence of the virus in the host relies on the complex web of interactions it has, rather than the fitness of its structural components. In other words, HIV's strategy for dealing with environmental stress lies in its ability to change its structural components while maintaining their function. This is also the main reason why it is unlikely that a universal vaccine will be developed using conventional methods like targeting anchor proteins. Therefore, before we can expect to start developing a cure, we need to invest more in the understanding of the interplay between the virus and the host.
Fourteen most frequent types of interactions between HIV and human proteins.
induces phosphorylation of
In addition to the NCBI database there are three other independent data-sets available as a result of small interfering RNA (siRNA) screens [23–25]. However, there is surprisingly little overlap between these four resources.
A very recent review by Bushman et al. addresses this issue by comparing the results of these three siRNA screens. There were 34 genes called in at least two siRNA screens, whereas only three genes were common to all three screens. Furthermore, of the 34 genes on two or three lists, only 11 were reported in the NCBI database. The authors discuss several reasons that could contribute to this variation. In addition they have included the interactions from the NCBI database and other related work to assemble a "host-pathogen" interaction network. The analysis of this all-combined host-pathogen network revealed ten clusters, each identified with a distinct biochemical or cellular function. The clusters that were identified not only confirm understanding of some known processes such as immune response and tat activation/transcriptional elongation but also suggest the existence of new processes previously overlooked such as proteasome and mediator complex activity.
Nevertheless there are two important shortcomings associated with siRNA screening. First, the siRNA method cannot be used to identify genes if their knockdown is toxic (i.e. resulting in cell death). Hence the method can be argued to be biased towards the identification of genes that have a phenotype, yet lie on the periphery of a pathway within the total HIV-1 human interactome. Second, it does not explain the type of interaction that the suggested gene might have with HIV proteins. Therefore we argue that if one aims to identify "core proteins" involved in important processes for viral survival and also wants to analyze resulting dynamics, one has to rely on relatively less-biased and well annotated data such as the NCBI database. However, the quality of the published manuscripts differs among those present in the database. In this report, all individual calls reporting interactions are treated equally for computational analyses.
HIV-1, Human Protein Protein Interaction Network and Analysis
In the remainder of this paper we introduce the HIV-1 Human Protein-Protein Interaction Network based on the database by the National Institute of Allergy and Infectious Diseases (NIAID) called HIV-1, Human Protein Interaction Database. In the results section we present our findings from network centrality and network motif analysis. In the discussion section we discuss the analysis of network topology and patterns that has led to the identification of HIV specific proteins and processes associated with viral survival. In the methods section we explain how our network was inferred and annotated with publicly available human protein interaction data and gene ontology (GO) terms. Subsequently, newly developed algorithms are described in the methods section.
The National Institute of Allergy and Infectious Diseases' (NIAID) HIV-1, Human Protein Interaction Database offers comprehensive data on nineteen HIV proteins (fifteen structural and four intermediate proteins) interacting with 1452 human proteins via 3959 interactions. The most frequent types of these interactions are summarized in Table 1 with their frequency. We can see that regulatory (up-regulates, down-regulates, regulated by) and activation/inhibition (activates, inhibits, inhibited by) are among the most common interactions.
This explains their overrepresentation in Figure 2-A. To correct for this bias we have calculated a relative connectivity distribution of the activation/inhibition and regulatory sub-networks using normalization (see the methods section for details). This allows for direct comparison of connectivities between HIV proteins and between the two sub-networks (see Figure 2-B).
Top ten highest connected HDFs, considering only HIV-HDF connections.
mitogen-activated protein kinase 1
protein kinase C, alpha
mitogen-activated protein kinase 3 isoform 1
actin, gamma 1 propeptide
major histocompatibility complex, class I, A precursor
CD4 antigen precursor
interleukin 10 precursor
interferon, alpha 1
We hypothesize that central genes or proteins in the human protein interaction network are more likely to be important players in the life cycle of the virus than non-central ones. Therefore, after constructing the HIV-1 human protein interaction network we have measured three types of network centrality: degree, betweenness and eigenvector centrality on both local and global networks.
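To make two of these measures concrete, the following is a minimal sketch of degree and eigenvector centrality on a small undirected network. The adjacency-set representation, the function names and the toy node labels are assumptions chosen for illustration; the paper does not state which implementation was used. The eigenvector centrality is computed by power iteration on the adjacency matrix plus the identity, a common choice that leaves the eigenvectors unchanged.

```python
def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def eigenvector_centrality(adj, max_iter=500, tol=1e-12):
    """Power iteration on A + I (same eigenvectors as A; the shift
    avoids oscillation on bipartite graphs)."""
    score = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        # Each node collects its own score plus the scores of its neighbours.
        new = {v: score[v] + sum(score[u] for u in adj[v]) for v in adj}
        norm = sum(x * x for x in new.values()) ** 0.5
        new = {v: x / norm for v, x in new.items()}
        converged = max(abs(new[v] - score[v]) for v in adj) < tol
        score = new
        if converged:
            break
    return score


# Toy undirected network with hypothetical node names, stored as adjacency sets.
toy = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
deg = degree_centrality(toy)
eig = eigenvector_centrality(toy)
```

In this sketch a "local" measure would be obtained by passing only the HDF-HDF adjacency, and a "global" measure by passing the adjacency of the whole human protein interaction network.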
HDF sub-network is Central
Mean values of centrality measures on HDFs and on proteins of the whole human protein interaction network, with standard deviations between brackets. [Table values not recovered; surviving column labels: "total human network", "(HDF > total)".]
Set of proteins that are found to be hubs by both the degree and eigenvector centrality metrics.
tumor protein p53
breast cancer 1, early onset isoform 1
estrogen receptor 1
CREB binding protein isoform a
v-rel reticuloendotheliosis viral oncogene homolog A
proto-oncogene tyrosine-protein kinase SRC
TATA box binding protein
myc proto-oncogene protein
E1A binding protein p300
Table 4 summarizes the top one percent of the highest ranked HDFs in the total network. We notice from this table that both centrality metrics result in very similar sets of top ranked proteins. The extended table with the top five percent of proteins identified with different measures can be found in additional file 6. We can see that P53, Brca-1 and Retinoblastoma-1 have been identified as being highly central by both metrics. This result is not surprising since all three are well established oncogenes and have been extensively studied. Therefore their connections with other proteins are expected to be better documented.
We define a protein with high betweenness score as a bottleneck.
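As an illustration of the betweenness score behind this definition, the sketch below counts, for every node, the shortest paths between other node pairs that pass through it, using the Brandes accumulation scheme for unweighted, undirected graphs. The function name, the adjacency-set input and the toy path network are assumptions made for the example and are not taken from the paper's methods.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma)
        # and remember each node's predecessors on those paths.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the nodes farthest from s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both of its endpoints.
    return {v: b / 2.0 for v, b in bc.items()}


# Toy path network a-b-c-d-e (hypothetical labels): c is the bottleneck.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
bc = betweenness_centrality(path)
```

In the toy path, the middle node lies on the shortest paths of four node pairs and therefore receives the highest score, even though its degree is no larger than that of its neighbours.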
Top one percent of proteins that have the highest score from the betweenness centrality metric.
tumor protein p53
growth factor receptor-bound protein 2 isoform 1
breast cancer 1, early onset isoform 1
proto-oncogene tyrosine-protein kinase
epidermal growth factor receptor isoform a (EGFR [GenBank:NP_005219.2])
signal transducer and activator of transcription 3 isoform 1
estrogen receptor 1
phosphoinositide-3-kinase, regulatory subunit, polypeptide 1 isoform 1
DNA directed RNA polymerase II polypeptide A
myc proto-oncogene protein
Sp1 transcription factor
v-rel reticuloendotheliosis viral oncogene homolog A
Src homology 2 domain containing transforming protein 1 isoform p52Shc
Identification of host factors that are specific to HIV infection
It is not surprising that from our centrality analysis the proteins that are important for the functioning of a cell are also crucial for the viral survival. The question that remains is "Are there HIV specific processes that are crucial for viral existence but not as important for the cell?"
In order to understand the relation between local (related to other HDFs) and global (related to all human proteins) properties of HDFs, we check whether high centrality in the HDF network is a predictor for high centrality in the total human protein interaction network. We plot the local against the global measures of all our metrics. In Figure 5 these three plots are shown, clearly signifying strong correlations.
Because of this strong correlation between local and global properties almost any protein that is identified as highly essential using a ranking based on local properties is also important globally. To counteract this effect we filter out proteins of global importance by re-ranking them using an adjusted metric (see methods for details).
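One simple way to picture such a re-ranking is to divide each local (HDF network) score by its global (total human network) counterpart and sort on the ratio. The sketch below assumes this plain ratio; the exact adjusted metric is defined in the methods, and the function name and the numeric scores are made up for illustration only.

```python
def rerank_by_adjusted_centrality(local, global_):
    """Rank proteins by local score divided by global score (assumed form)."""
    adjusted = {
        protein: local[protein] / global_[protein]
        for protein in local
        if global_.get(protein, 0) > 0  # skip proteins without a global score
    }
    return sorted(adjusted.items(), key=lambda item: item[1], reverse=True)


# Made-up degree values, not taken from the paper's tables.
local_degree = {"TP53": 90, "TBP": 40, "PSMD6": 12}
global_degree = {"TP53": 300, "TBP": 60, "PSMD6": 15}
ranked = rerank_by_adjusted_centrality(local_degree, global_degree)
```

With these invented numbers the globally dominant protein drops to the bottom of the list, while proteins whose connections lie mostly inside the HDF network move to the top.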
Set of proteins that are identified as central using both adjusted centrality metrics (degree and eigenvector centrality).
TBP-associated factor 1 isoform 1
activating transcription factor 2
general transcription factor IIB
signal transducer and activator of transcription 1 isoform alpha
TATA box binding protein
cyclin-dependent kinase inhibitor 1A
CCAAT/enhancer binding protein beta
Top ten bottlenecks after normalization.
proteasome (prosome, macropain) 26S subunit, non-ATPase, 6
proteasome alpha 2 subunit
proteasome 26S non-ATPase subunit 10 isoform 1
DEAH (Asp-Glu-Ala-His) box polypeptide 9
CD4 antigen precursor
CD82 antigen isoform 1 (CD82 [GenBank: NP002222.1])
IKK-related kinase epsilon
protein tyrosine phosphatase, receptor type, C isoform 1 precursor
chemokine (C-C motif) receptor 5
One remark is that "some of the virus-host interaction studies have been done on individual subunits of a complex, but at other times a complex is implicated in a virus-host interaction and all subunits of that complex are linked to a virus protein even though only a few subunits might be involved in the interaction. This might lead to spurious over-represented motifs." On the other hand, if those data describing interactions of complexes rather than individual subunits are discarded, this might lead to an under-representation of complexes which would in reality be present in the motif analysis. We have chosen to include these in favor of over-representation of motifs since the HIV-1 human protein interaction data is already sparse.
Complex networks in general and biological networks specifically have been found to consist of small recurring patterns, so-called network motifs [2, 4, 36]. These building blocks have been used to study the structure and dynamic behavior of networks.
Co-regulation, or co-activation/inhibition, is what we describe as two HIV proteins regulating, activating or inhibiting one human protein (see Figure 8). The two interactions can be of the same type (e.g. both up-regulation, or both inhibition), in which case they can indicate a potential redundancy in the system. Of the co-regulation motif we found six types of regulatory and two types of activation/inhibition motifs to be significantly over-represented.
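Counting occurrences of this pattern can be sketched as follows. The edge tuples, the protein labels and the function name are hypothetical, and the sketch only counts motif occurrences per pair of interaction types; it does not assess whether a count is significantly over-represented.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_coregulation(edges):
    """edges: iterable of (hiv_protein, interaction_type, human_protein).

    Returns a Counter keyed by the sorted pair of interaction types with
    which two distinct HIV proteins act on the same human protein.
    """
    by_target = defaultdict(set)
    for hiv, itype, human in edges:
        by_target[human].add((hiv, itype))
    counts = Counter()
    for sources in by_target.values():
        for (h1, t1), (h2, t2) in combinations(sorted(sources), 2):
            if h1 != h2:  # the motif needs two different HIV proteins
                counts[tuple(sorted((t1, t2)))] += 1
    return counts


# Illustrative edges with placeholder human proteins X and Y.
example_edges = [
    ("Tat", "upregulates", "X"),
    ("gp120", "upregulates", "X"),
    ("Nef", "downregulates", "X"),
    ("Tat", "activates", "Y"),
]
motif_counts = count_coregulation(example_edges)
```

Pairs with identical interaction types, such as two up-regulations of the same target, are the cases that the text associates with potential redundancy.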
Inclusion of interactions between HDFs (collected from human protein interaction databases, see methods section) gives the ability to study the relationship between HDFs that have a common interacting HIV protein. The network motif that is associated with this pattern is what we identify as a "clique" (see Figure 8). Traditionally the term clique has been used to denote a group of fully interconnected nodes, but has also been used to describe network motifs of the fully connected three node sub-graph. In this work we study such a clique that consists of two human proteins and one HIV protein. As the interactions between HIV and human nodes have directionality a number of different clique patterns arise, similar to the ones without HDF-HDF interactions.
A feed-forward type [2–4, 36] (or self-regulatory) motif occurs when two connected HDFs are also (indirectly) interacting via an HIV protein. Co-regulation (or activation/inhibition) is also observed in the clique. Two interacting human proteins both also regulate/signal the same HIV protein. Again when the two interactions are of the same type this might indicate a redundancy (see Discussion). Nine different clique patterns were observed in the regulatory network and five in the activation/inhibition network. We have also conducted a Gene Ontology analysis for each motif that was identified (see additional file 12).
In this study we have analyzed a pathogen-host protein interaction network in an effort to relate network topology to biological functioning. Topologically central proteins have been shown to be crucial for HIV functioning and network motifs appear to be the result of the complex virus-host interplay. In this section we discuss these results from the network centrality metrics and the network motif analysis.
HIV Human Protein Interaction Network Meta-Analysis
First we have conducted a meta-analysis of the HIV-human protein interaction network to examine the distribution of interactions among HIV proteins as well as HDFs. Network analysis identified key components in the life cycle of HIV.
The normalized relative connectivity analysis revealed involvement of viral proteins in distinct sub-functions (activation/inhibition and regulatory).
Integrase is a viral enzyme that enables the viral genome to be integrated into the DNA of the host cell. In addition to this it is present at the time of the initial infection of a cell in only small amounts. One can speculate that any dual function of activation/inhibition or regulatory nature would end up in reduced efficiency and probably early detection by the human immune machinery before completing the job. This might be the reason why it is involved in neither the activation/inhibition network, nor the regulatory network.
HIV proteins which are exposed to the extracellular environment (Gp120, Gp41, Tat and Vpr) have approximately an equal number of interactions inferred from their global connectivity in the total network. This is probably due to the large variety of functions related to these proteins. It is indeed true for Tat and Vpr and possibly for Gp120, that they are hyperactive in terms of their role in different processes. All three proteins are also directly exposed to extracellular factors such as antibodies. Gp41 on the other hand, is originally buried in the viral envelope and is exposed only after Gp120 binds to a CD4 receptor. In addition, Gp41 has been associated with a specific role in viral membrane fusion. So it is puzzling that Gp41 is sharing this generic connectivity profile. On the other side of the spectrum, viral enzymes RT, retropepsin and integrase all show interaction profiles that are highly specific for activation and inhibition interactions. These enzymes are reaction specific and functional changes are likely to be too costly for the virus; it might therefore be favorable to keep these proteins uni-functional.
Similar connectivity analysis for human proteins revealed Mitogen-activated protein kinase 1 (Mapk1), Interferon gamma (Ifng), Protein kinase C alpha (Prkca) and Mitogen-activated protein kinase 3 (Mapk3) as the most HIV connected nodes in the HIV-human protein interaction network, having degrees 10, 9, 9 and 9 respectively. Mapk1 is identified as the integration point for multiple pathways and takes part in a wide variety of cellular processes. Ifng is an important cytokine for innate and adaptive immunity. Prkca and Mapk3 are both known to be involved in various critical cellular processes. It is not unexpected that we find them to be over-represented in the HIV-1 human protein interaction network.
Meta-analysis of the HIV-human protein interaction network revealed that HDFs interacting with HIV constitute a non-random sub-network (HDF network) in the human interactome. We employed three centrality measures (degree, betweenness and eigenvector centrality) to analyze the HDF sub-network in detail. We calculated the average centrality measures for the HDF network as well as the total human protein interaction network. It is clear that the HDF network is located topologically central in the human-protein interaction network and is significantly densely connected.
Hub analysis of the HDF network resulted in fifteen proteins that are found to be central for at least one of the two centrality metrics (degree and eigenvector centrality) where six of them were oncogenes.
Bottleneck analysis was conducted based on the betweenness centrality and resulted in a similar list to the hub analysis. Further inspection showed that the proteins in both lists were also highly central in the total human protein interaction network.
We calculated the correlation between local and global centrality for each of the centrality metrics, which resulted in a high correlation for each measure. This means that the centrality assigned to each protein in the HDF network was a result of its high connectivity in the total network. To overcome this problem and identify HIV specific processes we have normalized each centrality measure from the HDF network by its global network counterpart. We observe from the normalized list that highly studied oncogenes are replaced by transcription factors, transcription factor sub-units (TBP) and transcription activators. This finding is important because although transcription is important for the cell, it is probably "the vital" process for HIV to synthesize proteins necessary for forming progeny. It is important to note that in the normalized bottlenecks list, three proteasome subunits constitute the most important bottlenecks specific for the HDF network. Proteasome subunits were also identified as one of the important processes by Bushman et al. It is known that the cellular proteasome can act negatively on HIV infection by destroying viral proteins but it is not clear what the overall effect is on the infection. Our results show that the importance of the proteasome stems from the close interaction between vital proteins in regulation of gene expression and cell communication with proteasomal proteins. Therefore the proteasome seems to connect the processes governed by these proteins and the rest of the HDF network. All biochemical reactions in the cell are dynamic and their equilibrium depends on the concentration of the substrates available. Proteasomes have a unique role in this scenario by being the regulator of the concentration of particular proteins. A strong line of evidence for HIV's exploitation of proteasomal pathways comes from the innate restriction host factors that inhibit viral replication at the cellular level.
Human CD317/Tetherin and APOBEC proteins (APOBEC3G and APOBEC3F) have been identified to inhibit HIV replication and render resistance to HIV infection. There is growing evidence that HIV proteins Vpu and Vif accelerate proteasomal degradation of CD317/Tetherin [40–43] and APOBEC3G/F [44, 45] respectively, thus suppressing their expression and overcoming the innate resistance. Strikingly, the human restriction factor tetherin mentioned above is not curated into the NIAID database. Yet, the importance of proteasomal degradation for HIV infection has been identified independently in this work. Given the critical role of HIV's Vif and Vpu in suppressing APOBEC3G/F and CD317 activity, we argue that pharmacologic compounds designed for restoring the activity of these intrinsic anti-viral factors in infected cells in-vivo, could have strong therapeutic benefits, and therefore deserve serious attention.
As a result, we hypothesize that after infection, apart from degrading HIV proteins, re-prioritization of proteasomal pathways is an indirect control mechanism actively engaged by the virus to manage the concentrations of pivotal proteins in the cell. We have shown that regulation of gene expression and cell communication are major processes that are directly linked to proteasome functioning.
Traditionally networks of single systems have been studied using network motifs (e.g. the gene regulatory network of yeast). Discovered patterns, in terms of over-represented network motifs, hold information on network structure and dynamics of that system. HIV infection and its life-cycle are based on the interplay between two systems, namely the virus itself and the human host. Consequently, network analysis using motifs results in understanding of the dynamics and structure of this interplay as opposed to the functioning of the two systems independently.
By interpreting the inferred network motifs (see Figure 7 and 8) we achieve insight into this interplay. Self-regulation or feedback is a pattern that is commonly found in gene regulatory systems (see [2–4, 36]). Generally these patterns indicate a response mechanism, where a signal such as a gene regulation (up-regulation, down-regulation) or a phosphorylation (activation, inhibition) of a protein A triggers a similar signal to protein B. In the two node case the source of the signal to A is B, thus potentially resulting in a positive or negative feedback loop. In the three node case (two different HIV proteins) interpretation is less trivial. When we consider all HIV proteins that make up the virus as a unity, we may consider the motif as a feedback or self-regulation. Since currently available data is lacking information on interactions between HIV proteins, we are not able to interpret it as a loop. Yet interactions between HIV proteins, especially with the regulatory protein Tat, are known to be prevalent. Therefore it is plausible to assume the existence of three node feedback loops.
One limitation of the network motif analysis is the absence of time (or causality) and spatial information associated with each event in the database. Therefore, reconstruction of pathway dynamics by means of network motifs is not possible. One way to overcome this problem, at least for some motifs, is to include interactions among human proteins that indicate shared compartments and time. For instance, co-regulation, specifically in the case of two of the same interactions, points to a potential redundancy. This only holds when we assume that the two similar interactions occur in a shared spatial and temporal frame, i.e. the interactions happen in the same cellular compartment and roughly at the same time. This assumption becomes more plausible when HDF-HDF interactions are incorporated, serving as proof for the co-occurrence in time and space, of the two proteins. Co-regulation that occurs within a clique thus more strongly points to redundancy. It is these redundancies that are known to contribute to the robustness of regulatory networks in general [46–48] and give evidence for a potential cause of the robust nature of HIV infections.
Studying HIV-human interaction in terms of network motifs gives us the opportunity to reconstruct dynamics on the protein level. It is known that under selection pressure by the immune system the HIV virus undertakes a number of actions to evade this defense. This interplay where the host tries to undermine virus reproduction and where the virus evades immune response is the key concept for understanding virus-host relations.
Network motifs that have been found to be significantly over-represented, i.e. whose occurrence cannot be accounted for by randomness alone, show patterns that apparently have been selected for. By investigating these motifs individually we observe these strategies on the protein interaction level.
One such motif is a two node feedback loop, found in the HIV-host activation/inhibition network (see motif B2 in Figure 7). Significant over-representation of this network motif shows the inhibitory behavior of HIV proteins on human proteins that in turn inhibit the HIV protein. We therefore refer to these patterns as an "indirect positive feedback" and in this specific case "self-activation" as inhibition of an inhibitor results in (relative) activation. Closer inspection of all occurrences of this network motif shows that the HIV Tat and Gp120 proteins and the human protein Interferon Gamma (Ifng) have the highest level of involvement. Gene Ontology analysis of the observed network motif indicates that the human proteins involved in the network motif are involved in immune response (see additional file 12).
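The sign logic of such a two node loop can be sketched as below: a loop whose two edges have the same sign (mutual inhibition or mutual activation) is net positive, and a loop with mixed signs is net negative. The signed-edge encoding, the function name and the example edges are assumptions made for illustration; the Tat and IFNG edges mirror the pattern described above, while the VirA and HostB edges are purely hypothetical.

```python
from collections import defaultdict


def two_node_feedback(edges, hiv_proteins):
    """edges: iterable of (source, sign, target) with sign '+' or '-'.

    Returns sorted tuples (hiv, human, sign_out, sign_back, kind) for every
    HIV protein that acts on a human protein which acts back on it.
    """
    signs = defaultdict(set)
    for src, sign, dst in edges:
        signs[(src, dst)].add(sign)
    loops = []
    for (src, dst), out_signs in signs.items():
        if src in hiv_proteins and (dst, src) in signs:
            for s_out in sorted(out_signs):
                for s_back in sorted(signs[(dst, src)]):
                    # Equal signs multiply to a positive loop, mixed to negative.
                    kind = "positive" if s_out == s_back else "negative"
                    loops.append((src, dst, s_out, s_back, kind))
    return sorted(loops)


example = [
    ("Tat", "-", "IFNG"),    # HIV protein inhibits human protein
    ("IFNG", "-", "Tat"),    # human protein inhibits HIV protein
    ("VirA", "-", "HostB"),  # hypothetical placeholder proteins
    ("HostB", "+", "VirA"),
    ("VirA", "+", "HostC"),  # no edge back, so no loop
]
loops = two_node_feedback(example, hiv_proteins={"Tat", "VirA"})
```

The mutual inhibition pair is reported as a positive loop, which corresponds to the "indirect positive feedback" reading given in the text.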
Ifng, or type II interferon, is a cytokine critical for innate and adaptive immunity against viral and intracellular bacterial infections and for tumor control. The importance of Ifng in the immune system stems in part from its ability to inhibit viral replication directly, but most importantly derives from its immunostimulatory and immunomodulatory effects [49, 50].
We want to acknowledge that the results presented in this paper are based on annotated protein interaction data from the NIAID database. This data varies strongly in quality and it can be argued to contain a bias as a result of translating individual reports into a structured database. Therefore the results presented above should be interpreted as qualitatively indicative rather than quantitatively accurate. Nonetheless, the presented work is, to our knowledge, the first in the field to incorporate network centrality analysis and network motifs in a virus-host protein interaction network. We encourage experimental testing of the results in this paper to study their potential role in HIV infection.
We have demonstrated that infection with HIV results in re-prioritization of cellular processes such as transcription and proteasome activity. The primary success of the virus depends on the synthesis of new virions in a reasonable amount of time. This has to be accomplished before the infected cells are detected by patrolling CD8+ T cells or a humoral response has emerged. Therefore it is highly plausible that hijacking of the transcriptional machinery is one of the key processes that has a pronounced role post-infection.
In addition, proteasomes gain significant importance not only for the survival of the cell, by degrading HIV proteins early in the infection, but arguably also for HIV, since they regulate the concentration of innate antiviral host factors such as APOBEC3G/F and CD317, which can be targeted by the HIV proteins Vpu and Vif.
We have shown that using network motifs one can identify recurring patterns that have consequences in the virus-host dynamics. Specifically, we observed patterns that show strategies of the virus used to evade the host immune system. Finally, we conclude that the survival of HIV within the host requires direct control of the cellular machinery via the pivotal human proteins and indirect control via the proteasomes. Network motifs and complex network theory provide a promising framework to study these dynamics.
NCBI Database to network
The NCBI HIV-Human Protein Interaction database is used to construct a protein interaction network. The obtained network consists of nineteen HIV proteins that interact with 1,452 human proteins through 3,959 interactions (see Figure 1). In this protein interaction network, nodes represent either HIV or human proteins and edges represent interactions between them. Because interactions between HIV and human proteins are annotated (see Figure 8 for the most common interaction types), edges in our network are directed and have an interaction type. As interactions occur only between HIV and human proteins, the resulting network is bipartite.
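As a minimal sketch of the data structure described above, the annotated interactions can be held as directed, typed edges, with a check that every edge connects one HIV protein and one human protein. The record layout, the helper name `build_network` and the toy records are hypothetical and serve only to illustrate the bipartite constraint; they are not taken from the database.

```python
from collections import namedtuple

# Hypothetical record layout: (source protein, target protein, interaction type).
Edge = namedtuple("Edge", ["source", "target", "kind"])

def build_network(records, hiv_proteins):
    """Return (edges, human_proteins); raise if an edge is not HIV-human."""
    hiv = set(hiv_proteins)
    edges, human = [], set()
    for source, target, kind in records:
        # Bipartite constraint: exactly one endpoint must be an HIV protein.
        if (source in hiv) == (target in hiv):
            raise ValueError(f"not an HIV-human interaction: {source} -> {target}")
        human.add(target if source in hiv else source)
        edges.append(Edge(source, target, kind))
    return edges, human

# Toy records for illustration only (not database content).
records = [
    ("Tat", "HUMAN_A", "activates"),
    ("HUMAN_A", "Tat", "inhibits"),
    ("Gp120", "HUMAN_B", "inhibits"),
]
edges, human = build_network(records, hiv_proteins={"Tat", "Gp120"})
```

Keeping the interaction type on each edge is what later allows motifs with identical topology but different function to be told apart.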
HIV protein connectivity
Figure 2 shows the connectivity of the nineteen HIV proteins in the HIV-human protein interaction networks. Figure 2-A shows the absolute number of interactions per HIV protein for each of the two subnetworks and for the total network. Figure 2-B shows the normalized relative connectivity. The relative connectivity was first calculated by dividing the number of interactions of each protein in a given network by the total number of interactions in that network. Next, the relative connectivity of each protein in each of the two subnetworks was divided by the relative connectivity of that protein in the total network. This normalization permits the comparison of proteins and subnetworks.
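The two-step normalization can be sketched as follows. The function names and the interaction counts are made up for illustration and are not values from Figure 2; proteins without interactions in the total network are skipped, which is an assumption of this sketch.

```python
def relative_connectivity(counts):
    """Share of a network's interactions that belongs to each protein."""
    total = sum(counts.values())
    return {protein: n / total for protein, n in counts.items()}

def normalized_connectivity(sub_counts, total_counts):
    """Relative connectivity in a subnetwork divided by that in the total network."""
    rel_sub = relative_connectivity(sub_counts)
    rel_total = relative_connectivity(total_counts)
    # Skip proteins with no interactions in the total network (assumption).
    return {p: rel_sub[p] / rel_total[p] for p in rel_sub if rel_total.get(p)}

# Made-up interaction counts per HIV protein (illustration only).
total_counts = {"Tat": 60, "Gp120": 30, "Nef": 10}
sub_counts = {"Tat": 10, "Gp120": 10, "Nef": 0}
norm = normalized_connectivity(sub_counts, total_counts)
```

A value above 1 means that a protein holds a larger share of the interactions in the subnetwork than in the total network; a value below 1 means the opposite.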
Human Protein interactions
To incorporate interactions between HDFs and between HDFs and human non-HDF proteins, data on protein interaction was collected from several databases (BIND, BioGRID, HPRD) and added to the network [51–53]. As a result the network consists of nineteen HIV proteins, 1,452 HDFs and 12,557 non-HDF human proteins, and 3,959 HIV-HDF interactions, 4,540 HDF-HDF interactions and 13,189 interactions between HDFs and non-HDF human proteins.
The metrics that are used to rank HDFs according to their importance in the network are based on three network centrality measures, each computed per node: degree, eigenvector centrality and betweenness centrality.
In contrast to the degree, which is a measure of direct connectedness (the number of interacting proteins in our case), the eigenvector centrality measures both direct and indirect connectedness. Because well-connected nodes contribute more to the score of their neighbors than poorly connected nodes, a relatively high eigenvector centrality indicates not just high activity in terms of the number of interactions, but also activity in important pathways. The betweenness centrality, by contrast, only measures pathway activity. A protein with high betweenness is positioned at a central location in the network, as relatively many shortest paths cross it. This does not necessarily imply a high degree; a poorly connected protein might still have a high betweenness centrality. In this way important "cross-roads" in the network can be identified that would not have been noticed using standard degree analysis.
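The difference between degree and betweenness can be illustrated on a toy graph in which two triangles are joined through a single bridging node. The sketch below uses a plain-Python version of Brandes' algorithm for unweighted, undirected graphs; the graph and node names are hypothetical, and this is not the software used in the analysis.

```python
from collections import deque

def adjacency(edges):
    """Build {node: set(neighbours)} from undirected edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def betweenness(adj):
    """Brandes' algorithm: shortest-path betweenness, unweighted and undirected."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse breadth-first order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}

# Two triangles (a, b, c) and (d, e, f) joined through the bridge node X.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "X"),
             ("X", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
adj = adjacency(toy_edges)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
bc = betweenness(adj)
```

In this toy graph the bridge node X has only two neighbours, yet all nine shortest paths between the two triangles pass through it, so it has the highest betweenness despite its low degree.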
Using these three metrics we seek to measure the importance of human proteins that interact with HIV proteins (HDFs). In order to distinguish between HDFs that are important to human cellular functioning as a whole and HDFs that are specifically important to the HIV life-cycle, we normalize our centrality ranking using a distinction between "local" and "global" metrics.
For instance, we define local degree of an HDF as the number of edges to other HDFs, and global degree of an HDF as the number of edges to any other human protein (including HDFs). So local degree measures connectivity within the HDF network, whereas global degree measures connectivity in the whole human protein interaction network. Similarly, we define local and global measures for eigenvector centrality and betweenness.
To use this as a normalization, we first filter for proteins in the top five percent of local degree, eigenvector centrality and betweenness. This results in 73 proteins for each metric. Second, to calculate the adjusted centrality metrics, we divide the local value by the global value. This results in three lists of proteins that are important specifically for HIV with respect to these three metrics (see Tables 6 and 7).
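For the degree, the filter-then-divide procedure can be sketched as below. The function names, the toy edge list and the use of rounding up to obtain the size of the top five percent are assumptions of this sketch; rounding up does reproduce the reported 73 proteins for 1,452 HDFs.

```python
import math

def local_global_degree(edges, hdfs):
    """Count, per HDF, edges to other HDFs (local) and to any human protein (global)."""
    hdfs = set(hdfs)
    local = {h: 0 for h in sorted(hdfs)}
    global_ = {h: 0 for h in sorted(hdfs)}
    for u, v in edges:
        # Treat each undirected human-human interaction from both endpoints.
        for a, b in ((u, v), (v, u)):
            if a in hdfs:
                global_[a] += 1
                if b in hdfs:
                    local[a] += 1
    return local, global_

def adjusted_ranking(local, global_, top_fraction=0.05):
    """Keep the top fraction by local value, then rank by local / global."""
    k = max(1, math.ceil(top_fraction * len(local)))
    top = sorted(local, key=local.get, reverse=True)[:k]
    ratios = {h: local[h] / global_[h] for h in top if global_[h] > 0}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)

# Toy human interaction data: H* are HDFs, N* are non-HDF human proteins.
toy_edges = [("H1", "H2"), ("H1", "H3"), ("H1", "N1"), ("H1", "N2"), ("H2", "N1")]
local, global_ = local_global_degree(toy_edges, hdfs={"H1", "H2", "H3"})
ranking = adjusted_ranking(local, global_, top_fraction=1.0)
```

A ratio close to 1 marks an HDF whose interactions lie almost entirely within the HDF network, whereas a low ratio marks a protein that is well connected in the human network in general.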
Network motif detection
The HIV-host protein interaction network was analyzed for network motifs using a motif detection algorithm implemented in Prolog (see additional files 13, 14, 15 and 16). Prolog, a declarative language used for logic programming, is a useful alternative for network motif finding because the definition and detection of network patterns is highly intuitive. In contrast to the motif detection tools MAVisto and Mfinder, our implementation in Prolog and the FANMOD motif finding tool are able to find any annotated network pattern consisting of any number of nodes and edges. This means that we are able to specify the type of edges and nodes, thereby distinguishing between different functional motifs even though they have the same topology (i.e. distinguishing between regulatory and activation/inhibition motifs). Motif detection was carried out for all possible two- and three-node patterns. To determine the significance of the observed motifs, motif detection was repeated on one thousand randomized networks, generated with a strict randomization algorithm that leaves the connectivity distribution unchanged.
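To illustrate what detecting an annotated pattern involves, the sketch below finds two-node feedback loops in a list of typed, directed edges. It is written in Python rather than Prolog and is not the implementation from the additional files; the function name, edge labels and toy network are hypothetical.

```python
def two_node_feedback(edges, kind_ab, kind_ba):
    """Find pairs (a, b) with an edge a -> b of kind_ab and b -> a of kind_ba."""
    edge_set = set(edges)
    found = set()
    for a, b, kind in edge_set:
        if kind == kind_ab and (b, a, kind_ba) in edge_set:
            # A symmetric motif matches in both directions; store it once.
            key = frozenset((a, b)) if kind_ab == kind_ba else (a, b)
            found.add(key)
    return found

# Toy typed network: V* stand for viral proteins, H* for host proteins.
toy = [
    ("V1", "H1", "inhibits"),
    ("H1", "V1", "inhibits"),
    ("V1", "H2", "activates"),
    ("H2", "V1", "inhibits"),
    ("V2", "H1", "inhibits"),
]
mutual_inhibition = two_node_feedback(toy, "inhibits", "inhibits")
negative_feedback = two_node_feedback(toy, "activates", "inhibits")
```

Both calls match the same topology, a pair of nodes with an edge in each direction, but the edge types separate mutual inhibition from activation followed by inhibition.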
Compared against fully randomized networks, any motif found would appear significant. For this reason a randomized network should be as similar to the original network as possible, yet randomized. In [2, 3, 36] this is achieved by a rewiring algorithm that iteratively switches the sources or targets of two random edges until the network is sufficiently randomized. This results in a network where the edges are randomized without changing the number of nodes or edges and without changing the degree distribution of the network. In our approach we used a similar algorithm (see Figure 6) for randomizing the networks. Because edges can be of different types, we switch either the sources or the targets of two randomly chosen edges, each with equal probability.
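A minimal sketch of such a rewiring step is given below. It is not the algorithm of Figure 6: the function name, the rule that each edge keeps its own type, and the rejection of swaps that would create self-loops or duplicate edges are assumptions of this sketch.

```python
import random
from collections import Counter

def rewire(edges, n_swaps, rng=None):
    """Degree-preserving randomization of typed, directed (source, target, kind) edges."""
    rng = rng or random.Random()
    edges = list(edges)
    present = set(edges)
    done = tries = 0
    while done < n_swaps and tries < 100 * n_swaps:  # bound the number of attempts
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b, k1), (c, d, k2) = edges[i], edges[j]
        if rng.random() < 0.5:
            new1, new2 = (a, d, k1), (c, b, k2)  # switch targets
        else:
            new1, new2 = (c, b, k1), (a, d, k2)  # switch sources
        # Reject swaps that create self-loops or duplicate edges (assumption).
        if new1[0] == new1[1] or new2[0] == new2[1]:
            continue
        if new1 in present or new2 in present or new1 == new2:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
        done += 1
    return edges

toy = [("V1", "H1", "inhibits"), ("V1", "H2", "activates"), ("V2", "H3", "inhibits"),
       ("H1", "V2", "inhibits"), ("H2", "V1", "activates"), ("V2", "H1", "activates")]
shuffled = rewire(toy, n_swaps=20, rng=random.Random(0))
out_degree = Counter(source for source, _, _ in shuffled)
in_degree = Counter(target for _, target, _ in shuffled)
```

Without edge types, switching sources and switching targets would produce the same pair of new edges; with typed edges the two options differ in which type ends up on which new edge, which is why both are used.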
As described in [3, 36], the significance of a network motif is determined using the P value and the Z score, which are calculated from the number of times the motif is found in the original network (N_real) and the average number found in the randomized networks (N_rand) with its standard deviation (SD). A network motif is considered significant if the probability of finding the motif N_real times in the randomized networks (the P value) is smaller than 0.02 and N_real lies at least 2 standard deviations away from N_rand. Network motifs that are found to be significant can therefore not be attributed to randomness alone.
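The significance test can be sketched as follows, with Z = (N_real - N_rand) / SD. Two points are assumptions of this sketch rather than statements from the text: the P value is estimated as the fraction of randomized networks containing the motif at least N_real times, and only over-representation (Z of at least 2) is tested. The counts are invented.

```python
import math
from statistics import mean, stdev

def motif_significance(n_real, rand_counts, p_cut=0.02, z_cut=2.0):
    """Return (z, p, significant) for one motif given its counts in randomized networks."""
    n_rand = mean(rand_counts)
    sd = stdev(rand_counts)  # sample standard deviation
    if sd > 0:
        z = (n_real - n_rand) / sd
    elif n_real == n_rand:
        z = 0.0
    else:
        z = math.copysign(math.inf, n_real - n_rand)
    # Empirical P value: share of randomized networks with at least n_real occurrences.
    p = sum(count >= n_real for count in rand_counts) / len(rand_counts)
    return z, p, (p < p_cut and z >= z_cut)

# Invented motif counts from ten randomized networks (the study used one thousand).
rand_counts = [2, 3, 4, 3, 2, 4, 3, 3, 2, 4]
z_high, p_high, sig_high = motif_significance(10, rand_counts)
z_mid, p_mid, sig_mid = motif_significance(3, rand_counts)
```

With only ten randomized networks the smallest non-zero P value that can be observed is 0.1, so a threshold of 0.02 is only meaningful with a large ensemble such as the one thousand networks used here.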
HIV: Human Immunodeficiency Virus
NCBI: The National Center for Biotechnology Information
NIAID: The National Institute of Allergy and Infectious Diseases
HDF: HIV Dependency Factors.
The authors would like to thank Dr. Marten Postma and our reviewers for their valuable feedback. This research was supported by the European Union through the ViroLab project http://www.virolab.org, EU project no. IST-027446, and the DynaNets project http://www.dynanets.org, EU grant agreement no. 233847.
- Sloot PMA, Coveney PV, Ertaylan G, Müller V, Boucher CAB, Bubak M: HIV decision support: from molecule to man. Philosophical transactions Series A, Mathematical, physical, and engineering sciences. 2009, 367 (1898): 2691-703. http://rsta.royalsocietypublishing.org/content/367/1898/2691.long 10.1098/rsta.2009.0043
- Yeger-Lotem E, Sattath S, Kashtan N, Itzkovitz S, Milo R, Pinter RY, Alon U, Margalit H: Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction. Proc Natl Acad Sci USA. 2004, 101 (16): 5934-9. 10.1073/pnas.0306752101
- Shen-Orr SS, Milo R, Mangan S, Alon U: Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet. 2002, 31: 64-68. 10.1038/ng881
- Alon U: Network motifs: theory and experimental approaches. Nat Rev Genet. 2007, http://www.nature.com/pdfinder/10.1038/nrg2102
- Barabasi A, Albert R: Emergence of scaling in random networks. Science. 1999, 286 (5439): 509- 10.1126/science.286.5439.509
- Park J, Barabási AL: Distribution of node characteristics in complex networks. Proc Natl Acad Sci USA. 2007, 104 (46): 17916-20. http://www.pnas.org/content/104/46/17916.long 10.1073/pnas.0705081104
- Barabási A, Oltvai Z: Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004, 5 (2): 101-113. 10.1038/nrg1272
- Newman MEJ: The structure and function of complex networks [Review]. SIAM Review. 2003, http://d.wanfangdata.com.cn/NSTLQKNSTLQK6801247.aspx
- Newman MEJ, Barabási AL, Watts DJ: The structure and dynamics of networks. 2006, Princeton University Press, http://books.google.com/books?id=6LvQIIP0TQ8C&printsec=frontcover
- Moslonka-Lefebvre M, Pautasso M, Jeger M: Disease spread in small-size directed networks: Epidemic threshold, correlation between links to and from nodes, and clustering. J Theor Biol. 2009, http://www.sciencedirect.com/science?_ob=ArticleURL&udi=B6WMD-4WK48GN-1&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&_docanchor=&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=b56f13bd2360d16255e12a0aa0e1cb76
- Gómez-Gardeñes J, Latora V, Moreno Y, Profumo E: Spreading of sexually transmitted diseases in heterosexual populations. Proc Natl Acad Sci USA. 2008, 105 (5): 1399-404. http://www.pnas.org/content/105/5/1399.long 10.1073/pnas.0707332105
- Sloot P, Ivanov S, Boukhanovsky A, van de D: Stochastic Simulation of HIV Population Dynamics through Complex Network Modeling. science.uva.nl. 2009, http://www.science.uva.nl/research/pwrs/papers/archive/Sloot2007c.pdf
- Erwin DH, Davidson EH: The evolution of hierarchical gene regulatory networks. Nat Rev Genet. 2009, 10 (2): 141-8. http://www.nature.com/nrg/journal/v10/n2/abs/nrg2499.html 10.1038/nrg2499
- Ledford H: FANTOM studies networks in cells. Nature. 2009, 458 (7241): 955- http://www.nature.com/news/2009/090419/full/458954a.html 10.1038/458954a
- Ye C, Galbraith SJ, Liao JC, Eskin E: Using network component analysis to dissect regulatory networks mediated by transcription factors in yeast. PLoS Comput Biol. 2009, 5 (3): e1000311- 10.1371/journal.pcbi.1000311
- Bezerianos A, Maraziotis IA: Computational models reconstruct gene regulatory networks. Mol. BioSyst. 2008, 4 (10): 993-1000. 10.1039/b800446n
- Crombach A, Hogeweg P: Evolution of evolvability in gene regulatory networks. PLoS Comput Biol. 2008, 4 (7): e1000112- 10.1371/journal.pcbi.1000112
- González MC, Hidalgo CA, Barabási AL: Understanding individual human mobility patterns. Nature. 2008, 453 (7196): 779-82. http://www.nature.com/nature/journal/v453/n7196/full/nature06958.html 10.1038/nature06958
- Dyer MD, Murali TM, Sobral BW: The landscape of human proteins interacting with viruses and other pathogens. PLoS Pathog. 2008, 4 (2): e32- 10.1371/journal.ppat.0040032
- van't Wout AB, Schuitemaker H, Kootstra NA: Isolation and propagation of HIV-1 on peripheral blood mononuclear cells. Nat Protoc. 2008, 3 (3): 363-70. http://www.nature.com/nprot/journal/v3/n3/abs/nprot.2008.3.html 10.1038/nprot.2008.3
- Fu W, Sanders-Beer B, Katz K, Maglott D, Pruitt K, Ptak R: Human immunodeficiency virus type 1, human protein interaction database at NCBI. Nucleic Acids Res. 2008.
- Ptak RG, Fu W, Sanders-Beer BE, Dickerson JE, Pinney JW, Robertson DL, Rozanov MN, Katz KS, Maglott DR, Pruitt KD, Dieffenbach CW: Cataloguing the HIV type 1 human protein interaction network. AIDS Res Hum Retroviruses. 2008, 24 (12): 1497-502. 10.1089/aid.2008.0113
- Zhou H, Xu M, Huang Q, Gates AT, Zhang XD, Castle JC, Stec E, Ferrer M, Strulovici B, Hazuda DJ, Espeseth AS: Genome-scale RNAi screen for host factors required for HIV replication. Cell Host Microbe. 2008, 4 (5): 495-504. http://linkinghub.elsevier.com/retrieve/pii/S1931-3128(08)00330-2 10.1016/j.chom.2008.10.004
- König R, Zhou Y, Elleder D, Diamond TL, Bonamy GMC, Irelan JT, Chiang CY, Tu BP, Jesus PDD, Lilley CE, Seidel S, Opaluch AM, Caldwell JS, Weitzman MD, Kuhen KL, Bandyopadhyay S, Ideker T, Orth AP, Miraglia LJ, Bushman FD, Young JA, Chanda SK: Global analysis of host-pathogen interactions that regulate early-stage HIV-1 replication. Cell. 2008, 135: 49-60. http://linkinghub.elsevier.com/retrieve/pii/S0092-8674(08)00952-5 10.1016/j.cell.2008.07.032
- Brass AL, Dykxhoorn DM, Benita Y, Yan N, Engelman A, Xavier RJ, Lieberman J, Elledge SJ: Identification of Host Proteins Required for HIV Infection Through a Functional Genomic Screen. Science. 2008, 319 (5865): 921-926. 10.1126/science.1152725
- Bushman FD, Malani N, Fernandes J, D'Orso I, Cagney G, Diamond TL, Zhou H, Hazuda DJ, Espeseth AS, König R, Bandyopadhyay S, Ideker T, Goff SP, Krogan NJ, Frankel AD, Young JAT, Chanda SK: Host cell factors in HIV replication: meta-analysis of genome-wide studies. PLoS Pathog. 2009, 5 (5): e1000437- 10.1371/journal.ppat.1000437
- Romani B, Engelbrecht S, Glashoff RH: Functions of Tat: the versatile protein of human immunodeficiency virus type 1. J Gen Virol. 2010, 91 (Pt 1): 1-12. 10.1099/vir.0.016303-0
- Gorry P, Dunfee RL, Mefford M, Kunstman K, Morgan T, Moore JP, Mascola JR, Agopian K, Holm GH, Mehle A, Taylor J, Farzan M, Wang H, Ellery P, Willey SJ, Clapham PR, Wolinsky SM, Crowe SM, Gabuzda D: Changes in the V3 region of gp120 contribute to unusually broad coreceptor usage of an HIV-1 isolate ... Virology. 2007, http://linkinghub.elsevier.com/retrieve/pii/S0042682206008749
- Kanmogne GD, Schall K, Leibhart J, Knipe B, Gendelman HE, Persidsky Y: HIV-1 gp120 compromises blood-brain barrier integrity and enhances monocyte migration across blood-brain barrier: implication for viral neuropathogenesis. Journal of cerebral blood flow and metabolism: official journal of the International Society of Cerebral Blood Flow and Metabolism. 2007, 27: 123-34.
- Figueiredo A, Moore KL, Mak J, Sluis-Cremer N, de Bethune MP, Tachedjian G: Potent nonnucleoside reverse transcriptase inhibitors target HIV-1 Gag-Pol. PLoS Pathog. 2006, 2 (11): e119- 10.1371/journal.ppat.0020119
- Ozgür A, Vu T, Erkan G, Radev DR: Identifying gene-disease associations using centrality on a literature mined gene-interaction network. Bioinformatics. 2008, 24 (13): i277-85. 10.1093/bioinformatics/btn182
- Yu H, Kim PM, Sprecher E, Trifonov V, Gerstein M: The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics. PLoS Comput Biol. 2007, 3 (4): e59- http://www.ploscompbiol.org/article/info%253Adoi%252F10.1371%252Fjournal.pcbi.0030059 10.1371/journal.pcbi.0030059
- Sewell AK, Price DA, Teisserenc H, Booth BL, Gileadi U, Flavin FM, Trowsdale J, Phillips RE, Cerundolo V: IFN-gamma exposes a cryptic cytotoxic T lymphocyte epitope in HIV-1 reverse transcriptase. J Immunol. 1999, 162 (12): 7075-9.
- Mulder LC, Muesing MA: Degradation of HIV-1 integrase by the N-end rule pathway. J Biol Chem. 2000, 275 (38): 29749-53. 10.1074/jbc.M004670200
- Huang DW, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009, 4: 44-57. 10.1038/nprot.2008.211
- Milo R: Network Motifs: Simple Building Blocks of Complex Networks. Science. 2002, 298 (5594): 824-827. 10.1126/science.298.5594.824
- Albert R: Scale-free networks in cell biology. Journal of Cell Science. 2005, 118 (21): 4947-4957. 10.1242/jcs.02714
- Turlure F, Devroe E, Silver PA, Engelman A: Human cell proteins and human immunodeficiency virus DNA integration. Front Biosci. 2004, 9: 3187-208. http://www.bioscience.org/2004/v9/af/1472/fulltext.htm 10.2741/1472
- Mebratu Y, Tesfaigzi Y: How ERK1/2 activation controls cell proliferation and cell death: Is subcellular localization the answer?. Cell Cycle. 2009, 8 (8): 1168-75. 10.4161/cc.8.8.8147
- Andrew AJ, Miyagi E, Kao S, Strebel K: The formation of cysteine-linked dimers of BST-2/tetherin is important for inhibition of HIV-1 virus release but not for sensitivity to Vpu. Retrovirology. 2009, 6: 80- 10.1186/1742-4690-6-80
- Goffinet C, Allespach I, Homann S, Tervo HM, Habermann A, Rupp D, Oberbremer L, Kern C, Tibroni N, Welsch S, Krijnse-Locker J, Banting G, Kräusslich HG, Fackler OT, Keppler OT: HIV-1 antagonism of CD317 is species specific and involves Vpu-mediated proteasomal degradation of the restriction factor. Cell Host Microbe. 2009, 5 (3): 285-97. 10.1016/j.chom.2009.01.009
- Goffinet C, Homann S, Ambiel I, Tibroni N, Rupp D, Keppler OT, Fackler OT: Antagonism of CD317 restriction of human immunodeficiency virus type 1 (HIV-1) particle release and depletion of CD317 are separable activities of HIV-1 Vpu. J Virol. 2010, 84 (8): 4089-94. http://jvi.asm.org/cgi/content/full/84/8/4089?view=long&pmid=20147395 10.1128/JVI.01549-09
- Douglas JL, Viswanathan K, McCarroll MN, Gustin JK, Früh K, Moses AV: Vpu directs the degradation of the human immunodeficiency virus restriction factor BST-2/Tetherin via a betaTrCP-dependent mechanism. J Virol. 2009, 83 (16): 7931-47. http://jvi.asm.org/cgi/content/full/83/16/7931?view=long&pmid=19515779 10.1128/JVI.00242-09
- Yamashita T, Nomaguchi M, Miyake A, Uchiyama T, Adachi A: Status of APOBEC3G/F in cells and progeny virions modulated by Vif determines HIV-1 infectivity. Microbes Infect. 2010, 12 (2): 166-71. 10.1016/j.micinf.2009.11.007
- Malim MH: APOBEC proteins and intrinsic resistance to HIV-1 infection. Philos Trans R Soc Lond, B, Biol Sci. 2009, 364 (1517): 675-87. 10.1098/rstb.2008.0185
- Wagner A, Wright J: Alternative routes and mutational robustness in complex regulatory networks. Biosystems. 2007, 88 (1-2): 163-172. 10.1016/j.biosystems.2006.06.002
- Ciliberti S, Martin OC, Wagner A: Robustness Can Evolve Gradually in Complex Regulatory Gene Networks with Varying Topology. PLoS Comput Biol. 2007, 3 (2): e15- 10.1371/journal.pcbi.0030015
- Macia J, Solé R: Distributed robustness in cellular networks: insights from synthetic evolved circuits. journals.royalsociety.org. 2009, http://journals.royalsociety.org/index/qg5q57l8720685g7.pdf
- Yahi N, Spitalnik SL, Stefano KA, Micco PD, Gonzalez-Scarano F, Fantini J: Interferon-gamma decreases cell surface expression of galactosyl ceramide, the receptor for HIV-1 GP120 on human colonic epithelial cells. Virology. 1994, 204 (2): 550-7. 10.1006/viro.1994.1568
- Emilie D, Maillot MC, Nicolas JF, Fior R, Galanaud P: Antagonistic effect of interferon-gamma on tat-induced transactivation of HIV long terminal repeat. J Biol Chem. 1992, 267 (29): 20565-70.
- Prasad TSK, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, Balakrishnan L, Marimuthu A, Banerjee S, Somanathan DS, Sebastian A, Rani S, Ray S, Kishore CJH, Kanth S, Ahmed M, Kashyap MK, Mohmood R, Ramachandra YL, Krishna V, Rahiman BA, Mohan S, Ranganathan P, Ramabadran S, Chaerkady R, Pandey A: Human Protein Reference Database-2009 update. Nucleic Acids Res. 2009, 37 (Database): D767-72.
- Breitkreutz BJ, Stark C, Reguly T, Boucher L, Breitkreutz A, Livstone M, Oughtred R, Lackner DH, Bähler J, Wood V, Dolinski K, Tyers M: The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. 2008, 36 (Database): D637-40.
- Gilbert D: Biomolecular interaction network database. Briefings in Bioinformatics. 2005, 6 (2): 194-8. 10.1093/bib/6.2.194
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13 (11): 2498-504. http://genome.cshlp.org/content/13/11/2498.long 10.1101/gr.1239303
- Csardi G, Nepusz T: The igraph software package for complex network research. InterJournal Complex Systems. 2006, 1695.
- R Development Core Team: R: A language and environment for statistical computing. 2005, [ISBN 3-900051-07-0], R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org
- Bonacich P: Power and Centrality: A Family of Measures. The American Journal of Sociology. 1987, 92 (5): 1170-1182. http://www.jstor.org/stable/2780000 10.1086/228631
- Katz L: A new status index derived from sociometric analysis. Psychometrika. 1953, 18: 39-43. http://ideas.repec.org/a/spr/psycho/v18y1953i1p39-43.html 10.1007/BF02289026
- Freeman L: Centrality in social networks: Conceptual clarification. Social networks. 1979, 1 (3): 215-239. 10.1016/0378-8733(78)90021-7
- Brandes U: A faster algorithm for betweenness centrality. J Math Sociol. 2001, 25 (2): 163-177. 10.1080/0022250X.2001.9990249
- Schreiber F, Schwobbermeyer H: MAVisto: a tool for the exploration of network motifs. Bioinformatics. 2005, 21 (17): 3572- 10.1093/bioinformatics/bti556
- Wernicke S, Rasche F: FANMOD: a tool for fast network motif detection. Bioinformatics. 2006, 22 (9): 1152- 10.1093/bioinformatics/btl038