Construction and analysis of the protein-protein interaction network related to essential hypertension

Background: Essential hypertension (EH) is a complex disease arising from the interaction between environmental factors and genetic background, but its pathogenesis remains elusive. The emerging tools of network medicine offer a platform for exploring a complex disease at the system level. In this study, we aimed to identify the key proteins and the biological regulatory pathways involved in EH, and to explore the molecular connectivity between these pathways, by topological analysis of the protein-protein interaction (PPI) network. Results: The extended network comprised one giant network of 535 nodes connected via 2572 edges and two separate small networks. We identified 27 proteins with high betweenness centrality (BC) and 28 proteins with large degree. NOS3, which had the highest BC and closeness centrality, was located at the centre of the network. The backbone network derived from the high BC proteins presents a clear visual overview of the important regulatory pathways for blood pressure (BP) and the crosstalk between them. Finally, the robustness of NOS3 as the central protein and the accuracy of the backbone were validated with 287 test networks. Conclusion: Our findings suggest that blood pressure variation is orchestrated by an integrated PPI network centered on NOS3.


Background
Hypertension is a major risk factor for stroke, heart failure and ischemic heart disease. Despite the large amount of research recently performed in this area, the pathogenesis of human hypertension remains elusive. Thus, hypertension has to be defined as "essential" in 95 to 99% of cases [1]. Essential hypertension (EH) is viewed as a consequence of the interaction between environmental factors and genetic background. Data from animal models and from human twin and family studies indicate that approximately 30%-60% of blood pressure (BP) variation is caused by genetic factors [2,3]. Furthermore, association studies and linkage analyses have identified many causal or susceptibility genes related to EH. BP is a highly regulated quantity, affected by a multitude of physiological systems that together integrate and maintain BP levels to secure adequate blood perfusion of all tissues [4]. BP variation is a consequence of altered activity in signal transduction pathways and of interactions among complex intra- and intercellular processes. As all biochemical processes are governed by proteins, we propose that protein-protein interactions (PPIs), especially those involving the proteins encoded by these causal or susceptibility genes, are extremely important in orchestrating BP variation.
In recent years, topological analyses have been applied to molecular networks, including protein interaction networks, whose nodes are proteins linked to each other via physical interactions [5]. In this study, we aimed to identify the important proteins and the biological regulatory pathways involved in EH, and to explore the molecular connectivity between these pathways, by topological analysis of the PPI network derived from the proteins encoded by causal or susceptibility genes for EH. Degree and betweenness are two fundamental measures in network theory. Degree measures how many neighbors a node directly connects to, while betweenness measures how often a node occurs on the shortest paths between other nodes [6]. In a PPI network, nodes with high degree are defined as hub proteins and nodes with high betweenness as bottleneck proteins; both are key proteins [6]. Yu and colleagues liken protein networks to a transportation network, in which proteins with high betweenness resemble heavily used intersections, such as those leading to major highways or bridges [7]. In this study, we used the proteins with high betweenness centrality (BC) to identify the important genes and their related signaling pathways in the regulation of BP.

Methods
The research method used in this study consisted of seven main steps.
Step one: Extraction of the candidate genes associated with EH from the literature using the PolySearch text mining system.
Step two: Scanning of protein interactions from the STRING database.
Step three: Construction of the PPI network and extraction of the giant component from the extended network.
Step four: Topological analysis of the PPI network.
Step five: Extraction of the large BC nodes from the giant network to create a backbone network.
Step six: Construction of a subnetwork consisting of all shortest paths between the candidate genes from the giant network.
Step seven: Validation of the backbone network and of NOS3 as the central protein.
Extraction of genes associated with essential hypertension from the literature
We searched for candidate genes associated with EH using the PolySearch text mining system, which produces a list of concepts relevant to the user's query by analyzing multiple information sources, including PubMed, OMIM, DrugBank and Swiss-Prot. It covers many types of biomedical concepts, including diseases, genes/proteins, drugs, metabolites, SNPs, pathways and tissues [8]. The query type was 'Disease-Gene/Protein Association' and the query keyword was 'essential hypertension'. PolySearch returned 1435 publications. To check accuracy, we manually confirmed whether the returned genes were associated with essential hypertension. Finally, a total of 69 candidate genes were obtained (Table 1).

Scanning protein-protein interactions
The candidate genes listed in Table 1 were converted to seed proteins. We obtained PPIs from the STRING database, a precomputed database for the exploration of protein-protein interactions. Version 9.0 of STRING, the newest at the time of this study, covers approximately 2.5 million proteins from 630 different organisms [9].

Construction of the PPI network and extraction of the giant component from the extended network
We constructed an extended network that consists not only of the seed proteins but also of their direct PPI neighbors and the interactions between these proteins. The network was constructed using Pajek [10], a highly versatile program for the analysis, manipulation and visualization of large networks. The extended network includes a giant component and two small separate components, each derived from a single seed protein. Because this study aimed to explore the mechanism of EH at the system level, and because the nodes with large BC values must lie in the giant network (both small components contain only a few nodes), only the giant network and its network-theoretic parameters were analyzed. To make the analysis more convenient, we extracted the giant network from the extended network.
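The extraction of the giant component can be illustrated with a minimal sketch. The study itself used Pajek; the plain-Python functions below (`connected_components`, `giant_component`) and the neighbor names `X1` and `X2` are hypothetical and serve only to show the idea on a toy edge list.

```python
from collections import deque

def connected_components(edges):
    """Return the connected components of an undirected graph, largest first."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        components.append(comp)
    return sorted(components, key=len, reverse=True)

def giant_component(edges):
    """Return the nodes and edges of the largest connected component."""
    giant = connected_components(edges)[0]
    giant_edges = [(u, v) for u, v in edges if u in giant and v in giant]
    return giant, giant_edges

# Toy extended network: one larger component and two small ones.
# X1 and X2 are made-up neighbors, not proteins reported in the study.
toy_edges = [
    ("NOS3", "AKT1"), ("NOS3", "CALM1"), ("AKT1", "CALM1"), ("NOS3", "REN"),
    ("CYBA", "X1"), ("PSMA6", "X2"),
]
giant, giant_edges = giant_component(toy_edges)
print(sorted(giant), len(giant_edges))
```

On the toy data the two-node components around CYBA and PSMA6 are discarded, mirroring the treatment of the two small components in this study.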

Topological analysis of protein interaction network
Node properties including connectivity degree (k), betweenness centrality (BC) and closeness centrality (CC) were used to evaluate the nodes in a network; k and BC in particular are two fundamental parameters in network theory [6,11]. Degree (k), the most basic characteristic of a node in a network, is defined as the number of adjacent links, i.e. the number of interactions that connect one protein to its neighbors. BC is the fraction of shortest paths that pass through a node, and so measures how often a node occurs on the shortest paths between other nodes. The shortest paths are obtained by measuring the length of all the geodesics from or to the vertices in the network. A node with high BC has great influence over what flows in the network. BC may play a major role as a global property, since it is a useful indicator for detecting bottlenecks in a network. Closeness centrality (CC) is defined as the inverse of the average length of the shortest paths to or from all the other nodes in the graph, and identifies the topological center of the network. The global topological measurements used to characterize a network include average degree, mean shortest path length and diameter [6]. Average degree (<k>) is the mean of the degree values of all nodes in the network. Mean shortest path length (mspl) is the average number of steps needed to connect every pair of nodes through their shortest path. Diameter (D) is the longest of all shortest paths. In this study, node properties and the measurements used to characterize the network were calculated with Pajek.
Searching high BC nodes to create a backbone network
In this study, we viewed the PPIs maintaining blood pressure homeostasis as a transportation network. The proteins with high BC are thus the heavily used intersections, and these proteins and the links between them make up a backbone network. The cutoff for high BC was set at the top 5% of all nodes in the network [12,13]. These high BC nodes and the links between them were extracted from the giant network to create the backbone network. BC was originally introduced to measure the centrality of the nodes in a network. By definition, most of the shortest paths in a network pass through the nodes with high BC. These nodes act as bottlenecks that control the communication among the other nodes in the network.
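The node measures and the backbone extraction described above can be sketched in plain Python. This is an illustration under stated assumptions, not the Pajek procedure used in the study: BC is computed with Brandes' algorithm for unweighted, undirected graphs and left unnormalized, CC assumes a connected graph, and rounding the 5% cutoff up with `math.ceil` is our assumption (it gives 27 for 535 nodes). All function names are hypothetical.

```python
from collections import deque
import math

def build_adjacency(edges):
    """Adjacency sets for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted graph)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):       # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair of endpoints was counted twice.
    return {v: b / 2 for v, b in bc.items()}

def closeness(adj):
    """Inverse of the mean shortest-path length to all reachable nodes."""
    cc = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        cc[s] = (len(dist) - 1) / total if total else 0.0
    return cc

def backbone(edges, fraction=0.05):
    """Top `fraction` of nodes by BC and the links between them."""
    adj = build_adjacency(edges)
    bc = betweenness(adj)
    k = math.ceil(fraction * len(adj))
    top = set(sorted(bc, key=bc.get, reverse=True)[:k])
    return top, [(u, v) for u, v in edges if u in top and v in top]

# Toy chain A-B-C-D-E: the middle node C lies on the most shortest paths.
chain = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
adj = build_adjacency(chain)
bc, cc = betweenness(adj), closeness(adj)
top_nodes, top_edges = backbone(chain, fraction=0.6)
```

Ties at the cutoff are broken arbitrarily here; a real analysis would need an explicit tie rule.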

Construction of a subnetwork consisting of all shortest paths between the candidate genes
Even in the giant network, a few pairs of candidate genes are not directly connected. To construct a subnetwork in which all genes associated with EH are connected, directly or indirectly, through a minimum number of nodes, we found all shortest paths between every pair of candidate genes. The shortest paths were calculated with Pajek, and the subnetwork consists of the nodes on these paths.
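As a rough illustration of this step, the sketch below collects every node that lies on at least one shortest path between any pair of seed nodes. It relies on the fact that a node v is on a shortest path between s and t exactly when d(s, v) + d(v, t) = d(s, t). The function names and the toy node labels (S1, S2, S3, X, Y, Z) are hypothetical; the study performed this computation in Pajek.

```python
from collections import deque
from itertools import combinations

def build_adjacency(edges):
    """Adjacency sets for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, start):
    """Shortest-path length from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def nodes_on_shortest_paths(adj, source, target):
    """Nodes lying on at least one shortest path between source and target."""
    from_source = bfs_distances(adj, source)
    if target not in from_source:
        return set()                      # the pair is not connected
    from_target = bfs_distances(adj, target)
    length = from_source[target]
    return {v for v, d in from_source.items()
            if v in from_target and d + from_target[v] == length}

def shortest_path_subnetwork(adj, seeds):
    """Union of the nodes on all shortest paths between every pair of seeds."""
    nodes = set()
    for s, t in combinations(seeds, 2):
        nodes |= nodes_on_shortest_paths(adj, s, t)
    return nodes

# Toy network: S1 reaches S2 through X or Y; Z hangs off Y and is on no shortest path.
toy = [("S1", "X"), ("X", "S2"), ("S1", "Y"), ("Y", "S2"), ("S2", "S3"), ("Y", "Z")]
sub_nodes = shortest_path_subnetwork(build_adjacency(toy), ["S1", "S2", "S3"])
print(sorted(sub_nodes))
```

Repeating the breadth-first search for every pair is wasteful for large seed sets; caching one distance map per seed would be the obvious refinement.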

Validation of the backbone network and of NOS3 as the central protein
To validate the robustness of the backbone network and of NOS3 as the central protein, we constructed test networks using only a subset of the 69 genes as initial seeds. The initial seed genes were determined by omitting from 1 to 7 (10% of 69) genes. When 3 genes are omitted, there are already 314364 (69 × 68 × 67) ordered selections, so the omitted genes were selected randomly when 3 or more genes were omitted. However, considering the importance of NOS3 to our conclusion, NOS3 was always among the omitted genes whenever more than one gene was omitted. The exact omission scheme was as follows. When 1 gene was omitted, there were 69 combinations, because each of the 69 genes was omitted once. When 2 genes were omitted, there were 68 combinations (NOS3 and each of the other 68 genes). When 3 or more genes were omitted, the omitted genes were NOS3 plus other genes selected randomly, and the random selection was repeated 30 times for each number of omitted genes. In total, 287 test networks (69 + 68 + 5 × 30) were constructed (Additional file 1), and the BC values of the nodes in these networks were calculated with Pajek. The nodes with the top 27 BC values were then determined in each test network. We tested the robustness of the backbone network and of NOS3 as the central protein by calculating the frequency with which NOS3 was the node with the largest BC value and the accuracy of the backbone nodes in the test networks. The accuracy of the backbone was estimated as the fraction of the top 27 BC nodes in a test network that were also nodes of the backbone network described in step five.
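The omission scheme can be sketched as follows to make the count of 287 test networks explicit. This is an illustration only: the gene labels `GENE1` to `GENE68` are placeholders for the 68 candidate genes other than NOS3, the random seed is arbitrary, and `omission_sets` and `backbone_accuracy` are hypothetical helper names. Random draws may occasionally repeat the same omitted set; the sketch does not remove such repeats.

```python
import random

def omission_sets(genes, central="NOS3", max_omitted=7, n_random=30, seed=0):
    """Sets of omitted seed genes following the scheme described in the text."""
    rng = random.Random(seed)
    others = [g for g in genes if g != central]
    # Omit 1 gene: every gene is left out exactly once.
    omitted = [frozenset([g]) for g in genes]
    # Omit 2 genes: the central gene plus each of the other genes.
    omitted += [frozenset([central, g]) for g in others]
    # Omit 3..max_omitted genes: the central gene plus random others, n_random draws each.
    for k in range(3, max_omitted + 1):
        for _ in range(n_random):
            omitted.append(frozenset([central] + rng.sample(others, k - 1)))
    return omitted

def backbone_accuracy(test_top_nodes, backbone_nodes):
    """Fraction of a test network's top-BC nodes that are also backbone nodes."""
    test_top_nodes = set(test_top_nodes)
    return len(test_top_nodes & set(backbone_nodes)) / len(test_top_nodes)

# Placeholder gene list: NOS3 plus 68 made-up labels (69 seeds in total).
genes = ["NOS3"] + [f"GENE{i}" for i in range(1, 69)]
omitted_sets = omission_sets(genes)
print(len(omitted_sets))  # 69 + 68 + 5 * 30
```

For each omitted set, the remaining genes would serve as seeds for a new extended network, whose top 27 BC nodes are then compared with the backbone via `backbone_accuracy`.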

Protein-protein interaction network
The extended network includes one giant network and two separate small networks, derived from the seed proteins CYBA (cytochrome b-245, alpha polypeptide) and PSMA6 (proteasome subunit, alpha type, 6), respectively (Figure 1). The giant network consists of 535 nodes connected via 2572 edges (Figure 2). The backbone network consists of 27 nodes connected via 39 edges (Figure 3). The measurements characterizing the networks are listed in Table 2: number of nodes (N), average degree (<k>), diameter (D) and mean shortest path length (mspl). The largest degree in the giant network is 43, while its average degree is 7.61. The network is characterized by a small number of highly connected nodes, while most of the other nodes have few connections, which indicates that the giant network is similar to other human PPI networks [14].

Key nodes in the PPI network
In this study, nodes with large degree or high BC were regarded as key nodes, and the top 5% of all nodes in the network was used as the cutoff for both large degree and high BC. Of the 535 nodes, 27 have high BC (Table 3) and 28 have large degree (Table 4); 13 nodes have both high BC and large degree (Table 5), and 14 nodes have high BC only (Table 6). To distinguish their roles in the network, these nodes are highlighted in different colors and sizes (Figure 2). KNG1 (kininogen 1) is the hub protein with the largest degree, while NOS3 (nitric oxide synthase 3) is the bottleneck protein with the highest BC. NOS3 also has the highest CC value, which indicates that it is located at the centre of the network.
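The bookkeeping behind these counts is a matter of set operations on the hub and bottleneck lists. The sketch below uses integer placeholders instead of protein names and a hypothetical helper, `classify_nodes`, simply to show that 27 bottlenecks and 28 hubs with 13 shared nodes leave 14 bottleneck-only and 15 hub-only nodes.

```python
def classify_nodes(all_nodes, hubs, bottlenecks):
    """Split nodes by hub (large degree) and bottleneck (high BC) membership."""
    hubs, bottlenecks = set(hubs), set(bottlenecks)
    return {
        "hub and bottleneck": hubs & bottlenecks,
        "hub only": hubs - bottlenecks,
        "bottleneck only": bottlenecks - hubs,
        "neither": set(all_nodes) - hubs - bottlenecks,
    }

# Placeholder node IDs standing in for the 535 proteins of the giant network.
all_nodes = range(535)
bottlenecks = range(0, 27)    # 27 high BC nodes
hubs = range(14, 42)          # 28 large degree nodes, 13 of them shared
groups = classify_nodes(all_nodes, hubs, bottlenecks)
counts = {name: len(members) for name, members in groups.items()}
print(counts)
```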
The signaling pathways in the high BC network and the cross-talk between them derived from the backbone network
The backbone network consists of 27 high BC nodes, whose size corresponds to their BC value, and the 39 links between them (Figure 3). Even without calculating BC and CC values, it can be seen that NOS3 is located at the centre of the backbone network, with the highest BC value and the largest degree. NOS3 has 8 neighbors: SIRT1, CAT, AKT1, IFNG, TNF, KNG1, REN and CALM1. These proteins represent the SIRT1 pathway, the antioxidant system, the AKT pathway, the inflammatory system, the kallikrein-kinin system, the renin-angiotensin system and the calcium signaling pathway. The details of the other proteins in the backbone network are not presented here.
Figure 1 Overview of the extended network. The extended network includes one giant network and two separate small networks, which are derived from the seed proteins CYBA (cytochrome b-245, alpha polypeptide) and PSMA6 (proteasome subunit, alpha type, 6), respectively. The labeled nodes are seed proteins converted from the candidate genes listed in Table 1, while the unlabeled nodes are their neighbors retrieved from the STRING database.

Subnetwork consisting of all shortest paths between the candidate genes
This subnetwork consists of 93 nodes: 60 seed proteins, 20 large BC nodes, 7 nodes that are both seed proteins and large BC nodes, and 6 proteins that are neither large BC nodes nor seed proteins (Figure 4). NOS3 has the highest BC value, and the top 27 BC nodes in this subnetwork coincide well with the 27 nodes in the backbone network. Only 6 proteins are not in the list of the 27 nodes with large BC value in the giant network: TGFBR2, AGT, ACE2, GNAQ, HSD11B2 and KCNJ1 (Table 7).
The robustness of the backbone network and of NOS3 as the central protein
Seven genes had the largest BC value in at least one test network: CALM1, HIF1A, IFNG, KNG1, NOS3, NPY and REN (Additional file 1). Although NOS3 was omitted from the initial seed genes in most test networks, it was the node with the largest BC value in 211 of the 287 test networks (Table 8 and Figure 5). The accuracy of the backbone was 0.80344 (Table 8). Both the accuracy of the backbone and the frequency of NOS3 as the node with the largest BC value decreased rapidly when the number of omitted genes was 3 (Table 8 and Figure 6).

Discussion
Although a large number of studies have been conducted on EH and many causal or susceptibility genes related to EH have been reported, its pathogenesis remains elusive. We propose that the proteins encoded by these genes determine BP level through the interactions between them. The purpose of this study was to analyze the contribution of these proteins to the pathogenesis of EH and to discover other key proteins cooperating with them by topological analysis. As two fundamental measures in the network [...] [12,28-30]. We also used degree and betweenness as the main parameters for evaluating the nodes in the PPI network. In this study, 69 genes were retrieved as causative or susceptibility genes involved in EH. The network derived from the seed proteins converted from these genes consists of a giant network and two separate small networks (Figure 1). Only two seed proteins (CYBA and PSMA6) are separated from the giant network, which suggests that the PPIs between these proteins orchestrate BP variation. Some genes were probably missed in the literature search, new causative or susceptibility genes for EH remain to be discovered, and false interactions may even have introduced false nodes into the network. However, as reviewed by Lima-Mendez and van Helden [14], biological networks are tolerant to node deletion, and new nodes prefer to link to nodes with large degree. In other words, biological networks are robust to random alteration of nodes but sensitive to hub removal.
In the giant network, there are 28 proteins with large degree and 27 proteins with high BC, 13 of which have both large degree and high BC (Tables 3, 4, 5 and Figure 2). To disentangle the effects of betweenness and degree, Yu and co-workers divided all proteins in a given network into four categories [7]: nonhub-nonbottlenecks (small degree and low BC); hub-nonbottlenecks (large degree but low BC); nonhub-bottlenecks (small degree but high BC); and hub-bottlenecks (large degree and high BC). Han et al. distinguished two subtypes among the highly connected proteins: hub-bottlenecks tend to be date hubs, whereas hub-nonbottlenecks tend to be party hubs. Party hubs interact with most of their partners simultaneously, whereas date hubs bind different partners at different times or locations [15]. We believe that further verification of the spatial and temporal behaviour of these proteins will help to identify drug targets and biomarkers for EH.

Table 5 The list of proteins with both high BC and large degree and their functions
Symbol | Function description
NOS3 | Produces nitric oxide, which is implicated in vascular smooth muscle relaxation.
CYP2E1 | An effective producer of reactive oxygen species.
NPY | A peptide with direct and potential effects on vasoconstriction.
HIF1A | Functions as a master transcriptional regulator of the adaptive response to hypoxia.
POMC | Controls energy homeostasis.
REN | Generates angiotensin I from angiotensinogen in the plasma.
TNF | A proinflammatory cytokine that induces endothelial dysfunction.
PTH | Elevates calcium level by dissolving the salts in bone and preventing their renal excretion.
SRC | Is involved in cell maintenance and communication.
INS | Decreases blood glucose concentration.
TGFB1 | Controls proliferation, differentiation and other functions in many cell types.
IFNG | Has an important immunoregulatory function.
RGS5 | A potent GTPase-activating protein for Giα and Gqα.
GRK4 | Specifically phosphorylates the activated forms of G protein-coupled receptors.
ATP1B1 | Maintains the normal gradients of Na(+) and K(+) across the plasma membrane.
SELE | Cell-surface glycoprotein having a role in immunoadhesion.
GNAQ | Acts as an activator of phospholipase C.
SP1 | Activates or represses transcription in response to stimuli.

KNG1, which has the largest degree, ranks fifth in the list of high BC proteins, while NOS3, which has the highest BC, ranks fifth in the list of large degree proteins. KNG1, representing the kallikrein-kinin system, and NOS3, representing the endothelial NO system, both function mainly in vasodilatation in the regulation of BP. To some degree, we can cautiously speculate that EH originates from a failure of systemic or local vasodilatation at the right time and in the right place. NOS3, with the largest CC value, is located at the centre of both the giant network and the backbone network derived from the high BC proteins, which highlights the significant role of the NO system in maintaining BP homeostasis. In this study, the backbone network centered on NOS3 is a signaling highway that regulates BP variation (Figure 3), and the proteins within it are key intersections. The intersections directly linked to NOS3 include SIRT1, CAT, AKT1, IFNG, TNF, KNG1, REN and CALM1. It has been reported that SIRT1 promotes endothelium-dependent vasodilatation by targeting NOS3 for deacetylation, leading to enhanced nitric oxide (NO) production [16]. A recent study has shown that production of NO, stimulated by caloric restriction, increases SIRT1 expression; this study suggests that eNOS may be involved in regulating the expression of SIRT1 in murine white adipocytes [17]. Although H2O2 is not directly involved in NO synthesis, H2O2/CAT stimulates NO synthase activity [18]. As one of the major cardiovascular enzymatic antioxidants, CAT points to the role of oxidative stress in hypertension [19]. Akt regulates the activity of NOS3 via phosphorylation at Ser1177, thereby regulating NO production and vasodilation [20].
It has been estimated that Akt kinase has over 9000 possible substrates [21]. Evidence regarding the role of the inflammatory system (TNF, IFNG) and the renin-angiotensin system (REN) in BP regulation, and their interplay with NOS3, is widely available. After release from its precursor KNG1, kinin regulates NOS3 by activating two distinct G protein-coupled receptors, B2R and B1R [22]. CALM1 activates NO synthesis in NOS3 through a conformational change of the flavin mononucleotide domain from its shielded electron-accepting state to a new electron-donating state [23]. These proteins represent the SIRT1 pathway, the antioxidant system, the AKT pathway, the inflammatory system, the kallikrein-kinin system, the renin-angiotensin system and the calcium signaling pathway. Their roles in BP regulation and their interactions with the NO system have been reported in many studies [23-27].
The backbone network presents a clear visual overview of the important genes and related regulatory pathways for BP and the crosstalk between them. To further confirm the role of NOS3 and the other proteins in the backbone network, we constructed a subnetwork consisting of all shortest paths between the candidate genes (Figure 4). In this subnetwork, only 6 proteins are neither seed proteins converted from candidate genes nor nodes with a large BC value in the giant network. In other words, the large BC nodes connect and integrate these seed proteins well. NOS3 also has the highest BC value in this subnetwork, and the top 27 BC nodes in this subnetwork coincide well with the 27 nodes in the backbone network. To test how robust the conclusions of this work are to changes in the initial seed genes, 287 test networks were constructed by omitting several initial seed genes. Although NOS3 was omitted from the initial seed genes in most of these networks, it was the node with the largest BC value in 211 of the 287 test networks. KNG1, REN, NPY, CALM1, HIF1A and IFNG followed NOS3, with frequencies of 50, 10, 6, 5, 4 and 1, respectively (Table 8, Figure 5). All 7 of these proteins are nodes with high BC and large degree in the original network (Table 5). The accuracy of the backbone was 0.80344 (Table 8). Both the accuracy of the backbone and the frequency of NOS3 as the node with the largest BC value decreased rapidly when the number of omitted genes was 3 (Table 8 and Figure 6). This may suggest that the role of NOS3 as the central protein and the composition of the backbone network depend on each other.

Conclusion
Most of the seed proteins (67 of 69) associated with EH and their PPI neighbours are connected in a giant network. The backbone network presents a clear overview of the important genes, their related regulatory pathways for BP and the crosstalk between them. The backbone network is robust to changes in the initial seed genes. Our findings suggest that blood pressure variation is orchestrated by an integrated PPI network centered on NOS3.

Additional file
Additional file 1: Detailed information on the frequency of nodes with the largest BC value and the accuracy of the backbone in the 287 test networks.

Competing interests
The author(s) declare that they have no competing interests.
Authors' contributions
RJ made substantial contributions to conception and design, construction of PPIs, analysis of PPIs, was involved in drafting the manuscript or revising it critically for important intellectual content, and gave final approval of the version to be published. LH made substantial contributions to construction of PPIs, analysis of PPIs, was involved in drafting the manuscript or revising it critically for important intellectual content, and gave final approval of the version to be published. FJ made substantial contributions to searching and validation of genes related to essential hypertension, and gave final approval of the version to be published. LL made substantial contributions to searching and validation of genes related to essential hypertension, and gave final approval of the version to be published. XY made substantial contributions to searching and validation of genes related to essential hypertension, and gave final approval of the version to be published. LX made substantial contributions to scanning protein interactions from the STRING database and gave final approval of the version to be published. SH made substantial contributions to scanning protein interactions from the STRING database. CY made substantial contributions to scanning protein interactions from the STRING database. JX made substantial contributions to scanning protein interactions from the STRING database. LY made substantial contributions to searching and validation of genes related to essential hypertension, and gave final approval of the version to be published. LH made substantial contributions to conception and design, construction of PPIs, analysis of PPIs, was involved in drafting the manuscript or revising it critically for important intellectual content, and gave final approval of the version to be published. All authors read and approved the final manuscript.