Construction of a cancer-perturbed protein-protein interaction network for discovery of apoptosis drug targets

Background Cancer is caused by genetic abnormalities, such as mutations of oncogenes or tumor suppressor genes, which alter downstream signal transduction pathways and protein-protein interactions. Comparisons of the interactions of proteins in cancerous and normal cells can shed light on the mechanisms of carcinogenesis. Results We constructed initial networks of protein-protein interactions involved in the apoptosis of cancerous and normal cells by use of two human yeast two-hybrid data sets and four online databases. Next, we applied a nonlinear stochastic model, maximum likelihood parameter estimation, and Akaike Information Criteria (AIC) to eliminate false-positive protein-protein interactions in our initial protein interaction networks by use of microarray data. Comparisons of the networks of apoptosis in HeLa (human cervical carcinoma) cells and in normal primary lung fibroblasts provided insight into the mechanism of apoptosis and allowed identification of potential drug targets. The potential targets include BCL2, caspase-3 and TP53. Our comparison of cancerous and normal cells also allowed derivation of several party hubs and date hubs in the human protein-protein interaction networks involved in caspase activation. Conclusion Our method allows identification of cancer-perturbed protein-protein interactions involved in apoptosis and identification of potential molecular targets for development of anti-cancer drugs.


Background
Study of interactome, the entire set of molecular interactions within cells, has provided many insights into the etiology and regulation of cancer [1,2]. Tumorigenesis is a multi-step process caused by genetic alterations that drive the progressive transformation of normal cells into malignant cells. At the molecular level, genetic mutations, translocations, amplifications, deletions, and viral gene insertions can alter translated proteins and thereby disrupt signal transduction pathways and protein-protein interactions that are essential for apoptosis and other important cellular processes [3]. Inactivation of pro-apoptotic proteins or up-regulation of anti-apoptotic proteins results in unchecked growth of cells and ultimately to cancer [4]. From a systems biology perspective, cancer is mainly caused by malfunctions of perturbed protein interaction networks in the cell [5,6].
Apoptosis is necessary for normal human development and survival, in that cells must die in order to prevent uncontrolled growth [7]. Apoptosis requires activation of multiple pathways via regulated protein-protein interac-tions [8]. Apoptosis is mediated by an intrinsic pathway, which is triggered by "death stimuli" (e.g., DNA damage, oncogene activation, among others) within a cell, or by an extrinsic pathway, which is initiated by binding of an extracellular "death ligand". The extrinsic pathway can link to the intrinsic pathway, which then triggers the release of mitochondria proteins via protein-protein interactions [4,8,9]. During apoptosis, several proteins are released from the intermembrane space of the mitochondria into the cytoplasm and these proteins activate initiator caspases and trigger a series of protein-protein interactions in the caspase cascade. Evading apoptosis is one of the six acquired capabilities of cancer cells [3], and anticancer treatment using cytotoxic drugs is considered to mediate cell death by activating key elements of the apoptosis program and the cellular stress response [10]. Comprehensive knowledge of protein-protein interactions provides a framework for understanding the biology of cancer as an integrated system [11].
Most gene products mediate their functions within complex networks of interconnected macromolecules, forming a dynamic topological interactome [11,12]. High throughput two-hybrid experiments [13,14] and several online interactome databases, such as BIND [15], HPRD [16], Intact [17], and Himap [18], allow analysis of the global topologies of human protein-protein interactions. BIND [15] is a database designed to store full descriptions of interactions, molecular complexes and pathways. HPRD [16] provides detailed data including protein sequences, localization, domains, and motifs, and thousands of protein-protein interactions, with other data. Intact [17] contains an enrichment of protein-protein interactions, related literature, and experimental detail. Himap [18] combines two datasets of yeast-two-hybrid experiments [13,14] to form a human protein reference database [16], with references to functions and predictions.
However, experimental and database approaches often yield "false-positives" [19]. For example, yeast two-hybrid experiments based on transactivation of reporter genes require the presence of auto-activators, where the bait activates gene expression in the absence of any prey [11]. The yeast two-hybrid technique can yield false-positives (spurious interactions detected because of the high-throughput nature of the screening process), and false-negatives (undetected interactions) [19,20]. Computational methods can refine protein-protein interaction networks and result in fewer false-positives [21,22]. Because of the complex nature of interactomes, such as that observed in the apoptosome complex during caspase formation [7,8,23], a nonlinear mathematical model provides better characterization than a linear model [24,25]. In addition, a stochastic model allows consideration of intrinsic and extrinsic molecular "noise" that causes stochastic variations in transcription and translation [24]. In this paper, we describe a nonlinear stochastic model that characterizes dynamic protein-protein interaction networks of apoptosis in cancerous and normal cells.
In this study, we built an initial protein-protein interaction network based on two human yeast two-hybrid data sets [13,14] and four online interactome databases such as BIND [15], HPRD [16], Intact [17], and Himap [18]. Next, we constructed a nonlinear stochastic model of dynamic protein-protein interactions to eliminate false-positives from the network by applying a statistical method (Akaike Information Criterion, AIC) to the high-throughput protein interaction data. We regard all proteins in an organism as a large dynamic interaction system. Protein-protein interactions are considered as nonlinear stochastic processes with several expression profiles of interactive protein partners as input, and the expression profile of a target protein as output. Because of random noise and uncertainties during experiments, we describe protein-protein interactions with stochastic discrete nonlinear dynamic equations. We considered linear individual (or binary) protein interactions and nonlinear cooperative protein complex interactions, but not DNA-protein or metabolite interactions. First, we constructed protein-protein interaction networks of apoptosis in HeLa (human cervical carcinoma) cells and normal primary human lung fibroblasts based on microarray data [26]. Next, we obtained the cancer-perturbed protein-protein interaction network by comparison of apoptosis in normal cells via gain-of-function and loss-of-function networks. Because current drugs designed to induce apoptosis kill cancer cells as well as normal cells, these cancer-perturbed protein-protein interaction networks allow identification of potential selective targets of apoptosis-promoting drugs [5].

Construction of the cancer-perturbed protein-protein interaction network of apoptosis
Initially, we selected proteins that are known to have roles in apoptosis and considered them as the "core nodes" of our network. These included BAX (BCL2-associated X protein), BCL2 (B-cell CLL/lymphoma 2), BID (BH3 interacting domain death agonist), CASP3 (caspase-3), BIRC4 (baculoviral IAP repeat-containing 4), CASP9 (caspase-9), CYCS (cytochrome c, somatic), and DIABLO (diablo homolog, Drosophila). Networks, such as ours, that are developed from initially selected genes or proteins as the core nodes are referred to as "BRAC-centered networks" [27]. Our initial apoptosis network contained 207 protein nodes and 841 protein-protein interaction edges.
From equations (1) to (13) (see "Methods"), we calculated each protein interaction twice, with each partner Global protein-protein interactions of apoptosis in cancerous and normal cells   Figure 1 and Supplementary Table 1 (see "Additional file 1"). Figure 1 illustrates the individual and cooperative protein-protein interaction networks of apoptosis in cancerous cells (183 nodes and 552 edges) and normal cells (175 nodes and 547 edges). These networks are easily modeled by undirected graphs, where the nodes are proteins and two nodes are connected by an undirected edge if the corresponding proteins bind one another [28].
Supplementary Table 1 compares the networks of apoptosis in HeLa cells and normal primary human lung fibroblasts. These data show, for example, that BAX and PEG3 interact in both cell types, that BAX and CCND1 interact in neither cell type, that BAX and RARG interact in normal cells but not in cancerous cells, and that BAX and BCL2L1 interact in cancerous cells but not in normal cells. In order to identify drug targets for anti-cancer drugs, it is important to identify cancer-perturbed protein-protein interaction networks to identify drug targets to kill cancer cells [3]. If an interaction is absent in normal cells, but present in cancer cells, we call it "gain-of-function"; if an interaction is present in normal cells but not in cancerous cells, we term it "loss-of-function". For the 841 interactions that we identified, we classified 157 (18.7%) as "gain-of-function" and 162 (19.3%) as "loss-of-function" (Figs 2A and  2B). This network analysis identified 38% (18.7%+19.3%) of all protein-protein interactions during apoptosis as potential drug targets. Figure 2 shows nodes that are colored according to protein family, as annotated by the Gene Ontology (GO) hierarchy, and illustrates gain-and loss-of-function interactions derived from Supplementary Tables 2 and 3 (see 'Additional file 2' and 'Additional file 3'). Proteins are listed with Gene Ontology (GO) annotations according to the number of perturbed interactions. All protein candidates with more than five degrees of perturbations are shown, to illustrate the number of links with perturbed nodes. Gain-of-function proteins with more than five degrees of perturbations in Figure 2A include BCL2, CASP3, TP53, BCL2L1, PRKCD, MAPK3, NFKB1, BIRC3, CCND1, and PCNA. Loss-of-function proteins with more than five degrees of perturbations in Figure 2B include BCL2, BAX, CASP3, CDKN1A, TP53, BCL2L1, TNF, CASP9, EGFR, MAPK1, APC, TNFRSF6, BAK1, MYC, CFLAR, and APP (see Supplementary Tables 2 and 3). The BCL2 protein has the highest degree of perturbations (18 and 17) in cancerous and normal cells, respectively.
In order to confirm the topology of our networks (Fig. 2), we calculated the false-positive and false-negative rates for the 86 BCL2-interacting proteins in normal cells by use of the HPRD (Human Protein Reference Database) [16] and literature review (see the representative example in Sup-plementary Table 4 from 'Additional file 4'). After refinement by our algorithms, we reduced the false-positive rate to 1.16%. However, the false-negative rate remained at 41.87%, indicating incomplete construction of the network from current experiments and databases. Therefore, compensation by k (see equation (1) in "Methods") is important for estimation of model parameters.

Cancer-perturbed apoptosis mechanism at the systems level
In many cancers, pro-apoptotic proteins are inactivated or anti-apoptotic proteins are upregulated, leading to unchecked growth and an inability to respond to cellular stresses [4]. These gain-and loss-of-function mutations lead to aberrations in protein-protein interaction networks. An integration of interactome data and genomic data can provide a clearer understanding of the functional relationships that underlie apoptosis and other biological processes [11,21]. The results of our investigation of the apoptosis mechanism at the systems level and the elucidation of cancer-perturbed protein-protein interaction network topology are depicted in Figs. 2A and 2B.

Extrinsic pathway, intrinsic pathway and crosstalk
Members of the death receptor superfamily trigger the extrinsic apoptosis pathway upon recruit of caspase-8 through the adaptor protein FAS (TNFRSF6)-associated death domain (FADD) [7]. Binding of a "death ligand" to a receptor triggers formation of a signaling complex that activates caspase-8 or caspase-10, which then activates caspase-3, and finally promotes cell death [23]. Extracellular and intracellular stress triggers the intrinsic apoptosis pathway (mitochondria pathway), which involves activation of pro-apoptotic members of the Bcl-2 family [8]. Bid, a pro-apoptotic member of the Bcl-2 family, allows crosstalk between the extrinsic and intrinsic pathways. The three subfamilies of Bcl-2-related proteins are the anti-apoptotic proteins (e.g., BCL2 and BCL2L1), the proapoptotic multi-domain proteins (e.g., BAX and BAK), and pro-apoptotic BH3-only proteins (e.g., BID and BIM) [9,29].
Of the 19 Bcl-2 proteins or regulators (see "goProcess" in Supplementary Tables 2 and 3

Apoptosis and cell cycle regulation
Proteins generally function as components of complexes that contain other macromolecules, to carry out specific biological processes. Networks of interactions connect different cellular processes [11]. We have identified 30 proteins in the protein-protein interaction network of apoptosis that also participate in cell-cycle regulation. These are PTEN, CDC6, MFN2, PKMYT1, DCC, and E2F1 in Fig . In other words, it is not possible to uniquely describe protein hubs as "gain-of-function" or "loss-of-function" hubs. Therefore, we summarized the degree of perturbation (Supplementary Tables 2 and 3) to identify the perturbed protein hubs in the network of cancer cells. After identifying targeted proteins as inhibitors or activators, it is necessary to study how a drug target is wired into the control circuitry of a complex cellular network [30]. Next, we show a flow chart for identification and prediction of apoptosis drug targets in cancer drug discovery.

Prediction of apoptosis drug targets using cancerperturbed networks of apoptosis
Systems-based drug design is a major application of systems biology [5,25,31]. This method constructs diseaseperturbed protein-protein interaction networks and identifies potential drug targets by comparison of the networks of normal and abnormal cells. This contrasts with the traditional approach, which reduces cellular processes to their individual components or signal transduction pathways and targets a specific molecule or signaling pathway. A limitation of this traditional approach is that a single molecule or pathway does not adequately describe most biological systems, including those affected in cancer [25]. By comparisons of the protein-protein interaction networks of normal and cancerous cells derived from microarray data, we can identify potential drug targets through a systems-based approach (Fig. 3).
Flow chart for identification of potential drug targets in the cancer-perturbed network using microarray data Figure 3 Flow chart for identification of potential drug targets in the cancer-perturbed network using microarray data. Thus, we built initial protein-protein interaction networks from large-scale experiments and databases, and then employed each microarray data set of HeLa cells and primary lung fibroblasts to modify these networks. For network modification, we used a nonlinear stochastic model and the Akaike Information Criterion (AIC). We next compared the networks of cancerous and normal cells, derived gain-of-function and loss-of-function networks, and identified protein hubs with high degree of perturbation as potential drug targets.
Scale-free networks are extremely sensitive to removal of targeted hubs (i.e., attack vulnerability [11,12,32]), so we summed the degree of perturbation (i.e., connectivity) of each node in the cancer-perturbed network (Supplementary Tables 2 and 3) to obtain these perturbed hubs. Proteins with sum of degree of perturbation ≥ 8 (Table 1) differentiate the cancerous and normal interactomes and are potential drug targets [6,25]. We classified the 17 potential drug targets ( Our results indicate that most proteins interact with few partners, whereas hubs interact with many partners, consistent with current views on interactome networks with a scale-free or power law degree distribution [11]. Intrinsic pathway: BCL2, BAX, BCL2L1, and CYCS Defective apoptosis in human cancers often results from over-expression or inhibition of BCL2 proteins. These proteins regulate mitochondrial permeability by inhibiting (e.g., BCL2 and BCL2L1) or promoting (e.g., BAX and BID) release of cytochrome c (CYCS) [33]. BCL2 and several anti-apoptotic relatives, such as BCL2L1, associate with the mitochondrial outer membrane and the endoplasmic reticulum nuclear membrane and maintain the integrity of these membranes. Initiation of apoptosis requires pro-apoptotic family members that closely resemble BCL2 and distantly related proteins that are related only by the small BH3 protein-interaction domain [29]. In our results, proteins with gain-of-function interactions with BCL2 include CCND1, BAD, MCL1, MAPK3, ADM, KITLG, EGFR, BAG2, PKMYT1, TP53, PCNA, MITF, ABCB1, BCL6, ZNF384, HRK, PPP2R5A, and VEGF (Supplementary Table 2). Proteins with loss-of-function interactions with BCL2 include CDKN1A, TNF, WT1, BAG4, BCL2L14, DEK, GRN, RAF1, BLK, BAG5, CAPN2, GHR, CDKN1B, RTN4, BNIP3L, MAP3K1, and CLC (Supplementary Table 3). We predict BCL2 to be the best potential drug target because this protein best differentiates protein-protein interaction networks of HeLa and normal cells [5,25].
Our analysis agrees with the conclusions of previous studies which showed that BCL2 protein family members are good targets for cancer therapy. Drugs that target BCL2 include Genasense [4,34] and ABT-737 [35]. The activation of Bax can be induced by gene therapy through delivery of Bax vectors, and this approach has been successful in inducing apoptosis in cancer cell lines [4]. Antisense BCL2L1 (BCL-xL) downregulates the expression of BCL2 and BCL2L1, induces apoptosis, and inhibits growth of several tumor types in vitro and in vivo [34]. Unfortunately, targeting of BCL-2 also causes adverse effects, presumably 8 [34] because many normal cells depend on proteins in the BCL-2 family to maintain normal mitochondrial function. Difficulty in using BCL-2 antisense DNA or RNA as a delivery system is a problem with Genasense [36]. Cytochrome c, once released into the cytosol, interacts with Apaf-1 and this leads to activation of caspase-9 proenzymes [34]. The chief function of BCL-2 proteins is to regulate the release of cytochrome c from mitochondria. Thus, targeting CYCS or Apaf-1 would also be expected to cause severe adverse effects [37,38].
Extrinsic pathway and crosstalk: TNF, TNFRSF6, and BID The extrinsic pathway activated by death receptors, such as Fas (TNFRSF6/APO-1/CD95) and other TNF receptor family members, allows apoptosis to maintain normal tissue homeostasis [7]. Although death receptors of the TNF superfamily members are potential targets for anti-cancer drugs, toxic side effects have been observed that place limits on their therapeutic use [4]. TNF and Fas (TNFRSF6) were also found to activate nonspecific TNF receptors resulting in extensive ischemic and hemorrhagic lesions in several tissues leading to septic shock and fulminating hepatic failure in animal models [34].
A more promising approach involves targeting the TRAIL (TNF-Related Apoptosis Inducing Ligand) receptors [4,7,34]. Activation of the TRAIL death receptor can kill cancerous cells but not normal cells, whereas monoclonal antibodies against TRAIL and recombinant TRAIL ligand can cause TRAIL resistance [36]. Thus, administration of such a drug might cause tumors to develop resistance, or cause the death of normal cells. BID, a pro-apoptotic Bcl-2 family member, provides crosstalk and integration between the death-receptor and mitochondrial pathways [8,9]. Targeting BID, however, is rarely discussed, although such work has potential for use in combined therapy [34].

Common pathway: CASP3 and CASP9
Caspases are the central components of the apoptotic response network. An effector caspase (e.g., caspase-3) is activated by an initiator caspase (e.g., caspase-9) and the initiator caspase is activated via other protein-protein interactions [8,9]. Targeting inhibitors of caspases could potentially cause apoptosis of cancerous cells. Synthetic activators of caspases include Apoptin and IAP [33,34]. Caspase-3 and -9 are subject to inhibition by IAPs such as Livin [9]. Like BCL-2 inhibitors, XIAP inhibitors must block protein-protein interactions. When released from mitochondria, Smac binds XIAP and inactivates it, triggering apoptosis [36].

Apoptosis regulators: TP53, MYC, CFLAR, and EGFR
One of the most dramatic responses to p53 is induction of apoptosis and regulation via the intrinsic pathway [39].
Drug trials that target p53 include gene therapy involving ONYX-015 and INGN201 and antisense therapy that targets a protein controlling p53 activity by Nutlins which blocks p53/MDM2 interaction [4,34]. The proto-oncogene c-MYC encodes a transcription factor that is implicated in various cellular processes, including cell growth, proliferation, loss of differentiation, and apoptosis. The induction of cell-cycle entry sensitizes the cell to apoptosis, so that cell-proliferation and apoptotic pathways are coupled [40]. CFLAR (c-FLIP) regulates caspase-8 and FADD-like apoptosis. Whereas CFLAR blocks the activation of the initiator caspase-8, XIAP can block the initiation phase (by inhibition of caspase-9) and the execution phase (by blocking caspase-3 and caspase-7) [23]. Some agents, for example, agent ZD1839 as an EGFR (epidermal growth factor receptor) inhibitor, do not primarily target apoptosis, but indirectly modulate apoptosis [34].

Stress-induced signaling and others: MAPK1, MAPK3, CDKN1A, and CCND1
Proteins of the MAPK (mitogen-activated protein kinase) family are crucial in many signaling pathways [41]. Three MEK (MAPK kinase) inhibitors, CI-1040, PD0325901, and ARRY-142886, are currently in clinical trials for treatment of various cancers [42]. Although some drugs target MAPKs, MAPK1 (ERK or p38), and MAPK3 (ERK1) are not the main drug targets in the apoptotic pathway [34]. CDKN1A (p21) (cyclin-dependent kinase inhibitor-1) plays a role in cell cycle arrest and induction of apoptosis. The activities of cyclin D-and cyclin E-dependent kinases are linked through the Cip/Kip family of Cdk inhibitors, including p27 and p21 [43]. CCND1 is cyclin D1 in the G1/S transition of the cell cycle, and is controlled by the tumor suppressor gene RB through cdk-cyclin D complexes [44].

Prediction of additional drug targets by decreasing the degree of perturbation
In addition to identifying several apoptosis drug targets that have already been identified by other studies, we applied our method to predict additional drug targets by decreasing the perturbation threshold. Thus, if we reduce the sum of degree of perturbation to 7, we predict the following additional targets: BAK1, CASP2, BCL2A1, IGF1, PRKCD, NFKB1, and PCNA. NFKB1 has both anti-and pro-apoptotic functions that are determined by the nature of the death stimulus. The drug PS11445 targets the NFKB1 inhibitor IKKβ [34] but the other proteins have not previously been considered as drug targets.

Prediction of new GO annotations of the four proteins: CDKN1A, CCND, PCNA, and PRKCD
If we consider all 24 proteins with sum of degree of perturbation ≥ 7, four proteins (CDKN1A, CCND1, PRKCD and PCNA) are also considered to have a role in apopto-sis. If the function of any one protein in a network is known, identification of its interacting partners allows prediction of their functions [11]. However, two of these proteins (PCNA and PRKCD) are known to have a role in DNA damage-induced apoptosis. PCNA (Proliferating Cell Nuclear Antigen) is ubiquitinated and involved in RAD6-dependent DNA repair in response to DNA damage. PRKCD (protein kinase C, delta) is also associated with DNA damage-induced apoptosis [45], whereas gene ontology annotations of PRKCD do not include apoptosis (Supplementary Tables 2 and 3).
Although other methods (such as identification of protein domains) allow prediction of protein-protein interactions in different organisms [46][47][48], these methods do not allow identification of potential drug targets. Our method provides efficient and precise prediction of anti-cancer drug targets and also specifies these target proteins with detailed Gene Ontology annotations. This will help researchers identify additional drug targets by examination of other cellular mechanisms involved in cancer. Network modeling has been used to identify genes potentially associated with breast cancer in a BRAC-centered network [27]. However, there are very few timeseries microarray databases for cancerous and normal cells, so it is difficult to use our method to compare protein hubs for different types of cancer. Moreover, our method does not address the potential adverse effects of targeting a specific protein and the problem of drug delivery. We expect that more genomic time-series microarray experiments and clinical research will address these limitations in the future.

Caspase activation through static and dynamic hubs
Static network topology is not sufficient to define function, and incorporating time-dependent expression data is important for understanding pathway function [20]. A number of computational approaches have been proposed for prediction of protein-protein interactions, such as domain-domain interactions [46], the confidence score resulting from sequence similarity and number of edges [2], and integration of genomic data sets [49]. A nonlinear stochastic model can depict protein-protein interaction networks at different times by using linear binary interactions and nonlinear protein complex relationships. Previous studies have used linear stochastic models to describe the multiple feedback loops of p53 [50], but nonlinear effects cannot be depicted by this method. Our method also illustrates the dynamic behavior of protein-protein interaction networks, which cannot be examined by probabilistic methods of data integration [49].
Networks consist of party hubs (static hubs) and date hubs (dynamic hubs). Party hubs are found in static complexes with most of their partners present at the same time, whereas date hubs bind their interaction partners at different times or locations [12,51]. To further investigate dynamic apoptotic properties in human cells, we considered caspases as protein hubs in our dynamic nonlinear stochastic protein-protein models (Figs. 4A-D, Supplementary Tables 5 and 6 in 'Additional file 5' and 'Additional file 6'). In order to identify party hubs and date hubs, we first summed the degree of perturbation of each protein as 'plus degree of perturbation' and then subtracted the degree of perturbation of each protein as 'minus degree of perturbation' at two time periods in cancerous and normal cells. If the plus degree of perturbation of one protein was ≥ 20, we defined this protein as a 'hub'. If the minus degree of perturbation of the hub was ≤ 3, we defined the hub as a party hub (static hub Because date hubs appear to be more important than party hubs [51], caspase-2 and -9 are important date hubs that differentiate network topologies of cancerous and normal cells. TP53, BIRC3, BAX, and CASP1 are party hubs in both cell types. Party hubs are found in static complexes where they interact with most of their partners simultaneously [51]. In other words, we believe these four proteins play central roles in functional complexes in both cancerous and normal cells.

Conclusion
The cancer-perturbed protein-protein interaction networks of apoptosis that we developed here shed light on the mechanism of cancer at the systems level and allow identification of potential drug targets. In this study, we applied nonlinear stochastic modeling to describe individual and cooperative protein interactions. Our method is more precise than the linear models used in previous research. We successfully integrated microarray and proteome databases to identify cancer-perturbed protein-protein interaction networks. Our predictions of potential drug targets agreed with potential targets identified by other studies and also identified additional targets which may guide the development of new anti-cancer drugs in future. We also identified static and dynamic hubs in human protein-protein interaction networks, which have heretofore been identified only in yeast.

Construction of initial protein-protein interaction networks
A comprehensive understanding of protein-protein interactions in an organism provides a framework for understanding biology as an integrated system, and human perturbed protein-protein interaction networks offer insight into disease mechanisms such as cancer at the sys-tems level [5,11,25]. Before construction of cancer-perturbed protein-protein interaction networks to explore their roles in the mechanism of cancer, protein-protein interaction networks for both cancer and normal cells must be constructed for comparison. The systematic experimental mappings of the human interactome include yeast two-hybrid systems [13,14], which reveal thousands of preys and baits in the protein matrix. Several online databases such as BIND [15], HPRD [16], Intact [17], and Himap [18] provide fundamental global topologies of human protein-protein interactions. The com-  Tables 5 and 6).

A B
C D bined use of these databases [15][16][17][18] assists precise estimation of parameters as the basis of construction of protein-protein interaction networks, because incomplete rough networks can lead to biased or possibly erroneous conclusions [11]. We first constructed an initial proteinprotein interaction network using two yeast-two-hybrid experiments and four online databases.
Studies of large-scale protein-protein interactions allow development of protein interaction networks, but all large-scale experiments and databases contain high rates of false-positives [52]. Previous studies have demonstrated that use of multiple functional databases allows better identification of protein-protein interactions and leads to better prediction of the function of unknown proteins [49]. Therefore, we integrated different databases and experiments to refine our network [19]. After development of our initial network, we used microarray data to remove false-positive interactions.

Selecting and processing experimental data
We used microarray data [26] to compare gene expression in HeLa cells and normal primary human lung fibroblasts subjected to several stresses (e.g., heat shock from 37°C to 42°C, oxidative stress with menadione or hydrogen peroxide, and endoplasmic reticulum stress with DTT (dithiothreitol) or tunicamycin). Gene expression in HeLa cells and normal fibroblasts, both of which were treated with 2.5 mM DTT, were our microarray data sources. Gene expression was recorded at 0-30 h after stress in HeLa cells and 0-36 h in normal cells [26] (Supplementary Tables 7  and 8; see 'Additional file 7' and 'Additional file 8').

Nonlinear stochastic interaction model of initial networks
We used a nonlinear stochastic model to mimic the dynamic interactions among proteins to remove falsepositives and refine the network. In recent years, some systems and computational biologists employ a dynamic perspective to describe biological functions, because of their dynamic nature [53][54][55]. We considered two types of interactions, binary protein interactions and protein complex interactions, but not DNA-protein or metabolite interactions. We denote these basal interactions as k in our model, whereas ε[t] represents stochastic molecular events, such as molecular fluctuations of the target protein.
In this study, we define 'individual protein-protein interactions' as binary protein-protein interactions, and 'cooperative interactions' as the interaction of a protein complex with the target protein. as: . This equation is based on a previously developed model [56]. Therefore, dynamic interactions between upstream proteins and their targets can be written as:

Graphical representation of individual protein interactions and cooperative protein interactions
This equation is based on a previously developed model [56][57][58].
Remark: The discrete-time dynamic model in equation (1) is based on the discrete sampling of the following continuous differential model:  high-throughput protein interaction databases that contained putative protein complexes within the network. If the multi-protein complex is composed of three or more proteins, we added all combinations of two cooperative proteins from the complex to determine the nonlinear property because one extreme value in the equation will cause significant aberration in other estimated parameters. The basal interaction k in equation (1) represents unknown protein-protein interactions that result from other possible interacting proteins or other influences (e.g., mRNA-protein interactions, protein synthesis). ε[t] represents random noise arising from model uncertainty and molecular fluctuations of protein interactions with the target protein [24].
With the nonlinear stochastic interaction model (equation 1), we can identify interaction parameters a, b i , c ij and k using microarray data. After having identified all the interactions with the target protein, upstream proteins can be considered as target proteins, and by repeating this procedure we can identify all interactions of the initial network. If the estimated interaction parameters , , and are not significant according to the AIC, the corresponding interactions will be eliminated from the initial network.

Identification of interactions in the initial protein-protein interaction network
Before modification of the initial network, we estimated interaction parameters of the initial network by use of maximum likelihood estimation. This represents interactions of all possible protein candidates in the initial network. After further rearrangement, equation (1) can be rewritten as: where φ[t] denotes the regression vector composed of many elements that represent expression levels of protein candidates in the initial network at time point t.
Employing the cubic spline method [54,58] to interpolate microarray data allows us to obtain as many data points as needed. In general, the number of data points should be at least 5 times the number of parameters to be esti- Computation of equation (3) at different time points allows us to construct the following vector-form equation: For simplicity, we represent this as: In equation (4), random noise ε[t k ] is regarded as a white Gaussian noise with zero mean and unknown variance σ 2 , that is, E{ν} = 0, and Σ ν = E{νν T } = σ 2 I. Next, we employed a maximum likelihood estimation method [59] to estimate θ and σ 2 using regression data obtained from the microarray data of the target protein and proteins with which it interacts. Under the assumption that ν is a Gaussian noise vector with M -1 elements, its probability density function is given as follows.
Since ν = X -Φ · θ (equation 5), we can rewrite equation (6) as: Maximum likelihood parameter estimation involves finding θ and σ 2 to maximize the likelihood function in equation (7). In order to simplify the computation, it is practical to take the logarithm of equation (7), which yields the following log-likelihood function: In equation (8) After some computational arrangements from equation (9), the estimated parameters and are: After obtaining estimate , we can rewrite estimated protein-protein interaction (equation 1) as follows: θ We quantified the interactions of all candidate proteins by the process described above.
In equation (12), the estimated parameter denotes the estimation of the residual of target protein, denotes individual interactive rate or binary protein-protein interaction between protein i and the target protein, and denotes cooperative interactive rate or protein complex membership between protein i and protein j, i.e. protein complex ij, to the target protein. A positive value implies positive interaction, a negative value implies negative interaction, and the interactions are more likely as the parameters get larger.

Modification of initial protein-protein interaction networks
Although our maximum likelihood estimation method allows quantification of all possible interactions of a target protein, we still do not know the significance of the interaction and whether it can be regarded as a true interaction. Thus, we used a statistical approach that involves model validation to evaluate significance levels and refine the network. In this procedure, we employed the Akaike Information Criterion (AIC) to validate the model order or the number of model parameters of the network [59].
The Akaike Information Criterion (AIC) considers estimated residual variance and model complexity as one statistic and provides a measure of the information lost when a model is used. In other words, the AIC is an operational way of trading off the complexity of an estimated model against how well the model fits the data. AIC decreases as residual variance decreases and increases as the number of parameters p increases. As the expected residual variance decreases with increasing P for inadequate model complexities, there should be a minimum near the correct number of interaction parameters P. For a protein interaction model with P interaction parameters to fit with data from N samples, the Akaike Information Criterion (AIC) can be written as follows [59] where denotes the estimated expression profile of the target protein, i.e. = φ · . After the statistical selection of P parameters by minimizing the AIC, we can determine whether the protein interaction is significant or a false positive.
In our results, '1' represents protein i interacting with protein j if the interaction is within P significant interactions, and '0' represents protein i not interacting with protein j if the interaction has less than P significant interactions. It must be considered that protein-protein interaction networks represent mutual binding relationships; if protein i binds to protein j, then protein j also binds to protein i. To determine mutual relationships in the high false positive yeast-two-hybrid experiments, we use the Boolean logical 'AND' to determine the interaction maps. Two proteins were considered to interact only if each protein binds to the other protein (see Supplementary Table 1). If an interaction is detected only in HeLa cells but not in normal cells, we call this a 'gain-of-function'; if the interaction is detected in normal cells but not HeLa cells, we call it a 'loss-of-function'. Table 2 shows the truth table of all 16 events in the sample space of comparisons of individual protein-protein interactions between HeLa cells and normal cells, as determined with our AIC detection algorithm. Protein complexes are only considered once, because of incomplete information from online databases. We iteratively refined interactions in the initial network using a similar procedure. Finally, this yielded a refined protein-protein interaction network for HeLa and normal cells.
We provide all Matlab programs and codes of these figures drawn by Osprey 1.2.0 [60] in 'Additional file 9' and 'Additional file 10'. In the Matlab programs, we consider the target protein BAX as an example. Readers can simulate other target proteins using a similar procedure.