### Integrative evaluation of the essentiality of signaling components

Signaling networks can be represented as directed graphs in which nodes denote signaling components, edges represent regulatory interactions, and the orientation of the edges reflects the direction of signal transduction [10, 29–31, 33]. The input (source) nodes of signaling networks represent the initial signals or their receptors, the intermediate nodes consist of various kinases and second messengers, and the output (sink) nodes represent transcription factors or cellular responses. The edges of signaling networks generally represent directional interactions such as phosphorylation, transcriptional regulation, and enzyme catalysis, which result in either inhibitory or activating effects. Our general aim is to predict the essentiality of signaling components through structural (graph theoretical) analysis. Since the graph corresponding to a signaling network does not reflect the possible conditional dependency between incoming edges and cannot resolve inconsistent paths caused by inhibition, we propose an augmented graph representation that naturally incorporates synergy and inhibition. Our method is based on three main steps: network expansion, determination of the cascading effects of node removal, and using the novel concept of elementary signaling mode (ESM) to characterize the network before and after removing a node. The graph theoretical framework proposed here is uniquely suited to signaling networks and similar regulatory networks in which the edges do not necessarily correspond to elementary reactions.

#### Expansion of signaling networks

We utilize two operation rules to expand a signaling network to a new representation that incorporates the regulatory logic (e.g. inhibition, synergistic regulations). First, to take into account inhibitory interactions, we introduce a complementary node for each component that negatively regulates other nodes or is being negatively regulated by other nodes (see Figure 2(a)). This complementary node represents the logical negation of the original node, and allows us to evaluate the influence of the original node's inhibitory effect on the output node. Since nodes which are activated by others and have only activating effects on other nodes have no direct inhibitory roles, we do not introduce complementary nodes for them. An inhibitory edge starting at a node in the original network becomes an activating edge starting at its complementary node in the expanded network. Similarly, an inhibitory edge ending at a node becomes an activating edge ending at its complementary node. Introducing complementary nodes may lead to edges or subgraphs with no connections or relevance to the paths between input(s) and output(s). These edges or subgraphs are discarded by traversing the expanded network from input(s) to output(s) and keeping only the nodes and edges that are relevant to at least one input-output path.

Second, we introduce composite nodes to incorporate conditionally dependent relationships. We represent the combinatorial relationship of multiple regulatory interactions converging on a node *v* by a Boolean rule (see Methods) which can be uniquely written in the following disjunctive normal form:

where *u*_{
ij
} are regulators of node *v*. Usually the Boolean rules for each node will need to be constructed individually on the basis of the existing biological evidence. Our method can be used even if only partial information of inhibitory regulations and synergistic interactions is available (see Methods for guidelines). The Boolean rule governing a complementary node is the logical negation of the Boolean rule that governs its corresponding original node. For each set of synergistic interactions (with AND relationship) ending at a node, we introduce a composite node to denote the synergy in a graphical form [34, 35]. The regulators of *v* activate the composite node, which then activates the node *v* (see Figure 2(b)). The two operations illustrated in Figure 2(a) and 2(b) are executed in combination to expand the signaling network (see Figure 2(c)).

Introducing complementary nodes and composite nodes increases the number of nodes and edges in the network, but the benefit is that ambiguity is eliminated. All the directed interactions in the expanded network represent activation. Multiple edges ending at a composite node are conditionally dependent on each other, and multiple edges ending at an original node or complementary node are independent. The expanded signaling network does not have ambiguous dependencies, and can serve as a substrate for augmented structural methods. In addition, expansion of a signaling network by decomposing complex Boolean rules into independent elements helps to untangle the network into individual modules.

#### Cascading effects of a signaling component's removal

As the expanded signaling network clearly represents the relationships among nodes and signaling interactions, we can evaluate the essentiality of a signaling component by examining the range to which its perturbation propagates. We determine the cascading effect of the removal of a node by an algorithm that iteratively finds and deletes the nodes that have just lost their indispensable regulators (see Methods). There are three cases for a regulator *v* to be indispensable for a direct target node *u*: (1) *v* is the sole regulator of *u*; (2) *u* is a composite node; (3) *v* is the only remaining regulator of *u* left due to the disruption of other regulators. Figure 3(a) shows an example in which removal of node A leads to the disruption of C, D, G, H, and I, but does not eliminate nodes B, E, F and J, since there are two independent signaling interactions activating B. Removing an original node simulates the knockout of a signaling component and evaluates the importance of the activating role of this component. In contrast, removal of a complementary node simulates the constitutive expression (activation) of the signaling component and evaluates the influence of the inhibitory role of this component.

#### Elementary signaling modes

The connectivity of a signaling network between the input node(s) (e.g. ligands) and the output node(s) (e.g. cellular responses) is most reflective of its signal transduction capacity. The concept of shortest paths is widely used to characterize the efficiency of information flow or communicability in networks [17]. However, this measure would classify as unimportant all the nodes that are located outside of such shortest paths, which is clearly unrealistic for signaling networks.

Elementary flux modes, minimal sets of enzymes that can make a metabolic system operate at a steady state, play an important role in metabolic network analysis [8, 36]. Previous efforts for adapting this concept to signaling networks apply to networks that include only chemical reactions and activating regulation [37] or to enzyme cascades (e.g. phosphorylation and dephosphorylation) [38]. Here we propose a counterpart of elementary flux modes in signaling networks which include a variety of interactions and regulatory relationships. An elementary signaling mode (ESM) is defined as a minimal set of components that can perform signal transduction from initial signals to cellular responses. By 'minimal', we mean that an ESM is not decomposable and none of its signaling components is redundant, i.e. knockout of any of the nodes in the ESM will make it unable to transduce the signal. The concept of ESM is an extension of the graph concept of simple path. An ESM that does not contain any composite nodes is indeed a simple path. If the ESM has a composite node, it additionally includes all the edges ending at the composite node and their upstream nodes (see Figure 3(b)). We formulate the identification of an ESM into an integer linear programming problem and design an iterative algorithm to calculate all ESMs in a signaling network (Additional file 1). Since integer programming is NP-hard and cannot be used to enumerate all ESMs in large networks, we also design an efficient depth-first search-based approximate algorithm for estimating the number of ESMs in a signaling network (Additional file 1). In addition, we determine the shortest ESM as an extension of the concept of shortest path by using a dynamic programming procedure (see Methods).

Our method ranks the importance of signaling components by the effects of their perturbation on the ESMs of the network. We characterize each node *v* by the relative reduction in the number of ESMs following the cascading loss of nodes caused by the removal of *v* (Methods). This measure takes values in the interval [0,1], with 1 indicating a node whose loss causes the disruption of all ESMs between the input and output node(s). As a comparison we also define the analogous measure using simple paths instead of ESMs (Methods).

### Validation on three signaling networks

Several regulatory networks with documented synergistic and inhibitory interactions have been published [30–33, 35, 39], which are suitable for validation of our method. We choose three signaling networks describing the immune response of mammals to bacteria, guard cell abscisic acid signaling in plants, and T cell receptor signaling [30–32], as benchmarks for validating our method. We use the Boolean rules developed in prior articles on these networks to encode synergistic interactions and inhibitory regulations. The essentiality of each real node and complementary node is determined by using our ESM centrality measure (denoted by *E*^{ESM}) and the simple path (SP) centrality measure (denoted by *E*^{SP}). We compare these measures to a traditional centrality measure, node betweenness centrality (denoted by BC) [18], as well as the simple path measure used without considering the cascading effects of a node deletion (which is equivalent to the *SigFlux* measure [29]). We evaluate the importance values given by each method by comparing with experimental observations. Additionally we characterize each method's classification accuracy by comparing with the results of Boolean (logical) dynamic models (Additional file 2). Specifically, components are classified as essential if their knockout (OFF state) or constitutive activation (ON state) leads to an incorrect state of the output (Additional file 2). The effect of deleting an original node (a complementary node) in our method is compared with that of keeping this node as OFF (ON) in the Boolean dynamic models or logical steady state analysis. We use sensitivity (the fraction of essential components that are recognized by a method) and specificity (the fraction of non-essential components that are recognized by a method) to evaluate the four methods. High sensitivity with high specificity (i.e. low false discovery rate) indicates good performance of a method. Varying the threshold of importance values that corresponds to essentiality gives a series of sensitivity and specificity values that form an ROC curve.

#### The host immune response network

Thakar *et al.* assembled a regulatory network of immunological processes activated upon invasion by *Bordetellae* bacteria and developed asynchronous Boolean dynamic models of bacterial infections [30]. This network has 18 nodes, of which 'bacteria' can be considered as the input node, and 'phagocytosis' as the output node. The sixteen intermediate nodes include innate immune components such as pro-inflammatory cytokines and dendritic cells, early induced immune components such as B cells, and adaptive immune components such as T helper cells and related cytokines.

Using this network and its Boolean rules, we construct the expanded host immune response network shown in Figure 4(a). Three time-dependent details of the Boolean rules, namely timed decay for Th1RC and Th2RC, the threshold effect in the Boolean rule for T0, and the self-regulations in the Boolean rules for Cab and Oab are not included in the expanded network. Although the Thakar et al. network includes a negative edge between phagocytosis (PH) and bacteria (Bt), which can be translated into the edge ~PH→Bt, this edge is not relevant to the process from the input Bt to the output PH and thus is not included in the expanded network. Due to the relatively small size of the host-immune response network we can exactly enumerate all ESMs. The integer linear programming algorithm and the depth-first-search algorithm using the multiplication operation (see Additional file 1) agree in indicating that there are 15 ESMs in this network. We can see from Figure 4(b) showing the shortest ESM that half of the nodes are involved in all ESMs and are therefore essential to bacterial clearance. The flexible signaling components including Cab, AgAb, Cp, MP, RP, Oab, Th1RC, are involved in a varying number of ESMs.

The importance values of signaling components based on the ESM measure and the SP measure are shown in Figure 4(c). Both measures indicate that single-node deletion of six components, BC, Th2C, Th2RC, T0, DC or AP, leads to the elimination of all signal transduction from bacterial infection to bacterial clearance. These predictions are validated by the experimental observations [40–42] indicating that the deletion of B cells (BC), T0 cells (T0), dendritic cells (DC) or a lack of adaptive immune response results in bacterial persistence. The main difference between the results of the ESM measure and the SP measure lies in the importance values of EC and PIC. Knocking out EC or PIC disrupts all ESMs, but there are several simple paths left. The essentiality of these two nodes indicated by the ESM measure is supported by the fact that pro-inflammatory cytokines and inflammation are required for the resolution of infections [43]. Although betweenness centrality and *SigFlux* give importance values that correlate with those obtained by our method for some components, they fail to capture the essentiality of several other components such as Th2RC and T0 cells. In addition, single-node deletion of the complementary nodes of Th1RC or Th2RC will completely block bacterial clearance according to both the ESM and SP measure, indicating that the inhibition of these nodes at certain stages of the infection is essential for the immune response. Indeed, experimental observations confirm that a switch-over between Th2-related and Th1-related immune functions is necessary for bacterial clearance [30, 40]. However, betweenness centrality and the *SigFlux* measure give low importance values for these inhibitory nodes, which contradict immunological knowledge.

We rerun the Boolean dynamic model of Thakar *et al.*[30] to perturb each component and obtain its essentiality (Additional file 2); the results are given in Table S1 in Additional file 2. The prediction accuracy as compared to the dynamic model obtained by the four graph measures is shown in Figure 4(d). One can clearly see that the ESM measure and the SP measure which incorporate information from inhibitory regulation and synergistic relationships can fully capture the essentiality of signaling components and have a much better performance than betweenness centrality and *SigFlux*.

#### The guard cell ABA signaling network

Plants take up carbon dioxide for photosynthesis and lose water by transpiration through pores called stomata, which are flanked by pairs of specialized guard cells. The size of stomata is regulated by the guard cells' turgor [44]. Under drought stress conditions, plants reduce water loss by synthesizing the phytohormone abscisic acid (ABA) that triggers stomatal closure. Li *et al.*[31] assembled a signaling network corresponding to ABA-induced stomatal closure and formulated an asynchronous Boolean model of the process. There are over 50 nodes in this network, with one input, ABA, and one output, stomatal closure. The intermediate nodes include signaling proteins such as the G protein α subunit GPA1, second messengers such as cytosolic Ca^{2+}, phosphatidic acid, as well as ion channels.

Using this network and the Boolean rules, we construct the expanded ABA signaling network shown in Figure 5. The shortest ESM from ABA to Closure has 21 nodes and has a length of 11. The importance values of signaling components arising from single-node deletions are summarized in Figure S1, where the number of ESMs is calculated by the depth-first-search algorithm using the max operation (Additional file 1). Our method shows that knockout of AnionEM, Depolar or Actin will completely block all the signaling paths and ESMs. Disruption of other components such as GPA1, AGB1, CaIM, PLD, PA, pH_{c}, H^{+}ATPase, Ca^{2+}_{c}, or KOUT leads to a strong reduction of the signal transduction connectivity. In addition, single-node knockouts of SphK and S1P will increase the length of the shortest ESM by more than 60%, suggesting that these signaling components are critical for the efficiency of ABA signal transduction. The important components predicted by our method are validated by numerous experimental observations (Additional file 3).

A detailed comparison of prediction accuracy by the four methods is given in Figure 6, where we use the perturbation results of the Boolean dynamic model as the standard (Additional file 2, Table S2). By comparing with the dynamic simulation result, the best accuracy of the ESM measure is 95% sensitivity, 73% specificity (for an importance threshold of 0.8). The best performance of the SP measure is a sensitivity of 85% and specificity of 78% (for a threshold of 0.6). The best accuracy of the *SigFlux* measure is sensitivity 85%, specificity 73% while that of the betweenness centrality is sensitivity 50%, specificity 68% (both for a threshold of 0.1). Again, the performances of the ESM measure and the SP measure are better than those of the other two. We also evaluate the importance of two-node combinations by simultaneously deleting two original nodes, two complementary nodes, or an original node and a complementary node. The results, shown in Figure S2, are highly consistent with the dynamic simulation in Li *et al.* 2006 [31] (Additional file 3).

#### The T cell receptor signaling network

T cells (lymphocytes) play a central role in the immune response. T cells detect antigens by a special receptor on their surface called T cell receptor (TCR), which is triggered by Major Histocompatibility Complex (MHC) molecules and induces a series of intracellular signaling cascades. CD28 provides an essential co-stimulatory signal during T-cell activation, which increases T cell proliferation and prevents the induction of anergy and cell death. Saez-Rodriguez *et al.* constructed a 94-node T cell receptor signaling network with three input nodes and seven output nodes [32] and used the software CellNetAnalyzer [11] to calculate the logical steady states of this network.

We use CD28 antigen and the ligand of the T cell receptor (denoted by TCRlig) as the two inputs of the T cell signaling network and use NFκB and AP1 as the two outputs. The other outputs studied by Saez-Rodriguez *et al.* are implicitly incorporated in this analysis as the connectivity from the inputs to SRE, CRE and p38 is contained in that from the inputs to AP1, and the connectivity from the inputs to NFAT and PKB is contained in that from the inputs to NFκB. We use our method to expand this T cell receptor signaling network into a new representation shown in Figure S3 (Additional file 4).

The importance values of signaling components obtained by single-node deletions are summarized in Figure 7(a) and Figure 7(b), respectively, where the number of ESMs is calculated by the depth-first-search algorithm using the max operation (Additional file 1). Our method shows that more than 20 components are essential to the activation of the transcription factor NFκB, as single-node disruption of these components blocks all the signaling paths and ESMs to NFκB. Most of these components are located in the core of the T cell receptor signaling network [32]. The importance values given by the ESM measure and the SP measure are very similar except the difference in evaluating the node Fyn, whose ESM-given essentiality is supported by the logical steady state analysis. In contrast, *SigFlux* and betweenness centrality cannot recognize the core part of the T cell signaling network.

We calculate the steady states of this T cell receptor signaling network [32] by using the software CellNetAnalyzer [11] (Additional file 2). The essentiality of signaling components obtained from the perturbation results is summarized in Table S3 (Additional file 2). We can see from Figure 7(c) that both the ESM measure and the SP measure capture well the essentiality of the T cell receptor signaling components, whereas the other two methods do not. The results by using AP1 as the output, given in Figure S4 (Additional file 4), also support this conclusion.