### Implementation

The *cytoHubba* plugin is implemented in Java on top of the Cytoscape API. It provides eleven node ranking methods for evaluating the importance of nodes in a biological network: Degree [1], Edge Percolated Component [9], Maximum Neighborhood Component [10], Density of Maximum Neighborhood Component [10], Maximal Clique Centrality (proposed in this paper), Bottleneck [11], EcCentricity [12], Closeness [13], Radiality [14], Betweenness [15], and Stress [16]. Each method is associated with a function *F* that assigns every node *v* a numeric score *F*(*v*). We say that a node *u* ranks higher than another node *v* if the score of *u* (*i.e.* *F*(*u*)) is greater than that of *v* (*i.e.* *F*(*v*)). The 11 methods fall into two major categories: local and global. To score a node, a local method considers only the relationship between the node and its direct neighbors, whereas a global method examines the relationship between the node and the entire network.

### The algorithms

#### A. Local-based Methods

We first state the notation used to describe these methods. We assume that a biological network *G* = (*V, E*) is an undirected network, where *V* is the set of nodes and *E* is the set of edges. We also write *G* = (*V*(*G*), *E*(*G*)), where *V*(*G*) and *E*(*G*) are the node set and the edge set of *G*, respectively. For a set *S*, |*S*| denotes its cardinality (*i.e.* the number of elements in the set).

Local-based methods consider only the direct neighborhood of a vertex. Given a node *v*, *N*(*v*) denotes the set of its neighbors. The four local-based methods are as follows:

**1. Degree method (Deg)**

*Deg*(*v*)=|*N*(*v*)|.

**2. Maximum Neighborhood Component (MNC)**

*MNC*(*v*) = |*V*(*MC*(*v*))|, where *MC*(*v*) is a maximum connected component of *G*[*N*(*v*)], and *G*[*N*(*v*)] is the subgraph of *G* induced by *N*(*v*).

**3. Density of Maximum Neighborhood Component (DMNC)**

Based on MNC, Lin *et al.* proposed $DMNC(v) = |E(MC(v))| / |V(MC(v))|^{\varepsilon}$, where $\varepsilon = 1.7$ [10].
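The three definitions above can be sketched in a few lines of Python. This is an illustration only, not the plugin's Java implementation: the helper names (`neighbors_map`, `largest_neighborhood_component`) are hypothetical, self-loops are ignored by assumption, and when several components of *G*[*N*(*v*)] tie for the largest size the sketch simply keeps the first one it finds, which can affect DMNC.

```python
from collections import deque

def neighbors_map(edges):
    """Build {node: set(neighbors)} from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # assumption of this sketch: self-loops are ignored
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def largest_neighborhood_component(adj, v):
    """Node set of a largest connected component of G[N(v)]."""
    nbrs = adj[v]
    best, seen = set(), set()
    for start in nbrs:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x] & nbrs:  # stay inside the induced subgraph
                if y not in comp:
                    comp.add(y)
                    queue.append(y)
        seen |= comp
        if len(comp) > len(best):  # ties keep the first component found
            best = comp
    return best

def deg(adj, v):
    """Deg(v) = |N(v)|."""
    return len(adj[v])

def mnc(adj, v):
    """MNC(v) = |V(MC(v))|."""
    return len(largest_neighborhood_component(adj, v))

def dmnc(adj, v, eps=1.7):
    """DMNC(v) = |E(MC(v))| / |V(MC(v))|**eps."""
    comp = largest_neighborhood_component(adj, v)
    if not comp:
        return 0.0
    # Each edge inside the component is seen from both endpoints.
    n_edges = sum(len(adj[x] & comp) for x in comp) // 2
    return n_edges / len(comp) ** eps
```

For a node whose neighbors are pairwise non-adjacent, every component of the neighborhood is a single node, so MNC is 1 and DMNC is 0.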

**4. Maximal Clique Centrality (MCC)**

To increase the sensitivity and specificity, we propose MCC to discover featured nodes. The intuition behind MCC is that essential proteins tend to be clustered in a yeast protein-protein interaction network [17]. Given a node *v*, the MCC of *v* is defined as $MCC(v) = \sum_{C \in S(v)} (|C| - 1)!$, where *S*(*v*) is the collection of maximal cliques that contain *v*, and (|*C*| − 1)! is the product of all positive integers less than |*C*|. If there is no edge between the neighbors of the node *v*, then every maximal clique containing *v* is a single edge contributing 1! = 1, so *MCC*(*v*) is equal to the degree of *v*.
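A minimal sketch of the MCC definition follows. It enumerates maximal cliques with the Bron-Kerbosch algorithm with pivoting, which is a choice made for this illustration; the source does not say which enumeration strategy the plugin uses. The graph is assumed to be given as a dictionary `adj` mapping each node to the set of its neighbors, without self-loops. Taken literally, the formula gives an isolated node the score 0! = 1, since the node itself forms a maximal clique.

```python
from math import factorial

def maximal_cliques(adj):
    """Yield every maximal clique of the graph (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            if r:
                yield frozenset(r)
            return
        # Pivot on the node with the most neighbors among the candidates.
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for n in list(p - adj[pivot]):
            yield from expand(r | {n}, p & adj[n], x & adj[n])
            p = p - {n}
            x = x | {n}
    yield from expand(set(), set(adj), set())

def mcc_scores(adj):
    """MCC(v) = sum of (|C| - 1)! over all maximal cliques C containing v."""
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        weight = factorial(len(clique) - 1)
        for v in clique:
            scores[v] += weight
    return scores
```

On a triangle *a*, *b*, *c* with a pendant node *d* attached to *a*, the maximal cliques are {*a*, *b*, *c*} and {*a*, *d*}, so *MCC*(*a*) = 2! + 1! = 3. On a star, every maximal clique is an edge and the score of the center equals its degree, as stated above.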

#### B. Global-based methods

In *cytoHubba* we implement six node ranking methods based on shortest paths and one method based on percolated connectivity. Before introducing the shortest-path-based methods, we define some notation. The length of a shortest path between nodes *u* and *v* is denoted *dist*(*u, v*). Let *C*(*v*) be the component that contains node *v*. Because *dist*(*u, v*) is infinite if *C*(*u*) ≠ *C*(*v*), the original methods of this category cannot be applied to networks with disconnected components. To overcome this problem, we enhance the original methods [11–16] so that, for a connected network, the score of a node computed by the enhanced method is the same as that computed by the original one.

**1. Closeness (Clo)**

$$Clo(v) = \sum_{w \in V} \frac{1}{dist(v, w)}$$
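As an illustration of how this form of closeness copes with disconnected networks, the sketch below computes it with a breadth-first search. Two points are assumptions of the sketch rather than statements from the source: the term *w* = *v* is skipped (its distance is 0), and unreachable nodes contribute 1/∞ = 0, which is why they can simply be left out of the sum. The graph is assumed to be a dictionary `adj` mapping each node to the set of its neighbors.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths from source to every node in its component."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def closeness(adj, v):
    """Clo(v): sum of 1/dist(v, w) over all reachable nodes w other than v."""
    return sum(1.0 / d for w, d in bfs_distances(adj, v).items() if w != v)
```

For the path *a*-*b*-*c* plus a separate edge *d*-*e*, this gives *Clo*(*a*) = 1 + 1/2 and *Clo*(*d*) = 1; nodes in the other component add nothing.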

**2. EcCentricity (EC)**

$$EC(v) = \frac{|V(C(v))|}{|V|} \times \frac{1}{\max\{dist(v, w) : w \in C(v)\}}$$

**3. Radiality (Rad)**

$$Rad(v) = \frac{|V(C(v))|}{|V|} \times \frac{\sum_{w \in C(v)} \left(\Delta_{C(v)} + 1 - dist(v, w)\right)}{\max\{dist(v, w) : w \in C(v)\}}$$

where $\Delta_{C(v)}$ is the maximum distance between any two vertices of the component *C*(*v*).
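The EcCentricity and Radiality formulas can be sketched as follows. This is a literal reading under stated assumptions, not the plugin's code: the sum in Rad is taken over every *w* in *C*(*v*) including *v* itself, an isolated node (maximum distance 0) is given the score 0 to avoid division by zero, and |*V*| is taken to be the number of keys in the adjacency dictionary `adj`, so isolated nodes must appear there with an empty neighbor set.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths from source to every node in its component."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def eccentricity_score(adj, v):
    """EC(v) = |V(C(v))| / |V| * 1 / max dist(v, w) over w in C(v)."""
    dist = bfs_distances(adj, v)
    farthest = max(dist.values())
    if farthest == 0:
        return 0.0  # assumption: isolated nodes score 0
    return (len(dist) / len(adj)) / farthest

def radiality_score(adj, v):
    """Rad(v), with the sum taken over all w in C(v), v included (assumption)."""
    dist = bfs_distances(adj, v)
    farthest = max(dist.values())
    if farthest == 0:
        return 0.0  # assumption: isolated nodes score 0
    # Diameter of the component: largest BFS distance from any of its nodes.
    diameter = max(max(bfs_distances(adj, w).values()) for w in dist)
    total = sum(diameter + 1 - d for d in dist.values())
    return (len(dist) / len(adj)) * total / farthest
```

The factor |*V*(*C*(*v*))|/|*V*| scales each score by the share of the network that the node's component covers, so it equals 1 for a connected network.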

**4. BottleNeck (BN)**

Let $T_s$ be a shortest path tree rooted at node *s*. $BN(v) = \sum_{s \in V} p_s(v)$, where $p_s(v) = 1$ if more than $|V(T_s)|/4$ paths from node *s* to other nodes in $T_s$ meet at the vertex *v*; otherwise $p_s(v) = 0$.

**5. Stress (Str)**

$Str(v) = \sum_{s \ne t \ne v \in C(v)} \sigma_{st}(v)$, where $\sigma_{st}(v)$ is the number of shortest paths from node *s* to node *t* which use the node *v*.

**6. Betweenness (BC)**

$BC(v) = \sum_{s \ne t \ne v \in C(v)} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the number of shortest paths from node *s* to node *t*.
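Stress and Betweenness share the same path counts, so one sketch can cover both. It relies on a standard identity: *v* lies on a shortest *s*-*t* path exactly when *dist*(*s*, *v*) + *dist*(*v*, *t*) = *dist*(*s*, *t*), and in that case the number of shortest *s*-*t* paths through *v* is the number of shortest *s*-*v* paths times the number of shortest *v*-*t* paths. The sum here runs over ordered pairs (*s*, *t*), which is an assumption about the index set; halve the results for the unordered convention. This simple version favors clarity over speed and is not how the plugin is necessarily implemented.

```python
from collections import deque

def bfs_counts(adj, source):
    """Return (dist, sigma): BFS distances and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]  # every shortest path to x extends to y
    return dist, sigma

def stress_and_betweenness(adj, v):
    """Return (Str(v), BC(v)) summed over ordered pairs s != t in C(v)."""
    component = bfs_counts(adj, v)[0]
    info = {s: bfs_counts(adj, s) for s in component}
    dist_v, sigma_v = info[v]
    stress, betweenness = 0, 0.0
    for s in component:
        if s == v:
            continue
        dist_s, sigma_s = info[s]
        for t in component:
            if t == v or t == s:
                continue
            if dist_s[v] + dist_v[t] == dist_s[t]:
                through = sigma_s[v] * sigma_v[t]  # sigma_st(v)
                stress += through
                betweenness += through / sigma_s[t]
    return stress, betweenness
```

On the 4-cycle *a*-*b*-*c*-*d*, node *b* carries one of the two shortest paths between *a* and *c* in each direction, so its stress is 2 and its betweenness is 1.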

**7. Edge Percolated Component (EPC)**

Given a threshold between 0 and 1 inclusive, we create 1000 reduced networks. For each reduced network, we assign a random number between 0 and 1 to every edge and remove the edges whose random numbers are less than the threshold.

Let $G_k$ be the reduced network generated by the *k*-th reduction. If nodes *v* and *t* are connected in $G_k$, set $\delta_{vt}^{k} = 1$; otherwise $\delta_{vt}^{k} = 0$. For a node *v* in *G*, *EPC*(*v*) is defined as $EPC(v) = \frac{1}{|V|} \sum_{k=1}^{1000} \sum_{t \in V} \delta_{vt}^{k}$.
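The percolation procedure can be sketched with a union-find structure. The function name `epc_scores`, the `seed` parameter, and the use of Python's `random` module are choices made for this illustration. The sketch also assumes that a node counts as connected to itself, so the inner sum over *t* equals the size of the component containing *v* in each reduced network.

```python
import random

def epc_scores(nodes, edges, threshold, rounds=1000, seed=0):
    """EPC(v) = (1/|V|) * sum over rounds of the size of v's component."""
    rng = random.Random(seed)
    nodes = list(nodes)
    totals = {v: 0 for v in nodes}
    for _ in range(rounds):
        parent = {v: v for v in nodes}

        def find(x):
            # Follow parent links to the root, halving the path on the way.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for u, v in edges:
            # An edge is removed when its random number is below the threshold.
            if rng.random() >= threshold:
                parent[find(u)] = find(v)
        size = {}
        for v in nodes:
            root = find(v)
            size[root] = size.get(root, 0) + 1
        for v in nodes:
            totals[v] += size[find(v)]  # sum over t of delta_vt^k
    return {v: totals[v] / len(nodes) for v in nodes}
```

With a threshold of 0 no edge is removed, so each node scores 1000 times its component size divided by |*V*|; with a threshold of 1 every edge is removed and each node scores 1000/|*V*|.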

### The demo dataset and evaluation

The protein-protein interaction data used in this study are from the Database of Interacting Proteins (DIP) [18] (http://dip.doe-mbi.ucla.edu, version: 20140117). Essential protein lists are collected from the *Saccharomyces* Genome Deletion Project (SGDP) [19] and the *Saccharomyces* Genome Database (SGD) [20]. The protein ID mapping table from UniProt ID to NCBI gene ID is downloaded from the UniProt FTP site.

The PPI network is loaded into Cytoscape and scored with the 11 methods using the *cytoHubba* plugin. The performance of each method is measured by how many essential proteins it includes in its top *x* ranked list (*x* = 10, 20, 30, ..., 100), expressed as precision:

$$\text{Precision} = \frac{\text{the number of essential proteins}}{\text{the number of proteins in the top } x \text{ ranked proteins}}$$
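The evaluation can be sketched as a short function. It assumes a dictionary `scores` mapping protein identifiers to the score of one ranking method and a set `essential` of essential protein identifiers; both names are hypothetical. Ties in score are broken by insertion order here, which is an arbitrary choice of this sketch.

```python
def precision_at(scores, essential, x):
    """Fraction of the top-x ranked proteins that are essential."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = ranked[:x]
    if not top:
        return 0.0  # assumption: an empty ranking has precision 0
    return sum(1 for protein in top if protein in essential) / len(top)
```

Calling `precision_at(scores, essential, x)` for *x* = 10, 20, ..., 100 yields one precision curve per ranking method.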