Volume 5 Supplement 3
The 2010 International Conference on Bioinformatics and Computational Biology (BIOCOMP 2010): Systems Biology
Biological network motif detection and evaluation
 Wooyoung Kim^{1},
 Min Li^{1, 2},
 Jianxin Wang^{2} and
 Yi Pan^{1}
DOI: 10.1186/1752-0509-5-S3-S5
© Kim et al. 2011
Published: 23 December 2011
Abstract
Background
Molecular-level biological data can be assembled into system-level data in the form of biological networks. Network motifs are defined as overrepresented small connected subgraphs in networks, and they have been used in many biological applications. Because network motif discovery is computationally challenging, previous algorithms have focused on computational efficiency. However, we believe that the biological quality of network motifs is also very important.
Results
We define biological network motifs as biologically significant subgraphs, and we refer to traditional network motifs as structural network motifs to distinguish the two. We develop five algorithms, namely EDGE GOBNM, EDGE BETWEENNESSBNM, NMFBNM, NMFGOBNM and VOLTAGEBNM, for efficient detection of biological network motifs, and introduce several evaluation measures: 'motifs included in complex', 'motifs included in functional module' and 'GO term clustering score'. Experimental results show that EDGE GOBNM and EDGE BETWEENNESSBNM perform better than existing algorithms, and that all of our algorithms can also be applied to find structural network motifs.
Conclusion
We provide new approaches to finding network motifs in biological networks. Our algorithms efficiently detect biological network motifs and also improve existing algorithms so that they find high-quality structural network motifs, which would be impossible using the existing algorithms alone. The performance of the algorithms is compared using our new evaluation measures in biological contexts. We believe that our work provides some guidelines for network motif research on biological networks.
Background
Systems biology focuses on the study of complex interactions in biological systems, rather than the study of individual molecules such as DNA, RNA, proteins and metabolites [1]. One of the goals of systems biology is to understand the structures of all molecules and their interactions at the system level. The major challenges are therefore understanding the dynamic structures of small molecules and determining their functions in a living cell. Various types of biological interactions have been expressed as networks, including transcriptional regulatory networks, signaling pathways, metabolic networks and protein-protein interaction (PPI) networks. Biological networks share some structural properties with other complex networks and exhibit specific features such as scale-free and small-world effects [2]. However, these properties have been questioned by Lacroix et al. [3] for a number of reasons, including the incompleteness of the networks and inconsistent link generation for the graphs. The analysis therefore extends to other network properties such as network clusters and network motifs.
As biological networks are massive and still growing, dividing a network into a number of clusters helps reveal specific local properties. A network motif, another concept describing local properties of a network, is defined as a small connected subgraph that appears frequently and uniquely in a network. Like a protein sequence motif, a network motif is defined as an over-repeated pattern, but it requires much more computation because the process involves isomorphism testing and repeated sampling to determine uniqueness. Network alignment [4] and network querying [5] are analogous to network motifs, but while network motifs are defined with structural information only, network alignment and network querying require both topological and biological information. Previous network motif discovery algorithms include exact counting and approximation algorithms. Exhaustive recursive search (ERS) [6], enumerate subgraphs (ESU) [7] and compact topological motifs [8] are exact counting algorithms. For efficient detection, several approximation algorithms have been proposed, including edge sampling (MFINDER) [6], a randomized version of ESU on a search tree (RANDESU) [9], and a tree-filtering search, NeMoFinder [10]. Furthermore, parallel search algorithms have been developed to make exact counting feasible [11, 12].
Network motifs are used in many applications on biological networks. The feed-forward loop (FFL) and bi-fan network motifs have been identified as typical patterns in different types of biological networks [13, 14]. Przulj et al. [15] used network motifs in the form of a relative graphlet frequency distance to distinguish different protein-protein interaction networks. Motif frequencies have also been exploited as classifiers for network model selection [16]. Milo et al. [17] showed that networks from different biological and technological domains can be classified into superfamilies on the basis of their motif significance profiles. Albert I. and Albert R. [18] successfully used network motifs to predict protein-protein interactions. According to the study by Conant and Wagner [19], network motifs in transcriptional regulatory networks are not evolutionarily conserved, while network motifs in PPI networks are evolutionarily related. In addition, network motifs have been extended to 'motif modes', each of which has a certain topology and a specific functional property [20].
Across these network motif applications, however, we notice several problems regarding the biological meaning of network motifs, on top of the computational challenge of detecting them. First, the biological quality of network motifs has not been validated thoroughly. A network motif is selected only by its structural uniqueness, and only a small number of instances of each type are biologically exemplified. Second, only a small portion of network motif instances are used in applications and the others are ignored. Third, non-motifs, that is, structurally insignificant subgraphs, are filtered out before any application and have not been analyzed in any study. Fourth, it is still questionable what network motifs really represent in biological networks.
Because we believe that the biological quality of network motifs is also significant, we define a biological network motif in this paper. Throughout this paper, we refer to a traditional network motif as a structural network motif to distinguish it from a biological network motif. Unlike structural network motifs, biological network motifs are biologically significant small connected subgraphs regardless of their structure. The biological significance is left unspecified in the definition, as it is assigned flexibly according to the goal of the application. We introduce the EDGE GOBNM, EDGE BETWEENNESSBNM, NMFBNM, NMFGOBNM and VOLTAGEBNM algorithms for efficient discovery of biological network motifs, and design new evaluation measures named 'motifs included in complex', 'motifs included in functional module' and 'GO term clustering score'. Our algorithms are compared with existing algorithms including ESU, RANDESU and MFINDER, and their performance is evaluated with the new measures introduced in this paper. The main idea of our algorithms is to reduce the number of subgraphs to search by removing a number of edges from the original network and, at the same time, to increase the discovery rate of biological network motifs. Experimental results on two S. cerevisiae PPI networks demonstrate that the EDGE GOBNM and EDGE BETWEENNESSBNM algorithms perform better than the other algorithms on most of the measures. In addition, we show that all of our algorithms are applicable to the discovery of structural network motifs as well.
This work makes three contributions to the study of network motifs: 1) We question the biological meaning of network motifs, which existing detection algorithms have not addressed. New motif search algorithms and evaluation measures are developed based on these questions. 2) We design several algorithms that combine the topological and biological information in a network. The algorithms further enrich existing algorithms in a biological context. 3) We develop a number of evaluation measures that quantify the biological importance of network motifs. To the best of our knowledge, this is the first time that systematic evaluation measures for network motifs have been suggested. With these contributions, we hope that our work provides some guidelines for research on network motifs in biological networks.
Results and Discussion
In this paper, we define biological network motifs as biologically meaningful network motifs and develop the EDGE GOBNM, EDGE BETWEENNESSBNM, NMFBNM, NMFGOBNM and VOLTAGEBNM algorithms for efficient detection of biological network motifs. For clarity, traditional network motifs are referred to as structural network motifs throughout this paper. The performance of each algorithm is compared using three evaluation measures, 'motifs included in complex', 'motifs included in functional module' and 'GO (Gene Ontology) term clustering score', which we design to assess the biological quality of network motifs. The algorithms and evaluation measures are described in detail in the "Methods" section.
Data sets
Comparison of the algorithms against different evaluation measures
Results of 4-node biological network motifs in the DIP Core network
Algorithm  Motifs included in complex  Motifs included in functional module  GO clustering score (BP)  GO clustering score (MF)  GO clustering score (CC)
ESU  .13  .205  .64  .51  .61 
RANDESU  .13  .208  .65  .28  .46 
MFINDER  .15  .299  .74  .57  .71 
EDGE GOBNM  .21  .479  .85  .70  .80 
EDGE BETWEENNESSBNM  .28  .392  .78  .60  .79 
NMFGOBNM  .18  .360  .78  .61  .75 
NMFBNM  .15  .230  .68  .54  .64 
VOLTAGEBNM  .26  .330  .77  .59  .75 
Results of 5-node biological network motifs in the DIP Core network
Algorithm  Motifs included in complex  Motifs included in functional module  GO clustering score (BP)  GO clustering score (MF)  GO clustering score (CC)
ESU  .07  .097  .67  .51  .63 
RANDESU  .07  .096  .66  .52  .62 
MFINDER  .09  .167  .75  .56  .72 
EDGE GOBNM  .08  .240  .87  .70  .79 
EDGE BETWEENNESSBNM  .14  .210  .81  .59  .76 
NMFGOBNM  .08  .169  .71  .59  .60 
NMFBNM  .13  .104  .65  .53  .61 
VOLTAGEBNM  .08  .121  .71  .50  .67 
Results of 4-node biological network motifs in the Y2k network
Algorithm  Motifs included in complex  Motifs included in functional module  GO clustering score (BP)  GO clustering score (MF)  GO clustering score (CC)
ESU  .501  .152  .61  .21  .67 
RANDESU  .491  .126  .61  .23  .65 
MFINDER  .586  .180  .65  .26  .72 
EDGE GOBNM  .603  .463  .94  .25  .90 
EDGE BETWEENNESSBNM  .904  .178  .82  .19  .84 
NMFGOBNM  .609  .434  .92  .27  .90 
NMFBNM  .819  .177  .76  .26  .80 
VOLTAGEBNM  .638  .200  .63  .26  .77 
Results of 5-node biological network motifs in the Y2k network
Algorithm  Motifs included in complex  Motifs included in functional module  GO clustering score (BP)  GO clustering score (MF)  GO clustering score (CC)
ESU  .281  .083  .69  .17  .76 
RANDESU  .305  .090  .71  .17  .77 
MFINDER  .431  .096  .73  .21  .80 
EDGE GOBNM  .362  .376  .99  .24  .96 
EDGE BETWEENNESSBNM  .814  .087  .89  .13  .91 
NMFGOBNM  .445  .257  .98  .18  .96 
NMFBNM  .643  .073  .80  .18  .83 
VOLTAGEBNM  .665  .089  .82  .19  .85 
Relationship between biological and structural network motifs
DIP Core statistical properties, from FANMOD
Label  Freq (Original)  Mean Freq (Random)  SDev (Random)  Z-score  P-value
C^  1.46%  5.9e-005%  3.04e-006  4813.3  < 10^{-3}
CN  10.21%  0.01%  < 10^{-6}  289.09  < 10^{-3}
CF  48.69%  42.22%  < 10^{-6}  17.31  < 10^{-3}
C~  0.48%  0.00%  0  undefined  < 10^{-3}
Cr  0.47%  0.23%  < 10^{-6}  16.28  < 10^{-3}
CR  38.65%  57.54%  < 10^{-6}  -52.17  > 10^{-2}
Y2k statistical properties, from FANMOD
Label  Freq (Original)  Mean Freq (Random)  SDev (Random)  Z-score  P-value
C~  4.66%  4.07e-006%  9.14e-007  51013  < 10^{-3}
C^  8.91%  < 10^{-2}  4.29e-005  2075.1  < 10^{-3}
CN  32.89%  0.021%  < 10^{-6}  225.64  < 10^{-3}
Cr  0.55%  1.14%  < 10^{-6}  -9.95  > 10^{-2}
CF  19.58%  41.82%  < 10^{-6}  -66.188  > 10^{-2}
CR  33.40%  57.06%  < 10^{-6}  -84.16  > 10^{-2}
Biological significance for biological network motifs
Table 7. Y2k network: the rates of motifs included in the 'rRNA processing' functional module in yeast, computed using equation (1).
Algorithm  C~  C^  CN  Cr  CF  CR 

ESU (Counts)  1.0(2,509)  1.0(5,152)  1.0(17,457)  1.0(434)  1.0(8,095)  1.0(15,953) 
RANDESU  .30  .32  .34  .36  .34  .34 
MFINDER  .78  .54  .31  .38  .16  .13 
EDGE GOBNM  .97  .97  .98  1.0  .99  .97 
EDGE BETWEENNESSBNM  .67  .64  .32  .57  .22  .16 
NMFGOBNM  .87  .88  .78  .89  .70  .73 
NMFBNM  .69  .39  .23  .22  .12  .90 
VOLTAGEBNM  .53  .38  .39  .39  .32  .31 
In Table 7, the first column lists all the algorithms run in this paper, and the other columns show, for each subgraph type, the recall of subgraphs included in the 'rRNA processing' functional module. The 'rRNA processing' functional module consists of 206 yeast proteins. All algorithms except ESU search only about 30% of the subgraphs that the ESU algorithm enumerates, yet EDGE GOBNM recovers over 90% of the subgraphs included in 'rRNA processing'. Furthermore, we observe that although Cr, CF and CR are structural network non-motifs, about 50% of the subgraphs included in 'rRNA processing' are of these non-motif types. This example shows that non-motifs also carry biological meaning; therefore the structural network motif, defined by its structural uniqueness, is insufficient to explain biological meaning.
Conclusions
In this paper, we provide new approaches to finding network motifs in biological networks. Structural network motifs are defined as small connected subgraphs that are frequently and uniquely repeated in a network. However, motivated by several issues that arose when we investigated a number of network motif applications, we propose to find biologically meaningful network motifs. Hence, we define biological network motifs as biologically meaningful k-node subgraphs, develop a number of algorithms for efficient detection of biological network motifs and introduce new evaluation measures. The algorithms reduce the number of subgraphs to search and, at the same time, increase the detection rate of biological network motifs. The algorithms fall into two classes: edge-removing algorithms and network clustering algorithms. EDGE GOBNM and EDGE BETWEENNESSBNM remove a number of edges based on GO terms and edge betweenness scores, respectively. The NMFBNM, NMFGOBNM and VOLTAGEBNM algorithms partition the network based on its topological properties or GO term relevance. All the algorithms introduced in this paper improve existing algorithms for high-quality structural network motif detection.
We also introduce a number of evaluation measures that assess the biological significance of each subgraph: 'motifs included in complex', 'motifs included in functional module' and 'GO term clustering score'. The biological meaning of biological network motifs is assigned based on these evaluation measures. We ran the algorithms on two PPI networks of S. cerevisiae and compared them using our new measures. An existing exhaustive search algorithm and two existing approximation algorithms were also run for comparison with our algorithms. EDGE GOBNM shows good overall results on all the measures, but EDGE BETWEENNESSBNM is the best on the 'motifs included in complex' measure.
The work in this paper can be extended further. Currently, the parameters of the algorithms are adjusted only to obtain a desired number of subgraphs. In the near future, the impact of the parameters on the results should be investigated. Besides the parameters, the balance between topological and biological information will be an important factor in designing a better algorithm. In addition, the current evaluation measures are limited to PPI networks. Comprehensive evaluation measures should be designed that apply to various types of biological networks. Finally, the work should be extended to weighted or directed networks for a more comprehensive analysis of biological network motifs.
Methods
Definitions and notations
Here average(f_{R}(m)) and std(f_{R}(m)) refer to the average and the standard deviation of the frequencies in random networks, respectively. Generally, a subgraph m with a P-value less than 0.01 or a Z-score greater than 2.0 is considered a network motif.
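As an illustration of the thresholds above, the following is a minimal sketch that computes a Z-score and an empirical P-value from a list of frequencies observed in random networks. It assumes the standard definitions (Z-score as the difference between the original frequency and the random mean, divided by the random standard deviation; P-value as the fraction of random networks in which the subgraph is at least as frequent as in the original network). The function names, the use of the population standard deviation and the example numbers are assumptions made for illustration only.

```python
from statistics import mean, pstdev

def z_score(f_original, random_freqs):
    """Z-score of a subgraph frequency against its frequencies in random networks."""
    sd = pstdev(random_freqs)  # population standard deviation (an assumption)
    if sd == 0:
        return None  # undefined when the random frequencies do not vary
    return (f_original - mean(random_freqs)) / sd

def empirical_p_value(f_original, random_freqs):
    """Fraction of random networks where the subgraph is at least as frequent."""
    hits = sum(1 for f in random_freqs if f >= f_original)
    return hits / len(random_freqs)

def is_structural_motif(f_original, random_freqs, p_cut=0.01, z_cut=2.0):
    """Apply the thresholds quoted in the text: P-value < 0.01 or Z-score > 2.0."""
    z = z_score(f_original, random_freqs)
    p = empirical_p_value(f_original, random_freqs)
    return p < p_cut or (z is not None and z > z_cut)

# Hypothetical frequencies of one subgraph type in eight random networks.
random_freqs = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_score(10, random_freqs), empirical_p_value(10, random_freqs))
```

Returning None for a zero standard deviation mirrors the 'undefined' Z-score that can arise when a subgraph never varies across the random networks.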
We define a biological network motif g as a small connected subgraph of size k that has a topological property as well as biological meaning. For clarity, a traditional network motif is referred to as a structural network motif throughout this paper, and biological network motifs and structural network motifs have a many-to-many relationship. We emphasize that we do not categorize all biological network motifs into classes like the 'motif modes' in the study by Lee and Tzou [25], where the number of motif modes reaches into the millions. Instead, we assume that biological network motifs are application dependent and are therefore categorized flexibly according to the application. To decide whether a specific subgraph is a biological network motif, we need measures, which are presented later in this section. From now on, G = (V, E) is a target (original) network, G' = (V, E') is a modified network, n is the number of vertices and m is the number of edges in G.
Description of Algorithm
Structural network motifs are determined either exactly (exhaustively) or approximately. As exhaustive search is infeasible in large networks, approximation algorithms are used in many applications in practice. In this study, we provide a number of algorithms that were originally designed to detect biological network motifs, but that also improve existing algorithms for high-quality structural network motif discovery. Some algorithms use structural information alone or biological information alone, and others combine structural and biological information.
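To make the exhaustive option concrete, the following is a minimal sketch of exact enumeration of connected k-node subgraphs in the style of the ESU algorithm [7]. It is an illustrative reading of that algorithm, not the implementation used in the experiments; the adjacency-dictionary representation, the function name and the toy graph are assumptions, and vertex labels are assumed to be mutually comparable.

```python
def enumerate_subgraphs(adj, k):
    """Enumerate the vertex sets of all connected k-node subgraphs (ESU-style).

    adj maps each vertex to the set of its neighbors (undirected graph).
    """
    results = []

    def extend(sub, ext, root):
        if len(sub) == k:
            results.append(frozenset(sub))
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            # Vertices already in the subgraph or adjacent to it.
            covered = set(sub) | {u for s in sub for u in adj[s]}
            # Exclusive neighbors of w with a label larger than the root.
            new_ext = ext | {u for u in adj[w] if u > root and u not in covered}
            extend(sub | {w}, new_ext, root)

    for v in adj:
        extend({v}, {u for u in adj[v] if u > v}, v)
    return results

# Toy graph: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(sorted(s) for s in enumerate_subgraphs(toy, 3)))
```

Each connected vertex set is reported exactly once because a subgraph is only grown from its smallest-labeled vertex.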
EdgeRemoving Algorithms
We present two algorithms that remove 'insignificant' edges according to two different criteria. The EDGE GOBNM (EDGE GO for biological network motif) algorithm removes edges based on their associated Gene Ontology terms. The EDGE BETWEENNESSBNM (EDGE BETWEENNESS for biological network motif) algorithm removes edges based on their edge betweenness scores. Since the EDGE GOBNM algorithm uses the Gene Ontology (GO) terms associated with the nodes, it is applicable only to gene- or protein-related networks. Although the edge betweenness score is an existing measure used for network clustering [26], this is the first time it is used for network motif detection.
EDGEGOBNM algorithm
We define the Edge-GO set of an edge e as the set of all GO terms associated with both end points of e, and the Edge-GO depth of e as the maximum depth of the GO terms in its Edge-GO set. In the EDGE GOBNM algorithm, a threshold GO term depth d is given as a parameter, and the edges whose Edge-GO depth is less than d are removed. Algorithm 1 describes the detailed steps of the EDGE GOBNM algorithm.
Algorithm 1: EDGE GOBNM
input: Graph G = (V, E); d: a GO depth threshold; k: the motif size.
output: a number of subgraphs with size k.
1 RE ← ∅
2 E' ← E
3 for ∀e ∈ E do
4   GOset ← all GO terms associated with both of the endpoints of e
5   D ← maximum depth of GOset
6   if D < d then
7     RE = RE ∪ {e}
8     E' = E' \ {e}
9 Let G' = (V, E')
10 Enumerate all k-subgraphs from G'
Line 10 in Algorithm 1 produces all the k-size subgraphs in the reduced graph G', and any existing exact counting algorithm can be used for this task. In the EDGE GOBNM algorithm, different depth thresholds d result in different numbers of removed edges, and we determine the threshold depth experimentally to obtain a desired number of subgraphs. More edges are removed as the depth threshold increases, which in turn reduces the number of subgraphs to search. This work is motivated by the paper [20], which reveals that different levels of GO terms lead to different motif modes. The EDGE GOBNM algorithm is deterministic, and the whole process except line 10 runs in time linear in the number of edges, m. In most cases, the reduced network consists of unbalanced clusters, where a few clusters contain most of the vertices and most of the clusters consist of a small number of vertices.
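Lines 1 to 9 of Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions: GO annotations are given as a dictionary from vertex to a set of term identifiers, term depths are given as a dictionary from term to depth, and an edge whose endpoints share no GO term is treated as having depth 0. The function name and the toy annotations are hypothetical.

```python
def edge_go_filter(edges, go_terms, go_depth, d):
    """Split edges into (kept, removed) by Edge-GO depth.

    An edge is removed when the maximum depth of the GO terms shared by
    both endpoints is less than the threshold d.
    """
    kept, removed = [], []
    for u, v in edges:
        shared = go_terms.get(u, set()) & go_terms.get(v, set())
        # Assumption: an empty Edge-GO set counts as depth 0.
        depth = max((go_depth[t] for t in shared), default=0)
        if depth < d:
            removed.append((u, v))
        else:
            kept.append((u, v))
    return kept, removed

# Hypothetical annotations for three proteins A, B and C.
edges = [("A", "B"), ("B", "C"), ("A", "C")]
go_terms = {"A": {"t1", "t2"}, "B": {"t2", "t3"}, "C": {"t3"}}
go_depth = {"t1": 2, "t2": 5, "t3": 3}
print(edge_go_filter(edges, go_terms, go_depth, d=4))
```

The kept edges define E' of the reduced graph G', on which any exact enumeration routine can then be run.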
EDGEBETWEENNESSBNM algorithm
The EDGE BETWEENNESSBNM algorithm uses topological information to remove some of the edges. The EDGE BETWEENNESS algorithm was initially introduced by Girvan and Newman [26] to produce network clusters using the betweenness score of each edge. Network modularization [28] is supported by this measure, and many protein modules have been discovered successfully with it. The EDGE BETWEENNESSBNM algorithm goes through all edges to compute their edge betweenness scores, denoted EBScore: EBScore(e) is the number of shortest paths, over all pairs of vertices, that run along the edge e. The edge with the highest EBScore is then removed. This process is repeated until the desired number of edges has been removed. The detailed procedure of EDGE BETWEENNESSBNM is described in Algorithm 2.
Algorithm 2: EDGE BETWEENNESSBNM
input: Graph G = (V, E); r: the number of edges to remove; k: the motif size.
output: a number of subgraphs with size k.
1 RE ← ∅
2 E' ← E
3 R ← 0
4 while R < r do
5   for all pairs of vertices in V, obtain the shortest path, SP, in (V, E')
6   ∀e ∈ E', let EBscore(e) = number of SP's containing e in the path
7   Let ed be the edge with maximum EBscore
8   RE = RE ∪ {ed}
9   E' = E' \ {ed}
10   R = R + 1
11 Let G' = (V, E')
12 Enumerate all k-subgraphs from G'
Excluding line 12 of Algorithm 2, the EDGE BETWEENNESSBNM algorithm runs in O(rmn) time, where r is the number of edges to remove, m is the number of edges and n is the number of vertices. The EDGE BETWEENNESSBNM algorithm produces relatively balanced network clusters and is also a deterministic algorithm.
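The loop in lines 4 to 10 of Algorithm 2 can be sketched as below. This is a minimal illustration, not the authors' implementation: the betweenness score is computed with a Brandes-style breadth-first accumulation, which splits credit fractionally when a vertex pair has several shortest paths (an assumption that may differ from the EBScore definition above), and ties between edges are broken by vertex label so that the result is deterministic. The function names and the toy graph are hypothetical, and the graph is assumed to have no self-loops.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph (Brandes-style)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths per vertex.
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path credit from the farthest vertices back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += credit
                delta[v] += credit
    # Every unordered vertex pair was visited from both of its endpoints.
    return {e: val / 2.0 for e, val in score.items()}

def remove_high_betweenness_edges(adj, r):
    """Remove r edges, recomputing the scores after each removal."""
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    removed = []
    for _ in range(r):
        scores = edge_betweenness(adj)
        if not scores:
            break
        e = max(scores, key=lambda x: (scores[x], sorted(x)))
        u, v = tuple(e)
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append(e)
    return adj, removed

# Toy graph: two triangles joined by the bridge 3-4.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
reduced, removed = remove_high_betweenness_edges(toy, 1)
print(removed)
```

In the toy graph the bridge carries all nine shortest paths between the two triangles, so it is the first edge to be removed.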
Clustering Algorithms
Another way of reducing a network is to partition it into smaller subnetworks and remove the edges between clusters. In this work, we present three clustering algorithms: NMFBNM (Nonnegative matrix factorization for biological network motif), NMFGOBNM (Nonnegative matrix factorization with GO term for biological network motif) and VOLTAGEBNM (Voltage clustering for biological network motif). The voltage clustering algorithm has been used for network clustering before, but not for network motif discovery.
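The step shared by all three clustering algorithms, removing the edges that run between clusters, can be sketched as follows. The sketch assumes that a clustering is available as a dictionary from vertex to cluster label; the function name and the toy data are hypothetical.

```python
def remove_inter_cluster_edges(edges, cluster_of):
    """Split edges into (kept, removed); an edge is removed when its
    endpoints belong to different clusters."""
    kept, removed = [], []
    for u, v in edges:
        if cluster_of[u] == cluster_of[v]:
            kept.append((u, v))
        else:
            removed.append((u, v))
    return kept, removed

# Toy example: vertices 1-3 in cluster 0, vertices 4-6 in cluster 1.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
cluster_of = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
print(remove_inter_cluster_edges(edges, cluster_of))
```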
NMFBNM algorithm
Nonnegative matrix factorization (NMF) has been used to cluster various kinds of data, such as face images, text corpora and gene expression data. Initially used as a dimension reduction technique, NMF has been applied successfully to many clustering tasks with additional sparseness constraints [29–31]. In this work, we apply NMF to the efficient detection of biological network motifs. The detailed process of NMFBNM is described in Algorithm 3.
Algorithm 3: NMF(GO)BNM
input: Graph G = (V, E); c: the number of clusters; k: the motif size; (d: a GO depth threshold); η and β for sparse NMF.
output: a number of subgraphs with size k.
1 RE ← ∅
2 E' ← E
3 Let CL_{1}, ⋯, CL_{c} = ∅.
4 Construct a data matrix A from G.
5 Run sparse NMF on A and get a c × n matrix H
6 for all the columns in H do
7   Let ${h}^{j}={\left({h}_{1}^{j},\cdots ,{h}_{c}^{j}\right)}^{T}$ be the j-th column vector of H.
8   if ${h}_{i}^{j}$ is the largest entry in h^{j} then
9     put the vertex v_{j} in CL_{i}.
10 for ∀e ∈ E do
11   if the endpoints of e lie in different clusters CL_{i} then
12     RE = RE ∪ {e}
13     E' = E' \ {e}
14 Let G' = (V, E')
15 Enumerate all k-subgraphs from G'
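Line 5 of Algorithm 3 relies on sparse NMF, whose objective is not reproduced in this text. Assuming it follows the sparse NMF formulation of Kim and Park [31] with sparseness imposed on H (an assumption), the factorization A ≈ WH, with W of size n × c and H of size c × n, would be obtained by minimizing:

```latex
\min_{W,\,H}\; \frac{1}{2}\left\{ \left\| A - WH \right\|_{F}^{2}
  + \eta \left\| W \right\|_{F}^{2}
  + \beta \sum_{j=1}^{n} \left\| H(:,j) \right\|_{1}^{2} \right\}
\quad \text{subject to } W \ge 0,\; H \ge 0 .
```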
Here, $\|\cdot\|_{F}^{2}$ is the square of the Frobenius norm, $\|\cdot\|_{1}^{2}$ is the square of the L_{1} norm, and H(:,j) is the j-th column of the matrix H. Two parameters must be given: η for sparseness and β for the balance between sparseness and correctness. Intuitively, the matrix H gives the clustering information, as described in lines 6 to 9. Sparse NMF is described in detail in the paper [31] by Kim and Park. Excluding the last step in Algorithm 3, NMFBNM runs in time linear in the size of A at each iteration, and it converges to a stable point, which is not necessarily unique, after a number of iterations.
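Lines 6 to 9 of Algorithm 3 amount to taking, for each vertex, the row index of the largest entry in its column of H. A minimal sketch follows, assuming H is stored as a list of c rows with n entries each and that ties go to the lowest cluster index; the function name and the example matrix are hypothetical.

```python
def clusters_from_H(H):
    """Return, for each of the n columns of a c x n matrix H, the index of
    the row holding the largest entry (the cluster of that vertex)."""
    c, n = len(H), len(H[0])
    return [max(range(c), key=lambda i: H[i][j]) for j in range(n)]

# Hypothetical 2 x 3 coefficient matrix: two clusters, three vertices.
H = [[0.9, 0.1, 0.4],
     [0.1, 0.8, 0.6]]
print(clusters_from_H(H))
```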
NMFGOBNM algorithm
VOLTAGEBNM algorithm
The VOLTAGE clustering algorithm was developed by Wu and Huberman [32] to cluster a network based on voltage drops. The algorithm first generates a number of candidate clusters using Kirchhoff's equations [33], which state that the total current flowing into each node sums to zero. From the candidate clusters, the seed that appears most frequently is selected, and the neighboring vertices of this seed are collected to form a cluster. The process is repeated until the desired number of clusters is obtained. The number of clusters is later adjusted if the seeds are too close to each other. An exact solution for this algorithm requires O(V^{3}) time, but Wu and Huberman [32] provide an approximate solution in O(V + E). In this paper, we use the VOLTAGE clustering algorithm to design the VOLTAGEBNM (voltage for biological network motif) algorithm for efficient discovery of biological network motifs, as shown in Algorithm 4. We emphasize that the VOLTAGEBNM algorithm is easy and fast, but it is a nondeterministic algorithm because the randomly selected seeds lead to quite different results each time it runs.
Algorithm 4: VOLTAGEBNM
input: Graph G = (V, E); c: the number of clusters; k: the motif size.
output: a number of subgraphs with size k.
1 RE ← ∅
2 E' ← E
3 Let CL_{1}, ⋯, CL_{c} = ∅.
4 m ← 0.
5 while (m ≤ c) do
  // Generate c candidate clusters.
6   Pick a vertex pair, source and sink.
7   Compute the voltage of each vertex of graph G using source and sink.
8   Group the vertices into two clusters (high/low).
9   Store the resulting candidate clusters.
10   m = m + 2
11 l ← 1
12 while l < c do
  // Generate c - 1 clusters.
13   Pick the cluster seed s that appears most often in the candidate clusters.
14   Obtain the vertices co-occurring with s, and put them in cluster CL_{l}.
15   Remove s and all its co-occurring vertices from the candidate clusters.
16   l = l + 1.
17 Remaining unassigned vertices belong to cluster CL_{c}.
18 for ∀e ∈ E, if e lies between clusters of CL_{i} then
19   RE = RE ∪ {e}
20   E' = E' \ {e}
21 Let G' = (V, E')
22 Enumerate all k-subgraphs from G'
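Lines 6 to 8 of Algorithm 4 can be sketched as below. The sketch assumes the iterative approximation commonly attributed to Wu and Huberman [32]: the source is fixed at voltage 1, the sink at voltage 0, and every other vertex is repeatedly set to the average voltage of its neighbors. The fixed number of rounds, the split at a threshold of 0.5 and all function names are assumptions made for illustration; the original method chooses the split point differently.

```python
def voltages(adj, source, sink, rounds=100):
    """Approximate vertex voltages by repeated neighbor averaging."""
    volt = {v: 0.0 for v in adj}
    volt[source] = 1.0
    for _ in range(rounds):
        for v in adj:
            if v == source or v == sink or not adj[v]:
                continue  # fixed poles and isolated vertices are not updated
            volt[v] = sum(volt[u] for u in adj[v]) / len(adj[v])
    return volt

def split_high_low(volt, threshold=0.5):
    """Group vertices into a high-voltage and a low-voltage candidate cluster."""
    high = {v for v, x in volt.items() if x >= threshold}
    low = set(volt) - high
    return high, low

# Toy graph: two triangles joined by the bridge 3-4; source 1, sink 6.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
volt = voltages(toy, source=1, sink=6)
print(split_high_low(volt))
```

On the toy graph the largest voltage drop occurs across the bridge, so the two triangles end up in different candidate clusters.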
Various algorithms used for the detection of biological network motifs
Algorithm  Type  Time before ESU  Parameter  Deterministic 

EDGE GOBNM  EdgeRemoving  O(E)  d  Yes 
EDGE BETWEENNESSBNM  EdgeRemoving  O(rEV)  r  Yes 
NMFGOBNM  Clustering  O(E(V + l))  d, c, η, β  No 
NMFBNM  Clustering  O(EV)  c, η, β  No 
VOLTAGEBNM  Clustering  O(E + V)  c  No 
Evaluation Measures
A network motif is defined as a subgraph that is frequently and uniquely represented in a network, and it is determined through structural uniqueness, measured by the P-value (9) or the Z-score (3). Structural uniqueness alone, however, is an inappropriate validation for motifs in biological networks. Therefore, in this study we design several biological evaluation measures in addition to topological uniqueness. These are called 'motifs included in complex', 'motifs included in functional module' and 'GO (Gene Ontology) term clustering score'. Protein complexes are groups of proteins that interact with each other at the same time and place in a cell, whereas functional modules are groups of proteins that bind to participate in different cellular processes at different times. Currently, these evaluation measures are designed specifically for PPI networks. More comprehensive validation measures should be developed in the near future.
Motifs included in complex
Motifs included in functional module
In our experiments, the data on protein complexes and functional modules are obtained from the MIPS [34] server.
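The defining formulas for these two measures are not reproduced in this text. As a rough sketch only, assuming that a subgraph counts as 'included' when all of its vertices belong to a single complex (or a single functional module), the rate could be computed as follows; the function name, this inclusion criterion and the toy data are assumptions.

```python
def fraction_included(subgraphs, groups):
    """Fraction of subgraphs whose vertices all fall inside at least one group.

    subgraphs: iterable of vertex collections found by a motif algorithm.
    groups: list of vertex sets (protein complexes or functional modules).
    """
    subgraphs = list(subgraphs)
    if not subgraphs:
        return 0.0
    hits = sum(1 for s in subgraphs if any(set(s) <= g for g in groups))
    return hits / len(subgraphs)

# Toy example: one hypothetical complex containing proteins 1 to 4.
subgraphs = [{1, 2, 3}, {2, 3, 4}, {4, 5, 6}]
complexes = [{1, 2, 3, 4}]
print(fraction_included(subgraphs, complexes))
```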
GO term clustering score
where min(pi) is the P-value of each subgraph, n_{s} is the number of significant subgraphs and n_{i} is the number of insignificant subgraphs. A higher GO term clustering score indicates a better algorithm. Since GO terms have three independent aspects, BP, MF and CC, we have three types of this measure: the BP GO term clustering score, the MF GO term clustering score and the CC GO term clustering score.
List of abbreviations
BNM: Biological Network Motif
GO: Gene Ontology
BP: Biological Process
MF: Molecular Function
CC: Cellular Component
DAG: Directed Acyclic Graph
SP: Shortest Path
NMF: Nonnegative Matrix Factorization
ERS: Exhaustive Recursive Search
ESU: Enumerate SUbgraph
RANDESU: Randomized ESU
Declarations
Acknowledgements
The work of Yi Pan is supported in part by the National Science Foundation Grants CCF0514750, CCF0646102 and CNS0831634. Wooyoung Kim is supported by an MBD Fellowship from Georgia State University. The work of Min Li and Jianxin Wang is supported in part by the National Natural Science Foundation of China under Grant No. 61003124 and No. 61073036, the Ph.D. Programs Foundation of Ministry of Education of China No. 20090162120073, and the Freedom Explore Program of Central South University No. 201012200124. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
This article has been published as part of BMC Systems Biology Volume 5 Supplement 3, 2011: BIOCOMP 2010  The 2010 International Conference on Bioinformatics & Computational Biology: Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/17520509/5?issue=S3.
Authors’ Affiliations
References
 Kitano H: Foundations of Systems Biology. Cambridge, MA.: The MIT Press; 2001.Google Scholar
 Barabasi AL, Oltvai ZN: Network biology: understanding the cell's functional organization. Nat Rev Genet 2004,5(2):101113. 10.1038/nrg1272View ArticlePubMedGoogle Scholar
 Lacroix V, Cottret L, Thebault P, Sagot MF: An Introduction to Metabolic Networks and Their Structural Analysis. IEEE/ACM Trans Comput Biology Bioinform 2008, 594617.Google Scholar
 Flannick J, Novak A, Srinivasan BS, McAdams HH, Batzoglou S: Graemlin: General and robust alignment of multiple large interaction networks. Genome Research 2006,16(9):11691181. [http://genome.cshlp.org/content/16/9/1169.abstract] 10.1101/gr.5235706PubMed CentralView ArticlePubMedGoogle Scholar
 Yang Q, Sze SH: Path Matching and Graph Matching in Biological Networks. Journal of Computational Biology 2007, 14: 5667. [http://www.liebertonline.com/doi/abs/10.1089/cmb.2006.0076] 10.1089/cmb.2006.0076View ArticlePubMedGoogle Scholar
 Milo R, ShenOrr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U: Network Motifs: Simple Building Blocks of Complex Networks. Science 2002,298(5594):824827. [http://www.sciencemag.org/cgi/content/abstract/298/5594/824] 10.1126/science.298.5594.824View ArticlePubMedGoogle Scholar
 Wernicke S: Efficient Detection of Network Motifs. IEEE/ACM Trans Comput Biol Bioinformatics 2006,3(4):347359.View ArticleGoogle Scholar
 Parida L: Discovering Topological Motifs Using a Compact Notation. Journal of Computational Biology 2007,14(3):300323. 10.1089/cmb.2006.0142View ArticlePubMedGoogle Scholar
 Wernicke S, Rasche F: FANMOD: a tool for fast network motif detection. Bioinformatics 2006,22(9):11521153. [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/22/9/1152] 10.1093/bioinformatics/btl038View ArticlePubMedGoogle Scholar
 Chen J, Hsu W, Lee ML, Ng SK: NeMoFinder: dissecting genome-wide protein-protein interactions with mesoscale network motifs. In KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA: ACM; 2006:106-115.
 Wang T, Touchman JW, Zhang W, Suh EB, Xue G: A Parallel Algorithm for Extracting Transcription Regulatory Network Motifs. Bioinformatic and Bioengineering, IEEE International Symposium on 2005, 0: 193-200.
 Schatz M, Cooper-Balis E, Bazinet A: Parallel Network Motif Finding. Tech. rep., University of Maryland Institute for Advanced Computer Studies; 2008.
 Mangan S, Alon U: Structure and function of the feedforward loop network motif. Proceedings of the National Academy of Sciences of the United States of America 2003, 100(21):11980-11985. [http://www.pnas.org/content/100/21/11980.abstract] 10.1073/pnas.2133841100
 Mangan S, Zaslaver A, Alon U: The Coherent Feedforward Loop Serves as a Sign-sensitive Delay Element in Transcription Networks. Journal of Molecular Biology 2003, 334(2):197-204. [http://www.sciencedirect.com/science/article/B6WK749XP57D5/2/e21452290f309dc8622a35f6fa092627] 10.1016/j.jmb.2003.09.049
 Przulj N, Corneil DG, Jurisica I: Modeling interactome: scale-free or geometric? Bioinformatics 2004, 20(18):3508-3515. [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/20/18/3508] 10.1093/bioinformatics/bth436
 Middendorf M, Ziv E, Wiggins CH: Inferring network mechanisms: The Drosophila melanogaster protein interaction network. Proceedings of the National Academy of Sciences of the United States of America 2005, 102(9):3192-3197. [http://www.pnas.org/content/102/9/3192.abstract] 10.1073/pnas.0409515102
 Milo R, Itzkovitz S, Kashtan N, Levitt R, Shen-Orr S, Ayzenshtat I, Sheffer M, Alon U: Superfamilies of Evolved and Designed Networks. Science 2004, 303(5663):1538-1542. [http://www.sciencemag.org/cgi/content/abstract/303/5663/1538] 10.1126/science.1089167
 Albert I, Albert R: Conserved network motifs allow protein-protein interaction prediction. Bioinformatics 2004, 20(18):3346-3352. [http://view.ncbi.nlm.nih.gov/pubmed/15247093] 10.1093/bioinformatics/bth402
 Conant GC, Wagner A: Convergent evolution of gene circuits. Nature Genetics 2003, 34: 244-266. 10.1038/ng0703244
 Lee WP, Jeng BC, Pai TW, Tsai CP, Yu CY, Tzou WS: Differential evolutionary conservation of motif modes in the yeast protein interaction network. BMC Genomics 2006, 7: 89. [http://www.biomedcentral.com/14712164/7/89] 10.1186/14712164789
 Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D: DIP: the Database of Interacting Proteins. Nucleic Acids Research 2000, 28: 289-291. [http://nar.oxfordjournals.org/content/28/1/289.abstract] 10.1093/nar/28.1.289
 von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature 2002, 417(6887):399-403.
 Wang J, Li M, Chen J, Pan Y: A Fast Hierarchical Clustering Algorithm for Functional Modules Discovery in Protein Interaction Networks. Computational Biology and Bioinformatics, IEEE/ACM Transactions on 2011, 8(3):607-620.
 McKay B: Nauty User's Guide. Tech. Rep. TR-CS-90-02, Dept. of Computer Science, Australian Nat'l Univ; 1990.
 Lee WP, Tzou WS: Fast Revelation of the Motif Mode for a Yeast Protein Interaction Network Through Intelligent Agent-Based Distributed Computing. Protein and Peptide Letters 2010, 17(11):1091-1101. [http://www.ingentaconnect.com/content/ben/ppl/2010/00000017/00000009/art00005]
 Girvan M, Newman MEJ: Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America 2002, 99(12):7821-7826. [http://www.pnas.org/content/99/12/7821.abstract] 10.1073/pnas.122653799
 Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics 2000, 25: 25-29. 10.1038/75556
 Wang J, Li M, Deng Y, Pan Y: Recent advances in clustering methods for protein interaction networks. BMC Genomics 2010, 11(Suppl 3):S10. [http://www.biomedcentral.com/14712164/11/S3/S10] 10.1186/1471216411S3S10
 Kim H, Park H: Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method. SIAM Journal on Matrix Analysis and Applications 2008, 30(2):713-730. 10.1137/07069239X
 Kim J, Park H: Sparse Nonnegative Matrix Factorization for Clustering. Tech. Rep. GT-CSE-08-01, Computational Science and Engineering, Georgia Institute of Technology; 2008.
 Kim H, Park H: Sparse Nonnegative Matrix Factorizations via Alternating Nonnegativity-constrained Least Squares for Microarray Data Analysis. Bioinformatics 2007, 23(12):1495-1502. 10.1093/bioinformatics/btm134
 Wu F, Huberman BA: Finding communities in linear time: a physics approach. The European Physical Journal B - Condensed Matter and Complex Systems 2004, 38(2):331-338. 10.1140/epjb/e200400125x
 Kirchhoff G, Hensel K, Planck M: Vorlesungen über mathematische Physik. B.G. Teubner; 1894.
 Mewes HW, Frishman D, Guldener U, Mannhaupt G, Mayer K, Mokrejs M, Morgenstern B, Munsterkotter M, Rudd S, Weil B: MIPS: a database for genomes and protein sequences. Nucleic Acids Research 2002, 30: 31-34. [http://nar.oxfordjournals.org/content/30/1/31.abstract] 10.1093/nar/30.1.31
 Zhang Y, Zeng E, Li T, Narasimhan G: Weighted Consensus Clustering for Identifying Functional Modules in Protein-Protein Interaction Networks. Proceedings of the 2009 International Conference on Machine Learning and Applications, ICMLA '09, Washington, DC, USA: IEEE Computer Society 2009, 539-544.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.