Biomolecular network querying: a promising approach in systems biology
© Zhang et al; licensee BioMed Central Ltd. 2008
Received: 08 January 2008
Accepted: 18 January 2008
Published: 18 January 2008
The rapid accumulation of various network-related data from multiple species and conditions (e.g. disease versus normal) provides unprecedented opportunities to study the function and evolution of biological systems. Comparison of biomolecular networks between species or conditions is a promising approach to understanding the essential mechanisms used by living organisms. Computationally, the basic goal of this network comparison or 'querying' is to uncover identical or similar subnetworks by mapping the queried network (e.g. a pathway or functional module) to another network or network database. Such comparative analysis may reveal biologically or clinically important pathways or regulatory networks. In particular, we argue that user-friendly tools for network querying will greatly enhance our ability to study the fundamental properties of biomolecular networks at a system-wide level.
With the rapid accumulation of 'omic' data from multiple species, various models of biological networks are being constructed, such as protein-protein interaction (PPI) networks [2, 3], gene regulatory networks [4, 5], gene co-expression networks [6–8], transcription regulatory networks, and metabolic networks [10, 11]. Instead of looking at individual components, studies of these molecular networks provide new opportunities for understanding cellular biology and human health at a system-wide level. Because of the complexity of life, revealing how genes, proteins and small molecules interact to form functional cellular machinery is a major challenge in systems biology. Recent studies have made great progress in this field and have considerably expanded our insight into the organizational principles and cellular mechanisms of biological systems. For example, new insights have been gained regarding topological properties [10–12], modular organization, and motif enrichment. In particular, network centrality and connectivity measures have been applied to identify essential genes in lower organisms and cancer-related genes in humans.
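As a minimal illustration of the centrality idea mentioned above, the following Python sketch ranks the nodes of a small interaction network by degree, i.e. by the number of interaction partners. The node names and edges are made up for illustration, and degree is only the simplest of the centrality and connectivity measures used in such studies.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Return {node: degree} for an undirected network given as an edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}

# Hypothetical toy network (not taken from any dataset)
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
degrees = degree_centrality(toy_edges)

# Sort by decreasing degree, breaking ties alphabetically
ranked = sorted(degrees, key=lambda n: (-degrees[n], n))
print(ranked)
```

In this toy network node A has the most partners and would be the first candidate 'hub'; whether such hubs are essential in a real organism is an empirical question addressed by the cited studies, not by this sketch.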
Biological systems differ from each other not only because of differences in their components, but also because of differences in their network architectures. A complicated living organism cannot be fully understood by merely analyzing individual components; it is the interactions between these components and networks that are ultimately responsible for an organism's form and function. For example, humans and chimpanzees are very similar at the sequence and gene expression levels, but show striking differences in the "wiring" of their co-expression networks. It is therefore essential to address the similarities and differences between molecular networks by comparative network analysis, in order to find conserved regions, discover new biological functions, understand the evolution of protein interactions, and uncover the underlying mechanisms of biological processes.
In this article, we will discuss the computational problem posed by biomolecular network querying, that is, mapping nodes (such as proteins or genes) of one network of interest (for example a complex, a pathway, a functional module, or a general biomolecular network) to another network or network database for uncovering identical or similar subnetworks. Automated querying tools for implementing such a network comparison will be essential for harnessing the information present in multiple networks across different species or across different conditions.
Tools for identifying conservation between networks
A second example is MNAligner, developed by our group, which is an alignment tool for general biomolecular networks that combines both molecular similarity and topological similarity. This method can detect conserved subnetworks efficiently without requiring any special structure in the querying network. Another area of significant progress is multiple network alignment tools, e.g. Græmlin, developed by Flannick et al., which uses a probabilistic function for topology matching and can be applied to search for conserved functional modules among multiple protein interaction networks. Finally, using microarray data from multiple conditions and species, various comparative studies have been conducted to reveal transcriptional regulatory modules, predict gene functions, and uncover evolutionary mechanisms. For example, Yan et al. have developed a graph-based data-mining algorithm called NeMo to detect frequent co-expression modules among gene co-expression networks across various conditions. They found a large number of potential transcriptional modules that are activated under multiple conditions. Figure 1(c) illustrates a condition-specific module that appears in five leukemia co-expression networks across different conditions. Moreover, genes in the module were found to be involved in the cell cycle and DNA repair, which is consistent with the nature of leukemia; this gives an initial confirmation of the effectiveness of such an analysis.
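The idea of combining molecular similarity with topological similarity in a single alignment score can be sketched as follows. This is a minimal illustration and not the MNAligner implementation: the weight `lam`, the similarity values, the function names and the exhaustive search are all assumptions made for the example.

```python
from itertools import permutations

def alignment_score(mapping, query_edges, target_edges, node_sim, lam=0.5):
    """Score a one-to-one mapping {query_node: target_node}.

    node term: sum of similarity scores of mapped pairs (missing pairs count as 0)
    edge term: number of query edges whose images are also target edges
    """
    target = {frozenset(e) for e in target_edges}
    node_term = sum(node_sim.get((q, t), 0.0) for q, t in mapping.items())
    edge_term = sum(
        1 for u, v in query_edges
        if frozenset((mapping[u], mapping[v])) in target
    )
    return lam * node_term + (1 - lam) * edge_term

def best_alignment(query_nodes, query_edges, target_nodes, target_edges,
                   node_sim, lam=0.5):
    """Exhaustive search over all one-to-one mappings (tiny inputs only)."""
    best, best_score = None, float("-inf")
    for image in permutations(target_nodes, len(query_nodes)):
        mapping = dict(zip(query_nodes, image))
        score = alignment_score(mapping, query_edges, target_edges, node_sim, lam)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score

# Hypothetical query pathway a-b-c and target network A-B-C-D
query_nodes = ["a", "b", "c"]
query_edges = [("a", "b"), ("b", "c")]
target_nodes = ["A", "B", "C", "D"]
target_edges = [("A", "B"), ("B", "C"), ("C", "D")]
# Made-up molecular similarity scores, e.g. normalised sequence similarity
node_sim = {("a", "A"): 0.9, ("b", "B"): 0.8, ("c", "C"): 0.7, ("a", "D"): 0.2}

mapping, score = best_alignment(query_nodes, query_edges,
                                target_nodes, target_edges, node_sim)
print(mapping, round(score, 3))
```

With n target nodes and k query nodes, this exhaustive search visits n!/(n-k)! candidate mappings, so it is usable only for toy inputs; practical tools replace it with an optimization or heuristic search procedure.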
Tools for network querying
In addition to the studies on network comparison discussed above, a closely related technique is increasingly attracting attention and is expected to become a major analytical tool for systems biology. This technique is querying a small network against a large-scale network or a database of large-scale networks. Querying a small network is a local network comparison problem, which requires a highly efficient algorithm because it is computationally demanding. This problem has been studied by several groups [22, 23, 34, 35], and a few search tools have been developed. However, the existing methods for querying are far from perfect, lagging behind the demands of the systems biology community.
For instance, although PathBLAST [20, 22] can implement query searches, it is applicable mainly to small pathways of up to 5 proteins, largely because of the dimensionality problem with pathway length, and it has limited support for identifying non-exact pathway matches. MetaPathwayHunter, developed by Pinter et al., enables fast queries for smaller pathways but is limited to those that take the form of a tree (i.e. a subnetwork with no loops). QPath has also been developed for searching for linear pathways. It does not find networks with feedback loops; instead, the algorithm searches efficiently for homologous pathways, allowing for insertions and deletions of proteins in the pathways. NetMatch is based on a graph-matching algorithm that aims to find the correspondences between two graphs. The results of NetMatch are subgraphs of the original graph connected in the same way as the querying graph, and they can therefore be viewed as candidate network motifs because of their similar topological features. It can also handle multiple attributes per node and edge, but is impeded by its restrictive match requirement, i.e. one-to-one matching without gaps.
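To make the 'one-to-one matching without gaps' requirement concrete, the following Python sketch performs exact subgraph querying by simple backtracking. It is an illustration of the general technique only and is not the algorithm used by NetMatch or any other tool named above; the function name, the `compatible` predicate (a stand-in for node attributes or sequence homology) and the toy networks are hypothetical.

```python
def find_matches(query_adj, target_adj, compatible=lambda q, t: True):
    """Yield every one-to-one mapping of query nodes to target nodes such that
    each query edge is mapped onto a target edge (no gaps, no mismatches).

    query_adj / target_adj: {node: set of neighbours} for undirected networks.
    compatible(q, t): hypothetical node-level filter, e.g. sequence homology.
    Extra target edges between matched nodes are allowed (non-induced match).
    """
    # Place high-degree query nodes first so that dead ends are found early
    order = sorted(query_adj, key=lambda q: -len(query_adj[q]))

    def extend(mapping, used):
        if len(mapping) == len(order):
            yield dict(mapping)  # copy, because mapping is mutated below
            return
        q = order[len(mapping)]
        for t in target_adj:
            if t in used or not compatible(q, t):
                continue
            # Every already-mapped neighbour of q must map to a neighbour of t
            if all(mapping[n] in target_adj[t]
                   for n in query_adj[q] if n in mapping):
                mapping[q] = t
                used.add(t)
                yield from extend(mapping, used)
                del mapping[q]
                used.remove(t)

    yield from extend({}, set())

# Hypothetical query: a triangle; hypothetical target: a triangle with a tail
query_adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
target_adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}

matches = list(find_matches(query_adj, target_adj))
node_sets = {frozenset(m.values()) for m in matches}
print(len(matches), node_sets)
```

Because every query edge must be preserved exactly, a single missing interaction in noisy data is enough to lose a match, which is the practical limitation of gap-free matching noted above.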
Future prospects for network querying and comparison
Computational techniques for network querying are still at an early stage and are currently limited by several problems, such as high computational complexity and a restriction to simple topological structures. As with sequence querying methods, a universal querying system that can query a network (e.g. a protein complex, a pathway, a functional module, or a general biomolecular network) efficiently against a large-scale complicated network or a large-scale network database is very much needed. By exploiting the growing amount of information on complexes, functional modules and network motifs, one can transfer biological knowledge (e.g. functional annotations or missed interactions) to the subnetwork of another species, thereby increasing the information retrieved from noisy data.
Conventional querying tools generally aim at one specific 'type' of network, such as protein interaction networks, gene co-expression networks, metabolic networks or drug-target networks. Querying several different types of network can uncover more conserved functional units supported by integrated information. If we obtain an interesting pathway that exists in several co-expression networks under different conditions for one species, it clearly implies that the pathway is activated under several different conditions. On the other hand, if the querying is done among networks across different species, the uncovered subnetworks and the queried small network may provide valuable evolutionary information. We believe that evolution-based principles are crucial for network querying, just as substitution matrices and sequence evolution are important for sequence comparisons. The noise and incompleteness of various 'omic' data are another important factor to consider when designing such computational tools.
To benefit from the accumulation of network data, it will be important to develop user-friendly systems biology tools for biomolecular network querying. Recent advances in the field inspired by developments in sequence/structure alignment and large-scale database searching demonstrate the great potential of network querying in elucidating network organization, function and evolution. With the accumulation of huge network-related datasets, advances in computational methods and powerful software tools are being made possible by interdisciplinary cooperation across biology, physics, computer science and applied mathematics. With the development of powerful and sophisticated network querying tools, we expect to gain deep insight into essential mechanisms of biological systems at the network level from the perspective of systems biology.
The authors are grateful to the editors for their valuable comments and suggestions, which improved the presentation of the earlier version of the paper. This research work is partly supported by the Important Research Direction Project of CAS 'Some Important Problems in Bioinformatics', the National Basic Research Program (973 Program) under Grant No. 2006CB503910, and JSPS and NSFC under the JSPS-NSFC collaboration project.
- Greenbaum D, Luscombe NM, Jansen R, Qian J, Gerstein M: Interrelating different types of genomic data, from proteome to secretome: 'oming in on function. Genome Res. 2001, 11: 1463-8. 10.1101/gr.207401
- Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, Timm J, Mintzlaff S, Abraham C, Bock N, Kietzmann S, Goedde A, Toksöz E, Droege A, Krobitsch S, Korn B, Birchmeier W, Lehrach H, Wanker EE: A human protein-protein interaction network: a resource for annotating the proteome. Cell. 2005, 122 (6): 957-68. 10.1016/j.cell.2005.08.029
- Wang R, Wang Y, Wu L-Y, Zhang X-S, Chen L: Analysis on Multi-domain Cooperation for Predicting Protein-Protein Interactions. BMC Bioinformatics. 2007, 8: 391- 10.1186/1471-2105-8-391
- Basso K: Reverse engineering of regulatory networks in human B cells. Nat Genet. 2005, 37: 382-390. 10.1038/ng1532
- Wang Y, Joshi J, Xu D, Zhang X-S, Chen L: Inferring gene regulatory networks from multiple microarray datasets. Bioinformatics. 2006, 22: 2413-2420. 10.1093/bioinformatics/btl396
- Butte AJ, Tamayo P, Slonim D, Golub TR, Kohane IS: Discovering Functional Relationships Between RNA Expression and Chemotherapeutic Susceptibility Using Relevance Networks. Proc Natl Acad Sci USA. 2000, 97: 12182-12186. 10.1073/pnas.220392197
- Carter S, Brechbuler C, Griffin M, Bond A: Gene co-expression network topology provides a framework for molecular characterization of cellular state. Bioinformatics. 2004, 20 (14): 2242-2250. 10.1093/bioinformatics/bth234
- Zhang B, Horvath S: A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology. 2005, 4 (1): 17- 10.2202/1544-6115.1128
- Wang R, Wang Y, Zhang X-S, Chen L: Inferring Transcriptional Regulatory Networks from High-throughput Data. Bioinformatics. 2007, 10.1093/bioinformatics/btm465
- Albert R: Scale-free networks in cell biology. J Cell Sci. 2005, 118: 4947-4957. 10.1242/jcs.02714
- Barabasi A, Oltvai Z: Network biology: understanding the cell's functional organization. Nature Rev Genet. 2004, 5: 101-113. 10.1038/nrg1272
- Zhang S, Jin G, Zhang XS, Chen L: Discovering functions and revealing mechanisms at molecular level from biological networks. Proteomics. 2007, 7 (16): 2856-2869. 10.1002/pmic.200700095
- Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL: Hierarchical organization of modularity in metabolic networks. Science. 2002, 297: 1551-1555. 10.1126/science.1073374
- Alon U: Network motifs: theory and experimental approaches. Nature Rev Genet. 2007, 8: 450-461. 10.1038/nrg2102
- Jeong H, Mason SP, Barabasi AL, Oltvai ZN: Lethality and centrality in protein networks. Nature. 2001, 411: 41- 10.1038/35075138
- Horvath S, Zhang B, Carlson M, Lu K, Zhu S, Felciano R, Laurance M, Zhao W, Shu Q, Lee Y, Scheck A, Liau L, Wu H, Geschwind D, Febbo P, Kornblum H, Cloughesy T, Nelson S, Mischel P: Analysis of oncogenic signaling networks in Glioblastoma identifies ASPM as a novel molecular target. Proc Natl Acad Sci USA. 2006, 103 (46): 17402-17407. 10.1073/pnas.0608396103
- Oldham M, Horvath S, Geschwind D: Conservation and evolution of gene co-expression networks in human and chimpanzee brain. Proc Natl Acad Sci USA. 2006, 103 (47): 17973-8. 10.1073/pnas.0605938103
- Sharan R, Ideker T: Modeling cellular machinery through biological network comparison. Nat Biotechnol. 2006, 24: 427-433. 10.1038/nbt1196
- Sharan R, Ideker T, Kelley B, Shamir R, Karp RM: Identification of protein complexes by comparative analysis of yeast and bacterial protein interaction data. J Comput Biol. 2005, 12: 835-846. 10.1089/cmb.2005.12.835
- Kelley BP, Sharan R, Karp R, Sittler ET, Root DE, Stockwell BR, Ideker T: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci USA. 2003, 100: 11394-11399. 10.1073/pnas.1534710100
- Sharan R, Suthram S, Kelley RM, Kuhn T, McCuine S, Uetz P, Sittler T, Karp RM, Ideker T: Conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci USA. 2005, 102: 1974-1979. 10.1073/pnas.0409522102
- Kelley BP, Yuan B, Lewitter F, Sharan R, Stockwell BR, Ideker T: PathBLAST: a tool for alignment of protein interaction networks. Nucl Acids Res. 2004, 32: 83-88. 10.1093/nar/gkh411
- Pinter RY, Rokhlenko O, Yeger-Lotem E, Ziv-Ukelson M: Alignment of metabolic pathways. Bioinformatics. 2005, 21: 3401-3408. 10.1093/bioinformatics/bti554
- Trusina A, Sneppen K, Dodd IB, Shearwin KE, Egan JB: Functional alignment of regulatory networks: A study of temperate phages. PLoS Comput Biol. 2005, 1: e74- 10.1371/journal.pcbi.0010074
- Berg J, Lässig M: Local graph alignment and motif search in biological networks. Proc Natl Acad Sci USA. 2004, 101: 14689-14694. 10.1073/pnas.0305199101
- Koyutürk M, Grama A, Szpankowski W: Pairwise local alignment of protein interaction network guided by models of evolution. RECOMB LNBI. 2005, 3500: 48-65.
- Ogata H, Fujibuchi W, Goto S, Kanehisa M: A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters. Nucl Acids Res. 2000, 28: 4021-4028. 10.1093/nar/28.20.4021
- Berg J, Lässig M: Cross-species analysis of biological networks by Bayesian alignment. Proc Natl Acad Sci USA. 2006, 103: 10967-10972. 10.1073/pnas.0602294103
- Li Z, Zhang S, Wang Y, Zhang XS, Chen L: Alignment of molecular networks by integer quadratic programming. Bioinformatics. 2007, 23 (13): 1631-1639. 10.1093/bioinformatics/btm156
- Flannick J, Novak A, Srinivasan BS, McAdams HH, Batzoglou S: Graemlin: General and robust alignment of multiple large interaction networks. Genome Res. 2006, 16: 1169-1181. 10.1101/gr.5235706
- Suthram S, Sittler T, Ideker T: The Plasmodium protein network diverges from those of other eukaryotes. Nature. 2005, 438: 108-112. 10.1038/nature04135
- Zhou XJ, Gibson G: Cross-species Comparison of Genome-wide Expression Patterns. Genome Biology. 2004, 5 (7): 232- 10.1186/gb-2004-5-7-232
- Yan X, Mehan M, Huang Y, Waterman MS, Yu PS, Zhou XJ: A Graph-based Approach to Systematically Reconstruct Human Transcriptional Regulatory Modules. Bioinformatics. 2007, 23 (13): i577-i586. 10.1093/bioinformatics/btm227
- Shlomi T, Segal D, Ruppin E, Sharan R: QPath: a method for querying pathways in a protein-protein interaction network. BMC Bioinformatics. 2006, 7: 199- 10.1186/1471-2105-7-199
- Ferro A, Giugno R, Pigola G, Pulvirenti A, Skripin D, Bader GD, Shasha D: NetMatch: a Cytoscape plugin for searching biological networks. Bioinformatics. 2007, 23: 910-912. 10.1093/bioinformatics/btm032
- Durbin R, Eddy SR, Krogh A, Mitchison GJ: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. 1999, Cambridge: Cambridge University Press
- He H, Singh AK: Closure-Tree: An Index Structure for Graph Queries. Proceedings of the 22nd International Conference on Data Engineering (ICDE), Atlanta. 2006, 38-
- Medina M: Genomes, phylogeny, and evolutionary systems biology. Proc Natl Acad Sci USA. 2005, 102: 6630-6635. 10.1073/pnas.0501984102