Metabolic pathway alignment between species using a comprehensive and flexible similarity measure
© Li et al; licensee BioMed Central Ltd. 2008
Received: 15 July 2008
Accepted: 24 December 2008
Published: 24 December 2008
Comparative analysis of metabolic networks in multiple species yields important information on their evolution, and has great practical value in metabolic engineering, human disease analysis, drug design etc. In this work, we aim to systematically search for conserved pathways in two species, quantify their similarities, and focus on the variations between them.
We present an efficient framework, Metabolic Pathway Alignment and Scoring (M-PAS), for identifying and ranking conserved metabolic pathways. M-PAS aligns all reactions in entire metabolic networks of two species and assembles them into pathways, taking mismatches, gaps and crossovers into account. It uses a comprehensive scoring function, which quantifies pathway similarity such that we can focus on different pathways given different biological motivations. Using M-PAS, we detected 1198 length-four pathways fully conserved between Saccharomyces cerevisiae and Escherichia coli, and also revealed 1399 cases of a species using a unique route in otherwise highly conserved pathways.
Our method efficiently automates the process of exploring reaction arrangement possibilities, both between species and within species, to find conserved pathways. We not only reconstruct conventional pathways such as those found in KEGG, but also discover new pathway possibilities. Our results can help to generate hypotheses on missing reactions and manifest differences in highly conserved pathways, which is useful for biology and life science applications.
Comparative analysis of metabolic networks in different species yields information important for both biology (understanding evolution/speciation, annotating new genomes etc.) and life science applications (e.g. in biotechnology, pharmacology). Therefore, it has been an active research field for the last decade. For example, Dandekar et al.  combined biochemical data analysis, elementary flux mode analysis and comparative genome analysis to compare glycolytic pathways in 17 species. Jeong et al.  and Ravasz et al.  studied the global topological properties of the metabolic networks in 43 species. In addition, Küffner et al.  used Petri nets to compare database contents and define differential metabolic displays (DMDs), which allow to compare metabolic networks by identifying intersection and difference sets of reactions. As one of the applications, Heymans et al.  derived phylogenetic trees based on metabolic pathway comparison. Guimerà et al.  analyzed the modularity of the metabolic networks of 18 organisms, and classified metabolites and enzymes based on their roles in connecting different functional modules. Díaz-Mejía et al.  investigated the relation of network modularity and distance between reactions with the retention of gene duplicates in various species and databases. More generally, a review on biological network comparison problems, techniques and applications is given by Sharan et al. .
In studies up till now, however, only little work focused explicitly on the variations between species in conserved pathways, and to our knowledge no alignment of entire networks, exploiting all reaction arrangement possibilities, has been carried out yet. Moreover, the similarity measures used to align metabolic pathways is often not comprehensive, as compounds or network structure are neglected. For example, Tohsato et al.  align pathways based on enzyme EC number similarity only, discarding information on the compounds involved. Yang et al.  perform path matching and graph matching to query certain metabolic pathways or subgraphs in a predefined graph, but also use a similarity measure based on EC numbers only. Although Forst et al.  define the distance between pathways as a combination of distances between compounds and distances between enzymes, they only consider sequence similarity, and the compounds are limited to amino acids. In , sets of reactions in multiple pathways are compared, omitting the connectivity between the reactions. Finally, the pathway similarity score in [5, 13, 14] combines EC number similarity and network topology, but does not include compounds, and alignments are between predefined sub-networks only. Therefore, the comparison is limited to conventional pathways, and different parts of the cellular metabolism are not associated with each other.
A naive approach to find conservation and variations between metabolic networks would be to search for common reactions and reaction pairs, using different cofactors or enzymes in the two species. Besides being inefficient, this approach isolates reactions from their upstream and downstream processes. Instead, we search for conserved pathways, rather than single reactions. In this way, we place the reactions in their metabolic functional context, which helps to 1) filter out isolated reactions not involved in pathways, 2) provide more evidence to claim part of a pathway is conserved, given that neighboring reactions are conserved, 3) interpret the resulting pathways.
Further, we rank the aligned pathways according to their similarities (i.e. level of conservation), which prioritizes them for further investigation. To this end, a novel scoring function is proposed, which forms the core contribution of this paper. It compares all components of two pathways by measuring similarities between substrate sets, product sets, enzyme functions, enzyme sequences, and alignment topology. The resulting individual similarity measures are then integrated into a single score. This scoring function has a generic form and is flexible enough to address various biological questions, by selecting different parameter settings.
We align the pathways from two species in a strict way, in order to investigate highly conserved metabolic pathways, i.e. pathways with very similar structure and limited variation between species. More specifically, two metabolic pathways can be aligned into a conserved pathway only if their individual reactions transform common substrates into common products in each step. We call such a pair of matching reactions a building block (BB). Next, these building blocks are assembled into pathways of a specified length, taking reaction directions into account. Finally, we compute the similarity score for each aligned pathway, and obtain interesting pathways as those pathways that have high similarity scores.
In M-PAS, reactions are represented at three levels of generalization: nodes, supernodes and hypernodes, respectively (see Figure 2). The low-level representation gives the finest details of reactions, in which each compound and each enzyme constitutes a node (Figure 2a). The medium-level representation generalizes reactions, so that all substrates and products of a reaction compose two compound supernodes, and all enzymes in that reaction form an enzyme supernode (Figure 2b). Such a generalized representation is useful due to the multiple-to-multiple property of metabolic reactions, i.e. multiple substrates can be catalyzed by multiple enzymes into multiple products [8, 17]. Finally, at the high-level representation, the corresponding compound supernodes and enzyme supernodes from two aligned reactions are combined into compound hypernodes and enzyme hypernodes, respectively (Figure 2c–d).
These different levels of representation enable the comparison of reactions in a detailed yet flexible manner. Thus, a particular compound node can be part of various compound supernodes given different co-factors in different reactions, and further can be part of various compound hypernodes due to different alignments with other compound supernodes. The same holds for enzyme nodes. This flexible representation not only reflects the versatility of the metabolic network conveniently, but is also necessary in order to express and quantify the similarity of reactions, which will be explained in the section Scoring function.
We allow for up to one mismatch or one gap in a building block, in order to incorporate alternative pathways, evolutionary diversity and annotation errors. That is, in an "enzyme mismatch" building block (em), the first two digits of the EC numbers of their enzymes are not the same. Gaps occur when a single reaction and a series of reactions connected in tandem share common substrates and products, indicating that the number of reactions to transform the specific substrates into the specific products differs between species. The building blocks containing one gap are "direct-gap" (dg) and "enzyme mismatch-gap" (eg). Finally, we include "enzyme crossover match" building blocks (ec) to accommodate possible variations in the order of the catalysis. That is, apart from sharing common substrates and end products in two reactions in each species, the first two EC number digits of the first and second reaction in one species are the same as those of the second and first reaction in the other species, respectively.
To enhance the informativeness of these resulting pathways, we add a constraint to avoid redundant building blocks. That is, a non-identical building block can be constructed only if it contains at least one unique reaction in one of the species, which is absent in the other species. This is because if two reactions converting the same substrate into the same product (e.g. A and B) are present in both species, two "identical" building blocks A1-A2 and B1-B2 are constructed already. Therefore, any other combinations of these reactions (i.e. A1-B2 and B1-A2) are just worse matches.
We set out by specifying a number of criteria for the design of the scoring function. First, similarities of all reaction components should be considered: substrate sets, product sets, enzyme functions and enzyme sequences, respectively. Second, the scoring function should be flexible and adaptable according to the user's biological interests. For example, the user might want to find pathways containing a particular structure (e.g. with a gap); or focus on enzymes only, but not on compounds; or seek to find a completely alternative pathway in which the enzymes are very dissimilar between two species. Third, since we aim to investigate many aspects of an aligned pathway and obtain multiple similarity scores, a reasonable way of integrating these is required. Finally, we should consider specificity in computing similarities, since both distributions of compound connectivity and enzyme EC number hierarchy show large variation [9, 18], i.e. some compounds and EC subclasses appear more often than the others in the background.
1) Total score
According to the criteria above, we first compute similarity scores independently for all compound hypernodes and enzyme hypernodes in an aligned pathway, taking all aspects into account. These are then converted into z-scores before integration to account for their diverse distributions.
Z(B) is the z-score for a building block B. Let c and e denote a compound hypernode and an enzyme hypernode respectively, and denote the set of all c's and e's in a building block B by C B and E B , respectively. Users can define a preferred building block structure by assigning different biases (Z0(B)) to different building block types. For example, if building blocks with gaps are preferred in a query, then these types of building block can be assigned a large positive bias. Weights ω c , ω e ∈ [0, 1] can be used to assign different relative importance to compound similarity and enzyme similarity (resembling the α parameter in ).
2) Compound similarity
where c1 and c2 denote the compound supernodes that form c, and |x| denotes the number of compound nodes in x.
where μ AC and σ AC are the mean and standard-deviation of P A (c) over all possible compound supernode pairs, which represent the expected amount of overlap when the pairing would be random.
The other term is Z S (c), the specificity of the overlap when compared to all possible supernode pairs. That is, if two compound supernodes have overlapping compounds, we take into account the frequency of obtaining this particular overlap at random. We consider two sets of substances to be more similar if the overlapping part is more specific, i.e. not observed frequently by chance. Moreover, considering specificity of compounds may result in more biologically meaningful pathways, since metabolic pathways seem to represent paths through the least "promiscuous" compounds .
where μ SC and σ SC are the mean and standard-deviation of P S (c) computed over all mn compound supernode pairs. The numerator in (5) is the number of times the specific overlap in compound node in c, i.e. (c1 ∩ c2), is observed in the intersections of all possible compound supernode pairs.
3) Enzyme similarity
where μ AE and σ AE are computed from P A (e) over all possible enzyme supernode pairs.
with u and v the total numbers of enzyme supernodes in the two species.
where μ q and σ q are the mean and standard-deviation of Q(e) over all possible enzyme supernode pairs.
Note that there might exist multiple EC numbers and multiple sequences in each enzyme supernode, as illustrated in Figure 2. So we first compute all Z(e) given all possible combinations of EC numbers and corresponding sequences in enzyme hypernode e. Since we aim to find the conserved part between pathways, the highest Z(e) is taken to be the enzyme similarity score for this pair of supernodes, indicating the similarity of the most conserved part between them.
Moreover, when gaps are present, we align two enzyme supernodes in one species with one enzyme supernode in another species separately, obtaining two Z(e). Again, the higher one is selected for this building block to represent the similarity of the most conserved part.
Reaction definitions were obtained from Release 42.0 of the KEGG LIGAND composite database , updated on May 14, 2007. The species-specific reactions and enzyme lists were retrieved from KEGG/XML and KEGG/PATHWAY. Protein sequences were downloaded from UniProtKB/Swiss-Prot . Discrepancies and missing information (e.g. gene names and EC numbers) were resolved manually. Twenty-six currency metabolites (see Appendix) are excluded from consideration during pathway construction to avoid finding large numbers of pathway shortcuts [3, 24, 25]. Note that the reactions containing these metabolites are still included in the algorithm. Currency metabolites are only excluded in aligning reactions into building blocks and assembling pathways, i.e. we do not match or connect two reactions if they only share the same currency metabolites.
Based on 881 enzymatic reactions in S. cerevisiae (with 1762 compound supernodes and 881 enzyme supernodes) and 1106 enzymatic reactions in E. coli (with 2212 compound supernodes and 1106 enzyme supernodes), 640 building blocks are constructed. These are further concatenated into pathways using a backtracking search, starting from a certain substrate. Each pathway contains four different building blocks, and is constrained so that one reaction cannot appear more than once in one species, and one compound (excluding the currency metabolites) cannot be traversed more than once in one species, e.g. a compound can not be both the substrate and product of a reaction, or be the products of two reactions in the pathway. Using 69% of all available building blocks, 2597 length-four pathways are assembled, starting from 245 substrates. These substrates are not restricted to external metabolites, since our pathways are not necessarily routes from uptake to secretion.
The parameter settings and biological emphases in the five queries
0 for all
0 for all
0 for all
100 for "dg"and "eg",
0 for all
In each query, the similarity scores of all 2597 length-four pathways found are computed using (1) and the highest-scoring pathway(s) of a certain substrate is referred as the best pathway for that substrate.
It is useful to investigate the building block types as they reflect the differences between species in terms of reactions use, which is not reflected in the scores. Therefore, we categorize the pathways w.r.t. their configurations of building blocks, in order to gain insight in the impact of the parameter settings on the resulting pathway properties. Abbreviations are used to denote the six categories: "i-i-i-i" indicates a pathway consists of four "identical" building blocks; "d" indicates that the pathway has at least one "direct" building block; "em", "dg", "eg" and "ec" are defined likewise.
The scoring function can address different biological questions
Using our scoring function, different parameter settings result in different best pathways, highlighting different aspects of the pathway features.
Table 1 and Figure 4 can be used as a guide to design a query for a specific purpose. For example, Query 1 finds generally similar pathways in two species. Query 2 only considers enzyme similarity, therefore more best pathways containing "dg" and "ec" are found (Figure 4). Query 5 is a special case of Query 2, looking for conserved pathways with similar enzyme functions. Compound and enzyme sequence similarities are neglected, thus providing more possibilities for predicting the functions of unannotated genes.
Gaps are preferred in Query 4. Indeed, we can see a large increase in best pathways with "dg" and "eg" in Figure 4. Moreover, in-depth analysis shows that the numbers of "dg" and "eg" building blocks in the pathways have also increased four to seven times, demonstrating that the increase of found best pathways with "dg" and "eg" is not because a limited number of building blocks are used repeatedly. The results may hint at additional intriguing evolutionary phenomena: if one enzyme in species 1 is comparable to the combined functionality of two enzymes in species 2, it may be caused by gene fusion in species 1, or gene duplication in species 2 .
Comparing results of different queries can help infer additional details
It can be instructive to investigate the differences in the results between various queries. For instance, the best pathways of a certain substrate in Query 2 and not found in the best pathways of the same substrate in Query 1 have similar enzymes but use different cofactors or less specific substrates. They are well-conserved, a-specific enzymes. Many pathways containing "dg" are found in Query 2 for this reason, as we can see from Figure 4. Figure 5b shows an example, which is found in Query 2 due to its high enzyme similarity, but not in Query 1 or Query 4 for the same substrate due to its low compound similarity. In another example (not shown), a best pathway in Query 2 producing pyruvate is filtered out in Query 1 because pyruvate is less specific, as it is present in 147 reactions .
Combining the component scores makes sense
Figure 6 presents the component scores of each type of building block and shows that the various information sources are not correlated (see also ), making it worthwhile to combine them. In addition, Figure 6 reveals the diverse similarity fingerprint of each type of building block, which calls for further research. For example, the variance of the sequence similarity score in "i" is large, which might arise because of different specificity, horizontal gene transfer, gene fusions, or the fact that only subunits of the enzymes are the same. As to "ec", their sequences are very dissimilar in spite of their similar functions. Possible reasons could be that the enzymes have different substrate specificities, or that intermediate substrates are very different. They could also have been isoenzymes in parallel pathways, having become specialized to one species during evolution.
The conserved part of two aligned networks is scale-free
Short pathways lead to interpretable results
Our methodology has no inherent limit on the pathway lengths. That is, it can construct and score pathways consisting of any number of building blocks. To find longer pathways, one can simply extend the pathway length in the search step. Actually, we conducted experiments without a length limit, which resulted in aligned pathways up to a length of 42 building blocks. Another solution would be to assemble the current length-four short pathways into longer pathways.
However, not all pathway lengths give meaningful results. When the length becomes too short, the method starts to compare individual reactions and loses the power of metabolic functional context, as stated in the background. As a result, some isolated reactions are also included in the results. For example, 31% of building blocks (i.e. length-one pathways) contain isolated reactions, which are not included in any length-four pathway.
M-PAS reveals pathway diversity and alternatives
These results demonstrate the value of including non-identical building blocks, as otherwise these strongly conserved pathways would have been overlooked. In particular, building blocks with gaps or crossovers would be hard to detect manually, e.g. Figure 9a and Figures 11d–f. Take Figure 11d as an example. By comparing reactions in two species, normally we can only find a reversible reaction present in both species which catalyzes indoleglycerol phosphate into L-tryptophan. However, considering gaps allows us to find two consecutive reactions in one of the species which perform the same transformation in two steps. In the end, our algorithm found a unique alternative pathway in E. col i which transforms indoleglycerol phosphate to indole first by an irreversible reaction, followed by a unique reaction transforming indole to L-tryptophan.
New links between different parts of metabolism are found
Our method is global, starting from constructing building blocks to the assembly of pathways. Therefore, the resulting pathways have a reasonable coverage of the network, and explicitly include links between different parts of metabolism, which are displayed in 202 pathway maps of metabolism in KEGG . For example, Figure 9 shows four to seven such maps are linked together in each aligned pathway (see caption).
Moreover, M-PAS not only links different parts of metabolism within one species, but also associates diverse parts in two species with each other, offering potential interesting targets for bioengineering. For instance, in Figure 11e, the unique reaction of S. cerevisiae is found in glycine, serine and threonine metabolism, while the unique reaction of E. coli is found in cysteine metabolism. Therefore it will not be found if we only look at one map or one species at a time.
Primary metabolism is highly conserved
Three pathways with the highest scores in Query 1 are shown in Figure 12. They represent the most conserved part of the metabolic network in the two species and are therefore expected to be important. Not surprisingly, the three pathways are all involved in primary metabolism. Moreover, they all have "i-i-i-i" configuration, meaning all reactions in the pathways are conserved across species. Clement et al.  also pointed out that "vital biological processes in a group of related species should be conserved and expressed by a significant number of reactions in all the organisms of the group".
We can also observe this in Figure 11, where the involved parts of metabolism in the highest-scoring building blocks are rather central processes, e.g. starch and sucrose metabolism, citrate cycle (TCA cycle), CO2 fixation and other important amino acid metabolisms.
In this work, we extend our former alignment framework and propose a novel scoring method to identify conserved metabolic pathways and quantify the level of conservation in an efficient and comprehensive manner. Based on the six types of building blocks, a systematic search is conducted in the network. We find and rank conserved pathways given certain substrates, and shed light on the variations between species within a metabolic functional context. This is not possible by simple comparison of reaction lists or enzyme lists.
Our method combines individual reactions, so that we can find conserved pathways that are not represented in conventional databases. Since the alignment and search are conducted in the whole network, M-PAS unites reactions in different KEGG maps, revealing links and relating reactions with common upstream substrates and downstream products which might be elusive if we only look at subsets of the network.
Our similarity measure combines uncorrelated information sources, including similarities of substrate sets, product sets, enzyme functions, enzyme sequences and alignment configurations. The function has a generic form and is capable of measuring pathway similarity given different biological emphases. Due to its hierarchical integration structure, it is readily extensible to include other relevant similarity measures if available (e.g. enzyme affinities), or to modify a component score (e.g. using compound molecular similarity scores). Moreover, the proposed function is plausible since parts of primary metabolism, which are known to be well conserved, are found to be abundant in our top-scoring pathways and building blocks.
M-PAS reveals highly conserved pathways containing diversity or alternatives, which yields important information for biology and life sciences. First, the results give insight into the evolutionary differences between species. For instance, the two species apparently diverged to process 17 substrates differently, so that no "i-i-i-i" pathways are found starting from them. This divergence calls for special treatment of these substrates per species in analysis and applications. Second, the diversity and alternatives in conserved pathways also provide additional ways to construct metabolic networks for currently unannotated species. Third, our analysis lists potential candidate enzymes for bioengineering, i.e. certain natural enzymes can be removed, introduced, or changed so that we can select a favorable pathway to enforce production of a metabolite of interest, or block pathways leading to certain unfavorable products. In particular, alternative pathways have to be considered in drug design, because blocking central enzymes might not be effective when alternative pathways provide other routes, and cause drug resistance in the pathogen population .
M-PAS is currently constrained to finding linear pathways which are strictly similar. Although further processing these linear pathways, e.g. combining them, could reconstruct some tree-like subnets and cycles, not all network structures can be captured. M-PAS could be extended to construct and score more complex pathway topologies that capture more variation. First, to capture more variation, one may extend the building block definition to include larger differences, e.g. a 'dg' with two gaps, or to allow compound mismatch. But care needs to be taken to keep the computational load acceptable and to avoid linking unrelated pathways. Alternatively, one may reduce the pathway length, e.g. to assemble two building blocks into a pathway to capture diverse pathways with short overlaps. However, as discussed earlier, when the pathway length becomes too short, the method starts to compare individual reactions. To find more complex pathway topologies, a more complex search algorithm is required. An alternative would be to expand our building block definition to incorporate more types of network motifs. But again, the computational load will increase significantly.
The complementary reaction information of multiple well-studied model species provides more confidence and more possibilities to transfer this information to a new species. Although M-PAS currently only performs pairwise alignment on two species, we expect even more informative results when it is applied on multiple species, and larger differences will be found as the phylogenetic distance increases. Finally, by relating different sets of enzymes in different species to a common metabolic function, this work provides an infrastructure in which regulatory factors can be incorporated, and functional hypotheses can be generated.
The twenty-six currency metabolites are ATP, ADP, UTP, UDP, GTP, GDP, AMP, UMP, GMP, NAD, NADH, NADP, NADPH, acetyl-CoA, CoA, propanoyl-CoA, L-glutamine, L-glutamate, 2-oxoglutarate, CTP, CDP, CMP, H2O, CO2, NH2, and phosphate.
The authors would like to thank Rogier J.P. van Berlo, Domenico Bellomo and Peter van Nes for their help and constructive discussions.
This work was part of the BioRange program of the Netherlands Bioinformatics Center (NBIC), which is supported by a BSIK grant through the Netherlands Genomics Initiative (NGI).
- Dandekar T, Schuster S, Snel B, Huynen M, Bork P: Pathway alignment: Application to the comparative analysis of glycolytic enzymes. Biochemical J. 1999, 343 (1): 115-124. 10.1042/0264-6021:3430115.View ArticleGoogle Scholar
- Jeong H, Tombor B, Albert R, Oltvai ZN, Barabási AL: The large-scale organization of metabolic networks. Nature. 2000, 407 (6804): 651-654. 10.1038/35036627View ArticlePubMedGoogle Scholar
- Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási AL: Hierarchical organization of modularity in metabolic networks. Science. 2002, 297: 1551-1555. 10.1126/science.1073374View ArticlePubMedGoogle Scholar
- Küffner R, Zimmer R, Lengauer T: Pathway analysis in metabolic databases via differential metabolic display (DMD). Bioinformatics. 2000, 16 (9): 825-836. 10.1093/bioinformatics/16.9.825View ArticlePubMedGoogle Scholar
- Heymans M, Singh AK: Deriving phylogenetic trees from the similarity analysis of metabolic pathways. Bioinformatics. 2003, 19: i138-i146. 10.1093/bioinformatics/btg1018View ArticlePubMedGoogle Scholar
- Guimerà R, Sales-Pardo M, Amaral LAN: A network-based method for target selection in metabolic networks. Bioinformatics. 2007, 23 (13): 1616-1622. 10.1093/bioinformatics/btm150PubMed CentralView ArticlePubMedGoogle Scholar
- Díaz-Mejía JJ, Pérez-Rueda E, Segovia L: A network perspective on the evolution of metabolism by gene duplication. Genome Biology. 2007, 8: R26- 10.1186/gb-2007-8-2-r26PubMed CentralView ArticlePubMedGoogle Scholar
- Sharan R, Ideker T: Modeling cellular machinery through biological network comparison. Nature Biotechnology. 2006, 24 (4): 427-433. 10.1038/nbt1196View ArticlePubMedGoogle Scholar
- Tohsato Y, Matsuda H, Hashimoto A: A multiple alignment algorithm for metabolic pathway analysis using enzyme hierarchy. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology. 2000, 376-383.Google Scholar
- Yang Q, Sze S: Path matching and graph matching in biological networks. J Computational Biology. 2007, 14: 56-67. 10.1089/cmb.2006.0076.View ArticleGoogle Scholar
- Forst CV, Schulten K: Phylogenetic analysis of metabolic pathways. J Mol Evol. 2001, 52: 471-489.View ArticlePubMedGoogle Scholar
- Clemente JC, Satou K, Valiente G: Phylogenetic reconstruction from non-genomic data. Bioinformatics. 2006, 23: e110-e115. 10.1093/bioinformatics/btl307.View ArticleGoogle Scholar
- Pinter RY, Rokhlenko O, Yeger-Lotem E, Ziv-Ukelson M: Alignment of metabolic pathways. Bioinformatics. 2005, 21 (16): 3401-3408. 10.1093/bioinformatics/bti554View ArticlePubMedGoogle Scholar
- Zhenping L, Zhang S, Wang Y, Zhang X, Chen L: Alignment of molecular networks by integer quadratic programming. Bioinformatics. 2007, 23 (13): 1631-1639. 10.1093/bioinformatics/btm156View ArticlePubMedGoogle Scholar
- Li Y, de Ridder D, de Groot MJL, Reinders MJT: Metabolic pathway alignment (M-Pal) reveals diversity and alternatives in conserved networks. Advances in Bioinformatics & Computational Biology. 2008, 6: 273-285. London: Imperial College PressGoogle Scholar
- Kelley BP, Sharan R, Karp RM, Sittler T, Root DE, Stockwell BR, Ideker T: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci USA. 2003, 100 (20): 11394-11399. 10.1073/pnas.1534710100PubMed CentralView ArticlePubMedGoogle Scholar
- Krishnamurthy L, Nadeau J, Ozsoyoglu G, Ozsoyoglu M, Schaeffer G, Tasan M, Xu W: Pathways database system: An integrated system for biological pathways. Bioinformatics. 2003, 19 (8): 930-937. 10.1093/bioinformatics/btg113View ArticlePubMedGoogle Scholar
- Croes D, Couche F, Wodak SJ, van Helden J: Inferring meaningful pathways in weighted metabolic networks. J Mol Biol. 2006, 356: 222-236. 10.1016/j.jmb.2005.09.079View ArticlePubMedGoogle Scholar
- Hedges LV, Olkin I: Statistical methods for meta-analysis. 1985, Orlando, FL: Academic PressGoogle Scholar
- Hwang D, Rust AG, Ramsey S, Smith JJ, Leslie DM, Weston AD, de Atauri P, Aitchison JD, Hood L, Siegel AF, Bolouri H: A data integration methodology for systems biology. Proc Natl Acad Sci. 2005, 102 (48): 17296-17301. 10.1073/pnas.0508647102PubMed CentralView ArticlePubMedGoogle Scholar
- Hughes JD, Estep PW, Tavazoie S, Church GM: Computational identification of cis-regulatory elements associated with groups of functionally related genes in saccharomyces cerevisiae. J Mol Biol. 2000, 296: 1205-1214. 10.1006/jmbi.2000.3519View ArticlePubMedGoogle Scholar
- Goto S, Nishioka T, Kanehisa M: LIGAND: chemical database for enzyme reactions. Bioinformatics. 1998, 14 (7): 591-599. 10.1093/bioinformatics/14.7.591View ArticlePubMedGoogle Scholar
- The UniProt Consortium: The Universal Protein Resource (UniProt). Nucleic Acids Res. 2008, 36: D190-D195. 10.1093/nar/gkm895PubMed CentralView ArticleGoogle Scholar
- Fell DA, Wagner A: The small world of metabolism. Nature Biotechnology. 2000, 18: 1121-1122. 10.1038/81025View ArticlePubMedGoogle Scholar
- Ma HW, Zeng AP: Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms. Bioinformatics. 2003, 19: 270-277. 10.1093/bioinformatics/19.2.270View ArticlePubMedGoogle Scholar
- Sandmeier E, Hale TI, Christen P: Multiple evolutionary origin of pyridoxal-5'-phosphate-dependent amino acid decarboxylases. European J of Biochemistry. 1994, 221 (3): 997-1002. 10.1111/j.1432-1033.1994.tb18816.x.View ArticleGoogle Scholar
- Clemente JC, Satou K, Valiente G: Finding conserved and non-conserved reactions using a metabolic pathway alignment algorithm. Genome Informatics. 2006, 17 (2): 46-56.PubMedGoogle Scholar