Prediction of novel synthetic pathways for the production of desired chemicals
© Cho et al; licensee BioMed Central Ltd. 2010
Received: 14 September 2009
Accepted: 28 March 2010
Published: 28 March 2010
Several methods have been developed for predicting synthetic metabolic pathways that lead to the production of desired chemicals. In these approaches, novel pathways were predicted based on chemical structure changes, enzymatic information, and/or reaction mechanisms, but approaches that generate a huge number of predicted results are difficult to apply to real experiments. In addition, some of these methods focus on specific pathways and are therefore difficult to extend to the whole metabolism.
In the present study, we propose a system framework that employs a retrosynthesis model with a prioritization scoring algorithm. This strategy deduces promising novel pathways for the synthesis of a desired chemical, together with information on the enzymes involved, based on the structural changes and reaction mechanisms present in the system database. The prioritization scoring algorithm, which employs the Tanimoto coefficient and the group contribution method, examines structurally qualified pathways to recognize which pathway is more appropriate. In addition, the new concepts of binding site covalence, pathway distance, and organism specificity were taken into account to identify the best synthetic pathway. The parameters of these factors can be evolutionarily optimized whenever a newly proven synthetic pathway is registered. As proofs of concept, novel synthetic pathways for the production of isobutanol, 3-hydroxypropionate, and butyryl-CoA were predicted. The predictions show high reliability: experimentally verified synthetic pathways were listed within the top 0.089% of the identified pathway candidates.
The system framework developed in this study is expected to be useful for the in silico design of novel metabolic pathways for the efficient production of chemicals, fuels and materials.
In the past few decades, various systematic methods have been developed for predicting synthetic metabolic pathways for the production of chemicals by microorganisms [1–15]. These methods can be classified by whether the approach is based on chemical structural changes, enzymatic information, and/or reaction mechanisms. The method based on chemical structural changes reconstructs the network that represents the relationships among biochemical compounds using structure-based homology analysis [1–4]. This method generates a variety of novel pathways, but it is difficult to specify the enzymes involved. The enzymatic information-based approach focuses on combining gene knockouts with the addition of pathways that exist in different organisms [5, 6]. This method is practical to use, but its predictions are limited to the synthesis of currently known biochemical compounds. The reaction mechanism-based approach identifies product candidates that can be derived from a predetermined substrate using a knowledge-based expert system [7–10]. This method predicts novel pathways and compounds according to accumulated knowledge and rules, but it is limited to identifying biodegradation pathways.
To overcome the disadvantages of the aforementioned methods, pathway prediction systems were established by combining the given reaction mechanisms with the starting and target compounds [11–13]. These approaches can generate novel compounds and reactions with proposed enzyme candidates. However, the starting and target compounds must be known compounds, so this method is difficult to apply to the prediction of a synthetic pathway for a novel compound of interest. A retrosynthesis model, which is a functional group-based synthesis method working towards a target compound, has been applied to search for desired target chemicals. However, the previous studies provided a huge set of predicted pathways rather than suggesting the more favorable pathways for efficiently producing a desired chemical. In this study, a system framework was developed to suggest promising enzyme candidates for synthesizing desired chemicals based on combined information on chemical structural changes, enzyme characteristics, and reaction mechanisms. The proposed system framework identifies structurally qualified enzymes for the synthesis of predetermined target chemicals and then ranks the enzymes via a prioritization scoring algorithm. Recently, a scoring technique that identifies preferred pathways through an automatic design approach for metabolic pathways has been suggested; its scoring algorithm identifies a possible route from a given host organism. However, that approach cannot be applied to novel pathways that are not present in the database. Thus, a new scoring algorithm was developed in this paper for the identification of desired novel synthetic pathways. Consequently, more efficient metabolic pathways for the production of a desired chemical can be proposed.
Results and Discussion
Using the system framework developed in this study, novel synthetic pathways for the production of isobutanol, 3-hydroxypropionate (3HP), and butyryl-CoA were predicted. In summary, four steps were taken to predict the novel synthetic pathways: definition of a target compound, route generation, prioritization, and parameter optimization. The predictions show high reliability: the experimentally verified pathways for the synthesis of isobutanol, 3HP, and butyryl-CoA ranked within the top 0.047%, 0.044%, and 0.089% of all the predicted pathway candidates, respectively.
Prediction of synthetic pathways for the production of biofuels and evolutionary parameter optimization
Finally, the preference of each enzyme route candidate was determined as a score calculated by the weighted sum of the factors (Equation 6a in Methods). In the initial predictions, the six pathways were positioned within the top 0.55% of the predicted candidates. In particular, isobutanol gave the best result, within the top 0.047%. In the initial prediction results, three factors (thermodynamic favorability, pathway distance, and organism specificity) take the same values across the six pathways, whereas the other two factors, binding site covalence and chemical similarity, vary. This is because the same enzymes are used in all six pathways. In the results of Atsumi et al., the production rate of isobutanol was at least three times higher than the production rates of the other higher alcohols. Also, the predicted rank of the synthetic pathway via KDC and ADH for the production of isobutanol is higher than that of the other pathways. Therefore, this pathway was considered to be more suitable for producing isobutanol. To estimate accurately the relative influence of each factor, which is expressed as a parameter, the novel synthetic pathway for the production of isobutanol was selected.
[Table: The superior enzyme route candidates for the production of each target chemical. Column: enzyme route candidates, given as pairs of EC numbers.]
Prediction of synthetic pathways for the production of 3HP
Prediction of alternative synthetic pathways for the production of 1-butanol
[Table: Top 10 enzyme route candidates for the synthesis of butyryl-CoA and the prioritization values. Column: enzyme route candidates, given as pairs of EC numbers.]
Conclusions
In this study, a system framework was established to identify promising enzyme candidates for synthesizing desired chemicals. This approach can also be applied to find novel pathways for the biodegradation of chemicals. Through this work, 50 reaction rules representing numerous biochemical reactions were set up for qualitative analysis. The most notable feature of the study is the development of a new quantitative analysis method, the prioritization scoring algorithm. Using the novel estimation methods, new opportunities for enzymes can be predicted with greater precision. Moreover, the parameters are estimated by an evolutionary optimization method, so the scores become more accurate as more experimentally validated data are added. This in silico prediction system is expected to contribute significantly to in vivo and in vitro experiments.
Methods
JChem was imported to handle chemical structures, and GAMS/CPLEX [27, 28] was used to perform parameter optimization. KEGG was employed as the pathway reference database [19, 20]. SMILES/SMARTS [17, 18] were used as the chemical structure representation languages, Java as the programming language, and MSSQL 2005 Server as the database server.
If an enzyme reaction shares its reaction rule with a novel reaction in a desired pathway, it is matched as a similar reaction. Many enzymatic reactions can match each reaction in the desired pathway, so it is necessary to clarify which of them are more promising. To address this problem, the similar reactions are evaluated quantitatively by a scoring algorithm referred to as prioritization. The prioritization method is composed of five factors: binding site covalence, chemical similarity, thermodynamic favorability, pathway distance and organism specificity. Binding site covalence and chemical similarity are evaluated by comparing two reactions. Here, the binding site rules were defined by extending the functional groups that are occupied in every reaction rule. The binding site covalence represents the local similarity between two molecules, whereas the chemical similarity is estimated based on the entire structures of the molecules in each reaction route candidate. Thermodynamic favorability is estimated from the chemical structure changes along the identified base routes. Pathway distance and organism specificity measure the relationships among the enzymes that catalyze the reactions in each enzymatic synthetic route from the starting chemical to the target chemical. These factors are used to calculate the priorities of the enzyme route candidates.
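To illustrate the whole-molecule comparison, the following minimal sketch computes a Tanimoto coefficient between two molecules. It assumes that each molecule has already been reduced to a set of structural fingerprint bits (the fingerprinting step, handled by JChem in this work, is not shown); the function name `tanimoto` and the example bit sets are hypothetical and are not taken from the system itself.

```python
def tanimoto(bits_f, bits_g):
    """Tanimoto coefficient of two feature sets: |f AND g| / |f OR g|."""
    f, g = set(bits_f), set(bits_g)
    union = f | g
    if not union:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    return len(f & g) / len(union)


# Hypothetical fingerprint bits for a novel substrate and a known substrate.
novel_substrate = {1, 4, 7, 9, 12}
known_substrate = {1, 4, 7, 10}

# Three shared bits out of six distinct bits overall.
similarity = tanimoto(novel_substrate, known_substrate)
```

The value always lies between 0 (no shared features) and 1 (identical feature sets), which makes it directly comparable with the other factors.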
Binding site covalence
i : index of reaction steps in a base route, (i = 1,2,...,n)
n : number of reaction steps in a base route (route length)
j, J : index and set of enzyme route candidates, respectively (∀j ∈ J)
XB, XB j , XB i : binding site covalence of an enzyme route candidate, a reaction route candidate and a reaction step in the base route, respectively
: system name steps in common for two reactions on the substrate side and the product side, respectively
: system name steps of a novel reaction f i on the substrate side and the product side for a reaction step i, respectively
: system name steps of a known reaction g i on the substrate side and the product side for a reaction step i, respectively
XC, XC j , XC i : chemical similarity of an enzyme route candidate, a reaction route candidate and a reaction step in the base route, respectively
T(f, g): Tanimoto coefficient for two molecules f and g, where 0 ≤ T(f, g) ≤ 1
: substrate and product of a reaction step in a base route f i for a step i, respectively
: substrate and product of a reaction step in a reaction route candidate g i for a step i, respectively
XT : normalized thermodynamic favorability of an enzyme route candidate
XT i : Gibbs free energy of formation of a step in a base route
XT j : thermodynamic favorability of an enzyme route candidate
Here, the range of thermodynamic favorability was adjusted to lie between zero and one for comparison with the other factors. Moreover, a larger raw value indicates more fluctuation, so the values need to be converted. To make a larger value more favorable, the negative exponential function of the ratio of each thermodynamic favorability to the maximum value was applied. After the three factors have been estimated, all the identified routes are rearranged as enzyme route candidates. If two or more reactions are catalyzed by one enzyme, the best values of binding site covalence and chemical similarity are selected (Equations 1c and 2c).
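The two operations described here can be sketched as follows. This is an illustration under stated assumptions rather than the exact form of the equations: the raw favorability values are assumed to be non-negative magnitudes with at least one positive entry, and "best value" is assumed to mean the largest score. The function names and the example numbers are hypothetical.

```python
import math


def normalize_favorability(raw_values):
    """Rescale raw values with exp(-value / max value), so smaller raw values score higher."""
    largest = max(raw_values)
    if largest <= 0:
        raise ValueError("this sketch assumes at least one positive raw value")
    return [math.exp(-value / largest) for value in raw_values]


def best_per_enzyme(scored_reactions):
    """Keep the highest score seen for each enzyme (assumed meaning of 'best value')."""
    best = {}
    for enzyme, score in scored_reactions:
        best[enzyme] = max(score, best.get(enzyme, score))
    return best


# Hypothetical raw favorability values for three route candidates.
normalized = normalize_favorability([0.0, 5.0, 10.0])

# Hypothetical similarity scores; enzyme_A catalyzes two matched reactions.
chosen = best_per_enzyme([("enzyme_A", 0.4), ("enzyme_A", 0.7), ("enzyme_B", 0.5)])
```

Under the non-negativity assumption every converted value falls between zero and one, and the ordering is reversed so that a larger converted value is the more favorable one.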
XP : pathway distance of an enzyme route candidate
XP i,i+1 : revised pathway distance between the i-th and (i + 1)-th steps in an enzyme route candidate
p i,i+1 : pathway distance between the i-th and (i + 1)-th steps in an enzyme route candidate
Finally, all the evaluated pathway distances between steps in an enzyme route candidate are multiplied. If one distance is increased, the co-expression probability of two enzymes is decreased (Equation 4a).
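The multiplication over consecutive steps can be sketched as below. The revision function that turns a raw pathway distance into a factor is not reproduced in this text, so `revised_distance` is a purely hypothetical stand-in chosen only because it decreases as the distance grows; the example distances are likewise invented.

```python
from math import prod


def revised_distance(p):
    """Hypothetical revision: maps a distance p >= 0 into (0, 1], decreasing in p."""
    return 1.0 / (1.0 + p)


def pathway_distance_factor(step_distances, revise=revised_distance):
    """Multiply the revised distances of all consecutive step pairs in a route."""
    return prod(revise(p) for p in step_distances)


# Two hypothetical three-step routes, each with two consecutive step pairs.
close_route = pathway_distance_factor([0, 1])
far_route = pathway_distance_factor([0, 3])
```

Because the per-pair values are multiplied, a single distant pair of enzymes lowers the factor for the whole route, in line with the reduced co-expression probability noted above.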
XO : organism specificity of an enzyme route candidate
XO i,i+1 : organism specificity between the i-th and (i + 1)-th steps in an enzyme route candidate
o i : number of lineage generations of the i-th step in an enzyme route candidate
o i,i+1 : number of lineage generations in common for the i-th and (i + 1)-th steps in an enzyme route candidate
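The counts defined above can be sketched as follows. Counting the lineage generations that two organisms share follows directly from the definitions; the ratio used to turn that count into a score is an assumption made for this sketch, since the equation is not reproduced here. The lineages are simplified, illustrative lists and the function names are hypothetical.

```python
def shared_generations(lineage_a, lineage_b):
    """Count the leading lineage ranks that two organisms have in common."""
    count = 0
    for rank_a, rank_b in zip(lineage_a, lineage_b):
        if rank_a != rank_b:
            break
        count += 1
    return count


def organism_specificity(lineage_a, lineage_b):
    """Assumed ratio: shared generations divided by the longer lineage length."""
    longest = max(len(lineage_a), len(lineage_b))
    if longest == 0:
        return 0.0
    return shared_generations(lineage_a, lineage_b) / longest


# Simplified, illustrative lineages (domain down to genus).
escherichia = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
               "Enterobacterales", "Enterobacteriaceae", "Escherichia"]
salmonella = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
              "Enterobacterales", "Enterobacteriaceae", "Salmonella"]
lactococcus = ["Bacteria", "Firmicutes", "Bacilli",
               "Lactobacillales", "Streptococcaceae", "Lactococcus"]

near = organism_specificity(escherichia, salmonella)
far = organism_specificity(escherichia, lactococcus)
```

Enzymes from closely related organisms share most of their lineage and score close to one, while enzymes from distant organisms score close to zero.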
Final priority score
X: priority score of an enzyme route candidate
α : parameter for binding site covalence
β : parameter for chemical similarity
γ : parameter for thermodynamic favorability
δ : parameter for pathway distance
ε : parameter for organism specificity
Each parameter was initially set to one and was then optimized as described in the following section. Finally, the enzyme candidates are sorted by priority, where a higher priority value means a greater likelihood of catalyzing a novel synthetic route. Because numerous enzyme candidates are ordered quantitatively, the promising ones can be distinguished and applied to experiments.
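The weighted sum and the subsequent sorting can be sketched as follows, with all five parameters at their initial value of one. The `Candidate` class, the route names and the factor values are hypothetical and serve only to show the calculation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    xb: float  # binding site covalence
    xc: float  # chemical similarity
    xt: float  # thermodynamic favorability
    xp: float  # pathway distance
    xo: float  # organism specificity


def priority(c, alpha=1.0, beta=1.0, gamma=1.0, delta=1.0, epsilon=1.0):
    """Priority score X as the weighted sum of the five factors."""
    return (alpha * c.xb + beta * c.xc + gamma * c.xt
            + delta * c.xp + epsilon * c.xo)


# Hypothetical enzyme route candidates with made-up factor values.
candidates = [
    Candidate("route_c", 1.0, 0.6, 0.7, 0.25, 0.2),
    Candidate("route_a", 0.9, 0.8, 0.7, 0.5, 0.6),
    Candidate("route_b", 0.4, 0.9, 0.7, 0.5, 0.6),
]

# Highest priority first.
ranked = sorted(candidates, key=priority, reverse=True)
ranked_names = [c.name for c in ranked]
```

Passing different values for `alpha` to `epsilon` changes the relative influence of each factor without altering the ranking procedure.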
α' : adjusted parameter for binding site covalence
β' : adjusted parameter for chemical similarity
γ' : adjusted parameter for thermodynamic favorability
δ' : adjusted parameter for pathway distance
ε' : adjusted parameter for organism specificity
XB obj : binding site covalence of the desired candidate obj
XC obj : chemical similarity of the desired candidate obj
XT obj : thermodynamic favorability of the desired candidate obj
XP obj : pathway distance of the desired candidate obj
XO obj : organism specificity of the desired candidate obj
: priority score of a route candidate j with adjusted parameters
: priority score of the desired candidate obj with adjusted parameters
y j : binary variable,
The evolutionary parameter optimization process according to enrolled test cases is presented in Figure 5.
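The idea of adjusting the parameters so that a verified pathway rises in the ranking can be sketched as below. This is not the GAMS/CPLEX formulation used in this work, which is not reproduced here: the objective (the number of candidates scoring at least as high as the desired candidate, the role assumed here for the binary variables y j) and the coarse grid search that replaces the solver are both assumptions of this sketch, and all factor values are invented.

```python
from itertools import product

FACTORS = ("xb", "xc", "xt", "xp", "xo")


def score(candidate, weights):
    """Weighted sum of the five factors for one candidate."""
    return sum(w * candidate[f] for w, f in zip(weights, FACTORS))


def count_not_below(target, others, weights):
    """Number of other candidates scoring at least as high as the target."""
    target_score = score(target, weights)
    return sum(1 for c in others if score(c, weights) >= target_score)


def grid_search(target, others, grid=(0.0, 0.5, 1.0)):
    """Try every weight combination on a coarse grid and keep the first best one."""
    best_weights, best_count = None, None
    for weights in product(grid, repeat=len(FACTORS)):
        if not any(weights):
            continue  # skip the all-zero weighting, which ranks nothing
        count = count_not_below(target, others, weights)
        if best_count is None or count < best_count:
            best_weights, best_count = weights, count
    return best_weights, best_count


# Hypothetical verified pathway (obj) and two competing candidates.
verified = {"xb": 0.5, "xc": 0.9, "xt": 0.7, "xp": 0.5, "xo": 0.6}
competitors = [
    {"xb": 0.9, "xc": 0.6, "xt": 0.7, "xp": 0.5, "xo": 0.6},
    {"xb": 0.8, "xc": 0.7, "xt": 0.7, "xp": 0.5, "xo": 0.6},
]

baseline = count_not_below(verified, competitors, (1.0,) * 5)
best_weights, best_count = grid_search(verified, competitors)
```

In this toy case the verified pathway starts behind both competitors and moves to the top once chemical similarity receives weight, which mirrors how newly registered pathways can shift the parameters.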
Acknowledgements
This work was supported by the Korean Systems Biology Research Project (20090065571) of the Ministry of Education, Science and Technology (MEST) through the National Research Foundation of Korea (NRF). Further support from the LG Chem Chair Professorship, Microsoft, the IBM SUR program, and the World Class University (WCU) program of the MEST through the NRF (R32-2008-000-10142-0) is appreciated.
References
- Arita M: Metabolic reconstruction using shortest paths. Simulat Pract Theory. 2000, 8: 109-125. 10.1016/S0928-4869(00)00006-9.
- Feldman HJ, Dumontier M, Ling S, Haider N, Hogue CW: CO: A chemical ontology for identification of functional groups and semantic comparison of small molecules. FEBS Lett. 2005, 579: 4685-4691. 10.1016/j.febslet.2005.07.039.
- McShan DC, Rao S, Shah I: PathMiner: predicting metabolic pathways by heuristic search. Bioinformatics. 2003, 19: 1692-1698. 10.1093/bioinformatics/btg217.
- McShan DC, Shah I: Heuristic search for metabolic engineering: de novo synthesis of vanillin. Comput Chem Eng. 2005, 29: 499-507. 10.1016/j.compchemeng.2004.08.038.
- Klopman G, Dimayuga M, Talafous J: META. 1. A program for the evaluation of metabolic transformation of chemicals. J Chem Inf Comput Sci. 1994, 34: 1320-1325.
- Pharkya P, Burgard AP, Maranas CD: OptStrain: a computational framework for redesign of microbial production systems. Genome Res. 2004, 14: 2367-2376. 10.1101/gr.2872004.
- Darvas F: Predicting metabolic pathways by logic programming. J Mol Graph. 1988, 6: 80-86. 10.1016/0263-7855(88)85004-5.
- Greene N, Judson PN, Langowski JJ, Marchant CA: Knowledge-based expert systems for toxicity and metabolism prediction: DEREK, StAR and METEOR. SAR QSAR Environ Res. 1999, 10: 299-314. 10.1080/10629369908039182.
- Hou BK, Wackett LP, Ellis LB: Microbial pathway prediction: a functional group approach. J Chem Inf Comput Sci. 2003, 43: 1051-1057.
- Karp PD, Paley S, Romero P: The pathway tools software. Bioinformatics. 2002, 18: 225-232.
- Hatzimanikatis V, Li C, Ionita JA, Henry CS, Jankowski MD, et al.: Exploring the diversity of complex metabolic networks. Bioinformatics. 2005, 21: 1603-1609. 10.1093/bioinformatics/bti213.
- Ihlenfeldt WD, Gasteiger J: Computer-assisted planning of organic syntheses: the second generation of programs. Angew Chem Int Ed Engl. 1996, 34: 2613-2633. 10.1002/anie.199526131.
- Li C, Henry CS, Jankowski MD, Ionita JA, Hatzimanikatis V, Broadbelt LJ: Computational discovery of biochemical routes to specialty chemicals. Chem Eng Sci. 2004, 59: 5051-5060. 10.1016/j.ces.2004.09.021.
- Prather KL, Martin CH: De novo biosynthetic pathways: rational design of microbial chemical factories. Curr Opin Biotechnol. 2009, 19: 468-474. 10.1016/j.copbio.2008.07.009.
- Rodrigo G, Carrera J, Prather KL, Jaramillo A: DESHARKY: automatic design of metabolic pathways for optimal cell growth. Bioinformatics. 2008, 24: 2554-2556. 10.1093/bioinformatics/btn471.
- Atsumi S, Hanai T, Liao JC: Non-fermentative pathways for synthesis of branched chain higher alcohols as biofuels. Nature. 2008, 451: 86-89. 10.1038/nature06450.
- Weininger D: SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci. 1988, 28: 31-36.
- Weininger D, Weininger A, Weininger JL: SMILES: 2. Algorithm for generation of unique SMILES notation. J Chem Inf Comput Sci. 1989, 29: 97-101.
- Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000, 28: 27-30. 10.1093/nar/28.1.27.
- Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, et al.: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 1999, 27: 29-34. 10.1093/nar/27.1.29.
- Sarker R, Mohammadian M, Yao X: Evolutionary Optimization. 2002, Springer.
- Segre D, Vitkup D: Analysis of optimality in natural and perturbed metabolic networks. Proc Natl Acad Sci USA. 2002, 99: 15112-15117. 10.1073/pnas.232349399.
- Werpy T, Petersen G: Top value added chemicals from biomass Volume I - Results of screening for potential candidates from sugars and synthesis gas [Electronic Version]. DOE Science and Technology Information. 2004.
- Gokarn RR, Selifonova OV, Jessen HJ, Steven JG, Selmer T, Buckel W: 3-hydroxypropionic acid and other organic compounds. Patent application. 2001, no. PCT/US2001/043607.
- Jing X, Meng X, Xian M: Biosynthetic pathways for 3-hydroxypropionic acid production. Appl Microbiol Biotechnol. 2009, 82: 995-1003. 10.1007/s00253-009-1898-7.
- Csizmadia F: JChem: Java applets and modules supporting chemical database handling from web browsers. J Chem Inf Comput Sci. 2000, 40: 323-324.
- Brook A, Kendrick D, Meeraus A, Raman R: GAMS: A User's Guide. GAMS Development Corporation. 2002.
- Bixby RE: Progress in linear programming. ORSA J Computing. 1994, 6: 15-22.
- Weiner H: Enzymology and molecular biology of carbonyl metabolism 10. Gulf Professional Publishing. 2001.
- Bitetti-Putzer R, Joseph-McCarthy D, Hogle JM, Karplus M: Functional group placement in protein binding sites: a comparison of GRID and MCSS. J Comput Aided Mol Des. 2001, 15: 935-960. 10.1023/A:1014309222984.
- Varadwaj PK, Lahiri T: Functional group based ligand binding affinity scoring function at atomic environmental level. Bioinformation. 2009, 3: 268-274.
- Gasteiger J, Engel T: Handbook of Chemoinformatics. 2003, Wiley-VCH.
- Holliday JD, Hu CY, Willett P: Grouping of coefficients for the calculation of inter-molecular similarity and dissimilarity using 2D fragment bit-strings. Comb Chem High Throughput Screen. 2002, 5: 155-166.
- Mavrovouniotis ML: Group contributions for estimating standard gibbs energies of formation of biochemical compounds in aqueous solution. Biotechnol Bioeng. 1990, 36: 1070-1082. 10.1002/bit.260361013.
- Mavrovouniotis ML: Estimation of standard Gibbs energy changes of biotransformations. J Biol Chem. 1991, 266: 14440-14445.
- Deza MM, Deza E: Dictionary of distances. 2006, ISBN 0444520872, Elsevier.
- Croes D, Couche F, Wodak SJ, Helden J: Inferring meaningful pathways in weighted metabolic networks. J Mol Biol. 2006, 356: 222-236. 10.1016/j.jmb.2005.09.079.
- Rison SC, Teichmann SA, Thornton JM: Homology, Pathway Distance and Chromosomal Localization of the Small Molecule Metabolism Enzymes in Escherichia coli. J Mol Biol. 2002, 318: 911-932. 10.1016/S0022-2836(02)00140-7.
- Aguilar D, Aviles FX, Querol E, Sternberg MJ: Analysis of phenetic trees based on metabolic capabilities across the three domains of life. J Mol Biol. 2004, 340: 491-512. 10.1016/j.jmb.2004.04.059.
- Pace NR: A molecular view of microbial diversity and the biosphere. Science. 1997, 276: 734-740. 10.1126/science.276.5313.734.
- Sankoff D: Edit distance for genome comparison based on non-local operations. Springer Berlin. 2006.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.