Prediction of novel synthetic pathways for the production of desired chemicals

Background Several methods have been developed for predicting synthetic metabolic pathways that lead to the production of desired chemicals. These approaches predict novel pathways from chemical structure changes, enzymatic information, and/or reaction mechanisms, but those that generate a huge number of predicted results are difficult to apply to real experiments. In addition, some of these methods focus on specific pathways and are therefore difficult to extend to the whole metabolism. Results In the present study, we propose a system framework employing a retrosynthesis model with a prioritization scoring algorithm. This strategy deduces promising novel pathways for the synthesis of a desired chemical, together with information on the enzymes involved, based on structural changes and reaction mechanisms present in the system database. The prioritization scoring algorithm, which employs the Tanimoto coefficient and the group contribution method, examines structurally qualified pathways to determine which pathway is more appropriate. In addition, new concepts of binding site covalence, estimation of pathway distance, and organism specificity were taken into account to identify the best synthetic pathway. The parameters of these factors can be evolutionarily optimized whenever a newly proven synthetic pathway is registered. As proofs of concept, novel synthetic pathways for the production of isobutanol, 3-hydroxypropionate, and butyryl-CoA were predicted. The prediction shows high reliability: experimentally verified synthetic pathways were listed within the top 0.089% of the identified pathway candidates. Conclusions The system framework developed in this study is expected to be useful for the in silico design of novel metabolic pathways for the efficient production of chemicals, fuels and materials.


Background
In the past few decades, various systematic methods have been developed for predicting synthetic metabolic pathways for the microbial production of chemicals [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]. These methods can be classified by whether they are based on chemical structural changes, enzymatic information, and/or reaction mechanisms. Methods based on chemical structural changes reconstruct a network that represents the relationships among biochemical compounds using structure-based homology analysis [1][2][3][4]. They generate a variety of novel pathways, but it is difficult to specify the enzymes involved. Enzymatic information-based approaches focus on combining gene knockouts with the addition of pathways existing in different organisms [5,6]. They are practical to use, but their predictions are limited to the synthesis of currently known biochemical compounds. Reaction mechanism-based approaches identify product candidates that can be derived from a predetermined substrate using a knowledge-based expert system [7][8][9][10]. They predict novel pathways and compounds according to accumulated knowledge and rules, but are limited to identifying biodegradation pathways.
To overcome the disadvantages of the aforementioned methods, pathway prediction systems were established by combining given reaction mechanisms with the starting and target compounds [11][12][13]. These approaches can generate novel compounds and reactions with proposed enzyme candidates. However, the starting and target compounds must be known compounds, so these methods are difficult to apply to the prediction of a synthetic pathway for a novel compound of interest. A retrosynthesis model, which is a functional group-based synthesis method working towards a target compound, has been applied to search for desired target chemicals [14]. However, the previous studies provided a huge set of predicted pathways rather than suggesting the more favorable pathways for efficiently producing a desired chemical. In this study, a system framework was developed to suggest promising enzyme candidates for synthesizing desired chemicals based on combined information on chemical structural changes, enzyme characteristics, and reaction mechanisms. The proposed system framework identifies structurally qualified enzymes for the synthesis of predetermined target chemicals and then ranks the enzymes via a prioritization scoring algorithm. Recently, a scoring technique that identifies preferred pathways using an automatic design approach for metabolic pathways has been suggested [15]; in that work, a scoring algorithm was developed for identifying a possible route from a considered host organism. However, that approach cannot be applied to novel pathways that are not present in the database. Thus, a new scoring algorithm was developed in this paper for the identification of desired novel synthetic pathways. Consequently, more efficient metabolic pathways for the production of a desired chemical can be proposed.

Results and Discussion
Using the system framework developed in this study, novel synthetic pathways for the production of isobutanol, 3-hydroxypropionate (3HP), and butyryl-CoA were predicted. In summary, the prediction proceeds through four steps: definition of a target compound, route generation, prioritization, and parameter optimization. The prediction shows high reliability: the experimentally verified pathways for the synthesis of isobutanol, 3HP, and butyryl-CoA ranked within the top 0.047%, 0.044%, and 0.089% of all the predicted pathway candidates, respectively.

Prediction of synthetic pathways of the production of biofuels and evolutionary parameter optimization
Recently, Atsumi et al. reported novel metabolic engineering strategies for the production of higher alcohols such as 1-propanol, 1-butanol, 2-methyl-1-butanol, 3-methyl-1-butanol, isobutanol, and 2-phenylethanol in Escherichia coli [16]. The six pathways were devised using two well-known enzymes, 2-keto-acid decarboxylase (KDC, ec4.1.1.1) and alcohol dehydrogenase (ADH, ec1.1.1.1), based on the reaction mechanism by which 2-keto acids are converted to aldehydes and then to alcohols. The system framework was employed to test whether it could reproduce their work. The key novel synthetic pathways are presented in Figure 1.
As an example, the structure of the target compound, isobutanol, was first entered in the form of SMILES [17,18]. Then, predefined reaction rules were applied to the target compound to generate substrates. The generated substrates were regarded as intermediate target compounds, and the reaction rules were applied again. The recursive generation step was completed when the generation loop reached the predefined limit. Throughout the loop, various routes for the production of the target compound were generated. The system then detects known chemicals and displays them so that a starting compound can be selected (Figure 2a). The reaction rules referred to are shown in Figure 2b. Among the detected known chemicals, 2-ketoisovalerate was selected as the starting compound. After the starting compound was determined, the routes from the starting compound to the target compound were retrieved for further analysis. These routes for the production of the target compound are defined as the base routes. The identified base routes for the synthesis of isobutanol are shown at the bottom of Figure 2a. Using the mechanisms defined in every reaction step of the base routes, the biochemical reactions in the KEGG database [19,20] were classified into individual groups. Finally, all the routes from the starting compound to the target compound, defined as the reaction route candidates, were determined by combining the reactions in each group. As shown in Figure 3, two base routes for the synthesis of isobutanol were identified and two groups, 506 reactions in B.CO.4 and 106 reactions in B.C.1.2, were classified for each base route. In total, 107,272 reaction route candidates were determined.
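The reported candidate count can be checked with a short calculation. The sketch below assumes, as the text suggests, that each of the two base routes combines one reaction from the 506-member group with one from the 106-member group; the function names are illustrative and not part of the original system.

```python
from itertools import product
from math import prod

def count_route_candidates(group_sizes_per_base_route):
    """Sum, over base routes, of the product of the group sizes of each step."""
    return sum(prod(sizes) for sizes in group_sizes_per_base_route)

def enumerate_route_candidates(groups):
    """Yield every combination that picks one reaction from each step's group."""
    return product(*groups)

# Group sizes from the text: 506 reactions (B.CO.4) and 106 reactions (B.C.1.2),
# assumed here to apply to each of the two base routes.
isobutanol_groups = [(506, 106), (506, 106)]
total = count_route_candidates(isobutanol_groups)
print(total)  # 107272
```

The result, 2 x 506 x 106 = 107,272, agrees with the number of reaction route candidates given above.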
Next, a prioritization method was applied to decide which route candidates are preferred by estimating five priority factors: binding site covalence, chemical similarity, thermodynamic favorability, pathway distance, and organism specificity. During the prioritization step, reaction route candidates were rearranged into enzyme-based routes, which are defined as enzyme route candidates. If multiple reactions are related to one enzyme, the binding site covalence and chemical similarity of the enzyme are set to the highest values among the related reactions. Figure 4 shows the results of prioritization by the estimation of the five factors for the synthesis of isobutanol.
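The rearrangement into enzyme route candidates can be sketched as a group-by that keeps the best value of each structural factor. This is a simplified illustration that takes the maximum at the level of whole routes; the dictionary layout, the label `ec_hypothetical`, and all numerical values are assumptions made for the example.

```python
from collections import defaultdict

def collapse_to_enzyme_routes(reaction_routes):
    """Group reaction route candidates by enzyme sequence and keep the highest
    binding site covalence (XB) and chemical similarity (XC) seen for each."""
    best = defaultdict(lambda: {"XB": 0.0, "XC": 0.0})
    for route in reaction_routes:
        key = tuple(route["enzymes"])
        best[key]["XB"] = max(best[key]["XB"], route["XB"])
        best[key]["XC"] = max(best[key]["XC"], route["XC"])
    return dict(best)

# Hypothetical factor values for illustration only.
reaction_routes = [
    {"enzymes": ["ec4.1.1.1", "ec1.1.1.1"], "XB": 0.60, "XC": 0.40},
    {"enzymes": ["ec4.1.1.1", "ec1.1.1.1"], "XB": 0.75, "XC": 0.35},
    {"enzymes": ["ec_hypothetical", "ec1.1.1.1"], "XB": 0.50, "XC": 0.45},
]
enzyme_routes = collapse_to_enzyme_routes(reaction_routes)
```

Three reaction route candidates collapse into two enzyme route candidates here, which mirrors how the number of candidates drops between the two stages.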
Finally, the preference of each enzyme route candidate was determined as a score calculated by the weighted sum of the factors (Equation 6a in Methods). In the initial predictions, the six pathways were positioned within the top 0.55% of the predicted candidates. In particular, isobutanol gave the best result, within the top 0.047%. In the initial prediction results, three factors (thermodynamic favorability, pathway distance, and organism specificity) are the same across the six pathways, whereas binding site covalence and chemical similarity vary among them. This pattern arises because the same enzymes are used in all six pathways. Among the results of Atsumi et al. [16], the production rate of isobutanol was at least three times higher than the production rates of the other higher alcohols. Also, the predicted rank of the synthetic pathway via KDC and ADH for the production of isobutanol is higher than that of the other pathways. Therefore, this pathway was considered to be the most suitable for producing isobutanol. To estimate accurately the relative influence of each factor, which is expressed as a parameter, the novel synthetic pathway for the production of isobutanol was selected.
To identify the optimal parameters, the evolutionary parameter optimization shown as Equation 7 in Methods was performed [21,22]. When all parameter values were set to 1 without optimization, the experimentally proven synthetic pathway for isobutanol was ranked 20th out of 42,344 enzyme route candidates. After the evolutionary parameter optimization, the parameter set changed from {1, 1, 1, 1, 1} to {0.703, 1.000, 1.001, 0.671, 0.951} and the rank of the synthetic pathway improved to 17th (Figure 5). The ranks of the six pathways improved to within the top 0.42% of the predicted candidates, compared with the top 0.55% in the initial predictions mentioned above. In addition, there were enzyme route candidates that always have higher final priority scores than the experimentally proven pathways (Table 1). This means that they have superior values in all factors. Thus, these superior enzyme route candidates might be used as synthetic pathways for more efficient production of the target chemical.
Figure 5 Parameter and rank changes through evolutionary optimization. The parameter changes are shown, together with the corresponding rank changes of the experimentally proven synthetic pathways for the production of isobutanol, 3HP, and butyryl-CoA.

Prediction of synthetic pathways for the production of 3HP
To examine the performance of the system framework developed in this study, the synthetic pathways for the production of 3HP were predicted as another example. 3HP is one of the top value-added chemicals suggested by the U.S. Department of Energy [23]. In addition to the seven metabolic pathways suggested for the production of 3HP from glucose [24,25], alternative pathways from acryloyl-CoA were identified. The structure of 3HP and its existing and alternative synthetic pathways from acryloyl-CoA are shown in Figure 6a. To identify the synthetic pathways for the production of 3HP from acryloyl-CoA, the two base routes shown in Figure 6b and 29,592 reaction route candidates were generated. Next, the reaction route candidates were analyzed quantitatively with the prioritization algorithm in order to identify the more promising routes. During the prioritization, 4,524 enzyme route candidates were generated and evaluated. The experimentally proven synthetic pathway from acryloyl-CoA to 3HP through ec4.2.1.17 and ec2.8.3.1 was ranked second out of 4,524 enzyme route candidates (within the top 0.044%). Since the existing pathway is a superior enzyme route candidate relative to the experimentally proven pathway, evolutionary optimization changes neither the parameter values nor the rank. This result demonstrates the efficiency and reliability of the developed system framework for identifying novel synthetic pathways for the production of desired chemicals.
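The percentile figures quoted for isobutanol and 3HP follow directly from the reported ranks and candidate counts. The helper below is only an arithmetic check; its name is illustrative.

```python
def top_percent(rank, n_candidates):
    """Express a rank as a percentage of the total number of candidates."""
    return 100.0 * rank / n_candidates

isobutanol_pct = top_percent(20, 42344)  # rank 20 of 42,344 before optimization
hp3_pct = top_percent(2, 4524)           # rank 2 of 4,524 for 3HP

print(round(isobutanol_pct, 3))  # 0.047
print(round(hp3_pct, 3))         # 0.044
```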

Prediction of alternative synthetic pathways for the production of 1-butanol
In this paper, the alternative synthetic pathway for the production of 1-butanol from 2-ketovalerate presented by Atsumi et al. [16] has been identified. Since the two-step synthetic pathway for the production of 1-butanol from butyryl-CoA already exists in the KEGG database, the synthetic pathways to butyryl-CoA were explored. The constructed alternative pathways with base routes are shown in Figure 7. During the route generation process, 3,240 reaction route candidates and 1,120 enzyme route candidates were identified. The enzyme route candidates were analyzed quantitatively by the prioritization method, and the resulting top 10 pathways are shown in Table 2. Among the top 10 pathways, the 8th ranked enzyme route candidate, a novel pathway via ec1.2.1.25 and ec5.4.99.13, was successfully constructed to produce butyryl-CoA from 2-ketoisovalerate. This pathway is structurally identical to the existing pathway; however, no organism has been found to produce butyryl-CoA via these enzymes. In other words, the pathway obtained a lower score in organism specificity, which is why it is ranked 8th. After the evolutionary optimization, the rank changed slightly to 7th and the parameter set changed to {0.956, 1.121, 1.001, 0.669, 0.801}. Through this work, the novel alternative synthetic pathway for the production of 1-butanol has been identified. This study demonstrates the applicability of the developed method and system framework.

Conclusions
In this study, a system framework was established to identify promising enzyme candidates for synthesizing desired chemicals. This approach can also be applied to find novel pathways for the biodegradation of chemicals. Through this work, 50 reaction rules representing numerous biochemical reactions were set up for qualitative analysis. The most notable feature of the study is the development of a new quantitative analysis method, the prioritization scoring algorithm. Using the novel estimation methods, new uses of enzymes can be predicted with greater precision. Moreover, the parameters are estimated by an evolutionary optimization method, and thus more accurate scores can be estimated as more experimentally validated data are added. This in silico prediction system is expected to contribute significantly to in vivo or in vitro experiments.

System framework
To identify promising enzyme candidates that catalyze novel biochemical reactions to synthesize desired chemicals, a system framework that follows the steps of target compound definition, route generation, prioritization, and parameter optimization was developed (Figure 8).
The detailed procedure is shown in the Results and Discussion using the isobutanol synthesis pathways as examples. With isobutanol set as the target chemical, the route length was set to 2. As shown in Figure 2, the 50 predefined reaction rules were applied twice in succession since the route length was 2; the reaction rules are explained in Database construction. Then, the substrate candidates were detected and a starting chemical was selected from the candidates. Following this step, enzymatic reactions that share a reaction rule with the reactions in the pathway from the starting chemical to the target chemical were matched. Then the prioritization method based on quantitative analysis was performed to identify which enzymes would be more promising for the desired novel pathway; the prioritization method is explained in Prioritization. As a result, the combinations of enzymes were sorted in decreasing order of the prioritization scores. If a novel pathway is proven by further studies including experimental validation, the evolutionary parameter optimization is performed to update the parameters based on the experimental data; the parameter optimization is explained in Parameter optimization. JChem [26] was used to handle chemical structures and GAMS/CPLEX [27,28] was used to perform parameter optimization. KEGG was employed as a pathway reference database [19,20]. SMILES/SMARTS [17,18] were used as chemical structure representation languages, JAVA as the programming language, and MSSQL 2005 Server as the database server.
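The recursive route generation with a fixed route length can be sketched as a depth-limited backward expansion. This is a toy illustration, not the system's implementation: compounds are plain names rather than SMILES structures, and `TOY_RULES`, `toy_apply_rules`, and the assignment of rule identifiers to steps are assumptions made for the example.

```python
def generate_routes(target, apply_rules, max_length):
    """Return (compounds, rule_ids) pairs for every backward route of up to
    max_length steps; compounds are ordered from precursor to target."""
    routes = []

    def expand(compounds, rule_ids):
        if len(rule_ids) == max_length:
            return
        for rule_id, precursor in apply_rules(compounds[0]):
            new_compounds = [precursor] + compounds
            new_rule_ids = [rule_id] + rule_ids
            routes.append((new_compounds, new_rule_ids))
            expand(new_compounds, new_rule_ids)

    expand([target], [])
    return routes

# Hypothetical backward rule table: product -> [(rule id, precursor)].
TOY_RULES = {
    "isobutanol": [("B.CO.4", "isobutyraldehyde")],
    "isobutyraldehyde": [("B.C.1.2", "2-ketoisovalerate")],
}

def toy_apply_rules(compound):
    return TOY_RULES.get(compound, [])

routes = generate_routes("isobutanol", toy_apply_rules, max_length=2)
two_step = [r for r in routes if len(r[1]) == 2]
print(two_step[0][0])  # ['2-ketoisovalerate', 'isobutyraldehyde', 'isobutanol']
```

In the real framework each rule application would transform a chemical structure, and the known compounds found among the generated precursors would be offered as starting compounds.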

Database construction
For the management of data for pathway/enzyme prediction, two databases were developed: a reaction rule database and a binding site rule database. First, the reaction rules explaining reaction mechanisms were defined. A reaction rule is constructed from the smallest substructure related to the structural change between the main reactant and the main product of the reaction. Thus, cosubstrates and cofactors are not considered. A small number of reaction rules can therefore describe a wide variety of reactions. In this study, 50 reaction rules were constructed (see additional file 1: The list of 50 reaction rules). As shown in Figure 9a, several reactions catalyzed by ADH (alcohol dehydrogenase; ec1.1.1.1) can be explained by a single reaction rule defined as F.CO.4. After the construction of the reaction rules, all the reactions in the database were classified according to the reaction rules. The reaction rules are applied not only in the forward direction, but also in the backward direction. In addition, reaction rules can be combined in order to represent one reaction. As a result, these simple reaction rules can represent 81% of the enzymes present in the KEGG database. Reactions are not categorized when the detailed structures of the compounds in the reactions are not defined; when the compounds are not carbon-related or are polymer-related; or when the reaction mechanism cannot be defined clearly in terms of the structural changes in both directions. In particular, some lyases and ligases cannot be identified clearly. Figure 9b shows the example of the leukotriene-A4-forming reaction catalyzed by ec4.4.1.20. The backward reaction was defined by cleaving the C-S bond, but the forward direction could not be defined clearly with simple and widely used cosubstrates such as carbon dioxide. The database for binding site rules was constructed to estimate binding site covalence, one of the prioritization factors.
Since a binding site is defined by the three-dimensional structure of a molecule, it is difficult to identify a binding site from the structural formula alone, without stereo information [29]. To deal with this problem, it was assumed that similar chemical substructures including the functional group may form similar binding sites [29][30][31]. For each reaction rule, a branch was extended or a ring was generated from the functional group to generate chemical substructures, which are defined as the binding site rules in this paper. The branch was extended with 103 atom-attaching reactions, or a ring was generated with 25 ring generation reactions, when the main branch of the chemical had 3 or more atoms. Next, binding site rules are named with the direction and the related reaction rule id for further estimation of binding site covalence. For a binding site rule on the substrate side of a reaction rule, the initial step is 'f'; for a binding site rule on the product side, the initial step is 'b'. The letters 'f' and 'b' denote the 'forward' and 'backward' direction of the reaction rule, respectively. Then, the index of the reaction rule, represented by a number, is added to identify the rule for which the binding site rule is defined. Each step is separated by a dot. Figure 10 presents an example. As the binding site rule grows, the name is extended. The remaining numbers represent the hierarchical information of the binding site rule.

Prioritization
If an enzymatic reaction has the same reaction rule as a novel reaction in a desired pathway, it is matched as a similar reaction. Many enzymatic reactions can be matched to each reaction in the desired pathway, and thus it is necessary to clarify which enzymatic reactions are more promising. To address this problem, the similar reactions need to be evaluated quantitatively. The quantitative measure of likeness is defined by a scoring algorithm, referred to as prioritization. The prioritization method is composed of five factors: binding site covalence, chemical similarity, thermodynamic favorability, pathway distance and organism specificity. Binding site covalence and chemical similarity are evaluated by comparing two reactions. Here, the binding site rules were defined by extending the functional groups that appear in each reaction rule. The binding site covalence represents the local similarity between two molecules, whereas the chemical similarity is estimated from the entire structures of the molecules in each reaction route candidate. Thermodynamic favorability is estimated from the chemical structure changes along the identified base routes. Pathway distance and organism specificity measure the relationships among the enzymes that catalyze the reactions in each enzymatic synthetic route from the starting to the target chemical. These factors are used to calculate the priorities of the enzyme route candidates.

Binding site covalence
Binding site covalence describes how similar two reactions are in terms of chemical structure changes. As described above, binding site rules are defined as substructures of chemicals that include their functional groups. In addition, the systematic name of a binding site rule records the trace of branch extension. Therefore, the binding site covalence is calculated from the systematic names; the more similar the names of two binding site rules are, the more similar their structures are. The binding site covalence between a known reaction in a reaction route candidate and a novel reaction in a base route is estimated as follows. First, the covalences of the molecules on the substrate side and on the product side are evaluated. The binding site covalence is then calculated by summing the covalences of the molecules on both sides of the reactions (Equation 1), where
$XB$, $XB_j$, $XB_i$: binding site covalence of an enzyme route candidate, a reaction route candidate and a reaction step in the base route, respectively
$b_c^s$, $b_c^p$: system name steps in common for two reactions on the substrate side and the product side, respectively
$b_{f_i}^s$, $b_{f_i}^p$: system name steps of a novel reaction $f_i$ on the substrate side and the product side for a reaction step $i$, respectively
$b_{g_i}^s$, $b_{g_i}^p$: system name steps of a known reaction $g_i$ on the substrate side and the product side for a reaction step $i$, respectively
An example is shown in Figure 11. One base route can be compared with several reaction route candidates to identify which reaction route candidate is more similar in terms of the local structures around the functional groups. The binding site covalence is normalized between zero and one: if the chemicals have the same binding site rule as the largest substructure on both sides of the reactions, the binding site covalence of the reaction step takes its maximum value of one.
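The idea of comparing systematic names step by step can be illustrated with a small sketch. It is not a reproduction of Equation 1: the normalization by the longer name and the averaging of the two sides are assumptions chosen only so that identical names give a value of one, and the example names are hypothetical.

```python
def common_steps(name_a, name_b):
    """Count the leading dot-separated steps shared by two binding site rule names."""
    count = 0
    for a, b in zip(name_a.split("."), name_b.split(".")):
        if a != b:
            break
        count += 1
    return count

def side_covalence(name_novel, name_known):
    """Shared steps divided by the length of the longer name (assumed normalization)."""
    longest = max(len(name_novel.split(".")), len(name_known.split(".")))
    return common_steps(name_novel, name_known) / longest

def step_covalence(novel, known):
    """Average of substrate-side ('s') and product-side ('p') covalence, in [0, 1]."""
    return 0.5 * (side_covalence(novel["s"], known["s"])
                  + side_covalence(novel["p"], known["p"]))

# Hypothetical binding site rule names for a novel and a known reaction.
novel = {"s": "f.1.2.1", "p": "b.1.3"}
known = {"s": "f.1.2.4", "p": "b.1.3"}
print(step_covalence(novel, known))  # 0.875
```

Here the substrate-side names share three of four steps and the product-side names are identical, so the sketch returns (0.75 + 1.0) / 2 = 0.875.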

Chemical similarity
Chemical similarity is examined on both sides of the reactions and describes how similar two reactions are in terms of chemical fragments [32]. The chemical similarity represents the similarity of the entire molecular structures, while the binding site covalence captures only the substructure similarities. In Figure 12, the binding site covalence is calculated by considering the circled structures only, whereas the chemical similarity is calculated by considering the whole structures. The binding site shown in the circle always contains a functional group, shown in the square. In this case, the binding site covalence may take a higher value than the chemical similarity, since the substructure similarities on both sides of the reactions are more significant. The Tanimoto coefficient, used here to represent chemical dissimilarity based on a fragment analysis of chemical bit-strings, was employed [32,33]. The Tanimoto coefficient has been one of the most popular measures of chemical dissimilarity, since it evaluates the relationship between chemicals accurately, is normalized from zero to unity, and is easy to use. In the system framework, the complementary value of the Tanimoto coefficient was used in order to evaluate the similarity rather than the dissimilarity. In the same manner as the binding site covalence calculation, the values are estimated on both sides of the reactions and summed for each reaction step (Equation 2), where
$XC$, $XC_j$, $XC_i$: chemical similarity of an enzyme route candidate, a reaction route candidate and a reaction step in the base route, respectively
$T(f, g)$: Tanimoto coefficient for two molecules $f$ and $g$, with $0 \le T(f, g) \le 1$
$f_i^s$, $f_i^p$: substrate and product of a reaction step in a base route $f_i$ for a step $i$, respectively
$g_i^s$, $g_i^p$: substrate and product of a reaction step in a reaction route candidate $g_i$ for a step $i$, respectively
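A fingerprint comparison of this kind can be sketched on sets of "on" bit positions. The function below computes the standard Tanimoto similarity (shared bits over total bits), so its complement is the corresponding dissimilarity; the fingerprints are made up for the example and are not JChem output, and treating two empty fingerprints as identical is a convention chosen here.

```python
def tanimoto_similarity(bits_a, bits_b):
    """Shared on-bits divided by the union of on-bits of two fingerprints."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 1.0  # convention for two empty fingerprints (assumption)
    return len(a & b) / len(union)

def step_similarity(novel, known):
    """Average the substrate-side ('s') and product-side ('p') similarity."""
    return 0.5 * (tanimoto_similarity(novel["s"], known["s"])
                  + tanimoto_similarity(novel["p"], known["p"]))

# Hypothetical fingerprints given as positions of set bits.
fp_a = {1, 2, 3, 4}
fp_b = {3, 4, 5, 6}
similarity = tanimoto_similarity(fp_a, fp_b)   # 2 shared bits of 6 in total
dissimilarity = 1.0 - similarity
```

Averaging the two sides keeps the step value between zero and one; the text describes a summation over both sides, so the factor of one half is an assumption about the normalization.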

Thermodynamic favorability
The group contribution method was used to estimate the Gibbs free energy of formation [34,35]. Normally, the Gibbs free energy of formation is used to determine whether a reaction is thermodynamically feasible, and it takes a negative value if the reaction is feasible. Since the group contribution method requires only the group and atom properties of a molecule, it is well suited to the system framework. However, the environmental parameters in a living system, such as temperature and pressure, are not ideal and change continuously, and thus the group contribution method cannot estimate the exact Gibbs free energy of formation. Therefore, we employed the Gibbs free energy of formation as a reference for determining thermodynamic favorability. From the starting to the target chemical, the overall Gibbs free energy of formation estimated by the group contribution method must be the same for every route. Thus, the more important variable is not the overall Gibbs free energy of formation but the fluctuation of the energy level along the synthetic route (Figure 13). As shown in Equation 3a, the absolute values of the Gibbs free energies of formation are summed along a base route to obtain the 1-norm distance [36],
where
$XT$: normalized thermodynamic favorability of an enzyme route candidate
$XT_i$: Gibbs free energy of formation of a step in a base route
$XT_j$: thermodynamic favorability of an enzyme route candidate
Figure 11 An example of estimation of binding site covalence. The binding site covalence between novel and known reactions belonging to rule number 1 (rule id F.C.1) is evaluated. The largest binding site that a compound contains is shown next to the compound. Here, $b_c$ denotes the system name steps in common for reactions f and g, while $b_f$ and $b_g$ denote the system name steps of a novel reaction f and a known reaction g, respectively. The superscripts s and p denote the substrate side and the product side, respectively. XB is the binding site covalence between a novel reaction f and a known reaction g (Equation 1a).
Here, we adjusted the range of thermodynamic favorability from zero to one for comparison with other factors. Moreover, a larger value indicates more fluctuation so that the values should be converted. To make a larger value more favorable, the negative exponential function of each thermodynamic favorability ratio with respect to the maximum value was applied. After the three factors have been estimated, all the identified routes are rearranged as enzyme route candidates. If two or more reactions are catalyzed by one enzyme, then the best values of binding site covalence and chemical similarity are selected (Equation 1c, 2c).
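The two operations described here, a 1-norm over the steps followed by a negative exponential of the ratio to the maximum, can be sketched as follows. The step energies are hypothetical values with no particular unit, and the handling of an all-zero case is an assumption added to avoid division by zero.

```python
import math

def fluctuation(step_energies):
    """1-norm of the step-wise Gibbs free energy values along a base route."""
    return sum(abs(e) for e in step_energies)

def normalized_favorability(fluctuations):
    """Map each fluctuation x to exp(-x / max) so that less fluctuation scores higher."""
    largest = max(fluctuations)
    if largest == 0:
        return [1.0 for _ in fluctuations]  # no fluctuation anywhere (assumption)
    return [math.exp(-x / largest) for x in fluctuations]

# Hypothetical step energies for two base routes.
base_routes = [[-5.0, 2.0], [-10.0, 7.0]]
fluctuations = [fluctuation(route) for route in base_routes]  # [7.0, 17.0]
scores = normalized_favorability(fluctuations)
```

With this mapping the route with the largest fluctuation receives exp(-1), and routes with smaller fluctuation approach one.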

Pathway distance
Previous studies have attempted to identify the genomic relations associated with pathway distance [37,38]. The pathway distance was initially defined as the shortest path between two enzymes; if two enzymes catalyze the same reaction, the pathway distance is zero. A shorter pathway distance indicated a higher genomic relation based on gene pairs [38]. The shortest pathway distance between two enzymes is found by a full search that starts from one enzyme and continues until the other is met. First, the pathway distance between the first two enzymes is evaluated; then the distance between the second and third enzymes is evaluated, and so on until the last enzyme in the enzyme route candidate is reached,
where
$XP$: pathway distance of an enzyme route candidate
$XP_{i,i+1}$: revised pathway distance between the $i$th and $(i+1)$th steps in an enzyme route candidate
$p_{i,i+1}$: pathway distance between the $i$th and $(i+1)$th steps in an enzyme route candidate
Finally, all the evaluated pathway distances between steps in an enzyme route candidate are multiplied. If one distance increases, the co-expression probability of the two enzymes decreases (Equation 4a).
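The shortest-path search between two enzymes can be illustrated with a breadth-first search, which is one standard way to carry out a full search on an unweighted network. The enzyme graph and its labels are hypothetical, and the conversion of a raw distance into the revised distance of Equation 4 is not reproduced.

```python
from collections import deque

def pathway_distance(graph, start, goal):
    """Number of links on the shortest path between two enzymes, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

# Hypothetical enzyme adjacency: E1 - E2 - E3, with E4 isolated.
enzyme_graph = {"E1": ["E2"], "E2": ["E1", "E3"], "E3": ["E2"], "E4": []}
route = ["E1", "E2", "E3"]
pairwise = [pathway_distance(enzyme_graph, a, b) for a, b in zip(route, route[1:])]
```

For an enzyme route candidate, the distance is evaluated for each consecutive pair of enzymes, as in the `pairwise` list above.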

Organism specificity
Organism specificity is assigned in the same manner as the pathway distance. It estimates the organism-to-organism distance, a subject that has been studied extensively [39][40][41]. The distance is calculated from the hierarchical information of organism lineages, which reflects gene changes from generation to generation. If the organism of one enzyme differs by one generation from that of the other, the co-expression probability of the two enzymes decreases (Figure 14, Equation 5a). The lineage information of all genes stored in the KEGG database was used. The distance is established by comparing the lineage information; moreover, the system searches for the closest organisms that encode the two enzymes,
where
$XO$: organism specificity of an enzyme route candidate
$XO_{i,i+1}$: organism specificity between the $i$th and $(i+1)$th steps in an enzyme route candidate
$o_i$: number of lineage generations of the $i$th step in an enzyme route candidate
$o_{i,i+1}$: number of lineage generations in common for the $i$th and $(i+1)$th steps in an enzyme route candidate
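The quantities $o_i$ and $o_{i,i+1}$ can be obtained by comparing two lineages level by level. The sketch below only counts generations; it does not reproduce Equation 5, and the lineage lists are abbreviated illustrations rather than KEGG records.

```python
def common_generations(lineage_a, lineage_b):
    """Count the leading lineage levels shared by two organisms."""
    count = 0
    for a, b in zip(lineage_a, lineage_b):
        if a != b:
            break
        count += 1
    return count

# Abbreviated, illustrative lineages for the organisms encoding two enzymes.
lineage_a = ["Bacteria", "Proteobacteria", "Alphaproteobacteria"]
lineage_b = ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacteriales"]

o_a = len(lineage_a)                               # generations of the first enzyme
o_b = len(lineage_b)                               # generations of the second enzyme
o_ab = common_generations(lineage_a, lineage_b)    # generations in common
```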

Final priority score
The meanings of five factors addressed above are categorized into three groups: structural similarity of reaction steps in a route, thermodynamic benefits among intermediates and co-expression probability of enzymes. With those factors, the priority score for each route is finally obtained by Equation 6a.
where
$X$: priority score of an enzyme route candidate
$\alpha$: parameter for binding site covalence
$\beta$: parameter for chemical similarity
$\gamma$: parameter for thermodynamic favorability
$\delta$: parameter for pathway distance
$\varepsilon$: parameter for organism specificity
Each parameter was initially set to one; the parameters were then optimized, as introduced in the following section. Finally, the promising enzyme candidates are sorted by priority, where a higher priority value means a greater likelihood of catalyzing a novel synthetic route. Since numerous enzyme candidates are ordered quantitatively, the promising candidates among them can be distinguished and applied to experiments.
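Since the priority score is described as a weighted sum of the five factors, the scoring and sorting step can be sketched as below. The form X = αXB + βXC + γXT + δXP + εXO is inferred from that description rather than copied from Equation 6a, and the candidate names and factor values are hypothetical.

```python
FACTOR_KEYS = ("XB", "XC", "XT", "XP", "XO")

def priority_score(factors, params):
    """Weighted sum of the five factors; params maps each factor key to its weight."""
    return sum(params[key] * factors[key] for key in FACTOR_KEYS)

def rank_candidates(candidates, params):
    """Sort enzyme route candidates by decreasing priority score."""
    return sorted(candidates,
                  key=lambda c: priority_score(c["factors"], params),
                  reverse=True)

# Initial parameters are all one, as stated in the text.
initial_params = {key: 1.0 for key in FACTOR_KEYS}

# Hypothetical candidates for illustration only.
candidates = [
    {"name": "route_A", "factors": {"XB": 0.9, "XC": 0.8, "XT": 0.5, "XP": 0.2, "XO": 0.3}},
    {"name": "route_B", "factors": {"XB": 0.5, "XC": 0.5, "XT": 0.5, "XP": 0.9, "XO": 0.9}},
]
ranked = rank_candidates(candidates, initial_params)
print([c["name"] for c in ranked])  # ['route_B', 'route_A']
```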

Parameter optimization
Finding the optimal parameters to identify the best candidate is one of the critical issues in determining a valid priority, since the parameters express the impacts of, and the relationships among, the aforementioned five factors. To optimize the parameters, the known enzymes that exist in the KEGG database were used to identify the experimentally proven novel pathways. With the pathway information in the KEGG database, the currently known pathways will inevitably receive the highest scores. For this reason, the criteria were designed around currently known pathways. For a known pathway, the compared chemical structures of a synthesized route and an existing pathway are identical, and thus the binding site covalence and chemical similarity must take their maximum values for each step in the route. Moreover, the pathway distance must be short and the organism specificity is one. Thermodynamic favorability can be better for an unknown route; however, this effect cannot be analyzed alone without considering the other factors. To validate our approach, the parameters were optimized to satisfy the criterion that newly synthesized pathways from the literature should be ranked high. Using the data set of prioritization factors, a mixed integer linear problem was set up as follows. The primary objective is to minimize the difference between the priority score of a desired route candidate with the current parameters and its score with the adjusted parameters. The secondary objective is to maximize the number of reactions having lower scores than the desired route candidate. That is, whenever a test case is added, the parameters evolve so as to minimize the deviation and, at the same time, rank experimentally verified pathways high.
Figure 14 This tree represents lineage relationships among organisms. The value written in each node indicates the organism specificity compared to the filled node. If an enzyme a is expressed in Alphaproteobacteria and an enzyme b is expressed in Enterobacteriales, the organism specificity between a and b is 0.73.
where
$\alpha'$: adjusted parameter for binding site covalence
$\beta'$: adjusted parameter for chemical similarity
$\gamma'$: adjusted parameter for thermodynamic favorability
$\delta'$: adjusted parameter for pathway distance
$\varepsilon'$: adjusted parameter for organism specificity
$XB_{obj}$: binding site covalence of the desired candidate obj
$XC_{obj}$: chemical similarity of the desired candidate obj
$XT_{obj}$: thermodynamic favorability of the desired candidate obj
$XP_{obj}$: pathway distance of the desired candidate obj
$XO_{obj}$: organism specificity of the desired candidate obj
$X'_j$: priority score of a route candidate $j$ with adjusted parameters
$X'_{obj}$: priority score of the desired candidate obj with adjusted parameters
$y_j$: binary variable
The priority scores are calculated by Equation 6, and the lower and upper bounds of the parameters are 0.5 and 1.5, respectively. This problem was solved by evolutionary parameter optimization [21,22]. After the evolutionary parameter estimations were performed sequentially with five test cases, the parameters were adjusted, and consequently Equation 6a was specified to Equation 6b.
The evolutionary parameter optimization process according to enrolled test cases is presented in Figure 5.
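The trade-off between the two objectives can be illustrated with a coarse grid search over the bounded parameters. This is a stand-in for the mixed integer linear problem solved with GAMS/CPLEX, not a reimplementation of Equation 7: the grid, the way the two objectives are combined through `rank_weight`, and all factor values are assumptions.

```python
from itertools import product

FACTORS = ("XB", "XC", "XT", "XP", "XO")

def score(factors, params):
    """Weighted sum of the five factors for one candidate."""
    return sum(p * factors[k] for k, p in zip(FACTORS, params))

def tune_parameters(target, others, current, grid=(0.5, 1.0, 1.5), rank_weight=10.0):
    """Search parameters within [0.5, 1.5] that keep the target's score close to
    its current value while placing as many competitors as possible below it."""
    base = score(target, current)
    best_cost, best_params = None, None
    for params in product(grid, repeat=len(FACTORS)):
        s_target = score(target, params)
        beaten = sum(score(o, params) < s_target for o in others)
        cost = abs(s_target - base) - rank_weight * beaten
        if best_cost is None or cost < best_cost:
            best_cost, best_params = cost, params
    return best_params

# Hypothetical factor values: the target loses to the competitor when all parameters are one.
target = {"XB": 0.9, "XC": 0.8, "XT": 0.5, "XP": 0.2, "XO": 0.3}
others = [{"XB": 0.5, "XC": 0.5, "XT": 0.5, "XP": 0.9, "XO": 0.9}]
current = (1.0, 1.0, 1.0, 1.0, 1.0)
tuned = tune_parameters(target, others, current)
```

In this toy case the search shifts weight towards the structural factors and away from pathway distance and organism specificity until the target outranks its competitor, which is the kind of adjustment the evolutionary optimization makes as test cases are added.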