OptPipe - a pipeline for optimizing metabolic engineering targets

Background We propose OptPipe - a Pipeline for Optimizing Metabolic Engineering Targets, based on a consensus approach. The method generates consensus hypotheses for metabolic engineering applications by combining several optimization solutions obtained from distinct algorithms. The solutions are ranked according to several objectives, such as biomass and target production, by using the rank product tests corrected for multiple comparisons. Results OptPipe was applied in a genome-scale model of Corynebacterium glutamicum for maximizing malonyl-CoA, which is a valuable precursor for many phenolic compounds. In vivo experimental validation confirmed increased malonyl-CoA level in case of ΔsdhCAB deletion, as predicted in silico. Conclusions A method was developed to combine the optimization solutions provided by common knockout prediction procedures and rank the suggested mutants according to the expected growth rate, production and a new adaptability measure. The implementation of the pipeline along with the complete documentation is freely available at https://github.com/AndrasHartmann/OptPipe. Electronic supplementary material The online version of this article (doi:10.1186/s12918-017-0515-0) contains supplementary material, which is available to authorized users.


Optimization methods
The different optimization methods that have been considered for knockout prediction on the genome-scale models are the following: OptKnock, RobustKnock, MOMA, OptGene and RobOKoD. The Table below contains the references and availability of the different methods.

OptKnock
OptKnock gives an optimistic prediction to the target production based on the premise that the organism maximizes the biomass [1]. The problem is formulated as a bi-level mixed integer linear problem (MILP) Where v prod is the target compound to produce, v biomass is the reaction corresponding to the biomass, S is the stoichiometric matrix, LB and U B the lower and upper bounds, respectively. The variables of the inner problem are the fluxes v, and the outer problem operates on the binary variables, y, representing the deletions. Here the value y j = 1 means that the j th reaction is deleted and y j = 0 has the meaning that it is not. The maximum number of deletions is κ. It was shown in [1,9] that the bi-level optimization problem can be translated into a single-level MILP. State-of-the art mixed integer solvers like gurobi and CPLEX can find a globally optimal solutions of such problems.

RobustKnock
RobustKnock is formulated as a three-level mixed-integer optimization problem [3] which, similarly to the OptKnock, can be transformed to a single-level mixed integer problem.
Here the outer two levels comprise a max-min problem, which is a common formulation for loss minimization, optimizing for the worst-case scenario.

OptKnock vs. RobustKnock
In a sense, OptKnock represents an optimistic prediction because when fixing the binary variables, it formulates the question: "What is the maximal compound production given that the organism optimizes for biomass?" By the same token, RobustKnock serves as a pessimistic prediction for the biomass-coupled target production, i.e. it maximizes the the minimal target compound given that the biomass is maximized. See Figure 1 for an illustrative example on the two different approaches.

MOMA
Minimization Of Metabolic Adjustment (MOMA) relies on the assumption that the fluxes of the genetically engineered organism undergo a minimal redistribution when compared to the wild type organism as a reference flux distribution [4]. This is expressed in the optimization problem minimizing the distance of the flux distribution of the mutant to the reference.
Where (·)v is a distance measure. Because of shorter running-time, in our applications the linear version of MOMA (lMOMA) was used, which involves the minimization of the first norm between the reference organism and the modified one. The reference is represented by the parsimonious Flux Balance Analysis (pFBA), consisting of two steps. First a traditional Flux Balance Analysis (FBA) is done in order to calculate the wild type biomass denoted by f .

OptGene
OptGene [5,6] considers the application of Evolutionary Algorithms (EAs) and Simulated Annealing (SA). Unlike OptKnock and RobustKnock that guarantee global optimal solutions, EA and SA are meta-heuristic methods that are only capable of providing near-optimal solutions but within a reasonable computation time. Another benefit of OptGene is being quite flexible regarding the objective function that can be optimized (e.g. they are able to deal well with non linear functions). A popular choice is the Biomass-Product Coupled Yield (BPCY) [5], where is the flux of the substrate which serves as precursor for the target compound.

RobOKoD
RobOKoD identifies knockout, overexpression/dampening targets in an iterative threestep process [8]. First a Metabolite Consumption Test (MCT) is executed to identify metabolites within the optimal target production pathway that are also consumed to produce biomass, and to determine their promiscuity. The second step is a flux variability analysis profiling (FVAp), where the reactions are profiled to determine which reaction would make an ideal knockout / dampening / overexpression target. The third step is a selection of the most promising deletions. The strength of RobOKoD is that it is able to predict and rank knockouts, over-expressions, and dampening targets. While predicting an optimized set of gene modifications to implement, unlike other methods, RobOKoD also provides lists of candidate modifications, along with graphical flux variability profiles, allowing the user to manually validate the set of predictions.

Detailed pipeline
The step-by-step procedure applied in the pipeline is detailed below: 1. Introduction of experimental constraints for a wild-type C. glutamicum strain. These values comprise the consumption rates of glucose, acetate and oxygen and production rates of succinate and carbon dioxide, and were obtained from [10]. Additionally, for the particular case study of naringenin production in C. glutamicum it was considered that 5-coumarate is provided in the media, to mimic the experimental conditions.
2. Define the set of reactions that should not be considered for deletion ("Irrelevant reactions") a) Essential reactions: required for growth. This set was determined by evaluating every single reaction deletion and the corresponding growth rate; deletions which resulted in a growth rate below a given threshold (10 -9 ) were considered lethal.
b) Transport and synthetic reactions: transport, sink and macromolecule formation reactions. This list was compiled based on a search by reaction name.
c) Blocked reactions: cannot carry a non-null flux and are therefore, an artifact from the reconstruction process. This set was compiled through the use of the findBlockedReaction, built in the Cobra Toolbox. To avoid OptFlux from considering "Irrelevant Reactions" to knockout, the gene association was removed in these reactions and the evolutionary algorithm was run on the gene deletion option, instead of the reaction deletion option.
d) Enumeration using the OptKnock objective function e) Enumeration using the RobustKnock objective function In both enumeration approaches the mutants with a growth rate below a tolerance of 10 -9 were disregarded.
4. Join the knockout lists generated by the 5 methods 5. Compute the 4 parameters that will be used for the ranking procedure for each deletion mutant combination a) Maximal growth rate b) Minimal target production c) Maximal target production d) Adaptability, i.e. the (sum of absolute differences) distance between the flux distribution of the mutant and the wild-type flux variability. For each reaction in the network, the distance between the flux in the deletion mutant distribution and the minimal/maximal flux of the wild-type distribution is calculated. Differences less than 10% of the minimal wild-type flux were considered insignificant and excluded from the final sum of the differences of all the fluxes, yielding the adaptability parameter.
6. Rank Product (RP) application in 10 iterations to account for randomness in rank attribution between deletion mutants that are tied for a given parameter. Parameters that have a standard deviation below 10 -5 are not considered for the method.
7. Filtering of proposed deletion strategies that have a growth rate below 0.1 and a minimal target production of zero.