Multi-objective optimization framework to obtain model-based guidelines for tuning biological synthetic devices: an adaptive network case

Background Model based design plays a fundamental role in synthetic biology. Exploiting modularity, i.e. using biological parts and interconnecting them to build new and more complex biological circuits is one of the key issues. In this context, mathematical models have been used to generate predictions of the behavior of the designed device. Designers not only want the ability to predict the circuit behavior once all its components have been determined, but also to help on the design and selection of its biological parts, i.e. to provide guidelines for the experimental implementation. This is tantamount to obtaining proper values of the model parameters, for the circuit behavior results from the interplay between model structure and parameters tuning. However, determining crisp values for parameters of the involved parts is not a realistic approach. Uncertainty is ubiquitous to biology, and the characterization of biological parts is not exempt from it. Moreover, the desired dynamical behavior for the designed circuit usually results from a trade-off among several goals to be optimized. Results We propose the use of a multi-objective optimization tuning framework to get a model-based set of guidelines for the selection of the kinetic parameters required to build a biological device with desired behavior. The design criteria are encoded in the formulation of the objectives and optimization problem itself. As a result, on the one hand the designer obtains qualitative regions/intervals of values of the circuit parameters giving rise to the predefined circuit behavior; on the other hand, he obtains useful information for its guidance in the implementation process. These parameters are chosen so that they can effectively be tuned at the wet-lab, i.e. they are effective biological tuning knobs. To show the proposed approach, the methodology is applied to the design of a well known biological circuit: a genetic incoherent feed-forward circuit showing adaptive behavior. Conclusion The proposed multi-objective optimization design framework is able to provide effective guidelines to tune biological parameters so as to achieve a desired circuit behavior. Moreover, it is easy to analyze the impact of the context on the synthetic device to be designed. That is, one can analyze how the presence of a downstream load influences the performance of the designed circuit, and take it into account. Electronic supplementary material The online version of this article (doi:10.1186/s12918-016-0269-0) contains supplementary material, which is available to authorized users.

Note: the empty set ∅ denotes species degradation when placed in the right hand side of a reaction. Notice we assume that binding between the activating dimer and the gC hybrid promoter is always possible, even if the repressor B is already bound to the promoter, and vice versa. Yet, we consider that whenever the repressor B is bound to the promoter gC, the RNA polymerase cannot bind the promoter.
Using the law of mass-action kinetics (Horn and Jackson, 1972;Chellaboina et al., 2009), the previous reactions can be used to formulate the corresponding dynamic balances of the species 3 concentrations: Note these equations can be derived either by inspection, or using specific software to automate the process. Software packages like BioNetGen (Blinov et al., 2004) or COPASI (Mendes et al., 2009) allow to obtain the dynamic kinetic model from either the set of reactions or from SBML files encoding them.
The complete model is of large order, which implies a high computational cost for the parameters estimation process that will be carried out later on. Moreover, the large differences in the time scales among the different species in the synthetic gene network (typically many orders of magnitude) originate huge difficulties for simulating the temporal evolution of the network and for understanding the basic principles of its operation. Therefore, the dynamical model will be reduced using time-scale separation and detection of invariant moieties.
We apply the Quasi Steady-State Approximation (QSSA) to the fast chemical species (Zagaris et al., 2004;Mélykúti et al., 2014). In essence QSSA is a singular perturbation method that considers the time-scale separation among the different dynamics. In particular, we will assume that binding reactions occur very fast in comparison with those corresponding to transcription, translation or even genuine degradation. On the other hand, monomer formation is faster that dimerization Therefore, the differential equation corresponding to the AI complex formation will be assumed to be at steady state.
Additional algebraic relationships among variables can be obtained through system invariants. In the case of reaction networks, it can be observed that some reactions are a linear combi-4 Table S 2. List of variables used in the reduced model.

Variable Description
Units Symbol External I ext nM I e nation of other ones. Then, the linear combination of the concentrations of the species involved will keep constant in time. These linear combinations, so called moieties, can be understood as a kind of quasi-species that keep invariant, i.e. keep constant concentration. After the reduction, ommited here, we get a system of nine ordinary differential equations (each one corresponding to the dynamics of one of the species) plus an algebraic equation, and 26 model parameters that will be the decision variables at the the optimization step. The resulting model is:ẋ (1) . The resulting set of biochemical reactions corresponding to the reduced model is: Where f A , f B , and f C are the lumped propensities obtained from the reduction: 6

Matlab CODE
A short description of the main functions integrating this code and justification of the value sets is given below. It has been divided in two groups: files related to the model computational characterization, and files used by the optimizer, which link to the first set.
Model code • model 3genes.m is a function for the ODEs of the reduced model. Receives the value of the state vector x at time t, the parameters, initial conditions and time point; and returns a vector with the derivatives defined in it. When used with the command ode23s in the function objective func.m one obtains the solution of the ODEs system for the given parameters.
• objective func.m is the objective function. It receives the parameters and returns the objectives values vector, after calculating J 1 and J 2 for the corresponding dynamic response obtained with the given parameters.
The 14 variables are initialized, and the 10 parameters are not because the optimizer will work with a given range in its code.
The ode23s algorithm gives the variables values Y for each t, using model 3genes.m. This ode algorithm was selected because our system model is what it is known as stiff, in terms of the numerical solution of ordinary differential equations, i.e. it has both slow and fast dynamics. An ordinary differential equation problem is stiff if the solution being sought is varying slowly, but there are nearby solutions that vary rapidly, so the numerical method must take small steps to obtain satisfactory results. For our integration problem we use an absolute tolerance of 1e-8, and a relative tolerance of 1e-6.
For computational simulation, we start from the equilibrium initial conditions (precomputed) and give a jump of 50nM to the concentration of x 9 to simulate the induction. With respect to the simulation parameters, the simulation sampling time (Ts) was fixed to 1e −3 minutes, and a total simulation time T sim = 300 minutes was used.
• eval obj fun.m is the function that receive a population of parameters, evaluates the objective functions in this population, and accumulates the results in a matrix to return it. It is executed at each iteration of the sp-MODE algorithm.

MOO code
First, highlight that we use the script Tutorial.m to run all the functions used to obtain the results shown in the main paper.
The first step is to run the spMODEparam file to build the variable 'spMODEDat' with the variables required for the optimization. Here the number of objectives are defined, also the number of decision variables and the 'Cost Function', which brings the objectives matrix after previous ode simulations (by means of interlinked functions mentioned above, constituting in essence the problem 'nucleus' or characterization). The field of search, and bounds to improve pertinency of solutions in the objective space so as to cut solutions with no interest to the DM, are defined here too. Also other aspects, such as maximum Pareto optimal solutions required and a bound on the number of function evaluations.
Once the Pareto set and the Pareto front are found by the optimizer, results can be plot with optional features through the Leveltool. This tool provides the LD visualization for the MCDM.
• spMODEparam.m generates the required parameters to run the spMODE optimization algorithm.
In this file the variables regarding the multi-objective problem are defined. The values of interest for our problem are: 1. Number of objectives. spMODEDat.NOBJ = 2 2. Number of decision variables.
spMODEDat.CostProblem = 'modelo3genes' 5. Maximum and minimum values for the parameters or decision variables are fixed in order to give a range to the optimizer to search the optimal solutions, (spMODE-Dat.FieldD). k d and d I e were fixed to avoid the optimizer to modify the model input I e , as we want an step input determined by K e (t). 6. Bounds on objectives.
• CostFunction.m calls the cost function of your own multi-objective problem. In this case eval obj fun.m. It also includes a default mechanism to improve pertinency (Objective space bounded).

Clustering and Visualization
• clustering.m is a script that performs the hierarchical clustering with the solutions obtained from the spMODE optimization algorithm, and uses the modified LD-tool to plot the LD plots with the cluster number as Y-axis.
Computational cost. Execution of our MOO using the sp-MODE algorithm (15.000 evaluations of the objective function) took around 10 hours and 25 minutes and was performed in a Intel XEON Server with 12 cores and 32 Gb of RAM Memory.