Genetic circuits topology
The genetic networks studied in this work are shown in Figure 1. They are transcriptional regulatory cascades commonly used in synthetic biology studies [6, 13, 17, 21]. These networks are composed of inducible and repressible devices with a reporter gene downstream, such as RFP, to measure the circuit output. In particular, an N-3-oxohexanoyl-L-homoserine lactone (3OC6-HSL)-inducible device is always used as the input module of the network; it is composed of a constitutive expression cassette for the LuxR protein, which activates the Plux promoter in the presence of 3OC6-HSL in a concentration-dependent fashion. A genetic NOT gate is considered as downstream module, composed of a TetR protein expression cassette, driven by the upstream device promoter, and the Ptet promoter, which can be repressed by TetR in a concentration-dependent fashion. Another NOT gate device is also considered; its functioning is analogous to the TetR/Ptet system and it includes the LacI/Plac repressor-promoter pair. Finally, a genetic YES gate is considered, which is composed of an activator protein (herein called A) expression cassette, driven by the upstream device promoter, and a promoter (herein called PA) that is activated by A in a concentration-dependent fashion. While the lux, tet and lac systems are well-defined and literature parameters are used to model their functioning (see below), the A/PA system is mock and arbitrary parameters are used for it.
The paper is mainly focused on the network of Figure 1A, where the 3OC6-HSL-inducible device is interconnected to a TetR-/Ptet-based NOT gate. Two network modifications are also considered, where the NOT gate is replaced by a YES gate (Figure 1B) or where a LacI/Plac-based NOT gate device is interconnected downstream of the TetR/Ptet-based NOT gate (Figure 1C).
In silico experiments
Assuming the modularity of biological parts [8, 9], that is, the module parameters do not change when different units are interconnected, we aim to simulate experimental studies to evaluate the contribution of noise on the population-averaged network output. Two main investigations were performed, which are described below. The in silico experiments are carried out by considering the network at the steady-state and by using Hill equations to describe the transfer function of each individual module. The output of interconnected networks is obtained by serially propagating the output of Hill functions and noise is applied at different levels of the networks, as illustrated in Figure 1D-H.
Transfer function identification for a single module
The steady-state input-output transfer function of the TetR/Ptet-based NOT gate (or the YES gate) is measured by estimating its Hill function parameters from simulated data, generated by including noise obeying specified laws. In particular, 3OC6-HSL-inducible devices affected by different noise entities were used to drive the NOT gate over a range of input values. The input devices are assumed to provide identical population-averaged signals to drive the NOT gate, except for the noise entity affecting them. In an experimental framework, this is analogous to having a set of inducible devices whose population-averaged transfer function at the steady-state has been identified via central tendency measures and the model of noise affecting it is known. Previous studies were performed via a similar setup [6, 39], where the transfer function of the NOT gate module was identified in the presence of different input modules, pre-characterized via central tendency measures, and different transfer curves were yielded for the NOT gate; since the different input modules used may have different noise characteristics, this study aims to evaluate the contribution of noise on the resulting difference of the measured transfer curves.
Input-output function identification for an interconnected network
The black-box input-output transfer function of interconnected networks (see Figure 1A-C) is studied for different noise entities. Population-averaged measures of the individual modules, in the presence of noise, are used to identify their transfer functions and to predict the input-output characteristics of the full network. Then, noise of different entities is applied to the individual modules and it is propagated throughout the network. Finally, the population-averaged measures of the network output are used to identify the black-box Hill function of the whole circuit and it is compared to the prediction obtained above. This study aims to evaluate the contribution of noise on the apparent unpredictability of composite biological systems obtained by interconnecting modules characterized via central tendency measures. The studied situation represents a crucial aspect in the synthetic biology world, where predictable systems are expected to be built from the knowledge of a set of individual parts [3, 5].
Steady-state transfer functions of the genetic modules
The steady-state input-output activity of all the modules is modelled by Hill equations, describing the synthesis rate per cell of the protein encoded by the downstream module, as a function of input molecule concentration [40–42]. In fact, even if all the considered modules are based on transcription, the synthesis (or translation) rate per cell of the downstream gene-encoded protein is assumed to be proportional to the transcription rate generated by the input device. This assumption is based on simple gene expression models, such as the ones described in [43, 44].
All the investigated interconnected networks are studied at the steady-state.
The transfer function of the 3OC6-HSL-inducible system (Module 1, Figure 1D-H) is always modelled as:
(1)
In Eq.1, the output of the device depends on 3OC6-HSL inducer concentration via four parameters describing its behaviour: δ
IN
is the basal activity of the device when no inducer is present, δ
IN
+α
IN
is the maximal synthesis rate of the downstream protein, k
IN
is the inducer concentration giving an output of , and η
IN
is the Hill coefficient. The k
IN
parameter is expressed in the same units as 3OC6-HSL (nM), the η
IN
parameter is dimensionless and the δ
IN
and α
IN
parameters are measured Relative Promoter Units (RPUs) [43], which are used to approximate the downstream protein synthesis rate per cell triggered by the device [6].
The function of the TetR-based NOT gate (Module 2, Figure 1D, E, H) is modelled as:
(2)
In Eq.2, OUTPUT2 is a decreasing function of OUTPUT1 and parameters have the same meaning as above. Although OUTPUT1 is the TetR synthesis rate per cell, it is used to approximate the intracellular concentration of TetR protein, assuming that it is proportional to its synthesis rate per cell [43]; in this case, the proportionality constant is included in the k
OUT
parameter.
When considering the three-module network, the function of the LacI-based NOT gate (Module 3, Figure 1H) is modelled as:
(3)
In Eq.3, all the parameters have the same meaning as in Eq.2.
The function of the YES gate (Module 2, Figure 1F,G) is modelled as:
(4)
In this case (Eq.3), OUTPUT2 is an increasing function of OUTPUT1, as opposed to Eq.2. In Figure 1D-H, the qualitative trends of the Hill functions for all the modules are reported. They can be considered as the deterministic transfer functions of all the devices, in absence of any stochastic effect, i.e., noise.
Models of noise
Here, noise was assumed to affect protein synthesis rate per cell (see Figure 1D-H) and to only be dependent on the interconnected device upstream of the gene encoding the protein. The "default" network condition considered in this paper is illustrated in Figure 1A and 1D and its mathematical model of noise is herein discussed. A 3OC6-HSL-inducible input device is interconnected to a TetR-based NOT gate, whose output is measured; noise affects only OUTPUT1. Considering a vector of N inducer concentrations of 3OC6-HSL, at the i-th 3OC6-HSL concentration (i = 1...N), lognormal multiplicative noise (v
i
) is applied to the deterministic output (y1,i) of Module 1 (Eq.5-6).
(5)
(6)
where the logarithm of the lognormal distribution gives a Gaussian distribution with mean µ = 0 and variance , which, in general, depends on inducer concentration. The lognormal distribution is widely used to describe noise in biological processes and, in particular, it well describes the fluorescence distribution of reporter proteins in cell populations bearing synthetic gene networks, as it is often shown by experimental measurements performed via flow cytometry [45–48].
Under the hypotheses described above, the mean AVE and variance VAR of the lognormal noise are (Eq.7-8):
(8)
Since AVE is not 1, the average value of OUTPUT1,i is not y1,i, but it is (Eq.9):
(9)
which depends on . In the experiments performed in this paper, different noise entities are applied to Module 1; this would produce different average values for the same deterministic transfer function at a given induction (Eq.1). However, in this work we aim to study the effect of equal population-averaged inputs with different noise entities because instruments used in population-based experiments typically measure average values of reporter proteins. For this reason, we introduced a correction term: to have equal average values for OUTPUT1 for different noise variances, we corrected the deterministic output value y1,ias reported in Eq.10.
(10)
where y
pop,i
is the population-averaged output of Module 1 at the i-th induction. From Eq.9 and Eq.10, the average output value for Module 1 is always (Eq.11):
(11)
Considering the Hill function of Module 1 (Eq.1), the αIN,1 and δIN,1 parameters are scaled, with respect to their nominal values, as indicated in Eq.10, while the k
IN
and η
IN
parameters remain unchanged. This process enables to set identical population-averaged output values for different noise entities affecting the 3OC6-HSL-inducible module.
Two different noise models were considered: with constant coefficient of variation (CV) or with constant variance (VAR). The former has been experimentally observed in different works in the literature [13, 32, 33] and unpublished results from our laboratory, while the latter was used in this work to test a different assumption via simulations.
First, a constant CV noise model was assumed for all the 3OC6-HSL input concentrations. From Eq.7-8 it results that (Eq.12):
(12)
Once a CV value is fixed, a value of σ2 can be obtained (Eq.13):
The σ2 value computed from Eq.13 is independent from y
pop,i
and it is used to generate noisy samples for OUTPUT1.
Multiplicative lognormal noise model with constant VAR was also considered. From Eq.8 and Eq.10 it results that (Eq.14):
(14)
From Eq.14, the value can be obtained (Eq.15):
(15)
In this case is a function of y
pop,i
and is different for every i-th induction.
When noise was also applied to OUTPUT2 (see Figure 1E), multiplicative lognormal noise with constant CV or VAR was considered and generated by following the same concepts as above. In this case, noisy samples were extracted with a correlation coefficient 0<ρ<1 to model an extrinsic component of noise in addition to the intrinsic one. In particular, if noise is fully due to an intrinsic component, OUTPUT1 and OUTPUT2 are independent, since intrinsic noise is caused by the stochasticity in processes like gene expression. On the other hand, if part of the noise is due to an extrinsic component, OUTPUT1 and OUTPUT2 are correlated random variables, since biological processes like gene expression in each single cell are influenced by a pure stochastic component (intrinsic noise) and by fluctuations of cellular species (extrinsic noise) that are due to the variation between cells of resources like polymerases and ribosomes. In this context, the expression of two different genes shares common resources and, as a result, it will be correlated. The contribution of extrinsic noise is tuned by changing ρ between 0 (no extrinsic noise) and 1 (no intrinsic noise).
Parameter values and implementation
The parameter values reported in [6] and obtained in unpublished experiments performed in our laboratory were used to describe the deterministic steady-state Hill functions of the nominal lux, tet and lac systems. Such values are reported for each simulation in the Results section. The CV values for multiplicative lognormal noise range between 15% and 75%, which are realistic values for cell-to-cell variability according to [26, 31–33] and to unpublished experiments in our laboratory.
MATLAB R2010a (MathWorks, Natick, MA) was used to implement the study by generating data via in silico experiments and then performing parameter identification. For data generation, lognormal noise with specific entity was applied to the Hill function outputs, which were propagated throughout the cascade of interconnected modules in the network. The Hill functions were computed as described in Eq.1-4, while the lognormal noise was multiplied to the outputs according to Eq.5-6 and Eq.10. The lognrnd function was used to generate independent lognormal noise samples. Correlated lognormal noise samples were computed as exp(vn), where vn are Gaussian noise samples obtained via the mvnrnd function. For each in silico experiment at the i-th 3OC6-HSL concentration, 10,000 independent lognormal noise samples were extracted and used to generate synthetic data, which simulate the steady-state output of 10,000 sampled cells. The lsqnonlin routine was used for parameter estimation. Sensitivity analyses were performed by changing parameters individually in the value ranges specified in the Results section, while all the other parameters were fixed at their nominal values. Parameters were changed by spanning a range of plausible values, according to a number of published [6, 19, 42, 49] and unpublished in vivo experiments carried out by our group. The aim of sensitivity analyses was to evaluate the impact of parameter values on the variability of the parameter estimates.