Modeling RNA interference in mammalian cells
© Cuccato et al; licensee BioMed Central Ltd. 2011
Received: 7 October 2010
Accepted: 27 January 2011
Published: 27 January 2011
Skip to main content
© Cuccato et al; licensee BioMed Central Ltd. 2011
Received: 7 October 2010
Accepted: 27 January 2011
Published: 27 January 2011
RNA interference (RNAi) is a regulatory cellular process that controls post-transcriptional gene silencing. During RNAi double-stranded RNA (dsRNA) induces sequence-specific degradation of homologous mRNA via the generation of smaller dsRNA oligomers of length between 21-23nt (siRNAs). siRNAs are then loaded onto the RNA-Induced Silencing multiprotein Complex (RISC), which uses the siRNA antisense strand to specifically recognize mRNA species which exhibit a complementary sequence. Once the siRNA loaded-RISC binds the target mRNA, the mRNA is cleaved and degraded, and the siRNA loaded-RISC can degrade additional mRNA molecules. Despite the widespread use of siRNAs for gene silencing, and the importance of dosage for its efficiency and to avoid off target effects, none of the numerous mathematical models proposed in literature was validated to quantitatively capture the effects of RNAi on the target mRNA degradation for different concentrations of siRNAs. Here, we address this pressing open problem performing in vitro experiments of RNAi in mammalian cells and testing and comparing different mathematical models fitting experimental data to in-silico generated data. We performed in vitro experiments in human and hamster cell lines constitutively expressing respectively EGFP protein or tTA protein, measuring both mRNA levels, by quantitative Real-Time PCR, and protein levels, by FACS analysis, for a large range of concentrations of siRNA oligomers.
We tested and validated four different mathematical models of RNA interference by quantitatively fitting models' parameters to best capture the in vitro experimental data. We show that a simple Hill kinetic model is the most efficient way to model RNA interference. Our experimental and modeling findings clearly show that the RNAi-mediated degradation of mRNA is subject to saturation effects.
Our model has a simple mathematical form, amenable to analytical investigations and a small set of parameters with an intuitive physical meaning, that makes it a unique and reliable mathematical tool. The findings here presented will be a useful instrument for better understanding RNAi biology and as modelling tool in Systems and Synthetic Biology.
RNA interference (RNAi) is a well characterized regulatory mechanism in eukaryotes [1–3] as well as a powerful tool for understanding gene function, thanks to the discovery that synthetic small interfering RNA oligomers (siRNAs) can efficiently induce RNAi in mammalian cells [4, 5]. RNAi has also been used extensively as a novel "biological part" to design synthetic biological circuits in synthetic biology [6, 7]. Artificial gene silencing has the potential to become a major genetic-based therapeutic tool for viral infections [8, 9], cancer  or inherited genetic disorders [11–13].
Despite its widespread experimental application, the best way to quantitatively model RNA interference is still under debate. In systems and synthetic biology, mathematical models are essential to carry out in silico investigations of biological pathways, or novel synthetic circuits. The aim of this work is to find the most appropriate quantitative mathematical model that can correctly describe the RNAi phenomenon in mammalian cells, for varying concentrations of the siRNA oligomers.
In order to model the effects of RNA interference on mRNA expression levels at different concentration of siRNA oligomers, we carried out in-vivo experiments of RNA interference on two mammalian cell-lines stably expressing the EGFP protein or the tTA protein, respectively.
The different models RNA interference models for the RNAi-induced mRNA degradation rate δ ( X s , X m ) and their corresponding parameters.
k1: Rate of mRNA-siRNA* complex formation
k2: Rate of mRNA-siRNA* complex formation
h2: Number of siRNA target sites
k3: Rate of mRNA-siRNA* complex formation
c3: Cleavage and dissociation rate of mRNA-siRNA*
h3: Number of siRNA target sites
d4: Maximal degradation rate of the mRNA due to RNAi
θ4: Michaelis-Menten like constant
h4: Number of siRNA target sites
where parameter k 1 represents the proportionality constant. In this modeling approach, the siRNA-RISC complex is assumed not to be recycled, but the RISC needs to be reloaded before it can degrade another mRNA molecule (i.e. in this model the dashed line linking step 4 to step 3 in Figure 1 is not taken into account). Indeed, this model was suggested for RNA interference in prokaryotes [16, 18–20]. RNAi in prokaryotes has important differences with RNAi in eukaryotes. One of these is that the interaction between the siRNA and its mRNA target is non-catalytic in nature [16, 21]. This is not the case in mammalian cells. Once the mRNA is cleaved, the siRNA-RISC complex is dissociated from it, in an ATP-dependent manner  and it is free to process further targets [1, 14, 23]. Additionally, in some organisms, such as C.elegans, the primary dsRNA trigger induces synthesis of secondary siRNAs (if the target mRNA is present) through the action of RdRP enzymes, strengthening and perpetuating the silencing response [1, 24].
where k 2 is the proportionality constant and h 2 is the number of siRNA binding sites on the targeted mRNA species. This model was suggested in  for modeling RNAi by miRNAs, since it is experimentally proven that multiple sites for the same miRNA can boost target mRNA repression . This model however suffers from the same limitations as Model 1.
This is in perfect agreement with the experimental finding on the enzymatic activity of both non-mammalian and mammalian RISC on mRNA degradation, as reported in [23, 26, 27], where the product of the reaction (degraded mRNA) was measured at varying the concentration of the substrate (non-degraded mRNA X m ) for a constant amount of the RISC enzyme.
Model 4 basically assumes that only the maximal rate V m of the Michaelis-Menten will be affected when the concentration of the active RISC (proportional to X s ) changes. In [23, 26] the authors show experimentally that changing the concentration of RISC indeed changes the value of V m but also of K m . This is not captured by this model.
Note also that when X m ≪ K m , i.e. low amounts of mRNA compared to the Michaelis-Menten constant K m , the model can be simplified to: . This is equivalent to Model 1. On the other hand, when the mRNA concentration is very high, the model can be simplified to: δ (X s , X m ) = V m = c 3 X s . Namely, the rate of RNAi-driven mRNA degradation will depend only on the amount of active enzyme, i.e. the siRNA concentration. The higher the siRNA concentration, the higher the degradation rate that can be achieved, without any saturation effect.
This model, despite being phenomenological has interesting properties. The kinetic parameters d 4 and θ 4 depend on the efficiency of siRNA binding to its sites on the target mRNA : d 4 represents the maximal degradation rate of the mRNA due to RNA interference; θ 4 the concentration of siRNA oligomers needed to achieve half of the maximal degradation rate. The above equation implies that for X s ≪ θ 4, the increase in the RNAi mediated degradation is linear with (namely, it will be identical to Model 1 for h 4 = 1, or to Model 2 for h 4 = h 2), while it saturates at higher levels of X s , reaching the maximal degradation rate d 4, differently from Model 3.
Numerical fitting results of the four models for in vitro experimental data for the EGFP protein and mRNA.
Experiment on EGFP mRNA levels
k1 = 1.38 × 10-4(pmol min)-1,
k2 = 5.00 × 10-3(pmolh2 min)-1,
h2 = 0.126,
k3h3 = 1.40 × 10-4(pmol min)-1,
c3/km = 1.33 × 103a.u.,
θ4 = 0.105 pmol,
d4 = 8.1 × 10-3min-1,
h4 = 4.47
Experiment on EGFP protein levels
k1 = 9.90 × 10-5(pmol min)-1,
k2 = 1.10 × 10-3(pmolh2 min)-1,
h2 = 0.456,
k3h3 = 1.20 × 10-4(pmol min)-1,
c3/k m = 1.33 × 103a.u.,
θ4 = 12.9 pmol,
d4 = 8.6 × 10-3min-1,
h4 = 4.49
Note also that the parameters found for Model 2 include a coefficient h 2 = 0.126, hence less than unity. Since h 2 describes the number of siRNA binding sites on the targeted mRNA, it should be greater than, or equal to, 1 in order to have a clear biological interpretation. However, if we constrain this parameter to be greater or equal to one, then the model optimizes at the value of h 2 = 1, which makes Model 2 identical to Model 1.
When we fitted the models to the third experimental dataset (set III), which was performed on a different cell-line, with both a different target mRNA and a different siRNA oligomer, Model 4 still performed better than the others with the smallest error, although Model 2 was a close match. Model 3 and 4 were again behaving very similarly and had the largest error. It should be noted that as it happened with the previous fitting results, also in this case Model 2 has a Hill coefficient smaller than unity (h 2 = 0.126).
Finally we observed, as shown in Additional file 1Figure A1, that the experimental error for set III experiment was larger when compared to the EGFP experiment (set I). This was caused by the relative low expression of the tTA in the CHO cell lines, when compared to the EGFP expression in HEK cells, which made measurements more noisy.
The models described above have the same number of unknown parameters to be learned (Methods), but for Model 4, which has one extra parameter. To be sure that the improved performance of Model 4 in describing the experimental data was not due to overfitting, we computed for each model and for each experimental dataset, the prediction error, which allows to assess the generalisation performance of the models . We followed a "leave-one-out" cross validation strategy, where for each model and for each dataset, the parameter identification procedure was repeated each time removing one of the experimental points and then predicting the missing point with the identified parameters. We thus could estimate a prediction error for each model in each experimental dataset. This value is reported in Table 2and Additional file 1Table A1. Model 4 is again the one with the smallest prediction error, once again confirming this model superior performance in describing the data.
Our findings show that the simple Hill function described by Model 4 is sufficient to quantitatively describe the effect of RNA interference, at the mRNA and protein level, in mammalian cells in vitro, for varying concentration of siRNA oligomers.
One significant feature of Model 4 is that it can predict the saturation effect of the RNAi process that we observed experimentally. We considered the possibility that this saturation could be in fact due to the inability of the cell to uptake high concentration of siRNA oligomers, however recent experiments , prove that uptake of siRNA oligomers in cells is linear with the concentration of siRNA oligomers transfected, at least in the concentration range we used. Additionally, Khan et al in , observed upregulation of mRNA targets of endogenous micro-RNA when transfecting siRNA oligomers in mammalian cells. In order to explain this effect, they suggested a saturation of the RISC complex (or other necessary small RNA processing or transport machinery).
It has been demonstrated in [23, 26, 27] that the enzymatic activity of RISC can be efficiently modeled in-vitro as a classic Michaelis-Menten reaction, where the target mRNA is the substrate, the siRNA-loaded RISC is the active enzyme (at a constant concentration), and the product is the degraded mRNA. This is one feature that Model 4 does not capture; namely for a fixed amount of siRNA-loaded RISC (i.e. X s ), Eq. (11) should approximate the Michaelis-Menten in Eq. (10), instead Model 4 becomes simply proportional to X m , as Model 1 and 2. Nevertheless, Model 4 approximates very well the experimental data. We believe this happens because enzymatic reactions have a typical K M much greater that the physiological concentration of their substrate (X m ) , and the same happens for the RISC complex [23, 26]. In this condition, the Michaelis-Menten equation becomes and therefore it is linear in X m , as predicted by Model 4. When the siRNA concentration varies, Model 4 predicts that the parameters will change as a function of X s as described by Eq. (11).
The three parameters of Model 4 have a straightforward biological interpretation, and their values can be easily tuned to accommodate for different efficiencies of RNAi. For example, the parameter d 4 can be used to weigh the degradation due to the RNAi compared to the endogeneous mRNA degradation, and its strength, i.e. what is the maximal degradation rate that can be achieved. θ 4 quantifies the siRNA oligomers concentration needed to achieve half of the maximal degradation of the targeted mRNA. The h 4 coefficient can accommodate for multiple target sites on the same mRNA, or for the cooperativity of the RISC complex.
Clearly, the RNAi process is very complex and no one-to-one relationship can be found between parameters of Model 4 and RNAi biological components. Nevertheless, it has been shown in  that between 104 and 105 siRNA oligomers per cell (corresponding to a concentration in the range 10 pM-100 pM) are sufficient to reach half-maximal mRNA target degradation. Model 4 predicts that half-maximal degradation is achieved for an amount of siRNA oligomers equal to θ 4. The value of this parameter when fitting mRNA levels (Table 2 and Additional file 1 Table A1) is θ 4 ≈ 0.1 pmol despite of the different cell-lines and mRNA-siRNA pairs tested (EGFP and tTA). This value corresponds to a concentration of 50 pM in our experimental setting, hence in good agreement with the previously reported range. Altogether these observations suggest that the quantity θ 4 could be cell-type, mRNA, and siRNA indipendent.
It is estimated that the concentration of active RISC in a cell is about 3 - 5 nM [23, 26, 32]. Taking into account that the volume of a mammalian cell is in the range 10-13 L - 10-12 L, then we can estimate that the number of active RISC in a cell is in the range 103 - 104. The above observations suggest that saturation begins when the number of siRNA oligomers in a cell becomes comparable to the number of RISC molecules.
We observed that the parameters of Model 4 estimated when fitting protein levels (set II experiments) are very close to the ones estimated when fitting mRNA levels (set I experiments). Namely, the optimized values of d 4 and h 4 are very similar for both experimental data. This is important since these are two independent biological experiments. This proves the mathematical robustness of Model 4. The only parameter changing between the two sets of experiment is θ 4, which represents the concentration of siRNA oligomers needed to achieve half of the maximal degradation rate (d 4). This is reflected in Figure 2and Figure 3, where it is clear that saturation is achieved at about 1 pmol for the mRNA data (Figure 2) and at about 20 pmol for the protein data (Figure 3). This difference may be due to biological variability, or to the simplified model of protein translation dynamics we used (steady-state approximation).
We also conformed that Model 4 is cell-line-independent, mRNA-independent, and siRNA-independent, since it can accurately describe the RNA interference process on a different cell-line (CHO) expressing a different mRNA (tTA), silenced by a different siRNA oligomer.
Interestingly, the difference in Model 4 parameters, when testing a different mRNA-siRNA pair (i.e. tTA versus EGFP), shows that only d 4 (the maximal degradation rate) and h 4 (the cooperativity) change significantly, suggesting that these two parameters can be used to describe changes in siRNA-mRNA silencing specific strength, whereas θ 4 may be kept constant.
Recently it has been proposed that siRNA and microRNA efficacy, defined as the percentage decrease in the target mRNA level due to the silencing reaction, could be limited due to mRNA abundance or to mRNA degradation rate .
Model 4 predicts that the percentage decrease in target mRNA level (obtained from Eq. (13) simply dividing by k m /d m ) is indeed sensitive to d m (the target mRNA degradation rate), with a higher degradation rate corresponding to a weaker effect of the silencing reaction, and vice-versa. This result is in line with the experimental observation described in Larsson et al . In addition, according to Model 4, the transcription rate k m of the target mRNA does not have any influence on the silencing reaction efficacy. The target mRNA abundance, in absence of the silencing reaction, is simply obtained from Eq. 1 as k m /d m . Smaller d m will correspond to a higher mRNA abundance (for a constant k m ) therefore a correlation between mRNA abundance and sensitivity to mRNA can be found , but this is only an indirect effect mediated by the degradation rate, at least according to our model. Our conclusion is that siRNA-mediated degradation in mammalian cells can always be best represented as an enzymatic reaction described by an Hill function, whose parameters have to be tuned to the specific siRNA-mRNA pair.
The models discussed so far consider the average behavior of a population of cells. In the case of singe-cell experiments, these models might not be efficient enough due to their deterministic nature and will not be able to capture any stochastic effects.
Since RNA has a plethora of functional properties and plays many of roles in regulating gene expression, it has been used in a number of different studies as a tool for elucidating gene functions. In fact with RNAi it is possible to selectively knock-down any gene and even modulate its dosage . RNA has also been used in the design of therapeutic molecules as well as metabolic reprogramming . The potential uses of this versatile molecule are still very much under study, but their effectiveness depend on many variables such as, the concentration of the silencing reagent, the transfection techniques, the cell type used and the target type selection. In the present study we biologically validated for the first time a mathematical model (Model 4) that has a simple mathematical form, amenable to analytical investigations and a small set of parameters with an intuitive physical meaning that can be used both by the computational and the experimental community interested in the analysis and application of RNA interference.
The sequence of the 21-mer siRNA double-stranded oligomers targeting EGFP was identical to the one reported in . This siRNA targets the coding sequence of the EGFP gene starting at position 237 from the ATG, on the target sequence AAGCAGCACGACTTCTTCAAG. The siRNA HPLC purified, with sequence GCAGCACGACUUCUUCAAGtt (concentration 100 μM) was synthesized by Ambion. As a negative control we used, in all experiments a shuffled sequence non targeting siRNA from Dharmacon. A 21-mer siRNA oligonucleotide was designed against the coding sequence of tetracycline-controlled transactivator (tTA) gene using the Ambion technology platform. Custom siRNA HPLC purified with sequence GGUUUAACAACCCGUAAACtt (concentration 100 μM) were synthesized by Ambion on the target sequence AAGGTTTAACAACCCGTAAAC starting at position 57 from the ATG in the tTA gene coding sequence.
HEK 293 stably expressing EGFP (kindly provided by Mara Alfieri) were maintained at 37°C in a 5% CO2-humidified incubator. HEK 293 cells were cultured in Dulbecco's modified Eagle's medium (DMEM, GIBCO BRL) supplemented with 10% heat-inactivated fetal bovine serum (FBS, Invitrogen) and 1% antibiotic/antimycotic solution (GIBCO BRL). CHO AA8 Tet-Off Cell Line (Clontech) stably expressing the tetracycline-controlled transactivator (tTA) were maintained at 37degC in a 5% CO2-humidified incubator. CHO cells were cultured in alpha-MEM (GIBCO BRL) supplemented with 10% heat-inactivated fetal bovine serum (FBS) (Invitrogen) and 1% antibiotic/antimycotic solution (GIBCO BRL). Cells were seeded at a density of 300.000 per well in a 6 wells multi-well and transfected 1 day after seeding using Lipofectamine 2000 (Invitrogen) according to manufacturer's instructions with siRNA (Silencer Custom siRNA, 100 μM, Ambion) in a range of quantities from 0.001 pmol to 200 pmol (total concentration). The amounts of transfected siRNA oligomers were: 0, 0.001, 0.01, 0.05, 0.1, 0.5, 1.0, 10.0, 20.0, 40.0, 60.0, 80.0, 100.0 and 200.0 pmol in a total of 2 mL of medium (so the final concentrations of siRNA oligomers were 5 × 10-4, 5 × 10-3, 2.5 × 10-2, 5 × 10-2, 2.5 × 10-1, 5 × 10-1, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, and 100 nM respectively). Each experiment was performed in biological triplicates, and the resulting standard deviations were computed and reported in each graph. One day post-transfection, the media and ligand were replaced. Transfected cells were collected 48 hours post-transfection for RNA extraction and subsequent analysis. FACS analysis was performed 60 hours after transfection.
Total RNA extraction from 35 mm culture plates was performed using the Qiagen RNeasy Kit (Qiagen) according to manufacturers instructions. Retro-transcription of 1 μg of the total RNA extracted was performed using the QuantiTect®Reverse Transcription Kit (Qiagen), according to manufacturers instructions. Quantitative real-time PCR was performed using a LightCycler (Roche Molecular Biochemicals, Mannheim, Germany) to analyze the amplification status of EGFP and tTA. Amplification of the genes was performed from the cDNA obtained from the total RNA and using the LightCycler DNA Master SYRB Green I kit (Roche Molecular Biochemicals). Primer sequences for Human GAPDH and Chinese Hampster GAPDH (used as reference genes) were designed by Primer 3.0 http://frodo.wi.mit.edu/ (Forward primer Human GAPDH: GAAGGTGAAGGTCGGAGTC; Reverse primer Human GAPDH: GAAGATGGTGATGGGATTTC; Forward primer Hamster GAPDH : ACCCAGAAGACTGTGGATGG; Reverse primer Hamster GAPDH: GGATGCAGGGATGATGTTCT). Primer sequences for EGFP and tTA were also designed with Primer 3.0 (Forward primer EGFP: ACGACGGCAACTACAAGACC; Reverse primer EGFP: GCATCGACTTCAAGGAGGAC; Forward primer tTA: ACAGCGCATTAGAGCTGCTT; Reverse primer tTA: ACCTAGCTTCTGGGCGAGTT). The relative amounts of genes were compared with the GAPDH reference genes and calculated using the Principle of Relative Quantification Analysis according to the standard formula 2-DCt . To confirm the specificity of the amplification signal, we considered the primer dissociation curve in each case.
Cells from 35 mm culture plates were trypsinized, filtered and subjected to Fluorescence-Activated Cell Sorting (FACS) analysis 60 hours posttransfection in a Becton Dickinson FACSAria.
In the context of the specific in vitro experiments we carried out, we can make the following assumptions to derive the mathematical model:
Cells express the target mRNA at a constant rate k m which corresponds to the maximum transcription rate of the promoter.
We assume that the siRNA oligomers will be quickly loaded into the RISC and that step 2 of Figure 1 takes place in much shorter time scale than steps 3 and 4.
For the numerical fitting of the ratio of protein levels between negative and positive control, one needs to divide equation (14) by equation (15).
where N is the number of experimental points, is the experimental measurement of experiment i, model is the model prediction for experiment i and SE i is the standard error of experiment i. A genetic algorithm implemented in the "Genetic Algorithm and Direct Search Toolbox" of Matlab (the ga command) was then used to find the parameters which minimised the error function.
The absolute value errors of each model were then normalized against the largest error. Namely, the error of Model 3 (which in both case was the largest one) was set to 1 and all the other errors of the remaining three models, were normalized against error of Model 3.
The Prediction Error (PE) for each experimental dataset was computed by repeating the parameter fitting procedure described above, but this time using a leave-one-out cross-validation procedure. The PE was then computed as the average error (Eq. 16) between the predicted value and the experimental value across all the experimental points. As done for the error function, we then computed a relative value for the PE in order to compare the performance across the different models by normalising against the maximum PE across the four models, and reported it in Table 2 and in Additional file 1 Table A1.
Please observe that in the case of the tTA mRNA the cost function used was as in Eq. 16 but without dividing by the standard error SE since in this case the experimental data were more noisy and the SE could not be estimated accurately.
Thanks to Laura Pisapia for the FACS analysis, S. Giovane and F. Menolascina for technical support and to Mara Alfieri for providing the Hek-EGFP stable clone. Funding for this work has been provided by the Italian Ministry of Research Grant ITALBIONET to DdB.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.