Gene autoregulation via intronic microRNAs and its functions

Background MicroRNAs, post-transcriptional repressors of gene expression, play a pivotal role in gene regulatory networks. They are involved in core cellular processes and their dysregulation is associated to a broad range of human diseases. This paper focus on a minimal microRNA-mediated regulatory circuit, in which a protein-coding gene (host gene) is targeted by a microRNA located inside one of its introns. Results Autoregulation via intronic microRNAs is widespread in the human regulatory network, as confirmed by our bioinformatic analysis, and can perform several regulatory tasks despite its simple topology. Our analysis, based on analytical calculations and simulations, indicates that this circuitry alters the dynamics of the host gene expression, can induce complex responses implementing adaptation and Weber’s law, and efficiently filters fluctuations propagating from the upstream network to the host gene. A fine-tuning of the circuit parameters can optimize each of these functions. Interestingly, they are all related to gene expression homeostasis, in agreement with the increasing evidence suggesting a role of microRNA regulation in conferring robustness to biological processes. In addition to model analysis, we present a list of bioinformatically predicted candidate circuits in human for future experimental tests. Conclusions The results presented here suggest a potentially relevant functional role for negative self-regulation via intronic microRNAs, in particular as a homeostatic control mechanism of gene expression. Moreover, the map of circuit functions in terms of experimentally measurable parameters, resulting from our analysis, can be a useful guideline for possible applications in synthetic biology.


Mean field analysis of the dynamics
In this section the deterministic description of the regulatory circuits in analysis is reported in more details.
On this description is based the evaluation of the response times presented in the main text.

Simple transcriptional unit (sTF)
We first consider the dynamics of a simple transcription unit (scheme in Figure 1B of the main text), for which the time evolution can be evaluated analytically in the two cases of interest: the dynamics of the switch-on and switch-off processes.
The system of equations representing the sTF dynamics is: where k r (q) is the nonlinear increasing function of the TF concentration q reported in the main text in equation 1.
With the target promoter exposed to full activation (q/h r ≫ 1), the transcription rate reduces to k r (q) ≃ k r and it is possible to calculate how the final steady state is approached by the various molecular species starting from the initial condition p(0) = r(0) = 0: with This expression can be simplified in the case of short-living mRNAs to p(t)/p ss ≃ (1 − e −gpt ) as reported in [1].
The response time T ON is then defined by the equation p(T ON )/p ss = 0.5. As can be seen in equations 2, T ON does not depend on production rates (k r and k p ) but only on the half-lives of mRNAs and proteins.
Since T ON for a sTF is independent of the final steady-state value of p, if molecule half-lives are kept constant, it can be used as null model for comparison with iMSLs and tSLs at different levels of repression, without the need of constraints on parameters.
Analogously, the response time T OF F to a switch-off stimulus can be derived. In this case, the initial condition is the steady state given by a fully activated promoter p(0) = p ss , and k r is set to zero at t = 0.
Again the dynamics depends only on the half-lives of mRNAs and proteins: where p(t) p ss = e −gt (1 + gt) if g r = g p = g.
The response time T OF F is given by the condition p(T OF F )/p ss = 0.5.

Intronic miRNA-mediated self loop (iMSL)
The deterministic description of the iMSL is given by equations 3 of the main text. With the condition q/h r ≫ 1, required to have the target promoter exposed to full activation, the transcription rate is at its maximum value k r and the steady-state solution can be easily found: On the other hand, the dynamics and thus the response times can be extracted with numerical integration.
These response times have been normalized in the main text, in particular in Figure 2, with the response times of a sTF (calculated as explained in the previous section), so as to evaluate the differences in the dynamics with respect to a constitutive transcription unit.

Transcriptional self regulation (tSL)
The tSL dynamics is described by the two equations: where the transcription rate k r (q, p) is a product of two Michaelis-Menten-like functions: one corresponding to activation by the TF q and the other one taking into account the transcriptional self-repression.
The choice of a simple product of functions implies the assumption of independent binding of the two regulators [2], which is probably the most common situation. Therefore, the form of the transcription rate is: ] .
The condition q/h r ≫ 1 leads to the simplification k r (q, p) ≃ k r (p) = k r 1 1+( p hp ) . In this case the steadystate solution is: In order to compare the dynamics of tSLs with the iMSL one in un unbiased way, we impose a constraint on parameters in order to have the same steady state p ss for the target protein level. This can be simply done by equating the values of p ss in equations 7 and 4 so as to extract the constraint on the parameter h p which sets the repression strength in the tSL depending on the repression strength h in the imLS: Using this constraint, the response times for the tSL circuit can be evaluated numerically and directly compared with the response times of the iMSL circuit.

Conditions for adaptation and Weber's law implementation
As discussed in the main text, a necessary conditions to have perfect adaptation is the maintenance of a steady state independent of the input level, if this input is constant. In this way, the system can have a dynamical response to input changes while returning to its initial condition once the input level is steady for a sufficiently long time.
In the case of iMSLs, a strong miRNA repression can make the circuit fulfill this condition. In the regime of strong repression s/h ≫ 1, the translation rate simplifies to k p (s) ≃ hk p /s. The substitution of this expression in the equations describing the circuit dynamics (equations 3 of the main text), leads to a steadystate solution of the form: The steady state of the host-gene protein does not depend on the input level q, thus the iMSL circuit can in principle implement perfect adaptation.
Weber's law requires additionally that the peak of the dynamical response depends only on the fold-change of the input. Introducing the further assumption of an approximately linear transcriptional activation (i.e. the amount of TFs q is far from saturating the target promoter), the transcription rate becomes k r (q) ≃ kr hr q . Therefore, the two conditions of strong repression and approximately linear activation simplify the equations of the iMSL dynamics (equations 3 of the main text) to: Assuming that the mRNA half-life is shorter than the other time scales in the system, a quasi-steady-state approximation can be used to further reduce the kinetic equations to: where the input stimulus is represented by a change in the TF concentration from a basal level q 0 to a new level q (F is the fold-change). Equations 11 can be thus rewritten as : This reformulation shows that the dynamics of the target protein depends only on the fold-change F in the input stimulus and not on its absolute value. Equations 14 are the analogous of the equations presented in [3] for the feed-forward loop circuit, adapted here to the iMSL case. Therefore, if the three conditions of strong repression, almost linear transcriptional activation and short mRNA half-life are satisfied, iMLs can implement Weber's law.

Intronic miRNA-mediated self loop (iMSL)
This section briefly describes the procedure to calculate analitically the coefficient of variation CV xi for the molecular species x i involved in the intronic miRNA-mediated self loop. The procedure can be similarly applied to the other two circuits (sTF and tSL).
For the stochastic analysis, the dynamics of transcription, translation and degradation of the TF is included explicitly. Therefore, the additional variable w, representing the TF mRNA number is added in the system description, as well as its transcription rate k w , translation rate k q and the degradation rates g w and g q . In this way, the noise level in the input signal can be naturally modulated changing the relative contribution of transcription and translation to the ⟨q⟩ steady-state level, as described in details in [4].
The following master equation describes the evolution of the probability to find in a cell exactly (w, q, s, r, p) molecules at a given time t: where k r (q) and k p (s) have the functional form described in the main text.
In order to evaluate the noise level at steady state, given by the coefficient of variation CV xi ≡ σ xi /⟨x i ⟩, for each molecular species x i , it is necessary to find a closed expression for the first two moments of the above probability distribution at equilibrium. To this aim, it is sufficient to linearize the regulation functions k r (q) and k p (s) [4][5][6], and apply the moment generating function approach to the resulting master equation at equilibrium [7]. Even after the linearization, the system preserves a nonlinearity due to the term encoding the target translation, which still depends on both the number of mRNAs and miRNAs, but nonetheless the first two moments for the p distribution can be calculated. Using the linearization of the regulation functions: the two rates can be redefined as: with: By defining the generating function: and using the linearized regulation functions, equation 15 can be converted in the following second-order partial differential equation: The differentiation of 20 at the steady state leads to equations for successively higher moments thanks to the following properties of the moment generating function: Differentiating up to the fourth moments, the analytical expression for CV xi = σx i <xi> can be obtained (see [4] for a more exhaustive and detailed analysis), and thus also the noise level of the host-gene protein product used in the main text.

Simple transcriptional unit (sTF)
The stochastic analysis of the sTF can be performed as explained in the previous section. We just report here the corresponding master equation:

Transcriptional self-regulation (tSL)
The master equation for the transcriptional self-regulation (tSL) is: In order to calculate CV p with the moment generating function approach, it is necessary to define the linearization of the function k r (q, p) shown in equation 6. As described in [4], we can linearize it as: and thus obtain a transcription rate of the form k r (q, p) ≃ k 0 r + k 1 r q − k 2 r p. Using this linearization and the moment generating function approach, the analytical expressions of ⟨p⟩ and CV p can be obtained as described in previous sections.

Molecular titration model
A mathematical model of miRNA-mRNA interaction was previously proposed to describe sRNA regulation in bacteria [8]. This model takes into account the physical coupling between miRNAs and mRNAs explicitly with a simple titration mechanism: a miRNA can form a complex with a target mRNA, degrade it and then eventually be available again to target other mRNAs. A parameter α is introduced to measure the probability of miRNA recycling after target degradation induced by mRNA-miRNA coupling. Thus, α represents the degree of "catalyticity" of miRNA regulation, with α = 0 for perfect catalytic action, while α = 1 for stoichiometric action.
Applying this modeling strategy to the iMSL circuit, the following system of differential equations is obtained: In this equations, c is the concentration of the miRNA-mRNA complex, k + is the probability of miRNA-mRNA association and k − the probability of dissociation of the complex c, that can degrade with rate β.
An analogous model of miRNA regulation have been used to describe the results of single-cell experiments in mammalian cells [9], with the additional assumption of slow miRNA turnover, thus neglecting the dynamics of miRNA transcription and degradation.

Relations between titration model and phenomenological models based on Michaelis-Menten functions
If the coupling of miRNAs and mRNAs is fast, or if the interest is on steady state properties, the c dynamics can be equilibrated in equations 24: with k rs = βk + /(k − +β). miRNA regulation is often distinguished from sRNA one because of the efficient recycling (i.e. catalytic interaction). In particular, in the case of α ≃ 0, the molecular concentrations at steady state are: where r 0 = k r (q)/g r is the steady state for a constitutive mRNA, while h = g r /k rs . This expression for the target protein concentration shows the equivalency to the model of miRNA inhibition of target translation used in the main text. In fact, the same steady state can be obtained using an effective Michaelis-Menten function of miRNA concentration as target translation rate (as in equations 3 of the main text), with an effective dissociation constant h = g r /k rs . Therefore, in the limit of high miRNA recycling the steady state properties of a titration model are completely equivalent to an effective model of nonlinear miRNA action on target translation rate.
The miRNA regulation can also be modeled using an effective nonlinear function in the mRNA degradation term, thus assuming that miRNA regulation acts mainly on the stability of mRNAs rather than on their translation efficiency. In this case, the equations describing the iMSL circuit dynamics are: The miRNA action is represented by adding to the basal rate of mRNA degradation g r (in absence of miRNAs) an increasing function of miRNA concentration, where g max is the maximum possible increase of the degradation rate (if s → ∞, g r (s) → g r + g max ) and h is the dissociation constant of miRNA-mRNA interaction. It's easy to see that in the case of strong enough repression, for which s/h ≫ 1, the equation for r in 27 can be recasted as: making clear the relation between this description and the titration model with fast binding/unbinding of mRNA and miRNAs and high cataliticity, i.e. equations 25 with α ≃ 0 and k rs = g max /h.

Adaptation and Weber's law conditions in the titration model
As shown in section 2, the iMSL can perform adaptation in the regime of strong miRNA-mediated repression.
In the context of the titration model with high catalyticity (α ≃ 0), strong repression implies that the degradation of mRNAs is dominated by miRNA regulation. Therefore, we can approximate equations 25 with: The steady state solution for the target protein is: This expression is independent on the input level q, showing that the condition for adaptation is satisfied in the strong repression regime also for this alternative modeling strategy of miRNA regulation, if the miRNA recycling is sufficiently efficient as it is expected for miRNA regulation in higher eukaryotes. Given the steady-state equivalency (shown in the previous section) between the titration model with α ≃ 0 and a phenomenological model of mRNA degradation induction, the addition of iMSLs to the list of adaptive circuits seems robust and model-independent.
Starting from equations 29, in which there is also the implicit assumption of fast mRNA-miRNA binding/unbinding, also the conditions for Weber's law implementation can be examined in the context of the titration model. As previously discussed, the additional requirements with respect to adaptation are an almost linear transcriptional activation and a fast mRNA dynamics. With these two constraints the dynamic equations become: These two equations have exactly the same form of equations 11, thus can be similarly reformulated in terms of adimensional variables, showing that the p dynamics depends only on the input fold-change.

Comparison of the response times for different models of miRNA-mRNA interaction.
A direct comparison of the response times for the different models of miRNA-mRNA interaction is made difficult by the higher number of parameters that are present in the titration model (equations 24) with respect to the phenomenological model presented in the main text. The way in which the two models can be constrained to have the same level of target protein at equilibrium, in order to make an unbiased comparison, is indeed quite arbitrary. In particular, the timescale of the binding/unbinding of mRNAs and miRNAs in the c complex can strongly influence the dynamics.

Effects of the microRNA biogenesis process on the circuit functions.
As discussed in the main text, the multi-step process of miRNA biogenesis can influence the dynamics of the iMSL circuit. In particular, the time needed for the miRNA to be processed, exported, loaded in the RISC complex, and in general to become active may introduce a delay between its transcription and its effect on targets. Therefore, even if the mRNA and the miRNA come from the same transcript, a certain amount of time can be required before observing miRNA repression, thus altering the correlation in mRNA and its intronic miRNA dynamics that in principle follows from their co-transcription. There is evidence that intronic miRNAs can be cleaved by the Drosha enzyme before splicing [10], thus limiting the delay introduced by the nuclear processing. However, even in this case the subsequent steps of cytoplasmic processing and The response times to an activating/deactivating input signal are significantly dependent on the delay (see Figure 1A and B). In case of activation, the longer is the time required to have active miRNAs, the more the protein concentration can rise in absence of any repression. Therefore, the response time T ON is expected to decrease with the delay as indeed shown in Figure 1A. When the time gap between miRNA transcription and activation is large enough to allow the target protein concentration to reach the equilibrium in absence of repression, the response time is actually the one of a constitutive gene approaching its steady state. This explains why curves corresponding to different large delay values tend to overlap in Figure 1A.
As discussed in the main text, in presence of a deactivating signal the switch off dynamics of iMLS is slower than the deactivation dynamics of a constitutive gene. This is due to the fact that after the transcription stop, for each single miRNA that is degraded, the still present mRNAs sense an increase in their translation rate, thus slowering the dynamics with respect to an exponential decrease due to protein and mRNA degradation.
In presence of a time-consuming miRNA processing, while degradation events of active miRNAs are still expected to increase the effective target translation rate, at the same time miRNAs that complete the processing steps can increase the pool of active miRNAs. Therefore, the slowdown of the deactivation dynamics is expected to be weaker in presence of a long biogenesis process, as indeed captured by our phenomenological model (see Figure 1B).
For the sake of completeness, we also report the analysis of the case in which miRNA maturation is faster than mRNA processing. This situation can emerge for example in presence of a slow mRNA export with respect to the miRNA one, thus leading to a faster increase of miRNA concentration in the cytoplasm after transcription activation with respect to the mRNA concentration. The response times correponding to this case (blue dashed lines in Figure 1A and B) show that, as intuitively expected, an eventual delay in the mRNA dynamics (inserted in the model as a time-lag between transcription and translation) has opposite effects on the circuit response with respect to a delay due to miRNA maturation.
The alteration of the dynamics that can arise from miRNA biogenesis has also consequences on the adaptation response of the iMSL circuit. The adaptation precision P (defined in the main text) is not affected by the presence of the delay, since it is a property that follows from the steady-state protein dependence on the input level. On the other hand, as previously discussed for the reponse times, the presence of a delay in miRNA activation allows the protein concentration to reach higher levels in presence of a step activating signal. Therefore, the peak of the dynamical response to a step input, before relaxation to the final steady state, can be significantly increased by the delay, as shown in Figure 1C. In other words, the delay can increase the sensitivity S (see definition in the main text) without affecting the adaptation precision. However, the increased sensitivity must be compared to the noise level (CV p = σ p /⟨p⟩) of p at steady state, to measure how much the produced signal is above the steady-state fluctuations. Noise is expected to increase in presence of a delay in miRNA activation. In fact, the noise buffering property of the iMSL relies on the correlation between miRNA and mRNA fluctuations, and this correlation can be disrupted by the presence of a delay, as we have previously shown for miRNA-mediated feedforward loops [4]. We performed stochastic simulations in order to compare sensitivity and fluctuations for different delay values. The results of this comparison are shown in Figure 1D, where the sensitivity normalized with 2CV p is depicted as a function of the delay time.
With this normalization the response signal can be considered above the noise level (following the definition in the main text) if the normalized sensitivity is greater than 1. The reduced efficiency in noise buffering does not compensate the increased peak amplitude of the protein concentration response, thus making the normalized sensitivity an increasing function of the delay time. This result implies that a difference in the timescales of miRNA and mRNA dynamics due to a delay in miRNA activation has a constructive effect on the implementation of adaptation by iMSLs.
The processes of miRNA and mRNA biogenesis can also induce different effective synthesis rates of mature mRNAs and active miRNAs, for example if only a certain percentage of transcribed miRNAs can complete  Figure C in the main text, in particular of the plot i in the bottom right corner. The plot shows that the peaks in the protein concentration response become progressively higher increasing the delay in miRNA activation. The opposite is true if there is a time-lag between transcription and mRNA translation (blue curve). D) The sensitivity S, normalized with 2CV p (where CV p = σ p /⟨p⟩ is the noise level at steady state), is reported as a function of the miRNA activation delay time for the parameter values of the previous plot C. In absence of delay the circuit is not sensitive (as indeed shown also in Figure 4C of the main text), but a slow miRNA activation leads to an output signal above the noise level before adaptation.
the maturation steps or an inefficient mRNA splicing leads to a mRNA level in the cytoplasm lower than the transcribed one. While in our model each transcription event generates an active miRNA and a target mRNA, the effect of different effective synthesis rates for the two molecular species should not alter qualitatively the results presented in this paper if the final levels of the active molecular species are proportional to the transcribed ones. In fact, in this case eventual discrepancies between the transcribed level and the active level of miRNAs or mRNAs could actually be taken into account rescaling the appropriate parameters. For example, the fact that only a certain proportion of the transcribed miRNAs becomes active is phenomenologically equivalent to a rescaling of the repression strength 1/h, since the relevant quantity that establishes the dowregulation level of target translation is s/h.
Finally, there is experimental evidence of competition for a finite RISC pool (or other necessary small RNA processing or transport machinery) in transfection experiments [12] (although the relevance of this competition is still quantitatively less clear for endogenous miRNAs). If the RISC concentration is limiting, a saturation effect is expected for high miRNA concentrations since the available RISC complexes can be not enough to ensure activation of all the present miRNAs. In this case, an increase of transcription does not lead to a proportional increase of the active miRNA concentration and thus to an increased level of repression.
Therefore, for a low RISC concentration compared to the miRNA one, the functions of miRNA-mediated circuits can be compromised, as it has been shown for the adaptive behaviour of incoherent feedforward loops [11]. Consequently, the relative amount of RISC complexes (as well as other processing enzymes) and miRNAs is a factor potentially relevant in possible experimental tests of the results presented in this paper. Figure 2: Table 1S. (A) Description of human known protein coding genes and human miRNA datasets. For each ENSG, we considered the longest Ensembl transcript ID (ENST). Data from Ensembl v.57 that include miRBase v.13. (B) Classification of human miRNAs with respect to their host genes. Depending on the genomic location of human miRNAs, we considered as intergenic miRNAs whose genomic position were found distant from annotated genes, while intragenic miRNAs whose located within a transcript (annotated as "host gene"). Afterward, intragenic miRNAs were further subdivided into intronic and exonic. An intragenic miRNA was called exonic if its genomic coordinates overlap with genomic coordinates of any exon in the database, and was labeled intronic otherwise. In addition, intragenic miRNAs can be classified depending on whether they are on the same strand (SS) or on the opposite strand (OS) of their host gene. All UTR miRNAs were found to overlap the exon regions. Figure 3: Table 2S. List of intronic miRNA-mediated self loops (iMSLs). For each iMSL we highlighted with "x" the correspondent dataset where they were found, with blue color if they were reported by [13], with red if they were found by [14] and with fuchsia if they were reported in both these two studies. The only iMSL experimentally validated was highlighted with ( * ) symbol.  Table 2S.