Modeling miRNAmRNA interactions: fitting chemical kinetics equations to microarray data
 Zijun Luo^{1, 2}Email author,
 Robert Azencott^{1} and
 Yi Zhao^{2}Email author
DOI: 10.1186/17520509819
© Luo et al.; licensee BioMed Central Ltd. 2014
Received: 10 March 2013
Accepted: 12 February 2014
Published: 18 February 2014
Abstract
Background
The miRNAs are small noncoding RNAs of roughly 22 nucleotides in length, which can bind with and inhibit protein coding mRNAs through complementary base pairing. By degrading mRNAs and repressing proteins, miRNAs regulate the cell signaling and cell functions. This paper focuses on innovative mathematical techniques to model gene interactions by algorithmic analysis of microarray data. Our goal was to elucidate which mRNAs were actually degraded or had their translation inhibited by miRNAs belonging to a very large pool of potential miRNAs.
Results
We proposed two chemical kinetics equations (CKEs) to model the interactions between miRNAs, mRNAs and the associated proteins. In order to reduce computational cost, we used a non linear profile clustering method named minimal net clustering and efficiently condensed the large set of expression profiles observed in our microarray data sets. We determined unknown parameters of the CKE models by minimizing the discrepancy between model prediction and data, using our own fast non linear optimization algorithm. We then retained only the CKE models for which the optimized fit to microarray data is of high quality and validated multiple miRNAmRNA pairs.
Conclusion
The implementation of CKE modeling and minimal net clustering reduces drastically the potential set of miRNAmRNA pairs, with a high gain for further experimental validations. The minimal net clustering also provides good miRNA candidates that have similar regulatory roles.
Keywords
miRNA Chemical kinetics modeling Minimal net clusteringBackground
Transcriptional and translational processes are fundamental cell mechanisms, involving three main molecular species: messenger RNA (mRNA) and their associated proteins, as well as microRNAs (miRNAs).
The miRNAs are small noncoding RNAs of roughly 22 nucleotides in length, which can bind with and inhibit protein coding mRNAs through complementary base pairing. A given miRNA can potentially bind and silence hundreds of mRNAs across a number of signaling pathways. These repressive miRNAmRNA interactions occur in multiple cellular processes [1–3], and involve two distinct modalities: they may directly degrade their target mRNAs, or more often inhibit their translation [4–9].
The best characterized features determining the targets of a specific miRNA are the conserved WatsonCrick pairing to the 5’ region (positions 2–7) of the miRNA, which are the socalled “seed pairing rules” [3, 10–13]. Since seed pairing rules are neither sufficient nor necessary for miRNAtarget functions [4, 14], they have usually been combined with microarray or proteomic analysis to find potential miRNAtarget pairs[15–17]. Classical microarray data analysis relies mostly on massive application of correlation analysis and linear statistical techniques to simultaneously acquired gene expression profiles. Combined with profile thresholding and heat map displays, these techniques provide commonly used clues for qualitative inference.
In [18], Principal Component Analysis and linear correlation had been linked with comparative sequence analysis to study microarray data recorded during mouse stem cells differentiation, and to broadly predict potential miRNAmRNA interactions.
To go beyond the results of the linear microarray analysis applied on timecourse microarray data in [18], we have formalized two basic architectures for repressive miRNAmRNA interactions: “Transcription Degradation” (TD) and “Translation Inhibition” (TI).
Traditional chemical kinetics equations had been proposed to model the transcriptional and translational processes without involving the interaction of miRNAs [19, 20].
We have derived Chemical Kinetics Equations (CKEs) to model the dynamics of TD and TI motifs, in the spirit of [20–30]. The equations are algebraically invariant under affine transformation and allow data condensation to reduce computational cost. We have implemented “minimal net” clustering method, which can control the maximum diameter of the clusters, to condense large data sets of gene expression.
Modeling by nonlinear CKEs involves complicated parameter estimation problems to fit the very large set of expression profiles recorded by microarrays. We have developed innovative fast algorithms dedicated to CKE parameter estimation, by optimization of the quality of fit between model and data.
We validate only the parameterized motifs having a high quality of fit to data. To reach robust conclusions we apply a “parameter parsimony” principle, favoring the models having the smallest number of parameters. And we have also evaluated the robustness of our parameter estimation algorithm by algorithm by direct testing on simulated data. These tests did validate our parameter estimation method is quite robust. This nonlinear approach goes further than wellestablished analysis based on correlation techniques combined with heat map displays.
Methods
Basic interaction architectures
To validate if an miRNA M does indeed repress a given gene G, we model the chemical interactions of M and G within a small network containing the pair (M,G). We now sketch two basic interaction architectures and their CKE models.
Transcription Degradation Motifs (TDmotifs)
where β>0 is the degradation rate of G, v>0 is the reaction rate between G and M, κ>0 is the product of the transcription rate by the concentration c of DNA templates, concentration which we assume to be some constant not depending on time (see [20]).
The parameters S R _{ i }>0,u _{ i }>0 and S A _{ j }>0,w _{ j }>0 are the number of binding sites and the affinity constant for the transcriptional factors R _{ i } and A _{ j }.
Note that the transcription repressors R _{ i } combine multiplicatively their individual impacts R E P _{ i } in R E P(t), and that the R E P _{ i } are analogous to Hill function (see [19, 25]); similar remarks apply to the transcription activators. The multiplicative expressions of R E P(t) and A C T(t) are typical of a socalled “cisregulatory” function and have been derived by J. Goutsias [20].
The term κ R E P(t)[ 1A C T(t)]d t is the concentration of new G molecules synthesized by transcription during the small time interval [ t,t+d t], while the repressive interactions of M and G eliminates v g(t)m(t)d t molecules of G, and natural decay destructs β g(t)d t molecules of G.
TranslationInhibition Motifs (TImotifs)
The parameters S M _{ i }>0 and u _{ i }>0 are resp. the number of binding sites and the affinity constant controlling the inhibiting impact of miRNA M _{ i } on the translation of gene G. Note that H(t) decreases when the miRNA concentrations m _{ i }(t) increase.
Our model for TI motif has been inspired by J. Goutsias [20], and we present below the hypotheses and arguments justifying the expression of H(t). There is a key difference between the CKEs modeling TD and TI motifs. For TI motifs, the concentration of Gmolecules committed to translation is g(t)H(t), where g(t) is the concentration of Gmolecules and H(t)<100%.
For TD motifs, the concentration of Gmolecules synthesized by transcription is κ F(t)=α c F(t) with F(t)<100%, where α and c are respectively the transcription rate and the concentration of DNA templates, assumed to be constant in time.
Derivation of chemical kinetics equations
Our derivation of the regulation equation for TI motif is quite similar to presentation given for TD motifs in [20], but has several changes in assumptions and formulations. To derive the CKEs (1) and (4), we propose a few hypotheses.

Hyp. 1: (TDmotifs) The molecules of the miRNA repressor of gene G can strongly bind only at one unique specific site of G molecules, and once a single such strong bind occurs, the corresponding G molecule degrades extremely fast.

Hyp. 2: (TImotifs) Each miRNA M _{ j } in the set r e p(G) of translation inhibitors of G can weakly bind with G but only at specific binding sites constituting a set B I N D _{ j } of size S _{ j }. The sets B I N D _{1},B I N D _{2},… are pairwise disjoint. Once a G molecule thus binds with one inhibitor M _{ j }, then this G molecule will fail to translate.

Hyp. 3: For any given Gmolecule, call X _{ j } the random number of sites in B I N D _{ j } which actually bind with one “ M _{ j }”molecule. We assume that the random variables X _{1},X _{2},…. are independent.
Hyp.1 is based on the fact that only a small fraction of all messenger RNAs have more than a single miRNA binding site and miRNA bound with an mRNA gene G has a limited effect on the mRNA G, but affects more substantially the protein P generated by G ([4, 31].
where γ,λ, are the degradation and translation rates of P.
The main point is to compute H(t).
The set of Gmolecules committed to translation into P at time t is identical to q(0,t). Thus H(t)=q(0,t)/g(t), and (11) yields $H\left(t\right)=1/{[\phantom{\rule{0.3em}{0ex}}1+u{m}_{1}(t\left)\right]}^{{S}_{1}}$.
which proves CKE (4) for TI motifs.
For TD motifs, the cisregulation function F(t) in CKE (1) has been derived in [20]. By using hypothesis 1 and molecular collision theory, we directly deduce that the concentration the concentration of mRNA G degraded by miRNA M is proportional to the product of both concentrations of G and M. Thus we justify the miRNA degradation term v g(t)m(t).
Invariance by affine profile transformations
Affine profile transformations
for all time dates t. Let be the set of all such affine profile transformations.
In microarray data sets, expression levels of genes are recorded via optical analysis of fluorescence intensities, and hence depend strongly on experimental acquisition modalities. So relative expression levels between pairs of recorded chemical species are more meaningful quantities, and graphic displays of microarray data by “heat maps” often involve logarithms of raw data.
Our microarray data record expression levels which as a first approximation can be viewed as unknown affine transformations of concentrations. Since our CKEs (1), (4) were derived for concentrations, we need to check how these CKEs and their parameters change under generic affine profile transformations.
Affine invariance of CKE models
Consider first a TDmotif involving an mRNA gene G, a protein P, an miRNA M, transcription repressive proteins R _{1},R _{2},…, and transcription activating proteins A _{1},A _{2},…. The CKE model (1) links the concentrations g,p,m,r _{ i },a _{ j } of G,M,P,R _{ i },A _{ j }. Let $\u011d,\widehat{p},\widehat{m},{\widehat{r}}_{i},{\xe2}_{j}$ be the corresponding recorded expression profiles. Assume that each such recorded profiles $\widehat{r}$ is linked to the concentration profile 4 by some unknown affine profile transformation T _{ r }.
From CKE (1), one directly deduces that $\u011d,\widehat{p},\widehat{m},{\widehat{r}}_{i},{\xe2}_{j}$ verify a new CKE having an algebraic form completely similar to CKE (1), but where the original parameters β,v,κ,u _{ i },w _{ j } are replaced by new parameters $\widehat{\beta},\widehat{v},\widehat{\kappa},{\xfb}_{i},{\u0175}_{j}$, and where the integers S R _{1},S R _{2},…, and S A _{1},S A _{2},…, remain unchanged.
The new parameters are easily expressed in terms of the original ones and of the coefficients of the affine transformations, but this is irrelevant practically since we will use microarray data to directly compute the new parameters for a CKE of type (1) linking recorded expression levels.
Similar computations for TImotifs show that this affine invariance property also holds for CKE (4). Hence to model a TD or a TI motif, we can fit a CKE model of type (1) or (4) to recorded expression profiles, even though the theoretical model justification involved true concentrations, which are not directly measured by microarrays.
The key assumption is that, for each chemical species C, the expression levels $\u0109\left(t\right)$ of C recorded by microarray are approximately linked to the concentration c(t) of C by some affine relation $\u0109\left(t\right)=a\left(C\right)c\left(t\right)+b\left(C\right)$, where the unknown coefficients a(C) and b(C) may depend on the species C.
The preceding algebraic model invariance under multiple affine profile transformations strongly suggests that adequate distances between dynamic profiles of recorded expression levels should be invariant under affine profiles transformations, as developed in the next section.
Condensation of expression levels profiles
Microarray data typically record several tens of thousands gene expression profiles. So computational costs to fit microarray data to all potential TDmotifs and TImotifs would of course be prohibitive. A natural option to reduce combinatorial explosion is to cluster the observed profiles.
Since our goal is to model expression profiles by ODEs of type (1) and (4), we need to control the diameters of all clusters of expression profiles. This led us to reject hierarchical clustering as well as Kmeans clustering, and to implement in the space of expression profiles a “minimalnet” clustering technique, inspired by an innovative technique for automatic generation of prototypes in shape spaces (see [32]). In view of the preceding section, the diameters of clusters in the space of profiles should be measured by a distance invariant under affine profiles transformations.
Affine invariant distance between profiles
where <·,·> is the usual scalar product in R ^{ p }.
So the profile distance D is invariant by affine profile transformations
Minimal net clustering
The set Γ of all profiles is now endowed with a distance D invariant by affine profile transformations. Call mPR the large set of miRNA profiles recorded at time t _{1},…,t _{ q } by our microarrays. We fix a maximum radius ε for profiles clusters. We seek to partition mPR into disjoint clusters C L _{1},…,C L _{ r } such that each C L _{ j } has diameter inferior to 2ε, and we also want the number NN of clusters to be as small as possible. We have implemented an iterative algorithm to generate this type of minimal net clustering, in the spirit of [32].
 1.
step 1, let (x _{1},y _{1})=a r g m a x _{ x,y } D(x,y). Then let x _{1} and y _{1} become the representatives of two initial clusters. Let C L _{1}= [ x _{1}],C L _{2}= [ x _{2}], C _{1}={C L _{1},C L _{2}} and R _{1}=Γ∖C _{1} representing the remaining points in Γ excluding the 2 clusters.
 2.
After step n1, we obtain C _{ n1}={C L _{1},C L _{2},…C L _{ n }} and R _{ n1}=Γ∖C _{ n1}, where C L _{ j } is a cluster that has single point, j=1,…,n+1.
In step n, let $\mathit{\text{HD}}=\underset{x\in {R}_{n1}}{max}D(x,{C}_{n1})$, representing the maximum distance between observation x in R _{ n1} and set C _{ n1}. If H D>ε, find ${x}_{\mathit{\text{HD}}}={\mathit{\text{argmax}}}_{x\in {R}_{n1}}D(x,{C}_{n1})$. Let C _{ n }={C _{ n1},x _{ H D }}, R _{ n }=Γ∖C _{ n }. Repeat until H D≤ε.
 3.
Assume the loop stop at step NN, we have C _{ N N }={C L _{1},C L _{2},…C L _{ N N+1}}. For an observation x, find the point ${y}_{{\mathit{\text{CL}}}_{k}}$, belonging to cluster C L _{ k } in C _{ N N }, that is closest to x, i.e. ${y}_{{\mathit{\text{CL}}}_{k}}={\mathit{\text{argmin}}}_{y\in {C}_{\mathit{\text{NN}}}}\left(D\right(x,{C}_{\mathit{\text{NN}}}\left)\right)$, then assign x to the cluster C L _{ k }.
We apply this minimal net clustering algorithm to the set of all miRNA profiles recorded by our microarrays. We define the cluster diameter d i a m(C L) of cluster CL as the maximum distance of any two observations in the cluster, i.e. d i a m(C L)= maxx,y∈C L D(x,y), where. Compared with the commonly used clustering method such as Kmeans and Hierarchical clustering, the minimal net algorithm allows us to control the diameter of all clusters by setting the threshold ε representing the supremum radius of the cluster, while the Kmeans and hierarchical clustering is often used to determine the number of of clusters but not their diameters. If we select a very small threshold ε, the expression levels of genes in the same cluster can be considered almost identical.
Parameter estimation for the CKE models
Parameters estimation strategy
A generic strategy for parameter estimation in CKEs systems is to minimize a cost function evaluating the discrepancy between model predictions and experimental data. Each one of our CKE models has a single output variable, namely the expression level g(t) of mRNA gene G for a TD motif, and the expression level p(t) of protein P for a TI motif. The output variable g(t) or p(t) of a CKE model can be estimated by a function $\u011d\left(t\right)$ or $\widehat{p}\left(t\right)$ once one knows the profiles of all other molecular species involved in the model. The estimators $\u011d\left(t\right)$ or $\widehat{p}\left(t\right)$ are respectively determined by CKE (1) or CKE (4).
where the output variable f(t) is equal to g(t) for TD motifs and to p(t) for TI motifs. The concrete goal is to find the best parameter vector w by minimization of the lack of fit E R R(w) over all possible values of w.
There are no magic solutions for such non linear minimization problems. Moreover fast computing was essential here, since we usually have to solve a very large number of similar “quality of fit maximization” problems when dealing with large microarray datasets.
We have tested several generic cost minimization approaches (see [33–35]) such as “genetic algorithms”, as well as “gradient descent” to minimize a sum of squared modeling residuals. These two techniques turned out to require far too much computing time and were often unreliable due to their high dependence on initialization values.
We hence developed our own fast CKE parametrization algorithms to optimize the quality of fit between CKE models and microarray profiles data. This optimized quality of fit, adequately balanced by a systematic emphasis on parsimoniously parameterized models, becomes an essential clue to decide which potential interactions one should validate between miRNAs, mRNAs, and associated proteins.
Parameters parsimony requirement
Robustness of CKE parametrization is the main motivation for our parameter parsimony requirements. Consider CKE (1) modeling a TDmotif TDM involving n _{ a c t } activators and n _{ r e p } repressors for the transcription of mRNA gene G, and one miRNA M degrading the transcription of G. Then the number n _{ p a r } of unknown parameters is n _{ p a r }=3+2(n _{ a c t }+n _{ r e p }). Each profile g(t),m(t),a _{ i }(t),r _{ j }(t) is recorded at t=t _{1},…,t _{ p }. The CKE outputs for TDM are g(t _{1}),…,g(t _{ p }). Hence each microarray data set provides (p1) equations linking the n _{ p a r } unknown parameters, since the recorded g(t) should be very close to the predicted values $\u011d\left(t\right)$ obtained by solving the ODE (1) with initial value g(t _{1}).
Vapnik’s results on model fitting (see [36]) show that robust accuracy of parameter estimates requires fairly high values of the ratio of (p1)/n _{ p a r }. So n _{ p a r }<<(p1) is a necessary constraint, and we will impose the parameter parsimony requirement n _{ p a r }≤(p1)/4. Indeed when n _{ p a r }>(t _{ p }1), CKE models are overfitted and parameters are poorly estimated.
CKE Parameter estimation
TImotifs: plausible ranges for parameters
Consider a generic TI motif TIM involving an mRNA gene G, its associated protein P, and k translation inhibiting miRNAs [ M _{1},…,M _{ k }]. Let p(t),g(t),m _{ i }(t) be the expression levels of P,G,M _{ i }. We want to model TIM by CKE (4).
which involves (2k+2)≤8 parameters.
The degradation and translation rates γ and λ of protein P were unknown for most proteins.
According to results from [31], only a small percentage 2% of our 30,000 mRNAs have more than 2 potential binding sites for miRNAs, and only 0.02% mRNAs have as many as 7 such binding sites. So in our parameter estimations it is reasonable to restrict the number of binding sites S _{ j } for miRNA M _{ j } to be at most 5.
TImotifs: optimizing the quality of fit
The inputs of CKE (4) are the initial value p(t _{1}) and the expression levels g(t),m _{1}(t),…,m _{ k }(t) recorded at time dates t _{1},…,t _{ q }.
TImotif: parameter estimation algorithm
The assumption 0<H _{ i }(t)<1 yields $0<{\stackrel{\u0304}{H}}_{i}\left(t\right)<1$. We will discretize the possible values of ${\stackrel{\u0304}{H}}_{i}\left(t\right)$ by constraining them to belong to a grid h _{1},…,h _{ s } of 1≤s≤99 percentage values equally spaced in [ 0,1]. Since ${\stackrel{\u0304}{m}}_{i}$ is known, we invert equation (19) to compute a corresponding grid G R D _{ i } of s potential values for ${u}_{i}=\frac{1}{{\stackrel{\u0304}{m}}_{i}}({\left(\frac{1}{{\stackrel{\u0304}{H}}_{i}}\right)}^{1/{S}_{i}}1)$.
Now for 1≤i≤k, select and fix arbitrary values u _{ i } in the grid G R D _{ i }. After selecting as restrictive above the set of parameter vector U= [ u _{1},S _{1},…u _{ k },S _{ k }], the function H(t) is then completely determined for all t.
Note that the set of all possible such choices for U is of cardinal inferior to N=(5s)^{ k }. Since we do explore each one of these possibilities separately, we need to keep the number N at a reasonable level, so we selected s=99 for k=1, s=60 for k=2, s=20 for k=3 etc.
This is a classical constrained linear programming problem, which can be solved by well known fast linear programming algorithms [37], to provide the optimal values γ ^{∗},λ ^{∗},z ^{∗}. Then z ^{∗}=E R R(γ ^{∗},λ ^{∗}) is the minimal value of E R R(γ,λ).
These optimal values are functions γ ^{∗}(U),λ ^{∗}(U),z ^{∗}(U) of the partial vector of parameters U in . We then select the optimal U ^{∗} in as the value of U which minimizes z ^{∗}(U) over . The optimal parametrization w ^{∗} of our TD motif is then given by w ^{∗}= [ γ ^{∗}(U ^{∗}),λ ^{∗}(U ^{∗}),U ^{∗}]. This new parametrization algorithm is fairly fast and has good accuracy. On a current laptop PC, our nonoptimized MATLAB code implementation required less than 5 minutes of CPU time for the parametrization of a typical TImodel with 38 time points and 9 parameters [38]. After code optimization and a reimplementation in C, we expect this CPU time to be reduced to 2 minutes. The algorithm does not require any knowledge of the parameters ranges except for the number of binding sites S, which is an advantage for the range of reaction rates of of many molecules of interest are usually unknown.
TDmotifs: parameter estimation algorithm
The parametrization algorithms just presented also apply to TD models, as we now sketch. The output variable is now the expression level g(t) of mRNA gene G.
The TD model parameters S R _{ i }>0,u _{ i }>0 and S A _{ j }>0,w _{ j }>0 are the number of binding sites and the affinity constants for the transcriptional factors R _{ i } and A _{ j }. They constitute a partial parameter vector U which we restrict as above by first imposing a moderate upper bound S _{ m a x } on all the integers S R _{ i },S A _{ j }. and by selecting, also as done above, adequate finite grids for the values of the u _{ i } and the w _{ j }. This constrains U to belong to a finite set . The cardinal N of is forced to remain at most of order 10^{5}, by adequate constraints on S _{ m a x } and on the coarseness of the u _{ i } grids and the v _{ j } grids.
Solving this constraint linear programming problem generates optimal parameters (β ^{∗},v ^{∗},κ ^{∗}) and a minimal error z ^{∗}, which are all functions of U. One concludes as above by selecting an optimal U ^{∗} minimizing z ^{∗}(U) over all U in .
Quality of fit for CKE models
Consider any TD motif or TI motif . We have seen how to compute a parameter vector w ^{∗} optimizing the quality of fit between microarray data and our CKE model for . This was done by minimizing $\mathit{\text{ERR}}\left(\mathbf{w}\right)=\underset{t}{max}\phantom{\rule{0.3em}{0ex}}\left\phantom{\rule{0.3em}{0ex}}f\right(t)\widehat{f}(t\left)\phantom{\rule{0.3em}{0ex}}\right$, where f(t) is the main output variable of , and $\widehat{f}\left(t\right)$ is the estimation of f(t) based on the CKE parametrized by w.
Results and discussion
Examples of application
We implemented our model of TI on microarray data of mouse stem cells undergoing RAinduced differentiation, as provided by LC Science Inc, and previously analyzed by classical techniques in [18]. We took the recorded expression profiles for proteins/mRNAs GCNF, Oct4, Nanog and Sox2 at time points (0, 1.5, 3, 6)/(0, 3, 6) and expression levels for 266 miRNAs on days 0, 1, 3, 6 from [38] during ES cell differentiation. These profile data were interpolated at 19 intermediary time points, by Piecewise Cubic Hermite Interpolation (PCHIP) and the number of parameters were limited to be 4, i.e. only 1 upstream miRNA was selected for the model, to satisfy the parameter parsimony requirement.
where $\sigma \left({m}_{i}\right)=\sqrt{\sum _{t}{\left({m}_{i}\right(t)\stackrel{\u0304}{{m}_{i}}(t\left)\right)}^{2}}$, t=0,1/3,…,6. Since $\parallel \frac{{m}_{i}\left(t\right)\stackrel{\u0304}{{m}_{i}}\left(t\right)}{\sigma \left({m}_{i}\right)}\parallel \le 1$, $\widehat{{m}_{i}}\left(t\right)$ is positive for t=0,1/3,…,6. Taking ε=0.15, we applied Minimal Net clustering (the MATLAB code can be downloaded through Additional file 1) to the transformed data of miRNAs, $\widehat{{m}_{i}}\left(t\right),i=1,2,\dots ,266$, and obtained 107 clusters, in which the maximum cluster contains 14 miRNAs. We consider that the miRNAs belonging to the same cluster share the same normalized expression level within a negligibly small error. For each cluster C L _{ j }, j=1,…,107, we determine the miRNA M C _{ j } which is the representative of the cluster C L _{ j } for the distance D, and we let m c _{ j }(t) be the expression level of M C _{ j } at time t. We call m c _{ j }(t) is the expression level of cluster C L _{ j }.
Discussion
In [38], we preselected the potential miRNAs for each gene/protein by TargetScan 5.0 and miRanda before applying the two models. The preselection of miRNA candidates were not necessary though it greatly reduced the computational cost. However, the application of the CKE modeling was dependant on the target prediction algorithm, such as TargetScan or miRanda. Therefore, we introduced Minimal Net Clustering in this paper so that the data was condensed and the computational cost could be reduced by a purely numerical method without biological bias.
Since the results of TD model depend on information of transcription factors, the modeling validates not only miRNAs acting as mRNA degraders but also upstream transcription factors simultaneously. In [38] we numerically validated proteins GCNF, Oct4 and Nanog as transcription factors for mRNAs Oct4 and Nanog, while Marson et al. and Boyer et al. [40, 41] claimed that Oct4 and Sox2 are bounded and act together with Nanog as transcription factors. Considering the impact of the transcription factors, the TD modeling may be less convincing unless the transcription factors are fixed as experimentally validated. In this paper we presented more details on the implementation and validation results of the TI model in order to focus on the validation of miRNAs and avoid the influence of assumptions of transcription factors.
TarBase shows that, for protein GCNF the experimentally validated miRNAs are mmulet7b and mmumiR181a; for protein Oct4 the experimentally validated miRNA is mmumiR470; for protein Nanog the experimentally validated miRNAs are mmumiR134, mmumiR470, and mmumiR296; for protein Sox2 the experimentally validated miRNA is mmumiR134. In this paper, we have validated by TI modeling and minimal net clustering all the experimentally tested miRNA repressors of GCNF and Oct4. In our previous work [38], actually none of these miRNA repressors had yet been studied for modeling for the four proteins because of the restriction of preselection. And only the pair (mmumiR181a, GCNF) was validated by the classical correlation analysis done in Gu et al. [18] for the 4 proteins GCNF, Oct4, Nanog, Sox2. Therefore, the TI modeling combined with data condensation not only reduced computational cost but also clearly extended the set of miRNA inhibitors validated by model fitting to microarray data.
Since each numerically validated miRNA cluster may contain two or more miRNAs, the miRNAs in the same cluster could also be considered as potential candidate inhibitors for further experiments to validate. For instance, as Figure 2 shows, mmumiR203, 330, 342, 10a and 99b are potential candidates for protein Oct4 for they are in the same cluster as mmumiR470, which is validated by both the numerical modeling and experiments.
After the pair (mmumiR181a, GCNF) was well validated by our CKE model, we found that the cluster (see Figure 5, top left) containing mmumiR181a also includes mmumiR103 and mmumiR107, which are two known miRNAs that have the same roles in regulating insulin sensitivity and promoting metastasis of colorectal cancer [42, 43]. We also checked that miR103 and miR107 have almost the same mature sequences:

mmumiR103: AGCAGCAUUGUACAGGGCUAUGA

mmumiR107: AGCAGCAUUGUACAGGGCUAUCA
After the pair (mmulet7b, GCNF) was well validated by our CKE model, we observed mmulet7g, and mmulet7i are in the same cluster as mmulet7b (see Figure 5, top right). It was claimed that let7b and 7g reduce tumor growth in mouse models of lung cancer [44]. We then checked that indeed these three miRNAs have very similar mature sequences, namely

mmulet7b: UGAGGUAGUAGGUUGUGUGGUU

mmulet7g: UGAGGUAGUAGUUUGUACAGU

mmulet7i: UGAGGUAGUAGUUUGUGCUGU
To evaluate the correlation between mature sequence and expression profile of our set of 266 miRNAs, we systematically explored all the 266×265/2=35,245 pairs (m i r _{ i },m i r _{ j }) of distinct miRNAs in this set, i≠j,i,j=1,…,35245. For each such pair we then computed the Euclidean distance between the expression level profiles (expression distance in short) and the NeedlemanWunsch alignment score (NWA) between the mature sequences of each miRNA pair. We then divided the 35,245 miRNA pairs into two groups: GroupHigh includes the pairs with high alignment score (N W A>10), i.e. GroupHigh includes miRNA pairs with similar mature sequences, and GroupLow includes the pairs with low alignment score (N W A≤10). We then compared the distribution of expression distances for all miRNA pairs in GroupHigh with the distribution of these distances for miRNA pairs in GroupLow. As seen in Figure 5 (bottom left), the quantiles of these distances in GroupLow are consistently larger than the corresponding quantiles in GroupHigh. This is fully confirmed by KolmogorovSmirnov test which yielded the very significant pvalue 7×10^{32}. The result still holds when we change the alignment score threshold N W A=10 used to define GroupHigh and GroupLow (see Figure 5, bottom right, where the NWA threshold is now N W A=15). We conclude that miRNAs having similar mature sequences tend, with high probability, to have similar expression levels. The above analysis and examples indicate that miRNAs belonging to the same cluster are good candidates to have similar mature sequence. Since the match between miRNA mature sequence and target sites is the main determinant for miRNA targets, miRNAs belonging to same cluster may hence also have similar regulatory roles.
Robustness of parameter estimation
Conclusion
We have separately modeled by chemical kinetics equations the 2 distinct modalities of the repressive actions of miRNAs on posttranscriptional processes of mRNA genes and the associated proteins. This was achieved by first defining the formal structure of two types of interaction architectures (Transcription Degradation motifs and Translation Inhibition motifs) linking miRNAs to subgroups of mRNA genes. The plausibility of each one of these potential TD motifs or TImotifs was then evaluated by computerized parametric modeling, based on microarray data, of adequate formal chemical kinetics equations (CKEs).
We have sketched the formal derivation of 2 specific CKEs modeling by dynamic ODEs the interactions between concentrations of different species of molecules involved in each architecture. This led to a motif validation strategy based on the quantified quality of fit between our optimally parametrized models and the corresponding microarray data.
Our computerized parameter estimation is implemented by an innovative fast algorithm that does not require knowledge of range of molecular reaction rates. On a current standard laptop PC, our implementation of parameter estimation for a typical 9parameters CKE model requires about 5 minutes of computing time.
Our parameter estimation algorithm also provides relatively highquality optimization for the fit between model and microarray data, by integrating both global and local cost minimization techniques, in contexts where plausible ranges of values for most of the unknown parameters are not available in the literature. By perturbing the expression levels of miRNAs and reestimating the parameters, we showed that our parameter algorithm has a satisfactory level of robustness. We believe that our parameter estimation technique with associated evaluation of quality of fit would be quite applicable as a generic algorithm to similar problems in chemical kinetics modeling of molecular interactions.
Modeling very large microarray data is computationally quite expensive. We have hence sketched clustering methods to condense large microarray data. This approach has of course been attempted before our work, but the main point is that we have carefully studied the mathematical compatibility of our CKE models with condensation of the profiles data. Since we have proved that the abstract form of our CKE models is invariant by arbitrary multiple affine transformations of profiles data, we have made sure to constrain the distance of two expression levels profiles to be invariant by these types of affine transformations.
We have implemented a Minimal Net Clustering algorithm based on this distance, which allows us to control the radius of the clusters. The number of CKEs to parameterize can be strongly reduced after condensation of the large data sets, and the affine invariance of our CKEs show that the condensed genes network can then still be modeled by similar CKEs.
By applying our TI modeling to multiple proteins such as GCNF, Oct4, Nanog, Sox2, we showed that 3 miRNAtarget pairs experimentally validated can be also validated by the TI model.
Declarations
Acknowledgements
We would like to thank Drs P. Gunaratne, A. J. Cooney, D. Lonard, X. Xu for multiple discussions concerning their experimental data and results on miRNAmRNA interactions. These discussions led us to a joint publication.
Authors’ Affiliations
References
 Ambros V:The functions of animal microRNAs. Nature. 2004, 431 (7006): 350355. 10.1038/nature02871.View ArticlePubMedGoogle Scholar
 Aravin A, Tuschl T:Identification and characterization of small RNAs involved in RNA silencing. FEBS Lett. 2005, 579 (26): 58305840. 10.1016/j.febslet.2005.08.009.View ArticlePubMedGoogle Scholar
 Bartel DP:MicroRNAs: target recognition and regulatory functions. Cell. 2009, 136 (2): 215233. 10.1016/j.cell.2009.01.002.PubMed CentralView ArticlePubMedGoogle Scholar
 Tay Y, Zhang J, Thomson AM, Lim B, Rigoutsos I:MicroRNAs to Nanog, Oct4 and Sox2 coding regions modulate embryonic stem cell differentiation. Nature. 2008, 455: 11241128. 10.1038/nature07299.View ArticlePubMedGoogle Scholar
 Wang XJ, Reyes JL, Chua NH, Gaasterland T:Prediction and identification of arabidopsis thaliana microRNAs and their mRNA targets. Genome Biol. 2004, 5 (9): R6510.1186/gb200459r65.PubMed CentralView ArticlePubMedGoogle Scholar
 Kawasaki H, Taira K:MicroRNA196 inhibits HOXB8 expression in myeloid differentiation of HL60 cells. Nucleic Acids Symp Ser. 2004, 48 (48): 211212.View ArticleGoogle Scholar
 Moxon S, Jing R, Szittya G, Schwach F, RPR L, Moulton V, Dalmay T:Deep sequencing of tomato short RNAs identifies microRNAs targeting genes involved in fruit ripening. Genome Res. 2008, 18 (10): 16021609. 10.1101/gr.080127.108.PubMed CentralView ArticlePubMedGoogle Scholar
 Williams AE:Functional aspects of animal microRNAs. Cell Mol Life Sci. 2008, 65 (4): 545562. 10.1007/s0001800773559.View ArticlePubMedGoogle Scholar
 Mazière P, Enright AJ:Prediction of microRNA targets. Drug Discov Today. 2007, 12 (11–12): 452458.View ArticlePubMedGoogle Scholar
 Lewis BP, Shih IH, JonesRhoades MW, Bartel DP, Burge C:Prediction of mammalian microRNA targets. Cell. 2003, 115: 787798. 10.1016/S00928674(03)010183.View ArticlePubMedGoogle Scholar
 Lewis BP, Burge CB, Bartel D:Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005, 120: 1520. 10.1016/j.cell.2004.12.035.View ArticlePubMedGoogle Scholar
 Brennecke J, Stark A, Russell RB, Cohen S:Principles of microRNAtarget recognition. PLoS Biol. 2005, 3: e8510.1371/journal.pbio.0030085.PubMed CentralView ArticlePubMedGoogle Scholar
 Krek A, Grun D, Poy MN, Wolf R, Rosenberg L, Epstein E, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, Rajewsky N:Combinatorial microRNA target predictions. Nat Genet. 2005, 37: 495500. 10.1038/ng1536.View ArticlePubMedGoogle Scholar
 Chi SW, Hannon GJ, Darnell RB:An alternative mode of microRNA target recognition. Nat Struct Mol Biol. 2012, 19 (3): 321327. 10.1038/nsmb.2230.PubMed CentralView ArticlePubMedGoogle Scholar
 Baek D, Villé J, Shin C, Camargo FD, Gygi SP, Bartel DP:The impact of microRNAs on protein output. Nature. 2008, 455: 6471. 10.1038/nature07242.PubMed CentralView ArticlePubMedGoogle Scholar
 Selbach M, Schwanhäusser B, Thierfelder N, Fang Z, Khanin R, Rajewsky N:Widespread changes in protein synthesis induced by microRNAs. Nature. 2008, 455: 5863. 10.1038/nature07228.View ArticlePubMedGoogle Scholar
 Mourelatos Z:Small RNAs: The seeds of silence. Nature. 2008, 455: 4445. 10.1038/455044a.View ArticlePubMedGoogle Scholar
 Gu P, Reid JG, Gao X, Shaw CA, Creighton C, Tran PL, Zhou X, Drabek RB, Steffen DL, Hoang DM, Weiss MK, Naghavi AO, Eldaye J, Khan MF, Legge GB, Wheeler DA, Gibbs RA, Miller JN, Cooney AJ, Gunaratne PH:Novel miRNA candidates and miRNAmRNA pairs in ES cells. PLoS ONE. 2008, 3 (7): e254810.1371/journal.pone.0002548.PubMed CentralView ArticlePubMedGoogle Scholar
 de Jong H:Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol. 2002, 9 (1): 67103. 10.1089/10665270252833208.View ArticlePubMedGoogle Scholar
 Goutsias J, Kim S:A nonlinear discrete dynamical model for transcriptional regulation: construction and properties. Biophys J. 2004, 86 (4): 19221945. 10.1016/S00063495(04)742575.PubMed CentralView ArticlePubMedGoogle Scholar
 CornishBowden A: Fundamentals of Enzyme Kinetics, 3rd Edition. 2004, London: Portland PressGoogle Scholar
 Goutsias J, Lee NH:Computational and experimental approaches for modeling gene regulatory networks. Curr Pharm Des. 2007, 13: 14151436. 10.2174/138161207780765945.View ArticlePubMedGoogle Scholar
 Moore JW, Pearson RG: Kinetics and Mechanism, 3rd Edition. 1981, New York: WileyGoogle Scholar
 Slonim DK, Yanai I:Getting started in gene expression microarray analysis. PLoS Comput. Biol. 2009, 5 (10): e100054310.1371/journal.pcbi.1000543.PubMed CentralView ArticlePubMedGoogle Scholar
 Engl HW, Flamm C, Kügler P, Lu J, Müller S, Schuster P:Inverse problems in systems biology. IOP Sci. 2009, 25: 123014Google Scholar
 Arkin A, Ross J, McAdams HH:Stochastic kinetic analysis of developmental pathway bifurcation in phage linfected escherichia coli cells. Genetics. 1998, 149: 16331648.PubMed CentralPubMedGoogle Scholar
 Wang Y, Liu CL, Storey JD, Tibshirani RJ, Herschlag D, Brown PO:Precision and functional specificity in mRNA decay. Proc Natl Acad Sci. 2002, 99 (9): 58605865. 10.1073/pnas.092538799.PubMed CentralView ArticlePubMedGoogle Scholar
 Meir E, Munro EM, Odell GM, von Dassow G:Ingeneue: a versatile tool for reconstituting genetic networks, with examples from the segment polarity network. J Exp Zool. 2002, 294: 216251. 10.1002/jez.10187.View ArticlePubMedGoogle Scholar
 Müller S, Hofbauer J, Endler L, Flamm C, Widder S, Schuster P:A generalized model of the repressilator. J Math Biol. 2006, 53: 905937. 10.1007/s0028500600359.View ArticlePubMedGoogle Scholar
 Widder S, Schicho J, Schuster P:Dynamic patterns of gene regulation I: simple twogene systems. J Theor Biol. 2007, 246: 395419. 10.1016/j.jtbi.2007.01.004.View ArticlePubMedGoogle Scholar
 Hornstein E, Shomron N:Canalization of development by microRNAs. Nat Genet. 2006, 38: S20S24. 10.1038/ng1803.View ArticlePubMedGoogle Scholar
 Azencott R, Coldefy F, Younes L:A distance for elastic matching in object recognition. Pattern Recognition, Proceedings of 13th Int. Conf. ICPR 96, vol. 1. 1996, IEEE, 687691.View ArticleGoogle Scholar
 Periwal V, Chow CC, Bergman RN, Ricks M, Vega GL, Sumner AE:Evaluation of quantitative models of the effect of insulin on lipolysis and glucose disposal. Am J Physiol Regul Integr Comp Physiol. 2008, 295: R1089R1096. 10.1152/ajpregu.90426.2008.PubMed CentralView ArticlePubMedGoogle Scholar
 Gregory PC: Bayesian Logical Data Analysis for the Physical Sciences: A Comparative Approach With Mathematical Support. 2005, London: Cambridge University PressView ArticleGoogle Scholar
 Küegler P, Gaubitzer E, Müller S:Parameter identification for chemical reaction systems using sparsity enforcing regularization: a case study for the ChloriteIodide Reaction. J Phys Chem A. 2009, 113: 27752785. 10.1021/jp808792u.View ArticleGoogle Scholar
 Vapnik V: Statistical Learning Theory. 1998, New York: WileyGoogle Scholar
 Boyd S, Vandenberghe L: Convex Optimization. 2004, Cambridge: Cambridge University PressView ArticleGoogle Scholar
 Luo Z, Xu X, Gu P, Lonard D, Gunaratne P, Cooney AJ, Azencott R:Regulatory circuits of miRNAs in ES cell differentiation: a chemical kinetics modeling approach. PLoS One. 2011, 6 (10): e2326310.1371/journal.pone.0023263.PubMed CentralView ArticlePubMedGoogle Scholar
 Vergoulis TI, Vlachos P, Alexiou G, Georgakilas M, Maragkakis M, Reczko M, Gerangelos S, Koziris N, Dalamagas T, Hatzigeorgiou A:Tarbase 6.0: capturing the exponential growth of miRNA targets with experimental support. Nucl Acids Res. 2012, 40 (D1): D222D229. 10.1093/nar/gkr1161.PubMed CentralView ArticlePubMedGoogle Scholar
 Marson A, Levine SS, Cole MF, Frampton GM, Brambrink T, Johnstone S, Guenther MG, Johnston WK, Wernig M, Newman J, Calabrese JM, Dennis LM, Volkert TL, Gupta S, Love J, Hannett N, Sharp PA, Bartel DP, Jaenisch R, YR A:Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells. Cell. 2008, 10: 1016Google Scholar
 Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG, Kumar RM, Murray HL, Jenner RG, Gifford DK, Melton DA, Jaenisch R, Young RA:Core transcriptional regulatory circuitry in human embryonic stem cells. Cell. 2005, 122 (6): 947956. 10.1016/j.cell.2005.08.020.PubMed CentralView ArticlePubMedGoogle Scholar
 Trajkovski M, Hausser J, Soutschek J, Bhat B, Akin A, Zavolan M, Heim MH, Stoffel M:MicroRNAs 103 and 107 regulate insulin sensitivity. Nature. 2011, 474 (7353): 649653. 10.1038/nature10112.View ArticlePubMedGoogle Scholar
 Chen HY, Lin YM, Chung HC, Lang YD, Lin CJ, Huang J, Wang WC, Lin FM, Chen Z, Huang HD, Shyy JY, Liang JT, Chen RH:miR103/107 promote metastasis of colorectal cancer by targeting the metastasis suppressors DAPK and KLF4. Cancer Res. 2012, 72 (14): 36313641. 10.1158/00085472.CAN120667.View ArticlePubMedGoogle Scholar
 EsquelaKerscher A, Trang P, Wiggins JF, Patrawala L, Cheng A, Ford L, Weidhaas JB, Brown D, Bader AG, Slack FJ:The let7 microRNA reduces tumor growth in mouse models of lung cancer. Cell Cycle. 2008, 7 (6): 759764. 10.4161/cc.7.6.5834.View ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.