 Methodology article
 Open Access
 Published:
Modeling miRNAmRNA interactions: fitting chemical kinetics equations to microarray data
BMC Systems Biology volume 8, Article number: 19 (2014)
Abstract
Background
The miRNAs are small noncoding RNAs of roughly 22 nucleotides in length, which can bind with and inhibit protein coding mRNAs through complementary base pairing. By degrading mRNAs and repressing proteins, miRNAs regulate the cell signaling and cell functions. This paper focuses on innovative mathematical techniques to model gene interactions by algorithmic analysis of microarray data. Our goal was to elucidate which mRNAs were actually degraded or had their translation inhibited by miRNAs belonging to a very large pool of potential miRNAs.
Results
We proposed two chemical kinetics equations (CKEs) to model the interactions between miRNAs, mRNAs and the associated proteins. In order to reduce computational cost, we used a non linear profile clustering method named minimal net clustering and efficiently condensed the large set of expression profiles observed in our microarray data sets. We determined unknown parameters of the CKE models by minimizing the discrepancy between model prediction and data, using our own fast non linear optimization algorithm. We then retained only the CKE models for which the optimized fit to microarray data is of high quality and validated multiple miRNAmRNA pairs.
Conclusion
The implementation of CKE modeling and minimal net clustering reduces drastically the potential set of miRNAmRNA pairs, with a high gain for further experimental validations. The minimal net clustering also provides good miRNA candidates that have similar regulatory roles.
Background
Transcriptional and translational processes are fundamental cell mechanisms, involving three main molecular species: messenger RNA (mRNA) and their associated proteins, as well as microRNAs (miRNAs).
The miRNAs are small noncoding RNAs of roughly 22 nucleotides in length, which can bind with and inhibit protein coding mRNAs through complementary base pairing. A given miRNA can potentially bind and silence hundreds of mRNAs across a number of signaling pathways. These repressive miRNAmRNA interactions occur in multiple cellular processes [1–3], and involve two distinct modalities: they may directly degrade their target mRNAs, or more often inhibit their translation [4–9].
The best characterized features determining the targets of a specific miRNA are the conserved WatsonCrick pairing to the 5’ region (positions 2–7) of the miRNA, which are the socalled “seed pairing rules” [3, 10–13]. Since seed pairing rules are neither sufficient nor necessary for miRNAtarget functions [4, 14], they have usually been combined with microarray or proteomic analysis to find potential miRNAtarget pairs[15–17]. Classical microarray data analysis relies mostly on massive application of correlation analysis and linear statistical techniques to simultaneously acquired gene expression profiles. Combined with profile thresholding and heat map displays, these techniques provide commonly used clues for qualitative inference.
In [18], Principal Component Analysis and linear correlation had been linked with comparative sequence analysis to study microarray data recorded during mouse stem cells differentiation, and to broadly predict potential miRNAmRNA interactions.
To go beyond the results of the linear microarray analysis applied on timecourse microarray data in [18], we have formalized two basic architectures for repressive miRNAmRNA interactions: “Transcription Degradation” (TD) and “Translation Inhibition” (TI).
Traditional chemical kinetics equations had been proposed to model the transcriptional and translational processes without involving the interaction of miRNAs [19, 20].
We have derived Chemical Kinetics Equations (CKEs) to model the dynamics of TD and TI motifs, in the spirit of [20–30]. The equations are algebraically invariant under affine transformation and allow data condensation to reduce computational cost. We have implemented “minimal net” clustering method, which can control the maximum diameter of the clusters, to condense large data sets of gene expression.
Modeling by nonlinear CKEs involves complicated parameter estimation problems to fit the very large set of expression profiles recorded by microarrays. We have developed innovative fast algorithms dedicated to CKE parameter estimation, by optimization of the quality of fit between model and data.
We validate only the parameterized motifs having a high quality of fit to data. To reach robust conclusions we apply a “parameter parsimony” principle, favoring the models having the smallest number of parameters. And we have also evaluated the robustness of our parameter estimation algorithm by algorithm by direct testing on simulated data. These tests did validate our parameter estimation method is quite robust. This nonlinear approach goes further than wellestablished analysis based on correlation techniques combined with heat map displays.
Methods
Basic interaction architectures
To validate if an miRNA M does indeed repress a given gene G, we model the chemical interactions of M and G within a small network containing the pair (M,G). We now sketch two basic interaction architectures and their CKE models.
Transcription Degradation Motifs (TDmotifs)
We call “TDmotif” any interaction architecture, as sketched in Figure 1, involving a single miRNAmRNA pair (M,G) where G is in Target(M), and M degrades the transcription of G. The TD motif includes also two sets of proteins rep(G) and act(G), namely the transcriptional “repressors” and “activators” of G, denoted by
Let g(t),p(t),m(t),r _{ i }(t),a _{ j }(t), be the expression levels at time t for the chemical species G,P,M,R _{ i },A _{ j }. We model the transcription process by a CKE similar to CKEs proposed in [20, 22, 25, 29, 30], but with a complementary term encoding the repressive impact of miRNA M on its target mRNA G:
where β>0 is the degradation rate of G, v>0 is the reaction rate between G and M, κ>0 is the product of the transcription rate by the concentration c of DNA templates, concentration which we assume to be some constant not depending on time (see [20]).
The percentage 0≤F(t)=R E P(t)[ 1A C T(t)]≤1% is the fraction of existing DNA templates which are committed at time t to transcription of the mRNA gene G. Here the percentages R E P(t) and A C T(t) are modeled by the following products,
where
The parameters S R _{ i }>0,u _{ i }>0 and S A _{ j }>0,w _{ j }>0 are the number of binding sites and the affinity constant for the transcriptional factors R _{ i } and A _{ j }.
Note that the transcription repressors R _{ i } combine multiplicatively their individual impacts R E P _{ i } in R E P(t), and that the R E P _{ i } are analogous to Hill function (see [19, 25]); similar remarks apply to the transcription activators. The multiplicative expressions of R E P(t) and A C T(t) are typical of a socalled “cisregulatory” function and have been derived by J. Goutsias [20].
The term κ R E P(t)[ 1A C T(t)]d t is the concentration of new G molecules synthesized by transcription during the small time interval [ t,t+d t], while the repressive interactions of M and G eliminates v g(t)m(t)d t molecules of G, and natural decay destructs β g(t)d t molecules of G.
TranslationInhibition Motifs (TImotifs)
We call “TImotif” any interaction architecture, as sketched in Figure 1, involving a set M _{1},…,M _{ r } of miRNAs inhibiting the translation of mRNA gene G, by repressing the expression of the protein P generated by G. Let (p(t),m _{ i }(t),g(t)) be the concentrations at time t of protein P, miRNA M _{ i }, and mRNA G. In the spirit of [20, 21], we model the translation inhibition dynamics by the CKE
where γ>0 and λ>0 are the degradation rate and the translation rate for protein P and where H(t) is the percentage of G molecules committed at time t to the translation of G. Thus H(t) encodes the inhibiting impact of the miRNAs M _{1},…,M _{ k } on the translation of G, and is modeled as a product of terms similar to R E P(t):
The parameters S M _{ i }>0 and u _{ i }>0 are resp. the number of binding sites and the affinity constant controlling the inhibiting impact of miRNA M _{ i } on the translation of gene G. Note that H(t) decreases when the miRNA concentrations m _{ i }(t) increase.
Our model for TI motif has been inspired by J. Goutsias [20], and we present below the hypotheses and arguments justifying the expression of H(t). There is a key difference between the CKEs modeling TD and TI motifs. For TI motifs, the concentration of Gmolecules committed to translation is g(t)H(t), where g(t) is the concentration of Gmolecules and H(t)<100%.
For TD motifs, the concentration of Gmolecules synthesized by transcription is κ F(t)=α c F(t) with F(t)<100%, where α and c are respectively the transcription rate and the concentration of DNA templates, assumed to be constant in time.
Derivation of chemical kinetics equations
Our derivation of the regulation equation for TI motif is quite similar to presentation given for TD motifs in [20], but has several changes in assumptions and formulations. To derive the CKEs (1) and (4), we propose a few hypotheses.

Hyp. 1: (TDmotifs) The molecules of the miRNA repressor of gene G can strongly bind only at one unique specific site of G molecules, and once a single such strong bind occurs, the corresponding G molecule degrades extremely fast.

Hyp. 2: (TImotifs) Each miRNA M _{ j } in the set r e p(G) of translation inhibitors of G can weakly bind with G but only at specific binding sites constituting a set B I N D _{ j } of size S _{ j }. The sets B I N D _{1},B I N D _{2},… are pairwise disjoint. Once a G molecule thus binds with one inhibitor M _{ j }, then this G molecule will fail to translate.

Hyp. 3: For any given Gmolecule, call X _{ j } the random number of sites in B I N D _{ j } which actually bind with one “ M _{ j }”molecule. We assume that the random variables X _{1},X _{2},…. are independent.
Hyp.1 is based on the fact that only a small fraction of all messenger RNAs have more than a single miRNA binding site and miRNA bound with an mRNA gene G has a limited effect on the mRNA G, but affects more substantially the protein P generated by G ([4, 31].
Consider first any TI motif involving an mRNA gene G generating the protein P, as well as a set r e p(G)={M _{1},M _{2},…M _{ k }} of k translation inhibitors. We now derive the CKE (4) for this TI motif. Call p(t),g(t),m _{ j }(t) the concentrations of P,G,M _{ j } and let H(t) be at time t the percentage of existing G molecules committed to the translation of G into P. The basic CKE driving translation of G into P is then, as seen in [20],
where γ,λ, are the degradation and translation rates of P.
The main point is to compute H(t).
For k=0, there are no translation inhibitors, and hence H(t)=100%. For k=1, let Q[ S] be the set of G molecules with exactly S binding sites bound to M _{1} molecules, where 0≤S≤S _{1}. Let q(S,t) be the concentration of Q[S] molecules. We have the forward and backward reactions
By molecular collision theory [23], the concentration ρ(t) of free M _{1} molecules which bind at time t on Q[ S]molecules to produce Q[ S+1]molecules by forward reaction 7, is proportional to the product of m _{1}(t)q(S,t) by the number (S _{1}S) of vacant binding sites. Hence, for some constant c,
Similarly the concentration τ(t) of M _{1} molecules freed by backward reaction 7 is given by
for some constant $\stackrel{~}{c}$. At chemical equilibrium, we have τ(t)=ρ(t), and hence
where u is a new constant. By recurrence on S, this implies
This entails
The set of Gmolecules committed to translation into P at time t is identical to q(0,t). Thus H(t)=q(0,t)/g(t), and (11) yields $H\left(t\right)=1/{[\phantom{\rule{0.3em}{0ex}}1+u{m}_{1}(t\left)\right]}^{{S}_{1}}$.
Using hypothesis Hyp 3, a recurrence on k extends the argument just given for k=1, to prove that for k≥1, the percentage H(t) of G molecules committed to translation into P is given by:
which proves CKE (4) for TI motifs.
For TD motifs, the cisregulation function F(t) in CKE (1) has been derived in [20]. By using hypothesis 1 and molecular collision theory, we directly deduce that the concentration the concentration of mRNA G degraded by miRNA M is proportional to the product of both concentrations of G and M. Thus we justify the miRNA degradation term v g(t)m(t).
Invariance by affine profile transformations
Affine profile transformations
Call Γ⊂R ^{q} the set of all possible expression level profiles r(t) indexed by time t _{1},…,t _{ q }. To any pair T=(a,b) of real numbers, we associate an affine profile transformation
for all time dates t. Let be the set of all such affine profile transformations.
In microarray data sets, expression levels of genes are recorded via optical analysis of fluorescence intensities, and hence depend strongly on experimental acquisition modalities. So relative expression levels between pairs of recorded chemical species are more meaningful quantities, and graphic displays of microarray data by “heat maps” often involve logarithms of raw data.
Our microarray data record expression levels which as a first approximation can be viewed as unknown affine transformations of concentrations. Since our CKEs (1), (4) were derived for concentrations, we need to check how these CKEs and their parameters change under generic affine profile transformations.
Affine invariance of CKE models
Consider first a TDmotif involving an mRNA gene G, a protein P, an miRNA M, transcription repressive proteins R _{1},R _{2},…, and transcription activating proteins A _{1},A _{2},…. The CKE model (1) links the concentrations g,p,m,r _{ i },a _{ j } of G,M,P,R _{ i },A _{ j }. Let $\u011d,\widehat{p},\widehat{m},{\widehat{r}}_{i},{\xe2}_{j}$ be the corresponding recorded expression profiles. Assume that each such recorded profiles $\widehat{r}$ is linked to the concentration profile 4 by some unknown affine profile transformation T _{ r }.
From CKE (1), one directly deduces that $\u011d,\widehat{p},\widehat{m},{\widehat{r}}_{i},{\xe2}_{j}$ verify a new CKE having an algebraic form completely similar to CKE (1), but where the original parameters β,v,κ,u _{ i },w _{ j } are replaced by new parameters $\widehat{\beta},\widehat{v},\widehat{\kappa},{\xfb}_{i},{\u0175}_{j}$, and where the integers S R _{1},S R _{2},…, and S A _{1},S A _{2},…, remain unchanged.
The new parameters are easily expressed in terms of the original ones and of the coefficients of the affine transformations, but this is irrelevant practically since we will use microarray data to directly compute the new parameters for a CKE of type (1) linking recorded expression levels.
Similar computations for TImotifs show that this affine invariance property also holds for CKE (4). Hence to model a TD or a TI motif, we can fit a CKE model of type (1) or (4) to recorded expression profiles, even though the theoretical model justification involved true concentrations, which are not directly measured by microarrays.
The key assumption is that, for each chemical species C, the expression levels $\u0109\left(t\right)$ of C recorded by microarray are approximately linked to the concentration c(t) of C by some affine relation $\u0109\left(t\right)=a\left(C\right)c\left(t\right)+b\left(C\right)$, where the unknown coefficients a(C) and b(C) may depend on the species C.
The preceding algebraic model invariance under multiple affine profile transformations strongly suggests that adequate distances between dynamic profiles of recorded expression levels should be invariant under affine profiles transformations, as developed in the next section.
Condensation of expression levels profiles
Microarray data typically record several tens of thousands gene expression profiles. So computational costs to fit microarray data to all potential TDmotifs and TImotifs would of course be prohibitive. A natural option to reduce combinatorial explosion is to cluster the observed profiles.
Since our goal is to model expression profiles by ODEs of type (1) and (4), we need to control the diameters of all clusters of expression profiles. This led us to reject hierarchical clustering as well as Kmeans clustering, and to implement in the space of expression profiles a “minimalnet” clustering technique, inspired by an innovative technique for automatic generation of prototypes in shape spaces (see [32]). In view of the preceding section, the diameters of clusters in the space of profiles should be measured by a distance invariant under affine profiles transformations.
Affine invariant distance between profiles
Recall that Γ⊂R ^{q} is the set of all profiles r(t) indexed by time dates t _{1},…,t _{ q }. The mean and variance of a profile r are denoted by
Define normalized profiles n o r(r) and profiles correlations c o r r(r _{1},r _{2}) by
where <·,·> is the usual scalar product in R ^{p}.
We then define a distance D(r _{1},r _{2}) between profiles r _{1} and r _{2} by
For any affine profile transformation T in , one has n o r(r)=n o r(T r), and hence D(r _{1},r _{2})=D(n o r(r _{1}),n o r(r _{2}). Thus for any affine profile transformations T _{1},T _{2}, and any profiles r _{1},r _{2} one has
So the profile distance D is invariant by affine profile transformations
Minimal net clustering
The set Γ of all profiles is now endowed with a distance D invariant by affine profile transformations. Call mPR the large set of miRNA profiles recorded at time t _{1},…,t _{ q } by our microarrays. We fix a maximum radius ε for profiles clusters. We seek to partition mPR into disjoint clusters C L _{1},…,C L _{ r } such that each C L _{ j } has diameter inferior to 2ε, and we also want the number NN of clusters to be as small as possible. We have implemented an iterative algorithm to generate this type of minimal net clustering, in the spirit of [32].
Define a distance function D(x,y), where x,y represent the observations of the data. Denote by D(x,Y) the distance between an observation x and a set Y, where D(x,Y)= miny∈Y D(x,y). Let Let [ x] denote the cluster containing single point x, and Γ be the set of all observations, the minimal net algorithm is as follows:

1.
step 1, let (x _{1},y _{1})=a r g m a x _{ x,y } D(x,y). Then let x _{1} and y _{1} become the representatives of two initial clusters. Let C L _{1}= [ x _{1}],C L _{2}= [ x _{2}], C _{1}={C L _{1},C L _{2}} and R _{1}=Γ∖C _{1} representing the remaining points in Γ excluding the 2 clusters.

2.
After step n1, we obtain C _{ n1}={C L _{1},C L _{2},…C L _{ n }} and R _{ n1}=Γ∖C _{ n1}, where C L _{ j } is a cluster that has single point, j=1,…,n+1.
In step n, let $\mathit{\text{HD}}=\underset{x\in {R}_{n1}}{max}D(x,{C}_{n1})$, representing the maximum distance between observation x in R _{ n1} and set C _{ n1}. If H D>ε, find ${x}_{\mathit{\text{HD}}}={\mathit{\text{argmax}}}_{x\in {R}_{n1}}D(x,{C}_{n1})$. Let C _{ n }={C _{ n1},x _{ H D }}, R _{ n }=Γ∖C _{ n }. Repeat until H D≤ε.

3.
Assume the loop stop at step NN, we have C _{ N N }={C L _{1},C L _{2},…C L _{ N N+1}}. For an observation x, find the point ${y}_{{\mathit{\text{CL}}}_{k}}$, belonging to cluster C L _{ k } in C _{ N N }, that is closest to x, i.e. ${y}_{{\mathit{\text{CL}}}_{k}}={\mathit{\text{argmin}}}_{y\in {C}_{\mathit{\text{NN}}}}\left(D\right(x,{C}_{\mathit{\text{NN}}}\left)\right)$, then assign x to the cluster C L _{ k }.
We apply this minimal net clustering algorithm to the set of all miRNA profiles recorded by our microarrays. We define the cluster diameter d i a m(C L) of cluster CL as the maximum distance of any two observations in the cluster, i.e. d i a m(C L)= maxx,y∈C L D(x,y), where. Compared with the commonly used clustering method such as Kmeans and Hierarchical clustering, the minimal net algorithm allows us to control the diameter of all clusters by setting the threshold ε representing the supremum radius of the cluster, while the Kmeans and hierarchical clustering is often used to determine the number of of clusters but not their diameters. If we select a very small threshold ε, the expression levels of genes in the same cluster can be considered almost identical.
Parameter estimation for the CKE models
Parameters estimation strategy
A generic strategy for parameter estimation in CKEs systems is to minimize a cost function evaluating the discrepancy between model predictions and experimental data. Each one of our CKE models has a single output variable, namely the expression level g(t) of mRNA gene G for a TD motif, and the expression level p(t) of protein P for a TI motif. The output variable g(t) or p(t) of a CKE model can be estimated by a function $\u011d\left(t\right)$ or $\widehat{p}\left(t\right)$ once one knows the profiles of all other molecular species involved in the model. The estimators $\u011d\left(t\right)$ or $\widehat{p}\left(t\right)$ are respectively determined by CKE (1) or CKE (4).
Each CKE models is parameterized by a parameter vector w of dimension n _{ p a r }. The quality of fit of this model with recorded profiles data is quantified by the size E R R(w) of the estimation error defined as follows
where the output variable f(t) is equal to g(t) for TD motifs and to p(t) for TI motifs. The concrete goal is to find the best parameter vector w by minimization of the lack of fit E R R(w) over all possible values of w.
There are no magic solutions for such non linear minimization problems. Moreover fast computing was essential here, since we usually have to solve a very large number of similar “quality of fit maximization” problems when dealing with large microarray datasets.
We have tested several generic cost minimization approaches (see [33–35]) such as “genetic algorithms”, as well as “gradient descent” to minimize a sum of squared modeling residuals. These two techniques turned out to require far too much computing time and were often unreliable due to their high dependence on initialization values.
We hence developed our own fast CKE parametrization algorithms to optimize the quality of fit between CKE models and microarray profiles data. This optimized quality of fit, adequately balanced by a systematic emphasis on parsimoniously parameterized models, becomes an essential clue to decide which potential interactions one should validate between miRNAs, mRNAs, and associated proteins.
Parameters parsimony requirement
Robustness of CKE parametrization is the main motivation for our parameter parsimony requirements. Consider CKE (1) modeling a TDmotif TDM involving n _{ a c t } activators and n _{ r e p } repressors for the transcription of mRNA gene G, and one miRNA M degrading the transcription of G. Then the number n _{ p a r } of unknown parameters is n _{ p a r }=3+2(n _{ a c t }+n _{ r e p }). Each profile g(t),m(t),a _{ i }(t),r _{ j }(t) is recorded at t=t _{1},…,t _{ p }. The CKE outputs for TDM are g(t _{1}),…,g(t _{ p }). Hence each microarray data set provides (p1) equations linking the n _{ p a r } unknown parameters, since the recorded g(t) should be very close to the predicted values $\u011d\left(t\right)$ obtained by solving the ODE (1) with initial value g(t _{1}).
Vapnik’s results on model fitting (see [36]) show that robust accuracy of parameter estimates requires fairly high values of the ratio of (p1)/n _{ p a r }. So n _{ p a r }<<(p1) is a necessary constraint, and we will impose the parameter parsimony requirement n _{ p a r }≤(p1)/4. Indeed when n _{ p a r }>(t _{ p }1), CKE models are overfitted and parameters are poorly estimated.
CKE Parameter estimation
TImotifs: plausible ranges for parameters
Consider a generic TI motif TIM involving an mRNA gene G, its associated protein P, and k translation inhibiting miRNAs [ M _{1},…,M _{ k }]. Let p(t),g(t),m _{ i }(t) be the expression levels of P,G,M _{ i }. We want to model TIM by CKE (4).
Parametrized by the vector
which involves (2k+2)≤8 parameters.
The degradation and translation rates γ and λ of protein P were unknown for most proteins.
According to results from [31], only a small percentage 2% of our 30,000 mRNAs have more than 2 potential binding sites for miRNAs, and only 0.02% mRNAs have as many as 7 such binding sites. So in our parameter estimations it is reasonable to restrict the number of binding sites S _{ j } for miRNA M _{ j } to be at most 5.
TImotifs: optimizing the quality of fit
The inputs of CKE (4) are the initial value p(t _{1}) and the expression levels g(t),m _{1}(t),…,m _{ k }(t) recorded at time dates t _{1},…,t _{ q }.
Discretizing the ODE (4) at time t _{1},…,t _{ q }, we get
where the percentage H(t) is as recalled above. By summation this implies, for j=2,3,.…,q1, the relation
where
In view of equation (14), when all expression profiles involved have been recorded until time t _{ j }, one can predict the still unknown value of p(t _{ j+1}) by the following natural estimator $\widehat{p}\left({t}_{j+1}\right)$,
The quality of fit of this CKE model with recorded profiles data will be quantified by the size E R R(w ) of the estimation error numerically computed as follows
which in view of (15) can be reformulated as
where for j=1,…,q we have set
We seek a parameter vector w minimizing the cost function E R R(w) in the parametric domain defined by
TImotif: parameter estimation algorithm
For each i=1…k, fix an arbitrary integer 1≤S _{ i }≤5. Call ${\stackrel{\u0304}{m}}_{i}$ and ${\stackrel{\u0304}{H}}_{i}$ the respective medians over time t of the functions m _{ i }(t) and H _{ i }(t). Since the function ${H}_{i}\left(t\right)=1/{[\phantom{\rule{0.3em}{0ex}}1+{u}_{i}{m}_{i}(t\left)\right]}^{{S}_{i}}$ is monotonous in m _{ i }(t), we have
The assumption 0<H _{ i }(t)<1 yields $0<{\stackrel{\u0304}{H}}_{i}\left(t\right)<1$. We will discretize the possible values of ${\stackrel{\u0304}{H}}_{i}\left(t\right)$ by constraining them to belong to a grid h _{1},…,h _{ s } of 1≤s≤99 percentage values equally spaced in [ 0,1]. Since ${\stackrel{\u0304}{m}}_{i}$ is known, we invert equation (19) to compute a corresponding grid G R D _{ i } of s potential values for ${u}_{i}=\frac{1}{{\stackrel{\u0304}{m}}_{i}}({\left(\frac{1}{{\stackrel{\u0304}{H}}_{i}}\right)}^{1/{S}_{i}}1)$.
Now for 1≤i≤k, select and fix arbitrary values u _{ i } in the grid G R D _{ i }. After selecting as restrictive above the set of parameter vector U= [ u _{1},S _{1},…u _{ k },S _{ k }], the function H(t) is then completely determined for all t.
Note that the set of all possible such choices for U is of cardinal inferior to N=(5s)^{k}. Since we do explore each one of these possibilities separately, we need to keep the number N at a reasonable level, so we selected s=99 for k=1, s=60 for k=2, s=20 for k=3 etc.
Fix any U in . Since all recorded profiles involved in the TD motif are available at time t _{1},…,t _{ q }, we can use the U value to directly compute all the values H(t _{ j }), and then all the numbers π _{ j },μ _{ j },ν _{ j } defined by (18). We want to find values of the two last parameters γ>0 and λ>0 which will minimize
This problem is equivalent to minimizing the linear objective function:
under the (2q+3) linear inequality constraints
This is a classical constrained linear programming problem, which can be solved by well known fast linear programming algorithms [37], to provide the optimal values γ ^{∗},λ ^{∗},z ^{∗}. Then z ^{∗}=E R R(γ ^{∗},λ ^{∗}) is the minimal value of E R R(γ,λ).
These optimal values are functions γ ^{∗}(U),λ ^{∗}(U),z ^{∗}(U) of the partial vector of parameters U in . We then select the optimal U ^{∗} in as the value of U which minimizes z ^{∗}(U) over . The optimal parametrization w ^{∗} of our TD motif is then given by w ^{∗}= [ γ ^{∗}(U ^{∗}),λ ^{∗}(U ^{∗}),U ^{∗}]. This new parametrization algorithm is fairly fast and has good accuracy. On a current laptop PC, our nonoptimized MATLAB code implementation required less than 5 minutes of CPU time for the parametrization of a typical TImodel with 38 time points and 9 parameters [38]. After code optimization and a reimplementation in C, we expect this CPU time to be reduced to 2 minutes. The algorithm does not require any knowledge of the parameters ranges except for the number of binding sites S, which is an advantage for the range of reaction rates of of many molecules of interest are usually unknown.
TDmotifs: parameter estimation algorithm
The parametrization algorithms just presented also apply to TD models, as we now sketch. The output variable is now the expression level g(t) of mRNA gene G.
Discretize the ODE (1) at time dates t _{1},…,t _{ q }, to get
where F(t) is defined in (1). By summation this implies the relations
where
When all expression profiles involved have been recorded until time t _{ j }, one can predict the unknown value of g(t _{ j+1}) by the estimator $\u011d\left({t}_{j+1}\right)$,
The quality of fit of this CKE model with the recorded profiles data is quantified by the size of the prediction error which is a function E R R(w) of the parameter vector w
so that
where we have set
The TD model parameters S R _{ i }>0,u _{ i }>0 and S A _{ j }>0,w _{ j }>0 are the number of binding sites and the affinity constants for the transcriptional factors R _{ i } and A _{ j }. They constitute a partial parameter vector U which we restrict as above by first imposing a moderate upper bound S _{ m a x } on all the integers S R _{ i },S A _{ j }. and by selecting, also as done above, adequate finite grids for the values of the u _{ i } and the w _{ j }. This constrains U to belong to a finite set . The cardinal N of is forced to remain at most of order 10^{5}, by adequate constraints on S _{ m a x } and on the coarseness of the u _{ i } grids and the v _{ j } grids.
To minimize E R R(w), we fix as above an arbitrary U in . We can then compute all the numbers τ _{ j },ρ _{ j },η _{ j },θ _{ j }. Since U is fixed, the error size E R R(w) becomes a function E(β,v,κ) of the last 3 positive parameters (β,v,κ), still given by equation (23). As above the minimization of E(β,v,κ) is equivalent to a constraint linear programming problem where we want to minimize the linear function Φ(β,v,κ,z)=z over the following set of (2q+4) linear inequalities
Solving this constraint linear programming problem generates optimal parameters (β ^{∗},v ^{∗},κ ^{∗}) and a minimal error z ^{∗}, which are all functions of U. One concludes as above by selecting an optimal U ^{∗} minimizing z ^{∗}(U) over all U in .
Quality of fit for CKE models
Consider any TD motif or TI motif . We have seen how to compute a parameter vector w ^{∗} optimizing the quality of fit between microarray data and our CKE model for . This was done by minimizing $\mathit{\text{ERR}}\left(\mathbf{w}\right)=\underset{t}{max}\phantom{\rule{0.3em}{0ex}}\left\phantom{\rule{0.3em}{0ex}}f\right(t)\widehat{f}(t\left)\phantom{\rule{0.3em}{0ex}}\right$, where f(t) is the main output variable of , and $\widehat{f}\left(t\right)$ is the estimation of f(t) based on the CKE parametrized by w.
For the optimal CKE parametrization w ^{∗}, the lack of fit to data can then be evaluated by E R R(w ^{∗}). However we have seen in section ‘Invariance by affine profile transformations’ that when comparing two expression profiles recorded by microarray, natural distances between profiles should be roughly invariant by changes of scale for these profiles. It would then be tempting to replace the absolute error of estimation $\left\phantom{\rule{0.3em}{0ex}}f\right(t)\widehat{f}(t\left)\phantom{\rule{0.3em}{0ex}}\right$ at time t by the relative error of estimation $\frac{\leftf\right(t)\widehat{f}(t\left)\right}{f\left(t\right)}$. But relative errors become quite large whenever the output profile f((t) is close to zero. To avoid such spuriously large error values, while still preserving scale invariance whenever f(t) is not close to zero, we define the Smoothed Relative Error of estimation S R E(t) at time t by the following formula, where $\stackrel{\u0304}{f}$ denotes the mean value of the profile f,
We finally quantify the Modeling Error MODER for the optimally parametrized CKE model of motif by
Results and discussion
Examples of application
We implemented our model of TI on microarray data of mouse stem cells undergoing RAinduced differentiation, as provided by LC Science Inc, and previously analyzed by classical techniques in [18]. We took the recorded expression profiles for proteins/mRNAs GCNF, Oct4, Nanog and Sox2 at time points (0, 1.5, 3, 6)/(0, 3, 6) and expression levels for 266 miRNAs on days 0, 1, 3, 6 from [38] during ES cell differentiation. These profile data were interpolated at 19 intermediary time points, by Piecewise Cubic Hermite Interpolation (PCHIP) and the number of parameters were limited to be 4, i.e. only 1 upstream miRNA was selected for the model, to satisfy the parameter parsimony requirement.
For miRNA M _{ i }, the following linear transformation, which could be viewed as normalization, was done:
where $\sigma \left({m}_{i}\right)=\sqrt{\sum _{t}{\left({m}_{i}\right(t)\stackrel{\u0304}{{m}_{i}}(t\left)\right)}^{2}}$, t=0,1/3,…,6. Since $\parallel \frac{{m}_{i}\left(t\right)\stackrel{\u0304}{{m}_{i}}\left(t\right)}{\sigma \left({m}_{i}\right)}\parallel \le 1$, $\widehat{{m}_{i}}\left(t\right)$ is positive for t=0,1/3,…,6. Taking ε=0.15, we applied Minimal Net clustering (the MATLAB code can be downloaded through Additional file 1) to the transformed data of miRNAs, $\widehat{{m}_{i}}\left(t\right),i=1,2,\dots ,266$, and obtained 107 clusters, in which the maximum cluster contains 14 miRNAs. We consider that the miRNAs belonging to the same cluster share the same normalized expression level within a negligibly small error. For each cluster C L _{ j }, j=1,…,107, we determine the miRNA M C _{ j } which is the representative of the cluster C L _{ j } for the distance D, and we let m c _{ j }(t) be the expression level of M C _{ j } at time t. We call m c _{ j }(t) is the expression level of cluster C L _{ j }.
We successively implemented the TI model with only one repressor miRNA (the MATLAB code can be downloaded through Additional file 2). After parametric modeling of our preselected 107 TI motifs, and evaluation of their quality of fit, we have validated only 3 clusters of miRNAs as translational inhibitors repressing protein Oct4. If an miRNA in C L _{ j } is validated by modeling as a potential repressor, all other miRNAs belonging to C L _{ j } are also potential repressors and can be validated numerically as well by the form invariability of the model under affine transformation. Here we present one of these three validated miRNAs of Oct4 (see Figure 2), where the centroid of the cluster is miRNA mmumiR10a, while the other 6 miRNAs in the cluster are mmumiR203, 330, 342, 470 and 99b. We used TarBase [39] to search all the experimentally validated miRNAs targeting Oct4 (Pou5f1) in Mus musculus. By TarBase, only mmumiR470 has been experimentally validated and it is also numerically validated by our TI model.
For protein GCNF, only two miRNAs, mmulet7b and mmulet181a, have been experimentally validated by TarBase and both of them belong to the list of the 20 miRNAs numerically validated by our TI modeling (we presented the validated cluster containing mmulet7b in Figure 3). Our modeling approach did not validate any miRNA repressing both proteins Nanog and Sox2, while there are 3 miRNAs and 1 miRNA experimentally validated as separate repressors of Nanog and Sox2 respectively.
We here also present one example of TD motif for downstream factor mRNA Sox2 (Figure 4). With the assumption that the transcription factors are proteins Oct4 and Nanog [38], we validated cluster mmumiR134, 30a3p, 30b, 335, 431, 4333p, 4343p, and 487b as degraders. Although (mmumiR134, Sox2) is also an experimentally tested pair, we will not discuss deep in detail the validation results of the TD motifs in this paper. The main reason is that the validation results of TD motifs depends much on our knowledge of the transcription factors. More discussion is in the subsection below.
Discussion
In [38], we preselected the potential miRNAs for each gene/protein by TargetScan 5.0 and miRanda before applying the two models. The preselection of miRNA candidates were not necessary though it greatly reduced the computational cost. However, the application of the CKE modeling was dependant on the target prediction algorithm, such as TargetScan or miRanda. Therefore, we introduced Minimal Net Clustering in this paper so that the data was condensed and the computational cost could be reduced by a purely numerical method without biological bias.
Since the results of TD model depend on information of transcription factors, the modeling validates not only miRNAs acting as mRNA degraders but also upstream transcription factors simultaneously. In [38] we numerically validated proteins GCNF, Oct4 and Nanog as transcription factors for mRNAs Oct4 and Nanog, while Marson et al. and Boyer et al. [40, 41] claimed that Oct4 and Sox2 are bounded and act together with Nanog as transcription factors. Considering the impact of the transcription factors, the TD modeling may be less convincing unless the transcription factors are fixed as experimentally validated. In this paper we presented more details on the implementation and validation results of the TI model in order to focus on the validation of miRNAs and avoid the influence of assumptions of transcription factors.
TarBase shows that, for protein GCNF the experimentally validated miRNAs are mmulet7b and mmumiR181a; for protein Oct4 the experimentally validated miRNA is mmumiR470; for protein Nanog the experimentally validated miRNAs are mmumiR134, mmumiR470, and mmumiR296; for protein Sox2 the experimentally validated miRNA is mmumiR134. In this paper, we have validated by TI modeling and minimal net clustering all the experimentally tested miRNA repressors of GCNF and Oct4. In our previous work [38], actually none of these miRNA repressors had yet been studied for modeling for the four proteins because of the restriction of preselection. And only the pair (mmumiR181a, GCNF) was validated by the classical correlation analysis done in Gu et al. [18] for the 4 proteins GCNF, Oct4, Nanog, Sox2. Therefore, the TI modeling combined with data condensation not only reduced computational cost but also clearly extended the set of miRNA inhibitors validated by model fitting to microarray data.
Since each numerically validated miRNA cluster may contain two or more miRNAs, the miRNAs in the same cluster could also be considered as potential candidate inhibitors for further experiments to validate. For instance, as Figure 2 shows, mmumiR203, 330, 342, 10a and 99b are potential candidates for protein Oct4 for they are in the same cluster as mmumiR470, which is validated by both the numerical modeling and experiments.
After the pair (mmumiR181a, GCNF) was well validated by our CKE model, we found that the cluster (see Figure 5, top left) containing mmumiR181a also includes mmumiR103 and mmumiR107, which are two known miRNAs that have the same roles in regulating insulin sensitivity and promoting metastasis of colorectal cancer [42, 43]. We also checked that miR103 and miR107 have almost the same mature sequences:

mmumiR103: AGCAGCAUUGUACAGGGCUAUGA

mmumiR107: AGCAGCAUUGUACAGGGCUAUCA
After the pair (mmulet7b, GCNF) was well validated by our CKE model, we observed mmulet7g, and mmulet7i are in the same cluster as mmulet7b (see Figure 5, top right). It was claimed that let7b and 7g reduce tumor growth in mouse models of lung cancer [44]. We then checked that indeed these three miRNAs have very similar mature sequences, namely

mmulet7b: UGAGGUAGUAGGUUGUGUGGUU

mmulet7g: UGAGGUAGUAGUUUGUACAGU

mmulet7i: UGAGGUAGUAGUUUGUGCUGU
To evaluate the correlation between mature sequence and expression profile of our set of 266 miRNAs, we systematically explored all the 266×265/2=35,245 pairs (m i r _{ i },m i r _{ j }) of distinct miRNAs in this set, i≠j,i,j=1,…,35245. For each such pair we then computed the Euclidean distance between the expression level profiles (expression distance in short) and the NeedlemanWunsch alignment score (NWA) between the mature sequences of each miRNA pair. We then divided the 35,245 miRNA pairs into two groups: GroupHigh includes the pairs with high alignment score (N W A>10), i.e. GroupHigh includes miRNA pairs with similar mature sequences, and GroupLow includes the pairs with low alignment score (N W A≤10). We then compared the distribution of expression distances for all miRNA pairs in GroupHigh with the distribution of these distances for miRNA pairs in GroupLow. As seen in Figure 5 (bottom left), the quantiles of these distances in GroupLow are consistently larger than the corresponding quantiles in GroupHigh. This is fully confirmed by KolmogorovSmirnov test which yielded the very significant pvalue 7×10^{32}. The result still holds when we change the alignment score threshold N W A=10 used to define GroupHigh and GroupLow (see Figure 5, bottom right, where the NWA threshold is now N W A=15). We conclude that miRNAs having similar mature sequences tend, with high probability, to have similar expression levels. The above analysis and examples indicate that miRNAs belonging to the same cluster are good candidates to have similar mature sequence. Since the match between miRNA mature sequence and target sites is the main determinant for miRNA targets, miRNAs belonging to same cluster may hence also have similar regulatory roles.
Robustness of parameter estimation
Estimation algorithms for nonlinear models may yield parameter estimates that are dependent on the particular set of data or on initial estimates of parameters. Since our parameter estimation algorithm is independent from initial estimates of parameters, we now focus on on the measurements errors affecting microarray data and on their impact for parameter estimation. Considering that the noises of the microarray data are not negligible, we have analyzed the robustness of our parameters estimators when one perturbs randomly the observed expression levels of miRNAs. We have selected an arbitrary validated model MD, where the corresponding downstream factor is denoted as D, and has expression levels d(t). Denote miRNAs M _{1},…,M _{ j } pertaining to model MD and call their expression levels m _{1}(t),…,m _{ j }(t). Then we have perturbed the expression levels of the miRNAs by independent random noises having the recorded standard deviation σ _{1}(t),…,σ _{ j }(t) and obtained simulated expression levels s m _{1}(t),…,s m _{ j }(t). After injecting the perturbed expression levels s m _{1}(t),…,s m _{ j }(t) into the model MD and get the predicted expression level p d(t) of D. With p d(t),m _{1}(t),…,m _{ j }(t) and expression levels of other upstream factors, we then applied our parameter estimation algorithm to reestimate the model parameters. This procedure was repeated 100 times. Then for each model parameter, we plotted the histograms of those 100 reestimated parameter values and compared them with the parameter values estimated from unperturbed data of model MD. This analysis showed that our parameter estimation algorithm is quite robust. Here we present the histograms of perturbed estimates of model parameters for one TD motif (Figure 6) and one TI motif (Figure 7).
Conclusion
We have separately modeled by chemical kinetics equations the 2 distinct modalities of the repressive actions of miRNAs on posttranscriptional processes of mRNA genes and the associated proteins. This was achieved by first defining the formal structure of two types of interaction architectures (Transcription Degradation motifs and Translation Inhibition motifs) linking miRNAs to subgroups of mRNA genes. The plausibility of each one of these potential TD motifs or TImotifs was then evaluated by computerized parametric modeling, based on microarray data, of adequate formal chemical kinetics equations (CKEs).
We have sketched the formal derivation of 2 specific CKEs modeling by dynamic ODEs the interactions between concentrations of different species of molecules involved in each architecture. This led to a motif validation strategy based on the quantified quality of fit between our optimally parametrized models and the corresponding microarray data.
Our computerized parameter estimation is implemented by an innovative fast algorithm that does not require knowledge of range of molecular reaction rates. On a current standard laptop PC, our implementation of parameter estimation for a typical 9parameters CKE model requires about 5 minutes of computing time.
Our parameter estimation algorithm also provides relatively highquality optimization for the fit between model and microarray data, by integrating both global and local cost minimization techniques, in contexts where plausible ranges of values for most of the unknown parameters are not available in the literature. By perturbing the expression levels of miRNAs and reestimating the parameters, we showed that our parameter algorithm has a satisfactory level of robustness. We believe that our parameter estimation technique with associated evaluation of quality of fit would be quite applicable as a generic algorithm to similar problems in chemical kinetics modeling of molecular interactions.
Modeling very large microarray data is computationally quite expensive. We have hence sketched clustering methods to condense large microarray data. This approach has of course been attempted before our work, but the main point is that we have carefully studied the mathematical compatibility of our CKE models with condensation of the profiles data. Since we have proved that the abstract form of our CKE models is invariant by arbitrary multiple affine transformations of profiles data, we have made sure to constrain the distance of two expression levels profiles to be invariant by these types of affine transformations.
We have implemented a Minimal Net Clustering algorithm based on this distance, which allows us to control the radius of the clusters. The number of CKEs to parameterize can be strongly reduced after condensation of the large data sets, and the affine invariance of our CKEs show that the condensed genes network can then still be modeled by similar CKEs.
By applying our TI modeling to multiple proteins such as GCNF, Oct4, Nanog, Sox2, we showed that 3 miRNAtarget pairs experimentally validated can be also validated by the TI model.
References
 1.
Ambros V:The functions of animal microRNAs. Nature. 2004, 431 (7006): 350355. 10.1038/nature02871.
 2.
Aravin A, Tuschl T:Identification and characterization of small RNAs involved in RNA silencing. FEBS Lett. 2005, 579 (26): 58305840. 10.1016/j.febslet.2005.08.009.
 3.
Bartel DP:MicroRNAs: target recognition and regulatory functions. Cell. 2009, 136 (2): 215233. 10.1016/j.cell.2009.01.002.
 4.
Tay Y, Zhang J, Thomson AM, Lim B, Rigoutsos I:MicroRNAs to Nanog, Oct4 and Sox2 coding regions modulate embryonic stem cell differentiation. Nature. 2008, 455: 11241128. 10.1038/nature07299.
 5.
Wang XJ, Reyes JL, Chua NH, Gaasterland T:Prediction and identification of arabidopsis thaliana microRNAs and their mRNA targets. Genome Biol. 2004, 5 (9): R6510.1186/gb200459r65.
 6.
Kawasaki H, Taira K:MicroRNA196 inhibits HOXB8 expression in myeloid differentiation of HL60 cells. Nucleic Acids Symp Ser. 2004, 48 (48): 211212.
 7.
Moxon S, Jing R, Szittya G, Schwach F, RPR L, Moulton V, Dalmay T:Deep sequencing of tomato short RNAs identifies microRNAs targeting genes involved in fruit ripening. Genome Res. 2008, 18 (10): 16021609. 10.1101/gr.080127.108.
 8.
Williams AE:Functional aspects of animal microRNAs. Cell Mol Life Sci. 2008, 65 (4): 545562. 10.1007/s0001800773559.
 9.
Mazière P, Enright AJ:Prediction of microRNA targets. Drug Discov Today. 2007, 12 (11–12): 452458.
 10.
Lewis BP, Shih IH, JonesRhoades MW, Bartel DP, Burge C:Prediction of mammalian microRNA targets. Cell. 2003, 115: 787798. 10.1016/S00928674(03)010183.
 11.
Lewis BP, Burge CB, Bartel D:Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005, 120: 1520. 10.1016/j.cell.2004.12.035.
 12.
Brennecke J, Stark A, Russell RB, Cohen S:Principles of microRNAtarget recognition. PLoS Biol. 2005, 3: e8510.1371/journal.pbio.0030085.
 13.
Krek A, Grun D, Poy MN, Wolf R, Rosenberg L, Epstein E, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, Rajewsky N:Combinatorial microRNA target predictions. Nat Genet. 2005, 37: 495500. 10.1038/ng1536.
 14.
Chi SW, Hannon GJ, Darnell RB:An alternative mode of microRNA target recognition. Nat Struct Mol Biol. 2012, 19 (3): 321327. 10.1038/nsmb.2230.
 15.
Baek D, Villé J, Shin C, Camargo FD, Gygi SP, Bartel DP:The impact of microRNAs on protein output. Nature. 2008, 455: 6471. 10.1038/nature07242.
 16.
Selbach M, Schwanhäusser B, Thierfelder N, Fang Z, Khanin R, Rajewsky N:Widespread changes in protein synthesis induced by microRNAs. Nature. 2008, 455: 5863. 10.1038/nature07228.
 17.
Mourelatos Z:Small RNAs: The seeds of silence. Nature. 2008, 455: 4445. 10.1038/455044a.
 18.
Gu P, Reid JG, Gao X, Shaw CA, Creighton C, Tran PL, Zhou X, Drabek RB, Steffen DL, Hoang DM, Weiss MK, Naghavi AO, Eldaye J, Khan MF, Legge GB, Wheeler DA, Gibbs RA, Miller JN, Cooney AJ, Gunaratne PH:Novel miRNA candidates and miRNAmRNA pairs in ES cells. PLoS ONE. 2008, 3 (7): e254810.1371/journal.pone.0002548.
 19.
de Jong H:Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol. 2002, 9 (1): 67103. 10.1089/10665270252833208.
 20.
Goutsias J, Kim S:A nonlinear discrete dynamical model for transcriptional regulation: construction and properties. Biophys J. 2004, 86 (4): 19221945. 10.1016/S00063495(04)742575.
 21.
CornishBowden A: Fundamentals of Enzyme Kinetics, 3rd Edition. 2004, London: Portland Press
 22.
Goutsias J, Lee NH:Computational and experimental approaches for modeling gene regulatory networks. Curr Pharm Des. 2007, 13: 14151436. 10.2174/138161207780765945.
 23.
Moore JW, Pearson RG: Kinetics and Mechanism, 3rd Edition. 1981, New York: Wiley
 24.
Slonim DK, Yanai I:Getting started in gene expression microarray analysis. PLoS Comput. Biol. 2009, 5 (10): e100054310.1371/journal.pcbi.1000543.
 25.
Engl HW, Flamm C, Kügler P, Lu J, Müller S, Schuster P:Inverse problems in systems biology. IOP Sci. 2009, 25: 123014
 26.
Arkin A, Ross J, McAdams HH:Stochastic kinetic analysis of developmental pathway bifurcation in phage linfected escherichia coli cells. Genetics. 1998, 149: 16331648.
 27.
Wang Y, Liu CL, Storey JD, Tibshirani RJ, Herschlag D, Brown PO:Precision and functional specificity in mRNA decay. Proc Natl Acad Sci. 2002, 99 (9): 58605865. 10.1073/pnas.092538799.
 28.
Meir E, Munro EM, Odell GM, von Dassow G:Ingeneue: a versatile tool for reconstituting genetic networks, with examples from the segment polarity network. J Exp Zool. 2002, 294: 216251. 10.1002/jez.10187.
 29.
Müller S, Hofbauer J, Endler L, Flamm C, Widder S, Schuster P:A generalized model of the repressilator. J Math Biol. 2006, 53: 905937. 10.1007/s0028500600359.
 30.
Widder S, Schicho J, Schuster P:Dynamic patterns of gene regulation I: simple twogene systems. J Theor Biol. 2007, 246: 395419. 10.1016/j.jtbi.2007.01.004.
 31.
Hornstein E, Shomron N:Canalization of development by microRNAs. Nat Genet. 2006, 38: S20S24. 10.1038/ng1803.
 32.
Azencott R, Coldefy F, Younes L:A distance for elastic matching in object recognition. Pattern Recognition, Proceedings of 13th Int. Conf. ICPR 96, vol. 1. 1996, IEEE, 687691.
 33.
Periwal V, Chow CC, Bergman RN, Ricks M, Vega GL, Sumner AE:Evaluation of quantitative models of the effect of insulin on lipolysis and glucose disposal. Am J Physiol Regul Integr Comp Physiol. 2008, 295: R1089R1096. 10.1152/ajpregu.90426.2008.
 34.
Gregory PC: Bayesian Logical Data Analysis for the Physical Sciences: A Comparative Approach With Mathematical Support. 2005, London: Cambridge University Press
 35.
Küegler P, Gaubitzer E, Müller S:Parameter identification for chemical reaction systems using sparsity enforcing regularization: a case study for the ChloriteIodide Reaction. J Phys Chem A. 2009, 113: 27752785. 10.1021/jp808792u.
 36.
Vapnik V: Statistical Learning Theory. 1998, New York: Wiley
 37.
Boyd S, Vandenberghe L: Convex Optimization. 2004, Cambridge: Cambridge University Press
 38.
Luo Z, Xu X, Gu P, Lonard D, Gunaratne P, Cooney AJ, Azencott R:Regulatory circuits of miRNAs in ES cell differentiation: a chemical kinetics modeling approach. PLoS One. 2011, 6 (10): e2326310.1371/journal.pone.0023263.
 39.
Vergoulis TI, Vlachos P, Alexiou G, Georgakilas M, Maragkakis M, Reczko M, Gerangelos S, Koziris N, Dalamagas T, Hatzigeorgiou A:Tarbase 6.0: capturing the exponential growth of miRNA targets with experimental support. Nucl Acids Res. 2012, 40 (D1): D222D229. 10.1093/nar/gkr1161.
 40.
Marson A, Levine SS, Cole MF, Frampton GM, Brambrink T, Johnstone S, Guenther MG, Johnston WK, Wernig M, Newman J, Calabrese JM, Dennis LM, Volkert TL, Gupta S, Love J, Hannett N, Sharp PA, Bartel DP, Jaenisch R, YR A:Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells. Cell. 2008, 10: 1016
 41.
Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG, Kumar RM, Murray HL, Jenner RG, Gifford DK, Melton DA, Jaenisch R, Young RA:Core transcriptional regulatory circuitry in human embryonic stem cells. Cell. 2005, 122 (6): 947956. 10.1016/j.cell.2005.08.020.
 42.
Trajkovski M, Hausser J, Soutschek J, Bhat B, Akin A, Zavolan M, Heim MH, Stoffel M:MicroRNAs 103 and 107 regulate insulin sensitivity. Nature. 2011, 474 (7353): 649653. 10.1038/nature10112.
 43.
Chen HY, Lin YM, Chung HC, Lang YD, Lin CJ, Huang J, Wang WC, Lin FM, Chen Z, Huang HD, Shyy JY, Liang JT, Chen RH:miR103/107 promote metastasis of colorectal cancer by targeting the metastasis suppressors DAPK and KLF4. Cancer Res. 2012, 72 (14): 36313641. 10.1158/00085472.CAN120667.
 44.
EsquelaKerscher A, Trang P, Wiggins JF, Patrawala L, Cheng A, Ford L, Weidhaas JB, Brown D, Bader AG, Slack FJ:The let7 microRNA reduces tumor growth in mouse models of lung cancer. Cell Cycle. 2008, 7 (6): 759764. 10.4161/cc.7.6.5834.
Acknowledgements
We would like to thank Drs P. Gunaratne, A. J. Cooney, D. Lonard, X. Xu for multiple discussions concerning their experimental data and results on miRNAmRNA interactions. These discussions led us to a joint publication.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
ZL developed the chemical kinetics equations and the parameter estimation algorithm; RA proposed the transformation invariance of chemical kinetics equations and data condensation method; YZ did the computation work. ZL and RA wrote the paper. All authors read and approved the final manuscript.
Electronic supplementary material
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Received
Accepted
Published
DOI
Keywords
 miRNA
 Chemical kinetics modeling
 Minimal net clustering