Problem statement
A microarray experiment only measures the "observed" quantities, as shown in Figure 8, whereas the other quantities are not observed ("hidden"). The dashed oval encloses the closest quantities on the path between the TF and the target gene. Consider a transcriptional network of n genes that are regulated by m regulators, as well as a timedependent external signal. Given the structure G and a set X of transcription rates of these genes in T time points, is it possible to reconstruct the timevarying activity level of m regulators, R, at all time points and the corresponding model parameters? This is an infinite dimensional problem that we tackle by placing a stochastic process prior over the activities of regulators.
Our approach relies on a continuous time, differential equation description of transcriptional dynamics where TFs are treated as latent on/off variables and are modelled using a switching stochastic process. The framework of the proposed method is shown in the Figure 9.
Kinematic model of regulation
Compared with the gene expression level, the gene transcription rate can capture more dynamic characteristics of transcription regulation. We here employ the transcription rate to model the regulation function. We first assume:

The derived transcription rates are average rates over a cell population.

The speed of a TF's binding to or dissociation from its target sites is assumed to be much more rapid than the transcription process, thus rapidequilibrium approximation can be used.
Based on the above assumptions, the transcription rate of a gene is proportional to the amount of the gene bound by its regulators in all genes of the measured cell population. We first consider the case that a gene is regulated by a single activator. The corresponding regulation function can be properly described by MichaelisMenten equation:
\frac{dx}{dt}=\beta \frac{r\left(t\right)}{d+r\left(t\right)}+c\omega x,
(1)
here x represents the mRNA concentration for a particular gene, r(t) the concentration of active TF, β and d are kinetic constants, c a baseline expression rate and ω the mRNA decay rate.
To incorporate the sequence feature and the TF binding preference into the model, we set the binding affinity φ = k/d, and (1) can be reformulated as
\frac{dx}{dt}=\beta \frac{k\phi r\left(t\right)}{1+k\phi r\left(t\right)}+c\omega x,
(2)
here k is a scaling parameter.
We now take the regulation involving two regulators into account. Denote by r_{1}(t) and r_{2}(t) the concentration of two regulators, φ_{1} and φ_{2} the binding affinity of two regulators from their own target sites, the regulation function can be written as below:
\frac{dx}{dt}=\beta \frac{{k}_{1}{\phi}_{1}r{\left(t\right)}_{1}+{k}_{2}{\phi}_{2}{r}_{2}\left(t\right)+{k}_{3}{\phi}_{1}{\phi}_{2}{r}_{1}\left(t\right){r}_{2}\left(t\right)}{\left(1+{\phi}_{1}{r}_{1}\left(t\right)\right)\left(1+{\phi}_{2}{r}_{2}\left(t\right)\right)}+c\omega x,
(3)
Considering the general case, a gene is regulated by n regulators. There are 2^{n} different binding states in total. The ndimension binary vector is employed to indicate a certain binding state, e.g., a 4dimension vector (0 1 0 1) indicates that the second and the fourth regulators are bound to their own target sites while the first and the third are not bound. The regulation function can be written as:
\frac{d{x}_{j}}{dt}={\beta}_{j}\frac{{\sum}_{s\in {S}_{j}}{k}_{s}{\prod}_{{s}_{i}=1;i=1,n}{\phi}_{ij}{r}_{i}\left(t\right)}{{\prod}_{i=1}^{n}\left(1+{\phi}_{ij}{r}_{i}\left(t\right)\right)}+{c}_{j}{\omega}_{j}{x}_{j}
(4)
where S_{j} denotes the set of all 2^{n} possible state vectors, and s_{i} is the i_{th} element of the state vector s.
Modelling for binding affinity
Measuring affinities of molecular interactions in highthroughput format remains problematic, especially for transient and lowaffinity interactions. We here try to describe the landscape of binding affinity in the perspective of binding energy between the various DNAbinding molecules and their target genes. Binding affinity landscapes describe how each molecule translates an input DNA sequence into a binding potential that is specific to that molecule. The presented framework models several important aspects of the binding process.
By allowing molecules to bind anywhere along the input sequence, the entire range of affinities is considered, thereby allowing contributions from both strong and weak binding sites [26, 27].

Conventional cooperative binding interactions can be explicitly modelled by assigning higher statistical weights to configurations in which two molecules are bound in close proximity.

The cooperativity that arises between factors when both nucleosomes and transcription factors are integrated is captured automatically [28].
We first consider the simplest case that there is only one target site S_{ij} for TF i in the promoter of gene j:
T{F}_{i}+{S}_{ij}\underset{{k}_{d}}{\stackrel{{k}_{b}}{\leftrightarrow}}\left[T{F}_{i}\cdot {S}_{ij}\right]
The sitespecific binding affinity is given by
\phi ={C}_{i}{e}^{\frac{{E}_{ij}}{kT}}
(5)
where C_{i} is a constant, E_{ij} the binding free energy between TF_{i} and the promoter of gene j, k and T are the Boltzmann constant and temperature, respectively.
The above case can be expanded to the general case that binding may happen in anywhere along the input sequence and the accessibility of target sites varies due to the occupancy of nucleosomes. The general binding affinity is modelled as
{\phi}_{ij}={C}_{i}\sum _{n=1}^{N}{{p}_{ij}}^{\left(n\right)}{e}^{\frac{{{E}_{ij}}^{\left(n\right)}}{kT}}
(6)
where p^{(n)}_{
ij
} is the probability of transcription factor i binding to the nth binding site in the promoter of gene j. Note that the derivation of p^{(n)}_{
ij
} involves the information of the possible occupancy of nucleosomes. The nucleosomes positions can be predicted based on the nucleosomeDNA interaction model proposed by Kaplan et al [29]. Figure 10(b) shows the occupancy of nucleosomes for the genomic sequence shown in the Figure 10(a).
Since the positional weight matrices (PWM) are often used to represent the sequence motif [30, 31], we estimate the binding energy in perspective of PWM. As the background information has been taken into the PWM, the binding free energy can be approximately calculated as below:
{{E}_{ij}}^{\left(q\right)}={K}^{\left(q\right)}\sum _{l=1}^{L}\sum _{n=\left\{\left(\right)close="\}">A,C,G,T\right.}\n \n \n \n \n J\n \n \n n\n \n \n \n \n l\n \n \n \n \n \n \n \n \n M\n \n \n L\n \n \n \n \n *\n \n \n \n \n \n M\n \n \n n\n l\n \n \n \n \n
where {{J}^{n}}_{l}=\left\{\begin{array}{c}1\mathsf{\text{}}if\mathsf{\text{}}n=s\left(l\right)\\ 0\mathsf{\text{otherwise}}\end{array}\right.
here K^{(q)} is the scaling factor, M*_{L} indicates the maximum background frequency in the motif, s(l) is the nucleotide in position l.
Regulatory network modelling using dynamic Bayesian inference
In many biological processes, the transcription factor transit from inactive to active state quickly as a consequence of fast posttranslational modifications, (the time scale is micro second), so it is reasonable that we model the TF activity as a binary variable r(t) ∈{0,1}[24, 32].
For the regulation involving a single regulator, the TF activity can be seen as a two states Markov Jump Process. Based on Ref [24, 33], the probability of the system being in a particular state at a given time is given by the following Master equation:
\frac{d{p}_{1}\left(t\right)}{dt}={n}_{+}{p}_{0}\left(t\right){n}_{}{p}_{1}\left(t\right)
(7)
\frac{d{p}_{0}\left(t\right)}{dt}={n}_{+}{p}_{1}\left(t\right){n}_{}{p}_{0}\left(t\right)
(8)
here p_{1}(t) = p(r(t) = 1), p_{0}(t) = p(r(t) = 0) and n_{±} indicates the transition rate.
The observed expression data is often assumed to be normally distributed [24]. We now define a noise model that relates the predicted mRNA concentration to the observed expression data, as shown in Figure 11.
Setting y_{j}(t) as the observations of mRNA species j at time t, x_{j}(t) the predicted expression and σ_{j} the variation, the noise model can be described as
{y}_{j}\left(t\right)\leftr\left(t\right)~\right.N\left({x}_{j}\left(t\right),\mathsf{\text{}}{{\sigma}^{2}}_{\mathsf{\text{j}}}\right)
Based on Refs [24, 33], we define the TF's switching stochastic process as the prior distribution, then we combine the prior distribution and the noise model (likelihood) into Bayes' theorem to obtain the posterior over the process:
p\left(r\lefty,\mathsf{\text{}}\Omega \right.\right)=\frac{1}{S}p\left(y\leftr,\right.\mathsf{\text{}}\Omega \right)p\left(r\right)
where y denotes collectively the observations, Ω are the parameters involved in the regulation function and S a normalization constant.
Variational inference and model optimization
We will use a variational formulation of the inference problem [34]. Variational inference is a powerful inference method and it has been well applied for optimization by Opper and Sanguinetti [24, 33]. Our model optimization is based on Ref [22]. Variational inference is used as an approximation technique: given an intractable probability distribution p, the variational approach finds an optimal approximation q within a certain family of distributions. This is usually done by minimizing the KullbackLeibler (KL) divergence between the two distributions
KL\left[q\u2225p\right]={E}_{q}\left[\mathsf{\text{log}}\frac{q}{p}\right]=\int \mathsf{\text{log}}\frac{q\left(r\right)}{p\left(r\right)}q\left(r\right)dr
(9)
By selecting a suitable family of approximating distributions, the inference problem is then turned into an optimization problem. It can be shown that the KL divergence is a convex functional of q and is equal to zero iff q = p [24, 35]. In this case, we will choose the approximating process q to be again a Markov Jump Process, so that the required KL is given by
KL\left[q{\u2225p}_{post}\right]=KL\left[q{\u2225p}_{prior}\right]+\mathsf{\text{log}}S{E}_{q}\left[\mathsf{\text{log}}p\left(y\leftr,\mathsf{\text{}}\Omega \right.\right)\right]
(10)
here S is a normalization constant, E_{
q
}[log p(yr, Ω)] the expectation of the likelihood of the observations under the approximating process.
According to Ref [24], minimization of the KL functional (11) can be represented as the saddle point problem
J=\underset{\tau}{\mathsf{\text{max}}}\underset{q}{\mathsf{\text{min}}}\left\{KL\left[q\u2225{p}_{prior}\right]+\sum _{j=1}^{n}\left[{\tau}_{j}\left({y}_{j}\stackrel{\u0304}{x}\left({t}_{j}\right)\right)\frac{{\sigma}^{2}}{2}{{\tau}_{j}}^{2}\right]\right\}
(11)
here τ is auxiliary variables. It can be found that this functional is concave in τ and convex in q. Hence we can exchange min and max. Performing the max first yields the result. This also shows that there is only a unique saddle point solution.
The optimization procedure is based on a forwardbackward procedure, leading to ordinary differential equations which can iteratively be solved. Taking the regulation involving two regulators for example, the free energy is a functional of both the approximating processes q^{1}, q^{2} and their transition rates n_{1}, n_{2}. However, these are not independent, but are related by the Master equation. To incorporate this constraint, we add Lagrange multipliers as
L\left({q}^{1},{q}^{2},{g}_{1},{g}_{2}\right)=J\left[{q}^{1},{q}^{2},{g}_{1},{g}_{2}\right]+{\int}_{0}^{T}\left[\frac{d{{q}_{1}}^{1}\left(t\right)}{dt}+\left({n}_{1}+{n}_{1+}\right){{q}_{1}}^{1}\left(t\right){n}_{1+}\right]{\lambda}_{1}\left(t\right)dt+{\int}_{0}^{T}\left[\frac{d{{q}_{2}}^{1}\left(t\right)}{dt}+\left({n}_{2}+{n}_{2+}\right){{q}_{2}}^{1}\left(t\right){n}_{2+}\right]{\lambda}_{2}\left(t\right)dt
(12)
where g_{1} and g_{2} are the rates of jumps from the 0 to the 1 state for process q^{1} and q^{2}, respectively.
The Lagrange multiplier functions obey the final condition λ(T) = 0. Estimation of the parameters A and b can be done directly by maximizing the approximate marginal likelihood E_{q}[logp(yr,Ω)]. The framework of the inference is shown in the Figure 12.