Bayesian inference based modelling for gene transcriptional dynamics by integrating multiple source of knowledge
 ShuQiang Wang^{1, 2} and
 HanXiong Li^{1, 2}
https://doi.org/10.1186/175205096S1S3
© Wang and Li; licensee BioMed Central Ltd. 2012
Published: 16 July 2012
Abstract
Background
A key challenge in the post genome era is to identify genomewide transcriptional regulatory networks, which specify the interactions between transcription factors and their target genes. Numerous methods have been developed for reconstructing gene regulatory networks from expression data. However, most of them are based on coarse grained qualitative models, and cannot provide a quantitative view of regulatory systems.
Results
A binding affinity based regulatory model is proposed to quantify the transcriptional regulatory network. Multiple quantities, including binding affinity and the activity level of transcription factor (TF) are incorporated into a general learning model. The sequence features of the promoter and the possible occupancy of nucleosomes are exploited to estimate the binding probability of regulators. Comparing with the previous models that only employ microarray data, the proposed model can bridge the gap between the relative background frequency of the observed nucleotide and the gene's transcription rate.
Conclusions
We testify the proposed approach on two realworld microarray datasets. Experimental results show that the proposed model can effectively identify the parameters and the activity level of TF. Moreover, the kinetic parameters introduced in the proposed model can reveal more biological sense than previous models can do.
Background
A challenge facing molecular biology is to develop quantitative, predictive models of gene regulation. The advance of highthroughput microarray technique makes it possible to measure the expression profiles of thousands of genes, and genomewide microarray datasets are collected, providing a way to reveal the complex regulatory mechanism among cells. There are two broad classes of gene regulatory interactions: one based on the 'physical interaction' that aim at identifying relationships among transcription factors and their target genes (genetosequence interaction) and another based on the 'influence interaction' that try to relate the expression of a gene to the expression of the other genes in the cell (genetogene interaction).
In recent years, researchers have proposed many different computational approaches to reconstruct gene regulatory networks from highthroughput data, e.g. see reviews by Bansal et al. and Markowetz and Spang [1, 2]. These approaches fall roughly into two categories: qualitative and quantitative aspects. Inferring qualitative regulatory networks from microarray data has been well studied, and a number of effective approaches have been developed [3–10]. However, these methods are based on coarse grained qualitative models [11, 12], and cannot provide a realistic and quantitative view of regulatory systems. On the other hand, quantitative modelling for gene regulatory network is in its infancy. Research on quantitative models for genetic regulation has arisen only in recent years, and most of them are based on classical statistical techniques. Liebermeister et al. [13] proposed a linear model for cell cyclerelated gene expression in yeast based on independent component analysis. Holter et al. [14] employ singular value decomposition to uncover the fundamental patterns underlying gene expression profiles. Pournara et al. [15] and Yu et al. [16] proposed the Factor analysis model to describe a larger number of observed variables. However, these approaches are based on linear regression, and are not always being consistent with the observations in biochemical experiments which are nonlinear. Imoto et al. [17] proposed a nonlinear model with heterogeneous error variances. This model matches the microarray data well but it is not satisfying enough in revealing more biological sense. Segal et al. [18] proposed a transcription control network based model and apply their model to the segmentation gene network of Drosophila melanogaster. They reveal that positional information is encoded in the regulatory sequence and input factor distribution. However, there is still a little bit of dilemma in the model: the activity level of transcription factors is difficult to be measured or to be identified. Actually, assaying the transcription factors' activity state in a dynamic fashion is a major obstacle to the wider application of the kinetic modeling. TFs' activity levels are difficult to measure mainly due to two technical limitations: TFs are often present at low intercellular concentrations and the changes in their activity state can occur rapidly due to posttranslational modifications.
Based on the above description, this paper aims to describe the transcriptional regulatory network quantitatively. In this work, a Bayesian inference based regulatory model is proposed to quantify the transcriptional dynamics. Multiple quantities, including binding energy, binding affinity and the activity level of transcription factor are incorporated into a general learning model. The sequence features of the promoter and the occupancy of nucleosomes are exploited to derive the binding energy. Compared with the previous models, the proposed model can reveal more biological sense.
Results
Case Ι: Circadian patterns in rat liver
Circadian rhythm is a daily timekeeping mechanism fundamental to a wide range of species. The basic molecular mechanism of circadian rhythm has been studied extensively. As a real example to test our approach, we considered the dynamics of the circadian patterns in rat liver. We employ the datasets from Almon et al [19]. This experiment was designed to examine fluctuations in gene expression in liver within the 24 hour circadian cycle in normal animals. Fiftyfour male normal Wistar rats were housed in a strictly controlled stress free environment with light: dark cycles of 12 hr: 12 hr. Three animals were sacrificed at each of 18 selected time points within the 24 hour cycle. RNA was prepared from livers for gene arrays. Time point designations reflect time after lights on in hours. For details, please refer to Table S1 in additional file 1.
Analysis of the predicted activity levels of transcription factors
To test the proposed model on the above dataset, we employ two important transcriptional regulators of which activity levels indicate the variation of heat signals in a subset of gene circadian network, hsf1 and ppara. In total, we selected 7 genes to perform posterior inference of TF activities. To ensure identifiability, we included three genes that are regulated solely by hsf1 (HSP110, HSPA8 and COL4A1), and two genes that are regulated solely by ppara (ACSL1 and HMGCS1). The remaining two genes are jointly regulated by hsf1 and ppara. These genes were chosen since they exhibit the largest variance in the microarray time course, and hence are likely to provide a cleaner representation of the output of the system.
The biological sense of kinetic parameters
Relationship between scaling parameter k and the corresponding binding affinity φ.
Gene  HSP110  HSPA8  COL4A1  ACSL1  HMGCS1  HSP90AA1hsf1  HSPA1B hsf1  HSPA1B ppara 

k  H  H  H  H  L  H  L  H 
φ  L  H  L  H  L  H  L  H 
Case II: A yeast synthetic network for in vivo assessment
Analysis of the predicted activity levels of transcription factors
Analysis of the kinetic parameters
Relationship between k and φ for IRMA network data.
Gene  GAL80  GAL4  SWI5  ASH1  CBF1swi5  CBF1ash1 

k  H  L  H  L  H  H 
φ  L  L  H  L  H  H 
Discussion
In this study, two realworld microarray datasets were exploited two evaluate the efficiency of the proposed model. Comparison shows that the kinetic parameters obtained by our method have smaller variance than that of Barenco et al. [21]. One reason is that the proposed model provides a principled method for the incorporation of prior biological knowledge. This may be in the form of suitable ranges for kinetic parameters, known kinetic parameter values and suitable distributions on measurement noise. Besides, it is possible for the proposed model to circumvent the need for expensive samplingbased inference and a TFA profile can be inferred over all time, rather than just at the discrete timepoints at which expression was measured.
The Bayesian inference based model of transcription rates and regulator activity levels allows us to handle these biologically relevant quantities despite the indirect measurement of the former and the lack of measurements of the latter. It also allows us to handle the inherently noisy measurement in a principled way. However, the proposed model still abstracts away some of the explicit processes that generate the actual observed expression data. A more explicit modelling of these will provide a more principled treatment of different sources of noise in the data. Furthermore, this model does not handle directly the upstream events that affect regulator activity. In fact, the current model is an open loop system, such that the regulation of regulator activity is itself viewed as exogenous to the system. By developing a richer modeling language we may capture more complex reaction models, model the upstream regulation of activity levels, and identify systems that involve feedback mechanisms and signalling networks.
PostTranscriptional Modification Model (PTMM) have been previously used to model TF activities [25]; in that work, further dependencies were included between TF mRNA expression levels and their predicted activities, which enabled to predict possible posttranscriptional modifications in TFs. Naturally, it should be possible to combine both our approach and their approach to give a model capable of simultaneously inferring TF activities, combinatorial interactions and post transcriptional regulations.
Conclusions
In this work, we have proposed a computational model to reverse engineer simultaneously both the activity of TFs and the logical structure of promoters by integrating multiple sources of knowledge, including timeseries gene expression data, TFs' binding information and sequence features of promoters. The approach relies on a detailed model of transcription, which is an approximation to the MichaelisMenten model from classical enzyme kinetics, and therefore should be able to capture accurately the effects that changes in TF activity have on gene expression dynamics. We testify the proposed approach on two realworld microarray datasets. Experimental results show that the proposed model can effectively identify the parameters and the activity level of TF. Moreover, the kinetic parameters introduced in the proposed model can reveal more biological sense than previous models can do.
Methods
Problem statement
Kinematic model of regulation
Compared with the gene expression level, the gene transcription rate can capture more dynamic characteristics of transcription regulation. We here employ the transcription rate to model the regulation function. We first assume:

The derived transcription rates are average rates over a cell population.

The speed of a TF's binding to or dissociation from its target sites is assumed to be much more rapid than the transcription process, thus rapidequilibrium approximation can be used.
here x represents the mRNA concentration for a particular gene, r(t) the concentration of active TF, β and d are kinetic constants, c a baseline expression rate and ω the mRNA decay rate.
here k is a scaling parameter.
where S_{j} denotes the set of all 2^{n} possible state vectors, and s_{i} is the i_{th} element of the state vector s.
Modelling for binding affinity
Measuring affinities of molecular interactions in highthroughput format remains problematic, especially for transient and lowaffinity interactions. We here try to describe the landscape of binding affinity in the perspective of binding energy between the various DNAbinding molecules and their target genes. Binding affinity landscapes describe how each molecule translates an input DNA sequence into a binding potential that is specific to that molecule. The presented framework models several important aspects of the binding process.
By allowing molecules to bind anywhere along the input sequence, the entire range of affinities is considered, thereby allowing contributions from both strong and weak binding sites [26, 27].

Conventional cooperative binding interactions can be explicitly modelled by assigning higher statistical weights to configurations in which two molecules are bound in close proximity.

The cooperativity that arises between factors when both nucleosomes and transcription factors are integrated is captured automatically [28].
where C_{i} is a constant, E_{ij} the binding free energy between TF_{i} and the promoter of gene j, k and T are the Boltzmann constant and temperature, respectively.
where ${{J}^{n}}_{l}=\left\{\begin{array}{c}1\mathsf{\text{}}if\mathsf{\text{}}n=s\left(l\right)\\ 0\mathsf{\text{otherwise}}\end{array}\right.$
here K^{(q)} is the scaling factor, M*_{L} indicates the maximum background frequency in the motif, s(l) is the nucleotide in position l.
Regulatory network modelling using dynamic Bayesian inference
In many biological processes, the transcription factor transit from inactive to active state quickly as a consequence of fast posttranslational modifications, (the time scale is micro second), so it is reasonable that we model the TF activity as a binary variable r(t) ∈{0,1}[24, 32].
here p_{1}(t) = p(r(t) = 1), p_{0}(t) = p(r(t) = 0) and n_{±} indicates the transition rate.
where y denotes collectively the observations, Ω are the parameters involved in the regulation function and S a normalization constant.
Variational inference and model optimization
here S is a normalization constant, E_{ q }[log p(yr, Ω)] the expectation of the likelihood of the observations under the approximating process.
here τ is auxiliary variables. It can be found that this functional is concave in τ and convex in q. Hence we can exchange min and max. Performing the max first yields the result. This also shows that there is only a unique saddle point solution.
where g_{1} and g_{2} are the rates of jumps from the 0 to the 1 state for process q^{1} and q^{2}, respectively.
Declarations
Acknowledgements
This work was supported by a GRF project from Hong Kong SAR (CityU 117310) and the grant from NNSF China (51175519).
This article has been published as part of BMC Systems Biology Volume 6 Supplement 1, 2012: Selected articles from The 5th IEEE International Conference on Systems Biology (ISB 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/6/S1.
Authors’ Affiliations
References
 Bansal M, Belcastro V, AmbesiImpiombato A, di Bernardo D: How to infer gene networks from expression profiles. Mol Syst Biol. 2007, 3: 78PubMed CentralView ArticlePubMedGoogle Scholar
 Markowetz F, Spang R: Inferring cellular networks  a review. BMC Bioinformatics. 2007, 8 (Sul 6): S5PubMed CentralView ArticlePubMedGoogle Scholar
 Eisen MB, Spellman PT, Brown PO, Bostein D: Cluster analysis and display of genomewide expression patterns. Proc Natl Acad Sci USA. 1998, 95: 1486314868. 10.1073/pnas.95.25.14863.PubMed CentralView ArticlePubMedGoogle Scholar
 Davidich M, Bornholdt S: Boolean network model predicts cell cycle sequence of fission yeast. PLoS One. 2008, 3: e167210.1371/journal.pone.0001672.PubMed CentralView ArticlePubMedGoogle Scholar
 Lahdesmaki H, Shmulevich I, YliHarja O: On learning gene regulatory networks under the Boolean network model. Mach Learn. 2004, 52: 147167.View ArticleGoogle Scholar
 Imoto S, Goto T, Miyano S: Estimation of genetic networks and functional structures between genes by using Bayesian networks and nonparametric regression. Pac Symp Biocomput. 2002, 175186.Google Scholar
 Kirimasthong K, Manorat A: Inference of gene regulatory network by Bayesian network using MetropolisHastings Algorithm. Proceedings of 3rd International Conference on Advanced Data Mining and Alications: 0608 Aug, 2007; Harbin. Edited by: Alhajj R, Gao H. 2007, 276286.Google Scholar
 Segal E, Taskar B, Gasch A, Friedman N, Koller D: Rich probabilistic models for gene expression. Bioinformatics. 2001, 17 (Suppl 1): S243S252. 10.1093/bioinformatics/17.suppl_1.S243.View ArticlePubMedGoogle Scholar
 Hu Z, Killion P, Iyer V: Genetic reconstruction of a functional transcriptional regulatory network. Nat Genet. 2007, 39: 683687. 10.1038/ng2012.View ArticlePubMedGoogle Scholar
 Luscombe N, Babu M, Yu H: Genomic analysis of regulatory network dynamics reveals large topological changes. Nature. 2004, 431: 308312. 10.1038/nature02782.View ArticlePubMedGoogle Scholar
 Werhli AV, Husmeier D: Reconstructing gene regulatory networks with Bayesian networks by combining expression data with multiple sources of prior knowledge. Stat Appl Genet Mol Biol. 2007, 6: Article15PubMedGoogle Scholar
 Segal E, Barash Y, Simon I, Friedman N: From promoter sequence to expression: a probabilistic framework. Proceedings of the Sixth Annual International Conference on Computational Biology: 1821 April 2002; Washington. Edited by: Myers G, Hannenhalli S, Istrail S. 2002, 263272.Google Scholar
 Liebermeister W: Linear modes of gene expression determined by independent component analysis. Bioinformatics. 2002, 18: 5160. 10.1093/bioinformatics/18.1.51.View ArticlePubMedGoogle Scholar
 Holter N, Mitra M, Maritan A: Fundamental patterns underlying gene expression profiles: simplicity from complexity. Proc Natl Acad Sci USA. 2000, 97: 84098414. 10.1073/pnas.150242097.PubMed CentralView ArticlePubMedGoogle Scholar
 Pournara I, Wernisch L: Factor analysis for gene regulatory networks and transcription factor activity profiles. BMC Bioinformatics. 2007, 8: 6110.1186/14712105861.PubMed CentralView ArticlePubMedGoogle Scholar
 Yu T, Li K: Inference of transcriptional regulatory network by twostage constrained space factor analysis. Bioinformatics. 2005, 21: 40334038. 10.1093/bioinformatics/bti656.View ArticlePubMedGoogle Scholar
 Imoto S, Kim S, Goto T: Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network. J Bioinform Comput Biol. 2003, 1 (2): 231252. 10.1142/S0219720003000071.View ArticlePubMedGoogle Scholar
 Segal E, RavehSadka T, Schroeder M, Unnerstall U, Gaul U: Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature. 2008, 451: 535540. 10.1038/nature06496.View ArticlePubMedGoogle Scholar
 Almon RR, Yang E, Lai W, Androulakis IP: Circadian variations in rat liver gene expression: relationships to drug actions. J Pharmacol Exp Ther. 2008, 326: 700716. 10.1124/jpet.108.140186.PubMed CentralView ArticlePubMedGoogle Scholar
 Yan J, Wang HF, Liu YT, Shao CX: Analysis of Gene Regulatory Networks in the Mammalian Circadian Rhythm. Plos Comput Biol. 2008, 4 (10): e100019310.1371/journal.pcbi.1000193.PubMed CentralView ArticlePubMedGoogle Scholar
 Barenco M, Tomescu D, Brewer D, Callard R, Stark J, Hubank M: Ranked prediction of p53 targets using hidden variable dynamic modelling. Genome Biol. 2006, 7 (3): R2510.1186/gb200673r25.PubMed CentralView ArticlePubMedGoogle Scholar
 Cantone I, Marucci L, Iorio F, et al: A yeast synthetic network for in vivo assessment of reverse engineering and modelling aroaches. Cell. 2009, 137: 172181. 10.1016/j.cell.2009.01.055.View ArticlePubMedGoogle Scholar
 Stolovitzky G, Monroe D, Califano A: Dialogue on reverseengineering assessment and methods: the DREAM of highthroughput pathway inference. Ann N Y Acad Sci. 2007, 1115: 122. 10.1196/annals.1407.021.View ArticlePubMedGoogle Scholar
 Opper M, Sanguinetti G: Learning combinatorial transcriptional dynamics from gene expression data. Bioinformatics. 2010, 26 (13): 16231629. 10.1093/bioinformatics/btq244.View ArticlePubMedGoogle Scholar
 Shi Y, Klutstein M, Simon I, Mitchell T, BarJoseph Z: A combined expressioninteraction model for inferring the temporal activity of transcription factors. J Comput Biol. 2009, 16: 10351049. 10.1089/cmb.2009.0024.PubMed CentralView ArticlePubMedGoogle Scholar
 Gertz J, Siggia E, Cohen B: Analysis of combinatorial cisregulation in synthetic and genomic promoters. Nature. 2002, 457: 215218. 10.1038/nature07521.View ArticleGoogle Scholar
 Tanay A: Extensive lowaffinity transcriptional interactions in the yeast genome. Genome Res. 2006, 16: 962972. 10.1101/gr.5113606.PubMed CentralView ArticlePubMedGoogle Scholar
 Johnson SM, Tan FJ, McCullough HL, Riordan DP, Fire AZ: Flexibility and constraint in the nucleosome core landscape of Caenorhabditis elegans chromatin. Genome Res. 2006, 16: 15051516. 10.1101/gr.5560806.PubMed CentralView ArticlePubMedGoogle Scholar
 Kaplan N, Moore IK, FondufeMittendorf Y, et al: The DNAEncoded Nucleosome Organization of a Eukaryotic Genome. Nature. 2008, 458: 362366.PubMed CentralView ArticlePubMedGoogle Scholar
 Bulyk ML: Computational prediction of transcriptionfactor binding site locations. Genome Biol. 2003, 5 (1): 20110.1186/gb200351201.PubMed CentralView ArticlePubMedGoogle Scholar
 Naum I, Gary D, Ilya P: Computational technique for improvement of the positionweight matrices for the DNA/protein binding sites. Nucleic Acids Res. 2005, 33: 22902301. 10.1093/nar/gki519.View ArticleGoogle Scholar
 Cai L, Friedman N, Sunney X: Stochastic protein expression in individual cells at the single molecule level. Nature. 2006, 440: 580586. 10.1038/440580a.View ArticleGoogle Scholar
 Ocone A, Sanguinetti G: Reconstructing transcription factor activities in hierarchical transcription network motifs. Bioinformatics. 2011, 27 (20): 28732879. 10.1093/bioinformatics/btr487.View ArticlePubMedGoogle Scholar
 Guido S, Andreas R, Manfred O, Cedric A: Switching regulatory models of cellular stress response. Bioinformatics. 2009, 25: 12801286. 10.1093/bioinformatics/btp138.View ArticleGoogle Scholar
 Cover T, Thomas J: Elements of information theory. 2006, Wiley, New YorkGoogle Scholar
 Wang YL, Liu CL, Storey JD, et al: Precision and functional specificity in mRNA decay. Proc Natl Acad Sci USA. 2002, 99: 58605865. 10.1073/pnas.092538799.PubMed CentralView ArticlePubMedGoogle Scholar
 Shuman S: Structure, mechanism, and evolution of the mRNA capping apparatus. Prog Nucleic Acid Res Mol Biol. 2001, 66: 140.View ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.