The correlation between architecture and mRNA abundance in the genetic regulatory network of Escherichia coli
© Grondin et al; licensee BioMed Central Ltd. 2007
Received: 19 April 2007
Accepted: 17 July 2007
Published: 17 July 2007
Two aspects of genetic regulatory networks are the static architecture that describes the overall connectivity between the genes and the dynamics that describes the sequence of genes active at any one time as deduced from mRNA abundances. The nature of the relationship between these two aspects of these networks is a fundamental question. To address it, we have used the static architecture of the connectivity of the regulatory proteins of Escherichia coli to analyse their relationship to the abundance of the mRNAs encoding these proteins. In this we build on previous work which uses Boolean network models, but impose biological constraints that cannot be deduced from the mRNA abundances alone.
For a cell population of E. coli, we find that there is a strong and statistically significant linear dependence between the abundance of mRNA encoding a regulatory protein and the number of genes regulated by this protein. We use this result, together with the ratio of regulatory repressors to promoters, to simulate numerically a genetic regulatory network of a single cell. The resulting model exhibits correlations similar to those of E. coli.
This analysis clarifies the relationship between the static architecture of a regulatory network and the consequences for the dynamics of its pattern of mRNA abundances. It also provides the constraints on the architecture required to construct a model network to simulate mRNA production.
The interactions of the many molecular constituents of a cell can be expressed in terms of various networks, such as protein-protein interaction networks [1–3], metabolic networks [4, 5] or genetic networks, in which the cell would be represented as a network of networks. Genetic regulatory networks are complex systems in which the agents or genes, which are the nodes of the network, each carry out the combined processes of transcription, translation and post-translational modifications, and the links represent the causal influences amongst these agents. There are two aspects important for understanding these networks. The first is the static architecture. This comprises the overall connectivity, namely, which nodes are connected to which others, and the designation of links as either promoters or repressors. The second is the dynamics, namely, how it is determined which nodes are active at any one time, that is, the genes that are expressed, and what determines the level of activity at an active node. The architecture and dynamics give rise to a pattern of activity over time and the corresponding time-dependent activities. In the case of cell biology, a major goal is to explain how the genetic regulatory network functions to produce mRNAs and hence phenotypes. Specific patterns of connection that are expected to reveal mechanisms of regulation and influence the dynamics of the network have already been shown [8, 9]. However, there are several problems in trying to attain this goal on a larger scale. Indeed, the relationship between the static architecture and the dynamics of the genetic regulatory network is uncertain; in other words, the information available on the architecture, even if it were complete, may be insufficient to deduce the dynamics.
What is lacking is information on the distribution of the different species of mRNA over time in an individual cell, together with the same information for a representative number of the different cells that make up the heterogeneous population. On the other hand, information is available on mRNA abundances in populations grown in a variety of conditions, but the relationship of these to the network architecture is unclear.
Network simulation might be expected to clarify the relationship between the phenotype of individual cells, the static architecture of their genetic regulatory circuits and the abundances of mRNA extracted from cell populations. Construction of such a model network should be constrained by the static architecture characteristic of real biological systems. For example, in both Saccharomyces cerevisiae and Escherichia coli, the number of regulatory proteins binding a gene is exponentially distributed, whilst the number of genes a transcription factor can bind follows a decaying power-law. Another important constraint is the distribution of mRNA abundances in populations of cells, which is best fitted by a log-normal function with a decaying power-law tail [10, 11]. Here, we analyse the relationship between the network architecture and its regulatory behaviour using experimental data from E. coli. This reveals that the abundance of the mRNA encoding a regulatory protein is strongly correlated with the number of genes regulated by that protein. To implement this relationship we use a model similar to a two-state Boolean network in which nodes are either on or off, but with rates of production, for the nodes that are switched on, proportional to the number of outgoing links. Boolean networks have long been used, for example, as a biological model of cell differentiation [7, 12] or in the inference of genetic regulatory networks from mRNA data [13–15].
Network architecture and mRNA abundances in E. coli
Information about the architecture of the transcriptional network of E. coli is contained in RegulonDB, while data on mRNA abundances are extracted from the ASAP database. Both sets of data are combined in order to investigate correlations between incoming or outgoing degrees of connectivity and mRNA abundances (see the methods section for details).
The network simulation
We have adopted a simple genetic regulatory model network in which the nodes are agents that carry out the combined processes of transcription, translation and post-translational modifications. The interactions between the agents are the cause of the actions performed by the agents. The indirect influence of one gene on another in a cell is therefore replaced by the direct action of one agent on another in the model. This simplification, which allows us to concentrate only on the mRNA abundance, is valid if we consider the mRNA abundance to be correlated to the protein abundance. Such correlation has been previously studied in S. cerevisiae [18, 19]. The model and the simulation from which we extract the following data are described in more detail in the methods section.
The approach to the cell as a network (or network of networks) holds out the promise of a deep level of understanding of the origin of the phenotype. Study of the connectivity in metabolism has revealed power-law relationships [4, 5], the significance of which is a matter of debate [20, 21], whilst study of the connectivity of the transcriptional regulatory network in S. cerevisiae has revealed that the number of genes encoding transcription factors has a power-law relationship to the degree of outgoing connections (how many genes are regulated by the transcription factor) but has an exponential relationship to the incoming connections (how many transcription factors regulate the gene in question). This raises the question of the relationship between such static patterns in the architecture of the overall network and the phenotype in terms of the mRNA of individual cells in which only a part of the network functions at any one time.
To begin to address it, we looked first at the static architecture of the network of regulatory proteins in E. coli from RegulonDB and at the abundances of the mRNA corresponding to these proteins in the ASAP database. We find that there is a linear relationship between the outgoing degree of connectivity of the regulatory protein and the abundance of mRNA encoding that protein, but that there is no evident relationship between the incoming degree and mRNA abundance. It might be argued that this has little meaning. On the one hand, the ASAP data are of heterogeneous populations of cells and not of individuals (even if mainly one set of growth conditions was used) whilst, on the other hand, the similar pattern of outgoing degrees of connectivity observed in architecture by others and in mRNA abundances by us here might be coincidental. We therefore constructed an artificial network based closely on the architecture of the genetic regulatory network of E. coli. The running of this network and the accumulation of the 'mRNA' generated is equivalent to taking a series of snapshots of an individual bacterium and adding up all the mRNA generated (which in vivo generally has a short half-life). Comparison of the results from the simulation data with those from E. coli populations indicates similar behaviour in terms of correlations between the mRNA abundances and the architecture of the system. It might be argued that this result is to be expected since the production of mRNA built into the model is correlated to the outgoing degree of the node. However, this argument ignores the fact that it is the dynamics of the network that determines which nodes are actually activated. Indeed, it is this relationship between static architecture and functional dynamics that the model network clarifies.
In comparison with the E. coli data, the Pearson correlation coefficient for the simulated data reveals more dramatically the fact that mRNA abundance is correlated to the outgoing rather than the incoming degree of connectivity. This may be because, in the model network, the gene-to-gene interactions are represented by a single intermediate, as we have assumed the mRNA abundances to be perfectly correlated to that of the proteins. Assuming a weaker connection would simply tend to diminish the correlation with the degree of connectivity.
The proportion of negative links μ plays a role similar to that of the homogeneity parameter in Boolean networks [7, 22], in that it determines the probability for a node to be ON according to its inputs. It is easy to see that for μ close to zero, most of the nodes in the network are ON and the distribution of abundance must then be very similar to the distribution of the outgoing degrees of connectivity (equation (2)). In this case the correlation between the outgoing degree of connectivity and the abundance, exhibited in the dynamical behaviour of the network, follows from the static architecture and is close to 1. On the other hand, for μ close to 1, as most of the nodes are OFF, the correlation is close to 0. In this case the static architecture is unrelated to the mRNA abundance. Furthermore, it is also possible to engineer the distribution of negative links so that, for example, the probability that the nodes of high outgoing degree of connectivity are expressed tends to zero. This would have a similar effect on the correlation as an increase in μ.
In our model, the proportion of negative links is therefore an essential parameter in determining the degree of the correlation. Thus it is a matter for numerical experiment to determine the correlation for a realistic range of values of μ. We find that for 0 < μ < 1, the correlation r lies in the range 1 > r > 0. At μ = 0.37, as shown in the illustration, the correlations are close to what is observed in E. coli.
Many factors intervene in the dynamics of gene regulation. These include local factors such as the sequence specificity of the transcription factor DNA binding site and global ones such as the structural organisation of the chromosomes. The timing of interaction is another important factor: differential timing of interaction is suggested to explain the large diversity of organisms against a not so large genomic diversity. Furthermore, it is also known that local structures, such as the feed-forward loop significantly represented in E. coli for example, have an effect on the kinetics of interaction which can affect the regulatory response of genes [9, 26]. The correlation we find between the architecture of the network and the mRNA activity is one of the numerous factors influencing gene regulation and needs to be considered as such.
We have shown that there is a significant correlation between architecture and mRNA. We can ask the reason for such correlation. We speculate that it may have to do with a selective pressure to produce sufficient regulator for the task of regulation. Producing too little leads to failure to generate the phenotype whilst producing too much is not simply wasteful but also means that there is more regulator to be eliminated in order to generate another phenotype.
The mRNA abundance is obviously correlated to the mechanism of regulation of transcription factors. Here we have shown that there is also a significant correlation between the architecture and the function.
One value of simulated genetic regulatory networks is that they can help bridge gaps in our understanding, such as that between the static architecture of the biological network and the consequences of its dynamic functioning in individual cells (which results in the experimentally accessible data on heterogeneous populations of mRNA). The model network used here is based closely on information about the architecture of the real network. Determining what information is needed to go from architecture to functioning and back will depend on continued exploration of real and simulated networks and of the relationship between them.
Analysis of experimental data from E. coli suggests a significant correlation between the number of genes regulated by a transcription factor and the abundance of the mRNA that encodes this transcription factor. It does not suggest an evident correlation between the number of regulators of a gene and the abundance of the mRNA it encodes. Since the relationship between the architecture of a genetic regulatory network and its functioning is unclear, a model network was constructed with architecture similar to that of the E. coli network. The correlations between mRNA abundances and degrees of incoming and outgoing connectivity observed in the E. coli data are corroborated by the correlations in the data generated by the model.
The E. coli data
RegulonDB gives for E. coli the regulatory genes and the genes that they regulate. We calculate the incoming degree of connectivity, k_in, of a given gene, corresponding to the number of transcription factors regulating that gene, as well as the outgoing degree of connectivity, k_out, which corresponds to the number of genes a given transcription factor regulates. In July 2005, this database contained 131 regulatory genes and 925 regulated genes.
The mRNA abundances in E. coli are obtained from the ASAP database. The abundances were measured in microarray experiments using mRNAs extracted from populations of E. coli grown in standard conditions, in which cells were for the most part grown in MOPS minimal medium and harvested in early exponential phase. Here, the mRNA abundances we use are the average values over the 8 to 11 repeats of the microarray experiments.
By combining the two datasets we identify 859 genes for which the abundance and the degrees of incoming and outgoing connectivity are known (see additional files 2 and 3 for the data). Amongst those genes, of which 113 are regulatory, 72 are not associated to regulatory genes (k_in = 0) and 787 are regulated (k_in > 0). The genes are then grouped according to their incoming or outgoing degree of connectivity and the average of the corresponding mRNAs calculated.
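The grouping and averaging step, followed by a Pearson correlation, can be sketched as below. This is a minimal illustration only: the function names and the toy values are hypothetical, and whether the coefficient in the analysis was computed on the grouped means (as here) or on individual genes is an assumption of the sketch.

```python
from collections import defaultdict
from math import sqrt


def mean_abundance_by_degree(degrees, abundances):
    """Group genes by degree of connectivity and return {degree: mean abundance}."""
    groups = defaultdict(list)
    for k, a in zip(degrees, abundances):
        groups[k].append(a)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy values, NOT taken from RegulonDB or ASAP.
k_out = [1, 1, 2, 2, 3]
mrna = [10.0, 14.0, 22.0, 26.0, 36.0]

grouped = mean_abundance_by_degree(k_out, mrna)  # {1: 12.0, 2: 24.0, 3: 36.0}
r = pearson(list(grouped.keys()), list(grouped.values()))
```

With real data, the same two calls would be repeated with k_in in place of k_out to compare the two correlations.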
Characteristics of the model
In our genetic regulatory model, we consider directed networks where the agents, or nodes, represent the cellular machinery and the links represent the regulating influence of the agents on each other. This model is based on a Boolean network [7, 12, 15]. The three principal features of this model described below are (i) the architecture of the network, (ii) the dynamics of regulation of the agents and (iii) the activity function of the agents.
Contrary to the classical representation of Boolean networks, the links between agents are considered to be either positive or negative. This models, respectively, the activation or inhibition capability of the agents on each other. The proportion of negative links is labelled μ. The difference from standard Boolean networks is in the proportion of negative links μ, which differs from what is called the internal homogeneity, p, that is, the proportion of output nodes that are ON according to the inputs [7, 22].
Data show that in S. cerevisiae and E. coli the distribution of the incoming degree of connectivity follows a Poisson distribution while the distribution of the outgoing degree of connectivity follows a power-law. To generate a network with such architecture, we first generate the adjacency matrix of an undirected network, that is a_ij = a_ji, with a power-law distribution of both the incoming and outgoing degree of connectivity using the Barabasi-Albert model. A direction is then given to the links by setting at random element a_ij = 0 or a_ji = 0 with equal probability. The columns of the resulting adjacency matrix are then randomised in order to give a Poisson distribution to the incoming degree of connectivity while the distribution of the outgoing degree of connectivity remains unchanged.
and it is OFF otherwise. In the following, b_a is in fact set to 0. The activation function in equation (1) expresses thresholds conditioned, for example, by the specificity of the sequence or the concentration of the regulator, which may be more realistic than the binary binding or not-binding of a transcription factor to specific DNA sequences [23, 29]. Thus, although equation (1) could be expressed in terms of rather complex Boolean functions, the direct formulation given here is more appropriate.
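The three construction steps can be sketched as follows. This is an illustrative reading, not the authors' code: it stores target lists rather than a full adjacency matrix, it interprets "randomising the columns" as redrawing each node's targets uniformly at random while keeping its outgoing degree, and it excludes self-links. All of these choices, and the function names, are assumptions.

```python
import random


def barabasi_albert_edges(n, m, rng):
    """Undirected preferential-attachment graph: each new node links to m existing nodes."""
    edges = []
    # 'repeated' lists every node once per link end, so a uniform draw from it
    # selects a node with probability proportional to its degree.
    repeated = []
    targets = list(range(m))  # the first new node attaches to the m seed nodes
    for new in range(m, n):
        for t in targets:
            edges.append((new, t))
        repeated.extend(targets)
        repeated.extend([new] * m)
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges


def directed_network(n, m, seed=0):
    """Return targets[i], the list of nodes regulated by node i."""
    rng = random.Random(seed)
    out_degree = [0] * n
    # Step 2: give each undirected link one direction with equal probability.
    for u, v in barabasi_albert_edges(n, m, rng):
        source = u if rng.random() < 0.5 else v
        out_degree[source] += 1
    # Step 3: redraw the targets uniformly (assumed reading of the column
    # randomisation); outgoing degrees are preserved, self-links excluded.
    targets = []
    for i in range(n):
        picks = rng.sample(range(n - 1), out_degree[i])
        targets.append([j if j < i else j + 1 for j in picks])
    return targets


# 1500 nodes with m = 3 gives a mean degree (incoming plus outgoing) close to 6.
network = directed_network(1500, 3, seed=0)
```

Because targets are drawn uniformly, the incoming degree of each node is approximately Poisson distributed for a large sparse network, while the heavy tail of the outgoing degree inherited from the preferential attachment step is left untouched.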
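As an illustration of a threshold rule of this kind, the sketch below switches a node ON when the signed sum of its active regulators exceeds b_a, and OFF otherwise. The exact form of equation (1) is not reproduced here, so this rule, the synchronous update, the data layout and the names `step`, `regulators` and `clamped` are all assumptions; `clamped` stands for nodes held ON by an external input.

```python
def step(state, regulators, clamped, b_a=0):
    """One synchronous update of all nodes.

    state[i] is True when node i is ON.
    regulators[i] is a list of (j, sign) pairs: node j acts on node i with
    sign +1 (positive link) or -1 (negative link).
    """
    new_state = []
    for i, inputs in enumerate(regulators):
        if i in clamped:
            # Externally driven nodes stay ON whatever their inputs are.
            new_state.append(True)
            continue
        # Assumed rule: only regulators that are currently ON contribute.
        total = sum(sign for j, sign in inputs if state[j])
        new_state.append(total > b_a)
    return new_state


# Hypothetical three-node example: node 0 is clamped ON, activates nodes 1
# and 2, and node 1 represses node 2.
regulators = [[], [(0, +1)], [(0, +1), (1, -1)]]
clamped = {0}

s0 = [True, False, False]
s1 = step(s0, regulators, clamped)  # [True, True, True]
s2 = step(s1, regulators, clamped)  # [True, True, False]
```

In this toy case node 2 is ON transiently and is then switched OFF once its repressor has been activated, which shows how the dynamics, and not the architecture alone, decides which nodes contribute.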
We add to the model that the level of expression of an agent is measured by the abundance of its product and is given by the activity function. We do not consider here the effects of a variable lifetime of the product, which is then arbitrarily set to one time step. Under this condition, the rate of synthesis of a product and its abundance are identical. The abundance is proportional to the rate of transcription in two cases: (i) the reaction is at equilibrium or, (ii) as here, the product is quickly degraded (1 time step in our case).
In the following, b_e is set to 0.
The results of a typical run of the model are presented in the results section for a network with the following parameters. The network is constructed with 1500 nodes. This is larger than the present dataset used for E. coli but smaller than the 3000 estimated genes in E. coli. The proportion of negative links is set to μ = 0.37, close to the proportion of links that have a negative effect in E. coli (~0.41) as calculated from regulonDB. Note that the possibility of a link having dual actions (positive and negative), as observed in the data from E. coli, is not considered, hence there would be no significance to taking μ to be exactly 0.41. The results for μ = 0.37 are typical of the range 0.35 < μ < 0.42.
Genetic regulatory networks are sparse and the number of regulators acting on a gene is low. Here, the mean degree of connectivity is set to 6; that is, on average, a node has 3 incoming and 3 outgoing links. For comparison, the mean degree of connectivity of the 113 regulatory genes in E. coli is about 15 while that of all the 859 genes is about 2. Finally, the networks are not autonomous and a number of nodes are therefore chosen to receive an external input. Those nodes remain ON at all times regardless of the value of equation (1). In the present case, 50 nodes are chosen at random to receive an external input. This value is sufficient to ensure a dynamical response of the network without a strong clamping effect.
Because the dynamics of the model is that of Boolean networks, there is flexibility in the setting of the parameters without affecting the outcome. This ensures that the results are consistent over small variations of the parameters, such as might follow from further observations or collection of data. It also accommodates the incompleteness of the data, provided that there are enough data to proceed at such a large scale. Therefore, the parameters used in the model need to be close to, but not necessarily equal to, the set of observed parameters.
For the network architecture and with the parameters given above, simulations show that there is a large probability for the network to be periodic, an identical configuration of the network as given by the states of the nodes being likely to appear twice in a short length of time. Our model exhibits a period in which about a quarter of the nodes are permanently ON and another quarter switch between ON and OFF. Population abundance data are generated from the network by summing over a period. In the simulation, we consider only the abundance at the nodes that have been activated at least once (see additional files 4 and 5 for the generated data and corresponding degrees of connectivity).
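Summing over a period can be sketched as below: the run stops when a configuration repeats, and the abundance of each node is accumulated over the repeating cycle. The sketch assumes that an ON node contributes its outgoing degree at each time step (production proportional to the number of outgoing links, with b_e = 0); the function name, the `update` callback and the toy ring are hypothetical.

```python
def population_abundance(initial_state, update, k_out, max_steps=10_000):
    """Run until a configuration repeats, then sum abundances over one period.

    Returns ({node: summed abundance}, period), restricted to the nodes that
    were ON at least once during the period.
    """
    seen = {}      # configuration -> time step at which it first appeared
    history = []   # configurations in order of appearance
    state = tuple(initial_state)
    while state not in seen:
        if len(history) >= max_steps:
            raise RuntimeError("no repeated configuration found")
        seen[state] = len(history)
        history.append(state)
        state = tuple(update(list(state)))
    cycle = history[seen[state]:]  # drop the transient, keep one full period
    totals = [0] * len(k_out)
    for config in cycle:
        for i, on in enumerate(config):
            if on:
                # Assumed activity function: abundance equals k_out when ON.
                totals[i] += k_out[i]
    activated = {i: totals[i] for i in range(len(k_out))
                 if any(config[i] for config in cycle)}
    return activated, len(cycle)


# Hypothetical toy dynamics: a single ON state rotating around a 3-node ring.
def rotate(state):
    return [state[-1]] + state[:-1]


abundance, period = population_abundance([True, False, False], rotate, [2, 1, 0])
```

Here `abundance` is `{0: 2, 1: 1, 2: 0}` and `period` is 3; in a full simulation `update` would be the network's own update rule.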
We thank the Epigenomics Project for support and P. Balaresque for the help with the statistical analysis. We also thank the reviewers who have helped improve the original manuscript.
- Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, Klitgord N, Simon C, Boxem M, Milstein S, Rosenberg J, Goldberg DS, Zhang LV, Wong SL, Franklin G, Li S, Albala JS, Lim J, Fraughton C, Llamosas E, Cevik S, Bex C, Lamesch P, Sikorski RS, Vandenhaute J, Zoghbi HY, Smolyar A, Bosak S, Sequerra R, Doucette-Stamm L, Cusick ME, Hill DE, Roth FP, Vidal M: Towards a proteome-scale map of the human protein-protein interaction network. Nature. 2005, 437 (7062): 1173-1178. 10.1038/nature04209
- Schwikowski B, Uetz P, Fields S: A network of protein-protein interactions in yeast. Nat Biotechnol. 2000, 18 (12): 1257-1261. 10.1038/82360
- Rain JC, Selig L, De Reuse H, Battaglia V, Reverdy C, Simon S, Lenzen G, Petel F, Wojcik J, Schachter V, Chemama Y, Labigne A, Legrain P: The protein-protein interaction map of Helicobacter pylori. Nature. 2001, 409 (6817): 211-215. 10.1038/35051615
- Raine DJ, Norris V: Network Structure of Metabolic Pathways. InterJournal of Complex Systems. 2000
- Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL: The large-scale organization of metabolic networks. Nature. 2000, 407 (6804): 651-654. 10.1038/35036627
- Guelzim N, Bottani S, Bourgine P, Kepes F: Topological and causal structure of the yeast transcriptional regulatory network. Nat Genet. 2002, 31 (1): 60-63. 10.1038/ng873
- Kauffman SA: The origins of order: Self-Organization and Selection in Evolution. 1993, Oxford, Oxford University Press
- Yeger-Lotem E, Sattath S, Kashtan N, Itzkovitz S, Milo R, Pinter RY, Alon U, Margalit H: Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction. Proc Natl Acad Sci U S A. 2004, 101 (16): 5934-5939. 10.1073/pnas.0306752101
- Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U: Network motifs: simple building blocks of complex networks. Science. 2002, 298 (5594): 824-827. 10.1126/science.298.5594.824
- Hoyle DC, Rattray M, Jupp R, Brass A: Making sense of microarray data distributions. Bioinformatics. 2002, 18 (4): 576-584. 10.1093/bioinformatics/18.4.576
- Ueda HR, Hayashi S, Matsuyama S, Yomo T, Hashimoto S, Kay SA, Hogenesch JB, Iino M: Universality and flexibility in gene expression from bacteria to human. Proc Natl Acad Sci U S A. 2004, 101 (11): 3765-3769. 10.1073/pnas.0306244101
- Huang S: Gene expression profiling, genetic networks, and cellular states: an integrating concept for tumorigenesis and drug discovery. J Mol Med. 1999, 77 (6): 469-480. 10.1007/s001099900023
- Liang S, Fuhrman S, Somogyi R: Reveal, a general reverse engineering algorithm for inference of genetic network architectures. Pac Symp Biocomput. 1998, 18-29.
- Akutsu T, Miyano S, Kuhara S: Inferring qualitative relations in genetic networks and metabolic pathways. Bioinformatics. 2000, 16 (8): 727-734. 10.1093/bioinformatics/16.8.727
- Martin S, Zhang Z, Martino A, Faulon JL: Boolean Dynamics of Genetic Regulatory Networks Inferred from Microarray Time Series Data. Bioinformatics. 2007, 23 (7): 866-874. 10.1093/bioinformatics/btm021
- Salgado H, Gama-Castro S, Martinez-Antonio A, Diaz-Peredo E, Sanchez-Solano F, Peralta-Gil M, Garcia-Alonso D, Jimenez-Jacinto V, Santos-Zavaleta A, Bonavides-Martinez C, Collado-Vides J: RegulonDB (version 4.0): transcriptional regulation, operon organization and growth conditions in Escherichia coli K-12. Nucleic Acids Res. 2004, 32 (Database issue): D303-6. 10.1093/nar/gkh140
- Glasner JD, Liss P, Plunkett G, Darling A, Prasad T, Rusch M, Byrnes A, Gilson M, Biehl B, Blattner FR, Perna NT: ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res. 2003, 31 (1): 147-151. 10.1093/nar/gkg125
- Gygi SP, Rochon Y, Franza BR, Aebersold R: Correlation between protein and mRNA abundance in yeast. Mol Cell Biol. 1999, 19 (3): 1720-1730.
- Greenbaum D, Jansen R, Gerstein M: Analysis of mRNA expression and protein abundance data: an approach for the comparison of the enrichment of features in the cellular population of proteins and transcripts. Bioinformatics. 2002, 18 (4): 585-596. 10.1093/bioinformatics/18.4.585
- Keller EF: Revisiting "scale-free" networks. Bioessays. 2005, 27 (10): 1060-1068. 10.1002/bies.20294
- Norris V, Raine D: On the utility of scale-free networks. Bioessays. 2006, 28 (5): 563-564. 10.1002/bies.20415
- Weisbuch G, Stauffer D: Phase transition in cellular random Boolean nets. J Physique. 1987, 48 (1): 11-18. 10.1051/jphys:0198700480101100
- Gerland U, Moroz JD, Hwa T: Physical constraints and functional characteristics of transcription factor-DNA interaction. Proc Natl Acad Sci U S A. 2002, 99 (19): 12015-12020. 10.1073/pnas.192693599
- Kepes F: Periodic transcriptional organization of the E.coli genome. J Mol Biol. 2004, 340 (5): 957-964. 10.1016/j.jmb.2004.05.039
- Wilkins AS: The evolution of developmental pathways. 2002, Sinauer Associates
- Mangan S, Alon U: Structure and function of the feed-forward loop network motif. Proc Natl Acad Sci U S A. 2003, 100 (21): 11980-11985. 10.1073/pnas.2133841100
- Allen TE, Herrgard MJ, Liu M, Qiu Y, Glasner JD, Blattner FR, Palsson BO: Genome-scale analysis of the uses of the Escherichia coli genome: model-driven analysis of heterogeneous data sets. J Bacteriol. 2003, 185 (21): 6392-6399. 10.1128/JB.185.21.6392-6399.2003
- Barabasi AL, Albert R: Emergence of scaling in random networks. Science. 1999, 286 (5439): 509-512. 10.1126/science.286.5439.509
- Kalir S, McClure J, Pabbaraju K, Southward C, Ronen M, Leibler S, Surette MG, Alon U: Ordering genes in a flagella pathway by analysis of expression kinetics from living bacteria. Science. 2001, 292 (5524): 2080-2083. 10.1126/science.1058758
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.