Comparative analysis of the transcription-factor gene regulatory networks of E. coli and S. cerevisiae
© Guzmán-Vargas and Santillán; licensee BioMed Central Ltd. 2008
Received: 20 March 2007
Accepted: 31 January 2008
Published: 31 January 2008
The regulatory interactions between transcription factors (TF) and regulated genes (RG) in a species genome can be lumped together in a single directed graph. The TFs and RGs form the nodes of this graph, while links are drawn whenever a transcription factor regulates a gene's expression. Projections onto TF nodes can be constructed by linking every two nodes regulating a common gene. Similarly, projections onto RG nodes can be made by linking every two regulated genes sharing at least one common regulator. Recent studies of the connectivity pattern in the transcription-factor regulatory networks of many organisms have revealed some interesting properties. However, the differences between TF and RG nodes have not been widely explored.
After analysing the RG and TF projections of the transcription-factor gene regulatory networks of Escherichia coli and Saccharomyces cerevisiae, we found several common characteristics as well as some noticeable differences. To better understand these differences, we compared the properties of the E. coli and S. cerevisiae RG- and TF-projected networks with those of the corresponding projections built from randomized versions of the original bipartite networks. These results indicate that the observed differences are mostly due to the very different ratios of TF to RG counts of the E. coli and S. cerevisiae bipartite networks, rather than to their having different connectivity patterns.
Since E. coli is a prokaryotic organism while S. cerevisiae is eukaryotic, there are important differences between them concerning processing of mRNA before translation, DNA packing, amount of junk DNA, and gene regulation. From the results in this paper we conclude that the most important effect such differences have had on the development of the corresponding transcription-factor gene regulatory networks is their very different ratios of TF to RG numbers. This ratio is more than three times larger in E. coli (about 0.116) than in S. cerevisiae (about 0.036). Our calculations reveal that both species' gene regulatory networks have very similar connectivity patterns, despite their very different TF to RG ratios. This, in our view, indicates that the structure of both networks is optimal from an evolutionary viewpoint.
Knowing the complete genome of a given species is just a piece of the information thought to be useful in understanding one of the most complicated and important puzzles in science: How does a biological system work? To fully understand the behaviour of an organism, an organ, or even a single cell, we need to understand the underlying gene regulatory dynamics. Nevertheless, given the complexity of even a single cell, answering this question is impossible for the time being.
Recent computer simulations of partial or whole genetic networks have demonstrated network behaviours, commonly called systems or emergent properties, that were not apparent from examination of a few isolated interactions alone. Moreover, the individual building blocks in a living organism, such as genes or proteins, may not possess an explicit understanding of what they perform in the context of cellular processes. The notion of a cellular process as an emergent property of the collection of individual interactions may in fact be a better description of life.
Recent advances in high-throughput techniques in genomics, such as microarrays and automated DNA sequencing, as well as the development of powerful bioinformatics tools, have yielded an impressive amount of novel biological data. For instance, we now know the genome-wide transcription-factor regulatory networks of various species. Unfortunately, the biological information and the mathematical and computational tools available do not allow the development of detailed dynamical models at this level. One way around this limitation is to employ the techniques of network theory. Its advantages include the following: it allows the description of a network structure with graph concepts, and reveals organizational features shared with numerous other biological and non-biological networks; it makes it possible to describe quantitatively networks of hundreds or thousands of interacting components; and finally, in some cases, the observed network topology gives clues about its evolution, and the observed network organization may help to elucidate its function and dynamic responses [1–7].
In this work we present a comparative analysis of two different genome-wide transcription-factor gene regulatory networks: those of the bacterium Escherichia coli and the budding yeast Saccharomyces cerevisiae. We measured various network properties for the bipartite networks (with unidirectional links from the transcription factors to the regulated genes), as well as for the networks resulting from projections onto the transcription-factor and onto the regulated-gene nodes. The performed measurements include the clustering coefficient, the degree distribution, the efficiency of information transfer and the network cost. We also constructed randomized networks with the same degree distributions as those of E. coli and S. cerevisiae, and carried out the same measurements to compare with the original networks. Finally, we tested network robustness by subjecting the original and the randomized networks to removal of the most connected nodes, and seeing to what extent the clustering coefficient changes.
The basic molecular mechanisms involved in gene expression are essentially the same in both prokaryotic and eukaryotic cells. However, there are important differences between them concerning processing of mRNA before translation, DNA packing, amount of junk DNA, and gene regulation. Since E. coli is a prokaryotic organism while S. cerevisiae is eukaryotic, we investigate in this work the possibility that the above referred differences emerge at the whole network level and can be identified via network theory analysis.
Results and Discussion
Global and projected network topology
Random networks have been widely studied and they usually serve as a reference against which other networks are compared to gather information regarding the node connectivity patterns. With this purpose, we built randomized versions of the E. coli and S. cerevisiae bipartite networks by randomly reconnecting the network links – following the procedure detailed in Materials and Methods. From the way they are built, the randomized networks have the same number of TF and RG nodes, and each node has the same number of links as in the corresponding original networks.
In the plots of figures 4c and 4d, the connectivity distributions for the E. coli and S. cerevisiae RG-projected networks are presented. Notice that the connectivity distributions for the S. cerevisiae RG-projected networks show an approximately exponential decreasing behaviour, while the distributions corresponding to E. coli have various local maxima and present a slow decreasing tendency.
Interestingly, the TF- and RG-projected networks of E. coli and S. cerevisiae have very different connectivity structures, despite the strong similarities observed in the bipartite-network link distributions (see Figure 3). Furthermore, the connectivity distributions of the original and randomized RG-projected networks are very similar in both the E. coli and S. cerevisiae cases, while small deviations from the behaviour of the randomized plots are observed in the TF projections. In our view, this indicates that the observed differences between the connectivity distributions of the E. coli and S. cerevisiae projected networks are mainly due to the very different numbers of transcription factors and regulated genes in the two organisms.
Table 1. Statistics of the transcriptional regulatory networks. The table shows different statistical properties (the global communication efficiency, the clustering coefficient, and the cost) measured for the bipartite, TF-projected and RG-projected, original and randomized networks of E. coli and S. cerevisiae. The subindex rand denotes the values corresponding to the randomized networks.
Bipartite network (E. coli)
TF projection (E. coli): 0.515 ± 0.011, 0.642 ± 0.015, 0.104 ± 0.002
RG projection (E. coli): 0.524 ± 0.001, 0.811 ± 0.002, 0.16 ± 0.002
Bipartite network (S. cerevisiae)
TF projection (S. cerevisiae): 0.755 ± 0.003, 0.807 ± 0.008, 0.518 ± 0.003
RG projection (S. cerevisiae): 0.563 ± 0.005, 0.693 ± 0.004, 0.092 ± 0.001
Following the procedure detailed in Materials and Methods, we calculated the global communication efficiency (E) for the E. coli and S. cerevisiae TF- and RG-projected networks, as well as for their randomized versions. The results are also reported in Table 1. An efficiency close to one means that very short paths connect any two nodes in the network. Since the projected networks are not fully connected, we calculated E for the largest component in each case. In all cases, these largest components comprise the vast majority of the nodes. Notice that the communication efficiencies of the original and randomized RG-projected networks are very similar for both E. coli and S. cerevisiae. On the other hand, the value of E for the TF-projected networks is smaller in the original than in the randomized networks. This is true for both organisms, although the difference is smaller in the case of S. cerevisiae.
The cost, σ, associated with a network is defined as the ratio of the actual number of links to the maximum possible number of links, given the network nodes. We can see in Table 1 that the original and randomized RG projections present very similar network costs for the two studied organisms. In contrast, the original TF-projected networks of both species have smaller cost values than the corresponding randomized projections. Notice that, in both species, the TF-projected networks have fewer links than the corresponding original networks. This is because some RGs are regulated by a single TF only, and such links are lost when the projection is made.
In summary, we have observed that the original and randomized RG-projected networks have very similar properties for both studied organisms. In contrast, consistent differences were observed between the corresponding original and randomized TF projections: E < E_rand, C < C_rand, and σ < σ_rand.
Furthermore, although observed in both organisms, these differences are smaller in the S. cerevisiae case. In our view, this symmetric behaviour, together with the fact that the distributions of links incoming to the RG nodes and outgoing from the TF nodes are alike in both species, indicates that the transcription-factor regulatory networks of E. coli and S. cerevisiae obey similar connection patterns, and that the most important dissimilarity between them is their very different numbers of regulated genes and transcription factors.
Network projections and levels of co-regulation interaction
We have seen that the clustering coefficient, the efficiency, and the cost of the original TF-projected networks are consistently smaller than those of the corresponding randomized networks. However, the TF projections can be constructed using different rules, and this may affect the above behaviour. For instance, a more restrictive rule consists of drawing a link between two TFs only if they share g_S target genes or more, with g_S > 1. If, when using this new rule, a high clustering coefficient is observed for high values of g_S, it indicates a strong tendency to co-regulation.
Starting with the original networks and their corresponding randomized versions, we constructed TF projections for different values of g_S, and compared their topological properties. The first thing we observed is that, as g_S increases, the number of links in the projected networks decreases. That is, the number of TF pairs sharing at least g_S target genes is a decreasing function of g_S.
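The thresholded projection rule described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `project_onto_tfs`, the `(tf, rg)` pair representation, and the toy network are all assumptions introduced here. Setting `g_s=1` gives the ordinary projection, in which two TFs are linked when they share at least one target gene.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_tfs(links, g_s=1):
    """Return the TF-projected network as a set of undirected TF pairs.

    links: iterable of (tf, rg) pairs taken from the bipartite network.
    g_s:   minimum number of shared target genes needed to link two TFs.
    """
    # Collect the set of target genes regulated by each TF.
    targets = defaultdict(set)
    for tf, rg in links:
        targets[tf].add(rg)

    # Link every pair of TFs whose target sets overlap in >= g_s genes.
    projected = set()
    for tf_a, tf_b in combinations(sorted(targets), 2):
        if len(targets[tf_a] & targets[tf_b]) >= g_s:
            projected.add((tf_a, tf_b))
    return projected


# Hypothetical toy network (not data from the paper).
toy_links = [("A", "g1"), ("A", "g2"), ("A", "g3"),
             ("B", "g1"), ("B", "g2"),
             ("C", "g3")]

for g_s in (1, 2, 3):
    print(g_s, sorted(project_onto_tfs(toy_links, g_s)))
```

In the toy network the number of projected links can only stay the same or shrink as `g_s` grows, which mirrors the decreasing behaviour noted in the text.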
In E. coli, the clustering coefficient, the efficiency, and the cost decrease with g_S, and this decay is faster for the randomized networks; thus, C > C_rand, E > E_rand, and σ > σ_rand for g_S > 2. This finding reveals that the transcriptional regulatory organization has not arisen by chance and is determined by different levels of co-regulation. In contrast, in S. cerevisiae, all three monitored quantities also decrease monotonically as g_S increases, but the values corresponding to the original and randomized networks are consistently close to each other.
Network robustness to directed attacks and random failures
Recent studies suggest that a network's connectivity pattern determines its robustness to external perturbations, such as removal of nodes or links. To test this, we measured the effects of directed attacks and random failures on the network organization. These measurements were carried out as follows:
1. A given fraction of either TF or RG nodes was eliminated from the original and the randomized E. coli and S. cerevisiae bipartite networks. The nodes to be removed were either chosen as the most connected ones (directed attacks), or at random (random failures).
2. The networks' emerging organization was evaluated by calculating their clustering coefficient.
3. The whole process was repeated for several fractions of removed nodes.
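The three steps above can be sketched as follows, under several stated assumptions: only TF nodes are removed, the TF projection is the network whose clustering is evaluated, nodes with fewer than two neighbours contribute a clustering coefficient of zero, and all function names and the toy network are hypothetical. It is an illustration of the procedure rather than the code used in the study.

```python
import random
from collections import defaultdict
from itertools import combinations


def tf_projection(links):
    """Adjacency sets of the TF projection (TFs sharing >= 1 target gene)."""
    regulators = defaultdict(set)
    tfs = set()
    for tf, rg in links:
        regulators[rg].add(tf)
        tfs.add(tf)
    adj = {tf: set() for tf in tfs}
    for tf_set in regulators.values():
        for a, b in combinations(tf_set, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def average_clustering(adj):
    """Mean of C_i = 2 g_i / (k_i (k_i - 1)); nodes with k_i < 2 count as 0."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # g: number of links among the neighbours of this node.
        g = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * g / (k * (k - 1))
    return total / len(adj)


def remove_tfs(links, fraction, directed, rng):
    """Step 1: drop a fraction of TF nodes, most connected first or at random."""
    degree = defaultdict(int)
    for tf, _ in links:
        degree[tf] += 1
    tfs = sorted(degree)
    n_remove = int(round(fraction * len(tfs)))
    if directed:
        removed = set(sorted(tfs, key=lambda t: -degree[t])[:n_remove])
    else:
        removed = set(rng.sample(tfs, n_remove))
    return [(tf, rg) for tf, rg in links if tf not in removed]


# Hypothetical toy network (not data from the paper).
toy_links = [("A", "g1"), ("B", "g1"), ("C", "g1"),
             ("A", "g2"), ("D", "g2"), ("A", "g3")]
rng = random.Random(0)

# Steps 2 and 3: evaluate the clustering for several removed fractions.
for fraction in (0.0, 0.25, 0.5):
    attacked = remove_tfs(toy_links, fraction, directed=True, rng=rng)
    print(fraction, average_clustering(tf_projection(attacked)))
```

Passing `directed=False` switches the same loop from directed attacks to random failures.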
Our calculations reveal that random removal of nodes (random failures) has almost no effect on the E. coli and S. cerevisiae RG- and TF-projected networks, both original and randomized: the clustering coefficient remains quite similar to its initial value even when 30% of TF or RG nodes are removed from the corresponding networks (data not shown). This behaviour is in agreement with the fault tolerance properties that characterize scale-free networks.
We have carried out a comparative analysis of the transcription-factor gene regulatory networks of E. coli and S. cerevisiae. This analysis consisted of measuring a number of statistical properties on the TF and RG projections of both networks, as well as on randomized versions of them. Some interesting observations arising from these measurements are:
• The ratio of transcription factor to regulated gene number is about 0.116 in E. coli, and about 0.036 in S. cerevisiae.
• The distributions of link counts of the E. coli and S. cerevisiae bipartite networks are very much alike; they can be approximately fitted by a decreasing power-law function.
• The connectivity distributions of the E. coli and S. cerevisiae RG-projected networks are very different, as are the connectivity distributions of the corresponding TF projections.
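As a quick check of the first observation, the ratios can be recomputed from the node counts given for the two networks (153 TFs and 1319 RGs in E. coli; 157 TFs and 4410 RGs in S. cerevisiae). The symbols N_TF and N_RG are notation introduced here for the numbers of transcription factors and regulated genes.

```latex
\[
\left.\frac{N_{\mathrm{TF}}}{N_{\mathrm{RG}}}\right|_{E.\ coli}
  = \frac{153}{1319} \approx 0.116,
\qquad
\left.\frac{N_{\mathrm{TF}}}{N_{\mathrm{RG}}}\right|_{S.\ cerevisiae}
  = \frac{157}{4410} \approx 0.036,
\qquad
\frac{153/1319}{157/4410} \approx 3.26 .
\]
```

With these counts, the TF to RG ratio of E. coli is a little more than three times that of S. cerevisiae.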
Intriguingly, the connectivity distributions associated with the projected networks of E. coli are quite different from those corresponding to S. cerevisiae, whereas the connectivity distributions of the original bipartite networks are alike. A possible explanation for these differences is that the nodes of the E. coli and S. cerevisiae networks have different connection patterns. However, when the same measurements were carried out on networks preserving the number of RG and TF nodes, as well as the number of links incoming to and outgoing from each node, but in which the links had been randomly reconnected, we observed that their projections have connectivity distributions very similar to those of the corresponding original networks. Therefore, we conclude that the above-mentioned differences are mostly due to the very dissimilar ratios of RG to TF numbers in the E. coli and S. cerevisiae networks.
We also measured the clustering coefficient (C), the communication efficiency (E), and the cost (σ) of the RG- and TF-projected networks of E. coli and S. cerevisiae, both original and randomized. The values of all these quantities associated with the E. coli networks differ from those of S. cerevisiae. However, the E, C, and σ values of both original RG projections are very similar to those of their randomized counterparts. Moreover, the following relations are satisfied for the TF projections of both species: E < E_rand, C < C_rand, and σ < σ_rand. Recall that the randomized networks have the same number of TF and RG nodes, as well as the same number of links for every node.
When a more restrictive rule was used to perform the projections onto the TF nodes, we observed important differences between the original networks and their randomized counterparts, for both species. These results suggest that the transcriptional regulatory networks involve different levels of co-regulation. Finally, in agreement with the assertion above, the RG- and TF-projected networks of both species show similar robustness properties to directed attacks on the RG and TF nodes.
In all the results discussed above, we have seen that the original and randomized RG-projected networks behave very similarly, for both E. coli and S. cerevisiae. In contrast, the properties of the original TF-projected networks deviate from those of their randomized counterparts, but these deviations are relatively small, and they are of the same kind in E. coli and in S. cerevisiae. In our view, these coincidences reinforce our previous assertions that the differences observed between the E. coli and S. cerevisiae networks are mainly due to their very dissimilar ratios of RG to TF numbers, and not to their nodes having very different connection patterns. Moreover, the fact that the original TF projections are consistently different from their randomized versions indicates, in our view, that the development of the TF connection patterns has been subject to strong evolutionary stresses, in contrast to those of the regulated genes.
E. coli is a prokaryotic organism while S. cerevisiae is eukaryotic. This means that important differences can be observed between them regarding processing of mRNA before translation, DNA packing, amount of junk DNA, and gene regulation. From the results described above we conclude that the most important effect such differences have had on the development of the corresponding transcription-factor gene regulatory networks is their very different ratio of TF to RG counts: it is more than three times larger in the E. coli network (about 0.116) than in the S. cerevisiae network (about 0.036). Our calculations reveal that both species' gene regulatory networks have very similar connection patterns between the RG and TF nodes, despite their very different numbers.
The interaction dataset for the transcription-factor regulatory network of E. coli was obtained from the RegulonDB database. For S. cerevisiae we used the data described in . For E. coli, the resulting network has 1402 genes, with 153 transcription factors and 1319 regulated genes. In S. cerevisiae, the resulting network has 4441 genes, with 157 transcription factors and 4410 regulated genes.
The transcription-factor gene regulatory networks of E. coli and S. cerevisiae are bipartite because they consist of two different kinds of nodes: transcription factors (TF) and regulated genes (RG), with the links directed from the TF to the RG nodes. The bipartite networks can be projected either onto networks comprising only transcription factors, or onto networks comprising only regulated genes. The projections onto transcription factors are constructed by linking every two nodes regulating at least one common gene; similarly, the projections onto regulated genes are made by linking every two regulated genes sharing at least one common regulator.
Randomized versions of the bipartite networks were built using the following algorithm:
• Given a bipartite network, we made a list of all the RG nodes, repeating each node as many times as the number of links incoming to it in the bipartite network. A similar list was made for the TF nodes, using the number of links outgoing from each of them. From the way they were constructed, the number of elements in each of these lists equals the total number of links in the bipartite network.
• One RG and one TF were selected at random and eliminated from the lists above, and a link was established between these RG and TF nodes in the new randomized bipartite network. This step was repeated iteratively until the lists built in the previous step were empty.
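The two steps above can be sketched in a few lines. This is an illustrative sketch under assumptions: the function name `randomize_bipartite` and the toy network are hypothetical, and shuffling one list and pairing it position by position with the other is used here as an equivalent of drawing one RG and one TF at random until both lists are empty. The text does not say how repeated TF-RG pairs were handled, so the sketch leaves them in place.

```python
import random
from collections import Counter


def randomize_bipartite(links, rng):
    """Rewire a bipartite network while keeping every node's link count.

    links: list of (tf, rg) pairs. Returns a new list of (tf, rg) pairs.
    """
    # Each TF appears once per outgoing link, each RG once per incoming link.
    tf_list = [tf for tf, _ in links]
    rg_list = [rg for _, rg in links]
    # A random permutation of one list pairs the entries uniformly at random.
    rng.shuffle(rg_list)
    # Note: the same (tf, rg) pair may occur more than once in the result.
    return list(zip(tf_list, rg_list))


# Hypothetical toy network (not data from the paper).
toy_links = [("A", "g1"), ("A", "g2"), ("A", "g3"),
             ("B", "g1"), ("B", "g2"), ("C", "g3")]
shuffled = randomize_bipartite(toy_links, random.Random(1))

# Link counts per node are unchanged by the rewiring.
print(Counter(tf for tf, _ in shuffled) == Counter(tf for tf, _ in toy_links))
print(Counter(rg for _, rg in shuffled) == Counter(rg for _, rg in toy_links))
```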
The randomized networks constructed in this way have the same number of RG and TF nodes, and each node has the same number of incoming or outgoing links, as the corresponding original network.
A node's clustering coefficient is by definition the probability that any two of its nearest neighbours are connected to each other. Thus, the clustering coefficient can be calculated as the number of triangles with one vertex at the node divided by the total number of pairs of nearest neighbours. If g_i is the number of links connecting the k_i neighbours of node i, then the node clustering coefficient is given by:
C_i = 2 g_i / (k_i (k_i - 1)),
where k_i (k_i - 1)/2 is the maximum possible number of links between k_i nodes. The network average clustering coefficient was calculated by averaging over all the network nodes.
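A minimal sketch of this calculation is given below. The adjacency-set representation, the function names, and the toy graph are assumptions made for illustration; the convention that a node with fewer than two neighbours has a clustering coefficient of zero is also an assumption, since the formula is undefined in that case.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """C_i = 2 g_i / (k_i (k_i - 1)) for one node of an undirected graph."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: undefined cases count as zero
    # g: number of links that exist among the k neighbours of the node.
    g = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * g / (k * (k - 1))


def average_clustering(adj):
    """Average of the node clustering coefficients over all nodes."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)


# Hypothetical toy graph: a triangle (1, 2, 3) with a pendant node 4.
toy_adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
print(clustering_coefficient(toy_adj, 1))
print(average_clustering(toy_adj))
```

Node 1 has three neighbours with a single link among them, so its coefficient is 1/3, whereas nodes 2 and 3 each sit in a closed triangle and score 1.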
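The global communication efficiency of a network with N nodes is defined, following Latora and Marchiori, as:
E = (1/(N(N - 1))) Σ_{i ≠ j} 1/l_ij,
where l_ij is the minimum path length connecting nodes i and j.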
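The efficiency calculation can be sketched as follows, assuming the Latora and Marchiori definition of global efficiency, E = (1/(N(N - 1))) Σ_{i ≠ j} 1/l_ij, evaluated on the largest connected component as described in the Results. Breadth-first search is used here to obtain the minimum path lengths in an unweighted, undirected graph; the function names and the toy graphs are hypothetical.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: minimum path length from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def largest_component(adj):
    """Node set of the largest connected component."""
    seen, best = set(), set()
    for node in adj:
        if node not in seen:
            component = set(shortest_path_lengths(adj, node))
            seen |= component
            if len(component) > len(best):
                best = component
    return best


def global_efficiency(adj):
    """Average of 1 / l_ij over ordered node pairs of the largest component."""
    nodes = largest_component(adj)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for source in nodes:
        for target, dist in shortest_path_lengths(adj, source).items():
            if target != source:
                total += 1.0 / dist
    return total / (n * (n - 1))


# Hypothetical toy graphs: a path 1-2-3 with an isolated node 4, and a triangle.
path_adj = {1: {2}, 2: {1, 3}, 3: {2}, 4: set()}
triangle_adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(global_efficiency(path_adj))
print(global_efficiency(triangle_adj))
```

A fully connected graph reaches the maximum value of one, in line with the remark that an efficiency close to one means very short paths between any two nodes.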
The cost of a complex network with N nodes is defined as the ratio of the actual number of links to the maximum possible number of links between the network nodes (N(N - 1)/2).
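As a small illustration of this definition, the cost of an undirected network stored as adjacency sets could be computed as below. The function name `network_cost` and the toy graph are assumptions made for this sketch.

```python
def network_cost(adj):
    """Ratio of the actual number of links to the maximum N (N - 1) / 2."""
    n = len(adj)
    if n < 2:
        return 0.0
    # Each undirected link is stored twice, once in each endpoint's set.
    n_links = sum(len(nbrs) for nbrs in adj.values()) // 2
    return n_links / (n * (n - 1) / 2)


# Hypothetical toy graph: a triangle (1, 2, 3) with a pendant node 4.
toy_adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
print(network_cost(toy_adj))
```

The toy graph has 4 of the 6 possible links, so its cost is 2/3.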
This work was partially supported by Consejo Nacional de Ciencia y Tecnologia (Conacyt-project 49128-F-26020), COFAA-IPN, and EDI-IPN, México.
- Ma HW, Kumar B, Ditges U, Gunzer F, Buer J, Zeng AP: An extended transcriptional regulatory network of Escherichia coli and analysis of its hierarchical structure and network motifs. Nucleic Acids Res. 2004, 32: 6643-6649. 10.1093/nar/gkh1009
- Babu MM, Luscombe NM, Aravind L, Gerstein M, Teichmann SA: Structure and evolution of transcriptional regulatory networks. Curr Opin Struct Biol. 2004, 14 (3): 283-291. 10.1016/j.sbi.2004.05.004
- Rodríguez-Caso C, Medina MA, Solé RV: Topology, tinkering and evolution of the human transcription factor network. FEBS J. 2005, 272: 6423-6434. 10.1111/j.1742-4658.2005.05041.x
- de Silva E, Stumpf PH: Complex networks and simple models in biology. J R Soc Interface. 2005, 2: 419-439. 10.1098/rsif.2005.0067
- Louzoun Y, Muchnik L, Solomon S: Copying nodes versus editing links: the source of the difference between genetic regulatory networks and the WWW. Bioinformatics. 2006, 22 (5): 581-588. 10.1093/bioinformatics/btk030
- Radulescu O, Lagarrigue S, Siegel A, Veber P, Le Borgne M: Topology and static response of interaction networks in molecular biology. J R Soc Interface. 2006, 3: 185-196. 10.1098/rsif.2005.0092
- Balazsi G, Barabási AL, Oltvai N: Topological units of environmental signal processing in the transcriptional regulatory network of Escherichia coli. PNAS. 2005, 102: 7841-7846. 10.1073/pnas.0500365102
- Dobrin R, Beg KQ, Barabási AL, Oltvai NZ: Aggregation of topological motifs in the Escherichia coli transcriptional regulatory network. BMC Bioinformatics. 2004, 5: 10. 10.1186/1471-2105-5-10
- Guelzim N, Bottani S, Bourgine P, Képes F: Topological and causal structure of the yeast transcriptional regulatory network. Nat Genet. 2002, 31: 60-63. 10.1038/ng873
- Shen-Orr SS, Milo R, Mangan S, Alon U: Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet. 2002, 31: 64-68. 10.1038/ng881
- Albert R, Jeong H, Barabási AL: Error and attack tolerance of complex networks. Nature. 2000, 406.
- Salgado H, Gama-Castro S, Peralta-Gil M, Díaz-Peredo E, Sánchez-Solano F, Santos-Zavaleta A, Martínez-Flores I, Jiménez-Jacinto V, Bonavides-Martínez C, Segura-Salazar J, Martínez-Antonio A, Collado-Vides J: RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions. Nucleic Acids Res. 2006, 34 (Database issue).
- Balaji S, Iyer LM, Aravind L, Madan Babu M: Uncovering a hidden distributed architecture behind scale-free transcriptional regulatory networks. J Mol Biol. 2006, 360: 204. 10.1016/j.jmb.2006.04.026
- Latora V, Marchiori M: Efficient Behavior of Small-World Networks. Phys Rev Lett. 2001, 87: 198701. 10.1103/PhysRevLett.87.198701