Evolutionary scheme of alpha-proteobacteria
To construct a robust phylogeny of alpha-proteobacteria, a dataset of eight universal proteins was downloaded from the Ribosomal Database Project. The proteins were aligned separately and the alignments were then concatenated, yielding an alignment of 5056 amino acids that was used to construct a Neighbor-Joining tree (see Methods; Figure 2). A comparison with previous phylogenetic trees of alpha-proteobacteria [42] suggested that the tree root lies in the branch connecting the Rickettsiales to the other alpha-proteobacteria. Unlike the work of Gupta and Mok (2007), our tree shows Sphingomonadales branching off after the Rickettsiales, followed by Rhodospirillales, whereas in other works the latter branched first.
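The concatenation step can be illustrated with a minimal sketch. It assumes each per-protein alignment is held as a dictionary mapping a taxon name to its aligned sequence; the function name and the gap-padding of taxa missing from an alignment are illustrative assumptions, not a description of the procedure actually used for Figure 2.

```python
def concatenate_alignments(alignments):
    """Join per-protein alignments column-wise into one supermatrix.

    alignments: list of dicts {taxon: aligned_sequence} (hypothetical layout).
    Taxa missing from one alignment are padded with gaps (an assumption).
    """
    taxa = sorted(set().union(*(aln.keys() for aln in alignments)))
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy example with invented sequences and taxon labels.
protein_1 = {"taxonA": "MK-L", "taxonB": "MKAL"}
protein_2 = {"taxonA": "GG", "taxonB": "GA", "taxonC": "G-"}
supermatrix = concatenate_alignments([protein_1, protein_2])
```

Every row of the resulting supermatrix has the same length, which is what a distance-based method such as Neighbor-Joining requires.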
Clustering of alpha-proteobacteria based on the orthology of cell cycle genes
By using the bidirectional best hit (BBH) approach (see Material and Methods section) on 65 available genomes of alpha-proteobacteria (Additional file 1, Table S1; legends of additional Figures and Tables are in Additional file 2) we obtained a list of genes that are orthologous to the 14 genes involved in the Caulobacter cell cycle, and the results are reported in Figure 2 (see also Additional file 3, Figure S1 and Additional file 4, Table S2).
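The logic of the bidirectional best hit criterion can be sketched as follows. This is a minimal illustration, assuming similarity-search results are already available as score tables; the gene identifiers, the scores and the function names are hypothetical, and the thresholds described in the Methods are not reproduced here.

```python
def best_hits(scores):
    """Return {query: best-scoring subject} from {(query, subject): score}."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Invented scores: reference genes searched against a target genome and back.
ref_vs_target = {
    ("ctrA_ref", "geneX"): 300,
    ("ctrA_ref", "geneY"): 120,
    ("divK_ref", "geneY"): 150,
}
target_vs_ref = {
    ("geneX", "ctrA_ref"): 295,
    ("geneY", "ctrA_ref"): 118,
    ("geneY", "divK_ref"): 90,
}
orthologs = bidirectional_best_hits(ref_vs_target, target_vs_ref)
```

In this toy case only the ctrA_ref/geneX pair is reciprocal: geneY is the best hit of divK_ref, but its own best hit in the reference genome is ctrA_ref, so no ortholog is called for divK_ref.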
ClpX, ClpP and DnaA are present in all alphas studied, but surprisingly all other proteins analyzed can be absent in several alphas. Transcription factors, GcrA and CtrA, the DNA methyl-transferase CcrM and the hybrid histidine kinase CckA are present in most of the alpha-proteobacteria. Other modules, such as those of the DivJ-PleC-DivK two-component system, are present only in clusters A and C of alpha-proteobacteria.
Genes with similar phylogenetic profiles (genes co-occurring in different genomes) are often functionally related [43], which justifies the use of our profiles to investigate possible functional relationships. The dendrogram obtained describes how similar the profiles of different genes are (see Methods and upper part of Figure 2). It confirms the functional association between divJ, pleC and divK (encoding the two-component system negatively regulating CckA activity), and between cpdR and rcdA, whose products are involved in CtrA proteolysis. Weaker and possibly new functional associations concern the gene pairs ccrM/gcrA and divL/chpT. In addition, we visually inspected the phylogenetic profiles of these genes across organisms, identifying seven groups (from A to G, see Figure 2). This classification, based on orthology, will be used as a reference in the following sections.
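How profile similarity can be quantified is sketched below. The presence/absence vectors are invented toy data, and the Jaccard distance is an assumption chosen for illustration; the measure actually used to build the dendrogram is the one described in the Methods.

```python
from itertools import combinations


def jaccard_distance(profile_a, profile_b):
    """1 - (genomes with both genes) / (genomes with at least one gene)."""
    both = sum(1 for x, y in zip(profile_a, profile_b) if x and y)
    either = sum(1 for x, y in zip(profile_a, profile_b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


# Toy profiles: one column per genome, 1 = ortholog found, 0 = not found.
profiles = {
    "divJ": [1, 1, 0, 0, 0],
    "pleC": [1, 1, 0, 0, 0],
    "divK": [1, 1, 1, 0, 0],
    "ctrA": [1, 1, 1, 1, 0],
}

distances = {
    (a, b): jaccard_distance(profiles[a], profiles[b])
    for a, b in combinations(sorted(profiles), 2)
}
closest_pair = min(distances, key=distances.get)
```

Genes with identical profiles have distance 0 and would be joined first in a hierarchical clustering, which is how co-occurring modules show up as tight groups in a dendrogram.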
Cluster A includes Rhizobiales, Caulobacterales and several Rhodobacterales, and is composed of the largest number of sequenced genomes; this cluster is characterized by a nearly identical conservation of factors found in Caulobacter. Although similarities are evident in this cluster, a deeper analysis revealed substantial differences that will be discussed in the next sections.
Cluster B, including other Rhodobacterales, shows a substantial difference compared to cluster A; in fact, both DivJ-PleC-DivK and RcdA-CpdR systems are missing.
Magnetospirillum and Rhodospirillum, which are closely related, are the two members of cluster C. This cluster is characterized by a PleC-DivK system lacking DivJ and by an almost complete loss of the CpdR-RcdA system, although Magnetospirillum possesses an rcdA ortholog.
Clusters D (Rhodospirillales) and F (part of Rickettsiales), even though they are separated in the tree reported in Figure 2, share common features: (i) members of the CckA-ChpT-CtrA phosphorelay are missing to different degrees (Granulibacter and Pelagibacter do not even show ctrA orthologs); (ii) the DivJ-PleC-DivK and RcdA-CpdR systems are missing. Despite these similarities, the two groups differ in the presence of CcrM and GcrA in group D.
Organisms belonging to cluster E (Sphingomonadales) show conservation of the phosphorelay CckA-ChpT-CtrA and also often possess factors required for the temporally and spatially regulated proteolysis of CtrA, such as CpdR and RcdA. However, cluster E is characterized by degeneration or a complete loss of the DivJ-PleC-DivK regulation system. DivK or PleC orthologs can be found in several organisms of this subgroup, although their phylogeny often deviates significantly from the phylogenetic tree of housekeeping genes (Additional file 5, Figure S2).
Finally, group G (remaining Rickettsiales), composed mainly of pathogens, has few of the factors involved in Caulobacter cell cycle progression regulation. It is however interesting to find a CckA-CtrA system whereas ChpT orthologs cannot be found using the BBH approach.
CtrA regulon in alpha-proteobacteria
The regulatory circuit that controls cell cycle progression in Caulobacter is also composed of crucial transcriptional connections, such as CtrA control on divK and the CtrA-DnaA-GcrA-CcrM circuit. This transcriptional regulation level is discussed in this section and the following. In particular, results obtained for the prediction of the CtrA regulon in alpha-proteobacteria are described here.
Laub and collaborators [13, 44] identified a set of genes plausibly constituting the CtrA regulon in Caulobacter by combining several lines of evidence: 116 genes were identified through chromatin immunoprecipitation using phosphorylated CtrA; 88 genes were identified as CtrA-dependent for normal expression levels, and 69 as cell cycle regulated in a transcriptome analysis encompassing one complete cell cycle round. The 54 genes within the overlap of all three data sets were identified as members of the CtrA cell cycle regulon, and were used here to build the position weight matrix (PWM) of CtrA binding sites. Upstream sequences of these 54 genes were retrieved and used to find enriched sequence motifs using AlignAce [45]. The PWM obtained (Additional file 6, Table S3) corresponds to a 16-mer containing the known CtrA binding motif and was used in a sliding window approach on a non-redundant subset of the genomes used in this work. The output assigns each gene in a given genome a score based on the presence of CtrA sequence motifs in the region spanning from 400 nucleotides upstream of the start codon to 100 nucleotides within the coding sequence (see Methods for details).
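The sliding window scan can be illustrated with a minimal sketch. The aligned sites below are invented 5-mers rather than the 16-mer of Additional file 6, and the log-odds scoring with a pseudocount and a uniform background is an assumption made for the example; the scoring scheme actually used is the one given in the Methods.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds matrix (one dict per column) from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; expects uppercase A/C/G/T only."""
    width = len(pwm)
    return [
        (start, sum(pwm[i][sequence[start + i]] for i in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


def best_window(sequence, pwm):
    """Return (start, score) of the highest-scoring window."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])


# Toy data: hypothetical aligned sites and a short hypothetical promoter.
toy_sites = ["TTAAC", "TTAAC", "TTAAT", "TTAAC"]
toy_pwm = build_pwm(toy_sites)
start, score = best_window("GGCGTTAACGGC", toy_pwm)
```

In a genome-wide application the same scan would be run on the 500 bp promoter window of each gene, on both strands, and the best window score would be kept as the score of that gene.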
Is the CtrA PWM, based on Caulobacter data, valid for all the alphas analyzed here? One might speculate that if a ctrA gene taken from an alpha is able to complement deletion in Caulobacter, the PWM built on Caulobacter ctrA-controlled genes would also be valid for the bacterium where the complementing ctrA comes from. It has been shown that the ctrA gene from R. prowazekii, named czcR, is able to complement the deletion of ctrA in Caulobacter, confirming that the functionality (that is, the binding site) is conserved between Rickettsia and Caulobacter [46]. Moreover, other ctrA genes from species taxonomically closer to Caulobacter, such as S. meliloti, are able to complement the ctrA deletion in Caulobacter (Biondi, unpublished data). Considering the phylogeny of alphas and positions in the tree of R. prowazekii and S. meliloti, it is reasonable to consider that the CtrA binding site might be substantially conserved across the alphas.
Two kinds of results from this analysis are shown here: (i) CtrA target genes belonging to our starting dataset of cell cycle related genes (Figure 3) and (ii) enrichment of COG (clusters of orthologous groups of proteins) categories of genome-wide CtrA regulons for each genome analyzed (Figure 4).
In Figure 3 (see also Additional file 7, Table S4) we show the p-values for the presence of CtrA binding sites upstream of analyzed genes. CtrA controls the transcription of several genes involved in regulation of cell cycle progression, including itself in most of the alphas analyzed (86%). Moreover, the number of genes controlled by CtrA is maximal in cluster A. In this cluster CtrA controls at least one gene of each of the following systems: DivJ-DivK-PleC, CpdR-RcdA-ClpPX and GcrA-DnaA-CcrM. Several genes showed evolutionary conservation of CtrA control among members of cluster A, such as DivJ, RcdA and CcrM. The second important result arising from the analysis shown in Figure 3 was that in Cluster B, where the DivJ-PleC-DivK system is missing, CtrA controls both CckA and DivL.
Each genome-wide regulon of CtrA was defined as the list of genes with a Z-score (see Methods) ≥2 (corresponding to a p-value of ca. 0.023) in an organism. In Figure 4 (see also Additional file 8, Table S5), predicted regulons of CtrA were analyzed for functional enrichment (percentage of genes in a COG category controlled by CtrA) in genes belonging to functional categories. Most enriched categories were Signal transduction mechanisms (enriched in 15 organisms), Cell wall/membrane/envelope biogenesis (enriched in 10 organisms) and Cell motility (enriched in 9 organisms), while cell cycle functions were enriched in six species, belonging to cluster A, including Caulobacter and Neorickettsia sennetsu of cluster F. These results confirmed experimental data on the functions controlled by CtrA (see Background section) in Caulobacter suggesting that (i) the analysis of the regulon is able to capture good candidates of CtrA targets and (ii) the control of CtrA over these functions is at least partially evolutionarily conserved.
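The correspondence between the Z-score cutoff and the quoted p-value can be checked with a short sketch. It assumes a one-sided test under a standard normal distribution, and it assumes that Z-scores are computed against the genome-wide mean and standard deviation of the motif scores; both are illustrative assumptions, and the exact definition is the one given in the Methods. Gene names and scores are invented.

```python
import math
from statistics import mean, pstdev


def upper_tail_p(z):
    """One-sided p-value P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def z_scores(scores):
    """Standardize per-gene scores; requires at least two distinct values."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    return {gene: (s - mu) / sigma for gene, s in scores.items()}


def predicted_regulon(scores, threshold=2.0):
    """Genes whose Z-score reaches the threshold, sorted by name."""
    return sorted(g for g, z in z_scores(scores).items() if z >= threshold)


p_at_cutoff = upper_tail_p(2.0)

# Toy genome: ten genes without a motif and one with a strong motif score.
toy_scores = {f"gene{i}": 0.0 for i in range(10)}
toy_scores["target"] = 10.0
toy_regulon = predicted_regulon(toy_scores)
```

With a cutoff of 2, `p_at_cutoff` evaluates to roughly 0.023, in agreement with the value quoted above.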
CtrA-DnaA-GcrA-CcrM connections
In Caulobacter, transcriptional regulation of ctrA is based on a positive feedback loop that includes CtrA itself, GcrA, DnaA and CcrM.
As reported in Figure S3 (Additional file 9; see also Additional file 7, Table S4), the presence of CcrM methylation sites upstream of cell cycle related genes was assessed (see Methods section). For this analysis we used the consensus methylation site of CcrM, i.e. GANTC [47]. As reported elsewhere [39, 48], methylation by CcrM is conserved among alphas, and ccrM genes from Caulobacter and S. meliloti are able to cross-complement deletions, suggesting that the methylation site might be conserved. In DNA with a uniform base composition, the expected frequency of this sequence is 4/1024 per nucleotide, i.e. about two occurrences in the 500 bp window that was used to define the promoter region for motif finding. The average number of occurrences found ranged from 0.4 to 1.5 methylation sites per promoter (defined as the 500 bp from 400 bp upstream of the translation start site to the first 100 bp of a gene), reflecting the non-uniform nucleotide distributions of these genomes. Although some genes possess more predicted methylation sites in their promoter region, it is not possible to infer from this that methylation controls their expression.
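The expected number of GANTC sites under a uniform base composition, and the counting of sites in a promoter, can be sketched as follows. The function names and the test sequence are hypothetical. Because GANTC is identical to its reverse complement, scanning one strand is sufficient.

```python
import re


def expected_gantc(window_length):
    """Expected GANTC count in a window with equiprobable bases.

    Four of the five positions are fixed, so a site starts at any given
    position with probability (1/4)**4 = 1/256 (equivalently 4/1024).
    """
    start_positions = window_length - 5 + 1
    return start_positions * 0.25 ** 4


def count_gantc(sequence):
    """Count GANTC sites; the lookahead reports every start position."""
    return len(re.findall(r"(?=GA[ACGT]TC)", sequence.upper()))


expected_in_promoter = expected_gantc(500)
observed = count_gantc("ttGAATCccGAGTCaGATC")
```

For a 500 bp window the expectation is just under two sites, so the observed averages of 0.4 to 1.5 sites per promoter lie below what a uniform base composition would predict.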
As for CtrA, one may ask whether the DnaA binding site is conserved across the alpha-proteobacteria. The consensus site of DnaA is highly conserved among bacteria: DnaA experimental binding sites in E. coli and B. subtilis differ by only one nucleotide [49, 50] and correspond to the site found in Caulobacter, supporting the conclusion that the binding site is conserved across alphas. Promoter regions of orthologs in all alphas were therefore scanned using the DnaA matrix based on 15 known DnaA motifs in Caulobacter (see Methods section), and results are shown in Figure 5 (see also Additional file 7, Table S4). According to this analysis, the promoter sequence of gcrA does not bear a significant DnaA motif in Caulobacter, although control of the gcrA promoter by DnaA has been proposed elsewhere [51]. It is worth noting that a predicted DnaA motif is present upstream of the gcrA gene in two closely related species, Hyphomonas neptunium and Maricaulis maris. Moreover, in Caulobacter, DnaA binding sites were observed upstream of divJ and cckA, targets that are not confirmed by previous experimental analysis. As in the case of GcrA, putative binding sites were not conserved at the taxonomic level, the only exception being the suggested DnaA control on CtrA in several species. This might be caused by the low specificity of the DnaA and GcrA matrices used for motif finding, but may also suggest that in some alpha-proteobacteria the DnaA->GcrA->CtrA circuit is simplified by excluding GcrA. The control of DnaA on CcrM is also interesting because it mainly concerns those organisms where ccrM lacks CtrA binding sites. The opposite is also true: in Rhizobia, ccrM is very often preceded by CtrA binding sites and lacks DnaA motifs.
This questions the existence of a DnaA->CcrM->CtrA circuit in Rhizobia and suggests an at least partial decoupling of CtrA activity (modulated through control of the divJ-pleC-divK system, which acts on the phosphorelay) from DNA replication triggered by DnaA. Otherwise, the absence of CtrA binding sites upstream of DnaA might suggest the existence of other not yet identified regulators, which may connect CtrA and DnaA in these organisms.
Although GcrA is considered a DNA binding protein activating transcription, no experimental evidence demonstrating this behavior has been reported [19]. However, it is still possible that specific DNA sequences are associated with the presence of this factor. Putative GcrA binding sites were searched for in promoter regions of genes encoding factors involved in cell cycle regulation using a strategy similar to the one followed to predict CtrA regulons (see Methods section). In C. crescentus, as reported in Figure 5 (see also Additional file 7, Table S4), the existence of a putative GcrA binding motif (Additional file 10, Figure S4) upstream of ctrA and the presence of such motifs upstream of divJ were confirmed. In the other species, patterns of occurrence did not follow phylogenetic relationships. The only gene for which most of the organisms seem to possess a GcrA binding site is clpX.
Verification of binding site prediction
The prediction of CtrA and DnaA binding sites across alphas is based on Caulobacter data, and we have already discussed how previous studies make it possible to hypothesize that CtrA and DnaA (and also CcrM) binding sites are conserved across the alpha-proteobacteria. However, our predictions might be accurate only for bacteria closely related to Caulobacter, with confidence decreasing at greater phylogenetic distances. To evaluate this bias in binding site prediction, we counted the number of genes in each genome putatively controlled by CtrA and DnaA, normalized for the genome size. We found (Figure 6A) that the number of predicted genes is fairly constant and depends only on the genome size (or number of genes), suggesting that our prediction confidence is not biased by the phylogenetic distance. This result also explains the success of the complementations of ctrA deletion in Caulobacter by orthologs from other alpha-proteobacteria, as discussed in the previous sections [46, 52]. We also evaluated whether the presence of genes predicted to be controlled by CtrA and DnaA depended on the presence of CtrA and DnaA themselves in the genomes or was an artifact of the bioinformatic analysis. We therefore plotted the fraction of genes controlled by CtrA and DnaA at small p-values in three alpha-proteobacteria possessing CtrA and DnaA and in E. coli and B. subtilis, which possess only DnaA (Figure 6B). From this analysis it is evident that, at lower p-values, only organisms with CtrA keep a consistent fraction of genes controlled by CtrA, while for DnaA, which is present and active in all of them, every organism maintains a similar fraction of putatively controlled genes, even at lower p-values.
The CtrA binding site consensus has previously been tested experimentally in R. prowazekii, S. meliloti and B. abortus, besides Caulobacter [38, 46, 52]. Here, we compared the experimental consensus sequences with our bioinformatic PWM (Figure 6C), and our prediction coincides with the experimentally identified sequences. Our PWM also corresponds to a previously reported CtrA PWM [53].
The DnaA binding site has been studied in very diverse bacteria such as Gram-negative Escherichia coli and Gram-positive Bacillus subtilis [49, 50]. The DnaA binding sites in these two species differ by a single nucleotide, suggesting that the binding site should also be highly conserved in alpha-proteobacteria. We compared the predicted PWM for DnaA based on C. crescentus with experimental DnaA binding sites in E. coli, B. subtilis and S. meliloti (Figure 6C) [49, 50, 54]. Our prediction, based on nucleotide sequences that bind DnaA in Caulobacter, corresponds to binding sites experimentally found in other bacteria.
This verification was possible only for DnaA and CtrA, while GcrA has been studied only in C. crescentus and experimental data are available only in this organism. It has not been clarified whether GcrA binds DNA directly or through an unknown factor X [19]. Therefore, since the knowledge on GcrA is still preliminary and experimental work still needs to be done, we limited the experimental validation to CtrA and DnaA, for which data are available. It should be noted, however, that both DnaA and CtrA experimental verifications revealed that our method is accurate and reliable.
Reconstruction of regulatory circuits
Based on data of Figures 2, 3, and 5, we reconstructed the architecture of the seven clusters (A to G) found in the BBH analysis; as discussed below, only four clusters revealed a defined architecture as illustrated in Figure 7. Models of CtrA regulation are shown in the clusters (clusters A, B, C and E) where interactions between factors were found. This modeling is essential in order to underline differences and conservation of several features of cell cycle regulation in alpha proteobacteria.
Cluster A (Caulobacterales, Rhizobiales and several Rhodobacterales) contains the largest number of genomes analyzed here and the organization of cell cycle genes resembles that observed in Caulobacter (see bottom part of Figure 7 for details), i.e. it includes the phosphorelay CckA-ChpT-CtrA/CpdR and also the proteolysis machinery composed of the ubiquitous ClpPX protease, CpdR and RcdA, the latter being absent, however, in Xanthobacter. The DivJ-PleC-DivK system is conserved and corresponding genes are controlled by CtrA in all members of the cluster. CcrM also controls CtrA and GcrA.
Rhizobiales differ from the Caulobacter-like bacteria (B. japonicum, P. lavamentivorans and M. maris) in lacking the CtrA control on GcrA, which is present only in the Caulobacter-like group.
In cluster E, the CckA-ChpT-CtrA phosphorelay is present, with the second branch also leading to phosphorylation of CpdR, which, together with RcdA, is thought to be involved in controlling CtrA proteolysis. DivK is absent in this cluster, although a divK-like gene with an anomalous phylogeny has been found (Additional file 3, Figure S1). In most members of cluster E, CtrA controls its own promoter, other genes involved in cell division and chromosome partitioning, as well as ccrM.
In cluster B, the CckA-ChpT-CtrA regulon is isolated from GcrA, CcrM and DnaA while CtrA controls itself as well as two factors involved in its phosphorylation, CckA and DivL.
In this group, DivL lacks the kinase domain that is usually present only when DivK is also present (data not shown, based on SMART database). In fact, DivK is absent in cluster B together with its kinase/phosphatase. The fact that CtrA controls its own kinase theoretically creates a feedback loop.
In cluster C, the CckA-ChpT-CtrA phosphorelay is present while CpdR is absent. In addition, CtrA is not transcriptionally connected with DnaA and CcrM, whereas DnaA has binding sites on ccrM and divL. Connections between DnaA/CcrM and CtrA seem to be achieved by the PleC-DivK two-component system. No clear positive or negative transcriptional feedbacks of CtrA on other cell cycle factors are present.
Cluster D contains the two cases among alphas, Granulibacter and Pelagibacter, where a ctrA ortholog has not been found. The phosphorelay, even in organisms of cluster D that have CtrA, is degenerated; although cckA is present, no orthologous gene of chpT has been found. There is no explanation for the presence of CckA in organisms with no CtrA. Since histidine phosphotransferases are difficult to annotate [10, 20], it is possible that other phosphotransferases substitute for ChpT in those organisms containing both CtrA and CckA, as proposed here for cluster G.