Quantification of cell size and reporter expression across the KEIO collection
The goal of this study is to comprehensively characterize how loss of gene function impacts two key optimization parameters in synthetic biology and metabolic engineering: cell biomass and heterologous gene expression. To do this, we quantified the effect of every single deletion of non-essential genes in E. coli on cell size and the expression levels of two constitutively expressed synthetic reporter genes. A genetic probe containing the mVenus and mCherry genes expressed from identical promoter-5′-UTR sequences [12] was transformed into each of the 3822 gene knockout strains of the KEIO collection [11] and into two independent cultures of the wild-type parent strain (E. coli BW25113) (Fig. 1a). Each of the 3824 strains carrying the probe was grown from a mixture of 2–3 single colonies picked from agar plates to mitigate potential colony-to-colony variability.
Single-cell measurements of mVenus and mCherry fluorescence, along with cellular physical parameters, were acquired with a flow cytometer at the mid-log phase of growth. Fully replicating these measurements on the whole library was not practical. To estimate the experimental error associated with plate-wise measurement, we performed replicate measurements of 180 strains picked from three different plates on 4 different days. Measurement errors for both mVenus and mCherry were approximately one order of magnitude smaller than the variance measured across all KEIO strains for these variables (Additional file 1). This readily demonstrates substantial impact of the gene deletions on heterologous expression. Although forward scattered light (abbreviated FSC) can be effectively used to measure microbial cell size in flow cytometry [12], it does not necessarily scale linearly with particle sizes on all instruments [13, 14]. We used beads to verify that this was the case in our instrument within the size range typical of an E. coli cell (~2μm, Additional file 1: Fig. S1). We therefore used FSC as a proxy for cellular size (S).
As we followed the original layout of the Keio collection [11], our strains are not distributed randomly amongst plates. In fact, genes with similar functions were occasionally grouped together in the same plate. For example, many genes encoding chemotactic and flagellar proteins are clustered in plate #45 (Additional file 1: Fig. S2). This non-random arraying of strains could have introduced bias in our measurements. However, we did not observe significant plate-specific shifts of median fluorescence in relation to the whole dataset distribution (Additional file 1: Figs. S3-S6). Re-arraying of 180 strains and additional analysis further confirmed this conclusion (see Additional file 1). Therefore, to avoid the risk of introducing processing bias in the dataset, no further data normalization was performed.
The average fluorescence of mVenus and mCherry varied approximately four-fold and was strongly correlated across the 3.824 strains (r = 0.90, Fig. 1a) and with S (correlation 0.67 and 0.61 for mVenus and mCherry, respectively) (Fig. 1a). A change in cell size could indirectly affect heterologous gene expression in cases in which proteins are not sufficiently split between daughter cells during cell division (growth feedback) [9]. This scenario was supported by the observed positive correlation between FSC and fluorescence output (Fig. 1a). To quantify the specific effects of gene knockout on heterologous gene expression we needed to account for the influence of cell size variations (S). Measurements of mCherry and mVenus fluorescence were regressed against S. Pairwise averages of resulting residuals (mCreg and mVreg) were used as S-normalized measure of heterologous gene expression (E). To quantify the differential effect of knockouts on the individual reporter genes, we used the residuals obtained upon regressing mCreg and mVreg against E. This regression yielded identical sets of absolute values (residuals) that remained highly correlated with E (r = 0.94–0.98) and were used as proxy for gene-specific effects (G
spec
) (strains with significant difference between mCherry and mVenus fluorescence) (Additional file 1).
Most KEIO knockouts show a single-feature phenotype
To ease the analysis of associations between variables, we binned strains into groups of extreme phenotypic values (top and bottom 5% quantiles giving respectively S
high
/ S
low
and E
high
/ E
low
). These extreme values were homogenously distributed across the dataset (Fig. 1a, cyan dots), showing that the regression procedure did not introduce systematic biases. About 192 genes showed an extreme S or E value, whereas the number of genes with a G
spec
phenotype was 384.
Amongst the set of unique 578 strains thus selected, 81% presented exclusively one extreme E or S phenotype. A S
high
phenotype was very rarely combined with extreme expression (E
high
or E
low
, ~2% each). In contrast, the S
low
phenotype was significantly associated with E phenotypes (~7% each, p < 10−4, Fig. 1b, calculated by bootstrap against random occurrence). Extreme E and G
spec
phenotypes were found in combination in only 14–19% of strains. Notably, 33% (53/159) of genes with a S
low
shift also had a G
spec
phenotype compared with only 7.5% of those with a S
high
phenotype. This result suggests that deletions leading to smaller cells have larger chance to disrupt the balance in the expression of two synthetic genes than genetic perturbations increasing cell size (Fig. 1b).
We assessed the presence of functional enrichments in strains with a single S, E or G
spec
phenotype using DAVID Bioinformatics Resources [15]. The S
low
and S
high
categories did not present functional enrichment after Bonferroni correction for multiple-hypothesis testing. The E
low
group was significantly enriched in knockouts of genes involved in flagella assembly (GO:0044780 Bonferroni corrected, p < 0.05). Strains with a G
spec
phenotype corresponded to a diverse range of cell functions including transcription factors and enzymes involved in central carbon metabolism, with a significant enrichment in amino acid biosynthesis related genes (KEGG pathway, p < 10−2). Apart from a number of genes involved in purine nucleotide biosynthesis, the E and G
spec
phenotypes did not share substantial sets of cellular functions.
Detailed functional analysis of the size-expression relationship
To gain a better understanding of the role of different cell functions on the relationship between cell size and gene expression, KEIO knockouts populating all pairwise combinations of extreme phenotypes were investigated. To specifically investigate strong S – E associations, we only considered combinations where an extreme Z-score (St. Dev.-fold from the mean) for one variable was combined with a near-zero value (−0.5 to 0.5 range) for the other (compare one-feature categories in Fig. 2 and Fig. 1b). The presence of a G
spec
effect, which quantifies expression imbalance between the two reporter genes, was assessed in each of the observed S - E patterns. This method defined 16 phenotypic combinations corresponding to 401 strains. Combinations were arranged in a matrix with S
high
and E
low
placed at the top and bottom, respectively (Fig. 2). This arrangement exposed that most genes are characterized by a similar up/down shift in the S and E features. Only 30 (7.5%) of these strains presented a mixed phenotype (for example S
down
– E
high
). All phenotypes with >20 members were assessed for functional enrichment, while gene lists were reported for smaller groups.
Gene disruptions that severely affected both S and E phenotypes impaired major cellular functions. Only impairments in amino acid biosynthesis, and particularly in aromatic amino acids, led to an exclusive S
high
phenotype (Bonferroni corrected p < 10−1) (Fig. 2 group #1 brown genes). Genes with a significant S
high
or E
high
phenotype, either alone or in combination, were often related to cellular housekeeping functions (Fig. 2, Groups #2–8 orange genes). Intriguingly, impairing nucleotide biosynthesis primarily results in E
high
phenotype, either associated or not with a S or G
spec
effect (Fig. 2, Groups #4 and #5, brown genes). Altogether, these data show that gene disruptions in amino acid and nucleotide biosynthesis pathways trigger distinct phenotypic increases in cell size and generic gene expression, respectively.
Many knockouts with a S
low
phenotype involved nutrient and metal ion uptake, including phosphate (pstA, pstC), sulfur (cysC, cysN) and zinc (ZnuA, ZnuB) (Fig. 2 – groups #8–10 red genes. The combined S
low
- E
low
pattern was populated with strains associated with carbohydrate catabolism (sucA, sucC) and a critical regulator of stationary phase onset (dksA). Knockouts of four major cellular chaperones (Gene Ontology class GO:0006457, Bonferroni corrected p < 10−1) presented an exclusive E
low
phenotype, thus indicating that a lack of protein folding function negatively affects heterologous gene expression (Fig. 2 group 14, discussed below). A majority of knockouts in this phenotypic region (78/105 in groups #9–14, Fig. 2) cause a global and homogenous effect on gene expression as opposed to a specific effect on individual genes (no G
spec
effect).
The S
low
- E
low
phenotype was also associated with several knockouts of genes involved in bacterial chemotaxis (cheY, motA, motB), while exclusive E
low
phenotypes are linked to disruptions in flagellum assembly (GO: 0006935 and KEGG pathway flagellar biosynthesis) (Fig. 2 - Group #14 and Fig. 3, Bonferroni corrected p < 0.05). These observations show that cell motility is implicated in cell-wide changes of gene expression in E. coli (discussed below). Disruptions in the ‘Enterobacterial Common Antigen Biosynthetic Process’ (GO:0009246) also showed an E
low
phenotype – but combined with a S
high
shift, in contrast to chemotaxis genes (Fig. 2 – group #15). Inclusion of 13 other members of this ontology group that only showed a mild score further strengthened the association with a E
low
phenotype (p < 10−4, calculated via bootstrapping). Unlike many knockouts of genes involved in cell growth, which predominantly led to S
high
- E
high
phenotypes (Fig. 2, Groups #2–3), the S
high
– E
low
response triggered by ECA knockouts suggested the existence of a different underlying mechanism.
Growth, protein synthesis and motility are distinct systems of the S-E landscape
Our analyses above revealed that the phenotypic impacts of single-gene deletions define a S-E landscape with three main regions: jointly higher size and expression (S
high
– E
high
, Fig. 2 Groups #1–6), predominantly reduced size with neutral or increased expression (S
low
- E
high
, Fig. 2 Groups #7–11) and predominantly reduced expression (E
low
, Fig. 2 – Groups #12–16). To obtain a broader understanding of this landscape, we performed another functional enrichment analysis amongst the strains populating these regions [16]. Functional terms passing both a p-value (p < 0.1) and a Z-score (z > 0.5) threshold were defined as differentially enriched (Fig. 3) (Bioconductor package CompGO, Additional file 1).
The S
high
– E
high
region harbored strains deleted of key bacterial growth functions including cell division and important housekeeping cytosolic cellular processes (GO:0009987 and GO:0044444, Cellular Components) (Fig. 3 yellow circles). A detailed inspection of child GO Biological Processes (BP) revealed enrichment for functions involved in iron-sulfur cluster assembly (GO:0016226, iscA, sufC, cyaY, ygfZ), mRNA degradation (GO:0006402, rnr, pnp), chromosome condensation (GO:0030261, hupAB) and DNA-templated transcriptional regulators (GO:0006335, oxyR, mfd). A significant number of KEIO strains associated with the E. coli GO class ‘DNA-dependent DNA replication’ (GO:0006261) were also found associated with a S
high
– E
high
phenotype (Bonferroni corrected p < 10−2, 5/9 Additional file 1: Fig. S11a-c).
The S
low
- E
high
region was mainly populated by strains knocked out of cytoplasmic factors involved in protein biosynthesis (GO:0043022 and GO:0016149) (Fig. 3 – red circles). These included structural or functional components of the ribosome and important factors involved in translation (prfC, efp, queA). A total of twelve ribosome structural genes could be deleted in the KEIO collection. Out of these, 5 showed a S
low
– E
high
phenotype (rpsU, rpsT, rpmJ, rpmE, rplA, p = 0.02, calculated by bootstrap, Additional file 1). More generally, 27 KEIO strains deleted for genes with a key role in translation (GO:0006412) showed a strong S
low
– E
high
pattern (p < 10−2) (Additional file 1: Fig. S11d). Significantly, 18 of these genes also showed a significant G
spec
phenotype (p < 10−4). Thus, disruption of non-essential genes involved in translation had a differential effect on the expression of individual heterologous genes.
The E
low
region was associated with knockouts of structural flagellar proteins in the analysis above. A broader search confirmed a significant enrichment for GO:0044780 (cell motility) and the KEGG pathway eco02040 (flagellar assembly) (Fig. 3 – blue circles). A comprehensive analysis of genes involved in chemotaxis (GO:0006935, 23 genes) and flagella (GO:0009288, 23 genes) further supported the E
low
phenotype (p < 10−4 and p < 0.005, respectively). More than half (14/23) of the genes in the latter group had also a S
low
phenotype (Additional file 1: Fig. S11E and Fig. 2 – Group #13).