Genome-wide analysis of E. coli cell-gene interactions

Background
The pursuit of standardization and reliability in synthetic biology has, in recent years, produced a number of advances in the design of more predictable genetic parts for biological circuits. However, even with high-throughput screening methods and whole-cell models, it is still not possible to predict reliably how a synthetic genetic construct interacts with all endogenous cellular systems. This study presents a genome-wide analysis of how the expression of synthetic genes is affected by systematic perturbations of cellular functions. We found that most perturbations modulate expression indirectly, through an effect on cell size, which points to a generic Size-Expression interaction in the model prokaryote Escherichia coli.

Results
The Size-Expression interaction was quantified by inserting a dual fluorescent reporter gene construct into each of the 3822 single-gene deletion strains in the KEIO collection. Cell size was measured for single cells by flow cytometry. Regression analyses were used to discriminate between expression-specific and gene-specific effects. The functions of the deleted genes broadly mapped onto three systems with distinct primary influences on the Size-Expression map. Perturbations in the Division and Biosynthesis (DB) system led to a large-cell, high-expression phenotype. In contrast, disruptions of the Membrane and Motility (MM) system caused small-cell, low-expression phenotypes. The Energy, Protein synthesis and Ribosome (EPR) system was predominantly associated with smaller cells and positive feedback on ribosome function.

Conclusions
Feedback between cell growth and gene expression is widespread across cell systems. Even though most gene disruptions proximally affect only one component of the Size-Expression interaction, the effect ultimately propagates to both. More specifically, we describe the dual impact of growth on cell size and gene expression through cell division and ribosomal content. Finally, we elucidate aspects of the tight coupling among swarming, gene expression and cell growth. This work provides foundations for a systematic understanding of feedbacks between genetic and physiological systems.

Electronic supplementary material
The online version of this article (10.1186/s12918-017-0494-1) contains supplementary material, which is available to authorized users.


Library size
The total number of KEIO strains transformed with plasmid pEZ8-123, which carries constitutively expressed mVenus and mCherry, was 3,835. Two different wild-type clones were also added during data collection (see below). The whole KEIO collection library had ~745 unlabeled wells that contained viable E. coli cells; these were not included in the final dataset.
In addition, on the basis of our identification of the targeted genes, we found 55 duplicated knockouts, bringing the number of unique genes to 3,780 including wild type. Cell population centroids were computed to determine population cell density, and a Region Of Interest (ROI) was drawn around the cells to separate them from cell debris and instrument noise.
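The centroid-and-ROI step can be illustrated with a minimal Python sketch. It assumes a circular gate of hypothetical radius `radius` around a median centroid in log10(FSC), log10(SSC) space; the ROI shape actually used in this study is not specified here, so `gate_events` is only an illustrative stand-in, not the authors' gating procedure.

```python
import math
from statistics import median


def gate_events(events, radius):
    """Keep flow-cytometry events within `radius` of the population centroid.

    events: list of (fsc, ssc) pairs with positive values.
    Distances are computed in log10 space; the circular gate and the
    median-based centroid are illustrative assumptions.
    """
    logged = [(math.log10(f), math.log10(s)) for f, s in events]
    # A median centroid is robust to the debris and noise events being removed
    cx = median(x for x, _ in logged)
    cy = median(y for _, y in logged)
    kept = [
        ev for ev, (x, y) in zip(events, logged)
        if math.hypot(x - cx, y - cy) <= radius
    ]
    return (cx, cy), kept
```

Events far from the centroid, such as low-scatter debris, fall outside the gate and are discarded; the remaining events define the cell population used downstream.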

Linear relationship between FSC and cell size
We used polypropylene fluorescence and size calibration beads (Amersham) of different sizes to determine the correlation between the measured SSC and FSC channel values and the effective particle size. We found a very good linear correspondence between these measures, in particular for Forward Scatter, and the effective particle diameter within the range of typical E. coli cell sizes (0.5-1.5 µm; Fig. S1).
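A calibration of this kind can be sketched as an ordinary least-squares line of FSC against bead diameter, which can then be inverted to estimate particle size from FSC. This is a minimal sketch under stated assumptions: the fitting method and the bead diameters and FSC readings used in any example are hypothetical, not the values behind Fig. S1.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx           # slope: FSC units per µm of diameter
    a = my - b * mx         # intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


def fsc_to_diameter(fsc, a, b):
    """Invert the calibration line to estimate particle diameter from FSC."""
    return (fsc - a) / b


# Hypothetical bead diameters (µm) and FSC readings, for illustration only
bead_um = [0.5, 1.0, 1.5]
bead_fsc = [150.0, 250.0, 350.0]
a, b, r2 = fit_line(bead_um, bead_fsc)
```

An r² close to 1 over the 0.5-1.5 µm range is what justifies treating FSC as a linear proxy for cell size.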

Variation in biological replicates
The systematic and random error associated with plate-wise measurement was assessed using technical replicates of 180 strains from three different plates, collected on four different days, and was found to be 5 × 10⁻⁴ for mVenus and 1.3 × 10⁻³ for mCherry. These values were approximately one order of magnitude smaller than the variance of the measured variables across all KEIO strains (between 3.6 × 10⁻³ and 1 × 10⁻²), ruling out a substantial contribution of experimental error to the dataset.
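The comparison between replicate error and across-strain variance can be sketched as follows. Both estimators are assumptions: replicate error is taken as a pooled within-strain variance, and the across-strain spread as the population variance of strain means. Strain names used with these functions are hypothetical.

```python
from statistics import mean, pvariance


def pooled_replicate_variance(replicates):
    """Pooled within-strain variance from technical replicates.

    replicates: dict mapping strain -> list of repeated measurements.
    Each strain contributes its sum of squared deviations from its own
    mean; the total is divided by the pooled degrees of freedom.
    """
    ss = 0.0
    dof = 0
    for values in replicates.values():
        m = mean(values)
        ss += sum((v - m) ** 2 for v in values)
        dof += len(values) - 1
    return ss / dof


def across_strain_variance(strain_means):
    """Population variance of per-strain mean values."""
    return pvariance(strain_means)
```

If the pooled replicate variance is roughly ten times smaller than the across-strain variance, most of the observed spread reflects differences between knockouts rather than measurement noise.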

Definition of expression variables
The impact of genotypic context on synthetic gene expression output was quantified by first eliminating the variation in mCherry and mVenus fluorescence across the whole dataset due to […]. To calculate the differential effect of each knockout on the fluorescence output of each of the two reporter genes, mC reg and mV reg were further regressed against the variable E, and the residuals from this fit were calculated as above. The correlations of the two new sets of residuals from the second fit with E were very high (0.93 and 0.98, respectively), as was the correlation between the two regressed values (0.84).
Knockouts with significant residuals in this new measure (for which the two fluorescence residuals have identical absolute value but opposite sign) were identified as having a significant gene-specific divergence (G spec measure).
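The two-step regression can be sketched in Python. The variable removed in the first step is an assumption here: this sketch regresses each reporter on cell size (FSC) and defines E as the mean of the two size-corrected values mC reg and mV reg. The function names are hypothetical.

```python
from statistics import mean


def residuals(y, x):
    """Residuals of an ordinary least-squares fit of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def expression_variables(size, mcherry, mvenus):
    """Return (E, mCherry divergence, mVenus divergence) per knockout."""
    # Step 1 (assumed): remove size-dependent variation from each reporter
    mc_reg = residuals(mcherry, size)
    mv_reg = residuals(mvenus, size)
    # Assumption: E is the mean of the two size-corrected reporters
    e = [(c + v) / 2 for c, v in zip(mc_reg, mv_reg)]
    # Step 2: regress each reporter on E; residuals capture gene-specific divergence
    g_mc = residuals(mc_reg, e)
    g_mv = residuals(mv_reg, e)
    return e, g_mc, g_mv
```

With E defined as the mean, mC reg + mV reg = 2E, so the two OLS slopes on E sum to 2 and the two residuals sum to zero for every knockout. This reproduces the "identical absolute value but opposite sign" property, and a knockout with a large residual is a G spec candidate.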

Definition of extreme values
For each of the S, E and G value distributions, we selected the top and bottom 5% of values as extremes, yielding 190 upper and 190 lower extreme values for each of S and E, and 384 knockouts in total with a G spec phenotype (Figs. S3-S6).
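Selecting the top and bottom 5% of a distribution can be sketched as a rank-based cut. This is a minimal sketch: rounding the tail size to the nearest integer count is an assumption, and the knockout names are placeholders.

```python
def tail_extremes(values, fraction=0.05):
    """Return (lower, upper) lists of keys in the bottom/top `fraction`.

    values: dict mapping knockout name -> measured value (S, E or G spec).
    The tail size is rounded to the nearest integer count.
    """
    ranked = sorted(values, key=values.get)  # ascending by value
    k = round(fraction * len(ranked))
    lower = ranked[:k]
    upper = ranked[len(ranked) - k:]  # empty when k == 0
    return lower, upper
```

Applied separately to the S, E and G spec distributions, this yields the upper and lower extreme sets used in the enrichment analyses.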

Distribution of extreme-value genes across the KEIO collection
We tested whether the functionally enriched gene sets in the phenotypic patterns of the cell-expression context were concentrated in specific KEIO plates. We used R to plot the distribution of the S or E variables across the whole dataset and then marked the positions of the identified extreme genes.
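A simple way to look for plate hot-spots is to tally extreme-value knockouts per plate before plotting. The sketch below is in Python rather than the R used in the study, and the `plate_of` mapping from gene to plate number is a hypothetical input.

```python
from collections import Counter


def extremes_per_plate(extreme_genes, plate_of):
    """Count extreme-value knockouts per KEIO plate.

    extreme_genes: iterable of gene names flagged as extremes.
    plate_of: dict mapping gene name -> plate number (hypothetical layout).
    Genes missing from `plate_of` are skipped.
    """
    return Counter(plate_of[g] for g in extreme_genes if g in plate_of)
```

Plates with counts well above the collection-wide average are candidate hot-spots of the kind described below.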
Extreme upper S values were enriched in KEIO plates #23 to #47, whereas extreme lower values spanned the whole collection (Fig. S7). E outliers appeared to span the whole KEIO plate range, with 3-5 hot-spots around plates #43-47 for upper extreme values and #57-59 for lower extreme values (Fig. S8). G spec outliers also spanned the KEIO plates, with two hot-spots of lower extreme values centered on plates #63 and #95 (Fig. S9). There did not appear to be a relationship between strain hot-spots across the KEIO plates.
Co-localization of strains with similar S or E phenotypes in the same plates was observed: genes with an S high phenotype were concentrated between plates #17 and #43 (Fig. S7), and several E high genes were found in plates #33, #37 and #39. There did not appear to be a clear link between plate co-localization and extreme S-E values. In particular, of the 12 genes with an S high phenotype that include various aromatic amino acid biosynthetic genes (Group 1, main Table 1), only 4 were located in plate #41 (Fig. S10, panel A). Genes with an E high pattern, either in isolation or associated with an S high pattern, were scattered across the dataset (Fig. S10, panel B), as were genes with an S low pattern (Fig. S10, panel C). Genes with an E low phenotype were found in several KEIO plates (Fig. S10, panel D).
However, we found a relative presence of two functional groups associated with KEIO plate

Bootstrap
P-values were estimated by bootstrapping in R, using a simple resampling and hypothesis-testing algorithm. In general, n = 10,000 random samples were drawn with replacement to form groups of genes of the same size as each of the GO groups discussed in the main text (counting only genes with data in the study). The null hypothesis was that the number of genes with a given phenotype (for example an S high-E high phenotype) in the GO groups discussed in this study is the same as in groups formed from random genes across the dataset.
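The resampling test can be sketched in Python (the study used R). This is a minimal sketch under stated assumptions: the test is one-sided (random groups with at least as many phenotype genes as observed count against the null), and the +1 correction in the p-value is an assumption rather than a documented detail of the original script.

```python
import random


def bootstrap_p_value(group, all_genes, has_phenotype, n=10000, seed=0):
    """One-sided bootstrap p-value for phenotype enrichment in a gene group.

    group: genes in the GO group that have data in the study.
    all_genes: list of all genes with data in the study.
    has_phenotype: set of genes showing the phenotype of interest.
    """
    rng = random.Random(seed)
    observed = sum(g in has_phenotype for g in group)
    size = len(group)
    hits = 0
    for _ in range(n):
        # Random group of the same size, sampled with replacement
        sample = rng.choices(all_genes, k=size)
        if sum(g in has_phenotype for g in sample) >= observed:
            hits += 1
    # +1 correction keeps the estimate away from exactly zero
    return (hits + 1) / (n + 1)
```

A small p-value indicates that the GO group contains more genes with the phenotype than random groups of the same size typically do.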

Gene Ontologies and heat maps
Gene Ontology biological classes were extracted from EcoCyc [1], where parent and child classes were mined for gene members. Functional enrichment of GO classes and KEGG pathways [2] was performed with the online DAVID Bioinformatics Resources version 6.8 [3]. A Bonferroni-corrected p-value cutoff was applied automatically by the online resource and corresponded to p < 0.5.
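For reference, the Bonferroni correction multiplies each raw p-value by the number of tests and caps the result at 1. The sketch below only illustrates the correction itself; it is not DAVID's implementation, which applied the cutoff automatically.

```python
def bonferroni(p_values):
    """Bonferroni-adjust p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

An adjusted value is then compared with the chosen cutoff to decide whether a GO class or KEGG pathway counts as enriched.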
Heat maps of the GO gene sets were drawn with the R package ggplot and the heatmap.2 function (gplots package), using Z-score normalization of the values (mean-centered and scaled by the standard deviation). The color range was computed over the whole dataset and then applied when plotting each specific set of genes (Fig. S11).
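The key point of this normalization is that the mean and standard deviation come from the whole dataset, not from the plotted subset, so colors remain comparable between heat maps. A minimal Python sketch follows (the study used R); using the population rather than the sample standard deviation is an assumption.

```python
from statistics import mean, pstdev


def z_scores_for_subset(all_values, subset):
    """Z-normalize a subset of genes using whole-dataset mean and SD.

    all_values: dict gene -> value for the whole dataset.
    subset: genes to plot (e.g. one GO group); unknown genes are skipped.
    """
    values = list(all_values.values())
    mu = mean(values)
    sd = pstdev(values)  # population SD over the whole dataset (assumed)
    return {g: (all_values[g] - mu) / sd for g in subset if g in all_values}
```

Because every subset shares the same mu and sd, a given color corresponds to the same deviation from the collection-wide mean in every heat map.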

Supplementary Figures
Sup. Fig. 1. Side Scatter (SSC) and Forward Scatter (FSC) measured for three particle sizes (A) and a linear regression of the measurements (B).
Sup. Figure 7. Distribution of FSC values measured across the KEIO collection (cyan: values in top 5%; red: values in bottom 5%).
Sup. Figure 9.
Sup. Figure 10. Distribution of genes with significant S or E values enriched in specific phenotypic groups (main Fig. 3). Specifically: genes with an exclusively S high phenotype (A, dark-pink dots); genes with either an exclusive E high or a combined E high-S high phenotype (B, dark-orange and red dots, respectively); genes with an exclusive S low phenotype (C, dark-cyan dots); and genes with an E low phenotype alone or in combination (D, dark gold), including the motility and chaperone functions (main group 15; D, dark pink) and the ECA synthesis function (group 16; D, dark green).