Single-cell phenomics of unrelated wild strains
To estimate the extent of natural variation for morphological traits within the S. cerevisiae species, we selected 37 wild strains from various geographical and ecological origins (Figure 1A and Additional file 1: Table S1). These strains belong to a larger panel which was previously used to explore the genetic diversity of the species [14]. We selected this subset of strains so that 1) most ecological and geographical classes were represented, 2) genetic distances between selected strains reflected all S. cerevisiae subgroups, 3) all strains were MATa/MATα diploids originating from the selfing of a haploid spore and 4) liquid cell cultures of these strains contained predominantly unattached individual cells rather than flocculent aggregates or clumps of unseparated cells. This last criterion was essential to enable semi-automated image analysis of individual cells. We cultured each strain as five biological replicates in standard laboratory conditions as previously described [15] (exponential growth, synthetic medium, 2% glucose, 30°C). Cells were then fixed with formaldehyde and their cell wall, nuclear DNA and actin were stained using specific fluorescent dyes. Images of at least 200 cells per culture were acquired by fluorescence microscopy. These images were then analyzed using the CalMorph software [13] to quantify 501 parameters reflecting the size, shape, orientation, and intracellular organization of the cells. Altogether, more than 1,000 cells were acquired for each strain, allowing the statistical inference of intra-species variation.
Cellular morphology varies greatly across the S. cerevisiae species
To directly test each of the 501 traits for intra-species variability, we applied a Kruskal-Wallis test of the null hypothesis of no strain effect. Results were compared with those obtained across 1,000 permutation tests where the 185 values of the trait (37 strains × 5 replicates) were resampled. In the actual dataset, 440 traits showed K > 56, a threshold associated with an empirical False Discovery Rate (FDR) of 0.01 (Figure 1B and Additional file 2: Table S2). Detecting so many differences (88% of traits) across only 37 strains suggests that most of the morphological organization of S. cerevisiae cells is subject to intra-species quantitative variation.
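The test described above can be illustrated with a minimal sketch. It assumes a samples × traits matrix and one strain label per sample; the function names, the omission of a tie correction in the statistic, and the way the empirical FDR is estimated (mean number of permuted traits above the threshold divided by the number of observed traits above it) are assumptions made for illustration, not details taken from the Methods.

```python
import numpy as np

def average_ranks(values):
    """Ranks from 1..n, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    highest = np.cumsum(counts)  # highest rank within each group of ties
    return (highest - (counts - 1) / 2.0)[inverse]

def kruskal_h(values, groups):
    """Kruskal-Wallis H statistic, without tie correction (a simplification)."""
    ranks = average_ranks(values)
    groups = np.asarray(groups)
    n = len(ranks)
    total = sum(ranks[groups == g].sum() ** 2 / (groups == g).sum()
                for g in np.unique(groups))
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

def empirical_fdr(data, groups, threshold, n_perm=1000, seed=0):
    """data: (n_samples, n_traits). Returns (number of hits, estimated FDR)."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    n_traits = data.shape[1]
    observed = [kruskal_h(data[:, j], groups) for j in range(n_traits)]
    n_hits = int(sum(h > threshold for h in observed))
    false_hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(groups)  # breaks any strain effect
        false_hits += int(sum(kruskal_h(data[:, j], shuffled) > threshold
                              for j in range(n_traits)))
    expected_false = false_hits / n_perm
    return n_hits, (expected_false / n_hits if n_hits else 0.0)
```

With 37 strains and five replicates each, `groups` would hold 185 labels and `threshold` would be set to the value of the statistic used in the text (56). The pure-Python loops are slow for 501 traits and 1,000 permutations, so this is meant to show the logic rather than to be run at scale.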
The most striking phenotypic variation was the elongation of cells. For example, mother cells of the baker strain CLIB192 were nearly round whereas those of YJM269, isolated from apple juice, were clearly elongated, with a long axis about 1.3 times longer than their short axis (Figure 1C). This axis ratio was highly variable across strains both before and during budding, and for both mothers and buds (Additional file 2: Table S2). Thus, its variation does not reflect different properties at specific stages of the cell cycle but inherent differences in cell shape across the various backgrounds.
Another trait that varied greatly across strains was the position of the bud neck. Some strains such as YJM269, BY4743, CLIB382 or UC1 budded almost longitudinally along their long axis, whereas other strains such as YJM421, DBVPG1794 or CLIB157 initiated budding at angles reaching 30–40 degrees (Figure 1D). This suggests that molecular determinants of bud initiation, such as Bud9p, Bud8p [16] or the 12S polarisome [17], may have strain-specific localization patterns along the cell cortex.
The size of cells was also highly variable across strains (Figure 1E). This fully agrees with previous observations made on industrial strains [18].
Importantly, many traits that were highly variable were not correlated with one another. This is particularly apparent in Figure 1C-E, where values of the three traits mentioned above ranked strains in three different orders. Thus, the natural variation of S. cerevisiae cellular morphology represents a set of multiple independent traits with different sources of variability. We then further investigated the properties of this variation using conventional tools of multidimensional analysis.
Wild strains are continuously distributed in the phenome space
Variation of multiple traits may take place in several ways. A first possibility is the existence of one or a few strains showing peculiar morphologies compared to an overall profile globally conserved within the species. A second possibility is the co-existence of two or more distinct groups, each containing numerous strains. Finally, the morphological space may not be particularly structured, and strains may all differ continuously without presenting notable outliers. To distinguish between these possibilities, we examined the overall landscape of phenotypic variations by performing principal component analysis (PCA). A permutation test determined that no principal component was expected to explain more than 5% of the variance by chance alone. From the actual dataset, five phenotypic principal components (pPCs) were observed to exceed this threshold, and their cumulative contribution reached ~60% of the variance (Additional file 3: Figure S1). The first two components were contributed by traits reflecting cell elongation (Additional file 4: Table S3). After representing the position of strains along the first four components, several observations could be made (Figure 2A-B and Additional file 3: Figure S2). First, strains were almost evenly spaced with no particular subgroup that could explain any of the components. This reveals that S. cerevisiae has a continuum of morphological features rather than discrete classes of distinct morphologies. Secondly, strains from a common ecological origin did not group together. This indicates that differences in the cellular traits measured do not simply reflect adaptation to the annotated environments. Alternatively, adaptation could involve subtle changes in a few traits. In this case, a dedicated test is needed to detect possible links between variation of one trait and the strain origin. We therefore tested, for every trait, the effect of ecological or geographical origin using a Kruskal-Wallis rank-sum test (see Methods). No significant association was found. This could be due to limited power given our small sample size (only 37 strains). It is also possible that some properties of the strains' original microenvironments (pH, specific limiting nutrients or stress factors…) were shared between strains of similar morphological profiles. Detecting this possible adaptation would require exhaustive annotations of these environments at the time of collection. Finally, measuring morphological traits in a standardized laboratory condition may not interrogate the consequences of adaptation to specific environments. Acquiring morphological profiles from relevant ecological conditions would be more appropriate to reveal associations between traits and ecological origin.
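The permutation threshold used above to decide which principal components are meaningful can be sketched as follows. This is a minimal illustration, assuming a strains × traits matrix with no constant column; standardizing each trait before PCA and shuffling each trait independently across strains are assumptions, and the function names are hypothetical.

```python
import numpy as np

def explained_variance_ratio(matrix):
    """Share of total variance carried by each principal component.

    Columns are centered and scaled to unit variance first (an assumption);
    constant columns are not handled.
    """
    centered = matrix - matrix.mean(axis=0)
    scaled = centered / centered.std(axis=0, ddof=1)
    singular_values = np.linalg.svd(scaled, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

def permutation_threshold(matrix, n_perm=100, seed=0):
    """Largest share of variance explained by any single component when
    every trait is shuffled independently across strains."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_perm):
        shuffled = np.column_stack([rng.permutation(col) for col in matrix.T])
        best = max(best, float(explained_variance_ratio(shuffled)[0]))
    return best

def significant_components(matrix, n_perm=100, seed=0):
    """Indices of components explaining more variance than expected by chance."""
    threshold = permutation_threshold(matrix, n_perm=n_perm, seed=seed)
    ratios = explained_variance_ratio(matrix)
    return np.flatnonzero(ratios > threshold), threshold
```

Shuffling each column separately destroys correlations between traits while preserving the distribution of each trait, so any component exceeding the returned threshold reflects covariation that is unlikely to arise by chance.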
Although the overall landscape of trait variations was not structured, it remained possible that some subgroups of strains shared morphological similarities. To examine this possibility, we performed a classification based on hierarchical clustering and multiscale bootstrap resampling to infer statistical significance of the resulting dendrogram [19, 20]. For each cluster, the derived approximately unbiased probability value (AU p-value) estimates the probability that the cluster would be observed if unlimited observations were available (i.e. an infinite number of strains). The procedure defined three classes (I, II and III) of strains that were significantly grouped at AU p-value > 0.95 (Figure 3A and Additional file 3: Figures S3 and S4). Interestingly, each of the three classes contained strains from various ecological origins, indicating that the fine-scale structure detected could not simply be explained by shared environmental histories. To determine the phenotypic characteristics of these three classes, we performed a linear discriminant analysis (LDA, see Methods). This extracted 39, 9 and 19 parameters that significantly contributed to classes I, II and III, respectively (Additional file 5: Table S4). The main features of Class I were a large region of actin at S/G2 and a bud nucleus located close to the neck. Class II was characterized by nuclei that were round and centered in mother cells but elliptical in buds. Class III was distinguished by small cells at G1 and nuclei that were distant from the neck in both mother cells and buds (Figure 3B).
However, most strains (24 out of 37) remained unclassified, which is consistent with the continuous distribution of strains along the major principal components described above. Observing multiple singletons can sometimes result from high measurement errors. However, the high number of traits for which a significant strain effect could be detected indicates that our measures have small residual variance (Figure 1B). Thus, these numerous singletons more likely reflect that intra-species variation of S. cerevisiae cellular morphology is poorly structured.
Relationship between phenotypic and genetic distances
To study the relationship between genetic and phenotypic distances, we considered all 666 pairwise combinations of the 37 strains. Figure 4A represents their phenotypic similarity (defined as the Pearson correlation coefficient of the two strains across 28 pPC scores covering 97% of total variance in PCA on all 501 traits, see Methods) as a function of their genetic distance (defined as the number of polymorphic sites differentiating two strains, as previously described [14]). Except for three pairs of strains that were very close both genetically and phenotypically, there was no correlation between the two types of divergence (Spearman ρ = −0.08). Nevertheless, this absence of correlation could be due to the fact that our sample combines strains from clean and mosaic lineages. In contrast to non-mosaic strains, mosaic isolates that are genetically distant might share common parts of the genome, leading to phenotypic similarity. We therefore examined the correlation across 16 strains that were previously described to represent clean lineages (see Methods). On this subset, genetic and phenotypic distances remained uncorrelated (Spearman ρ = −0.05), suggesting that the mixed composition of our sample is not the major reason for not detecting any correlation. We conclude that, globally, morphological resemblance did not reflect genetic relatedness.
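The pairwise comparison above can be sketched as follows. This is a minimal illustration, assuming a strains × pPC score matrix and a symmetric matrix of genetic distances; the function names are hypothetical, and Spearman's ρ is computed here as the Pearson correlation of average ranks.

```python
from itertools import combinations

import numpy as np

def average_ranks(values):
    """Ranks from 1..n, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    highest = np.cumsum(counts)  # highest rank within each group of ties
    return (highest - (counts - 1) / 2.0)[inverse]

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def pairwise_comparison(pc_scores, genetic_dist):
    """Phenotypic similarity and genetic distance for every pair of strains.

    pc_scores: (n_strains, n_components); genetic_dist: (n_strains, n_strains).
    """
    pairs = list(combinations(range(pc_scores.shape[0]), 2))
    similarity = np.array([np.corrcoef(pc_scores[i], pc_scores[j])[0, 1]
                           for i, j in pairs])
    distance = np.array([genetic_dist[i, j] for i, j in pairs])
    return similarity, distance
```

For 37 strains this yields 37 × 36 / 2 = 666 pairs, and `spearman_rho(similarity, distance)` gives the kind of coefficient reported in the text. Restricting the rows of both inputs to a subset of strains repeats the analysis on that subset.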
It remained possible that subsets of traits co-varied with parts of the genetic structure of the population. To address this possibility, we extracted the principal components of the genotypic variance of the population (Additional file 3: Figure S5). The first component, gPC1, captured more than 25% of the variance and discriminated a cluster of European wine strains previously described [14]. The second component explained 7% of the variance and discriminated a pair of related clinical strains from the rest of the population. gPC3 and gPC4 explained about 5% of the variance each, and all successive ones had minor contributions. We then tested whether these genotypic components of the population were correlated with any of the phenotypic principal components. We computed Spearman's rank correlation coefficients for all combinations of the 37 gPCs and the 37 pPCs. None of these coefficients exceeded the correlations obtained when using pPCs from a randomized dataset. This implies that morphological traits and genotypic variations of this S. cerevisiae sample follow different structures.
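The comparison between genotypic and phenotypic components can be sketched in the same spirit. This is a minimal illustration with hypothetical function names; as a simplification, the null distribution is obtained by shuffling the strain order of the pPC scores, whereas the text derives pPCs from a randomized dataset. Scores are assumed to be continuous, so ties are ignored when ranking.

```python
import numpy as np

def rank_columns(matrix):
    """Rank the values within each column (assumes no ties in a column)."""
    return np.argsort(np.argsort(matrix, axis=0), axis=0).astype(float)

def max_abs_spearman(g_scores, p_scores):
    """Largest |Spearman rho| over all gPC x pPC combinations.

    Both inputs are (n_strains, n_components) matrices of component scores.
    """
    g = rank_columns(g_scores)
    p = rank_columns(p_scores)
    g = (g - g.mean(axis=0)) / g.std(axis=0)
    p = (p - p.mean(axis=0)) / p.std(axis=0)
    rho = g.T @ p / g.shape[0]  # correlation matrix of the ranked scores
    return float(np.abs(rho).max())

def randomized_maxima(g_scores, p_scores, n_perm=200, seed=0):
    """Null distribution of the maximum |rho| when the link between
    genotype and phenotype is broken by shuffling strains."""
    rng = np.random.default_rng(seed)
    n = p_scores.shape[0]
    return np.array([max_abs_spearman(g_scores, p_scores[rng.permutation(n)])
                     for _ in range(n_perm)])
```

An observed maximum that does not exceed the bulk of `randomized_maxima` corresponds to the outcome reported above, where no gPC and pPC pair was more correlated than expected by chance.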
When representing strains from classes I, II and III on the tree of genetic distances, we observed that class I strains were genetically close (Figure 4B). All five strains of class I belonged to a group of strains genetically related and generally associated with wine making [14]. The common features of these strains were to have large actin regions and a specific position of the nucleus (Additional file 5: Table S4). This suggests that phenotypic and genetic distances can be correlated locally. However, this was not the case for classes II and III. Class II contained strains YPS1000, BY and YJM653 that were all at different edges of the genetic tree, and class III contained clinical strain YJM454 and baker strain CLIB192 that were at extreme genetic distances from each other.
Natural strains vary in their degree of cell-to-cell trait variation
The fact that traits were measured on individual cells allowed us to investigate whether the level of phenotypic ‘noise’ differed between natural yeast backgrounds. Nearly half of the 501 traits reported above already estimated this intra-sample variability, since they were coefficients of variation (CVs) of measured quantities. However, these parameters sometimes varied concomitantly with the mean value of the trait considered. In agreement with previous observations made on the same type of data [8], this dependency could be positive or negative, and was not necessarily linear (Figure 5). To obtain estimates of cell-to-cell variability that were independent of mean trait values, we followed a previously described procedure that uncouples CVs from means by extracting residuals from a lowess regression (see Methods and [8]). In this way, 220 traits reflecting phenotypic noise per se were obtained for each sample. We then applied a Kruskal-Wallis test for each of these ‘noise traits’ on the null hypothesis of no strain effect. At a threshold of p < 2.27 × 10⁻⁴ (corresponding to p < 0.05 after Bonferroni correction for multiple testing), 76 noise traits were found to be significantly affected by the strain background. This was one third of the traits considered and corresponded to variability of various cellular features: cell width, length and shape, size of actin regions within cells, bud size and orientation, and size of the bud nucleus (Additional file 6: Table S5). The trait for which cell-to-cell variation had the strongest dependence on the strain background was the short-axis length of unbudded cells (P < 10⁻⁹, Figure 6A-B), indicating that some backgrounds control cell width more tightly than others. Budding cells also showed traits with particularly different noise levels among strains. Bud size, for example, was more variable among Y9J cells than among UC8 cells (Figure 6C). The size of the region of the bud occupied by actin was also more variable among DBVPG1373 cells than among YJM145 cells (Figure 6D). Interestingly, bud neck position (C105_A1B) also had higher cell-to-cell heterogeneity in some strains (YJM320 and YJM269) than in others (RM11-1D and YJM280), suggesting that not all backgrounds control bipolar budding with equal precision.
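The uncoupling of CVs from mean values described above can be sketched as follows. This is a minimal illustration rather than the published procedure: the smoother is a simplified locally weighted linear regression (tricube weights, no robustness iterations) standing in for lowess, and the function names and the `frac` span parameter are assumptions.

```python
import numpy as np

def local_linear_fit(x, y, frac=0.5):
    """Simplified lowess: a weighted straight-line fit around every point."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))  # neighbours per local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        bandwidth = max(np.sort(dist)[k - 1], np.finfo(float).eps)
        weights = np.clip(1.0 - (dist / bandwidth) ** 3, 0.0, None) ** 3
        root = np.sqrt(weights)
        design = np.column_stack([np.ones(n), x - x[i]])
        coef, *_ = np.linalg.lstsq(design * root[:, None], y * root, rcond=None)
        fitted[i] = coef[0]  # value of the local line at x[i]
    return fitted

def noise_residuals(mean_values, cv_values, frac=0.5):
    """Noise estimate: the part of the CV not predicted by the mean."""
    return np.asarray(cv_values, dtype=float) - local_linear_fit(
        mean_values, cv_values, frac=frac)

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold controlling the family-wise error rate."""
    return alpha / n_tests
```

Each sample would contribute one (mean, CV) point per trait, and the residuals become the noise values tested for a strain effect. With 220 noise traits, `bonferroni_threshold(0.05, 220)` is about 2.27 × 10⁻⁴, which matches the threshold quoted in the text.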
Phenotypic noise varies both globally and specifically
The fact that many traits displayed strain-dependent noise raised the following question: is this variation global or specific? In the former case, one would expect to observe elevated noise of many unrelated traits in the same strains and little cell-to-cell variation in other strains. Alternatively, if variation is specific, a given strain may display high noise for some traits while remaining robust for other traits, and this spectrum of variability/robustness would differ between strains. To examine the first possibility, we compared strains for their phenotypic potential [8]. This value captures phenotypic noise in a broad sense, by averaging noise values from a large number of independent traits (see Methods). It was previously used on CalMorph datasets to detect artificial null mutations that affect general phenotypic buffering in yeast [8]. In principle, natural genetic variation may also affect the global molecular buffering of morphological traits, which would be detected by differences in phenotypic potentials among natural strains. After computing 5 independent estimates of the phenotypic potential of every strain, we observed that it significantly varied between backgrounds (Figure 7A, Kruskal-Wallis p = 0.02), although to a lesser extent than noise of specific traits. This shows that part of noise variation is indeed global, with strains Y9J, Y3 and DBVPG1373 showing pronounced global heterogeneities as compared to strains YJM421 and Y12. The modest statistical significance also indicates that variation is not entirely global. If it were, then the strain effect on global noise should be detected at a significance similar to or higher than that of the effect on specific noise traits, because measurement of global noise benefits from cumulated observations on various traits. This was clearly not the case, which suggests that the variation of noise is also specific.
To study this possibility, we performed a principal component analysis on the 76 noise traits that had a significant dependence on the strain background. The method is equivalent to the one presented above, except that the phenotypic values considered are now the noise of the traits instead of the trait values themselves. If noise of all traits was increased in the same strains (global variation), then the first principal component should explain most of the differences between strains, and this component should discriminate ‘noisy’ from ‘buffered’ strains. The analysis produced 7 significant components that altogether explained 71% of the variance (Figure 7B). The first component alone explained ~21% of the variance. Representing strain coordinates along these components showed that there was no obvious subgroup of strains with specific phenotypic noise values (Figure 7C). Analyzing the contribution of each trait to the principal components revealed that the first component corresponded to high variability of bud size and size of the bud nucleus, but robust cell size at G1. The second component was also related to variability in bud size, whereas the third and fourth components corresponded to variability in the positioning of the dividing nucleus and variability of the size of the actin region in the bud, respectively (Additional file 7: Table S6). Thus, in general, genetic backgrounds affected noise of specific sets of traits but not of all traits together. We conclude that a large fraction of cell-to-cell heterogeneity varies in a strain/trait-specific manner, while another fraction varies because some strains are globally ‘noisier’ than others.