Metabolic Model-Based Analysis of the Emergence of Bacterial Cross-Feeding through Extensive Gene Loss

Metabolic dependencies between microbial species are common and have a significant impact on the assembly and resilience of microbial communities. However, the origins of such metabolic dependencies, the evolutionary forces that drive metabolic cross-feeding, and the impact of metabolic and genomic architecture on their emergence are not clear. To address these questions, we developed a novel simulation-based framework coupling a model of reductive evolution with a multi-species genome-scale model of microbial metabolism. We used this framework to model the evolution of a two-species microbial community, simulating thousands of independent evolutionary trajectories and investigating the link between genome reductive evolution and the emergence of metabolic interactions. Surprisingly, even though our model does not impose explicit selection for cooperation, metabolic dependencies emerged in nearly half of all evolutionary runs. Evolved dependencies involved cross-feeding of a diverse set of metabolites at varying frequencies, reflecting various constraints imposed by the metabolic network architecture. We additionally found metabolic ‘missed opportunities’, wherein species failed to capitalize on metabolites made available by their partners. When cross-feeding did evolve, it generally emerged immediately after a metabolite became available, but a complete dependence on such cross-fed metabolites often evolved relatively slowly. Examining the genes deleted and retained in each evolutionary trajectory and the timing of gene deletion events along these trajectories, we were further able to identify both genome-wide properties and specific gene retentions that were associated with metabolic phenotypes. Our findings provide insight into the evolution of cooperative metabolic interaction among microbial species, offering a unique view into the way such relationships could emerge in natural settings.


32
Most microorganisms in nature do not live in isolation but are rather part of complex 33 communities [1]. Such communities inhabit a wide range of environments and have attracted significant 34 interest in recent years due to an increased appreciation of their role in human health and environmental 35 stewardship as well as their potential applications in industry, agriculture, and medicine [2][3][4][5]. The 36 various microbial species that form these communities not only share a common environment, but rather 37 interact with other community members in various ways including competition for extracellular nutrients, 38 cooperation through metabolite cross-feeding, signaling, biofilm formation, and antimicrobial secretion 39 [1,6]. Such interactions allow community members to impact each other's behavior and play an important 40 role in shaping community structure and activity. A better understanding of how these interactions 41 emerge through ecological and evolutionary dynamics, how they are maintained or lost, and how they 42 impact community-level behavior is therefore crucial for both understanding the forces that have shaped 43 current natural communities and for designing synthetic communities or targeted modulation of natural 44

communities. 45
Perhaps the most intriguing form of microbial interaction is inter-species cooperation. The 46 prevalence of cooperative interaction is evident from the large number of microbes that cannot be 47 individually cultured, suggesting that they are reliant on symbiotic interactions with other members of 48 their communities [7]. In the context of metabolism, cooperation often takes the form of cross-feeding, 49 where one species secretes metabolites that other species uptake and utilize. Indeed, metabolic cross-50 feeding has been found to occur in a wide variety of environments and between diverse species [8], often 51 benefiting both partners [9]. For example, Bifidobacterium species in the gut microbiota regularly cross-52 feed fermentation products and partial digestion byproducts of polysaccharides to butyrate-forming 53 bacteria [10,11]. Furthermore, evidence suggests that metabolic cooperation drives species co-occurrence 54 in diverse microbial communities [12]. 55 Notably, however, metabolic cooperation is not limited to complex communities and has also 56 been demonstrated in small two-or three-species communities, such as those occupying various insect 57 hosts. For example, it was shown that the two endosymbionts that occupy a sharpshooter insect (and the 58 insect host) each lack necessary steps of several amino acid synthesis pathways, and consequently only 59 when the three organisms grow together can they synthesize the entire complement of amino-acids 60 [13,14]. Similarly, it was shown that two endosymbiotic bacteria that inhabit a Cicadoidea host had 61 recently diverged into 3 species, with metabolic complementarity between the two recently split lineages 62 [15]. Studying the facultative symbionts of another host, the sap-feeding whitefly, it was shown that pairs 63 of symbionts with greater metabolic complementarity are more likely to co-occur in a host [16]. Such 64 tightly coupled metabolic systems, where two or three species strongly depend on each other for survival, 65 could be viewed as an idealized model of microbial cooperation and accordingly have been extensively 66 studied. Similar tight cross-feeding behavior between a small number of species has been successfully 67 engineered in laboratory conditions, for example allowing auxotrophic strains of Escherichia coli to 68 survive in co-culture [17][18][19]. Cross-feeding has also been engineered between different strains of a yeast 69 species [20], and even between species as divergent as yeast and algae [21]. 70 This prevalence of cooperation is, however, somewhat surprising given a large body of work 71 suggesting that cooperation should be challenging to evolve [22,23]. Specifically, the emergence of 72 cheaters or the potential loss of species that provide essential metabolites could hinder the evolution of 73 cooperative interaction. One possible explanation for the emergence of cooperation in the face of these 74 challenges is the Black Queen hypothesis [24], positing that symbiotic interactions through gene loss 75 could evolve when a function is biosynthetically costly but has a leaky benefit [25]. This mechanism, 76 however, may not be applicable in fixed communities with small population size, where selection is 77 relatively weak [26]. In such communities (such as insect endosymbionts), cooperation likely emerges not 78 through selection by rather by chance as a consequence of extreme genome reduction [27]. Indeed, most 79 insect endosymbionts have extremely small genomes (some of which are the smallest bacterial genomes 80 known) that are majorly reduced compared to their closest free-living relatives [28]. 81 Importantly, however, despite the prevalence and diversity of such obligate endosymbionts, the 82 process and determinants through which extensive long term genome reduction gives rise to metabolic 83 cross-feeding are not clear and their study is challenging. Experimental evolution studies have aimed to 84 examine the evolution of metabolic cooperation, evolving a population seeded with a single species and 85 demonstrating the emergence of cooperative interactions between divergent polymorphic subsets of the 86 populations [29,30]. Similar experimental evolution studies that have focused on populations seeded with 87 multiple species that were initially cooperative have also shown that such species evolve to be both more 88 efficient at and more dependent on that cooperation [31,32]. However, such evolutionary experiments are 89 limited in both duration of evolution (which precludes studying extreme long term genomic reduction) 90 and number of replicates (which hinders development of a more comprehensive view of microbial 91 evolution and identification of general principles in the emergence of species interaction). Computational 92 methods, on the other hand, allow long time scales to be easily modeled and thus may be more applicable. 93 For example, prior studies determined the dispersal properties that favor cross-feeding by modeling the 94 co-evolution of microbial strains where the various genotypes determined which compounds were 95 secreted to a shared environment [33]. Another study used a mathematical model of two species that can 96 increase each other's fitness, aiming to identify the processes underlying the evolution of cooperation 97 [22]. Such models have also been used to study the impact of genetic variability on synergistic effects 98 between partner species [34], and to investigate the effect of population density on the emergence of 99 cross-feeding [35]. These studies have produced useful insights, but tend to explicitly model interaction in 100 a non-mechanistic way and thus are limited in utility for studying how a genomic evolutionary process 101 (e.g., genome reduction through genetic drift) could lead to the evolution of cross-feeding. 102 To address this gap, in this study, we utilized a model of microbial evolution over a long time 103 scale coupled with a mechanistic model of multi-species microbial growth. Our model is inspired by a 104 that emerged? And finally, how does evolution in a community shape co-evolutionary dynamics? 130 Addressing these questions could have profound impact both on our understanding of the evolution of 131 species interactions and on our ability to design and construct stable microbial communities for medical, 132 agricultural, and industrial application. 133 134

135
A framework for modeling the evolution of species interactions 136 To study the emergence of metabolic species interaction in bacteria we developed a 137 computational framework that integrates models of microbial co-evolution, metabolic activity, and 138 ecological interaction (Fig 1). Briefly, in this framework, we model a community comprised of two 139 generalist species growing in a shared environment (and that can therefore exchange metabolites) that go 140 through a reductive evolution process. Our reductive evolution model is inspired by a previously 141 introduced model of reductive evolution of a single endosymbiont species [36]. In our model evolution is 142 an iterative process in which a gene is first chosen at random from either of the two species for deletion 143 ( Fig 1A). The fitness effect of losing that gene (in the context of the community) is calculated using a co-144 culture metabolic model (described below). If the decrease in fitness to the species losing this gene does 145 not exceed a predefined threshold the deletion is assumed to fix; otherwise the deletion is assumed to be 146 selected against and is reverted. Importantly, during the course of this co-evolutionary process, the 147 presence of each of the two species in the community can markedly impact the evolution of the other (and 148 specially, the set of genes that can be deleted). This process repeats until no more genes can be deleted 149 from either species. 150 Specifically, to model growth in co-culture and to determine the fitness consequence of gene 151 deletions while accounting for the way the presence of one species in the community may impact the 152 fitness of the other, we used a previously introduced co-culture metabolic modeling framework [37]. This 153 framework employs dynamic Flux Balance Analysis (FBA) to predict the metabolic activity and growth 154 of two species in a shared environment over discrete time points (Fig 1B). Briefly, at each time point and 155 for each species in the co-culture, this framework uses a genome-scale metabolic model of the species 156 (based on its metabolic capacity as determined by the set of genes present in its genome), the current 157 concentration of metabolites in the environment, and a Flux Balance Analysis to predict the species' 158 behavior, including its growth rate and the rate at which it imports and excretes various metabolites [38] 159 ( Fig 1C). The estimated growth rates of the two species are then used to update the abundances of the 160 species in the community and the predicted uptake and excretion fluxes are used to update the 161 concentration of metabolites in the shared environment. Growth is simulated over several time points and 162 To model reductive evolution, genes are iteratively chosen at random as candidate for deletion, the fitness effect of their deletion is evaluated (using a co-culture growth model; see panel B), and if the fitness effect is relatively small, these genes are deleted. (B) The co-culture growth model simulates the growth of the two species in a well-mixed shared environment, and is based on a previously introduced dynamic multi-species model [37]. This model iteratively infers the behavior of each species in the shared environment based on an FBA approach (see panel C). The predicted growth of each species and the predicted rates at which it uptakes and excretes various metabolite are used to update the abundances of species in the co-culture and the concentration of metabolites in the shared environment over time. (C) An FBA model is used to predict the growth of each species in a given environment based on the set of metabolic reactions and constrains encoded by the species and the concentration of metabolites in its environment. the growth rates of each species at the last time point are used as proxies for their fitness. 163 To classify the interaction between the two species in each community and at each evolutionary 164 step, we also simulated and evaluated the growth of each of the two species in isolation (i.e., in mono-165 culture). We define a species as being dependent on its partner if it can grow in co-culture but not in 166 mono-culture (zero fitness). Accordingly, we distinguish between three possible types of interaction a 167 given community can exhibit: (i) Independentneither species is dependent on the other, (ii) commensal 168 one species ('dependent') is dependent on the other species but the other ('provider') is not, and (iii) 169 mutualisticboth species are dependent on each other. The observed interaction type at the end of the 170 simulation run (i.e., when both species reach minimal genomes) was used to label each evolutionary 171 trajectory (Fig 2A). Notably, in some cases, one of the two species can go through a catastrophic drop of 172 fitness (>50%) even in co-culture (e.g., due to a change in the other species' behavior that limits the 173 availability of a metabolite it requires). In such cases, that species was considered to have gone extinct 174 Plotted are examples of each of these four outcomes, illustrating the fitness of each of the two species in monoculture and in co-culture over evolutionary time. (B) The changes in interaction type over time for all 16,317 simulation runs. Each horizontal bar represents a single simulation run, and the color corresponds to the interaction type using the same colors as the titles in panel A. and the simulation was labeled as a collapsed community. A detailed description of the framework is 175 provided in Methods. 176 The emergence and prevalence of species interaction 177 We used the framework described above to simulate the evolution of a simple community 178 comprising two species that go through a reductive evolution process. We assumed that the community 179 composition is fixed as a two genotype community (i.e., with no new species migrating into the 180 community and no standing genetic diversity). In the simulation below, we initialized the community 181 with two identical E. coli strains (as a generalist model species). Such a scenario may represent the 182 evolutionary trajectory of two obligate symbionts that may have diverged from a common ancestor. 183 We simulated 16,317 independent evolutionary trajectories (Methods) and for each simulation 184 examined the evolved two-species community and the interaction between the two evolved species 185 (Fig 2A). Surprisingly, although our framework does not impose an explicit pressure toward species 186 interaction, we found that a substantial fraction of simulations resulted in a community with some sort of 187 metabolic dependency between the two species. Specifically, 35% of the simulations ended with a 188 commensal community, and 3.2% of the simulations ended with a mutualistic community ( Fig 2B). In 189 10.7% of the simulation, the community collapsed as described above. The remaining 51.1% of 190 simulations ended with independent communities in which both species were still capable of independent 191 growth. Using different fitness cutoffs for allowing deleterious gene deletions to fix affected the ratio of 192 the different interaction types, with a more stringent cutoff resulting in more independent communities 193 and a less stringent cutoff resulting in more commensal and mutualistic communities (see Supporting 194 Text). In natural communities the strength of selection against deleterious gene deletion reflects multiple 195 factors, ranging from population size to environment stability, which therefore indirectly affects the 196 likelihood of emergent cooperation. 197 An example of an emergent cross-feeding interaction 198 Next, we set out to examine the specific genes and metabolic processes involved in emergent 199 interactions. Before exploring large-scale patterns concerning the mechanisms involved in species 200 interaction, we set out to characterize in detail one evolved community as an example of the kind of 201 metabolic interaction that could emerge and the gene deletions that underlie such an interaction. We 202 specifically focused on one simulation run where the two evolved species (arbitrarily referred to below as 203 species A and species B) exhibited a mutualistic interaction. In this simulation, species A had retained 204 only 306 genes and species B had retained only 304 genes (compared to 1260 genes in the ancestor 205 species) and neither could grow in mono-culture. The two species, however, could still grow in co-culture 206 (albeit at only 78% and 73% of the ancestor's growth rate, respectively). 207 To identify metabolites that may be involved in cross-feeding, we first detected all the 208 metabolites found in the medium when the two evolved species grew in co-culture and that were not 209 included in the initial growth medium. We then tried growing each of the two evolved species in mono-210 culture by augmenting the initial minimal growth medium with one or more of these candidate cross-211 feeding metabolites. We found that species A could grow on the initial medium once tyrosine was added 212 (at 78% of the ancestor's growth rate). Species B could similarly grow on the initial medium once 213 thymidine was added (at 72% of the ancestor's growth rate). 214 We further examined the fluxes through the metabolic models of the evolved species and 215 compared them to the fluxes observed in the ancestor species, to identify the specific gene deletions that 216 gave rise to these dependencies (Fig 3). We found that species A became dependent on external tyrosine 217 due to a loss of the gene tyrA, which is necessary for tyrosine synthesis [39]. Indeed, species A's loss of 218 tyrA occurred at the exact same point in the evolutionary trajectory as its loss of the ability to grow in 219 mono-culture. Similarly, Species B became dependent on external thymidine due to a loss of the gene 220 thyA, which is necessary for dTMP synthesis [40]. Notably, we were also able to identify the evolved 221 mechanisms that allowed each of the two species to excrete the metabolite necessary for growth of the 222 other species. Specifically, species A started excreting thymidine due to a loss of the gene cmk, which is 223 necessary to phosphorylate CMP to CDP [41]. The loss of several other reactions prevented species A 224 from converting CMP to cytidine, uridine, uridine monophosphate, excreted uracil, or thymine, which 225 resulted in species A only being able to eliminate excess CMP by converting it to thymidine and excreting 226 it. Indeed, a cmk deletion in E. coli has been shown experimentally to result in 30-fold elevated CMP and 227 dCMP pools relative to wild-type [41]. Species B similarly excreted tyrosine due to an overproduction of 228 this metabolite following a complex combination of gene losses that resulted in elevated activation of the 229 pentose phosphate pathway and converting excess erythrose-4-phosphate into tyrosine. This community 230 provides examples of specific mechanisms that drive one species to support another via cross-feeding, 231 and those that are involved in the second species becoming dependent on these cross-fed metabolites. We 232 will next examine the prevalence of these and other specific metabolic interactions. 233

234
After characterizing one cooperating pair in detail, we moved on to examine the complete set of 235 communities evolved by our model, focusing specifically on identifying the metabolites underlying 236 emergent species interactions. To infer such cross-fed metabolites, we simulated the growth of each 237 evolved dependent species on minimal media supplemented with various metabolites and determined the 238 minimal set of supplemented metabolites required for growth (as described above; see Methods). This set 239 was assumed to represent essential metabolites provided by the partner. 240 We found that the majority of dependent species (94.3%) required only a single essential 241 metabolite to be cross fed from their partner, with only a small fraction of dependent species requiring 242 two or three such metabolites (5.5% and 0.2% respectively), and no species requiring more than three. 243 Formate was the most common essential metabolite (65% in commensal dependent species; see Fig 4A), 244 followed by Tyrosine (17.6%) and Phenylalanine (6.4%). Notably, the dependence on a single (or very 245 few) metabolites reported above contrasts observations made in several insect symbionts systems where 246 cooperating symbionts exchange multiple essential compounds (and see Discussion below), yet the 247 exchange of aromatic amino-acids is in agreement with cross-fed metabolites often observed in such 248 systems [13,42]. 249 Complete dependence on cross-fed metabolites (such as those identified above) is the most 250 defining feature of species interactions in these communities, but may represent an extreme form of 251 interaction. Clearly, cross-feeding can be beneficial to a species even when it is not essential for growth, 252 and in fact this form of cross-feeding may be a common precursor state of complete dependence. To 253 detect such non-essential cross-fed metabolites we examined transporter fluxes and identified metabolites 254 that are being excreted by one species in the community and uptaked by the other (Fig 4A, yellow  255 circles). Indeed, in addition to the essential metabolites identified above, our analysis revealed multiple 256 metabolites that are being cross-fed but are non-essential to growth. Such non-essential cross-feeding 257 often involved metabolites that were rarely if ever depended on (such as acetaldehyde and pyruvate) and 258 The frequencies of metabolites' availability, cross-feeding, and dependence are shown for species of each interaction type and for each metabolite. Each set of nested circles shows the frequency at which the given metabolite is produced by their partner species and hence available for uptake (blue), the frequency at which this metabolite is utilized by the species through cross-feeding (yellow), and the frequency at which this metabolites is depended on (red). The area of the circle scales with the frequency, but for visualization purposes the portions of the circle extending beyond the rectangular box are not shown. (B) The distributions of evolutionary time (measured as number of gene deletions) elapsed between the different stages of metabolic interaction, for commensal species dependent on a single metabolite.
were observed at similar frequencies in species across all interaction types. Interestingly, however, we 259 also detected non-essential cross-feeding of metabolites that were commonly depended upon, but these 260 were rare in independent communities and occurred surprisingly often in commensal communities where 261 providers were cross-fed such metabolites by the dependent partner (see for example, tyrosine and 262 phenylalanine in Fig 4A). This finding suggests that species cooperation may involve two species that 263 evolve a similar metabolic strategy (and therefore have the potential to both excrete and utilize a similar 264 set of metabolites). In such cases, cross-feeding is likely to emerge, first as a non-essential process, which 265 may later evolve into species commensal or mutual dependence. To confirm this hypothesis, we 266 specifically examined, for each dependent species, the time that elapsed from when this species started 267 consuming a metabolite via cross-feeding to when it became dependent on that metabolite. We find that 268 indeed, in most cases dependence does not immediately follow cross-feeding, and there is often a 269 substantial delay between cross-feeding and dependency ( Fig 4B). 270 Clearly, uptaking a metabolite is only possible if the partner species is producing that metabolite 271 and excreting it to the shared environment, thereby providing an opportunity for cross-feeding. We 272 additionally quantified the frequency and time at which such opportunities arose, regardless of whether 273 the metabolite was consumed or not ( Fig 4A, blue circles). We found that metabolites vary greatly in the 274 frequency at which they are excreted, and in a way that is not fully correlated with the frequency at which 275 they are cross-fed or dependent on. For example, various metabolites, including cytidine, succinate, and 276 acetate, are excreted at relatively similar frequencies in all interaction types, suggesting that dependency 277 on these metabolites is not limited by their availability. Conversely, other metabolites, such as serine and 278 thymidine are rarely excreted in independent communities, suggesting that the availability of these 279 metabolites often lead to cross-feeding and dependency on them. Most importantly, while cross feeding 280 often started almost immediately after the metabolite was available (in cases in which it occurred; see 281 Fig 4B), in many cases evolving species failed to utilize available metabolites, thus completely missing 282 cross-feeding (both essential and non-essential) opportunities (Fig 4A). This finding implies an intriguing 283 dichotomy where available opportunities are either utilized immediately or are not utilized at all, 284 potentially due to evolutionary constraints. 285 We finally examined the total number of different metabolites being excreted by each species 286 over time, hypothesizing that species that excrete useful metabolites early on are more likely to become 287 provider species. Surprisingly, however, we found that during the first half of the evolutionary process 288 future providers in fact tend to excrete a similar or even a smaller number of metabolites on average 289 compared to future dependents (Fig S1), and only toward the end of the evolutionary process did 290 providers excrete more metabolites than dependent species. This pattern could suggest that species that 291 eventually became dependent were less optimal early on, excreting more waste products, and that this 292 wasteful behavior may have led to the development of dependence. Notably, all species gradually excrete 293 more metabolites over the course of the evolution process, likely reflecting more complex growth 294 strategies imposed by their shrinking genome). 295

The genomic basis of evolved interactions 296
Our mechanistic model of microbial metabolism allows us to move beyond a phenotype-level 297 description of evolved communities and to directly investigate patterns of genome evolution and identify 298 genomic mechanisms involved in species interactions. We first examined the number of genes that were 299 retained or lost in different simulations to explore the relationship between genome size (in terms of the 300 number of genes retained) and species interaction. Surprisingly, with the exception of collapsed 301 communities, evolutionary trajectories exhibited a markedly low variation in the total number of genes 302 retained (note, for example, the similar length of the simulation runs illustrated in Fig 2B), with an 303 average of 298.6±4.4 genes retained in each species. Yet, when comparing species from communities of 304 different interaction types, we found that both dependent and mutualistic species had slightly but 305 significantly smaller genomes compared to independent species (P < 10 -30 and P < 10 -9 respectively; two 306 sample t-test; Fig 5A). Moreover, within commensal communities, the genomes of dependent species 307 were slightly but significantly smaller than the genomes of provider species (P < 10 -30 ). Notably, while 308 these differences in average genome size were generally very small (often less than a single gene), the 309 differences between the smallest genomes observed in the dependent or mutualistic species and the 310 smallest genomes observed in provider or independent species was much larger (278-287 genes; Fig 5A). 311 These results are consistent with the idea that cross-feeding allows dependent species to lose genes they 312 would not be able to lose otherwise [14,43]. Moreover, provider species had on average a slightly larger 313 genomes than independent species (P < 10 -3 ), suggesting that provider species are a potential consequence 314 of evolutionary trajectories that ended with larger minimal genomes. Notably, examining the genes 315 retained across the evolved species, we found that of the initial 1260 genes present in the ancestral 316 species, 560 were always lost and 149 were always retained, with only 551 genes being retained at 317 intermediate frequencies (Fig 5B). The specific subset of these 551 genes that were retained in each 318 evolved species therefore determines the types of interaction that emerged, and indeed a statistical 319 analysis was able to detect specific genes whose retention or loss was associated with specific types of 320

interaction (see Supporting Text). 321
We next examined how similar, on average, are the sets of genes retained between the two 322 partners in each community. We found that mutualistic species were less similar to each other than 323 independent species (P < 10 -20 ; two sample t-test). This finding suggests that the evolution of metabolic 324 dependency is associated with a process of functional diversification, where each of the two species 325 retains certain metabolic capacities that the other species has lost. To further investigate this 326 diversification process we turned our attention again to commensal communities, in which the two 327 species can be labeled clearly as dependent and provider and therefore the direction of dependency is 328

clear. Such communities also represent an intermediate level of interaction as compared to mutualistic 329
communities, in which each species is acting as both provider and dependent, which complicates 330 dissection of the mechanism of interaction. Indeed, the two partner species in commensal communities 331 were more similar to one another than species in mutualistic communities (P = 1.4 x 10 -7 ) but more 332 divergent than species in independent communities (P < 10 -20 ). 333 To better understand the diversification between providers and dependent species in commensal 334 communities, we compared the set of genes retained in providers vs. those retained in dependent, 335 identifying 80 genes that are more frequently retained in the provider and 41 that are more frequently 336 retained in the dependent species. For example, pflA, a pyruvate formate lyase, was retained in 55.5% of 337 providers but only 16.8% of dependents, whereas aceE and aceF, both components of the pyruvate 338 dehydrogenase complex, were retained in 81.1% in dependent species and only 19.0% in provider species 339 (these genes were also those with the greatest differential retention rate). This previous analysis identified 340 genes more often retained in the provider or dependent species, yet ignored information about which of 341 these retentions co-occurred in the same provider-dependent community. Applying a hypergeometric test 342 (see Methods), we therefore identified a set of 263 pairs of genes that are significantly exclusively-343 retained in commensal communities (i.e., the dependent species retained the first gene when the provider 344 lost the second gene or vice-versa more often than expected by chance; Fig S2). We further found that 345 this set of exclusively-retained gene pairs was enriched for pairs that shared a pathway annotation (P < 346 10 −4 ; permutation-based test), suggesting complementation at the pathway level. 347 We finally set out to examine the dynamics of gene deletion events in commensal communities, 348 specifically focusing on the order in which deletions occurred in the provider and dependent species and 349 aiming to detect dependencies between these deletion events that could highlight key evolutionary steps 350 on the route to cross-feeding. To this end, we used a permutation-based analysis to identify instances 351 where a gene in one species tended to be deleted after another gene was deleted in the partner species (see 352 Methods). We identified 9 such gene pairs (at 1% FDR), all of which involved a gene being deleted in the 353 dependent species significantly more often after a different gene was first deleted in the provider. instead of phenylalanine. These deletions therefore promote over production and excretion of tyrosine by 360 the provider, allowing the dependent to lose tyrA, a gene necessary for tyrosine synthesis. The deletion of 361 the pheA, a gene necessary for phenylalanine synthesis in the dependent, was also found to follow the 362 deletion of talA and talB in the provider, which is not surprising given the similarity in the biosynthesis 363 pathways of these two amino acids. Finally the deletion of pyrG in the dependent tended to follow the 364 deletion of cdd, cmk, and codA in the provider. The deletion of cmk (necessary for recycling CMP into 365 CTP) and of cdd and codA (catalyzing reactions that could convert CMP or related products into other 366 bases) could result in cytidine excretion and accordingly allows the dependent to lose pyrG (a component 367 of CTP synthase) which creates a dependency on cytidine (Fig S3). To further examine the mechanism 368 involved in these interactions, we tested whether the deletions of these key genes are sufficient to cause 369 over-production and excretion of the relevant metabolites. Indeed, we found that deletion of cdd, cmk, and 370 codA in the ancestral species (i.e., without any additional gene deletions) led to cytidine excretion. 371 Deletion of talA, talB, aroP, and pheP in the ancestral species, however, was insufficient to cause 372 excretion of either phenylalanine or tyrosine, suggesting that additional gene deletions are necessary to 373 give rise to this phenotype. 374 Linking genome evolution to metabolite cross-feeding 375 Having identified both the metabolites involved in cross-feeding and the genes involved in the 376 emergence of species interaction, we finally turned to examine the association between specific gene 377 retention or loss events and the cross-feeding of specific metabolites. Towards this end we again 378 considered the set of all commensal communities and, for each of the 14 cross-fed metabolites that were 379 depended upon at least 10 times, identified genes whose retention or deletion are significantly correlated 380 with the excretion of this metabolite in the provider or with the dependency on this metabolite in the 381 dependent (see methods). In total we identified 384 gene-metabolite associations in provider species, 382 including 226 significant gene retentions and 158 significant gene deletions associated with essential 383 metabolite excretion (Fig 6; χ 2 test, 1% FDR). We similarly identified 459 gene-metabolite associations in 384 dependent species (277 retentions and 182 deletions) associated with metabolite dependency. In total, 385 retention or loss of 197 of the 510 variable genes was significantly associated with excretion of or 386 dependence on at least one metabolite. Moreover, many genes were associated with the excretion of or 387 dependence on more than one metabolite, likely reflecting the interconnectedness of the metabolic 388 network where the loss of a key gene could affect multiple metabolic phenotypes. 389 Interestingly, certain genes were associated with both the excretion of a metabolite by the 390 provider and with the dependence on that same metabolite in the dependent species. For example, 391 retention of tyrP and aroP (aromatic amino acid transporters) were associated with both providing 392 tyrosine and dependence on tyrosine, as cross-feeding of that metabolite requires that both species in the 393 pair could exchange it with the shared environment. In other cases the same gene was associated with a 394 metabolite being excreted by the provider and utilized by the dependent but in a different direction. For 395 Figure 6: Associations between providing or depending on specific metabolites and deletion or retention of specific genes. Orange (purple) bars indicate genes that are lost (retained) significantly more often in species that provide or depend on a specific metabolites compared to independent species. Only associations with an absolute difference in gene retention frequency between the provider/dependent species and independent species ≥10% are shown. Genes are sorted by deletion frequency among all species from non-collapsed communities. example serA and serBgenes involved in serine biosynthesistended to be lost in species dependent on 396 serine, but retained in species producing it. In rare cases there were genes whose loss was associated with 397 both dependence and providing of a specific metabolite. For example, the loss of the gene codA was 398 associated with both excretion of cytidine by providers and dependency on cytidine in dependent species 399 (and see our analysis of that gene above). Combined, these associations suggest a complex link between 400 evolutionary gene loss events and the emergence of metabolic species interactions and highlight the 401 multitude of ways through which such interactions could evolve. 402 403

404
In this study we investigated the potential for metabolic mutualism to emerge between species 405 inhabiting a shared, isolated environment and undergoing continual gene loss. We found that cross-406 feeding interactions emerged frequently (although two-way mutualistic interactions were much rarer). By 407 examining associations between the loss and retention of genes and cross-fed metabolites we were able to 408 elucidate evolved mechanisms of metabolic overproduction and dependence and to identify the extent to 409 which the genetic and metabolic architectures imposed constraints on this process. 410 Importantly, evolved communities exhibited complex, multifaceted, and non-trivial metabolic 411 interactions that were not necessarily optimized at the community level; useful metabolites were often 412 excreted by one species but not utilized by its partner and other metabolites were cross fed without 413 evolving complete dependency. Such "messy" interactions and missed metabolic opportunities are a 414 reasonable outcome of selfish species evolution in the absence of explicit selection for interaction, and 415 could also occur in natural communities. Another potential contributor to this complexity is the 416 interconnectedness of different metabolic phenotypes induced by the genetic and metabolic architecture. 417 Indeed, our analysis has demonstrated that genes were often associated with the excretion and/or 418 production of multiple different metabolites. This interconnectedness may also account for the relatively 419 frequent occurrence of provider species utilizing metabolites excreted by their dependent partners, where 420 the gene retention and loss events that cause a dependent relationship in one direction may also facilitate 421 emergence of a reciprocal cross-feeding relationship. Interestingly, however, in our simulations, 422 metabolic dependency usually involved a single metabolite, while real world mutualistic endosymbionts 423 often exchange and are dependent on multiple metabolites [13]. One potential explanation is that in our 424 model bacteria continue to grow optimally (given the metabolic capacities encoded by their reduced 425 genome), whereas in reality extreme genome reduction likely impacts cell regulation and control of 426 growth, potentially causing cells to excrete a larger variety of useful metabolites that could be beneficial 427 to their partners. Our analysis also suggests that the likelihood of missed metabolic opportunities may 428 vary across metabolites, with some metabolites (e.g., cytidine, succinate, and acetate) being excreted as 429 relatively similar frequencies in all interaction types and others (e.g., serine and thymidine) being rarely 430 excreted in independent communities. 431 Our analysis additionally demonstrated how functional diversification leads to metabolic 432 cooperation, where each species retains certain metabolic capacities that the other species has lost. Given 433 a diversification process, it is interesting, however, to speculate about what causes one community to 434 evolve a commensal interaction and another to evolve a mutualistic interaction. We found, for example, 435 that provider species had on average a slightly larger genome than independent species, suggesting that a 436 provider state is the outcome of more constrained evolutionary trajectories that end with larger minimal 437 genomes. Our finding that dependent species in fact tend to excrete more metabolites at the beginning of 438 the evolutionary process might further imply that early 'wasteful' behavior may contribute to the 439 evolution of dependence. Another interesting outcome of our results was the dichotomy observed when a 440 new metabolite became available, with species either starting to consume it immediately and later 441 becoming dependent on it or never consuming it at all. These missed opportunities seem to be examples 442 where the evolutionary events that occurred before the availability of the metabolite precluded utilization 443 of that metabolite by potentially losing the necessary transporter or other reactions necessary for uptake. 444 With this in mind, the non-essential cross feeding observed in commensal communities may simply 445 represent communities that were on the path toward mutualism, but where cross-feeding emerged too late 446 in the evolutionary process when the providers have already lost genes that would be necessary for 447

dependence. 448
Despite these exciting results, there are clearly some caveats in the framework used in this work. 449 For example, our framework assumes that bacteria grow selfishly, and accordingly cross-feeding often 450 requires extensive gene deletions to force excretion of useful metabolites. In reality bacteria can be leaky 451 and release metabolites into their environment even without mutations [17,24]. Another drawback stems 452 from the fact that FBA does not take into account factors such as entropy or pH. For example, the 453 emergence of formate cross-feeding that occurred in our simulations might be less biologically feasible 454 because excess formate accumulation inhibits E. coli growth and acidifies the local environment [44]. 455 Additionally, we model genome reduction as occurring one gene at a time [45], but do not account for the 456 possibility of simultaneous loss of larger genomic regions [46]. Such a process could give rise to different 457 patterns, although a study of a single reductively evolving species that examined both evolutionary 458 regimes did not observe qualitative differences [36]. Moreover, in consideration of simulation time, in 459 this study we only considered two-genotype communities. It would be interesting to expand our 460 framework and to model the evolution of more complex communities or to account for spatial 461 heterogeneity [47,48]. Finally, two decisions in the design of our study that likely had significant impacts 462 on the outcomes were the media and the fitness cutoff. The media used is M9 minimal media, which was 463 chosen to be more permissive of cross-feeding than rich media. Clearly, choosing different media or 464 different limiting concentrations could impact the type of interactions that would evolve [49]. The fitness 465 cutoff used was chosen as an intermediate value between the two cutoffs used by [36], and as shown in 466 the supplementary text was found to have a significant effect on the frequency of evolved commensalism 467 and mutualism. Since this fitness cutoff represents the strength of selection, a permissive fitness cutoff (as 468 the one used in our study) allows genetic drift to play a dominant role in determining evolutionary 469 trajectories, in agreement with the balance between selection and genetic drift hypothesize to govern the 470 evolution of endosymbionts [26]. 471 Looking forward, the framework presented in this study could be broadly relevant for improving 472 our understanding of how mutualistic relationships can naturally emerge between bacterial species. This, 473 in turn, would facilitate a deeper understanding of both simple communities, as in the case of insect 474 endosymbionts, and significantly more complex communities, such as those inhabiting the human gut. The chosen gene and all the metabolic reactions that depend on this gene were deleted from the species' 488 model. The fitness effect of this deletion in the context of the community was determined using the co-489 culture growth model (see below) to evaluate the growth rate of the reduced model when grown with the 490 current model of the partner species. If the calculated fitness effect (when compared to the fitness of that 491 species prior to this gene deletion) was positive, neutral, or smaller than the chosen cutoff (cutoffs used 492 include 1%, 5%, and 10%), the deletion became permanent and the process repeated with the reduced 493 model. However, if the fitness effect exceeded the cutoff, the deletion was considered to be too harmful to 494 occur and the process repeated until a gene that could be deleted was found. This evolutionary process 495 continued until deletion of any remaining gene from either of the two species would cause a drop in 496 fitness exceeding the cutoff, in which case the simulation ended. The simulation also ended if the chosen 497 gene deletion in one species (i.e., a gene deletion that was relatively harmless for that species) caused the 498 other species to drop significantly in fitness (>50%) in the co-culture. Such simulations, where a partner 499 species was no longer being supported, represent collapsed communities. 500

501
The co-culture growth simulation was based on a previously introduced dynamic Flux-Balance 502 Analysis framework and is described in more detail elsewhere [37]. Briefly, given a multi-species 503 community inhabiting a shared medium, the framework assumed that at each time step, each species grew 504 optimally given the current concentration of metabolites in the medium (i.e., selfish growth), and then 505 updated the abundance of each species and the concentration of metabolites in the medium based on the 506 predicted growth and activity of each species. Specifically, at each time step, the framework first 507 calculated the upper bound on metabolites' uptake for each species based on the concentration of 508 metabolites in the medium and the cell density of each species. A Flux Balance Analysis (FBA) was then 509 used to determine the fluxes through each species' reactions given these uptake constraints by 510 maximizing the species' biomass production (as a proxy for growth). A second optimization was 511 performed to minimize the total flux through all reactions while keeping the biomass production fixed at 512 the maximum rate (representing a minimization of enzyme usage). The predicted growth rate of each 513 species and the predicted rates at which each species uptakes and excretes various metabolites were then 514 used to update the cell density and concentrations of metabolites in the medium. The process was then 515 repeated at the next time step. 516 For the purpose of this study, each co-culture simulation consisted of 8 steps of 0.125 hours 517 followed by 4 steps of 0.5 hours. This provided a more accurate account of species growth at the initiation 518 of any potential interaction, while still providing information about the co-culture growth at a longer time 519 scale. The growth rates at the last time point (i.e., after 2.5 hours) were used as a measure of each species' 520 fitness. Both species started at a biomass of 0.01 grams dry mass in 1L volume for mono-culture or 2L for 521 co-culture, resulting in the same cell density for both (which is equal to about 4*10^7 cells per liter for E. 522 coli). The species were grown on a medium based on M9 minimal media [51], containing sodium, 523 chloride, sulfate, inorganic phosphate, potassium, magnesium, ammonia, glucose, water, hydrogen, and 524 oxygen. In addition the metals copper, iron, molybdate, manganese, zinc, nickel, and cobalt were included 525 as they are necessary for growth of the E. coli model. These metabolites were all present in the medium at 526 an excess concentration of 10M to ensure exponential growth for the entire course of the co-culture 527 simulation. A low concentration (0.0001 mM) of jumpstart metabolites were also included to allow 528 growth of obligate mutualistic pairs (see below). FBA solutions were calculated using glpkmex, a Matlab 529 interface for GLPK, GNU Linear Programing Kit. GLPK version 4.54 was used, and glpkmex version 530 2.11. 531

532
Simulating the growth of species that evolved to be obligate mutualists with a dynamic FBA 533 model has the inherent problem that neither species is able to grow initially on the minimal medium (and 534 consequently will not excrete any of the byproducts needed to allow the other species to grow). In 535 biological systems this problem can be overcome by heterogeneity in the growth phenotypes of individual 536 cells, nutrients released by dead cells, or trace nutrients present in the environment. Rather than 537 simulating diverse growth phenotypes or cell death, in this study we jumpstarted mutualistic growth by 538 supplementing the minimal growth medium described above with trace amounts of potentially necessary 539 metabolites. The set of these "jumpstart metabolites" was determined by identifying metabolites that 540 could be produced by non-transfer reactions still present in at least one of the two species (even if the 541 pathway was not complete). This set therefore represented an upper bound on which metabolites could be 542 exchanged. Jumpstart metabolites were initialized at a low concentration of 0.0001 mM. To ensure that 543 species that utilized these metabolites for growth could eventually be supported by the production of these 544 metabolites by the partner species (rather than continually relying on the trace amounts of these 545 metabolite provided at the beginning of the simulation), at 1 hour into the growth simulation, this same 546 low concentration (0.0001 mM) was subtracted from each jumpstart metabolite. 547

548
Simulation runtime considerations necessitated using a relatively limited time resolution in the 549 co-culture growth simulation (see above). To confirm that the evolved communities were not affected by 550 this, for each completed simulation we ran additional co-culture growth simulations on the resulting 551 minimal models using a finer time resolution. Specifically, co-culture growth was simulated until the 552 medium was exhausted with time steps of 0.1 hours, using otherwise the same conditions as the co-553 culture growth model employed during evolution (including removing the jumpstart metabolites at 1 554 hour). Community growth was deemed to have been accurately simulated if: 555 1. Glucose eventually ran out, indicating that the two species were able to continue growing stably. 556 2. Both species were able to continue growth until this exhaustion of the media. Growth of both species 557 must have been at least 50% of their measured fitness value within the last hour before all growth 558 ended. 559 3. The growth rate of both species at 2.5 ± 0.2 hours was at least 90% of their fitness value as measured 560 during the course of the evolution. 561 Simulations that failed any of these three criteria were excluded from the downstream analysis. Of the 562 16377 completed simulation runs, 16, 317 (99.6%) passed this filtering step. 563

564
Interaction type was determined by comparing the fitness of each species when grown in co-565 culture with its fitness when grown in mono-culture. If the fitness of a species at a given time point was 566 zero in mono-culture and non-zero in co-culture, the species was labeled as dependent at that time. If it 567 had non-zero growth in both mono-and co-culture it was labeled as independent. Communities were 568 labeled by the relationships between the two species: If both species were independent, the community 569 was labeled as independent. If one species was dependent and the other was independent, the community 570 was labeled as commensal. If both species in a community were dependent, the community was labeled as 571 mutualistic. Within commensal communities, the dependent species was referred to as 'dependent' and 572 the independent species was referred to as 'provider'. 573

574
For each evolved species we determined what metabolites it depends on (if any). To this end, we 575 first identified metabolites that were being exchanged between the two species at the final co-culture 576 growth time point by finding exchange reactions for which the two species had fluxes of opposite sign. 577 The growth of each dependent species in the pair was then assayed on minimal medium supplemented 578 with all possible combinations of these exchanged metabolites, using a single time step mono-culture 579 growth model, to identify the smallest set of supplement metabolites that allowed it to grow at >50% its 580 growth rate in co-culture. If no combination of supplement metabolites allowed such growth the search 581 was expanded to include all combinations of metabolites present in the medium at the end of the co-582 culture simulation and that were not part of the minimal media (such metabolites could have been 583 excreted by the provider at previous time steps). 584

585
Pathways annotations for each gene in the model were obtained from the Kyoto Encyclopedia of 586 Genes and Genomes (KEGG) [52]. To identify enrichment of KEGG pathways in subsets of genes (e.g., 587 those that were deleted at significantly different rates between interaction types), we generated 100,000 588 random subsets of genes of the same size and compared the total number of genes associated with each 589 pathway in the real set to the number of genes associated with that pathway in random sets. 590

591
To compare the similarity of two genomes (e.g., in an evolved community), we used the Jaccard 592 similarity coefficient. We then used a two-sample t-test to test for significant differences in similarity 593 between different types of communities. 594

595
To identify gene-gene co-occurrence relationships, we examined all pairwise combinations of 596 genes that were both retained and deleted at least three times. Since many genes perfectly co-varied with 597 each other across simulations, we first grouped genes into sets of perfectly co-varying genes and 598 identified co-occurrence relationships between these sets. For each pair of gene sets, we found the number 599 of species that had retained each set and the number of species that had retained both sets and used a 600 hypergeometric test to determine whether these sets have been co-retrained significantly more or less 601 often than expected by chance (at 1% false discovery rate; [53]). Test for enrichment of shared pathways 602 among the significant gene set pairs was done by permuting the connections between pairs. 603

604
Given a pair of genes, A and B, we recorded the number of times gene A in the dependent was 605 deleted before gene B in the provider. We then used a permutation-based assay, permuting the time of 606 deletion (measured as the position in the ordering of all gene deletions in that simulation) of each gene 607 between all providers or dependents from commensal communities in which that gene had been deleted. 608 Gaps and overlaps in the resulting permuted gene deletion histories were resolved by shifting deletions 609 into gaps and randomly breaking ties. The number of times gene A in the dependent was deleted before 610 gene B in the provider in the original data was compared to this number in the permuted data to identify 611 significantly common ordered pairs of gene deletes (at 1% false discovery rate). 612

613
To identify correlations between retention or deletion of specific genes and metabolic 614 phenotypes, we considered all genes that were both deleted and retained at least 10 times and all 615 metabolites that were depended upon at least 10 times. For every pair of such genes and metabolites, we 616 compared the frequency of deletion of that gene in commensal species that are dependent on that 617 metabolite to the frequency of deletion of that gene in independent species. This was repeated for 618 commensal species that provided the metabolite their partner depends upon, and in both cases instances of 619 genes being deleted more often or retained more often in species with that metabolic phenotype were 620 identified (at 1% false discovery rate). 621