### Representing lipid constraints with the aid of SLIME reactions

Flux balance analysis (FBA) [21] is based on the following assumptions on metabolism: (i) a cell has a metabolic goal, which we can represent through a mathematical objective function, (ii) under short timescales there is no accumulation of intracellular metabolites, and (iii) metabolic fluxes are bounded to physical constraints, such as thermodynamics and kinetics. Those three assumptions define a basic FBA problem as followed:

$$ {\displaystyle \begin{array}{c}\operatorname{Min}\left({\mathrm{c}}^{\mathrm{T}}\mathrm{v}\right)\\ {}\mathrm{S}\cdot \mathrm{v}=0\\ {}\mathrm{LB}\le \mathrm{v}\le \mathrm{UB}\end{array}} $$

(1)

where S is the stoichiometric matrix, which contains the stoichiometric coefficients for all reactions and metabolites, v is the vector of metabolic fluxes [mmol/gDWh], c is the objective function vector, and LB and UB are the corresponding lower and upper bounds for each of the fluxes (some of them based on experimental values). As we usually wish to simulate growth, a biomass pseudo-reaction is typically added to the stoichiometric matrix as follows:

$$ \mathrm{protein}+\mathrm{carbohydrate}+\mathrm{lipid}+\mathrm{RNA}+\mathrm{DNA}+\dots \to \mathrm{Biomass} $$

(2)

where protein, carbohydrate, lipid, etc., are pseudo-metabolites that are produced from a combination of metabolic components. For example, protein is produced from a protein pseudo-reaction:

$$ {\mathrm{s}}_1\ \mathrm{ala}+{\mathrm{s}}_2\mathrm{cys}+{\mathrm{s}}_3\ \mathrm{asp}+\dots \to \mathrm{protein} $$

(3)

where s_{i} are the measured abundances [mmol/gDW] of the corresponding amino acids in yeast. In this study we focus on the lipid pseudo-reaction, which becomes more challenging to formulate, because there are so many different individual species. One option is to define an equivalent reaction to Eq.3 with every single lipid species [10]; however, as these measurements are not available for most organisms, the lipid pseudo-reaction is usually represented as the following instead:

$$ {\mathrm{s}}_1\ \mathrm{PI}+{\mathrm{s}}_2\ \mathrm{PC}+{\mathrm{s}}_3\ \mathrm{TAG}+{\mathrm{s}}_4\ \mathrm{ERG}\dots \to \mathrm{lipid} $$

(4)

where PI (phosphoinositol), PC (phosphocholine), TAG (triglyceride), ergosterol (ERG), etc. represent each of the lipid classes that exist in the model. Most of them represent not one but a plurality of different molecules, each with different combinations of acyl chain lengths and saturations. Therefore, they are also pseudo-metabolites that need to be produced in turn by additional pseudo-reactions. These pseudo-reactions can be constructed either in a restrictive or permissive approach (Fig. 1). The restrictive approach is to enforce the experimental FAME distribution to every single specific lipid species. This can be achieved by creating a generic acyl chain component [12]:

$$ {\mathrm{s}}_1\ \mathrm{Acyl}\ \mathrm{CoA}\left(16:0\right)+{\mathrm{s}}_2\ \mathrm{Acyl}\ \mathrm{CoA}\left(16:1\right)+\dots \to \mathrm{Acyl}\ \mathrm{CoA} $$

(5)

where the s_{i} coefficients are fractions inferred from FAME data. This generic Acyl – CoA is then used to form each generic lipid species, forcing then every lipid class to have the same acyl chain distribution. The latter is an important limitation of this approach, considering that the acyl chain distribution can vary significantly across lipid classes [5].

On the other hand, the permissive approach for building the pseudo-reactions is to allow any of the specific lipids to form the generic lipid class [15]. For instance, the following set of pseudo-reactions can be defined for PI:

$$ {\displaystyle \begin{array}{c}\mathrm{PI}\ \left(16:0-16:1\right)\to {\mathrm{s}}_1\mathrm{PI}\\ {}\mathrm{PI}\ \left(16:1-16:1\right)\to {\mathrm{s}}_2\mathrm{PI}\\ {}\mathrm{PI}\ \left(16:1-18:1\right)\to {\mathrm{s}}_3\mathrm{PI}\\ {}\vdots \end{array}} $$

(6)

where s_{i} can be set to 1 or adapted to represent the cost of producing each specific lipid. The problem of this approach is that it disregards the acyl chain distribution, even if FAME data is available. Therefore, once simulations are computed, the model will always end up preferring the “cheapest” species to produce in terms of carbon and energy, usually corresponding to species with the shortest acyl chains, unless the s_{i} coefficients are arbitrarily tuned to favor longer chains.

Additionally, even though for some specific species such as ergosterol the measured abundance [mg/gDW] can be directly transformed to the stoichiometric coefficient in Eq.4 [mmol/gDW], for most lipids the measured abundance cannot be directly converted, as the molecular weight varies between specific lipid species. Hence, average molecular weights need to be estimated in both permissive and restrictive approaches, leading to skewed predictions.

In this study we solve all the problems presented above through two new types of pseudo-reactions, to account for both constraints on lipid classes and on acyl chains. The first pseudo-reactions **S**plit **L**ipids **I**nto **M**easurable **E**ntities and are hence referred to as SLIME reactions. As the name suggests, these pseudo-reactions take each specific lipid and split it into its basic components, i.e. its backbone and acyl chains:

$$ {\mathrm{L}}_{\mathrm{i}\mathrm{j}}\to {\mathrm{s}}_{\mathrm{i}}\ {\mathrm{B}}_{\mathrm{i}}+\sum \limits_{\mathrm{k}\in \mathrm{j}}{\mathrm{s}}_{\mathrm{jk}}\ {\mathrm{C}}_{\mathrm{k}} $$

(7)

where L_{ij} is a lipid of class i and chain configuration j, B_{i} the corresponding backbone, C_{k} the corresponding chain k, and s_{i} and s_{jk} the associated stoichiometry coefficients. These reactions replace any pseudo-reaction of the sort of Eq.5 or Eq.6 that were already present in the model (Fig. 1).

The second type of pseudo-reactions are new lipid pseudo-reactions, which will in turn replace Eq.4, the old lipid pseudo-reaction that only constrained lipid classes. There are now three different lipid pseudo reactions (Fig. 1): the first pulls all backbone species created in Eq.7 into a generic backbone and uses the corresponding abundance data [g/gDW] as stoichiometric coefficients. The second reaction does the same but for the specific acyl chains, with data from FAME analysis [g/gDW], to create a generic acyl chain. Finally, the third reaction merges back together the generic backbone and the generic acyl chain into a generic lipid, which will be used in the biomass pseudo-reaction as in Eq.2.

For the new reactions to be consistent, we need to choose adequate stoichiometric coefficients for Eq.7. If the abundance data would be molar, s_{i} would be equal to 1 and s_{jk} would be equal to the number of repetitions of the corresponding acyl chain k in lipid j. However, as the abundance data often comes in mass units, s_{i} must be equal to the molecular weight [g/mmol] of the full lipid, and s_{jk} must be equal to the molecular weight of the corresponding acyl chain k, multiplied by the number of repetitions of k in configuration j. By choosing these values we allow the SLIME reactions to convert the molar production of the lipid [mmol/gDWh] into a mass basis [g/gDWh], which in turn will be converted to a lipid turnover [1/h] by the lipid pseudo reactions.

### Improved model of yeast

We implemented SLIMEr in the consensus genome-scale model of yeast version 7.8.0 [22], a model which used the previously mentioned permissive approach, and had at the start 2224 metabolites and 3496 reactions. Out of those reactions, 176 corresponded to reactions of the sort of Eq.6, which were replaced by 186 SLIME reactions that cover in total 19 lipid classes and 6 different acyl chains. An additional 27 metabolites (including both specific and generic backbones and acyl chains) and 15 reactions (including transport reactions, lipid pseudo-reactions and exchange reactions) were added to the model, and 10 metabolites and 1 reaction (connected to previously deleted reactions) were removed. The final enhanced model had therefore 2241 metabolites and 3520 reactions, and kept the number of genes and gene-reaction rules constant, as only pseudo-reactions with no gene-reaction rules were modified.

For the reference model, we used both lipid profiling and FAME data at low growth rate and no stress conditions [23]. The lipid profile was rescaled to be proportional to the FAME data, as detailed in the methods section. We can see that by using SLIMEr, the enhanced model was enforced to follow the acyl chain distribution of the experimental data (Fig. 2a), whereas the previous permissive model predicted mostly acyl chains of 16-carbon length (less costly), and only a small amount of 18-carbon length to satisfy the requirement of ergosterol ester (Additional file 1: Figure S1), as ergosteryl oleate is cheaper to produce mass-wise than ergosteryl palmitoleate (Additional file 1: Table S1).

With the enhanced model we also studied in how many ways lipid requirements can be satisfied spending the same amount of energy, by performing flux variability analysis (FVA) (Fig. 2b). Comparing these predictions to the ones of the permissive model (Additional file 1: Figure S1), we saw some reductions in variability, coming mostly from changes in phosphatidylcholine and triglyceride content. However, despite the additional constraints imposed, lipid metabolism could still rearrange itself in a wide amount of combinations, and overall flux variability did not decrease significantly (Additional file 1: Figure S2a). This agrees with experimental observations that lipid metabolism is highly flexible [5]; therefore, handling lipid metabolism with SLIME reactions is preferred over alternative approaches, such as models that constrain single individual lipid species [10, 24], as the latter limit the organism to only one feasible state of lipid metabolism and hence bias results.

### Model predictions of specific lipid distributions

To validate model predictions, we used reported data [5] including measurements of 102 specific lipid species. This data was added up to compute the totals of each lipid class and each acyl chain, and these sums were in turn used as input for creating both a permissive and an enhanced model. In the latter case, as a total lipid abundance of 8% was assumed, the acyl chain abundances were rescaled to be proportional to the lipid classes abundances (see the methods section for more details). We then performed random sampling of fluxes for the resulting models, to generate 10,000 specific lipid distributions for each model and for each of the 8 conditions of the study.

Comparing these in silico lipid distributions to the original in vivo measurements (Fig. 3a), the enhanced model improved the average prediction error for all experimental conditions (Additional file 1: Table S2), and overall the simulated lipid distributions came much closer to the experimental values compared to the permissive model (Fig. 3b). Furthermore, simulations with the enhanced model are also superior to simulations with a restrictive model, as the latter approach would not capture the fact that many lipid classes show preference towards few acyl chains. Instead, it would force the model to produce all acyl chains in the same proportion for each lipid class, significantly lowering the quality of predictions (Additional file 1: Figure S3).

It should be noted that even though SLIMEr improved the model’s lipid composition predictions, many other distributions are still predicted to be equally likely for all simulated conditions (Fig. 3b, Additional file 1: Figure S4); which reinforces the previously mentioned idea of a highly flexible lipid network. Furthermore, the fact that yeast picks a certain lipid distribution in vivo for each strain and condition, but has many additional options in silico, points also to a high level of regulation in place to adapt the distribution of lipid species in *S. cerevisiae* depending on the genetic background and environmental conditions [25].

### Energy costs at increasing levels of stress

As a final study, we used lipid data of yeast grown under 9 different stress levels [23] to create both a permissive and enhanced GEM for each of those conditions. We then computed the differences in ATP turnover and carbon requirements between the permissive and enhanced model, which correspond to the extra energy and carbon costs, respectively, required to achieve the given acyl chain distribution in each condition (Fig. 3c). As increasing stress levels are associated to an increase in maintenance energy (Additional file 1: Figure S5) [26], by using SLIMEr we therefore showed an increase in lipid expenses when transitioning from a metabolic state of low energy demand to high energy demand.

In the case of the reference condition, the permissive model could produce 145.9 μmol (ATP)/gDW more than the enhanced model. Also in this condition, the simulated growth-associated ATP maintenance (GAM) without accounting for known polymerization costs of proteins, carbohydrates, RNA and DNA [27] was of 36.96 mmol (ATP)/gDW, which corresponds to the maintenance costs of unspecified functions in the model, such as protein turnover, maintenance of membrane potentials, etc. The ATP cost for achieving correct acyl chain distribution under reference conditions corresponded then to 0.4% of the total costs of processes not included in the model. This is a rather low percentage, which shows that the addition of SLIME reactions will not cause a significant increase in the overall metabolic energy demand, while making the simulated fluxes in lipid metabolism better match experimentally observed distributions (Fig. 3b).