Metabolic modeling predicts metabolite changes in Mycobacterium tuberculosis

Background Mycobacterium tuberculosis (MTB) is the causal agent of the disease tuberculosis (TB). Metabolic adaptations are thought to be critical to the survival of MTB during pathogenesis. Computational tools that can be used to study MTB metabolism in silico and prioritize resource-intensive experimental work could significantly accelerate research. Results We have developed E-Flux-MFC, an enhancement of our original E-Flux method that enables the prediction of changes in the production of external and internal metabolites corresponding to gene expression measurements. We have used this method to simulate the changes in the metabolic state of Mycobacterium tuberculosis (MTB). We have validated the accuracy of E-Flux-MFC for predicting changes in lipids and metabolites during a hypoxia time course using previously published metabolomics and transcriptomics data. We have further validated the accuracy of the method for predicting changes in MTB lipids following the deletion and induction of two well-studied transcription factors (TFs). We have applied the method to predict the metabolic impact of the induction of each of the approximately 180 MTB TFs using a previously generated and publically available expression data set. Conclusions E-flux-MFC can be used to study global changes in MTB metabolites from gene expression data associated with environmental and genetic perturbations. The application of this method to a data set of MTB TF perturbations provides a resource for studying the large number of TFs whose functions remain unknown. Most TFs impact metabolites indirectly through the propagation of gene expression changes through the regulatory network rather than through their direct regulons. E-Flux-MFC is also applicable to any organism for which accurate metabolic models are available. Electronic supplementary material The online version of this article (doi:10.1186/s12918-015-0206-7) contains supplementary material, which is available to authorized users.

adaptations are thought to be critical to the survival of MTB during pathogenesis. Computational tools that can be used to study MTB metabolism in silico and prioritize resource-intensive experimental work could significantly accelerate research. Results: We have developed E-Flux-MFC, an enhancement of our original E-Flux method that enables the prediction of changes in the production of external and internal metabolites corresponding to gene expression measurements. We have used this method to simulate the changes in the metabolic state of Mycobacterium tuberculosis (MTB). We have validated the accuracy of E-Flux-MFC for predicting changes in lipids and metabolites during a hypoxia time course using previously published metabolomics and transcriptomics data. We have further validated the accuracy of the method for predicting changes in MTB lipids following the deletion and induction of two well-studied transcription factors (TFs). We have applied the method to predict the metabolic impact of the induction of each of the approximately 180 MTB TFs using a previously generated and publically available expression data set.
Conclusions: E-flux-MFC can be used to study global changes in MTB metabolites from gene expression data associated with environmental and genetic perturbations. The application of this method to a data set of MTB TF perturbations provides a resource for studying the large number of TFs whose functions remain unknown. Most TFs impact metabolites indirectly through the propagation of gene expression changes through the regulatory network rather than through their direct regulons. E-Flux-MFC is also applicable to any organism for which accurate metabolic models are available.

Background
Mycobacterium tuberculosis (MTB) is the causal agent of the disease tuberculosis (TB). With more than 9 million new cases of active disease and nearly 1.5 million deaths in 2013, TB is a global health emergency of substantial proportions [1]. This is amplified by the emergence of mono-resistant, multiple drug resistant (MDR), extensively drug resistant (XDR) and, most recently, totally drug resistant (TDR) strains of TB [2][3][4][5][6]. MTB is primarily transmitted to a new host via inhalation [7]. Within the lung MTB is phagocytosed by macrophages, which ultimately triggers the formation of a granuloma that contains the infected cells. The success of this microbe is due in part to its ability to survive within the granuloma for long periods of time (sometimes decades) in an asymptomatic state [8][9][10]. One-third of the world's population is latently infected with MTB [11].
Metabolic adaptations are thought to be critical to the survival of MTB during pathogenesis. Within the host, the bacterium must adapt to a range of stresses including hypoxic [12][13][14][15][16][17][18][19][20][21][22][23][24], acidic [25], nitrosative [26], and redox [27][28][29][30] stresses. The response of MTB to hypoxia in particular is characterized by widespread metabolic changes including the induction of cholesterol catabolism (even independent of the presence of cholesterol), alterations in the metabolism of triacylglycerides (TAG), alterations in methyl-branched lipids, the rapid release of free mycolates from trehalose dimycolates (TDM), and widespread changes in both intracellular and extracellular amino acid levels [31]. Restrictions in nutrient availability further characterize the host environment. Within the host, MTB shifts to lipids [32,33] and host-derived cholesterol [34][35][36][37][38][39] as primary nutrient sources. In addition, several lines of evidence indicate that an MTB infection results in metabolic reprogramming of the macrophage host [9,40,41]. Although the roles of a small number of transcriptional regulators have been well studied in connection to these adaptations [23,25,29,30], the regulatory mechanisms underlying these changes remain largely unknown. In fact, the potential regulatory roles of the vast majority of the approximately 180 predicted transcription factors of MTB remain unknown.
Experimental work with MTB is challenging and resource intensive. Work must be performed in a specialized BSL-3 environment, and fewer genetic tools exist for MTB relative to other organisms. Experimental work is also burdened by the extremely slow growth rate of the organism. MTB has a doubling time of approximately 24 h (as compared to 20 min for E. coli). Experiments with MTB thus require months to perform. Computational tools that can be used to study MTB in silico and prioritize resource-intensive experimental work could thus significantly accelerate research.
Computational metabolic modeling has been applied successfully to gain insight into the metabolism of MTB [42][43][44][45][46][47][48][49]. These efforts are complemented by a large number of computational studies into the metabolism of other non-pathogenic and pathogenic bacterial strains [50][51][52][53][54] as well as human metabolism [55,56]. Two widely used models of MTB metabolism have been published: iNJ661 [42] and GSMN-TB [43,57]. Further enhancements to these models have addressed shortcomings in the original model by incorporating new biochemical knowledge collected about key pathways in mycobacterial metabolism, allowing researchers to develop metabolic models specific to conditions that are important for understanding the processes that underlie virulence [47,57,58].
Genome-scale metabolic models capture information about both experimentally validated and computationally predicted biochemical processes within an organism in the form of a stoichiometric matrix. These models also describe the relationship between genes, proteins, and the enzymes that catalyze each reaction in the organism [59]. Flux balance analysis (FBA) is a method that is used to predict network metabolic capabilities at steady state [60][61][62][63][64]. The main assumption of standard FBA is the absence of the production or consumption of metabolites, outside of select source or sink metabolites describing nutrient uptake, waste secretion and biomass production [59]. Traditional implementations of FBA typically do not include terms that describe kinetic parameters, feedback inhibition, or the effects of transcriptional regulation on reaction fluxes. FBA has been used successfully to predict the metabolic phenotype of gene knockouts [65][66][67], to predict growth rates [43,51,57,[68][69][70], and to predict rates of metabolite uptake and secretion across time using quasi-steady-state modeling approaches [44,46,71,72].
Many methods have been developed to couple gene expression state with FBA and have been reviewed in depth by Lewis et al. [73]. One of the first of these methods, called rFBA, utilized Boolean constraints in order to simulate changes in metabolic flux in response to changes in environmental conditions and regulatory network perturbations [74]. Other methods, such as GIMME and iMAT, utilize gene expression microarray data in order to generate flux solutions that are consistent with a set of gene expression data [50,75,76]. Previously, we described a method called E-Flux that extends FBA by translating gene expression data into hard constraints on the maximum metabolic flux through individual reactions [46,68]. Using E-Flux, we predicted the impact of drugs on MTB mycolic acid biosynthesis [46]. By integrating expression data from a compendium of 437 microarray experiments corresponding to 75 different drugs and culture conditions [77], we correctly predicted 6 of 7 known inhibitors within this data set [46]. A more recent method, called Probabilistic Regulation of Metabolism (PROM) [49], translates expression data into soft constraints on model reaction rates. PROM generates a flux distribution that is consistent with a set of gene expression data by minimizing the sum of the violations on these constraints across all of the reactions in the model. E-Flux is tailored to predicting terminal, or sink, metabolites and the PROM method is easily adaptable for the study of those same metabolites.
Here, we present E-Flux-MFC, an enhancement of the original E-Flux method that enables the accurate prediction of changes in the production of both external and internal metabolites by integrating gene expression data. We have validated the accuracy of E-Flux-MFC in predicting changes in lipids and metabolites during a hypoxia time course using previously published metabolomics and transcriptomics data [31]. We have further validated the accuracy of E-Flux-MFC for predicting changes in MTB lipids following the deletion and induction of two well-studied transcription factors. We then use our validated approach to predict the metabolic impact of the induction of each MTB TF using a previously generated and publicly available expression data set [31,78]. These predictions provide a resource for studying the large number of TFs whose functions remain unknown, and for identifying TFs that may play direct or indirect role in regulating the metabolism of compounds of interest.

Metabolic modeling with E-Flux-MFC
Traditionally, FBA approaches have relied on the steadystate assumption, which constrains the total change in concentration of each metabolite to some constant value. Recent approaches have addressed the problem of modifying FBA in order to make predictions of changes in the consumption or production of model metabolites [71,72,79]. Here we have developed a method to predict changes in the accumulation and utilization of both extracellular and intracellular metabolites by relaxing the steady-state constraint for each of the metabolites in our model and then calculating the difference between the maximum production flux and the maximum consumption flux in order to calculate a value that we call maximum flux capacity (MFC).
In order to validate our method, we examined several published gene expression data sets associated with experimentally measured changes in metabolite production. One data set comprises both gene expression data and metabolomics data for 69 metabolites in GSMN-TB. These data were collected during a time course experiment in which Mycobacterium tuberculosis H37Rv was exposed to hypoxia [31]. This data set thus combines gene expression and metabolite measurements in conditions relevant to TB pathogenesis. Two additional data sets are expression datasets associated with knockouts of the lipid-production associated transcription factors phoP (Rv0757) [25] and dosR (Rv3133c) [23]. These are the only two TF deletion studies in MTB, of which we are aware, that have coupled both transcriptomics and metabolomics. These data were used to validate the accuracy of our approach in predicting the metabolic impacts of TF deletions.
Importantly, because our method is an adaptation of FBA, our model generates predictions of metabolite production or secretion at a quasi-steady-state that is defined by both the medium constraints placed on the model and the gene expression data from a particular time point. Our predictions are not predictions of changes in concentration over time (which would rely on precise measurements of initial metabolite measurements and medium uptake and secretion rates), but are instead qualitative predictions of changes in maximum production. We compare these predictions against measured changes in concentration. We propose that decreases and increases in maximum flux capacity generally lead to corresponding decreases and increases in metabolite concentration respectively.

Prediction of changes in metabolite production in a hypoxic time course
As a first validation of our approach, we sought to predict changes in lipid production in response to exposure to hypoxia, which generates a complex regulatory response that allows MTB to survive within a low-oxygen environment. In previously published work, MTB was subjected to a time course of hypoxia during which the relative levels of transcripts, metabolites, and selected lipids were measured [31]. These data sets provide a systems-level compendium of experimental data that describes MTB's response to a trigger for entry into dormancy.
For our method we utilized gene expression data collected across a hypoxic time course in order to generate reaction bounds. In order to model the uncertainty in our gene expression values and their relationship to modeling predictions, we utilized a Monte Carlo sampling approach. For each gene at each time point we added values sampled from a Gaussian distribution centered on zero with a standard deviation calculated based on replicate measurements. These samples were added to the log2 RMA expression values and subsequently exponentiated for reaction expression calculation. Similar approaches have been used previously in order to assess the sensitivity of modeling results on the variance of gene expression data [46,68].
In Fig. 1a, we show the results for a comparison between 24 h after the introduction of hypoxia and prehypoxic conditions. We compare log-fold changes in maximum flux capacity with log-fold changes in metabolite abundance for each metabolite that was measured in this experiment and that was also present in the MTB metabolic model (Additional file 1: Figure S1 provides a histogram of MFC values for all metabolites in our model). In order to assess the relationship between changes in MFC and changes in concentration, we calculated the Spearman correlation coefficient. For the hypoxic transition data set, we calculate a value of 0.48 (p = 1.7 × 10 −5 ). Although we do not necessarily expect a linear relationship between MFC and change in metabolite abundance with our method, we also calculate a Pearson correlation coefficient of 0.65 (p = 1.1 × 10 −9 ). Even in the absence of detailed kinetic parameters for each reaction and the lack of quantitative concentration measurements, our predictions are positively correlated with changes in concentration after the induction of hypoxia. Importantly, our predictions encompass both intracellular and extracellular metabolites.
Alterations of oxygen tension results in the downregulation of the majority MTB genes coupled with the up-regulation of a smaller subset [31]. In order to assess whether the correlation between our predictions and changes in metabolite concentration was due to these broad patterns of expression changes versus the modulation of specific genes, we performed a permutation analysis designed to assess the specificity of our results (Fig. 1b). Similar methods have been described previously [46,54,68]. We performed 1000 permutations by randomizing gene labels then applying the resulting expression set to our model. We compared the distribution of Spearman correlation coefficients for the correct gene labels with predictions generated after randomizing gene labels. Our model performs significantly better with data from the true distribution, suggesting that our predictions are due to the specific configuration of gene expression and the metabolites that we are testing and not due to a global trend in the expression data.
In addition to random sampling and gene label permutation analysis, we examined the effect of the parameter κ (see Equation). κ controls the hardness of the expressionbased reaction bounds. As κ approaches 0, these constraints become much softer and have less of an effect on the final solution. As κ increases, the expression-based reaction bounds become much harder bounds and thus place a much greater constraint on the model [49]. Higher values of κ generate bounds that are similar to the hard reaction constraints imposed by E-Flux. For each value of κ, we generated 100 samples of our hypoxic transition gene expression data. Figure 1c shows the results of this analysis.
We have plotted both the mean and standard deviation of the Spearman correlation coefficient for the MFC values that we calculate from these samples for each value of the parameter. Our sensitivity analysis suggests that past values of approximately 1.0, there is little variation in the Spearman correlation coefficient. Similarly to Chandrasekaran and Price [49], we find that a value for κ of 1.0 strikes a balance between hard and soft bounds that yields reasonable predictions for our analysis of both hypoxic transition data and transcription factor knockout data.
We further predicted the impact of hypoxia on the production of key mycobacterial lipids and compared these changes to measured changes in lipids [31]. In Fig. 1d we show comparisons between predictions (in red) and measurements (in black) for several classes of lipids during both hypoxia (blue) and subsequent reaeration (yellow). For visualization, production values are normalized by first subtracting the minimum MFC value for each species across time, then dividing by the difference between the maximum and the minimum MFC values. The measurements of changes in lipids are  Figure S1. b One-thousand samples were generated by randomizing gene labels on the time course expression data. The Spearman correlation coefficient was calculated for each permutation. This distribution is compared with the distribution of coefficients generated from Monte Carlo samples using the correct gene labels. c Spearman correlation coefficient as a function of the parameter κ for predictions of change in metabolite concentration from just prior to hypoxia to 1-day post hypoxia. Error bars represent standard deviations of the Spearman correlation coefficient calculated across 100 samples of the gene expression data. d Changes in the production of several classes of lipids across the full hypoxic and reaeration time course. Red lines show model predictions of normalized net production for each lipid across the experimental time course. Black lines show normalized measured changed in abundance across each time course for each measured member of the lipid class. Error bars represent the standard deviation across samples for predicted production and across experimental replicates for measured abundance values. e Predicted changes in TAG production (red) and consumption (blue) fluxes. Error bars represent the standard deviation across samples for predicted production and consumption plotted for individual molecular species belonging to broad classes of lipids. Our results demonstrate qualitative agreement between our predictions of lipid maximum net production rates and measured changes in lipid levels. Assuming no other processes alter lipid levels, changes in lipid amounts should reflect the time integral of net production. Qualitatively, this is what we observe. Decreases in trehalose dimycolates (TDMs) and phosphatidyl dimycocerosates (PDIMs) during hypoxia and increases during reaeration are correlated with decreases and increases in predicted net production, respectively. Similarly, increases in triacylglycerides (TAGs) during hypoxia and decreases during re-aeration are correlated with increases and decreases in predicted net production, respectively. Predictions for phosphatidylethanolamine (PE) have substantially higher variance and we do not predict significant changes in the production of PEs at either transition. This is consistent with measured changes in abundance for some members of this class, but not others.
MTB is known to accumulate TAG during hypoxia and utilize TAG during subsequent re-aeration [80][81][82][83]. The consumption of TAG is necessary for the reactivation of non-replicating Mycobacterium bovis bacillus Calmette-Guérin (BCG) after exposure to hypoxia [82]. TAGs also play a role in maintaining redox homeostasis by acting as a source of reducing equivalents [29] and the modulation of carbon flux away from the TCA cycle and into the production of TAG (which leads to slowed growth) has been shown to be an important modulator of the response of MTB to treatment by antibiotics [84]. In principle, the accumulation of TAG during hypoxia might be driven by an increase in production, a decrease in consumption, or both. We examined this by separately predicting the maximum production and consumption of TAG over the time course (Fig. 2e). The results suggest that the increase in TAG during hypoxia is driven largely by a decrease in consumption relative to production. This is consistent with the overall decrease in growth during hypoxia-induced dormancy. Moreover, of the 24 MTB lipase genes predicted to cleave acyl groups from TAG [81], 18 show repression during hypoxia [31]. Conversely, during re-aeration, both consumption and production increase, but greater relative consumption is predicted to drive lower TAG abundance. Consistent with this, during re-aeration the expression of most lipase genes returns to baseline, and genes involved in fatty acid synthase I (FAS1), energy metabolism and β-oxidation show increased expression [31].

Comparison of E-flux-MFC with E-flux and PROM
As noted above, E-flux-MFC is an extension of E-flux. E-flux calculates the maximum production of sink metabolites and is thus tailored for external metabolites that are the product of unidirectional reactions. E-flux-MFC extends E-flux to internal metabolites by predicting both the maximum production and consumption of a metabolite to calculate the MFC. To assess the increase in accuracy attained by this enhancement, we evaluated the prediction of standard E-flux on the hypoxia data set by calculating only the maximum production of all metabolites. For this analysis, the Spearman correlation coefficient was 0.16 (p = 0.09) and the Pearson correlation coefficients was 0.30 (p = 0.01). As expected, E-flux-MFC performs considerably better than E-flux.
E-Flux-MFC also borrows from PROM method, which implements soft reaction bounds. PROM predicts biomass production rather than changes in the production of individual metabolites. However, PROM can be used in the same manner as E-flux by treating each metabolite as a single element biomass vector. We implemented PROM in this way to compare with E-flux-MFC on the hypoxia data set. For this analysis, the Spearman correlation coefficient was 0.36 (p = 0.001) and the Pearson correlation coefficient was 0.35 (p = 0.002). E-flux-MFC thus also performed better than PROM, although PROM performed better than standard E-flux.

Predicting changes in lipid expression after phoP or dosR deletions
Our initial validation suggests that our approach is capable of predicting qualitative changes in metabolite levels based on gene expression changes from wild-type cells. Toward the goal of supporting experimental design, we also sought to test the ability of our approach to predict metabolic changes corresponding to gene deletions.
We first analyzed the knockout of the transcription factor PhoP [25] (GSE22854). This study compared the transcriptional responses of the CDC1551 strain of MTB to that of a phoP transposon mutant. Each strain was grown to early log phase in standing cultures in 7H9 medium supplemented with OADC (bovine albumin, dextrose, catalase, and Tween-80). Transcript levels were measured from three experimental replicates. In separate experiments, changes in the abundance of several classes of lipids were measured via thin-layer chromatography (TLC). PhoP has been shown to regulate cellular aggregation, growth after macrophage infection, and the production of lipids important for the structure of the cell wall, for virulence, and for the production storage lipids [25,[85][86][87][88]. Using the transcriptomic data, we compared our model predictions with measured lipid changes.
As above, in order to estimate the significance of predicted changes in MFC, we generated a null model distribution by adding simulated gene expression measurement noise to the values from the control channel (see Methods). Predicted changes in MFC were compared to this distribution to calculate a z-score. Predicted MFC Fig. 2 Predicted changes in lipid production are deletion or induction of phoP and dosR. a Predicted changes in lipid production in a phoP knockout strain. E-Flux-MFC correctly predicts the change in production of 7/7 previously-measured changes in lipid production in phoP knockout mutants [25,86,87]. b Predicted changes in lipid production in a dosR knockout strain after the induction of hypoxia (2 h 0.2 % O 2 ) [23]. TAG production significantly increased, in agreement with expectations based on previous observations. c Predicted changes in lipid production after the induction of phoP. d Predicted changes in lipid production after the induction of dosR. e Predictions of changes in lipid production specific to the direct regulon of phoP after induction. Direct regulon from ChIP-Seq data [31]. f Predicted changes in lipid production specific to the direct regulon of DosR after induction. Abbreviations: TAG triacylglycerols, PDIM phthiocerol dimycocerosates, SL-1 sulfolipids, PAT polyacyltrehalose, DAT diacyltrehalose, TDM trehalose dimycolates, TMM trehalose monomycolates values falling outside of the 95 % null interval of these MFC values were considered to be significant changes.
In Fig. 2a, we show z-scores of fold changes in predicted maximum flux capacity (MFC) resulting from the phoP knockout expression data. We correctly predict changes in the production of 7 measured lipids between the wild type strain and the knockout strain. Our method predicts large increases in the production of the storage lipid triacylglycerol (TAG) and large decreases in the production of virulence lipids sulfolipid (SL-1) and poly-and di-acyltrehaloses (PAT and DATs). The method predicts no changes in either TDMs or trehalose mono-mycolates (TMMs). All these predictions are in qualitative agreement with experimental measurements [25,86,87].
Of particular interest are the predictions of decreases in SL-1 and PAT, and DAT production and the increase in the production of DIM. PhoP directly regulates pks2 and pks3, genes responsible for the production of sulfolipids [89] and polyacyltrehalose/diacyltrehalose (PAT/DAT) [90] respectively. PDIM appears to be specifically required for growth in the lungs of mice [91] and, along with other transcription factors, plays a role in the regulation of the redox state of the cell by acting as a shunt for the incorporation of reducing equivalents and propionyl-CoA [92]. It has also been shown that SL-1 and PDIM production can be regulated by the availability of their common precursors methylmalonyl-CoA and propionate [92]. Our model predictions are consistent with a regulatory role for PhoP in fine-tuning the flux of these precursors to downstream lipid production pathways [93].
We next analyzed the TF DosR. DosR is known to play an important role in the regulation of hypoxic adaptation [23]. Park and colleagues knocked out the transcription DosR and measured gene expression in both the wild type and knockout strains before and after exposure to hypoxia [23] (GSE8829). Using E-Flux-MFC, we analyzed the gene expression data from this experiment. The expression data consists of two sets of two-color microarrays. One set compared gene expression between hypoxia and normoxia for wild type BCG. The other set compared gene expression between hypoxia and normoxia for ΔdosR. We used out approach to predict the impact of dosR deletion. To do so, we first generated MFC predictions for each lipid class for each condition and strain. We then calculated the fold change in the MFC values between WT and ΔdosR in each condition to estimate changes in lipid content. Significance and z-scores were calculated as above (see Methods).
In Fig. 2b we plot the predicted changes in selected lipid classes for ΔdosR relative to WT in the hypoxic condition. The most significant predicted change is a decrease in TAGs, consistent with previous data. DosR has been shown to directly regulate the TAG production gene tgs1 (Rv3130c) [23,83,94]. A combination of ChIP-Seq and transcription factor induction experiments further highlighted the strength of regulation of tgs1 by DosR [31]. We do not predict significant changes in the production of other classes of lipids. Although substantial changes in other lipids occur during hypoxia (Fig. 1), in this experiment, gene expression measurements were collected after 2 h of exposure to hypoxic conditions. Thus, these data are representative of the very early response to hypoxic conditions. DosR, along with Rv0081, appears to play a large role in the initial hypoxic response, after which longer-term adaptations to hypoxic conditions may be the result of regulation by other transcription factors [24,31].

Predicting changes in lipid expression after the induction of PhoP or DosR
We next sought to simulate the effect on lipid production of the induction of PhoP or DosR. For this, we used previously published measured global expression data for experiments in which each TF was induced via the introduction of an exogenous promoter [31]. Each of these data sets captures both the direct and indirect effects of TF induction on gene expression. Although experimental measurements of lipid production after induction are not available for either TF, one conceivable outcome is that lipid changes would mirror the results of TF deletion.
This was predicted to be the case for DosR induction, but interestingly not for PhoP induction (Fig. 2c and d). For DosR, the most significant prediction of our analysis is an increase in production of TAGs. This is consistent with expectation, given the known regulation of tgs1 by DosR described above. It is also consistent with the experimentally observed observation that certain strains (W-Beijing lineage) of MTB with constitutively active members of the DosR regulon are associated with overproduction of TAG in aerobic culture [94].
We also predict a small but significant decrease in the production of PDIM (Fig. 2d). These changes are consistent with decreases in the production of PDIM observed in lipidomics measurements across a hypoxic time course [31]. Moreover, ChIP-Seq predicts that several genes thought to play a role in PDIM synthesis and transport-Rv2935, Rv2936, and Rv2939 [95]-are regulated by DosR [31]. All three genes are strongly downregulated in the DosR induction dataset. Although the role of dosR in the regulation of triacylglycerol is well known, the role of dosR in the regulation of PDIM synthesis is less well studied. It has been observed that PDIM concentrations in the cell decrease moderately upon the introduction of hypoxia and then return to normal levels after re-aeration [31].
The predicted effects of PhoP induction on lipid production, conversely, did not completely mirror the known effects of phoP deletion (Fig. 2c). Our analysis predicts an elimination of TAG production and a reversal of SL-1 production, opposite the effect of PhoP deletion. However, PAT, DAT, and TDM are predicted to decrease in their production, as they are in the PhoP deletion [25,86,87].
It is known that the induction of genes can produce overlapping but not identical phenotypes when compared with gene deletions [96]. One possible explanation may be the effect of downstream regulatory interactions, gene expression changes, and feedback associated with changes in cellular state. To attempt to partially eliminate these indirect effects, and to attempt to tease out the direct effects from the global effects, we simulated the impact of both PhoP and DosR induction considering expression changes for only those genes predicted to be directly regulated by each TF. We consider a gene to be directly regulated if a strong binding interaction was observed in our previously-generated ChIP-seq data set. For genes not directly regulated by the TF, the mean gene expression values across replicates for corresponding WT samples were used (see Discussion for limitations of this approach). Figure 2e and f show the results for this analysis. The predicted direct effects of DosR induction are qualitatively very similar to the predictions of the global effects (Fig. 2f ). This predicts that the impact of DosR on changes in these lipids can derive primarily from changes to the direct regulon of DosR. The predicted effects of PhoP induction, however, differ from the predicted global effects for TAG, PAT, and DAT. Induction of only the PhoP regulon is predicted to decrease production of TAGs, mirroring the effect of phoP deletion. More surprisingly, PAT and DAT production is predicted to increase, mirroring the effect of PhoP deletion (for DATs) and the global effects of PhoP induction (for PAT and DAT). Changes in PAT and DAT are consistent with the predicted regulation by PhoP of the polyketide synthase pks3, known to play a role in the synthesis of acyltrehaloses [97]. The difference in the direct effects relative to global effects, however, suggests that the impact of this direct regulation is modulated by other indirect changes in cell state.

Comprehensive prediction of metabolite changes following induction of all MTB TFs
Bioinformatic analyses suggest that the MTB genome contains roughly 180 transcription factors [31]. The functions of the majority of these regulators are unknown. To begin to gain insight into the functional roles of unstudied MTB TFs, we have used our approach to predict the potential metabolic impact of the majority of MTB TFs. For this, we have used previously published [31,78] gene expression data for the induction of each MTB TF publicly available at TBDB.org. The gene expression data sets used capture the changes in all genes following TF induction. Using E-Flux-MFC, for each TF we have predicted the metabolic impact on seven major lipid classes (Fig. 3a) and all 737 non-currency metabolites (Additional file 2) in our metabolic model. Predicted changes are quantified as z-scores relative to our background models (see above and Methods), and thus reflect both the significance and magnitude of the predicted impact. TFs functionally annotated in this manner were also clustered to identify sets of regulators with potentially similar functional roles. As in Fig. 2e and f, to filter out indirect effects, and thus assess the potential function of the direct regulon of each TF, we also simulated the impact of expression changes for the direct regulon of each TF ( Fig. 3b and Additional file 2). Comparing the predictions for the global effect on lipids in Fig. 3a with the predictions of the direct regulon effects in Fig. 3b suggests that the majority of TFs may impact lipid production through indirect effects. A similar pattern is seen when examining other metabolites. These data suggest that the full functional significance of a regulator may not be well understood by examining only its directly regulated genes. Instead, the impact of the regulator in the context of the larger regulatory and metabolic network is essential.

Discussion
We have presented E-Flux-MFC, an enhancement of the original E-Flux method that enables the prediction of changes in the production of both external and internal metabolite corresponding to changes in gene expression data. We validated our method on a genome-scale metabolic model of MTB in two ways. First, we assessed the accuracy of E-Flux-MFC in predicting changes in MTB lipids and metabolites during a time course of hypoxia using previously published metabolomics and transcriptomics data [31]. Second, we assessed the accuracy of E-Flux-MFC for predicting changes in MTB lipids following the deletion and induction of two well-studied transcription factors. We then use our approach to provide insight into the potential metabolic functions of the majority of MTB TFs, most of which are unstudied. Using our method, we predicted the metabolic impact of the induction of each TF using a previously generated and publically available expression data set [31,78]. These predictions provide a resource for studying the large number of TFs whose functions remain unknown, and for identifying TFs that may be associated with metabolites or metabolic pathways of interest.
E-Flux-MFC is an extension of our previously developed method, E-Flux [46]. Both methods utilize enzyme gene expression data to constrain the maximum flux through the corresponding metabolic reactions at steady state. We and others have previously use E-Flux to accurately predict changes in terminal metabolites [46,54]. E-Flux treats the production of such metabolites as an artificial biomass function, and uses FBA with maximum flux constraints to calculate the maximum value of this function. In effect, E-Flux predicts the theoretical maximum flux into a terminal metabolite if the system objective was to solely produce this metabolite. Previous work indicates that changes in this theoretical maximum flux are well correlated with actual changes in metabolite concentrations [46,54].
By focusing on the production of terminal metabolites, E-flux avoids the need to account for the consumption of these metabolites. E-Flux-MFC extends this approach to the prediction of non-terminal metabolites. Net changes in internal metabolites will reflect the balance of production and consumption. This in turn will depend on the activities of enzymes in the corresponding pathways, and the availability of both upstream and downstream metabolites. We hypothesized that gene expression state would provide bounds on the maximum possible production and maximum possible consumption for any given state of internal metabolites. Based on this, E-Flux-MFC uses gene expression data to calculate two intermediate values. First, it combines all the pathways for the production of a target metabolite into a synthetic biomass function, and calculates a theoretical maximum production rate, ignoring consumption. Second, it combines all the pathways for the consumption of a target metabolite into a synthetic biomass function, and calculates a theoretical maximum consumption rate, ignoring production. E-Flux-MFC then calculates the difference between the maximum production flux and the maximum consumption flux in order to calculate a value that we call maximum flux capacity (MFC). MFC represents the theoretical maximum production of a target metabolite if pathways for both production and consumption were operating at their predicted maximums. In additions, while E-Flux applied hard constraints on maximum flux, E-Flux-MFC borrows a key idea from the PROM method [49] and allows fluxes that violate the maximum flux constraint, but penalizes such violations.
Several previous methods have addressed the use of gene expression data in order to predict changes in metabolite abundance. Differential producibility analysis (DPA) utilizes FBA to identify genes essential for the production of each metabolite, then utilizes changes in gene expression of essential genes to calculate signals of differential metabolite production [44]. Reporter metabolite analysis utilizes metabolic network topology to identify metabolites associated with genes that have changed in expression between two conditions [98]. Reporter feature analysis, a modification of reporter metabolite analysis, has been used to predict metabolites affected by transcription factor perturbations [99]. Reporter metabolite analysis takes into consideration only those gene expression values directly associated with the reactions that produce and consume a particular metabolite. One of the benefits of our method is that it takes into consideration the fact that the limiting reactions in the production pathway of a particular metabolite may not be the reaction that directly produces a metabolite. The value of the approach taken by DPA is that it utilizes relationships between genes and metabolites that take into account non-direct relationships between genes and the production of specific metabolites. However, neither of these approaches predicts the direction of change in the concentration of a metabolite, one of the main benefits of E-Flux-MFC.
Another method, termed flux imbalance analysis, utilizes an adaptation of the GIMME algorithm [50] in order to predict changes in metabolite concentration using gene expression data [71]. The authors found that their model predictions provide significant predictive value of the sign of the change in a metabolite's concentration. Although flux imbalance analysis successfully predicts changes in concentration, it utilizes a method that requires the introduction of a required metabolic functionality (RMF), which is a minimal user-defined functionality required for the generation of an expression-constrained flux solution. E-Flux-MFC does not require the definition of an RMF (although one may be enforced if it is well-defined for the condition of interest). Even if the model accurately predicts the theoretical maximum production and consumption of a metabolite at steady state, changes in these maxima need not result in changes in metabolite levels (if for example production, consumption or both were not operating near the maximal levels). To test the degree to which our predicted MFCs empirically correlate with actual changes in metabolite levels, we performed several validations. First, we assessed the degree to which the method could predict measured changes in MTB metabolites from corresponding measurements of gene expression for the same time points. As shown in Fig. 1, the method with our existing genome-scale model of MTB metabolism performs with reasonable accuracy. Predicted changes in MFC for internal metabolites display a statistically significant positive rank and linear correlations (Spearman's ρ =0.48, p = 1.7 × 10 −5 , Pearson's r = 0.64, p = 1.1 × 10 −9 ) with measured changes in relative metabolite concentrations (Fig. 1b) during the transition from normoxia to hypoxia. In addition, predicted MFCs over the full time course display correspondence with observed changes in lipid concentrations (Fig. 1d). Many cell wall lipids might be considered terminal metabolites from the perspective of the current metabolic network model (e.g. phosphatidylethanolamine, for which we are not aware of any recycling reactions in MTB). TAGs, by contrast, are actively produced and consumed by MTB [80][81][82][83], and the predictions of TAG MFC are in good agreement with measured abundance changes.
Second, we assessed the degree to which the method could predict the metabolic impact of perturbations to transcription factors, given global gene expression data following the perturbation. We studied two TFs-DosR and PhoP-for which such gene expression data are available [23,25], and for which information about expected metabolites changes was also available. For both TFs, E-Flux-MFC is able to correctly predict all known changes (or lack of change) in 7 different lipid classes following TF deletion (Fig. 2a and b).
We further examined the predictions of the method based on global gene expression following TF induction [31]. In this case corresponding lipid measurements do not exist. In these cases, all the results are novel predictions of the model. In some cases, hypotheses could be generated in light of current knowledge to explain the predictions after the fact. In particular, while DosR deletion abolishes TAG production in hypoxia, DosR induction increases TAG production. This is consistent with a report that strains of the W-Beijing lineage of MTB with constitutively active members of the DosR regulon are associated with overproduction of TAG [94]. Also, the model predicts a weak effect of DosR induction on PDIM. This hypothesis is consistent with the predicted regulation by DosR, based on ChIP-Seq [31], of several genes thought to play a role in PDIM synthesis (Rv2935, Rv2936, and Rv2939) [95], data that was not used in the modeling. These explanations, however, remain hypotheses generated by the model that require follow-up experimental validation.
The prediction of the impact of PhoP induction was more unexpected. The predictions were complex and did not directly mirror the effects of phoP deletion. The known increase in TAG and decrease in SL-1 in ΔphoP was either abolished or slightly reversed in the PhoP induction predictions. The decrease in PAT production in ΔphoP, however, was enhanced in the predictions for PhoP induction, and DAT was also predicted to decrease in abundance.
Differences in phenotype between gene deletions and gene induction are known to occur [96], and this asymmetry would be expected in many cases given the potential complexity and non-linearity of downstream regulatory interactions, gene expression changes, and associated feedback. We explored this by simulating the impact of both PhoP and DosR induction considering expression changes for only those genes predicted to be directly regulated by each TF. For genes not directly regulated by the TF, the WT gene expression values were used. In the case of DosR, both the direct and global effects of induction on the lipids analyzed were similar. The major predicted change was the expected increase in TAG production. For PhoP, conversely, the predicted direct effects on lipid production differed from the predicted global effects. In this case, the predicted direct effects for acyltrehaloses were substantially different from the global prediction, while the impact on TAGs was further magnified. These predictions suggest that the effect of PhoP modulation on lipid productionparticularly acyltrehaloses-may depend significantly on the state of other regulators.
We also note that the accuracy of our predictions of changes in lipid production in the PhoP deletion might be enhanced by the use of gene expression data collected from both wild type and ΔphoP mutants in acidic conditions, which induce phoP and its regulation of intracellular metabolism via the aprABC locus [25]. Nevertheless, Abramavoitch et al. report a greater than 50 fold reduction in the expression of the aprABC locus in the phoP transposon mutant, suggesting that even if phoP is not maximally induced in the neutral pH wild type condition, it may still be playing some regulatory role.
Simulating the effect of only the gene expression changes for the direct regulon of a TF has clear limitations. In particular, the synthetic expression state used may not be a realizable state of the system (e.g. if the expression state of the TF regulon imposes hard constraints on the expression of other genes). Moreover, the expression of the genes in a TF regulon may themselves reflect feedback and feedforward effects of the complete network. Such effects would not be filtered out with our approach. Nonetheless, this approach has utility. Predicting the impact of the direct regulation of a TF is conceptually akin to describing the aspects of the function of a TF by considering the functions of the genes in its direct regulon, a common practice. Although the ultimate function of a TF necessarily includes its reciprocal effect on the entire system, the comparison between the global impact and the direct impact can provide insight into what changes depend on interactions of the wider system (e.g. PhoP) and what changes are more independently associated with the direct targets of the TF (e.g. as predicted for TAG and DosR).
Several considerations also impact the interpretation of predictions based on the induction of transcription factors. A primary concern is that induction could result in non-specific changes in gene expression that do not reflect the physiologically relevant function of the induced TF. While this cannot be fully ruled out, we do not observe an overlap in either gene expression or predicted metabolic impact across all the data sets that would imply a specific global effect of the induction protocol. In addition, the metabolic impact of inducing a TF may be an incomplete picture of the function of the TF. Down-regulation or deletion may reveal additional or contrasting effects of the TF. Both are likely important aspects of the TF function. However, to the degree that there is overlap of modulated metabolites, the predictions provide insight even if the directions of the effect may be different between the conditions. There are several potential reasons for deviations between our model predictions and measured changes in metabolite and lipid abundance. One potential source of error is in our choice of biomass functions. For example, we do not enforce any minimal level of biomass production as with the required metabolic function used in GIMME or TEAM [50,71]. Another potential source of error stems from a lack of complete genetic and biochemical knowledge of lipid production pathways. Recent efforts at manual re-annotation [100] and conditionspecific high-throughput essentiality studies [38,101] have continued to improve the existing model.
We have used our method to provide insight into the functions of transcriptional regulators in MTB. The MTB genome contains roughly 180 transcription factors [31], the functions for the majority of which are unknown. Using global gene expression data for the induction of each MTB TF [31,78] publicly available at TBDB.org, we applied our approach to associate each TF with the predicted modulation of 7 major lipid classes (Fig. 3) and 207 metabolites (Additional file 2). Using experimentally determined binding sites derived from ChIP-Seq we have also simulated the metabolic impact of inducing the direct regulons of each TF. The comparison of both simulations provides insight into which functions are mediated by the TF directly, and which may arise as a result of downstream regulatory and metabolic interactions. Comparing results across regulators and metabolites suggests that in the majority of cases metabolites are impacted by TFs through indirect effects. This suggests that the full impact of a regulator can only be understood in the context of the larger regulatory and metabolic network.

Conclusions
We have presented E-Flux-MFC, an enhancement of the original E-Flux method that enables the prediction of changes in the production of both external and internal metabolite corresponding to changes in gene expression data. We validated our method using multiple datasets combining gene expression and metabolomics measurements. We have used our method to provide insight into the functions of transcriptional regulators in MTB. Using global gene expression data for the induction of each MTB TF we have associated each TF with the potential to modulate each of 7 major lipid classes and 207 metabolites. Using experimentally derived binding sites derived from ChIP-Seq we have also simulated the metabolic impact of inducing the direct regulons of each TF. The comparison of both simulations suggests that in the majority of cases metabolites are impacted by TFs through indirect effects. This indicates that the full impact of a regulator can only be understood in the context of the larger regulatory and metabolic network.
Although we have applied E-Flux-MFC to Mycobacterium tuberculosis, it is applicable to any organism for which accurate metabolic models are available. It may also prove useful for both general and tissue-specific models of human metabolism. Several efforts have been undertaken to predict changes in the abundance of metabolic markers in an effort to understand the mechanisms underlying human diseases and to propose novel diagnostics [102]. The reconstruction of cellspecific models of human metabolism has benefited from the integration of gene expression data collected from those cells. Models describing the metabolism of hepatocytes [103][104][105], macrophages [47,106], and neurons [107] have been published, among others.

MTB metabolic model
For our analysis, we utilized a modified version of the GSMN-TB model, which was originally described by Beste et al. [43]. Our modifications were incorporated in order to achieve more agreement with the current state of biochemical knowledge of the pathways responsible for the production of sulfolipid-1, phthiocerol dimycocerosates, triacylglycerol, diacyltrehalose, and polyacyltrehalose. We validated the function of our model by measuring the accuracy of the model for the prediction of gene knockout essentiality. We utilized the transposon site hybridization (TraSH) mutagenesis data set utilized to validate the original GSMN-TB model [43,108]. The TraSH data set provides microarray signal ratios that represent the relative abundances of each mutant in the TraSH library. A lower ratio indicates that a particular labeled transposon mutant is present at lower abundance in a culture relative to the abundance of a genomic DNA sample. In order to assign a gene as essential, we apply a threshold to this ratio. Microarray ratios that fall below this threshold are considered to be essential.
For each gene in the data set, we measured the growth rate in the model after the gene had been knocked out. For several different values of the microarray signal ratio, we calculated the area under the curve (AUC) for a receiver-operator characteristic (ROC) curve generated by calculating true positive and false positive rates across a range of growth rate thresholds. We performed this analysis for the original GSMN-TB model and the modi-

Transcription factor knockout data
Both of the two-color microarray datasets were analyzed using LIMMA [109] and MAANOVA [110], microarray analysis libraries for the R statistical programming language [111]. LIMMA was used to download datasets from LIMMA, for background correction using the normexp model, and for within-array normalization using the printold tip loess method. After background correction and normalization, MAANOVA was used as described previously [46]. We used MAANOVA here to fit an analysis of variance model of the form described in Equation.
As in the model used utilized for analysis of two-color microarray in the E-Flux framework [46], y ijkg denotes the log-transformed measurement from channel i, chip j, sample k, and gene g. ŷ kg is the value of gene expression that is specific to the sample k and gene g and ε ijkg is the measurement error. The model is fit to minimize the residual sum of squares. RSS = ε ijkg 2 is used as the main input for our metabolic modeling method.

E-Flux-MFC
In order to answer questions about the accumulation or degradation of both intracellular and extracellular metabolites using the metabolic model of MTB, we developed an extension of the E-Flux and PROM methods called E-Flux-MFC (E-Flux for maximum flux capacity). Both E-Flux and PROM are extensions of a method called flux balance analysis (FBA) [59]. FBA may be described as the linear programming problem in Equation. Maximize Where S is a matrix that captures the stoichiometries of constituent reactions (the stoichiometric matrix), v LB and v UB are vectors describing the upper and lower bounds of each reaction in the model, v is the set of fluxes determined by optimizing the objective function, and c is an objective function to be maximized and varies depending on the model and the conditions to be simulated.
Generally, these bounds are determined by measuring reaction fluxes through uptake reactions, defining them from physical parameters (e.g. from diffusion constants), or calculating them from thermodynamic constraints [59]. In both E-Flux and E-Flux-MFC, these bounds are calculated as a function of the expression of the genes that are associated with each reaction. The relationships between genes, proteins, and reactions in the model are represented by Boolean gene-protein-reaction (GPR) formulas.
For some reactions, there is a one-to-one correspondence between genes and the gene product catalyzing that reaction. In these cases, we substitute the gene expression value directly for the reaction expression value. We follow an approach similar to that described in several previous approaches [46,68,72,75]. In order to utilize these Boolean formulae corresponding to enzyme complexes to calculate a continuous reaction-level expression from gene expression values we incorporate convert AND relationships between genes to the minimum expression value of those two genes. We convert OR relationships the sum of the expressions of two genes. This method handles arbitrarily complex isozyme and enzyme complex relationships. While many factors contribute to enzyme activity, E-Flux-MFC uses gene expression to approximate maximum reaction activity.
In order to ensure that MFC values are comparable between replicates, reaction-level expression values are normalized within each replicate. For each experiment and control pair, we normalize by dividing by the maximum value within each replicate. This calculation yields a value that is not scale-dependent and is thus comparable across replicates. Condition-specific reaction bounds are calculated by multiplying this normalized expression level value by a set of baseline flux bounds determined using flux variability analysis (FVA) after the application of experiment-specific medium constraints, following the approach described Brandes et al. [68] using a computationally-efficient implementation [112]. In FVA, two linear programming problems are solved for each reaction in the model. These problems are described by Equation.
Maximize=minimize v i Subject to Sv ¼ 0 where v i represents each of n reactions in the model, and v LB and v UB are the lower and upper bounds on each reaction flux respectively. Here, Z obj is the value of the model objective function and Z obj min is the minimum value of this objective function to maintain during FVA.
As in PROM, we add a set of constraints on reaction fluxes that are calculated from gene expression data to the original constraints of flux balance analysis. This model is described by the following formulation in Equation. Maximize v LB and v UB are the model bounds and v LB ′ and v UB ′ are the expression-derived flux constraints. Equation minimizes the disagreement between the expression-derived flux bounds and the calculated reaction flux v. The relative weighting between the maximization of the objective function c T v (in this case, the maximization of the production or consumption of particular metabolite of interest) is determined by the parameter κ. The variables α and β are variables that are chosen by the linear programming solver and that allow the gene expression-weighted upper and lower bounds to be violated in order to provide a more optimal solution (i.e., solutions that are more consistent with the gene expression data). κ determines the balance between maximizing the value of the objective function and minimizing the sum of the violations of the expression-weighted reaction bounds. Although we constrain our model with both FVA-derived bounds and expression-derived bounds, we have observed that the size of the violation of the expression-derived bounds is generally small relative to the original bounds.
In a process that closely resembles flux variability analysis, we add a reversible demand reaction for each metabolite in turn that allows for us to relax the steady-state assumption for metabolites of interest. By maximizing the flux through the forward and reverse directions of these reactions, we generate values that tell us the maximum production and consumption fluxes for each metabolite in the model. The difference between these maximum production and consumption fluxes is a value that we term the maximum flux capacity (MFC). Between conditions, we calculate fold-changes in MFC by subtracting the experimental value from the control value and dividing by the absolute value of the control value. These fold-change values are converted to z-scores by dividing by the standard deviation of the fold change in MFC across each replicate in an experiment.

Sampling approach
In analyses utilizing microarray datasets for which replicates were conducted, we utilized expression data values across those replicates to study the effect of variance in gene expression on the final predictions of the model. For each optimization we sample from a Gaussian distribution with mean zero and with a standard deviation calculated from the standard deviation of each gene at each time point across all microarray replicates, utilizing an approach similar to that described in both [46] and [68]. In order to assess the significance of our predictions, we generate samples of gene expression values with this method using the control channel. We generate a null distribution of maximum flux capacities by comparing 1000 sets of control channel samples. We consider a prediction to be significant if it lies outside the interval containing 95 % of the control values.

Availability of supporting data
The phoP knockout data are available in NCBI's Gene Expression Omnibus (GEO) at accession number GSE22854. The dosR knockout and wild type hypoxic transition data are available at GEO accession GSE8829. The hypoxic time course and transcription factor overexpression data are available at GEO accession GSE43466 and on tbdb.org. We provide our complete model as an SBML file in Additional file 2. In addition, we have provided in Additional file 2: Table S1 the binding network used for the transcription factor overexpression analysis and the median-scaled metabolomics.

Additional files
Additional file 1: Figure S1.  Table S1 the binding network used for the transcription factor overexpression analyses and median-scaled metabolite abundance values for normoxic and hypoxic conditions. (ZIP 2706 kb)