- Research
- Open Access
Integrating transcriptional activity in genome-scale models of metabolism
- Daniel Trejo Banos^{1},
- Pauline Trébulle^{1, 2} and
- Mohamed Elati^{1, 3}Email author
https://doi.org/10.1186/s12918-017-0507-0
© The Author(s) 2017
- Published: 21 December 2017
Abstract
Background
Genome-scale metabolic models provide an opportunity for rational approaches to studies of the different reactions taking place inside the cell. The integration of these models with gene regulatory networks is a hot topic in systems biology. The methods developed to date focus mostly on resolving the metabolic elements and use fairly straightforward approaches to assess the impact of genome expression on the metabolic phenotype.
Results
We present here a method for integrating the reverse engineering of gene regulatory networks into these metabolic models. We applied our method to a high-dimensional gene expression data set to infer a background gene regulatory network. We then compared the resulting phenotype simulations with those obtained by other relevant methods.
Conclusions
Our method outperformed the other approaches tested and was more robust to noise. We also illustrate the utility of this method for studies of a complex biological phenomenon, the diauxic shift in yeast.
Keywords
- Inference and interrogation of regulatory network
- Metabolic modeling
- Saccharomyces cerevisiae
Background
The modeling of biological systems has come a long way for gene regulation, signaling networks and metabolism, but even the most cutting-edge models still focus on one subsystem at the time. The integration of the many subsystems that function together, with the development of modeling paradigms, is the next step in the process, and promising results have already been obtained [1]. For example, [2], constructed a whole-cell model by connecting 28 individual models, one for each of the relevant cell functions. The resulting model included more than 1200 experimentally observed parameters. Impressive as it is, the development of this model required a huge effort for a single organism. We aimed to develop a general methodology that can be adapted to different organisms very easily through minor modifications. We aimed to retain as much information as possible concerning external and internal effects on genotype-phenotype interactions. For example, computational techniques have recently been used to optimize the yield of substrates produced by microorganisms for industry [3] and to study gene-metabolism interactions in medicine [4].
We focus here on the integration between metabolic models and gene regulatory networks for studies of growth phenotypes. Metabolic models represent the chemical reactions required for growth and sustenance [5], whereas gene regulatory networks comprise the biological programs responsible for regulating cell function [6]. We aimed to use data analysis and mathematical modeling tools to improve both the quantity and quality of biological hypotheses relating to these two subsystems.
Related approaches include: pFBA [7] which involves two-level optimization together with post-processing and the detection of redundant fluxes, E-flux [8] in which the linear constraints on fluxes are derived from gene expression data for control and a specific conditions, GIMME [9], which uses gene expression data and a regulatory metabolic objective to detect inconsistencies in fluxes, and iFBA [10], in which a kinetic model of E. coli catabolite repression has been integrated into a simplified metabolic model. The iFBA approach requires the setting of a number of Ordinary Differential Equations(ODE) with their kinetic parameters, which decreases the generality of the model. Another integrative approach is PROM [11] in which gene expression data sets are used to compute the conditional probability of an enzyme being expressed given that its regulators are perturbed, these probabilities then being used to constrain a flux balance analysis model. Similarly, TRFBA uses gene expression data and a piece-wise linear model to formulate an optimization program accounting for gene expression [12]. One of the main drawbacks of all these previously described methods is the need to determine which TFs regulate each gene. These approaches are therefore dependent on the quality of the curated network.
We used a statistical reverse engineering method, hLICORN [13], to infer the targets of a given set of regulators at the genome scale. We then assessed the effect of a regulator on its inferred targets in a particular data set, using the CoRegNet [14] tool, which has functions for scoring the activating or repressing effects of a regulator. The derived score, or “influence”, represents the transcriptional state of the cell and forms the basis for posterior integration with metabolic models. CoRegNet allows prior knowledge from various sources to be integrated into the model, in accordance with the recommendations of the DREAM5 consortium [15]. Despite the many and varied publications on gene regulatory network inference [15], few efforts have been made to integrate these inference methods into other systems biology tools.
We based our metabolic analysis on phenotype simulations. We used a well-documented model of yeast metabolism iTO977 [16]. We assembled the inferred gene regulatory and metabolic model together in a rational manner, to simulate growth phenotype and exchange fluxes in an algorithm that we call CoRegFlux. We tested our solution against other state-of-the art methods in a rigorous experimental setting for model benchmarking and comparison [17].
Methods
Genetic regulatory network inference
The first step of our algorithm is the inference of a genome-scale regulatory network. This network captures the interactions between regulators (TF and/or kinases) and target genes, which, in our case, encode metabolic enzymes.
For this purpose, we use CoRegNet [14], a Bioconductor package suitable for reverse-engineering and analyisis of large biological networks. Briefly, CoRegNet is a workflow that use the algorithm [13, 18] to mine candidates GRNs set of co-activators and co-inhibitors for each genes. h-LICORN splits genes into regulator and target sets, then discretizes gene expression on the basis of a specified threshold and uses a frequent itemset mining algorithm to find the regulatory elements for each target. In a second step, it determines for each gene the best sets among those candidates by running a regression model. The continuous data can be used alone to refine the original network by selecting for each target the gene regulatory network (GRN) with the best R ^{2} score based on the linear model used to estimate the expression. However, CoRegNet can also refine GRNs by incorporating evidence into the network using an integrative selection algorithm and applies it to the selection of local GRN models. Each GRN is scored by the inference method h-LICORN and by each of the integrated dataset. Following this, to each GRN is associated as many scores as they are integrated regulatory and cooperative datasets in addition to the network inference R ^{2} score, all which range from 0 to 1. Finally, for each gene, the GRN with the maximum merged score is selected. The refined network obtained is then transformed into a cooperativity network, based on the common targets of regulators.
We began by selecting a data set containing enough gene expression samples to obtain a representative network of gene regulation in yeast: Many Microbe Microarray Database (M3D) [19]. This database contains data from 247 experiments measuring gene expression under different conditions in microarray assays. The data were collated, normalized and averaged (in the case of replicates) for 5520 probes mapping onto ORF.
Network interrogation and influence score
where \(\hat {X}_{A}\) and \(\hat {X}_{R}\) are the means of the activated and repressed targets of regulator j respectively. The variables s _{ A }, s _{ R } represent the respective standard deviations and n _{ A }, n _{ R } represent the number of genes contained in the respective set. The influence score accounts for the effect of a regulator on its targets according to the regulator-target relationships inferred using hLICORN and additional data-sources integrated in the network as evidences using the CoRegNet Bioconductor package. Briefly, this measure is based on a Welch’t-test between the expression of the activated and repressed targets genes in a given samples.
Thus, a data set of thousands of gene expression measurements is reduced to just dozens or hundreds of activity scores (one score for each regulator with a significant influence). In the case of the M3D data set, the influence score was computed for the set of TFs and kinases as given by [20] (200304 possible TF-target interactions) and the S. cerevisiae Kinase and Phosphatase Interactome resource [23] (262354 possible P-P interactions), with a total of 567 potential regulators.
where x _{ ik } is the expression level for enzyme i in sample k, P a(x _{ i }) is the set of regulators of i in the network and I _{ jk } is the influence of regulators j in sample k. The objective of the linear regression is to calculate the regression coefficients β _{ j }. For our purposes, we trained the linear model on the M3D data set. Thus, the beta coefficients capture the general relationship between gene expression and influence over a wide range of conditions. The linear regression model is then used to predict the level of expression of a gene encoding a given enzyme in the set of context-specific samples of interest. For this, we calculate influence for the samples of interest and predict the gene expression of the metabolic genes with 2. In this study, we used the data set of [24], from a study in which a yeast strain was subjected to various oxygen concentrations. Using the inferred network, we calculated influence scores for this data set. According to the CoRegFlux work flow, we used influence to predict enzyme activity for each sample, based on a linear regression model. We limited predictions to the enzymes present in the genome-scale metabolic model of yeast [16]. These predictions sum up all the available regulatory information for yeast, as given by the inferred network, along with the context specificity of TF influences.
Metabolic model adjustment
We then translated the predicted enzymatic activities into bounds, corresponding to the extent to which the enzyme encoded by the gene can catalyse a given reaction. These bounds can be used to constrain a linear program representing the metabolic fluxes of a stoichiometric model under the assumption of steady state, a method known as flux balance analysis [25]. The algorithm is as follows:
- 1.We transform the gene-protein-reaction (gpr) rules, which relate enzyme and enzyme complexes to a given reaction in the model. The original rules are in boolean form and our substitution follows a continuous approximation similar to [26, 27]. Thus the conversion is:
- (a)
OR sentences, which represent isoenzymes regulating the same reaction are subsituted by a function max(), which returns the maximum of the expression of the corresponding enzymes.
- (b)
AND sentences, which represent the formation of enzyme complexes are substituted by a function min() which returns the minimum of the expression of the corresponding enzymes.
- (c)
If the enzyme expression is not available, the enzyme is discarded from the rules.
We denote the evaluation of a gpr rule for an enzyme-associated reaction by g p r _{ r }(X _{ pred }), with X _{ pred } being the set of predicted gene expressions as a function of the influence scores.
- (a)
- 2.Using the continuos gpr rules we adjust the fluxes bounds for each gpr-associated flux, denoted v ^{ r } by the following relation$$ v^{r}\leq\ln\left(1+\exp\left(gpr_{r}\left(X_{pred}\right)+\theta\right)\right) $$(3)
where we introduce the parameter θ to account for enzymatic action over the reaction. We assume this parameter is condition-specific. We chose the activation function, known as softplus [28], to convey the non-linear relationship between gene expression and protein concentrations. Unlike other non-linear activation functions, like sigmoids, the softplus has a range of (0,+∞) making it straightforward to use as flux bounds.
With these new constraints, the flux values and biomass yield can be calculated by solving the linear program associated with the model. We used the R package sybil [29] to find the flux distribution optimizing growth under the new bounds.
Bayesian optimisation of the parameters
Results
We evaluated the results generated by our method in terms of the accuracy with which they predicted exchange fluxes and to illustrate the use of this approach in a relevant case study. We performed robustness tests to determine whether influence gave a more reliable picture of the regulatory state of the cell. As our case study we choose the diauxic shift, a complex biological process involving major changes in transcriptional and metabolic elements in yeast.
Robustness tests
Case study
Discussion and conclusions
We propose CoRegFlux, a new algorithm for integrating gene regulatory network inference with constraint-based metabolic models. Our method provided better median predictions with a lower variance prediction than other state-of-the-art methods for predicting exchange fluxes under different levels of perturbation of gene expression data. One of the limitations of this method is that it cannot determine the optimal parameter value for systems in which the normal FBA solution underestimates biomass yield, although it should be pointed out that FBA overestimates the growth rate in most cases [37]. From the robustness tests and the case study, we can conclude that influence score calculation is a reliable way to assess the overall effects of gene regulation. This advantage of the influence score places it among other approaches to dimensionality reduction for gene expression such as network component analysis [38]. The importance of having a clear idea of the transcriptomic state of the cell has been demonstrated in studies of metabolism and responses to particular conditions. For example, recent results have suggested that at least 70% of the total variance in promoter activity across conditions can be accounted for by global transcriptional regulation in E. coli [39].
As mentioned above, this method has potential applications in research, industry and medicine, and its improvement would therefore be worthwhile. For example, it would be interesting to include different models of gene regulation as additional predictors of enzyme activity. Future research studies could also include metabolic network learning, with a view to the development of a data-driven integrated algorithm. Finally, this method is designed to serve as a basis for the in-silico optimization of biological objectives, of potential value for experimental design in systems and synthetic biology.
Declarations
Acknowledgements
We would like to thank the AdaLab consortium and iSSB I3-BioNet team for feedback about the tool and helpful discussions. We also thank J. Sappa from Alex Edelman and Associates for careful reading of the manuscript.
Funding
This work was supported by CHIST-ERA grant (AdaLab, ANR 14-CHR2-0001-01). P.T. was supported by a fellowship from the French National Research Agency (ANR) as part of the “Investissement d’Avenir” program, through the “IDI 2016” project funded by the IDEX-Saclay, ANR-11-IDEX-0003-02. Funding for publication charge: AdaLab, ANR 14-CHR2-0001-01.
Availability of data and materials
Data and source code can be downloaded from http://github.com/i3bionet/CoRegFlux.
About this supplement
This article has been published as part of BMC Systems Biology Volume 11 Supplement 7, 2017: 16th International Conference on Bioinformatics (InCoB 2017): Systems Biology. The full contents of the supplement are available online at https://bmcsystbiol.biomedcentral.com/articles/supplements/volume-11-supplement-6.
Authors’ contributions
ME and DT conceived the study. DT and ME designed it and wrote the manuscript. All authors provided valuable advises in developing the proposed method and modifying the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
- Macklin DN, Ruggero NA, Covert MW. The future of whole-cell modeling. Curr Opin Biotechnol. 2014; 28:111–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Karr JR, Sanghvi JC, Macklin DN, Gutschow MV, Jacobs JM, Bolival Jr B, Assad-Garcia N, Glass JI, Covert MW. A whole-cell computational model predicts phenotype from genotype. Cell. 2012; 150(2):389–401.View ArticlePubMedPubMed CentralGoogle Scholar
- Michael DG, Maier EJ, Brown H, Gish SR, Fiore C, Brown RH, Brent MR. Model-based transcriptome engineering promotes a fermentative transcriptional state in yeast. Proc Natl Acad Sci. 2016; 113(47):E7428–37.View ArticlePubMedPubMed CentralGoogle Scholar
- Gatto F, Miess H, Schulze A, Nielsen J. Flux balance analysis predicts essential genes in clear cell renal cell carcinoma metabolism. Sci Rep. 2015; 5:10738.View ArticlePubMedPubMed CentralGoogle Scholar
- Steuer R. Computational approaches to the topology, stability and dynamics of metabolic networks. Phytochemistry. 2007; 68(16):2139–51.View ArticlePubMedGoogle Scholar
- Davidson EH, (ed).The Regulatory Genome. Burlington: Academic Press; 2006. pp. 1–29.Google Scholar
- Lewis NE, Hixson KK, Conrad TM, Lerman JA, Charusanti P, Polpitiya AD, Adkins JN, Schramm G, Purvine SO, Lopez-Ferrer D, et al. Omic data from evolved e. coli are consistent with computed optimal growth from genome-scale models. Mol Syst Biol. 2010; 6(1):390.PubMedPubMed CentralGoogle Scholar
- Colijn C, Brandes A, Zucker J, Lun DS, Weiner B, Farhat MR, Cheng TY, Moody DB, Murray M, Galagan JE. Interpreting expression data with metabolic flux models: predicting mycobacterium tuberculosis mycolic acid production. PLoS Comput Biol. 2009; 5(8):e1000489.View ArticlePubMedPubMed CentralGoogle Scholar
- Becker SA, Palsson BO. Context-specific metabolic networks are consistent with experiments. PLoS Comput Biol. 2008; 4(5):e1000082.View ArticlePubMedPubMed CentralGoogle Scholar
- Covert MW, Xiao N, Chen TJ, Karr JR. Integrating metabolic, transcriptional regulatory and signal transduction models in escherichia coli. Bioinformatics. 2008; 24(18):2044–50.View ArticlePubMedGoogle Scholar
- Chandrasekaran S, Price ND. Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in escherichia coli and mycobacterium tuberculosis. Proc Natl Acad Sci. 2010; 107(41):17845–50.View ArticlePubMedPubMed CentralGoogle Scholar
- Motamedian E, Mohammadi M, Shojaosadati SA, Heydari M. Trfba: an algorithm to integrate genome-scale metabolic and transcriptional regulatory networks with incorporation of expression data. Bioinformatics. 2017; 33(7):1057–63. doi:10.1093/bioinformatics/btw772.PubMedGoogle Scholar
- Elati M, Neuvial P, Bolotin-Fukuhara M, Barillot E, Radvanyi F, Rouveirol C. Licorn: learning cooperative regulation networks from gene expression data. Bioinformatics. 2007; 23(18):2407–14.View ArticlePubMedGoogle Scholar
- Nicolle R, Radvanyi F, Elati M. Coregnet: reconstruction and integrated analysis of co-regulatory networks. Bioinformatics. 2015; 31(18):3066–8. doi:10.1093/bioinformatics/btv305.View ArticlePubMedPubMed CentralGoogle Scholar
- Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Kellis M, Collins JJ, Stolovitzky G, et al.Wisdom of crowds for robust gene network inference. Nat Methods. 2012; 9(8):796–804.View ArticlePubMedPubMed CentralGoogle Scholar
- Österlund T, Nookaew I, Bordel S, Nielsen J. Mapping condition-dependent regulation of metabolism in yeast through genome-scale modeling. BMC Syst Biol. 2013; 7(1):36.View ArticlePubMedPubMed CentralGoogle Scholar
- Machado D, Herrgård M. Systematic evaluation of methods for integration of transcriptomic data into constraint-based models of metabolism. PLoS Comput Biol. 2014; 10(4):e1003580.View ArticlePubMedPubMed CentralGoogle Scholar
- Chebil I, Nicolle R, Santini G, Rouveirol C, Elati M. Hybrid method inference for the construction of cooperative regulatory network in human. NanoBioscience IEEE Trans. 2014; 13(2):97–103.View ArticleGoogle Scholar
- Faith JJ, Driscoll ME, Fusaro VA, Cosgrove EJ, Hayete B, Juhn FS, Schneider SJ, Gardner TS. Many microbe microarrays database: uniformly normalized affymetrix compendia with structured experimental metadata. Nucleic Acids Res. 2008; 36(suppl 1):D866–70.PubMedGoogle Scholar
- Teixeira MC, Monteiro P, Jain P, Tenreiro S, Fernandes AR, Mira NP, Alenquer M, Freitas AT, Oliveira AL, Sá-Correia I. The yeastract database: a tool for the analysis of transcription regulatory associations in saccharomyces cerevisiae. Nucleic Acids Res. 2006; 34(suppl 1):D446–51.View ArticlePubMedGoogle Scholar
- Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. Biogrid: a general repository for interaction datasets. Nucleic Acids Res. 2006; 34(suppl 1):D535–9.View ArticlePubMedGoogle Scholar
- Nicolle R, Elati M, Radvanyi F. Network transformation of gene expression for feature extraction. In: 2012 11th International Conference on Machine Learning and Applications. Piscataway: IEEE: 2012. p. 108–113. doi:10.1109/ICMLA.2012.27.Google Scholar
- Breitkreutz A, Choi H, Sharom JR, Boucher L, Neduva V, Larsen B, Lin ZY, Breitkreutz BJ, Stark C, Liu G, et al. A global protein kinase and phosphatase interaction network in yeast. Science. 2010; 328(5981):1043–6.View ArticlePubMedPubMed CentralGoogle Scholar
- Rintala E, Toivari M, Pitkänen JP, Wiebe MG, Ruohonen L, Penttilä M. Low oxygen levels as a trigger for enhancement of respiratory metabolism in saccharomyces cerevisiae. BMC Genomics. 2009; 10(1):461.View ArticlePubMedPubMed CentralGoogle Scholar
- Orth JD, Thiele I, Palsson B. What is flux balance analysis?Nat Biotechnol. 2010; 28(3):245–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Jensen PA, Lutz KA, Papin JA. Tiger: Toolbox for integrating genome-scale metabolic models, expression data, and transcriptional regulatory networks. BMC Syst Biol. 2011; 5(1):147.View ArticlePubMedPubMed CentralGoogle Scholar
- Osorio D, Botero K, Gonzalez J, Pinzon A. exp2flux: Convert Gene EXPression Data to FBA FLUXes. 2016. http://CRAN.R-project.org/package=exp2flux. R package version 0.1. Accessed Feb 2017.
- Dugas C, Bengio Y, Bélisle F, Nadeau C, Garcia R. Incorporating second-order functional knowledge for better option pricing. Adv Neural Inf Process Syst. 2001;:472–8.Google Scholar
- Gelius-Dietrich G, Fritzemeier CJ, Desouki AA, Lercher MJ. sybil – efficient constraint-based modelling in r. BMC Syst Biol. 2013; 7(1):125. doi:10.1186/1752-0509-7-125. http://www.biomedcentral.com/1752-0509/7/125.
- Močkus J. On bayesian methods for seeking the extremum In: Marchuk GI, editor. Optimization Techniques IFIP Technical Conference Novosibirsk, July 1-7, 1974. Berlin: Springer Berlin Heidelberg: 1975. p. 400–4. doi:10.1007/3-540-07165-2_55.Google Scholar
- Yan Y. rBayesianOptimization: Bayesian Optimization of Hyperparameters. http://github.com/yanyachen/rBayesianOptimization. R package version 1.1.0. Accessed Feb 2017.
- Brauer MJ, Saldanha AJ, Dolinski K, Botstein D. Homeostatic adjustment and metabolic remodeling in glucose-limited yeast cultures. Mol Biol Cell. 2005; 16(5):2503–17.View ArticlePubMedPubMed CentralGoogle Scholar
- DeRisi JL, Iyer VR, Brown PO. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science. 1997; 278(5338):680–6.View ArticlePubMedGoogle Scholar
- Zampar GG, Kümmel A, Ewald J, Jol S, Niebel B, Picotti P, Aebersold R, Sauer U, Zamboni N, Heinemann M. Temporal system-level organization of the switch from glycolytic to gluconeogenic operation in yeast. Mol Syst Biol. 2013; 9(1):651.View ArticlePubMedPubMed CentralGoogle Scholar
- de Jong BW, Siewers V, Nielsen J. Physiological and transcriptional characterization of saccharomyces cerevisiae engineered for production of fatty acid ethyl esters. FEMS Yeast Res. 2016; 16(1):fov105.View ArticlePubMedGoogle Scholar
- Varma A, Palsson BO. Stoichiometric flux balance models quantitatively predict growth and metabolic by-product secretion in wild-type escherichia coli w3110. Appl Environ Microbiol. 1994; 60(10):3724–31.PubMedPubMed CentralGoogle Scholar
- Segrè D, Zucker J, Katz J, Lin X, D’haeseleer P, Rindone WP, Kharchenko P, Nguyen DH, Wright MA, Church GM. From annotated genomes to metabolic flux models and kinetic parameter fitting. OMICS A J Integrative Biol. 2003; 7(3):301–16.View ArticleGoogle Scholar
- Liao JC, Boscolo R, Yang YL, Tran LM, Sabatti C, Roychowdhury VP. Network component analysis: reconstruction of regulatory signals in biological systems. Proc Natl Acad Sci. 2003; 100(26):15522–7.View ArticlePubMedPubMed CentralGoogle Scholar
- Kochanowski K, Gerosa L, Brunner SF, Christodoulou D, Nikolaev YV, Sauer U. Few regulatory metabolites coordinate expression of central metabolic genes in escherichia coli. Mol Syst Biol. 2017; 13(1):903.View ArticlePubMedPubMed CentralGoogle Scholar