In this work, a cost-effective hybrid methodology is reported to make sense of accessible fluxome data for rapid optimization of complex productivity phenotypes. PLS modelling is used in tandem with classical metabolic flux analysis to establish a link between an estimated metabolic state and system productivity, thereby providing a predictive in silico platform to assist genetic/environmental metabolic engineering when a well-defined stoichiometric description of product formation is not available. An important feature of PLS is that it decomposes complex data sets into subsets of uncorrelated vectors, called latent variables, while eliminating redundant information. This makes it possible to address biological problems in which the number of variables assessed greatly exceeds the number of observations, which is why the method has gained importance in the interpretation of "omic" data sets. As reviewed in Teixeira et al., combining such data-mining tools with mechanistic models gives rise to hybrid parametric-nonparametric systems, which enable cost-effective analysis of complex problems with fragmentary knowledge.
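The latent-variable decomposition described above can be illustrated with a minimal sketch of single-response PLS fitted by the NIPALS algorithm. This is not the implementation used in this work: the function name `pls1_fit`, the synthetic flux and productivity data, and the choice of two latent variables are all assumptions made for illustration. The example deliberately uses more fluxes (30) than observations (8) and checks that the extracted score vectors are mutually orthogonal.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm (illustrative)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # mean-centred data, deflated in the loop
    n_obs, n_vars = X.shape
    W = np.zeros((n_vars, n_components))     # weight vectors
    P = np.zeros((n_vars, n_components))     # X loadings
    T = np.zeros((n_obs, n_components))      # scores (latent variables)
    q = np.zeros(n_components)               # y loadings
    for a in range(n_components):
        w = Xr.T @ yr                        # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                           # score vector for this latent variable
        tt = t @ t
        p = Xr.T @ t / tt
        q[a] = (yr @ t) / tt
        Xr = Xr - np.outer(t, p)             # remove the information already explained
        yr = yr - q[a] * t
        W[:, a], P[:, a], T[:, a] = w, p, t
    coef = W @ np.linalg.solve(P.T @ W, q)   # coefficients in the original variable space
    intercept = y_mean - x_mean @ coef
    return coef, intercept, T

# Synthetic stand-in for a flux data set: 8 experiments, 30 estimated fluxes,
# generated from 2 hidden metabolic "states" plus noise (all values hypothetical).
rng = np.random.default_rng(0)
n_obs, n_fluxes = 8, 30
states = rng.normal(size=(n_obs, 2))
X = states @ rng.normal(size=(2, n_fluxes)) + 0.1 * rng.normal(size=(n_obs, n_fluxes))
y = 2.0 * states[:, 0] + 0.1 * rng.normal(size=n_obs)   # hypothetical productivity

coef, intercept, T = pls1_fit(X, y, n_components=2)
y_hat = X @ coef + intercept                 # calibration predictions
```

Because each latent variable is extracted from the residual left by the previous one, the columns of `T` are orthogonal, which is what allows a regression to be fitted even though the fluxes outnumber the experiments. In practice the number of latent variables would be chosen by cross-validation rather than fixed in advance.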
Our method is designed to operate on an informative, yet not exhaustive, preliminary set of experiments that is easily obtained at laboratory scale. It is especially suited to complex products whose synthesis mechanisms are poorly captured by a simple stoichiometric description within a global metabolic model, or whose composition is unknown. Productivity enhancement in the case of simpler molecules, for instance amino acids or TCA cycle intermediates, has previously been achieved by stoichiometric analysis of their respective synthesis pathways [32, 33]. Here, the main output is the global identification of fluxes strongly correlated with a highly productive state, which are discriminated from a background of less significant metabolic reactions contributing to product synthesis. Thus, as opposed to classical MFA, our approach enables the prediction of productivity in independent experiments based on a previous calibration, as well as the identification of metabolic targets for production optimization. In this respect, the estimation of reliable confidence intervals for the flux regression coefficients is crucial to remove a large portion of the uncertainty in the selection of metabolic targets, considerably improving the odds of successful experimental validation. Methods that ensure higher precision in fluxome estimations, such as isotopic tracer experiments, could in principle expose other targets for manipulation.
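To make the role of coefficient confidence intervals concrete, the sketch below resamples experiments with replacement and recomputes the PLS regression coefficients each time, keeping only fluxes whose 95% percentile interval excludes zero. The bootstrap is an assumption chosen for illustration and is not necessarily the procedure used in this work; the helper `pls1_coef`, the synthetic data and the number of resamples are likewise hypothetical.

```python
import numpy as np

def pls1_coef(X, y, n_components):
    """Regression coefficients of a single-response PLS model (NIPALS, illustrative)."""
    Xr, yr = X - X.mean(axis=0), y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        q_a = (yr @ t) / tt
        Xr = Xr - np.outer(t, p)             # deflate X
        yr = yr - q_a * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

# Synthetic stand-in data: 12 experiments, 20 fluxes (all values hypothetical).
rng = np.random.default_rng(1)
n_obs, n_fluxes, n_components = 12, 20, 2
states = rng.normal(size=(n_obs, 2))
fluxes = states @ rng.normal(size=(2, n_fluxes)) + 0.1 * rng.normal(size=(n_obs, n_fluxes))
productivity = 2.0 * states[:, 0] + 0.1 * rng.normal(size=n_obs)

boot = []
for _ in range(500):
    idx = rng.integers(0, n_obs, size=n_obs)     # resample experiments with replacement
    if np.unique(idx).size < n_components + 2:
        continue                                 # skip resamples too degenerate to fit
    boot.append(pls1_coef(fluxes[idx], productivity[idx], n_components))
boot = np.array(boot)

# 95% percentile interval for each flux coefficient.
lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)

# Candidate targets: fluxes whose interval does not contain zero.
candidate_targets = np.flatnonzero((lower > 0) | (upper < 0))
```

Fluxes whose intervals straddle zero would be treated as background, so only coefficients that keep the same sign across resamples are carried forward as candidate targets. With a small number of experiments such intervals are only approximate, and a jackknife or analytical variance estimate could be substituted.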
It should be noted that the capacity to predict a phenotypic change does not necessarily translate into the ability to predict the means of delivering that change. Specifically, a strong statistical correlation between a given metabolic route and productivity does not imply a direct cause-effect relationship. While such a relationship may often hold for the synthesis of single molecules, the production of correctly formed proteins or viruses depends on a complex series of steps ranging from gene transcription to protein secretion or virus assembly, along with their regulation through even less understood signalling events [35]. Therefore, the identification of genetic targets may at times lie beyond the domain of central metabolic fluxes, which are themselves regulated upstream along with the productivity phenotype. On this issue, the methodology presented here could at least allow hypotheses to be formed about how different cell pathways/functions are regulated in common.
Besides providing a list of prospective metabolic targets to be exploited for engineering, the proposed framework adds a functional dimension to previous metabolic decomposition studies based solely on structural properties of the underlying network, namely connectivity [36, 37] or pathway feasibility [38, 39]. Here, clusters of fluxes are defined by their shared relationship with a given cellular output. As a main drawback, our method is constrained by the availability of data on the cellular fluxome and the target phenotype, and thus demands some experimental effort.
As mentioned earlier, when used to handle flux distributions estimated by FBA in genome-scale models, this approach represents an alternative hybrid framework to linear optimization techniques for metabolic target selection, which could significantly overcome existing limitations in modeling complex phenotypes. A recent paper by Melzer et al. (2009) also explores the use of multivariate statistics to correlate stoichiometrically-derived elementary modes in complex networks with stoichiometrically-defined productivity targets, as opposed to common search algorithms for strain improvement. However, our approach differs conceptually from the cited work in that we define a statistical bridge between a well-defined stoichiometry and a complex phenotype, thereby substituting for a metabolic link that may be ill-defined in a purely stoichiometric representation and otherwise hampered by insufficient kinetic and regulatory information. This should prove an advantage in predicting non-obvious metabolic targets associated with the synthesis of more complex recombinant products in animal cells, particularly multimeric proteins, multi-protein particles and viruses, the latter adding a further degree of complexity due to the virus-coded regulation of cellular machinery.
Finally, from a practical point of view, several issues are worth considering before opting for a genome-scale model. First, the availability of a well-sequenced and annotated genome may constitute a major limitation: while accurate metabolic reconstructions are available for un-mutated, standard microorganisms such as E. coli and S. cerevisiae, a larger diversity of organisms is used in industrial settings, particularly animal cell systems, for which cellular data are much scarcer. Another consideration is the cost and time associated with the creation of these models. Even if a comprehensive genome-scale stoichiometric model is already available, considerable experimental effort is necessary to overlay high-throughput metabolomic and isotopomer flux data in order to better constrain fluxome estimations. In particular, the computational power required for 13C flux analysis may become prohibitive for very complex networks by today's standards. Overall, our framework could potentially be more useful for steering the rapid development of a broad range of organisms on the basis of a representative small-scale metabolic network. As such, it would significantly enhance the quality of information extracted from exploratory experiments compared to traditional metabolic flux analysis.