 Research article
 Open Access
 Published:
Cell functional enviromics: Unravelling the function of environmental factors
BMC Systems Biologyvolume 5, Article number: 92 (2011)
Abstract
Background
While functional genomics, focused on gene functions and genegene interactions, has become a very active field of research in molecular biology, equivalent methodologies embracing the environment and geneenvironment interactions are relatively less developed. Understanding the function of environmental factors is, however, of paramount importance given the complex, interactive nature of environmental and genetic factors across multiple time scales.
Results
Here, we propose a systems biology framework, where the function of environmental factors is set at its core. We set forth a "reverse" functional analysis approach, whereby cellular functions are reconstructed from the analysis of dynamic envirome data. Our results show these data sets can be mapped to less than 20 core cellular functions in a typical mammalian cell culture, while explaining over 90% of flux data variance. A functional enviromics map can be created, which provides a template for manipulating the environmental factors to induce a desired phenotypic trait.
Conclusion
Our results support the feasibility of cellular function reconstruction guided by the analysis and manipulation of dynamic envirome data.
Background
The phenotype of a cell results from the combined effect of environmental and genetic factors [1, 2]. While the investigation of the relationship between genetic and environmental factors at the molecular level has proved difficult [3], a number of recent systems biology studies have produced important insights about the nature of this complex relationship.
The relationship between genes and environment can be analysed by computational methods founded on the principle that the biochemical habitat of a cell is primarily "engraved" in its DNA sequence. Genes may be associated with metabolic enzymes, membrane transporters, signal transduction or regulatory control. Combined with basic biochemical information currently available in several databases (e.g. KEGG [4] and BioCyc databases [5]), it is possible to reconstruct the majority of the metabolic reaction network and also the associated exometabolome [6]. This is the currently accepted topdown, genetofunction molecular biology paradigm.
Following these principles, Borenstein et al., [7] applied a graphtheoretical algorithm to determine these exogenously acquired compounds  they called it the seed set of an organism  for each of the 478 prokaryotic species with available metabolic networks in the KEGG database [4]. They found that about 811% of the compounds in the metabolic network of an organism correspond to the seed set and that each organism possesses a characteristic seed set. Moreover, comparing the seed set of the different organisms enabled them to trace the evolutionary history of both metabolic networks and growth environments across the tree of life, supporting the "reverse" ecology principle.
Using comparative genomics and flux balance analysis, Pal et al. [8] concluded that the adaptive evolution of bacterial metabolic networks in response to changing environments proceeds essentially by horizontal gene transfer (i.e. genes acquired from other species) of genes involved primarily in the transport and catalysis of external nutrients. With a similar approach, Kreimer et al. [9] studied the modularity of metabolic networks among 325 prokaryotic species with sequenced genomes. They observed that several environmental factors contributed to the variation in metabolicnetwork modularity across species. Their observations corroborated the findings of Pal et al. [8] that the variability in the natural habitat of an organism promotes modularity in its metabolic network.
Allen et al. [10] and Kell et al. [11] analysed the complete set of metabolites excreted or secreted by yeast cultures using highthroughput direct injection mass spectrometry. They called the technique "metabolic footprinting". They observed that a high number of metabolites, typically between 50150 metabolites, are either excreted or secreted to the medium. Using statistical methods they showed that the information contained in the environment is sufficient to distinguish between different physiological states of wild type yeast strains and singlegene deletion mutants and concluded that the technique has high potential for functional analysis and metabolic engineering.
All the above studies concur with a profound, bidirectional link between environment and genes, with routes on the evolutionary principles of life. When exposed to a particular environment, a given genotype will respond and change its environment in a unique way, which can be distinguished even between singlegene deletion mutants [10]. Supported by these observations, we explored the feasibility of a bottomup, envirome to cellular function reconstruction approach. Instead of using transcriptome data as in functional genomics, we set out to determine if it is possible to reconstruct cellular function from the analysis of envirome data sets alone. We term this reverse functional analysis approach "cell functional enviromics". In what follows, we describe the principles of the employed methodology and show how this approach can be applied to reconstruct Baby Hamster Kidney (BHK) metabolism.
Results and Discussion
A systems biology approach to cell functional enviromics
In functional genomics, the goal is genomewide cellular function reconstruction through the collection and analysis of transcriptome or proteome data over time. In functional enviromics, we attempt to bridge environmental factors and function through the collection and analysis of dynamic envirome data. The necessary elements to such an analysis are: i) setting the universe of cellular functions and envirome components, ii) collecting informative envirome data over time, and iii) systems level analysis of dynamic envirome data to find relationships between environmental variables and cellular functions.
To support such a framework, we have developed a computational algorithm called "enviromeguided projection to latent pathways (PLP)". "Cellular function" has been defined in different ways, here we adopted elementary flux modes (EM) as function descriptors [12–18] because they enable systematic, largescale computational analysis of biochemical networks from a functional viewpoint. Mathematically, EMs form the convex basis of the null space solution of a metabolic network stoichiometric matrix. Biologically, they represent elementary network topologies defining all possible independent operational modes of a cell. EMs have been previously used to support computational functional genomics [19, 20]. Here we apply the same rational to derive a functional enviromics algorithm.
A medium scale network has typically a very large number of EMs in the order of ~10^{6}[21]. It is however unlikely that all of these cellular functions are active at the same time. In reality, despite the apparent complexity of cellular networks, only a very limited number of core network topologies are capable of robustly executing any particular biological function [22]. It is thus rational to assume that a given environmental stimulus activates only a moderate number of characteristic EMs.
In conformity with the above enumerated principles, PLP was designed to maximise the covariance between environmental state (envirome data sets) and observed phenotypic trait (rate of change of envirome variables) under the constraint of known genes translated into a plausible set of cellular functions (Figure 1). Simply put, PLP implements a discrimination algorithm to find a minimal set of EMs based on two criteria: i) variance of measured phenotypic trait explained by each EM and ii) degree of correlation of each EM with the environmental state. By maximising these two criteria, the algorithm delivers a ranking of EMs in increasing order of percentage of explained variance in the measured flux data, and a functional enviromics map (FEM) representing the strength of up or downregulation of EMs by environmental factors. Mathematical details of the method can be found in the Methods section. In what follows we illustrate the methodology with an application to a recombinant BHK cell line expressing the fusion glycoprotein IgG1IL2 (see Methods for details).
Envirome guided metabolic reconstruction of BHK cells
The methodology was first applied to the analysis of a data set comprising measurements of 27 environmental factors collected from six independent, differently operated culture experiments as described in Methods and Table 1. The main objective was to identify the "active" set of EMs by iteratively projecting the metabolism into a minimal set of core cellular functions that correlate with the envirome. The predictor (input) matrix included an extensive list of environmental factors including: temperature, pH, osmolality and concentrations of 24 measured components (viable cells, glucose, lactate, ammonia, glycoprotein and 19 amino acids). The target (output) matrix consisted of the 24 exchange fluxes calculated from the profiling of extracellular concentrations. The results are shown in Figure 2. The explained flux variance was calculated according to Eq. (4). It represents the variance in the flux data that is explained by a model built on the selected EMs.
A high degree of correlation was observed between measured envirome components and relative weight of discriminated EMs, ultimately reflected in high correlation between measured and predicted extracellular rates (Figure 2a). Within the universe of 251 possible EMs, 20 of them captured up to 93% of the measured rates variance (Figure 2b), even though the data were pooled together from several independent cultivation experiments, with inherently diverse environmental trajectories. The first selected EM is one of many for biomass synthesis (see Tables 2 and 3), describing more than 64% of total rate data variance. This concurs with the fact that most of the processed carbon is involved in biomass synthesis. The single EM for product formation (EM 1) is also selected, explaining part of the specific consumption rates of every amino acid. The EM corresponding to anaerobic conversion of glucose into lactate was selected in third place (EM 11). Given the low saturation constants of glucose transporters (Km = 12 mM) [23, 24]), this pathway usually carries a high flux in mammalian cell cultures, leading to waste lactate formation. Serine transamination into glycine was the fourth selection (EM 6). Indeed, the rates of serine consumption and glycine production were quite high, mainly during the initial periods of culture. This pathway may represent a strategy for ammonia detoxification, the main toxic byproduct in mammalian cell culture. The sixth position (EM 4) corresponds to the pathway of glutaminolysis, well known as a major carbon source for energy production in mammalian cells [25, 26]. Complete oxidation of glucose in the TCA cycle (EM 43) is also selected, reflecting the metabolic shift induced during glucose fedbatch experiments.
Several other methods were proposed to calculate EMs weighting factors from flux vectors but none of these methods use the correlation with the envirome as criterion for EMs discrimination. As pointed out by Wiback et al. [27], the extreme pathways (and elementary modes) weighting factors used to reconstruct a particular flux vector may not be unique. Linear programming methods can be applied to calculate the minimum and maximum values of such weighting factors, thus defining a range of possible values for each pathway, called the αspectrum [27]. Unique solutions can also be obtained if additional assumptions about the biological system are considered. For instance, Nookaew et al. [28] proposed the determination of weighting factors by mixed integer linear programing to maximise the number of elementary modes that explain measured yield values. Using a different strategy, Schwartz and Kanehisa [29] proposed a quadratic program to minimize the sum of squares of weighting factors. To better illustrate our method, we compared it with the method proposed by Schwartz and Kanehisa [29], which basically selects the elementary modes that are closest to the actual biological sate by minimising the respective weighting factors. The application of this method to the 134 flux vectors data points identifies consistent subsets of 105 and 57 EMs with weighting factors bigger than 0.1 and 1.0 respectively (Figure 3a). As expected, this method selects a much higher number of EMs than PLP because it does not seek redundancy minimization. The 20 EMs with the highest contribution to flux data variance explained only 66% of variance (Figure 3b) compared to the 93% obtained with PLP (Figure 2a). This method also took 111 EMs to reach the 93% variance obtained with PLP with only 20 EMs. The 105 EMs with weighting factors bigger than 0,1 selected by the Schwartz and Kanehisa [29] method contain the 20 EMs selected by PLP but the former method does not link the EMs with the envirome.
On the whole, these results show that the most significant pathways in cultured mammalian cells could be identified from a larger, redundant set of possible routes using the correlation with the environmental state as the selection criterion. Further, the weighting factors of selected pathways are highly correlated with the environmental state denoting either a causal or effector relationship between environmental state and active pathways. Finally, such enviromecorrelated pathways explain more than 90% of flux data variance.
Assessment of pathway variability along culture time
Pathways may be up or downregulated either as a consequence of changes in the environment or driven by changes in the internal state ultimately reflected in a particular environmental "footprint". To monitor such regulatory processes, the PLP algorithm was applied to a moving time window, from the beginning to the end of each fedbatch experiment, to identify EM rearrangements over time. Each culture was partitioned into sequential metabolic phases, where quasi steady state is observed, according to the profiles of extracellular metabolite concentrations.
Table 2 illustrates the highly conserved active pathways obtained during growth in fedbatch 3 (for fedbatches 1 and 2, see Table 4 and 5, respectively). Again, a relatively small number of EMs was able to describe over 90% of the target rate data in every phase. A moderate degree of pathway conservation amid consecutive phases was observed, with, on average, 42% of the selected pathways appearing at least twice in each culture. This parallels with 38% of EMs repeatedly selected in at least two independent experiments (pooling together different phases within each culture), from which we can conclude that cells experience as much metabolic adaptation during culture as between cultures with different preset conditions. On the other hand, 65% of the selected EMs from the combination of all experiments in Figure 2b were conserved between cultures, showing the ability of the method to capture essential common features from independent data sets.
We further analyse in more detail the cases of the reversible EM 4 (Gln ↔ Glu + Amm) and EM 7 (Asn ↔ Amm + Asp), linked to amino acids metabolism, which were selected in all phases but at some point changed direction (the minus sign in Table 2). The main factors determining these changes in directionality were the limitation in glutamine concentration (EM 4 turned "negative") and the exhaustion of aspartate (EM 7 turned "positive"). The selection of such pathways (also repeatedly selected in all phases of fedbatch cultures 1 and 2) illustrates the ability of the method to tap on key metabolic adaptation processes.
Consistency analysis of PLP reconstructions
In order to assess the consistency of PLP results, the discriminated sets of EMs were backtransformed into reduced metabolic networks by excluding reactions not participating in any of the selected pathways. Then, metabolic flux analysis (MFA) was performed to compare intracellular flux distributions of the original BHK metabolic network and PLP reconstructed metabolic networks. Also the consistency index [30] was calculated for each stationary phase. The minimum number of EMs required for obtaining consistency between experimental data and the stoichiometry of simplified networks was the criterion to stop PLP reconstruction. This methodology yielded consistent networks with no more than 17 EMs (Figure 4a). Comparing the fluxes of original and reconstructed networks, it is clear that excluded reactions have fluxes close to zero in the original BHK network, while nonzero fluxes show almost imperceptible differences (for illustration, see Figure 4b with fedbatch 1 results).
The excluded reactions (actually pathways of lumped reactions) refer to the catabolism of amino acids (see Table 6), which normally occurs when their concentrations exceed the amount required for proteins synthesis [31]. Some are inactive during the majority of metabolic phases considered, namely r22, r26 and r27 (Figure 4c), corresponding to the catabolism of valine, arginine and histidine, respectively. Therefore, it can be concluded that the consumption of these amino acids is largely determined by protein synthesis (cellular and product), being practically unaffected by their extracellular concentrations within the range present in the medium. On the contrary, the consumption of phenylalanine, lysine, leucine, cysteine and methionine, (r20, r29, r30, r15 and r23, respectively) was largely influenced by their medium concentrations, since the corresponding catabolic pathways were active during more than half of the culture phases, suggesting these amino acids should be provided in controlled, stoichiometric amounts to minimize ammonia formation. A particular scenario arises for serine, whose catabolic conversion to pyruvate (r17) is downregulated only during three metabolic phases (see Table 6). As mentioned, cells also convert serine into glycine for ammonia detoxification (r16 or EM 6): although maintaining a high extracellular concentration could be beneficial, it may also lead to net ammonia formation if present in excess.
Functional Enviromics Maps (FEM)
Functional enviromics maps are intended to compile largescale quantitative information of the interactions between environmental factors and cellular functions. A FEM consists of a data matrix, FEM = {I_{i,j}}, with columns representing elementary flux modes, rows representing environmental factors, and I_{i,j} elements the respective regression coefficients. In some cases, the magnitude of I_{i,j} may be interpreted as the strength of up or downregulation of a given cellular function i by the environmental factor j (see Methods). The concept of FEM is illustrated in Figure 5 for the case of BHK cells using the full envirome data set, comprising the data of the 6 independent culture experiments.
Analysing these data, it can be observed that several envirome factors correlate positively with the flux for biomass synthesis through EM 20, namely the concentrations of glucose, lactate, glutamine and other 8 amino acids. On the other hand, extracellular pH and the concentrations of product and histidine correlate negatively. While glucose is known to be mandatory for many cell types in culture, BHK cells are capable of glutamine synthesis from glutamate [32]. However, it is also a major energy source for transformed mammalian cell lines, favouring growth when present in the medium. The remaining amino acids correlate positively with this EM since they are more concentrated during initial exponential growth when specific biomass formation is faster. As they are taken up by cells, the cellular division rate decreases as well.
In terms of product synthesis (EM 1), negatively correlated factors include the pH and glucose concentration, while osmolality and the concentrations of valine, isoleucine, leucine and lysine correlate positively with this flux. The negative correlation with glucose concentration is explained by bioreactor operation factors, as it decreases (or is maintained) along culture time whereas product formation rate increases, rather than by an inhibition mechanism on product synthesis. The effect of pH deserves further attention, since it also correlates negatively with biomass formation. As described in Methods, the bulk pH was allowed to vary between 7 and 7.25. Some reports agree that maximum growth rates [33] and recombinant protein productivities [33, 34] are favoured at pH = 7, thus around the lower limit of the values used in this work. On the other hand, it has been shown that specific productivity in mammalian cells is higher with increased bulk osmolality [35–37]. Our results support this observation, however it should be noted that the increase in osmolality resulted mainly from the feeding of a concentrated solution during the fedbatch phase. As expected, glucose concentration positively correlated with the flux of lactate formation through EM 11, and serine concentration is the most important factor that activates glycine production through EM 6.
Despite using data from six independent experiments, some environmental factors (namely the concentrations of essential amino acids) show similar variation trends, which precludes disclosure of their true contribution to cellular phenotype. It is clear that the identifiability of environmentEMs relationships is conditional to sufficient "system excitation", which can only be assured by modelbased design of experiments. Within proper circumstances of experimental design and physiological range of physical and biochemical variables, the information of EMs regression coefficients can be used to direct the phenotype to a desired state by manipulating genes, envirome or both. It should be noted that EMs represent clusters of genes, thus the EM regression coefficients translate the lumped effect of kinetic and genetic regulation induced by the envirome.
Conclusions
While in functional genomics the aim is to unravel gene functions from the analysis of transcriptome and proteome dynamical data [38–40], here we propose a reverse, envirometofunction analysis approach sourced by dynamic envirome data. This approach is founded on the assumption that the genome sets the universe of cellular functions while the envirome sets the relative contribution of each function to the observed phenotype. We have developed a systems level methodology that deconvolutes the observed phenotype (i.e. fluxome) into gene dependent structures (the structure of elementary flux modes) and envirome dependent structures (the relative weight of elementary flux modes ). Then, only the envirome dependent structures are linearly regressed against envirome components to discriminate the core cellular functions that correlate with the environmental state.
When applied to a recombinant BHK cell line, environmental data finds correspondence to less than 20 elementary flux modes, while explaining over 90% of flux variance. Most of the discriminated elementary flux modes are typical routes known to be active in cultured mammalian cells, while excluded routes are related with the catabolism of those amino acids not in excess in the medium, thus being directly used for protein synthesis. Furthermore, metabolic flux distributions of reduced metabolic networks, accounting for the discriminated core cellular functions only, were shown to be consistent with metabolic flux distributions of the original, unreduced metabolic network. These results essentially show that through the life of the culture, phenotypic variability is almost completely traceable by monitoring changes in envirome data. The remaining 10% of unexplained variance corresponds mostly to experimental error, thus leaving little room for genetically induced variability that cannot be traced to envirome data. On the whole, such results support the feasibility of a function reconstruction approach guided by the collection and analysis of envirome data sets.
When applied to a sliding time window along a single culture experiment, it was observed that several discriminated EMs are not active during all culture phases while others may change direction at some point. Such shortterm dynamic changes correspond to metabolic adaptation induced by environmental stimuli. While some metabolic adaptation effects are readily interpretable in terms of exhaustion or limitation of essential substrates, some other have nontrivial interpretation and may correspond to unknown functions of environmental factors. The analysis of EM rearrangements can provide insights about the mechanisms underlying such metabolic adaptation. Despite the variability in observed EMs with culture time, a high degree of conservation of cellular function among the six different cultivation experiments was obtained (65% of discriminated EMs appear in all cultures). These observations support the results presented by Ma et al. [22] who used computational methods to identify network topologies that can achieve biochemical adaptation. They concluded that despite the diversity of possible biochemical networks, only a finite set of core topologies represent a robust adaptation response to a stimulus. These results support the hypothesis of a minimal set of core cellular functions activated by the envirome. The use of less informative envirome data sets (when compared to metabolome, proteome or transcriptome data sets) translates into lower discriminating power of cellular functions, particularly of those cellular functions producing similar environmental footprints. However, since only a small number of core cellular functions with distinctive footprints are active they are easily identified from an analysis of the envirome, thus constituting a vital support of cell functional enviromics.
While our intention was to demonstrate the principle of cell functional enviromics, its full potential can be realized when largescale, highthroughput analysis techniques are employed for enviromewide reconstruction of cellular functions eventually leading to high resolution functional enviromics maps. This approach has many interesting practical applications ranging from drug design to macroscopic process control. A point to be made is that finding a statistical correlation, even if linked with a metabolic structure, does not allow one to discriminate between a "cause" and an "effect". Such discrimination is only possible after a systematic analysis on the basis of a sound experimental design, which might include simultaneous environmental and genetic perturbations.
Fundamental issues, which remain to be clarified, include the degree of conservation of envirome functions among different species. One can anticipate that, given the role of environmental factors in the evolution of life, some degree of conservation may exist between species, as implicitly acknowledged by those who defend the reverse ecological principle [7]. A systematic functional enviromics study applied to different species and genotype variants could shed light on this issue.
The need for enviromedriven systems biology approaches to address human diseases has long been recognized [41]. Others have proposed enviromics as a complement to genomic and proteomic studies of mental health [42]. Even the notion of 'functional enviromics' has been set forth as a counterpart to functional genomics in tackling schizophrenia disorders [43]. Our approach has been to explore this concept in cellular systems cell functional enviromics  which we view as a natural step in both fundamental and applied molecular biology research.
Methods
Enviromeguided projection to latent pathways (PLP)
According to the definition of elementary flux modes, the universe of infinite flux solutions, r, of a given metabolic network operating in steady state can be described as a nonnegative weighted sum of a finite number of elementary flux modes, e_{i} (e.g. [12, 14, 15]):
Elementary modes thus represent unique and nondecomposable flux solutions, from which all possible system solutions can be obtained through the proper determination of elementary modes weighting factors, λ_{i}. Several methods were proposed to calculate weighting factors, λ_{ i }, from flux vectors, r. (e.g. [27–29]). As noted by Wiback et al. [27], the reconstruction of a particular flux vector according to Eq. (1) is not unique if the number of elementary modes is higher than the number of fluxes, which is generally the case. Unique solutions can however be obtained with additional assumptions about the biological system, such as the maximisation of number of elementary modes [28], or minimisation of the values of weighting factors to select the elementary modes that are closest to the actual biological state [29]. In the present study we also calculate a unique solution for the weighting factors λ_{ i }although under completely different criteria. We set the weighting factors λ_{ i }as linear functions of envirome variables, x(t), which represents additional measured information in our method that is not used in other methods. The objective function in our case consists of:

i)
maximising correlation between λ _{ i }and envirome variables (or equivalently between r(t) and x(t)), and

ii)
minimising redundancy, i.e. eliminating all elementary modes with weak correlations with the envirome.
The basic idea is that the genome sets the structure of the elementary flux modes, while the relative weights of elementary flux modes are to a large extent set by the environment.
According to the above criteria, the PLP algorithm was designed to maximise the covariance between an input data matrix, X = {x(t)}, of independent envirome observations, x(t), and an output response matrix, R = {r(t)}, of observed steady state reaction rates, r(t), under the constraint of a plausible set of elementary flux modes. This problem can be expressed mathematically by the following constrained linear program:
with X = {x(t)} a np × nx matrix of np independent observations of envirome vectors x(t) (dim(x) = nx), R = {r(t)} a np × nr matrix of np independent observations of reaction rates, r(t) (dim(r) = nr), E = {e_{i}} a nr × nem matrix of nem elementary flux modes, e_{i} (dim(e_{i}) = nr) (for the BHK metabolic network, E is the 24 × 251 matrix given in the Additional File 1), Λ = {λ(t)} a np × nem matrix of weighting vectors, λ(t), of elementary flux modes (dim(λ) = nem) and C a nem × nx matrix of regression coefficients.
Maximising covariance implies two concomitant goals, namely maximising variance and minimising redundancy. Unconstrained maximisation of covariance can be achieved by the very popular projection to latent structures method, also known as partial least squares (PLS). Since PLP is derived from PLS, in the lines below we first review the structure of PLS and then show how it can be modified to PLP.
PLS is a multivariate linear regression technique that maximises the covariance between an input matrix, X, called the predictor, and an output response matrix, which here is the target rate data, R. It differs from traditional multivariate linear regression in that it decomposes iteratively both the predictor and the response matrices into a reduced set of uncorrelated latent variables, thereby eliminating redundancy in the input and output data sets. More specifically, the matrices X and R are decomposed into loadings matrices (W and Q), scores matrices (T and U) and residuals matrices (E_{ X }and E_{ R }):
The columns of the loadings matrices W and Q are the latent variables in which the input and output matrices are decomposed, respectively. Additionally, the outputs scores matrix U is linearly regressed against the inputs scores matrix T:
where B holds the regression coefficients determined by minimising the residuals E_{ U }. Finally, the explained variance in target data (which in our case is flux data, R) by the model is given by the following equation:
with indexes i and j denoting observation and flux indexes respectively. Solving system Eqs. (3ac), which imply minimising residuals E_{X}. E_{R} and E_{U}, can be effectively performed by the very popular Non Iterative Partial Least Squares (NIPALS) algorithm. For more details about PLS and NIPALS see, for instance, Geladi and Kowalski [44].
PLP can be viewed as a constrained version of PLS by the inclusion of the elementary flux modes constraints. Given their equivalent mathematical structures, an interesting comparison may be made between projection of target reaction rate data, R, onto latent variables Q (performed in PLS according to Eq. (3b)), and projection of flux vectors onto elementary modes weighting factors according to Eq. (1). Indeed, elementary flux modes e_{ i }can be viewed as PLS latent variables equivalents, while the elementary modes weighting factors λ_{ i }can be interpreted as PLS latent variables score values. Following this analogy, a straightforward modification to Eq. (3b) can be introduced
by substituting PLS loading matrix, Q, by the elementary modes matrix, E. This means that the latent structures E are no longer degrees of freedom in PLP as Q was in PLS. Instead, E represents a constraint to the relationship between X and R imposed by the metabolic network structure and ultimately by the genome of the cells.
An advantage of PLP over PLS is that both the latent variables and score values of the target matrix have in PLP a physical meaning. The latent variables are latent pathways while scores U are equivalent to Λ, i.e. they represent the relative weighting factors of latent pathways. For this reason, the algorithm is called projection to latent pathways. Moreover, the regression coefficients B can be used to deduce the functional enviromics matrix, FEM, as follows:
FEM is a nx × nem matrix comprising the regression coefficients of elementary flux modes against envirome components, thus providing information of how elementary flux modes are up or downregulated by envirome components.
The PLP algorithm was coded in MATLAB (Mathworks, Inc) as a modified version of the NIPALS algorithm {Geladi, 1986 #15}, wherein the loadings of the outputs are fixed a priori according to the elementary modes structure, E. The calculation of the elementary modes is not automatically integrated in PLP. For that we used the METATOOL 5.0 [13] as explained below.
Cell line and culture conditions
A BHK21 cell line constitutively expressing the fusion glycoprotein IgG1IL2 (antibody type one linked to Interleukin two) was used in this study [45]. Six cell cultures were performed in 2 L bioreactor vessels in a serumfree and proteinfree medium, SFM4CHO (Hyclone), without glucose and glutamine. One culture was performed in batch mode and five in fedbatch mode. The starting volume was 950 ml, dissolved oxygen was kept at 30% of air saturation through air/nitrogen sparging at a gas flow rate of 0.02 vvm; the agitation rate was kept at 70 rpm. The pH was controlled between 7 and 7.25 by addition of NaOH (0.2 mM) or CO_{2}. The batch culture started at 48 mM of glucose, 1.2 mM of Glutamine, 2.8 mM of Glutamate, 1.3 mM of Serine and 0.75 mM of Aspartate. As for the fedbatch cultures, the initial concentrations and feeding strategies are given in Table 1.
Envirome profiling
27 envirome components were profiled, namely temperature, pH, osmolality and concentrations of 24 extracellular compounds determined as follows. Cell counts were determined in a FuchsRosenthal haemocytometer; cell viability was assessed by the trypan blue dye exclusion method. Product concentration was determined by ELISA (see Teixeira et al. [46] for details). The concentrations of the main nutrients (glucose (Glc), glutamine (Gln), glutamate (Glu) and lactate (Lac)) were determined enzymatically using the biochemical analyser YSI 7100 (Yellow Springs, USA). The osmolality of culture supernatant was assayed using a Digital MicroOsmometer, Type 5R (Hermann Roebling Messtechnik, Germany). Amino acids concentrations were determined by high performance liquid chromatography (HPLC) using a protocol optimized in our Lab (for details, see Carinhas et al. [47]). Ammonia concentrations were determined enzymatically using an UV assay (Boehringer Manheim, RBiopharm AG, Germany).
BHK Metabolic network
The BHK metabolic network adopted in this study comprised 57 metabolic reactions (including transport/diffusion across the plasma membrane), 35 intracellular metabolites and 24 extracellular metabolites and other compounds. Considered reactions explain the major fluxes of carbon and nitrogen, namely glycolysis, glutaminolysis, TCA cycle, pentosephosphate pathway, recombinant product synthesis and biosynthesis of cellular components (nucleotides, carbohydrates, lipids and proteins). The catabolism of all amino acids except tryptophan is also considered. The network was simplified by lumping series of metabolic reactions, free of branching points, without loss of representativeness under steadystate. A detailed description can be found in Additional File 2.
Metabolic flux analysis
For metabolic flux analysis (MFA), the difference between the number of reactions and the number of intracellular metabolites is 5735 = 22, which means that at least 22 measured fluxes are required to obtain a determined MFA system. Envirome profiling provided data of 24 exchange fluxes, which means that the MFA system becomes redundant with 2422 = 2 degrees of freedom. As such, the full vector of 57 fluxes was partitioned into known (extracellular) and unknown (intracellular) subsets of 24 and 33 fluxes respectively. Measured fluxes for different phases of fedbatch cultures are presented as Table 7. The resulting redundant system of linear equations was solved by the weighted least squares method [48]. The error standard deviation of measured fluxes were 5% for glucose, lactate, glutamine, glutamate and ammonia concentrations, 10% for biomass, product and the remaining amino acids concentrations. All MFA calculations were performed using the FluxAnalyzer software [13].
Consistency analysis of metabolic flux distributions
Given that the MFA system is redundant with two degrees of freedom, the consistency index method [30] can be applied to verify consistency of calculated metabolic flux distributions in relation to assumed biochemistry and steady state assumption. Briefly, the consistency index, h, was calculated according to the method described in Wang and Stephanopoulos [30]. Then, the values of h were compared to the χ^{2} statistical test for two degrees of freedom. Whenever h is below the χ^{2} threshold value the system is said to be consistent. These tasks were performed using FluxAnalyzer[13].
Determination of BHK elementary flux modes
The BHK metabolic network was firstly manipulated according to the guidelines in Gagneur and Klamt [49], and then the respective EMs were computed using FluxAnalyzer[13]. This medium scale network has a total of 251 EMs, which represent, in the context of the present study, the universe of cellular functions to be screened against envirome components. Among the full set of 251 EMs, 139 refer to biomass synthesis. A closer look to these EMs reveals that different byproducts at different stoichiometric ratios are secreted or excreted concomitantly with biomass formation, namely ammonia, lactate, glutamate, alanine, aspartate and/or glycine, thus producing a distinctive environmental footprint that can be used for their discrimination. It should be noted that a single EM describes product synthesis since the underlying synthesis reaction involves only unbalanced amino acids (the carbohydrate content of the fusion glycoprotein was not considered). Additional File 1 provides a complete list of the 251 BHK elementary flux modes, formulated in terms of extracellular compounds stoichiometry.
Application of PLP to BHK data
For the present BHK problem, the input data set X comprises the above enumerated 27 envirome factors. The target dataset R comprises measurements of the same 24 exchange fluxes used for MFA. The universe of EMs is formed by the 251 EMs obtained for BHK (see Additional File 1), The subset of EMs with significant correlation with the envirome is shown in Table 3.
References
 1.
Hunter DJ: Geneenvironment interactions in human diseases. Nat Rev Genet. 2005, 6 (4): 287298.
 2.
Sauer U: Highthroughput phenomics: experimental methods for mapping fluxomes. Curr Opin Biotech. 2004, 15 (1): 5863. 10.1016/j.copbio.2003.11.001
 3.
Smith EN, Kruglyak L: Geneenvironment interaction in yeast gene expression. Plos Biol. 2008, 6 (4): 810824.
 4.
Kyoto Encyclopedia of Genes and Genomes (KEGG). http://www.genome.jp/kegg/
 5.
BioCyc Database Collection. http://biocyc.org/
 6.
Janga SC, Babu MM: Networkbased approaches for linking metabolism with environment. Genome Biol. 2008, 9 (11): 15.
 7.
Borenstein E, Kupiec M, Feldman MW, Ruppin E: Largescale reconstruction and phylogenetic analysis of metabolic environments. P Natl Acad Sci USA. 2008, 105 (38): 1448214487. 10.1073/pnas.0806162105.
 8.
Pal C, Papp B, Lercher MJ: Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nat Genet. 2005, 37 (12): 13721375. 10.1038/ng1686
 9.
Kreimer A, Borenstein E, Gophna U, Ruppin E: The evolution of modularity in bacterial metabolic networks. P Natl Acad Sci USA. 2008, 105 (19): 69766981. 10.1073/pnas.0712149105.
 10.
Allen J, Davey HM, Broadhurst D, Heald JK, Rowland JJ, Oliver SG, Kell DB: Highthroughput classification of yeast mutants for functional genomics using metabolic footprinting. Nat Biotechnol. 2003, 21 (6): 692696. 10.1038/nbt823
 11.
Kell DB, Brown M, Davey HM, Dunn WB, Spasic I, Oliver SG: Metabolic footprinting and systems biology: the medium is the message. Nat Rev Microbiol. 2005, 3 (7): 557565. 10.1038/nrmicro1177
 12.
Klamt S, Stelling J: Two approaches for metabolic pathway analysis?. Trends Biotechnol. 2003, 21 (2): 6469. 10.1016/S01677799(02)000343
 13.
Klamt S, Stelling J, Ginkel M, Gilles ED: FluxAnalyzer: exploring structure, pathways, and flux distributions in metabolic networks on interactive flux maps. Bioinformatics. 2003, 19 (2): 261269. 10.1093/bioinformatics/19.2.261
 14.
Palsson BO, Price ND, Papin JA: Development of networkbased pathway definitions: the need to analyze real metabolic networks. Trends Biotechnol. 2003, 21 (5): 195198. 10.1016/S01677799(03)000805
 15.
Papin JA, Stelling J, Price ND, Klamt S, Schuster S, Palsson BO: Comparison of networkbased pathway analysis methods. Trends Biotechnol. 2004, 22 (8): 400405. 10.1016/j.tibtech.2004.06.010
 16.
Schuster S, Dandekar T, Fell DA: Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering. Trends Biotechnol. 1999, 17 (2): 5360. 10.1016/S01677799(98)012906
 17.
Schuster S, Fell DA, Dandekar T: A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nat Biotechnol. 2000, 18 (3): 326332. 10.1038/73786
 18.
Wagner C: Nullspace approach to determine the elementary modes of chemical reaction systems. J Phys Chem B. 2004, 108 (7): 24252431. 10.1021/jp034523f.
 19.
Forster J, Gombert AK, Nielsen J: A functional genomics approach using metabolomics and in silico pathway analysis. Biotechnol Bioeng. 2002, 79 (7): 703712. 10.1002/bit.10378
 20.
Pachkov M, Dandekar T, Korbel J, Bork P, Schuster S: Use of pathway analysis and genome context methods for functional genomics of Mycoplasma pneumoniae nucleotide metabolism. Gene. 2007, 396 (2): 215225. 10.1016/j.gene.2007.02.033
 21.
Klamt S, Gagneur J, von Kamp A: Algorithmic approaches for computing elementary modes in large biochemical reaction networks. IET Syst Biol. 2005, 152 (4): 249255.
 22.
Ma WZ, Trusina A, ElSamad H, Lim WA, Tang C: Defining network topologies that can achieve biochemical adaptation. Cell. 2009, 138 (4): 760773. 10.1016/j.cell.2009.06.013
 23.
Gould GW, Holman GD: The glucosetransporter family  structure, function and tissuespecific expression. Biochem J. 1993, 295: 329341.
 24.
Wlaschin KF, Hu WS: Engineering cell metabolism for highdensity cell culture via manipulation of sugar transport. J Biotechnol. 2007, 131 (2): 168179. 10.1016/j.jbiotec.2007.06.006
 25.
Cruz HJ, Ferreira AS, Freitas CM, Moreira JL, Carrondo MJT: Metabolic responses to different glucose and glutamine levels in baby hamster kidney cell culture. Appl Microbiol Biot. 1999, 51 (5): 579585. 10.1007/s002530051435.
 26.
Jeong YH, Wang SS: Role of glutamine in hybridoma cellculture  effects on cellgrowth, antibodyproduction, and cellmetabolism. Enzyme Microb Tech. 1995, 17 (1): 4755. 10.1016/01410229(94)00041O.
 27.
Wiback SJ, Mahadevan R, Palsson BO: Reconstructing metabolic flux vectors from extreme pathways: defining the alphaspectrum. J Theor Biol. 2003, 224 (3): 313324. 10.1016/S00225193(03)001681
 28.
Nookaew I, Meechai A, Thammarongtham C, Laoteng K, Ruanglek V, Cheevadhanarak S, Nielsen J, Bhumiratana S: Identification of flux regulation coefficients from elementary flux modes: A systems biology tool for analysis of metabolic networks. Biotechnol Bioeng. 2007, 97 (6): 15351549. 10.1002/bit.21339
 29.
Schwartz JM, Kanehisa M: Quantitative elementary mode analysis of metabolic pathways: the example of yeast glycolysis. BMC Bioinformatics. 2006, 7: 20 10.1186/14712105720
 30.
Wang NS, Stephanopoulos G: Application of macroscopic balances to the identification of gross measurement errors. Biotechnol Bioeng. 1983, 25 (9): 21772208. 10.1002/bit.260250906
 31.
Me LZ, Zhou WC: Fedbatch cultivation of mammalian cells for the production of recombinant proteins. Cell Culture Technology for Pharmaceutical and CellBased Therapies. Edited by: Ozturk S, Hu WS. 2006, 349386. Boca Raton: Crc PressTaylor & Francis Group
 32.
Christie A, Butler M: The adaptation of BHK cells to a nonammoniagenic glutamatebased culture medium. Biotechnol Bioeng. 1999, 64 (3): 298309. 10.1002/(SICI)10970290(19990805)64:3<298::AIDBIT6>3.0.CO;2U
 33.
Yoon SK, Choi SL, Song JY, Lee GM: Effect of culture pH on erythropoietin production by chinese hamster ovary cells grown in suspension at 32.5 and 37.0 degrees C. Biotechnol Bioeng. 2005, 89 (3): 345356. 10.1002/bit.20353
 34.
Osman JJ, Birch J, Varley J: The response of GSNS0 myeloma cells to pH shifts and pH perturbations. Biotechnol Bioeng. 2001, 75 (1): 6373. 10.1002/bit.1165
 35.
Lin JQ, Takagi M, Qu YB, Gao PJ, Yoshida T: Enhanced monoclonal antibody production by gradual increase of osmotic pressure. Cytotechnology. 1999, 29 (1): 2733. 10.1023/A:1008016806599
 36.
Ozturk SS, Palsson BO: Effect of medium smolarity on hybridoma growth, metabolism, and antibodyproduction. Biotechnol Bioeng. 1991, 37 (10): 989993. 10.1002/bit.260371015
 37.
Takagi M, Hayashi H, Yoshida T: The effect of osmolarity on metabolism and morphology in adhesion and suspension chinese hamster ovary cells producing tissue plasminogen activator. Cytotechnology. 2000, 32 (3): 171179. 10.1023/A:1008171921282
 38.
Hieter P, Boguski M: Functional genomics: It's all how you read it. Science. 1997, 278 (5338): 601602. 10.1126/science.278.5338.601
 39.
Schena M, Heller RA, Theriault TP, Konrad K, Lachenmeier E, Davis RW: Microarrays: biotechnology's discovery platform for functional genomics. Trends Biotechnol. 1998, 16 (7): 301306. 10.1016/S01677799(98)012190
 40.
Tao H, Bausch C, Richmond C, Blattner FR, Conway T: Functional genomics: Expression analysis of Escherichia coli growing on minimal and rich media. J Bacteriol. 1999, 181 (20): 64256440.
 41.
Toscano WA, Oehlke KP: Systems biology: new approaches to old environmental health problems. Int J Environ Res Publ Health. 2005, 2 (1): 49. 10.3390/ijerph2005010004.
 42.
Anthony JC: The promise of psychiatric enviromics. Brit J Psychiat. 2001, 40: s811.
 43.
van Os J, Rutten BPF, Poulton R: GeneEnvironment interactions in schizophrenia: Review of epidemiological findings and future directions. Schizophrenia Bull. 2008, 34 (6): 10661082. 10.1093/schbul/sbn117.
 44.
Geladi P, Kowalski BR: Partial LeastSquares regression  A tutorial. Anal Chim Acta. 1986, 185: 117. 10.1016/00032670(86)800289.
 45.
Burger C, Carrondo MJT, Cruz H, Cuffe M, Dias E, Griffiths JB, Hayes K, Hauser H, Looby D, Mielke C, et al.: An integrated strategy for the process development of a recombinant antibodycytokines fusion protein expressed in BHK cells. Appl Microbiol Biot. 1999, 52 (3): 345353. 10.1007/s002530051530.
 46.
Teixeira A, Cunha AE, Clemente JJ, Moreira JL, Cruz HJ, Alves PM, Carrondo MJT, Oliveira R: Modelling and optimization of a recombinant BHK21 cultivation process using hybrid greybox systems. J Biotechnol. 2005, 118 (3): 290303. 10.1016/j.jbiotec.2005.04.024
 47.
Carinhas N, Bernal V, Monteiro F, Carrondo MJT, Oliveira R, Alves PM: Improving baculovirus production at high cell density through manipulation of energy metabolism. Metab Eng. 2010, 12 (1): 3952. 10.1016/j.ymben.2009.08.008
 48.
Stephanopoulos GN, Aristidou AA, Nielsen J: Metabolic engineering: Principles and methodologies. 1998, San Diego: Academic Press
 49.
Gagneur J, Klamt S: Computation of elementary modes: a unifying framework and the new binary approach. BMC Bioinformatics. 2004, 5: 121. 10.1186/1471210551
Acknowledgements
Financial support for this work was provided by the Portuguese Fundação para a Ciência e Tecnologia through project PTDC/EBBEBI/103761/2008, PhD grant SFRH/BD/36676/2007 (NC) and PostDoc grant SFRH/BPD/46277/2008 (JD). We would like to acknowledge Professor Michael Semmens from the University of Minnesota for his valuable assistance in reviewing this manuscript.
Author information
Additional information
Authors' contributions
The project was conceived by RO and experiments planned by AT, RO, PA and MJTC. Experiments were performed by AT and NC. MS and AC supervised the bioreaction and JC developed software to support fedbatch control. PLP algorithm was developed by JD, MVS and RO. Data analysis and manuscript writing were performed by AT, JD, NC and RO. All authors read and approved the final manuscript.
Ana P Teixeira, João ML Dias contributed equally to this work.
Electronic supplementary material
Additional file 1:BHK elementary modes. List of elementary modes obtained from the BHK metabolic network (additional file 2). Elementary modes are represented in reduced form in terms of extracellular metabolites. (DOC 618 KB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
About this article
Received
Accepted
Published
DOI
Keywords
 Partial Little Square
 Metabolic Network
 Elementary Mode
 Flux Vector
 Metabolic Flux Analysis