 Methodology article
 Open Access
 Published:
Projection to latent pathways (PLP): a constrained projection to latent variables (PLS) method for elementary flux modes discrimination
BMC Systems Biology volume 5, Article number: 181 (2011)
Abstract
Background
Elementary flux modes (EFM) are unique and nondecomposable sets of metabolic reactions able to operate coherently in steadystate. A metabolic network has in general a very high number of EFM reflecting the typical functional redundancy of biological systems. However, most of these EFM are either thermodynamically unfeasible or inactive at preset environmental conditions.
Results
Here we present a new algorithm that discriminates the "active" set of EFM on the basis of dynamic envirome data. The algorithm merges together two wellknown methods: projection to latent structures (PLS) and EFM analysis, and is therefore termed projection to latent pathways (PLP). PLP has two concomitant goals: (1) maximisation of correlation between EFM weighting factors and measured envirome data and (2) minimisation of redundancy by eliminating EFM with low correlation with the envirome.
Conclusions
Overall, our results demonstrate that PLP slightly outperforms PLS in terms of predictive power. But more importantly, PLP is able to discriminate the subset of EFM with highest correlation with the envirome, thus providing indepth knowledge of how the environment controls core cellular functions. This offers a significant advantage over PLS since its abstract structure cannot be associated with the underlying biological structure.
Background
An elementary flux mode (EFM) can be defined as a minimal set of enzymes able to operate at steady state, with the enzymes weighted by the relative flux they need to carry for the mode to function [1]. The universe of EFM of a given metabolic network define the full set of nondecomposable steadystate flux distributions that the network can support. Any particular steadystate flux distribution can be expressed as a nonnegative linear combination of EFM. Motivated by these unique properties, EFM analysis has become a widespread technique for systems level metabolic pathways analysis [1–8].
The number of EFM of a metabolic network is in general very high, denoting the innate adaptability and robustness of biological systems. As illustrative example, the central carbon metabolism of a genomescale reconstructed Escherichia coli metabolic network has approximately 26 million EFM [9]. However, not all of these pathways are thermodynamically feasible or even physiologically reachable [10]. Over the last decade several methods were proposed to reduce the number of EFM founded on different principles (Table 1).
Some of the proposed methods reduce EFM based solely on structural information of the metabolic network. de Figueiredo et al. [11] presented a method to enumerate the EFM in increasing order of number of reactions. This approach enabled to identify the Kshortest EFM in Escherichia coli and Corynebacterium glutamicum metabolic networks, which are in principle energetically more efficient. Song and Ramkrishna [12] proposed a reduction algorithm based on the effect of EFM on the convex hull volume. This allowed the a priori reduction, without any experimental data, from the initial 369 to 35 EFM for a yeast metabolic network fermenting both glucose and xylose.
EFM can also be discriminated on the basis of reaction thermodynamics. Wlaschin et al. [13] demonstrated with experimentally determined intracellular fluxes that EFM weights are inversely correlated with the entropy generated by the involved metabolic reactions. This suggests that evolution induced cellular regulatory patterns to favour efficient pathways with low entropy generation. Zhao et al. [14] proposed a method for correlating enzyme activity and flux distribution which uses the Shannon's maximum entropy principle, a measure of system complexity, as an objective function to estimate the enzyme control flux.
Several methods have been proposed that merge linear programming and experimental data. Palsson and coauthors [15, 16] suggested linear optimization to determine how extreme pathways (the systemically independent subset of EFM) contribute to a given (measured) steadystate flux distribution. There is a range of possible nonnegative weighting values associated to extreme pathways that produce a given steadystate flux distribution. This range was calculated by maximizing and minimizing the extreme pathway weighting factors, resulting in the so called αspectrum. Wang et al. [17] presented a method to calculate the EFM coefficients for a large set of EFM by devising a quadratic program to explore the possibility and performance of using a subset of the EFM to reconstruct flux distributions. Alternatively, a framework based on EFM analysis and the convex properties of EFM was developed to calculate EFM flux regulation coefficients (FRC) corresponding to an appropriate fractional operation of this mode within the complete set of EFM [18].
Schwartz and Kanehisa [19] showed that a combination of structural and kinetic modelling in yeast glycolysis significantly constraints the range of possible behaviours of a metabolic system. All EFM are not equal contributors to physiological cellular states, and this approach may open a direction towards a broader identification of physiologically relevant EFM among the very large number of stoichiometrically possible modes.
In a previous paper [20], we have delineated a conceptual approach to map envirome factors to cellular functions based on the correlation of EFM weighting factors and measured envirome variables. Here we study in detail the computational algorithm to reduce EFM based on the degree of correlation of EFM weighting factors with measured envirome factors, which we call projection to latent pathways (PLP). The underlying principles are: (i) only a moderate number of EFM are active at given environmental conditions, (ii) the envirome plays a critical role in their regulation, and (iii) active EFM deliver a characteristic environmental footprint that can be used for their identification. In what follows we present all mathematical details underlying PLP and compare it with PLS in relation to a case study.
Results
Projection to Latent Pathways (PLP) Algorithm
Problem statement
By applying steadystate material balance equations to a metabolic network with m metabolites and q metabolic reactions, the following system of linear algebraic equations is obtained:
with r a vector of q metabolic fluxes, r_{ k } the subset of fluxes associated to irreversible reactions and N a m×q stoichiometric matrix. It is a wellknown property of system (1) that its null space solution takes the form of a polyhedral cone [21]. Furthermore, the convex basis of system (1) is formed by a large number of base vectors, which are the EFM studied in this paper:
with em_{ i } a q × 1 vector of reaction weighting factors that defines EFM i and λ_{ i } a scalar variable defining the partial contribution of em_{ i } to the overall flux phenotype, r, and n_{ em } the number of EFM.
In this paper we study the reduction of EFM on the basis of dynamical envirome data sets. The basic premise is that measured fluxome vectors can be systematically deconvoluted into genetic dependent factors (the structure of EFM, em_{ i }) and envirome dependent factors (the partial contribution of each EFM to flux phenotype, λ_{ i } ). To implement this method, we developed a discrimination algorithm that works according to the following criteria:

1.
Maximisation of explained variance of flux data sets, R = {r(t)}

2.
Maximisation of correlation of λ_{ i } against envirome data, X = {x(t)}

3.
Minimisation of the number of active EFM
with X = {x(t)} a np × nx matrix of np independent observations of envirome vectors x(t) (dim(x) = nx), R = {r(t)} a np × nr matrix of np independent observations of reaction rates, r(t) (dim(r) = q). These criteria are equivalent to a covariance maximisation problem (covariance maximisation implies maximisation of correlation and minimisation of redundancy) between envirome data, X, and observed flux data, R, under the constraint of a plausible set of EFM:
with EM = {em_{ i }} a nr × nem matrix of nem EFM, em_{ i } (dim(em_{i}) = q), Λ = {λ(t)} a np × nem matrix of weight vectors λ(t) of EFM (dim(λ) = nem) and C a nem × nx matrix of regression coefficients.
Unconstrained maximisation of covariance can be performed by the widely used method projection to latent structures (PLS), also known as partial least squares. Figure 1 shows the structural differences between PLS and PLP. Since PLP is derived from PLS, in the lines below we first review PLS decomposition and then show how it can be extended to PLP.
Projection to Latent structures (PLS)
PLS is a multivariate linear regression technique between an input (predictor) matrix, X, and an output response matrix, Y. It differs from traditional multivariate linear regression in that it decomposes both the predictor and the response matrices into reduced sets of uncorrelated latent variables, which are then linearly regressed against each other.
The most widely used PLS algorithm is the NIPALS (noniterative partial least squares) algorithm [22], which provides the basis for PLP derivation. NIPALS proceeds according to the following steps:

1.
Set the initial ny × 1 Yloading vector, q, equal to an arbitrarily chosen nonzero row of Y, y_{ t }
q=\frac{{y}_{t}^{\mathsf{\text{T}}}}{\u2225{y}_{t}\u2225}(4)
in case of univariate PLS, ny = 1 and q = 1

2.
Compute the np × 1 Yscore vector, u
u=Y\cdot q(5) 
3.
Compute the nx × 1 weight vector, w
w=\frac{{X}^{\mathsf{\text{T}}}\cdot u}{\u2225{X}^{\mathsf{\text{T}}}\cdot u\u2225},(6) 
4.
Compute the np × 1 Xscore vector, t
t=X\cdot w(7) 
5.
Recalculate the Yloading vector, q
q=\frac{{Y}^{\mathsf{\text{T}}}\cdot t}{\u2225{Y}^{\mathsf{\text{T}}}\cdot t\u2225}(8) 
6.
Repeat steps 15 until the convergence criterion tt_{ old } < eps is obeyed with, for instance, eps = 1 × 10^{8}. In case of univariate PLS, Eq. 8 yields q = 1 hence no iterations are performed.

7.
Compute the X data block loadings, p, and rescale accordingly:
p=\frac{{X}^{\mathsf{\text{T}}}\cdot t}{\u2225{t}^{\mathsf{\text{T}}}\cdot t\u2225}(9){p}_{new}=\frac{p}{\u2225p\u2225}(10)t=t\cdot \u2225p\u2225(11)w=w\cdot \u2225p\u2225(12) 
8.
Compute the regression coefficient of the inner linear model
b=\frac{{u}^{\mathsf{\text{T}}}\cdot t}{{t}^{\mathsf{\text{T}}}\cdot t}(13) 
9.
Compute the X and Y residuals
{E}_{X}=Xt\cdot {p}^{\mathsf{\text{T}}}(14){E}_{Y}=Yb\cdot t\cdot {p}^{\mathsf{\text{T}}}(15) 
10.
Then go back to step 1 and repeat the procedure for the next latent variable after making
X={E}_{X}(16)Y={E}_{Y}(17)
Steps 110 are repeated for k = 1, ..., Fac latent variables resulting into the following overall decomposition:
with E_{ i }residuals matrices. Finally, the prediction of Y from X is given by
with RC the ny × nx regression coefficients matrix given by
For more details about PLS and NIPALS see Geladi and Kowalski [23].
Projection to latent pathways (PLP)
PLP can be viewed as a constrained version of PLS that maximises the covariance between X and R under the constraint of known EFM. PLP performs essentially the same decomposition described by Eq. 1822. The main difference lies in the computation of the output loadings,Q. Since EFM are unique and nondecomposable flux solutions, any observed flux distribution can be expressed as a nonnegative weighted sum of EFM (Eq. 2). Thus, EFM em_{ i } can be interpreted as latent variables (or principle components of a metabolic network) while the weights λ_{ i } can be interpreted as score values of such latent variables. According to this analogy, PLS was modified as follows:

1.
For each EFM k, set the loadings equal to em_{ k } and compute the respective score vector, λ_{ k }:
{q}_{k}=e{m}_{k}(23){\lambda}_{k}=R\cdot {q}_{k}\left(\equiv {u}_{k}\right)(24) 
2.
Perform a univariate PLS (with q = 1) with input X and target Y = λ_{ k } for Fac latent variables as described in the previous section and compute the predicted λ_{ k }
{\widehat{\lambda}}_{k}:\phantom{\rule{0.3em}{0ex}}predicted\phantom{\rule{0.3em}{0ex}}{\lambda}_{k}\phantom{\rule{0.3em}{0ex}}from\mathsf{\text{}}univariate\mathsf{\text{}}PLS(25) 
3.
Compute the predicted R by the k EFM and the respective explained variance
{\widehat{R}}_{k}={\widehat{\lambda}}_{k}\cdot {q}_{k}^{\mathsf{\text{T}}}(26)va{r}_{k}\left(\mathsf{\text{\%}}\right)=\mathsf{\text{100}}\cdot \left(\mathsf{\text{1}}\frac{\sum _{\mathsf{\text{i}}}{\left(R{\widehat{R}}_{k}\right)}^{\mathsf{\text{T}}}\cdot \left(R{\widehat{R}}_{k}\right)}{\sum _{\mathsf{\text{i}}}{R}^{T}\cdot R}\right)(27) 
4.
Repeat steps 13 for every EFM k = 1,..., nem and choose the best, kopt, as the one that exhibits the highest variance value given by Eq. 27.
kopt:\mathsf{\text{EFMwithhighest}}\phantom{\rule{0.3em}{0ex}}va{r}_{k}\phantom{\rule{0.3em}{0ex}}\mathsf{\text{value}}(28) 
5.
Remove kopt from the list of EFM and make
R=R{\widehat{R}}_{kopt}(29) 
6.
Go back to step 1 and repeat the procedure for a maximum number of EFM or until the explained variance of R does not increase any further.
With this procedure the output loadings, Q, hold a subset of EFM from matrix EM while the output scores, U, are equivalent to the EFM weights matrix, Λ:
As such, while PLS and PLP are structurally equivalent, the loadings and scores in PLS are abstract variables while in PLP they have a physical interpretation:

1.
The number of latent variables in PLS is analogous to the number of active EFM in PLP. Thus the subset of EFM that explain most of the variance of R are interpreted as the set of metabolic pathways activated by environmental factors.

2.
The regression coefficients vector, RC_{ kept }, of the inner univariate PLS, being directly associated with EFM, show the contribution of each environmental factor to the up or downregulation of EFM.
The PLS and PLP algorithms were coded in Matlab™(Mathworks, Inc). The code is freely available for academic use under a free academic license and can be downloaded at http://www.dq.fct.unl.pt/sbegroup.
In what follows we compare both algorithms in relation to a case study.
Case study: recombinant BHK cell line
Data of a recombinant baby hamster kidney (BHK) cell line expressing a fusion glycoprotein IgG1IL2 was used to compare PLS and PLP. The data set comprises 134 observations acquired from 7 independent bioreactor experiments operated in batch and fedbatch modes. The predictor matrix, X (dim(X) = 134 × 26), includes measured data of 26 environmental factors (pH, osmolarity and concentrations of viable cells, glucose, lactate, ammonia, IgG1IL2 and 19 amino acids) while the target matrix, R (dim(R) = 134 × 24), comprises 24 production or consumption fluxes of extracellular compounds. Further details about the data can be found elsewhere ([20]).
A relatively small BHK metabolic network comprising 35 metabolites and 57 metabolic reactions was constructed. Its EFM were computed using Metatool 5.0 [24] resulting in 251 EFM. Details can be found as Additional Files 1 and 2. These 251 EFM were used as constraints to PLP decomposition.
Comparing PLP and PLS decomposition results
The full data set was divided into two partitions of randomly selected points with equal size for calibration and validation (with 67 points each). The results of a single run of PLS and PLP decomposition for the calibration data set are shown in Tables 2 and 3 respectively. PLS decomposition stops at latent variable 18, when the X variance reaches 100%. The final explained R variance is 90.1%. As for PLP, decomposition progresses up to the 17^{th} EFM, explaining 82.5% of R variance, thus 7.5% less than PLS. PLP decomposition stops when the threshold degree of correlation between λ_{ i } and Xcan no longer be satisfied (r^{2} > 0.75 and pvalue < 0.05, see Table 3). This procedure ensures that the identified EFM are the ones with highest correlation with environmental state. Figure 2 depicts predicted against "measured" λ_{ i } illustrating the high degree of correlation with envirome variables for the discriminated set of EFM.
Assessment of EFM reduction consistency
PLS belongs to a class of multivariate regression techniques that can be used to model high dimensional data sets with low number of sampling points [25]. However, when the number of samples is too low, the partitioning into calibration and validation sets may have a high impact on the final model structure. Since stemming from PLS, the same problem does in principle apply to PLP. In order to assess EFM discrimination variability due to data partitioning, a bootstrapping technique was implemented, in which PLP and PLS were repeated 200 times with randomly selected calibration and validation partitions with 67 points each. Figure 3 shows the frequency of selection of EFM resulting from the bootstrapping analysis. The complete set of results is provided as Additional File 3. These results evidence a subset of frequently selected EFM, which include EFM1, EFM2, EFM4, EFM6, EFM11, EFM179 and EFM210 with frequency of selection higher than 75% and EFM69, EFM72, EFM173 and EFM206 with frequency of selection higher than 50%. Less frequently selected EFMs are very sensitive to the data partitioning and to experimental noise and thus less reliable to interpret.
Metabolic interpretability
As mentioned previously, while in PLS the output latent variables have no physical meaning, in PLP they are EFM. To illustrate this difference we plot in Figure 4 the output loadings of the first two PLS latent variable (Table 2) against the reaction weighting factors of the first two selected EFM (179 and 1) (Table 3).
It can be seen that the first PLS loadings vector q_{ 1 } calculated by Eq. 8 does show a residual correlation with the first selected EFM 179 structure (r^{2} = 0,26). However, the second loadings vector q_{2} shows no correlation at all with second select EFM 1. Despite the fact that both the calculation of the output loadings q and the selection of EFM obey to the same criterion of maximization of the correlation between X and Y, it is clear that the data structure identified by PLS cannot be easily associated with the underlying biological structure.
It is beyond the scope of this paper to present a detailed metabolic interpretation of the discriminated EFM by PLP (for a detailed analysis see [20]). Here we just highlight a few illustrative examples, the most frequently selected EFM for biomass synthesis is EFM 179 followed by EFM173. The product formation EFM (EFM 1) is also frequently selected. The anaerobic conversion of glucose into lactate was also frequently selected (EFM 11). Serine transamination into glycine (EFM 6) was also among the most frequently selected EFM. EFM 4 corresponds to the glutaminolysis pathway, well known as a major carbon source for energy production in mammalian cells. In general, these are important pathways known to be active in mammalian cells.
Regression coefficients
While PLS regression coefficients are associated with latent variables lacking physical meaning, PLP regression coefficients are directly associated to the discriminated EFM (see Figure 5). Thus they provide information of how the envirome up or downregulates each EFM. This interpretation should however be done with care as regression coefficients cannot disclose between a cause and an effect. An EFM is per definition a nondecomposable subnetwork. Most of them start and end in extracellular compounds. Each EFM produces a characteristic dynamic footprint in the environment in terms of consumed or produced metabolites, which is more an effect rather than a cause. Moreover, it is an important feature of PLS and per inheritance of PLP that the Xloadings are computed in a way to maximise predictive power of Y in detriment of interpretability of the individual contribution of X variables. Although many papers have attempted to develop interpretation of PLS regression coefficients (e.g. [26, 27]), other techniques are in principle better suited for this purpose. Even so, main causaleffects can be extracted from the analysis of regression coefficients. For this analysis it is however important to calculate the confidence intervals of the regression coefficients, which can be obtained from the previously described bootstrapping technique [28]. From the z = 200 PLP runs with randomly selected calibration and validation data sets, z = 200 vectors of regression coefficients are calculated (see Additional File 3). The respective mean and standard deviation can be estimated as follows:
The 95% confidence intervals can then be calculated from the tstudent distribution with 0.975 half interval and zFac degrees of freedom
As illustrative example, Figure 6 plots the confidence interval against the mean of the regression coefficients for the product formation EFM (EFM 1). It can be observed that only a subset of regression coefficients lay below the one half threshold line. These include the regression coefficients associated with pH, osmolality, glutamine, lactate, IgG, valine and lysine. These regression coefficients are the most statistically significant and thus more reliable interpretations can be withdrawn from them. As example, it is a rational result that the weighting factor of the product EFM 1 is highly correlated with the product concentration since the product results from EFM 1. All other identified environmental parameters are potential targets for manipulation in order to improve product synthesis. This analysis can be systematically extended to the full set of envirome components and full set of EFMs to support the concept of cell functional enviromics as defended in [20].
Predictive power
To test the predictive power, PLS and PLP models were calibrated with the calibration data set composed by 50% of data points and then simulated on the validation data set composed by the remaining 50% measured points. The PLS model with 18 latent variables explained 90.1% of R variance in the calibration dataset but only 76.8% of the validation dataset. The quality of the results can be visually inspected in Figure 7. The degradation of accuracy in the validation dataset is rational given that the model is requested to predict data points, which may lay outside of the domain of experience defined by the calibration data set. As for PLP it is a very interesting result to verify that despite explaining a lower variance in the calibration data set (83.2% against 90.1% for PLP and PLS respectively), the accuracy of the validation data set was higher than that of PLS (81.9% against 76.8% for PLP and PLS respectively). Moreover, the variance of the validation data set is almost equal to that of the calibration data set, denoting a more consistent model, with higher predictive power than the PLS one (Figure 8).
In order to screen out the possibility of a casual better performance of PLP in relation to PLS due to the particular data partitioning employed, the same variance analysis was performed for the z = 200 PLP and PLS runs performed with randomly selected calibration and validation data points according to the bootstrapping technique previously described. The results show that the explained variance of the validation data set varied between 78.885.6% for PLP and 50.482.7% for PLS (see Additional File 3). In 194 out of 200 runs the PLP outperformed the PLS, thus confirming that while PLS is consistently more accurate in describing the calibration data than PLP, the latter is consistently more accurate at predicting the validation data than PLS.
Discussion
The key PLS feature is identifying independent X and Yscores so that the relationship between successive pairs of scores is as strong as possible. PLS may be thus viewed as a robust form of redundancy analysis, seeking directions in the factor space that are associated with high variation in the response Y but biasing them toward directions that are more accurately predicted. Due to its advantages in handling highly redundant data sets, PLS has become a widely used regression analysis technique in systems biology. It has been applied as an inference tool for predicting metabolic fluxes using isotopomer flux data [29], analysing genomic and proteomic data [25], identifying signalling networks by inducing cellular response to different stimuli [30–32] and network structure using metabolomic data [33]. Moreover, PLS has also been applied for the identification of active cellular pathways as a function of the environment using metabolic and gene expression profiles [34], detection of genegene interactions from microarrays data [35, 36] and culture media optimization using nutritional profiling data [26, 27].
The main disadvantage of PLS lies in its empirical datadriven nature with limited addedvalue in terms of mechanistic knowledge generation. Although carrying some internal structure, this structure is not inspired by any a priori mechanistic knowledge of the system. PLP may be viewed as a constrained version of PLS, attuned to the structure of the biological system under study. While in PLS the loadings and score are abstract variables, in PLP loadings and scores refer to well defined metabolic structures. Specifically, PLP explores EFM as "principle components" of a metabolic network. Indeed, EFM obey to the principle of nondecomposability, meaning that any particular flux distribution can be expressed as a nonnegative weighted sum of EFM. Thus the ranking obtained in PLP refers to active pathways as inferred by their level of correlation with the environmental state. In terms of data requirement, PLS belongs to the class of multivariate regression techniques particularly suitable to handle highly dimensional data sets even if the number of observations is limited [25]. PLS is typically used to model spectral data such as near infrared or 2Dfluorescence maps [37]. A basic requirement is that the number of latent variables must be lower than the number of observations in the calibration data set. This means that reliable linear models can be identified from a moderate number of observations of highly dimensional datasets. The same properties apply to PLP. A basic constraint is that the number of discriminated EFM cannot be higher than the number of observations in the calibration data set. However the method offers no restriction in terms of the dimensionality of the input data set.
Finally it should be commented on the computational power requirements, which scales linearly with the number of EFM. In the present study with 251 EFM, computation requirements are in the order of seconds in a common PC. For a genome scale network with several million of EFM, computation power might easily rise to the scale of days in a common PC.
Conclusions
In this work we have developed an algorithm for the discrimination of active EFM on the basis of dynamical envirome data called projection to latent pathways (PLP). The algorithm is designed to maximise the covariance between envirome data and observed flux data under the constraint of universe of genes translated into a plausible set of EFM. In general lines, the algorithm discriminates a minimal set of envirome correlated EFM that maximise the variance of measured flux data. Thus the algorithm may be viewed as a reverse, envirometofunction metabolic reconstruction methodology as opposed to the generally accepted genetofunction reconstruction approach. Although presented here as a method to analyse envirome data sets, PLP has broader scope. It is rather a general methodology for statistical elimination of redundant metabolic structures that, in a broader sense, has the potential to bring together all layers of 'omic' information under a common computational framework.
References
Schuster S, Fell DA, Dandekar T: A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nat Biotechnol. 2000, 18 (3): 326332. 10.1038/73786.
de Figueiredo LF, Schuster S, Kaleta C, Fell DA: Can sugars be produced from fatty acids? A test case for pathway analysis tools. Bioinformatics. 2008, 24 (22): 26152621. 10.1093/bioinformatics/btn500.
Klamt S, Stelling J: Two approaches for metabolic pathway analysis?. Trends Biotechnol. 2003, 21 (2): 6469. 10.1016/S01677799(02)000343.
Palsson BO, Price ND, Papin JA: Development of networkbased pathway definitions: the need to analyze real metabolic networks. Trends Biotechnol. 2003, 21 (5): 195198. 10.1016/S01677799(03)000805.
Papin JA, Stelling J, Price ND, Klamt S, Schuster S, Palsson BO: Comparison of networkbased pathway analysis methods. Trends Biotechnol. 2004, 22 (8): 400405. 10.1016/j.tibtech.2004.06.010.
Schuster S, Dandekar T, Fell DA: Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering. Trends Biotechnol. 1999, 17 (2): 5360. 10.1016/S01677799(98)012906.
Trinh CT, Wlaschin A, Srienc F: Elementary mode analysis: a useful metabolic pathway analysis tool for characterizing cellular metabolism. Appl Microbiol Biotechnol. 2009, 81 (5): 813826. 10.1007/s0025300817701.
Wagner C: Nullspace approach to determine the elementary modes of chemical reaction systems. J Phys Chem B. 2004, 108 (7): 24252431. 10.1021/jp034523f.
Terzer M, Stelling J: Largescale computation of elementary flux modes with bit pattern trees. Bioinformatics. 2008, 24 (19): 22292235. 10.1093/bioinformatics/btn401.
Edwards JS, Palsson BO: The Escherichia coli MG1655 in silico metabolic genotype: Its definition, characteristics, and capabilities. Proc Natl Acad Sci USA. 2000, 97 (10): 55285533. 10.1073/pnas.97.10.5528.
de Figueiredo LF, Podhorski A, Rubio A, Kaleta C, Beasley JE, Schuster S, Planes FJ: Computing the shortest elementary flux modes in genomescale metabolic networks. Bioinformatics. 2009, 25 (23): 31583165. 10.1093/bioinformatics/btp564.
Song HS, Ramkrishna D: Reduction of a Set of Elementary Modes Using Yield Analysis. Biotechnol Bioeng. 2009, 102 (2): 554568. 10.1002/bit.22062.
Wlaschin AP, Trinh CT, Carlson R, Srienc F: The fractional contributions of elementary modes to the metabolism of Escherichia coli and their estimation from reaction entropies. Metab Eng. 2006, 8 (4): 338352. 10.1016/j.ymben.2006.01.007.
Zhao QY, Kurata H: Maximum entropy decomposition of flux distribution at steady state to elementary modes. J Biosci Bioeng. 2009, 107 (1): 8489. 10.1016/j.jbiosc.2008.09.011.
Wiback SJ, Mahadevan R, Palsson BO: Reconstructing metabolic flux vectors from extreme pathways: defining the alphaspectrum. J Theor Biol. 2003, 224 (3): 313324. 10.1016/S00225193(03)001681.
Wiback SJ, Mahadevan R, Palsson BO: Using metabolic flux data to further constrain the metabolic solution space and predict internal flux patterns: The Escherichia coli spectrum. Biotechnol Bioeng. 2004, 86 (3): 317331. 10.1002/bit.20011.
Wang QZ, Yang YD, Ma HW, Zhao XM: Metabolic network properties help assign weights to elementary modes to understand physiological flux distributions. Bioinformatics. 2007, 23 (9): 10491052. 10.1093/bioinformatics/btm074.
Nookaew I, Meechai A, Thammarongtham C, Laoteng K, Ruanglek V, Cheevadhanarak S, Nielsen J, Bhumiratana S: Identification of flux regulation coefficients from elementary flux modes: A systems biology tool for analysis of metabolic networks. Biotechnol Bioeng. 2007, 97 (6): 15351549. 10.1002/bit.21339.
Schwartz JM, Kanehisa M: Quantitative elementary mode analysis of metabolic pathways: the example of yeast glycolysis. BMC Bioinformatics. 2006, 7: 2010.1186/14712105720.
Teixeira AP, Dias JML, Carinhas N, Sousa M, Clemente JJ, Cunha AE, von Stosch M, Alves PM, Carrondo MJT, Oliveira R: Cell functional enviromics: Unravelling the function of environmental factors. BMC Syst Biol. 2011, 5
Schuster S, Hilgetag C: On Elementary Flux Modes in biochemical reaction systems at steady state. J Biol Syst. 1994, 2 (2): 165182. 10.1142/S0218339094000131.
Wold H: Path models with latent variables: The NIPALS approach. Quantitative sociology: International perspectives on mathematical and statistical modeling. Edited by: Blalock HM, Aganbegian A, Borodkin FM, Boudon R, Capecchi V. 1975, New York: Academic, 307357.
Geladi P, Kowalski BR: Partial LeastSquares regression  A tutorial. Anal Chim Acta. 1986, 185: 117. 10.1016/00032670(86)800289.
von Kamp A, Schuster S: Metatool 5.0: fast and flexible elementary modes analysis. Bioinformatics. 2006, 22 (15): 19301931. 10.1093/bioinformatics/btl267.
Boulesteix AL, Strimmer K: Partial least squares: a versatile tool for the analysis of highdimensional genomic data. Brief Bioinform. 2007, 8 (1): 3244.
De Alwis DM, Dutton RL, Scharer J, MooYoung M: Statistical methods in media optimization for batch and fedbatch animal cell culture. Bioprocess Biosyst Eng. 2007, 30 (2): 107113. 10.1007/s0044900601077.
Selvarasu S, Kim DY, Karimi IA, Lee DY: Combined data preprocessing and multivariate statistical analysis characterizes fedbatch culture of mouse hybridoma cells for rational medium design. J Biotechnol. 2010, 150 (1): 94100. 10.1016/j.jbiotec.2010.07.016.
Faber NM: Uncertainty estimation for multivariate regression coefficients. Chemometrics Intell Lab Syst. 2002, 64 (2): 169179. 10.1016/S01697439(02)001028.
Antoniewicz MR, Stephanopoulos G, Kelleher JK: Evaluation of regression models in metabolic physiology: predicting fluxes from isotopic data without knowledge of the pathway. Metabolomics. 2006, 2 (1): 4152. 10.1007/s1130600600182.
Ivakhno S, Armstrong JD: Nonlinear dimensionality reduction of signaling networks. BMC Syst Biol. 2007, 1: 1710.1186/17520509117.
Janes KA, Albeck JG, Gaudet S, Sorger PK, Lauffenburger DA, Yaffe MB: Systems model of signaling identifies a molecular basis set for cytokineinduced apoptosis. Science. 2005, 310 (5754): 16461653. 10.1126/science.1116598.
MillerJensen K, Janes KA, Brugge JS, Lauffenburger DA: Common effector processing mediates cellspecific responses to stimuli. Nature. 2007, 448 (7153): 604U611. 10.1038/nature06001.
Bundy JG, Papp B, Harmston R, Browne RA, Clayson EM, Burton N, Reece RJ, Oliver SG, Brindle KM: Evaluation of predicted network modules in yeast metabolism using NMRbased metabolite profiling. Genome Res. 2007, 17 (4): 510519. 10.1101/gr.5662207.
Li Z, Chan C: Integrating gene expression and metabolic profiles. J Biol Chem. 2004, 279 (26): 2712427137. 10.1074/jbc.M403494200.
Pihur V, Datta S: Reconstruction of genetic association networks from microarray data: a partial least squares approach. Bioinformatics. 2008, 24 (4): 561568. 10.1093/bioinformatics/btm640.
Tenenhaus A, Guillemot V, Gidrol X, Frouin V: Gene Association Networks from Microarray Data Using a Regularized Estimation of Partial Correlation Based on PLS Regression. IEEEACM Trans Comput Biol Bioinform. 2010, 7 (2): 251262.
Teixeira AP, Oliveira R, Alves PM, Carrondo MJT: Advances in online monitoring and control of mammalian cell cultures: Supporting the PAT initiative. Biotechnol Adv. 2009, 27 (6): 726732. 10.1016/j.biotechadv.2009.05.003.
Llaneras F, Pico J: A procedure for the estimation over time of metabolic fluxes in scenarios where measurements are uncertain and/or insufficient. BMC Bioinformatics. 2007, 8
Acknowledgements
Financial support for this work was provided by the Portuguese Fundação para a Ciência e Tecnologia through projects PTDC/EBBEBI/103761/2008 and MIT/Pt7BSBB/0082/2008, PhD grant SFRH/BD/36285/2007 (ARF), SFRH/BD/36676/2007 (NC), SFRH/BD/51577/2011 (RP), SFRH/BD/70768/2010 (IAI) and SFRH/BD/36990/2007 (MvS) and PostDoc grant SFRH/BPD/46277/2008 (JD).
Author information
Authors and Affiliations
Corresponding author
Additional information
Authors' contributions
The project was conceived by RO. Experiments were performed by AT and NC. PLP algorithm was developed and implemented by JD and RO. Algorithms (PLP and PLS) benchmarking and data analysis was performed by AR, MvS, IAI and RP. Manuscript writing was performed by JD and RO. All authors read and approved the final manuscript.
Ana R Ferreira, João ML Dias, Ana P Teixeira, Nuno Carinhas, Rui MC Portela, Inês A Isidro, Moritz von Stosch and Rui Oliveira contributed equally to this work.
Electronic supplementary material
12918_2011_777_MOESM1_ESM.DOC
Additional File 1: BHK metabolic network. Biochemical reactions/pathways, enzymes and biomass composition considered in the metabolic model of BHK cells. (DOC 66 KB)
12918_2011_777_MOESM2_ESM.DOC
Additional File 2: BHK elementary modes. List of elementary modes obtained from the BHK metabolic network (Additional File 1). Elementary modes are represented in reduced form in terms of extracellular metabolites. (DOC 628 KB)
12918_2011_777_MOESM3_ESM.XLS
Additional File 3: Envirome factors regression coefficients. List of PLP regression coefficients and the respective confidence intervals resulting from the bootstrapping analysis. (XLS 96 KB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Ferreira, A.R., Dias, J.M., Teixeira, A.P. et al. Projection to latent pathways (PLP): a constrained projection to latent variables (PLS) method for elementary flux modes discrimination. BMC Syst Biol 5, 181 (2011). https://doi.org/10.1186/175205095181
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/175205095181
Keywords
 Latent Variable
 Metabolic Network
 Flux Distribution
 Baby Hamster Kidney
 Elementary Flux Mode