BMC Systems Biology BioMed Central Methodology article



Background
Systems biology holds that, in order to quantitatively understand and predict cell behaviour, its constitutive components and their interactions must be studied as a whole system [1,2]. Metabolic networks are a paradigmatic example of this aim because, incomplete as they may be, they are the best characterized cellular networks [3]. In recent times, the information embedded in metabolic networks has been used to assemble constraint-based models under the pseudo-steady-state assumption, which does not require knowledge of kinetic parameters, since these are still rarely known [3,4]. Constraint-based models allow the calculation of the possible metabolic states or "behaviours" that the cell can exhibit; however, they do not predict which of these are likely under given circumstances. One approach to making such predictions is flux balance analysis (FBA), which is based on the assumption that cell behaviour has evolved to be optimal in a certain sense [5,6]. It has been shown that FBA is able to predict the actual fluxes [7][8][9], but this requires identifying the relevant objectives for different conditions [7,10]. As an alternative, one can perform a metabolic flux analysis (MFA) which, generally speaking, is the exercise of estimating the fluxes exhibited by cells by combining a constraint-based model and the set of available experimental measurements.
In order to estimate the intracellular fluxes, traditional metabolic flux analysis (TMFA) employs only measurements of uptake and production rates (i.e. influxes into and outfluxes from cells) that are stoichiometrically balanced [11]. This purely stoichiometric approach has some limitations, but most of them can be overcome with simple extensions, as will be shown below.
One typical difficulty to be tackled by MFA is that the available measurements may be insufficient to estimate the intracellular fluxes, particularly in large-scale networks, because there may be different flux distributions compatible with the available measurements. To address this situation, intracellular information obtained from stable isotope tracer experiments has been incorporated in many studies (13C-MFA) [12][13][14]. Yet, data from isotope tracer experiments will not be considered in this work. Instead, we follow a constraint-based modeling approach, in the sense that we do not necessarily attempt to predict the actual fluxes with precision, but rather to distinguish "most possible" from "impossible" flux states, based on a suitable definition of "possibility", a constraint-based model and the available measurements, which in most cases do not include isotopic data.
Another option in the face of a lack of measurements is the use of rational hypotheses to choose one flux distribution among those that are compatible with the measurements. For instance, Nookaew et al. have proposed to estimate the intracellular fluxes based on the assumption that cells are likely to use as many pathways as possible to maintain robustness and redundancy [15]. Related hypotheses have been formulated using the concept of elementary modes [16,17]. The assumption of optimal cell behavior typically used in FBA could also be used (e.g. [7]). It will be shown that the methodology we propose is able to detect flux distributions that are equally possible (or similarly possible), but for the sake of simple exposition we will not use any such hypothesis herein. However, the possibilistic framework might be extended to incorporate hypotheses, as discussed in the conclusions section.
In this context, the paper discusses the use of a possibilistic framework for metabolic flux analysis.
Uncertainty, lack of measurements and model imprecision will be handled by introducing the notion of "degree of possibility". Then, an efficient optimization-based approach will be employed to query the most possible fluxes and their possibility distributions. The methodology is based on a re-interpretation of the consistent causal reasoning paradigm [18] as an equivalent problem of feasibility subject to equality and inequality constraints; preferences under uncertain knowledge are incorporated by transforming the feasibility problem into a linear optimisation one, which may be interpreted in possibilistic terms. The optimisation approach to logic reasoning has been previously explored by the authors' research group in [19][20][21], and this paper applies it to MFA.
The main features of the possibilistic framework introduced in the paper are the following: (i) it is based on a constraint-based model and not only on stoichiometric balances; (ii) it considers measurement uncertainty in a flexible way (e.g. non-symmetric errors or a band of uncertainty due to systematic error) and (iii) even model imprecision; (iv) it provides possibility distributions (and intervals), which are more informative than point-wise estimations when multiple flux values might be reasonably possible; (v) it is reliable even if only a few fluxes are measurable; (vi) it has the ability to detect, and handle, inconsistencies between measurements and model; and (vii) all this with high computational efficiency.
The structure of the paper is as follows: preliminaries on possibility, optimization and metabolic flux analysis are addressed first. Then, the basics of possibilistic MFA and some refinements are discussed; the framework is illustrated with simple examples and a well-known model of C. glutamicum. The paper closes with a summary and a discussion of future work.

Preliminaries: possibility and optimization
In an abstract ideal situation, many estimation problems in science and engineering can be cast as estimating some decision variables δ given the known values of a set of other ones m (possibly, measurements) and a model expressed as a set of equality and inequality constraints (involving decision variables, measurements and some model parameters). Then, the valid estimations will be the feasible solutions of a constraint satisfaction problem [22,23].
However, in many practical cases, the measurements are imprecise and the model parameters and constraints are also not accurate, so real data violates them. This is the reason why most real-life models should include uncertainty. The most basic representation of uncertainty would be giving interval values to measurements and model parameters. Refinements of the uncertainty representation give rise to probabilistic [23][24][25] and possibilistic [26][27][28] frameworks.
Probabilistic frameworks have an underlying interpretation in terms of the frequency with which some flux conditions appear; possibilistic frameworks, on the other hand, measure the degree of compliance (consistency) of some decision variables with some (soft) modelling constraints. In this sense, the basic assumptions of the two paradigms of inference under uncertainty are different.
In the following subsections the possibilistic framework will be described. Afterwards, this section of preliminaries will be closed by discussing the relationship between probability and possibility and justifying the use of the possibilistic framework.
Soft constraint satisfaction problems: a possibilistic approach

As explained above, the possibilistic framework is the chosen representation for the problem under study, following the ideas in [29], where possibilistic constraint satisfaction problems (CSP) are presented. There, the authors introduce constraints which are satisfied to a degree, transforming the feasibility/infeasibility of a potential solution into a gradual notion: given a CSP with multiple solutions δ ∈ Δ (where Δ denotes the search space over which feasible values for the decision variables will be sought), a function π : Δ → [0, 1] was suggested in order to represent preference or priority as a "consistency degree". A value π(δ) = 1 indicates that δ is in full agreement with the model and measurement constraints; π(δ) = 0 indicates that δ is in absolute contradiction with the problem constraints and should never be considered a feasible value. Intermediate values denote decision variables which mildly violate the problem constraints but could be considered "partially possible" according to the practical knowledge of the expert modeller who defined π. The higher the value of π(δ), the higher the accordance with the problem constraints (subjectively interpreted as a higher "possibility" of the choice δ). Given this subjective meaning, π is denoted in the literature as a possibility distribution. Possibilistic calculus [27,29] then refers to computations with possibility distributions derived from a series of axioms. Its basic ideas will be outlined below in this section; a simple example first illustrates the idea.

Example
Consider a flux balance {f1 = f2}, stating equality between two fluxes, f1 and f2, supposedly measured in a biological or chemical reaction. The measurements ma = (5, 7) and mb = (5, 5.1) are infeasible, whereas mc = (5, 5) is feasible. However, it seems clear that the subjective "possibility" of mb is higher than that of ma; mb can be considered quite reasonable in practice due to measurement errors.
Usually, the function π(δ) is built by "conjunction" of possibility functions of individual relations πi(δi) (expressing user-defined preference or priority on each individual constraint, in many cases in a problem-dependent way). Such conjunction will be discussed later in this section. The best CSP solutions are defined to be those which satisfy the global problem to the maximal degree.
In this way, once the user has defined such a function expressing how consistent a particular combination of system variables is with the model, the basic idea of possibilistic calculus is, given a subset of the system variables (assumed known or measured), to estimate the "most possible" values of all the remaining variables via an optimization problem. The close relationship between possibilistic calculus and optimisation is discussed in the subsections below.

Possibility theory
The basic building block of possibility theory is a user-defined possibility distribution π : Δ → [0, 1], which defines the possibility of each "point" δ in Δ. A problem formulation is said to be consistent if there exists at least one point with possibility equal to one.
The second building block is the notion of an event, formally defined as a subset of Δ, in order to address questions such as, in the above example, determining the possibility of a given set of flux values.

Possibility calculus as optimization

By definition, the possibility of an event A (a subset of Δ) is computed via:

    π(A) = max{π(δ) : δ ∈ A}    (1)

and, obviously, given two events A and B, A ⊂ B entails π(A) ≤ π(B).
Hence, possibility computations are optimisation problems (Cf. with probability computations, which are integration problems).

Optimization as possibility calculus
Conversely, consider a cost function J : Δ → R+ (i.e., verifying J(δ) ≥ 0 for all δ ∈ Δ), such that there exists δ0 ∈ Δ with J(δ0) = 0. Then, a consistent possibility distribution may be defined on Δ via:

    π(δ) = exp(−J(δ))    (3)

and the possibility of an event A is given by replacing the possibility definition (3) in (1), resulting in:

    π(A) = max{exp(−J(δ)) : δ ∈ A} = exp(−min{J(δ) : δ ∈ A})    (4)

In the next sections, abusing the notation, an event A will usually be described by a set of constraints on the decision variables δ.
In this way, numeric constrained optimisation problems may be subjectively interpreted in possibilistic terms: the cost J(δ) will be interpreted as the negative log-possibility of δ and, by definition, infeasible values of the decision variables will be assigned zero possibility.
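The correspondence between costs and possibilities above can be sketched numerically. The cost function and candidate values below are illustrative assumptions, not part of the paper's formulation:

```python
import math

# Hypothetical cost index: J penalises deviation from a nominal value of 5
# (e.g. a measured flux); J >= 0 everywhere and J(5) = 0, as required.
def J(delta):
    return abs(delta - 5.0)

# Definition (3): possibility as the exponential of the negated cost.
def pi(delta):
    return math.exp(-J(delta))

# Definition (1): the possibility of an event A (here a finite set of
# candidate values) is the best possibility attained inside it.
def pi_event(A):
    return max(pi(d) for d in A)

print(pi(5.0))                    # the zero-cost point has possibility 1
print(pi_event([4.0, 4.5, 6.0]))  # dominated by the best point, 4.5
```

Note that computing `pi_event` is a maximisation, not an integration: this is the sense in which possibility computations are optimisation problems.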
Let us now review some other relevant definitions and issues in possibilistic calculus.

Necessity
In order to assert that an event A is necessarily true (in our context, that all problem solutions belong to A), saying that A is "possible" may not be enough: it must also be true that the complementary event "not A" is not possible. This motivates the introduction of a necessity measure:

    Nec(A) = 1 − π(Δ \ A)

Interactivity and possibilistic conjunction
The possibilistic analog of statistical independence is non-interactivity.
If the joint possibility over Δ = Δ1 × Δ2 of two variables δ = (δ1, δ2) ∈ Δ can be expressed as the product of two univariate ones:

    π(δ1, δ2) = π1(δ1) · π2(δ2)    (6)

then the variables δ1 and δ2 are said to be non-interactive. In that way, given an event A1 ⊂ Δ1 and an event A2 ⊂ Δ2, it is straightforward to prove that:

    π(A1 ∩ A2) = π(A1) · π(A2)    (7)

which can be read as "the possibility of event A1 and event A2 is the product of the individual possibilities when the events relate non-interactive variables", interpreting, as usual in the literature, set intersection as a linguistic conjunction.
Under the non-interactivity assumption, if the possibility is defined as the exponential of a negated cost index (3), the product (6) gets transformed into a sum of costs:

    π(δ1, δ2) = exp(−(J1(δ1) + J2(δ2)))    (8)

In the following, given individual cost indices J1(δ1), J2(δ2), etc. relating to different constraints, the above expression (8) will be the one used in most cases to define a possibility distribution in the product space. In this way, we are interpreting the possibilistic conjunction operator in [29] as an algebraic product of possibilities, i.e., stating an underlying non-interactivity assumption between different constraints. Note, however, that the non-interactivity assumption is not always intuitively justified. In the other extreme (total interactivity: variables δ1 and δ2 fully "correlated", for instance equal), we would have π(A1 ∩ A2) = min(π(A1), π(A2)), which would suggest the minimum possibility as the conjunction operator when two events affect exactly the same decision variables. In between these two extremes, other choices are also possible (T-norm operators [30]).
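The equivalence between the product of possibilities and the sum of costs admits a minimal numeric check; the weights and measured values below are invented for illustration:

```python
import math

# Two non-interactive constraints with illustrative linear costs:
def J1(d1):
    return 2.0 * abs(d1 - 5.0)   # flux 1, measured value 5, weight 2

def J2(d2):
    return 0.5 * abs(d2 - 3.0)   # flux 2, measured value 3, weight 0.5

# Joint possibility via the sum of costs (8)...
def pi_joint(d1, d2):
    return math.exp(-(J1(d1) + J2(d2)))

# ...equals the product of the individual possibilities (6):
lhs = pi_joint(5.5, 4.0)
rhs = math.exp(-J1(5.5)) * math.exp(-J2(4.0))
print(abs(lhs - rhs) < 1e-12)    # True
```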

Conditional possibility
The possibilistic analog to conditional probability is conditional possibility.
Consider an event B with nonzero possibility. A quotient definition of the conditional possibility of an event A given event B will be used in this paper:

    π(A|B) = π(A ∩ B) / π(B)    (9)

In this way, given a (multivariate) possibility distribution π(δ), the conditional possibility can be computed as:

    π(A|B) = max{π(δ) : δ ∈ A ∩ B} / max{π(δ) : δ ∈ B}    (10)

so, if the possibility distribution is actually the exponential of a negated cost index, we get:

    π(A|B) = exp(−(min{J(δ) : δ ∈ A ∩ B} − min{J(δ) : δ ∈ B}))    (11)

that is, the conditional possibility is computed by subtracting the cost associated to event B from the cost of any of its subsets.
In order to get a conditional possibility distribution of a variable δ, we take the event A to be an individual point δ*, getting:

    π(δ*|B) = π(δ*) / max{π(δ) : δ ∈ B},  δ* ∈ B    (12)

That is, the conditional distribution can be obtained by dividing the possibility distribution for all points in a set by their maximum possibility, i.e., normalising the possibility distribution on a restricted conditioning domain B to a maximum equal to one.
The conditional definitions allow for an analogy to Bayesian inference: if we assume that B is actually certain (whatever the a priori possibility π(B) was), then conditional possibility may be understood as an a posteriori possibility.
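The conditioning step can be sketched numerically; the cost function and the conditioning set below are illustrative assumptions:

```python
import math

# V-shaped cost around a hypothetical measured value of 5.
def J(v):
    return abs(v - 5.0)

def pi(v):
    return math.exp(-J(v))

# Conditioning event B: the flux is known to lie in [6, 8] (a grid here).
B = [6.0 + k / 100 for k in range(201)]
pi_B = max(pi(v) for v in B)          # pi(B), attained at v = 6

# Conditional distribution (12): renormalise pi on B to a maximum of 1.
def pi_cond(v):
    return pi(v) / pi_B

print(pi_cond(6.0))   # 1.0: the most possible value of B a posteriori
print(pi_cond(7.0))   # lower a posteriori possibility further from 6
```

Even though no point of B was fully possible a priori, the best point of B receives a posteriori possibility one, exactly as the Bayesian analogy suggests.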

Possibility versus probability
Both possibility theory and probability theory are frameworks for handling uncertainty in constraint satisfaction problems. Basically, a subjective interpretation would assign high possibility to events with high probability. Hence, in a first approximation, user-defined probabilities and possibilities should be related by an implicit monotonically increasing function. Possibility-necessity measures have also been linked to imprecise probabilities [31]. However, once aggregation takes place (via sums and integration in probability, via maximisations in possibility), although the subjective interpretation might be considered similar, there is no longer an implicit function relating probability and possibility. For further discussion on possibility, probability, and other uncertain reasoning frameworks, and their interrelations, the reader is referred to [31][32][33].
Ideally, probabilistic results would be preferable (to confidently assert that, e.g., in 95% of cases a flux estimate will lie in a particular interval). However, there are some drawbacks: (i) exact probabilistic inference under equality and inequality modelling constraints is computationally hard (multivariate integration over irregular sets), (ii) some of the a priori Bayesian probabilities are in practice rough user-given estimates, (iii) some of the assumptions (linearity of transformations, Gaussian distributions) may not hold in practice, and (iv) there may be some uncertainty in the model parameters or in the model probabilities. Thus, as the practical use of probability does not fully adhere to its theoretical assumptions, its results should be interpreted with some flexibility. As this work will discuss, the proposed possibilistic framework is much less demanding computationally (using optimization instead of integration, so large-scale cases become tractable) and gives similar results to the probabilistic approach in realistic cases.
The objective of the next sections is to set up a possibilistic framework for efficient computations in metabolic flux analysis.

Preliminaries for metabolic flux analysis
In biology, the metabolism of living cells is usually represented by a metabolic network encoding the elementary biochemical reactions taking place within the cell [3]. These metabolic networks can be translated into a matrix N, whose rows are the m internal metabolites and whose columns are the n reactions. Assuming that these metabolites are at steady state, mass balances can be formulated as follows [11]:

    N·v = 0    (13)

Hence, a (steady-state) flux vector v represents the metabolic state of the cells at a given time, without any information on the kinetics of the reactions; it shows the contribution of each reaction to the overall metabolic processes of substrate utilization and product formation. Notice that, as typically n is larger than m, the system (13) is underdetermined, i.e. there is a wide range of stoichiometrically feasible flux vectors. Assuming now that some fluxes in v have been measured (denoted v_m), while the rest remain unknown (denoted v_u), equation (13) can be rearranged as follows:

    N_u·v_u + N_m·v_m = 0    (14)

As measurements are imprecise in practice, such measurement imprecision can be incorporated as constraints:

    v_m = v̂_m + e_m    (15)

where e_m represents the measurement errors and v̂_m represents the actually measured flux values. In our approach, the measurement uncertainty is translated into an a priori possibility distribution. Other approaches consider different choices, as discussed below.
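As a minimal illustration of the steady-state balance (13), consider a hypothetical two-reaction pathway with a single internal metabolite; the network is invented for illustration:

```python
# Stoichiometric matrix for A -> B -> (out): one internal metabolite B,
# two reactions (v1 produces B, v2 consumes it).
N = [[1, -1]]

def is_steady_state(v, tol=1e-9):
    # Mass balance (13): N . v = 0 for every internal metabolite.
    return all(abs(sum(row[j] * v[j] for j in range(len(v)))) <= tol
               for row in N)

print(is_steady_state([2.0, 2.0]))   # True: production balances consumption
print(is_steady_state([2.0, 1.0]))   # False: metabolite B would accumulate
```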
At this point, traditional metabolic flux analysis (TMFA) can be defined as the estimation of the flux vector satisfying (14) and compatible with the measurements (15). In particular, TMFA can be formulated as a two-step procedure [34,35]: (I) analyze measurement consistency (and detect gross errors) using chi-square tests, and (II) solve a least squares problem to estimate the actual flux vector v:

    min (v_m − v̂_m)ᵀ F⁻¹ (v_m − v̂_m)  subject to  N_u·v_u + N_m·v_m = 0    (16)

where it is assumed that the errors e_m are normally distributed with zero mean and variance-covariance matrix F.
Since all the constraints are linear equalities, the analytic solution of this minimization problem can be obtained, resulting in the expressions to estimate v_u and v_m that are typically seen in the literature (e.g. [11,36]). However, with this formulation TMFA has some important limitations: (i) irreversibility constraints, or any other inequality constraints, cannot be considered, (ii) measurement errors are assumed to be normally distributed, (iii) it only provides a unique-valued flux estimation, and (iv) it needs a high number of measurable fluxes to be of use: system (14) has to be determined and redundant [37].
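The least-squares flavour of TMFA can be sketched on a deliberately tiny partitioned system; the one-metabolite network and measured value are assumptions for illustration:

```python
import numpy as np

# Hypothetical partitioned balance (14) for a single internal metabolite:
# v1 is measured, v2 is unknown, and the balance forces v2 = v1.
N_u = np.array([[-1.0]])   # column of the unknown flux v2
N_m = np.array([[1.0]])    # column of the measured flux v1
v_m = np.array([3.0])      # measured value of v1

# Least-squares solve of N_u v_u = -N_m v_m; with a determined and
# consistent system this reproduces the exact stoichiometric estimate.
v_u, residuals, rank, _ = np.linalg.lstsq(N_u, -N_m @ v_m, rcond=None)
print(v_u)   # the estimated unknown flux
```

In realistic networks N_u has many columns and the same call performs the (weighted) least-squares estimation; the analytic expressions cited above are the closed form of this solve.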
Several alternatives have been suggested to face these limitations (Table 1). Quadratic programming solves the least squares problem (16) while allowing the inclusion of irreversibility constraints (LS-MFA), but it inherits the rest of the drawbacks (and the chi-square tests may lose validity). The flux spectrum approach (FS-MFA) follows an interval approach to overcome the limitations mentioned before, but its estimations tend to be conservative because only lower and upper bounds are used to represent measurement uncertainty [38,39]. Monte Carlo methods have also been used in the context of 13C-MFA (e.g. [40][41][42]), but rarely in the absence of isotopic data. Moreover, they have sometimes been used incorrectly: Monte Carlo cannot be performed by just solving a quadratic programming problem for each simulated set of measurements, because this introduces a bias in the results. In any case, the major drawback of Monte Carlo is its high computational cost, which restricts its use for large metabolic networks, as an impractical number of samples is required to assess probabilities with reasonable accuracy.
In the following sections we introduce a possibilistic framework for MFA that brings several interesting features: (i) it overcomes all the mentioned limitations of TMFA, (ii) has the ability to detect, and handle, inconsistencies between measurements and model, and furthermore (iii) with high computational efficiency.

Possibilistic MFA: problem statement
In this section the possibilistic framework for MFA flux estimation is discussed. First, we define a set of time-invariant constraints derived from the metabolism being modelled. Then we incorporate the constraints imposed by the measured fluxes, representing their uncertainty by means of auxiliary slack decision variables and a cost index. In this way, the notion of "degree of possibility" is incorporated. Finally, it will be shown how (linear) optimisation problems can settle queries about the most possible fluxes, the possibility distributions, etc.

Model-based constraints
First, let us define a set of invariant constraints that every steady-state flux vector must satisfy; they do not depend on environmental conditions, do not change through evolution, etc. [3]. In this work these model-based constraints will be the stoichiometric relationships (13) and the irreversibility constraints, described by means of inequalities:

    N·v = 0,  v_i ≥ 0 for each irreversible reaction i    (17)

Other model-based constraints can be defined in an analogous way. For instance, elementary balances or degree-of-reduction balances might be incorporated into (17) as additional constraints [11]. It may also be possible to add constraints based on standard Gibbs free energy changes [43,44] or extracellular metabolite concentrations [45].
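Checking a candidate flux vector against such model-based constraints can be sketched as follows; the toy network and the set of irreversible reactions are assumptions for illustration:

```python
# Toy network: one internal metabolite B, produced by v1 and consumed by
# v2 and v3; all three reactions are taken as irreversible.
N = [[1, -1, -1]]
IRREVERSIBLE = [0, 1, 2]     # indices of irreversible reactions

def satisfies_model_constraints(v, tol=1e-9):
    # Stoichiometric balance (13): N . v = 0.
    balanced = all(abs(sum(row[j] * v[j] for j in range(len(v)))) <= tol
                   for row in N)
    # Irreversibility: the listed fluxes must be non-negative.
    irreversible_ok = all(v[i] >= -tol for i in IRREVERSIBLE)
    return balanced and irreversible_ok

print(satisfies_model_constraints([3.0, 1.0, 2.0]))   # True
print(satisfies_model_constraints([3.0, 4.0, -1.0]))  # balanced, yet False
```

The second vector satisfies the balance but runs a reaction backwards, which a purely stoichiometric TMFA formulation could not rule out.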

Incorporating the measurements
Estimating the non-measured fluxes would amount to solving the above equations (17), where some of the elements in vector v are measured (v_m). However, this simple approach is impractical in two very common situations:

• The measurements are very few, so the system has many (possibly infinitely many) solutions.

• Real measurements do not exactly satisfy the constraints due to measurement (and modelling) errors, so no solution will be found. (For instance, an infeasible set results from the constraint v1 = v2 together with measurements such as v1 = 5 and v2 = 5.1, as in the example above.)

Hence, the approach needs refinements to deal with a lack of measurements and to introduce the "possibility" of sensor errors and imperfect models. As shown below, such difficulties can be overcome by the introduction of slack variables and a cost index, enabling a grading of the different candidate flux vectors as more or less "possible".

Possibilistic description of measurements
Each experimental measurement can be described by a constraint as follows:

    v_m = v̂_m + e_m,  with e_m = μ1 − ε1,  ε1 ≥ 0, μ1 ≥ 0

where e_m is a decision variable that represents the intrinsic uncertainty of the experimental measurement, i.e. the discrepancy between the actual flux v_m and the measured value v̂_m. The non-negative slack variables ε1 and μ1 are penalised with a linear cost index:

    J = α·ε1 + β·μ1

with α ≥ 0 and β ≥ 0 (usually, if the sensor error is "symmetric", α and β should be defined to be equal).

Remark
As explained in a subsequent section, the weights α and β should be defined in accordance with each measurement's "a priori accuracy".
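For a fixed candidate flux value, the cheapest feasible slacks of a single measurement have a closed form, which can be sketched as follows (the weights are invented for illustration):

```python
# One measured flux with constraint v + eps - mu = v_hat, eps, mu >= 0,
# and cost J = alpha*eps + beta*mu.  For a fixed candidate v, the
# cheapest feasible slacks are eps = max(0, v_hat - v) and
# mu = max(0, v - v_hat).
def measurement_cost(v, v_hat, alpha, beta):
    eps = max(0.0, v_hat - v)
    mu = max(0.0, v - v_hat)
    return alpha * eps + beta * mu

# Asymmetric weights (illustrative): undershooting the measured value 10
# is penalised less than overshooting it.
print(measurement_cost(9.0, 10.0, alpha=0.5, beta=2.0))    # 0.5
print(measurement_cost(11.0, 10.0, alpha=0.5, beta=2.0))   # 2.0
```

Choosing α ≠ β is how a non-symmetric sensor error enters the framework.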

A global cost index
Consider now a set of measurements with associated slack variables δ1 = (ε1, μ1), ..., δm = (εm, μm) and individual cost indices J1(δ1), ..., Jm(δm). The corresponding constraints will be called measurement-based constraints:

    v_m = v̂_m + μ − ε,  ε ≥ 0, μ ≥ 0

In order to have a possibility distribution, under the non-interactivity assumption (6) the individual cost indices are added into a global one:

    J = J1(δ1) + ... + Jm(δm),  π(δ) = exp(−J)

where the decision variables δ are the actual fluxes v = (v_u, v_m) and the slack variables ε and μ.
The cost index J reflects the negative log-possibility of a particular combination of the decision variables, that is, of a particular flux vector v.

Remark
The PMFA problem will be cast as a linear programming (LP) problem; this is the reason why the non-negative decision variables ε1 and μ1 were introduced in substitution of e_m. However, it can be formulated using any other optimization framework; for instance, PMFA can easily be cast in a quadratic programming framework. Throughout the paper linear programming will be assumed due to its great computational performance (LP problems are solvable in polynomial time), which is a great advantage when dealing with large-scale metabolic networks. Nevertheless, an example using quadratic programming will be described in a subsequent section to point out the flexibility of PMFA.

Example 1 -Problem statement
Consider the toy metabolic network depicted at the top of Figure 1, together with the corresponding model-based and measurement-based constraints. Let us consider that the measurement of v2 is very accurate, that of v5 is moderately accurate, and those of v3 and v4 are quite unreliable. The weights α and β associated with the slack variables ε1 and μ1 can be defined in accordance with this information: taking α2 = β2 = 2, α5 = β5 = 0.5, and α3 = β3 = α4 = β4 = 0.15, the measurement possibility distributions are depicted at the bottom of Figure 1, for the supposed measurements v̂2 = 9, v̂5 = 31, v̂3 = 30, v̂4 = 10.

Point-wise flux estimations
The simplest outcome of a PMFA problem is a point-wise flux estimation: the minimum-cost (maximum-possibility) flux vector. This problem can be conveniently cast as the optimisation of a linear functional subject to linear constraints:

    v_mp = arg min J  subject to the model-based and measurement-based constraints    (24)

Figure 1: Toy network (top) and possibility distributions of the measured fluxes (bottom).

Low values of possibility imply that v_mp corresponds to fluxes v_m that deviate from the measurements v̂_m.
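The point-wise estimation can be sketched on a deliberately tiny problem; here a grid scan stands in for the linear program, and the network, measured values and weights are all invented for illustration:

```python
import math

# Toy model constraint: a serial pathway forces a single common flux v.
# Both measurements refer to that flux; entries are (v_hat, alpha, beta).
measurements = [
    (5.0, 2.0, 2.0),   # accurate sensor
    (7.0, 0.2, 0.2),   # unreliable sensor, in mild conflict with the first
]

# Global cost: sum of the cheapest slack costs of each measurement.
def J(v):
    return sum(a * max(0.0, vh - v) + b * max(0.0, v - vh)
               for vh, a, b in measurements)

# Maximum-possibility (minimum-cost) flux, found by a grid scan over
# [4, 8] instead of the LP formulation used in the paper.
grid = [4.0 + k / 1000 for k in range(4001)]
v_mp = min(grid, key=J)
print(v_mp, math.exp(-J(v_mp)))   # the accurate sensor dominates
```

Because the sensors disagree, even the best flux vector carries a residual cost, so its possibility is below one; this is exactly the consistency signal discussed next.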
Notice that, as π(v_mp) = π(v̂_m), this value can be interpreted as the "a priori" possibility of encountering the measurements v̂_m; if it is low, this implies that either (a) there is a gross error in the measurements, (b) there is an error in the model, or both. Therefore, the maximum possibility can be used to evaluate consistency and detect errors (inconsistencies between data and models). We will come back to this point in a subsequent section.

Example 1, continued
Consider again the model and the measurements given in Figure 1, and the maximum possibility flux vector resulting from (24). Notice that the possibility depends on the reliability associated with each measurement. For instance, if all the measurements were supposed to be more reliable, say α' = 10·α and β' = 10·β, the possibility distribution functions would be narrower. The interpretation of the new coefficients would, therefore, be that the same deviation from the fluxes of maximum possibility will now be considered a less possible fact.

Possibility distributions as flux estimations
Clearly, a point-wise flux estimation is limited in situations where multiple flux values might be reasonably possible. To face this situation, marginal and conditional possibility distributions (and intervals) can be obtained, again by solving linear optimisation problems. These provide a much more informative flux estimation than a point-wise one, such as the maximum possibility flux vector, or than the interval of minimum-maximum possible values in [38]. These flux estimations, which are illustrated in Figure 2, will be presented along this subsection.

Marginal possibility distributions
The marginal possibility of a flux v_i taking a given value f can be computed as:

    π(v_i = f) = max exp(−J)  subject to the model and measurement constraints and v_i = f    (25)

Hence, plotting the marginal possibility for a range of fixed given values f (taken within a pre-specified range) provides the marginal possibility distributions, which can be interpreted as the "distribution of the possible values for each flux in the network, given the measurements" (see Figure 2, left).
Notice that "cuts" of a possibility distribution, containing those values of v_i with a marginal possibility higher than γ, can be obtained by solving two LP problems:

    min v_i  and  max v_i  subject to the model and measurement constraints and J ≤ −ln(γ)    (26)

This provides a highly efficient procedure to compute a possibility distribution: compute the "cuts" for a set of possibility degrees between 0 and 1, say 0.1, 0.2, ..., 0.9 and 1.
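A γ-cut can be sketched without an LP solver by scanning a grid; the cost index and its weight are illustrative assumptions:

```python
import math

# Marginal cost of a single flux around a hypothetical measurement of 10,
# with weight 0.5 on both sides.
def J(v):
    return 0.5 * abs(v - 10.0)

# gamma-cut: flux values whose possibility exceeds gamma, i.e. whose cost
# satisfies J(v) <= -ln(gamma); the LP formulation of the paper would
# minimise/maximise v under that same constraint.
def gamma_cut(gamma, grid):
    bound = -math.log(gamma)
    inside = [v for v in grid if J(v) <= bound]
    return min(inside), max(inside)

grid = [k / 100 for k in range(500, 1501)]   # scan [5, 15] in 0.01 steps
print(gamma_cut(0.5, grid))                  # roughly 10 -/+ 2*ln(2)
```

Repeating the call for a handful of γ values reconstructs the whole distribution at the stated resolution, which is why the procedure scales well.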

Conditional possibility distributions
Using the definition given in the preliminaries (12), the conditional possibility distribution of a flux v_i can be computed as follows:

    π(v_i = f | v̂_m) = π(v_i = f) / π(v_mp)    (27)

Remember that the conditional distribution can be obtained by normalising the marginal possibility distribution to a maximum equal to one (see Figure 2).
Conditional possibility may be understood as an a posteriori possibility: π(v_i = f | v̂_m) is the possibility of v_i having the value f, if we assume that v̂_m is actually certain, i.e. that the model and the measurements are correct.

(A posteriori) Possibilistic intervals
In analogy to (26), the interval of flux values with a degree of conditional (a posteriori) possibility higher than γ can be obtained by solving two LP problems; the lower bound is:

    min v_i  subject to the model and measurement constraints and J − J_min ≤ −ln(γ)    (28)

where J_min is the minimum cost attained in (24). The upper bound is obtained by replacing the minimum by a maximum.
These possibilistic intervals have an interpretation similar to that of "confidence intervals" ("credible intervals") in Bayesian statistics, providing a concise flux estimation that can be represented by means of a box-plot chart (see Figure 2, right).
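The a posteriori interval can likewise be sketched by normalising the cost before cutting; all numbers below are illustrative:

```python
import math

# Global cost for a single common flux measured twice with different
# reliabilities (illustrative values, mildly conflicting sensors).
def J(v):
    return 2.0 * abs(v - 5.0) + 0.2 * abs(v - 7.0)

grid = [k / 1000 for k in range(4000, 8001)]   # scan [4, 8]
J_min = min(J(v) for v in grid)                # cost of the best flux

# A posteriori possibility: normalise so the best flux has possibility 1,
# then keep the values above gamma (conditional gamma-cut).
gamma = 0.8
inside = [v for v in grid if math.exp(-(J(v) - J_min)) >= gamma]
print(min(inside), max(inside))   # a narrow interval around v = 5
```

Because of the normalisation, the interval is well defined even when the measurements are mutually inconsistent and no flux vector reaches an a priori possibility of one.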

Example 1, continued
With the measurements in Figure 1, the resulting marginal possibility distributions for each flux are plotted in Figure 3A. They show that, for instance, the most possible value of v1 is 0.75 (possibility of 0.74), that v1 being 2.25 is quite possible, but that v1 greater than 10 is almost impossible (possibility lower than 0.05). The possibility distributions also reflect the reliability of the estimation of each flux: the estimation of v6 is less reliable than that of v1 or v2, since it has a wide range of highly possible values.
Notice too that the uncertainty in the measurements is often strikingly reduced through the flux estimation. For instance, the estimation of v4 (whose measurement was quite unreliable a priori) has been significantly improved once the model constraints and the other measurements have been incorporated. This reflects the already noted fact that the metabolic network structure greatly constrains the possible flux values for a given, typically small, set of measured fluxes. The plots of marginal possibility can also detect multiple flux vectors with maximum possibility (possibility distribution functions with a flat top). Figure 3B depicts the maximum possibility flux estimation and three possibilistic intervals by means of a box-plot chart. The intervals point out that, for instance, the highly possible a posteriori values of v5 are those in [30.75, 31] (possibility greater than 0.9), that those in [29.5, 32] are also quite possible (possibility greater than 0.5), while those outside [27, 34.5] are almost impossible, as their a posteriori possibility is lower than 0.1.

Possibilistic MFA: Refinements
Now that the basics of the PMFA framework have been introduced, some refinements will be discussed.

A better description of measurement's uncertainty
The formulation used above to describe the uncertainty of the experimental measurements may be too limited for some applications.

Remark
Notice that more slack variables can be added to achieve a more complex representation of the measured flux uncertainty. In fact, any convex representation of the log-possibility uncertainty can be approximated if a sufficient number of slack variables is incorporated (ε1, μ1, ε2, μ2, ...). Details are omitted for brevity.
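As a small sketch of this remark (our illustration, with invented names and numbers), a second, more heavily weighted slack pair yields a piecewise-linear log-possibility: the first a units of deviation cost alpha1 per unit and any remainder costs alpha2 per unit.

```python
# Two slack variables with different weights approximate a convex
# piecewise-linear log-possibility (possibility = exp(-penalty)).
# All parameter names here are illustrative, not from the paper.
import numpy as np
from scipy.optimize import linprog

def penalty(d, a=1.0, alpha1=0.5, alpha2=2.0):
    """Minimise alpha1*eps1 + alpha2*eps2 subject to eps1 + eps2 = d,
    0 <= eps1 <= a, eps2 >= 0: the deviation d is split across slacks."""
    res = linprog(c=[alpha1, alpha2],
                  A_eq=[[1.0, 1.0]], b_eq=[d],
                  bounds=[(0, a), (0, None)])
    return res.fun

def expected(d, a=1.0, alpha1=0.5, alpha2=2.0):
    """Closed-form piecewise-linear cost the LP should reproduce."""
    return alpha1 * min(d, a) + alpha2 * max(d - a, 0.0)
```

Since alpha1 < alpha2, the cheap slack is used first, so the LP value matches the closed form for any non-negative deviation d.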

Considering uncertainty in the model structure
Until now, the model-based constraints (23) - the stoichiometric relationships, reaction irreversibility, or any others - have been considered hard constraints; only those flux vectors v that exactly satisfy them can be feasible solutions. However, these constraints can be "softened" via suitable slack variables in order to represent uncertain knowledge. These additional slack variables can then be penalised in a cost index to generate a possibility distribution.
Consider, as an example, an equality constraint a = b. A relaxed ("softened") version of this constraint may be written as a = b + ε - υ, with ε and υ being non-negative slack variables penalised in an optimisation index J = f(ε, υ), typically with linear cost terms γε + τυ, in an analogous way to the discussion on uncertain measurements.
Notice also that a "softened" inequality constraint is nothing but an equality one with no penalisation on one of the slack variables above. For instance, a ≤ b + ε can be expressed as a = b + ε - μ with an unpenalised slack μ ≥ 0.
Such softened model constraints may be used to roughly incorporate imprecision in the model arising, for instance, from non-compliance with the pseudo-steady-state assumption, partial unbalance of some metabolites, or uncertain yields. Although these issues require further research, some preliminary ideas are outlined below.
Relaxing the pseudo-steady-state assumption
Equation (13) derives from the dynamic mass balance around the internal metabolites c, where it is assumed that dc/dt ≈ 0 and that the term μ·c is negligible (μ denotes the growth rate). Adding slack decision variables to (13), as explained above, makes it possible to relax this assumption.

Partial unbalance of metabolites
Sometimes, a metabolite cannot be assumed to be balanced because there are reactions producing or consuming it that have not been taken into account in the network; this is often the case for cofactors (ATP, NADP, etc.). This unknown consumption/production can be represented by means of slack variables (e.g. ε and υ) if some interval limits (e.g. εmax and υmax) are provided.

Uncertainty in stoichiometric yields
In some cases, the value of a yield coefficient is not exactly known. This is usually the case for the yield coefficients of lumped reactions used to represent biomass synthesis. Let vr be the flux through a reaction with an uncertain yield Yi,r for the metabolite i. The row corresponding to this metabolite in (13) can be rewritten as Σj≠r ni,j·vj + Yi,r·vr = 0 (32). If it is known that Yi,r ∈ [Ymin, Ymax] and vr is irreversible, equation (32) can be substituted by two constraints: Σj≠r ni,j·vj + Ymin·vr ≤ 0 and Σj≠r ni,j·vj + Ymax·vr ≥ 0 (33). However, if the flux vr is reversible, the inequalities in (33) cannot be set up, and the approach is no longer applicable. Integrating modal interval arithmetic [46] into the proposed framework might be a possible option, currently under research.
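Assuming the formulation above, the pair of inequalities in (33) can be generated mechanically. The helper below (illustrative names and numbers, not from the paper) builds the two rows for a solver that expects constraints of the form A_ub·v ≤ b_ub:

```python
# Encode an uncertain yield Y in [Ymin, Ymax] for an irreversible
# reaction r as two inequality rows on the metabolite balance n_i.
# Names and the toy balance below are illustrative.
import numpy as np

def yield_inequalities(n_i, r, Ymin, Ymax):
    """Return (A_ub, b_ub) rows encoding
       n_i.v + Ymin*v_r <= 0   and   -(n_i.v + Ymax*v_r) <= 0,
    jointly equivalent to n_i.v + Y*v_r = 0 for some Y in
    [Ymin, Ymax], provided v_r >= 0."""
    lo = n_i.copy(); lo[r] += Ymin       # lower-yield side
    hi = -n_i.copy(); hi[r] -= Ymax      # upper-yield side, negated
    return np.vstack([lo, hi]), np.zeros(2)

# Toy balance -v1 + Y*v2 = 0 with Y in [0.4, 0.6]:
n_i = np.array([-1.0, 0.0])
A_ub, b_ub = yield_inequalities(n_i, r=1, Ymin=0.4, Ymax=0.6)
v_ok = np.array([0.5, 1.0])    # implies Y = 0.5, inside the range
v_bad = np.array([0.9, 1.0])   # implies Y = 0.9, outside the range
```

A flux vector whose implied yield lies inside [Ymin, Ymax] satisfies both rows; one outside the range violates at least one.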

Illustrative examples of other features of PMFA
Other features of Possibilistic MFA (PMFA) will be briefly illustrated using the simple metabolic network in Figure 1.

Example 2: Error detection in measurements and model
As mentioned earlier, the value of the peak possibility in the resulting flux distribution provides an indication of the agreement between the model and the measurements. A low degree of possibility means that the model and the measurements are inconsistent; that is, there is no flux vector "near" the measured values satisfying the model-based constraints. Therefore, if the maximum possibility flux vector has a low possibility, one must assume that either (a) there is an important error in one or more measurements, (b) there is a relevant error in the model (e.g. a mass balance is not closed, or a metabolite is not at steady state), or both.
If a high inconsistency (low possibility) is detected, it is possible to investigate what is causing it, and thus correct the measurements or improve the model. Following a straightforward approach, we can remove one measured flux at a time and repeat the flux estimation to determine whether the removed measurement was causing the low possibility. If this is the case, we may consider the following alternatives: (a) regard the measurement as totally unreliable and accept the flux estimation inferred from the other measurements; (b) obtain the measurement again, or measure a different flux which could provide additional information; or (c) regard the measurement as reliable and, hence, conclude that there is an error in the model or its assumptions. In case (c), a similar approach can be used to investigate which particular model-based constraint is causing the low possibility - by "softening" the suspicious constraints one at a time.
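This leave-one-out diagnosis can be sketched as follows. The network, weights and the deliberately inconsistent measurement below are invented for the illustration; this is not the authors' code.

```python
# Leave-one-out consistency check: drop each measurement in turn and
# recompute the maximum possibility exp(-J_min). All numbers invented.
import numpy as np
from scipy.optimize import linprog

N = np.array([[1.0, -1.0, 0.0],           # chain v1 -> A -> v2 -> B -> v3
              [0.0, 1.0, -1.0]])
meas = [(0, 10.0), (1, 10.2), (2, 15.0)]  # v3 = 15 is deliberately bad
w = 0.5                                    # assumed slack weight

def max_possibility(meas_subset):
    nv, nm = 3, len(meas_subset)
    nx = nv + 2 * nm                       # fluxes plus (eps, mu) pairs
    nr = N.shape[0]
    A_eq = np.zeros((nr + nm, nx)); b_eq = np.zeros(nr + nm)
    A_eq[:nr, :nv] = N
    for k, (i, val) in enumerate(meas_subset):
        A_eq[nr + k, i] = 1.0              # v_i + eps_k - mu_k = val
        A_eq[nr + k, nv + 2 * k] = 1.0
        A_eq[nr + k, nv + 2 * k + 1] = -1.0
        b_eq[nr + k] = val
    cost = np.r_[np.zeros(nv), w * np.ones(2 * nm)]
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * nx)
    return np.exp(-res.fun)

pi_all = max_possibility(meas)             # low: data disagree with model
pi_drop = [max_possibility(meas[:k] + meas[k+1:]) for k in range(len(meas))]
suspect = int(np.argmax(pi_drop))          # dropping v3 restores consistency
```

Removing the faulty v3 measurement raises the possibility to nearly 1, while removing either good measurement leaves it low, so the culprit is identified.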
A simple example of the procedure just described is shown in Figure 4. Initially, a PMFA estimation using all the measured fluxes was performed, obtaining a maximum possibility flux vector.

Example 3: Scenarios lacking data
One of the features of PMFA is that it can be used even if there is a lack of measurements; i.e. even if (14) is underdetermined or not redundant [37].
To illustrate this, let us continue with our example, assuming now that only two fluxes are measured (an underdetermined case). The PMFA flux estimations - the marginal possibility distributions and the a posteriori intervals - are shown in Figure 5. Notice that crisp flux estimations will only be obtained if the irreversibility constraints - or other inequalities - are able to "bound" the underdeterminacy of (14). Interestingly, our experience shows that this is often the case for medium-size networks. Moreover, even if it is not, the possibilistic flux estimation will simply be less precise - large intervals and "wide" distributions - but still reliable; i.e., the estimation will always be as precise as the available measurements and knowledge allow.
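A minimal sketch of such a bounded underdetermined case, with an assumed toy network and a single exact measurement (our illustration, not the paper's example):

```python
# Underdetermined case: v1 -> A, A -> v2, A -> v3, all irreversible,
# and only v1 = 10 measured (exactly, for simplicity). Irreversibility
# still bounds the non-measured fluxes.
import numpy as np
from scipy.optimize import linprog

A_eq = np.array([[1.0, -1.0, -1.0],   # steady state of metabolite A
                 [1.0,  0.0,  0.0]])  # v1 fixed to its measured value
b_eq = np.array([0.0, 10.0])
c = np.array([0.0, 1.0, 0.0])         # objective: flux v2

lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 3).fun
hi = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 3).fun
# v2 is only known to lie in [0, 10]: a wide but reliable interval
```

The estimate of v2 is an interval, [0, 10], rather than a point: imprecise, yet as precise as the single measurement permits.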

Example 4: Using quadratic programming
To show how PMFA can be cast within other optimisation frameworks, an example using quadratic programming will be discussed. We represent each measured value as vm + em and define J = em'·W·em, where W is a diagonal matrix of weights.
Hence, a possibility distribution is obtained for each measurement; i.e., measurement uncertainty is represented as a quadratic (in the log-possibility) distribution.
Let us continue with our example using the measurements of Figure 1, but representing them with the quadratic formulation just introduced. The original possibility distributions of the single measurements (dashed lines) and the possibility distributions computed with PMFA (solid lines) are depicted in Figure 5. Notice that the results are similar to those obtained in the previous example (Figure 1), where the standard linear programming framework was used (even though the additional auxiliary variables ε2, μ2, etc. were not employed). This qualitative similarity between the results leads us to think that, in most cases, the linear programming setup is expressive enough and much more efficient than quadratic or other more complex optimisation frameworks.
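Under the simplifying assumption that the irreversibility constraints are dropped, the quadratic estimation reduces to an equality-constrained weighted least-squares problem, solvable through its KKT system. The numbers below are illustrative, not the paper's data.

```python
# Quadratic variant sketch: minimise (v - v_meas)' W (v - v_meas)
# subject to N v = 0 only (irreversibility ignored for brevity).
# Toy network and measurements are invented for the illustration.
import numpy as np

N = np.array([[1.0, -1.0, -1.0]])        # v1 - v2 - v3 = 0
v_meas = np.array([10.3, 6.1, 4.4])      # slightly inconsistent data
W = np.eye(3)                            # diagonal weight matrix

# KKT conditions:  2W v + N' l = 2W v_meas  and  N v = 0
K = np.block([[2 * W, N.T],
              [N, np.zeros((1, 1))]])
rhs = np.r_[2 * W @ v_meas, np.zeros(1)]
sol = np.linalg.solve(K, rhs)
v = sol[:3]                              # balanced flux estimate
```

The inconsistency (6.1 + 4.4 = 10.5 vs. 10.3) is spread evenly across the three fluxes, yielding a balanced estimate v ≈ (10.367, 6.033, 4.333); the log-possibility of this estimate would then be the residual J itself.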

Example 5: Comparison with other methods
This example compares PMFA with traditional MFA and some of its extensions. We continue using the network of Figure 1. To show that PMFA is able to represent the measurements in a flexible way, we assume that the errors in v2 and v3 are non-symmetric, and we add a band of uncertainty to account for systematic errors (Figure 6).
To apply TMFA and LS-MFA, the errors have to be approximated with a normal distribution (see preliminaries). For the estimations with FS-MFA, we represent the measurements with the 95% interval, or 2σ (see [38]). All the results are depicted in Figure 6. Notice that TMFA assigns a negative value to an irreversible flux, v1, since it does not take the irreversibility constraints into account - this was predictable, but it must be highlighted because TMFA is still widely used in the literature.
The results also point out that the possibilistic distributions (and intervals) are much more informative than the point-wise estimations of TMFA and LS-MFA, or the intervals of FS-MFA. Basically, point-wise estimations fail when several flux values are reasonably possible, whereas the flux spectrum intervals tend to be conservative. Furthermore, recall that TMFA and LS-MFA cannot be used in scenarios lacking data, such as example 3, where PMFA was shown to be valuable.
To complete this perspective, a comparison between PMFA and Monte Carlo approaches is discussed next. Notice that this is a simple case where Monte Carlo can be applied; nonetheless, its poorer performance is clear: the cost of computing the possibility distributions is polynomial in the number of fluxes (as shown above), whereas the cost of a Monte Carlo approach grows exponentially with the number of independent decision variables.

Larger-scale Example: C. glutamicum
In this section we apply Possibilistic MFA (PMFA) to a medium-size example. For illustrative purposes, we have chosen a very well-known metabolic model of Corynebacterium glutamicum.

Metabolic network model
The metabolic network of C. glutamicum has been taken from [47] and is a slight variation of the one originally constructed in [48,49]. The reactions considered in describing the biochemistry of the primary metabolism of C. glutamicum necessary to support lysine and biomass synthesis from glucose are given in additional file 1. A reaction of ATP dissipation is included in the network so that the ATP balance can be maintained without actually constraining the flux space. In contrast, the cofactors NADP, NAD and FAD are assumed to be balanced. The reaction for biomass formation is an approximation using as reactants those amino acids that explicitly appear in the network and the precursors of the other amino acids synthesized by C. glutamicum.

PMFA setting
The stoichiometric relationships, embedded in a 36 × 40 stoichiometric matrix, and the irreversibility of certain reactions, embedded in a 40 × 40 diagonal matrix, define our model-based constraints according to (17). Both matrices are given in additional file 1.

Experimental measurements
Figure 6. Example 5 - Comparison of PMFA and alternative methods. We use the model described in Figure 1, assuming that v2, v3, v4 and v5 are measured.

Experimental data of a batch fermentation of C. glutamicum cultured on minimal glucose medium have been taken from [48]. There, the growth rate and the fluxes (production/consumption rates) of the external metabolites - lactate, acetate, glucose, O2, CO2, NH3, lysine and trehalose - were experimentally measured. Since the accumulation of lactate and acetate was negligible, their flux is always zero in this case study. The measured fluxes vGLC (1), vO2 (34), vNH3 (35), vLY (37), vThre (38) and vCO2 (39), the growth rate vBio (36), and their standard deviations are given in Figure 8.

PMFA setting
Using the data in Figure 8, we have built a possibilistic representation of the single measurements by defining convenient auxiliary variables and weights. The criterion to choose the weights was: full possibility for values within ±σ/2 of the measured value, possibility 0.5 for those within ±σ, and possibility 0.1 for those within ±2σ (σ denotes the standard deviation; if errors are assumed to be normally distributed, these levels correspond to probabilistic confidence intervals of 38%, 68% and 95%, respectively). The resulting decision variables and weights define our measurement-based constraints according to (21). These possibilistic representations are depicted in Figure 8.
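The weighting criterion just described can be sketched as a piecewise-linear log-possibility. This is our own illustration: the linear interpolation between the stated levels, and the extrapolation beyond 2σ, are assumptions.

```python
# Piecewise-linear log-possibility J matching the stated criterion:
# J = 0 within +/- sigma/2, J = ln 2 at +/- sigma (possibility 0.5),
# J = ln 10 at +/- 2*sigma (possibility 0.1); linear in between and
# extrapolated with the last slope beyond 2*sigma (assumed).
import numpy as np

def possibility(value, measured, sigma):
    d = abs(value - measured)
    if d <= sigma / 2:
        J = 0.0
    elif d <= sigma:
        J = np.log(2) * (d - sigma / 2) / (sigma / 2)
    else:
        J = np.log(2) + (np.log(10) - np.log(2)) * (d - sigma) / sigma
    return float(np.exp(-J))
```

Each linear segment of J corresponds to one slack pair with a constant weight, so this distribution fits directly into the LP framework above.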

Possibilistic flux estimation of C. glutamicum
We used all the available measurements - vGLC (1), vO2 (34), vNH3 (35), vLY (37), vThre (38), vCO2 (39) and vBio (36) - to obtain the maximum possibility flux vector (results given in additional file 1). The flux vector had a degree of possibility of 0.38, which can be considered "low" given that a significant degree of uncertainty was already being taken into account (Table 1). We then obtained the marginal possibility distributions for each flux, whose inspection indicated that the low possibility was almost completely caused by a single measured flux, vNH3 (35). This suggests that the measurement was inaccurate, or that its standard deviation was underestimated. Interestingly, this flux was indeed the most uncertain one in the original dataset (its standard deviation was a huge 44 mM/h for a nominal value of 64.8 mM/h).
As a result of this analysis - which is a simple instance of the procedure described in a previous section - we decided to remove that measurement and repeat our calculations. As expected, this time we obtained a maximum possibility flux vector with a similar shape but higher possibility (0.88). The marginal possibility distributions are depicted in Figure 9A. We also performed an estimation in a scenario lacking data, using only three measured fluxes: vGLC (1), vCO2 (39) and biomass vBio (36). In this case, the obtained maximum possibility flux vector is fully possible. This flux vector and the flux intervals are depicted in Figure 10. Remarkably, even though only three fluxes were measured, there was a small range of flux vectors with an a posteriori possibility higher than 0.8. Numerical results are given in additional file 1.
Possibilistic flux estimation with uncertain model
As explained above, we can "soften" the model-based constraints to relax the pseudo-steady-state assumption. As an example, we assumed a degree of uncertainty around all the mass balances, introducing decision variables ε1 and υ1 and weights γ1 = τ1 = 2 (see Figure 8). Hence, flux vectors which imply small accumulations of some metabolites will be accepted, yet considered less possible.
It can also be argued that the metabolic network used herein, the one introduced by Vallino et al., relies on an unrealistic assumption: that the cofactors NADP, NAD and FAD are balanced [50,51]. To avoid this, we can remove these metabolites from the stoichiometric matrix or, alternatively, use the expressivity of the possibilistic framework to allow a certain degree of unbalance for these metabolites. As an example, herein we assumed that the cofactors may be unbalanced within some limits (say, 30 mM/h for NADP/NADPH and 15 mM/h for FAD/FADH and NAD/NADH). This "knowledge" can be easily incorporated into the model by defining convenient auxiliary variables and weights (as explained above).
At this point, PMFA was performed in three scenarios: (a) the model-based constraints are not relaxed (reference case); (b) the pseudo-steady-state assumption is relaxed and NADP/NADPH is allowed to be unbalanced; and (c) the pseudo-steady-state assumption is relaxed and the three cofactors - NADP/NADPH, FAD/FADH and NAD/NADH - are allowed to be unbalanced. The marginal possibility distributions obtained in each case are compared in Figure 11, where it can be observed how the model uncertainty is translated into the flux estimations: the uncertainty results in less precise flux estimations, since the model equations are less reliable.

Conclusion
In this paper we have discussed a possibilistic framework for the estimation of the metabolic fluxes exhibited by cells under given conditions.
Considering ordinary constraint-satisfaction problems, metabolic fluxes fulfilling a set of model-based constraints and compatible with some experimental measurements are "possible"; otherwise they are "impossible". In this paper, this idea is refined to cope with uncertain knowledge.

Figure 8. Experimentally measured fluxes during a batch fermentation of C. glutamicum. The second column contains the experimental measurements and their standard deviations, taken from [48]. The possibility distribution representing each single measurement is depicted in the third column, and the weights used are given in the last one.

Figure 9. Possibilistic flux estimation for C. glutamicum. The measured fluxes are vGLC (1), vO2 (34), vNH3 (35), vLY (37), vThre (38), vCO2 (39) and vBio (36).

Possibilistic MFA overcomes several limitations of traditional MFA and some of its extensions. It considers measurement uncertainty in a flexible way (e.g. non-symmetric errors or a band of uncertainty due to systematic error) as well as model imprecision, and it is reliable even if only a few fluxes are measurable (a common scenario). Possibilistic MFA also computes possibility distributions (and intervals), which are more informative than point-wise estimations when multiple flux values may be reasonably possible. These distributions are also better than the intervals provided by the flux spectrum or other methods giving upper and lower bounds for the fluxes. In addition, Possibilistic MFA is able to detect, and handle, inconsistencies between measurements and model. Finally, it must be remarked that Possibilistic MFA estimations have been cast as linear optimisation problems, for which widely known and efficient tools exist (a MATLAB script solving example 1 is given in additional file 2 to illustrate this point). This computational performance makes the methodology capable of dealing with large-scale or even genome-scale metabolic networks.

It must be noticed that there is a challenge when estimating the fluxes in large-scale networks, because there may be different flux vectors compatible with the few available measurements [52]. Interestingly, the proposed methodology is still of use in this situation: possibilistic MFA will detect all the flux vectors that are equally possible (or even similarly possible) and depict them by means of possibilistic distributions or intervals (e.g. example 3). Unfortunately, if there is a wide range of candidates, the estimation may sometimes be of little informative value (but at least we can be sure that it is reliable, because all the flux vectors compatible with the model and the measurements are captured). One strategy to face this difficulty consists of using a rational hypothesis to promote certain flux vectors among those that are equally possible. For instance, it can be assumed that cell behaviour has evolved to be optimal in some sense, so that the fluxes are optimally regulated depending on the given environmental conditions, and this principle can then be invoked to choose particular flux vectors [3,7,9]. There might still be alternate optima, but the approach will reduce the range of possible flux vectors. Notice that this optimality principle, or any other hypothesis, can be incorporated into the possibilistic framework as long as it is encoded in the form of a cost index (but sometimes this will not be the case or, in other cases, its optimisation will not be computationally simple). This point requires further work.

Figure 10. Possibilistic flux estimation for C. glutamicum lacking measurements. In this case the measured fluxes are only vGLC (1), vCO2 (39) and vBio (36). The maximum possibility flux estimation (circles and squares for measured and non-measured fluxes, respectively) and the flux intervals for conditional possibilities of 0.8 (box), 0.5 (thick line) and 0.1 (narrow line) are depicted. All fluxes in mM/h.
Further extensions may also address the adaptation of the ideas introduced herein to metabolic flux analysis with data from labelling experiments (13C-MFA) [12-14]. Extracellular dynamics could also be taken into account by incorporating measurements at different time instants [38]. Finally, we are currently developing software that implements the Possibilistic MFA methods and their future extensions, which will be freely available for academia.
In summary, this paper introduces a unifying framework for flux estimation and (possibilistic) evaluation of consistency that is flexible, usable in scenarios lacking data, highly informative, and computationally efficient. In our opinion, the combination of computational efficiency and flexibility of the assumptions is a distinctive advantage with respect to other approaches, which may rely on stronger assumptions (chi-squared distributions, interval-only descriptions, absence of irreversibility), be only data-based (and thus not incorporate, say, stoichiometric model balances), provide only point-wise estimates (of flux or consistency), or be computationally intensive (multi-variate integration in a general Bayesian estimation problem).

Authors' contributions
FLL, AS and JP designed the study, analyzed the results and conceptualized the manuscript. FLL performed the estimations of the case studies. All authors read and approved the final manuscript.

Figure 11. Possibilistic flux estimation for C. glutamicum with uncertainty in the model.