Definition 1 (Hypergraphs and hyperarcs).
A directed hypergraph is a pair \mathcal{H}=\left(V,E\right), where V = {v_{1}, v_{2}, ..., v_{n}} is the set of vertices and E = {e_{1}, e_{2}, ..., e_{m}} is the set of hyperarcs. A hyperarc e_{i} is an ordered pair e_{i} = (X_{i}, Y_{i}) of disjoint subsets of V.
The set X_{i} is called the tail of e_{i} and the set Y_{i} is called the head, with reference to the graphical representation of arcs (directed edges) and hyperarcs as arrows.
We denote by X:E\to \mathcal{P}\left(V\right) the map that, given a hyperarc e_{i}, returns its tail X(e_{i}) ⊆ V. Analogously, we use Y:E\to \mathcal{P}\left(V\right) for the map that, given a hyperarc, returns its head.
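As an illustration of Definition 1, the following minimal sketch stores a hyperarc as an ordered pair of disjoint frozen sets. The names `make_hyperarc`, `vertex_set` and the toy network are hypothetical and are not taken from the original work.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# A hyperarc is stored as the ordered pair (tail, head) = (X(e), Y(e)).
Hyperarc = Tuple[FrozenSet[str], FrozenSet[str]]


def make_hyperarc(tail: Iterable[str], head: Iterable[str]) -> Hyperarc:
    """Build a hyperarc, enforcing that tail and head are disjoint."""
    x, y = frozenset(tail), frozenset(head)
    if x & y:
        raise ValueError(f"tail and head must be disjoint, shared: {sorted(x & y)}")
    return (x, y)


def vertex_set(arcs: Dict[str, Hyperarc]) -> Set[str]:
    """Vertices touched by a set of hyperarcs: union of all tails and heads."""
    vertices: Set[str] = set()
    for tail, head in arcs.values():
        vertices |= tail | head
    return vertices


# Hypothetical toy network (not one of the figures of the paper).
toy = {
    "R1": make_hyperarc({"a", "b"}, {"c"}),
    "R2": make_hyperarc({"c"}, {"d", "e"}),
}
```

Keying the hyperarcs by a reaction name is only a convenience of this sketch; the definition itself does not require names.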
Definition 2 (Reactions and networks).
In a metabolic network each vertex corresponds to a metabolite and each hyperarc corresponds to a reaction. A metabolic network of m metabolites and n reactions can be represented by an m × n stoichiometric matrix S, where the m rows correspond to the metabolites and the n columns to the reactions. A reaction j is represented by the column vector S_{j} = (s_{1j}, ..., s_{mj})^{T}, where s_{ij} is the stoichiometric coefficient of metabolite i in reaction j. Reactants have negative coefficients and products have positive coefficients.
Examples of a hypergraph, a network, and a stoichiometric matrix are given in Figure 2A. Note that the stoichiometric coefficients of the reactions are not taken into account in the hypergraph representation. Note also that the pair (X, Y) is ordered so as to distinguish between reactants and products. In this representation reactions are irreversible. Many biochemical reactions can be considered irreversible, since in organisms the homeostatic equilibrium is often strongly polarized. Nonetheless, a metabolic network may comprise reversible reactions, and we model these reactions by introducing both hyperarcs: (X, Y) and (Y, X).
Hyperpaths, a generalization to hypergraphs of the simple (cycle-free) paths going from one vertex to another in a graph, are used to represent pathways. A hyperpath connects a source set of vertices to a target set of vertices. Two examples of hyperpaths are given in Figures 2A and 2C. We remark that a set E of hyperarcs defines in a natural way a hypergraph ε = (∪_{e∈E}X(e) ∪ ∪_{e∈E}Y(e), E). By abuse of terminology we denote by E both the set of hyperarcs and the hypergraph formed by these hyperarcs together with all their heads and tails. The following definition of hyperpaths is borrowed from Nielsen et al. [22].
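The passage from a stoichiometric matrix to hyperarcs can be sketched as follows. This is only an illustration under stated assumptions: the function name `hyperarcs_from_matrix`, the `_rev` suffix for the reverse hyperarc, and the toy matrix are hypothetical. Coefficients are reduced to their signs, which reflects the remark that stoichiometry is not kept in the hypergraph.

```python
def hyperarcs_from_matrix(S, metabolites, reactions, reversible=()):
    """Map each column of S to a hyperarc (tail, head).

    Negative entries give the tail (reactants), positive entries the head
    (products). A reaction listed in `reversible` also yields the swapped
    hyperarc (Y, X), stored under the hypothetical name '<name>_rev'.
    """
    arcs = {}
    for j, name in enumerate(reactions):
        tail = frozenset(m for i, m in enumerate(metabolites) if S[i][j] < 0)
        head = frozenset(m for i, m in enumerate(metabolites) if S[i][j] > 0)
        arcs[name] = (tail, head)
        if name in reversible:
            arcs[name + "_rev"] = (head, tail)
    return arcs


# Hypothetical toy network: R1: a + 2 b -> c, R2: c <-> d.
metabolites = ["a", "b", "c", "d"]
reactions = ["R1", "R2"]
S = [
    [-1, 0],   # a
    [-2, 0],   # b (the coefficient 2 is lost in the hypergraph)
    [1, -1],   # c
    [0, 1],    # d
]
arcs = hyperarcs_from_matrix(S, metabolites, reactions, reversible={"R2"})
```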
Definition 3 (Hyperpaths).
A hyperpath P going from a source subset {S}_{\mathcal{H}} of V to a target subset T_{P} of P in a hypergraph \mathcal{H}=\left(V,E\right) is a hypergraph {\mathcal{H}}_{P}=\left({V}_{P},{E}_{P}\right) with V_{P} ⊆ V, E_{P} ⊆ E, such that there is an ordering F of the hyperarcs E_{P} with the following properties.

\forall k\in \left\{0,\dots ,\left|F\right|\right\},\quad X\left({F}_{k}\right)\subseteq {S}_{\mathcal{H}}\cup \left({\cup}_{j<k}Y\left({F}_{j}\right)\right)

{T}_{P}\subseteq {S}_{\mathcal{H}}\cup \left({\cup}_{e\in {E}_{P}}Y\left(e\right)\right)
From the point of view of metabolism, the first condition corresponds to the requirement that the reactants of each reaction participating in the hyperpath can be produced without the presence of the reaction itself. Hyperpaths defined in this manner represent a metabolic route from the source to the target. According to definition (3), the hypergraph of Figure 2B with source a is not a hyperpath, because neither reaction R_{1} nor R_{2} can take place until the other has started. The definition (3), though complex, is computationally tractable, meaning that the time required to determine whether a hypergraph is a hyperpath is proportional to the number of reactions. A polynomial time algorithm to determine whether a hypergraph is a hyperpath is given in [23]; the algorithm FindAll presented below can also be used for that purpose. In fact, as discussed below, if the set of reactions returned by FindAll \left({\mathcal{H}}_{P},{S}_{\mathcal{H}}\right) contains all the reactions in {\mathcal{H}}_{P}, then {\mathcal{H}}_{P} is a hyperpath.
The metabolic network described by a hypergraph has to be as comprehensive as possible, containing every known enzyme-catalyzed reaction occurring in organisms. We say that a hyperpath produces a set of target metabolites if it contains all those target elements. A set of target compounds is said to be reachable from a given source, or linked to the source, if there is at least one hyperpath producing the targets.
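The two conditions of definition (3) can be checked directly for a proposed ordering F. The sketch below is illustrative only: the function name `satisfies_definition` is hypothetical, and the two-reaction cycle is a made-up example in the spirit of the situation described for Figure 2B, not a transcription of that figure.

```python
def satisfies_definition(ordering, source, target):
    """Check the two conditions of the hyperpath definition for an ordering F.

    `ordering` is a list of (tail, head) pairs. Each tail must be contained
    in the source plus the heads of the hyperarcs that come earlier, and the
    target must be contained in the source plus all heads.
    """
    available = set(source)
    for tail, head in ordering:
        if not set(tail) <= available:
            return False          # a reactant is not yet available
        available |= set(head)
    return set(target) <= available


# Hypothetical cycle: each reaction needs the product of the other.
R1 = ({"a", "c"}, {"b"})
R2 = ({"b"}, {"c"})

blocked_12 = satisfies_definition([R1, R2], {"a"}, {"b"})
blocked_21 = satisfies_definition([R2, R1], {"a"}, {"b"})
unblocked = satisfies_definition([R1, R2], {"a", "c"}, {"b"})
```

With source {a} no ordering of the two reactions is admissible, whereas adding c to the source makes the ordering (R1, R2) valid.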
We are interested in the enumeration of pathways leading to the production of a desired compound. Hyperpaths do not generally give the best representation of pathways because hyperpaths can contain reactions not necessarily linking the target to the source. Minimal hyperpaths, cf. definition (4), are an appropriate representation of pathways since they contain only the essential reactions linking the source to the target.
In the definition given below, we say that a hyperpath \mathcal{P}=\left(V,E\right) is a subset of another hyperpath {\mathcal{P}}^{\prime}=\left({V}^{\prime},{E}^{\prime}\right) if V ⊆ V' and E ⊆ E'. For instance, the hyperpath of Figure 2C is a subset of the one of Figure 2A.
Definition 4 (Minimal Hyperpaths).
A hyperpath (V_{P}, E_{P}) with target T_{P} is said to be minimal if it has no proper subsets with the same target.
If any reaction is removed from a minimal hyperpath, the target is disconnected from the source. In this sense minimal hyperpaths cannot be reduced. From a metabolic engineering perspective the concept of minimal hyperpath is useful, as it defines the minimum set of reactions necessary to produce a target heterologous compound, and consequently the minimum set of enzymes that need to be inserted into the chassis organism where the compound is going to be produced.
In the following we define B\left(\mathcal{H},{S}_{\mathcal{H}}\right) to be the set of all molecules linked to the source for a given hypergraph \mathcal{H} and source set {S}_{\mathcal{H}}. The characterization of B\left(\mathcal{H},{S}_{\mathcal{H}}\right) is the first task to be solved before the enumeration. Once this set is known all the minimal hyperpaths can be enumerated for all the molecules associated to the vertices in B\left(\mathcal{H},{S}_{\mathcal{H}}\right).
Supplements
Supplements for a target are molecules whose presence in the source set increases the number of pathways for target production. Finding supplements is an important improvement when exploring ways to produce the target, since they make possible new pathways.
For each target of interest one can look for vertices that, once inserted in {S}_{\mathcal{H}}, open pathways that are otherwise impassable. In terms of metabolism we are looking for the "supplement" molecules, i.e., molecules that, once introduced in the source set, make more pathways available than before. We introduce below FindSupp, an algorithm that returns the supplements.
An analysis of pathways containing supplements makes it possible to identify pathways containing bootstrap molecules, i.e. metabolites that are needed in reactions producing compounds that are afterwards used for the production of the bootstrap molecules themselves. As a matter of fact, many pathways become viable once bootstrap molecules are available in the metabolic network (a concept introduced in [20]). Loosely speaking, bootstrap molecules are molecules that cannot be produced by the reactions belonging to a hyperpath unless they are already present in the source. Cottret et al. [21] stated that, given a source set, the existence of a pathway making use of bootstrap molecules can be tested in polynomial time. We provide later in this section an algorithm returning the bootstrap compounds; this algorithm can be used to determine whether a target molecule is connected to the source through a pathway making use of bootstraps.
Enumerating pathways using the steady state approach
In steady state, all possible pathways in a metabolic network are by definition stoichiometrically balanced, i.e. all metabolites produced from the source set must be consumed except for those that are target products. Extreme pathways and elementary modes are two methods that compute the set of independent nondecomposable pathways in the network that generate all feasible steady state solutions in the flux space. They do not directly enumerate all pathways linking a source set to a target set of compounds. However, one can construct stoichiometric matrices where input fluxes are added to the set of source compounds and outgoing fluxes are associated to the target and heterologous coproducts such that the extreme pathways and elementary modes enumerated from these matrices can be used to generate all pathways linking the source set to the target.
Given a hyperpath {\mathcal{H}}_{P}=\left({V}_{P},{E}_{P}\right) of a hypergraph \mathcal{H}=\left(V,E\right), we can define a set of flux vectors v_{P} for the hyperpath, where the components v_{Pj} corresponding to the reactions in the pathway, {e}_{j}\in {\mathcal{H}}_{P}, are activated:
{v}_{Pj}\left\{\begin{array}{cc}>0 & {e}_{j}\in {\mathcal{H}}_{P}\\ =0 & {e}_{j}\in \mathcal{H}\backslash {\mathcal{H}}_{P}\end{array}\right.
(1)
A hyperpath {\mathcal{H}}_{P}=\left({V}_{P},{E}_{P}\right) of a hypergraph \mathcal{H}=\left(V,E\right) with input source subset {S}_{\mathcal{H}} and target subset T_{P} is defined as stoichiometrically balanced if the rows corresponding to each metabolite v_{i} ∈ V, obtained from the product of the stoichiometric matrix S and the associated flux vector v_{P}, verify:
{\left(\mathbf{S}{\mathbf{v}}_{P}\right)}_{i}\left\{\begin{array}{cc}\le 0 & {v}_{i}\in {S}_{\mathcal{H}}\\ \ge 0 & {v}_{i}\in {T}_{P}\\ =0 & {v}_{i}\in V\backslash \left({S}_{\mathcal{H}}\cup {T}_{P}\right)\end{array}\right.
(2)
A way to introduce the constraint on input and output metabolites in the previous equation is to add to the stoichiometric matrix S additional columns corresponding to input reactions (reactions with no substrate that produce the source set {S}_{\mathcal{H}}) and output reactions (reactions with no product that consume the product metabolites in the hypergraph, T_{P}). These auxiliary reactions, even if not properly balanced in terms of the law of conservation of mass, are useful in order to define the problem completely and in a compact manner:
\mathbf{S}\mathbf{v}=\mathbf{0},\qquad \mathbf{v}\ge \mathbf{0},\qquad \mathbf{v}\in \mathcal{R}
(3)
Both extreme pathways and elementary modes make use of this formulation in order to compute the set of feasible solutions v. Since in our hypergraph definition all reactions are irreversible, the set of pathways solving Equation 3 computed by both extreme pathways and elementary modes are identical (cf. [5]). Furthermore, solutions in v must contain only positive or null fluxes.
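The augmentation of S with input and output columns, and the test of Equation 3 for a candidate flux vector, can be sketched in plain Python. The function names `augment` and `is_steady_state`, the column order (inputs first, then outputs) and the linear toy pathway are assumptions made for this illustration only.

```python
def augment(S, metabolites, source, target):
    """Append one input column per source metabolite (+1 in its row) and
    one output column per target metabolite (-1 in its row)."""
    aug = [list(row) for row in S]
    for met in metabolites:
        if met in source:
            for i, row in enumerate(aug):
                row.append(1 if metabolites[i] == met else 0)
    for met in metabolites:
        if met in target:
            for i, row in enumerate(aug):
                row.append(-1 if metabolites[i] == met else 0)
    return aug


def is_steady_state(S_aug, v, tol=1e-9):
    """Check S v = 0 and v >= 0 for the augmented matrix."""
    if any(x < 0 for x in v):
        return False
    return all(abs(sum(s * x for s, x in zip(row, v))) <= tol for row in S_aug)


# Hypothetical linear pathway: R1: a -> b, R2: b -> c.
metabolites = ["a", "b", "c"]
S = [
    [-1, 0],  # a
    [1, -1],  # b
    [0, 1],   # c
]
S_aug = augment(S, metabolites, source={"a"}, target={"c"})
# Flux order: R1, R2, input of a, output of c.
balanced = is_steady_state(S_aug, [1, 1, 1, 1])
unbalanced = is_steady_state(S_aug, [1, 0, 1, 1])  # b accumulates
```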
In order to determine all stoichiometrically balanced heterologous pathways {\mathcal{H}}_{P} that can be inserted into the chassis organism to produce a target set T_{P}, we need to restrict the computation of elementary modes to those that have nonzero fluxes for heterologous reactions. Efficient solutions to this problem have been considered in the divide-and-conquer approach [24, 25], which rearranges the constraints in echelon form so that the constraints containing only the desired reactions appear at the bottom. To define the constraints in our case, we consider first the hypergraph {\mathcal{R}}_{T} that is formed only by heterologous reactions. This hypergraph {\mathcal{R}}_{T} is the subset of the hypergraph \mathcal{R}\left(V,E\right) formed by those hyperarcs where at least one vertex does not belong to the source set {S}_{\mathcal{R}}, i.e. to the metabolites endogenous to the chassis organism. By considering {\mathcal{R}}_{T} instead of the full hypergraph \mathcal{R}, we look only for biosynthetic pathways involving heterologous reactions and therefore avoid cycles internal to the chassis organism. Therefore, to compute all feasible steady state heterologous pathways, we reformulate Equation 2 so that the stoichiometric matrix S is defined by the reactions in {\mathcal{R}}_{T}; the input is given by all substrates in the source set, {S}_{\mathcal{R}}\cap X\left({E}_{\mathcal{R}}\right); and the output by all products of the reactions in the hypergraph, Y\left({E}_{\mathcal{R}}\right).
Finally, from the computed set of solutions v of Equation 3, we are interested in enumerating all minimal hyperpaths from {S}_{\mathcal{R}} to the target set T on the hypergraph given by {\mathcal{R}}_{T}. According to Definition 4, minimal hyperpaths for some target T are given by those cycle-free solutions in v containing only reactions linking the source to the target. Since any feasible flux pattern v is a superposition of elementary modes with non-negative coefficients [26], the set of minimal hyperpaths for a given target T is a subset of the elementary modes producing T that are solutions of Equation 3. Namely, any feasible solution generated from the elementary modes will contain at least as many reactions as the elementary modes that form its basis. Therefore no additional minimal hyperpaths can be generated in this case by superposition of elementary modes.
Enumerating pathways using the topological approach
The algorithm FindAll finds B\left(\mathcal{H},{S}_{\mathcal{H}}\right), the set of metabolites that can be linked to the source {S}_{\mathcal{H}} by a hyperpath. FindAll, by explicitly constructing the ordered set F in definition (3), provides a proof of the tractability of the problem of checking whether a hypergraph is a hyperpath. Moreover, FindAll makes it possible to prune the original hypergraph, enabling a faster enumeration algorithm.
As presented below, the algorithm Minimize, when called on the output of FindAll, returns a minimal hyperpath linking a given target to the source, if one exists. These algorithms are the main components of FindPath, the algorithm enumerating the pathways, described next. We then present FindSupp, an algorithm to enumerate supplements.
Finding one minimal hyperpath
Let \mathcal{H}=\left(V,E\right) be the hypergraph representing the set of metabolic reactions, n = |V|, m = |E|, and let {S}_{\mathcal{H}} be the set of source vertices representing the source metabolites.
The algorithm FindAll returns all the reactions that can contribute to the production of any element in B\left(\mathcal{H},{S}_{\mathcal{H}}\right), i.e., the set of all compounds that can be connected to the source. FindAll is linear in the number of vertices, the number of hyperarcs and the total coordination; its complexity is O(n + m + Σ_{v∈V} (|X^{-1}(v)| + |Y^{-1}(v)|)), which is bounded by O(n + m + n · m). Therefore, this algorithm can be applied to the hypergraph \mathcal{H} of all reactions in order to obtain a pruned subhypergraph {\mathcal{H}}^{\prime}=\left({V}^{\prime},{E}^{\prime}\right), where the set of vertices is {V}^{\prime}:={S}_{\mathcal{H}}\cup B and the set of edges E' is the set of reactions returned by FindAll. In the context of metabolic engineering FindAll returns all the compounds that can be produced from a given set of source compounds and reactions. For instance, using FindAll with all known metabolic reactions one can determine all the compounds that can be produced from the metabolites of E. coli.
Algorithm FindAll (Given a hypergraph \mathcal{H} and a source {S}_{\mathcal{H}}, returns all the hyperarcs that are part of at least one hyperpath.)
input: \mathcal{H}, {S}_{\mathcal{H}}
1. for all r in \mathcal{H}
2.     x(r) ← X(r)
3. end for
4. V ← {S}_{\mathcal{H}}
5. D ← {S}_{\mathcal{H}}
6. F ← ∅
7. while V ≠ ∅
8.     let i be an element of V
9.     V ← V \ {i}
10.     D ← D ∪ {i}
11.     for all r ∈ \mathcal{H} such that i ∈ x(r):
12.         x(r) ← x(r) \ {i}
13.         if x(r) = ∅
14.             F ← {F, r}
15.             for all j in Y(r) and not in D:
16.                 V ← V ∪ {j}
17.             end for
18.         end if
19.     end for
20. end while
output: F
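The pseudocode of FindAll can be transcribed into Python roughly as follows. This is a sketch under stated assumptions rather than the authors' implementation: hyperarcs are assumed to be given as a dictionary mapping a reaction name to a (tail, head) pair, the index `consumers` and the helper `reachable` are hypothetical names, and the toy network is made up. As in the pseudocode, a hyperarc with an empty tail is never triggered.

```python
def find_all(arcs, source):
    """Return the hyperarcs that can fire from `source`, in firing order (F)."""
    missing = {name: set(tail) for name, (tail, _) in arcs.items()}  # x(r)
    consumers = {}                      # vertex -> hyperarcs having it in the tail
    for name, (tail, _) in arcs.items():
        for v in tail:
            consumers.setdefault(v, []).append(name)
    pending = sorted(source)            # the set V of vertices still to process
    seen = set(source)                  # vertices already placed in V or D
    fired = []                          # the ordered set F
    while pending:
        i = pending.pop()
        for name in consumers.get(i, []):
            missing[name].discard(i)
            if not missing[name]:       # the whole tail is now available
                fired.append(name)
                for j in sorted(arcs[name][1]):
                    if j not in seen:
                        seen.add(j)
                        pending.append(j)
    return fired


def reachable(arcs, source):
    """Source vertices plus the heads of all fired hyperarcs (the set D)."""
    out = set(source)
    for name in find_all(arcs, source):
        out |= set(arcs[name][1])
    return out


# Hypothetical toy network: R2 never fires because x is not available.
toy = {
    "R1": ({"a"}, {"b"}),
    "R2": ({"b", "x"}, {"c"}),
    "R3": ({"b"}, {"d"}),
}
order = find_all(toy, {"a"})
```

Each vertex is popped once and each (vertex, hyperarc) incidence is visited once, which matches the linear bound stated for FindAll.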
Let D be the union of the source set and of the heads of all the reactions output in F by FindAll. The correctness of the algorithm above is given by the following claims: every element in D is the target of some hyperpath or is part of the source, and every vertex in \mathcal{H} that can be reached from the source is in D. For the first claim we give a constructive proof using the output vector F; the second claim is proved by contradiction.

The proof of the fact that every element in D is reachable from the source is given constructively by the ordered set F returned by the algorithm. In fact, at each step F is a hyperpath. This claim can be proved by induction on the steps of the algorithm: each time a hyperarc r is appended to F (line 14), the tail X(r) is contained in D (a hyperpath by the inductive hypothesis), and if a vertex j is added to D (line 10) it means that it was previously added to V (line 16) and thus was in the head of some hyperarc already in the hyperpath.

The second claim can be proved by contradiction: if an element of B\left(\mathcal{H},{S}_{\mathcal{H}}\right) were not in D, there would still be a hyperpath linking it to the source. In such a hyperpath, consider the first reaction r (according to the order given by the definition) whose tail X(r) is contained in D and such that one of the elements of Y(r) is not in D. Consider, among the elements of X(r), the last one that was inserted into the set V; after its removal from x(r) this set becomes empty and the elements of Y(r) are inserted into V (line 16) and then into D (line 10), which is a contradiction.
From the above statements it follows that each vertex appearing in a hyperpath having {S}_{\mathcal{H}} as source is an element of D and every hyperarc of such a hyperpath is an element of F. Thus the algorithm FindAll provides an effective pruning of the original hypergraph: in \mathcal{H} there is no minimal hyperpath with source {S}_{\mathcal{H}} containing hyperarcs not in ∪_{k} F_{k} or vertices not in B\left(\mathcal{H},{S}_{\mathcal{H}}\right). The output hyperarcs are the only ones that can belong to a minimal hyperpath, and {\mathcal{H}}^{\prime}=\left({S}_{\mathcal{H}}\cup \left({\cup}_{k}Y\left({F}_{k}\right)\right),{\cup}_{k}{F}_{k}\right) is the pruned hypergraph containing only reachable vertices and hyperarcs.
Notice that the FindAll algorithm as presented above returns in polynomial time a hyperpath valid for each target vertex in B\left(\mathcal{H},{S}_{\mathcal{H}}\right). Even though there are more efficient algorithms for finding a hyperpath for one single target, for the sake of simplicity we avoid introducing an additional algorithm here and just remark that, since FindAll is polynomial, its use does not affect the complexity analysis of the algorithms that use its output.
Note that a minimal hyperpath going to a specific target can be easily extracted from the hyperpath output by FindAll. Namely, given a hyperpath \mathcal{P} connecting S to T, it is always possible to find a minimal hyperpath {\mathcal{P}}^{\prime} that is a subset of \mathcal{P}. Moreover, this can be done in polynomial time, for instance by using Minimize\left(\mathcal{P},\varnothing ,T,S\right), the algorithm introduced below.
Minimize\left(\mathcal{P},{R}_{f},T,S\right) is an algorithm that takes as input a hypergraph \mathcal{P}, a hyperpath R_{f} that is a subset of \mathcal{P}, a target set of vertices T and a source S. If \mathcal{P} does not link T to S the empty set is returned; otherwise a hyperpath contained in \mathcal{P}, containing R_{f} and linking T to S is returned. In particular, if R_{f} is empty, the output of Minimize is a minimal hyperpath going from S to T, provided one exists. Minimize returns a hyperpath obtained by removing all inessential hyperarcs except for the ones in R_{f}. In the context of metabolic engineering, pathways containing a small number of heterologous reactions are generally preferred, since they are easier to engineer in the host organism. Therefore, given two pathways that produce the same target, where one is a subset of the other, the one requiring the smaller number of heterologous reactions has to be selected. This is why it is relevant to obtain minimal hyperpaths from generic hyperpaths.
Algorithm Minimize (Given a hypergraph \mathcal{P} containing R_{f}, returns either a hyperpath from S to T containing R_{f} or an empty set if T is not linked to S by \mathcal{P}.)
input: \mathcal{P}, {R}_{f}, T, S
1. F ← FindAll(\mathcal{P}, S)
2. {\mathcal{P}}^{\prime} ← \mathcal{P}
3. if T ⊄ ∪_{k} Y(F_{k})
4.     {\mathcal{P}}^{\prime} ← ∅
5. else
6.     for all r in \mathcal{P}
7.         if r not in R_{f}
8.             F ← FindAll({\mathcal{P}}^{\prime} \ r, S)
9.             if T ⊂ ∪_{k} Y(F_{k})
10.                 {\mathcal{P}}^{\prime} ← {\mathcal{P}}^{\prime} \ r
11.             end if
12.         end if
13.     end for
14. end if
output: {\mathcal{P}}^{\prime}
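A direct transcription of Minimize is sketched below. It is an illustration under stated assumptions: hyperarcs are a dictionary name -> (tail, head), the bundled `find_all` is a simple quadratic variant of FindAll written for brevity, `produced` is a hypothetical helper, and the four-reaction network is made up. Hyperarcs are tried for removal in dictionary order, which is one arbitrary choice among many.

```python
def find_all(arcs, source):
    """Simple forward propagation: names of hyperarcs that can fire, in order."""
    missing = {n: set(t) for n, (t, _) in arcs.items()}
    pending, seen, fired = sorted(source), set(source), []
    while pending:
        i = pending.pop()
        for n, (tail, head) in arcs.items():
            if i in tail:
                missing[n].discard(i)
                if not missing[n]:
                    fired.append(n)
                    for j in sorted(head):
                        if j not in seen:
                            seen.add(j)
                            pending.append(j)
    return fired


def produced(arcs, source):
    """Union of the heads of the fired hyperarcs (the set tested at lines 3 and 9)."""
    out = set()
    for n in find_all(arcs, source):
        out |= set(arcs[n][1])
    return out


def minimize(arcs, required, target, source):
    """Greedy removal of hyperarcs not in `required` while the target stays produced."""
    target = set(target)
    if not target <= produced(arcs, source):
        return {}                                   # T is not linked to S
    current = dict(arcs)
    for name in list(arcs):
        if name in required:
            continue
        trial = {n: a for n, a in current.items() if n != name}
        if target <= produced(trial, source):
            current = trial                         # `name` was inessential
    return current


# Hypothetical network with two routes from a to t.
toy = {
    "R1": ({"a"}, {"b"}), "R2": ({"b"}, {"t"}),
    "R3": ({"a"}, {"c"}), "R4": ({"c"}, {"t"}),
}
free = minimize(toy, set(), {"t"}, {"a"})
forced = minimize(toy, {"R1"}, {"t"}, {"a"})
```

In this toy run the unconstrained call keeps R3 and R4, while the call with R_f = {R1} returns {R1, R3, R4}, a hyperpath that contains R1 but is not minimal, even though the minimal hyperpath {R1, R2} exists. This is the kind of outcome for which the constrained problem remains undecided by Minimize.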
The proof of correctness of this algorithm is simple and is based on the fact that {\mathcal{P}}^{\prime}\subseteq \mathcal{P} implies FindAll\left({\mathcal{P}}^{\prime},S\right)\subseteq FindAll\left(\mathcal{P},S\right). If a reaction r in \mathcal{P} has not been removed from {\mathcal{P}}^{\prime}, then any subset of {\mathcal{P}}^{\prime} not containing r does not produce the target. The worst-case time for this algorithm is O\left(m\cdot \left(n+m+{\sum}_{r\in \mathcal{P}}\left(\left|X\left(r\right)\right|+\left|Y\left(r\right)\right|\right)\right)\right). Since |X(r)| and |Y(r)| are bounded, the algorithm has quadratic complexity. Even though faster algorithms can be designed, we present this one here because of its conceptual simplicity. Note that, since Minimize(\mathcal{P}, ∅, T, S) returns a minimal hyperpath {\mathcal{P}}^{\prime} that is a subset of \mathcal{P}, if one exists, the minimality of a hyperpath \mathcal{P} can be tested by checking whether {\mathcal{P}}^{\prime}=\mathcal{P} or not.
A problem related to Minimize is the minimal constrained hyperpath problem: the problem of deciding whether there exists a minimal hyperpath from a given source to a given target that contains the hyperarcs in R_{f}. Notice that Minimize, although linked to this problem, does not solve it. In fact, if the output of Minimize is an empty set then there are no minimal hyperpaths satisfying the constraints; if the output is a minimal hyperpath then obviously a minimal constrained hyperpath exists; and finally, if the output is a non-minimal hyperpath then we do not know whether a minimal hyperpath satisfying the constraints exists or not.
Below we will discuss why we are interested in algorithms for the minimal constrained hyperpath problem, while in Appendix A.2 we show that in general the problem is NP-complete (reduction from 3-SAT).
Pathways Enumeration
The basic idea behind the enumeration algorithm presented below is to introduce an iterative refinement of partitions of the space of feasible solutions, i.e. of the space of hyperpaths, and to look for a solution in each part. In our implementation, a part is defined by two sets of reactions (R_{f} and R_{n}) of the original hypergraph. These sets are used during the enumeration process: R_{f} is the set of hyperarcs that must be present in the enumerated hyperpath and R_{n} is the set of hyperarcs that must not be part of the enumerated hyperpath. The problem of finding a solution in one of the parts is addressed at each iteration, and if a solution is found the part is divided into finer parts. This process is repeated until all the minimal hyperpaths have been found.
Enumeration by means of the minimal constrained hyperpath problem
First we describe the enumeration algorithm informally through the toy example hypergraph in Figures 2D and 3A, then we outline in Figure 3B a typical run for a more involved example: liquiritigenin (cf. Figure 1).
A minimal hyperpath {\mathcal{P}}_{1} connecting the node v_{8} to the source nodes v_{1}, v_{4} on the hypergraph \mathcal{H} of Figure 2D can be obtained by calling Minimize({\mathcal{P}}^{\prime}, ∅, {v_{8}}, {v_{1}, v_{4}}) on the hypergraph {\mathcal{P}}^{\prime} obtained by FindAll\left(\mathcal{H},\left\{{v}_{1},{v}_{4}\right\}\right). The hypergraph {\mathcal{P}}^{\prime} is represented in Figure 3A.
Once {\mathcal{P}}_{1}=\left\{{R}_{4},{R}_{3}\right\} has been obtained, the search space is divided into three parts:

the hypergraphs which do not contain R_{4},

the hypergraphs which do contain R_{4} and do not contain R_{3},

the hypergraphs which do contain R_{4} and R_{3}.
The first set does not contain hyperpaths connecting the target to the source: once the reaction R_{4} is removed, v_{8} is disconnected from the source. The second set contains a solution and thus has to be partitioned. The third set contains only one minimal pathway (the one consisting of hyperarcs R_{3}, R_{4} highlighted in Figure 3A).
The minimal hyperpath in the second set is found by running FindAll on \mathcal{H}\backslash {R}_{3} and then Minimize with constraint R_{f} = {R_{4}}. The minimal hyperpath so obtained is the one containing only the hyperarcs R_{4}, R_{7}. The set of the hypergraphs defined by (R_{f} = {R_{4}}, R_{n} = {R_{3}}) is partitioned into two parts defined by new sets of constraints. The way the partition is done is explained in detail in algorithm FindPath and gives two non-overlapping sets:

the hypergraphs which do contain R_{4} and contain neither R_{3} nor R_{7},

the hypergraphs which do contain R_{4} and R_{7} and do not contain R_{3}.
The first of these sets does not contain hyperpaths going to v_{8}: once R_{3} and R_{7} are removed, node v_{8} is disconnected from the source. The second one only contains the second and last minimal hyperpath: the one consisting of hyperarcs R_{7}, R_{4}. The algorithm here sketched is based on the fact that all minimal hyperpaths are found once the problem of finding a minimal hyperpath has been solved for each part of the partition.
Relaxed hyperpath minimization
The enumeration procedure is performed by the algorithm FindPath, which enumerates all the minimal pathways and does not output duplicate hyperpaths. Precisely, FindPath(\mathcal{H}, R_{f}, T, {S}_{\mathcal{H}}) returns a set of hyperpaths containing all the minimal hyperpaths in \mathcal{H} that connect T to {S}_{\mathcal{H}} and contain all the reactions in R_{f}. FindPath(\mathcal{H}, ∅, T, {S}_{\mathcal{H}}) returns all the minimal hyperpaths from the source {S}_{\mathcal{H}} to the target T in \mathcal{H}.
A schematic representation of how FindPath works for the enumeration of the pathways of liquiritigenin is given in Figure 3B, where we represent each call with a box connected by an arrow to its parent process. For each call of FindPath, either a new hyperpath is found and then FindPath is executed with new constraints, or there are no new hyperpaths and the branching process is stopped. The new constraint sets R_{f}', R_{n}' for a new call of FindPath are obtained by incrementing the sets R_{f}, R_{n} of the parent process. Given an order for the hyperarcs of the hyperpath \mathcal{P} found by the parent process, the set R_{n}' of the child process is constructed by adding to R_{n} one element r belonging to \mathcal{P}, and the set R_{f}' is constructed by adding to R_{f} all the hyperarcs coming before r. A child process is called for each element of \mathcal{P} not belonging to R_{f}.
In the context of metabolic engineering FindPath returns all the metabolic pathways for the production of the target compounds.
Algorithm FindPath (Enumerate all minimal hyperpaths from {S}_{\mathcal{H}} to the target set T with constraints R_{f} on the hypergraph given by \mathcal{H})
input: \mathcal{H}, R_{f}, T, {S}_{\mathcal{H}}
1. F ← FindAll(\mathcal{H}, {S}_{\mathcal{H}})
2. \mathcal{P} ← Minimize(∪_{k} F_{k} ∪ R_{f}, R_{f}, T, {S}_{\mathcal{H}})
3. En ← ∅
4. if \mathcal{P} ≠ ∅
5.     En ← \mathcal{P}
6.     F ← FindAll(\mathcal{P}, {S}_{\mathcal{H}})
7.     for all k in {|F|, ..., 1}
8.         r = F_{k}
9.         if r not in R_{f}:
10.             En ← {En, FindPath(\mathcal{H} \ r, R_{f}, T, {S}_{\mathcal{H}})}
11.             R_{f} ← R_{f} ∪ r
12.         end if
13.     end for
14. end if
output: En
The loop at line 7 of FindPath follows the order given by line 6, where the hyperarcs are ordered so that at least one of the head vertices of each hyperarc is a tail vertex of some previous reaction. Such an ordering is always possible since \mathcal{P} is a hyperpath. As said above and illustrated in Figure 3B, FindPath is an algorithm that recursively calls itself (see line 10). Note that even if R_{n} is not explicitly defined in FindPath, it is constructed implicitly when, at line 10, FindPath is called on the smaller hypergraph \mathcal{H}\backslash r.
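The recursion of FindPath can be sketched in Python as below. This is a hedged illustration, not the authors' code: hyperarcs are a dictionary name -> (tail, head), `find_all`, `produced` and `minimize` are compact stand-ins for FindAll and Minimize, and the network is made up. One deliberate deviation is labeled in the code: the sketch stops a branch as soon as a required hyperarc can no longer fire, whereas the pseudocode passes ∪_k F_k ∪ R_f to Minimize without such a test.

```python
def find_all(arcs, source):
    """Simple forward propagation: names of hyperarcs that can fire, in order."""
    missing = {n: set(t) for n, (t, _) in arcs.items()}
    pending, seen, fired = sorted(source), set(source), []
    while pending:
        i = pending.pop()
        for n, (tail, head) in arcs.items():
            if i in tail:
                missing[n].discard(i)
                if not missing[n]:
                    fired.append(n)
                    for j in sorted(head):
                        if j not in seen:
                            seen.add(j)
                            pending.append(j)
    return fired


def produced(arcs, source):
    out = set()
    for n in find_all(arcs, source):
        out |= set(arcs[n][1])
    return out


def minimize(arcs, required, target, source):
    """Greedy Minimize: drop hyperarcs not in `required` while T stays produced."""
    if not set(target) <= produced(arcs, source):
        return {}
    current = dict(arcs)
    for name in list(arcs):
        trial = {n: a for n, a in current.items() if n != name}
        if name not in required and set(target) <= produced(trial, source):
            current = trial
    return current


def find_path(arcs, required, target, source):
    """Enumerate hyperpaths by branching on the hyperarcs of each solution."""
    required = set(required)                      # local copy of R_f
    fired = set(find_all(arcs, source))
    if not required <= fired:
        return []        # guard added in this sketch, not in the pseudocode
    sub = {n: a for n, a in arcs.items() if n in fired}
    path = minimize(sub, required, target, source)
    if not path:
        return []
    found = [frozenset(path)]
    for r in reversed(find_all(path, source)):    # k = |F|, ..., 1
        if r not in required:
            rest = {n: a for n, a in arcs.items() if n != r}   # H \ r
            found += find_path(rest, required, target, source)
            required.add(r)                       # R_f <- R_f U r
    return found


# Hypothetical network with two routes from a to t.
toy = {
    "R1": ({"a"}, {"b"}), "R2": ({"b"}, {"t"}),
    "R3": ({"a"}, {"c"}), "R4": ({"c"}, {"t"}),
}
paths = find_path(toy, set(), {"t"}, {"a"})
```

On this toy network the call returns the two routes {R3, R4} and {R1, R2}, each exactly once; removing a hyperarc in the recursive call plays the role of adding it to R_n.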
Let us note that the output of the enumeration is not always composed of minimal hyperpaths only. This is due to the fact that the algorithm Minimize, while running in polynomial time, can return a non-minimal hyperpath. An algorithm always returning minimal hyperpaths cannot be expected to run in polynomial time, since the problem of finding a minimal hyperpath containing a set R_{f} of hyperarcs is NP-complete, as shown in Appendix A.2. However, in many practical instances (for example when hyperarcs have only one head node), the algorithm Minimize returns a minimal constrained hyperpath. As a matter of fact, in all the enumeration studies we have carried out so far, the output obtained by Minimize when called by the algorithm FindPath introduced above was always a minimal hyperpath. Nonetheless, a characterization of hard instances of the minimal constrained hyperpath problem is given in the Appendix.
Supplements Enumeration
Given a metabolic network and a set of source compounds (e.g. a set of compounds in the growth media, or a set of endogenous metabolites of a species), it may not be possible to link all the metabolites of the network to the source set. When a target compound is not accessible from the source set, one can consider the possibility of inserting some precursors into the metabolism of the organism so that the target becomes reachable. In practice such a task can be carried out through the enrichment of the growth media. More generally, the insertion of supplements can be used even when the target compound is reachable, in order to gain access to new pathways for the production of the target.
Let a supplement for a target T be any compound i\notin B\left(\mathcal{H},{S}_{\mathcal{H}}\right) that is involved as a reactant in at least one minimal hyperpath going from a superset of {S}_{\mathcal{H}}\cup i to the target T. Below we give the algorithm FindSupp, which finds the supplements for the production of a given target. In Figure 2D supplements are highlighted in red. The process of finding supplements is therefore useful as a general strategy in metabolic engineering in order to determine which metabolites might be part of the metabolism that produces a given target.
Algorithm FindSupp (Find supplements for the production of the compounds in T, from the source {S}_{\mathcal{H}} of hypergraph \mathcal{H})
input: \mathcal{H}, {S}_{\mathcal{H}}, T (list of compounds to produce)
1. WishList ← T
2. D ← ∅
3. while WishList \ D ≠ ∅
4.     let i be an element of WishList \ D
5.     D ← D ∪ {i}
6.     Aux ← ∅
7.     for all reactions r with i ∈ Y(r)
8.         Aux ← Aux ∪ (X(r) \ ({S}_{\mathcal{H}} ∪ D))
9.     end for
10.     WishList ← WishList ∪ Aux
11. end while
12. F ← FindAll(\mathcal{H}, {S}_{\mathcal{H}})
13. D ← D \ ({S}_{\mathcal{H}} ∪ (∪_{k} Y(F_{k})))
output: D
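FindSupp can be sketched as a backward closure from the target followed by the removal of everything already reachable. The code below is an illustration under stated assumptions: hyperarcs are a dictionary name -> (tail, head), `find_all` is a compact stand-in for FindAll, line 13 is read as D \ (S ∪ (∪_k Y(F_k))), and the network is made up.

```python
def find_all(arcs, source):
    """Simple forward propagation: names of hyperarcs that can fire, in order."""
    missing = {n: set(t) for n, (t, _) in arcs.items()}
    pending, seen, fired = sorted(source), set(source), []
    while pending:
        i = pending.pop()
        for n, (tail, head) in arcs.items():
            if i in tail:
                missing[n].discard(i)
                if not missing[n]:
                    fired.append(n)
                    for j in sorted(head):
                        if j not in seen:
                            seen.add(j)
                            pending.append(j)
    return fired


def find_supp(arcs, source, target):
    """Compounds upstream of the target that are not reachable from the source."""
    source = set(source)
    wish, done = set(target), set()              # WishList and D
    while wish - done:
        i = min(wish - done)                     # any element works; min is deterministic
        done.add(i)
        for tail, head in arcs.values():
            if i in head:                        # r produces i: its reactants are wanted
                wish |= set(tail) - source - done
    reach = set(source)
    for name in find_all(arcs, source):
        reach |= set(arcs[name][1])
    return done - reach                          # line 13 of the pseudocode


# Hypothetical network: t is already reachable through R3; supplying x
# (or its precursor y) would open the additional route R1.
toy = {
    "R1": ({"a", "x"}, {"t"}),
    "R2": ({"y"}, {"x"}),
    "R3": ({"a"}, {"t"}),
}
supplements = find_supp(toy, {"a"}, {"t"})
```

Note that, read this way, a target that is itself unreachable stays in the returned set, since it is placed in D at the first iteration and is not removed at line 13.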
Bootstraps
Bootstrap molecules relative to a source {S}_{\mathcal{H}} are the molecules that cannot be produced by a hyperpath with source {S}_{\mathcal{H}} unless they are already present in the media. Examples of bootstrap nodes are the nodes v_{2}, v_{3} of Figure 2D. In this section we give an algorithm that finds in polynomial time all the bootstraps of a hypergraph \mathcal{H} with source vertices {S}_{\mathcal{H}}. Bootstraps are a special kind of supplement: if at any step of a pathway a heterologous metabolite is needed as a substrate but has not yet been produced from the source set, then this metabolite is a bootstrap and must be added to the growth media for the reaction to take place and for the pathway to be valid. The algorithm given below makes it possible to detect bootstraps before enumerating pathways with the FindPath algorithm.
Algorithm FindBootstraps (Given a hypergraph \mathcal{H} and a source {S}_{\mathcal{H}}, returns the set B of bootstrap nodes)
input: \mathcal{H}, {S}_{\mathcal{H}}
1. F ← FindAll(\mathcal{H}, {S}_{\mathcal{H}})
2. D ← {S}_{\mathcal{H}} ∪ (∪_{k} Y(F_{k}))
3. {\mathcal{H}}^{\prime} ← ∅
4. for all r in \mathcal{H}
5.     r' ← (X(r) \ D, Y(r) \ D)
6.     if Y(r') ≠ ∅:
7.         {\mathcal{H}}^{\prime} ← {\mathcal{H}}^{\prime} ∪ r'
8.     end if
9. end for
10. while exists v in {\cup}_{r\in {\mathcal{H}}^{\prime}}Y\left(r\right)\backslash {\cup}_{r\in {\mathcal{H}}^{\prime}}X\left(r\right)
11.     for all r' containing v:
12.         r' ← (X(r'), Y(r') \ v)
13.         if (Y(r') = ∅) or (v ∈ X(r')):
14.             {\mathcal{H}}^{\prime} ← {\mathcal{H}}^{\prime} \ r'
15.         end if
16.     end for
17. end while
18. B = {\cup}_{{r}^{\prime}\in {\mathcal{H}}^{\prime}}Y\left({r}^{\prime}\right)
output: B
The FindBootstraps algorithm is linear in the number of vertices, the number of hyperarcs and the total coordination. Note that the set {\cup}_{{r}^{\prime}\in {\mathcal{H}}^{\prime}}Y\left({r}^{\prime}\right) obtained in line 18 is equal to {\cup}_{{r}^{\prime}\in {\mathcal{H}}^{\prime}}X\left({r}^{\prime}\right). In fact, the bootstrap vertices b\left(\mathcal{H},{S}_{\mathcal{H}}\right) constitute the largest set of vertices not reachable from the source and such that each element of the set belongs to the head of at least one reaction whose tail only contains vertices in B\left(\mathcal{H},{S}_{\mathcal{H}}\right) or in b\left(\mathcal{H},{S}_{\mathcal{H}}\right). Notice that the set of bootstrap vertices in a hypergraph \mathcal{H} only depends on the source vertices and does not depend on the target.
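A possible Python reading of FindBootstraps is sketched below. It rests on an interpretation that should be treated as an assumption: the loop at line 10 is taken to range over vertices that appear in heads but not in tails or in tails but not in heads (the symmetric difference), which is what makes the test v ∈ X(r') at line 13 meaningful and yields equal head and tail unions at the end, as stated in the remark above. The helper `find_all` is a compact stand-in for FindAll and the network is made up.

```python
def find_all(arcs, source):
    """Simple forward propagation: names of hyperarcs that can fire, in order."""
    missing = {n: set(t) for n, (t, _) in arcs.items()}
    pending, seen, fired = sorted(source), set(source), []
    while pending:
        i = pending.pop()
        for n, (tail, head) in arcs.items():
            if i in tail:
                missing[n].discard(i)
                if not missing[n]:
                    fired.append(n)
                    for j in sorted(head):
                        if j not in seen:
                            seen.add(j)
                            pending.append(j)
    return fired


def find_bootstraps(arcs, source):
    """Unreachable vertices that sustain each other through cycles of reactions."""
    reach = set(source)
    for name in find_all(arcs, source):
        reach |= set(arcs[name][1])
    # Restrict every hyperarc to unreachable vertices (lines 4 to 9).
    reduced = {}
    for name, (tail, head) in arcs.items():
        t, h = set(tail) - reach, set(head) - reach
        if h:
            reduced[name] = (t, h)
    while True:
        heads = set().union(*(h for _, h in reduced.values()))
        tails = set().union(*(t for t, _ in reduced.values()))
        dangling = heads ^ tails        # assumed reading of line 10
        if not dangling:
            return heads
        v = min(dangling)
        for name in list(reduced):
            t, h = reduced[name]
            if v in t or v in h:
                h.discard(v)            # line 12
                if not h or v in t:     # line 13
                    del reduced[name]   # line 14


# Hypothetical network: u and w need each other; q is a dead-end product and
# z is a reactant that nothing produces.
toy = {
    "R1": ({"a", "u"}, {"w"}),
    "R2": ({"w"}, {"u"}),
    "R3": ({"w"}, {"q"}),
    "R4": ({"z", "u"}, {"w"}),
}
bootstraps = find_bootstraps(toy, {"a"})
```

Each pass removes one dangling vertex from the reduced hypergraph, so the loop terminates after at most as many passes as there are vertices.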