In this section, we introduce integer programming-based methods for ECI. Integer programming, in particular Integer Linear Programming (ILP), minimizes (or maximizes) a linear objective function under linear constraints, with all variables taking integer values. In the following, every variable is binary (i.e., 0 or 1) and represents a Boolean value. We apply ILP to ECI since ILP is widely used for solving NP-hard problems.
The ILP formulation for the network in Fig. 1, denoted ILP-ECI, is as follows:
$$\begin{array}{*{20}l} {\min} \left\{\sum_{i=1}^{m}FCi\right\} \end{array} $$
(1)
subject to
$$\begin{array}{*{20}l} &FC5=1 \end{array} $$
(2)
$$\begin{array}{*{20}l} &TR1+FC1+FC7+FE2\geq 1, \\ &FR1+TC1\geq 1, \\ &FR1+TC7\geq 1, \\ &FR1+TE2\geq 1, \end{array} $$
(3)
$$\begin{array}{*{20}l} &TR2+FC2+FC3+FE1\geq 1, \\ &FR2+TC2\geq 1,\\ &FR2+TC3\geq 1,\\ &FR2+TE1\geq 1, \end{array} $$
(4)
$$\begin{array}{*{20}l} &TR3+FC6+FE1\geq 1, \\ &FR3+TC6\geq 1,\\ &FR3+TE1\geq 1 \end{array} $$
(5)
$$\begin{array}{*{20}l} &TC2=TR1 \end{array} $$
(6)
$$\begin{array}{*{20}l} &TC4=TR2 \end{array} $$
(7)
$$\begin{array}{*{20}l} &TC5=TR2 \end{array} $$
(8)
$$\begin{array}{*{20}l} &FC7+TR2+TR3\geq 1,\\ &TC7+FR2\geq 1,\\ &TC7+FR3\geq 1 \end{array} $$
(9)
$$\begin{array}{*{20}l} &TC8=TR3 \end{array} $$
(10)
$$\begin{array}{*{20}l} &TC9=TR3 \end{array} $$
(11)
$$\begin{array}{*{20}l} &TC1=1, \qquad TC3=1, \qquad TC6=1 \end{array} $$
(12)
$$\begin{array}{*{20}l} &TX+FX=1 \quad \text{for any } X \end{array} $$
(13)
We denote the above formalization as ILP-ECI. Here all variables, including those for reaction, compound, and enzyme nodes, take either 1 or 0. Thus, \(v_{r_{i}}\), \(v_{c_{i}}\), and \(v_{e_{i}}\) are each either 0 or 1. Each node value is encoded by a pair of complementary variables. For a reaction i, \(v_{r_{i}}=1\) is represented by TRi=1 (equivalently FRi=0), which means that the reaction is active, and \(v_{r_{i}}=0\) is represented by FRi=1 (equivalently TRi=0), which means that the reaction is inactive. Thus, TRi+FRi=1 holds for every reaction i. Similarly, TCi and FCi represent the values of compound nodes, and TEi and FEi those of enzyme nodes. For instance, TC2=1 means that \(v_{c_{2}}=1\), and hence FC2=0 since TCi+FCi=1. Furthermore, each \(v_{r_{i}}\) is an “AND” node, so setting the value of an enzyme attached to it to 0 inactivates \(v_{r_{i}}\).
The objective function (1) means that the damage should be minimized. FCi=1 (or TCi=0) means that a compound \(v_{c_{i}}\) is not producible. Equation (2) means that the target compound \(v_{c_{5}}\) should be 0 after the 0-1 assignment converges. Equation (3) represents the Boolean relation \(v_{r_{1}}=v_{c_{1}}\land v_{c_{7}}\land v_{e_{2}}\). Since Boolean operators such as “∨” and “∧” cannot be used directly in an ILP formulation, we need to convert them into linear equations and/or inequalities. Here, “∧” denotes the “AND” function and “∨” denotes the “OR” function. Since \(x_{1}=x_{2}\land x_{3}\land \cdots \land x_{n}\)
can be represented by
$${}\left(x_{1}\!\vee\! \overline{x_{2}} \vee\! \cdots \!\vee \overline {x_{n}}\right)\land\left(\overline{x_{1}} \vee x_{2}\right)\land \left(\overline{x_{1}} \vee x_{3}\right)\land\! \cdots \land \left(\overline{x_{1}} \vee\! x_{n}\right)\,=\,1, $$
the constraint \(v_{r_{1}}=v_{c_{1}}\land v_{c_{7}}\land v_{e_{2}}\) can be converted into
$${}\left(v_{r_{1}}\!\vee \!\overline{v_{c_{1}}}\!\vee\! \overline{v_{c_{7}}}\vee \overline{v_{e_{2}}}\right)\land \left(\overline{v_{r_{1}}}\vee \!v_{c_{1}}\right)\land \left(\overline{v_{r_{1}}}\vee\! v_{c_{7}}\right)\land\left(\overline{v_{r_{1}}}\!\vee \!v_{e_{2}}\right)\,=\,1. $$
Thus Eq. (3) is obtained. Similarly, Eqs. (4)-(5) represent the constraints of \(v_{r_{2}}\) and \(v_{r_{3}}\), respectively.
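As a worked illustration of this last step (a sketch of the standard clause-to-inequality translation, assuming the T/F encoding of Eq. (13)): a clause holds exactly when at least one of its literals is 1, so a positive literal is replaced by its T-variable, a negated literal by its F-variable, and the disjunction by a sum that must be at least 1. Applied clause by clause to the conjunctive form above, this gives

$$\begin{array}{lcl} v_{r_{1}}\vee \overline{v_{c_{1}}}\vee \overline{v_{c_{7}}}\vee \overline{v_{e_{2}}} & \Longleftrightarrow & TR1+FC1+FC7+FE2\geq 1, \\ \overline{v_{r_{1}}}\vee v_{c_{1}} & \Longleftrightarrow & FR1+TC1\geq 1, \\ \overline{v_{r_{1}}}\vee v_{c_{7}} & \Longleftrightarrow & FR1+TC7\geq 1, \\ \overline{v_{r_{1}}}\vee v_{e_{2}} & \Longleftrightarrow & FR1+TE2\geq 1, \end{array} $$

which are the four inequalities of Eq. (3).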
For a compound node with indegree 1, that is, with only one incoming edge, the value of the predecessor is simply copied. For instance, since \(v_{c_{2}}\) has only one predecessor \(v_{r_{1}}\), \(v_{c_{2}}\) is copied from \(v_{r_{1}}\), as shown in Eq. (6). Similarly, \(v_{c_{4}}\) is copied from \(v_{r_{2}}\), as shown in Eq. (7).
However, for a compound node with indegree greater than 1, it is necessary to convert the “∨” relation into linear equations or inequalities. Equation (9) represents the Boolean relation \(v_{c_{7}}=v_{r_{2}}\vee v_{r_{3}}\). Since \(x_{1}=x_{2}\vee x_{3}\vee \cdots \vee x_{n}\)
can be represented by
$${}\left(\overline{x_{1}}\!\vee\! {x_{2}} \!\vee\! \cdots \!\vee\! { x_{n}}\right)\land\left({x_{1}} \vee \overline{x_{2}}\right)\land \left({x_{1}} \vee \overline{x_{3}}\right)\land \cdots \land \left(x_{1} \!\vee \overline{x_{n}}\right)\,=\,1, $$
\(v_{c_{7}}=v_{r_{2}}\vee v_{r_{3}}\) can be turned into \(\left (\overline {v_{c_{7}}}\vee v_{r_{2}}\vee v_{r_{3}}\right)\land (v_{c_{7}}\vee \overline {v_{r_{2}}})\land \left (v_{c_{7}}\vee \overline {v_{r_{3}}}\right)\). Thus, Eq. (9) can be obtained.
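The two conversions (for “∧” and for “∨”) can be checked exhaustively on small cases. The following is a minimal sketch, not part of the original method: the function names are hypothetical, and each F-variable is written as one minus the corresponding T-variable, as in Eq. (13). It compares the linear inequalities with the Boolean relation over all 0-1 assignments for up to four inputs.

```python
from itertools import product

def and_constraints_hold(x1, xs):
    """Linear form of x1 = x2 AND ... AND xn (F-variables written as 1 - T)."""
    long_clause = x1 + sum(1 - x for x in xs) >= 1        # T1 + F2 + ... + Fn >= 1
    short_clauses = all((1 - x1) + x >= 1 for x in xs)    # F1 + Tk >= 1 for each k
    return long_clause and short_clauses

def or_constraints_hold(x1, xs):
    """Linear form of x1 = x2 OR ... OR xn (F-variables written as 1 - T)."""
    long_clause = (1 - x1) + sum(xs) >= 1                 # F1 + T2 + ... + Tn >= 1
    short_clauses = all(x1 + (1 - x) >= 1 for x in xs)    # T1 + Fk >= 1 for each k
    return long_clause and short_clauses

def check(max_inputs=4):
    """Compare both linear forms with the Boolean relations on every assignment."""
    for k in range(1, max_inputs + 1):
        for bits in product((0, 1), repeat=k + 1):
            x1, xs = bits[0], bits[1:]
            # For 0-1 values, min is AND and max is OR.
            assert and_constraints_hold(x1, xs) == (x1 == min(xs))
            assert or_constraints_hold(x1, xs) == (x1 == max(xs))
    return True

print(check())
```

The check only confirms the equivalence for the sizes it enumerates; the general statement follows from the conjunctive forms given above.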
Equations (6)-(12) represent the constraints on \(v_{c_{1}},\cdots,v_{c_{9}}\). Equation (12) means that \(v_{c_{1}}\), \(v_{c_{3}}\), and \(v_{c_{6}}\) are 1 since their indegree is 0.
Equation (13) means that “T” and “F” correspond to “true (1)” and “false (0)”, respectively, and complement each other. X in Eq. (13) stands for any node (compound, reaction, or enzyme) in the metabolic network.
The above formalization solves ECI for this example and yields the correct solution {E2}. Moreover, the number of variables is O(m+n), where m and n are the numbers of compounds and reactions, respectively.
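For a network this small, the formulation can be checked by exhaustive enumeration instead of an ILP solver. The sketch below is illustrative only: the names `feasible` and `solve` are hypothetical, the network is read off constraints (2)-(12), and the objective is taken to be the number of compounds with FCi=1, i.e., the damage described above. It enumerates all 0-1 assignments of the T-variables, derives the F-variables from Eq. (13), and keeps a feasible assignment with the fewest non-producible compounds.

```python
from itertools import product

COMPOUNDS = [f"C{i}" for i in range(1, 10)]
NODES = COMPOUNDS + ["R1", "R2", "R3", "E1", "E2"]

def feasible(T):
    """Check constraints (2)-(12); T maps a node name to its T-variable."""
    F = {x: 1 - t for x, t in T.items()}  # Eq. (13): TX + FX = 1
    return all([
        F["C5"] == 1,                                                # (2)
        T["R1"] + F["C1"] + F["C7"] + F["E2"] >= 1,                  # (3)
        F["R1"] + T["C1"] >= 1, F["R1"] + T["C7"] >= 1, F["R1"] + T["E2"] >= 1,
        T["R2"] + F["C2"] + F["C3"] + F["E1"] >= 1,                  # (4)
        F["R2"] + T["C2"] >= 1, F["R2"] + T["C3"] >= 1, F["R2"] + T["E1"] >= 1,
        T["R3"] + F["C6"] + F["E1"] >= 1,                            # (5)
        F["R3"] + T["C6"] >= 1, F["R3"] + T["E1"] >= 1,
        T["C2"] == T["R1"], T["C4"] == T["R2"], T["C5"] == T["R2"],  # (6)-(8)
        F["C7"] + T["R2"] + T["R3"] >= 1,                            # (9)
        T["C7"] + F["R2"] >= 1, T["C7"] + F["R3"] >= 1,
        T["C8"] == T["R3"], T["C9"] == T["R3"],                      # (10)-(11)
        T["C1"] == 1, T["C3"] == 1, T["C6"] == 1,                    # (12)
    ])

def solve():
    """Return (damage, assignment) with the fewest compounds having FCi = 1."""
    best = None
    for bits in product((0, 1), repeat=len(NODES)):
        T = dict(zip(NODES, bits))
        if feasible(T):
            damage = sum(1 - T[c] for c in COMPOUNDS)
            if best is None or damage < best[0]:
                best = (damage, T)
    return best

damage, T = solve()
inhibited = sorted(e for e in ("E1", "E2") if T[e] == 0)
print(damage, inhibited)  # prints: 3 ['E2']
```

The objective value 3 counts the target C5 together with C2 and C4, which become non-producible when E2 is inhibited. Enumeration is exponential in the number of nodes, so it serves only as a sanity check on toy instances.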
Note that solving ILP is NP-complete; however, a problem that can be formalized as ILP is not always NP-complete. Thus, in the following, we prove that ECI is NP-complete.
Theorem
ECI is an NP-complete problem, even with the maximum indegree and outdegree bounded by 2.
Proof
Obviously, the problem is in NP, so it suffices to show that it is NP-hard. The proof is by a polynomial-time reduction from minimum edge cover (MEC), which is the problem of finding, for a given graph, the minimum number of edges such that each node is incident to at least one of the selected edges. For instance, \(E_{1}=\{e_{2},e_{3},e_{6}\}\) is one of the optimal solutions of MEC for the graph shown in Fig. 2. Let \(G=(V,E)\) be an instance of MEC, where \(V=\{v_{1},v_{2},\cdots,v_{n}\}\) and \(E=\{e_{1},e_{2},\cdots,e_{m}\}\). We then construct the corresponding ECI instance as follows. The metabolic network \(G=(V_{c}\cup V_{r}\cup V_{e},E)\) is given by
$$\begin{aligned} V_{c}=\left\{c_{1},c_{2},\cdots,c_{m}\right\}\cup\left\{c_{t}\right\}, \quad V_{r}=\left\{r_{1},r_{2},\cdots,r_{n}\right\}, \end{aligned} $$
$$\begin{aligned} E={}&\left\{\left\{c_{i},r_{j}\right\}\mid i=1,\cdots,m,\ j=1,\cdots,n,\ v_{j}\ \text{is an end point of}\ e_{i}\right\}\\ &\cup \left\{\left\{r_{j},c_{t}\right\}\mid j=1,\cdots,n\right\}. \end{aligned} $$
Note that the minimum damage is determined uniquely by the set of inhibited enzymes. Furthermore, our objective is to minimize the “damage” (side effects). Then \(V_{e}\) can be regarded as a set of virtual nodes and treated as an empty set in this case, and the ECI problem can be converted into the problem of identifying the minimum set of non-target compounds. Thus the graph for MEC shown in Fig. 2 is converted into the ECI instance shown in Fig. 3. It is clear that this conversion can be done in polynomial time. We now show that MEC for G has a solution of size z if and only if ECI has a minimum damage of size z. To guarantee that the target compound \(c_{t}\) is stopped (i.e., \(c_{t}=0\)), all \(r_{j}\) (\(j=1,\cdots,n\)) must take the value 0. If G has an edge cover of size z, then it follows that the minimum number of \(c_{i}\) taking 0 is z. On the other hand, if the minimum damage for ECI is z, then each \(r_{j}\) must be 0 so as to satisfy \(c_{t}=0\), and hence at least one predecessor of each \(r_{j}\) must be included in the minimum damage set. Since there is an edge between \(c_{i}\) and \(r_{j}\) if and only if \(v_{j}\) is incident to \(e_{i}\), the set of edges \(e_{i}\) corresponding to \(\{c_{i}\mid c_{i}\in \text{minimum damage set}\}\) is an edge cover of size z. □
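The correspondence used in the proof can be sanity-checked by brute force on a toy instance. The sketch below is illustrative only and is not a substitute for the argument above: the function names are hypothetical, and the five-node graph is an invented example, not the graph of Fig. 2. Each edge \(e_{i}\) becomes a compound \(c_{i}\), each node \(v_{j}\) becomes a reaction \(r_{j}\) (an “AND” node over its incident compounds), and \(c_{t}\) is an “OR” node over all reactions.

```python
from itertools import combinations

def build_eci(n, edges):
    """For each reaction r_j, list the compounds c_i with v_j an end point of e_i."""
    preds = [[] for _ in range(n)]
    for i, (a, b) in enumerate(edges):
        preds[a].append(i)
        preds[b].append(i)
    return preds

def min_damage(n, edges):
    """Fewest compounds c_i set to 0 such that the target c_t becomes 0."""
    preds = build_eci(n, edges)
    m = len(edges)
    for z in range(m + 1):
        for off in combinations(range(m), z):
            off_set = set(off)
            # r_j (AND node) is 0 iff some predecessor compound is 0;
            # c_t (OR node) is 0 iff every r_j is 0.
            if all(any(i in off_set for i in preds[j]) for j in range(n)):
                return z
    return None  # some reaction has no predecessor, so c_t cannot be stopped

def min_edge_cover(n, edges):
    """Fewest edges such that every node is incident to a selected edge."""
    for z in range(len(edges) + 1):
        for chosen in combinations(edges, z):
            if len({v for e in chosen for v in e}) == n:
                return z
    return None  # the graph has an isolated node

# Hypothetical graph with nodes 0..4 (an assumption for illustration).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
print(min_edge_cover(5, edges), min_damage(5, edges))  # prints: 3 3
```

Both searches are exponential in the number of edges and are meant only for small examples.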