Intervention in gene regulatory networks via greedy control policies based on long-run behavior

Background A salient purpose for studying gene regulatory networks is to derive intervention strategies, the goals being to identify potential drug targets and design gene-based therapeutic intervention. Optimal stochastic control based on the transition probability matrix of the underlying Markov chain has been studied extensively for probabilistic Boolean networks. Optimization is based on minimization of a cost function and a key goal of control is to reduce the steady-state probability mass of undesirable network states. Owing to computational complexity, it is difficult to apply optimal control for large networks. Results In this paper, we propose three new greedy stationary control policies by directly investigating the effects on the network long-run behavior. Similar to the recently proposed mean-first-passage-time (MFPT) control policy, these policies do not depend on minimization of a cost function and avoid the computational burden of dynamic programming. They can be used to design stationary control policies that avoid the need for a user-defined cost function because they are based directly on long-run network behavior; they can be used as an alternative to dynamic programming algorithms when the latter are computationally prohibitive; and they can be used to predict the best control gene with reduced computational complexity, even when one is employing dynamic programming to derive the final control policy. We compare the performance of these three greedy control policies and the MFPT policy using randomly generated probabilistic Boolean networks and give a preliminary example for intervening in a mammalian cell cycle network. Conclusion The newly proposed control policies have better performance in general than the MFPT policy and, as indicated by the results on the mammalian cell cycle network, they can potentially serve as future gene therapeutic intervention strategies.


Background
Boolean networks (BNs), and more generally, probabilistic Boolean networks (PBNs) [1,2], have been used for finding beneficial interventions in gene regulatory networks through the study of network dynamics. Upon describing these dynamics via Markov chains, optimal stochastic control policies can be determined via dynamic programming [3][4][5] to change the long-run dynamics, which are characterized by the steady-state distribution (SSD) of the network (Markov chain), the purpose being to reduce the risk of entering aberrant states and thereby alter the extant cell behavior. Three problems arise with this approach. First, the dynamic programming algorithm used to find optimal policies has complexity which increases exponentially with the number of the genes in the network. Approximate, or model-free, control policies have been proposed to alleviate this computational burden [6,7]. Second, the classical infinite-horizon approach to control requires a cost function and this requires subjective input. Third, and most importantly relative to the algorithms proposed in the current paper, optimization is with respect to the cost function and is only secondarily related to the steady-state distribution. Here, our purpose is to reduce the mass of the steady-state distribution corresponding to undesirable states and increase the mass corresponding to desirable states, and to do this directly without the mediating factor of a cost function.
In [7], a stationary mean-first-passage-time (MFPT) control policy is proposed that circumvents the need for a cost function and works directly with the transition probability of the Markov chain associated with the network. Biologically, it can be motivated by the following example. In a stable cancer cell line, the cells will keep proliferating without intervention. Assume that the goal of the intervention is to push the cell into programmed cell death (apoptosis) by intervening two candidate genes: p53 and telomerase. The p53 gene is the most well-known tumor suppressor gene [8][9][10]. The telomerase gene encodes telomerase, which maintains the integrity of the ends of chromosomes (telomeres) in germ cells and progenitor cells; therefore, it is responsible for replenishing cells during the normal cell turnover (homeostasis). In somatic cells, the telomerase gene is turned off, resulting in telomere shortening each time the cell divides -a key reason for the limited life-span of normal cells [11]. In the majority of tumor cells, telomerase is activated, which is believed to contribute to the prolonged life-span of the tumor cells [12]. This worsens prognosis for cancer patients [13,14]. Extensive experimental results indicate that when p53 is activated in the cells, for example in response to radiation, the cells undergo rapid growth inhibition and apoptosis in as short as a few hours [15]. In contrast, inhibition of the telomerase gene also leads to cell growth inhibition, differentiation, and cell death, but only after cells go through a number of cell divisions (allowing telomere shortening). Cell death takes a longer time through this latter process than through p53 activation. The use of mean first passage times for finding the best control gene is intuitive because the activation of p53 can lead more quickly (or with higher probability) to apoptosis than the inactivation of telemerase. Based on this kind of observation, the MFPT control policy employs two heuristics: (1) it is preferable to reach desirable states (apoptosis in the example) as early as possible; (2) it is preferable to leave undesirable states (cell proliferation) as early as possible.
Motivated by the success of the MFPT algorithm, in this paper, we propose three stationary control policies under the assumption that we have the transition probability matrix and the state transition diagram for the Markov chain of the network. Given that the intervention objective is to shift the steady-state distribution to desirable states, we propose to directly use the long-run behavior as our criterion for control instead of using the mean first passage time, which is an indirect measure. For the first control policy, we directly investigate the attractor states, which have been conjectured to correspond to phenotypes of the modeled cell. We replace the mean-first-passage-time criterion by the distance to (un)desirable attractor states of the underlying Markov chain, which can be computed efficiently. Since the shift of steady-state distribution can be computed efficiently using the analytic formula derived in [16][17][18], a second new control policy, having similar time complexity as the original MFPT control policy, is proposed based on the shift of steady-state distribution. A third policy also uses the steady-state distribution as the criterion, but gives up some computational efficiency in order to increase the certainty that applying the derived control policy will lead to the reduction of the total stationary mass for undesirable states. Because these new policies directly utilize the long-run characteristics of the network, we expect them to perform better than the MFPT policy with respect to shifting the steady-state distribution. A simulation study supports this expectation. In addition, a preliminary example of applying these control policies on a mammalian cell cycle network has shown that we can successfully identify gene E2F as the best potential control target, which has been conjectured in [19] through different mathematical modeling.

Probabilistic Boolean networks
We focus on intervention in binary PBNs in this paper but these results directly extend to more general PBNs having any discrete range of values since the underlying models are always finite Markov chains. Following the standard definitions of the Boolean network model [1,2], PBNs are described by truth tables determined by Boolean regulatory rules and the related parameters, including various probabilities. In a binary Boolean network of n genes, the state of each gene x i ∈ {0, 1} at time t + 1 is determined by the values of a set V i of predictor genes at time t via a Boolean "predictor" function , where .., f m is randomly selected according to the probability distribution c = {c 1 ,..., c m }, where it is possible for the current function to be chosen. There are two types of PBNs with different interpretations regarding q. If q < 1, the PBN is context-sensitive and q is usually assumed to be small. If q = 1, as in the original formulation of PBNs [2], the PBN is said to be instantaneously random. All of these various PBNs inherit the attractor cycles of their constituent BNs governed by predictor functions. Introduction of random perturbation makes the corresponding Markov chain of a PBN irreducible. Hence, it possesses a steady-state distribution π describing the long-run behavior. With sufficiently small p, π will reflect the attractor structure. For developing therapeutic intervention, we are especially interested in the proportion of time the network occupies an attractor in its steady state.
The dynamics of PBNs can be analyzed via their associated homogeneous irreducible finite Markov chains. For instantaneously random PBNs, the states of the associated Markov chain are the states (GAPs) of the network; for a context-sensitive PBN, the chain states are (context, GAP) pairs. We can derive the transition probability matrix P from the truth tables and the involved probabilistic parameters, and from there derive the steady-state distribution π. The computation of the transition probabilities between states in PBNs has been discussed in several papers [5,20,21]. We re-iterate them in the following theorem. The proof of the theorem can be referred to in Additional file 1. respectively. The transition probability from (s, y) to (r, x) for a context-sensitive PBN is given by where r, s denote the rth and sth BNp, which are the BNps at time t + 1 and t.
For a given transition matrix P for any type of PBN, we have where π is the corresponding steady-state distribution for P and T denotes transpose.
In this paper, we focus on BNps and instantaneously random PBNs. Subsequently we will comment on extension to context-sensitive PBNs. For the moment, we note that, for the purposes of intervention, the reduction of a context-sensitive PBN to an instantaneously random PBN has been proposed in [5]. This reduction results in a large computational savings when deriving control strategies.
Moreover, from the perspective of gene-expression inference of a context-sensitive PBN, the variables r and s in P s, y (r, x) are hidden variables because we only observe the transition of gene-expression states. Hence, it is difficult to accurately infer the transient structure of context-sensitive PBNs without a large amount of data [22]. In [21], the effect of the reduction from context-sensitive PBNs to instantaneously random PBNs in [5] is investigated and it is shown that, while there is some loss of control performance using the reduced model, the loss depending on the structure of the PBN, generally there can still be significant therapeutic benefits for these control strategies in situations where it is impractical to utilize the full model. Hence, our focus on the control policies on instantaneously random PBNs still leads to practical benefits to shift steady-state distributions beneficially.

Stochastic optimal intervention
The problem of optimal intervention for PBNs is formulated as an optimal stochastic control problem. Assuming that we can only control a single gene g in the network as in previous applications [5,7], the policy is of the form u g (t) ∈ = {0, 1}. If the control at time step t is on, u g (t) = 1, then the expression state for g is flipped; if u g (t) = 0, then the state of the control gene g remains unchanged.
We consider the intervention as perturbing the transition probability of the original underlying Markov chain. Absent control, we have the transition probability P y (x) = P (X t+1 = x|X t = y); with control, we have The previous algorithms [3][4][5] assign a cost function for each intervention in the system and propose to solve the corresponding optimization problem in both finite and infinite horizon frameworks. In general, the cost depends on the state y at time t, the successor state x at time t + 1, and the control input u. The expected immediate cost is defined for state y, when control u is selected, by The finite-horizon control problem deals with control of the underlying Markov chain over a finite horizon and does not change its steady-state distribution. The infinite-horizon control problem finds the optimal stationary control policy that is independent of time with respect to the expected total discounted cost and does affect the steadystate distribution. The discounting factor, α ∈ (0, 1), ensures the convergence of the expected total cost over the long-run [23]. In the case of cancer therapy, the discounting factor emphasizes that obtaining treatment at an earlier stage is favored over later stages [7]. The expected total discounted cost formulation is given by In this stochastic control problem, we seek an intervention strategy among all the admissible intervention strategies Γ g that minimizes the above objective function for each state X 0 = y in the network, i.e., Based on the results given in [23], it has been shown in [5] that an optimal intervention strategy exists for the discounted optimal stochastic control problem and the optimal cost function J* satisfies J* is the unique solution of this equation within the class of bounded functions. Equation 8 is known as the Bellman optimality equation. An optimal control policy is a stationary policy that attains the minimum in the right-hand side of the Bellman optimality equation for all the states in the network.
From a strictly long-run perspective, absent any assigned costs, the preceding solution can be considered as an approximate way to find a control that most beneficially shifts the steady-state distribution. The problem here is that the algorithm requires appropriate settings for the cost function , which are difficult to obtain. Moreover, the existing iterative algorithms to find an optimal control policy in the above framework still have high computational complexity O(2 3n ) for each iteration [7].

Mean-first-passage-time (MFPT) control policy
In [7], a greedy stationary control policy using mean first passage times of the underlying Markov chain is proposed. When considering therapeutic interventions, the state space can be partitioned into the set D of desirable states and the set U of undesirable states according to the expression values of a given set of genes. Based on the intuition that an effective intervention strategy should reduce the likelihood of visiting undesirable states by increasing the time to reach undesirable states or decreasing the time to desirable states, a greedy mean-first-passage-time control policy can be derived.
To describe the MFPT control policy, without loss of generality, we assume that gene x 1 decides x = {x 1 x 2 ...x n } to be desirable when x 1 = 1 and undesirable when x 1 = 0. We also assume there is a set of control genes which are different from x 1 . For simplicity, and as is often done, we assume there is a single control gene denoted as g. The intuition behind the MFPT algorithm is that when a desirable x reaches U on average faster than x c , the state with the control gene g flipped from x, it is reasonable to apply control to flip g and start the next network transition from x c . The transition matrix of the original PBN can be written as From general Markov chain theory [24], we can compute the mean first passage times K U and K D by solving the following system of linear equations: where e denotes column vectors of 1's with the appropriate length; the vectors K U and K D contain the MFPTs from each state in D to U, and from each state in U to D, respectively.
To design the MFPT control policy, we check for each state x and the corresponding flipped state x c . If x is undesirable, we check whether K D (x) -K D (x c ) ≥ λ to make the time to reach the desirable states D faster; Otherwise when x is desirable, we check whether K U (x c ) -K U (x) ≥ λ to make the time to leave the undesirable states U faster. The parameter λ is set to a higher value when the ratio of the cost of control to the cost of the undesirable states is higher, the intent being to apply the control less frequently; if we are not interested in limiting application of control, we set λ = 0.
The computational complexity for finding the MFPT control policy in the original PBN is O(2 n ).
The pseudocode for the MFPT algorithm in Appendix 1, reproduced from [7], summarizes the procedure when applied to all possible control genes.
As noted in [7], even if one wishes to apply a cost-based optimization, the MFPT procedure can serve to find a good control gene and also gain insight on the controllability of the network.

Control policies directly based on long-run behavior Basin of Attraction (BOA) control policy
Although mean first passage time is closely related to the steady-state distribution, the MFPT control policy does not use the shift of stationary mass directly as a criterion. Given the basins of attraction (BOA), which determine the long-run behavior of a PBN, we can use this information to derive a control policy more directly related to the steady-state distribution. For this BOA control policy, we again assume that the state space is partitioned into the sets D and U of desirable and undesirable states. For any state x, let A j (x) be the set of attractors (the cycle) for the basin containing x in the jth constituent BNp, keeping in mind that a state x belongs to exactly one basin of attraction in each constituent BNp. Let , the union of attractors for the basins of x taken across all constituent BNps. Further, for each constituent BNp, we compute the minimal distance of each state to states in D or to states in U, and then we compute the respective PBN distances d D and d U by taking weighted averages with respect to the selection probabilities. Since most of the stationary mass is distributed in the attractors, the structural properties of the basins, including the properties of their attractors and their sizes, determine the long-run behavior and the steady-state distribution of the network [20]. Hence, it is reasonable to design a control policy based on the BOA structure.
We proceed in similar way to the manner in which the MFPT policy is derived in [7] to obtain a basin of attraction (BOA) control policy. For a pair of undesirable states x and x c , we first check whether B(x) or B(x c ) contains any desirable attractors. If only one of them contains desirable attractors, then we decide the control policy to always go to that state so that we increase the likelihood of entering into desirable attractors. Otherwise, if both of them have desirable attractors or neither B(x) or B(x c ) has desirable attractors, we compare d D (x) and d D (x c ): Whichever is minimum, we apply control to get that state so as to reach the desirable states D faster. We do not apply any control . For a pair of desirable states x and x c , we first check whether B(x) or B(x c ) contains any undesirable attractors. If only one of them contains undesirable attractors, then we apply control to flip to that state so that we reduce the risk of getting into undesirable attractors. If the condition is satisfied for both of the states or neither of them, we then check d U (x) and d U (x c ) to make the time to reach the undesirable states U slower. The computational complexity for finding this control policy in the original PBN is O(2 n ), similar as in deriving the MFPT control policy. Since finding BOA structures of PBNs does not involve computing matrix inversions and is relatively less expensive than computing mean first passage times, especially with increasing number of genes in the network, the algorithm to find the BOA control policy is more efficient than the previous algorithm to find the MFPT control policy. The pseudocode in Appendix 2 summarizes the BOA procedure for finding the best control gene and the corresponding stationary control policy.

Steady-state distribution (SSD) control policy
Although the BOA structural properties constitute one determinant factor for the steady-state distribution that we aim to shift, the BOA algorithm does not use the steady-state distribution directly. Thus, we consider a control policy directly using the shifted stationary mass as the criterion of applying control. A key issue for such an algorithm is the efficient computation of the shifted stationary mass resulting from intervention. Recently, we have adapted the perturbation theory in finite Markov chains [18,25] to derive an analytic solution to compute the shifted mass efficiently [17]. We next state a theorem for general Markov chains which has appeared in [17,18,25].
Theorem 2: For a perturbed Markov chain with = P + E by a rank-one perturbation E = ab T , where a and b are two arbitrary vectors, the steady-state distribution is given by where e is a vector with all its elements equal to 1, t and u are any vectors such that π T t ≠ 0 and u T e ≠ 0, and β T = b T [I -P + tu T ] -1 .
If we let t = e and u = π in (9), we can derive where β T = b T Z and Z is the fundamental matrix of the underlying Markov chain for the original network. Hence, the steady-state distribution of the rank-one perturbation is expressed in terms of π and Z, the steady-state distribution and fundamental matrix of the original network. Thus, for rank-one perturbations to regulatory functions, we have an explicit way to compute the exact shifted stationary mass.
Since the control by flipping one control gene g at any given state x simply changes the original transition matrix P to the controlled transition matrix by replacing the row in P corresponding to the state x by the row that corresponds to the state x c with g flipped from x, the perturbation matrix can be written as a rank-one matrix and the perturbed steady-state distribution can be computed efficiently by: where p x and are the two rows corresponding to the states x and x c in P, z x is a column corresponding to the state x in Z, π x is the stationary mass for x, and (x) denotes the steady-state distribution after we apply gene flipping at the state x. Following this analytic solution, we can quickly compute the total stationary mass for undesirable states π U and (x), and therefore the shifted mass after the possible controls to each state. Once we have that, we can derive a steady-state distribution (SSD) control policy based on a procedure similar to deriving the MFPT control policy. We compare the total stationary mass of undesirable states after applying control to x and x c : from both the MFPT and BOA control policies a little bit by vector-matrix multiplications involved in (11). The pseudocode for the SSD algorithm in Appendix 3 summarizes the procedure for finding the best control gene and the corresponding stationary control policy based on shifted stationary mass.
As in the previous algorithms, deriving the SSD control policy only looks into the effects caused by perturbations to the pairs of states x and x c . Considering the perturbation to the original transition matrix P, for each network state x, we compare the steady-state distributions π and (x) that correspond to P and the controlled transition matrix by replacing the row in P corresponding to the state x by the row that corresponds to the flipped state x c . If the undesirable stationary mass based on this one-row perturbation caused by flipping the control gene g at x reduces the undesirable stationary mass, then we decide the control policy for the state x: u g (x) = 1. For the derivation of the final stationary control policy for all the network states, we investigate the perturbation effects independently by studying the change from P to the corresponding controlled transition matrices by one-row perturbations at all the states: . The final control policy u g for the network actually leads to a multi-row perturbation to P by combining all the beneficial one row perturbations determined in the algorithm: . Generally, is different from all the controlled transition matrices by one-row perturbations considered during the derivation of u g . Although, intuitively, this combination of beneficial one-row perturbations should reduce the total undesirable stationary mass, it is difficult to find the analytic characterization of the effect to the undesirable mass caused by the combination. We note here that in our simulations (Results and Discussion), the derived SSD control policy always reduces the stationary mass for undesirable states and performs better than the MFPT and BOA algorithms.

Conservative steady-state distribution (CSSD) control policy
All of the above control policies are relatively aggressive. Whenever we see a desirable difference by intervention through perturbing the original transition matrix of the network, we will apply control in a greedy manner. For single-gene control, the previous algorithms check whether applying gene flipping leads to immediate benefits. The decisions for different pairs of original and flipped states are independent. Therefore, we can derive the previous control policies for all the network states in a parallel fashion. However, there is no theoretical guarantee that applying these control policies will lead to the reduction of the stationary mass for undesirable states or, analogously, the increase of the stationary mass for desirable states. We now present a control policy, the conservative steady-state distribution (CSSD) control policy, for which we have a theoretical guarantee that the steady-state distribution after intervention will have less than or equal to the stationary mass of the undesirable states in the original network.
While we can compute the shifted steady-state distribution accurately using Theorem 2, we now introduce another theorem, proven in [17,25], which gives the new fundamental matrix after applying a rank-one perturbation to the network. . We can substitute these into (12) to compute the updated fundamental matrix.
Using this result, we now design a sequential algorithm that iteratively chooses states to control, so that we can theoretically guarantee that the control policy reduces the stationary mass of undesirable states. At each iteration, we check all the states, for which the control policies have not been decided, to see which state to control in order to achieve the largest reduction of the undesirable stationary mass. As in the previous subsection, for each state x, we compare the steady-state distributions π and (x) that correspond to P and the controlled transition matrix by replacing the row in P corresponding to state x by the row that corresponds to the flipped state x c .
Unlike deriving the SSD control policy u g (x) independently for all the states, for the CSSD control policy, we do not directly combine all the beneficial one-row perturbations into the new transition matrix decided by the derived SSD control policy, In this new sequential CSSD we check all possible one-row perturbations and find the best one row perturbation which results in the largest reduction of undesirable mass. Hence, we only select one state x best to control at each iteration if there is a reduction of undesirable mass. If we denote the controlled transition matrix by flipping the control gene g at that state as at the kth iteration, the sequential algorithm in fact will generate a sequence of controlled transition matrices: where K is the total number of iterations, and each pair of neighboring transition matrices differ by only one row.
Here, is the final controlled transition matrix with the derived control policy u g . As we obtain the controlled transition matrix at each iteration by a one-row perturbation to the previously computed controlled transition matrix, we can keep updating the exact steady-state distribution and the fundamental matrix using (11) and (12), respectively. Thus, at each iteration, we can directly compute the true stationary mass for undesirable states after intervention and make the decision about the control policy for the selected state as well. We let the algorithm run iteratively until we find that intervention to any state will actually increase the stationary mass of undesirable states from the previous iteration. In this way, we are guaranteed that the derived control policy will always have undesirable stationary mass less than or equal to that of the undesirable states in the original network. This algorithm is computationally more expensive compared with the previous algorithms as the search space is O(2 n ) at each iteration and the number of iterations K depends on the controllability of the networks. The advantage here is that the CSSD control policy is guaranteed to decrease undesirable stationary mass after intervention and, as well see in simulations, tends to outperform the SSD policy. The pseudocode in Appendix 4 summarizes the CSSD procedure for finding the best control gene and the corresponding stationary control policy.
The following theorem provides the guarantee that applying the conservative SSD algorithm [Appendix 4] will reduce the total undesirable stationary mass. . QED

Extension to context-sensitive PBNs
We have restricted ourselves to BNps and instantaneously random PBNs till now. All the algorithms, including the MFPT algorithm, focus on the GAP space and this only corresponds to the Markov chain space for BNps and instantaneously random PBNs. However, these algorithms, including the MFPT algorithm, can be extended to intervene in context-sensitive PBNs with no theoretical obstacles. But as the state space changes from the space of GAPs to the space of (context, GAP) pairs in context-sensitive PBNs, the computational complexity of these algorithms will increase. Moreover, for the algorithms directly based on steady-state distributions, we have to apply iterative update schemes to compute the shifted steady-state distributions since the perturbations to the transition matrix become multiple-row perturbations [17].

Comparison of four greedy control policies
In this section, as with the ensemble analysis in [7,[26][27][28], we study the performance of the four greedy control policies, MFPT, BOA, SSD, and CSSD, based on a large number of randomly generated networks with similar network properties. The two most important parameters for generating random Boolean networks are the bias (p b ) and connectivity (K). Here, p b is the mean of the Bernoulli distribution to generate the truth table of one Boolean function in a Boolean network, the bias p b being the probability that a randomly generated Boolean function takes on the value 1. K is the maximum input degree of the Boolean functions in the network. All simulation results in this section are based on 1, 000 randomly generated PBNs of 10 genes, including BNps and instantaneously random PBNs, with fixed K = 3 and different p b 's.

Performance comparison for BNps
We first consider performance comparison using randomly generated BNps with 10 genes and p = 0.01 in all experiments. Each BN is randomly generated with a specific bias p b . Since the bias affects the dynamical properties of randomly generated BNs [27], it is taken as a parameter in our simulations. The bias p b of each BNp is randomly selected from a beta distribution. The mean of the beta distribution varies 0.1 to 0.9 with step-size 0.2. The variance of the beta distribution is 0.000064. We generate 1000 random BNps for each bias mean. For each network, without loss of generality, undesirable and desirable states are defined by x 1 = 0 and x 1 = 1, respectively, and the control gene is x 10 . All four control policies are applied: MFPT with λ = 0; BOA; SSD; and CSSD. Table 1 summarizes the average stationary mass for the undesirable states before control (ORG) and after applying these four different policies. Note that the undesirable stationary mass before control is dependent on p b . Figure 1 shows both the means and standard deviations of the stationary mass for undesirable states, , with p b = 0.5 before and after control. From both the table and the figure, we see that the CSSD control policy has the best performance and the SSD policy also achieves better performance compared with the MFPT and BOA policies. They also show that in average, the BOA policy performs similarly to the MFPT policy.
Recall that it is guaranteed that the CSSD control policy always leads to a reduction of the stationary mass for undesirable states. Table 2 gives the percentages of random BNps with stationary mass shift Δ = π U -≥ 0 for different control policies and different values of p b . The MFPT control policy is probably the most aggressive policy with λ = 0 because the algorithm will force gene flipping whenever a difference between the mean first passage times is observed, and this aggressiveness is reflected by the lowest percentage in Table 2. As must be the case, the CSSD policy always leads to a reduction of the undesirable stationary mass. In this simulation, the SSD policy also always reduces the undesirable mass, although we do not have a proof that this is always the case. In fact, we have tried different settings with K and p b and for all tested settings, the SSD control policy never increases the undesirable mass. It would be nice to find a mathematical way to prove that the SSD control policy has the guarantee that it will always shift the stationary mass beneficially. In general, the BOA, SSD, and CSSD control policies are all relatively conservative compared to the MFPT policy since the criteria they use are directly related to the network's longrun behavior. It is also interesting to see that there is some correlation of the percentages with p b .

Performance comparison for instantaneously random PBNs
We have also compared the performances using 1000 randomly generated instantaneously random 10-gene PBNs with 2 context BNps, generated similarly as the BNps in the previous subsection. We fix the perturbation probability at p = 0.01. The selection probabilities are c 1 = c 2 = 0.5.
We again define states with x 1 = 0 as undesirable and states with x 1 = 1 as desirable, and again apply all four control policies with x 10 as the control gene. Table 3 summarizes the average stationary mass for the undesirable states before control (ORG) and after applying these four different control policies. The means and standard deviations of the stationary mass for undesirable states with p b = 0.5 before and after control are shown in Fig. 2.
The CSSD policy has the best performance and the SSD policy also achieves better performance compared with the MFPT and BOA policies, as in the simulations for BNps.
π x    Table 4 for different control policies and different p b 's. We see a similar trend as with the simulations for BNps.

Control policies
We observe similar trends for randomly generated networks with different numbers of genes [see Additional file 1]. Finally, while the MFPT and BOA control policies are close in terms of the shift of undesirable stationary mass, the MFPT control policy being slightly better in many cases, the BOA control policy is always significantly better than the MFPT control policy in terms of producing a beneficial shift.

A mammalian cell cycle network
We now apply these different control policies on a probabilistic Boolean network (PBN) model of the mammalian cell cycle recently proposed in [29]. For a normal mammalian organism, cell division coordinates with overall growth controlled via extra-cellular signals. These signals indicate whether a cell should divide or remain in a resting state. The positive signals, or growth factors, instigate the activation of Cyclin D (CycD), which is one of the key genes in the mammalian cell cycle. The other two important genes are retinoblastoma (Rb) and p27. Rb is a tumor-suppressor gene. This gene is expressed in the absence of the cyclins, which inhibit Rb by phosphorylation. Gene p27 is also active in the absence of the cyclins. Whenever p27 is present, it blocks the action of CycE or CycA and Rb can also be expressed, even in the presence of CycE or CycA. Hence, it stops the cell cycle.
The preceding explanation represents the wild-type cell cycle model. In this model, when p27 is active, the cell cycle can be stopped in cancerous situations. When we follow one of the proposed mutations in [29], in which p27 is mutated and it is always off, the mutation introduces a situation where both CycD and Rb might be inactive. As a result, in this mutated phenotype, the cell cycles in the absence of any growth factor. In other words, we consider the logical states in which both Rb and CycD are down-regulated as undesirable states.
We use the PBN that postulates the cell cycles with mutated phenotype in our experiments. We construct the instantaneously random PBN of the cell cycle based on the Boolean functions in Table 5 with mutated p27. This PBN consists of 9 genes: CycD, Rb, E2F, CycE, CycA, Cdc20, Cdh1, UbcH10, and CycB. The illustration of the relationship between these genes in the PBN is shown in Fig. 3. The above order of genes is used in the binary representation of the logical states, with CycD as the most significant bit and CycB as the least significant bit. The order of genes in the logical states does not affect our analysis or intervention. We assume that the extra-cellular signal to the cell cycle model is a latent variable. The growth factor is not part of the cell and its value is determined by the surrounding cells. The expression of CycD changes independently of the cell's content and reflects the state of the growth factor. Depending on the expression status of CycD, we obtain two constituent Boolean networks. The first constituent Boolean network is determined based on π U Performance comparison for randomly generated BNps Figure 1 Performance comparison for randomly generated BNps. Performance comparison for 1000 randomly generated BNps with p b = 0.5 with respect to the means and standard deviations of the stationary masses for undesirable states for different control policies: ORG -original undesirable stationary mass; MFPT -undesirable stationary mass after applying the MFPT control policy; BOA -undesirable stationary mass after applying the BOA control policy; SSD -undesirable stationary mass after applying the steady-state distribution control policy; CSSD -undesirable stationary mass after applying the conservative SSD control policy.  Percentages of random BNps with Δ = π U -≥ 0 within 1000 random BNps after applying four control policies with x 10 as the control gene.
π U the Boolean functions in Table 5 when the value of CycD is equal to 0. Similarly, the second constituent Boolean network is determined by setting the value of CycD to 1.
To completely define the PBN, we set the perturbation probability p = 0.01, and the probability of selecting each constituent Boolean network c j = 0.5, j = 1, 2.
We first compute the steady-state distribution for this mutated PBN as shown in Fig. 4(a). Since the logical states in which both Rb and CycD are down-regulated are undesirable states, we compute the total stationary mass for the undesirable states: . We then apply the MFPT control policy with λ = 0.1, the BOA control policy, the SSD control policy, and CSSD control policy to find the single control gene to reduce the stationary mass for the undesirable states. As CycD and Rb are two genes deciding network states to be either desirable or undesirable, it is problematic to apply the MFPT control policy if these genes are considered. Hence, in the experiments, we only check the last 7 genes for all four control policies. Table 6 gives the total stationary masses for different control genes using these control policies. From the table, all the control policies find the same best control gene, E2F, and their performances are similar. However, when we compare the total stationary masses of the undesirable states for all possible control genes, we see that the performance of the CSSD control policy is the best among all the control policies. The SSD control policy also gives superior performance. Figure 4(b) shows the steady-state distribution after applying the derived CSSD control policy for the best control gene E2F.
A recent paper suggests that the Myc-Rb-E2F pathway functions as a bistable switch that separates quiescence and proliferation for the mammalian cell cycle [19]. The paper shows that E2F activation correlates directly with π x x 1 0 = ∑ Performance comparison for randomly generated instantane-ously random PBNs Figure 2 Performance comparison for randomly generated instantaneously random PBNs. Performance comparison for 1000 randomly generated instantaneously random PBNs with p b = 0.5 with respect to the means and standard deviations of the stationary masses for undesirable states for different control policies: ORG -original undesirable stationary mass; MFPT -undesirable stationary mass after applying the MFPT control policy; BOA -undesirable stationary mass after applying the BOA control policy; SSD -undesirable stationary mass after applying the steady-state distribution control policy; CSSD -undesirable stationary mass after applying the conservative SSD control policy.  Percentages of random PBNs with Δ = π U -≥ 0 within 1000 random PBNs after applying four control policies with x 10 as the control gene.
π U the ability of a cell to reverse the R(estriction)-point, which marks the critical event when a mammalian cell commits to proliferation independent of growth stimulation. The R-point is fundamental for normal differentiation and appears to be dysregulated in virtually all cancers. It is interesting to see that through different math-ematical modeling we reach the same conclusion that E2F is the best potential target for future gene therapy design.
We have also computed the time to find the best control gene for all of four control policies. The values in Table 7 follow roughly the time complexity predicted in Methods section. Note that we collected the running time with the unoptimized code running in MATLAB on a standard PC with a 1.8GHz CPU and 1Gb memory. These values only serve as rough indices to show that the first three greedy control policies have roughly the same time complexity.

Conclusion
In this paper, we propose three new greedy stationary control policies directly using long-run behavior change by intervention to define the control criteria. Through simulations, we have shown that the MFPT, BOA, and SSD policies perform similarly with respect to computational complexity and all reduce the risk of entering undesirable states that correspond to aberrant phenotypes of the modeled cells, with the SSD policy having better average performance in this regard than the other two. For the conservative CSSD policy, we are guaranteed that intervention will lead to beneficial shift of steady-state distributions. We have also illustrated how these control policies can serve as the potential gene therapeutic intervention strategies in the future with a mammalian cell cycle network. As with the MFPT control policy, not only can they be used to directly shift the steady-state distribution without the need for a cost function, they can also be used to predict the potential control gene of the network, 1 ∧ Figure 3 Mutated mammalian cell cycle network. Logical regulatory graph for the mutated mammalian cell cycle network (modified from Fig. 1 in [29]). Blunt arrows stand for inhibitory effects; normal arrows for activations. Rb E2F Cdc20 Cdh1 serve as reduced-complexity approximations to cost-based control policies, and provide measures of network controllability. Our future direction will be focused on understanding the performance of these greedy control policies relative to their robustness in the presence of inaccurate inference [30] and network reduction [31,32]. 0.1303 after applying all of four control policies for all the potential control genes in the mutated 9-gene PBN. until g > number of genes

Appendix 3 -SSD algorithm
Partition the state-space into undesirable U and desirable D subsets.
Compute the original steady-state distribution π and the fundamental matrix Z. g ← 1.
repeat for All pairs of states x and x c ← flip control gene g in x do Steady-state distribution shifts for the mutated mammalian cell cycle PBN Figure 4 Steady-state distribution shifts for the mutated mammalian cell cycle PBN. Steady-state distribution shifts for the mutated 9-gene mammalian cell cycle PBN with p = 0.01: (a) Original steady-state distribution; (b) Steady-state distribution after applying the conservative SSD control policy with E2F as the control gene. Compute the shifted stationary mass of undesirable states , where is the stationary mass of the undesirable states after we apply the derived control policy; g ← g + 1.
until g > number of genes

Appendix 4 -Conservative SSD algorithm
Partition the state-space into undesirable U and desirable D subsets.
Compute the original steady-state distribution π and the fundamental matrix Z. g ← 1.
repeat π best = p; Z best = Z; Assign the set of states with no control assigned as L; repeat Δ best = 0; for All states x ∈ L do Compute with π best and Z best based on (11); if Δπ U > Δ best then Δ best = Δπ U ; Assign the best state to control: x best = x; π best = ; end if end for if Δ best > 0 then u g (x best ) = 1; u g ( ) = 0; L = L\{x best , }; Compute Z best based on (12) accordingly; end if until Δ best = 0 Δ(g) = π U -; g ← g + 1.
until g > number of genes Running time (measured in seconds) for deriving four control policies for all the potential control genes in the mutated 9-gene PBN.