An experimental design framework for Markovian gene regulatory networks under stationary control policy

Background A fundamental problem for translational genomics is to find optimal therapies based on gene regulatory intervention. Dynamic intervention involves a control policy that optimally reduces a cost function based on phenotype by externally altering the state of the network over time. When a gene regulatory network (GRN) model is fully known, the problem is addressed using classical dynamic programming based on the Markov chain associated with the network. When the network is uncertain, a Bayesian framework can be applied, where policy optimality is with respect to both the dynamical objective and the uncertainty, as characterized by a prior distribution. In the presence of uncertainty, it is of great practical interest to develop an experimental design strategy and thereby select experiments that optimally reduce a measure of uncertainty. Results In this paper, we employ mean objective cost of uncertainty (MOCU), which quantifies uncertainty based on the degree to which uncertainty degrades the operational objective, that being the cost owing to undesirable phenotypes. We assume that a number of conditional probabilities characterizing regulatory relationships among genes are unknown in the Markovian GRN. In sum, there is a prior distribution which can be updated to a posterior distribution by observing a regulatory trajectory, and an optimal control policy, known as an “intrinsically Bayesian robust” (IBR) policy. To obtain a better IBR policy, we select an experiment that minimizes the MOCU remaining after applying its output to the network. At this point, we can either stop and find the resulting IBR policy or proceed to determine more unknown conditional probabilities via regulatory observation and find the IBR policy from the resulting posterior distribution. For sequential experimental design this entire process is iterated. Owing to the computational complexity of experimental design, which requires computation of many potential IBR policies, we implement an approximate method utilizing mean first passage times (MFPTs) – but only in experimental design, the final policy being an IBR policy. Conclusions Comprehensive performance analysis based on extensive simulations on synthetic and real GRNs demonstrate the efficacy of the proposed method, including the accuracy and computational advantage of the approximate MFPT-based design.


Background
A salient aim of translational genomics is to develop new drugs via constructing gene regulatory network (GRN) models characterizing the interactions among genes and then use these models to design therapeutic interventions. Most intervention strategies in the literature assume perfect knowledge regarding the network model. However, unfortunately this is not a realistic assumption in many real-world biomedical applications as uncertainty is inherent in genomics due to the complexity of biological systems, experimental limitations, noise, etc. Presence of model uncertainty degrades the performance of interventions.
Markovian genetic networks, an example of which are probabilistic Boolean networks (PBNs), have received great attention in recent years [1][2][3][4][5]. These networks have been shown to be effective in mimicking the behavior of biological systems, particularly as they are able to capture the randomness of biological phenomena by means of a transition probability matrix (TPM). The long-run behavior of a Markovian network is determined by a steady-state distribution over network states. Designing therapeutic interventions for these networks, often studied in the context of Markov decision processes (MDPs), has been extensively studied over the past two decades [6]. The basic assumption behind many intervention algorithms is that the TPM is perfectly known.
When dealing with network models possessing uncertainty, it is prudent to design a robust intervention that provides acceptable performance across an uncertainty class of possible models compatible with the current state of knowledge. In general, the problem of designing robust operators (or interventions in this paper) is typically viewed from two different perspectives: minimax robustness and Bayesian robustness. Under a minimax criterion, the robust operator has the best worst-case performance across the uncertainty class. The main problem with minimax robustness is that it is very conservative and gives too much attention to outlier models in the uncertainty class that may possess negligible likelihood.
Bayesian robustness addresses this issue by assigning a prior probability distribution reflecting existing knowledge about the model. Under this criterion, the aim is to find a robust operator possessing the optimal performance on average relative to this prior distribution. In the context of Bayesian robustness, when optimality is relative to the prior distribution, the resulting operator is called an intrinsically Bayesian robust (IBR) operator, examples being IBR Kalman filter in signal estimation [7], IBR signal compression [8], and IBR network structural intervention for gene regulatory networks [9,10]. When optimality is relative to the posterior distribution obtained by incorporating observations into the prior distribution, the robust operator is called an optimal Bayesian operator [11][12][13][14].
It is of prime interest to reduce model uncertainty via additional experiments and thereby improve the performance of the intervention. Since conducting all potential experiments is not feasible in many biomedical applications owing to operational constraints such as budget, time, and equipment limitations, it is imperative to utilize an experimental design strategy to rank experiments and then conduct only those experiments with high priority [15][16][17][18].
As experiments are aimed at reducing uncertainty, a crucial step in experimental design is uncertainty quantification. From a translational perspective, we are not concerned with overall uncertainty, but rather with the degradation induced by the uncertainty in the intervention performance. Taking this into account, we employ an objective-based uncertainty quantification scheme called the mean objective cost of uncertainty (MOCU) [10]. MOCU has been successfully used for developing experimental design in gene regulatory networks when structural interventions are concerned [15,[19][20][21].
In this paper, we extend the application of the objectivebased experimental design for GRNs to the realm of dynamical interventions. The interactions among genes are characterized by a set of conditional probability matrices where the conditional probabilities in each matrix correspond to the regulatory relationship between a gene and its regulating genes. We address the experimental design problem involving a GRN model in which a number of probabilities across conditional probability matrices are missing. Unknown conditional probabilities are represented by conjugate prior distributions which are closed under consecutive observations. In this paper, we show how the uncertainty in the conditional probabilities can be translated into the uncertainty in an unknown transition probability matrix. Furthermore, we show how additional information in terms of a trajectory of consecutive state transitions from the true system, if available, can be integrated to update prior distributions to posterior distributions containing lesser uncertainty. Deriving IBR control policies, which involves minimizing the average cost relative to the prior distribution among all stationary control policies, is at the very core of our experimental design calculations. In this regard, we take advantage of the fact that an IBR control policy can be derived by using an effective transition probability matrix that represents the uncertainty class of transition probability matrices. We should emphasize the optimality of the IBR control policy, which is selected from all possible stationary policies as opposed to the model-constrained Bayesian robust (MCBR) control policy [22], which is selected from among only the policies that are optimal for networks belonging to the uncertainty class.
It is worth mentioning that due to the computational complexity limitation, we are only concerned with stationary control policies in this paper. Another approach for designing a Bayesian robust control policy is to design a non-stationary policy, referred to as the optimal Bayesian robust (OBR) control policy. In addition to the expected immediate cost and different future costs obtained due to being in different states at the next time steps, an OBR policy also considers the effect of observations obtained by different actions on the sequence of different posterior distributions, which makes an OBR policy be non-stationary. In an OBR setting, the control problem is transformed into an equivalent problem in which each state, being referred to as a hyperstate, contains both the ordinary state of the system and the state of the knowledge reflecting the prior information and the history of observations from the system. Utilizing the concept of hyperstates for designing OBR control policies has roots in the classical works of Bellman and Kalaba [23], Silver [24], Gozzolino [25], and Martin [26]. The major obstacle of the OBR theory is its enormous computational complexity [27][28][29], such that it cannot be applied to networks of larger than 4 genes, even when only network control is concerned [29], let alone experimental design whose complexity is several-fold more than that of the control problem. Hence, taking into account complexity considerations with OBR, we focus on IBR stationary policies for our experimental design problem, which still requires massive computations but at a more tolerable cost compared to OBR policies.
To mitigate the computational complexity burden of experimental design, and considering the fact that computing the IBR control policy can be computationally demanding, we approximate it by using the method of mean first passage time (MFPT) [30]. The main motivation behind utilizing MFPT for controlling GRN networks is the desire to reach desirable states and leave undesirable states in the shortest time possible. Using this intuition, MFPT is used in [31] to derive a stationary control policy that can be used as an approximation for the optimal control policy and in [32] to find the best control gene. Using the concept of MFPT, we approximate the IBR control policy required for the experimental design and thereby lower the complexity of the experimental design. We emphasize that the MFPT approximation is only used for experimental design and that the implemented control policy will always be the optimal stationary control policy.
We summarize the main contributions of the paper. (1) Despite all the previous MOCU-based experimental design methods whose focus was on structural interventions [15,19,20], in this paper we consider the class of stationary interventions and derive a closed-form solution for the IBR stationary intervention when the TPM is unknown. (2) While in the previous works, uncertain parameters involve a number of regulatory edges between genes, in this paper we consider the case that a number of conditional probabilities characterizing regulatory relationships between genes are unknown. Given that conditional probabilities can be estimated using time series gene expression data generated through a biological experiment, it is more realistic to consider these probabilities, rather than regulatory edges, as the outcomes of biological experiments. This new uncertainty assumption requires us to define a new uncertainty class and prior probability model. (3) To address the complexity concerns of the proposed method, we propose an approximate experimental design method utilizing mean first passage times (MFPTs) in which we extend the application of MFPT-based controls to unknown TPMs.

Markovian regulatory networks
In a network with n genes, a set of binary variables V = {X 1 , X 2 , . . . , X n }, X i ∈ {0, 1}, determines the expression states of the genes. The vector of gene expression values at time t, X(t) = (X 1 (t), . . . , X n (t)), referred to as the gene activity profile (GAP), defines the network state at each time step. In a Markovian regulatory network, network dynamics involves a trajectory of states over time governed by the transition rule captures randomness in the system and f : S × → S, S = {0, 1, . . . , 2 n − 1} being the set of corresponding decimal representations for the network states, is a mapping that characterizes the state transitions in the network. The sequence of states over time can be viewed as a Markov chain characterized by a transition probability matrix (TPM) P =[P ij ] 2 n −1 i,j=0 , where P ij = Pr[ X(t + 1) = j|X(t) = i], Pr[·] being the probability operator. An ergodic Markov chain is guaranteed to possess a unique steady-state distribution π, such that π T = π T P, T being the transpose operator.
Assume that the expression state of a gene is solely determined by its regulating genes. In other words, given the values of its regulating genes, the expression state of a gene is conditionally independent from those of other genes. Let the vector of expression states of the regulating genes for X i be denoted by X i , where the ordering in X i is induced from the ordering X 1 , X 2 , . . . , X n . In a binary setting, if X i has k i regulating genes, then X i can have 2 k i possible vector values. To define the regulatory relationship between gene X i and its regulating genes, we can construct a conditional probability matrix (CPM) C(X i ) of size 2 k i × 2, where each row of the matrix corresponds to a certain combination of gene expressions in X i and the first and second columns correspond to the conditional probability of gene X i being 0 and 1, respectively, i.e., where by X i = j we mean that the equivalent decimal value of the vector of expression states for the regulating genes of X i is j. A network of n genes with each gene having k i regulating genes can be completely defined by n CPMs C(X i ), 1 ≤ i ≤ n, each being of size 2 k i × 2. These matrices can be used to construct the transition probability matrix. Owing to the mutual conditional independence of all genes given the values of all regulating genes, the entry P ij of the TPM can be found as where j k is the binary value of the k-th gene in state j and X k [i] is the vector of binary values of the regulating genes for X k extracted from the representation of state i. For example, consider a 3-gene network, n = 3, in which gene X 1 (k = 1 in (2)) is regulated by genes X 2 and X 3 . For this network, when computing P 14 (i = 1 and j = 4 in (2)), j k = 1 (as X 1 = 1 for j = 4) and X 1 [i] = (0, 1) (as (X 2 , X 3 ) = (0, 1) for i = 1). From a translational perspective, the states of a network can be partitioned into two sets: desirable states D, being associated with healthy phenotypes, and undesirable states U , corresponding to pathological cell functions such as cancer. The goal of therapeutic interventions is to alter the dynamical behavior of the network in such a way as to reduce the steady-state probability π U = i∈U π i of the network entering the undesirable states. There are two different approaches for network interventions: structural interventions and dynamical interventions. In a structural intervention, the goal is to modify the dynamical behavior of the network via a one-time change in its underlying regulatory structure [9,[33][34][35]. Dynamical interventions are typically studied in the framework of Markov decision processes and are characterized by control policies. These interventions usually involve the change in the expression of one or more genes, being called control genes, and can be applied either over a finite-time [36][37][38] or an infinite-time horizon [31,39].

Optimal dynamical control
Network interventions in this paper belong to the category of dynamical interventions. We assume that there is a control gene g ∈ V whose expression value is affected by a binary control input c ∈ C, C = {0, 1}. The value of g is flipped when c = 1 and not flipped when c = 0. It is straightforward to extend the results to m control genes, where there are 2 m different control actions. Let P(c) = P ij (c) 2 n −1 i,j=0 denote the controlled TPM, i.e., The controlled TPM can be found using the uncontrolled TPM P as where statesĩ and i differ only in the value of gene g.
The problem of optimal control can be modeled as an optimal stochastic control problem [39]. Let the cost function r(i, j, c) : S × S × C → R determine the immediate cost accrued when the network transitions from state i to state j under control action c. This cost reflects both the desirability of states and the cost for imposing control actions. Usually, larger cost values are assigned to the undesirable states and when the control action is applied. This cost function is assumed to be time-invariant, bounded, and nonnegative. We consider an infinite-horizon discounted cost approach as proposed in [39] in which a discount factor 0 < ζ < 1 is used to guarantee convergence [40]. Control actions are chosen over time according to a control policy μ = (μ 1 , μ 2 , . . . ), μ t : S → C. In this setting, given a policy μ and an initial state X 0 , the expected total cost is where the expectation is taken relative to the probability measure over the space of state and control action trajectories. If denotes the space of all admissible policies, we seek an optimal control policy μ * (X 0 ) such that The corresponding optimal expected total cost is denoted by J * (X 0 ). It has been shown that the optimal policy μ * (X 0 ) exists and can be found by solving Bellman's optimality equation [40], The optimal cost J * = (J * (0), . . . , J * (2 n − 1)) is the unique solution of (7) among all bounded functions and the control policy μ * that attains the minimum in (7) is stationary, i.e., μ * = (μ * , μ * , ..) [40]. In order to find the fixed point in the Bellman's optimality equation and thereby find the optimal control policy, dynamic programming algorithms, including the value iteration algorithm that iteratively estimates the cost function, can be used.

MOCU-based optimal experimental design framework
In this section, we review the general framework of the experimental design method in [15], which is based on the concept of the mean objective cost of uncertainty (MOCU) [10]. Let θ = (θ 1 , θ 2 , . . . , θ T ) be composed of T uncertain parameters in a network model. The set of all possible realizations for θ is denoted by and is called an uncertainty class. A prior distribution f (θ) is assigned to θ, which reflects the likelihood of each realization of θ being the true value.
For each possible intervention ψ ∈ , the class of interventions, and each model θ in the uncertainty class, an error function ξ θ (ψ) determines the error of ψ when applied to the network model θ. The optimal intervention ψ(θ) has the lowest error relative to model θ, i.e., ξ θ (ψ(θ)) ≤ ξ θ (ψ), ∀ψ ∈ . When dealing with an uncertainty class , the intrinsically Bayesian robust (IBR) intervention ψ IBR ( ) is defined as where the expectation is taken relative to the prior distribution f (θ ).
An IBR intervention is optimal on average rather than at each specific network model θ; therefore, relative to θ an objective cost of uncertainty (OCU) can be defined as Taking the expectation of U ,ξ (θ) relative to f (θ), we obtain the mean objective cost of uncertainty (MOCU): MOCU measures the model uncertainty in terms of the expected increased error due to applying an IBR intervention (the chosen intervention in the presence of uncertainty) instead of an optimal intervention (the chosen intervention in the absence of uncertainty). Uncertainty quantification based on MOCU can lay the groundwork for objective-based experimental design.
Assume that corresponding to each parameter θ i , there is an experiment E i that results in the exact determination of θ i . The goal of the experimental design is to find which experiment should be conducted first so that model uncertainty is reduced optimally. Focusing on experiment E i and parameter θ i , consider the case that the outcome of experiment E i is θ i . Then the remaining MOCU given where the expectation is taken relative to the conditional distribution f θ|θ i = θ i , |θ i = θ i , is the reduced uncertainty class obtained after θ i = θ i , and vector θ |θ i = θ i is obtained from vector θ by setting θ i to θ i . Taking the expectation of (11) relative to the marginal distribution f θ i , which is in fact the marginal distribution of the parameter θ i , we obtain the expected remaining MOCU given experiment E i is carried out (or equivalently parameter θ i is determined): M ,ξ ( ; θ i ) measures the pertinent uncertainty expected to remain in the model after conducting experiment E i .
The experiment E i * that attains the minimum value of the expected reaming MOCU is called the optimal experiment and suggested as the first experiment [15]: The parameter θ i * corresponding to E i * is called the primary parameter. Note that (13) can be further simplified through some mathematical manipulations and removing expressions not dependent on the optimization variable [20]: A number of experimental design methods based on the MOCU framework have been proposed in the literature [15,19,20]. In all of these cases, the MOCU-based experimental design can reduce the number of needed experiments significantly in comparison to other selection policies such as entropybased experimental design, pure exploitation, or random selection policy.

Uncertainty in transition probability matrix
Assume that regulatory information between a gene and its regulating genes is missing for a number of genes in the network. In other words, a number of rows in the n conditional probability matrices are unknown. We represent unknown conditional probabilities by a set of random variables θ = (θ 1 , θ 2 , . . . , θ T ). Since each row of the CPM adds up to one, i.e., C j,1 (X i ) + C j,2 (X i ) = 1, there is only one degree of freedom. The uncertainty in the CPMs will eventually show up in the corresponding TPM and thereby can affect the performance of the control policy. Therefore, it is of interest to reduce the uncertainty in the CPMs. We seek an experimental design method that efficiently guides us on which unknown conditional probability to determine first. We need to assign prior distributions to the random variables representing unknown conditional probabilities. Assigning accurate priors is highly challenging. A prior distribution must describe the current state of knowledge regarding the unknown model accurately. It is also desirable that the prior distribution and the posterior distribution, obtained by incorporating data into the prior, belong to the same family of distributions, being referred to as a conjugate prior distribution. Using conjugate prior distributions, we can easily update the priors to the posteriors, which facilitates the computations in a Bayesian setting as it is enough to only keep track of the hyperparameters in the distributions. With this in mind, we utilize the beta distribution as the prior distribution for each unknown parameter θ i . Relative to a random variable θ i , the beta distribution Beta(α i , β i ) with hyperparameters α i and β i is of the following form: where We assume that θ 1 , θ 2 , . . . , θ T are independent and each parameter θ i , 1 ≤ i ≤ T, has a beta distribution Beta(α i , β i ); therefore, the prior distribution of θ = In addition to the set of CPMs, containing unknown conditional probabilities, it is possible that observations from network dynamics in terms of a trajectory X L = {X(0), X(1), . . . , X(L)} of L consecutive state transitions are also available. The state trajectory X L can be utilized as an additional source of information to update the initial beta distributions to the posterior beta distributions. If θ i represents the unknown conditional probability C j,1 (X i ) = Pr X i (t + 1) = 0| X i = j , then given a state trajectory X L the posterior distribution f (θ i |X L ) is again a beta distribution with new hyperparameters where X i (l) denotes the value of gene X i at the l-th state in the trajectory, and 1[·] is the indicator function. In other words, those state transitions in which the event corresponding to the unknown conditional probability θ i occurs can be used to update the information about that unknown probability. Note that X i [ X(l)] = j implies that the equivalent decimal value of the gene expression vector for the regulating genes of X i extracted from network state X(l) should be equal to j. The conditional expectation of Since the uncertainty of an unknown conditional probability is governed by the corresponding terms α , β , and given the fact that observation(s) can potentially increase α's and β's according to (17) and (18), availability of a state trajectory X L is equivalent to a lesser initial uncertainty, and hence a simpler experimental design problem.

Optimal experimental design for determining unknown conditional probabilities
Building on the general MOCU-based experimental design framework in (8)-(14), we propose an experimental design method when dynamical controls characterized by stationary control policies are concerned. A schematic diagram of the proposed experimental design framework is given in Fig. 1. We first assign beta distributions with initial hyperparameters (α i , β i ) to each unknown conditional probability. Then if a state trajectory X L is available as an additional source of knowledge, it is incorporated to update the initial hyperparameters to α i , β i according to (17) and (18). These updated hyperparameters characterize the uncertainty class for finding the best parameter to determine using the proposed MOCU-based framework. When the first experiment is chosen and carried out, its outcome (the true value for the chosen unknown conditional probability) is incorporated in the uncertainty class, leading to a reduced uncertainty class that contains fewer uncertain parameters. If operational resources allow more experiments, this new uncertainty class can be used to find the next parameter for determination (this process can be iterated). Otherwise, the experimental design step is finished and the reduced uncertainty class is used to derive the IBR control policy based on which control actions at each time step are applied to the underlying true network. As (14) suggests, in order to implement the experimental design, we need to derive IBR interventions. Therefore, we first focus on explaining how an IBR control policy can be derived. Considering an uncertainty class of TPMs, relative to an initial state X 0 , the average where ] is the expectation over both within-model stochasticity and model uncertainty. For initial state X 0 , the optimal average cost is defined as and the minimum is attained by the IBR control policy μ (X 0 ) = μ 1 (X 0 ), μ 2 (X 0 ), . . . . This control problem can be transformed into a dynamic programming problem of the following form for each i ∈ S and t ≥ 0: We call E θ P θ ij (c) the effective controlled transition probability matrix. It is obtained similarly to P ij (c) by plugging is obtained as where Pr θ [·] is the probability operator relative to θ and E θ [·] is taken relative to the updated prior (posterior) distribution f (θ|X L ). The ETPM P represents the uncertainty class of TPMs and enables finding the IBR control policy μ for an uncertainty class of TPMs in the same way that the optimal control policy for a known TPM is found. Since the θ i 's are independent, the expectation can be brought inside the product. Each conditional probability term in (23) is either known, whose known value is used for multiplication, or is unknown and corresponds to an unknown parameter θ i whose expected value obtained according to (19) is used in the multiplication. The dynamic formulation in (22) is similar to the dynamic programming used for optimal control except that the known TPM has been replaced by E θ P θ ij (c) ; therefore, a similar approach used for solving optimal control dynamic programming can be utilized here with the distinction that all theorems should be defined relative to ETPM. Keeping this in mind, we define a mapping TJ : S → R for a bounded function J : S → R and ∀i ∈ S as The following three theorems, whose proofs are similar to those in [40] relative to a known TPM, lay out the theoretical foundation for finding the IBR control policy.
Theorem 1 (Convergence of the algorithm) Letting J : S → R be a bounded function, for any i ∈ S, the optimal average cost function J (i) satisfies J (i) = lim M→∞ T M J(i).

Theorem 2 (Bellman's optimality equation)
The optimal average cost function J satisfies

Theorem 3 (Necessary and sufficient condition) A stationary policy μ is an IBR control policy if and only if for each i ∈ S, μ (i) attains the minimum in Bellman's optimality equation.
Based on Theorem 1, J can be computed recursively using the value iteration algorithm in the same way that this algorithm is used to find the optimal control policy for a known TPM. The converged cost satisfies Bellman's optimality equation (Theorem 2). Also, the corresponding policy is a stationary IBR control policy (Theorem 3). μ attains the minimum in the Bellman's optimality equation, where the ordinary TPM is replaced by the ETPM.
The concept of effective quantities has also been used for deriving IBR operators in other problems: for example in [7], effective noise statistics are used to derive the IBR Kalman filter; or in [8], effective covariance matrix is used for achieving IBR signal compression.
To define the experimental design problem in the context of the framework laid out in (8)- (14), let the class of interventions be the set of all admissible control policies . Each ψ ∈ is characterized by a control policy μ and the cost of intervention is where J θ μ (X 0 ) is obtained according to (5) with E[·] relative to the probability measure defined by the TPM P θ . To find a single value as the cost of a control policy for a specific TPM, in (26), we take the expectation over all possible initial network states X 0 , assuming that the possible initial states are equally likely. Regarding the IBR intervention, we find a single value as the average cost of the IBR intervention: where J (X 0 ) is obtained using (25). The definitions of cost and intervention in (26) and (27) set the stage for objective-based uncertainty quantification in the context of dynamical control according to (10). After defining MOCU, the MOCU-based experimental design framework can be used and the the primary parameter θ i * can be found by plugging (27) in (14): where the IBR control policy for the reduced uncertainty class | θ i = θ i is found using the ETPM P |θ i =θ i obtained relative to the conditional probability distribution f θ|θ i = θ i .
According to (28), to evaluate the determination of each unknown parameter θ i , for each realization θ i of θ i , we need to obtain the average cost of the IBR control policy μ |θ i =θ i across the reduced uncertainty class | θ i = θ i and then take the average of all these average costs relative to the marginal distribution of parameter θ i . In practice, the expression in (28) is approximated via Monte-Carlo simulations. We draw a number of samples from the marginal distribution of θ i and then approximate the expression being minimized in (28) as the average of all inner expectations computed for each generated sample. The steps required for obtaining the primary parameter θ i * are summarized in Algorithm 1. The inputs to this algorithm are n CPMs characterizing the GRN, T unknown parameters θ i corresponding to unknown conditional probabilities, hyperparameters (α i , β i ) for the prior beta distributions, the state trajectory X L , and ζ , r, and I, which determine the discount factor, cost function, and the number of iterations for value iteration, respectively.
Finding IBR control policies for an uncertainty class (like finding the optimal control policy for a known TPM [41]) is computationally expensive and the complexity grows exponentially with the number of genes. Therefore, most of the computational burden of the experimental design is in finding IBR control policies. To mitigate the complexity of the proposed method, in the next section we propose an approximate method for computing IBR control policies.

Approximate experimental design based on MFPT
The mean first passage time (MFPT) from state i to j measures how long it would take on average that the network transitions from state i to state j.
For a Markovian GRN, if the sets of desirable states D and undesirable states U are determined, we can have the following partitioning for the TPM: where P S 1 ,S 2 involves the transition probabilities from each state in the set S 1 to the states in set S 2 . The vectors K D,U and K U ,D of MFPTs from each state in D to U and from each state in U to D, respectively, can be computed as [30] K D,U = e + P D,D K D,U where e is an all-unity column vector of appropriate size. If g is the control gene andX g is the flipped state corresponding to state X obtained by flipping g in state X, then to find the MFPT-based stationary control policy μ MFPT P : S → C, the control action for each desirable state X ∈ D is obtained as [31] μ MFPT and for each undesirable state X ∈ U as where in (32) and (33) is a tuning parameter that should be adjusted based on the definition for the cost function r(i, j, u).
In the spirit of the MFPT-based approximation for optimal control, we approximate the IBR control policy needed in experimental design via MFPT. Taking into account that the IBR control policy μ is in fact the optimal control policy relative to the ETPM and that MFPT can be used as an approximation for the optimal control policy, we approximate the IBR control policy by finding the MFPT-based control policy relative to the ETPM and denote it by μ MFPT , i.e., μ MFPT P is obtained by solving (29)-(33) for the effective transition probability matrix P . When we approximate the IBR control policy via MFPT in experimental design, the average cost needed in (28) is computed via Monte-Carlo simulations (over different initial network states) as where N is the total number of simulations and r (n) (·) is the accrued total discounted cost in the n-th simulation. The pseudo-code for the approximate method is the same as Algorithm 1 except for steps 19  Having used the MFPT approach for the sole purpose of reducing the uncertainty class, the IBR control policy is obtained by solving Bellman's equation using value iteration.

Computational complexity analysis
The computationally demanding step in the proposed experimental design method is to find IBR control policies. If there are T different unknown parameters and we generate M different samples for Monte-Carlo simulations for each one, then we need to find IBR control policies Algorithm 1 Optimal experimental design for k = 1 : n do 12: if Pr X k = m k X k [ l] is known then 13: 14: else 15: if Pr X k = m k X k [ l] is unknown and corresponds to θ i then 16: if Pr X k = m k X k [ l] is unknown and corresponds to θ l , l = i then 18: {J(X 0 )} 2 n X 0 =1 ← value iteration(P |θ i , I, r, ζ ) 20: 21: In this paper, we focus on binary networks and binary control actions, i.e., |S| = 2 n , n being the number of genes and C = 2. Therefore, the order of complexity when experimental design based on the IBR control policy is implemented is O T × M × (2 n+1 ) I . The complexity grows exponentially with the number of genes and polynomially with the number of unknown parameters. The complexity of the approximate experimental design is much lower because there is no iterative calculation in MFPT. It is enough to solve the two linear equations in (30) and (31), which involves two matrix inversions. Although applying MFPT for experimental design requires us to find the average cost of the MFPTbased IBR control policy via Monte-Carlo simulations, this overhead complexity for MFPT is not concerning and still the complexity of the MFPT-based approach is much smaller in comparison to that of the optimal experimental design. Since the calculations for each unknown parameter and each realization of that parameter can be done independently, a parallel implementation can be used for the proposed experimental design methods.
In Fig. 2, we provide run times required for finding the primary parameter among 5 unknown parameters for GRNs with different numbers of genes. The codes are scripted in MATLAB and run in a parallel framework on a Machine with an Intel ® quad-core 2.67 GHz CPU and 12 GB RAM. The number of iterations for value iteration is I = 4. While the execution time grows exponentially with the number of genes and the runs may be prohibitively time-consuming beyond six genes for the optimal experimental design, the MFPT-based approximate method can be still implemented for networks of larger size.

Results
In this section, we study the performance of the proposed methods based on synthetic and real gene networks. As a class of Markovian regulatory network, we consider Boolean networks with perturbation for the simulations.

Boolean networks with perturbation
An n-gene Boolean network is defined by a set of binary variables V = {X 1 , X 2 , .., X n }, and a set of Boolean func- determines the value of gene X i when it has k i regulating genes. The transition rule X(t + 1) = F(X(t)) governs the evolution of states over time. In a Boolean network with perturbation (BNp), each gene may flip its value with a small perturbation probability p. In this network, the next state at time t + 1 is F(X(t)) with probability (1 − p) n or F(X(t)) ⊕ γ with probability 1 − (1 − p) n , where γ is a binary vector of size n and ⊕ is the component-wise addition modulo 2 operator. The underlying state evolution of a BNp over time can be viewed as a Markov chain with a transition probability matrix P. The TPM can be derived using the regulatory structure of the network and the perturbation probability [22]. When p > 0 the Markov chain is guaranteed to possess a unique steady-state distribution π .

Synthetic networks
We first randomly generate a number of BNps and then from the corresponding TPMs we extract the set of conditional probabilities for each gene in the network. We consider BNps with 6 genes. The number of regulating genes for each gene is set to 2 and they are randomly selected from the set of genes. Therefore, the size of the CPM for each gene is 4 × 2. The bias (probability) that a Boolean function takes on the value 1 is randomly selected from a beta distribution with variance 0.0001 and mean 0.5. The perturbation probability p is set to 0.01. We use this protocol to generate 100 random BNps from which we generate 100 different sets of CPMs.
The conditional probability C j,1 (X i ) = Pr[X i = 0| X i = j] characterizing the regulation of gene X i is obtained from the generated TPM P as where X i [S] = j and S i = 0. In other words, to find the conditional probability for gene X i being down regulated when the equivalent decimal value of its regulating genes X i is j, we look for the row in the TPM in which X i is j and then in that row we take the summation of all TPM entries corresponding to gene X i equal to 0. Similarly, we extract C j,2 (X i ) = Pr[X i = 1| X i = j] from the generated TPM as where X i [S] = j and S i = 1. Since more than one row in a TPM might correspond to X i = j, in order to have a consistent procedure for extracting conditional probabilities, we take the average of all the values found for the rows corresponding to X i = j.
To define the control problem, we assume that states with down-regulated genes X 1 and X 2 are undesirable, i.e., U = {1, . . . , 16}. The control gene whose expression can be flipped via control actions is X 6 . We use the following cost function for the simulations: i fj ∈ U and c = 1 5 i fj ∈ U and c = 0 1 i fj ∈ D and c = 1 0 i fj ∈ D and c = 0 . (38) This cost function reflects penalties assigned to undesirable states and also to the transitions to which the control action is applied. The discount factor ζ is set to 0.2. The tuning parameter for the MFPT method is set to = 0.3. We use value iteration with 4 iterations to find the control policies in the optimal design method and for evaluating chosen experiments. All initial beta distributions for unknown conditional probabilities θ i in the network are Beta (1, 1), a uniform distribution. We run simulations for different numbers L of initial data used for updating priors.
In the first set of simulations, we generate 100 synthetic BNps. After extracting corresponding conditional probabilities for each network, we randomly select 5 conditional probabilities in each network and assume they are unknown. The aim is to decide which unknown conditional probability should be determined first. For each network, we generate a state trajectory X L = {X(0), . . . , X(L)}, used for updating initial hyperparameters, by simulating the underlying true network.
In the simulations, when we want to evaluate the determination of an unknown probability θ i , we put back its true value φ i , which was discarded during experimental design calculations, in the network, thereby resulting in a new uncertainty class |(θ i = φ i ) of remaining unknown probabilities. We find the IBR control policy μ |(θ i =φ i ) for this new uncertainty class by solving Bellman's equation relative to P |(θ i =φ i ) . We then apply μ |(θ i =φ i ) to the underlying true network and run the controlled network until the horizon length 6 according to the underlying true TPM and μ |(θ i =φ i ) , and record the cost at each time based on the network state at that time and the cost function r(i, j, c) in (38). Then we compute the total discounted cost over the horizon by accumulating the costs incurred over the horizon length according to the discount factor ζ . We repeat this process of calculating the total discounted cost for 10,000 iterations over different network initial states X 0 and state transition paths. Note that although the underlying controlled TPM is fixed, there are still different state transition paths over the horizon due to the randomness characterized by the TPM. We represent the cost corresponding to determining parameter θ i as the average of all 10,000 total discounted costs and denote it by J(θ i ). For comparing different experimental design approaches, we report the average of J(θ i ) over 100 generated synthetic networks and 100 different sets of assumed true values for the unknown probabilities in each network drawn from the beta prior distributions.
Using either optimal or approximate experimental design methods we can rank potential experiments E 1 up to E 5 from the optimal experiment being denoted by E 1 (obtained according to (28)) to the least optimal experiment denoted by E 5 , which corresponds to the maximum value of the expression being minimized in (28). In Table 1, for different lengths L of the trajectory data used for updating priors, we rank experiments based on both experimental design methods and show the average cost J(θ i ), 1 ≤ i ≤ 5, obtained after conducting experiment E i . This table suggests that the average cost obtained after conducting experiments with higher priority is smaller. Also, although the approximate experimental design method based on MFPT has much lower complexity, its performance is close to that of the optimal method. Note that the average cost obtained after high ranked experiments is lower when they are chosen by the optimal method but as we go towards low priority experiments the performance of the approximate method becomes better. This is because the optimal method yields Table 1 Comparison of the ranked experiments according to the optimal and approximate methods a better ranking compared to the approximate method and more experiments resulting in lower average cost are given high priority in the optimal method. Another observation from the table is that, as we use more data for updating prior distributions, the difference between the performances of the different experiments gets smaller. For example, the difference between the average costs of E 1 and E 5 is larger when no data are used for the prior update than the case that the initial data X L of length L = 50 are used for the prior update. This is because by using more data in the prior update step the posterior distribution becomes tighter around the true model and less uncertainty remains in the model. Let J(θ opt ), J(θ approx ), and J(θ rnd ) be the costs corresponding to the determination of the unknown probability chosen by the optimal method, the approximate method, and randomly, respectively. Table 2 shows the average of these costs over different networks and assumed true values. For different L, both optimal and approximate methods provide close performance and clearly outperform the random selection policy.
When comparing the optimal experiment E 1 with an experiment E i , i = 1 (when experiments are ranked based on either optimal or approximate method), we say that a success occurs if J(θ 1 ) − J(θ i ) < −0.002, a failure happens if J(θ 1 ) − J(θ i ) > 0.002, and a tie corresponds to |J(θ 1 ) − J(θ i )| < 0.002. Table 3 shows the ratio of success, failure, and tie for both methods and different L. Regardless of the experimental design approach, the ratio of success is always higher than the ratio of failure and gets larger when we compare the optimal experiment E 1 with lowest ranked experiments. Note that the ratio of tie increases for larger values of L because a tighter prior leads to closer experimental performance. Now, we evaluate the experimental design methods for a sequence of experiments. At each step in the sequential experiments, we choose experiment E i * based on the experimental design method. After incorporating the true value φ i * of the corresponding unknown probability θ i * in the model, we compute the cost J(θ i * ). The distribution for the new uncertainty class |(θ i * = φ i * ) is the product of the beta distributions for the remaining unknown probabilities as we assume that all unknown probabilities are statistically independent. This distribution is used as the new prior distribution to find the next best experiment. This process continues until all unknown parameters are estimated and the underlying true network model is fully identified. Figure 3 presents the average cost over 50 different 6-gene networks and 100 different sets of assumed true values for optimal experimental design, approximate experimental design, and the random selection policy when there are T = 5 unknown probabilities and no initial data is used for updating priors, i.e., L = 0. Since the first data-point corresponds to the cost before any experiment and the final point corresponds to the cost after conducting all experiments, they are the same for all three curves. However, the decrease in the cost obtained by either experimental design method is faster in comparison to that of the random policy. Figure 4 compares approximate experimental design with the random selection policy when the network is of size n = 9 and there are T = 8 unknown probabilities. For this network size and number of unknown probabilities, the computational burden of optimal experimental design is prohibitively large. Therefore, we only implement the approximate method and the random selection policy. Recall that although we use the MFPT-based approximate approach for the experimental design step, the robust control policies after each experiment are still obtained by solving Bellman's equation using value iteration. We see the promising performance of the approximate method in this figure. By following the approximate method, after conducting only four experiments the optimal cost is almost reached.

Real network example: TP53 pathways
In this section, we consider the set of pathways involving the TP53 gene as shown in Fig. 5 [42]. TP53 is a tumor suppressor playing a major role in cellular activities in response to stress signals such as DNA damage. When DNA damage occurs, a mutant TP53 may lead to the abundance of abnormal cells, which eventually results in tumors. For example, it has been observed that mutated TP53 is present in 30 to 50% of human cancers [43]. In normal conditions, TP53 remains low-expressed under the control of MDM2, which is an oncogene often highly expressed in tumor cells. We model the pathways shown in Fig. 5 via a BNp with perturbation probability p = 0.01. Six nodes DNA DSBs, MDM2, TP53, WIP1, CHK2, and ATM are named X 1 up to X 6 , respectively. DNA DSBs is a signal that indicates the existence of double stand breaks. The dynamics of the network are governed via majority vote rule for which a regulatory matrix R defining the regulatory interactions between genes is defined as R ij = ⎧ ⎨ ⎩ 1 activating relation from j to i −1 suppressive relation from j to i 0 no relation from j to i . Table 3 The percentage of success, failure, and tie for performing the chosen experiment rather than the suboptimal experiments Using matrix R, the value of gene X i is updated as In Fig. 5, blunt arrows represent suppressive regulations and normal arrows represent activating regulations. It has been observed that in the presence of DNA damage (X 1 = 1), up-regulated MDM2 (X 2 = 1) and down-regulated TP53 (X 3 = 0) would lead to cancerous cells [15,44,45]. Therefore, the set of undesirable states U includes those states with X 1 = 0, X 2 = 1, and X 3 = 0, i.e., U = {48, . . . , 55}. The cost function r(i, j, c) is the same as the Fig. 3 Performance evaluation of different experimental design approaches for a sequence of experiments. The size of initial data used for updating priors is L = 0 one given in (38). We also use gene ATM as the control gene.
After building the BNp model from pathways, we extract the CPMs based on the procedure explained for simulations on synthetic networks. Since the network is fixed in this example, we randomly select 10 different sets of 5 conditional probabilities and assume that they are unknown. We run the experimental design simulations for 100 different assumed true values for each set of unknown probabilities. The length of X L used for updating beta priors is L = 5. Figure 6 illustrates the average cost obtained (over 1,000 different simulations) after each experiment in a sequence of experiments when optimal experimental design, approximate experimental design, or the random Fig. 4 Performance evaluation of the approximate experimental design method and random selection policy for networks with 9 genes and 8 unknown probabilities. The length of X L is L = 5

Fig. 5
Regulatory relationships between genes in a signal pathway regulating the TP53 gene [42] selection is employed. Better performance of both proposed approaches in comparison to the random selection policy is obvious from this.

Real network example: mammalian cell cycle
As another example of real gene regulatory networks, consider the 9-gene mutated cell cycle network model. Cell cycle is a tightly controlled process initiated only in response to external stimuli, such as growth factors, under normal situations. A regulatory model containing 10 genes is proposed for the normal cell cycle in [46]. These 10 genes are the genes in Table 4 along with gene p27. A permanently down-regulated gene p27 in the cell cycle network results in a mutated cell cycle network consisting of 9 genes. The Boolean functions for the mutated network are summarized in Table 4 [46]. We use these Boolean functions and build a BNp model with perturbation probability p = 0.01. The index of each gene in the BNp model is given in the table. In this network, if both Cyclin D (CycD) and retinoblastoma (Rb) are downregulated, then cell cycle continues even in the absence of stimuli, thereby leading to the growth of tumors. Hence, states with down-regulated CycD (X 1 = 0) and downregulated Rb (X 2 = 0) are undesirable. To define a control problem, we use the cost function in (38) and choose gene CycA as the control gene.
Due to the size of the mammalian cell cycle, the optimal experimental design is not applicable. Therefore, for this network, we compare the approximate method and the random selection policies. The simulation settings are

Discussion
An inherent problem of dealing with stationary control policies in a Markovian network is computational complexity, which is due to the exponential increase of the number of states with the network size. We have been able to mitigate the complexity of the experimental design and thereby push the size limit (as demonstrated in Fig. 2) by proposing an approximate experimental design approach based on mean first passage time. However, we believe that more complexity reduction should be achieved for addressing experimental design for extremely Table 4 The set of Boolean functions for a mutated cell cycle [46] Gene Node Boolean function Cdh1 X 6 (X 9 ∧ X 8 ) ∨ X 5 UbcH10 X 7 X 6 ∨ (X 6 ∧ X 7 ∧ (X 5 ∨ X 9 ∨ X 8 )) CycB X 8 (X 5 ∧ X 6 ) CycA X 9 (X 3 ∧ X 2 ∧ X 5 ∧ (X 6 ∧ X 7 )) ∨(X 9 ∧ X 2 ∧ X 5 ∧ (X 6 ∧ X 7 ))

Fig. 7
Performance evaluation of the approximate experimental design for a sequence of experiments based on the mutated mammalian cell cycle model large gene networks. Considering these intrinsic computational issues, we plan to find more efficient approximations and investigate accelerated implementation of the method via efficient computer architecture platforms, such as Graphic Processing Unit (GPU). Another consideration is the accuracy of the prior distributions used in the experimental design calculation. The performance of the experimental design depends on the degree to which the prior probabilities can describe the existing knowledge regarding uncertain parameters accurately. The problem of finding optimal prior probabilities in this context can be solved under a prior construction optimization framework, which involves constructing a mapping that transforms signaling relations to constraints on the prior distribution. Constructing optimal priors has been done for genomic classification [14,47]. In future work, we aim to develop an optimization framework to address prior construction for experimental design in gene regulatory networks.

Conclusions
Given the complexity of biological systems and the cost of experiments, experimental design is of great practical significance in translational genomics. In this paper, we address the problem of optimal experimental design for gene regulatory networks controlled with stationary control policies. The proposed experimental design framework is based on the notion of mean objective cost of uncertainty, which views model uncertainty in terms of the increased cost it induces. Future work includes further reducing the computational cost of the method and also designing optimization frameworks for constructing optimal prior distributions. Also, another avenue of research is to implement an integrative experimental design method that can utilize the RNA-Seq data for the genes on the same pathway [48] for optimal uncertainty reduction.