Intrinsically Bayesian robust classifier for single-cell gene expression trajectories in gene regulatory networks

Karbalayghareh, Alireza; Braga-Neto, Ulisses; Dougherty, Edward R.

doi:10.1186/s12918-018-0549-y

Research
Open access
Published: 21 March 2018

Intrinsically Bayesian robust classifier for single-cell gene expression trajectories in gene regulatory networks

Alireza Karbalayghareh¹,
Ulisses Braga-Neto^1,2 &
Edward R. Dougherty^1,2

BMC Systems Biology volume 12, Article number: 23 (2018) Cite this article

2778 Accesses
9 Citations
1 Altmetric
Metrics details

Abstract

Background

Expression-based phenotype classification using either microarray or RNA-Seq measurements suffers from a lack of specificity because pathway timing is not revealed and expressions are averaged across groups of cells. This paper studies expression-based classification under the assumption that single-cell measurements are sampled at a sufficient rate to detect regulatory timing. Thus, observations are expression trajectories. In effect, classification is performed on data generated by an underlying gene regulatory network.

Results

Network regulation is modeled via a Boolean network with perturbation, regulation not fully determined owing to inherent biological randomness. The binary assumption is not critical because the resulting Markov chain characterizes expression trajectories. We assume a partially known Gaussian observation model belonging to an uncertainty class of models. We derive the intrinsically Bayesian robust classifier to discriminate between wild-type and mutated networks based on expression trajectories. The classifier minimizes the expected error across the uncertainty class relative to the prior distribution. We test it using a mammalian cell-cycle model, discriminating between the normal network and one in which gene p27 is mutated, thereby producing a cancerous phenotype. Tests examine all model aspects, including trajectory length, perturbation probability, and the hyperparameters governing the prior distribution over the uncertainty class.

Conclusions

Simulations show the rates at which the expected error is diminished by smaller perturbation probability, longer trajectories, and hyperparameters that tighten the prior distribution relative to the unknown true network. For average-expression measurement, methods have been proposed to obtain prior distributions. These should be extended to the more mathematically difficult, but more informative, expression trajectories.

Background

Phenotype classification is a salient issue for translational genomics, for instance, classification of normal versus cancerous cells, of different stages of tumor development, or different prospective drug response. Both expression microarray and RNA-Seq measurements have received much interest; however, they suffer from a lack of specificity. First, pathway timing is not reflected in the data and, second, expressions are averaged across groups of cells, so that individual cell responses are not detectable. New technologies are being developed for profiling individual cells using RNA-Seq or quantitative PCR [1]. Individual cells can be captured via standard methods, such as flow cytometry, glass capillaries, or laser [2], and be measured at various time points.

Genes have interactions with each other, which can determine how they are behaving over time and define the dynamics of gene regulatory networks (GRNs). One way of showing the dynamics of GRNs over discrete time points is Boolean networks with perturbation (BNp) [3]. A BNp is a Markov chain, in which the state of a gene (0 for off and 1 for on) at the current time is a function of the states of its predictor genes at the previous time plus a small random Boolean noise. Network inference could be done using RNA-Seq time series measurements [4].

Suppose we have single-cell measurements sampled with a sufficient rate to detect regulatory timing. In effect, this would mean that classification would be done on data reflecting an underlying gene regulatory network. In [5] we proposed a classifier to classify the state trajectories of the two classes: wild-type and mutated, each having its own BNp. We derived the Bayes classifier and computed the Bayes error. We analyzed the effects of the length of the trajectories, perturbation probability, and different mutations on the Bayes error.

In [6] and [7] we assumed an observation model on the state dynamics of the BNps, from which the expression values of the genes are obtained. As the parameters of the model were all unknown and the network functions were partially known, we proposed an expectation-maximization (EM)-based algorithm to estimate these parameters and functions, and then plugged them into the Bayes classifier. In [6] we assumed that the expression trajectory data come from single-cell measurements and compared that with a multiple-cell scenario, in which instead of trajectories we have the averaged expressions of all genes over all cells, which translates to an average over all states.

In this paper we extend the single-cell trajectory classification to the Bayesian framework. We propose the intrinsically Bayesian robust (IBR) classifier for the trajectorires. The IBR classifier is a specific type of the obtimal Bayesian classifier (OBC), first introduced in [8, 9] for the classification of static data. In fact, the difference between the OBC and IBR classifiers is that in the OBC the expectation of the class-conditional densities is taken over the posteriors of the parameters to obtain the effective class-conditional densities, whereas in the IBR classifier the expectation is taken over the priors. The IBR/OBC concept has been applied to linear and morphological filtering [10, 11], and IBR Kalman filtering [12]. Regarding the prior distributions, prior construction methods using the pathway knowledge have been studied in the literature, such as [13, 14]. Bayesian methods for finding differentially expressed genes can be used for the aim of classification [15].

Here we apply the IBR classifier to the classification of trajectories. As opposed to [6, 7], where we estimate the parameters, here we assume that the parameters belong to an uncertainty class governed by a prior distribution. We assume that there are two classes: wild-type (S=0) and mutated (S=1). We introduce a Bayesian version of the partially observed Boolean dynamical system (POBDS), proposed in [16], as the observation model. We use a beta prior distribution for the prior probability of the class S=0 and also for the gene perturbation probability. Since the observation model given the states is Gaussian, we employ the normal-gamma distribution as the prior distribution of the mean and precision (inverse of variance) of the Gaussian model.

In the simulation part, we employ a mammalian cell-cycle gene regulatory network [17] consisting of 10 genes as the BNp for the class S=0 and its mutated version as the BNp for the class S=1. We analyze the effects of all the hyperparameters of the model and the length of the observed trajectories on the classification error. The proposed classifier is computationally efficient because we use the sum-product method to reduce the complexity to m×2ⁿ, where m is the length of the observed trajectory and n is the number of genes in the network. Being linearly dependent on time points, m, makes the classifier very fast even for longer trajectories.

Methods

In a Boolean network (BN) for n genes, each gene value x_i∈{0,1}, for i=1,⋯,n, at time k+1 is determined by the values of some predictor genes at time k via a Boolean function f_i:{0,1}ⁿ→{0,1}. In practice, f_i is a function of a small number of genes, K_i, called the in-degree of the gene x_i in the network. The in-degree of the network is K= maxi=1,⋯,nK_i. A gene network can be represented as a graph with vertices representing genes and edges representing regulations. There is a state diagram of 2ⁿ states corresponding to the truth table of the BN, representing the dynamics of the network. Given an initial state, a BN will eventually reach a set of states, called an attractor cycle, through which it will cycle endlessly. Each initial state corresponds to a unique attractor cycle, and the set of initial states leading to a specific attractor cycle is known as the basin of attraction (BOA) of the attractor cycle.

State model

We allow stochasticity in our state model by using Boolean networks with perturbation (BNps) instead of deterministic BNs. For BNps, perturbation is introduced with a probability p_k by which the state of a gene in the network can be randomly flipped at time k. We assume that there is an independent identically distributed (i.i.d.) random perturbation vector at each time k, denoted by n_k∈{0,1}ⁿ, such that the i-th gene flips at time k if the i-th component of n_k is equal to 1. Therefore, the dynamical model can be expressed as

$$ \mathbf{X}_{k+1}=\mathbf{f}(\mathbf{X}_{k})\oplus \mathbf{n}_{k+1}, \quad k=0,1,2,\cdots, $$

(1)

where X_k=[x₁(k),x₂(k),⋯,x_n(k)]^T is a binary state vector, called a gene activity profile (GAP), at time k, in which x_i(k) indicates the expression level of the i-th gene at time k (either 0 or 1); f=[f₁,f₂,⋯,f_n]^T:{0,1}ⁿ→{0,1}ⁿ is the vector of the network functions, in which f_i shows the expression level of the i-th gene at time k+1 when the system lies in the state X_k at time k; n_k=[n₁(k),n₂(k),⋯,n_n(k)]^T is the perturbation vector at time k, in which n₁(k),n₂(k),⋯,n_n(k) are i.i.d. Bernoulli random variables for every k with the parameter p_k=P(n_i(k)=1) for i=1,⋯,n; and ⊕ is component-wise modulo 2 addition.

The existence of perturbation makes the corresponding Markov chain of a BNp irreducible. Hence, the network possesses a steady-state distribution π describing its long-run behavior. If p_k is sufficiently small, π will reflect the attractor structure within the original network. We can derive the transition probability matrix (TPM) if we know the truth table and the perturbation probability of a BNp. As a result, the steady-state distribution π can be computed as well.

Prior for state parameter

In this paper, we assume that we know the underlying Boolean networks for both the wild-type and mutated classes, and the only uncertain parameter at the state level is the perturbation probability p_k. Since 0<p_k<1, we can employ the beta prior for p_k, for all k=1,2,⋯, as

$$ g(p_{k}) \sim \text{Beta}(a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} p_{k}^{a-1} (1-p_{k})^{b-1}, $$

(2)

where a and b are known parameters. Since in reality p_k is close to zero, we can choose a and b in such a way that this fact is satisfied. To this end, we can use the mean and variance of p_k:

$$ \mathrm{E}\left[p_{k}\right]=\frac{a}{a+b}, \quad \text{Var}\left[p_{k}\right] = \frac{ab}{(a+b)^{2}(a+b+1)}. $$

(3)

Observation model

We define a Bayesian partially-observed Boolean dynamical system (BPOBDS) as the model for the gene expression data. In this model, we assume that the gene expressions come from Gaussian distributions whose parameters are governed by prior distributions whose parameters (the hyperparameters of the observations) are a function of the hidden Boolean states. If y_j(k) is the expression value of the j-th gene at time k, then the observation model is

$$ p\left(y_{j}(k)|\theta_{j}(k),\lambda_{j}(k)\right) \sim \mathcal{N}\left(\theta_{j}(k),\lambda_{j}(k)^{-1}\right), $$

(4)

for j=1,2,⋯,n and k=1,2,⋯, where θ_j(k) and λ_j(k) denote the mean and precision, respectively, of the Gaussian distribution.

Priors for observation parameters

We employ the well-known normal-gamma prior distribution for θ_j(k) and λ_j(k):

$$\begin{array}{@{}rcl@{}} p(\lambda_{j}(k)) &\sim& \text{Gamma}(\alpha_{0},\beta_{0}), \\ p(\theta_{j}(k)| \lambda_{j}(k), x_{j}(k)) &\sim& \mathcal{N} \left(\mu_{j}(k),(\kappa_{0} \lambda_{j}(k))^{-1}\right), \\ \text{where} ~~~ \mu_{j}(k) &=& \mu_{0} + \delta_{0} x_{j}(k), \end{array} $$

(5)

where α₀, β₀, κ₀, μ₀, and δ₀ are known positive hyperparameters, and x_j(k) is the hidden Boolean state of gene j at time k. The intuition behind the prior (5) is that when gene j at time k is on or off, that is, x_j(k)=1 or 0, the hyper-mean of the expression for that gene is μ_j(k)=μ₀+δ₀ or μ_j(k)=μ₀, respectively, at time k. In (5), μ₀ is the baseline expression level and δ₀ is the expression coefficient. The hyperparameters α₀, β₀, and κ₀ determine the level of uncertainty, by which we can control the variance of the outputs. We assume the same values of hyperparameters for all genes at all times.

IBR classifier

If one knows the feature-label distribution, then the error of any classifier can be found and an optimal (Bayes) classifier minimizes classifier error. If the feature-label distribution is unknown but belongs to an uncertainty class Θ of feature-label distributions, then we desire a classifier to minimize the expected error over the uncertainty class. Given a classifier ψ, from the perspective of mean-square error (MSE), the best error estimate minimizes the MSE between its true error (a function of parameter θ) and an error estimate. This Bayesian minimum-mean-square-error (MMSE) estimate is given by the expected true error, $\widehat {\varepsilon }(\psi)=\mathrm {E}_{\theta }\left [\varepsilon (\psi,\mathbf {\theta })\right ]$, where ε(ψ,θ) is the error of ψ on the feature-label distribution parameterized by θ and the expectation is taken relative to the prior distribution π(θ) [18].

An IBR classifier minimizes the Bayesian MMSE estimate. If ψ(x)=0 if x∈R₀ and ψ(x)=1 if x∈R₁, where x is a multidimensional vector of data, and R₀ and R₁ partition the sample space, then [8]

$${} \widehat{\varepsilon }\left(\psi \right) =\mathrm{E}_{\pi}[\!c]\int_{R_{1}}f_{\Theta}\left(\mathbf{x}|0\right) d\mathbf{x}+\left(1-\mathrm{E}_{\pi }[\!c]\right)\int_{R_{0}}f_{\Theta}\left(\mathbf{x}|1\right) d\mathbf{x}, $$

where

$$f_{\Theta}\left(\mathbf{x}|y\right) =\int_{\mathbf{\Theta }_{y}}f_{\mathbf{\theta}_{y}}\left(\mathbf{x}|y\right) \pi \left(\mathbf{\theta}_{y}\right) d\mathbf{\theta }_{y} $$

is the effective class-conditional density for class y, Θ_y being the space for θ_y, $f_{\mathbf {\theta }_{y}}\left (\mathbf {x}|y\right)$ is the class-conditional density, and c is the prior probability of the class 0. The IBR classifier is given by [8]

$$ \psi_{\text{IBR}}\left(\mathbf{x}\right) =\left\{ \begin{array}{ll} 0 \quad \text{if ~~ }\mathrm{E}_{\pi }[\!c]f_{\Theta}\left(\mathbf{x}|0\right) \geq \\ \qquad \quad \left(1-\mathrm{E}_{\pi }[\!c]\right)f_{\Theta}\left(\mathbf{x}|1\right) & \\ 1 \quad \text{otherwise} \end{array} \right.. $$

(6)

Trajectory-based IBR classifier

Let Θ_s=[p_2:m,θ_1:n(1:m),λ_1:n(1:m)] denote the parameters of the class S=s, for s=0,1, where p_2:m means the parameters p₂,p₃,⋯,p_m, and similarly for θ_1:n(1:m) and λ_1:n(1:m). Furthermore, let f_s denote the Boolean network function of the class S=s and $\mathcal {X}=\left [\mathbf {X}_{1},\mathbf {X}_{2},\cdots,\mathbf {X}_{m}\right ]$ denote the Boolean state trajectory at m consecutive times points at which $\mathcal {Y}$ has been observed. The n×1 Boolean vector X_k=[x₁(k),x₂(k),⋯,x_n(k)]^T has the states of the n genes at time k, which are hidden and not observed.

Suppose that we obtain the expressions of n genes at m consecutive time points. Let Y_k=[y₁(k),⋯,y_n(k)]^T denote the n×1 expression vector of n genes at time k, and $\mathcal {Y}=\left [\mathbf {Y}_{1},\cdots,\mathbf {Y}_{m}\right ]$ denote a time trajectory of length m, containing the expression vectors at the m consecutive times. The problem is to optimally classify this observed trajectory $\mathcal {Y}$ to the class 0 (wild-type) or class 1 (mutated). Let c and 1−c be the prior probabilities of the class S=0 and class S=1, respectively. Since we are uncertain about c, we use a beta prior

$$ g(c) \sim \text{Beta}\left(a_{c},b_{c}\right) = \frac{\Gamma\left(a_{c}+b_{c}\right)}{\Gamma(a_{c})\Gamma(b_{c})} c^{a_{c}-1} (1-c)^{b_{c}-1}, $$

(7)

with mean

$$ \mathrm{E}[\!c]=\frac{a_{c}}{a_{c}+b_{c}}, $$

(8)

where a_c and b_c are known parameters.

According to (6), the IBR classifier for the trajectories is

$$ \psi_{\text{IBR}}(\mathcal{Y})=\left\{ \begin{array}{ll} 0 \quad \text{if} ~~ \mathrm{E}[\!c]p(\mathcal{Y}|S=0)\geq \\ \hspace{1cm} (1-\mathrm{E}[\!c])p(\mathcal{Y}|S=1) & \\ 1 \quad \text{otherwise} \end{array},\right. $$

(9)

where $p(\mathcal {Y}|S=s)$ is the effective class-conditional density of the trajectory $\mathcal {Y}$ in the class S=s for s=0,1.

Effective class-conditional densities of trajectories

The joint distribution of $\mathcal {Y}$, $\mathcal {X}$, and Θ_s given the class S=s can be factorized as

$$ \begin{aligned} p\left(\mathcal{Y},\mathcal{X},\Theta_{s} | S=s\right) &= g(p_{2:m}) P\left(\mathcal{X}|p_{2:m}, S=s\right) \\ & \quad \times p\left(\mathcal{Y}| \theta_{1:n}(1:m),\lambda_{1:n}(1:m)\right) \\ & \quad \times p\left(\theta_{1:n}(1:m), \lambda_{1:n}(1:m)|\mathcal{X}\right). \end{aligned} $$

(10)

In deriving (10), it is assumed that the state parameters {p_2:m} are independent of the observation parameters {θ_1:n(1:m),λ_1:n(1:m)}. Due to the independence assumption in the priors,

$$ \begin{aligned} g(p_{2:m}) &= \prod_{k=1}^{m-1} g(p_{k+1}) \\ &= \prod_{k=1}^{m-1} \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} p_{k+1}^{a-1} (1-p_{k+1})^{b-1}, \end{aligned} $$

(11)

$$ \begin{aligned} &p\left(\theta_{1:n}(1:m), \lambda_{1:n}(1:m)|\mathcal{X}\right) \\ &= \prod_{k=1}^{m} \prod_{j=1}^{n} p(\theta_{j}(k) | \lambda_{j}(k),x_{j}(k)) p(\lambda_{j}(k))\\ &= \prod_{k=1}^{m} \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi (\kappa_{0} \lambda_{j}(k))^{-1}}} \frac{\beta_{0}^{\alpha_{0}}}{\Gamma(\alpha_{0})} \\ & \times \exp \left(-\frac{(\theta_{j}(k)-\mu_{j}(k))^{2}}{2(\kappa_{0} \lambda_{j}(k))^{-1}}\right) \lambda_{j}(k)^{\alpha_{0}-1} \exp(-\beta_{0} \lambda_{j}(k)). \end{aligned} $$

(12)

If we assume that the conditional expressions, given the parameters, of the n genes at m time points are independent, the likelihood of $\mathcal {Y}$ given the parameters in (10) can be written as

$$ \begin{aligned} &p\left(\mathcal{Y}| \theta_{1:n}(1:m),\lambda_{1:n}(1:m)\right) \\ &= \prod_{k=1}^{m}\prod_{j=1}^{n} p\left(y_{j}(k) | \theta_{j}(k),\lambda_{j}(k)\right)\\ &= \prod_{k=1}^{m}\prod_{j=1}^{n} \frac{1}{\sqrt{2\pi \lambda_{j}(k)^{-1}}} \exp \left(-\frac{(y_{j}(k)-\theta_{j}(k))^{2}} {2 \lambda_{j}(k)^{-1}} \right). \end{aligned} $$

(13)

We should note that the independency assumption in (13) is only for the observations, whereas the genes have interactions at the state level, following the underlying Boolean network, so that they cannot be considered independent. Due to the Markov property in (1), $p\left (\mathcal {X}|p_{2:m},S=s\right)$ in (10) can be factored as

$$ \begin{aligned} P(\mathcal{X} | p_{2:m}, S&=s) = \\ P(\mathbf{X}_{1}| S&=s) \prod_{k=1}^{m-1} P(\mathbf{X}_{k+1} | \mathbf{X}_{k}, p_{k+1}, S=s), \end{aligned} $$

(14)

where P(X_k+1|X_k,p_k+1,S=s) is the probability of transitioning from state X_k at time k to state X_k+1 at time k+1, given the perturbation probability p_k+1, in the class S=s, and P(X₁|S=s) is the probability of the first state X₁ in the class S=s. Let xⁱ denote the n×1 Boolean vector of the state i, for i=1,2,⋯,2ⁿ. Given the perturbation probability p_k+1, the conditional transition probability matrix (TPM) at time k+1, which is a 2ⁿ×2ⁿ matrix, in the class S=s can be derived from (1) as

$$ \begin{aligned} \mathbf{A}^{(s)}_{i,j}(k+1) &= P\left(\mathbf{X}_{k+1}=\mathbf{x}^{j} | \mathbf{X}_{k}=\mathbf{x}^{i}, p_{k+1}, S=s\right)\\ &= p_{k+1}^{\mathbf{d}\left(\mathbf{x}^{j},\mathbf{f}_{s}\left(\mathbf{x}^{i}\right)\right)}(1-p_{k+1})^{n-\mathbf{d}\left(\mathbf{x}^{j},\mathbf{f}_{s}\left(\mathbf{x}^{i}\right)\right)}, \end{aligned} $$

(15)

where d(x^j,f_s(xⁱ)) is the Hamming distance between the two Boolean vectors x^j and f_s(xⁱ).

For obtaining $p(\mathcal {Y} | S=s)$, we need to integrate out the joint distribution $p(\mathcal {Y},\mathcal {X},\Theta _{s} | S=s)$ in (10) with respect to Θ_s and $\mathcal {X}$:

$$ {{\begin{aligned} p\left(\mathcal{Y} | S=s\right) &= \sum_{\mathcal{X}} \int_{\Theta_{s}} p\left(\mathcal{Y},\mathcal{X},\Theta_{s} | S=s\right) \\ &= \sum_{\mathcal{X}} \left\lbrace P(\mathbf{X}_{1} | S=s) \prod_{k=1}^{m-1} \int_{p_{k+1}} g(p_{k+1}) p_{k+1}^{\mathbf{d}\left(\mathbf{X}_{k+1},\mathbf{f}_{s}\left(\mathbf{X}_{k}\right)\right)} \right. \\ & \quad\times (1-p_{k+1})^{n-\mathbf{d}\left(\mathbf{X}_{k+1},\mathbf{f}_{s}\left(\mathbf{X}_{k}\right)\right)} d p_{k+1} \\ & \quad\times \prod_{k=1}^{m} \prod_{j=1}^{n} \int_{\theta_{j}(k)} \int_{\lambda_{j}(k)} p\left(y_{j}(k) | \theta_{j}(k), \lambda_{j}(k)\right) \\ & \quad\left. \times p\left(\theta_{j}(k) | \lambda_{j}(k),x_{j}(k)\right) ~ p\left(\lambda_{j}(k)\right) ~ d\theta_{j}(k) d \lambda_{j}(k) \right\rbrace. \end{aligned}}} $$

(16)

Fortunately, as the priors are conjugate, we can analytically solve the integrals in (16). We will use the following lemmas to do so.

Lemma 1

Let P=(0 1) be the domain of p_k+1. The following equation holds:

$$ {}\begin{aligned} K_{1} &\triangleq \int_{\mathbf{P}} g(p_{k+1}) p_{k+1}^{\mathbf{d}(\mathbf{X}_{k+1},\mathbf{f}_{s}(\mathbf{X}_{k}))} \\ & \quad \times (1-p_{k+1})^{n-\mathbf{d}(\mathbf{X}_{k+1},\mathbf{f}_{s}(\mathbf{X}_{k}))} d p_{k+1} \\ &= \frac{ \Gamma(\mathbf{d}(\mathbf{X}_{k+1},\mathbf{f}_{s}(\mathbf{X}_{k})) + a) \Gamma(n - \mathbf{d}(\mathbf{X}_{k+1},\mathbf{f}_{s}(\mathbf{X}_{k})) + b)}{\Gamma(a) \Gamma(b) \Gamma(a+b+n)\Gamma(a+b)^{-1}}. \end{aligned} $$

(17)

Proof

See Appendix 1 in Additional file 1. □

We assume that the observations $\mathcal {Y}$occur in the steady state. As a result, in (16), P(X₁|S=s) is the steady-state probability of the state X₁ in the class S=s, for s=0,1. The following lemma gives the steady-state distribution.

Lemma 2

Let $\pi ^{(s)}_{i}=P\left (\mathbf {X}_{1}=\mathbf {x}^{i} | S=s\right)$ denote the steady-state probability of the i-th state in the class S=s, and $\pi ^{(s)}=\left [\pi ^{(s)}_{1},\cdots,\pi ^{(s)}_{2^{n}}\right ]$ be the 1×2ⁿ vector of the steady-state distribution. Then π^(s) can be calculated from

$$ \pi^{(s)} = \pi^{(s)} \mathbf{M}^{(s)}, \quad \sum_{i=1}^{2^{n}} \pi_{i}^{(s)}=1, $$

(18)

where M^(s) is the transition probability matrix of the class S=s with the entries

$${} \mathbf{M}^{(s)}_{i,j} = \frac{ \Gamma\left(\mathbf{d}\left(\mathbf{x}^{j},\mathbf{f}_{s}\left(\mathbf{x}^{i}\right)\right) + a\right) \Gamma\left(n - \mathbf{d}\left(\mathbf{x}^{j},\mathbf{f}_{s}\left(\mathbf{x}^{i}\right)\right) + b\right)}{\Gamma(a) \Gamma(b) \Gamma(a+b+n) \Gamma(a+b)^{-1}}. $$

(19)

Proof

See Appendix 2 in Additional file 1. □

Lemma 3

Let Ω=(−∞ ∞) and Λ=(0 ∞) be the domains of θ_j(k) and λ_j(k), respectively, for j=1,⋯,n, and k=1,⋯,m. The following equation holds:

$$ \begin{aligned} K_{2} & \triangleq \int_{\mathbf{\Omega}} \int_{\mathbf{\Lambda}} p(y_{j}(k) | \theta_{j}(k), \lambda_{j}(k)) \\ & \quad \times p(\theta_{j}(k) | \lambda_{j}(k),x_{j}(k)) ~ p(\lambda_{j}(k)) ~ d\theta_{j}(k) d \lambda_{j}(k) \\ & =\frac{1}{(2\pi)^{\frac{1}{2}}} \left(\frac{\kappa_{0}}{\kappa_{1}}\right)^{\frac{1}{2}} \frac{\Gamma(\alpha_{1})}{\Gamma(\alpha_{0})} \frac{\beta_{0}^{\alpha_{0}}}{\beta_{1}^{\alpha_{1}}}, \end{aligned} $$

(20)

where

$$ \begin{aligned} \kappa_{1} &= \kappa_{0} + 1, \\ \alpha_{1} &= \alpha_{0}+\frac{1}{2}, \\ \beta_{1} &= \beta_{0} + \frac{\kappa_{0} \left(y_{j}(k)-\mu_{0} - \delta_{0} x_{j}(k)\right)^{2}}{2(\kappa_{0} + 1)}. \end{aligned} $$

(21)

Proof

See Appendix 3 in Additional file 1. □

Summing out $\mathcal {X}$

Using (16), (17), (19), and (20), we have

$$ {{\begin{aligned} p(\mathcal{Y}|S=s) &= \sum_{\mathbf{X}_{1}} \cdots \sum_{\mathbf{X}_{m}} \left\lbrace{\vphantom{\left. \quad \times \prod_{k=1}^{m} \prod_{j=1}^{n} \frac{1}{(2\pi)^{\frac{1}{2}}} \left(\frac{\kappa_{0}}{\kappa_{1}}\right)^{\frac{1}{2}} \frac{\Gamma(\alpha_{1})}{\Gamma(\alpha_{0})} \frac{\beta_{0}^{\alpha_{0}}}{\beta_{1}^{\alpha_{1}}} \right\rbrace}} \pi_{\mathbf{X}_{1}}^{(s)} \prod_{k=1}^{m-1} \frac{ \Gamma(\mathbf{d}(\mathbf{X}_{k+1},\mathbf{f}_{s}(\mathbf{X}_{k})) + a) \Gamma(n - \mathbf{d}(\mathbf{X}_{k+1},\mathbf{f}_{s}(\mathbf{X}_{k})) + b)}{\Gamma(a) \Gamma(b) \Gamma(a+b+n) \Gamma(a+b)^{-1}} \right.\\ &\left. \quad \times \prod_{k=1}^{m} \prod_{j=1}^{n} \frac{1}{(2\pi)^{\frac{1}{2}}} \left(\frac{\kappa_{0}}{\kappa_{1}}\right)^{\frac{1}{2}} \frac{\Gamma(\alpha_{1})}{\Gamma(\alpha_{0})} \frac{\beta_{0}^{\alpha_{0}}}{\beta_{1}^{\alpha_{1}}} \right\rbrace. \end{aligned}}} $$

(22)

We use the sum-product algorithm [19] to efficiently compute the summation in (22). Define the 2ⁿ×1 vector Φ(k) by

$${} \begin{aligned} \Phi_{i}(k) &= \left(\frac{ \beta_{0}^{\alpha_{0}}}{(2\pi)^{\frac{1}{2}}} \left(\frac{\kappa_{0}}{\kappa_{0}+1}\right)^{\frac{1}{2}} \frac{\Gamma\left(\alpha_{0}+\frac{1}{2}\right)}{\Gamma(\alpha_{0})}\right)^{n} \\ & \quad \times \prod_{j=1}^{n} \left[\beta_{0} + \frac{\kappa_{0}\left(y_{j}(k)-\mu_{0}-\delta_{0}\mathbf{x}^{i}_{j}\right)^{2}}{2(\kappa_{0}+1)} \right]^{-\left(\alpha_{0}+\frac{1}{2}\right)}, \end{aligned} $$

(23)

for i=1,⋯,2ⁿ, where $\mathbf {x}^{i}_{j}$ is the j-th entry of the Boolean state xⁱ. We define an auxiliary 2ⁿ×1 vector Π^(s)(k) at the time k, for k=1,⋯,m, which is initialized and updated as follows:

$$\begin{array}{@{}rcl@{}} \Pi^{(s)}(1) &=& \left({\pi}^{(s)}\right)^{T} \circ \Phi(1), \\ \Pi^{(s)}(k+1) &=& \left[{\mathbf{M}^{(s)}}^{T} \Pi^{(s)}(k)\right] \circ \Phi(k+1), \end{array} $$

(24)

for k=1,⋯,m−1, where T denotes the transpose, and ∘ is the Hadamard product. Once we have calculated Π^(s)(m), the summation of (22) is equal to the l₁ norm ∥Π^(s)(m)∥₁, which is the summation of the all 2ⁿ entries of Π^(s)(m). Therefore, (22) can be written as

$$ p(\mathcal{Y}|S=s) = \parallel \Pi^{(s)}(m) \parallel_{1}. $$

(25)

Results and discussion

In this section, we consider a mammalian cell-cycle gene regulatory network [17], depicted in Fig. 1, for evaluating our proposed trajectory-based IBR classifier. This GRN consists of n=10 genes, whose interactions are shown in Fig. 1. The Boolean functions of the corresponding Boolean network for this GRN are given in Table 1 [17]. We define class S=0 as the wild-type class, whose network function f₀ is in Table 1. The value of the gene CycD is determined by extracellular signals, and we assume it is f₁=0. According to [17], one mutated case which leads to cancer is when the gene p27 in the network is shut down and cannot be activated by its regulating genes, that is, f₃=0. Therefore, we define class S=1 as the mutated class with the network function f₁, which is the same as Table 1 with f₃=0. We analyze the effects of the hyperparameters and m on the classification error in Figs. 2, 3, 4, 5, 6, 7, 8 and 9. In the simulations, we have set a_c=b_c=10 and μ₀=10. Therefore, the prior probability c of the class S=0 will have the mean value E[ c]=0.5.

Table 1 Definitions of Boolean functions for the wild-type mammalian cell-cycle BN with 10 genes

Full size table

Figure 2 shows the classification error versus m for a=1,3,5, when the values of the other hyperparameters are b=100, α₀=100, β₀=10⁴, δ₀=40, and κ₀=100. We see that the classification error decreases by increasing m and converges to zero. This means that for long enough trajectories we can have perfect classification. The value of the hyperparameter a determines the amount of uncertainty for the gene perturbation probability p_k at time k. From a biological perspective, we know that p_k should be small. As a result, we have chosen b=100, and a=1,3,5, leading to the mean values of p_k, respectively, as $\mathrm {E}\left [p_{k}\right ]=\frac {a}{a+b}\approx 0.01, 0.03, 0.05$. For a given b, the bigger value of a allows a wider range of p_k, which results in higher classification error.

Figure 3 represents the classification error versus m for different values of α₀ and β₀, when a=1, b=100, δ₀=40, and κ=100. As the precision (inverse of variance) has a Gamma(α₀,β₀) distribution, its mean and variance are equal to $\mathrm {E}\left [\lambda \right ]=\frac {\alpha _{0}}{\beta _{0}}$ and $\text {Var}\left [\lambda \right ]=\frac {\alpha _{0}}{\beta _{0}^{2}}=\frac {\mathrm {E}[\lambda ]}{\beta _{0}}$. Figure 3 shows the error curves for the two cases $\frac {\alpha _{0}}{\beta _{0}}=10^{-3}$ and 2×10⁻³, each having a different value of β₀, leading to a different variance of λ. As such, we can see the effects of both the mean and variance of the precision λ. Whenever $\frac {\alpha _{0}}{\beta _{0}}$ increases, there is lower variance in the outputs, which results in lower errors. We also notice from Fig. 3 that for a given value of $\frac {\alpha _{0}}{\beta _{0}}$, increasing β₀ decreases the error, the reason being that increased β₀ yields lower variance for the precision.

Figure 4 plots the error versus α₀ for a=1,3,5, when m=5, b=100, δ₀=40, κ₀=100, and β₀=10⁵. For all values of a, the classification error is a decreasing function of α₀. Similarly, Fig. 5 gives the error versus β₀ for a=1,3,5, when α₀=10 and the other hyperparameters are the same as in Fig. 4. Figure 5 shows an increasing trend of classification error as β₀ grows.

Figure 6 shows the error as a function of a, when m=5, b=100, δ₀=40, κ₀=100, α₀=100, and β₀=10⁵. When a grows, the uncertainty of the perturbation probability grows as well. The error is an increasing function of a. Similarly, Fig. 7 is for error versus b, when a=1, and the others are the same as in Fig. 6. In Fig. 7 the classification error is decreasing as b increases.

Figure 8 illustrates the error versus κ₀, for m=5, a=1, b=100, δ₀=40, α₀=100, and β₀=10⁵. From (5), the hyperparameter κ₀ controls the variance of the mean parameters θ_j(k). When κ₀ increases, the uncertainty of θ_j(k) is reduced and its density peaks at μ₀ and μ₀+δ₀ for the unexpressed and expressed states, respectively. Consequently, we expect better error rates for higher κ₀. Accordingly, in Fig. 8 the classification error is a decreasing function of κ₀.

Figure 9 illustrates the error versus δ₀, the expression coefficient, for m=5, a=1, b=100, κ₀=100, α₀=100, and β₀=10⁵. Having larger δ₀ means that the mean values of data for each gene in the unexpressed and expressed cases are well separated, which leads to lower classification error. As expected, in Fig. 9 the error is decreasing as δ₀ gets larger.

Conclusions

In this paper, we propose a trajectory-based intrinsically Bayesian robust classifier for classification of single-cell gene-expression trajectories. We assume that the expressions of the n genes, whose interactions are known in terms of a Boolean network, are observed in m consecutive time points, for both the wild-type class (S=0) and mutated class (S=1). As the parameters have uncertainty, we assign priors for them. We assume a beta distribution as a prior for both the probability of the class S=0 and the gene perturbation probabilities at each time. We assume a normal-gamma distribution for the mean and precision of the expressions at each time and for each gene, given the underlying states. As such, we derive closed-form solutions for the effective class-conditional densities of the trajectories in each class, by which we define the IBR classifier. The performance of the IBR classifier is evaluated in a cell-cycle gene regulatory network with 10 genes, for which we know the Boolean networks of the two classes. We have analyzed the effects of m and all the hyperparameters on the classification error.

In this paper we have assumed the form of the prior distributions, including the hyperparameter values. Once this is done, the analysis is mathematical: derive the effective class-conditional densities. As is generally the case with Bayesian methods, two issues arise: how are the priors developed and how robust are the methods to the prior assumptions? Both issues have been extensively studied in regard to the OBC, of which the IBR classifier is the special case in which there is no update to a posterior.

In [9] robustness of the OBC relative to various modeling assumptions has been investigated, including false assumptions on the covariance structure and falsely assuming Gaussianity. In [20], similar robustness issues have been considered for the MMSE error estimator, which is required for defining the OBC.

Regarding prior construction, the question is how to transform existing scientific knowledge into a prior distribution. Because the IBR classifier and other IBR filtering methods involve prior distributions on the underlying scientific model, for instance, the stochastic process (power spectra) in Wiener filtering and the feature-label distribution in classification, uncertainty arises from insufficient scientific knowledge, the prior characterizes that uncertainty, and, in effect, constrains the optimization relative to that uncertainty. For IBR/OBC classification in the context of uncertain gene regulatory networks, prior construction methods based on constrained optimization have been developed for both Gaussian [21] and discrete [22] gene regulatory networks under the assumption of static observations. Future work involves extending these prior-construction methods to trajectories, which should be possible, albeit, with more difficult mathematical analysis and substantially more computation.

Another issue to be addressed in future work is extending the trajectory-based IBR classifier to optimal Bayesian classification, which will require analytic representation of the effective class-conditional densities relative to a posterior distribution derived from the prior distribution utilizing sample data.

Abbreviations

BN:: Boolean network
BNp:: Boolean network with perturbation
BOA:: Basin of attraction
BPOBDS:: Bayesian partially observed Boolean dynamical system
EM:: Expectation maximization
GRN:: Gene regulatory network
IBR:: Intrinsically Bayesian robust
MMSE:: Minimum mean square error
MSE:: Mean square error
OBC:: Optimal Bayesian classifier
TPM:: Transition probability matrix

References

Wu AR, Neff NF, Kalisky T, Dalerba P, Treutlein B, Rothenberg ME, Mburu FM, Mantalas GL, Sim S, Clarke MF, et al.Quantitative assessment of single-cell rna-sequencing methods. Nat Methods. 2014; 11(1):41–6.
Article CAS Google Scholar
Ståhlberg A, Kubista M. The workflow of single-cell expression profiling using quantitative real-time pcr. Expert Rev Mol Diagn. 2014; 14(3):323–31.
Article Google Scholar
Shmulevich I, Dougherty ER, Zhang W. From boolean to probabilistic boolean networks as models of genetic regulatory networks. Proc IEEE. 2002; 90(11):1778–92.
Article CAS Google Scholar
Karbalayghareh A, Hu T. Inference of sparse gene regulatory network from rna-seq time series data. In: 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP).2015. p. 967–71. https://doi.org/10.1109/GlobalSIP.2015.7418341.
Karbalayghareh A, Braga-Neto U, Hua J, Dougherty ER. Classification of state trajectories in gene regulatory networks. IEEE/ACM Trans Comput Biol Bioinforma. 2017; PP(99):1. https://doi.org/10.1109/TCBB.2016.2616470.
Google Scholar
Karbalayghareh A, Braga-Neto U, Dougherty ER. Classification of single-cell gene expression trajectories from incomplete and noisy data. IEEE/ACM Trans Comput Biol Bioinforma. 2017; PP(99):1. https://doi.org/10.1109/TCBB.2017.2763946.
Google Scholar
Karbalayghareh A, Braga-Neto U, Dougherty ER. Classification of gaussian trajectories with missing data in boolean gene regulatory networks. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).2017. p. 1078–82. https://doi.org/10.1109/ICASSP.2017.7952322.
Dalton LA, Dougherty ER. Optimal classifiers with minimum expected error within a bayesian framework—part i: Discrete and gaussian models. Pattern Recogn. 2013; 46(5):1301–14.
Article Google Scholar
Dalton LA, Dougherty ER. Optimal classifiers with minimum expected error within a bayesian framework — part ii: Properties and performance analysis. Pattern Recogn. 2013; 46(5):1288–300.
Article Google Scholar
Dalton LA, Dougherty ER. Intrinsically optimal bayesian robust filtering. IEEE Trans Signal Process. 2014; 62(3):657–70.
Article Google Scholar
Qian X, Dougherty ER. Bayesian regression with network prior: Optimal bayesian filtering perspective. IEEE Trans Signal Process. 2016; 64(23):6243–53.
Article Google Scholar
Dehghannasiri R, Esfahani MS, Dougherty ER. Intrinsically bayesian robust kalman filter: An innovation process approach. IEEE Trans Signal Process. 2017; 65(10):2531–46.
Article Google Scholar
Boluki S, Esfahani MS, Qian X, Dougherty ER. Constructing pathway-based priors within a gaussian mixture model for bayesian regression and classification. IEEE/ACM Trans Comput Biol Bioinforma. 2017; PP(99):1.
Google Scholar
Boluki S, Esfahani MS, Qian X, Dougherty ER. Incorporating biological prior knowledge for bayesian learning via maximal knowledge-driven information priors. BMC Bioinformatics. 2017; 18(14):552.
Article Google Scholar
Dadaneh SZ, Qian X, Zhou M. Bnp-seq: Bayesian nonparametric differential expression analysis of sequencing count data. J Am Stat Assoc. 2017; 0(ja):0. https://doi.org/10.1080/01621459.2017.1328358.
Google Scholar
Imani M, Braga-Neto UM. Maximum-likelihood adaptive filter for partially observed boolean dynamical systems. IEEE Trans Signal Process. 2017; 65(2):359–71.
Article Google Scholar
Fauré A, Naldi A, Chaouiya C, Thieffry D. Dynamical analysis of a generic boolean model for the control of the mammalian cell cycle. Bioinformatics. 2006; 22(14):124–31.
Article Google Scholar
Dalton LA, Dougherty ER. Bayesian minimum mean-square error estimation for classification error-part i: Definition and the bayesian mmse error estimator for discrete classification. IEEE Trans Signal Process. 2011; 59(1):115–29.
Article Google Scholar
Kschischang FR, Frey BJ, Loeliger HA. Factor graphs and the sum-product algorithm. IEEE Trans Inf Theory. 2001; 47(2):498–519.
Article Google Scholar
Dalton LA, Dougherty ER. Bayesian minimum mean-square error estimation for classification error-part ii: Linear classification of gaussian models. IEEE Trans Signal Process. 2011; 59(1):130–44.
Article Google Scholar
Esfahani MS, Dougherty ER. Incorporation of biological pathway knowledge in the construction of priors for optimal bayesian classification. IEEE/ACM Trans Comput Biol Bioinforma. 2014; 11(1):202–18.
Article Google Scholar
Esfahani MS, Dougherty ER. An optimization-based framework for the transformation of incomplete biological knowledge into a probabilistic structure and its application to the utilization of gene/protein signaling pathways in discrete phenotype classification. IEEE/ACM Trans Comput Biol Bioinforma. 2015; 12(6):1304–21.
Article CAS Google Scholar

Download references

Acknowledgments

Not applicable.

Funding

Publication cost of this article was funded by the Robert M. Kennedy ^′26 Chair at Texas A&M University.

Availability of data and materials

Not applicable.

About this supplement

This article has been published as part of BMC Systems Biology Volume 12 Supplement 3, 2018: Selected original research articles from the Fourth International Workshop on Computational Network Biology: Modeling, Analysis, and Control (CNB-MAC 2017): systems biology. The full contents of the supplement are available online at https://bmcsystbiol.biomedcentral.com/articles/supplements/volume-12-supplement-3.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843, TX, USA
Alireza Karbalayghareh, Ulisses Braga-Neto & Edward R. Dougherty
Center for Bioinformatics and Genomic Systems Engineering, College Station, 77843, TX, USA
Ulisses Braga-Neto & Edward R. Dougherty

Authors

Alireza Karbalayghareh
View author publications
You can also search for this author in PubMed Google Scholar
Ulisses Braga-Neto
View author publications
You can also search for this author in PubMed Google Scholar
Edward R. Dougherty
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

AK mathematically derived the classifier and worked out the example. UB with ED conceived the modeling and edited the original draft. ED structured the classifier, while UB conceived the modeling, and edited the original draft. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Alireza Karbalayghareh.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1

Appendices. This additional file contains the appendices for the proofs of the lemmas 1, 2, and 3. (PDF 33 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Karbalayghareh, A., Braga-Neto, U. & Dougherty, E.R. Intrinsically Bayesian robust classifier for single-cell gene expression trajectories in gene regulatory networks. BMC Syst Biol 12 (Suppl 3), 23 (2018). https://doi.org/10.1186/s12918-018-0549-y

Download citation

Published: 21 March 2018
DOI: https://doi.org/10.1186/s12918-018-0549-y

Intrinsically Bayesian robust classifier for single-cell gene expression trajectories in gene regulatory networks

Abstract

Background

Results

Conclusions

Background

Methods

State model

Prior for state parameter

Observation model

Priors for observation parameters

IBR classifier

Trajectory-based IBR classifier

Effective class-conditional densities of trajectories

Lemma 1

Proof

Lemma 2

Proof

Lemma 3

Proof

Summing out \(\mathcal {X}\)

Results and discussion

Conclusions

Abbreviations

References

Acknowledgments

Funding

Availability of data and materials

About this supplement

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Publisher’s Note

Additional file

Additional file 1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Systems Biology

Contact us