Here we consider a Boolean network model that consists of N variables, which updates its state at each time step using a deterministic update rule. Initially we will focus on individuals described by this model, whose instantaneous state is described by Boolean values for model variables \(B_{1}, B_{2}, \dots, B_{N}\), which evolve according to the rules of the model. We index these model variables using Roman letters (e.g., i,j,…), and use Greek letters (e.g., α,β,…) to refer to subsets of the model variables. For each possible subset of model variables α={i,j,…}, there is a unique state basis variable \(b_{\alpha}\) (which we sometimes write as \(b_{ij\dots}\)) and a unique product basis variable \(x_{\alpha} = x_{ij\dots}\), which we formally define below. We consider the state basis variables to be the components of a vector b, and the product basis variables to be components of a vector x. There are \(2^{N}\) subsets of the N Boolean model variables (including both the set θ of all model variables and the empty subset denoted ∅), so both b and x are vectors of length \(2^{N}\).
Definition 1
Consider a Boolean model composed of a set of Boolean variables θ, where \(B_{i}\) represents the state of model variable i∈θ; \(B_{i}=1\) for ON and \(B_{i}=0\) for OFF. If κ is the set of ON variables describing the instantaneous state of some individual, then the values of the state space and product space variables describing the state of that individual, indexed by subset α, are defined by:
$$\begin{aligned} b_{\alpha} &= \left(\prod_{i \in \alpha} B_{i} \right) \cdot \left(\prod_{i \in (\theta \backslash \alpha)} (1 - B_{i}) \right) &= \left\{ \begin{array}{ll} 1 & \textrm{if }\alpha = \kappa\\ 0 & \text{otherwise} \end{array}\right.\\ x_{\alpha} &= \prod_{i \in \alpha} B_{i} &= \left\{ \begin{array}{ll} 1 & \textrm{if }\alpha \subseteq \kappa\\ 0 & \text{otherwise}. \end{array}\right. \end{aligned} $$
The empty product \(x_{\emptyset}\) equals 1. The fact that each \(x_{\alpha}\) is a simple product of model variables motivates the terminology ‘product basis’. As shown in Additional file 1: Appendix 1, a product basis vector x is a fully equivalent representation of a mixed population to the state basis vector b.
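To make Definition 1 concrete, here is a minimal Python sketch (the helper names `all_subsets`, `state_basis` and `product_basis` are our own, not from the text) that computes \(b_{\alpha}\) and \(x_{\alpha}\) for an individual whose ON-set is κ:

```python
from itertools import combinations

def all_subsets(theta):
    """Enumerate all 2^N subsets of the model variables theta."""
    theta = sorted(theta)
    for r in range(len(theta) + 1):
        for combo in combinations(theta, r):
            yield frozenset(combo)

def state_basis(alpha, kappa):
    """b_alpha = 1 iff the ON-set kappa equals alpha exactly."""
    return 1 if alpha == kappa else 0

def product_basis(alpha, kappa):
    """x_alpha = 1 iff every variable in alpha is ON (alpha is a subset of kappa)."""
    return 1 if alpha <= kappa else 0

theta = {"A", "B", "C"}
kappa = frozenset({"A", "C"})        # an individual with A and C ON, B OFF
b = {a: state_basis(a, kappa) for a in all_subsets(theta)}
x = {a: product_basis(a, kappa) for a in all_subsets(theta)}
# b is 1 only at alpha = {A, C}; x is 1 at {}, {A}, {C} and {A, C}.
```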
Simulations of mixed populations
We build our simulations by selecting a set of product basis variables of interest and associating an update rule \(f_{\alpha}\) with each variable \(x_{\alpha}\), so that \(x_{\alpha}(t+1) = f_{\alpha}(\mathbf{x}(t))\) (the exception being the case of continuous-time Boolean networks, in which \(f_{\alpha} = dx_{\alpha}/dt\); we will treat those separately later). We construct the simulation in two steps. The first step is to build the single-index update rules \(f_{i}\) (i.e. α={i}) over all model variables i, by enumeration of their input states. The second step is to build certain multi-index update rules \(f_{ij\dots}\) as needed until the system of equations closes (i.e. until we have solved for each \(f_{\alpha}\) corresponding to a variable \(x_{\alpha}\) appearing in another time evolution equation \(f_{\beta}\)). To begin with, we show how to build the change-of-basis operator T that converts state space basis vectors to product space basis vectors through the formula x=T·b.
Algorithm 1
(Constructing a change-of-basis matrix) Consider any set θ′ containing n≤N of the model variables, for which κ and α denote subsets. Define \(T^{(n)}\) as the change-of-basis matrix that converts a length-\(2^{n}\) b vector indexed by κ to a length-\(2^{n}\) x vector indexed by α, and let \(T^{(n)}_{\alpha \kappa}\) denote the matrix element projecting \(b_{\kappa}\) onto \(x_{\alpha}\). We construct \(T^{(n)}\) by assigning a 1 to each matrix element \(T^{(n)}_{\alpha \kappa}\) for which α⊆κ, and 0 to all other elements.
Proof
Consider the state vector \(\mathbf{b}_{(\kappa)}\) whose entries are all 0 except for a 1 in the position corresponding to state κ, which describes an individual in state κ. The product basis representation x of this individual is found by multiplying: \(x_{\alpha} = \sum_{\gamma \subseteq \theta'} T^{(n)}_{\alpha \gamma} b_{\gamma} = T^{(n)}_{\alpha \kappa}\); thus x equals the column of the change-of-basis matrix that multiplies \(b_{\kappa}\). Using Definition 1, the value of the product basis variable \(x_{\alpha}\), corresponding to matrix element \(T^{(n)}_{\alpha \kappa}\), is 1 if α⊆κ and 0 otherwise. □
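A minimal sketch of Algorithm 1 in Python (the subset ordering and function names are our own choices; only the rule that \(T^{(n)}_{\alpha\kappa} = 1\) iff α⊆κ comes from the text):

```python
from itertools import combinations

def subsets_ordered(variables):
    """Fixed ordering of all subsets of `variables` (empty set first, then by size)."""
    variables = sorted(variables)
    out = []
    for r in range(len(variables) + 1):
        out.extend(frozenset(c) for c in combinations(variables, r))
    return out

def change_of_basis(variables):
    """T^(n): rows indexed by alpha, columns by kappa; entry is 1 iff alpha ⊆ kappa."""
    subs = subsets_ordered(variables)
    T = [[1 if alpha <= kappa else 0 for kappa in subs] for alpha in subs]
    return T, subs

T2, order = change_of_basis({"A", "C"})
# With the ordering (∅, A, C, AC), T2 reproduces the upper-triangular T^(2) of Example 1 below.
```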
We can now provide a procedure for calculating the single-variable update rules \(f_{i}\). To do so, we consider only the relatively small subset of variable i’s inputs, rather than the full set of model variables. We use a superscript [i] to denote quantities pertaining to the input subset; thus we define \(N^{[i]}\) as the number of inputs to model variable i, \(\theta^{[i]}\) as the set of those input variables, \(\mathbf{b}^{[i]} = \{b_{\rho} \mid \rho \subseteq \theta^{[i]}\}\) as the state space of input variables, and \(\mathbf{x}^{[i]} = \{x_{\rho} \mid \rho \subseteq \theta^{[i]}\}\) as the corresponding product space. In biological networks, \(N^{[i]}\) is usually small enough that we can explicitly write the change-of-basis operator \(T^{(N^{[i]})}\) in this space using Algorithm 1.
Algorithm 2
(Computing \(f_{i}\)) Define \(\mathbf{k}^{[i]}\) as a row vector such that \(k_{\alpha}^{[i]}\) is 1 if the pattern of Boolean inputs \(b_{\alpha}^{[i]}\) produces a 1 in output variable i, and 0 otherwise. Then \(f_{i} = \mathbf{k}^{[i]} \left(T^{(N^{[i]})}\right)^{-1} \mathbf{x}^{[i]}\), which is a linear equation in \(\mathbf{x}^{[i]} \subseteq \mathbf{x}\).
Proof
By definition \(f_{i} = \mathbf{k}^{[i]} \cdot \mathbf{b}^{[i]}\), as this expression reproduces the output rule. Using the fact that \(T^{(N^{[i]})}\) is invertible (proved in Additional file 1: Appendix 1), we write \(f_{i} = \mathbf{k}^{[i]} \left(T^{(N^{[i]})}\right)^{-1} T^{(N^{[i]})} \mathbf{b}^{[i]}\) and note that \(T^{(N^{[i]})} \mathbf{b}^{[i]} = \mathbf{x}^{[i]}\), which proves the formula. □
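A sketch of Algorithm 2, reusing `change_of_basis` from the sketch above and inverting \(T^{(N^{[i]})}\) with NumPy (the variable-A case anticipates Example 1 below; the helper name is ours):

```python
import numpy as np

def single_index_rule(k_row, T):
    """Return the x-basis coefficients of f_i = k^[i] (T^(n))^{-1} x^[i]."""
    return np.array(k_row, dtype=float) @ np.linalg.inv(np.array(T, dtype=float))

# Variable A of Example 1 below has the single input B and a NOT gate,
# so k^[A] = (1, 0) over the input states (∅, B).
T1, _ = change_of_basis({"B"})
print(single_index_rule([1, 0], T1))   # -> [ 1. -1.], i.e. f_A = x_∅ - x_B
```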
Using the set of single-index \(f_{i}\), one can compute the linear time evolution function of any multi-index product basis variable \(f_{ij\dots}\) using the following method.
Algorithm 3
(Computing \(f_{ij\dots}\)) First compute \(f_{\alpha}\) for α={i,j,…} as \(f_{\alpha} \leftarrow f_{i}(\mathbf{x}) \cdot f_{j}(\mathbf{x}) \cdot \dots\) (expressed in terms of x-basis variables). Next, distribute each term inside the product, so that \(f_{\alpha}\) is a weighted sum of products of x-basis variables, and replace each nonlinear product of terms \(x_{\beta} \cdot x_{\gamma} \cdot \dots\) appearing inside \(f_{\alpha}\) with the product basis variable \(x_{\mu}\) where μ=β∪γ∪⋯. This gives an expression for \(f_{\alpha}\) that is linear in x.
Proof
First we show that \(f_{\alpha} = f_{ij\dots}\) equals the product \(f_{i} \cdot f_{j} \cdot \dots\):
$$\begin{array}{*{20}l} f_{ij\dots}(t) &= x_{ij\dots}(t+1)\\ &= x_{i}(t+1) \cdot x_{j}(t+1) \cdot \dots\\ &= f_{i}(t) \cdot f_{j}(t) \cdot \dots\\ &= f_{i} \cdot f_{j} \cdot \ldots \end{array} $$
where the last line emphasizes that there is no time dependence in \(f_{ij\dots}\).
The second step is to show that each \(x_{\beta} \cdot x_{\gamma} \cdot \dots\) equals \(x_{\beta \cup \gamma \cup \dots}\). Let k,l,… be the elements of μ=β∪γ∪…; then \(x_{\beta} \cdot x_{\gamma} \cdot \dots = x_{k}^{p_{k}} \cdot x_{l}^{p_{l}} \cdot \dots\), where \(p_{k}, p_{l}, \dots \ge 1\) are the respective numbers of times k,l,… appear in β,γ,⋯. Since all \(x_{i}\) are Boolean variables, we have \(x_{i}^{p} = x_{i}\) for any p≥1. Thus \(x_{\beta} \cdot x_{\gamma} \cdot \dots = x_{k} \cdot x_{l} \cdot \dots = x_{\mu}\). □
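A sketch of Algorithm 3 in Python, representing an x-basis polynomial as a dict that maps an index set to its coefficient (a data structure of our own choosing):

```python
def multiply_rules(*rules):
    """Multiply x-basis polynomials, merging index sets since x_beta * x_gamma = x_(beta ∪ gamma)."""
    product = {frozenset(): 1.0}               # start from the constant 1 = x_∅
    for rule in rules:
        expanded = {}
        for idx1, c1 in product.items():
            for idx2, c2 in rule.items():
                merged = idx1 | idx2           # the index-merging step of Algorithm 3
                expanded[merged] = expanded.get(merged, 0.0) + c1 * c2
        product = {k: v for k, v in expanded.items() if v != 0.0}
    return product

# Anticipating Example 1 below: f_A = x_∅ - x_B and f_C = x_A + x_B - x_AB.
f_A = {frozenset(): 1.0, frozenset("B"): -1.0}
f_C = {frozenset("A"): 1.0, frozenset("B"): 1.0, frozenset("AB"): -1.0}
print(multiply_rules(f_A, f_C))                # x_A - x_AB, matching Eq. 6 of Example 1 below
```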
Using Algorithms 2 and 3, we can give the full procedure for building a simulation that time evolves any arbitrary set of product basis variables of interest describing some individual modeled by the Boolean rules. We denote the set of product variables of interest as \(\Omega_{0}\); note that each element of \(\Omega_{0}\) is itself a set of indices over model variables. Our algorithm constructs \(f_{\alpha}\) for each \(\alpha \in \Omega_{0}\), then additional \(f_{\beta}\) to evolve each \(x_{\beta}\) that appears in the formula for \(f_{\alpha}\), and so on until the equations form a closed system (i.e. there is an update rule for every product basis variable appearing in the system of equations).
Algorithm 4
(Building a product basis simulation) First compute the full set of single-index \(f_{i}\) using Algorithm 2. Next, initialize the total set of required product basis variables \(\Omega \leftarrow \Omega_{0}\), and the set of product basis variables whose dynamics we have solved \(\Omega_{S} \leftarrow \emptyset\). Finally, iteratively solve \(f_{\alpha}\) for each \(\alpha \in \Omega \setminus \Omega_{S}\) using Algorithm 3, while updating \(\Omega_{S} \leftarrow \Omega_{S} \cup \{\alpha\}\) and \(\Omega \leftarrow \Omega \cup \{\mu\}\) for each variable \(x_{\mu}\) appearing in the formula for \(f_{\alpha}\). Iteration continues until \(\Omega_{S} = \Omega\).
Proof
We first note that Ω eventually converges to a finite set: Ω never loses elements at any iterative step, and it is bounded above by the power set of θ, which is finite because the number of variables |θ| is finite. Since a) \(\Omega_{S} \subseteq \Omega\), b) \(\Omega_{S}\) accumulates one element of \(\Omega \setminus \Omega_{S}\) at each iterative step, and c) Ω converges, the algorithm always ends in a finite number of steps. When the algorithm terminates, the set of solved variables \(\Omega_{S}\) equals the set of variables Ω appearing in the equations, so the resulting system of equations is closed. □
Writing our final closed system of linear equations as a square matrix F, we have a very simple update rule for simulations using the product basis variables: x(t+1)=F·x(t).
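The closure loop of Algorithm 4 can be sketched as follows, reusing `multiply_rules`, `f_A` and `f_C` from the Algorithm 3 sketch (the function and variable names are ours; the rules are those of Example 1 below):

```python
def build_closed_system(single_rules, omega0):
    """Iteratively solve f_alpha for every product variable reachable from Omega_0."""
    rules = {}                                  # alpha -> x-basis polynomial for f_alpha
    omega = {frozenset(a) for a in omega0}      # Omega: variables we must track
    while omega - rules.keys():                 # Omega \ Omega_S is nonempty
        alpha = next(iter(omega - rules.keys()))
        # For alpha = ∅ this returns {∅: 1.0}, i.e. f_∅ = x_∅.
        f_alpha = multiply_rules(*(single_rules[i] for i in alpha))
        rules[alpha] = f_alpha                  # Omega_S <- Omega_S ∪ {alpha}
        omega |= set(f_alpha.keys())            # Omega <- Omega ∪ {mu} for each x_mu in f_alpha
    return rules                                # closed system: one row of F per solved variable

single_rules = {"A": f_A, "B": {frozenset("AC"): 1.0}, "C": f_C}
closed = build_closed_system(single_rules, [{"A"}])
# closed contains update rules for ∅, A, B, AC, AB and ABC, matching Eqs. 3-8 of Example 1 below.
```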
Our final step is to generalize to a mixed-population simulation.
Definition 2
Define a population-level state vector <x> as a population-weighted linear combination of the state vectors \(\mathbf{x}_{(\alpha)}\) of the subpopulations:
$$\begin{array}{*{20}l} \left< \mathbf{x} \right>(t) &:= \sum_{\alpha} w_{\alpha} \mathbf{x}_{(\alpha)}(t) \end{array} $$
(1)
where the weighting factors \(w_{\alpha}\) are proportional to the occurrence of subpopulations α in the overall mixed population.
Note that we are free to choose the normalization constant \(W = \sum_{\alpha} w_{\alpha}\). In all of our examples, we will choose W=1, leading to the interpretation that \(w_{\alpha}\) is the fraction of the population in state α. Irrespective of the choice of W, this representation can be used to evolve mixed populations over time using our existing operators.
Claim 1
Each element \(\left< x_{\alpha} \right>\) of the vector <x> is proportional to the overall occurrence of individuals having \(x_{\alpha} = 1\) in the mixed population. For W=1, \(\left< x_{\alpha} \right>\) is the fraction of the population having \(x_{\alpha} = 1\).
Proof
Equation 1 is the definition of a weighted average of x, whose weighting factors w are proportional to the population fraction. □
Claim 2
Our linear operator F that evolves any arbitrary state \(\mathbf{x}_{(\alpha)}\) of an individual over time also correctly evolves the state of any arbitrary mixed population <x> over time.
Proof
This Claim follows from the fact that F commutes with the sum over subpopulations:
$$\begin{array}{*{20}l} \left< \mathbf{x} \right>(t) &= \sum_{\alpha} w_{\alpha} F^{t-t_{0}} \mathbf{x}_{(\alpha)}(t_{0})\\ &= F^{t-t_{0}} \sum_{\alpha} w_{\alpha} \mathbf{x}_{(\alpha)}(t_{0})\\ &= F^{t-t_{0}} \left< \mathbf{x} \right>(t_{0}). \end{array} $$
(2)
□
Thus, the time evolution operator F produced by Algorithm 4 correctly evolves the mean occurrence of variables <x> over time in any mixed population, despite the fact that this operator was derived for an x that describes an individual (notably in assuming that each \(x_{\alpha}\) is Boolean).
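A brief numerical check of the commuting argument in Eq. 2 (the operator and states below are arbitrary stand-ins, not a model from the text): evolving a 60/40 mixture with a linear operator gives the same result as mixing the separately evolved subpopulations.

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.integers(-1, 2, size=(4, 4)).astype(float)   # any linear update operator
x1 = rng.integers(0, 2, size=4).astype(float)        # product-basis state of subpopulation 1
x2 = rng.integers(0, 2, size=4).astype(float)        # product-basis state of subpopulation 2
w1, w2 = 0.6, 0.4                                    # population fractions (W = 1)

mixed_then_evolved = F @ (w1 * x1 + w2 * x2)
evolved_then_mixed = w1 * (F @ x1) + w2 * (F @ x2)
assert np.allclose(mixed_then_evolved, evolved_then_mixed)
```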
Example 1: building equations
Consider the model shown in Fig. 2, whose Boolean model variables update according to the following rules:
$$\begin{array}{*{20}l} A(t+1) &= \overline{B(t)}\\ B(t+1) &= A(t) \land C(t)\\ C(t+1) &= A(t) \lor B(t). \end{array} $$
To build a product basis simulation, we first compute the change-of-basis matrices that will be used to compute the single-variable update rules \(f_{A}\), \(f_{B}\) and \(f_{C}\). Variable A takes input from the single variable B, so calculating \(f_{A}\) requires the change-of-basis matrix \(T^{(1)}\). Ordering the elements by the subscripts (∅,B) respectively, and applying Algorithm 1, we obtain
$$\begin{array}{*{20}l} T^{(1)} &= \left[ \begin{array}{cc} 1 & 1\\ 0 & 1\\ \end{array} \right]. \end{array} $$
Variables B and C each take input from two variables. Ordering B’s input states (∅,A,C,AC), and C’s input states (∅,A,B,AB), we find that both \(f_{B}\) and \(f_{C}\) are computed using
$$\begin{array}{*{20}l} T^{(2)} &= \left[ \begin{array}{cccc} 1 & 1 & 1 & 1\\ 0 & 1 & 0 & 1\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1\\ \end{array} \right]. \end{array} $$
Next, we build the single-index update functions. Variable A takes input only from variable B, so the possible patterns of active inputs are (∅,B)⊤, corresponding to the state basis variables \(\left(b^{[A]}_{\emptyset}, b^{[A]}_{B}\right)^{\top}\). The respective outputs are \((1,0) = \mathbf{k}^{[A]}\) due to the NOT gate, from which we can immediately calculate \(f_{A}\) using Algorithm 2:
$$\begin{array}{*{20}l} f_{A} &= \mathbf{k}^{[A]} \left(T^{(1)} \right)^{-1} \left[ \begin{array}{c} x_{\emptyset}\\ x_{B} \end{array} \right]\\ &= \left[ \begin{array}{cc} 1 & 0 \end{array} \right] \left[ \begin{array}{cc} 1 & -1\\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} x_{\emptyset}\\ x_{B} \end{array} \right]\\ &= x_{\emptyset} - x_{B}. \end{array} $$
(3)
In the same way we find that the input patterns \(\left(b^{[B]}_{\emptyset}, b^{[B]}_{A}, b^{[B]}_{C}, b^{[B]}_{AC}\right)^{\top}\) to variable B lead to outputs \(\mathbf{k}^{[B]} = (0,0,0,1)\), and the input patterns \(\left(b^{[C]}_{\emptyset}, b^{[C]}_{A}, b^{[C]}_{B}, b^{[C]}_{AB}\right)^{\top}\) to variable C lead to outputs \(\mathbf{k}^{[C]} = (0,1,1,1)\). Using these together with \(T^{(2)}\) we compute:
$$\begin{array}{*{20}l} f_{B} &= \left[ \begin{array}{cccc} 0 & 0 & 0 & 1 \end{array} \right] \left[ \begin{array}{cccc} 1 & -1 & -1 & 1\\ 0 & 1 & 0 & -1\\ 0 & 0 & 1 & -1\\ 0 & 0 & 0 & 1\\ \end{array} \right] \left[ \begin{array}{c} x_{\emptyset}\\ x_{A}\\ x_{C}\\ x_{AC} \end{array} \right]\\ &= x_{AC}\\ f_{C} &= \left[ \begin{array}{cccc} 0 & 1 & 1 & 1 \end{array} \right] \left[ \begin{array}{cccc} 1 & -1 & -1 & 1\\ 0 & 1 & 0 & -1\\ 0 & 0 & 1 & -1\\ 0 & 0 & 0 & 1\\ \end{array} \right] \left[ \begin{array}{c} x_{\emptyset}\\ x_{A}\\ x_{B}\\ x_{AB} \end{array} \right]\\ &= x_{A} + x_{B} - x_{AB}. \end{array} $$
(4)
Having built the single-index update functions, we can now derive a linear system that tracks the time evolution of any set of product variables that we aim to follow. Suppose we wish to follow the time evolution of variable \(x_{A}\), corresponding to the activity (or mean activity at the population level) of the Boolean model variable \(B_{A}\). The immediate equation for this purpose is \(f_{A}\), which we already derived (Eq. 3), but since it involves \(x_{B}\) and \(x_{\emptyset}\), our simulation must also track those variables through time using Eq. 4 along with
$$\begin{array}{*{20}l} f_{\emptyset} &= x_{\emptyset}. \end{array} $$
(5)
Equation 4 requires that we track a new multi-index variable \(x_{AC}\), so we must solve \(f_{AC}\) using Algorithm 3:
$$\begin{array}{*{20}l} f_{AC} &= (1-x_{B}) \cdot (x_{A} + x_{B} - x_{AB})\\ &= x_{A} + x_{B} - x_{AB} - x_{AB} - x_{B} + x_{AB}\\ &= x_{A} - x_{AB}. \end{array} $$
(6)
We continue the process of identifying new variables and solving for their update functions until the equations form a closed system:
$$\begin{array}{*{20}l} f_{AB} &= (1-x_{B}) \cdot (x_{AC})\\ &= x_{AC} - x_{ABC} \end{array} $$
(7)
$$\begin{array}{*{20}l} f_{ABC} &= (1-x_{B}) \cdot (x_{AC}) \cdot (x_{A} + x_{B} - x_{AB})\\ &= x_{AC} - x_{ABC}. \end{array} $$
(8)
Equations 3-8, together with initial values for \(x_{A}\), \(x_{B}\), \(x_{AC}\), \(x_{AB}\) and \(x_{ABC}\), describe the time evolution of these quantities in an individual Boolean network as a sequence of 0s and 1s in each variable. The final step is to reinterpret these equations as describing the dynamics of a mixed population, formally by taking the mean of both sides of each equation:
$$\begin{array}{*{20}ll} f_{\left< A \right>} &= \left< x_{\emptyset} \right> - \left< x_{B} \right> & \text{(3b)}\\ f_{\left< B \right>} &= \left< x_{AC} \right> & \text{(4b)}\\ f_{\left< \emptyset \right>} &= \left< x_{\emptyset} \right> & \text{(5b)}\\ f_{\left< AC \right>} &= \left< x_{A} \right> - \left< x_{AB} \right> & \text{(6b)}\\ f_{\left< AB \right>} &= \left< x_{AC} \right> - \left< x_{ABC} \right> & \text{(7b)}\\ f_{\left< ABC \right>} &= \left< x_{AC} \right> - \left< x_{ABC} \right>. & \text{(8b)} \end{array} $$
The angle brackets denote an average, and we have used the notation \(\left< x_{i}(t+1) \right> = f_{\left< i \right>}\). Per Claim 2, the linear equations are unaffected by the averaging process, so the same equations used to derive the dynamics of an individual also describe the mean values of those same variables in a mixed population. Whereas the product basis variables take on binary values for an individual, the population-averaged variables are real-valued on the interval [0,1] (using our recommended normalization in Claim 1). For example, we would set \(\left< x_{A} \right> = 0.4\) if gene A is ON in 40% of the population.
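As a sanity check on Example 1 (a test harness of our own, not part of the original text), one can assemble F from Eqs. 3-8, enumerate all eight states of the model, apply the Boolean rules directly, and confirm that the linear system reproduces the product-basis variables of the successor state; the population-averaged statement then follows by linearity.

```python
import numpy as np
from itertools import product

# Tracked product variables, in the order (∅, A, B, AC, AB, ABC).
F = np.array([[1, 0,  0, 0,  0,  0],    # f_∅   = x_∅
              [1, 0, -1, 0,  0,  0],    # f_A   = x_∅ - x_B
              [0, 0,  0, 1,  0,  0],    # f_B   = x_AC
              [0, 1,  0, 0, -1,  0],    # f_AC  = x_A - x_AB
              [0, 0,  0, 1,  0, -1],    # f_AB  = x_AC - x_ABC
              [0, 0,  0, 1,  0, -1]],   # f_ABC = x_AC - x_ABC
             dtype=float)

def x_vector(A, B, C):
    """Product-basis coordinates (x_∅, x_A, x_B, x_AC, x_AB, x_ABC) of one individual."""
    return np.array([1, A, B, A * C, A * B, A * B * C], dtype=float)

for A, B, C in product([0, 1], repeat=3):
    A2, B2, C2 = 1 - B, A & C, A | B              # the Boolean rules of Example 1
    assert np.allclose(F @ x_vector(A, B, C), x_vector(A2, B2, C2))
```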
Probabilistic and asynchronous Boolean networks
The product basis method can be applied to probabilistic Boolean networks (PBNs) [11, 28], in which several state transitions are possible at each time step with different probabilities. As we will show, our algorithm works in the large-population limit for which time evolution of the average state <x> is essentially deterministic, despite the fact that each individual PBN is stochastic.
Applying our method to PBNs requires that we reinterpret the meaning of the time evolution equations. For an individual we write:
$$\begin{array}{*{20}l} p(\mathbf{x}(t+1)) = F \cdot \mathbf{x}(t) \end{array} $$
(9)
where p(·) denotes the probability of an outcome. The product basis method still works with this new interpretation of the time evolution operator F, although we note several changes to the logic. First, in Algorithm 2 we generalize each \(\mathbf{k}^{[i]}\) to be a vector of likelihoods that each respective input pattern produces a 1 in the output variable, so that \(f_{i} = \mathbf{k}^{[i]} \cdot \mathbf{b}^{[i]}\) as before. Second, the multiplication rule used in Algorithm 3 still holds if updated to read \(p_{x_{ij\dots}} = p(x_{i})\, p(x_{j}) \dots\), owing to the independence of outcomes \(p(x_{i})\), \(p(x_{j})\), etc. Third, we point out that although p(x) is real-valued, the state of an individual x is still binary, so \(x_{\alpha}^{p \ge 1} = x_{\alpha}\) as before (Algorithm 3).
Despite the fact that our algorithm produces valid product basis equations of the form of Eq. 9, the resulting linear system of equations does not form a closed system, simply because the left-hand side uses different variables than the right (p(x) versus x). Our method therefore cannot be used to simulate an individual instance of a PBN. However, by averaging both sides and taking the large-population limit so that p(x)→<x>, the system closes and reproduces Eq. 2. Thus, our system of equations accurately tracks the deterministic dynamics of arbitrary mixed populations of PBNs in the infinite-population limit, despite being unable to model stochastic individuals.
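As a small illustration (the 0.9/0.1 probabilities are hypothetical, and the helpers come from the Algorithm 1 and 2 sketches above): a PBN rule for variable A that outputs NOT B with probability 0.9 and B itself with probability 0.1 has likelihood vector \(\mathbf{k}^{[A]} = (0.9, 0.1)\) over the input states (∅,B), giving \(f_{A} = 0.9\, x_{\emptyset} - 0.8\, x_{B}\).

```python
T1, _ = change_of_basis({"B"})              # from the Algorithm 1 sketch
print(single_index_rule([0.9, 0.1], T1))    # -> [ 0.9 -0.8], i.e. f_A = 0.9 x_∅ - 0.8 x_B
```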
Large populations of asynchronous networks behave identically to large populations of PBNs [13] if we define a uniform time step: the probabilities of the various possible updates over that time step give the state transition weights in the corresponding synchronous PBN. If this time step is small enough, then the likelihood of two causally-connected asynchronous updates happening in the same step is small, and in this limit the local update rules for a PBN accurately model the asynchronous network. Therefore our analysis also applies to large populations of asynchronous networks for small time steps.
Continuous-time Boolean networks
Probabilistic networks give one way to incorporate rate information into our model; another way is to work in continuous time using differential equations [12]: \(f_{A} = dx_{A}/dt\). The differential form does require one change in our method: the rate of change of a higher-order variable is found by using the product rule of derivatives. Whereas under a discrete update \(f_{ABC\dots}\) is the product \(f_{A} \cdot f_{B} \cdot f_{C} \cdot \dots\), for the differential case we compute:
$$\begin{array}{*{20}l} f_{ABC\dots} &= \frac{d}{dt} \left(x_{A} x_{B} x_{C} \dots \right)\\ &= x_{B} x_{C} \dots f_{A} + x_{A} x_{C} \dots f_{B} + \cdots. \end{array} $$
(10)
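For instance, for a two-index variable this reads
$$\begin{array}{*{20}l} f_{AB} &= \frac{d}{dt} \left(x_{A} x_{B} \right) = x_{B}\, f_{A} + x_{A}\, f_{B}, \end{array} $$
after which the products on the right-hand side are re-linearized with the same index-merging rule used in Algorithm 3.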
Also, under discrete updates, the trivial function is \(f_{\emptyset} = 1\), but with differential updates it is \(f_{\emptyset} = 0\).