The SCNS algorithm
The SCNS algorithm solves the following problem:
We are given a set of variables V = {v_1, v_2, …, v_n}, which correspond to the genes measured, and an undirected state graph S = (N, E). Each vertex in N is uniquely labelled with a Boolean state s = (x_1, …, x_n), which corresponds to an active/inactive map of the genes, and there is an edge {s_1, s_2} ∈ E iff s_1 and s_2 differ in the value of exactly one variable v; the edge {s_1, s_2} is labelled with v. In addition, we are given a designated set I ⊆ N of initial vertices, which correspond to the measurements at an early time point, and a set F ⊆ N of final vertices, which correspond to the measurements at a final time point. For each variable v_i ∈ V we are also given a threshold t_i, which, intuitively, indicates how tight a match with the experimental data we are looking for, together with a maximum number of activators a_i and a maximum number of repressors r_i. We would like to find an update function u_i : {0,1}^n → {0,1} for each variable v_i ∈ V. Let U = {u_i | v_i ∈ V} be the set of update functions. We note that the asynchronous Boolean network that arises from these rules defines a directed graph over a set of vertices that is larger than N. The update functions must be chosen such that this network satisfies the following conditions.
1. Every final vertex f ∈ F is reachable from some initial vertex j ∈ I by a directed path p. Further, for every v_i-labelled directed edge (s_1, s_2) ∈ p we have u_i(s_1) = s_2(v_i).
2. For every variable v_i ∈ V, let N_i be the set of states without an outgoing v_i-labelled arc. For every i, the number of states s ∈ N_i such that u_i(s) = s(v_i) is greater than or equal to t_i. That is, the number of edges leaving the original state space N is bounded.
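The definition of the state graph can be illustrated with a minimal Python sketch. This is not the SCNS implementation; the function name `build_state_graph` and the toy states are assumptions made for illustration only. States are tuples of 0/1 values, and an edge is added between two states exactly when they differ in one position, labelled with the index of that position.

```python
from itertools import combinations

def build_state_graph(states):
    """Return {(s1, s2): label} for every pair of states that differ in
    exactly one variable; label is the index of that variable.
    (Hypothetical helper, not part of SCNS.)"""
    edges = {}
    for s1, s2 in combinations(sorted(states), 2):
        differing = [k for k in range(len(s1)) if s1[k] != s2[k]]
        if len(differing) == 1:
            edges[(s1, s2)] = differing[0]
    return edges

# Toy data set with three genes (illustrative values only).
states = {(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1)}
edges = build_state_graph(states)
for (s1, s2), label in sorted(edges.items()):
    print(s1, "--", s2, "labelled with variable index", label)
```

The pairwise comparison is quadratic in the number of states, which is adequate for a sketch but would need a smarter neighbour lookup for large data sets.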
We restrict our search to update functions of the form f_1 ∧ ¬f_2, where f_1 and f_2 are monotone Boolean formulae (they contain ∧ and ∨ gates, but no negation). The variables of f_1 are the activators of the update function and the variables of f_2 are its repressors. For each variable v_i, we look for functions with at most a_i activators and r_i repressors.
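To make the restricted function form concrete, here is a small Python sketch. The nested-tuple representation of formulae and the names `evaluate`, `variables`, `make_update` and `within_limits` are assumptions for illustration, not the data structures used by SCNS. A monotone formula is either `("var", k)`, `("and", a, b)` or `("or", a, b)`, and an update function is the conjunction of one such formula with the negation of another.

```python
def evaluate(formula, state):
    """Evaluate a monotone formula (no negation) on a 0/1 state tuple."""
    kind = formula[0]
    if kind == "var":
        return bool(state[formula[1]])
    left = evaluate(formula[1], state)
    right = evaluate(formula[2], state)
    return (left and right) if kind == "and" else (left or right)

def variables(formula):
    """Set of variable indices that occur in a formula."""
    if formula[0] == "var":
        return {formula[1]}
    return variables(formula[1]) | variables(formula[2])

def make_update(f1, f2=None):
    """Build the update function f1 AND NOT f2 (f2 may be absent)."""
    def update(state):
        activated = evaluate(f1, state)
        repressed = f2 is not None and evaluate(f2, state)
        return int(activated and not repressed)
    return update

def within_limits(f1, f2, max_activators, max_repressors):
    """Check the bounds on the number of activators and repressors."""
    n_repressors = len(variables(f2)) if f2 is not None else 0
    return len(variables(f1)) <= max_activators and n_repressors <= max_repressors

# Example: (x0 OR x1) AND NOT x2
f1 = ("or", ("var", 0), ("var", 1))
f2 = ("var", 2)
update = make_update(f1, f2)
print(update((1, 0, 0)), update((1, 0, 1)), update((0, 0, 0)))
```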
The algorithm has three phases. We begin by building a directed graph from the given undirected state graph S = (N, E): for each edge in E we consider which of its two possible directions are compatible with some Boolean update function, and prune those that are not. This phase is implemented via enumerative search and, on termination, leaves us with a directed state graph G, which may include both directions or neither direction for a given edge.
To ensure reachability, we then construct, for each pair of an initial node j ∈ I and a final node f ∈ F, a shortest path from j to f in the directed graph G that was built in the previous phase of the algorithm. These paths can be computed via a breadth-first search.
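The shortest-path step is a standard breadth-first search. The sketch below assumes the directed graph is given as a dictionary mapping each state to a list of its successors; this representation, the function name `shortest_path` and the toy graph are illustrative assumptions rather than details taken from SCNS.

```python
from collections import deque

def shortest_path(successors, start, goal):
    """Breadth-first search; returns a shortest path as a list of nodes,
    or None if goal is not reachable from start."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in successors.get(node, ()):
            if nxt not in parent:  # first visit gives the shortest distance
                parent[nxt] = node
                queue.append(nxt)
    return None

# Toy directed state graph over two genes (illustrative only).
graph = {
    (0, 0): [(1, 0), (0, 1)],
    (1, 0): [(1, 1)],
    (0, 1): [],
    (1, 1): [],
}
print(shortest_path(graph, (0, 0), (1, 1)))
print(shortest_path(graph, (0, 1), (1, 1)))
```

Running the search once per initial node and reading off the paths to all final nodes would avoid repeating work for each pair.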
The search for Boolean update rules compatible with these paths is then encoded as a Boolean satisfiability (SAT) problem. The update function of each variable can be sought separately, giving rise to reasonably sized satisfiability queries.
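The per-variable search can be illustrated without a SAT solver. The sketch below replaces the SAT encoding with brute-force enumeration and, as a further simplifying assumption, only considers rules of the form (OR of activators) ∧ ¬(OR of repressors), which is a strict subset of the function class described above. The name `find_rule` and the constraint format, a list of (state, required value) pairs derived from the path edges labelled with the variable in question, are hypothetical.

```python
from itertools import combinations

def find_rule(constraints, n_vars, max_activators, max_repressors):
    """Return (activators, repressors) for the first rule of the form
    OR(activators) AND NOT OR(repressors) that maps every state in
    constraints to its required value, or None if there is none.
    Brute-force stand-in for the SAT query; not the SCNS encoding."""
    indices = range(n_vars)
    for a in range(1, max_activators + 1):
        for acts in combinations(indices, a):
            rest = [k for k in indices if k not in acts]
            for r in range(0, max_repressors + 1):
                for reps in combinations(rest, r):
                    def rule(s, acts=acts, reps=reps):
                        active = any(s[k] for k in acts)
                        repressed = any(s[k] for k in reps)
                        return int(active and not repressed)
                    if all(rule(s) == value for s, value in constraints):
                        return acts, reps
    return None

# Toy constraints for gene 2 in a three-gene system (illustrative only):
# each pair is (source state of an edge, value gene 2 must take next).
constraints = [((1, 0, 0), 1), ((0, 0, 0), 0), ((1, 1, 0), 0)]
result = find_rule(constraints, n_vars=3, max_activators=2, max_repressors=1)
print(result)
```

Because each variable is handled independently, the enumeration (like the SAT queries it stands in for) stays small as long as the activator and repressor limits are low.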
For full details, we refer the reader to [25, 26].
Finding stable state attractors
To analyse all synthesised models together, we first form a combined Boolean network that makes a transition if all sub-models do. If some sub-model has a stable state attractor s, then s will also be an attractor of this combined model.
Given a set of compatible update functions {f_{i,1}, …, f_{i,n}} for gene x_i, the update function for the combined model is defined as: f'_i = (¬x_i ∧ (f_{i,1} ∧ … ∧ f_{i,n})) ∨ (x_i ∧ (f_{i,1} ∨ … ∨ f_{i,n}))
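The combined update function can be written directly from its definition: when the gene is currently off it switches on only if every sub-model says so, and when it is on it stays on if any sub-model says so. In this Python sketch the name `combine` and the two example sub-model functions are assumptions for illustration.

```python
def combine(update_fns, i):
    """Combined update function for gene i, given the update functions
    that the individual sub-models assign to that gene."""
    def combined(state):
        outputs = [bool(f(state)) for f in update_fns]
        if state[i]:
            return int(any(outputs))  # stays on unless all models turn it off
        return int(all(outputs))      # turns on only if all models agree
    return combined

# Two hypothetical sub-model rules for gene 0 in a three-gene system.
model_a = lambda s: s[1]           # x1
model_b = lambda s: s[1] and s[2]  # x1 AND x2
combined = combine([model_a, model_b], i=0)

print(combined((0, 1, 0)))  # models disagree, gene is off: stays off
print(combined((1, 1, 0)))  # models disagree, gene is on: stays on
```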
To find a stable state s = (v_1, …, v_n) of the resulting combined Boolean network, we encode the search as a Boolean satisfiability (SAT) problem: (f'_1(s) ↔ v_1) ∧ … ∧ (f'_n(s) ↔ v_n).
To simulate overexpression of gene x_i, we set the target function to the constant function f'_i(x) = 1. To simulate a knockout, we set it to the constant function f'_i(x) = 0.
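For small networks the stable-state condition and the two perturbations can be checked by exhaustive enumeration, which is used here in place of the SAT encoding purely for illustration. The function name `stable_states`, its parameters and the three-gene toy network are assumptions, not part of SCNS.

```python
from itertools import product

def stable_states(update_fns, overexpressed=(), knocked_out=()):
    """Enumerate all states s with f_i(s) == s[i] for every gene i.
    Overexpressed genes get the constant function 1, knocked-out
    genes the constant function 0. Exponential in the number of genes."""
    fns = list(update_fns)
    for i in overexpressed:
        fns[i] = lambda s: 1
    for i in knocked_out:
        fns[i] = lambda s: 0
    n = len(fns)
    return [s for s in product((0, 1), repeat=n)
            if all(fns[i](s) == s[i] for i in range(n))]

# Toy network (illustrative only): x0 and x1 sustain themselves and
# repress each other; x2 follows x0.
network = [
    lambda s: int(s[0] and not s[1]),
    lambda s: int(s[1] and not s[0]),
    lambda s: int(s[0]),
]
wild_type = stable_states(network)
knockout = stable_states(network, knocked_out=[0])
overexpression = stable_states(network, overexpressed=[0])
print(wild_type, knockout, overexpression)
```

Comparing the three lists shows how a perturbation removes or preserves stable states; a SAT or SMT solver becomes necessary once the number of genes makes enumeration infeasible.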
Software architecture and implementation
The architecture of SCNS is divided into two components: the backend and the frontend. The backend, which performs all computations necessary for the reconstruction and analysis of Boolean network models, is written in F# and makes use of the Z3 SMT solver [27].
The frontend, which implements the web-based graphical user interface and sends requests to the backend, is written in JavaScript/HTML and uses the Angular library [28].
Cloud computation is implemented using the MBrace library [29, 30].
SCNS runs on Windows, Linux and macOS, but cloud computation is currently supported only on Windows.
Configuration of parameters
In order to synthesise a matching Boolean network, SCNS requires the configuration of three parameters per gene. These are the maximum number of allowed activating inputs to the gene’s update function, the maximum number of allowed repressing inputs, and a threshold parameter. The threshold is a measure of how well a rule fits the data (higher is better).
In order to successfully find rules for each gene under consideration, it is often necessary to experiment with different parameter values. We recommend beginning with loose parameters (a larger number of activators and repressors, and a lower threshold). Once a matching logical rule has been found for a gene, tighten its parameters (lower the number of inputs and increase the threshold) and re-run. This can be repeated until all genes have a matching rule.
The tool and source code are available at [31], under an MIT open source license.