### Simulation

The EGF and NGF signaling network was simulated based on a previously described model [24]. The SBML format of this model (BIOMD0000000049, www.ebi.ac.uk, retrieval date Oct. 24, 2011) was imported into the Matlab Symbiology platform to simulate the dynamics of the signaling network using *ode15s* (stiff/NDF) solver. To introduce intra-subpopulation cell-to-cell variability (termed herein noise), for each run of the simulation we sampled the values for the total Raf, Mek and Erk levels from a Normal distribution *N*(*μ*,*σ*) around the respective initial concentration for a given set of fractional deviation from the mean (*σ*=*μ*·*f*
*d*, where *f*
*d*∈{0.1,0.2,0.3,0.4,0.5,0.6,0.7}). The values of *fd* represent here the degree of stochastic variance in the expression levels of Raf, Mek and Erk. Simulations were repeated 175 times with random sampling of total Raf, Mek and Erk levels to generate the data for each cell subpopulation. In each individual simulation repeat, the response of the network to EGF or NGF was simulated for 600 seconds after stimulation and the levels of c-Raf-Ras-GTP (hereafter referred as pRaf, reflecting the consequently activated Raf), ppMek and ppErk (the active, double phosphorylated, forms of Mek and Erk, respectively) were sampled every 1 minute as the observed parameters for the unmixing analysis. Mixtures containing two distinct cell subpopoulations were generated by mixing an equal number, unless indicated otherwise, of simulated observations obtained upon EGF and NGF stimulations. Mixtures containing four distinct cell subpopoulations were generated by altering the parameter in the SBML model corresponding to the catalytic activity (*k*
_{
cat
}) of Mek (J136) from its wild-type (Mek ^{wt}) value (*k*
_{
cat
} = 0.15 *s*
^{−1}) to a value depicting a mutant Mek (Mek ^{mut}) with a lower activity (*k*
_{
cat
} = 0.015 *s*
^{−1}). Thus, by having two different stimulations and two different levels of Mek activity, observations of four distinct cell subpopulations were generated: EGF-Mek ^{wt}, EGF-Mek ^{mut}, NGF-Mek ^{wt} and NGF-Mek ^{mut} (Additional file 1a-d).

### UNPBN

Methodologically, UNPBN is based on the nonparametric Bayesian networks (NPBN) approach [21]. It allows to avoid the assumption of underlying Gaussian distributions for the data and to find networks with nonlinear relations between the nodes. The UNPBN method combines a nonparametric mixture model incorporating the Dirichlet process [21, 25] and an allocation sampler [26, 27]. Prior to the description of the UNPBN approach a short introduction of GBNs [28] is provided here, as they are a basis of the presented method. We define the data *X*, consisting of *n* observations of a system/network with *d* species/nodes (\(X \in \mathbb {R}^{n \times d}\)), such that *x*
_{
j
} represents an *n*-dimensional vector containing the observed concentrations of the *j* species (*j*=1,…,*d*). In the Bayesian networks approach the relations between the nodes in a graph \(\mathcal {G}\) are modeled as conditional probability distributions (CPDs) *p*. If the CPDs for all nodes in \(\mathcal {G}\) are given by Normal distributions of the form \(x_{j}|{pa}_{\mathcal {G}}(x_{j})\sim N(\mu _{j}+\underset {\mathcal {K}_{j}}{\sum }\beta _{j,k} (x_{k}-\mu _{k}), {\sigma _{j}^{2}})\), where \({pa}_{\mathcal {G}}(x_{j})\) denotes the parents of node *x*
_{
j
}, \(\mathcal {K}_{j}=\{k|x_{k} \in {pa}_{\mathcal {G}}(x_{j})\}\), the *μ*
_{
j
} and \({\sigma _{j}^{2}}\) are the unconditional means and variances of *x*
_{
j
}, and *β*
_{
j,k
} are real-valued coefficients determining the influence of *x*
_{
k
} on *x*
_{
j
}, and, if in addition, \(\mathcal {G}\) is a directed acyclic graph (DAG) then the pair \((p,\mathcal {G})\) is called a GBN. The network structure is inferred using Gaussian distributions with a Normal-Wishart prior [20]. The estimation of \(\mathcal {G}\) is embedded in a Markov Chain Monte Carlo (MCMC) framework, conducted by maximizing the sampling distribution of the sampled graph

$$\begin{array}{@{}rcl@{}} L\left(\mathcal{G}| X\right) &=& \prod_{j=1}^{d} \int L\left({\sigma_{j}^{2}}, \beta_{j}| X^{\left(\{j\} \cup\mathcal{K}_{j}\right)}\right)p\left({\sigma_{j}^{2}}, \beta_{j}\right)d\sigma_{j} d \beta_{j}, \end{array} $$

with *β*
_{
j
}=(*β*
_{
j,1},…,*β*
_{
j,j−1}),(*β*
_{2},…,*β*
_{
d
})=*B* and \(X^{(\mathcal {J})}\) denotes the columns of *X* with indices in \(\mathcal {J}\). The MCMC algorithm uses so called single edge operations [29].

UNPBN generalizes the GBN approach as it is based on flexible nonparametric Bayesian mixture models for networks [21] which in turn combine different GBNs for different subsets of the data. The mixture is taken with respect to all parameters \((\mu, \sigma, B, \mathcal {G})\). The model for the data can be written as \(p(x) = \int p(x|\mu,\sigma, B,\mathcal {G})dP(\mu, \sigma, B, \mathcal {G})\) with *μ* and *σ* vectors of the unconditional means *μ*
_{
j
} and variances \({\sigma _{j}^{2}}\), respectively. The discrete mixing measure *P* is distributed according to \(\mathbb {P}\), a random probability measure, and \(p(x|\mu,\sigma, B,\mathcal {G})\) is a multivariate Normal distribution with a conditional independence structure compatible with \(\mathcal {G}\). According to the discrete nature of *P*, support points \(\mu _{h},\sigma _{h}, B_{h}, \mathcal {G}_{h}\) and probabilities *w*
_{
h
}, the mixture can be written as

$$\begin{array}{@{}rcl@{}} p(x) = \sum_{h=1}^{N} w_{h} \ p(x| \mu_{h}, \sigma_{h}, B_{h}, \mathcal{G}_{h}). \end{array} $$

The prior distribution of the mixing weights *w*
_{
h
} is assigned by *P* and the prior for \(\mu _{h},\sigma _{h}, B_{h}, \mathcal {G}_{h}\) is given by the base measure *P*
_{0} of \(\mathbb {P}\) for all *h*. The *N* different mixture components *h* can be interpreted as subpopulations in the data set. Accordingly, here such subpopulations are referred to as components. The assignment of each data point to its corresponding component is described by the allocation vector *l*=(*l*
_{1},…,*l*
_{
n
})^{′} [26].

The network structure \(\mathcal {G}\) and the allocation vector *l* are the main focus of our UNPBN procedure. The remaining parameters *μ*
_{
h
}, *σ*
_{
h
} and *B*
_{
h
} are integrated out and the MCMC algorithm iterates by updating the DAG \(\mathcal {G}\), the number of components *N* and the latent allocation vector *l*, leading to the posterior distribution

$$\begin{array}{@{}rcl@{}} p(l,\mathcal{G},N|X)= \prod_{h=1}^{N} L(\mathcal{G}| X_{(\mathcal{I}_{h})})p_{N}(m)p(N)p(\mathcal{G}), \end{array} $$

where \(L(\mathcal {G}| X)=\int L(\sigma, B|X)p(\sigma, B)d{\sigma }d{B}\) is the marginal sampling distribution for \(\mathcal {G}\), *p*
_{
N
}(*m*) is a probability distribution on the space of allocation vectors, *p*(*N*) is the distribution of the number of components and \(\mathcal {I}_{h}=\{i \in \{1,\:\ldots,n\}|l_{i}=h\}\) and \(X_{(\mathcal {I})}\) denotes the rows of *X* with indices in \(\mathcal {I}\).

In our UNPBN analysis a prior is needed for \( \theta _{h}=(\mu _{h}, \sigma _{h}, B_{h}, \mathcal {G}_{h})\) and for *w*
_{1},…,*w*
_{
N
}. For \(\mathcal {G}_{h}\) the prior which was used is uniform over the cardinality of the parent sets [30], for *σ*
_{
h
} and *B*
_{
h
} we employed the Normal-Wishart prior distribution with the identity matrix as the precision matrix and *d*+2 degrees of freedom. The mean vector of the multivariate Normal distribution (*μ*
_{
h
}) was chosen as a vector of zeros. For *N* we used a Poisson distribution with parameter *λ*=1 and the *w*
_{
h
} were obtained from a Dirichlet distribution with parameter vector (*α*,…,*α*) with *α*=1. Further details for the sampling distribution, posterior distribution and the MCMC sampling scheme were discussed in previous publications [21, 26, 31]. The approach is implemented in Matlab (R2009b, The MathWorks Inc., Natick, Massachusetts). The presented results are obtained from MCMC runs with 2.8·10^{6} iterations with a thinning of 350 and a burn in of 1.4·10^{6} iterations for networks with two subpopulations and from runs with 5·10^{6} iterations with a thinning of 500 and a burn in of 2·10^{6} iterations for networks with four subpopulations.

### Postprocessing of graphs

Although it is possible to use the output from the UNPBN analysis directly, for example to choose the most frequent DAG or allocation vector as a representative, it is preferable to perform an additional postprocessing step that takes into account all MCMC samples and improves the results considerably. The inferred graphs in the iterations of the MCMC simulation are stored in the form of adjacency matrices. Such a matrix *A* consists of the elements *a*
_{
i
j
}(*i*,*j*=1,…,*d*), *a*
_{
i
j
}=1 if nodes *i* and *j* are conditionally dependent (an arrow leading from node *i* to node *j*) and *a*
_{
i
j
}=0 if nodes *i* and *j* are conditionally independent (no arrow between them) or if *i*=*j*. For each pair of nodes the MCMC output of the UNPBN analysis can be summarized by the posterior edge probability \({pep}_{i j} = \sum _{s=1}^{r} a_{i j}^{s} / r\) where *s* is the index of the *r* iteration steps in the MCMC simulation. The resulting *pep* number ranges from 0 (i.e., strong evidence for the absence of a connection) to 1 (i.e., strong evidence for a connection between the corresponding nodes). These *pep* values are used in Fig. 5 for the presentation of the obtained results.

### Postprocessing of allocations

For the allocation vector, however, it is not possible to summarize the sampled vectors in the same way as for the edges, because of the so called label switching problem. During the sampling procedure the labels of the components change randomly, so that if two allocation vectors are compared it is not clear if a particular observation has been allocated to a different component or if the label of the component has changed. We employed a method based on maximizing the adjusted *Rand index* that bypasses this obstacle and that combines the allocation vectors of each MCMC iteration into one single vector [32]. This method is implemented in R [33] in the package ’mcclust’ and was used in cases where it was necessary to fix the number of components to a particular value (Fig. 4). In cases where the analysis is focused on the unmixing performance of UNPBN (Fig. 3), the sampled allocation vectors are evaluated regarding the homogeneity of the resulting components. For each entry \(l_{i}^{s,h}\) in the allocation vector sampled in iteration *s*, in each component *h*, the true component is determined by comparison with the simulation setting. Based on this, componentwise, observations originating from the same true component are considered as allocated correctly, (the indicator function \(I\left (l_{i}^{s,h}\right)\) is set to 1) while the remaining observations in that component are considered as wrongly allocated (\(I \left (l_{i}^{s,h}\right)=0\)). The percentage of correctly allocated observations (*pco*) for a particular UNPBN outcome is derived by

$$\begin{array}{@{}rcl@{}} \textit{pco} = \frac{1}{r} \sum^{r}_{s=1} \frac{1}{N^{s}} \sum^{N^{s}}_{h=1} \frac{1}{{n_{h}^{s}}} \sum^{{n_{h}^{s}}}_{i=1} I \left(l_{i}^{s,h}\right) \cdot 100 \end{array} $$

with *r* considered MCMC iterations, size \({n_{h}^{s}}\) of component *h* in iteration *s* and *N*
^{s} number of components in the allocation vector in iteration *s*.

### Cluster analyses

In order to compare UNPBN with clustering methods, k-means and hierarchical clustering were used in this work. The k-means clustering method finds the partition that divides the data to *n* clusters (where *n* is given by the user) such that the sum distances of all observations to the corresponding cluster mean is minimized [34]. The k-means cluster analysis was performed in Matlab, using the function “kmeans” with the distance parameter being set to squared Euclidean distance. To obtain stable results, the clustering was repeated 500 times with randomly chosen different starting points. Hierarchical clustering is an agglomerative procedure which merges in each step the two closest objects, repeatedly till the whole data set is in one single cluster. The hierarchical clustering was performed in Matlab using the functions “pdist”, with the distance parameter being set to Euclidean, followed by “linkage”, with the method parameter being set to inner squared distance (“ward”, [35]).

### Silhouette analysis

We used the average silhouette width (ASW) [36], to assess the quality of a given clustering and to compare the results of clusterings with different parameter settings. For a given clustering result, the silhouette value is calculated as

$$ sil(x_{i}) = \frac{b(x_{i})-a(x_{i})}{\max\{a(x_{i}), b(x_{i}) \}} . $$

For each observation, *x*
_{
i
}, *a*(*x*
_{
i
}) is the average dissimilarity between *x*
_{
i
} and all other data points within the same cluster, and *b*(*x*
_{
i
}) is the smallest average dissimilarity between *x*
_{
i
} and the data points in the remaining clusters, calculated for each cluster separately. Any measure of dissimilarity can be used, but distance measures are the most common. In this work the Euclidean distance was employed. The silhouette value is ranging between -1 and 1. Negative values indicate that a particular observation will fit better in another cluster, so it has been matched wrongly and the quality of the clustering result can be improved. High positive values indicate a good clustering result. The ASW is computed by averaging all *s*
*i*
*l*(*x*
_{
i
}) values, thus it provides an overall evaluation of the regarded clustering. While ASW enables to compare clustering performed with the same method with different parameters, ASW values are not comparable between different clustering methods.