### Materials

#### Datasets

The following datasets were used in the evaluation of this work:

##### SalPhase

A time-lapse movie acquired by phase-contrast optical microscopy, monitoring four single cells of *Salmonella enterica* serotype Typhimurium that divide to form three discrete micro-colonies (86 frames in total, 5 min sampling period, 1360x1024 pixel resolution; see [13] for more details). Hereafter we refer to this movie as "SalPhase" and, going from top left to bottom right, to the three colonies encountered as colony 1, 2 and 3, respectively. This dataset is provided as Additional file 2.

**Additional file 2:** SalPhase time-lapse movie. (MP4 2644 kb)

##### Multi-SalPhase

This time-lapse phase-contrast optical microscopy movie includes multiple growing micro-colonies of *S.* Typhimurium (101 frames, 5 min sampling period, 1360x1024 pixel resolution; see [13] for more details). This more complex movie contains overcrowded merging colonies, and the total number of cells exceeds three thousand in late frames. This dataset is provided as Additional file 3.

**Additional file 3:** Multi-SalPhase time-lapse movie. (MP4 4064 kb)

##### Individual frames

We have also analyzed several image frames of different imaging modalities, generated by different laboratories, that are publicly available. MicrobeTracker’s frame is a phase-contrast CLSM image (1344x1024 pixel resolution) of sparse *Escherichia coli* cells and micro-colonies available at MicrobeTracker’s webpage [25]. CellTracer’s frame contains a micro-colony of *E. coli* acquired by phase-contrast optical microscopy (901x689 pixel resolution); it was derived from a movie produced in [26] and provided at CellTracer’s webpage [27]. TLM-Tracker’s image contains a micro-colony of *Bacillus megaterium* acquired by bright-field optical microscopy (1300x1030 pixel resolution), generated in [28] and available at TLM-Tracker’s webpage [29]. Finally, Schnitzcells’ frame contains a micro-colony of *E. coli* acquired by phase-contrast optical microscopy (472x538 pixel resolution) and was derived from a movie provided at the Schnitzcells webpage [30]. Note that Schnitzcells can handle only uint16-format images, so each of the aforementioned datasets was also converted to this format to enable comparative evaluation of Schnitzcells against the rest of the methods.

#### Evaluation metrics

Comparative evaluation was performed based on metrics commonly used in pattern recognition: true positives (TP), i.e. actual cells that were correctly classified as cells; false positives (FP), i.e. artifacts that were incorrectly classified as cells; and false negatives (FN), i.e. actual cells that were missed. We remark that artifacts can be due either to noise or to fragments of over-segmented cells. Furthermore, for each method we computed the *True Positive Rate* (TPR) or recall

*TPR* = *TP*/(*TP* + *FN*),

which represents the percentage of the true cells found, the *Positive Predictive Value* (PPV) or precision

*PPV* = *TP*/(*TP* + *FP*),

which represents the probability that a detected cell is a true cell. The former metric is used to characterize the sensitivity and the latter the specificity of a method. Additionally, we computed the *F-measure* [31], i.e. the harmonic mean of *TPR* and *PPV* that is commonly used to assess the recall versus precision trade-off:

*F* = 2 ⋅ (*PPV* ⋅ *TPR*)/(*PPV* + *TPR*).
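The three metrics above can be computed directly from the raw counts. The following minimal Python sketch mirrors the formulas (function and variable names are ours, for illustration only):

```python
def evaluation_metrics(tp, fp, fn):
    """Compute recall (TPR), precision (PPV) and F-measure from raw counts."""
    tpr = tp / (tp + fn)             # fraction of true cells found
    ppv = tp / (tp + fp)             # probability a detected cell is a true cell
    f = 2 * ppv * tpr / (ppv + tpr)  # harmonic mean of precision and recall
    return tpr, ppv, f

# Example: 8 cells found, 2 artifacts detected as cells, 2 cells missed
tpr, ppv, f = evaluation_metrics(tp=8, fp=2, fn=2)
# tpr = 0.8, ppv = 0.8, f ≈ 0.8
```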

### Methods

The developed BaSCA analysis pipeline consists of five stages: image preprocessing, bacterial colony segmentation, single-cell segmentation, cell tracking and lineage tree construction, and single-cell attribute estimation and visualization. Specifically, we formed a computational pipeline focusing on detecting, segmenting and characterizing each colony and individual cell in the movie. Initially, we process the whole image and extract individual colonies. Then we analyze each colony into a partition of “objects” containing one or more cells. Gradually, we zoom in and reach the desired result: accurate single-cell boundary detection and cell feature estimation. Below, we describe each stage of the developed pipeline in detail.

#### Image preprocessing and colonies segmentation

Image preprocessing is a necessary initial step to suppress noise and correct for image background abnormalities (see Fig. 1(a-b)). We first apply Contourlet Transform based image denoising, as described in [32–35]. Then, we use Contrast-Limited Adaptive Histogram Equalization (CLAHE) [36] to enhance cell regions in the image and suppress any luminous local micro-colony background that exists in some imaging modalities (see Fig. 1(b)). As a result, we manage to remove noise and at the same time separate the background from the information-bearing cell regions.

After preprocessing, our colony segmentation method creates a binary mask used to separate each colony from the image background. We first apply a mathematical morphology operation (rolling ball method) [37] to estimate the image background. Then we use Otsu’s global thresholding [38] and Canny’s edge detection algorithms [39] to create a binary image (global background mask) outlining as precisely as possible the region of each colony, regardless of its size (see Fig. 1(c)).

After completing this task for each image frame, we track each colony and extract its properties (e.g. number of cells, total area, location of the colony’s centroid etc.) throughout the time-lapse movie. Specifically, in order to determine the correspondence of two colonies in two consecutive frames, we check whether the centroid of a colony in the previous frame lies inside the bounding box of a colony in the current frame, and if so the colony is matched. If a colony in the current frame matches no colony in the previous frame, we treat it as a new one, since a colony may enter the microscope’s field of view while the experiment is running. When two or more colony centroids of the previous frame lie inside the bounding box of the same colony in the current frame, the algorithm identifies that these colonies have merged. This capability is important since in a large bacterial community colonies can merge or move out of the field of view. Keeping track of colonies as they grow and merge is important not only for archiving their time-varying properties but also for knowing the colony from which each individual cell has emerged, i.e. the subpopulation to which it belongs, without using any fluorescent markers. To the best of our knowledge, our divide-and-conquer computational approach is the only one that can track multiple subpopulations (colonies) that may merge, in addition to single cells, in cell movies.
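The centroid-in-bounding-box matching rule can be sketched as follows (a simplified stand-in for the pipeline's colony tracker; all names are ours):

```python
def match_colonies(prev_centroids, curr_boxes):
    """For each current-frame colony, find previous-frame colonies whose
    centroid falls inside its bounding box: 0 matches -> new colony,
    1 -> continuation, 2+ -> the previous colonies have merged."""
    events = []
    for (xmin, ymin, xmax, ymax) in curr_boxes:
        inside = [i for i, (cx, cy) in enumerate(prev_centroids)
                  if xmin <= cx <= xmax and ymin <= cy <= ymax]
        if not inside:
            events.append(("new", []))
        elif len(inside) == 1:
            events.append(("continue", inside))
        else:
            events.append(("merge", inside))
    return events

prev = [(10, 10), (40, 12), (42, 14)]                       # centroids, frame t-1
curr = [(5, 5, 15, 15), (35, 8, 50, 20), (80, 80, 90, 90)]  # bboxes, frame t
# -> [('continue', [0]), ('merge', [1, 2]), ('new', [])]
```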

#### Single-cell segmentation algorithm

Having defined a colony’s region as accurately as possible, we can now “zoom in” and detect individual cells effectively. A visual overview of the whole segmentation approach is provided in Fig. 2.

##### From colonies to cell objects

Despite the applied preprocessing, each colony may still contain local background pixels due to large illumination variations (e.g. see Fig. 1(b)). To overcome this problem, we use adaptive thresholding [37] to separate cells from the non-uniformly illuminated background (see Fig. 1(d)). We remark that this type of background variability is typical for images acquired by optical microscopes (either bright field or phase contrast) but not for images acquired using confocal laser scanning microscopes (CLSM) [40]. Nevertheless, the adaptive thresholding method used here has no negative effect on CLSM images. However, as illustrated in Fig. 1(d), adaptive thresholding introduces "salt and pepper" noise, so to eliminate it we multiply the processed image by the global background mask. The result is an image with the global background removed and the local noise inside each colony suppressed (see Fig. 1(e) for the third colony of the SalPhase movie).

However, the individual cells inside the colony may not be fully separated (segmented) at this stage. Rather, we observe that the colony is partitioned into "cell objects", where each object is a set of cells "touching" each other (see Fig. 3(a) for examples). The union of these objects covers all cells in the colony and their pairwise intersections are empty (every single cell belongs to one and only one object). In the colony of Fig. 3(a) three objects have been colored for demonstration purposes, one green and two red. The green one is a so-called **collinear** object, while the two red ones are considered **complex** objects. To determine an object’s category (collinear vs. complex), we developed a divide-and-conquer approach in which we analyze each object individually. First, we compute the object's skeleton as presented in [37] (refer to Fig. 3(b-d)). Then, if the skeleton has "junction points" (i.e. pixels that have more than two pixel neighbors [37]), the object is considered complex. On the other hand, if we find no junction points, the object is either a single cell or a cascade of single cells (collinear object), see Fig. 3(c).
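The junction-point test can be sketched on a skeleton represented as a set of pixel coordinates (a toy stand-in for the skeletonization of [37]; names are ours):

```python
def classify_object(skeleton):
    """Classify a skeletonized object as 'complex' if the skeleton contains a
    junction point (a pixel with more than two 8-connected skeleton
    neighbors), otherwise as 'collinear' (a single cell or a chain of cells).
    `skeleton` is a set of (row, col) pixel coordinates."""
    for (r, c) in skeleton:
        neighbors = sum((r + dr, c + dc) in skeleton
                        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                        if (dr, dc) != (0, 0))
        if neighbors > 2:
            return "complex"
    return "collinear"

line = {(0, c) for c in range(5)}              # straight skeleton: collinear
cross = line | {(r, 2) for r in range(-2, 3)}  # plus-shaped skeleton: complex
```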

##### Collinear object analysis

In order to ascertain whether a collinear object corresponds to a single cell or a series of connected single cells (i.e. a sausage-like chain of cells), we developed a simple algorithm based on the observation that the shape of a bacterium at the fission stage [41] resembles that of a bow tie (refer to Fig. 4(a)). The algorithm searches for "bow tie points" by computing the Euclidean distances [42] of all pairs of diametric (antipodal) pixels lying on opposite boundaries with respect to the object’s centerline. Then, it identifies significant local minima (Fig. 4(c)) of the distance curve meeting the following criteria:

*localMin*/*leftLocalMax* ≤ *T* and *localMin*/*rightLocalMax* ≤ *T*

where *localMin* is the local minimum and *leftLocalMax* and *rightLocalMax* are the local maxima lying around it. Such local minima (if they exist) are called “deep valleys” (Fig. 4(c), indicated by red circles). Intuitively, *T* is the ratio of the minimum to maximum width (i.e. the distance of diametric pixels with respect to the object’s centerline, from the center of one pole semi-circle to the center of the other pole semi-circle) measured at the frame before the first cell division. The value of threshold *T* is automatically set by the pipeline and is usually in the range [0.65, 0.75]. When the algorithm identifies deep valleys, we split the object into discrete cells at these locations (marked with red dashed lines in Fig. 4(d)), which correspond to the deep valley points on the centerline (the x-axis in Fig. 4(c)) because the centerline pixels are ordered (see Fig. 4(b)). Otherwise, the collinear object remains intact and is considered a single cell. Finally, for each segmented cell, we identify its centroid by averaging its pixel coordinates (see Fig. 4(e)).
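A simplified one-dimensional version of the deep-valleys criterion, operating on a precomputed width profile, might look as follows (taking the nearest flanking local maximum on each side; this is our reading of the criterion, not the paper's code):

```python
def deep_valleys(widths, T=0.7):
    """Find 'deep valley' indices in a cell-width profile: local minima whose
    ratio to the nearest local maximum on each side is at most T."""
    n = len(widths)
    maxima = [i for i in range(1, n - 1)
              if widths[i] >= widths[i - 1] and widths[i] >= widths[i + 1]]
    valleys = []
    for i in range(1, n - 1):
        if widths[i] < widths[i - 1] and widths[i] < widths[i + 1]:
            left = [m for m in maxima if m < i]    # flanking maxima
            right = [m for m in maxima if m > i]
            if left and right:
                if (widths[i] / widths[left[-1]] <= T and
                        widths[i] / widths[right[0]] <= T):
                    valleys.append(i)
    return valleys

profile = [3.0, 4.0, 4.2, 2.0, 4.1, 4.0, 3.0]
# index 3 (width 2.0) is a deep valley: 2.0/4.2 and 2.0/4.1 are both <= 0.7,
# so the object would be split there into two cells
```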

##### Complex object analysis

Complex objects are treated differently than collinear objects. First, we use the watershed transform [43] to estimate how many cells may be "hidden" inside a complex object. We determine the centroids of the extracted watersheds and consider them initial estimates for the centroids of potential cells in the complex object (see Fig. 2(4a)). However, a well-known problem of the watershed transform is that it favors over-segmentation [44]. So, we apply the “deep valleys” algorithm again, but in a slightly modified way. This time we need to decide whether to merge erroneously over-segmented cell fragments, so as to avoid generating false positive cells. Fragments in the neighborhood of a given fragment are examined to determine whether they should be merged with it. Two touching fragments can be merged if there is no bow tie point between them, identified using the deep valleys criterion. If a potential cell (watershed fragment) can be merged with more than one neighboring cell, then the merging that leads to maximum solidity is chosen. Solidity is defined as the ratio of the resulting area to the resulting convex hull area, based on the conjecture that elementary objects, i.e. well-formed single cells, tend to have a solidity value close to one. Moreover, the new object (after the merging) is inserted into a processing queue in order to be further examined as to whether it should be merged with another watershed fragment. We call this iterative procedure the “puzzle solving” algorithm (see Fig. 2(4b)).
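The solidity-driven merge selection can be sketched in pure Python. Computing the hull over pixel corners (so a filled rectangle has solidity exactly 1) is our own convention for this toy example; the paper does not specify the discretization:

```python
def convex_hull_area(points):
    """Area of the convex hull of 2-D points (Andrew's monotone chain hull,
    then the shoelace formula)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0
    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and (
                (chain[-1][0] - chain[-2][0]) * (p[1] - chain[-2][1]) -
                (chain[-1][1] - chain[-2][1]) * (p[0] - chain[-2][0])) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]
    hull = half(pts) + half(pts[::-1])
    return 0.5 * abs(sum(hull[i][0] * hull[(i + 1) % len(hull)][1] -
                         hull[(i + 1) % len(hull)][0] * hull[i][1]
                         for i in range(len(hull))))

def solidity(pixels):
    """Ratio of pixel area to convex hull area; each pixel is treated as a
    unit square, so a filled convex region has solidity ~1."""
    corners = [(x + dx, y + dy) for (x, y) in pixels
               for dx in (0, 1) for dy in (0, 1)]
    return len(pixels) / convex_hull_area(corners)

def best_merge(fragment, neighbors):
    """Pick the touching neighbor whose union with `fragment` maximizes
    solidity -- the 'puzzle solving' selection rule."""
    return max(range(len(neighbors)),
               key=lambda k: solidity(fragment | neighbors[k]))

frag = {(x, y) for x in (0, 1) for y in (0, 1)}       # 2x2 fragment
inline = {(x, y) for x in (2, 3) for y in (0, 1)}     # collinear neighbor
diagonal = {(x, y) for x in (2, 3) for y in (2, 3)}   # diagonal neighbor
# best_merge(frag, [inline, diagonal]) -> 0: the collinear union is a solid
# rectangle, while the diagonal union leaves hull area uncovered
```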

In order to outline the detected cells as accurately as possible and improve the segmentation result, we apply a machine learning method based on Gaussian Mixture Modeling (GMM) [45]. In [45], we showed how to transform image pixel intensities into a properly constructed set of data points before applying GMM to identify protein spots in a 2D electrophoresis (2DGE) gel image. In our case, pixel intensities provide no information about cell structure since they follow a uniform distribution (see Fig. 5(a)). So we use the minimum distance from the cell boundary (Euclidean distance transform [46], see Fig. 5(b)) as the basis for data point generation. Specifically, we consider each pixel of the object to act as a data point generator in its neighborhood (refer to Fig. 6). The total number of data points, *N*, to be generated by random sampling and used to represent each object is proportional to the number *C* of its estimated cell centroids. These *N* data points are apportioned to the pixels of the object according to each pixel’s Euclidean distance from the object’s boundary, meaning that more internal pixels are allowed to “throw” more data points into their neighborhood. Under this data generation scheme, pixels closer to the object’s centerline, which belong to a cell with higher probability than the more distant ones, contribute more data points.

We may think of this process (moving from pixel intensities to data points) as a reverse engineering step where the resulting data points best represent the bacterial shape. Specifically, a pixel *i* located at coordinates (*x*_{i}, *y*_{i}) with distance *d*_{i} acts as a data generator using a 2-D Gaussian model *N*(*μ*, *Σ*) with center *μ* = (*x*_{i}, *y*_{i}). Finally, we get a GMM [47] having as many Gaussian components as the number of pixels *M* in the object. Each component has a mixing coefficient proportional to its distance value and equal to:

$$ \pi(i)= d_i\Big/\sum_{j=1}^{M} d_j. $$

So for each pixel *i* of the object we draw *N*⋅*π*(*i*) data points from a 2-D Gaussian distribution centered at the pixel’s location *μ* = (*x*_{i}, *y*_{i}) with diagonal covariance matrix *Σ* having its elements set to 0.3. The variance value (0.3) was selected to be smaller than 0.5 (half the distance between neighboring pixels) in order to ensure that the data points generated by the model (representing “cell structure”) are distributed so that their abundance reflects the distance value, preventing the generation of “hills” of data points in-between pixel locations. This variance value was determined by experimentation and is kept fixed throughout the analysis, irrespective of the image modality.
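A stdlib-only sketch of this data generation scheme (allocating per-pixel counts by rounding *N*⋅*π*(*i*) is our simplification; the paper does not specify the rounding):

```python
import math
import random

def generate_data_points(pixels, distances, n_total, variance=0.3):
    """Turn an object's pixels into a 2-D point cloud: each pixel i emits
    roughly n_total * pi(i) points, pi(i) = d_i / sum_j d_j, drawn from a
    Gaussian centered at the pixel with diagonal covariance variance * I."""
    total = sum(distances)
    sigma = math.sqrt(variance)
    points = []
    for (x, y), d in zip(pixels, distances):
        count = round(n_total * d / total)   # mixing coefficient pi(i)
        for _ in range(count):
            points.append((random.gauss(x, sigma), random.gauss(y, sigma)))
    return points

# Toy object: 3 centerline pixels; the middle one is deepest (d = 2.0),
# so it emits half of the points.
pts = generate_data_points([(0, 0), (1, 0), (2, 0)], [1.0, 2.0, 1.0], 100)
```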

As mentioned before, the number *N* of data points generated for a complex object is proportional to the number *C* of estimated candidate cell centers it may contain. In particular we use

$$ N=\frac{1}{2}\, C\, \frac{CellLength \times CellWidth}{CalFactor^{2}}, $$

where *CellLength* is the expected cell length and *CellWidth* is the expected cell width of the species under examination; both can be estimated from the literature. For example, for *S.* Typhimurium the width (diameter) range considered is [0.7, 1.5] μm (so we chose *CellWidth* = 1.1 μm) and the length range is [2, 5] μm (so we chose *CellLength* = 3.5 μm), based on the literature [48]. Let us remark that even if these cell size parameter values are overestimated, the approach suffers only a performance loss (e.g. by using a larger than needed *N* value). Moreover, *CalFactor* is the spatial calibration factor of the experiment’s microscope, used to convert the size of an image object from pixels to physical units (μm).
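As a worked example of the formula, using the *S.* Typhimurium cell dimensions from the text (the calibration factor of 0.1 μm/pixel is hypothetical, since the paper does not state one):

```python
def num_data_points(C, cell_length, cell_width, cal_factor):
    """N = (1/2) * C * (CellLength * CellWidth) / CalFactor^2 -- the second
    factor converts the expected single-cell area from um^2 to pixels."""
    return 0.5 * C * (cell_length * cell_width) / cal_factor ** 2

# CellLength = 3.5 um, CellWidth = 1.1 um (from the text);
# CalFactor = 0.1 um/pixel is a hypothetical calibration.
N = num_data_points(C=6, cell_length=3.5, cell_width=1.1, cal_factor=0.1)
# N ≈ 1155 data points for a complex object with 6 candidate centers
```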

Using an *N* value proportional to *C* was a deliberate choice, because the number of identified candidate centers can act as an object complexity indicator, approximating the number of single-cell structures expected to be present in a complex object's region. So, if a complex object contains many candidate centers, it is justified to “throw” more data points (use a larger *N*) in order to capture adequately the underlying structure of the different single cells included in it. As shown in Fig. 6, the generated data set for the object with the six candidate centers (Fig. 6(c)) "spent" fewer data points (Fig. 6(d)) than the object with the 75 candidate centers (see Fig. 6(a) and (b)).

After generating the data points we associate each one with its closest center using the Euclidean distance [42] and nearest neighbor classification [49]. Starting with this initial assignment, we can determine the initial parameters of a 2-D GMM having *C* components (as many as the estimated cell centroids in the object) and compute the log-likelihood of each data point under this initial mixture model:

\( \log \mathrm{p}\left(\mathbf{y}^{(i)}\,|\,\varTheta\right)= \log \sum_{m=1}^{C} w_m\, \mathrm{p}\left(\mathbf{y}^{(i)}\,|\,\theta_m\right) \),

where *Θ* ≡ {*θ*_{1}, …, *θ*_{C}, *w*_{1}, …, *w*_{C}} is the complete set of mixture model parameters and *w*_{m} are the mixing coefficients.
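The per-point log-likelihood can be sketched for a 2-D GMM; for brevity this toy version assumes a shared isotropic covariance (the pipeline's actual components have general diagonal covariances):

```python
import math

def log_likelihood(y, weights, means, var):
    """log p(y | Theta) = log sum_m w_m N(y; mu_m, var*I) for a 2-D GMM
    with shared isotropic covariance var (a simplifying assumption)."""
    mix = 0.0
    for w, (mx, my) in zip(weights, means):
        sq = (y[0] - mx) ** 2 + (y[1] - my) ** 2
        mix += w * math.exp(-sq / (2 * var)) / (2 * math.pi * var)
    return math.log(mix)

# Sanity check: one component, weight 1, centered on the data point, unit
# variance -- the log-likelihood reduces to log(1 / (2*pi)) ≈ -1.8379.
ll = log_likelihood((0.0, 0.0), [1.0], [(0.0, 0.0)], 1.0)
```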

The next step is to apply Finite Mixture Modeling (FMM) [50] in two dimensions (see Fig. 7). An important issue in mixture modeling is the selection of the proper number of components. With too many components the mixture may overfit the data, while with too few it may not be flexible enough to capture the true underlying structure [50, 51]. In our case, this would translate to a solution with either more or fewer components than the actual number of cells present inside a complex object. To overcome this difficulty, we apply a modified Expectation Maximization (EM) algorithm [50] which also employs the Minimum Message Length (MML) criterion [51] for best model selection. However, since EM with MML is computationally expensive, we first apply the "puzzle solving" algorithm to estimate the initial number of components properly, i.e. start the iterative algorithm at a number not too far from that of the best model. Using the MML criterion ensures that the best model will not end up being an unnecessarily complex one unless the extra complexity pays for itself. Therefore, the best model may end up containing fewer than the initial *C* components (see Fig. 7). At the end, it is possible that some circular object fragments still remain isolated and not merged with the cell they really belong to, especially when analyzing low-resolution cell movies. To overcome this difficulty, we check whether a fragment’s area is under a pre-specified threshold

$$ A=\frac{1}{4}\pi \cdot {\left(\frac{CellWidth}{CalFactor}\right)}^2 $$

determined automatically by the pipeline using the pre-specified parameters. Fragments with area less than *A* are merged with the "touching" neighbor that maximizes the resulting object's solidity after the merging (as in the "puzzle solving" step of the analysis); otherwise they remain intact and are considered undersized single cells. Intuitively, considering a cell's projection on the plane as a curved parallelogram with two attached semi-circles (poles), its area must be at least the area of the two cell poles, i.e. the area of a circle with radius equal to one half of the expected cell width, as in *A*.
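The area threshold and the resulting keep/merge decision amount to a few lines (the 0.1 μm/pixel calibration is again a hypothetical value used only for the example):

```python
import math

def min_cell_area_px(cell_width, cal_factor):
    """A = (1/4) * pi * (CellWidth / CalFactor)^2: the combined area of the
    two semi-circular poles of a minimal cell, in pixels."""
    return 0.25 * math.pi * (cell_width / cal_factor) ** 2

def handle_fragment(area_px, cell_width, cal_factor):
    """Fragments smaller than A are merge candidates; larger ones are kept
    as (possibly undersized) single cells."""
    if area_px < min_cell_area_px(cell_width, cal_factor):
        return "merge"
    return "keep"

# With CellWidth = 1.1 um and a hypothetical CalFactor = 0.1 um/pixel,
# A ≈ 95 pixels: a 50-pixel fragment is merged, a 200-pixel one is kept.
```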

#### Lineage tree construction

A cell in a time-lapse movie frame can: 1) proliferate, 2) divide, and 3) die or disappear from the microscope’s field of view. To construct the lineage of a single cell, we first have to segment the cells of each colony efficiently and then track the colonies and their cells along consecutive frames. To track cells in overcrowded bacterial time-lapse movies, we developed an algorithm inspired by motion estimation techniques used for video compression [52], which also follows a divide-and-conquer strategy. Our cell tracking algorithm will not be discussed here since it has been published in [53].

#### Single-cell and colony properties extraction

For each colony, we compute growth curves in terms of the colony’s area (either in pixels or in micrometers) or in terms of cell population (cell counts). In addition, we extract numerous properties at the single-cell level. We divide single-cell properties into two categories: cell *attributes* and cell *life attributes*. *Attributes* are cell properties changing with time, characterizing each cell at each frame of the time-lapse movie. For example, cell attributes that we track are: area, fluorescent protein quantity (if any), major axis length (length), minor axis length (width), distance from the colony’s centroid etc. On the other hand, cell *life attributes* are statistics of attributes (e.g. min, max, average, median, standard deviation) characterizing a cell’s whole life "trajectory", i.e. from its birth to its division. In addition, elongation (i.e. how much a cell has changed length before it divides) and division time (i.e. the duration of a cell’s life) are life attributes too. Additionally, given an attribute’s evolution (time series) we can estimate life attributes, e.g. an individual cell’s elongation rate, by fitting an exponential model to the cell’s trajectory. All these properties are estimated and exported (stored) to a bio-database for each individual cell and time frame in the movie. Moreover, several useful visualizations can be produced by displaying the computed cell attribute values directly, using color, on top of the input time-lapse movie (refer to Additional files 4, 5 and 6) or on top of a colony’s lineage tree or division tree, as demonstrated in the Results and Discussion section.

**Additional file 4:** Computed cell length overlaid on SalPhase time-lapse movie. (MP4 792 kb)

**Additional file 5:** Multi-SalPhase time-lapse movie; cell length visualization after segmentation. (MP4 6740 kb)

**Additional file 6:** SalPhase time-lapse movie; color indicates cell generation index as determined by the cell tracking algorithm. (MP4 516 kb)

#### Implementation

The pipeline of algorithms presented here has been coded in Matlab version R2015b [54]. We used extensively the image processing, statistics and machine learning, wavelet, and curve fitting Matlab toolboxes. All presented results were obtained using a desktop computer with an Intel Core i5-3350 processor running at 3.10 GHz and 8 GB RAM, under the Microsoft Windows 7 Professional operating system. The current implementation is not optimized for performance; nevertheless, a discussion of the trends of its running time is provided in Supplementary Material section 5 (see Additional file 1). Because the divide-and-conquer strategy breaks large problems into smaller subproblems, we expect that parallel processing can significantly accelerate the analysis. All post-processing capabilities are also implemented in Matlab, so as to enable users to interact with the produced data using simple function calls. The authors' intention is to complete the software integration and performance optimization, add a proper user interface, and then make the software accessible to the systems biology community. The emphasis of this methodology paper is on the innovative methods integrated in the BaSCA pipeline and not on the software architecture and implementation.