
Site-specific recombinatorics: in situ cellular barcoding with the Cre Lox system



Cellular barcoding is a recently developed biotechnology tool that enables the familial identification of progeny of individual cells in vivo. In immunology, it has been used to track the burst-sizes of multiple distinct responding T cells over several adaptive immune responses. In the study of hematopoiesis, it revealed fate heterogeneity amongst phenotypically identical multipotent cells. Most existing approaches rely on ex vivo viral transduction of cells with barcodes followed by adoptive transfer into an animal, which works well for some systems, but precludes barcoding cells in their native environment such as those inside solid tissues.


With a view to overcoming this limitation, we propose a new design for a genetic barcoding construct based on the Cre Lox system that induces randomly created stable barcodes in cells in situ by exploiting inherent sequence distance constraints during site-specific recombination. We identify the cassette whose provably maximal code diversity is several orders of magnitude higher than what is attainable with previously considered Cre Lox barcoding approaches, exceeding the number of lymphocytes or hematopoietic progenitor cells in mice.


The high diversity and in situ applicability of the proposed Cre Lox based tagging system make it suitable for whole tissue or even whole animal barcoding. Moreover, it can be built using established technology.


The fate of the progeny of two seemingly identical cells can be markedly distinct. Well studied examples include the immune system and hematopoietic system, for which the extent of clonal expansion and differentiation has been shown to vary greatly between cells of the same phenotype [1–4]. Fate and expression heterogeneity at the single-cell level are also apparent in other systems including the brain [5–7] and cancers [8–10]. Whether this heterogeneity is due to the stochastic nature of cellular decision making, reflects limitations in phenotyping, is caused by external events, or a mixture of effects, is a subject of active study in several fields [11–13]. As addressing this pivotal question through population-level analysis is not possible, experimental tools have been developed that facilitate monitoring single cells and their offspring over several generations.

Long-term fluorescence microscopy represents the most direct approach to assess fate heterogeneity at the single-cell level. Studies employing that technique are numerous [14–20], and have revealed, among many other significant observations, that although the fates of stimulated B cells are heterogeneous, there exist strong correlations at the clonal level in terms of differentiation and death versus division fates [15, 21]. Filming and tracking of cell families in vitro remains technically challenging, is labor intensive, and only partially automatable [22, 23]. Despite significant advances in the field, continuous tracking in vivo is confined to certain tissues, and time windows of up to twelve hours for slowly or non-migrating cells.

A radically different approach to long-term clonal monitoring is to mark single cells with unique DNA tags via retroviral transduction, a technique known as cellular barcoding [2, 10, 24–27]. As tags are heritable, clonally related cells can be identified via DNA sequencing. By tagging multi-potent cells of the hematopoietic system and adoptively transferring them into irradiated mice, the contribution of single stem cells to overall hematopoiesis has been quantified [24–26, 28]. Amongst other discoveries, this has revealed statistically consistent heterogeneity in the collection of distinct cell types produced from apparently equi-potent progenitors [29–31]. Current barcoding techniques are unsuitable for tagging cells in vivo, and typically require ex vivo barcoding followed by adoptive cell transfer [26]. This restricts their scope to cells suitable for adoptive transfer, such as hematopoietic stem and progenitor cells, naive lymphocytes, and cancer cells.

Ideally, a cellular barcoding system would inducibly mark cells in situ in their native environment, would be non-toxic, permanent and heritable, barcodes would be easy to read with a high-throughput technique, and the system would enable labeling large numbers of cells with unique barcodes. Recently, two studies have been published that address some of these points. Sun et al. [32] employed a Dox inducible hyperactive form of the Sleeping Beauty transposase to genetically tag stem cells in situ, and followed clonal dynamics during native hematopoiesis in mice. In that system, tags consist of a random insertion site of an artificial transposon, which upon withdrawal of Dox is relatively stable. A second in vivo cellular barcoding system based on site-specific DNA recombination with the Rci invertase was implemented by Peikon and co-workers [33–35]. Inspired by the brainbow mouse [36], this system induces a random barcode by stochastically shuffling a synthetic cassette pre-integrated into the genome of a cell. The authors predicted high code diversity from relatively small constructs (approx. 2 kb) and demonstrated feasibility of random barcode generation in Escherichia coli [35].

Each of those approaches provides elegant advances on shortcomings of previous systems by generating largely unique tags without significant perturbation to the system of interest, but some difficulties remain. For barcode readout, the method in [32] requires whole-genome amplification technology and three arm-ligation-mediated PCR to efficiently amplify unknown insertion sites. Furthermore, the random location of the transposon may impact behavior of some barcoded clones and thus lead to biased data. Moreover, some background transposon mobilization was detected in certain cell types, subverting the stability of the barcodes. The Rci invertase based system remains to be implemented in cells other than bacteria. Similar to the Sleeping Beauty transposase, the method requires tight temporal control over Rci expression to make codes permanent.

In the present article, we consider the Cre Lox system as a driver to induce in situ, from a series of tightly spaced Lox sites, large numbers of distinct, permanent, randomly determined barcodes. In contrast to the Brainbow construct [36, 37], which relies on overlapping pairs of incompatible Lox sites to recombine randomly into one of several stable DNA sequence configurations, our design exploits constraints on the distance between Lox sites that arise during DNA loop formation, a prerequisite for site-specific recombination [38–40]. This known feature has not previously been exploited, but is a crucial design element for obtaining high barcode diversity. First, by allowing repeated usage of the same Lox site, code diversity is solely restricted by cassette size and not, as in the Brainbow construct, by the relatively small set of non-interacting Lox sites [41]. Second, for a design without distance constraints, the diversity of stable barcodes creatable with the Cre Lox system is of order \(O(n)\) at best, where n is the number of Lox sites [35], whereas with distance constraints, optimal barcode diversities of order \(O(n^{3})\) are possible. As will be shown in this article, boosting this scaling with the four incompatible Lox sites that have been reported in the literature [41], \(10^{12}\) distinct codes of about 600 bp each can be generated from a genetic construct as small as 2.5 kb. In combination with the CreEr system [42], this is sufficient to inducibly barcode label e.g. all naive CD8 T cells in a mouse [43]. Desirable features are inherently part of the Lox barcode cassette design, including: short and stable barcodes; a single barcode per cell; and robust read-out.

Cre Lox biology

Before introducing the Lox barcode cassette, we revisit some facts about Cre Lox biology [44, 45]. Cre is a bacteriophage P1 recombinase that catalyzes site-specific recombination between Lox sites. A Lox site is a 34 bp long sequence composed of two 13 bp palindromic flanking regions and an asymmetric 8 bp core region (Fig. 1 a). For recombination to occur, four Cre proteins bind to the four palindromic regions of two Lox sites and form a synaptic complex. A first pair of strand exchanges leads to a Holliday junction intermediate [46]. Isomerization of the intermediate then allows a second pair of strand exchanges, and formation of the final recombinant product [40]. The DNA cleavage site is situated in the asymmetric core region. If the Lox sites are on the same chromosome, their interaction requires formation of a DNA loop. If they have the same orientation (direct repeats), recombination results in excision of the intervening sequence, and this reaction is essentially unidirectional [47]. If Lox sites are in the opposite orientation to each other (inverted repeats), the sequence between the sites is inverted, becoming its reverse complement (Fig. 1 b). In the absence of Cre, Lox-Lox recombination events are below detection limits (e.g. [37], Fig. S1). Due to compatibility with eukaryotes, the Cre Lox system has become an essential tool in genetic engineering and a large array of transgenic mouse models with inducible cell-type specific expression of Cre have been created [42].

Fig. 1

Lox biology and Lox barcode cassette. a Lox DNA sequences. Lox sites are composed of two 13 bp palindromic Cre binding sites and an 8 bp core (original LoxP sequence shown). Asymmetric cleavage sites in the core are indicated by arrows. b Cre mediated site-specific excision and inversion of a sequence with a minimum of 82 bp between two Lox sites on the same chromosome [38]. If Lox sites are oriented in the same direction, productive recombination excises the sequence, while if they are oriented in opposite direction the sequence is inverted (i.e., the reverse complement). c An alternating Lox cassette with 13 elements of size 7 bp. To illustrate how barcodes are generated, two excision and one inversion event are shown that create from the initial cassette a size-stable barcode with three random elements. Pairs of interacting Lox sites are indicated by a, b, and c. Elements affected by recombination have colored background. The barcode with three elements is size-stable as Lox sites oriented in the same direction (arrows) are closer than the minimal Lox interaction distance of 82 bp, precluding further excision. d Four concatenated alternating Lox cassettes of length 13 elements each with poorly-interacting Lox site variants [41] result in a code diversity greater than \(10^{12}\)

In in vitro trials with Cre mediated Lox reactions, a sharp decrease in recombination efficiency has been observed when the sequence separating two Lox sites is less than 94 bp [38]. Recombination is still detectable at low levels at 82 bp, but not at 80 bp where DNA stiffness appears to prevent DNA loop formation, and as a consequence Lox site interaction. For the distinct, but similar, Flp/FRT system this minimal distance was established to be smaller in vivo, with interactions still possible at 74 bp [39]. The existence of a minimal distance is one of the key features that we will exploit to make random barcodes stable, but in our proposed design it will only prove necessary for it to be greater than 44 bp.

Lox barcode cassettes

In full generality, a Lox barcode cassette is a series of Lox sites interlaced with n distinguishable DNA code elements of size m bp each. On Cre expression, code elements change orientation and position, or are excised [34]. Through Cre mediated excision, the number of elements eventually decreases until reaching a stable number (Fig. 1 c). Sequences that have attained a stable number of code elements form size-stable barcodes. A cassette’s code diversity is the number of size-stable barcodes that can be generated from the cassette via site-specific recombination.

Our main result is a robust Lox cassette design that provably maximizes code diversity for any given cassette length n and element length m≥5 bp. The design is robust to both sequencing errors and to the minimal interaction distance between Lox sites. The analysis that leads us to the design is provided in the “Optimal design” Section. The identification of code element sequences that avoid misclassification due to sequencing mismatch errors then follows. Finally, probabilistic aspects of code generation from an optimal barcode cassette are explored via Monte Carlo simulation. Lox cassettes with code elements of size 4 bp, higher order Lox interactions, the impact of transient Cre activation, and distance-dependent Lox-Lox complex formation are considered in the discussion.


A robust cassette design that maximizes code diversity

The optimal design will prove to have the orientation of both the outermost, and any two consecutive, Lox sites inverted (Fig. 1 c). Code elements between Lox sites are of size longer than four bp, but shorter than 24 bp. The lower limit ensures that elements can be chosen sufficiently distinctly to correct for at least two sequencing errors per element. Due to the minimal Lox interaction distance, the upper limit is necessary to ensure that barcodes with three code elements are size-stable.

The barcode diversity for this cassette design with n code elements under constitutive Cre expression will, as shown in the Optimal design Section, transpire to be

$$\begin{array}{*{20}l} \frac{(n+1)(n-1)^{2}}{2}+(n+1) =O(n^{3}), \end{array} \tag{1} $$

which is maximal for code elements that are larger than four base pairs.

A good compromise between cassette length, robustness to sequencing errors and barcode diversity is given by an alternating Lox cassette with 13 elements of length 7 bp each as shown in Fig. 1 c. The cassette is initially 567 bp long and, after excisions and inversions, generates size-stable barcodes that are composed of either a single element or three elements, with lengths 75 bp and 157 bp respectively, including remaining inactive Lox sites. This generates a code diversity of 1022 barcodes, far less than the \(3\times 10^{9}\) base pairs of the mouse genome, i.e. the maximal theoretical diversity achievable by the Sleeping Beauty transposase barcoding system [32].

However, concatenating four such cassettes with poorly-interacting Lox variants (e.g. LoxP, Lox2272, Lox5171 and m2 [41], Fig. 1 d) yields a size-stable code diversity of \(1022^{4}\approx 10^{12}\). In mice, this is sufficient to tag all CD8 T cells [43] or all nucleated cells in the bone marrow [48].
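The lengths and diversities quoted for the 13-element cassette can be reproduced with a few lines of arithmetic. The sketch below assumes only what the text states (34 bp Lox sites, n + 1 Lox sites around n elements, and the diversity formula for odd n); the function names are illustrative and not part of the original work.

```python
LOX_BP = 34  # length of one Lox site in bp, as stated in the text


def cassette_length(n: int, m: int) -> int:
    """Length in bp of n code elements of m bp interlaced with n + 1 Lox sites."""
    return (n + 1) * LOX_BP + n * m


def code_diversity(n: int) -> int:
    """Size-stable barcodes under constitutive Cre: (n+1)(n-1)^2/2 + (n+1).

    The formula is stated for alternating cassettes with inverted flanking
    sites, which requires an odd number of elements.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd for an alternating cassette with inverted flanking sites")
    return (n + 1) * (n - 1) ** 2 // 2 + (n + 1)


if __name__ == "__main__":
    n, m = 13, 7
    print("initial cassette (bp):", cassette_length(n, m))
    print("one-element barcode (bp):", cassette_length(1, m))
    print("three-element barcode (bp):", cassette_length(3, m))
    print("diversity of one cassette:", code_diversity(n))
    print("diversity of four cassettes:", code_diversity(n) ** 4)
    print("four concatenated cassettes (bp):", 4 * cassette_length(n, m))
```

Raising the single-cassette diversity to the fourth power treats the four cassettes as recombining independently, which is the premise behind using poorly-interacting Lox variants.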

A practical implementation

To implement Cre Lox barcoding in the mouse, one could cross mice generated from embryonic stem cells that had previously been transduced with the concatenated Lox barcoding cassettes described above (2268 bp) onto a Tamoxifen inducible cell-type specific CreEr expressing background [42]. A barcoding experiment would then be initiated by administering Tamoxifen to the animal, which activates Cre and generation of a barcode (≤628 bp) in each cell where Cre becomes active. Some time after activation, cells of interest would be harvested and sorted for specific phenotypes, and sequenced using a next generation sequencing platform that allows read-lengths >600 bp. Cells originating from the same progenitor alive at the time of tamoxifen administration would carry the same barcode. This information would then be used for inference on, for example, lineage pathways and clonal fate tracking. To identify frequent barcodes that are to be discarded in the analysis (see the Barcode distribution is heterogeneous Section), in a control experiment large numbers of cells would be harvested shortly after tamoxifen administration and sequenced.

Optimal design

A simple upper bound on the barcode diversity of k elements from a cassette initially containing n elements is the number of possible outcomes when choosing k from n elements in arbitrary order and orientation:

$$\begin{array}{*{20}l} {n \choose k} k! 2^{k} = \frac{2^{k}n!}{(n-k)!}. \end{array} $$

Although loose, it will become clear that it captures the dominant growth, \(O(n^{k})\), indicating the importance of k in generating barcode diversity and motivating a closer look at how cassette designs influence it.
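As a quick numerical illustration of this loose bound, the following sketch evaluates it for a 13-element cassette. The function name is an illustrative choice; only the formula itself comes from the text.

```python
from math import comb, factorial


def loose_upper_bound(n: int, k: int) -> int:
    """Ways to pick k of n elements in arbitrary order and orientation."""
    return comb(n, k) * factorial(k) * 2 ** k


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, loose_upper_bound(13, k))
```

For n = 13 and k = 3 the bound evaluates to 13728, roughly an order of magnitude above the 1022 size-stable codes that the optimal cassette actually produces, which is why the refined, structure-aware bound is needed.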

For what follows, we introduce some terminology: a cassette is alternating if the orientation of any two consecutive Lox sites is inverted (Fig. 1 c); outermost Lox sites are termed flanking Lox sites; and flanking sites are direct or inverted if they have the same or opposite orientation, respectively.

Code diversity is determined by code element length and orientation of flanking sites

Cre recombination requires a minimal distance between the interacting Lox sites. In what follows we assume that the minimal distance for Lox interaction is 82 bp, but our results will be robust for any minimal interaction distance greater than 44 bp.

To understand how a minimal Lox-Lox interaction distance and cassette design determine size-stable barcodes and code diversity, we start with the simplest case, a barcode with a single code element (Fig. 2 a). If the code element is less than 82 bp, the barcode is size-stable irrespective of the orientation of its flanking sites. If the element is larger than 82 bp, the code is only size-stable if the flanking sites are inverted as excision will remove the element.

Fig. 2

Barcode stability and code diversity. a The size-stability of barcodes with a single element depends on the length of the sequence between the flanking sites and their relative orientation. b Critical distances of barcodes of different sizes from a cassette with inverted flanking sites. The dotted line shows the critical distance if flanking sites are oriented in the same direction. c Stability of barcodes from 2 to 5 elements for a Lox barcode cassette with inverted flanking sites. If the critical distance surpasses the minimal distance of 82 bp, stable codes (green) become unstable (gray). Barcodes of size three and four are unstable if m≥24 and m≥5 respectively, while codes of size five are always unstable. The gray interval illustrates potential uncertainty in the estimate of the minimal interaction distance. d Sequences between Lox cleavage sites represent the fundamental building blocks of the Lox barcode cassette. There are two types of blocks with inverted Lox repeats (red, green) and two with direct Lox repeats (blue). In the example, code elements are of size 7 bp and N denotes an arbitrary base. e For a cassette with inverted flanking sites pointing at each other and 5≤m<24, there are four possible block compositions \(\{k_{r},k_{g},k_{b}\}\): two for barcodes of size three (three: {1,0,2} and one: {2,1,0}), one for barcodes of size two (two: {1,0,1}) and one for barcodes of size one (one: {1,0,0}). f Operations (moves) on odd and even elements possible via Cre Lox recombination, in cassettes with five, six and seven elements. Each move inverts the orientation of the respective element

For a barcode with two elements, the sequence between the flanking sites contains an additional element and a Lox site (34 bp), giving a sequence of 2m+34 bp. If the flanking sites have the same orientation, the barcode is size-stable if 2m+34<82 bp, hence if m<24 bp. If they are in opposite orientation, excisions can only occur if flanking sites interact with the middle Lox site, and m<82 bp is sufficient for stability (Fig. 2 b). For given m, in general if there exists a barcode of size k with direct flanking sites, a barcode with k+1 elements is possible that has inverted flanking sites. Thus m and the orientation of the flanking sites are critical features that determine the maximum k.

In Fig. 2 c, the stability of barcodes with k ∈ {2,3,4,5} is shown as a function of m for a cassette with inverted flanking sites. The stability depends on a critical distance, i.e., the largest distance between two Lox sites in the barcode that is, or can be brought into, the same orientation via recombination. As shown, barcodes of size three and four become unstable if m≥24 bp and m≥5 bp, respectively, while barcodes of size five or greater are always unstable.
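The thresholds above can be checked with a small calculation. The sketch below rests on an assumption that generalizes the two-element case worked out in the text: with direct flanking sites the critical distance spans all k elements, k·m + 34(k−1) bp, whereas with inverted flanking sites the flanking pair can only invert, so the critical distance spans k−1 elements. This reading reproduces the stated thresholds but is not spelled out in this form in the source; the function names are illustrative.

```python
LOX_BP = 34  # length of one Lox site in bp


def critical_distance(k: int, m: int, inverted_flanks: bool) -> int:
    """Largest distance (bp) between two Lox sites that could excise.

    Assumption: the excisable span covers k elements for direct flanking
    sites and k - 1 elements for inverted flanking sites.
    """
    span = k - 1 if inverted_flanks else k
    if span <= 0:
        return 0
    return span * m + (span - 1) * LOX_BP


def max_stable_size(m: int, inverted_flanks: bool, d_min: int = 82) -> int:
    """Largest barcode size k whose critical distance stays below d_min."""
    k = 0
    while critical_distance(k + 1, m, inverted_flanks) < d_min:
        k += 1
    return k


if __name__ == "__main__":
    for m in (4, 5, 7, 23, 24):
        print(m, max_stable_size(m, True), max_stable_size(m, False))
```

For m = 5 the three-element critical distance is 2·5 + 34 = 44 bp, which matches the statement that the design only needs the minimal interaction distance to exceed 44 bp.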

Orientation of a cassette’s flanking sites is immutable under recombination. Therefore cassettes with direct and inverted flanking sites generate barcodes with direct and inverted flanking sites only. Having seen that maximal code diversity grows as \(O(n^{k})\), and that having inverted flanking sites relative to direct ones increases the maximum size of barcodes by one, it follows that the diversity for cassettes with inverted flanking sites is of the order \(O(n^{k+1})\). Inverted flanking sites are thus superior in terms of code diversity and are an essential design decision.

Optimality regarding the size of the elements, m, is more intricate. For m<5, the maximum size of barcodes is four elements, and according to the formula above, their diversity grows as \(O(n^{4})\). The stability of barcodes with four elements is, however, sensitive to the minimal distance estimate (illustrated by the gray interval in Fig. 2 c). In addition, the short length of code elements limits error correction, a point revisited later. Thus we focus on cassettes in the regime 5 bp ≤ m < 24 bp, which generate error-robust barcodes of up to size three and a code diversity that is insensitive to the reported minimal Lox interaction distance.

Alternating Lox cassettes with inverted flanking sites maximize code diversity

For the orientation of the remaining Lox sites we prove, via a two-step strategy, that the alternating design produces maximal code diversity. First we derive a refined upper bound for the diversity that takes into account the structure of the Lox cassette, but ignores constraints imposed by the recombination process. We then show that alternating Lox cassettes with inverted flanking sites and n≥7 elements are unconstrained in terms of barcode generation via sequential recombination events, thus achieving this upper bound.

An upper bound for Lox barcode diversity

During Cre induced recombination, Cre proteins cleave the core region of the interacting Lox sites asymmetrically [40]. The sequences between subsequent cleavage sites are not affected by Cre and represent the fundamental building blocks of the Lox barcode cassette. Each block contains a code element and half a Lox site on each side.

Depending on the orientation of the Lox sites, there are four possible types of blocks (Fig. 2 d). Three colours have been used to code these: red, green and blue. By definition, the reverse complement of a block is of the same colour class. In contrast to blue blocks, red and green blocks have their Lox cores cleaved in a way such that their flanking Lox sites are unchanged after inversion, while the intervening sequence is reverse-complemented.

Blocks are similar to the concept of units defined in [34], which proves instrumental to derive expressions for the total number of sequences, stable or unstable, that are generated from a Lox cassette where all (n+1) sites can interact. In our context, the latter condition implies m>82 and, as discussed above, a code diversity of order \(O(n)\). Here we focus on enumerating exclusively size-stable sequences that arise in the regime 5 bp ≤ m < 24 bp with code diversities of order \(O(n^{3})\).

Stable codes are necessarily made of blocks from the initial cassette, and as shown in Fig. 2 e, their composition in terms of block colors is prescribed. Letting \(n_{r}\), \(n_{g}\), and \(n_{b}\) be the number of red, green, and blue blocks in the initial cassette with n elements, an upper bound on the number of possible barcodes of size k with \(k_{r}\) red, \(k_{g}\) green and \(k_{b}\) blue blocks is

$$\begin{array}{*{20}l} k_{r}!{n_{r}\choose{k_{r}}}k_{g}!{{n_{g}}\choose{k_{g}}}k_{b}!{{n_{b}}\choose{k_{b}}}2^{k_{r}+k_{g}}, \end{array} $$

where \(n_{r}+n_{g}+n_{b}=n\) and \(k_{r}+k_{g}+k_{b}=k\). It is the number of possible outcomes when choosing \(k_{r}\), \(k_{g}\) and \(k_{b}\) from \(n_{r}\), \(n_{g}\) and \(n_{b}\) elements in arbitrary order. The additional factor \(2^{k_{r}+k_{g}}\) arises as there are two valid orientations of every code element of a red and green block after recombination, whereas blue blocks do not enjoy this property. Conditioned on \(n_{r}\), \(n_{g}\), and \(n_{b}\), to derive an upper bound for a cassette’s diversity, we add the numbers for the four possible stable barcode configurations of \(k_{r}\), \(k_{g}\), and \(k_{b}\), shown in Fig. 2 e, taking into account that certain configurations appear more than once (e.g. the configuration with one red and two blue blocks appears three times). For 5 bp ≤ m < 24 bp, and cassettes with inverted flanking sites pointing at each other (the opposite case is similar), this yields, by applying the expression above to each of the four configurations,

$$\begin{array}{*{20}l} 3\left(1!{n_{r} \choose 1}2!{n_{b}\choose 2}\right)2^{1} +1\left(2!{{n_{r}}\choose{2}}1!{{n_{g}}\choose{1}}\right)2^{2+1}+\\ +2\left(1!{{n_{r}}\choose{1}}1!{{n_{b}}\choose{1}}\right)2^{1} +1\left(1!{{n_{r}}\choose{1}}\right)2^{1}. \end{array} $$

By construction, \(n_{g}=n_{r}-1\), and since \(n_{b}=n-2n_{r}+1\), substituting the respective terms leads to an expression that is a function of n and \(n_{r}\) alone. For given n odd, this reduces the task of finding the optimal cassette design to an explicitly solvable one-dimensional optimization problem,

$${} \begin{aligned} \underset{n_{r}}{\arg\max}\quad 32 {n_{r}^{3}} - 12 (2 n + 3) {n_{r}^{2}} + (6 n^{2} + 10 n + 14) n_{r} \quad \\\text{for}\quad n_{r} \leq \frac{n+1}{2}. \end{aligned} $$

For n≥5, the global maximum is achieved at the boundary \(n_{r}=(n+1)/2\). This implies \(n_{b}=0\), and a global upper diversity bound of \((n+1)(n-1)^{2}+(n+1)\), of order \(O(n^{3})\). It is easily verified that \(n_{b}=0\) is only possible if the cassette design is alternating and n is odd, which implies the flanking sites are inverted.
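The reduction to a one-dimensional problem can be verified numerically. The sketch below sums the four block configurations directly, compares the result with the cubic polynomial in \(n_{r}\), and searches for the maximizer by enumeration. Function names and the range of n checked are illustrative choices, and enumeration stands in for the analytical argument.

```python
from math import comb, factorial


def block_bound(n: int, n_r: int) -> int:
    """Sum over the four stable block compositions for given n and n_r."""
    n_g = n_r - 1            # by construction
    n_b = n - 2 * n_r + 1    # remaining blocks are blue

    def ways(k_r: int, k_g: int, k_b: int) -> int:
        return (factorial(k_r) * comb(n_r, k_r)
                * factorial(k_g) * comb(n_g, k_g)
                * factorial(k_b) * comb(n_b, k_b)
                * 2 ** (k_r + k_g))

    return (3 * ways(1, 0, 2) + ways(2, 1, 0)
            + 2 * ways(1, 0, 1) + ways(1, 0, 0))


def polynomial(n: int, n_r: int) -> int:
    """Closed form of the same bound as a cubic in n_r."""
    return (32 * n_r ** 3 - 12 * (2 * n + 3) * n_r ** 2
            + (6 * n ** 2 + 10 * n + 14) * n_r)


def best_n_r(n: int) -> int:
    """Maximizer of the bound over 1 <= n_r <= (n + 1) / 2, for odd n."""
    return max(range(1, (n + 1) // 2 + 1), key=lambda r: polynomial(n, r))


if __name__ == "__main__":
    for n in range(5, 16, 2):
        r = best_n_r(n)
        print(n, r, polynomial(n, r), (n + 1) * (n - 1) ** 2 + (n + 1))
```

For n = 13 the maximizer is \(n_{r}=7\) with a bound of 2030; halving the three-element term, as required under constitutive Cre expression, gives the 1022 codes quoted earlier.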

Alternating Lox cassette design achieves the upper diversity bound

For an alternating cassette design, achieving the code diversity upper bound requires complete freedom in code generation via recombination events. By construction, we show that this is the case if n≥7. To aid understanding, we illustrate in Fig. 2 f operations on odd and even elements via Cre Lox recombination, in cassettes with five, six and seven elements. Note that each operation (or move) inverts the orientation of the respective element.

Consider an alternating cassette with five elements and m≥5 bp, and recombination events that do not alter the size of the cassette (i.e., inversions). First note that red blocks in position three and five can move into the first position via a single recombination event (Fig. 2 f). Furthermore, a red block in position one can be inverted by first moving to position three, then to five, and back again. A straight-forward recipe to create an arbitrary code made of a single red block is then to: i) move the block into the first position (if required); ii) change its orientation (if required); and finally iii) excise the remaining blocks.

Similarly, to generate an arbitrary code composed of a red and a green block from an alternating cassette with six elements, we can perform steps i) and ii). Then we apply the same procedure to the green blocks, leaving the first block untouched. This results in the first two blocks of the cassette being identical to the desired code. To generate the size-stable code, elements that are not part of the code are excised.

Finally, for a cassette with seven elements, sequentially following the recipe given above, the first three blocks can be populated such that they match any possible code before excising the remaining blocks. This shows that any possible code of size one to three can be created via Lox recombination if the cassette is alternating, n≥7, m≥5 bp, and flanking sites are inverted.

Under constitutive Cre expression, barcodes with three elements can still undergo inversions via the flanking sites, which reduces their code diversity by a factor of two. The code diversity is therefore that given in Eq. (1).

Design of code element sequences

That barcodes generated from a Lox cassette are pre-defined in terms of sequence and position in the genome represents an advantage over existing in situ barcoding systems that rely on insertion site analysis for barcode readout [32, 49]. If code reading were error-free, choosing code elements of a particular color (red, green or blue, see Fig. 2 d for the definition) from a set of sequences that differ by at least one base pair in both orientations would be sufficient. The maximum number of such elements is easily computed as \((4^{m}-4^{m/2})/2\) and \(4^{m}/2\) for m even or odd, respectively, which is large even for small m.
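The closed-form counts can be confirmed by brute force for small m. The sketch below interprets "differ in both orientations" as follows (an assumption about the intended meaning): a sequence and its reverse complement count as one candidate, and sequences equal to their own reverse complement are excluded because their orientation cannot be read. Function names are illustrative.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def count_by_enumeration(m: int) -> int:
    """Count sequence / reverse-complement pairs, skipping self-complementary ones."""
    classes = set()
    for bases in product("ACGT", repeat=m):
        seq = "".join(bases)
        rc = reverse_complement(seq)
        if seq != rc:
            classes.add(min(seq, rc))
    return len(classes)


def count_closed_form(m: int) -> int:
    if m % 2 == 0:
        return (4 ** m - 4 ** (m // 2)) // 2
    return 4 ** m // 2


if __name__ == "__main__":
    for m in range(1, 7):
        print(m, count_by_enumeration(m), count_closed_form(m))
```

The even case loses \(4^{m/2}\) sequences because only even-length sequences can equal their own reverse complement; for odd m the central base always differs from its complement.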

With reading errors, in order to remain perfectly robust to one mismatch error, elements of a given color need to differ by at least three base pairs in both orientations for nearest-neighbor matching to be able to correct the error [50]. To ensure correction of j mismatch errors, the minimal required Hamming distance between the code elements is 2j+1 bp. The size of the sets of elements that meet this condition quickly decreases with increasing j (see Fig. 3 a for numerical estimates). To reliably be able to correct for two sequencing errors requires m≥5 bp.
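To make the distance requirement concrete, here is a minimal sketch of selecting code elements and decoding reads by nearest-neighbor matching. The greedy selection is a simple heuristic chosen for illustration: it does not reproduce the numerically determined maximal sets of Fig. 3 a and generally yields smaller sets. Requiring each element to also be at distance d from its own reverse complement is an assumption made so that orientation can be decoded as well. All names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def greedy_element_set(m: int, d: int) -> list:
    """Greedily collect m-mers at Hamming distance >= d in both orientations."""
    chosen = []
    for bases in product("ACGT", repeat=m):
        seq = "".join(bases)
        rc = reverse_complement(seq)
        if hamming(seq, rc) < d:
            continue  # orientation would be ambiguous after errors
        if all(hamming(seq, c) >= d and hamming(rc, c) >= d for c in chosen):
            chosen.append(seq)
    return chosen


def decode(read: str, elements: list) -> tuple:
    """Nearest-neighbor match; returns (element, orientation) with '+' or '-'."""
    candidates = [(e, "+", e) for e in elements]
    candidates += [(e, "-", reverse_complement(e)) for e in elements]
    element, orientation, _ = min(candidates, key=lambda c: hamming(read, c[2]))
    return element, orientation


if __name__ == "__main__":
    elements = greedy_element_set(7, 5)  # distance 5 corrects two mismatches
    print(len(elements), "elements found by the greedy heuristic")
    print(decode(elements[0], elements))
```

With a minimal distance of 2j+1, any read carrying at most j mismatches is strictly closer to its true element and orientation than to any other candidate, which is what the nearest-neighbor step relies on.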

Fig. 3

Design of code elements and probabilistic features of optimal Lox cassettes. a Numerically determined maximal size of sets of elements that are separated by a minimal Hamming distance of 1 bp (black), 3 bp (gray), and 5 bp (blue). In order to be robust to two sequencing errors, the minimal distance needs to be 5 bp, which requires m to be larger than 4 bp. b Upper bound for the expected proportion of misclassified elements as a function of empirical DNA sequencing mismatch error rates [52] for three common sequencing platforms (Illumina, Ion Torrent, Pacific Biosciences) and different sequence data (P. falciparum (∙), E. coli (\(\blacktriangle \)), R. spha. (■), H. sapiens (+)). The minimal distance that separates the elements is 1 bp (solid), 3 bp (dotted), and 5 bp (dashed). c Ranked probabilities of the 1022 size-stable barcodes from a cassette with 13 elements generated under constitutive Cre expression. While a few codes are relatively frequent, the majority of codes are rare. d Scatter-plot showing barcode probabilities against the average number of excisions (black) and the number of inversions (blue) that are needed to generate from an initially 13 element optimal cassette a size-stable barcode. e Maximum number of cells in which a barcode can be induced versus the number of cells that produce 99 %-unique codes, for one to four sequential cassettes. The color represents the percentage of discarded codes relative to the total code diversity, which can be adjusted to experimental conditions post acquisition. Inducing barcodes in a target population of \(10^{8}\) (circle) and \(10^{12}\) (square) cells yields a proportion of 10 % and 0.1 % of 99 %-unique barcodes respectively. f Although code diversity grows as \(O(n^{3})\), the expected number of recombination events that are needed to generate a size-stable code increases linearly with the number of elements in the initial cassette

Assuming that sequencing errors arise independently and error rates are identical for all bases, the number of mismatch sequencing errors in a code element of size m is Binomial with parameters m and the error probability per bp [51]. Any element that has j or fewer errors will be classified correctly by nearest-neighbor matching. The probability of more than j errors gives an upper bound for the expected proportion of misclassified code elements. Fig. 3 b shows this for elements of size m=7 bp as a function of the minimal distance and the mismatch error rates for next-generation sequencing platforms [52]. Different symbols indicate different sequence data. Even for low-fidelity platforms like Pacific Biosciences single molecule real time sequencing, a minimal distance of five bp results in less than ten misclassified elements per million.
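The bound described here is a Binomial tail probability and is straightforward to evaluate. In the sketch below the per-base error rates are hypothetical placeholders, not the empirical rates of [52], and the function name is illustrative.

```python
from math import comb


def misclassification_bound(m: int, j: int, p: float) -> float:
    """P(X > j) for X ~ Binomial(m, p): more than j mismatches in m bases."""
    return sum(comb(m, i) * p ** i * (1 - p) ** (m - i)
               for i in range(j + 1, m + 1))


if __name__ == "__main__":
    m = 7
    for p in (1e-4, 1e-3, 1e-2):      # hypothetical per-base mismatch rates
        for j in (0, 1, 2):           # minimal distances 1, 3 and 5 bp
            print(f"p={p:g}, distance={2 * j + 1}: "
                  f"{misclassification_bound(m, j, p):.3e}")
```

Each increase of the minimal distance by 2 bp raises j by one and lowers the leading term of the bound by roughly a factor of p, which is why distance 5 bp is so much more robust than distance 1 bp at low error rates.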

A concrete example for an alternating Lox barcoding cassette with 13 code elements of size seven bp each (in bold), and robust to two sequencing errors per element (i.e. the minimal Hamming distance between elements of the same color is 5), is:


Probabilistic features of optimal Lox cassettes

In this section we explore probabilistic features of the optimal design: the probability of generating each of the final codes, and the number of recombination events that are needed to create size-stable codes. For the analysis, we make two assumptions: first, all interactions between Lox sites that are at least 82 bp apart are equally likely; second, recombination events occur sequentially and independently.

Barcode distribution is heterogeneous

Size-stable barcodes of a Lox cassette are randomly generated and not all codes are equally likely. This is in contrast to the Rci invertase based approach implemented by Peikon et al. [35], who reported a barcode distribution close to the ideal uniform distribution, generated in E. coli after several recombination events.

Although an analytical expression for the probability mass function of final codes is not available, stochastic simulations enable us to study properties of practical importance such as the probability of generating a code more than once. Ensuring this probability is low is important in practice because progeny of two cells that independently generate the same code will be confounded as pertaining to the same clone.

Figure 3 c shows the generation probability of each of the 1022 codes that ensue from a cassette with 13 elements (sorted in ascending order). To produce this plot, \(10^{8}\) barcodes were generated in silico by Monte Carlo simulation of sequential recombination of the initial cassette. The number of times a specific code appeared was recorded, normalized and sorted. While some codes are relatively frequent, most are rare. In Fig. 3 d, the average number of recombination events (inversions: blue, excisions: black) is plotted as a function of barcode probability. The number of inversions and barcode probability are negatively correlated, an indication that rare codes undergo, on average, more inversions. The number of excisions is close to two for all codes.
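
A minimal sketch of such a Monte Carlo procedure is given below. It is not the C++ code from Additional file 1; all function names are hypothetical and several modelling choices are assumptions made for illustration. Elements are represented as signed integers (sign encodes orientation), Lox sites alternate in orientation so that a pair flanking an even number of elements excises and a pair flanking an odd number inverts, every pair flanking at least three elements is chosen with equal probability, and two three-element codes that differ only by inversion of the whole cassette are counted as one code because that inversion remains possible under constitutive Cre.

```python
import random
from collections import Counter


def productive_pairs(n):
    """All (start, k): Lox sites start and start+k flank k >= 3 elements."""
    return [(i, k) for k in range(3, n + 1) for i in range(n + 1 - k)]


def recombine_until_stable(n_elements, rng):
    """Apply random recombination events until at most three elements remain."""
    cassette = list(range(1, n_elements + 1))  # +e: original orientation, -e: inverted
    while len(cassette) > 3:
        i, k = rng.choice(productive_pairs(len(cassette)))
        if k % 2 == 0:
            del cassette[i:i + k]  # same orientation: excision
        else:
            # opposite orientation: reverse the segment and flip each element
            cassette[i:i + k] = [-e for e in reversed(cassette[i:i + k])]
    return tuple(cassette)


def canonical(code):
    """Identify a three-element code with its whole-cassette inversion (assumption)."""
    if len(code) == 3:
        return min(code, tuple(-e for e in reversed(code)))
    return code


def simulate_barcodes(n_elements, n_samples, seed=0):
    """Estimate code probabilities from n_samples simulated cassettes."""
    rng = random.Random(seed)
    counts = Counter(
        canonical(recombine_until_stable(n_elements, rng)) for _ in range(n_samples)
    )
    return {code: c / n_samples for code, c in counts.items()}


probabilities = simulate_barcodes(13, 10_000)
ranked = sorted(probabilities.values())  # ascending, as in the ranked plot
print(len(probabilities), "distinct codes observed")
```

Under these assumptions each element keeps the parity of its position, so a 13-element cassette can yield at most 7·6·6·8/2 + 14 = 1022 distinct codes, in agreement with the number quoted above. The small sample size used here will typically observe only a subset of them.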

Ideally, each cell is tagged with a unique barcode. As with all existing barcoding techniques, however, 100 % unique barcodes cannot be guaranteed unless each cell is separately transduced with a different code, an approach pursued by Grosselin et al. [53]. The expected number of unique barcodes is influenced by the code diversity D, by \(p_{i}\), the probability of code i, where \(i\in \{1,2,\ldots ,D\}\), and by j, the total number of codes that are generated. Using analysis of the generalised birthday problem [54], the expected proportion of unique codes is

$$\begin{array}{*{20}l} \sum_{i=1}^{D}p_{i}(1-p_{i})^{j-1} \approx 1-(j-1)\sum_{i=1}^{D}{p_{i}^{2}}, \end{array} $$

where the numerically convenient approximation on the right hand side arises from a Taylor expansion around 0 and is appropriate if \((j-1)\ll 1/(\max _{i}p_{i})\). Relatively large \(p_{i}\)’s negatively affect the expected proportion of unique codes. Therefore, for heterogeneous barcode distributions, a natural strategy is to discard the most frequent codes in order to exclude from the analysis barcodes that are more likely to be induced more than once. In the following, we assume that from all induced barcodes, keeping a subset that contains on average 99 % unique barcodes is sufficient for most applications and call these barcode sets 99 %-unique.
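
As a small numerical illustration of the two sides of the expression above, the following Python sketch evaluates the exact sum and its first-order approximation. The function names are hypothetical, and the uniform distribution over 1000 codes is an example chosen for simplicity, not a distribution from the paper.

```python
def expected_unique_exact(probs, j):
    """Expected proportion of unique codes when j codes are drawn."""
    return sum(p * (1.0 - p) ** (j - 1) for p in probs)


def expected_unique_approx(probs, j):
    """First-order approximation, valid when (j - 1) is much smaller than 1 / max(probs)."""
    return 1.0 - (j - 1) * sum(p * p for p in probs)


# Illustrative example: 1000 equally likely codes, 11 induced barcodes.
uniform = [1.0 / 1000] * 1000
print(expected_unique_exact(uniform, 11))
print(expected_unique_approx(uniform, 11))
```

For a heterogeneous distribution, the sum of squared probabilities is dominated by the most frequent codes, which is why discarding them raises the proportion of unique codes among those that remain.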

Using the approximation Eq. (2), in Fig. 3 e we computed the maximum number of cells in which a barcode is induced versus the number of induced barcodes that are 99 %-unique, for one to four sequential cassettes (indicated by the numbers 1 to 4). The color represents the percentage of discarded codes relative to the total code diversity. This parameter can be adjusted to meet the specific needs of a given experiment. For instance, for four concatenated cassettes with 13 elements each, inducing barcodes in a target population of \(10^{8}\) cells yields 10 %, or \(10^{7}\), 99 %-unique barcodes (indicated by a circle). If the target population is larger, e.g., \(10^{12}\) cells (indicated by a square), the proportion of 99 %-unique to total induced barcodes is reduced (approximately 0.1 %), giving \(10^{9}\) single cells that carry a 99 %-unique barcode.

These results show that by discarding frequent codes from the read-out, large numbers of clones can be tracked with high confidence, suggesting Cre Lox in situ barcoding is suitable for high-throughput lineage tracing experiments.

Number of recombination events to generate barcodes does not diverge with cassette size

If Cre is expressed for long enough, Lox cassettes will eventually become size-stable. The time this will take correlates with the number of recombination events that separate a stable barcode from its initial cassette. Below, we estimate this quantity using the theory of absorbing Markov chains.

In a cassette with n elements, there are n+1 Lox sites. The number of Lox pairs that are flanking k elements is n+1−k. Lox pairs that have fewer than three elements in between do not interact, as they are separated by less than the minimal 82 bp distance. Pairs of Lox sites that have three or more elements in between are termed productive. For n≥3 the number of productive pairs is \(\sum _{k=3}^{n}(n+1-k)=(n-1)(n-2)/2\), and the number of productive pairs, where recombination leads to excision, i.e. where an even number of elements separates the two sites, is

$$\begin{array}{*{20}l} \sum_{3\leq k\leq n: k~\text{even}}(n+1-k)={\frac{(n-1)(n-3)}{4}} \end{array} $$

for n odd. The equalities are a direct consequence of evaluating the respective sums. The probability that a productive pair excises exactly k elements is given by the ratio of the number of productive pairs that are separated by k elements to the total number of productive pairs, i.e.

$${} \begin{aligned} P(\textrm{excision of }k~\text{elements})&=\frac{n+1-k}{\sum_{l=3}^{n}(n+1-l)}\\ &= \frac{2(n+1-k)}{(n-1)(n-2)}, \end{aligned} $$

for k even, 3≤k≤n, otherwise it is zero. Similarly, the number of productive pairs where recombination leads to inversion is (for n odd)

$$\begin{array}{*{20}l} \sum_{3\leq k\leq n,k~\text{odd}}(n+1-k)={\frac{(n-1)^{2}}{4}}, \end{array} $$

and the probability that interaction of a productive pair leads to an inversion is

$$\begin{array}{*{20}l} P(\text{inversion})=\frac{2(n-1)^{2}}{4(n-1)(n-2)}={\frac{n-1}{2(n-2)}}. \end{array} $$
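
The closed forms above can be checked by direct enumeration. The following Python sketch (with hypothetical function names) counts productive pairs for an alternating cassette, in which a pair flanking an even number of elements excises and a pair flanking an odd number inverts, and compares the counts with the expressions given for odd n.

```python
from fractions import Fraction


def count_productive_pairs(n):
    """Return (excision_pairs, inversion_pairs) for an alternating cassette of n elements."""
    excision = sum(n + 1 - k for k in range(3, n + 1) if k % 2 == 0)
    inversion = sum(n + 1 - k for k in range(3, n + 1) if k % 2 == 1)
    return excision, inversion


for n in range(3, 40, 2):  # odd n only, as in the closed forms
    excision, inversion = count_productive_pairs(n)
    assert excision + inversion == (n - 1) * (n - 2) // 2
    assert 4 * excision == (n - 1) * (n - 3)
    assert 4 * inversion == (n - 1) ** 2
    if n > 3:
        p_inversion = Fraction(inversion, excision + inversion)
        assert p_inversion == Fraction(n - 1, 2 * (n - 2))

print(count_productive_pairs(13))
```

For the 13-element cassette this gives 30 excision pairs and 36 inversion pairs, so an inversion is slightly more likely than an excision at the first recombination event.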

Equations (3)–(5) enable a description of the formation of size-stable barcodes as a discrete-time absorbing Markov chain. The number of elements in the cassette corresponds to its state, and Eqs. (3) and (5) give the transition probabilities from n to n−k, and from n to n elements respectively. There are n−3 transient and 4 absorbing states. Absorbing states are cassettes that have either three, two, one, or zero elements. Absorbing Markov models are well understood, and a wealth of theoretical predictions regarding their properties are available [55]. These include the average number of steps until reaching an absorbing state, starting in one of the transient states. The fundamental matrix of this Markov chain is

$$\begin{array}{*{20}l} N= (I_{n-3}-Q)^{-1}, \end{array} $$

where \(I_{n-3}\) is an (n−3)×(n−3) identity matrix, and Q is the transition matrix restricted to the transient states. The expected number of recombination events, starting with a cassette of n elements, until reaching a final code is then the entry of the vector t=N c that corresponds to the state with n elements, where c is a column vector all of whose entries are 1.
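
Because transitions only lead to the same or a smaller number of elements, Q is triangular and the linear system (I−Q)t=c can be solved by forward substitution instead of explicit matrix inversion. The Python sketch below takes this shortcut; it is an illustration with hypothetical names, not the authors' implementation, and it assumes the alternating cassette with uniformly chosen productive pairs described above.

```python
from fractions import Fraction


def expected_recombination_events(n_max):
    """t[n]: expected number of events until a cassette of n elements has <= 3 elements."""
    t = {state: Fraction(0) for state in range(4)}  # absorbing states 0, 1, 2, 3
    for n in range(4, n_max + 1):
        total_pairs = Fraction((n - 1) * (n - 2), 2)
        # Inversions (k odd) leave the number of elements unchanged.
        p_inversion = sum(Fraction(n + 1 - k) for k in range(3, n + 1, 2)) / total_pairs
        # Excisions (k even) move the chain from n to n - k elements.
        after_excision = sum(
            Fraction(n + 1 - k) / total_pairs * t[n - k] for k in range(4, n + 1, 2)
        )
        # Row n of (I - Q) t = c, solved for t[n].
        t[n] = (1 + after_excision) / (1 - p_inversion)
    return t


t = expected_recombination_events(13)
for n in (5, 7, 9, 11, 13):
    print(n, float(t[n]))
```

Exact rational arithmetic is used so that the result can be compared term by term with a hand calculation.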

In Fig. 3 f, the average number of recombination events that separate the initial cassette from a final code is shown as a function of the cassette length. Although code diversity grows as \(O(n^{3})\), the number of recombination events required to generate a code increases linearly in n.


Lox barcode cassettes with code elements of size four

In deriving the upper code diversity bound and the optimal Lox barcode cassette, we focused on code elements in the regime 5 bp≤m<24 bp. These have maximal size-stable barcodes of three elements that are largely insensitive to over- and underestimation of the minimal Lox interaction distance. For m<5 bp, size-stable barcodes of four elements are possible and their maximal code diversity grows as \(O(n^{4})\). These are stable, however, only if the minimal interaction distance between two Lox sites is greater than 80 bp, a distance at which interactions have been shown to still be possible in vivo in the similar Flp/FRT system [39].

Most interesting is the case m=4 bp, which permits correction of one sequencing error with six code elements that are 3 bp apart in both orientations (see gray bars in Fig. 3 a). The upper diversity bound is derived along the same lines as for m≥5 bp (see Fig. 4 a for possible stable codes), which gives

$$\begin{array}{*{20}l}{} 48{{n_{r}}\choose{1}}{{n_{b}}\choose{3}} &+64{{n_{r}}\choose{2}}{{n_{g}}\choose{1}}{{n_{b}}\choose{1}} +12{{n_{r}}\choose{1}}{{n_{b}}\choose{2}}+\\ &+16{{n_{r}}\choose{2}}{{n_{g}}\choose{1}} +4{{n_{r}}\choose{1}}{{n_{b}}\choose{1}} +2{{n_{r}}\choose{1}}. \end{array} $$
Fig. 4

Short code elements, higher order Lox interactions, transient Cre activation, and distance dependent Lox-Lox complex formation. a Possible size-stable barcodes if m<5 bp. b Cassette with 17 elements and m=4 bp that attains an effective code diversity of 19,716 barcodes, requiring that the minimal Lox interaction distance is greater than 80 bp. c Higher order Lox interactions, where two or more pairs of Lox sites recombine simultaneously, can lead to unexpected recombination products. d Estimated barcode distribution if up to two Lox pairs can interact simultaneously (blue). The distribution becomes flatter at the lower end, which implies that rare codes are more likely compared to a scenario in which recombination events occur sequentially only (black). e Mimicking a short Cre activation pulse in a population of a million cells carrying a 13-element Lox cassette, the number of recombination events is assumed Poisson distributed with mean 1. The main graph shows code abundance after the pulse, where almost \(10^{4}\) distinct barcodes are generated, with circa 30 % being generated only once. The inset, similar to Fig. 3 e, gives the number of 99 %-unique barcodes as a function of induced and discarded barcodes. f Distance dependent Lox-Lox interactions. Assuming that the likelihood for two productive Lox sites to form a complex is inversely proportional to their distance, interactions between sites that are close-by are favored over interactions of sites that lie further apart (see inset for the distribution over distance dependent (red) and uniform (gray) Lox-Lox interactions). The difference in barcode probabilities is relatively small between the distance dependent (red) and uniform (gray) scenario, their distribution being more homogeneous if close-by sites form complexes more frequently

To maximize usage of the six code elements, we start with a cassette that has six red, five green and six blue blocks, i.e. {n r ,n g ,n b }={6,5,6}. This gives an upper diversity bound of 36,996 barcodes. As confirmed by Monte Carlo simulations, this upper bound is attained by a cassette with inverted flanking sites in which the first 11 Lox sites are alternating, and the remaining sites, except the last, are oriented in the same direction as the first Lox site (Fig. 4 b). Under constitutive Cre expression, barcodes with four elements can still undergo inversions, and the effective code diversity is 19,716.
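
The quoted upper bound follows from evaluating the diversity expression above at {6,5,6}. A short Python check, with a hypothetical function name, is:

```python
from math import comb


def diversity_bound_m4(n_r: int, n_g: int, n_b: int) -> int:
    """Evaluate the upper diversity bound for code elements of size m = 4 bp."""
    return (
        48 * comb(n_r, 1) * comb(n_b, 3)
        + 64 * comb(n_r, 2) * comb(n_g, 1) * comb(n_b, 1)
        + 12 * comb(n_r, 1) * comb(n_b, 2)
        + 16 * comb(n_r, 2) * comb(n_g, 1)
        + 4 * comb(n_r, 1) * comb(n_b, 1)
        + 2 * comb(n_r, 1)
    )


print(diversity_bound_m4(6, 5, 6))  # 36996
```

The six terms contribute 5760, 28800, 1080, 1200, 144 and 12 barcodes respectively, so the bound is dominated by the first two terms.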

Careful measurements will be needed to determine whether Lox sites at a distance of 80 bp still interact. If they do not, the cassette shown in Fig. 4 b with m=4 bp represents an interesting alternative to the barcode cassette design described in the main text, as it reaches higher code diversity with fewer elements, but at the cost of less robustness to sequencing error and hence lower barcode readout fidelity.

Higher order Lox interactions

In the Cre Lox system, single recombination events always involve exactly two Lox sites. However, nothing except DNA flexibility prevents several pairs of Lox sites from interacting at the same time. The rate at which pairs of Lox sites bind simultaneously depends on the number of Lox sites and the kinetic rates of Lox-Lox complexes. In vitro, these complexes appear surprisingly stable [40], which, together with the potentially large number of Lox sites in the barcode cassettes, makes simultaneous interactions a plausible possibility.

Higher order Lox interactions lead to unexpected and in certain cases novel recombination products (Fig. 4 c). For example, simultaneous interactions of two overlapping pairs of Lox sites oriented in the same direction do not result in excision, but in a reordering of the sequences between the sites. Similarly, if pairs are inverted, simultaneous recombinations do not invert but excise the sequence between the outermost sites.

For the alternating cassette and n≥7, multiple concurrent Lox interactions do not generate additional codes as the upper code diversity bound is already attained. Therefore our results on Lox barcode design and code elements remain unchanged in the presence of higher order Lox interactions. What does change is the distribution over barcodes, which flattens in the tail if more than one Lox pair recombines at a time (Fig. 4 d).

Transient Cre expression

Code diversity strongly depends on the number of elements in size-stable barcodes. If Cre is expressed constitutively, size-stable barcodes with code elements of size m≥5 bp have a maximum of three elements. One possibility for obtaining longer barcodes, and hence higher diversity, is to make Cre activity transient rather than constitutive.

A well tested system that provides temporal control over Cre activity is tamoxifen inducible CreEr [42]. In the presence of tamoxifen, the fusion protein CreEr, which is normally located in the cytoplasm, is transported into the nucleus, where it can bind to Lox sites and induce recombination. Depending on the duration of Cre activation and its efficiency, stable sequences with more than three elements are likely to be generated from a Lox barcode cassette. Although most of these sequences are stable only in the absence of Cre, in this section we make no distinction between these and the size-stable barcodes defined earlier.

Figure 4 e shows barcode probabilities after activation of CreEr in \(10^{6}\) cells with an optimal Lox cassette of size 13. The number of recombination events induced by transient CreEr activity is assumed Poisson distributed with mean one. About \(10^{4}\) distinct barcodes are generated, and 30 % of these appear only once. For comparison, the inset, similar to Fig. 3 e, indicates that a maximum of circa 170 99 %-unique barcodes are generated from a single cassette, by inducing barcodes in about 17,000 cells. For a Poisson distributed number of recombination events with mean one, this is 30 times more than what is feasible with size-stable codes from the same cassette.
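
A transient pulse of this kind can be mimicked with the following Python sketch. It is a simplified illustration with hypothetical function names, not the modified C++ code mentioned in Additional file 1. It assumes an alternating cassette, uniformly chosen productive pairs, and a Poisson number of sequential events per cell; cells that draw zero events keep the original cassette.

```python
import math
import random
from collections import Counter


def sample_poisson(mean, rng):
    """Draw a Poisson variate by multiplying uniforms (suitable for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def transient_barcode(n_elements, mean_events, rng):
    """Cassette after a Poisson-distributed number of recombination events."""
    cassette = list(range(1, n_elements + 1))  # sign encodes element orientation
    for _ in range(sample_poisson(mean_events, rng)):
        n = len(cassette)
        pairs = [(i, k) for k in range(3, n + 1) for i in range(n + 1 - k)]
        if not pairs:
            break  # fewer than three elements left, no productive pair
        i, k = rng.choice(pairs)
        if k % 2 == 0:
            del cassette[i:i + k]  # excision
        else:
            cassette[i:i + k] = [-e for e in reversed(cassette[i:i + k])]  # inversion
    return tuple(cassette)


rng = random.Random(0)
abundance = Counter(transient_barcode(13, 1.0, rng) for _ in range(10_000))
singletons = sum(1 for c in abundance.values() if c == 1)
print(len(abundance), "distinct sequences,", singletons, "seen once")
```

The sample size here is much smaller than the million cells considered in Fig. 4 e, so the printed counts are not comparable with the figures quoted above.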

Although highly promising in terms of code diversity, it should be noted that potential drawbacks of this approach are the length of the barcodes (leading to more involved code sequencing), leakiness of CreEr into the nucleus in non-induced cells [56], the relatively long half-life of tamoxifen [57], and a barcode probability that depends on the efficiency of Cre induction.

Distance dependent Lox-Lox complex formation

Cre and co-localization are necessary for two Lox sites to form a complex. Therefore the distance between two sites, in addition to the minimal distance constraint considered so far, is likely to impact on Lox-Lox recombination efficiencies. In this section we analyze how distance dependent Lox-Lox interactions change barcode probabilities relative to uniform interactions.

Modelling DNA as a flexible polymer, the probability of a Lox-Lox complex is predicted to be inversely proportional to the distance in bp between sites [58]. Together with a minimal distance of 82 bp, we use this model to compare the distribution over barcodes with the distance-independent scenario for the 13-element optimal Lox barcoding cassette. As shown in Fig. 4 f, Lox sites that are closer form complexes more often relative to the uniform case (inset), and barcode probabilities are more homogeneous. Thus, in our model, distance dependent Lox-Lox complex formation improves mixing of Lox barcode cassettes before reaching their size-stable configuration.
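
The two interaction models can be compared with a short Python sketch. The spacing formula is an assumption made for illustration: two Lox sites flanking k elements of m bp are taken to be separated by the k elements plus the k−1 intervening Lox sites of 34 bp each. This choice is consistent with the thresholds used in the text (80 bp for three elements with m=4 bp, 82 bp for two elements with m=24 bp), but the function names and the formula itself are not taken from the paper.

```python
LOX_SITE_BP = 34  # assumed length of one Lox site


def site_distance_bp(k: int, m: int) -> int:
    """Assumed spacing between two Lox sites that flank k elements of m bp each."""
    return k * m + (k - 1) * LOX_SITE_BP


def pair_probabilities(n, m, min_distance_bp=82, distance_dependent=True):
    """Probability of each Lox pair (start, k) being the next to form a complex."""
    weights = {}
    for k in range(1, n + 1):
        distance = site_distance_bp(k, m)
        if distance < min_distance_bp:
            continue  # too close to interact
        for start in range(n + 1 - k):
            weights[(start, k)] = 1.0 / distance if distance_dependent else 1.0
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


near_far = pair_probabilities(13, 7)
uniform = pair_probabilities(13, 7, distance_dependent=False)
print(near_far[(0, 3)] / near_far[(0, 13)])  # how strongly close pairs are favored
print(uniform[(0, 3)] / uniform[(0, 13)])    # 1.0 in the uniform model
```

Sampling the next recombination event from `near_far` instead of `uniform` is one way to introduce distance dependence into a simulation of the cassette.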


Existing cellular barcoding approaches have already led to significant biological discoveries, and so new approaches that overcome their shortcomings are inherently desirable. Here we have established that, using Cre Lox, it would be feasible to create an in situ, triggerable barcoding system with sufficient diversity to label a whole mouse, and we propose this as a system for experimental implementation.


Simulations for predicting barcode probabilities and computing the number of 99 %-unique barcodes (Figs. 3 c–e and 4 d–f) are implemented in custom C++ code and visualized in R.


bp, base pair; DOX, doxycycline; kb, kilobase


  1. Buchholz VR, Flossdorf M, Hensel I, Kretschmer L, Weissbrich B, Gräf P, Verschoor A, Schiemann M, Höfer T, Busch DH. Disparate individual fates compose robust CD8+ T cell immunity. Science. 2013; 340(6132):630–5.

  2. Gerlach C, van Heijst JWJ, Swart E, Sie D, Armstrong N, Kerkhoven RM, Zehn D, Bevan MJ, Schepers K, Schumacher TNM. One naive T cell, multiple fates in CD8+ T cell differentiation. J Exp Med. 2010; 207(6):1235–46. doi:10.1084/jem.20091175.

  3. Verovskaya E, Broekhuis MJ, Zwart E, Ritsema M, van Os R, de Haan G, Bystrykh LV. Heterogeneity of young and aged murine hematopoietic stem cells revealed by quantitative clonal analysis using cellular barcoding. Blood. 2013; 122(4):523–32.

  4. Ema H, Morita Y, Suda T. Heterogeneity and hierarchy of hematopoietic stem cells. Exp Hematol. 2014; 42(2):74–82.

  5. Johnson MB, Wang PP, Atabay KD, Murphy EA, Doan RN, Hecht JL, Walsh CA. Single-cell analysis reveals transcriptional heterogeneity of neural progenitors in human cortex. Nat Neurosci. 2015; 18(5):637–46. doi:10.1038/nn.3980.

  6. Yagi T. Genetic basis of neuronal individuality in the mammalian brain. J Neurogenet. 2013; 27(3):97–105. doi:10.3109/01677063.2013.801969.

  7. Zeisel A, Muñoz-Manchado AB, Codeluppi S, Lönnerberg P, La Manno G, Juréus A, Marques S, Munguba H, He L, Betsholtz C, Rolny C, Castelo-Branco G, Hjerling-Leffler J, Linnarsson S. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science. 2015; 347(6226):1138–42.

  8. Nolan-Stevaux O, Tedesco D, Ragan S, Makhanov M, Chenchik A, Ruefli-Brasse A, Quon K, Kassner PD. Measurement of cancer cell growth heterogeneity through lentiviral barcoding identifies clonal dominance as a characteristic of in vivo tumor engraftment. PLoS ONE. 2013; 8(6):67316.

  9. Bhang H-EC, Ruddy DA, Krishnamurthy Radhakrishna V, Caushi JX, Zhao R, Hims MM, Singh AP, Kao I, Rakiec D, Shaw P, Balak M, Raza A, Ackley E, Keen N, Schlabach MR, Palmer M, Leary RJ, Chiang DY, Sellers WR, Michor F, Cooke VG, Korn JM, Stegmeier F. Studying clonal dynamics in response to cancer therapy using high-complexity barcoding. Nat Med. 2015; 21(5):440–8. doi:10.1038/nm.3841.

  10. Klauke K, Broekhuis MJC, Weersing E, Dethmers-Ausema A, Ritsema M, González MV, Zwart E, Bystrykh LV, de Haan G. Tracing dynamics and clonal heterogeneity of cbx7-induced leukemic stem cells by cellular barcoding. Stem Cell Rep. 2015; 4(1):74–89. doi:10.1016/j.stemcr.2014.10.012.

  11. Rohr JC, Gerlach C, Kok L, Schumacher TN. Single cell behavior in T cell differentiation. Trends Immunol. 2014; 35(4):170–7.

  12. Reiner SL, Adams WC. Lymphocyte fate specification as a deterministic but highly plastic process. Nat Rev Immunol. 2014; 14(10):699–704.

  13. Duffy KR, Hodgkin PD. Intracellular competition for fates in the immune system. Trends Cell Biol. 2012; 22(9):457–64. doi:10.1016/j.tcb.2012.05.004.

  14. Smith JA, Martin L. Do cells cycle? Proc Natl Acad Sci U S A. 1973; 70(4):1263–7.

  15. Hawkins ED, Markham JF, McGuinness LP, Hodgkin PD. A single-cell pedigree analysis of alternative stochastic lymphocyte fates. Proc Natl Acad Sci U S A. 2009; 106(32):13457–62.

  16. Markham JF, Wellard CJ, Hawkins ED, Duffy KR, Hodgkin PD. A minimum of two distinct heritable factors are required to explain correlation structures in proliferating lymphocytes. J R Soc Interface. 2010; 7(48):1049–59.

  17. Rieger MA, Hoppe PS, Smejkal BM, Eitelhuber AC, Schroeder T. Hematopoietic cytokines can instruct lineage choice. Science. 2009; 325(5937):217–8.

  18. Gomes FL, Zhang G, Carbonell F, Correa JA, Harris WA, Simons BD, Cayouette M. Reconstruction of rat retinal progenitor cell lineages in vitro reveals a surprising degree of stochasticity in cell fate decisions. Development. 2011; 138(2):227–35.

  19. Giurumescu CA, Kang S, Planchon TA, Betzig E, Bloomekatz J, Yelon D, Cosman P, Chisholm AD. Quantitative semi-automated analysis of morphogenesis with single-cell resolution in complex embryos. Development. 2012; 139(22):4271–9.

  20. Richards JL, Zacharias AL, Walton T, Burdick JT, Murray JI. A quantitative model of normal Caenorhabditis elegans embryogenesis and its disruption after stress. Dev Biol. 2013; 374(1):12–23.

  21. Duffy KR, Wellard CJ, Markham JF, Zhou JHS, Holmberg R, Hawkins ED, Hasbold J, Dowling MR, Hodgkin PD. Activation-induced B cell fates are selected by intracellular stochastic competition. Science. 2012; 335(6066):338–41.

  22. Etzrodt M, Endele M, Schroeder T. Quantitative single-cell approaches to stem cell research. Cell Stem Cell. 2014; 15(5):546–58.

  23. Cohen AR. Extracting meaning from biological imaging data. Mol Biol Cell. 2014; 25(22):3470–3.

  24. Gerrits A, Dykstra B, Kalmykowa OJ, Klauke K, Verovskaya E, Broekhuis MJC, de Haan G, Bystrykh LV. Cellular barcoding tool for clonal analysis in the hematopoietic system. Blood. 2010; 115(13):2610–8. doi:10.1182/blood-2009-06-229757.

  25. Lu R, Neff NF, Quake SR, Weissman IL. Tracking single hematopoietic stem cells in vivo using high-throughput sequencing in conjunction with viral genetic barcoding. Nat Biotech. 2011; 29(10):928–33. doi:10.1038/nbt.1977.

  26. Naik SH, Schumacher TN, Perié L. Cellular barcoding: a technical appraisal. Exp Hematol. 2014; 42(8):598–608. doi:10.1016/j.exphem.2014.05.003.

  27. Schepers K, Swart E, van Heijst JW, Gerlach C, Castrucci M, Sie D, Heimerikx M, Velds A, Kerkhoven RM, Arens R, Schumacher TN. Dissecting T cell lineage relationships by cellular barcoding. J Exp Med. 2008; 205(10):2309–18. doi:10.1084/jem.20072462.

  28. Capel B, Hawley R, Covarrubias L, Hawley T, Mintz B. Clonal contributions of small numbers of retrovirally marked hematopoietic stem cells engrafted in unirradiated neonatal W/Wv mice. Proc Natl Acad Sci U S A. 1989; 86(12):4564–8.

  29. Naik SH, Perié L, Swart E, Gerlach C, van Rooij N, de Boer RJ, Schumacher TN. Diverse and heritable lineage imprinting of early haematopoietic progenitors. Nature. 2013; 496(7444):229–32. doi:10.1038/nature12013.

  30. Perié L, Hodgkin PD, Naik SH, Schumacher TN, de Boer RJ, Duffy KR. Determining lineage pathways from cellular barcoding experiments. Cell Rep. 2014; 6(4):617–24.

  31. Perié L, Duffy KR, Kok L, de Boer RJ, Schumacher TN. The branching point in erythro-myeloid differentiation. Cell. 2015; 163(7):1655–62.

  32. Sun J, Ramos A, Chapman B, Johnnidis JB, Le L, Ho YJ, Klein A, Hofmann O, Camargo FD. Clonal dynamics of native haematopoiesis. Nature. 2014; 514(7522):322–7. doi:10.1038/nature13824.

  33. Zador AM, Dubnau J, Oyibo HK, Zhan H, Cao G, Peikon ID. Sequencing the Connectome. PLoS Biol. 2012; 10(10):1001411. doi:10.1371/journal.pbio.1001411.

  34. Wei Y, Koulakov AA. An exactly solvable model of random site-specific recombinations. Bull Math Biol. 2012; 74(12):2897–916.

  35. Peikon ID, Gizatullina DI, Zador AM. In vivo generation of DNA sequence diversity for cellular barcoding. Nucleic Acids Res. 2014; 42(16):127. doi:10.1093/nar/gku604.

  36. Livet J, Weissman TA, Kang H, Draft RW, Lu J, Bennis RA, Sanes JR, Lichtman JW. Transgenic strategies for combinatorial expression of fluorescent proteins in the nervous system. Nature. 2007; 450(7166):56–62.

  37. Cai D, Cohen KB, Luo T, Lichtman JW, Sanes JR. Improved tools for the Brainbow toolbox. Nat Methods. 2013; 10(6):540–7. doi:10.1038/nmeth.2450.

  38. Hoess R, Wierzbicki A, Abremski K. Formation of small circular DNA molecules via an in vitro site-specific recombination system. Gene. 1985; 40(2-3):325–9.

  39. Ringrose L, Chabanis S, Angrand PO, Woodroofe C, Stewart AF. Quantitative comparison of DNA looping in vitro and in vivo: chromatin increases effective DNA flexibility at short distances. EMBO J. 1999; 18(23):6630–41.

  40. Pinkney JN, Zawadzki P, Mazuryk J, Arciszewska LK, Sherratt DJ, Kapanidis AN. Capturing reaction paths and intermediates in Cre-loxP recombination using single-molecule fluorescence. Proc Natl Acad Sci U S A. 2012; 109(51):20871–6.

  41. Parrish M, Unruh J, Krumlauf R. BAC modification through serial or simultaneous use of CRE/Lox technology. J Biomed Biotechnol. 2011; 2011:1–12. doi:10.1155/2011/924068.

  42. Nagy A. Cre recombinase: the universal reagent for genome tailoring. Genesis. 2000; 26:99–109.

  43. Blattman JN, Antia R, Sourdive DJD, Wang X, Kaech SM, Murali-Krishna K, Altman JD, Ahmed R. Estimating the precursor frequency of naive antigen-specific CD8 T cells. J Exp Med. 2002; 195(5):657–64.

  44. Sternberg N, Hamilton D, Hoess R. Bacteriophage P1 site-specific recombination. II. Recombination between loxP and the bacterial chromosome. J Mol Biol. 1981; 150(4):487–507.

  45. Hamilton DL, Abremski K. Site-specific recombination by the bacteriophage P1 lox-Cre system. Cre-mediated synapsis of two lox sites. J Mol Biol. 1984; 178(2):481–6.

  46. Guo F, Gopaul DN, Van Duyne GD. Structure of Cre recombinase complexed with DNA in a site-specific recombination synapse. Nature. 1997; 389(6646):40–6. doi:10.1038/37925.

  47. Oberdoerffer P, Otipoby KL, Maruyama M, Rajewsky K. Unidirectional Cre-mediated genetic inversion in mice using the mutant loxP pair lox66/lox71. Nucleic Acids Res. 2003; 31(22):e140.

  48. Colvin GA, Lambert JF, Abedi M, Hsieh CC, Carlson JE, Stewart FM, Quesenberry PJ. Murine marrow cellularity and the concept of stem cell competition: geographic and quantitative determinants in stem cell biology. Leukemia. 2004; 18(3):575–83. doi:10.1038/sj.leu.2403268.

  49. Bystrykh LV, Verovskaya E, Zwart E, Broekhuis M, de Haan G. Counting stem cells: methodological constraints. Nat Meth. 2012; 9(6):567–74. doi:10.1038/nmeth.2043.

  50. Cover TM, Thomas JA. Elements of information theory. New York: Wiley-Interscience; 1991.

  51. Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008; 18(11):1851–8. doi:10.1101/gr.078212.108.

  52. Ross MG, Russ C, Costello M, Hollinger A, Lennon NJ, Hegarty R, Nusbaum C, Jaffe DB. Characterizing and measuring bias in sequence data. Genome Biol. 2013; 14(5):51. doi:10.1186/gb-2013-14-5-r51.

  53. Grosselin J, Sii-Felice K, Payen E, Chretien S, Roux DT, Leboulch P. Arrayed lentiviral barcoding for quantification analysis of hematopoietic dynamics. Stem Cells. 2013; 31(10):2162–71. doi:10.1002/stem.1383.

  54. Koot MR, Mandjes M. The analysis of singletons in generalized birthday problems. Probab Eng Inform Sc. 2012; 26(2):245–62. doi:10.1017/s0269964811000350.

  55. Grinstead CM, Snell JL. Introduction to Probability, 2 revised edn. Providence, Rhode Island, U.S.A: American Mathematical Society; 1997.

  56. Kretzschmar K, Watt FM. Lineage Tracing. Cell. 2012; 148(1-2):33–45. doi:10.1016/j.cell.2012.01.002.

  57. Reinert RB, Kantz J, Misfeldt AA, Poffenberger G, Gannon M, Brissova M, Powers AC. Tamoxifen-induced Cre-loxP recombination is prolonged in pancreatic islets of adult mice. PLoS ONE. 2012; 7(3):33529.

  58. Inferring the in vivo looping properties of DNA. Proc Natl Acad Sci U S A. 2005; 102(49):17642–5. doi:10.1073/pnas.0505693102.


The authors thank Ton Schumacher (Netherlands Cancer Institute) for informative discussions.


The work of T.W., S.N. and K.D. was supported by Human Frontier Science Program grant RGP0060/2012. K.D. was also supported by Science Foundation Ireland grant 12 IP 1263. D.M. was supported by a National Health and Medical Research Council Early Career Fellowship grant 1052195.

Availability of supporting data

The C++ code supporting the results is provided in Additional file 1.

Authors’ contributions

TW, with input from SN and KD, conceived the study. TW, MD and KD performed the mathematical analysis. TW, MD, DM, SG, SN and KD interpreted the data. TW, MD, DM, SG, SN, and KD wrote the paper. All authors read and approved the final manuscript.

Authors’ information

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.

Author information


Corresponding author

Correspondence to Ken R. Duffy.

Additional file

Additional file 1

The C++ code provided in ‘Additional file 1’ computes barcode probabilities (Fig. 3 c) and average number of inversions and excisions (Fig. 3 d) for a Lox barcoding cassette with m Lox sites. With modifications (specified at the end of the file) it also computes the data for Figs. 3 e and 4 e (inset) and distributions shown in Fig. 4 d-f. CPP 12.1 kb

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Weber, T.S., Dukes, M., Miles, D.C. et al. Site-specific recombinatorics: in situ cellular barcoding with the Cre Lox system. BMC Syst Biol 10, 43 (2016).
