### A probabilistic model of a diffusing particle surrounded by indistinguishable particles

To develop the probabilistic model for estimating the 2D lateral diffusion constants under high particle density, we focus on a single Brownian particle in a time frame (Fig. 1). Without loss of generality, we take the position of the particle as the origin of our polar coordinates. As is well known, the probability of finding the same Brownian particle at a position with a radial distance greater than Δ*r* after a time-lag Δ*t* is given by [22].

$$ {P}_{\mathrm{dif}}\left(r>\Delta r|D\right)={e}^{-\frac{\Delta {r}^2}{4D\Delta t}}, $$

(1)

where the parameter _{
D
} is the diffusion constant of the particle.

In typical time-lapse single molecular imaging of cells, particles are indistinguishable from one another. By assuming the independence of the dynamics of each particle, we can model the distribution of such indistinguishable surrounding particles by a local uniform density,*ρ*, which is a sort of mean field approximation of the surrounding particles. In this approximation, we can derive the probability of having the nearest surrounding particle at a distance greater than Δ*r* as follows. We begin with a finite case where there are, on average, *N* surrounding particles in the disk with a radius *R* around a point. We assume that the surrounding particles are uniformly distributed within the disk. If we consider a smaller disk with a radius Δ*r* inside the disk, the probability of a single surrounding particle being found outside of the smaller disk is 1 − *a*/*A*, where *a* = *π*Δ*r*^{2} and *A* = *πR*^{2} correspond to the areas of the smaller and bigger disks, respectively. Then, the probability that all the *N* surrounding particles are also found outside of the smaller disk is (1 − *a*/*A*)^{N}. Assuming that *a* is much smaller than *A*, this probability can be approximated as

$$ {\left(1-\frac{a}{A}\right)}^N=\exp \left(N\log \left(1-\frac{a}{A}\right)\right)\cong \exp \left(-\frac{aN}{A}\right)=\exp \left(-\rho \pi \Delta {r}^2\right), $$

where *ρ* = *N*/*A* is the local particle density. Thus, the probability of having the nearest surrounding particle at a distance greater than Δ*r* is given by

$$ {P}_{\mathrm{bg}}\left(r>\Delta r|\rho \right)={e}^{-\rho \pi \Delta {r}^2}. $$

(2)

By combining the above results together, the probability of detecting the nearest particle at a distance greater than Δ*r* would be given by

$$ {P}_{\mathrm{nn}}\left(r>\Delta r|\rho, D\right)={P}_{\mathrm{dif}}\left(r>\Delta r|D\right){P}_{\mathrm{bg}}\left(r>\Delta r|\rho \right)={e}^{-\rho \pi \Delta {r}^2-\frac{\Delta {r}^2}{4D\Delta t}}. $$

(3)

This is the fundamental probabilistic model upon which we develop the estimation algorithm of the diffusion constant in this paper (Fig. 1). This probabilistic model generalizes the theory of Brownian motion of a single isolated particle into that of a single particle surrounded by indistinguishable particles.

The indication of the model becomes more manifest if we calculate the expected mean square displacement to the nearest particle (MSDN) as

$$ \mathrm{MSDN}=E\left(\Delta {r}^2\right)\equiv \underset{0}{\overset{\infty }{\int }}\Delta {r}^2\left(-\frac{d}{d\left(\Delta r\right)}{P}_{\mathrm{nn}}\left(r>\Delta r|\rho, D\right)\right)d\left(\Delta r\right)=\frac{4D\Delta t}{1+4\rho \pi D\Delta t}. $$

(4)

This is a natural generalization of the well-known relationship between the MSD of a single diffusing particle and the diffusion constant [22],

$$ \mathrm{MSD}=4D\Delta t. $$

(5)

As expected, MSDN goes back to the original MSD in the limit of *ρ* being zero, (i.e., where there are no surrounding particles). Due to the additional term in the denominator, the MSDN is, in general, smaller than MSD. This is because the nearest particle can be the original particle diffused from the origin as in MSD, or even a nearer surrounding particle.

This relationship can be easily solved with respect to *D*, allowing it to be estimated as

$$ D=\frac{\mathrm{MSDN}}{4\Delta t\left(1-\rho \pi \kern0em \mathrm{MSDN}\right)}. $$

(6)

Compared to the standard estimation from MSD,

$$ D=\frac{\mathrm{MSD}}{4\Delta t}, $$

(7)

the estimated diffusion constant acquires a fold increase of 1/(1 − *ρπ*MSDN), which compensates for the apparent reduction of the displacement compared to MSD. In Fig. 2, we show the MSDN for simulated data. As Δ*t* increases, the points deviate from the line 4*D*Δ*t* and obey the above theoretical prediction as expected. Note that the time course of MSDN is conceptually different from that of MSD in a trajectory after SPT. In the case of SPT, the identification of the same particle is consecutively performed using all measured time points during Δ*t*. On the other hand, in MSDN, the nearest point after time duration Δ*t* was chosen without referring to the measured time points before Δ*t*.

### Maximum likelihood estimation of diffusion constants for local particle density

Though the above relationship between the diffusion constant and MSDN allows us to estimate diffusion constants for the case of a uniform particle distribution, it is difficult to generalize it into an inhomogeneous particle distribution, which is a less ideal but much more relevant situation. In such a case, a constant particle density *ρ* alone cannot capture the underlying particle distribution.

Here, we formulate a more general estimation algorithm of diffusion constants using a maximum likelihood estimation based on the above probabilistic model. The log-likelihood of an observed dataset is given by

$$ {\displaystyle \begin{array}{l}l=\log \prod \limits_{i=1}^N{P}_{\mathrm{nn}}\left(r=\Delta {r}_i|{\rho}_i,D\right)\\ {}=\sum \limits_{i=1}^N\left[\log \left(2{\rho}_i\pi +\frac{1}{2D\Delta t}\right)+\log \left(\Delta {r}_i\right)-{\rho}_i\pi \Delta {r}_i^2-\frac{\Delta {r}_i^2}{4D\Delta t}\right].\end{array}} $$

(8)

Here, the index *i* represents each particle in the preceding time frame, Δ*r*_{
i
} is the distance to the nearest particle in the subsequent time frame, and *ρ*_{
i
} is the local particle density around particle *i*. If we assume a uniform distribution (i.e., that all *ρ*_{
i
} are the same), this maximum likelihood estimation of *D* is analytically tractable and reduces to the same relation between the diffusion constant and MSDN described above.

In the case of general *ρ*_{
i
}, it is convenient to utilize the EM algorithm [21, 23]. For this purpose, we introduce a latent variable *q*_{
i
} ∈ {0, 1}, which takes the value of zero if the nearest point comes from the surrounding particles, but becomes one if it is the original particle diffused from the origin. Then the complete-data log-likelihood with the information of the latent variable is given by

$$ {l}^{\prime }=\log \prod \limits_{i=1}^Np\left(\Delta {r}_i,{q}_i|{\rho}_i,D\right). $$

(9)

Here, the joint probability distribution is defined as

$$ p\left(\Delta {r}_i,{q}_i|{\rho}_i,D\right)=\left\{\begin{array}{cc}2{\rho}_i\pi \Delta {r}_i{e}^{-{\rho}_i\pi \Delta {r}_i^2-\frac{\Delta {r}_i^2}{4D\Delta t}}& \mathrm{for}\kern0.24em {q}_i=0\\ {}\frac{\Delta {r}_i}{2D\Delta t}{e}^{-{\rho}_i\pi \Delta {r}_i^2-\frac{\Delta {r}_i^2}{4D\Delta t}}& \mathrm{for}\kern0.24em {q}_i=1\end{array}\right.. $$

(10)

In the EM algorithm, instead of maximizing the log-likelihood directly, a quantity *Q*(*D*, *D*^{l}) is maximized with respect to *D* by iteration:

$$ Q\left(D,{D}^l\right)=\sum \limits_{i=1}^N\sum \limits_{q\in \left\{0,1\right\}}\log \left(p\left(\Delta {r}_i,q|{\rho}_i,D\right)\right)p\left(q|\Delta {r}_i,{\rho}_i,{D}^l\right). $$

(11)

Here, *D*^{l} is the estimation of the diffusion constant *D* at the *l*-th iteration. The conditional probability based on *D*^{l} is calculated from the above joint probability as

$$ p\left(q=0|\Delta {r}_i,{\rho}_i,{D}^l\right)=\frac{4{\rho}_i\pi {D}^l\Delta t}{4{\rho}_i\pi {D}^l\Delta t+1}, $$

(12)

$$ p\left(q=1|\Delta {r}_i,{\rho}_i,{D}^l\right)=\frac{1}{4{\rho}_i\pi {D}^l\Delta t+1}. $$

(13)

Taking the derivative of *Q* with respect to *D* and equating it to zero,

$$ \frac{dQ}{dD}=\sum \limits_{i=1}^N\left[\frac{\Delta {r}_i^2}{4{D}^2\Delta t}p\left(q=0|\Delta {r}_i,{\rho}_i,{D}^l\right)+\left(\frac{\Delta {r}_i^2}{4{D}^2\Delta t}-\frac{1}{D}\right)p\Big(q=1|\Delta {r}_i,{\rho}_i,{D}^l\Big)\right]=0, $$

(14)

we obtain the update rule

$$ {D}^{l+1}=\frac{\left\langle \Delta {r}^2\right\rangle }{4\Delta t{\left\langle P\left(q=1\right)\right\rangle}_{D^l}}, $$

(15)

where we have defined the expected fraction of data points with *q* = 1 as

$$ {\left\langle P\left(q=1\right)\right\rangle}_{D^l}\equiv \frac{1}{N}\sum \limits_{i=1}^Np\left(q=1|\Delta {r}_i,{\rho}_i,{D}^l\right). $$

(16)

Now, the correction from the original MSD relation is neatly summarized by this expected fraction of the data whose nearest points come from the original particle diffused from the origin.

### Generalization to models with multiple diffusive states

In this subsection, we further generalize the maximum likelihood estimation of diffusion constants into the case where particles take multiple states with different diffusion constants. It has been revealed that some membrane proteins change their physical properties upon binding to other molecules or spontaneous change of their conformation, and that these changes can be inferred from the change of the diffusion constant in some cases [14, 15]. Here we consider this type of change of diffusion constants, which we shall refer as to the change of their states. In this paper, we only provide the solution for relatively simpler situations of the dynamics with multiple diffusive states where the interconversion of different states can be ignored in the time resolution [10].

This simple generalization is practically quite useful, even when there is no biological reason to expect the existence of such multiple states of the target molecule. In a real experiment, many fluorescently-dyed surface molecules disappear for several reasons, such as internalization of the particle, breaching of the fluorescent dye, and so on. Such disappearance of particles can be modeled in the above framework by adding an additional state whose diffusion constant is infinitely large. In addition, some accidental peaks of fluorescent intensity may be wrongly detected as particles due to the low signal-to-noise ratio of the original images (false detections). Those spurious particles also tend to disappear in the subsequent time frame. Thus, we can reduce the effects of such false detections by introducing such a state in advance. We will address this issue again in Result section.

The derivation of the corresponding EM algorithm is largely parallel to the one in the previous subsection. In addition to the latent variable *q*_{
i
}, which specifies whether or not the nearest particles are the original particle itself, we introduce an additional latent variable specifying states of the particle *i*, *s*_{
i
} ∈ {1, ⋯, *M*}, where *M* is the number of possible states.

The joint probability distribution of this model is given by

$$ p\left(\Delta {r}_i,{q}_i,{s}_i|{\rho}_i,{D}_{s_i},{\alpha}_{s_i}\right)=\left\{\begin{array}{cc}2{\rho}_i{\pi \alpha}_{s_i}\Delta {r}_i{e}^{-{\rho}_i\pi \Delta {r}_i^2-\frac{\Delta {r}_i^2}{4{D}_{s_i}\Delta t}}& \mathrm{for}\kern0.24em {q}_i=0\\ {}\frac{\alpha_{s_i}\Delta {r}_i}{2{D}_{s_i}\Delta t}{e}^{-{\rho}_i\pi \Delta {r}_i^2-\frac{\Delta {r}_i^2}{4{D}_{s_i}\Delta t}}& \mathrm{for}\kern0.24em {q}_i=1\end{array}\right., $$

(17)

where \( {D}_{S_i} \) is the diffusion constant of the state *s*_{
i
}, and \( {\alpha}_{s_i} \) is the probability of being the state *s*_{
i
}.

The quantity *Q* for deriving the update rule of the EM algorithm is similarly defined by

$$ Q\left(D,{D}^l\right)=\sum \limits_{i=1}^N\sum \limits_{s=1}^M\sum \limits_{q\in \left\{0,1\right\}}\log \left(p\left(\Delta {r}_i,q,s|{\rho}_i,\theta \right)\right)p\left(q,s|\Delta {r}_i,{\rho}_i,{\theta}^l\right). $$

(18)

Here, *θ* collectively denotes all of the parameters to be estimated, namely, *θ* = {*D*_{1}, ⋯, *D*_{
M
}, *α*_{1}, ⋯*α*_{
M
}}. The conditional probability is calculated from the joint probability as follows:

$$ p\left(q=0,s|\Delta {r}_i,{\rho}_i,{\theta}^l\right)=\frac{2{\rho}_i{\pi \alpha}_s^l{e}^{-\frac{\Delta {r}_i^2}{4{D}_s\Delta t}}}{\sum \limits_{s^{\prime }=1}^M\left(2{\rho}_i\pi +\frac{1}{2{D_{s^{\prime}}}^l\Delta t}\right){\alpha}_{s^{\prime}}^l{e}^{-\frac{\Delta {r}_i^2}{4{D}_{s^{\prime }}\Delta t}}}, $$

(19)

$$ p\left(q=1,s|\Delta {r}_i,{\rho}_i,{\theta}^l\right)=\frac{\frac{\alpha_s^l}{2{D_{s^{\prime}}}^l\Delta t}{e}^{-\frac{\Delta {r}_i^2}{4{D}_s\Delta t}}}{\sum \limits_{s^{\prime }=1}^M\left(2{\rho}_i\pi +\frac{1}{2{D_{s^{\prime}}}^l\Delta t}\right){\alpha}_{s^{\prime}}^l{e}^{-\frac{\Delta {r}_i^2}{4{D}_{s^{\prime }}\Delta t}}}. $$

(20)

Compared to the single state case, here, the joint probability also depends upon the displacement, Δ*r*_{
i
}.

By maximizing *Q* under the restriction of conservation of probability, \( \sum \limits_s{\alpha}_s=1 \), we obtain

$$ {\alpha_s}^{l+1}=\frac{1}{N}\sum \limits_{i=1}^N\sum \limits_{q\in \left\{0,1\right\}}p\left(q,s|\Delta {r}_i,{\rho}_i,{\theta}^l\right), $$

(21)

$$ {D_s}^{l+1}=\frac{\sum \limits_{i=1}^N\sum \limits_{q\in \left\{0,1\right\}}\Delta {r_i}^2p\left(q,s|\Delta {r}_i,{\rho}_i,{\theta}^l\right)}{4\Delta t\sum \limits_{i=1}^Np\left(q=1,s|\Delta {r}_i,{\rho}_i,{\theta}^l\right)}. $$

(22)

This is our final update rule for maximum likelihood estimation for the multi state model.