Model-based extension of high-throughput to high-content data

Background High-quality quantitative data is a major limitation in systems biology. The experimental data used in systems biology can be assigned to one of the following categories: assays yielding average data of a cell population, high-content single cell measurements and high-throughput techniques generating single cell data for large cell populations. For modeling purposes, a combination of data from different categories is highly desirable in order to increase the number of observable species and processes and thereby maximize the identifiability of parameters. Results In this article we present a method that combines the power of high-content single cell measurements with the efficiency of high-throughput techniques. A calibration on the basis of identical cell populations measured by both approaches connects the two techniques. We develop a mathematical model to relate quantities exclusively observable by high-content single cell techniques to those measurable with high-content as well as high-throughput methods. The latter are defined as free variables, while the variables measurable with only one technique are described in dependence of those. It is the combination of data calibration and model into a single method that makes it possible to determine quantities only accessible by single cell assays but using high-throughput techniques. As an example, we apply our approach to the nucleocytoplasmic transport of STAT5B in eukaryotic cells. Conclusions The presented procedure can be generally applied to systems that allow for dividing observables into sets of free quantities, which are easily measurable, and variables dependent on those. Hence, it extends the information content of high-throughput methods by incorporating data from high-content measurements.


Background
In systems biology, a wide range of experimental data is used for mathematical modeling. Qualitative data mostly serves as a basis for determining network structures, whereas dynamic pathway modeling relies on high-quality quantitative data. In general, experimental data describing biological systems can be divided into three groups. Firstly, data generated from large cell populations yields an average information of the whole population behavior. However, cell population assays such as biochemical measurements or microarray studies can be misleading as large cell-to-cell variations are often observed, even in seemingly uniform populations. This stochasticity can be caused by asynchronous cell cycles, differences in cell sizes and varying protein states or expression levels [1][2][3]. Secondly, single cell data with high-content information from a limited number of cells result in a stochastic distribution of measured quantities. Many single cell approaches are based on microscopy, but other technologies are under development to investigate for example gene expression or proteins in single cells [4][5][6]. The third group covers a small range of experimental techniques that generate single cell data from large cell populations in a high-throughput format. Most common among those is flow cytometry, which however is limited to measurements from cells in suspension. Moreover, in contrast to microscopy, standard flow cytometry can only detect average whole cell fluorescence intensities lacking spatially resolved information. Currently, highthroughput imaging techniques as well as imaging flow cytometers digitally imaging cells directly in flow are being developed, with the goal to gather high-content information from a large number of single cells [7,8]. This will increase the number of parameters that can be determined in parallel by high-throughput and highcontent techniques.
For modeling purposes it is essential to link data from different types of experiments in order to include as many details of the system as possible in the modeling process and to avoid non-identifiabilities during the parameter estimation. However, some of the components can only be measured by time consuming highcontent techniques. For models describing entire cell populations, high-content data for large cell numbers is necessary but often impossible to provide. In contrast, high-throughput techniques can generate these large data sets, despite a lack in detailed single cell information.
A signaling pathway that has been extensively investigated by dynamic pathway modeling is the JAK-STAT pathway [9]. Upon binding of an extracellular ligand to the respective receptor latent signal transducers and activators of transcription (STATs) are activated by Janus kinases (JAK) leading to rapid nucleocytoplasmic cycling of STATs. In addition, constitutive nucleocytoplasmic cycling of unphosphorylated STAT has been shown for several STAT proteins by biochemical and microscopic experiments [10][11][12][13][14][15]. It has been proposed that import of STAT is enhanced upon activation [16], while export of activated STAT is slowed down either through retention in the nucleus by DNA binding [17] or possibly a different export mechanism [15]. Previously, rapid nucleocytoplasmic cycling of activated STAT5 has been identified as the step most sensitive to perturbation within the core module of the JAK2/ STAT5 pathway by mathematical modeling based on biochemical data [18], but import and export rates could not be measured experimentally. These transport steps are crucial as important decisions regulating cell fate are made by the nuclear reactions of STATs.
A method to determine the rates for nuclear import and export of STAT5 is fluorescence recovery after photobleaching (FRAP). FRAP is a single cell fluorescence microscopy method routinely used to measure the kinetics of transport processes between cell compartments as well as diffusion and dynamic binding reactions [19,20]. One prerequisite for a quantitative FRAP experiment is that the investigated system is in a steady state on the time scale of the experiment otherwise a mathematical description of the data is difficult to obtain. The JAK-STAT system is only in a steady state in unstimulated cells, ligand stimulation induces phosphorylation of STATs and thereby perturbs the steady state. Therefore, we focused on the nuclear import and export rates of unphosphorylated STAT5 with the goal to generate rates for the steady state in unstimulated cells that can be set to a fixed value in a larger pathway model. Biochemical data describing the phosphorylation dynamics of the pathway components after stimulation in combination with mathematical modeling can then serve to indirectly determine nuclear import and export rates for phosphorylated STAT5.
Here, we present a model for extracting the import and export rates from FRAP experiments of STAT5B-GFP in the steady state of unstimulated NIH3T3-EpoR cells. Furthermore, the dependence of these rates on STAT5B-GFP concentration and cell size is shown. To be able to combine this information with biochemical data from cell populations expressing STAT5B-GFP, cell size distribution and STAT5B-GFP concentration distribution within the respective cell population are additionally measured by flow cytometry. Cell size as well as STAT5B-GFP concentration are estimated directly from flow cytometry data after calibration of these data to microscopy data.
The calibration procedure can be generally applied to link data from powerful high-content techniques and fast, efficient high-throughput methods. In combination with the mathematical model, it provides a novel rationale to determine formerly inaccessible information for large cell populations by less time-consuming highthroughput measurements.

Results and Discussion
Data calibration links high-content with high-throughput data To formulate a general calibration procedure for combining high-content and high-throughput data we use a method based on least squares regression of the quantile-quantile plot (QQ-plot) for corresponding population measurements. Let where Y refers to the quantity of interest, e.g. protein concentration, cell volume, total amount of protein, X C and X T are observables for the high content or highthroughput technique that are both linearly connected to Y via slopes m, m' and intercepts d, d'. In a more general formalism, a measurement technique is a strictly monotonic function F with Y = F(X), i.e. F uniquely relates an observable to a quantity of interest. Practically, the scale of X is chosen in such a way that F is linear. The strict monotony of F requires m, m' to be non-zero. Eqs. (1) and (2) show that for every value of Y i.e. X T (Y ) depends linearly on the high-content quantity X C (Y ). The slope m m and intercept d d m of eq. (4) need to be determined in order to translate X C into X T and vice versa. For this purpose, the distribution quantiles of X C and X T are used.
Assuming that N C and N T >N C measurements have been performed for the high-content and high-throughput techniques, respectively, the ordered set of measurements {X C,i } i = 1 ,...,N C is an estimate of the N C equally spaced quantiles { } , , , of the theoretical distribution of X C . In the same way, the sample quantiles { } , , , ..,N T estimate the N C theoretical quantiles of X T . According to eq. (4) the distributions of X T and X C belong to the same location-scale family. Consequently, the QQ-plot of X T versus X C is supposed to follow a straight line with intercept d d m and slope m m . A least squares fit of the QQ-plot gives asymptotically unbiased estimates of slope and intercept for a large class of theoretical distributions. The convergence of the sample quantiles to the theoretical quantiles as well as the convergence of the least squares estimator is well known and is carried out rigorously in [21].
An implementation of this calibration procedure is provided by the R script [Additional file 1] in the supplement. A sample configuration is given by [Additional The calibration is then included in the overall procedure linking high-throughput with high-content data ( fig. 1). The essential steps are the calibration and the mathematical model depicted in the center of the workflow diagram. For calibration, the identical cell population is measured by a single-cell technique as well as a high-throughput method to determine a subset of matching quantities, defined as the free variables. At the current state of technology only flow cytometry is widely available and fulfills the requirement of generating highthroughput data at the single-cell level. This restricts the free variables that can be determined experimentally to cell volume (V cell ) and concentration of a fluorescently labeled marker (C cell ). A high-content technique that can be combined with flow cytometry and that can also assess cell volume and fluorescence intensity is microscopy. As new high-throughput techniques advance other parameters can be considered as free variables. The data for the free variables are then compared and the resulting calibration creates the possibility to switch between the different measurement units. Next, the quantities of interest, i.e. any high-content information determined by for example microscopy that is dependent on cell size or concentration of the fluorescently labeled marker or both, need to be expressed in dependence of the free variables. A valid model and parameter estimation connecting dependent and free variables of the single cell measurement has to be identified. The high-throughput measurements can then be translated via calibration into the ambit of single-cell measurements and via the fixed parameter model into cell population quantities. The method can be applied to combine experimental data generated by different experimental techniques if the free variables can be measured by all of the techniques used for data generation.
Nucleocytoplasmic cycling of STAT5B is modeled as saturatable pump As an example, the analysis of nuclear import and export of the transcription factor STAT5B was chosen. Nucleocytoplasmic cycling is only measurable by single cell microscopy, namely FRAP, whereas other crucial features such as the dynamic changes of the phosphorylation state of the proteins are accessed by biochemical measurements from cell populations. In eukaryotic cells, the nucleus is separated from the cytoplasm by the nuclear envelope. Molecules can only migrate between those two compartments through nuclear pores forming small holes in the membrane. Small molecules (<20-40 kDa) can diffuse freely through nuclear pores whereas larger molecules require active transport aided by soluble transport proteins that interact with the cargo molecule as well as the nuclear pore. Active nuclear import and export are regulated by different mechanisms. In order to be imported into the nucleus, proteins usually carry a nuclear localization signal (NLS) to which importins can bind and enable nuclear translocation. Similarly, a nuclear export signal (NES) within the cargo protein structure is recognized by an exportin. For most proteins of the STAT family, the respective importins have been identified (reviewed in [22]). In the case of STAT5B however, so far no importins could be identified that directly interact with the transcription factor [23]. Instead, import of STAT5B has been suggested to require additional factors acting as chaperones between the importins and STAT5B [24]. Active nuclear export of STATs is generally mediated by the exportin CRM1.
Here, a simple model for the active transport of STAT5B through the nuclear pore was used. A single nuclear pore and the respective import and export factors necessary to transport a single protein of interest were modeled as a pump [25,26] making the following assumptions: For small concentrations, the amount of protein transported through the pores is proportional to the concentration. For large concentrations the transport current, i.e. the number of molecules per time, saturates. For a large set of nuclear pores for which the capacity of every pore may vary a Michaelis-Menten curve is a reasonable way to model the overall current.
For large concentrations (c ≫ g) I saturates with saturation value b. For small concentrations (c ≪ g) I depends linearly on c with slope β γ . Equation (5) is the resulting current for all pores of a cell. The constants b and g may still vary within a population, i.e. from cell to cell. In a next step the saturation value b = k K as well as the slope β γ κ = K are assumed to depend linearly on a quantity K which is the product of the abundance of transport factors and the number of nuclear pores. This is appropriate for two reasons: First, if the system is in saturation and the number of nuclear pores is doubled, then the system has twice the capacity to transport STAT5B and the current will be doubled. The same holds for the transport factors. Hence, the saturation value b is proportional to the product of transport factor abundance and nuclear pore number. Second, if the STAT5B concentration is low and consequently the transport rate is independent of the concentration then doubling the number of pores or the number of transport factors will lead to a doubling of the transport rate. Hence, the transport rate, i.e. the slope of the current β γ , is proportional to the product as well. Plugging in b = K in β γ κ = K reveals that γ κ κ = is independent of K and eq. (5) reads Consequently, given an arbitrary cell from the population and knowing about K, the current I K is a much Figure 1 Workflow for calibrating and linking data from different experiments. The calibration and model are the crucial steps of the procedure described here. Data calibration by a quantile-quantile (QQ) plot allows to translate data from one technique to the units of another measurement. Here, it is necessary to determine quantities with a known relationship from the identical cell population by both methods (blue). Furthermore, a model has to be developed to describe the dependent variables only measurable by one technique in terms of the free variables assessable by both techniques (green). Then, additional cell populations can be measured by the high-throughput technique only and the information content of the data can be increased via the calibration and the model (dashed arrows, red) to be finally combined with cell population data for mathematical modeling by an ordinary differential equation (ODE) model (transparent red). General terms are shown in bold letters, the specific case of the example system is given below. Steps shown in transparent colors are not subject of this study.
better estimator for the transport current than the mere population average. The question arising from this is if and how K is accessible. Three cases seem plausible: 1. K is dominated by the number of nuclear pores which have a similar density throughout the cell population. Hence, K depends linearly on the nuclear surface area A nuc . 2. K is dominated by the number of cytoplasmic transport factors with the same concentration in all cells which is proportional to the cytoplasmic volume V cyt . 3. K is dominated by the number of nuclear transport factors with the same concentration in all cells which is proportional to the nuclear volume V nuc .
The different hypotheses represent different models, model 1 is without any normalization. Models 2 -4 are defined by the respective normalized currents: Reformulating the problem as demonstrates the concentration dependency of the normalized transport currents under the assumption that the parameters b l and g l are constant throughout the population. This assumption is necessary for a valid formula describing the import and export currents within a population. The second formulation with a l (c) follows the idea of a linearly increasing current for small concentrations and will also be used. Import and export current distribution for STAT5B Import and export currents depend on STAT5B concentration and cell size To determine the import and export rates a imp and a exp 39 FRAP data sets generated from cells expressing varying concentrations of STAT5B-GFP were fitted with eq. (23) described in the Methods section. Variable protein levels were achieved by a tightly regulatable expression system that we developed based on a Tet-inducible promoter.
The cell-to-cell variability of a imp and a exp exceeded the confidence intervals of the rate values by far ( fig.  2A). This supported the assumption of a confounding variable K. The three hypotheses described above were tested for cell-to-cell variability of the transport rates. The comparison of different normalizations was based on the normalized currents j l (c) given by equation (8).
For every normalization the Michaelis-Menten curve was determined from a least squares fit, i.e. the different normalizations were ranked by decreasing χ 2 values. The results for no normalization, normalization by the nuclear surface area and normalization by the originating compartment volumes are shown in fig. 2B-D, the estimated parameters are shown in tab. 1.
The significance of the χ 2 reduction has been tested with a bootstrap method: from the 39 data points 39 points have been drawn randomly with replacement. Then for all models, i.e. without normalization, For the export distributions model 3 -normalization by nucleus volume -is superior to all other models at a 99% confidence level (3σ). For the import data the situation is not so clear. Models 3 and 4 cannot be discriminated and seem to describe the data equally well. Both models are clearly superior to model 1 and exceed model 2 at a 1σ level.
We decided to follow the hypothesis of normalization by the volumes of the originating compartments, i.e. export model 3 and import model 4. From a biological point of view this seems to be the most reasonable hypothesis. From a practical point of view, models 3 and 4 describe the import equally well and cannot be distinguished given the data at hand.

Calibration of flow cytometry data to microscopy data yields comparable quantities
Data calibration requires the measurement of the identical cell population by both techniques. In con-focal microscopy only intact, living cells attached to a surface can be observed. However, for flow cytometry cells are detached from their growth surfaces, generating a cell suspension of intact, living cells mixed with dead cells and cell fragments. Therefore, the flow cytometry data have to be preprocessed, so that it only includes living cells and is comparable to the microscopy data. To achieve this, the scatter (F 0 ∝ cell cross-section area) was plotted against the side scatter (F 1 ∝ granularity) ( fig. 4A). To exclude dead cells and cell fragments, linear cuts were sufficient: a line through the point cloud was defined by linear regression without intercept for the scatter plot. Based on this line two perpendicular lines -the cuts -were introduced separating vital cells in the inner region from undesired cells in the outer region. The resulting subset of cells had the same volume and STAT5B-GFP distribution as the microscopy cells.
To yield comparable distributions, it has to be considered that different quantities are measured by the two techniques. Microscopy data directly result in absolute numbers for compartment volumes and protein concentrations, whereas flow cytometry data from the fluorescence intensity channel F 2 are logarithmic due to the amplification of the signal by the instrument and the forward scatter of a flow cytometer using light scattering is an approximate measure of the cell cross-section area (see [27]). Therefore, values from either flow cytometry or microscopy measurements had to be transformed to yield comparable quantities. For practical reasons, the microscopy fluorescence intensities I micro were logarithmized yielding X M = log I micro . Similarly, cell volumes V micro determined by microscopy were converted to cross-section areas assuming a spherical shape of the cell as is the case for flow cytometry samples. This . For flow cytometry, X F = F 2 and X F = F 0 for fluorescence intensity and cross section area respectively. X M and X F defined like this build the basis for the calibration method described above.
To take the different sensitivities of the two experimental techniques for fluorescence detection into   account, the flow cytometry data was corrected for cells that are too dim to be detected by microscopy. Then, the quantiles of X M were plotted versus the quantiles of X F. The best accordance in the QQ-plot is expected if an additional cut-off is introduced to the flow cytometry data: only X F >Z for a cut-off parameter Z is taken into account. The accordance is measured by summing up al l squares of the residuals for the QQ-plot (absolute χ 2 ) and Z is chosen optimally if χ 2 reaches a local minimum. In order to avoid that Z is chosen too large and large parts of the flow cytometry distributions are dropped, large Z values were penalized quadratically. The objective function is X n p n 2 2 2 1 2 where p 0 [1] is the fraction of the population that is dropped and n = 2 for quadratic penalization. The penalization is chosen on purpose to fulfill 〈(n + 1)p n )〉 = 1 for uniformly distributed p. This guarantees that the penalization is of the same magnitude as χ 2 . The resulting X 2 curve for the size distribution indicated that only the complete flow cytometry data set lead to the best accordance, while a local minimum existed for the fluorescence intensity distribution ( fig. 4B).
After choosing the optimal cut-off, a least squares regression was applied to the QQ-plot. The linearity of the data points confirmed that the shapes of the two distributions are the same. However, even after two cuts there were deviations for the border points that result from a small population of cells which is detected differently by flow cytometry and by microscopy. To exclude biased fit parameters the least squares regression was restricted to the inner 66% region of points ( fig. 4C).
Thus, data preprocessing and subsequent least squares regression of the QQ-plot lead to comparable quantities obtained by different experimental techniques. All functions for preprocessing the flow cytometry data and for calibration of flow cytometry to microscopy data are included in the R script 3.1 provided in the supplement.

Distributions of transport currents for an exemplary cell population are calculated
In order to compute the distribution of currents for a sample flow cytometry measurement, the calibration was combined with the formula describing the currents (eq. (8)). As has been shown above ( fig. 2D), the rates a in and a out depend on the STAT5B concentrations C cyt in the cytoplasm and C nuc in the nucleus as well as the compartment volumes V nuc and V cyt for nuclear export and import, respectively.
Since the individual cell compartments cannot be distinguished by flow cytometry, an average ratio of the cytoplasmic to nuclear quantities had to be estimated from microscopy data. For every FRAP data set, the fractions f V For the population current calculation, only the mean values of f V and f c were used. The resulting distributions of import and export currents are shown in fig. 5. The transport currents were determined for two cell populations expressing either very low or very high levels of STAT5B-GFP. For import as well as export currents the distributions show different average values but similar variance. This current distribution can be directly combined with other population data generated from the same cell population, such as biochemical time course data describing the phosphorylation dynamics of the proteins involved. Only by using both types of data for mathematical modeling it is possible to combine a detailed experimental investigation of nuclear import and export with signal transduction mediated by phosphorylation of signaling proteins.

Conclusion
In this study, transport rates for unphosphorylated STAT5B were determined in single cells by FRAP and found to follow saturation kinetics dependent on both STAT5B-GFP expression level as well as size of the originating compartment. This reflects a saturation of cofactors necessary for active transport of STAT5B through the nuclear pore complex. The parameters for concentration and volume dependency of the cycling currents were estimated. To predict transport currents with the saturation model for large cell populations, STAT5B-GFP concentration and cell size distribution were measured by flow cytometry. As flow cytometry only yields relative values for cell size and total cell fluorescence, a calibration to absolute numbers generated by single cell microscopy is required. For calibration, the concentration of the transcription factor STAT5B as well as cell size were determined by confocal microscopy and flow cytometry from the identical cell population, resulting in a linear calibration curve. Subsequently, absolute cell size and STAT5B-GFP concentration distributions were computed from flow cytometry data using the calibration curve. Finally, transport current distributions and thereby cell-to-cell variation were predicted using the saturation model.
In recent years, other members of the STAT protein family have been studied by FRAP to investigate nucleocytoplasmic cycling [12,15,28], but the data have so far only been interpreted qualitatively. Our results provide a procedure to link directly measured import and export rates of unphosphorylated STAT5B with data indirectly describing the nucleocytoplasmic cycling of activated STAT5B generated by biochemical experiments. Furthermore, by using an inducible expression system for STAT5B-GFP, we identify a saturation-like behavior of STAT5B nuclear import and export, indicating a limitation in transport factors. The nature of these factors remains to be identified.
The proposed method is generic and is applicable as long as two conditions are fulfilled. First, the quantities that are measured by a certain high-content method have to be functionally related. This allows for expressing a subset of the quantities, defined as the dependent variables, as a function of the remaining, free variables. Second, the free variables have to be part of the quantities that can be measured by a given high-throughput method. If these conditions hold, it is possible to determine the function connecting free and dependent variables by setting up a mathematical model and estimating its parameters. Furthermore, it is possible to calibrate the two measurement techniques against each other as the high-throughput quantities are in particular part of the high-content quantities. This means that a high-throughput measurement can be translated into the ambit of a high-content measurement. Applying the fixed parameter model then leads to a prediction of the dependent variables' distributions representing an indirect determination of these variables for every cell of the population. The method is especially useful if there is a great discrepancy in accessibility between free and dependent variables. This combination of two experimental approaches results in a higher degree of measured variables suitable for mathematical modeling and a reduction of non-identifiabilities in the parameter estimation.

Experimental Procedures
The retroviral expression vector pMOWS containing the cDNA for murine HA-EpoR was introduced into NIH3T3 cells (ATCC) and a single cell clone stably expressing HA-EpoR was obtained by selection with G418. pMOWSIN-TREt-STAT5B-GFP was cotransduced into NIH3T3-EpoR cells together with the cDNA for the transactivator protein contained in pMOWS-rtTAM2. A single cell clone stably expressing murine STAT5B-GFP was obtained by selection with puromycin. Expression of STAT5BGFP was regulated by a Tet-inducible promoter included in pMOWSIN-TREt. pMOWSIN-TREt was generated by digesting pTRE-tight (Clontech) and inserting TREt into the self-inactivating (SIN) retroviral vector pMOWSIN. pMOWS-rtTAM2 was generated by introducing cDNA of rtTAM2 from pUHrT-62-1 (H. Bujard, Heidelberg, Germany) into pMOWS using BamHI/EcoRI restriction sites [29]. To simplify identification of the nuclei cells used for FRAP experiments also were transduced with pMOWS-H2B-mCherry. All cells were maintained in DMEM supplemented with 10% calf serum and 1% PenStrep.
For FRAP experiments cells were grown to 60-80% confluency in Labtek chambered coverglasses over night. Doxycycline was added at a concentration of 10-250 ng/ ml approximately 16 hours before serum-starvation. Cells were serum-starved in DMEM supplemented with 25 mM HEPES pH 7.4 and doxycycline for at least 5 hours. Confocal microscopy was performed on a Leica SP5 with a 63×/1.4 NA oil immersion objective and the pinhole set to 1 Airy unit. All live cell imaging was performed at 37°C. For cell volume and STAT5B-GFP concentration estimation a z-stack of the entire cell was acquired prior to each FRAP experiment. Cytoplasmic and nuclear volume was estimated from z-stack data by measuring the whole cell area or nuclear area of each slice in ImageJ (see [30]) and calculating the respective volume by summing up the voxels per slice. Cytoplasmic volume V cyt was calculated as the difference of the nuclear volume V nuc subtracted from the whole cell volume V cell . To avoid overestimation of bright cells due to scattered light, the number of slices considered was determined by measuring the maximum average intensity in a small region of the nucleus over the whole stack. Only those slices with at least half the maximum mean intensity were included in the analysis. Mean intensities in the nucleus and the whole cell were converted to GFP concentrations by using a dilution series of recombinant SBP-GFP in PBS and embedded in 15% polyacrylamide gel as reference. The cytoplasmic concentration of STAT5B-GFP c cyt was calculated as To determine import and export currents, STAT5B-GFP was photobleached in the entire nuclear region with 100% laser power (488 nm). For analysis of the transport dynamics 10 prebleach and approximately 240 postbleach images of the whole cell were acquired for 30-40 min after bleaching.
Flow cytometry analysis of STAT5B-GFP expression level and measurement of the approximate cell size were performed on a BD FACSCalibur system with the software package CellQuest. Cells were grown in 60 mm cell culture dishes and were treated as for microscopy. Cells were detached from the dishes by 0.05% trypsin/ EDTA and washed once in PBS/0.3% BSA. For each cell population 20 000 cells were measured. Forward and side scatter were detected linearly, for fluorescence intensity detection the signal was logarithmically amplified. NIH3T3-EpoR cells were used as control for cellular autofluorescence and cell size. Raw data was extracted from CellQuest files with FCSExtract [31]. Fluorescence intensity values were directly used for analysis. Values for the forward scatter were assumed to be approximately proportional to the cross-section area of the cell [27]. Cell shape was assumed to be roughly spherical for detached cells and therefore the relation between cross-section area and volume is known.
For the calibration measurement, cells from one cell population were seeded in 60 mm dishes as well as Labtek chambers 20 hours before the experiment. STAT5B-GFP expression was induced with either 10 or 250 ng/ ml doxycycline 16 hours prior to serum-starvation. Flow cytometry analysis was performed as described above. zstacks of 100 tiled frames were acquired by confocal microscopy. For each doxycyline treatment the cell volume and the total amount of STAT5B-GFP per cell were determined for 200 cells as described for FRAP experiments above.

Mathematical Model
Import and export currents from FRAP data In the biological system employed here, fluorescently labeled STAT5B (STAT5B-GFP) is introduced into the cells in addition to endogenous STAT5B so that Here imp indicates transport from cytoplasm to nucleus and accordingly exp transport from nucleus to cytoplasm. From the definition of j imp/exp arises a system of coupled differential equations for the labeled molecule concentrations: As in the previous section K = A, V cyt , V nuc accounts for the normalization. The associated index l is omitted as an index of j. The ε-terms describe the continuous bleaching due to constant laser exposition during postbleach image acquisition.
The two-compartment system is in equilibrium, i.e. j imp = j exp = j and j(t) is constant in time. Also, during the short period of photobleaching j remains constant because the bleaching process only destroys the fluorescing dye but not the molecule of interest. By combining equation (8) with the equilibrium condition and the ansatz c cyt/nuc,L (t) = c cyt/nuc,0 (t)e -εt , equation (16)  , Consequently, the experimentally accessible quantities cyt and nuc are directly associated to the concentrations appearing in the ODE system. Note that the exponential decrease of the signal (due to continuous bleaching) and the proportionality factor between the signal S and the concentration c L drop out. This is even true if the proportionality factor is time dependent.
Using both eq. (19) and the experimental quantities cyt and nuc can be fitted to an exponential curve and the fit parameter a 1 gives the desired result The resulting rate function a l (c) or equivalently j l (c) = cα l (c) can be used for microscopy data: cell images are analyzed for the quantity K and for fluorescence intensities which allow calculating the protein concentrations of interest. Plugging these values in the formula for j l (c) yields an estimate for the current between nucleus and cytoplasm of the investigated cell without measuring it explicitly.

Additional material
Additional file 1: The file is an R script file designed for calibrating flow cytometry data to microscopy data. A documentation of how to use the script is included in the header of the file.
Additional file 2: Sample config file for populist.R. In the file "Purpose" can either be "Calibration" or "Measurement". "Method" is either "FACS" or "Microscopy". "Dox" refers to the preparation and can have arbitrary numbers. For every value of "Dox", an extra calibration is performed. "Intensity" and "Volume" refer to the column names of microscopy and flow cytometry data where intensity and volume values can be found. Finally, "File" is the filename of the data file that should be used.