Analysis of alternative signaling pathways of endoderm induction of human embryonic stem cells identifies context specific differences
© Mathew et al.; licensee BioMed Central Ltd. 2012
Received: 13 August 2012
Accepted: 11 December 2012
Published: 15 December 2012
Lineage specific differentiation of human embryonic stem cells (hESCs) is largely mediated by specific growth factors and extracellular matrix molecules. Growth factors initiate a cascade of signals which control gene transcription and cell fate specification. There is considerable interest in inducing hESCs to an endoderm fate, which serves as a route towards more functional cell types such as pancreatic cells. Research over the past decade has established several robust pathways for deriving endoderm from hESCs, with the capability of further maturation. However, in our experience, the functional maturity of these endoderm derivatives, specifically towards the pancreatic lineage, largely depends on the specific pathway of endoderm induction. Hence it is of interest to understand the underlying mechanism mediating such induction and how it translates to further maturation. In this work we analyze the regulatory interactions mediating different pathways of endoderm induction by identifying co-regulated transcription factors.
hESCs were induced towards endoderm using activin A and four additional factors (FGF2 (F), BMP4 (B), PI3KI (P), and WNT3A (W)), alone and in all combinations, resulting in 15 total experimental conditions. At the end of differentiation each condition was analyzed by qRT-PCR for 12 relevant endoderm related transcription factors (TFs). As a first approach, we used hierarchical clustering to identify which growth factor combinations favor up-regulation of different genes. In the next step we identified sets of co-regulated transcription factors using a biclustering algorithm. The high variability of experimental data was addressed by integrating the biclustering formulation with bootstrap re-sampling to identify robust networks of co-regulated transcription factors. Our results show that the transition from early to late endoderm is favored by FGF2 as well as WNT3A treatments under high activin. However, induction of late endoderm markers is relatively favored by WNT3A under high activin.
Use of FGF2, WNT3A or PI3K inhibition with high activin A may serve well in definitive endoderm induction followed by WNT3A specific signaling to direct the definitive endoderm into late endodermal lineages. Other combinations, though still feasible for endoderm induction, appear less promising for pancreatic endoderm specification in our experiments.
Keywords: Human embryonic stem cells; Endoderm; Hierarchical clustering; Biclustering; Bootstrap
Embryonic stem cells have been shown to have tremendous impact in the field of regenerative medicine because of their potential to differentiate into multiple cell types of interest. Efficient harvesting of this potential requires careful development of protocols that guide the cells through specific signaling pathways, which in turn induce the desired lineages and properties in the differentiated phenotypes. Our primary interest lies in differentiation of human embryonic stem cells (hESCs) to insulin producing β-cells of the pancreas as a cellular transplantation strategy for diabetes mellitus. The first and perhaps the most important step in differentiation to endodermal organs like pancreas and liver is the commitment to definitive endoderm (DE). Multiple signaling pathways have been reported to induce endoderm differentiation successfully, with subsequent maturation to liver, pancreas and lung. While there is some understanding of the activity of these individual signaling molecules, the transcriptional controls activated through these signaling pathways remain largely unknown. Moreover, the cooperative effect of these endoderm induction pathways, along with its impact on long term maturation, has received less attention. Although standard protocols have been established for the later stages of pancreatic induction, it is not always obvious how endoderm derivatives obtained through different pathways will respond to subsequent pancreatic induction signals. In this article, we have analyzed the endoderm induction stage of the differentiation process induced by the combinatorial action of the signaling pathways using an integrated experimental and mathematical approach. A detailed mathematical analysis is adopted to capture co-regulated TFs across different growth factor combinations and to project the maturation potential of the various endoderm derivatives.
Differentiation of hESCs to DE
Activin A (henceforth denoted as activin) has been shown to be effective in inducing DE from hESCs and is a key induction factor used in many protocols[2, 3]. However, recent studies have shown that activin alone may not produce homogeneous differentiation and additional factors must be used to modulate supplementary signaling pathways along with the nodal pathway activated by activin[1, 4]. We chose several widely used DE induction protocols all of which involve activin with either PI3K inhibition, WNT3A, BMP4 or FGF2. The hESCs were differentiated into DE using these molecules alone and in all possible combinations, at the end of which the differentiated cell population was analyzed for endoderm markers. Our aim is twofold: to identify which growth factor combinations are most effective for efficient DE induction; and to understand TF interactions governing these induction conditions. We analyzed the mean expression data using Hierarchical clustering (HC) to identify relationships between the conditions and the TFs and biclustering on the original expression data with replicates to identify the TFs which are co-regulated under subsets of these conditions.
HC is a useful technique to analyze and interpret multivariate data. Each data point here is represented as a vector and the distances between these data points are measured using a suitable distance measure. The clustering process then links the data points together and the result is a hierarchical grouping of the data points in each of the dimensions (TFs and conditions in our case). Our primary goal in using HC is to capture the similarities between different growth factor treatments for DE induction as well as to identify co-regulated TFs under each of these treatments. HC has been successfully used in a number of bioinformatics applications including microarray data analysis, structure identification of bio-molecules and gene pathway identification.
Biclustering to identify co-regulated genes across different conditions
While HC homogenizes the entire dataset, techniques like biclustering are useful in preserving the second dimension in clustering; in our case all the endoderm induction conditions. We are interested in identifying specific sets of genes exhibiting similar expression patterns across various subsets of experimental conditions, which can be achieved by biclustering. Moreover, many TFs are known to have multiple functions, and hence participate in multiple regulatory networks, which can be captured by overlapping biclusters. In 2000, Cheng and Church proposed the use of a similarity measure called the mean square residue for identification of coherent biclusters. Since then newer and better algorithms have been developed to identify biclusters with particular characteristic trends like coherence, low overlaps and hierarchical structure. These algorithms perform either one or a combination of iterative row and column clustering, greedy iterative search, exhaustive bicluster enumeration or distribution parameter identification. Bleuler et al. proposed an evolutionary algorithm (EA) to determine high quality, partially overlapped biclusters using the Cheng and Church formulation. EAs can explore a large search space and are efficient methods for complex optimization problems. High quality biclusters should satisfy several criteria: they should contain as many genes and conditions as possible, have a low mean square residue and a high row variance, and overlap little with one another. Divina et al. formulated the Sequential Evolutionary Biclustering (SEBI) algorithm to identify such biclusters from expression data, and it has been adopted in the current work to identify important biclusters in the endoderm induction data under different combinations of the growth factors. SEBI can find high quality biclusters and has been shown to perform well for large-scale biological datasets. At the same time, it allows the user the flexibility of selecting the degree of overlap of the biclusters.
Handling data variability
The gene expression data obtained for cell culture systems are subject to noise because of the heterogeneity and stochasticity associated with the system. Differences among the biological replicates may therefore arise from the inherent heterogeneity of the ES cell population as well as from experimental noise. It is therefore essential that the biclustering algorithm be supplemented with additional methods to discover good quality and robust biclusters from noisy gene expression data. One way to do this is to obtain a large number of experimental replicates and perform biclustering over the entire dataset. This is, however, expensive and impractical. A mathematical surrogate of this approach is bootstrapping, a concept first presented systematically by Efron et al.
Essentially, bootstrapping generates a pseudo dataset from the small number of experimental replicates by a sampling with replacement technique. The advantage of bootstrap lies in estimating statistically significant parameters from a limited number of experimental replicates. Thus, the results from a bootstrap analysis can provide information on the parameter variances and confidence intervals. These bootstrap data-sets are further analyzed by ensemble methods like bagging to identify aggregation of biclusters, referred to as meta-clusters. We have adopted a similar approach to aggregate the individual biclusters identified from the bootstrap datasets. However instead of identifying an ensemble of biclusters, we have concentrated on identifying the most repeated subset of the bicluster, which we denote as robust.
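The resampling idea described above can be illustrated with a minimal sketch. It assumes three replicate fold-change values for a single TF under a single condition (the numbers are hypothetical, not measurements from this study) and uses a simple percentile interval; the function name `bootstrap_means` is introduced here for illustration only.

```python
import random
import statistics

def bootstrap_means(replicates, n_boot=1000, seed=0):
    """Resample the replicates with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    n = len(replicates)
    return [statistics.fmean(rng.choices(replicates, k=n)) for _ in range(n_boot)]

# Hypothetical replicate values for one TF under one condition.
replicates = [2.1, 3.4, 2.8]

means = bootstrap_means(replicates)
means_sorted = sorted(means)

# Percentile interval: drop the lowest and highest 2.5% of bootstrap means.
lower = means_sorted[int(0.025 * len(means_sorted))]
upper = means_sorted[int(0.975 * len(means_sorted)) - 1]

print("bootstrap variance of the mean:", statistics.variance(means))
print("approximate 95% interval:", (lower, upper))
```

With only a handful of replicates the interval is necessarily coarse, which is one reason the analysis here aggregates over many bootstrap datasets rather than interpreting any single one.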
Experimental analysis of endoderm differentiation using combinations of major pathways
Hierarchical clustering of the mean expression data identifies differences in the endoderm induced by BMP4 in the presence and absence of exogenous FGF2
The clusters identified by the hierarchical algorithm reflect our biological understanding of the induction conditions as seen from the previous studies. A major difference between the two clusters of conditions was the context dependent function of BMP4. In the presence of FGF2 and high activin, BMP4 was found to favor the endodermal lineage which was seen in several recent studies[20–22] and was also on par with PI3KI dominant conditions which gave the best endoderm in our experiments. Also, in our BMP4 dominant conditions, the late stage markers showed very high expression while the major DE markers were low indicating that the resulting endoderm may already be mature. Among the second group of conditions, PI3KI and high activin resulted in high expression of three major DE markers SOX17, CXCR4 and CER which is supported by a number of earlier studies[23, 24]. Using all the factors together does not improve upon the endoderm derived by PI3KI treatment. The second group of conditions also contains FGF2 as a major factor along with WNT3A. It is found that both pluripotency (OCT4) and the endoderm factors (CER and HNF6) are relatively favored by conditions involving FGF2 and WNT3A as the major contributor. In fact, FGF2 has been found to be sufficient to maintain the hESCs in the pluripotent state and has also been used for endoderm induction in several differentiation protocols. Thus, FGF2 can potentially favor both pluripotency as well as endoderm differentiation depending on associated conditions.
Identification of co-regulated transcription factors by biclustering
While hierarchical clustering enables a fast and simple analysis of the experimental data sets, it does not provide information on which subsets of TFs are co-regulated across subsets of conditions. Identifying such co-clusters will be beneficial, since the governing signaling pathways change with the induction condition and the same TFs may not be co-regulated. The technique of biclustering serves to mine subgroups of such TFs exhibiting similar trends in their expression level under subsets of conditions. Hence TFs appearing in the same bicluster can be inferred to be co-regulated and constituents of a similar network architecture. The experimental data matrix, X, constituting the mean expression data across all the growth factor conditions, is analyzed using the algorithm elaborated in the Methods section. Here, the biclustering approach is formulated as an optimization problem solved using a genetic algorithm (GA), and the quality of every candidate bicluster is assessed by a fitness function. The fitness function has a number of free parameters which can be tuned in order to identify certain desired trends. The detailed procedure for selecting the optimum parameters is outlined in Additional file 2.
Recently, a new method called Fuzzy Possibilistic Biclustering was proposed by Banka et al., which assigns a membership value to each gene-condition pair in the expression matrix and therefore allows a varying degree of overlap amongst the biclusters[27, 28]. However, though the method has been shown to provide very large biclusters with acceptable residue, the selection of the degree of fuzziness often depends upon the question that the biologists have set out to answer. In our case, we are interested in analyzing the well identified markers of endoderm induction under the necessary signaling pathways. Since our aim is to discover subtle differences in gene regulation when the induction conditions are changed, a traditional crisp method like SEBI is more useful for identifying the best induction condition.
Robust biclusters identify WNT3A treatment to favor both early and late endoderm
The biclusters identified above were for the mean dataset, and hence do not explicitly take into account the experimental variations. In general, biological datasets are known for their noise and uncertainty, and stem cells in particular have inherent heterogeneity and stochasticity. In order to increase confidence in the identified biclusters, we undertook bootstrap analysis on the experimental data to generate 1000 pseudo-datasets. Each of these datasets was treated as an experimental repeat and subjected to the entire biclustering analysis. In order to identify somewhat overlapped biclusters, we ran the biclustering algorithm five times at each data point, each time penalizing previously identified biclusters.
The differentiation of hESCs into the endoderm lineages is carried out by the activation of different signaling pathways mimicking in vivo development. However, there is no consensus on which induction method is the most desirable and whether combination of these could result in an endoderm with the best signature. Here, we have used a combination of experimental and mathematical techniques to shed light on these concerns.
The DE signature differs under exogenous activation of different signaling pathways participating in endoderm commitment
Our experiments with different DE inducing conditions show that the DE potential of the differentiating hESCs is highly dependent on the method of DE induction. The major DE markers (CER, CXCR4, FOXA2, SOX17) showed considerable variation when some of the pathways were activated above their basal levels.
All the pathways studied here are known to be important at the earlier stages of in vivo endoderm differentiation and have also been documented as necessary for in vitro differentiation[2, 6, 7, 31]. The common denominator in our studies is activin A, which is an essential inducer of DE[2, 3, 24]. This is primarily because activin, being a member of the TGFβ family, mimics nodal signaling, which has been shown to be necessary for endoderm development. Activin has been shown to maintain pluripotency at low concentrations and to induce mesoderm and endoderm at high concentrations. However, activin alone may not result in efficient endoderm induction. Low PI3K signaling was essential for efficient induction of DE from hESCs. Our hierarchical clusters show that activin and PI3K inhibition in combination favor the up-regulation of a number of DE markers and form the minimal set of signaling pathways to be modulated for efficient DE induction. In fact, a number of recent studies have identified the interplay between the PI3K/Akt and Activin/Smad2,3 pathways and the resulting regulation of the gene transcription events necessary for early DE induction.
Among the DE markers, CER showed up-regulation on differentiation, and the highest up-regulation was achieved in the presence of FGF2, WNT and PI3KI treatments. Katoh et al. recently identified the binding domains of several key signaling effectors of the activin and WNT pathways on the promoter regions of CER in hESCs. According to their results, the key nodal effectors Smad3/Smad4 as well as the WNT effectors beta-catenin and TCF/LEF transcriptional complex regulate the expression of the CER gene. In addition to high activin and WNT signaling, PI3K inhibition may be necessary to enhance the effect of nodal signaling as Smad3/Smad4 complex is negatively regulated by Akt. Exogenous FGF2 simultaneously activates the ERK pathway and maintains the expression of other key regulators of differentiation. However, BMP4 effectors Smad1/3 may compete with the activin pathway and thus reduce the up-regulation of CER, as substantiated by the consistent grouping of the BMP4 dominant conditions in the hierarchical clustering with low CER as a common marker.
The response to the BMP4 pathway, however, was highly dependent on the context, namely the presence and absence of FGF2 which was a striking feature of the hierarchical clustering on the 15 conditions. BMP4 is typically known as an activin antagonist and high concentrations of BMP4 in the culture with high activin results in mesoderm fate[34–36]. At the same time, BMP4 alone results in the extra-embryonic lineages. The presence of FGF2 with BMP4 modulates the net response to the mesendoderm fate, which is an intermediate stage that can result in DE and mesoderm. Several recent studies have demonstrated the use of this combination to promote endoderm formation[21, 22, 38]. FGF2 sustains the expression of Nanog (a pluripotency marker) and this sustained Nanog expression is found to shift the outcome of BMP4 induced differentiation of hESCs towards mesendoderm. However, prolonged use of FGF2 and BMP4 together may be detrimental for pancreatic differentiation, since this combination has been shown to induce hepatic differentiation after the DE stage. Also, BMP4 dominant clusters showed high expression of late endoderm markers HNF4α, HNF1β and GATA4. This may indicate that BMP4 accelerates the differentiation to the mesendoderm phase and therefore, the overall dynamics may be faster for the BMP4 dominant case. But, it was striking to note that the expression of HNF6, another important marker for late endoderm was still lower in the BMP4 dominant case. Hence, hierarchical clustering alone was not sufficient to answer if BMP4 addition could be useful for late endoderm differentiation. Importantly, BMP4 dominant conditions gave low expression of markers from the robust biclusters. Thus the current analysis shows that BMP4 may not be a suitable choice for endoderm induction.
WNT3A/β-catenin signaling has been shown to be important both for maintenance of pluripotency and for induction of differentiation. The WNT pathway is also found to be important in the formation of the primitive streak, due to which it is often used in the very early stages of in vitro differentiation until the formation of mesendoderm. Stabilization of β-catenin by canonical WNT signaling is found to be responsible for differentiation by epithelial-mesenchymal transition; however, the presence of WNT after this stage supports mesoderm. Also, FGF2 is found to synergistically influence the WNT pathway. WNT along with PI3KI was commonly present in both the groups identified by our hierarchical clustering. WNT was consistently found to be supportive of the activin + FGF2 signaling, as assessed by the up-regulation of DE markers. Hence, WNT and PI3KI may be the essential pathway modulators necessary for endoderm differentiation.
Robust biclusters identify the necessary pathways for efficient endoderm differentiation to the pancreatic lineage
The robust biclusters identified by the biclustering + bootstrap analysis show the most important trends preserved under experimental variations. Supportively, CER, HNF6 and HNF4α belonged to the robust clusters. As mentioned earlier, CER is an important target of the activin and WNT signaling pathways and HNF6 is a very early pancreatic progenitor marker taking part in the transcriptional network activating pancreatic progenitors[32, 40]. As seen from the Group 1 bicluster, FGF2 + WNT3A conditions favor CER and HNF6 while BMP4 limits their up-regulation. It is also found that the stability of β–catenin is partly enhanced by PI3K signaling (activated by FGF2) and hence this combination of high activin + FGF2 + WNT3A may work to control the expression of some endoderm markers like CER and HNF6. At the same time, CER protein is a negative regulator of the Tgfb (activin, BMP4) pathway and up-regulation of CER is necessary to limit the activation of these pathways, since inhibition of the Tgfb pathway was found to be necessary for efficient differentiation to the pancreatic progenitors after PDX1 and HNF6 expression. However, external addition of WNT3A may still be necessary since CER negatively regulates the WNT pathway.
The focus of the current work was to gain insights into the in vitro differentiation of human embryonic stem cells to the endoderm stage using both experimental and mathematical approaches. Our work has identified the differences between the different protocols for endoderm induction. Essentially, high activin A with PI3K inhibition, or high activin A with FGF2 or WNT3A, serve well as early DE inducers. Additionally, biclustering shows that the early and late endoderm markers are co-regulated under high activin and WNT3A. Thus, overall, high activin with PI3KI and WNT3A together may serve better for in vitro differentiation of hESCs to the definitive endoderm and pancreatic endoderm lineages.
Cell culture and treatment
H1 hESCs were placed on hESC certified matrigel coated wells and maintained with mTeSR1 with media change every day. Cells were passaged every 5 to 7 days by incubating in 1 mg/ml dispase for 5 minutes followed by mechanically breaking the colonies and splitting at a 1:3–1:5 dilution. Cells were examined under the microscope every day and colonies with observable differentiation were picked and removed before the media changes.
hESC differentiation to DE
H1 hESCs were allowed to grow to 60-70% confluency before the experiments were started. Once confluency was reached, differentiation was performed by adding DE induction media for 4 days with media change every day. Several induction conditions were chosen according to previously published studies[3, 5–7]. All conditions were prepared in DMEM:F12 supplemented with B27 and 0.2% BSA with 100 ng/ml Activin A. Conditions involved the use of individual and all possible combinations of growth factors and molecules at the following concentrations: basic FGF (F) at 100 ng/ml, BMP4 (B) at 100 ng/ml, WNT3A (W) at 25 ng/ml and Wortmannin (PI3K inhibitor, P) at 1 μM. This leads to 15 different experimental conditions.
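The count of 15 conditions follows from taking every non-empty combination of the four supplements added to the activin-containing base medium. A short sketch of that enumeration is given below; the single-letter codes are those used in the text, while the function name and the concatenated labels (for example "FW") are conventions assumed here for illustration.

```python
from itertools import combinations

# Single-letter codes from the text: FGF2, BMP4, WNT3A, PI3K inhibitor.
FACTORS = ["F", "B", "W", "P"]

def induction_conditions(factors):
    """Return every non-empty combination of factors, smallest combinations first."""
    conditions = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            conditions.append("".join(combo))
    return conditions

conditions = induction_conditions(FACTORS)
print(len(conditions))  # 4 singles + 6 pairs + 4 triples + 1 quadruple
print(conditions)
```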
Measurement of Transcription Factor (TF) expression
After 4 days of DE induction, cells were lysed and RNA was extracted using the Nucleospin RNA II kit (Macherey Nagel) according to the manufacturer’s instructions. The sample absorbance at 280 nm and 260 nm was measured using a BioRad Smart Spec spectrophotometer to obtain RNA concentration and quality. Reverse transcription was performed using the ImProm II Promega reverse transcription kit following the manufacturer’s recommendation. qRT-PCR analysis was performed for endoderm and pancreatic markers using the primers listed in Additional file 3: Table S1.
A total of 12 transcription factors were studied, which included the pluripotency marker OCT4, the mesendoderm marker BRACHYURY, the DE markers CXCR4, SOX17, CER and FOXA2, and the pancreatic progenitor markers PTF1α, PDX1, GATA4, HNF1β, HNF4α and HNF6. GAPDH was selected as the housekeeping gene. Briefly, the fold change was calculated from the cycle times, $C_T$, after normalization with respect to the control sample and the housekeeping gene, GAPDH, as $\text{fold change} = 2^{-\Delta\Delta C_T}$, where $\Delta\Delta C_T = (C_{T,\text{target}} - C_{T,\text{GAPDH}})_{\text{sample}} - (C_{T,\text{target}} - C_{T,\text{GAPDH}})_{\text{undiff cells}}$. The control sample was chosen to be undifferentiated cells at day 0.
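As a worked illustration of this normalization, the sketch below computes a fold change from four cycle-threshold values. It assumes the standard 2^(-ΔΔCT) convention with ideal amplification efficiency; the function name and the example values are hypothetical and are not taken from the study data.

```python
def fold_change(ct_target_sample, ct_gapdh_sample, ct_target_control, ct_gapdh_control):
    """Relative expression by the 2^(-ddCt) method (assumes ideal amplification efficiency)."""
    delta_sample = ct_target_sample - ct_gapdh_sample      # target normalized to GAPDH, treated sample
    delta_control = ct_target_control - ct_gapdh_control   # same, undifferentiated day 0 cells
    dd_ct = delta_sample - delta_control
    return 2.0 ** (-dd_ct)

# Hypothetical cycle thresholds, for illustration only.
example = fold_change(24.0, 18.0, 29.0, 18.0)
print(example)  # ddCt = (24 - 18) - (29 - 18) = -5, so the fold change is 2**5 = 32.0
```

A target that crosses threshold five cycles earlier than in the control, at equal GAPDH levels, therefore corresponds to a 32-fold up-regulation.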
TF expression profiles
The TF expression profiles can be grouped together to form an expression matrix with the rows corresponding to the measurements of interest (like the relative mRNA concentrations) and the columns corresponding to the experimental conditions or samples. Thus, each element in the matrix refers to the intensity of the particular measurement in a given sample. Many of the genes are closely regulated under a subset of conditions indicating that they are probably under the influence of the same regulatory network under these conditions. The expression data is helpful in identifying such sub groups of transcription factors and conditions. However, expression data matrices are often complex and further computational analysis is required to mine important connections from such large expression matrices.
Hierarchical clustering partitions the data into clusters through an iterative process, where similarity or dissimilarity between every pair of variables in the data matrix is calculated using an appropriate distance measure followed by grouping the variables in close proximity using a linkage function. We used the in-built Matlab functions to perform the analysis using various distance measures e.g. Euclidean, city block etc., on the mean centered and variance scaled expression matrix. The results were represented as a clustergram i.e. the linkage tree and the corresponding heat map. We tested the tree generated using different linkage measures after normalization of the mean expression matrix and found all the trees to be very similar with the cophenetic correlation coefficient greater than 0.9.
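The analysis in this work was run with MATLAB's built-in functions. The following sketch shows an analogous check in Python with SciPy, offered only as an illustration. The data are random placeholders of the same shape as the mean expression matrix (12 TFs by 15 conditions), and standardizing each TF profile across conditions is an assumption, since the text does not state the scaling axis.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist
from scipy.stats import zscore

rng = np.random.default_rng(0)
# Placeholder data: 12 TFs (rows) x 15 conditions (columns); not the study's measurements.
X = rng.normal(size=(12, 15))

# Mean-center and variance-scale each TF profile across conditions (assumed axis).
Xz = zscore(X, axis=1)

results = {}
for metric in ("euclidean", "cityblock"):
    d = pdist(Xz, metric=metric)          # condensed pairwise distances between TFs
    for method in ("average", "complete"):
        Z = linkage(d, method=method)     # linkage tree built from those distances
        c, _ = cophenet(Z, d)             # cophenetic correlation: tree vs. original distances
        results[(metric, method)] = float(c)
        print(metric, method, round(float(c), 3))
```

A cophenetic correlation close to 1 indicates that the tree preserves the original pairwise distances well; random placeholder data will generally score lower than structured expression data.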
Biclustering can be described as two dimensional clustering, in which a subset of genes exhibiting a similar trend across a subset of conditions is identified. Such subsets can be considered to participate in a similar regulatory mechanism, hence constituting a regulatory network. In order to identify sets of TFs expressing coherent trends under specific sets of conditions, we analyzed our TF-condition matrix, X, using the Sequential Evolutionary Biclustering (SEBI) algorithm developed by Divina et al. The SEBI algorithm identifies coherent biclusters sequentially with the help of a number of metrics as described below. For a bicluster $B(I, J) \subseteq X$, containing elements $e_{ij}$ for $i \in I$, $j \in J$, the residue $r_{ij}$ of each element in the bicluster is defined as

$$r_{ij} = e_{ij} - e_{iJ} - e_{Ij} + e_{IJ}.$$

The gene base is defined as $e_{iJ} = \frac{1}{|J|}\sum_{j \in J} e_{ij}$, with $|I|$ and $|J|$ representing the total number of genes and conditions respectively in the bicluster B. The condition base is defined as $e_{Ij} = \frac{1}{|I|}\sum_{i \in I} e_{ij}$. The base of the bicluster is the mean of all entries in the bicluster, i.e., $e_{IJ} = \frac{1}{|I||J|}\sum_{i \in I, j \in J} e_{ij}$. The residue, therefore, indicates the degree of coherence of the element with the other elements in the bicluster. Further, the mean squared residue of all the elements in the bicluster is defined as

$$r_{IJ} = \frac{1}{|I||J|}\sum_{i \in I, j \in J} r_{ij}^{2}.$$

It is possible to have biclusters with constant expression values and hence a low residue value. To avoid such trivial biclusters, the variance metric is introduced. The variance, $\mathrm{var}_{IJ}$, of a bicluster is defined as

$$\mathrm{var}_{IJ} = \frac{1}{|I||J|}\sum_{i \in I, j \in J} \left(e_{ij} - e_{iJ}\right)^{2}.$$

Hence, the variance captures fluctuating trends. Finally, we are interested in biclusters with as many genes and conditions as possible, i.e. having large volume. The basic premise of the analysis is that the genes belonging to a bicluster are under the influence of a common regulatory pathway and hence show coherence in their expression trends.
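The two coherence metrics can be made concrete with a small sketch. It uses the Cheng and Church form of the residue, in which the bicluster base is added back, and a row variance averaged over all elements of the bicluster; the latter normalization is an assumption based on the usual SEBI formulation. The function name and the toy matrix are hypothetical.

```python
def bicluster_metrics(X, rows, cols):
    """Return (mean squared residue, row variance) of the bicluster X[rows][cols]."""
    sub = [[X[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_mean = [sum(r) / n_c for r in sub]                                       # gene base
    col_mean = [sum(sub[a][b] for a in range(n_r)) / n_r for b in range(n_c)]    # condition base
    base = sum(row_mean) / n_r                                                   # bicluster base
    msr = 0.0
    var = 0.0
    for a in range(n_r):
        for b in range(n_c):
            residue = sub[a][b] - row_mean[a] - col_mean[b] + base
            msr += residue ** 2
            var += (sub[a][b] - row_mean[a]) ** 2
    size = n_r * n_c
    return msr / size, var / size

# Toy matrix in which every row is a shifted copy of the first (perfectly coherent).
X = [[1.0, 2.0, 3.0],
     [2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0]]
msr, var = bicluster_metrics(X, rows=[0, 1, 2], cols=[0, 1, 2])
print(msr, var)
```

For a purely additive pattern like this toy matrix the mean squared residue vanishes (up to rounding) while the row variance stays positive, which is the situation the two metrics are meant to distinguish from a flat, constant bicluster where both are zero.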
However, genes may participate in multiple regulatory pathways. To capture this, we allow a certain degree of overlap amongst the biclusters discovered sequentially by the SEBI algorithm, controlled by a penalty term.
Here N and M are the numbers of rows and columns of the expression matrix, respectively, and $|Cov(e_{ij})|$ is the number of previous biclusters containing $e_{ij}$. The penalty term biases the search against elements that have already appeared in previous biclusters, thus reducing the overlap amongst the biclusters.
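The coverage count on which the penalty is built is straightforward to track. The sketch below shows only this bookkeeping, i.e. how many earlier biclusters contain each matrix element; it does not attempt to reproduce the penalty weighting itself. The function name and the example biclusters are hypothetical.

```python
def coverage_counts(n_rows, n_cols, previous_biclusters):
    """Count, for each matrix element, how many earlier biclusters contain it."""
    cov = [[0] * n_cols for _ in range(n_rows)]
    for rows, cols in previous_biclusters:
        for i in rows:
            for j in cols:
                cov[i][j] += 1
    return cov

# Two hypothetical earlier biclusters on a 3 x 4 matrix, given as (row set, column set).
previous = [({0, 1}, {0, 2}), ({1, 2}, {2, 3})]
cov = coverage_counts(3, 4, previous)
for row in cov:
    print(row)
```

Element (1, 2) lies in both earlier biclusters and so has a count of 2, which would make it the most heavily penalized candidate in the next search.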
w d is defined as and δ is the threshold mean squared residue and biclusters with mean squared residue above δ are discarded.
The current optimization formulation has been identified to be NP-hard and has been shown to be effectively handled by evolutionary techniques like Genetic Algorithm (GA). GA is an iterative search process which looks for the fittest member of a population (candidate solutions) using the biological principle of evolution under mutation and natural selection. In a typical GA, the optimization variables are encoded as a sequence of binary bits and these sequences are concatenated to form the chromosome. Thus, for the present formulation, each chromosome consists of I binary bits for genes and J binary bits for conditions forming the I + J binary bits of the chromosome. The binary variables, 0 and 1 represent the absence or presence of a gene (or condition) respectively. Thus, a GA population is made of chromosomes with each chromosome representing a candidate bicluster.
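The binary encoding described above can be sketched as a pair of helper functions. The names `encode` and `decode` are introduced here for illustration; the matrix dimensions (12 TFs, 15 conditions) are those of this study, and the selected indices are arbitrary.

```python
def encode(rows, cols, n_genes, n_conditions):
    """Build a chromosome: one bit per gene followed by one bit per condition."""
    gene_bits = [1 if i in rows else 0 for i in range(n_genes)]
    cond_bits = [1 if j in cols else 0 for j in range(n_conditions)]
    return gene_bits + cond_bits

def decode(chromosome, n_genes):
    """Recover the selected gene and condition indices from a chromosome."""
    rows = [i for i, bit in enumerate(chromosome[:n_genes]) if bit]
    cols = [j for j, bit in enumerate(chromosome[n_genes:]) if bit]
    return rows, cols

# A candidate bicluster with genes 0 and 3 and conditions 1, 2 and 14 (arbitrary choice).
chrom = encode({0, 3}, {1, 2, 14}, n_genes=12, n_conditions=15)
print(chrom)
print(decode(chrom, n_genes=12))
```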
Each chromosome has a metric associated with it, called the fitness, which we wish to maximize. The GA is initiated by randomly initializing a population of chromosomes (i.e. biclusters). The population is evolved in every generation by the operators reproduction, crossover and mutation. At the end of every generation, individuals for the next one are selected on the basis of their fitness values. This cycle of evolution is continued until a predetermined termination criterion is reached. For the present case, we continued the simulations, up to a maximum number of generations, until no further change in the population was observed. The biclustering formulation was coded in FORTRAN R90, and the Genetic Algorithm driver (version 1.7a) was obtained from David Carroll, CU Aerospace, Urbana, IL. Computations were performed on INTEL (R) Core (TM) 2 Quad CPU (Q8400 @ 2.66 GHz).
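The generational cycle described above can be sketched in a few lines. This is a generic GA written for illustration, not the FORTRAN driver used in this work: tournament selection, single-point crossover, bit-flip mutation and elitism are assumed choices, all parameter values are arbitrary, and the fitness is a stand-in that rewards agreement with a fixed bit pattern rather than bicluster quality.

```python
import random

def evolve(fitness, n_bits, pop_size=30, generations=60, p_cross=0.9, p_mut=0.02, seed=0):
    """Maximize `fitness` over binary chromosomes; return the best one and the fitness history."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        children = [elite[:]]  # elitism: carry the best individual forward unchanged
        while len(children) < pop_size:
            # Reproduction: pick each parent as the fittest of three random individuals.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            child = p1[:]
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_bits)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, history + [fitness(best)]

# Stand-in fitness (hypothetical): number of bits agreeing with a fixed 27-bit target.
TARGET = [1, 0] * 10 + [1] * 7

def toy_fitness(chrom):
    return sum(1 for a, b in zip(chrom, TARGET) if a == b)

best, history = evolve(toy_fitness, n_bits=len(TARGET))
print(history[0], history[-1])
```

Because the best individual is always carried forward, the recorded best fitness never decreases from one generation to the next; a plateau in this history is one simple way to detect that the population has stopped changing.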
Determination of robust biclusters
The inherent noise in biological systems makes it difficult to draw meaningful conclusions from a deterministic analysis. The formulation proposed above is based on the mean gene expression data, which possibly reduces confidence in the identified biclusters. Here we have adopted the bootstrap technique to obtain robust biclusters from noisy experimental data. Bootstrap is a statistical technique that generates a large data set from a small number of experimental replicates by sampling with replacement. The present formulation systematically re-samples the original experimental data set using a Monte Carlo algorithm to generate the artificial data sets. The optimization formulation of the biclustering problem is then solved at each of the bootstrap data points to generate a family of alternate biclusters. The final goal is to identify the most repeated biclusters in the entire array, on the justification that such a bicluster is relatively insensitive to experimental noise and hence robust. To this end, the number of repeats of a particular gene-condition combination is analyzed using the quicksort algorithm (O(N log N) on average). Our analysis showed that the complete bicluster was typically not repeated significantly; instead only subsets of the biclusters were repeated a sufficient number of times. For identification of robust biclusters, we set the threshold frequency of repeats as 500 out of every 1000 alternate biclusters. The most repeated subsets are thereby concluded to be robust under experimental noise. The work flow for the entire analysis is depicted in Figure 1.
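The frequency-threshold step can be sketched as follows. The sketch treats a "gene-condition combination" as a single (gene, condition) pair, which is an interpretive assumption, and it counts with a hash table rather than by sorting. The function name, the labels and the three example biclusters are hypothetical; in the analysis described above the threshold would be 500 out of 1000 bootstrap biclusters.

```python
from collections import Counter

def robust_pairs(biclusters, threshold):
    """Return the (gene, condition) pairs that occur in at least `threshold` biclusters."""
    counts = Counter()
    for genes, conditions in biclusters:
        for gene in genes:
            for cond in conditions:
                counts[(gene, cond)] += 1
    return {pair for pair, n in counts.items() if n >= threshold}

# Three hypothetical biclusters, each as (gene set, condition set), from three bootstrap datasets.
found = [
    ({"g1", "g2"}, {"c1", "c2"}),
    ({"g1", "g2", "g3"}, {"c1"}),
    ({"g1"}, {"c1", "c3"}),
]
robust = robust_pairs(found, threshold=2)
print(sorted(robust))
```

None of the three biclusters is repeated in full, yet the pairs (g1, c1) and (g2, c1) recur often enough to pass the threshold, mirroring the observation that only subsets of the biclusters were repeated.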
We would like to thank Dr. Ira Fox from the University of Pittsburgh for his generous gift of H1 hESCs.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.