Multiway modeling and analysis in stem cell systems biology
© Yener et al; licensee BioMed Central Ltd. 2008
Received: 05 February 2008
Accepted: 14 July 2008
Published: 14 July 2008
Systems biology refers to multidisciplinary approaches designed to uncover emergent properties of biological systems. Stem cells are an attractive target for this analysis, due to their broad therapeutic potential. A central theme of systems biology is the use of computational modeling to reconstruct complex systems from a wealth of reductionist, molecular data (e.g., gene/protein expression, signal transduction activity, metabolic activity, etc.). A number of deterministic, probabilistic, and statistical learning models are used to understand sophisticated cellular behaviors such as protein expression during cellular differentiation and the activity of signaling networks. However, many of these models are bimodal i.e., they only consider row-column relationships. In contrast, multiway modeling techniques (also known as tensor models) can analyze multimodal data, which capture much more information about complex behaviors such as cell differentiation. In particular, tensors can be very powerful tools for modeling the dynamic activity of biological networks over time. Here, we review the application of systems biology to stem cells and illustrate application of tensor analysis to model collagen-induced osteogenic differentiation of human mesenchymal stem cells.
We applied Tucker1, Tucker3, and Parallel Factor Analysis (PARAFAC) models to identify protein/gene expression patterns during extracellular matrix-induced osteogenic differentiation of human mesenchymal stem cells. In one case, we organized our data into a tensor of type protein/gene locus link × gene ontology category × osteogenic stimulant, and found that our cells expressed two distinct, stimulus-dependent sets of functionally related genes as they underwent osteogenic differentiation. In a second case, we organized DNA microarray data in a three-way tensor of gene IDs × osteogenic stimulus × replicates, and found that application of tensile strain to a collagen I substrate accelerated the osteogenic differentiation induced by a static collagen I substrate.
Our results suggest gene- and protein-level models whereby stem cells undergo transdifferentiation to osteoblasts, and lay the foundation for mechanistic, hypothesis-driven studies. Our analysis methods are applicable to a wide range of stem cell differentiation models.
Design optimization of tissue structure and function: a systems biology approach
The structure/function relationship dogma is central to understanding how biological systems function. The idea is deceptively simple: understanding the structural organization of biological systems, from massive ecological systems to the shape of a single protein, reveals the function of the system. The concept is powerful enough to have inspired a 200+ year-long effort to describe the components of our biological universe in ever finer detail, beginning with the Linnean taxonomic system of cataloging organisms based on their structural similarities, and culminating with microscale descriptions such as the complete genomes of several organisms, including humans . The reductionist approach to biological research has thus reigned supreme for generations, and as a result we now understand how the linear arrangement of nucleotides encodes the linear arrangement of amino acids, how proteins interact to form functional groups such as signal transduction and metabolic pathways, etc.
But at each level of biological organization, we reach a wall- having reduced the complex biological universe to a myriad of minute parts, we encounter new forms of complexity: data overload and the "curse of dimensionality ." Simply put, we've taken our biological machines apart but can't put them back together again- our ability to accumulate reductionist data has outstripped our ability to understand it. Thus, we encounter a gap in the structure/function relationship: having accumulated an extraordinary amount of detailed information about biological structures, we can't assemble it in a way that explains the correspondingly complex biological functions these structures perform.
This gap is especially evident at the level of tissues, where most diseases and injuries are manifest. Heart disease and cancer remain the top two causes of death in the United States. One fundamental characteristic of both diseases is tissue failure: namely, errors in the structural organization and function of cells in the affected tissues. Likewise, it is estimated that one in six US residents requires medical treatment for an injury each year , yet the process of wound healing is so complex it is difficult to accurately predict how quickly most serious wounds will heal [4, 5]. Existing models of wound healing rely on clinically relevant, but somewhat superficial, measures of tissue state such as reduction in wound area, linear advancement of wound edge, pain, and ease of use [4–6]. In fact, despite a multitude of genetic screens, biochemical assays, and imaging techniques, the "gold standard" for diagnosis and evaluation remains the expert opinion of highly trained pathologists who scan samples of the tissues in histopathology slides. In other words, the human eye is currently the most accurate tool we have available for identifying telltale alterations in the structure and function of diseased and damaged tissues. And it is clear that human judgment is not fail-proof: thousands of diseases are misdiagnosed every year, costing hundreds of millions of dollars in wasted or ineffective medical treatment.
To improve diagnosis and treatment of diseases and wounds, we need a better understanding of how the tremendous numbers of cellular and subcellular parts are organized into functional tissues. One strategy for achieving this is to employ robust methods for describing complex systems, adapted from math and engineering disciplines far outside traditional biomedical fields . Viewed from this perspective, tissue organization and function can be treated as a design optimization problem: what is the optimal arrangement of cellular constituents that achieves the best tissue performance?
When applied to problems of biological complexity, this design optimization approach is sometimes called systems biology. If one begins with the assertion that healthy, native tissues represent a design optimum, the task of systems biology is to identify the "control knobs" that govern tissue structure and function, and the specific "settings" of these knobs that yield an optimally functional (i.e., healthy) tissue. A second important assertion is that tissues are "self-correcting," in that when damaged, they are capable of generating an appropriate response that restores them to their optimal condition (i.e., wound healing): how are the control knobs "turned" to restore optimal function in a wounded tissue? All systems biology approaches therefore focus on defining three characteristics common to self-correcting systems: robustnes s (ability to maintain phenotypic stability in response to perturbation), modularity (clustering of components into functional "teams"), and, most importantly, emergent properties (behaviors unique to the entire system, and not found in any of its constituents). Systems biology approaches also share a concept known as iterative refinement, meaning that they cycle between perturbing a biological system, analyzing the data thus generated, and predicting how the system will respond to a new perturbation; these cycles learn relationships.
Applications of tissue structure and function principles: tissue engineering
Since the late 1980s engineering design principles have been applied to living systems to create replacement tissues de novo . In its most basic sense, an engineered tissue construct (ETC) is a three-dimensional assembly of one or more cell types suspended in an extracellular scaffold material and fed by soluble molecules, including growth factors, hormones, and nutrients . Once assembled, the ETCs are intended to be implanted as replacements for damaged or diseased tissues. The results thus far have been promising , and some enjoy widespread use in the clinic .
Stem cells are very popular sources for the cellular component in these ETCs because they have the ability to proliferate (thereby populating the ETC with a high density of cells) and undergo differentiation once they reach their desired location and cell density. While embryonic stem cells retain the ability to differentiate into all tissues found in the adult, adult stem cells isolated from a number of different organs retain a more limited differentiation capacity. In either case, the promise is the same: stem cells offer the potential to define and manipulate fundamental principles of cell and tissue behavior, which in turn will uncover a new set of therapeutic targets for correcting errors in cell and tissue function .
Due to our very limited understanding of the principles governing tissue structure and function, most current tissue engineering typically follows a trial-and-error approach to design . While persistence can pay off in the long run, the inefficiency of this approach remains one of the major barriers to widespread clinical application of ETCs . Some stem cells used in ETCs also have the capacity to form tumors in vivo. Until it is known how these cells decide which phenotype to adopt, this too presents a significant clinical challenge.
Human stem cells: attractive targets for systems biology analysis
Systems biology approaches have been employed to help uncover the mechanisms governing differentiation and function of tissue stem cells [17–25] Comparatively little is known about the molecular control of human mesenchymal stem cells (hMSC). Originally described by Friedenstein , hMSC are a popular choice for musculoskeletal tissue engineering. hMSC are multipotent, self-renewing cells that can be isolated from the adult bone marrow . They are capable of differentiating into at least three (osteogenic, chondrogenic, adipogenic) and perhaps as many as eight distinct lineages .
Due to its complex nature, unraveling stem cell differentiation requires a multi-stage approach. Reductionist studies have thus far identified a small number of potential regulators of this process [29–31], but fail to capture the global effects of these candidates on stem cell behavior. High-throughput, macro-scale studies (genomics, proteomics, etc.) are much better equipped to capture global changes in a complex system, and are the preferred choice for sampling dynamic changes in stem cell "state." But by themselves, these methods are typically not equipped to develop rigorous, testable hypotheses concerning the mechanisms governing this behavior .
The second stage of a systems biology approach to cell differentiation is to model the observed protein activity/gene expression changes as a function of the input stimuli. This is where so-called "traditional" biologists collaborate with experts in mathematics and computer science. Several modeling approaches have been applied. For example, Janes and Lauffenburger  illustrate how deterministic, probabilistic, and statistical learning models can be used to extract information about proteomic networks. The random nature of proteomic, genomic, and signaling data suggests that machine learning methods can also be used to model complex behavior such as protein-protein interactions [34, 35]. For a compherensive review on this topic, we refer the reader to [20, 36].
Application of tensor analysis to stem cell systems biology: multiway models
Faced with analyzing a wealth of data points as inputs, systems biologists turn to dimension reduction techniques using linear algebra. In linear algebra, a matrix is a two-way model used to describe linear relationships between the variables in two dimensions (e.g., rows and columns in a table). For example, in a gene expression experiment, the concentration of a chemical stimulant can serve as the row variables, and the resulting gene expression values can be the column variables. An entry in such a table would describe the gene expression level associated with a particular chemical stimulant concentration. However, it is now quite straightforward to generate data with three or more variables (also known as modes) (e.g., concentration of stimulant, protein phosphorylation level, gene expression level, duration of stimulant exposure, etc.) that cannot be represented by matrices.
Having generated an "n-dimensional cube" data structure to represent the data, we now turn to the problem of how to fit a multiway model to the data and analyze the multilinear relationships between the modes. Two common models in multiway data analysis are Tucker3 [40–42] and PARAFAC .
Both methods model the original data by assembling a substantially smaller dataset representing the larger original data. While PARAFAC has not been applied to tackle the question of cell phenotype changes over time under different stimuli, Omberg et al. did use a Tucker3 (N-mode SVD) approach to model the time course of global gene expression in response to cell cycle inhibitors in yeast .
Overview of multiway modeling and analysis techniques
A higher-order tensor is a multiway dataset represented as T ∈ RI × J ×....N, where M > 2. (Note that there are several notations used for representing multiway data sets such that both T and refer to the same multiway array and will be used interchangeably.) For example consider a 3-way tensor i.e., a data cube that has locus links as its rows, gene ontology categories as its columns, and experimental conditions as its tubes. Below we will discuss three main techniques for analyzing such a tensor.
where P, Q and R indicate the number of components extracted from first, second and third mode (P ≤ I, Q ≤ J and R ≤ K), respectively. A ∈ RI × P, B ∈ RJ × Q and C ∈ RK × R are the component matrices. G ∈ RP × Q × R is the core tensor and E ∈ RI × J × K represents the error term. Different constraints such as nonnegativity, unimodality or orthogonality can be enforced on the component matrices. A Tucker3 model with orthogonality constraints on component matrices is a generalization of SVD from matrices to high-order datasets and is also called Higher-Order Singular Value Decomposition (HOSVD)  or multilinear SVD.
Tucker3 is the most flexible model among multiway analysis techniques. Although it suffers from core rotations, which result in non-unique solutions, a Tucker3 model may be preferred over other multiway analysis methods such as PARAFAC because of its flexibility in extracting a different number of components in each mode.
The simplest three-way model in terms of interpretation is Parallel Factor Analysis (PARAFAC) by Harshman  or Canonical Decomposition (CANDECOMP) by Carroll & Chang . These two methods were originally proposed independently, but they employ the same model and are therefore considered completely equivalent.
The multiway modeling and analysis techniques presented above were applied to two systems biology problems: (i) discovering functional clusters of gene/protein expression during stem cell differentiation, and (ii) dynamics of hMSC osteogenic differentiation over time.
In this section we will apply multiway modelling and analysis techniques into two problems. First we will extend our bilinear work in  to 3-way modelling and analysis to investigate the impact of different stimulants on hMSC osteogenic differentiation. The main result here is the confirmation of multiple paths for reaching osteoblast. Second, we focus on the time domain and model how osteogenesis evolves as a function of time under tensile strain. Our main finding is that tensile strain accelerates the osteogenic differentiation.
Case study I
Impact of different stimulants on hMSC osteogenic differentiation
In Bennett et al. , we examined the protein expression profiles of four populations of hMSC stimulated to undergo osteogenic differentiation via either contact with pro-osteogenic extracellular matrix (ECM) proteins (collagen I, vitronectin, or laminin-5) or addition of osteogenic media supplements (OS media). Unstimulated hMSC and fully differentiated human osteoblasts (hOST) were included as the start and desired end points of this differentiation, respectively. Our goal was to identify key changes in protein (and hence, gene) expression as hMSC move from an undifferentiated state (represented by unstimulated hMSC) to an osteogenic phenotype (represented by hOST). To capture a large amount of protein expression data from each population, we used two-dimensional liquid chromatography tandem mass spectroscopy (2D LC-MS/MS) to detect the 1000 most abundantly expressed proteins in each population of cells. The experiment was repeated three times, yielding three sets of data for each of the cell populations. Our hypothesis was that as hMSC differentiate into the hOST phenotype, their protein/gene expression profile grows increasingly similar to that of hOST. To our surprise, we discovered that, while this appeared to be true, OS and ECM stimulants triggered a rise in different sets of genes found in hOST. These results suggest to us that neither ECM or OS yield a complete osteogenic phenotype when used alone, and that they enhance different types of osteogenic genes. In other words, hMSC exhibited at least two different patterns of protein/gene expression change during osteogenic differentiation. Our results in  were mainly obtained by using bilinear methods (although we briefly introduced multiway analysis using Tucker3 method on locus link mode only). Next we show in detail how to fit several multiway models to this multiarray data and analyze its structure.
Constructing a third order proteomics tensor
In  we identified 361 proteins (not expressed by the undifferentiated cells) that varied amongst the hMSC and hOST populations, and annotated the gene for each with its locus link number. We also mapped each gene in a gene ontology tree. The data are arranged as a tensor of type protein/gene locus link (LL) × category (Gene Ontology-GO) × stimulant, with dimensions (361 × 69 × 5), which we call the Proteomics Tensor (T). The stimulants were numbered as: collagen-I = sample 1, OS = sample 2, vitronectin = sample 3, hOST (positive control) = sample 4, hOST = sample 4, laminin-5 = sample 5.
We applied all three techniques reviewed above to model and analyze the T1. Tucker1 model had (69, 26, 4) rank reductions for each mode, respectively to explain almost 90% variance of the data. These numbers were used to choose the core component numbers for our Tucker3 model. We also deployed a 12-component PARAFAC model based on the core consistency analysis .
Locus link mode analysis
The top 12 singular values of the matrix corresponding to unfolding of tensor T in Locus Link mode indicates that corresponding 12 singular vectors jointly capture only 50% of the original data.
Tucker1: Top 12 Singular (cumulative) Values for Locus Link Mode Matricizing
We choose a model with a core cube that has dimensions 69 × 26 × 4, which explains 82% of the total variance in the data. Note that the choice of the component number combination 69 × 26 × 4 is made based on the rank reductions in the unfolded tensor T and
Core analysis of the Tucker3 model applied to tensor T shows the contribution of each element in the tensor T to the fitting of Tucker3 model to the data.
ANALYSIS OF 69 × 26 × 4 CORE ARRAY
Fraction of Variance
Summed Fraction of Var.
[1, 1, 1]
[2, 2, 1]
[3, 3, 1]
[4, 4, 1]
[7, 5, 1]
[6, 1, 2]
[5, 6, 1]
[12, 9, 1]
[3, 1, 3]
[8, 7, 1]
[5, 3, 1]
[6, 6, 1]
Core Consistency analysis of PARAFAC model for tensor T.
Visual inspection shows that similar to Tucker3 analysis, there is a concentration of a large number of locus link numbers and then there are a small number of outliers.
Clustering on locus link mode
We used a k-means clustering algorithm with 100 repeated runs to divide the locus links into two clusters (i.e., k = 2) to identify the outliers from the rest. In order to resolve disagreements between different runs, we computed a majority function by calculating the number of occurrences of a particular clustering and then picking the maximum number of occurrences.
Again, we first input the entire component matrix corresponding to the locus link mode to a k-means algorithm and observed that k-means algorithm on the locus link component matrix with dimensions (361 × 69) had considerable difficulty producing a stable clustering. The maximum occurrence value was close to 40, which is less than 10% of the runs. Thus we chose the top 2 vectors as the input of the k-means algorithm to produce a stable k-means output over 100 runs which constructed a smaller cluster with 30 locus links.
We input the entire component matrix corresponding to the locus link mode for a 12-factor PARAFAC model to cluster locus links. We observed that a k-means algorithm produced more stable clustering results on the locus link component matrix with dimensions (361 × 12) obtained from PARAFAC model. However for consistency we also computed the clustering by using only the top 2 vectors and obtained one small cluster with 63 locus links and one large cluster with the remaining locus links. Comparison of the clusters detected using Tucker1, Tucker3 and PARAFAC methods are given in Figure 9. Notice that while the clusters are not identical across all three methods, there is considerable agreement on the outliers.
Category mode analysis
The original proteomics data contained 71 gene ontology categories. We elected to drop the categories corresponding to Transmembrane Receptor Activity and Transmembrane Receptor Protein Tyrosine Kinase Signaling Protein because they contained very few genes. We have analyzed the data by using all three techniques considered in this work. Our analysis in Category mode also aims at identifying clusters of interest that may include categories that are nontrivially related.
Clustering of gene categories
k-means clustering results obtained from 24 singular vectors of the unfolded tensor in category mode.
BIOSYNTHESIS, PROTEIN METABOLISM
DNA BINDING, NUCLEOBASE-NUCLEOSIDE-NUCLEOTIDE AND NUCLEIC ACID METABOLISM, RNA BINDING
CALMODULIN BINDING, CELL GROWTH AND/OR MAINTENANCE, CELL MOTILITY, CYTOSKELETAL PROTEIN BINDING, ORGANOGENESIS, PURINE NUCLEOTIDE BINDING, SIGNAL TRANSDUCTION
The rest of the 69 catagories
k-means clustering results obtained from the 24 column vectors of the category component matrix of Tucker3 analysis.
BIOSYNTHESIS, CALCIUM ION BINDING, DNA BINDING, PROTEIN METABOLISM, PURINE NUCLEOTIDE BINDING, RNA BINDING
CELL GROWTH AND/OR MAINTENANCE
CYTOSKELETAL PROTEIN BINDING, SIGNAL TRANSDUCTION
The rest of the categories.
The Tucker3 analysis was less informative and the clear distinction between signal transduction and gene expression was lost. Also, the category of calmodulin binding was absent and the category of calcium ion binding, a less specific category, was selected. Other attractive categories in the Tucker1 analysis (signal transduction, purine nucleotide binding, DNA binding) also appeared here, suggesting they may play an especially prominent role during osteogenesis.
k-means clustering of category names for PARAFAC analysis.
BIOSYNTHESIS, DNA BINDING, NUCLEOBASE- NUCLEOSIDE-NUCLEOTIDE-AND-NUCLEIC-ACID-METABOLISM, PROTEIN METABOLISM, RNA BINDING
CELL GROWTH AND/OR MAINTENANCE, CELL MOTILITY, CYTOSKELETAL PROTEIN BINDING, ORGANOGENESIS, PHOSPHORUS METABOLISM, PURINE NUCLEOTIDE BINDING, SIGNAL TRANSDUCTION
IMMUNE RESPONSE, RESPONSE TO EXTERNAL STIMULUS, RESPONSE TO STRESS
The rest of the categories
Comparison to two-way analysis results
When compared to the categories identified by our previous two-way analysis [48, 49], all three tensor models generated more meaningful clusters of functionally related protein categories. Also, many of the categories selected by the tensor models were more specific than those we found previously (e.g., protein metabolism vs. amino acid and derivative metabolism). Curiously, the category of organogenesis, which is perhaps the most directly related to the osteogenic differentiation we are inducing, appeared only in the Tucker1 and PARAFAC models, and was clustered with signal transduction categories in each case. Many of the categories that appeared in the two-way analysis are concerned with activities shared by all cells (e.g., oxidoreductase activity, protein translation, oxygen metabolism).
Sample mode analysis
We unfolded the tensor T in the third mode (sample mode as shown in Figure 6), T(3) and then applied Singular Value Decomposition (SVD) on T(3) to capture the structure in sample mode. The largest 4 singular values out of 5 nonzero singular values belong to the singular vectors that explained almost 93% of the variance (1st: 53%; 2nd: 15%; 3rd:14.5%; 4th: 10.5).
We fit our Tucker3 multiway analysis model with a core tensor of dimensions 69 × 24 × 4 to the data tensor, T and analyzed the structure in the third mode (sample mode).
Clustering of sample mode
k-means results for all three techniques.
K-Means Cluster Assignment for Samples
In order to capture the data structure in sample mode, we applied a k-means clustering algorithm on the component matrix corresponding to the third mode. We observed (see Table 7) that class memberships are not the same as in SVD analysis on the unfolded tensor (PT(3)) discussed above.
We applied a k-means clustering algorithm to the sample mode component matrix on PARAFAC model which produced the same class assignment as our Tucker1 model, as shown in Table 7.
Clustering based on the PARAFAC decomposition yielded the least informative results, in that the three clusters on the plot formed a pattern lacking any clear, biological meaning.
Case study II
Collagen-induced hMSC stem cell osteogenesis over time
It is clear that cell differentiation is a carefully timed process. What is missing from many systems biology approaches is the element of time, and to add it requires slightly more rigorous analysis. In many cases where data are collected as a function of time, the time element is simply removed, e.g. by taking the maximum activation across the time periods, then using linear two-way causal analysis techniques.
We collected gene expression data (from microarray analysis) for hMSC induced to undergo osteogenic differentiation via two types of stimulus: (1) by simply placing them on a flexible collagen-I coated substrate (unstrained), or (2) by also applying cyclic tensile strain to these substrates. Both conditions were run for five days and triplicate samples collected at day 1, 2, 4 and 5. mRNA from three replicates of naïve hMSC grown on tissue culture plastic (TCP) and fully differentiated hOST were also collected to represent the starting point and desired end point, respectively. The resulting data were filtered as follows: only those genes associated with a locus link number were considered; of these, only those data points tagged as valid (P = present or M = marginal designations) across all 30 samples by the Genespring microarray analysis software were used; of these, only genes with statistically reliable replicates (t-test, 0.05 level of significance) were considered.
Our hypothesis was that application of strain "accelerates" the osteogenic differentiation induced by the collagen I substrate, which will be reflected by the earlier appearance of the osteogenesis-associated genes in the strained samples.
Constructing a third order time evolving hMSC tensor
We fit a Tucker3 multiway analysis model with core component numbers 6 × 6 × 3 to the data tensor T. The Tucker3 model decomposed the hMSC tensor into three component matrices, one for each mode and a core tensor with dimensions 6x 6x 3. The explained variance obtained by this model was 98.16%.
We analyzed the hMSC tensor with the PARAFAC technique as well and computed the core consistency of our PARAFAC model. Our consistency analysis identified a 2-component model which explained more than 91% of the variance and had 99.71% core consistency (the 3-component model had 37.11% core consistency, which is well below the rule of thumb 90% requirement).
Locus link analysis
Our objective was to identify the outliers in the locus link mode and examine them in order to learn which genes are potentially important for the cell differentiation process.
For both Tucker3 and PARAFAC we computed the 98% concentration of the locus link numbers projected onto lower dimensions to capture the outliers in the remaining 2 percentile. Finally we took the intersection of the 2 percentile set of Tucker3 and PARAFAC.
Results and discussion
The methods illustrated here provide a means for translating large data sets that capture global gene and protein expression changes during hMSC differentiation into simplified models. The performance of the Tucker1, 3, and PARAFAC models sometimes differed considerably. For example, in our proteomics data set (case study I), Tucker3 appeared to perform best in locus link mode. K-means algorithm identified two clusters (Figure 7), one of which (the outliers) included genes that participate in signal transduction, especially calcium/calmodulin-associated proteins (e.g., calmodulin-dependent protein kinase α, β, and γ); and control of transcription and translation (e.g., STAT1, SYNCRIP). We feel these genes may be of special interest because they fall outside the majority of the "common" genes that can be reduced by the model, and thus may contribute uniquely to the distinct protein profiles in this data set. Tucker1 and PARAFAC, by comparison, were comparatively poor at identifying meaningful clusters. Nevertheless, the outliers identified by each method shared one common feature: they all included a set of six genes, five of which participate in signal transduction pathways (see Figure 10). One of these, calmodulin-dependent protein kinase II delta, was previously identified by SVD analysis as a candidate "osteogenic" gene, in that its expression is found in hMSC populations most closely resembling osteoblasts. This bolsters our belief that calcium/calmodulin signaling in especially important during hMSC osteogenic differentiation.
When one considers the set of proteins shared by at least two of the three methods, this pattern becomes even clearer: additional isoforms of calmodulin-dependent protein kinase II and other signaling proteins (caldesmon, PDLIM7, RhoA, Rho C, and protein phosphatase 2) emphasize the importance of integrin-associated signaling pathways during ECM-induced differentiation. These interpretations are consistent with those of others who have applied similar techniques to other stem cell data sets (refs: UID# 15257023, 17541472, 17625253). These models also included a number of muscle-associated proteins (tropomyosin, two myosin isoforms, CAPZB) suggesting that bone and muscle differentiation may be closely related. This also agrees with our previous analysis of osteogenic gene focusing in response to tensile strain, wherein we observed a drop in expression of marker genes for many different lineages (nerve, fat, cartilage), but observed no drop in smooth muscle cell markers .
In category mode, PARAFAC yielded the most interesting clustering results for the proteomics data set, in that it identified clusters of functionally related genes that contribute the most to the model. Furthermore, many of these genes afford a plausible biological explanation for how hMSC undergo differentiation. It selected the greatest number of categories, yet organized them into three clearly distinct clusters. P cluster 1 contained categories primarily concerned with nucleotide binding and metabolism, and resembled the gene expression cluster (T1 cluster 2) in the Tucker1 analysis. P cluster 2 closely resembled the signal transduction cluster (T1 cluster 3) in the Tucker1 model, and added an additional category, phosphorus metabolism. The third cluster contained categories not found in the other two models, that centered on the theme of extracellular matrix protein synthesis and modification. We previously identified these categories as significant during hMSC differentiation .
Tucker1 and Tucker3 identified smaller sets of outliers. The first cluster in the Tucker1 model (T1 cluster 1) contained two categories primarily associated with cell survival, and therefore sheds little light on the potential mechanisms underlying hMSC differentiation. However, T1 cluster 2 and T1 cluster 3 contained categories concerned with control of gene expression and signal transduction, respectively. Given the tight association between these activities and their clear association with cellular differentiation, selection of these categories may help identify the potential mechanisms used by hMSC during osteogenic differentiation. In particular, the signal transduction cluster (T1 cluster 3) contained categories concerned with traditional signaling pathways known to control differentiation. For example, calmodulin and calmodulin-dependent protein kinase II stimulate osteogenic differentiation of hMSC while promoting cell migration and suppressing cell growth ; all of these activities are contained in the signal transduction cluster. G proteins, which correspond to the purine nucleotide binding category, are well-known to play an important role in osteogenic differentiation [reviewed in ]. Again, these results agree with our previous analysis, which identified calcium-dependent signaling as an important factor in osteogenesis .
The plot of sample mode data from Tucker3 (Figure 13) is quite informative. The wide separation of the NoStimulant and Osteoblast samples allows us to interpret the space between them as a form of "differentiation axis," and illustrates two important themes. First, we observe that populations of hMSC grown On_Vitronectin or On_Collagen lie midway between the unstimulated hMSC and osteoblasts, demonstrating the partial differentiation induced by these stimulants. Second, the observation that hMSC cultured IN_OSMedium lie beyond the intended target (Osteoblasts) suggests that OS may "over-stimulate" these cells. OS medium contains dexamtheazone, a synthetic form corticosteroid, and this population of cells expressed a distinct set of genes/proteins devoted to steroid metabolism. Both ECM and OS stimulants yield cells that resemble osteoblasts, yet they induce the expression of quite different genes. It is quite possible that the typical OS exposure regimen drives steroid metabolism genes beyond the level necessary for osteogenesis. It is also possible that a combination of the genes expressed in ECM stimulated cells and genes expressed in OS stimulated cells would yield a phenotype closer to true osteoblasts than either set of genes alone. Curiously, the same type of plot for the PARAFAC data (Figure 14) offered no clear biologically meaningful relationship between the samples.
We intersected the 98% outliers obtained by both Tucker3 and PARAFAC analysis to compose a list of interesting genes
INTERESTING GENES for the DIFFERENTIATION PROCESS:
AMD1 B2M CANX SERPINH1 CHI3L1 COL1A1 COL3A1 COL6A3 COL8A1 COL12A1 COMT CTGF CTSB CTSD CTSK CYP1B1 AKR1C1 ENO1 FHL2 GARS HMOX1 IGFBP6 IMPDH2 LOX LOXL1 LRPAP1 LUM MX1 SERPINE2 HTRA1 RPL27A RPS15A S100A4 SPARC TGFBI TMSB4X UBA1 IFITM1 EIF3D EIF2S2 CCPG1 ISG15 SERF2 BASP1 IFITM3 FST POSTN MAPKBP1 KIF1B FBXL2 C6orf48 TMEM66 CCDC91 TRERF1 C15orf24 IKZF5 BAIAP2L2 DCUN1D5 TUBA1C CTHRC1
Finally, the graph of the samples in Figure 17, when viewed as an axis of similarity between undifferentiated hMSC grown on tissue culture plastic (TCP) and hOST is entirely consistent with our hypothesis, and strongly suggests that hMSC transdifferentiate towards the osteoblast phenotype under these conditions. Furthermore, at each time point tested (days 2, 4, and 5), the strained population always lies closer to the osteoblasts than the unstrained population. This is consistent with our previous finding that application of tensile strain accelerates the osteogenic differentiation of hMSC .
Application of tensor analysis to complex data sets such as those generated in studies of human stem cell differentiation is a powerful method for uncovering important patterns in the data. In particular, we have applied three different analysis methods to two different data sets extracted from hMSC, to yield models that present the data in simplified forms. The first data set was the same one used in  while the second one is entirely new.
A cross comparison of the tensor modeling and analysis techniques indicated that the second data set can be modeled and interpreted much better (i.e., by using fewer components, capturing a higher percentage of the variance in the data, and much better consistency in the convergence and fitting). These models also identify candidate genes/proteins as being especially important because they contribute a great deal to explaining the variation between our treatment conditions. It is important that multiple modeling approaches consistently identified a small set of genes that play a large role in differentiating between stem cell populations; these genes thus serve as candidates for hypothesis-driven research aimed at uncovering the molecular mechanisms governing phenotypic changes in stem cells.
While traditional two-way analysis tools are powerful instruments to find relationships in two-way data, the application of tensors allowed us to capture more information than two-dimensional techniques and thus provided a more robust analysis of hMSC differentiation. We feel that our tensor approach has a wide range of possible applications in complex problems in systems biology.
1We have used the Matlab with PLSToolbox in our modeling and analysis .
List of abbreviations
engineered tissue construct
Higher-Order Singular Value Decomposition
human mesenchymal stem cells
Parallel Factor Analysis
singular value decomposition
- 2D LC-MS/MS:
two-dimensional liquid chromatography tandem mass spectroscopy
The authors wish to thank the National Institute of Biomedical Imaging & Bioengineering (National Institutes of Health), who supported this effort through grant #1 R01 RB002197 (to GEP). We would like to thank the anonymous referees for their very prompt, thorough, and constructive reviews.
- McPherson JD, Marra M, Hillier L, Waterston RH, Chinwalla A, Wallis J, Sekhon M, Wylie K, Mardis ER, Wilson RK, et al.: A physical map of the human genome. Nature. 2001, 409: 934-941.View ArticlePubMedGoogle Scholar
- Kaminski N: Bioinformatics. A user's perspective. Am J Respir Cell Mol Biol. 2000, 23: 705-711.View ArticlePubMedGoogle Scholar
- Vyrostek SB, Annest JL, Ryan GW: Surveillance for fatal and nonfatal injuries–United States, 2001. MMWR Surveill Summ. 2004, 53: 1-57.PubMedGoogle Scholar
- Jessup RL: What is the best method for assessing the rate of wound healing? A comparison of 3 mathematical formulas. Adv Skin Wound Care. 2006, 19: 138-147.View ArticlePubMedGoogle Scholar
- Wang HL, Boyapati L: "PASS" principles for predictable bone regeneration. Implant Dent. 2006, 15: 8-17.View ArticlePubMedGoogle Scholar
- Chaby G, Senet P, Vaneau M, Martel P, Guillaume JC, Meaume S, Teot L, Debure C, Dompmartin A, Bachelet H, et al.: Dressings for acute and chronic wounds: a systematic review. Arch Dermatol. 2007, 143: 1297-1304.View ArticlePubMedGoogle Scholar
- Liu ET: Systems biology, integrative biology, predictive biology. Cell. 2005, 121: 505-506.View ArticlePubMedGoogle Scholar
- Aderem A: Systems biology: its practice and challenges. Cell. 2005, 121: 511-513.View ArticlePubMedGoogle Scholar
- Levenberg S, Langer R: Advances in tissue engineering. Curr Top Dev Biol. 2004, 61: 113-134.View ArticlePubMedGoogle Scholar
- Mistry AS, Mikos AG: Tissue engineering strategies for bone regeneration. Adv Biochem Eng Biotechnol. 2005, 94: 1-22.PubMedGoogle Scholar
- Cortesini R: Progress in tissue engineering and organogenesis in transplantation medicine. Exp Clin Transplant. 2003, 1: 102-111.PubMedGoogle Scholar
- Shieh SJ, Vacanti JP: State-of-the-art tissue engineering: from tissue engineering to organ building. Surgery. 2005, 137: 1-7.View ArticlePubMedGoogle Scholar
- Le Blanc K, Pittenger M: Mesenchymal stem cells: progress toward promise. Cytotherapy. 2005, 7: 36-45.View ArticlePubMedGoogle Scholar
- Hacker MC, Mikos AG: Trends in tissue engineering research. Tissue Eng. 2006, 12: 2049-2057.View ArticlePubMedGoogle Scholar
- Ilic D: Dusko Ilic() explores the latest developments in the field of stem cell research and regenerative medicine. Regen Med. 2007, 2: 237-242.View ArticleGoogle Scholar
- Richkind KE, Romansky SG, Finklestein JZ: t(4;19)(q35;q13.1): a recurrent change in primitive mesenchymal tumors?. Cancer Genet Cytogenet. 1996, 87: 71-74.View ArticlePubMedGoogle Scholar
- Chen C, Fuhrken PG, Huang LT, Apostolidis P, Wang M, Paredes CJ, Miller WM, Papoutsakis ET: A systems-biology analysis of isogenic megakaryocytic and granulocytic cultures identifies new molecular components of megakaryocytic apoptosis. BMC Genomics. 2007, 8: 384-PubMed CentralView ArticlePubMedGoogle Scholar
- Hurlbut WB: Altered nuclear transfer: a way forward for embryonic stem cell research. Stem Cell Rev. 2005, 1: 293-300.View ArticlePubMedGoogle Scholar
- Jensen J: Pathway decision-making strategies for generating pancreatic beta-cells: systems biology or hit and miss?. Curr Opin Endocrinol Diabetes Obes. 2007, 14: 277-282.View ArticlePubMedGoogle Scholar
- Loeffler M, Roeder I: Conceptual models to understand tissue stem cell organization. Curr Opin Hematol. 2004, 11: 81-87.View ArticlePubMedGoogle Scholar
- Price ND, Foltz G, Madan A, Hood L, Tian Q: Systems Biology and Cancer Stem Cells. J Cell Mol Med. 2007Google Scholar
- Puente LG, Borris DJ, Carriere JF, Kelly JF, Megeney LA: Identification of candidate regulators of embryonic stem cell differentiation by comparative phosphoprotein affinity profiling. Mol Cell Proteomics. 2006, 5: 57-67.View ArticlePubMedGoogle Scholar
- Roeder I, Lorenz R: Asymmetry of stem cell fate and the potential impact of the niche: observations, simulations, and interpretations. Stem Cell Rev. 2006, 2: 171-180.View ArticlePubMedGoogle Scholar
- Roeder I, Braesel K, Lorenz R, Loeffler M: Stem cell fate analysis revisited: interpretation of individual clone dynamics in the light of a new paradigm of stem cell organization. J Biomed Biotechnol. 2007, 2007: 84656-PubMed CentralView ArticlePubMedGoogle Scholar
- Sun T, McMinn P, Coakley S, Holcombe M, Smallwood R, Macneil S: An integrated systems biology approach to understanding the rules of keratinocyte colony formation. J R Soc Interface. 2007, 4: 1077-1092.PubMed CentralView ArticlePubMedGoogle Scholar
- Friedenstein AJ: Precursor cells of mechanocytes. Int Rev Cytol. 1976, 47: 327-359.View ArticlePubMedGoogle Scholar
- Jaiswal N, Haynesworth SE, Caplan AI, Bruder SP: Osteogenic differentiation of purified, culture-expanded human mesenchymal stem cells in vitro. J Cell Biochem. 1997, 64: 295-312.View ArticlePubMedGoogle Scholar
- Song L, Webb NE, Song Y, Tuan RS: Identification and functional analysis of candidate genes regulating mesenchymal stem cell self-renewal and multipotency. Stem Cells. 2006, 24: 1707-1718.View ArticlePubMedGoogle Scholar
- Klees RF, Salasznyk RM, Boskey A, Plopper GE: Activation of FAK is necessary for the osteogenic differentiation of human mesenchymal stem cells on laminin-5. J Cell Biochem. 2007, 100: 499-514.View ArticlePubMedGoogle Scholar
- Salasznyk RM, Klees RF, Hughlock MK, Plopper GE: ERK signaling pathways regulate the osteogenic differentiation of human mesenchymal stem cells on collagen I and vitronectin. Cell Commun Adhes. 2004, 11: 137-153.View ArticlePubMedGoogle Scholar
- Ward DF, Salasznyk RM, Klees RF, Backiel J, Boskey A, Plopper GE: Mechanical strain enhances ECM induced cell fate determination and promotes osteogenic differentiation of human mesenchymal stem cells through the ERK MAPK pathway. Stem Cells Dev. 2007, 16: 467-480.View ArticlePubMedGoogle Scholar
- Salasznyk RM, Klees RF, Vandenberg S, Bennett K, Plopper GE: Gene focusing as a basis for controlling stem cell differentiation. Stem Cells Dev. 2005, 14: 608-620.View ArticlePubMedGoogle Scholar
- Janes KA, Lauffenburger DA: A biological approach to computational models of proteomic networks. Curr Opin Chem Biol. 2006, 10: 73-80.View ArticlePubMedGoogle Scholar
- Sachs K, Perez O, Pe'er D, Lauffenburger DA, Nolan GP: Causal protein-signaling networks derived from multiparameter single-cell data. Science. 2005, 308: 523-529.View ArticlePubMedGoogle Scholar
- Woolf PJ, Prudhomme W, Daheron L, Daley GQ, Lauffenburger DA: Bayesian analysis of signaling networks governing embryonic stem cell fate decisions. Bioinformatics. 2005, 21: 741-753.View ArticlePubMedGoogle Scholar
- Viswanathan S, Zandstra P: Towards predictive models of stem cell fate. Cytotechnology. 2003, 41: 75-92.PubMed CentralView ArticlePubMedGoogle Scholar
- Alter O, Golub GH: Singular value decomposition of genome-scale mRNA lengths distribution reveals asymmetry in RNA gel electrophoresis band broadening. Proc Natl Acad Sci USA. 2006, 103: 11828-11833.PubMed CentralView ArticlePubMedGoogle Scholar
- Golub GH, Van Loan CF: Matrix computations. 1989, Baltimore: Johns Hopkins University PressGoogle Scholar
- Bader BW, Kolda TG: Algorithm 862: MATLAB tensor classes for fast algorithm typing. ACM Transactions on Mathematical Software. 2006, 32: 635-653.View ArticleGoogle Scholar
- Tucker LR: Some mathematical notes on three-mode factor analysis. Psychometrika. 1966, 31: 279-311.View ArticlePubMedGoogle Scholar
- Tucker LR: Implications of factor analysis to three-way matrices of measurement of change. Problems In Measuring Change. 1963, 122-137. Madison: The University of Wisconsin PressGoogle Scholar
- Tucker LR: The extension of factor analysis to three-dimensional matrices. Contributions to Mathematical Psychology. 1964, 110-182. New York: Holt, Reinhart, and WinstonGoogle Scholar
- Harshman RA: Foundations of the PARAFAC procedure: Models and conditions for an explanatory multi-modal factor analysis. UCLA working papers in phonetics. 1970, 16: 1-84.Google Scholar
- Omberg L, Golub GH, Alter O: A tensor higher-order singular value decomposition for integrative analysis of DNA microarray data from different studies. Proc Natl Acad Sci USA. 2007, 104: 18371-18376.PubMed CentralView ArticlePubMedGoogle Scholar
- Kroonenberg PM, de Leeuw J: Principal component analyses of three-mode data by means of alternating least squares algorithms. Pychometrica. 1980, 45: 69-97.View ArticleGoogle Scholar
- De Lathauwer L, De Moor B, Vandewalle J: A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications. 2000, 21: 1253-1278.View ArticleGoogle Scholar
- Carroll J, Chang J: Analysis of individual differences in multidimensional scaling via an N-way generalization of Eckart-Young decomposition. Psychometrika. 1970, 35: 283-319.View ArticleGoogle Scholar
- Bennett KP, Bergeron C, Acar E, Klees RF, Vandenberg SL, Yener B, Plopper GE: Proteomics reveals multiple routes to the osteogenic phenotype in mesenchymal stem cells. BMC Genomics. 2007, 8: 380-PubMed CentralView ArticlePubMedGoogle Scholar
- Eigenvector Research, PLS Toolbox. 2008
- Bro R, Kiers HAL: A new efficient method for determining the number of components in PARAFAC models. J Chemometrics. 2003, 17: 274-286.View ArticleGoogle Scholar
- Salasznyk RM, Westcott AM, Klees RF, Ward DF, Xiang Z, Vandenberg S, Bennett K, Plopper GE: Comparing the protein expression profiles of human mesenchymal stem cells and human osteoblasts using gene ontologies. Stem Cells Dev. 2005, 14: 354-366.View ArticlePubMedGoogle Scholar
- Shin MK, Kim MK, Bae YS, Jo I, Lee SJ, Chung CP, Park YJ, Min dS: A novel collagen-binding peptide promotes osteogenic differentiation via Ca(2+)/calmodulin-dependent protein kinase II/ERK/AP-1 signaling pathway in human bone marrow-derived mesenchymal stem cells. Cell Signal. 2008, 20: 613-624.View ArticlePubMedGoogle Scholar
- Schindeler A, Little DG: Ras-MAPK signaling in osteogenic differentiation: friend or foe?. J Bone Miner Res. 2006, 21: 1331-1338.View ArticlePubMedGoogle Scholar
- Gaudet S, Janes KA, Albeck JG, Pace EA, Lauffenburger DA, Sorger PK: A compendium of signals and responses triggered by prodeath and prosurvival cytokines. Mol Cell Proteomics. 2005, 4: 1569-1590.View ArticlePubMedGoogle Scholar