The Cell Collective is a web-based platform (accessible at http://www.thecellcollective.org) in which laboratory scientists can collaboratively build mathematical models of biological processes by utilizing existing laboratory data, and subsequently simulate the models to further guide their laboratory experiments. Conceptually, the platform can be broken up into three parts (Figure 2) that form the basis for the core functionality of the software: 1) an integrated Knowledge Base that collects protein dynamics generated from laboratory research in a single repository, 2) integration of this knowledge into a mathematical representation that allows visualization of the dynamics of the data (i.e., puts it in motion via simulations), and 3) simulations and analyses of the model dynamics. As can also be seen in the figure, these three parts form a loop that is closed by laboratory experimentation. The first model in The Cell Collective (available for all users to simulate and build upon) is one of the largest models of intracellular signal transduction [32]. Features available in the current version of The Cell Collective are described in more detail in the following sections.
Knowledge Base of interaction dynamics
When laboratory scientists produce new results, for example regarding the role of one protein interacting with another protein, these results are usually published along with thousands of other results generated by the scientific community. The publication of individual results in isolation means that separate findings are not necessarily absorbed, verified, analyzed, and integrated into the existing knowledge. With the invention of various high-throughput technologies, the gap between the amount of knowledge produced and the ability of the scientific community to fully utilize this knowledge has grown [36].
The first major component of The Cell Collective (as highlighted in Figure 2) is a Knowledge Base which enables laboratory scientists to contribute to the integration of knowledge about individual biological processes at the most local level, which includes, for example, the identification of direct protein-protein interactions. However, the goal of The Cell Collective is not to duplicate other well-established resources by providing extensive parts lists that make up various biological processes and cells. Instead, the aim of the platform is to extend static knowledge and data into dynamical models; hence the information provided in the Knowledge Base needs to be dynamical in nature. This means that the information (which is purely qualitative; see the Methods section) contained in The Cell Collective Knowledge Base takes into account the dynamical relationship between all of the interacting partners. For example, assume that there are two positive regulators (X and Y) of a hypothetical species Z. While information about these species and interactions would be sufficient in the context of a parts list, abstracting the biological process to a dynamical model requires knowing the dynamical relationship between the interacting partners. For instance, are both X and Y necessary for the activation, or is either one of them sufficient to activate Z? This is the type of information that is used to construct dynamical models in The Cell Collective.
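The difference between the two possibilities for Z can be made concrete with a small sketch. The function names below are hypothetical and are not part of The Cell Collective; the code only illustrates that the same parts list (X and Y regulate Z) is compatible with two different logical rules.

```python
from itertools import product


def z_requires_both(x: bool, y: bool) -> bool:
    """Z is active only when both positive regulators X and Y are active."""
    return x and y


def z_requires_either(x: bool, y: bool) -> bool:
    """Z is active when at least one of X and Y is active."""
    return x or y


def truth_table(rule):
    """Evaluate a two-input rule on every combination of X and Y."""
    return {(x, y): rule(x, y) for x, y in product([False, True], repeat=2)}


both = truth_table(z_requires_both)
either = truth_table(z_requires_either)

# The two rules disagree exactly when one regulator is active without the other.
disagreements = [inputs for inputs in both if both[inputs] != either[inputs]]
```

The two tables differ only for the inputs in which exactly one regulator is active, which is precisely the information a parts list cannot express.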
Based on a widely known wiki-like concept, the Knowledge Base module of the platform was developed to allow laboratory scientists to collaboratively contribute their knowledge to the complete regulatory mechanisms of individual biological species. Because all of the regulatory information forms the basis of the modeled biological/biochemical process, and hence has to be correct for the model to exhibit behaviors similar to those seen in the laboratory, this process of aggregating all known information about a species into one place can also serve as a mechanism to identify possible contradictions or holes in the current knowledge about the regulatory mechanism of a particular species. Using the previous hypothetical example, assume that laboratory scientist A discovers that proteins X and Y are both necessary to activate species Z, but scientist B’s laboratory results suggest that either protein X or Y is sufficient to activate Z (Figure 3). The process of integrating all known information on species Z becomes crucial in discovering such discrepancies (or additional missing information), which might not have been found otherwise. Because the goal of The Cell Collective is also to integrate this information into dynamical models, simulations of the large-scale model (which might have hundreds or thousands of additional components in it) can suggest whose data are more likely to be correct. Assume that when scientist A’s information is added to the model, the model exhibits phenomena similar to the ones seen in the laboratory, whereas when the model is built with the data from scientist B’s experiments, the simulation dynamics of the overall model fail to resemble the known actions of the real system. In such a case, new laboratory experiments would be warranted, with the potential to produce more insights into the regulatory mechanism of protein Z (Figure 3).
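The idea of checking competing findings against known system behavior can be illustrated with a toy sketch. In the platform this comparison happens at the level of the whole simulated model; the single-species version below, including the `observations` data and all names, is purely hypothetical.

```python
def mismatches(rule, observations):
    """Return the observations that a candidate rule fails to reproduce."""
    return [
        (inputs, observed)
        for inputs, observed in observations.items()
        if rule(*inputs) != observed
    ]


def rule_a(x, y):
    """Scientist A's finding: both X and Y are necessary to activate Z."""
    return x and y


def rule_b(x, y):
    """Scientist B's finding: either X or Y is sufficient to activate Z."""
    return x or y


# Hypothetical known behavior of the real system: (X, Y) -> observed state of Z.
observations = {
    (True, True): True,
    (True, False): False,
    (False, False): False,
}

conflicts_a = mismatches(rule_a, observations)
conflicts_b = mismatches(rule_b, observations)
```

With these invented observations, rule A reproduces every entry while rule B conflicts with one of them, which would flag the X/Y relationship as a candidate for further laboratory work.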
The sea of biological information has made it difficult for the data to be verified on such an integrated basis. We fully understand how some of the most complex biological systems work only when the experimental data is re-integrated into and seen in the context of the entire system; a platform for integration of data is exactly what The Cell Collective provides.
Dynamical information
Each species in The Cell Collective’s Knowledge Base has a dedicated page where laboratory scientists can directly deposit their knowledge regarding the species’ regulatory mechanisms. While the wiki-like format of the Knowledge Base gives users the ability to input their data in a free form that can also be discussed interactively, each page is structured to help users organize and review their data more efficiently. Because the wiki format is an easy medium for collecting knowledge from a large number of individuals, a number of scientific efforts have successfully adopted a variation of this technology (e.g., [12][32][33][34]).
First, the Regulation Mechanism Summary section describes the general mechanism of the activation/deactivation of the species. This section, found at the top of the page of a given species, is the most important from a systems perspective, as the information therein takes into account the role of all immediate upstream regulators (see below).
The Upstream Regulators section contains the list of key players that have a role in the regulation of the species, as well as any evidence (as found in the laboratory) supporting those roles. Using the earlier example involving the regulatory mechanism of species Z, this section would include proteins X and Y as upstream regulators, and the findings of laboratory scientists A and B suggesting the role of these regulators in the activation of the species (Figure 4). On the other hand, the Regulation Mechanism Summary section (discussed above) would contain the overall dynamical information as to how Z is regulated in the context of both X and Y (i.e., are both regulators required for the activation, or only one of them?).
The Model-specific Information section allows users to enter cell type-specific information, because a number of molecular species can be regulated differently depending on the type of cell. For example, an intracellular species can be regulated either by different players, or by the same players but with different dynamical relationships in, say, a T cell and a mammary epithelial cell. This section enables users to differentiate between the regulatory mechanisms of the species in the two (or more) different types of cells (i.e., models). Hence, users can utilize this section to define the upstream regulators and the regulation mechanism summary that are specific to their different models. For example, the regulation mechanism summary of species Z in scientist A’s model would describe the finding that both upstream regulators of Z are necessary for its activation, whereas scientist B’s regulation mechanism summary on the wiki page for Z would indicate that either one of the upstream regulators can activate Z (Figure 4).
Finally, References is a section that users can use to record any published works that support information entered in any of the above sections. Users can enter a reference by simply entering the PubMed ID (PMID) of the article of interest, and The Cell Collective will automatically import all of the bibliographical information about the work.
As a starting point, we have deposited all biological knowledge describing one of the largest dynamical models of signal transduction built and published as part of our previous research [32]. This model consists of around 400 biochemical interactions between 130 species, comprising a number of main signaling pathways such as the Epidermal Growth Factor, Integrin, and G-Protein Coupled Receptor pathways. The dynamical information about the hundreds of local interactions, collected manually from published biochemical literature, is available in the Knowledge Base module. Expert scientists in the field may begin contributing to it, as well as discovering discrepancies and gaps in the biological knowledge that might have been included in the model.
Once the dynamical information about the individual interactions is added in the platform Knowledge Base, the next step is to convert this knowledge into a dynamical model; a discussion on where this piece fits into the overall concept of The Cell Collective follows in the next section.
Building computational models
While the Knowledge Base component of The Cell Collective serves as the knowledge aggregator for the dynamical regulatory mechanisms of individual biological species, the next step (#2 in Figure 2) is to convert this knowledge into a dynamical computational model that can be simulated and analyzed on the computer.
Perhaps one of the biggest challenges in transforming biological knowledge into a computational model is the conceptual gap between the mathematical and biological sciences. Thus far, the creation of mathematical models has been limited to scientists who are well versed in computer science and mathematics. To address this issue, we have developed Bio-Logic Builder (manuscript submitted), a component of The Cell Collective, which allows laboratory scientists to build computational models based purely on the logic of the species’ regulatory mechanisms as discovered in the laboratory.
The step of transforming biological knowledge into its model representation is aided by the information provided in the Knowledge Base component of the software platform (Figure 4). Specifically, as discussed above, the information recorded for the corresponding local interactions by individual scientists amounts to the overall regulation mechanism, which represents the blueprint of each species’ bio-logic. While the local interactions (concerning a hypothetical protein Z in Figure 4) are discovered in the laboratory by individual scientists (for example scientists A and B as shown in the figure), the species’ overall regulation mechanism should take into account all of the local knowledge (and hence should be determined in a collaborative fashion). Bio-Logic Builder was developed in such a way that all that is necessary to construct the computational representation of the regulatory mechanism of each species is the same qualitative data provided in the Knowledge Base component. Scientists define each species’ bio-logic in a modular fashion by simply defining activators and inhibitors (i.e., upstream regulators) of the species of interest, as well as the logical relationship between the upstream regulators (e.g., whether or not a set of activators is required for activation, as discussed in an example above). Because models in The Cell Collective utilize a qualitative, rule-based mathematical framework, no kinetic parameters are necessary to construct the models. (A quick tutorial on how to use the Bio-Logic Builder to construct models is available at http://www.thecellcollective.org)
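As a rough illustration of defining bio-logic from activators, inhibitors, and the logical relationship between them, consider the sketch below. It is not the Bio-Logic Builder implementation: the function `make_rule`, its parameters, and in particular the assumption that any active inhibitor overrides activation are hypothetical choices made for this example.

```python
def make_rule(activators, inhibitors=(), require_all_activators=False):
    """Build a qualitative rule for one species from its upstream regulators.

    Assumption (for illustration only): any active inhibitor dominates,
    so the species is ON only if it is activated and not inhibited.
    """
    combine = all if require_all_activators else any

    def rule(state):
        activated = combine(state[name] for name in activators)
        inhibited = any(state[name] for name in inhibitors)
        return activated and not inhibited

    return rule


# Hypothetical species Z with activators X and Y and a made-up inhibitor I.
z_rule = make_rule(["X", "Y"], inhibitors=["I"], require_all_activators=True)

z_on = z_rule({"X": True, "Y": True, "I": False})
z_inhibited = z_rule({"X": True, "Y": True, "I": True})
z_missing_activator = z_rule({"X": True, "Y": False, "I": False})
```

Note that only qualitative information enters the rule: which regulators exist, whether they activate or inhibit, and how they combine. No kinetic parameters appear anywhere.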
Once the bio-logic is defined for all species in a given model, in silico simulations and analyses can be conducted (step #3 in Figure 2). How this can be done with The Cell Collective is the focus of the next section.
Simulations and analyses of model dynamics
The idea behind abstracting biological processes as computational models is to be able to visualize the dynamics of these processes on the computer, and to conduct in silico experiments that can provide i) new insights into laboratory experiments and ii) additional basis for theoretical computational research to further elucidate the complexity governing these biological processes. With its simulation and analysis component, The Cell Collective has been designed to provide exactly these features. Specifically, in the current version of the platform, two tools for simulations and analyses (discussed below) are available.
Real-time simulations
Perhaps the most novel contribution of the platform to computational modeling is its real-time simulation feature, which allows users to visualize the dynamics of any model interactively and in real time. Similar to the rest of the platform, the simulation features have been designed with simplicity and intuitiveness in mind.
All modeled biological/biochemical processes in The Cell Collective, represented by species that make up the internal machinery of the cell, are simulated in external environments which drive the dynamics of the system. In our example of signal transduction, this environment is represented by external species corresponding to various extracellular signals such as growth hormones, stress, etc. Using a simple slider, users can change the amount of each extracellular signal (measured in %ON on a scale of 0 to 100; see the Methods section for more detail) and visualize the effects of the changes on the dynamics of the cell while the simulation is running. Similarly, users can introduce biological mutations to simulate loss-of-function and gain-of-function experiments while watching the dynamics of the cell change as a result of the mutations. For users’ convenience, real-time simulations can also be paused and resumed at any time. Figure 5 shows a screenshot of the real-time simulation tool. A short video demonstration of real-time simulations using the previously mentioned large-scale model of signal transduction is also available as Additional file 1.
Additional file 1: Real time simulation example. Video example of a real time simulation of a large-scale model of intracellular signal transduction. (MOV 10 MB)
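The following is a minimal sketch of how a simulation driven by external signals and mutations could work. It assumes a synchronous Boolean update and assumes that an external species set to p %ON is ON with probability p/100 at each step; the platform's actual definition of %ON is given in the Methods section and may differ. The function `simulate`, the toy rule, and the species names are hypothetical.

```python
import random


def simulate(rules, external_levels, steps=1000, mutations=None, seed=0):
    """Return the percentage of steps each internal species spent ON.

    rules: {species: function(state) -> bool} for internal species.
    external_levels: {species: %ON between 0 and 100} for external species.
    mutations: {species: fixed bool}; True mimics gain of function,
    False mimics loss of function.
    """
    rng = random.Random(seed)
    mutations = mutations or {}
    state = {name: False for name in list(rules) + list(external_levels)}
    on_counts = {name: 0 for name in rules}

    for _ in range(steps):
        # Assumption: external species are resampled independently each step.
        for name, level in external_levels.items():
            state[name] = rng.random() * 100 < level
        # Synchronous update: every rule reads the same current state.
        next_state = dict(state)
        for name, rule in rules.items():
            next_state[name] = mutations[name] if name in mutations else rule(state)
        state = next_state
        for name in rules:
            on_counts[name] += state[name]

    return {name: 100.0 * count / steps for name, count in on_counts.items()}


# Toy model: internal species Z requires both external signals X and Y.
toy_rules = {"Z": lambda s: s["X"] and s["Y"]}

fully_stimulated = simulate(toy_rules, {"X": 100, "Y": 100})
unstimulated = simulate(toy_rules, {"X": 0, "Y": 100})
gain_of_function = simulate(toy_rules, {"X": 0, "Y": 100}, mutations={"Z": True})
```

Pinning a species to a fixed value is one simple way to mimic a mutation; moving a slider corresponds to calling the update loop with a different `external_levels` value.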
Dynamic Analysis
Laboratory studies to identify functional relationships between extracellular stimuli and various components of the cell involve a number of experiments that can be both time consuming and resource demanding. For example, in a laboratory study [37] suggesting that Akt (a serine/threonine kinase involved in the regulation of a variety of cellular responses such as apoptosis, proliferation, etc.) is activated in response to the Epidermal Growth Factor (EGF), the activity of Akt is measured and compared in untreated cells and cells treated with EGF. Such studies usually involve the construction of a number of protein constructs, cell cultures, assays, etc., amounting to the use of many resources.
While Akt has been known for many years to be activated in response to EGF, there are many areas of the cell that are not as well understood. Laboratory experiments in such areas can sometimes be based on less sound hypotheses that may lead to the waste of many resources. But what if one had the ability to pre-test laboratory hypotheses on the computer, using a computational model, in a matter of minutes? This would allow laboratory scientists to weed out weak hypotheses while focusing on the ones that have a better chance of being proven correct, resulting in more efficient studies.
This is where the Dynamic Analysis simulation feature of The Cell Collective plays an important role. This tool allows users to conduct in silico experiments that closely resemble the way laboratory experiments are performed, with the advantage that in these computational studies researchers can perform more simulations and experiments in a much shorter time frame. For example, models in The Cell Collective can be simulated and their dynamics visualized and analyzed in hundreds or thousands of extracellular environments (as opposed to the limited number of scenarios possible in the laboratory) in a matter of minutes.
As an example, we will demonstrate how the software can be used to study the relationship between EGF and Akt. The dynamic analysis studies are done in two parts. First, on the main page of the simulation tool (Figure 6), users define the extracellular environment under which the study will be done. This is analogous to the preparation of cell media in the laboratory. Similar to laboratory experiments with real cells, different studies using computational models (or virtual cells) also require the setup of optimal extracellular conditions. As visualized in the figure, this can be done easily by setting the ranges of the activity (from 0 to 100%) of the individual extracellular (external) species via the dual sliders (or by just typing the activity levels in the appropriate text boxes). Because we are interested in the effects of EGF on the network model in this example experiment, the activity of EGF (boxed in red) is set to range on the full scale between 0 and 100% ON. On the other hand, the activity ranges of the remaining external species are selected for optimal results based on our previous research [32], and supported by laboratory-generated data. For example, the Extracellular Matrix (ECM) is set to higher activity levels, varying between 56 and 100% (boxed in blue); this corresponds to a biological finding that EGF-induced growth (as well as other cellular processes) is dependent on cell anchorage via ECM [38]. (Note that, from our experience with large-scale models, while optimal conditions should be determined, the simulations and results are not sensitive to exact values.)
While in this example, 100 simulations are performed, users can specify the number of simulations to be run within the study (Figure 6). During each simulation, an activity level for each extracellular species is selected randomly by the software such that the activity falls into the specified range. As a result, the user is able to simulate what would amount to 100 different laboratory experiments, with each experiment corresponding to a different external condition.
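The sampling of extracellular conditions described above can be sketched as follows. The EGF and ECM ranges are the ones quoted in the text; uniform sampling within each range, the function name `sample_environments`, and the fixed seed are assumptions made for this illustration.

```python
import random


def sample_environments(ranges, n_simulations=100, seed=0):
    """Draw one activity level (%ON) per external species for each simulation.

    ranges: {species: (low, high)} with bounds between 0 and 100.
    Assumption: levels are drawn uniformly within each range.
    """
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        for _ in range(n_simulations)
    ]


# Ranges taken from the example in the text: EGF spans the full scale,
# while ECM is restricted to higher activity levels.
ranges = {"EGF": (0, 100), "ECM": (56, 100)}
environments = sample_environments(ranges, n_simulations=100)
```

Each dictionary in `environments` plays the role of one virtual experiment with its own external condition.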
Once the in silico experiment has completed, users can analyze the dynamics of the model. Currently, the Dynamic Analysis tool allows users to generate dose-response curves to investigate qualitative (input-output) relationships between external cellular signals and various components of the model, such as the one between EGF and Akt as visualized in Figure 7. As can be seen in the graph, there is indeed a positive correlation between EGF and Akt, similar to the phenomenon seen in the laboratory. An additional significant advantage of computational experiments using this tool is that users can generate a number of analyses without re-running the entire experiment. For instance, in addition to examining the functional relationship between EGF and Akt, one can generate similar dose-response curves for any species in the model using a single 100-simulation experiment. This is done by specifying the appropriate extracellular signal and output species (i.e., any species of interest) from drop-down menus available on the page. On the generated graph, the selected external species is represented on the x-axis whereas the output species is represented on the y-axis. Furthermore, similar to the real-time simulation feature, mutations to any of the cellular species can easily be specified, which allows users to simulate gain/loss-of-function in an intuitive fashion. While we are in the course of adding additional means of visualizing the simulation results, users can also download all generated (raw) simulation data, which can subsequently be analyzed by users according to their needs.
The Dynamic Analysis feature can be used not only to generate new hypotheses, but also to test the correctness of the model. Because the models are built using local knowledge of the individual interactions, how do we know that all of this local information adds up to a system that represents what is seen in the laboratory? The correctness of the model therefore needs to be tested on global phenomena of the system. The above example demonstrates how the model of signal transduction in a fibroblast cell can be tested to ensure that species associated with apoptosis and growth (such as Akt) appropriately respond to a growth signal (EGF). If, for example, the dose-response curve for Akt and EGF suggested a negative correlation, one would have to go back and investigate which of the local interaction data resulted in the contradictory result.
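Because the raw simulation data can be downloaded, a dose-response relationship can also be computed outside the platform. The sketch below assumes each simulation is available as a record mapping species names to activity levels in %ON; this record layout, the binning approach, and the numbers are hypothetical and do not describe the platform's export format.

```python
from collections import defaultdict
from statistics import mean


def dose_response(records, input_name, output_name, bin_width=10):
    """Average the output activity within bins of the input activity.

    Returns a list of (bin center, mean output) pairs sorted by input level.
    """
    last_bin = int(100 // bin_width) - 1
    bins = defaultdict(list)
    for record in records:
        # An input of exactly 100 is placed in the top bin.
        index = min(int(record[input_name] // bin_width), last_bin)
        bins[index].append(record[output_name])
    return [((i + 0.5) * bin_width, mean(values)) for i, values in sorted(bins.items())]


# Made-up records standing in for downloaded simulation results.
records = [
    {"EGF": 5, "Akt": 10},
    {"EGF": 8, "Akt": 20},
    {"EGF": 95, "Akt": 80},
    {"EGF": 100, "Akt": 90},
]
curve = dose_response(records, "EGF", "Akt")
```

The same set of records can be reused with a different `input_name` or `output_name`, mirroring the point that new curves do not require re-running the experiment.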
Seed models
In addition to the signal transduction model of a fibroblast cell created and previously published by our group [32], as part of our most recent research efforts, we have constructed additional models of the budding yeast cell cycle [39] and host cell infection by Influenza A, including the viral replication cycle (manuscript submitted). We have also re-created a model of ErbB signaling and regulation of the G1/S transition in the cell cycle during breast cancer. This model was initially created by its original authors to study trastuzumab resistance and predict possible drug targets in breast cancer [40]. All of these models are now published in The Cell Collective and hence available to the scientific community as seed models for further contributions and/or simulations and analyses.
Collaboration and accessibility
As discussed in the Background section, collaboration amongst laboratory scientists working in different areas of complex biological processes and the accessibility to modeling frameworks is key to new discoveries using the systems approach. These two properties were strictly kept in mind when designing the software, and provide the main framework for The Cell Collective.
First, this framework motivated the use of a wiki-like format to keep track of the knowledge concerning the dynamical properties of biological processes. This framework was also applied to the way users interact with the actual computational models.
Perhaps the most important feature in the context of accessibility is the concept of “Published Models” (Figure 8). These models created by the community are freely accessible to all registered users, fostering the idea of open science. All users can view the bio-logic as well as the information in the knowledge base, and perform real time simulations on these models directly. To make changes to these models and see how these modifications affect the dynamics of the model, users can create personal copies of published models. Once a copy of a published model is created, the copy will be available and visible only to the one user until shared under “My Models” as seen in Figure 8. (As mentioned earlier, a number of models are now available under Published Models for all users to access and simulate.)
My Models is a collection of models created by any given user. Users have an additional ability to share and collaborate on any of these models with a select group of colleagues. The degree to which such a collaboration can take place is determined by the choice of three types of permission a user can specify when sharing a model. First, models can be shared such that other users can simulate the shared models and view the model’s bio-logic. A second way of model sharing also allows other users to contribute to the models and directly edit them. Finally, models can also be shared so that other users become model administrators and have the same rights as the creator of the model, including the ability to share the model with additional collaborators.
Many biomedical research software tools (especially the commercial ones) tend to limit users in such a way that once the user commits to the tool, it becomes difficult to move their data to a different platform. The Cell Collective takes the opposite approach. In addition to being able to share models with any and every user of the platform, features to export models in formats that can work with other modeling tools are also available. In the most recent version, users can export all mathematical expressions for each model (including the available published models) in the form of flat text files as well as SBML [28].
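The three sharing levels can be summarized in a short sketch. The enum and function names are hypothetical, and treating the levels as cumulative (each level includes the rights of the previous one) is an interpretation of the description above, not a documented property of the platform.

```python
from enum import IntEnum


class Permission(IntEnum):
    """Hypothetical names for the three sharing levels described in the text."""

    VIEW = 1   # simulate the model and view its bio-logic
    EDIT = 2   # additionally contribute to and directly edit the model
    ADMIN = 3  # same rights as the creator, including sharing with others


def can_simulate(permission: Permission) -> bool:
    return permission >= Permission.VIEW


def can_edit(permission: Permission) -> bool:
    return permission >= Permission.EDIT


def can_share(permission: Permission) -> bool:
    return permission >= Permission.ADMIN
```

Under this interpretation, a collaborator with `Permission.EDIT` can simulate and edit a shared model but cannot extend access to further users.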
Finally, a forum is available as part of The Cell Collective modeling suite. This will afford users additional means of communication with the scientific community as well as with the platform’s development team.