Tav4SB: integrating tools for analysis of kinetic models of biological systems
© Rybiński et al; licensee BioMed Central Ltd. 2012
Received: 15 September 2011
Accepted: 5 April 2012
Published: 5 April 2012
Progress in the modeling of biological systems strongly relies on the availability of specialized computer-aided tools. To that end, the Taverna Workbench eases the integration of software tools for life science research and provides a common workflow-based framework for computational experiments in biology.
The Taverna services for Systems Biology (Tav4SB) project provides a set of new Web service operations, which extend the functionality of the Taverna Workbench in the domain of systems biology. Tav4SB operations allow you to perform numerical simulations of the deterministic semantics of a biological model, or model checking of its stochastic semantics. On top of this functionality, Tav4SB enables the construction of high-level experiments. As an illustration of the possibilities offered by our project, we apply multi-parameter sensitivity analysis. To visualize the results of model analysis, a flexible plotting operation is provided as well. Tav4SB operations are executed in a simple grid environment, integrating heterogeneous software such as Mathematica, PRISM and SBML ODE Solver. The user guide, contact information, full documentation of available Web service operations, workflows and other additional resources can be found at the Tav4SB project’s Web page: http://bioputer.mimuw.edu.pl/tav4sb/.
The Tav4SB Web service provides a set of integrated tools in a domain for which Web-based applications are still not as widely available as for other areas of computational biology. Moreover, we extend the dedicated hardware base for the computationally expensive task of simulating cellular models. Finally, we promote the standardization of models and experiments as well as the accessibility and usability of remote services.
The Taverna Workbench is a tool which facilitates the design and execution of in silico experiments. The experiments are constructed as workflows which can be stored and executed when needed. The building blocks of a workflow are services, also known as processors. Technically, a workflow is a set of processors, together with connections between their inputs and outputs. The remote processors are implemented as Web service (WS) operations. Physically scattered throughout the computational resources of numerous scientific facilities, and combined together, WS operations enable highly complex analyses that surpass the limits of a common workstation.
Taverna services come from a diverse set of life science domains. In the field of computational biology, the Taverna Workbench provides access to services which are mainly related to sequence annotation and analysis. Here, we present remote processors that extend Taverna’s functionality in the domain of systems biology, specifically, in the analysis of kinetic models of biological systems. Our hardware base offers resources sufficient for computationally demanding experiments, such as multiple invocations of the model-checking procedure. Essentially, the Taverna Workbench provides a convenient user interface for our WS operations. Without programming their own WS client, users can analyze the behavior of cellular systems under various conditions.
For a given biochemical network model, the underlying mathematical model is determined by the chosen semantics. The most common representations are ordinary differential equations (ODEs) for the deterministic framework and a continuous-time Markov chain (CTMC) for the stochastic framework [3, 4]. The latter representation may be equivalently expressed as a set of differential equations, also known as the chemical master equation. Unlike the Tav4SB project, almost all of the Web-based applications covered by existing reviews allow for the analysis of only deterministic representations of biological systems.
Tav4SB provides WS operations for:
numerical simulations for the deterministic formulation of a biochemical network model, using the SBML ODE Solver library (SOSlib),
probabilistic model checking for the stochastic formulation of a biochemical network model, using PRISM,
visualization of data series, such as ODE trajectories or values of parametrized Continuous Stochastic Logic (CSL) properties, and sampling from probability distributions, using Mathematica, and
high-level analysis, such as multi-parameter sensitivity analysis (MPSA) of biological models, with error calculation via either numerical simulations or the probabilistic model checking technique.
The SBML ODE Solver library enables numerical analysis of models encoded directly in Systems Biology Markup Language (SBML). The library employs libSBML to automatically derive the ODEs, together with their Jacobian and higher derivatives, and relies on the CVODES package, the state-of-the-art numerical integration library from SUNDIALS.
PRISM is one of the leading tools implementing probabilistic model checking, a technique for the formal verification of systems that exhibit stochastic behavior. A system to be analyzed is modeled as a Markov chain, and an examined property is expressed in a suitable probabilistic temporal logic. Some recent works, see e.g. [14, 15], demonstrate the applicability of PRISM to the analysis of models of biological systems. Case studies include models of cell cycle control, fibroblast growth factor signaling, and the MAPK cascade. For biological applications a CTMC is typically chosen as the underlying mathematical model and its properties are specified in a continuous-time logic, for instance in CSL. This approach seems promising and, compared with numerical simulations, it can often yield a better understanding of the dynamics of the analyzed systems.
PRISM handles models defined in the PRISM input language. Currently, the prototype translator from SBML is not integrated into the application itself. Therefore, we also provide a separate operation which automatically translates from SBML to the PRISM language, using the prototype translator.
Finally, Wolfram’s Mathematica is a tool with one of the most advanced graphics engines among plotting software. Tav4SB provides Mathematica’s two- and three-dimensional list plots together with a versatile set of options for customizing their display. Additionally, Tav4SB allows sampling from the extensive collection of parametric probability distributions available in Mathematica.
The aim of the Tav4SB project is to support the orchestration of physically scattered tools for the execution of repeatable scientific experiments. To understand the place of Tav4SB among a plethora of similar software, consider the following mundane technical problem. You have a set of scripts, command-line tools or any other form of legacy code, installed on one or more computational servers, not necessarily in the same local area network. For instance, you might have a Mathematica script which can only be executed on a server with Mathematica installed; and simultaneously you might need to use PRISM, installed on a remote server with the large amount of memory it requires. You want to connect these tools in an in silico experiment, say described by a workflow. Moreover, in case the experiment doesn’t go as planned, you want to be able to easily modify and re-run your workflow.
The Tav4SB project is a realization of a minimalist approach to a platform-independent solution, based on a workflow management system and a service-oriented architecture built around the Web service standard and a straightforward queue of computational tasks.
The Tav4SB project consists of two parts. The client part of the project (Tav4SB client) is a library of sample workflows and helper scripts for the analysis of kinetic models of biological systems, using the features described earlier. The server part of the project (Tav4SB server) is a simple grid environment which wraps the aforementioned computational tools. Those tools are intended to be run in a multi-threaded manner, on one or more, possibly remote, computational servers.
As a utility for wrapping scientific software in Web services, the Tav4SB project enters the territory of projects such as Soaplab2 and Opal2 [18, 19]. The main difference is that support for the physical scattering of computational tools is an integral part of the Tav4SB server. Moreover, the Tav4SB server allows for a direct connection with legacy code. If necessary, the Java Native Interface (JNI) can be used to connect with platform-specific libraries written, for instance, in C, C++, or Fortran. However, in the current state of the project, all this comes at the cost of the moderate programming skills required from a user of the Tav4SB server, in contrast to the Soaplab2 and Opal2 strategy of custom configuration file languages. Note, however, that these languages need to be learned, and that they make the user’s task easier only to a limited extent. Also note that, as a minimalist solution with a stateless Web service interface, the Tav4SB server does not comply with the standards of an open, stateful grid services architecture (cf. Web Services Resource Framework), whose most prominent representative is the Globus Toolkit, a full-fledged grid environment.
We have chosen the popular Systems Biology Markup Language (SBML), an XML-based data format, to represent kinetic models of biological systems. Thanks to the wide range of dedicated software and to the support of model repositories like BioModels, SBML can be used without detailed knowledge of the language specification.
The client communicates with the server side via WS operations, using the Simple Object Access Protocol (SOAP). These operations represent the workflow’s remote processors. Their signatures are defined in a Web Services Description Language (WSDL) file. We employed a “WSDL first” approach: the WSDL file was manually written (in a document/literal style).
Java Web service classes were automatically generated from the WSDL file.
The WSDL file and the Web service are hosted by the Apache Tomcat servlet container, which acts as a proxy between the client and the computational part of the server. A Web service operation call is translated into Java Message Service (JMS) messages. The JMS Application Programming Interface (API) allows Java applications to create, send, receive, and read messages. It is part of the Java Platform, Enterprise Edition (JEE) standards. In our system, JMS messages represent computational tasks and their results. One operation call can be translated into multiple tasks, enabling seamless, tool-specific parallelization of a submitted job.
Computational cluster management modules are written in Java using the Apache ActiveMQ implementation of the JMS standard. These modules are deployed as Java Archive (JAR) files. The JMS messages are sent over TCP/IP, which makes the modules independent of their physical location.
New tasks, created by the Web server module, are added to the task queue. At this point tasks are assigned to any available worker of a compatible type. Results are collected in a temporary queue, exclusive to a single WS operation call. Long-running tasks use an asynchronous call registry. In such a case, the direct (synchronous) response to the WS operation call is merely a message reporting the start of computations. The computed results are collected in a dedicated queue and, when complete, sent to the caller by email (using the JavaMail package).
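The queueing scheme described above can be illustrated with a minimal in-process sketch. It is not the Tav4SB implementation, which is written in Java on top of JMS and Apache ActiveMQ; here Python's standard `queue` and `threading` modules stand in for the message broker, and the names `worker`, `call_operation` and `run_demo`, as well as the squaring "computation", are hypothetical placeholders.

```python
import queue
import threading


def worker(task_queue, compute):
    """Consume tasks until a None sentinel arrives; reply on each task's own queue."""
    while True:
        task = task_queue.get()
        if task is None:
            break
        index, payload, reply_to = task
        reply_to.put((index, compute(payload)))


def call_operation(task_queue, payloads):
    """Split one operation call into several tasks and gather the results in order."""
    reply_to = queue.Queue()  # temporary result queue, exclusive to this call
    for index, payload in enumerate(payloads):
        task_queue.put((index, payload, reply_to))
    results = [reply_to.get() for _ in payloads]
    return [value for _, value in sorted(results)]


def run_demo():
    tasks = queue.Queue()  # shared task queue
    # Stand-in computation: squaring replaces a real PRISM or SOSlib run.
    workers = [
        threading.Thread(target=worker, args=(tasks, lambda x: x * x))
        for _ in range(3)
    ]
    for thread in workers:
        thread.start()
    output = call_operation(tasks, [1, 2, 3, 4])
    for _ in workers:
        tasks.put(None)  # one stop signal per worker
    for thread in workers:
        thread.join()
    return output
```

Because every task carries a reference to the result queue of its own call, several calls can share the same workers without mixing up their results, which mirrors the role of the temporary queue in the text.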
A worker translates a JMS task message into running computational processes, and the results of these processes back into a JMS result message. Each worker supports a specific type of computation and can communicate with the actual computational tool in its own way. We have currently implemented three types of workers: a Mathematica worker, which communicates with Mathematica via the J/Link library, and PRISM and odeSolver workers, which communicate with, respectively, PRISM and SOSlib via a command-line interpreter (shell).
We constructed a set of exemplary workflows. Their main purpose is to demonstrate how Tav4SB WS operations can be used by the Taverna Workbench client. There are two kinds of workflows: Tav4SB WS operation wrappers and in silico experiments.
Wrapper workflows illustrate the direct usage of Tav4SB operations in Taverna. Their purpose is to be re-used as nested workflows, that is, as building blocks of the experiments described below. Additionally, we built a number of helper Taverna processors, used for interacting with the XML-formatted inputs and outputs of WS operations.
The first workflow numerically simulates the ODEs of the model and plots the resulting trajectories. The ODEs are derived automatically from an SBML model file, based on the rate laws of reactions. In the deterministic model of the enzymatic reaction, rates are described by the law of mass action. Running this simple experiment yields the time evolution of species concentrations in the form of both data point series and a plot.
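As a rough illustration of what such a simulation computes, the sketch below integrates mass-action ODEs for an enzymatic reaction. Several things here are assumptions rather than facts from the paper: the reaction scheme E + S <-> ES -> E + P with constants k1, k2 and k3 is the textbook form, the initial concentrations and rate constants are hypothetical, and a fixed-step fourth-order Runge-Kutta integrator replaces the CVODES solver that SOSlib actually uses.

```python
def enzymatic_rhs(state, k1, k2, k3):
    """Mass-action right-hand side for E + S <-> ES -> E + P."""
    e, s, es, p = state
    bind = k1 * e * s      # r1: E + S -> ES
    unbind = k2 * es       # r2: ES -> E + S
    convert = k3 * es      # r3: ES -> E + P
    return (
        -bind + unbind + convert,   # dE/dt
        -bind + unbind,             # dS/dt
        bind - unbind - convert,    # dES/dt
        convert,                    # dP/dt
    )


def rk4_step(rhs, state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(slope, h):
        return tuple(x + h * dx for x, dx in zip(state, slope))

    a = rhs(state)
    b = rhs(shifted(a, dt / 2))
    c = rhs(shifted(b, dt / 2))
    d = rhs(shifted(c, dt))
    return tuple(
        x + dt / 6 * (da + 2 * db + 2 * dc + dd)
        for x, da, db, dc, dd in zip(state, a, b, c, d)
    )


def simulate(initial, k1, k2, k3, t_end, dt):
    """Return a list of (time, (E, S, ES, P)) points on a fixed time grid."""
    def rhs(state):
        return enzymatic_rhs(state, k1, k2, k3)

    steps = int(round(t_end / dt))
    state = tuple(initial)
    trajectory = [(0.0, state)]
    for i in range(1, steps + 1):
        state = rk4_step(rhs, state, dt)
        trajectory.append((i * dt, state))
    return trajectory


# Hypothetical initial concentrations and rate constants, for illustration only.
trajectory = simulate((1.0, 10.0, 0.0, 0.0), k1=1.0, k2=0.5, k3=0.3, t_end=10.0, dt=0.01)
final_time, (e, s, es, p) = trajectory[-1]
```

The two conservation laws of this scheme, E + ES and S + ES + P both constant, offer a quick sanity check on any trajectory produced this way.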
Sensitivity analysis investigates the relation between uncertain inputs or parameters of a model and a property of an observable output [10, 29]. Sensitivity analysis has been used for various parametrization tasks in models of biological systems, including finding essential parameters for research prioritization, identifying insignificant parameters for model reduction, and clustering parameters for the discovery of common functions.
The MPSA procedure consists of the following steps:
1. Select the parameters to assess.
2. Set the range of each parameter.
3. Generate independent samples.
4. For each sample, calculate the error (based on the output).
5. Classify samples as acceptable or unacceptable.
6. For each of the selected parameters, compare the sets of classified samples.
Calculating the error for each sample (Step 4) involves a separate analysis of the model. This is the factor that determines the running time of the MPSA procedure. We ran two variants of MPSA, differing in the way in which the error is calculated. In one variant we used ODE simulations and in the other we exploited the probabilistic model checking technique. We focused on the kinetic parameters of the two forward reactions of the enzymatic reaction model (Equation (1)), i.e. k1 and k3. As the error function we took, respectively, the mean squared error of the ODE trajectory of the product P and the absolute difference in the value of formula (3); in both cases the error is measured between the result for a parameter sample and the result for the reference parameter values (Equation (2)). In turn, we obtained empirical cumulative distribution functions (ECDFs) of acceptable and unacceptable samples for each of the selected parameters. ECDFs were compared using the Kolmogorov-Smirnov test (KS-test) and one minus the Pearson product–moment correlation coefficient (PMCC). As the final output of the MPSA method, we got two parameter rankings, one for each of the sensitivity indices: KS-test and PMCC.
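The procedure can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Tav4SB workflow: samples are drawn uniformly, a sample counts as acceptable when its error does not exceed the mean error (the source does not state the threshold rule), only the KS-based index is computed, and `toy_error` is a hypothetical stand-in for the ODE simulation or the PRISM run that would supply the error in practice.

```python
import bisect
import random


def ks_statistic(xs, ys):
    """Largest vertical gap between the empirical CDFs of two samples."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    gap = 0.0
    for v in xs + ys:
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap


def mpsa(error_fn, ranges, n_samples, seed=0):
    """Return a KS-based sensitivity index for every parameter in `ranges`."""
    rng = random.Random(seed)
    # Steps 1-3: draw independent uniform samples within the chosen ranges.
    samples = [
        {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        for _ in range(n_samples)
    ]
    # Step 4: one model analysis per sample.
    errors = [error_fn(sample) for sample in samples]
    # Step 5: classify against the mean error (an assumed threshold rule).
    threshold = sum(errors) / len(errors)
    accepted = [smp for smp, err in zip(samples, errors) if err <= threshold]
    rejected = [smp for smp, err in zip(samples, errors) if err > threshold]
    # Step 6: compare the two groups parameter by parameter.
    return {
        name: ks_statistic([smp[name] for smp in accepted],
                           [smp[name] for smp in rejected])
        for name in ranges
    }


def toy_error(params):
    """Hypothetical error that depends on k3 only, so k1 should rank as insensitive."""
    return abs(params["k3"] - 1.0)


indices = mpsa(toy_error, {"k1": (0.1, 2.0), "k3": (0.1, 2.0)}, n_samples=500, seed=1)
ranking = sorted(indices, key=indices.get, reverse=True)
```

A larger KS statistic means that acceptable and unacceptable samples are distributed differently along that parameter, so the parameter is ranked as more sensitive.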
Interestingly, the results of the other variant of the MPSA procedure are significantly different; one observes that now k1 dominates k3. This may be ascribed to the particular choice of formula (3), which calculates the average number of occurrences of the first reaction r1. Furthermore, an inspection of the values of the sensitivity indices given in Figure 5 reveals that the domination is not as definite as in the first variant of MPSA. The results demonstrate that applying the probabilistic model checking technique may reveal more subtle dependencies in the model, depending on the properties of interest.
MPSA combined with probabilistic model checking (PMC) may be applied as a pre-processing step which finds parameters that are insignificant for an analysis oriented towards a very specific property of a model. This would provide a novel notion of probabilistic abstraction, i.e. a property-specific reduction of the probabilistic model. However, for a successful application, the pre-processing should have a low running time compared to the analysis that follows. In our experiment this is not the case, as we run the exact PMC procedure, which is essentially the same one that would be run during the further analysis. However, we conjecture that for the MPSA procedure the level of accuracy offered by PRISM is much too high. We suppose that satisfactory results may be obtained using an approximate approach, such as Monte Carlo model checking. We plan to pursue this idea as a continuation of the work presented here.
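To give a flavor of such an approximate approach, the sketch below estimates the expected number of occurrences of reaction r1 by averaging over stochastic simulation runs (Gillespie's direct method). It is an illustration only, not PRISM's algorithm and not the authors' planned method; the reaction scheme E + S <-> ES -> E + P, the molecule counts, the stochastic rate constants and the function names are all assumptions.

```python
import random


def count_r1(e, s, es, p, k1, k2, k3, t_end, rng):
    """Simulate one CTMC path and count firings of r1 (E + S -> ES) up to t_end."""
    t = 0.0
    fired = 0
    while True:
        rates = (k1 * e * s, k2 * es, k3 * es)
        total = sum(rates)
        if total == 0.0:
            return fired  # no reaction can fire any more
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_end:
            return fired
        pick = rng.random() * total
        if pick < rates[0]:
            e, s, es = e - 1, s - 1, es + 1   # r1: E + S -> ES
            fired += 1
        elif pick < rates[0] + rates[1]:
            e, s, es = e + 1, s + 1, es - 1   # r2: ES -> E + S
        else:
            e, es, p = e + 1, es - 1, p + 1   # r3: ES -> E + P


def estimate_mean_r1(n_runs, seed, **model):
    """Monte Carlo estimate of the expected number of r1 firings."""
    rng = random.Random(seed)
    return sum(count_r1(rng=rng, **model) for _ in range(n_runs)) / n_runs


# Hypothetical molecule counts and stochastic rate constants.
estimate = estimate_mean_r1(
    200, seed=0, e=2, s=5, es=0, p=0, k1=1.0, k2=0.5, k3=0.3, t_end=5.0
)
```

The accuracy of such an estimate improves only with the square root of the number of runs, which is the trade-off against the exact computation: fewer runs give a cheaper but noisier error value for each MPSA sample.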
To measure the network load and the overhead of task management in the Tav4SB server we ran a performance test. The test was set up with the MAPK cascade case study from the PRISM Web page and with the asynchronous version of the PRISM WS operation. This version of the PRISM operation sends computation time statistics together with the results (by email). To run the performance test, we deployed the Tav4SB server on a conventional computational cluster maintained by the Center of Excellence BioExploratorium at the University of Warsaw. The cluster contains 16 machines with 2 dual-core CPUs each, giving 64 cores in total. We used 14 machines to deploy workers and 1 machine for the management queue. The Web server was deployed on a separate gate server.
Table: Results of the performance test of the Tav4SB server (by number of threads/machines).
Web-based applications are still not as widely available for the systems biology domain as for other research areas. One reason for this state of affairs is that simulating cellular models is computationally expensive when compared to data processing tasks. In turn, there is a constant demand for hardware dedicated to the analysis of kinetic models of biological systems.
Our services extend the functionality of the Taverna Workbench in the field of systems biology. Together with the services we provide a hardware base for our minimalist grid environment. The grid itself can, and will, be easily extended, independently of the physical location of its machines and of the operating systems they run. Moreover, our grid facilitates the integration of heterogeneous tools, such as Mathematica, PRISM or SOSlib. The end-user goal of the Tav4SB project is to abstract away the details of the technological infrastructure. Finally, via SBML and the Taverna Workbench, we would like to promote the standardization of models and experiments as well as the accessibility of services and their usability for non-programmers. In order to further enhance usability, we released the source code of the project so that users can extend the Tav4SB functionality with their own worker modules. Users with programming skills can contribute to the development of the technical aspects of the server part of the project. These aspects cover the plug-in architecture of workers, the library of legacy code connectors (e.g., the currently used command-line interface or Java library), descriptors for the automatic generation of worker code for common types of wrapped applications (cf. ACD metadata files in the Soaplab2 project), and, last but not least, support for Semantic Web services and ontologies [39–41].
From the point of view of in silico experiments, we propose a novel technique: the application of probabilistic model checking to the calculation of the error in the multi-parameter sensitivity analysis procedure. It seems that this approach is particularly well suited to revealing intricate and subtle dependencies that may not be discovered using, for instance, ODE-based numerical simulations of a model. We suppose that this technique may have interesting applications, e.g. for probabilistic abstraction.
Project name: Tav4SB
Project home page: http://bioputer.mimuw.edu.pl/tav4sb/
Operating system(s): Platform independent (both client and server parts)
Programming language: Optionally, SCUFL/t2flow, BeanShell, XSLT (client) and Java, Mathematica, Bash (server)
Other requirements: the Taverna Workbench client 2.3 or higher, JSBML 0.8-b2, plus, optionally, any file-hosting Web server (client) and Apache Tomcat 6.0 series, Apache Maven 2 or higher, plus, optionally, Mathematica 7.0 or higher, PRISM 4.0 series and SBML ODE Solver 1.6 (server)
License: GNU AGPL
Any restrictions to use by non-academics: None
Please note that, technically, SCUFL and t2flow are workflow description languages, but together with the graphical notation provided by the Taverna Workbench they can be seen as visual programming languages. These and other client dependencies on a programming language are optional because one can write their own WS client in virtually any language. Also, be advised that the Apache Maven tool (in other requirements) automatically resolves all dependencies on Java libraries, such as JavaMail or Apache ActiveMQ (cf. Figure 1).
The definitions of the operations provided by the Tav4SB WS and the workflow files, together with installation and execution instructions, are available from the project’s home page. Documentation of the Tav4SB WS can be found in BioCatalogue, a curated catalogue of life sciences Web services. Wrapper and experiment workflows are also available from the myExperiment repository, together with the workflow figures.
Client workflows were tested on Ubuntu Linux (10.10), Mac OS X (10.6.8) and Windows Vista (Business) operating systems. The production server is currently deployed on computational servers at the Faculty of Mathematics, Informatics and Mechanics of the University of Warsaw (running Ubuntu Linux Server, Gentoo Linux and PLD Linux). The performance test server was deployed on a cluster of Ubuntu Linux machines (workers and queue) and Solaris gateway (WS). A local developer’s environment, with both client and server, was deployed and tested on Ubuntu Linux (10.10) and Mac OS X (10.6.8).
This work was partially supported by the Polish government grant N N206 356036, and by the Biocentrum Ochota project (POIG.02.03.00-00-003/09). The first author is a scholar within the Human Capital Operational Programme financed by the European Social Fund and state budget. This paper was written for the benefit of University of Zielona Góra.
We would like to thank Janusz Dutkowski (Departments of Medicine and Bioengineering, University of California San Diego) for helpful comments on the manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.