 Software
 Open Access
 Published:
RaTrav: a tool for calculating mean firstpassage times on biochemical networks
BMC Systems Biology volume 7, Article number: 130 (2013)
Abstract
Background
The concept of mean firstpassage times (MFPTs) occupies an important place in the theory of stochastic processes, with the methods of their calculation being equally important in theoretical physics, chemistry and biology. We present here a software tool designed to support computational biology studies where Markovian dynamics takes place and MFPTs between initial and single or multiple final states in networklike systems are used. Two methods are made available for which their efficiency is strongly dependent on the topology of the defined network: the combinatorial Hill technique and the Monte Carlo simulation method.
Results
After a brief introduction to RaTrav, we highlight the utility of MFPT calculations by providing two examples (accompanied by Additional file 1) where they are deemed to be of importance: analysis of a proteinprotein docking funnel and interpretation of the free energy transduction between two coupled enzymatic reactions controlled by the dynamics of transition between enzyme conformational states.
Conclusions
RaTrav is a versatile and easy to use software tool for calculating MFPTs across biochemical networks. The user simply prepares a text file with the structure of a given network, along with some additional basic parameters such as transition probabilities, waiting probabilities (if any) and local times (weights of edges), which define explicitly the stochastic dynamics on the network. The RaTrav tool can then be applied in order to compute desired MFPTs. For the provided examples, we were able to find the favourable binding path within a proteinprotein docking funnel and to calculate the degree of coupling for two chemical reactions catalysed simultaneously by the same protein enzyme. However, the list of possible applications is much wider.
Background
The theory of stochastic Markov processes has many applications in theoretical physics, chemistry and biology [1–5]. If a system allows transitions in time between various discrete states, we may model the system in the general language of networks and assign dynamics properties to the nodes and edges of such networks [6]. In early attempts to understand the dynamics of stochastic networks the term ’random walk’ was introduced [7], which describes the displacement of a point on a network after a sequence of random moves. Equally important to measuring displacement is the quantity called mean firstpassage time (MFPT), which is defined as the time needed to reach the final state by a statistical ensemble of network walkers [8]. According to the alternative definition, this time is equivalent to the reciprocal steadystate flux resulting in a network in which a single walker returns instantly to the initial state every time it reaches the final state [9].
MFPTs have been successfully used in a variety of biochemical studies, for example: the study of protein folding times [10], protein helix unfolding rates under mechanical forces [11], studies of DNAbased nanoscale walkers called molecular spiders [12], studies of polymer translocation [13], calcium spark activation times [14], metastatic cancer progression [15], and the analysis of temperature and detectionwavelength dependence of the electron transfer rates in the initial stages of photosynthesis [16].
In general, time for random walks upon a network may be discrete or continuous. The RaTrav tool works with discrete time. To our best knowledge, there is currently no open source software available which is able to perform MFPTs calculations on a discrete time and space network of any arbitrary size. In this article we present RaTrav, an open source software tool. First, we introduce the formulation of MFPT calculations and provide details on the implementation and usage of RaTrav (Methods section). Then we focus on two biological applications and demonstrate how various biological questions may be answered using the RaTrav software tool (Results and discussion section).
Methods
Mean firstpassage times
When considering the Markov processes, MFPT is defined as the average time (in number of steps or other units of time) for a statistical sample of random walks starting in some initial state on a network of states to reach for the first time the desired final state or any state from a collection of final states. In formal terms, the stochastic continuous time Markov process realized on a given network of states is described by a system of coupled master equations
which jointly describe the time variation of the probabilities p_{ l }(t) of the hypothetical random walker being in some state l at time t, where the overdot denotes a time derivative. In Eq. 1, ${w}_{{l}^{\prime}l}$ is the transition probability per unit time along the edge from state (node) l to l^{′}, which in general needs not satisfy the detailed balance condition. Furthermore, the transition probabilities ${w}_{{l}^{\prime}l}$ from a given state l to its nearest neighbours l^{′} need not sum up to unity. On establishing a discrete time t = n τ_{0}, where n is the number of steps measured in some unit of time τ_{0}, the differential Eq. 1 is to be recast as the difference master equation
in which, as opposed to Eq. 1, the transition probabilities ${u}_{{l}^{\prime}l}\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}{\tau}_{0}{w}_{{l}^{\prime}l}$ from the node l to the node l^{′} sum up to unity:
Let us note that in general the above sum includes also the waiting probabilities u_{ l l } on a given node l before a random walker performs a jump to a neighboring node l^{′}. They have efficient applications in many biological and physical problems. For example, a proteinprotein complex residing in the bound conformational state while performing its function and before proceeding to the dissociation state, or the components of the complex becoming trapped in a local minimum of the free energy landscape before finding their optimal docked, and fully functional, conformational state. Moroever, the longliving metastable states are detected in various excited atomic or molecular systems before they relax to the lowerlying energy states.
In developing the RaTrav code we have ensured that the definitions for the input data are as general as possible. Therefore, we have made a distinction between the exit probability from a node and the transition time to its nearest neighbour. More generally, on defining the various local times ${\tau}_{{l}^{\prime}l}$, individually for each transition along a link with the probability ${u}_{{l}^{\prime}l}$, we have the alternative expression for the transition probabilities per unit time in Eq. 1:
In the particular case for all ${\tau}_{{l}^{\prime}l}\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}{\tau}_{0}$ we reconstruct the transition probabilities that fulfill the conditions in Eq. 3. The local times can be related to the reaction coordinates, for example, when employing molecular dynamics simulations to move between well defined protein conformational states, which might be measured on the order of microseconds.
By virtue of its generality, Eq. 4 has been directly implemented only in the Hill combinatorial technique [9]. Nevertheless, as regards Monte Carlo simulations, it ensures a determination of the same unit of time for the both methods. The description and benchmark of the combinatorial Hill and stochastic Monte Carlo methods on basic networks (equal exit probabilities towards each neighbour, equal weights of edges) may be found in our previous paper [17]. In comparison to the previous results, newly developed C++ code is provided, which allows for the definition of multiple final states, different transition probabilities and local times along network edges. Moreover, the networks can be connected or disconnected structures represented as basic graphs, as directed graphs, as multigraphs and as multicomponent graphs or their mutual mixtures. In order to compare MFPTs generated by these two methods we have run a series of tests on a number of network topologies studied previously [17]: hypercubes of various dimensions, Sierpinski gaskets of various orders, Bethe lattices with various number of shells and random treelike networks; with equal and randomly chosen probabilities, with identical and different local transition times, and with single and multiple final states. For most of the cases, the difference between MFPTs calculated from the Hill and Monte Carlo methods, calculated as 100% (H − MC/H), was smaller than 0.2% when using 10^{7} walkers.
Both methods mentioned above have their benefits and drawbacks. For instance, the advantage of Hill’s over the Monte Carlo method is its speed and precision of calculation when the network is an acyclic graph, or includes a low number of cycles. On the other hand, for particularly knotted networks, the Monte Carlo method is the logical choice, providing reliable MFPT estimations in a reasonable turnaround time. The best strategy to follow using the Monte Carlo method is to start with a lower number of walkers; even if the obtained results are not particularly accurate, this helps to estimate the running time with the desired higher number of walkers and avoids the situations where calculations are indicated to be intractable in a finite time. Some indication of running times may be found in Tab. 4 of [17].
Implementation of the Monte Carlo simulation method
The Monte Carlo method relies on the simulation of random walks on a network of interest, driven by a pseudorandom number generator. Random numbers were generated in the Monte Carlo method using the Boost C++ Libraries components (http://www.boost.org) – Mersenne Twister mt19937 random number generator (with a standard seed) and a uniform_real_distribution function.
In our implementation (see source code), there is an outer for loop over the number of simulations and an inner for loop over the number of walkers in each simulation. Each walker is placed in an initial state (node on the network) and its first passage time set to 0. For each iteration, a random number is generated. Exit probabilities from each node sum up to one but do not have to be equal, thus forming a ranges of transition probabilities to each neighbour. Each transition is chosen as an effect of casting a random number. After a walker undergoes a transition it is placed in a new state (or stays in the same state if the waiting probability exists and the random walker chooses this transition) and its first passage time is increased by the weight of edge (local time) it passed. The walker performs its random walk until it reaches the final state or one state from the set of final states. When all random walkers finish their walk, the mean firstpassage time is calculated based on first passage times for each walker. If the user chooses to perform more simulations with the given number of walkers, mean firstpassage times are averaged further and standard deviation of this mean over the number of simulations, with the given number of walkers, is calculated. Please notice that if the number of walkers is large enough (in fact the user should perform some initial simulations to be sure the number of walkers is sufficient for a new network), MFPTs obtained from each simulation will be similar. However, because of the random movement of walkers, the first passage time of a single walker cannot be predicted based on e.g. first passage time of another walker.
In terms of efficiency of the calculations, because the current step of a walker determines its next step, a single random walk cannot be parallelized. However, the calculation of mean firstpassage time may be parallelized, because the walkers in the statistical ensemble are independent. We have shown that using parallel processing methods (Message Passing Interface), the efficiency may be increased over 90% using more than 10 CPUs or by using a few cores on a single CPU [18]. However, to keep the RaTrav software as portable as possible, we haven’t implemented MPI parallel processing methods in the current version of the software.
Implementation of the combinatorial Hill method
The Hill method relies on the idea that instead of an ensemble of walkers one can consider only a single walker that after traversing a network of states (nodes) appears instantly at the starting node every time it reaches the target node. Because this procedure is repeated many times the Hill method shows that the MFPT τ between these two nodes corresponds to the reciprocal oneway stationary flux J, τ = J^{−1}, resulting in a modified network in which the transitions to the target node have been redirected to the starting node. The steadystate flux has the following form
where ${w}_{l{l}^{\prime}}$ stands for the transition probabilities per unit time from a set of l^{′} nodes adjacent to the target node l and occupied by the walker with probabilities ${p}_{{l}^{\prime}}^{\text{st}}$. We can calculate these probabilities solving the system of the stationary master equations in Eq. 1 for ${\stackrel{\u0307}{p}}_{l}\left(t\right)\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}0$, or equivalently using the Hill algorithm [9] that we have now implemented in the RaTrav tool. The algorithm proceeds according to the following steps:

1.
Determine two nodes on the original network (graph) in order to calculate the MFPT between them.

2.
Modify this graph through the identification of the initial node with the final node, combined with elimination of the latter.

3.
Construct for such a modified graph G the complete set of its subgraphs, called maximal trees T. The maximal tree is a connected graph which contains all nodes of the graph G and no cycles.

4.
Make each possible maximal tree T to be a directed graph. It is obtained from T by directing all its edges (links) towards the node l. Each directed tree T _{ α } contributes a weight W _{ l }(T _{ α }) as the product of transition probabilities per unit time w _{ i j } from the node j to the node i.

5.
Calculate the sum of these weights, ${W}_{l}\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}\sum _{\text{all}\phantom{\rule{2.77626pt}{0ex}}{T}_{\alpha}}$ W _{ l }(T _{ α }), which run over all maximal trees T _{ α } directed to a given node l.
Then, the steadystate occupation probability of the node l in the network (graph) G becomes
where the expression in the denominator obeys a summation over all sum of weights generated for the graph G to ensure that $\sum _{n}{p}_{n}^{\text{st}}\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}1$. The construction of these probabilities is fundamental for the calculation of the stationary flux in Eq. 5 and finally the MFPT. To this end we have applied the algorithm which for a given set of elements (the graph edges) generates its subset in lexicographical order.
Input file and control keywords
The user is required to prepare a text file with the structure of a given network. Upon completion of the computations, an output file is produced with computed MFPTs along with, in case of the Monte Carlo method, their estimated errors. The user needs to choose between the Hill’s and Monte Carlo methods, between using basic (just neighbouring nodes) or advanced (with transition probabilities and local times along edges), between calculating all the MFPTs or only a selection, and whether to define multiple final states.
Let us start with a very simple square network for which the input file in RaTrav format takes the following form:
Keyword NODE is followed by the node number (the first numerical column, entries are counted from 0; it is important to maintain an increasing numeric order of node identities and gaps in numbering are not permitted). Subsequent numbers are the identities of neighbouring nodes and their order does not matter. In the case of performing a Monte Carlo simulation there is the requirement for WALK and SIMU keywords (number of walkers, number of simulations with WALK walkers). Entering SIMU >1, the user will receive error estimates for each MFPT calculation (standard deviation of average MFPT over the number of simulations run). To establish waiting probabilities on, for example the 9th node, the NODE 9 input line may be defined as:
i.e. the node number is repeated. This means that the node labelled as 9 in a graph connects directly to the node labelled as 3 and to itself twice, so the transition probability from node 9 to node 3 is 1/3, whereas the waiting probability on the node 9 is 2/3. If the user chooses to use Hill’s algorithm, WALK and SIMU keywords won’t be used even if present in the input file.
In advanced file format the user can define transition probabilities and weights of edges. To minimize rounding errors, in addition to the decimal format, it is possible to define the transition probabilities and weight of edges as common fractions. The same square can be defined as follows (node number, followed by a triplet containing: identity of neighbouring node, transition probability, weight of edge):
With transition probabilities equal to 1/2 and weights of edges equal to 1.0, this format is equivalent to the basic file format. However, in advanced file format it is possible to use different combinations of probabilities and weights. Requirements are that the probabilities in each line have to sum up to 1.0, and in the case of Hill’s method, at least one local timescale (weight of edge) must be different from 0. The last condition does not apply to the Monte Carlo method. Thus, on account of some specific problem, the advanced input file may take the following form:
In the above please note the different values of weights possible for twoway transitions between the same pair of nodes, for example, 1 and 3. We assume in general that a passage in either direction along a given edge does not need to correspond to the same weight or transition probability.
Optional use of the MFPT keyword allows definition of selected MFPTs the user wants to be computed, e.g.: MFPT 0 1 means the MFPT between states 0 and 1 will be calculated, MFPT 0 1 2 means the MFPT between initial state 0 and two final states will be calculated – either state 1 or 2 has to be reached (it is possible to define any number of final states).
The INFO keyword allows the user to pass any comments which will be copied to the output files.
Compilation and usage
There is a Makefile attached so under Linux it is sufficient to simply call make. However, two variables have to be set before doing so:
The first variable is used to choose the compiler, e.g. icc for Intel C Compiler or g++ for GNU C Compiler. The Monte Carlo method uses Boost C++ Libraries (http://www.boost.org) which have to be downloaded and installed. The second variable is used to set the path to the Boost C++ Libraries so the user must change PATH_TO_BOOST to the local path. Alternatively the RaTrav tool may be compiled without the Makefile, e.g. with ICC, as follows:
The general usage is as follows:
The five parameters have the following meaning: input is a name of the input text file which defines the network; output is a name of the output text file generated by RaTrav; the METHOD parameter that should be set equal to 0 to use the Monte Carlo method or 1 for Hill’s method; the INPUT_FORMAT parameter should be set equal to 0 for a basic input file or 1 for an advanced input file; the MODE parameter should be set equal to 0 when all MPFTs are to be calculated or 1 when selected MFPTs are to be calculated (MFPT keyword needed as introduced above).
For example: to use the Monte Carlo method, with a basic input file and for all MFPTs to be calculated, the user runs RaTrav input.txt output.txt 0 0 0; to use Hill’s method, with an advanced input file and for selected MFPTs to be calculated, the user runs RaTrav input.txt output.txt 1 1 1.
A basic example
To better explain the functionalities of the RaTrav tool, we present in Figure 1 a simple irregular network with nine nodes (black dots) and nine edges (links between nodes). It has one cycle (edges coloured in green); nodes have different exit probabilities ${u}_{l{l}^{\prime}}$ (probabilities of passing defined as common fractions; coloured in black, in case of a cycle in green, in case of waiting on a node in cyan), and local time scales ${\tau}_{l{l}^{\prime}}$ (the cost of passing defined as decimal fractions with arrows; coloured in red). For some nodes there are waiting probabilities (i.e. transitions to the same node; cyan circular arrows); sometimes the transition probability from an edge node is set equal to one, instead of a waiting probability with an associated waiting time (weight of self transition). Both variables ${u}_{l{l}^{\prime}}$ and ${\tau}_{l{l}^{\prime}}$ may be symmetric or asymmetric. The weight of an edge is the time required for transition between states, whereas the waiting time is the lifetime of the molecule in a particular state. These times can be related to experimental values, for example, the halflife of a complex can be calculated from its dissociation rate k_{off}, τ_{1/2}= ln(2)/k_{off}[19].
The network presented in Figure 1 (input and output files may be found in Additional file 1) is represented in RaTrav format as follows (nodes are numbered from the left to the right, and from top to bottom, so S=0, E1=7, E2=8):
Please note that node numbers must be written in increasing order, but the neighbours of each node may be written in any order. If we add the following lines to the input file:
the output using Hill’s method (WALK and SIMU not used) is:
and with the Monte Carlo method (WALK and SIMU used) is:
For the presented network, the MFPT between S and E1 is calculated to be 80.300, between S and E2 to be 83.233, and between S and E1 or E2 (multiple final state) to be 61.556. Two observations can be made: for the submitted network Hill’s method is faster and gives exact results; using the Monte Carlo method, with 1000000 walkers, gives similar results. In case of reducing the number of walkers to 100000 but increasing the number of simulations to 10:
the following results with Monte Carlo can be obtained:
In this case the program also returns the standard deviation for each MFPT; comparing each MFPT, and its associated error, with the equivalent but exact result from Hill’s method, we may notice that one standard deviation may be insufficient to denote each MFPT pair to match; however, within three standard deviations each MFPT pair should match. This may not always be the case for more complicated networks, since if the number of walkers in each simulation is too low, every node may not be visited by a walker from the ensemble (ergodicity issues).
Important remarks

1.
When using the Hill method the final state must be different from the initial state. However, when using the Monte Carlo method one can identify such states and obtain the MFPT, which is in fact a return time to the origin.

2.
For the Hill method the number of final states cannot exceed N2, where N is the total number of states (nodes in a graph). However, for the Monte Carlo method one can define such a configuration of states, e.g. MFPT 0 1 2 3 for a square.

3.
By choosing the Monte Carlo method the user is able to define ultrafast transitions between all or selected states, for which the local passage time along an edge between such states is set equal to zero. For the Hill method at least one local time scale must be different from zero.

4.
A network does not have to be compact. If there is no path between states, RaTrav will return MFPT as ’Infinity (not accessible)’.
Results and discussion
In this section we provide two applications for which the use of the RaTrav tool dedicated to MFPT calculations provides meaningful results. Each subsection includes a theoretical introduction to the problem, followed by a guide for the reader as to how to construct the appropriate RaTrav input files, and finally a discussion of the RaTrav results. The files accompanying the examples are available in Additional file 1.
Analysis of conformational pathways within a proteinprotein binding funnel
In the molecular machinery of life proteins are responsible for a diverse array of functions. However, the great majority of biological functions are mediated not by isolated proteins but by their interactions. In addition to predicting the correct geometry of proteinprotein complexes (in 3D) from their unbound components, for which a number of fully automated servers now exist, see for example the SwarmDock server [20] or the ClusPro server [21], of equal importance is to study the dynamics of binding, i.e. how the binding partners, upon complex formation, sample the binding funnel.
Studying the topological properties of proteinprotein binding funnels will enable us to understand how to change the dynamics of proteinprotein association in a controlled way. The importance of being able to do this relates to rational drug design, where funnel sampling becomes particularly important when designing a series of similar protein ligands (such as proteins with a few key point mutations), or blocking peptides, and ascertaining if they are likely to be more effective in inhibiting a particular receptor proteinbinding site more than the wildtype protein ligand.
Automatic generation of proteinprotein conformational space networks (in RaTrav formatted files), for any protein receptor/ligand pair, was recently incorporated into our docking tool, the SwarmDock Server [20]. For this study, we chose the vitamin Dbinding protein/actin complex (Brookhaven protein database code, 1KXP) [22], which was previously studied by us in terms of conformational occupation probabilities and their usefulness to filter away nonfunnel like proteinprotein energy structures, thus improving the ranking of the correct docking poses [23]. The above study, based on state occupancies, was useful in distinguishing between true positive and false positive binding funnels. In the example described below we focus on the properties of the true positive binding funnel and calculate mean firstpassage times between distinct conformational states within the funnel, i.e. we are interested in finding the favourable transition path from the top to the bottom of the binding funnel.
The initial network of 32 conformational states, generated by the SwarmDock server, is depicted in Figure 2. The assigned quality of each state, that is its similarity with the final bound complex state, was based in accordance to the CAPRI (Critical Assessment of PRediction of Interactions) criteria [24], on three quantities: fraction of native contacts (fnat), interface root mean square deviation (IRMSD) and ligand root mean square deviation (LRMSD). These values are used to classify the conformations as incorrect (fnat < 0.1 or (LRMSD > 10Å and IRMSD > 4Å)), acceptable ((fnat ≥ 0.3 and LRMSD > 5Å nd IRMSD > 2Å) or ((fnat ≥ 0.1 and fnat < 0.3) and (LRMSD ≤ 10Åor IRMSD ≤ 4Å))), medium quality ((fnat ≥ 0.5 and LRMSD > 1Å and IRMSD > 1Å) or ((fnat ≥ 0.3 and fnat < 0.5) and (LRMSD ≤ 5Å or IRMSD ≤ 2Å))) or high quality (fnat ≥ 0.5 and (LRMSD ≤ 1Å or IRMSD ≤ 1Å)), relative to the conformation of the native bound complex (i.e. the conformation at the bottom of the binding funnel).
Similarly to the previous study [23], the transition probability for states with ligand RMSD above 6Å was set to zero. For the remaining states, transition probabilities were assigned based on energy value (the OPUSPSP potential [27]): if the energy of the state in question was higher than the neighbouring state, the exit probability was set to 1.0, otherwise it was set to exp(−Δ E), where Δ E is the difference in the energy between the two conformations. The exit probabilities for each node were normalized. In the present study, the weights of each edges are set equal to 1.0 (global time scale, we assume that conformational states are sampled uniformly in time), and if the transition probability is smaller than 10^{−6} for a particular edge it is removed. PDB files (Example1/*.pdb), the RMSD matrix file (Example1/1KXP.rmsd), the MFPT matrix file (Example1/1KXP.mfpt) and characteristics of the conformational states file (Example1/1KXP.info) are deposited in Additional file 1. Here we are interested in finding the favourable path(s) between incorrect conformational states (ID 9 and 30; states on the edge of the binding funnel) and the best state found by SwarmDock (the state closest to the bottom of the binding funnel; the native complex state). The state identified as being the closest to the native complex state, based upon the CAPRI criteria described above, is M21 (see Figure 2 and Example1/1KXP.info for details).
The RaTrav tool was run on this network (Example1/1KXP.ratrav) with 10^{6} random walkers. To speed up the computations we parallelized the calculations by creating a separate input file for each pair of states and by using the MFPT keyword (see Methods). We ran the calculations on our computational cluster (HP ProLiant BL460c) in parallel for 992 MFPTs (32∗32−32) for a maximum walltime of 14 days. A total of 291 MFPTs were reported to be not accessible; links between nodes with transition probabilities smaller than 10^{−6}. Of the remaining 701 MFPTs a total of 372 were reported back by RaTrav, the remainder, 329 were still in the process of being computed and assumed to be essentially infinite; that is a substantial number of the network walkers were stuck in dead ends.
Using the initial network (Example1/1KXP.ratrav), we assigned weights to edges based on calculated MFPTs; we removed links if MFPTs haven’t been calculated. On this MFPT weighted network (all weights are positive) we ran the Dijkstra algorithm [28] to find the favourable trajectories (shortest paths in units of mean firstpassage times) between conformational states on the edge of the binding funnel (I9 and I30) and the medium quality structure near the bottom of the funnel (M21).
The Dijkstra algorithm identified the following shortest paths: 9→20→22→21 (sum of MFPTs equals 4,624,065 steps) and 30→20→22→21 (sum of MFPTs equals 4,243,494 steps).
We analyzed the trajectory in terms of changes in interface and ligand RMSD, number of native and nonnative contacts and energy of each conformational state. We summarize the calculations in Table 1 and the values for all states are present in Additional file 1 (Example1/1KXP.info).
The final state is accessed slightly faster from the edge state I30 which has fnat = 25% than the edge state I9 with the slightly lower fnat = 22%. The difference in paths is only due to the MFPT for the first transition, to the state A20. Interestingly, the initial transitions (IDs 9→20 or 30→20) are down an energy gradient, at least in terms of the OPUSPSP potential used to score proteinprotein interactions. However, the next two transitions, A20→M22→M21 are movements to slightly increased energy states, indicating that the most time efficient pathways do not necessarily follow a decreasing energy gradient. Moreover, in terms of MFPTs, the transition between states A20 and M22, which looks in Figure 2 to be a bottleneck (kinetic trap), is not the limiting transition when comparing MFPTs for the first transition (I9 or I30 to A20) or the last transition in the pathway (M22 to M21).
In conclusion, by exploring MFPTs between conformational states within a proteinprotein binding funnel, dynamic information can be obtained that may provide important complementary, and sometimes counter intuitive, information on the funnel’s physical properties, which may facilitate rational design of competitive protein ligand inhibitors.
Free energy transduction between two coupled enzymatic reactions
As a second example of the utility of RaTrav, we describe here a method that enables a user to determine the nonequilibrium stationary fluxes in a system of interest on the basis of MFPT calculations. To this end we consider the action of a protein enzyme that converts the free energy between two coupled chemical reactions [29],[30]. Our primary task is to calculate the degree of coupling between the free energydonating reaction and the free energyaccepting one, the parameter that characterizes the efficacy of this chemochemical machine.
Let us consider the shaded box shown in Figure 3 which represents the network of conformational substates of a protein macromolecule that catalyses simultaneously two, in general, reversible reactions: the free energydonating reaction R_{1} ⇔ P_{1} and the free energyaccepting reaction R_{2} ⇔ P_{2}. A model structure of such a network consisting of two hundred nodes is depicted in Figure 4. A set of distinguished transition states, called the gates, underscored by enlarged black nodes, corresponds directly to the system of labels used in Figure 3. For the sake of clarity, we have limited our calculations to a rather simple network of states displaying a treelike topology, but much more complex networks of states can also be taken into account. In this context, the actual network can be thought of as a spanning tree, a loopless subnetwork consisting of edges with the highest transition probabilities per unit time between conformational substates.
The steadystate reaction fluxes
(i = 1,2) determine by definition the effective reaction rates, where [ E]_{ 0 } is the total concentration of the enzyme molecule and the overdot means a derivative with respect to time. The corresponding chemical forces
drive both chemical reactions. The symbols enclosed in the square brackets denote the molar concentrations of the chemical compounds R_{ i } and P_{ i } in the steady state (no subscript) or in the equilibrium state (the superscript eq). Here, β = (k_{ B }T)^{−1}, where k_{ B } is the Boltzmann constant and T denotes the absolute temperature.
The protein enzyme acts as a mesoscopic molecular machine, i.e. a system that enables two subsystems to perform work on one another, with the efficiency given by the ratio
of the output power J_{2}A_{2} to the input power J_{1}A_{1}. The minus sign in Eq. 8 arises from the fact that the free energy transduction can be realized only if the positive flux J_{2} > 0 corresponds to the negative chemical force A_{2} < 0 or vice versa [9]. This contradicts the second law of thermodynamics according to which the flux as well as the force should be of the same sign. However, if both reactions are coupled by the protein enzyme and proceed simultaneously in a common cycle, then the first reaction can change the direction of the second reaction. Consequently, the free energy dissipation is minimized at the expense of the free energy conversion, which can be realised with higher efficacy in the system. The effectiveness of this process is characterised by a degree of coupling
In the following, our main objective is to give some instructive hints of how to calculate a degree of coupling for the two chemical reactions controlled and gated by the network of conformational transitions depicted in Figure 4. This network forms the content of the grey box, representing the enzymesubstrate(s) complex M, shown schematically in Figure 3. We assume that the stochastic dynamics of conformational and chemical transitions within the enzymatic protein is jointly described by the system of master equations (see Eqs. 1 or 2). For the isolated network (without the chemical reactions) we assume the transition probabilities per unit time from the conformational substate l to its k_{ l } directly adjacent substates l^{′} to be
p determines the probability of transition to any conformational state neighbouring l. Following the detailed balance principle
the equilibrium occupation probability of the substate (node) l
where the summation runs over all nodes composing a network of conformational substates.
The nonzero thermodynamic forces, Eqs. 7, drive the system out of equilibrium breaking simultaneously the detailed balance condition for external and in general reversible transitions between gates (the transition states), which for l = 1,2 are characterised by the external transition times τ_{ l } (cf. Figure 3). These parameters can be compared to the longest MFPT within a network of states (nodes), which usually refers to the passage time, counted in the random walker steps, between the most distant pair of nodes. Together with the equilibrium occupation probabilities in Eq. 12 they determine the additional transition probabilities between the gates in the forward direction (exit from the gate l^{′′})
and in the backward direction (exit from the gate l^{′})
The factor ${e}^{\beta {A}_{l}}$ breaks the detailed balance symmetry. Here, the index l = 1,2 and the appropriate selection of primes and double primes is explained in Figure 4.
It follows from Eq. 13 and 14 that the probability p we introduced in Eq. 10 determines the characteristic unit of time which establishes an elementary time scale for the computer machine step. Thus, it should be firmly emphasized that the adequate determination of this quantity has a decisive meaning for the computational purpose at hand. To find p we select one gate l for which the external transition probability v_{ n m } is the highest. Then, assuming that at this gate the sum p + v_{ n m } = 1, we obtain the appropriate expression for p:
Consequently, in the case of the remaining nodes the sum of internal and possibly external transition probabilities must be complemented to unity, which means that additionally the nonzero waiting probabilities have to be taken into account on these states.
For the network of states depicted in Figure 4 (Example2/ratrav_fig4.in in Additional file 1) the longest MFPT, τ_{ max }, counted in random walk steps without external transitions, equals 3261 (see Example2/HI_ratrav_fig4.out in Additional file 1); the transition state with the lowest occupation probability corresponds to the gate l = 1^{′′} (although l = 1^{′} is another good choice), for which ${p}_{{1}^{\mathrm{\prime \prime}}}^{\text{eq}}\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}1/398$ (see Eq. 12 with ${k}_{{1}^{\mathrm{\prime \prime}}}\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}1$, $\sum _{l}{k}_{l}\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}2\phantom{\rule{0.3em}{0ex}}(N1)$ where N = 200 is the number of nodes in the network). Without loss of generality we consider in what follows only the case for τ_{ 1 } = τ_{2}. Selecting τ_{1} = 40, which is far below the maximal value τ_{ max } = 3261, we enforce the chemical reactions to be completely controlled by the internal dynamics of conformational transitions for which p ≈ 0.091 in Eq. 15. Moreover, assuming β A_{1} = 10 we require that the free energydonating reaction proceeds sufficiently far from equilibrium in the forward direction 1^{′′} → 1^{′}, while becoming almost negligible in the backward direction 1^{′} → 1^{′′}. In turn, we put β A_{2} = 0 to provide the detailed balance symmetry for the free energyaccepting reaction. Such a choice of parameters τ_{ l } and A_{ l } for l = 1,2 assures the highest transition probability from the node l = 1^{′′}.
We are now in a position to determine the steadystate reaction fluxes for both chemical transitions. The net fluxes are decomposed into oneway fluxes
where the signs plus and minus refer to the forward and the backward directions, respectively. To find them we must perform four independent calculations. Following the Hill reasoning [9] described in Methods, instead of the oneway steady state fluxes, it is enough to calculate the MFPTs between correctly identified nodes in the modified network of states. The schemes illustrated in Figure 5, explain in a pictorial form the direct relation between the MFPT and a oneway stationary flux J_{±l}. According to the Hill algorithm
where ∗ is the additional target node connected to the network in a position where the intermediate nodes (gates) l^{′′} for J_{+l}, or l^{′} for J_{−l} are localised (see Figure 5). The algorithm allowing a calculation of the oneway stationary flux between two directly connected states (nodes) in the arbitrary network of states proceeds in the following steps:

1.
Create a network (graph) with connections (links) between nodes in accordance with the RaTrav input file format (see input files in Example2).

2.
Select two nodes connected directly by the link. In Figure 5a it is the reversible external transition between the node 1^{′} and the node 1^{′′} characterized by the exit probabilities v _{ ±1 }.

3.
Choose a direction in which the flux will be calculated. In Figure 5a the flux J _{+1} is assumed to be in the counterclockwise direction.

4.
Redirect one of the two (if reversible) transitions along a link to the additional node added to the original network. In Figure 5a the transition from the node 1^{′′} to the node 1^{′} has been redirected to the new node labeled there by the star.

5.
Calculate the MFPT between the initial node (1^{′}) and the final node (∗).

6.
The inverse of this MFPT determines the oneway stationary flux.
Interestingly, the MFPTs in Eqs. 18 and 19 turn out to be split into two parts
(see Theorem 2 in [30]). Here, the superscript ‘st’ indicates the stationary occupation probabilities of the selected gates which can be also computed by the RaTrav tool. A method of how to calculate all quantities, for example, on the r.h.s of Eq. 20 for l = 2 is explained graphically in Figure 6. The relations in Eqs. 20 and 21 offer a very useful way in which all component MFPTs from states l^{′} or l^{′′} to ∗ and hence the stationary fluxes in Eqs. 18 and 19 can be calculated. Moreover, they are of particular importance since they enable us to identify scales of two characteristic time contributions, distinguishing the conformational dynamics within the protein enzyme (the first component) and between two coupled chemical reactions each of which is gated by a single transition substate (the second component). In this second case both components, ${v}_{\pm l}{p}_{\pm l}^{\text{st}}$, can be thought of as the counterparts of the reciprocal equilibrium rate constants supposed by the transition state theory.
Input data used to compute all directed reaction fluxes in the network of states shown in Figure 4, accompanied by the outputs, are to be found in Additional file 1. The MFPTs required for a determination of the steadystate fluxes in Eqs. 18 and 19 and obtained by using of the Hill algorithm are as follows: τ(1^{′}→∗) = 5050.17, τ(1^{′′}→∗) = 5086078.41, τ(2^{′}→∗) = 423.29 and τ(2^{′′}→∗) = 461.62, whereas those obtained by application of the Monte Carlo simulations for WALK=1000000 and SIMU=1 are: τ(1^{′}→∗) = 5048.95, τ(1^{′′}→∗) = 5081053.61, τ(2^{′}→∗) = 425.39 and τ(2^{′′}→∗) = 463.55. Therefore, combining these numerical results, we conclude from Eqs. 16, 17, 18, 19 and 9, that the degree of coupling is
It is worth emphasizing that both numbers are very close to unity and this result requires sensible interpretation. Let us recall that the degree of coupling is defined as the ratio of the free energyaccepting reaction flux and the free energydonating reaction flux. We have assumed that both reactions are controlled (τ = 40 ≪ τ_{ max } = 3261) and simultaneously gated by the dynamics of stochastic transitions within a network of conformational states. The system of connections in this network is not arbitrary but displays the scalefree topology and fractality. The fractality results mainly from the effective repulsion between the most connected nodes, so called hubs, which tend to link to small degree nodes and not to each other. Two main hubs are clearly visible in Figure 4. Their existence enables the most distant separation between the pairs of gates (transition states) for the output and input reactions. As a consequence, the MFPTs between the gates shown in Figure 4 become comparable and hence the degree of coupling is very close to unity. We have examined that all other configurations of the gates decreases the value of this parameter. There is still an open question for scientists of how to construct the effective networks of states for the realworld protein machines to predict and explain their basic functions. We hope, at least to some extent, that our software contributes to manage this task.
Conclusions
There is a wide range of problems related to the theory of stochastic processes where the mean firstpassage time quantity may be applied to describe the dynamics of the networked system. RaTrav is a software tool for calculating MFPTs on any arbitrary network or graph representing a substrate for Markovian processes, defined by the users in accordance with their requirements. MFPTs may be calculated between a pair of states or between an initial state and multiple final states. Moreover, exit probabilities from the nodes as well as local time scales along the edges may be assigned by the user. A choice between two MFPT calculation methods is made available: a stochastic Monte Carlo method and the combinatorial Hill method. To highlight the usefulness of the RaTrav tool, we presented in this article two examples of biochemical processes where the calculation of MFPTs plays an important role. For the first example (analysis of a proteinprotein binding funnel), due to the large number of cycles within the network, the Monte Carlo method was applied. For the second, a treelike network of conformational states for an enzyme, the Hill algorithm was applied in order to return results much faster and without any approximations. To our best knowledge, RaTrav is not only the first open source computational package for computing MFPTs, but as well the first computational tool to be made available where the local transition times of the network edges have been successfully introduced into Hill’s method.
We are aware that there is a substantial gap between physics and biology in terms of describing the network properties such as the proper sampling of states, assignment of transition probabilities and assignment of weights of edges (local times) in line with a chosen reaction coordinate. However, with further development in various fields, we believe that RaTrav, as a general tool, can be applied to a wide group of problems, where a biosystem or a process can be represented as a complex network.
Availability and requirements
Project name: RaTravProject home page: http://sourceforge.net/projects/ratravOperating system(s): Platform independentProgramming language: C++Other requirements: Boost C++ Libraries (http://www.boost.org/)License: GNU General Public License, version 3 (http://opensource.org/licenses/GPL3.0) for RaTrav. Boost Software License (http://www.boost.org/users/license.html) for Boost C++ Libraries.
The package contains full source code, binary version of RaTrav and manual.
Authors’ information
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint first authors.
References
 1.
Klipp E, Liebermeister W, Wierling C, Kowald A, Lehrach H, Herwig R: Systems Biology: A Textbook. 2009, Weinheim: WileyBlackwell
 2.
van Kampen NG: Stochastic Processes in Physics and Chemistry. 2007, Amsterdam: Elsevier B. V
 3.
Gardiner CW: Stochastic Methods: A Handbook for the Natural and Social Sciences. 2009, Berlin: Springer
 4.
Redner S: A Guide to FirstPassage Processes. 2001, Cambridge: Cambridge University Press
 5.
Berg HC: Random Walks in Biology. 1993, Princeton: Princeton University Press
 6.
Newman MEJ: Networks. An Introduction. 2010, Oxford: Oxford University Press
 7.
Klafter J, Sokolov IM: First Steps in Random Walks. From Tools to Applications. 2011, Oxford: Oxford University Press
 8.
Condamin S, Benichou O, Tejedor V, Voituriez R, Klafter J: Firstpassage times in complex scaleinvariant media. Nature. 2007, 450: 7780.
 9.
Hill TL: Free Energy Transduction and Biochemical Cycle Kinetics. 1989, New York: Springer
 10.
Krivov SV, Muff S, Caflisch A, Karplus M: Onedimensional barrierpreserving freeenergy projections of a βsheet miniprotein: new insights into the folding process. J Phys Chem B. 2008, 112: 87018714.
 11.
Kreuzer SM, Moon TJ, Elber R: Catch bondlike kinetics of helix cracking: network analysis by molecular dynamics and milestoning. J Chem Phys. 2013, 139: 121902
 12.
Mohr OSD, Stefanovic D: Firstpassage properties of molecular spiders. Phys Rev E. 2013, 88: 012724
 13.
Hawk AT, Konda SSM, Makarov DE: Computation of transit times using the milestoning method with applications to polymer translocation. J Chem Phys. 2013, 139: 064101
 14.
Asfaw M, AlvarezLacalle E, Shiferaw Y: The timing statistics of spontaneous calcium release in cardiac myocytes. PLoS One. 2013, 8: e62967
 15.
Newton PK, Mason J, Bethel K, Bazhenova L, Nieva J, Norton L, Kuhn P: Spreaders and sponges define metastasis in lung cancer: a Markov chain Monte Carlo mathematical model. Cancer Res. 2013, 73: 27602769.
 16.
Kurzynski M, Chelminiak P: Temperature and detectionwavelength dependence of the electron transfer rates in initial stages of photosynthesis. J Phys Chem B. 2013, 117: 1233912346.
 17.
Torchala M, Chelminiak P, Bates PA: Mean firstpassage time calculations: comparison of the deterministic Hill’s algorithm with Monte Carlo simulations. Eur Phys J B. 2012, 85: 116
 18.
Torchala M: Efficient parallel computations using MPI in chemistry and physics. Adv Cheminform. 2007, 1: 2127.
 19.
Copeland RA: Conformational adaptation in drugtarget interactions and residence time. Future Med Chem. 2011, 3: 14911501.
 20.
Torchala M, Moal IH, Chaleil RAG, FernandezRecio J, Bates PA: SwarmDock: a server for flexible proteinprotein docking. Bioinformatics. 2013, 29: 807809.
 21.
Comeau SR, Gatchell DW, Vajda S, Camacho CJ: ClusPro: an automated docking and discrimination method for the prediction of protein complexes. Bioinformatics. 2004, 20: 4550.
 22.
Otterbein LR, Cosio C, Graceffa P, Dominguez R: Crystal structures of the vitamin Dbinding protein and its complex with actin: structural basis of the actinscavenger system. Proc Natl Acad Sci USA. 2002, 99: 80038008.
 23.
Torchala M, Moal IH, Chaleil RAG, Agius R, Bates PA: A Markovchain model description of binding funnels to enhance the ranking of docked solutions. Proteins. 2013, in press: doi: 10.1002/prot.24369
 24.
Lensink MF, Wodak SJ: Docking, scoring and affinity prediction in CAPRI. Proteins. 2013, in press: doi: 10.1002/prot.24428
 25.
Hwang H, Vreven T, Janin J, Weng Z: Proteinprotein docking benchmark version 4.0. Proteins. 2010, 78: 31113114.
 26.
Bastian M, Heymann S, Jacomy M: Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media. 2009, San Jose, California: AAAI Publications, 361362.
 27.
Lu M, Dousis AD, Ma J: OPUSPSP: an orientationdependent statistical allatom potential derived from sidechain packing. J Mol Biol. 2008, 376: 288301.
 28.
Dijkstra EW: A note on two problems in connexion with graphs. Numer Math. 1959, 1: 269271.
 29.
Kurzynski M: The Thermodynamic Machinery of Life. 2006, Berlin: SpringerVerlag
 30.
Kurzynski M, Chelminiak P: Mean firstpassage time in the stochastic theory of biochemical processes. Application to actomyosin molecular motor. J Stat Phys. 2003, 110: 137181.
 31.
Song C, Havlin S, Makse HA: Origins of fractality in the growth of complex networks. Nature. 2006, 2: 275281.
Acknowledgements
We are grateful to Raphael Chaleil for his useful comments on software development. This work was supported by Cancer Research UK and the Polish Ministry of Science and Higher Education (project N N202 180038).
Author information
Additional information
Competing interests
The authors declare that they have no conflict of interest.
Authors’ contributions
MT and PC designed, wrote and tested the RaTrav C++ code. MT and PC prepared the examples and generated the data. MT, PC and PB wrote the manuscript. All authors discussed the results and approved the final version of the manuscript.
Electronic supplementary material
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
About this article
Received
Accepted
Published
DOI
Keywords
 Markov processes
 Master equation
 Random walk
 Mean firstpassage time
 Stationary flux
 Monte Carlo method
 Hill’s method
 Complex networks
 Chemical kinetics
 Proteinprotein binding funnel
 Proteinprotein interactions
 Proteinprotein docking
 Free energy transduction