Inferring biochemical reaction pathways: the case of the gemcitabine pharmacokinetics
- Paola Lecca^{1}Email author,
- Daniele Morpurgo^{1},
- Gianluca Fantaccini^{1},
- Alessandro Casagrande^{1, 2} and
- Corrado Priami^{1, 2}
DOI: 10.1186/1752-0509-6-51
© Lecca et al.; licensee BioMed Central Ltd. 2012
Received: 19 December 2011
Accepted: 23 April 2012
Published: 28 May 2012
Abstract
Background
The representation of a biochemical system as a network is the precursor of any mathematical model of the processes driving the dynamics of that system. Pharmacokinetics uses mathematical models to describe the interactions between drug, and drug metabolites and targets and through the simulation of these models predicts drug levels and/or dynamic behaviors of drug entities in the body. Therefore, the development of computational techniques for inferring the interaction network of the drug entities and its kinetic parameters from observational data is raising great interest in the scientific community of pharmacologists. In fact, the network inference is a set of mathematical procedures deducing the structure of a model from the experimental data associated to the nodes of the network of interactions. In this paper, we deal with the inference of a pharmacokinetic network from the concentrations of the drug and its metabolites observed at discrete time points.
Results
The method of network inference presented in this paper is inspired by the theory of time-lagged correlation inference with regard to the deduction of the interaction network, and on a maximum likelihood approach with regard to the estimation of the kinetic parameters of the network. Both network inference and parameter estimation have been designed specifically to identify systems of biotransformations, at the biochemical level, from noisy time-resolved experimental data. We use our inference method to deduce the metabolic pathway of the gemcitabine. The inputs to our inference algorithm are the experimental time series of the concentration of gemcitabine and its metabolites. The output is the set of reactions of the metabolic network of the gemcitabine.
Conclusions
Time-lagged correlation based inference pairs up to a probabilistic model of parameter inference from metabolites time series allows the identification of the microscopic pharmacokinetics and pharmacodynamics of a drug with a minimal a priori knowledge. In fact, the inference model presented in this paper is completely unsupervised. It takes as input the time series of the concetrations of the parent drug and its metabolites. The method, applied to the case study of the gemcitabine pharmacokinetics, shows good accuracy and sensitivity.
Background
Drug metabolism is a process by which pharmaceutical substances undergo biochemical modification in living organisms. Such modifications may lead to the generation of an active drug from an inactive precursor or in other cases lead to the generation of an inactive compound from an active substance, aiding in detoxication. This process is performed by a network of biochemical reactions that modify the pharmaceutical substances. The mechanisms of action of a drug refer to the specific biochemical, chemical, or physical interactions through which a drug substance exerts its pharmacological effect. The biochemical transformations of a drug form part of the pharmacokinetic pathway, whereas the interactions of the drug on its targets compose the pharmacodynamic pathway. Our study focuses on the reconstruction of network of biochemical interactions taking place between the drug, its metabolites, and their cellular targets. In brief, reconstructing a pharmacokinetic model starting from in-vitro experiments.
At the macroscopic level, the pharmacokinetics concerns with the intracellular uptake kinetics and the metabolic transformations of a drug, usually performed by specialized enzymatic systems. Similarly, at the microscopic level, the pharmacodynamics concerns with biochemical interactions between the drug, or in some cases its metabolites, and their molecular targets. The drug metabolites and the drug targets are components of a network of interactions. This network is usually represented by a graph whose nodes are the system’s components (drug, metabolites and targets) and the edges between nodes link two biochemically interacting components. The directions of the edges specify the directions of the chemical reactions and the sign of the edges denotes the inhibitory or activatory effect of one component on another.
To infer the directional signed edges of the network it is common to use experimental time series of the concentration of the components of the systems, i.e. of the nodes of the network we want to infer. The network inferred from the species dynamics will include edges between those nodes whose time series exhibit sufficient functional connectivity, typically defined as a measure of coupling exceeding a predetermined threshold [1]. The same time series used to infer the network can be used also to estimate the rate constants that parametrize the interactions between the biochemical species. The preliminary network can be fitted to the experimental time series to eventually detect those edges whose kinetics is null, and that can be removed from the network.
In accordance to several studies of the late 1990s that indicated that poor understanding of pharmacokinetics properties and toxicity were important causes of costly late-stage failures in drug development [2], it has become appreciated the ability to infer and calibrate pharmacokinetic models, whose usefulness is now widely recognized through all the process of drug design and clinical development [3].
Our study proposes a method to deduce both reactions pathways and kinetic parameters of biotransformations of drugs from time series data of measured concentrations of metabolites. Determining the pathways in chemical reaction networks from time series data has been an active area of research over a decade. A comprehensive review of available techniques can be found in [4–6]. Unlike in the most part of network inference techniques, in our approach a probabilistic model of parameter inference is part of the procedure of the identification of interconnection topology. The integration of parameters estimation into a method of topology identification results in a well performing network inference algorithm. The parameter estimation works as a sort of model discriminator selecting only the interaction responsible for the kinetic of the system. In this way, the whole procedure is able to recover a good approximation of the true network topology, especially for metabolic networks. In this study, we focus on gemcitabine metabolic pathway. Gemcitabine is an oncological drug used in various carcinomas: non-small cell lung cancer, pancreatic cancer, bladder cancer and breast cancer [7]. It is being investigated for use in esophageal cancer, and is used experimentally in lymphomas and various other tumor types. The gemcitabine reaction pathway has been recently investigated [7] and new observational data provided by the experiments of Veltkamp et al. [8] contribute to elucidate the cellular pharmacology of gemcitabine. Therefore, the gemcitabine pathway is a good case study for our inference algorithm.
The majority of the literature about application of inference techniques in pharmacokinetics and pharmacodynamics modelling concerns the parameter estimation, rather then the identification of the interaction network structure. Moreover, the majority of the methods of parameter estimation fits continuous pharmacokinetics models to discrete time measurements of the drug’s kinetic and/or dynamic response. In pharmacokinetics, the first methods of inference were applied almost thirty years ago to the clinical problem of dose regimen [9]. They were mostly Bayesian methods.
More recently other parameter inference techniques and parameter fitting procedures have been developed. As a result, many software tools implementing them were developed and provided to the scientific community of modelers and pharmacologists (e.g. maximum likelihood inference techniques [10], parameter fitting by nonlinear mixed [11–13] effects are only two recent examples). Now, the application of inference and fitting procedures for parameter estimation has been extended to all phases of drug development [14], and it is no longer limited to the dose regime scheduling. In pharmacokinetics and pharmacodynamics, the parameter estimation techniques are applied to infer or fit two categories of parameters. The first category includes parameters for which a priori information is available, e. g. organ blood flows, organ volume fractions, receptor density, cellular signalling protein turnover, whereas the second category includes parameters with no a priori information, e. g. intrinsic clearance, transport and binding coefficients, or parameters representing specific properties of the drug such as partition coefficient.
The development of inference methods that deduce from time discretely observed concentrations of metabolites, both the network of the biotransformations and mechanisms of action as well as the kinetic parameters of these interactions is a new challenge of a new paradigm in drug design and discovery: the network pharmacology introduced by A. L. Hopkins [15]. Network pharmacology is grounded on network biology, that conceives a biological systems as a network whose nodes are the system’s components and the edges between nodes indicate the occurrence of interactions between the nodes. In the network-centric view of a biological system, the time evolution of biological systems, entities and processes is the result of the occurrence of the interactions among the system’s components. The dynamics of the network is governed by the kinetic parameters of the interactions. In most of the cases these parameters are not directly measurable and a priori knowledge is unavailable. Often, subjective prior beliefs on the parameters are difficult to be converted into a mathematically formulated model and prior. All these considerations motivated our work and our attempt to develop inference methods which are robust against noise, efficient in computation and flexible enough to meet different constraints.
The paper outlines as follows: (i) we first present the model of inference based on the estimation of the time-lagged correlation functions; (ii) we review the parameter inference methods of KInfer; (iii) we describe the in vitro cytotoxicity, cellular uptake, efflux, biotransformation, and nucleic acid incorporation of the gemcitabine and present the experimental data used in this study; (iv) we show and discuss the inferred network of biotransformations and mechanisms of action of gemcitabine as obtained by our inference algorithm. Finally, (v) we end up the paper with some final conclusions and plans of future works.
A time-lagged correlation based network inference
The time-lagged correlation is a measure that is related to the Pearson correlation coefficient but that takes into account shifts in time, i.e. lags, between the expression of the causal effector and the target module. The key-idea of correlation-based methods for network inference is that for a data set comprising time series profiles of N species, X_{ i }(t) for i = 1,…,N, the correlation matrix of the N(N − 1)/2independent pairwise correlation coefficients can be used to cluster the data set into groups of species within which correlations between species are high, when compared to pairwise correlations between different groups. These groupings can most easily be discerned by calculating a matrix of pairwise distances, d_{ ij } from the correlation matrix, whereby d_{ ij } = 0for two species which are completely positively correlated and increases as the pairwise correlation coefficient decreases.
The distance matrix can subsequently be analyzed to find, and visualize, patterns of proximities between species. A large family of techniques for the analysis and visualization of proximity data coming from similarity/distance matrix is available in literature. Many of these techniques refer to methods of data clustering, and, more recently, to generalized methods of graph splatting such as layout algorithms and multidimensional scaling algorithms [20–24]. The purpose shared by all these techniques is to reveal and, possibly, visualize, patterns of similarities and topological structures underlying the data. In this study we use a multidimensional scaling method. Multidimensional scaling transforms a distance matrix into a set of coordinates such that the distances derived from these coordinates approximate as well as possible the original distances. This way, the multidimensional scaling techniques are used in information visualization for exploring similarities or dissimilarities in data. In fact, the visualization of the similarities/distances among data facilitates the interpretation and the analysis of complex network, and, for this reason, they are currently becoming popular in bioinformatics [17, 25, 26] in network inference [16, 17] and network analysis problems. We will describe later in the paper the details of the multidimensional scaling we used for this study.
The values of τin (2) range over the interval of the rate limitingness across the entire reaction network.
where N_{ μν } is the number of points in the particular rectangle labeled μ ν, N_{ tot } is the total number of points and A_{ μν }is the area of the rectangle.
where V_{ μν } is the Voronoi cell μν.
is the maximum absolute value of the correlation between two species with a time lag τ.
The distances are used to find the connections between the different species in the system. Namely, the distances measure the relatedness of the time series describing the time behaviour of the species; the more related the more likely that two species are connected by a single reaction. The greater the distance between two species the more likely that two species are connected by several intermediate reactions or they are not connected at all.
so that Kruskal-Shepard scaling is also known as least-squares scaling [31]. This scaling seeks values of the coordinates ${\mathbf{x}}_{1},{\mathbf{x}}_{2},\dots ,{\mathbf{x}}_{n}\in {\mathbf{R}}^{D}$ to minimize S. For a given value of D the estimation of the coordinates is performed in such a way that the pairwise distances are preserved as well as possible. The choice of D is arbitrary in principle, but low in practice: D = 2; 3 are the most frequently used dimensions, for the simple reason that the points serve as easily visualized representors of the species. A downhill simplex (amoeba) algorithm is used to minimize S_{ D }[33, 34]. In the application domain of this study, the downhill simplex method turned out to be more efficient that the gradient descendant minimization method that is usually exploited in the Kruskal-Shepard scaling algorithm.
In this study we reported the 2D and the 3D visualizations of the networks. Nevertheless, a set of Euclidean distances on p points can be represented exactly (i.e. with S_{ D }=0) in at most p−1dimensions. An insufficient number of dimensions is not the only cause of non zero stress. It may be caused also by random measurements errors in the input data. In such cases, even if the “true” number of dimensions of the problem were known, this would not guarantee that the stress corresponding to that number of dimensions is null [35–37]. Unfortunately, in most datasets the true dimensionality of the problem is not known in advance, like in the case treated in this study. The commonly advocated procedure for determining the dimensionality is a heuristic one of seeking a sharp drop or “elbow” in the rate of decline of stress as dimensionality increases. In practice such elbows are rarely obvious. In fact is has often noted that the pattern of change of stress versus dimensionality, rather than having an elbow, is characterized by a smooth and gradual decline [38]. As reported by M. D. Lee in [38], although there are at least two variants [39, 40] of the multidimensional scaling that attempt to determine automatically the number of dimensions of the spatial representation they derive, there is not a rigorous and principled basis for this determination.
Then, we apply the rules of error propagation [41] to find the error on the covariance estimate given in Eq. 1.
where m is the number of records in the time series.
with δ r_{ ij }(τ) as in Eq. (17).
Then, we derive the undirected graph representing the network: species whose distance do not overcome the threshold undergo a biochemical interaction. In order to determine directions of the edge we need to infer a temporal ordering of the reaction events. Having knowledge of the temporal sequence of the reaction means knowing whether a perturbation of one species follows or proceeds that of another species. As suggested by Arkin et al. [17] the temporal ordering of variation in each of the variables can be assigned in the following way. If the time series for a given species has a maximum correlation at negative lags compared to a reference time series, then that species receives the input signals after the reference series, and vice versa. Similarly, if the two series are maximally correlated at zero lag but correlation tails to negative (positive) lags, variation in the given species closely follows (precedes) variation in the reference species. In this study, we do not consider negative lags, but without any loss of generality, we consider only positive lags and look for cause-effect relationships on a positive temporal scale.
In the next section we will describe the procedure of inference of the kinetic parameters of the interactions among the nodes of the network.
Inferring the kinetic parameters with KInfer
The tool we recently developed to infer the kinetic parameters is KInfer (Kinetics Inference). It takes as an input the experimental time series of each reactants in the systems and a set of chemical reactions that are supposed to occur among the species of that system. Therefore, we apply KInfer to the network model we inferred with the time-lagged correlation based inference procedure described in the previous section.
We report here the key statements of the theory implemented by the tool and refer the reader to the references [18, 42, 43].
where θ_{ i } , i=1,2,…,N, is the vector of the rate coefficients, which are present in the expression of the function f_{ i }; α_{ w }∈R, and N_{ i } is the number of parameter in the f_{ i }rate equation.
where ${\mathrm{\Omega}}_{{\mathbf{X}}^{\left(i\right)}}$ is the sample space of X^{(i)}, and K_{ i } is the number of chemical species in the expression for f_{ i }.
While the increments/decrements are conditionally independent given the starting point ${X}_{i}\left({t}_{k}\right)$, the random variables D_{ i }(t_{ k }) are not independent of each other. Intuitively, if X_{ i }(t_{ k })happens to be below its expected value because of random fluctuations, then the following increment D_{ i }(t_{k + 1}) can be expected to be bigger as a result, while the previous one D_{ i }(t_{ k }) will be smaller. A simple calculation allows us to obtain the covariance matrix of the vector of increments for the i-th species. This is a banded matrix C_{ i }≡C=Cov(D_{ i }) with diagonal elements given by 2σ^{2} and a non-zero band above and below the diagonal given by −σ^{2}. All other entries of Care null.
where D={D_{1},…,D_{ N }}, D_{ i }=D_{ i }(t_{1}),D_{ i }(t_{2}),…D_{ i }(t_{ M })(i=1,2,…,N), and ${\mathbf{m}}_{i}\left({t}_{k-1}\right)\equiv E\left[{f}_{i}\left(\mathbf{X}\right({t}_{k-1}),{\theta}_{i})\right]$.
Eq. (26) can be optimized w. r. t. the parameters Θ=(θ_{1},θ_{2},…,θ_{ N })of the model to yield estimates of the parameters themselves and of the noise level. The inferred parameters are then used as multiplicative coefficients for each element of the correlation matrix, so that we obtain a weighted correlation matrix, reflecting the physical interaction in the network.
This procedure makes null the element of the correlation matrix if the rate constant multiplying this element is a null kinetic rate constant or if this rate constant is affected by a relative error of 50%.
Metabolism and mechanisms of action of gemcitabine
Gemcitabine (2^{ ′ },2^{ ′ }-difluorodeoxycytidine, dFdC) is an anticancer drug, which is effective against solid tumours, including non-small-cell lung cancer and pancreatic cancer. Gemcitabine is an anticancer nucleoside analog in which the hydrogen atoms on the 2^{ ′ }-carbon of deoxycytidine are replaced by fluorine atoms. As with fluorouracil and other analogues of pyrimidines, the triphosphate analogue of gemcitabine replaces one of the building blocks of nucleic acids, in this case cytidine, during DNA replication. The process arrests tumor growth, as only one additional nucleoside can be attached to the “faulty” nucleoside, resulting in termination of DNA replication and ultimately leading the cells to apoptosis.
Results
We first applied the algorithm of network inference to deduce some of the biotransformations of gemcitabine from the experimental time series of metabolite concentrations available in [8]. The algorithm can infer the reactions between the measured species, therefore, since the experiments in [8] measured the concentrations of dFdCout, dFdC, dFdCMP, dFdCDP, dFdCTP, dFdU, dFdUMP, dFdUDP, dFdUTP, the reactions we expect to infer are only those among these chemical species.
Minimum and maximum correlation and corresponding value of the lag τ ( τ _{ min_ corr } and τ _{ max_ corr } , respectively) obtained from real experimental data
Reference species | Species | τ _{ max_corr } | Max correlation | τ _{ min_corr } | Min correlation |
---|---|---|---|---|---|
dFdCout | dFdC | 6 | -0.5131595 | 0 | -0.5282891 |
dFdCout | dFdCMP | 6 | -0.6701863 | 0 | -0.7983718 |
dFdCout | dFdCDP | 6 | -0.574825 | 0 | -0.7518225 |
dFdCout | dFdCTP | 6 | -0.1352853 | 0 | -0.2802762 |
dFdCout | dFdU | 6 | -0.2821309 | 0 | -0.4398082 |
dFdCout | dFdUMP | 6 | -0.4024222 | 0 | -0.5185132 |
dFdCout | dFdUDP | 6 | -0.04846271 | 0 | -0.1694 |
dFdCout | dFdUTP | 6 | 0.001938335 | 0 | -0.1202802 |
dFdC | dFdCMP | 0 | 0.8020463 | 6 | 0.6897214 |
dFdC | dFdCDP | 0 | 0.4703194 | 6 | 0.3333613 |
dFdC | dFdCTP | 0 | -0.01175614 | 6 | -0.1341448 |
dFdC | dFdU | 0 | 0.2034577 | 6 | 0.07041349 |
dFdC | dFdUMP | 0 | 0.6428315 | 6 | 0.5227021 |
dFdC | dFdUDP | 0 | 0.01913633 | 6 | -0.09642542 |
dFdC | dFdUTP | 0 | -0.1380516 | 6 | -0.2471314 |
dFdCMP | dFdCDP | 0 | 0.900924 | 6 | 0.7347594 |
dFdCMP | dFdCTP | 0 | 0.5163396 | 6 | 0.3631771 |
dFdCMP | dFdU | 0 | 0.6947208 | 6 | 0.5328304 |
dFdCMP | dFdUMP | 0 | 0.9040247 | 6 | 0.7781543 |
dFdCMP | dFdUDP | 0 | 0.4910136 | 6 | 0.3534895 |
dFdCMP | dFdUTP | 0 | 0.3758334 | 6 | 0.2393577 |
dFdCDP | dFdCTP | 0 | 0.8094718 | 6 | 0.6765684 |
dFdCDP | dFdU | 0 | 0.9141036 | 6 | 0.77697 |
dFdCDP | dFdUMP | 0 | 0.9074831 | 6 | 0.8139078 |
dFdCDP | dFdUDP | 0 | 0.7619494 | 6 | 0.6458895 |
dFdCDP | dFdUTP | 0 | 0.6994546 | 6 | 0.580302 |
dFdCTP | dFdU | 0 | 0.9740346 | 6 | 0.9208324 |
dFdCTP | dFdUMP | 0 | 0.7554619 | 6 | 0.7344294 |
dFdCTP | dFdUDP | 0 | 0.9872751 | 6 | 0.9356584 |
dFdCTP | dFdUTP | 0 | 0.9856026 | 6 | 0.9283316 |
dFdU | dFdUMP | 0 | 0.872246 | 6 | 0.8236066 |
dFdU | dFdUDP | 0 | 0.9585976 | 6 | 0.8808236 |
dFdU | dFdUTP | 0 | 0.9251018 | 6 | 0.8427518 |
dFdUMP | dFdUDP | 0 | 0.7730816 | 6 | 0.6595772 |
dFdUMP | dFdUTP | 0 | 0.6680645 | 6 | 0.5550562 |
dFdUDP | dFdUTP | 0 | 0.9860277 | 6 | 0.936058 |
dFdUTP | dFdUTP | 0 | 1 | 6 | 0.9634716 |
We used the time series of dFdCout, dFdC, dFdCMP, dFdCDP, dFdCTP, dFdUout, dFdU, dFdUMP, dFdUDP, dFdUTP generated by the simulation of the model as input to the inference algorithm. The simulated time series contain 200 points regularly spaced in the interval [0, 20] hours (see the curves in Figure 9). Unlike the set of real data, this set contains also a time course of the extracellular concentration of dFdU (dFdUout).
Minimum and maximum correlation and corresponding value of the lag τ ( τ _{ min_corr } and τ _{ max_corr } , respectively) obtained from synthetic time series of dFdC and its metabolites
Reference species | Species | τ _{ max_corr } | Max correlation | τ _{ min_corr } | Min correlation |
---|---|---|---|---|---|
dFdCout | dFdCout | 0 | 1 | 8 | 0.0002199688 |
dFdCout | dFdC | 2 | 0.523715 | 0 | 0.1466252 |
dFdCout | dFdCMP | 7 | 0.3980003 | 0 | -0.0301777 |
dFdCout | dFdCDP | 8 | -0.1001522 | 0 | -0.3789898 |
dFdCout | dFdCTP | 8 | -0.2498713 | 0 | -0.3428283 |
dFdCout | dFdUout | 0 | -0.1964403 | 8 | -0.2029342 |
dFdCout | dFdU | 8 | 0.1920973 | 1 | -0.2593897 |
dFdCout | dFdUMP | 8 | -0.2603844 | 0 | -0.4254257 |
dFdCout | dFdUDP | 0 | -0.2085703 | 8 | -0.2154594 |
dFdCout | dFdUTP | 0 | -0.1226727 | 8 | -0.126728 |
dFdC | dFdC | 0 | 1 | 8 | 0.4815612 |
dFdC | dFdCMP | 3 | 0.9411531 | 8 | 0.7860167 |
dFdC | dFdCDP | 8 | 0.1379433 | 0 | -0.3858859 |
dFdC | dFdCTP | 8 | -0.2253363 | 0 | -0.6472458 |
dFdC | dFdUout | 0 | -0.6342664 | 8 | -0.6438545 |
dFdC | dFdU | 6 | 0.6141992 | 0 | 0.0142063 |
dFdC | dFdUMP | 8 | -0.1229368 | 0 | -0.6612913 |
dFdC | dFdUDP | 8 | -0.6378404 | 0 | -0.6527617 |
dFdC | dFdUTP | 0 | -0.4012575 | 8 | -0.4125833 |
dFdCMP | dFdCMP | 0 | 1 | 8 | 0.6754058 |
dFdCMP | dFdCDP | 8 | 0.3596562 | 0 | -0.01262979 |
dFdCMP | dFdCTP | 8 | 0.0264333 | 0 | -0.403415 |
dFdCMP | dFdUout | 8 | -0.750782 | 0 | -0.7628276 |
dFdCMP | dFdU | 5 | 0.7362333 | 0 | 0.4506942 |
dFdCMP | dFdUMP | 8 | 0.09732282 | 0 | -0.3657606 |
dFdCMP | dFdUDP | 8 | -0.7043206 | 0 | -0.7583209 |
dFdCMP | dFdUTP | 0 | -0.5428359 | 8 | -0.5446803 |
dFdCDP | dFdCDP | 0 | 1 | 8 | 0.6404886 |
dFdCDP | dFdCTP | 0 | 0.9100672 | 8 | 0.7518772 |
dFdCDP | dFdUout | 8 | 0.01484932 | 0 | -0.1177313 |
dFdCDP | dFdU | 0 | 0.5647046 | 8 | -0.008875956 |
dFdCDP | dFdUMP | 0 | 0.8015955 | 8 | 0.5253429 |
dFdCDP | dFdUDP | 8 | 0.07034648 | 0 | -0.05754906 |
dFdCDP | dFdUTP | 8 | -0.521363 | 0 | -0.5894032 |
dFdCTP | dFdCTP | 0 | 1 | 8 | 0.6794037 |
dFdCTP | dFdUout | 8 | 0.3093714 | 0 | 0.1864065 |
dFdCTP | dFdU | 0 | 0.3050905 | 8 | -0.2878365 |
dFdCTP | dFdUMP | 0 | 0.8551774 | 8 | 0.4323933 |
dFdCTP | dFdUDP | 8 | 0.3505927 | 0 | 0.2575427 |
dFdCTP | dFdUTP | 8 | -0.2830911 | 0 | -0.3565221 |
Minimum and maximum correlation and corresponding value of the lag τ ( τ _{ min_corr } and τ _{ max_corr } , respectively) obtained from the synthetic time series of dFdU metabolites
Reference species | Species | τ _{ max_corr } | Max correlation | τ _{ min_corr } | Min correlation |
---|---|---|---|---|---|
dFdUout | dFdUout | 0 | 1 | 8 | 0.8183285 |
dFdUout | dFdU | 0 | -0.6570017 | 7 | -0.7962606 |
dFdUout | dFdUMP | 0 | 0.1308286 | 8 | -0.2427138 |
dFdUout | dFdUDP | 0 | 0.8595276 | 8 | 0.7535818 |
dFdUout | dFdUTP | 8 | 0.6199872 | 0 | 0.6165086 |
dFdU | dFdU | 0 | 1 | 8 | 0.4509447 |
dFdU | dFdUMP | 3 | 0.4809823 | 0 | 0.4264742 |
dFdU | dFdUDP | 8 | -0.3964367 | 0 | -0.506585 |
dFdU | dFdUTP | 8 | -0.400108 | 0 | -0.4027564 |
dFdUMP | dFdUMP | 0 | 1 | 8 | 0.5456419 |
dFdUMP | dFdUDP | 8 | 0.1875394 | 0 | 0.06438349 |
dFdUMP | dFdUTP | 0 | -0.1140161 | 8 | -0.1304115 |
dFdUDP | dFdUDP | 0 | 1 | 8 | 0.8698683 |
dFdUDP | dFdUTP | 8 | 0.567206 | 0 | 0.5078434 |
dFdUTP | dFdUTP | 0 | 1 | 8 | 0.9067803 |
Set of reactions inferred by the time-lagged correlation inference method from synthetic time series data: we denote with the letter “E” the expected reactions, and with a letter “U” those unexpected and incorrect, and with the letter P those reactions that are unexpected but that can be easily explained and are plausible in terms of correlation; the reaction dFdCMP →dFdU (i.e. reaction E13 in Table 6) has been not detected
Reaction ID | Reaction | Rate constant (k ± Δk) | Expected/Unexpected |
---|---|---|---|
R1 | dFdCout $\stackrel{{k}_{1}}{\to}$ dFdC | 7.10898534 ± 0.08340605 | E |
R2 | dFdC $\stackrel{{k}_{2}}{\to}$ dFdCout | 0.77373197 ± 0.20074963 | E |
R3 | dFdCout $\stackrel{{k}_{3}}{\to}$ dFdCMP | 0.70443170 ± 0.21974489 | P |
R4 | dFdCMP $\stackrel{{k}_{4}}{\to}$ dFdCout | 2.97694326 ± 1.04080703 | P |
R5 | dFdC $\stackrel{{k}_{5}}{\to}$ dFdCMP | 0.42700844 ± 0.15141087 | E |
R6 | dFdCMP $\stackrel{{k}_{6}}{\to}$ dFdC | 1.05564461 ± 0.29004594 | E |
R7 | dFdCDP $\stackrel{{k}_{7}}{\to}$ dFdCTP | 0.67814401 ± 0.33891991 | E |
R8 | dFdCTP $\stackrel{{k}_{8}}{\to}$ dFdCDP | 0.24600288 ± 0.12911296 | E |
R9 | dFdC $\stackrel{{k}_{9}}{\to}$ dFdUout | 0.07429936 ± 0.10274560 | P |
R10 | dFdUout $\stackrel{{k}_{10}}{\to}$ dFdC | 1.84455256 ± 0.41277353 | U |
R11 | dFdC $\stackrel{{k}_{11}}{\to}$ dFdU | 0.05303525 ± 0.11826494 | E |
R12 | dFdU $\stackrel{{k}_{12}}{\to}$ dFdC | 0.42656017 ± 0.36820500 | U |
R13 | dFdCMP $\stackrel{{k}_{13}}{\to}$ dFdU | 0.01746121 ± 0.04239150 | U |
R14 | dFdU $\stackrel{{k}_{14}}{\to}$ dFdCMP | 1.15296892 ± 0.87614147 | U |
R15 | dFdCDP $\stackrel{{k}_{15}}{\to}$ dFdU | 0.74470692 ± 0.04372718 | U |
R16 | dFdU $\stackrel{{k}_{16}}{\to}$ dFdCDP | 0.13066227 ± 0.30402961 | U |
R17 | dFdCTP $\stackrel{{k}_{17}}{\to}$ dFdU | 0.21526282 ± 0.24385105 | U |
R18 | dFdU $\stackrel{{k}_{18}}{\to}$ dFdCTP | 0.22459952 ± 0.31608938 | U |
R19 | dFdUout $\stackrel{{k}_{19}}{\to}$ dFdU | 0.15238122 ± 0.22702423 | E |
R20 | dFdU $\stackrel{{k}_{20}}{\to}$ dFdUout | 0.90365762 ± 1.15063193 | E |
R21 | dFdCDP $\stackrel{{k}_{21}}{\to}$ dFdUMP | 0.33079357 ± 0.29670272 | U |
R22 | dFdUMP $\stackrel{{k}_{22}}{\to}$ dFdCDP | 0.14176496 ± 0.32573757 | U |
R23 | dFdCTP $\stackrel{{k}_{23}}{\to}$ dFdUMP | 0.03484985 ± 0.08815358 | U |
R24 | dFdUMP $\stackrel{{k}_{24}}{\to}$ dFdCTP | 0.23689470 ± 0.32365144 | U |
R25 | dFdU $\stackrel{{k}_{25}}{\to}$ dFdUMP | 1.30633488 ± 0.31031032 | E |
R26 | dFdUMP $\stackrel{{k}_{26}}{\to}$ dFdU | 1.10583395 ± 0.20384256 | E |
R27 | dFdUout $\stackrel{{k}_{27}}{\to}$ dFdUDP | 0.19345312 ± 1.03917708 | P |
R28 | dFdUDP $\stackrel{{k}_{28}}{\to}$ dFdUout | 0.74875449 ± 1.17435092 | P |
R29 | dFdC $\stackrel{{k}_{29}}{\to}$ dFdUTP | 0.12752520 ± 0.13267091 | U |
R30 | dFdUTP $\stackrel{{k}_{30}}{\to}$ dFdC | 0.99825505 ± 0.40035050 | U |
R31 | dFdCMP $\stackrel{{k}_{31}}{\to}$ dFdUTP | 0.52124642 ± 0.95261538 | U |
R32 | dFdUTP $\stackrel{{k}_{32}}{\to}$ dFdCMP | 0.46822249 ± 0.95099997 | U |
The set of reactions for which the ratio between the estimate of the rate constant and the estimate of its error is equal or greater than one
Reaction ID | Rate constant | Expected/Unexpected |
---|---|---|
R1 | k_{1}=7.11±0.08 | E |
R2 | k_{2}=0.8±0.2 | E |
R3 | k_{3}=0.7±0.2 | P |
R4 | k_{4}=3±1 | P |
R5 | k_{5}=0.43±0.15 | E |
R6 | k_{6}=1.1±0.3 | E |
R7 | k_{7}=0.7±0.3 | E |
R8 | k_{8}=0.25±0.13 | E |
R10 | k_{10}=1.8±0.4 | U |
R14 | k_{14}=1.2±0.9 | E |
R15 | k_{15}=0.74±0.04 | P |
R25 | k_{25}=1.3±0.3 | E |
R26 | k_{26}=1.1±0.2 | E |
R30 | k_{30}=1.0±0.4 | U |
The set of expected real reactions (from the cartoon of Figure 3)
Reaction ID | Reaction |
---|---|
E1 | dFdCout → dFdUout |
E2 | dFdCout → dFdC |
E3 | dFdC → dFdCout |
E4 | dFdC → dFdU |
E5 | dFdU → dFdUout |
E6 | dFdUout → dFdU |
E7 | dFdC → dFdCMP |
E8 | dFdCMP → dFdC |
E9 | dFdCMP → dFdCDP |
E10 | dFdCDP → dFdCMP |
E11 | dFdCDP → dFdCTP |
E12 | dFdCTP → dFdCDP |
E13 | dFdCMP → dFdUMP |
E14 | dFdU → dFdUMP |
E15 | dFdUMP → dFdU |
E16 | dFdUMP → dFdUDP |
E17 | dFdUDP → dFdUMP |
E18 | dFdUDP → dFdUTP |
E19 | dFdUTP → dFdUDP |
According to reaction R3, the decrement of dFdCout is directly proportional to the increment of dFdCMP via a single reaction consuming dFdCout and producing dFdCMP. However, the model includes also the uptake of external dFdC into the cell and its excretion, i.e. $\mathrm{dFdCout}\phantom{\rule{0.3em}{0ex}}\underset{{k}_{2}}{\overset{{k}_{1}}{\rightleftharpoons}}\phantom{\rule{0.3em}{0ex}}\mathrm{dFdC}$. Once gemcitabine is inside the cell, it undergoes a reversible reaction of phosphorilation, i.e: $\mathrm{dFdC}\phantom{\rule{0.3em}{0ex}}\underset{{k}_{6}}{\overset{{k}_{5}}{\rightleftharpoons}}\phantom{\rule{0.3em}{0ex}}\mathit{\text{dFdCMP}}$. The estimates of the rates obtained with KInfer for these reactions are: k_{1}=7.11, k_{2}=0.8, k_{5}=0.43 and k_{6}=1.1. k_{1} is one order of magnitude greater than k_{5}, and similarly k_{6} is one order of magnitude greater than k_{5}. As consequence, since (i) the uptake of dFdCout is much faster than the phosphorilation reaction, and (ii) the dephosphorilation of dFdC is much faster than the excretion from the cell [7, 8], the correlation-based inference model detects a significant correlation between dFdCout and dFdCMP through the reactions R3 and R4. Nevertheless, the algorithm is able to detect the uptake and the excretion of the dFdC (Reactions R1 and R2).
Finally, the correlation-basednetwork inference algorithm does not detect the reaction dFdCMP→dFdU that is reported in the cartoon of Figure 3, and later in the Table 6 (i.e reaction E13).
Table 5 reports also Reaction R10: $\mathit{\text{dFdUout}}\stackrel{{k}_{10}}{\to}\mathit{\text{dFdC}}$. According to the current knowledge, the interaction between dFdUout and dFdC is mediated by the pathway illustrated in Figure 14 (B). Therefore reaction R10 is not correct, because a decrement of the extra-cellular concentration of dFdU does not cause an increment of the intracellular concentration of dFdC. The opposite is plausible: a decrement of the intracellular dFdC is proportional to an increment of the intracellular dFdU, that can be released outside the cell. Finally, Table 5 reports reaction R30, that is not correct, because, as we can see in Figure 3, the metabolites of phosphorylated metabolites of dFdU are not converted in any metabolites of dFdC.
The reactions R11, R19 and R20 tagged as “expected” by the correlation-based inference procedure and reported in Table 4 have been cut off by the network calibration procedure, and thus they do not appear in Table 5.
Our procedure on the case study of gemcitabine metabolism outperforms the algorithm in [44] that has been designed for the determination of chemical reaction network interconnections from time series data too. However, we point out that the sensitivity and the accuracy defined in this way are global measures that average over the whole network giving two values as representative of the performance of the inference algorithm. A better assessment of the inference algorithm could be conducted if we disposed either of real biological but small-scale data. A comparison of these performances with the performances of other state-of-art tools is still in progress. Preliminary results of this analysis show us that the outcome of the network inference varies between tools and can be complementary. Moreover, each tool is tailored to a specific biological pathway and experimental set up. For this reason mainly, in the scientific community there is no general consensus that would have been reached declaring one network inference method as gold standard [45].
Conclusions
We presented a method for the prediction of the interactions within reaction networks from experimental time series of the concentration of the species composing the system. The overall aim of our investigation was to develop a procedure of system identification in which the parameter inference acts as tool of model refinement and elimination of false positive interconnections. Both the network topology inference and the parameter estimation have been designed to predict reaction pathways in chemical kinetic systems from noisy time resolved concentration measurements.
We developed a method that does not require any a priori information about the topology of the interconnections. It consists of two parts: the estimation of time-lagged correlation function between the components of the systems for the determination of direct edges between the components and the inference of the kinetic parameters to establish real dynamics and to cut “null dynamics” edges instead. The method has been developed to deducing the connectivities of chemical species, the reaction pathway, and the reaction mechanism of complex reaction systems. The procedure can process more than 3,000 reactions involving hundreds of species. In particular, our methodology coupling the connectivities inference to the parameter inference reveals to be suitable to infer reactions mechanisms of complex systems as drug metabolic network, where many feedback loops and reversible reactions are present. The data driven estimation of the time lag interval facilitates the use of the tool and minimize the a priori information to input. Good performances have been obtained on the test case of gemcitabine metabolism, that is supported by recent experimental studies. Finally, this work highlights five main aspects determining a sensitive and accurate inferability of a biochemical network with time-lagged correlation-based methods: (i) the availability of replicates of experiments, so that averaged time series can be used to soften the noise in the input data; (ii) the availability of data having a time resolution suitable to capture the time of onset of the correlation between the species and the number of chemical species that can be measured simultaneously; (iii) the availability of local measures on the level of individual edges and subnetworks, instead of having only global measures of time series averaging over the whole network dynamics [45]; (iv) the effects of measurement error, and (v) the presence of unmeasured species. These issues have to be addressed before adopting inference methods in the daily laboratory practice for any specific problem.
Declarations
Acknowledgements
The authors would like to thank Lorenzo Stella of the Microsoft Research - University of Trento Centre for Computaional and Systems Biology of Trento (Italy) for his valuable suggestions and for his collaboration in testing the method and validating the results and Sanders Veltkamp of the Astellas Pharma for his help in understanding the experimental obsevation and the pharmacokinetic model.
Authors’ Affiliations
References
- Kramer MA, Eden UT, Cash SS, Kolaczyk ED: Network inference with confidence from multivariate time series. Phys Rev E 2009, 79: 442-446.View ArticleGoogle Scholar
- van de Waterbeemd H, Gifford E: ADMET in silico modelling: towards prediction paradise? Nat Rev Drug Discov 2003, 2: 192-204. 10.1038/nrd1032View ArticleGoogle Scholar
- van der Graaf PH, Benson N: Systems pharmacology: bridging systems biology and pharmacokinetics-pharmacodynamics (PKPD) in drug discovery and development. Pharm Res 2011, 28: 1460-1464. 10.1007/s11095-011-0467-9View ArticleGoogle Scholar
- Crampin EJ, Schnell S, McSharry PE: Mathematicaland computational techniques in deduce complex biochemical reaction mechanisms. Progress Biophys Mol Biol 2004, 86: 77-112. 10.1016/j.pbiomolbio.2004.04.002View ArticleGoogle Scholar
- Ross J, Schreiber I, Vlad MO: Determination of Complex Reaction Mechanisms: Analysis of Chemical, Biological, and Genetic Networks. Oxford University Press Inc., New York USA; 2006.Google Scholar
- Smet RD, Marchal K: Advantages and limitations of current network inference methods. Nat Rev, Microbiol 2010, 8: 717-729.Google Scholar
- Mini E, Nobili S, Caciagli B, Landini I, Mazzei T: Cellular pharmacology of gemcitabine. Ann Oncol 2006, 17: v7-v12. 10.1093/annonc/mdj941View ArticleGoogle Scholar
- Veltkamp SA, Pluim D, van Eijndhoven MA, Bolijn MJ, Ong FH, Govindarajan R, Unadkat JD, Beijnen JH, Schellens JHM: New insights into the pharmacology and cytotoxicity of gemcitabine and 2′,2′-difluorodeoxyuridine. Mol Cancer Ther 2008,7(8):2415-2425. 10.1158/1535-7163.MCT-08-0137View ArticleGoogle Scholar
- Sheiner LB, Halkin H, Peck C, Rosenberg B, Melmon KL: Improved computer-assisted digoxin therapy: a method using feedback of measured serum digoxin concentrations. Ann Intern Med 1975,85(5):619-627.View ArticleGoogle Scholar
- D’Argenio DZ, Schumitzky A, Wang X: ADAPT 5 User’s Guide: Pharmacokinetic/Pharmacodynamic Systems Analysis Software. 2009. http://bmsr.usc.edu/Software/ADAPT/Google Scholar
- Bonate PL: Pharmacokinetic-Pharmacodynamic Modeling and Simulation. Springer Science+Businnes Media LLC, New York, USA; 2005.Google Scholar
- Pharmacokinetic and Pharmacodynamic Modeling in SimBiology 2011.http://www.mathworks.com/products/simbiology/pharmacokinetics-software.html
- NONMEM home page 2011.http://www.iconplc.com/technology/products/nonmem/
- Xu L, D’Argenio DZ, Eiseman JL, Egorin MJ: Bayesian inference in physiologically-based pharmacokinetic modeling: application to anticancer drug development. Adv Methods Pharmacokinetic Pharmacodynamic Syst Anal 2004,756(1):105-131.View ArticleGoogle Scholar
- Hopkins AL: Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol 2008, 4: 682-690. 10.1038/nchembio.118View ArticleGoogle Scholar
- Samoilov M, Arkin A, Ross J: On the deduction of chemical reaction pathways from measurements of time series of concentration. Chaos 2001, 11: 108-114. 10.1063/1.1336499View ArticleGoogle Scholar
- Arkin A, Shen P, Ross J: A test case of correlation metric construction of a reaction pathway from measurements. Science 1997,277(5330):1275-1279. 10.1126/science.277.5330.1275View ArticleGoogle Scholar
- Lecca P, Palmisano A, Ihekwaba A, Priami C: Calibration of dynamic models of biological systems with KInfer. European Biophys J 2010,29(6):1019-1039.View ArticleGoogle Scholar
- KInfer. COSBI Lab 2010.http://www.cosbi.eu/index.php/research/prototypes/kinfer
- Buja A, Swayne DF, Littman ML, Dean N, Hofmann H: Interactive data visualization with multidimensional scaling. Stress Int J Biol Stress 2004, 06511: 1-30.Google Scholar
- Buja A, Swayne DF, Littman ML, Dean N, Hofmann H, Chen L: Data visualization with multidimensional scaling. J Comput Graphical Stat 2008,17(2):444-472. 10.1198/106186008X318440View ArticleGoogle Scholar
- DeJordy R, Borgatti SP, Roussin C, Halgin DS: Visualizing proximity data. Field Methods 2007, 19: 239. 10.1177/1525822X07302104View ArticleGoogle Scholar
- Everitt B, Rabe-Hesketh S: The analysis of proximity data. J. Wiley, London: Arnold; New York; 1997.Google Scholar
- Telea AC: Data Visualization. Principples and Practice. A. K. Peters, Ltd., Wellesley; 2008.Google Scholar
- Plavec I, Sirenko O, Privat S, Wang Y, Dajee M, Melrose J, Nakao B, Hytopoulos E, Berg EL, Butcher EC: Method for analyzing signaling networks in complex cellular systems. PNAS 2004, 101: 1223-1228. 10.1073/pnas.0308221100View ArticleGoogle Scholar
- Venna J, Kaski S: Visualizing Gene Interaction Graphs with Local Multidimensional Scaling. Proceedings of the 14th European Symposium on Artificial Neural Networks (ESANN’2006), Volume 101 2006.Google Scholar
- Stineman RW: A consistently well behaved method of interpolation. Creative Computing 1980,6(7):54-57.Google Scholar
- Fraser AM, Swinney HL: Independent coordinates for strange attractors from mutual information. Physi Rev A 1986,33(2):1134-1140. 10.1103/PhysRevA.33.1134View ArticleGoogle Scholar
- Browne M: A geometric approach to non-paramteric density estimation. Pattern Recognit 2007, 40: 134-140. 10.1016/j.patcog.2006.05.012View ArticleGoogle Scholar
- Du Q, Grunzburger M: Grid generation and optimization based on centroidal Voronoi tessellations. App Math Comp 2002,133(2-3):591-607. 10.1016/S0096-3003(01)00260-0View ArticleGoogle Scholar
- Hastie T, Friedman J, R T: The elements of statistical learning. Data Mining, Inference and prediction. Springer, New York, USA; 2001.Google Scholar
- Wasserman S, Faust K: Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge, UK; 1994.View ArticleGoogle Scholar
- Press WH, Teukolsky SA, Vetterling WT, Flannery BP: Numerical Recipes: The Art of Scientific Computing (3rd ed.), “Section 10.5. Downhill Simplex Method in Multidimensions”. Cambridge University Press, New York; 2007.Google Scholar
- Nelder JA, Mead R: A simplex method for function minimization. The Comp Jnl 1965,7(4):308-313.View ArticleGoogle Scholar
- Cox TF, Cox MAA: Multidimensional scaling. Chapman and Hall/CLC, USA; 2001.Google Scholar
- Borg I, Groenen P: Modern Multidimensional Scaling: theory and applications. Springer Series in Statistics, New York, USA; 2005.Google Scholar
- Borgatti SP: Multidimensional scaling. 1997.http://www.analytictech.com/borgatti/mds.htmGoogle Scholar
- Lee MD: Determining the dimensionality of multidimensional scaling representations for cognitive modeling. J Math Psychol 2001,45(1):149-166. 10.1006/jmps.1999.1300View ArticleGoogle Scholar
- Lee MD: The connectionist construction of psycological spaces. Connect Sci 1997,9(4):323-352. 10.1080/095400997116586View ArticleGoogle Scholar
- Shepard RN: The analysis of proximities: multidimensional scaling with an unknown distance function. Psycometrika 1962,27(2):125-140. 10.1007/BF02289630View ArticleGoogle Scholar
- Taylor JR: Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements. Univ Science Books, Sausalito; 1997.Google Scholar
- Lecca P, Palmisano A, Priami C, Sanguinetti G: A new probabilistic generative model of parameter inference in biochemical networks. ACM Symposium on Applied Computing, ACM DL 2009
- Lecca P, Palmisano A, Priami C: Deducing chemical reaction rate constants and their regions of confidence from noisy measurements of time series of concentration. In 11th Int. Conference on Computer Modelling and Simulation (UKSim 2009). IEEE Computer Society, Cambridge - England; 2009:200-205.Google Scholar
- Papachristodoulou A, Recht B: Determining interconnections in chemical reaction networks. American Control Conference, New York, 2007 2007, 4872-4877.View ArticleGoogle Scholar
- Emmet-Streib E, Altay G: Local network-based measures to asses the inferability of different regulatory networks. IET Syst Biol 2010, 4: 277-288. 10.1049/iet-syb.2010.0028View ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.