An approach for dynamical network reconstruction of simple network motifs
BMC Systems Biology volume 7, Article number: S4 (2013)
One of the most important projects in the post-genome era is the systematic identification of biological networks. Most studies on network identification have focused on improving computational efficiency in large-scale network inference of complex systems with cyclic relations, and few attempts have been made to answer practical problems that occur in real biological systems. In this study, we evaluated the inference performance of our previously proposed method for inferring biological networks on simple network motifs.
We evaluated the inference accuracy and efficiency of our previously proposed network inference algorithm by using 6 kinds of repeatedly appearing, highly significant network motifs in the regulatory network of E. coli proposed by Shen-Orr et al. and Herrgård et al., and 2 kinds of network motifs in S. cerevisiae proposed by Lee et al. As a result, our method could reconstruct about 40% of the interactions in a network motif from a time-series data set. Moreover, introducing time-series data from one-factor disrupted models remarkably improved the performance of network inference.
The results of the network inference examination on the E. coli network motifs show that our network inference algorithm can be applied to typical topologies of biological networks. Continued examination of inference on well-established network motifs in biology would strengthen the applicability of our algorithm to realistic biological networks.
The investigation of network dynamics in biology is a major issue in systems and synthetic biology. Recent advances in high-throughput technologies for comprehensive observation of cells produce large amounts of data for analyzing the dynamics of complex systems such as gene regulatory networks and metabolic pathways. Time-series data with dynamic behavior are one such kind of data, and they carry an enormous amount of information about the regulation of biological networks in vivo. However, because this information is entirely implicit, adequate analytic and computational methods must be developed to reconstruct biological systems. The key to developing such computational methods is to build a reliable mathematical model for analyzing biological networks and to explore the parameter values of the model within a vast search space. Tominaga et al. and Maki et al. have developed a novel method [1, 2] for inferring conceptual biological networks by combining a dynamical network model called the S-system with traditional parameter estimation based on simple genetic algorithms [4, 5]. The S-system is based on ordinary differential equations in which the temporal (time-dependent) dynamics of the system components are characterized by a power-law formalism. The S-system is suitable for conceptual modeling and for describing complex systems with loops or cyclic interactions, because the dynamic behavior of the network can easily be obtained by a customized numerical integration method. The values of the interrelated coefficients in the formalism are directly or indirectly related to the regulation mechanism in the network model. The network structure inferred through parameter estimation provides one of the best candidates for the biological network structure. However, the S-system requires a large number of parameters to be estimated in order to identify dynamical biological networks: the number of estimated parameters is 2n(n + 1), where n is the number of system components.
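As a quick check of the parameter count 2n(n + 1): each of the n S-system equations carries one rate constant for synthesis (α_i), one for degradation (β_i), and n exponents each for the synthetic (g_ij) and degradation (h_ij) terms. The worked value below assumes n = 4, the size of the network models used later in this paper.

```latex
\[
\underbrace{n}_{\alpha_i} + \underbrace{n}_{\beta_i}
+ \underbrace{n^2}_{g_{ij}} + \underbrace{n^2}_{h_{ij}}
= 2n(n+1),
\qquad n = 4 \;\Rightarrow\; 2 \cdot 4 \cdot 5 = 40 .
\]
```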
We previously proposed efficient procedures for inferring biological networks from experimentally observed time-series data of mRNA or metabolites [7–10], using the S-system and real-coded genetic algorithms (RCGAs) with a combination of uni-modal normal distribution crossover (UNDX) and minimal generation gap (MGG). Other groups have also developed several methods to optimize parameters using the S-system [14–19]. Besides S-system modeling, many algorithms for network reconstruction from time-series data have been developed [20–27]. However, most of this work focused on improving computational efficiency in large-scale network inference of complex systems with cyclic relations, and few attempts have been made to answer practical problems that occur in real biological systems. Herrgård et al., Shen-Orr et al., and Lee et al. proposed that the gene regulatory network in Escherichia coli or Saccharomyces cerevisiae identified by experimental studies is composed of a limited number of network motifs, each of which has a simple form of relationship between transcription factors and genes [28, 29]. Little attention has been paid to evaluating the performance of network inference for such simple network motifs with dynamical modeling based on the S-system. In this paper, in order to evaluate the inference performance of our previously proposed network inference algorithm, we applied it to 8 kinds of simple network motifs proposed by Shen-Orr et al., Herrgård et al., and Lee et al. Shen-Orr et al. and Herrgård et al. suggested repeated appearances of highly significant motifs. Lee et al. suggested network motifs based on genome-wide location data.
Results and discussion
Results of network identification
We inferred network candidates 100 times for each network motif, based on artificially generated time-series data sets (see Figures 1, 2 and 3). After obtaining the 100 network candidates, we calculated precision and recall (see Figure 4) to evaluate the accuracy and efficiency of our algorithm from a network structural (topological) point of view. For the DOR (Dense Overlapping Regulation), FF (Feed-Forward), RM (Regulator Module) and TM (Target Module) networks, the recall was around 0.4, which indicates that about 40% of the interactions in the network model were properly reconstructed by our algorithm. The better cases were the RI (Regulatory Interaction), AR (Autoregulation), and ML (Multicomponent-Loop) networks, for which our algorithm could reconstruct around 60% of the interactions in the network motif. We also calculated the F-measure to evaluate the balance between accuracy and efficiency of network identification. In terms of the F-measure as well, network identification for the RI, AR, and ML networks gave better estimation results than for DOR, FF, RM, SIM (Single Input Module) and TM. On the other hand, low precision values were observed in all cases; that is, the inferring accuracy was relatively low and the inferred network candidates contain many false-positive interactions. Figure 5(A) shows the best case of the identified network topology for SIM (precision: 0.33, recall: 1.0). It contains all regulatory interactions in the network motif for SIM (shown in Figure 1(E)). However, it also contains many false-positive interactions, such as self-degradation for the synthetic process and inhibitory regulation from X2, X3, and X4 to X1.
Low precision values were also often observed in our previous work on other types of networks, and we have therefore already developed a method to remove false-positive interactions by parallel computing [7, 8]. Although that method could be applied here to improve the precision values, our aim in this study is to see how both precision and recall can be improved by altering the information content of the time-series data.
We thus focus on the inferred network candidates for SIM, since its accuracy and efficiency (Figure 4(E)) were very low. One possible cause of such low performance is the imbalance between the huge number of degrees of freedom in S-system network modeling and the amount of information in the reference time-series data. In other words, the information content of the single reference time-series data set (shown in Figure 3(E)) is not enough to identify network candidates. To overcome this, we inferred network candidates using another kind of time-series data, namely data from one-factor disrupted models. We prepared time-series data for the one-factor disrupted models as shown in Figure 6. The S-system parameters are the same as in Figures 2 and 3, except that the rate constant for the synthetic process of the disrupted factor i (α_i) is set to 0.0. We inferred 8 network candidates from 5 time-series data sets comprising the wild type (see Figure 3(E)) and the one-factor disrupted strains. The comparison of inferring accuracy and efficiency between the single and the 5 time-series data sets is shown in Figure 7. The result shows that the performance is remarkably improved compared with the single time-series case. We applied the same approach to the other motifs (data not shown) and found that introducing time-series data from one-factor disrupted models can improve the performance of our algorithm.
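The construction of the one-factor disrupted conditions described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `one_factor_disruptions` and the rate-constant values are hypothetical, and it assumes that the 5 data sets correspond to the wild type plus one disruption for each of the 4 factors.

```python
def one_factor_disruptions(alpha):
    """Return the wild-type rate constants followed by one copy per factor
    in which that factor's synthesis rate constant alpha_i is set to 0.0."""
    conditions = [list(alpha)]          # wild type, unchanged
    for i in range(len(alpha)):
        disrupted = list(alpha)
        disrupted[i] = 0.0              # synthesis of factor i is switched off
        conditions.append(disrupted)
    return conditions

# Hypothetical synthesis rate constants for a 4-node model (X1..X4).
wild_type_alpha = [1.0, 2.0, 3.0, 4.0]
conditions = one_factor_disruptions(wild_type_alpha)
print(len(conditions))   # number of experimental conditions to simulate
```

Each returned vector would then be used, with all other S-system parameters left unchanged, to generate one time-series data set.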
We applied our previously proposed algorithm to the network motifs proposed by Herrgård and Shen-Orr. As a result, the efficiency (recall) of our method was relatively high for most of the network motifs. In particular, for the Regulatory Interaction (RI) model, we reconstructed about 68% of the interactions in the model. Interestingly, the performance of network inference for the complex regulatory networks that include cyclic interactions (AR and ML) was better than that for the simple networks analyzed in this study. It is likely that the abundant information on dynamic behavior contained in the time-series data of a complex regulatory network constrains the degrees of freedom of S-system modeling, and that the false-positive and false-negative interactions for complex networks are reduced for this reason.
To examine how to improve both accuracy and efficiency, we inferred network candidates based on 5 time-series data sets, including time-series for one-factor disrupted models. In this setting, both inferring accuracy and efficiency increased remarkably. This result suggests that the inference performance can be improved by adding other kinds of time-series data.
Note that the present performance was examined with a set of data generated from arbitrarily given parameter values. We should test the performance of our method on various network structures with different parameters, as well as on observed data. From a practical point of view, various kinds of data have been accumulated under different experimental conditions. The differing information content of such data is expected to further improve the performance of our method. Continued examination of inference on well-established network motifs in biology would strengthen the applicability of our algorithm to realistic biological networks, including gene regulatory networks and metabolic pathways.
In order to evaluate the applicability of our inference algorithm, we prepared 8 kinds of artificial network models: Regulatory Interaction (RI), Regulator Module (RM), Target Module (TM), Feed-Forward (FF), Single Input Module (SIM), Dense Overlapping Regulation (DOR), Autoregulation (AR), and Multicomponent-Loop (ML). Each network model contains a significant network motif in the regulatory network of Escherichia coli proposed by Shen-Orr et al. and Herrgård et al. [28, 29], or in that of Saccharomyces cerevisiae proposed by Lee et al. We modified the 8 network motifs into network models consisting of 4 nodes (X1, X2, X3, and X4) without loss of each network topology. Figure 1 shows each network structure analyzed in this paper.
Subsequently, we prepared artificial time-series data containing 40 sampling points for each network motif by numerical integration. The reference time-series data of the 8 network models are shown in Figure 2.
The S-system is a suitable formalism for dealing with gene expression networks or conceptual metabolic pathway structures. It can sufficiently represent the structure of an organizationally complex system and capture the essence of the experimentally observed response:

\[
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n} X_j^{h_{ij}}, \qquad i = 1, 2, \ldots, n \tag{1}
\]

where n is the number of system components (genes or metabolites) in the network under investigation, X_i is the experimentally observed response (the gene expression level for a gene expression network, or the metabolite concentration for a metabolic pathway), α_i and β_i are apparent positive rate constants, and g_ij and h_ij are the interrelated coefficients between the X_i.
The first term on the right-hand side of Eq. (1) corresponds to the synthetic process of X_i, and the second term expresses the degradation process of X_i. The value of g_ij (h_ij) expresses the interactive effect of X_j on the synthetic process (degradation process) of X_i, and also determines the structure of the interactions between X_i and X_j. When the value of g_ij (h_ij) is positive, X_j induces the synthetic process (degradation process) of X_i. On the other hand, when g_ij (h_ij) is negative, X_j suppresses the synthetic process (degradation process) of X_i. When the value of g_ij (h_ij) is zero, X_j has no effect on the synthetic (degradation) process of X_i.
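To make the roles of the synthetic and degradation terms concrete, the sketch below integrates an S-system of this power-law form and samples 40 time points. It is only an illustration under stated assumptions: it uses a generic fixed-step fourth-order Runge-Kutta scheme rather than the integrator customized for the S-system mentioned in this paper, and the 2-component parameter values and the names `s_system_rhs`, `rk4_step` and `simulate` are hypothetical.

```python
import math

def s_system_rhs(x, alpha, beta, g, h):
    """dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij."""
    n = len(x)
    rates = []
    for i in range(n):
        synthesis = alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        degradation = beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        rates.append(synthesis - degradation)
    return rates

def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
    k3 = f([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
    k4 = f([xi + dt * k for xi, k in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def simulate(x0, alpha, beta, g, h, t_end, n_samples=40, substeps=50):
    """Return n_samples states equally spaced on [0, t_end], starting with x0."""
    f = lambda x: s_system_rhs(x, alpha, beta, g, h)
    dt = t_end / (n_samples - 1) / substeps
    x = list(x0)
    samples = [list(x)]
    for _ in range(n_samples - 1):
        for _ in range(substeps):
            x = rk4_step(f, x, dt)
        samples.append(list(x))
    return samples

# Hypothetical 2-component example (values chosen for illustration only):
# X1 is synthesized at a constant rate and degrades in proportion to itself;
# g[1][0] = 0.5 > 0 means X1 induces the synthesis of X2.
alpha = [2.0, 1.0]
beta = [1.0, 1.0]
g = [[0.0, 0.0], [0.5, 0.0]]
h = [[1.0, 0.0], [0.0, 1.0]]
series = simulate([1.0, 1.0], alpha, beta, g, h, t_end=20.0)
```

In this example the system relaxes towards the steady state X1 = 2, X2 = √2, where synthesis and degradation balance for both components.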
The biological network can be inferred by estimating α_i, β_i, g_ij, and h_ij in the S-system formula. A representation of the S-system parameters to be estimated is shown in Figure 1.
Real-coded genetic algorithms
The S-system is a formalism of non-linear ordinary differential equations, and thus the system can easily be solved numerically by using a numerical integration algorithm customized specifically for this formalism. However, even when an adequate time-course of the relevant state variables is given, a set of parameter values α_i, β_i, g_ij, and h_ij will in many cases not be uniquely determined, because it is highly possible that another set of parameter values will show a similar time-course. Therefore, even if one set of parameter values that explains the observed time-course is obtained, this set is only one of the best candidates that explain the observed time-courses. Our strategy is to explore and exploit these candidates within the immense search space of parameter values.
In this problem, each set of parameter values to be estimated is evaluated by the following procedure. Suppose that $X_{d,i,t}^{\mathrm{calc}}$ is the value of the numerically integrated time-course at time t of state variable X_i in the d-th data set, and $X_{d,i,t}^{\mathrm{exp}}$ represents the experimentally observed time-course at time t of X_i in the d-th data set. Summing the squared relative errors between $X_{d,i,t}^{\mathrm{calc}}$ and $X_{d,i,t}^{\mathrm{exp}}$ gives the total relative error E:

\[
E = \sum_{d=1}^{D} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( \frac{X_{d,i,t}^{\mathrm{calc}} - X_{d,i,t}^{\mathrm{exp}}}{X_{d,i,t}^{\mathrm{exp}}} \right)^{2}
\]

where D is the total number of data sets experimentally observed under different experimental conditions, such as disruption of genes or inhibition of kinase activities, N is the total number of experimentally observable state variables, and T is the total number of sampling points over time in one experimental condition. The computational task is to find a set of parameter values that minimizes the objective function E. We have developed an efficient computational technique based on real-coded genetic algorithms (RCGAs), a nonlinear numerical optimization method that is much less likely to be stranded in local minima. This technique combines the crossover operator called uni-modal normal distribution crossover (UNDX) with the generation alternation model called the minimal generation gap (MGG) model. Furthermore, in order to find the skeletal structure (small-size system) of the S-system formalism that explains the experimentally observed response, those parameters (g_ij and h_ij) whose absolute values are less than a given threshold are removed (reset to zero) during the optimization procedure.
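The evaluation step and the threshold-based removal of small coefficients described above can be sketched as follows. This is a minimal illustration under assumptions: the nested-list layout `data[d][i][t]`, the function names `total_relative_error` and `prune_small`, and the threshold value are hypothetical, and the genetic algorithm itself (UNDX and MGG) is not shown.

```python
def total_relative_error(calc, obs):
    """Sum of squared relative errors over data sets d, variables i and
    sampling points t; calc and obs are nested lists indexed [d][i][t]."""
    total = 0.0
    for calc_d, obs_d in zip(calc, obs):
        for calc_i, obs_i in zip(calc_d, obs_d):
            for c, o in zip(calc_i, obs_i):
                total += ((c - o) / o) ** 2   # observed values must be non-zero
    return total

def prune_small(coeffs, threshold):
    """Reset to zero every coefficient whose absolute value is below threshold."""
    return [[0.0 if abs(v) < threshold else v for v in row] for row in coeffs]

# Hypothetical values: one data set, one variable, two sampling points.
observed = [[[1.0, 2.0]]]
computed = [[[1.1, 2.0]]]
error = total_relative_error(computed, observed)

# Hypothetical exponent matrix; entries with |value| < 0.1 are dropped.
g_estimate = [[0.05, -0.8], [0.3, -0.01]]
g_skeleton = prune_small(g_estimate, threshold=0.1)
```

Because the error is relative, each term is scaled by the observed value, so variables with small and large magnitudes contribute comparably to E.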
Evaluation of identified network
We used the precision and the recall for evaluating the inferred biological network candidates. The precision is defined as follows:
where TP_i is the number of true-positive interactions in the i-th network candidate, FP_i is the number of false-positive interactions in the i-th network candidate, and n is the number of inferred network candidates. The value of precision shows the inferring accuracy of the biological network candidates. We also used recall, which indicates the inferring efficiency of the network candidates, as follows:
where FN_i is the number of false-negative interactions in the i-th network candidate. Both precision and recall take values between 0.0 and 1.0, and the best value of each is 1.0.
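The sketch below shows one way to compute these scores from sets of interactions. It rests on assumptions that should be checked against the original formulas: it averages the per-candidate ratios TP_i / (TP_i + FP_i) and TP_i / (TP_i + FN_i) over the n candidates (a pooled ratio of sums is another possible reading), it takes the F-measure to be the harmonic mean of precision and recall, and the edge representation, function names and example networks are hypothetical.

```python
def precision_recall(true_edges, candidates):
    """Average per-candidate precision and recall over all inferred candidates.
    Edges are (regulator, target) pairs; each candidate is a set of edges."""
    precisions, recalls = [], []
    for inferred in candidates:
        tp = len(inferred & true_edges)    # interactions correctly inferred
        fp = len(inferred - true_edges)    # inferred but not in the true network
        fn = len(true_edges - inferred)    # in the true network but missed
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    n = len(candidates)
    return sum(precisions) / n, sum(recalls) / n

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical single-input-style network: X1 regulates X2, X3 and X4.
true_edges = {("X1", "X2"), ("X1", "X3"), ("X1", "X4")}
candidates = [
    true_edges | {("X2", "X1"), ("X3", "X1"), ("X4", "X1")},  # all true edges, 3 false positives
    {("X1", "X2"), ("X2", "X3")},                             # 1 true edge, 1 false positive
]
p, r = precision_recall(true_edges, candidates)
f = f_measure(p, r)
```

The first candidate illustrates the pattern reported in the Results: every true interaction is recovered (recall 1.0), but the added false positives pull precision down.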
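To evaluate the balance between precision and recall, we also used the F-measure, defined as follows:

\[
F\text{-measure} = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}
\]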
Tominaga D, Koga N, Okamoto N: Efficient numerical optimization algorithm based on genetic algorithm for inverse problem. Proceedings of the Genetic and Evolutionary Computation Conference. 2000, 251-258.
Maki Y, Tominaga D, Okamoto M, Watanabe S, Eguchi Y: Development Of A System For The Inference Of Large Scale Genetic Networks. 2001
Savageau MA: Biochemical Systems Analysis: A study of function and design in molecular biology. 1976, Addison-Wesley, Reading
Holland JH: Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. 1992, Cambridge, MA, USA: MIT Press
Goldberg DE: Genetic Algorithms in Search, Optimization and Machine Learning. 1989, Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1
Irvine DH, Savageau MA: Efficient solution of nonlinear ordinary differential equations expressed in S-system canonical form. SIAM Journal on Numerical Analysis. 1990, 27 (3): 704-735. 10.1137/0727042.
Nakatsui M, Ueda T, Maki Y, Ono I, Okamoto M: Method for inferring and extracting reliable genetic interactions from time-series profile of gene expression. Mathematical Biosciences. 2008, 215: 105-114. 10.1016/j.mbs.2008.06.007. [http://www.sciencedirect.com/science/article/pii/S0025556408000953]
Nakatsui M, Ueda T, Maki Y, Ono I, Okamoto M: Efficient inferring method of genetic interactions based on time-series of gene expression profile. Proceedings of 13th International Symposium on Artificial Life and Robotics. 2008, 71-76.
Shikata N, Maki Y, Nakatsui M, Mori M, Noguchi Y, Yoshida S, Takahashi M, Kondo N, Okamoto M: Determining important regulatory relations of amino acids from dynamic network analysis of plasma amino acids. Amino Acids. 2010, [http://dx.doi.org/10.1007/s00726-008-0226-3]
Komori A, Maki Y, Nakatsui M, Ono I, Okamoto M: Efficient Numerical Optimization Algorithm Based on New Real-Coded Genetic Algorithm, AREX + JGG, and Application to the Inverse Problem in Systems Biology. Applied Mathematics. 2012, 3: 1463-1470. 10.4236/am.2012.330205.
Janikow CZ, Michalewicz Z: An Experimental Comparison of Binary and Floating Point Representations in Genetic Algorithms. Proc of the 4th International Conference on Genetic Algorithms. Edited by: Belew RK, Booker LB. 1991, Morgan Kaufmann, 151-157.
Ono I, Sato H: A Real-Coded Genetic Algorithm for Function Optimization Using Unimodal Distribution Crossover. Proceedings of the 7th ICGA. 1997, 249-253.
Sato H, Ono I, Kobayashi S: A New Generation Alternation Model of Genetic Algorithms and Its Assessment. Journal of Japanese Society for Artificial Intelligence. 1997, 734-744.
Voit EO, Almeida J: Decoupling dynamical systems for pathway identification from metabolic profiles. Bioinformatics. 2004, 20 (11): 1670-1681. 10.1093/bioinformatics/bth140. [http://dx.doi.org/10.1093/bioinformatics/bth140]
Tucker W, Kutalik Z, Moulton V: Estimating parameters for generalized mass action models using constraint propagation. Mathematical Biosciences. 2007, 208 (2): 607-620. 10.1016/j.mbs.2006.11.009. [http://dx.doi.org/10.1016/j.mbs.2006.11.009]
Gonzalez OR, Küper C, Jung K, Naval PC, Mendoza E: Parameter estimation using Simulated Annealing for S-system models of biochemical networks. Bioinformatics. 2007, 23 (4): 480-486. 10.1093/bioinformatics/btl522. [http://bioinformatics.oxfordjournals.org/content/23/4/480.abstract]
Naval PC Jr, Sison LG, Mendoza E: Metabolic Network Parameter Inference using Particle Swarm Optimization. Proceedings of International Conference on Molecular Systems Biology. 2006
Maki Y, Takahashi Y, Arikawa Y, Watanabe S, Aoshima K, Eguchi Y, Ueda T, Aburatani S, Kuhara S, Okamoto M: An Integrated Comprehensive Workbench for Inferring Genetic Networks: Voyagene. J Bioinformatics and Computational Biology. 2004, 2 (3): 533-550. 10.1142/S0219720004000727.
Chou I, Voit EO: Recent developments in parameter estimation and structure identification of biochemical and genomic systems. Mathematical biosciences. 2009, 219 (2): 57-83. 10.1016/j.mbs.2009.03.002.
Luque B, Lacasa L, Ballesteros F, Luque J: Horizontal visibility graphs: Exact results for random time series. Physical Review E. 2009, 80 (4): 046103.
Moles CG, Mendes P, Banga JR: Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome research. 2003, 13 (11): 2467-2474. 10.1101/gr.1262503.
Nelander S, Wang W, Nilsson B, She QB, Pratilas C, Rosen N, Gennemark P, Sander C: Models from experiments: combinatorial drug perturbations of cancer cells. Molecular systems biology. 2008, 4:
Zhang J, Small M: Complex Network from Pseudoperiodic Time Series: Topology versus Dynamics. Phys Rev Lett. 2006, 96: 238701. [http://link.aps.org/doi/10.1103/PhysRevLett.96.238701]
Bezsudnov IV, Gavrilov SV, Snarskii AA: From time series to complex networks: the Dynamical Visibility Graph. ArXiv e-prints. 2012
Donges JF, Donner RV, Kurths J: Testing time series irreversibility using complex network methods. EPL (Europhysics Letters). 2013, 102: 10004. 10.1209/0295-5075/102/10004.
Holme P, Saramäki J: Temporal networks. Physics Reports. 2012, 519: 97-125. 10.1016/j.physrep.2012.03.001.
Csermely P, Korcsmaros T, Kiss HJM, London G, Nussinov R: Structure and dynamics of molecular networks: A novel paradigm of drug discovery. A comprehensive review. ArXiv e-prints. 2012
Herrgård MJ, Covert MW, Palsson BØ: Reconciling Gene Expression Data with Known Genome-Scale Regulatory Network Structure. Genome Research. 2003, 13: 2423-2434. 10.1101/gr.1330003.
Shen-Orr SS, Milo R, Mangan S, Alon U: Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics. 2002, 31: 64-68.
Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, Young RA: Transcriptional Regulatory Networks in Saccharomyces cerevisiae. Science. 2002, 298 (5594): 799-804. 10.1126/science.1075090. [http://www.sciencemag.org/content/298/5594/799.abstract]
This work was partly supported by the commission for Development of Artificial Gene Synthesis Technology for Creating Innovative Biomaterial from the Ministry of Economy, Trade and Industry (METI), Japan. This work was also partly supported by JSPS KAKENHI Grant Number 23700358, Grant-in-Aid for Young Scientists (B), "Development of high accurate method for parameter estimation by the combination of numerical optimization and symbolic computation".
Publication of this supplement was supported by the commission for Development of Artificial Gene Synthesis Technology for Creating Innovative Biomaterial from the Ministry of Economy, Trade and Industry (METI), Japan.
This article has been published as part of BMC Systems Biology Volume 7 Supplement 6, 2013: Selected articles from the 24th International Conference on Genome Informatics (GIW2013). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/7/S6.
The authors declare that they have no competing interests.
The study was designed by MN, MA, and AK. Analysis was carried out by MN. The paper was written by MN and MA. All authors approved of the final manuscript.